Graphics and Media • 1:05:16
QuickTime 7 seamlessly leverages the latest graphics advancements of Mac OS X. Learn about QuickTime 7's own rendering pipeline and how you can use the power of the new Visual Context together with Core Video, Core Image and OpenGL to create amazing video applications. Also learn how to directly access the core media technologies to customize your own powerful rendering pipelines.
Speakers: Sean Gies, Frank Doepke
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Good morning everyone. Hope you're enjoying the conference so far. My name is Sean Gies. I'm one of the video engineers in the QuickTime team and I'd like to welcome you to our session on High Performance Video in Mac OS X Tiger. So I see we've got some pretty good turnout here.
You'll be glad to know we're going to be talking about a lot of the same things we covered last year at WWDC. So not going to miss anything. We're also going to try to answer a lot of the questions that we've been receiving over the past year. So hope you enjoy.
So before we get started in the new video pipeline, let's talk about where video was before Tiger. So this is our old rendering model. You see we have QuickTime, the Image Compression Manager, and codecs. They were all talking through QuickDraw. And in the APIs, that would be the GrafPort and the GWorld and so forth. That worked pretty well.
But it wasn't quite optimal. So this transfer codec was introduced. Now, this is a sneaky codec. It could see that GWorld and go, I can do better. I'll talk directly to the hardware and get that video to the screen a little bit faster, maybe support more native Y'CbCr pixel formats than you would have had through the GWorld normally. And this gave you a little more performance. Unfortunately, it's not too flexible.
The video would go straight to the graphics hardware and there wasn't really a way for your application to get in there and do interesting things with it. So we realized we had to redesign this a bit. And when we did so, we had some goals. One of which was we needed to have very good integration with OpenGL. Now OpenGL is our connection to the graphics hardware. And it's how we can take advantage of those new graphics processors and all the programmability and just raw horsepower available. So that was one thing.
We also wanted to have it to be layered on top of Quartz. And this was sort of a way to not marry QuickTime to OpenGL, you know, kind of how the way it's been married to QuickDraw. And it gives us another layer that you can hook into. This is where things like Core Video and Core Image technologies are.
It allows QuickTime to leverage those as well. Now we also wanted support for multiple buffers. This was another important problem with the old model where a codec would be decompressing into this buffer and before it could start the next frame, it had to wait for the graphics hardware to go and download all those pixels onto the graphics card. And there's this dead time in between where the codec could not begin doing work on the next frame. So that was the thing we had to clear up.
And architecturally, we want to separate the decoding of video from the presentation of video. Now this is really where we allow your applications to hook in, in between there. So in the old model, when the codecs were finished decompressing the data, well those pixels were already on the screen. I mean there was nowhere for you to get in there and interrupt.
So here's sort of a logical view of this new pipeline. So we have QuickTime on top sitting on Quartz, which is core graphics, core image and core video, all of which talk through OpenGL to get to the graphics hardware. So how about a little more dynamic view of this pipeline, a little more useful? So you got the movie, which has your video frames, and you have some graphics hardware you need to get that to.
So QuickTime provides the video and we use OpenGL to render to the graphics hardware. So how do we get the frames out of that movie? Well, we've introduced a new construct called a Visual Context. Now this is like a tap into the movie through which you can extract video frames.
And you'll notice here there's this gap. We need to bridge the QuickTime APIs, which are rather high level, over to the OpenGL APIs. They're completely different styles of programming. So this is where we've introduced Core Video. It connects these two very different worlds. And it provides timing services so you know when to draw and buffering services which contain what you're going to draw.
And in this gap you can continue to fill in things. So we have a place where your application can hook in. You can do additional OpenGL transformations or you can pull in Core Image and do, you know, crazy effects. Now these things are optional. This is where you can do whatever you want and it doesn't need to be there if your application is just, you know, playing back video simply.
So what does this give us? Well, there's a couple facets to the performance enhancements. You have more pipelining. This is where the CPU has been freed up so the codec can be working at the same time that the graphics hardware is downloading pixels. So this becomes relevant, or actually very important, for high definition video where you have a lot more pixels. And the compression technology takes a lot more time to compute. And because it's high definition and there's so much more video, well, that dead time while it's uploading that video becomes even more important.
The other facet has to do with taking advantage of that GPU. So now that we have OpenGL under the hood, under QuickTime, it becomes very easy to leverage the GPU and do things like ColorSync color matching in real time, adding filters and effects to the video, or just fun new things that weren't really possible before.
So, I realize there's a class of applications who merely want to play back video efficiently. They want to play back that high definition H.264 clip without dropping frames on the floor. And for that, it's very easy. Cocoa developers, if you adopt QTKit, you can use the QTMovieView object.
Carbon developers, if you've adopted the high level toolbox, you can use HIMovieView. Now both of these views are implemented to use the new video pipeline. So it's going to take advantage of OpenGL and the GPU and all that kind of stuff. And it's pretty much free. With Cocoa, you can do this stuff in Interface Builder and have it playing back without writing any code.
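For reference, here is a rough sketch (not from the session) of just how little code that path takes; it assumes a QTMovieView outlet called movieView wired up in Interface Builder and a path argument supplied by the caller.

    #import <QTKit/QTKit.h>

    // Inside a window controller or similar class that owns the QTMovieView outlet.
    - (void)openAndPlayMovieAtPath:(NSString *)path
    {
        NSError *error = nil;
        QTMovie *movie = [QTMovie movieWithFile:path error:&error];
        if (movie == nil) {
            NSLog(@"Could not open movie: %@", error);
            return;
        }
        [movieView setMovie:movie];   // the view owns the whole rendering pipeline from here
        [movieView play:self];        // decode, GPU upload, ColorSync, timing: all handled for you
    }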
So, back to our diagram. The view basically takes care of everything here. You give it a movie, it does all the work of pulling out the frames, dealing with the timing, managing buffers and then drawing them to the screen for you. So with that, I'd like to have a little demo showing you how QuickTime Player is doing this. Demo 1 please. All right, so I'm going to open a movie here. This is an H.264 clip. It is 960 by 540.
So QuickTime Player is a Cocoa application now and it is merely a client of the QTMovieView object that I just mentioned. So QuickTime Player has no idea how that video is getting to the screen, it's just relying on the movie view to do it for it. And we have this great new live resizing feature. Now, the interesting thing here, compared to before, is that the codec has no idea that I'm changing the size of this window.
It continues to decode frames at its native resolution very efficiently. Whereas in the old model, every time you needed to resize the video, you tore down the codec chain, reinitialized it to decompress at the new size and that was a lot of overhead. Now, only OpenGL knows that we need to change the size of this to surface on the screen. And the codec just carries on doing everything in real time.
So we also have ColorSync being used here. Now this video is tagged with the information that describes the high definition TV color profile, and our display has been calibrated for its characteristics, and we're getting ColorSync for free and it's all being color matched in real time. Now, since we're using the GPU, we can do a few other things. Here's a little script that makes this easier for me, which takes that movie and creates a new one from it.
This movie is the same but it has two tracks now. It's that same clip but below and flipped. So this work of compositing the video is happening on the GPU. It has two streams of video frames coming in and we have an off-screen buffer on the graphics card into which we're rendering these two streams of video. And it's all happening in real time and again the view doesn't really know it's happening, it's just you get these larger textures out of the movie.
And as well, on top of that, we're using the GPU to do other operations like these video controls where we're changing the color, saturation, contrast, tint, all that. And this is not affecting our CPU at all because all the work is being done on that graphics processor. You basically get that for free.
And in any application where you use QTKit, you're going to get this behavior, you know, out of the box. These APIs are public, you can tell the movie, you know, be twice as bright. All right, so that was a demo. QuickTime Player using QTKit. Back to the slides, please.
All right, so not all applications want to merely play video in a rectangle on the screen. Maybe you want to do some additional video processing. You want to apply some effects to that video, make it look cool. Or maybe you want to draw additional elements on top of the video. Or maybe it's the inverse of that. Maybe you have other drawing elements, and you want the video to be something embedded in that larger scene. Well, you can't use the view to do this.
You're going to have to implement some of its behavior. You're going to have to learn OpenGL or Core Image and do that rendering yourself. And on hardware that can't support this, you're going to have to fall back to the GWorld case. So, let me show you a few examples of this working. Back to demo one, please.
So this is an application I showed last year. And the point of this is to show that we're using OpenGL. Got that same movie. It's continuing to be composited in real time. And these OpenGL transformations are trivial. You're just rotating this thing around. And to really showcase the horsepower of these GPUs, showing how much video we can render at the same time. So you can have quite a few frames there.
So instead of just rendering a frame and then releasing it to be recycled later, this application holds on to a few frames and continues to render them over and over and over. And you get an interesting effect. I mean, you can see. You can, I think I have about 30 frames here. Playing back, no problem. So these graphics processors have a whole lot of horsepower.
The other application you may have seen in the previous session this week is this jigsaw puzzle. So I have a high definition clip here. I believe this is a 1280 by 720 progressive. So, it's a little jigsaw puzzle, and this was relatively easy to write. The most difficult part was, you know, making those shapes, but I'd like to explain what's going on here. So, we have the video playing to a texture, or really a series of textures, and every time I get a new texture, I have to draw this whole puzzle from scratch.
So, for each piece, I have a mask. It's a little grayscale image that is the shape of the puzzle piece. And first, I render that using core image to get that shadow effect. It's being run through a Gaussian blur filter of a radius of maybe two or so.
And then, I take that same mask and I draw the video and the mask simultaneously to cut out a piece of the video and draw it on top of where that shadow was, and just repeat that process for every piece. And you know, it's quite, you know, it's an interesting effect. So the gratuitous little cute demo. Okay. And if this is going to work, I hope it does.
So we got here some live video coming off this iSight camera here. So instead of using a movie as a source, I'm using a sequence grabber to get this video off of the camera and run it through the exact same code that was doing all that rendering before. So, live video, movies, all working through the same video pipeline. Alright. Back to slides please.
Slides please. All right, so that just shows some of the fun new things that really weren't possible before with the old rendering pipeline. So how does this work? Well, the view's not going to help you. That's right out. So you have to get underneath this thing and figure out what is it doing? How is it getting those frames to the screen? And how can you do that? So first, you need a movie. Now you may wonder why this is about rendering. We're opening a movie, OK.
There's an interesting new semantic that we've introduced that is important to video rendering when it comes to opening your movie. We've added an API called NewMovieFromProperties. And it's a replacement for all the other new-movie calls that are available in QuickTime. I don't know, all 10, 12, however many there are.
And you use this API by specifying a list of properties that describe how to open the movie. Maybe you're telling it open from a file, or open from a URL, or open from a handle, or data reference. And you can also supply options, like I want this movie to be active immediately, or allow it to download progressively and continue asynchronously.
But the important thing here is that the movie will not necessarily inherit the current GWorld, as is the case with all the other calls. That was a very subtle side effect of all the other ones: upon creation, the movie would be ready to start drawing to whatever GWorld happened to be active at the time you created the movie.
So with this new call you can specify a visual context at creation time and the movie will go there instead of to the GWorld. Now if you're using QTKit, the QTMovie class will be using this under the hood so you don't need to worry about it there.
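As a hedged sketch of what that creation call looks like when you hand it a visual context up front (the path and texture context are assumed to exist already, and error handling is trimmed):

    #include <QuickTime/QuickTime.h>

    static Movie OpenMovieIntoVisualContext(CFStringRef moviePath, QTVisualContextRef textureContext)
    {
        Boolean active = true;
        QTNewMoviePropertyElement props[] = {
            { kQTPropertyClass_DataLocation,     kQTDataLocationPropertyID_CFStringNativePath,
              sizeof(moviePath),      &moviePath,      0 },
            { kQTPropertyClass_NewMovieProperty, kQTNewMoviePropertyID_Active,
              sizeof(active),         &active,         0 },
            { kQTPropertyClass_Context,          kQTContextPropertyID_VisualContext,
              sizeof(textureContext), &textureContext, 0 },
        };
        Movie movie = NULL;
        OSStatus err = NewMovieFromProperties(sizeof(props) / sizeof(props[0]), props,
                                              0, NULL,    // no output properties requested
                                              &movie);
        return (err == noErr) ? movie : NULL;
    }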
So you got the movie. Now you need a visual context so you can get those frames out. First, what's a visual context? Well, this is our abstract rendering destination for QuickTime movies. And it's our replacement for the GWorld. One of the biggest differences between the visual context and the GWorld, as far as writing your code is concerned, is that your application is now responsible for getting those pixels to the screen.
When you used the GWorld, you just told the movie, render to this GWorld, and it did all the work of getting those pixels to the screen. When you use a visual context, you merely get a frame of video out and it's up to you to get it to the screen.
Now this is more responsibility but it lets you do all those interesting things and it gives us the advantages of the GPU and so you can take advantage of all that horsepower down there. It enables us to have multiple buffers in flight so the codec is no longer constrained, you know, waiting for the upload before it can start the next decode. And it's not restricted to certain types of movies for playback to an OpenGL texture.
We've talked about some previous techniques for doing this and they were typically limited to single track video. So you can have a bunch of pieces of video media content, but this is not supposed to be restricted. This will work with, you know, multi-track movies as we've seen here in the demo.
It'll work with different types of media so you don't need to worry about that anymore. It gives you more asynchronous processing, so this is, you know, what I talked about before: the codec can continue working while you're rendering with the graphics processor. So how do you create one? We have this API, QTOpenGLTextureContextCreate, and it will create you a type of visual context through which you can pull out OpenGL textures for the video frames.
Now obviously you're going to need OpenGL to use this. So you're going to need the CGL context and the CGL pixel format. Now these are the core OpenGL API types. And unless you're doing only full screen or only off screen OpenGL rendering, you're probably going to be talking to OpenGL through Cocoa or Carbon. Now Cocoa uses the NSOpenGL objects, and in Carbon you'll be using AGL. Both of these have ways to get the underlying core OpenGL objects, which you will give to QuickTime.
Now it's important to note that this call can fail if one of your displays does not support Quartz Extreme. So you've got to be prepared for this and possibly fall back to using GWorlds if you need to continue to support that hardware. So you have the visual context, you have the movie, you need to connect them.
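A sketch of that creation step from the Cocoa side, assuming an NSOpenGLView called glView; note the explicit check so you can take the GWorld fallback:

    CGLContextObj     cglContext     = (CGLContextObj)[[glView openGLContext] CGLContextObj];
    CGLPixelFormatObj cglPixelFormat = (CGLPixelFormatObj)[[glView pixelFormat] CGLPixelFormatObj];

    QTVisualContextRef textureContext = NULL;
    OSStatus err = QTOpenGLTextureContextCreate(kCFAllocatorDefault,
                                                cglContext, cglPixelFormat,
                                                NULL,              // optional attributes dictionary
                                                &textureContext);
    if (err != noErr || textureContext == NULL) {
        // no Quartz Extreme on one of the displays: fall back to the GWorld path here
    }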
So that's what SetMovieVisualContext is for. It's the replacement for SetMovieGWorld. And quite simply, it points that movie at the visual context. Now, a visual context can only have one movie connected to it. So you've got to be careful of that. It'll fail if a different movie is already connected to that visual context. So it's important that you can disconnect them.
And I want to point this out. Calling SetMovieVisualContext with a null pointer as the visual context parameter is an important semantic for disconnecting a movie from a visual context. And also, similar to the new-movie calls, there is a difference here compared to SetMovieGWorld. SetMovieGWorld, when you passed it null for the GWorld, didn't mean don't render anywhere. It meant go take the current GWorld and start rendering there instead. So the visual context call is different, where passing null will cause the movie to stop rendering.
Now it's important that you use this when you maybe have to switch the movie to start using a GWorld. Just calling SetMovieGWorld won't necessarily disconnect the movie from the visual context, for compatibility reasons. So it's important that if you do that and you want to make sure that movie is disconnected, you pass null to SetMovieVisualContext. Okay, so they're connected. Now you need to get those images out.
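In code, the connect and disconnect steps look roughly like this (a sketch, reusing the movie and texture context from above):

    // Point the movie at the visual context; it fails if another movie is already attached.
    OSStatus err = SetMovieVisualContext(movie, textureContext);

    // Later, to take the movie off the visual context entirely (it stops rendering):
    SetMovieVisualContext(movie, NULL);

    // Only after that disconnect would you hand the movie a GWorld again, for example:
    // SetMovieGWorld(movie, gworld, NULL);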
This is where these three APIs come in. So the visual context is kind of a pull model. You ask it, is there a new frame available for this time, using that first API there, QTVisualContextIsNewImageAvailable. And then if that returns true, you'll move on to the next one, QTVisualContextCopyImageForTime. And you'll probably pass it the same time stamp as you asked in the first call.
And then you get the image out and you can render as you will. And when you're done, after rendering, you need to call this QTVisualContextTask call. Now this allows the visual context to do some housekeeping work, recycle resources, reclaim buffers, all that kind of thing. And due to threading issues, it's important that you call this while you have OpenGL locked and protected from other threads.
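Put together, the pull model looks roughly like this sketch; outputTime is the timestamp from the Core Video display link callback that comes up in a moment, and the actual drawing is elided.

    CGLLockContext(cglContext);                        // keep other threads out of OpenGL
    if (QTVisualContextIsNewImageAvailable(textureContext, outputTime)) {
        CVOpenGLTextureRef frame = NULL;
        OSStatus err = QTVisualContextCopyImageForTime(textureContext, kCFAllocatorDefault,
                                                       outputTime, &frame);
        if (err == noErr && frame != NULL) {
            // ... draw the texture here ...
            CVOpenGLTextureRelease(frame);             // or hold it until the next frame arrives
        }
    }
    QTVisualContextTask(textureContext);               // housekeeping, while OpenGL is still locked
    CGLUnlockContext(cglContext);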
So, you might wonder, where do I get the timestamps and when do I call those functions and what do these buffers look like that I get out? This is where Core Video comes into play. So let's talk about Core Video for a little bit. So it provides buffer services and timing services.
So this is how we can bridge the gap between a QuickTime Visual Context and OpenGL, where you have buffer objects that speak both languages. And it allows us to connect those two things. And it deals with all the ugly details under the hood of managing buffers and managing their connections to OpenGL and basically all the things you have to do to make the video playback efficient.
And also, it deals with timing. So it's going to be telling you when you need to draw. And it's kind of analogous to the Core Audio model, where the audio hardware is telling you when it needs samples. Core Video will tell you when you need to draw video frames.
So about the buffers. Core Video buffers are core foundation based objects. So they have the same retain and release semantics of the rest of core foundation. And there's sort of a hierarchy of classes for the buffers. At the top there's CVBuffer, just kind of an abstract class. And with this you can have attachments on the buffers. Now attachments can be anything you want. They're just a CF type. Think of it as like a dictionary of stuff that's on the buffer.
Examples might be the timestamp of the buffer or the color space of the buffer, things like that. And you can put anything you want on there yourself, custom data. That's at the CVBuffer level. The next thing in the hierarchy is CVImageBuffer. Now this is specific to images and you'll find things like the dimensions of the image here. And all of the visual context APIs that deal with Core Video buffers will be using the CVImageBuffer type. Now in reality there are multiple types of image buffers. Like here we have the pixel buffer, the OpenGL texture and the OpenGL buffer.
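A tiny sketch of reading and setting attachments on a buffer, before getting to the concrete buffer types (the custom key below is purely hypothetical):

    // The color space the frame was tagged with, if any:
    CGColorSpaceRef colorSpace =
        (CGColorSpaceRef)CVBufferGetAttachment(frame, kCVImageBufferCGColorSpaceKey, NULL);

    // Hanging your own data on a buffer:
    CVBufferSetAttachment(frame,
                          CFSTR("com.example.myNote"),           // hypothetical custom key
                          CFSTR("first frame after a cut"),
                          kCVAttachmentMode_ShouldPropagate);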
For now let's talk about the OpenGL texture. This is a simple, well, I shouldn't say simple, it's a wrapper around an OpenGL texture. And it does all the dirty work of managing the memory mappings and recycling textures and memory and all that stuff. And if you speak OpenGL, you know about texture targets and texture names, and you can get those out of this thing.
So you can render that texture using OpenGL directly. And to do that, you're going to need texture coordinates. And there's an API we recommend you use, CVOpenGLTextureGetCleanTexCoords. Now we recommend that over just getting the size of the image and going from zero to whatever width and height.
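A sketch of drawing one of these textures with plain OpenGL using the clean coordinates; it assumes an orthographic projection is already set up and that frame is the CVOpenGLTextureRef pulled out of the visual context.

    GLenum  target = CVOpenGLTextureGetTarget(frame);
    GLuint  name   = CVOpenGLTextureGetName(frame);
    GLfloat lowerLeft[2], lowerRight[2], upperRight[2], upperLeft[2];
    CVOpenGLTextureGetCleanTexCoords(frame, lowerLeft, lowerRight, upperRight, upperLeft);

    glEnable(target);
    glBindTexture(target, name);
    glBegin(GL_QUADS);
        glTexCoord2fv(lowerLeft);   glVertex2f(-1.0f, -1.0f);
        glTexCoord2fv(lowerRight);  glVertex2f( 1.0f, -1.0f);
        glTexCoord2fv(upperRight);  glVertex2f( 1.0f,  1.0f);
        glTexCoord2fv(upperLeft);   glVertex2f(-1.0f,  1.0f);
    glEnd();
    glDisable(target);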
So timing, this is where the Core Video display link comes into play. The display link is an object that helps you drive that visual context pull model, where it's giving you the callbacks to say, if you start drawing right now and you hurry up, your pixels will get on the screen in exactly 25 milliseconds. So it's going to give you that time stamp that you pass to the visual context to query if there are new textures available, and then to get that texture out. And it does this on a dedicated I/O thread.
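A minimal sketch of that setup, in an Objective-C file that imports QuartzCore; the callback body is just a placeholder for whatever drives your visual context.

    #import <QuartzCore/QuartzCore.h>

    static CVDisplayLinkRef gDisplayLink = NULL;

    static CVReturn MyDisplayLinkCallback(CVDisplayLinkRef displayLink,
                                          const CVTimeStamp *inNow,
                                          const CVTimeStamp *inOutputTime,
                                          CVOptionFlags flagsIn, CVOptionFlags *flagsOut,
                                          void *userInfo)
    {
        // hand inOutputTime to the code that asks the visual context for a frame
        return kCVReturnSuccess;
    }

    static void StartDisplayLink(void *userInfo)
    {
        CVDisplayLinkCreateWithActiveCGDisplays(&gDisplayLink);
        CVDisplayLinkSetOutputCallback(gDisplayLink, MyDisplayLinkCallback, userInfo);
        CVDisplayLinkStart(gDisplayLink);    // callbacks now arrive once per display refresh
    }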
And because a computer can support multiple displays, you'll need to keep that display link synchronized with-- with the current display that the video is being played on. So there's a call to do that. So once you have these buffers, you need to render them. Well, there's two obvious choices off the bat. You have Core Image and you have OpenGL. With Core Image, it might not be accelerated. I think not be available is the wrong word. Might not be accelerated. And since we're talking about live video, maybe that's the same thing.
And with Core Image you get built-in color matching. So all that ColorSync work, that's basically free if you render with Core Image. And effects are downright trivial. It's easy. You can do it in one line of code. A long line, but one line of code can take that image and apply a crazy filter to it.
And it supports all the usual video transformations and compositions and things like that. So that's the easy way. You can also use OpenGL. Now if you go this route, you can have broader hardware support. You don't need to worry whether the graphics card has fragment programmability. You can render without that. And it could be more efficient if you know exactly what you need to render and how you want to render it. Unfortunately, it's going to be a bit more code to write than using Core Image.
So color management. Many of you are probably familiar with gamma. And dealing with gamma in video has been an issue for a long time. And that's just a small piece of the color management puzzle. There are more aspects to images and video than just the gamma curve. There's like the chromaticities of red, green, and blue.
The red on your display might be different from the red that the camera recorded. So color management is how we solve this issue. And applications have been describing in the movie file the characteristics of this video for a while now. And there's a document we have that describes how you can put this information in your video if you're capturing video.
When you get the frames out of the Visual Context, they're going to have a color profile attached to them. And this color profile describes that source video, its color characteristics. And when you render that video using Core Image, it'll have some color profiles that describe the destination and a working color space. So the color is going to be transformed for you.
The visual context has a couple of APIs for describing how you want these colors to be managed. It's very analogous to the Core Image model, where you have a working color space in which the pixels are manipulated and an output color space to which everything is eventually matched for the display. That's typically going to be the calibrated display profile for your monitor. And again, very little code is required for this. You just need to get the display profile, give it to Core Image, and then just draw the buffers and they're color managed for you.
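As a rough, hedged illustration of the Core Image side of that (generic RGB is only a stand-in here; a real application would pass the calibrated profile of the display it actually draws to):

    CGColorSpaceRef workingSpace = CGColorSpaceCreateWithName(kCGColorSpaceGenericRGB);
    CGColorSpaceRef outputSpace  = CGColorSpaceCreateWithName(kCGColorSpaceGenericRGB);

    NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
        (id)workingSpace, kCIContextWorkingColorSpace,
        (id)outputSpace,  kCIContextOutputColorSpace,
        nil];
    CIContext *ciContext = [CIContext contextWithCGLContext:cglContext
                                                pixelFormat:cglPixelFormat
                                                    options:options];
    CGColorSpaceRelease(workingSpace);
    CGColorSpaceRelease(outputSpace);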
So I'm going to be showing you some sample code. Now this is really sort of a best practices example of how to render using the new video pipeline. And it uses QTKit, but not for rendering. It uses the QTMovie class to manage that movie, to call MoviesTask when necessary, and deal with opening the movie and all that kind of stuff.
And it does the rendering in one of three ways. It uses OpenGL directly and if Core Image is accelerated, it'll use that to get the color management. And if the hardware does not support Quartz Extreme, we're going to fall back to using GWorlds. So let's walk through the code for a little bit. Demo 1 please.
So this example has been recently posted to the ADC or WWDC website, so you can follow along if you have it. So I'm going to walk through the different stages of the pipeline that I talked about earlier. The first one being movie creation. Now I'm using QTKit so this is trivial. I just call the initWithFile: routine and QTKit creates that using that NewMovieFromProperties call under the hood and it's all easy. Then here I tell the movie to start looping.
Okay, so that's movie creation. Now, creating the visual context. There's basically three steps here. The first is to create OpenGL objects. So we use the NSOpenGLPixelFormat and the NSOpenGLContext to do that. And from those, I get the underlying core OpenGL objects. See the CGLContext and the CGLPixelFormat. I stash those away because I use them quite frequently in the code.
Then I create the display link and the visual context. You see here I pass the CGL context and CGL pixel format to that visual context creation call. So now I have the display link for the timing and I have the visual context so I can pull buffers out. Well, what does that callback look like? Sorry, before that, we need to connect it to the movie. So my view class will provide the visual context that it created to the controller object here. And the controller gets the movie from the document and simply calls SetMovieVisualContext.
And that's it. You see all the other boilerplate code here for falling back to GWorlds. You should look at that if you're interested in how to write a robust video rendering app here. Okay, so at some point you need to kick off that display link to get it started so it's going to start calling you every display refresh time.
The lockFocus call is a good place to hook in to do that. This is called right before the view is about to start rendering. So we'll make sure, if we haven't been connected yet, let's do so. So we connect the OpenGL context, get the display link set up, provide it our callback function, get it updated to the current display that our view is rendering to, and then call start. And once we call start, it's going to begin giving us those callbacks every, well, on this machine, 60 times a second because I have this LCD here.
All right. Now in that callback is where we do the meat of our rendering. And here's the little algorithm that we use to drive the visual context. The first thing we do is we ask the visual context if a new image is available. And here we just pass it the time stamp that we got from that display link callback.
And when that returns true, we move on, release any old buffers that we had pulled out of the visual context, and then get a new one out of it with QTVisualContextCopyImageForTime, passing it that same output time stamp. And that gives us a CVOpenGLTexture object. And I'll stash that away in my class and then move on to the rendering step.
So, rendering. I'm not going to get into details here because Frank later will have some more tutorials that will show you different ways of rendering. So I want to call out a few details here. This is the render function. And you'll notice here that I check to see that our texture is not null. Now, this is subtle, but a visual context may return a success code when you call QTVisualContextCopyImageForTime, but it'll give you a null pointer.
This is completely valid and you had better expect it because that just means there is no video for that time. Maybe there was video and then to get you to stop rendering it, it has to tell you, okay, now start rendering nothing. So you have to be prepared for that. And in this case, when we encounter that, I just clear the buffer using OpenGL.
At the end, when we're done drawing, we call the flush buffer method to get that data onto the screen. And then I call the task routine. And I do it here because we still have our OpenGL locks taken, so no other threads are going to be messing with OpenGL at this point.
And it's a good time to do this housekeeping work. So that's pretty much it. Those are the important points of this application. There's a lot more code in here dealing with resizing and all that fun stuff. But it's a very simple application. Just renders the video. And that's it. Back to slides.
So there's a few more uses of visual context that I'd like to go over real quick. The Image Compression Manager can drive visual context also. And you don't have to necessarily connect the visual context to a movie. You can connect it to an ICM decompression session. Now this is what I did in the jigsaw puzzle where I had the live video going.
I had a sequence grabber. I was pulling frames down. And I was feeding them through the decompression session, which itself was connected to that visual context. So the rendering code had no idea where the video was coming from. It was just doing the visual context work. And I connected it through this API to that decompression session.
Oh, and there's a session after this, you know, 2 o'clock, on using those new APIs and decompression sessions and so forth. So, CVPixelBuffer. This is another Core Video buffer that I didn't talk about earlier; I was just focusing on the OpenGL texture for the moment. But this is where you have memory-based pixel data. So this is equivalent to having like a pixmap in the previous rendering model.
So now it's a reference-counted buffer object. And it's used heavily internally by QuickTime for all of the decompression session work. And it's really the foundation for our multi-buffer ICM APIs and allows us to have, you know, one pixel buffer being mapped to the hardware while we're decompressing into another pixel buffer elsewhere.
Sometimes you may want to get these out of the movie instead of OpenGL textures. Maybe you just want to do, you know, CPU processing on those buffers. Well, you can call QTPixelBufferContextCreate and it will also give you a visual context. But this one will not give you OpenGL textures. It will give you CVPixelBuffer objects.
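A hedged sketch of creating such a context with specific attributes; the 640 by 480 size and the '2vuy' Y'CbCr pixel format are arbitrary choices for the example.

    static QTVisualContextRef CreatePixelBufferContext(void)
    {
        SInt32 width = 640, height = 480;
        SInt32 pixelFormat = kCVPixelFormatType_422YpCbCr8;    // '2vuy'

        CFNumberRef widthNum  = CFNumberCreate(NULL, kCFNumberSInt32Type, &width);
        CFNumberRef heightNum = CFNumberCreate(NULL, kCFNumberSInt32Type, &height);
        CFNumberRef formatNum = CFNumberCreate(NULL, kCFNumberSInt32Type, &pixelFormat);

        const void *pbKeys[]   = { kCVPixelBufferWidthKey, kCVPixelBufferHeightKey,
                                   kCVPixelBufferPixelFormatTypeKey };
        const void *pbValues[] = { widthNum, heightNum, formatNum };
        CFDictionaryRef pixelBufferAttributes =
            CFDictionaryCreate(NULL, pbKeys, pbValues, 3,
                               &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

        const void *ctxKeys[]   = { kQTVisualContextPixelBufferAttributesKey };
        const void *ctxValues[] = { pixelBufferAttributes };
        CFDictionaryRef attributes =
            CFDictionaryCreate(NULL, ctxKeys, ctxValues, 1,
                               &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

        QTVisualContextRef pixelBufferContext = NULL;
        QTPixelBufferContextCreate(kCFAllocatorDefault, attributes, &pixelBufferContext);

        CFRelease(attributes);
        CFRelease(pixelBufferAttributes);
        CFRelease(widthNum); CFRelease(heightNum); CFRelease(formatNum);
        return pixelBufferContext;    // NULL on failure
    }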
And from there you can get the base address and the pixel format and do whatever you want. And it's useful for things that don't need the GPU to work. Maybe you're doing offline rendering or you're just extracting video. Now, to this API you can pass options to say, "I want specific pixel formats," or "I want a specific dimension for this pixel buffer," and it'll work. If you don't pass any options here, you'll get whatever the native size is of the movie video. And with that, I want to hand it over to Frank who's going to talk more about rendering and using Core Image.
Thank you, Sean. Welcome to sunny California. That worked better in the rehearsal. Okay, we are talking a little bit now about Core Image. My name is Frank Doepke and I'm actually showing you some of the rendering effects that Sean already introduced a little bit. You've seen the pipeline, same slide as we had already last year, and we're going on to some of the more details. So let's start first with the core image part of it.
So what does Core Image bring us as an addition to our playback scenario? So we get over a hundred effects right out of the box with Tiger, and already developers are writing additional image units which provide us with even more filters. We get very fast processing of these filters right on the GPU. So you can do fancy effects and it doesn't tax your CPU.
What you need to learn is basically how we get from the CVImageBuffers that we get from QuickTime through Core Video and put them into a CIImage. So I will show you that part. And the same is true for how we get the CIContext, which is our rendering target for Core Image.
One thing that is important to know here is that you can chain these filters together. So right now we've just seen, okay, we use one filter for the color correction, but I can do multiple effects and then put them together into one pipeline. And you want to chain them together because these chains can fold together in one pixel program.
So if you haven't been at the Core Image session yesterday, you will actually see that there is an advantage in Core Image built in that it lazily evaluates these filters and therefore computes a very small program out of a longer filter chain to make the rendering more efficient.
So the sample code that we want to look at is the CIVideoDemoGL. I call it episode two because it's actually a slightly different version than you've seen already on your Tiger disk. So we show the integration of Core Image going through Core Video and OpenGL and QuickTime.
We actually look at how we create the CIContext from our NSOpenGLContext. And we use the imageWithCVImageBuffer: API to really get our frames from QuickTime into Core Image. Last year, for those who have been here, we showed mostly the use of imageWithTexture:. This is still a valid API call, but I would like you not to use it for this case as it has some drawbacks. I can go into technical details in the lab if you want to stop by.
So also show you how to create a filter pipeline as already mentioned. We'll use multiple filters on that video and then we also use some compositing effects to draw on top of our video and I will show you how to do that as well. So with this I would like to go to the demo machine.
So here we have our CIVideoDemoGL. And before I actually show too much of the code, I will actually show you what this application really can do. I'll run it in just a second. Okay. So here I have some video. It's not running at this moment.
I can scrub through it. It's a very short clip. You will see that clip quite a bit more in the afternoon as well. And I have a zoom blur effect which I can move around. As you can see right here and I can set how much of a zoom blur I want and I can even take it completely out.
So this is kind of my zoom blur. And I can do this while the clip is playing. You can see this. And I change the zoom blur. This is all running live. Same as I can do with-- we showed that we can do some color correction as well. So if I say, well, I want this a little bit brighter or darker, saturation nicely in. And now we have a very, well, psychedelic scene here. So this is, in short, what this application can do. Now let's have a look at the code, actually, how we did this.
First part, as I said, okay, we need to create a CIContext. And I do this in the prepareOpenGL method of my view. So when you look at it, we again set up our display link. And we need to look at basically the number of displays that we have, set our display link up, and I set up my output callback. So I'm ready for rendering on this part now. The second part that I'm going to need to look at is... Actually, sorry, I need to go a little bit more on top here.
How do I set up the context for the CI drawing itself? So I have my CGL context and I create a CIContext from it. I set up the color space, which is important for the rendering part, as we mentioned already that CI will do the color correction for you.
And then I set up a bunch of filters. So I use the CIColorControls filter, which is the one that does all the hue, saturation, brightness control, and I use the CIZoomBlur, which was the fancy effect that I put on top of it. Now, I don't set all the values on the zoom blur right here. So what I'm doing actually is I use setDefaults to make sure that all the parameters of my filter are at least initialized to something sensible.
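A sketch of that filter setup: one color-correction filter, one effect filter, with setDefaults making sure every input starts with something sensible (the initial amount and center values here are arbitrary).

    CIFilter *colorCorrectionFilter = [[CIFilter filterWithName:@"CIColorControls"] retain];
    [colorCorrectionFilter setDefaults];

    CIFilter *effectFilter = [[CIFilter filterWithName:@"CIZoomBlur"] retain];
    [effectFilter setDefaults];
    [effectFilter setValue:[NSNumber numberWithFloat:10.0f] forKey:@"inputAmount"];
    [effectFilter setValue:[CIVector vectorWithX:320 Y:240] forKey:@"inputCenter"];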
So that is now everything set up. How do we do the rendering part of the filter? Well, that is very simple. The callback that actually like Sean already showed that actually now will render these frames coming from QuickTime will simply go in... Oops, this is actually in the wrong space.
It provides me with a CVImageBuffer and I create a CIImage from it by simply using imageWithCVImageBuffer:. So now from this buffer, I get my rectangle. So this is really the frame size coming out of QuickTime that I want to use. Then I run my color correction on it. And now the output of this color correction I use as the input to my effect filter, which was the zoom blur.
And then I draw, last but not least, my image to the screen. And you can see that I'm using here the image rectangle that I got out before I applied the filter. So with the zoom blur, you can imagine this actually makes the image much bigger because it zooms outside. But I really only want to render the original frame size. So that's why I had to get the actual extent from the original frame and then render really just using this as my source rectangle.
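Sketched out, that rendering step looks roughly like this, using the filters created above and currentFrame as the buffer handed in by the callback:

    CIImage *inputImage = [CIImage imageWithCVImageBuffer:currentFrame];
    CGRect   frameRect  = [inputImage extent];             // the frame size coming out of QuickTime

    [colorCorrectionFilter setValue:inputImage forKey:@"inputImage"];
    CIImage *correctedImage = [colorCorrectionFilter valueForKey:@"outputImage"];

    [effectFilter setValue:correctedImage forKey:@"inputImage"];
    CIImage *outputImage = [effectFilter valueForKey:@"outputImage"];

    [ciContext drawImage:outputImage
                 atPoint:CGPointMake(0.0f, 0.0f)
                fromRect:frameRect];    // lazy evaluation folds the chained filters into one GPU program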
The other part that I can show you here is how I actually change my filter parameters. So I simply have an event handler, and while I'm playing back, again I have to make sure that I'm actually locking my OpenGL context here, I can change the parameter of my filter. So this is the effect filter. This is how I'm doing the zoom part and moving it around with the mouse. It's simply changing the input center, as that is my target.
The next part for that, actually I want to go quickly back into like running the application, is like actually how do I get my stuff back? So we've seen how we render and then can bring everything back to the screen. But we often get the question, OK, screen is not the only target. I want to get stuff back from my graphics. So for this, let's go quickly back to slides and I'll talk a little bit about that.
Can we go to slides, please? So how do I get the stuff back from the GPU? There are two ways that I want to talk about. The first part is when I'm using Core Image, I can ask the CI context, the Core Image context, to bring back a CG image. And this is good so I can write the CG image out to disk or just render it as a thumbnail and some views that I need in my application. So this is very easy and it's good if you just want to use CG image.
If you want to read back what's on the GPU, now let's say I do actually some stuff on top of my Core Image rendering and I'm using OpenGL on top of it. So there's a way how I can read back these pixels from the graphics card and do something with it. So that's what GL Read Pixels is doing.
Now, for those who are a little bit more familiar with OpenGL, they'll say, oh, glReadPixels, that's old stuff. There's a more advanced way of doing it. It's just a little bit more complicated. And for those who are interested, please go to the Maximizing OpenGL Performance session that is on Thursday at 2 o'clock.
And there you can see some fancier ways to read back pixels from the GPU. So we'll go back into our CIVideoDemoGL and we'll have a look at how we can use Core Image together with Image I/O to write the stuff out to a file.
And when we just look at the OpenGL context, we'll actually use the glReadPixels part, read the pixels from the graphics card, and then we use MovieExportFromProcedures to export what we've just rendered into a QuickTime movie. And with this, I would like to go back to the demo machine. So let me pick a little nicer frame here. OK. Move my zoom blur. And now what I can do is simply save this frame. And let me put this on the desktop.
"The main point here is my frame. I can open it up in preview. Now you see this is actually the full size of the frame and I will show you in code why this looks so much different. But before I do that, I'll also show you that we can actually export this movie. I have this little export button here. Now I'm creating a movie. I want to prepare for internet and instead of H264, let me just sort of time constrain use actually DV.
So what I'm doing here is actually I'm rendering the movie on the screen and output it out into a QuickTime movie. And there it's open. This is now in the QuickTime player. And I can play back my movie. It has even some audio with it. Okay, let's have a look back into the code of this. Okay.
So as I promised, first, it's very easy to just get the CGImage out of it. I have my save panel, where I'm simply asking the user, OK, where do you want to store your image? And then I have a URL. And now what I can create is a CGImageDestination. And with CGImageDestinationCreateWithURL, I'm telling it I actually want to use a JPEG here. That is my target file format.
When I have this destination, all I need to do is, on the CIContext that I have, I ask it, give me a CGImage from my effect filter pipeline, which I have right here. And I have this image, which I simply add now to my destination. I need to finalize it. And that writes it out as a file. So it's very simple. I have now a JPEG file on disk.
The part that is slightly different here is, you can see that I'm really using the extent of the final image. In comparison to before, where I actually used the extent of my buffer that I'm getting from QuickTime, I'm now actually using the extent of the final rendering. That's why you saw how we have this zoom blur, the really big image coming out of it.
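A sketch of those few calls; fileURL stands in for the URL that came back from the save panel, and "public.jpeg" is the JPEG type identifier handed to Image I/O.

    CIImage *outputImage = [effectFilter valueForKey:@"outputImage"];
    CGRect   finalRect   = [outputImage extent];                   // extent of the final, filtered image
    CGImageRef cgImage   = [ciContext createCGImage:outputImage fromRect:finalRect];

    CGImageDestinationRef destination =
        CGImageDestinationCreateWithURL((CFURLRef)fileURL, CFSTR("public.jpeg"), 1, NULL);
    if (destination != NULL) {
        CGImageDestinationAddImage(destination, cgImage, NULL);
        CGImageDestinationFinalize(destination);                   // writes the JPEG to disk
        CFRelease(destination);
    }
    CGImageRelease(cgImage);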
The next part is, okay, how did we do the movie export? So for this, we need to set up our export first. So I create a QuickTime export component and there will be more in other sessions. I'm going a little bit fast on this now. We set up the component and the crucial part here is actually that we set up two callbacks for the video track that allows me to do the rendering part.
For the audio that we have in here, I simply pass through the audio. So I'm simply telling QuickTime, OK, what is your standard behavior of passing through the audio? And that is what I'm doing right here, since I'm just talking about image parts and not audio. Let the user change the settings. That picks the compressor. And now, I already mentioned that I'm rendering on screen. This is probably not a good practice in your applications. First of all, you normally don't want to really show what you're compressing on screen and it has a slight downside.
The coordinate system that we use in OpenGL is exactly flipped compared to the one that we actually use in QuickTime. So what I have to do is I have to read back the pixels and then flip them so that I'm not rendering my movie upside down. So I need two buffers. I have hard coded here 720 by 480 just because I'm a little bit lazy. And then I have my first context in which I'm actually reading back, and then I'm flipping my pixels so that I actually have them in the QuickTime format.
Then I need to set up my image description so that QuickTime actually knows what I'm really rendering here. And then I call MovieExportFromProcedures. So it will now call me in a tight loop for every frame that it wants to render. So let's look at how this frame callback looks.
So it starts always by calling me for each single frame. And so this is running in a tight loop. The part that's important is we're dealing with Core Image; it's using Objective-C code. And so we definitely have a need for creating an autorelease pool. As we are doing this in a tight loop, normally you would just release your autorelease pool at the end of an event loop, which is not efficient enough here because you do this over a long period of time for multiple frames, and for each frame you will accumulate some autoreleased objects.
So you will chew up your memory, and you don't want to do that. So it's important that you create an autorelease pool around the rendering that you do here in Core Image. So that's why I set one up, and at the end of this call I simply release this autorelease pool.
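In outline, each pass through the frame callback is wrapped like this:

    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    // ... set the movie time, render the frame with Core Image, read the pixels back ...

    [pool release];    // this frame's autoreleased objects go away now, not at the end of the export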
I'm now using the time that gets passed in to me and set the movie to this time. I make sure that it really renders and then I really render my frame onto the screen. As I said, that's probably not the best behavior, but it makes it a little bit easier to show in this demo. I render everything, and now all I need to do is read it back. I'll look into the readback part in just a moment.
Now I have my pixels coming back from the graphics card and as I mentioned, I have to flip them and that's what I'm just doing in this section here. It's just a very simple flipping code. And I can return my rendered pixels into the parameters and that's all I need to do actually in my movie callback to do the rendering part. So let's have a look at the readback part.
Very scary OpenGL code. It's just a few lines. Make sure that we keep our state before. Then we set up our memory layout, and all we need to call is glReadPixels. This reads it back from the graphics card synchronously. And then we clean up and we are done.
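Roughly, the readback amounts to this sketch; pixels is assumed to be a pre-allocated buffer of at least width * height * 4 bytes.

    glPushClientAttrib(GL_CLIENT_PIXEL_STORE_BIT);     // save the current pack state
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glPixelStorei(GL_PACK_ROW_LENGTH, width);
    glReadPixels(0, 0, width, height,
                 GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, // a fast path for Mac graphics hardware
                 pixels);                              // synchronous: returns when the GPU is done
    glPopClientAttrib();                               // restore the pack state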
So we have our frame coming back from the graphics card. And that simply exports the whole movie. Now there's one part that I actually want to do a little bit here to show you how easy it is actually to deal with Core Image. And I have my little cheat sheet here.
The version that you already have on your disks shows actually a little bit more. It shows already a time code on top of your video. So let me show you actually how I simply can add this. So all that I need to do is I need a composite filter. This is my very first step. And let me add this to my video view.
[Transcript missing]
And release my filter. Let me do this right here. And the last part is now in the rendering part, I simply need to chain this into my current rendering part. So I'm doing this here, and I go back to my filter rendering. And at the very end of it, before I actually draw, I will add my filter.
So I'm using the composite filter. The output of the effect filter is now my real background image. And then on top of it, I put a timecode image. And I have already prepared that a little bit up front. So we have this timecode overlay from which I'm getting another CI image. And so this is simply put on top of it now.
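A sketch of that chaining with CISourceOverCompositing; timecodeImage stands in for the CIImage coming from the sample's timecode overlay object.

    CIFilter *compositeFilter = [[CIFilter filterWithName:@"CISourceOverCompositing"] retain];
    [compositeFilter setDefaults];

    // the effect output becomes the background, the timecode image goes on top
    [compositeFilter setValue:[effectFilter valueForKey:@"outputImage"]
                       forKey:@"inputBackgroundImage"];
    [compositeFilter setValue:timecodeImage forKey:@"inputImage"];

    [ciContext drawImage:[compositeFilter valueForKey:@"outputImage"]
                 atPoint:CGPointMake(0.0f, 0.0f)
                fromRect:frameRect];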
Now, instead of using the output of my effect filter, I'm using my composite filter here and draw that image. Now, let's hope that everything works fine and let's run this. It doesn't want to save. Okay. And there you can see I have now a timecode track right on top of my screen. And this is how simple I can render with Core Image. OK, let's go back to slides, please. Thank you.
Now we talk about OpenGL as the other way of rendering video onto the screen. The first part is, okay, where we are in our pipeline: we've seen the Core Image part and now we talk about OpenGL. So we can do transformations and also the rendering part. First thing when we talk about OpenGL is shop safety. Wear your thread safety glasses.
OpenGL contexts are not reentrant safe, so you have to deal with thread safety here. Now I'm just giving you three examples of how to actually do this. You can use pthread locks, which I'm doing in some of the sample code already. And you can also see that we can use a shared OpenGL context to get around some of the thread safety issues. And as you saw already in Sean's demo, there's a new API in Tiger which I really recommend you use also. It's CGLLockContext and CGLUnlockContext. Those are some of the calls that you can use to make your context thread safe.
You also need to do this whenever you talk to the visual context, the QTOpenGLTextureContext. Although you can actually use QTVisualContextIsNewImageAvailable outside of this lock, because that one is actually thread safe. So you can check outside of your locking if you actually have a new frame available.
And if you use AppKit and you use the NSOpenGLView, there's one thing that you need to keep in mind. There's an update call which you need to subclass, and you need to make sure that you put locks before and after you call super's update, as there's some OpenGL code being executed within AppKit in this part.
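The override is small; a sketch, assuming a subclass of NSOpenGLView:

    - (void)update
    {
        CGLContextObj ctx = (CGLContextObj)[[self openGLContext] CGLContextObj];
        CGLLockContext(ctx);
        [super update];            // AppKit runs OpenGL code in here
        CGLUnlockContext(ctx);
    }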
Now even more about thread safety: there's a very nice tool called the OpenGL Profiler, and with the WWDC disk that you got, there's a new version of it with the developer tools which really works great and helps you a lot, because you can set breakpoints on thread safety issues, which allows you to catch the thread safety problems before your machine goes downhill. And you can see more about this also, again, in the Maximizing OpenGL Performance session.
So we look again now at our live video mixer. Again, the same stuff as we've seen last year, but slightly refined. First part that we did, so we updated everything to the latest APIs that we are really shipping in Tiger. And we've also improved a little bit our timings that the movies really run in sync.
Because we really play three movies into one surface, and the movies you will see is like one video shot from different camera angles. And that's what we will render in OpenGL. And we simply use OpenGL here, no Core Image, just to do all the rendering effects and the compositing.
There's something new in the sample code that I will show you, which is not available to you yet, but will be available soon. We use the AVC Video Services to read back the frames and put them out on FireWire. So we again use glReadPixels and then we compress them using QuickTime and use the DV FireWire output. And this would work with any kind of rendering, even the Core Image part. The session for this was actually yesterday.
So you wish you knew that already earlier, but you can still go back on the slides and get the information when you look up for session 504 and get the information what happens there with the new FireWire SDK. And with this, I would like to go back to the demo machine. Okay, let me show you the live video mixer first, what it does. So I'm opening up three movies.
You can play them back. I'd like to thank Ralph here in the audience for the great pool play that we have here. So we can now mix this on top of each other with semi-transparency, this channel, I can use this channel. And we can use some funny shapes. I can position this movie up in this corner over here.
This one I'm going to run into a bubble and position this into the other corner. Let's say the background I'm just running through some fancy brush strokes. So this is simply now running with OpenGL and QuickTime and the new part for it and therefore we have this elaborate two camera setup here.
I'm now simply having one FireWire camera here which I'm actually running my output to. And since this screen is a little bit small, I'm trying now to show you this in my QuickTime movie recorder. Hopefully it should show anything. It does not want to do this now. Bear with me. We'll try this once more.
[Transcript missing]
It's now the small display here of my camera. You can see that's where actually what I'm pointing at. Make this a little bit better. And now I can really move my video around and change this back like, okay, I don't want this brush stroke in the background.
This is running on the full frame part. I can change this to be a star. And you can see up here in my camera display that I'm actually really playing this back out on the FireWire. So let's have a look at how we did this in code.
So the key part that I just want to point out here is how I really do the rendering in OpenGL. And this actually reflects back a little bit to what we saw earlier with the jigsaw puzzle, where we use something which is called multi-texturing, where I actually use one texture as a mask to really render my video.
So you saw that we actually have the capability of running videos through shapes, or, as with the main part, I can also do it without a shape, just running it plain. So what I'm using here is actually I'm setting up my blending in OpenGL. I'm getting my textures.
And then all I'm using is actually the color, and you can see the opacity part. That is giving me the capability of running my video with a different opacity. And all I have to do is now render this texture on a quad, the rectangle that I'm actually rendering out.
When I want to use any of the shapes, I need to set up the multi-texturing. So I'm loading one texture as the first texture target, and the second texture as the second texture target. The first texture is actually my shape, which is just a grayscale image that I'm using as an alpha mask. And those two together get rendered, then you can see slightly more code.
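A hedged sketch of the shape of those calls: the texture names, sizes, quad coordinates, and opacity are assumed variables, and the texture environment (GL_COMBINE) setup that actually multiplies the mask into the alpha channel is left out to keep it short.

    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    glActiveTexture(GL_TEXTURE0);                         // unit 0: the grayscale shape mask
    glEnable(GL_TEXTURE_RECTANGLE_EXT);
    glBindTexture(GL_TEXTURE_RECTANGLE_EXT, maskTexture);

    glActiveTexture(GL_TEXTURE1);                         // unit 1: the video frame
    glEnable(videoTarget);
    glBindTexture(videoTarget, videoTexture);

    glColor4f(1.0f, 1.0f, 1.0f, opacity);                 // per-layer opacity for the mix

    glBegin(GL_QUADS);
        glMultiTexCoord2f(GL_TEXTURE0, 0.0f,  0.0f);
        glMultiTexCoord2f(GL_TEXTURE1, 0.0f,  0.0f);
        glVertex2f(x0, y0);

        glMultiTexCoord2f(GL_TEXTURE0, maskW, 0.0f);
        glMultiTexCoord2f(GL_TEXTURE1, vidW,  0.0f);
        glVertex2f(x1, y0);

        glMultiTexCoord2f(GL_TEXTURE0, maskW, maskH);
        glMultiTexCoord2f(GL_TEXTURE1, vidW,  vidH);
        glVertex2f(x1, y1);

        glMultiTexCoord2f(GL_TEXTURE0, 0.0f,  maskH);
        glMultiTexCoord2f(GL_TEXTURE1, 0.0f,  vidH);
        glVertex2f(x0, y1);
    glEnd();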
[Transcript missing]
Now the next part that I needed to do for the speech bubble was, OK, I need something on top of it. This was the white outline. This is the shape here. So I'm again using this, the texture for that, which is just a regular RGB texture, and render this on one quad on top of each other. And OpenGL is compositing all these steps on top of each other when I'm rendering it out to the graphics card. And that is actually how I do the rendering in OpenGL. So I will go back to slides, please.
So what did we use? We used GL blending together with alpha to get the opacity effect that we had in the video. Then we used the masking with multi-textures. That is actually giving us the capability of stamping out video. And then we look at the FireWire part. I haven't showed you much in code yet because my sample code is not quite ready for you yet.
But we use again glReadPixels, the same way as I've done in the movie export, to read back from the graphics card. I have to make sure that my coordinate system is the right way around. Then I compress it into DV by using QuickTime, just as a simple codec for this. And then I can simply use the AVC Video Services to render this out through FireWire. And the sample code will hopefully be available soon for you so you can all see which kind of tricks we used.
And now comes the part where hopefully nothing gets thrown at me. There are problems and I need to talk about those as well. There are certain hardware limitations that we have to deal with when we talk about OpenGL and Core Image. First of all, when you talk about OpenGL, we have to have Quartz Extreme capable graphics hardware on all displays. So if you found in your attic an old Rage 128 PCI card, that takes the Quartz Extreme capability, and with it this new QuickTime video path, simply away from you; you can't use those two together.
This is a limitation that we have to deal with. The other part is different graphics cards can support different sizes of actually the drawable surface and even like the texture size. So if you have some special movies that are really, really big, you might actually have to resize those down into the limitations that you have to deal with the graphics card.
This will not hit most of you, because actually the graphics hardware that we have in our machines is pretty good, so you will not necessarily run into this, but it can happen. And when we talk about Core Image, Core Image is always available for you on Tiger. Actually, the live video mixer that we saw will run on Panther as well with QuickTime 7. But Core Image, always available on Tiger, will, depending on the graphics hardware, run on the GPU or on the CPU.
And for a lot of the filter effects, on the CPU you will not get the real-time performance as we've seen here. And again, I would like to point out thread safety. With OpenGL, it's really an important issue that you look out for. Make sure that all your access to OpenGL is locked when you render from multiple threads, as we do here.
So we've covered quite a bit today, and so let's just recap a little bit what we really have as an important message to you here. So the first part, we have our new rendering architecture where QuickTime sits on top of our Quartz layers, Core Graphics, Core Image, Core Video, and then we render through OpenGL into the graphics hardware.
We have the new video pipeline where you can simply use the QTMovieView or the HIMovieView as the easiest step when you just want to render out video to the graphics hardware. Or if you want to do your own pipeline effects, you simply can use the Core Image and also the OpenGL part to customize your frame pipeline.
More information: you definitely find more on our website. There's normally a lot of documentation there, sample code and all the other resources that were already pointed out. Sessions. Well, unfortunately we don't have a time machine. The FireWire session was yesterday, but you can still find the information.
Then I would like to point out, this afternoon, Sam Bushell's Advanced Video Formats session, always very entertaining, also very informative, where you see more about the export and also the sequence grabber part. Then we look into the color imaging part; the ColorSync session is also on Wednesday afternoon. And the Maximizing OpenGL Performance part is a lab and also a session. Feedback on Friday. You can talk to us and we will hopefully also listen to you.
And you can find us in the lab. We have a graphics and media lab and you can just go from here down the hall, take a turn and there we are. Sample code: you've seen already that we introduced the video viewer, which was Sean's sample today. We have the CIVideoDemoGL. Watch out, this is a newer version than the one you have; it's on the WWDC distribution.