Graphics and Imaging • 1:00:51
QuickTime is the cornerstone framework for digital video applications on Mac OS X using OpenGL, visual context rendering, Core Video, and Core Image. Learn expert techniques relating to video application development. Deepen your understanding of the QuickTime rendering pipeline, the difference between visual context rendering and older GWorld-based rendering, and how QuickTime deals with color and gamma. Learn about clean aperture, pixel aspect ratio, media tagging, and QuickTime pixel formats. A must-attend session for all developers working with digital video on Mac OS X.
Speakers: Jean-Michel Berthoud, Ken Greenebaum, David Black
Unlisted on Apple Developer site
Transcript
This transcript has potential transcription errors. We are working on an improved version.
Good afternoon everybody. You are about to listen to Session 409, called Mastering QuickTime Digital Video Techniques. So what is this session all about? Actually, instead of the usual new APIs and flashy demos, we are stepping back here and trying to give you the concepts and key elements you need to know in order to write a professional video application.
So we'll talk about three things. First of all, what defines a professional video application. Then a couple of QuickTime concepts and the vocabulary you need to understand this technology. And the last one is how to make these two technologies happy together. But I am not somebody who writes applications, and certainly not pro applications. So I have somebody who is going to come up on stage to tell us what a professional video application is all about. So let's welcome David Black, Senior Architect, Pro Applications.
David?
( Applause )
I'd like to set up some of the topics that Ken and Jean-Michel are going to talk about today, and really talk about what a pro application is in the context of media, as opposed to maybe a consumer application. And really, we're going to come up with a base definition of a pro application: it is a tool that is designed for production professionals.
You know, it can be a really big complex application, it can be a small specialized tool. You know, in some cases it will do just one operation, in other cases it's a scriptable environment to do lots of things. But really, the key concept here is that it's built for creative people who aren't necessarily computer people. So there will be logic there to handle all the graphics operations and creative looks they want to build. But without, necessarily, the complexity that an engineer will understand. And another key concept here, especially when it comes to larger customers, is that it's reliable and consistent.
Some pro customer expectations. We touched on this a little bit before. We want a tool that's usable. They can learn it. They can get data in and out of it. It's reliable. It doesn't crash a whole lot. Or at least it's a little bit predictable. And it doesn't necessarily consume extreme amounts of time.
If you have a tool that will render at -- you know, one frame takes four minutes to render, you're certainly going to limit yourself on who's going to use that tool. And the tools are also repeatable. This comes back to reliability: if I run media through a tool multiple times with the same parameters, I will get the same results.
Just for a minute, you know, I talked about pro customers. And the pro space in many ways is kind of split in two these days. You sort of have the high end -- you know, the production houses, the film studios -- and sort of the prosumers, or the indies.
Right? These are people who are buying a lot more pro applications in volume than the high end guys. And it's certainly worth thinking about what kind of tools they need and what their expectations are. On the one hand, they expect everything the pro people want, or the high end pro people want.
But they have the advantage that they will trade some convenience for cost. If it takes a long time to render their film, it's okay, because they don't necessarily have the budgets of the big guys. And you would think in some ways this would mean that they're more forgiving of flaws.
But they're actually not, in many ways. They are probably the more difficult customer to code for, because they're not necessarily going to spend a lot of money. And whereas the big guys will take a lot of time to evaluate a piece of software and confirm it meets their needs, the little guys in some ways will flock to something that has a lot of forum or review buzz. And they might not be as proficient, or as consistently proficient, as you expect. So they can be a tech support challenge.
This is kind of the slide where I -- in coming up with this presentation with Ken and Jean-Michel, I realized this might make people panic a little bit. It's not really as dire, maybe, as it sounds. Because really, it comes down to how you go about creating a pro application.
As long as you're implementing for the high end guys, you're really going to get everybody for the most part. You know, just make sure your interfaces are well developed. If you have the chance to go through QA cycles, make sure your software is stable. Everyone's really going to be happy.
But you know, if you have to segment for one or the other, you can target the reliability features of the high end, and you will eventually get (Inaudible) -- I also want to set the stage with a workflow, the workflow that your customers will go through with your products. And you want to think about where they fit in this chain. You start off by, you know, acquiring media. So you might shoot with a high def video camera, or Telecine off of film.
And then once they have that content in the system they're going to use your tools, our tools, multiple tools, to go ahead and edit that content, you know, make changes to it, render it. And really provide, as a creative, their value add on top of what comes out of the lens.
Once they've got that phase complete, you know, they're going to take that content out of your application and they're going to put it into maybe a final container, or maybe an intermediate container. Sort of collapse everything down and get sort of the component pieces into at least a single chunk. And then of course when you're done with that, it's going to go somewhere for playback. They might be sending it out to film, might be sending it out to DVD mastering, might be sending it out to Apple TV or iPod.
And really, the message we want to deliver today is that all of these operations you can do in QuickTime. And you can do them for the pro customer. There are just certain concerns, common issues, you'll want to keep in mind and code around so that ultimately you're meeting everyone's expectations. And really, that is what the session's about: those details, those "gotchas" that Jean-Michel and Ken are going to go into. And with that I'll invite Jean-Michel back on stage to begin with a very brief overview.
So, before we dive into the details, as David was mentioning, we need to take a quick tour of QuickTime, what it is, so you understand the vocabulary, the concepts, and the other things that we're going to talk about. So what is QuickTime? It's a set of APIs; basically, in OS X terminology, it is a framework. But it's also a file format, which lets you capture and modify your content and deliver it to your end users. So when you combine all this together you basically have a solution for your application.
So we're going to go one by one through all these bullets on screen, and after that we'll tell you how all of that relates to your pro video application. So let's start with the first one: the movie. The movie is the basic container of all this content, whatever it is -- audio, video -- and we organize it and display it in an orderly fashion. The important thing is that the movie can be a file, but it can also stay in memory.
The first element in this movie is the track. Each different type of data is organized in a track, and tracks can contain different types of media -- one more time, audio and video. But there can also be multiple video tracks, representing different pieces of your content. Inside this track we have what we call the media.
And the media is where the bits that are going to produce this content live. The media doesn't have to be physically inside the QuickTime representation; it can be outside of it. And how do we represent that? We have what we call data references, which actually allow QuickTime to fetch this data either locally, inside your movie, or even on the network. And we'll see later why this is important for you.
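To make the movie/track/media relationship concrete, here is a minimal sketch against the QuickTime C API; it assumes you already have an open Movie and simply walks its tracks, reporting each media's type, Time Scale, and data reference count.
    #include <QuickTime/QuickTime.h>
    #include <stdio.h>

    /* Walk a movie's tracks and report each media's type, time scale, and data references. */
    static void DescribeMovieStructure(Movie movie)
    {
        long trackCount = GetMovieTrackCount(movie);
        for (long i = 1; i <= trackCount; i++) {             /* QuickTime indexes from 1 */
            Track track = GetMovieIndTrack(movie, i);
            Media media = GetTrackMedia(track);

            OSType mediaType = 0;
            GetMediaHandlerDescription(media, &mediaType, NULL, NULL);

            short dataRefCount = 0;
            GetMediaDataRefCount(media, &dataRefCount);      /* external refs mean the bits may live outside this file */

            printf("track %ld: media '%c%c%c%c', media time scale %ld, %d data reference(s)\n",
                   i,
                   (char)(mediaType >> 24), (char)(mediaType >> 16),
                   (char)(mediaType >> 8),  (char)mediaType,
                   (long)GetMediaTimeScale(media), dataRefCount);
        }
    }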
Another concept that QuickTime has is the Time Scale. Basically, it's a really simple definition for us, but it's critical for your content. It's simply the number of time units per second. In this example on the graphic, for instance, if you count the little ticks you'll see 18 of them, I believe, which will make your movie use a Time Scale of 16.
Historically, QuickTime has defaulted to a value of 600 for this Time Scale. And the reason why is that, like, 16 years ago when we decided to create our first movie, 600 was a perfect number. You could play 12, 15, 24, 30 fps -- what else did you need? In the pro video space today that is no longer true. We need something much better than that.
The last thing that you need to know is that this Time Scale is present inside your movie to drive the entire movie API you're going to use. But it's also contained inside your media track. So you've got to be careful, when you select both of them, that they make sense together.
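As one way to act on that advice, here is a small sketch using plain Movie Toolbox calls that checks whether the movie Time Scale is a whole multiple of every media Time Scale; raising the movie Time Scale with SetMovieTimeScale is just one possible fix.
    #include <QuickTime/QuickTime.h>

    /* Returns true only if the movie time scale can address every media's samples exactly. */
    static Boolean MovieTimeScaleIsCompatible(Movie movie)
    {
        TimeScale movieScale = GetMovieTimeScale(movie);
        long trackCount = GetMovieTrackCount(movie);

        for (long i = 1; i <= trackCount; i++) {
            Media media = GetTrackMedia(GetMovieIndTrack(movie, i));
            TimeScale mediaScale = GetMediaTimeScale(media);
            if (mediaScale != 0 && (movieScale % mediaScale) != 0)
                return false;          /* movie scale is not a multiple of this media's scale */
        }
        return true;
    }
    /* One possible remedy when this fails: pick a common multiple and call SetMovieTimeScale(). */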
Tagging. What we mean by tagging on the video side is basically describing the color space of your video as well as some temporal information, like fields, and also two other tags, called pasp and clap -- we'll go through each of them later. Another QuickTime capability is getting content from other file formats into QuickTime.
So on one hand you have the importers, which are getting -- sorry -- getting these files in, and the exporters, which are going the other way. Codecs. Video codecs. What are they? Basically they take pixels in and they compress them. And the decompressor does exactly the opposite: it takes this compressed data stream and produces pixels. From the outside, very simple. Inside, they are extremely complex pieces.
So the pixels that I was just mentioning, that these codecs get in and out, can be represented in different color spaces. The one I have on screen right now is our common QuickTime format called 2vuy, which is a 4:2:2 Y'CbCr representation of pixels. Another one could be RGB, which graphics people usually understand much better than YUV space. But they are all identified with one of these four-character codes; for instance, this one is alpha RGB. Last one: the visual context.
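For reference, those four-character codes map directly onto the Core Video pixel format constants; this small sketch just allocates a buffer in the 2vuy layout so you can see where the FourCC shows up in code.
    #include <QuickTime/QuickTime.h>        /* pulls in Core Video on Mac OS X */

    static CVPixelBufferRef Make2vuyBuffer(size_t width, size_t height)
    {
        CVPixelBufferRef buffer = NULL;
        /* kCVPixelFormatType_422YpCbCr8 is the '2vuy' 4:2:2 Y'CbCr layout;
           kCVPixelFormatType_32ARGB would be the ARGB format mentioned above. */
        CVReturn err = CVPixelBufferCreate(kCFAllocatorDefault,
                                           width, height,
                                           kCVPixelFormatType_422YpCbCr8,
                                           NULL,                 /* default attributes */
                                           &buffer);
        return (err == kCVReturnSuccess) ? buffer : NULL;
    }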
This is something we introduced with QuickTime 7.0 to replace the entire video rendering pipeline. It used to be based on GWorlds, and the visual context totally replaces that. And basically it is built around OS X native technologies instead of the deprecated ones. So we integrate fully with Core Image, OpenGL, and Core Video.
So now that we are done with the terminology side of this presentation, let me go back to the slide from a minute ago showing the application processing pipeline. And let's take a look at the first stage, which is the acquisition of the media. We will go through each of these points one by one to describe exactly what we mean by that.
There are three ways your application is going to get content. The first one is to digitize. If you have a VCR or any input device with capture capabilities, you will be able to take this digitized, and eventually compressed, rendition of the video and make a movie out of it. The next one, as I said earlier, is to import an existing file which is not represented in the QuickTime format -- for instance an AVI file or anything else -- and translate that to a QuickTime movie.
And the last one, which is the easiest one, is to natively open a QuickTime file that another application created. But nothing in this process is very specific to pro video. Any application could do that really easily in a couple of lines of code. So what would you be worried about if you write a professional video application? It is extremely critical for you to check that the content is properly authored.
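Those couple of lines of code look roughly like this sketch, using the QuickTime 7 data-reference calls; error handling is deliberately minimal.
    #include <QuickTime/QuickTime.h>

    static Movie OpenMovieAtPath(CFStringRef posixPath)
    {
        Handle dataRef = NULL;
        OSType dataRefType = 0;
        Movie  movie = NULL;
        short  resID = 0;

        EnterMovies();    /* Movie Toolbox init; normally done once at startup */

        if (QTNewDataReferenceFromFullPathCFString(posixPath, kQTPOSIXPathStyle, 0,
                                                   &dataRef, &dataRefType) == noErr) {
            NewMovieFromDataRef(&movie, newMovieActive, &resID, dataRef, dataRefType);
            DisposeHandle(dataRef);
        }
        return movie;     /* NULL on failure; the caller owns the Movie */
    }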
Just because it's a QuickTime movie doesn't mean you can trust that it's properly tagged. You have to verify that the Time Scale that was used to create this movie is appropriate for what you want to do. You also have to make sure that the video is fully described in the movie. Otherwise you are introducing content which is untagged, and QuickTime is going to have to guess how to process it. And when we guess, most of the time we guess wrong.
In a professional application you can nag the user and say: you are getting untagged content, please tell us where it's coming from. Because sometimes the information is not in the file, but the user knows that they got this file from a particular production house, and they know it is SD NTSC content. So they can tell you, and you can override the tagging.
If you let content get inside your application untagged, basically it's going to be an issue for all the rest of the processing. So you can offer different choices if you want, to help the user understand what you are talking about. But you should never let untagged content flow through your application.
The way you view tags is to use a tool we call Dumpster, which basically shows you the internals of the movie. And in this picture you will see, if you can read it of course, that there are four tags which are very important for us: fiel, pasp, clap, and colr.
The colr one is probably the most critical. This is describing how the video is going to be processed internally by QuickTime and by the entire rendering pipeline. Inside it we define what we call an nclc atom, which has three pieces of information: the primaries, the transfer function, and the matrix. So this is describing, basically, the three common color spaces in the video world, which are either NTSC, PAL, or HD. If you don't use one of these three tags, you're probably going to run into what people describe as gamma shifts, brightness issues, stuff like that.
We have a nice document posted on the Web site called Ice Floe 19 which describes exactly how we interpret all these specific values internally. Before this nclc we had what we called a gamma tag. This is a legacy tag. If you put it in the movie, we will still respect it. But if the movie is fully described by the nclc, we will override it.
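As a concrete illustration of the colr/nclc tag, here is a hedged sketch that writes an nclc extension onto a video sample description with the classic C API. The numeric values (6/1/6 for SD NTSC/SMPTE C; 1/1/1 would be HD Rec. 709, 5/1/6 PAL) follow Ice Floe 19; the struct name and the big-endian handling are as I recall them from ImageCompression.h, so verify against the header and Ice Floe 19 before relying on this.
    #include <QuickTime/QuickTime.h>

    /* Attach an nclc color tag (SMPTE C / SD NTSC values) to an image description. */
    static OSErr TagImageDescriptionAsSMPTE_C(ImageDescriptionHandle desc)
    {
        NCLCColorInfoImageDescriptionExtension nclc;
        nclc.colorParamType   = EndianU32_NtoB('nclc');
        nclc.primaries        = EndianU16_NtoB(6);   /* 6 = SMPTE C; 1 = ITU-R 709; 5 = PAL/EBU */
        nclc.transferFunction = EndianU16_NtoB(1);   /* 1 = ITU-R 709 transfer */
        nclc.matrix           = EndianU16_NtoB(6);   /* 6 = ITU-R 601 matrix; 1 = ITU-R 709 */

        Handle ext = NULL;
        OSErr  err = PtrToHand(&nclc, &ext, sizeof(nclc));
        if (err == noErr) {
            /* 'colr' is the image description extension that carries the nclc atom;
               the extension payload is stored big-endian, hence the swaps above. */
            err = AddImageDescriptionExtension(desc, ext, 'colr');
            DisposeHandle(ext);
        }
        return err;
    }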
Something else you have to be aware of in the color space area: pixels in the computer space are always square. Well, they are not in the video space, because in the video space you didn't really have pixels; the content is coming from analog lines, and it depends on how it is sampled. You end up with pixels that have a different aspect ratio, and in order to tell QuickTime how the content should be presented, you'd better make sure that this information is correct in your file.
Another concept that is quite different between the computer and the video space: in the video space they define what they call a Clean Aperture, which is basically the area that you want to present to the user. And the reason why is that in the analog space there was some garbage on the sides that they couldn't get right. In the computer space all the pixels are well defined. You have to make sure that the movie you're getting into your app contains this information as well, so we present it properly to the end user.
Frame rate. Another topic, and this time it's not about video versus computer, it's about video versus QuickTime. QuickTime doesn't really have a native definition of frame rate. You can make a movie with any kind of frame durations; QuickTime, again, will be happy to deal with that. In the video space you're talking about a constant frame rate: for film 24 fps, for NTSC 29.97, or for PAL 25 fps.
But QuickTime doesn't have any API to enforce that a movie stays at that frame rate. You can have three frames with a specific duration and then a last one which has nothing to do with the others. In the video space they can't live with that. When you're going to go back to your device, this last frame will be unmanageable for them.
It's up to your application to enforce this policy, because as I said, QuickTime can mix anything. You can create a QuickTime movie mixing 24 fps, NTSC, or PAL source content; we don't care. So you have to make sure that you don't let QuickTime do this fancy stuff.
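A minimal way to enforce that policy is to sanity-check durations yourself. This sketch only verifies that a video media's total duration divides evenly into same-length frames, which is a crude first check; a stricter version would walk every sample.
    #include <QuickTime/QuickTime.h>

    /* expectedFrameDuration is in the media's own time scale,
       e.g. 1001 for 29.97 fps video in a 30000 time-scale media. */
    static Boolean MediaLooksConstantRate(Media media, TimeValue expectedFrameDuration)
    {
        long      samples  = GetMediaSampleCount(media);
        TimeValue duration = GetMediaDuration(media);

        if (samples <= 0 || expectedFrameDuration <= 0)
            return false;

        /* If any frame had an odd duration, the total would not divide evenly. */
        return duration == (TimeValue)(samples * expectedFrameDuration);
    }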
So my last slide is about time code. We have had support for time code in QuickTime for a long, long time. But the reason I put that up there is that we have a new 64-bit version of it, which also allows us to support higher sampling rates on the audio side. If you are using time code you might consider switching to the new representation. So I am done with the acquisition part. Let's have Ken talk to you about the rest of the pipeline.
Okay?
( Applause )
Thank you, Jean-Michel. So, I work with Jean-Michel in QuickTime, and actually I work on just these aspects of the QuickTime pipeline. The back of the pipeline. Rendering, export, and display. And this is what I will be talking about today. I will be taking us through the next three sections. Editing, export, and playback. But first edit and render. We're going to be looking at these five points. And we're going to be covering a lot of territory. So we're going to maybe skip around a little bit.
So first we're going to turn back to Time Scale. So Jean-Michel already described Time Scale to you, so now we'll take a look at some of the trade offs you have in Time Scale. And like so many things, there really are trade offs. So when I describe Time Scale it's a little confusing.
So if I describe Time Scale as being numerically larger, it's really talking about chopping your second into more and more pieces. And that's giving you a finer and finer granularity for edits. And you might say, well, why would you want any granularity beneath the frame level. And that's for a couple of reasons. First, Jean-Michel described just how flexible QuickTime is with respect to durations and what you can do.
And you can actually do quite a bit more than what traditional video can do. So that's one reason why you might want to chop things up into finer pieces. But also for audio. Your audio edits, you might want those to be -- more fine or more accurate. Basically sub frame edits. And there are some other things we'll talk about as well.
So if you look at the bottom of the display you'll see that there is a movie. And that movie is intended to be film. And film is 24 frames per second. So on the top you'll see a bunch of orange bars, and those are broken into pieces.
And that shows you the granularity of the possible edits if you had a Time Scale that corresponded to your frame rate. In this case, 24. So you get one potential edit per frame. Now if we multiply by three you get the bottom row. And there we see that we have a Time Scale of 72. And there are three possible edit points per frame.
Now there's one particular requirement that's very important to remember. And that is: not only does your movie have a Time Scale associated with it, but every piece of media also has a Time Scale. So at the very least you have to make sure that the Time Scale for your movie is at least as large as, and probably equal to, the Time Scale for your media track. And if you have more than one media track you want to make sure that the Time Scale for the movie is a multiple of the Time Scales for each of the media tracks.
Otherwise you won't be able to individually access frames, and that would be inappropriate. I should also mention that the larger the Time Scale selected, the larger the numeric duration of each frame. So really, that creates another kind of limit that we're going to look at in this next slide.
And that limit is in the last column, the duration. So the numerically larger your Time Scale, the finer the resolution of things you can index, but potentially the shorter the duration of the entire movie that you can deal with. All that said, there's no single correct answer for what the proper Time Scale is.
These are some that we consider to be proper. Most of them are pretty simple. 60 frames per second, a 6,000 Time Scale; the duration for each frame would be 100. That makes sense. And you can see in that last column that that's four days' worth of movie. And that's in a signed 32-bit integer.
So that seems like a lot. And maybe it is. Maybe it's not. But you can make those your trade offs. The one point I want to bring up is for NTSC. That's the 29.97 frames per second line. Here there are a lot of ratios that are used for Time Scales. And frankly, most of those are incorrect.
They are imprecise; they don't give quite the right results. And then what will happen is over time, when you do an edit, you won't get the frame that you're really expecting. So the 30,000 Time Scale with the 1001 frame duration is one of the ratios that works out correctly. And that comes out of the actual specification.
Now, a 30,000 Time Scale is a pretty large number. And sure enough it does use up your duration faster. So what you'll see is that we have just under 20 hours of duration possible for that movie. Which still seems like quite a bit. But there are already companies that just run continuously 24 hours a day. So that may or may not be an issue for you.
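The durations in that table are just the signed 32-bit limit divided by the Time Scale; a tiny sketch of the arithmetic:
    #include <stdint.h>

    /* Longest movie, in hours, that a signed 32-bit TimeValue can express at a given time scale. */
    static double MaxMovieHours(int32_t timeScale)
    {
        return (double)INT32_MAX / (double)timeScale / 3600.0;
    }
    /* MaxMovieHours(6000)  -> about 99.4 hours (roughly four days)
       MaxMovieHours(30000) -> about 19.9 hours (just under 20 hours, as above) */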
So Jean-Michel has already mentioned the flexibility of QuickTime. And also the responsibility as you as professional video application developers have in sort of constraining QuickTime's flexibility into something that's legal for video. So one concept is if you're doing an edit, it only makes sense in the video world to edit on frame boundaries. But QuickTime doesn't have that concept at all. And I'll show you a quick animation.
So you can see that that arrow is moving smoothly across. Well, it's not moving completely smoothly; it's jumping by the granularity of our Time Scale. But pretty much, it allows us to perform edits at an arbitrary time within a frame. So as I've already mentioned -- and I'll show you the animation on the top of this film strip -- in the land of video, you can only edit at frame boundaries.
And that's why we move that in a sort of quantized way. And it's up to you, again, to make sure that you're doing the proper thing. So the next concept that's important to remember has to do with frame selection. And again, I have another animation. Maybe I'll bring that up first.
So here you see the edit point moving between times that correspond to the beginning of one frame and the end of the previous frame. So you should all remember that within QuickTime, QuickTime will always choose the frame that's beginning at that time. And we represent that here as being selected in orange.
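Staying legal, then, mostly means quantizing every edit time yourself before handing it to QuickTime. That is simple integer arithmetic, sketched here with a frame duration expressed in movie Time Scale units (the 30000/1001 example is only an illustration):
    #include <QuickTime/QuickTime.h>

    /* Snap an arbitrary movie time down to the start of the frame that contains it.
       frameDuration is one frame expressed in movie time-scale units,
       e.g. 1001 in a 30000 time-scale movie for 29.97 fps material. */
    static TimeValue SnapToFrameBoundary(TimeValue movieTime, TimeValue frameDuration)
    {
        if (frameDuration <= 0)
            return movieTime;
        return (movieTime / frameDuration) * frameDuration;   /* integer division truncates */
    }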
So Jean-Michel also mentioned data references and reference movies. And reference movies are a powerful and perhaps underutilized concept within QuickTime. So pretty much, just as a review, it allows you to have a QuickTime movie complete with edits and compositions and any number of other kinds of effects, but have all the media be somewhere else.
That somewhere else could be other movie files, and those movie files could be basically anywhere on the Internet. Which is hugely, hugely flexible. It really allows you to do some interesting things as well. You can have any number of edits -- I mean, you can keep all of those, because each would be represented in its own reference movie, and those are really pretty small.
Each of those movies, again, is going to be small. Each of those different edits basically doesn't affect your original content at all, your original media. So that's how we can say that they can be done non-destructively. Now, with that flexibility comes responsibility. Being small, you can mail those reference movies to other engineers, you can back them up. But it's really important for your application to either handle this, or to notify the users of those responsibilities.
And that is: either, before handing the reference movie back out to the end user, you want to convert it to a standard movie using a process we call flattening -- basically that brings back in all the media files. Or alternately, you want to make sure that your end users are aware that the reference movie alone isn't complete, and that they have to keep the media that's referenced along with that movie. So -- powerful, but there are trade offs.
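Flattening itself is essentially one call. Here is a hedged sketch using the data-reference flavor as I recall it (FlattenMovieDataToDataRef returning a new Movie); the older FSSpec-based FlattenMovieData does the same job, and the exact flag combination is an assumption to verify against Movies.h.
    #include <QuickTime/QuickTime.h>

    /* Write a self-contained copy of a reference movie to the given POSIX path. */
    static OSErr FlattenReferenceMovie(Movie refMovie, CFStringRef destPath)
    {
        Handle dataRef = NULL;
        OSType dataRefType = 0;
        OSErr  err = QTNewDataReferenceFromFullPathCFString(destPath, kQTPOSIXPathStyle, 0,
                                                            &dataRef, &dataRefType);
        if (err != noErr)
            return err;

        /* Returns a new, self-contained Movie on the destination file; we only need to know it worked. */
        Movie flat = FlattenMovieDataToDataRef(refMovie,
                                               flattenAddMovieToDataFork |
                                               flattenForceMovieResourceBeforeMovieData, /* fast start */
                                               dataRef, dataRefType,
                                               'TVOD',            /* creator code for QuickTime Player */
                                               smSystemScript,
                                               createMovieFileDeleteCurFile);
        err = GetMoviesError();
        if (flat != NULL)
            DisposeMovie(flat);
        DisposeHandle(dataRef);
        return err;
    }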
So, now I'd like to talk about rendering just briefly. And rendering is really a process where you take your content in. It's going to be decoded by a codec, basically into a pixel format that we've talked about, and that we're going to talk about again in a moment. And then it allows you to do some value add. QuickTime can do all sorts of things to those pixels. It can apply effects, it can do compositions -- compositing, I should say -- and other things. And it's also another opportunity for your application to add value, processing it in any manner of ways.
And then you go back out again, back out to a QuickTime movie. So the point I really want to make here is that up until QuickTime 7 -- that's the QuickTime that was released corresponding to the Tiger release -- we were using this GWorld rendering model.
The visual context has replaced that model. And if you happen to be using the GWorld model, then please, it's time to switch over to the visual context model. And if you're new to QuickTime, then welcome. But this isn't a good time to go and investigate the GWorld model at all.
And I should point out one thing: QuickTime is an old technology, and it's a mature technology. There are a lot of wonderful demos and sample source code up on the Internet and elsewhere. Please be very careful when you're looking at those examples to make sure that you're using the latest and best practices.
So now we're going to take a look at pixel formats. And this is actually an area that we get a lot of questions about. And there are some aspects that are a little bit confusing. So I am going to talk about two aspects. The first is this concept of video range. And the next is something called full range.
Both of those are defined in the Rec. 601 video standard. That standard talks about how to take analog video and represent it in a digital space. So in terms of video range data -- and I'm talking in this case about an 8-bit representation, where 8 bits of course represents values 0 to 255 -- only the values 16 to 235 contain legal video data.
The values beneath 16 are actually superblack. And superblack is an old analog video word. But pretty much superblacks aren't used to represent content. They're really used for switching and other kinds of meta purposes. The values above 235, 236 to 255, are what's known as superwhite. That's another old analog video concept. But in this case, you can have certain content in the superwhite range.
Basically, you can have brief excursions, as they say, into superwhites. Maybe as the camera tracks across the sun or something like that. But at least in the old analog days you were limited in how much superwhite you could have on the screen at one time. Otherwise, your video transmitter might blow up or your receivers might blow up. It was bad. So it is possible for some content to be out in that range.
So just to be clear, if you're dealing with pixel format that's in video range, the value 16 is black. It's as black as you can get. And that corresponds to, let's say an RGB full range pixel format's zero. And the same with 235. 235 is really as white as you're going to get. And that corresponds to your full range RGB value of 255.
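The mapping Ken is describing is plain linear scaling. This sketch shows both directions for an 8-bit luma value, with clamping on the way to full range, which is exactly where superblacks and superwhites get thrown away:
    #include <stdint.h>

    /* Video range (16..235) -> full range (0..255); out-of-range values are clamped,
       which is where superblack and superwhite content is lost. */
    static uint8_t VideoToFullRange(uint8_t y)
    {
        int v = ((int)y - 16) * 255;
        if (v < 0) v = 0;
        v = (v + 219 / 2) / 219;                      /* scale and round */
        return (v > 255) ? 255 : (uint8_t)v;
    }

    /* Full range (0..255) -> video range (16..235). */
    static uint8_t FullToVideoRange(uint8_t y)
    {
        return (uint8_t)(16 + ((int)y * 219 + 255 / 2) / 255);
    }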
Now, most of us are computer users and we feel comfortable in RGB formats. But they can cause some confusion: as you import and export through RGB, basically, it's not without loss. It's not without repercussions. The most important one, perhaps, to some at least, is that there is a potential -- and actually, you can and will lose your superblacks and superwhites as you convert from a video format into a full range RGB format.
And also you can introduce artifacts. I'll go into details in a moment. But these artifacts are typically called contouring. Pretty much if you have a gradient, maybe it's somebody's face, you can see areas that look like they're equivalent. It's also called posterization. It's not the best possible effect. There are ways to mitigate it. But you should be aware of this.
So if we go from your video range, our video values, 16 to 235 gets stretched out to 0 to 255. And the areas that were superblack and superwhite now are not representable. You'll see that we make those areas disappear. And we've introduced these black bars. And these black bars are what I call stretch marks.
If you think about it, if you had a perfect gradient, basically a gradient that increased by one value between the 16 and 235 range, and you stretch it out to 0 to 255, that gradient is going to have to -- if it stays in the same quantization, I should say, it started out as 8 bits and you wound up at 8 bits -- the only thing you can do is every once in a while repeat a value. And those I have represented as those black lines. And those do and will show up as contouring in your image.
Now if we take our full range RGB data and squish it back down again, we're squeezing the values 0 to 255 into the range 16 to 235. And I'm drawing these white bars now. And I call those wrinkles. And there you get the same kind of effect. If you had that gradient that only stepped by one from the value of 0 to 255, then once you have squeezed it into 16 to 235, instead of stepping one by one by one, every once in a while it's going to go one, one, two, one. And those also create visual artifacts.
And the last point, as you can see by the dotted lines around the video range, is that even though you're back in a video range representation -- a pixel format that does support superblacks and superwhites -- there's no information to fill in, because the RGB format didn't have it. So these values basically will never be used.
So there's also an issue of color and color gamut. What you see here are two representations. You see the NTSC, and actually I believe that's NTSC 1953, for those who are astute. And there's also the generic RGB color profile displayed as well. So that interesting looking chart that it's displayed on is basically the CIE space that defines all colors that are humanly visible.
And one of these color spaces is defined by what are known as the primaries. Back in the old days of CRTs, those were defined by the phosphors for the R, G, and B colors. I know it's a little difficult to make out. But you can probably see red, green, and blue Xs at the vertices of the triangles.
And you can probably see that the NTSC and the generic RGB, again, aren't the same size. I'll zoom this in. It makes it a little bit easier to see. Please remember that the blue is the NTSC and the red is the generic RGB. And now I'll combine them for you. And here you can see that there are fairly large areas from the NTSC that just aren't representable in the RGB space.
And also there are smaller areas that are only in the RGB space that are not representable in the NTSC space. And there are a lot of things that can happen depending on the exact nature of the color space manipulations that you do. But pretty much, you get some sort of clamping or squashing of the gamuts. So the point here to remember is that if you take a round trip through RGB space, potentially you're going to lose or change the characteristics of your colors.
So back to our professional video pipeline. We're going to continue on with export and look at these three points. So QuickTime, when you export, can and will automatically apply color space conversions for you. The most common place this happens is if you change sizes. Say you go between an HD image -- and almost all HD representations are in the Rec. 709 color space -- and, let's say, you export to an SD size. Most SD representations are NTSC, and the more professional way to describe the NTSC space is as SMPTE C, or sometimes you'll see it as the 170M space.
So in the example here, if you take a DV 100 source -- and DV 100 is an HD codec -- and you export that using QuickTime to the H.264 codec, but at a 640 x 480 size, that's an SD size, then you're going to get an automatic color space conversion. You're going to go from Rec. 709 to 170M. The second place that you'll get an automatic color conversion is if your export codec has a well-defined color space. In our example we're going to talk about DV.
Don't confuse DV with DV 100. DV is either NTSC or PAL. Let's call it NTSC in this case. So that if you were exporting something that was in HD Rec. 709 color space to the DV codec, then QuickTime automatically is going to perform the color space conversion to make it legal and valid for that codec.
Now we do have a concept that we call color space agnostic. H.264 is probably the best example of a codec that's color space agnostic: it will accept either the HD Rec. 709 or the SD SMPTE C color space. So when you export to H.264 there won't be an automatic color space conversion for the codec itself. Possibly for the size, but not just for going to the codec.
Now some of our newest codecs automatically tag for you. Jean-Michel described tagging. He mentioned Dumpster and some of these tags. And all I can say is the best thing for you folks to do is to understand what codecs you're using, and what their behaviors are. And the best way to do that is to use Dumpster. So go export your content, and then examine it using Dumpster. Frankly, look at your source content in Dumpster too, to make sure that you're getting tags and that the tags are what you want and expect them to be.
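You can also make that check part of your import path rather than an eyeball step. Here is a sketch that looks at the first video track's sample description and reports which of the four extensions are present (the FourCCs are the same ones Dumpster shows):
    #include <QuickTime/QuickTime.h>
    #include <stdio.h>

    /* Report which of the key image description extensions the first video track carries. */
    static void ReportVideoTags(Movie movie)
    {
        Track track = GetMovieIndTrackType(movie, 1, VideoMediaType, movieTrackMediaType);
        if (track == NULL) return;

        ImageDescriptionHandle desc = (ImageDescriptionHandle)NewHandle(0);
        GetMediaSampleDescription(GetTrackMedia(track), 1, (SampleDescriptionHandle)desc);

        const OSType tags[4] = { 'colr', 'fiel', 'pasp', 'clap' };
        for (int i = 0; i < 4; i++) {
            long count = 0;
            CountImageDescriptionExtensionType(desc, tags[i], &count);
            printf("'%c%c%c%c' extension: %s\n",
                   (char)(tags[i] >> 24), (char)(tags[i] >> 16),
                   (char)(tags[i] >> 8),  (char)tags[i],
                   count > 0 ? "present" : "MISSING");
        }
        DisposeHandle((Handle)desc);
    }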
So one thing that's maybe a little bit non-intuitive is the aperture mode. And we're going to talk about aperture modes a little bit more when we talk about display. But the aperture mode of your movie will influence what you get when you export. So in this case we're going to look at the example of taking a piece of DV content and exporting it to the uncompressed codec, which is also called the 2vuy codec, in two different aperture modes.
So this is our content displayed in clean aperture mode. And you can't tell, but in the non clean aperture region we have two fuchsia bars on either side. And we'll see those in a little bit. But if we were to take this movie and export it while it's in clean aperture mode, we get this result.
Which looks very much the same. Basically, it's the clean aperture content in the new file. And two things happen. One is the non-clean aperture content. Those fuchsia bars that you can't see aren't there in the new movie. They're just not there at all. Whereas if we were to take the source movie and change its aperture mode you would see those fuchsia bars. Changing the aperture mode on the resulting movie won't have any effect. It's gone.
The other thing to know is that the result of the export is actually in square pixels. So your pixel aspect ratio has changed. That's a conversion and a process that's been applied to your movie. So you should understand that your movie has been profoundly altered, in more ways than just the nature of the compression.
Now if we instead look at our source movie in the encoded pixels aperture mode, and that's an aperture mode where you see all the content, and you can see that our fuchsia bars are back. And you export that. Then you get a result where all the pixels are maintained and there's been no aspect ratio change.
So now we'll look at the last stage of the professional video pipeline. Which is playback. And playback is actually pretty sophisticated, so we'll spend a little bit of time with it. So we'll return again to look at aperture mode as I promised. And one of the themes we have for this talk is that tagging is really critical. In this case, aperture modes are dependent on the pasp and the clap tags. Where pasp is the pixel aspect ratio, and clap is clean aperture, in case those haven't been defined yet.
So aperture modes, those were defined in the SMPTE 187 standard. And there are three of them. Pretty much the encoded pixel and the production aperture modes are intended for professional use. We'll talk about those in a moment. And then there's a third called the clean aperture mode. And that's really what the end user really wants to see. And again, we'll show that as well. To that, in QuickTime, we add a fourth mode. Which we call classic. And classic is what you get when the content isn't tagged. And it's one of the reasons you do want to tag content.
So to begin with, I will quickly show you how to manipulate aperture modes so you can see what your content actually will look like in each of these modes. This is actually QuickTime Player's properties panel. And you can bring that up by just typing Command-J. So if you select the video track and the presentation tab, and you click the "Conform aperture mode to" checkbox, then you get a drop down list with the four different modes that you can select between.
So let's go back. This is the same DV movie that we already looked at in the export example. So classic aperture mode, again, that's what you get for untagged content. Pretty much we're picking up the dimensions from the track dimensions. This isn't information that's coming from the codec. This is just the way QuickTime has done this since the dawn of time. And pretty much there's no processing at all applied. You get all the pixels, no aspect ratio correction, and this is what you had before QuickTime 7.
You can see the fuchsia bars that I have circled. And zooming in to the corner you can see that the non-clean aperture region actually has pretty much garbage. In fact, garbage could come from your camera, it could come from the codec. But it's really content that's not necessarily intended to be seen. Certainly not intended to be seen by the end user.
Next, the encoded pixels aperture mode. That's a mouthful. So you can see that we're at 720 x 480 resolution. We can compare that to these other ones. This is one of the modes I already mentioned that the professional may want. Basically you get all the pixels and no aspect correction. It looks like this. And as you can see, it looks distorted, and you get the fuchsia bars. But as a professional you may want to see it without any additional QuickTime processing.
I will bring up some calipers so we can actually see how distorted it is. I'm not sure, given the nature of the projector, if this is clear or not. But you can see here that you have it being 390 pixels wide, but I think only 254 vertically. So it's not square. Or at least it doesn't look round, if you were to look at it on a square-pixel display. If you were looking at it on an analog monitor with that 11 to 10 pixel ratio, then it would look fine.
So production aperture mode. This is the second of the professional modes. You get the fuchsia bars. But in this case you get the aspect ratio corrected. So depending on what your purposes are, and what the purposes you anticipate your end users using, you may want to be in either of these modes. Or you may allow the user to select between them.
And finally, clean aperture mode. Now we're down to 640 x 480, which is less than the previous dimensions we saw. Because not only have we corrected for the aspect ratio, but now we've removed those fuchsia bars, the non-clean aperture areas. So we're doing all the processing to this: we're doing the aspect correction and we're removing the non-clean aperture. I should mention -- this is important -- that this is the default for all tagged content. We display all tagged content in clean aperture mode.
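The arithmetic behind those numbers is just the clap and pasp values applied to the encoded frame. Here is a sketch using DV NTSC's customary values (704 x 480 clean aperture, 10:11 pixel aspect ratio), which are assumptions about this kind of clip rather than something stated on the slide:
    #include <stdio.h>

    /* Clean-aperture display size = clean-aperture pixels scaled by the pixel aspect ratio. */
    static void CleanApertureDisplaySize(int clapWidth, int clapHeight,
                                         int paspHSpacing, int paspVSpacing,
                                         int *displayWidth, int *displayHeight)
    {
        *displayWidth  = clapWidth * paspHSpacing / paspVSpacing;
        *displayHeight = clapHeight;
    }

    int main(void)
    {
        int w, h;
        /* DV NTSC: 720 x 480 encoded, 704 x 480 clean aperture, 10:11 pixel aspect */
        CleanApertureDisplaySize(704, 480, 10, 11, &w, &h);
        printf("clean aperture display size: %d x %d\n", w, h);   /* 640 x 480 */
        return 0;
    }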
And you can see, if we bring up our calipers, that we actually get a truly round object, 354 x 354 pixels. So next we're going to talk about QuickTime being what is sometimes described as color-space aware. This is a relatively new feature from QuickTime 7. And there's a lot of confusion about it, so I am going to spend a few slides on this.
Pretty much our intention is to provide consistent color across displays. That's what the slide says. But really what that means is we're going to obey the rendering intent of the source media. If the source media is in the Rec. 709 space, then when it's viewed on a true Rec. 709 HD monitor, let's say a professional monitor, it's going to look a certain way. Here, QuickTime is using ColorSync technologies to try to make the result on whatever display you happen to be using as close an analog to that as possible.
And it does that using two mechanisms. The first is the colr tag: as Jean-Michel already mentioned, you can specify the color space of your content using that tag. So that's the rendering intent that I mentioned. And the second is that we use ColorSync to match that rendering intent to your display profile.
So we're going to look at this a little bit more closely too. You're familiar with the fact that by default Apple Macintosh display buffers are at a 1.8 gamma. But there are all sorts of color characteristics that differ between the models. And it's through the display profile, the ColorSync display profile, that we figure out what to match to.
So this is where things get really confusing. And this is where I get a lot of people talking to me personally. Pretty much, interpreting color is really difficult. So the standard way to do that is to use what you see on the display. Which are 75 percent SMPTE color bars.
That's the standard that's been used for quite a long time. But that SMPTE standard is simply an analog standard. It was defined quite a long time ago -- a long time before digital representations. If you crunch all the math, and again, this is an 8-bit space, you will figure out that those 75 percent levels correspond to a 191 out of 255 value.
And that is if your RGB space is in that native space. So let's say these are SD bars in the SMPTE C space. Then this would be in an RGB space that is also a SMPTE C space -- basically the same RGB primaries. So the white bar would be 191, 191, 191, and the same 75 percent magenta bar would be 191, 0, 191.
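The 191 falls straight out of the percentage. A quick sketch of the arithmetic for both full range and video range (the video-range figure is added here for comparison; it is not on the slide):
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double level = 0.75;                               /* 75 percent bars */
        int full  = (int)lround(level * 255.0);            /* full-range RGB code value   */
        int video = (int)lround(16.0 + level * 219.0);     /* video-range (16..235) value */
        printf("75%% bar: %d full range, %d video range\n", full, video);  /* 191, 180 */
        return 0;
    }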
Unfortunately, or maybe fortunately -- because we're doing a lot of processing for you -- DigitalColor Meter will give you another answer. So at least on my monitor, and this is on my MacBook, I actually read back different values. I actually read back 187, 22, 193 for magenta. Which, if you think about it, is considerably different from the 191, 0, 191 we were expecting. I would say it differs in two ways.
One is that the red value is certainly low; it's not 191. And then, maybe in a more confusing way, we have 22 -- that's 22 units out of 255 of energy in the green channel, when really we're expecting nothing in the green channel for magenta. So that's a bit of a mystery.
Well, the good news is that this response is actually correct. It's actually QuickTime, in conjunction with ColorSync, trying to make these color bars appear to you visually as correct as possible on whatever monitor you have -- and frankly, using the calibration that you may have performed on that monitor -- not to give you the right values when you read them back out of the pixel buffer. So that pixel buffer isn't there for inspection. It's really there to produce the appropriate response.
So these are some of the points. And I could easily spend an entire talk just describing these. But these are some of the things that are going on under the hood. And chances are there's more going on there than you are aware of. So I already mentioned that by default the frame buffer is at a 1.8 level. Because it's at a 1.8 level, and not video's native 2.2, you would expect those 191 energy levels to be somewhat lower. And that is indeed what we see. So already we're not expecting that 191, 191, 191 value.
I should mention that QuickTime does use the display profile that's on record. And that means if your professional users actually manually calibrate their monitors, we're going to be using and syncing to the display profile that they've created. Which is a very good thing. And if they've decided to move their pixel buffer off of 1.8, maybe to the 2.2 level, then we're going to use that. Which is also a very good thing.
And I think that answers questions that I've heard from a bunch of people. Now one thing that you may not know is that there's actually a piece of hardware between your frame buffer and the actual display. Because old CRTs really aren't at a 2.2 level; they're more in the 2.5, 2.6 range. LCDs are at an even different level.
So there's actually a lookup table that provides the mapping between what that buffer is advertised as being and what the actual hardware is. So really, it means that the pixel values you read back from the buffer are not the actual pixel values that go out to the output hardware.
So another aspect is that those pixels that you're reading out of that buffer have already been through ColorSync. And that's why the colors aren't pure. And that's what I mean by pure: in this case, it's really where we got the green energy in magenta. The example I use here is red.
That if you were looking at, say, SMPTE C, 75 percent red bar, you would be expecting only energy in the red component. But if you were to read that value back you may see that you have energy in the green or the blue. And that's just the mathematics of ColorSync. Trying to make that color appear to you as close as possible to what it would look like if it was displayed on the true broadcast monitor.
Now the implications of what I just mentioned are kind of profound. The way ColorSync determines the profile is actually dynamic. ColorSync reads a serial number off the display device. So even if you had a pair of cinema displays sitting next to each other, same size, apparently the same model -- those may actually have different serial numbers, they could have been made by a different manufacturer, come off a different production run, have slightly different color characteristics. The good news is ColorSync is going to automatically create the correct profile for each.
Perhaps the bad news, for you trying to analyze your data using DigitalColor Meter, is that if you're displaying the same image on two monitors you're going to get potentially different values read back. And certainly, if you're looking at it on two different machines, then it's really not comparable. So please don't compare those values. Or if you want to talk to me at the lab tomorrow I can tell you some tricks you can use to allow you to do just that.
And finally, we get a lot of questions about what I call legacy applications. And right now there are a lot of legacy applications. In this case, I mean video applications that are not ColorSynced. They're not ColorSynced, they don't follow the same characteristics, they're going to give you different results. And on top of that they tend to make assumptions. So one assumption they make is that video gamma is actually 2.2 exactly.
And that the proper way to display that content is to do sort of the inverse 2.2 to 1.8 display mapping. And that doesn't really follow true video engineering guidelines, where you probably are going to use a more complicated representation for the reconstruction filter. Incidentally, QuickTime uses a 1.961 gamma value for the purposes of reconstruction. And again, if you have questions about that, come see me tomorrow.
The other assumption that's made is that the buffer, the display buffer, is always at a 1.8 gamma. And that simply is not the case. Anybody that has done a custom calibration of their monitor may have set it to another value. And then they're going to get some very interesting and peculiar values.
So the next topic is the visual context. And I should say that if you're using the visual context, that's really where all our efforts are placed to give you as accurate results as possible. It's through the visual context that you get the aperture modes implemented; it's through the visual context and Core Video that you get correct HD Rec. 709 uploads so that your 709 colors look correct.
So some of you may be using GWorlds or thinking about using GWorlds. You really should avoid those. GWorlds are based on QuickDraw, and I am sure that all of you have heard by now that QuickDraw has long been deprecated. Additionally, as I sort of alluded to, the GWorld colors are simply incorrect. I should also mention that the visual context is actually GPU accelerated and you get all sorts of wonderful processing, largely for free. So you get chroma filtering and some other effects that we're not really going to talk about in this presentation.
So visual context does come at a cost. You may have to implement or reimplement parts of your code. If you want to avoid the complexities, really, the simplest way to get the visual context is to use QTKit's QTMovieView. And if you recall way back when Jean-Michel first introduced the visual context he displayed a diagram that looks similar to the one I am showing. But QTMovieView replaces that whole entire center part for you. If you're interested in more about QTKit, there's a talk coming up in the same room in a little bit.
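For those who need the C-level path rather than QTMovieView, the basic shape looks like this sketch. It assumes you already have a CGL context and pixel format from your OpenGL view, and it omits the display-link plumbing a real player would add.
    #include <QuickTime/QuickTime.h>
    #include <OpenGL/OpenGL.h>

    /* Route a movie's video through an OpenGL-texture visual context. */
    static QTVisualContextRef AttachTextureContext(Movie movie,
                                                   CGLContextObj cglContext,
                                                   CGLPixelFormatObj cglPixelFormat)
    {
        QTVisualContextRef vc = NULL;
        OSStatus err = QTOpenGLTextureContextCreate(kCFAllocatorDefault,
                                                    cglContext, cglPixelFormat,
                                                    NULL,            /* default attributes */
                                                    &vc);
        if (err == noErr)
            SetMovieVisualContext(movie, vc);       /* replaces any GWorld rendering target */
        return vc;
    }

    /* Called from your render callback (for example a CVDisplayLink) for each output frame. */
    static void DrawNextFrame(QTVisualContextRef vc, const CVTimeStamp *outputTime)
    {
        if (QTVisualContextIsNewImageAvailable(vc, outputTime)) {
            CVImageBufferRef image = NULL;
            if (QTVisualContextCopyImageForTime(vc, kCFAllocatorDefault, outputTime, &image) == noErr
                && image != NULL) {
                /* ... bind the returned OpenGL texture and draw it here ... */
                CVBufferRelease(image);
            }
        }
        QTVisualContextTask(vc);    /* let the visual context do its housekeeping */
    }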
So, to summarize, you can create really wonderful, professional video applications using QuickTime. But you have to do a number of things. One is you have to really take very special care to -- I say on the slide manage QuickTime's flexibility. But I think part of that responsibility is to understand what QuickTime does, and also understand what you need in the professional video space.
The single most important thing to do is make sure that all your content is properly tagged. As Jean-Michel stated, you really don't want QuickTime to guess, because often it's going to guess wrong. To that end, Dumpster is really your friend. You can take that content, drag it onto Dumpster, and actually see what tags are set and how they're set. And also, please use best practices. That really means using the visual context.
So there are two labs I want you to be aware of. Tomorrow morning is the QuickTime video lab. Jean-Michel and myself will be there. As well as some members of our team. That's really the best place to get into the really detailed questions that I am sure you have. It looks like we're going to have some time for Q&A. So we'll be able to field some now. Also, there's a pro applications technology lab. That's Friday afternoon. And that will be your opportunity to talk to David Black in depth as well.
We do have more information online. The best place to find this information -- and you don't have to copy this down, I am sure you all have it -- is the attendee Web site. And another really wonderful reference is Ice Floe 19. You can try to copy down the URL. The easiest way to get that information, though, is just to Google Ice Floe 19. Please remember that we're spelling floe F-L-O-E, and not F-L-O-W.