Graphics • 1:05:33
Modern codecs demand a more complex infrastructure in order to support advanced encoding techniques. Find out what kind of changes are occurring in QuickTime movies and how your applications and video codecs can take advantage of these new techniques.
Speakers: Tim Cherna, Thomas Pun, Anne Jones, Sam Bushell
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Good afternoon. Welcome to the penultimate session for WWDC and the ultimate QuickTime session. Today we're going to be talking about next generation video formats in QuickTime. My name is Tim Cherna. I'm the manager of the QuickTime Video Foundation team. And the QuickTime team is going to be talking to you about H.264, AVC, the new video codec that we're shipping in the next version of QuickTime in Tiger, the technologies that are required in QuickTime to support H.264, as well as the changes that were required to support IPB video frame coding.
We're going to talk a lot about what that actually is. So you're going to see a bunch of abbreviations in my slides, and it'll be a little bit strange. So let me talk about the QuickTime video technologies before Tiger. Basically, we have this software stack, and at the top of the software stack is the movie toolbox.
The movie toolbox is used for creating, editing, and navigating movies. It's also used for playing back movies and stepping through movies, so it's typically what the highest-level applications use. Now, the movie toolbox uses the video media handler to sequence video frames from files or from a network device to the Image Compression Manager. The Image Compression Manager is a service within QuickTime that deals with compression and decompression.
And so we create these compressor components and decompressor components underneath the ICM, which are created by Apple and third parties, and that serves as the codec model. And there's a base codec, a base decompressor, which helps implement decoders. And we recommend you actually use that, and you'll see today that it's actually essential for the new decompressor formats.
So, before Tiger, there were some limitations in the movie toolbox, the video media handler, and the ICM. This wasn't really a big issue, because most of the codecs were either I-frame or keyframe codecs, such as DV or Motion JPEG, or difference-frame, IP-coded codecs, such as MPEG-4 Simple Profile, Cinepak, the Sorenson codecs, and so on.
What we didn't support is the more complex frame ordering, IPB frame ordering that's used within H.264 and MPEG-2, MPEG-1. Now, we do support MPEG-1 and 2, but that's via the MPEG media handler, not the video media handler. It's a different code path, and we're not changing that code path in Tiger.
So today we're going to cover some fundamentals about the new H.264 video codec; details about what IPB is and how it differs from I and IP (by the end, you'll really know that); the user-level impact of the changes that we're making in QuickTime 6.6; changes to the movie toolbox for navigation and editing; and of course the changes to the ICM to support these new kinds of video codecs. With that, I'd like to bring up Thomas Pun to talk about H.264.
Hi, I'm Thomas and I work in the Video Codec team in QuickTime. Oh, actually, back to the slide first. So today I'm going to briefly talk about H.264. I'm sure you guys have all heard about it throughout the whole week; I'm just going to recap. So H.264, what is it? It's a joint effort from the two biggest standards organizations.
One is the ISO and the other one is the ITU. They brought us video standards such as MPEG-1, MPEG-2, and also H.261 and H.263. Because it's a joint effort, at different stages they gave this codec different names, so you may also have heard MPEG-4 Part 10, JVT, H.264, and also AVC. It was standardized just last year, so it's a very recent addition to the video standards.
And it has all the new technologies, and it works very well at various bit rates, all the way down to 3G and all the way up to HD. Because of that, it has recently been chosen as one of the video codecs for HD DVD and also for the 3GPP standards. And because QuickTime always stands behind standards, it will for sure become a new video codec in QuickTime.
And with that I'm just going to show some demos. Can you go to the demo machine, please? Now, as I said, it has been recently chosen for HD DVD, and let's see how it looks at HD resolution. I have a clip here. It's encoded at HD resolution, 1280 by about 550, and this is actually only at 6 megabits. We couldn't really do that with MPEG-2 at this kind of quality before. So let's see, I'm going to play the whole clip.
[Transcript missing]
So that's how it looks at 6 megabits. And if you actually watch the encoded video, you'll notice at the beginning the sandstorm is actually really hard to code, and the codec does it really well. Another demo that I'm going to show is to give you an idea of how it compares with existing standards; the one that I chose is MPEG-4. So I'm going to open this file first.
Now when we try to compare different standards, we usually use three guidelines. One is to fix the bit rate and see how the quality looks. Another is the bit rate itself, where one stream could be a lot bigger, maybe twice as big. The other one is how much information you actually pack into the stream, which goes with frame rate and with resolution, or frame size. So here I have two clips, both of them encoded at a megabit.
The H.264 one is about four times as big as the MPEG-4 one that we shipped earlier. So I'm going to play this, just a short section, about 30 seconds. So the quality is about the same, except that the rest of the...
[Transcript missing]
We can play the whole clip later if anyone wants to continue. So can we get back to the slides, please? So what makes this codec a lot better than, say, the MPEG-4 and MPEG-2 that we already have? Here's a big table that I'm sure some of you may have seen in another session. What I really want to point out here is three things.
The first one is that there's really not one single technology that gives you all the gains. It's a combination of advances in different fields, different technologies. Most of the technologies that we use in H.264 are based on technologies that we already had, say, 10 years ago; there's been a lot of improvement, and we know how to use those technologies a lot better.
The second thing I want to bring out is, as we mentioned earlier, that with MPEG-2, for example, it took a couple of years to get the best out of it. H.264 is a very new standard; it was just standardized last year. So you should expect the quality of H.264 streams to get better and better as we learn how to use the tools even more efficiently.
And the last thing is about the technologies here; I'm not going to go over all of them because it's boring. Most of them are self-contained within the codec: your codec just has to implement them, for example a different transform or a different way of packing the bits into the stream. But there's one particular technology, the IPB frames, which is the third column down.
With H.264, you have a lot more options. It's very flexible; you can do almost anything you want. The simplest form is about the same as the previous MPEG IPB, which we're going to explain more. But if you really want to take advantage of it, then the higher layers will require some changes, and thus QuickTime has to make some structural changes as well. And with that, I'm going to bring up Anne, who is going to talk about what changes in QuickTime are needed to support H.264.
So how are we going to deliver H.264 in QuickTime? We've added four new H.264-specific components, a compressor, a decompressor, a packetizer, and a reassembler, and made a whole lot of changes inside the infrastructure to support these components. With Tiger, applications will be able to play back H.264 content, and they can also play back H.264 streams. And if they use the high-level movie toolbox APIs, they'll be able to do this without any changes in their apps. In addition to QuickTime movie files, we'll be able to store these H.264 streams in MP4 files and 3GPP files.
And the streaming realm will fully support H.264, in that you can play back H.264 streams, you can take H.264 content, hint it, put it on QuickTime Streaming Server and stream it to clients, and you can broadcast using QuickTime Broadcaster. These H.264 streams are in the standard format as defined by the IETF. On the authoring front, applications will be able to edit H.264 movies, and if they call the high-level movie toolbox APIs, they'll be able to cut, copy, and paste with no changes. They'll be able to produce H.264 content and store it in QuickTime files, and also MP4 files and 3GPP files.
So if you want to compress H.264 content, you can do it using the movie exporter components, and we've modified them so that they can generate H.264 B-frame content. Now, if you call Standard Compression and the ICM APIs yourself instead of the movie exporter APIs, then H.264 will show up as a new item in the codec list.
However, B-frames won't be enabled by default, and in order to get B-frame content, you'll have to opt in to B-frames by calling new APIs that we'll describe in this session. If you call the sequence grabber APIs and you want H.264 B-frame content, you'll have to call Standard Compression and compress the frames yourself.
So what's in the seed? With the seed, you'll be able to play back H.264 streams and edit them. We've been working really hard on H.264, but we haven't fully integrated it into all our exporters yet. We really wanted to get something into your hands, though, so we've included a preview H.264 exporter. It appears as a new menu item in the exporter list, and it does support multi-pass encoding.
One thing to note is that it produces an interim format right now. The format is guaranteed to change before GM, so with the seed, don't produce any content that you want to stick around for a long time and still be able to play back. The APIs we talk about today are in the seed, so please try them out. A couple more things: there's a lot going on in H.264, so it requires a G4 or G5. And the seed doesn't contain a compressor, packetizer, or reassembler yet.
So, say you want to take advantage of H.264 in your application. What do you have to do and what do you have to change? Well, what you have to change depends on what level of APIs you're calling. If you're calling into the high-level QuickTime APIs, then chances are you don't have to do anything in order to gain support for H.264 in your app.
However, if you call some of the lower-level APIs, such as the media level, you might or might not have to change your application depending on what specific APIs you're calling. And if you access the sample-level APIs yourself, if you access the samples yourself, you'll have to change your app.
So as I said, if you call the high-level APIs, you'll be able to gain access to H.264 with no changes to your app. Some examples of the high-level APIs are the various views that QuickTime provides, such as the new QTMovieView, part of QTKit, the new HIMovieView, and the older Carbon Movie Control.
If you use those views, you're all set. You don't have to do anything in order to use H.264. If you call the movie and track level APIs, you'll still be able to play back, step through the movies, edit them, navigate through the movies without any changes in your app. So let's have a look at that.
Okay. Here I have an H.264 movie that I've compressed, and if I drop it on the currently shipping version of Adobe GoLive, it opens as expected and it plays the video as expected. Turn to page 394. Okay, and just for fun, I can bring up this timeline editor in GoLive, and if I click around in there, I can click around in the movie and step around in the movie, and that works.
I can also take this movie into QuickTime Player, select a portion of the movie, go over here, copy that small portion of the movie. And this is the currently shipping version of Word. I can create a new document and if I go and paste, then that small section of the movie is pasted into the document and Word will play it back. There's something moving out there. It was a Dementor.
So use the high-level APIs if at all possible because when you do that, with each new release of QuickTime, you'll gain a lot of new functionality and usually you won't have to make any changes to your app and they'll just magically work. Can we go back to slides? Okay, so if you can't just call the high-level APIs and you have to call some of the lower-level APIs, well, in order to use this B-Frame content, you might have to change your application. So if you're calling the media-level APIs and you call APIs that don't reference time, durations, or sample flags, then those APIs haven't changed. You don't have to do anything.
However, if you do make calls at that level that reference time duration or sample flags, it won't work with the new B-frame content and you'll have to change your application. You obviously can still use those APIs. It will still work with content that doesn't contain B-frames, but once you start trying to use it with B-frames, those APIs will return errors.
And we've added some new errors so you know that that's the cause of the problem. Instead of using those older APIs, we've added some new APIs for you to use. If you use sample references, then we've added a whole new set of QTSampleTable APIs, which I'll describe later. And for everything else, we've added similar-looking APIs, which I'll also describe later.
Okay, and one last thing before Sam comes up. I want to stress that these new APIs work for content that contains B-frames, but they also work for all the other content too. So please switch to them whenever you can. And here's Sam to talk about B-frames. - Thanks, Anne. - Thank you.
Hi, I'm Sam. Let's talk for a moment about video compression technology. Lossy video codecs provide you with a trade-off between quality and bit rate. If you want more quality, you need to use more bits. If you can't use so many bits, you might have to accept a lower quality.
And we're constantly trying to improve this quality curve and move it towards higher quality at a lower bit rate. We do this by adding more tricks. As Thomas said, many of these tricks are self-contained within the codec, but some of them require awareness outside the codec, in other parts of the system, in other modules. And that's what we're going to talk about.
So, suppose you had some video that you wanted to compress. Here's a clip of some guy parking his car. It's prosaic, but this is educational. So we could encode each of these frames independently. If we did this, this is called spatial compression because we're only compressing in the spatial domain.
If every frame is self-contained, we call them key frames, or sync samples, or I-frames; I stands for intra. Random access is fast, which is good, but the data rate isn't so good, because if we're compressing everything independently, we're not taking advantage of the similarities between frames.
I've got frames 4 and 5 of the previous 6 on the screen here, and you can see that the tree and the building are practically the same. And the car has moved a little, but it's mostly the same. So we can improve compression performance substantially by using one frame as the basis for describing another frame. And the jargon for this in codec terminology is temporal prediction.
The way it works is you start off by saying: these are the areas of the new frame that are similar to areas of the old frame. In the example that I've got here, we're describing frame 5 in terms of frame 4. First, in the yellow parts of the screen, we're saying these pixels are more or less the same as the pixels in the same location in frame 4. And then in the green part, we're saying these pixels are like frame 4's if you just move over so many pixels to the right.
But these are only first approximations. There's still a fix-up that has to be added because the wheel is turning and the reflection doesn't move with the car. It sort of seems to stay in place. And so you can see that there's an additional image that must be added as well.
The first part is called motion compensation, and the fix-up is called the residue. And you'll notice that there's a strip of that car that is in frame 5 that wasn't there in frame 4, and that part might need to be encoded from scratch.
So this is what we get if we encode the last five frames out of those six as a motion compensation piece and then a residue. We call these difference frames, or P-frames; P stands for predicted. We get better compression because motion compensation can be described extremely compactly relative to describing something from scratch, and as a result, the bit rate that we get is a whole lot better.
There's something else that's worth paying attention to here, which is that each of these encoded frames can only be interpreted with reference to the previous one, which means each frame in a way depends on the previous one. If you want to decode and display the last frame in this sequence and you haven't decoded the previous frames, well, you'd better go and do that right away. So random access into a sequence like this can be somewhat expensive.
So when we have I-frames and P-frames, or keyframes and difference frames, this is what it's like. We call it IP, for I-frames and P-frames. It gives you much better compression than I-frames only, but random access can be somewhat slower. For example, if the keyframe rate is 20 frames, you might have to decode 20 frames before you can display the one that you want to see.
Another thing to pay attention to is that gradually appearing images are constructed incrementally, like the car in this clip. The image of the car that you see in frame six was constructed out of strips in five different frames. This might not be the most efficient way of doing things.
So let's introduce an alternative. What if we encode the first frame in that sequence as an I-frame, self-contained, and then go all the way to the end and encode frame 6 as a P-frame based on that I-frame? If we do that first, then we can encode all the frames in between using motion compensation, part from the previous frame, that's the yellow piece, and part from the later frame, which is the blue piece. And you can see that these frames are almost entirely motion compensation.
Very little residue to encode. Here's what it looks like if we encode our six frames with the four frames in the middle encoded as B-frames, which stands for bidirectional prediction, based on the frames at the ends. Again, these four frames in the middle are almost entirely motion compensation.
And another thing to notice about them is that random access can be a bit faster. For any of the frames in the middle, starting from scratch, if you needed to display one of those, you'd only need to decode three frames: the one at the beginning, the one at the end, and the one in the middle.
So these are B-frames. They refer to information in a future frame as well as perhaps information from a previous frame. And the good news about B-frames is that they let us enhance the compression quality, lower the bitrate even further. There's two benefits. You get better compression, especially when objects appear gradually, for the reasons that we've described.
And also, random access is faster. As I illustrated, accessing any of those frames, the worst case for random access is having to decode three frames. Another example to think about is if you're playing in fast forward. You could skip the frames you didn't need to display if they were B frames. You wouldn't have to decode them at all. The jargon for this is temporal scalability.
But there's something tricky about B-frames. The decoder that's displaying these can only use motion compensation from frames it's already decoded. If one of those frames is going to be displayed later, then that means the order in which frames are decoded and the order in which frames are displayed is different. So the frames have to be reordered somewhere, and this reordering is why your application might need to understand B-frames.
So some of you have been working with IPB codecs for some time and this is no news to you. But I wanted to speak to you guys for a moment because there's an important point that I want to drive home. With some other IPB codecs, you can implement playback using a small finite state machine, which is driven with different transitions for iframes, pframes, and bframes.
And this works for MPEG-2 because only one frame can be held at a time. There's only one future frame that would ever need to have been decoded but not displayed. And this is not true for H.264. The standard for H.264 allows up to 16 future frames to be held.
In fact, H.264 allows the encoder an enormous amount of new flexibility in how it chooses to find material for motion compensation. P- and B-frames can depend on up to 16 frames. Not all I-frames reset the decoder completely; we have a new tag for those. The name in H.264 is IDR frames, which stands for Instantaneous Decoder Reset, if you care.
Some B-frames can be used to provide material for motion compensation, so not all B-frames can be skipped. And some I-frames and P-frames can be skipped because they don't count for motion compensation. So, as on the left, you can see that the pattern for MPEG-2 is fairly regular, and in fact you can entirely derive the dependency graph of the frames just by knowing the frame letters, and that's how the finite state machine works. Everything can be worked out from the frame letters.
But with H.264, the encoder is free to do things in a much wilder way, and just knowing those frame letters doesn't let you derive the graph. In fact, as you can see, I don't know that you'd really want to try and store that graph unless you were the decoder itself. So, the new rules: if you want to work with H.264, it's no longer sufficient to use the frame type letters to derive frame dependency information and the dependency graph. Instead, you should pay attention to four things.
First: is a frame a synchronization sample? Not all I-frames are sync samples. This is because decoding an I-frame may not prime the decoder with all of the motion compensation material that it'll need for the P- and B-frames that follow it. So instead, you only want to pay attention to whether a frame is a sync sample, which in the new world is equivalent to an IDR frame.
Number two: is a frame droppable? Some B-frames are not droppable, and some I- and P-frames are. And that's the information that, if you're outside of the codec, you really want to know: whether you need to decode that frame in order to get random access.
Number three: what order must the frames be decoded in? And sometimes it's also sensible to include information about what time the frames should be decoded at. Number four: what time should each frame be displayed at? And this is how we know how the frames are reordered.
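Put another way, here's the per-frame information an IPB-aware client ends up tracking, written as a conceptual C structure. This is not an actual QuickTime type, just the four items above written down; QuickTime exposes them through sample flags and the new decode/display time calls covered later in the session.

```c
#include <CoreServices/CoreServices.h>   // Boolean, SInt64

// Conceptual only: the four pieces of per-frame information listed above.
typedef struct {
    Boolean isSyncSample;   // 1. Is this a sync sample (an IDR frame in H.264)?
    Boolean isDroppable;    // 2. Can it be skipped, because nothing else depends on it?
    SInt64  decodeTime;     // 3. When, and in what order, must it be decoded?
    SInt64  displayTime;    // 4. When should it be displayed?
} FrameSchedulingInfo;
```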
To summarize. One: dependencies between frames are getting weirder, but it's all in the cause of improving the quality versus bit rate trade-off. Two: IPB means someone needs to know about frame reordering, and if you work with the compressed media, it could be you. And three: some of the convenient rules, things like the one-frame delay and the ability to build that little finite state machine, although they're okay for MPEG-2, don't hold for H.264. Back to Anne.
So what changes did we have to make in the movie toolbox in order to support B-frames? Well, first we had to change the file format. For those of you who care and parse the files yourself, we've added four new tables to QuickTime files when there are B-frames. One other thing to note is that samples are stored in the files in decode order. They've actually always been stored in the files in decode order, but decode order and display order were always the same before, so you couldn't tell the difference.
We've added a bunch of new APIs to distinguish between decode time and display time because, as Sam explained, with B-frame content they're not necessarily the same anymore. Some of those APIs take something called a display offset. A display offset is simply the difference between the decode time and the display time, and note that sometimes a display offset is a negative number.
Okay, so for example, where you used to call SampleNumToMediaTime, if you're processing B-frame content, that call is going to return an error. So instead of calling that, you should call either SampleNumToMediaDisplayTime or SampleNumToMediaDecodeTime, and which one you call depends on which time it is that you want. We've also added a whole bunch of new sample flags and increased their size from 16 bits to 32 bits.
Most of them are optional, but the main one that you need to know about is mediaSampleDroppable, which usually, but not always, indicates a B-frame. If you need to know whether a movie or track contains B-frame content, don't hard-code a check for whether the track is encoded with H.264, because not all H.264 movies necessarily contain B-frames, and we might add new codecs in the future that use B-frames. Instead, call MediaContainsDisplayOffsets.
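As a rough sketch of what that looks like in code, assuming the calls named here keep these shapes in the final headers (check the seed; error handling is omitted):

```c
#include <QuickTime/QuickTime.h>
#include <stdio.h>

// A minimal sketch: check whether a media has reordered (B-frame) content,
// then fetch both the decode time and the display time for one sample.
static void InspectSample(Media media, SInt64 sampleNum)
{
    TimeValue64 decodeTime = 0, displayTime = 0, duration = 0;

    if (MediaContainsDisplayOffsets(media)) {
        // B-frame content: the two times can differ, and
        // displayTime == decodeTime + displayOffset (the offset may be negative).
        SampleNumToMediaDecodeTime(media, sampleNum, &decodeTime, &duration);
        SampleNumToMediaDisplayTime(media, sampleNum, &displayTime, &duration);
    } else {
        // No display offsets: the old single notion of media time still applies,
        // but the new calls work here too and both times come back the same.
        SampleNumToMediaDecodeTime(media, sampleNum, &decodeTime, &duration);
        displayTime = decodeTime;
    }
    printf("sample %lld: decode time %lld, display time %lld\n",
           (long long)sampleNum, (long long)decodeTime, (long long)displayTime);
}
```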
Okay, so if you're using sample references, we've added a whole new set of APIs, the QTSampleTable APIs, for you to work with these. QT sample tables represent media sample references in a movie. They're reference-counted, so use retain and release, similar to other Apple APIs. And you can use these APIs for all media types, so audio, video, text, et cetera, not just video and B-frames.
To get sample references out of a movie, call CopyMediaMutableSampleTable to get the sample table. From that, you can get the number of samples, and then you can index through the samples and get information about each one, such as the data offset, the sample flags, and a whole bunch of other things. These samples are in decode order, the same as they're stored in the files.
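Here's a sketch of what walking a sample table might look like. The accessor names follow this session's description, but the argument lists, especially for CopyMediaMutableSampleTable, are written from our reading of the new headers, so treat them as assumptions and check Movies.h in the seed.

```c
#include <QuickTime/QuickTime.h>
#include <stdio.h>

// A sketch: copy a media's sample references into a sample table and walk it.
static OSStatus DumpSampleTable(Media media, TimeValue64 startDecodeTime,
                                TimeValue64 decodeDuration)
{
    QTMutableSampleTableRef sampleTable = NULL;
    TimeValue64 firstSampleDecodeTime = 0, actualDuration = 0;
    OSStatus err = CopyMediaMutableSampleTable(media, startDecodeTime,
                                               &firstSampleDecodeTime,
                                               decodeDuration, &actualDuration,
                                               &sampleTable);
    if (err) return err;

    // Samples come back in decode order, the same order they sit in the file.
    SInt64 count = QTSampleTableGetNumberOfSamples(sampleTable);
    for (SInt64 i = 1; i <= count; i++) {
        SInt64 dataOffset = QTSampleTableGetDataOffset(sampleTable, i);
        UInt32 flags      = QTSampleTableGetSampleFlags(sampleTable, i);
        printf("sample %lld: data offset %lld, droppable %d\n",
               (long long)i, (long long)dataOffset,
               (flags & mediaSampleDroppable) != 0);
    }
    QTSampleTableRelease(sampleTable);   // sample tables are reference counted
    return noErr;
}
```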
In order to add sample references to a movie, call QTSampleTableCreateMutable to create an empty sample table, and then add your sample references to the sample table, similar to the old AddMediaSampleReference call. When you're done, call AddSampleTableToMedia to actually add the sample references to the movie. We've also included a whole bunch of more advanced sample table APIs, so if these aren't quite what you need, have a look in the headers and the documentation to see if what we've provided helps. And here's Sam to talk about changes in the ICM.
I'm still Sam. So Anne's explained that if you call high-level APIs, you might not need to worry about B-frames because they might not make a difference for you. But if you write the kind of application that deals with compressed frame data yourself, or if you write a codec, then there's no hiding this information from you and you wouldn't even want to anyway.
So let's talk about how the APIs at this layer have been changed to support B-frame codecs. There are three things that are missing from the current APIs in order to support B-frames: frame reordering, new frame information like the droppable flag, and multiple buffers in flight at once.
The Image Compression Manager provides APIs both for compression and decompression. It provides high-level client APIs and also defines the interface to decompressor and compressor components underneath. Let's go through each of these in turn. In Tiger, we're introducing a new multi-buffer API for compression. We're also extending the existing GWorld-based decompression API to support B-Frames.
And we're introducing a new multi-buffer decompression session API. These new multi-buffer APIs are based on core video pixel buffers. Underneath, we're introducing a new multi-buffer API for compressor components, and we have extended the decompressor component API to support B-frames. So if you write code that works at any of these levels, then you'll want to look at this stuff.
[Transcript missing]
The existing GWorld-based API is one frame in, one frame out. What this means is that the compressor has to give you back the compressed data for frame one before it'll get the image data for frame two. This makes it really difficult to reorder frames. Also, the current compression API is almost completely unaware of time.
The new compression session API is based on Core Video pixel buffers instead of GWorlds. If you're using a new-style compressor component, a B-frame-aware compressor, then multiple buffers may be in flight at once. This allows the compressor to reorder the frames in order to encode B-frames. It also allows the compressor to have a look-ahead window for better rate control. Timestamps can flow all the way through the compression chain. And the new API supports multi-pass encoding.
So whereas with the previous GWorld-based compression API, you'd draw each frame into the same buffer and each time pass that buffer off to the ICM, with the new API, you take a fresh Core Video pixel buffer each time, put your source frame in it, and then pass that over to the compression session. The session will retain those buffers until it's done with them and then release them, so you can release each buffer as soon as you've passed it to the compression session.
So this uses standard retain and release semantics, and you could just allocate these each time if you wanted. But mapping and unmapping the large pieces of virtual memory that you use for pixel buffers can involve some memory management overhead, and that can be somewhat expensive. So we have a pixel buffer pool as part of Core Video that does efficient recycling.
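For example, a source buffer pool might be set up like this. The pool and attribute keys are standard Core Video; the 640 by 480 ARGB format is just an example, and the header arrangement on the seed may differ.

```c
#include <QuickTime/QuickTime.h>   // pulls in the Core Video pixel buffer types on Tiger

// Create a pool that hands out 640x480 32-bit ARGB buffers (example values only).
static CVPixelBufferPoolRef CreateSourcePool(void)
{
    int width = 640, height = 480;
    SInt32 pixelFormat = kCVPixelFormatType_32ARGB;
    CFNumberRef w = CFNumberCreate(NULL, kCFNumberIntType, &width);
    CFNumberRef h = CFNumberCreate(NULL, kCFNumberIntType, &height);
    CFNumberRef f = CFNumberCreate(NULL, kCFNumberSInt32Type, &pixelFormat);

    const void *keys[]   = { kCVPixelBufferWidthKey, kCVPixelBufferHeightKey,
                             kCVPixelBufferPixelFormatTypeKey };
    const void *values[] = { w, h, f };
    CFDictionaryRef attrs = CFDictionaryCreate(kCFAllocatorDefault, keys, values, 3,
                                               &kCFTypeDictionaryKeyCallBacks,
                                               &kCFTypeDictionaryValueCallBacks);
    CFRelease(w); CFRelease(h); CFRelease(f);

    CVPixelBufferPoolRef pool = NULL;
    CVPixelBufferPoolCreate(kCFAllocatorDefault, NULL /* pool attributes */, attrs, &pool);
    CFRelease(attrs);
    return pool;
}

// Grab the next source buffer. Buffers released by the compression session go
// back into the pool, so the expensive map/unmap work isn't repeated per frame.
static CVPixelBufferRef CopyNextSourceBuffer(CVPixelBufferPoolRef pool)
{
    CVPixelBufferRef buffer = NULL;
    CVReturn result = CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pool, &buffer);
    return (result == kCVReturnSuccess) ? buffer : NULL;
}
```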
So this is how reordering happens. You push source Core Video pixel buffers in display order, and the session will call a callback function that you provide with the frames that have been encoded, in decode order. The session will also call you when it's releasing those pixel buffers, so you can perform your own frame buffer recycling if you want.
In some cases, you might not want the compressor to hang onto too many frames at a time. Perhaps you're a networked application, like a video conferencing application, and there's a maximum latency before you have to send those frames out over the network. Well, in those cases, you can set a maximum number of frames that the compressor is allowed to hang onto at once, and you can also make an explicit request that forces the compressor to finish encoding the frames that it's currently hanging onto.
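Here's a minimal sketch of driving the new compression session, based on the behavior described in this session. The option, session, and frame calls are written from our reading of the new ICM headers, so double-check the names and argument lists against the seed; the time stamps here are arbitrary example values.

```c
#include <QuickTime/QuickTime.h>

// Called by the session, in decode order, as frames finish encoding.
static OSStatus MyFrameOutput(void *refCon, ICMCompressionSessionRef session,
                              OSStatus error, ICMEncodedFrameRef frame, void *reserved)
{
    if (error) return error;
    // This is where you would store the encoded frame, for example by adding
    // it to a media. (Omitted here to keep the sketch short.)
    return noErr;
}

static OSStatus CompressFrames(CVPixelBufferRef *buffers, int count,
                               int width, int height, TimeScale timescale)
{
    ICMCompressionSessionOptionsRef options = NULL;
    ICMCompressionSessionRef session = NULL;
    ICMEncodedFrameOutputRecord output = { MyFrameOutput, NULL, NULL };
    OSStatus err = ICMCompressionSessionOptionsCreate(kCFAllocatorDefault, &options);
    if (err) return err;

    // B-frames are opt-in: allow temporal compression and frame reordering.
    ICMCompressionSessionOptionsSetAllowTemporalCompression(options, true);
    ICMCompressionSessionOptionsSetAllowFrameReordering(options, true);
    ICMCompressionSessionOptionsSetDurationsNeeded(options, true);

    err = ICMCompressionSessionCreate(kCFAllocatorDefault, width, height,
                                      'avc1' /* H.264 */, timescale, options,
                                      NULL /* source pixel buffer attributes */,
                                      &output, &session);
    ICMCompressionSessionOptionsRelease(options);
    if (err) return err;

    // Push source pixel buffers in display order; the session retains each one
    // until it's finished with it, so we don't have to hold on to them.
    for (int i = 0; i < count && !err; i++) {
        err = ICMCompressionSessionEncodeFrame(session, buffers[i],
                  i * 100, 100,   /* display time stamp and duration, in timescale units */
                  kICMValidTime_DisplayTimeStampIsValid |
                  kICMValidTime_DisplayDurationIsValid,
                  NULL, NULL, NULL);
    }
    // Flush the reordering / look-ahead window so every frame reaches MyFrameOutput.
    ICMCompressionSessionCompleteFrames(session, true, 0, 0);
    ICMCompressionSessionRelease(session);
    return err;
}
```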
The new compression session API has a bunch of other features that make it a big jump forward. It's easier than before to add encoded frames to a movie. You can use a fixed or flexible GOP pattern, if you know what a GOP pattern is; it's not politics, don't worry. You can set a CPU time budget.
You can set data rate limits. And as I said before, it supports multi-pass encoding. In fact, the movie exporter that's in the Tiger seed supports multi-pass encoding as well. In the final version of Tiger, the compression session API will be compatible with existing compressors, but no B-frames will be generated.
However, in the Tiger seed that you've got, it's not yet compatible with existing compressors, and also we don't have an H.264 compressor. So it would be a good idea to get on our seed program if you want to exercise this API. What's next? Let's talk about what's underneath the compression session API, which is the new compressor component interface. New-style compressor components still use the four-character code 'imco', but they support three new component calls for B-frames. And if you want to opt in for multi-pass support, there are three more APIs to implement as well. Let's also talk for a moment about decompression.
There are two flavors of decompression API in the GWorld-based model. We have synchronous APIs, and these are all one frame in, one frame out. We also have a second mode for decompression, which is called scheduled decompression. With scheduled decompression, you can queue multiple frames, each with a frame time, and when that time arrives, the frame is triggered and we decode it and display it.
With B-Frames, as we've gone through, the decode order and the display order can be different. In fact, you may need to decode several frames before you come to the first frame to display. So immediate one-frame-in, one-frame-out APIs aren't a good match. Let's look at the example of the little clip of that car parking.
In the decode case, the first frame happens to be the first frame both to display and to decode. So we decode it and then we display it. But the second frame in decode order doesn't need to be displayed until time 60. But it does need to be decoded before the next frame in decode order, which is the frame at time 20.
And it needs to be decoded before the frame after that, which is at time 30, and the one at time 40, and the one at time 50. After that, it's okay for that frame to be displayed at time 60. The new model for doing decompression is that you always queue frames in decode order.
And then you provide the display timestamps so that we know how the frames should be reordered. As before, frames can be scheduled against a time base, in which case the frames will automatically be output according to that time base when that trigger time happens. But we have a new mode called non-scheduled display times.
In that case, there's no time base, and you have to make an explicit call to the ICM to say, "I would like this frame back." You can also optionally supply decode timestamps, which are a hint saying when it would be a good time to decode that frame.
[Transcript missing]
Like I've been saying, immediate mode, one frame in, one frame out, is very awkward for B frames, at least if you want to get the frames out in display order, which is the order that makes sense to the user. So we need to enhance this a bit.
So here we have an outer loop and an inner loop. The outer loop queues frames in decode order, and the inner loop retrieves frames in display order. So frames go in in decode order and you pull them out in display order. And there may not be a one-to-one correspondence, so that's why we have the outer loop and the inner loop.
The inner loop isn't going to be run many times; I'll show you in a second. One other thing: because you're queuing multiple frames, you need to load them into multiple buffers. These aren't Core Video pixel buffers; these are just data buffers. And the ICM will call you back to say when it's time to release them, because the codec no longer needs them.
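Sketched in C, that queue-and-retrieve loop looks something like the following. The four helpers are hypothetical stand-ins for loading a sample's data, handing it to the ICM in decode order, and asking for a frame back by display time; the loop structure is the point.

```c
typedef long long SampleNum;
typedef long long DisplayTime;

// Hypothetical helpers standing in for the real media and ICM calls.
extern void        QueueSampleForDecode(SampleNum sampleNum);          // load data, queue with the ICM
extern DisplayTime DisplayTimeOfSample(SampleNum sampleNum);           // from the sample table
extern void        DisplayFrameAtTime(DisplayTime displayTime);        // ask the ICM for this frame back
extern SampleNum   NextSampleNumberToDisplay(SampleNum justDisplayed); // next in display order;
                                                                       // returns a value past
                                                                       // sampleCount after the last frame

static void StepThroughMovie(SampleNum firstSample, SampleNum sampleCount)
{
    SampleNum nextToDecode  = firstSample;   // decode order is just ascending sample numbers
    SampleNum nextToDisplay = firstSample;   // display order jumps around (1, 3, 2, 5, 4, ...)

    while (nextToDisplay <= sampleCount) {
        // Outer loop: queue the next frame in decode order.
        if (nextToDecode <= sampleCount)
            QueueSampleForDecode(nextToDecode++);

        // Inner loop: pull out every frame whose turn it is to be displayed and
        // that has already been queued. There isn't a one-to-one correspondence:
        // with frames exchanged in pairs, we queue two frames, then retrieve two.
        while (nextToDisplay < nextToDecode && nextToDisplay <= sampleCount) {
            DisplayFrameAtTime(DisplayTimeOfSample(nextToDisplay));
            nextToDisplay = NextSampleNumberToDisplay(nextToDisplay);
        }
    }
}
```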
You can do this both with the existing GWorld API, and with a new multi-buffer, Core Video pixel buffer-based API that we've introduced, called decompression sessions. Now, the decompression session API does not support any drawing operations. It doesn't do clipping, it doesn't do matrix transformations, it doesn't do transfer modes or any of that other guff. Instead, it just gives you the buffers in the format you want.
There's a flavor of this API that supports sending the buffers directly to a visual context. And there's one that just gives you the buffers back. So, let's do a demo. Wake up!
[Transcript missing]
So I have a little clip here. I like the Harry Potter trailer, and I clipped out a little bit of it in the middle. It's not very long.
It plays in display order. See, everything's moving forward; the bus is moving down the road. QuickTime Player has not been revised to understand B-frames, so like Anne said, if you're using the high-level APIs, you don't need to change. What's more, I'm going to step through this by pressing the arrow key, and you notice that the frame is moving forward.
So what's happening here is we are telling the movie to move forward to new movie times and rendering the frame for each of those times. If you do that in your application, if you step through with something like SetMovieTime and MoviesTask, if appropriate, then you don't need to worry. You'll get the frames out in display order. Everyone will be happy.
That's a great sound. But this movie does have B-frames; the frames are being reordered. I have a copy of Dumpster here. It's great when we get to demo Dumpster. Many of you will know what Dumpster is; some of you won't. Dumpster is a tool that shows you the internal structure of movie files, actually of the movie header. Not the place where the compressed media data is, but the movie header itself.
And we've modified Dumpster, this version of Dumpster, to show you the new B-frame tables that Anne was mentioning. You probably can't see the details, so I'll just pop them open so you can... take my word that they're there. There's a piece of information here that tells you the size of individual frames.
These are numbers varying between 40 and 80 kilobytes. It also stores the timing information: each of these frames has a duration of 1,000, and this is from film, so the speed is more or less 24 frames per second, and so the timescale here is something around 24,000.
And each frame has the same duration, 1,000. We also now store, and this is a new table for Tiger, the display offsets. You probably can't see it, but the first one is zero, the next one is a thousand, then minus a thousand, then a thousand, then minus a thousand.
What does this mean? Well, the durations are now interpreted as decode durations, so the frames are at decode times zero, 1000, 2000, 3000, and so on. We add the display offsets to those to find the display times, and we get zero, 2000, 1000, 4000, 3000. The second and third frames are reordered, they're exchanged as a pair, and the same for the next pair. So here's the difference between the decode order and the display order.
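Putting the numbers Sam reads off together (every frame has a decode duration of 1,000 in a timescale of roughly 24,000):

```c
//  decode time:      0     1000    2000    3000    4000
//  display offset:   0    +1000   -1000   +1000   -1000
//  display time:     0     2000    1000    4000    3000
//
//  Frames two and three trade places, as do four and five: decode order and
//  display order differ, and that difference is exactly what the new table stores.
```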
[Transcript missing]
This is a little command line tool which was included in the disk image for this session. It's a command line tool that steps through a movie and does that new kind of loop that I was describing. It calls the new decompression session API to decode each frame into a core video pixel buffer.
And it takes command line arguments. I've made it scale the frame down, because it's so big there's not a lot of space for the debugger. It also takes a command line path to the movie file. So let's try it out. I'm pausing in between the frames so that you can see them, so they don't race past. Wasn't there something odd there? Did you see that? I'll play it again.
And there's Harry Potter. Where did he come from? He wasn't there when we opened this movie in QuickTime Player. What's going on? Well, this little tool is going down to the media layer of the movie and accessing the sample table directly. But when you do that, you're circumventing the higher layers, and at the track layer there's a thing called the edit list. When I cropped that little piece out, copied and pasted it out of the longer movie, I had edited out the frame of Harry Potter.
But it's still inside that sequence of frames at the media layer, so we're going around the edit list. If we wanted to step through the movie in the same way that it appears to someone who's using it, we'd have to add a bit more code to walk through the edit list. Or an easy way would be to do what I did with the arrow keys, stepping through the movie. So briefly, let's have a look at this in the debugger.
I'll talk a little bit about this code and how it works. We're doing some things that aren't very special to B-frame movies, and I'll skip over those. We're opening the movie, we're getting the video track, we're getting the image description, and then we're making a window that's the scaled size of that image description. There's the window in the background there. Nothing big yet. Then we're making a decompression session. I want to show you this.
Local variables, love them. So we're creating a pixel buffer attributes dictionary. This is how we're describing the kinds of pixel buffers we'd like to get back from the decompression session. And we give it the width and height that we'd like it to give us buffers in, because we would like them to be a specific width and a specific height.
We also pass a callback when we're creating the decompression session. This callback is called with the decoded pixel buffers. There's no GWorld here; the callback is called with a fresh buffer each time, and when you release them, they go back into the pool that the ICM is using to recycle pixel buffers. That tracking callback is also called when the data buffers can be recycled.
Okay, so now we've got a decompression session, but nothing's been decoded into it. Now we've got to have a look at the media and pull out those frames. We get the number of samples in the media and we allocate some storage for each of them. Now here's the first interesting question. What's the first frame that we want to display? Well, we're starting at time zero in display time, but that might not be the same as the first frame, so we need to go and do a mapping, and this is calling a new API in Tiger.
Previously we'd call MediaTimeToSampleNum to get this information, but now we need to specify which kind of time we want to talk about, so it's MediaDisplayTimeToSampleNum. And I think I have debug tools... Expressions, aha, my little expressions window. Some of these variables are uninitialized. You can't see this, but I'm telling you that the number in red is one.
So that's the next sample number we want to display. That's also the first sample that we're going to decode, just as we saw it in Dumpster. Okay. So we're approaching the outer loop here. The outer loop queues frames in decode order, which is just ascending sample numbers, so we use plus-plus to get to the next one.
So we translate that sample number to a decode time so that we can get the size of that sample. We allocate some storage. We load it into that storage. Along the way, we have found out what the decode time and display offset are. And we add those to find out the display time for that frame.
And now we have enough information to queue that frame with the decompression session. And we hand it to the ICM, and nothing happens yet. Nothing happens because we're using this new mode where you pass display times and they're non-scheduled. There's no time base. They're not going to come out until we ask for them back.
So the next question is, well, have we enqueued the frame that we need to get out? And so we just compare these two numbers. We've queued sample one, and we want to get back sample one. So yep, let's go into the inner loop, where we retrieve samples in display order.
And we do that by saying, here's the non-scheduled display time that I would like to get the frame back for. And there's the frame. Hooray! But that's frame three. So let's go around this loop again. I've got a breakpoint here. So the next thing that we schedule for decode is frame two.
And now we've queued frame two, but we want frame three, so we have to go around again. We'll skip over the inner loop and come back to the outer loop. We'll enqueue frame three, and then we'll ask back for frame three, which is in the queue now. And here we go. The bus will move a bit.
Or the road will move. I guess we're in the bus. And the next time is 2000, and that's frame two. So remember this order? We're going through the frames in display order, thanks to GetMediaNextInterestingDisplayTime and so forth. We're going through these frames in the order one, three, two, five, four, seven, six.
So as we go through, we're going to get back frame two now. So okay, big deal. We carry on in this pattern. Because the frames are exchanged in pairs, we'll queue two frames and then retrieve two frames, and then we'll queue two frames and retrieve two frames, and it's going to carry on like this. I'll clear that breakpoint and just continue. And in fact, here's us going through the rest of the frames. And here's Harry Potter. Great note to end on.
Okay, so one more thing. It's not like a Steve one-more-thing, I'm sorry. We've enhanced the decompression component interface to support B-frames. This is all based on the base decompressor. We introduced the base decompressor six years ago in QuickTime 3, and it's been helping you write video codecs. The base codec helps by implementing the queue that holds the scheduled frames.
And now it'll also help you with frame reordering for B-frames. The new rules for B-frame-aware decompressors, if you want to write one yourself: you opt in in your Initialize function by setting a flag. You classify frames in your BeginBand function, which means that you say whether the frame is a key frame, a difference frame, or a droppable difference frame. It's always been a good idea to do this, but for B-frames it's mandatory.
Third, we've split the work in the DrawBand function. That used to be where both decode and display happened; now we've separated those, so decode happens in the DecodeBand function and display happens in the DrawBand function. And only one of those will get called if we just need to decode a frame in order to prime your decoder.
Final note, and this is not just for B-frame codecs, but in fact for any codec that wants to work in QuickTime Player now: don't cache the PixMap base address in BeginBand. It might change between BeginBand and DrawBand, and then you'd draw in the wrong place. So these are the new APIs that we've provided in the Image Compression Manager for multi-buffer support and for B-frame support.
There's a video track movie exporter in the seed. I encourage you to try it, and I encourage you to try out the new APIs that exercise it, because the new APIs will work with the B-frame stuff and you'll see how that works. Also, exercise your application and examine whether it needs revising in order to cope with B-frame content.
I want to give you one small warning about one bug, because it's likely to bite you and I'm a kind guy. The bug is this. The Video Track Movie Exporter only creates video tracks. It doesn't copy the soundtrack. So it's likely that you're going to want to go and extract the soundtrack and paste it into the new movie with the add command so that they're next to each other. If you do that, save the movie but don't save it self-contained. And save as self-contained is now the default in the new QuickTime player.
Carefully switch it back to save a reference or what used to be called save normally. The reason for this is that when you save a movie self-contained, we do a thing called interleaved flattening. We interleave half-second chunks of audio and video. And the code that does this, it's got a bug that doesn't do the right thing for H.264 and you end up with movies that won't play properly. They're not going to cause any harm, they just won't play right. So avoid saving those movies self-contained. So, more information.
Both on the CD, the DVD, and on the net at connect.apple.com, you can download a bunch of documentation for this. There's a "What's New in QuickTime 6.6" document in the big 60-megabyte Tiger documentation. Download it here while Apple's paying for the bandwidth. Also download the disk image for this session: go to connect.apple.com, log in, click Download Software, and see What's New, and it'll be there under the developer conference stuff. The API reference won't be updated until Tiger goes final.
And one more thing. It's another dull one-more-thing. Just around the back here, in the Graphics and Media hands-on lab, the QuickTime lab, we have special extended hands-on lab time so that you can talk to me and other folks about IPB and about visual contexts. That's starting more or less right after this session, and it'll go until 6:30. They'll be tearing down everything else, but they're going to let us stay in the room so that we can help you. So come along.
Also, seeding. You've got the Tiger seed. It's possible that we'll do further QuickTime seeds. If you would like to be involved in such seeds, please send an email with your name, company, product, and technology interest to QuickTimeSeed@apple.com. I have some reminder cards that I can give you if you come and see me after this, or you can contact your friendly evangelist.