WWDC11 • Session 415

Working with Media in AV Foundation

Graphics, Media, and Games • iOS, OS X • 56:12

The AV Foundation framework provides a rich Objective-C interface for playback and editing of audio and video in your iOS or Mac OS X application. Discover the tremendous control and flexibility provided by AV Foundation and get instruction about working with media in your apps. Understand the mechanisms for reading and writing media and learn how to compose independent clips into new assets.

Speakers: Sam Bushell, Adam Sonnanstine

Unlisted on Apple Developer site

Downloads from Apple

HD Video (287.1 MB)

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Good afternoon. Happy World IPv6 Day. My name is Sam Bushell. I work on writing code and fixing bugs on our media frameworks. And today I'm talking to you about working with Media in AV Foundation. The APIs and tools I'm going to talk about today were introduced in iOS 4 and some of them were added in iOS 4.1 and now they're all brought to Lion in Mac OS X. They're the same APIs and tools that we use to build iMovie for iOS and QuickTime Player 10 for Lion.

And I'm going to do some demos. The sample code for those demos is live on the web. And if you go to the schedule page for this session, there's a link to the sample code there. And I recommend that if you want to follow along with some of the code samples I'm going to go through on the slides, you can look at the file called simpleeditor.m for more details and more context.

So let's look at where AV Foundation fits in. AV Foundation is an Objective-C framework that sits upon the foundations of the C frameworks: Core Audio, Core Media, Core Animation, Core Video, and so forth. And it's below the level of UIKit, which means that we can deliver the same framework and more or less the same APIs on iOS and on Lion. So the things I'm telling you about today are uniform across those.

So I'm going to go through a number of scenarios today and talk about how you can use AV Foundation's APIs and tools to deliver those kinds of features to your users in your applications. I'm going to talk about how to generate a still image from a movie, how to export a movie to a new file, and how to assemble a movie from clips from other source movies.

How to add additional audio. How to add video transitions like crossfades. How to add Core Animation. And my colleague Adam Sonnanstine is going to tell you about how to extract audio and video data from movies and how to build a new movie from your own video and audio data.

But first, let's talk about some fundamentals that AV Foundation is based on. I want to introduce the framework Core Media. Core Media is a C framework that provides some basic data types that AV Foundation uses. The most important one for our session is called CM Time. CM Time is a C struct, like CG Size or CG Rect, that represents rational time. It counts a number of seconds as a rational number with a 64-bit numerator and a 32-bit denominator that we call a time scale.

We have constants for zero, positive infinity, negative infinity and so forth. We have utilities for adding and subtracting and various other arithmetical utilities. We have utilities for comparing CM time values. And if you have a start time and a duration, you can construct a time range. And if you have two time ranges, that's called a mapping.
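As a rough illustration of those Core Media utilities (a sketch, not taken from the session's sample code; the values are arbitrary):

#import <CoreMedia/CoreMedia.h>

// 2.5 seconds expressed as value 75 over timescale 30.
CMTime start = CMTimeMake(75, 30);
CMTime duration = CMTimeMakeWithSeconds(4.0, 600);   // 4 seconds at timescale 600

// Arithmetic and comparison utilities.
CMTime end = CMTimeAdd(start, duration);
if (CMTimeCompare(end, kCMTimeZero) > 0) {
    // end is greater than zero
}

// A start time plus a duration makes a time range;
// two time ranges make a mapping.
CMTimeRange range = CMTimeRangeMake(start, duration);
CMTimeMapping mapping = CMTimeMappingMake(range, CMTimeRangeMake(kCMTimeZero, duration));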

So, enough of core media, let's move out and look at AV Foundation. In yesterday's session, Simon introduced AV Asset. AV Asset is what you use in AV Foundation to represent a movie that's in a file. And inside AV Asset, you can browse and find AV Asset tracks, each representing the video and audio tracks in the movie. To play an AV asset, you construct an AV player item from the AV asset and then you attach it to an AV player.

So let's get into these scenarios. First one is creating a still image from a movie. AV Asset Image Generator is the class used for doing this. You build your AV Asset Image Generator from an asset, And then you can ask it to generate images for you by giving an array of the times you want images at.

Now you need to retain your image generator until you're done getting those images. And you'll notice that I passed in a handler block. The handler block is important. That's how you get the output. And it's important that you check the result because it may not succeed. If it did succeed, then you can go ahead and use the CG image, retaining it if you need it later. If for some reason it fails, you can examine the error object to find out why. And it's also possible for you to cancel outstanding image generation requests, in which case your handler block will still be called to say that the request was cancelled.
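A condensed sketch of that flow (not verbatim from the sample code; asset is a placeholder AV Asset, and the requested time is arbitrary):

#import <AVFoundation/AVFoundation.h>

AVAssetImageGenerator *generator =
    [[AVAssetImageGenerator alloc] initWithAsset:asset];  // retain until requests complete

NSArray *times = [NSArray arrayWithObject:
    [NSValue valueWithCMTime:CMTimeMakeWithSeconds(10.0, 600)]];

[generator generateCGImagesAsynchronouslyForTimes:times
    completionHandler:^(CMTime requestedTime, CGImageRef image, CMTime actualTime,
                        AVAssetImageGeneratorResult result, NSError *error) {
        if (result == AVAssetImageGeneratorSucceeded) {
            // Use the image (CGImageRetain it if you need it after the block returns).
        } else if (result == AVAssetImageGeneratorFailed) {
            NSLog(@"Image generation failed: %@", error);
        } else if (result == AVAssetImageGeneratorCancelled) {
            // The request was cancelled via -cancelAllCGImageGeneration.
        }
    }];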

Exporting and trimming movies. You export an AV asset to a new file using the class AV asset export session. Whenever you make an AV asset export session, you have to specify a preset. We have presets for different sizes and bit rates and so forth. You have to specify the output URL and the file type. In this case, we're generating a QuickTime movie. Optionally, you can set a time range. If you don't set a time range, then we'll export the entire asset.

You can also optionally modify or add metadata. And then you kick off the export asynchronously and provide another handler block that's called when it's all done. Again, the handler block looks a lot like the one for the image generator, and you should examine the status to see if it completed, failed, or was cancelled. So I'm not going to give a demo of either of those cases. They're fairly basic and you can use the sample code to do that yourself. I do want to talk about error handling for a moment. It's important that you handle failures gracefully.

AV asset export session will not overwrite files. If you try, it will fail. If you want to overwrite a file, it's your responsibility to delete it first. On iOS, your application is restricted to a certain sandbox and AV asset export session cannot be used to write files outside your application sandbox.

Also on iOS, the multitasking features of iOS let the user go off and do something else while your export continues in the background. And this is a great feature, but it introduces more situations where export could fail after a while. For example, if the user goes and starts playback of video, that might use resources that you'd been using for your export, and so that playback could interrupt the background export and cause it to fail. But even if your application is still in the foreground, export can fail. For example, an incoming phone call or FaceTime call can interrupt export.
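Putting the export pieces together looks roughly like this (a sketch; asset, outputURL, the preset, and the trim range are placeholders):

AVAssetExportSession *exportSession =
    [[AVAssetExportSession alloc] initWithAsset:asset
                                     presetName:AVAssetExportPreset640x480];
exportSession.outputURL = outputURL;                     // must not already exist
exportSession.outputFileType = AVFileTypeQuickTimeMovie;
exportSession.timeRange = CMTimeRangeMake(kCMTimeZero,
                                          CMTimeMakeWithSeconds(30.0, 600)); // optional trim

[exportSession exportAsynchronouslyWithCompletionHandler:^{
    switch (exportSession.status) {
        case AVAssetExportSessionStatusCompleted:
            // Export finished; the file at outputURL is complete.
            break;
        case AVAssetExportSessionStatusFailed:
            NSLog(@"Export failed: %@", exportSession.error);
            break;
        case AVAssetExportSessionStatusCancelled:
            // -cancelExport was called.
            break;
        default:
            break;
    }
}];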

So let's move along to the next things. Let's talk about AV Composition, which, along with AV Asset, is one of the cornerstones of editing in AV Foundation. We've seen how you can use AV Asset as a source object representing a single movie, and then you can use that AV Asset for playback with AV Player and AV Player Item, for image generation with AV Asset Image Generator, for export with AV Asset Export Session. And a little later on, you'll see how you can use it with AV Asset Reader to extract audio and video data out of the asset.

What about the case where you have several source movies and you want to put them together? AV Foundation's tool for doing this is called AV Composition. And because AV Composition is a subclass of AV Asset, you can use it in the same great ways that you can use AV Asset. You can use it for playback, for image generation, for export, and for asset reading. So let's take a demo of cutting together a bunch of different clips and playing them together.

Okay, I'm launching AV Edit Demo, the iPad version. The first thing you're going to notice looking at this is that this is no iMovie. This is not designed for users to play with. This is designed as a programmer's playground so that people like you can understand how the APIs work and explore and experiment with them.

The UI is very close to the way the API works. So the first thing I'm going to do is select the clips that I'm going to use to edit together. I have a clip here of a cat and a clip here of a beach and a third clip of some flowers.

And you can see that there are start and end times. I can set my edits. I can set the portion of these clips that I want to use. I'm going to take in a section from about 20 seconds into the cat, the first eight seconds or so of the beach, and then some eight-second portion of the flowers. That's about that. And now let's view the player and see what that looks like. Okay. I'll play through this so you can see what it looks like.

So what did we just see? We started with three clips, the cat, the beach, and the flowers. And we took segments of each of these and composed them on the same timeline. The tool we use for doing this again is AV Composition. So what AV Composition does is it takes segments of assets and places them on a timeline. So your AV Composition object contains a number of AV Composition tracks, just like AV Asset contains AV Asset tracks. And each AV Composition track contains segments. And each segment contains information that defines the source movie, the video track, and the time range.

The mutable version of AV Composition, which is called AV Mutable Composition, can be edited in a number of different ways depending on your needs. There's a family of APIs which edit across all of the tracks at once, and there are some others that only modify a single track at a time. And you can also modify the whole segment array directly if you have your own representation of how the editing should work. New in iOS 5 is an optimized batch version of the single-track editing API.

So here's how it looks in the code that you'd see in simpleeditor.m. First, we make a mutable composition object and then we add composition tracks to it. Here I'm adding one for video and I'm taking that video composition track and adding a segment of a source asset into it.

Now, it's generally not a safe idea to modify a composition object while you're using that same object for playback, for image generation, for export, or for media data extraction. Instead, you want to make a copy and then you can modify the original. Here I'm making an immutable copy of the mutable composition and creating the player item from that.
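Condensed into a sketch (sourceAsset and the time range are placeholders, and error handling is elided):

AVMutableComposition *composition = [AVMutableComposition composition];
AVMutableCompositionTrack *videoTrack =
    [composition addMutableTrackWithMediaType:AVMediaTypeVideo
                             preferredTrackID:kCMPersistentTrackID_Invalid];

// Insert the first 8 seconds of the source asset's video track at time zero.
AVAssetTrack *sourceVideoTrack =
    [[sourceAsset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0];
NSError *error = nil;
[videoTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, CMTimeMakeWithSeconds(8.0, 600))
                    ofTrack:sourceVideoTrack
                     atTime:kCMTimeZero
                      error:&error];

// Snapshot: play an immutable copy, keep editing the mutable original.
AVComposition *snapshot = [[composition copy] autorelease];
AVPlayerItem *playerItem = [AVPlayerItem playerItemWithAsset:snapshot];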

Now, if you have a live view of what your video preview should look like and you want to change it sharply and snappily and safely, you can use the replace current item with player item API to do that. All right, let's move on to the next scenario of adding additional audio.

Now that I've shown you what AV Composition is, you can get a better idea about what you're seeing in the lower right part. We have enough space on the iPad version of AV Edit Demo to draw an illustration, a visualization of the same objects that are actually being used for playback.

This is not generated separately; this artwork is generated from the very objects that are used for playback. So you can see that there are three segments for the video track and three segments for the audio track. And now I'm going to go down to these extra options here and add a commentary track. I'm going to select my audio clip.

And I get to choose when in the overall composition that audio clip should start with this slider here. So I'm going to drag that to about 11 seconds. You can see how the AV composition has been set up. And now let's play through this and listen carefully for what you hear.

It's true, when something exceeds your ability to understand how it works, it sort of becomes magical. Let me scrub back just one more time and play a bit from just before he starts to talk, and listen carefully for how it becomes quiet. It's true, when something exceeds your ability to understand how it works, it sort of becomes magical.

So what's happened this time? Once again, we have our composition's audio and video tracks. And this time, we're adding an additional audio track in parallel with the other ones. Notice that this audio doesn't begin at the beginning of the movie and doesn't go all the way to the end. There are gaps. These gaps are called empty edits or empty segments.

The other thing we noticed was that the audio became quiet just before Jony Ive started talking so that we could hear him clearly. This technique is called ducking, and it's accomplished in AV Foundation through the use of audio volume ramps. Looking at the objects we use, once again we have our composition with the old tracks and now the new track that we're adding. And AV Audio Mix is the object that we're using for representing the audio ramps. It contains a list of audio volume ramps.

An audio mix has an array of input parameters objects. Each input parameters object adjusts the volume level of one track. If a track does not have an input parameters object, then it will maintain the default volume level of full volume. Here's how you generate them in code. You create an input parameters object.

And then you set the volume at times and across time ranges. Note that in between the volume times and ranges that you specify, the volume level continues at the last value. So in this case, I'm setting the volume at time zero and from time X to time Y, and that's sufficient to draw the yellow line on the right.

Once you have your input parameters for each track, you collect them up into an audio mix object. And then to use that audio mix object, you set it as a property on your player item, your export session, and there's a special subclass of AV Asset Reader output that knows how to deal with multi-track audio mixing, and you'd set it on that one as well. Let's move on to some video transitions. Those cuts are a little bit abrupt, so let's add some crossfades to smooth that over.
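A sketch of the ducking setup just described (originalAudioTrack, the times, and the playerItem/exportSession variables are placeholders):

AVMutableAudioMixInputParameters *params =
    [AVMutableAudioMixInputParameters audioMixInputParametersWithTrack:originalAudioTrack];

// Full volume at the start...
[params setVolume:1.0 atTime:kCMTimeZero];
// ...then ramp down to 20% across one second just before the commentary begins.
[params setVolumeRampFromStartVolume:1.0
                         toEndVolume:0.2
                           timeRange:CMTimeRangeMake(CMTimeMakeWithSeconds(10.0, 600),
                                                     CMTimeMakeWithSeconds(1.0, 600))];

AVMutableAudioMix *audioMix = [AVMutableAudioMix audioMix];
audioMix.inputParameters = [NSArray arrayWithObject:params];

playerItem.audioMix = audioMix;          // for playback
exportSession.audioMix = audioMix;       // or for export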

Now you know what the AV Audio Mix that you're seeing there is all about. So now I'm going to go down to the second option, Transitions, and turn that on. See a lot more stuff happening there. I'll play it again. Watch the cat as it goes through the transition period.

It's true, when something exceeds your ability to understand how it works, it sort of becomes magical. So that's what a crossfade looks like. I can also set it to a push transition and I can change the length. It's kind of fun the way everything moves around. And I'll just adjust the start time of this guy to bring it back in. And let's play this again. This time we're showing about a two second push transition.

It's true, when something exceeds your ability to understand how it works, it sort of becomes magical. That's some transitions. So, recapping again what we're seeing here. Once again, we're using AV Composition, but there's something different. This time, instead of just cutting from one segment to the next, we have transitions.

So we're taking the middle segment of video and moving it into its own track. Now you can see that each of those transition periods is a period when two video tracks are being decoded at the same time. Now, we're also decoding two audio tracks at the same time, but that's pretty natural.

When you hear two sounds, what you hear is the sum of their waveforms. So we don't have to do anything particularly special there. But for video, it's important to explicitly define what the output should look like in terms of the input videos. AV Foundation's tool for doing this is called AV Video Composition.

Not to be confused with AV Composition. AV Video Composition has instructions that tell AV Foundation when to play track A, when to play track B, when to play some combination of the two. So if we take this video composition and apply it to the multitrack composition above, we get something that looks like this.

So AV Video Composition contains an array of video composition instructions, each for one period of time. Each instruction describes the output video in terms of layer instructions. And each layer instruction, which is associated with a track, has an opacity and an affine transform. Now you can apply a ramp to these values, which we call a tween. You can tween the opacity to get a crossfade, or you can tween the affine transform to get a push transition.

Here's how we put them together in code. First, you make your video composition instruction object. You must set a time range that says what time range that instruction is going to apply for. And then you construct layer instruction objects that represent the tracks that are being inserted into that composition instruction. So here we're making one for track A.

And now we're going to fade out track A by setting a tween on the opacity from 1 down to 0. We then make a second layer instruction representing track B and we'll leave that at the default full opacity of 1.0. And then we set our layer instructions array to be an array of these layer instructions from top to bottom.

Once you have all of your instructions, you assemble them and set them on a video composition object. It's critical that the time ranges do not overlap and that they do not contain gaps; there may not be gaps between the time ranges. You have to tell the video composition the frame duration, which effectively means the frame rate. Here I'm saying the frame duration is 1/30th of a second, which means we'll generate a 30 frames per second animation. You also have to set the render size; here we're using 720p. And for playback, you can optimize rendering by lowering the render scale.
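Assembled into one sketch (trackA, trackB, and transitionTimeRange are placeholders):

AVMutableVideoCompositionInstruction *instruction =
    [AVMutableVideoCompositionInstruction videoCompositionInstruction];
instruction.timeRange = transitionTimeRange;

// Track A fades out across the transition; track B stays at full opacity beneath it.
AVMutableVideoCompositionLayerInstruction *fromLayer =
    [AVMutableVideoCompositionLayerInstruction videoCompositionLayerInstructionWithAssetTrack:trackA];
[fromLayer setOpacityRampFromStartOpacity:1.0 toEndOpacity:0.0 timeRange:transitionTimeRange];

AVMutableVideoCompositionLayerInstruction *toLayer =
    [AVMutableVideoCompositionLayerInstruction videoCompositionLayerInstructionWithAssetTrack:trackB];

instruction.layerInstructions = [NSArray arrayWithObjects:fromLayer, toLayer, nil];

AVMutableVideoComposition *videoComposition = [AVMutableVideoComposition videoComposition];
videoComposition.instructions = [NSArray arrayWithObject:instruction];  // no gaps, no overlaps
videoComposition.frameDuration = CMTimeMake(1, 30);                     // 30 frames per second
videoComposition.renderSize = CGSizeMake(1280, 720);                    // 720p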

To use AV Video Composition, once again, just like the audio mix, you set it on the player item for playback, on the image generator, on the export session, and again, there's a special subclass of AV Asset Reader output that knows how to do video compositions. A couple of things to watch out for. Like I said, the time ranges must not overlap or contain gaps. Every time range must begin where the previous one ended. And they must not fall short of the AV composition; they have to fill out the full duration of the composition.

Now if you know something about video compression, you might be thinking, okay, can I edit anywhere or can I only edit at keyframes, at iframes? Rest assured, AV Foundation lets you edit at any frame boundary you want. How that works is that to play a video segment that doesn't begin with a keyframe, AV Foundation has to go back and decode frames from the previous keyframe.

This is called catch-up. As an optimization, it's possible to use AV Composition to put alternating segments in alternating tracks and then use AV Video Composition to select which track to play at a given time. Doing this gives AV Foundation more time to do the catch-up decoding in those empty gaps. Let's go on to the Core Animation scenario and have a look at one more demo.

I'm going to scroll down on the left to the last option, which is for a title. And I'm going to go and type in my title. Ooh, I can use this thing. Use my thumbs. M-A-G-I-C. Oh, thank you, autocorrect. All right. Okay, so let's play this one. I've added some animation.

It's true, when something exceeds your ability to understand how it works, it sort of becomes magical. I didn't say it wouldn't be cheesy animation. So let's scroll back here, and you see that as I go back, the stars spin anticlockwise, and as you go forward, they go clockwise. That's Core Animation. We've taken our AV composition and we've also added some Core Animation stuff. We've added layers that contain the text and the stars and some animations. Here's what our layer tree looks like.

We have one parent layer which contains a layer that contains the text and a layer that contains the ring of stars. And we have an animation that spins the ring of stars and another animation that fades everything out after 10 seconds. If you need a recap on Core Animation, there's a session tomorrow.

Let me take a moment to talk about animation. Animation is the result of modifying a property like position over time. If you've used Core Animation, you know what it's like to use it for real-time animations of your user interface. AV Foundation lets you use the same tools in movies. The only difference is that instead of applying in real time, the timeline for these animations is now your movie's timeline.

Let me illustrate how that works. Your UI view or NS view is going to contain a layer hierarchy. Somewhere inside that layer hierarchy is an AV player layer. The AV Player layer has a private layer which contains the video. Now, that video's timing is special. Every other layer in this diagram operates in host time. Time is counted as the number of seconds since boot and it's always monotonically increasing.

The video layer, on the other hand, runs in movie time. Its timeline is the number of seconds since the start of the movie. And that starts when I start playback and it stops if I pause. But I can even rewind it and make it go backwards, which real time can't do.

When I add my animation, the animation needs to run along movie time, not real time. AV Foundation provides the AV Synchronized Layer object to make that happen. So for playback, you use AV Synchronized Layer to make animations use movie timing. For export, it's similar but a little different. AV Foundation's object is called AV Video Composition Core Animation Tool, and that integrates Core Animation as a video processing stage.

So in this case, you set up a single parent layer that contains both your video layer and the animations you want to do. For optimized rendering, it helps to indicate when there's no core animation artwork to render for a particular time period. By setting the enable post-processing property to no, you can tell us to skip unnecessary rendering.
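That wiring might look roughly like this for export, with the playback alternative noted at the end (titleLayer, playerItem, and videoComposition are placeholders):

// Export: Core Animation runs as a post-processing video stage.
CALayer *parentLayer = [CALayer layer];
CALayer *videoLayer  = [CALayer layer];
parentLayer.frame = CGRectMake(0, 0, 1280, 720);
videoLayer.frame  = parentLayer.frame;
[parentLayer addSublayer:videoLayer];   // AV Foundation renders the video frames into this layer
[parentLayer addSublayer:titleLayer];   // your text/stars layers and their animations

videoComposition.animationTool =
    [AVVideoCompositionCoreAnimationTool
        videoCompositionCoreAnimationToolWithPostProcessingAsVideoLayer:videoLayer
                                                                inLayer:parentLayer];

// Playback: attach a layer tree like this to an AVSynchronizedLayer instead,
// so its sublayers' animations run in movie time rather than host time.
AVSynchronizedLayer *syncLayer =
    [AVSynchronizedLayer synchronizedLayerWithPlayerItem:playerItem];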

On iOS, Core Animation use is prohibited while you're in the background. If you try, it'll cause your export to fail. I need to give you a few hints about Core Animation features that are convenient when you're working with real-time animation but don't work so well for movies. First is that if you have an animation with a begin time set to zero, which is the default, then as soon as you commit it, its begin time is changed to the current host time. That's very convenient for real-time animations, but it's not at all useful for movies. So you want to use a very small non-zero number for the begin time, and if you can't think of a very small non-zero time, we have one for you.

Another thing is that Core Animation automatically collects and removes animations after it thinks that they have gone into the past. That's obviously a useful idea for real-time animations, where you can never go back. But with movies you can go back. So you need to set the removed on completion property of the animation to no, to tell Core Animation to keep its hands off.

The third thing from Core Animation is that if you set a property or change anything about a layer tree, by default Core Animation will construct an implicit animation from the old state to the new state. And that's set up to operate in real time. Generally that's an unwanted thing; you don't want that to happen inside your movie. So you need to surround all of your changes to the layer tree in explicit transactions and you need to set disable actions to yes.
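Those three hints translate into a few lines when you build each animation; here's a sketch (starsLayer is a placeholder, and the key path and values are arbitrary):

[CATransaction begin];
[CATransaction setDisableActions:YES];   // no implicit, real-time animations from these layer changes

CABasicAnimation *spin = [CABasicAnimation animationWithKeyPath:@"transform.rotation.z"];
spin.fromValue = [NSNumber numberWithDouble:0.0];
spin.toValue   = [NSNumber numberWithDouble:2.0 * M_PI];
spin.duration  = 10.0;
spin.beginTime = AVCoreAnimationBeginTimeAtZero;  // the "very small non-zero" stand-in for time zero
spin.removedOnCompletion = NO;                    // keep the animation around so you can scrub backwards
[starsLayer addAnimation:spin forKey:@"spin"];

[CATransaction commit];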

Finally, sometimes you want your animation to extend past the end of your AV composition. And that's okay, but we need to know when playback should end or how long export should run. So for playback you set the player item's forward playback end time, and for export you need to set the time range explicitly.

Okay, so I've gone through these six scenarios. I'm now going to hand over to Adam, who's going to tell you about reading audio data and constructing movies from your own audio and video data. Thanks a lot. All right. Thank you, Sam. So up until this point, we've generally been talking about pretty high-level use cases for audiovisual assets, whether we're taking an entire asset and exporting it to a new file with AV Composition -- or sorry, AV Export Session, or if we're taking multiple segments of assets and stitching them together with AV Composition. So for the next 25 minutes or so, we're going to dive down a little deeper and get our hands dirty and learn how to read and manipulate the individual pieces of audio and video data that make up media assets.

So what are we going to cover? First, the why and the how. We're going to go through a few simple scenarios that show what you can do with these capabilities. Then we're going to go into detail about reading data from existing assets and doing stuff with it. And then we're going to talk about taking media data that you have -- audio and video data -- and writing it to new files. Finally, we're going to talk about some considerations to keep in mind whenever you're writing files that contain audiovisual media.

So say that you have an app that does simple video editing, like we have iMovie here, and you want your users to have a simple graphical interface but to be able to do precise edits. Maybe they want to trim out the silence at the beginning of the clip. So it's helpful to them if you draw an audio waveform, like iMovie does here. In order to do that, you're going to need to be able to examine the audio stream in detail so you can draw it on the screen.

So to do that in AV Foundation, you use a class called AV Asset Reader, which was available starting in iOS 4.1 and is now available on the desktop in Lion. And the way that it works is you hook it up to an AV asset, which could be a movie file on disk, and then it will decode the data in that and give it to you piece by piece, so that you can work with it. And AV Asset Reader is an offline tool to be used for doing offline processing.

The next thing you might want to do: if you have a graphical game, then you might want to take a sequence of gameplay and record it to a movie file so you can play it back later and relive that great game moment. What you'll want to use is another class from AV Foundation, AV Asset Reader's counterpart called AV Asset Writer. And the way this works is you'll take the rendered frames from your game and you'll push them into an AV Asset Writer, which will compress them and write them out to your movie file.

The final scenario: if you want to let your users manipulate the color in a video, you're going to need access to the individual pixels so that you can change their color properties. And if you then want to write the results out to a new file, you're going to want to combine AV Asset Reader and AV Asset Writer.

First you hook up the AV Asset Reader to your movie so that you can read out your uncompressed video frames. Then you'll do your color manipulation and write the new frames out to an AV Asset Writer, which will compress them to your file. So that's the basics of what we can do in a general overview. So now let's dive deeper into the actual reading from assets with AV Asset Reader. So recall these simple use cases, drawing an audio waveform or decoding video frames so you can mess with the color.

So to zoom in on the waveform scenario a little bit, here's the way the classes are laid out. You have an AV Asset Reader. You have an AV Asset, which will represent your movie file. And the AV Asset has a single audio AV Asset track within it. And you want to be able to decode the data in that track and get PCM out on the other side. If you're not familiar with PCM, it's basically just an uncompressed audio format. If you imagine the wave of the actual audio that's happening, it's just the raw values that we sample when we record that audio.

And to actually do that decoding, you use a class called AV Asset Reader Track Output, which you can hook up to your AV Asset Reader. If you go to the pixel filtering scenario, you'll have a video track, an AV asset track with compressed video in it, and you want to decode that to get your decoded frames. And the decoding is done by AV Asset Reader Track Output once again.

But then say you also have an audio track, and although you're filtering the color in the video frames, you just want to keep the audio track as it is. You can also read those samples out with AV asset reader track output, but configure it so that it just passes through the samples as they're stored in the asset without decoding them. And in this case, you'll get AAC audio if that's how the audio is stored in the asset.

So when you have multiple tracks, are there any rules on how you can choose those tracks? Well, there are. All the tracks for a single AV asset reader operation must come from the same asset. If you want to combine tracks from multiple assets, you should use an AV composition. And since AV composition is a subclass of AV asset, as Sam mentioned earlier, you can feed that into an AV asset reader and read out your decoded samples.

So let's talk more about how you actually set them up. There are a few simple steps. First, you want to instantiate your asset reader using a reference to the asset that you're reading. Then you want to create your track output with a reference to the track that you're going to read from, in this case, the audio track for the waveform, and also an NSDictionary of output settings, which we'll talk about more in a little bit. And then you can connect them with the add output method. And then you want to do some configuration.

In this case, we're telling the asset reader that we only want to read out the first five seconds of the asset. And then you start reading. And this is your commit point. And after this, no more configuration is allowed. And you really just need to focus on your reading of the actual audio and video data, which you usually do in a loop. And inside that loop, you'll read each piece of audio or video data using the copy next sample buffer method.

And that will give you an instance of CM sample buffer, which is the data type we use for representing media, whether it's audio, video, or another media type. Then if you got a valid non-null sample buffer, you do something with it. In the waveform case, you'll extract your PCM values and draw your waveform up to that point. And then don't forget to release your sample buffer when you're done with it.

Now, copy next sample buffer can also return null. And this could mean one of two things. Either you've finished reading all of the buffers for the time range that you selected, or an error occurred that prevented you from reading any more buffers. And to distinguish between these cases, you can query the status property on AVAssetReader.
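Strung together, those steps look roughly like this (a sketch; asset, audioTrack, and outputSettings are placeholders, and outputSettings is covered next):

NSError *error = nil;
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];

AVAssetReaderTrackOutput *output =
    [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:audioTrack
                                               outputSettings:outputSettings];
[reader addOutput:output];

// Configuration must happen before startReading -- here, only the first five seconds.
reader.timeRange = CMTimeRangeMake(kCMTimeZero, CMTimeMakeWithSeconds(5.0, 600));
[reader startReading];

CMSampleBufferRef sampleBuffer = NULL;
while ((sampleBuffer = [output copyNextSampleBuffer]) != NULL) {
    // Extract PCM values and extend the waveform drawing here.
    CFRelease(sampleBuffer);
}

if (reader.status == AVAssetReaderStatusFailed) {
    NSLog(@"Reading failed: %@", reader.error);
}
// AVAssetReaderStatusCompleted simply means we ran out of buffers in the chosen time range.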

So now let's talk more about those output settings that we had. Output settings, as I said, are represented as an NSDictionary of key-value pairs. And what you're doing with the output settings is telling the asset reader the format that you want to receive the audio and video buffers in when you're reading the data, and it decodes them and gives them to you. So for this example, for audio, we have linear PCM, 32-bit floating point, using the keys you see on the screen. These keys and more are declared in AVAudioSettings.h.

On the video side, remember, Asset Reader is always doing decompression. To specify the format that we want of the decompressed video frames, we use pixel buffer attributes from Core Video. And although these keys are from Core Video, AVVideoSettings.h is where we want to go to get more information on how to construct them for AV Asset Reader.

In addition, your choice of pixel format (here we use 32-bit ARGB, which is convenient for manipulating color) might be influenced by the platform you're on, iOS versus desktop, or it might be influenced by the format of the encoded media that you are decoding. So for more information about how to choose a good pixel format, see AVAssetReaderOutput.h.
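For reference, those two output settings dictionaries might look like this (example values only, using the keys from AVAudioSettings.h and Core Video):

// Audio: decode to 32-bit floating-point linear PCM.
NSDictionary *audioOutputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
    [NSNumber numberWithInt:32],                    AVLinearPCMBitDepthKey,
    [NSNumber numberWithBool:YES],                  AVLinearPCMIsFloatKey,
    [NSNumber numberWithBool:NO],                   AVLinearPCMIsBigEndianKey,
    [NSNumber numberWithBool:NO],                   AVLinearPCMIsNonInterleaved,
    nil];

// Video: decode to 32-bit ARGB pixel buffers, a convenient format for color work.
NSDictionary *videoOutputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithUnsignedInt:kCVPixelFormatType_32ARGB],
        (id)kCVPixelBufferPixelFormatTypeKey,
    nil];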

[Transcript missing]

Then you're going to append your samples. The append sample buffer method of AV Asset Writer input is the method you use to do this. And finally, as a last step, you're going to tell the Asset Writer to finish up writing the file, make sure it flushes out any queued samples it has, and tell you whether it succeeded or failed.

So now we'll talk more about those two things that I sort of glossed over. First, the sessions. So imagine that you have a group of audio samples and a group of video samples, but the start and end times don't quite line up like we have in this example.

It's important to keep in mind that anything that you append to the Asset Writer will get written to your output file. And by default, if you play that file back, it'll all get played back. You'll see all the video and hear all the audio. But if you've ever seen a movie that has a few frames of video with silence until the audio starts in abruptly, you know how distracting that could be. So usually we want to have a clean start and end where the audio and video are matched up.

So you can do that with AV Asset Writer by calling startSessionAtSourceTime, and that will give the Asset Writer a place to start the presentation. It's a good idea to give it a time that is the beginning of a video frame that also lines up with a segment of audio data.

And the grayed out part of the audio sample here will still be in the file that gets written, but, you know, you just won't hear it when you play it back. You'll get a clean start where both the audio and the video start at the same time. And you can do the same thing at the end using endSessionAtSourceTime.

So then we'll talk about the output settings for compression, which is a little bit different than the output settings that we use for the Asset Reader. So here we have an example dictionary. These keys also come from AVAudioSettings.h. And here we have AAC, 128 kilobits per second, and 44.1 kilohertz.

On the video side, we're going to look to AVVideoSettings.h once again, but we'll use keys that are actually in that file. In this example, we have H.264 at 640 by 480. Now, H.264, as most of you know, is a great choice for most consumer applications. If you want to integrate with a professional video workflow, AV Asset Writer also supports compressing to two different flavors of Apple ProRes, Apple ProRes 422 and Apple ProRes 4444, both of which give you excellent video fidelity. For more information about ProRes, as well as a link to a white paper, you can go visit the link on this slide.
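Tying the writer setup and compression settings together in one sketch (outputURL and endTime are placeholders, error handling is elided, and the synchronous finishWriting call is the API that was current at the time):

NSError *error = nil;
AVAssetWriter *writer = [AVAssetWriter assetWriterWithURL:outputURL
                                                 fileType:AVFileTypeQuickTimeMovie
                                                    error:&error];

// Compression settings: H.264 video at 640x480, AAC audio at 128 kbps / 44.1 kHz stereo.
NSDictionary *videoSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    AVVideoCodecH264,             AVVideoCodecKey,
    [NSNumber numberWithInt:640], AVVideoWidthKey,
    [NSNumber numberWithInt:480], AVVideoHeightKey,
    nil];

AudioChannelLayout stereoLayout = {0};
stereoLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *audioSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithInt:kAudioFormatMPEG4AAC],                    AVFormatIDKey,
    [NSNumber numberWithInt:128000],                                  AVEncoderBitRateKey,
    [NSNumber numberWithFloat:44100.0],                               AVSampleRateKey,
    [NSNumber numberWithInt:2],                                       AVNumberOfChannelsKey,
    [NSData dataWithBytes:&stereoLayout length:sizeof(stereoLayout)], AVChannelLayoutKey,
    nil];

AVAssetWriterInput *videoInput =
    [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo outputSettings:videoSettings];
AVAssetWriterInput *audioInput =
    [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:audioSettings];
[writer addInput:videoInput];
[writer addInput:audioInput];

[writer startWriting];
[writer startSessionAtSourceTime:kCMTimeZero];
// ... append sample buffers to the inputs ...
[videoInput markAsFinished];
[audioInput markAsFinished];
[writer endSessionAtSourceTime:endTime];
BOOL succeeded = [writer finishWriting];   // flushes queued samples and reports success or failure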

So that's the basics of AV Asset Writer, how you configure it, how you use it. And at this point, we need to think a little bit about where our audio and video data is coming from. Because that's going to have an impact on how we use the Asset Writer and how the Asset Writer behaves. So you might be getting your audio and video data from an offline source like AV Asset Reader or maybe from a real-time source like AV Capture.

Similarly, you might have a pull model or a push model. And all these things influence how you use AV Asset Writer. But before we get into the differences, we need to talk a little bit and understand how AV Asset Writer wants to lay out the data that it writes to the files.

So you have a choice when you have an audio track and a video track. You need to make a choice on how you're going to lay those out in the resulting file relative to each other. One way you can do it is very simple. You can just take the entire video track and then write the entire audio track. And this would work. And you see in the diagram we have subscripts to indicate times that match up.

So we want to display V1 and A1 at the same time, V1 and A2 at the same time, and so on. But the problem here is that the corresponding video and audio samples are really far apart from each other. So if we want to read the file while we're playing it back, in order to keep the audio and the video synced up, we're going to have to jump back and forth between these two tracks constantly. This is very inefficient, and it's even worse if you're trying to download and play back from a network.

A better way would be to mix them together. We call this audio-video interleaving, and you just basically do one segment of video with one segment of audio, one segment of video, and so on. And this is much more efficient because the video and the audio that's supposed to be played at the same time are much closer together, so we don't have to jump around when we're playing back, when we're reading through the file.

And so the thing to remember about AV Asset Writer is it's always going to try to do this audio/video interleaving thing to make sure that when you play this file back, you get the most efficient I/O pattern that you can. So this works well, but how does it achieve this? After all, you're the one who is writing your video data and writing your audio data at your own leisure, if you want.

Well, it uses this property called Ready For More Media Data on the Asset Writer input to try and balance out how frequently you're appending data to each input. And we'll illustrate it with this diagram. In the middle, we have an Asset Writer with an audio input and a video input. And at the bottom, we have an empty file that we'll be writing data to.

And imagine for this example that the ideal interleaving pattern is just one audio sample, then one video sample, then one audio sample, and so on. It's not quite how it works in real life, but it's close enough to illustrate the point. So as we're going through, we're going to be receiving audio data in. We're going to ask the input if it's ready. It'll say yes, so we can go ahead and append.

And we'll have a video frame come in, and that input's ready, so we're able to append that. But then imagine that your audio starts coming in faster than your video. Now, AVAsset Writer wants to maintain this ideal interleaving pattern, and so it's going to tell you to hold off a little bit until more video comes in. And so your video comes in, and it'll let you append that. And then that'll unblock your audio input and you can add more data to that and move on.

So this works quite well. But then you might be wondering, what's the best way to keep track of when your inputs are ready for data and when they're not? Well, the answer depends on whether you're using a pull model source or a push model source. So if you're using a pull model source, like AVAsset Reader, generally what you're doing is you request each piece of data individually when you're ready for it. So what we let you do, we provide an API called Request Media Data When Ready that will let the asset writer tell you exactly when it's ready for more media data so that you can go and pull another chunk to append.

And the way this API works is you give it a block that it's going to call back any time that the value of ready for more media data transitions from no to yes. And inside the block, you'll generally want to loop and append samples until your input is no longer ready for more media data. So inside that loop, we'll grab our sample buffer, possibly from our asset reader. We'll go ahead and append it using AppendSampleBuffer.

But the last thing to keep in mind is what happens when you run out of, say, audio to append. Now it's important that you tell the asset writer that you have no more data to append to a certain input, because in its efforts to balance out how you're appending audio and video data, it might be holding off letting you append more video data until you get more audio. And if you have no more audio and you don't tell the asset writer anything, the whole process might stall out. So you want to call the method mark as finished, and that'll tell the asset writer not to expect any more data on that input.
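In code, that pull-model loop looks roughly like this (a sketch; writerInput, readerOutput, and mySerialQueue are placeholders):

[writerInput requestMediaDataWhenReadyOnQueue:mySerialQueue usingBlock:^{
    // Called whenever readyForMoreMediaData transitions from NO to YES.
    while ([writerInput isReadyForMoreMediaData]) {
        CMSampleBufferRef sampleBuffer = [readerOutput copyNextSampleBuffer];
        if (sampleBuffer) {
            [writerInput appendSampleBuffer:sampleBuffer];
            CFRelease(sampleBuffer);
        } else {
            // Out of source data: tell the writer not to expect any more on this input,
            // otherwise it may stall waiting for interleaving partners that never arrive.
            [writerInput markAsFinished];
            break;
        }
    }
}];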

So things are a little different for a push model. In a push model, generally what's going to happen is you're periodically going to get a callback. In this case, we have an example of the capture output did output sample buffer method, which is the delegate method that you use with AV capture audio or video data output. And this is typical of a push model source, where it's just going to give you a callback and hand you a buffer, and you have to do something with it quickly before the next one comes in.

So in this case, we recommend that you just append your sample right then and there: do any processing you need to do, and then append it without any delay. Now, you notice that we still check the value of ready for more media data, because it's possible with a push model source that we're being pushed data faster than the asset writer can actually process, compress, and write to the file.

So if this is happening, it's tempting to want to do your own queuing, where you save off the buffers and you wait and hold on to them until your input is once again ready for more data. We recommend that you don't do this. First of all, I'll let you in on a little secret.

AV Asset Writer is doing queuing on your behalf. So queuing on top of queuing isn't going to buy you a whole lot. It's usually not necessary for most use cases. It's error prone and it typically will just hide the real problem that you're just producing data too fast and the Asset Writer can't keep up.

So instead, we recommend that you drop some video frames in the short term in order to catch up. You don't want to drop audio because then you might lose AV sync. And if possible, try to go and throttle down the rate at which you are producing those video frames to avoid overwhelming the Asset Writer further.

An example of how you can do that with AV Capture: there's a property that lets you set a maximum frame rate for your audio -- sorry, your video data output. So that's a good story for push model sources, but a lot of push model sources also operate in real time, and AV Capture is an example of that. And with a real-time source, the time constraints for how much time you have to deal with each piece of audio or video data are even stricter, and that brings some additional considerations.

Although interleaving in the output file is always important to make sure you get good playback I/O efficiency, even more important is making sure that we can capture as much of the data coming at us and get it written out to the file so we don't lose any data.

In addition, in a real-time scenario, the interleaving will generally be OK, even if we don't do much to try and manage it, because the audio and the video are coming in at about the same time. And so if the asset writer knows that the data is coming from a real-time source, it can focus less on micromanaging the interleaving and give you an opportunity to keep the gates open for appending data.

And what that means concretely is that ready for more media data will be yes a lot more often for your real-time scenario. And the way that you tell asset writer this is you use this property on the asset writer input, expects media data in real-time. And you want to set this to yes for all of your inputs that are getting data from a real-time source.
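For the push-model, real-time case, the pattern amounts to something like this sketch (assuming the delegate object also holds the videoWriterInput; the names are placeholders):

// At setup time, for every input fed from a real-time source:
videoWriterInput.expectsMediaDataInRealTime = YES;

// AVCaptureVideoDataOutput delegate callback: append immediately, don't build your own queue.
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    if ([videoWriterInput isReadyForMoreMediaData]) {
        [videoWriterInput appendSampleBuffer:sampleBuffer];
    } else {
        // The writer can't keep up: drop this video frame rather than buffering it,
        // and consider throttling the capture frame rate.
    }
}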

So that's the basics of Asset Writer and Asset Reader. We've talked a lot about what they're good for and how to use them, but we should really also talk about when you shouldn't use them, because sometimes there's a better tool for the job. As I mentioned before, AV Asset Reader's not for playback or any sort of other real-time scenario. It's just for offline grabbing samples from an asset and doing your offline processing.

Also, if you're just doing a simple transcode or an export where you just want to write your file in a different format, AV Asset Export Session will typically be better suited to that task as well as being easier to use. Similarly, if you just want to do a capture and write all of that capture from a camera or a microphone and get that written out to a movie file, then AV Capture Movie File Output is going to be the better tool to use for that.

So, let's sum up. Sam and I have talked about a lot of different APIs today that fill a lot of different editing tasks. So, let's just go through them to review one by one. First, if you want to create an image for a time, use AV Asset Image Generator. If you want to output a movie, maybe in a different format, you use AV Asset Export Session. If you want to combine multiple clips, you use AV Composition. For audio volume adjustments, you use AV Audio Mix. For video transitions, you use AV Video Composition.

If you want to incorporate core animation into your presentations, you use either AV Synchronized Layer for playback or AV Video Composition Core Animation Tool for export and offline operations. If you want to read audio and video data from your assets, you use AV Asset Reader. And finally, if you want to create a new movie file with your data, then you use AV Asset Writer. And there's an additional piece of sample code available that's not quite up yet, but it should be up tomorrow morning. And it's called -- it covers AV Asset Reader and AV Asset Writer, and it's called, quite simply, AV Reader Writer. So look for that in the morning.

For more information, there's documentation. We have a programming guide, and you can get to it at this URL. As usual, you can use the Apple Developer Forums to ask questions and get them answered. And you can also contact our Media Technologies Evangelist, Eryk Vershen. There were a few sessions that happened yesterday. If you didn't catch them, then you can catch them on the videos when those become available.

There's also two more capture sessions right in this room for the rest of the afternoon. So if you want to learn about capture on both Lion and iOS 5, stick around right here. So thank you very much for coming. Have a great rest of your afternoon and go write some great editing applications with AV Foundation.