
WWDC11 • Session 417

Introducing AV Foundation Capture For Lion

Graphics, Media, and Games • OS X • 56:09

iOS developers have enjoyed full camera access using AV Foundation since iOS 4.0. Now AV Foundation is coming to Lion with an even richer feature set. Learn how to capture screen input to movie files, query and configure real-time capture devices, and harness the power of AV Foundation capture classes in your app.

Speaker: Brad Ford

Unlisted on Apple Developer site

Downloads from Apple

HD Video (400.8 MB)

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Good afternoon everyone and welcome to session 417, Introducing AV Foundation Capture For Lion. My name is Brad Ford and I will be your host for the next hour and maybe more if you choose to stay around. Here's what we're going to talk about today: why and when you should use AV Foundation Capture, the AV Foundation Capture programming model, and the differences in AV Foundation Capture between Lion and iOS.

Keep a close watch out for those two badges there, the new and the only on Mac OS, because you'll see a lot of that. Pretty much everything we talk about here today is new, either new on Mac OS or new in general. The sample code for this session, we have four sample apps, and last I checked, three of them were published already. You can follow along at that specified URL. The fourth should be up there shortly.

Let's start by doing a brief history of capture on Mac OS. Set your way back machine to classic Mac. Video digitizer components were introduced in QuickTime 1.0. Does anyone know when that was? Anyone? This isn't stump the experts, but-- That was December of 1991, which was 20 years ago. And in case you're curious, this is what engineers looked like in those days. Jim Batson: I'm Jim Batson and I've been working in the QuickTime sleep deprivation experiment.

Thanks. Video digitizer components served us well. They still are the means for third parties to deliver video device drivers on our OS. And they've been around for 20 years. Now that was a pretty hardcore, low-level set of APIs to use. So shortly after, we came up with the Sequence Grabber in QuickTime 1.5. These were considered high-level, easy-to-use capture interfaces. These APIs still work today, and the video quality was much improved in 1.5, as you can see here.

And curiously enough, both of these men still work at Apple. Now let's push forward to Mac OS X, the more recent history, where QTKit was introduced with modern Objective-C capture APIs in 2005. We did this because we felt we needed a simpler programming model, something that fit in better with the rest of Mac OS X.

And it sits atop Core Media, or what we sometimes call in marketing QuickTime X, which is the more modern underpinnings for QuickTime. And it provides a legacy bridge back to the 32-bit world, so it grandfathers in these VDIG or video digitizer components. These APIs, as you might expect, also still work today, and they are in fact the preferred capture mechanism on Mac OS X.

Until now. Let's talk about AV Foundation. It was introduced in iOS 4 for iPhone, iPad, and iPod touch. And by show of hands, who here has used iOS AV Foundation Capture APIs? Okay, almost everybody. That's good. The ideas for AV Foundation Capture were inspired by QTKit. We did this on purpose. We felt the interfaces worked really well, so we wanted to keep the familiarity. They also sit atop Core Media, or QuickTime X.

And we felt that to get wide developer adoption of these new AV Foundation capture APIs, we couldn't deliver something that was feature hobbled. So we really worked hard to encompass all of QTKit's capture features in our 1.0 release of AV Foundation. So we think we've done that, encompassing all of the QTKit capture API features. We also provide new features that were not available and will not be available in QTKit. And for the first time, we're supporting third-party CoreMediaIO video device drivers. This is an opportunity, finally, to write modern video device drivers for Mac OS X.

And it's available in Lion and forward. So now you might be scratching your head thinking, well, which am I supposed to use, QTKit or AV Foundation? The answer is, AV Foundation. Really. All new development should be using AV Foundation. The only caveat would be: you need to continue to use QTKit if you need legacy VDIG support. As I mentioned, we do not grandfather the 32-bit world into AV Foundation, so you'll need to stick with QTKit for that.

If you need legacy video encoder support, when I say legacy encoders, I mean things like Sorenson Video, RLE, some of these older encoders that don't have a place in the modern world when we have very good replacements for them. Those also are not supported in AV Foundation. And if you need to run on Snow Leopard or earlier, you'll need to stick with QTKit, as AV Foundation capture APIs are not going backwards, only forwards.

Now let's look at where AV Foundation sits in the technology framework hierarchy. As you see, there are a lot of green boxes there, and AV Foundation is the top one below the thin blue line. AV Foundation is the one that we're talking about, but it relies heavily on CoreMediaIO, CoreMedia. These are C-based frameworks that do the heavy lifting. And above the thin blue line, that's where you come in.

You can write your apps using AV Foundation and interface directly with those Objective-C APIs. And now for the first time, you get to participate below the lower thin blue line. If you are a device driver writer, we welcome you to come talk to us in the labs about how you can write your device driver for Mac OS X using CoreMediaIO.

So the theme for today is new in Lion, more features, more flexibility. Especially if you're familiar with AV Foundation on iOS, you'll be impressed with the number of new features that we've introduced in Lion. First off, AV Capture device enhancements. We support discovery and selection of supported frame rates and formats, which is something we couldn't do in QtKit. System-wide device sharing, locking of shared devices for configuration, and support for closed captions from real-time devices.

Also on the output side, we support compressed AV capture video data output, which is a long sought after feature on the phone, and support for arbitrarily complex AV capture sessions. If you've worked with our APIs embedded on iOS, then you're familiar with some of the restrictions we have there where you are not allowed to use certain combinations of inputs and outputs. All of those restrictions are lifted on the desktop where you can go crazy and make as complicated a session as you would like.

Lastly, three new classes, which are only available on Lion: AV Capture screen input, audio preview output, and audio file output. Audio preview output we won't spend much time on. It does what you would expect: it previews the audio in real time, and it synchronizes it with a video preview if you're showing one. And the audio file output also we will not spend very much time on, but it allows for frame accurate writing of audio files from an AV Capture session into the common formats, such as CAF, AIFF, and WAV. Time for our first slide about the programming model.

Capture basics. How do we look at the world in AV Foundation? We look at the world as inputs and outputs. When my mom asks me what I do for a living, I usually tell her that I'm a bit shoveler. Which means I just take the bits and I put them from one place to the other. And that's what AV Foundation does too.

It sees everything as an input and an output. What I'm talking about here is, let's say you've got a shiny new MacBook Pro. It's got a camera on it, a FaceTime HD camera. And from that camera you might want to do real-time previewing into a video layer, or perhaps capture still images in high quality.

Maybe you just want to get video frames into a delegate call so that you can process them and look for patterns, or do the traditional thing, which would be to capture the output to a QuickTime movie. Likewise, all modern machines have built-in audio microphones, and we support third-party audio hardware as well. And from those, you might want to, again, get the audio data into your process for manipulation or write it to a QuickTime movie or preview it to the speakers.

And then there's a third kind of input that we support on Lion, which is the screen. If you have a screen or any portion of the screen, you might want to grab some of that and send it to all of these places we've just talked about. Still images, video output, or a QuickTime movie.

Here's what this looks like to AV Foundation. There is an AV capture session in the middle. This is the center of our universe. It's the place where you hook up the inputs and you hook up the outputs, and it's the place where you control the flow of data. You start the session running in order to get the inputs producing input and the outputs consuming the data.

On the top, we have AV capture inputs. On the bottom, AV capture outputs. And the preview for video is a little bit different. Because it's a video preview layer, it does not descend from a common parent class like the outputs. It descends from CA layer, so it can fit right into a core animation layer tree.

The rest of the talk will be talking about four use cases, which you see here, and we'll take these one by one, the first of which is controlling the camera and recording movies. So let's switch over to the demo machine and we'll look at our first demo. AV Recorder is an aptly named application. It records A and V.

Okay, AV Recorder, the sample code you have, so you can follow along if you'd like to. Very simple application here that's highlighting the kinds of things that we can do with the devices that we could never do with QTKit. So you'll see on the top I've got my video device or devices. I happen to have a DV camera hooked up as well as my FaceTime HD camera. I also have some audio devices connected.

[Transcript missing]

Not all applications want this automatic behavior. Sometimes you know exactly the resolution that you want, exactly the frame rate that you want, and you don't want it changing out from underneath you. And we support that using this lock for configuration, which I'll talk more about in the slides. But when I lock video device for configuration, let's say I go and try to change my preset to something high like 1280 by 720.

You'll notice that it changed to a different preset, but the video device stayed put where it was. That's because it is locked for configuration. No one's allowed to touch it. And so if another process comes along and tries to also do IO with the camera, they can get buffers from the camera, but they're not allowed to configure it.

You can probably see where I'm going with this. Please be a good citizen on Mac OS X. Because these devices are shared, it's a good idea to be accepting of many formats and not expect that you'll be able to get exactly what you want all the time. I did not demo recording, but trust me, it records movies. And if recording from a DV camera, these transport controls also become enabled and you're able to do device transport control. So we'll go back to slides and talk about how we did that.

Again, from the high level, AV Recorder looks like this. It's using the FaceTime camera or the DV camera. We saw a live video preview. It was capturing to QuickTime movies. It also was capturing from the built-in microphone to a QuickTime movie and doing a real-time output to the speaker. I turned the volume far down because I didn't want to hurt your ears with audio feedback.

And in AV Capture parlance, that looks like a session in the middle. An AV capture device for each of those devices, but notice you do not plug a device directly into a session. Sessions want inputs, so you have to associate a device with a capture device input before you can add it to the session. And then on the output side, we had a movie file output and an audio preview output. And then for the video preview, we used the AV Capture Video Preview layer.

To create the AV Capture session, it's just a few lines of code. Alloc and init your session and specify the preset that you want. That determines the baseline quality level that you want to receive in your outputs. Find a suitable AV capture device. Here I'm just using the default device with media type video. As a hint, the default device on any computer that you get is always going to be a FaceTime camera if it has one.

You then create and add an AV capture device input. So you associate that device with an input and then say session add input. Now you're done with the input side. For the output side, it's also very simple. Create a movie file output, add it to your session as an output, create the audio preview output, and also add it to your session.
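In code, that input and output setup might look roughly like this (a minimal sketch in Objective-C; the High preset, the low preview volume, and the abbreviated error handling are illustrative choices, and an audio device input would be added the same way as the camera input):

```objc
#import <AVFoundation/AVFoundation.h>

// Create the session and pick a baseline quality level for the outputs.
AVCaptureSession *session = [[AVCaptureSession alloc] init];
session.sessionPreset = AVCaptureSessionPresetHigh;

// Find a suitable device and wrap it in an input; devices never plug into a session directly.
NSError *error = nil;
AVCaptureDevice *camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
AVCaptureDeviceInput *cameraInput =
    [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
if (cameraInput && [session canAddInput:cameraInput])
    [session addInput:cameraInput];

// Movie file output for recording.
AVCaptureMovieFileOutput *movieOutput = [[AVCaptureMovieFileOutput alloc] init];
if ([session canAddOutput:movieOutput])
    [session addOutput:movieOutput];

// Audio preview output for real-time monitoring; keep the volume low to avoid feedback.
AVCaptureAudioPreviewOutput *audioPreview = [[AVCaptureAudioPreviewOutput alloc] init];
audioPreview.volume = 0.1f;
if ([session canAddOutput:audioPreview])
    [session addOutput:audioPreview];
```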

To create the video preview layer, you don't add it to the session. You associate it with the session by calling layerWithSession. That's so that the layer owns the session and not vice versa. You can put the layer into a rendering tree and forget about the session, and it will just clean up after itself when it's done. You set a frame to get an initial rectangle and then add it to some parent layer of a view that you have.
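A sketch of the preview layer setup, continuing from the snippet above (previewView is a hypothetical layer-backed NSView):

```objc
// The layer is associated with the session, not added to it; the layer owns the session.
AVCaptureVideoPreviewLayer *previewLayer =
    [AVCaptureVideoPreviewLayer layerWithSession:session];
previewLayer.frame = NSRectToCGRect([previewView bounds]);   // initial rectangle
[[previewView layer] addSublayer:previewLayer];              // hook into the layer tree

[session startRunning];                                      // inputs produce, outputs consume
```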

And then start the session running. And you're done. That's the guts of what we just saw happening in AV Recorder. To enumerate AV capture device formats, you can loop through the device's formats property, and you'll get an array of AV capture device formats. Each one can tell you the media type, be it video, audio, mux.

And you can also get a CM format description, which is a core media reference counted object that gives you a lot of information about the format, including any extensions involved. And then to find out about the supported frame rates, you can iterate through the format's video-supported frame rate ranges and find out exactly which frame rates it supports. To select a device format, you must lock the device for configuration first and then set its active format.
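A rough sketch of that enumeration and selection flow (keeping the last format is purely illustrative):

```objc
AVCaptureDevice *device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
AVCaptureDeviceFormat *chosenFormat = nil;

for (AVCaptureDeviceFormat *format in device.formats) {
    // The CMFormatDescription carries dimensions, codec type, and any extensions.
    CMFormatDescriptionRef description = format.formatDescription;
    NSLog(@"media type %@: %@", format.mediaType, description);

    for (AVFrameRateRange *range in format.videoSupportedFrameRateRanges)
        NSLog(@"  supports %g - %g fps", range.minFrameRate, range.maxFrameRate);

    chosenFormat = format;   // illustrative: remember the last one we saw
}

// Selecting a format requires the configuration lock.
NSError *error = nil;
if (chosenFormat && [device lockForConfiguration:&error]) {
    device.activeFormat = chosenFormat;
    [device unlockForConfiguration];   // release the lock when you're done configuring
}
```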

Now let's go through the important AV capture device concepts, some of which I skimmed over in the demo. It allows you to set the format and frame rate. But not all devices expose formats and frame rates. Let me give you an example. An HDV camera or a DV camera can't let you set the format, because the format is dictated by what's on the tape. So you must accept whatever format you get. So when you ask it for its list of formats and frame rates, it doesn't have one.

AV Capture Session will try to configure the devices automatically for you so that you get the best input for your desired output. And AV capture devices are shared across the system, so be aware that when you ask for a particular input from the device, you may be affecting other apps that are sharing that camera at the same time, like Photo Booth or iChat.

The last one in always wins. Unless someone is holding that lock for configuration, the last person to ask for the device to change formats always wins. You can use lock for configuration to gain exclusive control, but please be judicious with that use and unlock for configuration after you're done configuring it to be a good OS X citizen. Lastly, since devices may still be in use by other processes, you need to code defensively. Don't assume that you'll be able to lock for configuration successfully. You may not be able to because it's already locked in another app.

Switching cameras is equally trivial in AV Foundation. What we did not want is for people to have to write a lot of code where they stop a session, do a lot of configuration, and then start the session again, because then they would have to pay the penalty for all the time it takes to stop, and then all the time it takes to restart.

So instead, we encourage a model where you keep things running all the time, and then you reconfigure while running. And you do that by using begin configuration and commit configuration, which are methods on the AV Capture session. Here's what the code would look like. So instead of stopping the session to remove one camera and then add another camera, you see here I begin configuration, then I remove the input that I don't want, add the input that I do want, and commit. Only when the last commit pops off the stack do I get the behavior where it performs all of my operations at once.
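As a minimal sketch (currentInput and newInput are hypothetical AVCaptureDeviceInputs):

```objc
// Reconfigure while running; nothing takes effect until the matching commit.
[session beginConfiguration];
[session removeInput:currentInput];   // the camera we no longer want
[session addInput:newInput];          // the camera we're switching to
[session commitConfiguration];        // all changes are applied together here
```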

Movie recording considerations. This is how you do a movie recording with AV Foundation. You initiate a QuickTime movie recording by supplying a file URL and a delegate. And in that recording delegate, only one method is mandatory. You have to tell us what to do when it finishes. So you must implement this one delegate callback, did finish recording to output file at URL, in which you will handle the success or failure of the movie being recorded.
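Roughly, the two pieces look like this: the call that starts the recording, and the one mandatory delegate method (movieURL is an illustrative destination, and self is assumed to adopt AVCaptureFileOutputRecordingDelegate):

```objc
NSURL *movieURL = [NSURL fileURLWithPath:@"/tmp/recording.mov"];
[movieOutput startRecordingToOutputFileURL:movieURL recordingDelegate:self];

// The only required AVCaptureFileOutputRecordingDelegate method.
- (void)captureOutput:(AVCaptureFileOutput *)captureOutput
didFinishRecordingToOutputFileAtURL:(NSURL *)outputFileURL
      fromConnections:(NSArray *)connections
                error:(NSError *)error
{
    if (error)
        NSLog(@"Recording finished with error: %@", error);   // may still be usable; see below
    else
        NSLog(@"Recording finished at %@", outputFileURL);
}
```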

This is a new feature in Lion only. AV Capture movie file output supports frame accurate start and stop using a new delegate that does not exist on iOS called the AV Capture File Output Delegate. This delegate receives frames all the time when you're recording or when you're not recording. So every single buffer that would conceivably be written to the file, you get to look at it.

And so based on this one buffer, you can look at its metadata or process it or figure out, does this have the time code that I'm looking for? And start a capture from within that callback. And you are guaranteed to have the movie start exactly on that frame boundary. You can also stop, pause, or resume within that callback. So it is possible to get exactly frame accurate recordings on frame boundaries.

It is, of course, optional. If you just need a simple start and stop, you don't need to use this delegate method. Likewise, we support pause and resume on Lion. You can get the frame accurate behavior by using that same capture output, did output sample buffer from connection call in which you would call pause recording or resume recording.
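A sketch of that frame-accurate start, assuming self also adopts the Lion-only AVCaptureFileOutputDelegate protocol and frameIsInteresting() is a hypothetical test (for example, a time code match):

```objc
movieOutput.delegate = self;   // separate from the recording delegate

- (void)captureOutput:(AVCaptureFileOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // Every buffer that could be written to the file passes through here,
    // whether or not a recording is in progress.
    if (![captureOutput isRecording] && frameIsInteresting(sampleBuffer)) {
        // Recording is guaranteed to start exactly on this frame boundary.
        [captureOutput startRecordingToOutputFileURL:movieURL recordingDelegate:self];
    }
    // [captureOutput pauseRecording] and [captureOutput resumeRecording]
    // are similarly frame accurate when called from within this callback.
}
```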

You can also set limits with the AV Capture movie file output. Set max recorded duration, file size, or free disk space limit. If you set any of those parameters, then your callback might be called spontaneously when one of those limits is hit. Now you need to take care here. If you get an error in your did finish recording to output file at URL, that doesn't necessarily mean that the file is no good.

See what I've highlighted here? You check the error and its user info dictionary because it will come back with an error if one of your conditions was met, such as the file size limit was exceeded. It will tell you that in the error, but by looking at the user info dictionary, you can find out whether the recording finished successfully.
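A hedged sketch of those limits and of the check described above (movieOutput and the error from the finish callback are assumed from the earlier snippets):

```objc
// Optional limits; hitting any of them ends the recording early.
movieOutput.maxRecordedDuration   = CMTimeMakeWithSeconds(30 * 60, 600);  // 30 minutes
movieOutput.maxRecordedFileSize   = 2LL * 1024 * 1024 * 1024;             // 2 GB
movieOutput.minFreeDiskSpaceLimit = 500LL * 1024 * 1024;                  // keep 500 MB free

// Inside didFinishRecordingToOutputFileAtURL:..., an error does not always mean a bad file.
BOOL recordedSuccessfully = YES;
if (error) {
    NSNumber *finished =
        [[error userInfo] objectForKey:AVErrorRecordingSuccessfullyFinishedKey];
    if (finished)
        recordedSuccessfully = [finished boolValue];
}
```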

Early termination conditions are enumerated here. The disk may have filled up, or a device might have gotten disconnected because someone kicked the cord, or duration reached or file size reached. Metadata. We allow you to insert custom metadata into your movies, and unlike properties that need to be set before you start the recording, you can set movie-level metadata at any time while the recording is in progress. We did this because we recognize that some metadata is not ready to go when you start the recording, such as GPS location where it can be slow to come in.

So if at any time while you're recording a movie, you set the movie file output's metadata property, it will still wind up in the movie. We reserve space for it. So there you see I'm setting the location metadata to a given latitude and longitude, adding it to the metadata array and then adding it to the movie.
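A minimal sketch of adding a location item while recording (the coordinate string is illustrative, in ISO 6709 form; movieOutput is assumed from the earlier snippets):

```objc
AVMutableMetadataItem *locationItem = [AVMutableMetadataItem metadataItem];
locationItem.keySpace = AVMetadataKeySpaceCommon;
locationItem.key      = AVMetadataCommonKeyLocation;
locationItem.value    = @"+37.3318-122.0312/";   // latitude / longitude

// Movie-level metadata can be set at any time while the recording is in progress.
movieOutput.metadata = [NSArray arrayWithObject:locationItem];
```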

Movie fragments are a really cool technology that help with crash protection. Here's how we record QuickTime movies. Normally a QuickTime movie that you would find, say, on the web will look like this. It's got a movie header at the top. That's where it has all the information about samples, formats, where they are located in the rest of the movie. And then the big blue part is the actual data. So you need the orange part at the top to tell you where to find the sample offsets in that blue part. That's a well-formed, fast-start QuickTime movie.

For the capture case, we can't do that because we don't know how long the recording is going to be up front. We allow you to start writing to disk before we know how long you're going to record. So we actually put the movie header, in this case a footer, at the back of the file and the movie data is at the front. You see the problem inherent with this strategy. If you crash in the middle before we have a chance to write the movie header, you have a big file that's useless.

So new in AV Foundation, we have QuickTime movies with what's called movie fragments. And the default is to have movie fragments on. We write a small header at the top that accounts for the first N seconds. By default, it's 10. And then from there on out, every 10 seconds, we'll lay down a movie fragment, which are those little Fs. And they record the information about the movie up to that point. So if at any point you crash, you're good up to the last point where a fragment was written.

If you have a two hour recording and at two hours and one minute someone kicks the cord, the movie is still good. So do use movie fragments. It gives you crash protection. And you are able to specify the movie fragment interval at the movie file output.
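For example (a one-line sketch against the movieOutput from earlier):

```objc
// Fragments are on by default, written every 10 seconds; the interval is adjustable.
movieOutput.movieFragmentInterval = CMTimeMakeWithSeconds(5, 600);
```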

Let's move on to our second case, which is AV Screen Shack, capturing the screen to a QuickTime movie. AV Screen Shack is an equally simple application that does little screen recordings. Ah, the eternal tunnel. I love this. This is the video equivalent of an audio feedback loop. Don't look into the tunnel, it goes back forever. You'll never get out.

Yes, it is sort of like Inception. So you can see here that I'm recording the full screen output. As I move the preview around, you can see what kind of latency we're getting, and that's because I've set a max frame rate of 15. This app exposes some of the properties that you can play around with with the AV capture screen input.

I'm going to do a crop rect here by pushing the set button. When I do that, I can specify just a subsection of the screen. I'll select the top left portion. And now what I'm going to be recording is only that little bit there. So when I move my mouse around up here, you'll see I'm just recording that section.

I'm going to go ahead and bump up the max frame rate to something like 35. And I'll say capture mouse clicks and then push the start button. So we see that we are capturing a movie over here. When I go up here and I move it around, or when I click, you'll see there's a little black circle drawn around the mouse.

That's to indicate that it's capturing mouse clicks. And when I stop, it'll go ahead and open QuickTime Player and show me the movie that I captured. And indeed, it did capture a movie to the specified format, and we can see it moving the window around when it gets there. There we go. Okay. We'll go back to slides and see how we did that.

AV Screen Shack starts with a single input, which is the screen. It previews the output from it to a video preview layer and writes it out to a QuickTime movie. So a very simple application that's available right now. You have instead of a device with a device input, it's just an AV capture screen input and a movie file output and the preview layer.

So here are some of the features that you saw me playing around with. Fast frame grabbing. It supports up to 60 frames per second on recent hardware. It also does efficient color space conversion to 2vuy for video applications. So by default, it will be producing 2vuy data. It also respects protected content, so if you have DRM content displaying on the screen, it knows to black that rectangle out.

Usage. It grabs frames from a specified CG direct display ID. When you create the AV capture screen input, you tell it which display you want to capture from. It does not, however, support capturing across two displays at once. You can use setCropRect to specify just a subsection of the display.

Set scale factor to capture and scale. If you want, say, the whole screen but you don't want a screen-sized movie, you can scale down and it will preserve the aspect ratio. You can use set min frame duration to adjust the max frame rate that it will deliver. It's not like a capture device where it has set frame rates that it can support, so you need to tell it what kind of frame rate you want. Also, you can set captures mouse clicks to draw a mouse ring around the mouse.
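A sketch of configuring the screen input with the knobs just listed (the crop rectangle, scale factor, and frame rate values are illustrative; session is assumed from the earlier snippets):

```objc
#import <CoreGraphics/CoreGraphics.h>

// Lion-only: capture frames from a specific display.
AVCaptureScreenInput *screenInput =
    [[AVCaptureScreenInput alloc] initWithDisplayID:CGMainDisplayID()];

screenInput.cropRect = CGRectMake(0.0, 0.0, 640.0, 480.0);   // just a subsection of the display
screenInput.scaleFactor = 0.5;                               // scale down, aspect preserved
screenInput.minFrameDuration = CMTimeMake(1, 35);            // cap delivery at 35 fps
screenInput.capturesMouseClicks = YES;                       // draw a ring around mouse clicks

if ([session canAddInput:screenInput])
    [session addInput:screenInput];
```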

All right, on to our third case, which is processing frames from the camera. And to do this, we're going to use a demo called Stop and Go. OK, Stop and Go. I'm actually going to show you code. I hope you're not afraid. Never done code on stage from Xcode. Let's see if it works. Stop and Go is an app that does stop motion animations. So first I'll give you a preview of the UI that we're creating here.

So we get, again, a little window that lets us start to take a movie. And when we start, it asks us where we want to record it to. And then whenever I take a picture, it's going to record that picture into the movie. So you can have fun with it like this.

Stop. And then the movie we wind up with... looks funny. Okay, so how did we do that? You'd be surprised how few lines of code it actually requires. We set up AV Capture by selecting a video device and making the input. In this case, I could have chosen video from either a MUX input or a video input. I said, choose a device that has the video type of input so that we'll only consider the front-facing FaceTime HD camera. And then I made an input for it and added it to the session.

Then I created the still image output and made my preview layer, telling it what kind of aspect ratio to use by setting the video gravity and setting the frame, and then started the session running. That's it for the input side. Now, I'm not capturing to an AVCapture movie file output because I want precise control over when the frames get written. I only want a still image per video frame in the movie.

So instead, I'm using something called an asset writer to do the writing. The only tricky part of this application is that I restamp the video. When I get each buffer from the still image output by calling capture still image asynchronously from connection, I get a CM sample buffer with the image data in it. And it is timestamped with exactly the time in which it was captured.

But that's no good for my movie, which I want to follow along on a different timeline. So I create a copy of it using CM sample buffer create copy with new timing, which is not a deep copy. It's just making a new wrapper with new timing. And I restart the timing at zero. And I increment it by whatever the frame rate is supposed to be for the given movie. And then I call AV Asset Writer Append Sample Buffer for each video frame that I want to append.
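A rough sketch of that retiming step (stillBuffer, writerInput, and nextPTS are assumed from the surrounding app code; nextPTS starts at kCMTimeZero):

```objc
// Give the captured still image a new timestamp on the movie's own timeline.
CMSampleTimingInfo timing;
timing.duration              = CMTimeMake(1, 5);   // e.g. five stop-motion frames per second
timing.presentationTimeStamp = nextPTS;
timing.decodeTimeStamp       = kCMTimeInvalid;

// Not a deep copy: a new wrapper around the same data with new timing.
CMSampleBufferRef retimed = NULL;
if (CMSampleBufferCreateCopyWithNewTiming(kCFAllocatorDefault, stillBuffer,
                                          1, &timing, &retimed) == noErr) {
    [writerInput appendSampleBuffer:retimed];      // AVAssetWriterInput
    CFRelease(retimed);
    nextPTS = CMTimeAdd(nextPTS, timing.duration); // advance the movie timeline
}
```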

The rest of the code in the app is just some tear down of the asset writer and the UI to start and stop. My kids have been having a field day with this application over the last few days. They're my beta testers. Here's one of the movies that they made.

And here's my favorite. You guess whether a boy or girl did this one. Hmm. So, fun with stop motion animation. Now I think would be an appropriate time to talk about CM sample buffers since we were using one of those low-level heavy lifting kind of objects. CM Sample Buffer, again, is defined in Core Media, CMSampleBuffer.h.

It's a reference-counted core foundation object, which contains first and foremost the sample data. If it's an uncompressed video frame, you can get at that CV pixel buffer by calling get image buffer, CM sample buffer get image buffer. And once you have that pixel buffer, you can lock the base address and start actually looking at the pixels in that CV pixel buffer.

It also has timing information, and that's what I was manipulating to get the stop motion effect. Every CM sample buffer has a presentation time, and it might optionally have a decode time if the presentation and decode times are different. And it might also have a duration.

And importantly, it has a format description which travels along with it. This is another Core Foundation reference counted object. So it's very cool that every buffer that flows through the system always has the format attached to it in a lightweight fashion. It's just a reference count being bumped. From that format description, you can find out things about the video. For instance, its dimensions.
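A sketch of pulling those pieces out of a buffer (sampleBuffer is assumed to come from a capture callback and to hold uncompressed video):

```objc
// The sample data: for uncompressed video, a CVPixelBuffer you can lock and inspect.
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
if (pixelBuffer) {
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    void  *baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer);
    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
    // ... look at the pixels here ...
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
}

// Timing information.
CMTime presentationTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
CMTime duration         = CMSampleBufferGetDuration(sampleBuffer);

// The format description that travels with every buffer.
CMFormatDescriptionRef description = CMSampleBufferGetFormatDescription(sampleBuffer);
CMVideoDimensions dimensions = CMVideoFormatDescriptionGetDimensions(description);
```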

Also traveling along with the sample buffer will be interesting metadata. Metadata about the camera, processing instructions about the CM sample buffer, and you can attach your own metadata to a CM sample buffer as it travels through a pipeline. Let's take an aside for a minute and talk about output settings. Because I want to make it very clear when and where format conversions happen within AV Capture.

All file and data outputs in Lion support customized output settings. This is something new. In iOS, only the video data output and audio data output and still image output allow you to kind of sort of override the output. They'll let you select a different format. But in Lion, we let you really configure the output.

By default, the AV capture session is going to determine the baseline output settings for each output in your graph. It does that based on the session preset that you've selected. So for a given session preset, say, High, it's going to determine that that should mean maybe 3.5 megabits per second H.264 for the video data output or for the movie file output.

You can set custom output settings on each output to override the session's session preset. Once you've done that, those output settings stick. So even if you change the capture session preset or the input device to something other than what it was configured for at the time you set the output settings, they will stick.

So even if you've set a higher quality on the input, we'll do additional format conversions in order to satisfy what your output is currently set for. If you want to get pass-through, you want to get exactly what the device is producing, you can set an empty dictionary of output settings on your output.

And to get back to the session presets default behavior, the baseline, you can set the output settings to nil. And then it will go back and start choosing by default what it thinks is best. Here's what video settings look like. Again, it's a dictionary-based approach. You give it a dictionary of stuff that you want it to apply to the output.

You must at least have a video codec specified or a CV pixel format if you're doing an uncompressed format. Here I chose H.264. You'll want a width and a height. Here I'm specifying 720p. And then optionally you can choose to include some compressor-specific properties. H.264 has many of them. Here I chose bit rate to specify 10.5 megabits per second and a keyframe interval to specify that I want it to insert keyframes at least every second.
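A sketch of a settings dictionary like the one on the slide, applied to the movie file output's video connection (movieOutput is assumed from earlier; the key frame interval is expressed in frames, so 30 approximates once per second at 30 fps):

```objc
NSDictionary *compressionProperties = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithInt:10500000], AVVideoAverageBitRateKey,       // ~10.5 Mbps
    [NSNumber numberWithInt:30],       AVVideoMaxKeyFrameIntervalKey,  // key frame every 30 frames
    nil];

NSDictionary *videoSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    AVVideoCodecH264,                  AVVideoCodecKey,
    [NSNumber numberWithInt:1280],     AVVideoWidthKey,
    [NSNumber numberWithInt:720],      AVVideoHeightKey,
    compressionProperties,             AVVideoCompressionPropertiesKey,
    AVVideoScalingModeResizeAspect,    AVVideoScalingModeKey,
    nil];

// Override the session preset's baseline for the output's video connection.
for (AVCaptureConnection *connection in [movieOutput connections]) {
    for (AVCaptureInputPort *port in [connection inputPorts]) {
        if ([[port mediaType] isEqualToString:AVMediaTypeVideo])
            [movieOutput setOutputSettings:videoSettings forConnection:connection];
    }
}
```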

The next part, the scaling mode, is something that we're going to talk about in a minute in more depth. Once you've created your dictionary of settings, you can call your output and say, "Set output settings," and now you've overridden the session's default. Let's talk about the supported video formats in AV Foundation. Of course, we support H.264, which is our canonical compressed format for all kinds of applications. We also support JPEG.

and ProRes in two flavors. The 4444 is a great format as a mezzanine format. It preserves high bit depth source up to 12 bits per channel. It has a mathematically lossless alpha channel and does no chroma subsampling. This is a perfect one if you want smaller files but really no quality loss. We also support Apple ProRes 422, which produces slightly smaller files than the 4444, but does do chroma subsampling. We also support lots and lots of CVPixel formats, so pretty much every uncompressed format that is supported by Core Video.

Now let's delve into these video scaling modes. This warrants some extra consideration. There are four video scaling modes supported at the output. The first one is called Fit. It crops the source processing region and scales down if necessary, always preserving aspect ratio, but it will never ever upscale. So this is a good one to use for like presets.

If you don't know exactly what the input is going to be, but you want to create a preset that works across a lot of different applications, the Fit mode is a good one to use. It is the default scaling mode for most of our capture session presets. Let me give you a concrete example.

When you're using the Fit mode, let's say you're coming from a 1280x720 source and you've specified a 640x640 output box, but you've also told it to use the Fit scaling mode. Well, it's going to preserve the aspect ratio and scale it down to fit inside that 640x640 box, and what you wind up with is 640x360. If you have a smaller input source like 320x240 and you specify 640x640 with Fit, again, it never does any scaling up, so you get exactly 320x240. This is why it's good for presets. If you know you don't want to upscale, you can use this mode.

The second one is called the resize mode, or what I call funhouse mode. It crops the source to remove any edge processing region, and then it scales the remainder to the destination box. So it's going to stretch it or squish it, whatever it has to do to fit within the box, not preserving aspect ratio. So again, if we take our guy with the dog and put him into 640 by 640, he's going to get squished. Likewise, if we have a 320x240 source, it will be scaled up and also slightly squished.

If you use the resize aspect mode, or what I call the letterbox mode, it will do aspect ratio preservation on the source and will fill the remainder of the destination box with black bars if it needs to. So let's take our guy again. It's going to preserve the aspect ratio, but it's going to honor that 640 by 640 box you wanted, so it's going to put black bars on top and bottom. Likewise with 320 by 240, since that doesn't line up exactly with a square, you're going to get some black bars on the sides.

The last one is aspect fill, or what I call the zoom mode. The zoom mode preserves source aspect ratio while scaling, but it really wants to show some video in the whole box that you've specified. So it's going to crop the picture to fit within the destination box. So if the aspect of source and destination are not exactly equal, you're going to lose some pixels around the edge. Here's what I mean by that. We're going to see all of that output box filled with video.

But since they don't match, we're going to lose some on the sides because it's going to have to crop it out. Likewise, if we have a 320x240 input source, it's going to scale up and make sure that we see video in every part of the box so we lose a little bit on the sides.
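For reference, the four modes correspond to the AVVideoScalingModeKey values in a video settings dictionary; here is the 640x640 Fit example from above as a sketch:

```objc
// AVVideoScalingModeFit              "fit": crop, scale down only, preserve aspect ratio
// AVVideoScalingModeResize           "funhouse": stretch or squish to fill the box
// AVVideoScalingModeResizeAspect     "letterbox": preserve aspect, pad with black bars
// AVVideoScalingModeResizeAspectFill "zoom": preserve aspect, crop to fill the box
NSDictionary *fitSettings = [NSDictionary dictionaryWithObjectsAndKeys:
    AVVideoCodecH264,              AVVideoCodecKey,
    [NSNumber numberWithInt:640],  AVVideoWidthKey,
    [NSNumber numberWithInt:640],  AVVideoHeightKey,
    AVVideoScalingModeFit,         AVVideoScalingModeKey,
    nil];
```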

Okay, it's time for our fourth use case, which is complex graphs, capture from multiple video devices simultaneously. And to demo that, I'm going to use an app called AV Video Wall. You might notice something peculiar about this sample app. It's different than all the other sample apps we've seen up to now because it's in all lowercase letters.

The reason for that is that it's a command line app. And this is another difference between Lion and iOS. AV Foundation has no upward dependencies on AppKit. It is perfectly legal and perfectly safe to make AV Foundation apps that are command line tools. We found it extremely useful to do so. And you could also do it for kiosks or things where you don't really need a UI, just some video or some processing.

AV Video Wall is going to make use of all the cameras that I have connected. I happen to have two cameras connected. So when I run it, it's going to find all the cameras that I have connected. And it's going to build a square for each camera, and it's going to mirror the previews, so it's building four previews in a square for each input device. Maybe I'll point this at something more interesting like me.

And once it's got all of these video preview layers, you can press the space bar to send them flying around the screen. So now we're doing mostly core animation effects here, but it is live video, so you can see I'm still talking, I'm still getting the frame rate that I was before.

And to send them back home, you press the spacebar again, and then they go back to where they were supposed to go. This is complex because we're using multiple input sources, multiple input devices, and we're going to multiple outputs. So this warrants some consideration. How are we doing this? Go back to slides.

The way we're doing this is through the magic of AV capture connections. AV capture connections are the glue that holds inputs and outputs together. I consider this to be sort of an advanced topic, so hang with me and hopefully it'll all make sense at the end. I've shown a lot of slides today that look like this. And this.

And this. And this. But what I've neglected to show you is the thing that's holding them all together, which is these little AV capture connections. Every arrow that you see on screen between an input and an output, that arrow is represented by an AV capture connection. It describes the connectivity between an input and an output or a video preview layer.

The purpose of AV Capture Connections is to, again, describe the connectivity between an input and an output. Each AV capture input has an array of AV capture input ports. So an input can give you more than one type of media. An AV Capture connection ties a specific input port to a specific AV Capture output or video preview layer.

Also, AV capture connections can do neat things like processing with the video or audio data along the way to the output. For video, AV capture connections let you manipulate the video that's delivered to the output by orientation, so you can rotate the buffers; mirroring, so mirror them across the vertical plane; deinterlacing; or frame rate limiting.

An audio AV capture connection lets you manipulate or monitor the audio data delivered to the output. So this is where you can get audio level meters. This is the audio traveling to the output. You can also enable or disable individual source audio channels. So let's say you have a pro audio device that delivers 24 channels of audio, and you only want the first two because you know that your microphones are only plugged into the first two channels. You can use the AV capture connection to disable all but the first two source audio channels. You can also adjust individual audio channel levels using AV capture connection.

Here's how you would use it as a status monitor to do audio level metering. You find the connection that you want and get its audio channels property. And then for each audio channel, get either its average power level or peak hold level. The average power levels tend to jump around and the peak hold levels stay at a peak before subsiding. Here's how you find an AV capture connection. They're owned by outputs. So an AV capture output implements a connections property.
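A sketch of that metering flow, against a hypothetical movieOutput:

```objc
for (AVCaptureConnection *connection in [movieOutput connections]) {
    for (AVCaptureAudioChannel *channel in [connection audioChannels]) {
        float average = [channel averagePowerLevel];   // jumps around with the signal
        float peak    = [channel peakHoldLevel];       // holds at a peak before subsiding
        NSLog(@"audio channel: average %.1f dB, peak hold %.1f dB", average, peak);
    }
}
```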

So you could, for instance, call movie file output connections to get its list of connections. And then once you have the connections, you can use its properties or iterate through them. An AV Capture video preview layer only has one connection. Now let's take the case that I was talking about with multiple inputs and multiple outputs. The default behavior that AV Capture uses is to greedily form connections implicitly between all eligible inputs and all eligible outputs. So let's say I have a session with no inputs or outputs, and then I add a video data output.

If I then add a video camera or a video device input, the session will look for connections that it can form. It will see that this guy produces video, this guy can accept video, and it will form a connection on behalf of the caller. So every time you add input or add output to an AV capture session, you're getting an AV capture connection implicitly.

Sometimes this breaks down though in the desktop world where we have more complex scenarios than we do in iOS. Take this DV camera here. This DV camera produces a MUXed input stream, and only once it is housed in an AV capture device input can that MUX stream be separated into its component parts.

Which are video, audio, and closed captions. So if I have multiple devices and multiple outputs, it's not clear what AV Capture Session should do, which input it should try to tie to which output. And in a lot of cases you might get undesirable results because it implicitly tried to make connections everywhere it could and connected the wrong input to the wrong output. So for complex situations like the AV Video wall that I just showed you, we have some additional APIs to help you out.

Power users can avoid these implicit connections by calling session add input with no connections. When you do that, the input will be added to the session, but no implicit connections are added on your behalf. Same goes for add output with no connections or preview layer set session with no connections.

If you've done that, the input or output is not going to provide any data because there is no connection describing the connectivity. So you're going to have to manually create a connection between the desired input ports and output or preview layer.
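A sketch of that manual wiring, with a hypothetical dvInput (the device input for a muxed DV device) and movieOutput:

```objc
// Add the pieces with no implicit connections formed on our behalf.
[session addInputWithNoConnections:dvInput];
[session addOutputWithNoConnections:movieOutput];

// Pick the specific input port we care about: the video port of the muxed device.
AVCaptureInputPort *videoPort = nil;
for (AVCaptureInputPort *port in [dvInput ports]) {
    if ([[port mediaType] isEqualToString:AVMediaTypeVideo]) {
        videoPort = port;
        break;
    }
}

// Describe the connectivity ourselves and add it to the session.
if (videoPort) {
    AVCaptureConnection *connection =
        [AVCaptureConnection connectionWithInputPorts:[NSArray arrayWithObject:videoPort]
                                               output:movieOutput];
    if ([session canAddConnection:connection])
        [session addConnection:connection];
}
```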

So in summary, use AV Foundation Capture for all new development. If your app is Lion and forward, you absolutely should be using AV Foundation. AV Foundation Capture in Lion provides more features, more functionality, new classes that are specific to the desktop, including screen grabbing and enhanced device control, and more flexible output with the additional kinds of output I described. Complex session support is also included: you can have multiple sessions in your process or one session with a very complex input and output graph. For more information, contact Eric Verschen, who is our Media Technologies Evangelist, and look online for the documentation in the AV Foundation Programming Guide.

Also, the developer forums, devforums.apple.com, are a great place to go with questions. Sometimes we engineers, although we're not paid to, do lurk there and answer questions from time to time. Related sessions have mostly passed already, but they'll be great to watch on the iTunes feed after the show. There is one remaining iOS-related capture session, and that's right after this one in about 15 minutes. We welcome you to stay for that one. Thank you for coming. I hope you have a great rest of the show.

And there is one more thing. Since we have a couple minutes remaining, I thought we'd take this opportunity to show you something we cooked up. The QuickTime team for years and years had a tradition of making a stupid movie that we show at WWDC, and this year was no different. So without further ado, here is this year's stupid movie.

Hey everybody, I've got great news! What is it this time? Our omnomnomnom kitten of the day video app has just sold a billion copies! I was able to buy my 40th car, but now I need a second car to house it in. So I have decided that you need to support our omnomnomnomkitten video of the day app, Back to the Mac.

[Transcript missing]

You better get started now. Okay, you heard him.

[Transcript missing]