
WWDC11 • Session 419

Capturing from the Camera using AV Foundation on iOS 5

Graphics, Media, and Games • iOS • 56:34

AV Foundation provides your application with full programmatic access to video and still images from the built-in cameras. Learn what's new in iOS 5 including exciting API enhancements and performance improvements that allow you to leverage the GPU and CPU to create immersive, performant augmented reality apps using the camera.

Speakers: Brandon Corey, Brad Ford

Unlisted on Apple Developer site

Downloads from Apple

HD Video (279.4 MB)

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Good afternoon and welcome to Capturing from... Thank you. Yes, I can push a button. Capturing from the camera using AV Foundation on iOS 5. What you're about to hear is not a repeat of last year's presentation. This is all new stuff, so it's worthwhile being here. What are you going to learn about? Which iOS 5 capture APIs to use in your app? There are several choices.

The AV Foundation capture programming model. We'll just discuss it briefly. And we'll delve deeply into iOS 5 performance improvements we've made in AV Foundation, which I think you're really going to like because they're going to equal more performance for your apps. Finally, we'll talk about some API enhancements that are not strictly performance related.

For this session, we have four pieces of sample code. And before the session, I went and looked and I think two, maybe three of them were already published and there was a holdup on one of them. But if it's not up there by now, check tomorrow and all four of those should be ready to go.

First, let's take a look at the overview of our technology framework and where we sit. You'll notice the green ones; these refer to the frameworks that are below the UI level. AV Foundation, the focus of our talk today, sits below that line. But at the top, you have some other choices, namely UIKit's UIImagePickerController. Apple's Camera app is built on that technology, and UIImagePickerController itself is built on top of AV Foundation.

UIImagePickerController offers simple programmatic access to the camera: API for high, medium, or low quality recording, and a hideable camera controls UI. So if you're familiar with the Camera look and feel with the control bar at the bottom, that's what you'll be getting when you use this view. And if all you need is a simple view controller, this is the API for you to use because you get a lot of bang for your buck.

You can also hide that Camera Controls UI if you want to build your own buttons and do simple things like take a picture or start video capture, which you get programmatic access to. Or if you want to set the camera device, that is switch cameras between back and front, or control whether the flash should fire or not.

You can also touch on the view that's managed by the UIImagePickerController so that you get the touch-to-focus and touch-to-expose behavior that the Camera app has, and you get all of that for free. Also, new in iOS 5, if you touch and hold on that UIImagePickerController view, it will lock and hold AE or AF.

So given that there's all of that power available for very little code, why might you want to use AV Foundation for capture? Again, the lower level framework. If you need full access to the camera, you'll want to use AV Foundation because of the additional features that it provides you.

You can independently control focus, exposure, and white balance controls, independently lock any of those things, and select independent points of interest for focus or exposure. So for instance, if you want to expose on one section of the frame while focusing on another, you have the power to do that, whereas in the UI image picker controller, focus and exposure are always tied to the same point. You also get access to video frame data in your process. These are video frames with accurate timestamps telling you exactly when the frames came in on the bus.

You also get per-frame metadata that tells you things about the images that are captured, like their exposure levels, for instance. And you can configure the output format so that instead of getting just the native pixel format, you can get a different pixel format that works better for your app, such as BGRA.

You also can configure the frame rate of the camera. So if you don't need full frame rate, you have some control over how many frames per second you expect to get in your application. And finally, when getting access to video frame data in your app, you can configure the resolution so that if you don't need full 720p, you can get a lower resolution that works well for your processing.

Also, AV Foundation provides some flexible output. When capturing still images, you don't have to capture to JPEG. You can capture to uncompressed formats like YUV and RGB and insert your own EXIF metadata before writing to the camera roll. When doing QuickTime movie recording, you have some additional powerful features like inserting metadata into the movie or locking the orientation. So for instance, if you want your app to be landscape only and enforce that you only take landscape movies, you can set that orientation lock.

Also, you get a layer-based video preview, which you can insert into a core animation layer tree and get all of the benefits that core animation provides in its implicit and explicit animation. Also, you get some control with that layer about how it displays within its layer bounds, whether it fits, fills, or stretches. Finally, some ancillary properties like audio level metering. And of course, access to audio sample data.

So let's take a high-level look at the AV Foundation capture programming model. We view the world as inputs and outputs. If you've got a phone, it's got a camera on it, maybe two, and from those cameras you might want to do several things like see a real-time preview of what the camera is seeing and have it delivered to you as a core animation layer or get high-quality still images from the camera.

Or as we were just talking about getting video data, per frame video data into your application. Or perhaps simply write to a QuickTime movie. Likewise, all of our iOS 5 supported devices have built-in microphones from which you might want to process the audio data in real time or write the output to a QuickTime movie along with the video or separately.

This is what the world looks like to AV Capture. You have AV Capture inputs at the top level. They're the ones that provide the data. You have an AV Capture session in the middle. The session is the guy that gets inputs added to it and outputs added to it. And it controls the flow of data such that even if you've added inputs and outputs to the session, nobody delivers data or consumes data until you tell it to start running. AV capture session is also the place that tells you if there were any runtime errors.

On the output side, we have some concrete subclasses of the AV Capture Output superclass: still image output, video data output, audio data output, and movie file output. The video preview is a little bit different. I drew it in orange there to highlight that it does not descend from AV Capture Output like the rest of the classes do. It is a subclass of CALayer, so it can be inserted into a Core Animation layer tree.

Also, the ownership model is such that the session owns its outputs, but it does not own its layer. The layer owns it. So if you want to have a layer that you insert into a view hierarchy, you can attach a session to it and forget about it. And when the layer tree disposes of itself, it will clean up the session as well.

We're going to cover four capture use cases today. And we'll take these one by one. The first of which is processing video frames from the camera and rendering with OpenGL. I hope you like augmented reality apps because we're going to do a lot of that today. So let me call up Sylvain Neuse who's going to help me with a chroma key demo.

And Sylvain is going to find me on the screen. And he's going to touch on the interface to sample a color. And when he samples that color, he's also going to expose and do a white balance and then lock. So that he's now locking-- there's no focus on the iPad 2, but if there were focus, it would lock focus as well. Locking exposure, white balance, and focus. And then he goes and selects a background image and uses what he just found as a background substitution to do a chroma key.

And using this, he's able to substitute a background image. And you'll see the buttons at the bottom. He can snap a picture or record a movie. But let's also take a look at the options on the left. He's got a fast button. Don't you wish your app had a fast button? Let's turn fast off.

Okay, so when Fast is off, let's look at the frame rate up at the top. His fps is hovering around 20 right now. He's doing 720p and he's doing a lot of heavy work in OpenGL. He has a shader running that's doing the background substitution. Okay, let's turn the Fast button back on.

And now we go back up to 25. We're at 25 because we're in low light. It would actually go up to 30 if we had a little more light, since we're letting the device throttle down to a reasonable frame rate for low light. So he's able to get 30 frames per second capture with no dropped frames doing some pretty heavy OpenGL work due to the fast button. And the fast button is all about improvements that we've made in iOS 5. Thank you, Sylvain. All right, so let's talk about what we just did there. We started with an iPad and used the back camera.

and captured video frames using that video data output because we wanted the frames into the client process. And then everything after that point happened in custom client code, processing and rendering with OpenGL, including the preview that you saw. AV Foundation was not handling that preview at all. So our involvement as far as AV Foundation is concerned, we had a device and an input and a session and a video data output delivering sample buffers and that was it. Then we got out of the way.

To create an AV capture session, you simply alloc and init it and set a session preset to determine the baseline quality that will be delivered to your outputs. For this demo, Sylvain chose 1280 by 720 so we could highlight the kinds of performance that we're getting with these improvements in iOS 5 at a very high resolution. He then found a suitable AV capture device. He chose the default. The default device on all iOS devices is the back camera. He then created and added the device input. He associated a device input with the device and then called addInput on the session.

Then he was done with the input side. For the video output side, he alloc'd and init'd an AVCaptureVideoDataOutput, added it to the session, and then performed some minimal configuration before starting the session. He set the output format to BGRA since that's a format that OpenGL likes.

And then set a delegate on the video data output. Our AV Capture data outputs have fancy delegates. So it's not just a delegate callback, it's a delegate with a queue. So you call setSampleBufferDelegate:queue:. And with that queue, you're telling it, when you call me back with video frames, I want them to be delivered on this dispatch queue. So you have some control over where and how these frames come to you by specifying the queue on which they are delivered to your process. Then he called startRunning on the session and he was up and going.
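
As a rough sketch of the setup just described, in Objective-C (the queue label and variable names are illustrative, not from the session):

    // Session with a 720p baseline quality.
    AVCaptureSession *session = [[AVCaptureSession alloc] init];
    session.sessionPreset = AVCaptureSessionPreset1280x720;

    // Default video device is the back camera; wrap it in a device input.
    AVCaptureDevice *camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    NSError *error = nil;
    AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
    if (input)
        [session addInput:input];

    // Video data output delivering BGRA frames to a serial queue.
    // self is assumed to conform to AVCaptureVideoDataOutputSampleBufferDelegate.
    AVCaptureVideoDataOutput *videoOutput = [[AVCaptureVideoDataOutput alloc] init];
    videoOutput.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA) };
    dispatch_queue_t videoQueue = dispatch_queue_create("com.example.videoQueue", DISPATCH_QUEUE_SERIAL);
    [videoOutput setSampleBufferDelegate:self queue:videoQueue];
    [session addOutput:videoOutput];

    [session startRunning];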

Inside the delegate callback, for each video frame, you're delivered a captureOutput:didOutputSampleBuffer:fromConnection: call in which you're free to do whatever processing you need. I also mentioned that when he touched on the screen, he was sampling a pixel, and before he did that, he was performing an auto exposure and then locking.
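
For reference, that per-frame delegate method has this shape; the body here is just a placeholder for whatever processing your app does:

    // Called once per frame on the queue passed to setSampleBufferDelegate:queue:.
    - (void)captureOutput:(AVCaptureOutput *)captureOutput
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
           fromConnection:(AVCaptureConnection *)connection
    {
        CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
        // ... process pixelBuffer (for example, bind it to a GL texture) and return quickly ...
    }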

To lock auto exposure, you first need to lock the device for configuration and then you can set any of the properties on the device such as focus mode, exposure mode, white balance mode. And when you're done, don't forget to unlock the device for configuration. Now to talk about the Fast Button and what exactly was going on there and the great improvements we've made with bridging between Core Video and OpenGL, I'd like to invite up Brandon Corey to talk about that.
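
Before Brandon takes over, here is roughly what that lock/configure/unlock sequence looks like in code; the specific locked modes are just for illustration:

    NSError *error = nil;
    if ([camera lockForConfiguration:&error]) {
        if ([camera isExposureModeSupported:AVCaptureExposureModeLocked])
            camera.exposureMode = AVCaptureExposureModeLocked;
        if ([camera isWhiteBalanceModeSupported:AVCaptureWhiteBalanceModeLocked])
            camera.whiteBalanceMode = AVCaptureWhiteBalanceModeLocked;
        if ([camera isFocusModeSupported:AVCaptureFocusModeLocked])
            camera.focusMode = AVCaptureFocusModeLocked;
        [camera unlockForConfiguration];   // don't forget to unlock when you're done
    }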

So I'm here to talk about the fast button. To make things faster, to allow people to interact with the GPU better for capture and for all of our AV Foundation APIs, we added a new API called CVOpenGLESTextureCache. Now, the intent of this was to provide a CF type that allows clients to bridge Core Video pixel buffers to OpenGL ES textures.

And the idea is to avoid copies to and from the GPU. So for something like that 720p video, if you were doing BGRA and you wanted to get that data in and out of the GPU, you know, you're talking in the neighborhood of 220 megabytes a second of copying, not to mention twiddling data back and forth.

And, you know, it's a significant overhead that we like to avoid if at all possible. The other thing we like to do is allow you to recycle textures so that GL doesn't have to spend so much time maintaining its state every time you create and use new textures.

I wanted to point out that this is only supported in OpenGL ES 2.0 and later, so you can use it with all your fun shaders, but you will have to use 2.0 and not the older ES 1.1, which is kind of a compatibility mode. And all the API is contained in the header up there, CoreVideo/CVOpenGLESTextureCache.h.

So this is what the API itself looks like. This is the main API that you'll call once you've created the texture cache to bind an actual pixel buffer to an OpenGL ES texture. So to kind of go through this, we have a CVOpenGLESTextureCacheRef, which is one you'll have previously created. Then we have here a pixel buffer, which is the one you would be getting from your output delegate callback in the capture case.

[Transcript missing]

So kind of an overview here, the standard texture binding that you would use with OpenGL. You have a CV pixel buffer here.

And OpenGL contains these textures, texture objects which are backed by texture images that you would get if you use glGenTextures. So normally to upload data to a texture, you'd create your texture. You would get the base address from your pixel buffer. And notice that's a void star there.

It's essentially raw data that you're getting. And you're calling glTexImage2D with your appropriate parameters. And at the end there, the actual pixel data itself. And what happens here is we do the equivalent of a mem copy with some twiddling and such to get this data into the texture image, which OpenGL ES sources from.

Now for the CVOpenGLES texture binding, the idea here is you have the equivalent extremely long function name, CVOpenGLESTextureCacheCreateTextureFromImage, with all the very similar parameters there. But you'll notice here that we actually pass in the pixel buffer itself and not the pointer to the raw data. And we're kind of moving this model so when you call in with the pixel buffer, you'll end up getting back a texture that you can then point OpenGL ES at. And what this does is it allows us to avoid this copy by binding that CV pixel buffer directly into the texture object.
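
A minimal sketch of that call path, assuming a 720p BGRA pixel buffer from the delegate callback and an existing OpenGL ES 2.0 context (eaglContext and textureCache are placeholder names):

    // One-time setup: create the texture cache against your EAGL context.
    CVOpenGLESTextureCacheRef textureCache = NULL;
    CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &textureCache);

    // Per frame: wrap the capture pixel buffer in a GL texture without copying.
    CVOpenGLESTextureRef texture = NULL;
    CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
                                                 pixelBuffer,        // from the delegate callback
                                                 NULL,
                                                 GL_TEXTURE_2D, GL_RGBA,
                                                 1280, 720,
                                                 GL_BGRA, GL_UNSIGNED_BYTE,
                                                 0,                  // plane index
                                                 &texture);
    glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

    // When the frame is done, release the texture and periodically flush the cache.
    CFRelease(texture);
    CVOpenGLESTextureCacheFlush(textureCache, 0);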

So to get, that's all fine if you just want to get your data to the screen and you just want to do some processing before there, but what if you want to get your data back out? Now normally with OpenGL, for those who aren't familiar with this, we use what's called a frame buffer object to encapsulate what OpenGL draws into. Now normally your frame buffer object can be your screen, but if you want to get the data back out, you can do this in one of two ways.

You can create and attach a render buffer, which is essentially a data-backed buffer that GL can write into, or you can create and attach a texture to that frame buffer object. And that's usually used for something like if you wanted to feed that back in for another pass of rendering. And to get the data back out, you would then use glReadPixels in the case of a render buffer or glTexImage2D if you wanted to move the data from one texture to another.

And these are all attached with this GL color attachment as part of the frame buffer object. Note, the diagram there shows multiple color attachments and our implementation supports the first, but you really only need one in this case. So normally here you have the OpenGL ES renderer and you'd have the equivalent of a texture or render buffer object.

And what you would do is you would render into this texture and then you would use glReadPixels to pull the data back out of this texture into your buffer. But again, that's going to incur the equivalent of, or worse than, a mem copy because the data will have to be pulled back out and also twiddled. And in this case, you can have OpenGL render directly into your IOSurface-backed pixel buffer.

And you can completely avoid that glReadPixels call. And this is a similar binding to attach a texture to a frame buffer object, where you would create the image in the very same way you would for the input case. But when you get that texture back, you can bind it to the frame buffer and attach it to the actual frame buffer object here. And it would be the same as, or very similar to, having GL render into a texture that it created. But in this case, it's rendering directly into your buffer.

So a couple usage notes here. We kind of have a special buffer type to handle this. And all the buffers from AV Capture and AV Asset Reader are created with the appropriate formatting. So if you use either of those, you don't really need to do anything. If you create your own pools, however, you must pass the kCVPixelBufferIOSurfacePropertiesKey key as a pixel buffer attribute when you create your buffer or your buffer pool.

And that'll make sure that everything's formatted correctly so we can apply an efficient attachment and get that fast speed that you'd like. And if you use AV Asset Writer, be sure to use the pixel buffer pool it provides. Not only will it give you this particular backing that you need in this case, but on top of that, there could be other alignment issues and such that the encoder might need. So it's always a good idea to use that whenever possible.
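
If you do create your own pool, the attribute in question looks something like this sketch (the dimensions and format are example values):

    // Request IOSurface-backed buffers so the texture cache can bind them directly.
    NSDictionary *pixelBufferAttributes = @{
        (id)kCVPixelBufferPixelFormatTypeKey     : @(kCVPixelFormatType_32BGRA),
        (id)kCVPixelBufferWidthKey               : @1280,
        (id)kCVPixelBufferHeightKey              : @720,
        (id)kCVPixelBufferIOSurfacePropertiesKey : @{}   // an empty dictionary is enough
    };

    CVPixelBufferPoolRef pool = NULL;
    CVPixelBufferPoolCreate(kCFAllocatorDefault, NULL,
                            (__bridge CFDictionaryRef)pixelBufferAttributes, &pool);

    CVPixelBufferRef renderTarget = NULL;
    CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pool, &renderTarget);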

As I mentioned earlier, BGRA, 420v, and 420f are supported. And OpenGL ES also now supports GL_RED and GL_RG, which are the single- and two-channel render targets. So instead of rendering into BGRA, you can render using GL_RED and GL_RG. And I just wanted to mention that those are only available on the iPad 2, but they're definitely cool to play with. And with that, I'd like to turn it back over to Brad. Thank you.

So if you want an automatic 10 to 15 fps more in your process and you're using the GPU, you might want to check that API out. AV Capture Video Data Output has some peculiarities that I'd like to address. That sample buffer delegate queue that I talked about earlier, it needs to be a serial dispatch queue.

If you're familiar with GCD, you know that there are two kinds of queues. There's concurrent queues and serial queues. Well, video frames are coming in in a particular order and you want them delivered to your callback in that same order. If you use a concurrent queue, they might be delivered to you out of order and that would be hilarious.

So make sure that you use a serial dispatch queue and do not ever pass dispatch_get_current_queue(), because you may think you're on a serial queue but you don't really know if you use dispatch_get_current_queue(). So always use a known queue by making your own, which is of the serial type, or use dispatch_get_main_queue(), which is always a serial queue.

By default, we're going to give the buffers to you in the camera's most efficient format because we want you to get the most efficient pipeline possible and use the least amount of CPU possible. But if you need it in a different format, such as BGRA, which is not the default, you can use the videoSettings property to override that behavior. Both Core Graphics and OpenGL work very well with BGRA.

So here are some performance tips. You can also set the minFrameDuration property if you want to cap the max frame rate. Probably scratching your heads, that sounds backwards. But to get a frame rate, you take one over the duration. And in our APIs, we refer to things in terms of durations, not rates. So if you want a max frame rate of 15 frames a second, you need to tell us to set the min frame duration to 1/15.

Configure the session to output the lowest practical resolution. So if you only need 320 by 240 or something small, don't set it to session preset high because you're going to be getting way more pixels than you can process. Also, there's a property called alwaysDiscardsLateVideoFrames. Set this to YES, which is the default, if you want us to efficiently drop late video frames before they're even delivered to your process.

So if all you're doing is say preview, you definitely want to set this to yes because we won't bother trying to message frames to you into your process that you're too late for anyway. You might want to set it to no, however, if you're trying to write a movie file and late frames are okay.
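
Putting those tips together, the video data output configuration might look like this sketch (the medium preset and the 15 fps cap are arbitrary example values):

    session.sessionPreset = AVCaptureSessionPresetMedium;   // lowest practical resolution for your needs
    videoOutput.alwaysDiscardsLateVideoFrames = YES;        // drop late frames before they're delivered
    videoOutput.minFrameDuration = CMTimeMake(1, 15);       // min duration of 1/15 s caps you at 15 fps
    // (iOS 5 also exposes frame durations on the AVCaptureConnection, covered later in the talk.)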

Your sample buffer delegate callback must be fast. That's the main thing. Be as efficient as you can in your callback and don't waste time. Let's talk about the supported pixel formats for video data output. The default is 420v. If you're not familiar with this format, it is a bi-planar YUV 4:2:0 video format. So that means the luma and the chroma are in separate planes, separate sections of the buffer.

And the chroma is subsampled in both the horizontal and vertical direction. The V is for video range, which means the samples that you get are clamped to the range of 16 to 235. This is common in video processing so that you have some leeway in the 0 to 15 area for super black, super white.

And this is the default format on all iOS 5 supported cameras. We also have a variant of 420 called 420f, which is just like 420v except it has an f. And the f is for full range color. That is, the full range of 0 to 255 is used. And this is our default when you're using the photo preset, because when you're doing still image capture, you want full range and not video range.

We also support BGRA, which is a blue, green, red, alpha format, and it's 8 bits per channel, 32 bits per pixel, which is more than twice the bandwidth of the YUV flavors. So it does come at a cost. If you can avoid using BGRA and do your work in YUV, it's more efficient from a bandwidth standpoint. Now let's talk about some specific iOS 5 enhancements on top of the Core Video bridging that Brandon already mentioned. In iOS 5, we support rotation for CVPixelBuffers. Rotation is hardware accelerated.

And you can use AVCaptureConnection's videoOrientation property. So for instance, by default, you're going to get landscape-oriented video frames, but if you need to pass them off to a library that needs them portrait-oriented, we can do that work for you using the hardware. All four video capture orientations are supported: portrait, upside down, landscape right, and landscape left, but we do not support arbitrary rotation to weird angles. The default is still non-rotated buffers. So for the front camera, by default, they're going to be landscape left, and for the back camera, landscape right.

Also new in iOS 5 is support for pinning of minimum frame rate. Up to now, we've only let you specify the max to throttle. But by giving you access to minimum frame rate pinning, you can now create a fixed frame rate capture if you need to. You can do this using AVCaptureConnection's videoMaxFrameDuration property. Again, we express it using duration, not frame rate. So to pin the min frame rate, you set the max frame duration to one over what you want.

But here's a caveat: fixed frame rate captures can result in reduced image quality in low light. By default, we like to throttle down the camera in low light to get longer exposures and better looking pictures. If you pin the frame rate to a high frame rate, you might get worse looking pictures in low light.
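
In code, both of those knobs live on the AVCaptureConnection between the input and the video data output; a sketch, with 30 fps chosen arbitrarily as the fixed rate:

    AVCaptureConnection *connection = [videoOutput connectionWithMediaType:AVMediaTypeVideo];

    // Hardware-accelerated rotation of the delivered buffers.
    if ([connection isVideoOrientationSupported])
        connection.videoOrientation = AVCaptureVideoOrientationPortrait;

    // Pin a fixed 30 fps: min duration caps the max rate, max duration pins the min rate.
    connection.videoMinFrameDuration = CMTimeMake(1, 30);
    connection.videoMaxFrameDuration = CMTimeMake(1, 30);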

Also, you can use the AV Capture Session's sessionPreset property to affect the resolution of the video data output. Here are the currently supported session presets. All but the bottom three should be familiar to you if you've used this API before. High, medium, low, and then some named presets and photo mode, all of which have their purposes.

And then those three at the bottom are new in iOS 5. Now we allow 352 by 288 on all of our devices. So if you have a streaming application and you only need CIF-quality video frames, we can deliver them at that resolution. And also I'll be talking a little bit more about iFrame, which is an important format for Apple and an interesting format for you if you plan to be doing a lot of editing with your captured movies.

Here is the grid of supported resolutions. You'll note that high or the high quality preset means different things on different devices. If you have an iPhone 3GS, it only supports up to SD video, so the highest quality it gives you is VGA. Whereas the back cameras on iPhone 4 and iPad 2 and iPod Touch can give you 720p. Medium and low mean the same thing on all platforms. And then I'd like to call out a peculiarity here at the bottom for photo.

Now when you use the photo preset and you capture still images from it, you get the full resolution. 5 megapixels or 2 megapixels or whatever that happens to be. But if you're using the photo preset with video data output, it's a little different. We have some special considerations here because these were added in iOS 4.3.

The preset for photo delivers full res out the still image output, but only preview-sized buffers out the video data output. We can't give you full-res stills for every video frame; the bandwidth would be intolerable. But you get preview-sized buffers out the video data output, which are sized to about the size of the screen, but with exactly the same aspect ratio as the full-sized buffers.

So if you have any processing to do, lining up or choosing where to set focus, you can do those on the preview-sized buffers in your video data output callback and then still snap pictures using the still image output whenever you want to. And as I mentioned, the aspect ratio is unchanged. Continuing along with our supported resolutions grid, you'll note that every single device and every camera supports the SD or lower resolutions, but only the back cameras support the HD modes.

All right, now let's delve into that iFrame thing. What is iFrame? Well, to explain what iFrame is, I need to tell you what I-frames are. Little i, big F is a little different from what I-frames are in general. This is a term used in video compression to talk about dependencies between video frames. I-frames, or intra frames, have no dependencies on any other frames. They can be decoded individually and they don't rely on any frames before or after.

P-frames are predictive. That means in order to be decoded, they need to search back to their previous I-frame in order to fully reconstitute the picture. B-frames are even worse. They can predict in either direction and they're stored out of order. These are added levels of complexity to get smaller files.

So what is iFrame? Well, iFrame is an Apple term for Apple ecosystem-friendly video. It's a format that we've given out to third-party camera and camcorder vendors. There are already 30-plus camcorders and cameras on the market that support iFrame. It means H.264 I-frame-only video plus AAC or PCM audio at a constant frame rate of 29.97 or 25; on the iOS products we support 29.97.

And the data rate is quite high. It's 30 megabits per second for quarter HD or 40 megabits per second for 720p.

[Transcript missing]

It is supported on all iOS devices with HD cameras. Let's move on to the second capture case. And to show that, I'm going to call up Matthew Calhoun to show us RosyWriter.

RosyWriter is yet another augmented reality application. Something you should know about Matthew: he's a very cheerful guy. He's a very friendly guy. I like to say that he views the world through rose-colored glasses. So he wrote an app that really does view the world through rose-colored glasses. It's taking the video from the back camera.

It's processing every pixel of every buffer and applying a tint to them so that they come out kind of rosy-colored. And then he's recording. So he's actually already started the recording there. And you can see he's getting 25 frames per second. That's, again, only because we're in low light.

It will actually do a full 30 fps without dropping any frames. So he's applying processing to every pixel, doing a real-time preview using OpenGL, and writing a QuickTime movie at 30 frames a second at 720p, and we can play that movie back live. It was recording audio as well, obviously.

Thank you, Matthew. Let's talk about how we did that. Okay, RosyWriter again used the back camera from the iPad to get video data output into his process, and also audio data, and write to a QuickTime movie. But he wasn't using the movie file output because that doesn't let you get at the buffers to process them.

So what does this look like to us? Well, you still have a session in the middle. You still have device inputs for the camera and for the microphone. You have data outputs. But then there are some new classes that are not strictly part of AV Capture, but they are part of AV Foundation and it's called AV Asset Writer. AV Asset Writer has an input for video and audio.

To talk about AV Asset Writer, I first need to talk about what an AV Asset is. AVAsset, which is defined in AVAsset.h, is how we abstract a media asset on our platform. So it can be URL based or stream based. It can be inspected for properties.

If you want to play one, you would use an AV player. If you want to read one in an offline mode, you would use AV Asset Reader. If you want to do a full file export of one of these assets, you would use AV Asset Export Session. And for our use, you can write them using an AV Asset Writer. To create one of them, you alloc and init one with the file type that you want to write to. In Matthew's app, we were writing a QuickTime movie.

And then there's a little bit of setup to create the inputs for each kind of input data you'll be feeding it. In this slide, I put the video input setup. So you tell it you're going to be providing it video input, and you give it some output settings to tell it what kind of output it should produce, what format it should produce.

You set the expectsMediaDataInRealTime flag to YES so that it knows that you're going to be in the real-time capture scenario and so it doesn't do additional buffering on your behalf. And then you add the input to your AV asset writer, start up your delegate as you normally do, and start the session running. Inside your delegate callback, after you do your processing, you would call appendSampleBuffer: on the video input. That's it. You're now writing the movie one frame at a time from within the delegate callbacks.
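
A condensed sketch of that flow, assuming the capture pipeline above is already delivering sample buffers (movieURL, firstSampleTime, and the 720p settings are placeholders):

    NSError *error = nil;
    AVAssetWriter *writer = [AVAssetWriter assetWriterWithURL:movieURL
                                                     fileType:AVFileTypeQuickTimeMovie
                                                        error:&error];
    writer.movieFragmentInterval = CMTimeMake(10, 1);    // preserve the recording if interrupted

    NSDictionary *videoSettings = @{ AVVideoCodecKey  : AVVideoCodecH264,
                                     AVVideoWidthKey  : @1280,
                                     AVVideoHeightKey : @720 };
    AVAssetWriterInput *videoInput =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                           outputSettings:videoSettings];
    videoInput.expectsMediaDataInRealTime = YES;          // real-time capture: no extra buffering
    [writer addInput:videoInput];

    [writer startWriting];
    [writer startSessionAtSourceTime:firstSampleTime];    // PTS of the first buffer you append

    // Then, inside captureOutput:didOutputSampleBuffer:fromConnection:, after your processing:
    if ([videoInput isReadyForMoreMediaData])
        [videoInput appendSampleBuffer:sampleBuffer];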

So now let's talk about which one you would use. We have two classes that sound like they do the same thing: AV Asset Writer and AV Capture Movie File Output. Well, movie file output has some features all of its own. It requires no setup. You can't specify output settings on it because it inherits them from the session preset.

It's a flexible object in that you can record multiple movies from it. You can start one, then stop, and then reuse that same movie file output to do multiple recordings. It also supports some limiting like file size limiting or duration, and it will automatically stop the recording when any of those limits are tripped.

But it does not allow for client access to the buffers before writing them to disk. So if you use a movie file output, you get what the camera sees. You don't get to process the frames. AV Asset Writer, on the other hand, is a general purpose writing utility that can be used for the non-real-time case or the real-time case.

So it does require setup of output settings because it doesn't know what to produce by default. It's a one-shot writer. You can't use this to record multiple movie files. Once you've finished a movie file with an asset writer, you need to throw it away and make a new one.

But it does allow for client access to the video buffers because they come into your process in the video data output and audio data output, and then you send them along their way to the asset writer in your delegate callback. Be aware, though, that asset writer does incur some more overhead than movie file output. So if you don't intend to do any processing, you will want to use the movie file output.

Here's what the sample video settings might look like that you would feed to the asset writer. And these are defined in AVVideoSettings.h. You can tell it what codec you want. I chose H.264. You can specify a width and a height. You can also give it a dictionary of compressor-specific properties. For H.264, you can specify a bit rate. I chose 10.5 megabits per second. You can also specify a max keyframe interval so that it will force a keyframe, an I-frame, at least once a second.

You can also specify profile level. And now let's look at the equivalent for audio. The audio settings look like this. You can specify a layout, so is it stereo, is it mono, the format that you expect it to produce, AAC for instance, the bit rate, number of channels, sample rate, and the layout.
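
Roughly, those two settings dictionaries could look like this sketch; the bit rate and keyframe interval match the numbers above, while the audio sample rate, channel count, and bit rate are example values:

    // Video: H.264 at 1280x720, about 10.5 Mbps, a keyframe at least once a second at 30 fps.
    NSDictionary *videoSettings = @{
        AVVideoCodecKey  : AVVideoCodecH264,
        AVVideoWidthKey  : @1280,
        AVVideoHeightKey : @720,
        AVVideoCompressionPropertiesKey : @{
            AVVideoAverageBitRateKey      : @10500000,
            AVVideoMaxKeyFrameIntervalKey : @30,
            AVVideoProfileLevelKey        : AVVideoProfileLevelH264Main31
        }
    };

    // Audio: stereo AAC at 44.1 kHz, 128 kbps.
    AudioChannelLayout stereoLayout = { .mChannelLayoutTag = kAudioChannelLayoutTag_Stereo };
    NSDictionary *audioSettings = @{
        AVFormatIDKey         : @(kAudioFormatMPEG4AAC),
        AVNumberOfChannelsKey : @2,
        AVSampleRateKey       : @44100.0,
        AVEncoderBitRateKey   : @128000,
        AVChannelLayoutKey    : [NSData dataWithBytes:&stereoLayout length:sizeof(stereoLayout)]
    };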

So lastly with AV Asset Writer, let's talk about some do's and don'ts. Do set that expectsMediaDataInRealTime flag to YES when you're using AV Asset Writer input with capture. Otherwise, it will do some buffering and try to do some optimal interleaving of video and audio, and you will drop more frames.

Also, set AV Asset Writer's movieFragmentInterval to a non-zero value if you want to preserve your recordings in the event of an interruption. I'll talk more about that in a minute. Also, that alwaysDiscardsLateVideoFrames property: set it to NO. Earlier I said set it to YES. But for this case, I'm saying set it to NO because you want to capture a movie file. So even if the frames are a little bit late and there's a possibility that you can still write them to the movie, you would want them.

Don't hold on to those sample buffers outside the callback. So don't do your own buffering. Append them to the asset writer within the callback. And again, don't take too long inside your callback. I just mentioned movie fragments. This is a neat technology that helps in the event of crashes or interruptions. Interruptions are unpredictable. You don't know, this is a phone that you're dealing with. You don't know when you're going to get a call.

So if you're in the middle of a movie file, you really want that movie file to succeed even if it gets interrupted or crashes. Here's what a movie file looks like that you would get off the web. It has a movie header at the top which tells where all of the data is. It has sample offsets that say find sample N at this offset in the file, and the actual data is afterwards. It's called a fast-start QuickTime movie.

When we're capturing, we can't write fast start movies because we don't know how long the movie is going to be. We don't know where all the samples are yet. So we have to put the movie data up front and only when they end the capture do we append the header to the end. You see the problem with this strategy. If you crash or you are interrupted in the middle of your recording and you haven't had a chance to lay down that movie header yet, you've got a big file that's unreadable.

Our solution for this is movie fragments. On iOS devices, we write QuickTime movies with movie fragments. So if you've specified a movie fragment interval to the AV asset writer or to the AV capture movie file output, it will lay down a small header at the beginning that accounts for some number of seconds.

By default, it's 10. And then for every 10 seconds of your movie, it will lay down a movie fragment. And if you crash at any point or are interrupted at any point, your movie will be safe up to the point of the last fragment that was recorded in the movie. So the most that you would lose is 10 seconds. All right, let's move on to our third capture case, which is scanning video frames for patterns using the flash and video data output. And to help me with that, I'd like to call up Valentin Bonnet.

and could we dim the front lights a bit? Valentin has helped us out a lot with his summer intern project. He's written a really cool application about an emerging technology. This technology I like to call Lost, L-O-S-T, or line of sight texting. Have you ever been in a scenario where you have no cell coverage but you'd really like to text a person, you can see them, but you just, you can't message them. There's no Wi-Fi, there's no cell. Well, wouldn't it be cool if you could use your iPhone? You ready for me? So this line of sight texting business is using input from a sender phone, translating it into a new technology I call Morse code.

and translating it into dots and dashes that it then uses the LED flash to reproduce on the receiver side where the receiver interprets the pulses of light, turns them back into text and makes it look like a text message. Let's go back to slides to talk about how we did that. And we can bring up house lights now unless we want a late afternoon ambiance.

Cool app, huh? The MorseMe application has a sender and a receiver side. And the sender side is as simple as can be. It just uses the LED torch. So all it looks like to AV Foundation is an AV capture device. That's it, nothing else. On the receiver side, he's using the iPad's back camera, along with the video data output, to process those frames, find the light pulses, and turn them back into text. So that looks like a session in the middle, device input on top, video data output, and live video preview.

So let's talk about torch support in general in iOS 5. We have three torch modes: off, on, and auto. For this app, we were just using on. He was setting it to on when he wanted it on and off when he didn't. You can also use the hasTorch property to determine if the capture device has one. Obviously it was going to be a one-sided conversation because he was using an iPad and it doesn't have a torch. You call lockForConfiguration before attempting to set the torch mode.

And flashlight apps, I don't know if I should ask anyone to raise their hand. Has anyone written a flashlight app here? Because you know who you are. And you know that last year I stood up here and I told you not to write flashlight apps with our LED torch API because it was only for video illumination. But you went out and you wrote your flashlight apps anyway.

And the problem with that is doing it the old way, the AV capture session had to be running. So you were in effect running a full video pipeline just to turn the flashlight on. So you were burning battery at an alarmingly fast rate. Well, in iOS 5, we've altered this so that to use the LED torch for a flashlight, you don't need to run the session anymore. You can just find the capture device and turn the LED torch on or off. We will not run the capture session in the background.

There's no allocation of buffers. There's no additional CPU. So you're only using the power that would be used by the LED torch turning on and off. So the code shrinks down to just three lines, basically. You lock the device for configuration, set the torch mode to on, unlock the device. That's pretty much all that the sender side of the MorseMe application did.
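
Those three lines, roughly:

    AVCaptureDevice *device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    if (device.hasTorch && [device lockForConfiguration:NULL]) {
        device.torchMode = AVCaptureTorchModeOn;    // or AVCaptureTorchModeOff to turn it off
        [device unlockForConfiguration];
    }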

We have a couple torch enhancements in iOS 5 as well, some availability accessors. The device housing the torch is a phone. It has a lot of components in it that can heat up. And the torch may become unavailable as the phone gets too hot. So you can now key-value observe this torchAvailable property to know if it suddenly becomes so hot that it needs to cool down. And so it will tell you when it's available again as well. Also, you can key-value observe the torchLevel property of the AV Capture device to know when it's throttling the actual illumination level of the torch down because it's getting too hot.
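
Observing those two properties is ordinary key-value observing on the capture device; a sketch:

    // Wherever you configure the device:
    [device addObserver:self forKeyPath:@"torchAvailable"
                options:NSKeyValueObservingOptionNew context:NULL];
    [device addObserver:self forKeyPath:@"torchLevel"
                options:NSKeyValueObservingOptionNew context:NULL];
    // Then, in observeValueForKeyPath:ofObject:change:context:, react when the torch
    // becomes unavailable (the device is too hot) or its level is throttled down.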

All right, with that, let's move on to the fourth demo, which is processing captured still images using Core Image. And to demo that, I'm going to invite some of my friends up on stage. Okay, there's StacheCam. Okay, StacheCam is a still image capture application. Pretty basic. It does the kinds of things you would expect it to do. You can switch between the front and back cameras.

So there are my friends. There's me. You can also take pictures, as you might expect. It's doing a little animation when the capture is supposed to happen. You can zoom in just like the Camera app, so we now support digital zoom or scale and crop and take pictures of digitally zoomed pictures. But the killer feature of the app is that it can detect faces. So I'm going to turn the Detect Faces button on.

Okay, so go ahead, Matthew, turn sideways so that you, so, oh, it needs two eyes. Oh, there it comes back. And it works equally well on the front camera as well. My kids have been beta testing this app for the last week and it's a hit with them. Okay, thanks, guys.

All right, how did we do that? Well, we used the iPad and the back camera. I was showing you a video preview while I was superimposing the funny glasses on top, and capturing still images. But everything else that was done, learning where the faces are, was done with Core Image. Core Image is making its iOS debut in iOS 5, and it has a new face detector class, CIDetector, and I'll talk about that a little bit more. So from the session standpoint, it just looks like session, device input, still image output, video preview layer.

Let's talk about the enhancements for still images within AV Capture first. You notice that when I took a still image, I flashed the screen. I could have done a fancier sort of iris animation or something like that. But in order to do that and to have it actually line up with the picture that's being taken, you need to know exactly when the camera is taking the photo.

And that's not instantaneous, especially on like an iPhone if it's using the flash, it might take a minute for the flash to warm up. So AV Capture still image output now has a new property called isCapturingStillImage. You can key value observe this property to know exactly when the still image is being taken and you can key your animation, your iris shutting or your shutter off of that key value observed property.
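
The key path to observe is capturingStillImage on the still image output; a sketch of keying the shutter animation to it (flashScreen is a hypothetical method of your own):

    // When configuring the output, register for KVO:
    //   [stillImageOutput addObserver:self forKeyPath:@"capturingStillImage"
    //                          options:NSKeyValueObservingOptionNew context:NULL];

    - (void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object
                            change:(NSDictionary *)change context:(void *)context
    {
        if ([keyPath isEqualToString:@"capturingStillImage"] &&
            [[change objectForKey:NSKeyValueChangeNewKey] boolValue]) {
            [self flashScreen];   // hypothetical: run your shutter/iris animation right now
        }
    }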

There's also another new property called subjectAreaChangeMonitoringEnabled, which is a mouthful. But what it means is if you have previously locked focus, it might be nice to know when the scene changes enough that you would like to go back to a continuous focus mode. So it's great that we let you continuously focus or lock, but now we're giving you the ability to know when things have changed enough that you might want to focus again. So once you lock a device's focus or exposure, you can set subjectAreaChangeMonitoringEnabled to YES and start observing. And once you do that, you'll be notified when we think that the picture has changed drastically or the user has moved.
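
The property lives on AVCaptureDevice, and the companion notification is AVCaptureDeviceSubjectAreaDidChangeNotification; a sketch, where subjectAreaDidChange: is a hypothetical handler that returns to continuous autofocus:

    NSError *error = nil;
    if ([device lockForConfiguration:&error]) {
        device.subjectAreaChangeMonitoringEnabled = YES;   // after locking focus/exposure
        [device unlockForConfiguration];
    }

    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(subjectAreaDidChange:)
                                                 name:AVCaptureDeviceSubjectAreaDidChangeNotification
                                               object:device];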

And then on the receiving side, we have Core Image and CI filters. This is their iOS debut, iOS 5, and there are some great CI filters available. There's one for Red Eye Reduction. There's one for Auto-Enhance, which is just like the iPhoto One Touch cleanup button. It applies a number of good cleanups to the still image.

There are others as well. I just called those out. You can, of course, do sepia tone and color cube and other things like that. Also new is the CIDetector class, which doesn't process the image to deliver an output image, but processes the image to deliver data about the image. In our case, the CIDetector is finding faces and features on those faces.

So the CIImage interfaces are great. They line up well with AV Capture because they allow you to make a CIImage out of a CVPixelBuffer. So if you get a still image or a video data output buffer, you can easily create a Core Image CIImage object from it. You do need to specify BGRA output to be compatible with CIImage. It doesn't work with the other formats right now.

Here's some code. This is the main code in StacheCam that finds the faces. Inside the still image capture handler, I get the sample buffer's image buffer, so now I have my CVPixelBuffer, and I create a CIImage out of it. And then I create one of these face detector objects, giving it some options. Here I told it CIDetectorAccuracyLow, because I'm a real-time app and I don't want it to take forever to find faces.

And then I ask it to find features in the image. So calling featuresInImage: on the face detector produces an array of features. And from those CIFaceFeature objects, I can find the bounding rectangle, left eye, right eye, mouth, and do some interesting things with how the face is tilted as well. This is all available for third-party use. And I highly encourage you, if you're interested in this Core Image stuff, to take a look at their session tomorrow, which is 2:00 PM in Mission.
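
In code, that face-finding flow looks roughly like this (assuming a BGRA sample buffer from the still image output; drawing the overlay is left out):

    CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CIImage *image = [CIImage imageWithCVPixelBuffer:pixelBuffer];

    // Low accuracy keeps detection fast enough for a real-time app.
    CIDetector *faceDetector =
        [CIDetector detectorOfType:CIDetectorTypeFace
                           context:nil
                           options:@{ CIDetectorAccuracy : CIDetectorAccuracyLow }];

    NSArray *features = [faceDetector featuresInImage:image];
    for (CIFaceFeature *face in features) {
        CGRect faceBounds = face.bounds;                // bounding rectangle of the face
        if (face.hasLeftEyePosition && face.hasRightEyePosition) {
            CGPoint leftEye  = face.leftEyePosition;    // place the glasses using the eye positions
            CGPoint rightEye = face.rightEyePosition;
            // ... position the funny-glasses overlay using faceBounds, leftEye, rightEye ...
        }
    }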

Let me summarize. In iOS 5, AV Foundation capture gives you more CPU cycles. In the dramatic OpenGL case, you saw us getting an immediate 10 to 15 frames per second back. That's huge performance. There's also a lot of other performance work we've done under the hood that requires no changes. And hopefully you'll feel the performance benefits even if you don't opt into the GL improvements.

Bridging Core Video and OpenGL is a key win in iOS 5. We've exposed more resolutions, so you can get smaller resolutions for CIF streaming apps. Also, if you're doing editing apps, you now have the option of iFrame recording. We've added more flexibility to still image capture for doing things like digital zoom, crop and scale.

And we hope this is going to enable you to deliver even cooler apps now that you have more cycles to do those cool things. So for more information, please contact Eryk Vershen, our Media Technologies Evangelist for AV Foundation, or check out the documentation online. Also, the developer forums are a great place to ask questions and have them answered either by fellow users or sometimes even by Apple engineers. Our related sessions have all passed, but please give them a look in iTunes after the show. Thank you for attending today. I hope you've had a great session. Have a great show.