
WWDC12 • Session 520

What's New in Camera Capture

Graphics, Media, and Games • iOS • 1:01:01

AV Foundation provides your application with full programmatic access to video and still images from the built-in cameras. Hear about improvements that simplify and speed up your applications. Learn key practices for debugging performance issues, correlating frames with camera motion, and the proper use of AVCaptureSession.

Speakers: Brad Ford, Ethan Tira-Thompson

Unlisted on Apple Developer site

Downloads from Apple

HD Video (387.8 MB)

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Good morning. Good morning and welcome to session 520. If you're interested in cameras or capture or new stuff, you've come to the right place. And looks like we've got a really good crowd here. Thank you all for coming and thank you for all of your apps and all the work that you do. You're making our platforms a very vibrant place to be right now.

We're going to cover a lot today, so if I sound like an auctioneer, it's not because I'm nervous, it's just because we've got a lot to go through. We have five main topics to cover today. Performance improvements in Mac OS X 10.8. A bit about the camera ecosystem as a whole. New AV Foundation capture features in iOS 6; that's the bulk of it. Then we have a section on solutions for performance problems in your capture apps. And finally, a kind of a neat demo at the end where we'll talk about synchronizing motion data with video.

What you will not learn, however, is AV Foundation and Core Media Basics. I've given this talk two or three years in a row now. I think you've seen the block diagrams enough, you've seen the class hierarchies, that we're not going to do that again. But you do have all of the videos available to you on developer.apple.com, and we encourage you to go back there and view those as prerequisites to this session. We have five pieces of sample code for today's session. I just looked before the session, and most of them are up there. I anticipate that the rest will be up later today or tomorrow at the latest. And there are the URLs.

[Transcript missing]

Also, there is an optimization in there for writing movies, which is we detect duplicate frames. If there's been no movement in the frame, we won't give you those duplicate frames. But for screen-grabbing apps that are doing effects or such, you want all of the frames. You want a constant frame rate. So you can opt out of that behavior in 10.8, the duplicate frame removal.
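If that screen capture optimization refers to AVCaptureScreenInput (the input discussed in the missing portion above), opting out might look roughly like this sketch; removesDuplicateFrames is the 10.8 property name I believe this describes, and MakeScreenInput is just an illustrative helper.

#import <AVFoundation/AVFoundation.h>
#import <ApplicationServices/ApplicationServices.h>

// Keep a constant frame rate for effects work by keeping duplicate frames.
static AVCaptureScreenInput *MakeScreenInput(AVCaptureSession *session)
{
    AVCaptureScreenInput *screenInput =
        [[AVCaptureScreenInput alloc] initWithDisplayID:CGMainDisplayID()];
    screenInput.minFrameDuration = CMTimeMake(1, 30);   // ask for up to 30 fps
    screenInput.removesDuplicateFrames = NO;            // opt out of duplicate-frame removal
    if ([session canAddInput:screenInput]) {
        [session addInput:screenInput];
    }
    return screenInput;
}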

See the updated AVScreenShack sample code for examples of how to use all of these new APIs. Next up is support for hardware-accelerated H.264 encoding. 2011 and newer Macs with the Sandy Bridge chipset, and the latest MacBook Pro with the Ivy Bridge chipset, all have special hardware for doing H.264 encode.

Up to 1920 by 1080. So if you have something larger than that, it will fall back to software, and that happens transparently. If you can get the hardware, you'll get it. If not, you'll get the software. Both movie file output and asset writer in real-time mode are eligible for hardware H.264 encoding, and no code change is required. Everybody just gets it automatically. The difference is you'll see dramatically decreased CPU usage and a lot more time for doing processing or other things in your app.

Next up, support for just-in-time compression. So we know that there's a mix of consumer-based and pro apps on the desktop, and we have a feature in AV Foundation on the desktop that's kind of just for pro apps, which is that you can do frame-accurate starting and stopping. This helps if you want to, say, switch between one recorded movie to another recorded movie on the fly without dropping any frames. That's a neat feature, but it does come at a price, because in order to do that, to guarantee that we can switch on any frame boundary to a new movie file, we have to compress all the time in the background, even when you're just previewing.

So that comes with a big power cost to doing that all the time. In Mac OS X 10.8, you must opt in for the frame-accurate start behavior. We think that most clients would rather have the power win and not a frame-accurate start. So if you implement a delegate to the AVCapture file output, you can just do that. You can choose whether you want frame-accurate start or not.

If you want frame-accurate start, then in captureOutputShouldProvideSampleAccurateRecordingStart:, you'll answer yes, and it will compress all the time, and you'll still burn a lot of power, but you will have the feature that you depend upon. This feature is great for lowering power consumption when previewing. And please see the updated AVRecorder sample code for an example of how to do this. This is up there right now.
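A rough sketch of that choice, using the OS X 10.8 AVCaptureFileOutputDelegate protocol; the RecorderController class name is just for illustration.

@interface RecorderController : NSObject <AVCaptureFileOutputDelegate>
@end

@implementation RecorderController

// Answer NO to skip always-on background compression and save power while
// previewing; answer YES only if you need frame-accurate recording starts.
- (BOOL)captureOutputShouldProvideSampleAccurateRecordingStart:(AVCaptureOutput *)output
{
    return NO;
}

// The same delegate also sees sample buffers as they pass through the output.
- (void)captureOutput:(AVCaptureFileOutput *)output
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // Inspect timing here if you need to.
}

@end

// Elsewhere, when building the graph: movieFileOutput.delegate = recorderController;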

Okay, lastly, with 10.8, we have the newly published CoreMediaIO DAL SDK. A lot of times on the developer forums, we're asked, "How do I write a video device driver? Is there a replacement for Sequence Grabber? What do I use?" And last year, we made the CoreMediaIO framework available with support for writing DAL, or device abstraction layer, drivers.

That's if you want to write a native 64-bit device driver for a video card or, you know, a little gadget on OS X. But we didn't make it very easy for you because we didn't release an SDK. This year, we've just released to the public, not just WWDC, but to everyone, a sample SDK that includes a sample device, both an assistant and a plug-in, so it shows how to make a shareable device across processes. And if you're interested in doing this, please come and see us in the labs and talk to us about how to use this SDK. That's the URL right there.

All right, let's move on to the camera ecosystem. I know you're all eager to get into the APIs. I am, too. As developers, we tend to kind of obsess over the minute details of these APIs and how we can use them for our nefarious purposes in our apps.

But I'd like to start by taking a step back and just looking at the bigger picture, because your capture apps, especially on iOS, are part of a bigger ecosystem. There are a lot of concerns that you need to be aware of, chief among them being privacy and sensitivity of data. So I wanted to just talk about the bigger picture before we dive deep.

Apple's Camera app saves photos and videos to a central library. You know, before iOS 4 and before we gave you access to this library, this was the place where only the Camera app could write photos to. It's called the Assets Library, and as of iOS 4, you have read/write access to it in your apps.

That means that you get access to the photos that people have taken on the device in their camera roll. You have access to synced assets that came in from iTunes. You have access to saved assets that they took from your app or mail or some other source. And now we have photo streams as well. All of this data is there for your reading and for writing.

But all of this data is sensitive and personal, and it's a big deal, just as location of the device is a big deal to people. They want to have confidence that their photos are not being used behind their backs or for purposes that they're not aware of. So as of iOS 6, devices now prompt the user to grant access to the library.

And I think it's a good thing, too. The first time your app tries to write to the assets library, the user will be told about it and will grant access or not. What that means is your app may now fail in places where it didn't before. Don't just assume that you have automatic access to the assets library, so please handle errors.
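The session doesn't show code for this, but a minimal sketch of handling the new authorization behavior with ALAssetsLibrary might look like the following; SaveImageToLibrary is just an illustrative helper.

#import <UIKit/UIKit.h>
#import <AssetsLibrary/AssetsLibrary.h>

static void SaveImageToLibrary(UIImage *image, ALAssetsLibrary *library)
{
    // New in iOS 6: the user can deny your app access to the assets library.
    if ([ALAssetsLibrary authorizationStatus] == ALAuthorizationStatusDenied) {
        NSLog(@"Assets library access denied; skipping save.");
        return;
    }
    [library writeImageToSavedPhotosAlbum:image.CGImage
                              orientation:(ALAssetOrientation)image.imageOrientation
                          completionBlock:^(NSURL *assetURL, NSError *error) {
        if (error != nil) {
            // On iOS 6 this can now include access-denied errors; handle them.
            NSLog(@"Save failed: %@", error);
        }
    }];
}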

Great. Now we're on to the good stuff. AV Foundation capture features in iOS 6. We've got three main areas of feature improvements that we'd like to talk about today. The first is video stabilization. We'll talk about real-time face detection and AVCapture video preview layer enhancements. Stabilization first. What is video stabilization? As the name implies, it steadies shaky shots. And with help from the gyro and core motion, we're able to take what would previously be unusable footage and make it into something that's a good, lasting memory that someone can actually use.

Besides steadying for shake in the hand, it's also compensating for rolling shutter artifacts. If you're not familiar with what those are, rolling shutter is local motion that happens in a frame between the time it started being captured and to the end time when it was captured. It can manifest itself as a little ripple or a wave. Here we see a picture of me riding my bike down a hill. And as you'll see here, I witnessed a phenomenon. It was a spontaneous change in the curvature of the earth right before my eyes. But luckily, my phone was there to capture it.

This can be really disconcerting to users because it sort of, you know, gives a seasick or earthquake effect, and it can really ruin otherwise good video. The most dramatic way to show the effects of video stabilization is to show you a before/after. And here we have on the left the before.

This is shot by a member from our team in a small boat traveling in the Alameda. And it's great because we get to see this bridge closing, the big green bridge. And what you see on the left is obviously shaking around a lot. He's doing his best to hold the phone absolutely still.

But, you know, the motor is going and there are waves. So things up close look okay, but the farther away we get, and particularly when you see all of these straight lines, it's really obvious when there's shake. And what we have on the left is probably not a piece of footage that I would choose to keep because it makes me sick to watch.

But if you look on the right side, take a look at this. This is really dramatic. The left side is just kind of juddering all around, but the right side is absolutely rock solid. It looks almost as if he had a Steadicam there on the boat and he was just, you know, gently panning from side to side. So applaud for that. That's a technical marvel.

So video stabilization, why use it? Well, I think the pros are obvious. Camera phones in general are susceptible to shake, partly because of how people hold them. You know, they're very light. They're held with one hand. It's hard to hold your arm exactly straight. So there's going to be some shake when people take videos.

Also because the bigger we go, and remember, our images are getting bigger and bigger and bigger. The original iPhone that did video only shot at 640 by 480. Now we're up to 1080p. And the HD resolution recordings are especially susceptible to rolling shutter because they have more pixels. There's more time for that local motion to happen in the frame and wreck the shot. Stabilization saves otherwise unusable footage, which is great, because it means you've saved a memory for someone.

and the kicker is it works in real time. You don't have to import it into iMovie afterwards, let it analyze the data, crop out some area so that you've lost some of your field of view. It just works in real time and when it's done, you have a 1920 by 1080 movie that's already stabilized and this is a wonderful thing.

Why not use stabilization? Well, I think most people will want to use stabilization, but if you have some pixel processing algorithms in place that might not interact well with it, you should be aware of what it's doing. Stabilization does alter the pixels because it's correcting for shake. It does have to move them around.

So that means your output that you see in the movie is no longer matching what you see in the video preview. If it's important for you to have a one-to-one correspondence between what's coming out the video data output and what's being shown in the preview, stabilization will no longer make that true. And it may not interoperate well with whatever pixel processing algorithms you have, in which case you would want to turn it off.

Also, you should be aware that it does add latency, one or more frames to the video data output. So expect just a little bit more lag when stabilization is turned on. Where is it supported? It's supported on iPhone 4S and the new iPad. Compatibility. As I stated before, all the HD resolutions are now compatible. In iOS 5, we only did 1080p. Now we've extended that support to all the HD resolutions, 720p and 540p as well.

Where does it not work? Well, it does not work with front camera. We do not stabilize video from the front camera because there it's SD only, and typically the subject in the picture is just, you know, maybe a meter away from the screen, so stabilization is not a big win there.

It does not work with AVCapture still image output. Still images are not stabilized. And it also does not work with preview, as I stated before. Usually when you stabilize preview, people have this weird feedback effect of trying to correct for what's already been corrected, and then they can't really follow themselves well. It's better to keep the preview uncorrected for stabilization.

Now, in iOS 5, we did give some limited support to developers for using stabilization, but largely it happened behind your backs. Movie file output always stabilizes 1080p video, period, on iOS 5. Nothing you can do about it. It never stabilizes any other resolution. And video data output also never stabilizes any session preset.

And there was no API to opt in or out. So 1080p, you're getting it if you're recording to movies, nowhere else. But in iOS 6, we're keeping the behavior the same if your app is linked before iOS 6. But then once you recompile against iOS 6, you must opt in for stabilization where you want it. Both movie file output and video data output are eligible, so you can use it with either one.

And here's how you opt in. There's a little bit of boilerplate code. You're probably familiar with--you know, you create a capture session, you create a device input for the back camera. You make either a movie file output or a video data output. You get a reference to its connection by getting the connection with media type video.

And then you opt in for it when it's available. So if video stabilization is supported on that particular platform for that particular camera, you say, "Set enables video stabilization when available to Yes." Now, as I stated earlier, it's not--not all session presets are stabilized. The SD ones are not. The photo preset is not. How will you know when it kicks in? You can observe, using key-value observation, the connection's video stabilization enabled property. It will flip to Yes when stabilization goes on, and it will flip to off when it's no longer in use.
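Put together, that opt-in looks roughly like this sketch; ConfigureStabilization and the observer argument are placeholders, and the connection property names are the iOS 6 ones described here.

#import <AVFoundation/AVFoundation.h>

static void ConfigureStabilization(AVCaptureMovieFileOutput *movieOutput, NSObject *observer)
{
    AVCaptureConnection *videoConnection =
        [movieOutput connectionWithMediaType:AVMediaTypeVideo];

    if (videoConnection.supportsVideoStabilization) {
        // Opt in; stabilization only engages for cameras and presets that support it.
        videoConnection.enablesVideoStabilizationWhenAvailable = YES;
    }

    // Key-value observe videoStabilizationEnabled to see when it actually kicks in.
    [videoConnection addObserver:observer
                      forKeyPath:@"videoStabilizationEnabled"
                         options:NSKeyValueObservingOptionNew
                         context:NULL];
}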

There are a couple of gotchas that you should know about with video stabilization and in general with setting properties on connections that I wanted to call out. When inputs or outputs are added to a capture session, connections-- these are a different kind of object, an AVCapture connection-- they are implicitly formed between compatible inputs and outputs. This is what it looks like. Let's say you have a session and an output. Let's say a movie file output. And then you add an AVCapture device input to your session.

What you don't see is that behind the scenes, the session asks the device input, "What kind of media can you produce?" And it asks the outputs, "What kind of media can you accept?" And it finds the matches, and it forms a connection between them. In this case, that device input produces video, and the movie file output accepts video, so it implicitly forms an AVCapture connection for video, and it's there.

Now, the reverse is also true. If you switch cameras, so you need to remove an input and then add a new input, what happens is as soon as you remove that AVCapture device input for the back camera, it also implicitly severs the connection to the existing AVCapture connection and both disappear at the same time. So what that means is any settings that you have-- that you've applied to that connection are now lost. So you need to add the new input and reapply the same settings that you did to the new connection.

And I highly recommend using AVCaptureSession's beginConfiguration and commitConfiguration when you are doing these high-level, big graph changes, because what it does is it holds off the commit of any of these property changes until you say commitConfiguration, and it prevents multiple stops and restarts of the graph.
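A sketch of that camera-switch pattern, with placeholder arguments, reapplying the stabilization setting from above to the freshly formed connection:

static void SwitchCamera(AVCaptureSession *session,
                         AVCaptureDeviceInput *oldInput,
                         AVCaptureDeviceInput *newInput,
                         AVCaptureMovieFileOutput *movieOutput)
{
    [session beginConfiguration];

    [session removeInput:oldInput];     // the old input's connections are severed here
    if ([session canAddInput:newInput]) {
        [session addInput:newInput];    // a fresh connection is formed here
    }

    // The fresh connection has default settings, so reapply what you need.
    AVCaptureConnection *connection =
        [movieOutput connectionWithMediaType:AVMediaTypeVideo];
    if (connection.supportsVideoStabilization) {
        connection.enablesVideoStabilizationWhenAvailable = YES;
    }

    [session commitConfiguration];      // one atomic change, no repeated stops and restarts
}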

You can see the updated AVCam sample code for an example of how to opt in for video stabilization, and that, too, is already up there. All right, on to real-time face detection. This is neat. Face detection, this is the same face detector that's used in the camera app, and it's different than the one we showed you last year, which was CI Face Detector. Look at those cute kids. Scans for faces in real time, and it can track up to 10 faces in real time.

So the contrast isn't great up here. I hope you can see all of the rectangles around their faces. It assigns a unique ID to each face in the frame. And the detector sees the camera in the native orientation of the camera. That is, if you take out your iPhone right now and you turn it sideways and put the home button on the right, that's the orientation in which the camera is mounted. So in this picture, for instance, even though we're holding it portrait, it found the girl on the right first because she's closest to the top left if you rotate by 90 degrees to the left.

It provides a timestamp for each face. The timestamp will be the same. It correlates to the frame in which that face was found. And that's good when you want to match it up with video later. We'll talk about that later. It also finds the rectangle bounding each face. And these rectangles are given in the coordinate space of the device. That is, they're scalar coordinates from zero to one. Again, in the un-rotated space. It can also determine the roll angle. So you see the tall boy right there has his face turned slightly.

[Transcript missing]

It can determine roll angles in increments of 30 degrees either way. It can detect faces all the way from straight up to all the way upside down. So great for finding kids on monkey bars. It also determines the yaw angle. Yaw is also known as turn. Before we were doing tilt, now we're doing turn.

So here we see that the tall boy has turned his face, and so it's going to detect that he has a yaw angle of 315, or minus 45 degrees from center from the perspective of the person turning the face. It always uses positive angles, so you'll always see something between 0 and 360.

Also notice that it no longer finds the girl on the right. That's because she's turned her head too far. It can only find faces that are between 0 and 90 this way and 0 and 90 this way. So between 91 degrees and 269 degrees, it won't find a face anymore.

And good news, it works with front and back camera, all presets. It's resolution independent, and it happens quickly regardless of which capture preset you might be using. What does it not do? Sorry aliens, sorry pets, it will not find you. Unless you're an alien that looks a lot like a human.

It also does not recognize particular faces. That's a distinction to make. It's a face detector. It's not a face recognizer. So while it might find two faces here and assign an ID to them, it's not going to know that that's Princess Clara and Sad Captain America. No. It also does not remember faces.

That means that once it's assigned an ID to a face, if that face turns too far and it loses it, or if the face goes out of the frame and comes back in, it's a new face to the detector. It assigns it a new ID that's incremented higher. It also does not currently determine pitch, so nose up or nose down, it can't tell the difference. It's either a face that's there or not. We'll tell you yaw and roll, but not pitch.

It also does not find, as I said before, faces with a yaw angle between 91 and 269. It has to be able to see some of the defining features of the face to determine that it is a face. So why might you want to use AV Foundation's real-time face detector as opposed to the CI face detector that we showed you last year? Well, first and foremost, it's optimized for real-time capture. It's the same one that's used by the camera app. It's hardware accelerated. So it uses very little CPU.

It's capture resolution independent, meaning it doesn't matter if you're in a photo preset with very large images, it will still find faces at the same speed. It also supports tracking faces over time. The CI Face Detector interface is a push one, meaning every time you feed it a frame, it searches through the frame to find faces. Here, once we've found a face, we can lock onto them and we don't need to search the whole frame again and again to find that same face. So once it's found a face, the tracking is very fast, and the latency is very low.

But that doesn't mean that Core Image's CI face detector is no longer relevant. To the contrary, it's still very relevant. Why? Well, it's available on all supported iOS devices in iOS 6. The same is not true for the real-time face detector. And also because of its push interface, that means you can use arbitrary source images. They don't have to come from the camera.

You could pull an image from the assets library and find faces in that one, and it will tell you about them. The AV Foundation real-time face detector only works with content coming from the camera in real time. All right, so enough talk. Let's have a demo. I'd like to call up Ethan Tira-Thompson to give us a demo of StashCam 2.

Hi, everyone. I've been working with Brad to update StashCam for these new APIs. But I'd like to start with reviewing the core image face detection, which you see running here, because as Brad pointed out, that this is still pertinent for a variety of situations, such as if your images are not coming from a capture source or if you aren't running on a device with the necessary hardware support.

But now if I switch to running AV Foundation's face detection, we see that it gets similar results. With a much higher frame rate. Yes. If you look at the top, the frame rate's running there. So you see it's getting the full 30 frames a second.

And it hasn't dropped the core image frame rate at all. So it's able to do that without stealing any CPU from whatever else you've got running. So now if I put on my Stash, and I'm going to run it, you can see some of the new fun features that we've been -- that this supports. So we've got the rolls and the yaws.

And to show you the ID tracking, I can tap on my face and I can get to a pirate. Arr! I think Brad looks particularly good in a clown outfit. Scary face. Yeah, very scary. Scary clown. And we also in this have a still capture. So this is operating on the video preview layer. I can tap the button at the bottom and save my StashCam memories for later. And there's all the API in there for using that. So that is StashCam. Great. Thank you, Ethan.

StashCam 2 ships with built-in support for mustaches, clowns, and pirates, and we're going to make aliens and pet overlays available as an in-app purchase. All right, so let's talk about how we did that. First, he showed you there was still that legacy CI face detector path that was in the first revision of StashCam, and this is how you do it. It's got a device input at the top and a session, and at the bottom, it has a video data output, a still image output, and a preview layer.

As it gets frames in real time through the video data output, it pushes them in one at a time to the CI face detector, which scans the frames, finds the faces, returns a result, and then he uses those results to overlay the red rectangles on top of the video preview layer as core animation layers.

And he also uses the CI face detector when taking a still image down the CI face detector path, and then uses the result to composite using CG and then using image IO to write the JPEG and record it to assets library. So all that code is still in there. This is a great sample because it shows you both ways to use faces.

It also has the real time path, and the programming model is a little bit different. It's the same at the top. We still have a still image output and a video preview layer, but then in the middle, there's this new kind of output called an AVCapture metadata output. Notice it's not a face-specific one. It's for metadata in general.

And the metadata output outputs an array of metadata, or in this case, an array of faces, which he can then use to draw the mustache layers as before. And he can use that same AV metadata output to pair with a still image and use CG as before to composite the mustaches. Let's talk more about this AVCapture metadata output.

Previously, AVCapture device input would only expose a single kind of input port. It would say, "I have video available. That's the only kind of data I can produce." Now, on supported platforms, that device input will say, "I support both video and metadata." So that input port can be hooked up to things that consume metadata.

And we're in luck. We have a new AVCapture output subclass that does consume metadata called the AVCapture Metadata Output. Now, today, there's only one kind of metadata supported, which is faces. But you can imagine that in the future, we might add other kinds of metadata that can be had through this output. If you've used video data output, you'll be right at home. It's patterned after that one. So instead of outputting a sample buffer at a time, it outputs an NSArray of AV metadata objects to your delegate on a serial queue that you have defined.

And it also allows discovery of a bunch of available metadata object types. That means if in the future we add foo discovery and face discovery, you can know what kinds of metadata objects are available through this output, and you can filter to the subset of objects that you want. So if you only want faces and you don't want foos, you can specify to the metadata output that you want to set the types to just faces. I'd recommend future-proofing your code now if all you want is faces to do that.
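A minimal sketch of wiring that up; AddFaceMetadataOutput and the queue label are illustrative, and the type constant is AVMetadataObjectTypeFace from iOS 6.

static void AddFaceMetadataOutput(AVCaptureSession *session,
                                  id<AVCaptureMetadataOutputObjectsDelegate> delegate)
{
    AVCaptureMetadataOutput *metadataOutput = [[AVCaptureMetadataOutput alloc] init];
    if (![session canAddOutput:metadataOutput]) {
        return;
    }
    [session addOutput:metadataOutput];   // add first so the available types are populated

    dispatch_queue_t metadataQueue =
        dispatch_queue_create("com.example.metadata", DISPATCH_QUEUE_SERIAL);
    [metadataOutput setMetadataObjectsDelegate:delegate queue:metadataQueue];

    // Future-proof: ask only for faces, even if other metadata types appear later.
    if ([metadataOutput.availableMetadataObjectTypes containsObject:AVMetadataObjectTypeFace]) {
        metadataOutput.metadataObjectTypes = @[ AVMetadataObjectTypeFace ];
    }
}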

Now, what's in this face object that you get? In the capture output did output metadata objects callback, you'll get an array of objects, and they all have a time stamp, a rectangle, an ID, a roll angle, just as I was talking about before. That's where you can do interesting things with your faces.
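For illustration, the delegate callback might read the face objects like this (a sketch; the logging is just to show which properties are available).

// In your AVCaptureMetadataOutputObjectsDelegate class:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputMetadataObjects:(NSArray *)metadataObjects
       fromConnection:(AVCaptureConnection *)connection
{
    for (AVMetadataObject *object in metadataObjects) {
        if (![object isKindOfClass:[AVMetadataFaceObject class]]) {
            continue;
        }
        AVMetadataFaceObject *face = (AVMetadataFaceObject *)object;
        // bounds is in the unrotated, 0-1 device coordinate space; time matches
        // the video frame in which the face was found.
        NSLog(@"face %ld at %@ (t=%.3f) roll=%@ yaw=%@",
              (long)face.faceID,
              NSStringFromCGRect(face.bounds),
              CMTimeGetSeconds(face.time),
              face.hasRollAngle ? @(face.rollAngle) : @"n/a",
              face.hasYawAngle ? @(face.yawAngle) : @"n/a");
    }
}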

So let's talk about the definition for what a face is. First of all, what's the rectangle of a face? The bounds extend from just above the eyebrows to just below the lips. So it's not actually the whole face with the forehead and all. It's about what you see here.

Also, the CGRect coordinates are scalar values from zero to one, as I said before. The origin is top left, and it's in the unrotated camera device's orientation. And the CI Face Detector and the metadata output rectangles are comparable in size and origin. That's good. So if you are going back and forth between using one detector and the other, you can use them with confidence knowing that the rectangles they find and the faces that they find are going to be roughly the same.

There's also still image support when using face detection. So if you have a metadata output and it's enabled, and you have an AVCapture still image output, then when you take images, we will attach the face information to those still images. And if you then write those still images using our JPEG still image NSData representation-- that's a mouthful-- to turn it from a sample buffer into an NSData, we'll preserve those face rectangles and IDs in the metadata as XMP.

Where is it supported? As I said before, it's only on the newer devices, so iPhone 4S, iPad 2, and the new iPad all support real-time face detection. Great. On to our third main feature set, which is AVCapture Video Preview Layer Enhancements. First, conversion methods. Boy, people whine about this a lot on the developer forums. So we're finally giving in, and we're giving you what you asked for.

This is a bug that I had, and I love the title of it, so I just included it here. Setting focus and exposure points of interest is ridiculously hard. We tend to agree the developer's point was well taken. It seems like a totally arbitrary space, regardless of device orientation. The API is really hard to use.

So we do have a piece of sample code that shows how to do it for one orientation, but it doesn't apply to all orientations, and it's a little bit fiddly. Let's start with a review of how these AVCapture device points of interest work. A focus point of interest, this would be if you want to do like a tap interface for focusing.

The point is from zero to one with zero, zero in the top left and one, one in the bottom right. and the camera sensor's native or unrotated orientation is landscape. For the back camera, its native orientation is landscape right, that is, with the home button on the right. And for the front camera, the native orientation is landscape left, that is, with the home button on the left.

So let's say you have a phone rotated in portrait and you're taking a picture of a little girl. What the camera sees is what you see here. It's actually a landscape right orientation with zero in the top left and one in the bottom right. So what makes translating these coordinates so hard? There's a lot of math involved. Preview may be rotated.

[Transcript missing]

So you can imagine it's pretty hard then to figure out if someone taps on your video preview layer, how does that turn into an unrotated, unmirrored source. Conversion methods to the rescue. We've taken all of the trouble out of it by adding these new conversion methods. And here's what it looks like. Let's say you have a tap point from a gesture recognizer. You just call preview layer capture device point of interest for point, and it will give you the converted point.
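A sketch of the tap-to-focus flow using that conversion method; previewLayer and camera are assumed properties on your own controller.

// In your view controller (self.previewLayer and self.camera are assumed properties):
- (void)handleTap:(UITapGestureRecognizer *)gesture
{
    CGPoint tapPoint = [gesture locationInView:gesture.view];
    CGPoint devicePoint =
        [self.previewLayer captureDevicePointOfInterestForPoint:tapPoint];

    NSError *error = nil;
    if ([self.camera lockForConfiguration:&error]) {
        if (self.camera.focusPointOfInterestSupported) {
            self.camera.focusPointOfInterest = devicePoint;   // unrotated 0-1 space
            self.camera.focusMode = AVCaptureFocusModeAutoFocus;
        }
        [self.camera unlockForConfiguration];
    }
}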

[Transcript missing]

on an AVCaptureOutput object, transformedMetadataObjectForMetadataObject:connection:. All right. The second video preview layer enhancement is support for pausing and resuming the video preview layer without stopping the session. Many of you want to pause rendering on the preview layer because you've taken a still image, you want to do some processing, you don't want to waste cycles previewing, or you want to just match the still image that you just took. You can do that now because AVCaptureVideoPreviewLayer exposes an AVCaptureConnection.

Let you in on a secret. It always had a capture connection. We just didn't expose it. So now that it's public, you can use all of the properties that you would normally use in an AVCapture connection, but with respect to a preview layer instead of an output. So to pause video preview, all you do is find its connection and call setEnabled "no." And while you do that, rendering is paused on the preview. It shows you a frozen frame. And then when you setEnabled "yes," it goes back to its regular rendering, and it does not cause any glitch in any other outputs or lose your focus or exposure points.
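In code, that pause and resume is about as small as it sounds; SetPreviewPaused is just an illustrative helper.

static void SetPreviewPaused(AVCaptureVideoPreviewLayer *previewLayer, BOOL paused)
{
    // Disabling the connection freezes the preview on the current frame;
    // re-enabling resumes live rendering with no glitch in the other outputs.
    previewLayer.connection.enabled = !paused;
}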

Because we now have a first-class connection on the AVCapture video preview layer, it means we have some redundant API in video preview layer, and we've taken this opportunity to deprecate them. You can see AVCaptureVideoPreviewLayer.h for the ones that you should convert over to. Here's a list. The deprecated ones are calls that you would make on the layer. Now you do the same thing, except you get the layer's connection first, and then set the same properties on the layer's connection.

We have three miscellaneous APIs that didn't fit into any of the big buckets that I was talking about earlier, so I just thought I'd throw them all on one slide and tell you about them quickly. AVCaptureDevice's TorchActive property. This lets you know whether or not the torch can be used.

As you know, using the torch can make your device hot, and if it gets too hot, the torch has to shut down and can't be used until the unit cools off sufficiently. You can now use the torchActive property to know about the current availability of the torch, and it's key-value observable, so it will change when the torch becomes active or shuts off.

You can also set the torch mode to something other than fully on or fully off. Some of you out here are writing flashlight apps. I just know you are. If you are, you might want to set the torch mode to something like halfway between full and zero. Or perhaps you might want to use it to do interesting effects with your video. You can now set the torch level to things other than top or bottom.
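A sketch of both torch additions; SetHalfTorch is illustrative, and the property and method names are the iOS 6 AVCaptureDevice ones as described.

static void SetHalfTorch(AVCaptureDevice *camera)
{
    if (!camera.hasTorch) {
        return;
    }
    NSError *error = nil;
    if ([camera lockForConfiguration:&error]) {
        // Any level between 0.0 and 1.0 is allowed.
        [camera setTorchModeOnWithLevel:0.5 error:&error];
        [camera unlockForConfiguration];
    }
    // torchActive is key-value observable, so you can react if the torch shuts
    // itself off because the device got too hot.
}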

Also, AVCapture still image output has a nice new feature when recording JPEG images. Previously, you had no control over the quality of the JPEG compression that was applied. It would always give you 85 percent quality. We determined that that was the right amount of compression to use. But now you can use the AVVideoQuality key with the still image output and specify anything from zero to one, zero being zero quality and one being 100 percent quality.
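For example, asking for lighter compression might look like this sketch; ConfigureJPEGQuality is illustrative.

static void ConfigureJPEGQuality(AVCaptureStillImageOutput *stillImageOutput)
{
    stillImageOutput.outputSettings = @{
        AVVideoCodecKey   : AVVideoCodecJPEG,
        AVVideoQualityKey : @0.6   // 0.0 is lowest quality, 1.0 is highest
    };
}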

All right, on to our next big topic area, which is solutions for performance problems in your Capture app. We're going to talk about a couple common problems that we see over and over on the developer forums. First of which is, my app is dropping frames during video capture. And usually the second question is, is it my fault? And we all know the answer to that. The answer is, yes, of course it's your fault.

It's never Apple's fault. What can I do to recover? That's the more important question is, once I see it happening, what can I do about it? We're going to talk about that. Also, for those of you who use AssetWriter, you may have noticed that you have dropped frames at the beginning of your movie, and we'll talk about how to mitigate that.

Also, you may have noticed that you have some garbage-looking frames in your AV AssetWriter-recorded movies if you use GL for rendering. We'll talk about how to get around that. And many of you have a do-it-yourself kind of preview where you get video data, you manipulate it, and then you need to draw your own preview somehow, and it's slow. What do you do about it? Okay, we're going to talk about all those four things. First of all, handling frame drops.

Always, always, always set AVCaptureVideoDataOutput's alwaysDiscardsLateVideoFrames to YES. There is one tiny exception, and that is if you are recording and you are super likely to always be faster than real time. Because when you set alwaysDiscardsLateVideoFrames to YES, what it tells the video data output to do is to size its buffer queue to one at the end of the video processing pipeline.

So that means if you don't pull a frame on time because you were late, it will never get further behind than the current frame. It will throw out the current frame, append the new one. So in effect, it's always giving you the latest. And it keeps you from building up latency if you have a process that's taking too much time and it's doing so chronically.

So it saves you from periodically slow processing. If you just have a momentary glitch, you're still going to get all the frames. It's just that when you had the glitch, you might miss a few. What it does not save you from is chronically slow processing. If you're always slower than real time, you're still going to drop a ton of frames and you're still going to have a bad user experience.

Also, we've provided a way in iOS 6 to help you debug your dropped capture frames, and that is we've given you a new delegate method in your video data output that's just like the didOutputSampleBuffer, except it's didDropSampleBuffer. So you can know when you're dropping sample buffers. and why might that be helpful? Well, the sample buffer that you dropped, it contains no image data. You were late. The image data is no longer there. It's been recycled. You can't see it anymore.

But the sample buffer does contain timing information, format description, and most importantly, it contains some attachments that you'll want to look at, like the dropped frame reason. So that will tell you why you're dropping frames. If it says the dropped reason was frame was late, it's because you are using the always discards late video frames, but you're still late. So that means it had a cue size of one. It had to pull one out to put a new one in. That means you're operating slower than real time.

The out-of-buffers one, that happens if you've been so late so many times and you're not using the discards late video frames. So let's say you're in a recording scenario, but you've dropped so many frames now that the capture device has no more left to give you. And it just can't produce anymore until you give some back. Out-of-buffers means you're way behind.

And then a discontinuity means something unexpected happened, the capture device had to reset itself, maybe something crashed, but it just means that you're going to have a--some number of frames were lost, and that the next video frame that comes in will have a much later timestamp than the one that you are currently on, more than one frame.
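Putting that together, here is a sketch of the new callback alongside alwaysDiscardsLateVideoFrames; the attachment key and reason constants are the Core Media names I believe shipped alongside this in iOS 6.

// At setup time: videoDataOutput.alwaysDiscardsLateVideoFrames = YES;
// Then, in your AVCaptureVideoDataOutputSampleBufferDelegate class:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
  didDropSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // The buffer carries no pixels anymore, but it still has timing info,
    // a format description, and the reason it was dropped.
    CFStringRef reason = (CFStringRef)CMGetAttachment(sampleBuffer,
        kCMSampleBufferAttachmentKey_DroppedFrameReason, NULL);
    NSLog(@"Dropped frame at %.3f, reason: %@",
          CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(sampleBuffer)),
          (__bridge NSString *)reason);
    // FrameWasLate  -> you're running slower than real time; lower the frame rate.
    // OutOfBuffers  -> you're holding too many buffers; give some back.
    // Discontinuity -> the device reset; expect a jump in timestamps.
}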

So how can you mitigate these? Well, having this tool at your disposal, knowing when you're dropping frames is huge. You can know when it's happening and take action. The action that you take might be that while you're developing your app, you pay attention to this and figure out the devices on which you're dropping frames so that when you ship, you've already tuned those devices to know what frame rate to use.

But you can also use it as a stop gap if in the field, like in real time, you find that you're still dropping frames. One way to mitigate them is by lowering the frame rate dynamically. You may not be aware that as of iOS 5, you can lower the frame rate dynamically on the video data output without stopping the session.

and it will gradually and neatly throttle down to the new frame rate that you've set. So by lowering the min and max frame rate, it means you don't have to drop a pile of frames to catch back up. You can let it do it gradually, maybe over a second, and get up to a reasonable quality of service where you can keep up.

The way that you set min and max frame rate is by setting the video min frame duration. Duration is one over frame rate. So you set the min frame duration to set a new max frame rate, and you set the max frame duration to set the min frame rate.
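A sketch of throttling on the fly; CapFrameRate is illustrative, and the connection properties are the iOS 5 and 6 ones being described.

static void CapFrameRate(AVCaptureVideoDataOutput *videoDataOutput, int32_t maxFPS)
{
    AVCaptureConnection *connection =
        [videoDataOutput connectionWithMediaType:AVMediaTypeVideo];

    if (connection.supportsVideoMinFrameDuration) {
        // Min frame duration sets the new maximum frame rate...
        connection.videoMinFrameDuration = CMTimeMake(1, maxFPS);
    }
    if (connection.supportsVideoMaxFrameDuration) {
        // ...and max frame duration sets the new minimum frame rate.
        connection.videoMaxFrameDuration = CMTimeMake(1, maxFPS);
    }
    // No stop or restart is needed; the session ramps down to the new rate gradually.
}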

Okay, asset writer frame drops. This one's a little bit stickier. And I told you that it's always your fault, but in fact, here it's kind of Apple's fault. So we're fixing it. AVCaptureMovieFileOutput is really good at capturing movies. Why? Because it's optimized for real-time file writing. It knows what format you're going to produce. It pre-allocates buffers for glitch-free movie writing, so it can start right away without dropping any frames. AVAssetWriter, on the other hand, is limited by its interface. It does not know the source format that you're going to be providing up front.

So when is it going to do all of that setup, building up its render pipelines and compressors and such? It can't prime that render pipeline ahead of time. It has to do it when you append your first sample buffer. Now it knows what the source format is. It knows how it can proceed. But that's too late, because if you're appending the first sample buffer, you meant for that one to go into the file. So now you're going to wind up with some frame drops at the beginning of the movie.

So we're getting by this by giving you a new initializer for AV Asset Writer input. First of all, continue to do what you have been doing, which is set expects media data in real time to yes. And now new in iOS 6, you can use a new init method to hint it the source format before you start. So the existing init method looks like this, Asset Writer input with media type output settings. And the new one just adds one more parameter, source format hint.

So use that. Use that if you're in the real-time case because it lets you set up--it lets AssetWriter set up itself ahead of time. Now, it doesn't mean that it's zero cost now. You still have to pay the cost somewhere. The costs of setup now move from the first append sample buffer call to when you call start writing. So what that means is you should still set up your asset writer off of the video data output serial queue on which you're getting delegate callbacks. You don't want to do it there.

That start writing might take hundreds of milliseconds, which might cause frames to pile up behind you or cause frame drops. So set up your asset writer off of that serial queue, and then when you're actually ready to start going, you can append sample buffers in your delegate callback.
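A sketch of the new initializer in use; MakeVideoWriterInput is illustrative, and the format description is assumed to come from a sample buffer you've already seen (CMSampleBufferGetFormatDescription) or from CMVideoFormatDescriptionCreate.

static AVAssetWriterInput *MakeVideoWriterInput(NSDictionary *outputSettings,
                                                CMFormatDescriptionRef sourceFormat)
{
    AVAssetWriterInput *input =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                           outputSettings:outputSettings
                                         sourceFormatHint:sourceFormat];
    input.expectsMediaDataInRealTime = YES;

    // Remember: with the hint, the expensive setup moves into -startWriting,
    // so call that off your video data output's serial queue.
    return input;
}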

All right, rendering with OpenGL and writing to AVAssetWriter can be a little sticky. When you render to a texture using CVOpenGLESTextureCache, you have to ensure that GL has finished rendering the buffer before you pass it to AssetWriter. If you don't, you're going to wind up with some garbage or lines in your movie that you don't want.

So the way to do that, the safe way to make sure that GL has finished with the frames is to call glFinish. It's safe, but unfortunately, that's going to block the CPU and wait until the GPU is done rendering, and you don't want that. You don't want to block your CPU so that the GPU can do its work. You want both to be busy at the same time.

There's also glFlush, which flushes out any pending commands to the GPU but doesn't finish them. So here's our recommendation to get more out of the CPU and the GPU to get them working in parallel, is to use what we call a delayed glFinish or staggered glRender to keep both of them busy. Here's what it looks like. In your video data output, You'll get a frame. Pass that frame to OpenGL to render it and call glFlush. It's now working on frame N. Don't call glFinish. Just hold on to the frame.

Now wait until you get frame N plus 1. Now call glFinish for the first frame. That means you let the CPU do all that work to get the second frame in before you called glFinish, and you gave GL some more time to work on rendering that first frame. Now you can pass the second frame to GL, call glFlush, hold on to the second frame while you pass the first frame to AssetWriter, and then wash, rinse, repeat.

glFlush is only necessary if you are rendering for AssetWriter but not presenting it. If you do call presentRenderbuffer for preview, it implicitly takes care of the glFlush for you. Also, that big staggered GL render that I just took you through is only necessary on devices before iOS 6. On iOS 6 and later, AssetWriter takes care of this for you, so that GL is always guaranteed to be done with the buffer before it writes it to the movie. You will not get any more of those garbage frames.
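For pre-iOS 6 targets, the staggered pattern just described might be sketched like this; renderFrame:, appendToWriter:, and the _previousFrame instance variable are placeholders for your own GL and AVAssetWriter code.

// In your video data output delegate (requires <OpenGLES/ES2/gl.h>):
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    if (_previousFrame) {
        // Frame N has had a whole frame interval on the GPU, so this wait is cheap now.
        glFinish();
    }

    // Start GL work on frame N+1 and push it to the GPU without waiting for it.
    [self renderFrame:sampleBuffer];
    glFlush();

    if (_previousFrame) {
        // GL is guaranteed to be done with frame N; hand it to the asset writer.
        [self appendToWriter:_previousFrame];
        CFRelease(_previousFrame);
    }
    _previousFrame = (CMSampleBufferRef)CFRetain(sampleBuffer);
}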

Okay, how to draw my own preview fast. This is the last performance problem we're going to talk about. Well, the first way to do it and the easiest and the most efficient is to use our own built-in video preview layer if you can. If all you're going to do is render things on top of the video preview, such as StashCam did, where it's drawing the mustaches on top of the video, well, no need to be manipulating pixels. Just use our video preview layer and use Core Animation to composite other images on top of it. That's as easy as can be and it's very efficient. Plus, you get all of Core Animation's rotations and animations for free.

If you are using video data output because you need to manipulate the pixels before previewing, our recommendation is to use OpenGL. We found that it is consistently the fastest way to draw your own preview. It's not necessarily as easy, but we've given you some good sample code to show you how. There's the GL Camera Ripple sample code.

It operates in 4:2:0V. If at all possible, use 4:2:0V for your preview. And if you can manipulate your pixels in 4:2:0V, please do. Because it's less than half of the data size of BGRA. It's the native format of the camera, and it just makes things faster and more memory efficient.

If, however, you need to use BGRA for output and for preview, please review the RosyWriter sample code. It shows how to do the same things with operations in BGRA. On to our last topic, synchronizing motion data with video. This is a fun one. Okay, let's start out by having a demo of VideoSnake. I'd like to call up Walker Eagleston to help me demo this.

It has a nifty icon, too. There we go. Okay, it's black, so you can't see the whole screen frame. But what's happening here is it's using AV Foundation plus Core Motion, the accelerometer, in this case, to draw a video snake. The name of this app is Video Snake. So what you see, I'm the head.

I am the head of the snake. And behind you -- behind me, undulating around, you see a motion history of where I've been. And we're using Core Motion for the accelerometer only here. So as Walker translates the iPad around, so moves it left, right, top, up and down, you'll see the tail of the snake undulate around.

There's also a second effect built into this app, which is more of a collage effect. So he's demoing that to you now. This is sort of a poor man's pano, I guess. As he rolls around, this takes into account not only translation but rotation. So as he kind of moves the iPad up and down and rotates it around, it will keep track of where I am and draw the history around me in more of a 3-D space. And so what's neat about this is that it knows the camera's field of view by using the sample buffer attachment's focal length in 35-millimeter film.

And that lets him guess how much -- when to stop to snap a new picture. So every 10 degrees or so, he's snapping a new picture and using GL to render it. And what you wind up with is this cool kind of collage effect. None of it works, though, if we can't correlate those video timestamps to the timestamps that we're getting from Core Motion. Thank you, Walker.

All right, so how did we do that? You couldn't see it because how would you see audio? But there was a big record button at the bottom, which we were tempted to press, but we didn't because recording is cool, but it's better seen in code than demoed on stage. VideoSnake is doing both video and audio. So at the top of his AVCapture session, he has both a video device and a microphone. He has a video data output and an audio data output.

And to correlate the timestamps with Core Motion, he's taking the video frames that he gets and undoing AV sync. We'll talk about that in a minute. That's wrong. He's not passing to a CI face detector. That should say passing to Core Motion. And then, passing to Core Motion and getting the timestamps, it's then rendering with OpenGL.

And the rendered frames, that is the composited snake, is now previewed using GL. And the results are brought back into main memory where they can be written to AssetWriter and correlated with the audio that came in at the same time. The audio and video is in sync. This sample was not on the developer site as of this morning, but I think it will be there this afternoon.

All right, so how do you synchronize motion data with video? The problem is we've got three input sources now. We've got the gyro via Core Motion, we've got a video device, we've got an audio device. That means potentially three clocks. And you know the old saying, "The man with two clocks never knows what time it is." Well, we really don't know what time it is because we've got three clocks to deal with.

But we're lucky because, in fact, there are only two clocks to deal with, because both the video camera and core motion use the same clock. It's the host time clock. So now we only have to worry about the audio being on a different clock and thus a slightly different timeline.

To synchronize motion data with video, first you have to know about the motion samples. The Core Motion samples contain a timestamp. You can get at them by using CMDeviceMotion's timestamp, and that'll give you an NSTimeInterval. It's timestamped using the Mach absolute time of the motion, so the exact time that the accelerometer event happened or the gyro event happened, that's what you're getting when you ask it for the timestamp. And Core Motion does use the host time clock to do its timestamping.

When you do tricks like this where you're using motion and you're using video, you'll typically want more motion samples than you have video samples so that you can compare motion that was happening around the time of the video frame. Because typically people can move around faster than the video can spit out frames. So we recommend at least a sampling rate of 2x your video frame rate if you're going to do effects with Core Motion.

Now, on to the video timestamps. How do you get those? Each sample buffer of video that you get has a timestamp that you can get using CMSampleBufferGetPresentationTimeStamp, and that is the Mach absolute time of the frame. Front and back camera on iOS both use the Mach absolute time, or the host time clock.

For the audio, though, it doesn't just contain one frame of audio. It contains N frames of audio. And the time stamp there is the time of the first frame in the buffer. And, again, it's on a different clock. But it's the time that it came into the microphone.

The audio capture device uses the audio clock. And to do AV sync, then, we have to pick one or the other because otherwise they would drift apart and you would have movie recordings that were out of sync. So because they might drift, we have to pick one. Well, we picked the audio clock because the audio clock, it's easier to sync video to audio than vice versa. So audio becomes the master.

That means that the timestamps you get on the video sample buffers when you are also capturing audio are altered. They're on the audio timeline, not on the video timeline. That's no good if you want to correlate them with motion, which is on the host time clock. So the way you undo AV sync is by getting a reference to the audio clock and the video clock, as seen here.
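The slide with the clock references isn't reproduced in the transcript, so here is a hedged sketch of the conversion: CMSyncConvertTime and CMClockGetHostTimeClock are the Core Media calls being described, while the audio clock reference is assumed to be handed in from your capture setup.

static NSTimeInterval MotionTimeForVideoBuffer(CMSampleBufferRef sampleBuffer,
                                               CMClockRef audioClock)
{
    // When audio is also being captured, video timestamps sit on the audio timeline.
    CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);

    // Convert back to the host time clock, which both the camera and Core Motion use.
    CMTime hostTime = CMSyncConvertTime(pts, audioClock, CMClockGetHostTimeClock());

    // CMDeviceMotion.timestamp is seconds on this same host timeline, so a plain
    // comparison against this value now makes sense.
    return CMTimeGetSeconds(hostTime);
}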

And then taking your presentation time from the video buffer, which, again, is on the audio timeline, and call CMSyncConvertTime, which converts from the audio timeline to the video clock's timeline. So now you're back on the Core Motion timeline. You can now do your correlating of times. So in summary, what did we talk about today? We talked about OS X features, mostly performance. We were really centered on getting the performance better in OS X 10.8.

We also talked about the camera ecosystem as a whole. Be sensitive to our users' private data. Don't violate their trust. We also spent a lot of time talking about iOS 6 capture features, solving performance problems, and lastly, we talked about correlating motion data with video. Hope you've enjoyed our session today. For more information, please talk to Eryk Vershen, our evangelist. Thanks again, and enjoy the rest of the show.