
WWDC13 • Session 610

What's New in Camera Capture

Media • iOS • 58:39

AV Foundation provides your application with access to the powerful camera imaging sensors built into all iOS devices. iOS 7 enables finer-grained control over the capture format, support for zoom, built-in barcode recognition, and enhancements to autofocus. If your application uses the camera on iOS, you'll want to attend.

Speakers: Brad Ford, Rob Simutis, Ethan Tira-Thompson

Unlisted on Apple Developer site

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Morning everyone. Welcome to Session 610. I'm Brad Ford; I work on the Core Media engineering team. For the next hour I'm going to talk to you about the most popular camera in the world. Actually, that's inaccurate. If you go by Flickr data, I'm going to talk to you about the three most popular cameras in the world -- iPhone 4S, iPhone 5, and iPhone 4.

And we recognize that you're a big part of that popularity. We bring the great hardware, we bring the camera that people get excited about, and we bring the framework level support, but you bring the apps. And we wouldn't be as popular or as successful without your apps that make our platform so useful, and so fun. So thank you for that.

Today we're going to have a brief appetizer of greater transparency for users, and then the main course is features -- lots and lots of new features, and then we'll follow that up with a sample code update for our dessert. We're not going to spend any time today on core media basics or AV foundation basics, because we just don't have time in an hour to do that.

But lucky for you, we've talked about them several times in the past, and all of these sessions are available on your WWDC app on your phone right now. So you could actually call it up, and you could be listening to me two years ago, while you're listening to me now.

But turn the sound down. First up, greater transparency for users. Last year we introduced some security hardening in iOS 6 to make it more transparent to users when the photos and videos in their photo library were being accessed. And we did that by popping up a dialog the first time your application tries to access -- that is, read from or write to -- the assets library, so that the user would have an opportunity to opt in or out. And we warned you that you should start paying attention to the errors that you get back from ALAssetsLibrary.

Well this year we're hardening things even a little bit more, and we do this for a couple of reasons. You've probably noticed that our iOS devices have no hardware blinky light that tells you recording is in progress, and AV Foundation as a framework does not force you to put up a UI saying "recording in progress". So it's possible to record headlessly. And users want to trust your app; they want to know when things are happening.

Also, in some regions it's now required by law to present users with notice when the microphone or the camera is in use. So new in iOS 7, we are introducing two new dialogs the first time your app makes use of the microphone or the camera, to let users know about it and to opt in or out. Now the microphone dialog is everywhere -- that is, all iPhones, all iPads, everywhere. The camera dialog appears just in certain regions where it's required by law, such as China.

Here's how it looks in code. The first time you create an AVCaptureDeviceInput, the dialog will be invoked. You call deviceInputWithDevice:error:, and pay attention to that error, because it might return an error now. The very first time we need to succeed, because we need to return control to you immediately, but we actually don't know the answer yet -- the dialog is up, but the person might not have said OK or Deny. So what do we do in the interim? For the microphone we produce silence until the user grants access, and for the camera we give you black frames until they've granted access.

If on subsequent launches we already know what the answer is, we can return an error immediately, and that's a new error in AV Foundation called AVErrorApplicationIsNotAuthorizedToUseDevice. So pay attention to that. It means the user has chosen not to allow you to use the camera or microphone.
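
As a rough sketch of that pattern -- not the session's actual slide code, and assuming an existing AVCaptureSession called session -- the input creation and error check might look like this:

```objc
#import <AVFoundation/AVFoundation.h>

// Sketch: create the camera input and watch for the new iOS 7 authorization
// error. `session` is assumed to be an existing AVCaptureSession.
AVCaptureDevice *camera =
    [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
NSError *error = nil;
AVCaptureDeviceInput *input =
    [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];

if (input) {
    if ([session canAddInput:input]) {
        [session addInput:input];
    }
} else if ([error.domain isEqualToString:AVFoundationErrorDomain] &&
           error.code == AVErrorApplicationIsNotAuthorizedToUseDevice) {
    // The user previously declined access; degrade gracefully.
    NSLog(@"Not authorized to use the camera: %@", error.localizedDescription);
} else {
    NSLog(@"Could not create device input: %@", error);
}
```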

Alright, on to features. This is going to be the major bulk of our talk today, and we have a lot of them to get through. Five major feature areas: first, 60 fps support; then video zoom; machine readable code detection, or barcode detection; focus enhancements; and integration with the app's audio session.

First up, 60 fps support. And we know a lot of you have been waiting a long time for this. You've been waiting patiently, and I think it's worth the wait. We didn't want to unleash this feature on all of you before we had really thought it through and given you comprehensive support across the media stack. So in introducing 60 fps movies, we wanted to make sure you could also do interesting things with them, as far as playback and editing.

So we are introducing full iOS ecosystem support for high frame rate content. What does that mean? On capture we support 720p video up to 60 frames per second, with video stabilization, and we write really cool movies. These have droppable P-frames in them, a feature of H.264, which allows them to play back smoothly even on lower powered or older machines.

On the playback front, we've beefed up our support for audio processing in the time pitch domain, so that if you want to do effects like slow the movies down or speed them up, you can do interesting things with audio. On the editing side, we largely already had the support there, but we do support fully scaled edits in mutable compositions.

And lastly, in export we allow you to do it two ways. You can either export such that the high frame rate areas of the movie are preserved, or you can do a frame rate conversion that will sort of flatten it all down to 30 frames per second, or something else. But enough talk, let's do a demo.

Alright. So the first demo app is called Slowpoke. This is an app that showcases all four feature areas of 60 frames per second support. First one is capture, as you might expect. Now it looks just like a regular capture app, except I don't know if you can tell out there, but it's a really fast frame rate, it's a buttery smooth 60 frames per second preview. And it's running the camera at 720p 60. You can also do all the things you'd expect, like focus, and it writes movies that have the proper H264 bitrate profile level, etcetera. Let's go over to the more interesting part for today's demo, which is the playback and editing.

I recorded several movies here previously, they're all 60 frames per second movies. I'm just going to pick one of them, and now we'll find out why this app got its namesake. This is a clip of a guitarist playing the prelude from Bach's E Major Lute Suite, let me play a little bit for you.

[ Music ]

So let's say you're trying to learn this piece yourself, and he's going too fast, you need to slow him down so that you can hear it better. I'm going to swipe to the left to slow it down.

[ Music ]

And I'll go even slower now.

[ Music ]

So now you can really see his fingers move well. Alternately, you could make him sound like Yngwie Malmsteen, which is my favorite.

[ Music ]

Notice how good it sounds. We're preserving pitch here, so that you could even export this and pass it off as him doing the real thing. Now he's an amazing guitarist. Alright, let's pick another one.

[ Applause ]

Thank you. Let's have a little fun at my dog's expense. This poor animal protects our house from dangerous birds on wires, and this is what he sounds like.

[ Dog Barking ]

Okay, protecting our house. Now let's have some fun speeding him up and slowing him down, but this time I'm going to engage chipmunk mode. You'll notice over in the corner I'm going to turn the chipmunk button on so that we can make him sound like a yip-yip dog.

[ Dog Barking ]

Or, like Barry White.

[ Dog Barking ]

Or a dinosaur.

[ Dog Barking ]

Okay, enough of that. And finally, let's go --

[ Applause ]

Notice we have so many frames in the movie that it looks really good when you slow him down. Let's take this last one here, this is an action shot, kind of a frightening one of my dog coming up towards you at a million miles an hour. Now let's engage the chipmunk mode, but let's say this time I don't just want to mess around with it in real time, I want to program a slow motion part right into the asset. Okay, so I'll pick a point right where he's starting to come up the stairs. Now I'm going to swipe down to begin and end it, and edit.

And then I'll go to where he's right next to me, and I'll swipe up to end the edit, and here I get to apply a rate. So I'll set the rate to .25, quarter speed, and apply it. And you notice that the duration just changed on this movie. Now I can go back and play it.

[ Silence ]

Oh yeah. He's coming for you. Okay. So -- and then of course as you might suspect, we would want to be able to save these off for posterity, so we have the export button over here on the side, which lets us export to the camera roll, either preserving the high frame rate sections, or going down to a constant frame rate of say 30, or something like that. And that is Slowpoke.

[ Applause ]

On the playback side, AVPlayer does most of this for you automatically. If you just use the player's setRate:, it can play back at arbitrary rates and do the really hard job of keeping audio and video in sync. There's a new property on the player item -- a player is composed of player items, because it's a queue model.

You can take the player item and set its audio time pitch algorithm. That's what I was using there to either adjust the pitch higher or lower, or keep it constant. I was using the bottom two constants there: Spectral, which preserves the pitch, and Varispeed, which alters the pitch. And these are very high quality algorithms. They can go continuously from 32x down to 1/32x, and they sound great.
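
A minimal sketch of that playback setup, with movieURL standing in for whatever asset you're playing:

```objc
#import <AVFoundation/AVFoundation.h>

// Sketch: play a high frame rate movie at quarter speed while preserving pitch.
// `movieURL` is a placeholder for your own asset's URL.
AVPlayerItem *item = [AVPlayerItem playerItemWithURL:movieURL];
item.audioTimePitchAlgorithm = AVAudioTimePitchAlgorithmSpectral;   // preserve pitch
// ...or AVAudioTimePitchAlgorithmVarispeed to let the pitch follow the rate.

AVPlayer *player = [AVPlayer playerWithPlayerItem:item];
player.rate = 0.25f;   // quarter speed; AVPlayer keeps audio and video in sync
```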

On the editing side, I was using AVMutableComposition to build up those temporal edits when I saved off that scaled section. I did that by creating an empty composition, inserting my entire source asset into that composition, and then choosing the section that I wanted to scale up or down, just by using scaleTimeRange:toDuration: -- very simple. See the Slowpoke sample code, where you can find out how to do all this yourself. And if you're interested in the editing aspect of this, I invite you to come back tomorrow at 9:00 a.m., where we're having an advanced editing session on AV Foundation.
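
Sketched out, and assuming an asset plus an editStart and editDuration you've already chosen, that edit might look like this:

```objc
// Sketch: copy the whole source asset into a composition, then stretch one
// section to play at quarter speed. `asset`, `editStart`, and `editDuration`
// are assumed to exist already.
NSError *error = nil;
AVMutableComposition *composition = [AVMutableComposition composition];
[composition insertTimeRange:CMTimeRangeMake(kCMTimeZero, asset.duration)
                     ofAsset:asset
                      atTime:kCMTimeZero
                       error:&error];

CMTimeRange editRange = CMTimeRangeMake(editStart, editDuration);
// Scaling the range to 4x its duration plays that section at 0.25x.
[composition scaleTimeRange:editRange
                 toDuration:CMTimeMultiply(editDuration, 4)];
```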

On the export side, I'm using AVAssetExportSession to flatten that out into a new movie. Now as I mentioned, there are two ways to do this. You can use the pass-through export preset if you want to avoid any re-encoding. That will just retime the media and write it out as is -- one section at 60 frames per second, another section slowed down or sped up.

Or you can do a constant frame rate export. You might want to do this if you want maximum playback compatibility. To do this you set the video composition's frame duration, saying the composition's frame duration is 1/30 of a second, for instance. This gives you maximum playback compatibility. And you can also choose to set the time pitch algorithm for the export as well, so you can use a cheaper, lower quality one during playback, and then use a high quality one when you export.
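
Here is one possible shape for both export paths, treating the preset, file type, and URL choices as examples rather than the session's exact code (composition and outputURL are assumed):

```objc
// Sketch of the two export paths described above.

// 1) Pass-through: no re-encode, high frame rate sections preserved.
AVAssetExportSession *passthrough =
    [[AVAssetExportSession alloc] initWithAsset:composition
                                     presetName:AVAssetExportPresetPassthrough];
passthrough.outputURL = outputURL;
passthrough.outputFileType = AVFileTypeQuickTimeMovie;

// 2) Constant frame rate: re-encode everything at 30 fps for maximum compatibility.
AVAssetExportSession *constantRate =
    [[AVAssetExportSession alloc] initWithAsset:composition
                                     presetName:AVAssetExportPreset1280x720];
AVMutableVideoComposition *videoComposition =
    [AVMutableVideoComposition videoCompositionWithPropertiesOfAsset:composition];
videoComposition.frameDuration = CMTimeMake(1, 30);   // 1/30 of a second per frame
constantRate.videoComposition = videoComposition;
constantRate.audioTimePitchAlgorithm = AVAudioTimePitchAlgorithmSpectral;
constantRate.outputURL = outputURL;
constantRate.outputFileType = AVFileTypeQuickTimeMovie;
[constantRate exportAsynchronouslyWithCompletionHandler:^{
    NSLog(@"Export finished with status %ld", (long)constantRate.status);
}];
```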

Again, that's all in Slowpoke. Now on to recording, which is why we're all here. AVCaptureMovieFileOutput just works, as you might expect. It automatically picks the right H.264 profile, level, and bit rate for you, and makes sure the movie looks great. If you want to do stuff with the frames yourself, you need to use AVAssetWriter, and it requires some additional setup. As with all real-time use of AVAssetWriter, you need to set expectsMediaDataInRealTime to YES, otherwise it won't be able to keep up with the frame rate.

And we have a new object that helps you create settings dictionaries for the AV asset writer. An asset writer doesn't know what kind of output you want by default. You have to tell it what kind of settings to use. And this can be complicated with high frame rate movies, knowing what H264 keys to use, etcetera.

So you can instantiate an AV output settings assistant, tell it what the source video format is, tell it what the source video frame rate is, and then ask it for a dictionary of settings, and then apply that to your asset writer, and it just works. It'll pick the best settings for you.
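
A hedged sketch of that flow -- the preset and the 1/60 frame duration are illustrative, and videoFormatDescription and assetWriter are assumed to exist already:

```objc
// Sketch: let the settings assistant pick H.264 settings for a 720p60 source,
// then hand them to an asset writer input.
AVOutputSettingsAssistant *assistant =
    [AVOutputSettingsAssistant outputSettingsAssistantWithPreset:AVOutputSettingsPreset1280x720];
assistant.sourceVideoFormat = videoFormatDescription;             // CMVideoFormatDescriptionRef
assistant.sourceVideoAverageFrameDuration = CMTimeMake(1, 60);    // 60 fps source

AVAssetWriterInput *videoInput =
    [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                       outputSettings:[assistant videoSettings]];
videoInput.expectsMediaDataInRealTime = YES;   // required for real-time capture
[assetWriter addInput:videoInput];
```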

That was the recording aspect of 60 fps, now let's talk about how you just configure the session in general. Those who've used AV foundation's capture classes know that we have an AV capture session that's the center of our universe. And the way that you configure it is one call. You set the session preset to something. We have a set of strings that tell you the quality of service you're going to get -- photo, high quality, medium, low, etcetera. And that does the hard job of configuring the inputs and outputs for you.

Now, we had a problem with 60 fps capture in iOS 7, because we didn't want to try to make new session presets for every conceivable frame rate and resolution combination -- that would result in a combinatorial explosion of presets, and it would be very difficult to program to. So in iOS 7 we're introducing a parallel configuration mechanism. The old one is not going away, but this one is for the power use case.

And that is we're now going to allow you to inspect the format of the AV capture device, and set the active format directly. And when you do this, the session is no longer in control, it no longer automatically configures inputs and outputs. 720p 60 capture is supported on iPhone 5, iPod Touch, the tall one, and iPad Mini.

Let's review how set session preset works. Here's a block diagram we have of the various pieces in a capture session. You have inputs, you have outputs, you have a preview, and they're connected via these white arrows, which are represented in our API as AV capture connections. So you can see the capture session kind of knows its inputs and outputs, it knows its topology.

So when you set a session preset, let's say photo, here's a common scenario, you might also want to get BGRA frames out of your video data output instead of the default, which is 4:2:0. Here's what happens under the covers. The session goes and talks to all of its outputs. It says still image output, for the photo preset what shall -- what do you require? And it requires full res jpeg, so it figures out that it should give 3264 by 2448, assuming this is an iPhone 5.

The video data output does not give full res buffers for the photo preset, it's sort of a special case. Instead it gives screen sized buffers to make sure that they're not too large for your processing. So it picks a screen resolution, and chooses BGRA because you wanted to override the default. The video preview layer just wants screen size, and it can cope with the native format.

So knowing all of these requirements now, the session goes up, aggregates all of those requirements, goes to the AV capture device, and says pick me the best format. And the AV capture device looks through its formats, picks the best match, and also picks the optimal frame rates -- min and max frame rate to satisfy all those requirements. That's what's happening underneath. Now using the new configuration mechanism, it's simple. Just do this.

Let's highlight this one piece at a time. The AVCaptureDevice now exposes an array of natively supported formats. Each one is an AVCaptureDeviceFormat. So here I'm iterating through them trying to find the highest frame rate. In the next little section I look at each format object's supported frame rate ranges and find the one with the highest max frame rate. From that I select the best format match based on the highest frame rate range.

And once I have a best format, I lock my device for configuration, set the active format, and then I pin the min and max to the highest frame rate that I found, which is exactly what I did in the Slowpoke app. Always, always, always unlock for configuration when you're done.
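
A sketch of that selection loop, assuming device is the AVCaptureDevice you're configuring:

```objc
// Sketch of the selection described above: find the format with the highest
// supported frame rate, then pin the frame durations to it.
AVCaptureDeviceFormat *bestFormat = nil;
AVFrameRateRange *bestRange = nil;
for (AVCaptureDeviceFormat *format in device.formats) {
    for (AVFrameRateRange *range in format.videoSupportedFrameRateRanges) {
        if (range.maxFrameRate > bestRange.maxFrameRate) {
            bestFormat = format;
            bestRange = range;
        }
    }
}

NSError *error = nil;
if (bestFormat && [device lockForConfiguration:&error]) {
    device.activeFormat = bestFormat;
    // Pin min and max to the highest frame rate we found.
    device.activeVideoMinFrameDuration = bestRange.minFrameDuration;
    device.activeVideoMaxFrameDuration = bestRange.minFrameDuration;
    [device unlockForConfiguration];   // always unlock when you're done
}
```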

A note about frame rate selection. Previously we've only allowed you to do this on the AV capture connections, which, as you know, sit lower in the session hierarchy than the device. But what they're actually doing is going and talking to the device and setting the active frame rates on the device, and that's what we would like you to do now too.

This is the new preferred mechanism for setting frame rates. So talk to the device, not the connections. Frame rates can be set at any time, whether you're using the set session preset API or the new set active format API. You can use them at any time, and it reconfigures the graph without tearing it down.

Sometimes you just want to go back to whatever the default was supposed to be for the given preset or active format that you're using. If you don't know what those defaults are, you can just set the min and max frame durations to kCMTimeInvalid, which goes back to the defaults. As I mentioned, AVCaptureConnection's frame rate selection accessors are now deprecated. Please switch over to the new ones as soon as possible.

Okay, this slide hopefully doesn't scare you too much. This is a list of supported formats on the iPhone 5's back facing camera. It's a little overwhelming. There are 10 formats here, and actually that's only half of them. There are 20 formats, but I left half of them out because they're really just two flavors of the same format. There's a 420v and a 420f, the v for video range, the f for full range.

But if we take that complexity out, we're left with basically 10 formats, and as you can see, they're sorted, ascending by dimensions, and the more commonly used ones are listed first. So let's take a look over at the right-hand column. You can see that most of them are already used by one session preset or another.

There are, however, two new ones that we've never exposed before on iPhone 5, which is the 720p 60 format and one that's a 4 by 3 format, but not as big as the absolute full res 8 megapixel, and that's a 5 megapixel 4 by 3, and it's not used by any session preset. To get at it, you have to use the active format setters.

Here's what happens when you use the new way of configuring AV capture. Instead of talking to the session, you talk directly to the AVCaptureDevice. You say, I want your active format to be 8 megapixel, let's say. Now when you do this, the session is listening for that, and the session says, okay, they are now in control, I'm going to keep hands off. My session preset is now inputPriority, which means I'm not going to touch the inputs or the outputs, I'm just going to let the user be in control.

That means that the AV capture device will now deliver to the still image output the full 8 megapixel. Video preview layer is an exception, it still only gets screen size. But now, new for video data output, you get the full 8 megapixel buffers, not a scaled down screen resolution version of it.

Let's talk briefly about what is in an AV capture device format object. It has a media type, as you might expect, so you know if it's audio or video, it has a format description from which you can get the pixel dimensions, the pixel format such as 420v, 420f.

You can also get the video field of view. This is a really handy one. Previously, in order to know what the field of view of the sensor is, you would have to run a video data output, and then look through the meta data and find the focal length in 35-millimeter film, and that's really hard. Now you don't need to run the camera at all. You can just look through the supported formats array, and see what the field of view is, and it's expressed in degrees, and this is the horizontal field of view.

Also it tells you whether a given sensor format supports video stabilization, so that later when you enable it on the connection, you can know if it's going to succeed, if it's going to turn on stabilization or not. You can also look through the supported frame rate ranges, such as I support 1 through 30 frames per second.

And there's something called video binned, which you may or may not have heard of. This is sort of a sensor-specific keyword. And binning means taking groups of neighboring pixels and binning them together, sort of averaging them. And it's a means of reducing the resolution, reducing the throughput, but without reducing the field of view. Let me give you an example of it.

Going back to the iPhone 5 back facing camera, we have two 1280 by 720 modes available. Previously we only let you use the 1 through 30 one. But the 1 through 60 one has almost an identical field of view, so you're not really losing any depth of field. Instead, one of them is binned and one is non-binned. If you'd like to read more about what this means and the different image characteristics of binned versus non-binned, I encourage you to go do a web search for sensor pixel binning, and read all about it.

So some guidance about when to use one over the other. The session presets setting mechanism is not deprecated, it's not going away. It's still a good one to use, because it knows how to optimally configure inputs and outputs, so it gives you the best bang for the buck. It's just one call and it does everything for you.

But you should use the new set active format means of configuration if you need a specific format, such as the 60 frames per second format, or if you're looking for a specific field of view, or if you need those full resolution video data output buffers, this is the only way to do that. Alright, I think we've talked that to death. Let's move on to video zoom, and to do that I'd like to bring up Ethan Tira-Thompson. Thank you.

[ Applause ]

Hi everyone. I'm very excited to talk to you today about zoom. I hope that perked you up, because we've got some exciting stuff here, and I think a lot of you will want to take advantage of it. I'd like to start by reviewing our current API, which is the video scale and crop factor of the AV capture connection.

Currently this only applies to still image outputs, so you could enlarge an image by setting this property. And typically you would then also apply a transform to the preview layer so that the user gets some feedback as to how much zoom has been applied. So that would look something like this screenshot on the left.

And this is not being deprecated, so this is still available. However, we're introducing a new property simply called video zoom factor, which is on the AV capture device. So this is at the root of the session, and applies to all image outputs, including the preview. So by setting this one factor, you'll get a preview, which is enlarged, and also much sharper.

So to look under the hood and let you know how this works, let's go back to our architecture diagram, and we have our video scale and crop factor on the AV capture connection. And notice this is only applying to the still image output, and again this is still there. However, our new property up on the capture device is applying to all the image outputs, including some not shown here, such as the meta data output and movie file output.

And because this is at the root of the session, we can do some interesting things with the image processing. Normally when we're getting an image from the sensor, it's at the full photo resolution, it's the maximum resolution of the sensor. However, video resolutions that we output, like 1080p, are a lower resolution.

So we must scale down the image in order to get to that resolution. Now if we want to enlarge the image, instead of upscaling the video that we're outputting, let's just crop the image and not downscale as much. This means that you'll be getting a larger output while retaining the original detail that's coming from the sensor.

Of course we can also crop tighter than that. So once we are requesting a smaller source area than the output, then we need to upscale, and that's fine. We have a property to let you know when this is going to happen, and this is a property of the new AV capture device format that Brad was just talking about.

So for each format, depending on the resolution of the video that's going to be returned, that adjusts the threshold before you hit upscaling. You can check the property for each format and know when you're going to enter that range.

To illustrate this, we have a little animation here. The purple box would be the same dimensions as the video output, and the red box would be the maximum area that you want to zoom down to. And as we increase the zoom factor, we cross that threshold, so now we're in the upscaling range. As we zoom back out, we enter the crop-zoom section, which is just cropping on the sensor. And you can go back and forth across the transition. You don't actually need to know where that threshold is, but it's there.

So to talk a little bit about the API behind this: there's the aforementioned video zoom factor, which applies to all the image outputs, up to a maximum value, which is a property of the AVCaptureDeviceFormat. So if you just care about the current format -- the active sensor format -- there's a property, the device's active format, and then you can check the max zoom factor of the active format, and that'll let you know the current maximum zoom that you can use.
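
A small sketch of that check, with device and the requested 4x zoom as assumptions:

```objc
// Sketch: clamp a requested zoom to what the active format allows, then set
// it directly on the device. The 4x request is just an example.
CGFloat desiredZoom = 4.0;
CGFloat maxZoom = device.activeFormat.videoMaxZoomFactor;
CGFloat zoom = MIN(desiredZoom, maxZoom);

NSError *error = nil;
if ([device lockForConfiguration:&error]) {
    device.videoZoomFactor = zoom;   // applies to preview and all image outputs
    [device unlockForConfiguration];
}
```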

The device coordinates are a little interesting, because they stay fixed throughout the zoom. You can think of this as an optical zoom at the front of the pipeline; by the time the rest of the pipeline sees it, the image has already been cropped. And so all the device coordinates apply to the image that is being shown.

So for instance, if we set a focus interest point on the soccer ball in the corner, and then we zoom in, as the image scales, the soccer ball will go out of the field of view, but that focus point's going to stay fixed in the corner where you set it. Those corners are static. Similarly, if the -- if you have face detection enabled, the faces will be returned as they're being displayed, and as the face goes out of the field of view, they will stop being detected.

There's a pre-existing method, transformed metadata object for metadata object, which also helps you coordinate these, because we return the metadata in device coordinates, and if you want to convert it to the preview layer's coordinates, this pre-existing method will do that for you. There's another interesting aspect to this, in that we are now applying zoom to video outputs, and there's a temporal aspect there.

Because we don't want you to have to increment the zoom factor for each frame as it's being captured, because you might have some threading issues, it might be hard to time and synchronize your threads and updates with the capture of frames as they're being received, so we can do that internally. And we have this new method, ramp to video zoom factor with rate. So you can specify the target factor, and the rate at which to get there, and then we will internally increment the zoom factor on each frame as it's being captured in real time.

There's another method, cancel video zoom ramp, so that you can do this interactively at any time. You can either call ramp to video zoom factor again with a new rate or a new target, or you can just cancel the current ramp if the user, you know, lets go of the button.

And then any changes in the rate are smoothed further, so that you don't have any jumpy transitions within the zooms. Now rates are a little tricky in zoom, because the apparent speed of a zoom is actually determined by the multiplicative factor that we're applying, it's not an additive thing.

So the rate is specified in powers of 2 per second. So if you want a consistent speed of doubling every second, then you'll set a rate of 1. When you set a rate of 2, it'll go twice as fast, if you set a rate of .5 it'll go half as fast.

What you see here on the right is the graph showing a rate of 1. So essentially every second we double the zoom. And so we go from 1 to 2, 2 to 4, 4 to 8, and so on. In practice you'll probably want to stay around 1 to 3 for comfortable ranges, but of course your app is welcome to do whatever it wants. To demonstrate this I'm going to bring up Rob Simutis, and he's going to help me demo SoZoomy.

There we go. Alright, so we're going to start with a mode called constant face size. So there's a cinematic effect called a dolly zoom, where the zoom is changed to keep a target object a consistent size, while the object is moving, which causes the background to shift. So if I have Rob start walking forward, tap on his face, then if there's anything in the background, which it's mostly dark so you'll have trouble seeing this, but yeah, let's have him try that again.

Keep backing up, and you can kind of see -- there you go. Let's see, I'll try again. And we're losing the face. But anyway, so that's the constant face size. There's another aspect of this demo, which I hope you'll recognize, if he turns around and I zoom in a little bit, let's get a good size. And let's go.

[ Music ]

[ Applause ]

So I think people have a lot of fun with that. Let's take a look at the code that's running in this. First off if you notice there's a slider, which I was using to adjust the zoom. And I was actually accounting for that exponential growth in the zoom, and the formula for that is there on the slide.

So what we do is this: you don't want to take, say, your slider value going from 0 to 1 and send it right into the zoom factor, because that will be very sensitive on the wide end of the zoom and less sensitive on the telephoto end. By taking the maximum zoom that you want to achieve and raising it to the power of the current 0-to-1 target, you get that exponential growth over the range, and so you get a linear feel to the actual zoom motion.
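
As a sketch of that mapping in a slider callback -- the 8x cap and the self.device property are assumptions, not the demo's actual values:

```objc
// Sketch of the slider mapping: raise the maximum zoom to the power of the
// 0-1 slider value so the motion feels linear. (UIKit and AVFoundation
// imports assumed.)
- (void)zoomSliderChanged:(UISlider *)slider
{
    CGFloat maxZoom = 8.0;                             // example cap, an assumption
    CGFloat zoomFactor = pow(maxZoom, slider.value);   // maxZoom ^ (0..1)

    NSError *error = nil;
    if ([self.device lockForConfiguration:&error]) {
        self.device.videoZoomFactor = zoomFactor;
        [self.device unlockForConfiguration];
    }
}
```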

And of course, remember to lock the configuration and unlock it when adjusting these values. Now for speed control, such as a jog dial, or maybe you just want a button you hold down to zoom in, you'll typically want to set the target to either the minimum or maximum zoom, and then the user will be interactively controlling the rate.

So in this case, we look to see if we're zooming in, then we go to the maximum zoom, otherwise we go to the minimum zoom. And we pass this to ramp to video zoom factor, and then you specify some rate. And then at any point if the user cancels the zoom, then you cancel the video zoom ramp, and we will ease out of the ramp so that it looks very silky.
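
A sketch of that press-and-hold pattern, again with self.device and the rate of 2 (two doublings per second) as illustrative choices:

```objc
// Sketch of press-and-hold zoom: ramp toward an endpoint while the button is
// down, and ease out when it's released.
- (void)zoomButtonPressed:(BOOL)zoomingIn
{
    CGFloat target = zoomingIn ? self.device.activeFormat.videoMaxZoomFactor : 1.0;
    NSError *error = nil;
    if ([self.device lockForConfiguration:&error]) {
        [self.device rampToVideoZoomFactor:target withRate:2.0];
        [self.device unlockForConfiguration];
    }
}

- (void)zoomButtonReleased
{
    NSError *error = nil;
    if ([self.device lockForConfiguration:&error]) {
        [self.device cancelVideoZoomRamp];   // eases out of the current ramp
        [self.device unlockForConfiguration];
    }
}
```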

In summary, to compare it to the video scale and crop factor, they both apply to still image output, but our new video zoom factor applies to all the image outputs. You can set the zoom factor directly, but the new video zoom factor API on the AV capture device lets you set the zoom rate as well. And this is currently available on the iPhone 5 and iPod Touch 5th generation. And with that I'd like to welcome Rob back up to present machine readable codes.

[ Applause ]

Thank you, Ethan. Hi, I'm Rob Simutis, and I'm also with Core Media engineering. I'm here to talk to you today about machine readable code detection, which is a formal way of talking about barcode scanning. We've introduced this in iOS 7 to do real-time machine readable code detection for one-dimensional and two-dimensional barcodes, up to four at a time, on both the front and back cameras, on all supported iOS 7 hardware that has a camera on it.

You can see this in action today in the seed with the passbook application. In the upper right-hand corner, there's now a scan code button. So when you press that, you get a view to scan in codes. These are PDF417 or QR codes, or Aztec codes that are in the passbook format, and they pull directly into your passbook.

Now beyond just those types, we actually support a number of different types of machine readable codes, or symbologies -- UPC-E often found in products in stores, EAN-8 and 13 commonly found over in Europe, code 39, code 93, and code 128, some other types of one-dimensional codes. In the 2D space we support three types, PDF417 often found on airline passes, QR codes found on buildings and billboards, and corn fields in some cases, and Aztec which you often find on packages that you ship. So we'll demo this in action today, and I'll invite Ethan to come back up. And we have a sample app we call QRchestra. So Ethan has a version of the application, and his view has a bunch of QR codes on it.

And each of these contains a value that is a MIDI note. And I'm just going to have the scanner on mine, if I can hold it steady. [Several Beeps] So each of the QR codes is a MIDI note, and we translate that into a string, and then we run it through a synthesizer, and then out. And the detection's failing 'cause I'm shaking a little. [Several Beeps] But it allows you to have a QRchestra right in front of you. There we go.

[ Beeping ]

Alright, there we go. Thanks, Ethan. So a couple of notes. You'll be able to download this sample code along with our slides and the other demos that we have today. But those QR codes were actually being generated on the fly -- they weren't fixed images. And that's being done with a new Core Image filter that's available in iOS 7, so you can go and check out how to make your own QR codes on the fly.

[ Applause ]

So getting into the programming model, in iOS 6 we introduced the AV capture meta data output class, and this was originally done for face detection data. So we've expanded that, and this is how we get bar codes out. Normally you add it to your capture session, and it has a connection to your capture device, and so this would be your video device. And then your application implements a meta data output objects delegate. And as we detect barcodes, machine readable codes, we will then send an array of those AV meta data machine readable code objects to your delegate.

We'll take a look at this in code. First, alloc/init your meta data output, add it to your session, create your meta data delegate, along with its dispatch queue, set it on the output, and then configure the types of machine readable codes that you're interested in. Here we've set it up to look for Aztec codes.
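
Sketched in code, with session and a hypothetical MyMetadataDelegate class standing in for your own:

```objc
// Sketch of the setup steps above. `session` is your AVCaptureSession, and
// MyMetadataDelegate is a hypothetical class that adopts
// AVCaptureMetadataOutputObjectsDelegate.
AVCaptureMetadataOutput *metadataOutput = [[AVCaptureMetadataOutput alloc] init];
if ([session canAddOutput:metadataOutput]) {
    [session addOutput:metadataOutput];
}

MyMetadataDelegate *delegate = [[MyMetadataDelegate alloc] init];
dispatch_queue_t metadataQueue =
    dispatch_queue_create("com.example.metadata", DISPATCH_QUEUE_SERIAL);
[metadataOutput setMetadataObjectsDelegate:delegate queue:metadataQueue];

// Only opt in to the symbologies you actually need.
metadataOutput.metadataObjectTypes = @[ AVMetadataObjectTypeAztecCode ];
```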

Now in your meta data output delegate, you implement the capture output, didOutputMetadataObjects fromConnection API. And this will get callbacks periodically, and you'll receive an array of AV meta data objects. So because we're listening for machine readable code objects, we'll look and make sure that they're of the class AV meta data machine readable code object. And once we have one of those, we can retain it or use it further at that point.

So let's take a look now at what a machine readable code object contains. It contains a bounds property, the bounding rectangle that it's sitting within, an array of corners which are CG points represented as dictionaries. We'll cover the difference between bounds and corners here in a later slide. It also has a type property, so it indicates whether it's UPC-E or QR, EAN-8, etcetera.

And then finally the most important one is the string value property. This is our best effort attempt to decode the payload into a string that you can make use of. Now I say best effort, because in certain cases with some barcodes, maybe it's damaged beyond recognition, but we can still tell maybe that it's a QR code or Aztec code. This property might return nil, so your code should be prepared to handle this.
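
A hedged example of what that delegate method could look like:

```objc
// Sketch of the delegate callback: filter for machine readable codes and
// guard against a nil string value.
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputMetadataObjects:(NSArray *)metadataObjects
       fromConnection:(AVCaptureConnection *)connection
{
    for (AVMetadataObject *object in metadataObjects) {
        if (![object isKindOfClass:[AVMetadataMachineReadableCodeObject class]]) {
            continue;   // could be a face or another metadata type
        }
        AVMetadataMachineReadableCodeObject *code =
            (AVMetadataMachineReadableCodeObject *)object;
        if (code.stringValue) {
            NSLog(@"Found %@ code: %@", code.type, code.stringValue);
        } else {
            NSLog(@"Found %@ code, but its payload could not be decoded", code.type);
        }
    }
}
```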

Now, bounds versus corners. Let's say you've got your QR code scanning app, and your user is holding the code off center and sort of off axis. The bounds property is going to come back as a CGRect -- a rectangle that is axis-aligned with the image. But the corners are going to come back as the corners of where the barcode was detected, and this allows you to draw a much tighter fitting overlay of where the barcode was found, so you can give a better representation onscreen.

So, performance considerations. These are some things you want to take into account for your application to get the best user experience. To start off, you should really just enable the codes that you're interested in finding. You generally don't want to turn all types of barcodes on, because this takes more CPU, more processing power, and it hurts battery life. So depending on your application's needs, just enable the codes that you're interested in. You can also make use of a new AV capture meta data output rect of interest property. We've introduced this in iOS 7, and I'll talk about this in a little bit.

You also want to pick the right session preset for your use case. Most applications can start off with the 640 by 480 session preset. Depending on the density of the codes, you might want to go higher or lower -- maybe up to 720p, or something below 640 by 480. But you can start there and adjust as your testing dictates. You could also consider using a new auto focus range restriction API that can help you get faster auto focus performance, and Brad's going to cover this a little bit later.

And as Ethan said, you could also make use of the new zoom APIs to get the barcode right, nice and tight in your image. So let's talk about requesting the codes you want. As before in iOS 6, you make use of the AV capture meta data output meta data object types property. And this is an array of string constants, these are defined in AV MetadataObject.h, you can check out that header.

Now, with iOS 6 we had behavior where all metadata types would be turned on by default. This was fine when we just had faces, but now we've introduced a new type for each symbology of machine readable code that we detect, so that's really not the ideal situation. So in iOS 7 you need to explicitly opt in to all desired metadata object types. If your app was built and linked prior to iOS 7, you get the old behavior of face data only, if that device supports it.

So here's an example of what you probably want to avoid. You can make use of the available meta data object types method on the meta data output, which is the array of all the supported types that that device will support, and then you set it on the meta data output. This would enable everything by default.

Most apps should avoid this. Instead, do something like specifying your array of types, and here we're going to look for faces, as well as QR codes, so this is the way we prefer you to do it. Specify them as you need. So here, now I can find faces within my QR code.

Alright, let's talk about limiting your search area. The new property AV capture meta data output rect of interest was introduced, and this is going to help you narrow the search window for where you're scanning for your meta data. This works on faces as well as barcodes. By default, it's the entire size of the image, but you can restrict that to a smaller portion as your application needs.

And as Ethan talked about, and as Brad discussed in last year's WWDC slides, there's some conversion that you need to keep in mind when going between the different coordinate spaces. The metadata output's rect of interest is in the device's coordinate space, which is different from your preview layer's, or your video data output's, coordinate space. So we've provided conversion methods that help you go between those different coordinate spaces, and they make it really easy.

Let's look at this and visualize it. So here I've got an app that's doing a scan of a barcode. You can see the sort of highlighted region up near the top. That's the rect of interest that I would like to have in my application. As far as the video preview layer's concerned, its coordinates are in pixel coordinates, so the upper left is 0,0, and the bottom right is 320, 540.

The meta data output, however, is different. It's actually rotated 90 degrees, and its coordinates are normalized. They're in scalar coordinates from 0 to 1, so 0,0 is the upper left and 1,1 in the bottom right. So if we want our rect of interest to be at 100 pixels down and 320 by 540, in the meta data outputs coordinate space that's .1,0, and the rectangle size is .4 by 1.0. So you can see going back and forth here could be a little complicated, and it gets tricky with mirroring and video gravity, and things of that nature. So we've provided the methods that help you go back and forth.

So when going from the video preview layer to the meta data output, you use the AV capture video preview layer meta data output rect of interest for rect method. To go the opposite way from the meta data output to the video preview layer, use rect for meta data output rect of interest. We'll take a look at this in code very briefly.

So using the previous example that I showed you visually: I have my CGRect for the bounds, I make my rectangle that's 100 pixels down and 150 pixels high, and that's in the preview layer's coordinates. So let's convert to the device's -- the metadata output's -- coordinates using the conversion method. And then finally, we set that rect of interest on our metadata output.
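
In code, roughly, with previewLayer and metadataOutput assumed to be your existing objects:

```objc
// Sketch of the conversion: build the rect in preview layer coordinates,
// convert it, then set it on the metadata output.
CGRect previewRect = CGRectMake(0.0, 100.0, 320.0, 150.0);   // 100 pts down, 150 pts high
CGRect rectOfInterest =
    [previewLayer metadataOutputRectOfInterestForRect:previewRect];
metadataOutput.rectOfInterest = rectOfInterest;
```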

So all of these things are good to keep in mind to give your users the best experience possible when doing machine readable code scanning. And just to drill this home, this is supported on every single platform where iOS 7 is supported. And with that, I'm going to turn it back over to Brad to talk about some additional focus enhancements. Thank you.

[ Applause ]

Are your brains exploding yet? This is a little bit of information overload, but I hope that if you'll just focus with me for a few more minutes, all will become clear. Alright, focus enhancements. Focus is a hard job. We take it for granted because our eyes do such a good job of it.

The eyes are the motor that change shape and can bring different things into focus, and the brain is the engine that -- the pixel processing engine that can determine what's supposed to be sharp, and what you want to focus on. The iPhone does the same thing. It has a physical mechanism that can move the lens so that it can get it into focus, and there's some algorithms that have to run to decide what should be in focus.

But this can be a really hard job, because sometimes you have ambiguous results, such as here where we have a person looking at a clock that's very close to him, and a tree that's far. And both might be equally sharp, and so which one do you choose? Which one should be in focus? Sometimes we need a little help to get that right.

And you can make sure that we get it right by using the new auto focus range restriction modifiers to our auto focus mechanism. Here's what it looks like in code. You tell the AVCaptureDevice to restrict its focus range to just near, or just far, or none, the default, which means search the whole range.

For machine readable code detection, we'd recommend that you use near, unless again you're going to look for barcodes that are in fields. And these auto focus range restrictions are supported on all iOS devices that have cameras, so go to town on it. The next enhancement is smooth auto focus.

I'm going to show you two different videos. The one on the left is what I'll term fast focus -- this is what's shipping today, this is our algorithm for finding focus -- and then what I term smooth focus on the right. You'll notice that it's going to pan from one side to the other, and then pan back, and you'll see different characteristics in the focus.

The one that's fast has a tendency to sometimes pulse, or throb a little bit, because it's running through the whole range really fast, whereas the smooth one runs slower, takes a little more time to do it, but doesn't have the visual pulsing. So here I go, 1, 2, 3. Take a look at both sides. You'll see the one on the left has a tendency to just come in and out a little bit more noticeably. The right one is focusing. It's just doing it more subtly.

[ Silence ]

You'll still see it focus from -- every once in a while you'll see -- you can definitely see that the smooth one is zooming it back into focus, but the left one tends to be much more prominent. Okay, like I said, the fast focus is the one that ships today. We're offering now the smooth focus as an alternative behavior modifier to auto focus, and you do that by telling the AV capture device to set smooth auto focus enabled to yes.

Smooth auto focus just slows the focus scan down, so it's a little less visually intrusive. We recommend that you use this if you're recording a movie. For instance, say you want to perform a tap to focus in the middle of a recording. Well, you don't want to ruin your recording with a big vwoop in the middle. So if you use smooth auto focus it will take a little bit longer to get there, but the focus will be less visually intrusive.

We do recommend that you stick with the fast focus for still image taking, because it's faster, and no one's going to see the pulse in the resulting still image if it got to focus faster. This is supported on iPhone 5. And Slowpoke makes use of this when recording movies, so you can see how it does it. Now let's look at how you program with these modifiers. It's easy, just do that.

So as with all setters on the video device, you have to lock it for configuration first. If you're successful, then you can start checking whether these features are available. Don't set them blindly, you'll throw an exception on platforms where these features are not supported. Auto focus range restriction happens to be supported everywhere, but be safe.

So here we're going to set it to the far range, and then in the next block I'm seeing whether smooth auto focus is supported, and I set it to yes. And then for giggles I threw in an extra one, which is to set the point of interest. This is how you would do a tap to focus at a particular point.

None of these actually start a focus operation, they're just programming the next focus operation. You're telling it I want you to focus far, I want you to focus smooth, I want you to focus at dead center. And then the way that you actually kick off the focus is to set focus mode to auto focus, or continuous focus, and then unlock when you're done.
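
A sketch of that configuration sequence (the far range and center point mirror the spoken example; device is assumed):

```objc
// Sketch of the focus configuration sequence described above.
NSError *error = nil;
if ([device lockForConfiguration:&error]) {
    if (device.isAutoFocusRangeRestrictionSupported) {
        device.autoFocusRangeRestriction = AVCaptureAutoFocusRangeRestrictionFar;
    }
    if (device.isSmoothAutoFocusSupported) {
        device.smoothAutoFocusEnabled = YES;   // less intrusive while recording
    }
    if (device.isFocusPointOfInterestSupported) {
        device.focusPointOfInterest = CGPointMake(0.5, 0.5);   // dead center
    }
    // None of the above kicks off a focus; setting the mode does.
    if ([device isFocusModeSupported:AVCaptureFocusModeContinuousAutoFocus]) {
        device.focusMode = AVCaptureFocusModeContinuousAutoFocus;
    }
    [device unlockForConfiguration];
}
```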

The last bit we're going to talk about today is integration with application audio session. Hopefully you've seen some of the core audio sessions in years past, and the one that they just did I think it was yesterday, where they talked about some improvements to AV audio session. If you're not familiar with it, you have one. If you have an app, you have an AV audio session.

It's a singleton instance that every app gets, whether they want it or not. As soon as you use audio, you have an AV audio session. And it does important things for you like configure the routing, for instance, whether both the microphones and the speakers are active at the same time, or just speakers only.

You can customize your category, for instance so that you include Bluetooth or not, lots of goodness there. And new in iOS 7 they have some great new features for microphone selection that they talked about in yesterday's session, where you can select a specific microphone, top or bottom, back or front, and you can even set polar patterns. For instance, if you want an omnidirectional pickup as opposed to cardioid or sub-cardioid, so great stuff there.

Why do I bring it up here in a video recording session? It's because we have a situation on our hands called dueling audio sessions, and let me describe it to you. Let's say you have an app, and it plays back audio, and it also does some recording with the camera and with the microphone.

Well, you're probably going to be using an AV audio session, because you're playing some audio, and you're definitely going to be using an AV capture session, because you have to if you're going to use it for camera capture. Unbeknownst to you, AV capture session is kind of lousing things up, because it has its own little private AV audio session.

So now what happens when you play and record, you fight. So you get a situation where depending on which one you started first, one is going to interrupt the other. So if you started playback first and then you start recording, the playback stops, or if you do vice versa, then you interrupt your recording. Not good for anyone.

So in iOS 7, we're changing that behavior. There were some good things about the old behavior. By having a private AV audio session, we ensured that the AV capture session is always configured correctly to succeed for recording. And your audio session is not configured automatically by default to record, it's just for playback.

So we needed to do that. But now we're going to help out the interruption problem by using your app's audio session by default. And we tell you that we're doing that by accessing the session property uses application audio session. And again, by default it's yes. If your app is linked before iOS 7 you get the old behavior. We use our own little private audio session, and nothing changes.

And we still will configure your audio session now, not ours, so that it succeeds for recording. And that's the default behavior. You can opt out of that behavior, and there's an accessor for that called automatically configures application audio session. (We're going for length here.) After the capture is finished, we're not going to attempt to clean up our mess, so we're not going to try to preserve any of the state that was in your audio session before we configured it to succeed.

If you want to stash off some state you can do that before beginning your recording. Be careful though. If you opt out of the automatic configuration that we provide, because you are now in control of your AV audio session, you can pick a category that will make recording fail. So just be on guard there.
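
A hedged sketch of those two accessors and the opt-out path; the category choice is just an example of taking manual control:

```objc
// Sketch of the two iOS 7 AVCaptureSession accessors described above.
AVCaptureSession *session = [[AVCaptureSession alloc] init];

// YES by default when linked against iOS 7: capture shares your app's audio session.
session.usesApplicationAudioSession = YES;

// Opt out of automatic configuration only if you will manage the
// AVAudioSession yourself -- and pick a category that allows recording.
session.automaticallyConfiguresApplicationAudioSession = NO;

NSError *error = nil;
[[AVAudioSession sharedInstance]
    setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
```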

I mentioned earlier in the talk that we have this great new way of configuring AV capture devices by setting the active format. This, however, does not apply to audio devices on iOS 7. The audio device exposes no formats -- its formats array is nil -- and that's because we already have a perfectly good mechanism on iOS 7 to configure audio, which is the AVAudioSession. If you want to configure your input, instantiate your AVAudioSession and then go to town -- setting gain, sample rate, whatever you want.

Best practices, we do recommend that you let us use your app audio session so that we don't have the interruption problem, and we do recommend that you let AV capture session modify your AV audio session by default, because it'll succeed. The exceptions to the rule would be if you know that it's going to do something that you don't want to do.

For instance, by default it will always pick the microphone that's pointed the same direction as the camera that you're using. So if you're using the front facing camera, it's going to pick the microphone that's pointed at the person's face. If you for instance want to use the front facing camera, but also record from something in the back, you'll need to use your own AV audio session configuration.

Lastly, a sample code update. Here's our dessert. Last year we talked about VideoSnake, which was a great demo app that incorporates a lot of capture aspects with OpenGL. We've updated it this year and incorporated iOS 7 APIs and best practices, including use of the clock APIs that I haven't talked about today, which let you know which clock we're using -- video or audio -- and which one is the master clock for a session. It also illustrates best practices with respect to integration with OpenGL and writing with asset writers. So please download it, and model your code after it.

If you've been watching the news -- the Apple news, you'll -- you probably were aware that two weeks ago we introduced a new iPod Touch. It's the 16 gigabyte iPod Touch, and what's unusual about it is that it has no back facing camera, it only has a front facing camera. Well, if you have been following Apple's sample code, your app still works with this new device, because it would have picked the right one by default.

So sample code is your friend. Please use it. Please model your code after these samples that we spend a lot of time putting together, because we want to make sure you're using best practices in your apps. In summary, we talked about user consent and transparency, then we talked about a lot of new features: 60 frames per second capture, video zoom, barcodes, app audio session integration, and focus enhancements.

And all of these demos that we showed you today are available, so go download them and take a look at them. Documentation, and of course, related sessions. Some of these already happened, but you can already look at them in your WWDC app, because they've already been posted. They're amazing. Thank you for coming today, and enjoy the rest of the show.

[ Applause ]