WWDC12 • Session 505

Audio Session and Multiroute Audio in iOS

Graphics, Media, and Games • iOS • 55:37

iOS provides a powerful engine for playing, recording, and processing audio in your applications for iPhone, iPad, or iPod touch. Learn how to deliver multiple streams of audio from your application. Gain a thorough understanding of the new capabilities in audio session management.

Speakers: Bill Stewart, Harry Tormey, Torrey Walker

Unlisted on Apple Developer site

Downloads from Apple

HD Video (181.1 MB)

Transcript

This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.

Good afternoon, everyone. Welcome to session 505. My name is Torrey. I'm an engineer on the Core Audio team, and today I'll be talking with you about Audio Session and multiroute audio. Now, for those of you who have written an application before that plays or records audio, some of the information that I cover today about Audio Session may seem familiar to you.

However, there are a number of important changes as well as some new functionality we think you're going to want to take advantage of. And for those of you new to this, welcome. This session will give you a basic overview of using audio effectively on iOS and also will capitalize on some new features that will really make your application sing.

First, I'll give you an idea of how this session is structured. We'll start with kind of a 30,000 foot view of audio session before we drill down and look into the code to see what it takes to configure the audio session. We'll take a moment to look at the anatomy of an audio route and what it takes to handle route change notifications.

Then we'll talk about one of the Objective-C APIs you can use, AV Audio Player. And during this talk, we'll talk about the state of multichannel audio on iOS, as well as how you can select channels for input and output now. Then we'll move on to the all-new MultiRoute category, new to iOS 6.

And then we'll wrap up the talk with a discourse on using I/O units in various types of applications with Audio Session. So, what is Audio Session? Audio Session is a managed audio experience on iOS. Now let's face it, people use their devices everywhere. When they're hiking, at nightclubs, in their cars, in the bathroom.

Some of you are probably using your devices now. And users have expectations as to how the system behaves with respect to the hardware controls, like the volume up and volume down buttons, the ringer switch, the lock button, and how the system behaves when you plug in or unplug headphones.

So there's also a number of different devices that are available on the market now. There are iPads, there are iPhones, iPod touches, and there are numerous software updates. What you want to do is choose the right APIs to communicate what your app wants to do with audio. When I say a managed audio experience, I don't mean that we give you access to every knob, switch, and dial in the system.

On the contrary, this allows you to opt in to a number of specific audio behaviors so you don't have to reinvent the wheel. So, we'll focus on the audio user experience. By that, I mean we'll make your app's sounds behave the way that users expect them to behave, and also consistently with the applications that are already installed on the device.

The way we'll do that is first you'll categorize your application, you'll respond to any interruptions that can occur, and you'll deal with any audio route changes. Okay, let's get started. When I say audio session, I am referring to the AV Audio Session Objective-C API, which is part of the AV Foundation Framework.

This API contains all audio session functionality. And to use audio session, there are five tasks that you'll need to execute. First, you'll set up the session and you'll register for notifications. Second, you'll choose and set a category. Optionally, you can choose and set a mode. Third, you'll make your session active.

Fourth, you'll handle any interruptions. And fifth, you'll deal with any audio route changes. All right, talk, talk, talk. Let's see some code, right? The AVAudioSession instance is a global shared instance. You can retrieve it by using the class method sharedInstance, which will give you a pointer to the AVAudioSession object. Next, you want to register for notifications. There's an example on this slide of how to do that. In this example, we're registering for AVAudioSessionInterruptionNotification, so we'll know when our session is being interrupted. Next, we'll choose and set a category.
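As a rough sketch in Objective-C of those first two steps (handleInterruption: is a hypothetical selector on your own class), it might look something like this:

#import <AVFoundation/AVFoundation.h>

// Step one: grab the shared session instance.
AVAudioSession *session = [AVAudioSession sharedInstance];

// Step two: register for interruption notifications.
[[NSNotificationCenter defaultCenter] addObserver:self
                                         selector:@selector(handleInterruption:)
                                             name:AVAudioSessionInterruptionNotification
                                           object:session];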

Categories have a variety of rules and behaviors associated with them, and you'll choose a category based on how you intend to use audio in your application. This may involve management of things such as whether playback is allowed, whether recording is allowed, whether or not your application's audio is allowed to mix with audio played by other applications, and how the audio behaves when the ringer switch is set to silent.

There's also a brand new category in iOS 6 called MultiRoute. We'll be discussing this in detail later in the session. Back to the code. To set your category, you'll send the message setCategory: to the session object. In this example, we're using play and record, so we'll be setting AVAudioSessionCategoryPlayAndRecord. And ordinarily, we'd deal with errors if there were any.

Next, you can optionally set a mode. A mode is a way for a third-party application to declare the class of that application, so that it's treated like other applications in that class. Here we have voice chat, video recording, measurement for people who want the rawest, unprocessed audio data possible. Default is the behavior you'll get if you don't specify a mode.

And then there's a new mode in iOS 6 called Movie Playback. As you might be able to guess, Movie Playback will optimize the dynamics of video content for applications whose primary purpose is playing movies. And modes and categories work together to communicate how you're going to use audio in your application. Okay, the code for the mode. Ooh, that's a poem.

This time we're going to set the video recording mode. We can do that by sending the message setMode: to the session object with AVAudioSessionModeVideoRecording. We're now ready to make our session active, which we can do by sending the message setActive:YES to the session object. And now we can configure our AVAudioPlayer, our audio queue, our AU Remote I/O, whichever API we're going to be using to play or record audio.
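Taken together, a minimal sketch of the category, mode, and activation calls might look like this (error handling kept to a simple log):

NSError *error = nil;
AVAudioSession *session = [AVAudioSession sharedInstance];

// Choose and set a category.
if (![session setCategory:AVAudioSessionCategoryPlayAndRecord error:&error]) {
    NSLog(@"setCategory failed: %@", error);
}

// Optionally choose and set a mode.
if (![session setMode:AVAudioSessionModeVideoRecording error:&error]) {
    NSLog(@"setMode failed: %@", error);
}

// Make the session active.
if (![session setActive:YES error:&error]) {
    NSLog(@"setActive failed: %@", error);
}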

Okay, let's talk about interruptions for a second. Your audio session may be interrupted by higher priority audio, something like an incoming phone call or a clock alarm. And when this happens, your session will be made inactive. Now, this isn't a polite request from the system saying, "Hey, can I play that cool little video game ringtone for a second if you'll just stop?" No, priority has already been taken away from you, and that's why it's called an interruption.

So you want to be able to deal with this. After the interruption is over, depending on the API that you're using to play or record audio, you may need to re-establish a certain amount of state, and you may also want to reactivate the session if that's appropriate for your type of application.

The way you'll deal with interruptions is by registering for AVAudioSessionInterruptionNotification. If you've dealt with notifications before, this one has a user info dictionary, which has two keys. The first key is the type of interruption that you received. There are two of these: type began and type ended.

And then the second key is AVAudioSessionInterruptionOptionKey. It only applies to type ended. And if you have this option, it will be AVAudioSessionInterruptionOptionShouldResume, which lets you know if it's okay to resume playback or recording of audio. A little more about these notifications.

For AVAudioSessionInterruptionTypeBegan, that means that your audio has already been stopped and you're already inactive. This is an opportunity for you to change your UI, maybe by converting a play button to a stop button, or graying something out to say that you're not playing anymore. And this is a good time to point out that not every type began notification has a corresponding type ended notification. It depends on what actually interrupted your audio.

For example, say the user is using your application and there's an incoming phone call. If the user decides to take that phone call, you will not get a type ended notification when they're done talking. However, if the user decides to let that same call go to voicemail, you will get the type ended notification. This is an opportunity for you to make your session active again, and you can update your user interface.

And if AVAudioSessionInterruptionOptionShouldResume is part of that dictionary, it's okay for you to start playing audio again. There is one more notification that we've added in iOS 6 that I want to let you know about, and it is AVAudioSessionMediaServicesWereResetNotification. Try saying that five times fast.

So, if the media server goes down for any reason, this is the notification that will fire. There's no user info dictionary for this, and any objects that you are using to play or record audio are now invalidated and they are zombies. So we don't want a zombie apocalypse on your device. Please kill your zombies. The proper way to handle this notification is to fully reconfigure your audio.
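Here's a sketch of what handlers for these two notifications might look like, assuming both were registered with NSNotificationCenter and that stopPlayback, startPlayback, and rebuildAudio are hypothetical methods of your own:

- (void)handleInterruption:(NSNotification *)notification
{
    NSDictionary *info = notification.userInfo;
    AVAudioSessionInterruptionType type =
        [info[AVAudioSessionInterruptionTypeKey] unsignedIntegerValue];

    if (type == AVAudioSessionInterruptionTypeBegan) {
        // Audio has already stopped and the session is inactive; just update the UI.
        [self stopPlayback];
    } else if (type == AVAudioSessionInterruptionTypeEnded) {
        AVAudioSessionInterruptionOptions options =
            [info[AVAudioSessionInterruptionOptionKey] unsignedIntegerValue];
        if (options & AVAudioSessionInterruptionOptionShouldResume) {
            [[AVAudioSession sharedInstance] setActive:YES error:nil];
            [self startPlayback];
        }
    }
}

- (void)handleMediaServicesReset:(NSNotification *)notification
{
    // Every audio object you were using is now invalid; tear down and reconfigure from scratch.
    [self rebuildAudio];
}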

Okay, let's talk about route changes. Users expect a behavior that we like to call "last-in wins." What that means is your audio route is completely determined by the last thing to be plugged into or unplugged from the system. For example, if someone is playing audio out of a device and they plug in a set of headphones, the user would expect the audio to be rerouted to the headphones and to continue playing without pause. However, if they were to detach that same set of headphones, they would expect the audio to be rerouted to wherever it was playing before, in this case the built-in speaker, and for the audio to pause in this situation.

If the route changes, you will get AVAudioSessionRouteChangeNotification. The user info dictionary for this notification has two keys. The first is AVAudioSessionRouteChangeReasonKey. This will tell you the reason that the route change occurred. It could be something like a new device coming online, or the category changing.

The second is AVAudioSessionRouteChangePreviousRouteKey, which gives you back the previous route, and that is of type AVAudioSessionRouteDescription. We're actually going to look at the anatomy of that right now. So, querying the route: you can get the route by sending the message currentRoute to the session object. This gives you an AVAudioSessionRouteDescription with detailed information about that route, specifically a collection of inputs and outputs, both of type AVAudioSessionPortDescription. So what's a port? A port is a single input or output on a device.

This port description has four different properties. The first is the port type, something like AVAudioSessionPortHeadphones. The second property is the port name. This is a human-readable and localizable string, such as line out or headphones. Third, there is the UID, which is short for unique identifier. This is a system-assigned string that identifies the port. And finally, there is an array of channels, which are of type AVAudioSessionChannelDescription.

So what does that look like? Well, the channel descriptions are just a description of a single channel on one of those ports. So, to give you an idea again, we've got the route description, which has inputs and outputs, which are ports, and each of those ports has channels.

Now these channels have three properties. The first is the channel name. It's a human-readable string, like headphone left or headphone right. Next, there's the owning port UID. That's the unique identifier from before in the port description. And finally, there's a channel number, which is a one-based index into the array of channels on that port.
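To make that anatomy concrete, here's a small sketch that walks the current route and logs every output port and channel (inputs work the same way):

AVAudioSessionRouteDescription *route = [[AVAudioSession sharedInstance] currentRoute];

for (AVAudioSessionPortDescription *port in route.outputs) {
    NSLog(@"Output port: type=%@ name=%@ UID=%@", port.portType, port.portName, port.UID);
    for (AVAudioSessionChannelDescription *channel in port.channels) {
        NSLog(@"  channel %lu: %@ (owning port %@)",
              (unsigned long)channel.channelNumber,
              channel.channelName,
              channel.owningPortUID);
    }
}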

And that's what you need to know to use audio session. First, you'll set up the session, you'll register for notifications. Second, you'll categorize your application. You can also set a mode. Third, you'll make your session active. Fourth, you'll handle interruptions. And fifth, you'll deal with any route changes.

Now for those of you who have written applications before and used audio session, you may notice that a lot of this sounds different from what you were doing before. The reason why is because there's no more mix and match. The audio session services C API has been deprecated. So you'll now want to move your apps over to using the Objective-C API exclusively, AVAudioSession.

But all the functionality from the C API is there. A few more changes to let you know about. AVAudioSessionDelegate has been deprecated. Remember, we now register for notifications with NSNotificationCenter. Also, some of the properties have been deprecated and renamed so that naming is consistent across the board.

Now let's take a look at one audio playback API we can use, AVAudioPlayer. If you want to get your app up and running in a hurry and just playing an audio file, AVAudioPlayer may be a good choice for you. It plays a number of different audio file formats. It has the transport controls that you would expect: play, pause, seek, and stop. If you want to play more than one audio file at a time, that's easy. Just create multiple AVAudioPlayer objects.

And it supports volume, panning, looping, and rate control, which is the speed at which the audio file is played back. So if we wanted to create one of these in code, we can do that using a file URL. In this example, we've supplied a URL, we're allocating a player, and then we're initializing it with the contents of that URL.
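As a minimal sketch (the file name here is made up), creating and starting a player looks like this:

// Hypothetical file bundled with the app.
NSURL *fileURL = [[NSBundle mainBundle] URLForResource:@"loop" withExtension:@"caf"];

NSError *error = nil;
AVAudioPlayer *player = [[AVAudioPlayer alloc] initWithContentsOfURL:fileURL error:&error];
if (player) {
    player.numberOfLoops = -1;   // loop indefinitely
    player.enableRate = YES;     // allow rate (playback speed) control
    [player prepareToPlay];
    [player play];
} else {
    NSLog(@"Could not create player: %@", error);
}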

And now, new to iOS 6, you can finally play a file from the music library. The way you do this is you obtain a reference to the MPMediaItem object from the device's music library. This is something that you can get from, say, an MPMediaPickerController.
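A sketch of that, where myMediaItem is assumed to come from an MPMediaPickerController selection:

#import <MediaPlayer/MediaPlayer.h>

NSURL *mediaURL = [myMediaItem valueForProperty:MPMediaItemPropertyAssetURL];
if (mediaURL) {
    NSError *error = nil;
    AVAudioPlayer *player = [[AVAudioPlayer alloc] initWithContentsOfURL:mediaURL error:&error];
    [player play];
} else {
    // Protected (DRM) items have no asset URL and can't be played this way.
}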

In this example, we've got a media URL. Our MPMediaItem object is myMediaItem. We'll get the URL for that, and we'll allocate our player and initialize it with that URL. How about interruptions? Before, when I talked about interruptions in Audio Session, I said you can register for interruption notifications, and you should do that. However, if you're using AVAudioPlayer, you can get by with using the delegate methods.

They are completely analogous to the interruption notifications from Audio Session. Here we have audioPlayerBeginInterruption:. Like before, this means playback has already stopped; you're already inactive. This is an opportunity for you to update your UI to say you're not playing. And, once again, you're not guaranteed to get the corresponding end call; it depends on what interrupted you. But there is an end interruption method, audioPlayerEndInterruption:withOptions:, and it carries the AVAudioSessionInterruptionOptionShouldResume option. You update your user interface at this point, and depending on whether or not you have that flag, it's okay to resume playback.
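A sketch of those delegate methods, assuming your class adopts AVAudioPlayerDelegate and has a hypothetical updateUIForStoppedPlayback method:

- (void)audioPlayerBeginInterruption:(AVAudioPlayer *)player
{
    // Playback has already stopped; just reflect that in the UI.
    [self updateUIForStoppedPlayback];
}

- (void)audioPlayerEndInterruption:(AVAudioPlayer *)player withOptions:(NSUInteger)options
{
    if (options & AVAudioSessionInterruptionOptionShouldResume) {
        [player play];
    }
}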

Let's talk about the state of multi-channel audio on iOS. Specifically, when are you going to have access to more than two channels? With USB, since iOS 5, you've been able to access more than two channels if you have a USB audio class-compliant device plugged into the Camera Connection Kit of an iPad.

Beginning with iOS 6, you now have access to more than two channels for USB output as well. Now, with all these channels, you may wonder, "Does that mean that I have to do something special with my application?" Not necessarily, but you should know what the default behavior is.

If you decide to play stereo content, and the currently selected route has more than two channels available, it will play out of the first two channels. The same is true for mono. If you play back a mono file, it will play the same mono data to the first two channels. However, if you're recording, it will record from the first channel only. Now, if these are not the channels you want, you'll want to use something we've added to iOS that we call channel selection.

You can choose the inputs and outputs that you want to use. And the way you'll do that is you'll set an array of AV audio session channel descriptions. Remember when we examined what a route looked like earlier? It started with the input and output ports, and then there were channels associated with each of those ports.

So you'll select an array of those that you want to use and assign those to the object. And I'll show you how to do this in code a little later in the session. You can use this with both AV Audio Player and AV Audio Recorder. And there are congruent channel selection methods for Audio Queue and AU Remote I/O.

It also turns out this is not the only way that you can get multi-channel audio on an iOS device now. I'd like to introduce you to something we call MultiRoute Category. Remember when I talked to you earlier about the behavior last in wins? That means the audio route is completely determined by what was last plugged into the system. Take a look at this diagram.

We've got headphones plugged into the headphone port. And we've got a USB 2-in, 2-out device plugged into the Camera Connection Kit at the bottom of the iPad. From simply looking at this diagram, you don't necessarily know where your audio is going to or from, and that's because you don't know the order they were plugged in.

But you're going to get one or the other, and that's the end of the story. So last-in wins is still the paradigm on iOS, but there is one new exception to it, and that's called the MultiRoute category. When you set the MultiRoute category, that same setup from before now appears as a single MultiRoute device.

So now we have three inputs. That is the iPad built-in microphone and the two USB inputs from the USB audio device. And now we have four outputs. That's headphone left, headphone right, USB output 1, and USB output 2. Essentially from using two very simple stereo devices, we're now in a multi-channel environment.

I'd like to invite my colleague Harry to the stage to give you a demonstration of how using MultiRoute Category can work in your app. Harry? Hello, everyone. I now want to give a demonstration of an application that uses the new MultiRoute Category with AV Audio Session. This application consists of four instances of AV Audio Player, each of which has its own volume slider, select channel button, select track button, play button, and stop button. For the purpose of this demonstration, we're going to play back four preselected audio files through the HDMI outputs and headphone outputs simultaneously.

Before we get to that part of the demo, though, let me show you how you can use the Select Channel button to configure an AV Audio Player instance at runtime to play audio out a selected port. In this example here, we have HDMI ports and headphone ports. So we can output audio through the headphone left, headphone right, HDMI output one, HDMI output two. I'm going to output this file through the left headphone channel.

I'm now going to play back audio through the HDMI outputs and the left and right headphone outputs simultaneously and adjust the volume so that you can differentiate between them. So that's just the HDMI outputs. I'm going to stop that. I'm now going to play. Left headphone. And the right headphone. And as you can see, each of these channels is selectable. And that's it for the MultiRoute demo.

Back over to you, Torrey. Thanks, Harry. Let's take a closer look at what he just showed in this application. So in the MultiRoute demo, before he set MultiRoute category, this was his configuration. He had an iPad with headphones plugged into the top and HDMI was plugged into the bottom. Then he set MultiRoute category.

Now, at least for output, he's looking at a four-channel output device. It has headphone left, headphone right, HDMI 1, and HDMI 2. Next, he set up some files. Two of the files were mono, the one that said left channel, the one that said right channel. And then he also had a stereo music file.

Last, he chose which channels he wanted them to go to. He sent one to headphone left, one to headphone right, and the music came out of HDMI. Now why would you want to use the MultiRoute category? Well, how much control do you need? Maybe you're designing a DJ application, and you want to have a stereo cue mix in your headphones that you can hear independently from the stereo house mix. So while they're all grooving, you can be beat matching before you move the crossfader.

Maybe you're designing a digital audio workstation application where it's very common to manipulate multiple inputs and outputs. Maybe you've got an idea for a multi-channel instrument and you want to play the audio to different outputs or record from different inputs, more than two. And then maybe you've got ideas for audio applications that we haven't even thought of. In fact, we're kind of counting on it.

So how do you do this in code? Back to the AV Audio Player example. We've already retrieved our session, and now we're going to register for notifications. This time we're going to register for AV Audio Session Route Change Notifications. If you're going to use MultiRoute category, it is absolutely critical that you register for this notification. The reason why? Because any route change could potentially invalidate part or all of the channels that you are playing audio to or recording audio from.

So, you'll register for AVAudioSessionRouteChangeNotification. Next, we're going to set the MultiRoute category. We'll send the message setCategory: to the session object with AVAudioSessionCategoryMultiRoute, and we'd deal with errors if there were any. Now we can set our session to active. This will trigger a route change with the reason "category change." Now you want to see what's actually in your route, and you can do that by sending the message currentRoute to the session object.

That'll give you a pointer to the AV audio session route description object. And in this example, I've additionally squirreled away the outputs. Now you're probably wondering what this thing actually looks like if you were to print it out. So, if you can walk through this slide with me, we're going to log what the route description looks like. Top of the hierarchy, inputs and outputs. So now we're first looking at inputs. At index zero, there is iPhone microphone. There is one channel at index zero. It's microphone built in. Channel number one.

Now we move on to outputs. And by the way, this is the type of route description you could expect to see if the user has headphones plugged into the top of the iPad and nothing else. So, output port zero is headphones. The name is headphones. And there are two channels. Headphone left, which is channel one. And headphone right, which is channel two.
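Pulling those setup steps together, a rough sketch might look like this (handleRouteChange: is a hypothetical selector of your own):

AVAudioSession *session = [AVAudioSession sharedInstance];

// Critical for MultiRoute: any route change can invalidate channels you're playing to or recording from.
[[NSNotificationCenter defaultCenter] addObserver:self
                                         selector:@selector(handleRouteChange:)
                                             name:AVAudioSessionRouteChangeNotification
                                           object:session];

NSError *error = nil;
[session setCategory:AVAudioSessionCategoryMultiRoute error:&error];
[session setActive:YES error:&error];   // triggers a route change with reason "category change"

// Squirrel away the outputs of the current route for later channel selection.
NSArray *outputs = session.currentRoute.outputs;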

So, we're going to create our player. We'll allocate an AVAudioPlayer and we'll initialize it with some URL. Then we're going to select the channels we want to use. Since I'm going to play a mono audio file, I'm just going to get one channel.

So: give me the outputs; the object at index zero gives me the first output port. Of that port's channels, I want the object at index zero, so give me the first channel of the first output port, and that's my desired channel. Next, we want to create an array of those desired channels.

Now in this situation, we're just playing a mono file, so it's a single channel. So I'll just say NSArray arrayWithObject: desiredChannel, and we'll squirrel that away in channelDescriptions as an array. Then we can assign channelDescriptions to the channelAssignments of our player. All right, and then we're ready to play audio. Just as simple as that.
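In code, that channel selection might look roughly like this, continuing from the outputs array above (someURL is a placeholder):

NSError *error = nil;
AVAudioPlayer *player = [[AVAudioPlayer alloc] initWithContentsOfURL:someURL error:&error];

// First channel of the first output port -- e.g. headphone left in the route we just logged.
AVAudioSessionPortDescription *firstOutput = outputs[0];
AVAudioSessionChannelDescription *desiredChannel = firstOutput.channels[0];

// A mono file needs only one channel; a stereo file would get an array of two descriptions.
player.channelAssignments = @[ desiredChannel ];

[player play];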

Thanks for your time and attention. I'm now going to turn the stage over to my colleague Bill, who's going to talk with you about using I/O units with Audio Session. Bill. Thank you, everybody. Thank you, Torrey. And thanks, Harry, for a good demo. What I'm going to go through today is to go a little bit lower down into the system and look at I/O units and then how those I/O units can interact with some of the concepts that Torrey has gone through with Audio Session and the route descriptions and so forth. So, I/O units: there are two in the system. The main one is AU Remote I/O. It's an audio unit, and it represents the lowest-level interaction that you can have with audio I/O on the system.

Now, because it's at a low level, it is operating very much at a low latency. And you can, depending on routes, depending on some of the devices that are plugged in, depending on the settings that you have, you can get input to output latency that can be less than 10 milliseconds. So we're looking at something that's very serious, very responsive.

And it does take a little bit more work than just creating an AVAudioPlayer and playing a file. So it's important to understand when you would use these sorts of things, when you want to understand them and know about them. Games will typically want to use something that's lower latency and have some kind of control over it.

Apple's OpenAL implementation does so. Games that typically provide their own engines will also use the Remote I/O as their basis. Music apps like GarageBand, samplers, guitar amp models, drum machines, all this kind of stuff will typically want this type of functionality. And also VoIP apps where, of course, you're communicating and you want that to be very responsive.

So how do audio units work? We've done several sessions in the past about audio units. I'm not going to go through all of the details of them. Typically, you open them, you do some configuration, you initialize them, and then they're ready to be used and you can interact with them while they're actually rendering and doing their work. An IO unit has some particular things that you need to look at and to take care of when you're using it.

So once you've opened the AU Remote I/O, you'll look at the I/O formats. You'll look at both the device-side formats, the port formats in AVAudioSession terminology, as well as the client formats, which are the formats that you want to interact with when you use the audio unit.

So once you've set those formats up, and that's the main configuration that you'll do, then you would initialize the audio unit, and that causes the audio unit to evaluate its state, to allocate buffers and so forth, whatever it needs in order to do its work. You'll also need to establish your data connections and the mechanisms that you use for that, and we'll take a look at that in some detail, and then you start audio I/O.
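As a rough sketch, leaving out the format configuration that we'll look at next, opening, initializing, and starting AU Remote I/O looks something like this:

#import <AudioUnit/AudioUnit.h>

// Find and open the Remote I/O audio unit.
AudioComponentDescription desc = {0};
desc.componentType         = kAudioUnitType_Output;
desc.componentSubType      = kAudioUnitSubType_RemoteIO;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;

AudioComponent comp = AudioComponentFindNext(NULL, &desc);
AudioUnit ioUnit = NULL;
AudioComponentInstanceNew(comp, &ioUnit);

// Input (element 1) is disabled by default; enable it if you also want to record.
UInt32 one = 1;
AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input, 1, &one, sizeof(one));

// ... set client formats and render callbacks here (see below) ...

AudioUnitInitialize(ioUnit);
AudioOutputUnitStart(ioUnit);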

So what does this look like? So there is some complexity here. It's not quite Byzantine, but it's probably getting that way. And the reason that it's this way is because we are trying to give you one place to get audio input and output for one particular period of time. And that means that you can really get the low latency.

So I'm going to try and walk you through what this actually looks like as an audio unit and how you understand it and then use it in your application. Now you can use input and output separately or you can use them together. So this isn't something you've got to do all or nothing. You can use the parts of it you want.

So you see here, audio units have scopes. And the three main ones are global scope, where an audio unit's going to keep state that's global. There's input scope, which is where the audio unit is going to get input from, you know, audio input. And then output scope, which is what the audio unit is going to output.

And elements are members of a scope. They're typically a collection of objects of the same kind. To think about elements, consider a mixer, which could have many different inputs and one single output. So as an audio unit, that mixer would have many input elements and one output element.

So when we're looking at an I/O unit, we're looking at an audio unit that has two elements, on both the input and output scopes. And element zero is the element you interact with when you're doing output, when you're making sound. A good way to keep this in your mind is element zero and output: zero and O. And then element one is the element that you interact with when you want to get audio input from the outside world into your application. And there's a nice way of thinking about that too: one and I.

Now then, the purple boxes here represent what we're going to call the virtual formats, either output or input. You can see what these look like from AVAudioSession's port descriptions, and they will represent whatever audio input and output capabilities are available to your application when you are running. Then the green boxes represent the client format. And the client format is the format of the data that you want to provide to the audio unit, or that you want to get from the audio unit when you're getting input.

And the yellow arrows between these boxes represent audio conversions that can occur. And if you know the audio converter API, that's actually the object that's being used internally. And that means that your client format can be different than the virtual format, either on input or on output, and we'll do the conversion for you.

So how do you interact with this? For setting your client formats, you do the AudioUnitSetProperty call. Audio units are generally configured through a property mechanism, and there are lots of properties; you can have a look in AudioUnitProperties.h for the exhaustive list. When you set the client format for output, you set that on the input scope and element zero, because that's what you're going to give the audio unit to play.

And then when you want to record audio, you'll set the client format on the output scope of element one, because you're going to pull that audio out of the audio unit, and the audio that's coming into your application then will come in through element one, and we'll look at that a little bit more.
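A sketch of setting those client formats, assuming the ioUnit from the earlier sketch and a 44.1 kHz, stereo, non-interleaved float format chosen purely for illustration:

// Hypothetical client format: 44.1 kHz, 2 channels, non-interleaved 32-bit float.
AudioStreamBasicDescription clientFormat = {0};
clientFormat.mSampleRate       = 44100.0;
clientFormat.mFormatID         = kAudioFormatLinearPCM;
clientFormat.mFormatFlags      = kAudioFormatFlagsNativeFloatPacked | kAudioFormatFlagIsNonInterleaved;
clientFormat.mChannelsPerFrame = 2;
clientFormat.mBitsPerChannel   = 32;
clientFormat.mBytesPerFrame    = 4;   // per channel, since non-interleaved
clientFormat.mBytesPerPacket   = 4;
clientFormat.mFramesPerPacket  = 1;

// What you will feed the unit to play: input scope, element 0.
AudioUnitSetProperty(ioUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Input, 0, &clientFormat, sizeof(clientFormat));

// What you will pull out of the unit when recording: output scope, element 1.
AudioUnitSetProperty(ioUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output, 1, &clientFormat, sizeof(clientFormat));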

So what about audio routes and how they interact with the I/O unit? If we remember the demo that Harry showed and what Torrey talked about, we had an output of two channels of HDMI and two channels through the headphone jack. So the AU Remote I/O unit, on the output scope on bus zero, is going to see four channels. And if you look at AVAudioSession's route description, the current outputs will have four channel descriptions. There will be two channels for the headphone output and two channels for the HDMI output.

And you get this format by doing AudioUnitGetProperty, with the same property identifier as when setting it previously, which is the stream format. In this case, you're going to get it from the output scope on element zero. Now, about the purple boxes, just as a side note: you can't set those formats. They're always read-only properties. They're just going to tell you what's available to your audio unit at any given time. And, of course, with route changes, these values can change as devices come and go.
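Reading that device-side format, again assuming the ioUnit from the earlier sketch, is a get-property call:

AudioStreamBasicDescription virtualOutputFormat = {0};
UInt32 size = sizeof(virtualOutputFormat);

// Read-only: the device-side (virtual) format on the output scope of element 0.
AudioUnitGetProperty(ioUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output, 0, &virtualOutputFormat, &size);

// In the headphones-plus-HDMI demo, this would report four channels.
NSLog(@"Virtual output format: %.0f Hz, %u channels",
      virtualOutputFormat.mSampleRate,
      (unsigned int)virtualOutputFormat.mChannelsPerFrame);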

So what does it look like if we pull this apart a little bit? If we start at the bottom, the purple box is what we see on the device side, and it's on the output scope. For the demo we showed, we'll see four channels. And from AVAudioSession's channel descriptions, in the order that they're presented, we can see headphone left and right, then HDMI 1 and 2.

[Transcript missing]

And like I said, they want to make sound. That's why they've run your application. So sound in these categories is going to blow through the ringer switch. It will have no effect on whether audio is played or not. Now there is one thing to understand about AV Audio Session here, and that is the mix with others option.

So we talked about with games how you mix with others, and with a lot of music apps, it is actually quite useful to mix with others because different users will want to use different applications potentially at the same time. Maybe they want to play along with their iPod application, or they want a drum machine going in the background and then switch to a synth and make the synth go.

And if you're not mixing with others, then you have the potential to interrupt each other, and you get into this kind of "no, I want the system; no, I want the system" sort of battle. So you would set mix with others as an option when you set the category. You just set AVAudioSessionCategoryOptionMixWithOthers.
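A sketch of that, using the play and record category as an example with the iOS 6 setCategory:withOptions:error: call:

NSError *error = nil;
[[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord
                                 withOptions:AVAudioSessionCategoryOptionMixWithOthers
                                       error:&error];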

And you can still play audio in the background. It doesn't affect your application's ability to keep playing audio in the background or keep playing audio with the ringer switch set. It's all just about how you cooperate with other applications. Now, you can imagine that there are cases with music apps where they really want to be the primary app. They really want to exert some control over the system. Maybe they're very particular about the sample rate the device is running at, or about the latency characteristics of the device, and they really want to just assert that ownership. And so when you set the category, play or play and record, or even record, for example, you just don't set the mix with others option, if you consider that a characteristic, a requirement, of your app.

And you could, if you wanted to, really give this as an option to your user. But we generally prefer not to make the user deal with this sort of thing; rather, you should understand how your app is going to be used and just take care of things as would be appropriate.

Now, there are a couple of things that you might be interested in when interacting with AV Audio Session at this sort of level. One is the I/O buffer duration. This can go as low as under two milliseconds, or as high as 40 or 80 milliseconds, and this is the size of the I/Os that we're going to do. So if you were to set a three-millisecond I/O at 48 kilohertz, you'll get about 128 samples to process in your I/O unit's rendering cycle.

And if you're a game, you know, 10 milliseconds might be good enough, given all the other latencies involved in the game, or 20, or 5. It's really up to you. You can set this yourself based on what you expect. Now, there is a cost. If you have lower latency I/Os, you are going to be using more power.

So just don't all go and set this to two milliseconds or something because that's what you think is going to be perfect. You really want to be smart about how you set this. And you may also have situations where you want to understand the sample rate and have the sample rate set, and there's an AV Audio Session property for you to do that.
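A sketch of making those requests, with values chosen only as examples:

AVAudioSession *session = [AVAudioSession sharedInstance];
NSError *error = nil;

// These are preferences, not guarantees.
[session setPreferredSampleRate:48000.0 error:&error];
[session setPreferredIOBufferDuration:0.005 error:&error];   // ~5 ms; smaller buffers cost more power

[session setActive:YES error:&error];

// Check what the system actually gave you.
NSLog(@"sample rate %.0f Hz, IO buffer duration %f s",
      session.sampleRate, session.IOBufferDuration);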

Now, these are requests; that's why the name is preferred rather than required. If your category is not allowing you to mix with others, when you go active you'll generally get these settings. You'll often get them in the case where you're mixable too; you just won't be as guaranteed to get them.

So that's the basic AU Remote I/O and AV Audio Session. I wanted to spend a little bit of time looking at the voice processing audio unit. This is an extension of AU Remote I/O: what it does is add voice processing to the I/O mechanisms of AU Remote I/O. And the voice processing can be done on the output as well as on the input.

The types of voice processing that it will do is echo cancellation, noise suppression, gain correction to keep the input within a certain range of gain. And it's really designed for high quality chat and it's optimized with different settings for different routes like speaker or headphones and also for different use cases. Now while we're mainly focusing on iOS in this talk, there is a version of this audio unit that is available from Lion and you can use this as well there. And it works much the same way.

So why would you use this? Why wouldn't you just do your own echo canceller? Right? It's not that hard. Maybe it is that hard. So what I thought I'd do is to actually go through and show you what's actually going on when you're in a conference and why we would ask you, or recommend, that you use this audio unit. The two wavy lines there represent the network. And we're having a conversation between a far-end talker, somebody somewhere else, and a near-end talker, which is, for the purpose of my talk, me.

So on my phone, what's going to come in from the network is the dark blue line that's to, uh, yeah, that one. Um, and that's what I'm going to hear out of my speaker. Now, what I could hear out of my speaker is also other applications making sounds.

It could be UI sounds, I could be playing a game, I could be listening to music, and so the downlink audio, the dark blue line coming from the network, is going to be mixed with whatever other sounds are on the system, and that's what's going to come out of my speaker. And we try to represent this by the light blue line because something else is there besides what went out the network.

Now, when I'm talking, the microphone is going to hear everything I can hear, particularly if I'm on a speakerphone. So the microphone is going to hear what is coming out of my device as well as what I'm saying to it. So there are two sources of audio: my voice, and what my device is playing. We represent this coming into the microphone and mixing together as the nice little magenta line that goes into the voice processing block.

Now, of course, the role of the voice processing block is to take out the signal that was played by the device, basically subtract it, so you end up with what I was saying, not what my device was playing. And that's the red line coming out of the voice processing block.

And we're trying to use color here to key you in to the fact that the goal of the voice processing block is to extract my speech, and not to include the blue line that came down from the network. Because if we don't do this, then the far-end talker is going to hear themselves talking with a delay. And that's why it's called an echo. It sounds just like an echo.

Now, the reason it's important to understand all of this is that other applications may be mixing in and playing sound, so in your application you don't actually know what other sound is being played. So you can't deal with it yourself. You just can't. You don't know the audio that's there, and so you can't distinguish between what was played and what was spoken in the environment.

Okay, so to use the voice processing AU, it's very similar to AU Remote I/O. You just look for the voice processing AU and you make it and you set up the callbacks and everything the same way. Now if you're doing a VoIP app, voice is primary and you want to make sure that that's the best experience you can. So first of all, for setting up AV Audio session with this, you'll need to set up the category to do play and record because you want input and output simultaneously.
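In terms of creation, the only difference from the Remote I/O sketch earlier is the component subtype:

AudioComponentDescription desc = {0};
desc.componentType         = kAudioUnitType_Output;
desc.componentSubType      = kAudioUnitSubType_VoiceProcessingIO;   // instead of kAudioUnitSubType_RemoteIO
desc.componentManufacturer = kAudioUnitManufacturer_Apple;

AudioComponent comp = AudioComponentFindNext(NULL, &desc);
AudioUnit vpioUnit = NULL;
AudioComponentInstanceNew(comp, &vpioUnit);
// Formats, callbacks, initialization, and start are set up just as with AU Remote I/O.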

And this is where the modes come in. You would set the voice chat mode, and that allows us to optimize the routes that we may choose. It tells us, oh, there are some things we should do in the audio system to make sure that this is going to work well.

And by using this mode, you get the same behavior on your device that a phone call or FaceTime gets when you're running a VoIP app. So it makes for a consistent user experience as well.

Some of the things you might want to do with a VoIP app is to set the preferred sample rate. Typically, you would use this to match the sample rate of the VoIP that you're doing, say 24 kilohertz or 16 kilohertz. It just means that if the sample rate is set to match, there's less processing done on the signal, and so it's going to keep the fidelity of the voice much better.

You also might want to adjust the preferred IO duration. And it's interesting here, because if you're putting data on the network, you're doing so in packets. And typically those packets represent a span of time. Let's say 20 milliseconds. That's a common one. So it's really no use to do IOs at 2 milliseconds, because you're running the system at a very high rate of overhead.

And all you're going to do is accumulate that data until you've got 20 milliseconds, and then you can encode it and put it on the network. So why not just do a 20 millisecond IO to begin with? Well, that's probably a good idea. So you can set preferred IO duration, which is set in seconds, and set it to 20 milliseconds. And you can play with this. Maybe you want to do a little more tweaking on the output side. You want a little less latency on the output. It really depends on your app.
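Putting that VoIP-oriented session setup together, a sketch might look like this (the 16 kHz and 20 ms values just mirror the examples in the talk):

AVAudioSession *session = [AVAudioSession sharedInstance];
NSError *error = nil;

[session setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
[session setMode:AVAudioSessionModeVoiceChat error:&error];

// Match the sample rate and packet duration of the VoIP codec in use (example values).
[session setPreferredSampleRate:16000.0 error:&error];
[session setPreferredIOBufferDuration:0.02 error:&error];   // 20 ms, to line up with 20 ms packets

[session setActive:YES error:&error];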

But just don't assume that the lowest I/O setting is really necessary to get what you want. There are some other properties you might like to look at in AV Audio Session. There's a property to override the route, so you get speaker output even though headphones are plugged in, for example. And that's the classic speakerphone. There are some properties in AudioUnitProperties.h, that great tome of audio wisdom, and there are some specifically for the voice I/O unit that you should look at. There are some comments there; I won't go into them today.
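For reference, a minimal sketch of the route override he mentions, using overrideOutputAudioPort:error: from iOS 6:

// Classic speakerphone: force output to the built-in speaker.
NSError *error = nil;
[[AVAudioSession sharedInstance] overrideOutputAudioPort:AVAudioSessionPortOverrideSpeaker
                                                   error:&error];
// Pass AVAudioSessionPortOverrideNone to return to the normal route.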

Now, just to finish up, I thought I'd very briefly discuss a different scenario where voice processing is used, and that is the GameKit, Game Chat case. And this is interesting because it shows some of the flexibility of the system. The difference here is that with a VoIP app, the conversation is primary. You want everything to be set based on what your conversation is going to be, to make that an optimal experience.

With the game, that's not the case. The chat is secondary. The game, the fidelity of the game, the audio environment of the game, that's really primary. And so, typically, when a game's run, the audio will be at 48 or 44.1, something that's going to provide you with that rich audio experience for the game. But the chat's not at that sample rate. The chat's probably at 16 kilohertz or something much lower.

And so the voice processing I/O will take these into account. It will resample the audio coming in. We will optimize the system around this kind of configuration. And it's not that it's going to sound like crap, because it isn't. It's going to sound pretty good. But it's a different way of configuring the system because really the game is more important than the chat. The chat is ancillary; it's a part of the game. It needs to sound good, and it will sound good, but it's not primary. So this is some of the flexibility that the audio system has.

So that's the end of my talk. Thank you for listening. AV Audio Session was covered, we hope in enough detail for you to go and use multiple channels and multiple routes, and we look forward to seeing what you do with that. I/O units, low-level stuff: we hope you have a good understanding of that. This is where you can get more information. Eric is the media technologies evangelist. There are a lot of downloads of sample code and documentation on developer.apple.com. And then there are the developer forums. Thank you very much.