
WWDC10 • Session 412

Audio Development for iPhone OS, Part 1

Graphics & Media • iOS • 53:49

iPhone OS provides a powerful engine for playing, recording, and processing audio in your applications for iPhone, iPad or iPod touch. Get introduced to the high level classes in AV Foundation used to play or record sounds. Gain a thorough understanding of audio session management, and learn the recommended practices for handling background audio, choosing an audio session category, and playing multiple sounds simultaneously.

Speakers: Bill Stewart, Allan Schaffer

Unlisted on Apple Developer site

Downloads from Apple

HD Video (148.3 MB)

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Good morning. My name is William Stewart and I manage the Core Audio group and we are doing three sessions this morning in this room on audio, primarily on iPhone. We're also covering some general audio topics that are relevant for the desktop as well as iPhone. The set of frameworks and services that we provide are quite extensive and cover a number of different areas and we'll be going through some of these in the talk. The Media Player API is just a general kind of remote access for the iPod application and the media library on the iPod.

And the iPod itself uses the same sets of APIs and frameworks that we're discussing with you today. So it's all, you know, what Apple uses itself to implement its features is, of course, the same things that you get to use as developers. OpenAL is an industry standard, I guess you'd call it, for doing games and we support that on the platform for game audio.

AV Foundation made its debut last year with some very simple AV audio player and recorder objects and this year it's become quite an extensive framework with a collection of video classes. And there's a whole series of sessions on the new classes in AV Foundation but there is some specific audio functionality in this framework as well and Allan will be going through that in a moment. And all of these technologies are really built on a collection of services that are rendered through Audio Toolbox and that's really the primary services that my team delivers to the platform.

That includes services for reading and writing audio files, for converting data formats, for using audio units that provide processing, mixing, all kinds of things -- basically the collection of tools that you need in order to do audio. So that's a very general introduction. There will be some more detailed overviews through the sessions this morning.

The first of these sessions is this one, and one of the things that will be covered here is how your application integrates with the rest of iPhone OS. So it's primarily the audio session and managing the resources of the platform and how you can best use them. What we thought we'd do in the second session is take a step back -- rather than sort of focusing specifically on APIs, we thought we'd rather take it from a different angle.

And so what we're doing in that session is looking at: what are the fundamentals of audio? If you're talking about digital audio, not just on our platform but on any platform, what does that mean? What is Linear PCM? What is AAC? How are these things different? How do we express them in our APIs? But really more like: what are the fundamental features of these things? And our APIs are very fundamentally shaped by what audio looks like as a data format and by some of the constraints in terms of time and resolution and everything that we deal with in dealing with this media format.

And then the last session, Audio Development for iPhone OS, is really taking a more detailed look at how to use audio units. So what do audio units look like for your application? How do you interact with them? And we're also taking a little bit of a forward-looking stance at that and looking at some of the general ways that you can deal with more complicated processing demands with AU graphs and so forth. So that's enough of me talking. I'll get Allan to come up and he'll begin his discussion on AV Foundation. Thank you.

[ Applause ]

[Allan Schaffer]

Great. So thank you, Bill and good morning everyone. The AV Foundation has a number of high-level classes that you can use for audio playback and recording. And this is where we're going to spend most of the time in this session. So I'm going to be talking about the audio player, which lets you play back audio from a file or from data in memory. I'll talk about the recorder, which lets you record audio, capture from the microphone and record that to a file.

And then I'll talk about the audio session, which, as Bill said is going to let you manage the audio behavior of your application on the device. There's a fourth class that actually I'm not going to be covering in this session but it's worth taking a look at as well.

We covered it yesterday in the AV Foundation sessions for video and that is the new AVPlayer. Now, a lot of the new functionality in AV Foundation has been geared towards a lot of very expressive video functionality. And so that class is part of all of that but it can also be used for audio, to either play audio from a local file or to stream it over a network. So I'm just going to jump straight in. Let's talk about the AVAudioPlayer. And this is really a very simple class for you to use to play a sound.

It supports a variety of different file formats -- the ones that are supported by the Audio File Services API. So that's things like caf files, m4as, mp3s, and so on. And the class provides a number of just basic playback operations -- to play a sound, stop, pause, move the playhead around, and so on.

And with this object, if you want to play multiple sounds simultaneously, you can do that as well -- what you do is just create multiple instances of the object and have each one controlling the different sounds. There's also a number of properties that I'll go through in just a moment but things like volume control, you can enable metering, you can have the sound be looping as it's played back, and a few new features in iOS 4 -- the object now supports stereo panning from left to right.

It also synchronizes playback if you have multiple instances playing simultaneously. So right away, I'll just jump into some API here. To get started with this class and instantiate an object, you just call initWithContentsOfURL:. The URL needs to be a local file that's in the sandbox for your application, or you can create it from an NSData.

And one quick side note before I go on: All of the code snippets that are on the slides in this talk are available on the attendee website so you don't need to worry about writing them down. You can just go and download them right after the talk or go ahead and do it right now.

Now, there's a number of properties on the AVAudioPlayer that let you control playback. And you can either set these up before you begin playing or with a number of them, actually, you can just change them as the player is playing a sound, so you can change the volume -- here I'm setting it to 100% of the current output volume. You can change the panning -- here I have it set all the way to the left. If I set it to 1.0, then it would be all the way to the right.

The number of loops here is something also you can control. So 0 means no loops, -1 means loop indefinitely, or you can have a specific number of times that the audio will loop back after it's played through once. You can have direct control over the playhead as well with the current time property.

And so if you want to implement something where you are scrubbing around in an audio file, you would just be changing the value of this property. Or if you want to reset the playhead to the beginning, you set it to zero. And there's a delegate as well that we use for notifications.

And then other properties, some that I mentioned -- so to be able to enable metering of the playback, you can find out the duration of the audio file that you're playing, the number of channels, and the state of the player as it's playing your sounds. Now, the playback controls here are really simple, there's just these four controls that you'll be using all the time with this object.

PrepareToPlay is actually probably one of the more important ones. This is going to get the player ready to play a sound for you with absolutely minimal lag. So what this will do is allocate the buffers that the player is going to use internally and prime those buffers with your data. And that way, when you go to invoke the play method, it can happen nearly instantaneously.

So the play method just starts playing your sound now. And so -- if later on you pause or stop the sound, the play method will resume from the point that it leaves off. And that's maybe an important note to make: if you had expected, after you stopped playing a sound, for the playhead to go back to the beginning, that actually isn't the behavior.

The playhead stays where it was, just like a tape deck. And so if you want to go back to the beginning, you would reset the current time to zero. Pause is going to pause the playback but with Pause, the player stays ready to resume again; the queues and the buffers are still going to be allocated and ready to go. And that's the difference between Pause and Stop. With Stop, the queues are disposed of and the buffers are disposed of. So if you want to restart playing after you have stopped, after you've invoked the stop method, you would probably call prepareToPlay and then later on call play.

Another element of the player class are some delegate methods. So these will be invoked when certain events happen. And probably the one that's very important for you to implement is when the player is finished, and that's called audioPlayerDidFinishPlaying:successfully:. And in that method you might clean up, you might change the state of your interface, and take care of other things just to indicate to the user, "Okay, the sound is no longer playing." Then there's a number of other delegate methods that you can implement, if there's a decoding error in the file that you played back or if an interruption began or ended -- interruptions are things like a phone call came in, for example, and I'll talk about interruptions a lot more towards the end of the talk.

So let's just put this together and look at playing a sound. So here in this method, I'm being passed in the URL of a local file; I create my player object, passing in a pointer to that URL; I set up a delegate for any of the notifications that I may want to have fired later; prepare the player for playback and hit Play. So all of this is really -- this is just to show that this is a very simple class but a great way for you to get started with audio playback in your app.
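The slide code isn't included in this transcript, but a minimal sketch of what that method might look like, assuming a pre-ARC class that adopts AVAudioPlayerDelegate and keeps the player alive in a hypothetical backgroundPlayer property:

```objc
#import <AVFoundation/AVFoundation.h>

- (void)playSoundAtURL:(NSURL *)soundURL
{
    NSError *error = nil;
    AVAudioPlayer *player =
        [[AVAudioPlayer alloc] initWithContentsOfURL:soundURL error:&error];
    if (!player) {
        NSLog(@"Could not create player: %@", error);
        return;
    }

    player.volume = 1.0;        // 100% of the current output volume
    player.pan = -1.0;          // all the way to the left; 1.0 would be all the way right
    player.numberOfLoops = 0;   // play once; -1 would loop indefinitely
    player.delegate = self;     // so we hear about completion and interruptions

    [player prepareToPlay];     // allocate and prime the buffers
    [player play];

    self.backgroundPlayer = player;   // hypothetical property that retains the player
    [player release];
}

// Delegate callback invoked when the sound finishes playing.
- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player successfully:(BOOL)flag
{
    // Update the UI to show that the sound is no longer playing.
}
```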

So that's the player. Let me now go just about as quickly through the recorder. Another simple class -- this lets you record audio to a file and its behavior actually is that it will either record until you stop it by calling its stop method or you can set it up to record for a specific duration.

And it supports a variety of different encoding formats, I've listed a bunch here -- AAC, ALAC, Linear PCM, and so on. AAC is interesting though because we have hardware support for the AAC encoder on certain platforms -- the second generation and third generation iPod touch, we have hardware support for it on the iPad, the iPhone 3GS, and of course on the iPhone 4.

Now with the AVAudioRecorder, the API here is really just the mirror image of what you saw with the player, so I won't go through it in quite as much detail. You initialize the recorder with a URL to a local file. One difference, though, is this next parameter: the settings dictionary. I'll cover that on the next slide so I'll come right back to that.

There's a number of recording controls that you can manage, so prepareToRecord, Record, or Record for a particular duration, then Pause and Stop, and sort of your predictable properties that you can get about the state of the object and the state of the recording. Now, about that settings dictionary.

So when you're recording, you need to specify exactly what format you want to record into, what sample rate to use, the number of channels, and then perhaps format specific settings as well, like for Linear PCM you'd specify the bit depth, the endian-ness. For certain encoded formats, you might specify the quality or the bit rate and so on. So let's take a look at that.

Now this looks like a lot of code but really, it's actually very simple. All I'm doing here is setting up a dictionary containing key value pairs for each of those encoding settings. So on the left here, I'm setting my format to AAC, the rate to 44100, the number of channels to two, the bit rate to 128K, and the audio quality to the maximum.

So all of that is just being packed into arrays, and I create a dictionary from those. And then that dictionary is what I pass when I initialize the audio recorder. And so now it's all ready to go. I can call its methods to prepareToRecord and start recording or record for a particular amount of time.
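The settings-dictionary code isn't reproduced in the transcript either; a minimal sketch of the setup just described, assuming recordURL points at a writable file in the app sandbox and the surrounding class adopts AVAudioRecorderDelegate:

```objc
#import <AVFoundation/AVFoundation.h>
#import <AudioToolbox/AudioToolbox.h>   // for kAudioFormatMPEG4AAC

- (void)startRecordingToURL:(NSURL *)recordURL
{
    NSDictionary *settings = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithInt:kAudioFormatMPEG4AAC], AVFormatIDKey,
        [NSNumber numberWithFloat:44100.0],            AVSampleRateKey,
        [NSNumber numberWithInt:2],                    AVNumberOfChannelsKey,
        [NSNumber numberWithInt:128000],               AVEncoderBitRateKey,
        [NSNumber numberWithInt:AVAudioQualityMax],    AVEncoderAudioQualityKey,
        nil];

    NSError *error = nil;
    AVAudioRecorder *recorder =
        [[AVAudioRecorder alloc] initWithURL:recordURL settings:settings error:&error];
    recorder.delegate = self;

    [recorder prepareToRecord];   // create the file and get ready to record
    [recorder record];            // or: [recorder recordForDuration:10.0];

    self.recorder = recorder;     // hypothetical property that retains the recorder
    [recorder release];
}
```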

Okay, so I know that was pretty quick through those objects but they're really just very simple. The thing is, though, that they're very feature-rich so they do suit a lot of the basic needs of audio developers who just want to play and record some sounds. If you saw the Quest demo on Monday or yesterday, for example, we're using this API to play all of the game background soundtrack. And so, you know, it's just a very, very capable API for doing that.

And really, it's recommended as the starting point for you in most cases. You know, it's the good starting point unless your requirements go into more complicated uses for audio. So if you need access to audio samples, for example for processing, then you might use the Audio Unit API, which Murray will be covering in the third session.

If you need to do spatial 3D positioning of audio sources in a game, then OpenAL is perfect for that. And so you would probably choose that API in that instance. And if you need to do network streaming, then you might use the new AV Player or you might go into the audio file streaming services API. Okay, where I want to go next, though, is into audio session management.

And this is really an important topic for developers to all get absolutely right in their applications and that's why we put so much focus on it. And really, I'm going to dedicate the rest of this talk to this topic. So the idea here is that this is how you can manage the behavior of the sounds in your application and make them behave according to both the expectations of the user for the kind of application that you're writing and to be consistent with either built-in applications or other applications of the same type of app as yours.

What you're going to do with this API is to categorize your application into one of six possible categories and then your app's audio is going to follow the behaviors that are defined for that category. Then this is also the API that will let you manage some of the shared resources on the device and do things like mix with background audio.

It's how you'll interact with interruptions if they occur, say again, if a phone call comes in and it's how you can handle changes in the routing if the user were to, say, plug in or unplug a headset. Now there's actually two APIs here that are relevant to what we're going to be talking about: a high-level API and a low-level API. So the high-level API is the AVAudioSession class. It's an Objective-C class, part of the AV Foundation framework.

And really, it wraps up all of the most commonly used functionality that you need to manage with the audio session. Then there's also a lower-level API called Audio Session Services and that's part of the Audio Toolbox. And that's really all of the implementation that we expose to you. C-based, a lower level and has a bit more functionality.

But what's interesting to note is that it is possible and quite okay for you to mix and match between the high-level and low-level APIs. In fact, what's quite typical is you might set up your audio session using the high-level API and then maybe just drop into the low-level API to set some overrides or other things that aren't exposed at the high level.

So there's five basic tasks that we're going to go through for the remainder of the session here to talk about with AVAudioSession: We're going to set up the session and configure its delegate; we will choose -- very carefully choose and set -- an audio session category; we'll go active and we'll talk about that and the things that are new there in relation to iOS 4; then I'll talk about how we handle interruptions and handle route changes. So first setting up the session -- very easy.

The audio session instance is just a singleton object for your application. So you just retrieve a handle to that. You'll set up a delegate for any notifications that might occur, like an interruption. And this is the place where you might request certain preferred hardware settings. Just for example, in this case I'm requesting a sample rate of 44100.
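A minimal sketch of that setup, assuming the surrounding class adopts AVAudioSessionDelegate:

```objc
#import <AVFoundation/AVFoundation.h>

- (void)setUpAudioSession
{
    AVAudioSession *session = [AVAudioSession sharedInstance];  // the singleton
    session.delegate = self;   // for interruption notifications

    NSError *error = nil;
    [session setPreferredHardwareSampleRate:44100.0 error:&error];  // a request, not a guarantee
}
```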

But now the second part is really -- this is the most important part of using this API. And it's not a line of code, it's a choice that you have to make for your application. You will choose a category based on the role that audio plays in your app and the kind of application that you're writing. And there's six possible categories: playback; record; play and record; audio processing; and then two more -- ambient and solo ambient. So let me show you how these differ. So I have the categories listed here on the left.

Their intended usage will be a column that we build out as I talk about this and then a number of the behaviors are listed in the table as well. And it's things like whether each category obeys the ringer switch, meaning, is your audio silenced if your user flips the ringer switch to silent? Is your audio silenced if the user locks their screen? Does this category allow your sounds to be mixed with others? Does it use input or does it use output? And is it allowed to play in the background if your application is transitioned into the background? So let's have a look.

Okay, first with playback. This is probably the most straightforward of all of the categories because it's intended for such an exact purpose. This is the category that you should choose if your application is an audio player or a video player. So if the primary purpose of your application is to output media, then you would choose the playback category.

And you can see that the behaviors here, in a way, are very similar to what you see with the iPod application on the device. This does not -- applications that are using this category are not affected by the state of the ringer switch; they can continue to play if the user locks the screen.

Now MixWithOthers is not enabled by default -- by saying optional here, I mean it is possible for you to optionally turn it on. This is an output category. And the last one: with this category, it will allow your application to continue to play audio through a background transition if you have set up the audio key for UIBackgroundModes in your Info.plist.

Now Record is very similar. But of course, its intended usage is for audio recorders, applications that are doing voice capture, that sort of thing. The behaviors, though, are right across the board mostly the same. Obviously this is an input category that uses input instead of output. But it will also survive through a transition in the background if you have set that key. Play and record.

Well, this one is essentially combining those first two. So this is intended for applications that are doing voice over IP or voice chat types of apps. You can optionally mix with others, just like with the playback category. But with this one, this is using input and output simultaneously or enabling you to do that, so both of those are shown. And if you choose this category also, your application can go into the background. One important difference with this category, though, compared to what we've seen, say, with the playback category is the default audio route.

So with the playback category, by default your output will go through the speaker. With the play and record category, your output will go on a phone to the receiver, which is the speaker you hold up to your ear when you're on the phone. Then the audio processing category. All right, so this one is used for offline conversion of audio file formats, offline processing of audio data, and you can see that actually many of the behaviors are similar but it's not using either input or output for any sounds.

All it's doing is doing this processing in memory. Now, one thing about -- a special note about this, about how I say that it's allowed in the background. So yes, if you set up your application to use this category, the processing can continue in the background. But unlike the previous three, just setting this category alone does not enable your application to transition into the background and keep running; you would have to use one of the other ways of going into the background. So for example, maybe you would just be asking for extra time to do processing and that's how you would transition into the background.

Now, these next two are very similar to each other, so I'm just going to put them up simultaneously so you can see the difference: ambient and solo ambient. But the purpose of these two categories is very different from the previous four that we talked about. The top four are really intended for applications whose main purpose is very audio-centric, right? A playback app.

You know, an audio player, a voice recorder, a VoIP app or something doing audio conversion, so very audio-centric kind of purpose of the way that application uses audio. These other two are really intended for much broader-purpose kinds of apps. So games and productivity apps or utility apps would probably choose these latter two categories, and the reason is that the behaviors that they enforce are consistent with the behaviors that users expect for that kind of application. So what you can see is, well, these both obey the ringer switch.

That means the user is playing your game or using your to-do list app or whatever your app happens to be that's using this category. If they're doing that and they hit the ringer switch, well, the audio will be silenced, which is exactly what they expect. And again, it's because the purpose -- the usage of audio -- is not critical to the purpose of the application. It's perfectly expected that you can play a game with the sound turned off, or many games at least.

It's expected that you can use your productivity apps like Mail or Safari and so on and have the sound turned off in those as well. They will both obey the ScreenLock as well, that means that audio playback will stop if the user locks his screen. I'm going to jump over. They both are output categories and neither of these enable your application to transition and continue to play audio in the background. But the difference between them actually is this MixWithOthers parameter.

And so I'm going to be talking about that in a little more detail, really, just coming up next. But it has to do with whether your application needs access to a hardware codec or not. Ambient, with MixWithOthers, is the one you would use for applications that don't require access to a hardware codec -- they don't need to use it. Excuse me.

With solo ambient, that's the category you would choose if your, you know, game or productivity app does require access to a hardware codec for decoding. Okay, so here is where we are setting category. So we've gone through the table now, we've made our choice for the specific kind of application or applications that you guys are writing. And in this case, we're choosing the ambient category. We go on to the next part here and set our session to be active.

So to do that, all we do is call setActive: and tell it yes, we are now active. And once we're active, now we can play sounds or record sounds if we have chosen the record category and go on to set up our audio APIs, handle interruptions, and so on. Like we have now asserted that we want to make use of the audio functionality on the device.
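In code, choosing the category and going active is just a couple of calls; a minimal sketch using the ambient category chosen above:

```objc
- (void)activateAudioSession
{
    NSError *error = nil;
    AVAudioSession *session = [AVAudioSession sharedInstance];

    [session setCategory:AVAudioSessionCategoryAmbient error:&error];
    [session setActive:YES error:&error];

    // From here on we can play sounds (or record, had we chosen a
    // recording category), set up our audio APIs, and handle interruptions.
}
```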

But let me go back now and talk about going active. So it would be typical for most applications to just go active when they start up and stay that way for the remainder of the app. But there's a few classes of applications that should not do that and it's because of the interaction that they have with other audio that might be playing in the background.

So with a voice recorder application, for example, or a VoIP app or a Turn-by-Turn app -- well, all of those should have different behaviors related to music that might be playing on the iPod when they start up or there may be an email notification sound and so on that might happen. So with those kinds of applications, you want to be a little more clever about when you go active. And the story is that you should only go active in those kinds of applications while you're actually doing those things.

So in a recorder app, you would only go active once you actually start to record, not just when the application starts up. And you would go inactive as soon as you're done recording. On a VoIP app, you would only go active while you're on the call and then go inactive when you're done.

And in a Turn-by-Turn app, well, there's some specific behavior that we would want there, which is that let's say the user had been playing music in the background when they ran your app and now it's time for you to announce the next turn, "turn left." What we want to happen there is for the iPod music to be ducked -- for it to lower its volume -- you make your announcement and then you go inactive to bring the iPod music back up again. The same would be true if it was audio playing from a third-party application in the background as well. So let me just show you in specific how you might go about that. I want to focus on the VoIP app and the Turn-by-Turn app.

So again, as I said, with the VoIP app, you would go active when it's time to start the call -- and this is going to interrupt sounds that might be playing in the background -- and then go inactive when the call is over. And what we can do, there's actually a new method in iOS 4.0 called setActive:withFlags:. And what that can do is, if you set that flag to notify others on deactivation, then that other -- the background process that was playing audio -- can be notified when the call is over.

And it will be told, "Ah, okay, the call is done. You can go and resume your audio now." Now, with Turn-by-Turn navigation types of apps, as I said, there's a couple of things you want to do here so that the other audio gets ducked while you make your turn announcement and then comes back.

And this is in three steps here. So this is the first step, the setup. The first thing that we're going to do, of course, is just choose the right category to put our application into. So the main purpose of a Turn-by-Turn app would be to announce these instructions. So we would need that to happen regardless of the state of the ringer switch, the screen locking and so on. So it will choose the playback category.

Now next, though, we're going to set an override on that category and this is the part where I said was optional before. We're going to set this override to enable our category -- override our category -- to MixWithOthers. So "others" being, say, the background music coming off of another app or maybe from the iPod.

And then third in this step is that we're going to enable other mixable audio -- or say that the other mixable audio should duck when we go active. And so that's what's going to lower its volume when we go active and then when we go inactive, it will come back. So here are those two parts. So when it's time now for us to make the Turn-by-Turn announcement, we'll go active, we have some audio player ready to go with the sound of that announcement, and we play that.
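Putting those pieces together, here is a sketch of what that Turn-by-Turn flow might look like, mixing the high-level and low-level APIs as discussed; announcementPlayer is a hypothetical AVAudioPlayer holding the prepared announcement sound:

```objc
#import <AVFoundation/AVFoundation.h>
#import <AudioToolbox/AudioToolbox.h>

// 1. Setup: playback category, overridden to mix with -- and duck -- other audio.
- (void)configureNavigationAudioSession
{
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback
                                           error:NULL];

    UInt32 mixWithOthers = 1;
    AudioSessionSetProperty(kAudioSessionProperty_OverrideCategoryMixWithOthers,
                            sizeof(mixWithOthers), &mixWithOthers);

    UInt32 duckOthers = 1;
    AudioSessionSetProperty(kAudioSessionProperty_OtherMixableAudioShouldDuck,
                            sizeof(duckOthers), &duckOthers);
}

// 2. Time to announce a turn: go active (the other audio ducks) and play.
- (void)announceNextTurn
{
    [[AVAudioSession sharedInstance] setActive:YES error:NULL];
    [self.announcementPlayer play];
}

// 3. When the announcement finishes, go inactive and notify others so the
//    ducked audio comes back up to full volume.
- (void)audioPlayerDidFinishPlaying:(AVAudioPlayer *)player successfully:(BOOL)flag
{
    [[AVAudioSession sharedInstance]
              setActive:NO
              withFlags:AVAudioSessionSetActiveFlags_NotifyOthersOnDeactivation
                  error:NULL];
}
```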

And then whenever that is done, when it's finished making the announcement -- for example, if we were using the AVAudioPlayer, we could do that through the delegate -- that's when we'll go inactive again. But now I've been talking about mixing with background audio so let me go into a little more detail about that. And there's really a convergence of a few different topics here.

So your application might be playing a variety of sounds, it might be taking advantage of a hardware codec, it may be using a software codec, or just playing straight through with the mixer. Another app might be running in the background and also playing sounds. So it might be the iPod application is running or it could be a third-party application now that's running in the background.

And so we have to sort of arbitrate: Well, what's the user going to hear? And it's going to depend on things that are going on in both of the applications. It will depend on what category both of the applications have set. And related to that, it will depend on whether either one of those have enabled MixingWithOthers if they happen to choose the playback or play and record category. So all of this ends up defining something that I call "mixable" or "nonmixable" as a state of your application.

If your application -- now by default, the only mixable kind of application is one that chooses the ambient category but if you choose the playback or play and record and override it to MixWithOthers, then those become mixable. Otherwise, everything but ambient would be nonmixable. So let me show you just in a picture what's going to happen here. So let's say that this is your app, the foreground app, and you're playing an AAC file or, say, you might be playing an mp3 file.

But just for the sake of discussion, I'm going to use AAC -- but just bear in mind that the same things would apply if that were the case. Now, if you had put your application into a category that is nonmixable, then you will be able to take advantage of the hardware codec to playback that compressed track. It will go into the mixer and out to the playback hardware.

Now, if you had chosen a mixable category, then what will happen is that actually your sound is going to be decoded in software. So we have mp3 and AAC and a number of other software decoders and those will just run on the CPU to decode your audio, have that go in through the mixer and out to the playback hardware. But okay, so this is the basics and you guys are probably already familiar with this part but now what happens if there's a background application? The thing is, it's the exact same story.

So a background application -- let's say that it's playing a music soundtrack of its own, it has an mp3 file. If it chooses a nonmixable category, then it will be able to take advantage of the hardware codec and its sounds will play through the mixer, they will be mixed with your music soundtrack, and out to the playback hardware.

The same thing again. If they choose a mixable category -- if this is the case -- and you're mixable, too, then both of these will be decoded in software and play out through the playback hardware. But there's one case that you need to be thinking about: What if you've chosen a nonmixable category for your application, meaning you're asking to use the hardware codec and there's something else that's going to be running the background, maybe when your application started, there was already something there, what's going to happen with it? Well, the result will depend on the category that the background app chose.

So if they chose a mixable category, then they'll get a software codec and both of these are going to be mixed together. So even though you have chosen a nonmixable category for your app, since they have decided to choose a mixable category, essentially they're playing nice. They're saying, "Okay, I can pretty much mix with anything." And so both of these sounds will be heard; they'll be mixed in the CPU and sent out to the playback hardware. But if they've chosen a nonmixable category also, then the sounds from the background app are going to be silenced when your application goes active. Those are the different cases now for mixing with background audio.

Now, what's interesting, though, is that there's actually a way for you to detect in advance what might happen and therefore for you to decide on maybe a different category, depending on whether something is already there. And you may have seen this in some apps that are doing something like this: they'll say, "Hey, do you want to play the game sounds, like, do you want the game music soundtrack to play in the background or do you want the iPod or a third-party app soundtrack to play in the background?" And it depends, the user either says yes or no. If they say yes, well, then a couple of things happen.

If they say yes, this is a game, so we would usually either choose the ambient or solo ambient categories. So in this case, they say yes, it means, "Okay, then that means they want my game music and my game music is an mp3 file or an AAC file, so it's best to use the hardware codec. So I want the hardware codec." So I'll use solo ambient and I'll play my game soundtrack. And if the user says no, well, then I'm not going to use the hardware codec.

Maybe all I'm going to do is play sort of the incidental sounds in my game, you know, the bullet sounds or just momentary sounds. But maybe those are something that would be okay to decode in software or they may be Linear PCM, WAV, or AIFF files that can just be mixed directly.

So in that case, I'll choose ambient, saying, "Hey, I'm fine to mix with others and there's something else playing in the background. So I won't play my soundtrack, I'll just play my incidental sounds." So all of this is fine. This is a perfectly good way -- logic -- to use for defining your app but one part that just might not be necessary is to leave this choice as something that the user has to figure out when they first start the application up. Instead, you can just detect this programmatically. So there's an audio session property called OtherAudioIsPlaying.

And it will come back, you know, 1 or 0. And you can use this to decide whether or not you play your game music soundtrack and really, you'll use this to decide what category to use, whether or not to enable MixingWithOthers. And so I show you here just how you can get at this information. So AudioSessionGetProperty, I pass it that token above and I get back the result.
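A minimal sketch of that check, using it to decide between solo ambient (play our own soundtrack through the hardware codec) and ambient (mix with whatever is already playing):

```objc
#import <AVFoundation/AVFoundation.h>
#import <AudioToolbox/AudioToolbox.h>

- (void)chooseCategoryForGameAudio
{
    UInt32 otherAudioIsPlaying = 0;
    UInt32 size = sizeof(otherAudioIsPlaying);
    AudioSessionGetProperty(kAudioSessionProperty_OtherAudioIsPlaying,
                            &size, &otherAudioIsPlaying);

    NSString *category = otherAudioIsPlaying ? AVAudioSessionCategoryAmbient
                                             : AVAudioSessionCategorySoloAmbient;
    [[AVAudioSession sharedInstance] setCategory:category error:NULL];
}
```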

Now one change that's important to note in iOS 4 is the behavior where your application may be suspended. So prior to iOS 4, it would be very typical to just put this in the beginning of your application. Maybe application did finish launching, you would check this, you'd have a value that was valid for the entire run of your application.

But now something that might happen with, say, a game is that the user might come into your game, start it up, and then realize, "Oh, you know what? I want to listen to my first person shooter soundtrack instead of the game sounds." And they suspend your app, they go over to the iPod, they start up their playlist, and then they come back into your game. So you don't want to have already made your decision about this, you want to recheck it every time your application comes back from being suspended. So check again in applicationDidBecomeActive.

Okay. One more thing. We've been talking a lot about the behavior around mixing and how to detect that when your session goes active. We talked about the behavior with, like, recorders and VoIP apps and Turn-by-Turn apps, when they go active. There's one more thing about going active -- a tip, or a change that occurred -- that I want to just bring to your attention. It's a behavior change with the MPMoviePlayerController. So some of you guys might be using this to play back video; very simple class for that.

But its behavior has changed in relation to your audio session category. So prior to iPhone OS 3.2, the movie player had its own session -- playback was its category. And so that would interrupt your session potentially, it might silence other audio because playback by default is a nonmixable category. And so this could have a number of effects.

Well, now in iPhone OS 3.2 and above -- so iPad and then all the devices that support iOS 4 -- the movie player controller now uses your audio session; it just inherits whatever setting you made. And so this is actually nice. It's something now where the movie player's behavior is now made consistent with your app. But of course, you have to be just aware of this change in case you were relying on the movie player to silence other sounds or do something that was kind of a side effect just of you playing the video.

Now, if you want to go with the default behavior -- sorry, the new behavior -- then you do nothing, it's just that the behavior's changed. But if you want to revert back to the way it was prior to iPhone OS 3.2, there's a property on the movie player object, useApplicationAudioSession; you set that to false, and then the movie player will go back to choosing its own category. But all right.
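That revert is a single property set; a minimal sketch, assuming an existing MPMoviePlayerController named moviePlayer:

```objc
#import <MediaPlayer/MediaPlayer.h>

// Revert to the pre-3.2 behavior: the movie player sets up its own session again.
moviePlayer.useApplicationAudioSession = NO;
```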

So folks, we have gone through three of the five basic tasks. We've done setting up the session and its delegate, we talked about making the right choices as far as your category, and then going active and all of the effects that going active can have. So two more things to talk about and that's interruptions and route changes. So let's go into interruptions. The thing to understand here is that your application's audio might be interrupted at any time.

So it could be interrupted by a phone call, a clock alarm, if you're running in the background, it could be interrupted by a foreground application. So what will happen if you are interrupted is that your session is just made inactive and whatever you were doing with audio is stopped -- if you were playing, it's no longer playing; if you were recording, then that is stopped.

And it just -- this just happens to you -- it's not a request that this is about to happen, it's just done. So what you can do -- there's certain steps, though, that you can take in reaction to that. Of course, you might update your user interface to reflect that this has happened.

But more than anything else, you're interested in what happens after the interruption has ended. So if the user has declined the phone call and comes back into your app, you're interested in getting your audio restarted again. And there's a number of other cases as well where, you know, you just need to be -- all that you're really wanting to do is get back up and running after the interruption. So there's a few different things that you need to do.

So the first is, let's say that we've set up our audio session and we've implemented these two delegate methods: beginInterruption, and then this one is new in iOS 4, it's endInterruptionWithFlags:. The old delegate method is still available as well, just endInterruption. But this lets you get a little bit more fine-grained control. I'll get to it in a second.

So when the interruption begins, that means the phone call has come in. You know, as I say, the playback has stopped, you are already inactive and so what you really should just do is change the state of your user interface to reflect that. If you're a game, you would probably go to your pause screen.

If you're an audio player, well, you would change your playback icon from whatever it was to something to say, "Okay, restart again." Let's say that the user declines the phone call and is now back in your application. Well, now the interruption has ended and there's this flag that can be passed into you that will tell you, based on various characteristics of the interruption whether or not your session should resume, okay? Whether you should start playback, restart playback, or restart recording.

And assuming that that's the case, meaning if you're passed the flag -- AVAudioSessionInterruptionFlags_ShouldResume -- then now you can just fire everything back up. So you will set your session as active again, you can update your user interface, and resume playback. So that actually, though, is just the general case. There's actually a few instances, depending on if you might be using one of the lower-level audio APIs from the Audio Toolbox or using OpenAL, that you need to take care of.
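A minimal sketch of those two delegate methods for the general case; the player, showPausedUI, and showPlayingUI names are hypothetical stand-ins for your own playback object and UI updates:

```objc
- (void)beginInterruption
{
    // Playback has already stopped and the session is already inactive;
    // just reflect that in the interface.
    [self showPausedUI];
}

- (void)endInterruptionWithFlags:(NSUInteger)flags
{
    if (flags & AVAudioSessionInterruptionFlags_ShouldResume) {
        [[AVAudioSession sharedInstance] setActive:YES error:NULL];
        [self showPlayingUI];
        [self.player play];   // resumes from where it left off
    }
}
```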

So let's start with OpenAL. Now OpenAL has this concept of a context that is analogous to the position of the listener in the OpenAL world. And the context does not survive through an interruption, does not stay current through an interruption. So what you need to actually do is, when the interruption begins, if you are using OpenAL, then at that point you will want to invalidate the context. So you do that by calling alcMakeContextCurrent and just passing it nil -- that invalidates the context.

Then when the interruption is over, you can end that interruption and pass it -- well, excuse me -- you implement the endInterruptionWithFlags: delegate method, you check to see if the flag says that you should resume, and if so, then you can set yourself as active and now is when you make your OpenAL context current again.
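For the OpenAL case, the same delegate pair might look like this, assuming the context is kept in an openALContext instance variable:

```objc
#import <OpenAL/alc.h>

- (void)beginInterruption
{
    alcMakeContextCurrent(NULL);   // invalidate the current context
}

- (void)endInterruptionWithFlags:(NSUInteger)flags
{
    if (flags & AVAudioSessionInterruptionFlags_ShouldResume) {
        [[AVAudioSession sharedInstance] setActive:YES error:NULL];
        alcMakeContextCurrent(openALContext);   // make our context current again
    }
}
```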

Another case where you have to take a few extra steps is if you're using the Audio Queue API and an interruption occurs. So if you're using this API, well, then first of all, when the interruption happens you probably want to save your playback or recording position for later, just in case your application actually gets quit or suspended.

But now the decision about how you restart will depend on whether your application is using a hardware codec or a software codec for whatever it's doing -- its playback or recording. If you're using a hardware codec, then that attachment cannot survive through the interruption. So you will dispose of the currently playing audio queue when the interruption begins and then when the interruption is over, you will create a new queue and start it again whenever you're ready. Now if you're using a software codec, then it's simpler, there's no need to dispose and restart -- excuse me -- dispose and recreate the queue; all you have to do is restart it, for example here with AudioQueueStart.
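For the software-codec case, a sketch of that restart, assuming the queue is kept in a playbackQueue instance variable of type AudioQueueRef (the hardware-codec case would instead dispose of the queue in beginInterruption and recreate it afterwards, as the Q&A mentioned next walks through):

```objc
#import <AudioToolbox/AudioToolbox.h>

- (void)endInterruptionWithFlags:(NSUInteger)flags
{
    if (flags & AVAudioSessionInterruptionFlags_ShouldResume) {
        [[AVAudioSession sharedInstance] setActive:YES error:NULL];
        AudioQueueStart(playbackQueue, NULL);   // just restart the same queue
    }
}
```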

This actually, I just -- one side note that I put up here. This can actually get a little intricate though. So we wrote a technical Q&A that has some snippets of code that you can take a look at and it's up there, number 1558, up on the developer website. And the last topic here is routing.

So the behavior that users expect in terms of the audio system routing is that whatever gesture they have made most recently is taken as an expression of their intent for where the audio should be routed or saying it way more simply, "whatever happened last, wins." So if the user plugs in the headset, then we take that as an expression of their intent -- that they want the audio to now be routed out through the headset. Or if they were using the microphone, they plugged in a headset with a microphone, then they want the audio input to be taken from the headset microphone.

And now usually along with that, there's some behavior as far as whether the audio should continue or not. When they plug in a headset, we want the audio to continue to go -- if it was output -- to just keep playing without pausing at all. But when they unplug the headset, okay, that's also taken as an intentional gesture from the user to change the routing on the device. We'll route back to wherever it was previously, probably the speaker, and in that case generally we want audio playback to pause so that when they unplug their headset, it doesn't just start blaring out through the speaker at them before they have a chance to take care of it.

But okay, so these are the behaviors that users expect and so there are ways for you to respond to route changes that will let you implement this in your app. So really there's just three topics I'm going to mention here: so it's possible for you to query the current route; it's possible for you to listen for changes and then in response to those changes, you might either keep playing or stop playing as I just said; and it's also possible in limited cases for you to redirect where the output is currently going.

So first of all, just getting the current route. So this is just another case where you can call AudioSessionGetProperty and pass it this token that I have at the top -- the audio route property. So this is going to give you back a CFStringRef, just with the name of the current route. And so you can see here that I'm outputting that to the log. But it will just tell you, "Okay, the current route is the speaker, the headphone, the receiver," and so on.
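A minimal sketch of that query, logging the current route:

```objc
#import <AudioToolbox/AudioToolbox.h>

- (void)logCurrentAudioRoute
{
    CFStringRef route = NULL;
    UInt32 size = sizeof(route);
    if (AudioSessionGetProperty(kAudioSessionProperty_AudioRoute,
                                &size, &route) == noErr) {
        NSLog(@"Current audio route: %@", (NSString *)route);
    }
}
```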

Now, more importantly, though, is that you probably actually would rather be listening for changes to the route than to just know where it is now and output that to the log because you want to react to those changes accordingly in your app. And what you do here is -- now this is down in that lower level API, as I had mentioned earlier -- you're going to set up a C-based callback using AudioSessionAddPropertyListener. And basically you will be registering for notifications of a route change.

And you pass it this token, kAudioSessionProperty_AudioRouteChange. And then your callback is going to be told the reason why the route changed, like the user unplugged something or plugged something in, and you'll also be informed what the route had been previously. And of course, through what I'd just shown you, you can also find out what the route is now. So here's the code, just to get this started.

We're setting up a property listener here. AudioSessionAddPropertyListener with the audio route change, we set up our C function and you can optionally pass it some data as well. Now what that C function is going to look like, this is what you would have written. The final parameter to this function is this void* with some data.

And that data can be cast to a CFDictionaryRef. And inside that dictionary is where you're going to find the information about what the old route was and the reason why it changed. So that's all I'm doing here, getting those values out. And then you can act accordingly in your app.
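A sketch of the listener and its registration, pulling the reason and the old route out of that dictionary (the function names here are made up for the example):

```objc
#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>

static void MyAudioRouteChangeListener(void *inClientData,
                                       AudioSessionPropertyID inID,
                                       UInt32 inDataSize,
                                       const void *inData)
{
    if (inID != kAudioSessionProperty_AudioRouteChange) return;

    CFDictionaryRef routeChange = (CFDictionaryRef)inData;

    CFNumberRef reason = (CFNumberRef)CFDictionaryGetValue(routeChange,
        CFSTR(kAudioSession_AudioRouteChangeKey_Reason));
    CFStringRef oldRoute = (CFStringRef)CFDictionaryGetValue(routeChange,
        CFSTR(kAudioSession_AudioRouteChangeKey_OldRoute));

    NSLog(@"Route changed from %@ (reason %@)", (NSString *)oldRoute, (NSNumber *)reason);
    // e.g. pause playback when the reason is
    // kAudioSessionRouteChangeReason_OldDeviceUnavailable (headset unplugged).
}

static void RegisterForRouteChanges(void)
{
    AudioSessionAddPropertyListener(kAudioSessionProperty_AudioRouteChange,
                                    MyAudioRouteChangeListener, NULL);
}
```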

Now, the third one that I mentioned is that in certain limited cases, it is also possible for you to redirect the output. Now we actually limit this in most cases because for most of the instances where the output should be rerouted, we want to leave that up to the user.

If the user has plugged in or unplugged the headset, then we take that as an intentional gesture from the user to change the route. And we really don't allow for third-party apps to interfere with that. But there's this one case where it may be necessary for an application to change the route itself and that's if you're using the category play and record.

Now you might remember from earlier in the talk how I mentioned that play and record by default will output to the receiver, that speaker you hold up to your ear when you're on the phone. Well, let's say that you're writing a VoIP app and so you're going to be implementing something where normally that's where the output would go but you also want the option of rerouting out to the main speaker for like a speakerphone mode.

So that's really what this allows you to do: You can set a property to override the audio route and in this case, if you were in the play and record category and you were already outputting the receiver, you could redirect that out to the main speaker. But okay, folks, so actually that takes us through our five topics on the agenda here.
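A minimal sketch of that override for a speakerphone-style toggle, using the low-level property:

```objc
#import <AudioToolbox/AudioToolbox.h>

- (void)switchOutputToSpeaker
{
    UInt32 routeOverride = kAudioSessionOverrideAudioRoute_Speaker;
    AudioSessionSetProperty(kAudioSessionProperty_OverrideAudioRoute,
                            sizeof(routeOverride), &routeOverride);
    // Passing kAudioSessionOverrideAudioRoute_None goes back to the
    // default play-and-record route, the receiver.
}
```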

As I said, we set up the session, we chose a category -- and chose it very wisely for the role that audio plays in your application. We made the session active and saw all the effects that that might have with relation to background audio and mixing and so on. Then I talked a little bit about handling interruptions and route changes here in the end.

I just wanted to mention a couple of things -- as Bill had said, there are some sessions coming up: the next session is going to be about the fundamentals of digital audio, and it'll be right here in this room. And the third session, in this room as well, is going to go deep into the use of audio units, which are awesome for writing applications that need to do more intense audio processing.

Here is actually my contact information: I'm Allan Schaffer, so my email address -- or you can contact Eryk Vershen, who is our media technologies evangelist, if you want to get in touch with us after the show. And a couple more notes: The Audio Session Programming Guide has a lot of great information that goes into a little more detail than what I just spoke about, so be sure to check that out and really make the right category choice for your app. And then finally, we have the Apple Developer Forums. So if you have questions about audio and want to talk about it just among yourselves, among other developers and along with us, check out the dev forums. So thank you very much.

[ Applause ]