Media • 1:05:40
Core Audio provides a powerful engine for playing and recording audio in your iPhone application. Learn how to play sounds and alerts, record audio from the built-in microphone and play sound files of arbitrary length. Understand the best practices to minimize latency and conserve power. Learn about the audio codecs and formats available for iPhone and understand the capabilities for playing multiple sounds simultaneously.
Speaker: Bill Stewart
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
So welcome to this session on audio development for iPhone. My name's William Stewart and I work in the Core Audio group. In this session, we're going to be going through all of the various services for doing audio on iPhone. And I thought what we would do is to go through a set of tasks that we think are representative of the kinds of things that you would want to do. So we're going to look at doing system sounds, we're going to look at playing back and recording files, we're going to look at audio in games, and we're going to look at doing audio units if you want to do some more low-level processing. And we're also going to look at some of the general behavior of audio. There was a question in the last session in games about managing audio as calls come in and so forth, so we're going to have a look at that. And I'd just like to say a personal note: it's very interesting to see all of the games that are coming out, seeing the use of the audio system and the way it's being used with MooCow Music and the games yesterday. So it's very exciting and I'm really glad that you're here, and we're looking for some great things. So the APIs are really what we want to cover here today: what you're going to learn in the session and what APIs to use for what purposes.
and also to have a better understanding of the types of experiences that your user is going to have with your game in the larger context of the phone and the audio system that it is using. So the technology frameworks that you have to look at are primarily the Audio Toolbox framework, and this has the primary APIs for most of the functionality. The Audio Unit framework provides some headers that you use if you're doing the lower-level stuff. And then there's also the OpenAL framework, of course, which we'll get into later in the session. So the first step is system sounds. System sounds are basically small sounds, UI sounds: mail send, keyboard taps, SMS alerts, these kinds of things. If you've got a very simple game, this could be all that you need, if you're just making kind of tick sounds, stuff like that. It's a very lightweight API with very little burden on you. There's no individual control for volumes or for panning, so it's very much a play-and-forget sort of API. And let's have a look at what that looks like. It's in the Audio Toolbox framework, AudioServices.h.
And the way the API works is that you provide a CFURL to the file, and the sounds must be short: they should be less than 30 seconds. Even if you're only over five or ten seconds, you might think about compressing the sound with IMA or something similar, because it will make a smaller memory footprint when we load the sound into memory. And then when you're finished playing the sound, if you're only going to play it sporadically, you can dispose of the sounds as you go. If you're going to play the sound multiple times, you just keep it around.
So playing the sound then is just a simple matter of calling one of the two APIs that you see here. There's play system sound, and this plays the sound with normal usage. It'll obey the ringer switch on the phone, and it will play back at the volume that the phone is set to at any particular point in time. It's commonly the API that you'll use. If you need to do some kind of alert, let's say you have a notification and you want to do some kind of an alarm or attract the user's attention in some way, then you can use the play alert sound variant of this API call.
And this will also vibrate the phone if the user has vibrate set on. Of course, there's no vibration capability on iPod Touch, so it will make a small buzzing noise from the iPod Touch speaker instead. And because it has these kinds of extra actions associated with it, you don't really want to be bothering the user all the time, so use the API appropriately.
And then with the API, because it is a play-and-forget API, if you want to know when we're finished playing the sound for you, you can add a completion proc. This completion callback will fire when the sound is finished playing; you can decide to dispose of the sound at that point, or you could loop it if you wanted to. And then if you want to explicitly vibrate the phone, there's a sound ID called vibrate, and you just play this ID as if it were a sound file and that will vibrate the phone.
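Putting the system sound calls together, here is a minimal sketch against the AudioServices API described above. The file path is illustrative, and error handling is reduced to a single check; this assumes a short sound file bundled with the app.

```c
#include <AudioToolbox/AudioToolbox.h>
#include <CoreFoundation/CoreFoundation.h>

// Completion proc: fires when the sound finishes playing.
// Dispose here if the sound is only played sporadically.
static void MySoundDone(SystemSoundID soundID, void *clientData) {
    AudioServicesDisposeSystemSoundID(soundID);
}

void PlayTapSound(void) {
    // Hypothetical path; in practice this comes from your app bundle.
    CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                       CFSTR("/path/to/tap.caf"), kCFURLPOSIXPathStyle, false);
    SystemSoundID soundID;
    if (AudioServicesCreateSystemSoundID(url, &soundID) == kAudioServicesNoError) {
        // Fire-and-forget playback; obeys the ringer switch and system volume.
        AudioServicesAddSystemSoundCompletion(soundID, NULL, NULL,
                                              MySoundDone, NULL);
        AudioServicesPlaySystemSound(soundID);
        // Variants:
        //   AudioServicesPlayAlertSound(soundID);  vibrates too, if enabled
        //   AudioServicesPlaySystemSound(kSystemSoundID_Vibrate);  vibrate only
    }
    CFRelease(url);
}
```

Keeping the SystemSoundID around and replaying it is the right pattern for frequently used sounds; the completion-proc disposal shown here suits sounds played only occasionally.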
Okay, so that's system sounds. And what I want to do now is to go through the general sets of APIs that you can use for playing and recording audio files. And there's really two things to look at here. There's the API that is going to read or write audio files, and then there's the API that's going to be used to render those files or to get the audio for recording. So let's just look very briefly at the file services that we provide. There's two different file objects. There's an audio file object and an audio file stream object. Audio files are used for reading and writing audio data. Audio file stream is just a data object and it can only be used for reading and I'll go into some more details about that shortly.
Now both of the APIs are capable of dealing with several different types of file formats. And in Core Audio we talk very distinctly about file formats and data formats. A file format is a description of the container of a file itself, or it could be a network stream or whatever, but it's a description of a container. A data format is the data that's in that file, the specific audio data. And the two are not the same in most cases. So when we talk about file formats, we're talking about specifications for files. MPEG-4 is a file format, and you'll see .m4a and .mp4 as common extensions. An .m4a file can contain AAC data or Apple Lossless data. MP3 is both a file format, commonly called .mp3, which is actually MPEG-1 Layer 3, and a data format that can go into that. And then ADTS is a bitstream that's part of MPEG-2, where AAC was first specified, and this is a similar type of bitstream to MP3 in that it can be used on a network. These are really a case where the file and the data that's in it are kind of the same type of thing: .mp3 or .aac.
A CAF file, a Core Audio Format file, was a file format we introduced in Tiger, and this is a format that can contain any data: Apple Lossless, AAC, MP3, linear PCM, IMA, any sort of audio data you like. And then you've got AIFF and WAV files. So these files can be read or written on the iPhone.
So with the Audio File API, of course, you've got to pair the right data format to the right file format. And one of the things about the Audio File API is that it's used when the file's data is completely there. It's not appropriate for a network download or for a Shoutcast-style network stream. But because the file is completely there, you can do arbitrary seeking. You can pull arbitrary data or write arbitrary data to the file, so that can be things like overviews or regions or marker chunks, that kind of thing. And then audio file can read or write audio data to or from the file.
It has a fairly straightforward usage. There are two calls that create an audio file object: one where you're going to create a file on disk, or overwrite an existing file, and to do the create call you just provide the data format and the file format for the new file. And then there's opening an existing file, which typically you'll do for reading.
Then you call AudioFileOpenURL, and you provide, if possible, a hint to the call to tell us what type of file it is that you're wanting us to open. Sometimes the extension doesn't match the file type, and some files are very difficult to determine because they could be one of two or three different things, so if you know, giving us a hint is very useful. And then once you have an audio file object, as with other Core Audio APIs, we have a property semantic where you can get and set properties of files. And then to get the data from the file or to write the data, it's just a straightforward AudioFileReadPackets: read from this particular audio packet for this many packets into the buffer you've provided. And writing is the inverse of that, of course.
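As a sketch of that open-then-read-packets flow, here is roughly what it looks like in C. This assumes the modern read-permission constant from the AudioFile headers; the buffer size and packet count are arbitrary illustration values.

```c
#include <AudioToolbox/AudioToolbox.h>

// Open an existing audio file for reading and pull some packets out of it.
OSStatus ReadSomePackets(CFURLRef url) {
    AudioFileID file;
    // kAudioFileM4AType is just a hint; pass 0 if the type is unknown.
    OSStatus err = AudioFileOpenURL(url, kAudioFileReadPermission,
                                    kAudioFileM4AType, &file);
    if (err) return err;

    // Ask the file for its data format, using the property semantic
    // common to Core Audio objects.
    AudioStreamBasicDescription asbd;
    UInt32 size = sizeof(asbd);
    AudioFileGetProperty(file, kAudioFilePropertyDataFormat, &size, &asbd);

    // Read up to 64 packets starting at packet 0 into our buffer.
    char buffer[32 * 1024];
    UInt32 numBytes = sizeof(buffer);
    UInt32 numPackets = 64;
    err = AudioFileReadPackets(file, false, &numBytes, NULL,
                               0, &numPackets, buffer);

    AudioFileClose(file);
    return err;
}
```

Writing is the mirror image: AudioFileCreateWithURL with both the file format and data format, then write packets at increasing packet offsets.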
And then that's fairly straightforward. Now, Audio File Stream is a different type of API in the sense that it's not really dealing with the file itself. It's dealing with bytes, and it's bytes that you're pushing into the file stream object. So the file stream is a parser. It's parsing those bytes, and it's telling you what's in them. So because of this, the data doesn't have to be present completely. It's just a buffer that you're going to push into it.
So you can use this with a file, in which case you'll open the file and read the bytes from it. Or you could do an HTTP download, so while the file is actually downloading, you could be pushing the bytes through and getting the audio metadata from it. Or it could even be a Shoutcast-style network stream where it's not really a file; it's just a stream of bytes and you're kind of jumping in at one point. The way the API works is that it uses a notification mechanism to tell you information about the bytes that you're pushing into it. And we also try to be efficient, with minimal copying and without adding overhead in us doing the work to tell you information about the stream you're pushing into it. And because it's not really dealing with a file but with bytes, the API to create the audio file stream object doesn't take a file; it just takes callbacks, and those are the callbacks we're going to use to call you back and tell you something about the data you've pushed in. And as with audio file, if you know the format of the file, it's very helpful to provide a hint to tell us that.
And then you just push the data in with AudioFileStreamParseBytes. You feed the data into the object, push it in, and then we'll call your listener callbacks. Now, there are two callbacks. There's the callback for properties of the data, so that could be things like: this is an AAC data file, or this has got MP3 data in it; that's the data format that's contained within the file.
And then at some point, we're going to get to the place where we've got enough information to start telling you about the audio packets that are actually in there. And so you'll get a property notification that says we're ready to produce audio data. We're ready to really pass audio packets for you. And then we'll start calling your audio packet callback and we'll tell you where the packets are. And so then you can use that to interface to other objects. And let's have a look at the primary object you'll interface to and that's audio queue.
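A condensed sketch of that two-callback parser shape follows. In real code you would keep the stream object open and keep feeding it chunks as they arrive from the network; here everything is collapsed into one call for illustration.

```c
#include <AudioToolbox/AudioToolbox.h>

// Property callback: fired as the parser learns things about the stream,
// e.g. the data format, or that it is ready to produce packets.
static void MyPropertyProc(void *clientData, AudioFileStreamID stream,
                           AudioFileStreamPropertyID propID, UInt32 *ioFlags) {
    if (propID == kAudioFileStreamProperty_ReadyToProducePackets) {
        // From this point on, the packets callback below starts firing.
    }
}

// Packets callback: hands you parsed audio packets, typically to be
// copied into AudioQueue buffers and enqueued for playback.
static void MyPacketsProc(void *clientData, UInt32 numBytes, UInt32 numPackets,
                          const void *inputData,
                          AudioStreamPacketDescription *packetDescs) {
    // Consume numPackets packets located in inputData here.
}

void ParseChunk(const void *bytes, UInt32 length) {
    AudioFileStreamID stream;
    // 0 as the type hint: let the parser work the container out itself.
    AudioFileStreamOpen(NULL, MyPropertyProc, MyPacketsProc, 0, &stream);
    AudioFileStreamParseBytes(stream, length, bytes, 0);
    AudioFileStreamClose(stream);   // in practice, only after the last chunk
}
```

The pattern is push-driven throughout: you own the byte source (file read, HTTP body, live stream) and the parser only ever sees what you hand it.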
And so we're now getting deep into this. Hope you're still with me. There'll be a test at the end of the session, by the way. Okay. So audio playback, that's the one I want to get into first. You use audio file or audio file stream to read the bytes, and then you use AudioQueue as the playback API. And by using these two APIs together, you have a very large degree of control over the behavior. You can set playback volume on the queue, which allows you to play multiple queues and mix them at different volumes. You have control of the data, because you're providing the data into the queue's buffers. You can loop files, you can do transport controls, these kinds of things. And just so that you understand the APIs we're exposing to you: these are the APIs that form the foundations for the audio playback of iPod, of the media framework that's on the phone, with the plug-ins, with YouTube. These applications are sitting on top of AudioFile and AudioQueue as their mechanisms. So this gives you some idea of the richness of the APIs that you have and what you can do with them.
And so AudioQueue can play back any supported file format, through the use of audio file of course, and any of the data formats, but there are some qualifications on the data formats that you can use. This is because the iPhone itself has limitations, and some of the formats are going to be decoded through hardware, and that puts a limitation on what we can do.
So this is the list of the formats that we can decode, that we ship with. And the top three, AAC, MP3, and Apple Lossless, are all modern codecs; they're all fairly complex computationally. Because of this, their actual work is done in hardware, and you can only have one of these operating at a time. And this isn't one of each of the three of them, but one of any of them. So if you have one AAC decode going, you couldn't also have MP3 or lossless.
But you can have unrestricted use of PCM. µ-law and A-law are 8-bit formats, used in some voicemail systems. IMA4 is a 4-bit format, and it's actually not a bad format: it's fairly simple to decode, it's pretty good fidelity, and it's a 4-to-1 compression ratio. iLBC is the Internet Low Bitrate Codec, and then there's AMR; these are both speech codecs.
AMR is a speech codec that's in 3GPP, which is a standard that's kind of based off MPEG-4. The speech codecs are very small, usually 8 kilohertz, so it's what's called narrowband, and they're very small data streams. iLBC is particularly good for networks where you may lose packets: it can interpolate across lost packets and still provide a fairly good audio signal. And of course, if you're doing just speech content and you're concerned about size, these codecs are both very good.
And so with AudioQueue, you create an AudioQueue object and you provide a description to the queue to say what kind of data you're going to play back with this queue. Then you give it a callback, and that callback is going to be called by the queue when it's finished processing your data, and you provide a thread context on which the queue is going to call that callback. Depending on the latency you need and the size of the buffers that you're using with the AudioQueue, the thread context can be very important to make sure that you get your callbacks in a timely manner.
And the queue owns the buffers that you're using on it; it's not something where you can just give us arbitrary memory. So you allocate buffers on the queue that you're using, and you just provide a size. A good optimization that we've added since Leopard is to allocate buffers with packet descriptions. This means the packet descriptions are in the same memory as the queue buffers themselves, so we've got one less copy, and that's a good efficiency for the queue when it's playing back. Then once you've got the queue buffer, you fill it up with audio data, typically from audio file or audio file stream, and you just enqueue that buffer to the queue and it'll get played. And how do you play it? Well, you start it. You just start the audio queue, and when you want to stop, you stop it. If you've got intermittent data, let's say you've got bits of audio here, then some indeterminate silence, then a bit there, you can keep the engine underneath the AudioQueue API primed by pausing the queue rather than stopping it. You can also prime the queue if you want to get it ready, so that when you hit start, it's really going to get going straight away. And then if you're doing transport controls, you've got audio queue reset, and to deal with compressed formats, you can flush the queue at the end to make sure that we play out all of the data that's there. Okay, so that's a very simple overview of the queue for playback. And the queue in recording is actually not that different: in that case you'd use the audio queue to record, and then you'd use audio file to write the audio file as we're getting the data.
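The playback lifecycle described above can be sketched as follows. The buffer count and size are illustrative, and the refill logic is only a comment; a real player would pull data from AudioFile or AudioFileStream inside the callback.

```c
#include <AudioToolbox/AudioToolbox.h>

// Output callback: the queue hands a used buffer back to us; we refill
// it with more audio data and re-enqueue it.
static void MyOutputProc(void *clientData, AudioQueueRef queue,
                         AudioQueueBufferRef buffer) {
    // Fill buffer->mAudioData (e.g. via AudioFileReadPackets), set
    // buffer->mAudioDataByteSize, then hand it back:
    AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
}

void StartPlayback(const AudioStreamBasicDescription *format) {
    AudioQueueRef queue;
    // NULL run loop: the queue calls us back on one of its own threads.
    AudioQueueNewOutput(format, MyOutputProc, NULL, NULL, NULL, 0, &queue);

    // Allocate a few buffers, prefill them, and enqueue before starting.
    for (int i = 0; i < 3; i++) {
        AudioQueueBufferRef buffer;
        AudioQueueAllocateBuffer(queue, 32 * 1024, &buffer);
        MyOutputProc(NULL, queue, buffer);   // prime with initial data
    }
    AudioQueueStart(queue, NULL);
    // Transport controls mentioned in the session:
    //   AudioQueuePause, AudioQueuePrime, AudioQueueReset,
    //   AudioQueueFlush, AudioQueueStop, AudioQueueDispose.
}
```

For packet-description-aware formats, AudioQueueAllocateBufferWithPacketDescriptions is the Leopard-era optimization mentioned above: the descriptions live in the same allocation as the audio bytes.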
So AudioQueue and AudioFile together can, as you would hope, record into any supported file format that we can write, and any supported data format. So what are the data formats that the audio queue can be constructed to use? There's linear PCM, uncompressed. We do have an Apple Lossless encoder.
And so this will compress your audio losslessly. It'll be the same when you play it back as it was when you recorded it, at roughly a 50 to 60 percent compression ratio; it varies a little bit. You've got µ-law and A-law again, which are both 8-bit formats; IMA4 is a 4-bit format; and for speech content, we have the iLBC codec available. When you create the queue, you just provide the data format that you want the queue to record into, and it will create all of the codecs it needs to get from whatever the device has got, including sample rate conversion and so forth, to the format that you've specified when you created the queue. And you give it a buffer callback, and the buffer callback is how the queue delivers the data to you. As with the output case, you provide the thread context, and it has much the same semantics as previously.
And so as with output, you allocate a buffer for the queue to use, and then you enqueue the buffer. Now of course you don't have anything in that buffer yet because the queue is filling the buffer for you. So you just enqueue the buffer for the queue to use. And then the input queue, when it's run, it's going to put data into that buffer till it's filled those buffers, and then it's going to deliver them to you through the callback that you provided in the construction.
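The recording side mirrors the playback sketch, with the direction of data flow reversed: you enqueue empty buffers, and the queue delivers them back filled. This is again an illustration, with the file-writing step left as a comment.

```c
#include <AudioToolbox/AudioToolbox.h>

// Input callback: the queue delivers a filled buffer; write it out
// (typically with AudioFileWritePackets), then hand it back for reuse.
static void MyInputProc(void *clientData, AudioQueueRef queue,
                        AudioQueueBufferRef buffer,
                        const AudioTimeStamp *startTime,
                        UInt32 numPackets,
                        const AudioStreamPacketDescription *packetDescs) {
    // Write buffer->mAudioData to an AudioFile here, then:
    AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
}

void StartRecording(const AudioStreamBasicDescription *format) {
    AudioQueueRef queue;
    AudioQueueNewInput(format, MyInputProc, NULL, NULL, NULL, 0, &queue);

    // Give the queue empty buffers to fill; it delivers them via the
    // callback as they fill up.
    for (int i = 0; i < 3; i++) {
        AudioQueueBufferRef buffer;
        AudioQueueAllocateBuffer(queue, 16 * 1024, &buffer);
        AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
    }
    AudioQueueStart(queue, NULL);
}
```

The format you pass to AudioQueueNewInput is the format you want delivered; the queue inserts any conversion needed from what the hardware is actually producing.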
And so it has the same APIs for playback control. You can start the queue, you can stop it, you can pause it, and you can reset it to get rid of all of the data if you're not interested in it, that kind of thing. So we're actually going to go to a demo now, and I'll show you a simple example that we've written to demonstrate playback and record, called AQTouch. So if we can go to the projector. Okay, there's some sound, that's good. So in this application, I'm going to just select the file here and play it.
We believe in very high fidelity audio in the Core Audio group, as you can see. So the file we're playing here is an M4A file, the same type of file you'd have in iTunes, and it's got AAC data in it. And we were metering two channels, because the file had two channels in it. I'm actually going to record a file now, so I'm just going to hit the record button. Hello, welcome to WWDC 2008.
I'm going to play that back. Hello, welcome to WWDC 2008. And as you can hear, high fidelity recording. And that's coming in from the microphone. And of course, you know, it's not really great in such a reverberant room as this. But basically what we're also seeing there is that we're seeing one meter coming in on the input side because the microphone is an 8 kilohertz mono input. And so we're reflecting the level that's coming in from the microphone in the meters. If we can go back to slides, please.
Okay, so how do we know that it's 8 kilohertz and mono? How do we know that we're actually getting audio in from the microphone? I had a dock connector plugged in there. Why wasn't I getting audio through that? Well, these are all important questions, of course. And it's a good point to step back and just think about how complex the iPhone is as an audio device. Because it's actually very complex. It has a built-in microphone. It has a built-in speaker. It actually has another built-in speaker, and we call that a receiver to distinguish between that and the speaker. They're both speakers, but you understand the difference.
It has the ability to plug in headphones. And you can also plug in headphones that we call headsets because they have a microphone in them as well. So you can plug in either headphones or headphones with microphones. You can use Bluetooth for calls. You can get Bluetooth, which will have both input and output.
You can plug your iPhone into a dock and you can get line out from an iPhone or an iPod Touch for that matter or you can do the dock connector like I'm using up on stage there. You can also get USB output directly from an iPhone and go into a car unit and most of the car units now are using USB as their main transport mechanism precisely to avoid the noise problems we had while we were setting up, actually. And then there's also controls. There's a ringer switch control, which silences the phone or not. And then there's volume keys. So that's quite a complex device. If you look at your desktop computer, it doesn't have this complexity for audio. And so we wanted to really understand how this should behave and what do users expect from their phone when they're doing things.
And we wanted the behavior for the users to be consistent. The user shouldn't be surprised, and the user shouldn't have to go and do configuration types of activities. And so this meant that we had to take a lot of the control of the audio system from individual applications and do it at a system level, so we could ensure that consistency. And a guiding principle that we had is that we wanted to do what the user means. When they plug in headphones, what do they mean? What does that gesture mean for them? When they're hitting the volume keys, are they changing the iPod volume? Are they changing the ringer volume? Are they changing the volume of your application? If they hit the ringer switch, should that always mean silence? What if you hit the ringer switch and you have a clock alarm?
So these are all kinds of semantics, layers on top of just the basic mechanics of getting it to work, that we wanted to take account of. And we wanted to respond to interruptions too. You've got alarms playing, you've got calls being rejected and so forth, and accepted of course. And so we needed sometimes to be able to silence the audio that's playing on the phone, because you don't want to miss your alarm because you were listening to loud music.
So the way we're expressing this behavior to your application is through an API called Audio Session. And this is a new API. All of the other APIs that I'm talking about are in Leopard; they're available on the desktop and they behave in the same way. This API is just for iPhone, because we're dealing with behavior on iPhone and behavior in the iPhone audio system. And there are two primary concepts in audio session that are important to understand. The first is categories. We talk about different applications being in different categories. So MooCow Music and their sequencer is a different type of application than if I were to bring up Notes and just tap on the keyboard, or if I were to play the iPod.
And some of the behaviors that are associated with these different categories are things like whether you mix or not with other applications. Would you allow, is it sensible for your application to have iPod playing in the background? And it's also got to do with routing. Where is your audio going? Where a ringtone or an alarm is going to go is going to be different than when iPod is playing back audio.
There's also the question of volume. When you're changing volume, what are you changing? And ringer switch behavior. And then the other concept with session is whether your session is active or not. When do you make your session active? When do you assert your role and your desires into the system?
So the other thing to think about as a developer and something we have to think about as well is different models, because different models will have different behaviors. We use the same code on iPod Touch and iPhone, but from an audio standpoint, they're quite different devices. iPhone is very complex. iPod Touch really just has headphones and line out and USB out. There's nothing terribly complex about that. It's much more similar to a desktop type environment.
And then who knows what we'll do in the future. We could have a whole bunch of different types of audio possibilities. And so these types of things are fluid and they're flexible and they're dynamic. And so the API is based around the fact that we want you to express your meanings, your intentions, and we'll take care of the behavior for you. Now, we may not take care of it completely right all the time, and you can tell us when we're not. But that's the intention. And so part of the reason I'm talking to you today about this is to help you understand our thinking here, so that you can code your apps to behave and to fit in with the way that we're actually doing our applications as well.
So to buy into audio session, you initialize the session. There's not actually an object. A session is associated globally with your application. So we don't actually give you a session object. You just initialize the session. And you provide a callback, which we're going to use to tell you when your audio has been interrupted.
and you can get interrupted because you get a call or because the clock alarm goes off; those are just two examples, and who knows what else might interrupt you in the future. The interruption has two states associated with it. There's a begin interruption: when the alarm is about to ring, we're going to tell you, hey, you've been interrupted. You are no longer playing audio at this point; we've stopped you, and here's the notification to tell you that we have. Now, you may not get an end interruption. As you know, if you answer a call, your application is going to be terminated, so you won't get an end interruption. There's no one to tell; you're gone. And if you want to do something at that point, you can use the Cocoa UIs to interface to the fact that your application is being terminated. But if it's just a clock alarm and the user dismisses it, or the user doesn't answer the call, then we'll tell you that the interruption has finished and that you're actually free to make audio again.
So now, we mentioned set active. When a session becomes active, it asserts its behaviors, its requests, its characteristics on the system. So when you become active, you may stop other people playing. And when you become inactive, the other people may play again, or not; it depends on what's going on on the system. So the set active call is really just saying: right, I want the device now, I want to make audio, or I want to record audio.
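The initialize, interrupt, and set-active flow described here can be sketched with the AudioSession C API. The restart logic is only a comment; what you do on end-interruption depends on your app.

```c
#include <AudioToolbox/AudioToolbox.h>

// Interruption listener: on begin, your audio has already been stopped,
// so just update state and UI. On end, you are free to make audio again.
static void MyInterruptionProc(void *clientData, UInt32 state) {
    if (state == kAudioSessionBeginInterruption) {
        // We are no longer playing; note that and update the UI.
    } else if (state == kAudioSessionEndInterruption) {
        AudioSessionSetActive(true);   // reassert our session
        // ...restart queues or players here...
    }
}

void SetUpSession(void) {
    // One global session per application; there is no session object.
    // NULL run loop means callbacks arrive on the main run loop.
    AudioSessionInitialize(NULL, NULL, MyInterruptionProc, NULL);
    // Become active when you actually want to make or record audio.
    AudioSessionSetActive(true);
}
```

Note the asymmetry the session warns about: a begin-interruption callback is guaranteed, but an end-interruption may never come if the user takes the call and your process is terminated.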
So audio session, just as with all the other Core Audio APIs, uses a property mechanism to get and set data, and you can install listeners in order to listen for changes. And these are the categories. The categories are implemented as a property, and the property is audio category. There are six categories defined at the moment. Two of those categories allow your application to mix with other applications on the system. Now, the sound effects one is kind of like a default. If you think of the Notes app, it would be user interface sound effects: you're not really doing much, and you're happy for anything else to happen on the system. Ambient sound is similar to that. If you're a game, or any application, and you want to allow the iPod to play music in the background while your game is in front, this is the category to set. It's not the default. If you know nothing else about audio session and you want to do this, you have to know this. Now, if your application is much more audio-centric, then you need to be one of the four remaining categories. And these categories are going to stop other applications from making audio; you will now own the audio on the system.
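Declaring the category is a single property set. A sketch for the game case called out above, where you opt in to letting iPod music keep playing underneath you:

```c
#include <AudioToolbox/AudioToolbox.h>

// A game that wants iPod music to keep playing underneath it must opt
// into the ambient category explicitly; it is not the default.
void UseAmbientCategory(void) {
    UInt32 category = kAudioSessionCategory_AmbientSound;
    AudioSessionSetProperty(kAudioSessionProperty_AudioCategory,
                            sizeof(category), &category);
    // Audio-centric apps would instead pick one of:
    //   kAudioSessionCategory_MediaPlayback
    //   kAudioSessionCategory_LiveAudio
    //   kAudioSessionCategory_RecordAudio
    //   kAudioSessionCategory_PlayAndRecord
    // These take ownership of audio and silence other applications.
}
```

The category takes effect when your session is active, which is also when its routing and mixing behavior is asserted on the system.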
except for one exception, which we'll go into in a moment. There's always got to be one exception. So the four categories: media playback, which you could imagine the iPod would be in; live audio, which could be something like the sequencers, the instruments, the iPhone band stuff, all of these would be kind of live audio things; recording, if you're just doing recording (I'm sure there's at least 20 voice notes apps being written today, and I'm looking for one); and then play and record, which allows you to do both input and output at the same time.
And these categories keep state associated with them, like volume. So when the user sets volume, that's a gesture that the user does; it's not actually a gesture your application can do. It's what we consider a user's action. And when they set the volume, they're setting it on a category. So let's say you've got three applications that do different types of band things. You might have a guitar app and a piano app or something, and they all come up as the live audio category; then they're going to share volume for the different routes. So if they've got headphones plugged in, that's going to be a volume for that category on that route. If they're just playing to the speaker, that's a different audio route, but the same category, so those two applications would share volume to that destination. And this covers things across the whole system. And because we're really trying to do volume as much as possible in hardware, because it gives us much better fidelity and a much better user experience, this is why we're really trying to control it. So how can you present volume to the user? Well, on the iPhone, the user has volume controls that they can hit directly. On iPod Touch they don't. So there are two ways you can bring up a volume slider. You can make one call that brings up a HUD, which is the MPVolumeSettings alert. Or you can get a slider, which is a UIKit view, and you can put that in the place that works best for your application. The slider that you see in the iPod app, for example, is this type of slider.
Now there are some audio hardware settings that you may be interested in, for various reasons. The most common hardware setting that people would like to set is the sample rate. But this is not something you can simply set, because the current route where audio is coming from or going to may not be able to do that sample rate. As an example, you might want 44 kilohertz while the user is on the built-in mic, which is only doing 8 kilohertz. So you're not going to get your preferred hardware sample rate. You have to be prepared for the fact that you may not get it, but you can still request it. The I/O buffer size you'll normally get, but you might not for one reason or another. The I/O buffer size is the size of the buffers that we're going to use to do I/O to the device, and this affects the latency of the audio coming in or going out of the device.
And so when you set your preferred hardware settings, if you want to get down to that level, they're applied when you become active; they're asserted onto the device if it's possible. Then you can make these calls, which get the current state of the hardware. What is the sample rate? Did I get the preferred one, or am I stuck with some other sample rate? How many input channels do I have? How many output channels do I have? And this is a very useful thing, of course, to do before you're recording. It's no good recording 44 kilohertz stereo if you've got mono 8 kilohertz. So this is a useful thing to know. And your session must be active: if you just make these calls randomly and you're not active, then you're getting whatever the device is doing now.
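A sketch of the preferred-then-current pattern with the AudioSession property calls follows. The 44.1 kHz request is illustrative; the point is that the "current hardware" queries report what you actually got.

```c
#include <AudioToolbox/AudioToolbox.h>

void ConfigureAndQueryHardware(void) {
    // Request a preferred rate; the route may not honor it
    // (e.g. the built-in mic runs at 8 kHz).
    Float64 preferredRate = 44100.0;
    AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareSampleRate,
                            sizeof(preferredRate), &preferredRate);
    AudioSessionSetActive(true);   // preferences assert on activation

    // Now ask what the hardware is actually doing.
    Float64 actualRate = 0;
    UInt32 size = sizeof(actualRate);
    AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate,
                            &size, &actualRate);

    UInt32 inputChannels = 0;
    size = sizeof(inputChannels);
    AudioSessionGetProperty(
        kAudioSessionProperty_CurrentHardwareInputNumberChannels,
        &size, &inputChannels);
    // e.g. built-in mic: 8000.0 Hz, 1 channel; configure recording to match.
}
```

The buffer-size preference mentioned above is expressed the same way, as a preferred hardware I/O buffer duration property, and trades latency against power and CPU.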
So we've talked about routes, and routes are things like headphone, line out speaker. And you can get the current route at any time by calling the get property call with audio route. And an interesting thing to know is when the route changes, when the user plugs in headphones or pulls out headphones. And this is informed to you through a property notification and the property ID on this is audio route change.
You cannot get audio route change or set audio route change. You can just get notified that the audio route has changed. And the property notification includes the data for that property. and in the audio route change property it's going to tell you two bits of information. It tells you why the route has changed. Let's say the route has been removed because the user has unplugged their headphones. And also it will tell you what it was before the route changed.
So if you're going, say, from speaker to headphones, this would tell you speaker and the reason would be, well, you know, there's a new route available. And then you can get the current route which would now be headphones because the user plugged in headphones. And of course, the reverse if you remove them.
And so what if you don't use audio session? So this is the exception I was talking about earlier. If you're just playing system sounds, you don't need to know anything about audio sessions. So half of you could have left. You're wasting your time here. Good to see you're still awake. That's very reassuring. I thought I might be putting you all to sleep. If you're just using system sounds, you do not need to know about audio session, because system sounds, the audio services play system sound and play alert sound calls, are just going to mix in with whatever else is going on in the system. But if you're doing anything else, you probably do want to know about audio session. Because if you just open a queue and start to play it back or start to record, then you're going to make everything else be quiet. And if you really don't want to do that, then you need to know about audio session. If your application is running happily playing back a file and the user gets a call, you're not going to know that we stopped your application. And you also won't know when you can start again because, well, you don't even know you were stopped. So all the rest of the APIs are great and it's really, you know, fine for bring-up and for doing your testing and development. But you really probably don't want to ship your application this way. You really want to tie in and use audio session to really provide a good integration experience with the rest of the stuff going on and with how users are using their phone. So now I'm going to demonstrate some of this in action.
So if we can go to the projector. Okay. I'm going to start playing something in the iPod app and then I'm going to go to the notes application. And I'm just going to type some text in here. And you can hear the clicks coming in from the keyboard as we're typing. The iPod is still playing. So this is system sounds just playing back, mixing in. The Notes application knows nothing about audio sessions.
So now I'm going to go to AQ Touch, the example we had before, and I'm going to start playing this file. Now this is an AAC MP4 file, and this is using the Media Playback category. So it's interrupted the iPod. The iPod has stopped playing, and now I'm starting to play my sound.
Now I'm going to stop it, the iPod isn't resuming, that's the behaviour of the iPod, it's not going to resume after that interruption. Now I'm going to kill the volume here because I want to do a couple of things. It's like having a patient on an operating table here. So now I'm going to start playing and I'm just using the speaker.
You're going to have to trust me. You can see the metering, right? So now I'm going to plug in the headphones. Let me kill that for a bit. Oops, caught that. So I'm plugging in the headphones. Oh, hang on. Let me try. Sorry about that. I'm going to play this again. So I'm on speaker.
plugging in the headphones, and now the sound is still kept playing, and it's playing through the headphone speaker now. Now what happens when I pull the headphones out? Kill the audio there. And it stops. And the reason it stops is because the application has set up a listener for the route change notification and it's seen that the route change has come in when we pulled the headphones out. The reason for the route change is that the old route was removed. So it's using that as a cue to stop. When we plug the headphones in, the reason for the route change wasn't removed, so we didn't actually do anything. And this is actually how the iPod behaves. If the user plugs in headphones, it keeps playing.
If the user removes headphones, it stops playing. And this is how this behaviour is implemented in the iPhone, in the iPod app. So now what I'm going to do is to show you how to deal with interruptions. So I'm going to first go to a clock alarm here and ooh, let's see if we can make that. Okay.
So I'm playing happily away here and very shortly an alarm is going to fire and my application is going to be interrupted. Now have a listen to the sound. You'll notice we've got a red stop button up and oh, it looks like I might have been too late. Let me go and reset that alarm again.
I knew I was going to be cutting that one fine. Let me see how much time I've got. Okay. I'm going to add an alarm here and it's at 4:19 which should be no good. I'm going to edit that one. I'm going to add another one. That's what I've got to do. Otherwise, you're going to be here all day.
don't want that. Okay so let's try that again, set that. Here's our car, we're all revving up, ready to go. So at the moment you can see we're metering, we've got the red stop button up because that would be the gesture we want the user to do. When the alarm comes in, it's going to interrupt our application and what the application is going to do.
Oh, wake up. So what the application did, as you can see, is that the button changed because it got told, oh, you've been interrupted. You're no longer playing any audio. So it changed the button to its play state so that it can stop. Now, if I dismiss the alarm, we're going to get an end interruption and we go back to playing again. And I'm going to stop that so I don't have to talk over it. And this behaviour has just been programmed into the app and this is how we decided that we would respond to the interruption. So we can go back to slides, please. OK.
Now, I'll just take a note here too to say all of the three applications we're demoing today, I've got two more to do, are available for download. So you can see how all of this is implemented, and we thought that might help you to get going and get started with all of this for your own apps. So now I want to look at games. Here we're using the OpenAL API, and this is the API that's being used by Touch Fighter, by the Monkey Ball games, et cetera, that have been demoed. And OpenAL is an API that works in a very similar way to OpenGL. It uses the same coordinate system, a 3D coordinate system. It allows you to position sounds in a 3D space. It's a cross-platform open source API. It's not actually an API Apple developed; the AL isn't Apple Library, it's Open Audio Library. And its primary use of course is for games. OpenAL.org is the website where you can get further information.
It's available on Windows, on Linux, on various game consoles and so forth. Apple has been supporting this API since Tiger, 10.4, and we shipped the 1.0 specification then, and then in 10.4.7 we updated this to the 1.1 implementation, which is the current specification for the API. And this is what is also available on iPhone, except that on iPhone we don't have the audio capture API available for you.
Otherwise it's all there. Now OpenAL works in a fairly straightforward way. You create a device, which is where you're going to render the audio. For OpenAL on the phone, there's just one device, which is the system audio device. Then you create an OpenAL context, and this is really the mixer. This is where all of the work is done. It's the rendering engine. And the listener is implicit to the context. And then the way you work is that you add sources to the context, and the sources, of course, are sources of sound, of audio. So you generate the sources, and then you generate the buffers, so you create buffers to put the audio data in, you put the audio data in them, and then you queue the buffers up to the sources, and then you play them. So it's pretty straightforward. It's somewhat similar to Audio Queue as well, and many of the sound playing APIs have got this notion of sort of queues and buffers. With OpenAL, of course, then you can position the audio to different locations at any time, so you can move sounds around, and you use a 3D coordinate system to do that. Now, there's two extensions that we provide with OpenAL, and this is actually true on the desktop as well, because both of them are useful in both situations. There's a static buffer extension. The static buffer extension allows you to get access to the buffer pointer memory that's used in the actual buffer objects in OpenAL. So instead of writing into a buffer that is then copied into those, you get access to the buffer data directly. So this is a very good efficiency and we strongly recommend that you use this version of the alBufferData call, alBufferDataStatic. And then if you've got a lot of sources in your game and you're quite content to mix them at a different sample rate than the device might be at, then we would also recommend that you use the mixer output sample rate extension for OpenAL.
And an example of this would be, let's say you had sounds that there's not really high frequency content in them, so 22 kilohertz is perfectly fine, but the device might be running at 44K because you want to play music as well with the iPod. So in this case, you could have all your sources running at 22K. You could be mixing them at 22K. And then you would just do one sample rate conversion to get to 44K for playback. So this is a very useful extension to understand and to use. So I'm going to go to a demo, the slides will click, and go over to OpenAL so we can have the projector up again.
Okay. I'll just start this playing here. So this is a very simple example of OpenAL, and the reason we wanted to provide this is to give you a very simple code base where you can just look at what it takes to play one sound and just get a real flavour for the API. Now I can just use this sound to move it from left to right. I presume that's panning. I can put the sound closer or I can move it further away, which is this notion of a 3D space that the sound is occurring within. Now we talked in the section just before about the ambient category and the ambient category is good if you want to have music playing in your game. So where the music is coming from, say the user's iPod, they may have a bunch of a playlist that they want to use or whatever. So in this case, as compared to AQ Touch where the music stops, when we make sound with this application it's going to let the iPod keep playing in the background. I get the full panning control of OpenAL, I can go from left to right and in and out and so forth. And the user can also control the iPod, they can bring up the HUD by the double click, they can stop the iPod and just go back to their game, which I seem to have lost. Oh well, doesn't matter. You can, but you'll see how you can really just keep the iPod going and the game will still go and the user can bring the iPod up and play sound or not.
We go back to slides, which we're already at. So the last section that I want to get into is audio units. Audio units are really the foundations for audio on iPhone. And these represent a kind of a lower level of the Core Audio services for you to use. It gives you a lot more control over the rendering behavior. The rendering is done, in the audio units case, in your application, and it will probably give you more code to write, but it gives you much greater control. If you want to do lower latency input to output or output processing, then this is probably a service you need to know about. And if you want to do mixing and OpenAL types of things, audio units are actually what OpenAL itself is implemented on.
So in order to use audio units, you need to understand something about the data format that we're using on the phone. And linear PCM, of course, is the uncompressed audio format. And there's actually two canonical PCM formats that are used on the phone. There's a 16-bit format, which is the device format, the IO format that's used on the device. And then audio units are using an 8.24 fixed point format. The 8 bits are the integer part, the 24 bits are the fractional part. And the reason that we're using fixed point is that you've got some headroom for mixing and doing processing and that kind of thing. Now, there's no floating point sample support on the phone. There's a considerable difference on the phone's CPU between doing floating point calculations and integer calculations. And the penalty for doing floating point is really too much for the types of audio that you want to do. So this is the reason why we've decided to go with a fixed point, integer-based format for audio.
Formats are generally described using an AudioStreamBasicDescription, which is declared in the CoreAudioTypes header, and this is the header that's used for all of the basic types in Core Audio. So audio units are published in the Audio Unit framework and they're pluggable bits of code. They're basically discrete modules that you can load and use for different activities. So we have audio units available to you to use for doing I/O, for doing mixing, and there's an EQ effect as well.
On the desktop environment, audio units are used to also allow third-party plug-ins to run in host applications like Logic and GarageBand. We're not supporting this at this point in time for the iPhone, so there's no third-party audio units that will be found and discovered. You, of course, could implement code using audio units and just load that code yourself, but we're not providing any discovery process for third-party audio units. Because of that, there's no need for us to also do custom UI support, which is a common feature on the desktop version of audio units. So the audio unit is really there as an API for you to use for discrete activities and tasks. And probably the one you will need to know about most is AURemoteIO. This is how you get audio in and out of the system at this level. Now, if you've used the desktop environment, it's very similar to AUHAL, and it works in the very same way. It has an input bus and an output bus in terms of the device, and it works in a just-in-time notion. And what I mean by that is that the work that you do with this audio unit is done on a real-time thread, on a time-constrained thread. That thread has a deadline to meet, and that deadline is not half a second away, it's normally some small number of milliseconds away. So this puts very strong constraints on the kinds of things you can do. For instance, you cannot block, you cannot read or write files from the thread that AURemoteIO is going to call you on. And there's a whole host of things you have to be careful of. Allocating memory is another activity you can't do on this thread. Now, AURemoteIO also will do sample rate conversion between the format that you're using as a client of it and the format that the device is currently set to. And then we provide mixer audio units. There's a stereo mixer, and it will take either 16-bit or 8.24 fixed-point inputs. Those inputs can be mono or stereo.
And then the parameters you have for this mixer is a volume control and enable on each input. The enable can be seen as like a muting. It still leaves that input active. It's just not being called for data.
And you can enable it and disable it, and we take care of ramping it up and down so you don't glitch. And then it has a single 8.24 fixed point output in stereo, and then you can connect that up as you can with audio units to the remote I.O., and then that gives you an ability to do mixing of multiple sources to one output.
Now, OpenAL is implemented on top of the 3D embedded mixer, and this is also an audio unit that you can just use on your own if you want to do something different than what you can do with OpenAL. So the 3D embedded mixer gives you inputs of either 8 or 16-bit, mono or stereo. It has a lot more parameters available for each input. It has a volume control. You can enable inputs. You can also pan using the 3D coordinate space. It has distance attenuation. That was the sound getting quieter and louder as it moved further from or closer to the listener. You can also do rate control. Rate control can be used to simulate Doppler, so you have a pitch change as a sound comes to you and goes away. You can do rate conversion on inputs, but really the rate conversion is a side effect of the fact that we have a rate control parameter, and we really don't recommend that you do a lot of rate conversion on the inputs. It's expensive. It's better to get all of your data at the same sample rate. And then it gives you a fixed point output, 8.24 stereo output, just like the other mixer.
There's the iPod EQ. The iPod EQ is going to do 8.24 input and output. It's the same EQ that's used with the iPod settings. So if you go into settings, there's EQ settings there. And the iPod EQ just gives you a collection of presets. And this will match what the user can select for iPod playback.
So there's two ways that you can interact with audio units. There's using the AUGraph API, and the AUGraph API is a little bit more abstract than using the Audio Unit API directly. And the way you do work with a graph is that you create a graph, you create a collection of nodes, and then you start and stop the graph. And the collection of nodes you connect up to each other to form your signal chain, your processing chain, your processing graph. Now, one of the features that the graph does that's very useful is that it provides you the ability to interact with the graph while it's running. It takes care of the threading issues. So when you add and remove nodes, or add and remove callbacks for inputs and things like this, you're able to do that while the graph is running and you're not going to get breakups in your audio or other problems because of the different threads that are involved in this.
If you want to deal with the audio units directly, you can use the audio unit API as it is. And because the component manager is not available, we have a new mechanism for publishing the components that represent the audio units that you're going to use. So this is the audio component. It's a new API. It's available both in Snow Leopard and on the iPhone. And it's how you find the components.
And the components are sort of factories. They're class objects in a way. And then once you've found the component, which you find by describing it. So you say, well I want to find the component that represents the mixer or the iPod EQ or the AU remote IO. Once I've found that component, I then want to create an instance of that component. I want that factory to make me an instance. And this is the audio unit itself.
So the example that I want to show you now is a duplex IO example. And what this does is it takes input and it copies it to the output and it just does a simple IO and it uses AU Remote IO to do this. The way that it works very briefly is that the callback fires to say that there's data there for the input, You call audio unit render on render bus one, which is the input side, and then you pass that data to the callback, which is going to be used for output. And this example is provided as a starting point for you just to get some idea of how do you use audio units. So if we can go to slides and we'll go to the demo.
Okay, so this is aurioTouch. And what I'm displaying here is an oscilloscope that's representing the audio coming in. I can zoom it out so that the time granularity is a lot longer. I can zoom it in so it's covering a very small and live piece of data. Now if we could cut my microphone.
taking audio in and pushing it through this cable into the output. Thank you, I think I'm back online. So I've got a mute button here, and as I hit the mute button, it played a little sound. That's using the audio services play system sound call. We thought we'd have a little bit of fun with this app, so what you see here now is actually a spectrogram of the input that's coming in, and we're doing an FFT of the data. And you can sort of see the yellow lines there. That's my whistle tone. It's a pretty noisy environment up here, so it's picking up a lot of data. Maybe if I sort of move it.
There, that's a little bit quieter. Get a good noise there. So you can see that that's actually, you know, quite cool, what it's doing there. Back to there, and I can do an FFT as well, which just shows me frequency response. And that's that demo. Back to the slides, please. Thank you.
And thank you. And as I said, all of this code is available for you to download, so you can try this app yourself. You'll notice a difference when you plug in the headphones and the headset. You'll get a much cleaner signal than you can from the mic and so forth. So it's really quite a lot of fun to play with. Thank you. Oh, good, there we go. Okay, so thank you very much. I'm basically done.
Sorry we've run a little bit over, but to just summarize what we've been doing: we've looked at the Audio Toolbox. We've looked at Audio Queue as a primary source for input and output. We used AudioFile with this. We had a look at audio services, both the session and system sound APIs, OpenAL for games, and the Audio Unit API if you want to do low-level audio. There's a couple of related sessions. There was the introduction to game development, which puts audio in the broader context of doing your game. Now, I've skimmed over a lot of details about the Core Audio API set in general, and the Understanding Core Audio Architecture session is very useful in giving you much more detail about how Core Audio works, how you can use the APIs, and a sense of how it's all put together. We have a lab tomorrow afternoon. We'll all be there, bring your code, bring your questions. And I'm not going to read the resources slide. I'll just leave it up there while we do Q&A. And thank you very much. APPLAUSE