Media • 1:05:40
Core Audio provides a powerful engine for playing and recording audio in your iPhone application. Learn how to play sounds and alerts, record audio from the built-in microphone and play sound files of arbitrary length. Understand the best practices to minimize latency and conserve power. Learn about the audio codecs and formats available for iPhone and understand the capabilities for playing multiple sounds simultaneously.
Speaker: Bill Stewart
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
So welcome to this session on audio development for iPhone. My name's William Stewart and I work in the Core Audio group. In this session, we're going to be going through all of the various services for doing audio on iPhone. And I thought what we would do is to go through a set of tasks that we think are representative of the kinds of things that you would want to do. So we're going to look at doing system sounds, we're going to look at playing back and recording files, we're going to look at audio in games, and we're going to look at doing audio units if you want to do some more low-level processing. And we're also going to look at some of the general behavior of audio. There was a question in the last session in games about managing audio as calls come in and so forth, so we're going to have a look at that. And I'd just like to say a personal note: it's very interesting to see all of the games that are coming out, seeing the use of the audio system and the way it's being used with MooCow Music and the games yesterday. So it's very exciting and I'm really glad that you're here, and we're looking for some great things. So the APIs are really what we want to cover here today: what you're going to learn in the session and what APIs to use for what purposes.
and also to have a better understanding of the types of experiences that your user is going to have with your game in the larger context of the phone and the audio system that it is using. So the technology frameworks that you have to look at are primarily the Audio Toolbox framework, and this has the primary APIs for most of the functionality. The Audio Unit framework provides some headers that you use if you're doing the lower-level stuff. And then there's also the OpenAL framework, of course, which we'll get into later in the session. So the first step is system sounds. System sounds are basically small sounds, UI sounds: mail send, keyboard taps, SMS alerts, these kinds of things. If you've got a very simple game, this could be all that you need, if you're just making kind of tick sounds, stuff like that. It's a very lightweight API with very little burden on you. There's no individual control for volumes or for panning, so it's very much a play-and-forget sort of API. And let's have a look at what that looks like. It's in the Audio Toolbox framework, AudioServices.h.
And the way the API works is that you provide a CFURL to the file, and the sounds must be short: they should be less than 30 seconds. Even if you're only over five or ten seconds, you might think about compressing the sound with IMA or something similar, because it will make a smaller memory footprint when we load the sound into memory. And then when you're finished playing the sound, if you're only going to play it sporadically, you can dispose of the sounds as you go. If you're going to play the sound multiple times, you just keep it around.
So playing the sound then is just a simple matter of calling one of the two APIs that you see here. There's play system sound, and this plays the sound with normal usage. It'll obey the ringer switch on the phone, and it will play back at the volume that the phone is set to at any particular point in time. It's commonly the API that you'll use. If you need to do some kind of alert, let's say you have a notification and you want to do some kind of an alarm or attract the user's attention in some way, then you can use the play alert sound variant of this API call.
And this will also vibrate the phone if the user has vibrate set on. Of course, there's no vibration capability on iPod Touch, so it will make a small buzzing noise from the iPod Touch speaker instead. And because it has these kinds of extra actions associated with it, you don't really want to be bothering the user all the time, so use the API appropriately.
And then with the API, because it is a play-and-forget API, if you want to know when we're finished playing the sound for you, you can add a completion proc. This completion callback will fire when the sound is finished playing; you can decide to dispose of the sound at that point, or you could loop it if you wanted to. And then if you want to explicitly vibrate the phone, there's a sound ID called vibrate, and you just play this ID as if it were a sound file and that will vibrate the phone.
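Putting the system sound calls together, here is a minimal sketch against the AudioServices API described above. The file path is illustrative, and error handling is reduced to a single check; this assumes a short sound file bundled with the app.

```c
#include <AudioToolbox/AudioToolbox.h>
#include <CoreFoundation/CoreFoundation.h>

// Completion proc: fires when the sound finishes playing.
// Dispose here if the sound is only played sporadically.
static void MySoundDone(SystemSoundID soundID, void *clientData) {
    AudioServicesDisposeSystemSoundID(soundID);
}

void PlayTapSound(void) {
    // Hypothetical path; in practice this comes from your app bundle.
    CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                       CFSTR("/path/to/tap.caf"), kCFURLPOSIXPathStyle, false);
    SystemSoundID soundID;
    if (AudioServicesCreateSystemSoundID(url, &soundID) == kAudioServicesNoError) {
        // Fire-and-forget playback; obeys the ringer switch and system volume.
        AudioServicesAddSystemSoundCompletion(soundID, NULL, NULL,
                                              MySoundDone, NULL);
        AudioServicesPlaySystemSound(soundID);
        // Variants:
        //   AudioServicesPlayAlertSound(soundID);  vibrates too, if enabled
        //   AudioServicesPlaySystemSound(kSystemSoundID_Vibrate);  vibrate only
    }
    CFRelease(url);
}
```

Keeping the SystemSoundID around and replaying it is the right pattern for frequently used sounds; the completion-proc disposal shown here suits sounds played only occasionally.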
Okay, so that's system sounds. And what I want to do now is to go through the general sets of APIs that you can use for playing and recording audio files. And there's really two things to look at here. There's the API that is going to read or write audio files, and then there's the API that's going to be used to render those files or to get the audio for recording. So let's just look very briefly at the file services that we provide. There's two different file objects. There's an audio file object and an audio file stream object. Audio files are used for reading and writing audio data. Audio file stream is just a data object and it can only be used for reading and I'll go into some more details about that shortly.
Now both of the APIs are capable of dealing with several different types of file formats. And in Core Audio we talk very distinctly about file formats and data formats. A file format is a description of the container of a file itself, or it could be a network stream or whatever, but it's a description of a container. A data format is the data that's in that file, the specific audio data. And the two are not the same in most cases. So when we talk about file formats, we're talking about specifications for files. MPEG-4 is a file format, and you'll see .m4a and .mp4 as common extensions. An .m4a file can contain AAC data or Apple Lossless data. MP3 is both a file format, commonly called .mp3, which is actually MPEG-1 Layer 3, and a data format that can go into that. And then ADTS is a bitstream that's part of MPEG-2, where AAC was first specified, and this is a similar type of bitstream to MP3 in that it can be used on a network. These are really a case where the file and the data that's in it are kind of the same type of thing: .mp3 or .aac.
A CAF file, a Core Audio Format file, was a file format we introduced in Tiger, and this is a format that can contain any data: Apple Lossless, AAC, MP3, linear PCM, IMA, any sort of audio data you like. And then you've got AIFF and WAV files. So these files can be read or written on the iPhone.
So with the Audio File API, of course, you've got to pair the right data format to the right file format. And one of the things about the Audio File API is that it's used when the file's data is completely there. It's not appropriate for a network download or for a Shoutcast-style network stream. But because the file is completely there, you can do arbitrary seeking. You can pull arbitrary data or write arbitrary data to the file, so that can be things like overviews or regions or marker chunks, that kind of thing. And then audio file can read or write audio data to or from the file.
It has a fairly straightforward usage. There are two calls that create an audio file object: one where you're going to create a file on disk, or overwrite an existing file, and to do the create call you just provide the data format and the file format for the new file. And then there's opening an existing file, which typically you'll do for reading.
Then you call AudioFileOpenURL, and you provide, if possible, a hint to the call to tell us what type of file it is that you're wanting us to open. Sometimes the extension doesn't match the file type, and some files are very difficult to determine because they could be one of two or three different things, so if you know, giving us a hint is very useful. And then once you have an audio file object, as with other Core Audio APIs, we have a property semantic where you can get and set properties of files. And then to get the data from the file or to write the data, it's just a straightforward AudioFileReadPackets: read from this particular audio packet for this many packets into the buffer you've provided. And writing is the inverse of that, of course.
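As a sketch of that open-then-read-packets flow, here is roughly what it looks like in C. This assumes the modern read-permission constant from the AudioFile headers; the buffer size and packet count are arbitrary illustration values.

```c
#include <AudioToolbox/AudioToolbox.h>

// Open an existing audio file for reading and pull some packets out of it.
OSStatus ReadSomePackets(CFURLRef url) {
    AudioFileID file;
    // kAudioFileM4AType is just a hint; pass 0 if the type is unknown.
    OSStatus err = AudioFileOpenURL(url, kAudioFileReadPermission,
                                    kAudioFileM4AType, &file);
    if (err) return err;

    // Ask the file for its data format, using the property semantic
    // common to Core Audio objects.
    AudioStreamBasicDescription asbd;
    UInt32 size = sizeof(asbd);
    AudioFileGetProperty(file, kAudioFilePropertyDataFormat, &size, &asbd);

    // Read up to 64 packets starting at packet 0 into our buffer.
    char buffer[32 * 1024];
    UInt32 numBytes = sizeof(buffer);
    UInt32 numPackets = 64;
    err = AudioFileReadPackets(file, false, &numBytes, NULL,
                               0, &numPackets, buffer);

    AudioFileClose(file);
    return err;
}
```

Writing is the mirror image: AudioFileCreateWithURL with both the file format and data format, then write packets at increasing packet offsets.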
And then that's fairly straightforward. Now, Audio File Stream is a different type of API in the sense that it's not really dealing with the file itself. It's dealing with bytes, and it's bytes that you're pushing into the file stream object. So the file stream is a parser. It's parsing those bytes, and it's telling you what's in them. So because of this, the data doesn't have to be present completely. It's just a buffer that you're going to push into it.
So you can use this with a file, in which case you'll open the file and read the bytes from it. Or you could do an HTTP download, so while the file is actually downloading, you could be pushing the bytes through and getting the audio metadata from it. Or it could even be a Shoutcast-style network stream where it's not really a file; it's just a stream of bytes and you're kind of jumping in at one point. The way the API works is that it uses a notification mechanism to tell you information about the bytes that you're pushing into it. And we also try to be efficient, with minimal copying and without adding overhead in us doing the work to tell you information about the stream you're pushing into it. And because it's not really dealing with a file but with bytes, the API to create the audio file stream object doesn't take a file; it just takes callbacks, and those are the callbacks we're going to use to call you back and tell you something about the data you've pushed in. And as with audio file, if you know the format of the file, it's very helpful to provide a hint to tell us that.
And then you just push the data in with AudioFileStreamParseBytes. You feed the data into the object, push it in, and then we'll call your listener callbacks. Now, there are two callbacks. There's the callback for properties of the data, so that could be things like: this is an AAC data file, or this has got MP3 data in it; that's the data format that's contained within the file.
And then at some point, we're going to get to the place where we've got enough information to start telling you about the audio packets that are actually in there. And so you'll get a property notification that says we're ready to produce audio data. We're ready to really pass audio packets for you. And then we'll start calling your audio packet callback and we'll tell you where the packets are. And so then you can use that to interface to other objects. And let's have a look at the primary object you'll interface to and that's audio queue.
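A condensed sketch of that two-callback parser shape follows. In real code you would keep the stream object open and keep feeding it chunks as they arrive from the network; here everything is collapsed into one call for illustration.

```c
#include <AudioToolbox/AudioToolbox.h>

// Property callback: fired as the parser learns things about the stream,
// e.g. the data format, or that it is ready to produce packets.
static void MyPropertyProc(void *clientData, AudioFileStreamID stream,
                           AudioFileStreamPropertyID propID, UInt32 *ioFlags) {
    if (propID == kAudioFileStreamProperty_ReadyToProducePackets) {
        // From this point on, the packets callback below starts firing.
    }
}

// Packets callback: hands you parsed audio packets, typically to be
// copied into AudioQueue buffers and enqueued for playback.
static void MyPacketsProc(void *clientData, UInt32 numBytes, UInt32 numPackets,
                          const void *inputData,
                          AudioStreamPacketDescription *packetDescs) {
    // Consume numPackets packets located in inputData here.
}

void ParseChunk(const void *bytes, UInt32 length) {
    AudioFileStreamID stream;
    // 0 as the type hint: let the parser work the container out itself.
    AudioFileStreamOpen(NULL, MyPropertyProc, MyPacketsProc, 0, &stream);
    AudioFileStreamParseBytes(stream, length, bytes, 0);
    AudioFileStreamClose(stream);   // in practice, only after the last chunk
}
```

The pattern is push-driven throughout: you own the byte source (file read, HTTP body, live stream) and the parser only ever sees what you hand it.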
And so we're now getting deep into this. Hope you're still with me. There'll be a test at the end of the session, by the way. Okay. So audio playback, that's the one I want to get into first. You use audio file or audio file stream to read the bytes, and then you use AudioQueue as the playback API. And by using these two APIs together, you have a very large degree of control over the behavior. You can set playback volume on the queue, which allows you to play multiple queues and mix them at different volumes. You have control of the data, because you're providing the data into the queue's buffers. You can loop files, you can do transport controls, these kinds of things. And just so that you understand the APIs we're exposing to you: these are the APIs that form the foundations for the audio playback of iPod, of the media framework that's on the phone, with the plug-ins, with YouTube. These applications are sitting on top of AudioFile and AudioQueue as their mechanisms. So this gives you some idea of the richness of the APIs that you have and what you can do with them.
And so AudioQueue can play back any supported file format, through the use of audio file of course, and any of the data formats, but there are some qualifications on the data formats that you can use. This is because the iPhone itself has limitations, and some of the formats are going to be decoded through hardware, and that puts a limitation on what we can do.
So this is the list of the formats that we can decode, that we ship with. And the top three, AAC, MP3, and Apple Lossless, are all modern codecs; they're all fairly complex computationally. Because of this, their actual work is done in hardware, and you can only have one of these operating at a time. And this isn't one of each of the three of them, but one of any of them. So if you have one AAC decode going, you couldn't also have MP3 or lossless.
But you can have unrestricted use of PCM. µ-law and A-law are 8-bit formats, used in some voicemail systems. IMA4 is a 4-bit format, and it's actually not a bad format: it's fairly simple to decode, it's pretty good fidelity, and it's a 4-to-1 compression ratio. iLBC is the Internet Low Bitrate Codec, and then there's AMR; these are both speech codecs.
AMR is a speech codec that's in 3GPP, which is a standard that's kind of based off MPEG-4. The speech codecs are very small, usually 8 kilohertz, so it's what's called narrowband, and they're very small data streams. iLBC is particularly good for networks where you may lose packets: it can interpolate across lost packets and still provide a fairly good audio signal. And of course, if you're doing just speech content and you're concerned about size, these codecs are both very good.
And so with AudioQueue, you create an AudioQueue object and you provide a description to the queue to say what kind of data you're going to play back with this queue. Then you give it a callback, and that callback is going to be called by the queue when it's finished processing your data, and you provide a thread context on which the queue is going to call that callback. Depending on the latency you need and the size of the buffers that you're using with the AudioQueue, the thread context can be very important to make sure that you get your callbacks in a timely manner.
And the queue owns the buffers that you're using on it; it's not something where you can just give us arbitrary memory. So you allocate buffers on the queue that you're using, and you just provide a size. A good optimization that we've added since Leopard is to allocate buffers with packet descriptions. This means the packet descriptions are in the same memory as the queue buffers themselves, so we've got one less copy, and that's a good efficiency for the queue when it's playing back. Then once you've got the queue buffer, you fill it up with audio data, typically from audio file or audio file stream, and you just enqueue that buffer to the queue and it'll get played. And how do you play it? Well, you start it. You just start the audio queue, and when you want to stop, you stop it. If you've got intermittent data, let's say you've got bits of audio here, then some indeterminate silence, then a bit there, you can keep the engine underneath the AudioQueue API primed by pausing the queue rather than stopping it. You can also prime the queue if you want to get it ready, so that when you hit start, it's really going to get going straight away. And then if you're doing transport controls, you've got audio queue reset, and to deal with compressed formats, you can flush the queue at the end to make sure that we play out all of the data that's there. Okay, so that's a very simple overview of the queue for playback. And the queue in recording is actually not that different: in that case you'd use the audio queue to record, and then you'd use audio file to write the audio file as we're getting the data.
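The playback lifecycle described above can be sketched as follows. The buffer count and size are illustrative, and the refill logic is only a comment; a real player would pull data from AudioFile or AudioFileStream inside the callback.

```c
#include <AudioToolbox/AudioToolbox.h>

// Output callback: the queue hands a used buffer back to us; we refill
// it with more audio data and re-enqueue it.
static void MyOutputProc(void *clientData, AudioQueueRef queue,
                         AudioQueueBufferRef buffer) {
    // Fill buffer->mAudioData (e.g. via AudioFileReadPackets), set
    // buffer->mAudioDataByteSize, then hand it back:
    AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
}

void StartPlayback(const AudioStreamBasicDescription *format) {
    AudioQueueRef queue;
    // NULL run loop: the queue calls us back on one of its own threads.
    AudioQueueNewOutput(format, MyOutputProc, NULL, NULL, NULL, 0, &queue);

    // Allocate a few buffers, prefill them, and enqueue before starting.
    for (int i = 0; i < 3; i++) {
        AudioQueueBufferRef buffer;
        AudioQueueAllocateBuffer(queue, 32 * 1024, &buffer);
        MyOutputProc(NULL, queue, buffer);   // prime with initial data
    }
    AudioQueueStart(queue, NULL);
    // Transport controls mentioned in the session:
    //   AudioQueuePause, AudioQueuePrime, AudioQueueReset,
    //   AudioQueueFlush, AudioQueueStop, AudioQueueDispose.
}
```

For packet-description-aware formats, AudioQueueAllocateBufferWithPacketDescriptions is the Leopard-era optimization mentioned above: the descriptions live in the same allocation as the audio bytes.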
So AudioQueue and AudioFile together can, as you would hope, record into any supported file format that we can write, and any supported data format. So what are the data formats that the audio queue can be constructed to use? There's linear PCM, uncompressed. We do have an Apple Lossless encoder.
And so this will compress your audio losslessly. It'll be the same when you play it back as it was when you recorded it, at roughly a 50 to 60 percent compression ratio; it varies a little bit. You've got µ-law and A-law again, which are both 8-bit formats; IMA4 is a 4-bit format; and for speech content, we have the iLBC codec available. When you create the queue, you just provide the data format that you want the queue to record into, and it will create all of the codecs it needs to get from whatever the device has got, including sample rate conversion and so forth, to the format that you've specified when you created the queue. And you give it a buffer callback, and the buffer callback is how the queue delivers the data to you. As with the output case, you provide the thread context, and it has much the same semantics as previously.
And so as with output, you allocate a buffer for the queue to use, and then you enqueue the buffer. Now of course you don't have anything in that buffer yet because the queue is filling the buffer for you. So you just enqueue the buffer for the queue to use. And then the input queue, when it's run, it's going to put data into that buffer till it's filled those buffers, and then it's going to deliver them to you through the callback that you provided in the construction.
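The recording side mirrors the playback sketch, with the direction of data flow reversed: you enqueue empty buffers, and the queue delivers them back filled. This is again an illustration, with the file-writing step left as a comment.

```c
#include <AudioToolbox/AudioToolbox.h>

// Input callback: the queue delivers a filled buffer; write it out
// (typically with AudioFileWritePackets), then hand it back for reuse.
static void MyInputProc(void *clientData, AudioQueueRef queue,
                        AudioQueueBufferRef buffer,
                        const AudioTimeStamp *startTime,
                        UInt32 numPackets,
                        const AudioStreamPacketDescription *packetDescs) {
    // Write buffer->mAudioData to an AudioFile here, then:
    AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
}

void StartRecording(const AudioStreamBasicDescription *format) {
    AudioQueueRef queue;
    AudioQueueNewInput(format, MyInputProc, NULL, NULL, NULL, 0, &queue);

    // Give the queue empty buffers to fill; it delivers them via the
    // callback as they fill up.
    for (int i = 0; i < 3; i++) {
        AudioQueueBufferRef buffer;
        AudioQueueAllocateBuffer(queue, 16 * 1024, &buffer);
        AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);
    }
    AudioQueueStart(queue, NULL);
}
```

The format you pass to AudioQueueNewInput is the format you want delivered; the queue inserts any conversion needed from what the hardware is actually producing.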
And so it has the same APIs for playback control. You can start the queue, you can stop it, you can pause it, and you can reset it to get rid of all of the data if you're not interested in it, that kind of thing. So we're actually going to go to a demo now, and I'll show you a simple example that we've written to demonstrate playback and record, called AQTouch. So if we can go to the projector. Okay, there's some sound, that's good. So in this application, I'm going to just select the file here and play it.
We believe in very high fidelity audio in the Core Audio group, as you can see. So the file we're playing here is an M4A file, the same type of file you'd have in iTunes, and it's got AAC data in it. And we were metering two channels, because the file had two channels in it. I'm actually going to record a file now, so I'm just going to hit the record button. Hello, welcome to WWDC 2008.
I'm going to play that back. Hello, welcome to WWDC 2008. And as you can hear, high fidelity recording. And that's coming in from the microphone. And of course, you know, it's not really great in such a reverberant room as this. But basically what we're also seeing there is that we're seeing one meter coming in on the input side because the microphone is an 8 kilohertz mono input. And so we're reflecting the level that's coming in from the microphone in the meters. If we can go back to slides, please.
Okay, so how do we know that it's 8 kilohertz and mono? How do we know that we're actually getting audio in from the microphone? I had a dock connector plugged in there. Why wasn't I getting audio through that? Well, these are all important questions, of course. And it's a good point to step back and just think about how complex the iPhone is as an audio device. Because it's actually very complex. It has a built-in microphone. It has a built-in speaker. It actually has another built-in speaker, and we call that a receiver to distinguish between that and the speaker. They're both speakers, but you understand the difference.
It has the ability to plug in headphones. And you can also plug in headphones that we call headsets because they have a microphone in them as well. So you can plug in either headphones or headphones with microphones. You can use Bluetooth for calls. You can get Bluetooth, which will have both input and output.
You can plug your iPhone into a dock and you can get line out from an iPhone or an iPod Touch for that matter or you can do the dock connector like I'm using up on stage there. You can also get USB output directly from an iPhone and go into a car unit and most of the car units now are using USB as their main transport mechanism precisely to avoid the noise problems we had while we were setting up, actually. And then there's also controls. There's a ringer switch control, which silences the phone or not. And then there's volume keys. So that's quite a complex device. If you look at your desktop computer, it doesn't have this complexity for audio. And so we wanted to really understand how this should behave and what do users expect from their phone when they're doing things.
And we wanted the behavior for the users to be consistent. The user shouldn't be surprised, and the user shouldn't have to go and do configuration types of activities. And so this meant that we had to take a lot of the control of the audio system from individual applications and do it at a system level, so we could ensure that consistency. And a guiding principle that we had is that we wanted to do what the user means. When they plug in headphones, what do they mean? What does that gesture mean for them? When they're hitting the volume keys, are they changing the iPod volume? Are they changing the ringer volume? Are they changing the volume of your application? If they hit the ringer switch, should that always mean silence? What if you hit the ringer switch and you have a clock alarm?
So these are all kinds of semantics, layers on top of just the basic mechanics of getting it to work, that we wanted to take account of. And we wanted to respond to interruptions too. You've got alarms playing, you've got calls being rejected and so forth, and accepted of course. And so we needed sometimes to be able to silence the audio that's playing on the phone, because you don't want to miss your alarm because you were listening to loud music.
So the way we're expressing this behavior to your application is through an API called Audio Session. And this is a new API. All of the other APIs that I'm talking about are in Leopard; they're available on the desktop and they behave in the same way. This API is just for iPhone, because we're dealing with behavior on iPhone and behavior in the iPhone audio system. And there are two primary concepts in audio session that are important to understand. The first is categories. We talk about different applications being in different categories. So MooCow Music and their sequencer is a different type of application than if I were to bring up Notes and just tap on the keyboard, or if I were to play the iPod.
And some of the behaviors that are associated with these different categories are things like whether you mix or not with other applications. Would you allow, is it sensible for your application to have iPod playing in the background? And it's also got to do with routing. Where is your audio going? Where a ringtone or an alarm is going to go is going to be different than when iPod is playing back audio.
There's also the question of volume. When you're changing volume, what are you changing? And ringer switch behavior. And then the other concept with session is whether your session is active or not. When do you make your session active? When do you assert your role and your desires into the system?
So the other thing to think about as a developer and something we have to think about as well is different models, because different models will have different behaviors. We use the same code on iPod Touch and iPhone, but from an audio standpoint, they're quite different devices. iPhone is very complex. iPod Touch really just has headphones and line out and USB out. There's nothing terribly complex about that. It's much more similar to a desktop type environment.
And then who knows what we'll do in the future. We could have a whole bunch of different types of audio possibilities. And so these types of things are fluid and they're flexible and they're dynamic. And so the API is based around the fact that we want you to express your meanings, your intentions, and we'll take care of the behavior for you. Now, we may not take care of it completely right all the time, and you can tell us when we're not. But that's the intention. And so part of the reason I'm talking to you today about this is to help you understand our thinking here, so that you can code your apps to behave and to fit in with the way that we're actually doing our applications as well.
So to buy into audio session, you initialize the session. There's not actually an object. A session is associated globally with your application. So we don't actually give you a session object. You just initialize the session. And you provide a callback, which we're going to use to tell you when your audio has been interrupted.
and you can get interrupted because you get a call or because the clock alarm goes off; those are just two examples, and who knows what else might interrupt you in the future. The interruption has two states associated with it. There's a begin interruption: when the alarm is about to ring, we're going to tell you, hey, you've been interrupted. You are no longer playing audio at this point; we've stopped you, and here's the notification to tell you that we have. Now, you may not get an end interruption. As you know, if you answer a call, your application is going to be terminated, so you won't get an end interruption. There's no one to tell; you're gone. And if you want to do something at that point, you can use the Cocoa UIs to interface to the fact that your application is being terminated. But if it's just a clock alarm and the user dismisses it, or the user doesn't answer the call, then we'll tell you that the interruption has finished and that you're actually free to make audio again.
So now, we mentioned set active. When a session becomes active, it asserts its behaviors, its requests, its characteristics on the system. So when you become active, you may stop other people playing. And when you become inactive, the other people may play again, or not; it depends on what's going on on the system. So the set active call is really just saying: right, I want the device now, I want to make audio, or I want to record audio.
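The initialize, interrupt, and set-active flow described here can be sketched with the AudioSession C API. The restart logic is only a comment; what you do on end-interruption depends on your app.

```c
#include <AudioToolbox/AudioToolbox.h>

// Interruption listener: on begin, your audio has already been stopped,
// so just update state and UI. On end, you are free to make audio again.
static void MyInterruptionProc(void *clientData, UInt32 state) {
    if (state == kAudioSessionBeginInterruption) {
        // We are no longer playing; note that and update the UI.
    } else if (state == kAudioSessionEndInterruption) {
        AudioSessionSetActive(true);   // reassert our session
        // ...restart queues or players here...
    }
}

void SetUpSession(void) {
    // One global session per application; there is no session object.
    // NULL run loop means callbacks arrive on the main run loop.
    AudioSessionInitialize(NULL, NULL, MyInterruptionProc, NULL);
    // Become active when you actually want to make or record audio.
    AudioSessionSetActive(true);
}
```

Note the asymmetry the session warns about: a begin-interruption callback is guaranteed, but an end-interruption may never come if the user takes the call and your process is terminated.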
So audio session, just as with all the other Core Audio APIs, uses a property mechanism to get and set data, and you can install listeners in order to listen for changes. And these are the categories. The categories are implemented as a property, and the property is audio category. There are six categories defined at the moment. Two of those categories allow your application to mix with other applications on the system. Now, the sound effects one is kind of like a default. If you think of the Notes app, it would be user interface sound effects: you're not really doing much, and you're happy for anything else to happen on the system. Ambient sound is similar to that. If you're a game, or any application, and you want to allow the iPod to play music in the background while your game is in front, this is the category to set. It's not the default. If you know nothing else about audio session and you want to do this, you have to know this. Now, if your application is much more audio-centric, then you need to be one of the four remaining categories. And these categories are going to stop other applications from making audio; you will now own the audio on the system.
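Declaring the category is a single property set. A sketch for the game case called out above, where you opt in to letting iPod music keep playing underneath you:

```c
#include <AudioToolbox/AudioToolbox.h>

// A game that wants iPod music to keep playing underneath it must opt
// into the ambient category explicitly; it is not the default.
void UseAmbientCategory(void) {
    UInt32 category = kAudioSessionCategory_AmbientSound;
    AudioSessionSetProperty(kAudioSessionProperty_AudioCategory,
                            sizeof(category), &category);
    // Audio-centric apps would instead pick one of:
    //   kAudioSessionCategory_MediaPlayback
    //   kAudioSessionCategory_LiveAudio
    //   kAudioSessionCategory_RecordAudio
    //   kAudioSessionCategory_PlayAndRecord
    // These take ownership of audio and silence other applications.
}
```

The category takes effect when your session is active, which is also when its routing and mixing behavior is asserted on the system.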
except for one exception, which we'll go into in a moment. There's always got to be one exception. So the four categories: media playback, which you could imagine the iPod would be in; live audio, which could be something like the sequencers, the instruments, the iPhone band stuff, all of these would be kind of live audio things; recording, if you're just doing recording (I'm sure there's at least 20 voice notes apps being written today, and I'm looking for one); and then play and record, which allows you to do both input and output at the same time.
And these categories keep state associated with them, like volume. So when the user sets volume, that's a gesture that the user does; it's not actually a gesture your application can do. It's what we consider a user's action. And when they set the volume, they're setting it on a category. So let's say you've got three applications that do different types of band things. You might have a guitar app and a piano app or something, and they all come up as the live audio category; then they're going to share volume for the different routes. So if they've got headphones plugged in, that's going to be a volume for that category on that route. If they're just playing to the speaker, that's a different audio route, but the same category, so those two applications would share volume to that destination. And this covers things across the whole system. And because we're really trying to do volume as much as possible in hardware, because it gives us much better fidelity and a much better user experience, this is why we're really trying to control it. So how can you present volume to the user? Well, on the iPhone, the user has volume controls that they can hit directly. On iPod Touch they don't. So there are two ways you can bring up a volume slider. You can make one call that brings up a HUD, which is the MPVolumeSettings alert. Or you can get a slider, which is a UIKit view, and you can put that in the place that works best for your application. The slider that you see in the iPod app, for example, is this type of slider.
Now there are some audio hardware settings that you may be interested in, for various reasons. The most common hardware setting that people would like to set is the sample rate. But this is not something you can simply set, because the current route where audio is coming from or going to may not be able to do that sample rate. As an example, you might want 44 kilohertz while the user is on the built-in mic, which is only doing 8 kilohertz. So you're not going to get your preferred hardware sample rate. You have to be prepared for the fact that you may not get it, but you can still request it. The I/O buffer size you'll normally get, but you might not for one reason or another. The I/O buffer size is the size of the buffers that we're going to use to do I/O to the device, and this affects the latency of the audio coming in or going out of the device.
And so when you set your preferred hardware settings, if you want to get down to that level, they're applied when you become active; they're asserted onto the device if it's possible. Then you can make these calls, which get the current state of the hardware. What is the sample rate? Did I get the preferred one, or am I stuck with some other sample rate? How many input channels do I have? How many output channels do I have? And this is a very useful thing, of course, to do before you're recording. It's no good recording 44 kilohertz stereo if you've got mono 8 kilohertz. So this is a useful thing to know. And your session must be active: if you just make these calls randomly and you're not active, then you're getting whatever the device is doing now.
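A sketch of the preferred-then-current pattern with the AudioSession property calls follows. The 44.1 kHz request is illustrative; the point is that the "current hardware" queries report what you actually got.

```c
#include <AudioToolbox/AudioToolbox.h>

void ConfigureAndQueryHardware(void) {
    // Request a preferred rate; the route may not honor it
    // (e.g. the built-in mic runs at 8 kHz).
    Float64 preferredRate = 44100.0;
    AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareSampleRate,
                            sizeof(preferredRate), &preferredRate);
    AudioSessionSetActive(true);   // preferences assert on activation

    // Now ask what the hardware is actually doing.
    Float64 actualRate = 0;
    UInt32 size = sizeof(actualRate);
    AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate,
                            &size, &actualRate);

    UInt32 inputChannels = 0;
    size = sizeof(inputChannels);
    AudioSessionGetProperty(
        kAudioSessionProperty_CurrentHardwareInputNumberChannels,
        &size, &inputChannels);
    // e.g. built-in mic: 8000.0 Hz, 1 channel; configure recording to match.
}
```

The buffer-size preference mentioned above is expressed the same way, as a preferred hardware I/O buffer duration property, and trades latency against power and CPU.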
So we've talked about routes, and routes are things like headphone, line out speaker. And you can get the current route at any time by calling the get property call with audio route. And an interesting thing to know is when the route changes, when the user plugs in headphones or pulls out headphones. And this is informed to you through a property notification and the property ID on this is audio route change.
You cannot get audio route change or set audio route change. You can just get notified that the audio route has changed. And the property notification includes the data for that property. and in the audio route change property it's going to tell you two bits of information. It tells you why the route has changed. Let's say the route has been removed because the user has unplugged their headphones. And also it will tell you what it was before the route changed.
So if you're going, say, from speaker to headphones, this would tell you speaker and the reason would be, well, you know, there's a new route available. And then you can get the current route which would now be headphones because the user plugged in headphones. And of course, the reverse if you remove them.
And so what if you don't use audio session? So this is the exception I was talking about earlier. If you're just playing system sounds, you don't need to know anything about audio sessions. So half of you could have left. You're wasting your time here. Good to see you're still awake. That's very reassuring. I thought I might be putting you all to sleep. If you're just using system sounds, you do not need to know about audio session, because system sounds, the audio services play system sound and play alert sound calls, are just going to mix in with whatever else is going on in the system. But if you're doing anything else, you probably do want to know about audio session. Because if you just open a queue and start to play it back or start to record, then you're going to make everything else be quiet. And if you really don't want to do that, then you need to know about audio session. If your application is running happily playing back a file and the user gets a call, you're not going to know that we stopped your application. And you also won't know when you can start again because, well, you don't even know you were stopped. So all the rest of the APIs are great and it's really, you know, fine for bring-up and for doing your testing and development. But you really probably don't want to ship your application this way. You really want to tie in and use audio session to really provide a good integration experience with the rest of the stuff going on and with how users are using their phone. So now I'm going to demonstrate some of this in action.
So if we can go to the projector. Okay. I'm going to start playing something in the iPod app and then I'm going to go to the notes application. And I'm just going to type some text in here. And you can hear the clicks coming in from the keyboard as we're typing. The iPod is still playing. So this is system sounds just playing back, mixing in. The Notes application knows nothing about audio sessions.
So now I'm going to go to AQ Touch, the example we had before, and I'm going to start playing this file. Now this is an AAC MP4 file, and this is using the Media Playback category. So it's interrupted the iPod. The iPod has stopped playing, and now I'm starting to play my sound.
Now I'm going to stop it, the iPod isn't resuming, that's the behaviour of the iPod, it's not going to resume after that interruption. Now I'm going to kill the volume here because I want to do a couple of things. It's like having a patient on an operating table here. So now I'm going to start playing and I'm just using the speaker.
You're going to have to trust me. You can see the metering, right? So now I'm going to plug in the headphones. Let me kill that for a bit. Oops, caught that. So I'm plugging in the headphones. Oh, hang on. Let me try. Sorry about that. I'm going to play this again. So I'm on speaker.
plugging in the headphones, and now the sound is still kept playing, and it's playing through the headphone speaker now. Now what happens when I pull the headphones out? Kill the audio there. And it stops. And the reason it stops is because the application has set up a listener for the route change notification and it's seen that the route change has come in when we pulled the headphones out. The reason for the route change is that the old route was removed. So it's using that as a cue to stop. When we plug the headphones in, the reason for the route change wasn't removed, so we didn't actually do anything. And this is actually how the iPod behaves. If the user plugs in headphones, it keeps playing.
If the user removes headphones, it stops playing. And this is how this behaviour is implemented in the iPhone, in the iPod app. So now what I'm going to do is to show you how to deal with interruptions. So I'm going to first go to a clock alarm here and ooh, let's see if we can make that. Okay.
So I'm playing happily away here and very shortly an alarm is going to fire and my application is going to be interrupted. Now have a listen to the sound. You'll notice we've got a red stop button up and oh, it looks like I might have been too late. Let me go and reset that alarm again.
I knew I was going to be cutting that one fine. Let me see how much time I've got. Okay. I'm going to add an alarm here and it's at 4:19 which should be no good. I'm going to edit that one. I'm going to add another one. That's what I've got to do. Otherwise, you're going to be here all day.
don't want that. Okay so let's try that again, set that. Here's our car, we're all revving up, ready to go. So at the moment you can see we're metering, we've got the red stop button up because that would be the gesture we want the user to do. When the alarm comes in, it's going to interrupt our application and what the application is going to do.
Oh, wake up. So what the application did, as you can see, is that the button changed because it got told, oh, you've been interrupted. You're no longer playing any audio. So it changed the button to its play state so that it can stop. Now, if I dismiss the alarm, we're going to get an end interruption and we go back to playing again. And I'm going to stop that so I don't have to talk over it. And this behaviour has just been programmed into the app and this is how we decided that we would respond to the interruption. So we can go back to slides, please. OK.
Now, I'll just take a note here too to say all of the three applications we're demoing today, I've got two more to do, are available for download. So you can see how all of this is implemented, and we thought that might help you to get going and get started with all of this for your own apps. So now I want to look at games. Here we're using the OpenAL API, and this is the API that's being used by Touch Fighter, by the Monkey Ball games, et cetera, that have been demoed. And OpenAL is an API that works in a very similar way to OpenGL. It uses the same coordinate system, a 3D coordinate system. It allows you to position sounds in a 3D space. It's a cross-platform open source API. It's not actually an API Apple developed; the AL isn't Apple Library, it's Open Audio Library. And its primary use of course is for games. OpenAL.org is the website where you can get further information.
It's available on Windows, on Linux, on various game consoles and so forth. Apple has been supporting this API since Tiger, 10.4, and we shipped the 1.0 specification then, and then in 10.4.7 we updated this to the 1.1 implementation, which is the current specification for the API. And this is what is also available on iPhone, except that on iPhone we don't have the audio capture API available for you.
Otherwise it's all there. Now OpenAL works in a fairly straightforward way. You create a device, which is where you're going to render the audio. For OpenAL on the phone, there's just one device, which is the system audio device. Then you create an OpenAL context, and this is really the mixer. This is where all of the work is done. It's the rendering engine. And the listener is implicit to the context. And then the way you work is that you add sources to the context, and the sources, of course, are sources of sound, of audio. So you generate the sources, and then you generate the buffers, so you create buffers to put the audio data in, you put the audio data in them, and then you queue the buffers up to the sources, and then you play them. So it's pretty straightforward. It's somewhat similar to Audio Queue as well, and many of the sound playing APIs have got this notion of sort of queues and buffers. With OpenAL, of course, then you can position the audio to different locations at any time, so you can move sounds around, and you use a 3D coordinate system to do that. Now, there's two extensions that we provide with OpenAL, and this is actually true on the desktop as well, because both of them are useful in both situations. There's a static buffer extension. The static buffer extension allows you to get access to the buffer pointer memory that's used in the actual buffer objects in OpenAL. So instead of writing into a buffer that is then copied into those, you get access to the buffer data directly. So this is a very good efficiency and we strongly recommend that you use this version of the alBufferData call, alBufferDataStatic. And then if you've got a lot of sources in your game and you're quite content to mix them at a different sample rate than the device might be at, then we would also recommend that you use the mixer output sample rate extension for OpenAL.
And an example of this would be, let's say you had sounds that there's not really high frequency content in them, so 22 kilohertz is perfectly fine, but the device might be running at 44K because you want to play music as well with the iPod. So in this case, you could have all your sources running at 22K. You could be mixing them at 22K. And then you would just do one sample rate conversion to get to 44K for playback. So this is a very useful extension to understand and to use. So I'm going to go to a demo, the slides will click, and go over to OpenAL so we can have the projector up again.
Okay. I'll just start this playing here. So this is a very simple example of OpenAL, and the reason we wanted to provide this is to give you a very simple code base where you can just look at what it takes to play one sound and just get a real flavour for the API. Now I can just use this sound to move it from left to right. I presume that's panning. I can put the sound closer or I can move it further away, which is this notion of a 3D space that the sound is occurring within. Now we talked in the section just before about the ambient category and the ambient category is good if you want to have music playing in your game. So where the music is coming from, say the user's iPod, they may have a bunch of a playlist that they want to use or whatever. So in this case, as compared to AQ Touch where the music stops, when we make sound with this application it's going to let the iPod keep playing in the background. I get the full panning control of OpenAL, I can go from left to right and in and out and so forth. And the user can also control the iPod, they can bring up the HUD by the double click, they can stop the iPod and just go back to their game, which I seem to have lost. Oh well, doesn't matter. You can, but you'll see how you can really just keep the iPod going and the game will still go and the user can bring the iPod up and play sound or not.
We go back to slides, which we're already at. So the last section that I want to get into is audio units. Audio units are really the foundations for audio on iPhone. And these represent a kind of a lower level of the Core Audio services for you to use. It gives you a lot more control over the rendering behavior. The rendering is done, in the audio units case, in your application, and it will probably give you more code to write, but it gives you much greater control. If you want to do lower latency input to output or output processing, then this is probably a service you need to know about. And if you want to do mixing and OpenAL types of things, audio units are actually what OpenAL itself is implemented on.
So in order to use audio units, you need to understand something about the data format that we're using on the phone. And linear PCM, of course, is the uncompressed audio format. And there's actually two canonical PCM formats that are used on the phone. There's a 16-bit format, which is the device format, the IO format that's used on the device. And then audio units are using an 8.24 fixed point format. The 8 bits are the integer part, the 24 bits are the fractional part. And the reason that we're using fixed point is that you've got some headroom for mixing and doing processing and that kind of thing. Now, there's no floating point sample support on the phone. There's a considerable difference on the phone's CPU between doing floating point calculations and integer calculations. And the penalty for doing floating point is really too much for the types of audio that you want to do. So this is the reason why we've decided to go with a fixed point, integer-based format for audio.
Formats are generally described using an AudioStreamBasicDescription, which is declared in the CoreAudioTypes header, and this is the header that's used for all of the basic types in Core Audio. So audio units are published in the Audio Unit framework and they're pluggable bits of code. They're basically discrete modules that you can load and use for different activities. So we have audio units available to you to use for doing I/O, for doing mixing, and there's an EQ effect as well.
On the desktop environment, audio units are used to also allow third-party plug-ins to run in host applications like Logic and GarageBand. We're not supporting this at this point in time for the iPhone, so there's no third-party audio units that will be found and discovered. You, of course, could implement code using audio units and just load that code yourself, but we're not providing any discovery process for third-party audio units. Because of that, there's no need for us to also do custom UI support, which is a common feature on the desktop version of audio units. So the audio unit is really there as an API for you to use for discrete activities and tasks. And probably the one you will need to know about most is AURemoteIO. This is how you get audio in and out of the system at this level. Now, if you've used the desktop environment, it's very similar to AUHAL, and it works in the very same way. It has an input bus and an output bus in terms of the device, and it works in a just-in-time notion. And what I mean by that is that the work that you do with this audio unit is done on a real-time thread, on a time-constrained thread. That thread has a deadline to meet, and that deadline is not half a second away, it's normally some small number of milliseconds away. So this puts very strong constraints on the kinds of things you can do. For instance, you cannot block, you cannot read or write files from the thread that AURemoteIO is going to call you on. And there's a whole host of things you have to be careful of. Allocating memory is another activity you can't do on this thread. Now, AURemoteIO also will do sample rate conversion between the format that you're using as a client of it and the format that the device is currently set to. And then we provide mixer audio units. There's a stereo mixer, and it will take either 16-bit or 8.24 fixed-point inputs. Those inputs can be mono or stereo.
And then the parameters you have for this mixer is a volume control and enable on each input. The enable can be seen as like a muting. It still leaves that input active. It's just not being called for data.
And you can enable it and disable it, and we take care of ramping it up and down so you don't glitch. And then it has a single 8.24 fixed point output in stereo, and then you can connect that up as you can with audio units to the remote I.O., and then that gives you an ability to do mixing of multiple sources to one output.
Now, OpenAL is implemented on top of the 3D embedded mixer, and this is also an audio unit that you can just use on your own if you want to do something different than what you can do with OpenAL. So the 3D embedded mixer gives you inputs of either 8 or 16-bit, mono or stereo. It has a lot more parameters available for each input. It has a volume control. You can enable inputs. You can also pan using the 3D coordinate space. It has distance attenuation. That was the sound getting quieter and louder as it moved further from or closer to the listener. You can also do rate control. Rate control can be used to simulate Doppler, so you have a pitch change as a sound comes to you and goes away. You can do rate conversion on inputs, but really the rate conversion is a side effect of the fact that we have a rate control parameter, and we really don't recommend that you do a lot of rate conversion on the inputs. It's expensive. It's better to get all of your data at the same sample rate. And then it gives you a fixed point output, 8.24 stereo output, just like the other mixer.
There's the iPod EQ. The iPod EQ is going to do 8.24 input and output. It's the same EQ that's used with the iPod settings. So if you go into settings, there's EQ settings there. And the iPod EQ just gives you a collection of presets. And this will match what the user can select for iPod playback.
So there's two ways that you can interact with audio units. There's using the AUGraph API, and the AUGraph API is a little bit more abstract than using the Audio Unit API directly. And the way you do work with a graph is that you create a graph, you create a collection of nodes, and then you start and stop the graph. And the collection of nodes you connect up to each other to form your signal chain, your processing chain, your processing graph. Now, one of the features that the graph does that's very useful is that it provides you the ability to interact with the graph while it's running. It takes care of the threading issues. So when you add and remove nodes, or add and remove callbacks for inputs and things like this, you're able to do that while the graph is running and you're not going to get breakups in your audio or other problems because of the different threads that are involved in this.
If you want to deal with the audio units directly, you can use the audio unit API as it is. And because the component manager is not available, we have a new mechanism for publishing the components that represent the audio units that you're going to use. So this is the audio component. It's a new API. It's available both in Snow Leopard and on the iPhone. And it's how you find the components.
And the components are sort of factories. They're class objects in a way. And then once you've found the component, which you find by describing it. So you say, well I want to find the component that represents the mixer or the iPod EQ or the AU remote IO. Once I've found that component, I then want to create an instance of that component. I want that factory to make me an instance. And this is the audio unit itself.
So the example that I want to show you now is a duplex IO example. And what this does is it takes input and it copies it to the output and it just does a simple IO and it uses AU Remote IO to do this. The way that it works very briefly is that the callback fires to say that there's data there for the input, You call audio unit render on render bus one, which is the input side, and then you pass that data to the callback, which is going to be used for output. And this example is provided as a starting point for you just to get some idea of how do you use audio units. So if we can go to slides and we'll go to the demo.
Okay, so this is aurioTouch. And what I'm displaying here is an oscilloscope that's representing the audio coming in. I can zoom it out so that the time granularity is a lot longer. I can zoom it in so it's covering a very small and live piece of data. Now if we could cut my microphone.
taking audio in and pushing it through this cable into the output. Thank you, I think I'm back online. So I've got a mute button here, and as I hit the mute button, it played a little sound. That's using the audio services play system sound call. We thought we'd have a little bit of fun with this app, so what you see here now is actually a spectrogram of the input that's coming in, and we're doing an FFT of the data. And you can sort of see the yellow lines there. That's my whistle tone. It's a pretty noisy environment up here, so it's picking up a lot of data. Maybe if I sort of move it.
There, that's a little bit quieter. Get a good noise there. So you can see that that's actually, you know, quite cool, what it's doing there. Back to there, and I can do an FFT as well, which just shows me frequency response. And that's that demo. Back to the slides, please. Thank you.
And thank you. And as I said, all of this code is available for you to download, so you can try this app yourself. You'll notice a difference when you plug in the headphones and the headset. You'll get a much cleaner signal than you can from the mic and so forth. So it's really quite a lot of fun to play with. Thank you. Oh, good, there we go. Okay, so thank you very much. I'm basically done.
Sorry we've run a little bit over, but to just summarize what we've been doing: we've looked at the Audio Toolbox. We've looked at Audio Queue as a primary source for input and output. We used AudioFile with this. We had a look at audio services, both the session and system sound APIs, OpenAL for games, and the Audio Unit API if you want to do low-level audio. There's a couple of related sessions. There was the introduction to game development, which puts audio in the broader context of doing your game. Now, I've skimmed over a lot of details about the Core Audio API set in general, and the Understanding Core Audio Architecture session is very useful in giving you much more detail about how Core Audio works, how you can use the APIs, and a sense of how it's all put together. We have a lab tomorrow afternoon. We'll all be there, bring your code, bring your questions. And I'm not going to read the resources slide. I'll just leave it up there while we do Q&A. And thank you very much. APPLAUSE