Graphics and Imaging • 1:05:00
Mac OS X features a state-of-the-art audio engine in Core Audio, enabling the most powerful music and audio applications available today. Walk through the high-level queueing services for playing and recording audio, see how to stream audio files over a network, and get the details of extending Core Audio to load and store audio files. Bring your headphones and laptop.
Speakers: Doug Wyatt, James McCartney, Bob Aron
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Okay, so thank you. Welcome to session 404, and we're going to be talking about three primary APIs in Core Audio. Oh, look, it even works. Audio queues, audio streams, and extended audio files. And it's sort of like a high-level overview, and I think you'll find the information very useful. And without any more ado, I'm going to bring up Doug Wyatt, and he's going to get us started with audio queues.
Thank you, Doug. Doug? So this is actually a hands-on session, meaning that if you've got the code off of the developer attendee site, the code for the program I'm going to show you in this session is on that site. You can look at it now. You can look at it later. But in either case, I'm going to walk through a little program there called AQ Record. So in the session, what I'm going to cover pertaining to the audio queue is what it is, how it's constructed, how you set one up, how you manage buffers of audio.
And in this example program, we'll record some audio buffers from the hardware to a file. And then there's also an example program called AQ Play, which we won't actually step through, but I'll speak briefly about its similarities and differences compared to AQ Record. So I'd like to go to the demo machine. We'll start from the outside, and I'll just show you the simple little command line program here. And what screen are we on? Do we have any windows? Is that a separate -- ah, okay.
Now I can't see it. But fortunately, I typed my command line a little earlier. So here-- OK, let me actually start the iPod before I record. Okay, so this is some music I recorded a couple years ago. And I'm playing it from my iPod into the Mac. And now we should have an audio file.
And if I just say AQ Play, hopefully we will hear what I just recorded, but maybe not. Do you know if the audio is up on this machine? Okay. Well, in any case, you see the basic usage here. It's a very simple command line app. We just specify a file name.
If I run AQ Record without options, just in the spirit of format agnosticism, as Jeff was talking about in the last session, you can see that I can do things like aqrecord -d aac, and it'll make an AAC file. OK, back to slides, please.
Okay, so as we walk through the source code to this little program, we'll see that here are the steps that it goes through. First, we'll figure out what audio format we want to record into. We'll create the audio file to actually hold those samples. We'll create an audio queue to get audio asynchronously from the hardware. And as we receive buffers from the queue, then we'll write them out to the audio file.
So in terms of what pieces of API this program is built on top of, there's not only the audio queue. There's also the audio file and audio format APIs, which Jeff Moore described in the last session. Those in green are the audio toolbox APIs, which this program uses. Underneath the audio queue, we're using the Core Audio HAL or the audio hardware APIs, but only indirectly through the queue. And actually, this example program does have one little place where it talks to the HAL directly, but I'll show you that in a sec.
So what the queue does, as I said, it receives buffers from your audio hardware. And as it receives those buffers from the hardware in the hardware format, then it runs them through an internal audio converter to convert those buffers of audio to whatever your desired recording format is, whether that be 16-bit integers or AAC. Any encoder that is on the system can be used to encode your audio to whatever format. So your audio will receive buffers in that encoded format from the queue.
Okay, so stepping into the code for AQ record, here, this is about halfway down through main, after it's parsed all the command line arguments, so it's getting ready to actually do some work. It's going to default to the CAF audio file type. That's that constant there. Then we're going to try to figure out if the user specified some other extension on his file name on the command line. For instance, if he entered a file name that ended in .aif, then we'd like to create an AIFF audio file instead.
So that's going to be done by this function, infer audio file format from file name. To get there, we just need to convert the command line argument record file name to a CFString, CF record file name. Then we can pass it to our function and get back the audio file type ID for whatever might have been specified.
So the guts of that function, I won't go through all the mechanics there, but the essential point here is that it calls an audio file function called AudioFileGetGlobalInfo. AudioFileGetGlobalInfo, when given an extension in the form of a CFStringRef, can tell us the file type for that extension. So, for example, if that were .aif, then this function will know, oh, that's an extension for AIFF, and give us back that constant. So that's an example of one of the nice little bits of information the AudioFile API can give us.
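As a rough sketch of the kind of call being described (the global-info property name and the way the extension is passed as the specifier are from memory and should be checked against AudioFile.h; this is an illustration, not the sample's exact code):

```c
#include <AudioToolbox/AudioFile.h>

// Hedged sketch: ask the AudioFile API which file type corresponds to a
// file-name extension such as "aif". Assumes the global-info property
// kAudioFileGlobalInfo_TypesForExtension, which takes the extension as a
// CFStringRef specifier and returns AudioFileTypeIDs.
static AudioFileTypeID MyInferAudioFileTypeFromExtension(CFStringRef extension)
{
    AudioFileTypeID typeID = 0;
    UInt32 size = sizeof(typeID);
    OSStatus err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_TypesForExtension,
                                          sizeof(extension), &extension,
                                          &size, &typeID);   // first matching type
    return (err == noErr) ? typeID : 0;
}
```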
Okay, getting into a bit more code. So this is setting up the audio stream basic description for the format which we want to receive from the queue and write to the file. So we're going to call that record format in this program. Now, the way the program is set up, it's trying to just have some reasonable defaults.
If you look in the code, record format is just filled out completely with zeros. So what we'll do first, we'll say, well, if the sample rate is still zero-- in other words, it wasn't specified on the command line-- then we'll get the sample rate from the hardware device.
Then the next if statement will default to stereo, two channels. Then there's that last large if block: if the record format is zero or linear PCM, then we've got to elaborate and say, what kind of linear PCM do we want? So we're going to default to 16-bit integers.
We'll decide whether to use big- or little-endian 16-bit integers, depending on the file format. We'll set up bytes per channel and bytes per frame as functions of bits per channel and channels per frame, and so on. So by the end of this, we've got a fully specified audio stream basic description in record format.
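For the default 16-bit PCM case, the fill-in being described looks roughly like this. This is a sketch, assuming a CAF file (so native endianness is fine) and the variable name recordFormat from the walkthrough:

```c
#include <CoreAudio/CoreAudioTypes.h>

AudioStreamBasicDescription recordFormat = {0};

recordFormat.mSampleRate       = 44100.0;          // or the hardware rate queried below
recordFormat.mChannelsPerFrame = 2;                // default to stereo
recordFormat.mFormatID         = kAudioFormatLinearPCM;
recordFormat.mFormatFlags      = kLinearPCMFormatFlagIsSignedInteger
                               | kLinearPCMFormatFlagIsPacked;   // add ...IsBigEndian for AIFF
recordFormat.mBitsPerChannel   = 16;
recordFormat.mBytesPerFrame    = (recordFormat.mBitsPerChannel / 8)
                               * recordFormat.mChannelsPerFrame; // 4 bytes per stereo frame
recordFormat.mBytesPerPacket   = recordFormat.mBytesPerFrame;    // PCM: one frame per packet
recordFormat.mFramesPerPacket  = 1;
```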
So going back, here's the function that we called to figure out what the hardware sample rate is. This is the one place we have to sort of duck underneath the queue to ask. In this example, I'm using Audio Hardware Services rather than talking directly to the HAL. If I were on a Tiger system, where Audio Hardware Services didn't exist, I could call the HAL directly here to make these calls; but on Leopard, Audio Hardware Services is a way of avoiding talking directly to the HAL.
But in any case, those APIs are very similar. I can ask, what is the default input device? I get back an audio device ID for it, in the local variable device ID. Given that, I can ask the device for its nominal sample rate, be it 44,100, 48,000, or whatever, and that comes back to us. And that's the function we use here, my get default input sample rate.
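A minimal sketch of that query using the Leopard Audio Hardware Services calls (property selectors are from AudioHardware.h; on Tiger you would make the equivalent calls directly against the HAL):

```c
#include <AudioToolbox/AudioServices.h>   // AudioHardwareService* (Leopard)
#include <CoreAudio/AudioHardware.h>

// Hedged sketch of MyGetDefaultInputSampleRate: ask for the default input
// device, then ask that device for its nominal sample rate.
static OSStatus MyGetDefaultInputSampleRate(Float64 *outSampleRate)
{
    AudioObjectPropertyAddress addr;
    AudioDeviceID deviceID = 0;
    UInt32 size = sizeof(deviceID);

    addr.mSelector = kAudioHardwarePropertyDefaultInputDevice;
    addr.mScope    = kAudioObjectPropertyScopeGlobal;
    addr.mElement  = kAudioObjectPropertyElementMaster;
    OSStatus err = AudioHardwareServiceGetPropertyData(kAudioObjectSystemObject,
                                                       &addr, 0, NULL, &size, &deviceID);
    if (err) return err;

    addr.mSelector = kAudioDevicePropertyNominalSampleRate;
    size = sizeof(Float64);
    return AudioHardwareServiceGetPropertyData(deviceID, &addr, 0, NULL,
                                               &size, outSampleRate);
}
```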
Okay, moving on. Now we know the record format. We've got it fully specified. So we can ask the queue to -- well, we can create a new queue, rather, and ask it to provide us that format of audio. So the record format's the first argument to audio queue new input.
The queue will use the default input device. We're passing Audio Queue new input the address of a function, my input buffer handler. It's going to call this function every time new data arrives. And it's going to pass the address of the local AQR, which is my user data for the callback function, just so that in that callback, I've got access to data that was declared in main.
I've got two null arguments to Audio Queue New Input, specifying the run loop and mode on which I want my callback function to be called. This is a slightly more advanced feature. There are some situations where you might want to say, OK, here's my disk I/O thread. I want to receive the incoming buffers of audio on that thread. But in this case, this is a simple program. I can just say null, and the queue will construct its own thread internally, and it will deliver me buffers on that thread. And so, the last argument, the address of AQR.queue is my newly created Audio Queue object.
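Put together, the call being described looks roughly like this (a sketch; aqr is the example's state struct and MyInputBufferHandler is the callback discussed below, with error handling omitted):

```c
#include <AudioToolbox/AudioQueue.h>

// Create a recording queue that delivers buffers in recordFormat.
// NULL run loop and mode: the queue uses its own internal thread for callbacks.
OSStatus err = AudioQueueNewInput(&recordFormat,          // desired recording format
                                  MyInputBufferHandler,   // called as buffers fill
                                  &aqr,                   // user data passed to the callback
                                  NULL,                   // callback run loop (NULL = queue's own thread)
                                  NULL,                   // run loop mode
                                  0,                      // flags (reserved, pass 0)
                                  &aqr.queue);            // the new queue
```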
Okay, there's one more little detail here. So this variable record format that I passed, that's my audio stream basic description, and I've passed it to Audio Queue New Input. In the case of a compressed format like AAC, all I had to specify was AAC, two channels, 44,100 hertz, and that was enough for the queue to be able to create an audio converter.
But that's not necessarily a sufficiently accurate audio stream basic description for the audio file API. So what I'm doing now is I'm turning around and asking the queue, so, okay, I gave you this sort of half-baked, maybe, record format. Can you really fill it out completely for me? And that's what this call here does. I'm asking the queue, please give me your underlying audio converter's output stream description.
And if it were PCM, then I'm just going to get back exactly what I gave it, but if I gave it sort of that half-baked AAC format, then I'll get a full description of the AAC format, for example, with 1024 frames per packet. And now my local variable record format is fully specified.
So given that full specification of the record format, now I'm ready to turn around and create my audio file using that audio data format specification. I've got my audio file type, which I determined earlier: it defaults to CAF, or a file name ending in .aif would have specified an AIFF audio file. So at this point, I've created an empty audio file containing exactly the audio data format that I want.
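Put together, those two steps (asking the queue for the fully specified format, then creating the file) might look roughly like this. A sketch: the converter-format property name is the one the sample of that era used and should be checked against AudioQueue.h, and recordFileName, audioFileType, and aqr come from earlier steps:

```c
#include <AudioToolbox/AudioToolbox.h>

// Ask the queue for the fully specified format its converter will produce,
// then create an empty audio file with exactly that data format.
UInt32 size = sizeof(recordFormat);
AudioQueueGetProperty(aqr.queue, kAudioConverterCurrentOutputStreamDescription,
                      &recordFormat, &size);

CFURLRef url = CFURLCreateFromFileSystemRepresentation(NULL,
                   (const UInt8 *)recordFileName, strlen(recordFileName), false);
AudioFileCreateWithURL(url,
                       audioFileType,              // e.g. kAudioFileCAFType or kAudioFileAIFFType
                       &recordFormat,              // the now fully specified ASBD
                       kAudioFileFlags_EraseFile,
                       &aqr.recordFile);
CFRelease(url);
```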
Now, in the case of a compressed format, Jeff Moore spoke a bit about magic cookies in the previous session. A magic cookie, again, for those of you who weren't there, is just a little bit of private data from the encoder that the decoder needs to decode the stream. And many audio file formats will hold magic cookies. With all the Core Audio APIs that deal with magic cookies, all you really need to know is that you get the cookie from one place and you deliver it to another.
So in this case, we just need to get the cookie from the audio queue, say, so what's your converter's compression Magic Cookie? And then we turn around, assuming there is one. That's what that first if statement is checking. If it were PCM, there wouldn't be a magic cookie. But if there is a magic cookie, then we fetch it from the queue, and we set the cookie on the audio file.
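A sketch of that cookie copy as a helper, using the calls named in the walkthrough (error handling trimmed; the helper name mirrors the example's conventions):

```c
#include <AudioToolbox/AudioToolbox.h>
#include <stdlib.h>

// If the queue's converter has a magic cookie, copy it into the audio file.
// This is a no-op for plain PCM, which has no cookie.
static void MyCopyEncoderCookieToFile(AudioQueueRef queue, AudioFileID file)
{
    UInt32 cookieSize = 0;
    if (AudioQueueGetPropertySize(queue, kAudioQueueProperty_MagicCookie,
                                  &cookieSize) == noErr && cookieSize > 0) {
        void *cookie = malloc(cookieSize);
        if (AudioQueueGetProperty(queue, kAudioQueueProperty_MagicCookie,
                                  cookie, &cookieSize) == noErr)
            AudioFileSetProperty(file, kAudioFilePropertyMagicCookieData,
                                 cookieSize, cookie);
        free(cookie);
    }
}
```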
Okay, before we go further in the code, I thought a little picture might be worth a bunch of words. This is what's going on inside the queue. We've got the audio hardware, symbolized by a microphone here on the left, providing input data. That's going through an audio converter, converting from whatever format the HAL is delivering the samples in to the format I want to record in.
Now, the interesting concept here is the idea of the buffer queue, and that's the object from which the queue derives its name. In the case of an input queue like this, the buffer queue is simply a sequence of buffers that the queue will fill in order. So from the application's point of view, we're going to enqueue a series of buffers, and then as audio comes in, the queue will fill those buffers and return them to us.
So in the diagram, you'll see that the converter's filling buffer number one in the buffer queue. When that buffer gets full or nearly full, that buffer gets delivered to the queue's callback function, and we'll see what this example program's callback looks like in a moment. From the callback function in our example program, we'll write that audio data to the file, and we'll turn around and call AudioQueueEnqueueBuffer again to effectively recycle the buffer. We're done writing data from it, and we're ready to have it refilled.
So that was a good introduction to the process of dealing with buffers and the queue. So the buffers are actually owned by the queue, which saves you some memory management problems and addresses some shortcomings and typical usage errors we had with the sound manager all those years ago.
So what we'll do is we'll ask the queue to allocate our buffers for us. But before we do that, first we're going to have to figure out how many bytes in size we wish these buffers to be, just to try to strike a right balance between various performance characteristics.
We want the buffers to be long enough in time that we're not spending all of our time just going back and forth to the file system.
We want the buffers not to be so enormously large that we're chewing up lots of memory. And to be format agnostic, a half second of audio, which is a reasonable compromise in terms of not hitting the disk too much, a half second of audio could range from being rather small to fairly substantial.
So we'll look at that function in a moment, my compute record buffer size. Once we know the buffer size, we can allocate some number of buffers of that size. Three is a good typical number. And we'll enqueue those buffers, meaning that once the queue starts and audio starts coming in from the hardware, those buffers will get filled, as we saw in the last diagram.
( Transcript missing )
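As a rough illustration of the sizing and setup just described, a sketch might look like the following. This is not the sample's exact code: the helper name follows the walkthrough's naming, maxPacketSize stands in for the converter's maximum output packet size (which a VBR format like AAC needs and which the real sample queries from the queue), and error handling is omitted.

```c
#include <AudioToolbox/AudioQueue.h>
#include <math.h>

// For PCM, size = frames * bytes-per-frame; for a packetized format we size
// by packets using a maximum packet size.
static UInt32 MyComputeRecordBufferSize(const AudioStreamBasicDescription *fmt,
                                        UInt32 maxPacketSize, Float64 seconds)
{
    UInt32 frames = (UInt32)ceil(seconds * fmt->mSampleRate);
    if (fmt->mBytesPerFrame > 0)
        return frames * fmt->mBytesPerFrame;                    // constant-rate PCM
    UInt32 packets = fmt->mFramesPerPacket > 0 ? frames / fmt->mFramesPerPacket
                                               : frames;        // worst case: 1 frame per packet
    if (packets == 0) packets = 1;                              // sanity check
    return packets * maxPacketSize;
}

// Allocate and enqueue three buffers of roughly half a second each.
enum { kNumberRecordBuffers = 3 };
UInt32 bufferByteSize = MyComputeRecordBufferSize(&recordFormat, maxPacketSize, 0.5);
for (int i = 0; i < kNumberRecordBuffers; ++i) {
    AudioQueueBufferRef buffer;
    AudioQueueAllocateBuffer(aqr.queue, bufferByteSize, &buffer);
    AudioQueueEnqueueBuffer(aqr.queue, buffer, 0, NULL);
}
```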
Okay, so we've enqueued all these buffers. We've got a file all set up and ready to record into. I don't think there's anything left to do now but set this member variable AQR.running on my structure. We'll see how that's used in a moment. Call AudioQueueStart, which starts the underlying hardware, and the audio converter starts running as the queue receives buffers from the hardware. And then we can basically sleep for five seconds. I'm using CFRunLoopRunInMode, but since I have no run loop sources on this thread, it'll effectively just sleep for five seconds.
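In code, that start-and-wait step is roughly (a sketch, continuing with the aqr struct from above):

```c
#include <CoreFoundation/CoreFoundation.h>

// Mark ourselves as running, start the queue, then let it record for ~5 seconds.
aqr.running = true;
AudioQueueStart(aqr.queue, NULL);        // NULL start time = start as soon as possible
CFRunLoopRunInMode(kCFRunLoopDefaultMode, 5.0, false);   // effectively a 5-second sleep here
```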
Okay, so as the buffers of encoded audio are filled by the queue, then they're delivered to this function. And you'll remember this is the one, this is the function I passed to audio queue new input. So this is my input buffer handler. The first thing it does is just make sure, kind of a sanity check, that we received at least one packet of audio.
And assuming we did, then we can just turn around and call AudioFileWritePackets, given the record file that we set up early in main, and using the length of the buffer, which is in buffer->mAudioDataByteSize. We've also been given possibly some packet descriptions, depending on the format; if it's a compressed format that requires packet descriptions, then that pointer, inPacketDesc, will be non-null.
We also have the argument, or I'm sorry, the variable AQR.recordPacket. If we go look at the arguments to AudioFileWritePackets, that's just the packet number in the file that we want to start writing at. Earlier in the program, I didn't show you this detail, but we started at packet zero. So the first buffer that comes in will get written as packet zero to the file. And then after we call AudioFileWritePackets, we advance AQR.recordPacket by the number of packets that we just wrote.
Okay, so we received a buffer of audio from the queue. We wrote it to the file. And now, the only thing left to do is, unless we're in the middle of stopping, in other words, if we're still running, then we're going to turn around and re-enqueue the buffer, telling the queue, okay, put this at the end of the queue of buffers and fill it when it comes around again. So that's the input buffer handler.
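Assembled from the description above, the handler might look roughly like this. A sketch: MyRecorder stands in for the example's state struct (queue, recordFile, recordPacket, running), and error handling is omitted.

```c
#include <AudioToolbox/AudioToolbox.h>

static void MyInputBufferHandler(void *inUserData, AudioQueueRef inAQ,
                                 AudioQueueBufferRef inBuffer,
                                 const AudioTimeStamp *inStartTime,
                                 UInt32 inNumPackets,
                                 const AudioStreamPacketDescription *inPacketDesc)
{
    MyRecorder *aqr = (MyRecorder *)inUserData;       // the struct declared in main

    if (inNumPackets > 0) {
        AudioFileWritePackets(aqr->recordFile, false,
                              inBuffer->mAudioDataByteSize,
                              inPacketDesc,            // non-NULL only for packetized formats
                              aqr->recordPacket,       // starting packet number in the file
                              &inNumPackets,
                              inBuffer->mAudioData);
        aqr->recordPacket += inNumPackets;             // advance by what we just wrote
    }
    if (aqr->running)                                  // unless we're stopping,
        AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);  // recycle the buffer
}
```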
So having done all that, when that five seconds expires in main, we simply have to stop the queue after setting our member variable running to false so we don't keep trying to recycle the buffers. So we stop the queue. The true argument means yes immediately. We call this function to copy the magic cookie from the queue to the record file.
Again, sometimes a cookie can change as the codec is running, so it's best to set the cookie both before and after recording. Okay, so we've written the cookie to the file. The file's just about done. We're disposing the queue. We're closing the file. And that's pretty much all there is to recording.
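In code form, that tear-down sequence is roughly (a sketch using the calls named above; MyCopyEncoderCookieToFile is the helper sketched earlier):

```c
// Stop recycling buffers, stop the queue immediately, write the (possibly
// updated) cookie one more time, then dispose of the queue and close the file.
aqr.running = false;
AudioQueueStop(aqr.queue, true);                       // true = stop immediately
MyCopyEncoderCookieToFile(aqr.queue, aqr.recordFile);  // cookie may have changed while encoding
AudioQueueDispose(aqr.queue, true);
AudioFileClose(aqr.recordFile);
```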
So having walked through all that about recording, that's the AQ Record program. The queue's anatomy for audio playback is very similar, just with a few little wrinkles. So instead of enqueuing buffers to be filled from incoming audio, your program's job is to supply buffers to be played. And just like with a recording queue, there's an audio converter, so you can supply your buffers in a compressed format. As a buffer is completely played, it's passed to a callback function whose job is to refill it before re-enqueuing it.
So this is pretty much what I just told you, the similarities for playback. You can create and configure the queue. You enqueue your buffers the same way. You start the queue the same way. The completed buffers arrive at your callback function. And in many cases, your callback function will just re-enqueue the completed buffer.
And the main difference really is what happens in the callback function. In the recording case, you're just going to be reading the audio out of the buffer and writing it to the file, whereas in the playback case, you'll be reading it from the file and writing it into the queue's buffer.
Where this actually shows up most distinctly as a difference in your programs is if you look in the AQ play example, when we first start playing, we have to call our own callback function manually to fill the buffers from disk and enqueue them for the first time. That's a minor wrinkle, and it makes sense when you walk through it.
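A hedged sketch of what a playback-side callback in the spirit of AQ Play might look like (MyPlayer and its fields are illustrative names, not the sample's exact ones, and error handling is omitted):

```c
#include <AudioToolbox/AudioToolbox.h>

// Refill the finished buffer from the file and re-enqueue it; stop at end of file.
static void MyOutputBufferHandler(void *inUserData, AudioQueueRef inAQ,
                                  AudioQueueBufferRef inBuffer)
{
    MyPlayer *p = (MyPlayer *)inUserData;
    UInt32 numBytes = inBuffer->mAudioDataBytesCapacity;
    UInt32 numPackets = p->numPacketsToRead;

    AudioFileReadPackets(p->playbackFile, false, &numBytes,
                         p->packetDescs,            // NULL for PCM
                         p->currentPacket, &numPackets,
                         inBuffer->mAudioData);
    if (numPackets > 0) {
        inBuffer->mAudioDataByteSize = numBytes;
        AudioQueueEnqueueBuffer(inAQ, inBuffer,
                                p->packetDescs ? numPackets : 0, p->packetDescs);
        p->currentPacket += numPackets;
    } else {
        AudioQueueStop(inAQ, false);                // false = let queued audio finish playing
    }
}
```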
So I just wanted to give you that brief look at how similar playing is to recording. What I've shown you here is that using queues for recording and playing back audio is pretty simple and efficient.
The queue takes care of your memory management and of converting between encoded audio formats and linear PCM. There's a volume control on the queue; I didn't show you that, but it's there. It manages all of those lower-level interactions with the audio hardware, including, of course, doing the I/O to the hardware.
And because it does that for you, it leaves your application free to do its processing of incoming audio, whether it's recording it to a file or doing signal processing. You can do all of your work in less time-critical contexts, and that's the benefit of the queue. And with that, I'll bring up James McCartney. Thank you.
JAMES MCCARTNEY: OK. I'm James McCartney, and I'm going to talk about the Audio File Stream API. It's a new API in Leopard, and it's for dealing with non-random-access audio streams. The basic mode of operation is a three-step process. You open an audio file stream, passing it callback functions through which it will call you back with properties and audio data packets that it finds in the stream.
The second step is to parse the data. You enter a loop, and you pass buffers to the Audio File Stream parser, and it will call your callbacks. And then when you're finished, you close the stream. The Audio File Stream API doesn't handle networking or disk streaming; it's a file parsing API, and it's up to you to provide the audio data. And it currently supports all these formats.
Okay, now the difference between a stream and a file is that a stream is not random access. It has a past that is no longer accessible; we're assuming you're not writing the entire past to a file, and it could be an indefinitely long stream. And the future is not yet accessible, so you can't look ahead into the stream to find data that hasn't arrived yet.
So the basic problem in dealing with streams that you don't have with files is that you're getting your data from some source, like from a socket or some other source, and it's chunked up into buffers. And those buffer boundaries may bisect chunks in the file so that you may have read part of a header, but you don't have even the rest of the header yet to find out how long this next chunk is. So your parser kind of has to suspend in mid-operation. And that's what the Audio File Stream parser handles for you. It remembers the necessary state from previous buffers, and it doesn't ask for a state that's not yet accessible.
And as data arrives in the buffers, it will call your callbacks telling you that I found the data format or I found the Magic Cookie. And then here's audio data and you can begin decoding and playing the audio data. So as the buffers arrive, it's calling your callbacks. So I'm going to show a small client server audio stream example.
The server is a very simple program that just fopens a file, treats it as a file of bytes without even knowing it's an audio file, and writes its contents into a TCP stream. It's really simple, and I'm not going to go over it. So the interesting stuff happens in the client, where I connect to that socket, I allocate data that I need to manage the stream, and I create an audio file stream parser; then I enter a main loop.
And that will read data from the socket and then call AudioFileStreamParseBytes. And AudioFileStreamParseBytes in turn will call the callbacks, saying we've gotten these properties or this audio data. Inside one of those callbacks, we're going to create an audio queue and then start enqueuing buffers on the audio queue to have them played back. And then, once the data is exhausted, we'll exit the loop and clean up.
Some other utility functions that are necessary for this are the property callback, which tells you it was called whenever properties are discovered, and the audio packet callback, and then there's the Audio Queue Output Buffer Callback, which is called when the audio queue is finished decoding an output buffer. And then there's a My Enqueue Buffer, which is a helper function that just enqueues buffers onto the audio queue. So now I'm going to go to the demo machine and run the demo. We have two terminal windows, one for the server and one for the client.
Okay, so on the left terminal window, I'm going to run the server. I'm going to play back an AAC file. So now it's waiting for a connection. So then on this terminal, I'm going to run the client. So now we're streaming data from the server to the client.
You can see these printouts are happening in the client from the callback, so I'll show that in a bit. All right, so I'll just kill that. You can see here's the property callbacks being called. Okay, so back to slides. Okay, so this is the main program in the client program, the main function. The first thing I do is I allocate a structure called MyData, and it's just a structure I'm going to use to hold all of the bits and pieces that I need to manage the stream. You can see a declaration of it in the sample code.
Then I will initialize a pthread mutex and condition variable, because the audio queue is running on a separate thread from my client that's reading data from the socket, so I need to synchronize on signaling when buffers are no longer in use. And then I connect to the socket; that's just standard Unix socket connecting. And then I'm going to allocate a buffer for reading. So I'm going to use that buffer to read the stream from the socket.
Okay, and then the next thing I'm going to do is create an audio file stream parser with audio file stream open. I pass it MyData, which is just a pointer to my local data, and I pass it my property listener and audio packets proc callbacks. And there's a file type hint, with which I'm going to tell it I'm going to be streaming AAC ADTS data. And then I pass it the address of a field in my MyData struct, and it's going to put the new audio file stream parser there so I'll have it.
Okay, so this is the declaration of audio file stream open. It's got the void star for the client data, which can be any of your data that you want kept around by the audio file stream, and then the two callbacks and a file type hint, which tells the Audio File Stream parser what format it can expect, because there are some binary formats that aren't self-identifying, so it may not have an easy way to figure out from the bits that you're giving it what it's supposed to be parsing. And then it will pass you back the audio file stream.
Okay, so then the main loop, this is where everything happens, basically. I just receive data from the socket into a buffer, and then I call Audio File Stream Parse Bytes, giving it my audio file stream, how many bytes are in the buffer, and the pointer to the buffer, and then a flags field.
This is the declaration for AudioFileStreamParseBytes. With the flags field, in the case that you had a discontinuity in your stream, you could signal that here, and it would ignore anything it happened to be in the middle of parsing at the buffer boundary. So it would just say, okay, I don't know where I am anymore, so I'll start parsing from scratch.
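A minimal sketch of the setup and main loop as described: open a parser with the two callbacks, then feed it whatever arrives on the socket. The names myData, sockfd, kAQBufSize, and the callback names mirror the walkthrough and are assumptions, not the sample's exact identifiers.

```c
#include <AudioToolbox/AudioFileStream.h>
#include <sys/socket.h>

AudioFileStreamID audioFileStream;
AudioFileStreamOpen(myData,                      // client data passed to our callbacks
                    MyPropertyListenerProc,      // called when properties are found
                    MyPacketsProc,               // called with parsed audio packets
                    kAudioFileAAC_ADTSType,      // file type hint for this stream
                    &audioFileStream);

for (;;) {
    char buf[kAQBufSize];
    ssize_t bytesRecvd = recv(sockfd, buf, kAQBufSize, 0);
    if (bytesRecvd <= 0) break;                  // connection closed or error
    AudioFileStreamParseBytes(audioFileStream, (UInt32)bytesRecvd, buf, 0);
}
```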
Okay, and then the property listener callback is the first one I'm going to show the implementation of. In the declaration, this is the audio file stream parser calling you, so it passes you a client data pointer, the audio file stream instance, and the property ID that it's found.
So it just tells you, I found this property in your stream. And then there's a flags field. The flag tells you whether this property is cached or not. If it's not cached, you need to ask for it now, or signal that you want it to start caching the property now, because otherwise you won't have access to it; it will have fallen into the past in a previous buffer, and the audio file stream parser doesn't buffer indefinitely into the past.
So, in this property listener callback, I'm only going to be interested actually in one property, which is the ready-to-produce packets property. And in that property, I'm going to create an audio queue, allocate the audio queue output buffers, and get a magic cookie from the stream and set it on the audio queue.
So this is my property listener proc. When I'm called, the first thing I'm going to do is cast the client data to a pointer to my data struct, so I have access to all my data. And then I'm just going to print out what property I'm finding.
At the bottom on the left, you can see the properties it finds, and they're all four-character codes, which indicate that it's found the file format, the magic cookie, the data format, the data offset. And then the last one is ready-to-produce output packets. So once we have gotten ready-to-produce output packets property called on us, that means we're at the point where we've received all the necessary file and data formats. So we're going to get the data format information that is required to begin decoding the data. So at that point, we can start reading audio packets or getting them passed to us and being able to decode them.
( Transcript missing )
Okay, so now this is the declaration for audio file stream get property, which I use to get the ASBD. It's just I pass the audio file stream the property ID I want, and I pass it the byte size for the buffer that I'm passing it to fill the property, and then it will pass me back the property data in that buffer.
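A minimal sketch of the step being described: inside the property listener, once the ready-to-produce-packets property arrives, fetch the stream's data format and create an output queue with it. The callback name and the audioQueue field mirror the walkthrough and are assumptions; error handling is omitted.

```c
#include <AudioToolbox/AudioToolbox.h>

// inAudioFileStream and myData are the property listener's arguments / client data.
AudioStreamBasicDescription asbd;
UInt32 asbdSize = sizeof(asbd);
AudioFileStreamGetProperty(inAudioFileStream,
                           kAudioFileStreamProperty_DataFormat,
                           &asbdSize, &asbd);

AudioQueueNewOutput(&asbd,                        // play in the stream's format
                    MyAudioQueueOutputCallback,   // called when a buffer finishes playing
                    myData,                       // client data
                    NULL, NULL,                   // run loop / mode: queue's own thread
                    0,                            // flags
                    &myData->audioQueue);
```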
So then I allocate the audio queue buffers. I just have a loop where I allocate a few audio queue buffers. I pass it the audio queue, the buffer size I'm going to use, and then it will pass me back a reference to the audio queue buffer, and I'm going to keep that in an array in my data struct.
Okay, so then I need to get the magic cookie from the audio file stream. So what I do is I call audio file stream get property info to find out how big the cookie is. Then I'm going to calloc a buffer to hold the cookie, and then I'll get the cookie from the audio file stream using audio file stream get property. And then I'm going to call audio queue set property to set the cookie on the audio queue. So this way the audio queue now has a cookie so that its audio converter can decode the stream.
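The cookie hand-off just described, sketched in code (inAudioFileStream and myData are the callback's arguments and client data; error handling is trimmed):

```c
#include <AudioToolbox/AudioToolbox.h>
#include <stdlib.h>

// Size the cookie, fetch it from the parser, hand it to the queue so its
// decoder can be configured, then free the temporary buffer.
UInt32 cookieSize = 0;
Boolean writable = false;
if (AudioFileStreamGetPropertyInfo(inAudioFileStream,
        kAudioFileStreamProperty_MagicCookieData,
        &cookieSize, &writable) == noErr && cookieSize > 0) {
    void *cookie = calloc(1, cookieSize);
    AudioFileStreamGetProperty(inAudioFileStream,
        kAudioFileStreamProperty_MagicCookieData, &cookieSize, cookie);
    AudioQueueSetProperty(myData->audioQueue, kAudioQueueProperty_MagicCookie,
                          cookie, cookieSize);
    free(cookie);
}
```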
Okay, so audio file stream get property info, which I just used to get the size of the cookie, pass it the file stream instance, the property ID, and it will pass you back a size for that property and a writable flag. And currently there are no writable properties for audio file stream.
So then the next thing I'll show is the audio data callback. So once we're ready to produce output packets, we've created a queue, and we know the format and the magic cookie, then we're going to start getting packet callbacks from AudioFileStreamParseBytes. So it's going to give me the pointer to my client data, which is MyData, and it's going to tell me how many bytes are in this bunch of packets, how many packets there are, a void star pointer to the actual input data, which is my audio data, and then an array of AudioStreamPacketDescriptions, which tell me where the packet boundaries are in the input data.
So this is the actual implementation of myPackets proc in this demo. So the first thing I'm going to do is, well, first I get my pointer to my data, and then I just print out how much data I got, how many bytes, and how many packets. And then I go into a loop for each packet that I got.
I'm going to get the packet offset and the size from the packet descriptions. Now, I'm currently filling an audio queue buffer; I may have been filling it from a previous callback, so I have to find out whether there's enough buffer space remaining in this audio queue buffer to completely put this packet in there.
So if there's not enough buffer space remaining, I'm going to go ahead and enqueue the buffer on the audio queue so that it'll go ahead and play it back, and then it'll step to the next buffer, and I can begin -- continue copying data. So then the next thing I do is just mem copy that packet of data into the audio queue buffer.
Okay, then I need to fill out a packet description for the audio queue to tell it where this packet boundary is. So first I just copy the packet description that the packets callback has given me into the audio queue output buffer's array of packet descriptions. And I need to change the offset, because it had one offset in the input stream, and I'm going to change it to the offset where I am currently in the audio queue's output buffer.
And then I just keep track of the bytes filled and the packets filled on the current audio queue buffer. And if that was the last free packet description, since I have a limited-size array of packet descriptions allocated in my data and I've reached the end of it, then I need to enqueue the buffer, because I don't have another packet description left in that array. That will enqueue the buffer, and I can begin filling a new buffer.
Okay, so the helper function that does the enqueuing: first, I need to set an in-use flag so that I don't try to use a buffer that's already in use. So I set the in-use flag to true. And then I'm going to call audio queue enqueue buffer to enqueue the buffer on the queue.
And I pass it the audio queue, the audio queue buffer, the number of packets filled, and the array of packet descriptions. And then, if I haven't already started my audio queue, since I've now potentially enqueued the first buffer, I'm going to call audio queue start to go ahead and start the audio queue playing back.
All right, so then I need to go to the next buffer. So I'm going to reset the bytes filled and the packets filled to zero, and then I'm going to take a pthread mutex so I can wait until a buffer has become free. So I enter a loop waiting for the next buffer's in-use flag to be set to false.
Okay, then the other callback I need to implement is the audio queue's output buffer callback. This gets called whenever the audio queue finishes with a buffer of audio. The main thing I need to do here is take the mutex again and signal that the buffer is now no longer in use. And then basically that's everything. Once the data has been exhausted from the socket, we're going to exit the loop and call AudioFileStreamClose, AudioQueueDispose, and free all our data.
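A sketch of that output-buffer callback, under the assumption that MyData holds the buffer array, in-use flags, mutex, and condition variable described earlier (kNumAQBufs and the field names are illustrative):

```c
#include <AudioToolbox/AudioQueue.h>
#include <pthread.h>

// Find the finished buffer in our array, mark it free under the mutex, and
// signal the parsing thread that may be waiting in the enqueue helper.
static void MyAudioQueueOutputCallback(void *inClientData, AudioQueueRef inAQ,
                                       AudioQueueBufferRef inBuffer)
{
    MyData *myData = (MyData *)inClientData;
    for (unsigned i = 0; i < kNumAQBufs; ++i) {
        if (myData->audioQueueBuffer[i] == inBuffer) {
            pthread_mutex_lock(&myData->mutex);
            myData->inuse[i] = false;                 // this buffer may be refilled now
            pthread_cond_signal(&myData->cond);       // wake the waiting parse thread
            pthread_mutex_unlock(&myData->mutex);
            break;
        }
    }
}
```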
This is the declaration for audio file stream close. You just pass it the file stream. There's one other issue: you can actually do some random access. This is to handle the situation where a user has a play bar and they've changed the playback position. You can call audio file stream seek to find out what byte position I need in order to play back the packet where the user moved the play bar to. So audio file stream seek will return that byte position, and then you're actually responsible for seeking in the data. And if you seek to a different place in the data, you need to set the audio file stream discontinuity flag to signal that that's happened.
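A hedged sketch of that seek flow (desiredPacket, sockfd repositioning, and the parse-loop variables are assumptions; the exact type of the seek flags parameter should be checked against AudioFileStream.h):

```c
// Convert a packet position to a byte offset, seek the real data source
// yourself, then tell the parser about the jump on the next parse call.
SInt64 byteOffset = 0;
UInt32 seekFlags = 0;                      // may come back with "offset is estimated"
AudioFileStreamSeek(audioFileStream, desiredPacket, &byteOffset, &seekFlags);

// ...reposition your own read point (socket request, lseek, etc.) to byteOffset...

AudioFileStreamParseBytes(audioFileStream, (UInt32)bytesRecvd, buf,
                          kAudioFileStreamParseFlag_Discontinuity);
```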
This is the declaration for audio file stream seek. You pass the audio file stream the packet offset that you want to go to, and it will pass you back the byte offset in your stream where that packet is located. Now, for some file types, that may be an estimate, and in that case, it will set the I/O flags parameter to offset-is-estimated. Okay, so that's audio file streams, and now I'm going to bring up Bob Aron to talk about Extended Audio File and OpenAL.
Hi, I'm Bob Aron. I'm a member of the Core Audio team here at Apple, and I'm going to talk a little bit, as James mentioned, about OpenAL and using the Extended Audio File in conjunction with OpenAL. So first, just a little bit of review if you're not familiar with OpenAL.
It's an open-source cross-platform API for doing spatialized 3D audio. It has some real OpenGL-like conventions, uses the same coordinate system. It actually complements OpenGL very well if you're doing your graphics in GL already. As Jeff mentioned earlier, it's primarily used for game development, although there are developers using it for other things as well.
And I'm not going to go too much into detail on the overall API set; you can get more info on the website there that's up on the slide. As far as Apple's commitment to OpenAL, we delivered an implementation on the system in the OpenAL framework when we shipped Tiger.
That was based on the 1.0 specification of OpenAL. And then when the 1.1 spec was released, we released a new implementation with our 10.4.7 system software update. At the same time, we also added some new OpenAL extensions for doing reverb, occlusion, and a bunch of other things.
So we're really happy with the overall API set. And now we're going to go into a little bit more detail on OpenAL's construction. It's purely a Core Audio stack; it uses a lot of the same pieces that we've been talking about in this session and the previous session, and it really takes advantage of the 3D mixer audio unit for doing all that hard spatial rendering. I also wanted to touch briefly on ALUT. There's been some confusion on the lists about the OpenAL ALUT library. ALUT is basically a companion library for doing some utility things.
And when the 1.0 spec was done originally with OpenAL 1.0, it was primarily used for getting audio data out of WAV files or from data into a form that you could pass into the OpenAL buffers. And we implemented those APIs when we did the initial release. But when OpenAL 1.1 was finished, actually the 1.0 ALUT spec was deprecated.
So we've removed the ALUT.h header from the framework, but we've still implemented those APIs, so we won't break any of your binaries at runtime. And you actually can still use them in your code; you just have to define the prototypes for them yourself when you're building.
As far as ALUT 1.1, there is a new specification for ALUT. It's an expanded set of APIs. We did not implement that in the current OpenAL framework that's on Tiger. And it won't be in the Leopard version either. But if those are APIs that you need, you can go, again, get more information about those on the OpenAL website.
So what's new? With Leopard, we have a couple of new effect audio units on the system for doing a Roger Beep effect and a distortion effect. And so we decided to add OpenAL extensions to take advantage of these so that you could use them through your normal OpenAL calling mechanisms. So we've got a couple there, Roger Beep and Distortion, and I'll talk about these as we go.
The Roger Beep is basically an effect so you can do walkie-talkie type simulations. What it does is, when you're playing the audio in your OpenAL buffer, whenever the audio signal dips below a certain dB threshold for some amount of time, it replaces the audio with a tone. Now, it could be a static tone like a walkie-talkie sound, or a beep, or whatever; it just depends on which Roger Beep setting you're using.
It'll be available at runtime whenever the Roger Beep audio unit is present on your system. And we've got some predefined settings. Also, you can create your own Roger Beep audio unit preset files and load those at runtime in your application. And there are several properties that go along with it.
And we set the properties with the ALC ASA set source API. That's something we introduced with one of the extensions that shipped in the 10.4.7 release of OpenAL with the 1.1 spec. And you can find all those properties and things in the Mac OS X OAL extensions header that's in the framework.
So just to give you a visual representation of what Roger Beep does is, let's say this is the audio data that's in one of your OpenAL buffers. You'll see that there's a couple of points there in the data where the signal's not as loud, so that you can see those designated in the blue. Well, if that area of the signal was dipping below the dB threshold that your sensitivity is set at for the effect, then what happens is this other waveform will be played, and that gets done by the audio unit itself.
So here's our source properties. Basically, all these properties are set on an OpenAL source basis. And what I mean by that, if you're not familiar with OpenAL at all, is an OpenAL source is your object that you're moving around your spatial environment. So each thing that's making sound, so to speak, is your OpenAL source.
So we have some properties here. The first one is the RogerBeepEnable property. You're going to want to call this first when you're using this effect so that you can tell the library, do some initialization, get the audio unit ready so when I want to use the effect, it'll be ready to go. The next one is the RogerBeepOn. This is basically an on and off toggle. And you'll want to explicitly turn that on because it's off by default. Now, you can do that while the source is rendering or not rendering, but you'll want to explicitly turn it on.
The next property we have is the RogerBeepGain property. This allows you to set how loud the waveform that replaces your original data, how loud you want it to be played. So a setting of zero means you're not doing any attenuation at all, and then you can attenuate it down to a setting of minus 80 dB.
Again, I mentioned there are several different RogerBeep types. We have some preset ones that you can use; those are in the header file. Then there's RogerBeep sensitivity. This is so you can fine-tune how the effect kicks in. You may be capturing data off a microphone with a particular level of audio coming in, or you may have files, so there are some different sensitivity settings, and you can fine-tune it for how you're using it in your application. And then lastly, we have the RogerBeep preset. As I mentioned, you can create your own audio unit presets and load those at runtime.
So if you've had a chance to look at the code that's associated with this session, there's an OpenAL Tools project, and there's a RogerBeep test source file. And so here's some of the source for setting up the RogerBeep. I've kind of trimmed it out a little bit so that we could get it all on the slide without all the error checking and stuff.
So the first thing you'll want to do is call alcIsExtensionPresent. This is the normal mechanism in OpenAL for determining whether an extension that you want to use is present when your application is running. So we're going to pass in the string for our extension.
And then if that's true, then we can go and do some setup on our source object. So we're going to enable it. We'll pass it a value of true. We're going to explicitly turn it on. As I mentioned, you could do this while the source is rendering or not. We're going to set our gain to zero, which means we don't want any attenuation at all on the Roger Beep tone that's going to be played.
We'll set our sensitivity to light in this case. When I demo it, this seems to work well with the program because of the way that I'm capturing my data. And lastly, we're going to use the walkie-talkie setting. This is one of the presets that you can use.
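A very rough sketch of the shape of that setup. The exact extension string, enum names, sensitivity and type values, and the alcASASetSource function pointer type all live in Apple's MacOSX_OALExtensions.h; the specific names below are assumptions meant to show the calling pattern, not verified declarations, and should be checked against the header and the OpenAL Tools sample.

```c
#include <OpenAL/al.h>
#include <OpenAL/alc.h>

// Assumed function-pointer shape for the ASA set-source call.
typedef ALenum (*alcASASetSourcePtr)(ALuint property, ALuint source,
                                     ALvoid *data, ALuint dataSize);

if (alcIsExtensionPresent(NULL, "ALC_EXT_ASA_ROGER_BEEP")) {       // assumed extension string
    alcASASetSourcePtr asaSetSource =
        (alcASASetSourcePtr)alcGetProcAddress(NULL, "alcASASetSource");

    ALboolean enable = AL_TRUE, on = AL_TRUE;
    ALfloat   gain = 0.0f;                    // no attenuation of the beep tone
    ALint     sensitivity = 0;                // "light" sensitivity (assumed constant value)
    ALint     type = 0;                       // "walkie-talkie" preset (assumed constant value)

    asaSetSource(alGetEnumValue("ALC_ASA_ROGER_BEEP_ENABLE"), source, &enable, sizeof(enable));
    asaSetSource(alGetEnumValue("ALC_ASA_ROGER_BEEP_ON"), source, &on, sizeof(on));
    asaSetSource(alGetEnumValue("ALC_ASA_ROGER_BEEP_GAIN"), source, &gain, sizeof(gain));
    asaSetSource(alGetEnumValue("ALC_ASA_ROGER_BEEP_SENSITIVITY"), source, &sensitivity, sizeof(sensitivity));
    asaSetSource(alGetEnumValue("ALC_ASA_ROGER_BEEP_TYPE"), source, &type, sizeof(type));
}
```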
So just to summarize the demo code, if you've had a chance to look at it or you can look at it after the session, basically what we're going to do is use the OpenAL Capture APIs to capture some data off of an input device on the machine that we're running.
We're going to capture that data and put it in some OpenAL buffers, and then we'll attach those buffers to our source object and play it back. And as you just saw, we just went through all the RogerBeep settings. So if we could go to the demo machine. I'll set this guy up here, because we don't need this. So I have an iPod here that has a file, and I'm going to play that into the line input.
Let's see, we don't need this. OK. Okay, so we want to do a Roger Beep test. This tool takes a parameter for the duration, for how long we want to run. Okay, and I've got my iPod here. And so I'll start it running, and then I'll start sending it some data on the line input. Welcome to session 404.
You are listening to a test of the OpenAL Roger Beep effect. To use Roger Beep, simply verify at runtime that the extension is present. Use the ASA SetSource API to set the enable and on properties. This test is currently capturing data in real-time and filling OpenAL buffers for playback. If the test is running properly, you should hear the Roger beep tone at each pause in this dialogue.
All right, if we can go back to slides. So what you heard right there was some data coming off my iPod, and all of those beeps that were at the end of each pause, those were actually being generated by the RogerBeep audio unit. So you can see how that might be useful in your game if you've got characters talking on a walkie-talkie or whatever. That's primarily what this effect is for.
Let's go to the next one. Okay, so let's talk about the distortion. Again, this is an extension that's used in a very similar fashion to the RogerBeep extension. It'll be available on the system at runtime if the distortion audio unit is present on your system. So that'll start with Leopard 10.5. We have some predefined settings. Again, you can create your own audio unit presets for distortion, load those at runtime, and we set the properties with the same mechanism that I was just showing you with the RogerBeep.
So here's our distortion property. So see again some similarities. First the distortion enable property so that you can tell the library to set up the audio unit, do some initialization, get it ready for use. The distortion on property for toggling the effect on and off. Again, it's off by default. You want to explicitly turn it on.
The distortion mix is so that you can designate how much of the distortion effect you want applied to your audio when it's being played back. So a setting of zero means you're not applying any distortion and 100 means you're getting full maximum distortion. We have some preset distortion types.
Now, there will be some more types added when we finish up Leopard and we ship it to you. So if you look in the header file now, you'll see some and there will be some more that are added. And we have the distortion preset if you want to create your own audio unit presets and load those at runtime.
So I want to talk a little bit about how we can use Extended Audio File with OpenAL. Extended Audio File, as Jeff mentioned a little earlier, is sort of a combination of the Audio File and Audio Converter APIs in that it allows you to read and write audio files of any of the audio file types that we support with our normal audio file APIs, whether that's WAV or AIFF or MPEG-4, MP3, any of the files that we support with any of the data formats that we support.
Basically, what it does is allow you to open any file, but then designate the format that you want to receive when you're asking for packets of data. So there's the header file there, and you'll find that in the Audio Toolbox framework.
And let me just run through some code here. So in the distortion test, also in the sample code, I have a function here called MyGetOpenALAudioData. And the reason we call it that is OpenAL has a limited number of formats that it wants. And so we want a function here that's going to return us back a format that OpenAL understands.
In this case, we're going to ask for 16-bit integer data. So the first thing we'll do is we'll take the URL that we've been passed in and call extended audio file OpenURL. Now, if it successfully opens the file that we've provided, we'll get back an extended audio file reference that we can pass to the subsequent APIs in the set.
Next, we're going to call Extended Audio File Get Property, and we're going to use the File Data Format property so that we can get the original format of the data that's actually in the file. Now, even if we don't want that format to pass when we pull packets out, we do want a couple of fields out of that Audio Stream Basic Description.
Specifically, we want to keep the sample rate and the number of channels the same. So even if we've got MP3 data that's stereo 44K and we want 16-bit, we still want it to be 44K stereo. So we're going to fill out this output format struct here, which is an Audio Stream Basic Description, and we're going to maintain our sample rates and our channels per frame. And then we're going to fill out the remaining fields to designate that we want 16-bit integer data.
So that first field, the Format ID, we're going to pass in Linear PCM. And then the next field, the Bytes Per Packet, and then two fields down, the Bytes Per Frame, we're going to designate to be the size of each packet of PCM data. So that's two bytes for a 16-bit sample, times the number of channels we have, which is probably going to be mono or stereo.
And then the frames per packet, as Jeff also mentioned earlier, packets of PCM are always one frame at a time. And then the bits per channel: 16, we want 16-bit. And then the last field there, the format flags. This is somewhat important if you're dealing with OpenAL. OpenAL expects to get integer data in whatever endian format you're running on at the time, because there's no way to designate what the data format is at runtime.
So if you're running on a PowerPC, it expects to get big-endian integer data, and if you're running on an Intel machine, it's going to expect little-endian. So here's where we would set that. Now that we have our AudioStreamBasicDescription filled out, we're going to pass it to the Extended Audio File Set Property API to designate what our client data format is. In other words, this is the format we want to receive our audio data in when we ask to read packets from the file.
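Roughly, the steps described so far look like this. A sketch: inFileURL stands in for the CFURLRef passed into MyGetOpenALAudioData, and error handling is omitted.

```c
#include <AudioToolbox/ExtendedAudioFile.h>

// Open the file, read its native format, then ask ExtAudioFile to hand us
// 16-bit native-endian PCM at the file's own sample rate and channel count.
ExtAudioFileRef extRef = NULL;
ExtAudioFileOpenURL(inFileURL, &extRef);

AudioStreamBasicDescription fileFormat;
UInt32 propSize = sizeof(fileFormat);
ExtAudioFileGetProperty(extRef, kExtAudioFileProperty_FileDataFormat,
                        &propSize, &fileFormat);

AudioStreamBasicDescription outputFormat = {0};
outputFormat.mSampleRate       = fileFormat.mSampleRate;        // keep the rate
outputFormat.mChannelsPerFrame = fileFormat.mChannelsPerFrame;  // keep the channel count
outputFormat.mFormatID         = kAudioFormatLinearPCM;
outputFormat.mBytesPerPacket   = 2 * outputFormat.mChannelsPerFrame;
outputFormat.mFramesPerPacket  = 1;
outputFormat.mBytesPerFrame    = 2 * outputFormat.mChannelsPerFrame;
outputFormat.mBitsPerChannel   = 16;
outputFormat.mFormatFlags      = kAudioFormatFlagsNativeEndian   // big on PPC, little on Intel
                               | kAudioFormatFlagIsPacked
                               | kAudioFormatFlagIsSignedInteger;

ExtAudioFileSetProperty(extRef, kExtAudioFileProperty_ClientDataFormat,
                        sizeof(outputFormat), &outputFormat);
```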
Okay, so we've done our format stuff. Now, the next thing we need to do is determine how many frames of data we have in the file. For our simple example, we're just going to read all the data out in one shot. Now, if you have a big file, you may want to break this up, but just for the purposes of this demonstration, we're just going to pull it all out at one time.
So we get the total number of frames, and then we're going to allocate some memory, which is the total number of frames times the size of our packets, which we know from filling out the AudioStream basic description. Once we do that, we're going to read all of the packets out of the file into our audio buffer list, which is pointing to our allocated memory that we just did.
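And the read-everything-at-once step might look roughly like this, continuing the sketch above (a single-buffer AudioBufferList works here because the client format is interleaved PCM):

```c
#include <stdlib.h>

// Size one big buffer from the file's frame count and read every packet into it.
SInt64 fileLengthInFrames = 0;
UInt32 sizePropSize = sizeof(fileLengthInFrames);
ExtAudioFileGetProperty(extRef, kExtAudioFileProperty_FileLengthFrames,
                        &sizePropSize, &fileLengthInFrames);

UInt32 dataSize = (UInt32)fileLengthInFrames * outputFormat.mBytesPerFrame;
void *theData = malloc(dataSize);

AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mNumberChannels = outputFormat.mChannelsPerFrame;
bufferList.mBuffers[0].mDataByteSize   = dataSize;
bufferList.mBuffers[0].mData           = theData;

UInt32 framesToRead = (UInt32)fileLengthInFrames;
ExtAudioFileRead(extRef, &framesToRead, &bufferList);   // theData now holds the PCM
```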
And if that's successful, we no longer need our extended audio file reference, so we can dispose of it and pass the data back to the caller of the function. So for the distortion demo, it's very similar. If you look at the code, we're not going to use the capture APIs to get the data; we're going to get our data out of a file via the Extended Audio File API.
We're going to take that data, fill OpenAL buffers, attach them to a source for playback, and we're going to apply the distortion effect much the same way that we applied all the settings for the Roger Beep. So if we could go back to -- there, you beat me to it. I'll run this one.
Let's see. Okay, so there's our test, and we're going to provide it a file. Now, you might recognize the dialogue, but keep in mind we're going to hear it through the distortion audio unit. You can see it is time to play Choose a Vista. Oh, what's going on? Well, Vista comes in six different versions, but I don't know which to choose.
( Transcript missing )
So that about wraps up our session. Here are a couple of contacts for anybody that's interested in contacting us later. I want to remind you that we have a Core Audio session tomorrow from 2:00 to 6:00, so you can come and ask us questions about anything that we've talked about today. There you go. Sorry. I'd like to bring up Bill Stewart now to host some Q&A. And thanks very much.