
WWDC06 • Session 209

Core Audio Update

Graphics and Media • 1:03:11

Core Audio is the world-class audio architecture in Mac OS X. Go in-depth with Core Audio's new high-level audio services for playing and recording audio. You'll also discover Core Control, a new API set to represent control surfaces such as mixers and other audio hardware.

Speakers: Doug Wyatt, Bill Stewart

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and may contain transcription errors.

Hello, my name is William Stewart and I work for Apple in the Core Audio team. Welcome to session 209, Core Audio Update. Just to give you a synopsis of a couple of the things, the session after this one is also a session around Core Audio technologies. In this session now we'll be covering two main topic areas: media hardware control, which is a new API set designed to represent various control surfaces and other MIDI and non-MIDI types of hardware, to applications.

We'll be doing a quick revision of the CAF files and then there's also another new API in Leopard called the Audio Queue API. The Audio Queue API is for dealing with both playing and recording of buffers of audio. The session after this one is focused really on surround, on some of the different things that we enable with Core Audio to do surround. The first major topic is OpenAL 1.1, which we support now in 10.4.7, and we'll be doing a little more of a detailed discussion of that and some of the custom extensions that we added to that. And then we've got an application that we ship called AU Lab, which is provided for hosting audio units, and we've added support in that for multi-channel and surround audio unit hosting. So we'll be going through some of those changes. And then in general, just reviewing how Core Audio in general sort of deals with multi-channel formats. And then panner audio units. And that's an audio unit type that's been around for a little while, but we're actually defining now a kind of standard API for that, and we'll show you some things about that. And then finally just another audio unit feature to enable audio units to do MIDI output. So that's the synopsis of the two sessions. There's also a session tomorrow afternoon for QuickTime audio, and that'll be covering something that may be of interest to people in this group, the ability to insert processing into the QuickTime signal chain, so you can take a QuickTime movie, play it back, but actually get the audio and process it in some way. So without further ado, I'd like to get Doug Wyatt to come up onto the stage and he'll be covering media hardware control. Thanks, Doug.

Thanks, Bill. So the media hardware control, I'll start out just by introducing what it is and who it's for, talk a bit about some of the devices that it supports. They're kind of intense and complicated in some ways. And then I'll go through some basic tasks in the API for receiving input from these devices and sending feedback back to them.

So here's a screenshot, actually several windows, from the Emagic Logic application. And this is fairly typical of today's digital audio workstation. In the upper right, we've got the transport control with lots of buttons, your current locators, locators where you're going to punch in and out. Over here, we have the mixer window. And this is kind of a pretty direct analog to an old-fashioned mixing console. We have channel strips, volume, panning, mute, solo, effect inserts, and so on. And on the right, we have automation showing on top of some audio events. I think I'm automating the balance between the two channels in this example. Now the mixer and the transport both have real-world physical analogs. If you go back, we've got tape machines with buttons on them and the mixer. It's only been with the advent of computer-based systems that we've got this intense capability for automation.

So in any case, those applications can be a bit unwieldy to deal with using just a mouse and keyboard. So now we're starting to see devices like this. This is the Mackie control. Actually, it's the Emagic control. And as you can see, it has the same channel strips. There are mute and solo buttons. You can select tracks, transport controls in the lower right, a jog wheel, and so on. So we've got this movement back towards these richer control interfaces for these complex applications.

Now since these control surfaces right now all have these custom protocols, and we're seeing an increasing number of applications, not just digital audio workstations like Performer, as my example in the upper left there, but also Final Cut Pro is getting some increasingly advanced audio capabilities. So the purpose of MHC is to sit in the middle and abstract the differences between these various devices and make it so all these different applications can communicate through a single interface.

So looking in more detail at how these devices work, so this is the Mackie control again. A lot of them use MIDI as a transport. Some others use like Ethernet and so on. And in those situations, the hardware manufacturer can create a custom driver to deal with that transport layer. For devices that are MIDI-based, with MHC, we should be able to create just profiles that are data-driven for the most part to support these devices. And in either case, whether it's a driver or a custom profile, the purpose of that profile or the driver will be to translate what's going on at the transport layer into just a series of functional messages to the application, sort of like AppleScript.

And just a little bit about MIDI here. This goes both into the implementation of MHC, but also how things are layered. Some applications will still want to deal with Core MIDI directly if you're recording musical performances from a keyboard. But if you wanted to just deal with control surfaces and you're not a music application that's working with MIDI, you can deal directly with the MHC portion of the Core MIDI framework and just speak in terms of functional messages without parsing MIDI directly. But since MHC has this relationship with Core MIDI, we've put it in the Core MIDI framework. It serves a lot of the same kinds of purposes to your application in that it's a central interface to shared hardware. So that's why it's in that framework. And again, with MHC, you'll just get functional commands instead of MIDI messages.

So from just the MHC API point of view, when you're dealing with devices, since you're just going to get these functional messages and not be dealing with the protocol directly, your application will be device independent. You'll just be able to work with any device for which there's an MHC driver or a profile. Through the API, the devices that appear to your application through the API will come from two places, either those data-driven profiles or from the driver plug-ins, which can automatically detect the presence of hardware.

So now with that background, I'll start just giving you an outline of what MHC looks like from an API point of view and do some basic tasks here. So you'll register as a client. You'll find the device that you want to talk to. You can find out what the device's capabilities are. You can get messages from the device and send feedback back to it.

So this little bit of code here is the one function call that connects your application to MHC, MHC client create. You pass a unique identifier for your application or client, and that's used because there's this concept of device-specific, or rather, application-specific preferences. My message callback is the function that will get called when messages arrive from the device. And you supply a run loop on which you'll receive those messages. And at the end, you get back this client object, which you can pass to other API functions.

And if you're familiar with the core MIDI calls, it's very parallel in MHC to locate devices. This little code fragment simply iterates through all the devices in the system, gets their name property, uses CFShow to print the name to the console, and releases it. So you could use a loop, something like this, to populate, say, a scrolling list of the devices that the user needs to choose just one to work with. So once having chosen a device to work with in the application, you can call MHC client connect device. And what that does is simply says, OK, so everything that this device sends to the computer, I want my client via that callback function to receive those messages.

And so once you're receiving those messages, here's what the message structure contains. Now, as you saw in the call to create a client, you can specify the run loop, and you have two choices here. One, you can specify a run loop, which in many cases would be your main run loop, and if you do that, then you have no thread safety issues. You can receive these MHC messages and update other graphic elements on the screen in response to those messages. There may be situations where you want to receive the MHC messages as soon as possible after they come in. And, for instance, that's typically what you're doing with MIDI messages in an application where you want to process them and send them back out quickly. So you can work either way. If you want to receive things on a run loop, then you don't have those thread safety issues that you would have if you chose to receive them on the high priority internal thread. So that's the trade off between the two choices. So inside this message structure, we get a timestamp. So if you're getting the messages in your main run loop and maybe you're doing a lot of drawing and you're not servicing the main run loop that often, relatively speaking, you do still get very accurate timestamps in host time as to when the event actually occurred. You get at the bottom of the structure the value of the control that was changed. And you also get this MHC function object which we'll look at in a second and that specifies the actual application function to be performed.

So the kinds of functions that MHC defines for being controllable from these control surfaces include transport: stop, start, rewind, move to a marker, and so on. Mixer and effects control: volume, pan, mute, solo, balance, EQ, delay times, anything like that. There are functions defined for changing views or modes in the application. For instance, in Logic there's -- well, let's take as an example, working with a Mackie Control with Logic, there's one mode where all of the little rotary encoders across the top are altering effects parameters, and there's another mode where those are the pans, the pan controls for eight channels.

And there's also, kind of analogously to AppleScript, you can define custom application behavior that's controllable through MHC by having a private function suite. So inside the structure, the first member is a suite, which is a four-character code. And again, you can have a private suite there, but we'll have defined some common ones for those classes of functionality. So, there's the suite, there's a function within that suite, that function may need to be qualified in some way. For instance, if the function is track volume, then the group might say, "Okay, we're dealing with the tracks here," or it could also say, "I'm dealing with buses or outputs." And then the specifier says, "Okay, whether it's tracks or outputs, which track or output are we talking about here?" So that's how an MHC function is specified.
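To make the suite/function/group/specifier layering concrete, here is a small self-contained C sketch. The MHC seed API's actual type and field names were never published, so every name and four-char code below is an assumption for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the MHC function structure described above.
   The seed API's real names weren't published; these are assumptions. */
typedef uint32_t FourCharCode;

#define FOURCC(a, b, c, d) \
    ((FourCharCode)(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                    ((uint32_t)(c) << 8) | (uint32_t)(d)))

typedef struct MHCFunction {
    FourCharCode suite;     /* which family: transport, mixer, private... */
    FourCharCode function;  /* which function: play, stop, track volume... */
    FourCharCode group;     /* qualifier: tracks vs. buses vs. outputs */
    uint32_t     specifier; /* which track/bus/output within the group */
} MHCFunction;

/* Example: fully qualify "volume of track N" (all codes assumed). */
MHCFunction MakeTrackVolume(uint32_t track) {
    MHCFunction f;
    f.suite     = FOURCC('m', 'i', 'x', 'r');  /* assumed mixer suite */
    f.function  = FOURCC('v', 'o', 'l', 'm');  /* assumed volume code */
    f.group     = FOURCC('t', 'r', 'a', 'k');  /* "dealing with tracks" */
    f.specifier = track;                       /* which track */
    return f;
}
```

The point is just that one fixed-size structure, rather than a device-specific MIDI map, can name any function a surface might control.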

Now, going back to the code example, you've registered a client, you've connected to a device, and now you're going to start receiving messages from the device. Receiving input is pretty much straightforward, and the only thing about it is that it can be tedious because there are a lot of different messages you might want to respond to. So you end up writing a lot of switch statements. So you get this message callback. There are several other kinds of messages other than the control having changed, but the control having changed is the most common one, so we'll just follow that through for the moment.

So we'll dispatch to another function to do that because we have more switch statements. So here, after control has changed, we'll say, okay, so what suite is this function in? Okay, it's the transport suite. Given that it's the transport suite, which function is it? To keep this from getting completely full of code, I'm just showing stop and play. So you'll call your doStop and doPlay functions. And that's pretty much how your input code will end up looking, just lots of switch statements like that to dispatch the individual messages to your application's functions.
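The dispatch described above can be sketched as nested switch statements. Again, the seed header isn't public, so the type names and four-char codes here are invented stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the nested switch dispatch described above; all names
   and codes are assumptions, not the shipped seed header. */
typedef uint32_t FourCharCode;

#define FOURCC(a, b, c, d) \
    ((FourCharCode)(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                    ((uint32_t)(c) << 8) | (uint32_t)(d)))

#define kSuiteTransport FOURCC('x', 'p', 'r', 't')  /* assumed code */
#define kTransportStop  FOURCC('s', 't', 'o', 'p')
#define kTransportPlay  FOURCC('p', 'l', 'a', 'y')

typedef struct {
    FourCharCode suite;
    FourCharCode function;
} MHCFunctionID;

int gPlaying = 0;  /* toy engine state for the example */

static void DoStop(void) { gPlaying = 0; }
static void DoPlay(void) { gPlaying = 1; }

/* Called from the message callback when a control has changed:
   first dispatch on the suite, then on the function within it. */
void HandleControlChanged(MHCFunctionID fn) {
    switch (fn.suite) {
    case kSuiteTransport:
        switch (fn.function) {
        case kTransportStop: DoStop(); break;
        case kTransportPlay: DoPlay(); break;
        default: break;  /* ...many more functions in a real app... */
        }
        break;
    default: break;      /* ...other suites: mixer, views, private... */
    }
}
```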

And just to continue here, some more examples: the channel strip, and here's the track number as the specifier, as an example of how those other fields in the message work. And here we're handling the mute and volume messages, and we're passing along the value of the mute or the volume control as it was received.

Before we go further into the code example and look at how we send data back to the control surface, it's useful now to look at what's actually inside one of these MHC devices and how it's represented. Because to send feedback, we're going to have to go through and query the device a little more. So there's the device object, which is either a physical device or which may be an aggregation of other devices. For example, if you have two of those Mackie control devices, you can slave them together in such a way that, instead of two separate devices each controlling eight channels, they appear to the application as one device that can control 16 channels.

There's this concept of a configuration which comes into play when you have devices that are kind of generic MIDI controllers with presets. I won't go further into that because here I'm focusing on the devices with large suites of dedicated functionality. Generally these more complex devices will just have one configuration. So within the configuration, there are groups which basically just organize the actual controls, which we call elements, borrowing terminology from the USB HID spec.

So given elements, there are three types that you need to be concerned with. Some are input only, as you see there, a push button, also the jog wheel at the bottom right of the Mackie control. Those are the simple ones to deal with. They move, you get a message, you respond to them. Some devices, or some controls rather, are input devices with feedback, meaning that in some way they are showing to you what you're doing, whether it's a push button with the light on it to indicate you're in play mode, for instance. Here in the slide we've got a rotary encoder which shows some LEDs to indicate the current position of the dial. That same Mackie control also has motorized faders on the volume controls.

So those controls with feedback in them, the application has responsibility for sending feedback, sometimes even while that knob is actually being controlled. So you'll get some information about the element to determine how to handle that. And lastly, there are elements that are output or feedback only, meaning that you'll never receive a message from them, but you might want to send something to them. In the example here, we've got an LED display from the Mackie control that's showing the current position in the song.

So in terms of what you do in the application to actually send feedback, this is a brief overview of the steps to do this. You'll walk through the device's elements in some way to go and find the elements with feedback for the functions that the application supports. Then when something happens in the application, for instance, you're stopping or starting playing, and this applies whether you're stopping or starting playing in response to a message from the control surface or whether the user actually did something on the Mac, like push the space bar to start playing or whatever. But regardless of that source, when we start playing, we're going to want to go and provide feedback to the control surface about the play state. And so to do that, we'll have had to have located those elements which provide that kind of feedback.

So, looking at the code behind that, when we first start the application, we can make a call to MHC object apply function to children, which essentially just traverses the entire device hierarchy of elements. And here we've defined a callback function, scan one element, and that will get called for every group and element inside the device. So inside the function scan one element, we check whether info is null. MHCElementInfo is a structure that contains a bunch of details about the element, and if it's null, then it's probably a group that we're looking at and we don't care. But if info is non-null and we look inside the info structure and it has feedback, then we can say, okay, let's look at this element. Does this element correspond to a function that we care about?

Here again, in the interests of brevity and space, I'm just showing one example. So here we're looking to see, OK, so is this the element in the transport suite reflecting the play state? And if so, then I'm going to cache a reference to that element in gPlayElement. And then later I'll be able to use that. So in a more realistic example, you would be doing switch statements here again to keep track of the elements that correspond to your application's functions.

So inside your audio engine then, so earlier we saw there's functions to doStart and doStop. So the first thing you would do in doStart is whatever you have to do to start actually playing, load audio off disk or whatever, start the hardware. And then the last thing you would do is call this -- I factored out doStart and doStop so that they do all this engine work, and then they make a single call to update the MHC play state, to reflect that current play state to the MHC device.

So I've got a safety check in that function to see if gPlayElement actually exists. And if we found one when we started up, then I'll set the value local variable to a Boolean, true or false, whether we're playing or not. And then I can send that value to the element that corresponds to the play state. So if that element that we found earlier, for instance, was a little green LED sitting above the play button, then at this point we would be making that LED turn green. And that's feedback in a nutshell.
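The whole feedback path can be sketched like this. MHCElementRef and MHCElementSetValue are stand-ins for the seed API, which was never published, so treat every name here as an assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the feedback path described above; all MHC names are
   stand-ins for the unpublished seed API. */
typedef struct MHCElement { float value; } MHCElement, *MHCElementRef;

/* Stand-in for the API call that lights an LED / moves a fader. */
static void MHCElementSetValue(MHCElementRef e, float v) { e->value = v; }

MHCElementRef gPlayElement = NULL;  /* cached during the element scan */
int gEnginePlaying = 0;

/* Reflect the current play state to the control surface, e.g. turn
   the green LED above the play button on or off. */
void UpdateMHCPlayState(void) {
    if (gPlayElement != NULL)  /* safety check: did the scan find one? */
        MHCElementSetValue(gPlayElement, gEnginePlaying ? 1.0f : 0.0f);
}

void DoStart(void) {
    /* ...load audio off disk, start the hardware, etc.... */
    gEnginePlaying = 1;
    UpdateMHCPlayState();  /* last step: send feedback */
}

void DoStop(void) {
    gEnginePlaying = 0;
    UpdateMHCPlayState();
}
```

Factoring the feedback into one call keeps it correct whether playback was started from the surface or from the Mac's keyboard.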

I just want to touch briefly on some of the more advanced features. There's touch and release, which comes into play when dealing with automation. For those of you who've used Logic or Digital Performer or Pro Tools or whatever, these applications, when playing back recorded volume, pan and effects automation parameters, it's great. You can move the sliders on the screen, or if you've got a control surface you can control your mix from the hardware as it's going along, and the program will just record all this and you'll get a little graph of what you've done when you're done.

Then what will happen often when you're mixing, you'll say, "Okay, I didn't quite get this part right. I want to change the volume curve just over this one little phrase here and then let it keep going the way it was." And the friendly way to do that is to put the track into overwrite automation mode.

Now, these control surface devices are pretty cool in that they don't only send messages -- well, not all of them, but the fancy ones -- not only will they send messages when you move the slider, but they will send a message when you touch the slider. So from the point of view of editing existing automation data or saying, "Okay, I just want to change part of it," you can touch it when you want the existing automation to stop playing, at which point then you're recording anything new that you do, or you could just hold it to erase it. And then when you release it, you're done recording and the previous automation keeps playing. So in any case, MHC supports this by having this concept of touch and release events. You'll get a message, "Okay, the volume slider on track one has been touched," and you can respond to that appropriately.

A lot of these devices have multiple modes, and this is the second advanced feature, meaning that controls, whether they're like arrow buttons or the rotary encoders across the top, do very different things depending upon the context of the application or a mode that's been selected on the control surface itself. To the extent possible, we're going to try to hide that inside the driver and profile. But that's something that's-- not fully baked like the rest of the API. But that's just something to be aware of.

The other interesting thing to think about with MHC, not only can we send simple volume commands, for instance, to a motorized fader, say, okay, slide up to the top or to 0 dB or whatever, but we also have these SMPTE or bar beat unit displays. And to do that, we've defined structures where you can say, okay, I'm going to send this SMPTE time to this element. and that's in there. As controls are moved, sometimes these devices will have facilities for displaying the name of the parameter that's currently being affected and its current value.

That's text labels. On many of these devices, we also have the ability to graphically display in some way what's happening with the parameter, whether it's the LED ring around the rotary encoder or even a bit mapped graphic, for instance, with a stereo spread parameter on an effect that might be illustrated as a graphic like that, but you might only have to send one value while specifying that feedback mode. In any case, these are all some of the more advanced details that are in the API.

So just to review, the basic idea here is that MHC takes care of a lot of this complexity under the hood of these control surfaces. Whereas in the past, to support them, you'd get deeply into parsing MIDI messages. Your whole internal representation of how to support a control surface is this large map of MIDI messages going to and from the device. And since there are so many devices that do it so differently, we'd like to pop up a level and let you just deal with these devices in a more functional manner. So we have a preliminary implementation in the seed, and that's in the Core MIDI framework, the header file, media hardware control. And there's a fair amount of commenting there. And for those of you who are working on applications and hardware that can take advantage of this, we'd really like to hear from you. And we hope you'll keep in touch and let us know what you need from us about this. Thank you.

So that's Media Hardware Control. I'd just like to add a few words about the Core Audio file format, which has been around for a while now. This has been defined for, I believe, a couple of years now. The header file, AudioToolbox/CAFFile.h, has all the data structures. On the developer website, there's a pretty detailed formal description of the CAF file format. And appropriate to this week's announcements about 64-bit support, unlike AIFF it has 64-bit chunks, so we don't have our 2 or 4-gigabyte file size limits. But it is chunky like AIFF and WAVE. So everything you know about how to parse those files is applicable in a general sense.
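CAF's chunky layout is simple enough to walk by hand. Per the published CAF specification, each chunk header is a four-char type followed by a big-endian signed 64-bit size, which is exactly what removes the 2/4 GB ceiling. A minimal parser sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal sketch of walking CAF's chunky layout. Each chunk header is
   a four-char type plus a big-endian signed 64-bit size -- so, unlike
   AIFF/WAVE, chunk sizes are not limited to 2 or 4 GB. */
typedef struct {
    char    mChunkType[4];  /* e.g. 'desc', 'data' */
    int64_t mChunkSize;     /* size of the chunk body in bytes */
} CAFChunkHeader;

/* Parse one 12-byte header from a big-endian byte stream. */
CAFChunkHeader ParseCAFChunkHeader(const uint8_t *p) {
    CAFChunkHeader h;
    memcpy(h.mChunkType, p, 4);
    h.mChunkSize = 0;
    for (int i = 0; i < 8; i++)  /* assemble the big-endian 64-bit size */
        h.mChunkSize = (h.mChunkSize << 8) | p[4 + i];
    return h;
}
```

Everything you know about walking AIFF or WAVE chunks carries over; only the size field is wider.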

There's two basic approaches you can use for supporting CAF. You can either use our audio file API, which makes CAF appear as just another format like AIFF or WAVE, MP4, and so on. Or you may have cross-platform requirements, in which case you might look at an open source audio library.

But for that matter, you could write your own code. The specification is very detailed and clear. And we're already seeing a number of applications using CAF. iTunes can play CAF files. Digital Performer, I believe, can both read and write, as can Amadeus. But I'm just bringing this up because it's out there. We think it's a good format and wish more of you would please incorporate support for it in your applications.

Just one more. This is, yeah, two more things about this. One thing we really like about CAF is that it's a fairly complete generic container format, meaning that pretty much any kind of audio data you can think of and describe with one of our audio stream basic description structures you can put in a CAF file, whether it's PCM, and within PCM, whether it's big- or little-endian, float or integer, 8, 16, 24, or 32 bits. It can hold compressed formats.
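As a concrete illustration, here is one of those PCM flavors expressed as an AudioStreamBasicDescription. The struct is reproduced locally so the sketch is self-contained; the fields mirror the real definition in CoreAudioTypes.h:

```c
#include <assert.h>
#include <stdint.h>

/* AudioStreamBasicDescription reproduced locally so the sketch is
   self-contained; the fields mirror the real CoreAudioTypes.h struct. */
typedef struct {
    double   mSampleRate;
    uint32_t mFormatID;
    uint32_t mFormatFlags;
    uint32_t mBytesPerPacket;
    uint32_t mFramesPerPacket;
    uint32_t mBytesPerFrame;
    uint32_t mChannelsPerFrame;
    uint32_t mBitsPerChannel;
    uint32_t mReserved;
} AudioStreamBasicDescription;

/* One of the many flavors a CAF file can carry: 16-bit stereo
   interleaved integer PCM at 44.1 kHz. */
AudioStreamBasicDescription Make16BitStereoPCM(void) {
    AudioStreamBasicDescription d = {0};
    d.mSampleRate       = 44100.0;
    d.mFormatID         = ((uint32_t)'l' << 24) | ((uint32_t)'p' << 16) |
                          ((uint32_t)'c' << 8)  |  (uint32_t)'m';  /* 'lpcm' */
    d.mChannelsPerFrame = 2;
    d.mBitsPerChannel   = 16;
    d.mBytesPerFrame    = d.mChannelsPerFrame * (d.mBitsPerChannel / 8);
    d.mFramesPerPacket  = 1;               /* PCM: one frame per packet */
    d.mBytesPerPacket   = d.mBytesPerFrame;
    return d;
}
```

Swap the flags, bit depth, or format ID and the same structure describes any of the other variants the talk mentions.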

It has a structure similar to an audio channel layout for describing channel layouts. It holds a magic cookie for decoding formats. And this reflects the way that our audio file and audio format or converter APIs think of files and what's in them as separate things, which is useful. You can have files even if you don't have the decoder to actually listen to the audio in it. The file is still intact.

And one more thing about CAF that I think is going to be even more interesting and important going forward is its support for metadata. Like AIFF, we've got regions and markers and loop points, but we've also defined some standard keys to go into a dictionary with metadata. For example, artist, title, album, and so on. One thing that's crazy to look at sometimes is the way that all the different applications generate audio waveform summaries of the file. You'll notice a lot of applications when you open a file will go read through the whole thing just so that they can construct some miniature waveform view of it. CAF has some facilities for storing those waveforms in the file, and it could save users some time and effort if developers were to use the CAF file itself to hold those overviews, and multiple applications could look at and use those. CAF also supports comments about the editing history of the file, and those are time stamped and human readable. And as I said before, the entire verbose, clear CAF spec is online at the Apple Developer Audio site. And you can see it there. And that's the CAF file. Thank you very much.

Thank you, Doug. So for the rest of the talk we're going to go through the Audio Queue API. This is a new API in Leopard. It's a high-level API service to play buffers of audio and record buffers of audio in linear PCM; for playback you can play any content. The header file is in the audio toolbox framework. And by high-level service, we really have become very aware of some of the complexities developers have to deal with to do what should be fairly simple tasks. And so the idea of this API is to provide a much simpler interface where a lot of the work that Core Audio would do to get the data out of the system or get the data into your application is taken care of underneath this API. You'll notice similarities, in some respects anyway, to the Sound Manager API with the Audio Queue API. However, the Sound Manager is deprecated in Leopard and this is a formal deprecation. I think informally we've been deprecating it for some years now. But Sound Manager will also not be available in 64-bit. All of the Core Audio frameworks are available to 64-bit applications. So if you're going forward with your application, now is a very good time, I think, to move off Sound Manager if you've still got code there, and this API should be very appropriate for that kind of use.

So when we go through this, there's several roles we want to describe in the queue. There's the creation of the queue itself. It's how the queue manages its buffers, how it manages state like start and stop, priming, pausing, resuming, what reset is and how it works. You can send parameters to the queue so that you can change playback characteristics of the queue. And the queue also provides some timing services. We'll just have a brief overview of what that is.

One of the things, as we've already mentioned, is the queue handles both input and output, and the way that we've decided to do this is to share as much of the code and the API between both roles. And so when you have an API that's specific for output, as you'll see in a moment, to create an audio queue object to do output, that API will be audio queue new output, whatever it's called. And same for input. But in a lot of cases you'll actually see that it's an Audio Queue API and you can use it for either input or output.

The queue objects themselves are not doing input and output together. They're either an input or an output object. So, when describing what the queue does and how it works, I thought the best way to approach this would be to look through an example of what it would take to play a file with the queue. The code itself uses two primary APIs in the audio toolbox: the audio file API that Doug just sort of mentioned in the context of CAF files, and the audio queue API, and the audio queue API is obviously the thing that interests us. Because we're dealing with the audio file API, the code is also going to be very general and it's going to deal with any kind of format that Core Audio can understand, whether it's compressed, variable bitrate audio or linear PCM. The code itself that we'll be looking through will deal with those circumstances.

So, when we play a file, the basic jobs we have to do are to open the file and read some data about the file. And then we use that information to create the audio queue and to configure it. Then we'll need to allocate buffers to read the data into. And then those are the buffers that we'll queue up to the queue. Then we'll start the queue object to play and then that'll decode the existing buffers. Then the queue's runtime state is managed in its communications to the client with a callback, and so the queue will call a callback and you'll see what we'll do to deal with that callback and keep the program doing its work. Then at the end of the file we dispose and clean up. So they're the main steps that we're going to have a look through now.

There's no real demo here. The demo is beep. Play the file. So AQTest and the name of the file that you want to play. The first thing we have to do to play the file is to open the file. And so the next line is to make an FSRef from the file path, and then audio file open will open that file.

And if that succeeds of course then we've got a valid audio file. You'll notice the last argument there to the audio file open call is myInfo.mAudioFile. MyInfo is a structure that we're going to define in our program, which I've just given a type of AQTestInfo. That contains basically the information that we need to configure or keep around in order to run this program: the file ID, the description of the format that's in the file, the data format that's contained within the file, the queue itself, the buffers that we're going to use with the queue, where we're reading from in the file, how many packets of the file we can read at a particular time, and packet descriptions, which we'll go into in a minute. And then the other thing, aside from just the procedural code of calling the APIs, that we need to define is the callback, and we'll have a look at that as well.

So our next step is to get the format from the file, the data format that's contained within the file, and then once we have that, we have enough information to create an output queue. Because when you create an output queue, the one thing you should define is the format that the queue is going to be dealing with, the format of the data it's going to be dealing with. So we read that data format, create the queue. You'll notice that I'm specifying CFRunLoopGetCurrent and kCFRunLoopCommonModes. And if I passed in null for those, that would be the value of those two arguments. And what they define, as you saw with MHC as well, is the thread and the mode that the queue object can call your callback on. And that gives you some control over the priority of the thread that you're going to use to fill the buffers, and you can have that behavioral control in your program. And then we get back a queue object from this call.

So with buffer management, we allocate buffers, but we ask the queue to allocate the buffers for us, and the queue will own the buffers. But before we can allocate the buffers, we need to know some things about the format: basically how big we want them, which at the moment the program will just keep at a constant size. But we also really want to know if we're dealing with variable bitrate or constant bitrate audio. And just as a review, I thought we'd go through what the characteristics of these two audio types mean. So CBR audio, constant bitrate audio, includes linear PCM, which is uncompressed audio.

There's also a collection of CBR codecs. IMA4 is one, QDesign is another, and they generate packets of audio that have both the same number of bytes in each packet and the same number of sample frames in each packet. In linear PCM, there's one frame per packet, so in stereo linear PCM you'd have left, right, left, right, and the left and right together are what we call a packet. In compressed audio it will be a block of audio. So you can see in the diagram that with CBR, every packet is the same size. In the AudioStreamBasicDescription, bytes per packet and frames per packet are both specified with non-zero values.

Now in our program we've just got an ad hoc 64K value for gBufferSizeBytes. And then we can use the bytes per packet field to tell us how many packets we can read at a given time. So that's what we need to understand in our program, and this is of course based on the data format contained within the file. And because it's CBR, we can calculate where any packet is. We don't need packet descriptions; we don't need any external information to tell us that packet 20 is here. Now, that's not true with VBR. With VBR, we have to do a little bit more work.

So with VBR, let's try to understand: well, what is VBR? Each packet of audio is a different size in bytes. That's the common case. You can get VBR where the number of frames contained in each packet differs as well, but the more common case is just that the size of each packet is different.

An example is MPEG-4 AAC; MPEG-1 or 2 Layer 3 is another. Apple Lossless, FLAC: these are all VBR formats. In an AudioStreamBasicDescription, you can tell a VBR format because bytes per packet will have a value of zero. In other words, we don't know; each packet's different.

And so now, to locate a packet in just a big blob of data, we need to know a couple of bits of information: where in that blob of data a packet is, and the size of that particular packet. So the AudioStreamPacketDescription, which is a structure in CoreAudioTypes, defines both where each packet begins and how big each packet is. And the diagram there makes it fairly clear that each packet can be a different size.

So in our program, the test we use is: bytes per packet or frames per packet is zero. In other words, the file can't give you a constant value for either of those two fields. Now we know we have VBR data, and then we go and ask the file: what is the largest size of one of the packets of data in your file? If I go back to the preceding slide (I'm going to use the laser pointer, because everyone tells me not to), you see this guy here is the largest packet. But I don't want to have to start right from the beginning and go all the way through, because in some file formats that might mean I'd have to read the whole file. So the call that we're using here is an estimate, and the file, if it doesn't know without having to parse the whole file, can usually make an estimate. And this will be a conservative estimate.

And that will give you back the maximum packet size that could be contained in that file, so this is a cheap call. There's another call where you can get the actual value, but for some formats, a large MP3 file as an example, that can take a long time because you have to read the whole file.

So this is all just setup. It's a lot of explanation, but the code is only about six or seven lines in the test code, and the test code is on your example code CD. So now our next job, once we've determined that we're dealing with a VBR format, is to prime the queue. We've got to allocate the buffers that we're reading into, which will be our gBufferSizeBytes that we saw previously. And then we need to get the queue ready to play by filling these buffers before we actually play back. And that's where we'll see our callback come into action first.

So with the priming of our queue, there's a couple of things we want to do. First of all, we're not done; we haven't even started to read the file yet, so we'll set done to false. And because we're going to start playback from the start of the file, the packet index is zero. And then we have a call to allocate the buffer, and like I said, we make this call on the queue.

We don't just malloc the buffers; the queue owns the buffers, and there's a couple of reasons why we do it this way. One is that one of the biggest problems we had with developers and internal programs over the years with the Sound Manager was an unspecified idea of when buffers could be freed, and you'd get buffers being freed while the Sound Manager was still using them, and so forth. So we decided this time that we would make the queue actually responsible for the buffers, and the client requests the buffers to be allocated by the queue. After this call, the queue has allocated a buffer for me based on how many bytes I asked it for. And then I'm going to actually call my callback myself here, and I'm going to provide it the same arguments that the queue would, because it's really going to be the same code. It's just that in this part of the program, I haven't actually started to play the file yet; I'm just priming it. So let's have a look at what this callback will actually do. In the normal case, it's called by the queue when it's finished processing an enqueued buffer. Now, that doesn't mean the queue has finished playing the contents of that buffer, but it's gone through and done the internal conversions and everything, and it's finished with that buffer; it doesn't need that buffer anymore.

And so the callback is to say: here, I've finished with this buffer. And the buffer that it's finished with, the queue provides in the callback. The contents of that buffer haven't been changed, so if you want to re-enqueue that buffer, you can, even just from the callback. In our example, AQTest, what our callback does is read more data from the last location in the file, and then look for an EOF to take termination actions. So this is what the callback looks like.

The thing to note here is that even when we're finished reading from the file, we may still get callbacks, because we're going to be a couple of buffers ahead of what the queue is actually playing. The queue will keep calling us to say, hey, I finished with this buffer. The first thing we need to do is make sure that we haven't finished reading to the end of the file. If we have, then there's nothing for the callback to do; you'll see why that is in a minute. Then the next step is to read as many packets as we can from the file, and this goes back to our previous calculation of the number of packets to read, which we calculated based on VBR or CBR and so forth. Then we call AudioFileReadPackets. It's going to tell us how many bytes it actually read in numBytes, and we're going to get it to read straight into our queue buffer's mAudioData field, which is just a void * data pointer. And if we have packet descriptions provided here, AudioFileReadPackets will fill them in, so at the end of this call, if it's variable bitrate data, mAudioData will contain a chunk of audio data, and the packet descriptions will tell us where each of those packets is and how big each packet is. So AudioFileReadPackets does a lot of the work for us here.

So if we read some packets, the number of packets, nPackets, that we get back will be greater than zero. Now we know that we read some data from the file, and we need to re-enqueue that buffer to the queue so that it plays. So we use numBytes, which AudioFileReadPackets also fills out for us, to tell the queue's buffer how many bytes there were. And then we re-enqueue the buffer.

And this is the simple version of the call; there's a more complex one that I'll mention briefly later. But in this call there are basically two primary arguments we need to provide: which queue, and the buffer. Now, if it's variable bitrate, we also need to provide packet descriptions. So I make a test here: do I have packet descriptions? You'll remember that in the constant bitrate case, we set that to NULL. So if that's NULL, then I'm not going to be providing packet descriptions, and I'll pass zero and NULL. If it's not NULL, then I'll be giving it some number of packet descriptions. Then I increment my packet index so that the next time I come in here, I read from where I last read. Now, if we didn't read any packets, then we're finished; we're finished reading the data from the file, and I'm going to call AudioQueueStop.

Now, you'll notice that I'm passing a Boolean here, true or false. In this case, false, and I'll explain and go into this more in a moment, but basically this is the asynchronous version of this call. And then I set my Boolean to say, hey, I'm done; I'm done reading the file.

Okay. To review where we've got to: we've done all of the setup that we need to do, including reading a couple of buffers in and enqueuing them to the queue. So all we need to do now is start the queue. We need to task the thread's run loop in the default mode, and that will allow the queue to call that callback.

And then the callback will stop the queue asynchronously, and the callback also sets mDone when it terminates. Now, we're not actually done playing; we're just done reading. We don't know when we're going to be done playing. So there's a notification: the queue has a notification property API like other Core Audio APIs, and the isRunning notification will tell you when the queue actually is running and when it changes from one state to another. So we would use this notification to know when the queue actually stops, and then we'd dispose and clean up. The code for all of this is quite simple, except that we don't have isRunning implemented yet, so I can't show you that. But what I'm going to do here is basically start the queue.

I'm just going to loop around until I finish reading. Now I know, okay, I've finished reading, but I haven't actually finished playing yet. So I'm just going to wait for another two seconds, but this is where I'd have my isRunning call, and maybe I'd wait on another variable and just task the run loop again. Once I'm done with that two seconds, I do AudioQueueDispose and AudioFileClose, and I'm done. The program just exits; terminates is a better word there.

Okay, that's it. That's all that's involved. The entire source code is, I think, about that long at 10 point. So it's really a lot less code than what you would have had to write in the past. And it becomes a very descriptive process for something as simple as this, in that you really are just doing it in the tasks. The AudioQueue API has been designed to be this kind of descriptive API, whereas our other APIs tend to get a little more convoluted in some ways because they're dealing with complex things too. But these are the steps, and as I said, the code is available. So there are a couple of things I want to go into more, and that is the synchronous and asynchronous behaviour of the queue objects.

There are three API calls, and they're all related to each other, as we'll see: AudioQueueReset, AudioQueueStop, and AudioQueueDispose. They all take a Boolean parameter. Immediate can be true, which means do it now; false basically queues that command. It allows the client to customize the effect of these commands. And it's not really what they do that you're customizing, because after reset is executed, whether it's now or later, the queue will be in the same state. It's just whether the command executes now or later that you're customizing. So let's have a look at what the calls actually do when they execute.

So AudioQueueReset: basically, in both cases, its role is to flush out the output of the conversion process, to make sure there's no state left in the converter; otherwise, the next time the converter is used, that leftover state would come out of its output, which would be undesirable. So the difference about reset in asynchronous or synchronous mode is when that action occurs. In the asynchronous case, the command is queued, and this occurs after the last currently enqueued buffer is actually processed by the queue.

And then it will do its actions of resetting. In the synchronous case, it does it immediately: it will cut off any audio that's playing, dequeue any remaining buffers that are in the queue object, and reset the conversion process. So you end up in the same state; it's just that the synchronous version is more disruptive.

Now, stop will do the reset actions for you. Then, after the reset is completed, if it's asynchronous, the queue is stopped once all of that output has been sent out. In the synchronous case, the queue is stopped immediately. And this is where the isRunning notification is important, because in the asynchronous case, you don't know when the queue is actually stopped; you've just queued a command that will execute at some point in the future.

So that's a great way to know when that command has actually executed. Now, we also have a dispose call, and that can be called asynchronously. It will do the reset and the stop actions for you, and when both of those are completed, the queue is disposed. Now, the asynchronous case is a little strange, because you're telling the queue, go away, but it's actually still going to do some work for you. One of the rules that we apply with the asynchronous version is that it's as if the object is already disposed: there'll be no more callbacks, there'll be no notifications, there'll be no ability to make any other API calls. The object, for you as the client, is gone. In the synchronous case, of course, this just happens immediately, like you would expect.

So in our callback we took advantage of this, because we were done reading the file, but we probably weren't done playing it. So we did the asynchronous version of stop, we just passed false, and we made sure we had no more data to provide to the queue. The other thing that the stop will do, with the reset, is that any state that's in the converter when it gets to this point will be flushed out for us, so we'll hear the end of the file. The general case of this is the reset call; that's really the call that does all of the work. In the stop case, it's just going to stop the device as well.

So, some of the features that aren't in the code but are in the queue object are really around four major parts of the queue's capabilities: priming, pause/resume, parameters, and timing. So let's have a quick look at what they do. AudioQueuePrime is an API call you can make directly in your code, and it tells the queue to start doing its internal processing before you actually start it. If you don't call this, then the start call will do the priming for you.

Pause and resume is an important notion to support. AudioQueuePause will not change any underlying state of the device. Normally, start would start the device and stop would stop the device. Pause will stop playback of the queue to the device, but it's not going to actually stop the device itself. And it also doesn't flush or alter any state; it doesn't reset anything in the queue itself. It just leaves the queue kind of suspended. Then, to resume it, you just call AudioQueueStart. And this is where reset could come in handy. In the reset case, which we saw being used in stop above, we called the asynchronous version. In the pause case, we may want to call reset if, when we resume playback, we're going to start playing from a different location; we'd want to flush everything out.

But normally you won't. So parameters enable you to control the playback operation of the queue. There are parameters that can be applied immediately, which is the AudioQueueSetParameter call. You can also schedule parameters to apply with a particular buffer that you're enqueuing, and there's a more extended version of the enqueue buffer call that enables you to specify those parameters. Currently we're only defining volume, but we will be defining more parameters in the future, and of course your feedback and comments on parameters you'd like to see supported are requested and valued.

So timing enables several features for the audio queue. You can have multiple queues synchronised through the use of timestamps. You can start playback of a queue at a particular time in the future. You can enqueue a buffer to begin playback at a particular time in the future. And you can also create timeline objects to detect discontinuities in presentation time; for instance, if the device's sample rate changes, or some other format of the device changes, that would be a timing discontinuity, so you may want to take account of that.

In sample time, it would be the time since the last time the queue was started; in presentation time, it would be the time since the device was started. And then host times provide an independent frame of reference, so that you can do enqueuing and so forth without reference to particular sample timelines. So that's basically the queue. Much of the complexity of playing and recording buffers, I think, is dealt with in an accessible way. There are a lot of comments in the header, as with the MHC headers. And please let us know your needs, your requests, other ways that you'd like to see this used; we value your feedback. For more information in general, we have the Core Audio API list. There's general development information, including the CAF spec and other documents, available through the audio developer's website at Apple.