Graphics and Media • 1:03:11
Core Audio is the world-class audio architecture in Mac OS X. Go in-depth with Core Audio's new high-level audio services for playing and recording audio. You'll also discover Core Control, a new API set to represent control surfaces such as mixers and other audio hardware.
Speakers: Doug Wyatt, Bill Stewart
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Hello, my name's William Stewart and I work for Apple on the Core Audio team. Welcome to session 209, Core Audio Update.
[Transcript missing]
Thanks, Bill. So the Media Hardware Control, I'll start out just by introducing what it is and who it's for, talk a bit about some of the devices that it supports. They're kind of intense and complicated in some ways. And then I'll go through some basic tasks in the API for receiving input from these devices and sending feedback back to them.
So here's a screenshot, actually several windows from Emagic's Logic application. And this is fairly typical of today's digital audio workstation. In the upper right, we've got the transport control with lots of buttons, your current locators, locators where you're going to punch in and out. Over here we have the mixer window. And this is kind of a pretty direct analog to an old-fashioned mixing console. We have channel strips, volume, panning, mute, solo, effect inserts, and so on.
And on the right we have automation showing on top of some audio events. I think I'm automating the balance between the two channels in this example. Now the mixer and the transport both have real-world physical analogs. If you go back, we've got tape machines with buttons on them and the mixer. It's only been with the advent of computer-based systems that we've got this intense capability for automation.
So, in any case, those applications can be a bit unwieldy to deal with with just a mouse and keyboard, so now we're starting to see devices like this. This is the Mackie control. Actually, it's the eMagic control. And as you can see, it has the same channel strips, there are mute and solo buttons, you can select tracks, transport controls in the lower right, a jog wheel, and so on. So, we've got this movement back towards these richer control interfaces for these complex applications.
Now since these control surfaces right now all have these custom protocols, and we're seeing an increasing number of applications, not just digital audio workstations like Performer is my example in the upper left there, but also Final Cut Pro is getting some increasingly advanced audio capabilities. So the purpose of MHC is to sit in the middle and abstract the differences between these various devices and make it so all these different applications can communicate through a single interface.
So looking in more detail at how these devices work, this is the Mackie control again. A lot of them use MIDI as a transport. Some others use Ethernet and so on. And in those situations, the hardware manufacturer can create a custom driver to deal with that transport layer.
For devices that are MIDI-based, with MHC, we should be able to create profiles that are data-driven for the most part to support these devices. And in either case, whether it's a driver or a custom profile, the purpose of that profile or driver is to translate what's going on at the transport layer into the messages your application sees through the API.
And just a little bit about MIDI here. This goes both into the implementation of MHC, but also how things are layered. Some applications will still want to deal with Core MIDI directly if you're recording musical performances from a keyboard. But if you wanted to just deal with control surfaces and you're not a music application that's working with MIDI, you can deal directly with the MHC portion of the Core MIDI framework and just speak in terms of functional methods.
[Transcript missing]
The devices that appear to your application through the API will come from two places: either those data-driven profiles or the driver plug-ins, which can automatically detect the presence of hardware.
So this little bit of code here is the one function call that connects your application to MHC: MHCClientCreate. You pass a unique identifier for your application or client.
And that's used because there's this concept of device-specific, or rather application-specific preferences. My message callback is the function that will get called when messages arrive from the device. And you supply a run loop on which you'll receive those messages. And at the end, you get back this client object, which you can pass to other API functions.
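Since MHC was only a preliminary API in the seed, a concrete sketch can only be hypothetical; the names, argument order, and types below are assumptions reconstructed from the description above, not a shipped interface:

```c
// Hypothetical sketch of connecting a client to MHC. The MHC identifiers here
// (MHCClientRef, MHCClientCreate, the message type) are assumed from the talk.
#include <CoreMIDI/CoreMIDI.h>   // MHC was described as part of the Core MIDI framework

static void MyMessageCallback(const MHCMessage *message, void *userData);  // assumed callback shape

static MHCClientRef gClient = NULL;   // assumed opaque client type

static OSStatus CreateMHCClient(void)
{
    // The unique identifier keys the application-specific preferences mentioned above.
    return MHCClientCreate(CFSTR("com.example.myapp"),  // client identifier (assumed to be a CFString)
                           MyMessageCallback,           // called when messages arrive from devices
                           NULL,                        // user data for the callback (assumed parameter)
                           CFRunLoopGetCurrent(),       // run loop on which messages are received
                           &gClient);                   // returned client object for later API calls
}
```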
[Transcript missing]
So you could use a loop, something like this, to populate, say, a scrolling list of the devices, from which the user chooses just one to work with. Having chosen a device to work with in the application, you can call MHCClientConnectDevice. And what that does is simply say: everything that this device sends to the computer, I want my client, via that callback function, to receive those messages.
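Continuing the same hypothetical sketch (the enumeration calls and MHCClientConnectDevice names below are illustrative guesses; the actual enumeration API isn't captured in the transcript):

```c
// Hypothetical: list the available MHC devices for the user, then connect the
// chosen one so that everything it sends reaches our client's callback.
ItemCount count = MHCGetNumberOfDevices();            // assumed enumeration call
for (ItemCount i = 0; i < count; ++i) {
    MHCDeviceRef device = MHCGetDevice(i);            // assumed accessor
    AddDeviceToScrollingList(device);                 // application UI code (hypothetical helper)
}

// Once the user has picked a device:
MHCClientConnectDevice(gClient, chosenDevice);        // route that device's messages to our callback
```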
[Transcript missing]
You get at the bottom of the structure the value of the control that was changed. And you also get this MHC function object, which we'll look at in a second, and that specifies the actual application function to be performed. So the kinds of functions that MHC defines for being controllable from these control surfaces include transport: stop, start, rewind, move to a marker, and so on; and mixer and effects control: volume, pan, mute, solo, balance. And there's also, kind of analogously to AppleScript, the ability to define custom application behavior that's controllable through MHC by having a private function suite. So inside the structure, the first member is a suite, which is a four-character code. And again, you can have a private suite there, but we'll have defined some common ones for those classes of functionality.
So there's the suite, there's a function within that suite. That function may need to be qualified in some way. For instance, if the function is track volume, then the group might say, okay, we're dealing with the tracks here. Or it could also say I'm dealing with buses or outputs. And then the specifier says, okay, whether it's tracks or outputs, which track or output are we talking about here? So that's how an MHC function is specified.
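As a rough illustration of that description (field names and types are guesses; the real declaration lived in the seed's MediaHardwareControl header):

```c
// Hypothetical layout of the MHC function identifier described above.
typedef struct MyMHCFunctionSketch {
    OSType suite;       // four-character code: a common suite (transport, mixer, ...) or a private one
    OSType function;    // the function within that suite, e.g. stop, play, track volume
    UInt32 group;       // qualifier: are we talking about tracks, buses, or outputs?
    UInt32 specifier;   // which track, bus, or output within that group
} MyMHCFunctionSketch;
```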
Now, going back to the code example, you've registered a client, you've connected to a device, and now you're going to start receiving messages from the device. Receiving input is pretty much straightforward. The only thing about it is that it can be tedious because there are a lot of different messages you might want to respond to. So you end up writing a lot of switch statements. So you get this message callback. There are several other kinds of messages other than the control having changed, but the control having changed is the most common one, so we'll just follow that through for the moment.
So we'll dispatch to another function to do that because we have more switch statements. So here, after control has changed, we'll say, OK, so what suite is this function in? OK, it's the transport suite. Given that it's the transport suite, which function is it? To keep this from getting completely full of code, I'm just showing stop and play. So you'll call your doStop and doPlay functions. And that's pretty much how your input code will end up looking, just lots of switch statements like that to dispatch the individual messages to your application's functions.
Just to continue, here are some more examples: the channel strip, with the track number as the specifier, as an example of how those other fields in the message work. And here we're handling the mute and volume messages, and we're passing along the value of the mute or volume control as it was received.
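Put together, the dispatch code sketched in the last few paragraphs might look roughly like this (the message structure, constants, and handler names are all illustrative assumptions):

```c
// Hypothetical message shape: the function identifier plus the control's new value.
typedef struct MyMHCMessageSketch {
    MyMHCFunctionSketch function;
    Float32             value;
} MyMHCMessageSketch;

// Hypothetical dispatch of a "control changed" message to the application's functions.
static void HandleControlChanged(const MyMHCMessageSketch *msg)
{
    switch (msg->function.suite) {
    case kMySuite_Transport:                             // assumed suite constant
        switch (msg->function.function) {
        case kMyTransport_Stop: doStop();  break;
        case kMyTransport_Play: doStart(); break;
        }
        break;

    case kMySuite_ChannelStrip: {                        // assumed suite constant
        UInt32 track = msg->function.specifier;          // which channel strip is being controlled
        switch (msg->function.function) {
        case kMyStrip_Mute:   setTrackMute(track, msg->value != 0.0f); break;
        case kMyStrip_Volume: setTrackVolume(track, msg->value);       break;
        }
        break;
    }
    }
}
```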
Before we go further into the code example and look at how we send data back to the control surface, it's useful now to look at what's actually inside one of these MHC devices and how it's represented. Because to send feedback, we're going to have to go through and query the device a little more.
There's this concept of a configuration which comes into play when you have devices that are kind of generic MIDI controllers with presets. I won't go further into that because here I'm focusing on the devices with large suites of dedicated functionality. Generally, these more complex devices will just have one configuration. So within the configuration, there are groups which basically just organize the actual controls, which we call elements, borrowing terminology from the USB HID spec.
So given elements, there are three types that you need to be concerned with. Some are input-only, as you see there: a push button, also the jog wheel at the bottom right of the Mackie control. Those are the simple ones to deal with. They move, you get a message, you respond to them. Other controls are inputs with feedback, meaning that in some way they are showing you what you're doing, whether it's a push button with a light on it to indicate you're in play mode, for instance.
Here in the slide we've got a rotary encoder which shows some LEDs to indicate the current position of the dial. That same Mackie control also has motorized faders on the volume controls. Those, like the dial, are controls with feedback. The application has responsibility for sending feedback, sometimes even while that knob is actually being controlled. So you'll get some information about the element to determine how to handle that.
And lastly, there are elements that are output or feedback only, meaning that you'll never receive a message from them, but you might want to send something to them. In the example here, we've got an LED display from the Mackie control that's showing the current position in the song.
So in terms of what you do in the application to actually send feedback, this is a brief overview of the steps. You'll walk through the device's elements in some way to find the elements with feedback for the functions that the application supports. Then, as the application's state changes, for example the play state, you send the new values out to the device. To do that, we'll have had to have located those elements which provide that kind of feedback.
So looking at the code behind that, when we first start the application, we can make a call to MHCObjectApplyFunctionToChildren, which essentially just traverses the entire device hierarchy of elements. And here we've defined a callback function, scanOneElement, and that will get called for every group and element inside the device.
So inside the function scanOneElement: info is an MHCElementInfo, a structure that contains a bunch of details about the element. If it's null, then it's probably a group that we're looking at and we don't care. But if info is non-null, and we look inside the info structure and it has feedback, then we can say, "Okay, let's look at this element. Does this element correspond to a function that we care about?" And here, again, in the interests of brevity and space, I'm just showing one example.
So here we're looking to see, "Okay, so is this the element in the transport suite reflecting the play state?" And if so, then I'm going to cache a reference to that element in gPlayElement. And then later I'll be able to use that. So in a more realistic example, you would be doing switch statements here again to keep track of the elements that correspond to your application's functions.
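A sketch of that scan, again with assumed MHC names (only the play-state element is cached here, as in the talk):

```c
// Hypothetical element scan: run once at startup to cache feedback-capable elements.
static MHCElementRef gPlayElement = NULL;    // the element reflecting the play state, if any

static void ScanOneElement(MHCObjectRef object, const MHCElementInfo *info, void *userData)
{
    if (info == NULL)                        // probably a group, not an element; ignore it
        return;
    if (!info->hasFeedback)                  // assumed field: input-only elements need no feedback
        return;

    if (info->function.suite == kMySuite_Transport &&
        info->function.function == kMyTransport_Play)
        gPlayElement = (MHCElementRef)object;   // remember it for later feedback
    // a real application would switch over every function it supports here
}

// Called once after connecting the device:
//   MHCObjectApplyFunctionToChildren(device, ScanOneElement, NULL);
```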
So inside your audio engine then: earlier we saw there are functions doStart and doStop. The first thing you would do in doStart is whatever you have to do to start actually playing, load audio off disk or whatever, start the hardware. And the last thing you would do is call this. I factored out doStart and doStop so that they do all this engine work and then make a single call to updateMHCPlayState to reflect the current play state to the MHC device.
So I've got a safety check in that function to see if gPlayElement actually exists. And if we found one when we started up, then I'll set the value local variable to a Boolean, true or false, whether we're playing or not. And then I can send that value to the element that corresponds to the play state. So if that element we found earlier, for instance, was a little green LED sitting above the play button, then at this point we would be making that LED turn green. And that's feedback in a nutshell.
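And the corresponding feedback call, sketched under the same assumptions:

```c
// Hypothetical: called at the end of doStart()/doStop() to push the play state
// back to the control surface, e.g. lighting the LED above the play button.
static void UpdateMHCPlayState(Boolean playing)
{
    if (gPlayElement == NULL)                 // the device may have no play-state element
        return;
    Float32 value = playing ? 1.0f : 0.0f;    // assumed Boolean-as-value convention
    MHCElementSetValue(gClient, gPlayElement, value);   // assumed feedback call
}
```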
I just want to touch briefly on some of the more advanced features. There's touch and release which comes into play when dealing with automation. For those of you who've used Logic or Digital Performer or Pro Tools or whatever, these applications when playing back recorded volume, pan and effects automation parameters, it's great.
You can move the sliders on the screen or if you've got a control surface you can control your mix from the hardware as it's going along and the program will just record all this and you'll get a little graph of what you've done when you're done. Then what will happen often when you're mixing, you'll say, "Okay, I didn't quite get this part right. I want to change the volume curve just over this one little phrase here and then let it keep going the way it was." And the friendly way to do that is to put a track into overwrite automation mode.
Now, these control surface devices are pretty cool in that they don't only send messages. Well, not all of them, but the fancy ones not only will send messages when you move the slider, but they will send a message when you touch the slider. So from the point of view of editing existing automation data, or saying, okay, I just want to change part of it, you can touch it when you want the existing automation to stop playing, at which point you're recording anything new that you do, or you could just hold it to erase it.
And then when you release it, you're done recording, and the previous automation keeps playing. So in any case, MHC supports this by having this concept of touch and release events. You'll get a message. Okay, the volume slider on track one has been touched, and you can respond to that appropriately.
A lot of these devices have multiple modes, and this is the second advanced feature, meaning that controls, whether they're arrow buttons or the rotary encoders across the top, do very different things depending upon the context of the application or a mode that's been selected on the control surface itself. To the extent possible, we're going to try to hide that inside the driver and profile, but that's something that's not as fully baked as the rest of the API. It's just something to be aware of.
The other interesting thing to think about with MHC: not only can we send simple volume commands, for instance, to a motorized fader, say, okay, slide up to the top or to 0 dB or whatever, but we also have these SMPTE or bar/beat unit displays. And to do that, we've defined structures where you can say, okay, I'm going to send this SMPTE time to this element. And that's in there. As controls are moved, sometimes these devices will have facilities for displaying the name of the parameter that's currently being affected and its current value. And that's text labels.
On many of these devices, we also have the ability to graphically display, in some way, what's happening with the parameter, whether it's the LED ring around the rotary encoder or even a bitmapped graphic. For instance, a stereo spread parameter on an effect might be illustrated as a graphic like that, but you might only have to send one value while specifying that feedback mode. In any case, these are some of the more advanced details that are in the API.
So just to review, the basic idea here is that MHC takes care of a lot of this complexity under the hood of these control surfaces. Whereas in the past, to support them, you'd get deeply into parsing MIDI messages. Your whole internal representation of how to support a control surface is this large map of MIDI messages going to and from the device.
And since there are so many devices that do it so differently, we'd like to pop up a level and let you just deal with these devices in a more functional manner. So we have a preliminary implementation in the seed, and that's in the Core MIDI Framework, the Header File Media Hardware Control.
And there's a fair amount of commenting there. And for those of you who are working on applications and hardware that can take advantage of this, we'd really like to hear from you. And we hope you'll keep in touch and let us know what you need from us about this. Thank you.
So that's Media Hardware Control. I'd just like to add a few words about the Core Audio file format.
[Transcript missing]
But it is chunky like AIFF and WAV, so everything you know about how to parse those files is applicable in a general sense. There are two basic approaches you can use for supporting CAF.
You can either use our audio file API, which makes CAF appear as just another format like AIFF or WAV, MP4, and so on. Or you may have cross-platform requirements, in which case you might look at an open-source audio library. But for that matter, you could write your own code. The specification is very detailed and clear.
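For instance, a minimal way to open a CAF file through the Audio File API and ask for its data format might look like this (using the URL-based calls; older FSRef-based equivalents exist as well):

```c
#include <AudioToolbox/AudioToolbox.h>

// Open a CAF file just like any other supported format and read its format description.
OSStatus OpenCAFFile(CFURLRef url, AudioFileID *outFile, AudioStreamBasicDescription *outFormat)
{
    OSStatus err = AudioFileOpenURL(url,
                                    kAudioFileReadPermission,
                                    kAudioFileCAFType,    // file type hint
                                    outFile);
    if (err != noErr) return err;

    UInt32 size = sizeof(*outFormat);
    return AudioFileGetProperty(*outFile, kAudioFilePropertyDataFormat, &size, outFormat);
}
```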
And we're already seeing a number of applications using CAF. iTunes can play CAF files. Digital Performer, I believe, can both read and write CAF, as can Amadeus. I'm just bringing this up because it's out there, we think it's a good format, and we wish more of you would incorporate support for it in your applications.
Just a couple more things about this. One thing we really like about CAF is that it's a fairly complete, generic container format, meaning that pretty much any kind of audio data you can think of and describe with one of our AudioStreamBasicDescription structures, you can put in a CAF file, whether it's PCM, and within PCM, whether it's big or little endian, float or integer, 8, 16, 24, or 32 bits. It can hold compressed formats.
It has a structure similar to an audio channel layout for describing channel layouts. It holds a magic cookie for decoding formats. This reflects the way that our audio file and audio format or converter APIs think of files and what's in them as separate things. Which is useful. You can have files even if you don't have the decoder to actually listen to the audio in it. The file is still intact.
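To illustrate that flexibility, here are two of the PCM variants CAF can hold, described with AudioStreamBasicDescription (the structure is real; the specific sample rates and layouts are just examples):

```c
#include <CoreAudio/CoreAudioTypes.h>

// 16-bit big-endian signed integer stereo PCM at 44.1 kHz
AudioStreamBasicDescription int16BE = {
    .mSampleRate       = 44100.0,
    .mFormatID         = kAudioFormatLinearPCM,
    .mFormatFlags      = kAudioFormatFlagIsBigEndian
                       | kAudioFormatFlagIsSignedInteger
                       | kAudioFormatFlagIsPacked,
    .mBytesPerPacket   = 4,    // 2 channels * 2 bytes; PCM is one frame per packet
    .mFramesPerPacket  = 1,
    .mBytesPerFrame    = 4,
    .mChannelsPerFrame = 2,
    .mBitsPerChannel   = 16,
};

// 32-bit native-endian float stereo PCM at 96 kHz
AudioStreamBasicDescription float32 = {
    .mSampleRate       = 96000.0,
    .mFormatID         = kAudioFormatLinearPCM,
    .mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked,
    .mBytesPerPacket   = 8,
    .mFramesPerPacket  = 1,
    .mBytesPerFrame    = 8,
    .mChannelsPerFrame = 2,
    .mBitsPerChannel   = 32,
};
```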
[Transcript missing]
And as I said before, the entire verbose, clear CAF spec is online at the Apple developer audio site, and you can see it there. And that's the CAF file. Thank you very much.
Thank you, Doug. So for the rest of the talk we're going to go through the AudioQueue API. This is a new API in Leopard. It's a high-level service to play buffers of audio and record buffers of audio in linear PCM; for playback you can play any content. The header files are in the AudioToolbox framework. And by high-level service, we really have become very aware of some of the complexities developers have to deal with to do what should be fairly simple tasks.
And so the idea of this API is to provide a much simpler interface, where a lot of the work that Core Audio does to get the data out of the system, or get the data into your application, is taken care of underneath. You'll notice similarities, in some respects anyway, to the Sound Manager API with the AudioQueue API.
However, the Sound Manager is deprecated in Leopard and this is a formal deprecation. I think informally we've been deprecating it for some years now. But Sound Manager will also not be available in 64-bit. All of the Core Audio frameworks are available to 64-bit applications. So if you're going forward with your application, now is a very good time, I think, to move off Sound Manager if you've still got code there and this API should be very appropriate for that kind of use.
So when we go through this, there are several roles we want to describe in the queue: the creation of the queue itself; how the queue manages its buffers; how it manages state, like start and stop, priming, pausing, resuming, and what reset is and how it works. You can send parameters to the queue so that you can change playback characteristics of the queue. And the queue also provides some timing services, and we'll just have a brief overview of what that is.
One of the things, as we've already mentioned, is that the queue handles both input and output. And the way that we've decided to do this is to share as much of the code and the API between both roles. So when you have an API that's specific to output, as you'll see in a moment, to create an audio queue object to do output, that API is AudioQueueNewOutput, and same for input. But in a lot of cases you'll actually see that it's a generic AudioQueue API you can use for either input or output. The queue objects themselves do not do input and output together; each is either an input or an output object.
So when describing what the queue does and how it works, I thought the best way to approach this would be to look through an example of what it would take to play a file with the queue. The code uses two primary APIs in the AudioToolbox: the AudioFile API that Doug just mentioned in the context of CAF files, and the AudioQueue API, which is obviously the thing that interests us.
Because we're dealing with the audio file API, the code is also going to be very general and it's going to deal with any kind of format that Core Audio can understand, whether it's compressed, variable bitrate audio, or linear PCM. So the code itself that we'll be looking through will deal with those circumstances.
So when we play a file, the basic jobs we have to do are to open the file and read some information about it. Then we use that information to create the audio queue and to configure it. Then we'll need to allocate buffers to read the data into.
And those are the buffers that we'll enqueue on the queue. Then we'll start the queue object playing, and it will decode the existing buffers. The queue's runtime state is managed in its communications to the client with a callback. So the queue will call a callback, and you'll see what we do to deal with that callback and keep the program doing its work.
And then at the end of the file we dispose and clean up. So those are the main steps that we're going to have a look through now. There's no real demo here; the demo is beep, play the file. So the program, AQTest, takes one argument, the name of the file that you want to play. The first thing we have to do to play the file is to open it, so the next line is to make an FSRef from the file path, and then AudioFileOpen will open that file.
If that succeeds, of course, then we have got a valid audio file. You will notice the last argument there to the AudioFileOpen call is myInfo.mAudioFile. myInfo is a structure that we're going to define in our program, which I've just given a type of AQTestInfo, and that contains basically the information that we need to configure or keep around in order to run this program. So it's the file ID; the description of the data format that's contained within the file; the queue itself; the buffers that we're going to use with the queue; where we're reading from in the file; how many packets of the file we can read at a particular time; and packet descriptions, which we'll go into in a minute. Then the other thing, aside from the procedural code of calling the APIs, that we'll need to define is the callback, and we'll have a look at that as well.
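Based on that description, the bookkeeping structure might look something like this (the field names and the buffer count are guesses for illustration; the real AQTest sample defines its own):

```c
#include <AudioToolbox/AudioToolbox.h>

#define kNumberBuffers 3    // assumed buffer count for this sketch

// Everything the example needs to keep around while it plays the file.
typedef struct AQTestInfo {
    AudioFileID                   mAudioFile;         // the opened file
    AudioStreamBasicDescription   mDataFormat;        // format of the data in the file
    AudioQueueRef                 mQueue;             // the output queue
    AudioQueueBufferRef           mBuffers[kNumberBuffers];
    SInt64                        mCurrentPacket;     // where we're reading from in the file
    UInt32                        mNumPacketsToRead;  // how many packets fit in one buffer
    AudioStreamPacketDescription *mPacketDescs;       // non-NULL only for VBR data
    Boolean                       mDone;              // set once the whole file has been read
} AQTestInfo;
```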
So our next step is to get the data format that's contained within the file. Once we have that, we have enough information to create an output queue, because when you create an output queue, the one thing you must define is the format of the data the queue is going to be dealing with. So we read that data format and create the queue.
You'll notice that I'm specifying CFRunLoopGetCurrent and kCFRunLoopCommonModes, and if I passed in null for those, that would effectively be the value of those two arguments. What they define, as you saw with MHC as well, is the thread and the mode that the queue object can call your callback on. And that gives you some control over the priority of the thread that you're going to use to fill the buffers, and you can have that behavioral control in your program. And then we get back a queue object from this call.
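The creation call itself is the shipping AudioQueue API; the callback name and the myInfo fields follow the sketch above:

```c
// Create an output queue for the file's data format; the callback will be
// invoked on the current run loop in the common modes, as just described.
OSStatus err = AudioQueueNewOutput(&myInfo.mDataFormat,     // format the queue will play
                                   MyOutputBufferCallback,  // refills buffers as the queue finishes them
                                   &myInfo,                 // user data passed to the callback
                                   CFRunLoopGetCurrent(),   // run loop for the callback
                                   kCFRunLoopCommonModes,   // run loop mode
                                   0,                       // reserved flags, pass 0
                                   &myInfo.mQueue);         // the new queue object
```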
So with buffer management, we allocate buffers, but we ask the queue to allocate the buffers for us, and the queue will own the buffers. But before we can allocate the buffers, we need to know some things about the format: basically how big we want them, which at the moment in the program we'll just keep at a constant size. But we also really want to know if we're dealing with variable bitrate or constant bitrate audio.
And just as a review, I thought we'd go through what the characteristics of these two kinds of audio mean. So CBR audio, constant bitrate audio, is linear PCM, which is uncompressed audio. There's also a collection of CBR codecs; IMA4 is one, QDesign is another. And they generate packets of audio that have both the same size in bytes and the same number of sample frames in each packet. In linear PCM, there's one frame per packet. So in stereo linear PCM, you'd have left, right, left, right.
And the left and right together is what we call a packet. And compressed audio, it will be a block of audio. So you can see in the diagram there that with CBR, every packet is the same size. In the audio stream basic description, bytes and frames per packet are both specified with non-zero values.
Now in our program, we've just got a 64K value for gBufferSizeBytes. And then we can use the bytes per packet field to tell us how many packets we can read at a given time. So that's what we need to understand in our program, and this is, of course, based on the data format contained within the file.
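For the CBR case, the calculation and the buffer allocation reduce to something like this (gBufferSizeBytes and the loop follow the example's description; AudioQueueAllocateBuffer is the shipping call):

```c
static const UInt32 gBufferSizeBytes = 64 * 1024;   // constant buffer size used in the example

// CBR: a fixed number of whole packets fits in each buffer.
UInt32 bytesPerPacket = myInfo.mDataFormat.mBytesPerPacket;   // non-zero for CBR formats
myInfo.mNumPacketsToRead = gBufferSizeBytes / bytesPerPacket;
myInfo.mPacketDescs = NULL;                                   // CBR needs no packet descriptions

for (int i = 0; i < kNumberBuffers; ++i)
    AudioQueueAllocateBuffer(myInfo.mQueue, gBufferSizeBytes, &myInfo.mBuffers[i]);
```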
And because it's CBR, we can calculate where any packet is. So we don't need packet descriptions; we don't need any external information to tell us that packet 20 is here. Now, that's not true with VBR, so with VBR we have to do a little bit more work. So with VBR, let's try and understand: what is VBR? Each packet of audio is a different size in bytes. That's the common case. You can also get VBR where the number of frames contained in each packet differs.
But the more common case is that the size of each packet is different. An example is MPEG-4's AAC, MPEG-1 or 2, Layer 3. They're both VBR formats. Apple Lossless, FLAC. These are all VBR formats. And in an audio stream basic description, a VBR format, you can tell because the bytes per packet will have a value of zero. In other words, we don't know. Each packet's different.
And so now, to locate a packet in just a big blob of data, we need to know a couple of bits of information: we need to know where in that blob of data a packet is, and we need to know the size of that particular packet. So the AudioStreamPacketDescription, which is a structure in CoreAudioTypes, defines both where each packet begins and how big each packet is. And the diagram there is fairly clear in that each packet can be a different size.
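That structure, declared in CoreAudioTypes, is essentially:

```c
// Describes one packet inside a buffer of VBR audio data.
struct AudioStreamPacketDescription {
    SInt64 mStartOffset;             // where this packet begins, in bytes
    UInt32 mVariableFramesInPacket;  // 0 unless the frames per packet also vary
    UInt32 mDataByteSize;            // how big this packet is, in bytes
};
typedef struct AudioStreamPacketDescription AudioStreamPacketDescription;
```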
In our program, we use the complete test, which is: bytes or frames per packet is zero. In other words, the file can't give you a constant value for either of those two fields, so we have VBR data. Then we go and ask the file: what is the largest size of the packets of data in your file? If I go back to the preceding slide (I'm going to use the laser pointer because everyone tells me not to), you see this guy here is the largest packet. But I don't want to have to start right from the beginning and go all the way through, because for some file formats that might mean I'd have to read the whole file.
So the call that we're using here is an estimation. The file can usually, if it doesn't know specifically, make an estimation without having to parse the whole file. And this will be a conservative estimation, and it will give you back the maximum packet size that could be contained in that file. And so this is a cheap call.
There's another call where you can get the actual value, but for some formats like MP3 as an example, with a large MP3 file, that can take you a long time because you have to read the whole file.
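In the Audio File API these two options correspond to two properties; a sketch of the cheap, conservative query described above (needs <stdlib.h> for malloc):

```c
// Conservative upper bound on packet size; no need to scan the whole file.
UInt32 maxPacketSize = 0;
UInt32 propSize = sizeof(maxPacketSize);
AudioFileGetProperty(myInfo.mAudioFile,
                     kAudioFilePropertyPacketSizeUpperBound,   // estimate; kAudioFilePropertyMaximumPacketSize is exact but can be slow
                     &propSize, &maxPacketSize);

// VBR: size the per-buffer packet count from the upper bound and allocate packet descriptions.
myInfo.mNumPacketsToRead = gBufferSizeBytes / maxPacketSize;
myInfo.mPacketDescs = malloc(myInfo.mNumPacketsToRead * sizeof(AudioStreamPacketDescription));
```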
[Transcript missing]