Digital Media • 1:01:02
This session covers Audio Units and Audio Codecs, the component types used to process audio data and to convert audio formats. These two extension mechanisms are covered in detail, including how Audio Units and the Audio Converter are used, how to address UI and control issues, and how developers can write their own extensions.
Speakers: Jeff Moore, Doug Wyatt
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Good afternoon and welcome to session 508, Audio Units and Audio Codecs. I'm Craig Keithley. I'm Apple's USB and FireWire evangelist. Today we're going to talk about Audio Units and Audio Codecs in Mac OS X. And Audio Units are very cool. They provide a way by which you can create a graph of processing units, where you can connect together a bunch of discrete, I'll even say digital signal processing, elements and process a signal through this graph in a nice, timely, time-synchronized manner. So to talk about that, let's bring up Jeff Moore, Core Audio Engineering.
In contrast to what Craig just said, I'm going to talk a little bit about the Audio Codec API. Audio Codecs are a new feature in Jaguar. They exist to plug into the Audio Converter to let it handle more formats, converting to and from all sorts of different stuff. Audio Codecs are basically a component API, just like Audio Units are. They obviously have different selectors than Audio Units, but they have the same basic layout. In Jaguar, like I said, the Audio Converter is going to use Audio Codecs as its extensibility mechanism.
There are three basic kinds of Audio Codec components. There are encoders, which transform linear PCM data into some other format. There are decoders, which translate that other format back into linear PCM. Then there are unity codecs, which transform between variants of the same format. An example of that would be an integer-to-float converter or a sample rate converter.
Audio Codecs, being components, are discovered just like any other kind of component. You use Component Manager routines. In particular, you will use FindNextComponent to iterate through the list of different kinds of components. Then once you find one that you're interested in using, you call OpenComponent on it to get an instance of it. Then when you're done, you just call CloseComponent and it will go away.
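As a rough sketch of that discovery dance (the 'adec' decoder component type is an assumption here; the real constants live in AudioCodec.h):

    // Hedged sketch: find the first audio decoder component and make an instance.
    // Zeroed fields in the description act as wildcards.
    ComponentDescription desc = { 'adec', 0, 0, 0, 0 };   // type, subtype, manufacturer, flags, flagsMask
    Component comp = FindNextComponent(NULL, &desc);      // pass NULL to start the iteration
    if (comp != NULL) {
        ComponentInstance codec = OpenComponent(comp);    // get a usable instance
        // ... configure and use the codec ...
        CloseComponent(codec);                            // and it goes away
    }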
Audio Codecs also have properties, like most of the other Core Audio APIs, and they provide you a way to configure the transformation being performed by the codec. Like other properties, the value of a property is an untyped block of memory whose contents are implied by the property's ID.
Some of the important properties for Codecs are the Requires Packet Description property, which says that the format you're dealing with requires external packetization information to describe how the data is broken up in the buffer. Then you have the Packet Frame Size, which tells you how many frames are in a given packet of data for this format.
Then you have the Variable Packet Byte Sizes property, which tells you whether or not the number of bytes in each packet is going to vary. This will be the case for variable bitrate formats like AC3, AAC, MP3, what have you. Then there's another property that tells you how big a packet is ever going to get for a given format. That's the Maximum Packet Byte Size property.
There are some more important properties that allow you to configure the translation that you're doing. You can set the input and output formats. This is important, and it may sound a little weird: some codecs will support outputting both 16-bit integer and floating point, and you might want to choose between them, or you may want to choose other aspects of the formats. The codecs will also provide a list of what formats they support in each direction.
Then you also have the Magic Cookie property. The Magic Cookie property provides an untyped block of data that is private to its format. It's there to provide out-of-band configuration information about the transformation that you're about to do. Most advanced codecs need to have this information supplied before they can do any useful work.
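Pulling those pieces together, a hedged sketch of the configuration calls might look like this (the property constant names follow AudioCodec.h and should be double-checked there):

    // Sketch: describe the transformation before initializing the codec.
    AudioStreamBasicDescription inFormat;   // filled out as linear PCM
    AudioStreamBasicDescription outFormat;  // filled out as the encoded format
    // ... fill in sample rate, format ID, channels, and so on ...
    AudioCodecSetProperty(codec, kAudioCodecPropertyCurrentInputFormat,
                          sizeof(inFormat), &inFormat);
    AudioCodecSetProperty(codec, kAudioCodecPropertyCurrentOutputFormat,
                          sizeof(outFormat), &outFormat);
    // Hand over the out-of-band configuration data, if the format has one.
    AudioCodecSetProperty(codec, kAudioCodecPropertyMagicCookie,
                          cookieSize, cookieData);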
Audio Codecs operate in two states. They can be initialized or uninitialized. You can move between these states using the AudioCodecInitialize and AudioCodecUninitialize routines. The difference between the two states is that when a codec becomes initialized, that's a signal to the codec that it's going to lock down the transformation it's going to do, allocate all the buffers that it needs, and load any tables that it needs to do its job. This is important because from that point on, until you uninitialize the codec, you can't change anything about the transformation that the codec is going to do.
So there are also properties that can only be known once you have initialized the codec. A good example is the Maximum Packet Byte Size. You can only really know that once you've determined what the input format is and you've supplied the magic cookie for that format; then the codec has enough information in its possession to say, okay, the packets are never going to get any bigger than X number of bytes. So you won't be able to find that information out until you actually initialize the codec.
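In code, that ordering looks roughly like this; AudioCodecInitialize can also take the formats and cookie directly, and the NULL arguments here assume they were already set through properties beforehand:

    // Sketch: lock down the transformation, then ask the now-answerable questions.
    AudioCodecInitialize(codec, NULL, NULL, NULL, 0);

    UInt32 maxPacketBytes = 0;
    UInt32 size = sizeof(maxPacketBytes);
    AudioCodecGetProperty(codec, kAudioCodecPropertyMaximumPacketByteSize,
                          &size, &maxPacketBytes);

    // Later, to change anything about the transformation:
    AudioCodecUninitialize(codec);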
To move data through a codec, the codecs use a push and then pull model. Input data is provided to the codec using AudioCodecAppendInputData. The input data is then copied into an internal buffer of the codec. The codec will also return how much data it has consumed from your input buffer. If external packet descriptions are required for a format, you have to supply whole packets of information, not partial packets. A very important family of formats that requires packet descriptions is the MPEG-4 formats, like AAC.
To get data back out of the codec, you use AudioCodecProduceOutputPackets. Output is always produced in full packets of data; you can't get partial packets. Most codecs just don't have the capability to cache the data or partially process partial input packets. AudioCodecProduceOutputPackets will also return a status value to tell you a little bit about what's going on internally in the codec, so that you can know whether you need to feed it more data or whether it has enough data for you to keep pulling out more packets. We provide an SDK for building Audio Codec components. It provides a C++ class library. I'm going to show you a little around that SDK here in a second.
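Before the SDK tour, here's a minimal sketch of that push-then-pull flow, assuming a constant-bitrate format that needs no external packet descriptions (so those arguments can be NULL):

    // Push input in whole packets; the codec reports how much it actually took.
    UInt32 ioBytes   = inputByteCount;     // in: bytes offered, out: bytes consumed
    UInt32 ioPackets = inputPacketCount;   // in: packets offered, out: packets consumed
    AudioCodecAppendInputData(codec, inputBuffer, &ioBytes, &ioPackets, NULL);

    // Pull output in whole packets; the status word says whether more input is needed.
    UInt32 outBytes   = outputBufferSize;  // in: room available, out: bytes produced
    UInt32 outPackets = wantedPackets;     // in: packets requested, out: packets produced
    UInt32 status     = 0;
    AudioCodecProduceOutputPackets(codec, outputBuffer, &outBytes, &outPackets,
                                   NULL, &status);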
Could I have demo machine four? So the Audio Codec SDK is just a set of C++ classes. Since the actual packaging of the codec is done as a component, the SDK implements all the nitty-gritty of dealing with the component aspect of things, and you can just override a few methods here and there. It's very similar to the way the Audio Unit SDK is laid out.
You have a base class that provides the abstraction points of all the routines in the Codec API. As you can see, there just really isn't a whole heck of a lot of routines in the Codec API. You have the property routines. You have initialize and uninitialize. You have append input data and produce output packets. Then you have the reset routine, which allows you to reset the codec's internal state. You can also reset and clear all its internal buffering to begin processing a new stream of data.
Then there are further specializations of the ACCodec base class that provide different levels of implementation. The first one is the ACBaseCodec class. It goes about its job by providing the raw mechanisms necessary to manage things like the input and output format lists, as well as dealing with the management of the magic cookie. It also further breaks out the properties into individual virtual methods that you can override to make it easier to implement this stuff.
Then the final base class that's provided is ACSimpleCodec. ACSimpleCodec goes the extra mile of providing some primitive buffer handling. It implements a ring buffer for input and provides routines for managing this ring buffer. Then in your subclass of this, you would override the other routines and use these routines to actually access the input data.
The SDK will come with a couple of preliminary implementations of some codecs. The one I have up here is an IMA codec. IMA4 is a pretty common codec. It's reasonable in quality and mildly lossy. The way the IMA codecs are built up is that they have a base codec class that wraps up all the common tables and the management of the per-channel state that is needed to do IMA encoding and decoding.
Then there are individual subclasses to implement a decoder and an encoder. This is the header file for the encoder class. As you can see, it doesn't really have a whole lot to do. It overrides the routines to figure out when someone changes the input format and when someone changes the output format. It also overrides produce output packets so that it can handle the encoding. This implementation of IMA uses a static routine to actually do the signal processing.
I want to go through, just quickly, the produce output packets routine to give you a feel for what you're going to have to do to implement a codec. The first thing to note is that the codec has to be initialized before you can call produce output packets. You have to check, make sure that you're in the right state before you can do anything. If it's not, we throw an exception here. Since this is a C++ framework, all the exceptions are caught up at the component layer. Error codes are produced based on that.
In the codec, once you know you're initialized, you've got to go figure out how many packets are available in the input buffer. The IMA encoder is based off of the simple codec class, so it's using the ring buffer routines in its base class to figure out how big its input buffer is. Now, this codec is built to take only a 16-bit integer as its input, because that's what the IMA algorithm is defined to do.
So then, once it figures out where its input data is, and how much is there, and whether or not there's actually enough input data that we can produce an output packet, we find out how many packets the client has asked us to produce, and then we go through, we set a few things up, and then we end up calling the encode channel routine that just steps through each buffer and encodes each channel individually. Not the most efficient algorithm, but it does the job.
The EncodeChannel routine is pretty involved, and the IMA algorithm is available from the IMA website for those who are interested in it. That's pretty much all there is to the Codec SDK. Next, I'd like to bring up Doug Wyatt again. He's going to talk more about Audio Units and everything that's new and wonderful about writing them.
[Doug Wyatt]
Thanks, Jeff. Okay, in this section of the session, I'm going to talk about Audio Units. I will be talking about what they do and how they're packaged, and although we'll touch on some uses of Audio Units from the client side, I'm going to focus on writing a simple Audio Unit and an Audio Unit view, which is a new feature we have for Jaguar: a user interface component for an Audio Unit.
So just to review for those of you who haven't looked at Audio Units yet: the basic purpose of an Audio Unit is to provide a nice little modular piece of audio signal processing. It can have an associated view plug-in, a user interface for editing it. An Audio Unit can be connected to other Audio Units through any number of input and output buses or connections.
Audio Units use a pull model when they're connected together, meaning that if you've got a chain of Audio Units, you pull on the bottom one, which will then pull on the one above it for input, and so on up the chain until the input is obtained. And then from there, it processes its output. And that's how AUGraph works.
Audio Units operate on 32-bit floating point numbers as the canonical format. It is theoretically possible to write Audio Units that operate on other data formats, but there's no guarantee that you'll be able to connect such Audio Units with any other Audio Units since 32-bit float is the canonical format.
As Jeff was just mentioning, we do use the Component Manager for packaging Audio Units for various reasons, including our responsibilities to QuickTime. And it's funny because I've talked to developers about how we use the Component Manager, and there's some really major developers who've never used it and don't know how it works.
And we're pleasantly surprised to find out there are really only three functions they need to know in order to use it: FindNextComponent, OpenComponent, and CloseComponent. So once you've opened the Audio Unit component, then from the client point of view you're free to make calls on the Audio Unit, like initializing it, setting properties, and so on.
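As a hedged client-side sketch (using the version 1 'aunt' component type described later in this session; the wildcarded fields simply match the first Audio Unit found):

    // Find, open, initialize, and eventually dispose of an Audio Unit.
    ComponentDescription desc = { 'aunt', 0, 0, 0, 0 };
    Component comp = FindNextComponent(NULL, &desc);
    AudioUnit unit = OpenComponent(comp);

    AudioUnitInitialize(unit);
    // ... set properties, make connections, render audio ...
    AudioUnitUninitialize(unit);
    CloseComponent(unit);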
So from here on, I'm going to focus on the process of writing an Audio Unit. The first thing you'll want to think about is how many input and output buses your Audio Unit needs. It's important to bear in mind that when we talk about a bus, it can actually be a multichannel bus.
In 10.1, if there is a multichannel bus, we're using interleaved formats. But we are working on a version 2 Audio Unit specification in which multichannel buses are represented as an array of deinterleaved buffers, and I'll be explaining more about that later. In any case, the first step when you want to begin writing an Audio Unit is to look through the base classes in our SDK, because we have a fairly wide variety of them, and see if you can find one that gives you the most leverage for free.
So these are the main base classes in our SDK, and I'll be talking about each of them. At the root of the class hierarchy is AUBase, and its main job is just to translate the component entry selectors and the get and set property calls into C++ virtual methods. These virtual methods have, in many cases, default implementations in the base classes, but of course, since they are virtual methods, you're free to override them to customize the behavior.
They manage AU Scopes and AU Elements, which are objects that correspond to the scope and element concepts that you'll see in the API. For instance, whenever you set an Audio Unit parameter, you have to say which parameter in which scope in which element. An element is simply a bus number when you're talking about the input or the output scope.
So you might be talking about parameter three in the input scope element number zero, for instance. And AUBase also handles a lot of other housekeeping, as I said, providing default implementations for the somewhat elaborate Audio Unit API. But as we'll see in the example later on, that housekeeping that happens in the base class for you makes it so that your actual signal processing code can be very simple.
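For instance, setting that parameter from the client side looks roughly like this sketch (an offset of 0 means the change applies at the start of the next render buffer):

    // Sketch: parameter 3, in the input scope, on element (bus) 0.
    AudioUnitSetParameter(unit,
                          3,                       // parameter ID
                          kAudioUnitScope_Input,   // scope
                          0,                       // element / bus number
                          0.5f,                    // new value
                          0);                      // offset in frames into the next buffer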
Okay, working our way down in the class hierarchy, the first subclass of AUBase that we'll talk about is AUEffectBase. This may be a little simplistic for some algorithms, but for a lot of algorithms it's perfectly useful. It assumes one input bus and one output bus, and that they each have the same number of channels.
The effect I'm going to show you later is a multi-tap delay, and it's implemented on top of AUEffectBase. The way it works is that AUEffectBase issues a virtual method call asking your derived class to create what we call a DSP kernel object for each channel. So if your effect is invoked in a stereo context, the base class will ask your subclass to create two kernel objects, one to process each channel. Then when it comes time for your Audio Unit to render its audio, the base class can simply call your two kernel objects separately, passing pointers to mono buffers to process.
And so, for algorithms where there aren't any relationships between what's happening in the different channels, you can write your DSP code once and have it work in a mono or stereo or five-channel or whatever context. A slight specialization of AUEffectBase is AUInlineEffectBase, and this just adds a small optimization.
It pulls its input samples into the same buffer into which your output samples are going to be rendered. So it's assuming that you're going to process your samples in place, as you can in many algorithms, and this is obviously a much better thing to do to the cache when you're doing real-time DSP.
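A minimal sketch of that kernel pattern, with hypothetical class names and signatures that only loosely follow the SDK being described (the exact SDK declarations may differ):

    // Hypothetical effect built on AUEffectBase.
    class MyKernel : public AUKernelBase {
    public:
        MyKernel(AUEffectBase *inUnit) : AUKernelBase(inUnit) {}
        // Called with one channel's worth of samples to process.
        virtual void Process(const Float32 *inSource, Float32 *inDest,
                             UInt32 inFramesToProcess, UInt32 inNumChannels,
                             bool &ioSilence);
    };

    class MyEffect : public AUEffectBase {
    public:
        // The base class calls this once per channel, e.g. twice for stereo.
        virtual AUKernelBase *NewKernel() { return new MyKernel(this); }
    };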
Okay, moving on to another subclass of AUBase, we have AUConverterBase. In our session on Tuesday, we talked a bit about the Audio Converter, and Jeff was just describing the Audio Codecs, which are plug-ins into the Audio Converter. So the job of AUConverterBase is simply to wrap up one of our Audio Converter objects in such a way that it can be used in the middle of an Audio Unit chain.
So the emphasis -- the usefulness of the Audio Converter is a bit different in the world of Audio Units. There don't tend to be many needs for format conversions, except in some special cases, which I'll get to in a minute. But in the middle of a chain, you'll probably only be using it for things like doing sample rate conversion, deinterleaving and interleaving, and channel mapping, which means adding or removing channels or rearranging the order of channels like you may have to do sometimes with multichannel formats.
Deriving from AUConverterBase, we have AUOutputBase, which we use internally as the base class for all of our output units. Output units have some extra virtual methods, start and stop; those are also component selectors for output units. For hardware output units, it's obvious: start and stop mean start and stop the hardware.
You could write an output unit deriving from AUOutputBase that wrote audio to an audio file, and you might use the start and stop methods as the place where you would open the file, prepare it, and set up the header; then when you close the file in the stop method, you could fix up the chunk sizes and whatnot that you have to do when writing AIFF files, for example.
The one other thing that's really useful about the output unit is that since it derives from AU Converter Base, it gives you very easy access to format conversions. There are a number of different ways you could use this. For instance, if your application is just generating audio from -- maybe it's playing an integer audio file, you can set up the output audio unit to accept integer input. Just say, okay, my input stream format is 16-bit stereo integer.
And by doing that, AUOutputBase will automatically insert a converter into the chain, converting that integer format into the floating point data format that is typically required by the HAL. So that's one way to use it. The other way is when you're writing to a file in an integer format.
You might be receiving your samples from another Audio Unit, which is generating float, and your file format will probably be integer. So the converter that's built into the output unit can perform that conversion for you with your having to do almost no work at all. It's all handled in the base class and in how your clients set the input and output stream formats for your Audio Unit.
Okay, another subclass of AUBase is MusicDeviceBase, and this one is fairly substantial and separate because a music device is a completely separate type of component. It's based on Audio Unit, but it has a large number of extra selectors, allowing you to control the Audio Unit by sending it MIDI events and what we call extended note and control events. For example, our extended note events allow you to specify pitch more precisely than with the 128 MIDI note numbers; you can specify fractional pitches and so on.
The Music Device, though, is an Audio Unit, since that's what it derives from. So it renders audio just like any other Audio Unit. You make the same calls to it to have it render audio. And so you can plug it into chains of Audio Units. And we see it as a really good way for people to implement software synthesizers. In the system, we have our own downloadable sample software synthesizer built in. You'll see it as a music device. And I forget its signature offhand, its component ID, but it's in the header files. You'll see it there.
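For example, driving a music device with raw MIDI is a single call per event. Here's a small sketch, where synthUnit stands for whichever music device instance you've opened:

    // Note-on: status 0x90 (channel 0), middle C (60), velocity 100,
    // applied at sample offset 0 of the next render buffer.
    MusicDeviceMIDIEvent(synthUnit, 0x90, 60, 100, 0);
    // Matching note-off.
    MusicDeviceMIDIEvent(synthUnit, 0x80, 60, 0, 0);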
Okay, returning back to the top of the class hierarchy, the AUBase class manages some other objects called AUScope and AUElement. As I mentioned before, in our API we talk about parameters as existing within elements of scopes, and these objects are where the parameter values are stored. You can subclass these objects if you want to store any additional state per input or output bus in your Audio Unit.
The other thing they do is manage the stream formats for your Audio Unit's inputs and outputs. There are variables there that say, okay, this is the current stream format for input number one, and that lets us do all of the format negotiation. Well, a lot of the format negotiation, your subclass has many ways it can hook into that, but when you're connecting two Audio Units, that format negotiation is handled largely by the base class. AU Element has a subclass, AU Input Element, which obviously operates on your input buses. Its main job is to obtain your incoming audio data, whether it's from an upstream audio unit or a client-supplied callback function.
We also have AUOutputElement, another specialization of AUElement, and its main job is to maintain some buffers into which that output's audio is rendered. There's a reason for that: if you've looked at some of our audio code, you'll notice that we do support fan-out connections, meaning one audio unit's output can be connected to two other audio units in a chain.
And in order to support that, the AUOutputElement provides a cache: when the first destination unit pulls and says, "Hey, give me your audio," it gets rendered into this cache buffer, so that when the second one asks, "Hey, give me your audio," it's already been rendered, and that cached audio can just be passed back. So having reviewed the basic concepts in the API, I'd like to go over to machine five here, which is hopefully awake. Okay, it's awake now. Let me clean up after the previous session.
And I'd like to just walk you through the process of writing a simple multi-tap delay audio unit. In Jeff's session, Jeff's portion of this session, he showed a lot of the implementation of the base classes, but unfortunately there's a lot of code in the base classes. So I've just given you an overview of what they are and what they do, and you can explore them in more detail and ask us questions about them on the API list.
But right now I'd like to just focus on how little code is actually necessary to create a functioning audio unit. The Audio Unit I chose to write, since I'm not a great DSP programmer by any means, is just a simple multi-tap delay. It derives from AUEffectBase.
I'm going to close this. The first thing I'd actually like to look at is since we are writing a component here, we do have to provide the component manager with a few resources in order to find our component. And when I came to Apple, this process confused me enough that I said, "Okay, I just want to have a really easy way to do this." And so by defining about eight macros here and then including this file, which contains the magic, I get the three or four resources that are necessary to define my Audio Unit component and in such a way that the component manager will find it.
So the other little bit of Component Manager magic is all encapsulated into this one macro here, the component entry macro. What this expands into is the single entry-point function for your component, which we call the dispatch function for all the component selectors, and that's where the translation of all those component selectors into C++ virtual methods is made.
The next thing I'd like to show you here: the one virtual method that's really important and required to be overridden in AUEffectBase is NewKernel. As you remember, the kernel object is what gets called by the base class to process each channel in a stereo or multichannel situation. So when I'm asked to create a kernel, I'm simply creating an instance of a multi-tap kernel, which is another class in this file here. These properties we don't need to look at right now; they play into the UI.
And here's the MultiTapKernel class. It's just maintaining a little bit of state per channel, like the maximum number of frames of delay. We've got a five-second maximum delay time, so I'm translating that into some number of sample frames, I'm allocating a delay buffer of that size, and I'm initializing some state for using that buffer as a ring buffer for the delay.
While we're looking at the kernel, its processing function gets called with a pointer to a source buffer of samples, a pointer to a destination buffer of samples, and a number of frames to process. And here's the trick by which we can make this one function work with both interleaved and deinterleaved data: we're told how many interleaved channels are in the input and output streams.
So if it's stereo interleaved, inNumChannels will be two, and we'll be expected to go skipping through the samples two at a time, both on the source and the destination. We recognize this isn't the most efficient way to do things, and I'll be discussing more about interleaved versus deinterleaved issues later. But this is how it works for interleaved.
And so at processing time, we just do a little work to look up the parameters of the delay. Okay, this is a five-tap delay, and for each delay tap we just have a level and a delay time.
So we're issuing these methods, which are implemented in the base class: give me this parameter by its number. We're finding out the wet/dry mix, converting a percentage to a ratio. And here we're finding out what the delay time, which is in seconds, corresponds to in terms of number of samples, and limiting it.
This is all pretty straightforward DSP code. As we go through the input buffer, we're writing the incoming data to the delay line. We're saving a fraction of the input sample as our dry signal. Then we're walking through the five delay taps and mixing from each delay tap into the output signal.
So that's the DSP code. I think there's only about one more function in here I'd like to show you at this point, which is GetParameterInfo. So far we've seen pretty much everything that we needed to do. Actually, I did forget one thing: the constructor of the MultiTapAU, the Audio Unit class itself. This is where we have to initialize the parameters, because if we don't do that here, the parameters won't exist, and when we go later to find out what their values are, we'll miss them. Here we see what the parameters are: there's the wet/dry mix, and for each of the five taps, there's a delay time and a level.
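A hedged sketch of what that constructor amounts to, with hypothetical parameter IDs (the kParam_ names) and a SetParameter helper whose exact shape may differ in the SDK:

    MultiTapAU::MultiTapAU(ComponentInstance inInstance) : AUEffectBase(inInstance)
    {
        CreateElements();   // make sure the global scope/elements exist first
        // Give every parameter an initial value so it exists before it's read.
        SetParameter(kParam_WetDryMix, kAudioUnitScope_Global, 0, 50.0f, 0);
        for (int tap = 0; tap < 5; ++tap) {
            // Assumes the per-tap parameter IDs are laid out consecutively.
            SetParameter(kParam_Tap1DelayTime + tap, kAudioUnitScope_Global, 0, 0.5f, 0);
            SetParameter(kParam_Tap1Level + tap, kAudioUnitScope_Global, 0, 0.0f, 0);
        }
    }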
This call to create elements simply makes sure that the scope in which I'm creating these parameters exists before I try to create the parameters. Okay, so that's the DSP side of this Audio Unit. That's all there is to it. There's a few more things that are all user interface related.
One method that we do have to override is GetParameterInfo, so that a generic user interface, which is how I'm going to show this audio unit first, can know what the parameters are, what their ranges are, and what their units are. So here in GetParameterInfo, our caller is specifying the parameter and scope, and we're passing back this AudioUnitParameterInfo structure.
And since we only actually have three kinds of parameters here, this is a short function. We have the wet/dry mix, which goes from 0 to 100%, and for each of the five taps, a delay time and a level. This function is all that the generic Audio Unit view will need in order to work with our Audio Unit.
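A sketch of that override, hedged because the exact AudioUnitParameterInfo fields and error constants should be checked against the headers, and the kParam_ IDs are the same hypothetical names as above:

    ComponentResult MultiTapAU::GetParameterInfo(AudioUnitScope inScope,
                                                 AudioUnitParameterID inID,
                                                 AudioUnitParameterInfo &outInfo)
    {
        if (inScope != kAudioUnitScope_Global)
            return kAudioUnitErr_InvalidParameter;
        switch (inID) {
        case kParam_WetDryMix:
            strcpy(outInfo.name, "Wet/Dry Mix");
            outInfo.unit         = kAudioUnitParameterUnit_Percent;
            outInfo.minValue     = 0.0f;
            outInfo.maxValue     = 100.0f;
            outInfo.defaultValue = 50.0f;
            break;
        // ... the same idea for each tap's delay time (seconds) and level ...
        }
        return noErr;
    }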
So I've built this -- I'm not going to take the time to compile it all right now, but I've built this Audio Unit into a component bundle and I've installed it into system library components. And I have a test application. How are we doing for time? Should I go through the test step in a bit of detail? Yeah, I'll just show you quickly, because it shows off some of our other APIs.
So it'll help you understand what's going on when you hear the program, too. What this program does is create an Audio Unit graph which connects a DLS software synthesizer to the multi-tap delay that I just wrote, and from that, we go straight to the default audio output unit. This bit of code here creates nodes in the graph for those three components, which get specified by their component descriptions.
Having created the nodes, we just establish connections between them: from the synth to the delay, and then from the delay to the output node. We open the graph and initialize it. Then we call AUGraphGetNodeInfo to get from the AU graph structure to the actual audio units that we're going to talk to; for instance, we're going to send notes to the synth and parameter changes to the delay.
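Roughly, that graph-building code looks like this sketch (the Jaguar-era AUGraphNewNode call and the component descriptions for the three units are assumptions; details hedged):

    AUGraph graph;
    AUNode  synthNode, delayNode, outputNode;
    NewAUGraph(&graph);
    AUGraphNewNode(graph, &synthDesc,  0, NULL, &synthNode);    // DLS software synth
    AUGraphNewNode(graph, &delayDesc,  0, NULL, &delayNode);    // the multi-tap delay
    AUGraphNewNode(graph, &outputDesc, 0, NULL, &outputNode);   // default audio output

    AUGraphConnectNodeInput(graph, synthNode, 0, delayNode,  0);  // synth -> delay
    AUGraphConnectNodeInput(graph, delayNode, 0, outputNode, 0);  // delay -> output

    AUGraphOpen(graph);
    AUGraphInitialize(graph);

    AudioUnit synthUnit, delayUnit;
    AUGraphGetNodeInfo(graph, synthNode, NULL, NULL, NULL, &synthUnit);
    AUGraphGetNodeInfo(graph, delayNode, NULL, NULL, NULL, &delayUnit);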
I don't know if there's a reason why I'm hanging on to the output. Probably none. But in any case, we've built the graph. It's all connected and it's ready to run. And it's doing DSP at this point, except that nobody's actually generating any audio. We've got a synthesizer and nobody's controlling it. So another function in here.
This is just a quick whirlwind tour of the music sequencing APIs and how you can build up songs to play on the fly. I create a music sequence object. I get its tempo track. I set its tempo. At time zero, the tempo is 120 beats per minute. Then I add a track into which I'm going to add all of my other events.
On MIDI channel 0, I'm going to have a piano sound, set with a program change message. On MIDI channel 1, I've got an electric piano. I'm going to pan one channel hard left and the other hard right so that you can really hear for sure that this effect is operating in stereo.
And then I'm going to randomly generate some notes. And although I kind of like totally random notes, I actually made this less random. I'm generating arpeggios, which are in various keys. But that's what this loop is doing. It's just generating some nice pretty patterns of notes and adding the events to the music track that we created. So now we've got a sequence full of notes, program change at the beginning, pan.
Then I can connect the sequence to the graph of audio units that I created earlier. I can assign this track full of MIDI events to the synthesizer node in that graph. I can create a music player object. Then I assign that player to play the sequence I've just created. And then, boom, I call MusicPlayerStart and I hear noise. So let's see what that sounds like.
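For reference, the sequencing and player calls just described look roughly like this hedged sketch (the note-generating loop is collapsed to a single event):

    MusicSequence sequence;
    MusicTrack    tempoTrack, track;
    NewMusicSequence(&sequence);
    MusicSequenceGetTempoTrack(sequence, &tempoTrack);
    MusicTrackNewExtendedTempoEvent(tempoTrack, 0.0, 120.0);   // 120 bpm at beat 0
    MusicSequenceNewTrack(sequence, &track);

    MIDINoteMessage note = { 0, 60, 100, 0, 1.0f };            // channel, note, velocity, release velocity, duration
    MusicTrackNewMIDINoteEvent(track, 0.0, &note);             // in practice, done in a loop

    MusicSequenceSetAUGraph(sequence, graph);                  // attach the sequence to the graph
    MusicTrackSetDestNode(track, synthNode);                   // this track drives the synth node

    MusicPlayer player;
    NewMusicPlayer(&player);
    MusicPlayerSetSequence(player, sequence);
    MusicPlayerStart(player);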
So we can hear, this is just the dry sound. Do you want it louder? OK. OK. Can you hear it OK? So we don't have my effect, well we do have my effect in the signal chain. Here's the generic view for the effect. You can see that the level on all five taps is zero. So let's get some delays going.
Okay, so that's how easy it was to create an Audio Unit to do a multi-tap delay with the generic user interface. Back to the slides, please. So a little later I'll show you how I've created a custom user interface for that, but there are some other loose ends that we should cover first. There's been some concern about the way we've been using interleaved data as our canonical format within Audio Units. We went to a developer kitchen a couple of months ago, and the majority of the room told us they'd really rather be working with deinterleaved data instead of interleaved.
And we have some other things we want to do to solidify and clarify the specs, so we're going to work towards a version 2 specification of the Audio Unit API. The two main differences are that we're going to change the way we describe the components, and we're going to pass a different data structure to support deinterleaved buffers.
As for the component description differences: in the version 1 API, as we've shipped in 10.1, we use the 'aunt' component type, and we also enforce the use of certain component subtypes. We got some feedback that we were taking up 64 out of the 96 bits available for describing a component. So in the version 2 API, we're only going to use the component's type, 'aufx' for an effect, for instance, instead of 'aunt' plus an effect subtype. And we'll leave the manufacturer and subtype available for developers.
As for the differences in the rendering part of your code, as I mentioned, in the version 1 API we're using interleaved buffers anywhere you've got a multichannel stream, and an interleaved buffer can fit into our AudioBuffer structure. But in the version 2 API, in order to support deinterleaved buffers, we need to use a different data structure, which is an AudioBufferList.
Fortunately, this isn't too traumatic a change, because an AudioBufferList has already been defined to be an array of AudioBuffers. These are data structures that are in the HAL API as well as being used by Audio Units. I'll go into the details a bit later on, but the main changes are that AudioUnitRenderSlice becomes AudioUnitRender, and some callback functions from the Audio Unit change accordingly.
And just to reiterate why we're making this somewhat dramatic change: it seemed like developers really wanted us to do this, and it's better to do it sooner than later, before there are even more Audio Units written in the world. A lot of people's existing DSP code uses deinterleaved buffers (this may be partially due to VST, but there are probably also some cache efficiency reasons), and we respect that and want to make it the canonical format. And it does offer us some optimization opportunities when we're munging channels around in multichannel situations.
So for you as the potential author of an Audio Unit, here are some implications of those changes. You may have a reason to support the version 1 API if you need to ship on Mac OS X 10.1, or if you want to depend on the Audio Units that we shipped in 10.1. But at some point after Jaguar, any new Audio Units that Apple ships will only be published with the new component types, which means they will only support the new API.
But in order to make this transition as painless as possible, we're going to do as much as we can to make it possible to write one Audio Unit that conforms to both APIs. There are a couple of ways we can do this. One is telling you how to use component aliases to tell the Component Manager there are two components here, when actually those two components point to the same piece of code, which is your Audio Unit. And that piece of code can find out which component description was used to open it, and from there it can know whether it's expected to operate in version 1 or version 2 mode.
The other thing we can do is a lot of work in our base classes to hide the differences. Our current SDK doesn't support this, but we're very close to being done with a new SDK where essentially everything is being done with version 2 in mind, and the old version 1 calls are being implemented in terms of the new version 2 ones. And since the AudioBufferList and AudioBuffer structures are so similar, that turned out to be not too much work.
And in fact, if you derive from AUEffectBase or AUConverterBase, the change may be completely transparent to you, because these details of interleaving and deinterleaving are handled in the base classes below you, or above you, depending upon how you look at the class hierarchy. From the client point of view, you can't mix the version 1 and version 2 types in an audio unit graph.
It would just be too difficult. If an audio unit had both personalities, you'd see it both ways; you'd be able to say, okay, I've got two version 2 units here that I can connect up. But if you connect a version 1 unit to a version 2 unit, you would almost certainly get a stream format mismatch.
One of them would be saying, hey, I want deinterleaved input, and the other one would be saying, hey, I'm providing interleaved output. You may be able to insert a converter to get around that, but in the AUGraph API we're going to be strict about that and just not permit it. From the client point of view, you'll see the changes to use the AudioBufferList instead of the AudioBuffer in your callbacks from the Audio Units, and you'll be using AudioUnitRender instead of AudioUnitRenderSlice.
Okay, moving back on towards the world of Audio Unit Graphic User Interfaces. We have a new component type called Audio Unit Carbon View, which wraps up an entire user interface for an Audio Unit into a component also. We have a generic View component, which I just showed you. It essentially just interrogates all of the Audio Unit parameters in the global scope and shows them one after another.
But we have defined a property for the Audio Unit to tell its host, here are the component descriptions, there can be one or more, for the view components that know how to edit me. And I'll show you that in my sample unit in a moment. So what the Audio Unit Carbon View does, its job is simply to create a Carbon User Pane, which can be embedded anywhere in the host application's window. That User Pane, I think it's a Carbon term for just a view container of some sort.
That view can contain either more Carbon controls or a completely custom UI. It's completely up to you, as long as you use Carbon events to receive your events from the OS. We didn't want to get into any of the messiness of OS 9, where applications, for instance VST hosts, would have to dispatch mouse and keyboard events, window updates, and who knows what else down to the plug-in.
But a lot of plug-in models, you know, people had to, in their host applications, manually dispatch events from the operating system to the plug-in. But if you use Carbon events, the plug-in can just get those messages directly from the OS, and things are just wonderfully simple by comparison to OS 9.
And similarly to both the Audio Codec SDK and the Audio Unit SDK, for Audio Unit Carbon views we're going to supply a small C++ framework to make it as simple as possible for you to write these components.
Those of you who use Cocoa may be asking, "Well, so what about us? Why not an Audio Unit Cocoa view?" There are a lot of tricky issues in mixing Carbon and Cocoa. There are some directions in which it already works. There are others in which it doesn't. And we really want to make this as transparent as possible. So we can't commit to that, to Jaguar, but we are very aware that there are people who want to write their user interfaces in Cocoa, and we're going to find a way to make it work if we can.
So moving back to the Carbon view component: the base class, AUCarbonViewBase, all it really has to do is handle the one component selector, which is, in effect, "open yourself," although it has a lot of parameters. So the component gets told: here's your component reference.
It's getting told the Audio Unit it's being expected to control. It's being told the window into which it's going to build its UI and the parent control within that window. These are all Carbon terms. The parent control might typically be the window's root control, and we'll see that in the sample program. That's how I'm doing things.
The host application is providing a location and a requested size for your view component, but the size is actually only a suggestion. If the host says, well, you can be 200 by 300 pixels, and you say, no, I want to be 400 by 800 pixels, you're free to make yourself as large as you want. That's just the host's request, because he's actually free to go back and make you smaller afterwards, or embed you inside scroll bars, or whatever he wants.
Or he could be really nice and say, okay, he's going to be 400 by 800; I'm going to give him the whole window and make the window that size. So the last argument there is the user pane which the view component creates and returns to the host. The other main function provided by AUCarbonViewBase, our base class, is to manage the connecting of controls to parameters.
And the mechanism by which it does that we call Parameter Listeners. And this is our way of making it possible for multiple pieces of software that are all working with the same Audio Unit to be aware of changes that the others make to the parameter values. For instance, you might have -- I can think of a lot of examples. You might have two views on the screen, two different user interfaces for the same Audio Unit.
And one of them could be moving a slider; actually, I'll show you this in my program: moving the slider in one will make the slider move in the other. Or your program might be automating the Audio Unit, and the user interface wants to reflect those changes.
There are a lot of possibilities here. We've created a centralized dispatch mechanism called the AU Parameter Listener. All that you have to do, when you change parameters, is to call AUParameterSet, which goes through this notification mechanism, instead of going directly to AudioUnitSetParameter.
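A hedged sketch using the AudioUnitUtilities.h listener calls; MyParamsChanged is a hypothetical callback, kParam_WetDryMix is the same hypothetical parameter ID as in the earlier sketches, and the run-loop and interval arguments are just typical choices:

    AUParameterListenerRef listener;
    AUListenerCreate(MyParamsChanged, /*inUserData*/ NULL,
                     CFRunLoopGetCurrent(), kCFRunLoopDefaultMode,
                     0.1f /* notification interval, seconds */, &listener);

    // Identify a parameter by unit, ID, scope, and element.
    AudioUnitParameter wetDry = { delayUnit, kParam_WetDryMix,
                                  kAudioUnitScope_Global, 0 };
    AUListenerAddParameter(listener, NULL, &wetDry);

    // Change the value through the notification mechanism so every other
    // listener (other views, host automation) hears about it too.
    AUParameterSet(listener, NULL, &wetDry, 75.0f, 0);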
And you can set up one listener to listen to any number of parameters. In our view components, for instance, there's just one listener, and it listens to all of the parameters in that view. For more details, you can look at our new header file for Jaguar, AudioUnitUtilities.h. I'm getting low on time, but writing a Carbon view is also pretty simple, so I'm going to show you how to write one.
I'm in the wrong program, no wonder. Wrong project. Okay. So this is the custom user interface for the multi-tap delay I showed you earlier. It derives from AUCarbonViewBase, which I was just telling you about, and it only has to override one virtual method called CreateUI. It has the same magic macro to create a component dispatcher for me.
And then this is just a bunch of kind of tedious Carbon UI code to manually create controls on the fly. This C++ object specifies the parameter in terms of its Audio Unit, parameter ID, scope, and element number. I create a static text control, which is the name of the parameter.
And then the SDK has this handy little utility function that will, for one parameter, create a slider with labels on both ends and an edit text field over to the side, and connect them all up to that one parameter.
And so here that's where I'm creating the controls for the wet/dry mix parameter. And then, as you remember, this multi-tap delay has five taps. So for each of the five taps, I go through this loop, and I do the same process of specifying the Audio Unit parameter scope element, make a rectangle, and again call this function to create a labeled slider with an edit text field.
Basically, I'm just building a bunch of Carbon controls on the fly, making a lot of use of some utility functions to do that. You can look at the details of how that works in the SDK. Then the last thing I have to do when I'm done is make sure that the Carbon pane that I've created is big enough to enclose all of the controls that I've created. As you create those controls, the base class keeps track of which one went furthest to the right and which one went furthest to the bottom; those are stored in this member variable, mBottomRight.
Just by adding a little bit of slop here so that there's space at the bottom right of the window, I can set my custom view to a nice size. So this is all the code for the custom UI for the MultiTap AU view, other than the code that we provide in the SDK. Now let's look at that.
This isn't actually any more beautiful than the generic view, but it is a little more nicely organized. You can see how the parameter listener mechanism is working here. These two views, when I change a parameter in one, it's changing in the other.
[Transcript missing]
So just to wrap up here, I realize that the transition to the Audio Unit version 2 API may be causing you some questions and concern.
I'd like to reemphasize that we're going to do everything we can in the base classes in the SDK to make sure that those classes support both versions of the API. So you can start working on learning your way around the SDK without fear of things changing drastically underneath you later.
And I'd like to really thank all the developers who've been giving us feedback all along, because it helps us know when we're doing the right thing, and that feels good; and it helps us know when we're doing the wrong thing, and it feels good to then be able to do the right thing. So thanks again for your feedback, please keep giving it to us on our API list, and thank you very much.