
WWDC00 • Session 176

Mac OS X Core Audio: Multichannel and Beyond

Digital Media • 55:30

This session presents a roadmap to the low-level application audio services available for Mac OS X. Topics range from direct manipulation of hardware features to finding devices and streaming audio to them. We discuss performance issues such as threading, memory management, and synchronization. We also present an update on the Sound Manager, including the recent API changes and Carbon support.

Speaker: Jeff Moore

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it may have transcription errors.

So we'd like to get this session started, so I'd like to thank you all for coming. It looks like we have a full house, which is great to see. This is the second of three sessions on audio focusing specifically on Mac OS X. The first session we did on Wednesday was to do with the new audio unit and music services that are being provided on OS X, as well as the MIDI services.

The MIDI services are obviously at a lower level, to deal with MIDI devices. The music and audio units sort of sit up above those, as application services that programs like QuickTime and other applications would use. In this session, we're going to be covering the sound manager as it exists on OS X and some revisions of what's been going on with the sound manager over the last few months. And then we'll be getting into the lower level interface that expresses the interface between your application or the higher levels of the audio engine and the particular audio devices, from an application point of view. And then in the session following this, we'll be talking about what is going on underneath that API in the kernel and in the I/O Kit world.

So without any further ado, I'll get Jeff Moore to come on. He's been doing most of the work in this area, so make him welcome. Thank you. Thank you. Hi, everybody. Wow, it is full. It's good to see there are this many people who are as interested in audio as I am.

So let's get started. So like Bill said, today we're going to talk a little bit about the sound manager first. We're going to talk a little bit about what the sound manager can do for you. Specifically, we're going to talk a little bit about how the sound manager is implemented. We're also going to talk a little bit about what's changed in the sound manager over the past year. There have been a bunch of different versions, and there's been a little bit of confusion about it.

And from that point, we're going to go into what's new with OS X and the core audio services that we've been working on. Then by the end of this, hopefully, you'll understand where we're going with all this stuff. And you'll be able to move your own audio data to and from devices and all over the system.

So first we're gonna talk a little bit about the sound manager. The sound manager is available now on three platforms: Mac OS X, Mac OS 9, and Win32. The Mac OS X and Win32 versions are very similar in terms of feature set. Mac OS 9 has a few more features than the other ones. Then I'll talk a little bit about things we've changed. Most notably, we added variable bitrate decoding. And we've fixed a bunch of synchronization bugs recently. Then I'm going to talk a little bit more about, you know, what we did to bring it up on Carbon and what's there, what isn't there.

So to start off with, the sound manager itself, we're all pretty familiar with sound channels and the general procedural API for playing sound and for bringing sound into your application. I thought I'd quickly review a little bit about how all that is implemented under the hood, because it's not exactly clear, since a lot of this stuff is not exposed through the procedural API. So under the hood, all sound channels are implemented via a network of sound components. They do all the actual processing work and moving the data and massaging it into a format that you can play or encoding it, decoding it, whatever. There are a bunch of different areas that the sound manager focuses on.

Rate conversion is arguably the most time expensive thing the sound manager will do for you. Now all these little components are linked together in pretty much linear chains. That is, you won't have two inputs feeding one sound component. They just have one in, one out. And then all the semantics of the procedural API are really handled at the low level by the system mixer component. It's a pretty monolithic architecture that does the job, but it's beginning to show its age a little bit.

So this diagram kind of represents a runtime view of a system using the sound manager and the connections of the various componentry. You'll note the first two chains you see show what is typically the full chain you get. That is, you start with a sound source component whose job is to be the traffic cop with the buffer of data that's being pulled through the chain. And then you have a decoder component, which is the component that will take a stream in one format and convert it into a format that the rest of the system then can use.

Typically, that's a linear PCM format. And then you have an equalizer component. The equalizer component actually has two jobs. Its first job is to provide the implementation of the bass and treble controls that you see in QuickTime. The second job it has is to implement the spectrum analyzer that you see in QuickTime. To do that, it siphons off the stream and puts things into a buffer and then performs the FFT at event time. This thing will never perform its FFT at interrupt time. So it's not too much of a performance drain if you're not using the equalizer for actual bass and treble adjustments. And then finally, you have a rate converter component. And the rate converter component has two roles. First, it will take the raw data and convert it to the rate that the hardware wants. But it also does the work of handling the rate multiplier command. So if you say play this sound twice as fast, it upsamples and then downsamples to get everything back into the format that the hardware wants.

Then finally, everything is junctioned at the system mixer component. On Mac OS 9, you only ever have one mixer component talking to one output device component. On Windows and on Mac OS X, you have one of those. You have a mixer and an output device per process. Everything is handled in user space for the sound manager.
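To make the chain structure described above concrete, here is a minimal sketch of linear one-in, one-out stages feeding a single mixer. The stage functions and their behaviour (an 8-bit decode, a gain-only "equalizer", a sample-repeating rate converter) are hypothetical stand-ins chosen for illustration; none of them are Sound Manager APIs.

```python
# Hypothetical stand-ins for the stages named in the talk. Each stage has
# exactly one input and one output, so a chain is just function composition.

def decode(samples):
    # Assumption: the source data is 8-bit unsigned; convert to floats in [-1, 1).
    return [(s - 128) / 128.0 for s in samples]

def equalize(samples, gain=1.0):
    # Stand-in for the bass/treble stage: a plain gain, not a real filter.
    return [s * gain for s in samples]

def rate_convert(samples, factor=2):
    # Naive upsampling by repeating each sample; real converters interpolate.
    return [s for s in samples for _ in range(factor)]

def run_chain(source, stages):
    # Push one buffer through a linear chain, one stage after another.
    data = source
    for stage in stages:
        data = stage(data)
    return data

def mix(chains):
    # The mixer is the only place where several chains are joined.
    length = max(len(c) for c in chains)
    return [sum(c[i] for c in chains if i < len(c)) for i in range(length)]

chain_a = run_chain([128, 192], [decode, equalize, rate_convert])
mixed = mix([chain_a, [0.25]])
```

Only `mix` accepts more than one input, which mirrors the point that the chains are linear until they reach the mixer.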

So next we have what we've done with the sound manager over the past few years. Over the past year we've had about seven, I think eight maybe, releases of the sound manager. You know, 3.5.1, 3.6, 3.6.3. There have been a whole bunch of dot releases fixing little bugs here and there.

The important releases were the 3.5.1 release, which was primarily a release to support new hardware, specifically the iBook and the G4. It also rolled in new support for some of the Mac OS 9 features, like multi-user preferences and that sort of thing. But then in 3.6, that's where we made some pretty big under-the-hood changes to the Sound Manager, when we added support for variable bitrate decoding. And 3.6 shipped with QuickTime 4.1. Now, the current version is Sound Manager 3.6.5. Between 3.6 and 3.6.5, there were a couple of bug fixes that had mostly to do with synchronization. There was also one bug fix that had to do with fixing a very longstanding issue in the init that caused a hole in the resource map to be drilled because of a bad DisposeHandle. It's amazing what you find, the 24 to 32 bit transition biting you almost seven years later. The specific thing was the master pointer flags were moved to the data block, and if that data block gets purged in the handle, you can no longer tell it's a resource. So if you call DisposeHandle on it, you're in trouble. Sound Manager was doing that. Notably, it caused remote access, of all things, to have trouble. Yeah, go figure. You'd see a whole bunch of unstable connections. Specifically, if you were using tunneling IP for encryption, you would have a lot of trouble trying to get through.

So variable bitrate decoding was a very interesting problem and a very interesting feature to add to the Sound Manager. Now, as most of you know, variable bitrate encoding is a technique that varies the number of bits used to encode a sample or a block of samples over time. Typically, this yields better quality encoding for a lower overall bitrate. And we added that specifically to support QuickTime's MP3 decoder in QuickTime 4.1.

So to do it, we had to basically grapple with the fundamental issue that, with variable bitrate situations, quite often you just don't know the relationship between a buffer size in bytes and the number of samples you're gonna get out of that when you decode it. And the sound manager's pretty reliant on the fact that you know ahead of time the number of samples in a buffer. So we had to go and mess things up a little bit so that we could express that notion of a buffer size in terms of the number of bytes it had rather than the number of samples. To do this, we ended up extending three data structures. The scheduled sound header needed to be extended so that you had access to schedule a block of variable bit rate data. The sound component data structure had to be extended so that the component chains could talk about variable bit rate data with each other. And then the sound param block had to be changed so that the mixer could keep track of it as well. In all cases, what we did to extend these data structures was we added a new flag to their flags field that said, "I'm extended." And then you could cast that to one of these extended structures and then you would have access to extended fields that included another flags field. And the flags field for that was used to indicate whether the structure was counting by sample frames or by bytes, and then there was the field to actually hold the byte count as well.
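The extension scheme described above (a flag in the base structure saying "I'm extended", then a second flags field saying whether the count is in frames or bytes) can be sketched roughly as follows. The flag values, class names, and field names are all hypothetical; the real Sound Manager structures are C structs with their own names and layouts.

```python
from dataclasses import dataclass

# Hypothetical flag values, for illustration only.
FLAG_EXTENDED = 0x01            # base flags: "I'm extended"
EXT_FLAG_COUNT_IN_BYTES = 0x01  # extended flags: the length is a byte count

@dataclass
class BaseHeader:
    flags: int
    sample_frames: int

@dataclass
class ExtendedHeader(BaseHeader):
    ext_flags: int = 0
    byte_count: int = 0

def buffer_length(header):
    """Return (count, unit), looking at extended fields only if flagged."""
    is_extended = bool(header.flags & FLAG_EXTENDED) and isinstance(header, ExtendedHeader)
    if is_extended and header.ext_flags & EXT_FLAG_COUNT_IN_BYTES:
        return header.byte_count, "bytes"
    return header.sample_frames, "frames"

constant_rate = BaseHeader(flags=0, sample_frames=1024)
variable_rate = ExtendedHeader(flags=FLAG_EXTENDED, sample_frames=0,
                               ext_flags=EXT_FLAG_COUNT_IN_BYTES, byte_count=4096)
```

Code that does not know about the extension keeps reading `sample_frames` from the base fields, which is the backward-compatibility property the flag-plus-cast approach is after.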

So in addition to that, we also revved the sound converter API to better support variable bitrate situations, and in fact, to be a better and easier to use system in general just for any sort of conversion. So we added a new routine called SoundConverterFillBuffer. It's a direct replacement for the functionality you got from SoundConverterConvertBuffer. In particular, the difference is that the mechanism for moving data through SoundConverterFillBuffer is a callback mechanism where you specify a routine that the sound converter can call to get more data to decode or encode. This gives you complete control over the buffering in the system. You specify the output buffering when you call SoundConverterFillBuffer and you have complete control of the input buffering 'cause you have a function to feed it into the system. And then finally, this obviated the need for calling SoundConverterGetBufferSizes for really any reason if you're using SoundConverterFillBuffer, since you are already in control of all the information both on input and output. There's no need to ask the sound converter about it anymore. So the best place to find more about this stuff is the recent QuickTime 4.1 developer update note. And that's a URL where you can get the PDF version.
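The callback-driven conversion described above can be illustrated with a small pull-model sketch: the caller chooses the output size, and the converter calls back for input whenever it needs more. `PullConverter` and its method names are hypothetical and only mirror the shape of the mechanism; this is not the signature of the real fill-buffer routine.

```python
class PullConverter:
    """Hypothetical pull-model converter: output size is chosen by the
    caller, input is fetched on demand through a callback."""

    def __init__(self, get_input, convert):
        self._get_input = get_input  # callback returning the next input chunk, or [] at end
        self._convert = convert      # function turning one input chunk into output items
        self._pending = []           # converted data not yet handed out

    def fill_buffer(self, capacity):
        # Keep asking the client for input until the output buffer can be filled.
        while len(self._pending) < capacity:
            chunk = self._get_input()
            if not chunk:
                break  # the client has no more data
            self._pending.extend(self._convert(chunk))
        out = self._pending[:capacity]
        self._pending = self._pending[capacity:]
        return out

chunks = iter([[1, 2, 3], [4, 5]])
converter = PullConverter(lambda: next(chunks, []), lambda c: [x * 2 for x in c])
first = converter.fill_buffer(4)
second = converter.fill_buffer(4)
```

Because the client owns both the input callback and the output capacity, there is nothing left to ask the converter about buffer sizes, which is the point made in the talk.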

So the other major things that we've done to the sound manager recently were in the nature of synchronization fixes. As most of you have read and probably seen firsthand, we've had a lot of problems with things like DVD audio and video sync. Now, we've been fighting a running battle with these bugs for well over a year now. And we've nailed most of them pretty much to the wall at this point, I think. There are probably still a few running around, and who knows, with new hardware, we're going to see more. And I'm sure we'll be revisiting this problem as we go forward. Specifically, we ended up changing the sound clock, which was driving most of its timing by watching the number of samples go by, as well as relating that to the progression of microseconds over time. And we've been investigating changing the math to be a little more accurate and to be better suited for certain pieces of hardware, like the iBook and the G4, that have different clock trees underneath the hood.

So the other big thing we did was the Carbon Sound Manager. And we brought that up for-- I think it was DP2. Yeah, it first showed up in Developer Preview 2 of OS X. So the Sound Manager for Carbon is pretty much full featured, with a few exceptions. Now, in most cases, the features that we chose not to port were left out primarily because the functionality was either duplicated by other services in the sound manager or, indeed, by other services in the OS, but also because the services that they were providing were obsolete in a lot of cases.

So among other things, these included the wavetable synthesizer and related commands. This is things like the freq command and, let's see, there are a bunch of other ones that escape me at the moment. The other big one that we chose not to port was SndPlayDoubleBuffer. SndPlayDoubleBuffer is probably the most inefficient mechanism for feeding sound into the sound manager at the moment. You are much better off using buffer commands and callback commands, or even better yet, you should be using scheduled sound. Scheduled sound is far and away the best way to get sound in and out of the sound manager at a specific time.

We also don't have any support for recording to or playing from disk. Again, we feel that these features are best accomplished through QuickTime. Then there were a bunch of other sound commands whose services we just don't need anymore or that didn't work right. Some examples of these are the amp command. It does pretty much exactly what the volume command does.

The rate command, which does sort of what the rate multiplier command does, only it has an interesting problem in that it treats all the rates as if you were scaling 22 kilohertz sound, which can give you unpredictable results if you're not sure ahead of time what you're doing. Then there's the commands like the load command, which were about querying registers on the old Apple sound chip. I don't think anybody was using those.

So ultimately, where this leaves us right now is that we have a system that's pretty well optimized for handling 16-bit words, stereo channel formats, with a 44.1 kilohertz sampling rate. Now, we support constant and variable bitrate formats, usually the simpler variety of those. And we're seeing native processing coming along. We're using a lot of it ourselves. And we've also got a fairly loose synchronization model, all things said and done. But it's doing the job pretty much for what we have to do today.

So going forward, the sorts of things that we see coming down the road are at the hardware, we see 24-bit integer formats coming straight at us. And then in the software realm, you see a very heavy reliance on 32-bit floating point to support all the bandwidth you need. And then in channel formats, surround sound is just beginning to take off now. You're seeing it more and more at the consumer level. You're seeing it more and more at the authoring level.

A lot of games are being authored in 5.1. And in the pro market, you're starting-- in the authoring market, you're seeing much higher sampling rates than what we've been used to in the past. 96 kilohertz certainly looks like it's going to be the next bump up in the standard. And then we're also beginning to see even more complex encoding schemes than the variable bit rate stuff that you see in MP3.

Specifically, you see encodings that are going to be using different techniques for data resiliency when you transport it over the network. New kinds of perceptual encoding techniques that result in better, smaller, faster, whatever. But they're coming, and we need to be ready for them. And then we also see, you know, native processing is just going to explode. We were talking multiprocessors. I mean, it's like 4G, 4G. You've got plenty of bandwidth to burn for signal processing.

And then in addition to that, we also see hardware acceleration starting to take off. In fact, on the PC platform, it's already taken off like crazy for games, for doing 3D rendering. And then we also see an increasing requirement for tight synchronization with other media, both internal media and external to the box.

Increasingly, we need to run machines in sync over a network or in sync with a SMPTE deck. We feel pretty strongly that all those features need to be encompassed in any audio architecture at the operating system level. So what is Core Audio? What are we going to do about all that stuff? Well, first, we're going to provide a new low-level audio device API. Specifically, it's geared to allow you to read and write data from a given device and to do that in a way that can be shared across many processes. Then we're also going to do some inter-application communication stuff so that you can have audio being generated in one application and sent to be processed in another application.

And then on Wednesday, Chris Rogers went in-depth about the new component model that we're going to be supporting on OS X and in other places as well, the audio unit architecture. And I hope you saw his talk, because you're going to hear a lot more about that stuff as the year progresses. And then probably the best news about all this is that we're going to open source all the low level sources, or the services.

We feel pretty strongly that while we think we're pretty good, we know what we're doing, you guys have more knowledge about the specific areas and more knowledge about your specific hardware so that you could tell us, hey, you're not doing that right or you're not going to be able to support my hardware. We really want to encourage you to participate in what we're trying to do and to make it better.

So this diagram kind of lays out the way core audio is being spread about the system. Now, down in the kernel, you have I/O Kit. That's where all the drivers live. That's where all the hardware lives. So the big problem with a protected mode system like this is how do you get the communication out to the application so that you can move the data fast enough and enough of it so that you can do something useful with it? So we're going to tackle that with the audio device API. It specifically manages moving data over the kernel user boundary. And it sits entirely in user space. It doesn't have a single piece of it that lives in the kernel. And then it lives individually within each client process that's using it as well. So you link to it; it's just another shared library. And then on top of that, while clients can directly access the audio device API, we only really expect to see that done with clients that have a really high degree of need for low-level control and management and whatnot. Mostly, we hope to see you using audio units to deal with the hardware, as we'll be providing audio units that will completely wrap up the use of the audio device API, as well as the audio IPC mechanisms that will allow you to move data across processes completely in user space again.

So what are the goals of the Audio Device API? So like I said, it's designed to be multi-client. The Macintosh, since its inception, has been able to play sound in two applications simultaneously. If we weren't able to do that in OS X, that would be a huge step back. So it's very important that multi-client features be carried forward. And further, we're supporting multi-channel. Obviously, we think surround sound is important. We've got to be able to talk to devices that have more than two channels and to be able to do that in a way that has low latency so that when you say, play this buffer, it gets out there on the wire as fast as we can get it there. The audio device API is designed to have very low overhead. And specifically for the latency, we depend very heavily on the CoreOS scheduler to make sure that the threads that have our code in them run when they're supposed to run. And then the latency will obviously also depend on the transport layer you're using to send the audio to the hardware.

You know, it's like PCI has one kind of latency. USB has a totally different kind of latency. And then another primary goal is synchronization with the Audio Device API. It's very important that a device be able to be synchronized with both internal hardware and with external hardware. Both SMPTE signals, digital clocks, word clocks, Studio Sync, everything. We're bringing that into the system. And then finally, we hope it sucks a little bit less.

Jim here. So the data formats for the Audio Device API, well, it's pretty much format agnostic. Now we do treat PCM data a little bit better than we treat other kinds of data, but by and large, whatever your hardware wants to take, the Audio Device API is prepared to ask for it from the client code.

All the data is passed around in void stars, and we don't make any requirements about the data. Now if you do choose to use PCM, the PCM format that we use internally is 32 bit floats and we support both interleaved and non-interleaved streams for PCM data. And further, we'll do all the mixing for you internally if you're using PCM data. Otherwise you pretty much have to rely on the driver supporting mixing in some fashion for other formats because we don't know how to do that. We're willing to let the driver figure it out. And further, with the 32-bit floats, to actually convert into the hardware format, we also rely on the driver providing us a routine to do that.

So conceptually, a device in this model encapsulates an I/O cycle to read and write the data to the device, and a clock to keep track of that I/O. And the clock generates timestamps that specifically map out the relationship between the host clock, which in our case is the CPU time register, as specified by UpTime (and there are other system services on OS X for retrieving that clock value), and the sample clock of the hardware, that is, the counter that's counting the samples going by.

It's really important to know, as accurately as you can, the relationship between when a sample is played and the host clock time that it was played at. You'll see why in a few minutes. So devices also have a set of properties. And properties are used to describe the state and configuration of a device. Now, they have getters and setters, and they are specified as selector value pairs. Now, the selector is an integer ID, and the value is any format that the property wishes to express. Again, it's expressed as a void star in the API. Another big feature of properties is that you can schedule the driver to make the change to the property for you ahead of time. That way, if the driver supports it, that gives you a way to do sample accurate scheduling with hardware changes. And this will be much more important as FireWire devices start coming online.

And then finally, clients can register for notifications in the changing of properties. The notification mechanism specifically, you will get a notification only if the value changes, and if it changes anywhere on the system. So if process A makes a change, and process B is looking for that change, process B will get a notification that process A changed that value.
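The property model described above (integer selectors, opaque values, and notifications that fire only when a value actually changes) can be sketched as a small in-process store. `PropertyStore` and the selector constant are hypothetical; the cross-process delivery the talk mentions is not modelled here.

```python
class PropertyStore:
    """Hypothetical selector/value store with change notifications."""

    def __init__(self):
        self._values = {}     # selector (int) -> value (any type)
        self._listeners = {}  # selector (int) -> list of callbacks

    def add_listener(self, selector, callback):
        self._listeners.setdefault(selector, []).append(callback)

    def get(self, selector):
        return self._values.get(selector)

    def set(self, selector, value):
        # Notify only if the value really changes, as described in the talk.
        if selector in self._values and self._values[selector] == value:
            return
        self._values[selector] = value
        for callback in self._listeners.get(selector, []):
            callback(selector, value)

SAMPLE_RATE = 1  # hypothetical selector ID
events = []
store = PropertyStore()
store.add_listener(SAMPLE_RATE, lambda sel, val: events.append((sel, val)))
store.set(SAMPLE_RATE, 44100.0)
store.set(SAMPLE_RATE, 44100.0)  # same value: no notification
store.set(SAMPLE_RATE, 48000.0)
```

In the real system the listener could live in a different process from the one making the change; this sketch keeps everything in one process to stay runnable.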

So on the inside, what we have is a single ring buffer that gets mapped by I/O Kit into each client process. Now, I/O Kit reads and writes to this buffer asynchronously from what the client's up to. In fact, it's usually done via a DMA program. We're trying to avoid having to have any kernel threads actually executing to do any audio processing if we can at all help it. And then the interrupts that we get in this system are only generated when the device driver wraps back around the ring buffer. Now, the size of the ring buffer is typically fairly large. In the current system, it's about three-quarters of a second. So the interrupt rate for this system is extremely low. Now, you all have a lot of questions, I can tell. Thank you.

So every time the buffer wraps around, we get a new timestamp for when that wraparound happened. And that timestamp comes out of the sample clock and from the host clock. And you can also get a timestamp whenever you want. So you can generate more as you need them. And sometimes those timestamps are going to be interpolated because we may not have specific data on the time that you're asking for. So we have code that keeps track of the history of time and can predict one value given the other.
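As a rough illustration of how one clock value can be predicted from the other, the sketch below does straight-line interpolation between two wraparound timestamps. The function names, the use of microseconds for host time, and the purely linear model are assumptions; the talk only says that the implementation keeps a history of timestamps and can predict one value given the other.

```python
def predict_host_time(stamp0, stamp1, sample_time):
    """Each stamp is a (sample_time, host_time) pair from a buffer wraparound."""
    (s0, h0), (s1, h1) = stamp0, stamp1
    host_per_sample = (h1 - h0) / (s1 - s0)
    return h0 + (sample_time - s0) * host_per_sample

def predict_sample_time(stamp0, stamp1, host_time):
    """Inverse mapping: which sample corresponds to a given host time."""
    (s0, h0), (s1, h1) = stamp0, stamp1
    samples_per_host = (s1 - s0) / (h1 - h0)
    return s0 + (host_time - h0) * samples_per_host

# Assumed numbers: a 0.75 s ring buffer at 44.1 kHz is 33075 frames,
# and host time is counted in microseconds.
wrap0 = (0, 1_000_000)
wrap1 = (33075, 1_750_000)
future_host = predict_host_time(wrap0, wrap1, 66150)          # one more wrap ahead
middle_sample = predict_sample_time(wrap0, wrap1, 1_375_000)  # halfway through
```

The same two functions extrapolate as well as interpolate, which is what makes it possible to ask about times slightly in the future.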

In this diagram, you kind of see what's going on from the device's point of view. So there's an input ring buffer and an output ring buffer. And the client is spinning around, reading and writing to those buffers. And the DMA heads chase those heads around and clean up or write new data after them.

And then, as you can see, when the head gets back to the interrupt point, we'll raise an interrupt. And at that point, we'll generate some new timestamps. and we'll do a few other housecleaning chores. But other than that, we don't actually call out to applications to get data. That's a big difference to the way systems have worked in the past.

So how do we get data to the system? So from the client's point of view, implicitly, you're going to be doing a lot of multi-threaded programming with audio. Just be clear on that because that's pretty complicated and it's a lot different than the Mac OS 9 implementation. There are a lot of new issues you're going to have to face about atomic operations on data, making sure that you're not waiting on a semaphore that's not ever going to get signaled on or the usual things that you have to deal with when you're dealing with a multi-threaded environment. It's now coming to bear right on you when you're dealing with audio.

So the client has one high priority run to completion thread per device in the process. Now the client can give us the thread. And you can configure it however you want with your own priorities. Or we will do it on your behalf. And we will set things up so that we give you the appropriate priority as well for the type of latency that you're looking for in your I/O.

Okay, so the code that's running in this thread tends to run at a very high priority. In fact, audio tends to be one of the highest priority threads in the system, so there's a high chance that you will lock the system up if you take too much time in the I/O thread. Consequently, there are a number of things you can do to alleviate that.

You can just not take so long. That's a fine idea: optimize your code. But perhaps the better strategy, if your situation allows it, is to have a secondary thread available that runs at a lower priority than the audio thread, which you can use to generate or render your audio or read it off the disk or get it from the network or do whatever you need to do to get your data and process it and get it into a form that's ready to be handed to the hardware. If you can do that, you will increase your throughput and the general throughput of the system a great deal because we won't be spending a whole lot of time locked up in these high-priority threads that don't yield to the rest of the system.
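The secondary-thread strategy described above can be sketched with a worker that prepares buffers ahead of time and an I/O callback that never blocks. The names, the queue depth, and the choice to play silence on underrun are assumptions made for the sketch; Python threads also have no real-time priorities, so only the hand-off pattern is shown.

```python
import queue
import threading

BUFFER_FRAMES = 4
ready = queue.Queue(maxsize=8)  # rendered buffers waiting for the I/O thread

def render_worker(num_buffers):
    # Lower-priority work: render, read from disk, fetch from the network...
    for n in range(num_buffers):
        ready.put([float(n)] * BUFFER_FRAMES)  # blocking is acceptable here

def io_callback():
    # High-priority work: never block, just take what is already prepared.
    try:
        return ready.get_nowait()
    except queue.Empty:
        return [0.0] * BUFFER_FRAMES  # underrun: hand back silence

worker = threading.Thread(target=render_worker, args=(2,))
worker.start()
worker.join()
outputs = [io_callback() for _ in range(3)]
```

The third call finds the queue empty and returns silence instead of waiting, which is the behaviour you want from code running in the time-critical thread.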

Now, in fact, if you do take up too much time in your I/O thread, first we're going to tell you about it. We will send you a notification that says your thread is taking too much time. You will also hear glitches in your stream, obviously, because you're not generating the audio at the right time to be played so that the stream can be continuous.

The other sort of issue that you're going to see is that you can lock the system up but still not panic the system. So the machine will just freeze, and the only thing you can do is a cold reboot. Given that, the implementation of the Audio Device API goes to great lengths to keep that from happening, to the point where it won't reschedule your thread for you if it sees that you're taking too much time. So if you eat too much processor time, we're going to scale you back a little bit so that you're not starving the rest of the system.

So what do you do in this thread, right? And that's kind of an important question. So the I/O thread is basically scheduled to wake up periodically in a way, and when it wakes up, the idea is that you read your input and you write to the output. Now, the way we schedule the thread to wake up is so that it kind of simulates a double buffering situation. That is specifically, by default, your thread will be scheduled to wake up about a buffer ahead of the buffer you're supposed to render for output. This gives you roughly 100% of CPU to do whatever it is you need to do and still be able to deliver the audio to the hardware on time so that you have glitch-free audio.

Like I said, the input and output are presented to you synchronously. So when your I/O routine gets called, you get both the input data and the output data. Further, you get timestamps that talk about when that data was acquired or when that data is going to be inserted in the output stream. And you get a third timestamp that tells you what the current time is as well, so you don't have to ask and can potentially save a little time in the I/O thread.

Now, the buffer size that you're using is completely configurable by the client. We can do this because we're just writing into one shared ring buffer. We just know that we need to write that data at a certain space ahead of the DMA read head so that we keep the stream continuous. And so the buffer size that you use is entirely up to you. You can make it as big or as small as you want, and obviously you have tradeoffs with overhead in those terms. The smaller the buffer size, the more frequently we need your thread to run, and also the more the jitter in the thread wake-up is going to affect you. For instance, if you want to render at, say, 64 sample buffers, that's roughly three or four milliseconds of data. Now if the thread that you're using to render that data has a jitter of some number of microseconds, obviously the smaller your buffer is, the more that jitter number is going to matter to you.
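The tradeoff between buffer size and wake-up jitter is simple arithmetic, sketched below. The helper names and the example sample rates are assumptions; the duration of a given buffer depends entirely on the sample rate in use, which the talk does not state for its 64-sample example.

```python
def buffer_duration_ms(frames, sample_rate_hz):
    """How long one buffer of audio lasts, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz

def jitter_fraction(jitter_us, frames, sample_rate_hz):
    """Wake-up jitter expressed as a fraction of one buffer's duration."""
    return (jitter_us / 1000.0) / buffer_duration_ms(frames, sample_rate_hz)

# Assumed sample rates, purely for illustration.
small_at_44k = buffer_duration_ms(64, 44100)  # about 1.45 ms
small_at_16k = buffer_duration_ms(64, 16000)  # exactly 4 ms
large_at_44k = buffer_duration_ms(512, 44100)

# The same 100 microseconds of jitter matters far more for the small buffer.
small_share = jitter_fraction(100, 64, 44100)
large_share = jitter_fraction(100, 512, 44100)
```

With these assumed numbers, 100 microseconds of jitter is roughly 7% of a 64-frame buffer at 44.1 kHz but under 1% of a 512-frame buffer, which is the effect the talk is pointing at.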

So another thing that's configurable about this whole process is the wake up time. So you can say how much time in advance you want the thread to wake up. You can make it as close to the actual delivery time as you want, provided you have a good idea about how much time you take to deliver your data.

The general rule-- there are a lot of interesting applications for that. One reason you might want to go as close to the delivery point as possible is to be as responsive as possible to interactive events like MIDI keys or user interface events or user interface devices and whatnot. Given those parameters, that is, that you've told us how big your buffer is and how far in advance you want us to wake you up, the actual wake-up time that we schedule the thread for is then calculated using the previous timestamps and the relationship between the number of samples played and how much host time has passed.
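One way to picture the wake-up calculation described above is the sketch below: take a previous (sample time, host time) stamp, the measured host ticks per sample, and back off from the output buffer's start by the requested wake-up offset. The function name, parameter names, and units are hypothetical, and the real scheduler presumably accounts for more than this linear estimate.

```python
def wake_host_time(prev_stamp, host_ticks_per_sample,
                   output_sample_time, wake_offset_samples):
    """Estimate the host time at which the I/O thread should wake.

    prev_stamp is a (sample_time, host_time) pair from an earlier timestamp.
    """
    stamp_sample, stamp_host = prev_stamp
    wake_sample = output_sample_time - wake_offset_samples
    return stamp_host + (wake_sample - stamp_sample) * host_ticks_per_sample

# Assumed numbers: 2 host ticks per sample, output buffer starting at
# sample 512, and a wake-up offset of one 256-frame buffer.
wake_at = wake_host_time((0, 1000.0), 2.0, 512, 256)

# A smaller offset wakes the thread closer to the delivery point.
late_wake = wake_host_time((0, 1000.0), 2.0, 512, 64)
```

Shrinking `wake_offset_samples` trades safety margin for responsiveness, which is the choice the talk describes for interactive events.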

So in this diagram, you kind of see a conceptual idea of what's going on when your IOProc is called. So you see the buffer sizes are the individual blocks, and the read head, as you see, is processing one of those blocks, and time is going from left to right. So you use the wake-up offset to control how far into the buffer you want to be scheduled for, and, of course, you can go out a few buffers if you know that you're working ahead of time. Many applications have the luxury of being able to work a couple of milliseconds into the future.

And then you can also control how big those buffers are. And when your thread wakes up, you are woken up, as you see, one buffer ahead of where you're going to deliver data for the output. And the data you're going to get on the input is that buffer previous that we just finished reading. So that kind of gives you the relationship between where the data is that you're going to read and where the data is going when you're gonna write.
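Since the diagram is not reproduced in the transcript, here is one reading of it as a toy model. It assumes the ring buffer is divided into equal blocks, that the read head is inside block `read_head_block` when the thread wakes, and that the client is scheduled one buffer ahead by default. The function name and its parameters are invented for the illustration.

```python
from typing import Tuple


def io_blocks(read_head_block: int, blocks_in_ring: int,
              blocks_ahead: int = 1) -> Tuple[int, int]:
    """Return (input_block, output_block) for one I/O cycle.

    Input is the block the hardware just finished with, directly behind the
    read head. Output goes blocks_ahead blocks in front of the read head.
    Both indices wrap around because the buffer is a ring.
    """
    input_block = (read_head_block - 1) % blocks_in_ring
    output_block = (read_head_block + blocks_ahead) % blocks_in_ring
    return input_block, output_block
```

With an 8-block ring and the read head in block 0, this gives input from block 7 and output into block 1. A client working further into the future would pass a larger `blocks_ahead`.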

So I thought I'd finish this up by showing you exactly how easy it is to write a real live client with this stuff. This code was adapted specifically from the SDEV component that's used in the sound manager to talk to this API. So first thing you got to do is you got to find a device to talk to.

So to do that, you use a property of the entire system, which is a little different than a property for a specific device. You can tell system routines versus device routines by the name of the routine. System routines start with AudioHardware. Device routines start with AudioDevice. So the first property you're interested in is trying to find out the default output device. So you call the routine to get the system property for the default output device. Pretty simple.

Then you need to figure out what kind of data you need to send to this device. So to do that, you get the device's stream-- the stream format property. Now, the stream format in this API is encompassed by the AudioStreamBasicDescription struct. It contains enough information to describe any constant bit rate format where all the channels are the same width. This applies to most of the general compression techniques that you see in the Sound Manager today, like IMA, linear PCM falls into that category, mu-law, A-law, all that stuff. Now, the struct will supply you with the sample rate, the number of bytes in a frame, the number of bytes in the channel, and the number of bytes in a packet if the format has another grouping above the sample frame and the channel structures. More complicated formats obviously have more information to talk about than just how big their individual fields are. For variable bitrate data, you need to know where the frames actually start in the stream. More complicated formats also provide an extended description which is format specific and is defined by that format. That's also available via another property on the device.
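To make the fields just listed concrete, here is a rough model of a format description. It is not the AudioStreamBasicDescription layout: the class name and field names are hypothetical, and the derived sizes only hold for uncompressed linear PCM, where a packet is simply a whole number of frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamFormat:
    """Hypothetical description of a linear PCM stream (all channels same width)."""
    sample_rate: float        # frames per second
    channels_per_frame: int   # e.g. 2 for stereo
    bytes_per_channel: int    # e.g. 2 for 16-bit samples
    frames_per_packet: int = 1

    @property
    def bytes_per_frame(self) -> int:
        # One frame holds one sample for every channel.
        return self.channels_per_frame * self.bytes_per_channel

    @property
    def bytes_per_packet(self) -> int:
        # Only valid for uncompressed data; compressed packets are sized differently.
        return self.bytes_per_frame * self.frames_per_packet

    @property
    def bytes_per_second(self) -> float:
        return self.sample_rate * self.bytes_per_frame
```

For 16-bit stereo at 44,100 Hz, `StreamFormat(44100.0, 2, 2)` works out to 4 bytes per frame and 176,400 bytes per second.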

And then in order to do I/O, you also need to know how big your I/O buffer is. In this case, we don't really care what the I/O buffer is because this is just a simple client. So he's just going to take whatever the default buffer size is for this device. And again, you just call AudioDeviceGetProperty to do that.

And one thing I should mention is that sizes in this API are almost always passed around in terms of bytes. And you can calculate the number of frames, if you can calculate the number of frames for that format, by getting the stream format and doing the appropriate math using the description in the stream format descriptor.

So to start playback, you need to tell the device about your I/O routine. So you install it by calling AudioDeviceAddIOProc. Now, you also are given a place to pass in a pointer to whatever kind of data you want passed back to your I/O routine. You know, that's really useful for keeping track of context on multiple, well, you all know how to use those things. Been around forever.
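The "appropriate math" for turning a byte size into a frame count is a single division when every frame has the same size. A minimal sketch, with a hypothetical function name; it deliberately rejects sizes that are not a whole number of frames, which is a choice made for this example and not something the talk specifies:

```python
def frames_in_buffer(buffer_size_bytes: int, bytes_per_frame: int) -> int:
    """Convert an I/O buffer size in bytes to a count of sample frames.

    Only meaningful for formats with a constant frame size.
    """
    if bytes_per_frame <= 0:
        raise ValueError("format has no fixed frame size")
    frames, leftover = divmod(buffer_size_bytes, bytes_per_frame)
    if leftover:
        raise ValueError("buffer size is not a whole number of frames")
    return frames
```

A 4,096-byte buffer of 16-bit stereo data (4 bytes per frame) holds 1,024 frames.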

So then you just start the device by starting it. There's a routine to start it. And stopping a device is pretty much the same but in reverse. You call AudioDeviceStop, and then if you're done with I/O, you can remove the I/O proc as well. Now, one thing I should point out is that you'll notice that with start and stop you also need to pass in the I/O routine again. Now, the reason why is that you can install multiple I/O routines on a given device. I'm sure that there are a lot of reasons to do that, but it's just useful for a number of things.

So here's the prototype for the I/O routine. And when it's called, you get the ID of the device that the I/O is happening on. You get a timestamp that represents now. And like I said, timestamps represent the mapping between the sample time and the host clock.

You also get a pointer to the input data and a timestamp for when the first frame of that input data was acquired. Then you get a pointer to the output buffer and a timestamp for when the first frame of the output buffer is going to be inserted into the output stream. Then you get back your client data pointer as well.

And here's the entire implementation of the I/O routine. In my case, I'm using a MyNifty file object in my client data field so that I can get my data back, so I can get some data to play. And I cast that back, and then I use it. And I put that data just right in the output buffer. And then if I find out that I'm done, I can just turn off that I/O routine. And the semantics there is that when you turn off an I/O routine from within an I/O process, that current I/O will complete and then no more I/O for that routine will happen.
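The slide code is not part of the transcript, so the following is a toy model of the flow just described, not the Core Audio API itself. `FakeDevice`, `FilePlayer`, and `play_file_proc` are invented names. The model keeps three ideas from the talk: the routine is installed together with client data, start and stop name the routine because several can be installed, and a routine that stops itself still has its current buffer delivered.

```python
class FakeDevice:
    """Toy stand-in for an audio device that drives installed I/O routines."""

    def __init__(self, buffer_size_bytes: int) -> None:
        self.buffer_size_bytes = buffer_size_bytes
        self._client_data = {}  # installed routine -> its client data
        self._running = []      # routines currently started

    def add_io_proc(self, proc, client_data) -> None:
        self._client_data[proc] = client_data

    def start(self, proc) -> None:
        if proc not in self._client_data:
            raise ValueError("I/O routine is not installed on this device")
        if proc not in self._running:
            self._running.append(proc)

    def stop(self, proc) -> None:
        if proc in self._running:
            self._running.remove(proc)

    def run_cycle(self) -> list:
        """Run one I/O cycle and return the buffers that were rendered."""
        rendered = []
        # Iterate over a copy: a routine may stop itself while being called.
        for proc in list(self._running):
            output = bytearray(self.buffer_size_bytes)  # starts as silence
            proc(self, output, self._client_data[proc])
            rendered.append(bytes(output))  # the current I/O still completes
        return rendered


class FilePlayer:
    """Client data: the bytes to play and the current read position."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.position = 0


def play_file_proc(device: FakeDevice, output: bytearray, client_data) -> None:
    player = client_data  # the "cast back" step from the talk
    chunk = player.data[player.position:player.position + len(output)]
    output[:len(chunk)] = chunk  # write straight into the output buffer
    player.position += len(chunk)
    if player.position >= len(player.data):
        device.stop(play_file_proc)  # done: no further cycles for this routine
```

With a 4-byte buffer and 6 bytes of data, the first cycle renders 4 bytes, the second renders the last 2 bytes padded with silence and stops the routine, and a third cycle renders nothing.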

So when are we going to give this to you? So like I said, the sound manager is in DP4 now. It's been there since DP2. The audio device API is also in DP4. And the IPC mechanism is going to be, we're going to start seeding that prior to the public beta and we'll hopefully have it GMed for the public beta. And then with the audio units architecture, we're looking at this fall for releasing. We don't really have anything specific there.

So next, I'm just going to show you that it all actually works, and it's really alive. First, I want to point out that the music that you heard when you came in was coming live off my PowerBook using QuickTime Player on OS X on top of the Audio Device API.

I think you can hear me. So first up, I want to say in DP4, by default, the Sound Manager is not set up to use the Audio Device API. You can add a magic cookie to the framework, and it's documented on DP4 how to do that, to make that actually happen.

But other than that, there's one reason why all the demos you've seen previous to this haven't been running through the Audio Device API. These will. So first up, I'd like to show just a reasonably high frame rate QuickTime movie. And the thing to watch for in this one is synchronization. The synchronization is still pretty solid. It's not 100% perfect yet, but it's pretty good already.

Gotta like that. All right. And how about that? The sound stopped right when the movie did. Thanks, Mark. Right in sync. Let's try that again. Thank you. Don't you love it? Male Speaker: You don't have to reboot. Eric Green: Hey, and you don't have to reboot. How about that?

Eric Green: Oh, another interesting thing about the Audio Device API is you don't have to reboot to reinstall a new version of it either. As long as you're not playing sound, you can just put in a new version and go. That's kind of neat. It cuts down on the development time. So let's -- once more from the top.

He delivers ten times out of ten. Who's the cat that won't cop out? Shea Ray. They say I'm a complicated man. I might take you down, but I'll never let you down. Who's the man who'd risk his neck for his brother man? Now, what's my name? Synchronization works. It doesn't look like a bad movie either.

So another thing I wanted to show you a little bit was some of the benefits of variable bitrate encoding. You see some examples of some MP3 files I've encoded using different kinds of data rates and whatnot. First, I'd like to play the original, to kind of give you a feel for what this sounds like before I give you all the compressed versions.

OK, of particular note, you should hear the cymbal sounds and what they sound like. Kind of remember that. And try to figure out which of these sound the best. First, let's start with the high data rate version, since those are the easiest. Obviously, they're going to sound relatively good. And the relative difference between variable bit rate and constant bit rate starts to go away when you use larger bit rates. But they're still there. If nothing else, you get smaller files. So here's the variable bit rate, or here's the 128K constant bit rate kind. This is typically the format you find on the internet. It sounds pretty close to the original. And here's the variable bit rate, the high quality version for this encoder-- As you can see, the file size is a bit smaller than the 128K size.

And again, you can hear it's still pretty close to the original mix. And in this case, the saving is obvious. Variable bitrate wins because it's a smaller file size. Just kind of an interesting aside, the normal encoded version, at least for this encoder-- wait for the variable bitrate to parse.

Normal quality, that sounds pretty good. And hey, look, the file size is even smaller yet. Now, here's where variable bit rate really starts to shine, in the low bit rate cases. So say there's a 64K constant bit rate version. It almost doesn't sound like a hi-hat.

All the usual gripes you get about MP3 at that point. So let's take a look at the variable bitrate version in the lowest quality setting that squeezes the most bits out of it. And in this case, again, you see the file size is roughly the same as the 64K version.

Depends on the, uh, the actual content. Hold that question. So you can see the low quality version is much, much better than the 64K version. And you get the same file size. So use VBR, I guess. So to finish things up, you need to contact Dan Brown to participate, particularly if you want to participate in the seeding program. We're going to be seeding the Core Audio services a lot quicker than we've been running in the past. We're hoping to keep things moving, get things out into the open, get you guys working with the stuff and working with us to make it better so that we can actually, you know, meet your needs for a change. So contact Dan, and he can get that working. So next, we're going to finish things up with a little Q&A, but Bill has something to say first.