Modern QuickTime Audio Programming Techniques - WWDC 2006

Graphics and Media • 1:09:21

QuickTime integrated with Core Audio provides access to powerful professional-grade audio capabilities such as high sampling rates and resolutions, multiple channel layouts, and sample-accurate synchronization. If you want to modernize the existing audio capabilities of your QuickTime application or take advantage of the spectacular audio capabilities in QuickTime, this session is for you.

Speakers: Sayli Benadikar, Brad Ford

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Hi, everyone. Welcome to our session. My name is Sayli, and Brad Ford and I are going to be your tour guides for the next hour as we go on a whirlwind tour through the land of modern QuickTime audio. That's me, Sayli. So, a quick preview of what we'll be covering in this session.

We'll start with a general overview of QuickTime Audio and its capabilities and contrast this a little bit with what Core Audio has to offer. Because often when you're writing your audio-based applications, one of the first questions you face during development time is, "What set of APIs should I use?" Hopefully this session gives you a little more information to make a better decision.

Next, as you know, QuickTime is a rather mature piece of software, and over the years, we've had the need to deprecate certain APIs. With some other APIs, although we've not deprecated them, we highly discourage the use of them just because there's newer, smarter, better ways of doing the same thing. So in this section, we'll talk about what's deprecated and ill-advised, and then we'll steer you towards what we'd rather have you use. And finally, we'll cover some new features in QuickTime Audio. These were either introduced in QuickTime 7.1 or Leopard.

So starting with our overview, QuickTime is a layer built on top of Core Audio. It uses Core Audio's audio units, the audio converter, the output unit, et cetera, in its signal processing. What QuickTime adds on top is value in the form of built-in capabilities that are really useful for multimedia applications. QuickTime provides really rich file and data format support and some other built-in capabilities that I'll get into in more detail over the next four slides.

So starting with import, QuickTime transparently opens a really wide range of containers. These include not only QuickTime's own .mov multimedia container, but that second sub-bullet down there is all the formats supported by Core Audio. So QuickTime uses Core Audio's audio file format APIs, and also some other formats such as DV, AVI, that Core Audio does not support, and then a whole host of third-party formats such as DivX and Windows Media.

This is because QuickTime's import architecture is completely expandable using third-party components. On the data format side, QuickTime decodes, again, a whole host of formats and also supports third-party decoders. So if you use QuickTime's audio APIs, you get all this rich file and data format support for free, just out of the box. If that's something useful for your application, then QuickTime Audio APIs are definitely the way to go.

On the playback side, QuickTime gives you multimedia synchronization, which helps you sync audio with video, and media time scaling. When you edit media in QuickTime, you can assign a rate and duration to your media. What this means is that during playback, rate scalers can be applied to your media. And if you have two tracks whose media have different durations, you can fit those together.

Then also we have some built-in capabilities for pitch and rate control. And using the rate changes preserve pitch property, you can get the non-Chipmunk-style audio during rate changes. There's volume and spectral metering. And QuickTime does movie-level mixing. So if you have multiple tracks, multiple audio tracks in your movie, QuickTime will handle the mixing of all these together. And QuickTime also does channel layout-based mixing. So if you have, say, 5.1 content and you're playing it to a device that's, say, stereo, QuickTime can do that mixing for you.
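To make the channel layout-based mixing concrete, here is a minimal sketch in C of folding 5.1 audio down to stereo. The channel order (L, R, C, LFE, Ls, Rs), the -3 dB coefficients, and the choice to drop the LFE channel are assumptions borrowed from a common downmix convention. They are not necessarily what QuickTime does internally, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel order for one interleaved 5.1 frame (an assumption). */
enum { CH_L, CH_R, CH_C, CH_LFE, CH_LS, CH_RS, CH_COUNT };

/* Assumed -3 dB coefficient (1/sqrt(2)) for the center and surround channels. */
#define MINUS_3DB 0.70710678f

/* Fold `frames` interleaved 5.1 frames into interleaved stereo.
   The LFE channel is discarded in this sketch. */
static void downmix_51_to_stereo(const float *in, float *out, size_t frames)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < frames; ++i) {
        const float *f = in + i * CH_COUNT;
        out[2 * i]     = f[CH_L] + MINUS_3DB * (f[CH_C] + f[CH_LS]);
        out[2 * i + 1] = f[CH_R] + MINUS_3DB * (f[CH_C] + f[CH_RS]);
    }
}
```

The point of the talk is that QuickTime performs this kind of fold-down for you based on the channel layouts of the content and the device, so an application normally would not need to write it.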

In QuickTime 7, we introduced a new set of APIs called the Movie Audio Extraction APIs. This is a way for you to suck raw PCM data out of the movie. You can consider this your one-stop shop for getting decompressed, mixed movie audio samples. These APIs are thread-safe, which means you can perform extraction on a worker thread, provided the underlying audio codecs that you're using are thread-safe. And this set of APIs gives you a convenience: it takes you a layer higher than, say, the audio converter.

QuickTime deals with all the intricacies and the details that are related with mixing and converting of audio and abstracts your application away from this so that you don't have to go down to, say, the level of the audio converter and do any of the mixing yourself or the decoding yourself. So that's a huge gain.

Just as with decode, QuickTime can encode a whole host of data formats. With AAC and AMR, a license is required if you're doing the encode on Windows. And then we have the very useful standard audio compression component and its APIs, which Brad will get into in much greater detail in a later section.

But these are a way to very easily configure compression settings during export. And again, QuickTime writes to a whole host of containers, and QuickTime's playback and export and extraction paths are all the same, so this means what you hear is what you get. And finally, QuickTime APIs are multi-platform, so you can use them on Windows.

So that's all the good stuff about QuickTime Audio, but it might be the case that the application that you're writing is very audio-centric or audio-only. Maybe you're not really caring about, say, video synchronization. You don't really care much about file and data format support. Maybe you're writing a very specific signal processing application.

In that case, Core Audio is a better choice for you because basically it provides a lot greater control on the underlying processing units than QuickTime Audio. And in general, it has a lot more audio-specific capabilities than QuickTime Audio can provide. So a way of thinking of this could be if you need multimedia presentation support, file and data format support, QuickTime Audio is more the kind of APIs you'd want to use.

If it's more signal processing specific, not really related to video, Core Audio APIs might be a better choice. That concludes this first section, the QuickTime Audio overview. I'm now going to hand it to Brad, who'll talk about deprecated APIs. But don't worry, I'll be back.

Thanks, Sayli. And thank you all for sticking it out with us this late in the day. It is the last session of the day, and I'll try not to get too punchy.

API best practices is a nice way of saying, "Stay away from these APIs that will hurt you." So first let me ask a philosophical question, and maybe I could ask you all to write a little mini essay on this of 500 words or less. What is QuickTime to you?

I guess the answer might be different depending on how much pain you've gone through over the years and what level of APIs you've used. It's either suited all your needs, some of your needs, or been woefully inadequate in some areas. One thing we can say about QuickTime is that it is 15 years old, and that might give a perception of QuickTime as a whole, as a framework, as something looking like this. This should be a rehash to you if you were here for session 212 this morning. That's okay. Seeing it a second time is good for you. Underneath it says, big, bloated, crufty, and difficult to use.

That's the perception you can get when you hear that a piece of software is 15 years old, because as we know, in software years, 15 years is like 90. But this is, in fact, a myth, and I'm going to debunk that myth and tell you why it's not true. The truth is, QuickTime is modern, very modern. This talk is about audio, so I'll just focus on the audio section of QuickTime.

Our audio engine was entirely rewritten in version 7.0, and that was less than two years ago. It sits atop Core Audio, which is a very recent architecture and provides low-latency and sample-accurate audio rendering. So the audio engine in QuickTime is very modern and very good.

It's built atop Core Audio. The only things that are really old about QuickTime are the interfaces you use to perform the tasks that you're used to performing. And there's a good reason for that. People have used our functions for years and years, and we consider that a contract with those developers.

And we go through long API approval processes internally before we unleash something on you. We want to make sure that we get it right, because we intend to live with APIs that we publish for a very long time. And so we think long and hard before we deprecate something, because we don't want to leave people stranded.

But we do take API deprecation seriously. When it's appropriate, we feel it is necessary and good and right to deprecate needed APIs so that our outer shell matches the cleanliness and beautifulness of our internal workings. We are modernizing our API as well as our internals. This is an ongoing process.

We talked to you this morning about movie audio extraction, audio context inserts, and some of these other newer interfaces that are absolutely bright and shiny and new great ways to do things. So what we want to do in this section is just tell you which of the five million APIs to not use and which of those are good to use.

So, Sound Manager is officially deprecated in Leopard. They've been saying it on the Core Audio list for about two years now, but I think they really mean business this time. What does this mean for you and your code? In the area of codecs, I'm only going to speak to codec writers. If you write sound codecs, if you've written 'scom's in the past (sound compressor components), know that these are deprecated and you should be writing audio encoder components instead. This should not come as a surprise to you.

The same goes for 'sdec's. They are deprecated in Leopard, and you should be writing audio decoders instead. Don't be alarmed by this if you're worried about compatibility with applications that use Sound Manager. There is a compatibility layer built atop 'adec' and 'aenc' components called the SMAC. So if you write an 'aenc' or an 'adec', you'll still be compatible with apps that use Sound Manager. It's just that it will only expose an interface that's acceptable to Sound Manager: one or two channels of audio and sample rates less than 64 kilohertz.
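As a small illustration of that restriction, the following C sketch checks whether a format falls inside the limits quoted above. The helper name is hypothetical, and reading "64 kilohertz" as exactly 64000 Hz is an assumption; the real compatibility layer may apply its limit differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Limits quoted in the talk for what Sound Manager clients can be shown. */
#define SM_MAX_CHANNELS      2
#define SM_SAMPLE_RATE_LIMIT 64000.0   /* assumed reading of "less than 64 kilohertz" */

/* Hypothetical helper: would this format be visible through the compatibility layer? */
static bool visible_to_sound_manager(unsigned channels, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return channels >= 1 && channels <= SM_MAX_CHANNELS
        && sample_rate_hz < SM_SAMPLE_RATE_LIMIT;
}
```

Under these assumptions a 44.1 kHz stereo format passes, while 5.1 content or a 96 kHz format would only be reachable through the newer audio codec interfaces.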

Direct user access to codec components is discouraged. I was speaking to codec writers in the first part because we believe at this point you should not, in any of your code, need to open a codec component directly for any reason. We think that for any operation you would perform by querying a component directly, you can now use the Audio Converter API to do the same thing.

Whether it be to get the magic cookie, the channel layout, or various information about the formats that are acceptable to the codec, we think you can use the Audio Converter for that. If you disagree with me, please come and talk to us afterwards and tell us why you think you still need to dip down to the codec interface as a user.

Next up, conversion. Sound Converter is really the knife in the chest for Sound Manager deprecation. Many people just love hanging on to Sound Converter because it's served us well for so many years. It is deprecated too. Your choices are: use Audio Converter, which has a very similar interface and is in the Audio Toolbox framework. Or, if you can release your app with QuickTime 7.1 or later compatibility, we highly recommend use of the SC Audio Compression APIs, because they give you an Audio Converter-style interface, they give you high-resolution audio and mixing, and they give you Windows.

That being Windows, like Microsoft Windows. One thing we've seen people do in the past is a sort of combination of GetMediaSample plus SoundConverterFillBuffer in order to get PCM audio out of a file. This is not the correct way to do it. Let me tell you why. If you're familiar with the QuickTime architecture, a movie has tracks.

Tracks have media. Media have samples, media samples. If you drill down to the media sample layer to get a sample out of a QuickTime movie and then decompress it yourself manually using SoundConverterFillBuffer, you're not getting a mix of the audio across multiple tracks.

And you're also preventing us from doing interesting things with the media samples in the media. You might be making incorrect assumptions about how we're going to lay out the media. So it's much better if you can use movie audio extraction instead. The same goes for PutMovieIntoTypedHandle.

Also, it's been around for a long time and people have been using this method to sort of do a mini encode or a mini decode of some number of samples from a source movie and to put them into an uncompressed buffer of audio. This is discouraged as well because it's a lot slower. It incurs a penalty of set up and tear down that movie audio extraction does not. So please, please, please use movie audio extraction for getting PCM audio out of a QuickTime movie.

Audio Mixing. Hopefully most of you didn't need to do audio mixing directly with the Sound Manager. Sound Mixer is deprecated. We found that usually people weren't using this interface because there wasn't, frankly, there wasn't that much mixing to do since it only supported stereo and mono. And you could get at the Sound Mixer by going through Sound Converter. But if you were using it, know that your alternative now is to use a mixer audio unit instead. They come in three flavors. There's a stereo mixer, a 3D mixer, and a matrix mixer. Use whichever one is appropriate for your needs.

In the area of movie export, I'm not going to talk about specific sound manager portions that are deprecated, but more along the lines of you should understand what's happening under the covers when you perform a movie export using QuickTime's APIs like export movie to data ref. The old style of export is discouraged. So you might ask, how do I know if I'm doing an old style export or a new style export?

You are using old style export if you explicitly open up an export component, that is, a 'spit'-type component. For instance, you're going to do some setup on it, or pass some atom container of settings to it, and then you take that previously configured export component and pass it to an export API such as movie export to data ref. You are getting the old style path, whether you knew it or not. This is discouraged because it uses sound converters and other Sound Manager interfaces underneath, which are now deprecated. That means that we're not going to maintain that code path.

Well, we maintain it, but we don't want to make fixes there. We want to be working in the new world. So you should opt in for the new style of export instead. What do you have to do? Well, if you don't do what I just described, if you don't open up that spit component explicitly and you just pass null as the last parameter to one of these movie export APIs, you're covered. You're automatically getting the new style export.

If you need to continue to open up an export component explicitly, then you need to make one additional API call if you're not already doing so. Call QTSetComponentProperty on that export component instance and tell it that you want to opt in for high resolution audio features. It's kind of small. I don't know if you can read it. I think we have a competition in QuickTime for the longest constant in the world. We're getting close: kQTMovieExporterPropertyID_EnableHighResolutionAudioFeatures.

Next up is capture. Sound input components are deprecated in Leopard, and you have several choices instead. If you are at the Core Audio level, you can use an AUHAL audio unit. It does the same thing. You could also use the Core Audio interfaces directly, but they are a little bit more difficult to use, and I think the Core Audio team would agree with me that you want to be at the AUHAL audio unit level instead.

If you want to do synchronization with video as you're doing capture and you like the sequence grabber, then know that the sound media type SG channel is deprecated because it uses a sound input component underneath the covers. Instead, you should be using the newer SG audio media type SG channel, which, not surprisingly, uses an AUHAL audio unit underneath the covers. But the preferred method, and we really, really want to push this point, is QTKit Capture in Leopard. That is the way to go for capture, if you can.

It's a higher level interface. It's a very clean interface, so it might not allow all of the fine-grained control that you're used to with Sequence Grabber, but this is an API set that's still actively being defined and pursued, and so now is the time to provide feedback. It's available on your Leopard disks. Look at the QTKit Capture APIs and let us know what you think is missing, so that you can transition off the Sequence Grabber as soon as it makes sense for your application.

In the area of playback, sound output device components or SDEV components are deprecated. Many of you found out about this in QuickTime 7.0 when we just stopped playing to them and started playing to Core Audio HAL devices instead. Sort of rude of us at the time, but we're telling you now officially that they're deprecated.

If you need to write an audio device driver, the correct way to do it on OS X is to write a Core Audio HAL plug-in in user space or a KEXT driver at the kernel level. However, we're finding that most people weren't writing SDEVs because they needed to write a device driver.

They were doing it as a sort of hackery, trickery way to get uncompressed PCM samples out of the end of QuickTime's audio rendering chain. Well, that of course is deprecated. Don't try to use an SDEV for real-time effects insertion. Instead, we have a new way of doing it in Leopard called the QT Audio Context Inserts, which Sayli will be telling you about in great detail in just a few minutes.

Sound description handles. That ends our section on Sound Manager proper, but I wanted to inform you about a couple of other interfaces in QuickTime Audio that you should steer away from or know some caveats about. One of them is sound description handles. They come in three versions now. That's a lot of versions to say the same thing. It can be very confusing to directly access the various fields in sound description handles.

We want to discourage you from doing that. Instead, we introduced in QuickTime 7.0 a suite of accessor methods called the QTSoundDescription functions. There are five of them. QTSoundDescriptionGetPropertyInfo, QTSoundDescriptionGetProperty, and QTSoundDescriptionSetProperty are how you find out about the format as an Audio Stream Basic Description, Channel Layout, Magic Cookie, et cetera.

You create a sound description not by calling NewHandle and then filling in the fields yourself, but by using QTSoundDescriptionCreate. And if you want to convert between one version of sound description and another, please use QTSoundDescriptionConvert rather than trying to do the conversion manually.

Media handler calls. In the older QuickTime world, a lot of functionality was gotten to directly by going down to the media handler and making calls directly on the media handler. So on the left side here, we have the don'ts, and on the right side, we have the do's.

Instead of going down to the media level, we have now introduced track level, or in some cases movie level, properties to do the same thing. And the reason you don't want to go down to, say, media set volume rather than set track audio gain is that the media set volume call has worse parameters. It takes a short instead of a float, so you have less granularity and less resolution in making your volume changes.

Instead of a range of 0 to 255, when you use the new one, you can go from 1.0 down to 0.0 in floating point. So take a look at all of those functions and please transition over to the track level. And in two cases there, the movie level property calls rather than the media handler calls.
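The loss of resolution is easy to see with a little arithmetic. The sketch below, in C, rounds a floating-point gain to an integer step and back. The 0 to 255 scale is taken from the range quoted above and the helper names are hypothetical, so treat it as an illustration of the granularity argument rather than as the actual behavior of either call.

```c
#include <assert.h>

/* Assumed integer scale: 0 = silence, 255 = full volume (the range quoted in the talk). */
#define SHORT_VOLUME_MAX 255

/* Round a 0.0-1.0 gain to the nearest integer volume step. */
static short gain_to_short_volume(float gain)
{
    assert(gain >= 0.0f && gain <= 1.0f);
    return (short)(gain * SHORT_VOLUME_MAX + 0.5f);
}

/* Map an integer volume step back to a 0.0-1.0 gain. */
static float short_volume_to_gain(short volume)
{
    assert(volume >= 0 && volume <= SHORT_VOLUME_MAX);
    return (float)volume / SHORT_VOLUME_MAX;
}
```

With this scale, gains of 0.500 and 0.501 land on the same integer step, and converting 0.5 to a step and back does not return 0.5. A float parameter can represent both values directly.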

Load Movie into RAM is sort of a special case, but I wanted to bring it up because it's been somewhat problematic in code that people have talked to us about on the QuickTime API list recently. Load Movie into RAM was a really great idea in OS 9. Remember when you had those apps that you could do Command-I on, and then up would pop that window where you could allot memory to an app, and you could boost it all the way up and give an app lots of memory?

Well, in modern OSes like we have today with OS X and Windows, with virtual memory systems, it's ill-advised to call Load Movie into RAM because you're just going to be paging out that memory anyway, so you're not gaining anything by it. It is ill-advised when working with audio media to use Load Movie into RAM, and we also want to discourage using the preload bit on audio tracks to do the same thing.

So that concludes the section on deprecation and best practices. Let's transition over to the new APIs in QuickTime 7.1 and Leopard, first of which is SC Audio Compression APIs. We touched on these briefly this morning in session 212. Where are they? They're in the QuickTime 7.1 SDK, or I believe it's now called the QuickTime 7.1.2 SDK. It was released earlier this week. If you go to the web and you download Xcode 2.4 developer tools, you will get the QuickTime 7.1 SDK for the Mac, and it's also available as the Windows QuickTime 7.1 SDK.

The SC Audio Compression APIs are available on Mac and Windows. You can find the prototypes in QuickTimeComponents.h, and you can find sample code in SCAudioCompress and WhackedTV. WhackedTV has been out for a year. SCAudioCompress was just released earlier this week. It's on your seed example CD, and it should be live on the developer.apple.com site as well now.

What are the SC Audio Compression APIs? Well, SC Audio Compression is a component. You open it using the Component Manager API, like OpenAComponent. It's of type 'scdi' and subtype 'audi'. It's been around for about two years now, but we've just beefed it up with the ability to do encodes and decodes.

It is a modern replacement for Sound Converter. So now we can finally answer people's questions on the API list when they tell us that they want to use Sound Converter and we say, "Don't use Sound Converter, use Audio Converter," and they say, "But I want my code to run on Windows." You can finally say, "Oh, okay, use SC Audio Compression APIs instead."

Another great thing about it is that if you've used Audio Converter before and you're familiar with the interfaces, you'll be right at home, because it uses an AudioConverterFillComplexBuffer-style interface. The equivalent in the SC Audio Compression APIs is called SCAudioFillBuffer. Let me give you a demo.

This is a demo I showed this morning too. Let's go over to the demo machine. So I have here a build of Leopard. And I'm going to open up a QuickTime movie. This is of a recital that I gave two weeks ago, Sunday. I'm going to play a little bit of this.

So it's a solo guitar track. The first thing I do after a concert is get a call from my mom saying, how did it go? And after the call usually comes a request for the audio of the concert. So I have to send her audio of what I just played, and it usually is too big to go over the internet without compressing it. So I go into my QuickTime Pro Save Exported File As dialog and I compress it to something.

This is the point where people usually get really, really confused because it's just option overload. You've got, I don't know, what is it, 10 formats now to choose from, lots of channel layouts to choose from, lots of sample rates, and all of these various settings. So it might not be intuitive to the audio newbie which combination of parameters is the correct one for export.

So one thing that we added with the help of the SC Audio Compression APIs in Leopard is a preview function in QuickTime Player. These two buttons down here are new. They let you loop a 10-second segment of the source movie and listen to it as you apply these settings in real time, and it will update the sound.

So you can just sort of listen to the trade-offs between quality and file size as you go, and then pick the right one for your particular situation. So I'm going to go ahead and play around with this a little bit. ♪ ♪ Can we turn it up a little bit?

[Transcript missing]

And I can compare that to the source at any time by toggling back to play source. Here's how much of the highs I'm missing out on if I go down to 16 kilobits per second. So there it is, SC Audio preview in QuickTime Player. Let's go back to slides.

How did we do that? There's the dialog on the top. That is a client of the SC Audio compression component. The SC Audio compression component has a list of properties, and all of that UI was built from that list of properties. You could build your own UI if you didn't want to use that stock UI, if you don't like it for some reason. You get callbacks when any of those properties change. If you have registered listener callbacks, you're told, say, when the current compression type changes. Underneath the covers, you'll see there's an SCAudioFillBuffer call in the middle.

We know about the source movie, so we use movie audio extraction to get the raw PCM samples mixed from the movie. We run them through SCAudioFillBuffer, applying the current output settings, and then write a new destination QuickTime movie using AddMediaSample2, and then use QuickTime's built-in playback mechanism to just task the movie and idle it.

How does it work? I just told you about what's happening outside of the SC Audio box, but here's what's happening inside. When you call SCAudioFillBuffer, you're providing buffers of audio in the source format, and you're getting them out in the destination format. Internally, we use an audio converter for decode if necessary. There are opt-outs for all of these things.

Next, we place a matrix mixer audio unit for mixing if a mix is desired. So if we're going from N channels to M channels, we'll insert a matrix mixer. It also internally uses another audio converter for encode if necessary. So it's similar to Sound Converter in that it does mixing, unlike the Audio Converter.
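The "if necessary" conditions in that chain can be sketched as a small decision function in C. This is only a guess at the shape of the logic, based on the description above: the types and names are hypothetical (they are not Core Audio or QuickTime types), and the sketch ignores sample rate conversion and the opt-out properties mentioned in the talk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified format description (not a real Core Audio type). */
typedef struct {
    bool     is_linear_pcm;   /* true for uncompressed audio */
    unsigned channels;
} SketchFormat;

/* Which of the three internal stages this sketch would insert. */
typedef struct {
    bool decode;   /* first converter: compressed source -> PCM */
    bool mix;      /* matrix mixer: N channels -> M channels */
    bool encode;   /* second converter: PCM -> compressed destination */
} SketchStages;

static SketchStages plan_stages(SketchFormat src, SketchFormat dst)
{
    assert(src.channels > 0 && dst.channels > 0);
    SketchStages s;
    s.decode = !src.is_linear_pcm;
    s.mix    = (src.channels != dst.channels);
    s.encode = !dst.is_linear_pcm;
    return s;
}
```

For example, a compressed 6-channel source going to stereo PCM would need the decode and mix stages but no encode, while PCM stereo to PCM stereo would need none of them.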

Why would we want this? Well, hopefully it's self-evident by now. We needed something to replace Sound Converter on Windows. That was a big reason. It enables cross-platform decode, compress, and transcode. It provides value over the audio converter alone because it can do this mixing. It has a very nice, totally property-driven interface, and if that stock dialog is enough for you, go ahead and use it. You call SCRequestImageSettings.

Here's the general overview of how you would program it. You open up the component instance. You can configure the output format, the input format, or the list of compressors and sample rates, et cetera, that you want to see in the lists by using the suite of QTGetComponentProperty and QTSetComponentProperty calls. Look in QuickTimeComponents.h and search for SCAudio.

You'll see a list of about 30 different constants that you can use as properties. After you've configured it with the input and output formats, and that's the minimum config that you need, you can start calling SCAudioFillBuffer repeatedly and get those samples out of it. It is a pull mechanism, just as Audio Converter is.

Here's what the APIs look like. Past the first parameter, which is a component instance, SCAudioFillBuffer has exactly the same parameters as you would see in AudioConverterFillComplexBuffer. Down at the bottom, you'll see a typedef of the SCAudioInputDataProc, which again has exactly the same parameters as the Audio Converter's input data proc for fill complex buffer. And there's an additional call in the middle there, SCAudioReset. If you've finished your compression and you want to, say, jump to a different point in the file, you would call SCAudioReset to empty out any latent samples in the chain.

The contract for SCAudioFillBuffer's input data proc is identical to Audio Converter's. Let me just review that for a second. So again, you call SCAudioFillBuffer, and then on the opposite side, you've registered an input data proc that will be called to provide samples in the source format.

You fill buffer for some number of packets in the destination format, and when your input data proc is called, it will be called for some other number of packets. It might be the same or it might be different than what you pulled for in fill buffer. And in that input proc, you may provide exactly that number of packets that it requested, or you can provide fewer or more or none.

You signal temporary underrun by returning an error from the input proc. And if you want to signal that you are out of data, you will never get any more data, you set all of the buffer pointers in the audio buffer list to null and set number of packets to zero in the input proc. And it might call you about six or seven extra times saying, "Are you really out? Are you really out? Is this really the end?" And each time you do the same thing and then it stops calling you.
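The contract just described can be modeled with a small self-contained C sketch. Everything here is hypothetical and simplified: the real input data proc has the Audio Converter-style parameters mentioned earlier (including an audio buffer list), whereas this model uses a single mono float buffer and made-up status codes, purely to show the three outcomes of providing data, signalling a temporary underrun, and signalling the end of data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch (not real QuickTime error codes). */
enum { SKETCH_OK = 0, SKETCH_UNDERRUN = -1 };

/* Hypothetical source of mono float packets. `available` is how many packets
   can be handed out so far; `total` is where the data ends for good.
   The sketch assumes available <= total. */
typedef struct {
    const float *data;
    size_t total;
    size_t available;
    size_t cursor;
} SketchSource;

/* Input proc in the style described above. On entry *io_packets is the number
   of packets requested; on exit it is the number actually provided. */
static int sketch_input_proc(SketchSource *src, size_t *io_packets,
                             const float **out_data)
{
    assert(src != NULL && io_packets != NULL && out_data != NULL);

    if (src->cursor >= src->total) {
        /* Out of data for good: NULL buffer pointer, zero packets, no error. */
        *out_data = NULL;
        *io_packets = 0;
        return SKETCH_OK;
    }
    if (src->cursor >= src->available) {
        /* Temporary underrun: nothing right now, so return an error. */
        *out_data = NULL;
        *io_packets = 0;
        return SKETCH_UNDERRUN;
    }
    /* Provide what is ready, which may be fewer packets than requested. */
    size_t ready = src->available - src->cursor;
    if (*io_packets > ready)
        *io_packets = ready;
    *out_data = src->data + src->cursor;
    src->cursor += *io_packets;
    return SKETCH_OK;
}
```

Because the caller may ask again several times after the end of data, the end-of-data branch is written so that it gives the same answer on every call.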

Let's give you a demo of how to program with the SC Audio Compression APIs. Associated with this session, you'll find on your Leopard example CDs a sample application called SC Audio Compress. This isn't quite as sexy as the player preview that I showed you earlier. It's just a command line utility. Sorry.

But it's nonetheless very useful because it boils a compression using QuickTime, from start to finish, down to less than 1,500 lines of code. I've pre-built it here just to show you what it looks like and the parameters to it. When you invoke SCAudioCompress, you need to provide it -i and an input file and then -o and an output file.

And then it pops up the dialog that you saw before. You select an output format. It performs the compression and writes it to a new file. It can accept any audio file or movie, and it can write to a CAF file or a movie or raw PCM. So let's go ahead and give a demo of that. I've got that same content from before. So I'll say -i Natalia Excerpt -o and I'll write it out to some temporary location.

Okay, now notice there is no preview button and play source button. You might ask why. Go ahead and ask. Well, because this sample is shipping now and works with the QuickTime that's already shipping, the... QuickTime that I showed you earlier is the one that's shipping in Leopard. This one is compatible with QuickTime 7.1. That's why you don't see those two buttons.

So I'll go ahead and pick some output format. Many of you might be interested to know that we now are going to put FLAC in Leopard. That's been a highly sought-after lossless format. So I'll compress to FLAC, say OK. It compresses it, and then I'll go ahead and open the result.

And amazingly, it's a movie with FLAC in it. Okay, so let's look at the code very briefly. I don't want to spend too much time, but I do want to just show you how to read it. Non-intuitively, it reads from bottom to top. It's all just one file, so when you start looking at it, start at the very bottom where main is and work your way up. And you'll see the list of tasks in order. It opens the source file, it configures StdAudio, then it opens the destination file, then it calls SCAudioFillBuffer repeatedly.

So it's not only a good sample for learning about SCAudioFillBuffer, it's also good for learning how to read audio files using the Audio File API and how to write using the Audio File API and how to read using QuickTime and how to write using QuickTime. Okay, let's go back to slides.

Okay, I think I just covered all of that. Specifically, it uses AddMediaSample2 to do the writing to a movie. It uses AudioFileWritePackets to do the writing to a CAF file. Here's what it looks like graphically. SCAudioFillBuffer is in the middle. That's performing the decode, mix, or encode. The SCAudioInputDataProc is providing the buffers in the source format, and those are acquired using the Audio File API if the source happens to be a .aif or a .wav or a .caf.

Otherwise, it falls back to the QuickTime APIs to open the movie and then uses MovieAudioExtractionFillBuffer to acquire samples using that path. And then on the output side, it uses AudioFileWritePackets to write a CAF file, or if you specified .mov as the output format, it will use AddMediaSample2 to write the output. With that, I'm going to turn it back to Sayli to talk more about the new APIs in 7.1 and Leopard.
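The routing just described can be sketched as a pair of small decisions on the file extension. This is only an illustration of the data flow as described in the talk; the enum and function names are hypothetical and are not taken from the sample itself, and how the real sample makes the decision is an assumption here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { kSketchUseAudioFileAPI, kSketchUseQuickTime } SketchPath;

/* Case-insensitive comparison of the text after the last '.' in path.
   ext must be lowercase and given without the dot. */
static int sketch_has_ext(const char *path, const char *ext)
{
    const char *dot = strrchr(path, '.');
    if (dot == NULL)
        return 0;
    dot++;
    while (*dot != '\0' && *ext != '\0') {
        if (tolower((unsigned char)*dot) != (unsigned char)*ext)
            return 0;
        dot++;
        ext++;
    }
    return *dot == '\0' && *ext == '\0';
}

/* Input side: .aif, .wav, and .caf go through the Audio File API;
   anything else is opened as a movie and read with movie audio extraction. */
SketchPath sketch_choose_reader(const char *inputPath)
{
    assert(inputPath != NULL);
    if (sketch_has_ext(inputPath, "aif") ||
        sketch_has_ext(inputPath, "wav") ||
        sketch_has_ext(inputPath, "caf"))
        return kSketchUseAudioFileAPI;
    return kSketchUseQuickTime;
}

/* Output side: .mov is written as media samples in a movie;
   otherwise packets are written with the Audio File API. */
SketchPath sketch_choose_writer(const char *outputPath)
{
    assert(outputPath != NULL);
    if (sketch_has_ext(outputPath, "mov"))
        return kSketchUseQuickTime;
    return kSketchUseAudioFileAPI;
}
```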

So moving along with our discussion of new APIs, we have pitch control. About two years ago, we showcased this pitch control through our QuickTime player. And if you ever opened up the AV Controls panel, you know you can control the pitch. But we hadn't really given a public interface for this. So we've finally done that as per your request. So, new in QuickTime 7.1, you can control the pitch of the movie. How do you do it? It's just one simple property. It's gettable, settable, and listenable on the movie.

And what it will do is control the pitch of the audio of all the tracks that are mixing into that movie audio context. This means that currently it will not support pitch for tracks such as MPEGs, streaming, or music, because they don't mix into the movie audio context.

One thing to note is that the movie that you're working with needs to have been opened with the rate-changes-preserve-pitch property, and that's because setting that property creates a time pitch unit. QuickTime creates a time pitch unit for the movie underneath, and you need that to exist to make the pitch changes. And the unit for this pitch shift is cents. So setting a value of 100 means that you're raising the pitch by a semitone, and so useful values would be between plus and minus 1,200, because that's an octave up or down.
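As a small worked example of the units, here is a sketch that converts a shift expressed in semitones into cents and keeps it inside the plus or minus one octave range mentioned above. The function name is made up for this illustration, and clamping (rather than rejecting) out-of-range values is a choice of the sketch, not something the property is stated to do.

```c
#include <assert.h>

#define SKETCH_CENTS_PER_SEMITONE 100.0
#define SKETCH_CENTS_PER_OCTAVE   1200.0   /* 12 semitones */

/* Convert semitones to cents, clamped to one octave up or down. */
double sketch_pitch_cents_from_semitones(double semitones)
{
    assert(semitones == semitones);        /* reject NaN */
    double cents = semitones * SKETCH_CENTS_PER_SEMITONE;
    if (cents > SKETCH_CENTS_PER_OCTAVE)
        cents = SKETCH_CENTS_PER_OCTAVE;
    if (cents < -SKETCH_CENTS_PER_OCTAVE)
        cents = -SKETCH_CENTS_PER_OCTAVE;
    return cents;
}
```

So one semitone up is 100, a whole tone down is -200, and anything beyond twelve semitones is held at 1,200 in either direction.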

Next, we have a set of properties that help you to control the render quality of the audio units that QuickTime is using in its render path. And this is useful because often you have to make a trade-off between render quality and CPU performance. By controlling this, if you're on a low-end machine, you can choose to use a slightly lower render quality. If you're on a high-end machine and you can afford the CPU usage, you can set a higher quality.

In particular, that time pitch unit that I just mentioned uses very different algorithms when functioning in the low and the high qualities. In the case of the low quality, it's using a time domain algorithm, and in the higher quality, it's using a frequency domain algorithm. And by default, we haven't set this unit to function in the high-quality frequency domain. So the only way you can access this is by setting the render quality to high. This is again new in QuickTime 7.1.

And you can use-- you can set this property either during movie playback or extraction or during export. And note that when you set it for export, you're setting the render quality for all the units in the chain. So this is your one-step solution for controlling the quality. And the constants are all the constants that are defined in Core Audio, and one additional constant, which is the playback default constant. And you set this to kind of reset your units to a default optimal value that QuickTime chooses for all the units.

The media optimization properties. This is a way in which your applications can allow QuickTime's importers to perform import in an optimized fashion. And when QuickTime optimizes media, it might so happen that it'll create movie structures or lay out media in a fashion that's different from what it used to do in previous versions.

But this shouldn't affect you because by setting this property, you're sort of implicitly agreeing with QuickTime and saying that I'm going to treat your movie structures as an abstraction. I'm not going to delve into the internals or make any sorts of assumptions. I'm going to access this movie structure through QuickTime's high-level APIs. And what does this get you?

Well, for one, you can get the VBR-style import behavior. Just to give you a little bit of background: right now, QuickTime's MP3 importer, by default, when it encounters VBR media, does not create accurate VBR-style sample tables. And this is for compatibility reasons. When we tried to do this, we broke certain older applications that were making implicit assumptions about how the movie structures are laid out. If you remember Brad talking about the GetMediaSample and SoundConverter combination that we highly discourage, these applications were doing that. They were basically delving deep into the movie's tracks and media, getting at the samples, and doing their own decoding and so on.

Because such applications exist, we can't change the default behavior, but if you're a new modern application and you want to opt into this VBR style import, which is the more accurate import and helps you to eliminate dropouts and helps you to synchronize better, then you set this property to let QuickTime know that you'd like it so. And in the future, we can add other kinds of optimization. For example, we could import movies at the time scale of the audio, and this helps sample accuracy, and you get this for free if you've already opted in.

This is new in Leopard, and there are two new properties. One works with NewMovieFromProperties, which is the preferred way of opening a movie. But if for some reason you have to deal directly with the importer component, you can use the other property and set it right after you've opened the component, but before you do any sort of import. So those three were just some quick new properties. I'd like to now move on to the Audio Context Insert APIs.

QuickTime has for many years had a story for video effects. We've had a video effects architecture, but we haven't had any corresponding story on the audio side. There's not really been any way to get at the synchronized audio. QuickTime's audio context has been pretty much of a black box that you couldn't really access.

Sometimes we got questions as to how do we perform custom processing, and the easiest solution, well, the only solution we could give you was get at the audio using movie audio extraction, do your custom processing, but after that, you're kind of on your own. You have to deal with the synchronization beyond that custom processing point.

So essentially, we were asking you to write your own mini QuickTime playback engine, and that was kind of a difficult solution and not something we should have expected you to do. So these insert APIs are trying to make that job easier for you, and what they are is a way for you to slot a custom processing unit into QuickTime's rendering chain. And once you're slotted in there, you could either add custom effects, like doing mixing or adding a filter or whatever, or you could leave the data alone and just watch the synchronized data fly by and do data visualization.

You can insert an effect during real-time rendering to an output device, and you can also do this during movie audio extraction. And this insert API is completely compatible with audio units. It's driven by callbacks, so it can work really easily if the kind of processing you do is done using audio units, but it doesn't limit you to using audio units. So you can do your own custom processing, or if you're on Windows, use DirectShow filters, et cetera.

So audio context inserts, but what exactly is the audio context? So let's review that a little bit and also look at how data flows in QuickTime's audio path. So every device on the system has this concept of a device context. It's not something you actually can get at, but it exists.

And generally, when you open a movie, the movie just plays to the default device and all is well. But you might want to play the movie to a different device. And the way you do that is create a context from this device context. And associate a movie with it.

So a movie audio context is kind of considered as a connection between a movie and a device. And if you're not doing real-time playback but doing extraction instead, what you'd be playing to is this abstract notion of an extraction context. Again, you can't really get at this, but this is what exists under the hood.

If the movie has multiple tracks, then audio from those tracks gets mixed into a movie mix before it's sent to the device or to the extraction context. Now, this mix we refer to as the movie summary mix. It is PCM audio, and its sample rate is equal to the highest sample rate among the tracks.

And its channel layout is a layout that gets created by mixing like channels of the tracks together. So if that didn't make sense, this picture might help you. Here you see three tracks, and you can see that the lefts and the rights are mixing together. And the movie summary mix is that column you see in the middle.
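Here is a minimal sketch of how a summary mix format could be derived from the tracks, assuming the rule just described: take the highest track sample rate, and combine like channels so the layout is the set of distinct channel labels across the tracks. Representing a layout as a bitmask with one bit per channel label is a simplification made for this sketch, and all names are hypothetical; QuickTime's real layout handling is richer than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per channel label (simplified stand-in for a channel layout). */
enum {
    kSketchLeft   = 1u << 0,
    kSketchRight  = 1u << 1,
    kSketchCenter = 1u << 2
};

typedef struct {
    double   sampleRate;
    uint32_t channelMask;
} SketchFormat;

/* Derive the summary mix format from the track formats. */
SketchFormat sketch_summary_mix(const SketchFormat *tracks, size_t count)
{
    assert(tracks != NULL && count > 0);

    SketchFormat mix = { 0.0, 0u };
    for (size_t i = 0; i < count; i++) {
        if (tracks[i].sampleRate > mix.sampleRate)
            mix.sampleRate = tracks[i].sampleRate;   /* highest rate wins */
        mix.channelMask |= tracks[i].channelMask;    /* like channels share one slot */
    }
    return mix;
}
```

With two stereo tracks at 44.1 and 48 kilohertz and one center-only track at 22.05 kilohertz, this yields a 48 kilohertz mix with left, right, and center.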

So where do the inserts fit in? They fit in right there. The input to the insert is the movie summary mix. And generally, when you're writing your insert, you want to try and process the data with the channel layout equal to the movie summary mix. But sometimes, depending on what processing you're doing, it might be the case that you are constrained by what you can deal with in terms of in and out layouts. In that case, when you register the insert, you let QuickTime know what you expect as the input channel layout. And QuickTime will do a mix for you. So the summary mix gets mixed into the mix that you're expecting. And then the data gets handed to you.

You do your processing and then hand the data back to QuickTime, which then sends it to either the device or the extraction context. So it's important to note here that if you're working with a movie that has created a summary mix of 5.1, but your insert or its internal processing can only do a stereo-to-stereo effect, then you're constraining what's being sent to the device. So stereo content will be sent to the device. I'd like to now do a demo of an application. It's a client application of the inserts, and it uses all these insert APIs and implements the callbacks needed.

That's just a movie of Brad playing his guitar, well, four Brads playing their guitars, because you can never have too many Brads. I'm going to open up the inserts panel. What you see here in the top box is the movie summary mix that I just mentioned. And I'd like to mention here that we have, in QuickTime 7.1, made two more properties public that help you get at the movie's and the device's mix.

So you can get at the movie summary mix, and that's the property I've used to get the movie summary mix. And here you see, well, it's stereo content, 32-bit float, with a sample rate of 48 kilohertz. At the bottom is what the device is configured to. And this particular application is using audio units for its internal processing.

And out here is a cool unit called the filter. If you like how this looks, you can get at it, and its code is in the Core Audio developer SDK. So I will now configure my insert. I'm going to select a layout, and I'm going to try and select a layout that is consistent with the movie summary mix, so in this case, stereo.

And my insert is pretty much good to go at this point, so if I play, it's going to be included in the QuickTime signal processing path. So here I'm varying some parameters of this filter and as you can see the effects are being applied real time while playing back. Let's do another unit, say the pitch.

[Transcript missing]

What's going on here is the insert that's configured here is going to be inserted in the extraction that we set up over here. And so up here you see that it's set to extract to a default layout of stereo, and it's going to extract the entire file. And if I were to hit export, I'd get an option to save it to a certain file, but just to save some time, I'm going to just preview this. But note that this preview isn't doing playback through QuickTime.

It's actually previewing the extracted samples after the insert has been applied to the extraction. So if I play that... The pitch parameter has been applied and you can hear the effect. That's just an idea of the kinds of things you can do with these insert APIs. And like I said, this is a really simple application that uses audio units, but you can roll in all your custom processing. Could we switch back to slides?

If you were to do this, what were the steps involved in setting all this up? The first thing you'd want to do is query the movie's summary mix. This, as I said, is for you to find out whether you can deal with the mix that the movie is creating. So after you've queried the summary mix, you try and see whether you can process that mix as an input mix.

Next, you create a context for the default audio device, or whatever device you need to play back to, using the create context for audio device call. You pass in a device UID, and what you get back is a context. Once you have this context, you register your insert with that context. The function you use to do that is the register insert function, and you provide some registration information that I'll get into in more detail, but here's your chance to provide the in and out layouts that you support and the addresses of the callbacks that you will be implementing.

So this context with the insert registered in it is then set on a movie. If you already have a movie open, then you just call SetMovieAudioContext. If you're creating a new movie, you can pass this audio context in to NewMovieFromProperties.

So those are the only steps that are really needed, and you're good to go after that point, and QuickTime's going to make calls to your callbacks, and you're in the chain at that point. So what are the callbacks that you need to implement? There are three. You need to implement a reset callback, a process data callback, which is your render callback, and an optional finalize callback. So the reset callback is called right in the beginning during initialization.

And this is where

[Transcript missing]

The reset callback also gets called every time there's an interruption in the render chain. So in addition to that format negotiation, this is the right place to reset any of your buffers and, if you have latency, to clear them out. If you're using audio units underneath, here's your place to reset those units. Basically, clear any kind of state that you may be holding onto.

The process data callback is called per buffer of audio rendered. QuickTime hands you input audio. You do your processing and hand it back to QuickTime. And note that this callback is made on the high-priority audio rendering thread, so you want to be really zippy. You want to avoid making memory allocations. You want to avoid CF operations because invariably they take spin locks, and in general you want to be quick about doing things in this render call.

One thing I didn't mention is that QuickTime does not implement any kind of bypass for your insert. So if your client application does offer a bypass option, then you need to still stay true to the contract that has been set regarding your output layout. So even if you don't do any processing of audio, if you've said that you're going to take in a stereo input and create a quad output, you still need to be doing that kind of mixing, even in the bypassed state.
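To illustrate that contract, here is a minimal sketch of a bypass path for an insert that registered stereo in and quad out. It still produces four channels; it simply does no effect processing. Copying the stereo pair to the front channels and writing silence to the surrounds is an assumption of this sketch, as are the function name and the non-interleaved buffer arrangement. Note that it does no allocation, in keeping with the advice about the render thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bypass for a stereo-in, quad-out insert: no effect is applied,
   but the promised four-channel output is still produced. */
void sketch_bypass_stereo_to_quad(const float *inL, const float *inR,
                                  float *outL, float *outR,
                                  float *outLs, float *outRs,
                                  size_t frames)
{
    assert(inL != NULL && inR != NULL);
    assert(outL != NULL && outR != NULL && outLs != NULL && outRs != NULL);

    size_t bytes = frames * sizeof(float);
    memcpy(outL, inL, bytes);      /* front left  = input left  */
    memcpy(outR, inR, bytes);      /* front right = input right */
    memset(outLs, 0, bytes);       /* surrounds are silent (all-zero floats) */
    memset(outRs, 0, bytes);
}
```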

And the finalize callback is optional. It gets called whenever the audio context is going to go away, either because the movie was closed or, if it's movie audio extraction, because the context was reset. And this is a signal from QuickTime saying no more callbacks are going to be made to your insert. And that's a good place to let go of any resources that you may be holding onto. If you're, again, using audio units underneath, that's a good place to close those components.

So I mentioned this registration information and that struct, and that struct looks exactly like that. Let's start from the bottom. We have the three callbacks, so you provide addresses to those three callbacks. Then you provide information about your input and output channel layouts, and the first field is the user data, which is sent as the first parameter to your three callbacks.
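As a rough sketch of that shape, here is a hypothetical registration struct with user data first, then the in and out formats, then the three callbacks, together with a trivial gain insert and a stand-in "host" that calls the callbacks in the order described. Every name and signature here is invented for illustration (channel counts stand in for channel layouts, and the reset parameters are an assumption); the real QuickTime declarations differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration info: user data, formats, then callbacks. */
typedef struct {
    void    *userData;
    uint32_t inputChannels;    /* stand-in for the input channel layout  */
    uint32_t outputChannels;   /* stand-in for the output channel layout */
    int  (*reset)(void *userData, double sampleRate);
    int  (*process)(void *userData, const float *in, float *out, uint32_t frames);
    void (*finalize)(void *userData);              /* optional, may be NULL */
} SketchInsertInfo;

typedef struct {
    float    gain;
    uint32_t channels;
    int      resetCount;
    int      finalized;
} SketchGainState;

static int sketch_gain_reset(void *userData, double sampleRate)
{
    SketchGainState *s = (SketchGainState *)userData;
    (void)sampleRate;          /* a plain gain has no rate-dependent state */
    s->resetCount++;
    return 0;
}

/* Interleaved samples in, interleaved samples out, same channel count. */
static int sketch_gain_process(void *userData, const float *in, float *out, uint32_t frames)
{
    const SketchGainState *s = (const SketchGainState *)userData;
    uint32_t n = frames * s->channels;
    for (uint32_t i = 0; i < n; i++)
        out[i] = in[i] * s->gain;
    return 0;
}

static void sketch_gain_finalize(void *userData)
{
    ((SketchGainState *)userData)->finalized = 1;
}

SketchInsertInfo sketch_make_gain_insert(SketchGainState *state)
{
    assert(state != NULL && state->channels > 0);
    SketchInsertInfo info = {
        state, state->channels, state->channels,
        sketch_gain_reset, sketch_gain_process, sketch_gain_finalize
    };
    return info;
}

/* Stand-in for the host: reset once, process one buffer, then finalize. */
int sketch_host_run(const SketchInsertInfo *info, double sampleRate,
                    const float *in, float *out, uint32_t frames)
{
    assert(info != NULL && info->reset != NULL && info->process != NULL);
    int err = info->reset(info->userData, sampleRate);
    if (err == 0)
        err = info->process(info->userData, in, out, frames);
    if (info->finalize != NULL)
        info->finalize(info->userData);
    return err;
}
```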

Here's a bit of code following the same steps that we went through. So in step number one, you query the movie audio context, and like I said, you query the movie's summary layout and summary ASBD, and like I said, we have a new property that you can use to do that.

Then there's a bit of pseudo-code in step number two, but that's the general idea. You would try and see if you can handle the layout and the summary mix that the movie is claiming to be creating, and if not, then you get the right in-and-out layouts that your insert can process.

Then you fill out your structure. That's pretty self-explanatory. In step number four, you create an audio context. Now, here we're creating a context for the default device, so we're passing a UID of null, and we get back a new audio context. We then register our insert with that context in step number five and then set it on the movie in step number six.

So we just went over the steps involved in getting hooked into QuickTime's playback path. But like I said, you could also insert your custom processing in QuickTime's movie audio extraction path. And the way to do this is slightly different, but very intuitive if you've been working with the movie audio extraction APIs. The first thing you do is begin a movie audio extraction session. You would then query the extraction layout to see if your insert can deal with that.

You then set any properties that you need. These might be the start and stop times of the extraction, the layout that you might extract to, etc. But the property that is of importance in this discussion is the register insert property, and what you provide to this property is the struct that we went through, the insert registry info structure.

And then once you're configured, the fill buffer call can be made, and this is where your reset and process data callbacks are called. And once you're ready to end the session, you call end, and that's where your finalize cleanup callback is called. So the only new piece of code here is the setting of one property, and what you provide is that registry info struct.

A few things to note when you're working with inserts. One, you can have only one insert per audio context. So if you want to add multiple effects, you would want to create, say, a graph of your own on your end, but on the QuickTime side, it's just one insert. Inserts will not work with protected content, so if you have movies with some tracks that have protected content, the call to register insert will fail.

And you must be ready to process at whatever sample rate is given to you during the reset callback. I sort of showed you in the data flow diagram that what you get is the sample rate of the movie summary mix, but QuickTime is free to optimize that and change that, so you shouldn't be making any assumptions about it. You should be able to deal with the sample rate given to you and not change it when you're doing your own processing.
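One way to honor that is to derive all rate-dependent state inside the reset callback instead of hard-coding it. The sketch below is a hypothetical 10 millisecond mono delay: reset recomputes the delay length from whatever sample rate it is handed and clears the delay line, and processing uses only preallocated storage. The names, the 192 kilohertz ceiling, and the reset signature are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_DELAY_FRAMES 1920u   /* 10 ms at 192 kHz, the sketch's ceiling */

typedef struct {
    float    line[SKETCH_MAX_DELAY_FRAMES];  /* preallocated: no malloc while rendering */
    uint32_t delayFrames;
    uint32_t pos;
} SketchDelay;

/* Reset: adopt the sample rate we are given and clear any held state. */
int sketch_delay_reset(SketchDelay *d, double sampleRate)
{
    assert(d != NULL);
    if (sampleRate <= 0.0 || sampleRate > 192000.0)
        return -1;                               /* cannot handle this rate */

    d->delayFrames = (uint32_t)(sampleRate * 0.010 + 0.5);   /* 10 ms in frames */
    if (d->delayFrames == 0)
        d->delayFrames = 1;
    memset(d->line, 0, sizeof d->line);          /* drop latency left from before */
    d->pos = 0;
    return 0;
}

/* Process: mono circular delay, output rate equals input rate. */
void sketch_delay_process(SketchDelay *d, const float *in, float *out, uint32_t frames)
{
    assert(d != NULL && in != NULL && out != NULL && d->delayFrames > 0);
    for (uint32_t i = 0; i < frames; i++) {
        float delayed = d->line[d->pos];
        d->line[d->pos] = in[i];
        out[i] = delayed;
        d->pos = (d->pos + 1) % d->delayFrames;
    }
}
```

The same reset also covers the interruption case: after it runs, nothing from before the interruption is left in the delay line.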

And that was it regarding the audio context inserts, and in general, that's the end of the session. But I'd like to do a quick summary of what we've learned today. We started with a quick overview of QuickTime Audio and its capabilities, and hopefully you have a better understanding of what QuickTime Audio is good at so that you can make a better choice between that and Core Audio's APIs.

We went over some best practices. Sound Manager and SoundConverter are deprecated. We'd like you to use the sound description accessor functions instead of dealing with the sound description handles directly. And movie audio extraction is the new way to go to get at PCM data. And then we covered various new properties and two main APIs. There's the standard audio compression component and its APIs, and the audio context insert APIs. And that's it for the session. That's some documentation that flew by.