
WWDC04 • Session 213

QuickTime Audio Update

Graphics • 1:13:10

This session discusses the latest advances in QuickTime's audio architecture. Learn techniques and best practices for incorporating audio in your applications with QuickTime.

Speakers: Greg Chapman, Brad Ford, Guillermo Ortiz, Jeff Brown, Paul Robins

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

We're going to be talking about changes to QuickTime Audio today. For those of you who, like me, are having a little bit of deja vu, we stood in this room a year ago and talked about changes in QuickTime Audio that we intended to ship in Panther. We're going to be talking about some of those again today because we didn't actually ship them in Panther for various reasons. We will be shipping them in Tiger, but what we're shipping in Tiger is a lot more than what we were talking about last year, and you'll see that as we carry on.

Okay, we're finally built on core audio. Just like the video pipeline is being rebuilt on top of core video and core image, we're rebuilding on top of core audio and we're doing lots of good integration work there. We're going to be supporting high resolution audio. We'll be talking about that in a little bit. And we're going to be giving you some great movie-level audio controls and access, which is something you really haven't had before.

So high resolution audio up to 192 kilohertz. We could probably go higher. We certainly haven't put any limits on it, but that's where we're headed in this release. True 24-bit support. Also true 32-bit float support. If you go above that to 32-bit integer or 64-bit float, we do all our work in 32-bit float 'cause we're using core audio and that's the way that works, but 24-bit throughout. Up to 24 channels, and I say and beyond, because again, there's no architectural limitation there. You can have as many channels as you want. And also surround sound channel configurations, which you'll be hearing a lot of today and you heard as you were coming in.

We're unlocking the power of Core Audio for you. Core Audio is all about timestamps. If you use Core Audio at all, you've got timestamps coming out your ears. We're taking advantage of that: just like Core Video, Core Image, and the new video stack are getting timestamps through the stack, we're getting timestamps through the stack. We're going to be completely synchronized.

We're leveraging audio units. We use audio units in our rendering chain and between the two, between QuickTime and Core Audio, you have the foundation for your next killer app. So what we're going to learn today, we're going to talk about enhancing your application to do audio playback, audio capture, and audio export. So let's talk a little bit about audio playback. Pre-Tiger multichannel.

It's more than one. It's twice as many as one. Okay, so what we had was every soundtrack in the movie could have one or two channels. A two-channel track implied that it was left/right stereo, of course. Movies could have one or more soundtracks, which sounds like you might be able to do multi-channel, but no, actually, they all just played to the device as stereo or mono.

So, new multi-channel in Tiger. We're doing great. A track can have any number of channels. We use a new sound description extension that contains a Core Audio channel layout to assign those channels to individual speakers. For example, if you've got a 5.1 movie... now, some of you guys might not know what 5.1 is.

5.1 is five channels, which is center, left and right, left and right rear. That's the five. And then the .1 is low-frequency effects, a subwoofer. So it's six channels. You could have...

[Transcript missing]

You could have six one-channel tracks. Each track in its sound description could say, "I'm left, I'm left surround, I'm LFE." You could have a five-channel track for the five speakers in the room and a one-channel LFE track that maybe was encoded at a lower sample rate because it's only got low frequencies in it. You can save some space there.
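
To make that channel-layout idea concrete, here is a minimal sketch using the Core Audio types (the 5.1 layout tag shown is one of several defined tags; the one-channel case matches the LFE-only track described above):

    #include <CoreAudio/CoreAudioTypes.h>

    /* Option 1: one tag that implies the whole 5.1 arrangement
       (L, R, C, LFE, Ls, Rs). */
    AudioChannelLayout surround51 = { 0 };
    surround51.mChannelLayoutTag = kAudioChannelLayoutTag_MPEG_5_1_A;

    /* Option 2: explicit per-channel labels, e.g. for a one-channel
       LFE-only track captured at a lower sample rate. */
    AudioChannelLayout lfeOnly = { 0 };
    lfeOnly.mChannelLayoutTag          = kAudioChannelLayoutTag_UseChannelDescriptions;
    lfeOnly.mNumberChannelDescriptions = 1;
    lfeOnly.mChannelDescriptions[0].mChannelLabel = kAudioChannelLabel_LFEScreen;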

Track order in the movie is not important because all your channel assignments are labeled. If you've got mono and stereo tracks, one- and two-channel tracks without labels, of course, we'll assume they're mono and stereo as we've always assumed before. And we downmix when necessary. This is kind of cool.

Core Audio gives us this capability. If you've got a multi-channel surround movie and you're playing it on stereo speakers, like maybe you're just listening on your headphones, we'll mix down to stereo appropriately. If you're playing 5.1 audio on a quadraphonic system, where you only have four speakers, we'll do the appropriate mix.

So let me do a demo of multi-channel playback. Switch over to demo two. What I want to do is I want to play one of the trailers that we got that's got full-on 5.1 audio. But first, I would like to play you the stupid movie for the year, which is a little documentary on the making of the trailers.

So now let's see one of those trailers. Our poor intern had no idea what she was getting into when she came here. Okay, here is one of those. In just a moment, you'll hear the audio. Seven days ago, one of my satellites over in Antarctica discovered a pyramid. Where exactly on the ice is this? It's not on the ice. It's 2,000 feet under it. Let's make history. Oh my god. Whoever built this pyramid believed in ritual sacrifice. What did you say this room was called? Sacrificial chamber.

Okay, so let's take a look at what's inside this movie for a second. What we have here is actually three two-channel tracks, due to the interesting way we had to capture this stuff. You see, one of them's got left and right, one of them's got center and LFE, and the other one's got left surround and right surround. When we mix that all together and send it out to the right speakers, no problem.

Woo, that was intense. Okay, so let me show you a little bit of channel mapping. Here's a little piece. We'll probably want to play the rest of this piece. Yes?

[Transcript missing]

Okay, now we're talking. Whoops, where'd my mouse go? Okay, now see if that ended up over there.

Oh, it was over there. Woo-hoo. Three cheers. So we can... The interesting thing there is the player: all it's doing is pulling the sample description out of the track, setting in the channel layout (I'll show you the very simple APIs for doing that in a little bit), and saving it back out to the movie, and then it gets sent to the appropriate channels. Okay, let's go back to slides.

Can we go back to slides? There we go. Excellent. So how do we do that? There's movie file format changes. And as you've heard in some of the other sessions and you'll hear again in more of the sessions to come, we are evolving the movie file format to make it a richer, more capable container for the new kinds of video and audio that are coming along.

So we've had to make file format changes, but what we're going to do--what we're doing is giving you APIs that hide these changes from you so that you don't have to deal with it. We got a new sound description. Obviously, we have lots of different things to tell you about the audio in the sound description, so we've had to come up with a new one.

This new sound description version actually borrows heavily from the AudioStreamBasicDescription, which is colloquially known as the ASBD, because I can never figure out which word comes where. That's Core Audio's way of describing audio. There's a new optional sound description extension where we put the audio channel layout, which tells you where the sound goes to the different speakers.

There are processing changes, not just file format changes. We have a new sound media handler. We've rewritten it, hosting it on Core Audio instead of on the Sound Manager. We do a multi-level mix. In the old days, every track would get played independently to the Sound Manager and would get mixed completely out of our control. We had no control over it at all.

Now, there are multiple channels in each track. There's multiple tracks in each movie and there's multiple movies in your application's presentation. We control every single point of that mix and we use Core Audio's Matrix Mixer audio unit for that and they do a wonderful job. It's really great.

Okay, how do you do that? This is how you do that. No changes required in your application. If I had MoviePlayer 1.0 on a floppy here, and if I had a floppy drive, and if we shipped QuickTime 6.6 on OS 7, it would be able to play multi-channel audio. It would have been a great demo.

Okay, let's talk a little bit about these new APIs we're giving you. You don't have to change your app at all, but if you want to leverage these new capabilities in a more rich way, here you go. We're giving you easy movie audio access and control instead of what we had before, which was you basically had to control each track individually. There was sort of an assumption there that there would only be one soundtrack, but if there's more than one, you could control them too. We're letting you talk about the movie audio directly now.

So, we've got movie audio parameters. Simple stuff. You set the gain, you can mute it, you can set the balance. If you've got a multi-channel output device, you can set the fade so you can move it from the front of the room to the back of the room.
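
In code, those movie-level parameter calls look roughly like this; a minimal sketch, assuming an open Movie, with the flags arguments passed as zero the way sample code does:

    #include <QuickTime/QuickTime.h>

    /* Movie-level audio controls on an open Movie. */
    static void ConfigureMovieAudio(Movie movie)
    {
        SetMovieAudioGain(movie, 0.5f, 0);      /* movie-wide gain     */
        SetMovieAudioMute(movie, false, 0);     /* unmute the movie    */
        SetMovieAudioBalance(movie, -1.0f, 0);  /* pan toward the left */
    }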

Okay, volume metering. We let you do this before on a track level, but never like this, where you can just get the movie's volume meters, you know, one for each channel. Frequency band metering. We kind of had this before; Player used it. It worked as Player used it, and that was about it. Here, you set any number of bands. We'll lay them out for you. You can ask what the frequencies were, and then you get the frequency levels, and you'll get frequency levels for every band for every channel and use the ones you want. Pretty slick.
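
Here's a sketch of those metering calls as I recall them from Movies.h; verify the exact names and the metering-mode constant against your headers:

    #include <stdlib.h>
    #include <QuickTime/QuickTime.h>

    static void StartMetering(Movie movie)
    {
        /* Movie-level volume metering, one meter per channel. */
        SetMovieAudioVolumeMeteringEnabled(movie, true);

        /* Frequency metering: ask for eight bands; QuickTime lays them out. */
        UInt32 numBands = 8;
        SetMovieAudioFrequencyMeteringNumBands(movie, kQTAudioMeteringMode_Monitoring,
                                               &numBands);

        Float32 bandFrequencies[8];
        GetMovieAudioFrequencyMeteringBandFrequencies(movie,
            kQTAudioMeteringMode_Monitoring, numBands, bandFrequencies);

        /* QTAudioFrequencyLevels is variable-length: one Float32 per band
           per channel; two channels assumed here for illustration. */
        UInt32 numChannels = 2;
        QTAudioFrequencyLevels *levels =
            malloc(sizeof(QTAudioFrequencyLevels) +
                   numBands * numChannels * sizeof(Float32));
        levels->numChannels       = numChannels;
        levels->numFrequencyBands = numBands;
        GetMovieAudioFrequencyLevels(movie, kQTAudioMeteringMode_Monitoring, levels);
        free(levels);
    }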

You have to do a certain move or it doesn't work. Okay, an audio context. As you may have heard in the previous session about visual context, we have an audio context as well. An audio context is an abstraction for a connection to the output device that's playing your audio.

It is not an abstraction for the device itself. It's an abstraction for a connection to the device. So, you create an audio context for a core audio device. You create one for each movie. Remember, this is a connection to the device, so each movie has its own connection to the device.

You can set the movie audio context. If you were in the last session, they talked about NewMovieFromProperties, where you could actually instantiate a movie already playing to a visual context. You can do that with an audio context as well. In that same API, you can say use this visual context, use this audio context. You use this instead of the old MediaSetSoundOutputComponent call, because we're not playing to the Sound Manager anymore. And it's at the movie level, not at the media level, which is kind of a weird place to do that.
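
A minimal sketch of that pairing, creating a connection to a device and pointing a movie's audio at it. Passing NULL for the device UID targets the default output device, as I recall; verify against the headers:

    #include <QuickTime/QuickTime.h>

    static OSStatus RouteMovieAudio(Movie movie)
    {
        QTAudioContextRef audioContext = NULL;
        OSStatus err = QTAudioContextCreateForAudioDevice(kCFAllocatorDefault,
                           NULL /* device UID: NULL = default output */,
                           NULL /* options */,
                           &audioContext);
        if (err == noErr) {
            /* Each movie gets its own connection to the device. */
            err = SetMovieAudioContext(movie, audioContext);
            QTAudioContextRelease(audioContext); /* the movie keeps its own reference */
        }
        return err;
    }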

Okay, here's a good one. Movie audio extraction. Here's a great way to get uncompressed audio out of a movie. You tell me what PCM format you want and what the channel layout is. It's kind of like you're pretending to be a device, right? You're a device, you've got a particular set of channels, I've got left, right, and center, or whatever you've got, and I want 24-bit integer, or I want 32-bit float. And we will mix all the soundtracks to that format and hand it to you.

It's a bridge between QuickTime and Core Audio, so that if you've got Core Audio processing you want to do in your app (you're a Core Audio guy, you've got lots of this Core Audio stuff you know how to do with audio units and all that cool stuff), you can get audio out of QuickTime to feed into your audio unit chain or whatever it is you're doing. You use an AudioStreamBasicDescription, which is a Core Audio type, to specify the format, and it produces an AudioBufferList, which is a Core Audio structure for passing audio around. So it just bridges that gap.

Okay, we've got a begin and an end call to begin the extraction. We've got the standard QuickTime set property, get property, get property info trio. Here's the properties you can set and get. The audio stream basic description. It has to be PCM. We're not going to compress audio for you in this API. This is about getting playable audio out.

You can specify an audio channel layout. And you can specify a movie start time. Maybe you want to extract just a little bit from the middle of the movie. You can start here and just pull until you've got what you need and stop. And then you call MovieAudioExtractionFillBuffer. And that feels very much like, if you're used to audio converters, AudioConverterFillComplexBuffer. It's the same kind of call. It gives you an AudioBufferList and you're set.
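
Putting the begin/set-property/fill-buffer/end sequence together, here's a sketch that pulls mixed-down stereo Float32 PCM out of a movie, 4096 frames at a time:

    #include <QuickTime/QuickTime.h>

    static OSStatus ExtractStereoFloat(Movie movie)
    {
        MovieAudioExtractionRef session = NULL;
        OSStatus err = MovieAudioExtractionBegin(movie, 0, &session);
        if (err != noErr) return err;

        /* Ask for non-interleaved 32-bit float stereo at 44.1 kHz. */
        AudioStreamBasicDescription asbd = { 0 };
        asbd.mSampleRate       = 44100.0;
        asbd.mFormatID         = kAudioFormatLinearPCM;
        asbd.mFormatFlags      = kAudioFormatFlagsNativeFloatPacked |
                                 kAudioFormatFlagIsNonInterleaved;
        asbd.mChannelsPerFrame = 2;
        asbd.mBitsPerChannel   = 32;
        asbd.mBytesPerFrame    = 4;  /* per channel when non-interleaved */
        asbd.mBytesPerPacket   = 4;
        asbd.mFramesPerPacket  = 1;
        err = MovieAudioExtractionSetProperty(session,
                  kQTPropertyClass_MovieAudioExtraction_Audio,
                  kQTMovieAudioExtractionAudioPropertyID_AudioStreamBasicDescription,
                  sizeof(asbd), &asbd);

        UInt32 flags = 0;
        while (err == noErr && !(flags & kQTMovieAudioExtractionComplete)) {
            UInt32 numFrames = 4096;
            float  left[4096], right[4096];

            /* AudioBufferList is variable-length; make room for two buffers. */
            struct { UInt32 count; AudioBuffer bufs[2]; } storage;
            AudioBufferList *abl = (AudioBufferList *)&storage;
            abl->mNumberBuffers = 2;
            abl->mBuffers[0].mNumberChannels = 1;
            abl->mBuffers[0].mDataByteSize   = sizeof(left);
            abl->mBuffers[0].mData           = left;
            abl->mBuffers[1].mNumberChannels = 1;
            abl->mBuffers[1].mDataByteSize   = sizeof(right);
            abl->mBuffers[1].mData           = right;

            err = MovieAudioExtractionFillBuffer(session, &numFrames, abl, &flags);
            /* ...feed numFrames of samples into an AudioUnit chain here... */
        }

        MovieAudioExtractionEnd(session);
        return err;
    }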

Okay, I talked earlier about how we were setting the channel labels in that demo in Player. They had that panel they brought up. If you want to do that or if you just want to look at a sound description and see what's going on, we've introduced now a new kind of sound description. This is the third version of sound description that you might see in a movie. It's getting a little bit out of hand.

You don't want to know where all those fields are in all those versions of sound descriptions. So, we've got some utility routines for you. Easier access. Future-proofs your application because you can continue to make these calls with version 7 sound descriptions. It doesn't matter. We're still going to give you the same information. It's great.

You can create sound descriptions. If you hand us an AudioStreamBasicDescription, an audio channel layout if you've got one, and the codec's magic cookie if you need one, we'll make a sound description, and you can do whatever you need to do with that sound description. You can specify what version of sound description you want. You want a version 1? You want a version 2? Maybe you want the lowest possible version given the format you gave me.

So we look at it and we go, well, you know, the sample rate is less than 64K and there's only two channels. I could put that in a V1. And then you might be creating a movie that is backward compatible to older versions of QuickTime, which may be something you want to do.

You can convert from one version of a sound description to another. You don't need to know the details. You can convert to the lowest possible. You may get a V2 sound description that's got, you know, 192 kilohertz audio, or you don't know what the sample rate is, you don't care, but you want to make the lowest possible one, you can do that conversion.
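
A sketch of the create and convert utilities he's describing. Here `asbd` is an AudioStreamBasicDescription you've filled in and `existingDesc` is a hypothetical SoundDescriptionHandle you already have; the kind constants are the ones I recall from the headers:

    /* Create: hand QuickTime an ASBD (plus optional layout and magic
       cookie) and ask for the lowest version that can hold it. */
    SoundDescriptionHandle desc = NULL;
    OSStatus err = QTSoundDescriptionCreate(&asbd,
                       NULL, 0,    /* no channel layout */
                       NULL, 0,    /* no magic cookie   */
                       kQTSoundDescriptionKind_Movie_LowestPossibleVersion,
                       &desc);

    /* Convert: take an existing description down to the lowest version
       that still represents its format. */
    SoundDescriptionHandle lowest = NULL;
    err = QTSoundDescriptionConvert(kQTSoundDescriptionKind_Movie_AnyVersion,
                                    existingDesc,
                                    kQTSoundDescriptionKind_Movie_LowestPossibleVersion,
                                    &lowest);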

And we've got properties on here. And this is where, if you just want to look at a sound description and see what's in it, here you go. We'll give you an AudioStreamBasicDescription, a Core Audio-style description of the audio that the sound description describes. And that's future-proof. I mean, we will always be able to give you an ASBD. Life is good.

You can get or set the audio channel layout. And that's what QuickTime Player was doing when we brought up the panel. It got the audio channel layout and displayed it in a series of pop-ups. I changed them, set it, and it set the audio channel layout back into the sound description and saved it in the movie. And you can get and set the magic cookie. And if you don't know what a magic cookie is, you don't need to know. It's okay. But if you do, you really care.
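
That panel's read-edit-write cycle looks roughly like this; a sketch using the property trio, with `desc` a SoundDescriptionHandle and the property names as I recall them (verify in the headers). The layout is variable-length, so query the size first:

    #include <stdlib.h>

    ByteCount layoutSize = 0;
    OSStatus err = QTSoundDescriptionGetPropertyInfo(desc,
                       kQTPropertyClass_SoundDescription,
                       kQTSoundDescriptionPropertyID_AudioChannelLayout,
                       NULL, &layoutSize, NULL);
    if (err == noErr && layoutSize > 0) {
        AudioChannelLayout *layout = calloc(1, layoutSize);
        err = QTSoundDescriptionGetProperty(desc,
                  kQTPropertyClass_SoundDescription,
                  kQTSoundDescriptionPropertyID_AudioChannelLayout,
                  layoutSize, layout, NULL);

        /* ...display or edit the layout here, then write it back: */
        err = QTSoundDescriptionSetProperty(desc,
                  kQTPropertyClass_SoundDescription,
                  kQTSoundDescriptionPropertyID_AudioChannelLayout,
                  layoutSize, layout);
        free(layout);
    }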

Okay, before I bring Brad up, I just want to go over the main points. You don't have to change your app to play this stuff. But if you want to really enrich your app and get into the details, we've got good APIs that give you movie-level access and control of the audio. And here's Brad, and he's going to come up and talk to you about audio capture and audio export.

Hi. So we've been having a lot of fun and occasionally sleeping. And let's start out by doing a little audience participation. So I'd like you to raise your hand if you consider yourself an audio person. Okay. Raise your hand if you consider yourself a QuickTime person. Okay. Raise your hand if you know what the sequence grabber is.

Wow, I'm surprised. OK, raise your hand if you have ever programmed using the Sequence Grabber. Okay, pretty good. All right, so the sequence grabber, in case you don't know, is what QuickTime uses to capture audio and video. The idea is you create a sequence grabber, you instantiate a sequence grabber component, and that controls the grab. Get it? You're grabbing data, and it's a sequence of data, be it video frames or a sequence of samples of audio. So you create one of these to control it, and then you instantiate sequence grabber channel components.

A video one or a sound one. The video one captures from a video capture device like a DV camera, and the sound one captures through the Sound Manager from an audio device. Well, if you have programmed audio using the sequence grabber, you may have met with some frustration, and I feel for you, and I've heard your cries and pleas on the QuickTime API list. So here's the state of the world now, pre-Tiger sequence grabber. If you're capturing from more than one device, then who knows what you'll get.

Capture limited to mono or stereo. CBR only. Now, we've supported VBR export and playback for quite a while through the sound manager, but we never did the work in the sequence grabber. So AAC was out, and some of the other more interesting audio codecs were out for grabbing. No device sharing among sequence grabber channels.

This is a highly requested feature. You know, if you have a MOTU, and you have a Metric Halo, and you have a DV camera, and you want to capture from all of them at the same time into one QuickTime movie, you currently can't do it, at least for the audio side. No simultaneous capture from multiple devices.

[Transcript missing]

device sharing among SG audio channels, channel mapping, simultaneous record from multiple devices, timestamps, timestamps everywhere. Device Notifications, Capture to VBR, SG Audio Callbacks, Improved Threading Model. So let's go into each one of those in depth in case you were writing really fast and I clicked too fast.

High resolution audio capture for me means up to 192 kilohertz. Now we're not saying that we can't go any higher than 192 kilohertz. I'm just saying that I haven't tested anything higher than 192 kilohertz and I know that works. So that's what I can put on the slide.

Sample rate conversion. If you have a device that delivers, say, 96 kilohertz, but ultimately what you need in your movie is 48 kilohertz, we can sample rate convert on the fly, including up-converting. So it doesn't matter what the source sample rate is. We'll let you capture to any arbitrary sample rate you want.

[Transcript missing]

So it will be sharing this device only using a single HAL I/O proc and delivering them on one I/O cycle to both SG audio channels. And this is kind of interesting. We're letting you take a sneak peek inside the SG audio channel to see what we're leveraging. Again, we've told you that we're trying to leverage Core Audio's building blocks as well as we can.

So you can see in here we're using a Core Audio matrix mixer to do mix down and level metering. We're using an audio converter to do format conversion, compression, interleaving, and queuing it up. And I'll talk more to this real-time high priority thread and main thread business. But the main takeaway message here is that each SG channel equals one track in a QuickTime movie. And now you can share a device amongst multiple tracks.

Channel mapping is a very powerful construct. I'll tell you why. It's a virtual patch bay trademark. I came up with that one. It enables reordering. What do I mean by that? So it wouldn't be very interesting to capture from one of these multi-channel devices if all you could do was capture every single channel, because sometimes you don't have enough plugs to put in all of those inputs. So it's really nice to be able to disable the ones you don't want and just capture from the ones you do want. Well, we take that a step further. If for some reason you need to reorder-- oops.

It's still a virtual patch bay. So let's say it has four-- this happens to be a four-channel record device. This is an M-Audio device that has four inputs. Let's say you want it to appear in your QuickTime movie track. You want all the channels, but you want them in reverse order. So you can specify, I want one to go to four, two to go to three, three to go to two, four to go to one. And what you end up with in the QuickTime movie is exactly what you specified.
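
A sketch of that reversal, assuming the channel-map property described inline in QuickTimeComponents.h: an array of SInt32s, one per output channel, naming the zero-based device channel that feeds it, with -1 meaning silence. Treat the property class and ID here as names to verify against the header:

    /* Reverse a four-channel device into the track:
       1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1 (zero-based below). */
    SInt32 reversedMap[4] = { 3, 2, 1, 0 };
    OSStatus err = QTSetComponentProperty(audioChannel,
                       kQTPropertyClass_SGAudioRecordDevice,
                       kQTSGAudioPropertyID_ChannelMap,
                       sizeof(reversedMap), reversedMap);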

Another powerful thing about channel mapping is that you get to specify your desired channel valence. This allows splitting the device channels across multiple SG audio channels, as I've already talked about, and thus split a device's channels across QuickTime movie tracks. So you could take a six-channel device and put them all in one track, or you could split it across six tracks and have mono channels in each track. This also enables per-channel disabling, as shown here.

Same four-channel record device. The first SG audio channel connected to it just wants the first two, and it gets those in the order it asked for them. And then the second SG audio channel happens to just want the fourth. The third gets routed nowhere. In fact, we do this as an optimization: we have a little layer in between Core Audio and the SG audio channel that does some intelligent enabling and disabling on the device side. So if this device happens to present its channels as separate streams, then it will turn that third channel off completely, and you won't waste the bandwidth capturing from it. It's really nice.

It also enables multing. Multing, if you haven't worked in a recording studio, is just a fancy term for a Y-cord. So it's like taking a virtual Y-cord inside of your computer and taking, let's say, that same four-channel record device and taking its four channels and saying, in the first one, I just want the first stereo pair, but I want them twice. So you could go one, two, one, two. And in the second, you're asking for the third and fourth channels twice. You could get three, four, three, four. Each one will have a four-channel movie, but with the appropriate channels.

Okay, it also allows multiple simultaneous mixes. Okay, here's where it gets a little bit interesting. I hope I didn't bore you with all of that. I needed to lay the foundation for some of the cool things that you might be able to do with this. Multiple simultaneous mixes.

Okay, let's say you have a six-channel record device. You can do crazy things with this now. Let's say you wanted to do a raw, all six channels in the first SG audio channel. So you're just going to get one through six or label them as discrete zero through five.

What labeling them as discrete means is that when you go to play that track back, the first channel in the movie in that track will play out to the first speaker that you have. The second will play out the second and so on until you run out of either speakers or channels in the movie, whichever happens first.

And in the second SG audio channel, you're doing a 5.1 mix. So you're still not doing any mix down, but you're taking those same six channels and you're applying spatial order to them. So when you play it back, they'll play out the appropriate speakers. And in the third SG audio channel, you're taking those six channels that came from the record device and you are mixing them down to stereo using that little matrix mixer right there.

And in the last one, you're doing a mono mix. So all of a sudden, you've got a four-track QuickTime movie. Each track has an independent mix. And we already have really nice APIs to let you enable and disable tracks. So you could write a very simple application that could capture very complex movies and then just enable the right tracks based on whether someone pays you more money. I don't know.
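
The enable/disable part is longstanding Movie Toolbox; a minimal sketch, where the choice of which track to enable is just for illustration:

    /* Keep exactly one mix audible: enable one track, disable the rest. */
    long trackCount = GetMovieTrackCount(movie);
    for (long i = 1; i <= trackCount; i++) {
        Track track = GetMovieIndTrack(movie, i);
        SetTrackEnabled(track, i == 2);  /* e.g. only the 5.1 mix */
    }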

Okay. Also, it enables multi-data rate movie capture. I personally think there's a killer app potential here. So we know that we can do separate mixes. Now let's take it a step further. What if you were going to do a separate mix and you were going to apply a different compression or different compression ratio to it? So same six-channel device. Now we're going to do a 5.1 mix in the first SG Audio channel.

So we'll think of this as our high-res copy. It's coming in. It's going to be pristine. The second SG Audio channel is going to do a compressed version of it, a 5.1 AAC surround sound rendition of it at 160 kilobits per second per channel, which is a pretty high data rate, but it sounds great.

And then let's say in SG audio channel number three, you're going to get another 5.1 AAC rendition of it, but at a much lower data rate. And let's say in the fourth one, you're going to do a mixdown to stereo and apply QDesign to it so that you can broadcast that one on the web. Well, maybe you could write a streaming application that could do some interesting mixes and compression ratios for different broadcasting needs. I don't know. I think there is a potential there. Okay.

So, simultaneous record from multiple devices. I personally have recorded from four devices at once, and that's just because I ran out of slots to plug the devices into. But if you had lots of cards, I think you could probably capture from a lot more. The one caveat is you can only capture from one device per SG audio channel at a time. Okay. So if you have two devices that you want to record from, they're going to go into two tracks minimum.

There is no architectural limit to the number of SG audio channels in a single record operation. We'll just leave it at that. However much horsepower you have, however many inputs you have, however much FireWire bandwidth you have, that's how much you'll be able to capture. Capture from Core Audio devices or Sound Manager devices.

This is interesting. We asked people a long time ago to stop writing sound input components, sound input drivers, but there are still a few hangers-on who don't have Core Audio implementations, and we support those as well in this new SG audio channel. So what this should say to you is: going forward, we're not officially deprecating the SG sound channel, but we are highly, highly recommending that you use the new one.

Because the new SG audio channel can do everything that the old one can do, plus a lot more. Timestamps, timestamps everywhere. Core Audio timestamps flow through the signal chain. This is great because it means accurate timing info is available to you at various tap points along the way, in these things called SG audio callbacks, which I'll describe in a minute.

This enables us to get perfect sync among device channels which are split across QuickTime movie tracks. And in the future, we'll be able to take those timestamps, the host timestamps that we get in the audio stream and synchronize it with what we get in the video. Device notifications. So Sound Manager didn't give these either, so it was kind of rude. If you were capturing audio from a device and somebody pulled the plug on it, bad things would happen, sort of like a horrible Monty Python-esque death.

Client may register to be notified when devices come and go. So if you happen to be capturing and someone pulls the plug, you'll get a notification and it will stop and not crash. We also send notifications when a device is started or stopped by anyone, when a device's physical format changes, which probably will lead to you stopping so that you can redo your buffers and such.

or when a device becomes hogged or unhogged by you or by anyone else. VBR capture. Okay, this has been a widely requested feature, and we finally delivered. Variable bitrate formats are first-class citizens in the SG audio channel. What does that mean for you? This includes MPEG-4 AAC, including multi-channel.

Apple Lossless. Again, I think this is a really killer codec for the capture arena. It's interesting on the iTunes Music Store; it's even more interesting here, in that most people that do captures probably don't want to compress down to something that sounds really crappy.

They would probably like to keep it at high res. Well, now you can get the best of both worlds. If you capture to Apple Lossless, you get to save half the space on your hard disk, and you're not losing any bits when you play it back. So that's very interesting. Also AMR narrowband, which is what's used in 3G.

SG audio callbacks. Okay. Many of you who've worked with a sequence grabber know that we have these things called video bottlenecks on the video side. And these are essentially tap points where you can tap into the stream and look at the video at various points. You know, when a frame arrives, when it's done being decompressed, when it's, you know, all these different things. And now we are providing the same thing on the audio side with these SG audio callbacks.

They allow you to look at, and/or modify (I'll say that quietly: okay, go ahead and modify it, but don't change the size of the buffer, please), the audio as it's flowing through the SG audio channel. Okay, here's that same diagram I showed you before. So, right before that matrix mixer.

That's tap point one. That's called the pre-mix callback. If you want to get the audio right from the device, the raw samples, you register for that callback and we'll let you see those samples right there. Second one is the post-mix callback. So if you're doing any mix down, you ask for this callback, you'll get it right after the mix happens. Third one is the pre-conversion callback.

So right before it's going to go to the audio converter, you can see those then. And the fourth one is the post-conversion callback. So a lot of you have posted to the QuickTime API list about using the SGDataProc. And a lot of you are probably abusing it, using it for things it wasn't intended for. Well, this is a much better solution than using the SGDataProc.

The SGDataProc only lets you see the audio right before it's going to be written to disk. If you use these, you get to see it right as it's ready. So you get much lower latency. You don't have to wait for the one-second or half-second disk writes.
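
A sketch of registering one of those taps, using the callback shape and property names as I recall them from QuickTimeComponents.h; treat every name here as something to verify against the header's inline docs:

    /* Called on QuickTime's capture thread with the raw device samples. */
    static ComponentResult MyPreMixTap(SGChannel c, void *refCon,
                                       SGAudioCallbackFlags *ioFlags,
                                       const AudioTimeStamp *inTimeStamp,
                                       const UInt32 *inNumberPackets,
                                       const AudioBufferList *inData,
                                       const AudioStreamPacketDescription *inPacketDescriptions)
    {
        /* Look at (or modify, without resizing) the buffers here. */
        return noErr;
    }

    SGAudioCallbackStruct tap = { MyPreMixTap, NULL /* refCon */ };
    OSStatus err = QTSetComponentProperty(audioChannel,
                       kQTPropertyClass_SGAudio,
                       kQTSGAudioPropertyID_PreMixCallback,
                       sizeof(tap), &tap);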

We've also given you a dramatically improved threading model, as you've seen on some of the previous slides. We're doing minimal work on the main thread. Audio mixing and format conversion take place on high-priority worker threads. What does that mean to you? Well, you have this thing called SGIdle that everyone complains about. You have to call it frequently. I think the documentation says as frequently as possible.

Well, we do work on the right threads in this new SG audio channel, such that the only reason we still need to do anything on the SGIdle call, which is on the main thread, is because that's where video does its work. So in order to get good interleaving within the file on disk, we'll still do the writing on the main thread.

Otherwise, there's no reason why we couldn't take this to a separate thread in the future. Better performance. This means we're not doing a lot of work on the main thread. When you are capturing audio, things are happening on threads where they're not going to interfere with you clicking on a pop-up button or cause a spinning beach ball.

More responsive apps. Okay, enough yapping. Okay, now we're going to see what I like to call a progressive rock demo. It's a progressive rock demo because it's a demo in a few steps, each of which progressively will rock more than the previous one. Okay, so I know, I know.

[Transcript missing]

But what we're seeing here is not what I want because it's recording 18 channels and all I have playing back are 1 through 6. So that's not very interesting. What I really want is to capture just the 6 that I want. So let's do that. This command line app, by the way, is available to you right now. After this session, you can go download it from the disk image for this session. It works just the way it's going to work on stage.

It does just about everything that the sequence grabber does, and it does it all from the command line. Some of your command line parameters will get really long.

...while we're recording. So this time we're just getting... could you turn up the audio on that stereo pair a little bit? So what we're seeing on the top one is the levels that are coming into the matrix mixer, and the bottom ones are the ones that are coming out of the matrix mixer.

I added a lot of key commands so you can do fun stuff like, for instance, take the master volume down or the per-channel volume down. Now you'll notice that what's coming into the matrix mixer is a lot louder than what's coming out of it, and I could do a nice little fade-out. So we're still hearing what's coming from the device on left and right, but now what's being recorded into the movie is nothing, because I turned the volume all the way down. So let's play a little bit of that back.

Okay, what have we got? We've got a movie that's got a single audio track with six channels, unmarked, because I didn't label them. 96 kilohertz, 24-bit. Let's play it back. Channel four, channel five, channel six. Oh, I've got to do it again. Channel three, channel four, channel five, channel six.

Okay, so that's interesting. What it's doing is it's playing back onto our output device like you would expect unmarked audio to do. It's playing back to...

[Transcript missing]

Okay, so for the next demo, so that was neat. You just saw a 24-bit 96 kilohertz capture using Sequence Grabber. I don't think anyone's seen that in public before. Thank you.

And now for our third demo, which will progressively rock more than the last one, we are going to do all six channels in reverse order. Remember I told you that we could reorder them. So now I'm going to take the same six channels and I'm going to capture them in reverse order. So instead of 3, 4, 5, 6, 7, 8, I'm going to get 8, 7, 6, 5, 4, 3. And I'm also going to show off a little bit and do a sample rate conversion at the same time.

So even though it's coming in at 96 kilohertz, I'm going to do a little bit more than that. Instead of 96 kilohertz, 24-bit, we're going to do an on-the-fly sample rate conversion to 44.1, 16-bit. And if you want to hear it, in case you've forgotten what my voice sounds like: Channel six, channel one, channel two, channel three, channel four, channel five.

[Transcript missing]

We're going to do channel multing. Okay, so I told you that we could do this virtual Y-cord stuff. Well, now let's make a gratuitous 16-channel movie with silence in the first four channels, followed by doubled stereo pairs. So what we're going to see here is 0, 0, 0, 0, then 3, 4, 3, 4, 5, 6, 5, 6, 7, 8, 7, 8, because I can.

And now you'll notice that you'll see channel one, channel two, channel three, channel four, channel five, channel six, but nothing ever in the first four channels. This actually has some use. You know, a lot of post houses need your channels on just certain tracks on the tape. You know, so if you're going out to SDI on an eight-track tape, but you only want to put your audio on the last four channels, you could provide silence.

You could just record silence in the first four channels in the movie and then put the real audio on the last four. Okay, and then for our fifth demo, which I like to affectionately call the didgeridoo demo, I'd like to bring up Jeff Brown to help me out. We're going to have a little bit more fun and do some live recording.

Okay, so Jeff is an engineer on the QuickTime team, and he also likes to play Australian instruments. Fun stuff, yeah. So I've actually invited a didgeridoo player to follow me around stage and blow his instrument behind my back. This is akin to inviting a vampire into your house. Thank you. So what we're going to do with this demo is we're actually going to label the audio this time. See what I've highlighted here. I'm going to do two tracks in the QuickTime movie.

First, I'm going to tell the recording device that it's coming in as left, right, center, LFE, left surround, right surround, and I'm going to map it out to the same places, so we're going to have a 5.1 surround sound recording. In the first track, we're going to output a stereo mix. So we've got a 5.1 coming in, and in that track we'll have a stereo mixdown of what we came in with. And then I need to set my sample rate back to 48.

In the second track, we'll actually have the 5.1. But notice that I've done it in a really weird, bad order. LFE, RS, LS, right, center, left. But it shouldn't matter because I've labeled them. They'll still come out the right speakers when we play it back. Okay, so let's go ahead and do some of this. Okay, so I think we have these microphones turned up. Go ahead and start giving me a drone. So this is right surround, right surround for all you listening at home. This is the right channel. The right channel. This is center.

Oh, we're clipping like mad. Back off a little bit. And then this one is the left, and this one is left surround. So left surround, excuse me, left, LFE, center, right, right surround. Let's give them a hand. Now, in our resulting movie, what do we got? As expected, in the first track we've got a stereo mixdown at 48 kHz, and in the second track we've got the 5.1 mix in the order that we specified.

LFE, RS, LS, R, C, L, which is just whacked, but hey, it'll play back in the right places. Oh, excuse me, I need to open this up in Hipper. Um, or I mean QuickTime Player. That's what I meant to say. Um, and... Now let's go in and enable tracks. So I'll turn off the second soundtrack. So we'll just hear the stereo mix.

Do we have the 5.1 turned up? So we should just be hearing it out left and right. And as expected, we're only hearing it out the right speaker. This is the right channel. The right channel. Okay, I'll move forward a little bit. So next time we'll get the levels right. And let's instead turn soundtrack 1 off and turn soundtrack 2 on. So now we'll just get the 5.1.

This is right surround, right surround for all you listening at home. This is the right channel. The right channel. This is center. So, pretty cool. It doesn't matter that I had them listed in a wacky order in the track. They still play to the right place because I had them labeled right.

Thus concludes the progressive rock demos. Can we go back to slides, please? The rules for audio capture. All of this does come with somewhat of a price. I said it's all new, so it really doesn't cost you anything, except that you have to write all of your code over again. Because it's a new SG audio channel, you have to opt in for it by creating this new channel type. And it has new APIs. It uses the component property APIs, QTGetComponentPropertyInfo, QTGetComponentProperty, and QTSetComponentProperty, exclusively.
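
A sketch of that opt-in; SGAudioMediaType is the new channel type, and the device-selection property names are the ones I recall from the header, so verify them there. The device UID string is hypothetical:

    /* Create a grabber and the new-style audio channel. */
    SeqGrabComponent grabber = OpenDefaultComponent(SeqGrabComponentType, 0);
    OSStatus err = SGInitialize(grabber);

    SGChannel audioChannel = NULL;
    err = SGNewChannel(grabber, SGAudioMediaType, &audioChannel);

    /* All configuration goes through the property calls, e.g. picking
       a record device by its Core Audio UID (hypothetical UID here). */
    CFStringRef deviceUID = CFSTR("AppleUSBAudioEngine:...");
    err = QTSetComponentProperty(audioChannel,
              kQTPropertyClass_SGAudioRecordDevice,
              kQTSGAudioPropertyID_DeviceUID,
              sizeof(deviceUID), &deviceUID);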

If you look in QuickTimeComponents.h on your seed and search for SG Audio, you'll see a lot of really good inline documentation. So I encourage you to go there and look, because this stuff is very functional in your seed. And you'll see there are about 40 properties that we've defined that you can use to get and set various things on this SG Audio channel. Other rules.

A single SG Audio channel writes to a single QuickTime movie track, right? So it doesn't record from two devices simultaneously. You have to make two of them to do that. It reads legacy SG Sound Channel settings stored in Atom containers. So if you have an existing app that has used Sequence Grabber and you have old Sound Channel settings lying around, you can feed these to the new SG Audio channel, and it will do its best to make them into something that's compatible with what you had.

But when you ask for new settings from it, it will never give you backward-compatible SG sound channel settings. So be aware that when you get settings back from it as an atom container, you'll get new settings that are not backward compatible. Okay, moving right along. That was the audio capture portion. Now, exporting audio.

Pre-Tiger sound compression had a few problems. Exports were limited to Sound Manager-supported formats; just like playback and capture, we could only do as much as the Sound Manager could do. We also had problems with the standard sound compression, or StdSound, dialog component. This is what you see when you export from QuickTime Player to sound.

You see that little cute dialog. It requires displaying QuickTime's own dialog. It's really hard to make your own custom UI, because we didn't do a very good job of keeping your state up to date with that old component. Also, the dialog offers strange choices for some compression formats. Here's what I mean.

This is one of my favorites. Movie settings. My compressed format is 32-bit integer, yet my size is 16-bit. Hmm, okay. That never really made sense to me. Now we have made it sane by giving you a new standard audio compression component. What's good about the new export chain? Well, we use the playback chain.

So if it sounds right on playback, it's going to sound right in your export. We've leveraged all the work that we did there in the export chain as well. The standard audio compression component configures export to and from all our new playback formats. So again, that's the new component that you want to work with. And you'll find him in QuickTimeComponents.h if you search for SCAudio. And you'll find lots of properties for him too.

StdAudio works well with or without using our dialog. So this is really good if you like to make your own app that doesn't use System 7 dialogs. Custom UI is much easier to develop, and new property APIs make this possible. Our dialog, so when you see our new audio export dialog come up, it itself is actually a client of that state machine. So it's acting the same way that your UI would act.
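
A sketch of the no-dialog path, assuming the SCAudio property names; the dialog path is the classic SCRequestImageSettings call on the same instance, which is what I remember sample code doing. Verify both in QuickTimeComponents.h:

    /* Open the standard audio compression component. */
    ComponentInstance stdAudio = NULL;
    OSStatus err = OpenADefaultComponent(StandardCompressionType,
                                         StandardCompressionSubTypeAudio,
                                         &stdAudio);

    /* Drive it without UI: hand it the output format you want,
       then pull its settings for your exporter. */
    AudioStreamBasicDescription outFormat = { 0 };
    outFormat.mFormatID         = kAudioFormatMPEG4AAC;
    outFormat.mSampleRate       = 48000.0;
    outFormat.mChannelsPerFrame = 6;
    err = QTSetComponentProperty(stdAudio,
              kQTPropertyClass_SCAudio,
              kQTSCAudioPropertyID_BasicDescription,
              sizeof(outFormat), &outFormat);

    /* Or show the new dialog instead: */
    err = SCRequestImageSettings(stdAudio);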

Here's what it looks like. On the left, you can see we're going to linear PCM, a 5.1 at 96 kilohertz, so it can do all that our playback architecture can. We also get some interesting format-specific settings in this box here, and you can see that for AAC, too. We're doing a 5.1 AAC. Notice that the settings here have changed, and they're specific for what AAC can do in that given configuration.

Many of you may have gone to the first Core Audio session on Tuesday morning, Core Audio in Depth, in which James McCartney talked to you about the audio converter settings property, which is a CFArray of CFDictionaries that contain, basically, properties that you can parse and make into UI. That's what this guy is doing.

We are the first client of that machinery. So with that, it's time for the big demo. Let's call up all the QuickTime audio people and the audio team, and let's do a killer demo. We have to turn off the mics. Paul Robins: Say it again. Say it again, Paul. We're going to give you a mic.

Hello. My name is Paul Robins. I manage the audio team. And we're going to bring together everything you've seen through the session, all the technologies, for an end-to-end demo. And what that means is you're going to watch us record live to Apple Lossless, export it to MPEG-4 AAC 5.1 surround, and then play it back for you.

All without you even having to leave your seats. And we're going to do this with the help of the engineers you've met: Jeff; this is Siley, who you've seen before; and Daniel and Greg and Brad. And what we're going to do, we're making a feature film. And we're going to act out an old story. You may know it, those of you in the music world. And if you've heard this one before, don't stop. Don't stop us. Let us go ahead.

Can I use this one? I'll use this one. Okay, so what happened? Well, this is really interesting. We just made a movie out of a codec that can currently only do mono or stereo, namely Apple Lossless, but we really wanted to do a 5.1 recording, so we did it anyway.

We recorded to six different tracks in a QuickTime movie, each one of them being mono, and we labeled them as left, right, center, LFE, left surround, right surround, and they're all in compressed Apple Lossless. We did this in real time. We also, for kicks, recorded video. So let's play that back and let you hear it.

Back in the days before digital recording, an adventurer came to Africa. He hired a native guide and set off into the bush. On the first day out, the drums started. The sound of those drums! He's not our usual drummer. The drums played day and night. The explorer was getting very nervous. And he asked, "Don't the drums ever stop?" The guide answered, "Drums never stop." The explorer couldn't eat or sleep. They've got to stop sometime.

After a couple of days of this, the explorer was going crazy. When are they going to stop? But wait, there's more. I told you this was going to be an end-to-end demo. Well, so far we showed you capture. Let's see if we can export this. So we've got Apple Lossless. Why don't we export it? We don't care about video. Video's not that important, so, you know, any old codec will do.

And why don't we go 48? Let's do a 5.1 AAC. That's impressive. We'll go to a high bitrate.

[Transcript missing]

Export. And we can watch it-- is this going to work in the QuickTime Player Carbon? Okay, maybe it will. In the meantime, let's go to slides because watching an export is about as interesting as watching paint dry. Let's go back to slides. And I'll tell you the rules for export.

The rules for export are: exporting via ConvertMovieToFile or ConvertMovieToDataRef opts in automatically, unless you open a movie exporter and pass it in. So if you use one of those APIs and you pass in NULL for the last parameter, you're fine. You don't have to make any changes. But if you happen to open up your movie exporter yourself, you're going to have to do a little bit more.

If you do that, you're going to have to make this call: QTSetComponentProperty, passing your exporter instance, and you're going to have to set the movie exporter property EnableHighResolutionAudio to true. That tells QuickTime that you understand high-resolution audio and you're ready to talk new APIs.
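
That opt-in looks roughly like this; a sketch, assuming `exporter` is a movie export component instance you opened yourself, with property names as documented for QuickTime 7 (verify against your headers):

    Boolean useHighResAudio = true;
    OSStatus err = QTSetComponentProperty(exporter,
                       kQTPropertyClass_MovieExporter,
                       kQTMovieExporterPropertyID_EnableHighResolutionAudio,
                       sizeof(useHighResAudio), &useHighResAudio);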

If you have opted in, you only need to make further changes if you call MovieExportGetSettingsAsAtomContainer and happen to parse the settings yourself, which you should never do. We tell you this over and over. Don't do this yourself, because we might change things underneath you, like we just did.

You open StdAudio and configure the settings. Well, if you happen to open up the StdSound dialog, you're now going to have to open up the StdAudio dialog to configure settings, because you need to open the new component and because it has new APIs. Oh, let's go back one.

So... Are we done? Just about. Can we go back to demo?

[Transcript missing]

Now notice, AAC only goes to one flavor of 5.1, and it's different than what we recorded to. We recorded left, right, center, LFE, left surround, right surround. It needs center, left, right, left surround, right surround, LFE. But the channels all still come out in the right place, because we have them labeled properly and because our playback engine rocks.

So go back to slides and let's wrap up. We'd like to give a big thank you to--

[Transcript missing]

and MOTU as well. I don't know if any MOTU people are here, but you make great stuff too. So do not miss the labs, the content creation and QuickTime development labs, and, very important, the lab hours.

We, the audio team, will be there in the lab tomorrow morning, specifically talking about audio and the sequence grabber changes. So do make use of that time. Contact Guillermo Ortiz if you have any questions. And for more information, I told you we have that sample code. It's posted now. You can go find it at connect.apple.com; look for our session 213.