
WWDC05 • Session 201

Harnessing the Audio Capabilities of QuickTime 7

Graphics and Media • 1:09:42

QuickTime 7 adds powerful audio enhancements through the adoption of Core Audio, providing access to higher sample rates and resolutions, multiple channel layouts, and sample accurate synchronization. Learn how you can leverage QuickTime 7's new audio features and capabilities in your own applications. If you are into high-performance multimedia and sound, you don't want to miss this session.

Speakers: Daniel Steinberg, Brad Ford

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.

I'm Daniel Steinberg, and welcome to the session on talking about audio capabilities of QuickTime 7. I should mention that we'll be playing some surround sound content in these speakers: center, left, right, left surround, right surround. So if you're sitting near those speakers, it won't sound very balanced. You might want to try to move into the center so you can hear things better.

We'll be talking today about some of the things that you can do to use the new features of QuickTime Audio most effectively in your applications. And we'll show you some of the things you can do to author multi-channel media in QuickTime Player and QuickTime Player Pro. We'll have several sample applications for you to look at, and we'll show you some code.

These samples are not on the CD, I'm afraid, and they're not on the website right now. They're going to be posted to developer services on the web next week. But if you come to the Media Lab this afternoon or during the week, we'll be able to give you a copy onto your machine from a CD master that we've got.

So you can get the sample code if you're hot for it now, or just wait until next week and it'll be up on the web. We won't be covering all features in depth, we just don't have time, but we'll try to give you the salient features and interesting things. We'll talk particularly about multi-channel audio.

Both playback and capture: how to create content, label content, and how to support channel layouts in your application. We'll also be giving a brief overview of some of the audio APIs that are new in QuickTime 7, and some of the property-based APIs. And we'll be talking about movie audio extraction, which is the new API for getting audio out of a movie, so that you don't have to go into the samples, decode them yourself, figure out how it's going to mix, and second-guess it; you can do it using this API. And we'll have significant code talking about audio capture with the new SG Audio Channel component.

So I'm going to very, very briefly talk about what's new in QuickTime because there's quite a bit, and we went over it last year. I hope many of you were here last year, and now it's out there. The fundamental change is that we are now playing soundtracks through Core Audio instead of Sound Manager. And this is through the facility that we're calling an audio context.

You can create an audio context for a Core Audio device and then assign it to your movie, and then all the soundtracks will play to that context. That context includes a mixer, so it's mixing all tracks together, and then you can do things like gain control on the mixed audio, which was not possible before. When you said set volume, it would go to individual tracks.

Now we actually have a physical...

[Transcript missing]

We have movie-level APIs for, as I mentioned, setting the playback device and also for certain controls. And one of the things we've gained by working with Core Audio is pitch-preserving vary-speed, via the AUTimePitch unit in Core Audio, which is an exciting feature for us.

It's enabled by default in QTKit and the QuickTime ActiveX control, and therefore QuickTime Player, which is QTKit-based, gets it by default. Other apps have to opt in, because the backward-compatibility mode is that you open a movie that had, for instance, scaled edits, and they change pitch. So if you opt in using NewMovieFromProperties, then you'll get this new property and preserve the pitch on the scaled edits.

We have high-resolution audio support we've been talking about for some time, media sample rates up to 192 kHz, sample depths to 24-bit, and 32-bit float. All the Core Audio internals are 32-bit float, so you can store a movie in 64-bit float, but it's all going to be converted to 32 when we're processing it.

We support multichannels, as I mentioned. You can have single tracks, which are multichannel tracks, or you can have individual tracks that have individual contributing channels, and it all gets summed into a multichannel layout. You can label channels with their spatial location, like the speaker name or speaker location, and it's all mixed and routed under the covers.

We support playback of up to 24 channels. That's really a soft limit; we're just talking about what we can guarantee and test right now. Playback and capture are supported on most of the Pro Audio interfaces. If your favorite interface isn't covered, please let us know and we'll make sure that we get it supported.

Audio extraction is now the one-stop shopping for getting audio out of your movie. Because it's using the playback engine as opposed to a separate mechanism, what you hear is what you get. Finally, what you hear when you play the movie is actually what is exported, which is a nice thing from our point of view, and it should be from yours as well. Export of audio uses that same audio extraction mechanism, so it's all integrated.

Audio extraction works on multiple threads, and the demo code that I'll be showing you actually does export and playback on a separate thread, so you can see how we can do that. There's a tech note on the developer website, 2125, that talks about multithreading in QuickTime, and it's a very useful tech note that you should look up if you want to do multithreaded applications.

The compression configuration component that was Standard Sound (StdSound) is now superseded by Standard Audio (StdAudio), and it provides access to all the different features, all the codecs, all the multi-channel capabilities. You can either use its UI, or you can configure it headless, as it were, put your own UI up, and use it to screen all your compression capabilities and whatnot.

As in much of QuickTime 7, we have a lot of properties now, and movie properties and track properties are kind of the way that we're moving in the future. So we support that as well with gain and mute properties, a balance property, channel layout properties, and several others. There's some convenience functions for the top-level movie and track control, like gain, mute, and balance, and also level and frequency metering, which is now on the audio context as opposed to individual tracks. And as I said, QTKit and QuickTime Player have been vetted and use properties extensively.

On the capture side, we have a lot of new things because we've got this new capture component, SG Audio Channel. It allows you to have taps in your audio chain as you're capturing. It's all integrated with Core Audio, so you can get data out in formats that you can feed into audio units. We support multi-device capture and preview. And there's quite a bit of sophisticated routing and mixing of tracks and channels in SG Audio Channel, and you can control that all.

We'll show you how to do that. Of course, it supports high-resolution audio. We can capture from most Pro decks. And on the fly, you can do format conversion if you're doing bulk transfers and you don't want to be storing unmixed 192 kHz data. You can downmix or compress on the fly.

Can we go to the demo machine, please? And I'm going to show you QuickTime Player. I'm going to take this movie that I created from six individual unlabeled tracks of audio. It's a Thelonious Monk song you might recognize. And bring it up, and I'll show you, we can first of all... See, Movie Info says, well, it's mono because all these tracks are unlabeled, so they appear to be mono.

But if we actually bring up the Movie Properties panel, then you can see there's a bunch of different tracks. So, for instance, when I start playing it, you'll hear this track out of the center speaker up there; it's a mono track, so it's going to the center speaker. But if I assign it to LFE, where it belongs, then it'll play in the LFE channel.

And I can turn on this percussion track and stick it over in the left channel. And add some drums and put it in the back right. Maybe some piano, put it in the front right. There it is. Finally, get some melody. And that'll be in center, because it's the melody. And finally, some brass stabs we'll put in the back corner there.

And now you can see that we've, the summary info shows all these labels, and now you can see what the movie really is. Okay, and I'll save that. And, oh, I can show you this while we're here. The AV Controls panel, which you get even if you don't have the Pro key, lets you play at other speeds without changing pitch. I'll just give you a quick example of that.

[Transcript missing]

So we're going to talk about some of the ways that you can do these in your applications. And we've taken the QTKit Player application that's in the SDK now and added a panel to it that's primarily for demonstration purposes and to help you understand how audio is routed. And we'll show you that. Here's what it looks like. You'll see that on the left, there's a display about the movie or about an individual track. And on the right, there's a display about the device it's playing to or the extraction that we're going to do.

On the right, I'll talk about the device first. We have a display of the speaker setup of your device, and we have a button that says Configure Speakers; when you press that button, it brings up Audio MIDI Setup, which is an application in /Applications/Utilities. You should all know about Audio MIDI Setup, don't you? If you plug in multi-channel audio devices, you want to go there to configure the speakers and the formats and whatnot. I found out that a lot of people didn't know it existed, so they were plugging in their devices and didn't really know how to set them up.

So I recommend that applications actually have a Configure Speaker button or menu item that's very useful. And there's a gain slider on the bottom that'll just track the movie gain, just so we can show how gain is set and controlled and property listeners work for it. Let me talk for a second about gain.

We've been moving from the notion of volume to the notion of gain. Volume was a number between 0 and 255 and was difficult to map to anything useful. Gain is a very explicit mathematical function. A gain of 1 means unity: the sound passes through unchanged. A gain of 2 multiplies the signal by 2, which effectively increases it by 6 dB, and a gain of 0.5 lowers it by 6 dB. So obviously a gain of 0 is silence.

The movie audio gain APIs and properties replace the movie audio volume APIs, which are now just implemented on top of them. And we have a mute API as well. Mute is independent from gain now. With the old volume setting, you had to set a negative volume to mute while keeping your notion of what the gain was. Now the gain stays constant, and you can mute and unmute independently.

And gain is not persistent. This is for playback. If you want persistent gain, there's a property, the movie preferred volume, and that will save with the movie. We hope to add more persistence features into the property APIs as well. And gain is listenable from tracks and movies, so you can have your UI track changes that are happening from other places.

Actually, let me go over to the demo machine. And I will show you: this movie that I just saved comes up in our demo application as a little player, and you can play it. This is the normal QTKit Player. And if I go to Window Audio Extraction, it will bring up that panel you saw.

And here's the device layout. It's showing you that we have a multi-channel layout. But in fact, after the 5.1 speakers here, we have unknowns because it's a bigger device that has many channels that are not labeled. So they just simply show up like that. If I press Configure Speakers, you'll see Audio MIDI Setup come up and show that we're playing to a mobile I.O.

And if I can find it, it's got 18 channels. And if I say configure speakers, then I can press multi-channel and see the speakers I've actually enabled. And there's several things I can select from. Once you set a multi-channel layout in AMS, that's the preferred thing. That's the thing that QuickTime will find.

So if you need to go back to stereo, you only have two speakers hooked up, you need to actually select it as a stereo device. Now this is new in Tiger, you may have a little difficulty with Panther. But I'm sure you're all running Tiger now, aren't you? And then there's the gain setting as well, and that will track... If I change volume here, you'll see that changes down and listens. Okay. On the left-hand side-- well, let me go back to the slides.

The device layout: if you plug in a multi-channel device and it's unconfigured, the default will be stereo. Now, I hope you're all running 7.0.1, the update that we just released, because there was a bug where, if you plugged in a device that was not known, it would assume it was multi-channel and you'd find your mono tracks going to the third channel, which is not what's supposed to happen. In 7.0.1 that's fixed. Please stay up to date with our updates. We do this for you.

Audio MIDI Setup is your friend, and as I mentioned, the channel layouts, once you set multi-channel layouts, that trumps the stereo pair. And remember, mono tracks route to the center speaker, so once you have a center speaker, you're routing mono to that, not to left and right anymore.

Okay, now I'll go on to the routing panel. We've got... The summary channel layout is a description of all the channels in the movie summed together. So if you've got multiple tracks, it sort of tells you what the summary is. If all the lefts would mix together, all the rights would mix together, and the summary tells you what the final mix is going to look like.

So this shows the summary channel layout. If you select in the top selector a specific track, then it'll show that track and all the channels in that track, and this will let you pop up and change the channels that are selected. And there's a gain slider for the track as well that stays in sync with a property listener.

I'll show you this and show you some of the code for it on the demo machine. So here you've got that layout, and we have selectors for all the tracks. And you can change the speakers right there if you'd like. And that will all take effect just as it did in player. Let me show you the code.

I actually didn't show you the code for the device stuff, so let me just show you that first. To get the gain for a movie, you can either call GetMovieAudioGain with a float argument, or get the movie property for gain. Either one works. If you want to be listening for changes in the device, you call QTAddMoviePropertyListener and specify the property you're interested in, and then the...

[Transcript missing]

It's such a big font, it's a little difficult to navigate. It's here.

So here's the gain change callback. It gets the property class and ID. We just check to see if it's the right movie. Otherwise, we only have one property we're listening for now. So we'll say: update what you're displaying for that property. For the device channel layout, once we have a movie open, we can call QTGetMoviePropertyInfo on that movie and say we're interested in the device channel layout.

Tell us what you've got, and it'll tell you the size of that layout. It's a variable-size structure, depending on how many channels are in the movie. So you always want to get the size first, then allocate something to hold it, and then you can call QTGetMovieProperty and actually load the channel layout into your application.
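The get-size / allocate / get pattern can be sketched as follows. This is a sketch against the QuickTime 7 C API on Mac OS X (names as in Movies.h); it only compiles with those headers, and error handling is trimmed for brevity:

```c
#include <QuickTime/QuickTime.h>
#include <stdlib.h>

/* Fetch the device channel layout for a movie. The layout is a
   variable-size structure, so ask for the size first, allocate,
   then load it. Caller frees the result. */
static AudioChannelLayout *CopyDeviceChannelLayout(Movie movie) {
    ByteCount size = 0;
    QTGetMoviePropertyInfo(movie, kQTPropertyClass_Audio,
                           kQTAudioPropertyID_DeviceChannelLayout,
                           NULL, &size, NULL);
    AudioChannelLayout *layout = (AudioChannelLayout *)calloc(1, size);
    QTGetMovieProperty(movie, kQTPropertyClass_Audio,
                       kQTAudioPropertyID_DeviceChannelLayout,
                       size, layout, NULL);
    return layout;
}
```

The summary channel layout (kQTAudioPropertyID_SummaryChannelLayout) and per-track layouts follow the same two-step pattern.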

It's very easy to launch Audio MIDI Setup from Cocoa, with a single launch-application call, so it's very easy to add to your applications. Now, on the left-hand side, we had the movie summary layout. Again, that's just a property, the summary channel layout. And it's the same exact code: you get the property info for the summary channel layout to get the size, allocate, and then get the property, and you've got it loaded in memory.

We actually have another facility in this application called Expand Channel Layout, which will make sure that instead of getting a tag that says, for instance, 5.1 or stereo, you get all the channel names expanded into your structure. And that's useful for presenting UI. You want to get the individual channel names and whatnot. You just need to expand it so you can get them one by one. You'll see all that in the sample code.

Once you select a track on that left-hand side, we do another get-property-info to get the channel layout of that track, and then we allocate and load that into memory. If you select a different channel layout, all you do is call QTSetTrackProperty with the channel layout that you have, and remember your size. And track audio gain is just the same as movie audio gain: QTSetTrackProperty, or use the SetTrackAudioGain API for your gain property. Now I'll go back to slides, please.

Let me talk a little bit about audio channel labels. Every channel is either unlabeled or it's labeled. And if it's labeled, then it can have a spatial or a non-spatial label. The spatial labels are defined in CoreAudioTypes.h and are things like left, right, center, left surround, right surround, whatnot. There's a big list of them.

Non-spatial labels are things like mono, which just says this is mono data, and will route to center speaker if there is one. Otherwise, it will route to left and right. And we have these labels called discrete, number discrete labels. So, for instance, if you have a mixer associated with your device instead of actual speakers, and you want to say, oh, I've got 16 mixer inputs, and I want to route this track to mixer input 5... Numbering from zero, you can, say, set this channel label to discrete 5, and then it'll play to the sixth output of the device, and then everything will route that way. So this bypasses the mixer capabilities, really.
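As a sketch of what a discrete label looks like in memory, here is one way to build such a layout with the Core Audio types from CoreAudioTypes.h (this is an illustrative sketch, not QuickTime sample code; the layout is heap-allocated because it is variable-size):

```c
#include <CoreAudio/CoreAudioTypes.h>
#include <stddef.h>
#include <stdlib.h>

/* Build a one-channel layout whose channel is labeled Discrete_5,
   i.e. "route me straight to device output 6 (numbering from zero),
   bypassing the mixer". Caller frees the result. */
static AudioChannelLayout *MakeDiscrete5Layout(void) {
    size_t size = offsetof(AudioChannelLayout, mChannelDescriptions)
                + sizeof(AudioChannelDescription);
    AudioChannelLayout *layout = calloc(1, size);
    layout->mChannelLayoutTag = kAudioChannelLayoutTag_UseChannelDescriptions;
    layout->mNumberChannelDescriptions = 1;
    layout->mChannelDescriptions[0].mChannelLabel = kAudioChannelLabel_Discrete_5;
    return layout;
}
```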

Unlabeled content is what we see with old movies. Without labels, we assume that one channel is mono and two channels are stereo. Anything bigger than that and not labeled, we say, well, we don't know what that is, so we'll call it all mono so you'll hear it, and then you can set the labels and route it the way you want.

I'm going to show you a little bit about this. If, for instance, we have a 5.1 movie and we're playing to a 5.1 device, the movie mixer is very straightforward. We just multiply straight through. Now, if we were playing this same movie to a stereo device, it gets a little trickier. And what happens is the left and right obviously go straight to left and right.

Center now has to split and go to both left and right. And as it's split, each side is dropped down by 3 dB, and that's so that perceptually you'll hear the same volume. The LFE, or the low-frequency speaker, just goes away because normally you don't want to hear low-frequency content played in normal speakers.

It's not good for them, and it's not good for you. But the left surround and right surround are also dropped down 3 dB because they're presumably behind you and a little less important, and then routed to left and right as well. So let me show you how that will work on the demo machine.

If I bring up the extraction panel, you can see that the default extraction is the summary layout. Bring this back to summary. And if I say preview, I'll hear that, just as we heard before, in all the speakers. Now if I say play to stereo, or extract to stereo, it'll do the mixing we just saw, and you'll hear them only left and right. And you hear there's no bass. If we were extracting quad, we'd have left-right, left-surround, right-surround, so the flute would move to left and right, the back speakers would stay the same, but again, there's no bass. So you hear the conga and the brass in the back now.

So this panel is not really designed for doing much useful work; it's really designed to help you visualize, or audio-lize, what's going on with the movie, and it should be very helpful. And also, obviously, all the sample code is very interesting. And I'm going to show you some sample code.

Where is it? I'm going to talk about movie audio extraction now. Basically what you do is you start an extraction session with MovieAudioExtractionBegin. Now you've got a session reference, and that is related to the movie you started with. Then you get properties, set properties, fill buffer, which means get data from the audio extraction, do whatever you want with it, and then end the audio extraction. Pretty straightforward. So here's the begin.

To get the stream format, you ask for the AudioStreamBasicDescription from the extraction session that you've opened. And to get the channel layout, the default layout when you first open the extraction, which is the summary channel layout, you say get property on the extraction of the channel layout property. As before, you want to do a get-info first to get the size.

Then if you want to set a different format, then you call set property on that.

[Transcript missing]

And that really bypasses all of the mixing, and you get all those speakers, all those channels out individually. There's also a current time, which specifies where in the movie you're going to start the extraction, so you don't always have to start at the beginning. So you set the property current time, and then you start extraction from that point. You can always set current time and start a new point if you like, so you can jump around that way.

Once you want to get your data, you use Movie Audio Extraction Fill Buffer with a buffer list, which is a Core Audio buffer list, and just the number of frames you want, and you're in business. Then you can do anything you want with it. And when you're done, of course, you say Movie Audio Extraction End. So I'll go back to Slides.
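Put together, the begin / configure / fill / end flow looks roughly like this. A sketch against the QuickTime 7 C API (compiles only with the QuickTime headers on Mac OS X; error handling is mostly trimmed, and the interleaving step mirrors Apple's extraction samples):

```c
#include <QuickTime/QuickTime.h>
#include <stdlib.h>

static OSStatus ExtractOneBuffer(Movie movie) {
    MovieAudioExtractionRef session = NULL;
    OSStatus err = MovieAudioExtractionBegin(movie, 0, &session);
    if (err) return err;

    /* Defaults: summary channel layout, highest track sample rate. */
    AudioStreamBasicDescription asbd;
    MovieAudioExtractionGetProperty(session,
        kQTPropertyClass_MovieAudioExtraction_Audio,
        kQTMovieAudioExtractionAudioPropertyID_AudioStreamBasicDescription,
        sizeof(asbd), &asbd, NULL);

    /* Ask for interleaved Float32 so one AudioBuffer holds all channels. */
    asbd.mFormatFlags &= ~kAudioFormatFlagIsNonInterleaved;
    asbd.mBytesPerFrame = asbd.mChannelsPerFrame * sizeof(Float32);
    asbd.mBytesPerPacket = asbd.mBytesPerFrame;
    MovieAudioExtractionSetProperty(session,
        kQTPropertyClass_MovieAudioExtraction_Audio,
        kQTMovieAudioExtractionAudioPropertyID_AudioStreamBasicDescription,
        sizeof(asbd), &asbd);

    UInt32 frames = 4096, outFlags = 0;
    AudioBufferList bufList;
    bufList.mNumberBuffers = 1;
    bufList.mBuffers[0].mNumberChannels = asbd.mChannelsPerFrame;
    bufList.mBuffers[0].mDataByteSize = frames * asbd.mBytesPerFrame;
    bufList.mBuffers[0].mData = malloc(bufList.mBuffers[0].mDataByteSize);

    /* The first FillBuffer clones the movie and freezes the configuration. */
    err = MovieAudioExtractionFillBuffer(session, &frames, &bufList, &outFlags);
    /* outFlags & kQTMovieAudioExtractionComplete => past the end, silence. */

    free(bufList.mBuffers[0].mData);
    MovieAudioExtractionEnd(session);
    return err;
}
```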

So again, there's that routing pane that you'll see in the example code with the layout selector and lets you display and edit the channel labels of the particular extraction you want to do. So you can come up with funny extractions and see how that's all going to work. Start and end times, preview and export. The preview and export both work on separate threads if they can. I'll talk about that in a minute.

Quick summary of audio extraction: begin, set properties, get properties, fill buffer, and end. You'll only get data from participating audio context tracks. That means soundtracks right now, but as more become implemented, you'll get them as well. The movie must be active when you start, because non-active movies don't have any audio, essentially. And all the tracks that you want to hear should be enabled as well.

The first thing that happens when you start pulling data through is you clone the movie, which makes a copy of it. And that means that if you make edits in the movie you started with, they're not going to be reflected in your extraction. Now what happens is you configure your extraction and then start. And the cloning starts when you start fill buffer. So you can configure to your heart's content.

Once you say fill buffer, your configuration is frozen and that extraction session is only good for that configuration. If you want to change what you're extracting, you close it and open a new one. Otherwise, we recommend that you keep it open, because it's a little more expensive to close and reopen every time you want to pull a little data out of the movie.

The initial default, as I said, is the summary channel layout of the movie, and we look at all the tracks and find the highest sample rate, and we'll give you that sample rate. If you want to lower it, you're welcome to. We only support PCM data or uncompressed data coming out of movie audio extraction. You'll have to do your own compression if you want to do that. The all-channels discrete option, as I said, circumvents the mixer. Once you set that, you can get your stream layout again, and that'll tell you how many channels you're going to get.

Set current time to set the start point of where you're going to begin extracting. If you have compressed audio data, I'll mention that decompressors often have a little latency before they get up to speed. So if you try to seam together pieces that should be next to each other but you haven't pulled all the way through, you may get a little audio glitch at the beginning because the decompressor hasn't been properly primed. We hope we can fix this internally, but in the meantime, you might want to start your extraction a little bit earlier and throw away the initial quarter second or so to get the decompressor primed, and then you'll get valid data out.

Once you go past the end of the movie, you'll get silence. At the end of the track, you'll get silence. So just get the movie or track durations to limit how much you're going to extract out. And as I said, we can migrate movies to another thread so you can do extraction in the background. But remember, this only works if all the codecs are thread-safe.

Everything that the movie opens has to be thread-safe or else you're not allowed to migrate. So if there's any audio codec developers out there, please make sure that your codecs are thread-safe. And then once they are, mark them with the component thread-safe flag so that we'll know that and applications will be able to access them that way. Let me go back to the demo machine for a moment.

And I'll just show you that once you select a channel layout on the extraction, we can actually make funny ones, but you don't want to have duplicate channels. It actually kind of works as expected, but you normally would not want to extract with two left channels or something. That's kind of weird. But you can set start time and end time, and preview and export will follow that.

The code that I want to show you is that code for multi-threading. So when you've hit the preview button and we say start preview, we want to migrate the movie that's on the main thread to another thread. So we're going to put it into a handle and make a new movie from that handle. Actually, this code needs to be changed: it's not going to preserve the pitch-preserving vary-speed setting; we need to make a quick change to this before we publish it. Then you detach the movie from the current thread, saying let go of it.

And you spawn a thread in your application so your new thread can take over. That's detachNewThreadSelector, and it gives you a function that you can call. Once you get into that function, you have to call EnterMoviesOnThread to say this thread is going to be using the QuickTime APIs, and then AttachMovieToCurrentThread, which says bring this movie into this thread. Once that succeeds, you can then do your extraction on that thread. Back to slides.
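In outline, the migration dance looks like this. A sketch against the QuickTime 7 C API (the thread-spawning itself, via NSThread or pthreads, is elided; compiles only with the QuickTime headers on Mac OS X):

```c
#include <QuickTime/QuickTime.h>

/* Main thread: let go of the movie before handing it to a worker. */
static void PrepareMovieForWorker(Movie movie) {
    DetachMovieFromCurrentThread(movie);
    /* ...now spawn a thread that runs WorkerExtract(movie)... */
}

/* Worker thread entry point. */
static void WorkerExtract(Movie movie) {
    EnterMoviesOnThread(0);            /* this thread will call QuickTime */
    AttachMovieToCurrentThread(movie); /* adopt the detached movie */

    /* ...run the MovieAudioExtraction session here... */

    DetachMovieFromCurrentThread(movie);
    ExitMoviesOnThread();
}
```

Remember the caveat from the talk: this only works if every codec the movie opens is thread-safe.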

I'm going to bring up Brad Ford to talk about Audio Capture and SG Audio Channel. Thanks. Hi, thanks for coming. I'm Brad Ford. Let's talk about audio capture, and that means sequence grabber. If you've been here the last two years, you know that this message is starting to sound repetitive because we've been talking about the same things over and over, but now we finally shipped it, and one might think, "Well, what have you been doing in all that time?" Well, we've been vetting it, and we had two very important early adopters to this technology. I'd like to give you a demo of one of them. Could we switch to demo two, please? Some of you may have heard of this application called Final Cut Pro. Some of you might even use it.

They had a tough problem. A lot of people were buying these expensive decks, like these DVCPRO HD decks. They're $25,000 decks that can support eight channels of audio, and they can bring in these huge 1080i60 video streams, but only two channels of audio at a time, because Sequence Grabber only supported two channels of audio.

Well, they were very eager to see us adopt Core Audio on the capture side, and we went ahead and did that. So I've got a DVC Pro HD deck up here, and as you can see, it's got eight channels of audio. I'm only interested in four of them because that's where I've got my audio playing. Now, their interface is really interested in stereo pairs or dual monos.

They're not really a surround app so much as a multi-channel app. So they were interested in being able to have... Audio brought in to separate tracks in the QuickTime movie to preserve their idea of mononess versus stereoness without needing to get n number of channels in the track. Well, SG Audio Channel supports that.

If you click this button, you're going to get two tracks of audio. They instantiate two SG Audio channels to get these two tracks. So if I do this, because I've turned off these guys, I'm only going to get four channels of audio in the movie, and they will be in four audio tracks. If I click the gang buttons, we'll get two tracks of audio each with stereo. So that's a pretty neat, interesting use of the new API right there. And you can see we can bring in 1080i material.

And it's got, as we would expect, four channels of audio, and you can see the waveforms right there. But they're not the only interesting clients. Another interesting client is, well, not Safari, 'cause what would Safari want to capture audio for? Let me set my deck over to play mode. QuickTime Player now supports audio capture as well.

So we can bring in the same audio using the same SG Audio channel. It's doing metering. You can see that right here. ♪ Or you can also do an audio recording, audio only from the same source. Okay, so I just stopped my recording and there it is. Pretty interesting.

And along those lines, you just saw I captured a little snippet of a DVCPRO HD movie. Well, you guys are all here presumably because you like QuickTime, and QuickTime's all about movies, so let's take a little break and watch a stupid movie, shall we? This one I captured from a DVCPRO HD deck. So it's 1080i60 with four channels of audio. It was brought in into four separate tracks, as you can see here. And the only change I had to do after bringing it in with Final Cut was to label them left, right, left surround, right surround.

Can we bump it up a little bit? Okay, great. Let's go back to slides. So, using SG Audio Channel, you too can do that. Let's look at a block diagram of the SG Audio Channel. This is what it looks like from the inside; we just saw it from the outside: the deck bringing in audio, and we saw it coming out the other end.

The important thing to realize with the SG Audio Channel is that each instantiation of it equals one track in the movie, as it shows right here: one SG Audio Channel equals one track in the movie. Inside, it's got a matrix mixer, an audio converter, and a queue, and eventually the audio gets written to file. One nice thing about it over the old implementation is that we have good threading.

So we bring in audio on the Core Audio HAL I/O thread. We process it on a thread owned by the SG Audio Channel, a high-priority worker thread that does the conversion and the mixing; then it's queued up, and then we write to disk on the main thread at SGIdle time.
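As a rough summary of the threading model just described, here's a plain-C lookup sketch. This is illustrative only, not QuickTime API; all the names below are made up:

```c
#include <assert.h>
#include <string.h>

/* Which thread handles each stage of the SG Audio Channel capture
   pipeline, per the description above. Illustrative names only. */
typedef enum { kDeviceInput, kMixAndConvert, kEnqueue, kWriteToDisk } Stage;

const char *ThreadForStage(Stage s)
{
    switch (s) {
    case kDeviceInput:   return "Core Audio HAL I/O thread";
    case kMixAndConvert: return "SG Audio Channel worker thread";
    case kEnqueue:       return "SG Audio Channel worker thread";
    case kWriteToDisk:   return "main thread, at SGIdle time";
    }
    return "unknown";
}
```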

Migration story: if any of you use the Sequence Grabber and you use the current SG Sound Channel, change to the new one, please. This is a replacement for the SG Sound Channel. Applications must opt in; we couldn't make it as automatic as playback, extraction, and export, because the old channel was too tightly coupled to the sound input components. The SG Sound Channel is in maintenance mode. A lot of you post to the QuickTime API list about problems with the Sequence Grabber and sound; those will be addressed in the SG Audio Channel.

It has configuration through component properties; we'll see more about that. Now, one caveat is that we did not ship an SGSettings dialog in QuickTime 7 for this new audio channel, but I'm going to show you some sample code on how you can write your own UI based on the properties that are supported. One caveat with our device support is that we still do not have a DV25 or DV50 Core Audio HAL plug-in or kernel driver; it's still the Sound Manager one.

Well, we don't force you to use the old SG Sound Channel just for DV. We've shimmed that into the SG Audio Channel so you can use it for DV as well. But the caveat is that you'll be limited to two channels at a time for DV25 and DV50. First two, second two, or all four mixed.

But the great story is that we support capture from Core Audio devices. One input device per SG Audio Channel instance. But if you're on Tiger, you can leverage the work that went into the HAL for aggregate devices. We've tested this, and it works. So if you have, say, two pro interfaces that each do 24 channels of audio, and you gang them together so that they look like one big aggregate device to Audio MIDI Setup, then if you open up an application that uses SG Audio Channels, such as Final Cut or QuickTime Player, it will see that aggregate device and capture from 48 channels. One caveat is that we support PCM input only. So if a device delivers MPEG-1 Layer 2 natively via a driver, or some other non-mixable format, we will not deal with it; we need PCM coming from the audio device.

We support real-time preview to Core Audio devices, one output device per SG Audio channel instance, just like one input device per instance. But multiple SG Audio channels may be instantiated in a single grab, so you can capture from multiple devices at a time. You can also preview to multiple devices at a time. And there are component properties for those things, too.

All right. If any of you use the Sequence Grabber, you've probably looked at HackTV. Raise your hand if you've used HackTV before. Yay. This is venerable; it's like 15 years old. Well, today we're giving you something a little bit nicer: WhackedTV. WhackedTV is a replacement that we wrote for HackTV. It's a Cocoa app that supports the SG Audio Channel. Thank you.

We also provide some wrapper Objective-C classes for SG objects. We're providing a sample SG Audio configuration panel to show you how to make UI of your own. And as a side benefit, we're showing you how to do some improvements with video, too: it uses decompression sessions to preview the video and draws them into OpenGL views. So let's take a look at that.

Can we go to demo two? Sorry. This is WhackedTV. It is a Cocoa app, and it has classes; you'll see here we've got a couple of classes. Now, the QTKit framework that we shipped does not have support for the Sequence Grabber for capturing, but we did a reasonable little subset here that lets you do most things.

So you can grab those if you haven't made your own wrappers already, and I think you'll be pleasantly surprised that there's a lot of functionality in them. I tried to make them generic enough that you can reuse them in your own Cocoa apps. And then we have some stuff specific to our app.

Now, the first thing to show you: WhackedTV is on the CD that you got yesterday, hopefully. But an updated version will be available on the web, and at the lab afterwards on CD. And you'll want to get the updated version, because it's much cooler.

Okay. Will my bookmarks work? Yes, they will. QuickTimeComponents.h is your friend; that's where you'll find everything related to the Sequence Grabber audio channel. Just do a search for SGAudio and start reading. You'll see all of the properties that we support, the classes that are associated with those properties, notes about which properties are holdovers from the old SG Sound Channel and which ones will work with the new one, and a good description of each property. We're trying to do a better job of documenting our headers; I hope you appreciate that.

Okay, now, before we go any further, let me show you the UI for WhackedTV. It lets you make more than one audio channel and more than one video channel, which is a change, of course, from HackTV. I'm going to add a video track first, for all our video brethren in the audience. Okay, here's me again. This is, again, DVC Pro HD, but it's drawing into an NSOpenGLView, and so it has really good scaling, for instance.

I've given you some options to show you how to lower the quality. You'll see it immediately go fuzzy when I do that, and you can also throttle back the preview rate and stuff like that. So please look at the sample code. I'm not going to let you look at the sample code in here, because we're all about audio, but you should look at the sample code yourself and see it. Now, for audio... Here is a sample settings panel.

Like I said, we didn't ship an SGSettings dialog, but I wrote this one using just the properties, so you can do it too. And it shows you that you can select recording devices, for instance, and configure them.

Is it there? Oh, there it is. Sorry, the font's messing with me. Now, when I init this SGAudio object, what I do is wrap the component property calls: get, set, and getInfo. So I'm going to get the property info list, which will tell me all of the properties that this particular component responds to, and then I'm going to add property listeners for each of those that I care about.

Well, I'm going to add a property listener for all of them. And because this is a Cocoa app, I decided to forward those property listeners, which are callbacks, as notifications. So you'll see here, when I see that such-and-such a property changed, I forward it on as a notification. And in listener three, we see our app respond to that.

The SGAudio device list change notification, for instance: this WhackedTV controller object pays attention to that notification. When it sees it, it does interesting things, rather than blowing up and quitting, which is always nice. So let's do something that we're never supposed to do on stage: let's pull the plug.
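The listener-to-notification forwarding just described can be sketched in plain C. This illustrates only the pattern; the real code registers QuickTime component property listeners and reposts them as Cocoa notifications, and every name below is made up:

```c
#include <assert.h>
#include <stddef.h>

/* A property-listener callback fires, and we fan the change out to
   any number of registered observers (in the Cocoa app these became
   NSNotifications). Illustrative names, not QuickTime API. */
#define MAX_OBSERVERS 8

typedef void (*ObserverFn)(unsigned propertyID, void *refcon);

typedef struct {
    ObserverFn fns[MAX_OBSERVERS];
    void      *refcons[MAX_OBSERVERS];
    size_t     count;
} Notifier;

/* Register an observer; returns 0 if the table is full. */
int AddObserver(Notifier *n, ObserverFn fn, void *refcon)
{
    if (n->count >= MAX_OBSERVERS)
        return 0;
    n->fns[n->count] = fn;
    n->refcons[n->count] = refcon;
    n->count++;
    return 1;
}

/* Called from inside the property-listener callback: forward the
   change to every registered observer. */
void PropertyDidChange(Notifier *n, unsigned propertyID)
{
    for (size_t i = 0; i < n->count; i++)
        n->fns[i](propertyID, n->refcons[i]);
}

/* Example observer: just counts how many changes it was told about. */
static unsigned gChanges = 0;
static void CountChanges(unsigned propertyID, void *refcon)
{
    (void)propertyID; (void)refcon;
    gChanges++;
}
```

An observer registered this way survives events like a device disappearing, which is exactly what kept the demo app from crashing when the cable was pulled.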

I have an EMI right here, and I'm going to pull the cable out. Oh my gosh. It says it disappeared, but it didn't crash. That's always nice. And that's because it was listening to that property change. And so I go and I select a new device, and I'll plug that device back in. And after a while, he'll show up.

There he is. Thank you. You know, dealing with devices on stage is like herding cats. Okay. So that's how we listen to properties. Can we go back to slides? Device channel selection: it wouldn't be very interesting if you always had to get all the channels on the device. If you only wanted, for instance, four of them instead of eight or 24, you would be sunk.

But we provide you with component properties that let you do this. You can discover the device's format as a flattened stream, not multiple streams like the HAL does it. So if, for instance, it's a complicated device that has 18 different streams, you don't have to go query all the different streams; we'll present a flattened AudioStreamBasicDescription that represents all the channels on the device.

We'll let you set the device format, and we'll let you specify a subset of device channels using the channel map. And using multiple SG Audio channel instances, of course, then you can split up these channels across tracks in a movie, just as Final Cut did in the first demo. So back to the demo machine, I'll show you how we do that. First in the app... Channel 1, Channel 2, Channel 3... So I'll just turn off some of these channels.

Okay, makes sense? Now let's go back here and find out how we did that in code. Oh, there it is. So setting the channel map is a little bit trickier than setting one property: you need to know both the record device's channel layout and the map, because the number of channels needs to mesh.

So you should stop the channel preview before you set these properties, then go and build your new channel map. A channel map is a list of the channels that you want to enable. So, for instance, right there I turned off channels 4, 5, and 6, and I just have channels 0, 1, and 2 if I'm zero-based.

So I would pass it an array of SInt32s with 0, 1, 2. And then I would also pass it a channel layout that has three channel descriptions in it. And when I'm done setting these properties on the SGAudioRecordDevice class, I set the channel map and I set the layout right there. Then I go ahead and start the channel preview again. Okay, back to slides.
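Building that channel map is simple enough to show as a self-contained sketch. The helper name and the SInt32 stand-in are illustrative; in real code you'd hand the resulting array, plus an AudioChannelLayout with a matching number of channel descriptions, to the record device properties as just described:

```c
#include <assert.h>
#include <stddef.h>

typedef int SInt32;  /* stand-in for the QuickTime/Core Audio SInt32 */

/* Build a zero-based channel map from a set of enable flags.
   Returns the number of enabled channels; mapOut then holds that many
   device channel indices, and a matching AudioChannelLayout would need
   the same number of channel descriptions. */
size_t BuildChannelMap(const int *enabled, size_t deviceChannels,
                       SInt32 *mapOut)
{
    size_t count = 0;
    for (size_t i = 0; i < deviceChannels; i++) {
        if (enabled[i])
            mapOut[count++] = (SInt32)i;
    }
    return count;
}
```

For the six-channel example in the demo, with channels 4, 5, and 6 turned off, this yields the map {0, 1, 2} and a count of three.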

Okay, let's look at the insides. A lot of people have been posting to the list about using the SG DataProc, and that's fine for some things, but it's apparently being overused, and we would really like you to stop doing that and use something better. The SG Audio Channel has more appropriate places for you to tap in and do interesting things with the audio than the DataProc. Now, if you've used the Sequence Grabber before, you know that capture happens in several phases: you get the data from the device, then it might be transformed somehow, mixed or compressed, then it's probably queued up and chunked, and then finally written to disk.

Now, the SG DataProc fires right here, at number five, which is right as the buffers are about to be delivered to disk. So it's too late for some operations; you don't really want to be performing interesting things with your audio at that point. The better place to do them is one of these other four callbacks.

So you can register a callback to get the audio. We'll give it to you as an AudioBufferList, so you can use it with Core Audio. You can use it Pre-Mix, Post-Mix, Pre-Conversion, or Post-Conversion, or you can continue to use the SG DataProc. The difference between number four and number five is that the Post-Conversion phase is before the audio has been queued.

So by default, we chunk audio in half-second chunks. So if you register for number four, you'll get audio much more frequently, and in smaller, more consumable bite sizes, than if you sit on the SG DataProc, where you'll get half a second of audio at a time, which is kind of too much to process. SG Audio callbacks: Pre- and Post-Mix and Pre- and Post-Conversion, plus the DataProc: five points for you to tap in. Let's look at interesting things that we can do with that in WhackedTV.
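The size difference between tap four and tap five is just arithmetic: a half-second chunk at common sample rates is a lot of frames per delivery. A trivial sketch, with a made-up helper name:

```c
#include <assert.h>

/* Frames contained in one queued chunk: sampleRate * chunkSeconds.
   The SG DataProc sees half-second chunks by default, while the
   earlier taps see the audio in smaller buffers as it arrives. */
unsigned long FramesPerChunk(double sampleRate, double chunkSeconds)
{
    return (unsigned long)(sampleRate * chunkSeconds);
}
```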

Thank you. All right. So I'm going to turn up the audio again. Now, many of you may have seen this ominous button down in the corner that seems to be mocking us and asking us to press it. Well, these AU effects are not supported in SG Audio Channel. We didn't provide any effects built in.

But I decided to use the pre-mix callback to see the buffers as they come in, modify them in place by running them through an audio unit chain, and then sending them back into the stream. So we can do fun stuff like this. Select an AU effect. Let's select an obvious one.

"The Twilight Zone" "Channel 5, Channel 6" Or Band Pass, for instance, which makes it sound like you're on a telephone. So that's cool. I mean, a lot of these effects you probably would not want to put in audio as it's being recorded, but I could very well see someone wanting to throw a compressor limiter in there as a clip suppressor, for instance. So let's see in code how we did that.

In my SGAudio class that wraps the SG Audio Channel, I accept an AU effect ComponentDescription. I go and open that component, and then the most important piece of code is right here: I fill out one of these SGAudioCallback structs with the name of my callback and the refcon, and then I call QTSetComponentProperty with the SGAudio premix callback property. That means my listener will fire when audio arrives at the premix stage, and then I can start shoveling it through my AudioUnit chain, which I create lazily right here.

And so even though we officially say that the SG Audio Channel tap-in points are read-only, you can see here I've used them to write into the buffers as well. Okay, let's go on to the next slide. The SG Audio Channel has very flexible preview. Again, it's configurable using component properties. You can select either hardware playthrough or software preview; hardware playthrough will only work if the device in question supports it.
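In-place modification at the premix tap boils down to walking the buffers and overwriting samples. Here's a minimal sketch with stand-in types; the real callback receives a Core Audio AudioBufferList, and the demo ran the buffers through an AudioUnit effect chain, where this sketch just applies a gain:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Core Audio's AudioBuffer / AudioBufferList structs. */
typedef struct {
    unsigned numberChannels;
    unsigned dataByteSize;   /* size in bytes, as in the real struct */
    float   *data;
} Buffer;

typedef struct {
    unsigned numberBuffers;
    Buffer   buffers[2];
} BufferList;

/* Modify the samples in place, as the demo's premix callback does
   (there, via an AudioUnit chain; here, a simple gain). */
void PremixApplyGain(BufferList *abl, float gain)
{
    for (unsigned b = 0; b < abl->numberBuffers; b++) {
        float *p = abl->buffers[b].data;
        size_t frames = abl->buffers[b].dataByteSize / sizeof(float);
        for (size_t i = 0; i < frames; i++)
            p[i] *= gain;
    }
}
```

The buffers go back into the stream modified, which is exactly the "write into the buffers" trick mentioned above.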

And you can find that out by getting the device attributes via component property and asking whether it supports hardware playthrough. Now, there are two properties that come into play here: you need to set the record device on the SG Audio Channel, and you can also set a preview device. In order to have hardware, or zero-latency, playthrough work, both the record device and the preview device have to be the same device, and it has to support playthrough. Otherwise, we'll do software preview, very flexibly: you can specify what you want to preview.

Remember those tap points that I showed you, one through four? Well, those you can specify as places you want to preview as well. So you can preview what has just come from the device or what was already mixed or what has already been converted to, say, some compressed format. Software Preview will mix to the destination device's layout automatically. Let's look at that.
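The playthrough decision described above reduces to a simple predicate. A sketch with made-up names; the real checks go through component properties and device attributes:

```c
#include <assert.h>

/* Hardware (zero-latency) playthrough requires that the record and
   preview devices be the same device AND that the device reports
   playthrough support; anything else falls back to software preview,
   which mixes to the destination device's layout. */
int CanUseHardwarePlaythrough(int recordDeviceID,
                              int previewDeviceID,
                              int deviceSupportsPlaythrough)
{
    return recordDeviceID == previewDeviceID && deviceSupportsPlaythrough;
}
```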

Okay, here are my channels 1 through 6 again. Now let's go ahead and label these so we get something more interesting than everything coming out left and right. Okay, now we're previewing this post-mix. What happens if I set the output format to, say, mono? Or better yet, AMR Narrowband. Now what we're hearing is still what's coming off the post-mix, but now it's mono. If I preview pre-mix, we hear it coming out the different speakers again. Now if I do post-conversion, you're hearing what it sounds like coming off of AMR. Kind of sounds like a cell phone, doesn't it? But I guess that's the point.

Okay, so all of those options are available to you using the flexible preview features. Oh, and I should also show you the code for that, since this WWDC is all about the code. You can see when I am told to update the preview flags, I just go and I set preview flags on the Sequence Grabber channel, and those flags are defined in QuickTimeComponents.h.

Look for "channel play". You'll see four new flags that are paid attention to by the SG Audio Channel: premix, postmix, preconversion, and postconversion, and those tell it what you want to preview. Can we go back to slides? Encodes are the last topic we'll cover. You can do encodes on the fly, as I just showed you going to AMR.

By default, you'll get exactly the format that's present on the device. It will go through Core Audio, so it will be blown up to canonical Float32 along the path, but what we'll actually write to the file will be what was on the device. This EMI, for instance, is in 24-bit little-endian integer mode, so that's what we'll get in our file by default, unless we specify a different output.

But via component properties, you can specify any output format you want, sample rate conversion, any flavor of PCM. You can also encode to VBR formats, such as Apple lossless and AAC. You can specify a different output channel layout than the device's, and we'll just do the right thing.

If you are a control freak and you want to specify what the output-layout-by-input-layout cross-point coefficients are, you can set that as a property too, and basically control the exact mix that we'll put in the movie. That's also available as a component property. Let's look at that. Specifying the output format: I already showed you the demo of it, but here's the code that goes behind it. Now, this little dialog that we used... oh, it's a real app.
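Those cross-point coefficients form an output-by-input matrix: each output channel is a weighted sum of the input channels. A self-contained sketch, with made-up names:

```c
#include <assert.h>
#include <stddef.h>

/* coeff is an outCh x inCh matrix, row-major: coeff[o * inCh + i] is
   the gain from input channel i into output channel o. Each output
   sample is the weighted sum of the input samples. */
void ApplyMixMatrix(const float *coeff, size_t outCh, size_t inCh,
                    const float *in, float *out)
{
    for (size_t o = 0; o < outCh; o++) {
        float acc = 0.0f;
        for (size_t i = 0; i < inCh; i++)
            acc += coeff[o * inCh + i] * in[i];
        out[o] = acc;
    }
}
```

A stereo-to-mono downmix, for example, is the 1x2 matrix {0.5, 0.5}; setting every cross-point yourself is what "controlling the exact mix" amounts to.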

This dialog here, you may have noticed in QuickTime Player, the Pro version: this is what we use as our export dialog, and this is what Daniel was talking about as the StdAudio component. This is the stock dialog, but I've also done some custom configuration on it: it doesn't show all of the output formats that we normally do in QuickTime Player, and it shows a different list of supported layouts than we do in QuickTime Player, and I'll show you how I did that.

I specify a limited list of formats that I want it to show, and I specify some arbitrary list of tagged layouts that I want it to show. Then I find out what my record device's input format and channel layout are. Then I open up a StdAudio component of subtype audio, which is 'audi', and I set those properties on it. First I tell it: your client-restricted compression format list should be the one that I specified, with just LPCM and ALAC.

And then I set its restricted channel layout tag list so it only shows the restricted list. And then I use my starting and ending formats. And I let the dialog come up by calling SCRequestImageSettings, somewhat of a misnomer, but it's because of its heritage as a standard video configuration component.

So once it's done, we have the new output format that the person desired. I get that out as a SoundDescription, and I set that property as the output sound description of my SG Audio Channel, and now we get a compressed output format. Okay, that's it for me. Back to Daniel.

Thank you. Shortly after Tiger was released, there was a web review of Tiger on arstechnica.com. It went into all the Tiger features and whatnot; of course, I was only interested in the QuickTime part. And one of the things that it said was: "My favorite new feature is the playback speed adjustment, how it can increase the speed of audio without changing pitch. That means I can watch WWDC sessions at one and a half times speed without making every presenter sound like one of the Chipmunks."

However, it said, there is no filter for Bertrand's accent, unfortunately. Now, in the QuickTime audio team, we take feedback very seriously. And we thought we would try to address this. And we're going to give you a little sneak preview of some of the technology we're working on to address this. So we've got a clip of Bertrand from WWDC 2003. Well, we wanted to clean that up a little bit, so we've got a new menu item called Translation Services, which will perform translations. You can, for instance, translate to English.

Now I know what you're thinking. I know what you're thinking. It sounds a little wooden and Australian. But many of the engineers in the audio team are Australian, and you can't really criticize their accents at all, so we let them do that. Of course, we didn't want to stop at English. We wanted to support other languages, so we've got Spanish. We'll have the full feature set, obviously, when we're ready to release this. And of course, in today's world, you can't really do anything without having an Indian language, so we have Hindi.

My personal favorite is the Yiddish translator. Now, as you all know, the test of any translation feature is to go back and forth between the original language and, say, English many times until it reaches a steady state, and that will sort of tell you how you're doing. And I think you'll see we still have a little work to do in this area, but I thought I'd demo what we've got so far. We've processed from French to English and English to French 50 times, and it comes out like this.

So we'll work on that before we release it. Back to slides, please. So let me just remind you of what we talked about. The player and the pro features in the player help you with your multi-channel content creation and recording. The channel labels inform the movie mixer about how to mix your audio, and you want to support that in your applications.

You can extract, using Movie Audio Extraction, in any channel layout you specify, and the device itself has a channel layout associated with it; all of that controls the mixer. Please use discretion when using the discrete labels. This is not intended for publishing media: the discrete channels are really there to map from your movie to your device, and for your use in development. You don't want to see movies out on the web with discrete layouts.

They won't really work properly. You can use Movie Audio Extraction to get the mixed audio out of the movie; you won't get anything from protected content, of course. And audio capture has many new capabilities: synchronized multi-channel capture from multiple devices, and a very flexible architecture for mixing, routing, and preview.

The website, the developer website for all the audio capabilities of QuickTime is very sadly out of date, and I'm happy to say that we're in the process of revamping that, and we'll have a whole lot more information and more modern samples up there very soon. We'll be spending the next few weeks and months working on that, so we can have much better examples and documentation for you.

The sample code you saw today, you can come to the lab and get a copy onto your disk or wait a week and it'll be published on the web, and also look on the web to see what else there is. We'll be constantly updating, so we'll get much more sample code out to you.

This afternoon there's a Core Audio session on using audio units in this room at 2 o'clock. And tomorrow morning is a hands-on lab all morning with QuickTime Audio team members. And in the afternoon, a lot of Core Audio people will be there as well. I can't be there tomorrow, but I'll be in the lab this afternoon, particularly after the Core Audio talk.

So come talk to me if you'd like to ask any questions or come tomorrow during the hands-on lab. And of course, the QuickTime feedback forum is Friday afternoon, and you can come and complain about all the things you want differently and tell us how great we are when we've done what you liked.