
WWDC17 • Session 501

What's New in Audio

Media • iOS, macOS, tvOS, watchOS • 55:56

Apple platforms provide a comprehensive set of audio frameworks that are essential to creating powerful audio solutions and rich app experiences. Come learn about enhancements to AVAudioEngine, support for high-order ambisonics, and new capabilities for background audio recording on watchOS. See how to take advantage of these new audio technologies and APIs in this session.

Speakers: Akshatha Nagesh, Béla Balázs, Torrey Holbrook Walker

Unlisted on Apple Developer site

Transcript


Thank you. Good afternoon, everyone. Welcome to the session "What's New in Audio?" I'm Akshatha Nagesh, from the Audio Team, and today, I would like to share with you all the new, exciting features we have in audio in this year's OS releases. I'll begin with a quick overview of the audio stack. Audio frameworks offer a wide variety of APIs, and our main goal is to help you deliver an exceptional audio experience to the end user, through your apps.

At the top, we have the AVFoundation framework, with APIs like AVAudioSession, Engine, Player, Recorder, etcetera. And these APIs cater to the needs of most apps. But if you want to further customize the experience, you can use our other frameworks and APIs, like AUAudioUnit and Audio Codecs in the Audio Toolbox framework, the Core MIDI framework, the Audio HAL, etcetera. In last year's talk here at WWDC, we did a walkthrough of all these APIs and more, throughout the stack, and I highly encourage you to check that out.

Now, let's see what's on the agenda for today. We will see the new features we've added in some of these APIs, starting with the ones in AVFoundation framework. And that includes, AVAudioEngine, AVAudioSession, and the enhancements we have in AVFoundation on watchOS 4. Later on, we'll move over to the Audio Toolbox world, and see the enhancements in AUAudioUnits and Audio Formats. And finally, we'll wrap up today's session with an update on Inter-Device Audio Mode. We also have a few demos along the way, to show you many of these new features in action.

So, let's begin with AVAudioEngine. And here's a quick recap of the API. AVAudioEngine is a powerful Objective-C and Swift based API set. And the main goal of this API is to simplify dealing with real-time audio, and to make it really easy for you to write code to perform various audio tasks, ranging from simple playback, to recording, to even complex tasks like audio processing, mixing, and even 3D audio spatialization. And again, in previous years' talks here at WWDC, we have covered this API in detail. So, please check those out if you're not familiar with this API.

The Engine manages a graph of nodes, and a node is the basic building block of the Engine. So, here's a sample Engine setup, and this is a classic karaoke example. As you can see, there are various nodes connected together to form the processing graph. We have the InputNode that is implicitly connected to the input hardware and is capturing the user's voice.

This is being processed through an EffectNode, which could be, for example, an EQ. We also have a node tap on the InputNode, through which we could be analyzing the user's voice to see how they're performing, and based on that, we could be playing out some cues to the user through a PlayerNode. And we have another PlayerNode that is playing the backing track as the user is singing. All of these signals are mixed together in a MixerNode and finally given to the OutputNode, which plays it out through the output hardware.
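
As a rough sketch, a graph like this could be wired up in Swift along the following lines; the node choices, formats, and buffer sizes here are illustrative, not from the session:

    import AVFoundation

    let engine = AVAudioEngine()
    let eq = AVAudioUnitEQ(numberOfBands: 4)      // stands in for the EffectNode on the voice
    let backingTrackPlayer = AVAudioPlayerNode()  // plays the backing track
    let cuePlayer = AVAudioPlayerNode()           // plays cues back to the singer

    engine.attach(eq)
    engine.attach(backingTrackPlayer)
    engine.attach(cuePlayer)

    let mixer = engine.mainMixerNode              // implicitly connected to the output node
    let voiceFormat = engine.inputNode.outputFormat(forBus: 0)

    // Microphone -> EQ -> mixer
    engine.connect(engine.inputNode, to: eq, format: voiceFormat)
    engine.connect(eq, to: mixer, format: voiceFormat)

    // Players -> mixer
    engine.connect(backingTrackPlayer, to: mixer, format: nil)
    engine.connect(cuePlayer, to: mixer, format: nil)

    // Tap on the input node to analyze the singer's voice
    engine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: voiceFormat) { buffer, _ in
        // analyze the captured buffer here
    }

    try engine.start()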

This is a simple example of the engine setup, but with all the nodes and the features the Engine actually offers, you could build a much more complex processing graph, based on your app's needs. So, that was a recap of the Engine. Now, let's see what's new in the Engine this year. We have a couple of new modes, namely the Manual Rendering Mode and the Auto Shutdown Mode, and we also have some enhancements in AVAudioPlayerNode, related to the file and buffer completion callbacks. We'll see each of these one by one, starting with the Manual Rendering Mode.

So, this is the karaoke example that we just saw. And as you can see, the Input and the OutputNodes here are connected to the audio hardware, and hence the Engine automatically renders in real time. The IO here is driven by the hardware. But what if you wanted the Engine to render, not to the device, but to the app? And, say, at a rate faster than real time? So, here is the Manual Rendering Mode, which enables you to do that.

And as you can see, under this mode, the Input and the OutputNodes will not be connected to any audio device, and the app will be responsible for pulling the Engine for output and for providing the input to the Engine, which it can do optionally through the InputNode, player nodes, etcetera.

So, the app drives the IO in Manual Rendering Mode. We have two variants under Manual Rendering. That is the Offline and Real Time Manual Rendering Modes. And again, we'll see each of these in detail and also, later in this section, I'll show you a demo of the Offline Manual Rendering Mode.

Under the Offline Manual Rendering Mode, the Engine and all the nodes in your processing graph operate under no deadlines or real-time constraints. And because of this flexibility, a node may choose to, say, use a more expensive signal processing algorithm when it's offline, or a node, for example a player node, may choose to block on the render thread until all the data that it needs as input becomes ready. But these things will not happen when the nodes are actually rendering in real time, as we'll see soon. So, let's consider a simple example where we could use the offline mode.

So, here's an example where an app wants to process the audio data in a source file, apply some effects to that data, and dump the processed output to a destination file. As you can see, there is no rendering to the device involved here, and hence, the app can use the Engine in the offline mode.

So, it could set up a very simple graph in the Engine, like this. It could use the PlayerNode to read the data from the source file, process it through an EffectNode, which could be, for example, a reverb, and then pull the data out of the OutputNode and write the processed data into a destination file. And we will soon see a demo of this exact setup in a couple of slides.

There are many more applications where you can use the offline mode. And some of these are listed here. Apart from post-processing of audio files that I just mentioned, you could also use offline mode to say mix audio files. You could use it for offline processing using a very CPU intensive or a higher quality algorithm, which may not be feasible to use in real time. Or simply, you could use the offline mode, to test, debug, or tune your live Engine setup. So, that concludes the offline mode and as promised, I'll show you a demo of this in action.

Alright so, what I have here is an Xcode playground. And this is the example where we will post-process the audio data in a source file, apply a reverb effect on the data, and dump the output into a destination file. I have some code snippets here, and I'll run them one by one. So, the first thing I do here is set up the Engine to render in a live mode to the device, just to hear how the source file sounds without having added any effect to it.

So, I'm first opening up the source file, which I want to read. And then, I'm creating and configuring my Engine. So, I have an Engine and a PlayerNode, and I'm going to connect the player to the main mixer node of the Engine, which is implicitly connected to the OutputNode of the Engine.

Then I'm scheduling the source file that I have on the player so that it can read the data from the source file. And then I'm starting the Engine and starting the player. So, as I mentioned, the Engine is now in a live mode, and this will render to the device. So, let's see how the source file sounds without any effects.
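In Swift, that live-mode setup looks roughly like this; the file URL is a placeholder:

    import AVFoundation

    let sourceFile = try AVAudioFile(forReading: URL(fileURLWithPath: "source.caf"))

    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    engine.attach(player)

    // The main mixer node is implicitly connected to the engine's output node.
    engine.connect(player, to: engine.mainMixerNode, format: sourceFile.processingFormat)

    // Schedule the whole source file on the player, then render live to the device.
    player.scheduleFile(sourceFile, at: nil, completionHandler: nil)
    try engine.start()
    player.play()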

[ Music ]

Okay, so that's how the source file sounds. So, now what I'll do is add a reverb effect to process the data. So, I'll remove the player-to-main-mixer connection, and I'll insert the reverb. So, here I've created a reverb, and I'm setting the parameters of the reverb. In this example, I'm using a factory preset and a wetDryMix of 70%. And then I'm inserting the reverb in the playback path, in between the player and the main mixer. So, now if I run the example, we can hear how the processed output will sound.
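
Continuing the sketch above, inserting the reverb between the player and the main mixer could look like this; the specific factory preset is an assumption, since the session only mentions "a factory preset and a wetDryMix of 70%":

    let reverb = AVAudioUnitReverb()
    reverb.loadFactoryPreset(.mediumHall)   // illustrative preset
    reverb.wetDryMix = 70

    engine.attach(reverb)

    // Break the player -> main mixer connection and insert the reverb in between.
    engine.disconnectNodeOutput(player)
    engine.connect(player, to: reverb, format: sourceFile.processingFormat)
    engine.connect(reverb, to: engine.mainMixerNode, format: sourceFile.processingFormat)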

[ Music ]

Okay, so now at this point, if I want, I could go ahead and tune my reverb parameters so that it sounds exactly as I want. So, suppose I'm happy with all the parameters, and now I want to export my complete source file into a destination file. And this is where the offline mode comes into the picture.

So, what I'll first do is switch the Engine from the live mode to the offline mode. So, what I've done here is call the enable manual rendering mode API, saying it needs to be the offline variant. I'm specifying the format of the output which I want the Engine to give me.

And this, in this example, is the same as the format of the input. And then I'm specifying a maximum number of frames, which is the maximum number of frames that you will ever ask the Engine to render in a single render call. In this example, the value is 4096, but you can configure this as you wish.
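
Switching that engine into the offline manual rendering mode is roughly:

    // The engine must not be running while the rendering mode is changed.
    engine.stop()

    try engine.enableManualRenderingMode(.offline,
                                         format: sourceFile.processingFormat,
                                         maximumFrameCount: 4096)

    // Re-schedule the file and start again; the engine now waits to be pulled for output.
    player.scheduleFile(sourceFile, at: nil, completionHandler: nil)
    try engine.start()
    player.play()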

So, now if I go ahead and run this example, nothing will happen because the Engine is now in the offline mode, and it's ready to render. But of course, it's waiting for the app to pull the Engine for output. So, what we'll do next is to actually pull the Engine for output.

So, here I'm creating an output file to which I want to dump the processed data. And I'm creating an output buffer into which I'll ask the Engine to render sequentially in every render call. And the format of this buffer is the same format that I mentioned when enabling the offline mode.

And then comes the render loop, where I'll repeatedly pull the engine for output. Now, in this example, I have a source file which is about three minutes long, so I really don't want to allocate a huge output buffer and ask the Engine to render the entire three minutes of data in a single render call. That's why I'm allocating an output buffer of a very reasonable size, repeatedly pulling the Engine for output into the same buffer, and then dumping the output to the destination file.

So, in every iteration, I'll decide the number of frames to render in this particular render call. And I call the renderOffline method on the Engine, asking it to render that many frames and giving it the output buffer that we just allocated. Depending on the status: if it returned success, the data was rendered successfully and I can go ahead and write the data into my output file; and in case it returned an error, something went wrong, so you can check the error code for more information. Finally, when the rendering's done, I'll stop the player and I'll stop the Engine. So, now if I go ahead and run this example, the entire source file will get exported and the data will be dumped into the destination file. So, let's do that.
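
Put together, the offline render loop just described looks roughly like this in Swift, continuing the earlier sketch; file URLs and error handling are placeholders:

    let outputFile = try AVAudioFile(forWriting: URL(fileURLWithPath: "processed.caf"),
                                     settings: sourceFile.fileFormat.settings)

    // One reasonably sized buffer, rendered into repeatedly.
    let buffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                  frameCapacity: engine.manualRenderingMaximumFrameCount)!

    while engine.manualRenderingSampleTime < sourceFile.length {
        let framesLeft = sourceFile.length - engine.manualRenderingSampleTime
        let framesToRender = min(AVAudioFrameCount(framesLeft), buffer.frameCapacity)

        let status = try engine.renderOffline(framesToRender, to: buffer)
        switch status {
        case .success:
            try outputFile.write(from: buffer)    // append the rendered frames to the file
        case .error:
            fatalError("manual rendering error")  // check the error for details
        default:
            break    // .insufficientDataFromInputNode / .cannotDoInCurrentContext don't apply here
        }
    }

    player.stop()
    engine.stop()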

Okay, so as you may have observed, the three-minute-long source file got rendered into an output file way faster than real time. And that is one of the main applications of the offline rendering mode. So, what we'll do next is listen to the source file and the destination file, and make sure that the data was indeed processed. So, that is my source file. And this is my destination file. So, first we'll listen to the source file.

[ Music ]

So, as you saw, it is pretty dry. And now, the processed file.

[ Music ]

Okay, so as expected, the processed data has the reverb effect added to it. So, that concludes the offline rendering demo. And I'll switch back to the slides.

[ Applause ]

So, as I mentioned, there are many more applications of this rendering mode. And I'm also happy to announce that the sample code for this example is already available on our session's homepage, and we'll show you a link to that homepage at the end of the presentation. Now, moving on to the second variant of the Manual Rendering Mode: the real-time Manual Rendering Mode.

As the name itself suggests, under this mode, the Engine and all the nodes in your processing graph assume that they are rendering under a real-time context, and hence they honor the real-time constraints. That is, they will not make any kind of blocking calls on the render thread. For example, they will not make any libdispatch calls, they will not allocate memory, and they will not block on a mutex.

And because of this constraint, suppose the input data for a node is not ready in time. The node has no choice but to, say, drop the data for that particular render cycle, or assume zeros and proceed. Now, let's see where you would use the Engine in the real-time Manual Rendering Mode.

Suppose you have a custom AUAudioUnit that is in a live playback path, and within the internal render block of your audio unit, you would like to process the data that is going through using some other audio unit or audio units. In that case, you can set up the Engine to use those other audio units and process the data in the real-time Manual Rendering Mode.

The second example would be: suppose you wanted to process the audio data that belongs to a movie or video as it is streaming or playing back. Because this happens in real time, you could use the Engine in the real-time Manual Rendering Mode to do that audio processing. Now, let's consider the second use case and see how to set up and use the Engine, both as an example and in code.

So, here's an app that's receiving an input movie stream and playing it back in real time, say to a TV. But what it wants to do is process the audio data as it comes in, before it goes to the output. So, it can use the Engine in the real-time Manual Rendering Mode.

So, it could set up a processing graph like this. It can provide the input through the input node, process it through an effect node, and then pull the data from the output node and then play it back to the device. Now, let's see a code example on how to set up and use the Engine in this mode.

So, here's the code. And note that setting up the Engine itself happens from a non-real-time context, and it's only the rendering part that actually happens from a real-time context. So, here's the setup code, where you first create the Engine. By default, on creation, the Engine will be ready to render to the device, until you switch it over to the Manual Rendering Mode.

So, you create the Engine, make your required connections, and then switch it over to the Manual Rendering Mode. This is the same API that we saw in the demo, except that we are now asking the Engine to operate under the real-time Manual Rendering Mode, and specifying the format for the output and the maximum number of frames.

The next thing you do is fetch and cache something called a render block. Now, because the rendering of the Engine happens from a real-time context, you will not be able to use the renderOffline Objective-C or Swift method that we saw in the demo. And that is because it is not safe to use the Objective-C or Swift runtime from a real-time context. So, instead, the engine itself provides you a render block that you can fetch and cache, and then later use this render block to render the engine from the real-time context.
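
The setup just described might look like this in Swift; everything here still runs from a non-real-time context, and the format and frame count are placeholders:

    import AVFoundation

    let engine = AVAudioEngine()
    // ... attach nodes and make the required connections here ...

    let outputFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 2)!
    try engine.enableManualRenderingMode(.realtime,
                                         format: outputFormat,
                                         maximumFrameCount: 4096)

    // Fetch and cache the render block now; unlike the Objective-C/Swift renderOffline
    // method, this block can later be called safely from the real-time context.
    let renderBlock = engine.manualRenderingBlock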

The next thing to do is to set up your input node so that you can provide your input data to the Engine. Here, you specify the format of the input that you will provide, and this can be a different format than the output. You also provide a block which the Engine will call whenever it needs the input data.

And when this block gets called, the Engine will let you know how many input frames it actually needs. At that point, if you have the data, you'll fill up an input audio buffer list and return it to the engine. But if you don't have data, you can return nil at this point.

Now note that the input node can be used both in the offline and real-time Manual Rendering Mode. But when you're using it in the real-time Manual Rendering Mode, this input block also gets called from a real-time context, which means that you need to take care not to make any kind of blocking calls within this input block.
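
Continuing the sketch, providing input through the input node looks roughly like this; the input format is a placeholder and may differ from the output format:

    let inputFormat = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 1)!

    let didSetInput = engine.inputNode.setManualRenderingInputPCMFormat(inputFormat) { frameCount in
        // Called whenever the engine needs input; in the real-time mode this runs in a
        // real-time context, so don't block or allocate here. Return a filled
        // AudioBufferList with `frameCount` frames, or nil if no data is available yet.
        return nil
    }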

The next part of the setup is to create your output buffer. The difference here is that you will create an AVAudioPCMBuffer and fetch its audio buffer list, which is what you'll use in the real-time render logic. And finally, you'll go ahead and start the Engine. So, now the Engine is all set up and ready to render, and is waiting for the app to pull it for the output data.

Now here comes the actual render logic. And note that this part of the code is written in C++, and that is because, as I mentioned, it's not safe to use the Objective-C or Swift runtime from a real-time context. So, what we're doing first is calling the render block that we cached earlier, asking the Engine to render a certain number of frames and giving it the outputBufferList that we created.

And finally, depending on the status: if you get a success, it means everything went fine and the data was rendered to the output buffer. But you could also get "insufficient data from input node" as a status, which means that when your input block was called by the Engine for input data, you did not have enough data and you returned nil from that input block.

And note that in this case, if you have other sources in your processing graph, for example some player nodes, those nodes could have still rendered their data, so you may still have some output in your output buffer. So, you can check the sizes of your output buffer to determine whether or not it has any data. And of course, you handle the other statuses, which include the error, and that is pretty much the render logic in the real-time Manual Rendering Mode.

Now, lastly, a note on the render calls. In the offline mode, because there are no deadlines or real-time constraints, you can use either the Objective-C or Swift renderOffline method, or you can use the block-based render call in order to render the Engine. But in the real-time Manual Rendering Mode, you must use the block-based render call. So, that brings us to the end of the Manual Rendering Mode. Now let's see the next new mode we have in the Engine, which is the Auto Shutdown Mode.

Now, normally it is the responsibility of the app to pause or stop the Engine when it is not in use in order to conserve power. For example, say we have a music app that is using one of the player nodes for playing back some file, and say the user stops the playback.

Now the app should not only pause or stop the player node, but it should also pause or stop the Engine in order to prevent it from running idle. But in the past, we have seen that not all apps actually do this, and that's especially true on watchOS. And hence, we are now adding a safety net in order to conserve power, with this Auto Shutdown Mode.

When the Engine is operating under this mode, it will continuously monitor itself, and if it detects that the Engine has been running idle for a certain duration, it will go ahead and stop the audio hardware. And later on, if any of the sources become active again, it will start the audio hardware dynamically. All of this happens under the hood. This is the enforced behavior on watchOS, but it can also be optionally enabled on other platforms.
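
On the other platforms, opting in on an existing engine instance is a one-liner:

    // Opt in to auto shutdown so an idle engine stops the audio hardware on its own.
    engine.isAutoShutdownEnabled = true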

Now, next, on to the enhancements in AVAudioPlayerNode. AVAudioPlayerNode is one of the source nodes in the Engine, through which you can schedule a buffer or a file for playback. The existing schedule methods take a completion handler, and they call the completion handler when the data that you have provided has been consumed by the player. We are now adding a new completion handler and new types of callbacks, so that you can know about the various stages of completion.

The first new callback type is the data-consumed type. And this is exactly the same as the existing completion handler. That is, when the completion handler gets called, it means the data has been consumed by the player. So, at that point, if you wanted, you could recycle that buffer, or if you have more data to schedule on the player, you could do that. The second type of callback is the data-rendered callback, and that means that the data you provided has been rendered when the completion handler gets called. This does not account for any downstream signal processing latencies in your processing graph.

The last type is the data-played-back type, which is the most interesting one. It means that when your completion handler gets called, the buffer or the file that you scheduled has actually finished playing from the listener's perspective. This is applicable only when the Engine is rendering to the device, and it accounts for all the signal processing latencies downstream of the player in your processing graph, as well as any latency in the audio playback device.

So, as a code example, let's look at a scheduleFile method through which you can schedule a file for playback. Here, I'm scheduling a file for playback and indicating that I'm interested in knowing when the data has played back, and I'm providing a completion handler. So, when the completion handler gets called, it means that my file has finished playing, and at that point I can, say, notify my UI thread to update the UI, or notify my main thread to go ahead and stop the Engine, if that's applicable.
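
A sketch of that call, with `player` and `file` standing in for an attached player node and an open AVAudioFile:

    player.scheduleFile(file, at: nil, completionCallbackType: .dataPlayedBack) { callbackType in
        // The file has finished playing from the listener's perspective; hop off this
        // thread and, say, update the UI or stop the engine from the main thread.
    }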

So, that brings us to the end of the enhancements we have in AVAudioEngine. At this point, I would also like to mention that we will soon be deprecating the AUGraph API in the Audio Toolbox framework, in 2018, so please move over to using AVAudioEngine instead of AUGraph if you've not already done that.

Now let's go to the second set of APIs in the AVFoundation framework: AVAudioSession. AirPlay 2 is a brand-new technology in this year's iOS, tvOS, and macOS releases. It lets you do multi-room audio with AirPlay 2 capable devices, for example the HomePod. There is a separate dedicated session, "Introducing AirPlay 2," happening this Thursday at 4:10 p.m., that goes over all the features of this technology. So, you can catch that if you're interested in knowing more details.

Also coming with AirPlay 2 is something called long-form audio. This is a category of content, for example music or podcasts, which is typically more than a few minutes long, and whose playback can be shared with others. For example, say you have a party at home, and you are playing back a music playlist through an AirPlay device. That can be categorized as long-form audio content.

Now, with AirPlay 2 and long-form audio, we get a separate shared route for the long-form audio apps to the AirPlay 2 devices, and I'll explain that in a little more detail. And we now have new API in AVAudioSession for an app to identify itself as being long-form and take advantage of this separate shared audio route.

So, let's consider the example I just mentioned. Say you have a party at home, and you're playing back music to an AirPlay device. We'll contrast the current behavior with how the behavior changes with long-form audio routing. So, here is the current behavior. The music is now playing back through the AirPlay device, and suppose you now get a phone call.

What happens is, at this point, your music playback gets interrupted and it stops, and the phone call gets routed to the system audio, which could be the receiver or the built-in speaker. And only when the phone call ends does the music get a resumable interruption, and it resumes the playback. So, as you can see, a phone call interrupting your party music is not really an ideal scenario. So, we'll now see how the behavior changes with long-form audio routing.

So, let's see the same example. Now we have music playing back through an AirPlay 2 capable device, and then a phone call comes in. Because the phone call is not long-form audio, it does not interrupt your music playback, and it gets routed independently to the system audio without any issues. So, with long-form audio routing, the two sessions can coexist without interrupting each other, and as you can see, this is definitely an enhanced user experience.

[ Applause ]

So, to summarize: with long-form audio routing, all the apps that identify themselves as being long-form, for example music, podcast, or any other music streaming app, get a separate shared route to the AirPlay 2 capable device. Now, note that there is a session arbiter in between.

And that ensures that only one of these apps is playing to the AirPlay device at a time. So, these apps cannot mix with each other. And all the other apps that use the system route, which are non-long-form, can either interrupt each other or mix with each other, and they get routed to the system audio without interrupting your long-form audio playback.

Now, let's see how an app can identify itself as being long-form and take advantage of this routing. On iOS and tvOS, the code is really simple: you get the shared instance of your AVAudioSession, and you use this new API to set your category as playback and your route sharing policy as long-form.
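
A sketch of that call on iOS/tvOS; note that this policy was introduced as .longForm in the iOS 11 SDK and is spelled .longFormAudio in current SDKs, so the exact spelling depends on the SDK you build against:

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playback,
                            mode: .default,
                            policy: .longFormAudio,   // .longForm in the iOS 11 era
                            options: [])
    try session.setActive(true)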

Now, moving over to macOS, the routing is very similar to iOS and tvOS. All the long-form audio apps, for example iTunes and any other music streaming app, get routed to the AirPlay 2 capable device, and of course, there is an arbiter in between. And the other system apps, like GarageBand, Safari, or a game app, do not interrupt your long-form audio apps; they always mix with each other and get routed to the default device.

And to enable the support for long-form audio routing on macOS, we are now bringing a very small subset of AVAudioSession to macOS. So, as an app, in order to identify yourself as being long-form, you again get the shared instance of your AVAudioSession and set the route sharing policy as long-form. So, that is the end of the AVAudioSession enhancements, and let's now see the last section in the AVFoundation framework, that is, the enhancements on watchOS.

So, we made the AVAudioPlayer API available in the watchOS 3.1 SDK, and this is the first time we get to mention it at WWDC. The nice thing about using AVAudioPlayer for playback is that it comes associated with its AVAudioSession, so you can use the session category options, like duck others or mix with others, to describe your app's behavior. Now, starting with watchOS 4, we are exposing more APIs in order to do recording. That is, we are making AVAudioRecorder, and the AVAudioInputNode in AVAudioEngine, available.

And with these comes the AVAudioSession recording permission, through which an app can request the user's permission to record. Prior to this, you could use the WatchKit framework to do the recording, using the Apple UI. But now, with these APIs, you can do the recording with your own UI.
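
Requesting that permission is roughly:

    import AVFoundation

    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        if granted {
            // safe to start recording with AVAudioRecorder or the engine's input node
        }
    }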

With AVAudioRecorder, you can record to a file, or if you want to get access to the microphone input directly, you can use the AVAudioInputNode, and also optionally write it to a file. And here are the formats that are supported on watchOS, both for playback and recording.

A last note on the recording policies. The recording can start only when the app is in the foreground, but it is allowed to continue recording in the background, and the red microphone icon will be displayed at the top so that the user is aware of it. Recording in the background is CPU limited, similar to workout sessions, and you can refer to this URL for more details. Now, let's move over to the Audio Toolbox world and look at the enhancements in AUAudioUnit and audio formats. We have two main enhancements in AUAudioUnit, and at the end of this section, we will also show you a demo with those two new features in action.

Now, Audio Unit host applications choose various strategies to decide how to display the UI for an AU. They can decide to, say, embed the AU's UI in their own UI, or they could present a full-screen, separate UI for the AU. Now, this presents a challenge, mainly on iOS devices, because currently the view sizes are not defined, and the audio unit is expected to adapt to any UI size that the host has chosen. In order to overcome this limitation, we're now adding a way in which the host and the AU can negotiate with each other, and the AU can inform the host about all the view configurations that it actually supports. Now, let's see how this negotiation can take place.

The host first compiles a list of all the available view configurations for the AU, and then hands that list over to the AU. The AU can then iterate through all these available configurations and let the host know about the configurations that it actually supports. Then, the host can choose one of the supported configurations, and it will let the AU know about the final selected configuration. Now, let's see a code example of how this negotiation takes place. We'll first look at the audio unit extension side.

The first thing the AU has to do is override the supported view configurations method from the base class. This is called by the host with the list of all the available configurations. Then, the AU can iterate through each of these configurations and decide which ones it actually supports. Now, the configuration itself contains a width and a height, which recommend the view size, and it also has a host-has-controller flag. That flag indicates whether or not the host is presenting its own controller in this particular view configuration.

So, depending on all these factors, an AU can choose whether it supports that particular configuration. Note that there is a wildcard configuration, which is 0x0, and that represents a full, default size that the AU can support. On macOS, this actually translates to a separate, full-size, resizable window for the AU's UI.

So, the AU has its own logic to decide which configurations it supports, and then finally, it compiles a list of the indices corresponding to the ones that it supports and returns this index set back to the host. The last thing that the AU has to do is override the select method, which is called by the host with the configuration that it has finally selected, and then the AU can let its view controller know about the final selected configuration. Now, let's go to the host side and see how the code looks.
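
A sketch of the AU side of this negotiation; the "which sizes do I support" rule here is purely illustrative:

    import Foundation
    import AudioToolbox
    import CoreAudioKit

    class MyAudioUnit: AUAudioUnit {
        override func supportedViewConfigurations(
            _ availableViewConfigurations: [AUAudioUnitViewConfiguration]) -> IndexSet {
            var supported = IndexSet()
            for (index, config) in availableViewConfigurations.enumerated() {
                // 0x0 is the wildcard "full default size" configuration; also accept anything
                // at least 200 points tall when the host shows its own controller.
                if (config.width == 0 && config.height == 0) ||
                   (config.hostHasController && config.height >= 200) {
                    supported.insert(index)
                }
            }
            return supported
        }

        override func select(_ viewConfiguration: AUAudioUnitViewConfiguration) {
            // Tell the AU's view controller to switch to the configuration the host picked.
        }
    }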

The host has to compile the list of available configurations, and in this example, it is saying that it has a large and a small configuration available. In the large configuration, the host is saying it's not presenting its controller, so the host-has-controller flag is false. And in the small configuration, the host does present its controller, so the flag is true.

The host then calls the supported view configurations method on the AU and provides this list of configurations. And depending on the returned set of indices, it goes ahead and selects one of the configurations. In this particular example, the host is just toggling between the large and the small configurations.
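
And a sketch of the host side, with placeholder sizes:

    import AudioToolbox
    import CoreAudioKit

    func negotiateViewConfiguration(for audioUnit: AUAudioUnit) {
        let large = AUAudioUnitViewConfiguration(width: 800, height: 600, hostHasController: false)
        let small = AUAudioUnitViewConfiguration(width: 400, height: 200, hostHasController: true)
        let available = [large, small]

        // Ask the AU which of the offered configurations it supports...
        let supported = audioUnit.supportedViewConfigurations(available)

        // ...then pick one (here simply the first supported one) and tell the AU.
        if let index = supported.first {
            audioUnit.select(available[index])
        }
    }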

So, that is the end of the preferred view configuration negotiation. Now, let's see the second main new feature we have, which is the support for MIDI output in an audio unit extension. We now have support for an AU to emit MIDI output synchronized with its audio output. This is mainly useful if the host wants to record and edit both the MIDI performance as well as the audio output from the AU. So, the host installs a MIDI output event block on the AU, and the AU should call this block every render cycle and provide the MIDI output for that particular render cycle.
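
From the host's side, installing that block looks roughly like this:

    import AudioToolbox

    func recordMIDIOutput(from audioUnit: AUAudioUnit) {
        audioUnit.midiOutputEventBlock = { sampleTime, cable, length, midiBytes in
            // Called from the render context once per render cycle, with MIDI that is
            // sample-accurately aligned to the AU's audio output. Don't block or allocate
            // here; hand the bytes off to a lock-free recording path instead.
            return 0    // 0 == noErr
        }
    }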

We also have a couple of other enhancements in the Audio Toolbox framework. The first one is related to a privacy enhancement: starting with the iOS 11 SDK, all the audio unit extension host apps will need the inter-app-audio entitlement to be able to communicate with the audio unit extensions. And we also have a new API for an AU to publish a meaningful short name, so that the host can use this short name if it has to display a list of AU names in a space-constrained list. So, that brings us to the end of all the enhancements in the Audio Toolbox framework, and as promised, we have a demo to show these new features in action. And I call upon Bela for that.

[ Applause ]

Thank you, Akshatha and good afternoon everyone. My name is Bela Balazs and I am an engineer on the Core Audio Team. Today, we would like to show you an application of our newly introduced APIs. For this purpose, we have developed an example audio unit, which has the following capabilities.

It negotiates its preferred view configuration with the Audio Unit host application, it supports multiple view configurations, and it uses the newly introduced MIDI output API in order to pass on MIDI data to the Audio Unit host application for recording purposes. So, here I have an upcoming version of GarageBand, and I have loaded my example audio unit onto a track.

Here you can see the custom view of my audio unit, together with the GarageBand keyboard. In this view configuration, I rely on the GarageBand keyboard to play my instrument. I have mapped out three drum samples on the keyboard: I have a kick, I have a snare, and I have a hi-hat. In addition to these, on the view of my audio unit, I also have a volume slider to control the volume of these samples.

However, my audio unit also has a different view configuration, and I can switch to it using this newly added button in the lower-right section of the screen. When I activate that button, I get taken to the large view of my audio unit, and the GarageBand keyboard disappears.

When I activate it again, I get taken back to the small view of my audio unit. This is made possible by GarageBand's publishing all the available view configurations to my audio unit, and my audio unit goes through that list and marks each of them as supported or unsupported, and at the end of this process, GarageBand knows that my audio unit supports two view configurations and it can toggle between them. In case my audio unit only supported one view configuration, then this button could be hidden by GarageBand, but my audio unit could still take full advantage of the negotiation process to negotiate the preferred view configuration for that one view.

In this small view, the host-has-controller flag is set to true, and that is why the GarageBand keyboard is visible. In the larger view configuration, the GarageBand keyboard is hidden, because that flag is set to false. In this view configuration, my audio unit has its own playing surface, which I can use to play my instrument. I have a kick, a snare, and a hi-hat.

And in addition to these three buttons, I also have a new button on the right-hand side called Repeat Note. This allows me to repeat each sample at a certain rate, and I can set those rates independently from each other using the sliders. And I can toggle each sample in and out of the drum loop.

[ Drums playing ]

This allows me to easily construct drum loops that respect the tempo of my track. So, let's use the MIDI output API to record the output of this audio unit extension. I have the synchronized rates button here, which sets my rates to 110 BPM. First, I will record a kick and snare drum loop, and then, when the recording wraps around, I will add my hi-hats. This is made possible by GarageBand's merge recording feature. So, let's do just that.

[ Drums playing ]

I will just record four bars of that, and then add my hi-hats.

[ Drums playing ]

My hi-hats have been added to the recording. And now we can go to the track view and take a look at our recorded MIDI output. I can quantize the track, and then we can play it back. And we have the full MIDI editing capabilities of GarageBand at our disposal to construct our drum track. And this concludes my demo. Thank you very much for your attention. And I would like to hand it back to my colleague, Akshatha. Thank you.

[ Applause ]

Thank you, Bela. So, now, onto the last set of enhancements in the Audio Toolbox framework, related to the audio formats. We now have support for two of the popular formats, namely the FLAC and the Opus format. On the FLAC side, we have the codec, file, and the streaming support, and for Opus, we have the codec, and the file I/O support using the code audio format container.

From audio formats to spatial audio formats: those of you who are interested in spatial audio, AR, and VR applications may be happy to know that we now support ambisonics. For those of you who may not be really familiar with ambisonics, like me, ambisonics is also a multichannel format, but the difference is that the traditional surround formats that we know of, for example 5.1 or 7.1, have signals that actually represent the speaker layout, whereas ambisonics provides a speaker-independent representation of the sound field. So, it is by nature decoupled from the playback system, and it is at the time of rendering that it can be decoded to the listener's speaker setup. And this provides more flexibility for the content producers.

We now support first-order ambisonics, which is called the B-format, and higher-order ambisonics, where the order N can range from 1 through 254. And depending on the order, the ambisonic channel number can go from zero to 65,024. We support two of the popular normalization schemes, namely SN3D and N3D, and we support decoding ambisonics to any arbitrary speaker layout, and conversion between the B-format and these normalization schemes.

The last enhancement is on the AUSpatialMixer side. This is an Apple built-in spatial mixer, which is used for 3D audio spatialization. The AVAudioEnvironmentNode, which is a node in AVAudioEngine, also uses the Spatial Mixer underneath. And we now have a new rendering algorithm in this Spatial Mixer, called HRTF HQ, for high quality.
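
A sketch of opting a spatialized source into the new algorithm; the mono format and the position are placeholders:

    import AVFoundation

    let engine = AVAudioEngine()
    let environment = AVAudioEnvironmentNode()
    let source = AVAudioPlayerNode()
    engine.attach(environment)
    engine.attach(source)

    engine.connect(environment, to: engine.mainMixerNode, format: nil)

    // Spatialized sources are typically mono.
    let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1)!
    engine.connect(source, to: environment, format: monoFormat)

    source.renderingAlgorithm = .HRTFHQ          // the new high-quality HRTF algorithm
    source.position = AVAudio3DPoint(x: 1, y: 0, z: -2)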

And this differs from the existing HRTF algorithm in the sense that it has a better frequency response and better localization of sources in the 3D space. So, that concludes all the enhancements in the Audio Toolbox framework, and now I hand it over to Torrey to take it away from here and give you an update on inter-device audio mode.

[ Applause ]

Thank you, Akshatha. I am Torrey Holbrook Walker, and I'm going to take you home today with inter-device audio mode, or if you want to be cool, you can just say IDAM for short. And you remember IDAM. You take your iOS device. You plug it into your Mac.

You open up Audio MIDI Setup, and it shows right up there in the audio device window; there's a button next to it that says Enable. And if you click it, boom, you've immediately got the capability to record audio digitally over the USB Lightning cable that came with the device, and it looks just like a USB audio input to the Mac host.

So, it uses the same low-latency driver that's used on macOS for class-compliant audio devices. And you've been able to do this since El Capitan and iOS 9. Well, today we would like to wave a fond farewell to IDAM. So, wave goodbye, IDAM. Goodbye, IDAM. And while you're waving, say hello to IDAM: Inter-Device Audio and MIDI.

So, this year, we are adding MIDI to the IDAM configuration, and that will allow you to send and receive your musical instrument data to and from your iOS device using the same cable that came with the device. It's class-compliant once again, so on the iOS side, you will see a MIDI source and destination representing the Mac. On the Mac, you will see a source and destination representing your iOS device.

Now, this will require iOS 11, but you can do it as far back as macOS El Capitan or later, because it's a class-compliant implementation. And you don't have to do anything special to get MIDI; you're going to get it automatically anytime you enter the IDAM configuration by clicking Enable. Do you need to do anything to your app to support that? No. It will just work if it works with MIDI.

So, while you're in the IDAM configuration, your device will be able to charge and sync, but you will temporarily lose the ability to photo import and tether. You can get that back by clicking the Disable button or hot-plugging the device on your Mac. The audio input side of this can be aggregated, so if you've got multiple iOS devices, like I do, say your iPhone and your iPad and your kid's iPad, you could enable the IDAM configuration on all three of these and aggregate them into a single, six-channel audio input device that your digital audio workstation can see.

And because the MIDI communication is bidirectional, you could, for example, send MIDI to a synthesizer application and record the audio back from it. Or you could just design a MIDI controller application for an iPad, that magical piece of glass, and you could use that to control your [inaudible]. But talk is cheap, and demos pay the bills. So, let's see this in action.

So, before I actually bring up my demo machine here, I want to show you the application that I'm going to use. It is called Fugue Machine. So, I've got Fugue Machine open here. Fugue Machine is a multi-playhead MIDI sequencer, which means that you can take one MIDI sequence and use different playheads, perhaps moving at different rates and in different directions, to create a complex arpeggio using phasing and timing relationships. So, I'm just going to play this pattern here. And there are a lot of playheads; I'll just stop some of them. So, this is just one. I'll add another. Add another. As you see, we can create arpeggios very easily this way.

So, there are other patterns that I could use. For example, this one's called "Dotted." This one's "Triplet." But we'll stick with this one, and we're going to use it to control a project that we're working on in Logic. So, now, I'll move over to my demo machine. I'm going to click Enable here.

And I'll see it come up as a USB audio input, and if I look at the MIDI Studio window, I'll also see that it shows up here as a MIDI source and destination that I can use in Logic. So, I'll launch a project that I've been working on here. Now, this is a short, four-bar loop that I'm working on for a game scoring screen. So, after this video game level is completed, the player can look at their results and they will be listening to this loop. And the loop right now, before I've added anything to it, sounds like this.

[ Music ]

Now, I want to add the arpeggio part over this. So, what I'm going to do is just double-click here to add another track. I'm going to choose an arpeggio sound, maybe something like a square. There we go, I'll do percussive squares here. And in the channel strip, you can actually see an arpeggiator. I'm not going to need that, because I'm going to play this with Fugue Machine. So, if I record-enable this and arm my sequence here, I'll be able to hear Fugue Machine play the soft synth here in Logic. So, I'll solo that.

This is all four playheads moving at the same time. I could turn them off. I could just have one playhead if I wanted to. Or as many as all four. So, I'm going to record this into my track, and we'll see what that sounds like in context. Oops, sorry about that. I have to record arm here and play.

[ Music ]

Okay, so I've recorded my automation here, and I can use this automation and play back from the iPad here. So, if I listen to this in context, it sounds like this. So, now I've got a MIDI start command going to Fugue Machine, Fugue Machine's playing our soft synth here, and I've got some automation here from the recording. And that concludes my demo of MIDI over the IDAM configuration. Let's head back to the slides.

[ Applause ]

Okay, we've talked about a lot of things today. We've talked about enhancements to AVAudioEngine, including Manual Rendering, which you can now do offline or in real time. There's AirPlay 2 support; there'll be an entirely separate session on AirPlay 2 later in the conference, so please make sure to check that out if you're interested.

On watchOS 4, you can now record; we've talked about the capabilities and the limitations and policies regarding that. For AUAudioUnits, you can now negotiate your view configurations, and you can also synchronize your MIDI output with your audio output for your AU. We've talked about some other audio enhancements, including new supported formats, ambisonics, and head-related transfer functions, and we wrapped up by talking about IDAM, which now stands for Inter-Device Audio and MIDI. The central URL for information regarding this particular talk is here. And if you're interested in audio, you may also be interested in these related sessions later in the week. We thank you very much for your time and attention, and have a fantastic conference.

[ Applause ]