Graphics, Media, and Games • iOS, OS X • 58:22
HTTP Live Streaming lets you deliver live and on-demand multimedia content using a standard HTTP server. Gain a practical understanding of recent enhancements and how they affect best practices for delivering video into your application or on the web.
Speaker: Roger Pantos
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Thank you. That helps. So I'm here to update you on what's been going on with HTTP live streaming. I'm Roger Pantos. I'm a Core Media engineer. I was also responsible for writing the HTTP Live Streaming draft, so you may have read some of my prose. What I'm going to do today is I'm going to talk about three different things.
Some of you may be new to the idea of HTTP live streaming, and so I'm going to take a really short period of time to explain how that technology works. Next, I'm going to discuss some of the new features we're adding to HTTP live streaming in iOS 5.
And finally, I'm going to spend some time discussing some techniques that you can use in your streams and in your applications in order to perform well on our platforms. So, HTTP Live Streaming. Essentially, it's a way that you can use standard HTTP servers and the default web infrastructure, CDNs like Akamai and things like that, to host live and on-demand audio and video content to iOS devices and now to other devices as well.
How does it work? Well, here's the really short version. What you do is you take your media presentation and you divide it up into these little segments, maybe 10 seconds each. You make a URL for each of the segments and you make a playlist file. And you put the URL to each segment into the playlist file and then toss all of that up onto an HTTP server.
For on-demand content, you're done. If a client wants to play that, what he does is he downloads the playlist file, and then he downloads and plays each media URL in sequence. That's all you need to do for on-demand content. If it's live, then what's happening is content is being added to the playlist file as it's being generated, and the client will typically reload the playlist file about every 10 seconds to discover those new segments.
Graphically, it looks like this. We've got our segment files, we've got our playlist. We create URLs for each of the segment files, stick them into the playlist. That's it if it's on-demand. If it's live, then periodically we're creating new segments and adding them to the end of the playlist file.
What do these playlist files look like? Well, we use the M3U playlist file format. It's an extremely simple format. It's basically just a text file with a list of URLs and some informational tags attached to them. So in this example here for an on-demand playlist, what we have at the top is some tags which refer to some general aspects of the playlist, such as its version and the target duration of the media.
The main body of the playlist is this list of URLs that have tags attached to them. And in this example, you can see that the URLs have EXTINF tags, and what these do is they supply the duration of each of the segments. And so in this case, fileSequenceA has a duration of 9.7 seconds, fileSequenceB is 9.2 seconds, etc.
At the end of this playlist file is an ENDLIST tag, and that just indicates that it's an on-demand playlist. We don't expect any new segments to unexpectedly appear at the end of it. So you could keep refreshing it forever and this would be all you would see. So this covers the simple case of presenting content at a single bit rate. But there are some refinements that make HTTP streaming work particularly well. The first is that you can actually encode your content at multiple bit rates. So multiple qualities for different network speeds.
And when you do that, you put each encoding quality into its own playlist file. And when you ask us to play your presentation, you present all those playlist files to us at once. When you do it that way, our client will adaptively choose the best bit rate to match the connection speed of the network the user is currently using. This makes it work well across a wide range of connected devices with all kinds of different throughput. And it's particularly important in the mobile space, where the available bandwidth varies tremendously.
So doing it is actually pretty straightforward. As I said, you've got these playlist files, each at a different bit rate. You take each playlist file and you put it into a master playlist or a variant playlist. And it's the same playlist format. It's M3U. The difference is in this case, the URLs actually refer to those individual playlists.
And the tags are stream info tags that describe the streams in the playlist files. So for example, what we have here is the first URL is the playlist for the stream that's encoded at 150 kilobits per second. The second example is the same content that's been encoded at a higher quality of 640 kilobits per second. So you produce this playlist file. If you hand it to us to play, we'll play your content adaptively.
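To make that concrete, a master playlist along those lines might look roughly like this (paths and bandwidths are illustrative):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=150000
low/prog_index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=640000
mid/prog_index.m3u8
```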
The next variant, or the next refinement I'd like to discuss is that your content doesn't have to be a single uninterrupted movie. It can be a little bit of this, and then a little bit of that, and then back to your movie. And this is often used for ad-supported content. We sometimes also see it on program boundaries. So putting discontinuities into your playlist is simple in HTTP Live Streaming. All you need to do is find the first media segment of the new content and stick a discontinuity tag on it.
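In playlist form, that might look something like this (segment names are made up):

```
#EXTINF:10.0,
show_seg_41.ts
#EXT-X-DISCONTINUITY
#EXTINF:10.0,
ad_seg_1.ts
```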
Finally, you can encrypt your content. And you would do this to control who has access to it. To do so, you encrypt the content first of all, and then you tag the encrypted media segments in the playlist with a key tag. And the key tag contains a URL that allows us, the player, to load the key and decrypt the content if we're authorized.
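For reference, a sketch of how the key tag sits in a media playlist (the key URL and segment names are placeholders):

```
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/keys/key1"
#EXTINF:9.7,
encrypted_seg_1.ts
#EXTINF:9.2,
encrypted_seg_2.ts
```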
So multiple encodings at different qualities, discontinuities, encrypted media. Those are the major features of HTTP Live Streaming that make it work really well. When it comes time to actually play the streams, we have a choice of APIs for you. At the highest level, you can simply stick the URL to your master playlist into an HTML5 video element and let the system handle the rest of the presentation and control of playback, whether in Safari or in a UIWebView.
Programmatically, you can also use the MPMoviePlayerController object from the MediaPlayer framework. And this is really convenient because it will handle putting up a UI for you. It will handle all the user interaction. It's a really easy way to have your application play a live stream. Finally, you can use the AVPlayerItem object in the AVFoundation framework. And you would do this to get the highest degree of control possible when playing your stream.
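For the simplest programmatic case, a minimal sketch of starting playback with AVFoundation (shown here in Swift; the URL is a placeholder):

```swift
import AVFoundation

// Minimal sketch: hand AVFoundation the master playlist URL and play.
let url = URL(string: "https://example.com/event/master.m3u8")!
let player = AVPlayer(url: url)
player.play()
// To display video, attach an AVPlayerLayer backed by this player to a view.
```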
What you can do with HTTP streaming, how it looks on the server, how you play it on the client, that's it in a nutshell. So now, kind of the fun part, we get to talk about some of the new features that we've added to HTTP live streaming since last year's conference. And the first thing I'd like to discuss is alternative audio.
So what is that? Well, it's not uncommon for a video presentation to have a choice of different audio that can be listened to along with it. Some examples of this are you might have a movie that has been dubbed in several different languages, or you might have the audio from a director's commentary. If you're watching a live baseball game, it's not uncommon for there to be two audio broadcasts accompanying the same video, one that's directed at the home team audience and one that is directed at the away team.
So alternative audio is pretty useful. You can add some compelling value to your applications. But for a network playback stack, there are some challenges with doing alternative audio. And it starts with how you approach it. There are a few different ways. One approach you can take is you can simply take all the different audio choices, all the languages, all the different variants, and you can put them into the same media file and download the media file and just play what you want out of it.
The problem with doing it this way is that now you're downloading all the languages and all the audio you're never going to listen to, but you're still paying for it with your precious, precious network bandwidth. And so it's not efficient. Another approach you can take, and in fact you can do this in iOS 4, is you can simply take your presentation.
Let's say it's authored in English. And you can duplicate the entire thing, replacing the English audio with the Spanish audio or the Italian audio or what have you. This has its own problem. It's expensive from a storage point of view. And to illustrate that, I'd like to show you an example.
So what I'm showing you here is the baseline cost of storing an HTTP live stream. When you encode your stream, we recommend that you start with an audio-only encoding and that you additionally have several video encodings at different bit rates, so that clients in different network conditions can still get a good experience.
When you follow our recommended encoding levels, this ends up being about 1.6 gigabytes of data for every hour of content. So it's a fair bit. But now imagine that what you'd like to do is take this movie, let's say, and you'd like to offer it in Spanish and a few other languages as well. If you were to simply take the whole thing and duplicate it, well, you've just multiplied it by four.
And so now you've got six and a half gigabytes of storage for every hour of content. It's a lot of storage. And more to the point, if you happen to be paying a CDN like Akamai on a per megabyte basis to store this content on their origin server, it gets expensive pretty quickly as well. And it's not like Akamai loves it.
Because remember, what they've got to do is they've got to push all this content out to their edge servers. And in this case, what can happen is that they can pull a bunch of video content that has English audio embedded into it out to an edge. And then the next person that connects wants to see the Spanish content. And so now they've got to eject the English content.
They've got to pull in the Spanish content even though 90% of it is actually duplicate video. So it really doesn't work that well. And we decided to address it in iOS 5. How? What we do is we allow you to specify that the audio and video be pulled from separate locations on your web server.
And we've taught the player to download both the audio and video streams in parallel. What this allows you to do is offer multiple audio renditions of your content simultaneously with minimal storage overhead. And as an additional bonus, it allows us to let you change the audio on the fly.
Thank you. So what does that do to our storage picture? Well, let's go back. Rather than having this kind of thing where you've got six and a half gigabytes, you're duplicating all your video. All you need to do is create an audio-only playlist, and this is compatible with your existing audio-only playlists, for each of the different languages.
These guys are all really small. And then have them accompany video-only playlists for all of your different variants. What we have here now is all four languages, audio-only, plus all six video variants, and the storage cost is actually less than it would be for English-only under iOS 4, because we're no longer duplicating the audio in each video segment.
This is a much better approach from the point of view of storage cost. So if you'd like to set up your streams this way, there's a little bit more you need to tell us. You need to tell us principally where the audio is. And to do that, we have defined a new tag in the master variant playlist. It's the media tag. And what the media tag does is essentially it tells us that here's a place where we can find some optional media.
and it allows you to group the media. And so in this case, both of these media tags have the same group ID. This indicates that they're part of the same group. In this case, it's an audio group. And the first member of the group is the English audio. The second member is the Spanish audio.
Now where you use this is down below in the stream info tag. The stream info tag gets a new attribute, the audio attribute. It's set to the value of the group. And what this indicates to us is although we may be pulling the video from the main URL down there, we're going to pull the audio from one or another member of the group. Of course, your video may still be encoded at multiple qualities. And if it is, you simply have more stream info tags and you use the same audio attribute to indicate that they're all pulling audio from the same group.
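Putting those pieces together, a master playlist with an audio group might look roughly like this (names, paths, bandwidths, and codec strings are illustrative):

```
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",LANGUAGE="en",DEFAULT=YES,URI="audio/english.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="Spanish",LANGUAGE="es",DEFAULT=NO,URI="audio/spanish.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=640000,CODECS="avc1.42e01e,mp4a.40.2",AUDIO="aud"
video/mid.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1280000,CODECS="avc1.4d401e,mp4a.40.2",AUDIO="aud"
video/high.m3u8
```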
So, the next step is actually choosing which audio you'd like to listen to. And to do that, we have a new API in iOS 5 called the Media Selection API. What this API allows you to do is to look at an AVAsset and discover that you actually have optional media in it, and to select the one you'd like to use for playback. And the way it works is you start by getting a reference to your AVAsset, and then you load the availableMediaCharacteristicsWithMediaSelectionOptions property. Or the AMCWMSO, as we like to call it.
It's a long name because it has a bunch of stuff in it. It doesn't just have all the optional audio, but it can have optional video or a bunch of other different things as well. The next step is to call one of the selection APIs that says, out of the universe of media options you've got, filter out what's audible. Filter out my audio options. That produces a selection group.
And then the next stage is to walk through the entries in the selection group and choose which one you'd like to use. And you can do this by inspecting each one, looking at its language tag or its name, throwing up a UI, different things. Or you can just be lazy like me and you can just pick the first one, object at index zero.
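A rough sketch of that flow in Swift, carrying on through the player item step described next (the stream URL is a placeholder, and the code just grabs the first audible option):

```swift
import AVFoundation

// A sketch of the media selection flow; the stream URL is a placeholder.
let asset = AVURLAsset(url: URL(string: "https://example.com/event/master.m3u8")!)
var player: AVPlayer?

asset.loadValuesAsynchronously(forKeys: ["availableMediaCharacteristicsWithMediaSelectionOptions"]) {
    // Filter the universe of media options down to the audible ones.
    guard let group = asset.mediaSelectionGroup(forMediaCharacteristic: .audible) else { return }

    // Inspect each option's name or language, put up a UI, or just take the first.
    let option = group.options.first

    DispatchQueue.main.async {
        let item = AVPlayerItem(asset: asset)
        item.select(option, in: group)   // can also be called again later, mid-playback
        player = AVPlayer(playerItem: item)
        player?.play()
    }
}
```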
So you get a media option, you get one of the optional audio selections. And then you take your asset and you create an AVPlayerItem with it. And then you tell the player item to select the media option you've chosen in the media selection group. And as I described earlier, you can either do this once prior to starting playback of the player item, or you can do it at runtime while it's playing, and that will cause us to switch the audio. And I'd like to show you that now. And so to do that, I'd like to bring up my colleague and co-author, Bill May. Thank you.
Okay, so what Bill's gonna do is we have a little demo app we wrote called Foreign Cinema, and so Bill's gonna launch that application. And the first thing he's going to do is load a movie. So let's choose the Phil and John show. I love that one. So here's our buddy, Jony Ive.
And what we're going to do next is we're going to tap the language button to discover what kind of audio we have in this movie. And what we've done here is we've gotten a list of media selection options, and we've iterated through them, and we've thrown up a UI with the name that's embedded in each media tag. And so to start with, Bill's going to pick English.
And now we're going to play that. And now we have Johnny speaking English. He probably does a better job than I do. So let's handicap him. Let's push him over to French. In fact, let's listen to the director's commentary. What the heck? Avoid getting moisture into openings. Don't use window cleaners, household cleaners, aerosol sprays, solvents. All right. That's riveting, Bill. Thanks very much. So that was alternative audio.
The next feature I'd like to discuss is byte range support for segments. Now traditionally, HTTP live streaming, you have your playlist file and you have a bunch of URLs indicating media segments. And traditionally, each media segment has been the entire content of a URL. And what this means is that if you have 700 media segments in your movie, you actually have 700 little files up on your web server.
In iOS 5, we're adding the ability to specify a media segment as a sub-range of a larger URL. And what this allows you to do is to consolidate your media segments into larger files or a single large file. The primary benefit of this is that when a client is playing your media, rather than sort of downloading a segment from over here, then downloading the second file, then downloading this one, then downloading this one, it's actually walking through a larger file in sequence.
And this allows servers, like the proxy caching servers that Akamai runs, to get a much better idea of what they need to prefetch in order to ensure that the next segment you're going to want is in the cache at the time that you want it. As a kind of a side bonus, it also means that you end up with far fewer files to manage. If you have many video variants and a long movie, you could have thousands of individual segment files; using byte range support, you only get a few.
Graphically, it looks like this, old way: you have individual segment files, each one has a URL, and it's in a playlist. Using byte range support, we can consolidate all the segments into a single file. I'm calling it media.ts. And you still have playlist entries, but now the playlist entries, in addition to the URL, have the byte range specifier that says it's actually this sub-range of the larger file. Now, there is one caveat I'd like to mention with byte ranges, and it has to do with live content. And it's this. When your application is watching a live stream, it's making media file requests over the big wide internet. And it may be making them through a number of different servers. Caching servers like Akamai, corporate proxy servers, what have you.
And all of these servers do a great job at caching entire resource requests. And some of them do a pretty good job at caching byte ranges. And some of them kind of have a sense of how to handle files that grow dynamically over time. But when you put all this stuff together, you're kind of asking for trouble. It becomes somewhat unreliable to make these requests over the internet, because even if your web server can handle it, your intermediary caching servers may not. And for that reason, we recommend that the files your byte range segments point into be static.
And of course, they're always static for on-demand content. We recommend that they be static for live as well. So it doesn't matter if they're really small or really large, just as long as you don't append to them after you start playing. So that being said, here's how you specify byte range media segments.
We have a new tag. We call it the byte range tag. It specifies the length of the range, and it may also specify the offset of the range. If it doesn't specify the offset, the byte range just follows immediately from the previous byte range, in which case the offset can be omitted.
Here's an example of a playlist file. On the left, we have the old way where we've got three segments. Each one is its own URL. It's a relative URL in this case. On the right, we have the new way. We still have three segments, but now they're ranges into a single URL, media.ts, and we have the range tag that specifies the byte ranges that they're actually occupying.
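In playlist form, the new way might look roughly like this (byte counts are illustrative; byte ranges require playlist version 4):

```
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-TARGETDURATION:10
#EXTINF:9.7,
#EXT-X-BYTERANGE:752000@0
media.ts
#EXTINF:9.2,
#EXT-X-BYTERANGE:731000
media.ts
#EXTINF:9.5,
#EXT-X-BYTERANGE:748000
media.ts
#EXT-X-ENDLIST
```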
So it's a pretty easy syntax, and we actually found it to be quite handy for the next feature I'd like to talk about, and that is Fast Forward Rewind. What we wanted to do with Fast Forward Rewind is enable the use case where a user is looking for a scene in your presentation, your movie, your TV show, what have you. They know it's in there somewhere.
They don't quite remember where in the movie it is, and so they'd like to scan rapidly backward and forward through the movie looking for the scene they'd like to play. We wanted to support the most popular scanning speeds, so 16 times, 32 times, 64 times, and the reverse equivalents of those. And of course, being HTTP live streaming, we wanted it to work with live content as well.
But again, being as we're a network player, there are some challenges with this, and the foremost challenge being, let's say I want to play your content at 32 times real time. There's no way that I can download all of your content 32 times faster than I'd like to play it normally. You might, with a fast network, get four or five times. There's no way you're going to get 32.
Fortunately, it isn't actually necessary to download all the content. For instance, if you're watching all the content at 32 times, that's over 1,000 frames a second. And believe me, it's just a blur. It's not useful. What you really want to do is display a carefully chosen selection of frames at about 5 to 10 frames a second. And perceptually, this allows users to orient where they are in the stream and to stop when they want to.
So the solution we have for Fast Forward Rewind does exactly that. It's called I-frame-only playback. Now, I'd like to spend a little bit of time describing I-frame-only playback, and to do that, I have to make a brief detour into the mysteries of video encoding. So I hope you'll bear with me for a moment.
When video is encoded, what you're doing is you're taking individual frames and you're turning them into compressed frames. And every compressed frame gets a frame type: they're I, P, or B. Now, the only thing you really need to take away from this is that P and B frames, although they comprise most of the frames in the movie, are a pain in the butt to deal with. And the reason for this is that to decode a P or B frame, you also have to have a bunch of earlier frames from previously in the movie.
I-frames, on the other hand, don't have this problem. They are independent. They stand alone. So in order to turn an I-frame into a picture, all you have to do is load the part of the file that contains the I-frame, its byte range, and then run that through the decoder and you get a picture.
One of the advantages of this approach is that your content already has I-frames. And so you don't need to produce special purpose content for supporting fast forward and rewind. All you need to do is tell us where your I-frames are. And so to do that, we have a new kind of playlist, and it's an I-frame-only playlist. Now -- Mom, is that you? So an I-frame-only playlist is actually almost identical to a regular playlist.
Its job is to tell us where the I-frames are. And so what we have here is an example where we have a couple of I-frames in segment1.ts and a few more in segment2. And so we have our familiar pattern here where we have a URL that has a bunch of tags on it. In the case of this one, the tags are simply saying that I've got an I-frame in segment1.ts at this byte range.
The only difference between I-frame-only playlists and regular playlists is that I-frames do not have an intrinsic duration. They're just sort of an instant in time. And so in an I-frame-only playlist, the EXTINF duration actually refers to what we call the span of the I-frame. Or in other words, the length of time between that current I-frame and the next I-frame appearing in the playlist file. And so what this is saying here is that segment one has two I-frames. The first is at the beginning. The next is 4.12 seconds later. The third, which is in segment two, comes 3.56 seconds after that, and so on and so forth.
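An I-frame-only playlist along those lines might look roughly like this (byte offsets and lengths are illustrative):

```
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-TARGETDURATION:10
#EXT-X-I-FRAMES-ONLY
#EXTINF:4.12,
#EXT-X-BYTERANGE:9800@0
segment1.ts
#EXTINF:3.56,
#EXT-X-BYTERANGE:10200@376000
segment1.ts
#EXTINF:4.02,
#EXT-X-BYTERANGE:9950@0
segment2.ts
#EXT-X-ENDLIST
```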
So when you do this and when you tell us where your I-frames are in the content and you ask us to play at 32X, what we're going to do is we're going to go through the I-frame playlist. We're going to pull out all the I-frames. We're going to play them as fast as we can.
How fast is that? Well, it depends. If it's relatively low bitrate content, if it's 200, 300 kilobits and you've got an Ethernet connection, you've got Wi-Fi, we're going to download them really fast. If you're over 3G and these are HD I-frames, it's going to take a little longer.
What do we do about that, Dr. Science? I'm glad you asked. Just as HTTP Live Streaming can play your content at 1x adaptively, we can do the same thing with I-frame-only playback. You can supply us with multiple I-frame playlists, one for each of the video variants that you're offering to us.
And that way, when we play it, we have the flexibility to choose the bitrate of the I-frames that we wish to download, so that we can create a good match between the frame rate we'd like to achieve and the quality of the I-frames that we'd like to display. And so, it's pretty easy to set up. Once you've got it set up this way, that's pretty much all you need on the server side to do fast forward rewind.
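In the master playlist, each I-frame playlist is advertised next to its variant with its own I-frame stream-info tag; a sketch (URIs and bandwidths are illustrative):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=640000
mid/prog_index.m3u8
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=80000,URI="mid/iframe_index.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1280000
high/prog_index.m3u8
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=160000,URI="high/iframe_index.m3u8"
```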
On the client side, it's even easier. All you need to do to do fast forward rewind on a client is set the right property. There's a slight caveat for HTTP live streaming, which is this. Because I-frame playlists are sort of an optional feature of the stream, a given stream may not be capable of doing fast forward rewind if there are no I-frame playlists.
And so we recommend that unless you know you're playing your own streams and you know they have I-frame playlists, you check the value of the canPlayFastForward and canPlayFastReverse properties of the player item before you attempt to set the rate. In fact, it's probably a good idea to do it anyway.
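In client code, that check is just a property read before setting the rate; a small sketch in Swift:

```swift
import AVFoundation

// Sketch: only attempt trick play if the item says it can handle it.
func scan(_ player: AVPlayer, at rate: Float) {
    guard let item = player.currentItem else { return }
    if rate > 1.0, item.canPlayFastForward {
        player.rate = rate          // e.g. 32.0
    } else if rate < -1.0, item.canPlayFastReverse {
        player.rate = rate          // e.g. -32.0
    }
}
```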
And so once you do that, you'll get fast, high-quality fast forward and reverse. And I'd actually like to demo that, if that's okay. And so let's bring Bill back up. We're going to switch back to the iPad and run our Foreign Cinema app again, bring it back up. Now we're going to choose a different movie. Let's choose Vacation Video.
And so let's start playing it. And so we've got some folks in the field here. Now, I know that there's a scene later on this movie, which is a footy game that I'd like to watch. And so let's go to fast forward. So we're going to start at eight times. It's a little bit further.
Let's go to 64 times. And so now we're playing 64 times. And let's see. Okay, now we're in Amsterdam. And okay, pause that there. Yeah, okay, that's what I want. I think we went a little too far. Let's go backward a little bit. Yeah, okay. Yeah, now let's play that.
We're still working on the transition a little bit as to where we transition from the operating button. So what we've just done is we started at the beginning of the presentation, we scanned forward rapidly to find the part we want, we've zeroed in on it and we started playing to--we started to see what we wanted to see. So, thanks Bill. Great. So, thank you.
Thanks. So next I'd like to talk about a couple of features we actually slipped into earlier iOS releases but perhaps didn't do as great a job of telling you all about. The first is closed captions. As most of you know, closed captioning is a way to embed accessibility text into a video signal. And the format for the closed captions has been defined by a well-known standard called CEA-608.
And the good folks at ATSC have further specified how to embed 608 closed captions into an MPEG-2 transport stream. And so in iOS 4.2 and onward, if your transport streams contain 608 captions in the video and the user has selected it, usually via the preference, then we will display those closed captions while the video is playing. So that's pretty cool. Next I'd like to talk about playback statistics at runtime.
We delivered this in 4.2 and what we did was we gave you a way to kind of peek into the player while something was playing to get a sense of what was going on. We adopted the Apache model, which is to say that we give you two different logs.
One's an access log, which generally tells you what's going on normally. And there's also an error log that tells you about sort of exceptional conditions that have occurred. So in the access log, for instance, we currently record a log entry every time we switch to a different variant, a different bit rate.
And amongst the things we record are when the variant started to play, how much time we downloaded, how much time was actually watched, the URL we pulled it from, the server IP address, because sometimes you need to debug a bad caching server, the number of times it changed, and a bunch of kind of performance-related information about the stream and how it was playing. Another thing that we have is a playback GUID, which is essentially a string, a pretty long string, that is unique to every single playback.
And the interesting thing about the GUID is it's actually supplied as part of the HTTP GET request. It's one of the headers. And so if you can combine statistics that you've recorded on your app at runtime with server logs, you can actually get a really good idea of what happened to that particular playback.
Getting the logs is really easy. You can get them from either an AVPlayerItem or an MPMoviePlayerController. It's essentially the same API. You just ask for the access log or the error log. And both logs are basically an array of log entries. And so you can step through that array.
You can examine each individual log entry, see what stuff it's got in it. And we also provide a convenience API, which is that extended log data guy. And what he does is he takes the entire log and he formats it into a file format that makes it really convenient to post to sort of an external logging server.
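A sketch of reading those logs from an AVPlayerItem, shown here in Swift:

```swift
import AVFoundation

// Sketch: pull playback statistics from an item that is (or was) playing.
func dumpLogs(for item: AVPlayerItem) {
    if let access = item.accessLog() {
        for event in access.events {
            // Each event roughly corresponds to playing one variant for a while.
            print(event.uri ?? "?", event.indicatedBitrate, event.durationWatched)
        }
        // Convenience form: the whole log in W3C extended log file format.
        if let data = access.extendedLogData(),
           let text = String(data: data, encoding: String.Encoding(rawValue: access.extendedLogDataStringEncoding)) {
            print(text)   // or POST it to a logging server
        }
    }
    if let errors = item.errorLog() {
        for event in errors.events {
            print(event.errorStatusCode, event.errorComment ?? "")
        }
    }
}
```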
The file format we've chosen for our statistics is the W3C extended log file format. This is a long name for a very simple file format. Again, it's just a text file with a bunch of header fields at the beginning and then every log entry is a line of text in tab-delimited format. This is the access log.
The error logs are pretty similar. We found just internally that looking at these logs when we're bringing up a stream can give us a tremendous amount of information about what's going on. Even after you've deployed your stream, if you can collect this kind of information from your users, it can give you a really good insight as to whether you've made a good choice when you selected your different encoding bit rates.
You can kind of find out how many of your users are hanging out on the high bit rate streams versus the low bit rate streams, etc. Maybe you need to add a few more in there at some point. So, statistics are really useful. I'd recommend that you kind of get to know them. And that is it for features this year.
What I'd like to do now is spend some time talking about best practices. And these are essentially areas where when we've worked with you folks, we've seen you run into some issues and problems. Maybe you need to make slightly different choices. And so let's talk about that. The first thing I'd like to talk about is the tricky, tricky area of delivering encrypted content. So you encrypt your content in order to control access to it. And that starts with encrypting each of the media segments in AES-128.
And once you've done that, we recommend that you distribute the media itself over plain vanilla HTTP. It's fast, it's cheap, it's just as safe as anything else, so long as you control access to the keys. And that is really the thing you wanna focus on, is protecting the access to the keys.
So recall that the keys themselves are specified as URLs in the key tag. And so when the player needs a key in order to decrypt the media file it'd like to play, it's going to try to load the contents of that URL. There are a few different approaches to this that can help keep the key secure. And I'm gonna talk about a couple of them.
The first and perhaps the most obvious is that you can deliver the keys over the network using HTTPS. And HTTPS is a good start because it means that no one in the middle can snoop your packets and discover the contents of the keys. It's not enough, however, because you also have to protect against someone else using our APIs to get your keys, someone who's going to have our player make requests on their behalf. So it's also necessary that, prior to playback, you authenticate yourself with your HTTPS server.
So there are a couple of approaches here. One is that prior to playback, you can connect to your HTTPS server yourself in your application, and you can perform whatever authentication you feel is kind of interesting or necessary to get your server to trust the fact that it's you on the other end.
Once it's done that, your server can issue a cookie that will expire soon after that session is complete, and a cookie that is unique per session. What the server can then do is trust that anyone in possession of that session cookie is authorized to have the key. Now, when the cookie is sent back, our web subsystem will cache that cookie, so that when the player goes to ask for the keys, the cookie will accompany the request to the HTTPS server, and the server should hand the key over.
We actually, when we see a playlist, we sort of aggressively try to download all the keys we can find in it in a short period of time. And so you can actually have your session cookie expire relatively rapidly and things will still work. A similar approach you can take is to use the realms feature of HTTP. What this is, is if you try to download a resource from a particular location on an HTTP server, then it's going to challenge you.
It's going to ask you for some credentials in order to prove that you are who you say you are. And to get this to work, of course, the trick is that you have to get the credentials to our player. So there's a simple way to do that. And that is, again, prior to playback, use NSURLConnection to try to download something out of the realm yourself. And that will cause a challenge-response exchange to occur with your NSURLConnection delegate. And so once the authentication happens successfully, again, our web subsystem will cache the credential so that our player can reuse it when we need to obtain the keys.
So that's a pretty safe way to deliver keys. It does require a certain amount of server infrastructure to be built. And so if that's intimidating or if you already have a very similar but not quite the same server infrastructure, there's another approach. And that is that your application can actually provide the keys to us directly when we need them.
And the way you do this is using application-defined URLs. Now, for some people, application-defined URLs are kind of a Scooby-Doo moment: application-defined URLs, what's up with that? So let me talk about that for a moment.
The way application-defined URLs work is, first of all, you decide that in your world, URLs should be, for instance, ABCD colon blah, blah, blah, blah, blah, blah, blah. Fantastic. Well done. But where they really get interesting is where you actually let the system in on this. And the way you do that is by subclassing the NSURLProtocol class and by registering your subclass with the system.
What you're doing when you do this is essentially you're telling the system that you're signing up to handle loading all ABCD colon slash slash URLs that come its way. So once you do that, the next step is to describe your keys in the HTTP Live Streaming playlist file as ABCD colon slash slash URLs. And so now when the player needs to load the keys, it's going to call back into your application and say, I need ABCD colon slash slash key number one or whatever.
However you define whatever. And at that point, your application can either hand over the key data directly or if it wishes, it can make up an actual HTTP URL on the spot and hand that to us and we will redirect the request and go fetch it off the web.
So since we're talking about keys, it's important to note that when you implement and register an NSURLProtocol subclass, it is private to your application. Your competitor's application cannot simply load an ABCD URL and get your keys and get access. It's your content. And so this means that NSURLProtocol is a really secure and efficient way to deliver the keys to us when we need them.
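Here's a rough sketch of that idea in Swift, using an NSURLProtocol (URLProtocol) subclass and a made-up abcd:// scheme; the key lookup is a placeholder:

```swift
import Foundation

// Sketch: serve AES-128 keys to the player for URLs like "abcd://key1".
final class KeyProtocol: URLProtocol {
    override class func canInit(with request: URLRequest) -> Bool {
        return request.url?.scheme == "abcd"
    }
    override class func canonicalRequest(for request: URLRequest) -> URLRequest {
        return request
    }
    override func startLoading() {
        guard let url = request.url else { return }
        // Look up (or compute, or fetch) the 16-byte key for this URL however you like.
        let keyData = Data(count: 16)   // placeholder key bytes
        let response = URLResponse(url: url, mimeType: "application/octet-stream",
                                   expectedContentLength: keyData.count, textEncodingName: nil)
        client?.urlProtocol(self, didReceive: response, cacheStoragePolicy: .notAllowed)
        client?.urlProtocol(self, didLoad: keyData)
        client?.urlProtocolDidFinishLoading(self)
    }
    override func stopLoading() {}
}

// Register once, before creating the player item, e.g. at app launch.
_ = URLProtocol.registerClass(KeyProtocol.self)
```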
Now you may wonder, "Hmm, this sounds pretty cool. Is there anything else I can use these application URLs for?" Well, you can actually. You can use them for playlist files, and I'm going to get to that in a second. One thing you can't do, however, is you can't use them for media files.
We do require that the player load the media files themselves directly, and the reason for this is that this is the information we use to understand the quality of the network, and it allows us to make the right decisions in terms of which bitrate we'd like to play. And so media files must always be HTTP or HTTPS URLs.
The keys and playlists on the other hand are really short and so it's fine to use application defined URLs for those. And in fact if you do or if you use the HTTPS methods that I described earlier, there's a nice bonus waiting for you in iOS 5 and that is AirPlay. When we rolled out AirPlay, we didn't have any support for encrypted video streams. In iOS 5, we do. And this is an example of one of those things, one of those new features in OS that should just work if you followed our recommended practices for doing stuff.
Now, the flip side of this is that when you build your application for iOS 5, it actually gets opted into AirPlay by default. And so, if you wish, it'll just work, provided your application has followed the best practices. And this is good news for most people, I would say.
For some people, it's actually not so great news because for reasons normally contractual or what have you, you don't want your content playing on some big TV somewhere. And so if you're in that boat, then there is an entire section of the AirPlay overview document waiting for you that discusses how to opt out. And I believe that's up on our website now. It's called AirPlay Overview. So download it and check it out.
The next thing I'd like to talk about is ad-supported playback and specifically what works well, what doesn't work so well. There's a technique that we've seen some folks try to use called chaining players. And this is kind of a clever technique where when you want to display sort of an ad at the beginning of your content or maybe an ad in the middle somewhere, you actually create two players. And one player is responsible for playing, let's say, your television show, and one player is responsible for playing the ads.
And what you try to do is juggle these two things. And so you're playing your TV show along, and then you say it comes time to run an ad. And so you kind of hide the TV show player, and you show the ad player, and you play your ad.
The problem with this technique is that there are several different reasons why it's poorly suited to network players. One of them is that network playback essentially has a buffering stage where prior to playback, we've got to download stuff as fast as we can from the network. So we have a buffer that allows us to play through uninterrupted.
When you have independent players, they're actually competing for bandwidth. And due to the way TCP works, how the competition is going to turn out is actually pretty unpredictable. And so, for instance, you might have a case where your user is watching just the last little bit of a TV show prior to an ad, and you decide you want to queue up the ad. It's possible that pre-buffering that ad will actually starve the player, and you could actually stop playback before you got to the ad. You might decide alternately that, okay, I'm not going to do that.
I'm going to just wait until the TV show stops, and then I'll queue up the ad. Well, the experience then is I'm watching my TV show, and suddenly it cuts to a commercial break, and I'm sitting there watching a little spinning thing. And there's nothing as a user I like better than watching a little spinning thing so that I can watch an advertisement. It's fantastic. And another problem that is specific to adaptive streaming is when a player is active, it's constantly monitoring the network and selecting the best bit rate. For the current connection speed. When a player is inactive, it's not.
And that means when you stop playing one player and you start playing another, the new player is not going to have a good sense of what the correct bit rate is going to be. So an example of how this can manifest is you might start playing your TV show with a commercial. And so it starts with the ad, and in a few seconds it switches up, and suddenly you've got fantastic quality in your Colgate commercial. It's like, ding! Kind of thing. And so then the commercial ends.
And the show, which is what the user actually wants to see, starts. And suddenly we're back to Captain Blocky because we're starting at a low bit rate. And finally, when you're trying to separate your ad playback from your main content playback, you end up with all these different little islands of sort of program content, ad content, program ad content, etc. And what that does is it makes it very difficult to implement seeking because now you're seeking across all these different little islands of content. And it's very difficult to implement something like fast forward rewind.
So chaining players is not going to work well for ad playback. So what do you do instead? Well, we actually have a technique that works really well. And that is to use a single playlist and put all your program and all your ads in that single playlist. And if you've used some of the popular TV applications on the iPad, you've actually seen this technique in practice. The way it works is you have a single playlist. You put all your ad content in. You put all your program content in.
You separate the ad content from your program content with discontinuity tags in between the different ads. You use floating point EXTINF durations. And this is very important because it allows us to be accurate when the user seeks to somewhere else in the stream. And a lot of people use the position in the stream to know when they're playing ad content, because they want to control the transport bar; they want to disable seeking or something while an ad is playing. And so it's very important that the playback time reported is accurate.
Once you've got your floating point EXTINF durations, the next thing you can do is use the boundary time observer of AVPlayer. And what this will do is call back into your application when we cross a particular boundary time, i.e. when you've crossed from the main program content into ad content or vice versa. And so this allows you to control your transport bar.
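A small sketch of that in Swift, assuming an existing AVPlayer and boundary times that your stitching logic already knows:

```swift
import AVFoundation
import CoreMedia

// Sketch: get called back when playback crosses a program/ad boundary.
// Keep the returned token and pass it to player.removeTimeObserver(_:) when done.
func observeAdBoundaries(on player: AVPlayer, at seconds: [Double]) -> Any {
    let times = seconds.map { NSValue(time: CMTime(seconds: $0, preferredTimescale: 600)) }
    return player.addBoundaryTimeObserver(forTimes: times, queue: .main) {
        // Crossed from program into ad content (or back): update the transport
        // bar, disable seeking while the ad plays, and so on.
    }
}
```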
Now, it's actually a fairly common request that ads not be kind of burned in for all time, but that they be rotated based on time of day or other factors. And so even though you may go back and watch the same episode, you may get different ads. And fortunately, the M3U8 playlist file is so simple that it's really easy to take multiple fragments of playlist files and kind of stitch them together and get a single usable playlist file.
So all you really need to do is break up your program, your television show, for instance, into the little islands that are between the ads, and then dynamically choose the ads you want, pull in those media segments, and stitch them together. There are a couple of different places you can do this.
One place, obviously, is you can do it on-demand on the server. When a client says, hey, I'd like to play House MD episode 128 or whatever, it can say, well, okay, it's about 9 p.m., so I think I'll put in a Budweiser ad or however they decide these things. And that works pretty well. It does require a certain amount of state to be kept on the server. But for one reason or another, it's also fairly popular to want to be able to choose the ads that are played back in the application itself at playback time.
Fortunately, you do have a way to do that, and that's using application-defined URLs. I discussed them earlier. The way it works is when you ask us to play back, you give us an ABCD or what have you URL instead. And so when we need to load the playlist file, either the master variant playlist or the individual variant playlist, we'll go and we'll ask your application instead. And at that point, your application can go off and look at things like, well, has the user seen this particular ad? Or maybe they've told me what their location is, and I wanna put up a locale-specific ad or market-specific ad or something like that.
But using application-defined URLs gives you an opportunity to do this. And if you set everything up this way, then you'll get a single buffering stage. You will rapidly hit the optimal playback bit rate. Fast forward rewind will work great. Seeking will work great. It's a much better experience. So if you're trying to do ad-based playback or anything like it, this is really the technique we recommend. Next, I'd like to talk about a quick performance note.
Pretty much all modern web servers have the ability on a per-mime type basis to compress a resource that's being asked for with gzip. And so if someone asks for a particular file and they ask for it and they say it's okay to gzip it, the server will do this on-demand. If you haven't got your web server set up to do this, we really recommend you do. Because, again, the playlist, it's just a text file. It's really simple. It compresses fantastically.
10x compression is actually typical. It's simple to do. I've got the Apache configuration right here. You can just slam this into your httpd.conf file or whatever they call it these days. And that will tell it any time you hand out an M3U8 file with this MIME type, if the client asks for it, and we will, then simply compress it on the fly. Now, it's not required. It's optional. But if you're playing content over 3G, or you're playing live event content where you're continuously reloading these large playlist files, then downloading the playlist file 10 times faster can actually make a perceptible difference in the responsiveness of the user experience. And so it's easy to do. It's cheap. Most of the major CDNs support it. And so ask them for it.
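A typical Apache setup along those lines might be just a couple of lines (a sketch; it assumes mod_deflate is enabled and the standard .m3u8 MIME type):

```
# httpd.conf: serve .m3u8 playlists with the HLS MIME type and gzip them on the fly
AddType application/vnd.apple.mpegurl .m3u8
AddOutputFilterByType DEFLATE application/vnd.apple.mpegurl
```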
Speaking of optional things that can make a difference, there are a number of things you can do with your playlist file that give us additional information about what's going on in your content. And one of these is the CODECS attribute of the stream info tag. What the CODECS attribute does is it provides a complete list of all the codecs that are necessary to decode a particular stream. So in this case, avc1 dot whatever is H.264, and mp4a.40 dot blah, blah, blah is AAC. And so looking at that, we know that, oh, I need an H.264 codec and AAC. Great.
What it also allows us to do, however, is it allows us to distinguish between the variants you have that are audio only and those that have audio and video in them. And there are certain times when we can make use of this information and provide a better user experience when we're switching. A similar attribute is the resolution attribute. The resolution attribute is another stream info attribute. And what it does is it provides the approximate horizontal and vertical dimensions of the video in a stream.
Again, it's optional, but if you do provide it, then it can prevent us from making some boneheaded mistakes, like deciding that we're gonna start downloading your 1080p video stream and playing it into a little web window that looks like this, just because we happen to have the network for it. So you guys pay for the bandwidth and the user gets no additional benefit because we're downscaling it to 320, 240 anyway. We can do that, but we can only do that if you tell us what the resolution is in advance.
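In the master playlist, those attributes sit right on the stream info tags; a sketch (codec strings and numbers are illustrative):

```
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.2"
audio/prog_index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1280000,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360
mid/prog_index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,CODECS="avc1.640028,mp4a.40.2",RESOLUTION=1920x1080
high/prog_index.m3u8
```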
I've already mentioned floating point durations. It's really important to use floating point durations so that we know where we are on an accurate basis when you seek around in the stream. And finally, we ship a tool called the MediaStream Validator. And this is a tool you point at your stream, it'll download the stream, it'll check it for syntactical correctness, it'll check for some best practice stuff, some self-consistency things. It produces a lot of output, and some of it's a little hard to read. It produces some warnings, it produces some errors. I encourage you to become familiar with the output of MediaStream Validator, even the warnings, because they often can point out subtle errors in your streams: errors that may not be apparent to you immediately when you're doing your testing, but may show up for some percentage of your users, or may cause your streams to stop playing when we do a subsequent update and we tweak how the player works a little bit. So get to know the MediaStream Validator output, pay attention to it. You don't necessarily have to react to it, but you should understand what it's saying and you should say, yeah, okay, I know about that, that's okay.
Next, I'd like to talk about TLS. What's TLS? TLS is basically SSL, Secure Sockets Layer, after the IETF got through with it. It's the official IETF version of SSL. The news in iOS 5 is that we have upgraded from TLS 1.0, which was used in previous releases, to TLS 1.2. And along with this upgrade, a number of new cipher suites are presented when we issue the SSL client hello message.
So, TLS 1.2 is a good thing. It fixes a couple of pretty significant security holes that are present in 1.0. It's generally more stable. However, in doing testing, we have discovered a number of third-party apps, not a lot, but a good few, that are incompatible with TLS 1.2 on iOS 5. Now, we've looked into this, and sometimes we'll find that it's actually a bug in our TLS implementation. We can fix that.
But, most of the time, it appears to be sometimes a bug in the application. More often, it's a bug in the HTTPS server that the application is talking to. So, if your application uses HTTPS, if it uses SSL, I highly encourage you to install the iOS 5 seed you received this week. Try out your application. If your application has problems connecting, check your servers, make sure they're TLS 1.2 compatible.
[Transcript missing]
So those are kind of the trade-offs you need to consider when you're selecting your target duration. 10 seconds is a pretty good default. It works well for a lot of different cases. Another point I'd like to make, however, is that if your stream is not starting up quickly, it's not because you've picked a target duration that's too large. If your stream's not starting up quickly, it's because the bitrate of the stream combined with the speed of the network requires us to spend longer buffering before we can start up.
Working around target duration is not going to help you start up faster. If you're having startup problems, then the recommendation is to add or to have a lower bitrate stream and use that as the initial variant in the playlist file so that on a slow network, we can start quickly at the lower quality, and on a fast network, we'll start very quickly at the low quality and then quickly switch up to the appropriate quality.
Finally, I'd like to spend a little bit of time addressing the folks who are actually writing their own transport streams. Not everyone does this, but when you do, it's really important to get it right. So a couple of points. When we create these media segment files, they're really designed to be continuations of the previous media segment file. You should be able to catenate all the media segment files together and play the stream in an uninterrupted way.
When we load the first bytes of the second segment file, we expect to carry on right where we left off with the last bytes of the first segment file, from the point of view of parsing, from the point of view of encoding. This means the format has to be the same. The track count can't change. The transport stream continuity counter has to continue uninterrupted.
We have to be where we were in GOP structure, et cetera, et cetera, et cetera. Everything must be continuous. The only exception to this is if the media segment follows a discontinuity tag, and if it does, we'll throw everything away, and we'll start from scratch, and we'll start displaying at the next IDR frame we find in the segment file. So segment files, even though they're independent files, need to be continuous encodings.
Second, MPEG-2 transport streams have these structures called the Program Association Table and the Program Map Table. We don't impose a buffering model on you, but we do require that you have at least one of these at the beginning of every segment. And third, there are some formats out there that have very coarse sample time scales.
This means that when they put the media into the format, they lost a lot of accuracy by rounding everything to one millisecond timestamps. If you have to transcode, if you have the unlucky job of transcoding back into MPEG-2 time scales, you can't just multiply the timestamps by 90,000. You're just propagating the accuracy loss. You need to look at, like, the audio samples and look at the sample rate, count up all the audio samples, and reinstate that accuracy.
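A tiny sketch of the arithmetic, deriving the 90 kHz presentation timestamp from the running audio sample count rather than from rounded milliseconds (numbers are made up):

```swift
// Sketch: rebuild an MPEG-2 (90 kHz) timestamp from audio sample counts.
let sampleRate = 44_100.0        // audio sample rate (assumed)
let samplesSoFar = 1_234_567.0   // running count of audio samples written out
let pts90k = Int64((samplesSoFar * 90_000.0 / sampleRate).rounded())
// versus the lossy route of milliseconds * 90, which just propagates the rounding error
```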
I'm sorry, it's a pain, but if you don't do it, your streams, the samples won't add up, they won't play right. So, have we dived in deep enough? I think so. Let's pull out. The thing you need to take away is that we continue to add new features to HTTP live streaming and to support new use cases.
There are a bunch more related sessions this week for folks doing media. A couple of them already happened. Check out the video replays. There are a couple more tomorrow, mainly involving editing and capture in both Lion and iOS. Check those out. And so that's it for this week. Thank you for coming today.