Graphics, Media, and Games • iOS, OS X • 57:48
HTTP live streaming lets you send live or pre-recorded audio and video to iPad, iPhone, iPod touch, and Mac using an ordinary web server. Learn how to add subtitles to your HTTP live streams. Get details about other new features and learn more best practices around HTTP live streams.
Speaker: Roger Pantos
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
The final thing about subtitles that's interesting from a production point of view is that due to their nature, they are often added post-production. In fact, particular programs are actually augmented over time with subtitle content. Subtitle content generally is easier to produce than alternate audio, and so you'll often have it localized in a much larger range of languages and variations, particularly in some of the really multilingual countries in Europe.
And so we had to take all of these different considerations into account when we designed our subtitle support for HTTP live streaming. And of course, the first thing that we needed to decide was, in what format are we going to ask you to author your subtitles? And that has some key implications. And so we looked over a number of different formats, and what we ended up choosing was a relatively new format, one that has evolved recently.
The work there is being done under the auspices of the Web Media Text Tracks Community Group at the W3C, and it's called WebVTT, which stands for Web Video Text Tracks. And if you look at it, it's pretty clear it has its origins in an earlier format called SubRip. And so if any of you are familiar with .srt subtitles, WebVTT will look immediately very familiar to you.
From our point of view, one of the things that was really appealing about the WebVTT format is it combines an extremely simple text format. You've got these very simple text-based files that carry Unicode, so it's completely internationalizable. But at the same time, the simplicity carries a great deal of expressibility with it in the layout and control you have. And so we thought it was a really good match for HTTP, and it's a really good match for HTTP live streaming.
The specification is available publicly on the W3C website, w3.org. And I guess the easiest way to introduce you to WebVTT is to show you an example. And so here we have an example of a complete WebVTT file. It always starts with the text WEBVTT, so you can sniff the file and find out what you're dealing with. And then it's kind of like an HTTP response: you've got the WEBVTT line, then you have optional header fields following that, and then a double line feed, and that takes you to the payload content.
The payload in WebVTT is just a series of cues. And cues are extremely simple things. I've got two up here as an example. The first thing you see is a time span, and that indicates when the subtitle is intended to appear on screen and when it's intended to disappear.
And then following that, you just have the subtitle text. So you have two subtitles here. The first shows up at 11 seconds in, "Let's play a game," then it fades out, and a couple of seconds later, it's followed by, "You know, I don't like games." So looking at this, it's pretty clear that these are really easy to author.
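As a rough sketch of what that complete file might look like (the timings here are just illustrative of the clip described above):

```
WEBVTT

00:00:11.000 --> 00:00:13.000
Let's play a game.

00:00:15.000 --> 00:00:17.000
You know, I don't like games.
```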
You could sit down with your favorite movie, a stopwatch and vi, and type, type, type, type, type, and you could generate a two-hour text file of all the subtitles. And so you can say, "Okay, well, that's great, but I have this stream. How do I get this stream to play the subtitles?" And so that's where we come in.
We look at subtitle media, a .vtt file in this case, just like we do any other kind of media. And so in HLS, just as you would take your transport stream containing your audio and your video and break it up into a series of segments, you do the same thing with your VTT file containing all your subtitles. You break it up into a series of segments, and each segment is a complete standalone WebVTT file.
You take all of these segments, you throw them up on your web server, you give each one a URL, and you collect the URLs into a playlist file, into an M3U8 file, and that's your subtitle playlist. Now, one thing I want to point out here is that the most common case for a subtitle playlist is just a static playlist. You've got your two hours of subtitles, an ENDLIST tag at the end, playlist type VOD.
But there's nothing to prevent you from subtitling live content as well. And so you could have an M3U8 playlist that has four subtitle segments, and then 10 seconds later you'll get another subtitle segment and another subtitle segment. And in fact, if you're subtitling a continuous broadcast, such as a 24-hour news station, you'll see segments roll off the top as well. And so subtitles integrate completely into the live workflow of HTTP live streaming.
That's fine if all you want to do is play subtitles, but of course the interesting thing is to integrate subtitles with the rest of the presentation. And the mechanism for that is the same mechanism that we introduced in iOS 5 to carry alternate audio content: the media tag. We've just reused that mechanism for subtitles. And for those of you who may be a little rusty or haven't used the media tag, let me quickly go over how that looks.
I have a diagram here and it has four what we call variants, or the same content at different bit rates. We've got an audio-only variant, we've got 400 kilobit, 800 kilobit, and 1500 kilobit. And over here on the right, we've got individual audio playlists that have the same audio content but in different languages. And so these are playlists that contain .aac files. And with the media tag, you can associate that set of audio options with all the different video bit rates. And at runtime, you can configure it to say, okay, now I'd like to hear the German audio, please.
So for subtitles, we did the exact same thing using the exact same syntax. We have alongside our group of audio alternates, we have subtitle alternates. So we could have English subtitles, German subtitles. You might have Slovakian subtitles or Portuguese. You might have commentary subtitles. And using the same APIs at runtime, you can discover these subtitles exist, choose the ones you want to see, and set them on the playback item and they all start appearing.
That's a little bit abstract perhaps. And so what I'd like to do is quickly run through an example of actually authoring some WebVTT subtitles and putting them into a playlist so you can see how that works. So let's start with a couple of subtitle segments here. I've got -- I've called them segment zero and segment one. The first segment is the first 30 seconds of subtitle content. Didn't have a lot of room on the slide here, and so there's not a lot of subtitles in each segment, but bear with me.
So two segments, from zero to 30 and 30 to 60. What we're going to do is save these as individual files, and we're going to take them and put them up on our web server. Each one gets a different URL, and then what we do is we build our playlist around them. And so this is an utterly conventional playlist. It's got a target duration, it's got a version, it's got a sequence number, a playlist type. The only difference is that the URLs are going to point to .vtt files instead of .ts files.
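A minimal sketch of what such a subtitle playlist might look like (the segment names, durations, and version number are illustrative):

```
#EXTM3U
#EXT-X-TARGETDURATION:30
#EXT-X-VERSION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:30.0,
subtitles_en_0.vtt
#EXTINF:30.0,
subtitles_en_1.vtt
#EXT-X-ENDLIST
```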
And so once you've got this subtitle playlist -- and again, you can have multiple subtitle playlists for all the different languages or types of subtitle tracks you'd like to author -- you take the subtitle playlists and you put them into the master playlist using media tags. And so in this example, that subtitle playlist we just authored that contained the English subtitles appears in the first media tag.
It's in the subtitle group. We've labeled it as English, and it's got a language tag of English. It's actually part of a group. There's a second media tag as well, and that's indicating that there's another set of subtitles that are authored in Simplified Chinese, in Unicode text.
And it's -- I'll point out again that when you're naming these, you want to name them in the native script of the language so that the native speakers can actually -- read them and recognize them as being in their language. So you do that. You create your group of subtitle media options. And then in each of your variants, in each of your different bit rates, you have a subtitles attribute and you just indicate the subtitle group that defines which set of subtitles are the appropriate set of choices for that alternate.
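Putting that together, a master playlist along these lines might look like the sketch below (the URIs, bandwidths, and codec strings are illustrative):

```
#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",DEFAULT=YES,AUTOSELECT=YES,URI="subtitles_en.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="中文",LANGUAGE="zh-Hans",URI="subtitles_zh.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=800000,CODECS="avc1.42001e,mp4a.40.2",SUBTITLES="subs"
video_800k.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,CODECS="avc1.42001e,mp4a.40.2",SUBTITLES="subs"
video_1500k.m3u8
```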
So we went over that pretty quickly. There's a couple of things I'd like to go back and talk about in a little bit more detail. The first is I've kind of skipped over this timestamp map tag you see at the beginning of the WebVTT file. What's that about? Well, the WebVTT folks quite reasonably decided when they were building their format that they would indicate time in the conventional format of hours, minutes, seconds, decimal number of seconds. And that's a totally reasonable choice. Of course, in the media space, that's not how we timestamp our content.
In HTTP live streaming, our content is timestamped with 33-bit MPEG transport stream PES timestamps. And so this tag allows you to line up the WebVTT hours, minutes, seconds timeline with the MPEG-2 33-bit, 90 kHz-based timestamp, so that you know what corresponds to what. And in this case, we have an example here that says that when you see 00:00:00 in WebVTT, that corresponds to 900,000, which is a fairly common starting time for the beginning of a static transport stream.
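The tag itself sits in the WebVTT header; a sketch of that mapping, using the 900,000 value mentioned above:

```
WEBVTT
X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000
```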
An important thing to note here is that you don't necessarily need to change this tag every time you have a new WebVTT segment. All it's doing is indicating a synchronization point between the two different timelines so that we can map from one to the other. Which means that the following segment, segment one, which covers from 30 to 60 seconds, might not have 00:00:00 in its range of covered subtitle content, but there's a continuous flow of time, and so we don't have to change the tag.
The only time that you need to update the timestamp map tag is, A, when we're going across a discontinuity and the clocks are restarting. So if you've got your content and then you've got a discontinuity on some ad content where the MPEG timestamps are restarting and the WebVTT content is restarting -- because you should probably be subtitling your ads as well -- you'll have a discontinuity in the following WebVTT segments, and you'll just want to update that timestamp map so that we can update our internal bookkeeping.
And the other place where we'd like you to update the timestamp map tag is in the segment following any 33-bit rollover. Because MPEG has this 90 kHz clock, it's ticking along at 90,000 ticks per second, and it's a 33-bit number, eventually it rolls over. It takes about a day, and so a lot of you will never even see that, but if it does happen in your content, following that, in the next VTT segment, we'd just like to see an update of that tag so we can resynchronize.
[Transcript missing]
And that means we're going to go back to the server every so often. Then the subtitle target duration: this is just totally following the rules of the spec, the subtitle target duration must be the same as the target durations of the audio and the video and whatever else. So it means that every 10 seconds we're going to go back to the server, we expect to see new video, we expect to see new audio, and we also expect to see new subtitles.
For on-demand for static playlist content, we actually relaxed this rule a little bit because we recognize that in reality, subtitle data is extremely compact and kind of sparse. There's not a whole lot of it. And so asking someone to create a tiny little file for every 10 seconds of content didn't seem to be necessary.
And so we've relaxed the constraint in the case of on-demand of static playlist content. You can have a larger target duration, so you can have bigger chunks of subtitle data. We do still ask you to segment it. We ask you to segment it to a reasonable size. We're suggesting anywhere between one and five minutes.
And the reason we do this is because of efficiency of access. Again, if we seek into the middle of a presentation, we don't want to have to chew through 30 minutes of subtitle data we don't need before we find the one we actually want to load. And so your segments can be bigger for on-demand, for static content, but they should still be reasonable.
And so those are the segmenting rules. This is pretty much all you need to author your content. As I mentioned, of course, normally in the usual case of subtitles, the user has to turn them on. And so how does that work? In the case of non-forced subtitles, which is the standard case -- we'll talk about forced subtitles in a second -- the set of available subtitles appears in the asset, and particularly it appears as a media selection group.
And so because assets can be loaded asynchronously, the first thing you need to do before looking for your subtitles is to ensure that the following property is available: asset.availableMediaCharacteristicsWithMediaSelectionOptions. It's big and long. Make sure that's available. At that point, you can ask the asset for its media selection group for the legible characteristic. And in HTTP live streaming, that will get you all the subtitles. It'll get you an entire set of subtitles.
It may get you, "I've got some English. I've got some German. I've got some English commentary." It may get you an entire set. And so the next step typically is you want to select from amongst the set. And you may invoke the user's help when you do this. You want to select amongst the set that's available in the stream for presentation.
We have a set of powerful filtering APIs in AVMediaSelectionGroup. One of the most convenient ones perhaps is to say, "Take the set and filter and sort it according to my preferred languages." And you can get the set of preferred languages ordered by preference from NSLocale.
And so that will give you an array. And once you've chosen the particular subtitle alternate you'd like to display, you just tell the player item: select this media option from this selection group. And at that point, as we resume playback, we will start displaying those subtitles.
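In code, the flow looks roughly like this. This is a minimal sketch using the modern Swift spellings of the AVFoundation APIs being described (the talk itself predates Swift), and the master playlist URL is a placeholder:

```swift
import AVFoundation

let asset = AVURLAsset(url: URL(string: "https://example.com/master.m3u8")!)  // placeholder URL
let item = AVPlayerItem(asset: asset)
let player = AVPlayer(playerItem: item)

// Make sure the media selection information has been loaded before using it.
asset.loadValuesAsynchronously(forKeys: ["availableMediaCharacteristicsWithMediaSelectionOptions"]) {
    guard asset.availableMediaCharacteristicsWithMediaSelectionOptions.contains(.legible),
          let group = asset.mediaSelectionGroup(forMediaCharacteristic: .legible) else { return }

    // Filter and sort the available subtitle options by the user's preferred languages.
    let options = AVMediaSelectionGroup.mediaSelectionOptions(
        from: group.options,
        filteredAndSortedAccordingToPreferredLanguages: Locale.preferredLanguages)

    // Pick the best match and tell the player item to display it.
    if let option = options.first {
        item.select(option, in: group)
    }
    player.play()
}
```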
So that, in a nutshell, is what you need to do. Let's quickly run over it. You've got to take your text and take your movie and subtitle it. So VI, stopwatch, blah, blah, blah. Got a big VTT file. You've got to segment it into a bunch of different chunks.
You've got to put those chunks up on a web server, create a playlist, a subtitle playlist that has a list of URLs, and then you've got to put the subtitle playlist into your master playlist using the media tag to indicate that it's one of the eligible subtitles for that media.
Then, at runtime, you have to look at the asset and you've got to say, "Oh, I've got English subtitles," and you've got to tell the player item to play it. At this point, you're there. You've got 70, 80% of anything anyone would want to know about subtitles. Now, that was so easy and it was so fast, I've actually got some time to tell you about some of the other interesting things you can do. So let's look at that a little bit.
I mentioned forced and non-forced subtitles earlier. I'd like to talk about that a little bit. The case of forced subtitles where you're throwing up a subtitle in front of someone even if they haven't asked for it. You would ask, "Well, why would you do that?" Well, there's a few different reasons.
Your lawyers may have told you that, "Yeah, we'd really like to show a localized legal notice whenever someone watches this particular segment of video." A more common use for forced subtitles is that you may have a character in your movie or what have you who's speaking in a language that he understands but the audience is not expected to understand.
And so he may be speaking Swahili. You may want to force a subtitle to come up at the bottom that translates what he's saying into English for the rest of the audience. An interesting point about non-forced subtitles and forced subtitles is that we're only going to display a single subtitle track at a time.
And the consequence of that is that if you have a situation where you have some forced subtitles, and then you go off and you author a set of regular subtitles that has all the dialogue and everything in it, you've got to make sure to include all the forced subtitle content in the non-forced subtitle playlist so that the user doesn't miss anything.
[Transcript missing]
If you author these subtitles -- and we encourage you to -- just by way of introduction, in addition to having the regular dialogue that people are speaking on screen, they also include things like sound effects. And so if in the movie a gunshot happens off screen and it's important for the viewer to know about it in order to understand what's going on, then an SDH subtitle will come up and go "bang," you know, and it'll indicate to someone who's deaf what's happened.
And so if you author these subtitle tracks, we ask that you tag them with these two characteristics: public.accessibility.transcribes-spoken-dialog and public.accessibility.describes-music-and-sound. That way, you or we can look at the set of subtitles and know specifically the semantic intent of that particular set.
I'll show you an example here of this. One of the things I wanted to mention was that it's conventional when doing SDH subtitles that they're actually rendered in a non-default position in the screen. Normally by default we render subtitles at the bottom of the screen. To distinguish SDH from regular dialogue, they're often rendered at the top.
And that actually brings up another important point about authoring subtitles, which is it's often useful or necessary to move subtitles out of the way. We have an example screenshot here of Phil who's talking about some random thing or other. And had we put that subtitle in the default position, it would have actually covered the text that's burned in as part of the video. And so we had to move it out of the way.
WebVTT gives you a pretty good range of capabilities, a pretty flexible set of tools for controlling the positioning of the subtitles. The first and perhaps easiest to use is you can simply put a line break in your subtitle text, and it will be broken the same way on the screen. Beyond that, the cue can have a number of positioning directives attached to it.
And in this case we have the line directive, which controls the vertical positioning, and we have the align directive which, at least in left-to-right text, tells us that we want to begin the text at the left edge of the screen or the display surface. There's a whole range of these and they give you pretty good control. And so that's how you do that.
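As a sketch, a cue with a manual line break and another positioned near the top of the display with start-aligned text might look like this (the timings, settings, and text are illustrative):

```
00:01:02.000 --> 00:01:05.000
It's one of those magical products
where the hardware and software are intertwined.

00:01:06.000 --> 00:01:08.000 line:1 align:start
[CROWD CHEERING]
```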
The next thing that's interesting in subtitles typically is you want to actually style your subtitles. I mean what we have is great, but you might want to sort of jazz it up a little bit. WebVTT provides a small number of built-in styles, bold, italic, underline essentially. And if you've ever done any HTML authoring, then this will be very familiar and very easy for you to adopt. And you can see in our default rendering how that looks.
We have Johnny saying he's emboldened, and he's emphasized in bold and italic. If you want to go beyond this limited range -- let's say you want to change the font size, or you want to make it a different color, or maybe you want to pick a different font completely -- you can do that too. And doing that requires a little bit of authoring and a little bit of programmatic control.
WebVTT actually borrowed a bunch of the CSS class syntax, a bunch of the class syntax from CSS. And what this means effectively is that you can take some of your subtitle text and you can tag it with what are called CSS class selectors. And so in our example here, the word breakthrough in Phil's subtitle has been tagged with the class that has a selector .huge.
So that tells us that some kind of styling needs to be applied to that text. What controls what it is precisely is called a text style rule. In a web context, this information might come from a CSS cascading style sheet. In HTTP live streaming, it's provided by you programmatically.
So a text style rule is essentially a set of text markup rules and an associated text selector. And so in this example here, we're creating a text style rule that changes the relative font size of the text, and we've associated it with the selector .huge. Once we've set up our styles, we set them on the player item's textStyleRules property. And from that point on, during playback, those styles will be applied to any text which has that text selector.
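A minimal sketch of that, again using the Swift spellings of the API (the relative font size of 200% and the .huge selector are just the example values from above):

```swift
import AVFoundation
import CoreMedia

func applyHugeStyle(to playerItem: AVPlayerItem) {
    // Markup attributes come from Core Media's text markup keys; 200 means
    // "200% of the default font size" and is purely illustrative.
    let attributes: [String: Any] = [
        kCMTextMarkupAttribute_RelativeFontSize as String: 200
    ]
    // The selector ties the rule to any WebVTT text tagged with the class "huge".
    if let rule = AVTextStyleRule(textMarkupAttributes: attributes, textSelector: ".huge") {
        playerItem.textStyleRules = [rule]
    }
}
```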
One of the more widely used kinds of styling that happens in subtitles is per-voice styling. This is fairly common in situations where you have multiple characters and they're talking at once, and in order to help the viewer keep straight who's saying what, you'd like to actually tag each voice with a particular style. And so WebVTT provides a special syntax for this, which is a v tag followed by the voice name.
And so you can tag all the comments by -- from one character with one voice and all the comments from a different character with a different voice. And then, again, programmatically, you can set the style for that voice selector. And in this case, what I've done is I've said that Johnny should be light blue and Phil should be red. And so you can see the result here at the bottom of the screen. Johnny says so capable. Phil says the iPhone was a revolution.
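The WebVTT side of that is just the voice span syntax; a sketch with illustrative timings:

```
00:02:10.000 --> 00:02:12.000
<v Johnny>So capable.

00:02:13.000 --> 00:02:16.000
<v Phil>The iPhone was a revolution.
```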
The last thing I'd like to talk about when it comes to WebVTT styling is kind of a fun little thing where you can actually animate the styles. And how this is implemented is that you can see here we have an example cue that says, "My heart cries for you." It's five words, five seconds long.
But it actually has timestamps interspersed throughout the cue itself. And what this has effectively done is it has divided the cue into a series of five time ranges. That means that while the cue is up for those five seconds, the rendering system knows that, let's say, at a time between 12 seconds and 13 seconds, the first two words are in the past and the last two words are in the future.
WebVTT defines special styles for past and future, and so you can set up a special appearance. And if you do this and you have this kind of animated text, then as we render the text, as a part of the cue moves into the past, it will be given the past style, and while it's in the future, it will be given the future style.
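A sketch of such a cue, with the inner timestamps dividing the text into ranges (timings illustrative):

```
00:00:11.000 --> 00:00:16.000
My <00:00:12.000>heart <00:00:13.000>cries <00:00:14.000>for <00:00:15.000>you
```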
I don't have a good screenshot for this. The easiest way is actually to demo it for you. And so what I'd like to do now is bring up Bill May, who implemented a bunch of this subtitle stuff to help me show you subtitles in WebVTT and HTTP live streaming. So if we could switch to the demo iPad. So what I'd like to show you first is just some regular subtitles.
We have our favorite clip of Phil and John here, so let's run that guy. You know, it's true. When something exceeds your ability to understand how it works, it sort of becomes magical. So Johnny is speaking in English. Subtitles in English. Kind of shows the point, but not as exciting.
As I said, however, though, you can have multiple subtitles associated with a particular stream. And so I'm going to ask Bill to pop up the options box here. You can see in our case that we've rendered two sets of subtitles, one's in English and one's in Chinese. So let's choose the Chinese version. and kind of caught Phil at a bad moment there. Come on, Phil, you can do it.
So many amazing technologies, all the applications, the multi-touch user interface. Okay, but anyway, now Phil, speaking in English, subtitle in Chinese. I think it makes Phil look more sophisticated. I don't know. It's just me. So those are sort of our simple default rendering subtitles. What I'd like to do now is show you another clip, and this clip has some kids playing soccer, and what makes it interesting is that in addition to sort of regular dialogue, it also has some sounds and some music in it.
And so let's see how that can look. So let's start playing it. And what you can see here is if you look at the bottom of the screen, you see what the kids are saying, and if you look at the top of the screen, there is some SDH dialogue, like kids cheering or what have you.
Now, what I mentioned before was that in addition to placing the SDH subtitles, the sound effects, at the top of the screen, it's also pretty conventional to give them a different style, again, so they can stand out a little bit more from the regular dialogue and so Bill can take a look at the text style rules there, and you can see that we've got the default style rules, we've also got the fancy rules. So let's choose fancy and play a little bit more. And now what you'll see is that when we have some of the sound effects coming up, like the kids laughing, then that actually has dynamically changed from being white to default to yellow. So let's go again.
[Transcript missing]
So, let's talk about some additional stuff that we have for you in iOS 6. The first thing I want to mention is that you remember last year when we were talking about iOS 5, we introduced fast forward and rewind playback using iframe playlists. In sort of the mad stampede to get iOS 5 shipped, we didn't quite get to the point of hooking that up in MPMoviePlayerController. So people who are using MPMoviePlayerController and want to play their content were kind of left out.
So we've gone and we've addressed that in iOS 6. So now you can see here, using MPMoviePlayerController, you can go into fast forward mode and then go back into your normal playback mode. All you need is an iframe playlist, and it's there for you now.
The next thing I'd like to talk about is actually kind of a favorite of mine. And this is glitch-free switching between different audio encodings. You might remember that ever since the beginning of HTTP live streaming, we've always told you, you can have your variable bit rate video, right? You can have your video at your 12 different bit rates or whatever. But you should keep the same audio encoding across all the different bit rates of video. And the reason we told you this, frankly, is because it's just hard.
It's just difficult to transition from one audio encoding to a different audio encoding. During playback, without producing something audible, without producing a glitch or a pop or a gap in the audio. And we don't want our content to look like that or to sound like that. And we didn't want to inflict that on any of you either. So, fair enough.
The implication of this -- and someone reminded me of this in the lab just yesterday -- was that, okay, I'm an app developer. I want my app approved over 3G. So Apple's telling me that I need a 64 kilobit audio-only stream -- well, not necessarily audio-only, but a 64 kilobit stream -- to meet the 3G approval requirements.
So my audio can't be any higher than 64 kilobits. Okay, that's fine. But I also have a 5 megabit 720p stream, gorgeous video. You're telling me that I've got to use the same not-so-great 64 kilobit audio in my beautiful, gorgeous 720p video? And so the wailing and gnashing of teeth, et cetera, et cetera.
We went back and we looked at this in iOS 6, and it is difficult. It is tricky, but it's not impossible. We went in, we fixed a bunch of edge cases, we sat down with our audio guys, we put our heads together, and we made it work.
You do have to have timestamps that are perfectly in sync between your different audio encodings for this to work perfectly, including getting the right timestamps on the priming packets if you're using AAC. But if you do, it sounds, muah, it's great. And I'm gonna actually play a little bit for you. So let's start with, we pulled out some audio here. We encoded a fairly high bit rate. So fairly good quality audio encoding.
And then we took the same audio clip and we just compressed the heck out of it. And so I'm going to play it for you now. It actually doesn't sound that bad. And it's not, really, given how much it's been compressed. But what's hard to tell, unless you're doing a side-by-side comparison, is that a consequence of that compression is that all the high frequencies have been cut off. And this is actually really important, because this is a fugue and there's some interesting stuff going on in those high registers.
And so what I'm going to show you next is a transition that we actually recorded yesterday of the stream player switching up from that low quality I just played for you to the high quality. And if you listen carefully, you can hear where it switches, and it's just like suddenly all the high frequencies come back. And so let's listen.
Didn't quite get it? It sounded pretty good, didn't it? Let's listen to it once more. I heard it, but I've heard this clip like, I don't know, 30 times before. But that was a switch. It happened about halfway through, and we went from the low quality to the high quality.
If you come by in the lab later with a set of headphones, I can actually play it for you. But the important takeaway was that it sounded perfect. There was no audible gap or discontinuity there. We just seamlessly transitioned from one audio to the other. And so we're hoping that this gives you more flexibility when you author your streams.
Next is another audio feature, and that's AC3. AC3 is also known as Dolby Digital, 5.1 multi-channel surround sound. And we don't do a lot of surround sound on iPhones and iPads because, you know, when you take 5.1 multi-channel sound, you decode it, then you remix it into mono and you put it out the back speaker, you're kind of losing some of the richness there.
That changes, however, if said device is hooked up to an actual audio system that has a 5.1 speaker setup. It can make the experience much more rich in that case. And so there's a couple of different ways that can happen, of course. One is you can go buy our $50 little HDMI connector and you can plug it in and hook it up to your audio receiver or your TV with 5.1 surround.
Another way, which is probably more common, is your content can be AirPlayed to an Apple TV that's connected to someone's home theater system. And in that case, it would be really nice to provide them with 5.1 multi-channel surround. So you can do that. Having said that, you don't know ahead of time whether said user has his 5.1 system set up and turned on, what have you.
And so if you're going to provide AC3, you still need to provide the same content also encoded as AAC. That way, if we're hooked up to a 5.1 system, we'll detect it and we'll automatically choose your AC3. If not, we'll choose the AAC and we'll play as we always have. This selection is made once, when you start playing back your item. And it really is an all-or-nothing deal.
And that means that if we choose AC3, we're only going to switch between the AC3 variants. We're not going to switch back and forth between AC3 and AAC. All your content must be AC3 within that variant, which means your main content and your ads both have to be AC3. The reason for this is because the process of switching between PCM output and AC3 on today's hardware is ugly.
It takes like three to five seconds to happen. It is often accompanied by some ugly and disturbing clunks and clonks from your audio equipment. And so we're not going to switch back and forth. You can provide AC3, but if you do, all or nothing, you've got to have AC3 for all your different bit rates or all the ones you choose to expose for AC3 and all the content within that variant.
How you do it is you simply add another variant to your playlist file. You indicate in the CODECS attribute that it's AC3, and that tells us, ah, this stream has AC3; I should go check to see if we're hooked up to something that can take AC3 and throw that switch if so. For simplicity here, these two variants just use the same video: I've muxed in AC3 in one instance and AAC in the other instance. You can do it that way. That's the easiest way to do it.
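Sketched as a master playlist fragment (the bandwidths and video codec strings are illustrative; "ac-3" is the codec identifier that marks the Dolby Digital variant):

```
#EXT-X-STREAM-INF:BANDWIDTH=2500000,CODECS="avc1.42001e,mp4a.40.2"
video_aac.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,CODECS="avc1.42001e,ac-3"
video_ac3.m3u8
```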
You can also take your AAC and put it into an audio-only playlist, take your AC3, put it into a separate audio-only playlist, and use the media tag so you don't have to duplicate your video. That's probably the more efficient way to go. But those are the ways you put things together to get AC3. And for premium content, I know that there is a fair amount of demand amongst the more discriminating users for this, and so we hope that this will help you satisfy that demand.
In terms of other tweaks that we've done for iOS 6, we spent some time working on our algorithms that decide when to switch bit rates. And specifically we had two focus areas. The first was how we switch up and where we decide to switch up. And the focus of that was to detect when the network connection has a relatively stable bandwidth, which is to say the bandwidth the network is delivering is not fluctuating crazily like this. If it's more steady, if it's more stable, then we will actually choose a bit rate that is higher, closer to that theoretical ceiling of what the network can deliver than we would previously in earlier versions of iOS. We call this reducing the bandwidth cushion.
Correspondingly, if you're on a very highly fluctuating network -- and 3G comes to mind -- we may not switch up quite as optimistically, because we're going to be a little bit more conservative, because we realize that the network is incredibly variable. And even though it might be fast now, it might not be fast two seconds from now. And if we decide to switch up, then we might subsequently stall.
That actually takes us into the second area of focus, which was the switch down algorithm. And what we focused on there was to try to identify points at which bandwidth is dropping rapidly, more quickly than we have previously, and switch to a lower bit rate so that we actually transition to a lower quality prior to the stall, and we don't stall as a consequence.
And so both of these are not really exposed to you as developers, but we do expect you to see, in general, better switching behavior in iOS 6. So we'd like you, when you install the seed and test your apps and test your streams, to be kind of on the lookout for this. In general, it should behave a little bit better than it has previously, and we're hoping that this will lead to better quality for users and fewer stalls.
Of course, at some point, you're always going to stall, right? There's some circumstances where you're just going to stall. The network just can't keep up with what you're trying to deliver. And so we've added a new API in iOS 6 as well, and this is an explicit notification of playback data starvation. The reason we added this was because we had some folks who were trying to do some quality of service monitoring on their streams, and they wanted to know when stalls occurred.
The problem is that on iOS 5, what a stall looks like to someone who's watching playback happen through their AVPlayerItem is that the player item transitions from the playing state into the buffering state for a little bit, then it transitions back into the playing state. The problem is that a data starvation stall is not the only thing that can trigger that transition. We will go through that transition when you seek. We'll go through that transition when you choose a different subtitle alternate.
And there's a few other things that cause us to go through that transition as well. They're not strictly about, "Oh, I couldn't keep up because the network was too slow." And so to help disambiguate that, we have a new notification that is sent explicitly when we had to pause playback and rebuffer because the network just wasn't fast enough. Possibly we had to switch down at the same time.
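In modern Swift spelling, listening for that notification looks roughly like this sketch (the logging is illustrative):

```swift
import AVFoundation
import CoreMedia

func observeStalls(on playerItem: AVPlayerItem) {
    // AVPlayerItemPlaybackStalled fires only for genuine data starvation,
    // not for seeks or subtitle/audio switches.
    NotificationCenter.default.addObserver(
        forName: .AVPlayerItemPlaybackStalled,
        object: playerItem,
        queue: .main
    ) { _ in
        let seconds = CMTimeGetSeconds(playerItem.currentTime())
        print("Playback stalled at \(seconds)s; report this to your QoS monitoring.")
    }
}
```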
Speaking of quality of service monitoring, we do encourage you to do so. We do encourage you to look at the quality of service you're getting on your client devices and validate your assumptions that your set of bit rates and everything else is what you're expecting. And when you do this, the access log and the error log are your friends.
If you haven't looked at the access log and the error log, they were introduced in iOS 5 or 4.2 or something. And the access log is every now and then through playback, we log some interesting events like when we switch bit rates or when we see a stall, a few other different things.
And so it's kind of a ticker tape of what happened during playback. When we run into hard errors, like when we get HTTP 404s on the playlist or on various parts of the stream, those are put into the error log.
So the access log and the error log are even better in iOS 6. We've added some new fields to the access log including the playback type, whether you're live, VOD, et cetera. We've added a lot more information about the bandwidth we're observing from the network, so the min, the max, standard deviation, and also what we saw at the point where we decided we wanted to switch. And then we've also added some things you can kind of keep an eye out for.
For instance, we have the number of segments that actually took longer to download than real time -- we had a 10-second segment, and it took us 12 seconds to download. That's always a red flag. And the number of segments that we downloaded over cellular, which can be interesting, particularly in this day and age of metered billing. And additionally, we've enhanced what we put into the error log, so there's more detail there now. If you're getting any kind of playback error, particularly an unexplained playback error, the error log is the first place you should go, and it may help you understand why things failed.
Speaking of the access log and the error log, you no longer have to poll the logs or use your special ESP powers to figure out when we've added something new to those logs. In iOS 6, we now have notifications you can listen to, and we will fire those notifications whenever a new entry is added to the log. And so you can keep an eye on those.
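A sketch of listening for those notifications and reading the newest entries, again in modern Swift names (the fields printed are just examples of what the logs contain):

```swift
import AVFoundation

func observeLogs(on playerItem: AVPlayerItem) {
    let center = NotificationCenter.default

    // Fired whenever a new entry lands in the access log.
    center.addObserver(forName: .AVPlayerItemNewAccessLogEntry,
                       object: playerItem, queue: .main) { _ in
        if let event = playerItem.accessLog()?.events.last {
            print("observed bandwidth: \(event.observedBitrate), indicated bitrate: \(event.indicatedBitrate)")
        }
    }

    // Fired whenever a new entry lands in the error log.
    center.addObserver(forName: .AVPlayerItemNewErrorLogEntry,
                       object: playerItem, queue: .main) { _ in
        if let event = playerItem.errorLog()?.events.last {
            print("error \(event.errorStatusCode): \(event.errorComment ?? "no comment")")
        }
    }
}
```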
[Transcript missing]
Next, new API. This API is around making your life easier when you're providing content for application-defined URLs. And for those for whom this is a new thing, let me briefly explain. When you tell us to play a playlist file or go get an encryption key or something like that, those are specified as URLs. And in the normal course of affairs, those are http:// URLs, and we go off and load them from the network and get the information that way.
HTTPS, often, in the case of encryption keys. But you have an alternative, and that is you can make up your own URL scheme, you know, abc://. And you can give that to us and say my playlist is at abc://playlist, or what have you. And if you do that, that tells us we have to come back and ask you for what the content is. There's a couple of good reasons to do this. For playlist files, this gives you the ability to customize your playlist files.
For instance, if you wanted to specify some ad content that was dynamically identified, you wanted to sort of pick your ads at run time. And for encryption keys, often you've already built a secure pipeline for delivering the encryption keys from the cloud down to your client.
And rather than then having to set up an entirely new server infrastructure to field our HTTPS requests, you may want to smuggle it down to your application as a sideband and simply hand it over to us when we need it. And so when you do that, you can specify your keys using your custom URLs.
The problem is in iOS 5, in order to get this to work, you had to implement a global, what's called an NSURLProtocol. And it was global, so the state was divorced from all your different objects in your application. And it was a little complicated and it was a little tricky to get right. So in iOS 6, we've introduced a new API. There's a new object called an asset resource loader, and an asset has one of these things. And this object mediates the access to these kinds of problematical URLs.
And so what you can do is your application can set a delegate on this object, on the resource loader. And when it does, when it comes time for us to load one of these URLs we don't recognize, we will package that up in a request that includes the URL and the byte range, if we're asking for a byte range. And we'll say, hey delegate, this resource loader needs to load a URL. Should we wait for you to load it for us?
So what you should do is look at the request, look at the URL, and see if it's your abc:// that you've decided you want to load for us. If it is, you should return yes. At that point, we will sit back, we will twiddle our thumbs, and you go off and you rummage around and you come up with the content for that URL. Once you've got it, you just tell the resource loader: here it is, you can finish your loading, I've got this response. And then we'll go off on our merry way and we'll continue playing.
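Here is a minimal sketch of that flow in modern Swift (the abc:// scheme, the key data, the URLs, and the queue label are placeholders, and the dataRequest/finishLoading spellings are the current API names rather than the iOS 6 era ones):

```swift
import AVFoundation

// A resource loader delegate that serves a custom "abc://" key URL
// from data the app already holds.
final class KeyLoader: NSObject, AVAssetResourceLoaderDelegate {
    let keyData: Data

    init(keyData: Data) { self.keyData = keyData }

    func resourceLoader(_ resourceLoader: AVAssetResourceLoader,
                        shouldWaitForLoadingOfRequestedResource loadingRequest: AVAssetResourceLoadingRequest) -> Bool {
        // Only handle our own scheme; return false so AVFoundation handles everything else.
        guard loadingRequest.request.url?.scheme == "abc" else { return false }

        // Hand the key bytes back and tell the loader we're done.
        loadingRequest.dataRequest?.respond(with: keyData)
        loadingRequest.finishLoading()
        return true
    }
}

// Usage: attach the delegate before creating the player item.
// `myKeyData` stands in for key bytes the app delivered through its own pipeline.
let asset = AVURLAsset(url: URL(string: "https://example.com/master.m3u8")!)  // placeholder URL
let loader = KeyLoader(keyData: myKeyData)
asset.resourceLoader.setDelegate(loader, queue: DispatchQueue(label: "key-loader"))
let item = AVPlayerItem(asset: asset)
```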
So, this is an extremely straightforward process. It's much easier than NSURLProtocol. You can tie it in to the rest of your object framework. We hope this makes it easier to supply your own content to us dynamically, your own playlists and keys, and maybe it'll help you clean up your code a little bit as well. The last thing I want to talk about today is sample-level stream encryption. What we're doing here is we're giving you an alternative way to encrypt your content.
As you know, since the beginning, we've had a provision for encrypting content, and it was simply once you've got all your segments, you take each segment file and you just encrypt the entire thing. And this has the benefit of simplicity. It's extremely easy to author, and we do like it.
But it has some downsides as well. One of the downsides is that everything gets encrypted -- all the structural information in the file, all the padding, of which in TS there can be a lot -- and that means we have to decrypt everything, even though you don't have to encrypt everything to protect your content. You only have to protect the samples.
And so the fact that we have to decrypt all this extra stuff on the device means more CPU, which means worse battery. It's not an ideal situation. The other thing about just going off and encrypting everything is you want your files to remain in their encrypted state as much as possible. That's the most secure thing. And so you don't want to have to decrypt your files.
But while they're in this encrypted state, when the entire file is encrypted, any tools that you may be running that do things like segmentation or validation or anything like that, those files are totally opaque in their encrypted state. And so to do anything with them, you have to decrypt them first. And so that's not an ideal thing either.
So we're providing a new alternative in iOS 6. And what we've done is we've defined a format where only the elementary audio and video streams are encrypted -- and that's basically media-engineer talk for just the samples. And so we just encrypt the samples, and then they're packetized and put into your standard .ts files and AAC files in the regular way. And you end up with what looks pretty much like a vanilla transport stream or a vanilla MPEG audio elementary stream.
The difference, of course, is if you look at the samples, they look like, you know, they look like random crap. And so to help you not get into trouble there, we have defined new stream types for the encrypted variants in the transport streams. And, of course, when the samples are encrypted, you can no longer sniff the samples to detect what kind of codec you're going to need.
And so we've also made a provision for carrying the codec information for the audio setup in a new descriptor that's carried in the clear. So you can just take a look at that and you'll know, you can tell that you'll need 128 kilobit AAC, 44.1, what have you. To indicate to us that your files are in this new format, you just have to use a new METHOD value in the key tag: it's SAMPLE-AES instead of the old AES-128.
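In the media playlist, that just changes the key tag; a sketch with a placeholder key URI and IV:

```
#EXT-X-KEY:METHOD=SAMPLE-AES,URI="abc://keys/key42",IV=0x9c7db8778570d05c3177c349fd9236aa
```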
Of course, to use it, you need to be able to author them. And so we will be providing you with the format details of this. We'll do it eventually, of course, in a spec update of the internet draft. If you are authoring, if you're responsible for an authoring tool yourself and you'd like to know about the format earlier, contact us offline and we'll get you set up with format details ahead of the public release so you can get a head start on that.
So that's it for iOS 6 for HTTP live streaming. What I like to say in summary is now we've got subtitles. They're really easy to add. They're really easy to author. Go off, check them out. Come to us. Give us your feedback if you have any on those subtitles.
We have made significant changes to the player engine. We've made some improvements. We've made some additions to the API. So install the seed, test your apps, test your streams. Let us know if you see problems and we'll see what we can do about addressing them before iOS 6 ships.
And finally, if your app does AirPlay, then all of these changes and improvements have been carried over to Apple TV as well. If your app does AirPlay, go off and buy an Apple TV. They're $99. Install a WWDC seed on it. Test out AirPlay. It should work better. We'd like to know if it doesn't.
I'm hoping that I've given you more than enough information, more than you could ever want, in this talk. Of course, there's a possibility I haven't. And so if not, the first place you should look is our fantastic resources page at developer.apple.com/resources/http-streaming. That has specification information. It has sample code. It has best practices, tech notes, videos, et cetera, et cetera.
Check that out. It will probably answer your question. I've got the WebVTT link on here one more time. Eryk Vershen is our evangelist. He loves answering your questions. But Eryk also hangs out on the ADC forums, and so do a lot of other smart people. And if you have a problem, there's a good chance someone else has also had that problem. In fact, they might already have the answer. So take a look at the forums. Search for your problem.