WWDC09 • Session 313

HTTP Live Streaming Introduction

iPhone • 51:13

HTTP Live Streaming is a revolutionary new way to deliver a live video experience using the same technology that powers the web. Learn about the HTTP Live Streaming architecture, technology requirements, and how to prepare content for streaming. See the easy steps to integrate streaming into your application to provide live or on-demand video to your users.

Speakers: Roger Pantos, Bill May

Unlisted on Apple Developer site

Downloads from Apple

SD Video (107 MB)

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Good morning. My name is Roger Pantos and I work on the QuickTime engineering team here at Apple. And today what I'd like to talk to you about is how with iPhone OS 3.0 or Snow Leopard, you can watch live streams on your devices using HTTP. Now for some of you, that whole notion is like coming out of nowhere, so let me set that up a little bit for you by looking at some recent history.

The iPhone has always been able to play back movies over the network. Using HTTP content requests, we were able to provide a really high quality user experience essentially by playing static movie files off of HTTP servers, and we call that technique progressive download. It works extremely well for content that's already been produced such as podcasts or video podcasts, or even YouTube. And it has been extremely popular on the iPhone. In fact, YouTube is one of the most popular applications we ship on the iPhone.

But the moment we supported it, we immediately started getting another feature request. What about live content? People wanted to be able to watch events as they were occurring on their phones. They wanted to be able to watch newscasts; they wanted to be able to watch baseball games. And so when we decided to address this, we started by figuring out what our goals would be.

The first thing we decided was we wanted to build something that would be tuned for large-scale broadcast-style content delivery. We wanted something that would scale up to millions of simultaneous viewers. Now one of the things we have learned from our previous experience with streaming is that if you have to build up your own infrastructure to scale to that level of demand, it can be extremely expensive.

And for a lot of companies, that expense is cost prohibitive. So in addition to being scalable, we also wanted to build something that would scale in a way that was cost effective. Another thing we learned from our streaming experience was that we wanted something that would negotiate firewalls and NATs, network address translation boxes, fairly effectively. This improves the user experience and it reduces our customer support cost, and it also reduces yours.

Finally, we wanted to build something that was easy. We wanted it to be easy to adopt, and we wanted it to be easy to interoperate with the existing production systems that content producers have for producing live streams. One of the keys to doing this was to leverage existing standards.

So we sat down with these goals, and when we started thinking about them, we concluded pretty quickly that we wanted to use HTTP. We have a lot of experience with HTTP and we like it a lot. It is extremely widely deployed and well supported. Just about every company in the world now has a website, and there are a lot of companies that have really highly-skilled people who can make those websites sing. The other thing about HTTP is that it has demonstrated the ability to deliver massive amounts of data. Just at Apple, we deliver hundreds of gigabytes every day just in software updates.

There are a number of proven techniques that you can use to scale HTTP to high demand. You can offload a single server by using server load balancing. You can create caching proxies that will help you scale your demand geographically. And finally, there is an entire industry of companies, the so-called content delivery networks, CDNs, Akamais, Limelights, people like that, that are set up to essentially rent you the capacity to scale HTTP on demand. And so it's an extremely cost effective way of delivering massive amounts of information over the internet.

Finally, HTTP is extremely adept at passing through firewalls, and because it's so popular on the internet, just about every network address translation box is preconfigured to support it out of the box. This is extremely important particularly for iPhone users, because iPhones will often access a network through a NATed Wi-Fi hotspot. And some of these hotspots are in places like airports and coffee shops where even if the user was knowledgeable enough to reconfigure the NAT, they have no administrative access to it. So it was really important to find a solution that would work well with firewalls and NATs.

The thing about HTTP though, is that it's really tuned, it's really optimized, it's really prioritized for delivering static files; that's what all the third-party supported products are designed for, that's what all the infrastructure has really been set up for. So you have to ask yourself, OK, how do we take something like HTTP that really delivers those static files and deliver live streams over it? Well, the first approach you can think of is probably not too far from what we're actually doing.

Let's take a look. What we do is up there on the left, we start with a live audio/video signal. This signal could be coming directly from a camera, it could be coming off a satellite feed, it could even be coming off a tape deck, whatever. We take that audio/video signal and we feed it into a media encoder.

Now what this media encoder will do is, digitize the signal, encode it, and produce a continuous stream of digital media. That stream of digital media is then sent to the segmenter step. What we do in the segmenter is, we divide that continuous stream of digital media into a series of individual segment files.

And typically each segment file has about the same duration of media, so for instance each segment file might be ten seconds of media. Once we have these individual segment files, we have something that we can place on a web server, and we can then use the native capabilities of HTTP to distribute it through the cloud to our clients on the other end.

So now what I'd like to do is take a look at this in action, and so I'm going to invite my colleague, Bill May, to come up on stage. And what we're going to show you is a third-party application from CNN, I think we showed it to you a little earlier in the week as well.

It's the cnn.com application, so Bill is going to click it here and it's going to come up. The first thing it's going to do is, it's going to go off the network and it's going to populate itself with some news of the day. There's a live tab down at the bottom, Bill has clicked it. And so now you can see the live features of the CNN application.

So if you tap on that large picture at the top, that will launch the live feed. And so it's launching here now.

And [phonetic] accuracy is very important -

And here we are. There we go, OK.

She's a record holder now, she holds three now -

Wow.

-- world records.

[ Applause ]

And it looks great, it's really fantastic. And the cnn.com developers were able to add this capability to their application really with just a couple of lines of code. And so it's very simple to watch CNN on your phone, but it's also very awesome. And so we're really excited about that.

What we're going to do next is take you a little bit behind the scenes and show you the equipment we were actually using to demonstrate, to put together that demo. In this case, the audio/video input that we were starting from was originating at the CNN broadcast center in Atlanta.

The audio/video signal was being sent into an Inlet Technologies Spinnaker 7000. Now this is a box that can consume an analog signal and produce an MPEG2 transport stream. So we take that MPEG2 transport stream and we send it via multicast UDP across a local network to a segmenter program that's running on an Xserve.

The segmenter program consumes the transport stream, divides it up into a series of segment files, and places them onto a web server, which then acts as an origin to the Akamai network. The iPhone we have up here on the stage was able to pull those files off a local edge server from Akamai here in San Francisco. That's kind of the marketing level overview.

What I'd like to do next is talk a little bit more about how the actual protocol works. So as I said, we start by having a server that converts a continuous stream of media into segment files. As it's writing these segment files to disc, it's also maintaining a playlist file. And a playlist file is essentially just a file that has a list of all of the segment files that are available on the server.

So when we want to play, the way it works is, the client starts by downloading the playlist file, now it has a list of all the available segments, it downloads each segment in turn and then plays them. In the case of a live stream, after it's done that, it will then go back to the server, redownload the playlist file, and see what's changed, see if there are new segments.

So clearly the key to all of this is this notion of a playlist file that is continuously being updated. So I want to show you how that looks. What we have up here is a diagram of a series of segment files that have been produced by our segmenter. Those blue segment files in the box are the files that actually appear in the playlist.

You'll notice off to the right we have segment number six. It's not in the playlist yet because it's actually currently being written. Remember that we were going to write about ten seconds of media into each file, and so we always have a current file that we're writing the media into, and maybe it's only two or three seconds in, so it's not part of the playlist yet. But if we wait ten seconds and then download that playlist again, you'll see that we still have three segments in the playlist file, but now segment six has become the last segment in the playlist file.

You'll also note that segment three is no longer in the playlist file because we have a live playlist here; we have a fixed number of segment files in the playlist at any given time. So as a new segment file is added to the playlist, the old segment file rolls off the top. So let me show you what a playlist file actually looks like.

Here we go. When we went looking at the different playlist file formats that had already been defined, we ended up choosing the M3U playlist file format. It's extremely popular on the internet largely because it is extremely simple. An M3U file is essentially just a text file that contains a list of URLs, each URL being a piece of media to play. The other thing an M3U file has in it are tags; these are the things that are prefixed by a hash mark. So that first tag you see up there, EXTM3U, simply indicates that this text file is an extended M3U playlist, that's that guy there.

The next tags, the ones that have the X in them, are tags that we added to the M3U format to support this notion of a dynamic playlist. So this one here, TARGETDURATION, indicates to the client the likely duration of each new segment that will be added to the playlist file, in this case about ten seconds, and that is used as a hint to the client when it's refreshing that playlist file.

The next tag, the media sequence number, indicates the media sequence number of the first URL that appears in the playlist. And what the client uses this for is to resync itself inside the playlist when it refreshes the playlist, because it may not be refreshing a playlist in sync with how the server is recreating it.

And so right now the media sequence is one and the first segment in the playlist is one. If we were to wait ten seconds and download it again, then you'll see segment one rolls off the top and the media sequence number is two. And in fact, the first segment in the playlist file is two, and so on and so forth.
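
As a rough reconstruction of the slide (the host and segment names are hypothetical), the live playlist might first look like this:

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:1
    #EXTINF:10,
    http://example.com/segment1.ts
    #EXTINF:10,
    http://example.com/segment2.ts
    #EXTINF:10,
    http://example.com/segment3.ts

and then, ten seconds later, after one refresh:

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:2
    #EXTINF:10,
    http://example.com/segment2.ts
    #EXTINF:10,
    http://example.com/segment3.ts
    #EXTINF:10,
    http://example.com/segment4.ts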

So those are the media sequence tags. That's essentially how the live playlist works. As new segments are added to the bottom, old segments roll off the top. So now let's take a closer look at one of those individual segments and see what kind of animals they are. The primary file format for segment files is an MPEG2 transport stream. Now an MPEG2 transport stream is an industry standard for the representation of continuous media. It is extremely popular in the broadcast industry.

It is used by cable providers. It is used by over-the-air digital TV transmission. It is used by satellite providers. And in fact now it's also being used by Blu-ray. One of the advantages of the ubiquity of this standard in the industry is that there are already a large number of third-party products, both hardware and software, and also people who understand MPEG2 transport streams.

One of the things about MPEG2 transport streams however, is that there is a certain amount of overhead associated with them. Now this overhead is not such a big deal when you have both audio and video, but it really starts to show up when you have audio-only streams. So for this reason, we also support as a supplementary format, MPEG elementary audio streams.

And what these guys are, is essentially just a series of audio packets concatenated together into a file. And they're extremely well suited for delivering high-quality audio at extremely low bit rates. Speaking of which, what kind of codecs are we using inside these containers? Well there's no surprise here. MPEG2 transport streams contain audio and video. For the video, it's H.264. That means on the iPhone, H.264 Baseline Profile Level 3.0. On a desktop machine where you've got larger displays and some faster processors, we'll currently support up to H.264 High Profile Level 4.0.

For audio, our recommendation is to use HE-AAC version one. And the reason we recommend it is because it produces the highest audio quality at the sort of bit rates that we typically use for streaming. These bit rates for audio are typically between 32 and 80 kilobits per second.

We also support the traditional low complexity AAC as well as our old standby MP3. So if you have some media which is encoded with H.264 video, AAC audio, you put it into a transport stream, you chop it up into segments and stick it on the web, you can play it in your application.

How do you do it? Well it's easy. On the iPhone we play live HTTP streams using the same API that we use for progressive download. The only difference is that the URL you feed to the MPMoviePlayerController is the URL to the playlist file rather than to a static MP4 file. You can also play live streams in mobile Safari. And to do that, you'll use the HTML5 video tag, and you'll provide the URL of the M3U file as the source attribute of the video tag.
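
As a minimal sketch of both approaches (the playlist URL is hypothetical):

    // iPhone OS 3.0: play a live stream with MPMoviePlayerController.
    // The only difference from progressive download is that the URL
    // points at the playlist file.
    #import <MediaPlayer/MediaPlayer.h>

    NSURL *url = [NSURL URLWithString:
        @"http://example.com/stream/prog_index.m3u8"];
    MPMoviePlayerController *player =
        [[MPMoviePlayerController alloc] initWithContentURL:url];
    [player play];

And in mobile Safari, the same URL goes in the video tag's source attribute:

    <video src="http://example.com/stream/prog_index.m3u8" controls></video>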

On the desktop we can play HTTP live streams through QTKit. HTTP live streaming is one of the new features of the QuickTime X support in QTKit, it's one of the benefits you get from adopting QuickTime X in your application. And so the way you get it is you use the QTMovie initWithAttributes call, you set the QTMovieOpenForPlaybackAttribute to yes to activate the QuickTime X support, and then you provide, again, the URL to the playlist file as the QTMovieURLAttribute.
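
A sketch of that QTKit call on Snow Leopard, again with a hypothetical playlist URL:

    // Open an HTTP live stream via QTKit's QuickTime X path.
    #import <QTKit/QTKit.h>

    NSError *error = nil;
    NSDictionary *attrs = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSURL URLWithString:@"http://example.com/stream/prog_index.m3u8"],
            QTMovieURLAttribute,
        [NSNumber numberWithBool:YES],
            QTMovieOpenForPlaybackAttribute, // activates QuickTime X support
        nil];
    QTMovie *movie = [[QTMovie alloc] initWithAttributes:attrs
                                                   error:&error];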

So that is essentially all you need to know to play live streams on your applications, it's very straightforward. But I'm standing up here, I've got some time, so I'd like to talk to you about a few of the more advanced features that we offer in HTTP streaming. The first one is media encryption. A number of content owners care a lot about protecting their assets, and so we provide the ability to encrypt the segment files in transit.

Another thing we have is the ability to capture a live stream as it's being streamed, capture a live event for later playback, and we refer to that as on-demand presentations. And finally, perhaps one of the most exciting features in HTTP live streaming is the ability for the player to switch dynamically between streams of different bit rates to adapt to the current network quality. So let's talk about encryption first.

When we encrypt segment files, we use 128-bit AES encryption. Within the segment file, we use cipher block chaining, and at the edges we pad them with PKCS7 padding. Encryption is typically done at the segmentation phase, so we'll encrypt the segment files before we distribute them over the web. After you've encrypted the segment files, the next step is to deliver the encryption keys to your clients.

Now the encryption keys are delivered over HTTP, and so it's up to you to set up some means of access control to ensure that only authorized clients are able to retrieve those keys. And to do this you can use any of a set of well-known, well-established HTTP techniques such as creating a secure realm that requires HTTP authentication, or establishing a secure session cookie and requiring the presence of that cookie in order to release the key. And so let's take a look at how encryption is represented in the playlist file.

You'll see that we have a new tag up here, the EXT-X-KEY tag. And what it's telling us is that all subsequent media files that appear in the playlist will be encrypted using AES-128 encryption, and that the key to decrypting those files can be obtained from this URL here.

If you want to, you can switch the key you're using once you've decided you've encrypted enough files with it; you can rotate the keys. And the way you do this is you just add a new instance of the tag into the playlist. So what this playlist is now saying is that segment 39 is encrypted with the first key, and segment 40 and all subsequent segments will be encrypted with the second key, at least until it changes again.
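
Reconstructed from that description (the key and segment URLs are hypothetical), a playlist with a key rotation might look like this:

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:39
    #EXT-X-KEY:METHOD=AES-128,URI="https://example.com/keys/key1"
    #EXTINF:10,
    http://example.com/segment39.ts
    #EXT-X-KEY:METHOD=AES-128,URI="https://example.com/keys/key2"
    #EXTINF:10,
    http://example.com/segment40.ts
    #EXTINF:10,
    http://example.com/segment41.ts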

So using this technique, you can encrypt your media, but you can still deliver your media over HTTP, which is fast and cheap. The next thing I'd like to talk about is live versus on-demand presentations. Now essentially the only difference here, in our terminology, is that in a live presentation, when you're adding new segments onto the end of a playlist file, the old ones disappear off the top. But what if they didn't disappear off the top?

What if while you were adding new segments to your playlist file you simply kept the old ones? Well, that's an on-demand presentation. And they're really focused at two different usage models. The first is, well, let's say you're CNN, let's say you're a 24-hour news station. You're broadcasting continuously; you don't want to keep all of your segment files on your website going back 50 days. You don't want to have your playlist file expanding indefinitely.

And so you'd use a live presentation, the guy on top. On the other hand, let's say that you're broadcasting a baseball game for instance. Something that has a well-defined beginning and a well-defined end is actually really well suited for on-demand playback, because I as a viewer could tune into the baseball game half an hour after it started, but I could still watch it from the beginning. I could also seek around it a little bit, or even jump to live and start watching it live. And so there are some benefits of having all the segments remain available on the server.

And so we offer a choice there. When the game ends, in the example of the baseball game, the last segment file is added to the playlist and the playlist is marked as complete. This tells the client it can stop refreshing the playlist for changes. So now let's compare the actual playlist files of live streams versus on-demand streams. The first on the left here is a live playlist, and as I said earlier, what happens is every time a segment is added to the end of the playlist file, an old segment rolls off the top.

Well, contrast that with an on-demand playlist file. So segments start coming in here at the beginning when the game begins, and they'll remain in the playlist as new segments are added until the game ends. And this might end at segment 1130 for instance. The last thing the server is going to do is place an ENDLIST tag into the playlist, and this informs the client that the playlist is now complete. What you have at this point is something that is kind of akin to a progressive download situation where you have a bunch of media up on your web server; the client can go back to it at any point they want and watch it in its entirety.
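
So a completed on-demand playlist for the baseball game might look something like this (segment names hypothetical):

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:1
    #EXTINF:10,
    http://example.com/game/segment1.ts
    #EXTINF:10,
    http://example.com/game/segment2.ts
    ...
    #EXTINF:10,
    http://example.com/game/segment1130.ts
    #EXT-X-ENDLIST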

The only difference between an HTTP live streaming on-demand playlist and the more traditional approach of sticking MPEG4 files up on an HTTP server is that today our players give you a little less control in terms of the granularity of seeking around in an on-demand playlist. But you still get the benefits of encryption and stream switching, so you kind of have to decide which one makes more sense. But one of the nice things about on-demand is that it's almost instantaneous to transition from broadcasting a live event to having an archive available as on-demand on your server.

Speaking of stream switching, let's talk about the problem of network variability. So when I say network variability, what I'm essentially saying is that you've got your iPhone, you've got your Macintosh, and you're downloading information from the network. The network bandwidth that is available to you through whatever connection you have is going to vary, and sometimes it's going to vary a lot. This has always been a problem for anyone who wants to stream video over the internet.

But the iPhone particularly has this problem in spades. Let's say you're starting at home and your iPhone is associated with your Wi-Fi hotspot at home, backed by your cable modem. You've got tons of great bandwidth available to you. But then you leave. You walk out of your house, you take your iPhone with you, you drop off your Wi-Fi network onto 3G, and today that drop-off in bandwidth is considerable. And the variations don't stop there. As you walk around with your iPhone, as you pass buildings, as you go under trees, as you get closer to and further away from cell towers, your bandwidth will change considerably. You don't even have to move, in fact.

With 3G the bandwidth will change depending on how many people are in your cell and how many people are using the network in your cell to the backhaul. And so the iPhone particularly needs to deal with a network bandwidth that is fluctuating continuously. Now if all you had was a single stream at a single bit rate and you wanted to deliver that to your users, you've got this kind of ugly choice you have to make. On the one hand, you could encode a low bit rate stream that everyone could watch.

Great, but it's not going to look very good. On the other hand, you could encode a higher bit rate stream that looks great, but most of your users will not be able to sustain that bandwidth as they're downloading it. And sustaining the bandwidth is particularly important for live streams.

The reason being that if you have to stop and buffer for 20 or 30 seconds because your network connection can't keep up, well, you have live content that's expiring on the server as you're waiting. When you finally get back to playing, your users are going to notice a discontinuity because the next content you were planning on playing is no longer there and they have to skip ahead. So it's key to the user experience to be able to sustain playback.

So the best approach to delivering this kind of user experience on the internet, the wide open internet, is to encode multiple versions of your stream and to have the client switch between them dynamically based on its view of the current bandwidth available to it. That's what we do in HTTP live streaming. And the way we express it is through something called a variant playlist.

Now a variant playlist is an M3U file like our other playlists; the difference is that instead of the URLs being media segment files, the URLs are actually pointing to individual streams. Each URL is prefixed by a tag that indicates information about the stream, such as the overall bit rate required to sustain viewing it, to keep up with it, and optionally some information about the codecs.

So the way this works on the client end is the client starts by downloading the variant playlist, this gives it a sense of which variants are available, what kind of alternatives it has to viewing a stream. It will then decide if and when to switch based on current network conditions. Now our current player algorithms have been tuned to minimize the risk of creating that kind of network stall that is so disruptive to a live streaming experience.

I think that this is pretty clear, but the graphics guys got together and created this really great slide, so I have to show you this slide. So let's say that you set up a live playlist, you've got some content, it's kind of a medium resolution, medium bit rate. And you've got your playlist up there and segments are appearing and disappearing through that playlist. What if, at the same time you put those files up on your web server, you also put two other streams up on your web server?

These are all peer streams; your client could choose any one of them and play it and get the regular streaming experience. It could play the very low resolution one at the very low bit rate, the medium resolution one, or the very high resolution one at the highest bit rate. Or you could put the URLs for all three variant streams into a variant playlist file and give that to the client. Now the client knows about the existence of all three of these streams and it can go back and forth between them as it chooses.

So let's see how that actually looks in a playlist file. So again we have an extended M3U playlist up here; it's got three URLs in it: one for low, one for mid, and one for high. Each URL is prefixed by that stream info tag, and what each stream info tag is saying, first of all, is that all three streams are program ID 1, so they all refer to the same content.

The difference is that each one has a different bandwidth. The first one has a bandwidth of 128 kilobits per second, the second has 256, and the third has 768. So with this information, the client can choose which one it wishes to play and then switch between them as its network changes.
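
A sketch of that variant playlist, with hypothetical URLs (BANDWIDTH values are in bits per second):

    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=128000
    http://example.com/low/prog_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=256000
    http://example.com/mid/prog_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=768000
    http://example.com/high/prog_index.m3u8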

In our current implementation, the players will actually start with the first URL in that playlist, but that may change in a subsequent release. Another thing you can express with a variant playlist is the set of codecs that are required to play a stream. And this can come in useful in cases like this. Let's say that you decided that in addition to these three streams, which you would expect pretty much any client of your services to be able to play, you also wanted to provide a higher bit rate stream that had a much higher resolution. And in order to achieve that resolution, you needed to use some more advanced tools from the H.264 codec.

In that case, you could add something like this to your playlist. Now this fourth stream here, in addition to having a bit rate of 1.5 megabits per second, has a codecs tag. The codecs tag is formatted according to the rules of RFC 4281. I'll translate it for you.

That first element, avc1.77.40, indicates that the stream requires an H.264 codec, Main Profile Level 4.0. That second entry, mp4a.40.5, indicates that it needs an audio codec which is HE-AAC version one. So a Macintosh downloading this variant playlist might well attempt to switch up to that 1.5 megabit stream because it knows it can play it. Today's iPhones, on the other hand, will not, because they know that they don't have a Main Profile Level 4.0 codec. Another thing that you can express with a codecs tag is the fact that a stream is audio only.

The codecs tag, when it's present, indicates the complete set of codecs that must be installed in order to play the stream. What that means is if you have something like this, you've got a fairly low bit rate stream at 64 kilobits, and you have a codecs tag that has only the HE-AAC entry. Because there's no AVC in there, there's no video codec in there, you can conclude that that stream is audio only.
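
Putting those two cases together, the extra variant playlist entries might look like this (URLs hypothetical): a high bit rate stream that declares its codecs, and a 64 kilobit audio-only stream that declares only HE-AAC:

    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000,CODECS="avc1.77.40, mp4a.40.5"
    http://example.com/highest/prog_index.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=64000,CODECS="mp4a.40.5"
    http://example.com/audio/prog_index.m3u8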

And so you can add something like this so that your users will switch down to an audio-only version of your stream in the case of an extremely low bandwidth connection. So what I'd like to show you now is what this looks like when we put everything together, and so I'd like to bring Bill up on stage again. What we're going to start with here is streaming on Snow Leopard.

So we're going to go into Safari in the browser, and before we start playing, what we're going to start with is an extremely low bit rate stream. And if you watch it carefully, you'll see it switch up to a higher bit rate once we decide we have enough bandwidth. So Bill can go ahead and play it.

At Accuweather.com through the afternoon and evening.

So it's opening in Snow Leopard here.

meteorologist, Dr. Joe Sobol.

And -

[ Music ]

And can we switch?

Here's positively the best place to start your day.

Let's run it again.

Here's where night lights up in the most unexpected way.

We'll close that and start it again.

Here's positively the best -

So you can see it's really pixelated, it's a low frame rate.

Here's where your night -

This is about a 200 kilobit stream here.

-- in unexpected ways.

I think the previous time we went it switched almost instantly, so we weren't able to see it. But the point at which we switch is going to depend on a number of things such as the cadence of the input file -

-- and of the I frames that come in.

I can switch now, you can see the higher frame rate. And so this is about a one megabit stream; it looks a lot better. The other thing about HTTP live streaming is that we have new transport controls. And so if we can bring those up here. So what we have on the left-hand side is a back-30-seconds button.

And so if you click that, you'll drop back 30 seconds into the stream, so long as there is enough media in the playlist file. There has to be enough media in the playlist file for us to seek backward 30 seconds. So I want you to do that, Bill.

And so we're going to jump back 30 seconds here, we'll see this fellow we saw earlier, and so he's talking about the weather. I think I don't want to hear about damaging storm threat on a Wednesday night, so we're going to click the button on the right, which jumps back to live. So we'll click that and now we're back to the live stream. So that's how the transport controls work. I'd like to show you the same thing running on an iPhone.

And so what Bill is going to do now is move away from his Macintosh and go over to his iPhone. So we have an iPhone up here. We're going to tap our little web link, we go to the same web page, and we're going to play the stream. So we start playback here. Chug, chug, chug.

The Big Book of Parenting Solutions

There we are.

The Big Book of Parenting Solutions

Again we're starting with the low bit rate stream here, it's low frame rate and fairly pixelated, and I believe we just switched. And so now we're at the full resolution and the full bit rate. We have the same transport controls on the iPhone, so we tap that. You can see the back-30 and the jump to live. And so we can go back 30 even though we didn't start watching at the 30 second point, at least I trust that we can.

Here we are. Um, how are we doing?

>>ABC News medical editor, Dr. Tim Johnson. Dr. Tim.

Thank you, Tanya.

OK, so we've got the same transport controls as we have on the iPhone.

-- is Dr. Michelle Borba

Great. Thanks, Bill.

-- who is an educational psychologist

OK. Thank you.

[ Applause ]

Thank you very much. So now let's take a look again behind the scenes at what was going on there. Again we started with an audio/video signal. In this case, the ABC news feed was being fed by satellite to the Yahoo broadcast center, which is in Dallas, Texas. The audio/video signal was then transferred again to an Inlet Technologies Spinnaker 7000. This time however, the Spinnaker, instead of producing a single MPEG2 transport stream and feeding it to our segmenter on the Xserve, was producing two.

The Spinnaker 7000 has the capability to encode up to four real-time streams at different resolutions and bit rates at the same time. So for our demo, we set it up to encode two, one at 200 kilobits and one at about 900 kilobits. And both of those transport streams are then fed over multicast UDP across the local network to our Xserve, which is running two instances of the segmenter application.

And each of those is producing its own list of segments, its own playlist file, and placing them on a web server. Now from there it's the same thing: the web server acts as an origin into Akamai, and the clients can then download those segment files from a local edge server here in San Francisco.

So one thing you'll notice about that demo there was that I didn't use a custom application; all I had was a browser. It's not actually necessary to have an application to deliver a great live streaming experience to your users. It's great to have an application, you can add new additional features to it, and I'll show you one in a second. But if all you want to do is deliver your live stream and you don't want to get into the business of running an iPhone application, well, we'd love for you to write an iPhone application, but you don't have to. All you need is the server.

So let's take a look at what's necessary to do that. The content deployment story from a server for HTTP live streaming is pretty simple. It breaks down into three steps. The first step is you need to create a continuous MPEG2 transport stream. The second step is you need to divide that MPEG2 transport stream into individual segments and create a playlist file. Once you have those segments in that playlist file, the third step is to simply serve them out over the web.

So let's look at each one of those. To create your MPEG2 transport stream, you need an MPEG2 transport stream that contains H.264 video and AAC audio. One of the benefits of using industry standards like MPEG2 transport streams and H.264 and AAC is that hardware and software to produce these streams already exist. And we've been working with the leading vendors of encoding hardware to tune their support for HTTP live streaming. Now these boxes work really, really well, but they're not cheap.

So an alternative is you can use software to create your MPEG2 transport stream. And in fact, there are some Open Source packages that are capable of producing transport streams that contain H.264 and AAC. Less than a week after we published the specification for HTTP live streaming, some folks in the Open Source community got together and tweaked these packages to work correctly with HTTP live streaming.

So this is fantastic. So there are a number of tools that exist today to produce these MPEG2 transport streams, and we're confident that there will be even more coming soon. In any case, you've got your MPEG2 transport stream, the next step is to segment it. When you decide to segment, you have to make some choices.

The first choice you have to make is, are you going to hold on to all your segments on your server forever, or for as long as the event lasts, or are you going to throw them away as new segments are generated? In other words, will your presentation be live or on-demand? The second choice is, you have to choose the target duration. And remember, we had that target-duration tag in the playlist file. The target duration specifies the average duration of each segment that you intend to generate.

Now there's a tradeoff here. A long target duration, or any target duration really, creates an inherent delay between the point at which you on the encoding side receive your AV signal, and the point where your user who is watching at home, is able to see that event occur. And the reason for this is because we can only distribute segment files over HTTP once they're complete.

So let's say you chose a target duration of 20 seconds, that means when you're dealing with a live stream, you're going to have to sit there on the encoding end and accumulate 20 seconds of media before you can place it on your web server and make it available to users. So they're going to have to wait at least 20 seconds before they can see those events.

So long target duration is good. No, long target duration is bad. Short target duration is good. Unfortunately the flip side of this is that the shorter your target durations, the more often the clients will be requesting playlist file updates from your servers, and the more files you'll have in the pipe. This isn't as much of a big deal when you're talking about a few clients playing something off of a local HTTP server. But once you're talking about distributing your stream across the country and you've got a million users, all those little requests really start to add up.

So we've done a lot of testing and we've spoken to some large content delivery networks, and we've ended up with a recommendation that ten seconds produces a nice balance between having, on the one hand, a kind of taped delay that's not too extreme (it's about 30 seconds, sort of akin to the delay you would see when watching digital television broadcast over the air), and having a network load that's not too outrageous. So we recommend about ten seconds, but you can play around with it in your own applications and see how it feels. The next decision you need to make is the number of segments that you'll have in your playlist at any given time.

This obviously applies only to a live playlist; for an on-demand playlist you'll have as many segments as the duration of the event requires. But for a live playlist file, you need to choose the number of segments. Now you need to have at least three segments in your playlist file at any given time, but you can have more.

And one of the advantages of having more is it allows your clients to seek around. It also allows them to pause for a period of time and resume, and still watch the live presentation and not miss anything. Obviously if they pause for too long, then when they resume they're going to go back to live.

But if you give them a nice long window in which they can seek, then we will also pause for that period of time and they won't lose any of their presentation. So we think that's a pretty nice user experience, and so we recommend that when you create your playlist files, when you set them up, that you set them up to have at least 60 seconds of media in each playlist file and more if you choose.

In any case, once you've decided how you want your playlist file to look, the next step is to create it. And here we have a couple of choices. The first choice is you can use our media stream segmenter. This is a tool that you can download from the iPhone developer center, it's also available on the Snow Leopard seed [phonetic], so there's a couple of different ways to get it.

The way it works is it's a command line tool; it will consume an MPEG2 transport stream either from a Unix pipe or from a UDP socket. And as it consumes the MPEG2 transport stream, it writes out these individual segment files. In the case of a live stream, it will also delete old segment files as they roll off the top of the playlist so your hard disc doesn't fill up.

It can also encrypt the segment files as it writes them out to disc, and finally it has an option to produce an audio-only stream, as one of the MPEG elementary streams, from an MPEG2 transport file. So you can use this to provide a low bit rate audio-only stream for your stream-switched variant presentation. This media stream segmenter tool runs on any Macintosh that can run Snow Leopard, so it's pretty flexible.
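
A sketch of a typical invocation, assuming option names along the lines of the tool's documentation; the multicast address, port, and output path here are hypothetical:

    # Consume an MPEG2 transport stream from a UDP socket, write
    # ~10-second segments plus the playlist into the web server's
    # document tree, and delete segments as they roll off the playlist.
    mediastreamsegmenter -f /Library/WebServer/Documents/stream \
        -t 10 -s 6 -D 239.4.1.5:20103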

But we recognize that there are a lot of situations in which you can't run Snow Leopard, you may not even have a Macintosh, particularly in a lot of the dedicated live video production workflows. And so the second option is that we have written a complete specification for HTTP live streaming, and we have published it through the IETF as an internet draft.

Now this specification has all you need to know to write your own segmenting tools, and it's not actually that complicated of a job. We've already worked with several folks who've been able to build their own segmenters and they're working great. So this is definitely an option for you if you decide that you don't want to use our segmenter for any reason. Once you have your segments and your playlist file, the final step is to serve them over the web. In some respects this is the most straightforward part of it.

You can use any standard web server; you don't have to add any kind of additional mods or plug-ins or anything like that. The only configuration you have to do is to configure the delivery of those playlist files. And there are two things that we recommend you do. The first is to set them up to deliver a MIME type: we recommend that you use application/x-mpegURL, though you can also use audio/mpegurl instead. You also need to configure the time-to-live for those playlist files, and that should be the same as the value that you chose for the target duration.
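
For example, with Apache and mod_headers, that configuration might amount to something like this (a sketch, assuming a ten-second target duration and an .m3u8 extension for the playlists):

    # Serve playlists with the recommended MIME type.
    AddType application/x-mpegURL .m3u8
    # Cache playlists no longer than the target duration,
    # so clients pick up new segments promptly.
    <Files "*.m3u8">
        Header set Cache-Control "max-age=10"
    </Files>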

If you're encrypting your segment files, then you also have to set up some sort of access control system so that only authorized users are able to retrieve those keys over HTTP. And if you're employing a CDN, then you need to transfer those files to your CDN, and you can use any of a variety of methods for this: you can use something like Akamai's net storage system, you can use FTP, you can use rsync, whatever your CDN supports.

They're just like any other file at that point. And in fact, one of the advantages of using vanilla HTTP to deliver these live streams is that you can use some of the techniques that you already know around HTTP to do some really cool things. Now I'd like to show you an example of one that a vendor we were working with came up with, that we thought was really cool. So imagine for a moment that you have a server and it's streaming, say, a hockey game, and it's accumulating the stream as an on-demand presentation playlist. So as segments are generated and added to the playlist, it's growing and growing and growing.

You can use server-side scripting to add a feature to your application of being able to watch a highlight clip of that game. And the way it would work would be this: let's say that in addition to producing this on-demand playlist, every time a goal was scored, the server used the new push notification services that are available in iPhone OS 3.0 to send a notification to your application on the iPhone. If your application wanted to display this highlight to the user, it could package up the event ID from the notification into a query parameter of a URL, and then send that to your server with a request to play it.

The server could parse out that query parameter, recognize the event ID as, oh, this is highlight number three, this is the third goal in the game, it happened 13 minutes and 12 seconds into the game. It could then, using a server-side scripting tool like PHP, dynamically assemble the relevant sections from the on-demand playlist and create a highlight playlist that contained only those segments that contained the action around the goal, deliver it back to your application, and you could then play that. And so that's really kind of a clever technique for adding a nice feature like highlight clips to your application with just a little bit of server-side scripting.
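
The playlist such a script hands back could be nothing more than a short on-demand playlist covering the goal. A sketch, with hypothetical segment names, assuming ten-second segments (13 minutes 12 seconds falls around the 80th segment):

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:78
    #EXTINF:10,
    http://example.com/game/segment78.ts
    #EXTINF:10,
    http://example.com/game/segment79.ts
    #EXTINF:10,
    http://example.com/game/segment80.ts
    #EXTINF:10,
    http://example.com/game/segment81.ts
    #EXT-X-ENDLIST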

So to recap content deployment: you first have to create your MPEG2 transport stream, H.264 video, AAC audio. You have to break it up into segments and produce that playlist file, and then you just have to serve it over the web. So it's pretty straightforward. And we've seen folks get this kind of streaming deployment running in just a matter of days, so it's actually pretty straightforward to do. We're at the end.

OK, let me summarize. So using HTTP live streaming, you can now deliver live content to your users on iPhone and on Snow Leopard. You can create multiple bit rates of your streams and multiple qualities so that your users get a good user experience even if they're on a mobile device like the iPhone. We use standard media formats so that you can use existing hardware and software to produce those streams, some of which you may find you already own.

Because we're using vanilla HTTP, we can use existing internet infrastructure to scale to extremely high demand, and you can do it without breaking the bank, which is important. And finally, we have created this public specification for HTTP live streaming, and it's available through the IETF. And in fact, we've just uploaded a new version of that to ietf.org, and so if you haven't read it, or even if you've read it before, I encourage you to download it again and check that out.