Configure player

WWDC Index does not host video files

If you have access to video files, you can configure a URL pattern to be used in a video player.

URL pattern

Use any of these variables in your URL pattern. The pattern is stored in your browser's local storage.

$id
ID of session: wwdc2003-714
$eventId
ID of event: wwdc2003
$eventContentId
ID of session without event part: 714
$eventShortId
Shortened ID of event: wwdc03
$year
Year of session: 2003
$extension
Extension of original filename: mov
$filenameAlmostEvery
Filename from "(Almost) Every..." gist: ...

WWDC03 • Session 714

Large-Scale Webcasting: Macworld Keynote Case Study

QuickTime • 53:19

The Macworld keynotes are the largest webcasting events on the Internet. More than 100,000 viewers watched this year's Macworld San Francisco presentation live, and almost half a million watched the replay over the next few weeks. We take a look at the planning, engineering, and execution that takes place behind the scenes with the team from Apple and Akamai. We also cover the lessons that have been learned and how you can take advantage of this experience to help manage your streaming needs.

Speakers: Dennis Backus, Clark Smith, Ryan Lynch, William Weihl

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

This is session number 714, Large-Scale Webcasting: Macworld Keynote Case Study. Our goal today is to try and give you guys an idea of what it takes to be able to execute very large-scale Internet webcasting, live and video on demand. The case we use is the Macworld Keynote, which is arguably one of, if not the largest, live Internet streaming events every time we do one. As a matter of fact, every time we do one, it gets bigger and bigger and bigger.

The reason we do a webcast of the event is because when we have Steve in a hall, he has 5,000 to 6,000 people who are sitting there getting two hours of Steve, a direct marketing message from probably one of the best marketers around. But while there are 5,000 to 6,000 people there in the room, we have over 100,000 unique viewers who are watching via the Internet, live. And then over the next seven days, we have probably more than 500,000 people watching the replay of the event.

So that's over 600,000 people who get to see two hours of Steve Jobs delivering a very precise marketing message about Apple's new products. That's a very powerful thing, as you can imagine. So today to speak about this, we have Clark Smith and Ryan Lynch from the Apple QuickTime Operations Group. These guys are the two that are mainly responsible for doing the keynote preparation and execution.

And in addition, a little bit later on, we'll have Bill Weihl, the CTO from Akamai Technologies, here to talk about how Akamai works with us before, during, and then after the keynote to pull one of these things off. So let me turn it over to Ryan, and we'll get started.

One thing, when we have questions, it's really important that you step up to the microphone. We have simultaneous translation happening in the back of the room, and if we don't have a clear version of your voice going over the microphone, they can't translate. Okay? Okay, thanks.

Thanks, Dennis. So we're going to talk today about the process behind the keynote and what we do at Apple to prepare for the impending doom of going live on the Internet. So we're going to start with what we do behind the scenes with network requirements and planning for the whole event all the way up to the event and afterwards. And three months ahead of time, before we actually start, we deal with network. And Clark is going to talk a little bit to that point.

In order to provision a network for the kind of load that the keynote would require, you have to plan ahead and decide, well, what other events might be happening that day for Akamai. Akamai is a content distribution network that serves many, many clients. Apple is an important client, obviously.

But if there's a large news event or something else that takes place on the same day as our keynote, we need to make sure that there's going to be enough room for us to still get our data out on the Web.

So we try to estimate two months in advance how much data we think we're going to need, and we try to negotiate with them to see how much they can give us. So we go into traffic expectations based on the location that the event is taking place. If it's in San Francisco, Japan is going to be more involved because it's still within their day range.

If it's a European event, Japan's not going to be a player, and probably neither is California. But if it's in New York, there's a good chance that California will come on later in the stream. So based on the amount of data available to us, the bandwidth available to us, and the expectation we have, the amount of promotion we expect is going to happen for that event and so on, we start determining-- if you want to hit that, hit it. Sorry about that. Bit rates. I hit it. You hit it. We both hit it. We start determining our bit rates.

We normally do four bit rates. We do a 28k for audio-only users, which has to run well below 28k because we have to be able to build a buffer up. And then 56k users are very strong subscribers to our events, so we have to make sure that we actually hit somewhere around-- 37k is our target, somewhere around 37. We even like to get below that sometimes.

And that gives 56k users enough of-- most 56k users aren't getting much over 42 anyway, so we have to make sure that they have some buffer. And then 100k users are generally from Europe and on dual ISDN. And the benefit of 100k also is it gives you a pretty darn good stream, and if we do have to roll down-- we'll talk about how we roll down later-- but if we have to roll down to 100k, you still get a pretty good stream.

And let's see, so the next step is we provision the Akamai entry points and ports, and that's just a process of basically requesting from the edge servers that we're going to begin streaming to specific IP addresses and specific ports. And you're hitting the button, so cool. And then we have a backup plan, and the backup plan is always to have an additional connection that is also as close to flawless as possible, so that if something happens to that initial connection or to that Akamai entry point, we have an immediate solution: rolling over to another live stream that looks as close to identical as possible to the Akamai entry point that we serve as the primary.

So the first thing we do is we start to gather our hardware and software. Because we're working in a co-location facility, which is basically all rack mounted, the Xserves are actually wonderful for the solution. We have a group of five or six Xserves right next to, within feet of, the Akamai entry point.

We have an MPEG-2 recorder, which is a digital disc recorder. It gives you random-access playback. So immediately when the event is over and we're given approval, we begin a replay of the event, which is pretty much indistinguishable from the original. And that continues to loop for a significant period of time until we can get the data posted for video on demand.

[Transcript missing]

So we're going to talk, after we get that all set up, we want to test and make sure that the hardware actually functions. Because on occasion you can have a little, you know, somebody can plug a FireWire plug in backwards and fry the whole thing. It's kind of fun.

And so we test all of those, make sure we have enough disk space available for the actual event if you're going to actually record to disk. We did a two and a quarter hour program on Monday, and that took roughly 250 megabytes to record to disk at 250K.

So it was pretty sizable, and if you have a full event and you don't actually delete anything on your disk, you're going to run into trouble. And then we take those settings that we figured out-- okay, we need to target about 37K for 56 and all that-- and set those up, test them, and make sure that they're optimal for the bit rates we want to target and that they don't swing too much. Because if you swing, then you might max out your bandwidth, and somebody on the network will get a really crappy experience, and you don't want that to happen.
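As a rough back-of-the-envelope check on that disk-space figure (the arithmetic below is an illustrative sketch, not part of the talk):

```python
# Rough disk-space estimate for recording a stream to disk.
# Figures from the talk: a 2.25-hour program recorded at roughly 250 kbit/s.
duration_s = 2.25 * 3600              # 8,100 seconds
bitrate_bps = 250_000                 # 250 kbit/s
size_bytes = duration_s * bitrate_bps / 8
print(f"{size_bytes / 1e6:.0f} MB")   # ~253 MB, in line with the ~250 MB mentioned
```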

And we take these settings, and we copy them to every single machine so they have an identical setup on every box, just in case we need to pull one out or something like that. And, like we had on the previous graphic, there is a spare machine sitting there, standing by, just in case we have to do something drastic.

Then there's the team. So we have everything set up, we've got our settings, we're ready to go. We need to get people together about three months ahead of time and start talking about what's happening so that we're ready to go for the day. We have Akamai Engineering who's provisioning the network and taking care of everything to make sure that we're ready, and we have our 16 gigabits or whatever we're going to use for the event.

And then we have the QuickTime Engineering team. So sometimes we'll run into a little funny bug or something and we wonder what's going on with this. So we have them kind of do some packet tracing and make sure that everything's functioning correctly for us. And then there's the PR team, which is actually really important. They're the ones who give us the final go to say we can actually use VOD and be ready.

Because if we show something we don't have legal clearance for, we don't want anybody coming back at us and saying, "Hey, no, no." It wouldn't be too fun to get into legal over that. And then we go down to the television contractors. We have them actually on site recording the videos so that when we have a tape we can go back and do an encode after the fact.

So then we have the preparation step. We have everything ready to go. We need to export SDP files from QuickTime Broadcaster. And these SDP files, Session Description Protocol, basically tell the QuickTime player where to go to get the stream, how to get the stream, and everything about it.

Since we're using Akamai, we have something called ARLs, Akamaized Resource Locators. They're essentially URLs that point to that content, the SDP files, and have a little metadata along with it. It helps Akamai figure out how to get the stream and how to deliver it best to you. And we have, of course, reference movies.

Now, reference movies are kind of interesting. They allow you to have one link and point to as many other movies as you want. So what we do is we filter based upon bit rate. So if you claim that you have a 300K connection and you can get that, we say, "Good," and we deliver the 300K to you. So that's all based upon the configuration of your machine. So what we do is we have a web page that has one link-- one ref movie-- for all of the bit rates.

And that ref movie points to, lo and behold, four different bit rates. And based upon your settings, we deliver that to you. So we have the ARL that says, "Here you go." And it hands you the stream, and you've got that. You notice at the bottom that we have a spliced ref movie. And what that means is we have a graphic that we actually paste on top of the stream so that there's a visual experience for the 28K people. Because, you know, you've got to provide something. You've got to make it somewhat aesthetically pleasing.
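A minimal sketch of the bit-rate filtering a reference movie performs, assuming the four tiers mentioned in the talk; the function and values are illustrative, not Apple's actual implementation:

```python
# Illustrative sketch: pick the highest bit-rate tier at or below the
# connection speed configured in the player. Tier values mirror the talk.
TIERS_KBPS = [28, 56, 100, 300]   # audio-only, modem, dual ISDN, broadband

def pick_stream(connection_speed_kbps: int) -> int:
    eligible = [t for t in TIERS_KBPS if t <= connection_speed_kbps]
    return max(eligible) if eligible else min(TIERS_KBPS)

print(pick_stream(56))    # 56  (the stream itself targets ~37 kbit/s for headroom)
print(pick_stream(384))   # 300
```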

And then this comes into the fact that we have the velvet rope system, which is First Point on the Akamai network. And what that does is, you know, we have a hard stop at some bandwidth limit. So for a typical keynote, we have maybe 16 gigabits. And that's all we have.

And if we go beyond that, like I said before, we're going to have some really terrible experiences for everybody across the network. So in a particular region, if we're ramping too quickly and too much bandwidth is being consumed too quickly, we want to make sure that we can cut that off and slip back down to a 100k stream as the top bit rate available, or a 56k, depending upon how many people we have on the network.

And of course, those webpages point to ref movies and, again, point to individual SDP files to grab the stream. So do you change the webpages? Could you speak into the mic, please? Yes, essentially that's what Akamai-- what the First Point system does is it swaps out which webpages deliver to which region. Okay, I was just curious whether it's the reference movie you swap or the webpage. Both.

Oh, you change both the webpage and the-- well, the webpage points to a different reference movie. Yes. Those first four webpages were already made. Yes. So you just change the-- or you change it in the backend. Yes. That's all we're doing is swapping out the webpage, essentially, and each one has a different link on it to a different ref movie.
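A hypothetical sketch of the velvet-rope idea described above: per region, the top advertised bit rate is rolled down as consumption approaches the provisioned limit. The thresholds and structure are invented for illustration and are not Akamai's First Point logic:

```python
# Hypothetical per-region bit-rate cap; thresholds are illustrative only.
def top_tier_kbps(region_gbps: float, region_cap_gbps: float) -> int:
    utilization = region_gbps / region_cap_gbps
    if utilization < 0.6:
        return 300   # all four bit rates still advertised
    elif utilization < 0.8:
        return 100   # roll the region down to 100k as the top bit rate
    else:
        return 56    # heavy load: 56k becomes the top bit rate

print(top_tier_kbps(2.9, 4.0))   # 100 -> swap in the web page whose ref movie tops out at 100k
```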

Now if you are like Apple and you have a whole bunch of zealots on campus, and they all want to watch this stream at the same time, you don't want to consume a ton of bandwidth on your local network. So if you have a 600K stream, and you have 100 people wanting to watch, that's 60 megabits that they're going to be pulling down across your pipe. So you're not going to be able to provide anybody outside of your company anything.

And that's going to be a really terrible experience. So what we do at Apple is we have an internal web page. Again, using the Akamai First Point system, we say anybody that's coming from our network gets routed to a specific web page for internal traffic. And that has a link for a multicast.

And in short, what a multicast does is it takes one stream and sends it out across the network. So if you have 100 people watching it, you have 600K bandwidth. If you have one person watching it, you have 600K. So it's a great way to conserve bandwidth on a local network.
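The arithmetic behind that comparison, using the figures from the talk:

```python
# 100 viewers of a 600 kbit/s stream: unicast vs. multicast on the local network.
viewers = 100
stream_kbps = 600

unicast_kbps = viewers * stream_kbps   # every viewer pulls their own copy
multicast_kbps = stream_kbps           # one copy traverses the network, however many watch

print(unicast_kbps / 1000, "Mbit/s unicast")   # 60.0 Mbit/s
print(multicast_kbps, "kbit/s multicast")      # 600 kbit/s
```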

And now I'm gonna turn it back to Clark to talk about testing. So we test in a lot of different levels throughout the process, but there are specific things that can go wrong. If you're creating all these little files and you're moving things around so many times, something can go wrong. An example would be annotations.

Sometimes the annotations change somehow, and those are actually generated in the SDP file, and they're--so at the very beginning of the chain, if that SDP file has the wrong annotation in it or a wrong copyright or something, that can translate throughout the entire-- throughout the entire group of reference movies that we just outlined.

We're very attentive to data rates and swings. I spend hours staring at Command-J, you know, the get info or the get properties function in QuickTime Pro, because I can watch how full the buffer is, whether the buffer is seeing any kind of sawtoothing from some kind of network interference.

I can see whether or not there's any packet loss, but I can actually see where the packet loss is taking place. Sometimes you can't tell why there's something wrong with a stream, but you can actually visually see why that--you know, why you're dropping audio at certain times or why there--sometimes the video will come across absolutely perfectly, but because there's a sawtooth in the buffer, it'll drop actually just a moment of audio, and you're kind of wondering why--where that's coming about.

A lot of it's very visual. The importance of swing is if you're provisioned for a certain number of gigabits, and you have, you know, 10 or 15 or 80,000 different viewers watching, and you have a slight swing from a camera zooming in to Steve or a pan on the stage where almost every pixel is changing, it can be a very dramatic swing in Akamai's data rate.

So the higher the data rate that you're playing with, the higher the percentage, so it's gonna be obviously much more reflected in a 300K versus a 56K, but it's extremely important that you pay attention to where that median point is and how far you're allowing it. The reference movies all have to point to the right things.

So I saw just a moment ago in Ryan's diagram how each reference movie points to a different data rate. One of the best ways to test that is actually go into your QuickTime preferences and just keep changing your data rates down in the user-defined connection speed portion of your preferences, and just make sure that the right data rate comes up for that one.

So we go through this over and over again, 'cause sometimes we make reference movies more than once, and so-- especially as you're getting closer and you're hurrying through the reference movies more and more-- you really have to be very attentive to what mistakes you might make. The splice graphic is another potential problem in that when you first create a splice graphic using one of our tools, sometimes it's not layered exactly properly, so you have to go into the properties and actually set the layering for that image. If the image is layered incorrectly, you'll actually get a little cue, and the splice will actually be behind, and being that that's all tied into all the other reference movies, it's just critical that you test it.

And last of all is packet loss. Different places will get different results. Even though we have such a wonderful content distribution provider, there are certain places that will get different results, and so what we try to do is we try to call people. We try to have some contact with people in New York and other places and see how they're doing.

All they have to do is open the properties that I was just talking about and tell me, you know, what's the percentage of packet loss that you're seeing? Are you having any kind of dropping issues? Are you able to get the stream whenever you want it? Those are all the kinds of tests that we're very attentive to.

The day of the event, we establish a conference call with many of the members of the team that Ryan outlined, but predominantly Akamai. And they'll be standing by and paying attention to their network before we actually push anything live. I'm also on the phone with our encoding partner at the satellite site.

And I sit there and I watch all of his streams. And then at about half an hour before the event, we restart all the broadcasters just to make sure there aren't any memory leaks or anything. I mean, when you're dealing with such a high profile event, even though it probably doesn't make any difference, you do a lot of little things that you just don't want to take any risks. So we restart the broadcasters.

And then shortly before the event, we roll the MPEG-2 and a backup Betacam SP machine. We're glad we did that this last event, because there was a disruption in the NTSC signal into the MPEG-2 machine. That disruption was long enough to actually stop the disk recorder. And that was catastrophic, because we had no backup to go to.

So what we ended up doing is using the Betacam SP through the exact same proc amp that we were talking about before. And at the same time, we recorded to the MPEG-2 machine. And no one was the wiser. It was just absolutely seamless. But had we not had just yet another layer of backup, we wouldn't have been in a position to make ourselves look OK.

Yes? "What broadcaster are you using? Are you using the old Sorenson or is it the new MPEG-4 from Apple?" "Well, the broadcaster product is Broadcaster 1.01. It's QuickTime Broadcaster. It's free on our site." "Right, but you're using yours now. In the past you were using Sorenson's." "No, absolutely. Yes, we're using ours and we're using MPEG-4's, as you mentioned, and AAC for audio." AAC is just a wonderful audio codec, and for streaming MPEG-4, it's just a delight to work with, really.

And we've come from a chain of codecs beforehand, so believe me, we have the scar tissue. So then what we actually term "pushing the event live" takes place. When we actually start to see pans of the audience, that means that our television content provider at the location is likely to stay on programming from that point on.

So when we start seeing that with the lower third, I call our web team, and I tell our web team to go ahead and push the page live. And then the excitement really begins, because there are hundreds of people, thousands of people out there just waiting to get on. They want to be the first ones on, because they think they're going to get the highest data rate if they're the first ones.

And to a degree, they're correct for most events. So they jump on, and you just see the ramping. We have our own monitoring tool, but so does Akamai back at their NOC. And you can just see the ramping of people getting on. And it's very exciting, really, because you have to know when to pull off. You know what the highest point is, and you have to make sure that everybody's going to be able to get on.

It's stressing their network to death. It's a wonderful test, but at the same time, it's frightening. So there's a point when we say, OK, we're going to roll off some region that might be particularly stressed, like maybe Australia, which seems to be first. But there are certain areas that we sometimes have to roll off, and then we'll come back to them.

We'll come back to them after things moderate a little bit. We'll come back to the 300K. So if you ever have an experience where you come on and you get 100K, and for some reason leave and come back, you might get 300K if you have the bandwidth provided to you.

So we watch the consumption, and then the last thing is actually probably the most fun for me, is we report the numbers. And there's usually someone, usually Dennis, underneath the stage, or in the back of the stage, and just before Steve goes on, someone will ask, you know, "What are our numbers? How many people?" You know, and last time, I think, if any of you watched, the Vatican was on.

And that was just so thrilling to have Steve go out and say, "Yeah, and even the Vatican's watching. You know, you guys are our friends." So those kinds of little reports are just, they just add a little something. They add a little realism to the keynote, and we all benefit from that. So I think now we're going to bring on the great and good Bill Weihl of Akamai. Bill Weihl: Post event.

Oh, post event. I'm sorry. I got a post event. Sorry. Hold on. Bill Weihl: That's all right. Post- I'm sorry. So then we do the rebroadcast. I spoke about that earlier. We hit that MPEG-2 machine, and we start playing back as soon as we're told that we're going to do so from Apple PR.

And at the same time, a tape is being rushed to our encoding partner, who will capture that entire tape and do a pre-process on it, and probably within 12 hours have content for us to start posting, which I'll do later when that data starts to arrive. And then we slowly start replacing the links that were originally being posted to on that web page, the same reference movies.

So the VOD is posted. The pages are changed so we're no longer doing the velvet rope. And then we--the last thing is usually days later, we do an analysis and a reporting to determine exactly how well we did and how many people came and whether people had a good experience. We're very keen and interested in that. So, Bill, now you're up. Sorry about that. No problem. Thanks.

So I want to tell you a little bit about what happens once the bits get handed off to us. And I'm going to start by telling you a little bit about who we are and what we do in terms of webcasting, both large and small scale. Apple, I think, as these guys said, puts on some of the largest live events on the Internet in terms of the amount of traffic, number of viewers, and so on.

But we do events from small to large. We do on demand as well as live. So I'll tell you a bit about that. I'll talk about the partnership between Apple and Akamai in terms of webcasting and other things. And then a little bit more detail on the keynote itself.

So today, what is Akamai? What do we let you do? What do we do for our customers? Fundamentally, what we do is allow our customers to extend their e-business infrastructure, web and streaming, out to the edge of the Internet, close to the users. This gives better performance, better reliability, better scalability, in many cases better security, and greater control over that infrastructure and over the delivery of the applications, the content, and so on across the Internet. Today, if you're delivering content over the web or via streaming, you can control what you do in your data center, and you can control your first mile.

But at that point, your control ends, and from there to the end user, there's a collection of networks that are going to take your bits and transport them. You have no control over that. If someone out there screws up, if UUNET's backbone goes down, if the Slammer worm hits and there's chaos across the Internet, there's nothing you can do about that.

We give a lot of control all the way out to the edge, very, very close to the end user. We are the leading delivery service for streaming, web content, and web applications, and the real value is improving the end user experience of the people who are watching the streams or accessing your applications and your content over the web, often at lower cost to the provider.

We've moved from what we did in the early days, which you could think of as simply static content delivery, to really now doing distributed computing. So when I talk about applications, I'm really talking about the ability for one of our customers, people we used to call content providers, but it's not just content.

People are doing business on the net, and they're reaching out to their users with an application, be it a configurator for a car or an e-commerce site or any other kind of interactive application on a site. We're providing the ability for pieces of those applications to run on the edge close to the end user, which allows you to give sub-second response time to users around the world, wherever they are, regardless of what's going on in the Internet.

We have almost 1,000 recurring customers today. We've got about $145 million in annual revenue. We've survived the dot-com boom and bust, and spent the last two years building a very solid customer base in terms of large enterprise customers as well as smaller companies. And I think we're well poised for growth over the next few years. And we also have a lot of intellectual property surrounding the way we deliver content and applications over the net.

[Transcript missing]

The other major thing that we give to our customers is a great deal of information about what is happening both on the Internet in general and to their content, their streams, their web applications across the Internet. So information, for example, about the number of streams that are being delivered, the number of streams at different bit rates, the total amount of bandwidth, furthermore broken down by geographical area. So you can see how many people are watching in Australia, how much bandwidth are we pushing in Australia, and so on.

Just to say a little bit more about our platform, as I said, we've got over 15,000 servers. We're in over 1,100 different networks. And this is really a range of networks across the board, from hosting and access providers to companies that provide data center space, colo space, and connectivity to companies that are providing sites and streams and so on, access providers that are providing dial-up or broadband or other access to end users, as well as tier one backbones and, of course, more and more broadband for access of one form or another. So we are in the same network as--have machines in the same network as on the order of 70 to 80% of the end users on the Internet.

Which means that if a user wants to get a stream, there's one of our machines very close by that can serve that stream to that user. Okay, same thing for web content or for web applications. And that's been one of the basic premises of our company from the beginning is that it's vital to be near the end user. In terms of performance and reliability, and in terms of the scalability of the system, that is as there are more users, as there are more eyeballs, we'll have more machines near those users. And the system as a whole will scale with the user base.

Okay, so let me talk a little bit about how live streaming works. The stuff on the left, Ryan and Clark have talked about: the actual video signal needs to be captured, and it needs to be sent to an encoder, which is going to produce a digital stream at a certain bit rate.

That stream is then sent. So first you get it from the camera through satellite and other mechanisms through the encoder. From there, it goes to what we call an entry point. Now we've got a stream of packets entering our network, the Akamai cloud, represented here with the four circles. The entry points themselves actually are fault-tolerant, so there are mechanisms, as they discussed, to allow that path to fail over to a different one should that entry point, for example, go down.

Or, in fact, it might still be up, but the path between it and the encoder might be congested. Might have been fine when you started, but at a certain point, the quality there degrades. So you want to make sure you're talking to an entry point where you can send packets with very minimal loss.

From there, we send that stream to a set of what we call reflectors. We have many of these scattered around the Internet, typically in major backbones with very good connectivity. So the stream is basically being replicated. Now I'm talking for live streaming. So this is for a keynote.

You've got one packet stream going from the encoder to the entry point, and then from there, that same stream is being replicated. Essentially, a separate unicast to each of those reflectors. We can't use network-level multicasts because these aren't on the same network. And you can't do multicasts really across the Internet in that way. You can think of this as an overlay multicast, if you will.

The idea with the reflectors is then from there, if a user wants a stream, he'll contact an edge server. And let's say, in this case, it's the middle one. He'll say, I want to get the Steve Jobs keynote. And the ARL that was mentioned tells us, when the user hands us that, tells us how to actually get that, what the appropriate port is and so on, to make sure we get the right stream from the right entry point.

He'll say, I want the Steve Jobs keynote. That edge server then subscribes to that stream from one or more reflectors, and packets will start to flow. Now, you might have a situation where packets start to flow, but then some of them get lost. Because the Internet, while it's amazingly reliable for such a large and essentially decentralized system in terms of how it's managed, packet loss happens all the time.

Congestion happens, it appears, and then disappears. How long it lasts depends on the stream. And then, of course, it affects all of the flows that go through a congested link. So, in this case, we're showing those packets all grayed out because, in fact, those four packets got lost.

The edge server, if there's congestion when it's pulling from a single reflector, will then start to pull from more. And it will pull from enough to guarantee that it gets a complete copy of the stream. So, in this case, it was seeing congestion on the initial stream that it got from one reflector. It subscribes to the same stream from another reflector.

It manages to get some of the packets, but still not all. So, it will subscribe to yet another. So, we will pull multiple copies to an edge server as needed to guarantee that that edge server gets a complete copy of the stream. In the early days of the system, we just sent multiple copies to every edge server, doing sort of blind forward error correction. But, as you can imagine, this is expensive. And we've built a system now that is much more adaptive and responds to the conditions of the network between the reflectors and the edge servers, to do that adaptively and only pull as many copies as are needed.
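A simplified sketch of that adaptive fan-in: the edge server keeps subscribing to additional reflectors until the union of what it has received covers the whole stream. The data structures and names are invented for illustration:

```python
# Toy model: each reflector feed is the set of packet sequence numbers it delivered.
def assemble_stream(reflector_feeds, expected_seqs):
    received = set()
    subscriptions = 0
    for feed in reflector_feeds:        # subscribe to one more reflector at a time
        received |= feed
        subscriptions += 1
        if expected_seqs <= received:   # complete copy assembled; stop fanning out
            break
    return subscriptions, expected_seqs - received

expected = set(range(1, 11))
feeds = [{1, 2, 3, 5, 6, 9, 10}, {2, 4, 5, 7, 8, 9}, {1, 3, 6, 10}]
print(assemble_stream(feeds, expected))   # (2, set()) -> two reflectors were enough
```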

The other thing I should say about this process is that one of the key things that's not really shown here is-- and then the end user gets the stream from that edge server and gets a very high quality stream. One of the key things that isn't shown here is the mapping process that decides which edge server an end user will talk to to pull the stream.

And that is one of the key pieces of technology that our system is built on, both on the web side and the streaming side, to monitor the entire Internet on a real-time basis and then make mapping decisions every 10 seconds that determine, among other things, for a given end user, when he wants some piece of content, be it a stream or web content, which edge server he should talk to.

Okay, and we choose an edge server that is lightly loaded, that's likely to have the content, that's up, that's always a good thing, and where the path between the end user and that server is uncongested and can deliver high quality. Because the goal is to deliver that content, whatever sort it is, to that user quickly and reliably.
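A toy scoring function over the criteria just listed (up, lightly loaded, likely to have the content, uncongested path); the weights and fields are hypothetical, not Akamai's mapping algorithm:

```python
# Hypothetical edge-server scoring; higher is better.
def score(server):
    if not server["up"]:
        return float("-inf")
    return (
        (2.0 if server["has_content"] else 0.0)
        - 1.5 * server["load"]               # 0.0 (idle) .. 1.0 (saturated)
        - 0.01 * server["path_latency_ms"]   # stand-in for path quality to this user
    )

candidates = [
    {"id": "edge-a", "up": True, "has_content": True, "load": 0.4, "path_latency_ms": 35},
    {"id": "edge-b", "up": True, "has_content": False, "load": 0.1, "path_latency_ms": 30},
    {"id": "edge-c", "up": False, "has_content": True, "load": 0.2, "path_latency_ms": 50},
]
print(max(candidates, key=score)["id"])   # edge-a
```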

A couple other things I want to mention here about what we do to ensure quality and good performance and so on, we do something that we call pre-bursting. So you could imagine that when a user connects to an Edge server and says, "I want the Steve Jobs keynote," well, a subscription goes from there to one of the reflector nodes, or perhaps more than one, and if those reflector nodes are not currently getting the stream from the entry point, they'll subscribe from there. And then the stream will start to flow.

In a normal situation, you might simply start to send the stream at the speed, at the data rate of that stream. So if it's 300K, you start sending packets paced at approximately 300K, or whatever the actual bit rate of the stream is. Which means there's going to be significant latency until the user has built up a buffer in the player before it actually starts playing.

What we do across the network, which fits well with the instant-on feature of the current QuickTime system, is we do pre-bursting from the entry point through the set reflectors all the way to the edge machine. So when an edge machine or a set reflector subscribes to a stream, we will send the data at eight times the actual bit rate to build up a buffer very quickly close to the user. And then the player itself, from the edge server, will pull as fast as it can to fill up the buffer initially, and then from there will pull at more or less the normal bit rate.
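The arithmetic behind pre-bursting: filling an N-second player buffer at eight times the stream's bit rate takes N/8 seconds instead of N. The 10-second buffer below is an assumed example, not a figure from the talk:

```python
buffer_s = 10                    # assumed player buffer target
normal_fill_s = buffer_s         # paced at 1x, the buffer fills in real time
preburst_fill_s = buffer_s / 8   # paced at 8x, as described above

print(normal_fill_s, "s at 1x vs", preburst_fill_s, "s at 8x")   # 10 s vs 1.25 s
```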

The other thing that we do to try to ensure high quality is you might think that the best thing to do in terms of mapping a user is simply to map a user to the nearest edge server that has good connectivity between it and the user. In fact, because of the characteristics of the stream servers that run on the edge servers, and the dynamics of building a system like this and running many different streams, many different kinds of content over it, you can get much better quality by being much more judicious about what streams you serve from where. We use something we call block maps, which rather than essentially spreading the load for a given stream, in some sense, over the whole network, we will restrict the regions that it is mapped to.

When I say region, I mean data center. We will restrict it somewhat so that, for example, we would rather serve the same stream out of a small number of servers and some other stream out of other servers than just mix them all over the place. We can get better quality that way.

We then monitor the whole system on a regular basis, I mean continuously, certainly for an event like this, but in general we monitor the whole system to watch a number of different metrics on the performance and quality of the stream. For example, what's the actual bit rate in terms of packets that are delivered on time? Because you can deliver a packet, but if it's too late and the player throws it away because it's arrived too late, then it's not useful. The number of packets, the percentage of packets that are delivered on time to the player in the end.

That's an important metric. How much thinning takes place between the server and the end user is another important metric. Because the user might have a 300k stream, but there might be enough congestion on that path that it starts thinning and actually delivering a much lower bit rate stream.

How long does it take to connect and get started is another metric. You want people when they connect to not sit there for 20 or 30 seconds looking at something and waiting for the stream to start. You want it to start in a few seconds, 3 or 4 or 5 seconds at most if possible.

And then how much rebuffering and other things like that take place during the playing of the stream. And all of the different technologies that we put in place in the network are derived from, really in part, measuring all of those metrics and trying to understand when there are issues, when there are problems, what's causing those, and then developing mechanisms like pre-bursting, like block maps that will allow us to get much, much better quality and continue to deliver a high quality experience to the end user.
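A small sketch of one of the quality metrics described above, on-time packet delivery; the packet log format is invented for illustration:

```python
# packets: list of (arrived, play_out_deadline) timestamps in seconds.
# Packets that arrive after their deadline are discarded by the player.
def on_time_ratio(packets):
    on_time = sum(1 for arrived, deadline in packets if arrived <= deadline)
    return on_time / len(packets)

log = [(0.10, 0.50), (0.48, 0.55), (0.70, 0.60), (0.58, 0.65)]
print(f"{on_time_ratio(log):.0%} delivered before their play-out deadline")   # 75%
```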

Okay, so let me talk about Apple and Akamai. Apple has been one of our major customers since the early days of the company. The company actually started in 1998. Our first paying customer was early 1999, and Apple has been a major customer. In fact, it was an investor in the early days as well. Akamai is Apple's platform for the QuickTime streaming network. We've done, since 1999, over 1,000 live and on-demand events, and we've done nine live Steve Jobs keynote addresses. Those have been, every time, the biggest event to date on the Internet.

Steve is the rock star of the Internet. I think there's not much question about that. There are movie trailers, Lord of the Rings, and many others. There's Tomb Raider and lots of others. You can go look and see what's on the QuickTime streaming network. Those streams are coming over us.

We also do a number of other things for Apple. We are delivering the iTunes music store, both the actual music downloads, so when you download a song, that's coming from our network, but also a lot of the data and control information that is used to make decisions about playlists and other things. Movie trailer downloads, software updates also come through us. We're providing web analytics services, so reporting on what's happening on the site and what's happening in different parts of the world to allow the marketing folks and other people to make decisions about what to do.

And then geolocation services. For example, for iTunes, there are, I think, contractual obligations in terms of where you're going to be able to get your music. Where in the world you have to be if you want to actually download those songs. And we help ensure that those contractual obligations are met.

So let me talk a little bit about the keynote itself, what happens both before the event and day of, and I'll say a little bit about afterwards as well. For us, an event of this scale is not just a normal event. As I said, most of our customers don't do events on anything approaching this scale; they run events all the time and don't even let us know in advance that they're happening. Large events-- a gigabit, several gigabits, or in the case of Steve, 16 gigabits-- we need to know about, because we don't have infinite capacity. So, well in advance of the event, we internally develop a project plan.

Who are all the people that are going to be involved, from engineering to various support groups to the network groups, to make sure that we have adequate capacity in all the different regions of the world where it's needed. Do capacity planning, understand what capacity we have on the network for serving QuickTime and where it is. Furthermore, what else is going to be happening on the network? Some of that is hard to predict.

Two months in advance, you can't always say when we might be at war or when some major event might happen. So, we obviously have a certain amount of headroom in terms of capacity in our network and are prepared to deal with fairly significant bursts. But 16 gigabits for Steve and another who knows what for a war and so on can certainly add some stress. And so, we want to do as much planning for that as we can.

We then talk about and talk with the Apple folks about what is needed in terms of velvet rope. And the idea of a velvet rope is basically to limit the total amount of bandwidth that is used. Think of it as a velvet rope around some fancy event, and you've got to be inside the rope to take part.

Now, there are many ways to think about velvet rope. You could imagine simply saying, "I'll let people come in, grab whatever bitrate stream they can get, and then when I hit my limit, I turn them off." Or you could say, well, I'll just, you know, I won't provide a very high bitrate stream because that way I can let as many people in as possible.

But if you don't know how many people are going to come and you've provisioned for a certain amount of capacity, what you'd like to do is let as many people in as you can, or this is certainly what Apple wants to do, let as many people in as you can and give as many of them as possible the highest bitrate they can get.

So the idea is that we start out with all of the bit rates being available to everyone, and we watch the ramp, and then there's a decision point when we hit certain thresholds to decide to clamp down and not provide access to the higher bit rates, depending on how much of the available capacity is left. The interesting thing is that that decision is made not just globally.

You don't want to say, "Oh, you know, we've got 16 gig globally, and gee, we're using 12 gig, so nobody should have access to the 300k stream," but rather it's done on a regional basis. And the reason for that is that we want to serve people to give them good quality.

We want to serve them from reasonably close by. So if we just made a decision globally, then it might be that, in fact, we're not going to be able to do that; it might be that the servers and the network links from our servers in Australia are maxed out.

[Transcript missing]

So we need to determine the need for Velvet Rope and what the thresholds are going to be. We need to provision First Point to do the load balancing and the load management. And then for an event like this that is so high profile and so critical and where failure, I mean, as I said, if Steve's unhappy, you can imagine what happens. You don't want to fail.

Okay? It's just not acceptable. You don't want to fail even for, you know, 30 seconds. Having things drop out for 30 seconds would be a disaster. Okay? And not to say that we want to fail for other customers. Obviously, we don't. But as I said, for an event like this, Apple cares enough about what happens with the event, the profile is sufficiently high that they are willing to invest in a level of testing and a level of attention that's paid to it.

To just make sure that any contingency that comes up will be covered. So there's a lot of testing that goes on before the event end to end to make sure we can capture a signal, send it to the entry points, fail over the entry points, send it through the network to the edge servers, and then get it with high quality to users around the world.

The day of the event, we provide, first of all, automated network management. Our entire network, those 15,000 servers, the entry points, the reflectors, the edge servers that actually run the QuickTime server, is remarkably self-managing. Our NOC, on a normal basis, has on the order of four people sitting in it watching the whole network.

And that's because we have a very extensive automated system for monitoring lots of different aspects of what's happening on every machine and what's happening on the network paths between machines and between them and end users. And we have automated failover at a number of different levels of the system so that when a machine fails or a region fails, a data center goes offline or connectivity is disrupted, there's automatic failover and remapping so that very few users will see any impact at all from that kind of event. But there are times when something goes wrong that needs attention, and so we do have people who are actually actively watching.

So our NOC monitors on a daily basis constantly the whole network, and for an event like this of this magnitude and this importance, then they are also specifically watching what's going on with this event. And in addition, we set up a situation room for an event like this, where the team that has been assembled to put together the event and run it is in that room watching the network, watching what's going on, and making sure that if there are any anomalies that crop up, that they get fixed very, very quickly, often before any end user notices an impact.

So what do you get from all this? Well, there's a lot of expertise that we have for delivering these kinds of events. In January of this year, we provided 100% availability. We provided over 12 gigabits per second peak delivery to almost 80,000 concurrent users. I think the total number of users during the event was on the order of 100,000. At the same time, we maintained a high level of service and high quality to all our other customers as well. And then in addition, we provide real-time and, after the fact, historical reporting on what's going on with the event and on the traffic that's being served.