Broadcasting Live with QuickTime Streaming - WWDC 2001

Digital Media • 39:21

Want your application to send video, audio, and text live over the internet? Find out how with just two pages of code. Take advantage of reliable broadcast, stream recording, and IETF standards-compliant broadcasting.

Speaker: Team Lemur

Unlisted on Apple Developer site

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

So welcome to session 403, Broadcasting Live with QuickTime Streaming with the extra added bonus of MPEG-4. My name is Ann Jones. I'm a member of the network media group in QuickTime. Today we'll talk a little bit about streaming, and then we'll go more deeply into the broadcast APIs that are new in QuickTime 5. And then we'll have a discussion about MPEG. And I know it's been a long day and this is the last session, but be sure to stay for the MPEG talk because it's really interesting and you'll see some cool MPEG-4 demos.

Today, you'll learn a little bit about QuickTime Streaming and some basic network concepts that are interesting in the streaming realm. You'll learn how to write a broadcast application, and you'll learn some stuff about MPEG-4 and how it relates to QuickTime. Okay, so how has QuickTime and QuickTime Movies been able to be played on the internet?

Well, in QuickTime 3, we introduced Fast Start, and that was good for some things, but not for everything. In QuickTime 4, we introduced streaming, and that was good, but mostly you could receive, and only very limited applications could actually broadcast. In QuickTime 5, we added broadcast APIs so that third parties could write broadcasting applications.

So first, what is Fast Start? Fast Start is basically an HTTP download of a movie. And there are some smarts inside QuickTime that allow you to start to begin playing the movie before the entire movie has been downloaded. Now there's lots of good things you can do with Fast Start, even today.

One good thing is you could host it on any standard HTTP server. Another thing is that even if your connection speed is low, you can watch high quality, high data rate movies. So if you have a slow 56K modem using Fast Start, you can still watch really high quality movie trailers. But Fast Start isn't good for everything.

For instance, since it's basically HTTP download, you can only use it for stored movies. This means that the movie had to live on your hard drive, and if you didn't have enough hard drive space for it, you couldn't really view the movie. And this is a problem with really long movies and really big movies. Also, you could only watch stored content with it. You couldn't watch any live content with it. So to solve some of those problems, we introduced QuickTime streaming.

QuickTime Streaming uses different protocols than FastStart. FastStart uses HTTP. QuickTime Streaming uses RTSP and RTP. These protocols assume that the data arrives in real time or near real time. In trade-off for that, in these protocols you can lose packets, and that results in lesser quality. But some of the advantages of QuickTime Streaming is not only can you view stored data, you can view live events, such as concerts and keynotes and presentations like this.

Also, you can't save the movie, so that means you can watch broadcasts for days at a time without having to worry about your hard drive filling up. Also, a lot of content providers prefer this method over FastStart because that means people can't take their movies and redistribute it without the permission.

So in QuickTime 5.0, we introduced broadcast APIs. These APIs allow you developers, third parties, to write authoring-- to write broadcasting applications. They're meant to give you very basic functionality. If you just want to add basic functionality, send from the audio and video-- basic audio/video capture to your app, then you just have to use very few calls. But if you want a more complicated app or do more things, then that functionality is available, but you have to do a little bit more work.

So streaming is all about getting, you know, sending packets over the network, sending media packets over the network. So before we delve more deeply into the broadcast APIs, let's talk about a few really basic networking concepts that are interesting in streaming. So, you might have wondered, you know, I've heard the words multicast, unicast. What's the difference between them? Why is that interesting to me?

Well, unicast is basically a one-to-one relationship. So, whenever a server is sending a stream to a client, it's sending that stream directly to that one client. So, if three clients are connected to the server, the server is sending out three streams. If a thousand clients are connected to the server, the server is sending out a thousand streams. So, this isn't very efficient bandwidth-wise because you quickly fill up your bandwidth and you can't service very many clients.

Multicast, on the other hand, is a one-to-many relationship. So no matter how many clients are connected to the server, the server is only sending out one stream. If three clients are connected to that server, the server is sending out one stream. If 1,000 clients are connected to the server, the server is still sending out only one stream. This is very efficient bandwidth-wise. And you want to use Multicast whenever possible, but in a lot of situations, you can't. The hardware between the client and the server needs to support Multicast, and that isn't usually the case. on the public internet.

So another basic networking concept that's interesting in streaming is the difference between TCP and UDP. Well, there's lots of differences, but the differences that we care about for streaming is that TCP is a protocol that guarantees the packets arrive on time. I mean, sorry, that's wrong. TCP is a protocol that guarantees that the packets arrive. But when there's congestion, the packets can take a really long time to arrive.

So when you're using protocols that assume that the packets arrive in real time or near real time, this can be a problem, and sometimes the packets then arrive too late. So a lot of the streaming protocols prefer to use UDP. In UDP, the packets will arrive, but when congestion occurs, instead of delaying the packet, the packet just gets dropped, and the client never receives the packet. So then you get data loss, and you get lesser quality.

So what are the different ways that clients, servers, and broadcasters can connect? In the typical video on demand case, a client connects via unicast to a server and receives the data that way. Since this session is mostly focused on broadcasting, we'll go into more detail about the four different ways that broadcasters can connect to servers and clients.

In a unicast broadcast, the broadcaster sends out a set of streams to one client. Now, the broadcaster is not a server, so it can only send out one set of streams. So when you're doing a unicast broadcast, it can only service one client. In a multicast broadcast, the broadcaster still sends out just one stream, but it sends it out to a multicast address.

This situation is similar to a TV station where the TV station is sending out its signal on channel 10. Clients who wish to connect to this broadcast then connect to that multicast address, get the packets, and start viewing the broadcast. And this is similar to you turning on your TV to channel 10 to watch that station.

Now, as I said before, in order to get multicasts, the hardware between the server and the client, or in this case, the broadcaster and the client, has to support multicasts. So, on the public internet, you know, even if the ISPs have the hardware that support multicasts, most ISPs don't turn on that functionality and allow you to get it. So, you know, on the public internet, most people can't get multicasts. So, multicasts is mostly used in corporate, private intranets and also universities and colleges.

Okay, so on the public internet, you'll mostly see broadcasts in this configuration. Here the broadcaster sends a unicast stream to a streaming server. Then clients who wish to connect to that broadcast connect to that streaming server via Unicast. And the server does the job of replicating all the packets for the clients.

If you notice here, the link between the broadcaster and the server is a UDP link. So since UDP is a lossy protocol, packets between the broadcaster and the server can be lost. Any packet that's lost between the broadcaster and the server is, of course, in addition, lost between all the connections between the server and the client viewing that broadcast.

So this is pretty bad. So new in QuickTime 5, we introduced a new mode of connecting. which is called reliable broadcast. Here, the configuration is the same as before, except the link between the broadcaster and the server is a TCP connection. So this solves the problem of losing packets between the broadcaster and the server, but this does require additional bandwidth between the broadcaster and the server.

Okay, when you want to connect to a stream or a broadcast as a client, usually all you need to know is the URL that you want to connect to. When you're producing the broadcast, however, not only do you need to know the connection information such as the URL or the server that you're going to send your broadcast to, you need to know things like how many streams you're going to send, what types of streams you'll send such as audio, video, or text streams. You'll also need to know where the data for that stream comes from. For instance, your video camera, your microphone, or a stored movie.

This information is provided to QuickTime Streaming through SDP or XML. SDP is an IETF standard that's often used in streaming. However, it's confusing for users to look at and understand, and it's a little bit kludgy to extend. So, in QuickTime, we prefer to use XML. The XML import for broadcasting uses the W3C syntax with tags added for broadcasting. And that's the format when we add new functionality to QuickTime broadcasting, that's the format that will allow you to access that.

So let's go over to demo four, where Kevin Marks will show you a broadcasting demo. So here's an XML file used to specify a broadcast. You can see here that the outer structure is a presentation, and if you're familiar with XML, there are tags nested within tags that define sub-pieces. So this is defining a presentation. Within that, it's defining a stream. It's a video stream.

Then it's defining some network information here. There's RTP information here, and there's the multicast address defined here at the top, and the port it's going to go out on. And then there's some additional information down here to define the dimensions of the stream. Information that's not defined in this file, QuickTime will pick defaults for you. So for example, we'll pick the C-Series. We'll pick the sequence grabber as the default capture device, as the default source for the broadcast, and we'll pick a default compressor for you.

So what we have here is an application called Usher, which is our example broadcasting application. And I'm going to open the XML file with that, and this will start a broadcast straight away. And you can see it's broadcasting the furry animals over here. And this is in fact Usher, which the program is named for.

And it's picked some default values for the sequence grabber and for the codecs to start a broadcast for you. So, and it's now going to explain what you have to do to, what you have to call to broadcast such a file. Okay, can we go back to slides?

Okay, so the program that Kevin Marks showed you is a sample code program that we developed, and it will be, it and its sample code will be available for download off of our website. So, what can you do with the broadcast APIs that we publish? You can send audio, video, and text streams. You're not limited to the number of streams except you're limited by the bandwidth that you have and, you know, common sense. So, you can send, you know, two video streams and an audio stream or, you know, two audio streams and a text stream or something like that.

You can, the media that you send can come from various places. Capture devices such as your video camera and your microphone. It can come from a stored movie that you have on your local hard drive, and it can come from someplace that your app, you know, that you implement in your app. These streams are sent via the standard RTSP/RTP protocols, and these streams can be viewed by QuickTime 4.0 clients and above.

Okay, so if you want to write a broadcaster, you know, what does QuickTime provide for you, and what does QuickTime do for you, and what do you need to do? Well, QuickTime will handle most, pretty much all of the networking aspects for you. It handles making connections to the server, making any negotiations necessary on the server. It'll handle taking the media samples and converting them into packets in the correct format. It'll handle if you want to use your video camera or your microphone, it'll handle taking the video and audio from there and sending that out.

Your application, in turn, is responsible for providing all the UI to the user to get the information that we talked about before, such as the number of streams that you want to send, the type of streams, where this data comes from, such as your video camera. It's also responsible for sending that information to QuickTime Streaming.

So how does it send this information to QuickTime Streaming? The main thing that your application will interface with is a QTS presentation. A presentation manages and contains all the streams of your broadcast, such as your audio and video stream. This is very similar on the receive side to a movie, where the movie will contain all the track, you know, will contain and manage the tracks inside the movie, such as your audio and video track.

So how do you make a presentation? Call QTS New Presentation from File and pass in the XML or SDP information. Inside the XML or SDP, you pass in the things we talked about, such as connection information, where that data comes from, how you want it compressed, et cetera.

Once you have a presentation, call QTS Prez Idle on the presentation. And it's then that QuickTime Streaming does most of the work, so you need to call that. And on Mac OS X, you should use the Carbon Event Loop Timer or CF Run Loop. You also should install a notification proc on the presentation.

The notification proc is the way the presentation tells your application about events that happen asynchronously. So, for instance, when the presentation is making a connection to the server, this might take a long time. When the connection either succeeds or fails, your notification proc will get told of the success or failure.

So, once you have a presentation and you've configured it, how do you start the broadcast? Call QTS Prez Pre-Roll on the presentation. It's at that time that the presentation will make the connection to the server, negotiate whatever is necessary with the server, set up any structures that are necessary, set up the sequence grabber, and get it ready for broadcasting. When all that is done, your notification proc will be called with KQTS-Pres Preroll Act with a success or failure. If everything went well, then you can call QTS-Pres Start, and at that time the packets will start flowing from your machine and clients can view your broadcast.

So now you know basically how to make a presentation and start the broadcast. What are some things you can do in your application to make your application interesting and unique? Well, in QuickTime 5, we introduced a new type of component called Sourcer Components. Sourcer Components, the job of the Sourcer Components are to get media data from wherever it is appropriate for that Sourcer Component, and also a sample description describing that media data, and send it off for broadcasting.

So earlier I had talked about grabbing the audio and video from your camera and microphone. To do that, you use a standard sequence grabber sourcer. If you want to send data from a movie that you have stored on your local hard drive, you can use the track sourcer that we provide. And this is similar to a news broadcast where you have a live camera feed, and then they switch to a previously stored segment.

Okay, so you would use the Push Data Sourcer to do all other things. The Push Data Sourcer basically just waits for data from the application, and when it gets its data, it sends it out to be broadcast. Then it's up to your application and your imagination as to where you can get the data. And this is also the way that you would send text and URL tracks. So let's go over to demo two, where Kevin Marks is going to show an interesting application of using the Push Data Sourcer.

So what you're watching here is the client receiving the live broadcast. which is coming from a push data sourcer. What I'm doing is I'm taking data from a movie, taking image data from a movie, decompressing that into a G-World, compressing that data again, and sending that out through the push data sourcer, which does sound rather cumbersome, and why would I do that rather than use the track sourcer? Well, the G-World that I'm using is actually the screen. So I can actually send any part of the screen around to you over there.

So the point of this is that anything you can draw in a GWorld, you can broadcast with QuickTime Streaming. Any text you can generate, you can broadcast with QuickTime Streaming. Or any sound you can generate, you can broadcast with QuickTime Streaming. Anything you can make on the computer as QuickTime media, you can send over the wire, have it received live by someone else. Can we go back to slides?

OK, so in summary, the broadcast APIs are available in QuickTime 5. Please go look at them and try to use them. Do you have interesting ideas that you want to do? Do you want to write a conferencing application? Do you want to write a-- Do you want to write an internet radio station application? Please send us this information and any other feedback to [email protected]. All that email will be personally read by Adam Wells right over there, who will then forward it to the appropriate place.

Also sending email to that address will allow you to be put on a private NDA seed program if you wish to do that. And also participate in a private email list about broadcasting. Sample code and documentation can be found at these URLs from the Apple website, and this slide will be put back up at the end of the program. So here's Shilong to talk about MPEG and MPEG-4.

Thanks for staying. I'm Shi Long, and I have the privilege of managing an awesome team of Kodak guys. This is one of them, Thomas Pun. He's the standards boy from this morning. MPEG-4, let's see. I've got to figure out how to do this here. Okay, I haven't figured this out, but we'll figure it out. Oh, I got you, I got you. So, ooh, that's backwards, isn't it?

So I want to give you a brief technology overview of what we're doing with MPEG-4. MPEG-4 is sort of the newest thing in the standards block to come by, and we think it's pretty cool. So we're going to talk a little bit about standards and how it's fitting with QuickTime. We're going to talk a little bit about the MPEG family, because I'm most interested in it. And then, of course, we want to get to MPEG-4.

So QuickTime and standards. We've been doing standards for a long time. We shipped JPEG with the very first release of QuickTime. And since then we've done quite a bit. And it spans sort of the whole spectrum of functionalities in QuickTime. We've done video, audio streaming, MPEG, you know, and the ever popular Tiff Fax is in there. We like standards.

And we think open standards are the way to go because it allows, it facilitates interoperability between the various communities that form around these things, like the content providers, the deliverers, and the consumers. And we find that the communities that converge around these things outlast any single member of these things, of the communities. So all the effort that you put into building apps around the standards, it's not going to happen. So we're going to be working on that.

So QuickTime and standards. You can be sure that it'll be relevant for a while. So we like it a lot. I'd like to point out that the file format stuff, MPEG-4 file format is actually based on QuickTime. I've spent quite a bit of time doing that. So we're quite proud of it.

MPEG-1, MPEG-2, we're quite familiar with. MPEG-4, I just want to get to that quickly. Each of these standards have a target application and a set of technologies that each of these guys define and standardize. You'll find that as you move from MPEG-1, 2, and all the way to 4, that the target applications and the technologies that are involved start to become bigger and bigger until they go crazy.

I've got to stand here. There we go. MPEG applications. Let me talk about that a little bit. MPEG-1 is probably some sort of de facto standard for enterprise streaming, and internet sort of high rate kind of streaming stuff. It's used very often in video CDs, and of course the MP3 audio stuff that you've all come to love is MPEG-1. MPEG-2 hits a little higher quality, higher bit rate broadcasting, digital TV, DVDs. The technologies in MPEG-2 actually can be used in the market areas for MPEG-1 as well. This is not to say that one is exclusive of the other.

And then MPEG-4 just goes big and crazy. It has just about everything else in there that you want. Internet streaming, consumer electronics, you can read this stuff. Now, the thing is, a lot of people think that MPEG-4 is a crappy video at low rates kind of stuff for streaming only, and that's not true. The technologies in MPEG-4, again, similar to MPEG-2, can be used for any of those other spaces.

Okay, so let's go through some of the technologies that are in there. I failed to mention, but under technologies, MPEG standardizes three basic components. The system layer, the video layer, and the audio layers, or audio components, let's just call them components. So, in video and audio and systems, MPEG-1 is actually the most basic of the three MPEG standards. And video is about 320 by 240, gives you about a megabit of bit rate.

Audio comes in three layers, stereo, and gives you these sort of kilobits. And the three layers get more complex as you go up the layers. And then layer three is, of course, MP3 that we know. Synchronization is very simple. They do... Basically, so the systems layer is just the synchronization and multiplexing.

And this thing, it was meant for local storage type issues. So, you'll see that the system layer doesn't include anything network-related. I guess we're up for demo. On demo three, we're going to show you MPEG-1 running at, I think the video's at 1.2 megabits, and audio's around 192 kilobits, joint stereo. Go for it.

♪ She's back in the atmosphere ♪ ♪ With drops of Jupiter in her hair ♪ Okay, thanks. We've been shipping MPEG-1 with Macintoshes since '96, I think. And then new in QuickTime 5.0, we've got Windows support and also streaming. So that's all good stuff. Yeah, back to slides, please. Thank you.

So MPEG-2 technologies, still sort of the basic set of components that you've recognized from MPEG-1, except everything goes up higher a little bit. You've got audio hitting 5.1 channels, and then audio introduces a new type of coding technology called advanced audio coding. And that's on top of the three layers of audio that you saw in MPEG-1.

Video, everything gets bigger, 720p, 480p. The megabits, the bitrate is like whopping 4 to 15 megabits. That's sort of the target. You can actually change the sizes on any of these things. These are just the target sizes and target bitrates. Adds a number of technologies. It's scalable. It handles internet video. It's very nice.

And then the system layer, on top of synchronization, it puts a data transport layer, so it becomes a little bit more network aware, which is good. And we've got a demo for this, too, I think. Thank you. OK, so on demo three, we've got-- and this is a big two. You'll see this is much bigger. 720p 480. I believe it's at 8 megabits a second. So let's go.

Back in the atmosphere with drops of Jupiter in her hand. So we keep cutting train off, but the experience is much better. And the audio, I think, is layer two audio from MPEG-1. And back to Slice, please. Thanks. So let's go back. MPEG-1 technology, this is the basic three blocks. We've got the basic stuff. MPEG-2 sort of adds this thing. It gets bigger and better. And then you get to MPEG-4. Boom. Right?

To the untrained eye, this may seem confusing and impossible. But let me assure you, to those of us skilled in the art, this stuff is confusing and impossible. So, just a little bit of a... If you go back, see those, for example, say video, that block there? That block gets mapped into...

[Transcript missing]

So the experts understand this.

Oops, this way. And so they've tried to come up with place conformance points. And these are defined in terms of profiles and levels. You might have heard about these. So profiles are subsets of technologies targeted at specific solutions. So some of those-- if you wanted to just stream basic video, then you wouldn't need all that other stuff.

So they've come up with something-- some small set of these things that you just need to implement in order to be compliant with some other set of servers or something like that. The levels now, per profile, the levels indicate the system constraints, like the memory that the decoder has to have, the bandwidth that the system has to sustain, and computational complexity, and things like that.

So-- But when you put a bunch of experts together, you get a whole bunch of opinions. So this is what happened. Just for visual now, we have 10 profiles. And per profile, you've got five levels. So what do you do? You've got a ton of conformance points. And if you're a tiny little company A trying to be compliant, this becomes quite a bit of a problem to deal with.

So MPEG-4 is huge, in spite of all the conformance points. and where do you start Apple has been fortunate enough to team up with a number of other streaming companies to form the Internet Streaming Media Alliance. It's a bunch of guys that want to make sure that MPEG-4 gets deployed quickly and well.

We'd like you to join us if you can. We intend to integrate on a small number of subsets of MPEG-4 plus some IETF standards. I think we're coming along very well. We've got an interesting set of things to work on. You can check us out at www.isma.tv, and we want you to do so.

What is QuickTime doing with MPEG-4? In the short term, we're going to focus on something feasible. Inside the visual component, we're just going to handle frame-based video, probably around 320 by 240, probably going to hit anywhere from 300 kilobits. That seems to be the big thing now, all the way up to maybe two or three megabits. And audio will handle kelp for speech and AAC for general music. It turns out AAC is actually very good technology for general music.

and the system stuff, we've always done IETF streaming. And then, of course, the MP4 file So at this point, we want to show you a number of stuff that we can do in QuickTime with MPEG-4. This is just stuff that we've been playing with. The thing that I want you to get across, I want you to get through this, is that MPEG-4, when we do ship this thing, it's just going to work.

It's part of QuickTime, and it's going to be just as if it were just another codec in there. So go ahead. Let's see. Since this is a broadcasting demo, here's a broadcast of MPEG-4 coming live from that cool titanium power book that's sitting right there. There you go.

Thank you, Kevin. Maybe a little song and dance? Yeah. Thanks. Okay, so -- Ooh, it's about half a, five minutes, we'll talk later. Come talk to me. And let's see, so earlier today you saw a whole bunch of skins, so here you go. We can throw this in a skin as well. So let's try that. This is the awesome power, ooh, the iBook.

So it basically works with a lot of stuff. We can throw this into the player, and all the functionality in the player should map straight onto this stuff. So let's try that. We can flip it around as we're playing it. You can flip it upside down. We can go sideways. Yeah, that's kind of cool. You can change the color on these things as well with our new controls.

You can turn green. Yeah, do green. Green is good. Yeah. So there's a lot of interesting things that you can just pop this stuff right into, and it ought to work. So that's sort of the message. One last thing we want to do before we leave is-- so just to prove that MPEG-4 can be used at the high rates, the quality stuff, I want to show you what we can do at-- this is running at 3 megabits. and sort of comparable to something you might see off with EVD. So go ahead.

It's not bad. And software decode, all good. Can we go back to the slides, please? So in summary, I just want to say that MPEG-4 is going to be a full citizen of QuickTime, which means that you can create and play back MPEG-4 content. And all the tools that you're writing today are going to support this stuff and be able to take advantage of MPEG-4 immediately. We'll be able to do things like export and import video and audio tracks from MPEG-4.

You'll be able to mix those things up with other types of QuickTime media, Flash, wired sprites, QuickTime VR, that kind of stuff. So it's coming. Oh, wow. We have a QuickTime feedback forum tomorrow at 3:30. And then there's some streaming issues that you might want to visit tomorrow also in the morning. It's a little early. But please do that.

And if four sessions of QuickTime hasn't been enough for you, we've got an entire conference on QuickTime for you. So you ought to come. October, and we'll talk about things like authoring and making QuickTime applications. Is that correct? And let me put up this information panel. So that you all can get the information you need. So some of us will be down here, and you can talk to us if you'd like. Thanks.