MPEG-4 Demystified - WWDC 2003

QuickTime • 1:05:18

MPEG-4 is revolutionizing Internet media by providing a robust and high-quality standard for content creation, delivery, and consumption. MPEG-4 is an all-encompassing specification, with many profiles and technologies that cover the entire spectrum of digital media. This session provides an in-depth look at the MPEG-4 specification, and explains how these technologies benefit you.

Speakers: Aimee Nugent, Rob Koenen

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Well, hello. Welcome. Thanks for coming. This is session 702, MPEG-4 Demystified, as part of the QuickTime track that we're very happy to have this year at WWDC. I'm Aimee Nugent. I work on the QuickTime product marketing team, and I'll be your host for this session. You probably just came from the QuickTime State of the Union presentation, and there you've learned where standards really play a very important part in the strategy for Apple, as well as QuickTime.

It's my honor to introduce to you Rob Koenen, who briefly talked in the State of the Union, president of the MPEG-4 Industry Forum, as well as many other jobs. He will go through the MPEG-4 specification. It's a very vast and deep specification that is capable of many things. I will leave you in the very capable hands of Rob, and hope you have a good session. Rob, thank you very much. Aimee?

I think we got to do one of two things first. Either all of you flip over your iMacs, or we put on some light in the room, because it's really very dark. I can't see anyone. So, here are your iBooks. There's not enough iBooks to go. Can we have some light in the room? Is that possible?

Yes, I think you can still see the screen, right? And the screen is much more important than I am, so allow me to take off this. Good morning, everybody. It is my pleasure and honor to be able to talk to you today and to explain the high-level concepts of MPEG-4.

I think I will be able to demystify some of it, maybe not all of it. Notably, one thing I have to say up front is about the licensing. Some of you have heard about licensing. You should come back on Thursday morning when there is someone here to explain just that. I will say a few words though.

And if during this talk you have any questions, I don't mind being interrupted. Just raise your hand. It would be good if you used a mic for a question, because this is being translated simultaneously into Japanese, and the translators can only hear you if you use the mic. If that's difficult, just shout out your question, I'll repeat it for the translators.

I have this neat little device. I've been told not to press this button, because then everything will go wrong. Because when they give you neat little devices with buttons that you're not supposed to press... Thank you for coming. This is what I would like to address today. What is MPEG-4? How does it work?

What are the recent and interesting developments? Why should you use MPEG-4? This is just a bunch of business talk, so I'll go over that quickly, because you're all developers. I'll tell a little bit about the deployments of MPEG-4, and then just a few words about M4IF, or the MPEG-4 Industry Forum, which is an advocacy group for MPEG-4. So, let's start with the basics. What is MPEG-4? I am today not going to give you… What's happening to Baker here?

I got the message. What is MPEG-4? So I'm not going to give you the gory details of the video codec, or the audio codec, or any of the system codecs, but a high-level, functional overview of what MPEG-4 is and what it does. First, it's what we like to think of as the media standard. And we call it that way because it's a standard that works across all devices, all networks, all carriers, all everything, basically.

And that's why we also call it the interoperable cross-platform ecosystem. While it's interoperable, it's also competitive, and I will tell you why it's competitive. Because just having a standard doesn't take away all the competition there is. On the contrary, actually, that creates a lot of opportunities for competition.

Most people know MPEG-4 as a video codec, and Apple likes to talk about MPEG-4 video and AAC. AAC is actually also MPEG-4 or MPEG-2, Advanced Audio Coding. And it's also a systems layer, a part of which is the file format, which was based on QuickTime, as Frank Casanova explained this morning. It goes way beyond audio and video, although audio and video are the first elements that will see deployment. And it supports stuff that's way beyond that, and I will tell you a bit about that. And it's designed for all multimedia platforms, digital ones.

So where does it come from? Most of you will know... So who knows MPEG-3 here? Anyone know MPEG-3? Some people still do. MPEG-3 doesn't exist. Because what you know as MPEG-3 or MP3 is actually MPEG-1 audio. MPEG-1 had three layers of audio. Layer 1 was really simple. Layer 2 is what's used in most of the digital broadcasting systems today, in Europe at least. In America, it's Dolby. And Layer 3 is the most complicated of the MPEG-1 audio layers. That was way beyond what could be implemented when it was designed, but is now just the norm.

And then you may know MPEG-2, which is a standard for digital television and for DVD, video, and audio also in Europe. Again, in America, it's more Dolby Audio. Then there are MPEG-7 and MPEG-21, which are not successors to MPEG-4 or MPEG-2. MPEG-7 is sort of a metadata standard that allows you to describe content.

MPEG-21 is a very fuzzy phrase. It's a framework for interoperable use and exchange of digital media. It does all sorts of stuff that has to do with where you find content, what's a unique identification for content, where do you find the rights to content. And it attempts to standardize elements of digital rights management, which is more of a challenge, I can tell you, than standardizing just a video codec, which is hard enough as it is.

So let's talk a little bit about the MPEG-4 vision. And this is a vision that's been with us for a long time, actually since the middle of the 1990s. And it's coming on close, and I've been working on MPEG-4 for 10 years almost. That will be next year.

The vision, and back in the, if you remember, back in the early 90s, or halfway the 90s, there was all this talk about convergence. And everything was going to be the same. And we're all going to have glass or fiber into our living rooms. And the big discussion was, okay, are we going to consume our content on the PC, or are we going to consume the content on the television? And back then we said, that's all a load of bullshit. There's not going to be, I'm not sure that can be translated, by the way.

That wasn't quite right. There was, rather than convergence, we saw proliferation of multimedia. Rather than less networks, we got more. We got all these sorts of different mobile networks. We got the normal telephone network, we got the digital telephone network, ISDN, which has had quite a bit of adoption in Japan and Europe, not as much here. We got ADSL, we got cable, we got stuff coming to us through satellites.

And it's going to be a chaos. And rather than just this convergent terminal, it's going to be either the PC or the television, another load of bull. We're going to have a lot of different terminals, handheld devices, phones, more PCs, different PCs, set-top boxes, and they're all going to want to do digital multimedia.

And back then, the way to do standardization was new network, new standard, new stack. Communication protocols, codecs, everything anew, especially in the communications world. And we said that doesn't make any sense. We have to have one layer of content representation that works across all of these different applications, is agnostic to the network and to the terminal, and supports all these different types of what you could call paradigms for using content: broadcast, communication, retrieval, and retrieval could be online or in packaged media like DVDs.

So basically, a single technology for all these devices. And of course, that doesn't mean that on your high-definition television, you're using the same bit rate. But you're using the same-- you could very well use the same systems layer, by the way. You could use the same MP4 files. And they could point to different content media files with different encoded bit rates. And like Frank showed this morning, you could easily take one file and transport it to-- transcode it to another that does work on a mobile device.

Enabling what I like to think of as this "write once, play everywhere" paradigm, where you can use your content across your devices, on your PCs, on your CE devices, and on your phones even. You can even take it with you. And on the other hand, you can shoot your films while you're on the road, and just upload them to your PC, and they just play in QuickTime.

So that's where we see the applications of MPEG-4 today, and I already talked a little bit about it in the State of the Union. We see it in mobile devices. We see it in broadcast, not as much yet. This will explode, excuse me, if we get the new advanced video codec, about which I will say a few words a bit further down in this talk. We do see streaming services.

Interestingly, we are getting a little bit of interactivity in these. MPEG-4 allows for great interactivity, which I will definitely talk about. And the BBC is doing a trial now with that sort of stuff. And for packaged media, which is also waiting for the new codec at this moment. So let's now go to the heart of this presentation, which is how does it work?

MPEG-4 is an object-based multimedia content representation standard. And some people that know a little bit about image coding or have heard about MPEG-4 will know that MPEG-4 supports arbitrary shape objects in video, and that you can do segmentation and stuff. That's all true, but you don't need to do it. An object might just as well be a rectangular frame of video and the audio. And then maybe the text that scrolls across the video is another object, which is already a huge difference with MPEG-2, where everything is just pixels.

It's got a revolutionary systems layer. It's got state-of-the-art codecs, which are responsibly upgraded, which means no new codec every half year. It's something that you can perfectly well do in the Internet world, but the CE world doesn't work that way, and people don't want to buy a new DVD player every half year. And it's got something called profiles and levels to restrict complexity and guarantee interoperability. Those are the, what we call, interoperability points, profiles and levels.

And while MPEG defines a whole bunch of those, and you could actually say there's too many of them, it doesn't really matter because industry consortia, such as the Internet Streaming Media Alliance, which is a consortium that Apple helped found, pick their profiles and levels. They say, okay, we're going to use this profile for video, this profile for audio. We're going to use this file format. Well, there's only one of them, so that's simple. And that's how we are going to do interoperable streaming media. All of us. Philips, Sun, Cisco, Kasenna, Apple, you name them.

So let's take a look at this picture, and I'll jump off stage for a second. And you can't read the text, and that's okay, because you don't need to. But what you see here is all the different content types that MPEG-4 supports, and what you see, there's audio, there's video, there's graphics, there's even 3D graphics. There's text, there's animation, there's something we call BIFS, and I'll tell you what BIFS is.

And then there is, basically this represents a multiplex, which you can see is an MP4 file, which is the basic container that can carry everything. Then you can distribute that stuff, these containers and these streams, using whatever you would like to use, because basically MPEG-4 is agnostic to all these things, so there's broadcast, there's broadband delivery, satellite, there's wireless, there's phone lines, whatever. And then you can put it on a number of different devices.

And I talked about the devices before, so let's skip that bit. But what's now interesting is, let's look at what MPEG-2 does. And I'm actually going to go back to the screen. In MPEG-2, you would do authoring, and you would take all of these objects, you would do your authoring, and Apple has a couple of great products for doing authoring. But then you're going to do encoding, and what you do with encoding is you basically say, okay, now I'm going to convert all these things into pixels. One plane of pixels. Everything is collapsed into a single plane.

The rectangular frame of pixels gets encoded. I explain everything using video concepts. Actually, in audio I could do something similar, but it's just a little bit easier for me doing it in visual concepts. So you take all the objects, you collapse them into a single plane of pixels, you encode this using MPEG-2, and then you just display it here. There's nothing you can do anymore.

Now with MPEG-4, you can, if you wish, you don't have to, but you can keep all of these objects separately. You can have multiple video objects, you could have one. You could have a graphic that's encoded independently. You could have your streaming text. You could have your voice and your music encoded separately.

You can keep them separate, what's called elementary streams. You could send these to the decoder, and then you do the composition here. So instead of doing composition before encoding, here, we are now doing composition after decoding of the objects, which is here. And that's the major, actually, if there is one major paradigm shift in MPEG-4, that's it.
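To make the shift concrete, here is a minimal sketch of composition after decoding. It is not MPEG-4 code: the `DecodedObject` class, the `compose` function, and the character-grid "pixels" are all hypothetical, chosen only to show that the receiver holds the objects separately and can still change what is shown.

```python
from dataclasses import dataclass


@dataclass
class DecodedObject:
    """One decoded object (hypothetical model, not an MPEG-4 structure)."""
    name: str
    x: int
    y: int
    pixels: list          # rows of characters; ' ' means transparent
    visible: bool = True


def compose(objects, width, height, background="."):
    """Paint the visible objects onto one frame, in list order."""
    frame = [[background] * width for _ in range(height)]
    for obj in objects:
        if not obj.visible:
            continue
        for dy, row in enumerate(obj.pixels):
            for dx, ch in enumerate(row):
                px, py = obj.x + dx, obj.y + dy
                if ch != " " and 0 <= px < width and 0 <= py < height:
                    frame[py][px] = ch
    return ["".join(row) for row in frame]


video = DecodedObject("video", 0, 0, ["VVVVVV", "VVVVVV", "VVVVVV"])
ticker = DecodedObject("ticker", 1, 2, ["news"])

with_text = compose([video, ticker], 6, 3)

# Because composition happens in the receiver, the text can be hidden
# without touching the video object at all.
ticker.visible = False
without_text = compose([video, ticker], 6, 3)
```

In the collapsed, single-plane model the ticker would already be burned into the encoded pixels, so the second call would have nothing to remove.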

Now, in order to be able to do this, you need some sort of a language. It tells you, okay, this is where the objects go on the screen. This is when they appear. That's what we call BIFS, the Binary Format for Scenes. It's an efficient binary language that allows you to describe where the objects are, where they go, when they appear. Now, if you have this BIFS language, you can not just describe the scene statically, you can also start describing the scene dynamically. You can attach behavior to the objects.

You could say, "Okay, this logo is spinning. It's changing its color. It's moving from the top left of the screen to the bottom right of the screen." Now, if I were to do this in MPEG-2 or any traditional codec, I would have to encode all these pixels, and again and again and again and again and again until the logo was here, which is quite a bit of waste of bits.

While in MPEG-4, I just give one command saying, "Okay, move the logo from there to there," and take a second to do it, and that's it. This is a very small binary command sent to the decoder. The decoder takes care of everything. Now this applies to visual objects, it applies to audio objects as well. You can describe 3D audio scenes in this and have sources move around in a scene if you wish. That's quite a bit more advanced. But that's the basic concept of MPEG-4.
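As a rough illustration of "one command, and the decoder takes care of everything", the sketch below expands a single move instruction into per-frame positions on the receiving side. The function name and its parameters are hypothetical and this is not BIFS syntax; it only assumes the decoder interpolates linearly between the two points.

```python
def positions_for_move(start, end, duration_s, fps):
    """Expand one 'move from start to end' instruction into frame positions.

    The sender transmits only start, end and duration; every intermediate
    position is computed locally by the receiver (linear interpolation is
    an assumption made for this sketch).
    """
    frames = int(duration_s * fps)
    return [
        (
            start[0] + (end[0] - start[0]) * i / frames,
            start[1] + (end[1] - start[1]) * i / frames,
        )
        for i in range(frames + 1)
    ]


# One instruction: move the logo from top left to bottom right in one second.
path = positions_for_move(start=(0, 0), end=(100, 50), duration_s=1, fps=4)
```

However many frames the move lasts, the transmitted instruction stays the same size, which is the saving described above.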

So let's look at this in a typical MPEG-4 scene that is fully free of any copyright. So I won't get in any trouble, which means it's a bit dull. I made it myself. It's an aquarium with some seaweed. There is an arbitrary shape video object. And I've been using this for a while. She's four now. This was when she was one day old. There's some bubbles, some fish, and there's another type of fish, which is a special sort of fish, which I'll explain in a little bit. And all these are different objects.

So this is an arbitrary shape video object, or natural video object. These are graphic things, the fish. And then there's the bubbles, there's some background, it has music, maybe a voiceover. And then there's this, it looks like a wireframe, it actually is a wireframe. It is a wireframe with a picture projected onto it.

And the neat thing about this is if you move the vertices in the wireframe, you can make the fish swim. And actually, in real life, you wouldn't see all these wires. These would be hidden, but that's just to show you how it works. These are a couple of the objects that MPEG-4 supports. Now, this is what the scene tree looks like.

All these objects are represented by branches in this tree, and they have sub-objects. Whoops, some of these, I'm trying to go back. Didn't really want to do, yeah, back works. So all of these objects can have audio and video associated with them. Some of them are static, just graphics. Some of them are streams. Some of them are audio, some of them are video.

And this is actually literally what's represented in the decoder. And this is, now you can go in with your BIFS language, and just do stuff with the branches. You can take a branch out and an object disappears. You can change the place of the whole branch. You can change the color of an object, just by issuing these little BIFS commands. So, recapping.
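The idea of a scene tree that small commands operate on can be sketched as follows. The `Node` class, the command dictionaries and the operation names `delete` and `replace_field` are hypothetical stand-ins, not the real BIFS command set or encoding; the node names are taken loosely from the aquarium example.

```python
class Node:
    """A branch in a toy scene tree (hypothetical, not a BIFS node)."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def find_parent(self, name):
        """Return the node whose direct child is called `name`, or None."""
        for child in self.children:
            if child.name == name:
                return self
            parent = child.find_parent(name)
            if parent is not None:
                return parent
        return None

    def names(self):
        out = [self.name]
        for child in self.children:
            out.extend(child.names())
        return out


def apply_command(root, command):
    """Apply one small scene update to the tree held by the decoder."""
    parent = root.find_parent(command["target"])
    if parent is None:
        raise KeyError(command["target"])
    node = next(c for c in parent.children if c.name == command["target"])
    if command["op"] == "delete":
        parent.children.remove(node)        # the whole branch disappears
    elif command["op"] == "replace_field":
        node.fields[command["field"]] = command["value"]
    else:
        raise ValueError(command["op"])


scene = Node("aquarium")
scene.add(Node("background"))
scene.add(Node("baby_video", x=10, y=10))
fish = scene.add(Node("fish_group"))
fish.add(Node("fish_1", color="blue"))
scene.add(Node("bubbles"))

apply_command(scene, {"op": "delete", "target": "bubbles"})
apply_command(scene, {"op": "replace_field", "target": "fish_1",
                      "field": "color", "value": "red"})
```

Each command names a branch and a change, so the update is tiny compared with re-sending the objects themselves.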

We have an audiovisual scene with objects. It could be a very complicated scene, it could be a very simple scene with one audio object and one video object, and it just provides interoperable streaming, which is quite a feat in itself. These objects can be of different nature. They can be natural, which is they are recorded with a camera or a microphone. They can be synthetic, which is they are generated with a computer program. And there is a compositor, which is this new element that puts the objects in the scene. And then there is an efficient, real-time, binary scene description language, which is called BIFS.

And BIFS, I'll say a couple words more about BIFS. It inherits a lot of VRML, the virtual reality modeling language. But as you may know, that one was neither real-time nor binary, and therefore not very efficient for stuff like streaming over the Internet or to mobile phones. It was perfectly okay for doing computer stuff.

And the coding scheme of all these different types of objects is optimal for the object type. So you don't try to encode speech with a music encoder, which is not really optimal. You don't have to encode a graphic with a video encoder, which is optimized for moving video rather than just still graphics. You can use the optimized coding scheme for each of these objects.

And this is completely independent of bitrate. And I still say this because most people now understand that MPEG-4 isn't just about low bitrates. It's also about low bitrates. Way back when, 1993, MPEG-4 started as a low bitrate project, but that got changed really quickly in 1994. But some people still think it's about low bitrate. So, MPEG-4, there's a studio profile that actually goes up to over a gigabit per second in video coding.

So let's look at the different objects that are supported in MPEG-4. The ones you know are video and audio. And these are the most widely deployed. Video coding and advanced audio coding, MPEG-4 advanced audio coding. In addition to the video coding, on the visual side, we have animated faces and bodies, and there's some companies that have products out there for animated faces.

And I think the BBC has been looking at doing this because they have a legal requirement to do talking heads for people that can't hear, deaf people, and they're supposed to be able to read lips, and you can do this with animated faces. There are two-dimensional, three-dimensional animated meshes. Those are little wireframes that you can project either still or even moving video into these wireframes, and then you can deform the wireframes, you get really intricate effects, and there's text, streaming text and still text and graphics.

And JPEG is also supported as a part of the MPEG-4 framework. You can just use JPEG graphics. And then in the audio side, we have generic audio from mono to 5.1 channels. And by combining different audio objects, you can actually go up almost indefinitely. You don't need to stop at 5.1. There's specialized speech codecs, synthetic sounds. This is very advanced.

Structured audio is basically a language to program a synthesizer, and then first to describe

[Transcript missing]

A score language. There's text-to-speech, which is merely an interface, but you can mark up text so that it can be regenerated as speech. And then there's something called environmental spatialization, which is making stuff sound like it's in a specific place. You can describe the place.

So let's look at the parts, how this all fits together. First, there's the visual coding, and then there's the audio coding. And this is just decoding. I'll say a few words about this a bit further down in my talk, but it's important. MPEG-4 only standardizes decoding. It doesn't standardize encoding, and that's why there's so much competition between providers. It's the same with MPEG-2. And as you will see a bit further down in my talk, this provides for a lot of improvement in quality of these codecs.

And this is also why you have to be very cautious with statements from proprietary vendors about the quality, the quality of MPEG-4. It doesn't exist, basically. But you can get the best quality with MPEG-4, and there are fair comparisons to be made. But I'll say a few words more about that a little later.

Then there's a systems layer in MPEG-4, which does stuff before decoding in terms of demultiplexing and buffering, and after decoding in terms of presentation, which is the composition of the objects. And the systems part used to contain the file format, which is the MP4 file format, which is extremely close to the .3GP file format, which you saw Frank talk about in his presentation. The only difference is basically that there is a top-level atom that says, "This is a 3GP file," which means, "Okay, I now have AMR voice coding support," which is not something that is natively known to MPEG-4, but for the rest it's just the same stuff.
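A small sketch of how a reader could tell the two apart, assuming the top-level atom in question is the file-type (`ftyp`) box used by this file format family: a 32-bit size, the four characters `ftyp`, a four-character major brand, a 32-bit minor version, then a list of compatible brands. The helper names are hypothetical, the sample bytes are built in memory rather than read from a real file, and files without a leading `ftyp` box are simply rejected here.

```python
import struct


def make_ftyp(major, minor, compatible):
    """Build an ftyp box in memory (helper for this sketch only)."""
    body = (major.encode("ascii")
            + struct.pack(">I", minor)
            + b"".join(b.encode("ascii") for b in compatible))
    return struct.pack(">I4s", 8 + len(body), b"ftyp") + body


def read_ftyp(data):
    """Return (major_brand, minor_version, compatible_brands)."""
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("first box is not ftyp")
    major, minor = struct.unpack(">4sI", data[8:16])
    brands = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major.decode("ascii"), minor, brands


sample_3gp = make_ftyp("3gp4", 0, ["3gp4", "isom"])
sample_mp4 = make_ftyp("mp42", 0, ["mp42", "isom"])

brand_3gp = read_ftyp(sample_3gp)
brand_mp4 = read_ftyp(sample_mp4)
```

Everything after this box has the same box structure in both cases, which is why the two formats are described as nearly identical.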

And then there's something called DMIF, which isn't always used, you don't have to use it, but which would provide you with an abstract interface to the transport. And if you use DMIF, which has a little bit of a grandiose name, Delivery Multimedia Integration Framework, it stands for, it's actually a quite compact part of the standard.

If you use DMIF, you can write your application to a transport layer, and then you only need to write separate interfaces to a disk, or to a network, or to a broadcast even. And your application can otherwise be fully unaware of what it's talking to. And then there's the transport layer, which in principle is not in the standard.

And this is how content flows through. It comes through transport, it goes through DMIF if it's present. Systems take care of demultiplexing of all the different objects. It's decoded, and then the decoded objects are composited onto the screen or into the sound space. Composition of audio could very well be, okay, I turn up the volume of the background a little bit, and I turn down the volume of the foreground speaker a little bit, or I choose the Japanese speaker rather than the English speaker. These are all possibilities by using MPEG-4 composition.
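The audio case can be sketched the same way as the visual one: the receiver holds separately decoded tracks and mixes them with gains it chooses. The `mix` function, the track names and the sample values are all made up for illustration; real composition is driven by the scene description, not by a dictionary of gains.

```python
def mix(tracks, gains):
    """Sum decoded tracks sample by sample, scaled by per-track gains.

    A track with no entry in `gains` is treated as muted.
    """
    length = max(len(samples) for samples in tracks.values())
    out = [0.0] * length
    for name, samples in tracks.items():
        gain = gains.get(name, 0.0)
        for i, sample in enumerate(samples):
            out[i] += gain * sample
    return out


tracks = {
    "background": [0.5, -0.5, 0.5],
    "english": [0.25, 0.25, 0.0],
    "japanese": [0.0, 0.5, 0.5],
}

# Same received objects, two different compositions at the receiver.
english_mix = mix(tracks, {"background": 0.5, "english": 1.0})
japanese_mix = mix(tracks, {"background": 0.5, "japanese": 1.0})
```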

And there's two sort of orthogonal parts. Conformance, which contains a lot of bitstreams. If you have a decoder, you can use the conformance part and see if your decoder is conformant. Gives you some level of indication of interoperability. In the MPEG-4 Industry Forum, we do much more interoperability work with exchange of bitstreams. And then there's reference software, which is actually free of copyright.

[Transcript missing]

There is something, even though in principle this is not in a standard, something called MPEG-4 on IP, which is a specification on basically how to use IETF protocols and how to do the mappings. And more recently, what's called advanced video coding was added to the MPEG-4 standard, and I'll say a bit more about that in a bit as well. And you will see that the numbers don't quite add up. There's more stuff that I don't think is important to talk about right now. So let's take a look at some of the recent developments.

Hey, this slide was supposed to have been hidden. I want to first say a little bit more about the objects. I'll keep this a little bit brief. So we have video, which basically goes from 10 kilobits per second to over a gigabit per second. So if you take one set of zeros out, it's megabit. And if you take another set of zeros out here, it's gigabit per second. And Sony actually has cameras that support this stuff. Studio Profile, it's called. Multiple rectangular or arbitrary shaped objects in the scene. Scalability is supported, including fine-grained scalability.

[Transcript missing]

And there's a lot of stuff here, and I should say some of this will be used, and some of this will likely not be used. And that's quite okay, because we have these profiles, and people will pick what they need. Again, with audio, we can have a number of objects in the scene, so you can make your audio composition. I think the most important codec in MPEG-4 is MPEG-4 Advanced Audio Coding, which is very much like MPEG-2 Advanced Audio Coding, it has a couple of new things.

There's another audio codec for really low bit rates, but AAC is getting really low as well these days. And then there's one for extremely low bit rates, which is called HILN. And then there's a voice codec, actually two of them, one again for extremely low bit rates and one of them for normal bit rates, and at 24 kilobits per second you have just basically transparent voice quality you can't distinguish from real voice.

And in audio, you have again scalability so that you can have... Actually, it's interesting. You can build an AAC layer on a CELP layer if you wish, even. So you use the CELP layer as sort of what's called the prediction, and then you can build... Then you can put an AAC layer on it if you, for instance, do radio. The basic quality goes in CELP, because it's mostly speech, and if you want to have the really good quality, you do it in MPEG-4 AAC.

And something like that is actually done in Digital Radio Mondiale, or DRM, which is a digital broadcasting standard using MPEG-4 AAC. And the conditions were such that you have to be able to receive it in very poor reception conditions, and then you have to get a really good quality signal if you receive a good signal, and that uses this type of scalability.
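A sketch of the layered idea: the receiver always takes the base layer if it fits, and adds the enhancement layer only when capacity allows. The layer names and bit rates below are invented for illustration and are not taken from the standard or from DRM.

```python
# (name, bit rate in kbit/s); values are illustrative assumptions only.
LAYERS = [("celp_base", 8), ("aac_enhancement", 24)]


def select_layers(available_kbps, layers=LAYERS):
    """Pick layers in order until the next one no longer fits.

    Enhancement layers depend on the layers below them, so selection
    stops at the first layer that exceeds the available capacity.
    """
    chosen, used = [], 0
    for name, kbps in layers:
        if used + kbps > available_kbps:
            break
        chosen.append(name)
        used += kbps
    return chosen


poor_reception = select_layers(10)
good_reception = select_layers(40)
no_reception = select_layers(4)
```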

[Transcript missing]

The music itself. And this is really at one or two kilobits per second, you can do really great music. And there's a company that's been working on this for a long time, and they were going to build a QuickTime plugin, and I hope they'll come out with it soon. It was promised for this summer. MIDI is supported, and a couple of types of synthesis. And then there's this text-to-speech interface, which you can use together with face and body animation.

Those were some of the more esoteric object types. Support today in industry is for AAC and for just normal rectangular video coding. And there's some companies that are trying to do more interactive stuff with MPEG-4, but they start with the systems layer. They have graphics, they have arbitrary shape stuff, they have semi-transparent graphics, and they use, notably, the binary scene description.

As I explained, it's inherited from VRML, but it's much more efficient, and it's added real time. It's basically married to MPEG-2. MPEG-2 came from the broadcast world, right? So people know about synchronizing audio and video, about synchronizing different objects, and about buffer models and stuff. And the scene description marries these concepts from VRML and from MPEG-2: great broadcast, great synchronization.

That's what allows the interaction. It works in 2D and 3D, and there's a couple of 3D players out there already. And it allows you to do dynamic scene updates. You can add objects to the scene on the fly, you can delete them, and you can change them all on the fly by using this scene description language.

And to provide an interface with the SMIL world, and to make it better authorable, MPEG later added what's called the Extensible MPEG-4 Textual Format, or XMT, which is basically a textual format for BIFS, and there's actually two versions of them. One of them is very close to BIFS, and one of them is more generic, but I'll spare you those details. But the important part is that there is a SMIL harmonization to the extent possible, because there is a lot of SMIL content out there.

What's very important in MPEG-4 systems is that you do get predictable behavior of audio and video, which hasn't always been the case with all the web, the Internet technologies, and that you get predictable buffer management, so that you know, if I send content, it will play on the player, because the player knows what to expect, it won't get trouble with buffer overflows, it's all predictable and it's all standardized.

There's some more stuff for SMIL integration here in the timing, with which you can basically do a more loose timing of your objects. And what's important here is that while MPEG-4 doesn't standardize digital rights management, it has interfaces to proprietary systems. Digital rights management is not going to go away.

I think it actually could provide some useful features for end users, even though it's been portrayed as something that is hostile to end users. That's wrong. But in order for an ecosystem to support serious content being deployed in the ecosystem, something needs to be done about this rights management. And I think Apple took a great approach with what's being done right now in iTunes. It's very user-friendly, and basically DRM needs to be, you don't see it if you make normal use of your content. And that's what we're getting to see these days.

There's a standard interface in MPEG-4, and there's MPEG-21, which will bring more interoperability in DRM, which means it's no longer, as it is today, the monopoly of one big company, basically. And the file format, I already said it a couple of times, is based on QuickTime: MP4, just like the .3GP file format, which is very close to MP4.

Quickly wrapping this up, there's MPEG-J, or Java, which you can use for really complicated content, and for having programmed content, basically. But also standard APIs to find out what you're talking to, what are the terminal resources and stuff. And there's some advanced audio rendering, where you can basically create the sound without changing the source of the sound. You can describe the environment in which it should be played, basically. You can say, okay, this is in a closet, or this is in a giant hall, or this is a football field. You can describe this with the audio rendering stuff.

And I realize I'm giving you a lot and a lot of details. Gonna build, we're gonna try and talk about applications soon. But I have to make the case for the profiles first. If you have this huge toolbox of all this stuff, then in theory you would have interoperability, but in practice there's not going to be a lot of interoperability because everybody's going to be implementing different things, right? Which is why MPEG defines profiles.

And these profiles are the conformance points, as we call them, which is, okay, this is where you can test the interoperability. A profile basically determines a toolset. I use these tools to encode my video. I use these tools to encode my audio. And then the level within the toolset limits the complexity. And stuff like, okay, bits per second for video, or the screen size for video, or the sample rate for audio, these things are in the levels.
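The profile-and-level idea can be sketched as a two-part check: the profile fixes which tools a stream may use, and the level caps its complexity. Every name and number below (`demo_profile`, `tool_a`, the bit rate and picture size limits) is a placeholder invented for this sketch, not a value from the MPEG-4 specification.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    max_bitrate_kbps: int
    max_width: int
    max_height: int


@dataclass
class Profile:
    name: str
    tools: frozenset
    levels: dict = field(default_factory=dict)


@dataclass
class Stream:
    tools: frozenset
    bitrate_kbps: int
    width: int
    height: int


def conforms(stream, profile, level_id):
    """True if the stream uses only the profile's tools and fits the level."""
    level = profile.levels[level_id]
    return (stream.tools <= profile.tools
            and stream.bitrate_kbps <= level.max_bitrate_kbps
            and stream.width <= level.max_width
            and stream.height <= level.max_height)


demo_profile = Profile(
    "demo",
    frozenset({"tool_a", "tool_b"}),
    {1: Level(64, 176, 144), 2: Level(384, 352, 288)},
)

small = Stream(frozenset({"tool_a"}), 48, 176, 144)
large = Stream(frozenset({"tool_a", "tool_b"}), 300, 352, 288)
exotic = Stream(frozenset({"tool_a", "tool_c"}), 48, 176, 144)
```

A decoder that advertises "demo profile, level 2" is promising to handle anything for which `conforms(stream, demo_profile, 2)` holds, which is what makes the profile and level pair a testable interoperability point.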

And if you take a look at the ISMA, the Internet Streaming Media Alliance, they have said, okay, we're going to use what's called advanced simple profile for the video. We're going to use low-complexity AAC, MPEG-4 AAC for the audio. And we use the MP4 file format. And that's what we're all going to do. And now, within ISMA, I have an interoperable stack, and they add some transport to that, which is something MPEG-4 doesn't define.

And then you have the interoperability. And it's interesting to see that while MPEG-4 has many profiles, and I would say too many, and I'm partly responsible for them as chair of the MPEG requirements group, it's very good to see that industry is converging on just a few. Just a few.

Which means there is this interoperability. And the ones they are choosing are hierarchical. So there's the simple profile and advanced simple profile, which are mainly used in video. Simple is what is now in QuickTime. Advanced simple is what is in some of the more advanced MPEG-4 players and encoders and decoders. But they're compatible in the sense that if you have simple content, simple profile content, it will play in an advanced simple player.
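The hierarchy described here can be written down as a small lookup. This is a sketch under the assumption that only the two profiles named in the talk matter; the dictionary and function names are made up for the example.

```python
# For each player profile, the content profiles it can decode.
# Only the relationship stated in the talk is modeled: an Advanced Simple
# player also plays Simple content.
DECODABLE_BY = {
    "simple": {"simple"},
    "advanced_simple": {"simple", "advanced_simple"},
}


def can_play(player_profile: str, content_profile: str) -> bool:
    """True if a player built for player_profile can decode content_profile."""
    return content_profile in DECODABLE_BY.get(player_profile, set())


print(can_play("advanced_simple", "simple"))
print(can_play("simple", "advanced_simple"))
```

The second call returns False: compatibility runs one way, from the smaller toolset up to the larger one.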

So that's good. And that's why you can see that people are exchanging content. And DivX, for instance, is an implementation of Advanced Simple. We have a couple of profile dimensions. I think this gets too technical to go into real detail. But all the elements in MPEG-4 have been profiled. That's basically the point of this message.

And actually, I don't think there are handouts, right, at this conference, are there? So we'll make this available on the M4IF website. Can we do this? I think so? Yeah. So we will make this presentation available on the M4IF website, and you can download it if you later want to review it. Let's look at recent developments. This is a very interesting development. MPEG-4, as we know it today, was standardized in 1998, 1999. Some stuff was added. That's four years ago.

Very recently, a new codec was added to MPEG-4. It's called Advanced Video Coding. And while MPEG-4, as it was until this was added, was very attractive to mobile and Internet, where there was no MPEG-2 yet, it wasn't attractive enough yet to the broadcast, because in order to replace the MPEG-2 infrastructure, or to add something to the MPEG-2 infrastructure, you really need good advances in coding efficiency. And while MPEG-4 Advanced Visual Profile provides this, it wasn't enough for these major investments in the broadcast industry. It was enough for the internet and for the mobile and stuff.

But this new codec, which is called Advanced Video Coding, which originally comes from the ITU world, and they've been working on this for a long time also, maybe the first I knew of the project was 10 years ago basically. And it's the same codec standardized in ITU and in ISO.

It's actually basically, there's two groups in the world that work on video coding standardization. There's the Video Coding Experts Group in the ITU, the International Telecommunication Union, and then there's ISO, MPEG. They came together, formed the Joint Video Team, hence the JVT codec, and standardized this new codec, which beats everything out there. So forget about what you hear from Microsoft. This is better. And this has been confirmed by independent parties like LSI Logic, whom I have great respect for.

And interestingly, and again I will say more about that, improvements will continue because of the fierce competition in this market. There's really fierce competition, and we've only standardized the decoder, so people will come up with amazing encoders, basically. And this will give you about broadcast quality MPEG video at about 700 kilobits to 1 megabit per second.

Now that's significant, because that starts to get in the range where you can do streaming over a broadband network, over a good ADSL connection or a good cable modem. It starts to get there. It's also good enough for people to think about, okay, I'm investing in a new generation set of boxes now. Maybe I should take a look at this new codec. It's also good enough for this to be implemented in mobile devices at some point in time. They already did support the basic MPEG-4, as I could call it now, and they will start supporting this stuff.
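A quick back-of-the-envelope check shows what those bit rates mean in practice. The 1.5 megabit link and the 80% headroom factor are assumptions made for this example, not figures from the talk.

```python
def megabytes_per_hour(bitrate_bps: int) -> float:
    """Size of one hour of video at a constant bit rate, in megabytes (10**6 bytes)."""
    seconds_per_hour = 3600
    return bitrate_bps * seconds_per_hour / 8 / 1_000_000


def fits(bitrate_bps: int, link_bps: int, headroom: float = 0.8) -> bool:
    """True if the stream fits in the link while using at most `headroom` of it.

    The headroom factor is a rule-of-thumb assumption for this sketch."""
    return bitrate_bps <= link_bps * headroom


for rate in (700_000, 1_000_000):
    print(rate, "bit/s ->", megabytes_per_hour(rate), "MB per hour")

# Assumed link speed: a 1.5 Mbit/s broadband connection.
print(fits(1_000_000, 1_500_000))
```

At 700 kilobits per second an hour of video is 315 MB, and at 1 megabit per second it is 450 MB, which is why this range starts to be workable for broadband streaming.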

And it's amazing that what Apple is doing with the conferencing stuff, using MPEG-4 for conferencing, there's a lot of people lining up to do this, to use advanced video coding, or H.264 for conferencing. There's a lot of industries waiting to start using this codec. And I'm sure, even though Apple never discloses its product plans, not even to me, I'm sure that they're working on this codec and they will have it ready pretty soon.

Then there's advanced audio coding and high-efficiency advanced audio coding. Now with high efficiency, it's a neat little trick where you split the spectrum in half, and then you predict the upper half of the spectrum from the lower half of the spectrum.

And if you do CD quality, or near-CD quality, or really good audio, just basically Internet quality, this gives you a lot, really a lot of bandwidth savings. Like CD quality, or about CD quality at 48 kilobits per second, and high quality at, just general Internet quality at 32 kilobits per second.
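To put those numbers in perspective, here is a small calculation against uncompressed CD audio. The CD parameters (44.1 kHz, 16 bits, stereo) are the standard ones; the two coded rates are the ones quoted in the talk.

```python
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def cd_bitrate_bps() -> int:
    """Bit rate of uncompressed CD audio."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def compression_ratio(coded_bps: int) -> float:
    """How many times smaller the coded stream is than uncompressed CD audio."""
    return cd_bitrate_bps() / coded_bps


print(cd_bitrate_bps())
for coded in (48_000, 32_000):
    print(f"{coded} bit/s: about {compression_ratio(coded):.1f}x smaller")
```

Uncompressed CD audio runs at 1,411,200 bits per second, so 48 kilobits per second is roughly a 29-to-1 reduction and 32 kilobits per second roughly 44-to-1.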

The trick doesn't work for transparent quality. So if you want to have really transparent quality, which is something that iTunes is trying to achieve, then you would still use normal AAC. Then you don't gain anything from this prediction trick. But it's really neat, and it's being used in XM Radio and Digital Radio Mondiale, or DRM, for their broadcasts, because it works so well.

And now AAC, including high-efficiency AAC, has been tested as the best codec by the European Broadcasting Union over all the proprietary codecs. And that's what's being shown here. And this should actually say AAC, Advanced Audio Coding. This is the original, and then you see high-efficiency AAC, which was tested in one specific implementation called AAC+. Whoops.

And then you see MP3 Pro, which actually is MP3 with the same trick applied, the same prediction trick. And you see Windows Media here, and Real here. And Windows Media 9, I have been told by audio coding experts, doesn't really differ a lot from Windows Media 8.

So this was a test at 48 kilobits per second. And, whoops, why does it do this if I don't want it? Hello. 48 kilobits per second, done by the EBU, which is really independent. And this was a really professional test, double blind, which means people don't know what they're listening to.

And basically, because audio testing is a lot like voodoo, and if the experimenter knows what's being tested, he can make you believe anything. And if you know what you're listening to, you can also be made to believe everything, or anything. But if it's double blind, and neither the experimenter nor the listener knows what's going on, then you get really valid results. And that's what happened in this test.

Couple of other developments. Some work going on on truly 3D video coding. That's very advanced. Some work going on on truly lossless audio coding. That's also for the high-end. And there's some work going on on a very interesting animation framework, which actually takes a step back and says, okay, let's do this right, this animation. Let's create an integrated framework for animation of all sorts of graphics content. It's not for video content or for just natural audio, but for computer-generated content. And we'll see where that goes.

So why should you use MPEG-4, apart from the technical details? And this is going in the business stuff. I'll quickly go over this. I think if you're a developer, this may interest you just a little bit less. But let's just take a look at standards and why they make sense. They fuel a lot of innovation. And actually, this slide, I have to acknowledge Tim Schaaf, who originally made this slide, for giving me this slide.

Standards fuel innovation. GSM is a great example, the European, or actually now worldwide standard for mobile telephony. And 802.11, also known under different names at Apple, but you can connect here to your wireless network. It just works great. They have really long life, standards. If you look at the TV standards like PAL in Europe and NTSC in the US, or MP3, which is actually over 10 years old now, but is still a premium feature. If you buy a car stereo, MP3 comes at a price. It's being built into car stereos and stuff and digital devices today, and it will not go away. No matter how great the successors are, this will keep being supported.

That means that as a consumer, you don't have to throw away formats every other year or every year. You can just keep your stuff and it keeps working. VHS has had a long life. The CD has been with us for over 20 years. Standards create huge markets. The CD, the DVD, and MPEG-2, which are really, really multi-tens, hundreds of billions of dollars markets.

And they provide an interoperable ecosystem of tools and content where you can just use stuff from different providers and plug them together and it works. And these different providers can work independent of one master, so to speak. You're not dependent on anyone. You're not locked into any single vendor.

And the vendor may be competing with you, by the way, if it moves into different spaces. And there are, the pricing is controlled by the market and not by a single vendor again. And if you don't like your equipment from one vendor, you can go to the other one.

So, if you use MPEG-4, a couple benefits for you. You can author your content once and then use it on a couple of different, many platforms and players. Encode once. You may have to encode at different bit rates, but like was shown this morning, this can be made really easy.

Your users can pick their favorite stuff. They don't have to stick to one player. Content providers, on the other hand, can pick their favorite stuff. Everybody can just provide tools in their own niche. There's a lot of different niches, and it isn't like one size fits all. And competition drives the quality up.

If you look at MPEG-4, it's both a revolution and an evolution. It's a revolution in what I explained about the design, how it works and how it can expand to synthetic content. It's an evolution in the sense that it doesn't define new transport protocols and stuff, you can just use it on whatever is there already in place.

And specifically, in MPEG-4 as it is today, and MPEG-4 with AVC, Advanced Video Coding, as it's coming now, it can all be used in an MPEG-2 environment, which is a very big plus for broadcasters again. They don't have to replace all their MPEG-2 broadcasting stuff, they just need to plug in a new codec. Which is difficult enough, but if the gains are good enough, then the economics are sound.

So it saves you money, and it makes you money, I believe, by making more efficient use of bandwidth, because it's efficient, by being able to repurpose existing content and now making it interactive, or deploying it on a mobile network. No need to duplicate work if you go to different networks.

You can integrate it into existing MPEG-2 environments, and you can use it on IP networks just as easily. And it makes you money, because you can use your content in your networks in new ways, you can add new dimensions to content, and there's little risk because it's a standard that's widely supported. Proprietary technology, on the other hand, does lock you into third-party business and pricing models, and make you dependent on their roadmaps and their plans and the way they choose to evolve their business. And it can get you into channel conflicts.

So this is just one of the forecasts. This is of chips. Standalone MPEG-4 chips and cores embedded in processors. They think it will explode, and it's already happening. I tend to agree. And there are many similar forecasts, and one interesting trend is, okay, for the coming few years, competition with Windows Media, after that, standard will win, because the benefits are just so obvious that the market will choose for the standard. And there's such a lot of people already making MPEG-4 stuff.

It's amazing. Now, this is an important point, and I want to dwell on this for a while. Because MPEG-4 only standardizes the decoder, there's a lot of room for innovation. And if you see comparisons, and I've seen very bad comparisons, notably by Microsoft, that put QuickTime here, and then Microsoft's latest codec on the other hand, and then they compare the quality without saying that they're only using a simple profile for MPEG-4, and that they did the encoding themselves. I mean, there's such a lot of tricks you can pull if you do quality comparisons.

But, if I look at MPEG-2, and this is really the proof of the pudding, MPEG-2 bit rates have reduced by over 50% over the lifetime of the standard, and this is an underestimation. I will show you the graphs. And this was after the standard was frozen, and without needing to replace the decoders.

A great new encoder comes out, it just gets plugged into the broadcasting system, you don't need to replace the set-top boxes. Just works. People come up with great new tools for encoding DVDs. DVD players don't need to be replaced. Because the decoder is the same, the encoder gets better.

And that's what's happening with MPEG-4 in the market today. And that's what will happen, really happen, with advanced video coding, what's already happening today. AVC will beat all the proprietary codecs already up there, including Windows Media 9. And if I look at... That's interesting. You should disregard the numbers here, because they are wrong.

What's actually right, I thought we hit this stuff. Someone's phone is ringing. This should be 6 megabits per second. When it started in 1994, 1995. Today, you can deliver the same quality in 2 megabits per second. So it's not like suggested 1 megabit, it's 2 megabits per second. But still, from 6 megabits per second to 2 megabits per second, without changing the decoders. That's quite impressive.

And that's from Harmonic. And if I take a look at what Tandberg says, Tandberg TV, a competitor of Harmonic, they basically tell you the same story. With them, the graph should start at 8 megabits per second. And now, whoops, this should have read 8, and here again it should have read 2 megabits per second.

Something went wrong on the conversion from PowerPoint to Keynote. The picture is clear. From 6 megabits or from 8 megabits per second to 2 megabits per second today is huge improvements because there is competition. So there's open standards, it's interoperability, but there's a lot of room for competition.
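The corrected figures can be checked against the "over 50%" claim with one line of arithmetic. The function name is made up for this example; the bit rates are the ones the speaker reads out.

```python
def reduction_percent(start_mbps: float, now_mbps: float) -> float:
    """Percentage by which the bit rate dropped for the same picture quality."""
    return (start_mbps - now_mbps) / start_mbps * 100


# Figures as corrected by the speaker: 6 -> 2 Mbit/s and 8 -> 2 Mbit/s.
print(f"6 -> 2 Mbit/s: {reduction_percent(6, 2):.1f}% lower")
print(f"8 -> 2 Mbit/s: {reduction_percent(8, 2):.1f}% lower")
```

That works out to about 67% and 75%, so "over 50%" is indeed an underestimate, all with unchanged decoders.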

So briefly, let's look at the deployments of MPEG-4. We see PC media player support, and a recent survey on the M4IF tech notes mailing list turned up some 20 different players. And some of them are for facial animation and for 3D content. Most of them do basic streaming.

QuickTime 6 is there, of course. Real has a standard plugin, which means if you hit MP4 content with a Real player, it goes back to the Real server, downloads a plugin if it's not already there, and decodes the content. That's done by Envivio. There are several plugins for Windows Media. There's DivX, which is an MP4-compliant implementation, which has millions of downloads weekly, just like QuickTime.

MPEG-4 is widely supported in 3rd generation and 2.5G mobile phone networks. Like Roberto Castaño said this morning, it really becomes the case that in spite of all these different mobile networks, which are not really interoperable, you can take content from one of them in Japan and move it to Europe, and the content will play. So you can't take your phone to Japan, but you can send your content using the phones, and it will play.

MPEG-4 is used for video. AAC is the optional sound codec, in addition to the mandatory speech codec. And the file format, like we said, .3GP, is very close to MP4. It's just this top-level atom and the AMR codec that's used. QuickTime 6.3, recently released, of course, supports 3GPP. Then the Internet Streaming Media Alliance. I've said a couple of words about that already. They made a specification for interoperable MPEG-4 across the Internet.
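How close the two file formats are can be seen by reading the first atom of a file. The sketch below assumes that the top-level atom meant here is the file-type atom ('ftyp'), which carries a four-character major brand and a list of compatible brands. The function name is hypothetical and the sample bytes are hand-built for the example rather than taken from a real file.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def read_file_type_atom(f: BinaryIO) -> Tuple[str, List[str]]:
    """Return (major_brand, compatible_brands) from a leading 'ftyp' atom.

    Simplification: 64-bit and to-end-of-file atom sizes are not handled."""
    header = f.read(8)
    if len(header) < 8:
        raise ValueError("file too short for an atom header")
    size, atom_type = struct.unpack(">I4s", header)  # big-endian size, 4-char type
    if atom_type != b"ftyp":
        raise ValueError("first atom is not 'ftyp'")
    if size < 16:
        raise ValueError("unsupported or malformed 'ftyp' size")
    payload = f.read(size - 8)
    if len(payload) < size - 8:
        raise ValueError("truncated 'ftyp' atom")
    major = payload[0:4].decode("ascii", "replace")
    # payload[4:8] is the minor version; compatible brands follow, 4 bytes each.
    brands = [
        payload[i:i + 4].decode("ascii", "replace")
        for i in range(8, len(payload) - 3, 4)
    ]
    return major, brands


# Hand-built sample: a 24-byte 'ftyp' atom with a 3GPP major brand.
sample = struct.pack(">I4s4sI", 24, b"ftyp", b"3gp4", 0) + b"3gp4" + b"isom"
print(read_file_type_atom(io.BytesIO(sample)))
```

On a real file you would pass `open(path, "rb")` instead of the in-memory sample. A player that only understands MP4 brands can use this list to decide whether it should try to open the file at all.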

What's maybe more hidden in the background is that MPEG-4 is becoming the de facto standard for security and surveillance. There's a lot of surveillance cameras with hard disk recorders and stuff that just use MPEG-4 almost silently, because we don't, and I don't, get to hear a lot about them. And interestingly, you see a lot of home media centers that do MPEG-4. And people use DivX to rip their content, and then they put it on a DVD, and they put it in the DVD player, and the DVD player understands MPEG-4.

And these are just a couple of recent announcements, and I won't mention them all, but there's chips, there's video cameras, there's solid-state video cameras. These are cool. These are just this size, basically, and you can record on an SD card or a memory stick or something. You can record a half an hour of video and audio that's watchable on a TV. I won't say it's like DVD quality, but it's perfectly watchable, just on a device this size.

There's portable stuff, and it's coming more, like there's video jukeboxes that use MPEG-4. And there's, of course, mobile phones that don't just decode, but some of them also stream it. So it's not just in, or it's not just recording. Some of them can even play it out while it's being recorded. It's pretty cool.

So lastly, I want to say a couple of words about the MPEG-4 Industry Forum. And in that context, even though we're not responsible for it, about licensing of MPEG-4. Because some of you may have-- who's heard about licensing here, by the way? Yes, right. OK, so come back here Thursday morning. I won't be here, unfortunately, because we have our annual M4IF meeting. But someone will be here to explain.

So let me say a couple of words. We're three years old now. We have about 100 members worldwide and across industries, very much according to MPEG-4's vision. A nonprofit organization. We have these and many other members. Apple is, of course, there. And you see a lot of major companies, but there are also smaller ones.

And they come from IT, they come from the consumer electronics industry, they come from the mobile operators, they come from all across the globe and all across the industry. And they all believe in this single standard that works across everything. Our goal is to get MPEG-4 adopted, and we have done a couple of things that are important. We've discussed licensing a lot.

Again, I will say we're not responsible for licensing. I'll clarify that in a later slide. We've done a lot of interoperability work, with a program with over 30 companies exchanging bitstreams between their products. We will have a logo program pretty soon, and if you type in MPEG-4 in Google, you come to the M4IF website and you get a host of information.

And this is our membership. If you're interested, $3,000 for a full membership and $300 for not-for-profits.

[Transcript missing]

But the responsibilities are as follows: MPEG standardizes. So MPEG is the Moving Picture Experts Group, makes the standards. And by ISO rules, they can't really deal with licensing. Although, with the new codec, there's been a lot of effort to get a royalty-free baseline codec, which is the simplest incarnation, it's a profile, and there's been a lot of effort to try to keep this royalty-free for licensing.

Then there's the MPEG-4 Industry Forum, which has done a lot of work to get licensing off the ground, but doesn't see anything of the proceeds, doesn't require anything of its members with respect to licensing, it's just literally a catalyst. If you know how a catalyst works, before and after the chemical reaction, the catalyst is unchanged.

Well, if the reaction goes really bad, the catalyst may go away. It may still happen. But then there's the licensors. That's the people that actually have patents, that sit together in some room and decide and sell licenses. And what M4IF says is, the licenses need to be competitive. It should be possible to build competitive products, given the licensing. So that's what we're working on right now, still working on right now, and actually working on really hard right now to get this right for AVC. Because we know some things need to be improved there.

Monday morning, Larry Horn, I think, of MPEG-LA will be here to answer your questions about licensing. He's going to tell you what's called the truth about MPEG-4 licensing. My personal opinion is that it's great for devices, it's great for phones, it doesn't work yet for content providers. That's what we're working on right now.

And there's a lot riding on this, I can tell you. Hey, I hadn't expected this slide. Aimee.