QuickTime • 1:05:18
MPEG-4 is revolutionizing Internet media by providing a robust and high-quality standard for content creation, delivery, and consumption. MPEG-4 is an all-encompassing specification, with many profiles and technologies that cover the entire spectrum of digital media. This session provides an in-depth look at the MPEG-4 specification and explains how these technologies benefit you.
Speakers: Aimee Nugent, Rob Koenen
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
Well, hello. Welcome. Thanks for coming. This is session 702, MPEG-4 Demystified, as part of the QuickTime track that we're very happy to have this year at WWDC. I'm Aimee Nugent. I work on the QuickTime product marketing team, and I'll be your host for this session. You probably just came from the QuickTime State of the Union presentation, and there you've learned that standards play a very important part in the strategy for Apple as well as QuickTime. And it's my honor to introduce to you Rob Koenen, who briefly talked in the State of the Union, president of the MPEG-4 Industry Forum as well as holder of many other jobs. And he will go through the MPEG-4 specification. It's a very vast and deep specification that is capable of many things. And I will leave you in the very capable hands of Rob. And I hope you have a good session. Rob? Thank you very much, Aimee. So I think we've got to do one of two things first. Either all of you flip open your iBooks, or we put on some light in the room, because it's really very dark. I can't see anyone. So there are your iBooks. There's not enough iBooks to go around. Can we have some light in the room? Is that possible?
Yes, I think you can still see the screen, right? And the screen is much more important than I am. So allow me to take off this. Good morning, everybody. It is my pleasure and honor to be able to talk to you today and to explain the high level concepts of MPEG-4. I think I will be able to demystify some of it, maybe not all of it. Notably, one thing I have to say up front is about the licensing. Some people of you have heard about licensing. You should come back on Thursday morning when there is someone here to explain just that. I will say a few words, though.
And if during this talk you have any questions, I don't mind being interrupted. Just raise your hand. It would be good if you used a mic for a question, because this is being translated simultaneously into Japanese, and the translators can only hear you if you use the mic. If that's difficult, just shout out your question. I'll repeat it for the translators. Okay.
I have this neat little device. I've been told not to press this button, because then everything will go wrong. When they give you neat little devices with buttons that you're not supposed to press-- Thank you for coming. This is what I would like to address today. What is MPEG-4? How does it work? What are the recent and interesting developments?
Why should you use MPEG-4? This is just a bunch of business talk. I'll go over that quickly because you're all developers. I'll tell a little bit about the deployments of MPEG-4 and then just a few words about M4IF, the MPEG-4 Industry Forum, which is an advocacy group for MPEG-4. So let's start with the basics. What is MPEG-4? I am today not going to give you-- I have to take out the computer right now. What's happening to the beamer here?
I got the message. What is MPEG-4? So I'm not gonna give you the gory details of the video codec, or the audio codec, or any of the systems codecs, but a high-level functional overview of what MPEG-4 is and what it does. First, it's what we like to think of as the media standard. And we call it that because it's a standard that works across all devices, all networks, all carriers, everything, basically. And that's why we also call it the interoperable cross-platform ecosystem. While it's interoperable, it's also competitive. And I will tell you why it's competitive. Because just having a standard doesn't take away all the competition there is. On the contrary, actually, it creates a lot of opportunities for competition.
Most people know MPEG-4 as a video codec, and Apple likes to talk about MPEG-4 video and AAC. AAC is actually also MPEG-4, or MPEG-2, Advanced Audio Coding. And there's also a systems layer, a part of which is the file format, which was based on QuickTime, as Frank Casanova explained this morning. It goes way beyond audio and video, though audio and video are the first elements that will see deployment. And it supports stuff that's way beyond that. And I will tell you a bit about that. And it's designed for all multimedia platforms, digital ones.
So where does it come from? Most of you will know-- So who knows MPEG-3 here? Anyone know MPEG-3? Some people still do. MPEG-3 doesn't exist. Because what you know as MPEG-3 or MP3 is actually MPEG-1 audio. But MPEG-1 had three layers of audio. And layer one was really simple. Layer two is what's used in most of the digital audio-- the digital broadcasting systems today, in Europe at least. In America, it's Dolby. And layer three is the most complicated of MPEG-1 audio layers that was way beyond what could be implemented when it was designed, but is now just the norm.
And then you may know MPEG-2, which is a standard for digital television and for DVD, video and audio, also in Europe. Again, in America, it's more Dolby audio. Then there are MPEG-7 and MPEG-21, which are not successors to MPEG-4 or MPEG-2. MPEG-7 is sort of a metadata standard that allows you to describe content. And MPEG-21 is a very fuzzy phrase. It's a framework for interoperable use and exchange of digital media. It does all sorts of stuff that has to do with: where do you find content? What's a unique identification for content? Where do you find the rights to content? And it attempts to standardize elements of digital rights management, which is more of a challenge, I can tell you, than standardizing just a video codec, which is hard enough as it is.
So let's talk a little bit about the MPEG-4 vision. And this is a vision that's been with us for a long time, actually since the middle of the 1990s. And it's coming on close; I've been working on MPEG-4 for 10 years almost. That will be next year. The vision: if you remember, back in the early 90s or halfway through the 90s, there was all this talk about convergence, and everything was going to be the same. And we're all going to have glass or fiber into our living rooms. And the big discussion was, okay, are we going to consume our content on the PC or are we going to consume the content on the television? And back then we said that's all a load of bullshit. There's not going to be -- I'm not sure that can be translated, by the way.
That wasn't quite right. Rather than convergence, we saw proliferation of multimedia. Rather than fewer networks, we got more. We got all these sorts of different mobile networks. We got the normal television network. We got the digital telephone network, ISDN, which has had quite a bit of adoption in Japan and Europe, not as much here. We got ADSL. We got cable. We got stuff coming to us through satellites. And it's going to be a chaos. And rather than just this convergent terminal, it's going to be either the PC or the television, another load of bull. We're going to have a lot of different terminals-- handheld devices, phones, more PCs, different PCs, set-top boxes. And they're all going to want to do digital multimedia.
And back then, the way to do standardization was: new network, new standard, new stack. Communication protocols, codecs, everything anew, especially in the communications world. And we said that doesn't make any sense. We have to have one layer of content representation that works across all of these different applications, that is agnostic to the network and to the terminal, and supports all these different paradigms for using content: broadcast, communication, and retrieval, where retrieval could be online or in packaged media like DVDs. So basically, a single technology for all these devices. And of course, that doesn't mean that on your high definition television you're using the same bit rate, but you could very well use the same systems layer, by the way. You could use the same MP4 files. And they could point to different media files with different encoded bit rates. And like Frank showed this morning, you could easily take one file and transcode it so that it works on a mobile device.
Enabling what I like to think of as this write once, play everywhere paradigm, where you can use your content across your devices, on your PCs, on your CE devices, and on your phones. You can even take it with you. And on the other hand, you can shoot your films while you're on the road and just upload them to your PC. And they just play in QuickTime.
So that's where we see the applications of MPEG-4 today. And I already talked a little bit about it in the State of the Union. We see it in mobile devices. We see it in broadcast, not as much yet. This will explode, excuse me, if we get the new advanced video codec, about which I will say a few words a bit further down in this talk. We do see streaming services. Interestingly, we are getting a little bit of interactivity in these. MPEG-4 allows for great interactivity, which I will definitely talk about. And the BBC is doing a trial now with that sort of stuff. And for packaged media, which is also waiting for the new codec at this moment. So let's now go to the heart of this presentation, which is how does it work.
MPEG-4 is an object-based multimedia content representation standard. And some people that know a little bit about image coding or have heard about MPEG-4 will know that MPEG-4 supports arbitrary shape objects in video and that you can do segmentation and stuff. That's all true, but you don't need to do it. An object might just as well be a rectangular frame of video and the audio. And then maybe the text that scrolls across the video is another object, which is already a huge difference from MPEG-2, where everything is just pixels.
It's got a revolutionary systems layer. It's got state-of-the-art codecs, which are responsibly upgraded, which means no new codec every half year. It's something that you can perfectly well do in the Internet world, but the CE world doesn't work that way, and people don't want to buy a new DVD player every half year.
And it's got something called profiles and levels to restrict complexity and guarantee interoperability. Those are the, what we call, interoperability points: profiles and levels. And while MPEG defines a whole bunch of those, and you could actually say there's too many of them, it doesn't really matter, because industry consortia, such as the Internet Streaming Media Alliance, which is a consortium that Apple helped found, pick their profiles and levels. They say, OK, we're going to use this profile for video, this profile for audio. We're going to use this file format. Well, there's only one of them, so that's simple. And that's how we are going to do interoperable streaming media, all of us. Philips, Sun, Cisco, Kasenna, Apple, you name them.
So let's take a look at this picture, and I'll jump off stage for a second. And you can read the text, and that's okay because you don't need to. But what you see here is all the different content types that MPEG-4 supports. And what you see: there's audio, there's video, there's graphics, there's even 3D graphics. There's text, there's animation. There's something we call BIFS, and I'll tell you what BIFS is. And then this basically represents a multiplex, and the MP4 file is the basic container that can carry everything. Then you can distribute that stuff, these containers and these streams, using whatever you would like to use, because MPEG-4 is basically agnostic to all these things. So there's broadcast, there's broadband delivery, satellite, there's wireless, there's phone lines, whatever. And then you can put it on a number of different devices.
And I talked about the devices before, so let's skip that bit. But what's now interesting is, let's look at what MPEG-2 does. And I'm actually going to go back to the screen. In MPEG-2, you would do authoring. And you would take all of these objects, you would do your authoring, and Apple has a couple of great products for doing authoring. But then you're going to do encoding. And what you do with encoding is you basically say, okay, now I'm going to convert all these things into pixels: one plane of pixels. Everything is collapsed into a single plane, and that rectangular frame of pixels gets encoded. I explain everything using video concepts. Actually, in audio I could do something similar, but it's just a little bit easier for me doing it in visual concepts. So you take all the objects, you collapse them into a single plane of pixels, you encode this using MPEG-2, and then you just display it here. There's nothing you can do anymore.
Now with MPEG-4, you can, if you wish, you don't have to, but you can keep all of these objects separately. You can have multiple video objects, or you could have one. You could have a graphic that's encoded independently. You could have your streaming text. You could have your voice and your music encoded separately. You can keep them separate, as what are called elementary streams. You could send these to the decoder, and then you do the composition here.
So instead of doing composition before encoding, here, we are now doing composition after decoding of the objects, which is here. And that's the major-- actually, if there is one major paradigm shift in MPEG-4, that's it. Now, in order to be able to do this, you need some sort of a language that tells you: okay, this is where the objects go on the screen; this is when they appear. That's what we call BIFS, the Binary Format for Scenes.
It's an efficient binary language that allows you to describe where the objects are, where they go, when they appear. Now, if you have this BIFS language, you can not just describe the scene statically. You can also start describing the scene dynamically. You can attach behavior to the objects. You could say, OK, this logo is spinning. It's changing its color. It's moving from the top left of the screen to the bottom right of the screen. Now, if I were to do this in MPEG-2 or any traditional codec, I would have to encode all these pixels again and again and again and again and again until the logo was here, which is quite a waste of bits.
Well, in MPEG-4, I'll just give one command saying, OK, move the logo from there to there, and take a second to do it, and that's it. This is a very small binary command sent to the decoder. Decoder takes care of everything. Now this applies to visual objects, it applies to audio objects as well. You can describe 3D audio scenes in this and have sources move around in a scene if you wish. That's quite a bit more advanced. But that's the basic concept of MPEG-4.
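As a rough illustration of why a scene-update command is so much cheaper than re-encoding pixels, here is a minimal sketch in Python. The field layout (object ID, start and end coordinates, duration) is invented for illustration and is not the actual BIFS syntax.

```python
import struct

# A conceptual sketch (not real BIFS): pack a "move this object from
# here to there over this many milliseconds" update into a compact
# binary command that the decoder can interpret.
def encode_move_command(object_id, x_from, y_from, x_to, y_to, duration_ms):
    """Pack a 'move object' scene update as big-endian binary fields."""
    return struct.pack(">H4hI", object_id, x_from, y_from, x_to, y_to, duration_ms)

cmd = encode_move_command(object_id=7, x_from=0, y_from=0,
                          x_to=640, y_to=480, duration_ms=1000)
print(len(cmd))  # 14
```

Fourteen bytes describe a full one-second animation that, sent as re-encoded pixels, would cost orders of magnitude more bits.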
So let's look at this in a typical MPEG-4 scene. It is fully free of any copyright. So I won't get in any trouble, which means it's a bit dull. I made it myself. It's an aquarium with some seaweed. There is an arbitrary shape video object. And I've been using this for a while. She's four now. This was when she was one day old. There's some bubbles, some fish. And there's another type of fish, which is a special sort of fish, which I'll explain a little bit. And all these are different objects. So this is an arbitrary shaped video object, or natural video object. These are graphic things, the fish. And then there's the bubbles, there's some background, it has music, there's maybe a voiceover. And then there's this, it looks like a wire frame, it actually is a wire frame with a picture projected onto it. And the neat thing about this is if you move the-- The vertices in the wire frame, you can make the fish swim. And actually, in real life, you wouldn't see all these wires. These would be hidden, but that's just to show you how it works. These are a couple of the objects that MPEG-4 supports. Now, this is what the scene tree looks like.
All these objects are represented by branches in this tree, and they have sub-objects. And whoops, some of these-- I'm trying to go back. Didn't really want to do-- yeah, back works. So all of these objects can have audio and video associated with them. Some of them are static, just graphics. Some of them are streams. Some of them are audio. Some of them are video.
And this is actually literally what's represented in the decoder. And now you can go in with your BIFS language and just do stuff with the branches. You can take a branch out and an object disappears. You can change the place of the whole branch. You can change the color of an object just by issuing these little BIFS commands. So recapping.
We have an audiovisual scene with objects. It could be a very complicated scene, it could be a very simple scene with one audio object and one video object, and it just provides interoperable streaming, which is quite a feat in itself. These objects can be of different nature. They can be natural, which is they are recorded with a camera or a microphone. It can be synthetic, which is they're generated with a computer program. And there is a compositor, which is this new element that puts the objects in the scene. And then there is an efficient real-time binary scene description language, which is called BIFS.
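The tree manipulation described above, taking branches out or changing an object's fields with small commands, can be mimicked with a toy Python scene tree. The class and method names are illustrative; real BIFS updates are compact binary commands, not method calls.

```python
# A toy scene tree manipulated by BIFS-style update commands.
# Class and command names are illustrative, not taken from the standard.

class SceneNode:
    def __init__(self, name, **fields):
        self.name = name
        self.fields = fields      # e.g. color, position
        self.children = []

    def insert(self, child):
        """Node insertion: a new object appears in the scene."""
        self.children.append(child)

    def delete(self, name):
        """Node deletion: take a branch out and the object disappears."""
        self.children = [c for c in self.children if c.name != name]

    def replace_field(self, name, field, value):
        """Field replacement: e.g. change an object's color."""
        for c in self.children:
            if c.name == name:
                c.fields[field] = value

scene = SceneNode("root")
scene.insert(SceneNode("fish", color="orange"))
scene.insert(SceneNode("bubbles"))
scene.replace_field("fish", "color", "blue")  # tiny command, big visual change
scene.delete("bubbles")                       # the bubbles object vanishes
print([c.name for c in scene.children])       # ['fish']
```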
And BIFS, I'll say a couple more words about BIFS. It inherits a lot from VRML, the Virtual Reality Modeling Language. But as you may know, that one was neither real time nor binary, and therefore not very efficient for stuff like streaming over the internet or to mobile phones. It was perfectly OK for doing computer stuff.
And the coding scheme of all these different types of objects is optimal for the object type. So you don't try to encode speech with a music encoder, which is not really optimal. You don't have to encode a graphic with a video encoder, which is optimized for moving video rather than just still graphics. You can use the optimized coding scheme for each of these objects.
And this is completely independent of bitrate. And I still say this because many people still think that MPEG-4 is just about low bitrates. It's also about low bitrates, but not only. Way back when, in 1993, MPEG-4 started as a low bitrate project, but that changed really quickly in 1994. But some people still think it's about low bitrate. In MPEG-4, there's a studio profile that actually goes up to over a gigabit per second in video coding.
So let's look at the different objects that are supported in MPEG-4. The ones you know are video and audio. And these are the most widely deployed: video coding and advanced audio coding, MPEG-4 Advanced Audio Coding. In addition to the video coding on the visual side, we have animated faces and bodies. And there are some companies that have products out there for animated faces. And I think the BBC has been looking at doing this because they have a legal requirement to do talking heads for people that can't hear, deaf people. And they're supposed to be able to read lips. And you can do this with animated faces. There are two-dimensional and three-dimensional animated meshes. Those are little wire frames. You can project either still or even moving video onto these wire frames. And then you can deform the wire frames. You get really intricate effects. And there's text, streaming text and still text, and graphics.
And JPEG is also supported as a part of the MPEG-4 framework. You can just use JPEG graphics. And then on the audio side, we have generic audio from mono to 5.1 channels. And by combining different audio objects, you can actually go up almost indefinitely. You don't need to stop at 5.1. There are specialized speech codecs, synthetic sounds. This is very advanced: structured audio is basically a language to program a synthesizer, first to describe instruments and second to play the instruments with a score language. There's text to speech, which is merely an interface, but you can mark up text so that it can be regenerated as speech. And then there's something called environmental spatialization, which is making stuff sound like it's in a specific place. You can describe the place.
So let's look at the parts, how this all fits together. First, there's the visual coding, and then there's the audio coding. And this is just decoding. I'll say a few words about this a bit further down in my talk, but it's important. MPEG-4 only standardizes decoding. It doesn't standardize encoding, and that's why there's so much competition between providers. It's the same with MPEG-2. And as you will see a bit further down in my talk, this provides for a lot of improvement in quality of these codecs.
And this is also why you have to be very cautious with statements from proprietary vendors about "the" quality of MPEG-4. It doesn't exist, basically. But you can get the best quality with MPEG-4. And there are fair comparisons to be made. But I'll say a few words more about that a little later. Then there's a systems layer in MPEG-4, which does stuff before decoding, in terms of demultiplexing and buffering, and after decoding, in terms of presentation, which is the composition of the objects. And the systems part used to contain the file format, which is the MP4 file format, which is extremely close to the .3GP file format, which you saw Frank talk about in his talk this morning. The only difference is basically that there is a top-level atom that says this is a 3GP file, which means, okay, I now have AMR voice coding support, which is not something that is natively known to MPEG-4. But for the rest, it's just the same stuff.
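The top-level atom mentioned here is the `ftyp` box of the ISO base media file format, whose major brand tells a reader whether it is looking at an MP4 or a 3GP file. A minimal Python sketch of reading it, assuming for simplicity that `ftyp` is the first box in the file:

```python
import struct

def read_major_brand(data: bytes) -> str:
    """Return the major brand from a file's leading 'ftyp' box.

    ISO base media layout: 32-bit big-endian box size, 4-byte box type,
    then (for 'ftyp') a 4-byte major brand, a 4-byte minor version, and
    a list of 4-byte compatible brands.
    """
    box_type = data[4:8]
    if box_type != b"ftyp":
        raise ValueError("expected an ftyp box at the start of the file")
    return data[8:12].decode("ascii")

# A hand-built 20-byte ftyp box with major brand '3gp4' (illustrative bytes):
ftyp = struct.pack(">I4s4sI4s", 20, b"ftyp", b"3gp4", 0, b"isom")
print(read_major_brand(ftyp))  # 3gp4
```

A plain MP4 file would carry a brand like `mp41` or `isom` in the same slot; everything after the brand is, as the talk says, just the same stuff.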
And then there's something called DMIF, which isn't always used. You don't have to use it, but it would provide you with an abstract interface to the transport. DMIF has a little bit of a grandiose name: it stands for Delivery Multimedia Integration Framework. It's actually a quite compact part of the standard. If you use DMIF, you can write your application to a transport layer, and then you only need to write separate interfaces to a disk or to a network, or even to a broadcast, and your application can remain fully unaware of what it's talking to. And then there's the transport layer, which in principle is not in the standard.
And this is how content flows through. It comes through a transport. It goes through DMIF if it's present. Systems take care of demultiplexing of all the different objects. It's decoded. And then the decoded objects are composited onto the screen or into the sound space. And composition of audio could very well be: okay, I turn up the volume of the background a little bit, and I turn down the volume of the foreground speaker a little bit, or I choose the Japanese speaker rather than the English speaker. These are all possibilities by using MPEG-4 composition.
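The receiver-side audio composition just described, choosing gains and a language track after decoding rather than baking one mix into the stream, can be illustrated with a toy compositor. All names and the mixing model are made up for illustration:

```python
# Toy object-based audio compositor: each audio object stays a separate
# stream, and the receiver decides gains and which speech track to keep.
# Field names and the mixing model are illustrative, not from the standard.

def compose(objects, gains, language="en"):
    """Return (name, gain) pairs to mix, keeping one speech track by language."""
    mix = []
    for obj in objects:
        if obj["kind"] == "speech" and obj["lang"] != language:
            continue                         # drop the other-language speaker
        gain = gains.get(obj["name"], 1.0)   # default: unity gain
        mix.append((obj["name"], gain))
    return mix

objects = [
    {"name": "music",     "kind": "music",  "lang": None},
    {"name": "speech_en", "kind": "speech", "lang": "en"},
    {"name": "speech_ja", "kind": "speech", "lang": "ja"},
]
# Turn the background music down and pick the Japanese speaker:
print(compose(objects, {"music": 0.5}, language="ja"))
# [('music', 0.5), ('speech_ja', 1.0)]
```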
And there are two sort of orthogonal parts. Conformance, which contains a lot of bit streams. If you have a decoder, you can use the conformance part and see if your decoder is conformant. That gives you some level of indication of interoperability. In the MPEG-4 Industry Forum, we do much more interoperability work with exchange of bit streams. And then there's reference software, which is actually free of copyright if you use it for building a compliant implementation.
There is something, even though in principle this is not in the standard, called MPEG-4 on IP, which is a specification on basically how to use IETF protocols and how to do the mappings. And more recently, what's called advanced video coding was added to the MPEG-4 standard. And I'll say a bit more about that in a bit as well. And you will see that the numbers don't quite add up. There's more stuff that I don't think is important to talk about right now. So let's take a look at some of the recent developments.
Hey, this slide was supposed to have been hidden. I want to first say a little bit more about the objects. I'll keep this a little bit brief. So we have video, which basically goes from 10 kilobits per second to over a gigabit per second. So if you take one set of zeros out, it's megabits. And if you take another set of zeros out here, it's gigabits per second. And Sony actually has cameras that support this stuff. It's called the studio profile. Multiple rectangular or arbitrary shape objects in the scene. Scalability is supported, including fine grain scalability,
which has some support but not a lot yet. But it means if I have my full bit stream, I can drop layers of that full bit stream and you can still sensibly decode the picture, the audio, in this case the video. Sprites: you can use sprites for backgrounds. You can send them once and then you can warp the background to make the scene change. But you don't need to keep sending them as moving video. And then we have some types of computer generated visual information: synchronized graphics and animated text, face and body animation, which I talked about, and the meshes with the texture, still or moving texture. Now for audio.
And there's a lot of stuff here, and I should say some of this will be used, and some of this will likely not be used. And that's quite okay, because we have these profiles, and people will pick what they need. Again, with audio, we can have a number of objects in the scene. So you can make your audio composition. I think the most important codec in MPEG-4 is MPEG-4 Advanced Audio Coding, which is very much like MPEG-2 Advanced Audio Coding. It has a couple of new things.
There's another audio codec for really low bit rates, but AAC is getting really low as well these days. And then there's one for extremely low bit rates, which is called HILN. And then there's a voice codec, actually two of them, one again for extremely low bit rates and one of them for normal bit rates. And at 24 kilobits per second, you have just basically transparent voice quality. You can't distinguish it from real voice.
And in audio, you have again scalability, so that you can have-- actually, it's interesting. You can build an AAC layer on a CELP layer if you wish, even. So you use the CELP layer as sort of what's called the prediction, and then you can put an AAC layer on it if you, for instance, do radio. The basic quality goes in CELP, because it's mostly speech. And if you want to have the really good quality, you do it in MPEG-4 AAC. And something like that is actually done in Digital Radio Mondiale, or DRM, which is a digital broadcasting standard. And the conditions were such that you have to be able to receive it in very poor reception conditions, and then you get a really good quality signal if you receive just a good signal, and that uses this type of scalability.
Synthetic audio objects, I also talked a little bit about this before. We have this orchestra language-- whoops-- orchestra language and score language. So with this language, you describe the orchestra. And with this, you describe the music itself. And really, at one or two kilobits per second, you can do really great music. And there's a company that's been working on this for a long time, and they were going to build a QuickTime plugin, and I hope they'll come out with it soon. It was promised for this summer. MIDI is supported, and a couple of types of synthesis. And then there's this text-to-speech interface, which you can use together with face and body animation.
Those were some of the more esoteric object types. Support today in industry is for AAC and for just normal rectangular video coding. And there's some companies that are trying to do more interactive stuff with MPEG-4, but they start with the systems layer. They have graphics. They have arbitrary shape stuff. They have semi-transparent graphics.
notably the binary scene description. As I explained, it's inherited from VRML, but it's much more efficient, and it adds real time. It basically married VRML with MPEG-2. MPEG-2 came from the broadcast world, right? So people know about synchronizing audio and video, about synchronizing different objects, and about buffer models and stuff. And the scene description marries these concepts from VRML and from MPEG-2: great broadcast, great synchronization.
That's what allows the interaction. It works in 2D and in three dimensions. And there are a couple of three-dimensional players out there already. And it allows you to do dynamic scene updates. You can add objects to the scene on the fly. You can delete them. And you can change them, all on the fly, by using this scene description language. And to provide an interface with the SMIL world and to make it more authorable, MPEG later added what's called the Extensible MPEG-4 Textual format, or XMT, which is basically a textual format for BIFS. And there are actually two versions of it. One of them is very close to the BIFS, and one of them is more generic, but I'll spare you those details. But the important part is that there is SMIL harmonization to the extent possible, because there is a lot of SMIL content out there.
What's very important in MPEG-4 systems is that you do get predictable behavior of audio and video, which hasn't always been the case with all the web and internet technologies, and that you get predictable buffer management, so that you know if I send content, it will play on the player, because the player knows what to expect. It won't get into trouble with buffer overflows. It's all predictable, and it's all standardized. - I'm sorry, I didn't realize.
There's some more stuff for SMIL integration here in the timing, where you can basically do a more loose timing of your objects. And what's important here is that while MPEG-4 doesn't standardize digital rights management, it has interfaces to proprietary systems. Digital rights management is not going to go away.
I think it actually could provide some useful features for end users, even though it's been portrayed as something that is hostile to end users. That's wrong. But in order for serious content to be deployed in the ecosystem, something needs to be done about rights management. And I think Apple took a great approach with what's being done right now in iTunes. It's very user friendly. And basically, DRM needs to be such that you don't see it if you make normal use of your content. And that's what we're getting to see these days. There's a standard interface in MPEG-4.
And there's MPEG-21, which will bring more interoperability in DRM, which means it's no longer, as it is today, the monopoly of one big company, basically. And the file format, I already said it a couple of times, is based on QuickTime: MP4, just like the .3GP file format, which is very close to MP4.
Quickly wrapping this up, there's MPEG-J, or Java, which you can use for really complicated content, for having programmed content, basically, but also standard APIs to find out what you're talking to, what the terminal resources are, and stuff. And there's some advanced audio rendering where you can basically create the sound without changing the source of the sound. You can describe the environment in which it should be played. Basically, you could say, OK, this is in a closet, or this is in a giant hall, or this is a football field. I can describe this with the audio rendering stuff.
And I realize I'm giving you a lot and a lot of details. We're going to try and talk about applications soon. But I have to make the case for the profiles first. If you have this huge toolbox of all this stuff, then in theory you would have interoperability. But in practice, there's not going to be a lot of interoperability because everybody's going to be implementing different things, right? Which is why MPEG defines profiles.
And these profiles are the conformance points, as we call them, which is, okay, this is where you can test the interoperability. A profile basically determines a tool set. I use these tools to encode my video. I use these tools to encode my audio. And then the level within the profile limits the complexity: stuff like, okay, bits per second or screen size for video, or the sample rate for audio. These things are in the levels.
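To make the profile-and-level idea concrete, here is a minimal sketch in Python. The tool names roughly follow the real simple and advanced simple profiles, but the numeric level limits here are invented purely for illustration; the normative tables live in the MPEG-4 specification itself.

```python
# Illustrative only: a profile names a tool set, a level caps complexity.
# The numeric limits below are invented for this sketch, not the normative
# values from the MPEG-4 specification.

SIMPLE_PROFILE_TOOLS = {"i_vop", "p_vop", "ac_dc_prediction"}
ADVANCED_SIMPLE_TOOLS = SIMPLE_PROFILE_TOOLS | {"b_vop", "quarter_pel", "gmc"}

LEVELS = {
    # level: (max bits per second, max width, max height) -- made-up numbers
    "L1": (128_000, 176, 144),
    "L3": (384_000, 352, 288),
}

def conforms(stream_tools, bitrate, width, height, profile_tools, level):
    """True if the stream uses only the profile's tools and stays
    within the level's complexity limits."""
    max_bps, max_w, max_h = LEVELS[level]
    return (stream_tools <= profile_tools
            and bitrate <= max_bps
            and width <= max_w and height <= max_h)

# A simple-profile stream also plays on an advanced-simple decoder
# (the profiles are hierarchical)...
assert conforms({"i_vop", "p_vop"}, 100_000, 176, 144, ADVANCED_SIMPLE_TOOLS, "L1")
# ...but a stream using B-VOPs does not conform to simple profile.
assert not conforms({"i_vop", "b_vop"}, 100_000, 176, 144, SIMPLE_PROFILE_TOOLS, "L1")
```

This is exactly why the hierarchy mentioned below works: any stream that conforms to the smaller tool set automatically conforms to the larger one.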
And if you take a look at the ISMA, the Internet Streaming Media Alliance, they have said, OK, we're going to use what's called advanced simple profile for the video. We're going to use low complexity AAC, MPEG-4 AAC, for the audio. And we use the MP4 file format. And that's what we're all going to do. And now, within ISMA, I have an interoperable stack, and they add some transport to that, which is something MPEG-4 doesn't define. And then you have the interoperability. And it's interesting to see that while MPEG-4 has many profiles, and I would say too many, and I'm partly responsible for them as chair of the MPEG requirements group, it's very good to see that industry is converging on just a few, just a few.
which means there is this interoperability. And the ones they are choosing are hierarchical. So there's the simple profile and advanced simple profile, which are the ones mainly used for video. Simple is what is now in QuickTime. Advanced simple is what is in some of the more advanced MPEG-4 players and encoders and decoders. But they're compatible in the sense that if you have simple profile content, it will play in an advanced simple player.
So that's good. And that's why you can see that people are exchanging content. And DivX, for instance, is an implementation of advanced simple. We have a couple of profile dimensions. I think this gets too technical to go into real detail. But all the elements in MPEG-4 have been profiled. That's basically the point of this message.
And actually, I don't think there are handouts at this conference, are there? So we'll make this available on the M4IF website. Can we do this? I think so. Yeah. So we'll make this presentation available on the M4IF website, and you can download it if you later want to review it. Thank you. Let's look at recent developments. This is a very interesting development. MPEG-4, as we know it today, was standardized in 1998, 1999. Some stuff was added later. That's four years ago.
Very recently, a new codec was added to MPEG-4. It's called Advanced Video Coding. And while MPEG-4, as it was until this was added, was very attractive for mobile and internet, where there was no MPEG-2 yet, it wasn't attractive enough to broadcast, because in order to replace the MPEG-2 infrastructure, or to add something to the MPEG-2 infrastructure, you really need good advances in coding efficiency. And while MPEG-4 Advanced Simple Profile provides this, it wasn't enough for these major investments in the broadcast industry. It was enough for the internet and for mobile and such. But this new codec, Advanced Video Coding, originally comes from the ITU world, and they've been working on this for a long time too. Maybe the first I knew of the project was 10 years ago, basically. And it's the same codec standardized in ITU and in ISO. Basically, there are two groups in the world that work on video coding standardization. There's the Video Coding Experts Group in the ITU, the International Telecommunication Union, and then there's ISO, MPEG. They came together, formed the Joint Video Team, hence the JVT codec, and standardized this new codec, which beats everything out there. So forget about what you hear from Microsoft. This is better.
And this has been confirmed by independent parties like LSI Logic, whom I have great respect for. And interestingly, and again, I will say more about that, improvements will continue because of the fierce competition in this market. There's really fierce competition. And we have only standardized the decoder. So people will come up with amazing encoders, basically. And this will give you broadcast quality video at about 700 kilobits to 1 megabit per second.
Now that's significant, because that starts to get in the range where you can do streaming over a broadband network, over a good ADSL connection or a good cable modem. It starts to get there. It's also good enough for people to think about, OK, I'm investing in a new generation of set-top boxes now. Maybe I should take a look at this new codec. It's also good enough for this to be implemented in mobile devices at some point in time. They already support the basic MPEG-4, as I could call it now. And they will start supporting this stuff.
And it's amazing, with what Apple is doing with the conferencing stuff, using MPEG-4 for conferencing, there's a lot of people lining up to do this, to use Advanced Video Coding, or H.264, for conferencing. There's a lot of industries waiting to start using this codec. And I'm sure, even though Apple never discloses its product plans, not even to me, I'm sure that they're working on this codec and they will have it ready pretty soon. Then there's Advanced Audio Coding and High Efficiency Advanced Audio Coding.
Now with high efficiency, it's a neat little trick where you split the spectrum in half, and then you predict the upper half of the spectrum from the lower half of the spectrum. And if you do CD quality or near CD quality, or just really good internet quality audio, this gives you a lot, really a lot, of bandwidth savings: about CD quality at 48 kilobits per second, and good general internet quality at 32 kilobits per second.
The trick doesn't work for transparent quality. So if you want to have really transparent quality, which is something that iTunes is trying to achieve, then you would still use normal AAC. Then you don't gain anything from this prediction trick. But it's really neat. And it's being used in XM Radio and Digital Radio Mondiale, or DRM, for their broadcasts, because it works so well.
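The idea behind that prediction trick (Spectral Band Replication, the technique behind High Efficiency AAC) can be sketched in toy form: transmit only the lower half of the spectrum plus a coarse gain for the upper half, and let the decoder reconstruct the upper half from the lower. This is a conceptual illustration only; the real SBR algorithm in the standard is far more sophisticated.

```python
# Toy illustration of spectral band replication: keep the lower half of the
# spectrum plus one envelope gain, and "predict" the upper half by copying
# the lower half up and rescaling it. Not the actual SBR algorithm.

def encode(spectrum):
    half = len(spectrum) // 2
    lower, upper = spectrum[:half], spectrum[half:]
    # One envelope gain is the only side info we keep for the top half.
    gain = (sum(upper) / sum(lower)) if sum(lower) else 0.0
    return lower, gain          # roughly half the data, plus one number

def decode(lower, gain):
    replicated = [x * gain for x in lower]   # copy lower half up, rescale
    return lower + replicated

original = [8.0, 6.0, 4.0, 2.0, 1.0, 0.8, 0.5, 0.2]  # magnitudes, low to high
lower, gain = encode(original)
reconstructed = decode(lower, gain)
assert len(reconstructed) == len(original)
# The overall energy of the reconstructed upper band matches the original:
assert abs(sum(reconstructed[4:]) - sum(original[4:])) < 1e-9
```

This also shows why the trick stops paying off for transparent quality: the reconstructed upper band only matches the original coarsely, which is fine at internet bit rates but not when you want a perfect reproduction.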
And now AAC, including high-efficiency AAC, has been tested as the best codec by the European Broadcasting Union, over all the proprietary codecs, and that's what's being shown here. This should actually say AAC, Advanced Audio Coding. This is the original, and then you see high efficiency AAC, which was tested in one specific implementation called AAC+. And then you see MP3 Pro, which actually is MP3 with the same trick applied, the same prediction trick. And you see Windows Media here and Real here. And Windows Media 9, as audio coding experts have explained to me, doesn't really differ a lot from Windows Media 8.
So this was a test at 48 kilobits per second. And, whoops, why does it do this if I don't want it? Hello? 48 kilobits per second, done by the EBU, which is really independent. And this was a really professional test, double blind, which means people don't know what they're listening to. Because, basically, audio testing is a lot like voodoo. If the experimenter knows what's being tested, he can make you believe anything. And if you know what you're listening to, you can also be made to believe anything. But if it's double blind, and neither the experimenter nor the listener knows what's going on, then you get really valid results. That's what happened in this test.
Couple of other developments. Some work going on on truly 3D video coding. That's very advanced. Some work going on on truly lossless audio coding. That's also for the high end. And there's some work going on on a very interesting animation framework, which actually takes a step back and says, OK, let's do this right, this animation. Let's create an integrated framework for animation of all sorts of graphics content. It's not for video content or for just natural audio, but for computer generated content. And we'll see where that goes.
So why should you use MPEG-4, apart from the technical details? And this is going into the business stuff. I will quickly go over this. I think if you're a developer, this may interest you just a little bit less. But let's just take a look at standards and why they make sense. They fuel a lot of innovation. And actually, for this slide I have to acknowledge Tim Schaaff, who originally made it and gave it to me.
Standards fuel innovation. GSM is a great example, the European, or actually now worldwide, standard for mobile telephony. And 802.11, also known under a different name at Apple. You can connect here to your wireless network. It just works great. Standards have a really long life. Look at the TV standards, like PAL in Europe and NTSC in the US. Or MP3, which is actually over 10 years old now, but it's still a premium feature. If you buy a car stereo, MP3 comes at a price. It's being built into car stereos and digital devices today. And it will not go away, no matter how great the successors are. This will have to keep being supported. That means that as a consumer, you don't have to throw away formats every year or every other year. You just keep your stuff and it keeps working. VHS has had a long life. The CD has been with us for over 20 years.
Standards create huge markets. The CD, the DVD, and MPEG-2, which are really markets of many tens, even hundreds, of billions of dollars. And they provide an interoperable ecosystem of tools and content, where you can just use stuff from different providers and plug them together and it works. And these different providers can work independently of one master, so to speak. You're not dependent on anyone. You're not locked into any single vendor. And the vendor may be competing with you, by the way, if it moves into different spaces. And the pricing is controlled by the market, and not by a single vendor, again. And if you don't like your equipment from one vendor, you can go to the other one.
So if you use MPEG-4, there are a couple of benefits for you. You can author your content once, and then use it on many different platforms and players. Encode once. You may have to encode at different bit rates, but like was shown this morning, this can be made really easy.
Your users can pick their favorite stuff. They don't have to stick to one player. Content providers, on the other hand, can also pick their favorite stuff. Everybody can just provide tools in their own niche. There's a lot of different niches, and it isn't like one size fits all. And competition drives the quality up.
If you look at MPEG-4, it's both a revolution and an evolution. It's a revolution in what I explained about the design, how it works, and how it can extend to synthetic content. It's an evolution in the sense that it doesn't define its own transport protocols and such. You can just use it on whatever is already there, in place. And specifically, MPEG-4 as it is today, and MPEG-4 with AVC, Advanced Video Coding, as it's coming now, can all be used in an MPEG-2 environment, which is a very big plus for broadcasters again. They don't have to replace all their MPEG-2 broadcasting stuff. They just need to plug in a new codec, which is difficult enough. But if the gains are good enough, then the economics are sound.
So it saves you money, and it makes you money, I believe. It saves you money by making more efficient use of bandwidth, because it's efficient, and by being able to repurpose existing content, now making it interactive or deploying it on a mobile network. No need to duplicate work if you go to different networks.
You can integrate it into existing MPEG-2 environments, and you can use it on IP networks just as easily. And it makes you money, because you can use your content in your networks in new ways. You can add new dimensions to content. And there's little risk, because it's a standard that's widely supported. Proprietary technology, on the other hand, does lock you into third party business and pricing models and make you dependent on their roadmaps and their plans and the way they choose to evolve their business. And it can get you into channel conflicts.
So this is just one of the forecasts. This is for chips: standalone MPEG-4 chips and cores embedded in processors. They think it will explode. And it's already happening. I tend to agree. And there are many similar forecasts. And one interesting trend is: OK, for the coming few years, there will be competition with Windows Media. After that, the standard will win. Because the benefits are just so obvious that the market will choose the standard. And there's such a lot of people already making MPEG-4 stuff. It's amazing. Now, this is an important point, and I want to dwell on this for a while. Because MPEG-4 only standardizes the decoder, there's a lot of room for innovation.
And you see comparisons, and I've seen very bad comparisons, notably by Microsoft, that put QuickTime on one hand and then Microsoft's latest codec on the other hand, and then they compare the quality without saying that they're only using simple profile for MPEG-4, and that they did the encoding themselves. I mean, there's such a lot of tricks you can pull if you do quality comparisons. But if I look at MPEG-2, and this is really the proof of the pudding, MPEG-2 bit rates have been reduced by over 50% over the lifetime of the standard, and this is an underestimation. I will show you the graphs. And this was after the standard was frozen, and without needing to replace the decoders.
A great new encoder comes out, it just gets plugged into the broadcasting system; you don't need to replace the set-top boxes. It just works. People come up with great new tools for encoding DVDs; DVD players don't need to be replaced. Because the decoder is the same, the encoder gets better. And that's what's happening with MPEG-4 in the market today. And that's what will really happen with Advanced Video Coding; it's already happening today. And AVC, Advanced Video Coding, will beat all the proprietary codecs that are already out there, including Windows Media 9.
And if I look at, and that's interesting, you should disregard the numbers here, because they are wrong. I thought we had fixed this stuff. Someone's phone is ringing. This should be six megabits per second, when it started in 1994, 1995. Today, you can deliver the same quality in two megabits per second. So it's not one megabit, as suggested, it's two megabits per second. But still: from six megabits per second to two megabits per second, without changing the decoders.
That's quite impressive. And that's from Harmonic. And if I take a look at what Tandberg says, Tandberg TV is a competitor of Harmonic, they basically tell you the same story. With them, the graph should start at eight megabits per second. And now, whoops, this should have read eight. And here again, it should have read two megabits per second. Something went wrong in the conversion from PowerPoint to Keynote, but the picture is clear. Going from six, or from eight, megabits per second to two megabits per second today is a huge improvement, because there is competition. So it's open standards, it's interoperability, but there's a lot of room for competition.
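The arithmetic behind those encoder-improvement curves is simple. A quick sketch, using the figures as quoted above:

```python
# Bitrate needed for the same broadcast quality, per the vendor figures
# quoted above (starting points in the mid-1990s versus today).
start_mbps = {"Harmonic": 6.0, "Tandberg": 8.0}
today_mbps = 2.0

for vendor, start in start_mbps.items():
    saving = 1 - today_mbps / start
    print(f"{vendor}: {start} -> {today_mbps} Mbit/s, {saving:.0%} saved")
# Roughly two thirds to three quarters of the bandwidth saved, all achieved
# on the encoder side, without touching the deployed decoders.
```

That two-thirds-plus saving is what "bit rates reduced by over 50%" understates, and it all came from encoder competition after the decoder was frozen.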
So briefly, let's look at the deployments of MPEG-4. We see PC media player support. A recent survey on the M4IF tech notes mailing list turned up some 20 different players. And some of them are for facial animation and for 3D content. Most of them do basic streaming.
QuickTime 6 is there, of course. Real has a standard plugin, which means if you hit MP4 content with a Real player, it goes back to the Real server, downloads the plugin if it's not already there, and decodes the content. That's done by Envivio. There are several plugins for Windows Media. There's DivX, which is an MPEG-4 compliant implementation, which has millions of downloads weekly, just like QuickTime.
And then MPEG-4, of course, is widely supported in third generation and 2.5G mobile phone networks. And like Roberto Castaño said this morning, it really becomes the case that in spite of all these different mobile networks, which are not really interoperable, you can take content from one of them in Japan and move it to Europe, and the content will play. So you can't take your phone to Japan, but you can send your content between the phones, and it will play.
MPEG-4 is used for video. AAC is the optional sound codec, in addition to the mandatory speech codec. And the file format, like we said, 3GP, is very close to MP4. It's just this top level atom, and the AMR codec that's used. And QuickTime 6.3, recently released, of course supports 3GPP. Then the Internet Streaming Media Alliance, I've said a couple of words about that already, made a specification for interoperable MPEG-4 across the internet.
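That "top level atom" distinguishing 3GP from MP4 is the `ftyp` box, which carries the file's brand. A minimal sketch of reading it in Python; the byte layout follows the ISO-family box format (4-byte big-endian size, 4-byte type, then major brand, minor version, and compatible brands), and the brand strings used in the example are illustrative, since actual brands vary per file:

```python
import struct

def read_ftyp(data):
    """Parse the first box of an ISO-family (MP4/3GP/QuickTime-derived)
    file and, if it is 'ftyp', return (major_brand, compatible_brands)."""
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("first box is not ftyp")
    major = data[8:12].decode("ascii")
    # bytes 12:16 hold minor_version; the rest, up to `size`, are the
    # compatible brands, four ASCII characters each
    compat = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major, compat

# A hand-built ftyp box such as a 3GP file might start with
# (example brands only):
box = struct.pack(">I4s", 24, b"ftyp") + b"3gp4" + b"\x00\x00\x00\x00" + b"3gp4isom"
major, compat = read_ftyp(box)
assert major == "3gp4"
assert "isom" in compat
```

Swap the brands and the same bytes describe a plain MP4 file, which is the point: the container machinery underneath is shared.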
What's maybe more hidden in the background is that MPEG-4 is becoming the de facto standard for security and surveillance. There's a lot of surveillance cameras with hard disk recorders and such that just use MPEG-4, almost silently, because I don't get to hear a lot about them. And interestingly, you see a lot of home media centers that do MPEG-4. And people use DivX to rip their content, and then they put it on a DVD and put it in the DVD player, and the DVD player understands MPEG-4.
And these are just a couple of recent announcements, and I won't mention them all, but there's chips, there's video cameras, there's solid state video cameras. These are cool. These are just this size, basically, and you can record on an SD card or a Memory Stick or something. You can record half an hour of video and audio that's watchable on a TV. I won't say it's like DVD quality, but it's perfectly watchable, just on a device this size.
There's portable stuff, and more is coming, like video jukeboxes that use MPEG-4. And there's of course mobile phones that don't just decode, but some of them also stream it. So it's not just recording; some of them can even play it out while it's being recorded. It's pretty cool.
So lastly, I want to say a couple of words about the MPEG-4 industry forum. And in that context, even though we're not responsible for it, about licensing of MPEG-4. Because some of you may have-- who's heard about licensing here, by the way? Yes, right. OK, so come back here Thursday morning. I won't be here, unfortunately, because we have our annual M4F meeting. But someone will be here to explain.
So let me say a couple of words. We're three years old now. We have about 100 members worldwide and across industries, very much according to the MPEG-4 vision, a nonprofit organization. We have these and many other members. Apple is, of course, there. And you see a lot of major companies, but there are also smaller ones.
And they come from IT. They come from the consumer electronics industry. They come from the mobile operators. They come from all across the globe and all across the industry. And they all believe in this single standard that works across everything. Our goal is to get MPEG-4 adopted. And we have done a couple of things that are important. We've discussed licensing a lot.
Again, I will say we're not responsible for licensing. I'll clarify that in a later slide. We've done a lot of interoperability work, with a program with over 30 companies exchanging bit streams between their products. We will have a logo program pretty soon. And if you type MPEG-4 into Google, you come to the M4IF website, and you get a host of information.
And this is our membership. If you're interested, it's $3,000 for a full membership and $300 for not-for-profits. But I won't dwell on that too much. This is important, though. And this is the last thing I'm trying to tell you, and then we'll have questions: the licensing. A lot has been said about licensing. A lot has been true, and a lot has been false, by the way.
But the responsibilities are as follows. MPEG standardizes. So MPEG, the Moving Picture Experts Group, makes the standards. And by ISO rules, they can't really deal with licensing. Although with the new codec, there's been a lot of effort to get a royalty-free baseline codec, which is the simplest incarnation, it's a profile, and there's been a lot of effort to try to keep this baseline royalty-free. Then there's the MPEG-4 Industry Forum, which has done a lot of work to get licensing off the ground, but doesn't see anything of the proceeds, and doesn't require anything of its members with respect to licensing. It's just literally a catalyst. If you know how a catalyst works: before and after the chemical reaction, the catalyst is unchanged.
Well, if the reaction goes really badly, the catalyst may go away. It may still happen. But then there are the licensors. That's the people that actually have the patents, that sit together in some room and decide on and sell the licenses. And what M4IF says is that the licenses need to be competitive. It should be possible to build competitive products given the licensing. So that's what we're working on right now, still working on right now, and actually working really hard right now, to get this right for AVC, because we know some things need to be improved there. Thank you.
Thursday morning, Larry Horn, I think, of MPEG LA will be here to answer your questions about licensing. He's going to tell you what's called the truth about MPEG-4 licensing. My personal opinion is that it's great for devices. It's great for phones. It doesn't work yet for content providers. That's what we're working on right now. And there's a lot riding on this, I can tell you. Hey, I hadn't expected this slide. Amy?
Probably of particular interest. Oh sorry, and there's a lot more good QuickTime stuff too. All in your guide. Is it Larry Horn that's coming? It's Larry Horn. OK. I wish I could be here, but we just have this M4IF annual meeting, and we need to elect a new board and all that sort of stuff, and it's interesting too, but I wish I could just be here and ask Larry some questions. That would be a great session. Questions?