Digital Media • 1:04:22
The Mac OS X Quartz Compositor seamlessly integrates 2D, 3D, and multimedia content on-screen. This session details the Quartz Compositor's design and capabilities. Special attention is focused on how developers can easily build new classes of interactive applications by leveraging the Quartz Compositor.
Speakers: Peter Graffagnino, Ralph Brunner, Ken Dyke
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
Ladies and gentlemen, please welcome graphics and imaging evangelist, Travis Brown. Good afternoon, everyone. Welcome to session 503, Exploring the Quartz Compositor. We're really excited about being able to tell you about new compositor technology such as Quartz Extreme. But more importantly, we're really excited to be able to begin to articulate why Mac OS X's visual pipeline is organized the way it is. Over the past year, a lot of developers have expressed concerns about getting their pixels to the screen. And it wasn't readily apparent why we had architected the system the way it was.
But hopefully in yesterday's keynote, it sort of became clear that we're doing some very innovative things with regard to how we approach graphics acceleration on the platform, and that Quartz Extreme is able to accelerate 2D content, 3D content, and video, and accelerate them seamlessly and with minimal CPU overhead, because it fully leverages the GPU. So this is what we're going to be talking about in today's session. But one thing I want to make really clear, because it's been a point of confusion in the early buzz that Quartz Extreme has generated: even though Quartz Extreme is a hardware-accelerated architecture, we have still been working on and optimizing the existing Quartz compositor architecture that we've had in the system since we shipped Mac OS X 10.0. So that's a good story for all our customers who are going to be running Jaguar. They're going to get improved performance, and we're going to talk about that today. But for customers with the right hardware in their systems, we're going to fully leverage that hardware to give them the best possible graphics experience. Now, to take you further and tell you more about this, I want to welcome Peter Graffagnino, Director of Graphics and Imaging Engineering, to the stage. Thank you.
Thanks, Travis. Hi, everybody. Thank you. Welcome to our session on... Network server? No, I don't think so. Exploring the Quartz compositor. My name is Peter Graffagnino. I manage the graphics and imaging group at Apple. And what we're gonna do today is give you kind of a-- I'll do some introduction stuff, and then I'm gonna have a couple of the engineers who worked on the overall architecture and the OpenGL implementation come up and talk you through how we do the windowing system on OS X and how we accelerate it with OpenGL.
So some background comments. First of all, you've seen the architecture diagram before. We have three basic drawing APIs that people get to the screen with: Quartz 2D for 2D drawing, OpenGL for 3D, and QuickTime for video and multimedia. And the Quartz compositor sits underneath those APIs and blends the content from those APIs onto the screen to present it to the user.
So what we've really done is taken the windowing system, which we call the Quartz compositor, and made it orthogonal to any of the drawing models you have. So a new drawing model could come along, for example, and we could composite that into the desktop just the way we do everything else. So I think it's a really good thing in OS X that we clearly separated drawing APIs from window presentation.
And this is really not a new idea. The computer graphics industry for a while has been compositing together the results of different applications. In the old days, you had, you know, one program could do a sphere, another program could do fractal terrain, and you didn't want to run them all every frame, so you would, you know, if you had a bug in your fractal terrain renderer, you wouldn't have to run your sphere renderer every frame again. And so this idea of sort of caching results and being able to composite things together was developed in this 1984 SIGGRAPH paper by Porter and Duff, where they introduced the alpha channel and the whole compositing algebra concept. And what we're doing is just using that on the display in real time to create the desktop for Mac OS X.
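As a concrete illustration of the compositing algebra that paper introduced, here is the Porter-Duff "over" operator for one 8-bit channel with premultiplied alpha. This is a minimal sketch of the math only, not the compositor's actual code:

```c
#include <stdint.h>

/* One channel of the Porter-Duff "over" operator with premultiplied
 * alpha: out = src + dst * (1 - alpha_src). 8-bit values, 255 = 1.0.
 * Illustrative sketch only. */
static uint8_t over_channel(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    /* the +127 rounds the division to nearest */
    return (uint8_t)(src + (dst * (255 - src_alpha) + 127) / 255);
}
```

With a fully opaque top layer the bottom layer drops out entirely, which is exactly the property that lets a compositor take a fast opaque path instead of blending.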
So to look at what the architecture basically looks like at the block diagram level, we've got applications, as you can see on the left of the slide, drawing into some kind of buffer, whether it's a window backing store, a surface, or any piece of memory that can be composited. And that content then comes through the compositor and is presented on the display. This diagram is kind of the pre-Quartz Extreme diagram, where you can see the parts in yellow, which are currently done in software, and the parts in red, which were done in hardware in Puma. What happens in the compositing step in Puma is that any areas that are simple, that are opaque, that are pretty straightforward to get to the screen using 2D blit operations, are done with the hardware. And for parts where there's a blending calculation involved, we have to do the math with the CPU to get the blending done.
So this allows us to create a fully composited desktop experience, as you see here. We've got accelerated 3D in the lower left corner. We've got a demo from NVIDIA. The ubiquitous transparent terminal, which always goes over well at WWDC. And the volume control, you can see it composited into the scene. The transparent clock and QuickTime, 2D PDF, everything being blended together.
nice anti-aliased icons on the dock. So it really kind of ups the production values, if you will, of the desktop display compositing everything together. And with that, I'm gonna turn it over to Ralph, who's gonna go in more detail about the general compositing architecture on OS X. Ralph?
Thanks. So as Peter said, the Quartz compositor is the piece that takes the content provided by all the applications and mixes it together to produce the final on-screen presentation. And the Quartz compositor in our implementation has quite a number of features. It can do transparency, as you know from menus and the volume control. It can do drop shadows. It can scale content; you see that in the Dock, where the Dock icons are actually windows that have a fixed size and simply get scaled to the destination size as they're composited.
And this is how it works. So when an application creates a window, it sends a message to a process running on OS X. And that process, if you use top or ps, you will see it has the name WindowServer, which is kind of pedestrian. But it contains the Quartz compositing engine. So the message goes over to the Quartz compositor. And the Quartz compositor allocates a buffer to contain the content that the application will draw. That piece of memory is then mapped back into the application's address space, so both the application and the Quartz compositor have access to these bits. Then the application draws something into these bits, using various methods like Quartz 2D, QuickDraw, QuickTime, or whatever.
At some point, the application decides now is the time to present the final results on screen. So it sends another message to the Quartz Compositor, which says, flush this. And then the whole Quartz Compositor machinery kicks in: the transparency and so on. So if there is translucent content on top of your window, at that point the content of the translucent window and your window get combined.
Okay, so a word about flushing. For the most part, as an application programmer, you don't really have to worry about it. Cocoa and Carbon both take care of that for you. What's essentially happening in the event loop is: the event comes in, it gets distributed to whichever object needs to respond to it, and when control returns to the event loop, these frameworks will essentially call flush for all the drawing that has happened. Every now and then, you will have to call flush yourself. For example, when your drawing is very complex and you would like to give feedback to the user by showing intermediate results.
So you could call flush every now and then in between, every second or so, to show there's still something going on and the application is still working on it. Okay. Usually, however, we are more interested in flushing rates that are somewhere close to the display refresh, so that the user experience is very smooth and everything looks like a physical object. To achieve that, flushing is actually synced to the display refresh. So if you have a CRT, this means the flush will happen in a way that the entire update area appears on screen in one piece; the video beam will not cut through it while it is refreshing the screen. Similarly, LCDs have a refresh rate, too. They don't have a beam, but there's still a rate at which the frames go over the wire to the display. So flushing is enforced to be synced to the display. However, from the application's perspective, flushing is asynchronous, which means when the application calls flush, control immediately returns to the application, because it really just sent a message over to the other process, which then does its thing. In Mac OS X, since 10.0, flushing has been asynchronous, and that's a gain, for example, if you have a dual-processor machine, because then the second processor can essentially run the window server, which does the flush, while the first processor is still available for your program to do whatever it needs to do. In the case of Quartz Extreme, flushing is asynchronous in a different way, because in Quartz Extreme the bits are no longer pushed by the CPU over the bus to the video card. It just tells the video card: okay, here are the bits, DMA them over. So in the case of two CPUs, you then actually have two CPUs available during the flush. Okay.
So that allows you to do an interesting optimization. So if you would produce a frame and then do the flush and then produce the next frame and the flush and do them in sequence, well, it will take a certain amount of time to do the flush. Because it is asynchronous, it allows you to-- prepare the next frame that you want to present while the flush is in progress.
So in the best case, preparing the frame and flushing the frame would be exactly the same amount of time. Then you get a 2x speed improvement out of this. So important thing here is, in most cases, the application frameworks will take care of that for you. Because typical scenario is, event comes in, and you update your model of whatever it is the user is modifying with that event. And at the end, you redraw-- you update the screen to reflect the change in that model. So if you repeat that, then event comes in and update the model can run in parallel with the flushing of the previous frame.
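The overlap can be put in numbers with a toy model. The function names and the idea of a fixed per-frame cost are illustrative assumptions, not anything the system actually measures:

```c
/* Toy model of the asynchronous flush: done sequentially, each frame
 * costs prepare + flush; overlapped, the steady-state cost per frame
 * is just the larger of the two. Times in milliseconds; illustrative
 * assumptions only. */
static int per_frame_sequential(int prepare_ms, int flush_ms)
{
    return prepare_ms + flush_ms;
}

static int per_frame_pipelined(int prepare_ms, int flush_ms)
{
    /* preparation of frame N+1 hides behind the flush of frame N */
    return prepare_ms > flush_ms ? prepare_ms : flush_ms;
}
```

With prepare and flush both at 9 milliseconds, the sequential cost is 18 ms per frame and the pipelined cost is 9 ms, which is the 2x best case described above; if preparation dominates, the flush hides entirely and the gain shrinks.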
So if your model is really trivial, like, you know, you have a little rubber band, so the model is really just updating a rectangle, then you will not get much out of it. But if you're doing some reasonable amount of computation, you can get a nice speed improvement by doing that. So the key message here is: most of the time, this is automatic for you. Really, the only way you can actually spoil it is if the event comes in and you immediately decide, well, let's draw this little control over there first, and then go off and do all the computation, because as soon as you hit the drawing surface, the application has to wait until the flush has actually happened. Let me show a little demo of that. Okay, first of all, in the bottom right corner I have a frame rate counter, which measures how many frames come out of the Quartz compositing engine. So whatever I do, you see the needle goes up and tells you the frame rate of whatever is going on. So, let's turn this off. This little application does quite a bit of computation, and it produces one frame after another. So if I just start that, you see, this computes a lot of floating point math. And we've seen it runs, you know, 45 frames per second. It peaks, and then depending on how far you zoom in, it will drop over time.
This was the case of the synchronous flush. And the way I implemented this is there's a timer which fires 60 times a second, and I either compute the frame and then flush it, or I draw a single pixel in the top left corner, compute the frame, and then flush it. So drawing that single pixel does exactly what I mentioned before: it spoils the parallelism you get for free. So when I actually remove that single-pixel call, you see the frame rate peaks at almost 60 frames per second now. Okay. So that's it for the demo. Thank you.
So the important thing to notice there is, these frame rates I just showed are about the order of magnitude we expect things to be. So I measured this machine we're using here, and the flush for this machine and this graphics card and this window size is about 9 milliseconds, which means, if flushing were the only thing you did, you would get about 110 frames per second. So that means if your application produces less than, say, 20 frames per second, then saving these 9 milliseconds is probably not going to be a big deal, so really don't bother; you probably have other places where optimization time is better spent. But if you're trying to get a really smooth user experience and you know you're at 30-plus frames per second already, then taking care of this might give you another 20 or something like that. Okay.
Another feature of the Quartz Compositor is that window buffers can be compressed. Because every window has its own buffer, they can take up quite an amount of memory. And to take care of that, there's a mechanism in Jaguar which allows window buffers to get compressed when they are idle. So idle means if a window hasn't been touched by the application for five to ten seconds or so, in the background, the window server will go and take the window buffer away from the application and compress it. Because typical windows have, you know, a lot of white space and so on, compression ratios are fairly good; about 3 to 4x is typical. One neat feature about the Quartz compositor is that damage repair can be done directly from compressed windows without decompressing the entire thing. That means if, for example, your Finder window has been compressed because it didn't do anything in the recent few seconds, and you have a TextEdit window on top, and you move that TextEdit window away, you reveal certain parts of the Finder window; if the Finder window is compressed, we can decompress only the parts that have been revealed without decompressing the entire thing. So the message to application developers here is: if you don't do anything with your window, well, don't draw in it, and you get a nice additional memory and speed saving. Okay.
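The actual compression scheme isn't described in the session, but even a toy run-length encoder is enough to see why windows full of white space compress by 3 to 4x. A hypothetical sketch, not the real algorithm:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy run-length encoder standing in for the idle window-buffer
 * compression described above (the real scheme is not documented
 * here). Output is (count, value) pairs; returns the encoded size,
 * or 0 if the output buffer is too small. */
static size_t rle_encode(const uint8_t *in, size_t n,
                         uint8_t *out, size_t cap)
{
    size_t o = 0, i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255)
            run++;
        if (o + 2 > cap)
            return 0;
        out[o++] = (uint8_t)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

/* Encoded size of an n-byte buffer holding a single value, e.g. the
 * white background of a mostly empty window. */
static size_t rle_size_uniform(uint8_t value, size_t n)
{
    uint8_t buf[4096], out[8192];
    size_t i;
    if (n > sizeof buf)
        return 0;
    for (i = 0; i < n; i++)
        buf[i] = value;
    return rle_encode(buf, n, out, sizeof out);
}
```

A kilobyte of solid background collapses to a handful of bytes, while the pair format also makes it possible to decode an arbitrary region without touching the rest, which is the same property the damage-repair trick above relies on.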
The cool thing about this is it's completely transparent to applications. There is absolutely nothing you have to do about it. Essentially, whenever a primitive is drawn, at the bottom end the window backing store lock is acquired. And at that point, if the window is compressed, it will get decompressed for the application. And you will never know this, hopefully.
Another feature about the Quartz Compositor is surfaces. So a window doesn't only have a back buffer. It can have a number of surfaces attached to it. Now, a surface is just an additional buffer that typically lives in video RAM, and it enables certain kinds of hardware acceleration. So, it is mainly used for OpenGL, DVD, and for QuickTime playback.
In the OpenGL case, the surface is essentially a piece of VRAM that OpenGL can draw into hardware-accelerated. And in the case of DVD and QuickTime, it's a YUV surface, so that the conversion from YUV to RGB can be done in hardware. And this is essentially the key to enabling the seamless mixing of 3D, 2D, and video.
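For illustration, the YUV-to-RGB conversion that a YUV surface lets the hardware perform is just a fixed linear transform per pixel. Below is a common integer BT.601 video-range approximation; which exact matrix the hardware uses is an assumption here, not something the session specifies:

```c
/* Integer BT.601 video-range YCbCr to packed 0xRRGGBB. This is the
 * kind of per-pixel linear transform a YUV surface offloads to the
 * graphics hardware; the exact coefficients are an assumption. */
static int clamp255(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : v;
}

static unsigned yuv_to_rgb(int y, int u, int v)
{
    int c = y - 16, d = u - 128, e = v - 128;
    unsigned r = (unsigned)clamp255((298 * c + 409 * e + 128) >> 8);
    unsigned g = (unsigned)clamp255((298 * c - 100 * d - 208 * e + 128) >> 8);
    unsigned b = (unsigned)clamp255((298 * c + 516 * d + 128) >> 8);
    return (r << 16) | (g << 8) | b;
}
```

Video black (Y=16, U=V=128) maps to RGB black and video white (Y=235, U=V=128) to full white; doing this multiply-add for every pixel of a DVD frame is exactly the work the CPU gets to skip.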
So here is the same architecture diagram again, slightly expanded. In the software compositor case (that's the compositor that is in 10.1, and in Jaguar if you don't have a graphics card that can support Quartz Extreme), you have applications drawing either into the window backing store, which is in DRAM, or you have OpenGL drawing into a surface which lives in video RAM, or you have QuickTime drawing into a YUV surface which lives in video RAM, and the Quartz compositor ties everything together. There are fast paths through this green rectangle in the center, so that if a surface is not obscured by anything, we take the fast path and just use the hardware to copy stuff into the frame buffer and things like that. Essentially, the message here is there is no tax associated with the Quartz compositor if you don't do things like translucent terminals on top of your content.
Okay. So this is a look at what's going on, well, without going to extremes, meaning if you don't have Quartz Extreme. So blending is done with the CPU. The Quartz Compositor contains a fairly hairy piece of code that can take an arbitrary number of layers at the same time and produce an output blended pixel. So in a sense it is optimal in terms of memory bandwidth: every pixel that ends up on screen is read exactly once, every pixel that needs to go on screen is written exactly once, and there are no temporary buffers involved at all. Okay. This code makes use of multiple CPUs. So if you have a dual-CPU system, the screen update is actually sliced into two bands, and each CPU works on one of them.
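The read-once, write-once idea can be sketched as a single front-to-back pass over the stack of layers for each output pixel. This is an illustrative model only, not the actual vectorized code:

```c
/* One output pixel built in a single front-to-back pass over a stack
 * of translucent layers: every source value is read once, the result
 * is written once, and no temporary buffers are needed. Channels are
 * premultiplied floats in [0, 1]; an illustrative model only. */
typedef struct {
    float v; /* premultiplied channel value */
    float a; /* alpha */
} layer_px;

static float blend_stack(const layer_px *layers, int n)
{
    float out = 0.0f;
    float remaining = 1.0f; /* coverage not yet claimed by nearer layers */
    for (int i = 0; i < n && remaining > 0.0f; i++) {
        out += remaining * layers[i].v;
        remaining *= 1.0f - layers[i].a;
    }
    return out;
}

/* Convenience wrapper for a two-layer stack (top over bottom). */
static float blend_two(float v0, float a0, float v1, float a1)
{
    layer_px l[2] = { { v0, a0 }, { v1, a1 } };
    return blend_stack(l, 2);
}
```

Note the early exit: as soon as an opaque layer is reached, everything behind it is skipped without ever being read, which is how the single pass stays bandwidth-optimal even with many stacked terminal windows.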
There's also some degree of hardware acceleration. For example, when you move a window around and the center is opaque, then that part is actually moved with the graphics card without using the CPU. And that new bullet should actually be one further up as well. So in Jaguar, the entire compositor is tuned for the Velocity Engine. So you can safely say that if you don't have Quartz Extreme, every pixel on screen that you see on Jaguar went through the Velocity Engine. It's kind of an interesting milestone to achieve that all the paths are vectorized. And that gives us about a factor of two in reasonably complex areas and about a factor of four in very hairy areas. So in the case of five translucent terminal windows on top of each other, you get about a 4x speed improvement over 10.1. We also added 2D hardware acceleration for scrolling. How that is done is actually just a humongous amount of bookkeeping, essentially, to make sure that when the application scrolls, and sometime later the application flushes, at the time of the flush we still remember which pieces we scrolled and can use the on-screen move for moving these bits. That, again, at the bottom level, has given us about a three-times speed improvement for normal scrolling distances, like about two text lines or something like that. And at the top end, we've seen, you know, in TextEdit, about 30% or so, because a lot of other stuff is of course going on in the system. Okay. So, with that, I'll give it back to Peter.
Okay, so I'm going to talk for a few minutes about Quartz Extreme from a high level, about kind of some of the motivations for why we did that, and then Ken Dyke is going to come up and go through some of the gory details and do all the fun demos for you. So why did we do this? Well, we knew that we were going to be going down a path like this for quite a number of years now. It's been pretty obvious that GPUs were getting more and more capable of doing things like this. And the model -- so we went for the model kind of before the GPUs were quite ready for it. And fortunately, we had things like Velocity Engine and smart people like Ralph who can make it work really well on existing architectures. But the model can be computationally expensive, especially when you start having to read pixels back over the bus in order to blend them. You know, if you're doing the translucent volume control over DVD, for example, is one of the worst cases. In fact, in Puma, we don't even-- we can't spare the CPU to do that, so we do it over a gray rectangle.
So--and also, there's issues about, you know, some concerns about us taking the frame buffer away, and we weren't really able to tell the whole story of why we're going down this architecture until this year. So it's good to be able to talk about Quartz Extreme and hopefully make you understand why we're kind of moving in this direction for our graphics architecture.
So Quartz Extreme is really just an implementation of the Quartz compositor on OpenGL. The desktop is really just a 3D scene. The nice thing about it is it removes the transparency tax for video and 3D. Since everything is over on the graphics card, if I have to mix the volume control over DVD, it's just an extra blending layer, and it's really not much extra work. It also frees up the CPU. The CPU's not involved in doing any of the blending calculations and can do other things in your application or for other users or whatever. And it allows us to kind of showcase the GPU and the user interface. So there's a lot of bandwidth in these cards, and I think over time, we'll be adding more and more sort of flourishes in the user interface, and maybe on your high-end card, you get a little fancier window animation or some extra effects. For the customers who spent the money, why not entertain them a little bit more? So, and I think over time, we've got a lot of headroom with this architecture. I think we're really gonna be able to do some dramatic things. And I think it's just sort of the tip of the iceberg right now.
To kind of motivate things even further, I thought I'd talk about programmed I.O. versus DMA. And in working with devices, there are kind of two basic ways. Back to computer architecture 101. You know, there's programmed I.O. where the CPU pushes data and commands to a relatively slow device, and it's not necessarily the most efficient use of the CPU. A more complicated model but more efficient model is to set up DMA for the device where the device can actually pull the data out of memory or commands out of main memory that it might need and process that while the CPU is free to do other things. And when it's done, it can either interrupt the CPU or the CPU can wait, depending upon what is appropriate. And so CPU drawing in the frame buffer is really sort of just programmed I.O.
and not very efficient. I mean, you think about your gigahertz CPU talking over a 100-megahertz I/O bus; it's really not the best thing to be doing with the cycles on the machine. So that's why we're moving more and more towards a DMA architecture. In fact, Quartz Extreme is a complete DMA architecture. All of the window textures are pulled by the GPU, rather than being pushed by the CPU.
The other important thing to kind of keep in mind in all of this is the kind of evolution rates of CPUs versus the evolution of rates of GPUs right now. For example, the traditional Moore's Law for CPUs is that performance doubles every 18 months, and that's proved to be roughly true over the years.
For GPUs, performance has recently been doubling every six months, and one of the vendors talks about the Moore's Law cubed effect, where you're actually getting three times the exponential growth, three times in the exponent, because it's six months instead of 18. Another way to compare is in terms of transistor count: the G4 has about 10 million transistors, and the latest GeForce4 Ti from NVIDIA has about 63 million transistors in it. So really, you can see that there's just more and more logic being thrown at the problem. And one of the reasons is, the GPU vendors have an extra degree of freedom in that they have a lot of parallelism in the problem they're trying to solve. You've got a lot of pixels on the screen. You need to do the same thing to megabytes and megabytes of data. You can just replicate gates and get lots of pipelines going at once. This is not really true for CPUs, which, you know, may have 4K page sizes and are just sort of nibbling at the memory instead of taking it in huge gulps like the GPU can. So it's clearly a curve we want to ride, and we think all of you want to ride it, too: for things that the GPU can do, let the GPU do it and do it fast.
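The arithmetic behind the "cubed" remark is straightforward: over the same span, a six-month doubling period gives three times the exponent of an 18-month one. A tiny sketch of the doubling math:

```c
/* Doubling-period arithmetic from the talk: growth factor after a
 * given number of months for a given doubling period. Over 36
 * months, an 18-month period gives 2^2 = 4x, while a 6-month period
 * gives 2^6 = 64x, i.e. three times the exponent (4^3 = 64). */
static unsigned long growth_factor(unsigned months, unsigned period_months)
{
    return 1UL << (months / period_months);
}
```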
And we think this is the beginning of kind of a new graphics platform for us. You know, every day we have new ideas of the kinds of things we can do with this, and I'm sure you guys will impress us with stuff, too. It's really kind of a next-generation windowing system. There are various hacks out there that do, you know, transparent menus and transparent terminals, but a lot of them are really just hacks. We wanted to do it right, and we have a full compositing engine and a fully composited model, as Ralph talked about. And we think that this is kind of an inflection point, if you will, in platform graphics, in terms of really factoring out the window presentation and the layer compositing for the desktop from whether you're doing 2D, 3D, or video. It's a really nice architecture that's gonna have a lot of headroom as we move forward.
And it's a combination of a lot of things. There's obviously all the great work the vendors are doing with GPUs and just making them more and more advanced. And some of the great architectural work we've done in the OS to really treat the GPU as a--almost as a traditional DMA device, but with very high bandwidth transfers and getting the CPU disengaged from babysitting the device and just letting it take the memory. The other nice thing about Quartz Extreme, and if you've gone to-- well, if you've gone to OpenGL Early Bird Session, you would have found this out, that all of the advances that we do to-- that--in OpenGL to enable Quartz Extreme are available to the OpenGL programmer.
So all of the extensions we've done for fast DMA texturing, for synchronization with NVFence-- or, sorry, with AppleFence-- all of those things are available to you as an OpenGL programmer. So we kind of made the bet that the things we were gonna need to do to OpenGL to make it great at supporting the windowing system were just gonna be great things to do to OpenGL in general. And I think that's worked out pretty well. So with that, I'm going to invite Ken Dyke up, who is going to take you through the Quartz Extreme implementation. Ken? Great. Thanks, Peter. - Is that good?
All right. So I'm going to talk to you guys a little bit about what's going on behind the scenes, what is accelerated, what's not, and sort of how we pulled off some of it. There's been a lot of speculation on the Web about what it does, what it doesn't do, does it take all my VRAM, et cetera. Right now what it does is it accelerates all compositor operations. This is the type of stuff that Ralph talked about. Window warps, all of the transparency, scaling effects, blending of drop shadows. All the type of stuff you see in the Aqua interface is now fully hardware accelerated.
It doesn't accelerate Quartz 2D or QuickDraw. That's all still basically being done with the CPU right now. Now, the caveat to that is obviously we can get those bits to the screen a lot faster than we used to be able to. We've done some measurements of this, and right now we can usually get stuff from your window backing store, after it's been drawn, up into video memory at about 400 megabytes per second. Now, you know, a lot of people used to go, "I can write the fastest CopyBits routine in the world." I don't think you could get 400 megabytes per second no matter how well you did. And of course it's all implemented in OpenGL. So why OpenGL? Well, 2D is sort of done, you know? Nobody is doing anything with it anymore in the GPUs. They sort of all stalled out at the QuickDraw, GDI sort of level, and that's where everything's done. You've got, you know, foreground color, background color, fills, index support and all that junk, but it's not going anywhere anymore. 3D, on the other hand: we've still got, you know, gigapixel fill rates coming. I forget what the quoted numbers are for the GeForce4, but they're insane. We're starting to get lots of video memory, so it's not a big deal if you want lots of window backing stores cached in video memory, for example; you can have DVD going and video and lots of 3D stuff all at the same time, and it's not a big issue. The other thing is the GPUs are finally getting good support for 2D data. Traditionally, previous-generation hardware could do 3D pretty well, but there were a lot of limitations.
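To put the 400 megabytes per second in perspective, a little back-of-the-envelope arithmetic; the window size, the 32-bit depth, and the use of decimal megabytes are illustrative assumptions, not figures from the session:

```c
/* How many complete re-uploads per second the quoted bandwidth
 * allows for a window of the given size. All parameters are
 * illustrative assumptions. */
static unsigned uploads_per_second(unsigned width, unsigned height,
                                   unsigned bytes_per_pixel,
                                   unsigned bandwidth_mb)
{
    unsigned long bytes = (unsigned long)width * height * bytes_per_pixel;
    return (unsigned)(bandwidth_mb * 1000000UL / bytes);
}
```

At 400 MB/s, even a full 1024x768 window of 32-bit pixels (about 3 MB) could be re-uploaded well over a hundred times a second, comfortably above any display refresh rate.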
And I'll get into those in a little bit. But some of the current GPU stuff, like I said, is really good for 2D content, believe it or not. And, of course, OpenGL is the industry standard for 3D. And most of the GPUs these days are built around OpenGL, regardless of what Microsoft might have you believe otherwise. OpenGL has a nice, stable set of rasterization rules and ways things are supposed to work, and it doesn't change every nine months or so. Thank you.
So, while we think OpenGL on Mac OS X rocks, I want to spend a little bit of time on this, because this is where a lot of the confusion, I think, has come in. We did a lot of work years ago looking at the overall architecture on OS X to figure out, you know, if we're going to accelerate the window server, what are we going to have to deal with? Well, the window server needs to put textures in video memory, and 3D apps need to put textures in there, and Quake wants its textures in there, too, and QuickTime wants video memory. So what do you do when you run out? Well, we can't have the windowing system stop working when you run out of video memory. That's just not going to work. So we did a lot of work towards virtualizing memory management in the system. So if you've got Quake running and you've got DVD playing and you've got the accelerated window server running, they're all sharing the same video memory. Now, if you stop doing things, stop moving windows around, and all you're doing is sitting there playing Quake in full-screen mode or something, it can get all the video memory. It'll page; we'll page everything else out. Don't worry about it. It's not a big deal. You know, and we did this too because, like I said, we can't have you create that 57th window and now all of a sudden you don't get Quartz Extreme working anymore. We really had to make sure that everything is paged virtually, just like the virtual memory management system in Mach.
We also had to do a lot of tight integration with the window server. The graphics drivers have a lot of communication back and forth. They know that when there's a surface they can just blit the stuff straight to the screen; the window server can tell them, hey, this is completely opaque, you can just do the fast 2D blit. However, if there's something obscuring it, you've got the clock over it, or you're playing DVD and you've got the volume control, the windowing system can tell the graphics driver, hey, you've got to get the core compositor involved, because it's the only one that can make the display right for this. So that's the way it works. So, like I said, there's a lot of tight integration between the windowing system and the graphics drivers. And along with that, Apple is really heavily involved in graphics driver development. We have full source code to all the graphics drivers. And when we need to make experiments or, you know, just generally toy around with ideas, which is a lot of how we came up with this stuff, we can just go and do that. We don't have to call up NVIDIA and go, hey, could you try this thing for us when you get a chance; we can just go and do it, see how it turns out, and integrate the sources back to them. Now, in doing this, we came up with some new extensions that were really driven by the compositor's needs. And we'll get into these in some of the OpenGL sessions later this week. But just to give you an example, two of these, GL client storage and GL texture range, are used to let us get backing stores up into video memory at that 400 megabytes per second I was talking about. So the CPU doesn't have to touch any of that stuff. And the great part about these extensions is they're available to all you guys. You know, we're not cheating. You guys can use all the same stuff we're doing in the window server. So you guys should think of the Quartz Compositor as just another OpenGL app.
It loads the same drivers that you guys have access to. You know, there's no cheating going on here, again. And as Peter said, you really should also think of the desktop as a 3D scene. Every single pixel, if need be, can go through the entire OpenGL pipe. Anything you can do with a GPU, we can now do to any pixel on the screen. I'm not going to give away everything, but, you know, use your imagination. You guys have seen a lot of cool effects this week from some of the other demos, so you can imagine what types of things we could do. In general, everything on the screen ends up being a textured polygon, usually just a textured quad of some kind--surfaces, windows, menus. If it's a window, it gets turned into a textured polygon; same thing with surfaces. All the compositing is done with standard OpenGL blending and, in some cases, multitexturing. You know, it's all really pretty standard.
So in the Quartz Extreme world, the sort of green box has moved out of the way. It's really just sitting in the driver's seat, telling the GPU through OpenGL how to get everything to the screen. So an application, for example, does its stuff, draws it into its window backing store, and tells the window server, hey, I need to get this stuff on the screen. Quartz Extreme wakes up and says, okay, fine, I can do that. It tells OpenGL, hey, get this stuff into video memory and get it on the screen right now. And that's basically the way it works. In the case of OpenGL or QuickTime, generally the data is already on a surface in video memory, so we just have to turn around and turn that surface into a texture.
So does it work on my PowerBook? The question on everybody's mind. So these are generally the requirements we had. We're recommending 32 megabytes of video memory. That's not a completely hard limit, so it works on the second-generation TiBooks. AGP 2X right now is definitely required for us to get the bandwidth we need into video memory.
We need the hardware to support all of the Core Graphics native formats without having the CPU touch the data. It's kind of pointless if we have to make extra CPU copies, reformat the data, do a bunch of junk, and then put it in video memory. Because in a lot of cases we end up having to expand the data, so we then use even more video memory, and it's not really worthwhile.
The other really, really critical one is that we have to be able to support non-power-of-two textures. In traditional OpenGL, each dimension had to be, you know, 1, 2, 4, 8, 16, 32, 64, et cetera. With rectangle textures, it's pretty much arbitrary. And you can imagine the cases where we would need this. If you're playing back video, for example, DVD at 720 by 480, we don't want to have to, like, turn around and do funky scaling or anything, which would just tie up even more video memory to convert that into a power-of-two texture. So that's really a showstopper on some hardware; I'll get into that in a minute. Multitexture is also required. There are cases where we have, like, alpha channels even with 16-bit windows, and we need to be able to combine the sort of 8-bit alpha channel with the 16-bit texture, and we use multitexture to do that. And we're recommending 256 megs of system memory, and this just has to do with the fact that we end up tying up AGP space quite a bit, so you don't want to, like, wire down all the memory in the system.
So here's the quick upshot. All of the NVIDIA stuff is supported: GeForce2 MX, GeForce3, GeForce4 MX, GeForce4. The AGP Radeons are all supported, including the ones in the second-generation TiBook. And the Rage 128 is not going to be supported. And it's not just because it doesn't have enough video memory.
That's part of it. But it doesn't support all the texture formats we need, and it doesn't support non-power-of-two textures. The functionality just isn't there. It's not that I wouldn't like it to work on my machines with Rage 128s too; I just don't know how to make it work. There doesn't seem to be any way to do that.
So, let me give you guys a demo--see if we can bring the system to its knees a little bit. Everybody's been doing the fun demos that look cool, but let's see what we can really do here. So let me get rid of Ralph's thing. Let's see, what else can I do to beat on this? Let's see. Oh, I guess I should have left that up there. So you guys have seen all these, obviously, before. Hopefully, anyway.
Again, I can get as many of these things stacked up as I want. And this is similar to DVD, in which case we're compositing on top of OpenGL content. So if I get these over here--where'd you guys go?--I can still bring all those terminal windows over on top of these guys, and you still see I can get it to fade out if I want, but it takes a while.
And even now I can sit here and still drag this around pretty well. I don't think with the software compositor we'd be doing quite this well. It's fast, but it's not this fast. All right, well, I've got other demos, but I'll have to save them for a little bit later. They're good, though.
All right, thanks. Oops. So as you can see, there's really no transparency tax. If you've got a fast graphics card, you can just keep piling this stuff up. You know, like I said, I probably lost count of the number of layers there. There's probably 20 or so terminal windows, so that's 40 layers of transparency on top of DVD. Whereas before, we couldn't even do the simple little volume control. And obviously the genie effect works.
You can genie underneath transparency, all that fun Aqua UI stuff. So this gives us a lot more CPU headroom, you know. I forgot--I should have run this, and I'll try it again later, but even with all that transparency up, the CPU monitors are just sitting there doing nothing. I mean, it's like they're decoding the DVD and they don't have any other work to do. They don't have to worry about the compositing. It's not the CPU's problem.
You know, as you can see, we've sort of got completely seamless integration now of DVD with the rest of the system. We don't have to worry about we need to treat DVD windows special because they take a lot of CPU horsepower. It's just like any other window in the system. And the cool part about this, and the demo sort of shows us a little bit, is this gives us a lot of exciting new possibilities on OS X, where before, you know, on like SGI hardware or even some PC hardware, you know, a lot of people said, you know, I have to have overlays. You know, I've got GL. I need to draw something on top of it. You know, you guys don't have overlay support. What are you going to do? Well, why do you need overlays when you can layer 20 windows on top of something with transparency?
You don't have, you know, per-pixel issues anymore. Another cool thing we can do, which I'll demo here in just a minute, is underlay surfaces. And there are a couple other cool things, too. So underlay OpenGL surfaces are sort of a new thing. In previous versions of Mac OS X, surfaces were always basically sitting on top of the window. You couldn't get rid of them. You know, they were just always there. You could reorder surfaces relative to each other, but they were still always on top of your 2D content.
You know, people have always said, I'm drawing 2D in my view or whatever, and I'm not seeing any of it. Well, it's drawing. It's just drawing into the window backing store, which is completely obscured. With the compositor, though--I don't want to say it's easy from an implementation standpoint, but from a compositing standpoint--there's no reason why we can't flip this around and put the surface underneath the window backing store. So now, if you want to do your 3D stuff, you can draw on top of OpenGL using Quartz or QuickDraw, and as long as you've sort of punched a hole in it, you can see what's behind it, and we basically allow the GL stuff to poke through. Now, what this sort of alludes to is that there are still independent buffers. You can draw your 2D, clear it, draw 2D, clear it, and you don't have to modify your GL content every frame. Likewise, you can sit there and animate the 3D content, and you don't have to redraw the 2D. It just sits there and gets composited by the window server. So let me give you guys a demo of that real quick.
Oh, where is it here? So here we go. So, you know, the world's most complicated OpenGL demo: a spinning lit cube. I think this is about ten lines of code. But one thing you really couldn't do before in Puma--oops--in 10.1, is sit there and just draw and paint with it. So this is just splatting with Core Graphics, a little round thing. You know, the frame rate just keeps up. I can paint in here as much as I want and go and clear it all out.
So all you guys who have said, I want to do 2D over OpenGL--well, now you can do it. You can just sort of clear it all out, make a little Etch A Sketch thing. And I just had to use metal--I was just like, okay, that's cool, it's in there. So another thing that would be common for an OpenGL implementation would be to do selection. So, just to simulate what that might look like--say this is a fairly big OpenGL surface--I can now take my 2D and go, okay, I can just, like, do a selection similar to what the Finder does. But I can do the selection stuff in 2D. I don't have to do it with OpenGL anymore. Unfortunately, I can't do both. I can show you guys this again, but the app's pretty simplistic; it only lets me do one at a time. So anyway, that's underlay surfaces. Thanks. Go back to slides.
So how do you do this? Well, it's actually pretty easy. You can order a surface above or below a window. Obviously, above is the default. So there's basically a new parameter for NSOpenGLContext, or an existing call in Carbon for AGL that's been there forever, aglSetInteger. But now you can basically say, hey, the surface that my OpenGL context is using--I want to order that beneath the window. So it's pretty simple, as you can see. That's all the code it takes. You just call setValues:forParameter:, pass in NSOpenGLCPSurfaceOrder, and set it to -1. We probably should use a better constant for that. Basically, 1 is above, -1 is below. Same thing for Carbon; it works exactly the same way.
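As a rough sketch of the Carbon side of this, assuming you already have an AGLContext called `ctx` attached to your window (this won't compile outside Mac OS X):

```c
#include <AGL/agl.h>

/* Sketch: order the OpenGL surface beneath its window via Carbon/AGL.
   1 = above the window (the default), -1 = below.
   ctx is assumed to be an AGLContext already attached to the window. */
static void orderSurfaceBelowWindow(AGLContext ctx) {
    GLint order = -1;
    aglSetInteger(ctx, AGL_SURFACE_ORDER, &order);
}
```

The Cocoa path passes the same value through -[NSOpenGLContext setValues:forParameter:] with NSOpenGLCPSurfaceOrder.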
So, things to watch out for when you're doing underlay-style drawing with OpenGL: you have to punch a hole using clear color in your backing store, or nothing's going to show up. For the Cocoa example, it's pretty simple: [NSColor clearColor] set, and just do a rect fill on your bounds. Or if you're using Carbon, I believe the correct call is CGContextClearRect, and that will paint the clear color in your backing store. So there are a couple of things to watch out for in this. It's really easy, if you're in Cocoa land, to just blindly write your drawRect: method for your NSView subclass to always update your GL content every time it needs to be redrawn and to repaint the clear. That's pretty wasteful. Usually only one or the other changes.
So you really want to watch out for that. A little trick I'll share with you that I found while doing one of these demos: when you know that the GL content is dirty, you can set a flag in your view that says, okay, just the GL content's dirty. Then you can call the standard Cocoa setNeedsDisplay: on your view. The next part of the trick is you modify your isOpaque method to always return the value of that flag.
So what happens when the view system goes to render everything: if it gets down to your view and realizes that, hey, you've said that you're opaque, it's just going to stop right then and there and draw. And it won't try drawing anything behind you. So it's actually a neat little trick.
So the other thing to watch out for is extra flushes. It's pretty easy, if you draw with 2D, to cause the Quartz Compositor to do one flush to the screen, and then, if you're drawing with GL, to cause another, redundant flush to the screen. So there are some cases where you kind of want to figure out which guy you want driving. In some cases, you may just want to draw some stuff in 2D, but you don't want it to hit the screen--actually, that's probably tricky; let me try it the other way. There are cases where you can draw stuff with GL but just call glFlush at the end. Normally we wouldn't recommend people make that call, but if you're using a double-buffered context, you can call glFlush, which will cause GL to get everything ready to go on your surface without putting it on the screen yet. Then you can do some 2D drawing with Core Graphics over the top of it, and then everything gets swapped and composited to the screen at once, so you get a performance boost out of it. Anyway, I just wanted to mention that, because you guys will run into this when you try this stuff.
So the other cool thing--my personal favorite, and one of the first things I did when I got this stuff running--is transparent OpenGL surfaces. Now, you guys have seen that all my demo material's been stolen for stuff up till now. But you can get the compositor to use the destination alpha channel if you've got a 32-bit OpenGL context. Now, obviously, you guys can tell what you can do with this. You can get OpenGL objects to basically appear sort of standalone. Now, you don't have to do it quite like we've done in most of the demos so far; I'll give you an example of it in a second. The thing to watch out for is that the surface content has to be premultiplied. The way to achieve that in OpenGL is to first draw your content on a completely black, clear background--you know, call glClearColor with all zeros and then glClear with GL_COLOR_BUFFER_BIT or whatever. That gets you starting on something fundamentally clear. Then make sure when you draw that, if you're doing any kind of blending effects, you've actually got blending enabled so that the alpha value that ends up in the destination buffer has the right value, and that OpenGL has multiplied your color values by the source alphas so that the blending comes out correctly. So I'll give you guys a quick demo of transparent surfaces. What I did is I took a little application that I've been fooling around with in my nonexistent spare time.
It's basically just a LEGO model viewer that I've been toying with. And it's also a gratuitous use of metal. Let me close this. So what you've got here is this 3D object, in this case just a little train. But you can see that it's basically composited on the metal background. You can't tell that there's an OpenGL surface here at all. And the really cool part is that I'm actually using the GeForce4's multisample support, so you get nice--let's see if this is going to work here. Oh, it's off. All right, here. Sidestep here for a second.
You can actually see that the edges are all nice and anti-aliased on everything. So you get this really smooth effect, just sort of like it was drawn with Quartz 2D. I can make everybody sick by moving it around. Stop that. Stop it. But, you know, obviously you don't have to do anything like this. As another example where I think this would be useful: you've got some kind of really cool custom control you want to draw, and it's obviously better to draw it with OpenGL. You can now sort of do it, and you don't have to have the little black or white or blue box around your OpenGL content anymore. You can really have completely standalone 3D objects in your windows now.
Now I've got another fun demo of transparent surfaces here. So if you guys were alive in the '80s, you probably have seen this before. But I've done a few tricks. I've played a few--oops, sorry. Idiot driver. Try it again. So I've done a few mods, made it bigger.
Added lighting. Now, the Amiga was pretty cool when it did this. They cheated a lot in this demo. I don't know how many of you know, but all they did was color cycling and playfield animation, so they didn't have to blit or anything. I'm actually drawing this in real 3D, lighting it. Now, believe it or not, this whole thing is being composited under the desktop. Well, you say, well, it's opaque. Well, that's fine, but what if it's outside the window? So that one didn't get stolen. Now, if that isn't enough, what if you want to play Space Invaders on your desktop?
How come, hold on a second. It's supposed to be underneath the desktop. That's no fun. Well, I'm not gonna go recompile it, but... See, I can sit here and actually play it. Doo-doo-doo. Completely useless, yes, but, you know, you're not gonna do it on your XP box anytime soon either. Probably a few years.
So fortunately, not all my material got stolen for the keynote. It's kind of hard. Can I go back to slides, please? All right, so how do I enable that? I want to do that; that's really cool; I want to make fun demos for my friends. It's very similar to what we just did with ordering surfaces. In this case, you just use NSOpenGLCPSurfaceOpacity, and you can basically say, my surface should now use the alpha channel--that's now important--and the window server will do that for you. And I alluded to this a minute ago, but this is one of those cases where I ran into the sort of double rendering. When I first did the train demo, it was kind of slow, and I was like, well, what's going on? It turned out that I had to mark my view as transparent--I had to return NO for isOpaque; otherwise, the metal background wouldn't draw. But then what was happening is every single time I redrew my view, it would redraw the metal background, and performance wasn't that great. You know, it's a pretty expensive thing to do. So this is where I used that trick of keeping track of whether or not I needed to update just the GL, and basically changing on the fly whether or not I was telling the view system I'm opaque. So, just an example of what happens again: if anybody else in the system comes along and calls setNeedsDisplay:YES on me, like on a live resize, I basically let that go through as is.
And my flag still says, no, I'm not opaque, so everything draws again. But if I've just modified the GL, for example, and I set my flag that says, hey, it's just the GL, to true--or to YES, in Cocoa parlance--I return that YES for all of the isOpaque calls I get until I've done the drawRect:.
So then what happens is the view system, as it goes down and draws, basically stops with my view. It doesn't draw the texture underneath it, and I only draw the GL part of it. So that was one way to get full performance. The other thing I ran into, which is just a minor little thing: notice there was no resize thumb down in the lower right-hand corner of the window. That's typically drawn above surfaces, and just due to some weirdnesses in the view hierarchy and where that thing lives and where it's drawn, that was also causing a performance issue. So in some cases you want to turn that off if you're using OpenGL surfaces.
So all this new functionality is pretty cool, but you guys are like, well, what do I do on a Rage 128? Well, as it turns out, all of the overlay and underlay surface stuff and transparent surfaces are supported in the software case. You don't have to have Quartz Extreme enabled on your machine to do, like, the little LEGO demo like I did. If you're doing fairly small OpenGL surfaces, performance will probably be okay. I can actually run the Boeing demo small, for example--you know, with a small surface, with the hardware compositor turned off--and it does okay.
When it gets really big, we have to pull a lot of data across the bus, and it slows down. And again, the main bottleneck in this is the CPU read from video memory. That's pretty tough to do. It's a 66-megahertz, 32-bit bus with hellish read times. You read a byte and take a nap, and another one comes back.
So if you're doing something that you know is going to be really expensive, and you want to maybe not do it if Quartz Extreme isn't there, you can call CGDisplayUsesOpenGLAcceleration on a per-display basis to figure out if Quartz Extreme is around. So if you've got, for example, a GeForce3 in your system and you've also got a PCI Radeon, it's very likely that the PCI card won't have Quartz Extreme enabled on it. And you can test for that, so that if you move the window over to that screen, you can stop doing something so expensive that your users would think your app is slow or a bummer or something like that.
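A minimal sketch of that per-display check--`displayForMyWindow` here is a hypothetical helper standing in for however you determine which display your window is on, not an API from the session:

```c
#include <ApplicationServices/ApplicationServices.h>

/* Sketch: back off expensive effects when the display a window sits on
   is not Quartz Extreme accelerated. displayForMyWindow() is hypothetical. */
extern CGDirectDisplayID displayForMyWindow(void);

static int shouldUseExpensiveEffects(void) {
    return CGDisplayUsesOpenGLAcceleration(displayForMyWindow()) ? 1 : 0;
}
```

You'd re-run a check like this when the window moves between screens, since each display answers independently.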
So anyway, just a real quick recap on all of this. The main thing we went for is, you know, reduced CPU usage. All of the effects in the world are cool, but if your CPU is dragged down into the dirt, it's not as much fun. The other point I hopefully got across with the million-terminal-windows thing is that the compositing is really, really, really fast. You know, with 10 gigabytes per second of memory bandwidth--which is probably an order of magnitude more than the CPU has access to--we can blow pixels all over the screen all day long, and it's pretty hard, as we saw, to drag the GPU down doing it. So if Quartz Extreme is around, take advantage of this stuff. It's almost free.
We've now got completely seamless integration between 3D, 2D, DVD, and everything else in the system. You know, you don't have to worry now, if I've got my OpenGL CAD app or something and I want to pop up some kind of window on top of it for menus or whatever--like, for example, Maya's Hotbox--there isn't going to be a performance issue with bringing that up and down really quickly, because the CPU doesn't have to go across the bus and composite a fairly big window anymore. So take advantage of this stuff. And related to that, you should really take home that with Quartz Extreme, transparent windows and surfaces are better than overlays. You don't have just one; you have n. How many do you want? Do you want one overlay for selection, another one for statistics, ten more for whatever? You can really do that.
Oh, shoot, there's a demo I forgot to do. I don't actually have code for it; I just thought of it. So I'll just explain it to you so you guys know you can do this. Sorry. This is what happens when all of your stuff gets taken. One of the other cool little hacks I did at one time was with the underlay surface stuff--well, actually, here.
We'll do this on the fly. This is completely ad hoc. Can we go back to the demo machine real quick? So this was not planned, I promise. Where is the source? So first we're going to go into Interface Builder here and pull up my really ultra-complicated UI. One of the cool things I can do is just take some Cocoa controls and kind of drop them in on top of my GL content now.
I don't know if I'll be able to make this one do exactly what I want. Anybody know how to make that not draw the background? Let's see, actually, if I just do this. Oops. Save. Cross my fingers. So now I've got Cocoa controls on top of OpenGL. In fact, I can sit down here and type in a text edit control on top of OpenGL. And again, you know, you can see the text is all anti-aliased on top of the OpenGL content.
But I mean, this can be useful, like, where you want to pop up a control somewhere dynamically over your GL content. You don't want to have to open another window for it. It's just something I thought I would show--I can paint over the control and... be really bad and... you know, all that. So anyway. Alrighty, so that's Ken's quick ad hoc demo.
So I'd like to welcome Travis back up to go over the roadmap. Thank you, Ken. Hopefully the mic's working. Yes, there it is. Yeah, actually, I just want to let you know, as you saw, we have a variety of demos. And even on stage, Ken was able to think up things that he can do with this new technology, Quartz Extreme. I think it's very important that this is really going to be a canvas for your imagination as developers, because a lot of the things that you were unable to do with various types of media playing back on the computer are now completely possible. So it gives you the opportunity to think outside the box of your applications and do incredible new things. And hopefully those incredible new things will really show off the power of your application and also Mac OS X. What I want to do here real quickly is point out a couple of sessions. We have a lot more graphics and imaging content for you here at WWDC. And just in case you have been in other tracks, I'll quickly step through most of the sessions that we have planned today, in case you want to view them later on the DVD or maybe ADC TV. Obviously, we had the Graphics and Imaging overview earlier today, and this is where we went over all the updates that we have in graphics and imaging for you at this year's WWDC. A lot of announcements, not just Quartz Extreme. We also talked in depth about Quartz 2D and our PDF support; we had new announcements there as well. And then, obviously, this is today's session, 503: Exploring the Quartz Compositor. A key thing is we've had a lot of innovations in OpenGL, such as programmability, and there's a session dealing with OpenGL programmability, which helps you leverage the latest in what some of the hardware, such as the GeForce4 Ti that we've been demonstrating on today, is capable of.
We also have an interesting pair of OpenGL sessions, which are Integrated Graphics 1 and Integrated Graphics 2. A lot of what we've been doing with Quartz Extreme is integrating the visual pipeline on Mac OS X so that these media types--2D, 3D, and multimedia/video--are no longer independent from one another. They can be seamlessly used and integrated.
A lot of the techniques that we've used to create Quartz Extreme are going to be available for you to learn and use in your own applications. And this is where you'll learn the tricks for those new classes of applications that you can develop that I just spoke about. So Integrated Graphics 1 and Integrated Graphics 2 are very important--particularly Integrated Graphics 2; I think a lot more content relating to how to do advanced compositing will be communicated there. Obviously, part of what we do in graphics and imaging is printing, and we have a lot of information on printing. We have a session on Darwin printing, which is going to talk about the CUPS announcement--the Common UNIX Printing System that we talked about in the Graphics and Imaging overview. Also, we have a bit on ColorSync, and we're actually going to show how ColorSync uses a lot of the power of OpenGL to do interesting things with dynamic color correction--doing things like color-correcting media that you wouldn't think would be able to be color-corrected.
We also have a general printing session on Mac OS X. We also have an OpenGL session dealing with advanced 3D. And then an important one, for anyone who's doing anything with OpenGL, is to go to the performance and optimization session. There are lots of new optimizations in Mac OS X's OpenGL stack that came about because we wanted to develop something as cutting-edge as the compositor. So that session is going to teach you a lot of the fast paths through the system to get the ultimate performance in your applications.
We also have some announcements for the Image Capture framework, which is the technology that allows you to easily use your digital camera--plug it in, and have your digital camera just work. An interesting point: we've announced that Image Capture also supports scanners, so there's going to be lots of information on scanner support in Mac OS X as well. Then a big one that's important if you're doing principally 2D stuff is Graphics and Imaging Performance Tuning. You know, big area. We get a lot of developer questions on how do I make my application go faster on Mac OS X when drawing graphics. Now, what I'd like to do--well, before I do that, let me just let you guys know how to get in contact with me. Again, I'm Travis Brown, the graphics and imaging evangelist. My job is to work with you to help you adopt technologies like Quartz Extreme, and also to listen to what you need, what you want out of Apple--the types of technologies and enhancements to the technology portfolio that we do have that you'd like to see us potentially provide. So I'm the conduit for that. So if you have any questions, send me an email--[email protected].