Graphics and Imaging • 1:02:57
OpenGL in Mac OS X Leopard takes advantage of the most recent innovations in graphics hardware. Come see how advances in OpenGL will unlock the incredible rendering power of the GPU. Learn how to use powerful extensions to OpenGL to fully modernize your OpenGL code and ensure that it is ready for evolutions in the OpenGL specification. A must-see session for all developers who use OpenGL in their application.
Speaker: Allan Schaffer
Unlisted on Apple Developer site
Transcript
This transcript has potential transcription errors. We are working on an improved version.
My name is Allan Schaffer, and I'm Apple's graphics evangelist, and this session is all about taking an OpenGL application that you've written or that you have in progress and modernizing it for the things that we've done with our implementation of OpenGL in Mac OS X Leopard. And as I was saying in an earlier session, there are really three categories that these modernizations fit into as we present them to you.
The first category is that we've really added a lot of new features to our implementation of OpenGL for Leopard. Things like the multithreaded engine, 64-bit support, other things I'm going to get into later in the presentation. So that's kind of the first category. We want you to adopt those features really to keep up with what's going on on the platform.
The second category is that with OpenGL, we're tracking an external specification. You know, OpenGL is defined by the OpenGL ARB working group, now part of Khronos. And so as they do their work and refine the specification and add extensions and so on, we bring those into our implementation. And so some of this presentation is about that, about things that the ARB is working on that we want you to adopt to bring your application into modern times.
The third category is that there are really a number of best practices, either specific to the Mac OS X platform or just general best practices with GPUs now, that you should be using when you're developing with OpenGL. A lot of this centers around the different buffer objects, and so we'll be talking about that quite a bit.
So this presentation is actually the first of four OpenGL talks that we're giving here at WWDC this year, along with an OpenGL lab. The session later this afternoon, Switching to Mac OS X OpenGL, is intended for those of you who are coming to Mac OS X and OpenGL from another platform.
Either from Direct3D to OpenGL, or maybe you're using OpenGL on another operating system and just want to know how to get started with it on Mac OS X. Then tomorrow morning we have the GLSL and Tuning session; that's especially for people who are deeper into OpenGL already, you have working applications, and you want to know how to take them to the next level. And then we'll all be in the lab all afternoon Tuesday until closing.
So for this session what we've got, how I've got this set up is that I'm going to spend a few minutes just talking about what is OpenGL and its role in Mac OS X. This is just some introductory material. And then after that we're going to talk about 10 ways that you can modernize your OpenGL-based application.
So let me just kind of go back into the past, you know when, when OpenGL was first ratified, 1992 with OpenGL 1.0. You know if you think back to the applications of the time, you know they were, they were really defined by the graphics cards of that era. You know, it was very expensive workstation class and higher-end graphics pipelines. And so those were primarily being used by people doing high-performance scientific visualization, or also on the other side being used by people in the entertainment industry.
Now this is a picture on the left here of a modern application actually. I just mean it to sort of illustrate the market that was enabled by OpenGL back at that time. But if you look at these models here, they have a certain look, right? They are Gouraud shaded, they have one point light source.
In this particular case, there are no textures being applied. Pretty simple models, probably just triangle strips, and not using any further capabilities of OpenGL. But really that's because that was the mainline use of OpenGL on those graphics cards back when it first started.
And then if you jump forward a few years, and think of the progression of graphics technology and hardware, what happened is it started to move down market into the consumer space. And what the consumers were doing, you know one thing they really liked doing is to play games.
And so once again, a modern image from a modern app but just to illustrate, you know games kind of started to come online and it was being driven by changes in the hardware. So once the hardware started to support things like multitexture and you know more complex blending operations and other things like that, you started to see these kinds of games, you know, come into the marketplace.
Of course, this is a shot from Doom 3 which is much more recent. But, just you know, games started to happen. Then likewise in the, in sort of the higher-end space, you started to see a move from geotypical terrain and outdoor scenes to geospecific data, meaning real-world terrain, real-world textures, being paged off the disk and moved into the, into the graphics hardware.
And once again this was driven by hardware changes, what was going on was that the interconnect between the host and the graphics hardware was getting faster. And so, it was becoming practical now to stream textures down to the graphics cards. And so you were able to do applications like this. Of course, this is a shot out of Google Earth, a modern application that, that's the Apple campus right there. And so, but you know, just to show you as the graphics hardware advanced, these new markets started to come online.
And likewise, the last example, texture memory kept increasing. You know sort of over the years as texture memory has increased, volume visualization has become more and more practicable using OpenGL. Once again we have a case here, this is the OsiriX app and you know even with a chat window in the front.
But behind the chat you know you can see a heart, you can see the skeleton and veins and things like that and nerve endings. So this sort of volumetric dataset started to become more practical for OpenGL applications to go and implement as the graphics hardware advanced.
So that's sort of the past of three-dimensional graphics. But if you think about OpenGL in the present, I actually want to start somewhere else. OpenGL has become pervasive now beyond 3D graphics, and we're using it in Mac OS X in a lot of places for 2D. And so you see some examples here.
The window effects like the genie effect as you minimize a window. Video effects coming in through Core Image, excuse me, through Core Video and fed into Core Image. Image effects as well. So these are things that are now being driven once again by hardware.
And so like for the image effects, just as an example, Core Image is implementing those using fragment programs. And so that's programming the GPU to do a complex image processing operation and moving that work off the CPU and onto the graphics hardware, which is a processor that's really geared toward that kind of work. And we're able to really do some things outside of the sort of expected 3D graphics and image processing space.
Then likewise in Mac OS X, there's a lot of our frameworks that are using OpenGL under the hood. And so I mentioned Core Image, so that's how they're doing their image processing effect. Core Animation, same story. So when, when a layer is being animated in Core Animation, that is actually, you could think of that like a texture being drawn on a quad in OpenGL.
Core Video, another case where you have, you know, Core Video is used to sort of tear frames out of a QuickTime movie and feed it through a pipeline. Well, the end of that pipeline can be resulting in a, you know, a Core Video pixel buffer which is stored down on the GPU.
Quartz Composer likewise is really an OpenGL-based application; all of Quartz Composer's rendering is done using OpenGL, and I'm going to show you that in a little bit. And then maybe the last one is kind of interesting now. Added in Leopard is that Quartz itself can be accelerated through OpenGL.
This is an opt-in thing that your application can do but you see some good speed improvements because you're moving essentially 2D graphics work onto the GPU, and just you know besides moving it, you're offloading that work from the CPU, and so the CPU doesn't have to go and try and fill every pixel anymore. That's the kind of work that the GPU is really tuned for.
So that's in the 2D space. Now let's think about the 3D space. The big change that has happened here is programmable shading. And you, I think many of you saw this, this example during Steve's keynote on Monday. This is a shot out of id's Tech 5 demo. Excuse me.
And I guess what this really represents is sort of that, that extreme leading edge of where 3D graphics is and where it can be right now. Using things like programmable shading, using really all of the power of the graphics card, you know for as much as the 3D pipeline can do for you.
And so I sort of present this as a question for all of you, and I mean it very gently. I want to ask: does your OpenGL application look more like this? Or more like the dinosaur from before? Where on that spectrum are you? Dinosaur over here, John Carmack over here, and somewhere in between is where most everybody else is.
And if your application is really way more on the dinosaur side, then you're the motivating purpose for this session. And that's why we've put together a list of ten ways for you to modernize your application so that you can really make use of what's modern in graphics hardware. So okay, off we go. Number one.
This is really the most fundamental of all of the top ten. You've heard us say this before if you've been to WWDC in the past, and so I apologize for one especially repetitive item. But it's so important that we have to just keep emphasizing it. And it's vertex buffer objects.
And what this is, well, actually I'll say it this way. Let's look at the cause of the problem. So one of my colleagues has a copy of the Red Book, first edition, first printing. I think it might be signed by Kurt Akeley. And so I looked in it and on page six, the very first example is this.
And if you look at this example, well, this is kind of the cause of our problem nowadays. It's teaching people that the easy way to get something up and running in OpenGL is to use immediate mode, using calls like glBegin, sending a few vertices, glEnd, sending a color and so on.
And you know this was true back in the days when the graphics hardware was, you know, kind of the most expensive part of your machine, the CPUs were pretty powerful, and you had a good interconnect. But nowadays, the overhead of doing calls like this is just so inefficient.
There are much, much more modern ways. And if you've been following OpenGL over the years, you know that nowadays there are four different ways of sending vertices down to the graphics hardware. Why are there four? Well, because over the years this was what was fastest in 1992, and then there was something faster in '96 and something faster in 2000 and something faster now. And so you've got to use these more modern methods. If your code looks like this, then you need to modernize.
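As a rough illustration (this is not the Red Book listing or a slide from the session, just a minimal sketch of the pattern being described), immediate-mode drawing looks something like this, with a separate GL call per vertex:

    /* the anti-pattern: one API call per vertex starves the GPU */
    #include <OpenGL/gl.h>

    static void drawTriangleImmediate(void)
    {
        glBegin(GL_TRIANGLES);
        glColor3f(1.0f, 0.0f, 0.0f);
        glVertex3f(-1.0f, -1.0f, 0.0f);
        glVertex3f( 1.0f, -1.0f, 0.0f);
        glVertex3f( 0.0f,  1.0f, 0.0f);
        glEnd();
    }

Every one of those calls is another trip into the GL framework, which is exactly the overhead being described here.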
And so the reason why this is such an issue is that it really has to do with the transfer bandwidth between the host and the graphics hardware. And if we kind of open up these diagrams, stylized here on the left is the host, with your application talking to OpenGL, and the main memory is there. You have all of your triangle data and so on sitting in memory, and you're using, or planning to use, immediate mode or display lists to send it to the hardware.
And then just graphically speaking, on the GPU side you have a vertex unit, a fragment unit, and you've got video memory with your textures and frame buffers. Just conceptually there. So the problem with immediate mode is very obvious. You're sending very, very small chunks of data down to the graphics pipe, they get bottlenecked in the transfer, and meanwhile the vertex unit is just sitting there starving for work to do. Right? It's such little bits of data coming across the pipe, and the vertex unit is able to process vertices much faster than you can send them one at a time. And so that's bad. Don't do that.
What we want you to do instead is to set up your data into a buffer object. That's the new box that's appeared under VRAM there. And what you're doing is you're caching your data on the GPU. And since it's cached there, the internal bandwidths of the GPU are immense. And so when you go to draw this, it's a very fat pipe. It just flows into the vertex processing unit and then can go through the rest of the system.
And it's just much, much faster, much more efficient. And then likewise just think, you know, realize that over on the host side, basically no work is happening. So your CPU is, is free to go do other things rather than your CPU being busy bundling up vertices, trying to you know bundle them up more optimally and send them down to the GPU.
So this is kind of the basics of VBOs, vertex buffer objects. But the usual criticism that we receive at this point is, well, Allan, my data is dynamic, or it's too big to fit in VRAM, what do I do? I can't just push it all down there, I'm paging data, whatever, I've got some special case. The answer for that is that actually you can set up dynamic buffer objects over on the CPU side in regular RAM.
And so these are still treated like vertex buffer objects in terms of being packed together optimally and sent down to the graphics hardware. And actually what's nice is that it uses a DMA transfer to send them down more optimally. So the thing to know about this is that the worst-case scenario with dynamic buffer objects, meaning you're changing every color, every normal, every vertex, every frame, generally speaking still ends up performing faster than the equivalent in immediate mode for big datasets, just because the transfer is more efficient.
But okay, so now let's look at a little bit of code. So the steps for using a vertex buffer object are here. First, you're just defining a buffer and then creating it. So in this case we create it with, we're passing NULL for the data. And this tells OpenGL that we're going to give it the data a little bit later, that's what's going to happen in step three.
So now essentially that step two is like a big malloc. And now we're asking for the result of the malloc back. The MapBuffer maps the data back into our address space and then we can insert code to go and modify VBO data however we like. And then when we unmap that sort of sends it back into the storage in OpenGL.
So that's all, step one through three is the set up and then step four here is what you do during draw time in your draw loop. You're just binding the buffer, or you're setting an offset to tell OpenGL kind of where in that buffer you want to start drawing. Telling it what's in there and then, boom, you go and draw those arrays.
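As a hedged sketch of those four steps (this isn't the slide code; fillInVertexData and numVerts are hypothetical placeholders for your own data), the whole sequence looks roughly like this:

    #include <OpenGL/gl.h>

    GLuint vbo;
    GLsizeiptr bytes = numVerts * 3 * sizeof(GLfloat);   /* XYZ per vertex */

    /* steps 1 and 2: define the buffer and create its storage, passing NULL
       so OpenGL just allocates space that we'll fill in later */
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bytes, NULL, GL_STATIC_DRAW);

    /* step 3: map the storage into our address space, fill it in, unmap */
    GLfloat *ptr = (GLfloat *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    fillInVertexData(ptr, numVerts);                     /* hypothetical helper */
    glUnmapBuffer(GL_ARRAY_BUFFER);

    /* step 4: at draw time, bind, point GL at offset 0 inside the buffer, draw */
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);
    glDrawArrays(GL_TRIANGLES, 0, numVerts);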
Okay? I want to mention a brief aside about the code in these slides. We're not ready to hand out slides or anything, but if we're going to go too fast for you to write it down, my email address will be at the end of the presentation. You can email me and I will reply back with the text of the code in the slides, okay?
( Period of silence )
This is a feature that we added first in, I believe, 10.4.8 on Intel, and now in Leopard it's going to be across all the platforms that Leopard supports. So the idea of the multithreaded engine is that usually your application is running and it's talking to OpenGL, OpenGL is taking some amount of your CPU time also, and then OpenGL is talking to the graphics hardware. Now the systems that we're shipping today are multicore machines. And so, in many cases, there's extra CPU horsepower that you can offload some work to.
And so that's what the multithreaded engine is all about, just giving your application more time in the main thread, and then spawning off another thread that's going to do the work of bundling up OpenGL commands and sending them down to the graphics hardware. Now you notice there's still a little bit of blue on CPU 1 there. That's because your application is still communicating with OpenGL and there's a little bit of overhead.
OpenGL has to take your calls and send them over to the other thread and so on. So that's happening. But then CPU 2, or CPU core 2, is the one that's going to communicate with the graphics hardware. Now the main benefit of this is that when you issue an OpenGL call once this is turned on, all that happens is that call gets queued up for the other thread. And your application thread returns immediately and can go on to do more stuff. And so you get a performance boost because of that.
So to turn this on, it's really just one line of code. You have a pointer, or a handle, to your context, and you just call CGLEnable to enable this. You can also enable it from within OpenGL Profiler just using a checkbox; you can force it on or off. So that's a good way to test.
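A minimal sketch of that one line, assuming you already have a current CGL context (kCGLCEMPEngine is the Leopard-era name for this switch; check the return value in real code):

    #include <OpenGL/OpenGL.h>

    CGLContextObj ctx = CGLGetCurrentContext();
    CGLError err = CGLEnable(ctx, kCGLCEMPEngine);   /* opt in to the multithreaded engine */
    if (err != kCGLNoError) {
        /* not available on this configuration; GL keeps running single-threaded */
    }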
But there's a little bit of fine print here. So with the multithreaded engine, first of all, there's a certain profile of applications that really are going to benefit from using this. If you are CPU bound in your application, then that's great. You are farming the work that OpenGL is doing off into another thread and that gives you more CPU time. So that's great. But a case where it won't help you is if you have a lot of what we call client/server sync points in OpenGL. Number three on my list is all about that, so we'll come back to it.
Obviously it only helps you if you have multiple CPU cores. If you enable this on a machine with just one CPU, and you know, and one core it just has no effect. Also this is called the multithreaded engine, but it doesn't change the threading behavior in relation to OpenGL and contexts itself. So you still, just as before, you still want to have one rendering thread per context.
And that's basically no change. This doesn't add anything to that. And then I mentioned there's some tradeoffs here. So there's you know there's additional buffering, there's queuing that's going on. And we have to copy the data over to the other thread, and so there's an overhead associated there. But the main thing with the multithreaded engine is that you have to get rid of the sync points. And it's just not going to do you any good if you're using immediate mode. So you have to move over to at least to use VBOs.
So that's number two, use the multithreaded engine. Number three, okay, these sync points that I'm talking about. So the basic idea here is that you want what the CPU is doing and what OpenGL and the GPU are doing to be as decoupled as you possibly can. OpenGL works in a client and server model, and there are some calls that when you make them in your app, like glGet or glGetError and so on, your app waits for the OpenGL framework to reply back. And there might be things that the OpenGL framework has queued up that would have to complete before it can give you that answer. So in many cases, you're waiting for the pipe to drain on the software stack, and you're just blocked there waiting for an answer back.
And then that's sort of case number one. Case number two, even worse, is that there's some calls that have to wait for an answer back from the GPU. Which means that you have to wait for the whole graphics pipeline to drain before you can get your result back. So like glReadPixels, for example, if you just call that, that's going to, your process is going to block until the result of the glReadPixels comes back to you.
And then it's even worse because it's blocked and, and you know, once you start up again, now your pipeline is dry and you have to start filling the pipe. You know, and the pipe is going to be frontloaded at first, it'll take a while for things to get down.
Now getting this perfect is kind of a tough balancing act. But there's a few steps that you can take to really just make this better in your app. And number one is to just methodically go through and look for places where you call glGet. And those other synchronizing calls.
You know, you can set breakpoints in OpenGL Profiler, you can use grep, whatever is your favorite method. Just go through and every time you find one, think: do I really need this? Like, do I really need to ask GL whether lighting is enabled? Aren't I the only one who set it? I could just have a Boolean that keeps track of that. And by doing that, bing, that removes one sync point.
And once you finally make your way down to zero, you're golden. Another one that's really important, especially because it's seen in a lot of documentation, is glGetError. So glGetError is great, you can use it in your code, sprinkle it around while you're debugging, do all the things that I've seen people do, but when it comes time for production use of your code, you need to get rid of those.
Because every one of those glGetErrors is a sync point. And what I see a lot is people trying to debug something will just start inserting glGetError kind of like a printf, you know, like where is this problem coming from? And then they leave it in and their performance suffers.
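Here's a minimal sketch of that Boolean idea, assuming your app is the only thing toggling lighting (the names here are made up for illustration):

    #include <OpenGL/gl.h>
    #include <stdbool.h>

    static bool gLightingOn = false;          /* shadow copy of GL_LIGHTING */

    static void setLighting(bool on)
    {
        if (on == gLightingOn)
            return;                           /* also avoids redundant state setting */
        if (on)  glEnable(GL_LIGHTING);
        else     glDisable(GL_LIGHTING);
        gLightingOn = on;
    }

    static bool lightingIsOn(void)
    {
        return gLightingOn;                   /* no glIsEnabled, no sync point */
    }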
Number two, if you have intentional sync points, that's bad obviously. glFinish on Mac OS X is usually not a necessary call, as compared to Linux or other platforms in the past, where it's just been sort of a standard synchronizing operation. Instead it's best to just let CGLFlushDrawable do the flush for you. Now there are special cases.
So if you're way more on the shader end of that spectrum from the dinosaur that I was talking about, okay, you probably know of some cases then where you still need to call glFlush. But if you're a developer who's just adding a glFlush because you think, oh, am I supposed to do this here? No. In most cases, you can go and take that out of your code. A few other things, just a grab bag.
You know you can avoid a lot of state setting. And that can help you avoid sync points. You know OpenGL is just a big state machine where you're kind of tweaking all these knobs everywhere and then saying, okay, you know fire me another frame. And so the more that you can, as much as you can minimize that, that's positive here. And usually you do that by sorting by state.
And then something more advanced. If you are using fences, just make sure that you set the fence early and then go off and do enough work so that the fence, or whatever you're fencing, will have completed by the time you go back and test it.
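For reference, a minimal sketch of that fence pattern using the GL_APPLE_fence entry points (issueDrawingCommands and doOtherCPUWork are hypothetical stand-ins for your own work):

    #include <OpenGL/gl.h>
    #include <OpenGL/glext.h>

    GLuint fence;
    glGenFencesAPPLE(1, &fence);

    issueDrawingCommands();            /* the GL work the fence will cover */
    glSetFenceAPPLE(fence);            /* set the fence early, right after that work */

    doOtherCPUWork();                  /* plenty of CPU-side work in between */

    if (!glTestFenceAPPLE(fence))      /* by now this is usually already signaled */
        glFinishFenceAPPLE(fence);     /* otherwise block only for what's left */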
But okay, so that was remove client/server sync points. The take away there, go through and get rid of glGets. Number four, this is how you can do that. Master the developer tools. So in Mac OS X Leopard there's a number of developer tools that just are the standard ones that we use when we're trying to do performance profiling on an OpenGL application.
You know, we usually actually start with Shark and then go into the OpenGL Profiler and run Driver Monitor in addition. Also there's a tool that's coming called the Shader Builder. And I think there's sessions here at the show about Xray as well. So since we start with Shark, I just want to make a quick mention there, I'm not going to go into Shark in too much detail.
So you know the things that you need to look for in Shark are actually, you know, am I spending a lot of time, you know, in disk access. Am I spending a lot of time calling some, you know, JPEG load library? You know like where am I spending the time in my app? But, if it is in OpenGL, if that's what's showing up, then there's a few tell-tale symbols that you need to look out for. So glgProcessPixels.
If you're spending a lot of time in it, usually means that you're spending, you know, you're doing texture format conversions or image format conversions. If you see a GLRendererFloat, it means you did something to fall off the hardware path and now you're in the software renderer and that's almost, you know, if it's unintended then it's bad. And then finally, if you see gleFinishCommandBuffer, it means you fell into one of these client/server sync points, you're spending too much time there. But okay, so for the next, I'd like to show a quick demo of the OpenGL Profiler in Leopard.
So I've got it just running here already, with a Quartz Composer composition. As I mentioned, Quartz Composer renders through OpenGL, so it makes a very convenient test case for the Profiler. This is the Profiler main window up here. I'm using, oops, let me zoom in so you can see that up here.
The place where most people start in the Profiler is here, to either launch an application or to attach to a running application. In this case, I just launched it right before we started here. And now there are a number of different views that you can go to: a trace, statistics, and so on.
I'm going to start with the statistics. So this is just showing statistics of what we're calling, they're just totaling up every frame. And it's always interesting to kind of look at this to see where the GL time is being spent. But, you know, it's, there's really some more specific things that you can do. Now the reason why I want to show you this is if you're developing with OpenGL on Mac OS X, I expect you to run the Profiler.
I mean, I expect the Profiler to be in your Dock, to tell you the truth. And so you know, it's something that's just very, very useful to have. And I want to show you just a usual pattern of using it here. So I have this statistics window up.
What I want to do actually is slow things down. I'm going to set a breakpoint over here. This is the breakpoints window, and it's just showing the different OpenGL calls. And I can set a breakpoint here at CGLFlushDrawable, there. And my rendering just stopped. It shows me the stack trace here if I want to see that. But more importantly, the statistics window stopped going. I can clear that and now advance a frame, and now I can see what OpenGL calls are being made in one frame in this application.
And you know, there's some really interesting stuff in here. In one frame, we're calling glGetError 121 times. Okay? So that's a problem. We're calling glGetIntegerv 544 times. glIsEnabled, 359 times. You know, in one frame. So that is what, close to 1,000 sync points right there. It would be good to go and at least justify why they're there.
Now I happen to know, you know, since Quartz Composer now is extensible, there are certain cases where you know an extension could be changing the GL state and so Quartz Composer has to go and double check it and maybe restore it. But that's why there's some of these. But there's others that are maybe a little more interesting. And so I'll show you, you know, another thing that you can take a look at. That's the trace window.
So clean up a little bit. So here's the trace. And once again, if I advance a frame, this shows me all of the OpenGL calls that I just made in that one frame. Right, and I can scroll through the list. I can look at see if I, there's a button I can check so that if I click one of these, it'll show me the stack trace at that particular moment. But you know, let's, I saw something right down at the bottom here.
Now let me see, where is that? Here's a call to glGetError. And check it out, there's another call to glGetError. And there's another call to glGetError. And there's another one. You know, there are some things where if you go and actually look at the trace of OpenGL calls that your application makes, you might end up embarrassed, right? Because there are things in here like, okay, you didn't realize that your cleanup function, your destructor for every class in your scene graph hierarchy, is calling glGetError.
You know, these kinds of things are what you should be looking for. And you know, this is the truth. Right? And so it'll show you. Do it in, you know, look at it in private, but...
( Laughter )
( Period of silence )
So number four, master the developer tools, takes us into number five. Okay, 64-bit. So the idea for 64-bit in OpenGL, primarily or especially in scientific graphics, is to get a big address space. You can load big datasets. You can move the processing away from disk I/O and some of the other bottlenecks that are frequently found there and essentially change the balance point again by moving to 64-bit. So essentially you have the ability to have a bigger dataset.
There's also sort of a hidden benefit and it's on Intel. You actually have twice as many registers as the 32-bit process. And so that means twice as many things end up in registers instead of on the heap. And it's just a nice benefit you see some performance gains there. Now you know 64-bit just, you know, the 30-second overview.
We use the LP64 model. That means that longs and pointers are 64-bit. And of course, you have this huge theoretical address space, more than the physical RAM you can put in the machine certainly. Now between that last slide and this one is where you ported your app. I'm not covering that topic. So there is some porting work that has to happen to move your app to 64-bit.
And so this is, like, once you're done, how you can make a binary. We have a number of sessions here at the conference on the issues in moving to 64-bit. You know, my best advice to you is look at compiler warnings, and that'll tell you a lot of what you're going to find.
But it's just a checkbox here once it's ported over. Or if you're a command-line guy like me, then you can use these command-line arguments here, -arch for PowerPC 64 and for Intel 64. Or with xcodebuild, you just add a few more items into that build setting.
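As a rough guide (these are the standard toolchain flags of that era, not taken from the slides), a universal binary with 64-bit slices can be built with something like:

    gcc -arch ppc -arch ppc64 -arch i386 -arch x86_64 myapp.c -o myapp -framework OpenGL

or, in Xcode, by adding ppc64 and x86_64 to the ARCHS build setting.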
Okay? Something to remember as far as 64-bit goes is what's actually been deprecated and what's not available in 64-bit. So there's no 64-bit support for QuickDraw. And that means that some of the routines in our AGL implementation, excuse me, are no longer available or have replacements now.
And so you need to go and check that out. The thing to know is that, okay, so this is talking about 64-bit in Leopard, and so if you have a 32-bit QuickDraw-based application, and incidentally, that's redundant, all QuickDraw-based applications are 32-bit, they will continue to work in Leopard.
Or, you know, they won't break because of this change, I should say. All right, so that was number five, build for 64-bit. For number six, I'd like to invite Geoff Stahl, the engineering manager for the 3D graphics team, up to the stage and he's going to talk about adopting GLSL.
- Thanks, Allan.
( Applause )
So I'm going to talk about adopting GLSL and go through a little bit today. And then tomorrow morning we're going to talk more in depth about GLSL in general. So first thing I want to talk about is what is GLSL? If you haven't used it before, it's a high-level C-like language which really kind of simplifies the programming of OpenGL.
It allows you to directly program what you want rather than setting maybe your color mask or your colors and trying to draw it all with vertices and fragments. It really allows you to define a program that processes your vertices, define a program that processes your fragments, and now it will allow you to define a program that processes geometry also.
So it's the rendering language in the OpenGL API. One of the keys here is that it allows future access to the power of the GPU. Right now, if you take OpenGL 1.1, 1.2, even through 1.5, there are features being built into GPUs that you just can't get access to. Geometry shaders, brand new fourth-generation shading extensions: not available through any non-programmable API. So if you're not on a programmable API, you can't get access to those features.
An example at the bottom of the slide is what it kind of looks like. You've got some vertex attributes coming in; uniforms are like constants to your program; and the varyings, in this case in a vertex shader, are something you're outputting to the fragment stage of the pipeline. There's a main like you would expect in C, and basically, in this case we're going to do some work with the vertices, set a texture coordinate, and output your gl_Position, which will then go on to be interpolated for your fragment program.
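To make that concrete, here's a hedged sketch along the lines of that slide example: a tiny vertex shader, embedded as a C string and built with the OpenGL 2.0 shader calls (the warp uniform and the shader body are made up for illustration, not the shader from the slide):

    #include <OpenGL/gl.h>

    static const char *vsSource =
        "uniform float warp;                                  \n"
        "varying vec2 texCoord;                               \n"
        "void main(void)                                      \n"
        "{                                                    \n"
        "    vec4 v = gl_Vertex;                              \n"
        "    v.y += warp * sin(v.x);   /* perturb vertices */ \n"
        "    texCoord = gl_MultiTexCoord0.st;                 \n"
        "    gl_Position = gl_ModelViewProjectionMatrix * v;  \n"
        "}                                                    \n";

    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &vsSource, NULL);
    glCompileShader(vs);

    GLuint program = glCreateProgram();
    glAttachShader(program, vs);
    glLinkProgram(program);        /* a fragment shader attaches the same way */
    glUseProgram(program);
    glUniform1f(glGetUniformLocation(program, "warp"), 0.25f);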
So why do you want to adopt GLSL? A lot of folks that Allan may have talked to in the beginning of the session have like the dinosaur kind of app. What benefit is GLSL in your dinosaur? The thing here is I think it opens up a really large set of new types of apps and new types of kind of programming designs for OpenGL. Because this really is approachable.
It's a C-like language. You can just kind of type what you want, like I showed in the example. And you don't really have to think about how to use texture combine stages and combine different textures to get the effect that you want. You actually can just blend them in a small program that you write.
It's also easy to experiment with. We give you some tools: we give you Shader Builder, which we're rewriting for Leopard; we also have hooks in the Profiler; and we have an editor sample out there which you can use. It allows you to very easily, on the fly, write shaders, experiment with how they fit together, and look at the output in a quick-turn, iterative development environment, which may not be as easy if you're setting a lot of state and trying to figure out where you went awry. It's also, as I said before, direct access to the graphics hardware. Vertex shaders, fragment shaders, geometry shaders, those all give you direct access to the newest features of graphics hardware, and we'll continue to really focus our effort there.
And it's fast. You can directly access the power of the GPU. I'm going to show a few demos, and one of the keys to programming modern GPUs is understanding kind of what Allan was talking about with where your dataflow is. Are you taking vertices from the CPU moving them across a very small bus one at a time to the GPU and starving it for data? Or are you statically putting your vertices and your geometry on the GPU and allowing the GPU to run full open and run all those effects? You want to really minimize those synchronization points.
You want to maximize the amount of things the GPU can do on its own without having to come back to your program, and a lot of the topics we're talking about today really are down that road. And GLSL allows you to put programs on the GPU that run natively on the GPU and stay there; that's where they get their data and execute. And also, really, from an industry standpoint with the ARB, and from Apple, that's our focus moving forward. Our focus is going to be GLSL and the modernized OpenGL API for using programmable hardware.
So one thing I want to show here is to talk about the fourth-generation shader support. So there have been some questions. We just brought out a new laptop with the NVIDIA GeForce 8600M GT part, and it is a fourth-generation-shader-capable part for Leopard.
We're going to have geometry shaders, GPU shader 4, transform feedback, and bindable uniform extensions, all part of the new, modern fourth-generation shading extensions. And what those things are: geometry shaders allow you to manipulate geometry downstream of the vertex process. And so you have a set of vertices, maybe, that form a triangle. Well, that triangle goes into a geometry shader and you can do things with it. GPU shader 4 allows you to use large texture arrays, integer texture arrays.
Transform feedback allows that bottom feedback loop that you may see, which allows you to go out from a geometry shader back into your inputs and do iterative work on the GPU. Again, you're not synchronizing across the bus, you're not reading it back to the CPU; it's iterative work on the GPU using the full power of the GPU.
And finally, bindable uniforms allow you to take these large constant sets, bundle them up in packages, and submit them to a large number of shaders at one time. So for GLSL, we're going to talk more in depth tomorrow morning, but right now I'm just going to do a little demo and show you some of the things we can do with these new fourth-generation shading extensions.
( Period of silence )
Okay, this is very much like dinosaurs. This is, the only thing interesting here is that this is running through a geometry shader. And when we move to the next version you'll see that what we have here is exactly what you saw on the last one, but all this is occurring on the GPU.
It's not very interesting in a small shader like this, but all we sent down was the original geometry, all the original vertices, and we then sent a single uniform to manipulate over time where the inset and outset of the new geometry are actually coming out of the geometry shader. Again, not too bad, but we can do better. Let's see, let's move up a little bit in the idea here.
Red dots, ten control points. These lines are all drawn with 10 control points. I didn't send down line segments, I didn't send down any kind of additional pieces. I sent down 10 points to the GPU. And you'll see that's really the geometry that you would get if you wanted to draw just a line between them.
Geometry shaders allow you to get line adjacency. They allow you to figure out and kind of trace along your path so to speak. So you can add in iterations and you can add in smoothing. So you have a-- you can add in a very smooth curve, and you can, of course, manipulate it in real time, again all the program here is doing is sending down 10 control points. Control points are bouncing off the edges and the rest of the geometry is rendered all in the GPU.
This is simple for a demo, but obviously you can expand that and you can see if you have very powerful GPUs you can leave the geometry there.
( Period of silence )
This is a fairly simple demo as far as-- this may be what you, what you'd like to have to see if you're having a standard teapot drawing in your apps that need teapots.
But really here, we have a fairly high tessellation and it looks pretty good. Both of them, both versions on the left without a geometry shader and on the right using quadratic normal interpolation with the geometry shader where it takes all of the, all of the information in from each quad, they're equivalent.
But if I lower my tessellation on this, you can see it's not very many polygons now. And you can see the highlight stays as a nice highlight, so you can get better highlights, better geometry information using that geometry shader, because it can look at adjacent points and pull out that geometry information, allowing you to do things you couldn't otherwise do without high tessellation, where you would have to send a lot of data down. And you can see it from the wire frame; I don't have wire frame for that one.
So moving up again, this is kind of a standard bump map demo, and like okay, why are we showing bump map demos? That's not very interesting. That's like so 1982 or 1992 or 2002. What's interesting on this is that the geometry is static. It's all sent down and the vertex program interpolates with the time value, so one uniform. Keyframes between them, so it interpolates between them. The light vector is sent when the light vector changes. The rest is done on the GPU.
So the GPU takes the light vector in and does all of the bump mapping actually on the hardware. There are no additional light vectors sent down. You don't have to go per polygon or per vertex and send additional information. So all of the bump mapping is done completely on the GPU, and the CPU is completely idle other than sending one time value and one light vector.
( Period of silence )
And of course, you'd always want to add shadows to it, so in this case, this is using a geometry shader to generate lines. We don't draw any of the original data. All we're going to do is look at silhouette edges. We're going to use GPU shader, calculate the silhouette edges and draw lines instead of the original data. So GPU shader processing geometry information on the GPU.
( Period of silence )
Taking that to another step, in this case we're going to draw shadow volumes. This is kind of a visualization of actually what you would see if you extended shadow volumes. In this case, I will show the wire frame; in the wire frame you can see the lines that were generated by the GPU shader. Again, additional lines generated.
And if we fill them in and draw them as filled surfaces, you can see the shadow volumes from the light source. Finally, let's put this together and get a lit, bump-mapped Quake 2 model, but the key here is, again, one light source, one time value. The rest of it's all on the GPU.
So this is what the geometry shader adds. It gives you the ability to actually understand the geometry sent down, generate geometry, and do these things on the GPU. Relieving the CPU of that work, relieving the bus bottleneck, allows you to do some great things using this new technology. And this is going to be available in Leopard.
( Period of silence )
I'll hand it back over to Allan.
( Period of silence )
Thank you.
( Applause )
Thanks.
- Thanks, sir.
- Matching shirts. Thank you, Geoff. So we'll continue on with our top ten list. That was adopting GLSL, certainly something that we want you all to look into some more. And just a reminder. So Geoff's session on GLSL is 9:00am tomorrow. Come early, get up early, have that morning coffee.
Number seven, use APPLE_flush_buffer_range. It's an extension that we've added. And this has to do with vertex buffer objects and how to make updates to them. So going back to our chart, our diagram again. So the issue with vertex buffer objects is that when you do what I said before and you just call map, what that does is to just calls, sorry, it copies the data of those objects into your-- or it maps it into your local address space.
And then if you go and make some modification to one of those objects, when you call unmap what's going to happen is that the entire object is going to get copied back to the GPU. And that's inefficient. If you have some really big object, you're just changing one byte, it's going to copy the whole thing back. So instead what flush_buffer_range lets you do is to just, come on baby, there we go. All right. Well, it didn't quite show it.
What flush_buffer_range lets you do is if you make a change to an object, you can say, okay, just copy back the part that I changed. And so that way you're being much more efficient. If you make multiple changes, you're able to, you know, say exactly where they are and only those parts gets, get copied back. And so you end with a much more efficient transfer here for when you're modifying the data. So the steps here are really pretty easy. The first is just you essentially start by turning off the default behavior which is to flush the whole thing back.
And then proceed normally. You map the buffer, that copies the buffer into your, the buffer into your local address space. You make some changes to modify it and now you can go and say explicitly exactly what parts you changed. You tell us the offset and the number of bytes. And if you make multiple changes in different locations, you just call this with different offsets and bytes, you know call it multiple times.
And so now when you call unmap buffer, only that part is going to get copied back into the original VBO. So that's number seven, use APPLE_flush_buffer_range to modify your VBOs. If you're modifying VBOs, this is critical. This is one of those modernization steps that is going to give you a good performance boost.
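A minimal sketch of those steps with the APPLE_flush_buffer_range entry points (vbo, offset, size, and updateVertexRange are hypothetical placeholders for your own buffer and update):

    #include <OpenGL/gl.h>
    #include <OpenGL/glext.h>

    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* step 1: turn off the default flush-the-whole-buffer-on-unmap behavior */
    glBufferParameteriAPPLE(GL_ARRAY_BUFFER, GL_BUFFER_FLUSHING_UNMAP_APPLE, GL_FALSE);

    /* step 2: map as usual and modify only what you need */
    GLubyte *ptr = (GLubyte *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    updateVertexRange(ptr + offset, size);

    /* step 3: tell GL exactly which bytes changed (call again for other ranges) */
    glFlushMappedBufferRangeAPPLE(GL_ARRAY_BUFFER, offset, size);

    /* step 4: unmap; only the flushed range goes back to the VBO */
    glUnmapBuffer(GL_ARRAY_BUFFER);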
Okay, number eight. Going onwards. Another buffer object type: frame buffer objects. These basically replace code that used to be dealing with PBuffers. And a lot of the purpose of using PBuffers in the past has been for doing multipass effects, essentially a render to texture and then using that texture in another part of the frame. The benefit of frame buffer objects, though, is that you don't have to switch contexts like you do with a PBuffer.
And so, you know, that ends up giving you a nice performance boost there and avoiding a sync point in many cases. So then there's a more advanced use, that combinations of frame buffer objects along with the other buffer objects can be used to do a render to vertex array. We're not going to show that this year. Go look at last year's video if you want to see an example of that.
But so just talking about render to texture. You know, the basic steps here are, there's two steps. You know, in the first step we're going to take our normal geometry, we're going to render it normally using OpenGL. But instead of rendering into the frame buffer we're going to render into a frame buffer object. And you get that result sitting there, and it's basically now just a texture. And you can use that texture in the second pass and put it on some arbitrary geometry, of course, a teapot.
And render that normally into the normal frame buffer and you end up, you know, with that result. So that's just, that's the basic concept of a render to texture. So to use frame buffer objects here, here are the steps. The first is to bind the frame buffer and then attach a texture ID that is going to be the destination or, you know, the output of that.
You check to make sure that everything is complete. And then you can do whatever normal OpenGL operations you want. And those are going to end up instead of in the regular frame buffer, they're going to end up in that texture. Now you're done, and so you bind, you put everything back to normal, bind to the default frame buffer. And now you can go and do a second pass using that texture ID that you generated, or the texture that you generated the first time around.
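A minimal sketch of that render-to-texture flow with the EXT_framebuffer_object calls available in Leopard (colorTex, drawFirstPass, and drawTexturedTeapot are hypothetical placeholders; colorTex is assumed to be an already-created texture of the right size):

    #include <OpenGL/gl.h>
    #include <OpenGL/glext.h>

    GLuint fbo;
    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, colorTex, 0);

    if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) == GL_FRAMEBUFFER_COMPLETE_EXT) {
        drawFirstPass();                             /* pass one renders into colorTex */
    }

    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);     /* back to the normal framebuffer */
    glBindTexture(GL_TEXTURE_2D, colorTex);
    drawTexturedTeapot();                            /* pass two uses the texture normally */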
So that is number eight: replace uses of PBuffers for render to texture with frame buffer objects instead. Number nine, pixel buffer objects. Seems like it's the same thing, wasn't I just talking about PBuffers, pixels? No, this is a different feature. It's part of the OpenGL 2.1 specification now, which we're supporting in Mac OS X Leopard. The main value here is really fast copies of pixel data between the different buffer objects without having to come back to the CPU to do a read back and then send it back down again.
The main purpose that we see a lot of-- or we expect you to use this for is this actually could be a faster way for you guys to do glReadPixels. There's also a number of different things that are more advanced, render-to-vertex arrays, streaming, and so on. But asynchronous glReadPixels is something that could give you a big benefit, because as I mentioned before, ReadPixels is one of those really, really heavy-weight sync points.
And so, you know, usually you might have code that looks kind of like this, right? You have glReadPixels, you're passing in a particular format and you're getting back some result, right? So the idea here, you probably have something like this in your code, just you know with other stuff around it. Well, here's what I want you to surround this call with.
You want to augment it now with PBOs. So the first step is to just bind the PBO as the new destination for where your glReadPixels is going to end up. So you notice instead of result, now on number two we have offset. So that's just an offset into the PBO.
We make the glReadPixels call as before, but now it's asynchronous. It's decoupled. And so you don't wait for it to complete. You go off and you do other work. Go do something, have, you know, a cup of coffee out of that teapot. Just kidding, but go do some other work.
And then insert enough time essentially so that when you come back and you call map buffer, the glReadPixels will have completed in that time. And it's asynchronous, so if it's completed, you'll get the pointer back immediately. If it hasn't completed yet, this will block until it's complete. Okay? And then when you're done, unmap the buffer to clean up.
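A minimal sketch of wrapping glReadPixels with a PBO along those lines (width, height, doOtherWork, and consumePixels are hypothetical placeholders):

    #include <OpenGL/gl.h>
    #include <OpenGL/glext.h>

    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, NULL, GL_STREAM_READ);

    /* same call as before, but the last argument is now an offset into the PBO */
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, (GLvoid *)0);

    doOtherWork();                           /* give the readback time to complete */

    /* mapping blocks only if the readback hasn't finished yet */
    const GLubyte *pixels = (const GLubyte *)glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    consumePixels(pixels, width, height);
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);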
Okay, so actually I want to bring Geoff back up. We have a really interesting demo that's using these techniques for read back and so on. So Geoff.
( Applause )
So we're going to show, we're going to talk to the same, basically the same demo we did on Monday afternoon but we're going to talk about the techniques behind it.
Blizzard has done a great job of using PBO and using some techniques here to get a, really an interesting user enhancement in World of Warcraft. So let me jump right into the demo, and we'll talk through what they're doing. They've allowed us to kind of glimpse into their app a bit.
So as far as how this works: as we talked about before, in the next major release of World of Warcraft for Mac OS X, you'll be able to capture movies in real time and compress them on the fly for documenting your greatest heroics. And so what we're going to do is do that and then talk about exactly how Blizzard implemented this behind the scenes, so you can do something in your app the same way.
I think there's a big community now for capturing, whether it's a scientific visualization app, whether it's a game, whether it's movie playback; you'll want to get this data back off the GPU, and this is a great way to do it. So we're here, I'm running again on my laptop. It doesn't require the highest-end hardware.
You know, depending on the amount of hard drive space you have and how fast your hard drive is, that does determine how fast you can stream. But this is a current laptop and we have all of the visual options turned on to maximum. So everything is turned up there.
And let's look again at the Mac options. In this case, I'm going to capture 800 by 600, a little bit smaller than I did on the Mac Pro, but it's 20 frames per second, and then I'm going to use H.264 so we can use it on iPods, Apple TVs, iPhones, websites, whatever you want to do here.
And we just jump right in and talk about the, actually I'm going to show the demo of the video capture. And again, the big thing is here that I can start the capture and we have an icon that came up in the upper corner here that shows it, the movie capture is going. But you know, it's, you know, you have a good play-- everything looks nice.
It plays well and there's no problems with, with playing the game while you're doing video capture. And one thing, I'll jump in on, I'm probably going to get this wrong, but Rob can correct me later if I'm, if I do. I'm going to do the capture, I'm going to start compression now because this is a little demo.
Your, if you go into the movie control panel it's also just limited by hard drive space. Because basically, from a technique standpoint, what have they done so far? So they took the frame, and they've rendered it into basically a texture, rendered it onto the GPU. So we kept, we did that work on the GPU. Also they rerendered to a 12-bit YUV buffer on the GPU to get the data as small as possible. So we're dealing with the smallest piece of data.
They then use the technique that Allan described to map this as a PBO and use glReadPixels to pull it back. And when they pull it back, it gets back into system memory. One thing here is there's a synchronization issue: if you only have one buffer, you're trying to read it back while the next frame needs a buffer. So they set up a round robin; they have three buffers in flight.
As Rob describes when he's talking about it, it's a juggler: you always try to keep some things in flight. You're only working to draw one and receive one at the same time. And you have to experiment in your app, depending on how big your frames are, how much bandwidth you're using, that kind of thing, to determine what the right asynchronicity is there. But again, the theme of the talk is asynchronicity. So we have this asynchronous use of these buffers.
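This is not Blizzard's code, just a minimal sketch of that round-robin idea: keep a few PBO readbacks in flight so the buffer you map each frame was filled a couple of frames ago (handOffToEncoder is a hypothetical consumer; the buffers are assumed to be created and sized elsewhere):

    #include <OpenGL/gl.h>
    #include <OpenGL/glext.h>

    enum { kNumCaptureBuffers = 3 };
    static GLuint gCaptureBuffers[kNumCaptureBuffers];
    static int gFrameIndex = 0;

    void captureFrame(GLsizei width, GLsizei height)
    {
        int writeIdx = gFrameIndex % kNumCaptureBuffers;          /* buffer to read into now */
        int readIdx  = (gFrameIndex + 1) % kNumCaptureBuffers;    /* oldest buffer in flight */

        /* kick off this frame's asynchronous readback */
        glBindBuffer(GL_PIXEL_PACK_BUFFER, gCaptureBuffers[writeIdx]);
        glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, (GLvoid *)0);

        /* collect the frame issued two frames ago; it has most likely completed by now */
        if (gFrameIndex >= kNumCaptureBuffers - 1) {
            glBindBuffer(GL_PIXEL_PACK_BUFFER, gCaptureBuffers[readIdx]);
            const void *pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
            handOffToEncoder(pixels, width, height);
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
        gFrameIndex++;
    }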
This asynchronicity allows the game to continue playing while you're doing the capture and not really affect the game play, because it keeps the work on the GPU. Finally, the compression sequence uses QuickTime: it spawns a thread, asynchronous again, and reads the frames in 12-bit YUV off disk, because there's a custom fragment program to write that format.
So we compress small, small amounts of data. It then does the compression on the fly with QuickTime. They've talked about trying to do the compression in real time because you can imagine if you could do that basically you're limited by compressed space rather than uncompressed frames. And that just depends on synchronization.
If you had an eight core machine or a four core machine, you probably could get away with it. But for the majority of the users of World of Warcraft and your laptop, you know, you only have a couple cores. Maybe a couple years from now. But so, you're back at the, back at the old ranch and you want to show your movies. And so we'll go back to here and show the movie. And again it's, it was 800 by 600 of it.
Oops.
( Period of silence )
And we should have the standard, you know, standard movie scrub. And it just plays like you would expect.
( Period of silence )
So you get, you know, a movie of your travels, captures your stuff, and you can post it to the web, do whatever you'd like with it. Again, again as I mentioned before, Monday, it's a great synergy of technologies, has a lot of things that can tie together. But the key here is some components from Allan's talk.
Remove sync points, use the asynchronous PBO API, use multicore. Another big point is the multithreaded engine. The multithreaded engine really makes this work a lot better, because you have these sets of ReadPixels in the command queue for the multithreaded engine; they're not necessarily being executed yet by the engine, so they can be queued up also. So you can get that additional asynchronicity. Multithreaded engine, key component here.
So now we've hit, you know, maybe four or five of your, of your top ten in implementing a feature like this. So I think everyone can go find a place in their app they can use some of these techniques. And I'll hand it back over to Allan.
Great, thanks, Geoff.
( Applause )
So thank you, Geoff. So that was the end of number nine: move pixels with PBOs to make them asynchronous. That takes us to our last one, number ten: embrace the platform. So you know a lot of the OpenGL applications I look at are trying to do cross-platform stuff, and that means that when they run on Mac OS X, they don't look like a Mac OS X application.
And so, you know, there's some things that I want to really, you know, my title is evangelist, so here we go. I would really like to, to encourage you guys to think of places in your application especially if it's cross platform or you're bringing it to Mac OS X, where you would use some of the frameworks that we provide for you rather than kind of rolling your own or doing, you know, one of these cross-platform GUI tool kits and so on.
So you know, first of all, we want to encourage you to use Cocoa. So use an NSOpenGLView to render your OpenGL. And there's a very cool tie-in now related to Core Animation. You know, a lot of apps that I've looked at over the years are trying to put text or a HUD or something up on the screen, and they're doing it by rendering the letters themselves in, like, Battlezone letter art.
You know, it doesn't look right. And now, with Core Animation, you can render OpenGL into an OpenGL layer and you can have Cocoa controls in another layer right on top of it, composited. You don't have to do anything special. You can build those Cocoa controls in Interface Builder. It's just a very seamless experience for you to go and use.
So the other thing is, in my job I see a lot of different applications as people are bringing them to the platform. And I mean, we love to see applications coming. But once you've reached the platform, to really modernize, you should go further and give your application a Mac OS X user experience.
And what this really means is that if you're using X11 with GLX, or GLUT, or one of these sort of dinosaur windowing models, you could move to what is native on Mac OS X and give your application a much better user experience and more of a Mac OS X look.
And then also other things. If your application goes beyond 3D, think about some of the things that you could do with some of our other Cocoa-based UIs. So I mentioned Core Animation. Core Animation isn't just about having kind of an interesting demo; it can really be something that does something new in your user interface. And particularly, I want you to look at it as a way to layer controls into OpenGL, jettisoning some of that old code for doing really weird text and stuff on top of the OpenGL rendering.
If you're doing image processing in your app, then you probably already know about Core Image, because that would be kind of your bread and butter. But maybe you're, you just want to do something like add a, add a transition or add a bloom effect or add some other kind of imaging effect into your application.
That's something that you could think about implementing through Core Image in the Mac OS X version of your app. If you're bringing video into your app, maybe you're playing video on a texture. The way to do that the right way on Mac OS X is through Core Video.
And if you're bringing it in, you can then pipe that through Core Image on its way, to apply image processing effects to every frame in the video, and then it ends up in a texture in OpenGL and you can use it in all the normal ways. And if you need to do movie playback and capture, I want to encourage you to look at QTKit. Now there's a QTKit session, a Core Image session, and Core Animation sessions here at the show. So for more information on those, go to those sessions.
But really embrace the platform. You know I would love to see more like hardcore OpenGL applications going into the best Mac OS X user experience category at the ADA. That's what I want to challenge you with. So that was number ten, embrace the platform, which concludes our top ten list. For more information, something I want to point out for those here is that we're going to be issuing an OpenGL seed before Leopard ships.
And so if you're interested in getting into the OpenGL seeding program, send email to that address. Secondly, we have a public mailing list for discussions of OpenGL. I want to remind you that what we discuss here at the conference is confidential, but you'll want to get on that list just to find out what's going on. And then you can find documentation, and that's my email address there.
A few other things just before we wrap up. So these are the other OpenGL sessions that are going to be happening. And so sort of depending on where you fit in, if you're more advanced, you want to go to the GLSL and Tuning. If you're coming from another platform, you probably want to go to switching. Everybody should come to the lab.