WWDC09 • Session 310

OpenGL Techniques for Snow Leopard

Mac • 58:56

OpenGL is the foundation for high-performance, hardware-accelerated graphics on Mac OS X. Attend this session to explore the full power of OpenGL through key best practices. See how to use OpenGL from multiple threads, across multiple GPUs, and with multiple displays. Learn how to integrate high-performance OpenGL graphics with the computational capabilities of OpenCL.

Speaker: Chris Niederauer

Unlisted on Apple Developer site

Downloads from Apple

SD Video (197 MB)

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Hello. So I'm Chris Niederauer, I'm on the OpenGL team, actually we're the GPU software team now. I'm a senior engineer there, so I was going to talk to you today about OpenGL and what you need to do to support our platform and in particular Snow Leopard. So first I want to get into what OpenGL is.

I'm sure everybody who's here is already familiar with OpenGL, but it's basically just an API that allows us to interact with the GPU. So these GPUs are the graphics processors that accelerate 3D graphics, 2D graphics, that sort of stuff. So here we have an example: Doom 3, where it's a 3D cinematic experience.

And you can do... here's Google Earth, and then a lot of the user interface elements are done as well in OpenGL. So here we have iChat AV, and some medical imaging. And then in the bottom left the Exposé and the window effects, all that stuff is using OpenGL. And Core Image in the middle with the image effects; all these things make it so that your user interfaces stand out from other things.

So I'm not going to be getting into the OpenGL APIs specifically, I'm going to be getting into our platform-specific OpenGL stuff today. But for the actual APIs and actually doing the drawing and all of that, there's a few books that I recommend you guys check out; pretty common knowledge but it's the Red Book and the Orange Book - so the OpenGL Programming Guide and the OpenGL Shading Language book.

Those are good references if you're just now learning OpenGL, good starting material, and the 7th edition and 3rd edition, respectively, are going to be coming out I think in July. So if you can wait a month, check those out. And then obviously there's a wealth of knowledge on the internet, so you can Google anything you want and that pretty much helps you there.

And then in particular, on the ADC site we have an OpenGL reference for Apple OpenGL stuff. And that's at developer.apple.com/opengl. So whenever you see this little book here in the bottom right, the one with the ADC on it, that means that there's some document on our ADC site specifically about that subject.

So let's get into what we are going over today. So if you read the overview, it's a lot about multiples. It's about dealing with multiple GPUs - there's a lot of systems, Mac Pros, where you can have multiple GPUs - being able to deal with multiple screens, and for instance I'm going to go over full screen applications. There's also multithreading, going to go over that a little bit, multiple contexts - how to share them - and finally sharing between OpenCL and OpenGL.

I'm going to get into that a little bit. So first let's go over some of the new stuff that we have in Snow Leopard. So some of the new extensions in particular here. We've got APPLE rgb 422, so this is a simple extension which allows you to sample from a YUV texture. But unlike the YUV 422 extension that previously existed, this one does not do the 601 color conversion.

So you can do your own custom conversions for that. There's ARB color buffer float, and what that allows you to do is explicitly set whether clamping of floats, between 0 and 1, occurs at specific stages in the pipeline: the vertex stage, the fragment stage, and then at read pixels, at read back. ARB half float vertex, simple extension, adds the ability to pass in an OpenGL half float type for your vertices.

And then the next 2 - ARB texture rg and ARB texture compression rgtc - are extensions for basically single and double component textures. So RG stands for Red Green, and so it adds GL red and GL RG, which are for red and red-green component textures. So if you have, like, floating point texture data that only has 2 components, you can save memory by using the RG instead of the RGB or RGBA types here, and the texture compression extension just adds compression for that.

And then finally, texture srgb is a texturing format that allows you to basically take advantage of precision where the gamma has the most differences in it, so you can see... you can get a better texture precision for the visible spectrum. And then the last 2 I noted here are actually available already in Leopard, but they're relatively new so I wanted to just mention them. So there is EXT framebuffer blit, and EXT framebuffer multisample.

They simply allow you to do multisampling with your framebuffer objects. So let's dive into working with multiple GPUs. So recently we've been shipping a lot of computers that have multiple GPUs in them, and in fact as of recently the MacBook Pros have shipped standard with 2 GPUs in them, and the built-to-order Mac Pros even have an option where you can actually select 4 GPUs in your system, and it gets sent to you straight from us just like that.

So it's becoming increasingly important that you are able to deal with these types of environments, where there's not just 1 GPU and potentially multiple displays, and so I'm going to go over how to work with that because your app should be able to support these environments.

It's not just everyone on the MacBook. So here I want to just go over a quick example of sort of what the idea is, what you're supposed to be doing in order to have your applications support these multiple monitors, multiple GPUs, these sorts of situations. And so here we have an example of the GLSL Showpiece example, and we're running, I think, the Gooch shader. And so it's running on the left display in this example, so we have a computer with 2 displays, 2 GPUs - ATI on the left here, and NVIDIA on the right; not the same vendor, but we still support this type of system.

And so on the left we're rendering the teapot here on the ATI card, but then the user can move that application, just drags it over to the other display, and at this point basically what you want your application to be able to do is switch this rendering from that ATI card to the NVIDIA card, because that display is driven by the NVIDIA card.

So as a result, if you weren't doing the rendering on the NVIDIA card at this point, what would have to happen is the ATI card would do the rendering, it would read back, copy it over to the NVIDIA card, at which point it could draw it there.

So this saves a lot of bandwidth, it's very efficient, and makes a very Mac-friendly application. So before I get into the details of how you're supposed to support this, I wanted to go over the concept of virtual screens. A virtual screen is basically an index into an OpenGL renderer list, the renderer plug-ins.

For instance, each hardware device is a virtual screen, and then also the software renderer is a virtual screen. And so if we have, like, a system with 2 GPUs, and then also we have the software renderer, we would have 3 virtual screens here. So the first 2 virtual screens point to the video cards.

It doesn't matter if it's the same video card or the same type of video card, or different types of video cards, it's still 1 virtual screen per device... and it has nothing to do with the number of displays that are hooked up to it either. So here we have 3 virtual screens, and then finally the software renderer is always the last virtual screen when you have a software renderer in your virtual screen list.

And so this list is chosen by the function call CGL choose pixel format. So when you pass in your pixel format attributes, you're able to ensure that you get the GPUs and the software renderer as you're expecting, and it gives you this list of virtual screens as a result. And the setting of that virtual screen is done by... we've got functions here like CGL update context and the context's update method, which will implicitly update the context based on where its view is onscreen.

So if you call update context after you've moved the view from 1 display to another display, it will automatically switch your virtual screen to the correct one at that point to render on the GPU that will be most efficient. And then also you can manually set the virtual screen by calling your CGL set virtual screen or set current virtual screen.

And then also for P buffers - although we recommend framebuffer objects - for P buffers you can call CGL set P buffer, which will attach the P buffer with a specific virtual screen. So what do you need to be doing to actually make sure that your virtual screens are up to date when you have these multiple GPUs and multiple displays? So basically what you need to be doing is maintain your virtual screen, make sure you're on the right one.

So that means both initializing the virtual screen at startup - so if you have a context that isn't just shown onscreen, for instance, you may need to associate that context's virtual screen with the correct one at that point - and also at all the CoreGraphics display change notifications; those are points where your virtual screen may have changed as well. So you need to ensure that you're tracking those notifications. And then, as I was saying earlier, surfaces unrelated to a display need to be manually managed.

So for instance if you have a P buffer and you try and use a P buffer with a context that's shown onscreen, then you need to make sure that your P buffer's virtual screen is matching your context's virtual screen. So here's the code for an NS OpenGL view subclass; it's pretty straightforward, all you have to do is overload your update function.

So here, after we call super update, we just get the current virtual screen, and the trick here is we're checking when the virtual screen changes - so if the new virtual screen does not equal the old virtual screen here. We basically need to put in a little bit of logic in our application to ensure that, if you're switching from say NVIDIA to ATI, you're not using NVIDIA extensions on the ATI or ATI extensions on the NVIDIA, for instance. So you need to ensure that all of the GL support that you're currently relying on is still supported on your new virtual screen. And then if you have a P buffer, for instance, that's also related to that context.

So after the virtual screen changes with the onscreen context, you want to change your P buffers to follow that. Some notes I wanted to mention, that the P buffers and the front and back buffers are not preserved on a virtual screen switch. So if you called update context or update, and see that your virtual screen has changed, you may have to redraw into what you already have. Although this event should be called in your event loop, in between your drawing anyway, so in general, in practice this isn't really an issue.
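
The slide code isn't reproduced in this transcript, but a minimal Objective-C sketch of the -update override just described might look like this (the class and ivar names are illustrative, not from the session):

    #import <Cocoa/Cocoa.h>

    @interface MyGLView : NSOpenGLView {
        GLint _lastVirtualScreen;   // virtual screen we last rendered on
    }
    @end

    @implementation MyGLView
    - (void)update
    {
        [super update];   // lets AppKit retarget the context's virtual screen

        GLint vs = [[self openGLContext] currentVirtualScreen];
        if (vs != _lastVirtualScreen) {
            _lastVirtualScreen = vs;
            // The renderer changed (e.g. the window moved to a display driven
            // by another GPU): re-check extension support and renderer limits
            // here, and retarget any P buffers to the new virtual screen.
        }
    }
    @end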

And then also if you're not subclassing an NS OpenGL view, as I was saying, you need to register for the display change notifications from Core Graphics; this update call, if you're subclassing NSOpenGLView, is already being automatically called at those notifications. So you don't need to do that if you're just subclassing, it's pretty easy.
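
The registration itself is a couple of Core Graphics calls; here's a hedged sketch for a plain CGL context (the callback body is an assumption about what an app would typically do here):

    #include <ApplicationServices/ApplicationServices.h>
    #include <OpenGL/OpenGL.h>

    // Called by Core Graphics whenever the display configuration changes.
    static void DisplaysReconfigured(CGDirectDisplayID display,
                                     CGDisplayChangeSummaryFlags flags,
                                     void *userInfo)
    {
        CGLContextObj ctx = (CGLContextObj)userInfo;
        CGLUpdateContext(ctx);   // the context's virtual screen may change here
    }

    // At startup, with your CGL context in hand:
    //     CGDisplayRegisterReconfigurationCallback(DisplaysReconfigured, ctx);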

So I also wanted to go a little bit over sharing contexts. So here we have 3 contexts, and we have a system - a Mac Pro again - that's got an ATI card, an NVIDIA card, and then also a software renderer as the available list of virtual screens that we have.

And so what we have here, if you look at the dotted line on the right, is that connection to the software renderer as a virtual screen; but in our pixel format attributes to CGL choose pixel format, we pass in KCGL PFA accelerated and KCGL PFA no recovery.

So by explicitly setting those, what that says is that we don't want the software renderer. So as a result that connection, that virtual screen to the software renderer, is lost, and so the contexts that you create with that pixel format are only going to have those 2 virtual screens in this situation. So as a result, you see the context on the right, context C - because it only has 2 virtual screens, and those virtual screens are not exactly the same as the other 2 contexts, context A and B - only context A and context B can share with each other.

But context C cannot share with those 2 contexts. It could share with something similar to it, but in general we recommend that you try and always have all virtual screens available to you with all your contexts, because it tends to make things pretty straightforward and there's not really a reason to not have a virtual screen supported.

So sort of an opposite look at this is with offline renderers... so we've got the same system with 2 GPUs, but note that we only have 1 display attached. So the ATI is attached to this display here, but the NVIDIA is actually not attached to a display. So for instance, you may have an OpenCL context that you're trying to do some computations with on that second GPU.

So even though they have GPUs they're not necessarily attaching displays to them. And so by default CGL choose pixel format only includes the online renderers - basically virtual screens that point to renderers that are capable of rendering to the screen. So because the NVIDIA is not physically connected to that screen, it would have to do a read back to show up on the screen.

As a result it's not included in this list, so the simple fix to this is you add KCGL PFA - or NS OpenGL PFA - allow offline renderers. And what this does is it tells it that you want to include offline renderers in your virtual screen list. And this is very important, especially if you're going to be wanting to share with OpenCL, that you have those offline renderers, because you might be doing some computations there. Or you might even want to be doing multiple OpenGL things on each card too.

So when we add allow offline renderers, we get that virtual screen added. So here we notice we had virtual screen 0 and virtual screen 1 with the ATI and the software renderer. Since we have this new virtual screen added, and since a virtual screen is an index into that list, the virtual screens may actually not be in the same order.

So for instance we see that the ATI is actually virtual screen index 1 now, whereas it was 0 before. So note that if you don't have the same list of virtual screens, the indices do not necessarily exactly correlate to each other. And also again, the last virtual screen, virtual screen 2 here, is always the software renderer. So by doing this, if you were to unplug your computer's monitor and plug it into the other GPU, you'd automatically be able to take advantage of that second GPU, because you included allow offline renderers.
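
As a sketch, a pixel format that keeps offline GPUs in the virtual screen list could be chosen like this (everything besides kCGLPFAAllowOfflineRenderers is illustrative; NSOpenGLPFAAllowOfflineRenderers is the NSOpenGLPixelFormat equivalent):

    #include <OpenGL/OpenGL.h>

    static CGLPixelFormatObj ChoosePixelFormatWithOfflineRenderers(void)
    {
        CGLPixelFormatAttribute attrs[] = {
            kCGLPFADoubleBuffer,
            kCGLPFAAllowOfflineRenderers,  // keep GPUs with no display attached
            (CGLPixelFormatAttribute)0
        };
        CGLPixelFormatObj pix = NULL;
        GLint virtualScreenCount = 0;      // how many virtual screens we got
        CGLChoosePixelFormat(attrs, &pix, &virtualScreenCount);
        return pix;   // release with CGLReleasePixelFormat when done
    }

Note that leaving out kCGLPFAAccelerated and kCGLPFANoRecovery keeps the software renderer in the list too, per the sharing recommendation above.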

So our recommendation for you guys testing this stuff is to get a system that has 2 GPUs and 2 displays, and basically try and ensure that when you move your application over from 1 GPU to another GPU with 2 displays hooked up, it properly moves your OpenGL rendering to the GPU driving the other display.

And then another thing that we do support, and something that's very Mac-friendly, is to be able to actually unplug the display from one video card and plug it into another video card, so with a single display; and when that happens you want to ensure that your virtual screens and all your rendering still move to the correct virtual screen at that point.

So for a little bit more in-depth sample code and information on this, we've got a Tech Note that just went up a couple of days ago - it's Supporting Multiple GPUs on Mac OS X - and it should be related to the session as well. Now that I've gone over virtual screens, and hopefully you understand the concept, I wanted to get into full screen rendering.

So full screen rendering in Snow Leopard is actually being deprecated. What do I mean, full screen rendering is being deprecated? What I'm saying is we have special contexts that used to be specifically for full screen. So if you passed in a pixel format attribute that said full screen, it would make a special context that's only able to access a single display.

And that's bad because, like we were talking about earlier, that virtual screen list is going to be dedicated to that single display and only have 1 thing in it, which is not necessarily going to be able to share with anything. So it doesn't interact well with multiple GPUs, and you can't necessarily share with other contexts when you have full screen passed in. And also with those contexts, whenever you want to switch displays, you need to create the contexts again from scratch. And then another drawback to the full screen contexts is that they cover up some of the critical dialogs that exist in the system.

So now that you're - hopefully not yet - thinking about crying that full screen's gone, let me make sure that you guys know everything's going to be alright. So there used to be advantages to full screen rendering, basically where you'd get cache enhancements for instance, or tiling on certain video cards, or page flipping, where instead of copying data over, the pointer to what's being displayed on the screen literally just changes. And so now in Snow Leopard, when you create a context that covers the full screen, it just automatically gets those benefits.

It automatically goes faster, and so you get a context that's not literally a full screen context, but just because it's covering the entire screen you get all of these benefits for free, and you're still able to do things like show some of those critical dialogs. So I just want to go over quickly - it's pretty straightforward - how to make a full screen covering window.
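
The slide code isn't in the transcript, but a minimal sketch of such a window is only a few lines (glView here is a hypothetical NSOpenGLView already created with your pixel format):

    #import <Cocoa/Cocoa.h>

    // Inside your controller code:
    NSRect screenFrame = [[NSScreen mainScreen] frame];
    NSWindow *fullScreenWindow =
        [[NSWindow alloc] initWithContentRect:screenFrame
                                    styleMask:NSBorderlessWindowMask
                                      backing:NSBackingStoreBuffered
                                        defer:NO];
    [fullScreenWindow setLevel:NSMainMenuWindowLevel + 1];  // above the menu bar
    [fullScreenWindow setContentView:glView];
    [fullScreenWindow makeKeyAndOrderFront:nil];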

And so you just make an NSWindow here, for instance, that has the main screen's frame, and set the window to be that size - the size of the screen that you want to place it on. And then you just simply place that window right above the menu bar.

So here we're setting it to NS main menu level plus 1, and so that sets it right above the menu bar, and we're ensuring that we're covering the entire screen at this point; but it's still low enough that the critical dialogs are still able to appear. So I want to go over an example of this; so we've got a system here... and hopefully it's showing up. Open up a program we have here called FullScreenNSOpenGLView. So it's a subclass of NSOpenGLView that we've hooked up to work with this.

And so here we see a window that's almost full screen - you can still see there's a menu bar here - and we see on our card that we're getting around 3100 frames per second. So what would you expect if I make the view slightly bigger? Usually you'd expect the framerate to go down, but as a result of switching it to completely full screen, we've gone from around 3100 to 4600, 4700 frames per second. So why does this happen? It's because we're able to take advantage automatically of the fact that your view is covering the full screen.

And so in your normal everyday applications you're not going to see a 3000 to 5000 frames per second increase. Hopefully you're only running at 60 frames per second, but this is a really simple, straightforward application, simply doing a single draw per frame, and so because the overhead is so low we're able to take a look at how that's actually affected; it's slightly faster as a result. And as I was saying earlier... even though it's full screen, all those critical dialogs still show up over it.

[ Silence ]

[ Applause ]

So full screen's hopefully a lot easier now. You don't have to be dealing with all of that - it makes it a lot easier, especially when you're working with multiple GPUs. So as I was saying, there's some deprecated API. As opposed to what we used to do before, we used to require that you have the pixel format attribute full screen, KCGL PFA full screen.

And also you'd call the function CGL set full screen; those are deprecated, and instead you just literally make your window cover the full screen size. And then also I wanted to make a note about when you do have a full screen context: I was mentioning there could be page flipping, and this page flipping means that when you call flush drawable, or CGL flush drawable, your back buffer contents are not guaranteed to be preserved after flush drawable; because if you're moving the pointer, that means you may have, for instance, the last frame's data instead of what you just drew in your back buffer at that point.

And so there is this pixel format attribute called backing store that preserves your backing store - it basically guarantees you have a copy of your data - and then you can also, for instance, change the backing size. So when you have things like that that cause a copy of the data, those will actually prevent the page flipping performance gain.

And so it's recommended that you don't depend on the back buffer contents. But you can still take advantage of it if your application really does do that, or you could use an FBO, render to an FBO, and of course the FBO contents will be maintained all the time. So let's go into a little bit of multithreading.

So this is kind of a tricky subject; no one likes to debug a multithreaded application, it's always... always very latent bugs that show up, happen at different times, really hard to find out what's going on. So I want to try and go over a little bit what our recommendations are for your applications in multithreading. So as you are all hopefully aware of already - I wanted to just note it again though - OpenGL is not a thread safe API, at least per context. So what this means is that any context can only have 1 thread talking to it at a time.

However if you do want to be using 2 threads, what you can do is you can have 2 contexts with 1 thread talking to each context separately. And so as a result, you could have those 2 contexts share and for instance if you have 1 context on the main thread doing all the drawing, you could have a second context which is sharing with it, and upload textures on the secondary thread, upload other objects... while still maintaining thread safety by having those 2 different contexts, 1 for each thread.
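
A sketch of that pattern, assuming pixelFormat and mainContext are your existing objects (the names are illustrative):

    #import <Cocoa/Cocoa.h>

    // A second context in the same share group, for the upload thread.
    NSOpenGLContext *uploadContext =
        [[NSOpenGLContext alloc] initWithFormat:pixelFormat
                                   shareContext:mainContext];

    // Then, on the secondary thread only:
    //     [uploadContext makeCurrentContext];
    //     ... glGenTextures / glTexImage2D uploads ...
    //     glFlush();   // make the new textures visible to the main context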

And we also have some convenience functions in OpenGL on Mac OS X, so we've got in particular here CGL lock context and CGL unlock context... and these take the CGL context object, and these are necessary for NS OpenGL views whenever you're doing multiple... whenever you're either using multiple threads or doing any of your OpenGL work on a thread other than the main thread.

And they're basically recursive functions that lock your context that you're working on... and you can use them outside of NS OpenGL as well, but they are required for NS OpenGL when you are doing the multithreaded things. And then also as of Leopard, we added a simple ability to call CGL enable with the KCGL context enable of the MP engine.

And what this does is it offloads all of your OpenGL processing work onto a second thread, so it's automatic. So when you do that, what happens is that you're issuing all your OpenGL calls on the main thread, but they immediately return, and meanwhile OpenGL takes care of it behind your back: we put all the computation that has to occur onto a second thread for you automatically. You can see performance gains of 100 percent pretty easily if your application is well behaved in terms of asynchronicity.

And what I mean by this is if you have functions that require getting things back - so like GL get error, which you should never use because we've got nice tools that allow you to break on errors, or GL read pixels without PBOs - anything that basically gets something back from the GPU... that's not supposed to return until the data is valid in the return value. Anything like that will stall the multithreaded engine, but if you avoid those types of things your application can get enormous speed increases by using this, if you happen to be CPU bound.
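
Here's a hedged sketch of both conveniences, for a CGL context obtained from your view:

    #include <OpenGL/OpenGL.h>

    static void EnableMultithreadedEngine(CGLContextObj ctx)
    {
        // After this, GL calls return immediately and the real work happens
        // on a worker thread that OpenGL manages for you.
        if (CGLEnable(ctx, kCGLCEMPEngine) != kCGLNoError) {
            // This renderer doesn't support the multithreaded engine; the
            // app simply keeps running single-threaded.
        }
    }

    // When touching an NSOpenGLView's context from more than one thread,
    // bracket the GL work with the (recursive) context locks:
    //     CGLLockContext(ctx);
    //     ... OpenGL calls ...
    //     CGLUnlockContext(ctx);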

And then also I wanted to finally mention if you do have OpenGL, physics, AI, those sort of things occurring, the other alternative to multithreading OpenGL is to multithread everything else. So move the AI, move the physics, move all those to separate threads and that way you don't have as much trouble with... that way you can maintain all of that multithreading yourself. And finally, there's some documentation on this that's pretty good, on the ADC documentation site. So the OpenGL Programming Guide for Mac OS X has a chapter dedicated to multithreading in OpenGL on our platform.

So now I wanted to go into OpenCL. So OpenCL is one of the new features of Snow Leopard that you guys are all probably familiar with; pretty neat technology. What this technology does is it also allows us to take advantage of the GPU in order to do computations that may not be, in particular, graphics related. But it's still important to be able to sometimes visualize that data on the GPU using OpenGL.

Working with OpenGL and OpenCL, the idea is OpenCL will create some data - that data's created on the GPU - and then with OpenGL you want to display it. So in OpenGL you're going to be drawing data on the GPU. So here we've got OpenCL, just a normal program that could be using OpenCL, that creates this data on the GPU.

And what we're doing here is we're reading it back to the CPU in order to call OpenGL, to send it to OpenGL, at which point it copies it back up to the GPU and then OpenGL's able to render from it. But this is pretty inefficient, so we've created some methods in our version of our OpenCL implementation that allow you to easily pass this data directly from OpenCL and use it in OpenGL, and vice versa.

So meanwhile your bandwidth is not being saturated between the CPU and the GPU. And then also the CPU is idle, so it can do a lot more things as a result of these APIs. So as you saw earlier - we had an example for those of you that were at the Graphics and Imaging State of the Union - we had a pretty cool example called Molecule, which would generate a heat map for a whole bunch of molecules; basically creating a texture, the little cube you see around it.

And this was created in OpenCL, and what this application would then do is have that data and pass it into OpenGL as a texture, and then we drew it here so that's a pretty straightforward example, pretty simple example of how you can use that data from OpenCL and put in OpenGL.

So for the basic OpenGL-OpenCL interactions that we have in order to pass this data between OpenCL and OpenGL: we've got the ability to create your CL context from a GL context, and you can also create CL buffers from your GL buffer objects.

CL images can be created from GL textures, OpenGL textures. And then I'm going to go into how you can share those buffers and textures between OpenGL and OpenCL, because there needs to be some synchronization there. And then finally I'm going to go over just again, how to support multiple GPUs, how to be pretty friendly with OpenCL on multiple GPUs along with your OpenGL app that's already hopefully being friendly with those multiple systems, multiple GPUs. So here I've got some sample code of how you create a CL context from a GL context.

And so we do all the includes for the extensions, and the cl_gl.h header is basically all the functionality that allows you to do the CL and GL sharing. And so what an application would do next is test for the Apple GL sharing extension in OpenCL.

And although this will always be true on Mac OS X, if you want to be cross-platform, your code will basically be testing for this extension. And then you'd be able to get function pointers and things like that using the normal APIs OpenCL has for extensions. So now that we've set that up, we're actually going to create the context from the OpenGL context. What we've added for Snow Leopard is the ability to get a share group object from the GL context, by calling CGL get share group; we're then able to easily pass this share group into CL create context to generate a CL context which shares basically all the virtual screens with the OpenGL context that you are creating this from.

And there is 1 slight exception to this, which is that there's a software path for OpenCL, but the GL software renderer doesn't by default get you that software path in OpenCL. So in the CL create context function call at the bottom of this blue box, you still pass in the property USE CGL share group, but then you can also add the CL software path to your context in CL create context, in the device ID list, in case you do actually want to be running OpenCL on the CPU.
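
Putting those pieces together, a sketch of the context creation might look like this (error handling trimmed; the Apple sharing property lives in OpenCL's cl_gl_ext.h header):

    #include <OpenCL/opencl.h>
    #include <OpenCL/cl_gl_ext.h>
    #include <OpenGL/OpenGL.h>

    static cl_context CreateCLContextSharedWithGL(CGLContextObj glContext)
    {
        CGLShareGroupObj shareGroup = CGLGetShareGroup(glContext);

        cl_context_properties props[] = {
            CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE,
            (cl_context_properties)shareGroup,
            0
        };
        // The share group supplies the GPU devices, so no explicit device
        // list is needed; add the CPU device here if you also want OpenCL's
        // software path.
        cl_int err = CL_SUCCESS;
        return clCreateContext(props, 0, NULL, NULL, NULL, &err);
    }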

And then finally, we can check what the OpenCL device ID is that relates to the current virtual screen that you're on. So here we've got CL get GL context info Apple; we call that, and we're able to get back the CL device ID for the virtual screen that we're currently running on. So how do you share the objects between OpenGL and OpenCL? It's pretty straightforward: here, for instance, we're creating a VBO, a Vertex Buffer Object.

And we just use normal OpenGL code - Gen buffers, Bind buffer, and passing the buffer data with a size. Then we're able to get back a CL mem object by calling CL create from GL buffer, and because you created your CL context from that GL share group, it automatically knows which share group it's going to be getting its objects from.

So you're just passing the vertex buffer object name in this case, and that gets you back the CL mem object. Same sort of thing for images and textures: you create your texture as you normally would, then you call CL create from GL texture 2D, and so you pass in the target, the texture level, and the texture name, and that creates the CL mem object again from that texture that you created in OpenGL. And then there are similar mechanisms in place for GL 3D textures, so CL create from GL texture 3D.

Then CL create from GL render buffer, so you can also work with render buffers; same exact idea. So then synchronizing that data between the 2, if you're going to be creating data in OpenCL and trying to show it in OpenGL, there needs to be a little bit of synchronization.

So basically what you need to do, with your OpenGL context, is ensure that you've flushed all the work that OpenGL is supposed to do to the GPU in the OpenGL context. And then you call CL enqueue acquire GL objects, and what this then does is CL is able to, at this point, render into any of those GL objects that you've created.

And then when you're done with that, you call CL enqueue release GL objects and at that point CL will automatically flush your data and then you continue on with OpenGL with that data up to date, and it basically works both ways. You can pass data back into... from CL to GL or GL to CL by going through this sort of pattern here.
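
A sketch of that whole round trip for a vertex buffer object (queue and clContext are assumed to already exist; the kernel launches are elided):

    #include <OpenCL/opencl.h>
    #include <OpenGL/gl.h>

    static void RunCLOnVBO(cl_context clContext, cl_command_queue queue,
                           GLuint vbo)
    {
        cl_int err = CL_SUCCESS;
        cl_mem clBuffer = clCreateFromGLBuffer(clContext, CL_MEM_READ_WRITE,
                                               vbo, &err);

        glFlush();   // push GL's pending work on the buffer to the GPU first

        clEnqueueAcquireGLObjects(queue, 1, &clBuffer, 0, NULL, NULL);
        // ... clEnqueueNDRangeKernel calls that read/write clBuffer ...
        clEnqueueReleaseGLObjects(queue, 1, &clBuffer, 0, NULL, NULL);
        clFlush(queue);   // after this, GL sees the updated buffer contents

        clReleaseMemObject(clBuffer);
    }

The same acquire/release bracketing applies to the mem objects you get from clCreateFromGLTexture2D, clCreateFromGLTexture3D, and clCreateFromGLRenderbuffer.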

So then, working with multiple GPUs - just wanted to quickly go over this. That CL get GL context info Apple, in addition to being able to give you back the current virtual screen's CL device ID, is also able to give you back a list of the device IDs for each virtual screen in the system.

So here we call it with CL CGL devices for supported virtual screens, and this gives us back the array of the CL device IDs associated with those virtual screens. So in our update function that we created earlier, when we see that the current virtual screen has changed, you see that I've added a small part where I change the cur device ID - so I change the CL device ID that I'm using to be the device for that virtual screen.

I've already just pre-computed the lookup, and so I can have it, for instance, match the GPU it's going to be rendering on - which is what you usually do - but you can also do things like have it do your CL computations on the GPU that's not being used to render. So if you wanted to offload work - if you're doing a really heavy GL view and a really heavy CL computation - you can just switch the GPUs around like that, and this API makes it pretty easy to set it up exactly how you want it to be.
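
A sketch of that precomputed lookup (the array bound and names are illustrative):

    #include <OpenCL/opencl.h>
    #include <OpenCL/cl_gl_ext.h>   // clGetGLContextInfoAPPLE and friends
    #include <OpenGL/OpenGL.h>

    enum { kMaxVirtualScreens = 8 };   // illustrative upper bound
    static cl_device_id deviceForVirtualScreen[kMaxVirtualScreens];

    static void CacheDevicesPerVirtualScreen(cl_context clContext,
                                             CGLContextObj glContext)
    {
        size_t bytes = 0;
        clGetGLContextInfoAPPLE(clContext, glContext,
            CL_CGL_DEVICES_FOR_SUPPORTED_VIRTUAL_SCREENS_APPLE,
            sizeof(deviceForVirtualScreen), deviceForVirtualScreen, &bytes);
        // bytes / sizeof(cl_device_id) entries are filled in, one per virtual
        // screen; index the array with the context's current virtual screen.
    }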

So then I want to go over the tools a little bit. The tools on our platform are great for basically helping you tweak the performance of your applications and debugging them, so I was going to go over the applications we have, which are OpenGL Profiler, OpenGL Driver Monitor, OpenGL Shader Builder, and then also some non-GL-specific tools, which are Shark and Instruments.

So let's get into those a little bit. So we've got OpenGL profiler and what this is, is an application that basically allows you to debug performance and correctness in your OpenGL applications. It's pretty useful, it's very useful because it's specific to GL and so you're able to do things like... obviously you could set up breakpoints on things like all the GL entrypoints.

But in addition to the GL entrypoints, you can do things like set breakpoints on GL errors, so if you ever create a GL error there's a simple check box for that. Thread conflicts - again if you're trying to do multithreading, you just check a check box, you can break on thread errors.

Then renderer changes, and then more specifically software fallback - like if you ever do something that the GPU is not capable of, you can see that you're hitting the software fallback, and at that point look at what may have caused you to go into that mode. And so it's useful for checking, like when you're moving between displays, that your renderer is changing, for instance. And then it also lists here the trace of GL calls, so you can look at all the OpenGL calls that you've made - how long each call takes of the application time, backtraces for those.

You can look at all the GL objects, so you can look at the textures... GLSL shaders, those sort of things; even modify them, some of them. And then finally one of the most useful things, you can look at the draw buffers. So you can look at the current draw buffers - color buffer, the depth. So you can actually set breakpoints, for instance, after all your draw calls and see as each draw does some drawing, see exactly where something may go wrong and make some corrective actions based on that.

And then Driver Monitor, let's get into that. So Driver Monitor is a little bit more advanced of a tool, it's very easy to use but at the same time it's got a lot of parameters that are very useful. So what it's able to do is show you things like CPU stalls or if you're leaking memory, those sort of things. I just wanted to go over quickly some of the driver statistics that we tend to always stick in when we're looking at an OpenGL Driver Monitor. So some of the basic ones that we recommend everyone has when they're looking at these.

So for the first one, CPU wait for GPU. If you enable this parameter you basically can see how long the CPU is waiting for the GPU. It's pretty straightforward. So with some operations - if you're fragment bound, for instance - you'll probably see your CPU wait for GPU go way up. Or if you're trying to draw faster than 60 frames per second, which you shouldn't do because the display... well, faster than the display refresh rate.

You shouldn't be drawing faster than that. But if you were to be drawing faster than that, you might see this also pop up for instance. Then there's current free video memory, so you can look, make sure that you're not going over your limits of the memory that your system has. So if you have 128 megabyte video card and you're trying to load lots of textures, you'll be able to watch that the memory's going... see what your memory's at.

Hopefully it doesn't fluctuate, but it stays steady after you've loaded all your textures. And then finally we've got surface page off data, and texture page off data. And so if you're trying to oversaturate your VRAM, you may see that surfaces or textures are being paged off. So you want to try and avoid those ever getting high.

So OpenGL Shader Builder, pretty straightforward tool. It allows you to create ARB vertex programs, ARB fragment programs, and then GLSL vertex, geometry, and fragment shaders. So you can look at what your programs generate, tweak the parameters on the fly, and then you can even benchmark what your performance is going to be like right in these applications.

A pretty useful app. And then outside of the GL-specific tools, again, Shark's a pretty useful tool. What Shark allows you to do is profile your code; it automatically profiles code without you ever having to add specific hooks for Shark. It just automatically looks straight into what your application is doing, and shows you where you're spending most of your time. There are some symbols in particular, however, that are useful to be looking out for in your OpenGL applications.

So the first one here is actually a library - the software renderer. If you see GL renderer float as a library, this means that you're probably falling back to the software renderer for some reason. So if you were to ever see that, what we'd want to do is use OpenGL Profiler, break on that software fallback break button, and you can then from there figure out what you're trying to do that the GPU doesn't support, and get rid of that software fallback as a result. Second thing here, RTC symbols - so Run Time Compiled symbols. OpenGL has some Run Time Compiled code that it creates on the fly.

For instance, if you're trying to upload a texture in RGB and it's converting it to RGBA for OpenGL, that will cause some code to be generated that will do that conversion. Even if you have RGBA versus BGRA - like the endianness swapping - that is going to cause some RTC symbols to be made by our use of the LLVM compiler in OpenGL. And so these are going to show up as just an address that's basically unknown. And I actually have, in this particular trace - I don't know if you can see it pretty easily.

I have a library right here, the gl renderer float. And then the RTC, we've got the unknown library coming from OpenGL. So those symbols can be pretty much eliminated once you figure out where they're coming from. And then finally another, if you're using the multithreaded engine, there is a function called gle finish command buffer.

What this function means is that with the multithreaded engine, you're calling some function that's causing it to stall the command buffer. So for instance, any of the read backs - like GL get error - will cause a synchronizing finish command buffer to be called, because it has to make sure it computes everything that happened to see if you've actually hit an OpenGL error or not.

Again, use OpenGL Profiler to just break on an error, and then you don't need to have anything like that in there. And then 1 note - driver symbols. If you ever see driver symbols in Shark, usually they'll show up as something like GLD get texture level plus 1, 2, 3, 4, 5, you know; sort of a random thing. It's not necessarily getting texture levels, so you might be confused by this, but really we don't ship all of the symbols in the system, and so you'll get back offsets from symbols that it does know about, which are not necessarily actually what's happening.

So just take into account that you are in the driver, but it's not necessarily doing what that function says it's doing when you see those driver symbols. So if you're new to Shark, I recommend you go see the sessions. I'll tell you about the sessions very quickly, when I go over the sessions, so they're in the future.

And then Instruments is the final tool I'll show you guys. Instruments is basically an application visualizer - it lets you visualize what's happening in your application - and the OpenGL Driver Monitor statistics I was mentioning earlier are available in Instruments as well. So you can actually put those statistics side by side with other things; here we have it with a CPU monitor, so we can see how the interaction between the CPU and the GPU occurs in this application. And I just wanted to note that with the driver statistics, the statistics that you're seeing are actually per GPU. They're not per application.

So for instance if you have 2 applications running that are both using OpenGL, the driver monitor statistics are going to be the cumulative effect of both of those applications. It's not just your 1 application that it's able to be looking at. So with that I wanted to do a little demo, probably everyone already saw the... the Molecules demo, at the Graphics and Media State of the Union, or Graphics and Imaging, Graphics and Media.

But in case you didn't, I'm going to go over it a little bit more in depth here too. So here we've got this neat whole bunch of molecules that are being computed, and what we have here is actually a pretty tricky OpenGL visualization; I'm not going to get into the OpenCL parts we were showing, but I'm going to get into the OpenGL parts.

And so what we have here is actually a whole bunch of spheres that we're rendering. We're doing some tricks - for instance, we have ambient occlusion actually computing the light here. So you notice that the lighting is based on whether the molecules are visible. And then also these spheres are actually being ray traced with imposters, so there's basically a single billboard for each of these spheres.

And what's happening is there's a GLSL fragment program that runs, that just shoots out a ray for each one of those pixels. We basically create imposters from that, and by using this ray tracing technique in this particular application, we're able to get basically perspective correct imposters, and also do things like look into the ambient occlusion map pretty easily in order to do this texturing.

And then also it lets us do some pretty neat effects like here we've got... we're able to actually go into the objects... so pretty neat effect. And then also we've got a depth of field, so we'll switch to a little simpler one. It's a sort of a depth of field effect, so it's a little blurrier with the things that are closer, and then gets a little sharper as we put it more into focus.

The way that we do some of these effects, so with the ambient occlusion for instance, what's happening is that we're actually computing shadow maps. But we're doing a little bit more complicated a thing than a shadow map, we're doing a lot of shadow maps. So instead of the normal idea where you have a shadow map, where you aim the light at the scene and you see what's in shadow, we're actually aiming lights at the scene from a whole bunch of samples around the scene. And so by shooting these lights into there, we're able to then take sort of the addition of all those shadows and we add them together and we create this ambient occlusion map from that, that maps onto these spheres.

It looks pretty cool. And then the depth of field - it's pretty straightforward how that works. For the depth of field we have just 2 textures that we're working with here. So we first render the scene into an FBO, so that FBO has the back buffer contents, and then we take that FBO and we run it through a 2 pass shader that will do a Gaussian blur - basically one way and then the other way, applying the Gaussian blur.

And so we get a quarter sized texture with a bit of blur in it, and then we look into the depth map... the depth values, at each pixel and when we do the final render we interpolate between those 2 images to basically pick which parts of the scenes should be in focus, and which parts of the scenes should be out of focus for that second texture. So I'm actually going to... try and use OpenGL profiler here ... and show you how to use this app, and let's take a look at that Molecules demo.

I can hear it, it might be actually launching. There we go, wow. So here this warning dialog - I wanted to note this to everybody - this warning dialog is saying that we're going to be unable to attach to our applications until we log out and log back in. I'm not going to log out and log back in here; I'll enable it, but instead what I'm going to be doing is launching the application instead of attaching to it.

So normally you can attach to your application after the fact, and there's a list of all the applications running. But here, so you can add your application in by... you can add it through the open dialog, or just dragging it straight in here. So let's launch this application. And I'm going to take this out of full screen so we can debug this, make it a little smaller.

[ Silence ]

So we have here in the views, basically have a list of the things that we can take a look at. So we've got trace statistics, resources, the pixel formats that we've created, the breakpoints, you can have scripts to those breakpoints, buffers so you can look at the current draw, current read, system buffers, and then messages to any logs that you might be logging. So here let's sort of animate that, and in here I'm going to open up the breakpoints.

So here, let's break after CGL flush drawable. So we can see here - let's zoom in a little bit so you guys can see a little bit better - we've got the backtrace. We have CGL flush drawable and we see the backtraces, so if your application were to have full symbols and your source lines in here, you'd for instance see exactly the line of code that we're calling GLUTSwapBuffers from. This happens to be a GLUT app, but we recommend NS OpenGL View; it's just a quick demo that happened to be written in GLUT. And also what you can look at here: you see we're running on the ATI, virtual screen 0 in this case.

And then here we've got the fact that we know we're not falling back to software, we're running both the fragment and the vertex processing on the GPU. So here I'm going to click on the state, and then there's this handy little button here from default state, which will highlight... it'll show the changes from the default state.

So here we have highlighted all the differences. For instance we can see that texture 0 and texture 1 are bound - we have texture 0 bound to texture 2D with a name of 8 - and we can look at the width and height of those textures. Color buffer: we see the draw and read both happening on the back buffer. Our blend mode is set to source over, or additive - looks like additive, yeah.

So... let's continue from here. Then there are the resources, so I'm going to look at some resources. While we're at the breakpoint we can look at the textures, shaders, those sorts of things. So here we can actually look at the programs that are being run. So this looks like it might be one of the ray tracers... actually running in the program.

And then the textures, we can take a look at that. And so you can see... we've got the normal texture, the FBO that we were rendering into, we've got the first pass of that Gaussian blur, second pass of the Gaussian blur, and so we see we have a blurrier texture as a result here. It's just a lower resolution but blurry. Also you can take a look at for instance, your depth, so all the depth maps for instance that we're using.

And then this is pretty neat: this is actually a texture atlas - each of those molecules has a texture atlas for the ambient occlusion, for the amount of light that was touching each part of those spheres. OK, so here I'm going to continue this again for a second with the breakpoints turned off, and then there's also... statistics. So we can look at how many calls we're making. You should be using draw arrays instead of these.

Don't use GLUT, don't use GL vertex... this is an example of a bad application.

[ Laughter ]

So you can see the time - the average time, not per call but per different type of call. And then I wanted to show you, let's do a break after GL end. And so we can again look at the current draw color buffer; I'll even bring it up, let's bring up everything.

That might not be useful actually, we might use the Alpha. So you can see here, when I continue I can see it changing, and you can see what it's doing in each draw call. This first draw here is just drawing the background, we've got a little vignette going for the background, then we draw in all the spheres with the ray tracing, and then we actually... draw in, we sort of see it, we drew in some cylinders to attach the spheres to each other. And then of course just doing the Gaussian blur, and then doing the final composite. So it's pretty neat what it can do.

And then of course the break on error, break on thread conflict, software fallback renderer changes are all there, and so hopefully if you turn this on, good. We're not hitting any of those problems, so that's just a quick look at Profiler and I think that we've got some time for Q and A.

But first, some more information... Allan Schaffer is our Graphics Evangelist, so you can contact him if you have any questions related to our platform in OpenGL - [email protected]. The documentation of course at developer.apple.com/opengl. So you can get a lot of the things that I was talking about earlier there.