OpenGL Optimization: Live - WWDC 2004

Graphics • 1:02:08

The "best in class" suite of OpenGL tools in Mac OS X helps take your application's graphics to the next performance level. In this session, Apple's OpenGL experts work live on stage to debug OpenGL client applications and to work through real-world optimization scenarios. This session elaborates on topics discussed in other OpenGL optimization sessions. A must-see session for all OpenGL developers.

Speakers: Dave Springer, Chris Niederauer

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Session 211, OpenGL Optimization: Live. I'm Dave Springer. This is Chris Niederauer. So it's 3:30 on Thursday. This is like nearly the last day of the conference, right? You guys have been here a long time. You feeling a little conference burn? Here's a thought that keeps me going. In the whole of human history, there's been appointed one Dave Springer. And for my life, my entire life, I get to be him.

We're going to talk about a couple tools that we developed at Apple: OpenGL Profiler and OpenGL Driver Monitor. And we're going to have live demos, because we like to live on the edge. And this software you see is pretty fresh, right? Pretty fresh. So anything could happen at any time.

But what we're going to do during these demos is show some of the performance bottlenecks we've seen that are common among OpenGL apps. We get a lot of applications coming through the shop, and we see a lot of things like immediate mode where a display list might be more appropriate. We see texture upload usage that could be a little better. We see state changes that aren't always necessary. We'll also show you how to debug your OpenGL applications using these tools.

Okay. Now, why did we build this tool? Well, really, those were the issues we were running into all the time. We saw a lot of performance problems and noticed common themes, so we built this tool to quickly identify the areas where you may be losing performance in your OpenGL apps. There were also a lot of common misconceptions about why performance was being lost, and you'd get a lot of finger-pointing. Now we have a tool that exactly measures and precisely identifies where the performance is going.

So what does it do? Profiler shows your usage of the OpenGL engine library. It collects a lot of data and a lot of statistics. You can also control your application at kind of a debugger level. And it shows the graphics state your application has.

And we'll get into what all that really means in the demos. Okay, now here's how Profiler works. Now I've got to refer to my cheat sheet here, because my memory is like a sieve, and besides, this makes it look like I prepared beforehand. Profiler is a runtime system.

What happens is it gets in between your app and the OpenGL engine. You don't have to recompile your application, run it in a special mode, or anything like that. Profiler really does work like a debugger in that sense, in that you can just run your app under the Profiler environment.

The way it does it is that Profiler hooks into the OpenGL library at the library level and wraps all of its functions. Think of the old days of jump tables in dynamically loaded libraries: Profiler gets in there and redirects each of those function calls into Profiler, and from there into the engine or the CGL shim. It wraps both CGL, which is our OS-dependent layer of OpenGL, and OpenGL itself. A quick note on the X Window System: if you're using that platform, you actually end up profiling the server, not your client app.
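To make the "jump table" idea concrete, here is a minimal sketch of interposition through a table of function pointers. The `DispatchTable` struct, the stand-in engine functions, and `profiler_attach` are all hypothetical illustrations; this is not how Apple's OpenGL framework or Profiler is actually laid out internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatch table: function pointers standing in for the
 * engine's GL entry points. The names are illustrative only. */
typedef struct {
    void (*clear)(unsigned mask);
    void (*finish)(void);
} DispatchTable;

/* Stand-ins for the real engine functions. */
static int engine_clear_calls = 0;
static int engine_finish_calls = 0;
static void engine_clear(unsigned mask) { (void)mask; engine_clear_calls++; }
static void engine_finish(void) { engine_finish_calls++; }

/* The table the application calls through. */
static DispatchTable g_dispatch = { engine_clear, engine_finish };

/* The real entry points, saved when the profiler attaches. */
static DispatchTable g_real = { NULL, NULL };
static int intercepted_calls = 0;

static void wrap_clear(unsigned mask) {
    intercepted_calls++;      /* profiler bookkeeping goes here */
    g_real.clear(mask);       /* then forward to the real engine */
}

static void wrap_finish(void) {
    intercepted_calls++;
    g_real.finish();
}

/* Swap each entry for its wrapper; the application is not recompiled,
 * it simply keeps calling through g_dispatch. */
static void profiler_attach(void) {
    assert(g_real.clear == NULL);   /* attach only once */
    g_real = g_dispatch;
    g_dispatch.clear = wrap_clear;
    g_dispatch.finish = wrap_finish;
}
```

The point of the design is that the application's call sites never change: only the table entries do, which is why no recompile or special build is needed.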

Because it's a runtime system like this, you can launch an app under Profiler and then attach GDB or your other favorite debugger to that same app. So you can have a full debugger running at the same time as Profiler. There are some little tricks to keeping the two synchronized, and you'd have to experiment with that. That's an exercise for the reader.

Okay, here's a screenshot. This is all new for Tiger, Profiler 3.0. The first thing we did was take the two-panel startup approach we used to have and compress it into one panel. So now there's no start, and then start, and then really start. Let me get these fancy slide builds going. There we go. First, you set up the app you want to profile in this top table here. If you click "Attach," that table is automatically populated with all the running applications on your system.

In the new 3.0 Profiler, you can set environment variables. This window, by the way, is shown opened up large; it doesn't default to this size. Normally this part of it is hidden when you run Profiler. You can set a custom pixel format, which I'll talk about more later. You can sort of emulate graphics drivers. Again, I'll talk about that in more detail later.

New for Tiger, you can set environment variables. Just as when you launch from a Unix shell, you can set environment variables that your app then sees, and now you can do it right from Profiler. I do this all the time to tell my target app to load different dynamic libraries, like debug versions of a framework, for example. You can do that through environment variables.

Okay, and the third part of this panel is down here. The most important part, really, of this panel is the frame rate. Now, this is an out-of-the-gate estimation of your app's performance. It's not a real precise, nailed-down, "This is what my performance really is," but this is going to give you a general idea. Oh yeah, I'm getting about 200 frames per second, and I'm expecting 500. Okay, so that's what that gives you. Now, let me get into some of the things that we collect. This is the data that Profiler gathers out of your app.
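As a rough illustration of how such an out-of-the-gate frame rate could be estimated, here is a sketch that assumes the profiler timestamps each buffer swap (for example, each CGLFlushDrawable call) and averages over a small sliding window. The `FpsEstimator` type and window size are hypothetical; the talk does not say how Profiler computes its number.

```c
#include <assert.h>

#define FPS_WINDOW 8

/* Hypothetical frame-rate estimator fed with the timestamp (in seconds)
 * of each buffer swap. */
typedef struct {
    double stamps[FPS_WINDOW];
    int count;   /* number of valid stamps, up to FPS_WINDOW */
    int next;    /* ring-buffer write position */
} FpsEstimator;

static void fps_record(FpsEstimator *e, double t) {
    e->stamps[e->next] = t;
    e->next = (e->next + 1) % FPS_WINDOW;
    if (e->count < FPS_WINDOW) e->count++;
}

/* Frames per second over the window; 0 until two swaps have been seen. */
static double fps_estimate(const FpsEstimator *e) {
    if (e->count < 2) return 0.0;
    int newest = (e->next + FPS_WINDOW - 1) % FPS_WINDOW;
    int oldest = (e->count < FPS_WINDOW) ? 0 : e->next;
    double span = e->stamps[newest] - e->stamps[oldest];
    assert(span > 0.0);
    return (e->count - 1) / span;
}
```

Averaging over a window is what makes this an estimate rather than a precise measurement: a single slow frame is smoothed out, which matches the "general idea" role described above.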

It measures the amount of time you actually spend in the OpenGL engine. When you make a call, we start a timer, go into the engine, come out of the engine, stop the timer, and add all those up. So this is a pretty precise measurement of your actual usage of the engine.

There's also time per call. These values are cumulative, and they're kept per context or globally. So if you have a lot of threads and a lot of contexts, you can see globally how much time you're spending in each function, or you can look at it per context.
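The start-timer, call, stop-timer, accumulate loop described above can be sketched as follows. `CallStats`, `timed_call`, and `fake_engine_call` are hypothetical names; `clock()` (processor time) is used only to keep the sketch portable, whereas a real profiler would use a high-resolution wall clock.

```c
#include <assert.h>
#include <time.h>

/* Cumulative statistics for one GL function. */
typedef struct {
    const char *name;
    unsigned long calls;
    double seconds;
} CallStats;

typedef void (*EngineFn)(void);

/* Stand-in for an engine entry point (hypothetical). */
static volatile unsigned long fake_work = 0;
static void fake_engine_call(void) {
    for (int i = 0; i < 1000; i++) fake_work += (unsigned long)i;
}

/* Start a timer, call into the engine, stop the timer, add it up. */
static void timed_call(CallStats *s, EngineFn fn) {
    assert(s != NULL && fn != NULL);
    clock_t start = clock();
    fn();
    clock_t end = clock();
    s->calls++;
    s->seconds += (double)(end - start) / CLOCKS_PER_SEC;
}

/* Average time per call, derived from the cumulative totals. */
static double average_seconds(const CallStats *s) {
    return s->calls ? s->seconds / (double)s->calls : 0.0;
}
```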

Now, one of the really important numbers on this panel is the estimated percent time spent in the OpenGL engine. Here we're spending about a quarter of the time in it: if the app's total time is 100%, about a quarter of that is spent actually inside GL rather than in your own code.
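Here is a minimal sketch of how per-context totals roll up into a global figure and a percentage of elapsed time. The `ContextTime` struct and both helpers are hypothetical; the talk does not describe Profiler's exact formula.

```c
#include <assert.h>

/* Hypothetical per-context accumulator: seconds spent inside GL calls. */
typedef struct {
    int context_id;
    double gl_seconds;
} ContextTime;

/* Global GL time is the sum over all contexts. */
static double global_gl_seconds(const ContextTime *ctx, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) total += ctx[i].gl_seconds;
    return total;
}

/* Estimated percent of elapsed wall-clock time spent in OpenGL. With
 * several threads calling GL in parallel this can exceed 100%. */
static double percent_in_gl(double gl_seconds, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return 100.0 * gl_seconds / elapsed_seconds;
}
```

For example, contexts that spent 1.5 s and 1.0 s in GL during a 10 s run give 25%, the "about a quarter" case above.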

What this number tells you is whether Profiler is the tool you want to keep using to measure performance, or whether you want to move on to something like Shark and the CHUD tools to work on your performance on the CPU side. We'll go into more detail later about how to balance those two.

Okay, with that, I'm going to turn it over to Chris, who's going to show us a demo. Okay, so we're on computer number three. Good. So here's the new Profiler window, and as you can see, it's actually a little smaller than what we saw in Dave's screenshot. This is how it starts up by default. We've constructed an application that shows some of the common pitfalls we see in a lot of the applications that use OpenGL on Mac OS X today. And this application, we call it... what did we call it?

Trytip Bandit. That is its name. We wrote it so that we could run it in OpenGL Profiler, use these tools, and improve its performance. One of the first things you generally want to do is figure out what percentage of the time your application spends in OpenGL and what it's doing with that time. We've got the application right here, ready to launch, so let's launch it.

So here it is. It's running just as it would if I started it from the Finder. One thing you may notice is that the frame rate is already showing at the bottom of the Profiler panel, because Profiler captures the OpenGL calls non-invasively and displays this information. We're getting around 33 frames per second right now. So let's find out where the time in each of those frames is going. I'm going to check "Collect Statistics" right here.

[Transcript missing]

So what I'm going to do is disable that function. I'm going to open up the Breakpoints window. This window lists all the OpenGL functions, and you can control them in different ways. If you don't understand all of it yet, don't worry; we're going to go over how to use this window in depth a little later on. So I'm going to look for glFinish.

Here's the command. We see this Execute column, and I'm simply going to turn it off for glFinish. Already the function has turned red in our statistics, which means we're no longer calling it. I'm going to clear the statistics and check again, and we can confirm that glFinish no longer appears.
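Conceptually, unchecking Execute means the wrapper still sees the application's call but never forwards it to the engine. This sketch models that with a hypothetical `FunctionControl` flag and a stand-in `engine_finish`; it illustrates the idea rather than Profiler's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function control flags, like the Execute column. */
typedef struct {
    bool execute;              /* forward the call to the engine? */
    unsigned long seen;        /* calls made by the application */
    unsigned long forwarded;   /* calls that reached the engine */
} FunctionControl;

/* Stand-in for the real glFinish. */
static unsigned long engine_finish_count = 0;
static void engine_finish(void) { engine_finish_count++; }

static FunctionControl finish_ctl = { true, 0, 0 };

/* What the application actually calls once the profiler is attached. */
static void profiled_finish(void) {
    finish_ctl.seen++;
    if (!finish_ctl.execute) return;   /* Execute unchecked: skip the engine */
    finish_ctl.forwarded++;
    engine_finish();
}
```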

So we're already about 5% faster, from what I've measured, just by taking out that function. Profiler lets you do all of this on the fly. So, with that, back to you, Dave. Thanks, Chris. I want to mention here that Chris and I work about 200 miles apart or so, and I had never seen this demo until now. It was awesome. So thanks.

Okay, let's go on to another section of usage data that we harvest with Profiler. This is a call trace. In the last demo, you saw that we capture every function and time it. And in this usage data capture, we grab every function as you call it and store it.

So you've got a whole trace of all the GL function calls you make, as you make them. You can see here that the output is kind of C-style. You couldn't actually just grab this and compile it without getting errors, but it does print out the symbolic names of the parameters.

That makes it a little easier to read. I want to point out a couple of new features for Tiger. One is that you can apply a filter to this trace. A couple of you developers out there had the excellent idea of taking this text and running it through various Python and Perl scripts to do statistical analysis, so we built that right in. You just say "Enable filter," pick the filter, and it pushes the trace right through and shows you the output. The other new feature for Tiger is timing information per function call.
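The talk mentions developers running the trace text through Python and Perl scripts. As a C analogue of such a filter, here is a sketch that tallies calls per function name from C-style trace lines. `TraceSummary`, `summarize_line`, and `count_for` are hypothetical, and the sketch assumes each call line starts with the function name followed by `(`.

```c
#include <assert.h>
#include <string.h>

#define MAX_FUNCS 64
#define MAX_NAME 64

/* One tally per function name seen in the trace. */
typedef struct {
    char name[MAX_NAME];
    unsigned long count;
} Tally;

typedef struct {
    Tally entries[MAX_FUNCS];
    int used;
} TraceSummary;

/* Counts one trace line such as "glVertex3f(1.0, 2.0, 3.0);".
 * Lines without '(' are ignored. */
static void summarize_line(TraceSummary *s, const char *line) {
    assert(s != NULL && line != NULL);
    while (*line == ' ' || *line == '\t') line++;   /* skip indentation */
    const char *paren = strchr(line, '(');
    if (paren == NULL) return;
    size_t len = (size_t)(paren - line);
    if (len == 0 || len >= MAX_NAME) return;
    for (int i = 0; i < s->used; i++) {
        if (strlen(s->entries[i].name) == len &&
            strncmp(s->entries[i].name, line, len) == 0) {
            s->entries[i].count++;
            return;
        }
    }
    if (s->used == MAX_FUNCS) return;               /* table full: drop it */
    Tally *t = &s->entries[s->used++];
    memcpy(t->name, line, len);
    t->name[len] = '\0';
    t->count = 1;
}

static unsigned long count_for(const TraceSummary *s, const char *name) {
    for (int i = 0; i < s->used; i++)
        if (strcmp(s->entries[i].name, name) == 0) return s->entries[i].count;
    return 0;
}
```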

Earlier you saw in the Statistics window that the timing information is cumulative over all the calls to a function. So when it reads glVertex3f, for example, that time is the sum of all your glVertex3f calls. Here, you get the time for just the individual call on that line of the list.

What that's really useful for is finding hotspots. You might have many calls to, say, CGLFlushDrawable (I'm just pulling this out of the air) where most take a short amount of time and one or two take super long because of state changes and things like that.

In the Statistics window, that function is going to show up as taking a long time cumulatively. But what you really want is to narrow down the one or two calls that are soaking up all the time and figure out why. The per-call timing coming in Tiger lets you find those really fast. Plus, each of these lines will have a disclosure that reveals a full backtrace.

So you can click on the function and find out where in your code that specific call was made, and narrow it down. Again, this is all per context or global, so if you have a bunch of contexts, you can narrow it down to the function calls in just one context.
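One way to pick out the "one or two that are super long" from per-call timings is to flag calls that exceed some multiple of the median. The threshold rule and helper names below are assumptions for illustration, not something Profiler is documented to do; the median is used because, unlike the mean, it isn't dragged upward by the very outliers you're looking for.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n per-call durations (n > 0), computed on a private copy. */
static double median_duration(const double *t, size_t n) {
    assert(n > 0);
    double *copy = malloc(n * sizeof *copy);
    if (copy == NULL) abort();
    memcpy(copy, t, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_double);
    double m = (n % 2) ? copy[n / 2] : 0.5 * (copy[n / 2 - 1] + copy[n / 2]);
    free(copy);
    return m;
}

/* Writes the indices of calls slower than factor * median into out
 * (which must hold n entries) and returns how many were found. */
static size_t find_hotspots(const double *t, size_t n, double factor, size_t *out) {
    double limit = factor * median_duration(t, n);
    size_t found = 0;
    for (size_t i = 0; i < n; i++)
        if (t[i] > limit) out[found++] = i;
    return found;
}
```

With durations {0.1, 0.12, 0.09, 5.0, 0.11, 4.2} and a factor of 10, the median is 0.115, so calls 3 and 5 are reported.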

Okay, and I think with that we're going to turn it back over to Chris. Back to computer three? Thanks. I have a second application here, which I've already taken the glFinish out of, and I'm going to demonstrate how to pass vertices from your application to OpenGL more efficiently. So let's launch this application.

As Dave was showing, OpenGL Profiler lets you get a trace of all the OpenGL functions being called, and I'm going to go ahead and do that. So let's click this button, "Create Trace." And I'm going to stop it, because otherwise I might fill up the hard drive.

Looking through this trace, we see a lot of glBegin/glEnd-style calls: glBegin, glVertex, glVertex, glVertex, glEnd. For static data like this, it would be more efficient to use display lists. For instance, the land here is all static, yet I'm passing it down through immediate mode.

To change this, you can use vertex array range, vertex buffer objects, or display lists. You can pick whichever type you feel is most appropriate for the data you're drawing, and use it to take better advantage of the video card and the bandwidth of the system.

One thing to note is that immediate mode requires a lot of calls, with all the glBegin/glEnd pairs, whereas a display list is drawn with a single call, glCallList.
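A quick back-of-the-envelope count shows why the call overhead adds up. This sketch assumes a static mesh drawn as separate triangles, with one GL call per vertex attribute (for example glColor4f, glTexCoord2f, glVertex3f); the helper names are hypothetical.

```c
#include <assert.h>

/* Rough number of GL entry-point calls per frame in immediate mode.
 * attribs_per_vertex counts calls such as glColor4f, glTexCoord2f and
 * glVertex3f issued for every vertex. */
static unsigned long immediate_mode_calls(unsigned long triangles,
                                          unsigned attribs_per_vertex,
                                          int begin_end_per_triangle) {
    unsigned long vertex_calls = triangles * 3UL * attribs_per_vertex;
    unsigned long bracket_calls = begin_end_per_triangle ? 2UL * triangles : 2UL;
    return vertex_calls + bracket_calls;
}

/* The same mesh compiled once into a display list replays with one
 * glCallList per frame. */
static unsigned long display_list_calls(void) { return 1UL; }
```

For 10,000 triangles with three attributes per vertex and a glBegin/glEnd pair per triangle, that's 110,000 calls per frame versus a single glCallList.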

Also, as John Stauffer went over in his optimization talk earlier today, there's a somewhat different approach: if you stay with immediate mode, you can use CGL macros to cut down on the overhead of each function call, basically the overhead of making the function call itself.

In this particular application, simply switching to CGL macros would definitely get us a good gain. But I've already written a version that uses display lists. First, let's check the statistics again. Sorting by GL time, we can see glVertex3f, glTexCoord2f, glVertex3d, glColor4f, glBegin, and glEnd. All of these are immediate-mode calls, and we're getting about 35 frames per second. So I'm going to stop this application and start one that uses display lists and vertex array range.

That alone gets us up to 160, 170 frames per second, about 165, simply by passing the data using display lists and vertex array range. Let's go back to the statistics. Now most of our time is spent in glCallList, and most of the rest is spent in CGLFlushDrawable, which basically means we're waiting for the video card to be ready for more data. So we're using OpenGL pretty efficiently on the CPU side now. So back to you. All right, thanks, Chris.

All right, let's move on to application control and some of the Profiler features for it. One way you can control your app is by setting a custom pixel format. What we do here is inject a different pixel format than the one you asked for in your code, again without recompiling your application or changing any code.

We can, for example, change the depth buffer size. Say you want to see whether your app will run with a 16-bit Z-buffer instead of a 32-bit one. You can do that through Profiler without having to recompile your app. You do have to rerun it.

The other thing you can do is what we call driver emulation. We don't actually fully emulate the graphics drivers, because we can't: it's hardware, and there's all kinds of stuff involved in there. But what Profiler can do, as a runtime system, is intercept the glGet calls and make it look like you're running on another card.

This is useful if you want to make sure your app follows the correct code paths. For example, you might have different code paths depending on what glGetString returns, because you're looking for different card features, and you enable or disable certain functions in menus accordingly. I see this in games all the time: the menu changes to let you turn on certain features.
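The code paths that driver emulation exercises usually hinge on checks against the extension string. Here is a sketch of such an app-side check; `has_extension` is a hypothetical helper, and in a real app the string would come from glGetString(GL_EXTENSIONS). It matches whole space-delimited tokens, because a plain `strstr` would falsely find "GL_ARB_texture_env" inside "GL_ARB_texture_env_combine".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `name` appears as a whole, space-delimited token in a GL
 * extension string. */
static bool has_extension(const char *extensions, const char *name) {
    if (extensions == NULL || name == NULL) return false;
    size_t len = strlen(name);
    if (len == 0) return false;
    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == extensions) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends) return true;
        p += len;   /* keep searching after this partial match */
    }
    return false;
}
```

Running an app under an emulated card changes what this kind of check sees, which is how you can walk the alternate menu and feature paths on hardware you don't have.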


[Transcript missing]

Your app thinks it's something else, say an NVIDIA card, when really it's an ATI. So use it with care, but it does have its uses. Another way to control the application, which Chris already showed in his glFinish demo, is enabling and disabling GL calls. If you want to see what your app looks like without ever calling glFinish, just turn it off. And you saw that the app not only looked the same, but ran way faster.

You can also attach scripts at breakpoints. That means you can write little pieces of GL code, and Profiler will inject them into your application at breakpoints while it runs. We're going to see an example of that later on.

Okay, this is the Breakpoints window, which Chris showed earlier. Like a debugger, it lets you set breakpoints. But unlike a debugger, you can set them only on certain functions, namely the GL calls. So this is not a general debugger feature; it's a way to stop your app on specific GL calls.

And the interesting thing is that you can stop just before the call goes into the engine, or right after it comes back from the engine. This is useful because, along with a backtrace,

[Transcript missing]

And again, this handles the multiple context case.

Now, just like a debugger, if you have a bunch of threads running and you put a breakpoint on, say, function foo, it's going to stop in every thread. It's the same here: if you put a breakpoint on glFlush, it's going to stop in every context and every thread that calls it.

Okay, and with that, I'm going to go back to Chris for catching OpenGL errors. So, OpenGL Profiler is good for breakpoints, and one type of breakpoint you can set is a break on any GL error. What I'm going to do is run my application this time with a breakpoint set for any time a GL error occurs.

So I'm going to go up to Views and open the breakpoints, and we can see a list of the different types of errors we can break on. I'm going to choose Break on Error, which refers to the normal GL error, and start this program up.

Already it's caught a GL error. It says I'm calling glEnable with GL_BLEND_EQUATION, when in fact the blend equation is not something that can be enabled and disabled. GL_BLEND is what's actually supposed to be there. So we get the error GL_INVALID_ENUM, and we can also see the backtrace and the actual line of code.

Somewhere here you can see the line of code where this error occurs. Using this, I was able to quickly find where it was and correct it to GL_BLEND instead of GL_BLEND_EQUATION. And I'll show you the result. We got one taker. Let's see. If you noticed the function backtrace, that was in drawSky, so blending was basically not enabled for the sky. So let's start up the version without errors. I'll set that breakpoint again. Let's see.

Break on error. Start it up. And as we can see, it's not breaking on any errors, and now we've got the clouds blending pretty well. Let's go back to you, Dave, and you can explain some of the other types of errors that you can break on. Thanks, Chris.
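To show why that call produces GL_INVALID_ENUM, here is a toy model (not a real GL implementation) of how glEnable treats an unknown capability: the state is left untouched and the error flag is set, to be read and cleared by glGetError. The enum values are the ones from the OpenGL headers; `toy_glEnable` and `toy_glGetError` are hypothetical names.

```c
#include <assert.h>

/* Enum values as defined in the OpenGL headers. */
#define GL_NO_ERROR        0x0000
#define GL_INVALID_ENUM    0x0500
#define GL_CULL_FACE       0x0B44
#define GL_DEPTH_TEST      0x0B71
#define GL_BLEND           0x0BE2
#define GL_BLEND_EQUATION  0x8009

static unsigned toy_error = GL_NO_ERROR;
static int toy_blend_enabled = 0;

/* Toy glEnable: valid caps change state, anything else records
 * GL_INVALID_ENUM (only the first error is kept until it is read). */
static void toy_glEnable(unsigned cap) {
    switch (cap) {
    case GL_BLEND:      toy_blend_enabled = 1; break;
    case GL_CULL_FACE:
    case GL_DEPTH_TEST: break;                 /* valid, not tracked here */
    default:
        if (toy_error == GL_NO_ERROR) toy_error = GL_INVALID_ENUM;
        break;
    }
}

/* Like glGetError: return the recorded error and clear it. */
static unsigned toy_glGetError(void) {
    unsigned e = toy_error;
    toy_error = GL_NO_ERROR;
    return e;
}
```

In the model, as in the demo, the bad call leaves blending off, which is why the clouds didn't blend until the enum was corrected.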

Okay, we saw breaking on OpenGL errors, which is very useful. Another kind of error you can trap in your app is a thread conflict. John talked earlier about multiple contexts and multiple threads, and what's legal and what's not. You can have more than one thread talking to a single GL context, but it's up to you to make sure the threads lock correctly so that no more than one thread is in the context at the same time. If you end up with more than one thread in the same context at once, you can get all kinds of funky data corruption problems, and bad things can happen to your computer. So you don't want that.

What's really happening in a thread conflict is that you're supposed to hold mutex locks on the threads if they're going to talk to one context. Profiler has a mode where it applies the locks you're supposed to have. So if threads are going to conflict, one of them will trip over one of those locks, and Profiler will stop and say, "Hey, there's an error here."

Then you can go back into your app, again by using the back trace. And you can apply the locks and clean it up. Personally, I recommend that you have one context per thread, but that's just my personal opinion. Do what you want. Now, this threading collision stuff is only detected in the OpenGL APIs. We don't detect it in the CGL layer. So you're on your own there.
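The detection idea can be sketched with a POSIX mutex: each GL entry point tries the context's lock, and if it's already held, two callers are inside the same context at once. `ContextGuard` and its functions are hypothetical; this models the concept, not Profiler's actual mechanism. In your own app, the fix is simply to hold a real lock (or use one context per thread) around each batch of GL calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical guard around one shared GL context. */
typedef struct {
    pthread_mutex_t lock;
    unsigned long conflicts;
} ContextGuard;

static void guard_init(ContextGuard *g) {
    int rc = pthread_mutex_init(&g->lock, NULL);
    assert(rc == 0);
    (void)rc;
    g->conflicts = 0;
}

/* Returns 1 if the call may proceed (lock taken), 0 on a conflict. */
static int guard_enter(ContextGuard *g) {
    if (pthread_mutex_trylock(&g->lock) != 0) {
        g->conflicts++;
        fprintf(stderr, "thread conflict: context already in use\n");
        return 0;
    }
    return 1;
}

static void guard_leave(ContextGuard *g) { pthread_mutex_unlock(&g->lock); }
```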

Another way we can detect errors is the Break on VAR Error option up there. It stops on vertex array range and vertex array object errors. In sum, these four points say that any time an index you're drawing with strays outside an array range you've specified, or you hand in a pointer that's outside one of those ranges that you haven't properly set up, we'll stop and break. And again, we show you the backtrace, the full GL state, everything you need to see. We validate your vertex array range on any of the functions you see up there.
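The two checks described above boil down to bounds tests. This sketch is a hypothetical stand-in for that validation: every index a draw call uses must address one of the declared vertices, and every byte span handed in must lie inside the declared range. Offsets are used instead of raw pointer comparisons to keep the C well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Position of the first index that addresses a vertex outside an array
 * of vertex_count vertices, or -1 if every index is valid. */
static long first_bad_index(const unsigned short *indices, size_t count,
                            size_t vertex_count) {
    assert(indices != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if ((size_t)indices[i] >= vertex_count) return (long)i;
    return -1;
}

/* True if [offset, offset + bytes) lies inside a range of range_bytes.
 * Written to avoid overflow in offset + bytes. */
static bool span_in_range(size_t offset, size_t bytes, size_t range_bytes) {
    return offset <= range_bytes && bytes <= range_bytes - offset;
}
```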

Okay, we talked about the full snapshot of OpenGL state. It is a full snapshot: every glGet call you can make is done right here, and we put the results in this disclosure list. The state is gathered every time you stop at a breakpoint.

Changes in state since the last breakpoint are shown in red. To show that, in this screenshot I've set a stop on glEnable before it goes into the engine, and another stop as soon as it comes out.

Since it's glEnable, I'd expect the state I'm changing to be turned on, right? And that's in fact what happens. You can see down here at the bottom it says it broke after glEnable. In other words, the call has gone into the engine, come back out, and stopped again.

And then you can see there that GL_CULL_FACE is now enabled. So this is really useful for detecting errors where you think you have state set up that may not be, or state that's set with incorrect values. You can watch the change. We've got another take for that. Awesome.
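Highlighting "what changed since the last breakpoint" is a snapshot diff. This sketch uses a hypothetical three-entry snapshot; a real snapshot covers every glGet query.

```c
#include <assert.h>
#include <stdio.h>

#define STATE_COUNT 3

/* Hypothetical state snapshot captured at a breakpoint. */
typedef struct {
    const char *names[STATE_COUNT];
    double values[STATE_COUNT];
} StateSnapshot;

/* Prints each entry that changed between two breakpoints (the entries a
 * viewer would show in red) and returns how many changed. */
static int diff_state(const StateSnapshot *before, const StateSnapshot *after) {
    assert(before != NULL && after != NULL);
    int changed = 0;
    for (int i = 0; i < STATE_COUNT; i++) {
        if (before->values[i] != after->values[i]) {
            printf("%s: %g -> %g\n", after->names[i],
                   before->values[i], after->values[i]);
            changed++;
        }
    }
    return changed;
}
```

Stopping before and after glEnable(GL_CULL_FACE) would yield exactly one changed entry, printed as `GL_CULL_FACE: 0 -> 1`.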

Okay, a couple of quick points on this window. Under that Actions pull-down there are shortcut menu options to stop everywhere before, stop everywhere after, or stop nowhere; they turn all those buttons on or off. You can also execute no GL functions at all.

We've had examples of this in the lab, where people say, "Your graphics are slow, and my app runs slow because your graphics just aren't up to par." So we said, "Okay, we'll take your app and turn off all the graphics," and notice it runs at the same speed.

Guess what? So you can run your app open loop and decide, "Oh, well, maybe I'd better get Shark and the CHUD tools out and make that go a little faster first before I start blaming people randomly." Not that I've ever done that. And of course, you can ignore all the breakpoints, too, if you just want to run your app without stopping anywhere.

Okay, now with that, we'll turn it back over to Chris, and we'll talk about unnecessary state changes. I already talked about immediate mode, and how making a lot of calls results in function-call overhead. The same holds true for setting state, except that the actual setting of the state can itself take time.

And even if setting the state doesn't take much time itself, some state changes are deferred until your draw command, which makes your draw commands go a bit slower. So in the statistics, for instance, you'll see glDrawArrays taking longer than usual because you've accidentally turned something on, turned it on multiple times, or switched some state you didn't need to switch.

So developers should try to avoid state changes when they can. Keep in mind that OpenGL, as a state machine, does keep track of what you're doing. But depending on the type of state you're setting, it may be more efficient for you as a developer, knowing the semantics of your application, to track that state yourself and decide whether or not to make the state change at all. So I'm going to launch the application here.

And I'm going to go look at the statistics. One thing I wanted to reiterate from what Dave said earlier: the estimated percent time in GL is a really useful thing to look at. Here we see that 91% of the application's time is going to OpenGL.

Depending on your application, that percentage will be different. But in this particular application I'm just pounding on the graphics hardware; I'm not doing any CPU calculations such as physics or anything similar. Because of that, I have a pretty high percentage of time in OpenGL. Sometimes a higher percentage of OpenGL time is better, because it means you're giving more data to OpenGL in general. Now, I wanted to look in particular at the number of calls to glEnable and glDisable.

If you compare the number of calls between those two, you'll notice they're very different: there are 190,000 glDisable calls but only 125 glEnable calls. Obviously that's not necessary; it means there's some sort of imbalance there. And that's just one example of an unnecessary state change.

By taking that state change out, you don't just gain the time spent in the function itself; here, the average time for glEnable and glDisable is very small. The cost may actually be showing up in your other calls, such as glBegin and similar drawing commands. So, back to you. That's on an ATI Radeon 9600 card in a dual 2 GHz machine.
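Tracking state yourself, as suggested above, can be as simple as a shadow cache in front of glEnable/glDisable. This is a minimal sketch under stated assumptions: `CapCache`, `cap_slot`, and `cache_set` are hypothetical, only three capabilities are tracked, and the `forwarded` counter stands in for the real GL call. The cache only works if every change to these caps goes through it; code that calls glEnable directly would leave it stale.

```c
#include <assert.h>
#include <stdbool.h>

/* Enum values as defined in the OpenGL headers. */
#define GL_CULL_FACE  0x0B44
#define GL_DEPTH_TEST 0x0B71
#define GL_BLEND      0x0BE2

/* Hypothetical application-side shadow of a few enable/disable caps. */
typedef struct {
    bool known[3];             /* has this cap been set through the cache yet? */
    bool enabled[3];
    unsigned long forwarded;   /* calls that would reach glEnable/glDisable */
} CapCache;

static int cap_slot(unsigned cap) {
    switch (cap) {
    case GL_CULL_FACE:  return 0;
    case GL_DEPTH_TEST: return 1;
    case GL_BLEND:      return 2;
    default:            return -1;
    }
}

/* Forward only real changes; redundant requests are dropped. */
static void cache_set(CapCache *c, unsigned cap, bool on) {
    int s = cap_slot(cap);
    assert(s >= 0);
    if (c->known[s] && c->enabled[s] == on) return;   /* redundant: skip */
    c->known[s] = true;
    c->enabled[s] = on;
    c->forwarded++;   /* real code would call glEnable(cap) or glDisable(cap) */
}
```

With this in place, 190,000 back-to-back disables of an already-disabled cap collapse into a single forwarded call.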

Let's talk about some of the graphics state your application keeps. There's a distinction between state in OpenGL, which is a state machine, and graphics state that your application owns. Your app owns things like textures, vertex programs, and, as this slide shows, a depth buffer, back buffer, and things like that. It's not, strictly speaking, GL engine state.

But because it's important, especially when you're debugging and doing performance analysis, to know what's going on with that state, Profiler captures it all, too. This view is the depth buffer. Profiler grabs the Z-buffer and renders it as grayscale: black pixels are minimum Z and white pixels are maximum Z.

That slider at the top shows your Z range. When you bring up the depth buffer and click that magnifying glass, Profiler automatically analyzes the image and says, "All right, your minimum Z value in the depth buffer is," in this case around 0.3 or 0.4, "and your maximum is 1." Now, the first thing I want to say is that in OpenGL, by default, the Z values in the depth buffer are always between 0 and 1. There is a way to change that, but generally speaking, the floating-point range of Z in the depth buffer is 0 to 1.
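The grayscale view and the magnifying-glass auto range can be pictured as two small steps: find the minimum and maximum depth, then stretch that window onto black-to-white. The helper names are hypothetical; the talk doesn't say how Profiler implements the view.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest and largest depth values in a buffer of n > 0 samples. */
static void depth_min_max(const float *depth, size_t n, float *lo, float *hi) {
    assert(n > 0);
    *lo = *hi = depth[0];
    for (size_t i = 1; i < n; i++) {
        if (depth[i] < *lo) *lo = depth[i];
        if (depth[i] > *hi) *hi = depth[i];
    }
}

/* Maps depth in [lo, hi] to 8-bit gray: lo -> 0 (black), hi -> 255
 * (white). Values outside the window are clamped. */
static unsigned char depth_to_gray(float z, float lo, float hi) {
    if (hi <= lo) return 0;
    float t = (z - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return (unsigned char)(t * 255.0f + 0.5f);
}
```

Stretching the window is what makes a narrow range such as 0.996 to 1.0 visible at all; displayed over the full 0 to 1 it would look almost uniformly white.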

The idea here is to show how much Z precision you're using. One of the common problems we run into, which Chris is going to demo later on, is something we call Z-fighting; that's our colloquial term for it. It shows up as little flashing polygons, because there isn't enough precision in your depth buffer to tell consistently which part is in front and which part is behind.

This view is the first step in seeing whether you have enough precision. If that orange bar at the top is really tiny, you've got almost no Z precision. The wider that bar is, the more Z precision you have, so that's what you're striving for.

You affect Z precision by changing the near and far planes in your glFrustum call. This has sometimes been a point of confusion, especially on the OpenGL mailing list: the range of the values in the Z-buffer is not affected by glFrustum. They always go from zero to one.

What glFrustum changes is how many of those bits you actually use for the Z compare in the Z-buffer. Make sense? In other words, you don't want just the top two bits being used for all your Z compares. You want to try to use all 32, or all 16, or whatever your depth is, and your near and far planes are the determining factors for how much of that precision you use. The actual values are always 0 to 1.
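To put numbers on this, recall that with glFrustum (which requires 0 < near < far) and the default glDepthRange(0, 1), a point at eye distance d lands at window depth z = f / (f - n) * (1 - n / d), so d = n maps to 0 and d = f maps to 1. The sketch below evaluates that formula and estimates how many distinct fixed-point depth values fall between two distances; the helper names are hypothetical.

```c
#include <assert.h>

/* Window-space depth for a point at eye distance d (> 0) under a
 * glFrustum projection with near n and far f (0 < n < f) and the
 * default depth range [0, 1]:  z = f / (f - n) * (1 - n / d). */
static double window_depth(double n, double f, double d) {
    assert(n > 0.0 && f > n && d > 0.0);
    return f / (f - n) * (1.0 - n / d);
}

/* Approximate number of distinct depth-buffer values between eye
 * distances d1 < d2 for a fixed-point buffer with `bits` bits. */
static double depth_steps(double n, double f, double d1, double d2, int bits) {
    assert(bits > 0 && bits < 64);
    double levels = (double)((1ULL << bits) - 1ULL);   /* 2^bits - 1 */
    return (window_depth(n, f, d2) - window_depth(n, f, d1)) * levels;
}
```

Because of the 1 - n/d term, the near plane dominates. With a 16-bit buffer and objects between distances 10 and 40, a frustum of 1 to 40 gives about 5,000 distinct values, and pushing the far plane out to a million only drops that to about 4,900, but pulling the near plane in to 0.01 leaves only about 50. So keeping the near plane as far out as your scene allows usually matters most.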

Okay, another kind of buffer you can look at with Profiler is the stencil buffer. Profiler pseudo-colors the stencil planes. The way you use a stencil buffer is by setting individual bit planes, so Profiler pseudo-colors each of those. In this example, three bit planes are being used in the stencil buffer.

Profiler pseudo-colored them blue, green, and red. You can see an area of black where no stencil bit is set at all. Then there are areas in red, green, and blue where individual planes have stencil bits set. Where it's purple, Profiler has composited bit planes together into another pseudo-color: the purple areas are where you have both red and blue set, so both of those bit planes have been set by your rendering.
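The compositing rule described above can be sketched as one color channel per bit plane, summed where several planes are set. The assignment of bit 0 to blue, bit 1 to green, and bit 2 to red, and the `stencil_pseudo_color` helper itself, are assumptions for illustration.

```c
#include <assert.h>

typedef struct { unsigned char r, g, b; } Rgb;

/* Hypothetical pseudo-coloring of the low three stencil bit planes
 * (assumed mapping: bit 0 -> blue, bit 1 -> green, bit 2 -> red).
 * Several set planes add their colors, so red + blue gives purple;
 * no bits set stays black. */
static Rgb stencil_pseudo_color(unsigned stencil) {
    assert(stencil <= 0xFFu);   /* assuming an 8-bit stencil buffer */
    Rgb c = { 0, 0, 0 };
    if (stencil & 0x1u) c.b = 255;
    if (stencil & 0x2u) c.g = 255;
    if (stencil & 0x4u) c.r = 255;
    return c;
}
```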

Other buffer views you can get are the back buffer, which is pretty straightforward; it just looks like the front buffer before it got swapped. You can look at the alpha buffer, which is also shown in grayscale. And you can look at all your auxiliary buffers: however many you asked for in your pixel format, or however many the engine or card supports, that's how many you can look at.

Buffer views are all static. They're just what your app put in there. You can't edit them, shove them back in, and ask, "What happens if I had a much bigger Z precision range than I really do?" It's just reporting what you did, so they're static images.

Okay, now with that, I'll turn it over to Chris. I'm going to show an example of looking at those buffers. Looking closely at this application, we can see in the background, where the water and the land meet, that the land doesn't quite look right; there's not a smooth line there. I think this is Z-fighting. The first thing I'd do to check is take a look at the depth buffer.

To do this, I have to set a breakpoint to specify exactly where I want to look at the buffer. I'm going to set a breakpoint right before glClear is called. That means everything for this frame is done and it's about to go on to the next frame. But since I set the breakpoint before the call, glClear won't actually execute until I continue. So I set my breakpoint and I go up to Views.

Let's look at the depth buffer. You can set the sliders here, but I'm just going to use Auto Find Min/Max, and we can see that the range between the min and the max Z values we're using is actually very small. This is a 32-bit depth buffer in this case, but we're only using a very small slice of it, right up near 1.0.
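Conceptually, an "auto find min/max" depth view finds the smallest and largest depth values actually present and stretches that band across the full gray range. That is what makes a narrow slice such as 0.996 to 1.0 visible at all. The C sketch below is a hypothetical illustration of that idea, assuming the depth values have already been read back as floats. It is not Profiler's code.

```c
#include <stddef.h>

/* Finds the smallest and largest depth values in the buffer (count > 0). */
void depth_min_max(const float *depth, size_t count, float *out_min, float *out_max)
{
    float lo = depth[0], hi = depth[0];
    for (size_t i = 1; i < count; i++) {
        if (depth[i] < lo) lo = depth[i];
        if (depth[i] > hi) hi = depth[i];
    }
    *out_min = lo;
    *out_max = hi;
}

/* Stretches [lo, hi] to 0..255 gray so a narrow depth band becomes visible.
 * Values outside the band are clamped; an empty band maps to black. */
void depth_to_gray(const float *depth, unsigned char *gray, size_t count,
                   float lo, float hi)
{
    float span = hi - lo;
    for (size_t i = 0; i < count; i++) {
        float t = (span > 0.0f) ? (depth[i] - lo) / span : 0.0f;
        if (t < 0.0f) t = 0.0f;
        if (t > 1.0f) t = 1.0f;
        gray[i] = (unsigned char)(t * 255.0f + 0.5f);
    }
}
```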

So the range gets set from 0.996 to 1.0. What we'd like is for that range to be a lot bigger; we'd like to use a lot more of that 0 to 1. So let's figure out why we're using so little of the Z buffer. I'm going to set a breakpoint on glFrustum.

I think I only call glFrustum when I resize the window. Here we go, glFrustum. We see the frustum is being set with left, right, bottom, and top, and then these last two values are the z min and the z max: the near and far planes.

So we're going from 1 to 100,000 or a million or something like that, which is pretty large considering that I'm only really drawing out to about 40. That's what's making our depth buffer look incorrect. So I'm going to show this same application with the frustum modified so it clips at 40.
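The arguments to glFrustum are left, right, bottom, top, zNear, and zFar. The C sketch below builds the matrix documented for glFrustum, in OpenGL's column-major order, so you can see where the last two values enter the depth terms. It also mirrors the argument checks OpenGL performs. In particular, glFrustum rejects a near value of 0, so the near plane must always be a small positive distance. The function names are our own; nothing here calls OpenGL.

```c
#include <stdbool.h>

/* Builds the matrix glFrustum(l, r, b, t, n, f) multiplies onto the current
 * matrix, stored column-major as OpenGL does. Returns false for the argument
 * combinations glFrustum rejects with GL_INVALID_VALUE: a near or far value
 * that is not positive, l == r, b == t, or n == f. */
bool frustum_matrix(double l, double r, double b, double t,
                    double n, double f, double m[16])
{
    if (n <= 0.0 || f <= 0.0 || l == r || b == t || n == f)
        return false;
    for (int i = 0; i < 16; i++)
        m[i] = 0.0;
    m[0]  = 2.0 * n / (r - l);          /* row 0, column 0 */
    m[5]  = 2.0 * n / (t - b);          /* row 1, column 1 */
    m[8]  = (r + l) / (r - l);          /* row 0, column 2 */
    m[9]  = (t + b) / (t - b);          /* row 1, column 2 */
    m[10] = -(f + n) / (f - n);         /* row 2, column 2: depth scale */
    m[11] = -1.0;                       /* row 3, column 2: w = -z_eye */
    m[14] = -2.0 * f * n / (f - n);     /* row 2, column 3: depth offset */
    return true;
}

/* NDC depth of the eye-space point (0, 0, eye_z, 1), with eye_z < 0 in front
 * of the camera: clip z divided by clip w. -1 at the near plane, +1 at far. */
double ndc_depth(const double m[16], double eye_z)
{
    double clip_z = m[10] * eye_z + m[14];
    double clip_w = m[11] * eye_z + m[15];
    return clip_z / clip_w;
}
```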

Again, I'm going to look at the depth buffer. We can see already that the water is looking much nicer; we've got clean lines where we used to have Z-fighting issues. Let's set a breakpoint on glClear. The depth buffer view updated automatically, and we're actually using a lot more of that range. So in effect, we've gotten rid of those issues. Back to you. All right. Thanks, Chris.

Z fighting has, in the past, been a really hard one to find. We get a lot of chatter on the OpenGL list along the lines of "my polygons keep flashing in and out." It's not obvious that your Z precision is related to the glFrustum call. glFrustum doesn't seem to have anything to do with the depth buffer, right? So there's no instant correlation.

Okay, more of the application's graphics state. Profiler will capture all the textures you're uploading, plus your vertex programs and fragment programs, and you can look at those to verify for yourself that you really did upload what you think you uploaded. One place where this is really useful, and Chris is going to show this later, is with your mipmaps. You can get a lot of weird texturing errors when you think you've got a mipmap up there that you really don't. This screenshot is showing a cube map.

What Profiler does there is capture each of the six individual faces that go on the cube map and stick them on a cube. You can rotate that around, and it will show you which map is being applied to which face of the cube. So again, you can verify that you've uploaded the right texture to the right face.
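In your own code, OpenGL fixes the face order. The six upload targets run consecutively from GL_TEXTURE_CUBE_MAP_POSITIVE_X (0x8515) through GL_TEXTURE_CUBE_MAP_NEGATIVE_Z (0x851A), in the order +X, -X, +Y, -Y, +Z, -Z. The C sketch below defines those values locally so it compiles without GL headers. The face_info struct and the consistency check are hypothetical bookkeeping helpers. They mirror OpenGL's cube-completeness rule that all six faces be specified, square, and the same size.

```c
#include <stdbool.h>

/* Cube-map face targets (values match gl.h); face i is POSITIVE_X + i. */
enum {
    CUBE_POSITIVE_X = 0x8515, /* GL_TEXTURE_CUBE_MAP_POSITIVE_X */
    CUBE_NEGATIVE_X = 0x8516, /* GL_TEXTURE_CUBE_MAP_NEGATIVE_X */
    CUBE_POSITIVE_Y = 0x8517, /* GL_TEXTURE_CUBE_MAP_POSITIVE_Y */
    CUBE_NEGATIVE_Y = 0x8518, /* GL_TEXTURE_CUBE_MAP_NEGATIVE_Y */
    CUBE_POSITIVE_Z = 0x8519, /* GL_TEXTURE_CUBE_MAP_POSITIVE_Z */
    CUBE_NEGATIVE_Z = 0x851A  /* GL_TEXTURE_CUBE_MAP_NEGATIVE_Z */
};

/* Hypothetical record of what the app uploaded for one face. */
typedef struct { bool uploaded; int width, height; } face_info;

/* Upload target for face index 0..5. */
unsigned cube_face_target(int face)
{
    return (unsigned)CUBE_POSITIVE_X + (unsigned)face;
}

/* True if all six faces were uploaded, are square, and share one size. */
bool cube_map_faces_consistent(const face_info faces[6])
{
    for (int i = 0; i < 6; i++) {
        if (!faces[i].uploaded) return false;
        if (faces[i].width != faces[i].height) return false;
        if (faces[i].width != faces[0].width) return false;
    }
    return true;
}
```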

Plus there's a bunch of information up there about the internal format and the source format. So when you're looking at performance issues in terms of which texture formats the card is going to perform best with, you can see, oh, if I ask for a different internal format, maybe I'll get better performance. It shows the texture dimensions. There's a mipmap slider down there, which Chris is going to get into in more detail, and other little buttons and things like that, so you can flip the texture upside down.

And so with that, I'm going to turn it over to Chris. Okay. So we've got Profiler running here. This time when I launched the application, just like normal, I also set Collect Resources, and that brings up the Resources window right here. Right now I'm viewing the textures.

Looking at this application, we see it's a nice sunny day. It's a sunset, and it looks pretty warm here, but looking down at the ground, it looks kind of cold, like snow. That's actually because one of my textures isn't uploading correctly, and by default, when a texture isn't specified correctly, it shows up as white.

So let's go see why this texture is white. Here's a grass texture, but we don't see that grass in the scene. The Resources window shows all the other resources too; we've got sand, and clouds. The mipmap slider down here serves a dual purpose. When you have mipmapping enabled, it lets you slide between all of the mipmap levels and see each one, and when mipmapping is disabled, the slider is disabled. Let's look at one of the textures we know is uploaded correctly and go through its mipmaps. It looks like they're all specified correctly.

Now let's go back to the grass texture, and we notice that its mipmap levels have not been uploaded. To fix this, we could either turn off mipmapping for this particular texture or specify the mipmap levels for all of those. And this is a great way to show off the scripting ability in Profiler.
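Specifying "the mipmap levels for all of those" means every level from the base image down to 1x1. Each level halves the previous dimensions, never going below 1. If a mipmapping minification filter is in use and any of those levels is missing, OpenGL treats the texture as incomplete. The C sketch below counts the levels a complete chain needs and gives each level's size. The helper names are our own. In a real app you would then upload each level with glTexImage2D, or let GLU build them with gluBuild2DMipmaps.

```c
/* Number of levels in a complete mipmap chain for a width x height base
 * image: keep halving (never below 1) until both dimensions reach 1. */
int mip_level_count(int width, int height)
{
    int levels = 1;
    while (width > 1 || height > 1) {
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
        levels++;
    }
    return levels;
}

/* Dimensions of mipmap `level` (0 = base): max(1, size >> level).
 * `level` must be less than mip_level_count(width, height). */
void mip_level_size(int width, int height, int level, int *w, int *h)
{
    *w = width >> level;
    *h = height >> level;
    if (*w < 1) *w = 1;
    if (*h < 1) *h = 1;
}
```

For a 256x64 grass texture, for example, a complete chain has nine levels, ending at 2x1 and then 1x1.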

I'm going to disable mipmapping for the texture on the fly so that we can verify that this is why the texture isn't showing up. I'm going to go to the Breakpoints window, which is where you can set up your scripts, and I have a script that turns off mipmapping. A logical place to run it is after every glBindTexture call.

By running it after each glBindTexture call, I'm going to call glTexParameteri with the target GL_TEXTURE_2D and set GL_TEXTURE_MIN_FILTER to GL_LINEAR, as opposed to GL_LINEAR_MIPMAP_LINEAR. So let's go ahead and do that. I'm going to attach a script through the actions here.
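The decision the script makes can be sketched in plain C. If the current minification filter uses mipmaps and the texture is missing levels, it falls back to the non-mipmapped filter with the same within-level sampling. The MY_GL_* macros and helper names are hypothetical, and the completeness check is deliberately simplified. The numeric values are the standard gl.h enums, defined locally so the sketch compiles without GL headers.

```c
#include <stdbool.h>

/* Minification filter enums (values match gl.h). */
#define MY_GL_NEAREST                0x2600
#define MY_GL_LINEAR                 0x2601
#define MY_GL_NEAREST_MIPMAP_NEAREST 0x2700
#define MY_GL_LINEAR_MIPMAP_NEAREST  0x2701
#define MY_GL_NEAREST_MIPMAP_LINEAR  0x2702
#define MY_GL_LINEAR_MIPMAP_LINEAR   0x2703

bool min_filter_uses_mipmaps(unsigned filter)
{
    return filter >= MY_GL_NEAREST_MIPMAP_NEAREST &&
           filter <= MY_GL_LINEAR_MIPMAP_LINEAR;
}

/* Non-mipmapped filter with the same within-level sampling:
 * LINEAR_MIPMAP_* -> LINEAR, NEAREST_MIPMAP_* -> NEAREST. In a live GL
 * context the script's equivalent is
 * glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); */
unsigned non_mipmap_filter(unsigned filter)
{
    switch (filter) {
    case MY_GL_LINEAR_MIPMAP_NEAREST:
    case MY_GL_LINEAR_MIPMAP_LINEAR:
        return MY_GL_LINEAR;
    case MY_GL_NEAREST_MIPMAP_NEAREST:
    case MY_GL_NEAREST_MIPMAP_LINEAR:
        return MY_GL_NEAREST;
    default:
        return filter;
    }
}

/* Simplified completeness check that ignores format and size mismatches:
 * a mipmapping filter needs every level, otherwise only the base level. */
bool texture_complete_for_filter(unsigned min_filter,
                                 int levels_specified, int levels_required)
{
    if (min_filter_uses_mipmaps(min_filter))
        return levels_specified >= levels_required;
    return levels_specified >= 1;
}
```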

Let's open up my script. You know what, Chris, while you're doing that, I want to jump in here. Sure. You'll notice that whenever Chris is up there looking for functions, he's not moving the mouse around. He's typing, because it finds... yeah. Chris is a real keyboard-oriented guy, and so we put in these... Keyboard is good. Yeah, we put in these ways to find functions fast just by typing.

Okay, so let's attach that script. For this particular breakpoint, I'm going to attach the no-mipmap script that I just showed and have it execute after the glBindTexture call. You can execute a script either before or after the call; I'm going to have it execute after. And after it executes the script, I'm going to have it continue.

Otherwise, you can have it pause and show you the state after the script has run. So let's watch this. Attach, and as we can see, we've corrected that on the fly and everything looks much better than it did before. This is a live demo, ladies and gentlemen. That really just worked. Awesome.

So back to you. All right, thanks, Chris. OK, let's move on to OpenGL Driver Monitor, the second tool in our suite. We can call it a suite because it has more than one tool. It has two. OK. Where Profiler attaches to your software and shows how your software is interacting with OpenGL, Driver Monitor attaches to the hardware and shows you what's going on in the GPU.

Earlier versions of Driver Monitor had these really bizarre, obscure parameter names, like GART Wait Time and stuff like that. That's one of my favorites. We got a lot of questions like, "What does that mean?" So we developed a decoder ring that says, when you look at these arcane, cryptic parameter names, this is what's really going on.

And you had to go to this URL to get that. Well, for Tiger, we built all of that into Driver Monitor. So not only are the parameter names now sort of human readable, you can roll over one and it'll pop up the decoder ring for that particular parameter and tell you what it means. You're welcome.

Driver Monitor does remote monitoring, too. If you have a full-screen app, it's pretty hard to run another app on top of it and see it. Actually, you can't. So what you can do is run your full-screen app on one computer. As long as that computer is connected to a second one over a LAN, you can run Driver Monitor on the second computer and monitor the first one's GPU over the network.

Okay, let's have a demo of Driver Monitor. So I'm going to show Driver Monitor in use. I'm going to start up my application using Profiler, just because it's handy: I've got my list of applications, and I've launched it. Everything looks good. Let's bring up Driver Monitor.

So we've got the list of all the parameters. By default, the names look like this, but, whoops, we can turn on Use Descriptive Names, and we've got...

[Transcript missing]

You also have mouseovers that explain everything you'd want to know about these parameters. Right now I'm viewing the current free video memory and the texture page-off and page-on data on the graph. Let's switch this to a linear scale.

We've got about 7 or 8 megabytes of VRAM free, and we can see that only about 1 or 2 megabytes of texture data are being paged on each frame. If you don't understand what any of these parameters are, you can always go over them in your free time. Set it up, figure it out, and pick the ones you think will help with what you want to figure out. Now let's make this window pretty large. And we notice... uh-oh. This video card must have more VRAM than I expected.

They really beef up these demo machines. Ah, well... that's just to make us look good, I don't know. Let's see. We can see that the current free video memory is bobbing up and down, and, whoa, let's set the scale up to one gigabyte. Texture page-on data has reached about 400 megabytes per second, simply because I've made the window so large and I've got multisampling and all those nifty features on.

So it's taking up a lot of VRAM. Because I used Driver Monitor, I can see that it's a VRAM issue that's causing the app to slow down so much at full screen. I decided to fix this by using compressed textures, through some OpenGL extensions that allow you to do this. That let me stay at the same resolution while using only a quarter of the VRAM for these textures.
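The transcript doesn't name the extension. One widely supported option is S3TC (GL_EXT_texture_compression_s3tc), and the sketch below assumes it. S3TC stores 4x4 texel blocks at 8 bytes per block for DXT1 and 16 bytes per block for DXT3/DXT5. Against 32-bit RGBA, DXT5 therefore works out to one quarter of the memory, and DXT1 to one eighth. The helper names are our own; the code only does the size arithmetic.

```c
#include <stddef.h>

/* Bytes for one level of an uncompressed 32-bit RGBA texture. */
size_t rgba8_level_bytes(size_t w, size_t h)
{
    return w * h * 4;
}

/* Bytes for one level of an S3TC/DXT texture. Images are stored as 4x4
 * blocks (8 bytes each for DXT1, 16 for DXT3/DXT5), and partial blocks at
 * the edges still occupy a whole block. */
size_t dxt_level_bytes(size_t w, size_t h, size_t bytes_per_block)
{
    size_t blocks_wide = (w + 3) / 4;
    size_t blocks_high = (h + 3) / 4;
    return blocks_wide * blocks_high * bytes_per_block;
}
```

For a 256x256 level, that is 262,144 bytes as RGBA8 versus 65,536 as DXT5. In an app, you can upload precompressed data with glCompressedTexImage2D. Alternatively, you can ask the driver to compress by passing a compressed internal format such as GL_COMPRESSED_RGBA_S3TC_DXT5_EXT to glTexImage2D.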

So let's run that again and look at Driver Monitor. We can see that the VRAM usage has flattened out. It's a little bit faster here, but in more extreme cases you would see a huge benefit from things like compressing textures to save VRAM. And there are many other things you can check; you can see where your time is being spent using Driver Monitor. So... all right. That's it. Thanks, Chris. Thank you.

Quick words on what's new for Tiger in Profiler and Driver Monitor. You saw the single control panel, you saw the decoder ring built in, and the new trace info for the call trace. We've also worked on better integration between OpenGL apps and Shark. A lot of you have said, "All my time in the Shark trace is being spent in gldGetString. What is that?" Well, your time isn't really being spent there. We fixed it.

Also coming in Tiger is remote profiling. As with Driver Monitor, where you can hook up across a network and monitor the GPU of a full-screen app, you can do the same thing with OpenGL Profiler. So you can run your full-screen app really in full screen and get the full benefit of OpenGL Profiler. You're welcome.

Quick note: you need the same OS and Profiler versions running on both computers to make that work. Okay, to wrap up, let's talk about how performance is really a balancing act. We've talked a lot here about how you can improve your GPU usage, and you use Profiler and Driver Monitor to do that.

But there's also a CPU in the computer, so you need to be sure you're on top of its performance too. Your performance improvement cycle is going to work like this. At first, your GPU usage might be very, very high and your CPU usage low. Think of it as a ratio where the whole app is 100%: the GPU might start way up in the high 90s and the CPU down around 10. As you use Profiler to improve your performance on the graphics card, your GPU usage as a percentage is going to drop, maybe driving your CPU usage higher.

Then switch over to the CHUD Tools and Shark and start driving that CPU usage back down. Because it's a ratio, that's going to start pulling your GPU usage back up. This is a cycle, and what you ultimately want to get toward is a balance of about 50/50. That's a perfect ideal you may never reach, but that's how you would use these tools in conjunction with each other.