Tuning OpenGL ES Games - Tech Talks

2011 • 51:41

OpenGL ES enables iOS games to create incredible visuals while maintaining high frame rates. Gain specific insights into mastering the performance tools, and learn key practices to keep your game on the fast path.

Speaker: Allan Schaffer

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Hello, I'm Allan Schaffer, the Game Technologies Evangelist at Apple. In this presentation, I'm going to cover some of the techniques that OpenGL ES-based games can use to tune their performance, with a particular focus on 2D games. And of course, many of these same concepts will apply to tuning 3D rendering as well. And I'll be using a new instrument included with Xcode 4, called the OpenGL ES Analyzer.

This is something that many of you saw a preview of at WWDC 2010, but it's definitely worth taking another look at now and trying it out with your titles. So, let's get started. Now, as you probably know, OpenGL ES is the interface for high performance rendering in iOS, designed to let you take advantage of the 3D graphics hardware in our devices. And it's the API being used in most of the games on the App Store today.

Now, typically when we think of OpenGL ES based games, what immediately comes to mind are first person shooters, and out the window simulation type games, and adventure games, and so on. Where the world is being presented in 3D, and we're moving through it blasting bad guys with really rich, detailed characters, and a realistic environment. But actually, the use of OpenGL ES is quite widespread in 2D games as well, in a number of genres.

So, here's a few examples. The first is Plants vs. Zombies, a tower defense game from PopCap. Now, everything you see here is a 2D sprite. The backdrop, the characters, the text, and so on. And if you're familiar with this game, you know that the characters are articulated at their joints as well. But all this animation is just being done with sprites.

And the next is Osmos for iPad. This is a gorgeous physics puzzler from Hemisphere Games. This game has a lot happening on screen that gives it a really rich 3D look. But, once again, this is many layers of alpha textured graphics. 2D sprites being drawn in OpenGL ES.

Then there's Fruit Ninja by Halfbrick Studios. So here's OpenGL ES being used in a 2D action game, but with a really interesting twist. The backdrop, the swipes, the fruit splatters, and the text are all done with alpha textured sprites, but each piece of fruit is actually rendered as 3D geometry for extra realism. And the result just looks great.

Here's another. We can also see OpenGL ES being used in a lot of card games and the casino genre. This is World Series of Poker: Hold'em Legend from Glu, where as before, everything on screen is a 2D sprite. The backdrop, the cards, the player avatars, and they take advantage of OpenGL for things like chip animations and so on.

Now a couple more examples. Here's Harbor Master HD by Imangi Studios. This is a line drawing and chaos management game, where you're guiding boats into their docks on screen, and once again, everything on screen is being rendered in layers of 2D sprites using OpenGL ES. And then finally, one that many of you are familiar with is Angry Birds, in this case Angry Birds Seasons by Rovio. Now, every object on screen is one of a series of different 2D alpha textured sprites.

So what this is meant to illustrate is that even though OpenGL ES is usually associated with first person real time 3D graphics, its usage is actually even more widespread, extending into all the varieties of game genres that center around 2D scenes as well. So what about your game? If you're writing a 2D based game and using OpenGL ES, or a game engine or a third party framework layered on top of it, how can you make sure that your game performs as well and looks as good as these other very, very popular games?

Well, I'll start by looking at what some of these games can have in common, and then we'll see how to tune for that. What you see in common among almost all of them is, of course, widespread usage of OpenGL ES, and it's being used to render quads or sprites with alpha textures. Now, typically each quad is being drawn separately, and quite often each quad is carrying around with it a lot of its own graphics state. And so the amount of overhead involved in rendering the scene is really quite high.

It's also very common for these games to have a top-down or side-on view or false perspective view. Sometimes with a really large level that you're panning around and a lot of off-screen game elements. So the approach that we'll use to tune for this situation breaks down into these four steps.

First, we'll do a really detailed analysis of the OpenGL ES activity happening in the game. Then we'll walk through what we find and work on eliminating state changes and optimizing the draw calls. And there's some well understood techniques that we'll use for doing that. Then once that's taken care of, at the end I'll have a few pointers for some further optimizations that you can make. So let's start with measuring OpenGL ES activity.

Now, in the past, this has been a little bit challenging. It's actually not very easy to just look at the source code of a game and intuitively figure out the linear stream of OpenGL commands that it's going to send down to the GPU. Your game might end up doing a lot more work than you thought, sending more commands than you realized, or if you're traversing some data structure, that might result in sending work to OpenGL in some unexpected order, causing extra state changes.

Or, you might just be doing the wrong kind of work, either by having a lot of overhead or going down some non-optimal path. And so, actually measuring and analyzing the OpenGL activity coming from your game becomes critical in knowing what's going on and understanding what you need to do.

And now there's a great tool that can help you with this called the OpenGL ES Analyzer Instrument. This is a new instrument included with Xcode 4 that'll record and measure the OpenGL ES activity in your app, analyze that data, and find specific correctness and performance issues, and then provide you with instructions on what to do about them.

Now, the analyzer has three main parts that you'll use. The first is the activity monitor. This records a trace of all the OpenGL ES activity in your app, and provides you with statistics about what's being called, how much time it took, the number of calls, the stack trace that each GL call was being made from, and so on.

Then the second part is the overrides. And these are basically global switches that will bypass specific parts of the graphics pipeline, such as fragment processing or texture lookup. And by turning on and off certain parts of the pipeline, you can very quickly identify where your application is bottlenecked.

And the third component is the OpenGL ES Expert, and it's amazing. So this is an expert system with knowledge of our hardware characteristics and the best practices for our OpenGL ES implementation. And it watches the trace coming into the activity monitor and looks at the statistics, and then tells you exactly what trouble spots it sees, and even better, exactly what you need to do to fix them. It's just incredible.

So, let's take a look at a demo of the OpenGL ES Analyzer instrument. Alright, so first let's take a look at the demo that we're going to profile. I have it loaded on this phone here. So what you can see here is all of this is being drawn in OpenGL ES. It's a bunch of 2D sprites. I'm drawing a quad for the background, a quad for the spacecraft, and the fuel display, and the lander at the bottom. And then all of these particles floating in are quads as well.

Okay, and I can move things around, and the particles follow the spaceship. So it's a very simple demo, not a very fun game. But it's what we'll be using to profile. So let's take a look at it now in the OpenGL ES analyzer instrument. So I'm going to run the instrument here. And it comes up with, and allows me to choose a template. I'm going to choose OpenGL ES Analysis.

I'll choose my target, which is this demo in its before state. Okay, and now let me just start by running this application for just a few seconds and collect some data, and then I'll give you a tour around what you see being collected. So I'm capturing a few frames here, and that's probably enough, 23, 25 frames.

Okay, so I start out in this view, which is the OpenGL ES Analyzer Expert. And I'm actually going to come back to this later in the talk, so let me skip over that for now. But by pulling down this tab, you can see a lot of different statistics and different kinds of data that we can get access to. And I want to start with the frame statistics.

So I ended up actually drawing 43 frames, and you can see here the number of triangles that I've rendered, the number of batches or draw calls that I've made, the total number of GL calls, and redundant state changes. And actually, I control what statistics are gathered up here by this disclosure. And so let me zoom into that.

So this is a list here of the frame statistics that I've chosen to observe, the batches, GL calls, state changes, and triangles. And in this particular demo, all I'm drawing are triangles. So that's why I've selected triangles rendered here. But if I was drawing points and lines or other objects, I'd probably want to look at a few other statistics.

But okay, so let's take a look just first, what do we see here? Well, down in the number of triangles rendered, each frame is only rendering a thousand triangles, but it's taking me 500 draw calls to actually render that. So that's 500 calls to glDrawArrays. And really, it's not a very complex scene.

And the next thing to look at is this, the number of GL calls. So again, 500 objects are being rendered on screen as quads. But it's taking nearly 8,000 OpenGL calls for me to draw those objects. And so that means that there must be a lot of state changes going on here.

And then third, if I come over, there's also a lot of redundant state changes happening. Almost a thousand per frame, or basically two for every one of these batches. So that's something that we'll want to take a look at. But so that's the frame statistics view. Okay, so now let's look at the API statistics.

So what this is, is a list of all of the OpenGL calls your application is making. It tells you how many times you're calling each routine, the total amount of time for each call, and then the average of each call as well. So what I like to do here is to sort this list by the total time, and look to see what calls are really taking the most time. So let me zoom in.

I would always expect draw calls, like glDrawArrays or glDrawElements, to dominate this graph and to be up at the top. If they're not, that's something that you should look into. But then the next thing to do is to look at the calls that come next in the list and see which ones look kind of expensive. And there's a few here: glEnableClientState, glBindTexture, glDisableClientState. We're calling these a lot of times.

And the total amount of time that they're taking is rather significant. So now the trick for this, if you don't know what you're really looking for here, is to find the call to EAGL presentRenderbuffer, and then look at everything that's above it in the list. And those are really the calls that you can spend the most time on, and probably get the most benefit from tuning.

And so we have a pretty good list here of different calls. glDrawArrays, glTexImage2D to load textures, we have glEnable, and so on. And there's one more thing that you can do, actually, and that's to enable single frame navigation, and go and look at a particular frame. Say I'll just choose frame 25. And you can see which calls here are being made, and exactly the count of how many times they're being called per frame.

And actually, this is already now telling me about a likely problem. I'm calling glDrawArrays 500 times, glEnable almost twice for every glDrawArrays, and glEnableClientState and glDisableClientState twice for every glDrawArrays, and so on. So I can see that I'm really doing a lot of work for every call to glDrawArrays.

And the reason why this is important is because each of those calls to glDrawArrays is really only resulting in a single quad. And so this is a lot of overhead being set up for really very little output. Okay, so that's what the API statistics can show you.

Then next, I'll show you just the call trees. So this is a lot like the API analyzer where you can just see the call trees for each of the rendering calls in your app and how much time they're taking. So in this case, I don't need to go into that, but it's there for you if you want to take advantage of it.

And then last is the trace. So really, this is where I like to spend the most time when I'm analyzing an OpenGL application, is to just look and see what calls are really being made every frame. And this is the best way for you to actually see the linear trace of calls if you have really complex source code or some intricate data structure that you're using that's resulting in all of this.

And so the trick, as I zoom in on this, and what you should do, is to think of how your code is structured and try to correlate what you see in this list to one object being drawn on screen. And here, for example, I can see that all of this is correlating to one object.

Because I know from my code that every object starts with a call to glPushMatrix, and then a translation, some state gets set up, and then finally here I call glDrawArrays. And then I unwind all that, pop the matrix, and call glGetError. So these are the calls that I am making per object for everything in the scene. So every particle, the background, the lander, and so on. And so what we'll do is take a look at this now and see if there's ways to optimize calls out of this.

Now, if you have a lot of different objects and kinds of objects, you might have to repeat this process two or three times, or more, to go back and really cover all of the cases that are in your app. In mine, it's pretty simple. Everything is just being drawn as a quad, and it's being drawn using generally the same routines here, just changing the position and changing the texture for each one. But so that's the API trace.

All right, again, that was the OpenGL ES Analyzer instrument, which is going to help you measure and analyze the OpenGL ES activity in your application. So you just saw the frame statistics, the API stats, the command trace, and the call trees. And at the end of the talk, I'll go into more detail about the Analyzer Expert.

So now let's go back and remember some of the problems we found. The first thing we found was that in this demo, the number of triangles per batch was much too low. Each frame of the demo had 500 draw calls, and this resulted in only 1,000 triangles on screen.

And what this really indicates is that your vertex arrays are just much too short. Only sending two triangles here for each draw call is just terrible. So you want to try to minimize the number of batches and make that ratio much, much higher, so that each batch, each draw call, takes care of a lot of triangles.

Problem number two. The number of GL calls for each batch was much, much too high. So this indicates that we're carrying around a lot of overhead for every object that's being drawn on screen. And what this means is that there's a lot of state management happening between each draw call. In this case we have 500 draw calls and nearly 8,000 GL calls per frame, which is about 16 to 1, and that's much, much too high. You really want to minimize the number of GL commands that are being issued for each of your draw calls.

Then third, redundant state changes. So these are just totally unnecessary. When you see these happening, it means that you've changed the OpenGL state to the same value that it already had, which is totally redundant. But it still causes work to occur in the implementation. So if this is happening, it's something that has ultimately no effect and just doesn't need to occur, which is a perfect thing to optimize. So we'll need to make changes in our code to avoid making redundant calls.

Now fourth, on to the API Statistics view and things to look for there. Well first, as I said, we expect to see glDrawArrays or glDrawElements dominating the total time on this graph. Of course, if it's not, that's something to look into, but it is at the top here.

So the next thing to look at is the next few commands that are taking the most time. And see if these are things that can be eliminated one way or the other. And specifically, my rule of thumb is to find everything with a higher total time than EAGL presentRenderbuffer. And tuning those calls will give you the most payoff.

And then finally, look at the command trace itself, because this is really telling you the truth about your application's rendering. And the best way to do this is to enable single frame navigation, then go to a frame that you want to look at, and try to identify the calls that correlate to the various objects being drawn on screen. Now, in my demo, it's easy to find, because I know that every object starts with a call to glPushMatrix, and ends with glPopMatrix and glGetError.

So now I can look at this, and really understand what's happening in GL for every object I draw. And this is a lot, 16 GL commands per object. So you should do the same thing. Try to find the different objects you draw, and see what work is really being done on their behalf. So let's take this example now and really dig into it line by line. So here it is. This is what I'm doing for every object in my scene. The ship, the thrusters, each particle, the background, and so on.

So for each object, I'm starting with a push matrix and applying a transformation to position that object on screen. Then here I'm setting up my state. I'm enabling texture and binding the appropriate texture ID for this object, and enabling blending and setting up the blend function for something appropriate for an alpha texture.

Then I enable some client state for a vertex array and a texture coordinate array, and set up pointers to those arrays with glVertexPointer and glTexCoordPointer. And then I draw the object by calling glDrawArrays with a triangle strip with four vertices, so that's two triangles or a quad.

Now here's where I undo those state changes that I made above. I disable the client state and set this blend function back to the default. And then pop the matrix transformation stack. And I call glGetError just to make sure that I'll catch any errors that might have happened. Now remember, all of this is being done for every object, every particle, and so on in the demo.
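The per-object sequence just described can be written out as a rough sketch. The slide itself is not part of this transcript, so the argument values (the blend factors, the array names) are assumptions, and TRACE is a hypothetical stand-in macro that only counts each command instead of executing it, so the sketch compiles without the OpenGL ES headers.

```c
#include <assert.h>

/* Hypothetical stand-in for real OpenGL ES: TRACE() counts a command but
   never executes it (the preprocessor discards its argument). */
static int g_gl_calls = 0;
#define TRACE(call) ((void)(++g_gl_calls))

/* Per-object routine as narrated in the talk. Argument values are assumed. */
static void draw_object_before(float x, float y, unsigned texture)
{
    (void)x; (void)y; (void)texture;   /* only referenced inside TRACE() */

    TRACE(glPushMatrix());
    TRACE(glTranslatef(x, y, 0.0f));

    TRACE(glEnable(GL_TEXTURE_2D));
    TRACE(glBindTexture(GL_TEXTURE_2D, texture));
    TRACE(glEnable(GL_BLEND));
    TRACE(glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA));

    TRACE(glEnableClientState(GL_VERTEX_ARRAY));
    TRACE(glEnableClientState(GL_TEXTURE_COORD_ARRAY));
    TRACE(glVertexPointer(2, GL_FLOAT, 0, quad_vertices));
    TRACE(glTexCoordPointer(2, GL_FLOAT, 0, quad_texcoords));

    TRACE(glDrawArrays(GL_TRIANGLE_STRIP, 0, 4));   /* one quad */

    TRACE(glDisableClientState(GL_TEXTURE_COORD_ARRAY));
    TRACE(glDisableClientState(GL_VERTEX_ARRAY));
    TRACE(glBlendFunc(GL_ONE, GL_ZERO));            /* back to the default */
    TRACE(glPopMatrix());
    TRACE(glGetError());
}

/* Counts the commands one frame would issue for object_count objects. */
static int simulate_frame_before(int object_count)
{
    assert(object_count >= 0);
    g_gl_calls = 0;
    for (int i = 0; i < object_count; ++i)
        draw_object_before((float)i, 0.0f, (unsigned)i);
    return g_gl_calls;
}
```

Under these assumptions each object costs 16 commands, so 500 objects cost 8,000 per frame, which lines up with the roughly 16 to 1 ratio seen in the frame statistics.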

So, the thing to understand about this is that everything you see here is totally valid code. It's syntactically correct, it's a valid usage of OpenGL ES 1.1 in the sense that there are no errors here, and the results on screen are obviously correct; you just saw them in the demo.

But actually, everything about this is just totally wrong, every single line. And so we'll spend the rest of the talk fixing it. And in particular, this line. I see this a lot, where every quad is being drawn as a separate triangle strip with four vertices. Or I've seen some apps doing the same thing with independent triangles and six vertices.

And if you're doing this, if this is what you're using to do the heavy lifting in your game, and specifically, I mean that if your game does a heavy amount of drawing, and most or all of that drawing is through calls to glDrawArrays with only four vertices, then let this be the alarm bell that tells you that something is seriously wrong and needs to be changed. So, I'll show you what to do with this in part three of the talk, but for the moment, let's not get too far ahead of ourselves, and let's see what we need to do to reduce the number of state changes in this app.

So let me start just by reviewing the concept. So at any given moment, OpenGL ES defines a current state of the rendering pipeline. And this is all done atomically by doing things like setting the current matrix transformation or blending function in OpenGL ES 1.1, or binding a vertex program and fragment program in ES 2.0. And in this example, when it's time to draw something, then the current 1.1 state will be compiled and cached away, and whatever's being drawn will take advantage of that state.

And if the next object to be drawn can also use that state, then it's very efficient. It can go through the entire pipeline without changing anything, and that's very fast. But if the next object to be drawn needs to change a lot of state, maybe because it uses a different texture or needs a different blending function, then conceptually, we need to stop, compile, and cache that new state, and then go on to draw that object. And that little bit of time taken to set up the new state can really add up if it's changing a lot or changing for everything you draw.

So instead, if performance is a limiting factor for you, then you should design your rendering code around the idea of minimizing state changes. And how you do this will be very dependent on the drawing requirements of your game. But there's three really simple techniques that can have a pretty dramatic effect.

So the first has to do with eliminating redundant or repetitive state changes. So the first item is something that the OpenGL ES Analyzer instrument makes really easy to find, and those are redundant state changes, or cases where you're setting the state to the same value that it already has. And the way to eliminate that is to just keep track of important state elements yourself.

Like for example, keep a variable around that holds the ID of the currently bound texture, and keep another that tracks the current blend function, and so on. And then before you bind a new texture, or set a new blend function, check the value that you're already holding in your variable. And if it's the same, then don't bother issuing that command to OpenGL. It's already in that state.

And all you'll be doing is making OpenGL redo its validation in this new state that you're setting, and ultimately it would just end up right back at the same place. Now the second item is something the Analyzer won't necessarily find for you, but it's easy to spot in the command trace.

And it's to avoid the situation where you have unnecessary symmetry in the way that you're enabling and disabling various state. So don't fall into the trap of setting up a bunch of state, drawing something, and then cleaning up all the state you just set, just in case something comes along that needs to do something different. The next object may actually need to be drawn with the same state as the first one you did. So then you'll just have to go and make all those same state changes again, if you had cleaned them up.

And you saw this in the demo code. Every object is setting a particular blend function and then restoring it to the default after that object is drawn. And then the next object comes along and repeats that whole process again. And there's 500 objects doing that same process in that particular demo.

And it's easier and faster if you just track the current state and get rid of any notion that after you change the state that you need to put it back the way it was. Instead, just be lazy about state changes and wait until some object comes along that needs to be rendered with a different state than what's currently set. Only actually make the change then when that happens. And then third is that you should try to hoist up any repetitive changes.

Meaning to bring them up out of the per object rendering routine. Maybe there's certain states that you're setting for every object or for many, many objects. And rather than setting that state every time for every object, just set it once or maybe once per frame or once for a particular group of objects. And then don't change it on a per object basis.
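A minimal sketch of that kind of state tracking, under stated assumptions: StateCache, cache_bind_texture, cache_blend_func and the BLEND_* values are hypothetical names, and the places where a real renderer would call glBindTexture or glBlendFunc are only marked with comments so the sketch runs without OpenGL ES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical blend factor ids standing in for the GL enums. */
enum { BLEND_SRC_ALPHA = 1, BLEND_ONE_MINUS_SRC_ALPHA = 2 };

typedef struct {
    bool     texture_known;  /* false until the first bind */
    unsigned texture;        /* last texture id forwarded to GL */
    bool     blend_known;    /* false until the first blend func */
    int      blend_src, blend_dst;
    int      issued;         /* commands actually forwarded */
    int      skipped;        /* redundant commands filtered out */
} StateCache;

static void cache_bind_texture(StateCache *c, unsigned texture)
{
    assert(c != NULL);
    if (c->texture_known && c->texture == texture) {
        c->skipped++;        /* already bound: issue nothing */
        return;
    }
    /* a real renderer would call glBindTexture(GL_TEXTURE_2D, texture) here */
    c->texture_known = true;
    c->texture = texture;
    c->issued++;
}

static void cache_blend_func(StateCache *c, int src, int dst)
{
    assert(c != NULL);
    if (c->blend_known && c->blend_src == src && c->blend_dst == dst) {
        c->skipped++;
        return;
    }
    /* a real renderer would call glBlendFunc(src, dst) here */
    c->blend_known = true;
    c->blend_src = src;
    c->blend_dst = dst;
    c->issued++;
}

/* Every object asks for the state it needs; nothing is restored afterwards,
   so state only changes when an object actually needs something different. */
static void draw_frame(StateCache *c, int object_count, unsigned atlas)
{
    for (int i = 0; i < object_count; ++i) {
        cache_bind_texture(c, atlas);
        cache_blend_func(c, BLEND_SRC_ALPHA, BLEND_ONE_MINUS_SRC_ALPHA);
        /* the draw call for object i would go here */
    }
}
```

With 500 objects sharing one texture and one blend function, this sketch forwards 2 state commands and filters out 998, which is about the number of redundant changes per frame reported for the demo.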

[Transcript missing]

So to go even further and to be a little more concrete, what I really recommend is to change the order as I've described here. Start by separating your rendering code to actually draw all of the opaque objects first, and then all the non-opaque objects. Then within each of those routines, order your rendering based on the texture that needs to be bound, or the shader your objects use, and so on. And this can really save you from making a lot of state changes.
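One way to sketch that ordering is a sort key of opaque first, then texture id. DrawItem and its fields are hypothetical, and a real game would probably add a shader id or other state to the key. Overlapping blended sprites may also need back-to-front order, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-object record holding the state the object needs. */
typedef struct {
    int      opaque;    /* 1 = opaque, 0 = needs alpha blending */
    unsigned texture;   /* texture id the object is drawn with */
} DrawItem;

/* qsort comparator: opaque objects first, then grouped by texture id. */
static int compare_by_state(const void *pa, const void *pb)
{
    const DrawItem *a = pa;
    const DrawItem *b = pb;
    if (a->opaque != b->opaque)
        return b->opaque - a->opaque;
    if (a->texture != b->texture)
        return a->texture < b->texture ? -1 : 1;
    return 0;
}

static void sort_by_state(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof *items, compare_by_state);
}

/* Number of texture binds needed to draw the items in their current order. */
static int count_texture_binds(const DrawItem *items, size_t count)
{
    int binds = 0;
    for (size_t i = 0; i < count; ++i)
        if (i == 0 || items[i].texture != items[i - 1].texture)
            binds++;
    return binds;
}
```

For seven objects that interleave four textures, drawing in submission order can need a bind for every object, while the sorted order needs one bind per texture.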

And a third way to eliminate state changes that's especially relevant for most games is to combine multiple individual textures, if you have them, into a single larger texture atlas. So looking at this example, here's three of the textures for my game, a starry background texture, and two more little textures, one for my lander and the other for the alien spaceship. And if I've just set these up as three separate textures, then it means I have to bind texture one, and draw a quad. Then bind texture two, and draw a quad. Bind texture three, and draw a quad.

Okay, so that's three texture binds and three draw calls. But instead, if I were to combine these three textures together into a single texture atlas, then I could just bind that whole atlas and use different texture coordinates to pick out which sub-region I want, then draw, draw, and draw to get the same result. One texture bind instead of three.

And if you have a lot of small textures in your app, then you can perhaps reduce all of them down to just one texture atlas. All of the more recent devices support textures sized up to 2048 by 2048, which is pretty big. And you can have more than one texture atlas, too. So this is another really powerful technique for reducing the number of state changes.
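Picking a sub-region out of an atlas comes down to dividing pixel positions by the atlas size. The sketch below assumes each sprite's pixel rectangle inside the atlas is known; UVRect, atlas_uv and quad_texcoords are hypothetical helper names.

```c
#include <assert.h>

/* Normalized texture coordinates of one sprite inside the atlas. */
typedef struct { float u0, v0, u1, v1; } UVRect;

/* Converts a pixel rectangle inside the atlas to 0..1 texture coordinates. */
static UVRect atlas_uv(int x, int y, int w, int h, int atlas_w, int atlas_h)
{
    assert(atlas_w > 0 && atlas_h > 0);
    UVRect r;
    r.u0 = (float)x / (float)atlas_w;
    r.v0 = (float)y / (float)atlas_h;
    r.u1 = (float)(x + w) / (float)atlas_w;
    r.v1 = (float)(y + h) / (float)atlas_h;
    return r;
}

/* Writes texture coordinates for a 4-vertex triangle strip quad,
   in the order (u0,v0) (u1,v0) (u0,v1) (u1,v1). */
static void quad_texcoords(UVRect r, float out_uv[8])
{
    out_uv[0] = r.u0; out_uv[1] = r.v0;
    out_uv[2] = r.u1; out_uv[3] = r.v0;
    out_uv[4] = r.u0; out_uv[5] = r.v1;
    out_uv[6] = r.u1; out_uv[7] = r.v1;
}
```

For example, a 512 by 512 sprite placed at pixel (1024, 0) in a 2048 by 2048 atlas maps to u from 0.5 to 0.75 and v from 0 to 0.25.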

All right, so now let's go back to our code and apply those changes. First, this texture enable. Well, every object in our game needs to have texture enabled, and we never disable texture. So after we've done this once, it becomes totally redundant to do it again 500 times a frame. So we can lift this out of the loop and just enable it once per frame or in some initialization routine.

Okay, now this texture bind. This is currently binding a different texture for every object, when we could just create a texture atlas with all our textures, and then hoist this call up out of the loop and just do it once per frame, or when it actually needs to change. Okay, next, this call to enable blending. It's just the same as the call to enable texture. It's redundant. And after we've done it once, we can just leave it enabled. So this can get lifted up to an initialization routine somewhere.

Okay, now these calls, they're necessary too, but you'll notice that we're also disabling them after the call to glDrawArrays. So this is one of those cases of unnecessary symmetry. So likewise, we could just set this at the beginning of the frame and keep track of it, and only change it if some object needs to be drawn with it changed. But either way, this will get hoisted out of the list of calls we need to make per object. Okay, so now let's take a look. Here's the changes so far.

So we've hoisted all of the state changes out of the per object loop, mostly by removing redundant changes and consolidating our textures into an atlas. Now, if you had more states to manage, your app might do these calls per frame instead of just once when it starts. But either way, moving these calls out of the per object loop is really important. Okay, so now let's move on to optimize what's left, and we'll focus next on optimizing the draw calls.

And I have a few good tips to give for this topic. The first is to cull any off-screen objects. The basic idea here is that OpenGL ES is going to process every command you send to it, even if after a particular object has been transformed, it turns out to be off-screen someplace and doesn't contribute to the scene. Well, instead, if you have a simple way to calculate whether an object will wind up off-screen, then there's no need for you to render it in the first place. All that work is something that just can be avoided.

And calculating whether an object is on or off screen in a 2D game is generally trivial. Just a simple position check against the bounds of the visible area, or an axis-aligned bounding box check, which is simple addition or subtraction, and this might save you the work of a whole series of state changes and draw calls.

So you would do this check for each object. So imagine that this is my whole level, and I have enemy spaceships scattered around, and perhaps this is the area that's actually visible right now. So anything entirely outside of this imaginary boundary doesn't need to be drawn. So as I iterate through the objects in my scene, I'm only going to issue draw calls for the five objects that can be seen in this frame instead of all 16. And if in a later frame I had panned around, then I'd get a different result, and I'd only draw those objects.
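The visibility check can be sketched as an axis-aligned rectangle overlap test. Rect, rect_overlaps and collect_visible are hypothetical names, and the sketch assumes each object already has a bounding box in the same coordinate space as the visible area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Axis-aligned rectangle in level coordinates. */
typedef struct { float min_x, min_y, max_x, max_y; } Rect;

/* True if the two rectangles share any area. */
static bool rect_overlaps(Rect a, Rect b)
{
    return a.min_x < b.max_x && a.max_x > b.min_x &&
           a.min_y < b.max_y && a.max_y > b.min_y;
}

/* Writes the indices of objects that touch the view into out_indices
   (which must have room for count entries) and returns how many there are.
   Only those objects need state changes and draw calls this frame. */
static size_t collect_visible(const Rect *bounds, size_t count,
                              Rect view, size_t *out_indices)
{
    assert(bounds != NULL && out_indices != NULL);
    size_t visible = 0;
    for (size_t i = 0; i < count; ++i)
        if (rect_overlaps(bounds[i], view))
            out_indices[visible++] = i;
    return visible;
}
```

An object that straddles the edge of the view still counts as visible here, since only objects entirely outside the boundary can be skipped.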

It's a simple concept, but really important to apply if you have a scene with a lot of off-screen objects. Now the second thing to do, particularly in 2D games where there's a lot of quads, is to flatten the transformations you'd normally apply with glRotate and glTranslate or glMultMatrix into the values of the vertex positions themselves.

And ultimately what this is going to do is to let you make longer vertex arrays, which is the next recommendation. But to be more specific, it's very typical for a 2D app to define the vertex positions of its quads about the origin, like -0.5 to +0.5 in X and Y. And then apply transformations by calling glTranslate and glRotate, and then issuing a draw call just for that quad, and then going on to the next one and repeating that whole process.

Well instead, what you can do is calculate that transformation yourself on the CPU and put those values directly into your vertex array. Then next I'll show you how to combine those arrays together. Now this is something that 3D games can do as well, but in 2D games it's absolutely trivial, even with thousands of vertices. And it's because calculating 2D transformations is such simple math. A 2D translation is just addition and subtraction, and a rotate is just sine and cosine. Let's talk about what we do with the result.
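A minimal sketch of flattening a 2D rotation and translation into the vertex positions on the CPU. Transform2D and flatten_quad are hypothetical names. The caller is assumed to supply the cosine and sine of the sprite's angle (for example from cosf and sinf in math.h), so the routine itself is only multiplies and adds.

```c
#include <assert.h>

/* Hypothetical 2D transform: rotation given as cos/sin, then translation. */
typedef struct {
    float tx, ty;        /* position of the quad's center */
    float cos_a, sin_a;  /* cosine and sine of the rotation angle */
} Transform2D;

/* Writes the four transformed corners of a w-by-h quad, in triangle strip
   order, into out_xy. The source quad spans -0.5..+0.5 about the origin. */
static void flatten_quad(Transform2D t, float w, float h, float out_xy[8])
{
    static const float unit[8] = {
        -0.5f, -0.5f,   0.5f, -0.5f,
        -0.5f,  0.5f,   0.5f,  0.5f
    };
    assert(out_xy != 0);
    for (int i = 0; i < 4; ++i) {
        float x = unit[2 * i] * w;       /* scale to sprite size */
        float y = unit[2 * i + 1] * h;
        out_xy[2 * i]     = t.cos_a * x - t.sin_a * y + t.tx;  /* rotate, */
        out_xy[2 * i + 1] = t.sin_a * x + t.cos_a * y + t.ty;  /* translate */
    }
}
```

The output can be written straight into a shared vertex array, so no per-object matrix push, translate, rotate or pop is needed at draw time.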

So the reason why we flatten the transformation into the vertex array data is so we can put multiple objects together into the same array. And the concept behind this stems from the fact that vertex arrays are meant to be very long. They're meant to be a way of sending a lot of data to the GPU very efficiently. So the opposite, using a whole bunch of really short arrays totally defeats the purpose.

And we need to find ways to join them together. And that's possible if you have multiple vertex arrays that all share the same state, including the transformation. And if you flatten the transformations into the vertex data itself, then they probably have no other transformations. Or maybe you can just have the same overall transformation in common, which is perfect.

Now, how you actually join them together depends on whether you're using independent triangles or triangle strips. With independent triangles, there really isn't much work to do at all. You just make the array big enough and keep adding triangles to it, and make it as long as possible for as many triangles as you have that share the same state.
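For the independent triangle case, a rough sketch is to append each quad as two triangles (six vertices) to one shared array. Batch, BATCH_MAX_VERTS and batch_add_quad are hypothetical names, and the capacity is sized for the 500-quad demo described in the talk.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_MAX_VERTS 3000   /* 500 quads x 2 triangles x 3 vertices */

/* One long array of 2D positions shared by every quad with the same state. */
typedef struct {
    float xy[BATCH_MAX_VERTS * 2];
    int   vertex_count;
} Batch;

/* Appends a quad given as four corners in strip order (v0 v1 v2 v3) as the
   two independent triangles (v0 v1 v2) and (v2 v1 v3).
   Returns false if the batch is full. */
static bool batch_add_quad(Batch *b, const float corners_xy[8])
{
    static const int order[6] = { 0, 1, 2,  2, 1, 3 };
    assert(b != 0 && corners_xy != 0);
    if (b->vertex_count + 6 > BATCH_MAX_VERTS)
        return false;
    for (int i = 0; i < 6; ++i) {
        b->xy[2 * b->vertex_count]     = corners_xy[2 * order[i]];
        b->xy[2 * b->vertex_count + 1] = corners_xy[2 * order[i] + 1];
        b->vertex_count++;
    }
    return true;
}
```

After the loop over the scene, the whole batch could be submitted with a single glDrawArrays call in GL_TRIANGLES mode with vertex_count vertices, instead of one call per quad.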

But the other possibility, and what I'm showing here, is if you use triangle strips. We still want to create a single long array that combines them all together, but how do you handle the separation between the original arrays? Well, we use something called degenerate triangles to join these strips together.

And what we do is to repeat the last vertex of one strip and the first vertex of the next. The result is one longer strip, and the triangles formed by the repeated vertices never get rasterized. So the strips will appear to be separately placed on screen, but they're able to be submitted to the GPU together in a single draw call.

So here we have triangles ABC, BCD, and CDE, which get rasterized normally. And then still, in the same strip, we have DEE, and EEF, and EFF, and FFG, which are really lines, not triangles. So they're not valid, and nothing gets rasterized. And then finally we get back to triangles again, with FGH, GHI, and HIJ. So in this example, instead of two strips with two draw calls and three triangles each, we get one strip, or one draw call, with six triangles that actually get rasterized. And we just keep applying that for all of the objects in the scene.
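The joining step described above can be sketched as follows. This is an illustration under assumptions, not the demo's code: `Vec2` and `strip_append` are hypothetical names, and the destination is a caller-provided array with a fixed capacity.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;   /* hypothetical vertex type */

/* Appends the strip src[0..n) to dst, which already holds *count vertices.
   If dst is not empty, the previous last vertex and the new first vertex
   are each repeated once, producing zero-area (degenerate) triangles that
   connect the two strips. Returns 1 on success, 0 if it does not fit. */
static int strip_append(Vec2 *dst, size_t capacity, size_t *count,
                        const Vec2 *src, size_t n)
{
    assert(dst != NULL && count != NULL && src != NULL);
    if (n == 0) return 1;
    const size_t extra = (*count > 0) ? 2 : 0;
    if (*count + extra + n > capacity) return 0;
    if (*count > 0) {
        dst[*count] = dst[*count - 1];   /* repeat E: ... D E E */
        ++*count;
        dst[*count] = src[0];            /* repeat F: ... E E F */
        ++*count;
    }
    for (size_t i = 0; i < n; ++i) {
        dst[(*count)++] = src[i];        /* F G H I J */
    }
    return 1;
}
```

Appending the strips A through E and F through J this way gives A B C D E E F F G H I J: twelve vertices, ten triangles, of which four are degenerate. One caveat worth checking in your own code: when the earlier strip has an odd number of vertices, the winding of the following strip flips, which only matters if face culling is enabled. Four-vertex quads do not have that issue.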

Now, either of those techniques is fantastic for the situation that hopefully we're in now, where we've sorted all of our rendering by state, we're using texture atlases to hold all of our textures and cut down on state changes, and we've flattened transformations into our vertex array data. So if we can also make longer arrays, either by switching over to independent triangles and just throwing them all into a big array, or by taking our triangle strips and using degenerate triangles to join them all together, then in many cases, we could end up drawing all of the quads in our game in one or sometimes just a few draw calls. So let's say these five quads were defined as strips with four vertices each, and they all share the same state. They're all sorted, texture atlased, and flattened.

Then I can combine all of these together into one single vertex array, and I bind one texture atlas, and I set the vertex pointer, and do the same work for my texture coordinate array, and then draw my entire scene in one call to glDrawArrays. And the performance difference between this and what we started with would just be incredible.

All right, so now let's go back to our code and apply those changes too. So for each object. Well, actually, we're going to change this. And instead of drawing each object, we're first going to calculate whether that object will be visible on screen, and only proceed if it is.

Okay, now here are our transformations. We'll remove these by flattening them into our vertex array data. And here's where we set up the pointers to our arrays and call glDrawArrays. Well, this will change to construct the arrays on the fly, either using independent triangles, or by joining triangle strips together with degenerate triangles. Okay, so now let's take a look. Here's what that would become.

So now there's some work I'm doing at initialization time at the top to set up state. Then the part that's highlighted is what I'm doing each frame. I'll start out now and for each object, if it's visible, then I'll flatten the transformations and add it to a combined array. Then once I've gone through all the objects at the end of the frame, I'll set up my array pointers and call glDrawArrays with the whole combined array.
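The slide code is not reproduced in this transcript, so here is only a rough sketch of what such a per-frame loop could look like. Everything named here (`Sprite`, `Rect`, `sprite_visible`, `build_frame`) is hypothetical, the quads are left unrotated to keep it short, and the texture coordinate array, which would be filled in parallel, is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;                       /* hypothetical */
typedef struct { Vec2 center; float halfSize; } Sprite;    /* hypothetical */
typedef struct { float left, right, bottom, top; } Rect;   /* hypothetical */

/* Returns nonzero if the sprite's bounding square overlaps the screen. */
static int sprite_visible(const Sprite *s, Rect screen)
{
    return s->center.x + s->halfSize >= screen.left   &&
           s->center.x - s->halfSize <= screen.right  &&
           s->center.y + s->halfSize >= screen.bottom &&
           s->center.y - s->halfSize <= screen.top;
}

/* Builds one combined triangle strip from all visible sprites and returns
   the number of vertices written. No GL calls are made per object. */
static size_t build_frame(const Sprite *sprites, size_t n, Rect screen,
                          Vec2 *out, size_t capacity)
{
    assert(sprites != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        const Sprite *s = &sprites[i];
        if (!sprite_visible(s, screen)) continue;          /* cull */
        const size_t needed = 4 + (count > 0 ? 2 : 0);
        if (count + needed > capacity) break;              /* array full */
        const float h = s->halfSize;
        const Vec2 q[4] = {                                /* flattened quad */
            { s->center.x - h, s->center.y - h },
            { s->center.x + h, s->center.y - h },
            { s->center.x - h, s->center.y + h },
            { s->center.x + h, s->center.y + h }
        };
        if (count > 0) {                                   /* degenerate join */
            out[count] = out[count - 1];
            ++count;
            out[count++] = q[0];
        }
        for (int k = 0; k < 4; ++k) out[count++] = q[k];
    }
    /* Once per frame, after this returns, the caller would issue roughly:
       glVertexPointer(2, GL_FLOAT, 0, out);
       glDrawArrays(GL_TRIANGLE_STRIP, 0, (GLsizei)count); */
    return count;
}
```

The loop touches only CPU memory; the GL calls shown in the trailing comment happen once, after every visible object has been added.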

So, the thing to notice is that when we started, I was making 16 GL calls for every object. It was almost 8,000 GL calls being made per frame. Now, there are no GL calls being made for every object. I'm just doing calculations on the CPU to add the objects that are visible into a big vertex array. And then I'm making a draw call once per frame. So, the difference should be dramatic.

So let me show you that difference. And at the same time, I'll show what to look at for some further optimizations. So let's go back to the demo and then take a look at the OpenGL ES Analyzer Expert. All right, so let's start by taking another look at the demo as it existed originally before we made our changes.

So you can see it again here. And it's running pretty slow. We have the 500 draw calls resulting in 1,000 triangles or 500 quads. And it's taking about 8,000 GL calls to draw this. Okay, so there it is. So now let's look at the version that integrates the changes that we just made.

You can see it's much, much faster. So, and the performance here is great, but I can even crank up the number of particles, and it still is able to maintain completely fluid motion and able to maintain that performance. So, all I've done here are the things that I've shown in this talk, where I've converted over to texture atlases, I've removed a lot of state, and I'm using degenerate triangles in this case to join vertex arrays together. And a few other things that, just like I had mentioned, I'm culling off-screen objects and so on. Okay, so now let's take a look at how this actually runs in the OpenGL ES Analyzer instrument.

So I'll switch over to the new version and record a few frames. So it's running on my phone now. And I'm just counting up a few frames here. Now you can see even just in this short amount of time, I've gotten what about 30 times as many frames as I did before. But so now let's take a look at the same things that we saw before. Frame statistics first.

I'll turn this over. So here's the number of triangles that we're rendering per frame. It's changing. 838, 832, 862. The reason for that is because I'm culling any off-screen particles. And their position is defined randomly, so the number that actually end up on screen changes each frame. But so that's why this number is less than a thousand, which is what I had originally started with.

But look here, the number of batches, instead of it being 500, it's now one. Just one draw call to get all 838 or 832 triangles drawn on screen. And then over here, the number of GL calls per frame is also really, really low. Just four GL calls instead of 8,000. So let's take a look at the trace.

So this is the trace for all of the GL calls that were made for the lifetime of the application. Let's go to single frame navigation and just look at one frame. And here is the trace of calls that we're making in a single frame. All I'm doing is binding a texture. This is the ID for my texture atlas. I'm setting up my vertex pointer and texture coordinate pointer. And in this particular frame, I'm drawing a triangle strip with 1,050 vertices, so 1,048 triangles.

So that is just amazing. I mean, we are able to take the entire scene. So this includes the background, this includes the lander, all of the particles, and the fuel display, and so on. We're taking all of those objects and drawing everything that's on screen in just one call to glDrawArrays. And so it's obvious that's going to be a lot more efficient than making 16 different GL calls plus a separate call to glDrawArrays for each of 500 or 1,000 different objects on screen.

But okay, so that is a look at the changes that we made. But now I want to, as I said, I want to show you one more thing and that is the OpenGL ES Analyzer Expert. And so actually what I want to do is go back to the original version of the app and see what it finds for us. So I'm going to run it again, and let it store up a few frames here.

Okay, that's probably far enough. And now, this information down here is what the OpenGL ES Analyzer Expert will tell you. And, let me zoom in. What you see are different categories of problems or issues that it's found, the summary of what that issue is, then over here also, the number of times that that issue occurred for the life of the application, and the number of unique places in your source code that it came from. And then there's one more thing that I can look at also.

If I click on a particular issue here, I can open up a view which will give me extended detail about this, with a recommendation that you can make in your code for a change to make. And then also, it'll tell you the exact stack trace that this issue was coming from. So in this particular example, let's open this up.

All of these are redundant state changes, redundant calls. And a few of them are only happening once in the lifetime of the application, or a few times in the lifetime. So they don't really matter. But here, let's look at this one, glEnable(GL_TEXTURE_2D). So this is being done more than 24,000 times so far in the lifetime of the application.

And what it's saying up here, well, okay, this command was redundant, glEnable(GL_TEXTURE_2D), and this command was redundant, glEnable(GL_TEXTURE_2D). This is the routine that it's coming from. If I double click this, it will take me into my source code, and I can start making changes directly at that point. So this is a really powerful tool for finding issues in your application. So let's look at a couple more.

So a few more things that are happening here. Well, here's one. It's saying, okay, you have many small batch draw calls. And let's drill in on some information about that. Okay, it's saying, ah, this call to glDrawArrays with GL_TRIANGLE_STRIP and only four vertices. Okay, that's a problem. Here's the trace where it's coming from. But now let's go back and see what it wants me to do about it.

So it says up here, OpenGL ES Analyzer detected a number of draw calls containing a small number of primitives. Remember, four vertices only in the original version of this application. So there may be an opportunity to combine several small draw calls into a single larger draw call in order to gain a performance improvement. So to do that, I used degenerate triangles to fix that particular problem. But here is where it's being called out using the OpenGL ES Analyzer.

So the point of this isn't just to go back and repeat everything that we talked about in this talk today, but it's to show you that you actually don't have to be an expert to know what to go tune. This expert system will actually tell you all of the things that are happening in your application that you should go and take another look at.

Let's look at just a couple more. State query call count. You know, I didn't do anything with this one in the slides, but let's see what it's talking about. It said that I had called something 500 times. Let's see what it is. Ah, it says call to glGetError. Yeah, I hadn't addressed that.

But here we are again. The Analyzer Expert is telling me something to fix. It's saying, you know, you really don't need to be calling glGetError so often. Really, you should just get rid of that from the production version of your code, and only use it for testing. But the reason why it's calling it out, let's look at the recommendation up here.

OpenGL ES Analyzer detected a number of state query calls in the current frame. Use state query calls sparingly from within the main rendering loop. State query calls are any functions that begin with glGet, glIs, or glRead. So that's great advice. And it shows me exactly where in my stack trace this is coming from.

Great. So I really encourage you to download Xcode 4 if you haven't already, and try out the new OpenGL ES Analyzer instrument. And then let the analyzer expert tell you where and how you can tune your app. Now probably a lot of that tuning will mean eliminating state changes and optimizing your draw calls.

And you can apply the techniques I've shown in this talk to do those things. And finally, here's my contact information if you have any questions about the content presented here, and a link to the iOS Dev Center for documentation, sample code, and our developer forums. Thanks for watching.