
iOS 7 Tech Talks #22

Advances in OpenGL ES 3.0

2013 • 43:46

OpenGL ES provides access to the exceptional graphics power of the Apple A7. See how the innovations in OpenGL ES 3.0 deliver incredible graphics in games and other mobile 3D apps. Learn about advanced effects enabled by the latest API enhancements, and get specific tips and best practices to follow in your apps.

Speaker: Paul Marcos

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it may contain transcription errors.

So, hey, my name's Filip Iliescu. I'm the graphics and media evangelist, and this first talk is about OpenGL ES 3.0 and some of the advances that we've made that you can take advantage of. So, yeah, as John was saying, you know, we recently introduced the iPhone 5S and the iPad Air and the iPad Retina Mini, and these are our first devices ever with support for OpenGL ES 3.0. It's this great technology that we introduced in iOS 7, but OpenGL ES 3.0 is only available on these devices, just to start out with. This is a major leap forward, and there are a lot of new advances that you guys really want to take advantage of, so we'll talk about some of that stuff.

But specifically, you know, for graphics, you should realize, recognize, that the A7 is, like, our new baseline, our next-gen GPU. So moving forward, this is going to be really important for you all to take advantage of. We recommend this is kind of what you start driving your development towards, while still keeping the older chips in mind, but for new features, this is a good direction to head towards. We meet with game developers a lot, and a lot of times they say, "Oh, we're not ready to start working with the new stuff yet." You know, people are still on the older GPUs and this sort of thing. But, like we said earlier, we have 74% adoption of iOS 7. So that's huge. There's a huge, huge demand for the new A7 devices. So it's a major leap forward. I highly recommend moving in this direction if you can.

And then, of course, combining that with OpenGL ES 3.0, this has its feature set taken directly from desktop OpenGL. It enables things that were previously only seen in AAA games on the console before. So it's great. And, you know, the third part of this picture, which maybe you hadn't thought of or seen too much of yet, is how we're pairing this up with the OpenGL ES debugger in Xcode. And there's some really amazing, awesome advances in there for the A7 that I'm going to show you today that we didn't actually get a chance to show at WWDC because, well, there wasn't an A7 device yet at that time. And this is great. This is actually, I had a chance to actually work on a lot of this stuff myself before becoming the evangelist. So I'm very excited to actually get to share some of that stuff with you guys today. The agenda for this hour.

First, I'm going to, you know, start with some of the highlights of the A7 GPU. And then we're going to talk about what you can do to move your ES2 games towards ES3. And then I'll take a deeper dive into some of the new features of OpenGL ES3. And then, of course, I'll show you how you can tune your ES2 and ES3 games using Xcode's OpenGL ES frame debugger on an A7 GPU. Great. So I'm going to go ahead and start with the A7, and just some of the facts first. The A7 is still a tile-based deferred renderer, much like the previous GPU, the A6. But the performance is up to double that of the A6. So that's fantastic. The A7 is the first GPU that provides fully native support for OpenGL ES 3.0, but it still provides support for ES 2.0 and ES 1.1 through backwards compatibility; we implement the shaders for you if you're going to use ES 1.1. But we don't recommend taking that route if you can avoid it, unless you really want to support older devices or your game is doing that already. And of course, it's a shader-based pipeline, just like all modern GPUs.

All right, well, here's a list of some of the new features in OpenGL ES 3.0, and there's a lot of them. You know, uniform buffer objects, instanced rendering, multiple render targets. There's a new version of the OpenGL ES shading language, GLSL, frame buffer fetch, and how we're combining that with MRTs so you can do things like deferred shading. That's actually one of the things I'll talk about later. It's a lot to look at, but as much of this as you can wrap your mind around will give you a lot of really great tools to make really cool games with. So all right.

So quickly, I want to just mention the new limits on an A7 GPU. And there's a lot of stuff on the screen here, but I think the big takeaway is that the limits are now double that of the previous GPU. More than double in a lot of cases. And, of course, you can have multiple render targets now on the A7: a max of four color attachments, whereas before you could just have one. Now, if you are still using an OpenGL ES2 context on the A7, not an OpenGL ES3 context, then you'll be capped at the previous limits, the OpenGL ES2 limits. So this is one reason to take advantage of OpenGL ES3 right away: to get the hardware-native limits on the A7 GPU.

I'll talk about a few of the key differences between the A6 and the A7. And the first is performance. In the area of performance, there is no longer a penalty like we had on the A6, or at least not as much of a penalty, for dependent texture reads. But there is a higher penalty now for logical frame buffer loads and stores. In case you don't know what that means, a logical buffer load would be if you forget to clear your color attachment at the beginning of the frame. If you do that, we actually have to reload it, and so you'll incur a performance penalty. That's a logical buffer load. The way to avoid that is to make sure you clear your color attachment. And a logical buffer store is if you forget to, say, invalidate or discard your depth or stencil or multisample attachments at the end of the frame. In this case, we will actually make a copy of those attachments between frames, and you'll incur a performance cost for that. So that's something to keep in mind.
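As a concrete illustration, here's a minimal sketch of the frame hygiene that avoids both penalties. It assumes a valid ES3 context and a bound framebuffer with color, depth, and stencil attachments, so take it as a pattern rather than drop-in code.

```c
/* Start of frame: clear every attachment you will render into, so the
   tiler never has to reload last frame's contents (a logical buffer load). */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

/* ... draw the frame ... */

/* End of frame: invalidate everything you will not present, so the GPU
   never copies it back to memory (a logical buffer store). Only the
   color attachment survives to be presented. */
const GLenum discards[] = { GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT };
glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, discards);
```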

So in the realm of precision, lowp shader values are now promoted up to 16 bits, and any floating-point shader calculations are done with a scalar processor. So this might have some implications in your game, in your shaders, if you're doing anything with write masks or anything like that.

You might want to take a look at some of that stuff and make sure it still works the way you expect on the new GPU. And as far as limits go, like I said before, apps with an ES2 context will still get the ES2 limits, but apps with an ES3 context will get the native limits. Another reason to move forward with ES3.

All right, so I'm just going to quickly talk about how you might move from ES2 to ES3. The big picture is that the core of ES3 is actually a superset of ES2. So what that means is that if you just want to take your ES2 code and move it into ES3, it will mostly just work. All of the stuff that was in core is still in core, of course.

For extensions, for any extensions you were using in ES2, there are a couple of different changes. Three cases, actually. The first case: some have moved into the ES3 core as is. There are a couple of really simple things you can do there to get those to work, and you can pretty much do that with a search and replace, and we'll talk about that in a second. Some have moved into ES3 core with some semantic changes. So a few API changes, but nothing dramatic. And some that were extensions are still extensions. Not too much to do as far as that goes. All right, so looking at the first case, these pretty much work identically between ES2 and ES3. The only difference is that we've dropped the EXT or OES prefixes or suffixes from the functions and the tokens. So a lot of this stuff you can just search and replace and move forward.

And a couple of quick examples: glTexStorage2DEXT, with a token like GL_RGBA8_OES, has now become glTexStorage2D with GL_RGBA8. And that's it. Another example: glMapBufferRangeEXT just drops the EXT, and, you know, same with all the tokens. So that's great. That's the trivial case. Case number two, a little less trivial, but still not rocket science or anything. These extensions have some API changes: map buffer, discard framebuffer, framebuffer multisample. They all still exist, but there might be some code changes you have to make to get them to work in ES3.

So a couple of examples. MapBuffer, this is the extension that lets you map an area of some memory on the GPU onto the CPU so you can read and write. Well, glMapBufferOES is gone. Now you would use glMapBufferRange and just specify the max size of the buffer to get the same functionality as glMapBufferOES. And discard framebuffer: we've renamed the function. That's all there is to it. It's now glInvalidateFramebuffer. And this is the function that you would use to avoid a logical store penalty. I just want to make this point: use this to invalidate anything you're not going to present to the screen at the end of your frame, and this will help increase the performance of your games on the A7, okay? And Apple framebuffer multisample: glResolveMultisampleFramebufferAPPLE has now become, well, you create a source and a destination buffer and use blit.

glBlitFramebuffer. And that's great because you can actually specify regions and so on and so forth. Pretty simple. All right, so case number three. These extensions are still extensions in ES3. Some of them may have a couple of minor changes, but nothing dramatic. You know, copy texture levels, RGB 422, the debug label and debug marker API that you can use with Xcode, actually. I don't know if you know about it, but you can use it with Xcode to create groups and label stuff so that when you're running the OpenGL ES debugger, you'll actually see your draws grouped in folders and things like that. That's the extension you use for that. And then I've highlighted shader framebuffer fetch for two reasons. One, there are some minor semantic changes here, mostly in the GLSL version. And two, I'm going to talk about it later.
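Sketches of the two migrations just described; buf, size, msaaFBO, resolveFBO, width, and height are placeholder names assumed to be set up elsewhere.

```c
/* glMapBufferOES becomes glMapBufferRange; mapping the whole buffer
   reproduces the old behavior. */
glBindBuffer(GL_ARRAY_BUFFER, buf);
void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size, GL_MAP_WRITE_BIT);
/* ... write vertex data through ptr ... */
glUnmapBuffer(GL_ARRAY_BUFFER);

/* glResolveMultisampleFramebufferAPPLE becomes a blit from the
   multisample FBO to a single-sample FBO, with explicit regions. */
glBindFramebuffer(GL_READ_FRAMEBUFFER, msaaFBO);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFBO);
glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);
```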

Just talking about shaders really quick. Any shaders that you had written for ES2 are still supported in ES3, directly; you can just use them, since ES2 is a subset of ES3, or ES3 is a superset of ES2. And this includes the shading language; the shading language will just work. Normally, you could put #version 100 at the top, but you didn't have to do that; we would just assume it's version 100. Well, that's still the case. Now, there's also a new shading language, version 300. You would actually put that at the top if you want to use the new shading language.
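For illustration, here's what a minimal version 300 vertex shader might look like; the attribute and uniform names are made up for the example.

```glsl
#version 300 es
// The new language renames "attribute" to "in" and "varying" to "out".
in vec4 position;
in vec2 texCoord;
out vec2 vTexCoord;
uniform mat4 mvp;

void main() {
    vTexCoord = texCoord;
    gl_Position = mvp * position;
}
```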

And there are actually a lot of additions, and we'll talk about that a little bit more as well. This is actually very similar to what we did on the desktop with GLSL 330. And back when we did this, we put out a video, "Migrating to the OpenGL Core Profile." So this can actually help you get going on this a lot faster. I'll provide a link at the end of the presentation, and you can look at that. If you don't want to go through and read the GLSL spec directly right now, this video might be a quick way to get going.

It's very similar. So, yeah, I'll provide a link at the end of the presentation for that. Let's talk about how you might actually start doing this, the adoption strategy. First thing you want to do, like I mentioned before: go out and get an A7 device if you don't have one, and test your games on the new devices, because the changes I talked about earlier might have some performance implications for your game, and it's really important that you test your stuff and make sure that it's working the way you expect on the new hardware. And especially correct any logical buffer loads and stores, because, like I mentioned, there are much higher penalties for those now.

And I'll actually show you how you can find those using the tools later in the presentation. Next, you want to try and support both ES2 and ES3. And the way you might do this is the way you did it back when we moved from ES 1.1 to ES2: try for an ES3 context, and if you don't get one, fall back to an ES2 context. And then have two separate code paths for each. That way you can still support all devices running iOS 7 with either version of the language. Make sure you can query for the GL strings and handle any extensions at runtime and so on and so forth. You know, some games can afford to just go ES3 only. If you want to take advantage of some of these new features that I'll talk about, like deferred shading, this is the new baseline. Going forward, this is what you want to start targeting your games for. Let's take a look at OpenGL ES 3.0 a little bit deeper.
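The context fallback just described can be sketched in a few lines of Objective-C; the surrounding view setup is omitted.

```objc
// Try for an ES3 context first; fall back to ES2 if the device
// doesn't support it.
EAGLContext *context =
    [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES3];
if (context == nil) {
    context = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
    // ... select the ES2 code path ...
}
[EAGLContext setCurrentContext:context];
```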

Okay, so here's that chart again. And, you know, there's a lot of stuff on here, but in this talk, I'm just going to focus on, you know, just a couple of things, and that's instance rendering, multiple render targets, and frame buffer fetch. And I'll actually mention a couple other things, too, but those are the main areas of focus. Okay, so instance rendering. Now, this is actually something we did talk about at WWDC, and the reason I'm bringing it back up here again, I wasn't sure if I wanted to talk about it, but we discussed it with a few engineers, and we thought, you know, this is a really important topic for people to understand because so many games are CPU bottlenecked, right?

So many games are spending a lot of their time working on the CPU unnecessarily. And if they started taking advantage of this extension, it would just move a lot of work onto the GPU and increase your performance with very little effort. And the great part about this is it's available in ES2 on all iOS 7 devices.

You don't even need to move to ES3 to get this. But in ES3, it's part of core, whereas in ES2, it was an extension. So this is something you can take advantage of now to speed up the performance of your games, if this makes sense for you. What's instancing all about? Well, it's the case where you might want to draw many similar objects. In this scene here that I'm showing, we're drawing 18,000 asteroids, right?

18,000 asteroids. Each asteroid has 48 vertices. So if you do the math, that's 864,000 vertices, which means 864,000 times we're going to run the vertex shader, and so on and so forth, with all the calculations you might be doing. With instancing, you can get rid of all of the work that you're doing on the CPU. Not all of it, but a big majority of the work you might have been doing on the CPU previously, and still have different positions and different colors and all sorts of things. Each asteroid can have its own parameters. And that's what makes it really powerful.

So without instancing, here's one way you might do this. You go ahead and draw the backdrop, draw the stars and the planets. And then you would brute-force loop through all 18,000 asteroids, set some kind of uniform matrix, like your model-view-projection or whatever, for each asteroid, and then actually draw the asteroid. Now, when you do this, it fills up the OpenGL command buffer with 18,000 times two calls here. And of course, most people are probably doing a lot more than just setting one uniform. So you're just filling up this command buffer, which essentially is asking GL to do a bunch of work. The CPU is asking the GPU to do work.
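The brute-force path described above might look like this; mvpLoc and asteroids are assumed names.

```c
/* One uniform update plus one draw per asteroid: 36,000 GL calls
   per frame, all issued by the CPU. */
for (int i = 0; i < 18000; i++) {
    glUniformMatrix4fv(mvpLoc, 1, GL_FALSE, asteroids[i].mvp);
    glDrawArrays(GL_TRIANGLES, 0, 48);
}
```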

This is where instance rendering comes in. Instance rendering lets you draw the same object many, many times from a single draw call. That's what's really powerful about it. And in that draw, every single one of these can have its own position, its own rotation, its own texture coordinates, all sorts of stuff, whatever you need. There are two APIs, two ways to take advantage of instance rendering. The first one is instance arrays. This is kind of similar to how you would set up your attribute arrays for positions and normals: you might actually create another attribute array. Or you can use one of the other ones you have and set it up in the stride somehow. But the simplest way to think about it is you have another attribute array.

And then you'd specify a divisor so that you could actually get at the per-instance data for each instance as you're running through. If you want to see this in a little more detail, the WWDC talk from this year on OpenGL ES 3.0 goes into this in a lot more detail. Shader instance ID I'll talk about in a little more detail because it's the conceptually simpler of the two. But real basically, it gives you this gl_InstanceID built in in your shader that increments for each instance, and you can use it to do anything you need. Now, as I said, both of these are available on all iOS 7 devices; in ES3 it's core and in ES2 it's an extension, so I highly recommend you guys start using this now if you can. All right, so taking a little bit closer look at instance ID: you get this built-in gl_InstanceID in your shader, which is incremented for each instance. So for the first asteroid, the first 48 vertices, you will have instance ID 0, and then the next 48 vertices, instance ID 1, and so on and so forth. And what you do with it is up to you. In our particular scene, we were using it to do some kind of sine and cosine calculation to put each asteroid in the orbit somewhere, and a number of other things, which I'll talk about in a little more detail. But you can use it to, say, look up into a uniform buffer object.

You can also use it with the new vertex texture sampling extension, which is great, because with this extension you can do interesting things in your vertex shader with a texture, like bump mapping each instance, for example. You can now do bump mapping directly there because you can look up into a texture.

And you can use the built-in instance ID to do that. All right, so here's what the basic setup on the CPU would look like. You would set up your vertex attrib arrays, and then, of course, depending on if you're using the divisor version or not, you might have another one.

But either way, you would then create your uniforms, your global uniforms that will be used across all instances. And all in one draw call, specify how many asteroids you want to draw, right? glDrawArraysInstanced, and then you'd put 18,000 there at the end. That's it. A lot less CPU work.
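A sketch of that setup; the divisor lines apply only to the instance-arrays variant, and instanceVBO and attribute index 3 are assumptions.

```c
/* Per-instance data (instance-arrays variant): attribute 3 advances
   once per instance instead of once per vertex. */
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
glEnableVertexAttribArray(3);
glVertexAttribPointer(3, 4, GL_FLOAT, GL_FALSE, 0, 0);
glVertexAttribDivisor(3, 1);

/* The whole asteroid field in one call: 48 vertices per asteroid,
   18,000 instances. The shader sees gl_InstanceID from 0 to 17999. */
glDrawArraysInstanced(GL_TRIANGLES, 0, 48, 18000);
```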

And in your shader, you might have the same setup. You can choose version 300 or not. In this case, I'm using version 300 and setting up my uniforms and my attributes the way you normally would, and using instance ID to calculate the position of each asteroid. This is kind of a simple canned example; we're just setting them up in a grid. It's not actually what we're doing in the demo, but just to illustrate the point, instance ID can be used however you want. In this case, I'm modding it by 100 and dividing it by 100 to get some X and Y, and then I'm outputting that position. You can have much more complex lighting or whatever it is you do in your shader, but that's the basic example of how you'd use instance ID.

All right, so I'm going to actually run a quick demo and show you guys how you might use this. Okay, so here's the demo I was showing earlier. And here we are rendering 18,000 asteroids in immediate mode, the brute-force method I showed you earlier, and we're getting some kind of performance, you know, 18 frames per second. So if I tap the screen, we switch to instance ID. And we're getting about double. That's actually really fantastic. I mean, it's actually faster if I don't have it hooked up to this projector. But either way, that's double the performance that we were seeing before.
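The grid example described above might look like this as a version 300 vertex shader; the spacing and uniform name are assumptions.

```glsl
#version 300 es
in vec4 position;
uniform mat4 viewProjection;

void main() {
    // gl_InstanceID picks a cell in a 100-wide grid.
    float x = float(gl_InstanceID % 100);
    float y = float(gl_InstanceID / 100);
    gl_Position = viewProjection * (position + vec4(2.0 * x, 2.0 * y, 0.0, 0.0));
}
```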

And then if I tap again, we'll go to the attribute arrays, the divisor version, and you'll see that it's still double. Now, you can also do a lot of other things to tune this, and it actually runs closer to 55 or 60 frames per second with 18,000 asteroids with both of these techniques, when I'm unplugged. But what I really wanted to say is that there's no set rule that one technique is faster than the other. It's up to your application to decide which technique is going to be faster.

So I recommend that you experiment with both, either the attribute arrays version or the instance ID version, and see what really speeds up your code. I just showed you a pretty simple example of the shader and all that. But we are doing a lot more in this demo, and I just wanted to mention it for those of you that might know GL and be asking, hey, are we actually taking advantage of any of this other stuff? Or just to mention it so you can think about it for your games. First, we're using instance ID to look up some pre-calculated model-view-projection matrix data in a uniform buffer object. And then we're also using it to calculate the spin rate for each asteroid. So we're using uniform buffer objects to hold the transformation data and that sort of stuff for each instance, and then we can use instance ID to index into that uniform buffer object and get that data out. And, you know, uniform buffer objects actually have a limited size.

I think it's about 16K. So you can't actually submit all 18,000 instances at once; you have to batch them so that they fit the uniform buffer object size. I think what we're actually doing is submitting about 40 draw calls instead of one. But it's still better than 18,000.

And then lastly, transform feedback and rasterizer discard. This is maybe a little more advanced, but what we're doing here is using this to pre-calculate all of the model-view-projection matrices and all that other stuff up front, before we actually go into the rendering pipeline. Rasterizer discard lets you shut off the rasterization part of the pipeline, and transform feedback lets you emit the output from a vertex shader back into a buffer object. So you can use a vertex shader to calculate all of your data and then emit that back before you actually render. Instead of calculating the model-view-projection matrix for every vertex, this lets us do it for every instance up front, because we can just specify in the draw for the transform feedback 18,000 vertices, where each vertex represents one instance. So instead of 864,000 model-view-projection matrix calculations in a vertex shader, we do it 18,000 times, once per instance, so that's better. And it's in a previous rendering pass before we do any rendering at all.

Next topic: I wanted to talk about multiple render targets. The main idea behind multiple render targets is, well, now you can render to four attachments, or four textures, at once. This might look like the first pass of a deferred shading algorithm, where you have your G-buffer, your geometry buffers: your normals, your depth, your albedo. And it is, actually. We actually started playing around with some of this stuff. And what's great is that each one of these attachments can have a different format as well.
So your color can be ARGB 8888. You might want higher precision for your normals: 10-10-10-2. And you can mix all these in one pass. The one thing I wanted to mention before we get too far into it is that if you want to stay on the fast path, you don't want to output more than 128 bits per pixel. So if you have a color attachment that's ARGB 8888, that's 32 bits that you're outputting in just that one attachment. And if you have four with the same format, that's going to be 128. So whatever formats you use, try to do the calculation and figure out how many bits per pixel you're actually outputting, and that will keep your app on the fast path.

If you go over that, you'll incur a pretty huge performance penalty. I'm not going to get too deep into deferred shading here, because this is something that's been around for a long time on desktops and consoles, and there's a lot of information out there about how you might do it. But briefly, this is something you might use to decouple the complexity of your geometry from the number of lights in your scene. This lets you have a lot of really interesting lighting, lots of lights in your scene, regardless of what the geometry looks like. Multiple render targets, however, are just one piece. This would be the first pass of a deferred shading algorithm.

The second pass, you would then read back from these attachments and do some lighting computation to get your final scene. So just to clarify, when I say pass, I don't mean draws. I've had this question a couple of times, so I'm going to clarify it here. Traditionally, we think of it as: in the first pass, you bind a whole set of textures and render to those textures as your output, and then you unbind them and bind a whole new set of textures in the second pass and sample from the previous textures in that second pass. That's traditionally what we mean by passes, and I'm going to use that word a little more in the presentation, so I just wanted to make that clarification now. All right. So this is what the CPU code setup for this might look like, and it's pretty simple. You would just declare the four attachments that you're interested in, attach them to the frame buffer, tell GL that you're going to draw into those four attachments, and then just draw.
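That setup might look something like this; the four texture IDs are assumed to have been created and sized elsewhere.

```c
/* Attach the four targets to the currently bound framebuffer. */
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, colorTex,  0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                       GL_TEXTURE_2D, normalTex, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT2,
                       GL_TEXTURE_2D, albedoTex, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT3,
                       GL_TEXTURE_2D, depthTex,  0);

/* Tell GL the fragment shader will write to all four. */
const GLenum bufs[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1,
                        GL_COLOR_ATTACHMENT2, GL_COLOR_ATTACHMENT3 };
glDrawBuffers(4, bufs);

/* ... then just draw ... */
```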

And taking a quick look at the shader and the setup for the shader: if you're using version 300, which, of course, you have to for multiple render targets, the syntax has changed slightly. You no longer have a built-in for outputting your color values to the attachment. Now you would use the layout syntax, specify the location, and give each attachment a name. And then you do a bunch of work and output to those variables you've set up that represent each location.
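In the shader, the layout syntax looks like this; the G-buffer computation itself is elided, as in the talk, and the output names are made up.

```glsl
#version 300 es
precision mediump float;

// One named output per color attachment replaces the old built-ins.
layout(location = 0) out vec4 fragColor;
layout(location = 1) out vec4 fragNormal;
layout(location = 2) out vec4 fragAlbedo;
layout(location = 3) out vec4 fragDepth;

void main() {
    // ... G-buffer work elided ...
    fragColor  = vec4(0.0);
    fragNormal = vec4(0.0);
    fragAlbedo = vec4(0.0);
    fragDepth  = vec4(0.0);
}
```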

You know, the work you might do to set up the shader is obviously beyond the scope of this talk, so I just put dots. And then you'll get this really interesting, more complex scene with lots of lights. Like in this particular scene, we have lots of these little glowing fairies spinning around this monolith. As I mentioned, deferred shading is usually a two-pass algorithm. But now with framebuffer fetch, it becomes one pass. And I think this is something pretty unique to iOS 7 and the A7 GPU: how we're combining this extension, framebuffer fetch, with MRTs to let you do deferred shading in one pass. So framebuffer fetch has been around since iOS 6; this is an extension that was there before. And traditionally, it could be used for programmable blending. You might use it to do some kind of additive or hard-light blend that you couldn't get from the fixed-function pipeline. Or you could use it for some local post-processing effects. That's another thing. If, for example, you had a character in your scene that was running around and got shot or something, and the screen flashed gray for a second, you might read back the current color values in your scene and just turn them gray for a moment. This would be a quick and easy way to do that without re-rendering the whole scene.

And I just wanted to show how this extension looked in the previous version of GLSL. You would actually read back from gl_FragData, but now you would simply change the out qualifier in the shader to inout, so you can read and write from the same attachment. And this actually enables you to use this for something other than effects or blending. You can use it to store data and read it back later, that sort of thing. So I'll talk about that a little more.
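The gray-flash example from a moment ago could be written this way with the inout syntax; the luminance weights are a standard choice, not from the talk.

```glsl
#version 300 es
#extension GL_EXT_shader_framebuffer_fetch : require
precision mediump float;

// Declaring the output "inout" lets the shader read the pixel's
// current framebuffer value before writing it.
inout vec4 fragColor;

void main() {
    float g = dot(fragColor.rgb, vec3(0.299, 0.587, 0.114));
    fragColor = vec4(g, g, g, fragColor.a);
}
```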

So framebuffer fetch with multiple render targets: this is what's new in iOS 7 and the A7 GPU. Combining them gives you the ability to do deferred rendering in one pass. And what that means is that now you can read and write from the same attachments. And of course, you can read from one and output to another. That's one of the things that makes this really powerful: you're not limited to just reading back from the same one. This is how you might do deferred rendering in one pass. You would use MRTs to output your four attachments for color, normals, that sort of thing. Then, in the same pass, you would read back using framebuffer fetch, without changing any of the attachments that you have bound, which is why I'm calling it a single pass, and then compute your lighting and output to one of the attachments that you have bound.

And then you have to make sure that you clean up all of the other attachments so you can avoid any logical buffer stores, as I mentioned. And then output your scene, and you'll notice you'll have a lot of interesting lights, all in one pass: deferred rendering in one pass. It's pretty awesome how you can take advantage of these technologies. Multiple render targets take care of rendering all your buffers in one pass, and you can decide various combinations in your game's algorithm, of course. Using framebuffer fetch allows you to calculate all your lights, and add all sorts of lights, in the same pass. And make sure you invalidate the render attachments to avoid any buffer stores. Now I'm going to show you some of the tools. Specifically, we're going to talk about Xcode 5's OpenGL ES frame debugger and the new shader profiler in the frame debugger. Actually, previously, before I joined the evangelism team, I worked on this team for about five years, and I'm really excited to get to talk to you guys about this stuff on stage. I have the A7 device hooked up here, and I'm going to go ahead and launch the Asteroids demo. When the game starts running, you can go over here to the debug navigator, which is Command-6.

When you're in the file navigator in Xcode, you can switch to any of the navigators using the Command-number shortcuts, so Command-6 takes you to the debug navigator. Then you can select the FPS performance gauge, and this will show you right away the FPS of your rendering. We also have a GPU and CPU comparison graph. We're running in immediate mode, so you can see that we're actually pretty bottlenecked right here on the CPU; we're spending a lot of time just emitting draw calls. So if I switch to instanced rendering with the instance ID, you'll see that we've eliminated a lot of the work that we were doing on the CPU and moved it to the GPU, and that's actually what we want. And you can see that the frame rate has gone up.

It looks like we're at about 40 frames per second now. So now what I want to do is see how I can maybe improve this performance a little more. This is how you can use the OpenGL ES frame debugger in Xcode: by clicking on the button that looks like a camera, called Capture OpenGL ES Frame. When you do that, what we do is actually record every single OpenGL call that you're making and recreate the frame. When it's done, you'll get a view of your trace over here in the debug navigator, and you can click on one of those calls and see your MRTs, or your attachments however you're using them, and a wireframe rendering of the objects you have bound to render. You can then open up the assistant editor over here and see all of the bound resources at each draw call, and the state down below in the debug area. You can change this to view the GL context and so on, but I'll just go ahead and keep it on Auto, so we can go back over here into the debug navigator and expand the rendering group. And you'll see there's a warning sign here next to glClear. If you click on that, it'll tell you down in the variables view that your app is causing a slow framebuffer load. Now, I'm using different wording here, but this is a logical framebuffer load.

It's telling us that we failed to clear the color buffer at the beginning of each frame. If you look at the code, you can see where we're doing that by expanding this group and clicking on the top stack frame, and it'll take us here. All we're clearing here is the depth buffer. So I could go ahead and add the color bit here, but I'm not going to do that just for time's sake. That's one way to find where you're causing a logical buffer load.
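For reference, the fix being described, clearing both the color and depth attachments at the top of the frame, would look something like this (the clear values are illustrative):

```c
// Clearing every attachment at the start of the frame tells a
// tile-based GPU like the A7's that it does not need to load the
// previous frame's contents into tile memory, which is exactly the
// logical buffer load the frame debugger is warning about.
glClearColor(0.0f, 0.0f, 0.0f, 1.0f);  // illustrative clear color
glClearDepthf(1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
```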

All right, so if we go back to the FPS gauge, we'll notice that on the A7, versus previous GPUs, you get this program performance graph down below after you've made a capture. And you can see right off the bat which programs, which shaders, are using the most time in your game. So this particular shader, program number 12, is using 22.29 milliseconds of the 24.5 milliseconds on the GPU. Okay. If we want to dive a little deeper into where that time is going in our shader, over in the debug navigator here you can change the way that you're viewing the frame. This is similar to how you might switch between viewing threads versus queues; we now have a way to view either by call or by program. So you can switch this to view the frame by program, and you'll see the same programs over here in the debug navigator, but you can expand them and see the time per shader. So here in the vertex shader, we're actually spending 20.52 milliseconds of that 22.29 milliseconds. Now, if we click on this, we'll get line-by-line performance information in the shader, and you'll see how much time each one of those lines of code is taking. So, okay, great.

Well, this first line, where I'm using the instance ID to look up in a uniform buffer object, is taking 28.1% of the overall 20.52 milliseconds in my code. That's work I want to be doing, so that's fine; I actually want more of the percentage to go into that particular line of code than into some of the other ones, right? So let's take a look and see what else we can optimize. Well, this for loop here is taking 11.3%. To do what? To hard-code a bunch of values into a model-view matrix, and it loops four times.
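The kind of per-instance lookup being profiled here might be sketched like this; the uniform block, array size, and attribute names are hypothetical, not the demo's actual code:

```glsl
#version 300 es
// Hypothetical per-instance transforms in a uniform buffer object,
// indexed with the built-in gl_InstanceID (new in ES 3.0) so a single
// glDrawArraysInstanced call replaces many CPU-issued draw calls.
layout(std140) uniform PerInstance {
    mat4 modelMatrix[256];  // one transform per asteroid instance
};
uniform mat4 viewProjection;
in vec4 position;

void main() {
    gl_Position = viewProjection * modelMatrix[gl_InstanceID] * position;
}
```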

Well, if we take a look at the shader best practices guide on developer.apple.com, you'll see that using a for loop to precalculate values like this is slow. So something we can do is just unroll this loop and hard-code the values. I'm going to go ahead and delete the for loop, unroll it, and just paste the values in here.
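The transformation itself, a loop versus unrolled straight-line code, can be illustrated in plain C. This is a sketch, not the demo's shader; the point is that the unrolled form gives the compiler fixed indices and straight-line writes it can fold and schedule freely:

```c
/* Loop version, analogous to what the shader was doing: write the
 * same scale into the 4x4 matrix diagonal one iteration at a time. */
static void fill_diagonal_loop(float m[16], float scale) {
    for (int i = 0; i < 4; i++) {
        m[i * 4 + i] = scale;  /* element [i][i] of a row-major 4x4 */
    }
}

/* Unrolled version: the same four writes as straight-line code. */
static void fill_diagonal_unrolled(float m[16], float scale) {
    m[0]  = scale;
    m[5]  = scale;
    m[10] = scale;
    m[15] = scale;
}
```

Both functions produce identical matrices; only the shape of the code, and therefore what the shader compiler can do with it, differs.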

And then when I've made the change to my shader, I can click on this circular arrow right below the shader, and it will recompile. Ah, it looks like I have an error. So it's smart enough to tell me that I have an error. Okay, yes, because I changed the wrong... Let's go back and do that again. All right, so I'll go ahead and recompile it again.

And we'll notice that when it's done compiling, the work that we were doing there is gone, and the percentage has moved to where we want to be doing more work; that's where we're spending our time now. And of course, the overall time of the shader has decreased. It's now running at 17.56 milliseconds, where it was something like 20 milliseconds before. Well, that's great. What if I want to actually see what the performance looks like in my game? If you click this arrow down here, where the camera button was, it will rerun the game you were running, but inject the new shader into your game. And you can see that the frame rate has gone up.

So that's one way you can use the new shader profiler to tune your shaders. And once again, this only works on the A7, so this is another reason to get one of these devices and play with it. You can optimize code for previous devices as well using this, but you have to have the A7 to do this kind of shader profiling. So I highly encourage you to take advantage of the tools and play around with some of this stuff. We talked about the A7 GPU and OpenGL ES 3.0, and how you can use the new technologies together. And we talked about how you can use Xcode's OpenGL ES frame debugger and the new shader profiler to tune your games. So I hope you take advantage of this new technology and go out and make some awesome OpenGL games. And thank you for your time. I'm Philip Iliascu. I'm the new graphics and media evangelist.

I just started in November on this team. You can post any questions on the Apple Developer Forums, and we'll try to get back to you as much as we can. Take a look at the documentation for OpenGL, and this is the video I promised to give you the link for earlier: Migrating to the OpenGL Core Profile. This will help you get into GLSL ES 3.00 by looking at how we did that on the desktop. So here's the link; you should write it down if you're interested in taking a look at that. Great. Thank you very much. Thank you.