Advances in OpenGL ES 3.0 - Tech Talks

2013 • 43:46

OpenGL ES provides access to the exceptional graphics power of the Apple A7. See how the innovations in OpenGL ES 3.0 deliver incredible graphics in games and other mobile 3D apps. Learn about advanced effects enabled by the latest API enhancements, and get specific tips and best practices to follow in your apps.

Speaker: Paul Marcos

Unlisted on Apple Developer site

Downloads from Apple

HD Video (426.2 MB)
SD Video (95.2 MB)

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

So, hey, my name's Philip Iliascu. I'm the graphics and media evangelist, and this first talk is about OpenGL 3.0 and some of the advances that we've made you can take advantage of. So, yeah, as John was saying, you know, we recently introduced the iPhone 5S and the iPad Air and the iPad Retina Mini, and these are our first devices ever with support for OpenGL ES 3.0.

So, yeah, we're going to talk about some of the advances that we've made. But OpenGL ES 3.0 is only available on these devices, just to start out with. This is a major leap forward, and there's a lot of new advances that you guys really want to take advantage of. So we'll talk about some of that stuff. But specifically, you know, for graphics, you should realize that, recognize that the A7 is, like, our new baseline, our, you know, like, our next-gen GPU. So, moving forward.

This is going to be really important for you all to take advantage of. We recommend this is kind of like what you start driving your development towards. And, you know, still keeping in mind that there are, you know, you want to keep the A6 in mind and these things.

But, you know, you can start thinking for new features. This is a good direction to head towards. You know, we meet with game developers a lot, and, you know, a lot of times they say, oh, we're not ready to, you know, start working with the new stuff yet.

You know, there's a... You know, people are still on the older GPUs and this sort of thing. And, you know, but, you know, like we said earlier, you know, we have 74% adoption of iOS 7. So that's huge. There's a huge, huge demand for the new A7 devices. So it's a major leap forward. I highly recommend moving in this direction if you can.

And then, of course, combining that with OpenGL ES 3.0, this has its feature set taken directly from desktop OpenGL. It enables things that were previously only seen in AAA games on the console before. So it's great. And, you know, the third part of this picture, which maybe you hadn't thought of or seen too much of yet, is how we're pairing this up with the OpenGL ES debugger in Xcode.

And there's some really amazing, awesome advances in there for the A7 that I'm going to show you today that we didn't actually get a chance to show at WWDC because, well, there wasn't an A7 device yet at that time. And this is great. This is actually, I had a chance to actually work on a lot of this stuff myself before becoming the evangelist. So I'm very excited to actually get to show you this.

Yeah, I'm excited to share some of that stuff with you guys today. The agenda for this hour. First, I'm going to, you know, I want to start with some of the highlights of the A7 GPU. And then we're going to talk about what you can do to, you know, move your ES 2 games towards ES 3.0. And then I'll take a deeper dive into some of the new features of OpenGL ES 3.0.

And then, of course, I'll show you how you can tune your ES 2 and ES 3 games using Xcode's OpenGL ES 5.0. And then, of course, I'll show you how you can tune your ES 2 and ES 3 games using Xcode's OpenGL ES 5.0. Xcode's OpenGL S5 debugger on an A7 GPU. Great.

So I'm going to go ahead and start with the A7. You know, and just some of the facts first. You know, the A7 is still a tile-based deferred renderer, much like the previous GPU, the A6. But the performance is up to that of double of the A6. So that's fantastic.

So A7 is the first GPU that provides fully native support for OpenGL ES 3.0, but it still provides support for ES 2.0 and ES 1.1 through, you know, backwards compatibility. We implement the shaders for you if you're going to use ES 1.1. But we don't recommend taking that route if you can avoid it, unless you really want to support older, older devices or your game is doing that already. So... Okay. And, of course, it's a shader-based pipeline, just like all modern GPUs. So... Here's a list of some of the new features in OpenGL ES 3.0.

There's a lot of them. Uniform buffer object, instance rendering, multiple render targets. There's a new version of the OpenGL ES shading language, GLSL. Frame buffer fetch. And how we're combining that with MRT so you can do things like deferred shading. That's actually one of the things I'll talk about later. There's a lot to look at, but as much of this as you can sort of wrap your mind around will give you a lot of really great tools to make really cool games with. All right.

So quickly, I want to just kind of mention the new limits on an A7 GPU. And there's a lot of stuff on the screen here, but I think the big takeaway is that the limits are now double of that of the previous GPU. More than double in a lot of cases. And of course, you can have multiple render targets now on the A7. Max color attachments of four, whereas before you could just have one.

So of course, if you are still using an OpenGL ES 2 context on the A7, not an OpenGL ES 3 context, then you'll be capped at the previous limits, the OpenGL ES 2 limits. So this is one reason to take advantage of OpenGL ES 3 right away, is to get the hardware native limits on the A7 GPU. I'll talk about a few of the key differences between the A6 and the A7. And the first, performance. In the area of performance, there is no longer a penalty like we had on the A6.

I mean, not as much of a penalty for dependent texture reads. But there is a higher penalty now for logical frame buffer loads and stores. So what I mean by that, I don't know if you all know what that means, but in case you don't, logical buffer load would be if you forget to clear your color attachment at the beginning of the frame. Well, if you do that, we actually have to reload that. And so you'll incur a performance penalty for that. That's a logical buffer load.

So the way to not do that is to make sure you clear your color attachment. And a logical buffer store is if you forget to say invalidate or discard your depth or stencil or multi-sample attachments at the end of the frame. In this case, you know, we will actually make a copy of those attachments between frames. And so you'll incur a cost performance cost for that.

So that's something to keep in mind. So precision. So in the realm of precision, we've promoted low P shader values are now promoted up to 16 bit. And any floating point shader calculations are done with a scalar processor. So this might have some implications in your game. And, you know, like if you're doing in your shaders, if you're doing anything with write masks or anything like that, you know, you might want to take a look at some of that stuff.

And make sure it still works the way you expect it to work on the new GPU. And as far as limits goes, like I said before, you know, apps within ES2 context will still get the ES2 limits. But apps within ES3 context will get the native limits. Another reason to move forward with the ES3.

All right, so I'm just going to quickly talk about how you might move to ES 3 from ES 2.0. The big picture is that the core of ES 3.0 is actually a superset of ES 2.0. So what that means is that if you just want to take your ES 2.0 code and move it into ES 3.0, it will mostly just work, although the stuff that was in core is still in core, of course. For extensions, any extensions you were using in ES 2.0, there's a couple of different changes.

Three cases, actually. The first case, some have actually moved into the ES 3.0 core as is. There's a couple really simple things you can do there to just get those to work, and you can pretty much do that with a search and replace, and we'll talk about that in a second. Some have actually moved into ES 3.0 core with some semantic changes. So a little bit of API changes, but not anything dramatic. And some are still... Some that were extensions are still extensions.

Not too much to do as far as that goes. All right, so looking at the first case, these pretty much work identically between ES 2.0 and ES 3.0. The only difference is that we've dropped the EXT or OES prefixes or suffixes from the functions and the tokens. So a lot of this stuff you can just do search and replace and move it forward.

And a couple of quick examples, you know, GL text storage 2D Xt has a token GL RGB 8.8 with OES has now become GL text storage 2D. And that's it. And another example, map buffer range Xt just drops the Xt. And, you know, same with all the tokens. So that's great. You know, that's like the trivial case. Case number two, a little less trivial, but still not, you know, rocket science or anything.

You know, these extensions have some API changes, map buffer, discard frame buffer, texture flow, they all still exist, but, you know, there might be some code changes you may have to make to get them to work in ES 3.0. So a couple examples. MapBuffer, this is the extension that lets you map an area of some memory in the GPU onto the CPU so you can read and write. Well, MapBuffer OES is gone. Now you would use MapBuffer range, and just specify the max size of the buffer to get the same functionality as MapBuffer OES. And discard frame buffer.

We've renamed the function. That's all there is to it. It's now GLInvalidateFrameBuffer. And now this is the function that you would use to avoid a logical store penalty. Just want to make this point. You know, use this to invalidate anything you're not going to present to the screen at the end of your frame.

And this will help increase the performance of your games on the A7. And Apple FrameBufferMultisample. GLResolveMultisampleFrameBufferApple has now become, well, you create a source and a destination buffer and use blit. Joe Blip frame buffer. And that's great, because you can actually specify regions and so on and so forth. Pretty simple.

All right, so case number three. These extensions are actually still extensions in ES 3.0, but some of them may have a couple minor changes, but nothing dramatic. And copy texture levels, RGB 422, the debug label, debug marker API that you can use with Xcode, actually. I don't know if you know about it, but you can use it with Xcode to create groups and label stuff so that when you're running the OpenGL debugger, you'll actually see your draws grouped.

By in folders and stuff like that. That's the extension you use that with. And then I've highlighted X shader frame buffer fetch for two reasons. One, there are some minor semantic changes here, mostly in the GLSL version, and I'm going to also talk about it later. That's the other reason. So...

[ Transcript missing ]

Now, you don't even need to move the ES 3.0 to get this. But in ES 3.0, it's part of core, whereas in ES 2.0, it was an extension. So this is something you can take advantage of now and speed up the performance of your games if this makes sense for you. What's instancing all about? Well, it's the case where you might want to draw many similar objects. You know, in this scene here that I'm showing, we're drawing 18,000 asteroids. Right? 18,000 asteroids. Each asteroid has 48 vertices.

So if you do the math, that's 864,000 vertices, I believe. That we're actually -- 864,000 times we're going to run the, you know, vertex shader and so on and so forth. So -- and all the keys are in the same place. So you can do that. You can do the calculations you might be doing.

With instancing, you can actually do that, get rid of all of the work that you're doing on the CPU -- not all of it, but, you know, a big majority of the work you might have been doing on the CPU previously. And still have different positions and different colors and, you know, all sorts of -- you know, whatever. Each asteroid can have its own parameters. And that's what makes it really powerful.

So, you know, without instancing, here's one way you might do this. You know, you go ahead and draw the backdrop, draw the stars and the planets. And then you would, you know, brute force loop through all 18,000 asteroids and set some kind of uniform matrix, you know, like your model view projection or whatever for each asteroid and that sort of thing. And then actually draw the asteroid. Now, when you do this, what do you do? You draw the asteroid.

And what this does is it fills up the OpenGL command buffer with, what, 18,000 times two calls here. And of course, most people are probably doing a lot more than just setting one uniform. So, you know, you're just filling up this command buffer and asking -- which essentially is asking GL to do a bunch of work. That's just asking the GPU to do work. The CPU is asking the GPU to do work.

This is where instance rendering comes in. Instance rendering lets you draw the same object many, many times from a single draw call. That's what's really powerful about it. You know, and in that draw, you know, every single one of these can have its own position, its own rotation, its own texture coordinates, you know, different all sorts of stuff, whatever you need. There are two APIs, two ways to take advantage of instance rendering. The first one, instance arrays. This is kind of similar to how you would set up your attribute arrays for positions, normals. You might actually, you know, create another attribute.

Or you can use one of the other ones you have and set it up in the stride somehow. But the simplest way to think about it is you have another attribute array. And then you'd specify a divisor so that you could actually get at the per instance data for each instance as you're running through. You know, if you want to see this in a little more detail, the WWDC talk from this year on OpenGL ES 3.0 goes into this in a lot more detail, or advances in OpenGL goes into this in a lot more detail.

And shader instance ID, I'll talk about in a little more detail because it's the simpler, I think, conceptually of the two. But, you know, real basically, it basically gives you this GL instance ID built in in your shader that increments for each instance that you can use to do anything you need. Now, as I said, both of these are available on all iOS 7 devices. ES 3 is the core and ES 2 is an extension. So I highly recommend you guys start using this now if you can. All right.

So taking a little bit closer look at instance ID, you know, you get this built-in GL instance ID in your shader, which is incremented for each instance. So for the first asteroid, the first 48 vertices, you will have instance ID 0. And then the next 48 vertices, instance ID 1.

And so on and so forth. And what you do with it is up to you. So in our particular scene, we were using it to, you know, do some kind of sine and cosine calculation to put it in the orb, to put each asteroid in the orbit somewhere.

And a number of other things, which I'll talk about in a little more detail. But you can use it to, you know, maybe look up into a uniform buffer object with the uniform buffer object extension. You can also use it with the new vertex texture sampling extension, which this is great because with this extension, you can do interesting things in your vertex shader with a texture like bump map each instance, for example.

You can now do bump mapping directly there because you can look up into a texture. And you can use the uniform instance ID to do that. All right. So here's what the, you know, basic setup on the CPU would look like. You would just, you know, set up your vertex attrib arrays.

And then, of course, you know, depending on if you're using the divisor version or not, you might have another one. But either way, you would then create your uniforms, your global uniforms that will be used across all instances. And all in one draw call, specify how many asteroids you want to draw. raw. That's it.

A lot less CPU work. And in your shader, you know, you might have the same setup. You can choose version 300 or not. In this case, I'm using version 300 and setting up my uniforms and my attributes the way you normally would. And using instance ID to calculate the position of each asteroid.

This is kind of a simple canned example. We're just kind of setting them up in a grid. It's not actually what we're doing in the demo, but, you know, just to illustrate the point, instance ID can be used however you want. In this case, you know, I'm modding it by 100 and dividing it by 100 to get some XY.

And then I'm outputting that position. You can have much more complex lighting or whatever it is you do in your shader, but, you know, that's the basic example of how you'd use instance ID. All right, so I'm going to actually run a quick demo and show you guys how you might use this. Okay, so here's the demo I was showing earlier. And here we are rendering, you know, 18,000 asteroids in immediate mode, the brute force method I showed you earlier. We're getting some kind of performance, you know, 18 frames per second.

Okay, that's, you know. So, you know, if I tap the screen, we switch to instance ID. And we're getting, you know, about double. That's actually really fantastic. I mean, it's actually faster if I don't have it hooked up to this projector. But, You know, either way, that's double the performance that we were seeing before.

represents one instance. So that saves-- instead of 864,000 model view projection matrix calculations and a vertex shader, we do it 18,000 times once per instance. So that's better. And it's in a previous rendering pass before we do any rendering at all. Next topic, I wanted to talk about multiple render targets.

The main idea behind multiple render targets is, well, now you can render to four attachments or four textures at once. This might look like the first pass of a deferred shading algorithm where you have your G buffer, your geometry buffers, your normals, your depth, your albedo. And it is, actually. We actually started playing around with some of this stuff.

And what's great is that you can now have-- for each one of these attachments, can have a different format as well. So your color can be ARGB 8888. You might want higher precisions for your normals, 1010102. And you can mix all these in one pass. So the one thing I wanted to mention about this before we get too far into it is that if you want to stay on the fast path, you don't want to output more than 128 bits per pixel.

So if you have a color attachment that's ARGB 8888, that's 32 bits that you're outputting in just that one attachment. And if you have four with the same format, that's going to be 128. So whatever format you use, try and do the calculation and figure out how many bits per pixel you're actually outputting. And that will keep your app on the fast path.

If you go over that, you'll incur a pretty huge performance penalty. I'm not going to get too deep into deferred shading here, because this is something that's been around for a long time on desktops and consoles. And there's a lot of information out there about how you might do deferred shading.

But just briefly want to mention, this is something you might use to decouple the amount of-- like the geometry, the overall geometry from the number of lights in your scene. So the complexity of your geometry from the number of lights in your scene. This lets you have a lot of really interesting lighting, lots of lights. And you can see that there's lots of lights in your scene irrelevant of what the geometry looks like.

So multiple render targets, however, is just this one piece. So this would be the first pass of a deferred shading algorithm. The second pass, you would then read back from these attachments and do some lighting computation to get your final scene. So just to clarify the pass. I don't mean draws. This is actually-- I've had this question a couple of times, so I'm going to clarify it here.

Traditionally, we sort of think of it as the first pass, you bind a whole set of textures, and you render to those textures as your output. And then you would unbind them and bind a whole new set of textures in the second pass and sample from the previous textures in that second pass.

That's kind of traditionally what we mean by passes. And I'm going to use that word a little more in the presentation, so I just wanted to make that clarification now. All right. So this is how you might-- what the CPU code setup for this might look like. And it's pretty simple. You would just declare the four attachments that you're interested in, and then attach them to the frame buffer. And enable-- tell GL that you're going to draw into those four attachments, and then just draw.

[ Transcript missing ]

So, framebuffer fetch with multiple render targets, this is what's new in iOS 7 and the A7 GPU. Combining that together gives you this ability to do deferred rendering in one pass. And what that means is now you can actually sort of read and write from the same attachments. And of course you can read and write from one and output to another. You can read from one and output to another. That's one of the things that makes this really powerful.

You're not limited to just kind of reading back from the same one. This is how you might do deferred rendering in one pass. You would use MRTs to output your four attachments for color normals, that sort of thing. Or in the same pass, you would then read back using framebuffer fetch without changing any of the attachments that you have bound, which is why I'm calling it a single pass, and then compute your lighting and output. to one of the attachments that you have bound.

[ Transcript missing ]

You'll see that we've eliminated a lot of the work that we were doing on the CPU and moved that to the GPU. And that's actually what we want. And you can see that the frame rate has gone up.

[ Transcript missing ]

All right, so if we go back to the FPS tray, we'll notice now, if you're using the A7 versus previous GPUs, that you'll get this program performance graph down below after you've made a capture. And you can see right off the bat which programs are using the most amount of time in your game. Which shaders, right?

So this particular shader, program number 12, is using 22.29 milliseconds of the 24.5 milliseconds of the A7. So this particular shader is using 22.29 milliseconds of the 24.5 milliseconds of the 24.5 milliseconds of the A7. So this particular shader is using 22.29 milliseconds of the 24.5 milliseconds of the 24.5 milliseconds of the A7.

will actually get line-by-line performance information in the shader. And you'll see how much performance, how much time each one of those lines of code are taking. So, okay, great. Well, this first line, where I'm using instance ID to look up in a uniform buffer object, is taking 28.1% of the overall 20.52 milliseconds in my code. Well, I actually, that's work I want to be doing, so that's fine.

I want to actually put more of that, I want more of the percentage to go into that particular line of code than some of the other ones, right? So, okay, let's take a look and see what else we can optimize. Well, you know, this line, this for loop here is taking 11.3% to what? Well, hard code a bunch of values into a model view matrix, and it looks like four times.

Well, you can take a look at the, if we take a look at the shader best practices guide on developer.apple.com, you'll see that actually using a for loop to pre-calculate a value to then, you know, kind of set that up is kind of, it's slow. So something we can do actually is just unroll this loop and hard code these values. So I'm going to go ahead and copy this line of code and delete the for loop. I'm going to go ahead and unroll this loop. and just paste the values here.

And then, when I've made the change to my shader, I can then click on this kind of circular arrow here right below the shader, and it will actually recompile -- ah, it looks like I have an error. So it's smart enough to tell me that I have an error. Okay.

Yes, because I changed the wrong -- Let's go back and do that again. All right. So I'll go ahead and recompile it again. And we'll notice that when it's done compiling, that amount of work that we were doing there is gone and it's moved, and the percentage that we're -- where we want to be doing more work, and that's where we're spending our time.

And of course, the overall time of the shader has decreased. It's now running at You know, 17.56 milliseconds, where it was, you know, something like 20 milliseconds before. Well, that's great. Well, what if I want to actually see that in my game and see what the performance looks like in my game?

Well, if you click this arrow down here, where the camera button was, you can get, you know, this arrow. It will continue--it will rerun the game that you were running, but it will inject the new shader in your game. And you can see that the frame rate has gone up.

So that's one way you can use the new shader profiler to actually kind of tune your shaders. And once again, this only works on the A7. So this is another reason to get one of these devices and play with it. And, you know, you can optimize code for previous devices as well using this, but you have to have the A7 to do this kind of shader profiling. So I highly encourage you guys to take advantage of the tools and play around with some of this stuff.

We talked about the A7 GPU and OpenGL ES 3.0 and how you can use the new technologies together. And we talked about how you can use Xcode's OpenGL ES frame debugger and the new shader profiler to take advantage of how you can tune your games with this stuff. So I hope you guys take advantage of this new technology and go out and make some awesome OpenGL games. And thank you for your time.

I'm Philip Iliasqueux. I'm the new graphics and media evangelist. Just started in November on this team. And Apple Developer Forums is, you know, you can post any questions there, and we'll try to get back to you as much as you can -- as much as we can. And take a look at the documentation for OpenGL.

And this is the video I promised to give you the link for earlier, migrating to the OpenGL core profile. And I'll help you get into GLSL 300 by looking at how we did that on the desktop. So here's the link. You should write it down if you -- you know, if you're interested in taking a look at that. So, great. Thank you very much.