Graphics, Media, and Games • iOS • 47:01
OpenGL ES enables your app to push the performance envelope of iOS devices and create an experience perfectly tailored for the platform. Dive straight into specific recommendations to fine-tune the interaction between your usage of OpenGL ES and iOS built-in behaviors for multitasking, display orientation, and more. Learn critical design principles that will make your OpenGL ES apps feel snappier, look better, and have the flexibility to take full advantage of iPhone 4 and iPad 2.
Speaker: Richard Schreyer
Transcript
Hello. Welcome. This is Best Practices for OpenGL ES on iOS. My name is Richard Schreyer. I work in Apple's GPU Software group, and I'd like to talk to you about some best practices. What I really mean by that are some of the ways to take code written for the cross-platform OpenGL standard, integrate it with the iOS platform, and build a really high-quality application.
I'm going to cover four main topics today. First up are going to be view controllers and what using a view controller to manage your OpenGL view gets you. Second subject is going to be multithreading. OpenGL has a lot of somewhat hairy thread safety rules, and I want to talk to you about what you might stand to gain from multithreading, and if so, how to do it successfully.
Third, I want to talk about how to handle the range of screen sizes we have on our various iOS products today and how to handle those as easily as possible. And then finally, we'll end things off by talking about how to take advantage of the performance of iPad 2. So, to start things off with view controllers.
If you haven't done any significant amount of UIKit programming before, a view controller is the object in a UIKit application that plays the controller role in the model-view-controller design pattern. It's responsible for creating and managing either one view or an entire view hierarchy that builds one scene of an application.
This is a screenshot of Interface Builder's new storyboard feature with a sample game, in this case our Touch Fighter game. Storyboarding lets you lay out the overall high-level flow of your application: which scenes the user sees, and how the user progresses through those scenes. In this case, each one of these scenes is managed by a separate view controller. So if you want to start taking advantage of these features, you'll need to start by having a view controller for all of your views, both your 2D views and OpenGL.
Also consider the case where you want to integrate Game Center into your application. Game Center generates a very complex hierarchy of views on your behalf: achievements, leaderboards, and the like. And it packages all of that up for you, again in the form of a view controller.
And so if you want to transition to a Game Center view controller, you need a view controller to transition from. And that's what a lot of OpenGL applications are missing today. So I'd like to talk about how to integrate a view controller into your OpenGL application and then use it successfully.
So what is a view controller responsible for? As I said, it creates and manages a collection of views. It also controls transitioning from one view controller to the next, which is really one of the big things that we want to enable. And then finally, I'll add a couple of OpenGL-specific considerations here, one of which is deciding when you're going to animate as your scene leaves the user's view and comes back in. The second is how to handle rotating your device. Visibility is actually pretty straightforward.
Your view controller subclass provides a set of methods that you can override, which let you know when your view hierarchy becomes visible or invisible. As the user transitions away from your OpenGL view into another view, it makes no sense to animate something that the user cannot see. So we want to start and stop that animation at the appropriate times. A view controller provides these really handy callouts, viewWillAppear: and viewDidDisappear:, which give you a place where you can choose to start and stop that animation as appropriate.
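As a rough sketch of that pattern (not taken from the session's slides), assuming a hypothetical MyGLView that exposes its own startAnimation and stopAnimation helpers, the overrides might look like this:

```objc
// Hypothetical view controller that starts and stops its OpenGL view's
// animation as the view becomes visible and invisible.
@interface MyGLViewController : UIViewController
@end

@implementation MyGLViewController

- (void)viewWillAppear:(BOOL)animated
{
    [super viewWillAppear:animated];
    [(MyGLView *)self.view startAnimation];   // assumed helper: begin the display link
}

- (void)viewDidDisappear:(BOOL)animated
{
    [super viewDidDisappear:animated];
    [(MyGLView *)self.view stopAnimation];    // assumed helper: stop drawing while hidden
}

@end
```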
In GLKit, we've added a new default OpenGL view controller that you can use to do exactly this. GLKViewController abstracts away setting up a display link for you, which produces regular animation events that are tied to the display's refresh. GLKViewController determines an appropriate frame rate, taking into account both your desired frame rate and the capabilities of the display, and it delivers these animation events regularly on the main thread. It also overrides the methods I just named to automatically start and stop the animation as appropriate.
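A minimal sketch of adopting it, assuming the view loaded from your storyboard or nib is a GLKView:

```objc
@interface MyGameViewController : GLKViewController
@end

@implementation MyGameViewController

- (void)viewDidLoad
{
    [super viewDidLoad];
    self.preferredFramesPerSecond = 60;   // GLKViewController reconciles this with the display
    GLKView *view = (GLKView *)self.view;
    view.context = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
}

// Delivered once per frame on the main thread while the view is visible;
// GLKViewController pauses these callouts automatically when it is not.
- (void)glkView:(GLKView *)view drawInRect:(CGRect)rect
{
    glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    // ... draw the frame here ...
}

@end
```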
So, if you're one of the actually very, very many OpenGL games that really just look like one big 3D view that the entire user experience lives in, and you want your view to be a little bit less lonely, this is a really great place to start: drop one of these into your application. The other aspect of view controllers is rotation, and this is particularly interesting because in the past we've suggested not to use it, and I'm going to change that advice today.
How many of you have ever seen some game that's done something like this to you, where various aspects of the UI just don't agree on which way is up? I've seen it in a number of places. And if this application had properly used view controller rotation, this wouldn't have happened.
So the UIViewController class also provides iOS support for handling device orientation and making sure every part of the UI agrees on what that orientation is. In the past, I've suggested not using this feature because it had a fairly large performance impact. But I'm very, very happy to say that as of iOS 4.3, that performance impact is completely gone. View controller rotation is free on iOS 4.3, and so it is now, without qualification, the best way to rotate your OpenGL views along with every other view on iOS.
In the past, we've suggested as an alternative that you do the rotation yourself within your view rather than rotating the view. Generally, that's a little bit difficult to set up, it's kind of annoying, and it's actually kind of error-prone, as we just saw. And so you get to remove all of this.
So I want to show a little bit about how to actually rotate your OpenGL view with view controllers. Step one is not OpenGL-specific in any way. In your standard view controller subclass, you're going to override shouldAutorotateToInterfaceOrientation:. In my example, I'm going to support every orientation; many games may just say landscape left and landscape right.
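For example, a sketch of that override:

```objc
// Step one: let UIKit know which orientations this view controller supports.
- (BOOL)shouldAutorotateToInterfaceOrientation:(UIInterfaceOrientation)interfaceOrientation
{
    // Support every orientation; a landscape-only game could instead return
    // UIInterfaceOrientationIsLandscape(interfaceOrientation).
    return YES;
}
```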
The second step is specific to OpenGL. Once you have your view, you have to create a framebuffer that lets OpenGL render into that view, and that framebuffer's dimensions are set when you create it. So if the bounds of the view change, your framebuffer is not going to change automatically behind your back. You need to delete your old framebuffer and create a new one, which will then pick up the new dimensions.
Usually the easiest place to do this is in your OpenGL view subclass, where UIView provides a layoutSubviews method that you can override. This is called whenever the bounds of the view change. If you use GLKit's new GLKView, it does this for you automatically.
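A sketch of that override, assuming hypothetical deleteFramebuffer and createFramebuffer helpers on the view:

```objc
// Step two: rebuild the framebuffer whenever the view's bounds change.
- (void)layoutSubviews
{
    [super layoutSubviews];
    [self deleteFramebuffer];   // assumed helper: glDeleteFramebuffers / renderbuffers
    [self createFramebuffer];   // assumed helper: allocates storage at the new size
}
```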
Step 3 is the best part. This is when we can remove a whole bunch of, basically, hacks. You don't have to swap the width and height arguments to all of your OpenGL APIs. You don't have to mangle your transform matrices. You don't have to manually correct all of your incoming touch coordinates. The view controller transform mechanism handles all of this for you automatically. And then finally, this fourth step isn't actually writing any code. This is a verification step.
Core Animation has some very carefully written optimizations for getting OpenGL content onto the display as fast as possible. But unfortunately, there are some things you can do that will defeat those optimizations. So what you can do is launch Instruments, go to the Core Animation tool, and in the lower left corner you'll see a checkbox called Color OpenGL Fast Path Blue. And it does exactly what it sounds like: for every OpenGL view in the system that is being composited via Core Animation's most efficient path possible, it puts a big blue tint over that view, in which case you know you're done.
If you don't see your view turn blue, then you've done something to defeat these optimizations. For example, your OpenGL view may be transparent. Our written documentation, which I'll link to later, has quite a bit more detail on which kinds of things enable the really efficient compositing path and which force fallbacks to slower, but more complete, compositing paths.
So that's view controllers. All views in an iOS application should be managed by view controllers, and OpenGL views are no exception to that. If you need to add one to your existing application, GLKit provides a really good one to start with in GLKViewController, which gives you automatic animation timing on the main thread.
View controller rotation is now completely free on iOS 4.3 and later, which erases all previous advice to the contrary. And then finally, Instruments provides a really handy tool to verify that your compositing is being done as efficiently as possible. So that's view controllers. And that brings us to the second and largest subject of the day, and that is multithreading.
So I'm going to start by proposing a couple different design patterns for multithreading an OpenGL application. The first of these is asynchronous resource loading. You know, if you have an application where you've decided that being able to, say, stream textures in dynamically without interrupting rendering is desirable, then this is the kind of design pattern you should be looking into.
And the second design pattern I'll talk about is to actually pick up your entire game and move it off the main thread. I'll get into how to identify whether that may benefit you, and if so, a little bit about how to do it. But to start with, I'm going to go through a few preliminary concepts and, in the end, put them all together to show you a functioning asynchronous texture loader just like the one GLKTextureLoader provides for you.
So the first preliminary concept is the current context. If you have multiple contexts in your application, the system needs to know which one is going to receive your OpenGL commands, since GL calls don't take an explicit context argument. This is a per-thread variable that you can set with the setCurrentContext: class method.
It's very important for what follows to note that the current context is per-thread state. This means that you can set a different current context on two different threads and call into two different contexts concurrently. Similarly, it is possible to set the same context current on two different threads, but doing so is actually quite dangerous, because OpenGL contexts are not thread-safe.
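For example, a tiny sketch of two threads each working with their own current context (renderContext and loaderContext are assumed to exist):

```objc
// On the rendering thread:
[EAGLContext setCurrentContext:renderContext];
// ... GL calls made here are routed to renderContext ...

// Meanwhile, on a loading thread, a different context can safely be current:
[EAGLContext setCurrentContext:loaderContext];
// ... GL calls made here are routed to loaderContext ...
```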
In this case, our first naive attempt at the texture loader sets the same context current on two threads and calls glTexImage2D concurrently with ongoing rendering on the rendering thread. If you're lucky, this will crash. If you're unlucky, it'll be a lot harder to figure out what's happening. So if we want that glTexImage2D to run without blocking rendering, and without a whole bunch of synchronization, then we need to move our texture loader to a different thread. So let's move the rendering thread over to context B.
The second concept is shared contexts. By default, OpenGL contexts each have their own completely separate pools of objects. One context has its own collection of textures, which are completely different from another context's pool of textures. That doesn't work for us here, because we want to create a texture on one context and then use it on another. The solution to that is shared contexts: you can create two or more contexts that all use the same pool of objects in the same namespace.
In this case, we have two contexts that are shared by referencing the same sharegroup, and we still have a third context on the side that's in its own separate silo. Setting this up is really, really easy. We'll allocate our first context as usual with EAGLContext alloc and init.
This also implicitly creates a new sharegroup for that context behind the scenes, which normally you don't have to worry about. When creating our second context, we'll use an alternative version of the initializer, which takes an additional sharegroup argument, and we'll pull that value out of the first context. With this, we've now created two contexts that both share the same pool of objects.
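In code, assuming OpenGL ES 2.0, that setup looks roughly like this:

```objc
// First context; this implicitly creates a new sharegroup behind the scenes.
EAGLContext *renderContext =
    [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];

// Second context, created into the first context's sharegroup, so both
// contexts see the same pool of textures, buffers, and other objects.
EAGLContext *loaderContext =
    [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2
                          sharegroup:renderContext.sharegroup];
```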
Now, this actually has some implications on thread safety. Before, when I said you could call into two separate contexts, that was on the assumption that those two contexts were completely separate entities. But now that we've shared them, they're intertwined again, and so there are thread-safety implications. The first is that two or more contexts can concurrently read from the same objects at the same time; there's no problem with that.
But you cannot modify an OpenGL object in parallel with any other context reading from it. So if you want to safely modify objects, first you have to quiesce all usage of those objects on any other contexts, then call your modification command, in this case glTexImage2D, glBufferSubData, and so on.
And then after modifying that object, you have to take a couple more steps, because that modification command may still be sitting in a command queue somewhere. It hasn't actually been pushed all the way to the GPU yet, which means it's not really available for anybody else to see. So the first thing to do is have your modifying context call glFlush, which guarantees that all of those commands have at least been processed enough to do this dependency tracking.
After the flush returns, all of the rest of the contexts that want to successfully observe this change must bind that object. Even if it was already bound, they have to bind it again. Otherwise, they may just not see the changes because the context may be caching information about its currently bound objects.
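Put together, the flush-and-bind rule looks roughly like this sketch, where textureName is a texture shared by both contexts and width, height, and pixels stand in for real image data:

```objc
// On the modifying context (the loader):
[EAGLContext setCurrentContext:loaderContext];
glBindTexture(GL_TEXTURE_2D, textureName);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);
glFlush();   // step one: make the modification visible to other contexts

// Later, on every context that wants to observe the change (the renderer):
[EAGLContext setCurrentContext:renderContext];
glBindTexture(GL_TEXTURE_2D, textureName);   // step two: re-bind, even if already bound
```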
A little bit more about which OpenGL commands modify objects: the cases I have here are fairly obvious, but there are some other cases that are not obvious and might catch you by surprise. For example, a uniform is a property of a GLSL program, so if you set a uniform, that effectively counts as modifying the entire program and subjects the entire program to this flush-and-bind rule.
Similarly, rendering to a framebuffer counts as modifying the framebuffer, and it counts as modifying all of that framebuffer's attachments: its color buffer, its depth buffer, and its stencil buffer. The implication here is that you cannot have two contexts concurrently issuing commands to draw into the same texture, however tempting that may be to try. And then finally, I'll call out the EAGLContext renderbuffer management methods, since these are a little bit unusual in that they're quite specific to iOS.
As far as thread safety goes, these methods are exactly the same as if you called a normal GL function on the named context, but they are both considered to modify the named renderbuffer. The first one, renderbufferStorage:fromDrawable:, actually allocates our renderbuffer's storage, and the second one, presentRenderbuffer:, swaps its current contents to the display.
So with all of these put together, we can now create our correct example of an asynchronous texture loader. We have our two contexts. On our loading thread, we'll set the loader context to be current, create our texture as usual, and call glFlush. That completes its job. After the flush returns, the rendering thread, on its own context, can rebind that texture and then draw from it as usual. And this is completely correct.
However, it can actually be made a little bit better still, because I've really left out all the code around this that you didn't want to write in the first place: creating the thread, coming up with some mechanism for cross-thread messaging and marshalling of arguments, using semaphores or condition variables to tell the scheduler that the thread is now runnable. This is a fairly standard multithreaded producer-consumer problem, but it's still a lot of busywork to write, and there is a better way. So I want to take this same example and restate it in the form of some code using Grand Central Dispatch, which makes all of this quite a bit simpler, even though some of you may not have seen the syntax before. I want to step through this in pieces. Right in the middle, we have our plain old OpenGL texture creation code, followed by the flush for the first half of our flush-and-bind rule.
This code is executing within a block that is called by dispatch_async. If you haven't seen Grand Central Dispatch before, dispatch_async will schedule a task to run on my loader queue, which will run the code inside of that block, and it will do that on whatever thread Grand Central Dispatch chooses based on its knowledge of the current load of the system.
Similarly, when the texture load is complete, we're going to call dispatch_async a second time and message back to some other queue, in this case the queue that protects our rendering context, and let it know that, hey, this texture load is done and the texture is now available for use.
And one of the coolest things here is that my printf is actually using the texture name variable, despite the fact that my render queue code is going to happen on some other thread at some later point in time. This is because Grand Central Dispatch and blocks have captured the named variables on the stack and automatically marshalled them along for me, so I don't have to do any of that myself. It cleans up a whole lot of, quite frankly, quite boring code.
And so I'll end this by highlighting I have some explicit calls to set current context here. And this is actually here for a very good reason. Because Grand Central Dispatch itself selects which thread it's going to run the task on, and it's going to be a thread that it owns, and the current context is per thread state, it means that you can never know which thread you're actually going to run on, and you can never rely on the current context when you enter a dispatch block. And so, when using OpenGL and Grand Central Dispatch together, you should always explicitly set the current context at the beginning of your block.
Similarly, we don't want to leave a dangling reference to our context on a thread that we don't own. And so we're going to clear our current context before leaving. So with that, we now have an asynchronous texture loader that can let your main thread run unimpeded and does so with actually a very small amount of code.
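Here's a minimal sketch of that pattern, assuming loaderQueue and renderQueue are serial dispatch queues, loaderContext and renderContext share a sharegroup, and LoadImagePixels(), width, height, and path are hypothetical placeholders for real image-decoding code:

```objc
dispatch_async(loaderQueue, ^{
    // GCD picks the thread, so always set the current context explicitly.
    [EAGLContext setCurrentContext:loaderContext];

    GLuint name;
    glGenTextures(1, &name);
    glBindTexture(GL_TEXTURE_2D, name);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, LoadImagePixels(path));
    glFlush();                           // first half of the flush-and-bind rule

    [EAGLContext setCurrentContext:nil]; // don't leave a dangling context on a GCD thread

    dispatch_async(renderQueue, ^{
        // The block captured 'name' for us and marshalled it to this queue.
        [EAGLContext setCurrentContext:renderContext];
        glBindTexture(GL_TEXTURE_2D, name);   // second half: re-bind to see the new contents
        printf("texture %u is ready\n", name);
    });
});
```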
And if you had the ability to look inside GLKit's new texture loader, this is pretty much exactly what it does internally. So if you need asynchronous texture loading, you can either do it in just a couple lines of code with GLKit if you have more common needs, or if you need to do something really exotic, you know, reimplementing it yourself is actually quite possible.
So with that, I'm going to switch subjects a little bit and talk about the other usage case, where you actually want to pick up your entire game or your rendering loop and move it all off of the main thread and onto a background thread. Why would you do that? So one big thing that matters to quite a few game developers, especially game developers porting code from another platform, is control over the event loop.
On an iOS application's main thread, you have to return control to the NSRunLoop quite frequently, because that's the run loop that delivers pretty much all of the events that come from iOS into your application. So if you've got a game that really wants to sit in a while loop and spin forever, that's not going to work very well. Some developers have tried to restructure their code to fit within the model of getting regular event callouts from that run loop via timers or display links and the like,
and a number of others have also had quite a bit of success just picking their code up and moving it off to another thread, which they control. And what you get out of that is that's your thread, and you can build whatever event loop you want on it. Another really interesting reason to do this is to avoid blocking on the main thread.
Normally we consider blocking something that happens when you do file system or network I/O, and you block your main thread for tens of seconds at a time if something goes wrong. OpenGL doesn't really block anything for tens of seconds, but it certainly can block your main thread for tens of milliseconds. For most applications, that really doesn't matter, but there are a few we've seen that really, really demand very low latency response to things like touch events, where this actually did become a bit of a problem.
And so by getting that work off of the main thread and leaving the main thread's event loop pretty much uncontended, they had some success in addressing that problem. It's not a problem that many game developers face, but it was an important issue for some. And then finally, especially on iPad 2, utilizing multiple CPU cores.
Picking up your whole game and moving it to one background thread doesn't actually use multiple CPU cores quite yet, but it is one really big step in that direction in setting your application up to do that. So I'm going to include it here because it's sort of a preliminary to that.
So if you've decided this is something that might benefit you, then how do you go about doing it? Well, the first thing is, way back on your main thread, you still should have your view managed by a view controller subclass, but you should use your own subclass and not GLKViewController.
Because as I said before, GLKViewController is all about a really easy way to get regular animation events on that main thread. And once you're not there anymore, it's not doing much for you. So that means that you are going to have to create your own UIViewController subclass, override the visibility methods, and signal your background thread as appropriate when it's time to start and stop animation as appropriate.
Another thing is, you do want to pick one thread to do all of your OpenGL work on, and be very careful to make sure that you haven't accidentally left anything behind. Consider this very easy trap to fall into, with the very same layoutSubviews we talked about earlier. When, say, the device rotates, UIKit is going to call that on the main thread for you, while your rendering loop is off on its background thread, and that kind of accidental cross-thread GL usage can be very difficult to track down. So whenever you see this, or any other case like it, usually the thing to do is either signal the background thread to do the work, or just leave a flag set on the main thread and let your background rendering loop pick it up on its next trip through the animation loop. In that case, we'll see the flag, we'll reallocate the framebuffers, and we'll clear the flag.
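A sketch of that flag-based approach, assuming a _framebufferNeedsRebuild flag shared with the rendering thread and the same hypothetical framebuffer helpers as before:

```objc
// Called by UIKit on the main thread, for example when the device rotates.
- (void)layoutSubviews
{
    // Don't touch OpenGL here; just leave a note for the rendering thread.
    _framebufferNeedsRebuild = YES;
}

// On the background rendering thread, once per trip through the animation loop:
if (_framebufferNeedsRebuild) {
    _framebufferNeedsRebuild = NO;
    [self deleteFramebuffer];
    [self createFramebuffer];   // ends up in renderbufferStorage:fromDrawable:, safe here
}
```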
In turn, our createFramebuffer will end up calling the EAGLContext renderbufferStorage:fromDrawable: API, and that API is perfectly safe to call from a background thread. Today I've mentioned both using threads at some points and using Grand Central Dispatch at other points. Which one of these should you use? Well, if you're really just looking to control your own event loop and you're not going to do any more aggressive multithreading beyond that, then creating one thread that you own and sticking your while loop in it is probably perfectly adequate. There's not much that Grand Central Dispatch adds for you in such a usage case. So in this case, create your one thread, move your code there, and you're pretty much done.
However, if you're going to be a little bit more ambitious and you really want to take advantage of iPad 2's dual-core processor, then you're going to say, "I'm going to divide my physics work up into multiple threads. I'm going to do skinning, my game logic, AI, and so on." When you start having these more complex graphs of work between tasks, that's when Grand Central Dispatch starts to be a really, really big convenience, and you should seriously consider leveraging it as much as possible.
That being said, however widely you spread your own code across tasks, I still very strongly suggest that you keep all of your OpenGL API usage in just one thread or just one task. As we've seen, OpenGL's thread safety rules are kind of tricky to get right, and usually the best answer is to just not tempt that: multithread everything else around it and have it all feed into one final rendering task.
For example, you could have your game update for the next frame happening in parallel with OpenGL for the previous frame, but you should avoid doing OpenGL in parallel with OpenGL. There are two other sessions on Grand Central Dispatch, if you haven't heard all about it already. The first of these happened this morning and I believe was completely sold out, called Blocks and Grand Central Dispatch in Practice, so I highly suggest you catch that on video. And there'll be another session happening tomorrow morning called Mastering Grand Central Dispatch, which is going to be a bit more of an advanced course. Both of these are fantastic sessions to attend if multithreading interests you at all.
So in summary, OpenGL is not limited to the main thread in any way. You can use OpenGL from any thread you want, but I do suggest that you keep it to one thread. I will draw an exception for the asynchronous texture loading case, because that design pattern is well enough contained that it's pretty easy to get right and make efficient. But other than that, keep your actual active animation loop all in one place. And that's multithreading. That brings us to our third subject of the day, and that is screen sizes.
Across the range of iOS devices today, you will see three screen sizes that you want to support in your application. On iPhone 3GS, you'll see 320 by 480. On iPhone 4, you'll see the Retina display at 960 by 640. And finally, on iPad, you'll see something a little bit larger at 1024 by 768.
To an OpenGL developer, the biggest implication this has on you is performance. The larger your frame buffer is, the more times your fragment shader has to run, the longer it takes to fill. In this case, note that iPhone 4 has to fill four times as many pixels as the iPhone 3GS.
And while its GPU is faster than the iPhone 3GS's, I can't tell you that it is four times as fast. So in this case, if your application is fragment shader bound, it won't be too surprising to see that your application might actually perform slower on an iPhone 4 than a 3GS. And in turn, iPad is, again, about 30% larger still.
So what to do about this? Well, the first thing to do is go through the standard set of optimizations: make your fragment shaders as simple as possible, try to use smaller texture formats, especially on iPad 1, consider whether texture lookup tables are preferable to doing math in the shader, and so on. This is all advice that you'll find in the written documentation and that we've discussed in more detail in years past.
You want to strongly consider your fragment shader coverage. With every layer of blending you do, you're filling that pixel multiple times. And if you've got four times the screen coverage and you're filling every pixel 4, 10, 25 times, it starts to multiply up very quickly. The architecture is a unified shader architecture, so if you're drawing a huge number of vertices, that very well can sap performance away from fragment shading. The OpenGL ES Driver instrument in Instruments can give you a feel for the utilization of the vertex processor and let you know if this might be a factor. But these are your standard optimizations, and they're good on all devices everywhere.
If you find that this just is not enough to get you to your performance targets, there's one really big tool you still have available to you, and that is to directly attack what changed to slow your application down in the first place. And that is to back off a little bit on the frame buffer width and height. Now, take the example of iPhone 4's Retina display.
If we were to fill 720 by 480 pixels instead, you're only covering 56% as many pixels as before. If your application was fragment shader bound, it is not unreasonable to expect that your application will actually be nearly twice as fast by making this change. That's a really, really big tool available to you to improve performance. Now, we don't want to leave big black bars on an iPhone screen. So, fortunately, Core Animation has some really easy support for taking your lower resolution content and scaling it back up to fill the entire view. And that takes just one line of code.
That is UIView's contentScaleFactor property, which effectively determines how much detail every view in the system is drawn with, and this applies to OpenGL views as much as it applies to every other view. If your content scale factor is less than the screen's native scale, then you're going to be rendering at a lower resolution, and Core Animation will transparently scale your content back up to fill the bounds of the original view.
As we saw with rotation, your framebuffer size is fixed, so when you change your content scale factor, you have to reallocate your framebuffers to pick up the new dimensions, as before. This support is very, very efficient on all of the larger-screen devices, iPhone 4 and iPad. The API works as described on the smaller-screen devices, but it's neither efficient there nor is there really any reason to use it there. But on the large-screen devices, this can be a lifesaver.
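For example, a sketch of turning the resolution down on a Retina display, where glView stands in for your OpenGL-backed view:

```objc
// Render a 720x480-class backing store on iPhone 4 instead of the full 960x640.
if ([UIScreen mainScreen].scale == 2.0) {
    glView.contentScaleFactor = 1.5;
}
// The framebuffer won't resize itself: delete and recreate it so the
// renderbuffer storage picks up the new, smaller dimensions.
```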
Then come external displays, where there is more variety. On iPhone 4 and iPad 1, you'll often see 480p or 720p displays. iPad 2 can output 1080p. Now, 720p is about 30% larger still than the iPad's display, and 1080p is twice as large again. It adds up very, very quickly.
So what to do about this? Well, the first thing is, if your application is using mirroring, or, more accurately, if you've done nothing, then you still get to do nothing. Your application is rendering at its internal display resolution, and you're done. The user will see the mirrored content. However, if you're going to drive the external display into second display mode because you want to put different content on the internal display and on the external display, then the performance implications here start to matter to you.
So my suggestion here is when developing your application, based on your performance targets and your performance measurements, you want to pick sort of your upper bound on how big of an external display you are going to support. You know, usually that'll be 720p because that's closest to the iPad's display size. However, if you have a really aggressive application, maybe something smaller is more appropriate.
At runtime, you can query the available modes on the display with UIScreen's availableModes property. You'll want to enumerate through this array, find the best match for your development-time target, and assign that match to the screen's currentMode property. This will cause a mode switch on the display and, for example, take it down from 1080p to 720p.
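A sketch of that query, assuming externalScreen is a connected screen from [UIScreen screens] and a 720p upper bound:

```objc
UIScreenMode *best = nil;
for (UIScreenMode *mode in externalScreen.availableModes) {
    CGSize size = mode.size;
    if (size.width <= 1280 && size.height <= 720) {
        // Keep the largest mode that still fits within the 720p budget.
        if (best == nil || size.width > best.size.width) {
            best = mode;
        }
    }
}
if (best != nil) {
    externalScreen.currentMode = best;   // triggers the mode switch on the display
}
```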
On an external display, you also have the option of using the content scale factor solution, although it tends to be a little bit less efficient than just setting the display mode directly. So, when given the option, set the display mode first and then rely on content scale factor from there if necessary. There is actually an entire session on handling AirPlay and external displays, which happened yesterday, so you'll have to catch that on video. But it goes into all of these topics in quite a bit more detail.
The other side of supporting different display sizes is actually the size of your input textures. So I'm going to take all of your input textures and subdivide them into two coarse buckets. The first of those are your user interface textures: your buttons, your HUDs, anything you're drawing to a screen-aligned quad.
These are the kind of things that tend to be quite specific to a particular DPI and a particular aspect ratio; it matters whether it's iPad versus iPhone. For these, much like images in normal UIKit applications, you probably need to draw a different version of that texture for each one of the screen-size targets.
At runtime, when it comes time to load them, you can do something like what UIImage does, which provides a built-in naming convention: you just ask for an image by name, and based on the file names, the right version of that image is automatically picked depending on what device you're running on. Or, even easier, you can use UIImage itself to load your images and not have to worry about it at all.
Then comes the other bucket of textures, and those are the textures that don't have a DPI in any way, because they're the mipmapped textures that you're wrapping across a 3D object. For these, you really don't want to generate three different versions of your textures offline; because there's no DPI, there's no reason to do that. Instead, you should generate the full set of mip levels offline, one mipmap stack for all devices, and ship all of those images in your application bundle.
At runtime, depending on whether you're on a smaller-screen device or a device that has somewhat less memory, you can simply choose to, for example, skip loading the most detailed level. You may want to only do this for some textures and not others; it's an artistic decision depending on what's actually important to the look of your application.
Oftentimes, you can get away with this without any impact on the visual quality of the app. I've put together this example where we have the same scene drawn at 3GS resolution and at iPad screen resolution. Each mip level has been given a different color tint, with the reds being the more detailed levels and the greens and yellows being the less detailed levels.
You know, as you can see, for exactly the same scene, the 3GS screen really tops out at the 512 by 512, this sort of purplish level, whereas on iPad, you can see quite a bit more detail. On the 3GS, you can barely see the small sliver on the column where the largest texture level was used.
If that texture level just did not exist on the 3GS, it would have had very, very little impact on the user-perceived quality. And so this could be a really easy way to both significantly reduce memory usage and also to reduce loading time on the devices that were probably taking longer to load to begin with.
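A sketch of skipping the top level at load time; levels here is a hypothetical array of pre-baked mip images, largest first, and skipTopLevel would be decided per device and per texture:

```objc
// If we skip the most detailed image, the next one down becomes level 0
// and the rest of the chain is renumbered to follow it.
int first = skipTopLevel ? 1 : 0;
for (int i = first; i < levelCount; i++) {
    glTexImage2D(GL_TEXTURE_2D, i - first, GL_RGBA,
                 levels[i].width, levels[i].height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, levels[i].pixels);
}
```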
So that's screen sizes: there are three internal screen sizes to support, and the larger ones are more expensive for the GPU to fill. If necessary, you can use contentScaleFactor to reduce the resolution; whether that's an appropriate solution for your application really depends on the exact content you're drawing. Some applications have been more successful than others with that.
If you're supporting an external display, then prefer setting the display mode before turning down content scale factor. That will generally be a little bit more efficient. And then finally, bake all your levels offline, and at runtime, then it's easy to just skip one to save memory and loading time. So that is screen sizes. So that brings us to our last topic of the day and the most fun. And that is what to do with an iPad 2.
So as you heard Gokhan describe earlier today, we've exposed a number of new features on iPad 2 in iOS 5. We've improved support for float16 rendering and texturing. We've added support for binary occlusion queries. We've added a GLSL extension that does filtered shadow map sampling for you. And we've also increased the maximum texture size.
At runtime, if you want to check whether these features are available, then, as always, the one true way to do that is with OpenGL's extension string. All of these, except for the 4K textures, have a specific extension name to check for, and for the 4K textures there's the classic glGetIntegerv query of GL_MAX_TEXTURE_SIZE. Separate from features, and probably much more interesting, is performance.
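A sketch of those runtime checks; the occlusion query extension name here is given as an example of the pattern:

```objc
// Check the extension string for a specific feature...
const char *extensions = (const char *)glGetString(GL_EXTENSIONS);
BOOL hasOcclusionQuery =
    (extensions != NULL &&
     strstr(extensions, "GL_EXT_occlusion_query_boolean") != NULL);

// ...and query the maximum texture size directly.
GLint maxTextureSize = 0;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxTextureSize);   // 4096 on iPad 2
```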
iPad 2 is really fast. It is hugely faster than any previous product of ours. In this case, and it's not just one area of the device that's faster, it's all faster all around. The number of shader instructions per second you can execute, the number of texture samples, the memory bandwidth, the vertexes per second, triangle setup rate, every part of it is hugely faster.
Within your shaders, the performance penalty you pay going from low precision to medium and high precision is much, much reduced. There's not much you pay there at all anymore. This opens up a whole range of new shader algorithms that previously may have been difficult because they really demanded the higher precisions.
The ratio of shader instructions to texture samples has swung dramatically in favor of shader instructions. So where before some people found success replacing math in their shaders with lookup tables baked into textures, on iPad 2 it may actually be more efficient to go back to just doing the math again.
So how do you tell if this performance is available to you? Well, unfortunately, there is no direct query for OpenGL performance, but we've got a pretty good proxy in the OpenGL renderer string. In this case, this returns a string that names the model of GPU that you are currently using.
On all of the devices that iOS 5 supports, you'll see one of two values from that: PowerVR SGX535, which is the GPU used on iPad 1 and previous. And on iPad 2, you will see PowerVR SGX543. So if you see this value, that's your best indication that you have this huge performance potential available to you.
I do want to make one caution in that this is returning a string, and, without warning, someday there will be a third or fourth or fifth value there. So please do code defensively here and do something reasonable when you see an unexpected value. In this case, something reasonable is probably going to be to default to your highest-quality rendering path.
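A sketch of that defensive check:

```objc
const char *renderer = (const char *)glGetString(GL_RENDERER);

// Only scale back when we positively identify the older GPU family; anything
// unrecognized (including future GPUs) gets the highest-quality path.
BOOL useScaledBackPath = (renderer != NULL && strstr(renderer, "535") != NULL);
```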
So, once you've detected this, what can you do with it? So remember what I said about content scale factor and turning that down? Forget it. Turn it all the way back up, all the way as high as it goes. It can handle it. Throw in full-scene multi-sampling while you're at it. You can do that, too. It's got more than enough performance for that.
If you had problems with level of blending and overdraw, you have much, much, much more budget for that specifically. You can either have a larger number of particles, you can have larger size particles, generally a much higher density in your particle systems. Similarly, if you're using blending for drawing trees or other kinds of foliage.
It's not just the GPU that's faster, the CPU is quite a bit faster as well. And even on a single thread, the CPU is going to get through an OpenGL draw call in much less time, which means that you can afford to do a much larger number of draw calls every second in your application. If your application is multi-threaded and moving other work off to other cores, then that just goes up from there. On the shading side, the first thing you should be looking at is to actually move to proper per-pixel lighting.
As I said, you can actually get away with using much greater precision in your shaders. In this case, actually writing shaders that do a reasonable amount of high-precision math is well within the performance budget. You can use high-precision normal maps and gloss maps to provide really high-quality per-pixel lighting. This is something that you saw Real Racing 2 use to great effect.
We've had a really interesting recent example in light probes. The Shadowgun demo that you saw used precomputed light probes to apply baked indirect lighting onto dynamic objects that moved around the scene. Parallax mapping is a very interesting technique that has actually been used to great effect in Epic Citadel to improve the perceived detail on surfaces.
You also have the performance budget now for shadow mapping. We have a new GLSL function that reads a sample from the shadow map and filters it for you. We've actually seen examples of all of these techniques, and sometimes all of them at the same time, used in real shipping applications. These are all techniques that are well within the performance budget of an iPad 2. We wanted to go a little bit farther and see what iPad 2 was really capable of, and so I want to show you what we came up with.
So you've seen a few shots of this in various instances earlier today, but I want to talk about this demo and what it's doing in a little bit more detail. This is a demo of a technique called light pre-pass rendering, which is in turn one of a class of techniques called deferred shading.
One of the big advantages of this kind of technique is that it lets you dramatically increase the number of dynamic lights in your scene. In this case, all of these little lights you see, our little fairies here, are each a light. There are 64 of them, and each one is casting per-pixel lighting on the surrounding geometry.
One of the things that this class of deferred shading techniques lets you do is draw all of your geometry, completely ignoring what lights apply to them, and then later draw all of your lights, completely ignoring what geometry they apply to, which is what lets it scale so well with a much larger number of lights.
So how is this actually put together? Well, the first thing is rendering of the shadows that are cast by the sun. Gokhan talked about this a little bit earlier today, but this is our shadow map. The second step in the light pre-pass technique is to render our G-buffer. In this case, we're rendering two images. We're rendering our color buffer, to which we're not actually writing colors, but rather the view-space normal of each pixel. On the left, we're rendering to a depth texture, recording the depth of that pixel from the camera.
So what that lets you do is that if you read the depth from this texture and you know its x, y coordinates in the texture, you can reconstruct where that point was in 3D space. In turn, that lets you find out how far that was from the light, and given that you also can find the normal from the other texture, that gives you what you need to calculate the lighting equation. And that leads to the next pass, which is to actually render the lights. So this image is actually a little bit misleading in what it's doing.
It looks like we're drawing lights on top of our little temple with trees around it, but we're actually not applying lights to the geometry at all. We're applying lights to the images, the G-buffer that we just saw. What we're actually drawing are spheres, or some very coarse approximation thereof, and for every pixel on the surface of that sphere, we're looking up into the normal texture and the depth texture and using that to calculate the contribution from this dynamic light. One sphere per dynamic light. And then we just add that into the light buffer as we go, and that leaves us with our light image.
And our final step is to composite all of that together, where we effectively blend that light image in with the geometry again, with its diffuse texture. We also sample from that shadow map that we saw in the beginning to determine the sun shadowing. Just to show you what this kind of thing lets you do, if we remove all of the fairies from the scene, you can see how much character that takes out of it, and it gives you a really good idea of what the ability to just throw a lot of lights at the problem gives you. So this is a light pre-pass renderer rendering four passes: our shadow pass, G-buffer, lights, and final composition.
It is applying 64 dynamic lights to this object and also applying one light from the sun that fills the entire scene. The sun is casting soft shadows via a 1024 by 1024 shadow map, all running at 30 frames per second on an iPad 2. So that brings us to the end of today's presentation. So a quick summary of what we went over. We talked about view controllers. Unless you want your OpenGL view to be very, very lonely, it needs to have a view controller paired with it.
We talked about some OpenGL multithreaded design patterns, including OpenGL's thread concurrency rules and some suggested design patterns for how to use them effectively. We talked about handling the range of three internal display sizes, and more external display sizes if you choose to support the second-display feature. And then finally, we went over a little bit about what you can do with the extra features and the performance that iPad 2 provides.
If you'd like more information, you can contact Alan Schaefer, our graphics and game technologies evangelist. We have a very large document called the OpenGL ES Programming Guide for iOS that is available on Apple's developer website. This has a lot of information in it that we did not cover today. We also have a section in the Apple Developer Forums specifically for OpenGL that actually gets quite a bit of traffic, both beginner, expert, and a number of people from Apple's OpenGL team contributing to answer questions there. Thank you.