WWDC11 • Session 420

Advances in OpenGL for Mac OS X Lion

Graphics, Media, and Games • OS X • 52:32

OpenGL is the foundation for accelerated graphics in Mac OS X, enabling your apps to tap into the incredible rendering power of the GPU. Explore the streamlined power of OpenGL in Lion and get all the details to take advantage of the OpenGL 3.2 Core Profile. See how your graphics code can deliver incredible visuals using OpenGL and gain specific insights to ensure maximum frame rates.

Speaker: Matt Collins

Unlisted on Apple Developer site

Downloads from Apple

HD Video (232.4 MB)

Transcript

This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.

Good morning. How's the conference been for everyone so far? Good? Today we're going to talk about advances in OpenGL for Mac OS X Lion. Hopefully you're all in the right place. If not, I won't take it personally if you leave and go to your correct session. So I assume most of you are familiar with OpenGL, but let's go over and review a bit. OpenGL is the high-performance rendering API that we offer in Mac OS X. It's direct access to the GPU. GPUs are very powerful processors primed for graphics that sit in all your machines. And if you need direct, high-performance access, OpenGL is the way to go.

It's broadly used throughout the industry for things like games, visualization, medical, all sorts of things, entertainment. And it's also the foundation of all the great visual technologies in Mac OS X. So things like Core Image, Core Animation, even Quartz Extreme and the window server itself, are built on the foundation of OpenGL. So this is a way to give you the power of the GPU and let you harness it for your applications.

So let's start off with an introduction. And you might be wondering where we're at in Lion and what the state of OpenGL is, what it looks like. So what's new? The big new thing is the OpenGL Core Profile. And this provides you with new rendering features and new support features and all kinds of new things you can use in Lion to help your application look and run great. Spoiler alert.

So the OpenGL Core Profile. All the new features we're providing in Lion require you to use the Core Profile. And this is a little different than what you might be used to, which is what we're calling the Legacy Profile. Legacy Profile is fixed function and programmable. You create the profile, you can use the fixed function pipeline, or you can write your own shaders and use that pipeline. In the Core Profile, everything is fully programmable.

The Legacy Profile is completely binary compatible with all the apps you've already written, so everything that you've written before will continue to run just as great, maybe even a little better on Lion. The Core Profile is all new, efficient, and modern. The Legacy Profile is the default, so when you create a new OpenGL app, you'll still get the profile that you've always gotten before. If you want to use the new features, you must opt in to the Core Profile. This is a choice you make. Just put a few lines of code in, and you'll get all the new features.

So new Core Profile features. The big one is GLSL 150. GLSL is the OpenGL Shading Language. And we're introducing version 150 to you guys so you can use it, which comes with a whole slew of great new features. Another new feature is uniform buffer objects. If you're familiar with other APIs, this is quite analogous to a constant buffer. And it allows you to use a buffer object as the backing for the uniforms in your shaders. Similarly, there's texture buffer objects, which is a texture backed by a buffer object. It can take advantage of texture caching and other things.

A new way to do instancing, which allows you to draw a whole bunch of batches with one draw call. And this is shader instancing. We'll go through that in a second. Snorm textures. A snorm texture is a normalized texture. Normally a value in a texture can go from 0 to whatever you'd like; a normalized texture goes from 0 to 1 or negative 1 to 1. Snorm is signed normalized; there's also unsigned normalized. So this allows you to use the full range of bits in your texture to express just the 0 to 1 range.

Multisample textures is a way to do multisample render targets. These are multisample textures to allow you to do anti-aliasing as you're rendering to off-screen buffers. And last but not least, timer query. Timer query allows you to query GPU state and do some high-performance performance counting, stuff like that.

Float and integer rendering. We have RG and multisample, like I mentioned before. Texture arrays, which are similar to 3D textures, except that you don't blend between slices along the Z axis. Conditional rendering, which allows you to render things conditionally based on the output of an occlusion query. And transform feedback, which allows you to output the results of rendering into buffer objects.

Other new features we're providing. IOSurface. IOSurface is a method to share data between processes, so you can share texture data and surface data. And automatic graphics switching. If you've bought a laptop in the last year, year and a half, you know you have two GPUs. There's one that uses less power, and then there's a super powerful one. And from then until now, you've always been forced into one choice: you create an OpenGL app, and it has to run on the high-power GPU. Well, now we're allowing you to use the integrated GPU to make your app have better battery life.

So let's jump right into using the Core Profile. What is the Core Profile? How do I use it? What does it give me? Concept of the Core Profile, new features and cleaner API. You know, this API has been around a long time, and we've cut out some of the cruft. It's more aligned to the hardware, what the hardware actually does. So a cleaner expression of the GPU power. We want to really have you-- when you call an API call, we want it to correspond to something the GPU is actually doing.

And if you're familiar with iOS apps, this is really similar to the jump between ES 1.1 and ES 2.0. The same sort of leaving the fixed function behind, going to fully programmable shaders, and really giving you the power of the GPU and putting it in your hands. So very similar to ES 2. So why change? Like I said, there's a lot of cruft in this API. There has been in the past. OpenGL was designed in 1992 for machines that look way different than what we've got now.

Those IRIX GL machines that OpenGL was designed around are really, really different from the GPU that's sitting in your machine right now. They really look nothing alike. So we really want to express what the modern GPU looks like in this API.

We don't want to be stuck back on something that came out almost 20 years ago. And this is because these GPUs are really, really powerful, and they're really hungry for work. And most of the time, the GPU is going to be sitting idle as you're using it. And we want to keep it fed. We want to send it lots and lots of work so it's always doing something, just stream all this data down so it can crunch those numbers.

So let's go over what we're gonna talk about. First I'll go over getting started, how to get started with the core profile, what you need to do. The differences from the legacy profile, you know, what looks the same, what looks different? How do you wrap your head around this? How do you send data to the GPU? 'Cause like I said, we wanna keep it busy, so we can keep it busy by sending it tons of data at once so it has something to do.

We'll go over shader development, GLSL 150, what it means for you, how it looks different from the shaders you're writing now, and how it looks different from the shaders you might have written for ES 2.0. We'll also talk about API changes, some differences in flushing data to the GPU, syncing data to the GPU, and syncing different threads in your program, different execution to make sure that the GPU has actually finished what you think it's finished. Then I'll go over a couple tips and tricks and some other things to watch out for.

So let's get started. Requesting an OpenGL 3.2 context. It's actually really easy. If you look at this, you see you need to add two things, the OpenGL profile key and the OpenGL profile version key. So when you create your context, you just have to add these two new attributes and you'll get a Core Profile. Core Profile is OpenGL 3.2 compliant.

And I'd like to call out that if you use Interface Builder to create your OpenGL apps, you will actually need to programmatically create your context. Interface Builder will not let you create an OpenGL 3.2 context. And we did this because we want you to be sure you really want the Core Profile context, because the Core Profile will not be compatible with your previous OpenGL code. So you create your attribute list, and you pass it, create a pixel format as usual, pass that pixel format, and create a context with it, and now you have a context that is the OpenGL Core Profile.
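
The session walks through this with an NSOpenGLPixelFormat attribute list; as a rough sketch in plain C, the equivalent request through the CGL layer looks something like the following (the color size and double-buffer attributes here are just illustrative choices):

    #include <OpenGL/OpenGL.h>   /* CGL, the C-level context API */

    CGLPixelFormatAttribute attrs[] = {
        kCGLPFAOpenGLProfile, (CGLPixelFormatAttribute)kCGLOGLPVersion_3_2_Core,
        kCGLPFAColorSize,     (CGLPixelFormatAttribute)24,
        kCGLPFADoubleBuffer,
        (CGLPixelFormatAttribute)0
    };

    CGLPixelFormatObj pixelFormat = NULL;
    GLint virtualScreens = 0;
    CGLContextObj context = NULL;

    /* Ask for a pixel format that carries the 3.2 Core Profile... */
    if (CGLChoosePixelFormat(attrs, &pixelFormat, &virtualScreens) == kCGLNoError && pixelFormat) {
        /* ...then create and activate a context with it. */
        CGLCreateContext(pixelFormat, NULL, &context);
        CGLDestroyPixelFormat(pixelFormat);
        CGLSetCurrentContext(context);
    }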

Now the differences from legacy profile. Like I mentioned before, everything is programmable. There's no more fixed function. And this is actually really cool because you get a lot more flexibility and power. And the GPU works this way. Your modern GPU actually does not have a fixed function pipeline in it. When you're running old OpenGL apps, the driver itself will create shaders behind your back and sort of emulate the fixed function pipeline. So this is giving that power to you so you know exactly what the GPU is doing.

The other main difference is everything is now a batch draw instead of immediate mode. You might have been familiar with calling glBegin, glEnd, passing it vertex by vertex by vertex by vertex. And we really want to get away from this. We want to hook up the hose and feed it millions of points at once so it can have all this data to crunch. It's much more efficient. Because actually, you can imagine sending vertex by vertex over the bus is going to be much less efficient than sending 100K of data all at once.

So using the Core Profile, how do we actually send this data to the GPU? Well, you're going to use a vertex buffer for everything. Many of you have probably used VBOs in your code. And now we're going to tell you that this is the de facto way to draw and upload data. So you want to send data once to the GPU and reuse. You may ask, what happens if I have data that I want to modify? Well, you can still use a vertex buffer object and update it as needed.

So everything you should send once and reuse as much as possible. You're also going to supply data to your shader via a generic vertex attribute. This is also something that's been available for a while, but now this is the de facto way to supply data to your shader. Everything is a generic vertex attribute. You supply this with a vertex attribute array. There's no more glVertexPointer, no more glNormalPointer, anything like that.

Everything is a generic vertex attribute and completely controlled by you. You can interpret the data coming into your shader however you'd like. This gives you much more flexibility because now you can just use whatever you want in your shader. You can have three attributes, you can have two normals, you can have eight colors, whatever you want.

And this is all helped-- you help manage this with what's called a vertex array object. This is something Apple introduced to GL several years ago, but it helps you manage your buffers and enables. So a vertex array object coalesces all the enabled buffers, all the bound pointers. So you don't have to continue to call glBindBuffer, glVertexAttribPointer, glEnableVertexAttribArray over and over and over again. Now you just call glBindVertexArray, and it will bind all that for you automatically.

So let's take a look at what this looks like. I'm going to talk about uploading positions. That's the most important thing in graphics. We want a bunch of vertex positions. So I say, #define ATTRIB_POSITION. And I'm going to say this is generic vertex attribute 0, because we'll start at the beginning. So first, I'm going to create my VAO, my vertex array. And I'll create it as normal, like you create everything else: one of them, passing the pointer.

Then I'll bind the vertex array so that everything else I bind will be associated with the currently bound vertex array. Then I will bind my array buffer as normal to whatever VBO I'd like to use. And then instead of calling glVertexPointer or glNormalPointer, etc., I'll call glVertexAttribPointer. And I've already decided I want to use attribute 0 for this. These are my positions. So the first thing I put into the argument list is my pound-defined attribute position, 0.

Then, similarly to the other pointer calls, I say the number of elements I have in this, so 3. I say floating point numbers. The GL_FALSE allows you to specify whether you want this data automatically normalized for you. Since they're positions, I probably don't want them normalized, so I'm going to say no. Then, as usual, I have the stride, and then I have the offset into my buffer object. And then, lastly, I have to remember that I want to enable the attribute array.
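
Put together, the setup just described looks roughly like this; the triangle data and the ATTRIB_POSITION index are illustrative, not from the session:

    #define ATTRIB_POSITION 0               /* generic vertex attribute index */

    static const GLfloat positions[] = {    /* three vertices, x/y/z each */
        -1.0f, -1.0f, 0.0f,
         1.0f, -1.0f, 0.0f,
         0.0f,  1.0f, 0.0f,
    };

    GLuint vao = 0, vbo = 0;

    glGenVertexArrays(1, &vao);             /* create the VAO first... */
    glBindVertexArray(vao);                 /* ...so everything below is captured by it */

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STATIC_DRAW);

    /* 3 floats per vertex, not normalized, tightly packed, offset 0 into the VBO */
    glVertexAttribPointer(ATTRIB_POSITION, 3, GL_FLOAT, GL_FALSE, 0, (const GLvoid *)0);
    glEnableVertexAttribArray(ATTRIB_POSITION);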

[Transcript missing]

So you have 16 vertex attributes. You may have more. So you can query the limit to be sure if you want to see if you have 32 or more. But we guarantee you at least 16 across all renderers. Like I mentioned, enables and pointers are saved. So this is great. You only need to bind once. Instead of having multiple bind calls, you can bind your vertex array once.

and that's all you need. The array buffer binding is not saved. So when you say glBindBuffer with GL_ARRAY_BUFFER, that's never actually used for rendering. It's only used for glMapBuffer, glBufferData, et cetera. And the actual thing that binds a specific vertex buffer object to the attribute is the pointer call.

So that's what actually does the binding. Other things you might want to supply to your shader. Well, the big thing everyone asks about is, how do I send a matrix down to my shader? Well, it's quite simple. You just use glUniformMatrix. Now, there's no more matrix mode and there's no more matrix stack. And a lot of people say, "Oh, that sucks. I really like the matrix stack." Well, this is actually better.

It gives you better programmer control over what you're doing. You no longer have to manage the stack. Instead, in your render tree, you could just associate a transform with an object. And when it comes time to render that object, you can upload it with glUniformMatrix4fv, send it all up at once.

One thing to remember with that function call is the count is the number of matrices, not the number of components or the number of vectors. So if you're going to upload four matrices, you would pass it a pointer and you would say four, not 16 or however many components are going to be there.
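
A small sketch of that count rule; the uniform name and location here are hypothetical:

    GLfloat boneMatrices[4 * 16];   /* four 4x4 matrices, column-major, tightly packed */
    /* ...fill boneMatrices... */

    GLint bonesLoc = glGetUniformLocation(program, "bones");   /* hypothetical uniform name */
    /* count = 4 matrices, not 64 floats and not 16 vectors */
    glUniformMatrix4fv(bonesLoc, 4, GL_FALSE, boneMatrices);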

Now, I said you get better programmer control. And this goes back to when I said we want to express what the GPU is actually doing here. glMultMatrix and glRotatef, all those functions, were actually CPU-side work. When you did that, the CPU would calculate these matrices and upload them for you.

And you really don't want that. As a programmer, you want to know exactly what the API is doing. You don't want it to do some stuff on the CPU, some stuff on the GPU. You want to be sure, I know exactly that this is going to happen on the GPU. This is going to happen on the CPU. So you can associate your transforms with your objects. You can know that I did this transformation myself in my code, or I'm uploading it, and I'm going to let the GPU do this transformation in my shader.

And while you're thinking about that, keep in mind that a single matrix multiply is always going to be faster than 10,000 matrix multiplies. So if you have a vertex shader that's going to multiply the projection times the model view matrix, that may occur 10,000 times, or however many vertices you have. If you do that multiply yourself in your code, that will happen once, and you can upload that pre-multiplied matrix to your shader.

That's going to be more efficient. So keep in mind, we call it hoisting things up the pipeline. So usually, the fragment stage will run more often than the vertex stage, and the vertex stage will, of course, run more often than whatever you're doing in your code. So if stuff can be moved up, you should definitely consider doing it. It'll make things quite a bit more efficient.
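
As a sketch of that hoisting idea: multiply the projection and model-view matrices once on the CPU and hand the shader a single pre-multiplied matrix, rather than doing the multiply per vertex. The little column-major multiply helper and the "mvp" uniform name are my own, not from the session:

    /* out = a * b for 4x4 column-major matrices (OpenGL's layout) */
    static void mat4_mul(float out[16], const float a[16], const float b[16]) {
        for (int c = 0; c < 4; ++c) {
            for (int r = 0; r < 4; ++r) {
                float sum = 0.0f;
                for (int k = 0; k < 4; ++k)
                    sum += a[k * 4 + r] * b[c * 4 + k];
                out[c * 4 + r] = sum;
            }
        }
    }

    /* projection and modelView are your own column-major matrices */
    float mvp[16];
    mat4_mul(mvp, projection, modelView);                     /* one multiply, on the CPU */
    glUniformMatrix4fv(glGetUniformLocation(program, "mvp"),  /* hypothetical uniform name */
                       1, GL_FALSE, mvp);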

So what does a shader look like? This is a pretty big shader, so don't get overwhelmed when you see it. I'll go over the interesting things. Some of it may look familiar, some of it may look a little different, but we'll talk about each in detail. So here's a vertex shader.

Looks pretty normal, a regular transform. You know, we have the normals being transformed, normalized, a little matrix multiply, some color that we're passing down the pipeline. And lastly, we output the position with the model view projection. Pretty straightforward. A couple things look a little different. You might be saying, well, what are those? So let's go through it.

So the first thing that you'll notice is at the very top we have a new pragma, #version 150. This indicates to the compiler which GLSL version we're using, and lets the runtime know that, hey, this is the new stuff. Be prepared. The next thing is specifying shader inputs. Now, if you used vertex attributes before, you might have been used to saying attribute, vec3, whatever. Well, now we've made that a little simpler. Now you just say in.

Designate this as an input to the shader, and then you get to specify whatever you want to call it. So you hook them up to your generic vertex attributes that you already specified. And there's two ways to do this. You can bind the name, position in this case, to a specific vertex attribute. So I had decided I wanted to use zero before. So you just say glBindAttribLocation. You call this.

Make sure you call it before you link. Call it for each of your attributes, and you can decide which vertex array you want to use. So you can decide which generic vertex attribute you want to source it from. Alternatively, you can call glGetAttribLocation after you link. This will let the compiler decide which generic vertex attribute it's going to associate all the inputs with.

If you decide to do it this way, one thing to keep in mind is there's no guarantee as to order or anything; it may not be consistent. So here I've ordered them position, normal, color. So you might think, oh, that would automatically get zero, one, and two, right? Well, maybe. Maybe not.

Maybe a compiler changes. Maybe it's different per GPU. So don't guess that things are always going to happen the same way. And be sure to code a little defensively to make sure that you're always checking which generic attribute corresponds to which name in your shader. So in replaces the attribute keyword that you may be familiar with.
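
The two approaches just described, as a sketch in code; the attribute names assume the shader above, and only one of the two options is needed for a given program:

    /* Option 1: decide the slots yourself, before linking */
    glBindAttribLocation(program, ATTRIB_POSITION, "position");
    glBindAttribLocation(program, 1, "normal");
    glBindAttribLocation(program, 2, "color");
    glLinkProgram(program);

    /* Option 2: link first and ask the compiler what it picked -- never assume 0, 1, 2 */
    glLinkProgram(program);
    GLint positionLoc = glGetAttribLocation(program, "position");
    GLint normalLoc   = glGetAttribLocation(program, "normal");
    GLint colorLoc    = glGetAttribLocation(program, "color");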

Next, we have outs. Outs specify what goes to the next stage in the pipeline. So this is similar to putting varying before something, as before, except that now it's explicitly saying, "Yeah, this is an output." And that makes more sense, because varying-- yeah, if you understand how the GPU works, this is something that varies across the entire primitive, but output's a little more straightforward. Yeah, this is the output of this stage, and it will go into the input of the next stage.

So let's look at a fragment shader. This is a little shorter, but should be pretty similar to things that you're used to. Again, we have some inputs and uniforms. Yeah, that looks normal. Output, final color, huh? But I thought we had color built-ins. Well, we'll talk about that in a second.

And the shader itself is a pretty simple lighting deal. So let's talk about how this is a little different from what you might be used to with the fragment shader. Again, we start off with the pragma, #version 150. This is the new stuff, so don't forget to put this at the top of your shaders.

inputs. Now, as you can imagine, the output of the previous stage becomes the input of this stage. So whatever was labeled out in your vertex stage, or if you're using a geometry shader, the geometry stage, becomes the input of the fragment stage. They have to be named the same thing to make sure that you're passing through.

So in replaces varying in the fragment stage. And last but not least, the outputs. Now there's no more gl_FragColor and gl_FragData. That's replaced by a variable of your naming, and you can call it whatever you'd like, and you label those as outputs. So here I have out vec4 finalColor. And similarly to how we bound the vertex attributes going in, we also bind the fragment colors going out. So I can say glBindFragDataLocation, and I can bind it to a specific number which corresponds to whatever color attachment I've attached to my output.

And you would do this before link similarly to how it works in the vertex stage. Also, you can do it after link and let the compiler decide, similarly to how the vertex state works. And the same caveats apply here. You can't guarantee that they'll always be in the same order, that they'll always go to the same output.

So make sure you code defensively, like I said before, and make sure that you're binding or you're getting the frag data locations so your outputs are going to where you expect. So out replaces gl_FragData and gl_FragColor. There's a limit on the number of outputs you can have. Check that limit to make sure you're not trying to use too many.
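
The same pattern applies to the fragment output, using the finalColor name from the shader above; again, only one of the two approaches is needed:

    /* Before linking: route "finalColor" to color attachment 0 */
    glBindFragDataLocation(program, 0, "finalColor");
    glLinkProgram(program);

    /* Or after linking: ask where it ended up, and attach your render target there */
    GLint finalColorLoc = glGetFragDataLocation(program, "finalColor");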

So let's talk about these API differences from the legacy to the core as we move along. So glVertexPointer, glNormalPointer, glColorPointer, that all becomes glVertexAttribPointer. Pretty simple. You can use each vertex attribute for whatever you'd like. Now, when you were enabling things before, you would say glEnableClientState. That's been completely subsumed by glEnableVertexAttribArray. You just need to enable each array numerically as you'd like.

As for rendering, glBegin and glEnd are completely gone, which is good because they were slow anyway. And now we're going to glDrawArrays, glDrawElements. And the next thing I'd like to call out, and I'll talk about this a little later, is glGetString with GL_EXTENSIONS. Before, when you wanted to query your extensions, you used glGetString, and it just gave you this massive string full of GL_ARB gobbledygook.

That's a little different now. Instead of getting a giant string that's like 100 lines long, you query each extension individually. So you would say glGetStringi, and you give it an index. And each extension is indexed, and you get each extension string individually. I'll go over that in a bit, why that is and how it works, so don't worry too much about it right now.

Shader differences. Like I said before, all the built-ins from the fixed function are gone. So gl_ModelViewMatrix is now some uniform that you're passing a matrix for. gl_Normal is something you have to pass down yourself, because you can use the attributes for whatever you'd like. In your vertex shader, what you were labeling as attributes before are now inputs, and what you were labeling as varyings are now outputs.

Similarly, in the fragment shader, varyings from the previous stage become the inputs of this stage, and gl_FragColor becomes a variable that you're naming yourself. So some differences are subtle, some not so subtle, but in the end, it's for the best. It gives you much better control, and helps you wrangle all these ideas in your head because you can call anything whatever you'd like.
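
To make those renamings concrete, here is a minimal GLSL 1.50 vertex/fragment pair sketched as C string literals ready for glShaderSource; the variable names are illustrative, not the session's:

    static const char *vertexSource =
        "#version 150\n"
        "in vec3 position;\n"                   /* was: attribute */
        "in vec4 color;\n"
        "uniform mat4 mvp;\n"                   /* was: gl_ModelViewProjectionMatrix */
        "out vec4 vColor;\n"                    /* was: varying */
        "void main() {\n"
        "    vColor = color;\n"
        "    gl_Position = mvp * vec4(position, 1.0);\n"
        "}\n";

    static const char *fragmentSource =
        "#version 150\n"
        "in vec4 vColor;\n"                     /* same name as the vertex-stage out */
        "out vec4 finalColor;\n"                /* was: gl_FragColor */
        "void main() {\n"
        "    finalColor = vColor;\n"
        "}\n";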

So some other changes as we move from the legacy to the core. Mapping and flushing. I think some of you guys are probably familiar with the Apple Flush Buffer Range extension. We've had this for a while. This has actually come into the core of OpenGL. And now not only can you flush a range of a buffer, you can actually map a range of a buffer.

So map buffer range will allow you to map maybe a small portion of the buffer, 10%. You have 100k buffer, you can map 1k of it. And you can give it a couple flags, map invalidate buffer, which will tell the runtime that you want to invalidate this entire buffer.

You're going to replace the entire thing. All the data in it's stale. Don't worry about it. You can also invalidate only a range. Like I said before, I want to map this 1k range. I'm going to mark that 1k range as invalid, stale data. The rest of it's good, but this stuff I'm going to update.

There's also the map unsynchronized bit. And this is important if you want to edit something that you're currently using. This will tell the GL runtime not to synchronize on previous uses of this buffer. But this will also allow you to stomp the memory you're using. So be very careful when you give it this flag that you're not actually writing over something you're currently drawing with. And lastly, we have the flush explicit bit. This will tell GL not to actually flush this data up until you tell it to. So you must use glFlushMappedBufferRange in order for this data to get up to the GPU.
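
A sketch of mapping and explicitly flushing a small range; the buffer, sizes, and source data are placeholders:

    #include <string.h>   /* memcpy */

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* Map the first 1 KB for writing; only that range is marked stale,
       and nothing is flushed until we say so. */
    void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, 1024,
                                 GL_MAP_WRITE_BIT |
                                 GL_MAP_INVALIDATE_RANGE_BIT |
                                 GL_MAP_FLUSH_EXPLICIT_BIT);
    if (ptr) {
        memcpy(ptr, newVertexData, 1024);                     /* update the mapped range */
        glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, 1024);   /* required with FLUSH_EXPLICIT */
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }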

So what does this look like? Oh, sorry. In order to use this, you need to have some synchronization primitives. So if you've used Apple Fence or something similar, now we call them sync objects. So to insert one of these sync objects, a fence, into the command stream, you would call glFenceSync. And this puts a token into the command stream and allows you to query it to see if all the GPU commands before it have run.

You can query it and tell it to do a wait on the client side, which means your CPU will wait until the commands are done. Or you can do a wait on the server side, which means the GPU will wait. And this allows you to do interesting synchronization ideas, like issue a bunch of commands and put in a fence. Then on another thread, you can wait on that fence, put in a GPU wait, then issue a bunch of other commands. And this will guarantee that the first thread's commands have executed before the second.

So you can queue up commands based on earlier operations and guarantee they will happen in order. Now, make sure you flush appropriately. So the way the GL works is a command buffer fills up with commands, and when it gets to the end, spits them out to the GPU and they do a bunch of work.

Now, you can imagine if you have this big buffer, and you fill it up halfway, and then you put your fence here, you could be sitting there for a long time because nothing's going to be sent to the GPU and it fills up, unless you tell it to flush.

So if you want to make sure everything keeps going normally at a good rate, then make sure you flush appropriately. Now, this is a fine balancing act, because you don't want to flush too much-- that's going to be slow-- and you don't want to flush too little. So take a look at your app and try to figure out what the precise amount of flushing is right for you.

So here's what a fence sync looks like. You insert a fence into the command stream. Pretty simple: glFenceSync, with GL_SYNC_GPU_COMMANDS_COMPLETE. And that's the only token you can give it right now. Maybe more will be added, but that's what we've got. Map your buffer range. And then I can wait, and it'll tell me if my commands are done. Pretty simple.

Now, I can use this to actually do stuff in the background while I'm waiting. As I map this buffer range, I can wait, and on my fence, I can test to see if all my previous commands are done. If they are done, I can modify this buffer and do things, you know, to update it.

If they're not done, I might want to do some other work so I'm not stomping the memory I'm using. So this is a way to synchronize your use of buffer objects with your application and updates. And then when I'm done updating my buffer, I say glFlushMappedBufferRange. Then I'm going to want to delete my fence, and I'm going to want to unmap my buffer.

So if you're actually using the unsynchronized bit, these fences, these syncs, are a great way to make sure you're not actually stomping your memory. Because you can insert a fence after your draws, and then you can be sure that all your draws are done, and you're not going to be modifying the data that those draws were using.
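
Put together, the fence/test/update flow described above looks roughly like this sketch; the buffer names and sizes are placeholders:

    /* After issuing the draws that read from the buffer, drop in a fence... */
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();   /* ...and flush so the commands (and the fence) actually reach the GPU */

    /* Later, before touching the buffer: has the GPU gotten past the fence yet? */
    GLenum status = glClientWaitSync(fence, 0, 0);   /* timeout of 0 = just test it */
    if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
        /* Safe to update: the draws are done, so an unsynchronized map won't stomp them */
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, 1024,
                                     GL_MAP_WRITE_BIT |
                                     GL_MAP_UNSYNCHRONIZED_BIT |
                                     GL_MAP_FLUSH_EXPLICIT_BIT);
        memcpy(ptr, newVertexData, 1024);
        glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, 1024);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    } else {
        /* Not done yet: go do other useful work instead of waiting */
    }
    glDeleteSync(fence);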

The next thing we'll talk about is off-screen rendering. Now, we've been pushing FBO for several years now, and now we're finally giving you the last push to move to it, because in the core profile, there is no more pBuffer. FBOs are what you're going to use for rendering off-screen. This is actually good, because I think FBOs are a whole lot easier to use than pBuffers.

Much more efficient, the API is much cleaner, and the error checking is more strict, which will help you find bugs, because you never know, when you port something over and you start getting these errors, it turns out that, oh, wait, I did that wrong. So stricter error checking is actually really good to catch, you know, sort of small issues in your code.

Now, all this might look a little familiar to you if you've been using ES 2.0. You know, going straight from the fixed function, fully programmable, using generic vertex attributes for everything, no more matrix stack, managing the matrices yourself. So I'll do a little quick comparison between ES 2.0 and the GL Core Profile.

They had a lot of the same reasoning behind them to make these changes. So ES 2.0 uses generic vertex attributes, but you'll use the attribute keyword there. And here, similarly to the legacy profile compared to the core profile, you'll use in. And again, there's varying in ES 2.0, which becomes out. Varyings in the fragment stage become in. And gl_FragColor is still in ES 2.0, but like I mentioned before, now you have your own variable name. It can be called whatever you'd like.

The more interesting differences are in Cocoa and how you set up your ES 2.0 rendering. On the iPhone or on the iPad in iOS, you have the EAGL layer. You'll grab the EAGL layer, and you'll start rendering into that. Well, you don't have to do that in Lion. We have NSOpenGLView, which is a standalone object, and it allows you to render right into it. And similarly to how you have an EAGLContext in iOS, there's an NSOpenGLContext

on the desktop. And lastly, when you're presenting your rendering to the screen in iOS, you're going to use presentRenderbuffer. Instead of doing that here, you're actually going to draw into FBO0. FBO0 is the back buffer on the system. So if you have a window, if you have a drawable, you have FBO0. Make sure you do have a drawable, though.

If you don't have a window on the screen and you create a context, your drawing will go nowhere and there'll be an error. So make sure you actually have a system drawable. And to present this information to the user to send this to the screen, you're going to use flush buffer.

So a quick recap. An NSOpenGLView is a standalone object. You create the view and you attach it to a window. You can do this in Interface Builder. You don't need to grab a GL layer. You don't need to use Core Animation. And you must use the flushBuffer method of the context in order to get that information onto the screen so that the user can see whatever great graphics you've done.

So quick summary. Use the Core Profile. It's great. A lot of modern features, better API. Use VAOs to group the buffers that you've got, to help you manage your enables and all your state. Use your generic vertex attributes to supply data, as much data as you'd like. And use GLSL 150 for your shaders. And that's about it for the Core Profile.

Now tips. I mentioned querying extensions before, so I'll give you a quick hint how to do that. glGetStringi, GL_EXTENSIONS with an index. That's indexed from zero to the value of GL_NUM_EXTENSIONS. And individual extension strings are given to you by this call instead of one long one.

Now, like everything else, you cannot guarantee what order these extension strings will come out in. One renderer may be in reverse alphabetical order. Another renderer may be in alphabetical order. A third renderer may be in some random order. So you're going to have to go through all of them to make sure the ones you need are there. So query the number of extensions with glGetIntegerv and GL_NUM_EXTENSIONS. And then you can loop over them like I have here and just test each extension string to see which ones are available for you.
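
A sketch of that loop; the extension name being tested for is just an example:

    #include <string.h>

    GLint numExtensions = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &numExtensions);

    GLboolean haveTimerQuery = GL_FALSE;
    for (GLint i = 0; i < numExtensions; ++i) {
        /* One extension string per index -- order is not guaranteed */
        const GLubyte *ext = glGetStringi(GL_EXTENSIONS, i);
        if (strcmp((const char *)ext, "GL_ARB_timer_query") == 0)
            haveTimerQuery = GL_TRUE;
    }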

Don't forget to include gl3.h. This is OpenGL 3.2, so there's gl3.h and gl3ext.h if you're using extensions. Calling deprecated API will error. It'll give you GL_INVALID_OPERATION, so you can't call glVertexPointer or anything like that. That's not valid. And the other thing to keep in mind is there is no vertex array object zero.

So if you start binding things without creating a VAO, nothing will get to your shader. You must create a VAO and you must bind it before you start binding vertex attributes. And like I mentioned before, there must be a drawable attached to FBO0. So if you don't have a window on the screen, you don't have a drawable, you can't actually draw anything, which seems a little straightforward, but it's worth repeating.

All right, let's get into the cool stuff. New rendering features. Uniform buffer objects. This is something people have requested for a long time, and they're really cool. They allow you to upload and store uniform data as a buffer object and upload it in massive chunks at once. So you use a buffer object for storage, similar to how you use it for vertex data.

Now you can use it for uniform data. And this is faster than calls to glUniform. So instead of updating each uniform one at a time, you know, glUniform, glUniform, glUniform, you can now put it into a UBO and upload it all at once, all in one chunk.

So what does this look like? Well, it kind of looks like a C struct. See, we have the layout which specifies that this is going to go in order. So here I have a whole bunch of matrices. This is just going to be matrix, matrix, matrix, tightly packed. There may be some padding to make sure we're aligned, so keep that in mind.

And so you specify it by specifying the layout uniform UBO, which is going to be the external name in your code that you reference this by. Then you declare it like a C struct. I just have a bunch of matrices here. And then I'm calling it block. So later on in my shader, I'm going to refer to it as block.mv. This is going to be the first matrix. Quick, simple shader to give you guys an idea of how this works. So pretty straightforward. Now you just have to figure out, well, how do I hook this up to the data that's in my program? Well, it's pretty simple.

Again, you need a binding point that you define yourself. I'm going to call it BLOCK_BINDING here. And I need to get the index of my UBO from my shader. So I do that with glGetUniformBlockIndex. And like I said, in my shader, I had called it UBO, so I grab it with this function by that name. Now I'll show you how to bind a buffer to your shader.

In this example, I'm going to use the actual UBO from my shader. glUniformBlockBinding specifies that my UBO is now going to be binding point zero. This is so when I bind a buffer at the very end, I can specify that I want it bound to binding point zero. So here I'm using glBindBufferRange to bind a buffer to my UBO, but you can also use glBindBufferBase. So pretty straightforward, not too bad.

So, hooking it all up: get the uniform block index, which is similar to the attribute location. Set the block binding index, which you get to specify yourself, and then bind your buffer object to that block binding index that you specified. You can use glBindBufferRange or glBindBufferBase, specifying some existing uniform buffer object.
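
The three hookup steps in code, using the UBO block name from the shader above and a binding point chosen here as zero; the buffer object is assumed to exist already:

    #define BLOCK_BINDING 0                      /* binding point we pick ourselves */

    /* 1. Find the block the shader declared as "UBO" */
    GLuint blockIndex = glGetUniformBlockIndex(program, "UBO");

    /* 2. Say that block lives at binding point 0 */
    glUniformBlockBinding(program, blockIndex, BLOCK_BINDING);

    /* 3. Attach an existing buffer object to that binding point */
    glBindBufferBase(GL_UNIFORM_BUFFER, BLOCK_BINDING, uniformBuffer);
    /* ...or bind just a slice of it instead:
       glBindBufferRange(GL_UNIFORM_BUFFER, BLOCK_BINDING, uniformBuffer, offset, size); */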

Remember, you want to check your size limits. UBOs cannot be infinitely big, so query the limits to make sure that you're using a UBO that's not too enormous. Just like VBOs, you cannot modify a UBO that's being used to draw. So you can get around this by orphaning your buffer, which you can do by calling glBufferData with NULL.

And that will tell the driver, keep using this buffer, and then get rid of it when you're done with it. But right now, I need a new one. Or there's the old standby of double buffering your data, which is probably the most straightforward and easiest way to do it.
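
Orphaning in code is just a glBufferData call with NULL before you refill; the usage hint and sizes here are illustrative:

    glBindBuffer(GL_UNIFORM_BUFFER, uniformBuffer);
    /* NULL data = "give me fresh storage of the same size"; the driver keeps the old
       storage alive for any in-flight draws and hands you new memory to write into. */
    glBufferData(GL_UNIFORM_BUFFER, bufferSize, NULL, GL_STREAM_DRAW);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, bufferSize, newUniformData);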

Split frequently used and modified UBOs into separate objects. If you have a UBO that's going to be updated once a frame, and you have a UBO that's going to be updated once a draw, these should be completely separate objects. You shouldn't put them into the same uniform buffer object.

and don't update more than you need to. Pretty self-explanatory. If you can update something once a frame, you should. If you have to update it once a draw, well, then you have to. But if you don't have to, don't do it. Because the less data you send up to the GPU, the better overall.

Now, similar to a UBO, we have a texture buffer object, a TBO. This is a new texture target, GL_TEXTURE_BUFFER, and a new sampler type, which is samplerBuffer. This is quite similar to a UBO, except that it's a giant 1D texture. And this allows you to do fast uploads, but it also takes advantage of texture caching.

All of your texture units have a cache associated with them, and this will allow you to do fetches that could be uniform data, also backed with that cache. And similarly to a UBO, you have to check your texture size limits, because there are limits to the amount of data that can be used with a TBO.

So how to do this really quickly? You generate a texture as normal, and then you bind the texture to the texture buffer target. And then you use this glTexBuffer call. Now, a texture buffer is backed by a previously existing buffer object. So this does not actually allocate brand new memory from nowhere. You must already have a buffer object with this data in it. So keep that in mind when you're binding your TBO.

Shader layout is pretty simple. Just declare a sampler buffer.
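
A sketch of the TBO setup just described, with the matching shader declaration noted in a comment; the format and data are placeholders:

    GLuint tbo = 0, tboTexture = 0;

    /* The storage is an ordinary buffer object that already holds the data */
    glGenBuffers(1, &tbo);
    glBindBuffer(GL_TEXTURE_BUFFER, tbo);
    glBufferData(GL_TEXTURE_BUFFER, dataSize, data, GL_STATIC_DRAW);

    /* The texture just points at that buffer -- no new storage is allocated here */
    glGenTextures(1, &tboTexture);
    glBindTexture(GL_TEXTURE_BUFFER, tboTexture);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, tbo);

    /* In the shader:  uniform samplerBuffer data;
       and fetch with: vec4 v = texelFetch(data, index); */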

[Transcript missing]

Instancing. You might have heard me talk last year when I talked about divisor instancing. Now we have shader instancing. Now this is a new built-in in your shader called gl_InstanceID. And this will increase monotonically depending on the number of instances you're rendering. So for the first instance, this will be zero, for the second, it will be one, and so on.

And you can use this to index uniform or texture data; something like a skinning matrix array would be great. If you want to draw 100 skinned guys, you upload a ton of matrices and then you draw instanced, so that guy zero gets the bones for instance zero and so on. Let's take a quick look at the shader. Looks pretty simple. So I have a UBO here.

And when I access it, I access it with the instance ID. So here, whatever model view matrix I'm using for this instance is just indexed into this giant array of matrices I have. Easy way to do instancing, pretty straightforward. Next, we have multi-sample textures. These are multi-sample render targets, another new texture target.

To create a multi-sample texture, you need to decide how many samples you'd like and whether they use fixed sample locations. Now, fixed sample locations means the samples in the texture will be in the same place, proportionally, in every pixel. You can decide what's right for your rendering or what you'd like. Then you use glTexImage2DMultisample, quite similar to the regular glTexImage2D, but now you want to supply your sample count and whether the sample locations are fixed or not.
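
Creating one and attaching it to a framebuffer object looks roughly like this; the sample count, format, and dimensions are illustrative:

    GLuint msTexture = 0;
    glGenTextures(1, &msTexture);
    glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, msTexture);

    /* 4 samples, RGBA8, fixed sample locations */
    glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, 4, GL_RGBA8, width, height, GL_TRUE);

    /* Render into it by attaching it to an FBO color attachment */
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D_MULTISAMPLE, msTexture, 0);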

So you could use this for something like deferred rendering. Deferred rendering does not typically handle multi-sampling very well, but if you multi-sample each slice of the G buffer, you can actually get some cool anti-aliasing out of here. So in this case, I have a multi-sample texture for my positions on the left.

I have a multi-sample texture for my normals in the middle and a multi-sample texture for the albedo on the right. And these then feed into the final composited image, and I get an anti-alias teapot with deferred rendering. Now keep in mind, this will take a little bit more RAM than a regular deferred renderer, but you'll actually get the benefits of anti-aliasing this way.

So to access this in the shader, here's a quick shader for my deferred rendering. I declare a multi-sample texture sampler. And then I actually need to loop over all my samples to decide how I'm going to resolve this, because these are not resolved automatically. You have to do the resolve yourself. So like I mentioned before, texelFetch uses integer coordinates.

So you need to actually calculate your normalized coordinates and make them into integer coordinates, unless you're specifying integer coordinates yourself. You can do this with the textureSize built-in. You use textureSize and you give it a sampler, and it will give you the width and the height of that texture.

So here I'm just multiplying my normalized coordinates by the width and the height to get the integer coordinates. And I'm using texelFetch to resolve these. So I loop over samples. So maybe I have four samples in my multi-sample texture. I loop over and resolve it sample by sample and accumulate this into my color. So pretty straightforward and quite simple, but an interesting technique nonetheless.
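
A minimal version of that resolve shader as a GLSL 1.50 fragment-stage string; the sampler, sample count, and coordinate names are illustrative, not the session's:

    static const char *resolveFragmentSource =
        "#version 150\n"
        "uniform sampler2DMS colorTexture;\n"
        "uniform int sampleCount;\n"
        "in vec2 vTexCoord;\n"                  /* normalized coordinates from the vertex stage */
        "out vec4 finalColor;\n"
        "void main() {\n"
        "    ivec2 size  = textureSize(colorTexture);\n"         /* width and height in texels */
        "    ivec2 coord = ivec2(vTexCoord * vec2(size));\n"     /* normalized -> integer */
        "    vec4 sum = vec4(0.0);\n"
        "    for (int i = 0; i < sampleCount; ++i)\n"
        "        sum += texelFetch(colorTexture, coord, i);\n"   /* resolve sample by sample */
        "    finalColor = sum / float(sampleCount);\n"
        "}\n";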

Some other quick tips and tricks. If you're using the legacy context, keep in mind that texture2DLod, which is something I see in a lot of games, is actually an extension. This is not a built-in. If you want to use this functionality, you need to use the #extension directive and require it so the shader compiler knows that you want to use this.

Another thing I want to bring up is that TBO and PBO are actually separate things. Now, we're introducing TBO, which sounds similar to PBO, the pixel buffer object, but these actually are not the same thing at all. A texture buffer object is a texture backed by buffer storage. A pixel buffer object is more used for asynchronous texture upload. So keep that in mind when you're thinking about the two of them and when you're writing your app. Make sure you pick the right one for whatever job is at hand.

So let's quickly go over some other new features that we're bringing forward in Lion. The first one that I'm really excited about is automatic graphics switching, and we really want you to save the battery. So what is automatic switching? Well, like I said, if you bought a laptop in the last year, year and a half, you know you have two GPUs in there. You have one GPU that's integrated, doesn't use much power, has great battery life, and you have one that's discrete, which is really powerful.

Now, we're giving you the power to use the integrated GPU so that your apps and your users can have really good battery life. In order to do this, the first thing you have to do is create a pixel format that tracks multiple GPUs. And this is the Allow Offline Renderer pixel format.

So put that into your declaration when you create your context. Now, to actually support the integrated GPU, you just have to add an attribute to your plist, which is NSSupportsAutomaticGraphicsSwitching. This is a Boolean attribute. You just set it to yes. That tells the system, hey, I want to run on the integrated GPU. If you want more information about doing this, more information about supporting multiple GPUs, you can see session 310, which is from WWDC 2009. Now, if you're going to do this, I can't stress enough, test on actual hardware.

The driver for the integrated parts is different than the driver for the discrete parts. They may have different characteristics. So you want to make sure your app runs well on the integrated driver that you get the performance you want and there are no rendering errors. You also want to make sure that your app survives switching between the two.

So let's say you have an application that's running on the integrated part and then the system decides, hey, I need the discrete part, so I'm going to kick it up. All the applications will move over. And you want to make sure your app moves over cleanly, smoothly, without any flickering or any texture corruption. So you can do that by starting up your app and then opening another GL app, maybe even something as simple as chess, to make sure you survive the transition without any weird artifacts. Also, make sure the reverse is true.

So when everything is running on discrete and you close down all the applications that require the high-power GPU, you want to make sure your app switches back to the integrated GPU gracefully as well. So make sure you test on actual hardware if you're doing this. A lot of latent bugs will show up as weird flickering, slight texture corruption for a split second, something like that. And that usually means that there's some issue with the way you're uploading textures or you're not doing something completely safely. And if you do this and you figure, like, man, I'm doing everything right, as always, feel free to file a bug.

Some other cool stuff. I hope you all went to the OpenCL talks. If not, I highly encourage you to go check them out. They talked about a lot of cool stuff that we're bringing forward in Lion in OpenCL. IOSurface. IOSurface will let you share things between contexts and apps. Now, you cannot share objects between an OpenGL legacy context and an OpenGL core context; that's one thing to keep in mind. If you absolutely must share data, you can try IOSurface. And lastly, OpenGL Profiler.

OpenGL Profiler is our awesome profiler tool that lets you profile your app and debug it. Now we've added a great new feature, which is remote profiling. So you can control profiler from one machine and run the app on another machine and actually control it over the network, which is great, especially for full screen apps.

So full screen modes, speaking of full screen. If you want full screen, the best way to do this is something we've been saying for a couple years, is just create a covering window. Cover the entire screen and draw to it. This gives you all the benefits of full screen and none of the drawbacks. It also will give you a slight performance benefit as we automatically switch you into the high performance rendering path in terms of compositing to the screen.

This also has the benefit, if there's a notification that needs to pop up, the user actually will get the notifications without anything blocking the way. Because before, if you had captured the fullscreen, they could have 10 or 15 windows behind there that they'd never seen. If you absolutely must switch modes, check the QAs that we've got on the developer site. There's a couple of things that you can use, but again, we'd like to encourage you to actually just create the fullscreen covering window.

Now, I know some apps will use CGDisplayBaseAddress, which used to give you the framebuffer pointer. This returns NULL now. You can no longer grab it. So keep that in mind. Don't use it. You'll get NULL, and then you'll crash, which is undesirable, to say the least. If you're interested in video capture and capturing things from the screen, there have been a couple of great AV Foundation talks. I'll go over what they were at the end, and you can go check those out if you're wondering how to capture from the screen.

So a quick wrap up. We have lots of great new features in Lion. Want to take advantage of the Core Profile. Create that context. Create a couple of VAOs. Use your generic vertex attributes. And use GLSL 150. And try out the new features. Try out instancing. Use UBOs. Use TBOs.

Use multi-sample textures. These are all really cool, powerful features you can use to make your applications look great. So try them out. Integrate them into your apps. Let us know what you like about them, what you don't like about them. We're always listening. And now I'm going to show you a quick demo, demoing instancing.

So here we've got a galaxy and staying on that theme. So I want to draw this planet. Looks pretty cool. It's got some asteroids floating around. And as you can see, I'm making 10,000 draw calls. And I have-- or sorry, 1,000 draw calls. And I have 1,000 objects, which is about 70,000 vertices.

And I'd really like more. I'm really into more. So let's say 2,000. You can see things are getting a little slow as I draw all these asteroids. So I can draw about 7,000 of them at 60 frames per second, but I really-- this doesn't look very realistic. I want more density.

I want 50,000. That looks okay. We're starting to get there, but really I'd like, you know, like 100,000. That'd be great. So yeah, so now I have my 100,000. I'm running at 4 or 5 frames a second, and this isn't really acceptable. But with instancing-- I'm now running at 100 fps. This is great. So I can even put in more. Oh, man, 130, 140, 150.

Oh, let's do more. Let's see. 170. Yeah, so we'll top out about 180,000. That looks much more like an actual planet. And you can see I have a nice little shadow, and the asteroids are slightly colored, because we're thinking, like, Saturn has, you know, slightly colored rings. Now we're on the dark side of the planet. So you can see here that with a UBO, I actually can draw all this with 176 draw calls, which is way better than the 180,000 draw calls. Now, let's see how slow-- if I go back to non-instanced, yeah.

So that's 180,000 draws. That's 176. Now, I have to do 176 because, as I said before, there's a limit to the size of the UBO. However, the TBO limit is much greater. So I can actually do this with one draw call with a TBO. This is 12 million vertices, 12.6 million vertices with a single draw call. And I get 60 frames a second, one draw call, runs super fast. So instancing is really cool.

You can use, I mean, you can instance whatever you like. It doesn't have to be asteroids, what we've got here. I mean, you can see that each asteroid is slightly spinning and it's rotating around the planet. So you can actually have them be doing things completely independent of each other, even though you're only using one draw call. So that's a quick instancing demo, just to give you an idea of what you can accomplish with this technique. It's pretty cool.

Now, if you need more information, Allan Schaffer is our graphics and game technologies evangelist. Drop him an email. He's always waiting to hear from you. And if you want to hear more about OpenGL, check out the OpenGL Programming Guide. Now, right now, it's still in pre-release because Lion has not made it out to the public. But you guys can check it out on developer.apple.com.

A couple more pieces of information. We have technical Q&As about screen capture, image snapshot, and using the integrated GPU. And if you're curious which Q&A numbers they are, they're in the link here. But 1740 is capturing screen activity to a movie. 1741 is how to take an image snapshot. Using the integrated GPU is 1734.

And another tech note on supporting multiple GPUs, because if you do want to use the integrated GPU, that is equivalent to supporting multiple GPUs, because you will actually have two. There's the integrated one and the discrete one. So make sure you check those out if you're interested in any of those topics.

These are the related sessions. Unfortunately, they've already passed, but you can go check out the videos. So there's the best practices for OpenGL ES. A lot of those apply to the desktop as well because ES 2.0 and the Core Profile are quite similar. Of course, you have a little bit more power on the desktop. And then there's Introducing AV Foundation Capture, which goes over capturing content from the display if you're interested in that. And that's it. Thanks for coming to hear me talk, and I hope to see you at the lab.