Media • 59:38
Understand how advances in OpenGL unlock the rendering power of the GPU. Tackle GPU-based vertex and fragment processing with the OpenGL Shading Language (GLSL) and use the most current capabilities of OpenGL to modernize your code. Learn techniques for integrating the high-performance 3D graphics pipeline with the other graphics frameworks on Mac OS X. A must-attend session for Mac OpenGL developers to learn how to take advantage of the recent innovations in graphics hardware.
Speakers: Kent Miller, Alex Eddy
Unlisted on Apple Developer site
Downloads from Apple
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
So good afternoon. Thanks for coming to Advances in Modern OpenGL. My name is Kent Miller, and let's just dive right in. As a way to frame our discussion today, let's spend a second talking about past OpenGL. Past OpenGL, back in the early '90s, kind of looked like this: immediate mode, multiple calls per vertex, and it allowed you to do neat things like lit, shaded whales.
At that point, a lot of the OpenGL pipeline was implemented in software. A lot of the real-time bottleneck was the computation on the incoming data: vertex transformation, lighting, all that kind of stuff. Contrast that with today, where all that computation is more or less free, and the way applications use OpenGL is to just bombard the card with data. Big textures, lots of them, lots of geometry, and sometimes the textures and geometry change per frame. So the bottleneck has really moved to being a data management problem: getting the best performance out of the graphics system.
So we just talked about immediate mode, but today a well-performing application will use vertex buffer objects and vertex arrays, so that only a few function calls are needed to specify the vertex data. And instead of fixed-function OpenGL, modern apps are using shaders. On all the current graphics hardware, when you use fixed-function OpenGL, it just internally makes a shader out of your state anyway. So a lot of times it's more straightforward, and executes faster on the graphics card, to just write your own shader that does exactly what you want it to do.
The old model also took many function calls to change state. Instead of doing that, a modern application will use buffer objects and all the other types of objects in the system that allow you to change state in batches. Programs let you switch from one complete transform state to another, instead of making multiple calls to change the fixed-function state.
Instead of blocking calls for pixel data, whether to specify something or get something back from the graphics card, modern applications use pixel buffer objects to do that asynchronously. We'll talk about that a little bit. An old way to do off-screen drawing was to use pbuffers, which were platform-specific and worked differently on OS X than on other platforms. Today, OpenGL provides framebuffer objects, which let you do a lot of the same things in a cross-platform way.
So, in this session, we're going to talk about keeping the graphics card busy and a couple of different ways to do that. We'll cover the buffer object API, which allows you to manage your memory in the graphics system and provide updates to your data in a way that doesn't block the drawing pipeline. We're going to talk about how to generate data using the graphics card, capture intermediate results, and feed them back into the system.
And then we're going to talk about some more new OpenGL APIs that allow you to manipulate the intermediate results on the graphics card without any round trips to the CPU. And then I'm going to show a little example of using OpenGL with OpenCL, using OpenCL compute kernels to generate some data and then using OpenGL to visualize it.
So here we go with buffer objects. First let's talk about vertex buffer objects. A vertex buffer object encapsulates vertex array state, and it allows you to do two things that you really can't do any other way. One is that it allows you to tell OpenGL when you're going to change the data. If OpenGL doesn't know when the data in your vertex array is going to be modified, it really has no choice but to use it in place every time.
And that's not good for performance. The second thing is the memory management policy: API hints that tell OpenGL how you're going to use that vertex buffer object, so it can store it in an optimal place. It also allows you to update your data in a non-blocking way.
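A minimal sketch of what those usage hints look like in code; the data and sizes here are illustrative, not taken from the session's slides:

```c
#include <OpenGL/gl.h>

/* Illustrative vertex data; in a real app this comes from your model. */
static const GLfloat positions[9] = {
    0.0f, 0.0f, 0.0f,   1.0f, 0.0f, 0.0f,   0.0f, 1.0f, 0.0f
};

static GLuint create_static_vbo(void)
{
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* GL_STATIC_DRAW hints that the data is specified once and drawn many
       times, so GL is free to cache it on the graphics card. Use
       GL_DYNAMIC_DRAW or GL_STREAM_DRAW if you plan to update it often. */
    glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STATIC_DRAW);
    return vbo;
}
```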
So think about the computer for a second; here's a simple diagram of it. There's a bus from the CPU through the system controller to the RAM, from the CPU to the GPU, and from the GPU to GPU memory. But you can't directly access the RAM from the GPU; you have to go through the system controller. So that leads to a trade-off depending on what you want to do with your data. If you store the data right there in system RAM, then it's great for your program to access.
It can quickly manipulate anything it wants there. But the drawback is that the GPU doesn't have direct access to it. It has to go through the system controller and either access the data in small chunks or make a copy up to the GPU to use that data.
On the other hand, if you store the data in graphics card memory, that's great for the GPU, but if you want to modify it with the CPU, that takes more time. You either have to update a copy in system memory and transfer it up, or do slower writes across the bus into GPU memory.
[Transcript missing]
And if you want to update the data in a vertex buffer object, there are really three ways to do it. We're going to go over some code for this, so just spend a second here. You can use the map buffer and unmap buffer pair together. Buffer sub data allows you, with one call, to pass a small amount of data into the system, and it will update your buffer in place and do all that for you.
And then Apple has an OS X-specific extension, flush buffer range, which lets you take responsibility for telling GL which part of your vertex data you modified. That allows it to copy just little bits of it and send those to the graphics card, as opposed to the entire vertex buffer object. It can have a good effect on performance, which we'll show in a second.
If you want to think about why you have to use a vertex buffer object at all: with plain vertex array usage (the same vertex arrays that have been in OpenGL forever), you can change the data at any point and GL is expected to pick up the change. So in this little piece of code, you call drawArrays on a set of data you defined, and then your program goes off and modifies it.
And then when you call drawArrays again, OpenGL is expected to pick up your modifications. What that leads to is that OpenGL really has to read your data every time it needs access to any particular piece of it, say the X coordinate of some vertex; it's got to read it out of memory again.
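To make that concrete, here's a sketch of the classic client-side vertex array pattern being described; the array contents and sizes are illustrative:

```c
#include <OpenGL/gl.h>

enum { NUM_VERTS = 3 };
static GLfloat positions[3 * NUM_VERTS];      /* plain application memory */

static void draw_with_client_arrays(void)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, positions);
    glDrawArrays(GL_TRIANGLES, 0, NUM_VERTS); /* GL reads from positions */

    positions[1] += 0.1f;                     /* app changes the data in place */
    glDrawArrays(GL_TRIANGLES, 0, NUM_VERTS); /* GL must notice the change, so
                                                 it has to re-read the array */
}
```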
So, this chart shows a little data that we took to show the difference. The previous slide said that draw arrays is basically fancy immediate mode, and this slide really illustrates that. You can see the yellow and red lines on the chart are similar performance curves. The yellow line is just using immediate mode.
Even at the largest batch sizes, you can see that it performs similarly to the draw arrays call. The blue line is using static VBOs, so the data was able to be copied to the graphics card. GL was assured that the data wasn't going to be modified while it was in use, so it didn't have to make any copies and could read straight from its cached copy, which lets the performance really take off as the batch sizes get big.
So this is a simple piece of code, and I'm showing it to contrast with using flush buffer range; it's really just updating your VBO. This is the simple way: call map buffer, then modify your data. Map buffer returns a pointer to the data.
You use that pointer to update the data, and then you call unmap buffer when you're finished. After that, the pointer it returned becomes invalid, and any further writes through it are undefined: they might crash, or at the very least not get noticed by the graphics card.
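A sketch of that map/unmap pattern; the buffer name and the particular modification are just placeholders:

```c
#include <OpenGL/gl.h>

static void update_vbo(GLuint vbo)
{
    GLfloat *ptr;

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    ptr = (GLfloat *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (ptr != NULL) {
        ptr[0] = 1.0f;                   /* modify the data through the pointer */
        glUnmapBuffer(GL_ARRAY_BUFFER);  /* after this, ptr is invalid */
    }
}
```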
The flush buffer range extension, though, allows you to update a portion of the VBO data and tell GL exactly what changed. In the previous example with map and then unmap, GL is forced to copy your entire buffer when it needs to put it back on the graphics card for fast access. This extension relieves it of that responsibility, because your app is going to say exactly which specific pieces of the vertex data it changed. And the code for that looks like this.
So step one is a promise to OpenGL that you are going to manually flush your modified data. Then step two is just the same: map the buffer, and it returns a pointer. Step three is the same: modify the VBO. Step four is where you explicitly flush what you modified.
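A sketch of those four steps using the APPLE_flush_buffer_range entry points; the helper name, offsets, and sizes are illustrative, not the session's actual code:

```c
#include <string.h>
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static void update_vbo_range(GLuint vbo, GLintptr offset, GLsizeiptr size,
                             const void *newData)
{
    GLubyte *ptr;

    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* Step 1: promise GL that you will flush modified ranges yourself. */
    glBufferParameteriAPPLE(GL_ARRAY_BUFFER, GL_BUFFER_FLUSHING_UNMAP_APPLE, GL_FALSE);

    /* Step 2: map the buffer as usual; it returns a pointer. */
    ptr = (GLubyte *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);

    /* Step 3: modify only the bytes you care about. */
    memcpy(ptr + offset, newData, (size_t)size);

    /* Step 4: explicitly flush just the range you touched. */
    glFlushMappedBufferRangeAPPLE(GL_ARRAY_BUFFER, offset, size);

    /* The unmap no longer copies anything; the flush already happened. */
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
```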
So if you had a megabyte of vertex data and you changed, say, 256 bytes of it, and you just flush that, then OpenGL can absorb that change very quickly, transfer it to the graphics card for caching or whatever it's going to do with it, and then when you call unmap buffer, that doesn't do anything, because you already manually flushed what you changed. So what does that do for your performance? On this chart, the yellow line is the performance of map and unmap used with flush buffer range. We took this data updating from 0% of the data up to 90%; at 100%, you'd expect the two approaches to converge. And the red line, curiously, shows the performance of buffer sub data, which really shows that that's not a very optimal way to update your data.
So I have to mention at this point that this kind of performance tuning can be sensitive to many things: your data alignment, interleaving, data size. If you remember the code that we saw, it's easy enough to just try the different techniques in your code and see what gives the best performance for you. But in our experience, map and unmap used with flush buffer range pretty much always provides the best answer.
So vertex buffer objects let you capture, manipulate, and switch quickly between vertex array state. Pixel buffer objects accomplish the same thing for image data. They share a lot of API with vertex buffer objects: the same memory management hints with static, dynamic, and stream; the same calls to map and unmap to get access to the data; the same calls to specify the data. It's also possible to overlap the internal storage for a PBO and a VBO.
What that allows you to do is generate some data with one and consume it with the other. So you can generate image data, copy that to a PBO, and then reuse it as vertex data without incurring a round trip through the CPU. We have some sample code that shows that; I think we had a demo of this last year. It's interesting, and if you're interested, you should check that out.
So, pixel buffer objects provide a cross-platform way to implement some things that we've provided platform-specific ways to do before. Pixel buffer objects encompass some of the functionality of texture range and client storage. They're also a supported, cross-platform API for getting non-blocking read pixels behavior, which we'll see in a minute.
And you can use a pixel buffer object with any GL entry point that takes or returns an image: draw pixels, read pixels, any of the texture entry points, anything that deals with an image. So, about read pixels: this is how you normally call read pixels. The disadvantage is that when you call this, the graphics card has a queue of commands built up, and your app is running ahead of it.
So when you call read pixels, you force the whole queued-up command buffer stream to get consumed, and then the result gets read back into your pointer. Not only does it not return until that's finished, it also completely flushes out the graphics system, so the next time you want to render you're trying to get ahead of it all over again.
With pixel buffer objects, you can do this in a way that won't stop the command stream from getting executed. It won't force it to finish. The way you do this is you bind a pixel buffer object as the pixel pack buffer. Pack is for OpenGL giving you data.
Unpack is for OpenGL taking data from you. So here it's the pack buffer. Then you call read pixels. Instead of passing read pixels a pointer, when you're using a pixel buffer object you pass it an offset. We set the offset to zero, which says: write this to the beginning of my pixel buffer object.
And then the important part is that you have to go do other work. The read pixels command gets queued up, and you want to go off and do some other things. Then when you call map buffer, that call will return immediately if the command buffer got worked through and the read already came back; otherwise, it'll block there waiting for that to happen. When you're finished, you call unmap buffer, telling GL you're done with the pointer it returned to you. That pointer becomes undefined again, and you're done accessing the data it put in there.
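Putting those steps together, a sketch of the asynchronous readback pattern; the width, height, and pixel format are illustrative:

```c
#include <OpenGL/gl.h>

static void async_readback(GLuint pbo, GLsizei width, GLsizei height)
{
    const GLubyte *pixels;

    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, NULL, GL_STREAM_READ);

    /* An offset (0) instead of a pointer: write to the start of the PBO.
       This call is queued and returns without draining the pipeline. */
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV,
                 (GLvoid *)0);

    /* ... go do other useful work here while the readback completes ... */

    /* Map only blocks if the readback hasn't finished yet. */
    pixels = (const GLubyte *)glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (pixels != NULL) {
        /* use the pixel data */
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);   /* pointer is now invalid */
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}
```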
So just to kind of finish off buffer objects here: they let you do some things that we used to have OS X-specific extensions for. I mentioned using a pixel buffer object to do asynchronous read pixels; we used to suggest using copy tex image and get tex image to do that.
Pixel buffer objects are cross-platform and well supported. Apple vertex array range was an extension that we used to have that let you tell OpenGL things about your vertex array, like, okay, I want you to map this into GPU space, and then you had to manually flush when you changed it. Vertex buffer objects accomplish the same thing.
Apple pixel buffers were the API to do off-screen drawing, and framebuffer objects provide the same thing. We'll talk about that in a minute. But framebuffer objects have other advantages for sharing objects and things like that. And then the texture range extension is somewhat encompassed by pixel buffer objects.
So, if you haven't learned everything you want to know about this by now, the ARB specifications for these extensions are really good. They include pseudo sample code that is illustrative. The flush buffer range extension is posted on developer.apple.com, also, I believe, with some sample code in it to show you everything you want to know about that.
And the code I mentioned that does the overlapped PBO and VBO behavior is up on developer.apple.com. NVIDIA's developer website has a paper that dives into the details of the differences between stream draw, static draw, stream copy, and those different usage hints. So if you are interested in that topic, that's a good place to go for that information.
Okay, so buffer objects let you get some asynchronous behavior going: they allow the GPU to keep working, reduce the amount of data that gets copied from system memory to GPU memory, and reduce the number of stalls in the pipe. Now we're going to talk about another way to keep the graphics card humming along.
We titled this section data recycling. What we really mean by that is using the GPU to generate some data, capturing that data on the GPU, and then feeding it back into the system in subsequent rendering. Avoiding the round trips to the CPU enhances your performance and keeps the card busy.
So the first thing is framebuffer objects. So this is the encapsulation of a render target. And what that means is you can render to some data that stays on the GPU and then you can reuse that as source data, perhaps in a texture, in other rendering that follows. So one of the nice things about framebuffer objects over pbuffers is that there's no context switch overhead. So the FBOs all live in the same OpenGL context.
So you make one context, and all your objects (textures, programs, et cetera) are shared there without having to go through all the pain that I'm sure some of you have gone through to share data between multiple GL contexts. And as we mentioned, it's a cross-platform way to do off-screen drawing, and it's simple and easy to use as a texture.
And I'm gonna go through just a brief code sample on the slide here. This is the simple code you execute to create an FBO. As the drawable, you can attach either a renderbuffer or a texture. In this example, we're going to create a texture.
This is just the same as creating any other texture in GL, except we're specifying NULL as the pixel data, because the contents are going to come from rendering into the FBO. So we've created the texture, and then this call binds the texture as an attachment to the FBO. So drawing with this FBO as the drawable is going to get captured in the texture.
So now we're going to draw something into it; we're going to draw a duck. Any rendering that you do here goes into the current draw buffer, which is the FBO at this point. Now we're going to go back to the system drawable, the normal color buffer. Call bind framebuffer with zero; that takes you back to the default system drawable.
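Pulling that walkthrough together, a sketch using the EXT_framebuffer_object entry points; the texture size and format are illustrative:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static GLuint fbo, tex;

static void create_fbo_with_texture(void)
{
    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);

    /* Same as any other texture, but NULL pixel data: the contents will
       come from rendering into the FBO. */
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0,
                 GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, NULL);

    /* Attach the texture as the FBO's color buffer. */
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, tex, 0);

    /* ... draw the duck here; it lands in the texture ... */

    /* Back to the default system drawable. */
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
}
```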
And then we're going to bind the texture and use it to draw into the color buffer. So that's interesting; hopefully you reuse the result lots of times to save yourself some rendering time. Before we leave FBOs, I should mention that you can have many FBOs per context and switch between them at will.
Each FBO can also have depth and stencil attachments as well as color attachments, so it's a fully functioning drawable. Right now, I'm going to bring Alex Eddy up here, and he's going to take you through even more ways to capture intermediate rendering results and reuse them in hopefully an interesting way.
Okay, thanks, Kent. So continuing on with the theme of generating data on the GPU and recycling it, I want to talk in particular about two modern OpenGL extensions, Transform Feedback and Geometry Shaders. And I'll finish up with a little demo demonstrating how to combine those two things. So first up is Transform Feedback.
So this extension, ext transform feedback, is really like a modern version of the feedback mode that's been built into OpenGL since the very beginning. But what is feedback? What does that mean? Well, we're talking about recycling data. This is really a way to get at the intermediate results of the vertex transformation stage.
If you think about the pipeline, you're passing in vertex positions, for example, and you might multiply the modelview matrix by that position. Normally that result is sent on to be turned into a triangle and rasterized. But feedback is a way to take that intermediate result of the vertex transformation and get it back to the CPU.
The big difference here with this extension is that instead of sending the data to the CPU, you can write it directly into a buffer object. So once you have data in a buffer object, you can really use it any way you like. It's up to you, depending on how your data flow wants to be structured.
If you want to, you can cache that data and then reuse it multiple times. Typically, you're going to feed it into another shader for additional processing. Buffer objects are really flexible, and you have several different choices for how you feed that data into additional shaders at that point. You could treat the data in the buffer as vertex attributes and use it as a vertex buffer object.
Or you could use it as a pixel buffer object and call tex image using that buffer object as the data source, feeding the data in as pixels in a texture to a shader. There's also a new extension, EXT_bindable_uniform, which will let you pass that buffer object in as an arbitrary array of data in a uniform to another shader.
This extension also has a switch which allows you to optionally discard rasterization. So you can capture the data from transform feedback into a buffer and not see any result on the screen, or you can simultaneously capture it to a buffer and rasterize it as triangles on the screen. Depending, again, on what you want to do with your data flow, one or the other might be a better choice for you.
I also want to point out that we've been shipping this extension for a little while now in Leopard, but the specification was just recently posted online. So if you're trying to find it, it's now up there on opengl.org. You can go there and check out all the details.
So there are a lot of different ways to use this in an application, but to give an illustration of one concrete data flow, this is kind of the standard pipeline of how data moves through to the screen. You start out with some data in a buffer object here, let's say it's vertex positions again, and you feed that to your vertex shader.
You do some transformations, like multiplying by the modelview projection matrix, maybe you do some lighting, whatever you're going to do there. Normally those vertices come out and are assembled into primitives like lines or triangles. They go to the rasterizer, the resulting fragments get shaded by the fragment shader, and the results end up on the display.
Well, with transform feedback, now you can get a tap into that pipeline right after the vertex shader executes. You can write those results, all the varyings that are coming out of the vertex shader, directly into a buffer object. Like I was saying, you typically are going to pass this on to another shader. And you can see how you can now build up a loop in the pipeline, internally on the GPU.
And you can repeat that as many times as you like: transform some data with a shader, get it into a buffer object, transform it again in a different way, get it into another buffer object. Maybe you cache it and reuse it several times. So this is pretty flexible. Let me jump into some code here to show how to set it up and use it.
The first thing I'm doing here is defining a constant array of strings. These are the literal names of the varyings in your shader that you want to capture. You send that array into this new API, transform feedback varyings, along with a mode parameter. The mode parameter here is going to be either interleaved or separate.
In this example, I'm only capturing one varying, the position, so it doesn't really make any difference whether I use interleaved or separate. But if you're capturing multiple things, you have a choice of writing all those varyings in interleaved order into one buffer object, or, if it suits your data flow better, writing each individual varying into a separate buffer.
It's up to you. After you've told OpenGL which varyings you're interested in capturing, you need to link the program. Also, at that point, you would gen some buffer objects and be sure to call buffer data, allocating enough space for all of the vertices you're going to be capturing.
So a little later on at draw time, you would set up drawing like usual, binding a source buffer and setting the pointers for all the attributes that you're going to be using, and also bind this new transform feedback target with the destination buffer you're going to be writing into.
Once the source and destination pointers are all set up for OpenGL to draw, you bracket your drawing with these two new calls, beginTransformFeedback and endTransformFeedback. Everything that you draw in between will be processed by the shader, and the results will be written to the buffer object for you to use however you like. There are a couple more details here which you can look up in the specification, but that's really about it. It's pretty easy to set this up and use it. You don't need very much code at all.
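A sketch of that setup and draw sequence with the EXT_transform_feedback entry points; the program, buffer names, and vertex count are assumptions, not the session's actual code:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

/* At link time: name the varyings you want captured. */
static void setup_feedback(GLuint program)
{
    static const char *varyings[] = { "gl_Position" };
    glTransformFeedbackVaryingsEXT(program, 1, varyings,
                                   GL_INTERLEAVED_ATTRIBS_EXT);
    glLinkProgram(program);
}

/* At draw time: source attributes from one buffer, capture into another. */
static void draw_with_feedback(GLuint srcVBO, GLuint dstVBO, GLsizei numVerts)
{
    glBindBuffer(GL_ARRAY_BUFFER, srcVBO);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(4, GL_FLOAT, 0, (GLvoid *)0);

    /* Destination: the transform feedback buffer binding point. */
    glBindBufferBaseEXT(GL_TRANSFORM_FEEDBACK_BUFFER_EXT, 0, dstVBO);

    glBeginTransformFeedbackEXT(GL_POINTS);
    glDrawArrays(GL_POINTS, 0, numVerts);
    glEndTransformFeedbackEXT();
}
```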
So, moving on to an example of something you might be interested in capturing with transform feedback, we have geometry shaders. What is a geometry shader? This is actually a really interesting and kind of powerful new development in the hardware that OpenGL is exposing. It's a completely new stage in the pipeline that happens in between the vertex shader and the fragment shader. So for the first time now, you can process entire primitives, like a line segment or a triangle.
There are also some new adjacency modes which allow you to process the vertices around a line or around a triangle, which really enables much more complicated types of algorithms on the GPU. Once you have more than one vertex to work on, you can do things like calculate the area of a triangle or the face normal of a triangle, or you can start to look at the curvature of a line or the curvature of a 3D mesh. So inside the shader you have an array of inputs, and all the vertices are available for you to look at there.
A geometry shader also has the ability to dynamically create new vertices during execution. So inside of your shader, you have full flow control ability, so you can build up new primitives depending on any kind of logic that you want to write. And the output of the geometry shader is vertices, which are then assembled as usual into points or lines or triangle strips.
So this kind of generalizes to a one-in, many-out problem, which, as I said, is really a lot more powerful than the old pipeline where you're working on one vertex at a time. Diving in a little bit, we're going to cover some key points here. There are a couple of new functions added to the GLSL shading language, and the most important one is this emit vertex function.
To explain that, I'm going to draw a parallel with how vertex shaders work. In a vertex shader, you're setting all of the varyings that you care about; you might output the position and the color, maybe some texture coordinates. And when the vertex shader is done executing, a vertex pops out and that's it.
The geometry shader works the same way, except that you can explicitly create a vertex by calling this emit vertex function. After that point, you're free to then set a new position or a new color or a modified texture coordinate and emit another vertex, and so on and so on, building up as much as you like within some reasonable hardware limit.
Another key language difference here is that all of the outputs from the vertex shader are visible as arrays of inputs in the geometry shader. Because you're going to be working on multiple vertices, like the three vertices in a triangle, you access them with array notation. So if I wrote out positions in the vertex shader, those are visible as gl_PositionIn with this array notation, and I access elements 0, 1, and 2 to get all three vertices of a triangle.
The other new API here is program parameter, which you need in order to tell OpenGL what type of geometry shader this is. Since you're working on an array here, the geometry shader has to work on a specific type of primitive, and this is how you do that. You need to tell OpenGL what the input type is: are you going to be processing points, lines, triangles, or the adjacency modes? And you also tell OpenGL what the output type is, so that it knows how to assemble the vertices you're creating inside the geometry shader.
There's also this limit, max vertices out: you need to tell OpenGL the maximum number of vertices that you're going to be generating during one execution of the geometry shader. There is a hardware-specific limit to that number, but in general you want to keep it as low as possible, to allow as many hardware threads as possible to execute your shader simultaneously. After you've told OpenGL these parameters, you need to link your program. Then later on, at draw time, you draw like usual using a primitive mode that matches the input type of the geometry shader you've specified.
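A sketch of that configuration step with the EXT_geometry_shader4 entry points; the input/output types and max vertex count here are just example choices:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static void configure_geometry_shader(GLuint program)
{
    /* Input type must match the primitive mode you draw with ... */
    glProgramParameteriEXT(program, GL_GEOMETRY_INPUT_TYPE_EXT, GL_POINTS);
    /* ... output type tells GL how to assemble the emitted vertices ... */
    glProgramParameteriEXT(program, GL_GEOMETRY_OUTPUT_TYPE_EXT, GL_TRIANGLE_STRIP);
    /* ... and keep the maximum vertex count as low as you can get away with. */
    glProgramParameteriEXT(program, GL_GEOMETRY_VERTICES_OUT_EXT, 3);

    glLinkProgram(program);
}
```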
So let's look at a very simple shader as an example. This shader works on points coming in, and it converts every point to a triangle going out. You can see here that I'm taking the position that was written by the vertex shader, accessed through this array, gl_PositionIn. Since I'm working on points, I'm only looking at the zeroth element, and it's just adding a very simple, hard-coded offset in the x and y directions. So this is emitting the vertex three times, sort of splatting a triangle around that point.
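A sketch of what that point-to-triangle shader might look like in GLSL, assuming the EXT_geometry_shader4 built-ins (gl_PositionIn, EmitVertex, EndPrimitive); the offsets are arbitrary:

```glsl
#version 120
#extension GL_EXT_geometry_shader4 : enable

void main()
{
    vec4 p = gl_PositionIn[0];   // input type is points, so only element 0

    // Splat a small triangle around the incoming point.
    gl_Position = p + vec4(-0.01, -0.01, 0.0, 0.0); EmitVertex();
    gl_Position = p + vec4( 0.01, -0.01, 0.0, 0.0); EmitVertex();
    gl_Position = p + vec4( 0.00,  0.01, 0.0, 0.0); EmitVertex();
    EndPrimitive();
}
```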
Not very useful, but it's an illustration. You can check out the extension specification online for all the rest of the details. Now I want to spend a little time talking about some of the things you can do with this technology. Here's a simple example to start out with. I mentioned that you have access to the curvature of your mesh if you pass in data with adjacency. So it's possible to do tessellation on the GPU now: here's a very simple, paper-doll stick-figure mesh going into the shader.
And you can calculate a kind of bicubic spline running through all those points and then dynamically generate as many line segments as you feel like to make it look good. You also have flow control in the shader, so you can conditionally deform or move or warp specific line segments, whatever you feel like, depending on any logic you want.
So in this case, I'm growing hair only on the head of that model. Looking at a little more complicated example, it's also possible now to accelerate some algorithms that we've had for a while but that had to be executed on the CPU. A good example here is shadow volume extrusion, which you've probably seen in some games like Doom 3.
In this type of algorithm, you really need to know the topology of the model you're looking at in order to figure out something like the silhouette from the light's point of view. In the geometry shader you can figure that out, because you have access to the triangle and the adjacent triangles.
With any given triangle you can figure out the face normal, and if you take the dot product of the face normal with the direction to the light, you can tell whether that triangle is facing towards the light. If a triangle is facing towards the light and the adjacent triangle is facing away from it, then you know you're at a silhouette edge.
Once you find the silhouette, you can then dynamically generate, in the geometry shader, new triangles only for the triangles that have an edge on the silhouette, and project a volume away from the light source. You can write that into the stencil buffer and end up with nice self-shadowing characters, like in video games now. So there's really an infinite number of things you could do here; these are a couple of simple examples. To give you one more idea, combining this with transform feedback, I'm going to move over to the demo machine.
So I have here an everyday, ordinary wine glass. And I was working together with my coworker John Rososco on this demo, and we were trying to come up with a visually interesting way to combine a couple of these extensions. And John had the good idea of trying to simulate what happens to glass as you heat it up to melting point. So, in addition to the wine glass, I have a virtual blowtorch.
Which I can rub over the glass and start to heat it up. And it'll start to melt and deform. And it's always going to move down towards the bottom of the screen due to gravity. So if I heat it up a little bit, I can kind of rotate it around like this.
You can watch it kind of melt and deform and fold into itself, right? So if I'm really clever, I can kind of set up a little virtual glass blowing kit here, if I get the spin right on this thing. And I can, you know, heat up the rim maybe.
Over time, the heat will dissipate, and the stem now is kind of fused into a new, deformed position. And then you heat up the rim, flip it up like this, and watch it kind of melt on top of itself. Right? Pretty cool, right? So this is kind of neat looking, but how is this working? What's going on here? To break this down, let me turn off all the eye candy for a second.
What you can see here is that I'm starting out with a really simple, coarsely tessellated input mesh. There are basically two stages in this algorithm that make this work. The first stage deals with heating up the vertices and deforming them into a new shape. The second stage uses a geometry shader to tessellate that result into a higher-polygon version of itself.
So looking at the first stage really briefly, there's a vertex shader which, for every vertex in this mesh, figures out how close is the heat source. And if it's close enough, it'll start to inject heat into it. And there's a gravity vector which pulls all the hot vertices downwards.
What you're looking at here is a visualization of the heat at every vertex. The results of this shader execution are being captured with transform feedback into a buffer object. So what's going on is that each frame, the input mesh is fed into the shader and it's melted and deformed a little bit.
And the good thing about this is I have a reset button. And the deformed shape is captured back into a buffer. And then the next frame, we feed that buffer into the same shader, and we warp and deform it a little bit more. So we're basically tracking the current state of the mesh at all times in a buffer object, which can be kept in video memory on the GPU. So there's no round trip of data here. It's all fast and local.
This same shader also deals with recalculating the normals on every vertex in every frame. That's done by feeding that same buffer object in as a texture to the shader, using it as a pixel buffer object. I can do a couple of texture samples around the vertex to figure out the connectivity of the mesh there, calculate a bunch of face normals, and average those into a vertex normal.
So I need to do this in order to keep the normals accurate regardless of how much the mesh has been distorted. I need this for the next step, which is tessellation. So to explain tessellation a little bit, let me jump to a very simple example, just one triangle.
So every triangle has a face normal, which here are in purple. And you can calculate the average vertex normals like I just described, which here are in yellow. So once you have those vertex normals, if you imagine those interpolating across the face of this main triangle, you can kind of reconstruct an idealized curved surface that fits the curvature of this local area with the mesh.
And inside the geometry shader, you can then emit a bunch of triangles to try to approximate that ideal curved surface by displacing the triangle in the direction of the normal. And you can do that at some arbitrary level of detail as long as you stay within a reasonable hardware limit. So there are a couple of different ways to do this tessellation, but this particular version is called NPatches.
And ATI has a white paper written up about it if you're interested. It's up on the AMD website now. So that's tessellation on one triangle. You apply the same algorithm then to every triangle in this input mesh. And I start out with a pretty coarse, chunky-looking thing. Tessellation might take it from a couple hundred input vertices to tens of thousands of output vertices here.
So we've got a deformation shader, plus normal recalculation, plus a geometry shader doing tessellation here. Then you combine that with a fragment shader, which is here; it's using a cubic environment map to simulate reflection and refraction. And actually, the refraction is doing a little bit of a dispersive chromatic effect, so there's a separate ray for the red, green, and blue. I can play around with the index of refraction here if I want to. Combine all these things together and you get a kind of good-looking eye candy demo.
So what do you think? I'll just let this go for a little bit; it's fun to play around with. In general, I think the message here is that you're not really limited to the old fixed-function pipeline in any way anymore. GLSL and shaders have kind of blown all that away. And now, additionally, there's enough flexibility in the way you use the pipeline that you can build up really complicated multi-stage algorithms and keep them all on the GPU.
So generate some data, stick it in a buffer object, do a second pass on it, cache it, use it ten different times, ten different ways, whatever you feel like. You can implement really more advanced algorithms this way now. So that's about it. I just encourage you all to go check out the specifications, get the details, and really start playing around with the stuff yourself. So that's it. I'm going to hand it back to Kent now.
So now we go from cool spinning, melting wine glasses back to slides. That'll go well. Okay, so we're going to talk about some more ways to keep the GPU busy now. There are some new features of GLSL that let you implement more classes of algorithms on the GPU. I'm gonna talk about GPU shader 4 for a second, along with texture integer.
GPU shader 4 is an extension to GLSL that allows you to do full integer operations in a shader. You can do ANDs, shifts, NOTs, all that fun binary stuff. And the distinction between the texture integer extension and the regular integer data that you could already pass in OpenGL is that the data is left intact, with no conversions. If you pass one of the regular integer data types, that signals to OpenGL that you've given it color data where a value of zero is 0.0 and the maximum value is 1.0. But the texture integer extension removes that implicit conversion that would happen otherwise.
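A sketch of specifying an unnormalized integer texture with EXT_texture_integer; the size, format, and helper name are illustrative:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static void define_integer_texture(const GLuint *data)
{
    /* GL_RGBA32UI_EXT + GL_RGBA_INTEGER_EXT: the values are kept as raw
       unsigned integers; nothing gets scaled into the [0.0, 1.0] range. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32UI_EXT, 256, 256, 0,
                 GL_RGBA_INTEGER_EXT, GL_UNSIGNED_INT, data);
    /* In GLSL (EXT_gpu_shader4) this would be sampled with a usampler2D,
       so the full integer values come through untouched. */
}
```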
So,
[Transcript missing]
Another extension that we're just adding support for (it's going to be in Snow Leopard, but it also started to appear in 10.5.3) is framebuffer blit. This extension has two tricks, basically. It separates the draw and read framebuffers, so now you can have a separate framebuffer that all the read operations read from, while the draw operations take effect on the draw framebuffer.
It also adds the blit framebuffer function, which is a fast copy between framebuffer objects. This is kind of like a 2D blit API, a stretch blit or copy bits or whatever; I kind of think of it like copy bits. Anyway, not only can it copy straight across, it can also scale. It can also copy depth and stencil buffers, any of the attachments to the framebuffer object. And you can specify a filter; when you're copying depth and stencil, nearest is the only filter that applies.
But in this next example, if you're just copying color, you can apply a linear filter, and you'll get some nice linear filtering; you can see that interpolation on the source data. Also in this example it's inverting the data, so it's flipping it, I guess, left to right in this one.
So it scales it down and flips it. This might be an interesting way to address the problem with flipped data in OpenGL, where it reads data upside down from what you might expect. This might be something you could use to program around that.
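A sketch of that kind of scaled, flipped copy with EXT_framebuffer_blit; the FBO names and rectangle sizes are illustrative:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static void blit_scaled_and_flipped(GLuint srcFBO, GLuint dstFBO)
{
    glBindFramebufferEXT(GL_READ_FRAMEBUFFER_EXT, srcFBO);
    glBindFramebufferEXT(GL_DRAW_FRAMEBUFFER_EXT, dstFBO);

    glBlitFramebufferEXT(0, 512, 512, 0,      /* source rect, Y swapped = flip */
                         0, 0, 256, 256,      /* destination rect, half size  */
                         GL_COLOR_BUFFER_BIT, /* color only, so ...           */
                         GL_LINEAR);          /* ... a linear filter is legal */
}
```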
Framebuffer multisample is an extension that lets you create off-screen, multisampled FBOs. It's an extension to the renderbuffer storage API (the renderbuffer is the type of attachment we didn't show before; we showed a texture attachment), but this will create a multisampled FBO. It's basically just like the other renderbuffer storage API, except you pass in the number of samples that you want. This example passes in four. You can pass in any number the card supports, or you can pass in one, which is a special value that means give me any multisample format you support.
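A sketch of that call with EXT_framebuffer_multisample, assuming the destination FBO is already bound; the sample count and format are illustrative:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static GLuint create_msaa_color_attachment(GLsizei width, GLsizei height)
{
    GLuint rb;
    glGenRenderbuffersEXT(1, &rb);
    glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, rb);

    /* Just like glRenderbufferStorageEXT, plus the number of samples. */
    glRenderbufferStorageMultisampleEXT(GL_RENDERBUFFER_EXT, 4,
                                        GL_RGBA8, width, height);

    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                                 GL_RENDERBUFFER_EXT, rb);
    return rb;
}
```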
So there's one more API that was new in Leopard, and that's the object purgeable extension. I don't believe we talked about this last year, but this is an OS X-specific extension. Basically, what it allows you to do is promise GL that, for this object, say a texture, you are not going to require that GL retain the storage right now.
This is great if you want to really bombard the system with textures that you don't know whether you're going to use or not. Or you might create something and then want to allow GL to throw it away if it needs the space for something else.
Maybe you're working in a constrained memory space, or whatever reason you'd have to want to do that. In this case, if GL does decide to throw away the storage, it's still going to retain the state and the name of the texture, everything associated with it except the storage. This is probably going to be a little easier to follow if you just look at the code.
What this code does is create a texture. Maybe you're starting up a photo application, so it's going to chug through your library in the background and create textures, big textures. But instead of loading the system up with them, it's going to make them all purgeable as it loads them: object purgeable, with the volatile option. That's a promise from your application that before you use the texture, you're going to check to see whether you need to redefine the storage.
In step two, time has passed and you've decided that, lo and behold, you do want to use that object you made. So you call object unpurgeable with retained as the option. This is going to do two things. First, it's going to return a value that indicates whether it kept that object around or not, in this case a texture. Second, it's going to make this object unpurgeable again, so after you call this, GL is not going to purge this object anymore.
If it does not return retained, then you need to recreate the object; if it returns retained, you don't have to. Then after that, you draw. Later on, if you're finished with the object and you want to go back to the state where GL can throw it away if it needs to, you just call object purgeable again with the volatile option.
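A sketch of that purgeable/unpurgeable dance with APPLE_object_purgeable; the texture name and the reload helper are assumptions:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static void reload_texture(GLuint tex);   /* hypothetical: respecifies the image data */

/* After defining the texture, tell GL it may discard the storage. */
static void make_texture_purgeable(GLuint tex)
{
    glObjectPurgeableAPPLE(GL_TEXTURE, tex, GL_VOLATILE_APPLE);
}

/* Before drawing with it, pin it down and check whether the data survived. */
static void use_purgeable_texture(GLuint tex)
{
    if (glObjectUnpurgeableAPPLE(GL_TEXTURE, tex, GL_RETAINED_APPLE)
            != GL_RETAINED_APPLE) {
        reload_texture(tex);              /* storage was purged; recreate it */
    }

    /* ... bind and draw with the texture ... */

    /* Done for now: let GL purge it again if it needs the space. */
    glObjectPurgeableAPPLE(GL_TEXTURE, tex, GL_VOLATILE_APPLE);
}
```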
So, that brings us to using OpenGL with OpenCL. We've talked about asynchronous behavior using vertex buffer objects and pixel buffer objects. We talked about generating new data with geometry shaders. We talked about using transform feedback and framebuffer objects to capture intermediate results. And this is another way we have to generate some data in the system. Since OpenGL and OpenCL are running on the same graphics card, one of the major advantages we have is that the storage objects can be shared.
So you can share OpenGL vertex buffer objects with OpenCL arrays. I didn't realize when I was preparing this that this session is actually before the OpenCL sessions, so I'll try to take that into account here. An OpenCL array is a 1D piece of memory to OpenCL. Textures and renderbuffers can be images, and that's kind of self-explanatory: an image is a 2D or 3D array in OpenCL.
All the commands that deal with using GL and CL objects together are included in the OpenCL framework, in that file, OpenCL_GL.h. And this is just a small piece of code that creates a CL context that shares objects with a pre-existing GL context. You might have gotten the CGL context from calling CGL get current context, or maybe you saved it earlier. So you call CL create device group from CGL context, and that returns a group of cards, which will be the same group that you created your GL context on. Then you pass that in to CL create context, and you've got it set up to share objects.
So this is kind of how that would work. You make a buffer object, a vertex buffer object, just with the standard GL API there: gen buffers, bind buffer, and buffer data. To make the object sharing work well between GL and CL, we're going to use a static draw VBO.
So we're going to give it the static draw hint there. Then in step two, we're creating a CL array. The important thing there is in orange, the mem alloc reference flag, which tells CL that the storage for this object is going to come from another object instead of having to allocate its own.
Then when we're ready to use them together, we attach the array and the VBO to each other: CL attach GL buffer, with the compute context, the array, and the VBO. They then internally get references to the same storage object on the card. And those last two calls, which I'll go over with some code in a second on the demo machine, are how you execute a CL kernel.
And then sometime later, before you dispose of everything (you don't have to do it right after you execute the kernel), you detach the array from the VBO, and then you can use the storage object on its own again. So, I have some code over here on the demo machine that illustrates this.
Wow, that's neat. Let's open it like this. Okay. So this is a short program, just maybe 100 or 150 lines. There are two parts I wanted to point out. This is more or less the same piece of code that we just looked at on the slide, but this is how simple it is to attach a VBO to a CL array. It uses the same flags, the static draw VBO and the mem alloc reference CL array that we saw before, and then we attach them right away.
And then the other thing that's interesting here, and I guess since we haven't looked at any CL code before, probably, this is the routine in this program that executes the CL kernel. So, you know, you set up an array of sizes of the data, and then you set the values for that data in the values array, and then you call setKernelArgs. And real quickly, I'll show you the CL kernel itself for this example.
You can see here that the kernel args end up being the arguments to the CL kernel that it executes. That's how you pass data into the OpenCL function; it's kind of analogous, to me, to passing uniforms into a shader. So you set the kernel args, and then you execute the kernel, and the results of that get written into the VBO.
And then in the display routine here, we're just calling updateMesh and then calling draw arrays to get the data out. The result of that is this neat little geometric point-cloud thing. Every frame, the CL kernel runs and transforms the data from the previous state to the next state, and when it's finished, we're just calling draw arrays on the bound VBO and getting the point results out.
Can we switch back to slides, please? For any more information about CL or anything we talked about, Alan Schaffer is our evangelist. His email is actually not [email protected]. Maybe it is. I don't think it is. If you seriously need to contact Alan, you can talk to me afterwards and I'll get you his correct address. There's also the Mac OpenGL mailing list; myself and most of my engineers are pretty active on that list. At least we do, I think, read every message that comes across, even though we might not have an immediate answer.
We do at least see it, and we'll answer if we can, if we know the answer; it's not that we won't, we just may not know. And then there's documentation on the developer.apple.com website. We have a pretty good OpenGL on OS X programming guide that explains some of these offscreen rendering techniques as well as a lot of platform things for working on OS X. There's extension documentation up there for most of the extensions that we support and all of the Apple-specific extensions that we have. And there's also sample code up there, the sample code we mentioned as well as some others.