Configure player

WWDC Index does not host video files

If you have access to video files, you can configure a URL pattern to be used in a video player.

URL pattern

Use any of these variables in your URL pattern. The pattern is stored in your browser's local storage.

$id
ID of session: wwdc2008-719
$eventId
ID of event: wwdc2008
$eventContentId
ID of session without event part: 719
$eventShortId
Shortened ID of event: wwdc08
$year
Year of session: 2008
$extension
Extension of original filename: m4v
$filenameAlmostEvery
Filename from "(Almost) Every..." gist: [2008] [Session 719] Advances in...

WWDC08 • Session 719

Advances in Modern OpenGL

Media • 59:38

Understand how advances in OpenGL unlock the rendering power of the GPU. Tackle GPU-based vertex and fragment processing with the OpenGL Shading Language (GLSL) and use the most current capabilities of OpenGL to modernize your code. Learn techniques for integrating the high-performance 3D graphics pipeline with the other graphics frameworks on Mac OS X. A must-attend session for Mac OpenGL developers to learn how to take advantage of the recent innovations in graphics hardware.

Speakers: Kent Miller, Alex Eddy

Unlisted on Apple Developer site

Downloads from Apple

SD Video (729.1 MB)

Transcript

This transcript was generated using Whisper; it may contain transcription errors.

So good afternoon. Thanks for coming to Advances in Modern OpenGL. My name is Kent Miller, and let's just dive right in. As a way to kind of frame our discussion today, let's spend a second talking about past OpenGL. So past OpenGL, you know, early '90s, kind of looked like this: immediate mode, multiple calls per vertex, and it allowed you to do neat things like lit, shaded whales.

At that point, a lot of the OpenGL pipeline was implemented in software, and a lot of the real-time bottleneck was spent doing computation on the incoming data: vertex transformation, lighting, all that kind of stuff. Contrast that with today, where all that computation is more or less free, and the way applications use OpenGL is to just bombard the card with data. Big textures, lots of them, lots of geometry. Sometimes the textures and geometry change per frame. So the bottleneck has really moved to becoming a data management problem: getting the best performance out of the graphics system.

So we just talked about immediate mode, but today, a well-performing application will use vertex buffer objects and vertex arrays, with just a few function calls to specify the vertex data. And instead of fixed-function OpenGL, modern apps are using shaders. On all the current graphics hardware, when you use fixed-function OpenGL, it just internally makes a shader out of your state anyway. So a lot of times it's more straightforward, and executes quicker on the graphics card, to just make your own shader that does exactly what you want it to do.

Fixed-function OpenGL also required many function calls to change state. So instead of doing that, a modern application will use buffer objects and all the other types of objects in the system that allow you to change state in batches. And programs allow you to switch from one complete transform state to another, instead of making multiple calls to change the fixed-function state.

Instead of blocking calls for pixel data, so to specify or get something back from the graphics card, modern applications use pixel buffer objects to do that asynchronously. We'll talk about that a little bit. Then, you know, an old way to do off-screen drawing was to use pbuffers, which were platform-specific and worked differently on OS X than it worked on, you know, other platforms. But today, OpenGL provides framebuffer objects, which allow you to do a lot of the same things in a cross-platform way.

So in this session, we're going to talk about keeping the graphics card busy, and a couple of different ways to do that. We'll cover the buffer object API, which allows you to manage your memory in the graphics system and provide updates to your data in a way that doesn't block the drawing pipeline. We're going to talk about how to generate data using the graphics card, capture intermediate results, and feed them back into the system.

And then we're gonna talk about some more new OpenGL APIs that allow you to manipulate the intermediate results on the graphics card without any round trips to the CPU. And then I'm gonna show a little example of using OpenGL with OpenCL, using OpenCL compute kernels to generate some data, and then using OpenGL to visualize it.

So here we go with buffer objects. First, let's talk about vertex buffer objects. Vertex buffer objects encapsulate vertex array state. And they allow you to do two things that you really can't do any other way. One is that they allow you to tell OpenGL when you're going to change the data. If OpenGL doesn't know when the data in your vertex array is going to be modified, it really has no choice but to use it in place every time.

And that's not good for performance. There are also memory management policy APIs that allow you to hint to OpenGL how you're going to use that vertex buffer object, so it can store it in an optimal place. It also allows you to update your data in a non-blocking way.

So think about the computer; here's a simple diagram of one. There's a bus from the CPU through the system controller to the RAM, from the CPU to the GPU, and from the GPU to GPU memory. But the GPU can't directly access the RAM; it has to go through the system controller. So that leads to a trade-off depending on what you want to do with your data. If you store the data in system RAM, it's great for your program to access: it can quickly manipulate anything it wants there. But the drawback is that the GPU doesn't have direct access to it. It has to go through the system controller and either directly access the data in small chunks or make a copy up to the GPU to use that data. On the other hand, if you store the data on the graphics card, that's great for the GPU. But if you want to modify it with the CPU, it takes more time: you either have to update it and copy it, or do slower writes through to the GPU memory.

So the buffer object API allows you to hint to GL where you want to store the data; in this case, we're talking about vertex buffer objects. If you use the static draw hint, that tells GL that you're really intending to specify this data once and use it in place. You're not planning to dynamically change the data and re-upload it. That allows GL to put the data in the optimum place, probably directly in graphics card memory. The stream draw and dynamic draw hints tell GL that you're going to be manipulating this data, that sometimes you want to change it. The buffer object spec itself says that stream draw is to be used when you're going to modify the data every frame: draw, then modify, then draw, then modify. And dynamic draw is supposed to mean that you're going to modify the data only occasionally: modify, then draw several times, then modify again, followed by several more draws. There are also other hints for read and copy behavior, and those are all outlined in the spec and some documentation that I'll mention later.
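The hints described above are passed as the last argument to the buffer data call. A minimal sketch, assuming a current GL context and `<OpenGL/gl.h>`; `positions` and `numBytes` are hypothetical names:

```c
/* Sketch: creating a VBO with a usage hint.
   Assumes a current OpenGL context; positions/numBytes are placeholders. */
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);

/* GL_STATIC_DRAW: specify once, draw many times.
   GL_STREAM_DRAW: modify every frame.  GL_DYNAMIC_DRAW: modify occasionally. */
glBufferData(GL_ARRAY_BUFFER, numBytes, positions, GL_STATIC_DRAW);
```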

If you want to update the data in a vertex buffer object, there are really three ways to do it. We're going to go over some code for this, so we'll spend a second here. You can use the map and unmap buffer pair together. Buffer sub data allows you, with one call, to pass a small amount of data into the system, and it will update your buffer in place and do all that for you. And then Apple has an OS X-specific extension, flush buffer range, which allows you to take responsibility for telling GL which part of your vertex data you modified. That allows it to copy just the little bits you changed and send those to the graphics card, as opposed to the entire vertex buffer object. And it can have a good effect on performance, which we'll show in a second.

To think about why you have to use a vertex buffer object at all: the OpenGL standard says that with vertex array usage, the same vertex arrays that have been in OpenGL forever, you can change the data at any point and GL is expected to pick up the change. So in this little piece of code, you call draw arrays on a set of data you defined, and then your program goes off and modifies it. And when you call draw arrays again, OpenGL is expected to pick up your modifications. What that leads to is that OpenGL really has to read your data every time it needs access to any particular piece of it. You know, the X vertex for this point? It's got to read it out of memory again.

So this chart shows a little data that we took to illustrate the difference. The previous slide said that draw arrays is fancy immediate mode, and this slide really illustrates that. You can see the yellow and red lines on the chart are similar performance curves. The yellow line is just using immediate mode.

So even at the largest batch sizes, you can see that it behaves similarly in performance to the draw arrays call. The blue line is using static VBOs, so the data was able to be copied to the graphics card. GL was assured that the data wasn't going to be modified while in use, so it was able to avoid making any copies and just read straight from its cached copy, which lets the performance really have some legs as the batch sizes get big.

So this is a simple piece of code, and I'm showing it to contrast with using flush buffer range; it's really just updating your VBO. This is the simple way to call map buffer. Map buffer returns a pointer to the data, so you use that pointer to update the data, and then you call unmap buffer when you're finished. After that, the pointer that was returned to you becomes invalid, and any changes through it are undefined: they might crash, or at the very least not get noticed by the graphics card.
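The map/modify/unmap pattern described above might look like this sketch (the `updateVertices` helper and `vbo` handle are hypothetical):

```c
/* Sketch: updating VBO contents via map/unmap.
   Assumes a current GL context and an existing buffer object "vbo". */
glBindBuffer(GL_ARRAY_BUFFER, vbo);
GLfloat *ptr = (GLfloat *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (ptr) {
    updateVertices(ptr);             /* modify the data in place (placeholder) */
    glUnmapBuffer(GL_ARRAY_BUFFER);  /* ptr is invalid after this call */
}
```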

The flush buffer range extension, though, allows you to update a portion of the VBO data and tell GL what changed. In the previous example with map and unmap, GL is forced to copy your entire buffer when it needs to put it back on the graphics card for fast access. This extension relieves it of that responsibility, because your app is going to tell it exactly which specific pieces of the vertex data changed. And the code for that looks like this. Step one is a promise to OpenGL that you are going to manually flush your modified data. Step two is just the same: map the buffer, and it returns a pointer. Step three is the same: modify the VBO.

Step four is where you explicitly flush what you modified. So if you had a megabyte of vertex data and you changed, say, 256 bytes of it and just flushed that, then OpenGL can absorb that change very quickly: transfer it to the graphics card for caching or whatever it's going to do with it. And then when you call unmap buffer, it doesn't have to do anything, because you already manually flushed what you changed. So what does that do for your performance? In this chart, the yellow line is the performance of map buffer/unmap buffer without flush buffer range, and the blue line is with flush buffer range. We took this data from updating 0% of the data up to 90%; at 100%, you'd expect the two to converge. And the red line, curiously, shows the performance of buffer sub data, which really shows that that's not a very optimal way to update your data.
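The five steps above might be sketched like this with the APPLE_flush_buffer_range extension (assuming the extension is available; `vbo`, `offset`, and `newData` are placeholders):

```c
/* Sketch: partial VBO update with APPLE_flush_buffer_range. */
glBindBuffer(GL_ARRAY_BUFFER, vbo);

/* 1. Promise to flush modified ranges manually. */
glBufferParameteriAPPLE(GL_ARRAY_BUFFER, GL_BUFFER_FLUSHING_UNMAP_APPLE, GL_FALSE);

/* 2. Map as before. */
GLubyte *ptr = (GLubyte *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);

/* 3. Modify just part of the data. */
memcpy(ptr + offset, newData, 256);

/* 4. Tell GL exactly which range changed. */
glFlushMappedBufferRangeAPPLE(GL_ARRAY_BUFFER, offset, 256);

/* 5. Unmap; nothing extra is copied since we already flushed. */
glUnmapBuffer(GL_ARRAY_BUFFER);
```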

So I have to mention at this point that this kind of performance tuning can be sensitive to many things: your data alignment, interleaving, data size. So if you remember the code that we saw, it's easy enough to just try the different techniques inside your code to see which gives the best performance for you. But in our experience, map and unmap used with flush buffer range pretty much always provides the best answer.

So vertex buffer objects allow you to capture, manipulate, and switch quickly between vertex array state. Pixel buffer objects accomplish the same thing for image data. They share a lot of API with vertex buffer objects, using the same memory management hints: static, dynamic, and stream.

The same calls to map and unmap to get access to the data, and the same calls to specify the data. It's also possible to overlap the internal storage for a PBO and a VBO. What that allows you to do is generate some data with one and consume it with the other. So you can generate image data, capture it in a PBO, and then reuse it as vertex data without incurring a round trip through the CPU. We have some sample code that shows that; I think we had a demo of it last year. It's interesting, and if you're interested, you should check that out.
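The overlap described above, one buffer object bound as a PBO for capture and later as a VBO for drawing, might be sketched like this (sizes and the fixed-function vertex pointer usage are illustrative assumptions):

```c
/* Sketch: one buffer object used as both PBO and VBO, so rendered
   pixels become vertex data without a CPU round trip. */
GLuint buf;
glGenBuffers(1, &buf);
glBindBuffer(GL_PIXEL_PACK_BUFFER, buf);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4 * sizeof(GLfloat),
             NULL, GL_STREAM_COPY);

/* Capture rendered RGBA float data into the buffer (offset 0)... */
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, (GLvoid *)0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

/* ...then rebind the same storage as vertex data. */
glBindBuffer(GL_ARRAY_BUFFER, buf);
glVertexPointer(4, GL_FLOAT, 0, (GLvoid *)0);
```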

So, pixel buffer objects provide a cross-platform way to implement some things that we've previously provided platform-specific ways to do. Pixel buffer objects encompass some of the functionality of texture range and client storage. They're also a supported cross-platform API for getting non-blocking read pixels behavior, which we'll see in a minute. And you can use a pixel buffer object with any GL entry point that takes an image: draw pixels, read pixels, any of the texture entry points, anything that takes an image. So, on to the read pixels case; this is how you call read pixels.

This has a disadvantage: the graphics card has queued-up commands, so it's working behind while your app runs ahead of it. When you call read pixels, you force the whole queued-up command buffer stream to get consumed, and then the result to get read back into your pointer. So not only will it not return until that's finished, it also completely flushes out the graphics system, and the next time you want to render, you have to start over trying to get ahead of it.

With pixel buffer objects, you can do this in a way that won't stop the command stream from getting executed; it won't force it to finish. The way you do this is you bind a pixel buffer object as the pixel pack buffer. Pack is for OpenGL giving you data; unpack is for OpenGL taking data from you. So: pack buffer. And then you call read pixels. When you're using a pixel buffer object, you pass an offset instead of a pointer. We said offset zero, meaning write this to the beginning of my pixel buffer object. And then the important part of this is that you go do other work. The read pixels command gets queued up, and then you want to go off and do some other things.

And then when you call map buffer, this is the call that will either return immediately, if the command buffer got cleared out and the read came back, or otherwise block, waiting for that to happen. When you're finished, you call unmap buffer, telling GL you're done with the pointer it returned to you; the pointer goes back to being undefined, and you're done accessing the data it put in there.
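The non-blocking readback flow just described might look like this sketch (`pbo`, `doOtherWork`, and `usePixels` are hypothetical; the BGRA format choice is an assumption):

```c
/* Sketch: asynchronous ReadPixels through a pixel pack buffer. */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);

/* Offset 0 into the PBO instead of a client pointer; returns immediately. */
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV,
             (GLvoid *)0);

doOtherWork();   /* let the GPU drain the queued command stream */

/* Blocks only if the readback hasn't completed yet. */
GLubyte *pixels = (GLubyte *)glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
usePixels(pixels);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);   /* pointer becomes undefined again */
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```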

So just to finish off buffer objects: they allow you to do some things we used to have OS X-specific extensions for. I mentioned the pixel buffer object for asynchronous read pixels; we used to suggest using copy tex image and get tex image to do that. Pixel buffer objects are cross-platform and well supported. Apple vertex array range was an extension we used to have that let you specify some things about your vertex array to OpenGL, like "I want you to map this into the GPU space," and then you had to manually flush when you changed it. Vertex buffer objects accomplish the same thing. Apple pixel buffers were the API for off-screen drawing, and framebuffer objects provide the same thing; we'll talk about that in a minute. Framebuffer objects have other advantages, for sharing objects and things like that. And the texture range extension is somewhat encompassed by pixel buffer objects. If you haven't learned everything you want to know about this by now, the ARB specifications for these extensions are really good. They include pseudo-sample code that is illustrative.

The flush buffer range extension is posted on developer.apple.com also, I believe, with some sample code in it to show you, you know, everything you want to know about that. And the code I mentioned to do the overlapped PBO and VBO behavior is up on developer.apple.com. And then NVIDIA's developer website has a paper that kind of dives into the details of the differences between stream draw, static draw, and, you know, stream copy and those different things. So if you are interested in that topic, that's a good place to go for that information. Thank you.

Okay, so buffer objects allow you to get some asynchronous behavior going: they allow the GPU to keep working, reduce the amount of data that gets copied from system memory to GPU memory, and reduce the number of stalls in the pipe. So we're going to talk about another way to keep the graphics card humming along. We titled this section data recycling, but what we really mean is using the GPU to generate some data, capturing that data on the GPU, and then feeding it back into the system in subsequent rendering. Avoiding the round trips to the CPU enhances your performance and keeps the card busy.

So the first thing is framebuffer objects. This is the encapsulation of a render target. What that means is you can render to some data that stays on the GPU, and then you can reuse that as source data, perhaps in a texture, in other rendering that follows. One of the nice things about framebuffer objects over pbuffers is that there's no context switch overhead. The FBOs all live in the same OpenGL context, so you make one context, and all your objects, textures, programs, et cetera, are shared there, without having to go through all the pain that I'm sure some of you have had sharing data between multiple GL contexts. And as we mentioned, it's a cross-platform API for off-screen drawing, and it's simple and easy to use as a texture.

And I'm going to go through just a brief code sample on the slide here. This is the simple code you execute to create an FBO. As the drawable, you can attach a renderbuffer or a texture; in this example, we're going to create a texture. This is just the same as creating any other texture in GL, except we're specifying null as the storage for the texture, because it's going to use the FBO as the storage. So we created the texture, and then this call binds the texture as an attachment to the FBO. So drawing to this FBO as the drawable is going to get captured in the texture.

So now we're going to draw something into it. We're going to draw a duck. Any rendering that you do here will go into the current draw buffer, which is the FBO at this point. But now we're going to go back to the system drawable, the normal color buffer. Calling bind framebuffer with zero takes you back to the default system drawable.

And then we're gonna bind the texture and use it to draw into the color buffer. So that's interesting; hopefully you can reuse the results lots of times to save yourself some rendering time. Before we leave FBOs, I should mention that you can have any number of FBOs per context and switch between them at will, and each FBO can have depth and stencil attachments as well as color attachments, so it's a fully functioning drawable. So right now, I'm going to bring Alex Eddy up here, and he's going to take you through even more ways to capture intermediate rendering results and reuse them in hopefully an interesting way.
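The render-to-texture flow walked through above might be sketched with the EXT_framebuffer_object entry points of that era (the texture size and the `drawDuck`/`drawTexturedQuad` helpers are hypothetical):

```c
/* Sketch: FBO render-to-texture, EXT_framebuffer_object names. */
GLuint fbo, tex;
glGenFramebuffersEXT(1, &fbo);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);

/* Texture with NULL storage: the FBO rendering supplies the pixels. */
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0,
             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, NULL);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                          GL_TEXTURE_2D, tex, 0);

drawDuck();                                   /* rendering lands in tex */

glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);  /* back to the system drawable */
glBindTexture(GL_TEXTURE_2D, tex);            /* reuse the result as a texture */
drawTexturedQuad();
```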

Okay, thanks, Kent. So continuing on with the theme of generating data on the GPU and recycling it, I want to talk in particular about two modern OpenGL extensions, transform feedback and geometry shaders. And I'll finish up with a little demo demonstrating how to combine those two things. So first up is transform feedback.

So this extension, EXT_transform_feedback, is really a modern version of the feedback mode that's been built into OpenGL since the very beginning. But what is feedback? What does that mean? Well, we're talking about recycling data, and this is really a way to get at the intermediate results of the vertex transformation stage. If you think about the pipeline, you're passing in vertex positions, for example, and you might multiply the model view matrix times that position. Normally, that result is sent on to be turned into a triangle and rasterized, but feedback is a way to get that intermediate result of the vertex transformation back to the CPU. The big difference with this extension is that instead of sending the data to the CPU, you can write it directly into a buffer object. And once you have data in a buffer object, you can really use it any way you like, depending on how your data flow wants to be structured.

If you want to, you can cache that data and then reuse it multiple times. And typically, you're going to feed it into another shader for additional processing. Buffer objects are really flexible, and you have several different choices for how you feed data into additional shaders at that point. You could treat the data in the buffer as vertex attributes and use it as a vertex buffer object. Or you could use it as a pixel buffer object and call tex image using that buffer object as the data source, feeding the data in as pixels in a texture to a shader. There's also a new extension, EXT_bindable_uniform, which lets you pass that buffer object in as an arbitrary array of data in a uniform to a new shader.

This extension also has a switch that allows you to optionally discard rasterization. So instead of capturing the data from transform feedback into a buffer and not seeing any result on the screen, you can simultaneously capture it into a buffer and rasterize it as triangles on the screen. Depending, again, on your data flow, one or the other might be the better choice for you.

I also want to point out that we've been shipping this extension for a little while now in Leopard, but the specification was just recently posted online. So if you're trying to find it, it's now up there on opengl.org. You can go there and check out all the details.

So there are a lot of different ways to use this in an application, but to give an illustration of one concrete data flow: this is kind of the standard pipeline of how data moves through to the screen. You start out with some data in a buffer object; let's say it's vertex positions again. You feed that to your vertex shader and do some transformations, like multiplying by the model view projection matrix, maybe some lighting, whatever you're going to do there. Normally those vertices come out and are assembled into primitives like lines or triangles; they go to the rasterizer, get rasterized by the fragment shader, and the results end up on the display.

Well, with transform feedback, you can now tap into that pipeline right after the vertex shader executes. It can write those results, all the varyings coming out of the vertex shader, directly into a buffer object. And as I was saying, you're typically going to pass this on to another shader. You can see how you can now build up a loop in the pipeline, internally on the GPU, and repeat it as many times as you like: transform some data with a shader, get it into a buffer object, transform it again in a different way, get it into another buffer object. Maybe you cache it and reuse it several times. So this is pretty flexible. Let me jump into some code to show how to set it up and use it.

The first thing I'm doing here is defining a constant array of strings. These are the literal names of the varyings in your GLSL shader that you want to capture. You send that array into this new API, transform feedback varyings, along with a mode parameter. The mode parameter is either interleaved or separate. In this example, I'm only capturing one varying, the position, so it doesn't really make any difference whether I use interleaved or separate. But if you're capturing multiple things, you have a choice of writing all those varyings in interleaved order into one buffer object, or, if it suits your data flow better, writing each individual varying into a separate buffer. It's up to you. After you've told OpenGL which varyings you're interested in capturing, you need to link the program. Also, at that point, you would gen some buffer objects, and be sure to call buffer data, allocating enough space to hold all of the vertices you're going to be capturing.

So a little later on, at draw time, you set up drawing like usual, binding a source buffer and setting the pointers for all the attributes you're going to be using, and you also bind this new transform feedback target with the destination buffer you're going to be writing into. Once the source and destination pointers are all set up for OpenGL to draw, you bracket your drawing with these two new calls, begin transform feedback and end transform feedback. Everything you draw in between will be processed by the shader, and the results will be written into the buffer object for you to use however you like. There are a couple more details here, which you can look up in the specification, but that's really about it. It's pretty easy to set this up and use it; you don't need very much code at all.
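The setup and draw-time steps above might be sketched like this with the EXT_transform_feedback entry points (`prog`, `srcVBO`, `dstVBO`, and `numVerts` are hypothetical):

```c
/* Sketch: capturing vertex shader output with EXT_transform_feedback. */
static const char *varyings[] = { "gl_Position" };
glTransformFeedbackVaryingsEXT(prog, 1, varyings, GL_INTERLEAVED_ATTRIBS_EXT);
glLinkProgram(prog);          /* must (re)link after declaring varyings */

/* Destination buffer, pre-allocated large enough for every captured vertex. */
glBindBufferBaseEXT(GL_TRANSFORM_FEEDBACK_BUFFER_EXT, 0, dstVBO);

/* Draw as usual, bracketed by begin/end transform feedback. */
glUseProgram(prog);
glBindBuffer(GL_ARRAY_BUFFER, srcVBO);
glVertexPointer(4, GL_FLOAT, 0, (GLvoid *)0);
glBeginTransformFeedbackEXT(GL_POINTS);
glDrawArrays(GL_POINTS, 0, numVerts);
glEndTransformFeedbackEXT();
```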

So moving on to an example of something you might be interested in capturing with transform feedback: geometry shaders. What is a geometry shader? This is actually a really interesting and kind of powerful new development in the hardware that OpenGL is exposing. It's a completely new stage in the pipeline that happens in between the vertex shader and the fragment shader. So for the first time, you can process entire primitives, like a line segment or a triangle. There are also some new adjacency modes, which allow you to process the vertices around a line or around a triangle, and that really enables much more complicated types of algorithms on the GPU. Once you have more than one vertex to work on, you can do things like calculate the area of a triangle or the face normal of a triangle, or you can start to look at the curvature of a line or the curvature of a 3D mesh. Inside the shader, you have an array of inputs, and all the vertices are available for you to look at there.

A geometry shader also has the ability to dynamically create new vertices during execution. So inside of your shader, you have full flow control ability, so you can build up new primitives depending on any kind of logic that you want to write. And the output of the geometry shader is vertices, which are then assembled, as usual, into points or lines or triangle strips.

So this kind of generalizes to a one-in-n-out problem, which, as I said, is really a lot more powerful than the old pipeline, where you're working on one vertex at a time. So diving in a little bit, we're going to cover some key points here. There are a couple of new functions added to the GLSL shading language.

The most important one is this emit vertex function. To explain it, I'm going to draw a parallel with how vertex shaders work. In a vertex shader, you set all of the varyings that you care about; you might output the position and the color, maybe some texture coordinates. When the vertex shader is done executing, a vertex pops out, and it's done.

The geometry shader works the same way, except that you explicitly create a vertex by calling this emit vertex function. After that point, you're free to set a new position, or a new color, or a modified texture coordinate, and emit another vertex, and so on and so on, building up as much as you like, within some reasonable hardware limit.

So another key language difference is that all of the outputs from the vertex shader are visible as arrays of inputs in the geometry shader. Because you're going to be working on multiple vertices, like the three vertices in a triangle, you access them with this array notation. So if I output positions in the vertex shader, those are visible as position-in with this array notation, and I access the zero, one, and two elements to get all three vertices of a triangle.

The other new API here is program parameter, which you need in order to tell OpenGL what type of geometry shader this is. Since you're working on an array, the geometry shader has to work on a specific type of primitive, and this is how you specify that. You need to tell OpenGL the input type: are you going to be processing points, lines, triangles, or the adjacency modes?

You also tell OpenGL the output type, so that it knows how to assemble the vertices you create inside the geometry shader. And there's this limit, max vertices out: you need to tell OpenGL the maximum number of vertices you're going to generate during one execution of the geometry shader.

There is a hardware-specific limit to that number, but in general you want to keep it as low as possible, to allow as many hardware threads as possible to execute your shader simultaneously. After you've set these parameters, you need to link your program. Then later on, at draw time, you draw as usual, using a primitive mode that matches the input type of the geometry shader you've specified.
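Those program parameters might be set like this with EXT_geometry_shader4, assuming `prog` already has vertex, geometry, and fragment shaders attached (a points-in, triangle-strip-out configuration, matching the example that follows):

```c
/* Sketch: configuring a geometry shader with EXT_geometry_shader4. */
glProgramParameteriEXT(prog, GL_GEOMETRY_INPUT_TYPE_EXT, GL_POINTS);
glProgramParameteriEXT(prog, GL_GEOMETRY_OUTPUT_TYPE_EXT, GL_TRIANGLE_STRIP);

/* Keep this as low as possible for best hardware thread occupancy. */
glProgramParameteriEXT(prog, GL_GEOMETRY_VERTICES_OUT_EXT, 3);

glLinkProgram(prog);   /* parameters take effect at link time */

/* Later, at draw time: primitive mode must match the input type. */
glDrawArrays(GL_POINTS, 0, numPoints);
```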

So looking at a very simple shader as an example: this shader takes points coming in and converts every point to a triangle going out. You can see that I'm taking the position that was written by the vertex shader, accessed through this array, position-in. Since I'm working on points, I'm only looking at the zeroth element. And it just adds a very simple hard-coded offset in the x and y directions. So this is emitting the same vertex three times, kind of splatting a triangle around that point.

Not very useful, but it's an illustration. You can check out the extension specification online for all the rest of the details. I want to spend a little time talking about some of the things you can do with this technology now. So here's a simple example to start out with. I mentioned that you have access to the curvature of your mesh if you pass in data with adjacency, so it's possible to do tessellation on the GPU now. Here's a very simple kind of paper-doll stick-figure mesh going into the shader. You can calculate a kind of bicubic spline running through all those points and then dynamically generate as many line strips as you feel like to make it look good. You also have flow control in the shader, so you can conditionally deform or move or warp specific line segments depending on any logic you want. In this case, I'm growing hair only on the head of that model. Looking at a more complicated example, it's also possible now to accelerate some algorithms that we've had for a while but that had to be executed on the CPU. A good example is shadow volume extrusion, which you've probably seen in games like Doom 3.

With this type of algorithm, you need to know really the topology of the model that you're looking at in order to figure out something like the silhouette from the light's point of view. So in the geometry shader, you can figure that out because you have access to the triangle and the adjacent triangles.

So with any given triangle, you can figure out the face normal. And if you take the dot product of the face normal with the light position, you can figure out if a given triangle is facing towards the light. And if the adjacent triangle is facing away from the light, then you know you're at the silhouette edge. Once you find the silhouette edge, you can then dynamically generate in the geometry shader new triangles, only for triangles that have an edge on the silhouette, and project a volume away from the light source. You can write that into the stencil buffer and end up with nice, self-shadowing characters, like in video games now. So there's really an infinite number of things that you could do here. This is a couple simple examples. And to give you one more idea, trying to combine this with transform feedback, I'm going to move over to the demo machine.
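The dot-product test described above can be sketched in plain C. The names here are illustrative, not taken from the demo:

```c
#include <assert.h>

/* A shared edge lies on the silhouette when one face points toward
   the light and its neighbor across that edge points away from it. */
typedef struct { float x, y, z; } Vec3;

static Vec3 cross3(Vec3 a, Vec3 b) {
    return (Vec3){ a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
static float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 sub3(Vec3 a, Vec3 b)  { return (Vec3){ a.x-b.x, a.y-b.y, a.z-b.z }; }

/* Face normal of triangle (a, b, c) via the cross product of two edges. */
static Vec3 face_normal(Vec3 a, Vec3 b, Vec3 c) {
    return cross3(sub3(b, a), sub3(c, a));
}

/* Returns 1 when the edge shared by the two faces is a silhouette edge
   as seen along light_dir: the dot products have opposite signs. */
static int is_silhouette_edge(Vec3 n_front, Vec3 n_adjacent, Vec3 light_dir) {
    return (dot3(n_front, light_dir) > 0.0f) != (dot3(n_adjacent, light_dir) > 0.0f);
}
```

In the geometry shader this runs per primitive, with the adjacent vertices supplied by the triangles-with-adjacency input mode.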

So I have here an everyday, ordinary wine glass. And I was working together with my coworker, John Rosasco, on this demo. And we were trying to come up with a visually interesting way to combine a couple of these extensions. And John had the good idea of trying to simulate what happens to glass as you heat it up to melting point. So in addition to the wine glass, I have a virtual blowtorch.

which I can rub over the glass and start to heat it up, and it'll start to melt and deform. And it's always gonna move down towards the bottom of the screen due to gravity. So if I heat it up a little bit, I can kind of rotate it around like this. You can watch it kind of melt and deform and fold into itself, right?

So if I'm really clever, I can kind of set up a little virtual glass blowing kit here to get the spin right on this thing. And I can heat up the rim maybe. Over time, the heat will dissipate, and the stem now is kind of fused into a new, deformed position. Then you heat up the rim, flip it up like this, watch it kind of melt on top of itself. Right? Pretty cool, right? So this is kind of neat looking, but how is this working? What's going on here? So to break this down, let me turn off all the eye candy for a second.

And what you can see here is that I'm starting out with a really simple, coarsely tessellated input mesh. And there are basically two stages in this algorithm that make this work. The first stage is dealing with heating up the vertices and deforming them into a new shape. The second stage is using a geometry shader to tessellate that result into a higher polygon version of itself.

So looking at the first stage really briefly, there's a vertex shader which, for every vertex in this mesh, figures out how close is the heat source. And if it's close enough, it'll start to inject heat into it. And there's a gravity vector which pulls all the hot vertices downwards. So what you're looking at here is a visualization of the heat at every vertex.
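A toy version of that per-vertex pass might look like the following in C. The names and constants are made up for illustration, and the real shader runs this on the GPU:

```c
#include <assert.h>

/* Vertices near the torch gain heat, and hot vertices are pulled
   downward by gravity. Squared distances avoid needing a sqrt. */
typedef struct { float x, y, z, heat; } Vertex;

static void heat_and_sag(Vertex *v, float tx, float ty, float tz,
                         float radius, float dt) {
    float dx = v->x - tx, dy = v->y - ty, dz = v->z - tz;
    float d2 = dx*dx + dy*dy + dz*dz;
    float r2 = radius * radius;
    if (d2 < r2)                       /* inject heat near the torch */
        v->heat += (1.0f - d2 / r2) * dt;
    v->y -= v->heat * dt;              /* gravity pull, scaled by heat */
}
```

Run over every vertex each frame, this gives exactly the behavior described: cold vertices stay put, heated ones sag.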

So the results of this shader execution are being captured with transform feedback into a buffer object. And so what's going on here is that each frame, the input mesh is fed into the shader, and it's melted and deformed a little bit. And the good thing about this is I have a reset button.

And the deformed shape is captured back into a buffer. And then the next frame, we feed that buffer into the same shader, and we warp and deform it a little bit more. So we're basically tracking the current state of the mesh at all times in a buffer object, which can be kept in video memory on the GPU. So there's no round trip of data here. It's all fast and local.
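The ping-pong scheme described above can be modeled in plain C. With real GL these would be two buffer objects and transform feedback; here they are just two arrays with a stand-in "deform":

```c
#include <assert.h>

enum { N = 4 };  /* mesh size, tiny for illustration */

typedef struct {
    float bufs[2][N];
    int   src;           /* index of the buffer holding the current state */
} PingPong;

static void step(PingPong *p) {
    int dst = 1 - p->src;
    for (int i = 0; i < N; i++)
        p->bufs[dst][i] = p->bufs[p->src][i] * 0.5f;  /* stand-in "deform" */
    p->src = dst;        /* captured output becomes next frame's input */
}
```

The key property, as in the demo, is that the current mesh state always lives in one of the two buffers and never round-trips through the CPU.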

So this same shader also deals with recalculating the normals on every vertex at every frame. And this is done by feeding that same buffer object in as a texture to the shader using PixelBufferObject. And then I can do a couple texture samples around the vertex to figure out all the connectivity of the mesh there. And I can calculate a bunch of face normals. I can average those into a vertex normal. So I need to do this in order to keep the normals accurate regardless of how much the mesh has been distorted. I need this for the next step, which is tessellation. So to explain tessellation a little bit, let me jump to a very simple example, just one triangle.

So every triangle has a face normal, which here are in purple. And you can calculate the average vertex normals like I just described, which here are in yellow. So once you have those vertex normals, if you imagine those interpolating across the face of this main triangle, you can kind of reconstruct an idealized curved surface that fits the curvature of this local area of the mesh.
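The normal interpolation just described can be sketched as follows. This is a simplified stand-in for the full N-Patches construction; the real shader would also renormalize the blended normal:

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Blend the three vertex normals with barycentric weights
   (u, v, w), where u + v + w = 1. */
static Vec3 lerp_normal(Vec3 n0, Vec3 n1, Vec3 n2, float u, float v, float w) {
    return (Vec3){ u*n0.x + v*n1.x + w*n2.x,
                   u*n0.y + v*n1.y + w*n2.y,
                   u*n0.z + v*n1.z + w*n2.z };
}

/* Displace a flat point along the blended normal by amount d,
   pushing new vertices toward the idealized curved surface. */
static Vec3 displace(Vec3 p, Vec3 n, float d) {
    return (Vec3){ p.x + d*n.x, p.y + d*n.y, p.z + d*n.z };
}
```

At a triangle corner the blend reduces to that corner's own normal, which is what makes adjacent patches meet smoothly.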

And inside the geometry shader, you can then emit a bunch of triangles to try to approximate that ideal curved surface by displacing the triangles in the direction of the normal. And you can do that at some arbitrary level of detail, as long as you stay within a reasonable hardware limit. So there are a couple of different ways to do this tessellation, but this particular version is called N-Patches. And ATI has a white paper written up about it if you're interested.

It's up on the AMD website now. So that's tessellation on one triangle. You apply the same algorithm then to every triangle in this input mesh. And I start out with a pretty coarse, chunky-looking thing. And-- Tessellation might take it from a couple hundred input vertices to tens of thousands of output vertices here.

So we've got a deformation shader, plus normal recalculation, plus a geometry shader doing tessellation here. Then you combine that with a fragment shader, which is here, it's using a cubic environment map to simulate reflection and refraction. And actually here, the refraction is doing a little bit of a dispersive chromatic effect. So there's a separate ray for the red, green, and blue. I can play around with the index of refraction here if I want to. So combine all these things together, and you get a good-looking eye candy demo. So what do you think?

So I'll just let this go for a little bit, maybe. It's fun to play around with. So in general, I think the message here is that you're not really limited to the old fixed-function pipeline in any way anymore. GLSL and shaders have kind of blown all that away. And now, additionally, there's enough flexibility in the way you use the pipeline, you can build up really complicated multi-stage algorithms and keep them all on the GPU. So generate some data, stick it in a buffer object, do a second pass on it, cache it, use it ten different times, ten different ways, whatever you feel like. You can implement really more advanced algorithms this way now. So that's about it. I just encourage you all to go check out the specifications, get the details, and really start playing around with this stuff yourself. So that's it. I'm going to hand it back to Kent now.

So now we go from cool spinning melting wine glasses back to slides. That'll go well. Okay, so we're gonna talk about some more ways to pound on the GPU now. So, um... There are some new features of GLSL that allow you to implement more classes of algorithms on the GPU. So we're gonna talk about GPU Shader 4 for a second, along with Texture Integer. GPU Shader 4 is an extension to GLSL that allows you to do full integer operations in a shader. You can do ANDs, shifts, you know, NOTs, all that fun binary stuff.

And the texture integer extension -- the distinction between that and the regular integer data that you could already pass to OpenGL is that the data is left intact, with no conversions. If you use a normalized integer data type, that really signals to OpenGL that you've passed it color data, where a value of 0 is 0.0 and 0xFF is 1.0.

But the texture integer extension bypasses that implicit conversion that might happen otherwise. So... We don't have a demo of this, but this is what a shader that uses GPU Shader 4 looks like. The first line there activates the parsing of GPU Shader 4, and it's required, or else you'll get a parse error. All the uvec types were added, a full 32-bit unsigned integer, a bunch of different texture sampling functions. So in the line with the texelFetch2D call, that's a new GPU Shader 4 function to return a sample from an integer texture. And then below that, you can see we're ANDing that with an integer mask that we passed in as a uniform. So what might your app do with this? So we've done some things internally and looked around on the internet. And people are doing these types of things with them. Image compression.

Some guy was doing a marching cubes algorithm using GPU Shader 4. You know, you can do pattern matching. Maybe you have a binary image format that comes from some source where it's, you know, planar data or interleaved in some strange way, and you'd like to upload that directly as a texture and sample from it, using integer textures as the data format, and then with the GPU Shader 4 bit operations, you could natively read that texture without having to convert it with the CPU. You know, maybe that's something you need to do. I don't know if any of you saw the Apple II emulator that they showed on Monday. That makes heavy use of this, of GPU Shader 4, to do the integer things that are required.
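The kind of bit twiddling GPU Shader 4 makes possible in a shader looks like this, written here in C. The packing order is a made-up example, not any particular format:

```c
#include <assert.h>
#include <stdint.h>

/* Unpack four 8-bit channels out of a packed 32-bit texel, the way a
   shader would after fetching from an integer texture: pure shifts
   and masks, no implicit float conversion anywhere. */
static void unpack_rgba8(uint32_t texel,
                         uint32_t *r, uint32_t *g, uint32_t *b, uint32_t *a) {
    *r = (texel >> 24) & 0xFFu;
    *g = (texel >> 16) & 0xFFu;
    *b = (texel >>  8) & 0xFFu;
    *a =  texel        & 0xFFu;
}
```

The same shifts and masks are now legal GLSL, operating on the result of an integer texture fetch.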

Another extension that we're just adding support for -- it's going to be in Snow Leopard, but it also started to appear in 10.5.3 -- is framebuffer blit. So this extension has two tricks, basically. It separates the draw and read framebuffers. So now you can have a separate framebuffer that's going to be read from with all the read operations, and then the draw operations will take effect on the draw framebuffer. It also adds the BlitFramebuffer function, which is a fast copy between framebuffer objects. So this is kind of like a 2D blit API, a stretch blit or CopyBits or whatever. I kind of think of it like CopyBits. But it can also scale. It can also copy depth and stencil buffers, any of the attachments to the framebuffer object. And you can also specify a filter. When you're copying depth and stencil, the nearest filter is all that applies.

But in this next example, if you're just copying color, you can apply a linear filter to that, and you'll get some nice linear-style interpolation on the source data. And also in this example, it's inverting the data, so it's flipping it, I guess, left to right in this one. So it scales it down and it's flipping it. So this might be an interesting way, if you wanted to use it, to address some-- you know, there's a problem with flipped data in OpenGL; it kind of reads upside down from what you might expect. This might be something you could do to try to program around that.
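A flip with BlitFramebuffer is expressed by reversing the coordinates of one axis in the destination rectangle. This little helper is illustrative C, not GL code, showing how you might compute a vertically flipped destination rect:

```c
#include <assert.h>

typedef struct { int x0, y0, x1, y1; } Rect;

/* Swap y0 and y1 so the blit writes the image upside down relative
   to the source -- the trick for compensating for flipped data. */
static Rect flipped_dest(Rect dst) {
    return (Rect){ dst.x0, dst.y1, dst.x1, dst.y0 };
}
```

The eight resulting coordinates are what you would hand to the blit call; scaling falls out for free if the source and destination rects differ in size.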

FrameBufferMultisample is an extension that allows you to create off-screen multisampled FBOs. So it's an extension to the RenderBufferStorage API, which is the type of attachment we didn't show before. We showed a texture attachment. But this will create a multi-sampled FBO. And it basically is just like the other render buffer storage API, but you pass in the number of samples that you want. This example passes in four. You can pass in, you know, any number the card supports or you can pass in one, which is a special value that means, you know, give me any multi-sample format you support.

So there's one more API that was new in Leopard, and that's the object purgeable extension. So I don't believe we talked about this last year, but this is an OS X-specific extension, and basically what it allows you to do is promise GL, on this object -- say a texture -- that you are not going to require that GL retain the storage right now. So this is great if you want to really bombard the system with textures that you don't know if you're going to use all of them or not. Or you might create something and then want to allow GL to throw it away if it needs the space for something else. If you're working in a constrained memory space, or whatever reason you'd want to do that. So in this case, if GL does decide to throw away the storage, it's still going to retain the state and the name of the texture, everything associated with it except for the storage. So this is probably a little bit easier to look at if you just look at the code.

What this does is, this code is going to create a texture. So maybe, you know, you're starting up a photo application. So it's going to kind of chug through your library in the background and maybe create textures, big textures. But instead of loading the system up with it, it's going to make them all purgeable as it loads them. So ObjectPurgeableAPPLE, VOLATILE_APPLE. So that's a promise from your application that before you use the texture, you're going to check to see if it needs to redefine the storage. So in step two, you know, time has passed, and you've decided that, lo and behold, you do want to use that object that you made. So you call ObjectUnpurgeable with RETAINED_APPLE as the flag. So this is going to do two things. First of all, it's going to return to you a value that indicates whether it kept that object around or not -- in this case a texture. And the second thing it's going to do is make this object unpurgeable again. So when you call this, it's not going to purge this object anymore. If it does not return RETAINED, then you need to recreate the object. If it returns RETAINED, you don't have to. And then later on, if you're finished with the object and you want to go back to that state where you want GL to throw it away if it can, you just call ObjectPurgeable again with the VOLATILE flag.
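The handshake just described can be modeled as a toy state machine. The names mimic the extension, but this is only a simulation in plain C, not the GL API:

```c
#include <assert.h>

/* While an object is volatile, GL may drop its storage under memory
   pressure; "unpurgeable" reports whether the storage survived. */
typedef enum { RETAINED, UNDEFINED } Purge;

typedef struct {
    int is_volatile;
    int has_storage;
} TexObj;

static void obj_purgeable(TexObj *t)  { t->is_volatile = 1; }

/* Simulated memory pressure: GL discards volatile storage. */
static void gl_pressure(TexObj *t)    { if (t->is_volatile) t->has_storage = 0; }

static Purge obj_unpurgeable(TexObj *t) {
    t->is_volatile = 0;               /* no longer eligible for purging */
    return t->has_storage ? RETAINED : UNDEFINED;
}
```

The application-side contract is exactly the one in the slide: after marking volatile, always check the return of the unpurgeable call before using the texture, and recreate the storage if it was lost.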

So, brings us to using OpenGL with OpenCL. So, we've talked about asynchronous behavior using vertex buffer objects, pixel buffer objects. We talked about generating data with geometry shaders, so new data. We talked about using transform feedback and frame buffer objects to capture intermediate results. And this is another way that we have to generate some data in the system. So since OpenGL and OpenCL are obviously running on the same graphics card, one of the major advantages we have is that the storage objects can be shared. So you can share OpenGL vertex buffer objects with OpenCL arrays. And I didn't realize when I was preparing this that this was actually before the OpenCL sessions, so I'll try to take that into account here. But the vertex buffer objects-- so an OpenCL array is a 1D piece of memory to OpenCL. Textures and render buffers can be images, and that's kind of self-explanatory. It's a 2D array or 3D array in OpenCL.

All the commands that deal with using GL and CL objects together are included in the OpenCL framework, in that file, OpenCL_GL.h. And this is just a small piece of code that allows you to create a CL context that shares objects with the GL context. So you start with a preexisting GL context, which you might have gotten from calling maybe CGLGetCurrentContext, or maybe you saved it earlier. You call CreateDeviceGroupFromCGLContext, and that returns a group of cards, which will be the same group that you created your GL context on. And then you pass that in to CreateContext, and then you've got it set up to share objects.

So this is kind of how that would work. So you make a buffer object, a vertex buffer object, just with standard GL API there: GenBuffers, BindBuffer, and BufferData. So to make the object sharing work well between GL and CL, we're going to use a static draw VBO. So we're going to give it the static draw hint there. And then in step two there, we're creating a CL array. So the important thing there is in orange, the MEM_ALLOC_REFERENCE flag, which tells CL that the storage for this object is going to come from another object instead of, you know, having to allocate its own.

Then when we're ready to use those together, we're going to attach the array and VBO together. So AttachGLBuffer, compute context, array to VBO. So then they internally get references to the same storage object on the card. And those two calls -- I'll go over some code here in a second on the demo machine, but that's how you execute a CL kernel. And then sometime later, before you dispose of everything -- you don't have to do it right after you execute the kernel -- that detaches the array from the storage object. And then you can use the storage object. So I have some code over here on the demo machine that illustrates this.

That's neat. Let's open it like this. Okay. So this is a short program. You know, it's just maybe 100 lines or 150. There's two parts to this I wanted to point out. One is, this is more or less the same piece of code that we just looked at on the slide. But this is how simple it is to attach a VBO to a CL array. It uses the same flags, the static draw VBO and the MEM_ALLOC_REFERENCE CL array that we saw before. And then we're going to attach them right away.

And then the other thing that's interesting here -- and I guess, since we haven't looked at any CL code before, probably -- this is the routine in this program that executes the CL kernel. So, you know, you set up an array of sizes of the data, and then you set the values for that data in the values array, and then you call setKernelArgs. And real quickly, I'll show you the CL kernel itself for this example. So... You can see here that, you know, the kernel args end up being the arguments to the CL kernel that it executes. So that's how you pass data into the OpenCL function. It seems analogous, to me, to passing uniforms into a shader.

So that is-- so you set the kernel args, and then you execute the kernel. And then the results of that get written into the VBO. And then in the display routine here, we're just calling update mesh, and then we're calling draw arrays to just get the data out. And the result of that is this little, neat geometric point cloud thing. So every frame, the CL kernel is running and transforming the data from the previous state to the next state. And then when it's finished, then we're just calling draw arrays on the bound VBO and getting the point results out.

Um, can we switch back to slides, please? So, um... For any more information about CL or anything we talked about, Allan Schaffer is our evangelist, and his email is actually not [email protected]. Maybe it is. I don't think it is. If you need to contact Alan seriously, you can talk to me afterwards, and I'll get you his correct address.

The Mac OpenGL mailing list -- myself and most of my engineers are pretty active on that list. At least we do, I think, read every message that comes across, even though we might not have an immediate answer. We do at least see it, and we'll answer if we can. If we know the answer -- it's not that we can't, but, you know, we may not know. And then there's documentation on the developer.apple.com website. We have a pretty good OpenGL and OS X programming guide that explains some of these off-screen rendering techniques, as well as, you know, a lot of platform things for working on OS X. There's extension documentation up there for most of the extensions that we support and all of the Apple-specific extensions that we have. And there's also sample code up there -- the sample code we mentioned, as well as some others.