
WWDC06 • Session 216

Using the OpenGL Shading Language

Graphics and Media • 58:26

One of the most exciting developments in OpenGL is the advent of the OpenGL Shading Language (GLSL). GLSL gives you high-level C-like access to programmable GPUs. With GLSL, you have programmatic control over vertex and fragment processing, allowing you to accelerate complex renderings, create spectacular visual effects and enable new graphics capabilities. Come learn about GLSL and its incredible capabilities and how to use it in your application.

Speakers: Nick Burns, Rav Dhiraj

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it may contain transcription errors.

Hello, everybody, and welcome to session 216. I'm Nick Burns, as the voice of God just said. I work on GLSL and the OpenGL team at Apple. It's really fun, really exciting. I hope you guys can learn something from it today. So before we get started, the next slide that I'm gonna show you is a GLSL vertex shader. We're gonna delve right into it. Now, a lot of you may not understand this at first, but I hope after this session, you will understand what's going on. So... Bam! Here is a rather simple GLSL vertex shader.

For those who are not familiar with GLSL, you may not understand what the words attribute, uniform, and varying mean. Well, those are just qualifiers that qualify the variables (weight, texCoord, and so on). I didn't name these variables. Somebody else did. Anyway. So hopefully at the end of the session, beginners will understand this a little bit more. Anyway, let's go on.
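The slide's shader isn't reproduced in the transcript, but a minimal GLSL 1.10 vertex shader using all three qualifiers might look roughly like this (the variable names here are stand-ins for the slide's actual ones, not a reconstruction of the real sample):

```glsl
// Hypothetical vertex shader illustrating the three qualifiers.
attribute float weight;   // per-vertex input, supplied alongside the vertex data
uniform vec3 lightPos;    // constant across a draw call, set by the application
varying vec2 texCoord;    // output, interpolated across each polygon

void main()
{
    texCoord = gl_MultiTexCoord0.st;
    vec4 pos = gl_Vertex;
    pos.xyz *= weight;    // use the per-vertex attribute somehow
    gl_Position = gl_ModelViewProjectionMatrix * pos;
}
```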

So many of you are probably wondering what GLSL is. So for a very simple, quick introduction, GLSL allows you to program the GPU to replace certain parts of the pipeline that I will show you later and actually put your own control in there. So you can do your own operations. Instead of the fixed-function TCL (transform, clipping, and lighting) and fixed-function fragment processing, you can do pretty much whatever you want. Experiment, have fun. We'll talk about this a little bit more and show you some demos. Just to get you started. So GLSL is the path forward for OpenGL. OpenGL ES 2.0 is very shader-centric. So why don't we go on and see what it's all about. Maybe it's cool, maybe it's not. We'll see.

Hopefully, hopefully cool. So what I hope you'll learn in this session, for the beginners among us, I hope you'll start to understand what GLSL is. I'll give you a quick introduction, and I'll point you to some references and some examples that we have that are already on the seed, and they've been out there since Xcode 2.3.

I will give you some hints for working with hardware. Hardware is fun to work with. Think of it that way. It's not hard and mean and punch you in the stomach and in the back, those types of things. Anyway, I'll give you some tips and tricks to keep yourself alive when you're working with hardware. And lastly, I'll hand off to Rav from ATI, who will present the Toy Shop demo and some of the technologies behind it, such as the HLSL to GLSL conversion.

So what are shaders? Shaders are made up of a few parts. They're encapsulated within objects. We have shader objects, and we have program objects. And these objects take certain inputs. They produce certain outputs. And you can replace certain bits of the pipeline, which I will show you later, I promise.

The inputs to the vertex shader are vertices that aren't transformed and some uniform values, such as any uniforms that you provide, generic ones, or the built-in ones that are provided by the fixed function pipeline, matrices, and other fun uniforms and such. There are also attributes for vertex data. Any data that you want to pass in with the vertices, such as weighting for vertices for vertex skinning, and a whole bunch of other stuff, you can use them for generic purposes. We'll go into that a little bit more later.

Vertex shaders produce outputs, transformed vertices. Seeing as it replaced the fixed function pipeline, you have to actually do what the fixed function pipeline did. It took vertices in, transformed them, did some lighting calculations, interpolated some values across varyings for the fragment shader. So you just have to do the work that it did or whatever work you want to do, and then it just goes on to the fragment shader. So the fragment shader then takes these vertices that are transformed and varyings that were interpolated across the face of the polygon, and outputs a fragment color, a fragment depth, whatever you want to the frame buffer and any associated depth buffers, anything attached to it.

The program object is just an encapsulator for the vertex and fragment shaders. It's not very complicated. It's just a way for the programmer to access GLSL in a very easy, intuitive way. So we'll go into this a little bit more, but first let's go with a more realistic example. I've been giving you all these fancy words like attribute and varying and uniform and ooh.

So here is the fixed function pipeline. I'll build it slowly. You've probably seen this from the Redbook. So let's take a look. As I said before, we have incoming vertex data and pixel data. This is then transformed and lit by the vertex processing stage. The values that are produced from the vertex processing stage are interpolated across the face of the polygon, sent to the fragment stage, which then rasterizes and produces the fragments that go to the frame buffer, and other stuff happens too. But we're not concerned with that right now. You can't program that yet.

The vertex shader replaces the transform clipping and lighting stage, like I said before. And the fragment shader replaces the rasterization stage. So you can control the rasterization aspects, the transformation aspects. You can do a whole bunch of-- just a world of effects and cool things, like you saw in my colleague Alex Eddy's demos and whatnot. And if any of you have tried Showpiece, it's there too, along with a bunch of cool teapots. None of those bunnies. This is a zero bunny demonstration here. No bunnies. I'm not harming any bunnies either, so.

But there are giant lizards, okay? If you're going to kill the bunnies, you have to get giant lizards instead. So here is a nice giant gecko lizard-y thingy from Modo. They provided this nice image for us. Why don't we take a look at it? This is a more realistic example instead of attributes and vertices and blah, blah, blah. So here we have a mesh that has some texture coordinates. Why don't we put the mesh on it? Here you can see the scope of how many vertices we have in this case. I mean, there are probably thousands of vertices for the scene. I haven't counted each one of them up, but if one of you guys want to do that and give me the number in a second, go for it. Now, why don't we zoom in a little bit, take a look at his palette here.

You can see that you have a lot of vertices that are getting transformed here, and there are also a lot more fragments and vertices in this case. So why don't we take a little closer look here. So keep in mind that the vertex shader is going to be running once per each vertex. So whatever you write to replace the vertex shader portion of the pipeline will be run once per all these 1,000 vertices. Now let's get an idea of where the scope hits for fragment shaders. Zoom in a little bit. Okay, there we go. Now, you can see a nicely pixelated version of this triangle on the palette of this giant gecko.

Why don't we take a look at how many fragments potentially are in this, at least in one scan line. So maybe you'll get an idea of how many fragments are actually being rasterized for this gecko. As I said before, there are probably thousands of vertices in the scene.

Well, there are probably millions of fragments, a thousand times the number of vertices in the scene. The fragment shader is going to be run once per each of those. The varyings are going to be interpolated for each fragment. So any of the varying output values, such as texture coordinates. That's why you can see the texture vary instead of just, like, one solid color per triangle. It actually varies across the face of the polygon. It does that with varyings and texcoords. So why don't we take a look at that, shall we? Ooh, look at this nasty blue line.

So, now you're getting a little introduction of what GLSL is. You probably want to know where you can actually use it. Well, it turns out it's actually been around since Tiger. So take a look in Tiger, use it, try it out. There you have GLSL 1.10. And in Leopard, we plan on getting OpenGL 2.1 support along with GLSL 1.20. Well, GLSL 1.20 has just been recently announced. That's why it's not in the seed right now. There's a lot of work for me to do, and I'll try to get it done for you people.

Now you get an idea of what's all going on and where it's supported, why don't we just jump into a little, small, simple demo. This is the Hello GLSL demo. We will hopefully have it available soon for developers on the developer examples, maybe with the next release of Xcode. We'll figure that whole thing out later. So now I want to switch to that machine. There we go. No, wrong machine. This is A. There we go.

Lots of fun demos. All right, let's take a look at Hello GLSL. Right now, I am drawing a triangle, just a blue triangle. Nothing fancy, and it's blue. Nobody really likes blue, especially this shade of blue. Whew. Don't know what Kent was thinking. Anyway, so here's our blue triangle. This is going through the fixed function pipeline, just drawing a triangle, nothing fancy. Now I'm gonna press space. Boom. Now we're using a shader. Now, we're not doing anything really fancy with the shader, but you can see that we're perturbing the colors of this triangle using varyings. So why don't we delve into this a little bit more and switch back to the other machine. Um, switch back to the other machine. Slides, please. Slides.

All right. Now you saw that simple example. Let's go in behind the scenes a little bit, and I'll show you what's actually going on here. So you've seen the triangle. You saw how it went from that blue to an awesome, multicolored, psychedelic triangle. How do we actually do that? Let's take a look at the steps that are required to get GLSL into your application, and then go through how to actually do each of those six steps. The first step you're going to want to do to get GLSL in, please check for extensions.

You have to do this. Make sure the extension that you want is on the renderer that you have chosen. If you don't do this, bad stuff will happen. Bad. Check for extensions. Then we want to make some shaders. The shaders that we used for that Hello GLSL example, I'll show you and walk you through them step by step. Next we'll show how to compile them, link them, and use them at the end.

Here is the vertex shader that was used for Hello GLSL. It's really not very complicated. It's actually simpler than the one you saw before. It has varyings like we were talking about. And you'll notice that we have a varying vec3 color. That's the output of this vertex shader to the fragment shader. So this color will be interpolated across the fragments of the polygon.

You'll see that we have a function called main. That main will be executed for each vertex. Keep that in mind if you have a really vertex-heavy or really vertex-not-heavy application. Same for the fragment shader. We'll go into that a little bit more later. Now, you can see that the color is a function of the incoming vertex value. Now, the vertices in that example went from 0 to 1 and 0 to 1. That's why we saw colors going in, in this case, red and blue.

And lastly, we transform the position, like I said we had to do before: we transform the incoming vertex and put that into gl_Position. That's the output position of this. I'll show you some more examples of this later, but that's that for now. Now the fragment shader is even simpler. It's just one line. It takes the incoming varying value from the vertex shader, interpolated across the face of the polygon, and puts it into the fragment color. That might sound all fancy and crazy, but it's just setting a color. But it does it once per fragment, keep in mind.
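From that description, the Hello GLSL pair might look roughly like this. This is a plausible reconstruction, not the actual sample source; the choice of which vertex components feed which color channels is a guess based on the red-and-blue result described above.

```glsl
// Vertex shader: a varying color as a function of the incoming vertex,
// plus the required transform into gl_Position.
varying vec3 color;

void main()
{
    color = vec3(gl_Vertex.x, 0.0, gl_Vertex.y);  // red and blue from position
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```

```glsl
// Fragment shader: the one-liner. Runs once per fragment, receiving the
// varying interpolated across the face of the triangle.
varying vec3 color;

void main()
{
    gl_FragColor = vec4(color, 1.0);
}
```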

Now we have some shaders, two awesome shaders, three lines total. Wow. Let's compile these shaders, shall we? Let's make some shader objects, use them, and draw that awesome little triangle that we saw. So in order to compile your shaders, all you really have to do is send the source to GL and tell it to compile your shader. That's pretty much it. Check for some errors and be a nice, good GL citizen. Next, we want to set up the program object.

First, we create a program object. Then we attach both vertex and fragment shader that we just compiled previously to this program object. This allows us to easily use this GLSL program that we made. It'll activate it when we turn it on, and it will deactivate it when we turn it off.

Next, we want to link the program. This is even easier than the other steps. Just link. Check for errors again. Make sure everything's hunky dory. Lastly, now that we have this awesome shader, compiled, linked, we've got a program object, everything's all together, let's use it and draw a triangle. So all you have to do is use the program object you just made, draw some triangles (glBegin, glVertex, or use a VBO), and it will be activated and replace those portions of the fixed function pipeline that I showed you earlier. Then when you're done, deactivate the shader. That's it.
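Put together, the compile/attach/link/use sequence described above can be sketched in C like this. This is a sketch, not the session's actual sample code, and it uses the OpenGL 2.0 entry points; on Tiger the equivalent ARB_shader_objects functions (glCreateShaderObjectARB and friends) would apply. Error handling is trimmed to the essentials.

```c
/* Assumes a current GL context and two source strings,
   vertexSource and fragmentSource. */
GLuint vs = glCreateShader(GL_VERTEX_SHADER);
GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(vs, 1, &vertexSource, NULL);    /* send the source to GL */
glShaderSource(fs, 1, &fragmentSource, NULL);
glCompileShader(vs);                           /* tell GL to compile    */
glCompileShader(fs);

GLint ok;
glGetShaderiv(vs, GL_COMPILE_STATUS, &ok);     /* check for errors:     */
glGetShaderiv(fs, GL_COMPILE_STATUS, &ok);     /* be a good GL citizen  */

GLuint prog = glCreateProgram();               /* the encapsulator      */
glAttachShader(prog, vs);
glAttachShader(prog, fs);
glLinkProgram(prog);
glGetProgramiv(prog, GL_LINK_STATUS, &ok);

glUseProgram(prog);    /* replaces the fixed-function stages */
/* ... draw: glBegin()/glVertex(), or a VBO ... */
glUseProgram(0);       /* back to the fixed-function pipeline */
```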

What I showed you in the original shader that I didn't really go over in the Hello GLSL example are the uniform and attribute API. So you can set the generic uniform values, basically constants in your program. To do this, you just have to make a uniform in your shader like the original one showed, so it would be like uniform float a. Then you would just get the uniform location of a. It gives you the location, which would be some number. You don't really care. Don't look at the numbers. Behind the scenes, ignore the number. And then you just set the value of that uniform. So in this case, it's setting a vec4, but you could set other values. Please look at the uniform API in the orange book. Similarly, you do the same type of thing with attributes, except attributes are per-vertex data.

Wow, that was a little bit quicker than I expected. That other slide said OpenGL Profiler, debugging shaders. So let's get rid of Hello GLSL, shall we? We're all done with that. We're past that. And now let's launch Profiler. Here I have Profiler and nothing else on the screen.
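The uniform and attribute lookups just described might be sketched like this in C. The names "a" and "weight" are illustrative (the first comes from the spoken example, the second is a stand-in), and this again assumes a linked program object `prog`.

```c
/* Uniforms: constants in your program. Look the location up once,
   then set the value. The location is an opaque number; ignore it. */
GLint loc = glGetUniformLocation(prog, "a");
glUniform4f(loc, 0.0f, 10.0f, 0.0f, 1.0f);   /* setting a vec4 here */

/* Attributes: same idea, but per-vertex data. */
GLint w = glGetAttribLocation(prog, "weight");
glVertexAttrib1f(w, 0.5f);  /* or bind an array with glVertexAttribPointer */
```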

Okay, hopefully you people are familiar with Profiler. It's a great tool for checking what's going on with your application, looking at timing, that sort of thing. Well, we added some new features in Leopard that I'd like to show you today. So let me launch GLSL Showpiece. This has been around since Xcode 2.3.

And now we have an awesome plasma teapot. There are other shaders here as well. Please take a look at them and their associated source. Let's just zoom in here. Ooh, look at the plasma effect. Become entranced. Now I'm going to go to Views and set a breakpoint by going to the Breakpoints tab. Here I'll put a breakpoint on CGLFlushDrawable, which will then break the application. Not in a bad way, just, you know, it stopped. So let's go back and now look at the Resources tab.

Okay, so you can see that we have some textures here. Ooh, there's the Bodium. It's all rusty. Here's some 1D texture, and here's some interesting little gradient texture here that kind of looks like what's on the plasma teapot there. So why don't we take a look at these shaders? This is the new tab in Leopard, which you should be able to take a look at on your seed there. And let's take a look at the program object that we created. Now, we have two program objects here.

Yeah, you can see them. All right, we have two program objects here. We have this program object, and we have this program object. So I have a 50/50 chance of finding the shader that is the plasma shader in this case. So let me roll the dice and take my chances.

Okay, hmm, let's see, it's doing lighting, hmm, it's got texcoords, it's got a position, that's good. All the types of stuff we were talking about before. And here's the fragment shader, and oh, oh, there's a 1D and a 2D texture in this thing. Hmm, that's probably this plasma shader. Why don't we try modifying it and see what happens? So now what you can do with Profiler, you can look at the shaders that are in your application, you can also modify them and see what happens. So let's modify this a little bit.

We have this texture coordinate coming in, and this is a teapot. Who knows how texture coordinates are generated on a teapot face? Let's take a look. So here we're going to take the texcoord here. And as I said before, you can output to the fragment color in a fragment shader. So instead of putting the color that we had there before, let's put the texcoord into the R, G, and B channels. And because we need three, let's replicate S. No real good reason, just need something in the blue channel. So let's compile the shader.

Ooh, compile succeeded. Let's ignore all breakpoints and continue. Now you can see that I've modified the shader. And while I was doing it, I could take a look at the shader, modify it, and look at any intermediate values. This is invaluable for debugging what is actually going on with your shader. So here you can see I have texture coordinates in an interesting gradient on this teapot. So now I know how the texture coordinates are on a teapot. And that's only partially interesting to me. Texture coordinates on a teapot. It's not a bunny. Man.

Alex and his fancy bunny demos. All right, so I'm going to put a breakpoint in again, and let's take a look at something else. So here we're setting the RGB values to the texcoord. Instead, we have this varying called light intensity coming in. I don't know what that is. Maybe one of you in the audience knows, but I don't know. So I just set this vec3 to this float value. What's going to happen here? Whoops, the compile failed. So that means I did something wrong. So instead, since it's a vec3, why don't we cast it as such? You can cast things in a nice, type-safe way in GLSL 1.10. So let's compile this. And oh, compile succeeded.

Let's ignore all breakpoints and continue. Ooh, pretty. Much better than the texture coordinates. Much. So there you can see that I'm able to look at the intermediate values of the fragment shader just by changing a few things around in Profiler. So this is a very, very valuable tool. As I said before, we have the vertex shader as well. Why don't we take a look at what modifying that looks like and the sort of things you can do there. So there's a program object. OK. Here is our vertex shader. So we're broken. And here's the position. Why don't we add something to this position? Or actually, no, I'm not going to add something. Let's multiply this position by something. I want to squish a teapot right now.

So 1 comma 0.1 comma 1. So now we are going to multiply the transform vertex positions by a squished y. So let's see what that does. I'm curious. Whoops. I did something wrong. Oh, that's a vec3. Again, got to be careful with those types. Oh, there we go. Compile succeeded. Let's ignore all breakpoints and continue. Oh my gosh, it's a squished teapot. Ooh, it's all squished. See, good thing I didn't do this to the bunny, huh? Wow. OK, so now let's move on from that demo. You have an idea of what you can do in Profiler. So let's go back to the slides.

There we go. So I just showed you how you can use Profiler in Leopard to debug your shaders. So what else have we got in Leopard for you? As I said before, I touched on GLSL 1.20. This is going to be part of OpenGL 2.1, which we're going to have in Leopard. So in order to try the GLSL 1.20 compiler out, all you have to do is put #version 120 at the beginning of your shader, and you've activated the 1.20 compiler. This has some different rules. Please look at the spec. Take a look at it. Understand what's going on. See if you want to use the functionality. I'm going to talk a little bit more about 1.20 in a second.

We're also going to be adding the non-square uniform matrix support that John alluded to earlier. So that's new API that allows you to set some of the new stuff that I'm going to talk about later. So what does 1.20 have to offer you? First, it adds invariance. This is similar to position invariance, and it'll do it for the generic values coming out of the varyings from your vertex shader. It also adds centroid varyings. They're useful for multisample. Please look at the spec.

We have non-square matrices like John talked about earlier. These are mat2x3 and mat4x3, those types of things, in addition to the traditional mat2, mat3, and mat4. So you have a whole range of new things that have new ways to add and multiply and all that fun stuff.

Next, we have built-in functions for matrices. Some new ones: we have transpose, and we have outerProduct. These are kind of useful. They can be useful in certain math cases, you know, depending on your needs. We also add some new built-in variables for fragment shaders. For example, we have gl_PointCoord, which is similar to the gl_FragCoord that you have in your shader, but it allows you to work in the same space for points.

There's a few more little things here, and I thought I'd give you some examples of them. We're gonna have first-class array support-- first-class object support for arrays. This allows you to pass array objects around just like you pass structures in GLSL 110. We also add the length operator for arrays. This allows you to find the size of any of the arrays that you've made, either dynamically sized or explicitly sized.

You can find out more about that in the spec. We add implicit conversion from ints to floats. You don't have to convert. In this case, it's setting a floating point vec4 to an integer vec4. So you can actually do that in GLSL 1.20 without it complaining and making you cast it to an actual vec4. Those will be done for you implicitly. And lastly, we add uniform initializers. So a lot of times you have uniform values, but you have some good default values, like you have a light position, you always want it to be like 0, 10, 0. You have some other values that you pass in there, maybe what texture unit a sampler is on.

You know text 0 is always going to be 0. Might as well put that right in the shader. You can express it right there, and then at link time, those values are picked up. You can still change them. They're like default values. So it's just something extra for you guys.
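Several of those 1.20 additions can be shown in one sketch. The names here are illustrative, not from the session, and this assumes the #version 120 compiler described above.

```glsl
#version 120
// A sketch of some GLSL 1.20 features: uniform initializers, non-square
// matrices, first-class arrays, and implicit int-to-float conversion.

uniform vec3 lightPos = vec3(0.0, 10.0, 0.0);  // initializer: a default value,
                                               // still changeable from the API
uniform mat4x3 skin;                           // non-square matrix type

varying vec3 color;

void main()
{
    float weights[] = float[](0.25, 0.5, 0.25); // array with constructor
    int n = weights.length();                   // the length operator

    vec4 v = ivec4(1, 0, 0, 1);                 // implicit int -> float

    mat3x4 t = transpose(skin);                 // new matrix built-ins,
    mat2x3 o = outerProduct(vec3(1.0), vec2(1.0)); // now on all matrix shapes

    color = v.rgb + o[0] * float(n) + lightPos * t[0].xyz;
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```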

Now, I'm gonna talk a little bit about working with hardware. Um, so far, you've seen some simple shaders. Well, you can make really complex shaders, but the hardware is not necessarily capable of running these shaders. You're gonna need to test these in hardware. So I'll give you a few tips and tricks and abilities to figure out if you're on hardware or software. Let's take a look at that now.

So I talked about where GLSL was supported. You know, it's been in Tiger. We've had hardware support since Tiger, and we're going to continue that trend as well the GPUs are. They're going to get more and more capable, and we're going to have more and more support for GLSL in hardware.

But there are still a few little pieces here and there that aren't supported very well. So I'm going to go over those and give you some little tips and tricks, like I've been saying, to get around them. Keep in mind, test the shaders on the target hardware. If you want something to run on the Mac mini, make sure it runs on the Mac mini, which has a less capable GPU. We'll talk about that in a second.

So first off, as a way to test whether your shader is actually running in hardware (that is, whether the hardware actually supports the shader you have submitted to it), use CGLGetParameter on vertex processing and fragment processing, like John talked about before in the previous session.

And this will tell you if it's running in vertex-- if the vertex processing is happening on the GPU or if the fragment processing is happening on the GPU. So... Test it out, take a look. Keep in mind, it is a relatively expensive call. Call it once per shader if you need to, or just use it for development purposes.
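That check might be sketched like this in C. The parameter names below are the ones in Apple's CGL headers for GPU vertex and fragment processing; treat the whole thing as a sketch that assumes a current CGL context.

```c
#include <OpenGL/OpenGL.h>

/* Ask CGL whether the current shaders run on the GPU. This is a
   relatively expensive call: do it once per shader, or only during
   development. */
GLint vertexGPU = 0, fragmentGPU = 0;
CGLContextObj ctx = CGLGetCurrentContext();
CGLGetParameter(ctx, kCGLCPGPUVertexProcessing,   &vertexGPU);
CGLGetParameter(ctx, kCGLCPGPUFragmentProcessing, &fragmentGPU);

if (!vertexGPU || !fragmentGPU) {
    /* The shader fell back to software: simplify it, or choose
       a different path for this renderer. */
}
```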

Okay, so on to the tips and tricks. First thing to note, the GPUs, they are vector processors. When they add, they don't just do one simple, meaningless add. They do four. Four adds. So when you do an add, don't waste the other three components. So this is a little example of that. Here I have floats x, y, z and a, b, c. Then I'm just going to add them together and do some other stuff later. It's a kind of trivial example, but assuming this is what I had, or you could have other cases where you can fit things in this way.

Anyway, so instead of having individual floats to contain these values, why don't we have a vec3 for a, b, and c, and a vec3 as well, which is three floats, for x, y, and z. And then add those floating point vectors together. In this case, before we were wasting nine adds, we are only now wasting one add.
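The before-and-after of that vectorization might look like this (a sketch; the variable names are from the spoken example):

```glsl
// Before: three scalar adds, each leaving three vector lanes idle.
float x, y, z, a, b, c;
float rx = x + a;
float ry = y + b;
float rz = z + c;

// After: pack the values into vec3s and issue a single vector add.
vec3 xyz;   // holds x, y, z
vec3 abc;   // holds a, b, c
vec3 r = xyz + abc;
```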

The next thing you may want to do is, you may have a rather long and complicated vertex shader. Now, vertex shaders tend to be less supported in hardware, in terms of instruction limits and potentially the speed that they take. So an optimization technique that you can use is you can move work around from the fragment shader to the vertex shader. Remember those varyings? Use them.

Just moving on, indirect array access is really only supported for uniform access in vertex shaders. Indirect array accesses, they're very simple. It's just like you have an array of some data and you want to indirectly access into that array, like with i. You don't know it's at five, you don't know it's at six, you don't care. You want i. So be aware that indirect array access is not necessarily supported on the hardware that you wish to target. So if you're going to use indirect arrays, please restrict them to uniforms in the vertex shader.
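The safe form, then, is a uniform array indexed in the vertex shader, which might look like this (names are illustrative):

```glsl
// Indirect (variable-index) access into a uniform array, done in the
// vertex shader: the form the talk says is actually hardware-supported.
uniform vec4 palette[8];   // hypothetical table of colors
attribute float index;     // hypothetical per-vertex index into it

varying vec4 color;

void main()
{
    color = palette[int(index)];   // index is not known at compile time
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```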

Next, we'll just talk a little bit about conditionals, if statements, that sort of thing. These are relatively well supported on the more capable GPUs, so like the ATI Radeon X1600 and the NVIDIA GeForce 6600. So here you will find really good support for conditionals. You still want to test your shaders and hardware, but here you should be able to put breaks, continues, inside of your if statements in your shaders if you're targeting this hardware. However, if you're targeting less capable GPUs, there's a few tips that you may want to use to cut down on the instruction count and keep your shader running in hardware.

So how about I give you a little example that might help? Here we have a function. It takes one input, v, and it produces one output component. Now, this is a very simple function. I don't know what it does. It's called func. Not a very good name. Maybe you want to use better naming schemes in your shaders and whatnot. So now we're going to just test: if a is equal to 0, we call the function one way. Otherwise, we call it some other way. Now, this is a very common programming technique, a switch based off of certain values in your program, maybe a uniform, whatever. However, keep in mind that on less capable GPUs, and on certain other GPUs as well, this can really up your instruction count. So let's say it's a very large function, maybe 50 instructions or so, with lots of adds, multiplies, texture lookups, that sort of thing. The compiler is going to inline that function into both the true and false parts of that branch, which will expand quite a lot. So instead of calling the function in each part of the branch, you can just set the parameters that will be sent to the function, and then call it once at the end of the branch, like you see here in this example. We have these in the dev notes. We will upload those shortly.
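The transformation just described might look like this in GLSL (a sketch; func's signature is hypothetical):

```glsl
// Before: func() gets inlined into BOTH sides of the branch, roughly
// doubling its (say, 50-instruction) cost.
if (a == 0.0)
    result = func(v0);
else
    result = func(v1);

// After: pick the argument inside the branch, then call func() once.
// Same behavior, one inlined copy.
vec3 arg;
if (a == 0.0)
    arg = v0;
else
    arg = v1;
result = func(arg);
```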

So now we talked about conditionals a little bit, so if statements, why don't we go into the more fun stuff, looping. Dynamic branching, you might have heard this term a little bit. It's supported pretty well in the more capable GPUs like the ATI Radeon X1600 and GeForce 6600. However, on the less capable GPUs, since they don't support dynamic branching, we have to unroll those loops. So you want to keep your loop iterations down. Now, even on the more capable hardware, you only have about 255 iterations for a loop. So when you make your loops, make sure they don't exceed this number. Make sure your shader runs as expected. These are things we cannot necessarily check at compile time, and the hardware doesn't really support.

So for the less capable GPUs, the Intel GMA 950, the GeForce FX 5200, the ATI Radeon 9600, there are a few techniques you can use to still use looping to make your code nice, maintainable, easy to read, and still stay in hardware. I mean, unrolling isn't necessarily a bad thing.

Why don't we see what we can do in that space? So you want to keep your loop iterations down like I was saying before. The newer hardware can support up to like 255 iterations. The older hardware is much less capable. So be very aware of that. Maximum of about 10 iterations. These are the two types of looping forms that are supported.

Well, actually, there are a few others, but these are the basic types. So you can have a while loop and a for loop, of course. Now the start value, the end value, and the iteration value, as long as those are constant and integers, then we can determine what the size is of your loop and unroll it at compile time.

So let's take a look at a little example of this, shall we? This is very similar to the Mandelbrot shader from GLSL Showpiece. If you've tried that on less capable hardware, like the Radeon 9600, you'll notice that it falls back to software and gets really slow. Well, there are a few techniques you can use to make that shader run in hardware. So as I said before, it supports those types of loops, just a simple for loop with a guaranteed number of iterations. So in this case, we have a complex loop guard. We have i less than 5 and r0 greater than 0. What's r0? I don't know. It's r0. And then whoever wrote this wants it to be greater than 0. That's fine. Whatever. So in order to make this something that you can actually support in hardware, you're going to want to break this apart. So first you want to make the i less than 5 part be just your loop guard, which means you're always going to have five iterations, and then put the r0 greater than 0 portion into an if statement in the body of the loop. This will allow you to get the same functionality at the cost of more instructions, as it will be unrolled. But it will stay in hardware.
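That loop-guard split might be sketched like this (the bodies are elided as comments; r0 comes from the spoken example):

```glsl
// Before: a compound loop guard the older compilers cannot unroll,
// so the shader falls back to software.
for (int i = 0; i < 5 && r0 > 0.0; i++) {
    // loop body
}

// After: a constant guard (unrollable at compile time), with the
// dynamic condition moved into the body. More instructions once
// unrolled, but it stays in hardware.
for (int i = 0; i < 5; i++) {
    if (r0 > 0.0) {
        // loop body
    }
}
```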

Next, let's take a look at why this is. I've been telling you about the less capable GPUs. A few numbers might help. So for the Intel GMA 950 and the ATI Radeon 9600, they only have about 96 total instructions. That's not really all that much, and you're going to use up instructions pretty fast. Anyway, these are broken up into two different types. You have your texture instructions.

There are 32 of those, so approximately 32 texture lookups. And you have your arithmetic instructions, and there are 64 of those. The math instructions allow you to do adding, multiplying, subtracting; those are the general-purpose instructions that you're going to be using. The texturing instructions are only really used for texture sampling, so even though you have 32 of them, you can't use them for arithmetic operations. Also keep in mind that certain instructions can expand to more instructions because the hardware doesn't have native support for them. For example, sine and cosine seem really tempting to use ("Ooh, I can just call sin"), but they are potentially emulated on the hardware you're targeting. So make sure you test your shader on the target hardware, as instructions can expand, and you want to make sure that it works.

There are a few more advanced portions of GLSL that I haven't talked about before. We have the differencing functions, we have the noise functions, and we have the new ATI Shader Texture LOD extension. Rav will talk about this a little bit more later in his presentation, right after this. So, differencing: dFdx, dFdy, fwidth. These allow you to kind of peek at your neighbor texel, or your neighbor fragment, that's executing in parallel.

It's used for making anti-aliased effects and things like that. You can look at your neighbor fragment values and figure out what operation you want to do based on whether you're on a border, like going from black to white. If you're right on that border, you can fuzzy it and make it look pretty.
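A hedged sketch of that kind of edge smoothing, assuming a varying texture coordinate and an arbitrary procedural stripe pattern (both illustrative):

```glsl
varying vec2 uv;   // assumed input from the vertex shader

void main()
{
    float v = fract(uv.x * 10.0);   // hypothetical stripe pattern
    // fwidth(v) = abs(dFdx(v)) + abs(dFdy(v)): roughly how much v
    // changes between this fragment and its neighbors.
    float w = fwidth(v);
    // Instead of a hard step(0.5, v), blend across a band roughly
    // one fragment wide; the black-to-white border comes out fuzzy.
    float edge = smoothstep(0.5 - w, 0.5 + w, v);
    gl_FragColor = vec4(vec3(edge), 1.0);
}
```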

So dFdx and dFdy help you do that type of thing. ATI Shader Texture LOD: this is a new extension, just recently approved and whatnot. Oh, did I say ATI? ARB_shader_texture_lod. And noise. Noise is not really supported in any hardware. So keep in mind, if you use the noise function, although it's really nice and it's cool, it will fall back to software. There is no hardware at the moment that supports noise. So instead of using the noise function, you can use a noise texture. Please take a look at the GLSL Showpiece examples, such as Gradient, Erode, and Marble. They use noise textures to simulate the same type of effect in the fragment shader, and they still run in hardware. Pretty important when you want speed.
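A minimal sketch of the substitution, assuming the application has uploaded precomputed noise into a 3D texture (sampler and varying names are illustrative, not Showpiece's actual code):

```glsl
uniform sampler3D NoiseTex;   // precomputed noise, filled by the app

varying vec3 position;        // assumed object-space position

void main()
{
    // Runs in hardware: just a texture lookup.
    float n = texture3D(NoiseTex, 1.2 * position).r;
    // Falls back to software on all current GPUs:
    // float n = noise1(1.2 * position);
    gl_FragColor = vec4(vec3(n), 1.0);
}
```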

So now I'm going to wrap up my presentation and hand it over to Rav, who will go into the Toy Shop demo, talk about how that all works, and cover the conversion techniques they used, as well as some other techniques. In summary: please test your shaders on the target hardware. Make sure they work as you expect. Sometimes it's okay if it falls back for vertex processing; if you only have four vertices, doing that on the CPU is not necessarily a bad thing.

Keep in mind that GLSL is easy to use. We're going to be at the lab at 3:30, so please come and visit us. Take a look at Showpiece. Take a look at the other examples that are on the developer CD, DVD. And try it out. Experiment with it. It's fun. As I showed you in Profiler, you can just mess with shaders and look at individual values, squish teapots. If you want to squish a bunny, OK, squish a bunny. Not on my time, but you know.

And lastly, as I was saying before, GLSL is the path forward for OpenGL. The newer hardware is going to support GLSL more and more: much larger instruction limits, more types of instructions. It's just going to get better and better from here. So now I'm going to hand you off to Rav, who will do the ATI Toy Shop demo. Good job, man.

Great, it's all sweaty now. Hi, guys. My name is Rav Dhiraj, and I work at ATI Technologies in the core software team. And Apple has asked me to come by and just talk to you a bit about the challenges that we face bringing GLSL to the ATI demo engine. Oh, I don't have to click.

All right, so we're going to talk a bit about those issues. I've separated it into two areas, the engine, sort of at the API level, and of course, the shader conversion itself. I'm going to discuss dropping to software a bit and then present a new shader converter tool that hopefully will make your lives a lot easier.

OK, so starting with the engine issues, the first is creating and binding shaders. There's a fundamental difference between how DirectX and OpenGL handle shaders. On the DirectX side with HLSL, you tend to deal with shaders as individual entities. You have your vertex shader, you have your fragment shader. You can bind them separately. You can mix and match them. You can query them separately for constant information and such. Now in the case of GLSL (and this is, as Nick was talking about, actually a good thing), we encapsulate all of this into the notion of a program. So you have vertex and fragment pairs, for instance, that are bound and linked together into a program. And this becomes a bit of an issue when your shader engine, or in our case the ATI demo engine, can mix and match these shaders arbitrarily, and even more importantly, when it can do so at runtime. You don't know, at the time you launch your application, which shaders are going to be matched together. So we got around this issue by encapsulating the notion of programs in a little object that we call the GLSL interface.

Essentially, it internally has a table that maps vertex-fragment pairs to programs. So every time you bind a unique vertex shader and fragment shader pair, it figures out: is there a program? Do I need to create a program? Do I need to create a constant mapping table? I'll talk a bit more about why we need that later.

OK, vertex attributes. There are two types. There are the predefined attributes that you use in GLSL shaders, such as gl_Vertex, gl_Color, and gl_Normal. These are set using the standard API calls glVertexPointer, glColorPointer, et cetera. And then there are the generic vertex attributes, which you can define as you please. Now, if you are going to use attributes -- and I think most of you will be -- you want to try to stick to one type: use predefined or use generic. In the ATI demo engine, we mixed and matched them, and it did make things a little more challenging. If you are going to use generic attributes, one thing you should keep in mind is that there are two ways of getting the location. One, as Nick pointed out, you can query for the location. The second way is to bind it to a specific location yourself, so you have a lot more control with the bind call. We actually took advantage of that, because it allowed us to more easily manage the mixed nature of our attributes.
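The two approaches might look like this in C, assuming a current GL context and core GL 2.0 entry points (the attribute name "tangent", slot 6, and the tangents array are all illustrative; the ARB-suffixed equivalents work the same way):

```c
/* Option 1: let the linker choose a slot, then query it after
   glLinkProgram has run. */
GLint loc = glGetAttribLocation(program, "tangent");
glEnableVertexAttribArray(loc);
glVertexAttribPointer(loc, 3, GL_FLOAT, GL_FALSE, 0, tangents);

/* Option 2: pick the slot yourself. Note this must happen BEFORE
   the program is linked to take effect. Explicit slots make it
   easier to manage a mix of predefined and generic attributes. */
glBindAttribLocation(program, 6, "tangent");   /* slot 6 is illustrative */
glLinkProgram(program);
glEnableVertexAttribArray(6);
glVertexAttribPointer(6, 3, GL_FLOAT, GL_FALSE, 0, tangents);
```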

All right, getting back to mapping constants. The ATI demo engine has a constant table that very closely maps to how things are kept in DirectX space. This is a bit of a problem for us: we need to be able to map these constants, which are dealt with as individual shader objects, into programs. So for a program that might use vertex shader A and fragment shader B, and another program that might use vertex shader A and fragment shader C, we've got to map to the correct program so we can find the location and thereby set the attribute -- or sorry, the constant. So how do we do that? We start by determining the description of the constant. And it's very easy, just like in DirectX. We can query for the number of constants by calling glGetObjectParameteriv and passing in the GL_OBJECT_ACTIVE_UNIFORMS enumeration. Once we have the number of constants, we can loop through and call glGetActiveUniform to get some information about each constant, including its name, which we can then pass, in our second step, to glGetUniformLocation to get the location in our particular program. And then we can use that to build our map tables, mapping the engine's constant table to the program locations.
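That query sequence, sketched in C with the ARB shader-objects entry points of the era (it assumes a current GL context and a linked program; engine_map_constant is a hypothetical engine call standing in for the map-table update):

```c
static void build_constant_map(GLhandleARB prog)
{
    GLint count = 0;
    /* How many active uniforms does this program have? */
    glGetObjectParameterivARB(prog, GL_OBJECT_ACTIVE_UNIFORMS_ARB, &count);

    for (GLuint i = 0; i < (GLuint)count; i++) {
        GLcharARB name[256];
        GLsizei   len;
        GLint     size;
        GLenum    type;

        /* Step 1: name, array size, and type of the i-th uniform. */
        glGetActiveUniformARB(prog, i, sizeof(name), &len,
                              &size, &type, name);

        /* Step 2: its location in this particular program. */
        GLint loc = glGetUniformLocationARB(prog, name);

        /* Step 3: record it in the engine's constant-mapping table. */
        engine_map_constant(prog, name, loc);  /* hypothetical */
    }
}
```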

Let's go to some of the shader issues. One of the big ones that you're going to hit almost right off the bat is type conversion. GLSL is a type-safe language; there are no implicit conversions unless you're talking about GLSL 1.20. That's a bit of an issue, because HLSL is not type-safe.

And so you want to remove ambiguity when you're converting your shaders, or your shaders are not going to compile. You can use constructors, and you can also specify components to do this. Here's a quick example. We have some HLSL code here, which in the first line assigns an integer to a four-component float vector.

Yeah, that's something we're normally very used to being able to do, but it's not something that's allowed in GLSL. And in the second line, we're taking that four-component float vector and assigning it to a three-component float vector, and who knows what HLSL is doing underneath. Is it giving us the xyz components? Is it giving us the zxw components? Who knows?

So GLSL enforces this, and we can fix it by, in the first case, using a constructor -- passing that integer into a constructor, and it builds a vec4 for us. And in the other case, we specifically specify the components that we want, and voila, problem solved.
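Reconstructed from that description (the variable names are illustrative, not the slide's actual code), the rewrite looks something like:

```glsl
void main()
{
    // HLSL allows:   float4 a = 1;   float3 b = a;
    // GLSL requires both conversions to be explicit:
    vec4 a = vec4(1);   // constructor turns the int into (1.0, 1.0, 1.0, 1.0)
    vec3 b = a.xyz;     // swizzle says exactly which three components
    gl_FragColor = vec4(b, a.w);
}
```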

All right, built-in functions: they exist, they're optimized, they're great, and we want you to use them. But keep in mind that there are some limitations, especially when it comes to function arguments. One of the issues we ran into was with the pow function. It is undefined if x is less than zero.

And when you have undefined results, your shader behaves very badly, and it's very hard to track down. There are probably three reasons why you'd have a negative value going into that pow call. One, your shader's doing something wrong, in which case you should really identify that and fix your shader. The second reason is that in DirectX, this is actually magically dealt with automatically due to some implementation differences; it's just clamping that for you internally, or it's just not noticeable in DirectX land. And the third reason is you might just get some precision errors resulting in small negative values.

And for the last two cases, you can fairly easily get around it by inserting a clamp to zero, or you can do an absolute value. Now, you don't want to just universally go through every function, figure out what the limitations are, and start doing stuff like this, because you're going to be adding extra instructions. So you really have to identify the cases where this is happening.
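Where such a case is identified, the guard might look like this; the specular-term setup is an illustrative assumption, not code from the demo:

```glsl
varying vec3 normal;     // assumed inputs; names are illustrative
varying vec3 half_vec;
uniform float shininess;

void main()
{
    // pow(x, y) is undefined for x < 0. Precision error, or DX-ported
    // code relying on implicit clamping, can let small negative values
    // sneak in, so guard the base explicitly:
    float n_dot_h = dot(normalize(normal), normalize(half_vec));
    float spec = pow(max(n_dot_h, 0.0), shininess);   // clamp to zero
    // ...or, where the sign is known to be spurious:
    // float spec = pow(abs(n_dot_h), shininess);
    gl_FragColor = vec4(vec3(spec), 1.0);
}
```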

A bit about texturing: there are going to be some cases where you will have to hand-tweak your shaders. An example is inverting a texture in the Y direction with FBOs and pbuffers. Yes, there are ways of getting around this in the API, but just as a simplistic case, you may have to deal with that.

Another example we ran into in the ATI demo engine is the 3Dc format, which is basically a normal map format. It's a two-channel format. In DX space, when you sample the texture, you get your values in red and green. In OpenGL, unfortunately, we're using a luminance-alpha format underneath, and so you get your values in red and alpha. So in the shader, we actually had to do a swizzle there. Likewise, there are some formats, like the signed 8888 and signed 1616 formats in DX, that you may have to perform scale and bias operations on. I believe we had to scale and bias the texture prior to loading it in GL, and then in the shader we have to do the opposite operation to get our signed value back. Okay, dropping to software.
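The 3Dc swizzle described a moment ago might look like this in the shader. The sampler name is illustrative, and the z reconstruction is the standard trick for two-channel normal maps, not necessarily what the demo engine did:

```glsl
uniform sampler2D NormalMap;  // 3Dc data uploaded as luminance-alpha
varying vec2 uv;

void main()
{
    vec4 t = texture2D(NormalMap, uv);
    // DX delivers the two channels in .rg; GL's luminance-alpha
    // format delivers them in .r and .a, hence the swizzle. The
    // * 2.0 - 1.0 undoes the scale-and-bias applied at load time.
    vec2 n_xy = t.ra * 2.0 - 1.0;
    // Reconstruct z for a unit-length normal:
    float n_z = sqrt(max(1.0 - dot(n_xy, n_xy), 0.0));
    vec3 n = vec3(n_xy, n_z);
    gl_FragColor = vec4(n * 0.5 + 0.5, 1.0);
}
```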

Now, there's a nice API to figure out if you're on the GPU, and we certainly encourage you to use it, but in some cases your only real-world metric is performance. So we really encourage you to have the ability to isolate your shaders: to be able to use OpenGL Profiler if you have to, or, within your own engine, to load individual shaders and measure the performance. The second point is just: be aware. Traditionally with shader programming, at least as far back as I can remember, you're working with assembly-level stuff. It's a black box: you have inputs, you have outputs. And it's very easy to get caught in the mentality that the shader you're looking at right now is where the problem is. But if your shader engine supports include files, you might have a uniform matrix array defined somewhere that gets touched by some function somewhere, and it gets compiled in, and boom, you drop to software. So keep in mind that you are now dealing with a high-level language, and you may have to look at include files.

From the API and driver's perspective, the reason you might drop to software is, well, you're exceeding resources. In fact, it's probably the only reason you're going to drop to software. There are APIs for querying how close you are to those limits; please use them and keep track of that.

The shader converter tool, HLSL to GLSL. This is something new that we've just developed. We didn't actually use it on the Toy Shop demo, but we have run it on a number of the shaders, and the results are very promising. It takes HLSL fragment shaders and their associated entry points, and it outputs GLSL shaders and an optional uniform list. Now, there are some current limitations. We don't support the non-square matrices that are introduced in GLSL 1.20, and there are other things, like complex structures with arrays and so forth, that we're definitely working on. We still think it's going to be useful to you. It's just a command-line tool, but we're going to try to get a version out by the end of August so you guys can start playing with it and give us some feedback.

Going forward, we would like to expose this as a dynamic library so you can link it with either your application or your tool chain. And it would be nice to have batch processing, and, this being a Mac, a GUI front end. So that's basically all I had about shader conversion. Now I'm going to talk a bit about the Toy Shop demo and some of the techniques we used. The Toy Shop demo is perhaps the most complex demo we've ever created; in fact, I know it is. It has over 700 unique shaders all running concurrently. And it was designed and developed by the 3D Applications Group at ATI, which is a group of about six developers and four artists -- pretty impressive, given the size of the team.

So we're going to start by showing you a little video. It's a technology video. I'd like you to focus on the rain effects in that video, and then we're going to go on and talk about a couple different techniques, the water droplet simulation and parallax occlusion mapping. So if we can just switch to the demo station. Let's get this going.

Hopefully we have audio. Simple. Make the viewer a part of the scene. A variety of high-end techniques were designed with the sole purpose of creating an immersive, detail-rich, real-time environment. And while these methods surpass convention, they are implemented efficiently within the current specs of the most cutting-edge popular video game engines. Welcome to the Toy Shop.

A novel post-processing rain effect was developed to simulate multiple layers of raindrops with a compositing pass over the rendered scene. Motion parallax for raindrops is created through projective texture reads. The rooftop puddle was created by combining many different layers of detail with the dynamically generated ripple simulation. This implicit integration for the fluid dynamic simulation is calculated entirely on the GPU. The raindrop particle collisions generate multiple interacting ripples in the puddles throughout the scene.

We simulate raindrop splashing by colliding individual particles with objects in the scene. An animated, high-quality milk drop texture sequence was used for the raindrop splashes. To generate raindrops falling off objects, we used physics-based particle systems with stretched normal-map quads. The stretching, based on particle velocity, created the illusion of motion blur. The illumination from the lightning system accurately affected refraction and reflection in the droplets, adding further realism to the scene. For the street puddles, a texture masking method was used to control puddle placement and depth in order to minimize the complexity of the street geometry. The street puddles utilize the same GPU-driven ripple simulation as the rooftop puddle. An offline raindrop simulation system was adapted to the GPU to render water droplets trickling down glass panes in real time. This system allows us to replicate the quasi-random meandering of raindrops due to droplet velocity, mass, and surface tension.

Parallax occlusion mapping is one of the most complex rendering techniques in the Toy Shop demo. This efficient algorithm provides a real-time, per-pixel ray tracing solution for inverse displacement mapping, with dynamic lighting, soft shadows, self-occlusion, and motion parallax. The technique generates smooth LOD transitions without visual artifacts and maximizes visual quality by dynamically scaling the sampling rate of the parallax occlusion mapping computations. Each technology in the Toy Shop demo adds detail to the scene. Each additional detail changes the way you experience the environment. This attention to detail seduces the viewer and delivers a lasting impression of the limitless expression of real-time graphics. Great, can we switch over to the slides?

So now that we've listened to Agent Smith, we can carry on with our presentation. The only thing I want to say about rain is that, as is evident, it's not just a particle system; we actually had to combine a number of techniques to get that sense of realism. All in all, it was over 300 shaders just for those effects. So that's basically it for that. A little bit about the water droplet simulation. I'll preface this by saying that each of the techniques I'm going to describe could fill a whole hour, so we're going to go through this very quickly. All right. First of all, this is a modification of a technique by Kaneda et al. from 1999, in a paper called "Animation of Water Droplets Moving Down a Surface." Now, of course, our biggest modification was moving it all onto the GPU. So we have a surface that's represented by a lattice of cells. Each cell contains a mass value and a velocity in the X and Y directions; it's a 2D plane. Now, all this information is actually very convenient to pack into a 16-bit RGBA texture.

In our case, we actually used an FBO, because we need to read from and write to that buffer. So there are a couple of forces acting on these droplets. There's the force of gravity pulling a droplet down, of course, and there's a competing force of friction resisting it. The friction force was actually artist-generated, in a friction map of sorts that essentially modeled the surface imperfections. It allowed us to create almost random behavior as the water runs down. The resultant velocity is then generated from the initial velocity and these forces.
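The cell layout being described might be packed like this; the channel assignment is an assumption, not ATI's actual layout, and the update step is elided:

```glsl
uniform sampler2D CellState;   // 16-bit RGBA texture bound to an FBO
varying vec2 uv;

void main()
{
    // One texel per lattice cell.
    vec4 cell = texture2D(CellState, uv);
    float mass = cell.r;       // water mass in this cell
    vec2  vel  = cell.gb;      // velocity in x and y
    // ...update mass and velocity from gravity and the friction map,
    // then write the new state back out to the FBO:
    gl_FragColor = vec4(mass, vel, cell.a);
}
```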

So the droplet can flow into one of the three cells below it, and this is, again, based on that velocity. Now, one of the things we did model was the affinity of water to flow into a cell that already has water, so we actually added that bias into our calculation. Once the simulation is complete and we have all the mass values, we can derive a bump map, which we then use to perturb the reflection and refraction vectors on our surface. We can also use this mass map to generate dynamic shadows on the objects that are close to the glass surface, in this case the toys in the toy store. All right, so perhaps the most defining feature of this demo is parallax occlusion mapping. Let's go into a bit about that. First of all, our objectives.

We basically want to have our cake and eat it, too. We want really complex surfaces without the cost of transforming them and without the associated memory costs. And as if that weren't enough, we also want this thing to behave correctly: we want to be able to look at it and see the original geometry underneath, or at least think we see it.

So we can kind of do this already. We have bump mapping, we have normal mapping. These are techniques that have existed for a while now, and you're all familiar with them. But they do have a few limitations. They don't exhibit correct parallax behavior. We don't get self-occlusion or self-shadowing. And the illusion tends to break down at silhouette edges.

So what do we do? We introduce parallax occlusion mapping. At its heart, it's a per-pixel ray tracing algorithm. It's great because it handles all the viewing phenomena we're interested in: motion parallax, self-occlusion, self-shadowing. It allows for a very flexible lighting model. And, importantly, it supports a flexible LOD system, because this is not a cheap operation. I'll talk a bit more about that.

And the last point there is that because all the computations occur in tangent space, this can be applied to almost any type of surface, which is great. So, a little more detail about the algorithm. Let's start with a cross-sectional view of our polygonal surface, represented by the top white line. The bottom line is basically the maximum displacement we're going to allow. So consider this extruded surface as an example.

Given an input texture coordinate, this would be the displaced point with normal mapping. Nothing fancy so far. Now, with parallax occlusion mapping, on the other hand, what we want to do is we want to calculate our displaced point by casting a ray from our viewpoint through that input texture coordinate and then calculating the intersection with our extruded surface. And then from that intersection point, lo and behold, we get a texture offset into our normal map that we really want.

All right, how do we calculate that intersection with our extruded surface? Well, we start by calculating a parallax offset vector. This is derived from the tangent space view vector and is essentially the offset in texture space. What we do is we then perform a linear search by sampling the height field, which is the extruded surface, as a piecewise constant function-- sample, sample, sample-- until we find our intersection.
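A hedged sketch of that linear search as a fragment shader; every name here is illustrative, the shading at the end is a placeholder, and the real Toy Shop shader is considerably more involved:

```glsl
uniform sampler2D HeightMap;    // the height field (extruded surface)
uniform sampler2D NormalMap;
uniform float     HeightScale;  // maximum displacement

varying vec2 uv;        // input texture coordinate
varying vec3 view_ts;   // tangent-space view vector

void main()
{
    // Parallax offset vector: the full offset in texture space.
    vec2 p_offset = (view_ts.xy / view_ts.z) * HeightScale;

    const int n = 16;               // sample count (fixed in this sketch)
    float step_h = 1.0 / float(n);

    float h  = 1.0;                 // ray height; 1.0 = top surface
    vec2  tc = uv;

    // Treat the height field as a piecewise constant function:
    // sample, sample, sample, stepping the ray down until it dips
    // below the surface.
    for (int i = 0; i < n; i++) {
        if (texture2D(HeightMap, tc).r >= h)
            break;                  // intersection found
        h  -= step_h;
        tc  = uv - (1.0 - h) * p_offset;
    }

    // tc is now the offset into the normal map that we really want.
    vec3 nrm = texture2D(NormalMap, tc).rgb * 2.0 - 1.0;
    gl_FragColor = vec4(nrm * 0.5 + 0.5, 1.0);  // placeholder shading
}
```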

Now, there are issues with sampling-based algorithms such as this. The most obvious one is the aliasing artifact that I hope you guys can see; it's a sort of staircasing artifact there. How did we get around that? Well, we actually changed the sampling rate dynamically based on, A, the surface topology, and, B, the view vector. And that's the calculation we use to determine how many samples to take. The great thing is that in GLSL, we can take advantage of the dynamic flow control that Nick just talked about to iterate n times for each pixel, so we can actually change that on a per-pixel basis. The last thing I really want to talk about is adaptive level of detail and how we applied it to the parallax occlusion mapping shader.

I mean, it's not a cheap shader. It's very computationally expensive. We're doing dynamic branching, and we're looping up to 64 times, I believe, at the max. So we really need techniques to speed this up. One is to use the MIP map level to determine whether we should fall back to normal mapping, which is a much cheaper technique, or whether we should use parallax occlusion mapping. And of course, we can blend between the two to create a transition. In addition to that, we can also apply a level-of-detail scheme to the algorithm itself: based on the MIP level, we can dynamically adjust the sample rate. So that's kind of cool. Actually, one more thing. I kind of lied. I'm not going to talk very much about ARB Shader Texture LOD (or ATI Shader Texture LOD, as Nick likes to call it). I just want to say that we did use it for our gradient computations. It just recently went into Tiger. It works. Definitely, if you get a chance to use it, it's fantastic, and we'd like to thank Apple for that. And that's basically it for my talk. I'm going to hand it over to Alan now, who's probably going to start up the Q&A section.

And just to let you know, we will be in the OpenGL lab as well. We're hoping to get the Toy Shop demo running by the end of this presentation; I'll go try to set that up right now. But if you don't get a chance to see it then, try to come by the lab. So thank you, Rav. Not a problem.