Graphics and Imaging • 58:24
One of the exciting new developments in computer graphics is the ability to create programmable per-pixel effects using the display card's GPU. This session covers the variety of different pixel/fragment programming techniques and discusses how to create incredible 2D and 3D visual effects.
Speaker: James McCombe
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
Good morning, everyone. My name is Travis Brown. I'm the graphics and imaging evangelist, and I want to welcome you to session 208, Fragment Programming with OpenGL. This is our second session on programmability, in terms of our focus on exposing the capabilities of the GPU to do interesting things like vertex operations and, in this case, very advanced per-pixel operations at incredible speeds. A lot of you saw some demonstrations using fragment programs earlier this week in the graphics and imaging overview. What we're going to do in this session is really drill down and focus on fragment programming. And we're really looking forward to seeing what you, the developer, are going to be able to do with this incredible new technology. So it's my pleasure to invite James McCombe to the stage to take you through the presentation.
Morning, everyone. Thanks, Travis. So today I'm going to introduce you to this pretty new technology that we have. We have a really great implementation of it at Apple, combined with some really pretty cool tools that should make this easier to do than it would be, perhaps, on some of the other platforms. Before I start, I just want to point out, because a lot of people have noticed this: if you think my spelling's wrong, I don't think it is. It's just that I'm using the UK spelling for a lot of this stuff. I was criticized a lot about this, so I'm going to warn you now. Anyway, first things first: why would you want to do fragment programming? Well, it has so many applications, which is one of the most exciting things about it. Not only can it be used for 3D lighting calculations and the like, but you can also use it for 2D work. Think of the Photoshop filters: you can implement 2D effects, even colour correction style things, using these programs. You can do 2D displacements, all sorts of interesting things, things that you maybe wouldn't even think of, such as using the GPU for general purpose computation, even thinking of it as a second processor in your computer. The other great thing about it is you can load this program up on your card, and you can open up your CPU monitor, and surprise, surprise, you'll notice it's taking no CPU time.
That's a pretty compelling reason to offload some of your intensive per-pixel calculations onto the GPU. And then my third point: Apple, I think, definitely provides the best tools in the industry for building shaders specifically. Last year, I was here and showed you vertex programming and showed you how to use Shader Builder to do that. Well, Shader Builder has had a lot of work done to it and now has support for new languages and a lot of new features, which I'll cover later on in my presentation. So to run through, I've got seven points here on what you're going to learn. Where does fragment programming fit in the OpenGL pipeline? We'll also take a look at whether the graphics hardware you currently have allows you to do this. If so, great. If not, what do you need to buy next? The next thing we'll look at is ARB_fragment_program. This is the language that the Architecture Review Board approved for fragment programming. It's a great language.
I'm going to talk more about it later. If you don't have the absolute cutting edge and you weren't willing to go that far, we'll look at what sort of consolation prizes we have for you. Then we'll run over Shader Builder: I'll open it up and show you a basic fragment program, and we'll work our way forward from that to show you just how the language works. Then I'm going to show you something kind of interesting with fragment programs, which is using the GPU for general purpose computation, and then I'll move on and show you how to get multi-pass working in the most optimal way with our OpenGL implementation at Apple. Then I'll look at how optimization applies to fragment programming. As programmers, we often have to think about where to optimize; we'll see how that applies to this slightly different way of thinking about software. And then finally we'll move on to something a little more math intensive and look at implementing per-pixel lighting in a fragment program. I'm going to pull that apart and hopefully remove a little of the mystery that may surround it. So, starting off with my first point: where does it fit in the OpenGL pipeline? This has been covered pretty extensively in some of the prior sessions, and most of you, I think, will be familiar with the traditional OpenGL pipeline, so I'm going to try and move quickly on this slide. You have your vertex data and you have your pixel data. Vertex data can enter through immediate mode calls or vertex arrays. Pixel data usually comes through the TexImage calls. Or those could come from display lists that you have already prepared at your program initialization. Pixel data goes through all the pixel store stuff, gets unpacked, and makes its way towards the graphics card.
Now, vertex data traditionally goes through a very fixed function: the vertices you submit are simply multiplied by the modelview and the projection matrix inside of OpenGL. Then they're clipped into the window coordinates, and then OpenGL will make use of the GL lights that you've enabled and calculate the vertex colours based on a fairly primitive lighting model on a per-vertex basis, so the colours are interpolated across the polygon surfaces, which we all know doesn't look so good. After that stuff is calculated, the rasterizer goes along and colours in the frame buffer with the pixels. And then this last part, the blue dotted line that's just come up, indicates the process of feedback: taking your frame buffer and eventually bringing it back so it can be fed into a texture unit again for multi-pass operations. So I've just pulled out the fixed function transform, clipping, and lighting, and I'm also going to pull out the fixed function fragment operations. Vertex program, which has been around for longer, replaces that first part with a programmable language. And today I'm going to talk about replacing this part with ARB_fragment_program. So, supported hardware. This slide I'm going to amend slightly at this point.
The card to have for ARB_fragment_program is the ATI Radeon 9700. But the good news now is that all of our new high-end systems, the new G5 boxes, will support ARB_fragment_program. So if you've been thinking about buying one of those, maybe this is another reason you might want to look into that. And then there's ATI_text_fragment_shader. This is a significantly less capable language, but something that you might want to look at, and I will give you a little bit of information on it later. That's supported on some of the lower-end cards; for instance, our current 15-inch PowerBooks, the latest ones, do have a graphics card that's capable of vertex programming and supports this ATI vendor-specific extension.
So, what exactly is a fragment program? Just out of curiosity, how many people here have actually been writing fragment programs recently? How many people have done it before? Okay, that's good, great. So what exactly is it? Basically, this little program that you upload to the graphics card is run once for every colour lookup during rasterization.
Basically, whenever the rasterizer is going across and filling in the pixels on the surface of the polygons, normally it goes through a fixed function which can get pixels from the different texture units and blend them together. This allows that process to be totally programmable. So yes, you could go and get the data from the texture units, or you could just return any colour you want based on a mathematical function. Now, the output of the fragment program is very well defined: a single RGBA colour, and optionally you can specify its position in the depth buffer, so it can be culled by later stages in the GL pipeline. With regards to inputs to the fragment program: there are the texture coordinate channels, which you would normally set up in the vertex program. When you write an output vertex, you can write into the texture coordinate channels, and the number of channels is equal to the number of texture units on your graphics card. The values you write in there in the vertex program are interpolated across the surface that the fragment program is running on, so you get that interpolation for free. So that's one of the inputs. You also get the interpolated vertex colour and the position of that fragment in the window coordinates. Also, with ARB_fragment_program you can access all of the OpenGL state, which is certainly very convenient. That means you get access to the matrices and also the program parameters, which I'll discuss. And then the other important input is that you can sample texels from any of the texture units, which is important also. Let's take a look at the specifics of ARB_fragment_program. Great news here. If you've programmed ARB_vertex_program, the language is totally parallel to that. It was intentionally designed that way, so if you know ARB_vertex_program, you know ARB_fragment_program as well.
The only difference is that instead of dealing with XYZW components to specify a vertex position in 3D space, you're now producing red, green, blue, and alpha components for an output colour. So the same thing applies to all these instructions, with a few exceptions. 90% of these instructions are vector instructions, meaning if you do an add, it does the add for all four components. Some of these instructions, which we'll talk about later, are considered scalar; for instance, the POW instruction, to the power of. That's a scalar instruction, meaning it only deals with one component. So if you want to do a power for all four components, you need four instructions. That's one thing. So I've grouped these together: those are all your arithmetic instructions. You've also got some logic instructions, like other languages.
Unlike programming a general-purpose CPU, don't be expecting to be able to do bit shifts and things like that. That isn't going to happen. So you won't be able to do shift left, shift right, any of that. But you do have some basic logic to determine if something's equal to zero or not, or greater or less than zero. The other thing, then, is the important instructions that ARB_fragment_program adds: the texture sampling instructions. These allow you to get things from the texture units, with TEX being the most common one. And then you have some miscellaneous instructions, as in vertex program.
You have a swizzle instruction, so you can reorder the components in a vector, and you have this KIL instruction. I haven't actually used it yet in a fragment program, but what it's supposed to do is stop a fragment from continuing down the OpenGL pipeline. It literally just kills it off. So there might be some possibilities for you with that instruction. So, input fragment program attributes. I've already actually covered this one, a bit of duplication here, but it still needs to be in the slide: colour, texture coordinates, and also the fog coordinate. Then again, the OpenGL state matrices, lights, materials, these are all accessible in the fragment program.
Then you've got these convenient little program parameters, which mean that you can make a fragment program that is parametric. For instance, if your fragment program is implementing a lighting model, you want the position of the light in 3D space to be a parameter of the fragment program.
You can define a variable in your fragment program, and then there is a new OpenGL entry point which you call, and you can hand in four floating point values, and those four values get uploaded to the card. When you flush your next scene, those get applied. And then finally, your outputs: result colour and depth. Advantages of ARB_fragment_program: portability. This is the standard now. So if you're programming in this, I think you'll find you're in good shape, because a lot of vendors are going to pick this up, and our high-end machines support it. I think it's a good bet to go with, and the ARB approved it, which always helps. And also the instruction set is very rich in comparison to some others I'll show you, which lets you implement better models for lighting or whatever.
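As a sketch of what that entry point looks like in code: the ARB program extensions expose glProgramLocalParameter4fARB for exactly this. The parameter index and values below are illustrative, not from the demo.

```c
/* Sketch: upload four floats to local parameter 0 of the currently
   bound fragment program.  In the program text this is picked up
   with "PARAM p = program.local[0];".  Values are illustrative. */
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB,
                             0,                  /* parameter index */
                             1.0f, 0.5f, 0.0f, 1.0f);
```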
And also, you don't need to generate so many frustrating lookup tables for instructions, because they're just natively supported on the hardware. And also flexible texture sampling. There are certainly hardware design challenges to certain aspects of this, which I'll discuss a little later. But with ARB_fragment_program, you can do many, many samples. You can have them depending on each other and so on. It's really cool. The next thing I want to cover is there to support some legacy hardware. This is a vendor-specific extension. I've certainly written plenty of fragment programs in it. It's certainly quite usable, but it's not very pleasant to program in.
So it allows basic fragment programming. There is a fair amount of current hardware support. Shader Builder does fully support this language, so you don't lose out. It's not like we aren't going to provide a developer tool for you. We do provide all the tools. We just discourage you from using it unless you absolutely must. So now, for those of you who saw Shader Builder last year, I'm going to show it to you again, except this time it's quite a bit different and has support for new stuff.
Here's a screenshot of it right here. This is the new layout of Shader Builder. The top left of the window looks pretty familiar to those of you who have seen it before, except you'll notice that now the rendering window, the OpenGL view, is actually detached on a separate pullout window. That's convenient because the multihead users out there will be able to drag it onto a separate display head, and they'll be able to code on one screen and see their graphic full screen on the other. Another major thing that Shader Builder adds is the ability to monitor the actual state of the graphics card in terms of how much of its expendable resources you're using. You'll notice in the top leftmost window, just above the debugger buttons, these little gauges showing how much of the card's resources you're using. As you put more instructions in, that's going to keep growing until it hits the top. If your fragment program stops working, well, at least now you have a fighting chance of determining why: you filled up your graphics card and you need to buy a new one. And then the other great thing here is, like before, on the right just above the resources and performance box, you can see this list of identifiers.
When you're writing your program and you declare an identifier, it will actually take that identifier and put it in this list for you. That means that when you're running the program, if one of those identifiers is a program parameter, you can pop open this great symbol editor, which is the bottom rightmost window, the metallic one. You can click on a parameter and slide it, and that will actually change the parameters on the card, and you can have any number of those parameters up to the hardware limits. That has some interesting side effects, which I'll cover later, which are pretty exciting. So just to summarize: ARB_fragment_program support, we covered that. There's more control of the texture units in this Shader Builder. Program parameters can be changed on the fly. You've got the resource monitor. It's a better solution for the multihead folks, because the rendering window can be put on a separate head. The underlying code has had some rework for better code sharing with the OpenGL Profiler. Documentation has improved, so there's full documentation for all of the languages, with a pretty nice UI for that. And it ships with Panther, so it's all on the CD that you got. In terms of example code, that's available on the website. So, let's see.
Let's switch over to the demo machine and let me show you Shader Builder here. Demo two, actually. Great. So, what we have here is, like the screenshot I showed you, on the top right of the window we have the rendering view, which I can move around. At the moment, it's just showing a simple quad, which is really what you want if you're just writing a fragment program. You want to see the pixels. You don't really care about the geometry so much for a lot of the things you'll be doing. On the bottom right, again, you've got the texture units inspector. If I click here, this shows me that in texture unit zero I have this rock texture loaded, and in texture unit one I have the water texture, which is what's currently being shown. Now, let's look at the code here. We can see basically what's going on. This is a very simple pass-through fragment program. It's doing nothing special. Remember, the program's running once per fragment. I beg your pardon?
Oh, it looks okay here, it must just be clipped. Sorry about that. Is that better? Okay, sorry about that. So what it's basically doing: this is running once for every pixel, let's just think of it that way. The TEX instruction basically acquires a colour: it gets an RGBA colour from a texture unit at a specified texture coordinate. So we can see that I have a temporary variable declared, which I've just called t0, and I am sampling from texture unit one, that's the second argument, into t0. And this fragment.texcoord that you can see, that is the interpolated texture coordinate it's using for the lookup. Then, in the last instruction here, I'm simply moving t0 into result.color. result.color is a fixed thing in the language which defines the output colour you're writing. Here's something I can do. This texture one, that's the texture unit that I'm sampling from. So as I change that, you can see it's changing. That's very straightforward. Let's look at something a little more interesting. Say we wanted a fragment program that would allow a program parameter to adjust the brightness of the image, and do it all in hardware. How would you do that? Okay, let's create a new line of code right here.
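The pass-through program being described can be sketched as a complete listing. The texture unit and temporary name match the description above, though the exact code on screen may differ:

```
!!ARBfp1.0
# Minimal pass-through: sample texture unit 1 at the interpolated
# texture coordinate and write the colour out unchanged.
TEMP t0;
TEX t0, fragment.texcoord[0], texture[1], 2D;
MOV result.color, t0;
END
```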
So what we've done is we've declared a parameter p, which is bound to a program parameter, meaning that there's now a GL entry point that I can call with an index of 0, and I can pass in four floating point values, which are sent to the card and used in this program. So let's implement a brightness control. Pretty straightforward: a simple multiply will do it. So let's do a multiply. Remember, it's like assembly here. If you've written vertex programs, you'll be familiar: it's the destination first, and then the arguments. Multiply takes a destination and two arguments. So we multiply t0 by the X component of p. That's the last little change to make. Now, right now, the brightness is at the bottom, so we're seeing it as black, but watch this. If we move over here, you notice p is in this identifier list. I'll select that, and I will open up an inspector right here. It's the symbol editor.
This allows me to change those values. So remember, the brightness is stored in the X component. So I'm editing P, and you can see right here where my mouse is, I can change the minimum or maximum value this slider will go to, and I want to set the range to go from zero through to two, just to keep reasonable values.
And as I slide this right now, you can see it's sending that value to the graphics card, and there's now a brightness control implemented in hardware. That's pretty straightforward stuff. Now let's say I want to implement a contrast control as well. I'm going to do a pretty cheesy implementation of contrast here: I'm just going to raise the colour to a power, where that power is the contrast. And as you may recall, I mentioned that the POW instruction is a scalar instruction. Note I only had one multiply instruction, yet it multiplied the red, green, and blue components for me. The POW instruction will not work like that, so I'm going to need three of them, one for each channel. So I'm going to do that. So I'm dealing with the red channel here. Red channel dealt with.
And you notice that Shader Builder is keeping up; like before, as I type, it's all real time. It's updating on the fly, and it's showing me all the syntax information. It's highlighting the line that the error is on. It's just great to work with. Okay, we now have a contrast control. I now take the Y component and set the range like I did the other one; zero through five happens to work well here. It's totally low contrast right now, hence white, but notice as I increase the contrast here, it's becoming more and more contrasty. And again, the brightness control works as before. So this is a simple example of writing a fragment program and showing how you get program parameters into it. So let's head on with the presentation here. Back to the slides.
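Putting the whole demo together, the finished program might read roughly as follows. The bindings (p.x as brightness, p.y as contrast) follow the description above, but the listing itself is a reconstruction, not the exact demo code:

```
!!ARBfp1.0
PARAM p = program.local[0];     # p.x = brightness, p.y = contrast
TEMP t0;
TEX t0, fragment.texcoord[0], texture[1], 2D;
MUL t0, t0, p.x;                # brightness: one vector multiply
POW t0.r, t0.r, p.y;            # POW is scalar, so contrast needs
POW t0.g, t0.g, p.y;            # one instruction per colour channel
POW t0.b, t0.b, p.y;
MOV result.color, t0;
END
```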
So I'm going to try and keep the rest of this presentation pretty example centric and get you thinking here. So let's look at 2D image displacement. Think here of the Photoshop filters that allow you to do, for instance, the twirl effect. That's what we want to be thinking about. It's another example of how to offload stuff to the GPU. Performance is excellent. And it could make for interesting scene transitions in a game or something. There are certainly possibilities.
A little background on how this is going to work. I'm going to show you how to do a twirl effect in a fragment program, in a single pass. The way it works is: imagine a texture map where the colours of the pixels actually define movement vectors, displacement vectors. What I'm showing here on the left of the slide, pardon me, is a very low resolution two-by-two texture, which would do a very coarse twirl effect. For the top leftmost texel, the one that looks kind of red, that's a vector pointing to the right, and it's encoding the X and Y in the red and green components, which you can see there. And you can see it's showing that there's a twirl going on. In my fragment program, I'll be able to sample those texels and interpret them as vectors, not colours. So think of texture units as just being lookup tables at this point.
Think of them beyond what you've used them for before. An important thing to note: colours in texture units have to be between zero and one, in positive space. But I want these vectors to have full freedom of motion. So what I'm going to need to do is pack them into the texture units in a slightly different way, because I want them to go from minus one through to one. Simple equation: I just divide by two and add 0.5, pushing them into positive space, keeping them between zero and one. In the fragment program, I will do the reverse of that to unpack them. So that's what the displacement map is. So back over to the demo machine here, where I'm going to open up the other example, which I have laid out here.
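The packing arithmetic can be written down directly; unpacking in the fragment program is then a single multiply-add. The instruction form below is a sketch, with the texture unit assignment assumed:

```
# Packing (done on the CPU when building the displacement map):
#   stored = v * 0.5 + 0.5        maps [-1, 1] into [0, 1]
# Unpacking in the fragment program with one MAD:
TEMP vec, colour;
TEX colour, fragment.texcoord[0], texture[1], 2D;
MAD vec, colour, 2.0, -1.0;     # vec = colour * 2 - 1, back to [-1, 1]
```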
Let me make it so you can see what's going on. Okay, the same problem again, huh? Okay, there we go, thank you. So, switching to the fragment program: it's a pretty straightforward fragment program. It does much what I described, if you follow the comments. Let's take a look at what's in the texture units. In texture unit zero, we have this rock texture, which is the texture I'm going to displace, I'm going to warp it. And then for texture unit one, I wrote a little program that basically generated a spiral sort of effect, encoded those vectors as colours, and put them into positive space, which is what you're seeing in texture unit one right there. So my first instruction here is the TEX instruction, which samples the displacement map. I'll just highlight that right here. And then it does what I described, which is scaling it from colour space into a vector, which has negative and positive components. And then I have a program parameter, a bit like what I showed in my last example, that allows me to control the magnitude of the effect. So by multiplying that vector with a normalised magnitude, I can control how much of a spiral effect is going on. Then I add it to the original texture coordinate. So I'm displacing the texture coordinates here; that's how I'm doing the lookup. I'm actually changing the texture coordinates for each fragment. That's how you do the warping effect. And then finally, when I have a texture coordinate, I sample from the rock texture map in texture unit zero. So let me show you what it looks like. Here I select the magnitude parameter in the list, and I open the symbol editor again. And as I slide this, you'll see there's actually a spiral effect going on. It might not look as good as the Photoshop filter, but the reason for that is simply that the displacement map I put in was crudely generated; it was a simple program. You can imagine how this same code is totally generic.
You could create a displacement map that was a sine wave filter or any number of other displacements, and with this same fragment program, all I'm doing is sending four floating point values across the bus; everything else is running on the graphics card. I think that's pretty exciting.
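The displacement program described here might be sketched like this. The texture unit assignments and parameter binding follow the description above, but the listing is a reconstruction, not the exact demo code:

```
!!ARBfp1.0
# texture[0] = image to warp, texture[1] = displacement map,
# mag.x = effect magnitude (program parameter).
PARAM mag = program.local[0];
TEMP disp, coord, colour;
TEX disp, fragment.texcoord[0], texture[1], 2D;   # sample displacement map
MAD disp, disp, 2.0, -1.0;                        # colour space -> [-1, 1]
MUL disp, disp, mag.x;                            # scale by magnitude
ADD coord, fragment.texcoord[0], disp;            # displace the coordinate
TEX colour, coord, texture[0], 2D;                # dependent lookup
MOV result.color, colour;
END
```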
So let's see now. Okay, yeah, let's move on. Back to the slides again. So now we're going to move on to general purpose computation in a fragment program. Given a new piece of hardware, it seemed utterly correct to me to implement the Game of Life on it. So we can say thank you to John Conway for coming up with this interesting scheme, and he did it quite a while ago, too. So yes, it can be done using a fragment program.
It's a natural fit for a fragment program because it's a totally parallelizable problem. This is the great thing: if you have a mathematical problem which can be parallelized, a fragment program is a good place to do it, as long as you don't have too many instructions to fit on the hardware. It's also a great example of multi-pass, because when you calculate one frame of Life, you need to feed it back into the input and run it again, and you keep doing that for every time quantum, until either it stabilizes or it keeps going. And yes, it runs extremely fast. It's really cool, and I'm going to show you that very shortly. A refresher course on the Game of Life, for those who haven't looked at this in your daily jobs. The configuration we see on the left of the display is a sort of famous configuration in Life called the R-pentomino. It's interesting because a lot of initial Life setups will die out and you'll just end up with a black screen. Well, this one does not. It runs and looks interesting for, I think, like 60 time quantums or something, and then eventually it stabilizes. What Life is: imagine a grid where each cell is either alive or dead. There are no gray zones. So think of it as: we're running a calculation for every single cell on the grid, and we want to figure out whether the current cell is going to be alive or dead in the time quantum we're trying to calculate. First of all, you sum up the total of the living neighbours in the surrounding eight cells. That's the basic equation on the left: N neighbours, summed together. When you have the total number of neighbours, you apply a very simple, consistent rule, which I've outlined here on the right.
If your current cell is dead and it has exactly three living neighbours, it becomes alive. You turn yourself on. You're alive. If you're currently alive and you have two or three neighbours, you will stay alive. (I actually had that the wrong way around on the slide, but I'm sure you can read the code.) Otherwise, you're sort of fighting for resources, and you'll die. That's the basic principle. Run this for every pixel, do it per frame, and you'll get these interesting little colonies growing and so on. So let's take a look. I'm not going to show you how I did this in Shader Builder; I figured it would probably be more intuitive just to show you on the slides here, building it up. So let's take a look at the fragment program that implemented this. First things first, declaring our variables. The important thing to note for Life is that we need to be able to address each of the pixels individually in the fragment program. And remember that in OpenGL, if you're using 2D textures, the texture coordinates are normalized.
So the leftmost is zero, the rightmost is one. If we have a 256 by 256 image, you need a scale factor to multiply by to be able to address each of the pixels accurately, and that's very important in a calculation like this. The next thing we do is work out the coordinates of the surrounding eight cells. Remember, this fragment program is currently determining the destiny of one cell: is it going to be alive or dead? So it needs to find out the coordinates of the surrounding cells.
So those are the instructions to do it. It's pretty straightforward what's going on there. Next, now that we've got the coordinates of the surrounding cells, we need to sample them. So we're using eight texture sample instructions here to get the colours of those surrounding pixels, the surrounding cells. When we have that, we add them together; it's very straightforward. And then we implement the Life logic. Now, this was actually a little tough because of the limited number of logic instructions.
There are no ifs, nothing nice like that. I had to use just greater-than-or-equal and less-than style comparisons and move the results around. I implemented the logic using that. It's not very clean, but we'll look at a better solution in a minute. And then finally, when we have the destiny of that cell as alive or dead, we write it to the output colour.
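One way to express those rules with only comparison instructions is sketched below. That n.x holds the neighbour sum and c.x the current cell state (0 or 1) are assumptions about the surrounding code, and the register names are made up for illustration:

```
TEMP born, live, t;
# born: exactly three neighbours (alive or dead, you come alive)
SGE t.x, n.x, 3.0;            # n >= 3
SLT t.y, n.x, 4.0;            # n <  4
MUL born.x, t.x, t.y;
# survive: exactly two neighbours, and currently alive
SGE t.z, n.x, 2.0;            # n >= 2
SLT t.w, n.x, 3.0;            # n <  3
MUL live.x, t.z, t.w;
MUL live.x, live.x, c.x;
# the two cases are mutually exclusive, so a plain add combines them
ADD t.x, born.x, live.x;
MOV result.color, t.x;        # 0 or 1 becomes the cell's new state
```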
This runs in parallel for every pixel automatically. So, okay, we've calculated what the Life grid is going to look like after the first time quantum; now we need to feed that back in for the second calculation. So how do we do that? Well, OpenGL has provided the API for this for a very long time. On our platform, it's extremely fast. It's really cool because the data does not have to travel across the bus; the frame buffer can be fed back into a texture unit without even leaving the graphics card. So during the process of feedback, your CPU will show nothing happening and your bus traffic will be zero, because it's all on the graphics card, if you follow these simple instructions. First, you do a glReadBuffer, which defines where GL is going to read data from. You would set that to be GL_FRONT_LEFT. You could make it an auxiliary or a back buffer, but for simplicity I just made this a single-buffered application, so I read from the front buffer. When you have the read buffer set, you do a glCopyTexSubImage2D, which basically grabs a rectangle from that buffer and puts it into a texture target.
And again, that covers pretty much what I've said there. And that's a little code snippet. It's only two lines to do it. And it's very fast. Now, the moment you've all been waiting for: what does life look like when it runs as a fragment program? So back to demo two here. Let's say goodbye to Shader Builder for a moment. And open that up. Okay, watch it, 'cause it is quick.
Basically, this configuration started off as three of those little R-pentomino configurations I described. And you can see it's sort of entered a stable state right now. There's a little bit of movement, but it's totally stable right now. So that was life running on the graphics card. It's just kind of interesting.
Can we cover it at the end? Thanks. Okay. Now, let's head back over to the slides here, and let's look at how optimization applies to this new environment. That fragment program I showed you was hideously unoptimized. It was long, and it had lots and lots of texture sample instructions. Those are the very things that you really don't want to be doing in a fragment program, because yes, it's very quick. It's very, very quick, but it could be quicker. And remember, I had a small window, and I was only texture mapping the face of one polygon. If I was in a complicated 3D game environment, which I'm sure some of you work on on a daily basis, performance really, really matters. And we're going to look at how you can make big improvements in your fragment program performance.
Shader Builder is going to help you here a lot, I think. Let's first look at some of the tricks that we can do. The first thing is: instead of calculating things in the fragment program, if it's something that doesn't absolutely need to be calculated on a per-pixel basis, remember you've still got your vertex program running. That's running for every vertex. Now for that example, there were only four vertices defining a quad, but many, many, many fragments. So why not calculate it on a per-vertex basis if it doesn't need to be as high resolution? You can then pick up that data in an interpolated form in the fragment program, as those values will be interpolated across the surface.
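As a rough illustration of what that interpolation buys you, here's a Python sketch of the bilinear interpolation the hardware effectively performs on a per-vertex output across a quad. The names and the (s, t) parameterisation are my own, not anything from the talk:

```python
def interpolate_quad(v00, v10, v01, v11, s, t):
    """Bilinearly interpolate a per-vertex value across a quad.

    v00..v11 are values computed once per vertex (say, by a vertex
    program); (s, t) in [0, 1] x [0, 1] is a fragment's position
    within the quad.  The fragment program receives this result for
    free, with no per-pixel arithmetic of its own.
    """
    top = v00 * (1.0 - s) + v10 * s        # interpolate along the top edge
    bottom = v01 * (1.0 - s) + v11 * s     # interpolate along the bottom edge
    return top * (1.0 - t) + bottom * t    # then blend between the two
```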
The other useful thing is that lookup tables can bring performance benefits at this time, and I'm going to show you how to do that. Instead of doing a calculation in the fragment program, why not pre-generate a table of numbers at your application's launch time, store it into a texture map, and then sample that in your fragment program? There you have a lookup table, just like you're familiar with. The other thing is that the instruction set is very rich, so I would encourage you (and I know no one likes to read the manual, but it's a good idea in this case) to read through the instructions. Shader Builder has a nice instruction browser. Go through them. Learn your toolkit. Learn the instructions you have, because you could be implementing something in four or five instructions and then realize, oh, there's a strange instruction I've never heard of that actually does all of this in one instruction. That's gonna run faster, and it's gonna make your code shorter, too. So let's look at optimizing life specifically. How did I go about that?
This part I'm a little concerned about, 'cause it's a little hard to explain, but I'm gonna do my best. This is a little clipping out of the grid that Life was running on. The thing that added a lot of instructions to the program I showed you was the sampling: remember, there were eight texture sample instructions getting the state of all the surrounding cells. Well, that's a lot of texture sample instructions. If we could do that in fewer, it would be great. And not only did we have to do those eight samples, we also needed to calculate the eight surrounding texture coordinates to feed them. Well, I'm going to show you the way I went about it, which actually allowed me to drastically cut down the instructions here. If we look carefully at the life configuration, we realize that it exists in only one color plane. It doesn't require the three color channels. So let's take advantage of that. We realize that in a fragment program, 90% of the instructions operate on all four components at once, almost like AltiVec. So if we can pack more data into those channels, then in one instruction we can do the work of three or four.
So what I did was, outside of the fragment program, just before I did the feedback, I took the one-bit image, offset it one pixel to the left, and stamped it into the red channel. Then I offset it one pixel to the right and stamped it into the blue channel, and in the center it went into the green channel. So I end up with what you can see here on the right-hand side (I've magnified it). That shows you how I've packed the data in. This is really good, because now if I do one texture sample in the fragment program, by accessing the red, green, and blue channels independently, I also have the state of the left and right pixels. That means to get all the surrounding neighbours, all I need is three texture samples instead of all those other ones. I only need three, and by using the red, green, and blue channels, I've got all the pixels. That's a useful thing.
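Here's a small Python sketch of that packing pre-pass for a single row, just to make the idea concrete. I've assumed a wrap-around (toroidal) grid at the edges, and the exact shift directions are my own illustration:

```python
def pack_row(row):
    """Pack each cell's horizontal neighbours and itself into an RGB texel.

    Red gets the image shifted one pixel (so it holds a neighbour's
    state), green gets the cell itself, and blue gets the image
    shifted the other way.  One texture sample of the packed image
    then yields three cell states at once.
    """
    n = len(row)
    return [
        (row[(x + 1) % n],  # red: right-hand neighbour
         row[x],            # green: the cell itself
         row[(x - 1) % n])  # blue: left-hand neighbour
        for x in range(n)
    ]
```

Sampling the packed rows above, on, and below the current cell then delivers all nine cell states in three texture samples, which is the reduction described in the talk.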
The other important thing: remember that nasty logic I implemented to choose whether the cell would be on or off? Well, that was like six or seven instructions. Let's get rid of those with one instruction. Let's create a lookup table. Here is a texture map. It's a two-by-eight texture map. In the texture coordinate's Y component, we just put in a 1 or a 0: is the current cell on or off? And in the X component, we have the number of neighbors that we calculated. And by simply doing a texture sample with that lookup value (it's a lookup table), we get back on or off. That's the color of the output cell.
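In Python, building and using that kind of rule table might look like the sketch below. Note I've given the table nine columns, since neighbour counts run from zero to eight; treat the dimensions and names as my own illustration rather than the talk's exact texture:

```python
# Pre-generate the rule table once at startup.  On the GPU this would
# be uploaded as a tiny texture and read back with a single TEX sample.
RULES = [
    [1 if n == 3 else 0 for n in range(9)],       # row 0: current cell dead
    [1 if n in (2, 3) else 0 for n in range(9)],  # row 1: current cell alive
]

def next_state(alive, neighbours):
    """One table lookup replaces the six or seven logic instructions."""
    return RULES[alive][neighbours]
```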
That's another pretty useful thing to be able to do. After doing that, here was the benefit. Remember Shader Builder had that nice display showing the number of instructions and the number of texture samples going on on the graphics card? Well, I created a chart to show you what I was able to get it down to, and it runs exactly the same. It looks no different, except it's faster, if you measure it. What we see here is the instruction count went from 36 instructions down to nine. That's a pretty big deal. Texture samples, those are expensive to run on the graphics hardware: got those down from nine to four. Temporary variables: not 13 of them anymore, got those down to five. And then this last one is interesting.
DTR, that expands to "dependent texture read." What a dependent texture read is: I've shown you the TEX instruction, which allows you to take a texture coordinate, look it up, and get a pixel back from a texture unit. Well, imagine that the coordinate I used for that lookup was actually derived from a previous texture sample. You can see you have dependencies between the samples. Every one of those dependencies is known as a dependent texture read. The graphics hardware has support for four of these.
I am now using one of these dependent texture reads, and it's still a huge performance improvement. If you're gonna use lookup tables in your program, you're going to inevitably use these. You've got four of them, and I find that is more than enough for anything that I'm gonna implement. You can look into that after. Now, this is my final part that I'm going to look at. And this part has had me most concerned, because when I look at other demo programs from people who've implemented per-pixel lighting models, using not necessarily our fragment program but other extensions, I find it personally very hard to understand what they're doing. I've looked at their examples. Yeah, they look cool, but I'm not really sure what they're doing. So what I've done here is I just threw a lot away and started from scratch, and implemented my own interpretation of a lighting model, which I think is correct, and it looks good. And I'm gonna explain it step by step. I'm gonna break it apart for you, show how it's done, and try and dispel any of the mystery there. There's certainly some 3D math involved here, but I'm sure it should be quite understandable.
Before explaining it, let's look at what the contributing entities are. What are the inputs to this fragment program? The first one is the position of the light source: where the light source is relative to the fragment whose color we're currently calculating. We're gonna call that LP. The other thing is I have support for variable-brightness lights, so for the light source we also have a scalar, which is how bright it is. The next important thing is where the fragment is located in 3D space. That's important to know as well, because we're going to use it to calculate a vector from the light to the fragment. The other important thing is the normal of the fragment. This is the key part that makes per-pixel lighting models look so cool.
Unlike the traditional OpenGL lighting model, where normals exist only on a per-vertex basis and are interpolated across the surface of the polygon, here the normals are on a per-pixel basis, meaning we can create these amazingly detailed surfaces. The way it's done: if you recall back to when I showed you my displacement map, where I had encoded a two-dimensional vector into a color, this is exactly the same thing, except we're encoding a three-dimensional vector into each color. So not only an XY but an XYZ; we're able to encode a three-dimensional normal into a texture map. And by sampling that in the fragment program, I can get the normal at that point, which is a large part of why this looks so good. And then finally, we're calculating the colors of the fragments on the surface of a polygon, so we also need to know the normal of that entire polygon, so that we can transform the fragment normal. We'll cover that in a minute. Another useful thing I did to make this program simpler was to calculate everything in model space. In your 3D game, you're probably going to want to define your light sources in world space or eye space, relative to your world. So what I did was, in the vertex program, I multiplied by the inverse model-view matrix to transform the light into model space, so the fragment program had something easier to work with. I've already covered kind of what the normal map is, but this is what it looks like. That's the base texture map that I'm gonna use, and the one below is the normal map. These psychedelic colors are what define the bumpiness of the surface. Again, the same little trick applies: normals are normalized, so they range from minus one through one.
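Assuming the usual range-remapping equation (the talk calls it "super simple" without writing it out, so take this as an educated guess), the packing and unpacking look like this in Python:

```python
def encode_normal(n):
    """Squash a normal's [-1, 1] components into a colour's [0, 1] range."""
    return tuple(c * 0.5 + 0.5 for c in n)

def decode_normal(rgb):
    """Undo the packing in the fragment program: n = c * 2 - 1."""
    return tuple(c * 2.0 - 1.0 for c in rgb)
```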
I have to squash them and push them up into positive space using this super simple equation, and I'll unpack them in the fragment program. So, a quick run-through of what's going on here before I demonstrate it. We calculate the vector from the light to the fragment: we have a light source, we have a fragment, and we calculate that vector. We're gonna call that the light vector incoming, LVI. Then, when we've done that, it's pretty trivial to calculate the distance from the light to the fragment, just a scalar. If we know the vector, we can get the magnitude, and that's the distance. We use that to attenuate the light later on, to get attenuation in here also. Then we acquire the fragment normal, FN, like I described: we sample the normal map, and we get the normal. Then we transform that relative to the surface of the polygon. I explained that already, so I don't want to go too deep into it. Then the good part: if we know the fragment normal and we know the incoming light vector, we want to calculate that vector after it's reflected. So it comes in, and we get it coming out like that. We do that for every fragment. That's important also. And then when we have that reflected ray, we do a dot product between it and the direction of the light source. So in this lighting model, you could actually have the light not just pointing directly down on the surface, but sort of going along the surface. You can change it. I've actually made it fixed right now, but there's no reason I couldn't move it around.
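The stages just described can be sketched on the CPU like this. The reflection formula R = I - 2(N.I)N and the inverse-square falloff are standard choices I'm assuming; the talk doesn't spell out its exact equations, and all the names here are mine:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade_fragment(lp, fp, fn, light_dir, brightness):
    """Shade one fragment: light position lp, fragment position fp,
    fragment normal fn (already unpacked), light direction, brightness."""
    # LVI: the incoming light vector, from the light to the fragment.
    lvi = [f - l for l, f in zip(lp, fp)]
    # LD: the light-to-fragment distance, the magnitude of LVI.
    ld = math.sqrt(dot(lvi, lvi))
    # Reflect LVI about the fragment normal: R = I - 2(N.I)N.
    k = 2.0 * dot(fn, lvi)
    refl = [i - k * n for i, n in zip(lvi, fn)]
    # Dot the reflected ray with the light's direction, clamped at zero.
    intensity = max(0.0, dot(refl, light_dir))
    # Attenuate with distance and scale by the brightness control.
    return brightness * intensity / (1.0 + ld * ld)
```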
I just didn't want to spend too much time here on this. The next thing is to do the attenuation. Remember LD, the distance we calculated from the light to the fragment? We're going to use that to attenuate the light intensity, which will create a sort of falloff, which is important also. For instance, if I hold a little light here shining on this, it's not gonna have very much effect lighting up the far wall. That's the attenuation. And then we modulate that with the light brightness control. So quickly, a little bit of animation here. This is the surface of our polygon. FP, that's the current fragment we're calculating. Here we have FN, the fragment normal, which I acquired from the texture lookup; it's encoded in the red, green, and blue channels. Here's our incoming light vector, like so. Then we calculate it after reflection with the normal. And then LD, that was the distance I was talking about. So I think it's probably very clear in your heads now what I'm going to do. I'm gonna move over to Shader Builder now to illustrate this in action and show you how really pretty cool it looks. Let's close our previous example.
Let me get these windows in shape here. The resolution's a little smaller than what one's typically used to. Is that in range? Now, can you see it? Is that good? Okay. So if you can see this, here's the fragment program. I'm not going to go into it on a per-instruction basis; it simply would take too long.
Again, this exact thing I'm editing right now on stage will be available as a demo afterwards. So you can go home, run it in Shader Builder on that new G5 you buy, and you'll be able to see this and play around with it. What I've done here is I've actually commented parts of it out. I designed the program in such a way that the order of the stages I described to you is the exact order of the instructions in this program, and I put in comments that match that exactly. So when you look at it alongside my corresponding slide, you can walk your way through this program afterwards. So let's take a look. In texture unit one, there's our normal map, in the bottom right corner of the screen. In texture unit zero, I have the base texture, which is the gargoyle face. In the rendering window right now, for every fragment you're seeing the incoming light vectors displayed for you, encoded as a color. So if I open up the symbol editor, this will allow me to move the light source around. So here, I'm moving the light position around, and you can see the vectors are updating. Let's pull the light back from the surface. You can see it; move it close to the surface there. It's right on the plane now; you can see a little singularity there. So let's put it back. Now, let's start uncommenting some code and watch this effect build up.
Let's calculate the vectors after they've been reflected off the surface. This also depends on the fragment normal, which comes from the normal map. So let's uncomment those lines. These are the light vectors after reflection. Again, move the light around, and you can see them updating right there.
Let's now do the dot product that I described with the light direction vector, which is a constant right now. Move it around again; you can see it moving around. And then let's do the attenuation, and you'll see this makes a big difference. That looks better. The next stage is we multiply it by the color of the light. At the moment the light is just white light; we want to be able to have that controllable, so we do that. We've got a colorful light, and then finally we modulate that with the base texture map, which we now have. That's that. Now, if we move the light source around, you can see how good this looks. I'm going to make this window bigger so you can see it more clearly. Let's start moving the light around. You can see it.
We pull the light away from the surface. You see it's getting far away. The light's so far away now that it isn't even showing up. Bring it in really, really, really close. So I think it looks pretty good. And then, of course, we can also change the color of the light. That's another program parameter. If I adjust that, I can edit it as a color. You see it's changing the color as well. So multiple parameters are supported in Shader Builder too. So that certainly should be fun for you to be able to look at.
So, I've reached the end of my presentation now. I hope you all found this enlightening. And I hope that you'll go home afterwards and run the new Shader Builder and look at these examples, 'cause I think there's a lot to learn here and a lot of scope for implementing these things. So thanks a lot. And I think I'm gonna hand back to Travis here.
Thank you, James. If you have any questions on the information that you've seen today, feel free to contact any one of these e-mail addresses up there. I'd like to actually add mine to it. I'm [email protected]. And again, any questions about the graphics technologies or various things you've seen relating to 2D and 3D graphics at WWDC, feel free to e-mail me with your questions.