Graphics and Media • 1:12:33
The Mac OS X implementation of the OpenGL framework continues to track the innovations in the OpenGL specification. Come learn how to increase the 3D-imaging capabilities and improve the performance of your application. You'll get all the details of the most recent OpenGL extensions, as well as learn best practices and tips for modernizing and streamlining your graphics code.
Speaker: John Rosasco
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, so it may contain transcription errors.
Good morning. My name is John Rosasco. I'm a member of the Apple OpenGL engineering team, and I'm here to talk to you this morning about the state of OpenGL on OS X. In this session, if you'll indulge me for a minute, I'm going to cover what OpenGL is in a slide or two, and I know that's a little elementary for most of the crowd here, but I think it helps to kind of ground us in a minute before we go into the rest of the presentation. I'll also talk about OpenGL's role on OS X. We'll go through some of the new features and things that have happened in the specification and in Apple's implementation of the specification since WWDC '05. We'll talk about what's new in OpenGL for Leopard.
We'll go through some architecture slides, some of which, if you've attended previous WWDCs, you may have seen before, but we'll have some new stuff in there for you as well. We'll go through some of the new extension examples, and we'll kind of roll that together with best practices and basically how to make the best of the modern OpenGL specification and the features that have been enabled in OS X. So what is OpenGL? Fundamentally, it's a specification. And that specification describes a software interface to graphics hardware.
The OpenGL specification, currently version 2.0, has been developed by the OpenGL Architecture Review Board, otherwise known as the ARB. That review board is a consortium of industry interests in graphics and modern graphics technology, and as of July 31st it has been rolled into a much larger open standards consortium known as the Khronos Group. I serve as one of Apple's two representatives to the ARB, and the last year, year and a half have been a really exciting time, because the ARB is working with better collaboration and forward momentum than I've ever seen before.
So the standard feels really good in that regard. It's sometimes tough when you go to these meetings and competing industry interests are in the same room trying to move an open standard forward. But I think they're doing an excellent job of that. Apple is very excited about this Khronos transition because Khronos, as an organization, represents more diversified industry interests. It's a better funded organization, certainly, and they have a professional marketing arm that the OpenGL ARB really never did. So as far as promoting the standard and being able to work in cadence with other open media standards such as OpenGL ES, OpenML, and the graphics file interchange formats described by the open standard COLLADA, we feel that this is a really healthy transition for OpenGL: to be able to work closely with these other standards and with these diversified industry interests. You have people like Sony and Hitachi and Google, people that have different, much more widely varied business models than what's traditionally been in the OpenGL ARB.
So what does OpenGL do? Fundamentally, it takes a few inputs, and those inputs are pixels and vertices. Then some visual operation is applied to that data, and it produces pixels or fragments out. And really, if you take a look at it, back at Apple, we have the OpenGL machine printed out on the wall by our group. It's about five feet wide and four feet high. This thing is basically describing a state machine, which is kind of feathered in here, that has an innumerable number of states, it seems like, at times.
But it's often good to just reflect back and say, what is it that this thing is doing, and where are we in those three stages? Are we here at the input stage, where we're modifying this data? Are we in a kind of transformation stage, or are we in the output stage, you know, kind of the rasterization end of the pipeline?
So OpenGL's role on OS X, I think it's pretty fair to say, is very unique industry-wide. And its uniqueness is really rooted in the fact that there are application dependencies and API and framework dependencies, because the Mac OS X user experience is so intimately tied to this kind of visual experience, all the way back to the Genie effect you see when you close windows, and all the compositing and transparency that most folks are now familiar with as far as OS X is concerned. At the heart of all that is OpenGL processing that visual information.
So as far as OpenGL being Grand Central Station, I alluded to some application dependencies earlier. One of the most fundamental application dependencies is the Quartz window server itself and all its windowing operations; it's basically controlling the desktop and the OS X user experience. The other application dependencies, like iLife and iWork, represent a very broad base of Apple's customers using the platform. So obviously iLife and iWork are a big part of the OS X user experience. There are also the professional applications like Final Cut Pro and Logic and Shake, for music and digital compositing and video editing. Those all depend fundamentally on OpenGL as well. And there are also the API dependencies, some of which you may have seen in this session, such as Core Image or Core Video and the new Core Animation. These all depend on OpenGL, as does Quartz Composer.
So a little bit about the past and the present of OpenGL. At its beginnings, OpenGL was a single vendor, really a single company API where it was known as a 3D graphics API, and primarily in its very beginnings, it was used for scientific visualization. And shortly thereafter, kind of in the Jurassic Park time frame, it became more and more known as an API used not just for scientific visualization, but also for the entertainment space. But in both cases, kind of a professionally geared equipment and software, you know, tens of thousands or hundreds of thousands of dollars per seat kind of thing.
Well, today, OpenGL is a much more ubiquitous standard with a much bigger clientele, a much bigger audience. And its presence on OS X is a big part of that. One of the really important aspects of OpenGL is that it is the primary interface to what are today these vastly powerful GPUs, and it is the sole interface to those, sans having some abstraction layer in your way, right? But the thing that's amazing about these GPUs is the transistor counts on the big parts. I was just going online the other day looking at the ATI Radeon X1900. That GPU has 384 million transistors in it. You have a highly vectorized processing unit with a massive transistor count.
And when I compared that to even an Intel Xeon 64-bit dual-core chip, which was weighing in around 190 million transistors, the relevance of that interface to this processing unit really becomes apparent, and what kind of horsepower and importance that interface holds. So as far as OS X is concerned, yeah, it's a 3D graphics API. But modern OpenGL is not only 3D graphics; it's also compositing. It's windowing operations. It's digital video processing and image processing.
So, new things since WWDC '05. We've been very busy, and one of the biggest new things is OpenGL 2.0, and I'll talk a little bit more about that later. As of WWDC '05 we did have GLSL support, but it was only in software. Since that time, we have given GLSL full hardware support. Framebuffer objects were introduced into the Mac OS X OpenGL implementation at that time, as were pixel buffer objects. And, for the first time on the platform, stereo in a window.
And processing: we've added a feature for processing unit determination based on the enabled state for your application. We also added an extension for ARB vertex and fragment program that allows batching of program parameters and much more efficient use of the API, so that you're not just hammering on the API to send these parameters into your shaders.
We've added an extension called Apple Flush Buffer Range for partial flushing of VBOs and more efficient use of the bus. We've added Intel GPU support, so that's a big addition, because adding a driver is kind of a big deal on the platform. Obviously, not only is the GPU support there, but the OpenGL implementation has had to do a lot for CPU support on the Intel platform as well. There's a lot of optimized paths, and when it comes to OpenGL development, there's no room for messing around or dropping megabytes per second on the floor when you're trying to move 2 or 3 gigabytes a second. So obviously we've got to take the most advantage of the CPU that we have at our disposal.
So for Leopard, what do we have new? OpenGL 2.1 will be available in Leopard, along with GLSL version 1.20 of the language spec. GLSL 1.20 is available in the developer seed, the Leopard preview DVD that you've gotten here at the conference, so that's ready to go. You can find out a lot more about modern GLSL usage and examples by attending Nick Burns' session, number 216.
We'll also add support for OpenGL ES 2.0 in Leopard timeframe. 64-bit support, a big one to make your application 64-bit ready. We have multi-threaded our GL engine and our command stream for processing. You know, the pervasiveness of multi-core systems was just too tempting not to make that change, that transition. We'll have a new major version rev of OpenGL Profiler available in Leopard. And we're going to have some to-be-announced shader tools.
So this is probably one of my favorite things that's happened since the last WWDC 05, which is that our group has done a lot of work on updating the documentation so that you can get the most out of your application on OS X. This documentation is really superb. They had a lot of engineering support working with technical publications on this. Very well illustrated, great examples, really great coverage and continuity, you know, across the platform. So I really encourage you to take a look at this. It was updated recently, June 28th. So I hope you find it as interesting and kind of compelling as I did when I first took a look at it.
So on to architecture. As far as the architecture on OS X, I alluded to some of the diagrams you may have seen in previous years. OS X has some really unique challenges when it comes to architecting an OpenGL implementation. Consider that you have a system that is capable of driving multiple displays using multiple heterogeneous devices from various different graphics vendors. That means that this one device may be driven by logic that was written in part by Apple and in part by NVIDIA. This display over here is driven, again, in part by Apple, but by ATI or Intel. Then you take your application. You have an extended desktop across those two devices, and you move it from a GPU that has the capabilities of, say, the ATI X1900 over to something with the capabilities of one of the older NVIDIA parts, for instance. You lose a tremendous amount of functionality as far as the GPU is concerned. And if there's a bunch of state that's not supported in hardware, obviously the implementation has to do something smart with all that state. It has to move it across and basically seamlessly make your application do what we call a renderer switch. For these reasons, the OS X implementation of OpenGL is quite unique. We'll describe the framework interface and the driver model.
So on the top of this diagram, you see your application. And on the left side, you'll see the windowing interfaces to OpenGL. There are three of those available on OS X. One is GLUT. And you can see the dependency chain for GLUT: GLUT depends on NSOpenGLView. So essentially, GLUT results in having a Cocoa-based application. Over to the right of that, you see AGL. And AGL is for Carbon-based GL applications on OS X. Underneath GLUT is NSOpenGL, AppKit slash Cocoa, if you want to put another slash on there. And always remember that the AppKit code and functionality is not interchangeable with AGL. Those are two paradigms you never mix. And similarly, because GLUT is derived from NSOpenGL, you do not mix that with AGL either. So underneath that, of course, is driving through the state to the OpenGL engine and drivers that ultimately control the hardware.
So as far as the driver model, we talked about the application and the framework interface. You may have heard in some of the State of the Union sessions about the work on the OpenGL engine. And this is effectively another logical module. It's a modular architecture, so you're really just trying to wire together these different elements in a manner that makes the most sense logically. Some of this wiring happens beneath the covers for you, but ultimately you control it by choosing pixel formats and some of the state that you've set up in your application and so forth. So you have the GL engine command stream, which then drives the commands into the driver layer beneath it, and we have our new Intel driver plug-in there in blue. On the right, the previous three were at the last WWDC. And ultimately, once the driver state has been set, that drives the commands through to the hardware. So that's it for architecture. And after a little water, we'll go into features.
So, new features for OpenGL and Mac OS X: OpenGL 2.0. Most of you are probably quite aware that when OpenGL makes a revision, a minor revision number is usually pretty significant. In OpenGL's case, a major revision number, therefore, is even more so. And what makes OpenGL 2.0 much more significant than the 1.x series is the introduction of the GL shading language. That, of course, is the descendant of the ARB vertex and fragment program method. And just as we have the red book in the Addison-Wesley series, we now have the orange book.
OpenGL 2.0 also brings multiple render targets, so rather than paying the price, if you need multiple output destinations, to do transform, clipping, and lighting and the shader processing more than one time, you can render to several destinations in a single pass. There are also non-power-of-two textures in OpenGL 2.0, also an efficiency measure. Most problem domains that come to visualization have very unique characteristics as far as the size of the textures that you're using, and obviously the power-of-two requirement is not always representative of what people's needs are. So that's where the non-power-of-two texture extension came from. Two-sided stencil support is another efficiency mechanism for computing shadow volumes. And point sprites are useful for doing kind of particle simulation things. So if your applications are interested in any of the features illustrated here, take a look more closely at the GL 2.0 spec and see how it might help you.
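As a quick sketch of what the non-power-of-two feature buys you (assuming an OpenGL 2.0 context; `tex` and `data` are hypothetical names for an existing texture ID and a 640×480 RGBA pixel buffer):

```c
#include <OpenGL/gl.h>

/* Sketch: with OpenGL 2.0's non-power-of-two texture support, GL_TEXTURE_2D
   no longer requires power-of-two dimensions, so a 640x480 video frame can
   be uploaded directly instead of being padded out to 1024x512. */
void upload_npot_frame(GLuint tex, const GLubyte *data)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA,
                 640, 480, 0,              /* arbitrary, non-power-of-two size */
                 GL_RGBA, GL_UNSIGNED_BYTE, data);
}
```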
So for the Leopard timeframe, as I said, version 1.20 of the GLSL language specification is in the seed that you have. 2.1 brings us pixel buffer objects, and we'll talk quite a bit more about that in this session today. sRGB textures have also been added to the 2.1 specification, as well as non-square uniform matrices. OpenGL ES 2.0 will also be available in the Leopard timeframe. Apple feels it's very important to have ES 2.0 on the platform, and there's a very important reason for that: you may have noticed that the OpenGL specification is moving into a more and more objectified, state-encapsulated form over time. And the OpenGL 3.0 specification is going to be the biggest push in that direction. I guess you're asking, why does that tie into OpenGL ES 2.0? Well, OpenGL 3.0 is also looking to be the first time where features are actually deprecated and dropped from the specification, and that is the precedent set by OpenGL ES 2.0. So all expectations from the Architecture Review Board are that OpenGL 3.0 will basically have OpenGL ES 2.0 as its ancestor. And it's really a great thing for OpenGL, because when you develop a specification that's now pushing 400 pages long, outside of even the GLSL specification, you have a number of industry interests involved. And I guess 1993-ish is when the transition was made from IRIS GL to OpenGL. That's a lot of years where things have crept into the specification and have subsequently been outdated or outmoded in some way. So the great thing about the ES 2.0 specification and OpenGL 3.0 is that the API is going to have a denser concentration of best-practices-type APIs. You're going to have fewer of these things that allow you to shoot yourself in the foot, essentially.
Let's see, on to framebuffer objects. Framebuffer objects really represent a paradigm shift for the OpenGL specification. If you think about the history of the API back to the IRIS GL days, in the IRIS GL timeframe the API had all of the functionality required for windowing. I mean, there was an event mechanism and the resize handling and all of the things required to handle the windowing. When IRIS GL was moved to an open standard, one of the most fundamental changes to the specification was to pull all of the OS- and windowing-system-specific stuff out of the API, leaving the common elements of OpenGL alone. That change, although improving portability, also meant that OpenGL was no longer managing surfaces for your application, and so it meant that destinations for rendering, things that were managed by the windowing system, became something that had to be basically platform-specific. If you think of pbuffers and all the various APIs where you're doing a context switch and you're rendering off-screen and you're using something that's specific to Apple or one of the other vendors, it really gives you quite a porting headache if you have a multi-platform application, right? So framebuffer objects represent that paradigm shift, from only windowing systems managing surfaces to now the core of OpenGL managing surfaces and destinations for rendering. And there's all kinds of stuff that that implies. If you step back and think about the various APIs for configuration of an OpenGL application, and how you choose pixel formats or an X11 visual, for instance, all of those decisions about making a compatible destination for rendering get pulled into the OpenGL core. So it's a big deal.
And the ARB did a really, really good job of doing the specification, representing everybody involved, and making it very easy to use. And I'll try and prove that with an example.
So fundamentally, framebuffer objects are known for their use as a mechanism for rendering to texture. This example shows just the very rudimentary use of framebuffer objects insofar as how you'd use them to render to texture. Just like any other kind of IDs you would generate in your OpenGL application, for textures or shaders or others, you generate a framebuffer object ID. I omitted that couple of steps on the slide, just to make the font bigger, basically. So you generate that ID, and you bind that framebuffer, as we have FBID there. You then attach the texture that you intend to use as a destination for rendering to the framebuffer that you've bound. And in this case, we're attaching it as a color texture, so that when your rendering executes, it modifies the color planes of the texture.
Then once you've attached the texture, you need to validate the framebuffer configuration that you've established. If you think about going through each of the steps, you can attach textures or renderbuffer storage objects, which is another facet of the FBO API, and combine all these things in various ways, and they don't necessarily groove with what the hardware is capable of. So if you just willy-nilly start combining something here that's 36 bits, something over there that's 12 bits, and something that's 9 bits, and that sort of thing, well, the hardware may not be able to render it. So this next step is to verify that, yeah, the hardware can handle what you've done, and it's going to execute properly. You check the validity of that configuration with glCheckFramebufferStatus.
So once you know that the framebuffer itself is valid and your texture is attached, everything's bound, you go ahead and do your rendering. And when you do your rendering, of course the texture that you've attached to the framebuffer object gets updated. So you finish doing your rendering pass, modifying your texture. Now you bind the default framebuffer, which is the only windowing-system-managed surface now; that's all kind of transparent. And then go ahead and use the texture that you've modified as a source for rendering. So if you look at that, what is new here? Well, bind framebuffer, that's just another object bind, so that's not even new, really. The only things new in this example to do render to texture are FramebufferTexture and CheckFramebufferStatus.
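Pulled together, the sequence just described might look like this in code. This is a hedged sketch using the EXT_framebuffer_object entry points of the era; `tex` stands in for an already-created texture sized to match the rendering, and the draw calls are placeholders:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

/* Sketch of the render-to-texture sequence with a framebuffer object. */
void render_to_texture(GLuint tex)
{
    GLuint fbo;
    glGenFramebuffersEXT(1, &fbo);              /* the step omitted on the slide */
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);

    /* attach tex as the color destination for rendering */
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, tex, 0);

    if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) !=
        GL_FRAMEBUFFER_COMPLETE_EXT) {
        /* this attachment combination isn't renderable on this hardware */
        return;
    }

    /* ... issue rendering commands; they rasterize into tex ... */

    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);  /* back to the window surface */
    glBindTexture(GL_TEXTURE_2D, tex);            /* now source from the result */
}
```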
That's it. That's render to texture using the FBO specification, which is pretty amazing. If you've messed around with some of the other off-screen rendering mechanisms, it's really worth checking out. And now that all these surfaces are in hardware, you're basically associating things that are in VRAM with VRAM, so it's very fast. This is an example of just the most fundamental use of FBOs. Of course, the specification is much richer than that, and it's superb as far as its examples. It's also very good with its issues list. So if you go through there and you think, oh, they shouldn't have done that, well, go look through the issues list and you'll probably figure out what discussions were held around the decision that you're trying to cross-examine, and you might get an idea of why the decision was made to do things a certain way. So with that, I'd like to introduce my colleague, Alex Eddy. Alex wrote some great demos for today's presentation, and he's going to demonstrate a kind of sophisticated use of framebuffer objects.
Thanks, John, and good morning, everybody. So this demo's going to show you three different ways that you can use framebuffer objects in your applications. And as John mentioned, if you've been using some of the older render-to-texture mechanisms like off-screen windows or pbuffers, then you really should think about moving to FBO to get some of the ease of use, performance, and portability benefits. If you've never done anything with render to texture, then hopefully this demo will give you a couple new things to think about.
So I'm going to start off really simple here, not even using FBO yet. And I'm drawing this now traditional Stanford bunny model. And there's some lighting and fog effects going on here, but nothing too complicated. This is just an example of some 3D rendering that your application might be doing. Now here's the first technique, rendering into a texture.
On the left, I'm drawing the bunny model just like before, but on the right here, I'm rendering it into an FBO. What I've done to set this up is just like John described in the slide. At the beginning of my program, I generated a framebuffer object, and I attached a texture to it. Then in my draw loop, every time I want to draw, I bind that FBO, and I draw the bunny, which gets rasterized into the texture. Then I can turn around and use that texture to draw this one textured quad on the screen. So it's a simple two-step process: draw into the FBO, then use the resulting texture to draw it back to the screen.
Now, now that I have some content in a texture, I can really do all kinds of things with that as a texture. So I can map it onto any kind of arbitrary geometry that I want. Here's just a very simple 3D cube. And if I wanted to, I could play around with the texture coordinates. I could scale and distort and warp the texture. I can generate mipmaps on it. I can change the filtering mode. At this point, it's just a texture. You can do anything you want with it. So if I take away the faces of this cube, you can see that I'm drawing six of these bunnies.
And at this point, you can realize some of the performance benefits of caching a 3D rendering into a 2D image. This original bunny model has about 16,000 triangles in it, which isn't that much nowadays. But it's still enough that if you wanted to draw it many times in a scene, you would have to start worrying about the performance. On the other hand, once I've cached it into an FBO, every additional bunny that I want to draw only costs me one quad. So if I can draw six bunnies, might as well draw a couple hundred bunnies.
So here you can see it's very cheap. Every bunny is just one textured quad. And some 3D engines are doing this nowadays for their outdoor terrain environments, for example, with trees. You could draw a high polygon model of a tree when it's close to the camera, but also cache that rendering into a texture and use the texture to instantiate 100 or 1,000 copies of the tree way off in the distance, where hopefully you can't realize that it's not a full model. So that's the first technique, just rendering to texture and maybe using the texture to draw a lot of copies.
So now that I have all these crazy bunnies spinning around, a second usage of FBO comes to mind, which is dynamic environment maps. This technique has been around for quite a while, but just a quick recap on how it works. Over here, I have the six faces of a cube map texture unfolded on the screen. And you can think of this, if you fold it back up into a box, as kind of a box you can place around your entire world and use as the environment.
And I can dynamically generate the content of each one of these faces by placing the camera at the center of my scene, giving that camera a 90-degree field of view, and then pointing it left, right, up, down, forwards and backwards, and drawing everything that I want to be able to see from that point of view. When I'm done, I can use that resulting cube map texture with OpenGL's texture coordinate generation and apply it onto my main model to make it look reflective. So you can kind of see that the main bunny now is reflecting all these cubes that are flying around it. That technique has been around for a while, but the nice thing about doing it with FBO is it's really fast and easy. Literally all I have here is a loop where I attach each of these faces in turn to the FBO and make it the target of all rendering. I set up the camera and draw. When I'm done with the loop, the texture is immediately available for use. Unlike some of the previous rendered texture methods, I didn't have to make any copies of textures. I didn't have to flush the rendering pipeline. I didn't have to do any context switches. It's really simple. Just bind, draw, bind back to the window, use the texture.
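The loop Alex describes might be sketched like this. The helper names `aim_camera_at_face()` and `draw_scene()` are hypothetical stand-ins for the demo's camera setup and rendering; an existing FBO and cube-map texture are assumed:

```c
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

void aim_camera_at_face(int face);  /* 90-degree FOV down +/-X, +/-Y, +/-Z */
void draw_scene(void);

/* Sketch: regenerate all six faces of a dynamic environment map in one
   pass over a single FBO. The cube-map face enums are sequential, so
   GL_TEXTURE_CUBE_MAP_POSITIVE_X + face walks through all six. */
void update_cube_map(GLuint fbo, GLuint cubeTex)
{
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    for (int face = 0; face < 6; ++face) {
        /* attach this face as the target of all rendering */
        glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                                  GL_TEXTURE_CUBE_MAP_POSITIVE_X + face,
                                  cubeTex, 0);
        aim_camera_at_face(face);
        draw_scene();               /* rasterizes straight into the face */
    }
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    /* cubeTex is immediately usable: no copies, flushes, or context switches */
}
```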
Now, there's a third technique I want to talk about, which is full screen post-processing filters. So far, I've been using two FBOs in this demo. First, I've been rendering the bunny into the cube. And I've been rendering some of those cubes into this cube map. And then I've been using all the results and drawing things back to the window. Well, I can actually create a third FBO and render all of this content into a giant texture attached to that FBO. Then I can use that texture and draw back a full screen quad to the whole window and apply some fancy shader effects onto it. So for example, here's the depth of field effect. In this case, I also have a depth component texture, which I'm using as a depth buffer visualized down here. And there's a little GLSL shader, which is literally three lines of code. And for every pixel, it's using the depth value as a kind of level of detail bias when it looks up into the color mipmap texture. So what this gives me is a kind of per pixel blur control, depending on how near or far from the camera that particular pixel was. I have some parameters on the shader so that I can change the focal point and bring things close to the camera in focus and blur out the stuff in the background, or vice versa, make things close to the camera blurred.
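The three-line shader Alex mentions isn't shown in the session, but the idea, using scene depth as a level-of-detail bias into a mipmapped color texture, might be sketched roughly like this. The uniform names and the bias scale are illustrative guesses, not the demo's actual code:

```c
/* Sketch of a depth-of-field fragment shader (GLSL 1.10-era), stored as a
   C string for glShaderSource. Pixels far from the focal depth sample a
   blurrier mipmap level of the scene texture via the texture2D bias arg. */
static const char *dof_fragment_shader =
    "uniform sampler2D scene;   /* mipmapped color rendering */         \n"
    "uniform sampler2D depth;   /* depth component texture */           \n"
    "uniform float focus;       /* focal depth, 0..1 */                 \n"
    "void main() {                                                      \n"
    "    float d    = texture2D(depth, gl_TexCoord[0].st).r;            \n"
    "    float bias = abs(d - focus) * 8.0;  /* blur grows off-focus */ \n"
    "    gl_FragColor = texture2D(scene, gl_TexCoord[0].st, bias);      \n"
    "}                                                                  \n";
```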
So that's just one example. But the key thing here is that once you've rendered your scene into a texture, there's a whole universe of 2D image processing filters that you could be running on it. We ship a whole library of Core Image filters, so you can imagine how you could chain together a whole string of video operations on your 3D scene. So that's the framebuffer object demo. I'm going to turn it back to John now.
Alex is pretty fun to work with. I asked him if he could do a frame buffer object demo for this presentation, and I came back about a day and a half later, and he shows me this thing. I was like, hallelujah, bro, that's nice. So on to pixel buffer objects. Pixel buffer objects, I've got this kind of redundant verbiage up here that says pixel buffer objects, buffer objects for pixels.
The reason I was calling that out is because pixel buffer objects are really a specialization of the buffer object specification. So, like vertex buffer objects, there's kind of an interchangeability there, because all of the mechanism and infrastructure to use a pixel buffer object is nearly identical, excepting some tokens and enumerants you use to enable them, to the VBO specification.
Pixel buffer objects really are containers, OpenGL-managed resources, that maintain the pixel data. The pixel data is maintained in VRAM, so moving pixel data between a framebuffer and a pixel buffer object is lightning fast. If you consider these GPU cores, with memory bandwidth in the tens of gigabytes a second, it's very easy to move data between, say, a rendering destination and a pixel buffer object, which can then subsequently be used for a DrawPixels operation or as a source for texturing.
They also allow you to do very, very fast readbacks, if it wasn't obvious from what I just said. If any of you have played around with what Apple called the async read pixels method, PBOs allow you to achieve those same results with a much simpler interface.
So I'll go through an example, a very simple example of using PBOs. And, you know, historically, if you think about the kind of bidirectionality or the symmetry of bandwidth between, you know, moving stuff onto the screen, you know, rendering or drawing to the frame buffer versus pulling it back, there's a long history of a big asymmetry there. And there's been all these kind of clever tricks that you need to do to try to get the upstream bandwidth to be as high as the downstream. Well, as of PBO, that is no longer the case. The symmetry of upstream bandwidth to down has been essentially rectified.
The core specification will kind of handle this problem for you and make it very effective or efficient for you to do multi-pass rendering where you need to get those results back. Maybe these vastly powerful GPUs can do the stuff that you need to do with that data, maybe they can't. Maybe you need to pull it back to the CPU, munch it, you have something specialized in your problem domain or the kind of scientific visualization or game or whatever you're working on needs to process that data. The PBOs are a facilitating mechanism for that. And we've seen that with some of the clients that have come to us saying, I have this real time video feed. It's coming in in HD. I need to pull it back and I need to do something with it. What do you got?
Well, the answer to that is PBOs. So just like the FBO demo-- and of course, nothing new-- you generate an ID and you bind it for PBO usage. You issue a readPixels call, and you'll notice that the data parameter, the last parameter in that call, is 0.
Because you've bound a PBO as basically the packing destination, if you will, for the pixels, you don't need to specify a client address to pack into. Once the pixels have been read back, that's as simple as you need to be for getting the pixel data back out of the frame buffer. You can map the PBO just like you would map a VBO. You can modify the pixels, do whatever you need to do. You can then take the PBO and bind it as a source for drawing or texturing. So now that you've gotten the content back and modified it as you want, you can go ahead and bind it.
And then once it's bound, you can do a drawPixels. Again, this is an example showing, you know, zero for the client address. And I just recognized, looking at this slide, that after you get done modifying the pixels, you need to unmap the buffer. So it looks like there's an omission there from when I put that together. So don't forget that step. So if I could bring Alex back up, I'll show you another cool demo on PBOs.
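Pulled together, the whole round trip looks something like the sketch below, including the unmap step the slide omitted. The GL entry points here are stubbed with minimal local definitions just so the sequence can run standalone; in a real app they come from `<OpenGL/gl.h>` and go to the driver. The token values match the real PBO spec.

```c
#include <assert.h>
#include <string.h>

/* Minimal local stand-ins for GL types and tokens so this sketch is
 * self-contained; values match the real headers. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int GLint;
typedef int GLsizei;
#define GL_PIXEL_PACK_BUFFER   0x88EB
#define GL_PIXEL_UNPACK_BUFFER 0x88EC
#define GL_RGBA                0x1908
#define GL_UNSIGNED_BYTE       0x1401
#define GL_READ_WRITE          0x88BA

/* Stubs that just record the call order for illustration. */
static const char *callLog[16];
static int nCalls;
static unsigned char backing[64 * 64 * 4]; /* stand-in for VRAM-backed storage */

static void logCall(const char *name) { callLog[nCalls++] = name; }
static void glGenBuffers(GLsizei n, GLuint *ids) { (void)n; *ids = 1; logCall("glGenBuffers"); }
static void glBindBuffer(GLenum t, GLuint id) { (void)t; (void)id; logCall("glBindBuffer"); }
static void glReadPixels(GLint x, GLint y, GLsizei w, GLsizei h, GLenum f, GLenum ty, void *d)
{ (void)x; (void)y; (void)w; (void)h; (void)f; (void)ty; (void)d; logCall("glReadPixels"); }
static void *glMapBuffer(GLenum t, GLenum a) { (void)t; (void)a; logCall("glMapBuffer"); return backing; }
static unsigned char glUnmapBuffer(GLenum t) { (void)t; logCall("glUnmapBuffer"); return 1; }
static void glDrawPixels(GLsizei w, GLsizei h, GLenum f, GLenum ty, const void *d)
{ (void)w; (void)h; (void)f; (void)ty; (void)d; logCall("glDrawPixels"); }

/* The read-modify-draw round trip John describes. */
static int pbo_round_trip(void) {
    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);  /* bind as the packing destination */
    /* Data parameter is 0: an offset into the bound PBO, not a client pointer. */
    glReadPixels(0, 0, 64, 64, GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);
    /* Map like a VBO, modify, then unmap -- the step missing from the slide. */
    unsigned char *px = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_WRITE);
    px[0] = 0xFF; /* ...munch the pixels... */
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    /* Rebind as the unpack (source) buffer and draw, again with a 0 offset. */
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glDrawPixels(64, 64, GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
    return nCalls;
}
```

The stubs only log the call order, which is the point of the sketch: pack bind, read, map, modify, unmap, unpack bind, draw.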
OK, so as John mentioned, you can use PBO for a couple different things, including accelerating a pixel upload and download. But those techniques are going to be covered in more detail later today in the OpenGL performance session. So for this demo, I'm going to talk about using PBO to do render to vertex array.
So in order to do this-- well, first of all, the key concept is that we want to use pixel colors as vertex positions. That sounds a little weird. Don't worry, it is. In order to do it, I'm going to use a trio of buffer objects in this demo-- FBO, PBO, and VBO. So let's go through it step by step. To start off with, I'm using FBO just like in the bunny demo you just saw. I have a framebuffer object with a texture attached to it, and I'm drawing some kind of content into it. Right now, the content is just this very simple gradient, where I have red going horizontally from 0 to 1, and green going vertically from 0 to 1. So that stuff gets rasterized as RGBA pixel colors into a texture.
Next, I have a mesh of geometry here, which is being stored in a VBO. Now, to set this up, at the beginning of my demo, I called glGenBuffers to get a VBO ID created. And then I called glBufferData to allocate enough storage for this N-by-N mesh. And more specifically, I want to allocate enough storage that for every one pixel in this texture, I have one vertex in my mesh.
And when you use VBOs, you can provide a kind of usage hint so the driver can optimize where the memory's going to live. If you've used VBOs previously, you're probably used to hints like static draw or stream draw, depending how often you're going to be updating that geometry from the CPU side.
In this case, though, I've used the hint stream copy to indicate that I'm going to be changing the geometry in this mesh every single frame. But if possible, I want to copy from a GPU side resource into another GPU side resource. I'm actually never going to be touching this data from the CPU.
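The storage math behind that allocation is worth pinning down. This helper is a sketch (the function name is mine, not from Alex's demo), computing the size you'd hand to glBufferData for one XYZW vertex per RGBA pixel:

```c
#include <assert.h>
#include <stddef.h>

/* One XYZW vertex (4 floats) for every pixel of an n-by-n texture.
 * This is the byte count you'd pass to glBufferData, along with the
 * GL_STREAM_COPY usage hint: contents change every frame, via
 * GPU-to-GPU copies, never touched from the CPU side. */
static size_t mesh_bytes_for_texture(size_t n) {
    return n * n * 4 * sizeof(float);
}
```

For a 256-by-256 texture that's 65,536 vertices, a megabyte of vertex data living GPU-side.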
So if the FBO contains RGBA pixels and the VBO contains XYZW vertex positions, how do I get between the two? Well, this is where PBO comes in. PBO builds on top of the VBO spec. It's really exactly the same API. All the PBO spec does is add two new targets, pixel pack buffer and pixel unpack buffer, which you can use for pixel read and write operations. So what I can do here is bind the target pixel pack buffer to this VBO ID. What that's going to do is set the memory storage that was allocated for this geometry as a destination for all read operations. Then in my drawing loop, after I've rendered some content into this FBO, I can just call readPixels. And that'll copy every single RGBA pixel color here down into my geometry mesh, where I can then use it as XYZW vertex positions. So if I start drawing some content, I think you'll see how it's working.
So you can kind of see now that red and green are corresponding to x and y in this plane of geometry that I have, while blue is corresponding to z. So I get this kind of height field effect going. And this is a copy operation. The current spec doesn't allow you to render directly into the vertex array. But if the texture storage is on the GPU and this VBO storage is on the GPU, then it's really just a GPU memory to GPU memory copy, which nowadays happens at dozens of gigabytes a second. So it's pretty fast.
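The key idea is that nothing converts between colors and positions at all: an RGBA color and an XYZW position are both four floats, so the copy just reinterprets one as the other. Here it is sketched on the CPU side for illustration (on the GPU it's a straight memory-to-memory copy, no per-element work):

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Pixel;   /* what the FBO rasterized */
typedef struct { float x, y, z, w; } Vertex;  /* what the VBO feeds the vertex array */

/* What the readPixels copy effectively does per element: red/green
 * become x/y in the plane of the mesh, blue becomes the z height
 * field, alpha rides along as w. */
static Vertex pixel_as_vertex(Pixel p) {
    Vertex v = { p.r, p.g, p.b, p.a };
    return v;
}
```

Since the red and green channels hold the 0-to-1 gradient, the x/y positions of the mesh come out of the copy already laid out as a plane, and whatever lands in blue extrudes it.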
So now that I have all this set up, what can I do with it? Well, right now, I'm drawing these little blue ripples with a GLSL shader. And if you want to learn more about GLSL, please attend the next session in this room. You'll learn all about it. But to keep things simple, I'm just doing a little math with a sine function so that I get these kind of rippling waves that move out from the center and fade away over time.
Now, GPUs are vector processors, so I might as well do all that math for two centers simultaneously, take advantage of some of that parallelism. And I have some parameters on the shader so that I can change things like the wave frequency, make it lower or higher frequency. If I get out of wireframe mode for a second, I can move these centers around, and you can look at all the neat little constructive interference patterns that it makes.
So that's all neat and fun to play with. But if all you'd wanted to do was render sine waves into a mesh of geometry, well, you probably could have done that with just a vertex program. So the real power of this render to vertex array technique is that you can render arbitrary geometry and then use the resulting pixel colors as vertex positions. So here I'm just calling glutSolidTorus a couple times, and I'm drawing the fragment depth values into the blue channel, and that result is automatically extruded into this mesh. It's kind of like a 3D photocopier now.
So you can really draw anything you want. This is pretty powerful. Another example, I could be drawing a texture into the FBO, and I can see that result extruded into the mesh. And I can, you know, bring back some of the sine wave ripple. I can still move all that around. It's all real interactive.
As a matter of fact, since all the shader execution is happening on the GPU, and all this geometry transform is happening on the GPU, and the copy between the two things is happening on the GPU, the CPU is doing practically nothing at this point. The CPU is entirely free to do other things, which is great. Have your graphics card, do the graphics, CPU can work on other stuff.
So, so far, I've only been drawing content into the blue channel here, but of course, I don't have to stick with this red-green gradient. I can draw anything I want there, too. So for example, I kind of hinted at the end of the FBO bunny demo that once you've got your scene rendered into a texture, you can apply any kind of 2D image processing on it. Well, guess what? If you do that now, that image processing will get applied to the geometry. So I can run, for example, a twirl filter. And you can see the green and red moving around. And the x and y positions are moving the same way.
And I can start to combine all these things. I can still move these around. I can bring back the rings. It's all going simultaneously here. So you get this kind of really neat fusion of 3D geometry and traditional 2D image processing going on. So I think there's a lot of room for exploration here. That's the PBO demo. Please read the specs and try it out. I'm going to turn it back to John.
What can I say? Another day and a half of work from Alex. Thank you. Thank you, Alex. For Leopard, you may have heard some rumbling in some of the previous sessions about resolution independence, and there's one thing that can be kind of a gotcha as far as resolution independence is concerned for your Cocoa applications.
You know, traditionally, it's been pretty easy to just take a reshape func and apply your, you know, kind of projection and viewport transformations, and kind of assume that the bounds rect of the view that you're using-- typically kind of subclassing NSOpenGLView-- is equivalent to the pixel space of the display.
With resolution independence, you can no longer make that assumption, and you need to make a transformation between that view rect and the screen pixel space. So the line you see here in yellow is a simple Cocoa call to do that transformation and convert the bounds rect of the current view into those pixel coordinates. If you don't take that step, what you'll end up seeing is a viewport in the bottom left of your window rendering all of your content, with a big black band on the top and the right. So make sure you take a good look at your reshape function for your Cocoa applications, and make sure that it has taken into account that the bounds rect, which is in points, 1/72 of an inch, has been correlated to the screen pixel space.
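The unit conversion behind that fix can be sketched as plain arithmetic. In real code the Cocoa convert call does this for you; `scaleFactor` here stands in for the user space scale factor (1.0 on traditional setups, other values under resolution independence), and the names are mine:

```c
#include <assert.h>

typedef struct { float w, h; } Size2;

/* Bounds rects are in points (1/72 inch); glViewport wants device
 * pixels. Under resolution independence the two differ by the user
 * space scale factor, so scale before setting the viewport. */
static Size2 points_to_pixels(Size2 bounds, float scaleFactor) {
    Size2 px = { bounds.w * scaleFactor, bounds.h * scaleFactor };
    return px; /* feed these to glViewport, not the raw bounds */
}
```

At a scale factor of 1.0 this is a no-op, which is exactly why the bug hides until someone turns the scale up: the unscaled viewport covers only the bottom-left fraction of the window.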
So also new for Leopard, 64-bit support, of course, allowing us to address 4 gigabytes and beyond of memory. I'm assuming Einstein had 4 gigabytes or beyond of memory. It also makes OpenGL a good citizen in the 64-bit space, so if you have an application that is 64-bit, you can compile and link against it and such.
Here's a couple of examples of 64-bit compilation and linking. These binaries are getting very sophisticated. And you can see the top example is a simple compilation into a.out of a 32-bit universal binary, with the two arch specifications on the tail end of the compile line. So that gives you a binary that's capable of running 32-bit on both PowerPC and Intel systems. The bottom example is showing, you know, kind of multiply that space by two, so that you have four different configurations. In that case, you have 32-bit and 64-bit universal binary output, adding -arch ppc64 and -arch x86_64 at the tail end of the compile and link line.
So as far as invocation, when you've gone ahead and compiled your application for 64-bit-- and, you know, I think they need a new term for applications that are not only universal but 32- and 64-bit-- if you run these applications on a G4, the system will load the binary as 32-bit PowerPC. On a G5, it will load and execute as 64-bit PowerPC, which may or may not be intuitive to you. On the Intel Core Solo or Duo CPUs, it'll run as 32-bit x86, and then on the Intel Xeon CPUs, it'll run as 64-bit Intel. So another thing to keep in mind for Leopard is that QuickDraw has been deprecated. Of course, your QuickDraw-based kind of Carbon OpenGL applications will still work, but there will be no future feature development in the QuickDraw space. So our recommendation is to take your application, if it's Carbon-based and using QuickDraw, and move it to using either AppKit, Cocoa, or the HIToolbox HIView for your kind of view, window, and event mechanism. So you need to replace that.
So here are some of the changes that have happened with AGL, with QuickDraw not being supported in 64-bit and deprecated for Leopard. Two kind of fundamental data structures that have been deprecated really are the key to the deprecation, if you will: AGLDevice and AGLDrawable. And because those have been deprecated, anything associated with them, which is the list of eight functions below that on the screen, has been deprecated as well. So that's out with the old QuickDraw-centric stuff. The new data structures are shown here. These are CoreGraphics and HIToolbox data structures: CGDirectDisplayID, WindowRef, and HIViewRef.
So here's kind of a function mapping of old to new. The QuickDraw-based functions are all in blue, mapping over to white in the new API. So we have aglChoosePixelFormat, the new API being aglCreatePixelFormat. aglQueryRendererInfo maps into the longest function name I may have ever seen in my life, aglQueryRendererInfoForCGDirectDisplayIDs. I don't know if anyone can wc -c that one for me. aglDevicesOfPixelFormat becomes aglDisplaysOfPixelFormat, and then you have the AGL set and get drawable calls and the corresponding set and get window ref calls below that.
So two new functions with no old mapping are, you know, the HIToolbox-based routines, and those are aglSetHIViewRef and aglGetHIViewRef. So moving on, I mentioned earlier stereo in a window. As of OS X Tiger 10.4.3, Apple introduced for the first time the ability to do stereo in a window. So if you have a need in your application for a stereo application that is desktop-friendly, so you can see non-stereo content simultaneously, the red carpet's here for you in that regard. The Quadro FX 4500 video card that originally shipped with our quad-processor G5 machines when they first came out comes with a stereo adapter to drive an emitter, so that you can connect LCD shutter glasses and do stereo in a window on your Mac.
So, the execution of your logic based on state. I talked about this ability to determine where your application was running, whether it was on the CPU or whether it was on the GPU. We have a CGL token to allow you to do that, or a pair of them, rather: kCGLCPGPUVertexProcessing and kCGLCPGPUFragmentProcessing.
You can essentially send these in with a CGLGetParameter call and make the determination, based on the current state configuration of your OpenGL app, whether or not you are on hardware end-to-end, that is, from the beginning of the pipeline to the end. Things to remember about that: vertex processing occurs first and fragment processing occurs last. If fragment processing comes back as false, that implies that the vertex processing is also going to be done on the CPU. So there's kind of two pieces of information wrapped into that one if you query fragment processing. On the other hand, if just vertex processing comes back as false, that means you could have kind of a hybridized setup where the TCL work is being done on the CPU, but the fragment processing and rasterization is still being done on the GPU. So there's a few different kinds of outcomes you can get out of making that query.
So one thing to remember when you're using that query is that if kCGLPFANoRecovery has been set when you chose your pixel format, that's essentially telling the implementation that you don't want a software fallback, period. So that obviates the need to check to see if you're running on hardware or software.
Because if you set no recovery and it was unable to render on hardware, it just won't render. It's not going to fall back to software. So there's no need to check. The other thing to keep in mind is that this is not something to use whimsically in your application. We're kind of advocating that you use this get parameter call as a development-time operation, so you can make a determination based on whether it's shaders or kind of logical groupings of OpenGL state that are specific to your app. You really want to make this determination either at development time, or if you're going to do it at runtime, try to do so judiciously, because it causes a pipeline flush each time. So you definitely do not want to throw this am-I-running-on-hardware-or-software query into the middle of an inner rendering loop and then just watch your performance tank into the ground.
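The implications of those two booleans can be captured as a small decision table. This is a sketch; the enum and function names are mine, and the logic just follows from the pipeline ordering described above (vertex first, fragment last):

```c
#include <assert.h>

typedef enum {
    RENDER_ALL_HARDWARE,  /* vertex and fragment processing both on the GPU */
    RENDER_HYBRID,        /* TCL on the CPU, rasterization still on the GPU */
    RENDER_ALL_SOFTWARE   /* fragment on the CPU implies vertex is too */
} RenderPath;

/* gpuVertex/gpuFragment are the booleans you'd get back from
 * CGLGetParameter with kCGLCPGPUVertexProcessing and
 * kCGLCPGPUFragmentProcessing. Remember the query flushes the
 * pipeline, so classify once at setup time, not per frame. */
static RenderPath classify_render_path(int gpuVertex, int gpuFragment) {
    if (!gpuFragment) return RENDER_ALL_SOFTWARE; /* two facts in one answer */
    if (!gpuVertex)   return RENDER_HYBRID;
    return RENDER_ALL_HARDWARE;
}
```

Note that fragment-false dominates: per the pipeline ordering, it carries the vertex answer with it, so the helper never needs to look at the vertex flag in that case.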
So that kind of concludes some of the feature updates, and I'm going to go into some of the performance updates since WWDC '05 and for Leopard. So you've probably heard some of the rumblings about the multithreaded OpenGL implementation. For a long time, we've advocated kind of decoupling your drawing thread from the logic of the rest of your application, so that you could essentially hide some of the latency that was occurring during your drawing commands. Well, we know that's a lot of work, and we decided there was an intelligent way for us, as the implementers of OpenGL, to do that for you. What this effectively does is get control back into the hands of your application more quickly and give your application more CPU processing per unit time, more cycles per unit time.
If you do want to use the multithreaded OpenGL, keep in mind who it benefits. There's kind of three real things to consider. It really benefits applications that are what we consider well-written, which is they're using kind of modern OpenGL methodology. They're not flushing the pipeline all over the place by doing glGet calls and, you know, unnecessary finishes or flushes.
So that's one kind of important aspect. And, you know, the application has to be CPU-bounded to begin with, so if you have some application that's rendering these monster windows or volume rendering or something where it might be fill-limited, you know, you may not see as much benefit. And of course, the multithreaded OpenGL is doing more work than the serialized OpenGL. So if you don't have two processors to amortize it over, your application definitely is not going to be faster. It will be slower. The only question is how much slower will it be? So keep those three things in mind. The good news is that those three things, even though there are three criteria there, they're very common. Many, many applications are CPU-bounded, and many, many applications are well-written. So your application could be a candidate for this, and you may have some, you know, this may be some money in the bank earning interest, if you will.
Just flip a switch, and on it goes, right? And another thing to consider is just kind of the number of Macs out there that have been shipped with multiple processors. This multi-threaded OpenGL is not a feature that just benefits somebody that buys a new Mac Pro. It's a feature that assists anybody that has an application or a system that is multi-processor. and Apple has been shipping multiprocessor systems for many years back into the G4 time frame. So there's a lot of machines out there that can take advantage of this if your application makes the switch.
So some of the results that we've seen at Apple that we're thrilled about: Blizzard has been working with us, kind of making the most of this, and we're seeing around a 90% gain in frame rate for an application like World of Warcraft. World of Warcraft is a very well-written application, CPU-bounded, as many are, and the frame rates are just terrific. Doom 3, on the other hand-- we didn't really work with the id guys on making Doom 3 faster, but we feel it's a very relevant application because that engine is going to be so widely used. And in that case, we're seeing gains of 20% to 40% in frame rate, and that's a very sophisticated application. We on the OpenGL engineering team spend an awful lot of time trying to eke out another 1% or 2% of performance, so when we see numbers that are in the tens, we're absolutely thrilled.
So a couple notes on what the multithreaded engine is not. It doesn't imply anything about thread safety within your own context. So if you are one of those bold and daring people that want to submit OpenGL commands from multiple threads into a single drawing context, you're braver than I am. We're still advocating that you have a single drawing thread and that you issue commands into OpenGL from that single thread.
So the enabling of the multithreaded engine is dynamic. You can set up your context, create your context for your OpenGL application, and basically flip this on or off anytime you choose. Of course, we don't really advocate being like a little kid with a light switch, flipping it on and off, because it's really not necessary to do so.
But if there is some reason that you want to be able to characterize performance at runtime, if your application is able to basically monitor its own performance, you want to flip it on in a configuration, flip it off, please do so. Simple Boolean test, flip it on, flip it off. When you do this, though, you have to kind of step back and think for a minute, you know, what kind of state change is this making to the context? You know, really, the engine has to, you know, kind of mirror a lot of client state and do a lot of work to transition between serialized processing and multithreading. So it's a definite pipeline flush and more. Very expensive switch. So use it wisely, please.
So as I said earlier, there's no reason to ever enable the multithreaded engine if you don't have the CPUs to amortize the work that it's going to introduce. Here's a short example, a simple sysctlbyname call to determine your CPU count. And if you get a number greater than or equal to two, then you're good to go. You can enable it.
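A sketch of that check. The Mac OS X-specific sysctl call is shown in a comment so the sketch stays portable, and the go/no-go decision is factored into a helper (the function name is mine):

```c
#include <assert.h>

/* On Mac OS X you would fill ncpu like this:
 *
 *   int ncpu = 1;
 *   size_t len = sizeof(ncpu);
 *   sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0);
 *
 * and then gate the multithreaded engine on the result. The engine
 * adds work over the serialized path, so without a second CPU to
 * amortize it, it can only make you slower. */
static int should_enable_mt_engine(int ncpu) {
    return ncpu >= 2;
}
```

If the check passes, the switch itself is a CGL context enable; remember from above that toggling it is a pipeline flush and more, so decide once and leave it.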
So, Apple flush buffer range. We had a number of applications we were working with, and clients, that had some performance problems when they went to use vertex buffer objects. They would go ahead and map a buffer, and they would change some small portion of it. They have something that's four megabytes long, and they go change 3K of it. And when they got done with their modifications and unmapped it, the implementation, the way the VBO is specced, essentially has to flush all of those changes to the GPU. It doesn't know what changed, so it flushes the entire buffer rather than just the changes. And that creates kind of a choking amount of bandwidth consumption and is very inefficient. So Apple flush buffer range, like many of the kind of async extensions and things you've seen in the Mac OS X OpenGL implementation over time, gives you finer-grained control of how you manipulate data. And, you know, when it comes down to it, ultimately you are the authority on how data is moving in your system. You know when you're done with something for rendering. You know when OpenGL should start using it, stop using it, et cetera. And the Apple flush buffer range is a very powerful extension. We've seen significant gains in applications with this extension as well. Chris Niederauer in session 219 will talk about this in depth and give you some examples of it. I've got some kind of rudimentary steps to use here in these slides that I can talk through with you. The first step is simply specifying, using glBufferParameteriAPPLE, that you do not want to flush on unmap. So you turn that off. Now we're just into kind of a basic VBO example. You map the buffer, you modify the VBO's contents, and then, rather than just unmap the buffer, you now give it the range that you just modified. Tell it, "Just flush this. Don't flush my whole VBO. I didn't mean for you to do that, so please just do the region I modified."
And that's the key to saving the bus bandwidth. And then you can unmap the buffer. So it's very simple to use, and this slide is discussing mostly the unmap side of the equation. As far as the mapping side of the equation, you can use the flush buffer range extension in conjunction with fences. If you know you're done rendering with a VBO and you want to start modifying its contents, again, your application knows that better than OpenGL can. So this is another mechanism to tell OpenGL, you know, I want to start modifying this, and don't block and wait until the VBO is finished rendering. So this is allowing you to do that.
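The unmap-side steps can be pulled together like this. As with the earlier sketch, the GL entry points are stubbed with minimal local definitions so it runs standalone; the token values are from the real headers and the APPLE_flush_buffer_range extension, and the call sequence is the one described above.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int GLenum;
typedef int GLint;
typedef long GLintptr;
typedef long GLsizeiptr;
#define GL_ARRAY_BUFFER                0x8892
#define GL_READ_WRITE                  0x88BA
#define GL_BUFFER_FLUSHING_UNMAP_APPLE 0x8A13

/* Stubs recording what got flushed; in a real app these come from the
 * driver via <OpenGL/gl.h>. */
static GLintptr  flushedOffset = -1;
static GLsizeiptr flushedSize  = -1;
static unsigned char vbo[4 * 1024 * 1024]; /* pretend 4 MB VBO backing store */

static void glBufferParameteriAPPLE(GLenum t, GLenum pname, GLint v) { (void)t; (void)pname; (void)v; }
static void *glMapBuffer(GLenum t, GLenum access) { (void)t; (void)access; return vbo; }
static void glFlushMappedBufferRangeAPPLE(GLenum t, GLintptr off, GLsizeiptr sz)
{ (void)t; flushedOffset = off; flushedSize = sz; }
static unsigned char glUnmapBuffer(GLenum t) { (void)t; return 1; }

/* Modify a small region of a big VBO, but flush only that region. */
static GLsizeiptr update_small_region(GLintptr offset, GLsizeiptr size) {
    /* 1. Tell GL not to flush the whole buffer on unmap. */
    glBufferParameteriAPPLE(GL_ARRAY_BUFFER, GL_BUFFER_FLUSHING_UNMAP_APPLE, 0);
    /* 2. Map and modify just the region we care about. */
    unsigned char *p = glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);
    memset(p + offset, 0xAB, (size_t)size);
    /* 3. Flush only the modified range, then unmap. */
    glFlushMappedBufferRangeAPPLE(GL_ARRAY_BUFFER, offset, size);
    glUnmapBuffer(GL_ARRAY_BUFFER);
    return flushedSize;
}
```

With the slide's numbers, that's 3K crossing the bus on unmap instead of the full four megabytes.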
So on with a few best practices, and this is really just kind of modern OpenGL usage, and the big theme here is the BOs. It's the VBOs for moving vertex data: a very flexible memory model that allows you to characterize the particular usage, whether your data is static or dynamic, if you're copying, if you're streaming, et cetera. These are the mechanisms that get the most attention from the Apple engineering team because they're the latest and greatest way to manipulate data, certainly. For moving pixels, as we talked about with PBOs, you might want to experiment with some of this readback stuff. It's really exciting to see this readback on some of the higher-end systems, you know, well in excess of a gigabyte a second. They also allow you to do render to vertex array, which is really fun, as Alex demonstrated earlier. FBOs, you know, a big change to OpenGL. If you haven't had a chance to take a look at those, please do so. Very elegant, very simple mechanism for doing render to texture.
Use the multi-threaded engine. By all means, check out your application. Pull up activity monitor. Throw it into icon view. See if those CPUs are just jamming when your application is rendering. If they are, and especially if just one of them is, enable the multi-threaded engine and see if you can get some performance gains from that.
Take advantage of the Apple flush buffer range like we talked about earlier, and use ColorSync to make sure that your colors are correct. Color management's getting more and more important every day. As you saw, the OpenGL ARB introduced sRGB textures into the specification, and, you know, kind of similar to that view bounds thing I alluded to with the move to high DPI, color management is not just important for people doing high-end photography or processing of raw image data. It's becoming relevant for people doing any kind of graphics processing. Apple has a fabulous ColorSync API, and you can find out more in TechNote 2035 on the developer pages. So I think you'll find some great information in there.
So Allan Schaffer is our 2D and 3D graphics evangelist, and you can contact him for more information. As you may have seen in some of the other sessions, documentation, sample code, et cetera, are at the URL below. And that concludes our session. If you want to do a little hands-on work, the OpenGL lab is at 3:30 today. It runs until 6 o'clock. Many of the engineers that will be manning the lab and kind of helping you with some of your questions are here today. Okay?