Optimal 2D Graphics - WWDC 2006

Graphics and Media • 52:31

Learn techniques for ensuring that 2D drawing within your application takes optimal advantage of the Mac OS X graphics architecture. We'll cover best practices for application drawing, optimizing screen updates, and detecting and removing unnecessary graphics processing overhead. This is a great session for all developers interested in maximizing the performance of their application.

Speakers: Ralph Brunner, Assana Fard, Andrew Barnes, Ken Dyke

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

And welcome to the session Optimal 2D Graphics, which will conclude our evening. So my name is Ralph Brunner. I manage Quartz 2D Graphics. And here's the entertainment for the coming hour. So we will talk a bit about Core Graphics architectural changes in Leopard. We'll talk about how to do performance tuning. There's going to be a bit about power consumption and why that matters. And we'll also talk a bit about color matching and then a bunch of pointers to other places.

So first topic I have is 64-bit. You heard that Leopard essentially has all the frameworks available in 64-bit. And as a Core Graphics client, what 64-bit mainly gives you is a really big address space. And one of the things you can do with that is to render to large bitmaps. Now there's an interesting problem: if you have really large bitmaps, let's say more than half a million pixels wide or tall, a single-precision float is actually not sufficient to describe the coordinates accurately enough to do anti-aliased rendering.

So to address that problem, we introduced a CGFloat type in our API, which replaces the previous version, which was just a single-precision float. CGFloat in a 32-bit process is a single-precision floating-point number, while in a 64-bit process it's a double-precision floating-point number. So not only can your pointers address more memory, the actual coordinates you use can address the same memory. That's essentially the idea. So if your app is building for a pre-Leopard OS, just use float because it's naturally forward compatible.
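To make the precision argument concrete, here is a minimal Python sketch (not Quartz code) that measures the gap between neighboring single-precision values at a few coordinate magnitudes. The helper names are made up for this illustration, and the specific coordinates other than "half a million" are assumptions chosen to show the trend.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between positive x and the next representable single-precision value above it."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(x)


# Coordinates near the origin keep fine sub-pixel precision; far out, the
# representable positions get coarse enough to matter for anti-aliasing.
for coord in (1024.0, 524288.0, 4194304.0):
    print(coord, float32_spacing(coord))

# A coordinate about half a million pixels out snaps to a 1/16-pixel grid.
snapped = to_float32(524288.3)
```

At roughly half a million pixels, single precision can only place an edge on a 1/16-pixel grid, and the grid keeps getting coarser further out, which is the motivation given above for a double-precision coordinate type in 64-bit processes.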

Excuse me. So one point. Somebody's talking behind me. Okay. So one point I would like to make is about performance with 64-bit. On PowerPC, 64-bit performance-wise is pretty much a wash. So your 32-bit application in regards to CG will run about as fast as the 64-bit application. However, on Intel, there's actually a performance gain.

And the main reason is that on Intel 64-bit, you have twice as many registers, so the compiler can produce better code gen. And you have a slightly modernized ABI, which makes function calls cheaper. So we have measured up to 10% speed improvement just going from Intel 32-bit to Intel 64-bit by doing things like rendering PDFs and so on. So you might want to consider going to 64-bit if that helps you. But the summary of 64-bit is that it's pretty much a non-event for Core Graphics. Everything just continues to work as you would expect. And there is very little work for you to do.

So the second topic I have is, well, we made the bitmaps bigger by having a bigger address space. We also made the pixels deeper. So before Leopard, you had the ability to have a CGImage which references 16-bit per component images, among all the other formats that were supported. In Leopard, you now also have the ability to create a bitmap context that renders into 16-bit per component.

And the main clients for these kinds of operations are scientific visualization and pro photography, which use 16-bit a lot. So now in Leopard you actually are able to take a 16-bit image that comes from a high-end camera and scale it, render it into a 16-bit context and not lose any fidelity there.

So now let me talk about the bigger changes. So every now and then we go and essentially replace a subsystem of the operating system with kind of a new implementation. And that's usually kind of a disruptive change, lots of bugs that come out of these operations. And what that does, it kind of changes the performance characteristics of that subsystem in a more radical way.

And of course the hope is that all the things that are going to be important in the next five years get, you know, performance gain. Well, we don't really want to break anything, but everything that used to work stays more or less the same and nothing bad happens.

That's the hope. Well, if chosen right, this is kind of the gift that keeps on giving. And as an example, I would like to point out Quartz Extreme. So a long time ago, it was, well, just after the discovery of fire, but before the, you know, invention of that wheel contraption, we introduced Quartz Extreme in, well, Jaguar, 2002. And the idea was to take the software compositor that we had before and put all the compositing work that the window server is doing onto the graphics chip.

And well, it was so great that today they even sell you ice cream extreme, so that's kind of validation there. I guess we will have to have a MaxiBond product at one point. So as an example, a year later, we introduced Exposé. And Exposé kind of made use of the fact that you have all these windows as textures on the graphics card, and now scaling them and moving them around is much cheaper.

Well, two years later with Tiger we introduced Dashboard and it kind of uses the same capabilities. So it now uses more translucency because, well, translucency is kind of almost free from the client's perspective. And, well, having windows with arbitrary shapes and complex alpha channels and so on is just working. And in Leopard, we are kind of milking that feature again with Spaces, where we now have several desktops full of windows, and we can seamlessly move those around. So with that, I would like to ask Assana Fard up on stage to give us a demo of Spaces.

So Spaces was designed to increase your display real estate without really discombobulating your desktop. So I'll go over some of the features we have and a little bit about navigation and preferences. So the general view is great for arrangement. You can pick your windows up, you can reorganize them, move them around. Exposé works if you need to see more of your spaces. You can pick one, again, move between spaces. One of the nice things is you can pick up an entire app and move it around. You control-click and we move it.

We actually snap the windows back to where they used to be so you don't have to rearrange again afterwards. You can arrange the spaces themselves. Just pick one up and put it wherever you want it to be. I'm going to move this guy back there. A little bit about navigation. This is, of course, a great mode to navigate. Tab moves around your spaces. Arrow keys work, so on and so forth. And you can just unzoom to wherever you want to be.

At your desktop view, it would be nicer to navigate a little bit faster. So we have programmed the arrow keys to spatially move over your matrix of spaces. Your modifier key is programmable. I have mine set to Control, so Control-arrow just moves around your spaces. And similarly, your number keys work. So if you know exactly where your Xcode project is, you just go Control-1. Oops, it's not there. It's right there. Or four. And you can just kind of jump between the spaces.

Another way to actually invoke a space switch or space change is by dragging your window. You just drag the window to the other side. I guess I'm not in the right place. There we go. Let's see. Oops. I guess my dragging doesn't seem to be cooperating. But anyways, you could pick up a window and basically move it to your space and see if these guys work a little bit better. There we go. So we switch the space and we bring the window with you.

And you can also, actually, on that same note, hold down your window and just go somewhere else and your window comes with you. Now, a couple other things I want to mention are about the preferences. We have a preference pane in Exposé and Spaces. And you can add rows and add columns and remove them. And of course, everything as usual is live. The other thing I wanted to mention is if you remove a space with apps in it, don't worry about cleaning it up. We move everything over. We always remove the right column and the bottom row.

So we roll things up and roll things over for you. The other thing I wanted to mention is this concept of binding your application to Workspace, which means at launch time or every time you open a new window, your application will open in that particular space. So you can do that a couple of ways. You can either add an application.

Let's see, let's say I wanna add my, gosh, let's say iChat. You open it and then you can just go in here and decide wherever you wanna put your application, Workspace 3. The other way is by command-dragging out of your Dock. So here I wanna put my Safari in Workspace 1, and as we go, everything is there.

One of the things I wanted to mention is, again, let me go back in this mode, everything we do is live, and our windows are alive, and the apps are rendering, and we're drawing, and everyone's happily going about their business, which is something that we can do because of Quartz Extreme. It really wouldn't have been possible without it. So with that, I'd like to introduce Andrew Barnes, who's going to talk about Quartz GL, our next generation acceleration. Andrew. Thanks, Assana.

So what we're really going to talk about in this part of the session is QuartzGL and just general optimizations that you can do with your 2D graphics. And as my title suggests, I'm the person who makes pixels show up on the screen. So anyway, let's just start right away because we've got a fair amount of material.

You've seen this architecture slide many times before. Basically, your applications draw either into DRAM via the window backing store or via surfaces with 3D or video content. They all get mixed together by the Quartz compositor and displayed onto the frame buffer, which gets scanned out to the display.

So obviously, this session we're not going to talk about 3D and video because this is 2D graphics and Quartz. The second thing we're not going to talk about is QuickDraw. So hopefully you went to the QuickDraw session last year where they basically told you to use Quartz. So we're not going to talk about that either. What we are going to talk about is a little bit about QuartzGL.

QuartzGL basically draws into VRAM instead. It's a very simple version of VRAM. And everything happens like you think it should happen. So what's QuartzGL? Well, QuartzGL is an implementation on top of OpenGL of the entire Quartz 2D API. And its main goal is to offload all the rendering onto the GPU where it should be.

And as a side effect, it also minimizes your DMA transfers that you go back and forth between your window backing stores and your GPU. In addition to that, it also allows you to have a more efficient or more integrated model when you're dealing with a lot of the graphics.

So it's not just about rendering, it's about rendering with other hardware accelerated systems. For instance, Core Animation, Core Image, OpenGL. Those stories slightly change as soon as we start using the GPU all the time. So yeah, we went off and did this. We took some time to do it. And we learned a bunch of lessons along the way.

Basically, for a lot of applications, the majority of applications, rendering is not the bottleneck. You do samples of applications and they're spending time touching the disk, updating preferences and stuff like that. So that's one thing we really did notice. We also noticed that people didn't use the Quartz 2D API in a very efficient way. And that, unfortunately, turns out to be a little bit more costly for the GPU scenario.

And the third and last thing that we did learn is that once you start doing efficient programming using Quartz, you can really fly. So let's take a look at some of the details. So basically with a 2D guy, you're drawing into your backing store at system memory speeds, and we do that at 5 gigabytes a second. Once it's in DRAM, the DRAM gets uploaded to VRAM at 2 gigabytes a second. With Quartz GL, the formula changes slightly.

You're actually packing command buffers at 2 gigabytes a second, which are much smaller than the actual data that you're actually blitting if you were to use Quartz 2D. And your actual blitting speeds are like 30 gigabytes a second. So, you know, it's all really basically about the bandwidth. So here I had a scenario where basically, you know, I wanted to show some numbers of how fast it could be. And this chart shows basically the cycle of I want to draw a rectangle, 256 by 256 rectangle.

And then I want to tell the... I want the window server to flush it. We've put the window server and everybody in benchmark mode, so we try to minimize any kind of blocking that you would see. So in this example, you see the MacBook Pro and the Mac Pro using Quartz 2D would basically start uploading huge amounts of data, DMAing via the backing store, whereas with Quartz GL, there's no backing store, so it's zero. Command buffers in Quartz 2D, they're smaller than with Quartz GL. That's because we've translated all of the hard work of that 3,300, 500, gigabytes a second into, you know, 16 megabytes per second, right?

So a lot of things have changed. And as a side effect, you also end up using less CPU. Quartz 2D uses a lot of CPU. Quartz GL uses less because all it has to do is pack quads. And the grand total at the end is basically a 2X for your MacBook Pro and, for your Mac Pro, you know, 4X. These numbers are obviously pretty preliminary. But basically, that's the bottom line. Use QuartzGL, you can get to the display faster.
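As a rough back-of-the-envelope check on the bandwidth figures quoted above (5 GB/s into the backing store, 2 GB/s DRAM to VRAM, about 30 GB/s for GPU blits), the sketch below estimates how long moving one 256 by 256 rectangle takes at each rate. The 4 bytes per pixel and the decimal definition of a gigabyte are assumptions for the arithmetic, not numbers from the talk, and real timings depend on far more than raw bandwidth.

```python
GB = 1e9  # assumption: decimal gigabytes; the talk does not specify


def transfer_microseconds(num_bytes: int, gigabytes_per_second: float) -> float:
    """Time to move num_bytes at the given bandwidth, in microseconds."""
    return num_bytes / (gigabytes_per_second * GB) * 1e6


# One 256 x 256 rectangle, assuming 4 bytes per pixel.
rect_bytes = 256 * 256 * 4

cpu_fill_us = transfer_microseconds(rect_bytes, 5)    # CPU writes into the backing store
upload_us = transfer_microseconds(rect_bytes, 2)      # DRAM to VRAM upload
gpu_blit_us = transfer_microseconds(rect_bytes, 30)   # blit performed in VRAM

print(f"{rect_bytes} bytes: fill {cpu_fill_us:.1f} us, "
      f"upload {upload_us:.1f} us, GPU blit {gpu_blit_us:.1f} us")
```

Under these assumptions the upload step alone costs more than the fill, which is consistent with the point that removing the backing-store transfer is where most of the win comes from.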

So now we're actually starting to look at the nitty gritty. What we're trying to do is we're trying to draw a bunch of images, right? Same size images, you know, a whole range of images. And we say, okay, great. What we're going to do is we're going to give you your application 100% of the CPU and what can you get done in that time? So obviously, as we can suspect, the software just gets slower and slower because you're actually just ending up filling more and more memory.

Whereas in the hardware case, well, it takes constant time to pack a quad. And that's it. So that's all great. And I'm sure a lot of surfer dudes out there, developer surfer dudes are probably thinking, dude, you know, what are you thinking? The GPU doesn't have infinite bandwidth.

How does this really work out? You know, I'm going to have to block some ways. Well, that's kind of true. You do have to block. And for the software cases, which are the yellow and red, you see the usual curve. You start going down. But, you know, with the GPU, depending on how fast your GPU is (the faster GPU being the blue line), it actually takes constant time until it starts to end up stalling on the GPU. And the faster your GPU, the more that line, let's say the blue line, would continue out and bulge over, you know, to overlap and become faster than software.

And then, of course, some of you are actually thinking and looking, well, you know, on the slightly less capable GPU, it looks as if the GPU is kind of slower than the CPU. Well, sometimes that can be unfortunate, but it's not always true.

Here you have to look at the other perspective. The other perspective is that even though you've consumed all this time on your CPU and you're getting more or less the same work done, how much CPU are you consuming? The software case consumes 100% of the CPU in order to do those blits, whereas in the hardware case, it gets less and less. So guess what?

Use QuartzGL and you get back all your CPU again. You can start going and touching the disk and flying off and doing all the strange things that you want to do. So that's the advantage. So now let's basically say what's going to happen. Enough with the silly benchmarks.

What's really going to happen with my application? Here we see a bunch of examples of resizing because live resize is probably a very taxing scenario, and you will get good exercise of your API by doing it. So there's a bunch of scenarios where you're resizing Mail and resizing TextEdit, Xcode, scrolling, stuff like that, and we're seeing pretty decent gains. Personally, I'd like to see those numbers a bit larger, and probably they'll end up being larger at the time Leopard ships. But that's basically what happens. Okay, so great. That's one application. Cool. Let's add a little bit more complicated scenarios.

Here I have a simple test where, you know, we got Atlantis spinning away at 800 frames per second or whatever, and we got, you know, four PDF documents all scrolling with transparency. We got a bunch of different things going on, and this is what this really looks like. On your MacBook Pro, you can see the DMAs are pretty large, and you see your CPU usage across the board, both with Quartz GL and Quartz 2D, being pretty much constant.

But guess what happens? You're actually updating at almost, you know, 1.5x. So you're actually getting more frames out. Of course, this is in benchmark mode. And, as you see, as usual, the DMA command data goes up slightly, but you also get rid of all of that backing store DMA. So that's good.

So now what we're going to do is sort of move into little practices that you can do to play nice with Quartz GL and Quartz. And these optimizations that I'm talking about basically apply to both systems. They're good. Once you start practicing these, using these tips, you basically go a lot faster on both the hardware and the software. So let's start off with some fundamentals.

Basically, it doesn't matter what system you're on, graphics system or otherwise, minimizing your state changes is always a good thing. It basically amortizes the cost of a state change with respect to your operation. So you can set your black color, and then draw a whole bunch of black text. You set your blue color as opposed to going ping pong, ping pong. The second thing that you also need to consider is drawing less, right?

I mean, you guys know what is going to be shown, what is going to be there. So discarding things is a good thing. Don't just say, oh, I've got a scene of 50,000 things and just draw it and expect us to do really fast stuff. And second thing, you know, you always have to be very cautious about memory consumption. Memory consumption is a big thing. And the point underneath there is basically that every page you make is potentially a page that needs to be paged out.

So yes, you developers might have two gigabytes or four gigabytes in your system, but for people, you know, your usual app, your usual mom and pop shop, it's going to be, you know, 512K, 512 meg. So, you know, you have to be cautious about how much memory you consume. And the second two points are more global. They're about knowing what you're dealing with.

Are you dealing with lots of images, small images? You know, how complicated are the images? Are they all, you know, raw camera images, stuff like that? You have to understand the type of data that you're dealing with. And the last point is, you know, as usual software is an evolutionary process and, you know, you shouldn't be afraid to evaluate something today and move on to the next, you know, because things have changed. So.
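The first fundamental above, minimizing state changes, can be sketched in a few lines of Python. This is a toy model rather than Quartz code: the draw list, the color names, and `count_state_changes` are all made up for illustration, and regrouping by color is only safe when the reordered items do not overlap on screen.

```python
def count_state_changes(ops):
    """Count how often the current fill color must be set when drawing ops in order."""
    changes = 0
    current = None
    for color, _item in ops:
        if color != current:
            changes += 1
            current = color
    return changes


# Hypothetical draw list that ping-pongs between two colors.
interleaved = [
    ("black", "heading"),
    ("blue", "link 1"),
    ("black", "body"),
    ("blue", "link 2"),
    ("black", "footer"),
]

# Python's sort is stable, so items keep their relative order within each color.
grouped = sorted(interleaved, key=lambda op: op[0])

print(count_state_changes(interleaved), count_state_changes(grouped))
```

Grouping amortizes each color change over every item that shares it: the same five items need five color changes when interleaved and two when grouped.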

First key goal, key point, key tenet, key everything is reuse your objects. You've heard this before. Basically, your object refs are keys into internal caches that CG uses. You just have to hold onto them and we will, you know, end up, if we need to draw something or color match an image or any of those things, we'll just do all the usual stuff.

And it, you know, you don't have to do anything except hold onto it. It also has a bonus point of if you have an example of, you know, 100 images that you're trying to put in a PDF file, there's actually only one image. If you give us a new image ref, we're just going to write out that image 100 times as opposed to just using one. So, you know, in all those cases it, you know, basically cuts down on overhead and doing complicated work if you hold onto these image refs.

So we can go back and look at, you know, cached entries when you give us the image ref again. And, of course, following the tenet of, you know, being cautious about your memory consumption: if you don't need something, just let it go. We'll delete all the entries in the cache, you know, and you can reload it later on.
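The idea that an object ref acts as a cache key can be illustrated with a toy model in Python. Nothing here reflects how Core Graphics is actually implemented: `Image`, `RenderCache`, and `draw_many` are hypothetical names, and a weak-keyed dictionary simply stands in for "entries go away once you release the object".

```python
import weakref


class Image:
    """Hypothetical stand-in for an image object handed to the renderer."""

    def __init__(self, name):
        self.name = name


class RenderCache:
    """Toy cache keyed by object identity; entries vanish when the key object is released."""

    def __init__(self):
        self._entries = weakref.WeakKeyDictionary()
        self.expensive_preparations = 0

    def prepare(self, image):
        entry = self._entries.get(image)
        if entry is None:
            # Stands in for decoding, color matching, uploading, and so on.
            self.expensive_preparations += 1
            entry = "prepared:" + image.name
            self._entries[image] = entry
        return entry


def draw_many(cache, make_image, count):
    """Draw `count` times, obtaining the image from make_image() each time."""
    for _ in range(count):
        cache.prepare(make_image())
    return cache.expensive_preparations


logo = Image("logo")
reused = draw_many(RenderCache(), lambda: logo, 100)              # same object every time
recreated = draw_many(RenderCache(), lambda: Image("logo"), 100)  # fresh object every time
print(reused, recreated)
```

Holding on to one object means the expensive preparation runs once; creating an equivalent object for every draw repeats it each time, even though the pixels are identical.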

So now let's jump into the type of objects that you're going to be caching or potentially using or which will give you a benefit. Path refs. Always use path refs. Path refs are much faster than constructing your path all the time over and over again. Just hold on to the path and use it.

The other thing about path refs is that if we've already scan converted the data, we already know that we don't have to do this work again. So you give us a path ref, we translate to our cached entry, bing bang, you go fast. Last point about paths is that you have to be cautious about using very, very large, complicated paths.

If you have a path that consists of many subpaths and the subpaths are geometrically disjoint, it might be more efficient for you to render the disjoint subpaths independently. So watch out for that. So lines and rectangles. Like paths, but we're going to concentrate more on stroking lines and filling rectangles because a lot of people actually fall into that class.

They've got a CAD application or they've got some wireframe thing. Use this API. It's much better. It is the fastest way to draw lines and rectangles on the display. The second thing, for those of you who would find this method more efficient: you also have to be concerned about what you're dealing with.

Are you dealing with massive wireframes? Are you dealing with massive scenes? Get rid of the objects you don't need to draw. Get rid of the objects that are not going to be seen. If you're standing at 60,000 feet, you're not going to see a fire hydrant, so don't draw it.

I mean, that's basically it. Other things, too: you might want to reorganize your data structures to maybe help you on your way of deducing whether or not you need to draw something. Quadtrees, octrees, they're all there. They're known stuff. You can look it up on the web. You can Google it. So, yeah, there's lots of different algorithms that people have been using.
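A minimal sketch of the culling advice above, in Python rather than against any Quartz API: drop objects that fall outside the visible rectangle or that would cover less than a pixel at the current zoom. The `Rect` type, the one-pixel threshold, and the sample scene are assumptions for illustration; a quadtree would replace the linear scan once scenes get large.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two rectangles overlap."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def big_enough(r: Rect, scale: float, min_pixels: float = 1.0) -> bool:
    """True when the object spans at least min_pixels on screen at this zoom level."""
    return max(r.w, r.h) * scale >= min_pixels


def cull(objects, view: Rect, scale: float):
    """Keep only objects that are inside the view and large enough to be seen."""
    return [name for name, bounds in objects
            if intersects(bounds, view) and big_enough(bounds, scale)]


# Hypothetical scene in world units (meters), viewed from very far away.
scene = [
    ("city block", Rect(0, 0, 200, 200)),
    ("fire hydrant", Rect(50, 50, 0.5, 0.5)),
    ("distant stadium", Rect(90000, 90000, 300, 300)),
]
visible = cull(scene, view=Rect(0, 0, 10000, 10000), scale=0.01)
print(visible)
```

Only the city block survives: the hydrant is smaller than a pixel at this scale and the stadium lies outside the view, so neither needs to be handed to the renderer.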

The next thing is a little bit more about trying to cut down what you draw. You really, you know, as I said before, you really don't want to throw a whole bunch of stuff at something thinking that, "Oh, somebody's going to do all this clipping." If you can pre-clip ahead of time, like for instance that text block, you know that that text block is the only thing that needs to be updated. Then just, you know, draw the text.

Why bother, you know, draw the whole diagram with all the images? So definitely if you have things, you can do trivial culling of your objects, you can do trivial clipping of your objects, all those things, just trying to minimize what you end up giving to us to draw.

The second thing that you also want to do with respect to minimizing clipping is this: we have lots of clipping primitives that you can use, and you can use CG clipping to do the clipping for you. But as I said, the less data you give us to clip, if you know this stuff ahead of time, you know, the better it is for everybody. So that's all cool, but what happens if you want to do something a little bit more complicated? You can't actually specify a clip path, right? You actually have per-pixel clipping.

Well, the method of doing this, which dodges a whole bunch of complicated clips, even if you have complicated geometry, as well as per-pixel clips, you can use this method. You basically create an alpha context, and you draw what you want to draw, which is going to be your clip.

That represents a stencil. You take that stencil, and you have two choices. Either you can take an image that you want to draw, in this case it's a wood texture, and you can mask the image, or you can just set the clip to be the mask, you know, CGContextClipToMask, and that gets integrated with your clip. And for a silly thing like that, where I have a mask and a texture and a little highlight, we end up getting something that looks like that.
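What the stencil does per pixel can be shown with a small Python sketch. It does not call Quartz (the talk names CGContextClipToMask for the real thing); `apply_mask` and the tiny 2 by 2 buffers are invented for illustration, and the output is assumed to be premultiplied RGBA.

```python
def apply_mask(image, mask):
    """Multiply each RGB pixel by its mask coverage.

    image: rows of (r, g, b) floats in 0..1
    mask:  rows of coverage floats in 0..1 (the "alpha context" drawn as a stencil)
    Returns rows of premultiplied (r, g, b, a) pixels.
    """
    return [
        [(r * m, g * m, b * m, m) for (r, g, b), m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Hypothetical 2 x 2 "wood" texture and a stencil with one soft edge pixel.
wood = [
    [(0.5, 0.25, 0.0), (0.5, 0.25, 0.0)],
    [(0.5, 0.25, 0.0), (0.5, 0.25, 0.0)],
]
stencil = [
    [1.0, 0.5],
    [0.0, 0.0],
]
masked = apply_mask(wood, stencil)
print(masked)
```

Because the stencil is just another drawing, any shape you can render, including soft anti-aliased edges, becomes a clip without describing it as a path.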

Well, that's pretty cool for 2D. Yeah, okay, great, the texture isn't actually wrapping around the ball, but you get my point. So now, let's talk about things for which you provide color. Well, CGColorRefs, always use CGColorRefs. They're faster than actually saying CGContextSetRGBFillColor, blah, blah, blah. Those color refs you can use and reuse again.

You hold on to them. We know the color is there. We don't have to rematch, et cetera, et cetera. One other point about it is, actually, you should have a color ref, but you should probably attempt to use the color that is appropriate. It's not always true that equal red, green, and blue values give you the actual gray. There are lots of situations where that is actually not true. Pattern refs.

Pattern refs are much faster than tiling the drawing yourself. You set up a pattern ref, and when we want to fill something with your pattern ref, we execute one tile, and then we get that tile, and we rasterize that tile, color match it, put it in a cache, and whenever you reuse that, fill an object with that pattern, we just use the cached entry and replicate it. So it's always faster than just continuing to draw over and over again.

Shadings. Shadings are also another thing. The star in the middle is a shading from white to blue, or a darker blue, and if you, you know, it would be much better for you to use shading APIs. A single call, you just draw it, instead of actually going and trying to put those pixels one by one, or creating some image or something. So shading refs are good.

They're also a more efficient representation in PDF and PostScript. So, you know, that's a big win for using shadings. But more specifically, about shadings and caching. The shading isn't really the entry that's actually cached. It's your function. Shadings are basically a geometric transformation of your function to your display. So, yes, if you hold onto your shading refs, we'll keep and say, "Yes, we've rendered this rendition before."

But what's really more important for you to hold on is your function ref. That function has to get sampled and color matched to the correct color, and then used. If you hold onto your function refs, and you reuse your function refs across multiple shadings, you get a benefit because you more or less would be hitting the cached entries all the time.

The last point on here is about alpha. Yes, alpha can sometimes be considered to be free in the hardware case, but it's a little bit more expensive in the software case. You should try to, you know, if you have situations where you have a color and you're trying to change the color, do a gradient or something like that, setting the alpha is a much, much cheaper way of doing it.

You got your simple black color, you're drawing a gradation, just say, "Set alpha, set alpha, set alpha, set alpha." And you'll get the same effect without us having to go through and, you know, get conniption fits about trying to match colors and do all these different things. So definitely, you know, that's a little technique to do.

Similarly, you can do it for any object, you know, if you have a color that's alpha, you draw an image, it'll come out 50% shaded. Okay, so let's talk a little bit about images. This is the other source of data, right?

Things you should know about images. What are good component sizes? Component sizes, well, bytes, shorts, and floats. Ralph did mention that we added a new 16-bit bitmap context, but what he didn't say is that for Leopard, we've actually increased the number of pipelines. The pipelines are now byte pipelines, and they're also short pipelines and float pipelines.

So once they go through, they get color matched, interpolated if you have a float image, it gets color matched, interpolated, masking, everything happens all in floats. So the final image that comes out on the other end, which you see in your bitmap context, is the actual closest representation to this source that you provided.

Other things that you should notice about images: people always ask, "Well, what's a good bytes per row value?" Well, you should probably use pixel multiples, start there. Vector multiples are also good. Scanline alignment, that's also good, too. So you can pad things out or not. Color spaces: if you don't know what your color space is, you should just use generic RGB. Don't try and get funky on us. Just let us do that, and we'll do it for you with ease. That's all we do.
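For the bytes-per-row question above, a small Python helper shows one way to pad a scanline to a vector multiple. The 16-byte alignment is an assumption standing in for "vector multiples"; the talk does not give a number, and the function name is hypothetical.

```python
def aligned_bytes_per_row(width_px: int, bytes_per_pixel: int, alignment: int = 16) -> int:
    """Smallest row length that holds width_px pixels and is a multiple of `alignment` bytes."""
    raw = width_px * bytes_per_pixel
    return (raw + alignment - 1) // alignment * alignment


# A 101-pixel-wide RGBA row needs 404 bytes and pads out to 416.
padded = aligned_bytes_per_row(101, 4)
# A 100-pixel-wide row is already a multiple of 16, so no padding is added.
unpadded = aligned_bytes_per_row(100, 4)
print(padded, unpadded)
```

The result is always a whole number of pixels' worth of storage plus padding, so every scanline starts on an aligned address when the buffer itself is aligned.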

Also, as I said, avoid needless coercions of your data. Don't coerce data from gray to CMYK or whatever. Just leave it as it is. Tell us what the color space is and we'll do all the hard work. The last thing, point on the slide, is about good image performance.

It's always good to use the destination color space for best image performance. We look at it, we say the source image is this color space, destination is this color space. Oh, nothing needs to be done. We can just blast that data straight into the destination. So, find your destination color spaces and you'll go faster.

More about images, what you should watch out for. In Tiger, we actually introduced new constants to handle little-endian data versus big-endian data. You really should use these cautiously. Use them for what makes sense. If you actually have some kind of pixel loop running over your data and you're looking at things in terms of words, like 32-bit quantities, then yes, turn on the flag. But if you already have big-endian data, meaning just red followed by green followed by blue, just leave it at that.

We'll do all the work. The second thing you want to be cautious about when you're dealing with these constants is you want your row bytes to be a multiple of the byte-order constant's size. You don't want a swap unit that crosses a scan line. That just causes us to do more work. So, if you could just pad it out and say, "Okay, great. I've got something that's got to be a multiple of 32 bits," then just use that constant instead. More about alpha.
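Before moving on to alpha, the byte-order advice above can be sketched in Python. The example packs one ARGB pixel as a 32-bit word in both byte orders and checks whether a row length is a whole number of swap units. The constants referred to in the talk are presumably the kCGBitmapByteOrder family; the helper names and pixel values here are made up.

```python
import struct


def argb_word(a: int, r: int, g: int, b: int) -> int:
    """Pack four 8-bit components into one 32-bit word, alpha in the high byte."""
    return (a << 24) | (r << 16) | (g << 8) | b


def swap_unit_fits(bytes_per_row: int, swap_unit_bytes: int = 4) -> bool:
    """True when no swap unit straddles the end of a scanline."""
    return bytes_per_row % swap_unit_bytes == 0


word = argb_word(0xFF, 0x11, 0x22, 0x33)

big_endian_bytes = struct.pack(">I", word)     # memory order: A, R, G, B
little_endian_bytes = struct.pack("<I", word)  # memory order: B, G, R, A

print(big_endian_bytes.hex(), little_endian_bytes.hex())
print(swap_unit_fits(400), swap_unit_fits(402))
```

The same word lands in memory in opposite byte orders, which is why a pixel loop that reads whole words on a little-endian machine needs the flag, while data already laid out component by component does not.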

Basically, alpha, anytime you have alpha, you imply blending. Blending basically cuts your throughput in half, no matter whether it's a GPU or a CPU. You have to read the source, read the destination, combine them, and put them back in the destination, as opposed to just an opaque copy. So, watch out for alpha, or use it sparingly. Don't create images that have FFF in the alpha channel.
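The extra work that blending implies can be seen in a small Python sketch: an opaque copy only reads the source, while a blend also has to read the destination for every pixel. This is an illustration, not the Quartz blitter; the `"skip"` and `"last"` labels, the function names, and the non-premultiplied 8-bit source format are assumptions.

```python
def copy_row(src):
    """Opaque path: read source, write destination. The alpha byte is ignored."""
    return [(r, g, b) for (r, g, b, _a) in src]


def blend_row(src, dst):
    """Source-over path: read source, read destination, combine, write destination."""
    out = []
    for (r, g, b, a), (dr, dg, db) in zip(src, dst):
        inv = 255 - a
        out.append((
            (r * a + dr * inv + 127) // 255,
            (g * a + dg * inv + 127) // 255,
            (b * a + db * inv + 127) // 255,
        ))
    return out


def draw_row(src, dst, alpha_info):
    """Pick the cheap path when the image declares that its alpha should be skipped."""
    if alpha_info == "skip":
        return copy_row(src)
    return blend_row(src, dst)


background = [(0, 0, 255)]
opaque = draw_row([(255, 0, 0, 255)], background, "skip")
translucent = draw_row([(255, 0, 0, 128)], background, "last")
print(opaque, translucent)
```

Declaring up front that the alpha channel carries no information lets the renderer choose the copy path without inspecting any pixels.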

Just say, "Skip alpha," and we'll do the rest. Premultiply. That's also another one. Premultiply when you need to. If you need to premultiply, do premultiply. If you don't need to look at the data, don't bother premultiplying for us. We know how to do that. It has performance penalties when you're dealing with color matching.

Basically, we have to un-premultiply the data in order to match it and then apply it to the blitters. So, if you have a problem with color matching, you can do that. If you do it more than you need to, it's going to be a little bit more expensive. Yes, okay, things are cached, so you only do it once, but avoid it when possible.
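The round trip described above (un-premultiply, color match, premultiply again) looks like this in a minimal Python sketch. The `transform` argument is a hypothetical per-component stand-in for color matching, and the squaring function used below is chosen only because it keeps the arithmetic exact.

```python
def premultiply(r, g, b, a):
    """Scale color components by alpha."""
    return (r * a, g * a, b * a, a)


def unpremultiply(r, g, b, a):
    """Recover straight color components; fully transparent pixels stay zero."""
    if a == 0:
        return (0.0, 0.0, 0.0, 0.0)
    return (r / a, g / a, b / a, a)


def match_premultiplied(pixel, transform):
    """Color transforms operate on straight color, so premultiplied data takes a detour."""
    r, g, b, a = unpremultiply(*pixel)
    return premultiply(transform(r), transform(g), transform(b), a)


straight = (1.0, 0.5, 0.0, 0.5)
pre = premultiply(*straight)

# Hypothetical color transform: square each component.
matched = match_premultiplied(pre, lambda c: c * c)
print(pre, matched)
```

If the data had been supplied non-premultiplied, the division step would not be needed at all, which is the cost the talk suggests avoiding unless you actually need premultiplied pixels.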

The last point is about image interpolation. Image interpolation basically controls how good a scaled image looks and how much work goes into it. You want to use it. The settings have very different quality versus speed characteristics, and you should choose the one that's appropriate. High, medium, none, low, default. All of those things. You'll find them in the headers and in the documentation.

So, okay, great. We have images. We told you what to do about images. Well, what happens if I'm bumping into some scenario where I know I'm going to blow CG's cache, or I know I'm going to have really complicated things? You might want to consider caching. Now, you have to be very cautious about caching, right? You have to make sure that you've measured all the effects of your caching. You don't want to have a situation where you're allocating tons and tons of images, and they're basically hanging out in memory.

So, measure the performance, measure the memory footprint, and if you're going to go down this path, be committed to managing your cache efficiently. Don't say, "I'm caching," when it's really just a stash, where you create images and keep them in memory, expecting the system to page out your memory. Manage it effectively. Things that aren't being used, evict them. One last point about managing your own cache is that you can use data providers to your advantage. They give you some decoupling.
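A minimal sketch of what "manage it effectively" could mean in practice, written in Python rather than against the Quartz API: a cache with a fixed byte budget that evicts the least recently used entries. The class name, the budget, and the use of strings in place of images are all assumptions made for illustration.

```python
from collections import OrderedDict

class ImageCache:
    """Least-recently-used cache bounded by a byte budget (hypothetical helper)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> (image, size_in_bytes)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, image, size_in_bytes):
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (image, size_in_bytes)
        self.used_bytes += size_in_bytes
        # Evict least recently used entries until we fit the budget again.
        # The newest entry is always kept, even if it alone exceeds the budget.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size

cache = ImageCache(max_bytes=100)
cache.put("a", "image-a", 40)
cache.put("b", "image-b", 40)
cache.get("a")                 # "a" is now more recent than "b"
cache.put("c", "image-c", 40)  # over budget, so "b" is evicted
```

The point of the byte budget is that the footprint is something you chose and can measure, rather than something the paging system discovers for you.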

You can do VM tricks and stuff like that: when the image first gets drawn, you get the notification, the data provider gets accessed, and you make the memory. After the image is used, you get another notification, and you can basically tell the VM system to kill those pages when appropriate, that is, make them volatile. Then somebody draws it again. You go back, and you can use a VM trick and say, "Have any of these pages been evicted?" If the answer is no, you've got the pointer, and there you go. Otherwise, you go and do your decompression.

So, if you want to get into any of these things, come to the lab after the session, and we'll be sure to help you with them.

Next source: after images, we're moving on to layers. Layers are another source of data.

And they're basically device-dependent representations of your drawing. We really, really would like you to use them for complicated drawing, not for simple images. Don't just create a layer, draw an image into it, and then use that. If you draw the image, the correct thing will happen: we'll rasterize it, put it in the correct form, upload it to a texture, and you just use that. With a layer, we have this memory, and it's a hard allocation. You tell us to allocate the stuff, and we have to keep it around. So, definitely use layers for complicated drawings.

Also, layer refs are kind of convenient because they hide all the destination details, right? If you're looking at a bitmap context that's floating point, or a window context that's a floating point window, and you ask the context for a layer, you'll get back a layer that appropriately matches that device, instead of you having to dick around with trying to find out, "Oh, is it a float component? Is it reverse-endian?" What's the problem? So, definitely use layer refs whenever you have things like that to do.

The other point about layers I wanted to mention is that I've bumped into a bunch of different scenarios where developers are actually creating layers that are really large. They have this scene, they're zooming in to, like, 10,000%, and automatically the spinning beach ball of death comes up, because basically you've told us to allocate, you know, two gigabytes of memory. So be cautious about when you use layers and what you're allocating. Memory consumption: always keep it in mind.
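To put a number on the zoom problem, here is a back-of-the-envelope Python sketch. The 256 by 256 scene and the 4 bytes per pixel are assumptions chosen for illustration, not figures from the talk.

```python
def layer_bytes(width_px, height_px, zoom_percent, bytes_per_pixel=4):
    """Backing-store size for a layer rendered at the zoomed resolution."""
    scale = zoom_percent / 100.0
    return int(width_px * scale) * int(height_px * scale) * bytes_per_pixel

GIB = 1024 ** 3

normal = layer_bytes(256, 256, 100)      # 262,144 bytes (256 KiB)
zoomed = layer_bytes(256, 256, 10_000)   # 2,621,440,000 bytes (about 2.4 GiB)
```

Because both dimensions scale with the zoom factor, the allocation grows with its square: a 100x zoom costs 10,000 times the memory.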

So the last point about layers is that we have, for Leopard, something like a sheet of acetate: you just push a layer, you draw into it, and then, when you call end, you get the entire thing composited at once, so you can fade everything at 50%, or you can add a shadow style, or whatever.

If you have situations like that, where you're actually just drawing transient data, you have one of two choices. You create an off-screen layer, you draw into it, and then you draw your layer with your effect; or you just say begin transparency layer, draw your content, and then end transparency layer, and it gets composited to the destination. So you want to look at those APIs and use them when that's the type of scenario you're in.

One last point: set your clip path before you begin your transparency layer. If you don't tell us what you're going to draw, we're just going to create a layer the size of the destination. The clip path basically restricts how much data we actually make.
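The effect of clipping first can be modeled very roughly as follows. This Python sketch assumes the layer covers the intersection of the destination with the clip's bounding box, which is a simplification for illustration and not a description of Quartz's actual allocation logic; the rectangle sizes are made up.

```python
def intersect(a, b):
    """Intersection of two (x, y, width, height) rectangles, or None."""
    x0 = max(a[0], b[0])
    y0 = max(a[1], b[1])
    x1 = min(a[0] + a[2], b[0] + b[2])
    y1 = min(a[1] + a[3], b[1] + b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)

def layer_pixels(destination, clip_bounds=None):
    """Pixels a transparency layer would need in this simplified model."""
    if clip_bounds is None:
        rect = destination
    else:
        rect = intersect(destination, clip_bounds)
    return 0 if rect is None else rect[2] * rect[3]

window = (0, 0, 1920, 1200)
unclipped = layer_pixels(window)                      # 2,304,000 pixels
clipped = layer_pixels(window, (100, 100, 200, 150))  # 30,000 pixels
```

In Quartz terms the clip would be set with the usual clipping calls before `CGContextBeginTransparencyLayer`; the sketch only shows how much smaller the intermediate buffer can be.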

Okay, so now we've sort of looked at all the different objects that will help you go fast both in the 2D and in the 3D environment, or 2D and QuartzGL environment. Now we're going to take a look at some other meta-level type optimizations. Drawing on pixel boundaries, you know, sometimes they can give you good results. You draw a rectangle, you don't want the rectangle to be on a fuzzy edge, you want it to be a solid edge, you know, a nice clean line.

You basically get the context's user space to device space transform, and you take your array of points or rectangles or whatever. You transform them to device space, and then you do whatever you want to do: round them, ceil them, whatever you fancy. And then you inverse transform them back to user space. Those are the coordinates that, if you use them, will actually end up hitting the pixel cracks. So that's how you find these things.
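The transform, round, inverse-transform recipe can be sketched in a few lines of Python. The affine transform is a plain tuple in the same (a, b, c, d, tx, ty) layout Quartz uses; in a real application it would come from the context (for example via `CGContextGetUserSpaceToDeviceSpaceTransform`). The 1.25 scale factor and the helper names are assumptions for illustration.

```python
import math

def apply(t, point):
    """Apply an affine transform (a, b, c, d, tx, ty) to a point (x, y)."""
    a, b, c, d, tx, ty = t
    x, y = point
    return (a * x + c * y + tx, b * x + d * y + ty)

def invert(t):
    """Invert an affine transform; raises if it is not invertible."""
    a, b, c, d, tx, ty = t
    det = a * d - b * c
    if det == 0:
        raise ValueError("transform is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, ic, id_, -(ia * tx + ic * ty), -(ib * tx + id_ * ty))

def snap_to_pixel_grid(user_to_device, point, rounding=round):
    """User space -> device space, round, then back to user space."""
    dx, dy = apply(user_to_device, point)
    snapped = (rounding(dx), rounding(dy))
    return apply(invert(user_to_device), snapped)

# Hypothetical high-DPI context: 1 user-space unit = 1.25 device pixels.
user_to_device = (1.25, 0.0, 0.0, 1.25, 0.0, 0.0)

# 10.3 maps to device x = 12.875, which rounds to pixel boundary 13.
x, y = snap_to_pixel_grid(user_to_device, (10.3, 20.0))

# The same point snapped downward instead (device x = 12).
fx, fy = snap_to_pixel_grid(user_to_device, (10.3, 20.0), rounding=math.floor)
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer; pass `math.floor` or `math.ceil` when you need a fixed direction, for instance to keep a rectangle from shrinking.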

Another thing that makes things slightly more complicated is that people normally think one user space point is equal to one device space point. With high DPI, that's a very different scenario. So as high DPI comes online, you may want to do these things so that you can get your things to snap to the grid without having us do these partial pixel coverage calculations.

So the second thing, which is the last of these, is live resize. Live resize is probably the most taxing thing your application can do when it comes to user interfaces. There are several ways you can try to give the user a better experience during live resize, and one of them is to consider periodic updates. An example of periodic updates would be Mail.

When you resize the Mail window, what happens is they take a snapshot of what's on the screen, and they fill in all the exposed area, and then when the user releases the mouse, they actually do the full layout, right? Text layout: if you have a large document and you're resizing the document, yes, layout is expensive.

You might want to either cache your layout or just hold on to the current layout and wait until the user releases the mouse. An example of that would be TextEdit. With really large documents in TextEdit, instead of laying out the document again at the new page size, it basically takes a snapshot or keeps the same layout. Resize as you wish, and then when you release the mouse, it does the full layout at the end.

Another example would be to draw lower quality results. In some applications, for instance TextEdit and Mail, you don't want your text to look fuzzy. But Preview is an example of using lower quality results during resize. When you live resize a Preview window, what actually happens is that the image itself is being drawn at low quality. So we're just giving the user, oh, this is what it's going to look like, without actually doing the correct color matching and calculation. Well, not color matching, but all of the extra work involved in downsampling it to the exact right size.

When you release the mouse, you see a clean version of the image at that resolution. The last point on here is that even though you do all these optimizations, one killer is always blocking your UI because you're touching the disk or running off and doing something. During a live resize session, you probably don't want to do expensive things like updating preferences, touching the disk, doing network I/O, and stuff like that.

So you defer that until the user releases the mouse and says, yes, this is what I want you to do. So ask yourself: how fast can I give the user the feedback they want from the live resize without actually doing really expensive work? That's what this is really about.
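The live resize advice above (draw cheaply while resizing, defer expensive work until the mouse is released) can be sketched as a small state machine. This is plain Python with hypothetical names, not AppKit code; in a Cocoa view the corresponding hooks would be things like `inLiveResize` and `viewDidEndLiveResize`.

```python
class ResizeSession:
    """Queues expensive work while a live resize is in progress (hypothetical)."""

    def __init__(self):
        self.in_live_resize = False
        self._deferred = []
        self.log = []

    def begin(self):
        self.in_live_resize = True

    def request(self, name, work):
        """Run the work now, or queue it until the resize ends."""
        if self.in_live_resize:
            self._deferred.append((name, work))
        else:
            self.log.append(work())

    def draw(self):
        quality = "low" if self.in_live_resize else "high"
        self.log.append(f"draw:{quality}")

    def end(self):
        self.in_live_resize = False
        pending, self._deferred = self._deferred, []
        for _, work in pending:
            self.log.append(work())
        self.draw()  # one full-quality redraw at the final size

session = ResizeSession()
session.begin()
session.draw()                                   # cheap draw during resize
session.request("prefs", lambda: "save-prefs")   # deferred, not run yet
session.draw()
session.end()                                    # runs deferred work, redraws
```

After this run, `session.log` holds two low-quality draws, then the deferred preferences write, then a single high-quality draw.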

That was sort of the conclusion of the optimizations. And now you're probably thinking, well, geez, how do I turn this thing on? You can use Quartz Debug to turn it on, or you can actually turn it on in your application, right? One of the methods is to put an entry into your Info.plist, which says enable or disable. You want to be explicitly sure that that's what you want to do. You want to either disable it, or you want to enable it. If you don't know what you want to do, just leave it alone. Don't put the entry in.

The other method of getting hardware acceleration is to use the context set allows acceleration call. You can use it on a per-context level, but you have to be cautious about using multiple contexts, because you can have one context trying to draw with acceleration set to false, and another context with acceleration set to true. What actually happens is that when the acceleration-false guy draws, we basically say, oh, somebody wants software access to the bits. We pull the data off the GPU, back to the CPU, and the application continues to draw.

At that point, the window remains in that mode until you do the next flush. When you do the flush, the data gets pushed back onto the GPU, and we use it. So basically, that's why we took out that QuickDraw thing, because the moment you use QuickDraw, you basically decelerate your window, right?

Great. So what I want you guys to do when you go home or when you leave the session is try and look at where the situations are where you're not using the Quartz API efficiently or optimally and see if you can make improvements to try and change it such that that's what happens. You can use Quartz Debug to turn it on or you can turn it on in your application if you've done this stuff.

But basically the bottom line is, if you see an advantage, yes, great, my application flies, I get 30% more, 50% more on live resize. If you see a scenario where things aren't quite working out, go in, dive into the details and say, oh, yeah, I forgot to hold on to my image refs, I'm always uploading, stuff like that.

Go in, optimize your code, retest it, try and find out whether or not you can turn it on. If you're in a lose situation, just disable it and the software guy would be just as good. So, and the last point basically is if you don't see an advantage or a disadvantage, just leave it alone. Don't put an entry in it. You know, we might do something about that.

And also, don't forget to test on various VRAM configurations. Just because you have a machine that's got two gig in it with 512 megabytes of video RAM doesn't mean it's going to run the same on something like a laptop. So be cautious about the considerations you make when you say, yes, I want to enable it. So what I'm going to do now is bring Ken up, and Ken will talk about Quartz compositing and coalesced updates.

Yes, sure. There we go. Okay. So as it says, I'm the guy to blame. This was my idea and partly Andrew's. So if this is causing you problems, you can scream at me afterwards. I'm not going to spend a lot of time on it. Andrew went over it in gory detail last year. But I want to make a couple of points.

Basic idea here is that the window server, for various reasons (performance, power, pick a reason), wants to limit the number of updates that hit the display. If you're on, like, a regular flat panel and it's only 60 hertz, that's how fast the user can see stuff.

So if you've got some app that's trying to pound away and flush 400 times a second, that's bad. The user can't see it. You're wasting resources, wasting GPU power or CPU power, wasting battery life, whatever. So basically the window server will try and coalesce as many updates as happen in a single VBL period down to just one single update. So if you've got video playing in one window and the little Spotlight icon in the upper right-hand corner is flickering away, those basically hit the screen together, at the same time.
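As a toy model of the coalescing just described, the Python sketch below buckets flush timestamps into refresh periods and counts one visible update per period. It is only an illustration of the arithmetic, under the assumption of evenly spaced flushes and a 60 Hz display; it says nothing about how the window server is actually implemented.

```python
def coalesce(flush_times, refresh_hz=60.0):
    """Group flush timestamps (in seconds) by refresh period.

    Returns a dict mapping frame index -> number of flushes in that frame.
    Each frame produces at most one update on the display.
    """
    period = 1.0 / refresh_hz
    frames = {}
    for t in flush_times:
        frame = int(t // period)
        frames[frame] = frames.get(frame, 0) + 1
    return frames

# An app flushing 400 times during one second, evenly spaced.
flushes = [i / 400.0 for i in range(400)]
frames = coalesce(flushes)

updates_shown = len(frames)                      # 60 at 60 Hz
flushes_wasted = len(flushes) - updates_shown    # 340 never seen on their own
```

In this model 340 of the 400 flushes do work that the user never sees separately, which is the waste the speaker is describing.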

[Transcript missing]

Some stuff here. So, these tools over here were put in last year so that you can basically see how your app reacts under different scenarios. So one of the things I'm going to do is actually run Xbench. So you run it in a normal scenario. I'll just start and let it fly through here. This will take just a minute. It looks like it's running pretty fast. You know, no problems. It's flying along. I wish I could do just lines or something. By the way, is the Xbench developer guy here? Drag, okay.

So then we'll have the numbers here. So, come on, one more test. Okay, so not bad. This is a 17-inch MacBook Pro. Okay, numbers. So, but what happens if we turn off Beam Sync? What kind of numbers do we get? So I'll run this one more time. And you can see now that there's tearing going on up there. I should be able to anyway. And you can see that the numbers are already popping out about twice as fast. So you're like, well, wait a minute.

Like, what am I doing wrong? And I know there was a change made to Xbench. It was in one of the release notes that said, well, you know, stick in a timer to basically deal with this coalesced updates thing. One of the things you can do now, in Quartz Debug, is force beam synchronization on and request the good old, you know, see all the screen update flashing stuff. Quit this; there's a bug I'm working around here real quick.

Now, if you set up Quartz Debug in this particular mode, we'll flash the screen blue any time the app basically tried to get a lock on the backing store and couldn't because coalesced updates were in the way. We kind of do this little trick behind the scenes and let you continue anyway.

But then that flush that goes out will flash blue. So that's a good way to tell you, by the way, you're basically hitting the window server limit. You guys should be able to see all the blue that's happening up there, saying, hey, wait a minute, you're trying to flush and coalesced updates are slowing you down.

So this is something that was added for Leopard. You can try it out, give us some feedback on it and see if you like it or not. The other interesting thing I'll point out is there's been some hubbub on the net on why are all the Intel machines running so slow when we do the UI test in Xbench?

Well, it's hitting deferred updates. If you leave it in the normal mode because it got opted in and you run it, it's just kind of do, do, do. Not the--

[Transcript missing]

You can-- so I guess that the summary from Ken's part is really, if you're writing a benchmark that measures drawing throughput, make sure you don't have a wait in there.

And waiting for VBL is, in a way, an implicit wait. Okay, so let me talk about a few remaining topics. Power. So when you think about optimization, power consumption is usually not the first thing that comes to your mind, but I would like to just point it out and make sure you remember. So why should you care?

Well, the majority of Macs sold, actually for the last few years, are notebooks, and most people in this room have a portable computer, and to all those people, battery life matters. Now, the good news is, general performance optimization is good for battery life, too. If you finish quicker, well, you use less power.

So I would, however, also like to point out that QuartzGL actually has an advantage in terms of power consumption. The numbers I have here are for scrolling. In Safari, I used the Slashdot website on a MacBook, and I used a power meter: I essentially took the battery out and used the power meter at the cord to measure how much power it pulls. And the MacBook, when it's idle, pulls 16 watts.

And then, during the scrolling test, without QuartzGL, it runs at 22 watts, and with QuartzGL, it's 20 watts. Now, 2 watts there, that's about 10%, and you might think, "Well, 10% isn't that impressive." Well, 10% on that MacBook is 25 minutes of battery life. And so, this stuff matters.
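The arithmetic behind those figures can be checked with a few lines of Python. The wattages and the 25 minutes are the speaker's numbers; the assumption that runtime scales inversely with power draw for a fixed battery capacity is ours, and the implied baseline runtime is derived from it rather than stated in the talk.

```python
idle_watts = 16.0
software_watts = 22.0   # scrolling without QuartzGL
quartzgl_watts = 20.0   # scrolling with QuartzGL

saved_watts = software_watts - quartzgl_watts            # 2 W
saving_vs_total = saved_watts / software_watts            # about 9% of total draw

# Looking only at the power spent above idle, the saving is larger.
saving_vs_active = saved_watts / (software_watts - idle_watts)   # one third

# For a fixed battery capacity, runtime scales as 1 / power.
runtime_ratio = software_watts / quartzgl_watts            # 1.1, i.e. 10% longer

# The quoted 25 extra minutes then implies this baseline runtime.
extra_minutes = 25.0
implied_baseline_minutes = extra_minutes / (runtime_ratio - 1.0)  # about 250
```

Under that assumption, 25 extra minutes corresponds to a baseline of roughly four hours and ten minutes at the higher power draw.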

So if you have a long-running application, you might want to consider opting into QuartzGL just for that reason. By long-running application, I mean something that draws a lot, and typically something that's in the background. One example is a little clock that you put on your desktop.

So the point is, if you're building a clock, you probably don't run into responsiveness issues. You know, however complex your clock is, you are probably able to redraw the entire clock in the second you have before you need to update the second hand. However, whether that clock needs 5% of the CPU or 1% of the CPU will make a difference for laptop users. So please keep that in mind.

The next topic I would like to mention is color matching. So a lot of images today get drawn to the screen not tagged with a color space. And what Core Graphics is doing, it just says, well, if it's not tagged, it's probably the display. And you kind of get away with that because most of the displays that Apple sells are very consistent. They all have very close color characteristics. But that is kind of changing.

So people today are attaching a Mac mini to a TV, or you have third parties that sell you wide gamut displays. And if you draw untagged images to these displays, well, funny things happen. Like, on wide gamut displays, all your colors are neon. So make sure that when you draw images, they're actually tagged with some reasonable color space. And as Andrew said, if you can't really figure anything out, generic RGB is a good choice because at least it's consistent and it's close to what your Mac display is.

And there's actually a new mode in Quartz Debug that you can turn on. It will essentially mark in red every image that gets drawn that is not tagged. So you can easily find out where in your application you're doing that. And yeah, if you're doing that, you will find out that quite a bit of the Aqua UI is not tagged, so that's going to change by ship time.

Okay, other changes. So the 256 colors and thousands of colors modes for desktop use are going away. So in Leopard, millions of colors only. And for most of you, that probably doesn't make any difference because you don't have dependencies there. In fact, it might reduce your testing burden because you no longer have to test your app in thousands and 256 colors modes. Those modes are still available if you're actually using the display APIs to change mode in full screen mode.

If you so desire and your game would like to go full screen and run in 256 colors, you can still do that, even though I don't really understand why you want to do that. But for the desktop, these modes are disabled. And last, and probably least, QuickDraw.

So the message here: the API has been deprecated for three years. There is no 64-bit version of QuickDraw. So if you want to compile your application for 64-bit, you will have to get off QuickDraw. It's also not compatible with QuartzGL. So as soon as a window that is QuartzGL accelerated gets drawn to with QuickDraw, we have to pull that off the graphics card because QuickDraw cannot send its commands to the graphics card. So this includes QuickTime clients. QuickTime has a lot of old APIs which use GWorlds. So you should get off those and use QTKit instead. You can visit session 220, High Performance QuickTime Video Processing, where they tell you a lot about how to use QTKit.

Okay, with that I would like to point out where to go next. So there's the Graphics and Imaging Evening Lab, which runs from 6 to 10 this evening. And tomorrow there's the Core Image Lab if you're interested in how Core Image fits into Core Graphics. And with that, this is the contact information for our 2D and 3D graphics evangelist, Allan Schaffer, who will be...