
WWDC07 • Session 429

Tune Your 2D Graphics Code

Graphics and Imaging • 1:13:18

Have you taken the time to tune your 2D drawing code? It can be a great way to increase the efficiency of your application. Learn the techniques that squeeze the most performance out of your drawing code. See how to use Quartz debugging tools to locate and remove graphics bottlenecks and optimize screen updates. Appropriate for all Carbon and Cocoa developers.

Speaker: Andrew Barnes

Unlisted on Apple Developer site

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Welcome to the Tune Your 2D Graphics session. Apparently it turns out this is the haven for people who weren't lured away by Core Image and Core Animation. So you all are here, thanks very much. Hi, my name is Andrew Barnes, and yes, I'm the pixel guy, the guy who makes pixels.

So in this session, the thing that you'll learn is how to make more efficient use of Mac OS X's graphics pipeline. What we're going to go through is sort of divided into sections: one section has to do with drawing optimizations, what you can do to draw faster; and the other one is what you can do to get your stuff to the screen faster.

So let's take a look at the graphics architecture slide. We've all seen this before, for those of you who are returning. Basically what happens is that when you draw with Quartz, your application draws into the window backing store; if you draw with video, you draw into surfaces, and likewise with 3D. Those guys are both surfaces that reside in VRAM. And once you've flushed your content and you want to say, great, I want to present it to the screen, the Quartz Compositor takes it all together, builds it up, and puts it onto the framebuffer, and that's the thing that you see on screen.

So, what we won't be talking about obviously is 3D and video, what we will be talking about is you know, Quartz, the uses of Quartz. So let's jump right in. First we want to start off with some basic optimization fundamentals, right. The first one, it doesn't matter what graphics system you're on, changes in state are expensive. So the first thing you want to try and do is try and minimize your state changes.

Second thing you want to do is, obviously, don't draw what can't be seen, or what won't be seen, or what's occluded. Don't overdraw your stuff. An example in 3D is like trying to tessellate the far side of the moon. It's like, nobody's going to see it, so why bother tessellating it? Another important point is about memory consumption.

People like to create caches; they like to allocate memory. You know, hopefully garbage collection might get rid of your memory for you, but basically, when you create a page, it potentially becomes a page that needs to be paged out to disk. So if you can control how much memory you consume, it's always wise to try and do so.

Another point is sort of trying to understand what kind of data you're dealing with, right. Are you an application that draws images? Are you an application that draws line art? What size of data are you dealing with, what amount of data are you dealing with? All these things are very important in trying to come up with the solution to your particular problem.

And the last point, obviously, is never be afraid to reexamine any of the architectural decisions you may have made in the past, right. What's true today may not be true tomorrow, right. So let's jump a little bit further in. Objects. We have a whole plethora of objects for you to use, right. And reusing those objects is critical for best performance.

You don't really have to do very much; once you use your objects, you just hold onto them. If you don't need an object, well, yes, you can release it. But if you do need your object and you're going to reuse it again, hold onto it, reuse it and reuse it.

It basically improves the performance of your application without you doing very much, right. It avoids a whole bunch of costly computations, whether it's path processing, color matching, or image interpolation. You know, just reusing that ImageRef is very important. Another important point about it is for PDF and PostScript representations, right.

There's been a bug that I've seen people have, where they would actually draw the same image repeatedly inside a PDF document, and they constantly created new images. And what actually ended up happening was that we created several copies of the image inside the PDF file. If they had only used the same ImageRef all the time, what would have happened is that they'd only have one representation of that image in the file.
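
To make the reuse point concrete, here is a minimal sketch (the helper name and the PNG data provider are invented for illustration; the single-copy-in-the-PDF behavior is what the session describes):

```c
#include <ApplicationServices/ApplicationServices.h>

// Hypothetical helper: stamps the same logo on every page of a PDF context.
// The whole point is creating `image` once; Quartz can then recognize the
// same CGImageRef and emit a single copy of it into the PDF file.
void StampLogoOnPages(CGContextRef pdfContext, CGDataProviderRef logoPNG,
                      size_t pageCount)
{
    CGImageRef image = CGImageCreateWithPNGDataProvider(
        logoPNG, NULL, false, kCGRenderingIntentDefault);

    for (size_t i = 0; i < pageCount; i++) {
        CGPDFContextBeginPage(pdfContext, NULL);
        // Reuse the same ref; do NOT re-create the image per page.
        CGContextDrawImage(pdfContext, CGRectMake(20, 20, 128, 128), image);
        CGPDFContextEndPage(pdfContext);
    }
    CGImageRelease(image);
}
```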

And of course for transportation to the GPU, which is sometimes costly, if you reuse these objects, we know we don't need to re-upload this stuff. So definitely if you have objects, you want to hold onto them, by all means do so. Finally, of course you know, the previous slide told you about memory consumption. If you don't need the object, definitely throw it away. So Quartz objects are sort of separated so that they live in two worlds.

Well, not live in; there are two worlds. One world is about coverage: it's the stencil sheet, the mesh or the cookie cutter, right. And the other one is the actual paint. The colorized objects are basically the paint that you pour through the stencil in order to affect your destination. So what we're going to talk about right now is coverage objects.

So the first guy on the list is paths. Hopefully you're digging the fruit theme today. Paths are basically general containers for any kind of geometric data: curves, lines, rectangles, and so on. And the important point about paths is that yes, they can be really complicated, but they're really easy to use, and you can capture lots of complicated stuff.

You can have a path that has multiple sub-paths, right. And what happens is that at the time you actually draw your path, or your array of sub-paths, we go in and compute exact geometric coverage. So basically what happens is that we never overdraw, even though you have multiple objects. We compute the geometric coverage for exactly each pixel, so that when we pour the paint through the stencil, it's only going to touch each pixel once; so that's important.

PathRefs. Whenever you have objects that you're reusing over and over again, like let's say you have a little circle or a little something that you want to reuse over and over again, it's much better to just construct a PathRef and then reuse that object, instead of re-explaining the whole arc and the line and the curve, right. So if you are reusing paths over and over again, regardless of whether you're scaling them or not, or transforming them, you should just make a path object and reuse the path object.
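
A sketch of that pattern, with hypothetical names (build once, reuse per draw):

```c
#include <ApplicationServices/ApplicationServices.h>

// `ctx` and the marker positions are assumed to come from your draw code.
void DrawMarkers(CGContextRef ctx, const CGPoint *positions, size_t count)
{
    // Build the path once and keep it around; re-describing the ellipse
    // on every draw would force Quartz to re-process the geometry.
    static CGPathRef circle = NULL;
    if (circle == NULL) {
        CGMutablePathRef p = CGPathCreateMutable();
        CGPathAddEllipseInRect(p, NULL, CGRectMake(-5, -5, 10, 10));
        circle = p;
    }

    for (size_t i = 0; i < count; i++) {
        CGContextSaveGState(ctx);
        CGContextTranslateCTM(ctx, positions[i].x, positions[i].y);
        CGContextAddPath(ctx, circle);   // reuse the same CGPathRef
        CGContextFillPath(ctx);
        CGContextRestoreGState(ctx);
    }
}
```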

So yes, do that; it's always better than explaining, or doing path reconstruction, all the time. The last point really is connected to the first point. I told you that you can create a path that can be arbitrarily complex, with multiple path segments, but computing the actual geometric coverage for each of the pixels that comprise the path, or the area that the path is going to touch, does take a lot of time, especially when the paths are self-intersecting.

So it helps if you were to somehow subdivide your paths, right. If you have a big street map you could say, okay, well, let's draw all the thin streets, and then I'll draw all the fat streets, and then all the different kinds, you know. So you can break your stuff up to decrease the actual computational cost of constructing and creating the physical mask for the path.

So definitely you can try and subdivide things. And of course it's also important to try and know what kind of data and where the data is, so that when you start cutting these paths up, you can sort of discard whole chunks. Which kind of leads me into something else.

We've talked about CAD and different things. We have new APIs that became available in Tiger: stroke-line-segments and draw-rectangles. These are the fastest ways to draw rectangles and lines. So even though paths are great in general, lines and rectangles are really common, and people use them all the time. And these APIs are the fastest way to do things.

But be aware that the actual exact pixel coverage is not completely guaranteed. It gives you a good estimate, but it's not as accurate as the path. The coverage errors are probably, you know, within one or two pixels, something like that. But it definitely saves us from having to do a lot of costly computations with respect to self-intersecting lines and stuff like that, which you get a lot of when you have line segments and rectangles. So if you're just a simple CAD application, or you want to just draw a whole bunch of rectangles, definitely use these APIs and you'll go a lot faster.
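
A quick sketch of the batched calls (assuming the Tiger-era CGContextStrokeLineSegments and CGContextFillRects; the geometry is made up):

```c
#include <ApplicationServices/ApplicationServices.h>

void DrawGrid(CGContextRef ctx)
{
    // Points are consumed in pairs: (start, end) for each segment.
    CGPoint segments[4] = {
        CGPointMake(0, 0),  CGPointMake(100, 0),    // first line
        CGPointMake(0, 20), CGPointMake(100, 20),   // second line
    };
    CGContextStrokeLineSegments(ctx, segments, 4);

    // Batch the rectangles the same way instead of filling one by one.
    CGRect rects[2] = {
        CGRectMake(10, 40, 30, 30),
        CGRectMake(60, 40, 30, 30),
    };
    CGContextFillRects(ctx, rects, 2);
}
```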

For more complicated situations, which is more or less the CAD situation, or a GIS system, if you haven't already done this before, you might want to take a look at what kind of data you're dealing with, right. Especially for GIS systems or CAD systems, you really want to say, well, geez, how can I organize my data structures in such a way that I can get efficient rendering of what I need to draw, and not basically shove, you know, five million lines at us and say, draw it please, when it's only a postage stamp. So you can definitely do a whole bunch of reorganizing of how you actually store your data, so that you can get fast access or random access as people are panning and zooming around.

The other thing too is about what you're actually seeing. You know, I made a comment about tessellating the far side of the moon; similarly with, for instance, the waveform diagram that you see. There are a lot of segments in that waveform diagram, and not all of them need to be drawn, right.

If the line width is one, you can certainly prune a whole bunch of data instead of shoving all 10,000 lines at us. You can cut it down to a hundred and it'll be just as good. So definitely knowing what kind of data you're dealing with is very important.

So the next thing is, I told you how to draw paths, different types of coverage. The other type of coverage is clipping. So I told you how to draw less; now we're telling you how to draw even less, right. The first part of this sort of problem has to do with culling, right. For instance, the highlighted area is just a bunch of text blocks, and you really don't need to draw the entire page, right. There's a whole bunch of text, a whole bunch of lines (inaudible).

If you know that they are outside of the area that you want to draw, definitely cull those objects and don't send them to (inaudible). You could send them to (inaudible), but we'll just simply go in and do a whole bunch of work to discard the objects. So definitely, if you know ahead of time, you can just cull out entire blocks of stuff; it's like, this is not visible, just get rid of it.

And similarly, for things that are actually smaller, do some trivial clipping yourself. You might have a sub-region, and you have a huge text run, which is like 500 glyphs, right, and you know all 500 glyphs are not going to, you know, hit the area that you're going to touch; you can do trivial clipping and just clip out all of those glyphs and not give us that large run, right. Because we're just going to turn around and start processing the data and throwing away those runs anyway. So if you are able to efficiently ascertain whether or not you need to draw the run, then yes, you can start chopping your runs up.

And of course, minimize what you clip, right. Sometimes people do something kind of silly, like set a clip path around an image. The image already intrinsically has a clip: whatever's outside of the image bounds isn't drawn. So don't set up a clip and then draw the image; that's just a waste of a clip. Just draw the image. So if objects intrinsically have clips, don't try and re-explain what that clip is if the object is already going to tell you what the clip is. So definitely do that.

And along the state changes line, you might want to sort of think about amortizing your clip. We spoke about how expensive it is to do state changes, and yes clips are cached on the card, which is slightly unlike paths. But at the same time, reusing those clips will definitely help you.

So if you have stuff that is going to share a particular clip, it's much better to say: I'm going to use clip A, draw a whole bunch of objects, and then move on to clip B; as opposed to saying I'm going to draw with clip A, then with clip B, then move back to A, and then go back to B, and you know, that's just going to cause us to do a whole bunch of work. So definitely you want to amortize your clip, if your content allows, of course.

So great, that's clipping. That's all nice and great, but what if I definitely have a clipping issue in terms of my performance, and I want to pre-build this clip at a particular resolution, such that you don't have to go and construct the clip every time I tell you about it? You might want to just create the clip off screen, and then draw with it. So how do we do that? Basically, create an alpha-only bitmap context; that context is basically what I'm going to call our coverage.

You can draw any kind of object in there and then record the coverage, even the alpha from your images, or shadings, or whatever. So as an example, the little thing on the bottom basically uses a shading with a circular clip. The second thing you want to do, after you've constructed your clip: you have a choice. You can either set it to be the current clip in your context, or if you're drawing an image, you can just simply create a masked image from the clip, or you could just draw the clip yourself.

So in this particular example, I just, you know, created the clip with a shading, with a circular clip, added the texture on top of it, and then threw on a little highlight, and you can get something like that. So that's an example. And you can reuse that clip object every single time.
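
As a rough sketch of the off-screen mask idea (names invented; this assumes an 8-bit alpha-only bitmap context is suitable for your content):

```c
#include <ApplicationServices/ApplicationServices.h>

// Build a reusable coverage mask once, off screen.
CGImageRef CreateCircleMask(size_t w, size_t h)
{
    // Alpha-only context: one byte per pixel, no colorspace.
    CGContextRef maskCtx = CGBitmapContextCreate(
        NULL, w, h, 8, w, NULL, kCGImageAlphaOnly);

    // Record coverage: anything drawn here contributes alpha.
    CGContextFillEllipseInRect(maskCtx, CGRectMake(0, 0, (CGFloat)w, (CGFloat)h));

    CGImageRef mask = CGBitmapContextCreateImage(maskCtx);
    CGContextRelease(maskCtx);
    return mask;
}

// Later, per draw: clip to the cached mask and pour paint through it.
// CGContextClipToMask(ctx, destRect, mask);
```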

Just say ClipToMask, and away you go. Okay. So now we know how to create the stencil that we want to pour our paint into. Now we want to talk a little bit about how to create the paint that you want to pour into there.

So we're going to talk a little bit about colorized objects. First up, a thing that you guys use all the time: ColorRefs. ColorRefs are definitely much faster than saying set-RGB-color or set-CMYK-color every time. If you have a color, you can just create the color object and reuse the object. Colors represent either single colors, like a spot color such as red, green, or blue, as well as patterns.

They're sort of an infinite sheet of just color, and that infinite sheet can be a constant color, or the infinite sheet could be a replicated pattern. For the spot-color case, we have new API in Leopard which allows you to create a generic RGB color, or a gray color, or something like that. So you can definitely start using those APIs, instead of having to go in and find the colorspace and then go and create the object yourself.
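
A minimal sketch of the Leopard-era convenience constructor (assuming CGColorCreateGenericRGB; the static caching here is just for illustration):

```c
#include <ApplicationServices/ApplicationServices.h>

void FillWithReusedColor(CGContextRef ctx, CGRect rect)
{
    // Create the color once, hold onto it, and reuse it across draws.
    static CGColorRef red = NULL;
    if (red == NULL)
        red = CGColorCreateGenericRGB(1.0, 0.0, 0.0, 1.0);

    CGContextSetFillColorWithColor(ctx, red);
    CGContextFillRect(ctx, rect);
}
```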

Again, new APIs. Patterns, as I said before, are sort of wrapped up as the interior components of colors. You create your pattern, which represents your arbitrary drawing. It could be an image, it could be a PDF document, it could be anything. And then you stick it inside the color and say: this is the thing I want to fill my infinite sheet with, all right.

It's obviously much faster than just going through and setting up the clip, and drawing, drawing, drawing, right. So you basically set this pattern up, give us a representation of the pattern, and you just set that to be your current color, and you fill with your current color.
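
A rough sketch of the callback-based pattern setup (the dot cell is invented for illustration):

```c
#include <ApplicationServices/ApplicationServices.h>

// Pattern cell callback: draws one tile of the infinite sheet.
static void DrawDotCell(void *info, CGContextRef ctx)
{
    CGContextSetRGBFillColor(ctx, 0.8, 0.1, 0.1, 1.0);
    CGContextFillEllipseInRect(ctx, CGRectMake(2, 2, 12, 12));
}

void FillWithDots(CGContextRef ctx, CGRect rect)
{
    CGPatternCallbacks callbacks = { 0, DrawDotCell, NULL };

    // 16x16 cell, replicated with constant spacing.
    CGPatternRef pattern = CGPatternCreate(
        NULL, CGRectMake(0, 0, 16, 16), CGAffineTransformIdentity,
        16, 16, kCGPatternTilingConstantSpacing, true, &callbacks);

    CGColorSpaceRef patternSpace = CGColorSpaceCreatePattern(NULL);
    CGContextSetFillColorSpace(ctx, patternSpace);

    CGFloat alpha = 1.0;
    CGContextSetFillPattern(ctx, pattern, &alpha);
    CGContextFillRect(ctx, rect);

    CGColorSpaceRelease(patternSpace);
    CGPatternRelease(pattern);
}
```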

One thing about patterns that you need to be cautious about is how the pattern replicates. If you want consistent spacing, or pixel-aligned spacing, either you try and do it yourself, which is not always possible, or you use that constant, kCGPatternTilingConstantSpacing. So great, we spoke about patterns and colors.

So yes, they're among the objects that you can hold onto, and if you hold onto them, we'll do nicer things. In the case of colors, color matching sometimes can be expensive; so if you reuse your ColorRef and we see the ColorRef again, we say hey, I can look it up in my cache, bam, got it. Similarly with patterns.

Patterns are a little bit more complicated, in the sense that just because you've described a pattern, which was that shade that you saw, there's still a lot of work that has to go on, especially if the actual pattern contained a PDF document. We have to go in and rescan, convert, and re-color-match, you know, to calculate everything that we need in order to get this key cell, which then gets replicated. So holding onto the PatternRef, and reusing that PatternRef in the form of a color, definitely gives us the ability to go and fetch that representation from the cache, and then reuse it to do the replication.

And as always, about the state changes: minimize them. There are lots of cases where you can say, great, I'm going to be drawing my black text; you draw your black text, then you draw your red highlights. You know, if you can reorganize your data around trying to minimize your state changes, that would also be beneficial.

Okay, so shadings. Shadings and gradients are obviously transitions between two colors, or smooth colors. I shouldn't say two colors; they're arbitrary transitions between colors. They're definitely much better than trying to create the shading yourself. A lot of documents that were PostScript-generated and brought back to PDF have this kind of misbehavior where, because there was no shading in PostScript back then, they would basically go in and draw a whole bunch of lines in order to make the gradient. That's super painful, right. And PDF does support shadings, we support shadings, and shadings are, you know, the common way to do transitions between colors. So it's definitely much faster, you should always use them, and your performance will be increased.

Shadings and gradients are also objects that we cache. So if you had a shading that you were using, and you wanted to reuse it, that would be much more beneficial; because if we constructed something at a particular resolution and did sampling on your function, then when you reuse that object and give it to us again, we'll say hey, great, we have this in the cache, we can reuse it.

But more specifically to do with shadings: a gradient is just basically a transition between two colors, right, a linear transition. Shadings are a little bit more general; shadings have the ability to specify shapes, to specify things more general than just a gradient. So you create a function, which basically represents your sampling; it could be non-linear, it doesn't really matter. The shading is evaluated by callback, and you can do whatever you want inside that callback.

So it's important to minimize what you do in there; you know, don't go off and touch the disk to compute a sample point. Try and make it as optimal as possible, because the function is re-evaluated as you draw different shadings that use it. So in that particular example I had one function, and I basically reused it across a bunch of shadings. So definitely you want to reuse your functions, and you want to implement them efficiently.
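
A compact sketch of the callback-driven shading (the blue-to-red ramp is invented; note the function is created once and could back several shadings):

```c
#include <ApplicationServices/ApplicationServices.h>

// Evaluate callback: keep this cheap, it runs for every sample.
static void BlueToRed(void *info, const CGFloat *in, CGFloat *out)
{
    CGFloat t = in[0];   // 0..1 along the axis
    out[0] = t;          // R
    out[1] = 0;          // G
    out[2] = 1 - t;      // B
    out[3] = 1;          // A
}

void DrawRamp(CGContextRef ctx)
{
    const CGFloat domain[2] = { 0, 1 };
    const CGFloat range[8]  = { 0, 1, 0, 1, 0, 1, 0, 1 };
    CGFunctionCallbacks cb  = { 0, BlueToRed, NULL };

    // One function can back many shadings; hold onto both and reuse them.
    CGFunctionRef fn = CGFunctionCreate(NULL, 1, domain, 4, range, &cb);
    CGColorSpaceRef cs = CGColorSpaceCreateWithName(kCGColorSpaceGenericRGB);
    CGShadingRef shading = CGShadingCreateAxial(
        cs, CGPointMake(0, 0), CGPointMake(200, 0), fn, false, false);

    CGContextDrawShading(ctx, shading);

    CGShadingRelease(shading);
    CGColorSpaceRelease(cs);
    CGFunctionRelease(fn);
}
```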

So we spoke about colors and coverage. But you know, we really didn't actually talk about colorspaces. Like, what is my color? What space is my color in? Is it red? Is it green? What is it? You should try and be very cautious about the colorspaces that you use, right.

If you have gray or black, you should typically just say gray or black. Don't say, okay great, I'm going to set all my RGB components to the same value, 'cause that actually isn't, you know, the exact same color. And this is also very true for gradients, right. An RGB gradient is a very, very different thing from a simple gray gradient, even if all the RGB values are equal. So you want to use the appropriate colorspace.

The second point is, you always want to use calibrated colorspaces. Some people, you know, from way back when, got accustomed to using device RGB; that's not always a very good thing. In fact it's a very bad thing, because you actually don't know what the output representation is going to be.

You could be a Mac mini plugged into a TV; you could have gotten one of those brand new Sony TVs with a huge gamma. And if you don't actually say what, you know, the color really is with respect to some colorspace, we don't really know exactly what to do, and we produce, you know, electric cyan on displays where that's really not the color you wanted; you wanted a red that was a red in this colorspace. So you should choose the appropriate colorspace that looks good for you, and say: this is what I want to present to my users.

So definitely, you always want to use calibrated colorspaces. Quartz Debug has a new feature that will allow you to colorize your usage of uncalibrated colorspaces. So if you have line art or images, it'll just produce a different color, and you can say, hey, yeah, I filled this line with device RGB. So that is definitely available.

And we've also added new APIs for Leopard which allow you, instead of trying to find some colorspace and go get some profile from the disk, to get them by name: you can say give me generic RGB, or give me linear sRGB, you know, as constants. So you can get colorspaces that way, and then construct your colors or your shadings, or your images, or whatever.
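
A sketch of the by-name lookup (assuming the kCGColorSpace... constant names from the Leopard-era headers):

```c
#include <ApplicationServices/ApplicationServices.h>

// Named, calibrated colorspaces: no profile hunting on disk.
CGColorSpaceRef MakeCalibratedSpace(void)
{
    // kCGColorSpaceGenericRGBLinear is the linear-RGB constant;
    // kCGColorSpaceSRGB is also available for standard sRGB content.
    return CGColorSpaceCreateWithName(kCGColorSpaceGenericRGBLinear);
}
```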

Second point about colors and trying to fill area: transparency. Transparency's really cool, it has great effects. But it's expensive; it basically cuts your fill rate in half. 'Cause every time we need to draw something that is partially transparent, it requires us to go in and blend the destination and the source in order to produce the final pixel.

On your right here's an example of the San Francisco bridge, and a transparent image is put on top of it. Because you basically said there's transparency in, for instance, an image or something, we don't know exactly which pixels are opaque and which are not. So we basically have to run through and do a per-pixel operation, which requires us to fetch the destination.

In the other case, where it's actually just opaque, and you set it to be opaque, we never touch the destination; we just simply slam the data down onto it. Now, if that image actually contained alpha where all the alpha was FF, you would be in a bad situation too, because we'd still be doing per-pixel operations in order to find out what has to happen.

So definitely stay away from, if you can I should say, even though alpha's sexy, you should stay away from needlessly putting alpha into your objects when it doesn't need to be there. Another thing that people sometimes do is fade objects. We have a constant, sorry, a function that's been available on the context (inaudible), which basically allows you to set a global alpha, right.

So if you want your entire PDF document faded out by 50%, or an image faded out by 50%, or whatever, you can just set that and say, okay, draw my image; instead of, you know, trying to create an image that has alpha in it at 50%, or trying to draw your PDF document off screen at 50%, or anything else. Just make that very simple call and draw, and you'd be fine.
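
That call is presumably CGContextSetAlpha; a minimal sketch:

```c
#include <ApplicationServices/ApplicationServices.h>

void DrawFaded(CGContextRef ctx, CGImageRef image, CGRect rect)
{
    CGContextSaveGState(ctx);
    CGContextSetAlpha(ctx, 0.5);          // global 50% fade
    CGContextDrawImage(ctx, rect, image); // no 50%-alpha copy of the image needed
    CGContextRestoreGState(ctx);
}
```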

We've also added new blend modes, the Porter-Duff blend modes, to the CG context. Now, you have to be cautious about using these blend modes, because they're not easily representable on print material. So what ends up happening is that you end up conceptually going down the transparency printing path, which may not have been exactly what you wanted to do. So you have to use them cautiously. So: the Porter-Duff compositing modes are new.

But previously we had the PDF blend modes; those can be represented and printed. So if you want to switch over because Porter-Duff is a lot nicer, and there are lots of features that Porter-Duff supports that aren't in the usual PDF set, you can use the Porter-Duff modes fine. But just be cautious about what happens when you're actually going to the printer.

Okay, so let's talk about the workhorse: images. Images are images, they're all over the place. And they are basically containers full of color and alpha. Arbitrary colorspace, arbitrary number of components, arbitrary depth, optional decode; there's a whole bunch of parameters that you can pass at image creation. Images are probably the biggest thing we cache, because sometimes they're expensive, and sometimes the operations that you try to do with them are also very expensive.

So most of the stuff that sits in our cache is basically images. One general note about images is that you can actually control the interpolation quality when you actually start to scale your images. New for Leopard, the low interpolation-quality mode actually now does something a little bit more enhanced: it does an area sample.

High does Lanczos, which is a two-lobe thing, or three-lobe thing, and gives you much better quality; low, slightly less. None is obviously just point sampling, and default is whatever the context chooses, which is going to be not none, but something, though not necessarily low either. So definitely you want to use them to your advantage, and we'll talk about that on subsequent slides.

Okay. So let's talk a little bit about things that you should know about images. This is when you're actually constructing images yourself, right. So people ask: okay, we construct an image and it has a parameter called bytes-per-row, what is that? Well, it's the actual skip to the next scanline. What should that number be? Well, the number should at least be congruent to the size of your pixel, right. If you have a 24-bit pixel, you probably want to choose a multiple of that.

Next up from that, cache-line or vector alignment: those are all good, those make us go faster. So padding rows out, yes, it increases the size of your image, but for large images it may not be that much of a factor, and it definitely improves the image handling and the image pipeline when you do that.
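
A sketch of the row-padding arithmetic, using 16-byte alignment as one example of a vector-friendly choice:

```c
#include <stddef.h>

// Round bytes-per-row up to a 16-byte boundary for vector-friendly access.
size_t AlignedBytesPerRow(size_t width, size_t bytesPerPixel)
{
    size_t raw = width * bytesPerPixel;
    return (raw + 15) & ~(size_t)15;   // e.g. width 101 at 4 Bpp: 404 -> 416
}
```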

Component sizes: bytes, shorts, ints, and floats, they're all good. For Leopard we've actually widened the pipeline, and basically allowed the image to dictate how deep the pipeline should be. A lot of people got surprised, before Leopard, when they would actually create floating-point images and have them unfortunately bottleneck through, you know, an 8-bit-per-component pipeline. So now for Leopard, the pipelines are completely deep: you draw a floating-point image, you're going to get floating point on the other side.

And that's especially useful and nice when you actually have a floating-point destination, which is not actually new for Leopard; we had that before. We've also deepened the actual destinations, in the sense that now you can actually have a short bitmap context. So when you draw through the image pipeline, if you have a short-component-size destination, it'll actually record your short data. So definitely, those are all good. If you have formats that are like 11-bit or 17-bit, those kinds of numbers, you may not be going down the fastest path possible. So if you can, align things up to some reasonable size.

Transparency. Can you tell that's my favorite topic? Transparency is great, but more specifically, let's talk about pre-multiplication when it comes to specifying images. Pre-multiplication allows us to avoid a whole bunch of expensive calculations when actually blending to the destination. But sometimes it gets in the way. When we actually have to color match data, we have to un-premultiply the data in order to do the color matching.

When we have to do interpolation, well, we actually have to have it pre-multiplied in order to do it. So the basic point of this slide is that if you need to manipulate the pixels themselves, or you're generating the pixels themselves, and it's more efficient for you to work with pre-multiplied data, then by all means do so. If the data came to you un-premultiplied, don't try and pre-multiply it for us. We have fast vectorized routines that'll do that, and we know under what situations we can, should, or shouldn't do pre-multiplication.

So whatever the pre-multiplied state of your input data is, just give it to us. Don't try and do anything, unless you need to modify the pixels yourself. Endianness. This showed up late in Tiger, basically because we started to switch over to non-big-endian machines, and consequently endianness mattered. So these constants in this API showed up for exactly this purpose: how to handle this.

So typically, if you're actually generating an image, you might want to manipulate pixels. So if you have an RGB 32-bit pixel, you might want to say, okay, take red and shift it up by 24 bits, take green, and manipulate that. If you write that into a stream, what actually ends up in memory is that the data is flipped around.

So lots of people do this, and we had to provide some facility for you to say: look, I'm going to lay down a bunch of shorts, and they're going to be shorts because I dealt with the data, and it's on a little-endian machine and I want to lay down shorts. Because effectively, image data inside of images provided by data providers is more or less a big-endian stream: most significant byte followed by least significant byte.

And (inaudible) on your Intel machine you try and do that, and you write that out by just writing out the ints, it's actually swapped. So definitely you should use those. You can set the byte-order constants to specify whether you're 16-bit or 32-bit swapped, basically. You set that in your image info when you create the image, and you can pass your data unmodified directly to the image.
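
A sketch of what that looks like for 32-bit pixels packed as native ints on a little-endian machine (the pixel layout is an assumption for illustration):

```c
#include <ApplicationServices/ApplicationServices.h>

// Pixels were packed as 0xAARRGGBB ints on a little-endian machine, so the
// byte stream is B,G,R,A. Tell CG that instead of swapping the data yourself.
CGImageRef MakeImageFromNativePixels(CGDataProviderRef pixels,
                                     size_t w, size_t h)
{
    CGColorSpaceRef cs = CGColorSpaceCreateWithName(kCGColorSpaceGenericRGB);
    CGBitmapInfo info = kCGImageAlphaPremultipliedFirst
                      | kCGBitmapByteOrder32Little;

    CGImageRef image = CGImageCreate(w, h, 8, 32, w * 4, cs, info,
                                     pixels, NULL, false,
                                     kCGRenderingIntentDefault);
    CGColorSpaceRelease(cs);
    return image;
}
```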

That being said, if you're actually looking at image data, if you're on the receiving end of the image, you have to be prepared to handle these cases, right. So basically you want to do whatever you have to do: you read the constant, figure it out; it's like, oh, it's swapped data.

And of course, you should only use the endianness constants when they make sense. You can get situations like this, where somebody has an RGB image, it's obviously 24-bit, but they specified a 32-bit endian-swap thingie; that's just kind of dumb. So definitely don't do that. Be nice, play fair, and make other people further down the chain from you happier.

Okay. So that's enough about creating your own image. What happens if you actually want to get an image from someplace? It's on disk, or you need to write it out to disk. ImageIO was created for Tiger, and it's very great: we've collected basically all the JPEG libraries, and all the TIFF libraries, and all the PNG libraries from all over the place, and compressed them into one.

So now we have just basically one source, a bunch of very talented guys that go and work on the stuff, and they basically AltiVec it, SSE it, you know. So don't drag your own image decompression or compression libraries with you, because we have them. They're on the platform, they're the fastest that they can be on the platform, so definitely you should just basically use ImageIO.

In addition to that, there are lots of nice little bells and whistles on ImageIO. It supports a rich set of formats: RAW, OpenEXR, all the web standards. You know, so any kind of image that you can see, you can probably just give it to ImageIO, and it'll suck it up and say, here's your image.

So definitely you want to take a look at that. It also has features for incremental loading. For instance, Safari uses it: as it's actually getting the stream data from the websites, it's incrementally loading it into the image source, and the final thing is they get the image and draw it to the screen. It also provides image manipulation, like rotations, shears, stuff like that; it does fairly primitive stuff. There are also some additional features on image sources where you can ask for thumbnails, right.

You might want a reduced size. As well, because you might be drawing an image over and over again and it's always going to be decompressing, you might want to say, actually, no, I want you to just decompress it once and cache it in memory. So there are these options when you ask for images; you can set these up.

There's one interesting thing that I actually bumped into just a few weeks ago, where I had a RAW image, and if I asked for a thumbnail at a reduced size, it actually went down a much faster code path in order to construct the data, because the processing required was a lot less. You don't have to go through the full debayering process to construct your image. So ask for thumbnails at a reduced size, like a little tiny, you know, quick shot of the image.

Like, you have a RAW image and you want to display something really quickly while you go run off and do the real work: you can ask for this thumbnail at reduced size, and then show that as a poster frame, let's say, while you're actually processing and fetching the real image. So definitely, you might want to use that.
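
A sketch of the thumbnail request (option keys as in the ImageIO headers; the 256-pixel cap is arbitrary):

```c
#include <ApplicationServices/ApplicationServices.h>

CGImageRef CreateQuickThumbnail(CFURLRef url)
{
    CGImageSourceRef source = CGImageSourceCreateWithURL(url, NULL);
    if (source == NULL) return NULL;

    int maxSize = 256;
    CFNumberRef maxSizeNum = CFNumberCreate(NULL, kCFNumberIntType, &maxSize);
    const void *keys[] = {
        kCGImageSourceCreateThumbnailFromImageIfAbsent,
        kCGImageSourceThumbnailMaxPixelSize,
    };
    const void *values[] = { kCFBooleanTrue, maxSizeNum };
    CFDictionaryRef options = CFDictionaryCreate(
        NULL, keys, values, 2,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

    // For RAW files this can skip the full debayer and come back much faster.
    CGImageRef thumb = CGImageSourceCreateThumbnailAtIndex(source, 0, options);

    CFRelease(options);
    CFRelease(maxSizeNum);
    CFRelease(source);
    return thumb;
}
```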

So one additional point is about caching. People say hey, you know, I'd like to cache because I think I have a problem. So this sometimes is a little bit problematic, in the sense that you'd like to cache, and somebody else is caching, so then we have two caches that don't know about each other.

So what you really should do is try and make sure that there is a performance benefit for caching, right. This is definitely a case of premature optimization that you want to avoid. So you need to make sure that there's a benefit, one; and you need to be aware of the cost of doing your own caching.

And you also need to be committed to managing the cache effectively. A cache and a stash are two different things, you know. A stash is basically, oh, I saw it, let's just put it in memory; versus something that actually ages. And always remember, back on the original slide, the first slide: be cognizant of memory consumption. When you start creating these caches, you're just basically making pages for the VM system to page out. So definitely, you have to be committed to managing it correctly.

Images are supplied by data providers. Data providers are basically the means by which the data gets to the image and gets represented. These objects can be used to your advantage, right. You can simply say: on the first fetch of my data, that's when I do my decompress. And then you supply the data to the image, and some time later you might want to just discard the memory, 'cause it's been sitting there for, you know, six hours or something. So you can decouple the image usage from the actual decompression, and do, you know, fancy stuff to discard memory when it's no longer needed. So definitely, data providers: you might want to look into that.
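
A rough sketch of a lazy direct-access provider (assuming the Leopard-era CGDataProviderCreateDirect; the decompression helper is hypothetical):

```c
#include <ApplicationServices/ApplicationServices.h>

// Hypothetical cache entry owned by your code.
typedef struct { void *pixels; size_t size; } LazyPixels;

extern void *DecompressNow(size_t *outSize);   // your decoder (hypothetical)

static const void *GetBytePointer(void *info)
{
    LazyPixels *lp = info;
    if (lp->pixels == NULL)
        lp->pixels = DecompressNow(&lp->size);  // decompress on first fetch
    return lp->pixels;
}

static void ReleaseBytePointer(void *info, const void *pointer)
{
    // Called when CG is done with the pointer; you could discard here,
    // or keep the buffer around and age it out yourself later.
}

CGDataProviderRef MakeLazyProvider(LazyPixels *lp, size_t byteCount)
{
    CGDataProviderDirectCallbacks cb = {
        0, GetBytePointer, ReleaseBytePointer, NULL, NULL
    };
    return CGDataProviderCreateDirect(lp, byteCount, &cb);
}
```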

Another way of caching: even inside of a data provider, when the data provider says get me the byte pointer, you say, okay great, I need to provide the data; I've got my image, let's go in and construct the data right now. So what you would normally do is basically create a bitmap context of the appropriate size, colorspace, and format, and then you simply draw the original image into the bitmap context.

And then you ask the bitmap context to give you back the image; or, because you've created the bitmap context, you can supply the data pointer and create the image, and the data provider, yourself. And then you draw the resultant image; or, in the example that I gave you, you actually just populated the memory for the data provider. Either way, you just draw that instead of the original image. And therefore, bing, bang, boom, you get caching.

So let's look at the creation part a little bit more. Obviously you're going to be using BitmapContextCreate. Now, if you don't actually need to manipulate the pixels, it's much faster for you to just tell us you don't care about the memory, right: we go in, we create a chunk of memory that's applicable to whatever system you're on, with the right alignment, you know, for speed. And when you draw, it draws into that memory.

The second point is about creating the correct format and colorspace. An easy example of that: the image source was, for instance, a grayscale image, and you created a bitmap context that was RGB, and you added alpha. When you drew that opaque grayscale image into that bitmap context, what actually ends up happening is that you get an RGB image with alpha.

So now you've basically replicated the data by three, and you've added alpha, which of course, in the previous slides, we spoke about reducing your fill rate by introducing alpha when you didn't need to. So that's an example of trying to choose, you know, the appropriate number of components and colorspace for the source that you're trying to create. And of course, you know, use transparency effectively.
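
Pulling those points together, a sketch of the render-once-and-reuse caching idea (a gray, opaque source stays gray and opaque; NULL data lets Quartz allocate aligned memory, as described above):

```c
#include <ApplicationServices/ApplicationServices.h>

// Render a grayscale, opaque image once at its display size, then hand
// back a cached CGImage to draw from then on.
CGImageRef CreateCachedGrayImage(CGImageRef source, size_t w, size_t h)
{
    CGColorSpaceRef gray = CGColorSpaceCreateWithName(kCGColorSpaceGenericGray);

    // NULL data: let CG allocate well-aligned backing memory for us.
    // Gray with no alpha matches the source: no 3x expansion, no alpha cost.
    CGContextRef ctx = CGBitmapContextCreate(
        NULL, w, h, 8, w, gray, kCGImageAlphaNone);

    CGContextDrawImage(ctx, CGRectMake(0, 0, (CGFloat)w, (CGFloat)h), source);
    CGImageRef cached = CGBitmapContextCreateImage(ctx);

    CGContextRelease(ctx);
    CGColorSpaceRelease(gray);
    return cached;
}
```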

Layers. Layers are basically general containers. They're device-dependent representations. You can simply create a layer from a context, and you can draw all of your content into it. Layers are primarily used to capture multiple objects, like if you had a PDF document or you're drawing a whole bunch of stuff. Most times it's not advantageous to create a layer just to draw a single image into it and then draw the layer instead. So definitely, you want to be conservative about when you use these layers.

So whenever possible you should try to use layers instead of bitmap contexts, because they simplify a lot of the details, right. We spoke about, you know, creating the right colorspace, the right bits per component, all of these little nasty things. What layers allow you to do is basically say: okay, given a context, and I don't know what it is (it could be a PostScript context, it could be a PDF context, it could be a bitmap context that's actually floating point with the float bytes swapped, right), asking that context for a layer gives you the best representation, so that what you draw into it is more or less compatible with the destination, right.

Whereas if you were to choose the bitmap-context approach, you'd have to figure out all these details, which are not all readily available to you; as in the case of, for instance, swapped float components. You might just say create-float-little-endian, and what actually happens is that the little-endian float bitmap context that you just created gets drawn into the destination context, but that context is actually expecting big-endian float, so we have to do a whole bunch of conversions. So definitely, if you want to get something that's compatible with the destination, create a LayerRef, draw into it, and away you go.
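
A minimal CGLayer sketch (sizes and content invented):

```c
#include <ApplicationServices/ApplicationServices.h>

// Cache repeated content in a layer that matches the destination context.
void DrawStampedContent(CGContextRef destination)
{
    // The layer inherits the destination's format details for us.
    CGLayerRef layer = CGLayerCreateWithContext(
        destination, CGSizeMake(64, 64), NULL);

    CGContextRef layerCtx = CGLayerGetContext(layer);
    CGContextSetRGBFillColor(layerCtx, 0, 0.5, 1, 1);
    CGContextFillEllipseInRect(layerCtx, CGRectMake(8, 8, 48, 48));

    // Stamp it a few times; the cached representation is reused each draw.
    for (int i = 0; i < 5; i++)
        CGContextDrawLayerAtPoint(destination,
                                  CGPointMake(70.0 * i, 0), layer);

    CGLayerRelease(layer);
}
```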

For transient-type stuff, meaning that you just want to draw a bunch of objects and have them all faded out by 50% for example, instead of creating a layer from whole cloth, which is basically a separate independent object, you can simply create a transparency layer, which is basically a sheet of acetate that you can draw on. You draw onto it, and then you say EndTransparencyLayer, and whatever you drew gets composited into the destination based on the mode that was set, or the clipping, or the alpha, or the style that was set before you actually began the transparency layer.

One important point about that: when you say begin-transparency-layer and it's completely unbounded, we have to more or less create a transparency layer the size of the destination. So if you know that you want to draw in a particular area, it's advisable to set the clip before you create the transparency layer; that way we create a transparency layer just big enough to capture everything that will be drawn.

The new API for Leopard is begin-transparency-layer-with-rect, where you say: hey, I promise to draw in this area, and that area is a rectangle specified in user space. What we end up doing is projecting that user-space rectangle to the destination, and creating a destination of that size, at the right position, with the right format, etcetera; and when you draw, it'll be captured correctly. This is especially useful when dealing with styles, for instance shadow style or focus-ring style, where it's not easily determined exactly how much area you actually should draw.
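
A sketch with the rect-based call (assuming the CGContextBeginTransparencyLayerWithRect spelling from the Leopard headers):

```c
#include <ApplicationServices/ApplicationServices.h>

void DrawFadedGroup(CGContextRef ctx, CGRect bounds)
{
    CGContextSaveGState(ctx);
    CGContextSetAlpha(ctx, 0.5);   // the whole group composites at 50%

    // Promise to draw only inside `bounds` so the layer is sized to fit.
    CGContextBeginTransparencyLayerWithRect(ctx, bounds, NULL);
    CGContextFillEllipseInRect(ctx, CGRectInset(bounds, 4, 4));
    CGContextStrokeRect(ctx, CGRectInset(bounds, 2, 2));
    CGContextEndTransparencyLayer(ctx);

    CGContextRestoreGState(ctx);
}
```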

So you can use this new API; it's much more convenient. You just say: here's a rectangle, I'm going to draw in this area, make me a transparency layer for it; you draw into it, and then you're done. Okay. So the last little point is basically about finding pixels. It's wise to hit pixel cracks.

If you don't hit pixel cracks, we do a little bit of extra work. So you want to align your drawing to pixel cracks; for images, you want to snap them so you don't get soft edges or blurring inside of your image. You can adjust your coordinates to match exactly where you think you want to go in order to hit pixels, so we get, you know, one-to-one mappings.

HiDPI tends to add a little bit of complexity, because, you know, if it's completely scalable and you're not locked in at only 1x or 2x, you end up having things that might basically always be straddling pixel boundaries. So it adds a little additional complexity to figuring out, if you wanted to lay out UI and draw a bunch of controls, what you have to do in order to make sure that you get a good representation for the DPI that was specified.

The way to do that: ask the context for the user-space-to-device-space transform. You take your user-space point, you transform it to a device-space point, you round it in device space, and then you transform the point back to user space; such that if you were to draw at that new user-space point, it will hit a device pixel. So that's kind of how you do that.
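
A sketch of that round trip, using the CGContext point converters:

```c
#include <ApplicationServices/ApplicationServices.h>
#include <math.h>

// Snap a user-space point so it lands on a device-pixel boundary.
CGPoint SnapToPixel(CGContextRef ctx, CGPoint p)
{
    CGPoint device = CGContextConvertPointToDeviceSpace(ctx, p);
    device.x = round(device.x);   // round in device space...
    device.y = round(device.y);
    return CGContextConvertPointToUserSpace(ctx, device);  // ...and go back
}
```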

Okay. So let's say we've finished all the optimizations, more or less. And what we're going to do is talk a little bit about even more optimizations, more specifically Quartz GL. So for those of you who don't know what Quartz GL is: basically, Quartz GL is a Quartz 2D implementation on top of OpenGL. Basically it allows us to offload the burden of rendering, compositing, and blending onto the GPU.

Using Quartz GL minimizes the DMA transfers that happen to and from the video card, and it has a much more efficient integration model when dealing with other GPU-based subsystems. Examples of those are obviously things like Quartz Composer, OpenGL, Core Image, and Core Animation.

So while we ran off and did all this really cool stuff (it's been in the works for a while), we discovered a really large set of lessons. One major lesson is that for a lot of applications, rendering may not be the bottleneck, all right. You guys are running off fetching from disk or copying files, or looking at the network or something, and your actual application isn't really rendering-bound.

So that's one point. The second point, which is not so surprising, is that if you use Quartz 2D in an inefficient model, like you don't reuse your ImageRefs or colors, or anything like that, it can be extremely costly, especially in the GPU scenario, right.

If you don't tell us this is an image that you're reusing, we have no choice but to re-upload it to the card, and that's not fast, right. So definitely, try to make sure that you use the APIs in an efficient fashion, and you can get much better performance, especially in the GPU scenario. Of course, the good news is that if you use it right, it'll go fast.

Okay. So let's look at some stuff in more detail. So why is the expense what it is? When I spoke about, you know, the GPU scenario being slightly more expensive than the CPU scenario, here's why. With Quartz 2D, what normally happens is that you draw into backing storage, which resides in system memory, at five gigabytes a second, six gigabytes a second, depending on what machine you're on.

Once the data is actually in the backing store, it then gets sent to the Quartz Compositor, which basically DMAs it up from system memory into the video card. That happens at two gigabytes a second. Notice the number: two, not five, two. Then the Quartz Compositor blends it all together and presents it to the screen. In the Quartz GL path, what's actually happening is that you're now putting data into the backing store at two gigabytes a second, remember that number, two. But it's significantly less data; it has nothing to do with the actual geometry.

In terms of the area that's covered, it merely has to do with what it is: it's a quad, it requires four vertices, and that's the data that you're actually sending. You're putting that stuff into a command buffer, and that command buffer is going up to the video card at two gigabytes a second.

And the actual filling of the area is done at thirty gigabytes a second. So you can see how those numbers are lopsided. So let's take a look at a simple benchmark. Here I think we're trying to fill a rectangle.

We're filling a rectangle and flushing it. So you can see those numbers; this is done on a Mac Pro. You can see the numbers for Quartz 2D: basically it's draw into my backing store, flush my backing store; that's 500 megabytes a second just constantly pumping up to the video card.

Quartz GL: well, there's no backing-store DMA going on; it's zero, there's nothing to do, right, it's gone. Command-buffer traffic: with Quartz 2D, you draw into your backing store, and then when you tell the Quartz Compositor to draw, it lays down a little command that says, okay great, upload this data and draw this data.

In the Quartz GL case, that's not the only traffic. The Quartz Compositor is putting its own traffic up to the GPU, but you've also put the quads that you're actually drawing on the command traffic that's going to the GPU. So that's 40; so it's, you know, four times bigger, more or less. CPU utilization: well, it's more or less the same.

But the bottom line is basically that we get four times the amount of updates per second. Of course, we've thrown the Quartz Compositor into a mode that you guys can't use; it's just basically to measure traffic. And so yeah, you can get 8,000 frames per second. Don't try this at home.

( laughter )

Okay, so that's kind of cool, so let's look at some more numbers.

Here's a situation of trying to find out: if I were given 100% of the CPU, what can I do with it? So for Quartz 2D, obviously, as you draw more and more pixels (we're actually trying to do an image operation, so we have to fetch an image and then write the destination), you can see that you're just basically dropping, right. I mean, it's just going to go that way, because you're just consuming more and more.

You're maxing out your CPU usage, but you're doing fewer operations per second, because each operation, as the image size gets larger and larger, costs more; so that's true. The Quartz GL case is slightly different. It's no surprise: it does take constant time to pack one quad. Doesn't matter how big the quad is, it's just going to take constant time.

So some of you might actually get it, and you're thinking: but that doesn't seem quite right. I mean, you've got to block at some point, right? The GPU doesn't have infinite bandwidth. You're right. So let's actually look at what happens if you were to, say, measure in real time, right. You're not measuring with the CPU maxed out; you want to find out exactly how much can get done. Well, this graph is not so great.

For the Quartz GL case, you can see that at the small sizes we're actually not as fast as the CPU. The reason for that is that the actual data that goes into the command buffer is larger than the area that you're filling. But at some point (and it's actually kind of a sweet spot, because you don't normally draw one-by-one or two-by-two things; lots of images are, you know, 32 by 32 or 48 by 48), GL starts to take over, in that the CPU trying to draw that area is taking more time than the GPU.

And then the GPU starts to take off and says, hey, yeah, I'm winning, I'm much better than you. And then as the image sizes keep growing toward the end, they more or less come together and say, well, we're more or less the same. So those are not very encouraging results.

But you really have to look at it a different way; keep on switching the axes of the presentation. Anyway, if you look at it a different way, it's basically: how many resources are you consuming to do your work? We spoke about the CPU scenario, where you're actually consuming all of your CPU to do the work, and that's the straight line at the top.

In the GPU scenario, well, you're actually using less and less of the CPU, but you're basically covering more or less the same area. Remember the previous graph: toward the end it was more or less equivalent. So the GPU case uses way less CPU in order to do the same amount of work.

Okay, great. That's fine for one particular application. But let's actually take a look at some of the results that we see with Quartz GL turned on for a bunch of different applications. TextEdit: a simple TextEdit resize, a simple flowed document, not rich text, plain text: 3x performance.

With rich text format, there's a lot of kerning that goes on, a lot of different types of fonts and stuff like that, so it's not as dramatic; but it's still a 70% improvement, which is not bad for absolutely doing nothing other than turning on a switch. Similarly, Safari scrolling an empty page; terminal resize, that's a nice number, it's 2.3x; Mail resize, 1.7x.

So all you have to do is just turn on the switch and do nothing (well, make sure you've optimized your code to use things the right way), and boom, you get, you know, 0.4x up to 3x or 4x, maybe even 5x sometimes, depending on what hardware you're on.

So that's nice for simple applications. So let's take a look at something more complicated. Here I have a situation where I have basically four PDF documents running away in the background, a 3D app running, a clock running, a frame meter running, a whole bunch of stuff running; the entire desktop is all done with Quartz GL. Just flip the switch on everything and boom, it's all running. Let's see what that does. Well, as on the previous slide, if that exact demo were done with Quartz 2D, there's a whole bunch of DMA traffic: 500 megabytes a second. Quartz GL: nothing, there's no backing store to upload.

Command DMA, 30 megabytes a second: that's just basically the Quartz Compositor using OpenGL to get all those windows on screen, right. In the Quartz GL case, it's actually the window server's, or sorry, Quartz Compositor's code that's sending the command data, plus the Quartz GL code for all of those PDF documents, all going away. And it's not even double, right: it's 47 versus 30. So we are consuming much less bandwidth, and more or less doing the same amount of work, if not more.

CPU utilization: well, basically it's the same, right. It's a quad, there are four documents being parsed, all done on separate threads, basically maxing out the CPU more or less. But the bottom line is basically that we get a little bit more than a 50% improvement, just by flipping a switch.

Okay. So how do you turn it on? All you developers, I can see you sitting in your seats going: I want to turn it on, I want to turn it on. It's really simple. You basically go into your Info.plist, in Xcode or you can do it manually, and just turn that on: Quartz GL enabled, set to true; boom, your entire application's Quartz GL.
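
For reference, the Info.plist fragment would look something like this (assuming the key spelling QuartzGLEnable; check the release notes for your SDK):

```xml
<key>QuartzGLEnable</key>
<true/>
```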

Or you can be a little bit more cautious and say, I don't want to turn it on for everything; I just want to turn it on on a per-window basis, because my other windows don't really need hardware acceleration. You can turn it on by asking the window to set its preferred backing store location if you're using the NS, the Cocoa APIs; and for the Carbon APIs it's a similar thing, you just set your preferred backing store location to be video memory.

Things you should know. It's important for you to try and figure out exactly what camp you're in, right. There's one camp where you're basically uploading a whole bunch of data: brand new images every single time, brand new line art, brand new everything. Basically all of those transfers happen at the two gigabytes a second; versus the other camp, which is typically user applications where your objects are already on the card, right. You have all your textures and whatever, your fonts and your glyphs, and you're just basically redrawing. So that's why, for instance, for things like application resize you would see a performance benefit, because all this stuff is more or less cached.

In the case of, for instance, let's say a web browser that's trying to get pages from the web, that particular scenario is basically brand new data every single frame, right. So the data's coming in off the network, and you have to upload that data. So it depends on which of those numbers you're dealing with.

I don't necessarily think that people are going to be able to click to a new page at 60 hertz. What would be more interesting to them is being able to resize at 60 hertz. So you have to figure out what the tradeoff is for you, and where you're going to sit.

The second point is about Core Image integration. A lot of work went into this. The normal usage for Core Image is to basically have an image, and then you create your OpenGL, your CI OpenGL view, and you draw your Core Image stuff into there. What happens under the covers is that the OpenGL surface is more or less a sheet that sits on top of your window. So when you create those OpenGL contexts, you end up effectively burning that much video memory.

For a GL application that's great, it runs really fast. But for CI, depending on what you're actually doing, you may not need to do that, so you can make that tradeoff. It's: do I need to create an OpenGL context, or do I just create a normal CIContext on top of a CGContext? If you create a CGContext-based CIContext, you can draw your CI stuff into there, and you can also put text, highlights, and all sorts of nice things on it.

So you don't have to write any OpenGL code, you don't have to change things. You just use CI the way it was intended, to do image processing, and then you draw with your normal CGContext-based CIContext. And then you turn on Quartz GL, and it runs faster.
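
As a sketch of that pattern, assuming filteredImage is a CIImage you've already built with your filters:

    // Inside your NSView subclass; no OpenGL view needed.
    - (void)drawRect:(NSRect)dirtyRect
    {
        CGContextRef cg =
            (CGContextRef)[[NSGraphicsContext currentContext] graphicsPort];

        // Build a CIContext on the window's ordinary CGContext.
        CIContext *ci = [CIContext contextWithCGContext:cg options:nil];
        [ci drawImage:filteredImage
               inRect:NSRectToCGRect([self bounds])
             fromRect:[filteredImage extent]];

        // Ordinary Quartz drawing (text, highlights) can follow here,
        // in the same context.
    }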

Okay, other things. What you should do, obviously, is implement the optimizations we talked about, right. Use Shark, use your different tools, figure out where you are, figure out which category you're in, whether you're in the new-resource case versus the resource-reuse case, and definitely do those things. Then use Quartz Debug to turn Quartz GL on, and test your applications.

If you see an advantage and you want to opt in, opt in. If you see a disadvantage and you don't have time to optimize your code, just turn it off and work on your code later. If you see neither, don't do anything. Don't explicitly enable it or disable it, just leave it the way it is.

And of course, don't forget to test on different kinds of hardware. It's not like your stuff will break. If you happen to turn it on and you're running on a G4 with a Rage 128, what actually happens is that it just gets disabled. So you don't have to make any hardware-specific changes to handle different hardware; it just automatically falls off, and you get exactly what you get now.

The thing about this feature is that it is not something new where you should expect bugs, right. It's a replacement, a way of drawing what you used to draw before, faster. So we've tried to make sure that there are absolutely no bugs, no visual anomalies. You don't have to actually do anything different.

Okay. So, we spoke about optimizations, we spoke about different types of coverage objects and different ways to draw things. Now we're going to talk about how to get stuff to the screen fast: optimized screen updates. Let's go back to the architecture diagram. You've seen it before, and what we're going to concentrate on is that part of the process: after you've drawn your data, now you want to get it to the screen.

Flushing. Once you've finished drawing what you have to draw, you have to present it to the user. So you basically do a CGContextFlush, right. The first rule: try not to flush an area that you didn't draw in.

That's obvious: if you didn't draw there, why flush it? The other thing is that sometimes people flush way more than they should. The display refreshes at 60 hertz; if you try to flush faster than 60 hertz, nothing really is going to happen. You also end up consuming valuable GPU and CPU resources if you just flush willy-nilly all the time.

So let's take a look at what actually happens. Great, I've got my application, and I have a VBL timeline; the application hasn't drawn anything yet. The VBL goes by with a clear screen. The application draws something, consuming some GPU and CPU resources. It draws the next thing, the VBL goes out, and bam, you get beat. Again, before the next VBL it does a whole bunch more work, consuming a whole bunch of GPU and CPU resources, very valuable, you could have given them to somebody else. But what happens is, boom, you only see Z, the last thing drawn. So all of that intermediate work was for nothing.

So what you should have done is synchronize your flushes more or less to the VBL; that way you do less work, and the user still sees what they're going to see. Okay. So, talking about optimal flushing. One important case is animations. If you're not using Core Animation, which actually does the right thing, it syncs off of VBLs, and you're doing your own animations, say you have an animating GIF, 500 of them in one window, and you drive the flushes off of different timers, you're going to get 500 independent flushes.

What's much better is to animate off of one clock, so that you update all 500 of them at once. So definitely you want a single clock, and you want to choose a refresh rate of, I don't know, half the display refresh rate: the display is 60 hertz, so maybe 30 hertz, preferably 30. Movies are okay with 24, so why not 30.
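
Here's a minimal sketch of that single-clock idea in Cocoa; animationTimer, animations, and advanceFrame are hypothetical names for your own animation objects:

    // One shared 30 Hz timer drives every animated element in the window.
    - (void)startAnimating
    {
        animationTimer =
            [NSTimer scheduledTimerWithTimeInterval:(1.0 / 30.0)
                                             target:self
                                           selector:@selector(tick:)
                                           userInfo:nil
                                            repeats:YES];
    }

    - (void)tick:(NSTimer *)sender
    {
        for (id animation in animations)
            [animation advanceFrame];   // step every element together
        [self setNeedsDisplay:YES];     // one invalidation, one coalesced flush
    }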

And of course your drawing time must be less than your timer interval, obviously, or else you'd just be backing up into yourself. The second point right here is kind of crucial. You need to separate your data manipulation engine from your visual engine. It's always good: it doesn't block your UI, and you get much more responsiveness. So let's say you were somebody's favorite file copying utility, and you made the unfortunate decision to have your file copying thread and your visual thread be the same thread.

There would be no surprise if you realized that you could only copy files at 60 hertz, because what would happen is that your file copy code would copy, and then it would go and try to do a flush. Your flush may or may not block, but it's definitely going to happen at 60 hertz.

So if you have situations like that, it might be wise to separate your visual engine and your data engine: have your file copy thread run at file copy speeds, and have your visual engine run at 60 hertz. That's definitely a thing. In those particular cases you want to use separate threads, obviously; it's much better to do that.
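
A rough sketch of that separation, with hypothetical names (fileList, copyOneFile:, filesCopied). The copy thread runs flat out and never touches the screen, while the UI repaints on its own 30 Hz timer as in the earlier sketch:

    // Kick the data engine off onto its own thread.
    - (void)startCopy
    {
        [NSThread detachNewThreadSelector:@selector(copyFiles:)
                                 toTarget:self
                               withObject:fileList];
    }

    - (void)copyFiles:(NSArray *)files
    {
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
        for (NSString *path in files) {
            [self copyOneFile:path];   // runs at disk speed, never flushes
            filesCopied++;             // progress state the UI timer reads
        }
        [pool release];
    }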

Another approach would be timer based as well. The last point about flushing optimally: try to flush using an invalidation model rather than explicit flushing, right. The event loop and the frameworks all do their own nice things to give you the best presentation, so you simply say, I've drawn this, invalidate it, and back through the event loop it will get flushed, and away you go, instead of explicitly calling flush. There are some cases where you actually want to explicitly flush in order to present something to the user, but if you don't need that kind of urgency, just use the normal framework invalidation model.
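
In Cocoa that just means marking the region dirty and letting the framework draw and flush on the next pass of the event loop; myView and dirtyRect are hypothetical here:

    // Instead of drawing immediately and calling CGContextFlush yourself:
    [myView setNeedsDisplayInRect:dirtyRect];
    // drawRect: will be called later, and the framework flushes for you.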

Okay. So, optimal flushing again: how do you detect these types of problems, and what tools are available? Quartz Debug, if you aren't accustomed to using it or have never seen it before, is a nice little handy tool that does a bunch of cool things. One of them is autoflush: as you draw, every single primitive you draw, we go in and flush it out to the screen, so you see it right away.

You can use that to detect whether you've redrawn stuff over and over again for no reason. You can also turn on flash screen updates, which means that when a flush happens, we flash yellow in front of you to say, hey, this is going to be flushed, and then you see it. And the other one is identical updates, where we do some processing in the background as the flush happens, and we highlight pixels that are the same: look, you've painted a white rectangle 500 times, it's still white.

So definitely you can turn on those different things to debug what's actually happening. It also catches things like overdraw. You might resize a window and see a view being reflushed like 500 times while the actual resize is being done. That may mean you have an invalidation problem, where you've invalidated the view too many times. So you can use those tools to take a look at what you're drawing and how often you're flushing.

Shark. Shark is another useful tool; it's a great tool, I find. In Shark, because flushing is more or less asynchronous, when you do a flush you don't actually get taxed right away. The flush goes into a queue, and some time later the Quartz Compositor picks it up. So you may not see the time there, but what you will see is time spent when you try to draw again, right.

So definitely look for the time at those points. Also, when you're Sharking your application to find out where you're blocking, you want to throw it into the all thread states mode, not just the time profile. Time profile measures exactly the time spent in your process, not necessarily the time spent waiting on other people. If you turn on all thread states, it shows you exactly what's happening in your process and where you're blocking.

And of course, even though I said it's all queued up and asynchronous, if you flush too often it will obviously start showing up in profiles. The last point is that we put nice, descriptive names on our functions. So if you see those kinds of functions in a profile, it basically means you have some sort of contention.

Okay. So we spoke a little bit about optimizations, then we spoke about how to flush your stuff optimally. Now we're at the final stage, the stage where stuff gets from your backing stores onto the screen, right: the Quartz Compositor. You sort of have to look at it as being the guardian.

Nothing gets to the screen except by going through the Quartz Compositor, absolutely nothing. So what we do is manage all these flushes from all these different applications, right. They come in, and at VBL time we go in, take them, composite them, and throw the result on the screen, and that's what the user sees.

Coalescing these updates together from multiple applications allows us to make much better use of the CPU and the GPU, right, because there's absolutely no sense in trying to draw faster than the user can see, trying to get 800 or 8,000 updates per second when the user isn't going to see any of it. So we basically police the whole scenario and say, you know, hold off, you're over-flushing, let's wait for some other flushes to happen, and then they all get presented on the screen.

So let's look at how that affects you. Here we have an application that's drawing. Because of coalesced updates, instead of going straight out, its update gets scheduled for the next VBL. Another application comes in and similarly tries to flush; it gets scheduled for the next VBL. More CPUs, more applications, they all get scheduled for the next VBL. And then when the VBL goes out, all the presentation happens at once, and you get nice, smooth, flicker-free updates.

So, coalesced updates: what happens when you become part of this whole flushing scenario? Your updates are guaranteed to go out with the next display refresh; we're not going to miss an extra VBL, where you did a flush and it didn't happen. It's going out with the next VBL. But more importantly, in order for us to satisfy that guarantee, we have to prevent you from drawing on top of the backing store that we have yet to flush to the screen.

So we hold you off. Your next drawing operation after you do a flush may be held off for a little while, until the actual VBL goes out. This is kind of important; go back to the file copying example that I gave you. This is exactly what slows things down for people who try to over-flush. So let's take a look at what actually happens. An application comes in and starts to draw before VBL 1. Then somewhere between VBL 1 and VBL 2 it does a flush, and it gets scheduled for that flush. All of that time it was blocked.

Similarly, another application does the same thing; they all get scheduled, and they all go out at VBL 2. But during that time they get held off in a certain way. So, how do you avoid this? Also be aware that with Quartz GL the time delay is actually a lot less, so that's another benefit.

So how do you avoid things like this? Obviously, you want to minimize what you flush. If you try to flush at more than 60 hertz, you're going to bump into this; even if you flush at exactly 60 hertz, you might bump into it every so often. So definitely, flushing at less than the refresh rate of the monitor is a good thing.

And try to do useful work. If you're not flushing, you could be doing something else, if you have something else to do. That's another way of avoiding the situation. So how do you detect the problem? If you think you're bumping into this situation, how do you detect it? In Quartz Debug, what we have is the ability to force beam synchronization.

When you turn on the flash screen updates and the forced beam synchronization, we throw it into a mode where, if you try to draw into the backing store before we've actually taken it out and sent it up to the display, we flash it cyan.

That way you get an indication that you're trying to draw or flush faster than the refresh rate of the monitor. So that's also a kind of neat tool. And Shark, again, will show you the lock or the synchronization in there.

There's also a tech note that was published, I guess two years ago, exactly on this topic and how it affects you. A lot of people don't necessarily have to worry about it, because they don't try to update faster than 60 times per second. But definitely take a look at that tech note for more details.

Okay. So this is coming up to the tail end of the presentation: live resize. It sounds like something very simple, but it is really, really complicated. It is the most graphics-intensive part of your application. When the user resizes, the last thing you want to do is block the resize, and the first thing you want to do is make sure they see what they asked to see really, really quickly. So how do you do all of that? You don't necessarily need to update everything all the time, every time, with as much precision as you'd like.

You can use periodic updates while the live resize is actually happening; you don't need to re-layout, you can just refresh periodically. (inaudible) is an example of this: when you live resize, there's a little table view that updates every now and again, not all the time, and it doesn't re-layout the rows of the table view.

You can also consider drawing lower quality results. Preview is an example of this. When you do a live resize, we put up a lower quality representation of the image while the resize is happening, and only when you drop the mouse does it go and get the full quality image representation at that scale.

Text layout. Text layout is very consuming as well; you want to avoid re-laying out the data. TextEdit is an example of this: when you resize TextEdit, the layout that existed before the resize is kept until you drop the mouse. Once you drop the mouse, it goes and does all the re-layout. You wouldn't necessarily see this with plain text, but you'd definitely see it with rich text, because rich text is a lot more difficult to lay out.

And of course, avoid disk and network access during live resize. Say you were somebody's favorite address lookup utility, and you went and fetched data from some database while the resize was happening; that would not be a good place for it. It's not as if the representation the user's seeing on screen would actually change. Only when the user releases the mouse should you run off and start doing that kind of expensive work.

So you should try not to block your user interface during live resize; try not to do expensive things, right, touching the disk, anything like that. You can do it once the resize is done. And at all costs, try not to block any kind of user interface manipulation at all, because it just annoys users.

So how do you do that in Cocoa? Well, there are several different features. The first one you might see in TextEdit, which actually uses this pretty extensively. It basically says: for this update, I want to get the exposed area.

So if you have a window that's been resized, there's a little L-shaped region that needs to be repainted; you just update the stuff that overlaps the L, you don't need to draw all the interior content. You can get the exposed rectangles inside of live resize, and repaint only those.
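
A sketch of that in Cocoa, using NSView's getRectsExposedDuringLiveResize:count:, which hands back up to four rectangles (and which assumes the window is preserving its content, which comes up in a moment); drawContentInRect: is a hypothetical helper of yours:

    - (void)drawRect:(NSRect)dirtyRect
    {
        if ([self inLiveResize]) {
            NSRect exposed[4];
            NSInteger count;
            [self getRectsExposedDuringLiveResize:exposed count:&count];
            for (NSInteger i = 0; i < count; i++)
                [self drawContentInRect:exposed[i]];  // repaint only the new strips
        } else {
            [self drawContentInRect:dirtyRect];
        }
    }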

There's also a switch so that you can know when you're in live resize. And we spoke about image interpolation quality: you can find out whether you're in live resize and choose a different, lower quality representation in order to get much better response, and then when the mouse drops, you draw the better representation.
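
For example, something like this, where image is assumed to be a CGImageRef you already have:

    - (void)drawRect:(NSRect)dirtyRect
    {
        CGContextRef cg =
            (CGContextRef)[[NSGraphicsContext currentContext] graphicsPort];

        // Cheap scaling while dragging, full quality otherwise.
        CGContextSetInterpolationQuality(cg,
            [self inLiveResize] ? kCGInterpolationLow : kCGInterpolationHigh);
        CGContextDrawImage(cg, NSRectToCGRect([self bounds]), image);
    }

    - (void)viewDidEndLiveResize
    {
        [super viewDidEndLiveResize];
        [self setNeedsDisplay:YES];   // one final repaint at full quality
    }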

There's also an option in Cocoa that allows you to preserve the contents, where you basically say: during my live resize, I want you to preserve my content. What happens there is that we take a snapshot of the window and leave the newly exposed area unpainted, so you just paint that outside area.

That's another cool feature for some types of things, especially representations that are really, really complex. If you had a CAD application, where you have no choice but to draw everything, you could just say: preserve what I have now, repaint these other pieces later, and once it's done, you're fine.
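
That opt-in is a one-liner on the window; a sketch, with myWindow hypothetical:

    // Snapshot the existing content during live resize, so only
    // newly exposed areas need to be painted.
    [myWindow setPreservesContentDuringLiveResize:YES];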

Notifications are also available that tell you when the resize has started and when it's stopped. In Carbon, there are similar types of behavior for finding out the newly exposed area: you basically diff the current and the previous bounds, and that tells you what new area has been exposed. You can handle the appropriate events to do more optimal invalidation. And of course that framework also has notifications available to tell you when a resize has started and stopped.

Okay. So I want to leave you with this quote that I found two years ago, and I thought it was pretty cool. I don't know if it's completely true that it's the root of all evil, but it's fairly close. But basically the whole point is to use your tools, right.

Go and find out what your problems are first; don't run into lots of situations that you have no clue about, like our favorite president.

( laughter )

( applause )

And we provide lots of cool tools, you know: Quartz Debug, Shark, Xray. And whatever other tools; gdb is, you know, the best tool.

Well, not for performance reasons, but printf is definitely a good tool. And of course the last tool is, you know, the ultimate computer: the brain. So take your tools together, make a decision about what you want to do, go in, refine your work, and do your stuff the way it should be done.

I want to leave you also with this last slide. It's about a book that was written by Bunny Laden and David Gelphman, Programming with Quartz. David is on the team, and he knows a lot about Quartz. It's a very good book; it talks about all the details of what you might want to know.

It's a good reference book to have if you want to do something in a particular way. It doesn't necessarily talk a whole lot about performance, but it definitely covers all the aspects of Quartz: it goes through every single thing you can possibly do, when you want to do it, when you don't want to do it, and the pitfalls to avoid. It's a very, very good book; it should probably be on everybody's shelf.

So with that: we do have labs. Well, it's Thursday, so we don't really have labs, but we have open hours, so you can come by and visit. I'll be there, and some other people will be there. And that would be more or less it.

So now I'm going to invite Allan up for Q and A. Is Allan here? He's our Evangelist; you can contact him with any questions that you have, and sample code is also available at the website, from previous years and this year. And I'll invite Haroon as well. There you go. Two minutes, yes. So come talk to me, here or at the open hours, and we can go over any details anybody has.