Graphics and Media • 1:18:06
Bring your laptop and learn the latest techniques to ensure your application's 2D drawing makes the most effective use of Mac OS X's visual pipeline. We'll cover high-performance drawing, optimized screen updates, and how to avoid unnecessary graphics overhead. Although we'll focus on best practices with Quartz 2D, this session also contains practical information on the entire visual pipeline--valuable for all application developers. Whether you're a Carbon graphics whiz or a NSImage master, come to this session to learn how to optimize your application's graphics.
Speaker: Andrew Barnes
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Hi, my name is Andrew Barnes, and this is the Graphics 2D Performance session. So what you'll learn today is how to use the Quartz 2D API efficiently, to help you improve your application's graphics performance. The first thing you'll learn is how to draw what you have to draw efficiently.
The second thing you're going to learn today is how you get what you've drawn onto the screen so the user can see it. And the third thing that you're going to do is sort of a meta-level type of how does your application interact with the user, sort of live resizing, little things that you might want to know when you're writing applications. So, let's get started.
Here's the graphics architecture slide. You've seen it before in other presentations. Basically, applications can draw either video, 3D, or 2D. And what they do is they draw, particularly for video and 3D, they draw into surfaces. Those are hardware-accelerated surfaces. And for your 2D content, you draw into window backing stores.
Those window backing stores are sort of shared between processes, the Quartz compositor, and your application. And what happens is the Quartz compositor takes that data, the data you've just rendered, either video or 3D, and assembles it all together, puts it in the frame buffer, and it scans out. The user sees it.
Great. So what we're not going to talk about today, although the things that you learn in this session would definitely be applicable, is 3D and video. There are other sessions related to those topics. And the second thing we're not going to talk about is QuickDraw. Hopefully you were at the last session, which had to do with transitioning from QuickDraw to Quartz, so we won't be talking about that here. But definitely take your questions through the Q&A, and hopefully we can accelerate your transition from QuickDraw to Quartz. Now what we will be talking about is Quartz 2D. Ah, there you go.
But first, an interesting topic: what about Quartz 2D Extreme? You've heard about it in previous presentations, and for those of you who haven't, it's the OpenGL-based acceleration of the Quartz 2D API. As Peter said in his keynote presentation, the final version is not in Panther--oh, sorry, Tiger.
But a debug version is available for you to test. So... While we went off and did all this really crazy, cool stuff, we learned a few interesting lessons. One lesson is that for the majority of applications, graphics rendering is not always the bottleneck. But for the applications that are rendering bound, we found out that if you were to not use the Quartz 2D API in the optimal fashion, you get varying results.
And so this session is really designed to help you change your application in such a way that you make optimal use of the 2D API, such that when Quartz 2D Extreme comes online, you will benefit from it greatly. The last point on the things we've learned is that if you use the API well, you get tremendous performance increases. So what you have to do as developers is definitely run out, take your applications, and test them on the debug version of Quartz 2D Extreme.
And the optimization--
[Transcript missing]
Let's talk about drawing your stuff--your data--into your backing stores. What I'm going to cover here is some basic graphics optimization tenets. There are sort of micro-level optimizations and macro-level optimizations. The first one is minimizing state changes. It doesn't matter what graphics architecture you're on or what graphics engine you're using; minimizing state changes will increase your performance. As we go through more of this session, you'll see that I come back to this point over and over again.
The second thing is minimize your drawing. If stuff doesn't need to be drawn, don't draw it. If you can get rid of data by culling it out, or know that you only need to update a little section of the screen, do that. It will serve you well in any kind of graphics API.
The third thing is that you have to be cautious about how much memory you consume, right? Even though people say memory is cheap and you've got a lot of it, the disk is slow. Every byte you write out to memory is a byte that can potentially get paged out.
So be cautious about the memory that you hold. If you have your terabyte atlas of images and you don't need it--you only draw it once--just get rid of it. Don't make the system decide, oh, somebody needs pages, so I've got to page this stuff out. That's going to take some time, especially on my Mac mini.
So now we're going to move on to the macro-level tenets. The biggest one here is knowing your data set. It's kind of a dubious statement, but it applies to a lot of things, and it has a broad spectrum. It has to do with knowing the types of data, knowing the size of data.
I always keep saying there are like three numbers in the universe: zero--nothing; one--something, and only one thing; and N--everything else. If your application falls into the class where there's going to be nothing, or just one thing, then fine, you can have a very trivial algorithm that deals with that. But if your application says, yes, I can open a file that has thousands of lines, you have to have ways of dealing with thousands of lines.
The same thing with images and sizes of images, and even the types of images. So knowing what type of data you're dealing with is definitely an important thing. The final point on macro-level tenets is that every now and again you have to stand at the 60,000-foot level and say, is this the right way of doing things? I know I did it before; this was the right way, the best way. But perhaps I should change things around and see if I can get better performance out of it.
First, we're going to move to the number one rule: minimizing state changes. Objects. Objects are paths and other types of things--we'll talk about those later. But you can think about objects as state, right? Even in GL. Even in video. Any type of thing. Once you have an object, that object pretty much represents state. There's a lot of data associated with it sometimes.
But it definitely represents state. So what you really want to do is hang on to your objects: reuse your objects, set your object state, use your object, and either release the object if you no longer want to deal with it, or hang on to it. Within Quartz, as you retain your objects, we kind of track them in a working set.
And your object references are effectively keys into any internal caches that we might have, right? So if you hang on to your image refs, for instance--when you render an image, we might color match it and do some other stuff to it, and we will retain that work. Your performance will increase whenever you go back and reuse that same image again.
So just hanging on to your image is definitely a good thing. Or hanging on to your path, or hanging on to your colors. The third thing is more related to PDF generation. Say you have an image--however big it is--and you're drawing it 60 times. If you draw that image and then release it, then go to the next page, draw the image, and release it again, what's actually going to happen is that you end up with 60 copies of that image inside of your PDF file.
So hanging on to that image is good, and when you reuse it, you get much more efficient PDF generation. There's another set of related objects, which we'll talk about further on in the session, where holding on to and reusing the objects likewise gets you much more efficient PDF generation.
So I told you when to hold on to your objects, but what about when to release them? Well, the obvious time to release an object is when you don't need it, right? Or sometimes more optimally, you might want to release it when you know you're not going to use it for a while. By doing that, we will say, ah yes, he's decided to release his object; we can blow away our terabyte image because we no longer need it. So first we're going to talk about the little objects. Objects can be classified into two types.
So there are the coverage-type objects, which are sort of like your cookie-cutter sheet, and then your colorized objects, which are sort of the color that you're going to be pouring through this sheet to present to the user. There are several types of coverage objects: paths--just general paths--and clips and text. And the colorized objects: colors, patterns, shadings, layers, and images. So now we're going to focus on coverage-type objects. Paths.
Paths are basically general containers for any kind of geometry. They have curves, they have lines, they have rectangles, ellipses, anything you want to do. One important point about paths is that they actually represent atomic coverage. And what that means is that if you constructed a huge crisscross path, basically the coverage for each pixel inside of that path is computed such that the pixel is only touched once, which is a very different scenario than if you draw a line, line, line, the intersection points are going to be touched multiple times as you draw each line.
So depending on, for instance, things like your blend mode and whether or not anti-aliasing is turned on or not, you end up with little aliasing artifacts. So if you want a path rendered atomically as one object, you put it into one big fat path and you render it.
The third thing about paths is that we've gone to great lengths to make sure that your paths are fast. If you use path refs--you construct your path and then say CGContextFillPath or CGContextStrokePath--that's going to be way faster than you going around saying, "Oh, jeez, I've got my data structure which has got my 10,000 lines. Let me go through and call CGContextMoveToPoint, CGContextAddLineToPoint, whatever." So I would suggest that you build your paths up, hang on to your path refs, and then reuse your path refs, and you get much better performance.
Whether or not you use path refs, you should probably try to use the convenience APIs. The example I like to give is the ellipse case: it's much faster for us to generate the ellipse than for you to have to put in eight Béziers. So if you need complex objects and there's a convenience API for them, you should probably use it.
So one of the final set of two points, avoiding state changes. So even though you've constructed your coverage or your path that represents your coverage, you might want to think about sort of maybe reorganizing your data such that you draw all your blue lines together and then your green lines together, for example.
So for little things like color and the actual line styles for stroking--whether it be the line width or the dashes--you want to set that state up, draw all of those together, and then draw the other stuff. Last point on here: knowing your geometric locality. Sort of a bizarre statement.
If you know that you have a path that consists of a little piece over in one far corner and another piece way over there, it's better for you to use two independent paths for those things rather than one big object. So that's one thing you have to look at: if there's data over here and data over there, and I don't need to see the data over there, it's nice to separate the data.
So let's move on a little bit to a subsection of paths: lines and rectangles. Paths are really cool for curves and all this monster stuff, but what about just lots of lines? How do I do that fast? Well, for Tiger, we've introduced a new StrokeLineSegments API, and we've had FillRects for a while now. These are the fastest ways to draw lines and rectangles in Quartz. So you definitely want to cut over to using stuff like that.
And here our point comes back about knowing your data set. As you can see, those are little, small diagrams. But well, on screen they're kind of big. But basically, if you know exactly what your layout is, and you have a huge monster, many megabyte CAD drawing that you need to draw, you might want to think about knowing what the data is and where it is.
So there's also this issue of lots and lots of data. If you've got a waveform that has lots of little itty-bitty line segments, you might want to think about distilling that data down to something a little bit more optimal. You've got 10,000 lines, and you've only got 100 pixels across for your window? Guess what: the 10,000 lines are not going to mean a whole lot.
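One common way to distill that kind of waveform data--a sketch, not anything from the session's sample code--is min/max decimation: for each destination pixel column, keep only the minimum and maximum sample, so peaks survive even though most samples are dropped:

```c
/* Sketch: decimate n samples down to `columns` pixel columns by keeping
   the min and max per column, so peaks are preserved. */
void decimate_minmax(const double *samples, int n, int columns,
                     double *mins, double *maxs) {
    for (int c = 0; c < columns; c++) {
        /* each column covers a contiguous slice of the sample array */
        int start = (int)((long long)c * n / columns);
        int end   = (int)((long long)(c + 1) * n / columns);
        if (end <= start) end = start + 1;
        double lo = samples[start], hi = samples[start];
        for (int i = start + 1; i < end && i < n; i++) {
            if (samples[i] < lo) lo = samples[i];
            if (samples[i] > hi) hi = samples[i];
        }
        mins[c] = lo;
        maxs[c] = hi;
    }
}
```

You'd then draw one short vertical segment per column from `mins[c]` to `maxs[c]`--a few hundred segments instead of 10,000--which is exactly the "know your data set" idea.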
So what you want to do is know what needs to be drawn. That means you don't draw the stuff that doesn't need to be drawn, i.e. that part of the map that's not visible. And you also want to know what is going to be too small to draw.
A little tiny thing that occupies 1/256th of a pixel is probably not going to make a difference to the user. So you probably want to cull away those objects and minimize them. The last point is about going back and re-examining your architectural decisions. Huge CAD maps, for example, are really complicated, and they have a lot of data.
So, knowing exactly what your data is--if you have a set of features that are buildings, and features that are a bunch of lines and streets and pathways and bike paths--you want to divide that stuff up into geometrically localized data. There are a lot of different mechanisms for that: there are grids and quadtrees.
My favorite is R-trees, which is sort of a multi-dimensional B-tree. For 2D, it lets you organize your things into different sections, so that when you have your viewport walking around inside of your CAD drawing, you can cull away huge sections of data instead of re-rendering that data.
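Even without a full R-tree, the basic culling idea is simple. Here's a minimal sketch (made-up types, not Quartz API) that drops segments outside the viewport and segments too small to matter:

```c
/* Sketch: cull line segments that are outside the visible rect, or so
   short they'd cover well under a pixel. */
typedef struct { double x0, y0, x1, y1; } Seg;
typedef struct { double x, y, w, h; } Rect;

/* Cheap bounding-box test against the viewport. */
static int seg_visible(Seg s, Rect r) {
    double lo_x = s.x0 < s.x1 ? s.x0 : s.x1, hi_x = s.x0 < s.x1 ? s.x1 : s.x0;
    double lo_y = s.y0 < s.y1 ? s.y0 : s.y1, hi_y = s.y0 < s.y1 ? s.y1 : s.y0;
    return hi_x >= r.x && lo_x <= r.x + r.w &&
           hi_y >= r.y && lo_y <= r.y + r.h;
}

/* Is the segment long enough to be worth drawing at all? */
static int seg_big_enough(Seg s, double min_px) {
    double dx = s.x1 - s.x0, dy = s.y1 - s.y0;
    return dx*dx + dy*dy >= min_px * min_px;
}

/* Compact segs in place, keeping only those worth drawing; returns count. */
int cull_segments(Seg *segs, int n, Rect viewport, double min_px) {
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (seg_visible(segs[i], viewport) && seg_big_enough(segs[i], min_px))
            segs[kept++] = segs[i];
    return kept;
}
```

An R-tree or grid just makes the `seg_visible` part cheaper by rejecting whole buckets of segments at once instead of testing each one.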
So for the graphics applications that do deal with these volumes of data, I'm sure you're doing all of this stuff. But for the other people whose applications are not that large, you might want to consider looking into some of these ways of organizing your data. So now I'm going to invite Harun to give us a demo of the new path and lines API. HARUN: Thank you, Andrew.
So I'm just going to talk about, through code, some very simple examples of some of the things that Andrew just talked about. This code example is QuartzLines. You want to pull that up if you've got access to it already. And we're just going to do a very simple example of trying to draw 10,000 lines. It sort of simulates-- arbitrary random data. But let me just run it just so people know what we're looking at.
It should show you some random lines being drawn, but you can use this as an example for audio waveform data, stock charts, whatever. It's basically a set of sample data--in this case, 10,000 lines. So let's look at the three different ways we're going to actually do this. Let me just comment out some of this stuff for now.
Come into here. The first way-- Andrew's talked about different ways of drawing lines and some of the issues associated with them. The first way is we're going to create a new path. We're going to move to and add a line to that path and stroke the path. Basically, repeatedly all 10,000 lines are going to go out as separate paths. So that's our first example.
Second example: instead, we build the path once, outside of the loop. And later, we just begin the path, add the path, and stroke the path. This is what we're going to be timing next. Finally, the last example: instead, we use the StrokeLineSegments API, which just takes an array of points. So we take our original data and create these new arrays of points from it. But really, most of the work is being done right there.
So let's run this code now. So I've got these three lines, three different ways of doing it, and they're just offset so that people can see everything. The original one being drawn as separate line segments is roughly around 200,000 lines per second, but the key point is the bulk line APIs that I talked about that we referenced is around 1.3 million.
So definitely use this sample code. Try it out in your application. There's enough people that have asked these questions in the last few years that this should be very useful for people to test and debug some of the things and see why they're not getting this level of performance.
One last thing I'm going to do is look at the--apply Andrew's comment of knowing your data set. So in this case, what we're going to do is--I've got 10,000 lines that don't necessarily all need to be drawn immediately because my window is roughly around 1,000 pixels across. So all we're going to do is generate a limited set of lines from the original data, and we're just going to have a small sampling frequency that sort of goes through that data appropriately. So we're not really going to draw all 10,000 lines. Let's see how that works out.
And obviously, you know, we're only drawing about 1,300 lines instead. It reasonably represents the original data, so use that to your advantage. And the key thing is, it used to take, you know, seven seconds. Now it's taking half the time, roughly. So know your data set. Take advantage of it. I'm going to pass it back to Andrew at this stage.
Thanks, Harun. So as Harun said, the key point is know your data set and use the bulk API. So now let's talk a little bit more about clips. Clips are sort of like paths, but paths are set your path, go, forget about it. Clips are more like: OK, here's a clip, I'm going to draw a whole bunch of stuff, and then I'm going to change my clip.
So you want to use clipping to your advantage. So there's some things that you can do in order to speed up your drawing. Obviously, the goal here is to not draw too much in addition to the state changes thing that you want to do. So what you would like to do is sort of cull out huge sections of objects.
For instance, in the diagram there, the focus is in on the picture, because you just want to get that cutout. But all those runs of text, you can just ditch. And you can ditch them at an object level if you capture objects as runs of text in an array: you can look at these objects, without doing anything terribly complicated, and say, hey, this entire thing is not going to touch my clipping boundary.
Why don't I just not draw that? If you don't do that, we'll end up doing that. So instead of handing us all of the stuff where we have to go through and get fonts and glyphs and all that stuff, we can ditch that entire thing if you don't tell us to draw it. So that's one thing, right? Culling your data.
The second point is about trivial clipping. If you know the data can't be culled, but you do know that you have to draw some of it, do trivial clipping if you can. With rectangles you say, hey, great, let's cut the rectangle; with text, OK, instead of drawing all 1,600 characters of text, let's draw the 16 of them. So you're weighing what your data structure gives you.
So, definitely, you want to cull away your objects, and you want to do trivial clipping. And the third one is, if you don't do trivial clipping, at least let Quartz do the clipping. OK, great, we told you how to clip, but we also want you to minimize your clips, right? For objects that are going to share the same clip, you should probably just set the clip once and draw the objects.
What you don't want to do is you want to say, oh, geez, I've got an object A with clip A, and then I've got an object B with clip B, and then I want to draw A again with clip A, and then I want to draw B again with clip B.
Why don't you just set the clip to A and draw all your A objects, then draw your B objects with the B clip? That's basically talking to the point of minimizing your state changes. The second thing is you probably want to clip only the things that need to be clipped. Something I see developers do a lot is they have an image--opaque or transparent, it doesn't really matter--and they clip to the bounds of the image, and then they draw the image. That clip is useless.
The image itself inherently has its own clip. So try not to draw and try not to set clips on things that you know you don't need to clip. So that's one important point about minimizing your clipping. Also, in PDF and PostScript generation, it is much more efficient to do it that way.
So the last point here continues our little theme of minimizing state changes: you want to minimize the state changes around your clip, which gets back to the whole point of drawing your A objects with your A clip and then your B objects with your B clip.
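The set-the-clip-once advice amounts to sorting your draws so that draws sharing a state (a clip, a color, a line style) land next to each other. A toy sketch--made-up types, with "state" reduced to an integer key--shows how batching cuts the number of state changes:

```c
#include <stdlib.h>

/* Sketch: batching draws by shared state (clip, color, line style...).
   "state" is an opaque integer key here; in Quartz it would be whatever
   you set on the context before drawing. */
typedef struct { int state; int object; } Draw;

static int by_state(const void *a, const void *b) {
    return ((const Draw*)a)->state - ((const Draw*)b)->state;
}

/* How many times the renderer must switch state for this draw order. */
int count_state_changes(const Draw *d, int n) {
    int changes = 0;
    for (int i = 0; i < n; i++)
        if (i == 0 || d[i].state != d[i-1].state) changes++;
    return changes;
}

/* Reorder draws so equal-state draws are adjacent. Caveat: only safe
   when the draws don't overlap -- reordering overlapping draws changes
   the result under the painter's model. */
void batch_by_state(Draw *d, int n) {
    qsort(d, n, sizeof(Draw), by_state);
}
```

With draws interleaved A, B, A, B you pay four state changes; after batching you pay two, and the saving grows with the number of objects.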
So we're going to talk a little bit more about clipping. You'd like to clip on pixel boundaries. You might have a user space which is rotated, as in that example, and the goal here is to find a device pixel. The rotated case is almost a bad example, but it gives you an idea of the complexity; mostly this would be used for rectilinear-type transformations, where things are stretched or whatever. If you want your rectangle to hit a pixel boundary, what you'd really like to do is specify a user space point such that it will hit a device pixel.
So given that you have the user space point in user space, what you want to do is you want to use the APIs to get the user space to device space transform, take your point or points, transform them to device space, then round them however you wish. And then invert transform. Those device space points back into user space. They may not align, but they would be a point such that if you were to draw it, it would hit a device pixel.
So that's something that you might want to do, especially with clipping and with image drawing. If you want things to hit cracks, this is what you'd have to do. We also have convenience APIs, CGContextConvertPointToDeviceSpace and CGContextConvertPointToUserSpace, to convert back and forth and simplify your work.
So now we're going to go into a little bit more clipping. A common technique developers can use is pre-rasterization of the clip. If your clip is really, really complicated, you have to use it multiple times, and you don't want to incur the cost of rebuilding that clip every time you change it, one mechanism you can use is to pre-render your clip into a bitmap context--for instance, an alpha-only bitmap context--that records the actual coverage for the path. Then you hang on to that image ref, and whenever you need to set that clip up, you can reuse it by saying clip to mask, or you can actually say clip to image.
It'll be considered an image mask at that point. So you could have a color and say draw image mask--or draw image, and it'll look at the image, discover that it's an alpha-only image, and therefore apply the current color through that mask to give you the drawing that you want. So that's clipping.
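To make the idea concrete, here's a toy model of a pre-rasterized clip--not CG code, just a plain coverage buffer. The expensive shape is rasterized once into an alpha-only mask (a filled circle stands in for an arbitrary path); applying the clip afterwards is just a per-pixel multiply:

```c
/* Sketch: pre-rasterized clipping. Render the complicated clip once
   into an alpha-only coverage buffer (0..255); reusing it later is a
   cheap multiply instead of re-scanning the path. */
typedef unsigned char u8;

/* Rasterize the "complicated" clip once. A filled circle stands in
   for an arbitrary path here. */
void rasterize_circle_mask(u8 *mask, int w, int h,
                           double cx, double cy, double r) {
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double dx = x + 0.5 - cx, dy = y + 0.5 - cy;  /* pixel center */
            mask[y*w + x] = (dx*dx + dy*dy <= r*r) ? 255 : 0;
        }
}

/* Clip a source coverage value through the cached mask
   (rounded fixed-point multiply: src * mask / 255). */
u8 clip_through_mask(u8 src, u8 mask) {
    return (u8)((src * mask + 127) / 255);
}
```

A real alpha-only bitmap context would also give you anti-aliased edge coverage values between 0 and 255, which is exactly why the cached mask reproduces the clip faithfully.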
At the Quartz level, text is viewed as just a basic collection of paths--cached paths, right? Your font is the container, and your glyphs are the indexes into all these different little elements that we use. So at our level, that's how we look at text, effectively. Of course, full text is way more complicated than just a collection of paths, so you need to use the Cocoa and ATSUI APIs to do your text layout and rendering for you.
One thing that we always realize, especially when dealing with text, is that layout is hugely expensive, right? So, if you have a lot of text, and you're doing a lot of drawing related to text, I'm not just talking about the button type text. I'm talking about like documents. You want to consider caching your layout.
Right? As I said, layout is the most expensive part of text rendering. We can render the paths like nobody's business. But when it comes to layout, layout is hugely expensive. So, you probably want to start thinking about if you can cache your layout, you should do that too.
The last point on here is, obviously, to try and avoid state changes. Things like font, size, text rendering mode, even color are state changes involved in rendering your text. So you should probably try to minimize them--to the point of reorganizing your objects. I mean, say you have a web page that has a whole bunch of black characters and then a whole bunch of blue characters. It's probably wise for you to set the font, set the color to black, and render all your black text; then set the font, set the color to blue, and render all your blue text. Within bounds, I mean--there's a painter's model where you draw one thing and it's supposed to appear on top of the other. But typically people don't like overlapping text, because they can't read it. So that's one thing you can do.
So now we're going to talk a little bit about colorized objects. We've spoken about coverage objects; now, colorized objects. The first colorized object is color--color refs. Color refs represent single spot colors: sort of an infinite sheet of everything being the same color.
As well as patterns, right? A pattern is still an infinite sheet, but it's a repeating pattern that just continues. Using color refs is always faster--always faster than calling CGContextSetRGBFillColor and the like. It's more efficient; we can cache the color refs, as I said.
Your color references are keys into internal caches inside of CG, and reusing your color refs is going to be a good advantage to you. So, as I say, even in the case where you're doing the black text versus the blue text, use a color ref to set the text color.
The second thing is, or the third thing on this slide, is about color spaces. What you'd like to do is you'd like to use the color spaces appropriately. If you have gray and it is gray, say it's gray. Don't say it's RGB. This helps us in many ways.
It's more complicated to match those colors than gray colors. And if something is gray, or it's not CMYK, don't convert it into CMYK, because it doesn't always come out equal to the same thing. You definitely want to use your color spaces appropriately. And as always, try to avoid color state changes--you've got your blue text versus your black text; you know the story. The other thing has to do with transparency.
It doesn't matter what architecture you're on or how fast your GPU or CPU is: dealing with transparency incurs a certain cost. If an object needs to be transparent, then say it's transparent. If it doesn't need to be transparent--like it's very close to opaque--then say it's opaque.
Whenever we're dealing with transparency, it's a full read-modify-write of the destination. So if you want to increase your fill rate, for example--and you'll see this when you actually profile it yourself--opaque fills are much faster than transparent fills.
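The read-modify-write cost is easy to see in a toy model of a single 8-bit channel (a sketch, not Quartz internals): a source-over blend must read every destination pixel, while an opaque fill is a plain write:

```c
#include <string.h>

/* Sketch of why opaque fills are cheaper than transparent ones. */
typedef unsigned char u8;

/* Blend one 8-bit channel: src over dst with source alpha a (0..255),
   using a rounded fixed-point divide by 255. */
u8 blend_over(u8 src, u8 dst, u8 a) {
    return (u8)((src * a + dst * (255 - a) + 127) / 255);
}

/* Transparent fill: read-modify-write for every pixel. */
void fill_blended(u8 *dst, int n, u8 src, u8 alpha) {
    for (int i = 0; i < n; i++)
        dst[i] = blend_over(src, dst[i], alpha);
}

/* Opaque fill: never reads the destination at all. */
void fill_opaque(u8 *dst, int n, u8 src) {
    memset(dst, src, (size_t)n);
}
```

The opaque path touches each destination byte once and can use bulk writes; the blended path reads, multiplies, and writes back per pixel--which is why declaring nearly-opaque content as opaque pays off on any architecture.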
So I told you to minimize your state changes, and I told you to use color refs. But what happens if you have a gradient that is a slight transition from one alpha to the next? You can go and create separate objects, but that's somewhat wasteful in order to just change the alpha.
So you can use CGContextSetAlpha to get the same effect, and it's a much more lightweight state change. It has nothing to do with color or color matching; it only has to do with the transparency value. So if you use CGContextSetAlpha, you get a much faster method of doing alpha transitions.
OK, let's go on to patterns. Patterns are basically representations of tiled drawing. They're a lot faster than you going through and saying tile, tile, tile, tile, tile. Patterns are also cached by Quartz. What happens is that when you draw a pattern, regardless of what space you specified your pattern in, the representation that matters is the one that's actually going to physically hit your device. So you might have a pattern that's 1,000 by 1,000, but you're only rendering it into 100 by 100.
What we do is cache the 100 by 100 color-matched version. So now whenever you want to fill, if you reuse your pattern ref all the time, we look it up in our internal caches, find it, and say, hey, we've got something--we don't have to go call your pattern proc again and do color matching on the data.
So definitely you want to reuse your pattern refs--that's a big advantage. And the last point here is about tiling boundaries. In the previous section about clipping, we spoke about transforming your user space to device space; you probably want to do that in this case too, because you do want your patterns to hit pixel alignment. An easy way to do that is to set the appropriate tiling mode--the constant-spacing version--which allows your pattern cells to be spaced on pixel boundaries.
So now let's go into shadings. Shadings represent smooth transitions between colors, right? And they're much faster than you drawing each scan line yourself--imagine doing that picture by drawing little thin lines of color, or even stretching an image. So use shadings, and you get much faster color transitions.
Shadings are also cached by Quartz. In the same way that patterns are cached in a device-ready form--color matched and sized appropriately--shadings are similar. So when you reuse your shading refs, you end up picking up the cached version every time you redraw. The final point here has to do with function refs.
You should try to represent function refs efficiently, right? If your function is computing some sine curve or whatever, you might want to think of a more efficient way of generating your function samples.
And the other thing that you would probably want to do is that, you know, given that you have a shading which represents an actual shading, if they share function refs, you should probably try to reuse those function refs because your function ref samples are also cached. So if you have a shading, you can change the shading geometry any which way you want or even use different types of shadings. But if you reuse your functions, then you get much better performance.
So now we're going to talk about the big one: images. Images represent general bitmap data, color and alpha. Images are the biggest cached object that we have. We go through great lengths to get image caching right. Patterns are used, and developers use them frequently, but pretty much most user interfaces are built with images. Images and text, those are the big ones that we concentrate on all the time.
So a few important points about images. In certain situations you definitely want to consider the interpolation quality. If you have an image that's being downsampled, then depending on what you're doing, if you're in an interactive mode and you want things quick and dirty, you want to set the interpolation quality to low.
What will happen is we just do a point sample. Then, when you feel it's appropriate, you can set the interpolation quality to high, and we go through and do the full high-quality downsample to get the pristine results you'd like. So use the interpolation quality to your advantage. That's a good thing.
The second-to-last thing is also related to clips and to pattern boundaries: pixel alignment. You want to find pixel boundaries and draw on them. If you tell us to draw an image that's not on a pixel crack but halfway in, we go through great lengths to make sure you get those nice little anti-aliased edges on the border. So if you don't want that to happen, align your images to your destination pixels. The last thing is using alpha effectively.
As I said previously, alpha is an expensive operation; it causes a full read-modify-write. If you have image data that is really opaque, and you just happened to fill the whole alpha channel with 0xFF and you say it has alpha, guess what, you're going to get a performance penalty, because we're going to treat it as alpha.
We have to do a per-pixel operation to figure out whether or not each pixel needs to be blended with the destination. If your data is packed with an alpha byte, say you've got alpha plus RGB just because your rendering engine happened to produce that form, but it really is an opaque image, just say "alpha skip first," and then we say, oh, it's an opaque image, we can just blast it to the destination. So definitely use alpha effectively. So getting more into images: knowing your pixel format.
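As a sketch of that "is it really opaque?" check, here is a small pure-C scan over 32-bit ARGB pixels; if every alpha byte is 0xFF, you can safely declare the data as alpha-skipped (opaque) and avoid the per-pixel blend. The function name and layout are illustrative assumptions, not a CG API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: 32-bit ARGB pixels whose alpha byte is always 0xFF are
 * really opaque. Detecting that up front lets you declare the data as
 * "alpha skipped" so the renderer can copy pixels instead of doing a
 * per-pixel read-modify-write blend. */
static int argb_is_opaque(const uint8_t *pixels, size_t npixels) {
    for (size_t i = 0; i < npixels; i++)
        if (pixels[i * 4] != 0xFF)   /* alpha is the first byte of ARGB */
            return 0;
    return 1;
}
```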
This is a whole knowing-your-data-set thing. The first one is row bytes. Typically your row bytes are going to be aligned to the width of your image, and in the case of sub-byte pixels, rounded up to the nearest byte. You want to use an alignment that's appropriate.
Definitely you want to start off with at least byte alignment, rounded up to the nearest byte if you have sub-byte pixels. The second thing is you want to at least align to the pixel. If you've got RGB data that is eight bits per component, 32 bits total, aligning onto a four-byte boundary is probably a good thing. The next one up from there is vector alignment.
Vector alignment is also good. We have a lot of vectorized routines that would benefit greatly from not having to do shuffles all the time. So if you aligned your row bytes to be vector aligned, that would be a good thing. The third one is cache line alignment. That's a dubious one, because it depends on your architecture, but 32 and 64 bytes are good sizes.
Of course, there's wasted space on the side; you have to gauge whether that wasted space is worth the performance advantage. The next thing is component size. 8 bits, 16 bits, 32 bits, and floats, those are all good. Your 11-bit data, for example, is probably not going to go down the optimal path.
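The row-bytes padding tradeoff above is simple arithmetic, and can be sketched like this (a helper of my own, not a CG call): round each row up to the chosen alignment boundary, paying a few wasted bytes per row for better-aligned loads.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the row-bytes tradeoff: round each row up to an alignment
 * boundary (4 for pixel alignment, 16 for vector units, 32/64 for
 * cache lines). alignment must be a power of two. */
static size_t aligned_row_bytes(size_t width_px, size_t bytes_per_px,
                                size_t alignment) {
    size_t raw = width_px * bytes_per_px;
    return (raw + alignment - 1) & ~(alignment - 1);
}
```

A 101-pixel-wide, 4-bytes-per-pixel row is 404 bytes raw; vector-aligning it to 16 bytes costs 12 bytes of padding per row.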
So if you have data that is 11 bits, and you're only going to draw it once and rely on the image caching, then sure, fine, use your 11-bit data. But if you're going to draw it repeatedly, and you figure it's going to get evicted from the cache and redrawn often, you might want to move it into a 16-bit format, for example. That will be a much more optimal path. So those are component sizes. More about pixel format.
Premultiplied data. People always wonder, geez, do I premultiply my data or not? Well, that's a good question. If you have data you're working with, pixels you're physically manipulating in some rendering engine, and it's better for you to work with premultiplied alpha, then sure, do so. We have paths that will handle that.
But if you're not, don't try to premultiply the data for us, because we can do it; we don't need any help. And, as a second point, if you premultiply the data ahead of time to save us time, but we then have to color match it or interpolate it or something like that, we end up having to undo your premultiplication. So the best advice is: if you have data that's unpremultiplied, leave it unpremultiplied. If you have data you want to work with premultiplied, sure, that's fine, use premultiplication there. That's a very important point.
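One reason undoing your premultiplication is costly (and lossy) is easy to show with 8-bit arithmetic. This is an illustrative sketch of the math, not Quartz's actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: premultiplying quantizes the color channel, so undoing it
 * (which the renderer may have to do before color matching or
 * interpolation) cannot always recover the original value. */
static uint8_t premul(uint8_t c, uint8_t a) {
    return (uint8_t)((c * a + 127) / 255);
}
static uint8_t unpremul(uint8_t c, uint8_t a) {
    return a ? (uint8_t)((c * 255 + a / 2) / a) : 0;
}
```

At full alpha the round trip is exact, but at low alpha (say alpha = 10) a color value of 200 comes back as 204: precision has been thrown away.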
Okay, so now let's talk a little bit about knowing your endianness. We've all heard about the switch, so we're wondering how we're going to deal with that. Images are considered at the API level to be a byte stream, ordered most significant to least significant, repeated ad infinitum for every scanline.
We've added some new bitmap info constants that you can mask in when you create your objects. What they tell us is basically a swap unit. The swap unit is applied to the entire stream in sequence. There's no skipping; you have your image data, and we do swaps of the stream as we convert it into pixels.
So as developers, when you're running on the Intel platform, you probably want to be prepared for this kind of image data if you're asking for data or manipulating data. You can test what the bitmap format is by asking for the bitmap info, and that gives you the info field,
which you can mask off to say, geez, is it big endian, is it little endian, or is it host? We've supplied "host" here to say, I want it to be host byte order; I don't want to care. It's relative rather than absolute.
So you have those three classes of constants to use. So let's say you're on an Intel machine, and you have some data, and you want to apply this endian thing to go fast. Well, our suggestion is to only do this for things that really, really make sense.
Things like 24-bit RGB data swapped on a 32-bit boundary, that doesn't make a lot of sense, and there are probably not going to be many optimal paths to pick that kind of data up. So you probably want to stay away from stuff like that. But for things that are 32 bits per pixel, ARGB or even skip-RGB, you can definitely apply the 32-bit swap.
So the important things to know about whether you should apply this byte order swap flag: it has to do with, one, your component size; two, your pixel size; and three, your row bytes. The best way to figure out whether you should use these flags is to think of it as: the component size or the pixel size must be congruent with the swap unit you want.
So if it's 16-bit components, or 16 bits per pixel, yes, you can turn on 16-bit swapping. The same goes for row bytes; that's obvious, because you really don't want us swapping across scanline boundaries. So you have to use this with caution. But in general, these flags are really useful and allow you to go a lot faster, because we have a lot of very, very tuned paths for these cases.
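The rule of thumb above reduces to a couple of divisibility checks, sketched here in plain C (my own helper for illustration; the real decision is made by choosing the right bitmap-info constant):

```c
#include <assert.h>

/* Sketch of the congruence rule: a byte-order swap flag only makes
 * sense when the swap unit matches the component size or the pixel
 * size, and when it divides the row bytes evenly, so no swap ever
 * straddles a scanline boundary. */
static int swap_flag_ok(int swap_unit_bytes, int component_bytes,
                        int pixel_bytes, int row_bytes) {
    if (swap_unit_bytes != component_bytes && swap_unit_bytes != pixel_bytes)
        return 0;
    return row_bytes % swap_unit_bytes == 0;
}
```

A 32-bit swap over 8-bit-per-component ARGB (4 bytes per pixel) qualifies; a 32-bit swap over packed 24-bit RGB does not.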
So now we're going to move on to color spaces. Even though you know your pixel format, you want to know what color space to use. Well, our suggestion is that you should use the generic color spaces. We have generic color spaces for RGB, CMYK, and gray. You want to use those for your data. Which one do you use? The one with the least number of components for which you have data.
In my earlier example about colors, you don't want to say, oh, jeez, let's make it RGB when you have gray. Just use gray. So don't try to coerce your data into something else. If you've got data, we have color spaces that suit you, all sorts of RGB and gray.
And we have a lot of other color spaces, like the calibrated ones, that you can use too. So definitely avoid coercing your data. Now, other people sometimes say, jeez, I really want my images to go fast; I don't want to take this color conversion hit. How can I avoid it? Well, you can use the destination color space. If you say that your image is in the destination color space, we don't incur any color matching penalties.
But there's a catch: the data isn't color matched. It's not color correct, it's not color managed, so you've got to be cautious about that. If you want it to go fast, then fine. But if you want to create color-correct data, you probably don't want to do that; instead, you want to use a generic color space.
The last point: what if I don't know my color space? Where do I get it from? Well, you can get it from the bitmap context, or you can use the display profile. In the examples, I think in the worm example, there's a method for getting the display profile. That's what you'd use for window contexts.
Images and caching. Strange thing, but sometimes useful. You've got, say, a 16-megabyte TIFF file, and you really want to render a little thumbnail. A common technique we suggest to developers is to pre-render the data. How you'd do that is to create a bitmap context of the appropriate size and color space (remember, either destination or generic) and pixel format, alpha or not. If your image is rotated, you probably want alpha. If it's not rotated and it's rectilinear, it's going to fit right inside your destination.
You can create something that doesn't require alpha; that's also a performance speedup for you. So you create your destination, you draw your image into the bitmap context, then you ask the bitmap context for its image, and then you render that image instead. Don't render the terabyte image; render the little tiny thumbnail. So that's one way of caching.
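The pre-render trick can be modeled without the CG calls, as a sketch: point-sample the huge source once into a small thumbnail buffer, then draw only the thumbnail from then on. (With Quartz you would draw into a bitmap context and keep the image it hands back; this pure-C version just shows the shape of the work you do once.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: one-time point-sampled downscale of an 8-bit grayscale
 * buffer. src is sw x sh, dst is dw x dh; after this, you render dst
 * repeatedly instead of src. */
static void point_sample(const uint8_t *src, size_t sw, size_t sh,
                         uint8_t *dst, size_t dw, size_t dh) {
    for (size_t y = 0; y < dh; y++)
        for (size_t x = 0; x < dw; x++)
            dst[y * dw + x] = src[(y * sh / dh) * sw + (x * sw / dw)];
}
```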
The second way of caching is more advanced. It has to do with data providers. You've all seen these data provider things you have to create; in Image I/O, there are data sources. A data provider represents where you're getting your bits from, but it effectively decouples your image specification from the actual data source.
And because we cache images keyed off your image refs, you can create your source first (your data provider), then create your image wrapped around it, and then draw your image. And you say, OK, great, I want to hang on to my source; I don't want to evict all that data. So what I'm going to do is hang on to my source, but blow away my image.
At that point, we say, oh, you've destroyed the image, so we blow away our cached version. But you still have your data source. Now, there are lots of different techniques you can use to minimize the effects of hanging on to that data source. You can purge the data source when you see fit. Or you can pre-render it: for instance, the first time you use your image, you can pre-render it or pre-decompress it. Then when you draw your image, it's already pre-decompressed.
So there are lots of different ways you can use data providers to your advantage, decoupling the question of where you're getting your image data from from the image you're actually going to draw with. So you can definitely start using data providers in a more sophisticated fashion.
You can do your own caching, your own pre-decompression, lots of these different things. And as in the previous example about image caching, you can have a data provider sitting there for which you haven't physically allocated any memory yet.
It's only when the first request comes in that you say, oh, geez, let me actually render my little thumbnail for my two-terabyte image. And then you can ditch the image when you see fit. So that's another advanced technique for avoiding some of the penalties associated with large data.
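The lazy-provider idea can be sketched as a tiny pure-C object (the struct, function names, and the stand-in render callback are my own illustration, not the CG data provider API): nothing is allocated until the first request for bytes, and the cache can be purged whenever you see fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a lazy, purgeable data source. render() stands in for your
 * expensive decode or thumbnail pre-render. */
typedef struct {
    uint8_t *cache;                          /* NULL until first request */
    size_t   size;
    void   (*render)(uint8_t *dst, size_t size);
} LazyProvider;

static const uint8_t *lazy_bytes(LazyProvider *p) {
    if (!p->cache) {                         /* first request: decode now */
        p->cache = malloc(p->size);
        p->render(p->cache, p->size);
    }
    return p->cache;
}

static void lazy_purge(LazyProvider *p) {    /* evict when you see fit */
    free(p->cache);
    p->cache = NULL;
}

/* Toy render callback for demonstration. */
static void fill42(uint8_t *dst, size_t size) { memset(dst, 42, size); }
```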
[Transcript missing]
So in this case, this is the key code we're really looking at; everything else around it is just scaffolding. This runs at about 14 operations per second. Instead, let's move on: what if we just had CG do the caching for us, because all we're going to do is repeatedly draw that image?
And this is the second example. Here, we're just going to draw that image. Let's go back to our example and go on to the second case. We're up to about 1,000 operations per second. So in this case, CG is basically blasting through that data, because it's already cached.
What if, as Andrew mentioned earlier, we used a CG bitmap context to do the caching ourselves? There's quite a bit of code one has to come up with, all the details of creating the bitmap context. What we do is draw our original image into that bitmap context, and once we're done, we get our cached image from the bitmap context instead.
One other thing Andrew referred to earlier: for this bitmap context, we're actually creating a display RGB color space. Have a look at the code; it's very simple to get at the display's RGB color space. That's the one we're going to use. We're going to match directly to that space once and be done with it.
And finally, all we're really going to do in our test is draw the cached image instead. Let's see how long that took in operations per second. Roughly the same. Basically, CG is doing effectively the same level of caching as you would do yourself. So it's better for you to just hold onto your image refs.
Instead, what if we were to use a layer ref? This is my fourth example. With a layer ref, the API is very simple. You don't have to worry about the details of what type of bitmap context to create. Just get the context, create the layer from the context, and then draw the image into it. So it's very simple.
Performance-wise, roughly the same. So in all of those cases the code is much simpler, so definitely you want to use that instead. My next example is where you may have your own rendering engine: you're drawing to it repeatedly, and you want to get that image to the screen. I'm simulating that by drawing a random rectangle into my bitmap context; the bitmap context represents the offscreen bitmap I'm drawing into, and I want to draw it to the screen repeatedly, as fast as possible.
Basically, in this case, because that image is immutable, you do want to create a new image out of the context each time and then release it. If you were to hold on to the image, you would probably get the cached representation that was there already. So let's look at the performance of this one. These are our rectangles; we're roughly at around 287 operations per second.
Better yet, if you can draw everything through the CG API, and you do not need to touch the bits directly yourself, just use a layer ref instead. The reason is that we can do much, much better caching of it. So layer refs are the way to go; take advantage of them.
Image I/O, there have been several sessions on it already. It's very simple, so I'm not going to go into detail. Basically, in this case, I've got a test image that I'm loading up with Image I/O, using an image source ref, and just drawing the image.
My point here is, in this case, it's actually running at 23 operations per second. What's really going on is that the entire image is being decoded, color-matched, and downsampled, every time. Everything's being done at this point. Instead, what if we were to do some Image I/O decompression caching, so we could get rid of one of those steps? To do that, all we have to do is pass in an option when we create the image so that the decompressed data gets cached for us. I don't see it here, but it's probably in the create-image code. So let's look at that running.
And in this case, it's now 43, double the performance. What we've actually done is make the JPEG no longer be decompressed repeatedly. What we're seeing instead is the image being downsampled; I've actually scaled it so that we defeat CG's caching. So at this stage, I'd like to pass it back to Andrew. ANDREW BARNES: Thanks, Haroon.
So the key points: reuse your image refs. Another key point: switch from bitmap contexts to layers. Third point: use Image I/O where appropriate. Okay, so now we've told you how to draw your data into the backing store efficiently, and now we're going to talk about how to get your data from the backing store onto the screen efficiently.
So, graphics architecture slide, you've seen it before. As I said, we're concentrating not on the part that's drawing into the backing store, but we're concentrating more on the part where the Quartz compositor takes all the surfaces and all the backing stores, generates them together to produce the final output on the frame buffer.
So, flushing. A lot of developers just flush willy-nilly. You probably want to stop that. Make sure that what you flush is what the user needs to see, not whatever you just happened to draw at the moment. Users would prefer to see an atomic frame show up, "hey, here's your result," as opposed to things trickling onto the screen. So definitely, flush only what needs to be flushed. The second point about flushing is that if you flush faster than the refresh rate of the monitor, the user's not going to see it.
So don't flush faster than the refresh rate of the monitor. Try to flush in a more efficient way. If you don't, you're just going to consume tons of CPU getting that data onto the screen. So minimizing what you flush is a definite advantage.
So let's see a little bit about that. So we've got a little timeline where we're running our refresh going at VBL1, VBL2, VBL3. We got an application. It's green. It's all good to go. Application comes in at VBL1. Screen refreshes, and that's what you see on the screen.
Subsequently, before VBL2, the application draws A, consuming a chunk of CPU and GPU resources to do so. Then the application draws a little more, draws B, consuming even more CPU and GPU resources. But guess what VBL2 gets, what the user's going to see: B. So, let's try again. Do C, consume even more CPU. Do some more stuff. Lots of drawing and lots of flushing. And then finally it ends with Z.
That's a lot of stuff. Lots of CPU. But guess what the user is going to see: just Z. So wouldn't it be nice if your applications were a little more careful about what they flushed? Get rid of all of those intermediate flushes, and then look at the resources used to render those frames.
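The timeline above can be simulated in a few lines of C (an illustration of the argument, not any real compositor logic): however many flushes you issue, only the most recent flush before each VBL ever reaches the screen.

```c
#include <assert.h>

/* Sketch: given sorted flush timestamps (ms) and a VBL period, count
 * how many distinct flushes the user actually sees. Only the latest
 * flush before each VBL is displayed; everything else is wasted CPU. */
static int visible_flushes(const int *flush_ms, int n, int vbl_period_ms,
                           int total_ms) {
    int seen = 0, last_shown = -1;
    for (int vbl = vbl_period_ms; vbl <= total_ms; vbl += vbl_period_ms) {
        int latest = -1;
        for (int i = 0; i < n; i++)
            if (flush_ms[i] <= vbl)
                latest = i;              /* most recent flush before this VBL */
        if (latest >= 0 && latest != last_shown) {
            seen++;
            last_shown = latest;
        }
    }
    return seen;
}
```

Five flushes crammed into one 16 ms refresh interval produce exactly one visible frame; the other four bought nothing.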
So definitely, if you want to increase your application's performance, don't overflush. You want to use an optimal flushing mechanism. One suggestion, for animated content, is to use a timer. Have your timer run at the refresh rate of the monitor, or half the refresh rate. The second thing about your drawing is that the time it takes to draw a frame should be less than the refresh interval, because you don't want your timer backing up on itself.
So definitely use timers. Another very important point about drawing is that you really would like to decouple your visualization engine from your data engine. If your data engine is going off touching the network and churning through a bunch of files, you don't want the user penalized for that. You want the user to be able to resize and snap things around and do all this extra stuff whenever he wants to.
But your data can get rummaged on the disk, and you can do all that heavy lifting in the background. An example of that is TextEdit. When you open a really large document, TextEdit does its layout on a separate thread. The user can open the document instantaneously, and the layout continues on in the background.
So you can use these different techniques. Definitely for network and disk I/O, threads are the answer; sometimes timers are good. You definitely want to minimize your flushing in that way. So now I'm going to invite Haroon up again to show another demo, about drawing and flushing optimally.
OK, so this is a worm demo. So you want to pull that down if you've got access to it. But this was something that was discussed a few WWDCs ago. So I won't go into the details of it. But let me at least run the demo, so you know what's going on. But basically I'm just going to be playing a game, which is-- you get the point. You've seen these games before. It's a worm or snake going around.
And we're going to try and see how fast we can actually do this. When the game's playing, it's running at around six frames a second. And there are different ways of improving it. So in this case, in the four cases above, instead of the play, what we're doing is basically trying to run the game engine at about 1,000 frames a second. But we're only seeing about 200 frames. And then we go through a few optimizations.
[Transcript missing]
So let me move this out of the way. And notice, I'm actually running on Panther, and we'll talk about Tiger later. So the first optimization-- let me quickly skip through some of this stuff. But we're setting the view as opaque. In the second optimization, which I won't cover here, we're only drawing the change rectangle.
The third optimization that could be done is using NSLayoutManager to cache the layout. The fourth one is what we're really interested in: decoupling the display timer from the engine timer. So this is where I wanted to point out that we're running the engine at about 1,000 frames a second.
What we're really doing in this case is just adding a new timer in here. The timer is actually going to run at 30 frames a second, which was one of the recommendations I made. And then the timer, on its callback, all it's going to do is actually end up calling the fire update.
Well, let's look at fire update. Fire update does nothing more than call set-needs-display-in-rect. And given that we're going to be drawing periodically, we may want to accumulate a dirty region instead. Have a look at the code, play with it; all it really does is update the dirty rectangle. So finally, when you're done, you just stop the timer. So back to Andrew.
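The dirty-region accumulation mentioned above can be sketched as a running union of change rectangles (my own struct for illustration): each engine tick unions its change rect into a pending dirty rect, and the display timer flushes that union once, then resets it.

```c
#include <assert.h>

/* Sketch: accumulate change rectangles between display-timer ticks.
 * The timer callback flushes the union once and marks it empty again,
 * instead of flushing per change. */
typedef struct { int x0, y0, x1, y1; int empty; } DirtyRect;

static void dirty_add(DirtyRect *d, int x0, int y0, int x1, int y1) {
    if (d->empty) {
        d->x0 = x0; d->y0 = y0; d->x1 = x1; d->y1 = y1;
        d->empty = 0;
        return;
    }
    if (x0 < d->x0) d->x0 = x0;   /* grow the union to cover the new rect */
    if (y0 < d->y0) d->y0 = y0;
    if (x1 > d->x1) d->x1 = x1;
    if (y1 > d->y1) d->y1 = y1;
}
```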
ANDREW BARNES: Thanks, Haroon. So I hope you guys caught the CPU meter: as soon as he switched to the good version, the CPU meter dropped. That's what I was getting at, consuming resources in order to flush. So now let's move on to updates.
Every flush you do causes an update to happen on the screen. So we go back to our little diagram, but we're going to change it up a little bit, and what we're going to do is track the CPU required to produce the next frame for the next VBL.
So application one comes in, and let's say he did exactly what Haroon did and minimized his drawing and flushing at 30 hertz. He comes in and starts to draw, causing some CPU utilization to generate the frames for the next VBL. Application two comes in, draws some more, consumes some more CPU. And it's easy to see that after N applications, guess what, we're already maxed out.
So even though you've done your work and gone off and optimized things so that you don't flush all the time, there are other applications in the system that can be running at the same time that will still get us into this situation. So I'm going to demonstrate exactly what happens when these applications are all running. Switch to demo machine one.
Okay, so here we have, you know, a little status updater. Basically what it's doing, it's pretending as if it's copying files, let's say. But it does insist on printing every string that needs to be shown. So what we're going to do is we're going to start these guys.
So the number really should be exactly that number, 480, because that's the rate at which the simulation is happening. So as we start, he's basically saying that every time he needs to copy a file, or do an iteration, he prints a string. So now we start another guy.
And now we start another guy. Oops. And right away, it's not taking long before we start cutting our rate in two. So basically, our data engine is running at exactly the same speed as the visualization engine. But guess what? Our data engine is being slowed down by the visualization engine. So what you really would like to do is not do this.
You would really want to say, well, do we really need to be printing all these strings all the time? I mean, can you see those strings? Probably not. Or you can see a blur, I guess. So I've got a little version here set up that does things in a different way.
[Transcript missing]
We're still copying at 480. So don't let your visual engine slow down your data engine. That's one very good point. So let's go back to the slides a little bit. Even though you've drawn optimally, and we told you about the situation, we can still run into cases where, because multiple applications are running, we're still banging into each other: you and you and you and you all want to flush. You're each doing it optimally, but you can still see the CPU is going to get taxed. So what we did for Tiger is we coalesce all these updates.
What happens is that your applications draw into their respective backing stores, and that data is then composited together, coalesced into one atomic frame, and presented to the user at once. So basically, all the applications' flushes are coalesced together, and they all get sent out in one frame.
This type of strategy makes much better use of the GPU and CPU. We don't actually obey your flush right now; we schedule it for the next VBL. So your flush, and your flush, and your flush all get scheduled together. And you get the point.
But let's look at it. Let's go back to our familiar timeline, tracking the CPU taxed for actually sending out a frame for the next VBL. So let's say application one starts to flush, but he doesn't really get flushed right away. We schedule him for the next VBL.
So that's what happens, and the CPU usage to generate that VBL frame goes up, obviously. But the second application that comes in and flushes sometime later, or before, or whenever, his flush also gets scheduled for the next VBL. But the CPU meter hasn't gone up. We haven't spent any more time generating that frame. And you can see, after N applications, there's no more extra CPU being used.
So we only end up spending a certain amount of CPU just to render that frame, and we're not at the mercy of multiple applications each generating and flushing. So now I'm going to show you a little bit of this in action, switching to demo machine two.
So now we have, from Haroon's example, the status updaters up there, two bad ones and one good one. And I've got worm, which is Haroon's example: one good worm, the one he showed you last, and the bad worm, which I've colored red. So the fish are swimming, hopping along. And then, let's say we wanted to play a movie.
So we play a movie. The movie's being played, and it's utilizing a certain amount of resources to generate its frames. The movie is pretty taxing, so it's actually using CPU. And what happens now, as you can see, is that the fish get kind of chunky. But when we switch to Tiger, with those coalesced update optimizations put in, we get this instead.
The fish are swimming a lot faster, and there's less CPU being used to generate the frames for the next VBL. Everybody gets scheduled together. So there's a definite advantage for your user: your entire user interface does not feel bogged down because it has multiple things going on. Not that you'd really be running Atlantis and playing a movie and copying files at the same time, but you get the point.
So now we're going to go back to the slides. So, coalesced updates. We told you that your application updates are being scheduled for the next VBL. That's exactly what's going to happen. But there's an important, interesting point to note. When your application draws into a particular area and flushes it, you'll be blocked from drawing on that area until what you previously flushed gets committed to the display. So there's a hitch. Let's look at what that means. Applications typically go through a drawing cycle: they draw and they flush. So the application draws for a particular period of time, then it issues a flush. That flush gets scheduled for the next VBL.
During that time, the application will be blocked from drawing into its backing store until the data actually gets committed to the display. Similarly, another application might start drawing a little later and flush a little later; it also gets scheduled, but its waiting time is a little shorter. The basic point is that any time you need to update, try to do it at half the refresh rate, or at most the refresh rate of the monitor. That way you minimize your drawing.
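To make that idea concrete, here's a minimal sketch (not from the session; all names are hypothetical) of throttling flushes to the display's refresh interval, so repeated drawing never issues more than one flush per frame:

```c
#include <stdbool.h>

/* Hypothetical sketch: throttle flushes to at most one per refresh
 * interval, so the app never blocks waiting on a pending flush.
 * Times are in milliseconds; 16 ms approximates a 60 Hz display. */
typedef struct {
    long last_flush_ms;   /* time of the most recent flush */
    long interval_ms;     /* minimum spacing between flushes */
    int  flush_count;     /* how many flushes actually happened */
    bool dirty;           /* drawing accumulated since last flush */
} FlushThrottle;

void mark_dirty(FlushThrottle *t) { t->dirty = true; }

/* Call this as often as you like; it only flushes when enough
 * time has passed AND there is something new to show. */
bool maybe_flush(FlushThrottle *t, long now_ms) {
    if (!t->dirty || now_ms - t->last_flush_ms < t->interval_ms)
        return false;
    t->last_flush_ms = now_ms;  /* a real app would flush its window buffer here */
    t->dirty = false;
    t->flush_count++;
    return true;
}
```

With this in place, you can draw and mark yourself dirty as often as you like, and the actual flush traffic stays bounded by the refresh rate.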
But to be more specific, you'd obviously like to avoid those blocks. How do you avoid blocking your application? Well, the first thing, as we said before, is to minimize your flushing; that's the one habit you have to stop. Use timers, or use threads, whatever works for you. The second thing, when you do happen to draw, is to do your useful work before you start drawing the next frame. So if you've got lots of layout to do, do the layout first, before you start drawing your text.
Don't lay out and draw at the same time. That way you have ample time after your previous flush to do lots of useful work: do layout, whatever you need, and then start generating the next frame. So, another important point: scrolling.
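As a sketch of that separation (hypothetical names, not the session's code), cache the layout result and recompute it only when the content changes, so the per-frame draw path just reuses it:

```c
/* Hypothetical sketch: layout happens once per content change,
 * not once per frame. "Layout" here is just counting lines, as a
 * stand-in for real text layout. */
typedef struct {
    const char *text;
    int line_count;     /* cached layout result */
    int layout_runs;    /* how many times layout actually ran */
    int needs_layout;
} TextCache;

void set_text(TextCache *c, const char *text) {
    c->text = text;
    c->needs_layout = 1;    /* invalidate the cached layout */
}

int draw(TextCache *c) {    /* returns the number of lines drawn */
    if (c->needs_layout) {  /* lay out only when content changed */
        c->line_count = 1;
        for (const char *p = c->text; *p; p++)
            if (*p == '\n') c->line_count++;
        c->layout_runs++;
        c->needs_layout = 0;
    }
    return c->line_count;   /* real code would render the cached lines */
}
```

Drawing ten frames of unchanged text then costs one layout pass, not ten.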
Scrolling is an interesting thing; it might need special treatment. With scroll buttons, as the user holds the button down, developers typically say: OK, great, I've got another button event, let's advance by 10. I've got another button event, let's advance by 10. What you want to do instead is base it on timing. You say: OK, I've got a button event, and I've advanced by a certain amount. Then the next time you come around, you realize you shouldn't move by another 10; by now you should really have moved by 50. So you issue one scroll operation that moves by 50. Those are a number of things you can do to alleviate the problem.
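A minimal sketch of that coalescing (hypothetical names; the real event and redraw plumbing belongs to your framework): accumulate the pending distance as events arrive, then apply it all in one scroll at the next redraw:

```c
/* Hypothetical sketch: coalesce repeated scroll-button events.
 * Instead of one 10-unit scroll per event, accumulate the pending
 * distance and apply the whole amount in the next redraw. */
typedef struct {
    int pending;    /* units accumulated since the last redraw */
    int position;   /* current scroll offset */
    int redraws;    /* how many scroll redraws actually happened */
} ScrollState;

void on_scroll_event(ScrollState *s, int units) {
    s->pending += units;        /* just accumulate; no drawing here */
}

void redraw_if_needed(ScrollState *s) {
    if (s->pending == 0) return;
    s->position += s->pending;  /* one scroll covering everything owed */
    s->pending = 0;
    s->redraws++;
}
```

Five queued button events become one 50-unit scroll and one redraw, instead of five separate scroll-and-draw passes.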
So, how do you detect the problem? Quartz Debug is there for you to use, and you also want to use Shark on your samples. There's a special option called Time All Thread States; use that. What it does is track not only the time spent in your application, but time globally, system-wide. That's how you pick up blocking. There's a Shark session coming up later today where you'll see more. But anyway, the crux is that there are some difficulties involved, and there will be a technote coming up that talks more about avoiding the penalties associated with over-flushing, and about optimizing your flushing so things work better. So now we're going to talk a little bit about application graphics optimizations. Basically, I'd like to share this quote with you.
"Premature optimization is the root of all evil." It's a very nice quote; I like it. Don't optimize before you figure out you need to optimize. So, basic discipline: profile your application, optimize your application, rinse and repeat, ad infinitum, keep on going. Re-examine your architectural decisions along the way. That's the basic discipline.
Tools to use: Shark. Shark's your friend, right? There's a related session coming up at 2:00 PM; go to that session. What do you look for in Shark? Anything to do with graphics: CoreGraphics, filling, things like that. The second thing to look for is flushing. Well, you won't actually see the flush itself; the flush is completely asynchronous, so when you issue it, it won't show up in your profile. What will show up in your profile is the next drawing operation after you flushed, blocked waiting on it.
So that's what you'd see in the profile. The last thing to look for in Shark is anything to do with locking and synchronization; that basically means contention. If you see it, try to avoid it. Another tool to use is Quartz Debug. There are lots of cool things in Quartz Debug. Auto-flushing shows what's being drawn, and how frequently or redundantly it's being drawn. Flashing screen updates shows you what's being flushed. And flashing identical updates lets you figure out what's being drawn that may not need to be drawn: if you see the same white rectangle flashing all the time, that's probably something you didn't need to flush or draw in the first place. There are also tools for high DPI, frame rates, and things like that. So let's consider one last section, about live resizing.
As I said, the user wants to interact with the application, and they want it to behave in a certain way. What you really want to do is separate your data engine from your visualization engine. You might sometimes render lower-quality results; you don't have to draw at high interpolation quality every time you resize. And text layout: do your text layout ahead of time, or only every so often, or when the application has finished resizing. Network and data access: avoid them; don't do them at all during a resize. If you can get away with resizing your application without having to touch the network, your application will benefit greatly.
So for live resizing in Carbon, there are a bunch of things involved, but the general rule is: only draw or invalidate the newly exposed area. Don't draw too much. The way you do that is to diff the previous bounds and the current bounds that come in the event.
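To sketch that diffing (a simplified illustration, not the session's code; it assumes the origin stays fixed, whereas a real app would use proper region math): growing a window exposes at most a strip on the right and a strip along the bottom:

```c
/* Hypothetical sketch: when a window grows, only the L-shaped newly
 * exposed area needs drawing. Diffing the old and new bounds yields
 * at most two rects. */
typedef struct { int x, y, w, h; } Rect;

/* Writes the newly exposed rects (0-2 of them) to out[] and returns
 * how many there are. Assumes the origin is fixed. */
int exposed_rects(Rect oldb, Rect newb, Rect out[2]) {
    int n = 0;
    if (newb.w > oldb.w) {   /* newly exposed column on the right */
        out[n].x = oldb.w; out[n].y = 0;
        out[n].w = newb.w - oldb.w; out[n].h = newb.h;
        n++;
    }
    if (newb.h > oldb.h) {   /* newly exposed row along the bottom */
        out[n].x = 0; out[n].y = oldb.h;
        out[n].w = oldb.w; out[n].h = newb.h - oldb.h;
        n++;
    }
    return n;
}
```

Everything inside the old bounds is untouched, so it never gets redrawn during the resize.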
And for HIViews that are compositing views, use invalidation to minimize what you're invalidating. Notifications are available for when resizing has started and when it has stopped; you can use those to say, OK, great, I'll create my low-quality image, or I'll do my layout, whatever. So you get the notifications.
In Cocoa, it's a similar set of rules. Only draw the newly exposed regions; you can find those by calling getRectsExposedDuringLiveResize, which gives you back only the exposed regions. You can also check inLiveResize to find out that you're actually in a live resize, so maybe you don't need to render your high-quality image. The other option the kit provides is preserving content: you can specify that your view preserves its content during a resize, so you don't have to redraw everything, only the newly exposed data. And notifications are available in the kit, too.
So for more information, there's lots of sample code and resources at that URL, plus other related sessions. Swimming with the Sharks: you definitely want to go to that; Shark's your friend. And who to contact? Travis. Everybody knows Travis; he's cool. And basically, that's the material that's been presented; I'm sure you've seen some of the other graphics presentations as well.