Graphics & Imaging Performance Tuning - WWDC 2002

Digital Media • 1:00:35

Discover techniques to ensure your application gets the most out of the incredible graphics architecture in Mac OS X. This session focuses on reaching optimum screen drawing performance and also explores techniques to optimize printing performance.

Speakers: Haroon Sheikh, Ralph Brunner, Joseph Maurer

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good afternoon, everyone. Welcome to session 516, Graphics & Imaging Performance Tuning. The reason we're having this session this year is that I work in developer relations, and I get a lot of email from developers. It's part of my job. And a week, or practically a day, doesn't go by that I don't get an email from a developer who goes into excruciatingly great detail on how they can draw lines faster in QuickDraw under Mac OS 9 than they can on Mac OS X. A lot of that has to do with the new visual architecture in Mac OS X and the fact that it's a different visual pipeline.

And we spent a lot of time at this year's WWDC addressing that issue with things such as Quartz Extreme, and even optimizations in the software Quartz compositor that runs if you don't have a system capable of supporting Quartz Extreme. But the key thing is there's a two-way street in this.

It's that since the visual pipeline is different, there are optimizations that developers need to do to leverage that pipeline appropriately. And that's what we're going to be talking about in this session. We're going to talk about the tricks and secrets to unlock some of the performance you feel might be trapped in your applications on Mac OS X. So to start the presentation, I'd like to invite Haroon Sheikh, manager of the Quartz engineering department, to the stage.

Thanks, Travis. What are we going to cover today? Primarily, we're going to be looking at performance tuning from the perspective of the Quartz compositor, as Travis mentioned, but we'll also look at QuickDraw performance on Mac OS X for Carbon applications. We'll have a few demos, and for the rest of the talk, we'll spend some time looking at tips and recommendations from us to you on QuickDraw and Quartz 2D.

So since Mac OS X 10.0, performance has been an issue. 10.1 addressed many of those issues. Jaguar continues to address many of the others. What I'll be doing through the rest of the talk is punctuating the slides with what we've done in system-level optimizations and with recommendations for you. Some of the content of the talk has actually come from developers like you giving performance feedback on some of the things they've been struggling with. So your feedback has always been important, and going forward, let us know if you still have performance concerns.

We're going to be talking about the Quartz compositor today, Quartz 2D, and also QuickDraw, which is part of Carbon. So this is part of the architecture diagram. Hopefully by now you know what the Quartz compositor is. It's responsible for all your screen display. It takes content from multiple applications and sends that over to the frame buffer for display. This year we've introduced Quartz Extreme, which uses hardware acceleration to perform the composite operation on the GPU itself. So it frees up a lot of the CPU for other processing.

People have asked, you know, we don't have access to the screen, give it to us, or we want to just get to the screen as fast as possible. You now understand why we've actually done this with the new architecture. And we've accomplished this primarily because we've had this window backing store abstraction, so that you draw into your backing store in the windowing system on Mac OS X. And the Quartz compositor is responsible for taking that content and flushing it to the screen.

So this is the architecture slide for the software compositor. You've got applications that are drawing into some form of a buffer, be it the window backing store or surfaces. The Quartz compositor takes the content from those buffers and composites them and sends them off to the frame buffer.

The buffers themselves are a piece of memory that is shared between the application and the Quartz compositor. So in order for an application to draw into that buffer, it usually calls Quartz 2D or QuickDraw. And when Quartz 2D or QuickDraw draws into that buffer, it has to lock down the bits, draw into them, unlock the bits, and then call the flush operation. Then the compositor takes over and performs its compositing magic.

For Jaguar, we've actually introduced Quartz Extreme, which you've heard about already. Quartz Extreme performs the compositing with hardware acceleration. It's taking the same buffer content and pushing that to the frame buffer, letting the GPU pick the bits up from the window backing stores. You'll notice that, compared to the software compositor, there is hardware acceleration being used here. That's not to say the software compositor does not use any hardware acceleration. It does, for example when you're doing opaque window drags. So hardware acceleration is not exclusive to Quartz Extreme.

So the first recommendation we normally come up with is: avoid the extra buffer. People are used to drawing into yet another off-screen buffer, then taking those contents and performing a CopyBits operation into the window backing store. Avoid that wherever possible. Once again, buffers cost extra memory, and the CopyBits operation is yet another step. So wherever possible, try to avoid it.

You've also got off-screen windows in your application. These are windows that have been ordered out. You want to avoid them, primarily, once again, because these buffers will consume memory. What we recommend is always use one-shot windows so that all your off-screen windows will have disappeared. The buffer contents will disappear in the sense that when you order a window on screen, you draw into it.

Now when you order it out because you've minimized or hidden the application window, if you're using one-shot windows, the buffer content disappears. When those windows are ordered back in, that's when you'll get a window repaint message. And that's usually much, much faster than paging the backing store memory in off of disk. So unless you've got content in your buffers that takes a really long time for you to draw, always use one-shot windows. Carbon and Cocoa do this differently, so look at how they recommend you do this, but use one-shot windows wherever possible.

Another common mistake that developers end up making is this: you create a window, you then look at the preference file somewhere to determine what the size of the window should be, and you resize the window. Once again, these are just redundant operations in the system. You're putting up a window that is going to be destroyed anyway, so make sure you specify the window size appropriately up front. In order for us to continue maintaining the backing store abstraction, we're asking people not to assume that the backing store will be in system memory. It can move. This will allow us to add further performance optimizations to the system in the future.

Now, flushing is the operation that actually takes your content from the window back buffer and sends it off to the frame buffer. Flushing can be done either implicitly or explicitly. Implicitly, it's done by Carbon or Cocoa: when you've done your drawing, at the end of the update, Carbon or Cocoa will send off a flush request on your behalf to flush all the dirty regions that have been accumulated so far. You may also choose to do your own flushing. For that you can use QuickDraw's QDFlushPortBuffer, CGContextFlush, or even AppKit's flushWindow.

One thing to keep in mind about flushing is that flushing is asynchronous. What that means is that when the application executes the flush command, the Quartz compositor is responsible for taking those bits and sending them off. You are free to go and do whatever you want. We'll have a demo showing how you can take advantage of asynchronous flushing.

The flush operation does not block until it's completed. It's only when you lock down the bits again, whenever you issue yet another drawing command, that you can block: if the flush has not completed by then, the lock will block until the previous flush has completed. So take advantage of that. Make sure you do not execute any redundant flushing. Delay your drawing until you need to, and keep it together as much as possible.

Now, as has been mentioned before, the fastest way to get your bits to the screen or to the frame buffer is to draw into your window backing store and execute that flush command. The Quartz compositor will take all opportunities on the system to send that over the bus and send that to the frame buffer as fast as possible. Now, with Quartz Extreme, we're using AGP transfer. Those backing stores are textured through OpenGL. It's being pulled off the system memory as fast as possible.

Another thing to keep in mind about flushing is that all flushing is actually beam synced. That's primarily to give you that nice visual presentation that Mac OS X has, where everything appears on the screen atomically. There's no window tearing. On CRTs, you've got a refresh rate of anywhere from 60 to 120 hertz depending on your monitor. But keep in mind, LCD panels also have a screen update frequency, and they are typically at 60 hertz. If you're trying to flush anything faster than 60 hertz, your eye is not going to see it.

Those are just redundant operations, so try and avoid those. Try and shoot for a refresh rate of anywhere from 30 to 60 frames per second. Those are reasonable rates for your application. If you're not getting those type of rates, profile, try and find out what's going on in the system.
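As a rough illustration of that advice, the sketch below coalesces flush requests so that an application never asks for more flushes than the display can show. It is a toy model under stated assumptions, not the Quartz compositor's behavior: `FlushThrottle`, `count_flushes`, and the integer microsecond clock are all hypothetical names invented for this example.

```python
class FlushThrottle:
    """Hypothetical helper: allow at most one flush per display refresh."""

    def __init__(self, refresh_hz=60):
        # Integer microseconds avoid floating-point rounding in comparisons.
        self.min_interval_us = 1_000_000 // refresh_hz
        self.last_flush_us = None

    def should_flush(self, now_us):
        """Return True if a flush issued at now_us would not be redundant."""
        if (self.last_flush_us is None
                or now_us - self.last_flush_us >= self.min_interval_us):
            self.last_flush_us = now_us
            return True
        return False  # too soon: keep drawing, flush on a later frame


def count_flushes(draw_hz, refresh_hz=60):
    """Simulate one second of drawing at draw_hz and count issued flushes."""
    throttle = FlushThrottle(refresh_hz)
    timestamps = (i * 1_000_000 // draw_hz for i in range(draw_hz))
    return sum(throttle.should_flush(t) for t in timestamps)


if __name__ == "__main__":
    print(count_flushes(240))  # drawing four times faster than the display
    print(count_flushes(30))   # drawing slower than the display
```

In this model, an application that draws 240 times a second still issues only 60 flushes, while one that draws 30 times a second flushes every frame.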

A new feature for Jaguar is scroll acceleration. Usually the slowest part is the memory transfer from main memory to video memory. With Quartz Extreme, it's actually faster, but the more you can minimize the amount of data transferred, the better. So for Jaguar, we've actually accelerated scrolling. We do that by sending across the bus only the new region that's been updated. At the lowest level, this operation is three times faster. You may not necessarily see a factor of three in your application, because there's usually lots more going on.

So ScrollRect, scroll views, and any imaging that you do through CG will have roughly a 30% improvement, depending on how much area is being scrolled. The recommendation here is to use ScrollRect wherever possible. We'll show in the demo that there are cases where ScrollRect or scroll views are not being used appropriately, and so scroll acceleration is not hooked up.

In Jaguar, if you're copying from a window back buffer into the same window back buffer using Core Graphics or Quartz, this will also take advantage of this optimization. QuickDraw currently doesn't, but for Jaguar we will make sure that it will also take advantage of this. Okay.

You're also familiar with overlays, but in case you're not, overlays are nothing more than a series of layers of content. In this example, you've got a transparent window with some controls that you can compose on top of your existing content. With Quartz Extreme, we've made overlays accelerated, in the sense that the transparency tax that was present with the Quartz compositor is no longer a tax, primarily because all of the compositing is being done on the GPU. So feel free to take advantage of overlays, and come up with nice applications that use them. In Carbon, you can use overlays through the Carbon overlay window class. In Cocoa, you can do the same thing with a different approach.

You can always fall back to Quartz to clear the window so that all the alpha is set to zero, and from that point onwards you can start drawing on top of it, and you effectively have an overlay on your window. Keep in mind that Quartz respects alpha, while QuickDraw is generally alpha agnostic. So anything you draw into your transparent window through QuickDraw will set the alpha of the pixels it touches to FF when it's drawing into the window buffer.

Yet another feature for Jaguar is backing store compression. You've heard about it already in a previous session. All it is, is that you've got windows that are generally inactive in the system. After a delay of 5 to 10 seconds, those windows will become compressed, and the Quartz compositor takes care of that for you.

They get decompressed only when you write into that backing store, so try to avoid redundant drawing into your backing store when it's not necessary. You don't want to compress and decompress needlessly, but the system is pretty smart about that. On average, we get a compression ratio of anywhere from 3:1 to 4:1, and it really depends on the window contents that you've got on your screen.

One of the reasons it's a benefit is that the Quartz compositor can composite directly from the compressed data, only decompressing what's necessary. The backing store remains compressed unless you're drawing into it. So this is a system-level optimization that does not require any changes on your part. It should be completely transparent to you.

Other areas of performance tuning are related to the use of the Velocity Engine. For Jaguar, we've got some more graphics operations that are now taking advantage of it. The software Quartz compositor is now fully vectorized, so it's not as if there hasn't been any performance tuning on the software compositor. There are also blitters in the system that are now taking advantage of the Velocity Engine.

The example I show is QuickDraw CopyBits when you're copying into a window backing store. That's not done yet, but we're looking at doing that shortly. Backing store compression is also taking advantage of vectorized code. This is a simple recommendation: if you happen to allocate your own bitmaps, make sure that the row bytes you specify are divisible by 16, as that will allow more optimizations to kick in on the Velocity Engine. And now I'd like to bring up Ralph Brunner to talk about Quartz Debug.
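The row bytes recommendation comes down to rounding each row up to the next multiple of 16 bytes. A minimal sketch of that arithmetic follows; the function name and the default of 4 bytes per pixel are assumptions made for illustration, not part of any Apple API.

```python
def aligned_row_bytes(width_pixels, bytes_per_pixel=4, alignment=16):
    """Round the byte length of one bitmap row up to a multiple of `alignment`."""
    unpadded = width_pixels * bytes_per_pixel
    return (unpadded + alignment - 1) // alignment * alignment


if __name__ == "__main__":
    for width in (100, 101, 1021):
        print(width, aligned_row_bytes(width))
```

A 100-pixel-wide, 32-bit bitmap already has 400-byte rows, which is divisible by 16. A 101-pixel-wide one would need its 404-byte rows padded to 416.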

Okay, I'm going to do a few demos here. The first thing I'd like to show is the asynchronous flushing that Haroon was mentioning. So I have a little app here that, when it's started, does quite a bit of floating-point computation and just renders frames from the Mandelbrot set.

The first thing I'd like to show is what happens if you don't make use of asynchronous flushing. By the way, in the bottom right corner we have this little speedometer, which essentially is a frame rate counter. It measures the number of frames that the Quartz compositor pumps out, so it works for everything in the system, even the Dock. So, without asynchronous flushing, you can see the needle peaks at 40, 45 frames per second. And from then on it will drop slowly because the computation gets more expensive.

The same thing with asynchronous flushing. And we're peaking at 55, 58 or so. In both modes, there's a timer firing 60 times a second, because the goal is to produce 60 frames per second. At the beginning we do some computation to render the next frame, and then we draw it.

And this is quite a natural way to do things, and it exploits asynchronous flushing without any additional work, because during the first part, where you actually do the computation, the previous frame can be flushed to the screen. So when I disable this little checkbox here to defeat asynchronous flushing, the only thing I do is, right before I do the computation, I draw a single pixel in one of these windows. That causes us to hit that lock and wait until the flush has happened, and only then do we start the computation. And as we see, this really can make a 15 frames per second difference. Okay.
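The effect in this demo can be approximated with a simple timing model. In the sketch below, drawing must wait for the previous flush, but computation can overlap with it. The function name and the millisecond figures are hypothetical values picked only to land near the frame rates seen in the demo; they are not measurements.

```python
def frames_per_second(compute_ms, draw_ms, flush_ms, overlap, timer_hz=60.0):
    """Toy model of one frame: compute, then draw, then flush.

    overlap=True:  computation for the next frame runs while the previous
                   flush is still in flight; only drawing waits for it.
    overlap=False: a pixel is touched before computing, so the frame first
                   blocks until the previous flush has completed.
    """
    if overlap:
        period_ms = max(compute_ms, flush_ms) + draw_ms
    else:
        period_ms = flush_ms + compute_ms + draw_ms
    # The timer fires timer_hz times a second, so that is the ceiling.
    return min(timer_hz, 1000.0 / period_ms)


if __name__ == "__main__":
    # Hypothetical costs: 12 ms compute, 4 ms draw, 6 ms flush.
    print(frames_per_second(12, 4, 6, overlap=False))
    print(frames_per_second(12, 4, 6, overlap=True))
```

With these assumed costs, the serialized case lands around 45 frames per second and the overlapped case reaches the 60 Hz timer limit, which is the same shape of difference the speedometer shows.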

The next thing I'd like to show is Quartz Debug, which is a tool that was in 10.1 and is slightly improved for Jaguar. What it does is turn certain debug flags on in the Quartz compositor. So for example, I can select Flash Screen Updates, and from then on, whenever an application flushes, the area that gets flushed flashes in yellow first for a short period of time. And so you can see what kind of drawing is going on.

This is a good thing to find out if you flush certain things multiple times. If you thought you didn't do any drawing in a certain area and it flashes, then it's probably an indication that some part of your code is executed there where you probably didn't want it to execute.

So, to illustrate that, we can look at scroll acceleration in Jaguar. When I scroll in TextEdit, we see that only the bottom part, the part that needs to be redrawn from one frame to the next, is actually redrawn. The part in the middle just gets the hardware acceleration. For comparison, if I look at Project Builder, Project Builder actually flushes everything. So we're going to fix that for the final release.

Okay. So, another flag you have in Quartz Debug is Autoflush Drawing, which means that whenever a primitive is drawn, we immediately call flush. That simulates something like Mac OS 9 style drawing, where you see drawing trickle onto the screen. So when I go and change stuff here, you see everything flickers, because everything is drawn and immediately shown. Most interesting, this is... that sounded like Yoda. Sorry. You can see the most when you turn both of these on: you say Autoflush Drawing, and you say flash in yellow whenever we flush. So you can switch to an application.

And then, for example, you can see how the menu is drawn, or how individual text lines are drawn, and stuff like that. Now, because the entire system goes into that mode, things tend to be really, really slow. What you can do about that is use this little hotkey here, Control-Option-Command-T. That way you can turn it on only for the drawing you're interested in. Usually you're not optimizing the menu drawing, so that's kind of helpful.

There's also a really mean switch up here, which is called Flash Identical Updates. Quartz then does quite a bit of work: the area that gets flushed is compared, before it's flushed, against what's already on the screen, and whichever pixels haven't changed flash in red. So if you do drawing that is redundant, everything shows up in red. As an example, I can see that the caret here has a little red border around it, because these bits get flushed.

Now, these of course are a really small area and we don't really care about it, but the interesting thing is if I move the cursor around, we see that that little bar up there is redrawn for every cursor movement, and that's, for example, one of the things you should look out for and optimize.
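The comparison behind Flash Identical Updates can be pictured as follows. This is only an illustration of the idea, assuming pixel buffers stored as lists of rows and half-open rectangles; it says nothing about how Quartz Debug is actually implemented, and `redundant_pixels` is a hypothetical name.

```python
def redundant_pixels(screen, buffer, rect):
    """Return the (x, y) pixels inside rect that a flush would not change.

    screen and buffer are lists of rows of pixel values; rect is
    (left, top, right, bottom) with right and bottom exclusive.
    """
    left, top, right, bottom = rect
    return [(x, y)
            for y in range(top, bottom)
            for x in range(left, right)
            if screen[y][x] == buffer[y][x]]


if __name__ == "__main__":
    screen = [[0] * 4 for _ in range(3)]
    buffer = [row[:] for row in screen]
    buffer[1][2] = 255                      # the only pixel that really changed
    flushed = (0, 0, 4, 3)                  # but the whole area gets flushed
    wasted = redundant_pixels(screen, buffer, flushed)
    print(len(wasted), "of 12 flushed pixels were identical")
```

A flush whose rectangle is mostly "red" in this sense is a candidate for a tighter dirty area.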

Okay. The last feature of Quartz Debug is Show Window List. It gives you this little NSBrowser view of all the windows on the system and the applications that created them. So for example, I have Quartz Debug here. There is one window which is 4.5 kilobytes, another one is 270 kilobytes, and another one is 88. The 88 one is 1K by 22 pixels, so that's our menu bar.

The other one is 270k. It's this little window with the nice quartz image in it. And then there's a third one which is an off-screen image, which is essentially the little checkbox which is cached in an off-screen image. So what you can do is you can look at the windows that have been created by your application and try to avoid windows that are off-screen because they're just taking up memory and they're mapped into a different address space where they probably don't need to be. So this is kind of a handy tool. You can also go and sort by size and then by application and then you have every application at the top list. Terminal has a 700 something kilobytes window here and stuff like that.
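The menu bar figure can be checked with a line of arithmetic. The sketch below assumes 32-bit pixels and no row padding, which is an assumption on my part, though it is the one that reproduces the 88 kilobytes quoted for a window of 1K by 22 pixels.

```python
def backing_store_kilobytes(width, height, bytes_per_pixel=4):
    """Uncompressed size of a window buffer in kilobytes (1 KB = 1024 bytes)."""
    return width * height * bytes_per_pixel / 1024


if __name__ == "__main__":
    print(backing_store_kilobytes(1024, 22))   # the menu bar from the demo
```

The same function gives a quick estimate of what an off-screen window costs before you decide to keep it around.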

Another thing here... So we have one in Finder that has 330K and a little C behind the kilobytes number. C means compressed. These are the windows that were idle for more than 10 seconds or so and then got compressed by the window server. So that's another thing to look out for: if your application has windows on screen and after 20 seconds or so they're still not compressed, maybe you're dirtying them continuously, and that's a hint to look at. Essentially, the recommendation is, well, you know, if you're idle, don't do anything, because it spoils other optimizations in the system. Okay, that was it for the demo. Now I would like to invite Joseph Maurer up here.

Freedom for Lower Bavaria! Now I got your attention. I know it's Friday afternoon, we're all tired, we want to go home. That's why I'm going to make it sweet and short. This is some kind of table of contents. I was thinking about how to present this subject. One idea was, first, to convince you that all the rumors about bad performance of QuickDraw on 10 are wrong, and then to explain why QuickDraw is so slow on 10. But this wouldn't work well. So it's just a potpourri of what comes to mind in this matter. First item: I hate benchmarks.

Everybody can write applications that run slow, right? You don't need to make an effort. And you cannot believe how many bugs I have received over the last two years. Benchmark bugs that said, "This benchmark runs 20 times as slow on 10 than on 9." Yeah, alright. I know. I guess that's why I'm here now, right?

Not that I wouldn't want to make some jokes here from time to time. Of course I use benchmarks, to sample and to figure out why we have all these performance problems on 10 with QuickDraw. So my favorites are sampling and microsecond timing. The question came up yesterday how to determine execution times. My favorite is UpTime. On my machine it gives multiples of 40 nanoseconds. That's good enough. And I can convert it and display it and work with it.
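The microsecond-timing approach looks roughly like this in outline. The sketch uses Python's `time.perf_counter_ns` purely as a stand-in for the UpTime call mentioned above; `microseconds_per_call` and `workload` are hypothetical names for this example.

```python
import time


def microseconds_per_call(func, *args, repeat=1000):
    """Time `repeat` calls of func(*args) and return the mean cost in microseconds."""
    start_ns = time.perf_counter_ns()
    for _ in range(repeat):
        func(*args)
    elapsed_ns = time.perf_counter_ns() - start_ns
    return elapsed_ns / repeat / 1000.0


def workload(n):
    """Hypothetical stand-in for a drawing call under test."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(f"{microseconds_per_call(workload, 1000):.1f} microseconds per call")
```

Averaging over many calls matters here because a single call may be shorter than the resolution of the clock.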

Well, we have learned over the whole week, I guess, that frames per second, the favorite benchmark on systems before X, doesn't work well on X. In order to figure out how many frames you get on the screen, you would have to flush each frame. And flushing just chokes up when it goes beyond the refresh rate of the display. So, reduce flushing. And the frames per second benchmark is broken.

There's this other item of hardware acceleration. Ever since it became known that on Mac OS X QuickDraw would be tricked into drawing into a window buffer in RAM instead of directly into the VRAM corresponding to the screen's frame buffer, everybody understood that this

[Transcript missing]

If instead you have to write them out in your software blit loop, even an optimized one, you're going to suffer. And that's one aspect of what we have seen ever since QuickDraw has been tricked into drawing into the window back buffer instead of the screen, as it still believes.

Somebody has to give in the whole big picture of the forward vision that you have learned about this week, and QuickDraw is the one to give here in this specific case. On the other hand, don't believe that just because these specific CopyBits operations, when the destination is the screen or the window, are not highly accelerated anymore, everything is just hopeless now for QuickDraw drawing.

For one, fortunately, all this hardware acceleration on 9 and previous systems is not as good as you think. Which is good for me, because that way I can get at situations where QuickDraw on 10 is actually faster. The other idea has been spelled out by Peter Graffagnino. One of his favorite sentences is, you know, by the end of the day everybody will have many more processor cycles, everything will be faster. He doesn't say which day.

You may have seen that I have demos in preparation, so by the end of today we will see situations where things are faster. Meanwhile, you have had time to read the reasoning why hardware acceleration on X is different and how it is different, and I try to convince you that eventually, from the narrow point of view of QuickDraw, it's not such a big deal after all.

What tends to be a big deal, potentially, in some situations, is the additional requirement for QuickDraw to maintain a dirty region for each drawing. In each flushing operation, even if you do it only 60 times a second, you do not want to flush the whole window content each time. So the maintenance of the dirty region turns out to be costly, and it gets more costly with increasing complexity of the region. I'm not sure, but the penalty of region processing goes at least with the square of the region complexity. It might be worse.

For one, you very often have a non-trivial clip region. Each time a simple dirty rectangle, say, gets added to the dirty region, it gets intersected with the clip region. And we cannot do anything about it, because so many clients depend on the region to be flushed eventually being intersected with the clip region. Think about the Finder's desktop, for example. It CopyBits the whole desktop content and just clips out the little icon surfaces that need to be updated. The Finder relies on it. If QuickDraw did not intersect each time, the Finder would flush the whole screen surface each time.

You have an intersection, you have a region union, and it all ends up in the samples when you do your sampling of performance problems. So after some time of learning, we came up with the right approach, the right trick, which is to set the dirty region, before you start a sequence of drawing, to a big rectangular region, just as big as you need for your drawing, and go from there. And it turns out that things go miraculously much, much faster.
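Why pre-dirtying one big rectangle helps can be seen in a toy model. The sketch below is not QuickDraw's region code: it represents a dirty region as a plain list of rectangles, counts one unit of work per rectangle examined, and treats an update that is already enclosed as a shortcut. `DirtyRegion`, `encloses`, and `plot_points` are hypothetical names.

```python
def encloses(outer, inner):
    """True if rect `outer` fully contains rect `inner` (left, top, right, bottom)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


class DirtyRegion:
    """Toy dirty region: a list of rectangles plus a counter of work done."""

    def __init__(self):
        self.rects = []
        self.work = 0

    def add(self, rect):
        for existing in self.rects:
            self.work += 1               # one unit per rectangle examined
            if encloses(existing, rect):
                return                   # shortcut: already dirty
        self.rects.append(rect)          # region grows more complex


def plot_points(points, bounds=None):
    """Dirty one pixel per point; optionally pre-dirty `bounds` first."""
    region = DirtyRegion()
    if bounds is not None:
        region.add(bounds)
    for x, y in points:
        region.add((x, y, x + 1, y + 1))
    return region.work


if __name__ == "__main__":
    points = [(i, (i * 37) % 100) for i in range(100)]   # isolated points
    print("no pre-dirtying:", plot_points(points))
    print("pre-dirtied:    ", plot_points(points, bounds=(0, 0, 100, 100)))
```

In this model, 100 isolated points cost 4950 units of work without pre-dirtying, growing with the square of the point count, and 100 units when the bounding rectangle is dirtied first.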

The worst case is a drawing where you do a lot of math and then each time you found some convergence point, you plot the single point. You get a point here, a point there, a point here, a point there, and after a while you're waiting until you get 100,000 points.

Well, this is the worst case for regions: each time, you add a single isolated point to a region. And these are the cases where drawing on 10, when compared with 9 in the same Carbon app, is not just 20 times slower. It can be much, much slower even. Fortunately, many of you will be asking yourselves now, "Why didn't I run into this?"

"I never observed this type of problem." That's because you did a big EraseRect on your window content before you started drawing, and this EraseRect dirtied the whole rectangle. So all the additional little drawings got shortcut: the region processing realized, I don't have anything to do, it's already completely enclosed. And this makes the problem go away. But if the region you erased was not a simple rectangle but was clipped with some fancy clip region instead, then you are running into the problem.

And finally, another candidate for optimization of QuickDraw on 10, which just doesn't exist on 9, is this LockPortBits story.

Because each window buffer is now managed by the window server, when you start drawing into it you need to make sure that it gets locked down and that you have access to it. You need to make this call; LockPortBits, it's called. QuickDraw needs to do this for each and every single API call, even if it's just to set a single point, a single pixel.

I have been told over many, many months that LockPortBits is negligible performance-wise, and certainly it is if your drawing, thanks to dirty region management or such, is so slow that it doesn't show up. Once you are getting in the range where you are teasing it out, then it does make a difference, and we will look at that.

So the principle here is that LockPortBits calls can be nested, and once they are nested, once you have the lock, a call just increments the lock count and basically doesn't cost anything anymore. Why do I say be careful? System 10 has become pretty complicated, and many things are going on on the window server side which are often, when you look at them, counter-intuitive or puzzling.
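The nesting principle can be modeled with a simple counter. The class below is a hypothetical stand-in, not the Carbon LockPortBits and UnlockPortBits calls themselves: it only counts how often the expensive outermost acquisition happens when every drawing call locks on its own versus when the caller holds one outer lock around the whole sequence.

```python
class PortBits:
    """Toy nested lock: only the outermost lock() pays the expensive cost."""

    def __init__(self):
        self.lock_count = 0
        self.expensive_acquires = 0

    def lock(self):
        if self.lock_count == 0:
            self.expensive_acquires += 1   # stands in for the real lock-down
        self.lock_count += 1               # nested calls just bump the count

    def unlock(self):
        assert self.lock_count > 0, "unlock without matching lock"
        self.lock_count -= 1


def set_pixel(port, pixels, x, y):
    """Every drawing call locks and unlocks, as described for QuickDraw."""
    port.lock()
    try:
        pixels.add((x, y))
    finally:
        port.unlock()


def draw_points(port, points, hold_lock):
    """Draw all points, optionally holding one outer lock for the sequence."""
    pixels = set()
    if hold_lock:
        port.lock()
    try:
        for x, y in points:
            set_pixel(port, pixels, x, y)
    finally:
        if hold_lock:
            port.unlock()
    return pixels


if __name__ == "__main__":
    points = [(i, i) for i in range(1000)]
    for hold in (False, True):
        port = PortBits()
        draw_points(port, points, hold_lock=hold)
        print("hold_lock =", hold, "->", port.expensive_acquires, "acquisitions")
```

In the model, a thousand single-pixel calls pay for a thousand acquisitions on their own and for one when wrapped in an outer lock. The caution that follows still applies: the outer lock should be held only briefly.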

Holding a lock, and there are other people here who know this much better than I do, can under certain circumstances stress the tolerance of the system design. And you may end up in problems which we would not like to have to deal with. The rule of thumb is don't push it too hard. Try to get everything done within a couple seconds at most or release the lock and then come back and continue. At this point in time, I would like to bring up on stage Phil Schiller, Senior Vice President of Worldwide Product Marketing. Yeah, I guess not.

What I meant to do was to put him on the 10 machine, and I would have used the system 9 machine, and then we would have done our competition and I would have won. Well, I have to do it alone. Before we switch, a couple of words on scrolling; the scrolling demo is a no-show. I said I hated benchmarks; I hate scrolling benchmarks even more.

I hate it when it scrolls too fast. And I have a bad experience with scrolling benchmarks from a couple of years ago on 8.5. In the last weeks before final, I got a bug that text drawing was so damn slow, terribly slow, big disaster. And the way the tester measured it was by scrolling through a long text document. It took me hours to figure out that in fact HIToolbox, the Control Manager, had put in throttles to slow down scrolling due to a user interface request.

So that's why. This doesn't mean that scrolling is not a problem. Why has scrolling been a problem up until now? Because QuickDraw has to do this huge, massive move of bytes in the backing store before it can get flushed and show up, whereas before, scrolling happened right in video RAM.

So for Jaguar, scrolling, as you have learned, has been accelerated. It has been hooked up to the ScrollRect call in QuickDraw. It still is not hooked up completely to a pure direct CopyBits call in cases where it could be used. This is going to happen next week.

But still, the problems which Ralph showed, where somebody scrolls a window content and everything gets yellow, may not even have to do with improper use of ScrollRect or such. I know that it's much easier, when you have a huge document and the window really is just a window into this document, to maintain the source rectangle and just do a single CopyBits while you are scrolling through.

The problem is, in this case, the ScrollRect optimization that now goes back to hardware-accelerated scrolling cannot be applied. There's no way. Which means more work for you: replacing this simple CopyBits call with a moving source rectangle by some more bookkeeping, a direct ScrollRect call in the window, and then filling in the little area that gets exposed. This was what I had to add on scrolling. And then finally we get to the demo machine, please.
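The extra bookkeeping amounts to working out which pixels can be moved in place and which strip is newly exposed and has to be redrawn. The sketch below covers only vertical scrolling, with rectangles as (left, top, right, bottom) tuples; `scroll_vertically` and `area` are hypothetical helpers, not the QuickDraw ScrollRect call.

```python
def area(rect):
    """Pixel count of a (left, top, right, bottom) rectangle."""
    left, top, right, bottom = rect
    return max(0, right - left) * max(0, bottom - top)


def scroll_vertically(view, dy):
    """Plan a scroll of `view` by dy pixels (positive moves content down).

    Returns (source, dest, exposed): copy `source` onto `dest` in place,
    then redraw only `exposed`. source and dest are None when the scroll
    distance is at least the view height and nothing can be reused.
    """
    left, top, right, bottom = view
    if abs(dy) >= bottom - top:
        return None, None, view
    if dy >= 0:
        source = (left, top, right, bottom - dy)
        dest = (left, top + dy, right, bottom)
        exposed = (left, top, right, top + dy)
    else:
        source = (left, top - dy, right, bottom)
        dest = (left, top, right, bottom + dy)
        exposed = (left, bottom + dy, right, bottom)
    return source, dest, exposed


if __name__ == "__main__":
    view = (0, 0, 800, 600)
    source, dest, exposed = scroll_vertically(view, -16)   # content moves up
    print("move", source, "to", dest, "and redraw", exposed)
    print("redrawn fraction:", area(exposed) / area(view))
```

For an 800 by 600 view scrolled by 16 pixels, only a 16-pixel strip needs new drawing; the rest is a move within the same buffer, which is the case the accelerated path is meant for.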

So we are coming to the second. Yeah, this machine is faster than my machine at home. We got these reports about frame rates. Now suddenly here I display frame rates. 355, pretty good. The window is too small. Yeah, that's more like it. You see that we spend 8 milliseconds for so many CopyBits calls during 1.8 seconds, and we spend 106 microseconds in QDFlushPortBuffer on average.

The theoretical frame rate is the number of CopyBits calls per second that came out of one of the GWorlds. I have 20 GWorlds in the background. They differ only by a little moving of this pattern.

Of course I don't flush 111 times, but I could. So if I wanted to, I could flush 80 times per second, I could flush 90 times per second, and so on. And if you watch, maybe we don't get so far here and we won't waste time. You should play around with this at home, run into puzzling situations, and convince yourself: just don't do it, because we won't help you fix it.

You run into situations where it doesn't make sense to flush 140 times a second. It depends on hardware, it depends on many other instances. It's a similar game as the one that Ralph demoed.

If you start drawing, in this case with CopyBits, before the previous flush has completed, then you're just locking up, and suddenly your CopyBits calls themselves appear to become much slower. Why would CopyBits become slower? It's always the same code. Well, something secret is going on in the system.

The application gets starved of processor cycles. It's held back. And these are subtle things. They are too complicated for me. So play around with it, but don't take it as an invitation or an encouragement to heat up the frame rate race. What I like much more, my favorite, is this one.

There's a long story to it, and the drawing is so slow that I thought I had time to tell the story. Well, here it turns out that it's not that slow. But imagine that this wireframe drawing comes from a real-world application, and the people doing the design, the architects, want to turn this around interactively and make modifications to it in real time. Waiting 7 seconds, or up to 20 or 30 seconds on slower machines, for this to redraw is completely impossible. By the way, on System 9 the same drawing takes 2 seconds instead of 7, which is still way too slow.

So what did Jean-Paul Armand do? He wrote his own Bresenham and brought it down to something under half a second. So with less than half a second you can push the cathedral around and work on it. And then he found out that on System 10.1 (here we are running Jaguar, which is different), it spent about 20 seconds redrawing the wireframe.
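For readers who have not written one, a hand-rolled Bresenham line routine looks roughly like the following. This is a generic textbook sketch, not the developer's actual code: the function name `draw_line` and the 8-bit, one-byte-per-pixel buffer layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Set pixels along the line (x0,y0)-(x1,y1) in an 8-bit buffer of the given
   width and height, using integer-only Bresenham stepping. Points outside
   the buffer are skipped. */
void draw_line(unsigned char *pixels, int width, int height,
               int x0, int y0, int x1, int y1, unsigned char value)
{
    assert(pixels != NULL && width > 0 && height > 0);

    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                 /* combined error term for both axes */

    for (;;) {
        if (x0 >= 0 && x0 < width && y0 >= 0 && y0 < height)
            pixels[y0 * width + x0] = value;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```

The routine writes straight into memory with no per-line call overhead, which is why a wireframe made of many short lines can benefit from it.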

He made a big noise on the development list, and I took him aside and said, hey, let's work on this together. The first thing we figured out was the dirty region processing. So there is a hidden key, D. By the way, these two applications will show up, hopefully next week or sometime soon, on developer.apple.com with the source code, so you can really figure out what I'm doing there and what's going on.

So with something like two-thirds of a second, that's not so bad. But remember, there was something else on the slides which we could use to improve performance. The key L, for LockPortBits. There you go. Now it's better than his Bresenham. So now you can go home and tell your people there was an old Bavarian who taught an old dog new tricks.

Thanks, Joseph. So Joseph just showed you how you can actually get the performance of QuickDraw applications to what you expected. So take advantage of those two tips.

Going forward, I will spend the rest of the session talking about tips and tricks in general regarding Quartz 2D and also QuickDraw. So one of the biggest requests we've had during the last year or two is: Quartz has this great anti-aliasing, how can I get this in my Carbon application?

We've actually added this feature for Jaguar. It allows you to have Quartz render your glyphs for you, and you will get the same level of text anti-aliasing quality that you see in the rest of the system. We do this using the new API called QDSwapTextFlags.

It's not on by default, and I'll mention why later. But you can use this to pass in the first flag, which is use CG text rendering. Here we're using Core Graphics, or Quartz, to do the rendering of the text, but using the metrics in QuickDraw. If you choose, you can actually use CG to do the rendering and at the same time use the CG metrics. The difference between the two is that one will cause a relayout. With QuickDraw metrics, which you're used to, you know how the text is going to be positioned; only the glyphs will be rendered with Quartz.

With CG text metrics, the positions are subpixel positions, so you've got fractional components in there, and as a result you'll actually get better-looking text. So decide which one you prefer and use that in your application. You can always fall back to the traditional QuickDraw rendering using the last flag I mentioned here. QDSwapTextFlags is an application-wide setting. If you want, you can also do it on a per-port basis using the API I mentioned at the bottom.

One of the reasons it's not on by default is that there are limitations. QuickDraw's imaging model is different from Quartz's imaging model, so not all the text styles and transfer modes are supported. In those cases, it will fall back to QuickDraw text rendering. There is a feature in QuickDraw called glyph squishing that fixed the rendering of glyphs where the font itself did not have correct vertical metrics, so QuickDraw would actually do some squishing of the glyphs. That's once again not supported in Quartz. Just recognize that that's not going to be there.

Using CG text metrics will cause a relayout, as I mentioned earlier, and because the text is actually positioned on a subpixel boundary, it will be slower. So that's why it's up to you as a developer to decide whether you want CG rendering in your application or not, and whether you want text to reflow or not. We cannot turn this on by default in the system because every application is different. There will be a Q&A available discussing these things in more detail.

So you've got a Carbon application, and you want to move over to using Quartz 2D in your application. You've got a QuickDraw port. How do you get a CG context? You use QuickDraw's QDBeginCGContext. That gives you a context for that port. You go ahead, do your Quartz rendering, and then when you're done with it, you call QDEndCGContext. You cannot intersperse QuickDraw rendering inside of that.

If you want to do that, you have to call QDEndCGContext first. One common thing that most people run into is they forget to flush the context. So before QDEndCGContext, make sure you do call the CGContextFlush routine. That's how you'll get your content onto the screen.

But given that we've already mentioned that you should avoid redundant flushing, what you may instead want to do is use CGContextSynchronize. You may have multiple contexts, multiple controls, multiple views that are all drawing into the same window. You don't want all of those to flush on their own. You want it all to appear atomically. Use CGContextSynchronize instead of the flush routine. That will add to the dirty region, so that at the end of the event loop, when Carbon flushes the whole dirty region, it will all come up as one single flush.
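The difference between flushing per view and synchronizing can be illustrated with a toy model in plain C. Nothing here is the Quartz implementation: `ToyWindow`, `toy_mark_dirty`, and `toy_flush_if_needed` are hypothetical names, and the real dirty region is a region rather than the single bounding rectangle used in this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int top, left, bottom, right; } DirtyRect;

/* Toy stand-in for a window's backing store: one accumulated dirty
   rectangle plus a count of how many flushes actually happened. */
typedef struct {
    DirtyRect dirty;
    int has_dirty;
    int flush_count;
} ToyWindow;

/* "Synchronize": only record that an area changed; do not flush. */
void toy_mark_dirty(ToyWindow *w, DirtyRect r)
{
    assert(w != NULL);
    if (!w->has_dirty) {
        w->dirty = r;
        w->has_dirty = 1;
        return;
    }
    if (r.top < w->dirty.top)       w->dirty.top = r.top;
    if (r.left < w->dirty.left)     w->dirty.left = r.left;
    if (r.bottom > w->dirty.bottom) w->dirty.bottom = r.bottom;
    if (r.right > w->dirty.right)   w->dirty.right = r.right;
}

/* End of the event loop: push everything that accumulated in one flush. */
void toy_flush_if_needed(ToyWindow *w)
{
    assert(w != NULL);
    if (!w->has_dirty)
        return;
    w->flush_count++;
    w->has_dirty = 0;
}
```

In this model, two views that each mark their own area dirty produce one flush at the end of the event loop rather than two, and a second call with nothing dirty does no work at all.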

Another tip related to this is that you should now be able to replace your PostScript picture comments with Quartz 2D rendering. So where you've got existing PostScript picture comments, replace those using QDBeginCGContext and do the equivalent rendering using Quartz, primarily because the imaging model is the same between PDF and PostScript. So that should be an easy transition for you. And the benefit there is that it's not only going to print on PostScript printers, it will also print on inkjets.

Another nice feature, which has been there since 10.1, is the ability to actually render PICT drawings into a CG context. The benefit you get here is that it actually uses CG to do the rendering of the PICT. It will also respect any of the Quartz 2D transformations that you may have set up in the context, so now you can actually rotate your PICTs and transform them however you want.

Another simple benefit you will get is that it actually substitutes shades of gray for the pixmap patterns that you might be used to. It's better shown as an example. So this is QuickDraw rendering the PICT. And now this is Quartz rendering the PICT. And as I was mentioning, all your vector line art is actually rendered using Quartz, and your pattern has been replaced by a shade of gray.

There are limitations to this, once again, given that the two imaging models are different. The special transfer modes that are present in QuickDraw are not there. CG doesn't necessarily have those, so stay away from XOR. We don't necessarily support them in Quartz 2D. There is also a performance difference. One is using native QuickDraw to actually render the PICT. The other one is actually going through a conversion process.

But this is a good opportunity for you to actually convert your PICTs into PDF, which is the recommended way of processing these. It's very easy to create a PDF context and make this single call to draw that PICT into the PDF context, and now you've actually converted your PICT into a PDF.

Another common request we have is: how do I actually get my graphics importers in QuickTime to work with Quartz? We're working with the QuickTime team to better integrate the two technologies, but in the meantime we've actually got some sample code that I'll just quickly go over today. In this case, all we're doing is creating a 32-bit ARGB GWorld, an off-screen buffer, and getting the graphics importer to actually draw into that buffer.

One of the limitations of this is that everything is flattened into 32-bit ARGB. So if you happen to have an image that is CMYK, or in any other color space, or with 8-bit or 16-bit data, you would have to handle that on your own. Don't forget color profiles. Everything is managed using ColorSync in Quartz.

So get the ColorSync color profile from QuickTime and convert that into a CG color space. There's a convenience function that we provide for you to do that. Once you've got that, you can get all the information you need to create the image, like the width, height, and alpha information, from the graphics importer. Specify the data provider, and you've got an image ref. Now you can actually draw it into the CG context.

When you're working with images, recognize that CGImageRefs at the moment are not cached. Possibly that might change in the future, but at the moment everything is rendered from the source data format as specified. So if you're downsampling images and doing a lot of color matching, and color matching becomes an issue for you, do the caching yourself.

For example, you may have an iPhoto-like application where you're drawing a lot of thumbnails. The best thing is to draw each one into an off-screen bitmap context, cache the result, create a new CG image out of it, and use that, since it's already downsampled and the color matching is already done for you in the off-screen buffer.

Use the JPEG and PNG data providers that are new for Jaguar. The benefit there is that you are now passing compressed data as far as possible through the workflow to the screen. It's primarily important for printing. You've got lots of digital cameras with high-quality JPEGs, 3-megapixel JPEGs or whatever. Use the JPEG data provider. The spool files that you'll be generating will be much, much smaller.

Another point about data providers is that you can actually use custom data providers for any of the non-native formats that might not be supported by Quartz 2D. You can use them for the JPEG and PNG data as well. So you can use not only the convenience ones that we provide, from memory, from disk, or through a URL, but you can also write your own custom data provider. For example, you may want to get the data from a resource file.

Another common question that people ask us is: why are complicated paths so slow? The definition of what a complex path is, as far as Quartz 2D is concerned, really depends on the number of intersections in that path. For example, in the bottom left, I've actually got this pattern, which is all rendered as one single path.

Now, recognize that there is a difference between a single path and if this was represented as multiple paths, all composed of line segments. The difference is actually seen in the intersection region. So when Quartz actually scan converts this data, it takes into account the self intersections of that path and it will make sure that it actually does not do any double drawing. If you've got a single hairline pixel wide path, very likely you may not care about the intersections because they're not going to be visible. So in that case you may want to use multiple paths instead of a single complicated path.

There is a difference in the rendering. If the quality of rendering of a single path is not as important to you, go with multiple paths. They'll be much faster for you. The other thing to keep in mind is that between multiple paths and single paths, the difference will also be in your line joins and your line caps. So take those into account. Avoid using line caps and line joins when not necessary, especially if you've got small hairline lines.

Lastly, I just want to cover another component of Quartz, which is CG Direct Display. This is for applications that draw full screen, games and whatnot. For Jaguar, we've actually added some new APIs to list all the online displays. We've already had something that would list all the active displays. The distinction becomes important when you're mirroring.

There's a difference between active and online displays. When you're mirroring, there's only one single active display; all the other online displays are mirrored to that single active display. We've also got some new APIs to do some reconfiguring: turn on mirroring, set the origin of displays.

And one new API that you might be interested in lets you check whether Quartz Extreme is actually running on a particular display. That can vary from display to display depending on the video card you have and whether it's on a PCI bus or an AGP bus. So you can use that to check whether it's actually running. We'll also be providing you with a new API to actually get the screen DPI, given that we've got a wide variety of monitors out there with different DPIs.
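The DPI figure itself is simple arithmetic once a pixel count and a physical size are known. The sketch below shows only that arithmetic: the session does not name the API, so `dots_per_inch` is a hypothetical helper, and it assumes the physical size is available in millimeters.

```c
#include <assert.h>

/* Dots per inch along one axis, from a pixel count and a physical size
   in millimeters (25.4 mm per inch). */
double dots_per_inch(int pixels, double size_mm)
{
    assert(pixels >= 0 && size_mm > 0.0);
    return (double)pixels / (size_mm / 25.4);
}
```

A display that shows 1440 pixels across 254 mm (10 inches) works out to roughly 144 DPI, twice the traditional 72 DPI assumption, which is why a fixed value is no longer safe.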

Lastly, use your performance tools like Quartz Debug, Sampler, and sample to profile your applications. You've heard the mantra before: "Profile, profile, and then profile some more." Use those tools. There are also other tools. If you've been at other sessions earlier, there's the CHUD framework that provides new performance tools that get down to the processor level. Those are your Shikari and your MONster type of tools. Take advantage of those.

You've got lots of documentation. The one that I recommend is the Mac OS X Performance document. We've got quite a few Quartz 2D documents. And we've also got upcoming tech notes and Q&As about some of the QuickDraw performance issues and the QuickDraw text anti-aliasing feature. And now I'd like to invite Travis back up to go through the rest of the slides.

What I want to do is take a quick second and just tell you about the one remaining session we have today at WWDC, and it's an important one. It's the feedback forum. This is the place where, if you want to give us feedback on what we've been doing for Jaguar, or feedback in general, please come join us in room J1 at 5:00.

What I'd like to do is let you have my email address, even though many of you already know it. You can contact me at [email protected]. This covers pretty much most of the graphics stack in Mac OS X, with the exception of OpenGL. So if you have questions relating to what was discussed today, or questions relating to printing, image capture, ColorSync, or Quartz 2D, feel free to contact me.