WWDC05 • Session 632

Performance and Graphics Tuning Your Java Application

Enterprise IT • 55:50

Maximizing speed and performance is important for any application. Bring your laptop to this session and explore how to identify and remove performance bottlenecks throughout your Java application, using Shark, Sampler, and other powerful profiling tools found only on Mac OS X.

Speakers: Viktor Miladinov, Gerard Ziemski, Christy Warren, Josh Outwater

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Hello and welcome to the hands-on session about performance and graphics tuning your Java application. So before we begin, I just want to remind you that there is a WWDC survey form that you can go and fill out, and we want your feedback. So having said that, let's talk about the purpose of this session. And it's very simple.

The title says it all. We're here to make your Java apps run faster on Mac OS X. So how do you do that? There are many ways and techniques to optimize an application, but today we're going to focus on two areas. The first is graphics performance tips. The second is some free tools that come with the Mac OS X platform that you can use to profile and optimize your Java application.

So if we do our job well today, you're all going to walk out of this session and, either during the break, when you get home, or when you get to your hotel room, spend 15 minutes. Within those 15 minutes you should have a clear idea of where the bottleneck is in your application, and hopefully you can optimize it and make it better. We're going to talk about some tools that will let you do that.

So let me give you a brief outline of the session. First, we're going to talk about graphics do's and don'ts. There are certain Java graphics (Java 2D) APIs where calling the right API instead of the wrong one can make an order-of-magnitude difference in speed, in either direction. So we're going to talk about some of those areas and what you should keep in mind when you write your graphics code.

Second, we're going to talk about Quartz Debug. Many of you might know it, but if you don't, it's a very simple tool that you can use to quickly optimize your Java repaints. You can profile them and see which areas you are painting unnecessarily.

And third, and last but not least, we're going to talk about Shark. Shark is a very cool profiling tool. If you haven't used it, I'm sure you'll love it. Last year we added support for Java, and we're going to give you a couple of demos of how to use Shark to profile your Java application. So without any further ado, I'd like to invite Gerard to come and talk to you about graphics.

Thank you, Kurt. Hello, everybody. My name is Gerard Ziemski. I'm an engineer on the Java Classes team, and today I'm going to talk to you about graphics do's and don'ts. For those of you who are new to developing Java applications on Mac OS X, the first thing to realize is that the graphics implementation on Mac OS X is built on top of Quartz.

Quartz is the marketing name for the Core Graphics framework. That implies several things, and the most important is that certain Java 2D APIs map one to one onto what Quartz supports, while others have no direct mapping to Quartz. Those involve more operations, which means they may be slower or behave slightly differently.

So here I'm going to tell you a bit about those APIs and point out specifically the ones you should try to avoid when developing Java applications for Mac OS X. Realize that the performance characteristics of your application will differ when you take it from one platform and run it on another. This is a two-way street: certain applications may run better on Mac OS X than on Windows, and vice versa.

So we'll be talking about three main things. First, buffered images. Then painting from "third" threads, where a third thread is defined as any thread other than the AWT event thread. And then I'll briefly talk about XOR. So let's talk about buffered images. First and most important, realize that there is no way for you to predict which image format is best to use.

Especially now that we're also going to support the Intel x86 platform, which is little-endian, as opposed to PPC, which is big-endian. So you really cannot guess which image format is the one to use. Don't hard-code your BufferedImage types. Always use createCompatibleImage.
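A minimal sketch of that advice, assuming you don't have a component at hand (if you do, its getGraphicsConfiguration() gives you the same kind of object). The class name CompatibleImages and the headless fallback are our own additions for illustration, not part of the session's sample code:

```java
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.Transparency;
import java.awt.image.BufferedImage;

/** Hypothetical helper: lets the platform pick the pixel layout. */
class CompatibleImages {
    static BufferedImage create(int width, int height, int transparency) {
        if (GraphicsEnvironment.isHeadless()) {
            // No screen to be compatible with (server, CI box): fall back to a
            // generic ARGB image. This fallback is an assumption of the sketch.
            return new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        }
        GraphicsConfiguration gc = GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getDefaultScreenDevice()
                .getDefaultConfiguration();
        // The image uses whatever layout (byte order, channel order) the
        // display pipeline prefers on this machine, PowerPC or Intel.
        return gc.createCompatibleImage(width, height, transparency);
    }

    public static void main(String[] args) {
        BufferedImage img = create(320, 240, Transparency.TRANSLUCENT);
        System.out.println(img.getWidth() + "x" + img.getHeight() + " type=" + img.getType());
    }
}
```

The printed type may differ between machines, which is the point: outside the headless fallback, the code never names a TYPE_ constant itself.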

Second, because we depend on Quartz to provide the functionality for our implementation of Java 2D graphics, there are certain things that limit the way we can use it. For example, Quartz insists that images are immutable. According to the Java 2D specification, you can draw into a BufferedImage and you can draw from it; those images are mutable. So there is a mismatch between Java and Quartz, and what that means, if you think about it, is that we actually have to keep two copies of the pixels.

One copy is on the Java side: the BufferedImage, its data, its raster, and so on. Then on the native side we have to create a Core Graphics image, and we have to copy those pixels into the native Quartz image every time you want to use it.

Of course, that means there's a penalty involved. We try to avoid that penalty, and in most cases, if you do things the right way, you will not hit it. However, if you do things not quite the way we would like, you will hit that penalty, and your application will suffer.

Take the getRGB method defined on BufferedImage. When you call getRGB, you're asking for an image pixel value. You're on the Java side, but that image actually exists as a native Quartz image, and those pixels actually live on the native side.

So when you call getRGB, we cannot just look at the pixels on the Java side, because the image actually exists on the native side. We have to cross the boundary: go from Java to native, look at the native pixel, and return it back to Java. What's more complicated is that Quartz really doesn't want you to see those pixels, so there is a lot of work involved in retrieving them.
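Because each getRGB(x, y) call makes that round trip, reading many pixels one call at a time multiplies the cost. A minimal sketch, assuming you genuinely need to read pixels back: the bulk getRGB overload, a standard BufferedImage method, fetches a whole region in one call and does not hand the DataBuffer to your code. The class and method names are hypothetical, and whether the bulk call is faster on a particular VM is something to measure rather than assume:

```java
import java.awt.image.BufferedImage;

/** Hypothetical example: sum the red channel of an image two ways. */
class PixelReads {
    // One getRGB call per pixel: pays the per-call cost width * height times.
    static long sumRedPerPixel(BufferedImage img) {
        long sum = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                sum += (img.getRGB(x, y) >> 16) & 0xFF;
            }
        }
        return sum;
    }

    // One bulk getRGB call copies every pixel, as default-sRGB ARGB ints,
    // into a Java array; passing null asks getRGB to allocate the array.
    static long sumRedBulk(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        int[] argb = img.getRGB(0, 0, w, h, null, 0, w);
        long sum = 0;
        for (int p : argb) {
            sum += (p >> 16) & 0xFF;
        }
        return sum;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(3, 4, 0xFFFF0000); // one opaque red pixel
        System.out.println(sumRedPerPixel(img) + " " + sumRedBulk(img));
    }
}
```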

So there's a cost associated with it, but that cost is per call: you call getRGB, you pay the cost, you come back, and things are fine. Then there's another one, Raster.getDataBuffer, which has much, much bigger implications from a performance point of view. If you call Raster.getDataBuffer, you're requesting the data buffer itself.

That gives you direct access to the pixels of the image, so from that point on we have to turn off our optimizations. Whenever you draw a primitive, that primitive gets drawn into the native side of the image. But since on the Java side you requested direct access to the pixels, for every single primitive that you draw we have to sync the pixels back from native to Java. Back and forth, back and forth. So if you don't have to, avoid calling Raster.getDataBuffer, because from that point on the performance of your application will really suffer. To demonstrate this particular issue, here is Scott.

Great, thanks, Gerard. If we can go to the first demo machine. We found an applet online that actually showed this problem: this great thing called the Zombie Infection application. I just took the applet and put it in here. If you've downloaded all the materials, you should have this Xcode project; just open up the Zombie Infection Xcode project. I just want to run it.

This is a slightly modified version of what's online. What you'll see here is a simulation of a city. The city has a bunch of blue people: you can see these blue dots running around here. The green dots that show up are zombies who are infecting the people. So it's simulating how zombies would realistically infect a city.

So this is a really important application for scientists who are investigating zombies. But the point here is that this was kind of a cool thing we saw online, but it was really slow. And we fixed some obvious slow things. But then we wanted to look at it and see what really was slow.

So I put in a frames-per-second counter. I also wanted to see how the zombies are doing, so I put in some counters: these are the zombies, and these are the people, with fewer and fewer people over time. Some of the people fight back, and those are the corpses; the little red dots are dead zombies. The point here is that this is working OK, but we're on a really beefy 2.7 GHz G5, and we should be getting more than 26 frames a second.
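The session doesn't show the counter's code. As a hypothetical sketch of that kind of instrumentation, this FpsCounter averages over windows of at least one second, using System.nanoTime()-style timestamps that the caller passes in (which also makes it easy to test). Class and method names are our own:

```java
/** Hypothetical frames-per-second counter: call frame() once per rendered frame. */
class FpsCounter {
    private static final long ONE_SECOND_NANOS = 1000000000L;
    private long windowStart = Long.MIN_VALUE; // sentinel: no frame seen yet
    private int frames;                        // frames completed in this window
    private double fps;                        // last computed rate

    /** Record a frame finishing at nowNanos; returns the latest fps value. */
    double frame(long nowNanos) {
        if (windowStart == Long.MIN_VALUE) {
            windowStart = nowNanos; // the first frame only starts the clock
            return fps;
        }
        frames++;
        long elapsed = nowNanos - windowStart;
        if (elapsed >= ONE_SECOND_NANOS) {
            fps = frames * (double) ONE_SECOND_NANOS / elapsed;
            frames = 0;
            windowStart = nowNanos;
        }
        return fps;
    }

    double fps() { return fps; }

    public static void main(String[] args) throws InterruptedException {
        FpsCounter counter = new FpsCounter();
        for (int i = 0; i < 60; i++) {
            Thread.sleep(20); // stand-in for rendering one frame
            counter.frame(System.nanoTime());
        }
        System.out.printf("about %.1f fps%n", counter.fps());
    }
}
```

Long.MIN_VALUE is used as the "not started" marker because System.nanoTime() values may legitimately be negative.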

So I wanted to take a quick look at the code. If you open up the zombie panel, this is pretty much the applet embedded right in a JFrame. If you go into the step method of the zombie panel, we can look at how this application actually works.

The step method is basically one iteration of the simulation, as you can tell by the comment here. It goes through all the beings in the world. The first thing a being does is plot its current location with the empty color, erasing its own color.

Then we step. This lets the being do whatever it does in its step operation, which usually involves looking at what's in front of it. If it's a person and it sees a zombie in front of it, it might turn around and run, or it might attack the zombie.

That involves looking ahead. Then, in the step, the being will probably move or do whatever it does, and the last thing we do is plot its new location. I factored out the world object here, so if we go into the world, we can see what the plot and peek methods do. The plot method, if we look all the way at the end, gets the image's raster and calls setPixel. That's okay; that's fine.

The peek method takes an X and Y coordinate and tries to see what's ahead of the being in this world, and it does that by looking at the image and getting its RGB value. So immediately we said, aha, this is the deal. This code is using the image itself to store all of its data, and it's calling getRGB to tell what kind of object is at this position. We know this hits a slower path, so we made a very simple fix here.

If you just search for "// performance", you'll find all the code here, so we don't have to waste too much time. Let me start with the very first one. In the world, I actually created a model. Instead of using the image as the model, I just made a little int array for the model, and I initialized my int array.

Here, when I plot, I set the value in that int array. Then, when I go to peek, instead of calling getRGB, I just look things up in the int array. If you remember, before we got about 26 frames per second, something like that. So let's stop that from running, save, build, and run.

And our new version is getting 36 frames per second. So just by going to a data model and not calling getRGB, we got a big increase, from about 26 to 36 frames per second, with a very small change. This is also probably a better architecture for your application, because now that you have a data model behind it, you could render it into something other than an image. But it shows that if you avoid calling getRGB, where we have to go look at the native pixel, you'll get better performance. Thank you. Thanks, Scott.
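The demo's actual code isn't reproduced in the transcript, so here is a hypothetical sketch of the same idea: the simulation state lives in an int array, and peek never touches the image. The World name follows the demo's description; the cell codes and the choice to treat off-world coordinates as EMPTY are assumptions. A real version would still draw each plotted being into the image for display:

```java
/** Hypothetical sketch of the fix: the model is an int array, not the image. */
class World {
    static final int EMPTY = 0;   // hypothetical cell codes
    static final int PERSON = 1;
    static final int ZOMBIE = 2;

    private final int width, height;
    private final int[] cells;     // one int per pixel of the world

    World(int width, int height) {
        this.width = width;
        this.height = height;
        this.cells = new int[width * height]; // Java zero-fills: all EMPTY
    }

    /** Record a being at (x, y); a real version would also draw it to the image. */
    void plot(int x, int y, int kind) {
        if (inBounds(x, y)) {
            cells[y * width + x] = kind;
        }
    }

    /** Look at (x, y) without any getRGB call; off-world counts as EMPTY. */
    int peek(int x, int y) {
        return inBounds(x, y) ? cells[y * width + x] : EMPTY;
    }

    private boolean inBounds(int x, int y) {
        return x >= 0 && y >= 0 && x < width && y < height;
    }

    public static void main(String[] args) {
        World w = new World(100, 100);
        w.plot(10, 20, ZOMBIE);
        System.out.println(w.peek(10, 20) == ZOMBIE);
    }
}
```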

The last thing I would like to mention about this issue: decide whether you want to use the Java 2D APIs to draw primitives. If they're too slow for you and you want to implement your own drawLine or your own fillRect, you can do that. But to do that you have to call getRaster().getDataBuffer(), and from that point on you are stealing the Java pixels.

If you are going to do that, then please stay on that side, and don't use the normal drawLine, drawRect, or fillRect Java 2D APIs. Mixing those two mechanisms is what will slow you down. If you stay on one side or the other, performance will be just fine.
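A minimal sketch of staying on the raster side, with hypothetical class and method names. It hard-codes TYPE_INT_ARGB so the DataBuffer can be cast to DataBufferInt; that fixed layout is the price of writing pixels yourself, and it is in tension with the createCompatibleImage advice earlier in this session:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

/** Hypothetical renderer that takes the pixel array once and never uses Graphics2D. */
class RasterOnlyRenderer {
    private final BufferedImage image;
    private final int[] pixels; // direct view of the image's pixels
    private final int width;

    RasterOnlyRenderer(int width, int height) {
        this.width = width;
        // A known int-per-pixel type is required for the cast below.
        this.image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        // From here on the pixels are "stolen": keep drawing through this
        // array and don't mix in drawLine/fillRect on image.createGraphics().
        this.pixels = ((DataBufferInt) image.getRaster().getDataBuffer()).getData();
    }

    /** Our own fillRect: writes ARGB values straight into the array, clipped to the image. */
    void fillRect(int x, int y, int w, int h, int argb) {
        int rowEnd = Math.min(y + h, image.getHeight());
        int colEnd = Math.min(x + w, width);
        for (int row = Math.max(0, y); row < rowEnd; row++) {
            for (int col = Math.max(0, x); col < colEnd; col++) {
                pixels[row * width + col] = argb;
            }
        }
    }

    /** The image, for copying to the screen with drawImage. */
    BufferedImage image() { return image; }
}
```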

Can we have the slides, please? Right, so let's talk about painting from the wrong thread. Realize that windows on Mac OS X are double buffered. There are different types (retained, non-retained, one-shot), but all of them are double buffered. That means whenever you draw into a window, you are actually drawing into the back buffer.

So the primitives that you draw do not appear until somebody calls flush. In the Java 2D APIs there is no mechanism for you to control when that flush happens. When you draw using the normal AWT painting mechanism, we know when to flush because there are clear entry and exit points: the start of the paint method and the exit of the paint method.

So we know when you're done painting, and then we can flush. However, if you're drawing from a thread other than the AWT event thread, there is really no information for us to know when to flush. The best we can do is use a heuristic: we have a mechanism that tries to make sure we flush at least 30 times per second.

Now, the problem with that: suppose you have an application whose very first step is erasing the background, so you call fillRect over the entire frame, and then you slowly start drawing the frame.

We don't know when to flush, so the flush will happen from another thread, at different points while you draw your frame. It can happen right after you called fillRect and erased your entire frame, so what you see on screen is nothing, just an empty frame. Or it can happen somewhere in the middle of drawing the frame, so the user will actually see half of the frame, half of the scene finished.

So that's not good. This problem, which we'll refer to as flicker, can also occur when drawing into images. The scenario is quite similar. If you're drawing into an image,

[Transcript missing]

So let's open Animation Flicker on screen. We'll take a look at this case first.

Let's quickly take a look at how this application is written. We have the normal stuff here: setting bounds, setting visible. Here is the interesting part: we create a thread that we'll use to render the application. The rendering actually happens right here on this thread, and here is where we draw directly to the screen. So let's run it and see what this application looks like.

As you can see, not a whole lot is happening here. Mostly you get a blank frame. There's a little bit of flicker, something there, but basically this is unusable. And we have actually seen applications like this. Here is the point: windows on the Windows platform are not double buffered, or at least may not be. Those windows are single buffered.

If you draw a primitive, it appears directly on screen, so this application will actually work just fine on the Windows platform. But on Mac OS X, it will not. So how do we fix this? The fix is actually very easy: as silly as it sounds, you have to implement your own double buffering mechanism. Basically, you change from drawing on screen to drawing off screen. So let's implement this method.

I have already implemented this method for you, so let's take a look at it. The very first thing we do is create a BufferedImage to draw into. Notice we're using createCompatibleImage here. Once we have that image, we obtain the graphics from it, and then we proceed the same way as before, when we were drawing on screen.

We use the same method, except now it draws into our image. Once we're done drawing to that off-screen image, we take the image and blit it back to the screen. So this works. See, there's no flicker. That's the on-screen case. Let's take a look at the off-screen case.
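A minimal sketch of that on-screen fix, assuming the render thread already has a Graphics for the window (for example from component.getGraphics()). OffscreenFrame and drawGame are hypothetical stand-ins for the demo's code. Composing the whole frame off screen and copying it with one drawImage call makes a flush much less likely to catch a half-drawn scene:

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical do-it-yourself double buffering for a render thread. */
class OffscreenFrame {
    private BufferedImage back; // reused between frames

    /** Draw one frame into target (e.g. the window's Graphics). */
    void drawFrame(Graphics target, int width, int height, int frame) {
        if (back == null || back.getWidth() != width || back.getHeight() != height) {
            // In a real window, prefer component.getGraphicsConfiguration()
            // .createCompatibleImage(width, height) as in the demo.
            back = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        }
        Graphics2D g = back.createGraphics();
        try {
            drawGame(g, width, height, frame); // same drawing code as before
        } finally {
            g.dispose();
        }
        // The target only ever receives a complete frame.
        target.drawImage(back, 0, 0, null);
    }

    /** Stand-in scene: erase everything, then draw a moving square. */
    static void drawGame(Graphics2D g, int width, int height, int frame) {
        g.setColor(Color.BLACK);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.ORANGE);
        g.fillRect(frame % Math.max(1, width - 10), height / 2, 10, 10);
    }
}
```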

Here we also have a thread that we use to render our application. We are rendering off screen right here, into an image. But if you look right here, in DrawGameOffScreen, we ask the component to repaint itself by calling repaint, which goes through the AWT paint mechanism. So we go back to paint right here.

And that image is drawn back to the screen from the paint method. So let's run this. There is a little flicker. Interestingly, this application behaves much better on a single-processor machine. If you develop this application on your laptop, you may not see the flicker at all, and you may be surprised when your clients complain that it flickers because they are using a dual-processor machine. This is a dual-processor machine, and it flickers here a little bit.

What is the fix? The problem here is that rendering into the off-screen image and rendering that image back on screen are not synchronized. The fix is trivial: all you need is synchronization, and we can use the image itself as the lock object. So our paint method synchronizes on the image, and the actual rendering has to synchronize on the same image.
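A minimal sketch of that fix, with hypothetical names (SharedOffscreen, renderNextFrame, paintTo) standing in for the demo's code. Both the render thread and the paint path lock the same image, so paint can never copy a frame that is only half drawn:

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical shared off-screen buffer guarded by the image's own monitor. */
class SharedOffscreen {
    private final BufferedImage image;
    private int frame;

    SharedOffscreen(int width, int height) {
        image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    }

    /** Called on the render thread; a real app would call repaint() afterwards. */
    void renderNextFrame() {
        synchronized (image) {
            Graphics2D g = image.createGraphics();
            try {
                g.setColor(Color.BLACK);
                g.fillRect(0, 0, image.getWidth(), image.getHeight());
                g.setColor(Color.GREEN);
                g.fillRect(frame++ % image.getWidth(), 0, 1, image.getHeight());
            } finally {
                g.dispose();
            }
        }
    }

    /** Called from the component's paint(Graphics) on the AWT event thread. */
    void paintTo(Graphics g) {
        synchronized (image) {
            g.drawImage(image, 0, 0, null);
        }
    }
}
```

Holding the lock while painting blocks the render thread for the length of one drawImage call, which is usually a small price compared with tearing.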

There's no flicker now. That is just one way of fixing this application, but it will do. Can we go back to the slides, please? There is another solution. If you remember, for the on-screen case we implemented a double buffering mechanism ourselves. You don't actually have to do that: there is already a BufferStrategy API in Java 2D that you can use. It's much simpler, and it works too.
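A minimal sketch of the BufferStrategy route, following the render loop recommended in the java.awt.image.BufferStrategy documentation. StrategyRenderer and its drawing code are hypothetical. createBufferStrategy must be called on a Canvas or Window that is already displayable:

```java
import java.awt.Canvas;
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferStrategy;

/** Hypothetical renderer that lets Java 2D manage the back buffer. */
class StrategyRenderer {
    /** Call once, after the canvas is displayable (e.g. after frame.setVisible(true)). */
    static BufferStrategy setUp(Canvas canvas) {
        canvas.createBufferStrategy(2); // two buffers: front and back
        return canvas.getBufferStrategy();
    }

    /** Render one frame with the documented contentsRestored/contentsLost loop. */
    static void renderFrame(BufferStrategy strategy, int width, int height, int frame) {
        do {
            do {
                Graphics g = strategy.getDrawGraphics();
                try {
                    g.setColor(Color.BLACK);
                    g.fillRect(0, 0, width, height);
                    g.setColor(Color.CYAN);
                    g.fillRect(frame % Math.max(1, width), height / 2, 8, 8);
                } finally {
                    g.dispose();
                }
            } while (strategy.contentsRestored()); // redraw if buffer was restored mid-frame
            strategy.show(); // present the finished frame
        } while (strategy.contentsLost()); // start over if the buffer was lost
    }
}
```

A render thread would call renderFrame in a loop; since show() presents only complete frames, no explicit locking against paint() is needed for this path.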

[Transcript missing]

Here is an example. The first snippet of code uses XOR: we set the color to blue and call setXORMode with black. Now guess what color you get as a result. It's some sort of yellow. You have no control over the transparency, and it doesn't look quite right. If you use AlphaComposite instead, you have control over the translucency of the color, and the result looks more native. So I'm done, and I would like to ask Viktor to continue with his presentation. Thank you.
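The slide's two snippets aren't in the transcript, so this is a reconstruction under assumptions: the class and method names are hypothetical, and the 0.5 alpha is an arbitrary example value. Drawn over white, the XOR version comes out yellow because white XOR blue clears the blue channel and leaves red and green; the AlphaComposite version blends blue into the background by the chosen amount:

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Composite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical comparison of XOR highlighting and translucent highlighting. */
class HighlightModes {
    /** XOR highlight: the resulting color depends on what is underneath. */
    static void xorHighlight(Graphics2D g, int x, int y, int w, int h) {
        g.setColor(Color.BLUE);
        g.setXORMode(Color.BLACK);
        g.fillRect(x, y, w, h);
        g.setPaintMode(); // back to normal painting
    }

    /** Translucent highlight: blends blue over the background at the given alpha. */
    static void alphaHighlight(Graphics2D g, int x, int y, int w, int h, float alpha) {
        Composite old = g.getComposite();
        g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, alpha));
        g.setColor(Color.BLUE);
        g.fillRect(x, y, w, h);
        g.setComposite(old); // restore whatever was set before
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 20, 10);
        xorHighlight(g, 0, 0, 10, 10);          // left half
        alphaHighlight(g, 10, 0, 10, 10, 0.5f); // right half
        g.dispose();
        System.out.printf("xor=%06X alpha=%06X%n",
                img.getRGB(5, 5) & 0xFFFFFF, img.getRGB(15, 5) & 0xFFFFFF);
    }
}
```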

Thanks, Gerard. So let's talk about Quartz Debug. Before we do, I want to set up a simple case where Quartz Debug might be useful: how you animate a rectangle. We have a very simple rectangle, and you want to animate it. We're going to make two assumptions.

The first is that we subclass Component. The second is that we do all of the drawing in the paint method, which means the painting happens on the event thread. So the problem is moving the rectangle from point A to point B.

Very simple. So how would somebody go about doing that? First, you change the geometry of the rectangle from position A to position B. Second, you ask the repaint manager: hey, we changed our position, please repaint us. Later on, when the repaint manager is ready, it sends you a repaint request and calls your paint method.

At that point, you blit the rectangle from point A to point B. That is a very simple example, and I breezed through it because it's not that important, but I wanted to set up this question: what could go wrong with this example? The two obvious questions to ask yourself are, first, are we repainting more often than we need to? And second, are we repainting a larger area than we need to? Keep in mind that painting graphics can sometimes be expensive.

So if you don't need to paint an area, don't paint it; these questions help you keep that in mind. To answer them, what tools do we currently have? There are several techniques. The one I used to use a lot was print statements or debug statements. You can also run a debugger and print the geometry, but that's hard, and it's not visual. Quartz Debug has a better way of debugging this kind of repainting issue.

And just as a reminder, for those of you who don't know where to find Quartz Debug, it's under /Developer/Applications/Performance Tools. Now it's time for a hands-on demo of Quartz Debug. Can we switch to demo one, please? This one will be very easy to follow, so if you've downloaded the examples or you have the disk image, come along and follow.

So let's start by opening the application called Sky Creator. If you have the disk image, just go to the Sky Creator folder; the code is there, but I'm not going to walk you through it. You can look at it on your own time, when you get home or whenever you want.

But let's start by running the Sky Creator app. There's a skycreator.zip; you can take a look at it here. If you double-click it, it unzips, and there should be a Sky Creator application. Go ahead and run it.

So what does this application do? I created a very simple application that tries to generate the sky. It has different star sizes, and it mixes and matches them and randomly creates an image of a star. And just for fun, you can drag the stars around.

So you can drag them, arrange them, and create your own sky. It's a very simple app. But one thing to notice, and I'm sure you can see it on the projector: as I drag the star around, there's a lag. The star lags behind the cursor.

Now, this is a pretty beefy machine, a dual 2.7 GHz G5, and it's still very sluggish. If you run it on a laptop, it should be painfully slow. So how do we debug problems like this? The first thing I would do is run Shark, and I've run Shark on this.

It just says: you spend all your time painting. Well, duh, I'm painting stars, so of course I spend all my time painting. So the second thing I want to do is run Quartz Debug and see how much I'm painting. If you're following along, open the Finder, go to /Developer/Applications/Performance Tools, and launch Quartz Debug.

You're going to see this little pop-up menu. Quartz Debug has many features, and I'm going to cover just one of them: Flash Screen Updates. If you check that box, then every time something is blitted to the screen, a yellow rectangle flashes to tell you which area is being repainted.

Just to give you an example: if I move the mouse over here, I see this yellow rectangle. If I go over the Dock, I can see which areas are actually being repainted. So the obvious thing to do is drag a star around and see what happens.

Whoa, that's not good. This tells me that every time I drag a star, I'm actually repainting the whole scene. The scene isn't complex, but it has a lot of images, about 1,000 of them. Let's drag this tiny star: that's pretty bad. So let's see how we can fix that. Usually I go to Quartz Debug and turn off the flashing, because while you're debugging and coding you really don't want all that flashing around.

So just close the Sky Creator app. Again, you can take a look at the code; I'm not going to go over how to fix it. But the obvious fix is not to repaint the whole screen. Repaint only what's necessary: the dirty area that you just invalidated by dragging that star.
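The Sky Creator code isn't shown in the session. Here is a hypothetical sketch of the fix it describes: when a star moves, invalidate only the union of its old and new bounds, and skip stars outside the clip when painting. DraggableStar, dirtyRegion, and needsPaint are our own names. JComponent.repaint(Rectangle) and Graphics.getClipBounds() are standard Swing/AWT calls:

```java
import java.awt.Graphics;
import java.awt.Rectangle;
import javax.swing.JComponent;

/** Hypothetical star that repaints only the area its move actually dirtied. */
class DraggableStar {
    final Rectangle bounds;

    DraggableStar(int x, int y, int size) {
        bounds = new Rectangle(x, y, size, size);
    }

    /** Smallest rectangle covering both the old and the new position. */
    static Rectangle dirtyRegion(Rectangle before, Rectangle after) {
        return before.union(after);
    }

    /** True if a star with these bounds overlaps the area being repainted. */
    static boolean needsPaint(Rectangle starBounds, Rectangle clip) {
        return clip == null || clip.intersects(starBounds); // null clip: paint everything
    }

    /** Move the star and ask Swing to repaint only the dirty region. */
    void moveTo(int x, int y, JComponent canvas) {
        Rectangle before = new Rectangle(bounds);
        bounds.setLocation(x, y);
        canvas.repaint(dirtyRegion(before, bounds)); // rather than canvas.repaint()
    }

    /** Call from paintComponent for each star; stars outside the clip are skipped. */
    void paint(Graphics g) {
        if (needsPaint(bounds, g.getClipBounds())) {
            g.fillOval(bounds.x, bounds.y, bounds.width, bounds.height);
        }
    }
}
```

Because union returns a bounding box, a fast drag invalidates a larger rectangle, which matches what Quartz Debug shows in the fixed version. Repainting the two rectangles separately is an alternative when moves are long and diagonal.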

So let's go back to the demos. If you're following along, go to the Sky Creator directory. There should be an application called skycreator.faster; in the DMG it's skycreator.faster.zip. Double-click it to unzip the app. Again, all of this code is in the Xcode project, and I hope you'll go home and take a look at it.

This is Sky Creator Faster, which means we have optimized the repainting. If I drag a star around, as you can see, it follows along closely. It doesn't feel sluggish at all. Let's drag a smaller star: it follows along perfectly. Let's drag a bigger star: no problem. It feels very zippy, and it should feel zippy on your three-year-old laptop as well.

Let's run Quartz Debug just to verify that I've indeed fixed the problem. Click Flash Screen Updates, and then drag the star around. As you can see, I'm only painting the area that I just invalidated. Let's drag a smaller star; this will be even more obvious. I'm repainting just this area, which is much more efficient than repainting the whole complex scene. And the faster you drag, the bigger the region, because that's the region you actually invalidated. So I'm going to turn off Quartz Debug.

To summarize, it's a very simple tool. If you're writing a graphics application, please run it before you ship; it will tell you whether you're doing something very silly. With that in mind, can we switch back to the slides? Now I would like to invite Christy to talk to you about Shark. Thanks.

Good morning, everyone. I hope you're enjoying WWDC this week; I have been. I'm here to tell you about Shark for Java. We added it just last year, and it has gotten a good response in the community. So what is Shark? Shark is a profiling tool. It lets you profile a running process, a thread, or the entire system. For Java you can only profile an individual process, but you can take time samples to find hotspots, trace memory allocations to see patterns of memory usage in your program, or even do exact traces of method calls.

For non-Java profiling, you can profile time, memory, function calls, system activity, and hardware events. This is useful for, say, JNI calls, or if you're running in a heterogeneous environment with some Java and some C or other servers, where it's very useful to see where the time is being spent.

[Transcript missing]

Another really special thing in Shark is data mining. How many of you, after profiling, don't see time spent in your own code, but somewhere in java.awt or java.lang.String? How many of you have had that problem when profiling? A lot of you. Well, with Shark you can eliminate what you don't want to see. You can take something like java.lang.String and charge it to the caller, removing it from your samples and seeing who is using java.lang.String.

You can also charge a symbol in a similar fashion. The other half of this is helping you see what you want. You can focus on a particular function and the subtree that comes out of it, rather than seeing everything that's going on in your program. This helps you narrow down the contributions to a specific performance problem. Similarly, you can focus on a package. Let me illustrate these graphically. For charging packages to a caller, we have a main program here.

There's an initialization section, then a call to the main function, do_example, and then some cleanup. Let's say do_example calls some function bar, and bar calls into java.util. If we were to profile this, we would see samples in java.util and not in bar, even though bar is the function we're calling the most.

But charging java.util suddenly pushes all those costs up into bar and shows you what's really going on. With this tool you can see what's happening in the middle of the call tree, which is where the interesting, and usually hard to track down, stuff is.

In some cases we may not want to remove a package completely, but replace all calls into it with just the entry point. For example, if you call StringBuffer.append, you don't want to see all the java.lang.String calls that append happens to use in the current version of the library, but you do want to see how many times you've called append in your program. We call that flattening a package. In this case you get rid of the package's internals, but we keep the entry-point call, here Hashtable.contains.

Finally, focusing. Suppose we don't care what's going on in the initialization and cleanup, which may also call heavily into java.lang.String; we want to see whether the part of our program that's doing the work is also hitting that code heavily. So here we focus on do_example, which strips away the outer parts of the call tree and leaves just the portion we're interested in.
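Charging, flattening, and focusing operate on Shark's view of a profile, not on your code, but a small program with the same shape makes a convenient target to practice on. This hypothetical workload mirrors the call tree from the slides: init, then do_example calling bar, which leans on java.util.Hashtable.contains (a linear scan over the table's values), then cleanup. The method name do_example follows the slide rather than Java naming conventions:

```java
import java.util.Hashtable;

/** Hypothetical workload shaped like the slide: init, do_example -> bar -> java.util, cleanup. */
class DataMiningExample {
    static Hashtable<Integer, String> table = new Hashtable<Integer, String>();

    static void init() {
        for (int i = 0; i < 1000; i++) {
            table.put(i, Integer.toString(i)); // startup work that focusing strips away
        }
    }

    /** The function we care about; most of its time is spent inside java.util. */
    static int bar(int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            // Charging java.util to the caller moves these samples onto bar();
            // flattening keeps a single Hashtable.contains entry instead.
            if (table.contains(Integer.toString(i % 2000))) {
                hits++;
            }
        }
        return hits;
    }

    static int do_example(int n) {
        int hits = bar(n);
        StringBuffer log = new StringBuffer();
        log.append("hits=").append(hits); // StringBuffer.append, as in the flattening example
        System.out.println(log);
        return hits;
    }

    static void cleanup() {
        table.clear();
    }

    public static void main(String[] args) {
        init();
        do_example(200000);
        cleanup();
    }
}
```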

So without further ado, I'm going to do a little demo. Can we please switch to demo 1? For those of you following along, please open the Bouncy Xcode project. I'm just going to run the app once to show you what it looks like.

It's just a cute little demo that shows a string bouncing around on the screen. One of the important things when you're working on an application's performance is to create instrumentation. We saw several examples, such as Scott with his Zombies program, and you'll see it in some later demos too. It's really important to add some kind of measurement, like frames per second or whatever is useful for your application, to keep track of how it's doing. So let's quit that. Now, to use Shark for Java, we have to do a few things.

There's a command-line argument. Shark for Java uses JVMPI, the Java Virtual Machine Profiler Interface, so you need to add an -Xrun argument similar to the one you see here: -XrunShark. It's also important that it goes before any -jar option. If you put it afterwards, down here, it won't work, and it will fail mysteriously.

You'll just go, "Well, why aren't I seeing this?" So with that in place, we can rerun the application. To verify that Shark for Java is enabled, it prints this text here, "Shark for Java is enabled," so you know it's running. Now let's navigate to Shark.

It's in /Developer/Applications/Performance Tools, right next to Quartz Debug. We'll launch it. This is the main Shark window. It has a Start button and a pop-up menu of the different profiling configurations you can select. There are a lot of them, but we're going to choose Java Time Trace, and it'll ask us to pick a process.

Then there's a list of all the JVMPI processes that were launched with -Xrunshark. Another important thing to do, and those of you following along need to do this now, is to go to Search Paths in Shark's preferences and, under Source Files, add the path to the Bouncy folder in your demos. Just click the plus sign, navigate to that folder, and choose Open.

Once you do that, we can start sampling. For time-based sampling, you want to sample for about 5 or 10 seconds so you get a reasonable number of samples. For memory allocation traces or exact traces, keep the sample relatively short, or you'll get too many samples and it'll be slow.

That's enough time for sampling. Now we get a view here: a list of all the leaf functions that were called and the percentage of time spent in each. One useful thing: if you click the little box in the lower right, you can show a stack view. If you click a symbol, you'll see its backtrace on the right. So CTextPipe.doDrawString is called by OSXSurfaceData, and so on.
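The bottom-up list boils down to counting, for each function, how many samples had it as the innermost frame. A minimal sketch, assuming each sample is modeled as a list of frame names from the outermost caller inward; the class name and frame names are hypothetical and only illustrate the idea, not how Shark computes its view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HeavyView {

    /** Percentage of samples in which each function was the leaf (innermost frame). */
    static Map<String, Double> leafPercentages(List<List<String>> samples) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (List<String> stack : samples) {
            String leaf = stack.get(stack.size() - 1);
            Integer count = counts.get(leaf);
            counts.put(leaf, count == null ? 1 : count + 1);
        }
        Map<String, Double> percent = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            percent.put(e.getKey(), 100.0 * e.getValue() / samples.size());
        }
        return percent;
    }

    public static void main(String[] args) {
        List<List<String>> samples = new ArrayList<List<String>>();
        for (int i = 0; i < 3; i++) {
            samples.add(Arrays.asList("main", "Bouncy.paint", "CTextPipe.doDrawString"));
        }
        samples.add(Arrays.asList("main", "Bouncy.paint", "Bouncy.paintBalls", "java.awt.Font.<init>"));
        // CTextPipe.doDrawString gets 75%, java.awt.Font.<init> gets 25% (map order unspecified).
        System.out.println(leafPercentages(samples));
    }
}
```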

You can click on different functions to see their backtraces. So this is a bottom-up view of the world. It's also useful to see a top-down view, so I'll switch using this pop-up. You can choose Heavy, Tree, or Both; I'm going to show both here. If you click on this, you'll see a view from main all the way down to where, somewhere deep, it calls Bouncy.paint.
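The top-down Tree view can be sketched as a call tree in which every node counts the samples that passed through it, so a node's count includes time spent in its callees. This is a hypothetical model, assuming each sample is a list of frame names from the outermost caller inward; it is not how Shark stores its data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CallTreeNode {
    final String name;
    int samples; // samples that pass through this node (inclusive time)
    final Map<String, CallTreeNode> children = new LinkedHashMap<String, CallTreeNode>();

    CallTreeNode(String name) {
        this.name = name;
    }

    /** Adds one sampled stack, outermost caller first, below this (root) node. */
    void add(List<String> stack) {
        samples++;
        CallTreeNode node = this;
        for (String frame : stack) {
            CallTreeNode child = node.children.get(frame);
            if (child == null) {
                child = new CallTreeNode(frame);
                node.children.put(frame, child);
            }
            child.samples++;
            node = child;
        }
    }

    /** Appends an indented, top-down listing of this node's subtree. */
    void print(String indent, StringBuilder out) {
        for (CallTreeNode c : children.values()) {
            out.append(indent).append(c.name).append(" (").append(c.samples).append(")\n");
            c.print(indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        CallTreeNode root = new CallTreeNode("<root>");
        root.add(Arrays.asList("main", "Bouncy.paint", "Bouncy.paintBalls"));
        root.add(Arrays.asList("main", "Bouncy.paint", "Bouncy.paintBalls"));
        root.add(Arrays.asList("main", "Bouncy.moveBalls"));
        StringBuilder out = new StringBuilder();
        root.print("", out);
        System.out.print(out);
        // main (3)
        //   Bouncy.paint (2)
        //     Bouncy.paintBalls (2)
        //   Bouncy.moveBalls (1)
    }
}
```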

Now, getting back to data mining: from the Data Mining menu, you can choose Charge Library java.awt to Callers. That eliminated about half the calls on the right side here. Let's keep going: let's get rid of apple.awt, and let's get rid of sun.awt. Now we have a much simpler view.
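Charging a library to its callers amounts to deleting that library's frames from every sample, so the time lands on the nearest remaining caller. A minimal sketch, assuming samples are lists of frame names (outermost first) and that libraries are matched by package-name prefix; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ChargeToCallers {

    /**
     * Removes every frame whose name starts with one of the given package
     * prefixes, so the sample is attributed to the nearest remaining caller.
     */
    static List<String> charge(List<String> stack, List<String> packages) {
        List<String> out = new ArrayList<String>();
        for (String frame : stack) {
            boolean charged = false;
            for (String pkg : packages) {
                if (frame.startsWith(pkg + ".")) {
                    charged = true;
                    break;
                }
            }
            if (!charged) {
                out.add(frame); // frame is outside every charged library, so it stays
            }
        }
        return out;
    }
}
```

Applied to the java.awt, apple.awt, and sun.awt prefixes, a sample that runs from the AWT event thread through paint and paintBalls into the text pipeline collapses to just the application's own frames.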

We have paint and paintBalls. In just a few steps we've reduced the problem from a big, complex world that I can't understand (maybe someone in the audience can, but I can't) to a relatively simple world of code that's familiar to me, since I wrote Bouncy. Now, you'll notice that Bouncy.paint is underlined.

So we double-click on Bouncy and see the source. It's highlighted in different colors depending on how much time was spent on each line. We see that most of the time was spent in paintBalls, which is also underlined, so we can click on it like a hyperlink. And here, oh, we see our problem. I must have been asleep when I wrote this code: I'm allocating a new font every time I draw one of these balls. That seems kind of dumb.

Luckily, I have this Edit button here. So I found my problem, I click Edit, and it takes me right to the line of code I need to change. Isn't that handy? We're going to comment these lines out. Luckily, just for this demo, I have the corrected version here. Let's rerun this.
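The fix boils down to creating the `Font` once and reusing it, instead of constructing a new one for every ball on every frame. The class, method, parameters, and font choice below are hypothetical stand-ins for the Bouncy source, which isn't shown in the session.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;

/** Hypothetical stand-in for the demo's ball-drawing code. */
class BallPainter {
    // Created once and reused on every frame, instead of once per ball per frame.
    static final Font BALL_FONT = new Font("SansSerif", Font.BOLD, 18);

    static void paintBalls(Graphics2D g, int[] xs, int[] ys, String label) {
        // The slow version effectively did this inside the loop, once per ball:
        //   g.setFont(new Font("SansSerif", Font.BOLD, 18));
        g.setFont(BALL_FONT);
        g.setColor(Color.BLUE);
        for (int i = 0; i < xs.length; i++) {
            g.drawString(label, xs[i], ys[i]);
        }
    }
}
```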

You'll see that with just that one fix, it's dramatically faster. This is a bit of a cooked demo, but I've worked on optimizing Java and other applications on servers for quite a long time, and you'd be amazed how many problems you find that are this easy. It's just a matter of profiling your program, finding the most significant problem, fixing it, profiling again, and iterating like that. Before you know it, you'll have your app running two, three, even 10 times faster. Thank you.

At this point, let's go back. I'm going to hand it over to Josh. Thanks, Christy. To show you a bit more about Shark for Java, we wrote a little demo application that generates fractals. It generates the fractal data and then displays it on the screen.

It's displayed as a height map, which is pretty much what you'd see looking down at the ground from a plane. It repeats this process as many times and as fast as it can, and we're going to use Shark to find where the hotspots are. So let's jump right into the demo. Kevin, thank you. All right, let's close Bouncy here.

If you go to your demos folder, you'll find the Fractal Performance folder inside it. We're going to load demo number one.

[Transcript missing]

As you can see right off the bat, this application is pretty slow. In the upper left-hand corner you'll see the number of fractals per minute it's generating.

I've also added some more timing information to this application: if you hit the spacebar once, the next time it renders a frame it'll show you how much time it's spending in rendering and in generation. Right now we're spending about 3.2 seconds in rendering and almost 2 seconds in generation. The problem is that we don't know exactly where that time is being spent, so this is a great use for Shark.
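Per-phase timing like the spacebar readout can be sketched by taking `System.nanoTime()` readings around each phase of a frame. `FrameTimings` and its methods are hypothetical; the demo's own instrumentation code isn't shown in the session.

```java
/** Hypothetical per-frame timer that splits time between generation and rendering. */
class FrameTimings {
    private long generateNanos;
    private long renderNanos;

    /** Runs one generate/render cycle and records how long each phase took. */
    void timeFrame(Runnable generate, Runnable render) {
        long start = System.nanoTime();
        generate.run();
        long generated = System.nanoTime();
        render.run();
        long rendered = System.nanoTime();
        generateNanos = generated - start;
        renderNanos = rendered - generated;
    }

    /** Timings of the most recent frame, in milliseconds. */
    String report() {
        return format(generateNanos, renderNanos);
    }

    static String format(long generateNanos, long renderNanos) {
        return "generate: " + generateNanos / 1000000L + " ms, render: " + renderNanos / 1000000L + " ms";
    }
}
```

Reporting the two phases separately is what lets you see that rendering, not generation, dominates the frame.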

So let's bring up Shark again. The first thing I want to do is go to the preferences and make sure the source code for this application is in the search path. We already have it set up here, but you'll need to add it.

What I want to show first is an allocation trace, so let's choose Java Alloc Trace. Make sure that application number one is selected. I want to take a really quick trace, a sample at the beginning of the loop when it starts generating the fractal. So I'm going to wait for it to start rendering a new screen, a new shot.

Then I take a really quick sample. Now it's generated some information for me. We're not going to worry about the profile view here; what's really interesting is the chart. If you click on the chart, you can zoom in on it using this little zoom bar down here.

You'll start to see a repeating pattern, which is kind of the heartbeat of the application. This fractal is generated recursively: for every section of the fractal, we calculate four new subsections. You can see that in all these peaks here; every peak has four more peaks sprouting from it. That's the recursive method at work.

So let's click on one of these peaks. Over on the right, you'll see the backtrace for it, and you can see that we're calling that recursive method. Let's look at the code and see what the heck we're doing.

It takes us right to this allocation, where we're allocating a Point object, and says that 50% of the allocation is due to it. Let's go down a bit more. Here's another one: we're allocating another Point object. We create all these Point objects so we can call the recursive method with the different subsections of the fractal.

But we're not really using much of the Point object here; all we use it for is to define the corners of these squares. Instead of Point objects, we can use plain integers and avoid all this allocation. So let's close this demo.
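To illustrate the idea, here is a sketch of a recursive height-map generator that splits each square into four sub-squares and passes the corners as plain `int` coordinates, so the recursion itself allocates nothing. It is a simplified midpoint-displacement scheme chosen for illustration, not the demo's actual algorithm, and every name in it is hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical recursive height-map generator. Each square is described by
 * four ints instead of two Point objects, so the recursion allocates nothing.
 */
class HeightMap {
    final float[][] h; // indexed h[x][y]
    private final Random random;

    HeightMap(int levels, long seed) {
        int size = (1 << levels) + 1; // 2^levels + 1 samples per side
        h = new float[size][size];
        random = new Random(seed);
    }

    void generate(float roughness) {
        int last = h.length - 1;
        subdivide(0, 0, last, last, roughness);
    }

    private void subdivide(int x0, int y0, int x1, int y1, float scale) {
        if (x1 - x0 < 2) {
            return; // no interior point left to compute
        }
        int mx = (x0 + x1) / 2;
        int my = (y0 + y1) / 2;
        // Edge midpoints: average of the two corners on that edge.
        h[mx][y0] = (h[x0][y0] + h[x1][y0]) / 2;
        h[mx][y1] = (h[x0][y1] + h[x1][y1]) / 2;
        h[x0][my] = (h[x0][y0] + h[x0][y1]) / 2;
        h[x1][my] = (h[x1][y0] + h[x1][y1]) / 2;
        // Center: average of the four corners plus a random displacement.
        h[mx][my] = (h[x0][y0] + h[x1][y0] + h[x0][y1] + h[x1][y1]) / 4
                + (random.nextFloat() - 0.5f) * scale;
        // Four sub-squares, each with a smaller displacement.
        float next = scale / 2;
        subdivide(x0, y0, mx, my, next);
        subdivide(mx, y0, x1, my, next);
        subdivide(x0, my, mx, y1, next);
        subdivide(mx, my, x1, y1, next);
    }
}
```

The four recursive calls per square are what produce the "every peak has four more peaks" pattern in an allocation or call chart; with int corners, those calls no longer show up as allocations at all.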

[Transcript missing]

Go to folder 2 in Fractal Performance, and let's open 2.xcode, which is the second demo.

Let's look at Float Fractal.

[Transcript missing]

Okay, so the application's still a little slow, though we have increased the frame rate a bit. If we hit the spacebar once, we'll see the times again. And lo and behold, generation time has dropped from almost two seconds to 123 milliseconds. That's a huge improvement. But we're still spending a lot of time in the rendering code; that's where the majority of our time goes. So we're going to use Shark again to investigate what's going on.

So I'm going to bring up Shark again. Since this is a different demo, I want to add the search path for this application. So let's do that: Desktop, demos, Fractal Performance, Demo 2. And I'm going to remove Demo 1 so Shark doesn't get confused.

All right, instead of an allocation trace, let's do a time trace this time. Choose Java Time Trace and make sure process 2 is selected. For a time trace we want a good, representative sample, so we're going to let it generate a couple of fractals.

We want it to display to the screen, go through the generation code, and display again. It's gone through about two now, so I'll stop. Now I'll bring up the profiling information. At the top of the chart here is OSXSurfaceData.finishLazyDrawing.

[Transcript missing]

If we double-click on this, it brings us to the code so we can see exactly what we're doing. In the painting code, we've generated this fractal, and when we display it on the screen we compute a color from each point's height and then draw a pixel at that position.

But as you can see here, we're creating a new Color and filling every single pixel with it, which is extremely slow. In fact, it's just the wrong way to do it. There's something called a WritableRaster, which is a better way of doing this: you give it the color information for all of the pixels at once. That's what we've done for the third demo. So I'm going to close this and go to Fractal Performance Demo 3.xcode. The code is in Fractal Viewer.

Double-click on that and search for WWDC. Right here, the comment says: use a writable raster to draw the pixel data rather than creating a new color and painting each point. So what we've done now is create one big array with all the color information and pass it off to the WritableRaster. It's part of the core libraries and is optimized for this kind of painting. So let's see what kind of performance increase we get: hit Build and Go.
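A minimal sketch of the WritableRaster approach: fill one `int` array with packed RGB values and hand it to the image's raster in a single `setDataElements` call. The class name, the height-to-color mapping, and the assumption that heights are normalized to 0..1 are all hypothetical; the demo's own mapping isn't shown.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

class HeightMapRenderer {

    /** Hypothetical mapping: height 0 is blue, height 1 is white, as packed 0xRRGGBB. */
    static int colorFor(float height) {
        int level = Math.max(0, Math.min(255, (int) (height * 255)));
        return (level << 16) | (level << 8) | 255;
    }

    /** Builds the whole image with one raster call instead of one fill per pixel. */
    static BufferedImage render(float[][] heights) {
        int width = heights.length;
        int height = heights[0].length;
        int[] pixels = new int[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                pixels[y * width + x] = colorFor(heights[x][y]);
            }
        }
        // Slow version, one Color object and one fill per pixel:
        //   g.setColor(new Color(colorFor(heights[x][y])));
        //   g.fillRect(x, y, 1, 1);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        WritableRaster raster = image.getRaster();
        // For TYPE_INT_RGB, each data element is one packed RGB int.
        raster.setDataElements(0, 0, width, height, pixels);
        return image;
    }
}
```

In `paint`, the result would then be drawn with a single `g.drawImage(image, 0, 0, null)` call.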

Here we go. Now we're getting about 250 fractals per minute. If you hit the spacebar again, you'll see the new timings: instead of spending three seconds in rendering, we're spending only 100 milliseconds. At this point our demo is sufficiently performant, so we don't need to go any further.

But the point is that analyzing your application and improving its performance is an iterative process, and Shark is a key part of that because it gets you this information quickly. Can we go back to the slides now? For more information, you can check out the website and sample code at developer.apple.com/wwdc2005. There are some related sessions as well: Maximizing Java Performance, which is in the lab right after this, where we'll be there to answer your questions and help you with your applications. And there's also the Java and WebObjects Feedback Forum.