WWDC04 • Session 428

Maximizing Java Virtual Machine Performance

Application • 55:28

Improve your Java application's performance by choosing the right APIs and tuning the Java Virtual Machine to your needs. We focus on the performance capabilities of Java on Mac OS X and provide information about the latest APIs, garbage collection alternatives, and other "under-the-hood" optimizations available to you.

Speakers: Victor Hernandez, Roger Hoover, Christy Warren

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Hello. Welcome to Maximizing Java Virtual Machine Performance. This session will be given by three of us, myself, Victor Hernandez, and Roger Hoover from the Java Virtual Machine team, and also Christy Warren, who is the responsible person for responsiveness in Tiger. So what we'll be talking about is the Hotspot Virtual Machine in Mac OS X that is used to actually execute your Java applications.

This Hotspot Java Virtual Machine comes from Sun, and we take that source, and we tailor it for Mac OS X and optimize it specifically for PowerPC. Currently, we are supporting J2SE 1.4.2, and as announced yesterday, we will be supporting J2SE 5.0. Everywhere throughout this talk, we'll be referring to it as Java 1.5.0, which is what we all know it as, but the official name is J2SE 5.0.

So what do you get with the Hotspot Virtual Machine on Mac OS X? You get a variety of features. You get a client just-in-time compiler, a variety of garbage collection algorithms, an implementation of class data sharing, which was innovated by Apple starting in Java 1.3. You get native G5 support, also a JDK whose classes are optimized specifically for Mac OS X. And finally, you get the debugger and profiler interfaces, JVMDI and JVMPI, which development tools can use to analyze your application. Uh-oh.

What's going on? This thing is a little confused. All right. Sorry, I wasn't looking at the slides. And now with 1.5, there are a whole bunch of new features. Specifically, you will be getting new language features in the Java language, which will make your development time a lot more-- basically, it'll simplify a lot of development. There's also a new client compiler feature called safe point polling. Startup time should also be a bit faster.

And there will be explicit concurrency exposed in the Java APIs. Also, we're really excited about this one. The class data sharing implementation from Apple has been adopted by Sun, and it will be available on all of their platforms in Java 1.5. They've not only taken our source, but they've also optimized it themselves. And so what you will be seeing will be improvements to our initial sharing implementation.

Finally, there will be a new tools interface, which will be replacing the debugger and profiler interface that have been deprecated. So Java 1.5 is available for you today. It is equivalent to the beta 2 version that is presently being previewed by Sun. It installs on your Tiger Preview DVD that you got at the conference this week, and you can go ahead and download it from connect.apple.com.

So today's talk will be divided into three parts. First, Roger Hoover will be discussing what is new in Java 1.5. The second part, I will be discussing how the Hotspot Virtual Machine optimizes your application. And then finally, Christy Warren will be introducing a very exciting new Mac OS X application for profiling your Java applications. So here's Roger.

Hello. I'm going to give you a brief overview of the new and pretty exciting things that are in 1.5. I was over at JavaOne a little bit earlier, and they've got whole talks for each of my slides. So this is going to be very high level, very quick, but I'm doing this to get you interested in new things in 1.5 if you haven't seen them before, and also to point you to where to find more information about those pieces that you're interested in.

So, biggest changes pretty much in the history of Java. There are lots of language changes, and I'll go into those in the coming slides. There are also a bunch of library and runtime changes, which are also pretty interesting. Note the blue bubble with the JSR numbers in it. This is the Java community process. It has these Java specification request numbers that correspond with the specifications for this new stuff, and if there's something you want more information in, remember the number in the blue bubble.

And you'll be able to look it up with the URL that I'll have at the end. Okay. Why change the language? Well, there are some great things in here that I think are going to give a lot of improvements in programmer productivity. Most of these changes are handled by javac, but there are a few things that touch the VM.

There's a flag, -source 1.5, for javac that turns these things on. It was not the default in earlier betas, but with beta 2, the stuff that we're giving you, it is the default. There's a new keyword called enum. If you used enum as an identifier in the past in your old code, you're going to have to say -source 1.4 to turn off these things in order to compile your old code.

Okay. Let's look at some of these great features. The first one is auto-boxing and auto-unboxing. Well, what's boxing? Consider a primitive type like int or boolean to be an unboxed type, and capital-I Integer and capital-B Boolean to be boxed types. Namely, the primitive value is inside an object box.

So with 1.4, if you used the boxed types and the unboxed types together, you had to do lots of conversions. In this example here, I show that you are all the time creating new containers for things. With 1.5, the compiler does this for you. It type checks it, and the code's much more readable, simpler. This is excellent for anybody dealing with these things.
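The slide with the boxing example is not part of this transcript. A minimal sketch of the contrast being described, with a hypothetical class name and methods, might look like this:

```java
public class BoxingSketch {
    // Explicit conversions, as Java 1.4 required. (1.4 code would have written
    // new Integer(...); Integer.valueOf(int) is used here only to avoid a
    // constructor that is deprecated in current JDKs.)
    static Integer addExplicit(Integer a, int b) {
        return Integer.valueOf(a.intValue() + b);
    }

    // Java 1.5 and later: the compiler inserts the unboxing and boxing itself.
    static Integer addAuto(Integer a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(addExplicit(40, 2));
        System.out.println(addAuto(40, 2));
    }
}
```

Both methods compile to essentially the same conversions; only the source differs.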

Generics. People have talked about doing generic types for a while in Java. It's finally here. Before you had the dilemma in Java of either writing a very general type using object and then having to do all these casts in and out and worrying about doing the type checking or having runtime cast exceptions when you did it wrong, or you could make something very specific and not be able to reuse them.

Now you can write the code with parameters for the types that are embedded in the data type and get the reusability and get the safe type checking from the compiler, which then generates the cast. This is for object types only. You can't use primitive types as the parameters.

Here's an example: a pair that takes an arbitrary left and right type, capital L, capital R, a constructor for that, and public accessor functions for looking inside. And then at the bottom two lines, the next to last line has a place where we're creating a new pair of capital-I Integer and String.

And note that the 17 in that line there is going to be auto-boxed for you, so these features interact. And also a teaser at the last line there, we can now do C-style printing, and I'll get to that in a minute. But there's where we're calling the accessor functions to pull the values out.
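The slide itself is not reproduced here. A sketch of a pair class along the lines described, where the accessor names, the string value, and the format string are assumptions made for illustration, could be:

```java
public class PairSketch {
    // A generic pair with type parameters L and R, as described in the talk.
    public static final class Pair<L, R> {
        private final L left;
        private final R right;

        public Pair(L left, R right) {
            this.left = left;
            this.right = right;
        }

        public L getLeft() { return left; }     // accessor names are hypothetical
        public R getRight() { return right; }
    }

    // Formats a pair with C-style formatting; no casts are needed on the accessors.
    static String describe(Pair<Integer, String> p) {
        return String.format("%d and %s", p.getLeft(), p.getRight());
    }

    public static void main(String[] args) {
        // The int literal 17 is auto-boxed to an Integer by the compiler.
        Pair<Integer, String> p = new Pair<Integer, String>(17, "seventeen");
        System.out.printf("%d and %s%n", p.getLeft(), p.getRight());
    }
}
```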

Those of you who know C++, here's equivalent code that does exactly the same thing in C++. The major difference is that C++ compilers typically instantiate the template for every instance, and thus you can use primitive types, and you can't do that in Java. So here we're using int and char*.

Okay, another feature, static import. Why static import? Well, there's two reasons why you'd want this. One reason is that it eliminates a binary compatibility issue with importing an entire interface. You're just pulling out the static methods and fields of another class, and so it's simpler for the compiler. But probably the main reason that you want to use this is it simplifies the naming, because you can actually import those names into the current namespace.

So in 1.4, if you were going to use this MyMath class that has this constant pi and this method times, you have to be saying MyMath.this and MyMath.that all the time. But in 1.5, if you do an import static, you can use pi and times without qualification. So simpler code. The compiler does all the work to make it right.
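The MyMath class from the slide is not available, so the sketch below substitutes java.lang.Math as a stand-in. It only illustrates the difference between qualified and statically imported names; the class and method names are hypothetical.

```java
import static java.lang.Math.PI;
import static java.lang.Math.sqrt;

public class StaticImportSketch {
    // Without static import: every use is qualified with the class name.
    static double circleAreaQualified(double r) {
        return Math.PI * r * r;
    }

    // With static import: PI is used as if it were declared in this class.
    static double circleArea(double r) {
        return PI * r * r;
    }

    public static void main(String[] args) {
        System.out.println(circleArea(2.0));
        System.out.println(sqrt(16.0));   // statically imported method
    }
}
```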

The for loop has been enhanced. Instead of having to specify the induction variable in the loop, if you're using arrays or the new java.lang.Iterable type, you can simply say for, type, variable, colon. Well, here's an example of a string concatenation in the old method. Here's what you can write now. You simply name the variable that represents each iteration of the array, the piece of data you're interested in, and you can use it in the loop. Again, simpler code, compiler does the work for you.
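As a rough reconstruction of the string concatenation example (the slide is not in the transcript, so the names here are assumptions), the old and new loop forms could be compared like this:

```java
public class ForEachSketch {
    // Old style: an explicit induction variable indexes the array.
    static String concatIndexed(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            sb.append(words[i]);
        }
        return sb.toString();
    }

    // Enhanced for loop: name the element variable and the array (or Iterable).
    static String concatForEach(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c"};
        System.out.println(concatIndexed(words));
        System.out.println(concatForEach(words));
    }
}
```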

I showed you the printf example before. This is enabled by variable-arity methods. You can do this. Here's an example. You can say type name dot dot dot in a method definition, and the compiler automatically converts that into an array. So you simply use it as an array. This function here just concatenates-- or no, it chooses a maximum of a bunch of strings that you give it. But you could pass any number of strings to it.
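A sketch of a variable-arity method that picks the maximum of any number of strings follows. The method name and the choice to reject an empty argument list are assumptions, since the slide code is not in the transcript.

```java
public class VarargsSketch {
    // The String... parameter arrives as a String[] inside the method.
    static String max(String... words) {
        if (words.length == 0) {
            throw new IllegalArgumentException("need at least one string");
        }
        String best = words[0];
        for (String w : words) {
            if (w.compareTo(best) > 0) {
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Any number of arguments can be passed; the compiler builds the array.
        System.out.println(max("pear", "apple", "zebra"));
        System.out.println(max("only"));
    }
}
```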

Enumerations. This is similar to the enum type in C. You specify a bunch of constants that get instantiated by the compiler. And you can use it as such by saying, in this case, myColor.yellow picks out an individual one. But this also interacts with static import. And if you import this stuff, you can actually just talk about yellow and red.

But there's a lot more to enumerations than simply a list of constants. You can actually have methods inside enumerations. Here's an example of where I've taken the color and defined another enumeration, fruit, which has a method that does a switch on the type of fruit and returns the color.

Now, note that we were able to say red and yellow because we did a static import. The third one, orange, is actually both a fruit and a color. And if I had just said orange there instead of myColor.orange, the compiler would have complained that that was ambiguous.

So we can go on and in another file import both color and fruit and do computations on them. Here I'm doing apple.myColor. That'll return the color of an apple. And because the compiler keeps these things as unique instances, you can do == on enumeration types, and it gives you equality.
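The enumeration slides are not in the transcript. The sketch below uses hypothetical names (Color, Fruit, color()) and nests both enums in one class so it is self-contained; because of that it qualifies the colors instead of using the static import shown in the talk.

```java
public class EnumSketch {
    public enum Color { RED, YELLOW, ORANGE }

    public enum Fruit {
        APPLE, BANANA, ORANGE;

        // An enum can carry methods; this one switches on the constant itself.
        public Color color() {
            switch (this) {
                case APPLE:  return Color.RED;
                case BANANA: return Color.YELLOW;
                case ORANGE: return Color.ORANGE;  // qualified: ORANGE alone means the fruit here
                default:     throw new AssertionError(this);
            }
        }
    }

    public static void main(String[] args) {
        // Each constant is a unique instance, so == is a valid equality test.
        System.out.println(Fruit.APPLE.color() == Color.RED);
        System.out.println(Fruit.ORANGE.color());
    }
}
```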

Okay, another big thing is metadata. This is currently not hooked in with the metadata stuff you heard about yesterday, Spotlight, although we'd love to do that at some point in the future. But this is metadata inside the Java program that allows you to add additional information into your program.

And there are three parts in order to make this work: declarations, annotations, and runtime access. Declarations say what you're going to keep track of. Annotations say where you use that, and you instantiate that inside the program. And then with runtime access, you can write programs that actually look at this metadata via reflection.

And the good thing about this is that it eliminates the encoding of data into flag classes, like the JAX-RPC stuff did, just to indicate that these are special functions. You can do this more cleanly with metadata. It doesn't have the compiler implications of being dependent upon other classes.

And this will be used for programming, documentation, tools. I suspect we'll see a lot of neat stuff that uses metadata. So how does it work? A metadata declaration is similar to an interface declaration, except you say @interface instead of interface. So you can give a bunch of members. Value is going to be special, and I'll talk about that in a minute. And you can give default values. So here's an example. We've got a bunch of bugs in our code. And so we define metadata called fixme.

[Transcript missing]

Okay, so once we have the declaration, then we need an annotation in order to place this, and we can place the annotations before any declaration in our Java program. There's a special way with a file that we can put a package annotation because Java really doesn't have a declaration of a package that's explicit in code.

And we can also put these in front of enum constants. So what do they look like? Well, here's several in a piece of code. I've got this class called perpetual motion, and you'll note that the thing is preceded by a fix-me annotation that says there's no such thing, and you get a holiday if you fix it.

Well, inside we have a method sum that also has a fix-me annotation, and this time I just have a string. Why does sum subtract? Well, if you don't specify a member name, it assumes value. That's what's special about value. And I didn't have to say what the reward was because there's a default on that. It'll use "cookie" as the default reward.

And finally, notice the debug annotation in the last line. And since debug has no members, I don't have to do the open/close parens, so I can just simply and cleanly put those where I need them. Okay, so how do I use these things? Well, I can write tools that actually look at the source to use these because it just gets compiled out.

But I can also look at them at runtime via reflection. And there's another special annotation called @Retention that is used to tell the compiler to retain that and put it in the class file so reflection can find it. So in my previous definition or declaration of fixme, if I had said @Retention blah, blah, blah, it would remember this stuff for runtime access.

And in this example, I'm using reflection to get at the method that corresponds with sum, and I do a .getAnnotations. So that gives me back an array of these annotations, and then I use the enhanced for loop just to print them out on the screen. So this is how you'd write tools to use the metadata information.
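Part of the metadata walkthrough is missing from the transcript, so the following is only a sketch that pulls the described pieces together. The type names (FixMe, Debug, PerpetualMotion), the placement of the Debug annotation, and the helper methods are assumptions; the value member, the "cookie" default, and the runtime retention follow the talk.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

public class AnnotationSketch {
    // Declaration: retained at runtime so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @interface FixMe {
        String value();                      // "value" may be given without a member name
        String reward() default "cookie";    // default used when no reward is specified
    }

    // A marker annotation with no members, so no parentheses are needed at the use site.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Debug {}

    @FixMe(value = "There's no such thing", reward = "holiday")
    static class PerpetualMotion {
        @FixMe("Why does sum subtract?")
        @Debug
        static int sum(int a, int b) {
            return a - b;                    // the deliberate bug the annotation points at
        }
    }

    // Reads the reward attached to a two-int method of PerpetualMotion, or null if unannotated.
    static String rewardFor(String methodName) {
        try {
            Method m = PerpetualMotion.class.getDeclaredMethod(methodName, int.class, int.class);
            FixMe fixMe = m.getAnnotation(FixMe.class);
            return fixMe == null ? null : fixMe.reward();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    // Reads the reward attached to the class-level annotation.
    static String classReward() {
        return PerpetualMotion.class.getAnnotation(FixMe.class).reward();
    }

    public static void main(String[] args) throws Exception {
        Method m = PerpetualMotion.class.getDeclaredMethod("sum", int.class, int.class);
        for (Annotation a : m.getAnnotations()) {
            System.out.println(a);
        }
    }
}
```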

Great things for people who write concurrent programs with multiple threads. There are some new classes. I'm going to kind of look at these inside out. The java.util.concurrent.atomic package does single-access atomic operations on individual variables, exposed in this API. On top of that, there are locks that are built on those.

Those are completely independent of Java internal synchronization. Then there's java.util.concurrent, which has a bunch of pretty useful classes built on all of this. This was done through a JSR by Doug Lea and company, whom some of you know of. Here's a brief overview of some of the things you can do.

In threads, there's an executor interface. It gives you fairly convenient thread pools without doing lots of work. There are lots of different kinds of queues. There's nanoTime for nanosecond clock time within a given JVM, which performance people are going to love. There are lots of synchronization primitives that match the literature more closely than what's in Java, so you can implement published algorithms more easily, and concurrent access to various kinds of collections. This is great stuff.
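A small sketch that combines three of the pieces mentioned here (a thread pool from the executor framework, an atomic counter, and System.nanoTime) might look like the following. The class name, pool size, and timeout are arbitrary choices for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorSketch {
    // Runs `tasks` increments on a fixed pool and returns the final count.
    static int countWithPool(int tasks, int threads) {
        final AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    counter.incrementAndGet();   // atomic, no synchronized block needed
                }
            });
        }
        pool.shutdown();                         // accept no new tasks, finish queued ones
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int count = countWithPool(1000, 4);
        long elapsedNanos = System.nanoTime() - start;
        System.out.println(count + " tasks in " + elapsedNanos + " ns");
    }
}
```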

Also, in terms of multi-threaded programming, there's a new Java memory model. It says what to expect with multithreaded programs accessing shared storage. There's a specification in the original thread specification for Java that was widely ignored because it says you can't do things that were widely done by optimizing compilers, as well as processors like the PowerPC that reorder instructions.

What this does is it presents a realistic model of what can happen before other things. It also guarantees that when you build a new object that the final fields are set when you exit the constructor, so you don't see an intermediate state there. Practically speaking, what does this mean? Well, we're going to make the Apple 1.5 JVM obey the Java memory model, so you'll be able to count on it. And in particular, you really have to use synchronization any time you're mucking with shared storage.

Either the Java synchronizer or the Java virtual machine. You can use the same thing, but you really have to use synchronization. Also, if you're doing a multi-threaded thing where one thread sets up a bunch of data and then a bunch of other threads take off and start working on it when everything's done, that variable that says that things are done has to be volatile, where the write of that volatile by one thread releases all of the things that were done before it. And the read of that volatile in the other threads acquires all of that stuff. This is the way it works. This is the standard way you should use thread communication via shared variables. They've got to be volatile.
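The "done" flag pattern described above can be sketched as follows. The field and method names are hypothetical; the point is only that the write to the volatile flag publishes the plain writes made before it, and the read of the flag in the other thread makes them visible.

```java
public class VolatileFlagSketch {
    private int[] data;              // plain field, published through the volatile flag
    private volatile boolean done;   // the write releases, the read acquires

    // Producer: set up the data first, then raise the flag.
    void produce() {
        data = new int[] {1, 2, 3};
        done = true;
    }

    // Consumer: wait for the flag, after which the data is guaranteed to be visible.
    int consume() {
        while (!done) {
            Thread.yield();
        }
        return data[0] + data[1] + data[2];
    }

    // Runs one producer thread and consumes on the calling thread.
    static int runOnce() {
        final VolatileFlagSketch shared = new VolatileFlagSketch();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                shared.produce();
            }
        });
        producer.start();
        int sum = shared.consume();
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Without the volatile modifier on the flag, the memory model would allow the consumer to see the flag set before the array contents, or never to see the flag at all.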

Otherwise, don't be surprised if things happen out of order. In particular, things that you've tested and debugged on a multiprocessor G4 are most likely going to fail at some point when you run them on a G5 if you haven't followed the rules. There's a new tool interface that replaces JVMPI and DI, which are now deprecated and going away, presumably in the next version of Java.

This has a slew of functionality that lets you implement these tools, plus hopefully a whole lot more. Basically, agents plug into the JVM -- these are C++ programs or at least a piece of your program -- that say which callbacks they want to get and then are notified, and there's also a whole bunch of functions with which they can query the VM.

So it will be exciting to see what comes of that in the coming months and years. There's a new monitoring and management interface that's basically designed so you can do things like load balancing in server environments. You can look at the memory usage in classes and thread information inside the JVM. You can look at number of processors, CPU utilization in the OS, things like that. And finally, there's lots more things that I don't have time to talk about.
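For the monitoring and management interface, a minimal sketch using the java.lang.management API that shipped with Java 5 is shown below. The class and helper names are hypothetical, and it covers only a few of the values mentioned (processor count, heap usage, thread count, collector names).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ManagementSketch {
    // Number of processors the operating system reports to the JVM.
    static int processors() {
        return ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
    }

    // Bytes of Java heap currently in use.
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("processors:   " + processors());
        System.out.println("heap used:    " + heapUsedBytes() + " bytes");
        System.out.println("live threads: " + ManagementFactory.getThreadMXBean().getThreadCount());
        // Lists whichever collectors this JVM happens to be running with.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector:    " + gc.getName());
        }
    }
}
```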

Here's a list of some more things that you may be interested in, and note the last two lines, the URLs, the next to last line, fill in the JSR number and you'll find out where to get the specs for that particular JSR, and the Sun site, the last line, also has some great information.

Thank you, Roger. So I will be going over some implementation details of the Hotspot Virtual Machine. That should give you a better idea of how your application can be better optimized when running specifically on Mac OS X. So what do you get with the Hotspot Java Virtual Machine? You get a client just-in-time compiler, a variety of garbage collection algorithms, automatic G5 optimization, and the sharing of class data between JVM instances when you have multiple Java applications running on the same machine. You also get the tuned JRE implementation that we have done for Mac OS X. So let's talk a little about the client compiler.

The client compiler dynamically compiles your application's hot methods. After you've called a particular method a certain number of times, we go ahead and stop executing in the interpreter and continue execution in a JIT-compiled version of that method. We have personally optimized the client compiler from Sun for PowerPC. That means that we've come up with optimal code sequences for each Java bytecode, and we've also figured out how to make best use of the full PowerPC register set.

[Transcript missing]

So what exactly is inlining? It's pretty straightforward. In the example I have up on the slides, you got average, it calls sum. There's extra overhead of actually calling the function sum. So ideally, you would want to have a situation where the body of sum is just inlined right into the body of average. You don't want to have to do that in your own code, because the code is not as extensible. So we do it for you dynamically.
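The slide is not in the transcript, but the average and sum example can be sketched as below. The averageInlined method is written by hand only to illustrate what the compiled code is conceptually equivalent to; it is not something the JIT emits as Java source, and the signatures are assumptions.

```java
public class InlineSketch {
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // As written by the developer: a separate call to sum.
    static double average(int[] values) {
        return (double) sum(values) / values.length;
    }

    // Conceptually what inlining produces: the body of sum placed in the caller,
    // removing the call overhead without changing the source structure.
    static double averageInlined(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return (double) total / values.length;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        System.out.println(average(values));
        System.out.println(averageInlined(values));
    }
}
```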

In what situations can we do it? Well, there's a few opportunities that are very straightforward to be able to do this. One is field accessor methods. There's no reason for you to access fields directly via their names. You can use methods to do that. And also, constructors in your Java classes.

We're also able to inline a bunch of intrinsics. Intrinsic methods are methods that we don't even need to look at the implementation of the bytecodes. We know what the behavior of that method is, and we go ahead and execute it in the optimal PowerPC code sequence for those. We know how to do array copy. So this basically applies to JDK classes. And we keep on adding methods as they become either possible to come up with an optimal code sequence or they're used more heavily in the JDK.

And there are examples of a bunch of intrinsic methods. Well, this doesn't include a huge set of methods that are used in your applications, and that's virtual methods. As I said before, it is possible that the target of an invokevirtual bytecode is actually not always the exact same method. It depends on what variety of implementations of those virtual methods have actually been loaded.

But it turns out that we actually are able to inline those if we know that there has only been one implementation of that virtual method that has been loaded, in which case that method can be considered monomorphic. It turns out that in most usage patterns, this is actually the case. So we're actually hitting a large percentage of invokevirtuals with this optimization.

There are, however, limitations to this. Since we're able to inline so many methods, the limiting factor no longer becomes finding opportunities to inline, but actually the size of the compiled method. So if your hot method happens to be really large, it might not be able to inline methods that it calls, or it might not be able to be inlined by its callers. You also need to be aware of the fact that we are unable to inline methods that are synchronized and also methods with exception handlers. That's a limitation in the client compiler.

And I want to leave you with the tip that in previous versions of Java, it was necessary, or it was useful, to use the final keyword to make a virtual method inlineable. And now that's not needed at all. And you should basically be using final only for its object-oriented purpose.

Okay, safe point polling. There's a new feature in the client compiler in Hotspot 1.5. What is a safe point? A safe point is the state that a Java thread needs to reach for exact garbage collection. Specifically, the location of all Java objects needs to be known at that location.

Currently, compiled methods reach a safe point in Java 1.4.2 as follows. Basically, you have a Java thread executing through compiled code, and the virtual machine has to hard suspend that thread, make a copy of the code, and then insert traps at pre-designated locations, and then continue executing at the previous location in the copy.

You actually hit one of the traps, at which point the virtual machine can take over, and it's known that it's at a safe point. The fact that we're hard suspending the threads makes this possibly a dangerous thing, and it also requires a lot of extra overhead in the client compiler to keep track of the locations to insert traps at.

This is now greatly simplified in 1.5. What we're doing in 1.5 is basically you have your compiled method that is currently executing through all the instructions, and every once in a while, there is an access to a safe point page in memory that basically is a no-op. It's not reading or writing anything that's actually useful.

But at some point, the virtual machine decides to memory protect that page, and so the next time you come around to that access, it actually hits a trap, and the virtual machine is able to take over. Basically, the overhead taken for your compiled methods to get to a state for garbage collection will be greatly improved in Java 1.5 because of this. Thank you.

Okay, let's talk a little about garbage collection in Hotspot. Right now, we're currently supporting three different garbage collection algorithms. They're each designed to meet needs of different kinds of applications. There is no longer this notion of one garbage collector to meet all needs. So, the original garbage collector, the serial garbage collector that you're familiar with from Hotspot since Java 1.2, or even before then, is still there and it still is the default collector. But since Java 1.4, there have been two garbage collection algorithms introduced.

First, there's the concurrent mark-and-sweep algorithm, which has been designed to have a higher throughput for larger Java heaps. And there's also the parallel scavenge algorithm, which is designed to have a shorter pause time. What I recommend to you is to run your application with all three and also change the Java heap parameters to see where you can fine-tune your application. There is not one garbage collection algorithm that will apply to everybody.

There are a few small changes to garbage collection happening in Java 1.5. The main thing is that the parallel scavenge collector will now be the default in all Mac OS X server installations. This is similar to Sun's approach at garbage collection configurations for Java 1.5. They're also making it so that all server installations will automatically get the parallel GC.

And we're applying that as well. The way they detect that is any dual CPU machine with greater than 2 gigabytes memory gets classified as a server machine. Therefore, it should get the parallel garbage collector. This will definitely be very good for compatibility between various installations of Java 1.5. We don't necessarily want all of a sudden the performance characteristics to be different just because you didn't happen to get parallel scavenge on our platform.

The other set of differences in Java 1.5 is the fact that we're making it more convenient for you to configure the heap parameters for performance purposes in each garbage collection algorithm. Specifically, in parallel scavenge, you can now designate a percentage of time that you hope that your application spends in GC. Under the covers, that gets translated to particular heap sizes of the permanent generation, the new size. Basically, in the past, you had to kind of know how these garbage collection algorithms were implemented. And now, the goal is to abstract that away for you.

If you've been using UseAdaptiveSizePolicy in Java 1.4, that also has a few convenience flags. Specifically, you can specify how long you want a pause to take and also what ratio of the time of your full Java application is spent doing garbage collection.

Finally, I want to talk about how we've optimized Java for the G5. These optimizations that we've done on the G5 require absolutely no code changes on your part and also no recompilation. Basically, what we've done is we've taken full advantage of the double-word registers available on the G5 and also all of the double-word instructions. This has been done both to the hotspot interpreter and the compiler, and there should be big gains overall, but especially those people who are doing arithmetic of longs, doubles, and floats will see a substantial improvement.

There are specific reasons why. I mean, we can do casts inline from float to integer. Bit extractions from those values are also much faster, and also the square root instruction is actually available on the G5, and we actually call directly into that instead of writing source for that. As you can imagine, that's much faster. Also, synchronization has been improved by taking advantage of the extra lightweight synchronization instruction available on the G5. And so we are taking full advantage of all of the PowerPC instructions for synchronization.

How have we actually been able to measure that performance gain? Well, we've been tracking SciMark 2.0. SciMark 2.0 is a very good example of a few scientific algorithms. It does fast Fourier transform. It does the Monte Carlo approximation. And you might be familiar with the composite score numbers from the Java State of the Union yesterday. But there are a few points that I want to point out that are different from that. But as you can see, these are our scores on a 1.25 gigahertz G4.

The 98 number is definitely pretty low. And our goal with the G5 is definitely to get the number up. We expect it at least twice as fast. The second bar is basically a pretend 2.5 gigahertz G5, just twice as fast. The scores are twice as high as the first numbers. What we actually get on the G5 are something that's substantially larger.

And we're pretty excited about that. This makes the composite score now very competitive with scores reported on Pentium and other platforms. And we're very excited about that. The one thing I do want to point out is, for example, why Monte Carlo is still low. It turns out that Monte Carlo is doing unnecessary synchronization by attaching synchronized to the front of a method. You remove that, and the score increases dramatically. It does increase dramatically on the G4 as well.

The reason I want to point that out is that the client compiler is unable to detect unnecessary synchronization. And it therefore falls into the hands of the Java developer to do analysis on their application to see if this could be the case in your application. And that's it. And I'd like to invite Christy to introduce Shark for Java.
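To make the point about unnecessary synchronization concrete, here is a hedged sketch of a Monte Carlo estimate of pi driven by a small random generator. It is not the SciMark code; the generator, names, and constants are assumptions chosen for illustration. When a generator instance is used by only one thread, the synchronized variant pays for a lock that protects nothing.

```java
public class SyncSketch {
    private long seed = 42L;

    // A simple 48-bit linear congruential step returning a value in [0, 1).
    private double step() {
        seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        return (seed >>> 22) / (double) (1L << 26);
    }

    // Locks on every call, even when only one thread ever uses this instance.
    synchronized double nextSynchronized() {
        return step();
    }

    // Same computation without the lock; safe when the instance is thread-confined.
    double nextPlain() {
        return step();
    }

    // Estimates pi by sampling points in the unit square with a fresh generator.
    static double estimatePi(boolean useSynchronized, int samples) {
        SyncSketch rng = new SyncSketch();
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = useSynchronized ? rng.nextSynchronized() : rng.nextPlain();
            double y = useSynchronized ? rng.nextSynchronized() : rng.nextPlain();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(true, 1000000));
        System.out.println(estimatePi(false, 1000000));
    }
}
```

Both variants produce the same estimate, since they draw the same sequence. Any timing difference depends on the JVM, and this sketch makes no claim about how large it is on a given machine.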

Hi, everyone. I hope you're having a good afternoon today. I'm here to talk about Shark for Java and high-level performance analysis. How many of you have heard of Shark or used Shark? Wow, I didn't expect quite that many, so you know about this program. But now we're going to show it for Java. So I'm going to go through this part pretty quickly then. So Shark is basically the ultimate profiler you can get on Mac OS X.

It's a really neat program. In the past, it's been great for analyzing our C and C++ and Objective-C programs. It does both what's called cost and use analysis. I'll go into that a little more later. It can profile a running process, a thread, or even the entire system.

And for Java, you know, it's limited to just a single process. But it can do time samples, you know, like other profilers do. But it does some things that we used to be able to do in the old sampler programs, such as allocation tracing and even exact method tracing. So it can act like gprof and record every invocation of a method.

So these two other methods are really nice additions to the usual time profiling you see on other profilers. It also does non-Java profiling, as I mentioned, for time, memory, function, even low-level hardware events. If your application is a real-time type thing, you can use that. And it's useful to study JNI calls.

And you can download this beta from developer.apple.com. And please get that, because the version of Shark on your Tiger CD does not have the Java support. We worked really hard in the last few weeks to deliver this for you guys for WWDC, so I hope you enjoy it. But you have to get it off the website. The good news is it runs on both Panther and Tiger.

You know, so you can just take this back to your development system as it is, use it, and just rock with it. It's awesome. So some key features of Shark is it provides a profile view that gives you a simultaneous heavy and tree perspective. And that will become more clear when I show you the demo. And we're also introducing with Shark 4 sophisticated data mining and filtering. We also provide a chart view that lets you visualize the execution of your program.

And especially for enterprise applications, this is a really neat feature. You can do remote profiling over a network. You run a command line tool on your Xserve sitting in a cage somewhere. You can talk to it from Shark through Rendezvous and control it. So you have minimal impact on your server. You can analyze Tomcat, your JSPs, whatever. So that's a really neat thing. And you can learn more about the detailed features of Shark at Got Shark this Friday at 3:30 PM.

So I'm going to talk about just a few general principles here to motivate the data mining. What makes software slow? Well, probably the best known is bad algorithms. You're using a bubble sort instead of a quicksort. You know, if you use a large set of data, then your software is going to go really slow.

Excessive memory allocations and locking, these are primitives that are expensive. Victor just talked about this example with Monte Carlo, where the overuse of a synchronization primitive just hosed performance. Disk I/O, network calls, IPC, these are all really expensive operations compared to doing an add. These things you want to do as little as possible.

Now, a more insidious thing that happens in software is doing the same operation more than once. Let's suppose I write a module that quicksorts properties that I read out of a file. And let's say Victor had written another function that does the same quicksort. These two will just show up as calls to quicksort in the profile, but it won't show the fact that two different pieces of the code in two different parts of the program, you know, did this call to quicksort.

And this is a simple example of what we call complexity in software. You know, I have a little graph of an execution trace here of a program. The horizontal axis is time slices or sample slices. In this case, it's memory allocations. The vertical axis is the call stack depth. So what we're really doing is we're taking, like, slices of your program as it's running and, you know, doing this plot. And you can see these interesting patterns there. And I'll go into that a little more in a minute.

So what do we mean by complexity? Large-scale software has multiple layers and many modules. And the bigger the system, the more of these things you get. And because we're good programmers, we hide the implementation details from our clients. So a method called foo could do something like just set a bit in a class somewhere. Or it could cause a transaction to a database, update some rows, and even result in the launch of a rocket through some I/O devices. One takes microseconds or milliseconds. The other one can take minutes or hours. So innocuous-looking calls can result in crazy, complex, unexpected execution paths.

So going back to this example, which is actually the Finder Get Info dialog, we zoom in. And the patterns of repetition show up in deeper levels. On this fine level, you see repeated structure. And as you're zooming out, it shows up again. This is like two layers that are both doing iteration and repetition. And they're layered. Now imagine this multiplied with five layers. Imagine you're adding AWT, all the Sun libraries, all of your libraries. You're a huge application. This thing can be insane.

So how do we deal with this? Well, in analyzing performance, you can break the impact of an operation into two pieces. The cost of the operation times the number of places it's used. And traditional profilers, for a long time, have made it easy to analyze costs. And I can tell what leaf function I'm in.

The hard part is understanding the patterns of usage. Did Victor and I unintentionally both quicksort the same array when we could have just done it once and one of us accessed a cached copy? And when you introduce over-modularization, people tend to over-abstract, design too much, go kind of crazy with design. You get these really crazy multi-level call stacks, multi-level things that just really kill performance.
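To make the cached-copy idea concrete, here is a minimal sketch. The `PropertyStore` class and its methods are hypothetical and were not shown in the session. It assumes two independent modules both ask for the same sorted data, and it sorts only once until the data changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class for illustration; not part of the session's sample code.
final class PropertyStore {
    private final List<String> raw;
    private List<String> sortedCache;   // null until the first request
    private int sortCount;              // how many real sorts were performed

    PropertyStore(List<String> raw) {
        this.raw = new ArrayList<>(raw);
    }

    // Returns a sorted, read-only view; sorts at most once for unchanged data.
    List<String> sorted() {
        if (sortedCache == null) {
            List<String> copy = new ArrayList<>(raw);
            Collections.sort(copy);
            sortCount++;
            sortedCache = Collections.unmodifiableList(copy);
        }
        return sortedCache;
    }

    // Adding data invalidates the cache so callers never see stale results.
    void add(String value) {
        raw.add(value);
        sortedCache = null;
    }

    int sortCount() {
        return sortCount;
    }

    public static void main(String[] args) {
        PropertyStore store = new PropertyStore(Arrays.asList("zeta", "alpha", "mid"));
        store.sorted();  // first caller pays for the sort
        store.sorted();  // second caller reuses the cached copy
        System.out.println(store.sorted() + " after " + store.sortCount() + " sort(s)");
    }
}
```

The design choice here is to own the sort in one place, so that a second caller in another module cannot repeat the work without knowing it.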

So, analyzing use. Shark provides two classes of features to help you analyze usage. The first one is called call stack data mining. And in this case, what you do is you want to filter unwanted information. How many of you have profiled something and not seen a line of your code in the top profile? Rather, you've seen all these Java libraries, system libraries. Have any of you had that problem? Yeah, I bet you have. That was the first thing that happened to me when I did a profile.

Now, the other side of this is graphical analysis. In this case, you can visualize the dynamic behavior of your program, like those plots that I showed you. Those are not just cute graphs to make a point. Those are data from a real program that I was able to use to find performance problems. And you do this through a technique called software fingerprinting.

And in software fingerprinting, you recognize that if the pattern on the picture looks the same over and over again, it means you're going through the same code path. And if you're going through the same code path, you're either just doing the same thing over and over again, or you're iterating over some array or other structured data. And in that case, you can still look at the opportunity to hoist information.

In other words, in quicksort, you have to do a compare function. But suppose your compare operator then has to go through a whole bunch of different classes and call stacks just to actually get down to where it's doing the real A is less than B. Well, that's not good. You should de-encapsulate that stuff, and you reduce the amount of overhead to do that compare. So the software fingerprinting can also identify those kinds of cases.
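One way to picture hoisting work out of a compare function is the sketch below. `Item` and `expensiveKey()` are invented for illustration and stand in for a key that is only reachable through several layers of calls. The hoisted version extracts each key once and then compares plain ints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example; Item and expensiveKey() are invented for illustration.
final class CompareHoisting {
    static int keyCalls = 0;   // counts how often the layered key lookup runs

    static final class Item {
        final String name;
        Item(String name) { this.name = name; }

        // Stands in for a key that is reached through several layers of calls.
        int expensiveKey() {
            keyCalls++;
            return name.length();
        }
    }

    // Layered version: every single comparison digs the key out again.
    static List<Item> sortLayered(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(Comparator.comparingInt(Item::expensiveKey));
        return copy;
    }

    // Hoisted version: extract each key once, then compare plain ints.
    static List<Item> sortHoisted(List<Item> items) {
        int n = items.size();
        final int[] keys = new int[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            keys[i] = items.get(i).expensiveKey();
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(keys[a], keys[b]));
        List<Item> result = new ArrayList<>(n);
        for (int index : order) {
            result.add(items.get(index));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        for (String s : new String[] {"ccc", "a", "bb", "dddd", "ee"}) {
            items.add(new Item(s));
        }
        sortLayered(items);
        System.out.println("layered key lookups: " + keyCalls);
        keyCalls = 0;
        sortHoisted(items);
        System.out.println("hoisted key lookups: " + keyCalls);
    }
}
```

Both sorts are stable, so they produce the same order; only the number of key lookups differs.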

[Transcript missing]

So-- and focus package is the same thing for all the functions within a particular package. So I'm just going to show you this graphically, because there's a lot to cover. So in excluding library, here we have an example of a main program that calls an init function, a do example, and a cleanup. And do example, in this case, calls the function bar four times. And let's say it uses java.util. And it uses a hash table.

So in this case, when you profile it, you're just going to see all these samples, as indicated in yellow, in Java util and not in bar. So we don't know that we've been using bar to do this. But by excluding, it turns bar effectively into a leaf function. And now you can see, well, I'm making four calls to bar. Well, if they're computing the same thing, I don't need to do that. If they are, I can hoist.

Now there's another operation that's very similar to exclude library called flattening a library. It also makes the library go away, but instead of making it go away completely, it replaces the library with all of the entry points into it.

So you can observe your usage of the library in that situation. And finally, focusing. We want to focus on do example. It makes main, init, and cleanup go away. And you're just left with this subtree.
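The three operations can be modeled on a toy set of call-stack samples. The sketch below is only an interpretation of the description above, not Shark's implementation: the `Frame` class, the `app` library name, and the specific `java.util` frames are all hypothetical. Each sample is a stack listed from root to leaf, and the "heavy" view counts samples by leaf frame.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of call-stack data mining. Every name here is hypothetical;
// this is not Shark's implementation or file format.
final class StackMining {
    static final class Frame {
        final String library;
        final String method;
        Frame(String library, String method) { this.library = library; this.method = method; }
        @Override public String toString() { return library + "." + method; }
    }

    // Exclude: drop every frame in the library so its samples are charged to the caller.
    static List<List<Frame>> excludeLibrary(List<List<Frame>> samples, String library) {
        List<List<Frame>> out = new ArrayList<>();
        for (List<Frame> stack : samples) {
            List<Frame> kept = new ArrayList<>();
            for (Frame f : stack) {
                if (!f.library.equals(library)) kept.add(f);
            }
            if (!kept.isEmpty()) out.add(kept);
        }
        return out;
    }

    // Flatten: keep only the entry point of each run of consecutive library frames.
    static List<List<Frame>> flattenLibrary(List<List<Frame>> samples, String library) {
        List<List<Frame>> out = new ArrayList<>();
        for (List<Frame> stack : samples) {
            List<Frame> kept = new ArrayList<>();
            boolean previousInside = false;
            for (Frame f : stack) {
                boolean inside = f.library.equals(library);
                if (!(inside && previousInside)) kept.add(f);
                previousInside = inside;
            }
            out.add(kept);
        }
        return out;
    }

    // Focus: keep only samples that pass through the method, rooted at that method.
    static List<List<Frame>> focus(List<List<Frame>> samples, String method) {
        List<List<Frame>> out = new ArrayList<>();
        for (List<Frame> stack : samples) {
            for (int i = 0; i < stack.size(); i++) {
                if (stack.get(i).method.equals(method)) {
                    out.add(new ArrayList<>(stack.subList(i, stack.size())));
                    break;
                }
            }
        }
        return out;
    }

    // Heavy view: count samples by their leaf (innermost) frame.
    static Map<String, Integer> heavy(List<List<Frame>> samples) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<Frame> stack : samples) {
            counts.merge(stack.get(stack.size() - 1).toString(), 1, Integer::sum);
        }
        return counts;
    }

    // Samples shaped like the slide: main calls init, doExample (which calls bar), and cleanup.
    static List<List<Frame>> slideExample() {
        List<List<Frame>> samples = new ArrayList<>();
        samples.add(stack("app", "main", "app", "init"));
        for (int i = 0; i < 4; i++) {
            samples.add(stack("app", "main", "app", "doExample", "app", "bar",
                    "java.util", "Hashtable.put", "java.util", "Hashtable.rehash"));
        }
        samples.add(stack("app", "main", "app", "cleanup"));
        return samples;
    }

    private static List<Frame> stack(String... libraryMethodPairs) {
        List<Frame> frames = new ArrayList<>();
        for (int i = 0; i < libraryMethodPairs.length; i += 2) {
            frames.add(new Frame(libraryMethodPairs[i], libraryMethodPairs[i + 1]));
        }
        return frames;
    }

    public static void main(String[] args) {
        List<List<Frame>> samples = slideExample();
        System.out.println("raw:       " + heavy(samples));
        System.out.println("excluded:  " + heavy(excludeLibrary(samples, "java.util")));
        System.out.println("flattened: " + heavy(flattenLibrary(samples, "java.util")));
        System.out.println("focused:   " + heavy(focus(excludeLibrary(samples, "java.util"), "doExample")));
    }
}
```

In this model, excluding `java.util` moves the four library samples onto `bar`, flattening moves them onto the library entry point, and focusing on `doExample` drops the `init` and `cleanup` samples.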

Okay, thank you. So we have a sort of modified version of the Java 2D application here. And to use, you know, Shark for Java, you add an -Xrun parameter. It uses the JVMPI interface. It'll migrate to JVMTI in the future. And you add a -XrunShark argument. So with that in mind, let's run this.

So you get a message in the console, "Java for Shark is enabled." And here is our familiar example. We added a new pane here called Bouncing Strings. And this is kind of a cooked example in that it has some performance problems introduced that we want to go find. So let's go over and launch Shark.

And Shark has all sorts of traces, but we're going to choose, for now, Java Time Trace. And when it does so, you can pick the Java application. In this case, we ran it from a command line type shell, so you just see Java. It would see your application name if you made it double clickable. So let's just start sampling. Oh, by the way, you notice that it just paused? It does that every so often. And it's because we're garbage collecting. You see it just did it again. So that's kind of odd. So let's just start sampling.

So just let it go for a few seconds. You know, given the sampling rate, it's probably good to sample for about 10 seconds. And let's stop sampling. And we now have a typical profile view. We have a list of the various symbols and the percentage of samples that occurred in them. On the right here, you'll see a backtrace of the calls. So you see doString, drawString, and it goes down to bouncing strings paint.

If you click on another symbol, you'll see its backtrace. And one thing to kind of help keep track of things a little better, there's a neat feature called color by library. So if you click on that, look what happens. It colors all the strings. And this will help us identify everything. AWT is colored in this red color, brown for this, and so on. And one little problem here.

The Java runtime in JVMPI isn't perfect about reporting all the symbols, so we have a method with an unknown library. So we're going to use the exclude library to get rid of that and attribute those to things that are more meaningful. And when this happens, you'll see that these percentages went up. Paint strings went up, native font wrapper went up, and so on. If we look at native font wrapper, we can exclude the library again.

And now it pushes initialized font up. And this is kind of interesting. Initialized font, when we're drawing, we're painting strings? Why are we initializing a font? And this is taking up almost as much time as it's taking to draw the string. So let's take a look at the heavy and tree view.

And we've been looking kind of from the bottom up. We've been looking at the leaves of the execution tree. Now we can look from top down. Here's our event dispatch thread run, and it works its way down. And here's bouncing strings paint. So we see that this bouncing strings paint is an important place to look at. And here is one of the really cool features that we worked hard to get in for you guys. You can double-click on bouncing strings paint and get source.

And in source here, it's annotated by the relative densities of the calls. So about 9% of the time is spent in fill-rect, and 89% is spent in this paint-strings function. And you notice that these things are underlined. Well, that means you can double-click on it and navigate to the associated function.

And it's like a little web browser. You have a backwards and a forward arrow. And you see here, there's three areas of interest. And we found, oh, that looks like a problem. We're calling set font new font Lucida. So we're constructing a font every time we're painting, actually inside of a for loop. That's pretty bad.
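The demo's source is not reproduced in the transcript, so the following is only a hedged reconstruction of the pattern being described. The font name `"Lucida Grande"`, the size, and all method names are assumptions, and `draw` is a stand-in for the real `setFont` and `drawString` calls on a `Graphics` object.

```java
import java.awt.Font;

// Hypothetical reconstruction of the pattern from the demo; the font name,
// size, and method names are assumptions, not the demo's actual source.
final class FontHoisting {
    static int fontsCreated = 0;   // counts Font constructions
    static Font lastUsed;

    private static Font newFont() {
        fontsCreated++;
        return new Font("Lucida Grande", Font.PLAIN, 12);
    }

    // Stand-in for g.setFont(font) followed by g.drawString(text, x, y).
    private static void draw(String text, Font font) {
        lastUsed = font;
    }

    // Before: a new Font object is constructed for every string on every paint.
    static void paintStringsSlow(String[] strings) {
        for (String s : strings) {
            draw(s, newFont());
        }
    }

    // After: the Font is built once and reused across strings and paints.
    private static final Font SHARED_FONT = newFont();

    static void paintStringsFast(String[] strings) {
        for (String s : strings) {
            draw(s, SHARED_FONT);
        }
    }

    public static void main(String[] args) {
        String[] strings = {"one", "two", "three"};
        fontsCreated = 0;
        paintStringsSlow(strings);
        System.out.println("slow paint created " + fontsCreated + " fonts");
        fontsCreated = 0;
        paintStringsFast(strings);
        System.out.println("fast paint created " + fontsCreated + " fonts");
    }
}
```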

So let's go fix that. And yeah, we kind of rigged this a little bit, but I've made errors like this in programs, and I'm sure other people might have too. So this is worth doing. So I happen to have the corrected code here just to save us time. So I'm going to change those.

And also, let's quit the app. And we're going to run again. Give it a second to load. And we go back to bouncing strings. And look at that. We just about doubled its speed. So now, just by doing some analysis, we are able to speed up our program pretty significantly.

Now, let's do one more trace. Since we're feeling lucky, we made some progress, let's see if we can make some more. So I'm going to do a memory trace, because this is a different kind of technique than you might be familiar with. So we're going to do start. And the memory trace slows it down a little bit because we're sampling every memory allocation. And let's stop that. And now we see, wow, 69% of our allocations occur in component bounds.

And if we change this to value, we can see that just in those few seconds we allocated half a megabyte of memory. No wonder we were garbage collecting so much. We were allocating all these bounds objects. So let's look at what's up with that. So if you look in the back trace here, you'll see that bouncing strings ball tick is doing most of it.

And we go in here and you see that we have this bounds equals get bounds that's being called in every tick. Now I've already got code in here to turn that off and cache it, which would be the obvious thing. Just compute it once since the window size doesn't change.
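A minimal sketch of that caching change follows. Only the pattern (fetching bounds on every tick versus caching them behind a `cacheBounds` flag) comes from the talk; the `Ball` class, its fields, and the `Supplier` that stands in for a component's `getBounds()` call are hypothetical.

```java
import java.awt.Rectangle;
import java.util.function.Supplier;

// Hypothetical sketch; Ball and its fields are invented. Only the pattern of
// fetching bounds on every tick versus caching them comes from the talk.
final class BoundsCaching {
    static final class Ball {
        private final Supplier<Rectangle> boundsSource; // stands in for getBounds()
        private final boolean cacheBounds;
        private Rectangle cached;
        int x = 0;
        int dx = 3;

        Ball(Supplier<Rectangle> boundsSource, boolean cacheBounds) {
            this.boundsSource = boundsSource;
            this.cacheBounds = cacheBounds;
        }

        void tick() {
            Rectangle bounds;
            if (cacheBounds) {
                // Assumes the window size never changes, as in the demo.
                if (cached == null) cached = boundsSource.get();
                bounds = cached;
            } else {
                bounds = boundsSource.get(); // a fresh Rectangle on every tick
            }
            x += dx;
            if (x < bounds.x || x > bounds.x + bounds.width) {
                dx = -dx;       // bounce off the edge
                x += 2 * dx;    // undo the overshoot and step back inside
            }
        }
    }

    // Runs the given number of ticks and returns how many Rectangles were allocated.
    static int allocationsFor(boolean cacheBounds, int ticks) {
        int[] allocations = {0};
        Ball ball = new Ball(() -> {
            allocations[0]++;
            return new Rectangle(0, 0, 400, 300);
        }, cacheBounds);
        for (int i = 0; i < ticks; i++) {
            ball.tick();
        }
        return allocations[0];
    }

    public static void main(String[] args) {
        System.out.println("uncached allocations: " + allocationsFor(false, 1000));
        System.out.println("cached allocations:   " + allocationsFor(true, 1000));
    }
}
```

A real application that can be resized would need to refresh the cached value when the size changes; the sketch leaves that out because the talk assumes a fixed window.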

So let's go ahead and make that change. We made a convenient little Boolean here called cache bounds. And by the way, this is not a cooked up example. This was something that we found in the program, just as it was given to me. So we're going to do that. We're going to run it again.

Go to Bouncing Strings. And look at that. We're now at about 183, 190. I've seen this thing go over 200. So just doing simple memory optimizations, because memory allocation is so expensive in Java, can get you huge wins. I was working on an application server a few years ago where we reduced the number of allocations by 10x, and we got a 3x throughput improvement in that server. So doing just memory analysis and memory reduction is a really amazing technique. So thank you very much. I'm going to give this back to Victor. Thank you, Christy.

So I just wanted to conclude with one recommendation. For optimizing for Hotspot, the main thing you need to know is exactly what your hot methods are. You can find out bottlenecks in your own code, and you can identify them using Shark for Java. The other thing that you need to be aware of is, even if there's nothing more to be done, you also need to make sure that the hot method is as amenable to Hotspot's optimization opportunities as possible.

So I want to highlight the fact, again, that inlining is probably one of the biggest optimizations that we're able to do. And therefore, it should be in your best interest to make your hot methods inlinable. So here's the reminder of the key things that keep a method from being inlined: it being too large, it being synchronized, or it having exception handlers.
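As a hedged illustration of those three criteria, the sketch below contrasts a per-element method that is synchronized and contains an exception handler with a small plain method whose locking and range checking are done once by the caller. The class and method names are hypothetical, and whether any particular JVM inlines either method depends on its version and settings.

```java
// Hypothetical example. The inlining criteria are the ones listed in the talk;
// whether a given JVM actually inlines these methods depends on its version and flags.
final class InlineFriendly {
    private long total;

    // By the talk's criteria a poor inlining candidate: synchronized, with a handler.
    synchronized void addGuarded(int[] data, int index) {
        try {
            total += data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            // out-of-range indexes are ignored
        }
    }

    // Small, unsynchronized, no exception handler.
    private void add(int value) {
        total += value;
    }

    // Re-acquires the (reentrant) lock and enters a try block on every element.
    synchronized long sumSlow(int[] data, int from, int to) {
        total = 0;
        for (int i = from; i < to; i++) {
            addGuarded(data, i);
        }
        return total;
    }

    // Takes the lock once and clamps the range once, so the loop body stays simple.
    long sumFast(int[] data, int from, int to) {
        int lo = Math.max(from, 0);
        int hi = Math.min(to, data.length);
        synchronized (this) {
            total = 0;
            for (int i = lo; i < hi; i++) {
                add(data[i]);
            }
            return total;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        InlineFriendly sums = new InlineFriendly();
        System.out.println(sums.sumSlow(data, -2, 10) + " " + sums.sumFast(data, -2, 10));
    }
}
```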

And the last tip I want to tell you is that we do have a Java lab here at WWDC all week. And if you want to see your application running on a Mac and you haven't done so before, do go down there. Or if there's any performance bottlenecks that you want to identify to our engineers, definitely we're able to do that there as well.

So that concludes everything. I want to point out a few URLs. You can get Java reference documentation from Apple at the ADC website. You can also get Java 1.5 documentation at the Sun website. And finally, that is the URL again for where you can download Shark that has the Java support that runs both on Panther and Tiger. Finally, if you have any more questions, the people to contact are Alan Samuel, he's our Java Technologies Evangelist, Bob Fraser, he's our Product Manager, and finally Francois Jouaux, who is the Manager of all things Java at Apple.