Digital Media • 52:32
This session teaches you how to get the best performance from your Macintosh-based Java application. Topics include advances in virtual machine (VM) technology, graphics and human interface (HI) performance, and coding strategies and styles for improving performance.
Speaker: Jim Laskey
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Good afternoon, everyone. Welcome to session 182, which is Java: Getting the Best Performance. And here to begin our presentation this afternoon, please welcome Jim Laskey. Good afternoon. We're going to talk a little bit about performance. Yesterday we talked a little bit about some of the new technology that was being introduced on Mac OS X, in particular, Hotspot and some of Java 2.
We're going to focus on different aspects of these two technologies, but we're trying to zero in on performance, and we're going to talk about how we've made improvements to performance in the Java VM. We're going to give you some hints or ideas about how you can modify your code to gain performance.
We've broken this talk up into three parts because we wanted three different categories of things to talk about. The first part will be done by Ivan Posva of the VM team, who's going to talk about the VM, the new memory management, and thread synchronization. Then I'll come back for part two and talk about code execution and how we boosted the performance of code. Finally, John Burkey will come up and discuss how we can get some performance out of the AWT and Swing in the new Java classes. I guess I'll get Ivan to come up now.
Well, thanks Jim for the introduction. So let's look first at the factors in Java performance. First of all, it's the design of your application that's the most important part. Second is the speed of bytecode execution inside the VM, the speed at which we execute the bytecodes of your program.

Next, and that's what I will focus on, is the speed of VM operations, be it class loading, garbage collection, threading, synchronization, and the associated libraries. And last, it's the speed of the hardware you're running on and the underlying operating system. There's not much we can do about that, but we are pressing on the kernel team in that area.
[Transcript missing]
Automatic memory management and language-level support for threading and synchronization. Unfortunately, these two points are also associated with having the most negative impact on performance. We are here to clean up some of those misconceptions and show you what you can do to give the VM hints in that area.
[Transcript missing]
The less memory you use, the bigger the gain; it can be a considerable gain. The HotSpot VM uses a generational copying garbage collector, and on the next slide I will dive into a bit more detail.
So, it is accurate. The garbage collector is accurate, which means we know at all times in the executing Java program where we have live references to objects. We are not conservative when we walk the stack: we know which words on the stack are actual integers, even if they happen to look like object references, so we can collect the objects they seem to point to.
In contrast to conservative collectors, this can really make a big difference in memory usage, in the sense that you're not keeping apparent objects alive that aren't actually referenced. It is also generational: when you actually study it, the majority of the objects you allocate in Java are very short-lived.
So what we do is allocate all the objects in this new object heap, what we call the nursery. And once we've exhausted this nursery, we copy the surviving objects out of there into an old space and can start from scratch in that nursery. Which means we have fast allocation in this nursery.
We don't have to deal with the garbage we have in there; we don't do anything with the garbage. We just copy out the objects that survive from this new generation to the old generation. You could say that garbage collection is actually the wrong term; it's more like a search and rescue operation, where you rescue the few survivors out of the new heap, out of the nursery, into the old space. And it's a search where, due to the accurate nature of the collector, we know exactly who's surviving, so the search is pretty fast. I mentioned copying: we actually move those objects out of the nursery into a separate memory area.
Studies have shown that only 5-10% of allocated objects survive from the new allocation space, the nursery, into the old generation. So if you say this nursery is half a megabyte big, we copy 25-50 kilobytes worth of objects, which is not very much.
Also, since we have this copying infrastructure already in place, we compact this rather big old object space regularly: every time we run the old-space collector, we can compact the heap and keep memory usage to a minimum. The old-space collector, since it has to deal with much bigger memory areas, is also incremental: it works on a chunk at a time, every time it's invoked. That reduces user-perceivable pauses, so you don't have one big stop in the middle where you're collecting the whole heap, but many, many small pauses, which makes your UI applications or server applications respond much faster.
So the benefits to you as a programmer: very fast allocation. We always allocate out of this nursery, and we allocate in a stack-like fashion, so all we do is increment a pointer, and that's our new object. All we have to do after we increment the pointer is check whether we've exhausted the nursery space, and then we have to trigger this new-space collection.
This allocation code is actually inlined in the compiled code. Allocating a new object when you hit a new operation comes to about 11 instructions. Compare that to, say, a C-style malloc function call: you have to go through cross-library function glue, and then you have to execute the C prolog of the allocator.
After those same 11 instructions, C hasn't even allocated the object yet, whereas in Java, you have your object and you're ready to go. As I mentioned before, we have an accurate collector, so we do aggressive reclamation of this nursery as well as of the old generation.
So, for example, we don't leave objects around that are not accessible anymore. And due to these two factors, you have essentially free temporary objects. Temporary objects are by definition short-lived, which means we don't have to deal with them: you have no overhead for short-lived objects, because all we have to do is copy the few survivors out of there.
So, I claim you have essentially free temporary objects, and you should not build allocation caches, especially since Java is multithreaded. If you had an allocation cache, you would have to lock that allocation cache first, take the object out of it, and then unlock it; by that time, you could already have allocated the object out of the nursery.
What do you need to do? Well, as I mentioned, do not build allocation caches. But you also have to tell the VM when you're done with an object. If you have an object you don't need anymore, or even worse, a whole object hierarchy, especially one referenced from a static field, you have to set that reference to null, so we know the object or hierarchy is not reachable anymore and we don't copy it out into the old generation.
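To illustrate, here is a minimal sketch of that pattern; the class and field names are hypothetical, not from the talk:

```java
public class DocumentCache {
    static class Document { /* stands in for a large object graph */ }

    // A static field keeps everything it references reachable.
    private static Document current;

    static void open(Document doc) {
        current = doc;
    }

    static void close() {
        // Tell the VM we're done: null the reference so the collector
        // never copies this graph into the old generation (or keeps
        // compacting it there).
        current = null;
    }
}
```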
It's even worse if such an object is already in the old generation, because then we keep compacting it and we keep that memory alive. So make sure you null out the objects you're not using anymore. And while I mentioned allocation caches: if you have native code, be it stub-based libraries using the old native call conventions from Java 1.0 (JDK 1.0.2) or Java 1.1, or JDirect 2 code in your project, you have to convert those projects to use JDirect 3. There is a talk tomorrow, Mac OS X Java In Depth, that covers how to convert projects to JDirect 3.
Or you have to convert your native stub libraries to use JNI native calls. And if you're using the Objective-C-to-Java bridge to wrap Objective-C frameworks for Java, make sure you recompile your wrapper projects with the new JNI-based bridge that comes with DP4 on the CD.
Before I go into synchronization, I wanted to mention that Java threads are mapped one-to-one to pthreads and therefore are also mapped one-to-one to kernel threads. They are fully preemptive, which means that neither we inside the VM nor you have to deal with scheduling; the kernel does that for us. And we are multiprocessing-ready, as you've seen in the hardware keynote this morning.
If the kernel can schedule more threads onto a second CPU, we in the VM will make use of that, and therefore your applications will naturally run faster as well. We integrated the native and Java invocation stacks into one memory area, so you have better locality of reference in the VM internals. This is not something you will notice directly, but it is one of the reasons why you have to go to JNI, just as the accurate garbage collector makes it necessary to go to JNI.
So, my last slide is on synchronization. First I wanted to explain what the contended case is when we talk about synchronization. The contended case is when you're executing a synchronized block in one thread, be it a synchronized method or a synchronized statement on a particular object, and while you're inside that block, a different thread comes in and tries to enter a synchronized block on the same object.
At that point, we say you have contention on that object. The uncontended case, therefore, is when one thread goes in, synchronizes on a particular object, executes the whole block, and exits the synchronized block without any other thread trying to synchronize on the same object.
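As a concrete picture of the two cases, here is a minimal sketch (hypothetical class):

```java
public class Counter {
    private int count;

    // Uncontended case: one thread enters and exits this monitor
    // without another thread trying to synchronize on the same
    // object. That costs a few instructions of constant-time
    // overhead: no heap allocation, no pthread or kernel call.
    public synchronized void increment() {
        count++;
    }

    // Contention only arises if a second thread calls increment()
    // (or any synchronized block on this object) while another
    // thread is still inside one.
    public synchronized int get() {
        return count;
    }
}
```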
Studies have shown that the contended case is very rare. In those instances we use pthread primitives and kernel primitives to make sure we do the right thing and block the thread, so it doesn't use any CPU cycles from then on. But what is much more important is that we have very fast synchronization in the uncontended case.
What you have is basically a constant-time overhead: a couple of instructions to set up that synchronized block. We do not allocate memory on the heap or anything; the memory for that synchronized block is allocated on the stack, inside the invocation stack.
Most important of all, we don't use any OS resources: no kernel resources, no pthread calls, no kernel calls. That matters because we want this constant-time overhead. So, I will hand the podium back to Jim to talk about Java code execution.
I want to try to focus on basically what we've done to improve performance in code generation. But I want to give you some background. So we're going to go into a little bit of history first and then talk about optimization. And then at the end I have ten slides which are what I'm calling code generator hints, which are things that you can do in your code which the code generator can look at and say, "Oh, this means I can do this optimization." And hopefully you can pick up a few of those things and put it in your own code.
[Transcript missing]
Now, interpretation can carry us so far, but it's really not the end game. What we need to be able to do at that point is get into some kind of native code generation, so that we get to the point where we're actually running at the same sorts of speeds that you would see in C++.
So we get into code generators, and the first JITs that came out used the JIT compiler interface as a plug-in to the classic VM. What they did was intercept the execution of methods and go through and convert the bytecodes into native machine code. And a lot of the earlier JITs that came out basically just took the bytecode sequence and converted it one-to-one into sequences of native code.
That worked, and again you got another boost in performance, probably about a five-times boost. But the problem is it wasn't really utilizing the true performance of the machine: it wasn't scheduling the instructions, and it wasn't looking at sequences of instructions to see whether you could get any kind of optimization.
So then we got a round of static code generators, where people would take the back end off a C compiler, put a Java front end on it, and produce statically compiled executables. That sort of works for some types of applications, but the problem is that Java is a very rich and dynamic environment to work in, and static applications don't really fit the spirit of what Java is about. You need that dynamic environment to run in.
So then we get into the high-performance JITs, and you're familiar with, say, the Symantec JIT that we've been using in MRJ. There are a couple of other high-performance JITs, like C2, the server compiler on HotSpot. What these JITs do is basically optimize the heck out of the bytecode and try to reduce it down to something quite close to what you would expect from a C or C++ compiler.
The problem with these high-performance JITs... well, the positive part is that we were getting really good performance, really, really good performance. The problem was that there was a lot of competition between the JIT companies, and they were trying to squeeze out as much performance as they possibly could, trying to get the best CaffeineMark scores they could. Working with the Symantec JIT, on the Symantec JIT team, we were getting infinite CaffeineMark scores, because we were optimizing methods right down to the point where they were just simple return statements. That's the positive side; the negative side is that it was taking more and more time to compile these things, and more and more memory.
And that's where the cost was. So we have to find some kind of balance between getting the optimization done and keeping the compile time and the memory requirements down to a minimum. And that's where we are with the HotSpot client version. The traditional optimizations you would expect are there: expression reduction, CSEs, loop unrolling and other loop optimizations, data flow analysis. These are all the standard sorts of things you would see in the Dragon book, the standard compiler book.
And typically what would happen is that you'd go through a round of these optimizations, they would reduce the method down a little bit, and then you'd have to go back and repeat them, because each pass brings it down again. So we have a heuristic algorithm to decide when to stop optimizing.
And this is really where a lot of these high-end optimizers got into trouble: they would loop and loop and loop, and they could loop several hundred times before they'd give up and say, well, this is the best I can do with this method, and the code would finally get executed.
One of the great things about a just-in-time compiler is that you can do optimizations that you wouldn't be able to do in a static situation. One of the most important is being able to determine whether a virtual method is monomorphic or not. What monomorphism is about is that in Java we have the ability to create subclasses of a particular class, and the ability to override its methods. So to call a particular method on a particular object, there's a dispatch that figures out which method implementation is associated with that object; it's a virtual dispatch.
But it turns out that in most applications, a little bit over 80% of the classes in your environment are not overridden; they're leaf classes. So there's really no need to go through this dispatch mechanism: you can make a direct call to that method and not worry about hitting the wrong one. The just-in-time compilers try to exploit this, and one of the things you can do, besides simplifying the actual call to the method, is inline the code of the method you're dispatching to.
If you take a look at your own code, you'll probably find lots of places where you're calling a method that's rather simple, maybe a few lines of code. A good example would be the accessors, the getters and setters for your class.
Isn't it a shame I have to go off and call this method, when this code could fit right in line and be very inexpensive? This is what the JIT does for you. It goes off and says, "Oh, that's a really simple method. I can determine that it's monomorphic; nobody's overriding it, so I'm just going to inline this code." And the call becomes very inexpensive.
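For example, a sketch of the kind of accessors he's describing (hypothetical class):

```java
public class Point {
    private int x;
    private int y;

    // Trivial accessors like these are prime inlining candidates:
    // if no loaded subclass overrides them, the JIT treats the call
    // as monomorphic and replaces the call site with the field access.
    public int getX() { return x; }
    public int getY() { return y; }

    public void setX(int x) { this.x = x; }
    public void setY(int y) { this.y = y; }
}
```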
Now, there's always the possibility that the method may get overridden at some later time. So what the just-in-time compiler may do is create several flavors of the same method: one version for the case where a call target has been overridden, and another for the case where it hasn't. The version that suits the situation is chosen at runtime.
Also, I should make a point, and this is another great advantage of just-in-time compilers: we can do processor-specific optimizations at runtime. The great beauty of Java is that I can take this bytecode, port it to any machine, and have it run on that machine.
And even on the same general architecture, like the PowerPC, I can do a different kind of optimization on a G3 than I would on a G4, because of scheduling and so on and so forth. So the types of things you could do would be, say, on the PowerPC I can use mask and shift operations, or if I'm running on a G4, I can use Velocity Engine instructions. I can do that on the fly.
And then I can do instruction scheduling. And once I've done that for the particular machine I'm running on, I can cache the code I've generated. The next time I go to execute it, I use that cached code; I don't have to recompile it, because that code has already been tailored for the machine it's running on.
What gets compiled? There are probably all kinds of myths out there about the magic heuristics we use to figure out what gets compiled. There are some truths and some rumors, but generally the things that do get compiled into native machine code are primarily methods that have loops,
and methods that have been interpreted n times. Those are the two primary triggers for whether something gets converted to native code or not. A method that has a loop may get executed once in the interpreter, but each successive time it may be converted to native code and run as native code. I said "n times" because different JITs trigger at different levels; HotSpot triggers at around 1,500 executions before it actually converts a method to native code, but that fluctuates depending on different kinds of criteria.
What doesn't get compiled? Typically, say you have a method that's currently running, and it's looping and calling other things, and it seems to be looping for a long time. If it didn't meet the original loop criteria and didn't get compiled, it may sit there and continue in the interpreter.
Fixing that would require on-stack replacement, and in the current version of HotSpot we don't have that in place yet. Eventually we'll be able to replace something that's currently being interpreted with something that's been compiled. But that doesn't prevent the method from being compiled: if that method is called from any other point in your application, that call will use the compiled version, because it's already been triggered to be compiled.
Class initializers typically don't get compiled, because they're only executed once. They usually do the initialization of their statics and create whatever they need; they don't need to go beyond that. So they typically don't get compiled. Finally, things that are written in Java assembler that are very convoluted in their goto structures, where it's really hard to analyze the code, we can't generate native code for.
We may try, but it's typically not worth the trouble. The only time I've really run into that is with the JCK; a lot of the JCK tests try to see what they can do to trip up the JIT. So you don't typically have to worry about that.
Okay, so now I have ten hints, things you can provide to the code generator that will actually help your performance. There are various degrees of performance improvement here; some may be a little more dramatic than others, and you don't need to feel you have to use them all. They're just ideas to keep in the back of your mind when you're tuning your application at the end of your cycle. The first, and probably the most important: write small and concise methods.
Try to avoid methods that have 2,000 lines of code in them, because when the just-in-time compiler kicks in, it's got to compile the whole thing, and maybe you're only going to use a couple of lines of it, because you've got this big case statement where one branch is going to get executed most of the time and the others run very rarely. So try to keep methods small so that they'll compile quickly, and if you've got code that's not going to be used very often, move it into separate routines; if it's needed, it'll be compiled then. Don't worry about your method being too small, because method inlining will take care of that.
The JIT will figure out a nice balance, what makes a nice-sized routine, what inlines nicely, and so on. So don't worry about the size of the thing; it's not crucial. No method is too simple. Finally, you should always remember that accessor methods are almost always inlined; even in the classic interpreter, the accessor methods were often inlined. It's good to use accessor methods instead of accessing the fields directly.
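A small sketch of that first hint, with the rarely used code factored out into its own method (all names hypothetical):

```java
public class EventLoop {
    static class Event {
        boolean common;
    }

    // Keep the hot method small so it compiles quickly and inlines well.
    static void handleEvent(Event e) {
        if (e.common) {
            handleCommon(e);   // runs almost every time
        } else {
            handleRare(e);     // cold code moved to its own method;
                               // it only gets compiled if it's used
        }
    }

    static void handleCommon(Event e) { /* the frequent couple of lines */ }

    static void handleRare(Event e) { /* the bulky, rarely used cases */ }
}
```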
The next hint is to trust the supplied classes. What we try to do is look for hot spots in your code, things that take a long time, and try to gain performance there. One of the things we do in our analysis is identify methods that get executed a lot; we want to tune those so they execute very quickly. Often we tune them directly into assembler.
For methods in the classes String and StringBuffer and Vector, which are used a lot, we actually have intrinsic or built-in versions to handle a lot of those situations, so you get better performance. If you think you can write it better than Sun did, just remember that maybe we're already doing it a little bit better for you in the background. That's just something to keep in mind.
Array copy is something we gain performance on. If you're running on a G4, hopefully we'll be able to use the Velocity Engine to help with the copy; that's not currently in place. And sine, cosine, and tan: on Intel, you would call the hardware directly to do those. We call the library directly, so you don't actually go through the glue, which is a bit of a performance improvement.
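For instance, a minimal sketch of leaning on the library's array copy rather than writing your own loop:

```java
public class CopyDemo {
    static int[] duplicate(int[] src) {
        int[] dst = new int[src.length];
        // Prefer the library call over a hand-written loop: the VM
        // treats System.arraycopy as an intrinsic, so it can use a
        // tuned block copy rather than executing your loop bytecodes.
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }
}
```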
We don't have a 64-bit architecture, so there is a cost to using longs. If you don't really need longs, and you're just using them because you think you might need the precision later, then maybe rethink it a little bit and go back to using straight integers. A long multiply takes five instructions, a long divide has to call a subroutine, and a shift operation may take several instructions.
So it's not as simple as just saying long and expecting things to work well. There are lots of techniques to get around some of the problems you might have with longs. I did a class library that handles the situation where you're tempted to use longs for the unsigned-integer problem, when you want to do unsigned compares; there's a way of doing that without having to resort to long.
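He doesn't show that library, but one common trick along these lines, sketched here as an assumption about what he means, is to flip the sign bit so a signed compare yields the unsigned ordering:

```java
public final class Unsigned {
    // Compares a and b as 32-bit unsigned values without widening to
    // long: XORing with the sign bit maps the unsigned ordering onto
    // the signed ordering, so a plain int compare does the job.
    public static boolean lessThan(int a, int b) {
        return (a ^ 0x80000000) < (b ^ 0x80000000);
    }
}
```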
Floats versus doubles: floats are smaller and take up less memory. In most circumstances, floats and doubles are equivalent in execution, but there are some circumstances, like divide, where a double divide actually takes almost twice as long as a float divide. So if you don't really need the precision, stick with float. Another reason I recommend float is that as we progress with the Velocity Engine, the Velocity Engine doesn't support doubles; it only supports floats.
So if you're thinking about declaring an array of doubles, see if you can use an array of floats instead, because that's most likely what we'll be able to apply the Velocity Engine to. There's no commitment to that, but just keep it in mind.
Try to avoid the use of generic types. It actually costs to use generic types, especially when you're doing assignments from a generic type to a specific type, because the VM has to do a type check to make sure the assignment is valid, and it does that type check at runtime.
And it may have to search up the class hierarchy in order to determine whether the type is compatible. So that's just something you should keep in mind, especially when you're doing assignments from a generic-type array to a specific array, because even if you're doing an array copy, it has to validate everything being moved over from that array; it has to go through a class check.
So try to use subclassing and method overloading as much as you can, because that will actually be better in the long run than writing one routine that takes a generic type and then doing an instanceof check inside of it. It's better to overload the methods.
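A minimal sketch of that advice (hypothetical shape classes):

```java
public class Renderer {
    static class Circle { }
    static class Box { }

    // Preferred: let overload resolution pick the right method at
    // the call site, and let subclassing handle dispatch.
    void draw(Circle c) { /* circle-specific drawing */ }
    void draw(Box b)    { /* box-specific drawing */ }

    // Avoid: one generic entry point that rediscovers the type at
    // runtime with instanceof checks.
    void draw(Object shape) {
        if (shape instanceof Circle) {
            draw((Circle) shape);
        } else if (shape instanceof Box) {
            draw((Box) shape);
        }
    }
}
```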
Copy values into locals. Some of the optimizers in the JITs will actually do this optimization for you, but it's better for the interpreter, and better for the lower-end JITs and so on, if you move the value out, work with that copy, and then store it back if you need to.
In this particular example, in the version on the left-hand side, we extract the value from the table, increment it, check to see whether the value has exceeded 100, and reset it to zero if it has. Every time you index the table like that, it has to do an array bounds check. Again, the higher-level optimizers will take care of that and hoist it out, but you can't rely on that, so it's probably a good idea to move the value into a separate local.
The other thing is there's a semantic issue: if another thread goes and changes that field or that array entry, you don't know which copy you're going to get. If you extract the copy, work with that copy, and store it back in again, you know exactly which value you're dealing with.
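Reconstructing the slide's example as code (a sketch, since the slide itself isn't in the transcript):

```java
public class Table {
    static void bump(int[] table, int i) {
        // Avoid repeating table[i]: each access repeats the array
        // bounds check, and another thread could change the slot
        // between reads.
        //   table[i] = table[i] + 1;
        //   if (table[i] > 100) table[i] = 0;

        // With a local copy: one read, one write, and a consistent
        // value throughout.
        int value = table[i];
        value++;
        if (value > 100) {
            value = 0;
        }
        table[i] = value;
    }
}
```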
In a situation where you have multiple threads accessing something and you're not using synchronization, use volatile. Volatile is a little bit cheaper than synchronization, because what it says is that the value has to be reloaded every time you access it. If volatile is not there, a highly optimizing JIT will say, "Oh, well, I've got this value, I don't have to reload it," but meanwhile another thread changes the value, and you're sitting there in your loop waiting for it to change, and it never will. So use the word volatile when you've got global values that are being changed by other threads.
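A minimal sketch of the volatile flag pattern he's describing:

```java
public class Worker implements Runnable {
    // Without volatile, an optimizing JIT may decide it already has
    // the value of 'done' and never reload it, so the loop below
    // could spin forever; volatile forces a fresh read each time.
    private volatile boolean done;

    public void run() {
        while (!done) {
            // ... one unit of work ...
        }
    }

    public void stop() {
        done = true;   // called from another thread
    }
}
```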
Final. We've had a lot of internal discussion about final, but it's one of my favorite words as far as just-in-time compiling is concerned, because it gives me a lot of hints about what a class can be and what kinds of optimizations I can do on it. It's not something you need to overuse, though. Write your application, and if you feel a class is never going to be overridden for any reason (your application class, for instance), declare it final.
And what this wins you is that all of the methods in that class can now be monomorphic; they'll never be overridden, so I can make direct calls, which improves the performance of the calls. It also means that if I do an instanceof on that class, all I have to do is compare to see whether the classes are equal.
So if I have the class String, which is declared final, then the check to see whether something is a String is a simple compare to see if the classes are equal. I don't have to search up the class hierarchy to find out what's going on there.
The other use of final, of course, is on statics. That says: this value is constant, it's not going to change. Once the just-in-time compiler knows it's constant, it'll just grab it and say, okay, I've got this, I can apply it to optimizations in the code. And in this code sequence, what I gain is that I know the allocation of that character array is a fixed size, so all I have to do is increment the allocation pointer by a fixed amount, and boom, I've got my array.
And in the loop, I know it's a fixed-size loop, that the loop is going to iterate a fixed number of times, in this case 32. So I can actually get rid of the loop and maybe do a blanket initialization of that array to spaces.
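A sketch along the lines of the code sequence he's describing (names hypothetical):

```java
// A final class: every method is monomorphic, calls can be direct,
// and instanceof reduces to a single class comparison.
public final class BlankLine {
    // A final static is a constant the JIT can fold into the code:
    // the array allocation below has a fixed size, and the loop has
    // a fixed trip count, so it can be reduced to a blanket fill.
    private static final int WIDTH = 32;

    char[] make() {
        char[] chars = new char[WIDTH];
        for (int i = 0; i < WIDTH; i++) {
            chars[i] = ' ';
        }
        return chars;
    }
}
```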
If you have a choice between a class hierarchy with virtual calls and using interfaces, you'll get better performance from virtual calls than you will through interface calls. That's because a virtual call requires a simple index into an array to get the address of the method you want to call,
whereas an interface call requires an actual search of the class to find the implementer of the interface, and then it does an index into an array. So there's a little bit of overhead. Now, HotSpot is very clever: it actually caches the last method that you've called from a particular call point. So it's a little bit better in HotSpot, but it still has to go through a verification to make sure that, well, it really is an instance of that class being passed through.
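A minimal sketch of the two dispatch styles (hypothetical types):

```java
// Virtual dispatch: the call resolves with a simple indexed load
// from the receiver class's method table.
abstract class Shape {
    abstract double area();
}

// Interface dispatch: the VM first has to locate the implementer's
// table for this interface, then index into it.
interface Measurable {
    double area();
}

class Circle extends Shape implements Measurable {
    double r;
    double area() { return Math.PI * r * r; }
}
```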
Limit the use of JNI and JDirect. Initially, when I wanted to do this talk, I wanted to convey that if you feel you can get better performance out of C, you should probably rethink it a little bit; Java is the way to actually write your code.
Try to avoid going off into native code and doing things there if you can, because the optimization levels you're going to get in Java will be pretty close to C, if not better, depending on the level of optimization. When you're using JNI, there's a translation layer that has to take place to call into the system.
And then coming back again, you have to actually do a lookup of the method in order to find out which method to call back into the VM. So use Java as much as possible. In conclusion, I just want to repeat what Ivan said earlier: the best thing, first of all, is to make sure that your application has a good design.
Once you have a good design, go back and look at the places where you need to improve performance. We didn't talk about performance tools here yet because we're not really finished with them. HotSpot does ship with -Xprof, which will give you a profile of the methods that have been executed and what percentage of the time you spent in them.
As we go along, we're going to have better tools. There are HPROF tools that will give you much more detailed reports on performance. Get a good design for your application, then go back and start tweaking it, and maybe apply a few of these hints so you can get the just-in-time compiler to produce better code for you. Okay? John.
Oops. So hi, I'm John Burkey, and I'm on the AWT team. I'm going to talk about a little bit different side of performance, and that's how we can work together to improve performance, specifically from the framework level. A lot of what you just heard is about how to write good methods and good classes yourselves, but from the AWT's perspective, we're more concerned with highlighting a few things that will help you use our frameworks, actually JavaSoft's frameworks, our implementation of Sun's frameworks. So anyway, the five major areas I'm going to cover are up here; I don't need to read them. Basically, you'll see it's a lot of usage-pattern kind of stuff.
So for image creation, the main thing is there's a new call in Java 2 called createCompatibleImage. For all the 1.1 usage, specifically Swing and the kinds of things that are in the Toolkit class, we'll automatically take care of this for you. What this does is, depending on the decisions we make based on device depth and so on, make sure you get the optimal image. Most users of images don't need to do anything other than this.
And if you do need to dive into bit-level manipulation of the image, then number one, check and make sure you really want to do that, and then go into the imaging classes and be really careful about the ones you pick. One of the cases here is that there are some image types that are common on Windows, for example, but aren't as common on the Mac, and so they may not perform as you expect. That's the kind of thing you want to notice. And again, createCompatibleImage is definitely your first choice.
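A sketch of that call as it appears in the public Java 2 API, via GraphicsConfiguration:

```java
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;

public class Offscreen {
    // Ask the default screen's configuration for an image whose depth
    // and layout match the device, so blits to the screen can avoid
    // per-pixel format conversion.
    static BufferedImage create(int width, int height) {
        GraphicsConfiguration gc = GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getDefaultScreenDevice()
                .getDefaultConfiguration();
        return gc.createCompatibleImage(width, height);
    }
}
```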
The next thing is this thing called rendering hints. If you have a graphics object, you can both get the list of default hints and also set your own. The basic idea is that with Java 2, there's the ability to do a lot of really nice graphics: you can do anti-aliased text, anti-aliased primitives, image blitting with different kinds of convolutions, and really high-quality work.
But the fact is, for a lot of what we do today, including most of our normal GUI framework operations, we don't need quite that quality. That's why, for example, a lot of these will be defaulted to lower quality in the implementation, like when Swing uses it.
And that's also because, a lot of times in GUI framework building, anti-aliasing can get in the way and cause fuzziness; I'm sure you have examples of that in your own experience. So the key here is that we haven't actually fine-tuned all this stuff yet in our implementation. What I recommend is that you first get familiar with these hints and experiment with them, understand the different ones. I'll explain them in a sec here.
And then, as we move towards shipping, try these again with our final candidates, because you will start to see differences. This is really important for us, because we can make serious changes in the way the implementation does things when we're in fast mode. So RENDERING is a key; I actually deleted the KEY_ prefix off the front of these on the slide, but they're in this sun.awt.SunHints class. RENDERING is a key that you can pass in.
There's a little hash map, and you can say basically quality or fast. Anti-aliasing you can turn on and off for both primitives and text; that's why there are two of them there. Fractional metrics is the ability to specify sub-pixel positions for text. One thing I'll note here, too, is there's a new way to do text in Java 2, which is glyph vectors.
And it is the highest-performance way to do text. Don't assume that you can do those kinds of things better yourself. If you really are doing text work and want to see high quality and speed, look at the way the Swing examples do this stuff before you go and write your own.
Because they're using glyph vectors, and that is the fastest way to do text. So fractional metrics comes up because there's an additional cost to doing sub-pixel positioning of the letters in your text rendering. Dithering: there are a couple of different choices there; again, it's quality versus speed. Interpolation: there's bilinear and bicubic, and that's for your image blitting; basically, your image will look a little better when it's scaled to different sizes if you use the higher-quality ones, but it's slower, so keep that in mind. Alpha interpolation and color rendering: same thing, quality versus speed.
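The same hints are exposed publicly on java.awt.RenderingHints; as a sketch, here's how you would set them for speed:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;

public class FastHints {
    // Trade quality for speed before heavy painting; these public
    // constants correspond to the KEY_ names he mentions.
    static void apply(Graphics2D g2) {
        g2.setRenderingHint(RenderingHints.KEY_RENDERING,
                            RenderingHints.VALUE_RENDER_SPEED);
        g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                            RenderingHints.VALUE_ANTIALIAS_OFF);
        g2.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING,
                            RenderingHints.VALUE_TEXT_ANTIALIAS_OFF);
        g2.setRenderingHint(RenderingHints.KEY_FRACTIONALMETRICS,
                            RenderingHints.VALUE_FRACTIONALMETRICS_OFF);
    }
}
```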
So bitmap manipulation: this is pretty key for our platform. We're optimizing first to make Swing apps and normal-usage apps fast. What that means is we're actually not going to use the same data buffer types internally that are used in the Windows implementation, and that's because this way we can do hardware-accelerated blits between our offscreens and the screen.
And so in order for us to do that, we have a different implementation class under there, so don't assume that you can do an unsafe cast down to specific data buffer types. They won't be down there. So at least look, do an instanceof. You can create the other types and they will work, but they will be slower.
So just be careful. Again, this next one's kind of obvious, but there's a whole bunch of APIs in the new imaging stuff that are useful for some high-quality advanced imaging work, and they were developed with those developers in mind, but they're not as useful for the typical case.
So I'm calling this one low call-frequency methods. The point is that on WritableRaster and BufferedImage there are two basic ways to do things, and these are the ones I think are good for the typical case: you can pass in a whole rect, of whatever size you want, of pixels to push between your bitmaps and your rasters. This is better because you can control the frequency of call operations relative to the number of pixels you want to move.
So for example, if you want to burn RAM and you've got it, you can basically push a full copy across and make one call to copy. Or, going the other way, I would recommend at the lowest one method call per scanline. That's much better for speed than the per-pixel methods, which exist in the API but where you basically make a method call per pixel; you want to be aware of those, because they're shooting yourself in the foot in most cases.
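A minimal sketch of the bulk approach (it assumes both images share the same sample layout):

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public class BulkPixels {
    // One bulk getPixels/setPixels call for the whole rect instead of
    // a method call per pixel; one call per scanline is a sensible
    // floor if you don't want one giant buffer.
    static void copy(BufferedImage src, BufferedImage dst) {
        int w = src.getWidth();
        int h = src.getHeight();
        WritableRaster from = src.getRaster();
        WritableRaster to = dst.getRaster();
        int[] samples = from.getPixels(0, 0, w, h, (int[]) null);
        to.setPixels(0, 0, w, h, samples);
    }
}
```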
So double buffering, this is kind of interesting. On Mac OS X we double buffer all Carbon windows right now, in most cases. What that means is, since your app is already double buffered and we'll take care of flushing that up efficiently, if you want to have the effect of double buffering, use the Swing stuff, because on Windows that stuff will be double buffered, and on our platform you'll still just be double buffered.
Whereas if you create your own Java image and draw into that, which is really easy to do in Java as most of you know I'm sure, and then blit that image, you'll actually be triple buffered on Mac OS X, which is kind of wasteful. So again, we take care of it for Swing. What I do in the case where I want this behavior is just go ahead and use the Swing stuff, JPanel and so on, and then it'll all just be taken care of.
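A minimal sketch of letting Swing handle it (hypothetical panel class):

```java
import java.awt.Graphics;
import javax.swing.JPanel;

// Let Swing manage the buffering: on Mac OS X the window itself is
// already double buffered, so drawing through JPanel avoids adding a
// redundant third buffer the way a hand-rolled offscreen Image would.
public class DrawingPanel extends JPanel {
    public void paintComponent(Graphics g) {
        super.paintComponent(g);
        // ... draw directly here; no manual Image blitting needed ...
    }
}
```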
So this is another issue on Mac OS X, and it's performance related. Basically, the hard work of doing live window resizing is yours, the app developer's. We'll make all the primitive stuff fast, of course, but when you do live window resizing, suddenly Component.paint actually has to be fast, right? Because it's going to be called every time you move the little corner at the bottom of the window. So if your code can't handle that, you'll have really chunky performance when you're resizing the window; it'll act funny, like you have a low frame rate or something, and the mouse will become unresponsive. Or what you can do is do some kind of threaded rendering.
So some cases are like JPEG loading: show what you've got, don't wait. Or if you've got just a really complicated image, maybe you do one quick pass with some kind of simple version of it, and queue up a thread that does the rest of the work. And at that point, if it's really complicated like that, maybe you would use another buffer, in fact be triple buffered, but draw to an image and blit it later.
That gets a little more complicated, because then as the size changes, maybe you want to sample and draw images, and then scale. I can talk to people afterwards if there are specific examples for that. The main thing I want to point out is that with live resizing, your paint is going to be called a lot, so it needs to be as fast as you can make it. So I guess that's all for my slides.
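A rough sketch of the "show what you've got, finish in a thread" idea; everything here is hypothetical scaffolding, not API from the talk:

```java
import java.awt.Graphics;
import java.awt.Image;
import javax.swing.JPanel;

// Paint a cheap version immediately so live resizing stays responsive,
// and let a background thread produce the full-quality image,
// repainting when it's ready.
public class ProgressivePanel extends JPanel {
    private volatile Image fullQuality;   // filled in by the worker
    private volatile boolean working;

    public void paintComponent(Graphics g) {
        super.paintComponent(g);
        if (fullQuality != null) {
            g.drawImage(fullQuality, 0, 0, this);
        } else {
            paintCheapVersion(g);   // fast enough to call every resize
            startWorker();
        }
    }

    private void paintCheapVersion(Graphics g) {
        // e.g. a solid fill or a scaled-down cached image
    }

    private synchronized void startWorker() {
        if (working) return;        // at most one worker at a time
        working = true;
        new Thread(new Runnable() {
            public void run() {
                fullQuality = renderExpensiveImage();
                working = false;
                repaint();          // safe to call from any thread
            }
        }).start();
    }

    private Image renderExpensiveImage() {
        // placeholder for the slow, high-quality rendering pass
        return createImage(getWidth(), getHeight());
    }
}
```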