Java • 1:01:09
This session covers features and capabilities of the HotSpot Client Virtual Machine, including Apple's innovative sharing technology, as well as application tuning and debugging tricks.
Speaker: Blaine Garst
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Good morning. I was thinking about just sitting in the audience and starting this talk because to some extent the Java Virtual Machine is an invisible thing, right? It like sits there and does your stuff. When you're programming in Swing or when you're running your application and things, the Virtual Machine is like just, it's just there. It's just the utility. It's just the thing that makes your Java happen and hopefully if my group does this job right, that's, you know, you won't hardly even have to know about it, right? Well, that's our goal.
We want you to just enjoy the benefits of it, but in fact, it's an incredible piece of technology. I don't know whether you have heard of the game or have played the game, the Incredible Machine or have kids who have played that game, but the Incredible Machine is this great little app on the Mac that lets you build and put together really fun virtual machines. It's just one little stuff and I kind of think of the Virtual Machine, the Java Virtual Machine, as an incredible machine.
So anyway, in this talk I'm going to get up on stage, as if I were still sitting in the audience saying all this, and tell you about it. So first of all, if you just came from Steve Naroff's talk, you heard Larry Abrahams get up there and talk about how HotSpot is this next generation virtual machine. And it's true.
It is a fabulous piece of technology. So we're going to talk about what HotSpot's about, why it's so good, and some of the things we like about it. Beyond that, we're going to talk about what Apple does to add value to the virtual machine as we get it from Sun.
We will talk a little bit about what is in the Java developer preview that we are going to be releasing, I guess, later this week or any day now or by the end of the week, whenever we get it out on the website. And finally, we'll have some time for some Q&A.
Java Virtual Machine basics. What does a Java Virtual Machine do? If I say JVM, by now I hope you know that it's the acronym for the Java Virtual Machine. The Virtual Machine is responsible for managing the threads of execution. Your code is written in Java, and it executes on various threads; the Virtual Machine makes those threads happen. It's pretty easy for us because we have Mach and threads underneath us, but we keep track of what's going on. Then we, the machine, execute your bytecodes. It is an operating system in a lot of respects.
And having managed operating systems before, let me assure you, it is an operating system in many respects. The machine collects your garbage. Your garbage is your unused objects; these eat up memory, and one of the virtues of the Java programming model is that you don't have to worry about getting rid of your memory when you're not using it. It helps us if you null out your references when you're not using them anymore. It helps you also, but it lets the machine take care of that stuff. The machine also helps shift control back and forth between your bytecodes and native code underneath.
Bytecodes are great. So are native libraries that do great things like speech recognition, speech synthesis, QuickTime, GUI drawing, all those kinds of stuff. There's so much you can get done in bytecode, and some stuff you have to get done in native code. And so the machine also handles the transition to and fro.
"Library Java Home" is where we put, you know, the standard properties and things like that. And that's something that you get to extend. "Cocoa Java," if you wander around with Java browser or something, is located in a couple other places. It's under "System Library Java." So if you poke around and find strange things in strange places, this is meant to help tell you where some of our stuff belongs.
The next slide. "Java Home" is about what we do or what we think of as your place. So in our system, Java Home is "Library Java Home." And the things in there that you extend with your code, with your applications, are the bin area. Sometimes you've got little helper utilities and stuff.
"Java Home Bin" is a great place to drop those. If you have jar files that need to be part of, you know, the standard extensions, you can put them in there. That's what the extensions directory is all about. You put stuff in there and you don't have to fool around with class paths anymore. That is great.
Class paths are not such a great idea. Obviously, the other stuff there, fonts, images, you put your properties there, security data, you know, it's the general kind of dumping ground for all stuff, all sort of support data for Java programs. "Cocoa" and "Mac OS X" has different bundling mechanisms that let you... package some of that stuff before, but if you're coming to us from another platform, don't want to rethink that part. You know, Java Home is where you already put your stuff. This is the part of Java Home that we want you to extend. And of course, in applications, when you write stuff, you put your resulting product up there.
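If you're curious where those locations land on your own install, a quick check with standard system properties will show you (the exact paths printed vary by platform and release):

```java
// Prints the locations the running VM believes in. java.home and
// java.ext.dirs are standard properties on any JDK of this era.
public class WhereIsJavaHome {
    public static void main(String[] args) {
        System.out.println("java.home     = " + System.getProperty("java.home"));
        System.out.println("java.ext.dirs = " + System.getProperty("java.ext.dirs"));
        System.out.println("classpath     = " + System.getProperty("java.class.path"));
    }
}
```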
So that's it for the basics. I want to talk a little bit about HotSpot. So what is HotSpot? We take HotSpot from Sun and we port it to Mac OS X. It's not hard compared to what it was on Mac OS 9. On Mac OS 9, we had to do things like invent a threading model.
We had to do things like, well, anyway, I don't want to go into it. The adapt-to-Darwin phase is pretty easy, because there are native threads underneath, there's a natural file system model so I/O just happens, and stuff like that. So it's a porting job. The main thing where we add value at this stage is we write the interpreter.
You might think, well, the interpreter is just a C file, right? And you just kind of compile it. Well, that's not quite the way it is; I'll talk about that a little bit later. But we get the interpreter up and running, and so now you can run HotSpot in an interpreted mode. And then, of course, in order to make it run fast, we have to write a compiler, a runtime JIT compiler that compiles your bytecodes dynamically into machine code, folds it into your program, and makes it run.
So that's the basics of what we do with HotSpot, right? And then the point comes where we want to make it better, and I'm going to talk a lot about how we make it better. But let's just step back to HotSpot. HotSpot, this next generation technology, what is that really all about? Well, first of all, it's 700 files. I have four people in my group who work on HotSpot, as well as other things. So 700 files, 200,000 lines of code; it's actually 714 files and 220,000 lines of C++ code.
I don't know whether any of you just started out programming six years ago and have only known Java, but C++ code can be intricately complex if it's not done well. Luckily for us, HotSpot is done very well, but it is still an enormous undertaking just getting that part up.
So let me talk about that interpreter for a minute. The interpreter is actually combined, assembled if you will, out of code templates every time you launch. So there is no interpreter.c file; there are templates for the interpreter. Now why would you jam together an interpreter every time you launch the VM? Well, let me tell you. There are two reasons.
One is, and we don't quite make use of this right now: if you're on a G3, you get a VM for a G3, and if you're on a G4, we might be able to have a faster little loop that takes advantage of the G4 processor. If we had just one version of it, it would have to be tuned for either a G3 or a G4. The other reason, which we do make use of, is that if you're debugging, every time you execute a bytecode, you often have to ask:
Am I supposed to stop here? Is this a breakpoint? Am I supposed to do something else? So there's at least an "if debugging" check that you have to do on every bytecode. Well, not if you assemble the interpreter on the fly. So we check, and we look, and we say, are we running in debug mode? And if not, we assemble the interpreter without that little if check.
So the interpreter instructions just go flat out, as fast as they can. And it's really important to have a very fast interpreter, because just-in-time compilation takes cycles away from the interpreter, and you should only do it when you have to, when it's going to be to your program's advantage. So having a very fast interpreter is very important, and that's why that part of the technology is actually pretty sophisticated.
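In code, the idea looks roughly like this; a toy sketch in Java, since the real thing assembles native code templates at launch, and all the names here are illustrative:

```java
// The dispatch table is built once per VM launch, so the breakpoint check
// is only woven into each bytecode's handler when you're actually debugging.
final class ToyTemplateInterpreter {
    interface Handler { void run(int[] stack, int sp); }

    private final Handler[] table = new Handler[256];

    ToyTemplateInterpreter(boolean debugging) {
        Handler iadd = (stack, sp) -> stack[sp - 2] += stack[sp - 1]; // bare template
        table[0x60] = debugging ? withBreakpointCheck(iadd) : iadd;  // 0x60 is iadd
        // ...the same choice is made for every other bytecode...
    }

    private static Handler withBreakpointCheck(Handler h) {
        return (stack, sp) -> {
            checkForBreakpoint(); // the per-bytecode cost we avoid when not debugging
            h.run(stack, sp);
        };
    }

    private static void checkForBreakpoint() { /* ask the debugger */ }
}
```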
I could tell you more about how, when you're about ready to do garbage collection, you actually swap out the whole interpreter with a little jump table such that you jump into and start synchronizing with the garbage collector thread. But, I mean, it is a bunch of very sophisticated technology. It's really cool.
But if you're not going to be spending your life in the interpreter, what you want to do is have a fast compiler. You want a compiler that compiles fast because it's, again, taking cycles away from the running time of your program. And you need it to build good code when it does spend that time. So our compiler is a fast compiler and compiles into pretty good code.
We are anxiously awaiting the next generation technology from Sun, the 1.4 train, because it actually has a better technology for generating better code. So we're looking at that because, as Steve Naroff said, the pendulum is swinging back to the compiler part. So we generate pretty good code, and we're looking to make that code a little bit better for you.
Finally, HotSpot has a patented, I believe, implementation of synchronized. Now, synchronized is an interesting notion from the Java language viewpoint. When you go to access a Vector object, for example, its methods are synchronized, right? And what that means is that they're safe if several threads are trying to do operations on that object at the same time.
Well, the reality is that most of the time, when you use a Vector, only one thread is operating on it at a time. In fact, maybe only one thread will ever operate on it. So the idea with the synchronized operation is that getting hold of and acquiring the lock for that object is a very, very fast, hand-tuned, assembled compare-and-swap operation. And basically, the data structure for that is built on the stack.
So the data structure is built on the stack of the caller, and basically, it's very, very cheap; there's no extra data allocated for that. Only if a second thread comes in and says, "I need to operate on this object also," do we build a heavier weight locking operation and actually go to the operating system to say, "Block that thread. We don't want them to spin; we want to put them to sleep until this other guy is done." So a very fast synchronized implementation is a trademark of HotSpot. It's a patent, actually. It's one of the really cool things about HotSpot.
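A rough model of that fast path, written with today's java.util.concurrent classes (HotSpot itself does this in hand-tuned assembly on the object header, with the lock record on the caller's stack; this sketch is purely illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// The uncontended case is a single compare-and-swap: no allocation, no system
// call. Only when a second thread shows up do we fall back to blocking. Real
// HotSpot inflates to an OS-level monitor with proper queueing; parkNanos is
// a crude stand-in for "put the thread to sleep, don't let it spin".
final class ThinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<Thread>();

    void lock() {
        Thread me = Thread.currentThread();
        if (owner.compareAndSet(null, me)) return;   // fast path: one CAS
        while (!owner.compareAndSet(null, me)) {     // slow path: contention
            LockSupport.parkNanos(100000);           // sleep briefly, don't spin hot
        }
    }

    void unlock() {
        owner.set(null);                             // only the owner should call this
    }
}
```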
A fabulous thing about HotSpot is that garbage collector. I mean, when I grew up, I never thought I'd be extolling the virtues of garbage collectors, but garbage collection is actually a fabulous technology that lets you program a lot easier. And so I'm going to talk a bit about garbage collection right now.
Well, not quite yet, sorry. What are the benefits? I want to talk a bit about what all this fabulous technology does for you in combination. So, one of the people who works for me put together a little tiny benchmark. We call it the allocation microbenchmark. It goes from one to 16 threads or something like that.
Let me tell you about it. It's several threads running in a loop: allocate some objects, free them; allocate some objects, free them. How many threads can you get going at this? And he measures the peak rate of allocation. So, the fun thing about it is he wrote it about four times. He wrote the code in Java.
He wrote the code in C. He wrote the code in C++. And he wrote the code in Objective-C, which is what Cocoa is based on. So, then, of course, you run it on a multiple processor machine to make sure you've got two threads actually trying to do the same thing at the hardware level at the same time.
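The benchmark itself wasn't shown, but a minimal sketch of its shape might look like this (thread count, object size, and duration are made-up parameters):

```java
// Each thread allocates small objects flat out and immediately drops them,
// and we report the aggregate rate in objects per millisecond.
public class AllocBench extends Thread {
    static final int THREADS = 2;
    static final long MILLIS = 5000;
    static volatile Object sink;   // keeps the allocation from being optimized away

    long count;

    public void run() {
        long deadline = System.currentTimeMillis() + MILLIS;
        while (System.currentTimeMillis() < deadline) {
            sink = new int[4];     // allocate, then immediately make it garbage
            count++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AllocBench[] t = new AllocBench[THREADS];
        for (int i = 0; i < THREADS; i++) (t[i] = new AllocBench()).start();
        long total = 0;
        for (int i = 0; i < THREADS; i++) { t[i].join(); total += t[i].count; }
        System.out.println("objects/ms = " + (total / MILLIS));
    }
}
```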
So, I have to warn you: microbenchmarks should be taken with a grain of salt and a lot of water; don't think about them too much, and don't trust them to predict your performance, because they often focus on one very atypical usage pattern. I mean, you might use that pattern a little bit, but you don't use it a lot. So for anything you see from a microbenchmark about a particular little usage pattern, it's really hard, impossible really, to extrapolate from it to any kind of win for your program. So to emphasize the point: your mileage will vary, obviously, and it will be much less. So let's talk about the results. With C code, this allocation benchmark gives us about two hundred objects per millisecond.
[Transcript missing]
allocates faster than compiled C++ code. Not bad. When the compiler has run and is running those threads, the allocations are eight times faster. That is pretty phenomenal. This is two threads trying to go after objects, and they get them eight times faster when you're writing in Java. For a point of reference, MRJ on Mac OS 9, compiled and going as fast as it could, is still faster than C or C++, but it's just a little bit faster than HotSpot interpreted.
So let me talk about garbage collection again. Garbage collection is 41 years old. The first paper on garbage collection was John McCarthy's in 1960, where he talks about mark and sweep. Mark and sweep is the idea that you've got your objects laid out in memory, and you go and you mark every one that's still alive, and then you get to reclaim the stuff between the objects. So that's pretty cool. It was used, obviously, on a LISP system.
Three years later, Marvin Minsky, of other fame, came along and provided an interesting paper on a copying, and hence compacting, collector, where not only do you mark all the objects that are alive by descending through their roots, but you copy them into a new space. And so that compacts your memory, and you don't have the fragmentation issues that plague C programmers all the time.
Because your objects can be packed into the smallest memory they need to survive, and this hugely extends the running lifetime of a program. It was a long time before the next major advance in garbage collection came along, and that was in 1984, when Dave Ungar put out a paper about generational collecting.
And since then, well, over the course of these 41 years, there have been over a thousand papers written on garbage collection. It's a great topic. Java is the first system where it really comes into the mainstream for folks, though. I got my data from this great book called Garbage Collection.
And if any of this talk interests you or intrigues you a little bit, I highly recommend you go and buy this book. It reviews all the algorithms in a very, very great way. Let's talk about generational collecting. What's the idea with generational collecting? Most objects die young. Right? If you use an object just a little bit, it's dead.
So the idea is you split memory into generations such that you can minimize the number of CPU cycles spent allocating a new object, and you can minimize the number of CPU cycles spent remembering and keeping track of the old ones. The observation is that old objects often don't change that much in terms of what objects they hang on to. And if an object never changes, you don't have to spend cycles even remembering that it's still alive.
So in order to make this really happen, the compiler and the interpreter implement what's known as a write barrier: if a reference to an object in one generation gets stored into an object in another generation, we keep track of that, to say, hey, you'd better go look at these objects over here, because we might have an intergenerational reference, so that we can keep track of which objects are alive. So that's sort of the basic background technology.
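A toy card-marking sketch of such a barrier (HotSpot's actual barrier is emitted as raw machine code in compiled methods; the card size and names here are illustrative):

```java
// Every reference store also dirties the "card" covering the updated field,
// so a young collection only re-scans old-generation regions whose card is
// dirty, instead of walking the whole old generation.
final class CardTable {
    private static final int CARD_SHIFT = 9;    // 512-byte cards, a typical size
    private final byte[] cards;

    CardTable(int heapBytes) {
        cards = new byte[(heapBytes >> CARD_SHIFT) + 1];
    }

    // Conceptually runs after every 'obj.field = value' the compiler emits.
    void writeBarrier(int fieldAddress) {
        cards[fieldAddress >>> CARD_SHIFT] = 1; // one store: dirty the card
    }

    boolean mustScan(int regionAddress) {
        return cards[regionAddress >>> CARD_SHIFT] != 0;
    }
}
```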
So what I want to talk about is how that's employed in HotSpot. HotSpot has four generations running at once. The first generation, the Eden, is where you allocate objects. And basically it's as simple as: you've got a pointer to the top of memory, you add the size to it, and you're done.
You have an allocation. The only complication here is that the assignment to the memory pointer, the "mem += size", is an atomic compare-and-swap, because you've got multiple threads that may be going after that at once, and you might have missed. So there's actually a little loop to make sure that you stored what you wanted, and you may have to loop back up and re-add from the top pointer. It's very, very fast.
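In modern Java you could model that little loop like this (the real thing is a few inlined machine instructions; the names and the -1 failure value are inventions for the sketch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Bump-pointer allocation with the retry loop described above: if another
// thread wins the race, the compare-and-swap fails, and we re-read the top
// pointer and try again.
final class Eden {
    private final AtomicLong top = new AtomicLong(0);
    private final long limit;

    Eden(long sizeBytes) { this.limit = sizeBytes; }

    // Returns the "address" of the new object, or -1 when Eden is full,
    // which is the point where a young collection would kick in.
    long allocate(long size) {
        while (true) {
            long old = top.get();
            long next = old + size;
            if (next > limit) return -1;
            if (top.compareAndSet(old, next)) return old; // else loop back up
        }
    }
}
```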
The so-called new generation is where objects that survive that first run go. Allocating is the only thing you ever do to objects in Eden; you never worry about them again, because the only way they stay alive is if they got stored into an older object. Other than that, they're dead. So you just assume that everything in the Eden space is dead, because you actually keep track of which objects are kept alive from the other generations.
The new space is a two-space copying and compacting collector, that Marvin Minsky kind of technology from 1963, where objects in this space just get copied over to the other one and compacted. And if they survive this kind of back-and-forth for a while, then we say, ah, they're no longer a child, they're an adult. We push them into what's known as the tenured generation, and they stay there from adulthood until death. Actually, they can die at any stage, but adult objects go there.
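Here's a toy model of those survivor spaces, using Java collections in place of raw memory (the tenuring threshold is invented, and a real collector finds live objects by tracing roots rather than by a flag):

```java
import java.util.ArrayList;
import java.util.List;

// Live objects get copied to the other space, compacted by construction;
// dead ones are simply never copied. Survive enough round trips and an
// object is promoted ("tenured") to the old generation.
final class SurvivorSpaces {
    static final int TENURE_AFTER = 4;   // made-up promotion threshold

    static class Obj { Object payload; int age; boolean live; }

    List<Obj> from = new ArrayList<Obj>();
    List<Obj> to = new ArrayList<Obj>();
    List<Obj> tenured = new ArrayList<Obj>();

    void collect() {
        for (Obj o : from) {
            if (!o.live) continue;                        // dead: never copied
            if (++o.age >= TENURE_AFTER) tenured.add(o);  // adult: promote
            else to.add(o);                               // survivor: copy over
        }
        List<Obj> swap = from; from = to; to = swap;      // flip the two spaces
        to.clear();
    }
}
```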
HotSpot actually has two different algorithms for maintaining the tenured generation. The one we ship with is a pretty classical mark-and-sweep algorithm. There's another one called the train collector, which you can get to with -Xincgc, I believe it is. We haven't done much experimenting or much qualification on that.
We intend to, though, because the virtue of the train collector is that it spends more cycles keeping track of your objects, but you get shorter pause times when it goes to find some dead memory. And pause time is actually kind of important for GUI-based apps, isn't it? So we're going to work on the train collector and see if we can get it into shape to ship with. You're invited to go play with it yourself; maybe it works just fine for you today. There's another generation, however, which is used for support objects.
Those 200,000 lines of C++ code are possible because those support objects are, for the most part, garbage collected. They're garbage collected with the same collector that is used for the rest of your Java objects. So HotSpot eats its own dog food: it implements its own collector and uses it for its own purposes.
So the permanent generation is where the support objects for the program are kept. In other implementations, those usually just come out of the malloc heap, but in HotSpot's case, they come out of the so-called permanent generation. And that uses a mark-and-sweep. Objects there rarely die, so it's rare that we actually worry about those too much. Let me shift gears a bit and talk about the things we do to make it better, to make Java better.
First of all, from the VM perspective, one of the things we do is provide better integration. Better language integration lets you see more APIs to use to write your programs. We also like to provide better performance. Performance is critical to how your program looks and how it behaves, and we really believe in better performance. There are general observations about performance, and all different ways to think about it.
In general, you want to do more with less memory. Other implementations had an extra word per object just to keep track of the lock, just to keep track of whether or not a lock was around for an object.
And they had another data structure, the handle, to keep track of where the object really was, so you stored handles to everything. Getting rid of all that pays off: HotSpot runs in about 10% less memory, simply because it doesn't use handles, and it doesn't have extra data space for that rarely used monitor on every object.
[Transcript missing]
You all can see this? Not too bad. This is the JDirect 3 example. I pulled this pretty much straight off the web at developer.apple.com/java, and as you see, the first line of code, the public static linkage, typically needs to be done in a static initializer. The new Linker part is the part that you should do in your static initializer. From that point on, you can do Prime.computePrime, send it a short, and it'll return a long long, and you're in business. You're writing in Java, and you're using this C-based library underneath you.
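The slide itself isn't in the transcript, so here's a from-memory sketch of the JDirect 3 pattern; the library path, the field conventions, and the prime function are assumptions, and the real example lives at developer.apple.com/java:

```java
import com.apple.mrj.jdirect.Linker;

public class Prime {
    // Hypothetical native library exporting 'long long computePrime(short)'.
    static final String JDirect_MacOSX = "/usr/local/lib/libprime.dylib";

    // The "public static linkage" bit, built in a static initializer: the
    // Linker wires this class's native methods to the library above.
    static final Object linkage = new Linker(Prime.class);

    // Maps onto the C function; C's 'long long' comes back as a Java long.
    public static native long computePrime(short n);

    public static void main(String[] args) {
        System.out.println(Prime.computePrime((short) 1000));
    }
}
```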
The counterpart for Cocoa: Cocoa's a pretty rich framework. Steve Naroff said he's been working with Steve Jobs for 15 years. My tenure isn't quite that long; it's only about 11.
But I had something to do with some of the Cocoa APIs in a role previous to the one I have now, and I wanted to pull up just a little bit of something I did a long time ago, and I can get to it from Java. There's a date formatter there.
The date formatter takes a string and turns it into a date. And more than that, it can take a date and turn it into a formatted string. So this is an example that does that. The key element here is the string: "next Tuesday at dinner."
That's a pretty simple little English string. It was a weekend's worth of hacking; it's kind of fun. But that actually turns into a real date. So you can actually get to Cocoa from your Java and make use of it without having to wade through Objective-C, without having to wade through JNI. And I invite you to take a look at the Cocoa examples that are shipped under Developer/Examples/Java/AppKit. There are actually two or three programs there completely written in Java.
There's a game called BlastApp. There's the Sketch program, which is a simple MacDraw kind of program. And there's a text editor in there. So go play with Cocoa; it's kind of fun. Let me talk now about better performance. I said we try to innovate in two areas. One was better language integration.
The next one is better performance. Better performance, we all want it, right? The question, of course, is how? I mean, it's not like you just walk up to your program, unless you have Optimizeit, and say, "How do I make it faster?" and it's obvious. With Optimizeit, it actually is.
In our case, we had to scratch our heads a little bit, right? We said, "What are the basic principles of performance?" Well, if you've ever done performance work before, you should know that memory is evil. If you are wasting memory, you're going to spend more time taking it away from a system that might not have it. You might have to bring it in from disk. I mean, memory is just evil.
If you can use less memory to get your job done, your system's going to run faster. The disparity between the rate of increase of CPU cycles and memory bandwidth just keeps growing, getting larger and larger, and to ameliorate that, we keep putting more and more caches onto the chip, because memory has to be really close to the CPU.
So, just think: memory is evil. If you remember that, the next thing is that, of course, you should steal good ideas. I mean, why invent totally new stuff if there are already good ideas out there? So, if we think about memory and we think about good ideas, what do we come to? Think about C technology: a long time ago, they put shared libraries into the system.
[Transcript missing]
The tenured generation, remember, is the one where your objects live in adulthood. And then there's that permanent generation, that sort of "you don't know about it, but it actually costs you" kind of place, right? When you get running, that whole space gets dwarfed. HotSpot keeps the ratios of Eden to new, and of the total of that to tenured, the same. But in a 25, or in this case a 35, megabyte application, the tenured space is where most of your stuff lives.
But doggone it, that permanent generation, the place where we keep things like your bytecodes and stuff, takes up a fair amount of space. Now, wait a minute: bytecodes. What about all the bytecodes for things like Swing? Things like java.lang.String? I mean, does your program have a different version of the bytecodes for java.lang.String? Of course not. It's the same bytecodes.
Well, why does your program have a different copy of them in memory? No good reason whatsoever. So when we took a look at what we could share, we figured out that it's that space for the bytecodes, that space for the metadata for your program that comes out of the standard shipping system libraries. So what we did was, imagine that red space.
That red space gets split up into three sections. There's a section that is completely shareable, the completely read-only part. There's a section that is mostly shared; it can be touched, but it's mostly shareable. And then there's still your classes, the bytecodes for your classes, which aren't really shareable to anybody. So this is a review slide: what we did was we added a new generation. We call it the shared generation.
It has no CPU cost to maintain, because it's there to start out with. It doesn't die, because these objects are immortal. So, that's pretty cool. If we don't even have to build these objects, and we don't have to maintain them, that offers us a CPU savings as well. So in addition to reducing memory, we get to reduce the CPU cycles needed to get to this initial configuration and to maintain it during the running time of your program. So let me talk a bit about the shared generation.
It's based on the observation that some objects never change and never die. Those are the objects we want to share. Those are the objects we maintain on your behalf for the bytecodes, for the strings and stuff in your jar files, or in the system jar files at least.
What we do is we process those standard jar files once. We have an ordered list of the classes that typically get used in a Swing application. We load them into the VM using a special option, which I'm not going to tell you about.
A key point here is that we don't execute any bytecodes. Typically when you load classes into HotSpot, you of course run the static initializers. Well, static initializers can do things like look at your command line arguments. They can go look at the disk. They can run arbitrary code, right? And so that would change the state of the program.
So the idea here is that we want to just preserve the jar file. We just want to have an in-memory version of the jar file; the useful, running part of the jar file is the part we want to save and share. And so we don't execute any bytecodes. And then we use that fabulous garbage collector technology. There's a little part of it that just says: iterate over every object in this generation and do something to it.
You know, something like a closure, only it's written in C++. But anyway, we reapply that garbage collecting technology to pack all the objects that ever got created into these two spaces, the shared read-only space and the shared read-write space. And then, of course, we write that space to disk.
And the next time you start up HotSpot, you just map that into memory, do a little bit of fix-up, and you're running, right? Piece of cake. Simple. This is called pickling. Um, swizzling? No, not swizzling, and it's not pickling. Map and go, I can't remember, there's a term for that. Map and go, maybe that's the right term.
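In Java terms, map-and-go looks roughly like the java.nio sketch below (the VM does this natively, mmap-ing its own archive; the file name is made up):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// One map call and the whole preprocessed image is "loaded": no parsing,
// no copying, and pages only come off disk lazily as they're touched.
public class MapAndGo {
    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile("shared.archive", "r");
        FileChannel channel = file.getChannel();
        MappedByteBuffer image =
            channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        System.out.println("mapped " + image.capacity() + " bytes, ready to go");
    }
}
```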
The shared generation benefits, I think I hit on some of those already. There are virtually no CPU cycles used for the shared generation; that's what the asterisk is about. For the read-only part, that's strictly true. For the read-write part, we do spend some cycles, actually a few more than we need to, but it's almost totally free. And we rarely read the standard JARs.
Classes.jar, UI.jar: we don't even read them to get you started. That saves the cycles to process them. It saves you the memory to map in the index of the JAR file, to wander through it, and to copy the stuff out to make our versions of it. And obviously it saves the disk I/O to get those things off the disk.
And if you never have to read them in, they're not sitting in your disk cache, so that helps with the rest of your system's performance as well. One of the benefits from that is a hot start: the second start of any Java program is always faster, because we save all those cycles to begin with.
A secondary benefit of this technology is that we can be smarter about how we lay those runtime data objects out in memory. For example, there are linkage strings: whenever you reference another class, there's a little linkage string that goes in that says, you know, java.lang.something, a reference to your classes.
Those are actually laid down in the metadata, and those strings are rarely used. But your bytecodes are sandwiched between those rarely used strings. So we pull the strings out and put them in their own space that's hardly ever touched, keeping your bytecodes hotter, and then we never even pull in off of disk the pages that hold the data you never use.
And so your working set actually gets smaller, because we've done packing. What we're trying to do is put the hot data into the memory pages that you're actually pulling off of disk. So those disk I/Os pack more punch, because they bring in more usable data, due to this packing benefit.
This sharing benefit was the one I started out with, and it's the last one I want to talk about. Steve showed you how, altogether, the combined benefits were 20 megabytes for two applications. And the benefits for three and four and five applications are the same. Sharing alone saves, we've measured, three to six megabytes.
The other processing adds up to some of that other data; the other reductions in the working set add up to some of those other benefits. So if you're writing a Swing app, and most of you are, you're going to be getting all that for free using our shared generation technology.
There are just a few caveats. We don't yet know how to share your application jars. Well, your application jars actually aren't all that often shared, but getting that launch-time benefit would be pretty cool. So we're going to at least try to figure out how to map-and-go your stuff so that your stuff launches faster.
The first start of a Java application is actually a little slower. That is because we have to do some processing for all those Swing classes, all up front, that we'd typically meter out as you load them on demand. And we're working on ways to not have to do that.
The other caveat is the interpreter: those bytecodes that you execute have to be slightly slower. But since in HotSpot you spend 90% of your time in compiled code, slowing down the 10% you spend interpreted by 1 or 2% isn't a big deal. I just want to be truthful. A final caveat is that what we share are the classes on your boot class path.
Now, for some programs that alter the boot class path, HotSpot takes a look at that and says, uh-oh, we don't know what they're doing. In JBuilder's case, for example, what they've done is provide their own implementation of certain AWT classes so that they can use them in their great designer. Their designer is a great tool.
So if you're using JBuilder for designing Swing applications, here's a tip. You can get sharing for JBuilder by configuring JBuilder 5, which just got announced and is in your bags, by adding the line "addskippath ../lawt.jar". That lawt.jar is their jar file for giving them a better AWT.
That line goes in a config file; I called it jsa, but you can call it anything you want, the magic is the .config extension, and it goes in the Open Tools area of JBuilder 5. So, directions for our sharing work. First of all, we know how to improve the hot-start launch time even further.
We know how to eliminate, for the most part, that first-start penalty. And we want to extend this fast-start launching behavior to all the jar files, or at least the ones we're told to, that exist in that extensions directory.
We're really pleased with that second-order benefit of packing data. So rather than gathering all the data for a class and jamming it together, we want to exploit the observation that some methods in a class are never used. The bytecodes for those methods shouldn't be on the pages that get brought in.
So we want to start packing based on the methods that are used, and not just on the classes that are used. This may well double the benefit of our sharing by reducing your working set even more. We of course have to finish the GC work on the read-write shared generation. And we've got to share more of the runtime data structures that live in that permanent generation.
There are a few things we're not trying to do right now, the non-directions. It's important when you're setting out to build something to know what your goals are and, if you can, to identify the goals that you're not going to try to worry about. The biggest one for us is that we're not going to share the machine code that gets compiled from your bytecodes. I mean, that is the first thing that traditional C shared libraries share. But in HotSpot's case, you've got to remember what HotSpot's about: HotSpot is about compiling the methods that you're actually using.
Not only compiling them, but inlining the methods that they use. So you get one long pile of code that is really hot, because it has everything it needs to get its job done. That highly tight code is really good for you. And when we've measured how much code we compile, it has never exceeded two megabytes. When you're running applications like JBuilder with HotSpot, we never end up compiling more than about two megabytes of code.
That code is not worth sharing. That code is the stuff that's hot for this particular run. Because every time you run an app, of course, you get different hot spots, right? You shift into this area, and it needs to do that, and then you shift into another area.
It's all based on the working program. So the idea with HotSpot is that it's going to optimize what your program is doing right now. And if we tried to share that, we wouldn't do as good a job. So we're not going to share the compiled machine code that we've built on your behalf.
The other reason is it's kind of hard, right? Because if you do try to share that, then it has to have relocation data in it. And so rather than folding in a branch to a direct address, we have to fold in an indirect one. It just gets messy. It's not very good. There's only one place where sharing compiled bytecodes might make a difference.
And that might be for, say, the static initializers, the code that you actually run to get up and running. If the interpreter is a dominant cost in getting a program up and running, it might be better to start out with precompiled code: not as good as what HotSpot would do normally, but better than the interpreter. But that's sort of precompiled stuff, and I wouldn't even characterize it in the same way. So we might look at that.
Another thing we just decided at the outset was that we are not going to have any kind of shared read/write buffer of loaded class information, or of compiled code information, a shared read/write buffer of anything. Because you know what happens when you have a shared read/write buffer of something? Some other app can make you crash. We do not want that to happen. So that's just not a design point we're going to provide.
The status of the shared generation: the code I'm talking about is in Mac OS X. We shipped it on March 24th; you're getting it already if you're using Java. As for Merlin, we talked to Sun about a year ago and said, you guys really ought to do something about sharing, because that's what scalability means for the client. And they said, well, the way you do this is you file a little... oh, I can't remember the name; not a JSR.
You put a feature request into Merlin through the open community process. So we sponsored one of those, and it's a feature request in Merlin, which is their code name for JDK 1.4. And more than that, we talked with these folks; the VM teams know each other. We talked with them and said, have you guys thought about doing this? And what about that? And stuff like that.
So we've worked with them, testing our designs out with them. And as we've developed this thing, we've provided the code back to Sun so that they can use it for their implementation of this little feature request. As for current status, we've talked with Larry about that. So we've had very positive interactions with Sun on this work.
This comes about in two ways. With WebObjects, for example, with WebObjects stress testing, they run a lot of tests. And we ran into some bugs, and we chased them down and went, hmm, this bug is in what we call the portable code. So we call up our friends across the street and say, do you know about this? And they go, hmm, no, we didn't. So we're actually feeding bug fixes through the indirect channels and making HotSpot as shipped by Sun better for everybody.
And of course, they've given us some feedback on approaches to take when we run into problems, so the feedback goes both ways. I'd like to spend the last section of this talk, before Q&A, talking about what's in Developer Preview 1, which you're going to be getting either today, tomorrow, or the next day, before Friday.
The JVM in Java DP1 has basically got about two fixes since we shipped it in Mac OS X. I just talked about them, actually. The WebObjects stress testing showed us two things once they started kicking it off. And so we've upped our mean time to failure to at least days; we don't know of a failure right now.
But it was running in terms of hours: after about 24 to 48 hours of continuous hammering, a bug would show up. And that's the bug I alluded to that we figured out with Sun's help. The other thing, though, that was not quite right in Mac OS X GM was that debugging was really slow.
Painfully slow. And profiling didn't work. So that's kind of bad. So in DP1 what we've done is we fixed both problems. We fixed profiling, and we fixed the speed of debugging. And the way we did that was we took HotSpot 2.0 from the 1.3.1 technology train and packaged it as an extra VM sitting somewhere in that little implementation space I told you about. So there are actually two HotSpots in DP1: the one that's configured for normal use, and the one that is secretly utilized whenever you do debugging or profiling.
So, now why would we do that? I mean, where did we get that VM from? Well, obviously we're working on 1.3.1, right? So we wanted to get 1.3.1 out to you in some way, especially for debugging and profiling, because we think that's really important. The benefits of HotSpot 2.0: again, it's the client compiler technology from Sun; they also have a server compiler. Debugging, as I said, now works, and it's fast.
Profiling works too; it didn't work at all before. HotSpot 2.0 is, you know, a next generation of the next generation stuff. And they have a register allocator technology in there that we can use right away. And we do use it, so we get better register allocation when we're compiling.
And it foreshadows the compiler that they're working on for 1.4, which does even better code gen. So we're prepping ourselves for getting on board with the 1.4 work. But we didn't stop there. I mean, this is Apple, right? I want you guys to come to expect more from us than just what you can read on the web pages at Sun.
So what we've done since we shipped Mac OS X GM is we put some smarts in to recognize when you're on a G4. Now, what could you do differently on a G4? Well, a G4 comes with this thing called the Velocity Engine. Now, what's a Velocity Engine, right? You're supposed to do graphics with that, right? Well, it's a special processing unit for doing highly fast, pipelined graphics operations.
To do pipelined graphics operations in a high speed way, you've got to read memory like mad off the bus. Well, if you can read memory like mad off the bus, you can use it for simple things like copying memory, can't you? So we put in a copy-memory implementation that, on G4s, uses AltiVec, and it is dramatically faster than any kind of C loop or assembly loop you can write for PowerPC yourself. So we have that, and it's in our 1.3.1 version of HotSpot, HotSpot 2.0.
We put in an optimized instanceof. I mean, this is just an example of lots of little things we do for you that you guys will never hear about; all you're ever going to see is that it improves your run time. When you do instanceof, you typically think, well, how would you do it? Well, if it's not this class, I've got to look at the parent class, then its parent class, and so on. Well, we put a table in there such that it's always a constant-speed operation. instanceof just works, and it's fast.
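One classic way to get a constant-time subtype check, sketched in Java (whether this is exactly HotSpot's table isn't said in the talk; the "display" technique below is a standard one):

```java
// Each class keeps a flat array of its ancestors indexed by depth, so
// "is C a subtype of S?" is one bounds check plus one array load, instead
// of a walk up the parent chain.
final class TypeInfo {
    final TypeInfo[] display;   // display[i] = my ancestor at depth i
    final int depth;

    TypeInfo(TypeInfo parent) {
        depth = (parent == null) ? 0 : parent.depth + 1;
        display = new TypeInfo[depth + 1];
        if (parent != null) System.arraycopy(parent.display, 0, display, 0, depth);
        display[depth] = this;
    }

    // Usage: with object = new TypeInfo(null) and vector = new TypeInfo(object),
    // vector.isSubtypeOf(object) is true, in constant time.
    boolean isSubtypeOf(TypeInfo s) {
        return s.depth <= depth && display[s.depth] == s;
    }
}
```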
We put in even better register allocation than what came from 1.3.1. 1.3.1 still doesn't deal with floating point registers very well, so now we have a better floating point register allocation method for when you're doing those graphics operations. Sharing is not in this little profiling-and-debugging-use-only VM. We know, actually, how to make startup times even faster, but since that's part of sharing, it's also not in there yet.
And 1.3.1 also has a technology known as per-thread allocation pools. Remember that very fast Eden technology I talked about, where you bump the pointer, but the reassignment of the memory pointer was that compare-and-swap instruction? Well, with a per-thread allocation pool, you don't even need the compare-and-swap. So it really is just about three instructions to allocate memory, instead of a "stall the processor and check with the other CPUs next to you to make sure they're not using this memory" kind of instruction. So it actually is going to be really, really fast.
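A toy model of that, again leaning on modern java.util.concurrent for the one shared step (the chunk size is invented, and the real pools live inside the VM):

```java
import java.util.concurrent.atomic.AtomicLong;

// Each thread carves a private chunk out of the shared Eden top with one
// atomic step, then bump-allocates inside it with plain loads and stores:
// no compare-and-swap per object, roughly the "three instructions".
final class ThreadLocalPool {
    private static final AtomicLong sharedTop = new AtomicLong(0);
    private static final long CHUNK = 64 * 1024;

    private long top, limit;   // private to one thread

    // Assumes size <= CHUNK; a real VM also handles overflow and GC here.
    long allocate(long size) {
        if (top + size > limit) {                // pool exhausted: refill
            top = sharedTop.getAndAdd(CHUNK);    // the only shared, atomic step
            limit = top + CHUNK;
        }
        long result = top;
        top += size;                             // plain, thread-private bump
        return result;
    }
}
```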
I put this up here because we're starting this kind of beta train thing with DP1. I want you guys to play with it. So if you want to use it for casual use, use it via the command line like this: you say java-hs1-3-1, and java-hs1-3-1 will run this new version of HotSpot on any program you throw at it. If you really like it, try using it all the time.
There's a symlink, I'll let you go explore, under the JavaVM framework that points to the version of HotSpot that actually gets used: the libjvm.dylib symlink. You'll find an hs1.3.1 dylib somewhere. If you slam the jvm symlink to point to it, you'll get HotSpot 1.3.1 all the time. Tell us about it.
So, when is this thing going to be available? I wish I knew. No, it's coming real soon now. To get to it, you sign up at developer.apple.com as a developer. You're all developers, you're here, right? Okay. Then you go to connect.apple.com and you download it.
When you download it, what does it do? It preserves your existing 1.3 implementation. 1.3 is a subdirectory under the JavaVM framework, so it pushes that aside in case you don't like what you got. It preserves what we find under Java Home that we think you've augmented, specifically the stuff that's in lib, including your extensions. Any other third party stuff, even QuickTime, is in there, right? Stuff we ship.
It's packaged up in extensions. So we preserve everything that's in extensions, because we actually put some other stuff in there, and we preserve everything we find in /Library/Java/Home/bin. So that's the main motivation for that first set of slides telling you which stuff we consider our implementation and which stuff we think you should extend. Because we do need to upgrade, you want us to upgrade, and we've got to agree on some rules: the stuff we can upgrade and the stuff that we shouldn't.
So, there is a mailing list, java-dev, that you can get to. Go to, as I said, that page where I pulled the JDirect example from, developer.apple.com/java. There's a section on there that talks about the java-dev mailing list; sign up. Members of the extended Java team read it and respond to it. We've found it very useful, and we appreciate your comments from it. So, a quick roadmap.
The first one: wrapping Mac OS APIs in Beans. If you went to Steve Naroff's session, you saw Steve Llewellyn. Steve Llewellyn is a great guy. I've empowered him to go do great stuff making more Java happen at Apple. He came up with some great APIs. The stuff you saw there are APIs.
They're beans. You can use them inside JBuilder to add that kind of technology to your apps. Find out all about it by going to session 502; that's today at 5 o'clock. Java Development Tools, which Steve Naroff talked about, is tomorrow at 10:30. Then there's Java Performance. Performance is critical to us.
So we have a whole session that is going to talk about how you can add performance to your programs, how you can discover it, things to avoid, things to do. Part of the Java Development Tools talk is the Optimizeit demonstration, and JBuilder debugging, and Project Builder debugging. And I put the JBuilder reference up here as well, because JBuilder is just an awesome tool for building your Java applications. That's about it.
Ah, how about that? There is the feedback forum as well, on Friday at 10:30; that should have been on the first slide. So please come tell us what you like, what you don't like, and give us suggestions as to what you'd like to see done even better. Alan Samuel is the contact; he was the guy who introduced Steve Naroff. Find him at [email protected].