Java • 1:01:09
This session covers features and capabilities of the HotSpot Client Virtual Machine, including Apple's innovative sharing technology, as well as application tuning and debugging tricks.
Speaker: Blaine Garst
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
Good morning. I was thinking about just sitting in the audience and starting this talk, because to some extent, the Java virtual machine is an invisible thing, right? It sits there and does your stuff. When you're programming in Swing or when you're running your application and things, the virtual machine is just there. It's just the utility. It's just the thing that makes your Java happen. And hopefully, if my group does its job right, you hardly even have to know about it, right? Well, that's our goal. We want you to just enjoy the benefits of it. But, in fact, it's an incredible piece of technology. I don't know whether you have heard of the game or have played the game, The Incredible Machine, or have kids who have played that game, but The Incredible Machine is this great little app on the Mac that lets you build and put together really fun little stuff. And I kind of think of the Java virtual machine as an incredible machine. So anyway, in this talk -- I'm going to get up on stage, as if I were still sitting in the audience saying all this, and tell you about it. So first of all, if you just came from Steve Naroff's talk, you heard Larry Abrahams get up there and talk about how HotSpot is this next-generation virtual machine. It's true. It is a fabulous piece of technology. So we're going to talk about what HotSpot's about, and why it's so good, and some of the things we like about it. Beyond that, we're going to talk about what Apple does to add value to the virtual machine as we get it from Sun.
We will talk a little bit about what is in the Java developer preview that we are going to be releasing, I guess, later this week or any day now or by the end of the week, whenever we get it out on the website. And finally, we'll have some time for some Q&A. So first of all: Java Virtual Machine basics. What does a Java virtual machine do? So if I say JVM, by now I hope you know that it's the acronym for the Java Virtual Machine. The virtual machine is responsible for managing the threads of execution. Your code is written in Java, and it executes on various threads. So the virtual machine makes those threads happen. It's pretty easy for us because we have Mach and threads underneath us, but we keep track of what's going on there. And we, the machine, execute your bytecodes.
It is an operating system in a lot of respects. And having managed operating systems before, let me assure you, it is an operating system in many respects. The machine collects your garbage. Your garbage is your unused objects. These eat up memory. And one of the virtues of the Java programming model is you don't have to worry about getting rid of memory when you're not using it. It helps us if you null out your references when you're not using them anymore. It helps you also, but it lets the machine take care of that stuff. The machine helps shift control back and forth between your bytecodes and native code underneath.
Bytecodes are great. So are native libraries that do great things like speech recognition, speech synthesis, QuickTime, GUI drawing, all that kind of stuff. So there's so much you can get done in bytecode, and some stuff you have to get done in native code. And so the machine also handles the transition to and fro. /Library/Java/Home is where we put the standard properties and things like that. And that's something that you get to extend. Cocoa Java, if you wander around with Java Browser or something, is located in a couple of other places.
It's under /System/Library/Java. So if you poke around and find strange things in strange places, this is meant to help tell you where some of our stuff belongs. The next slide is about what we think of as your place. So in our system, Java Home is /Library/Java/Home. And the things in there that you extend with your code, with your applications, are the bin area -- sometimes you've got little helper utilities and stuff, and Java Home's bin is a great place to drop those. If you have jar files that need to be part of the standard extensions, that's what the extensions directory is all about. You put stuff in there, and you don't have to fool around with class paths anymore. That is great. Class paths are not such a great idea. Obviously, the other stuff there -- fonts, images, your properties, security data.
It's the general kind of dumping ground for all sorts of support data for Java programs. Cocoa and Mac OS X have different bundling mechanisms that let you package some of that stuff, but if you're coming to us from another platform and don't want to rethink that part, Java Home is where you already put your stuff. This is the part of Java Home that we want you to extend. And of course, Applications is where, when you write stuff, you put your resulting product. So that's it with the basics. Now I want to talk a little bit about HotSpot. So what is HotSpot?
We take HotSpot from Sun and we port it to Mac OS X. And it's not hard, compared to what it was on Mac OS 9. On Mac OS 9, we had to do things like invent a threading model. We had to do things like -- well, anyway, I don't want to go into it. So the adapt-to-Darwin phase is pretty easy, because there are native threads underneath, there's a natural file system model, so I/O just happens, and stuff like that. So it's a porting job. The main thing where we add value at this stage is we write the interpreter.
This, you might think, well, the interpreter is just a C file, right? And you just kind of compile it. Well, that's not quite the way it is. So I'll talk about that a little bit later. But we get the interpreter up and running. And so now you can run Hotspot in an interpreted mode.
And then, of course, in order to make it run fast, we have to write a compiler. We have to write a runtime JIT compiler that compiles your bytecodes dynamically into machine code, folds it into your program, and makes it run. So that's the basics of what we do with HotSpot, right? And then the point comes where we want to make it better. And I'm going to talk a lot about how we make it better. But let's just step back to HotSpot. HotSpot, this next-generation technology -- what is that really all about? Well, first of all, it's 700 files. I have four people in my group who work on HotSpot, as well as other things. So 700 files, that's 200,000 lines of code. It's actually 714 files, 220,000 lines of C++ code.
I don't know whether any of you just started out programming six years ago and have only known Java, but C++ code can be intricately complex if it's not done well. Luckily for us, HotSpot is done very well, but it is still an enormous undertaking just getting that part up. So let me talk about that interpreter for a minute. The interpreter is actually combined -- assembled, if you will -- out of code templates every time you launch. So there is no interpreter.c file.
There are templates for the interpreter. Now why would you jam together an interpreter every time you launch the VM? Well let me tell you. There's two reasons. One is, we don't quite make use of this right now, but if you're on a G4, we might be able to have a faster little loop that could take advantage of a G4 processor. So if we had just one version of it, it would have to be tuned for either a G3 or a G4. Another one, though, which we do make use of, is if you're debugging, every time you execute a bytecode, you often have to ask, you have to say, "Am I supposed to stop here? "Is this a breakpoint? "Am I supposed to do something else?" And so there's at least an if debugging check that you have to do on every bytecode that you interpret. Well, not if you assemble the interpreter on the fly. So we check and we look and we say, are we running this in a debug mode? And if not, we assemble the interpreter without that little if check. So the instructions just go flat out. The interpreter instructions just go flat out as fast as they can. And it's really important to have a very fast interpreter because compilation, just-in-time compilation, takes cycles away from your program and you should only do it when you have to, when it's gonna be to your program's advantage. So having a very fast interpreter is very important. And so that's why that part of the technology is actually pretty sophisticated. I could tell you more about how when you're about ready to do garbage collection, you actually swap out the whole interpreter with a little jump table such that you jump into and start synchronizing with the garbage collector thread. But I mean, it is a bunch of very sophisticated technology. It's really cool.
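The template idea above is hard to show without an assembler, but the spirit of it -- choose the loop variant once at launch so the hot path carries no per-bytecode debug check -- can be sketched in plain Java. Everything here (the toy opcodes, the class and method names) is invented for illustration; it is not HotSpot's actual mechanism.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical sketch: instead of checking "am I debugging?" on every
// bytecode, pick the dispatch loop once at launch, so the common case
// carries no per-bytecode branch at all.
public class TemplateInterpreter {
    static final int ADD = 0, MUL = 1;   // a toy two-opcode "bytecode" set

    // Release-mode loop: no debug check anywhere in the hot path.
    static int runFast(int[] code, int a, int b) {
        int acc = a;
        for (int op : code) acc = (op == ADD) ? acc + b : acc * b;
        return acc;
    }

    // Debug-mode loop: a per-bytecode hook, a cost paid only when asked for.
    static int runDebug(int[] code, int a, int b) {
        int acc = a;
        for (int op : code) {
            checkBreakpoint(op);          // would consult a breakpoint table
            acc = (op == ADD) ? acc + b : acc * b;
        }
        return acc;
    }

    static void checkBreakpoint(int op) { /* no-op in this sketch */ }

    // "Assemble" the interpreter once at launch by choosing a variant.
    static IntBinaryOperator assemble(int[] code, boolean debugging) {
        return debugging ? (a, b) -> runDebug(code, a, b)
                         : (a, b) -> runFast(code, a, b);
    }

    public static void main(String[] args) {
        int[] program = { ADD, MUL };                  // (3 + 4) * 4
        System.out.println(assemble(program, false).applyAsInt(3, 4)); // prints 28
    }
}
```

The point of the design is that the decision is made once, outside the loop, rather than millions of times inside it.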
But if you're not going to be spending your life in the interpreter, what you want to do is have a fast compiler. You want a compiler that compiles fast because it's, again, taking cycles away from the running time of your program. And you need it to build good code when it does spend that time. So our compiler is a fast compiler and compiles into pretty good code. We are anxiously awaiting the next generation technology from Sun, the 1.4 train, because it actually has a better technology for generating better code. So we're looking at that because, as Steve Naroff said, the pendulum is swinging back to the compiler part. So we generate pretty good code, and we're looking to make that code a little bit better for you.
Finally, HotSpot has a patented, I believe, implementation of synchronized. Now, synchronized is an interesting notion from the Java language viewpoint. When you go to access a Vector object, for example, its methods are synchronized, right? And what that means is that they're safe if several threads are trying to do operations on that object at the same time. Well, the reality is that most of the time, when you use a Vector, only one thread's operating on it at a time. In fact, maybe only one thread will ever operate on it. So the idea with the synchronized operation is that to get hold of and acquire the lock for that object is a very, very fast, hand-tuned, assembled compare-and-swap operation. And basically, the data structure for that is built on the stack of the caller, so it's very, very cheap. There's no extra data allocated for that. Only if a second thread comes in and says, I need to operate on this object also, do we build a heavier-weight locking operation and actually go to the operating system to say, block that thread. We don't want them to spin. We want to put them to sleep until this other guy is done. So a very fast synchronized implementation is a trademark -- I don't know, it's a patent -- it's one of the really cool things about HotSpot. A fabulous thing about HotSpot is that garbage collector. I mean, when I grew up, I never thought I'd be extolling the virtues of garbage collectors, but garbage collection is actually a fabulous technology that lets you program a lot easier. And so I'm going to talk a bit about garbage collection right now.
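Before the garbage collection discussion, the fast-path idea just described can be sketched as a toy lock -- this is an illustration of the general technique, not Sun's patented implementation. The uncontended acquire is a single compare-and-swap on an owner field, with no allocation and no kernel call; a real VM would inflate to an OS-backed mutex under contention and park the waiter, where this sketch just yields and retries.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy "thin lock": the uncontended case is one CAS; contention falls back
// to a crude yield loop (a real VM blocks the thread instead of spinning).
public class ThinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    static int counter = 0;              // shared state for the demo below

    public boolean tryLock() {
        // Fast path: a single atomic compare-and-swap.
        return owner.compareAndSet(null, Thread.currentThread());
    }

    public void lock() {
        // Slow path stand-in for inflating to a heavyweight OS lock.
        while (!tryLock()) Thread.yield();
    }

    public void unlock() {
        owner.set(null);
    }

    public static void main(String[] args) throws InterruptedException {
        ThinLock l = new ThinLock();
        Runnable r = () -> {
            for (int i = 0; i < 100_000; i++) {
                l.lock();
                counter++;               // protected by the lock
                l.unlock();
            }
        };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(counter);     // prints 200000
    }
}
```

The design point is the same one the talk makes: pay nothing extra until a second thread actually shows up.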
Well, not quite yet, sorry. What are the benefits? I want to talk a bit about what all this fabulous technology does for you in combination. So one of the people who works for me put together a little tiny benchmark. We call it the allocation micro-benchmark. It goes from one to 16 threads, or something like that. Let me tell you about it. It's several threads running in a loop: allocate n objects, free them; allocate n objects, free them. How many threads can you get going at this? And he measures the peak rate of allocation.
So the fun thing about it is he wrote it about four times. He wrote the code in Java. He wrote the code in C. He wrote the code in C++. And he wrote the code in Objective-C, which is what Cocoa is based on. So then, of course, you run it on a multiple processor machine to make sure you've got two threads actually trying to do the same thing at the hardware level at the same time. So I have to warn you, micro benchmarks should be taken with a grain of salt, a lot of water, and don't think about them too much.
or don't trust them to predict your performance because they often focus on one very atypical usage pattern. I mean, you might use it a little bit, but you don't use it a lot. And so anything you see from a micro-benchmark about a particular little usage pattern, it's really hard, impossible really, to extrapolate from that to your program, to any kind of an extrapolated win for your program. So to emphasize the point, your mileage will vary. Obviously, it will be much less. So, let's talk about the results. So, with C code, this allocation benchmark gives us about 200 objects per millisecond.
Objective-C is a little bit less. I'm not sure why, but there's a little bit of message overhead in there. C++ actually is a little bit better than C. You peak out around 200 to, you know, 225 objects per millisecond. Java, in that interpreted mode, allocates faster than compiled C++ code. Not bad. When the compiler has run and is running those threads, the allocations are eight times faster.
That is pretty phenomenal. This is two threads trying to go after objects, and they get them eight times faster when you're writing in Java. For a point of reference, MRJ on Mac OS 9, compiled, going as fast as it could go, was still faster than C or C++, but it's just a little bit faster than HotSpot interpreted.
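For flavor, here's a reconstruction of what such an allocation micro-benchmark might look like. The thread counts, object shape, and loop counts are made up, and -- as the talk itself warns -- the number it prints predicts nothing about a real application.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Several threads allocate small, instantly-garbage objects as fast as
// they can; we report the aggregate rate in objects per millisecond.
public class AllocBench {
    static final class Node { long a, b; }       // a small throwaway object

    static long run(int nThreads, int perThread) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong sink = new AtomicLong();      // keeps the work observable
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { return; }
                long sum = 0;
                for (int j = 0; j < perThread; j++) {
                    Node n = new Node();         // allocate...
                    n.a = j;
                    sum += n.a;                  // ...touch it, then drop it
                }
                sink.addAndGet(sum);
            });
            ts[i].start();
        }
        long t0 = System.nanoTime();
        start.countDown();                       // release all threads at once
        for (Thread t : ts) t.join();
        long elapsedMs = Math.max(1, (System.nanoTime() - t0) / 1_000_000);
        return ((long) nThreads * perThread) / elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(2, 1_000_000) + " objects/ms");
    }
}
```

Running it with more threads than cores, as the original did on a multiprocessor machine, is what exposes the allocator's contention behavior.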
So let me talk about garbage collection again. Garbage collection is 41 years old. The first paper on garbage collection was John McCarthy, 1960, where he talks about mark and sweep. Mark and sweep is the idea that you've got your objects laid out in memory, and you go and you mark every one that's still alive. And then you get to reclaim the stuff between the objects. So that's pretty cool. It was used, obviously, on a LISP system. Three years later, Marvin Minsky, of other fame, came along and provided an interesting paper on a copying, and hence compacting, collector, where not only do you mark all the objects that are alive by descending through their roots, but you copy them into a new space. And so that compacts your memory. And so you don't have the fragmentation issues that plague C programmers all the time, because your objects can be packed into the smallest memory they need to survive. And so this hugely extends the running lifetime of the program. It was a long time before the next major advance in garbage collection came along. And that was in 1984, when David Ungar put out a paper about generational collecting. And since then -- well, over the course of these 41 years -- there have been over 1,000 papers written on garbage collection. It's a great topic. Java is the first system where it really comes into the mainstream for folks, though. I got my data from this great book called Garbage Collection. And if any of this talk interests you or intrigues you a little bit, I highly recommend you go and buy this book. It reviews all the algorithms in a very, very great way. Let's talk about generational collecting. What's the idea of generational collecting? Most objects die young, right? You use an object just a little bit, and it's dead.
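To make McCarthy's mark and sweep concrete before going on, here's a toy version over an explicit object graph. The heap here is just a list of nodes with invented names; a real collector works over raw memory.

```java
import java.util.*;

// Toy mark-and-sweep: mark everything reachable from the roots, then
// sweep (reclaim) whatever was never marked.
public class MarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    static void mark(Obj o) {
        if (o == null || o.marked) return;
        o.marked = true;                    // this object is live
        for (Obj r : o.refs) mark(r);       // ...and so is everything it points to
    }

    /** Returns the names of objects reclaimed by one collection cycle. */
    static List<String> collect(List<Obj> heap, List<Obj> roots) {
        for (Obj o : heap) o.marked = false;
        for (Obj r : roots) mark(r);        // mark phase
        List<String> dead = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();              // sweep phase
            if (!o.marked) { dead.add(o.name); it.remove(); }
        }
        return dead;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);                      // a -> b; c is unreachable
        List<Obj> heap = new ArrayList<>(List.of(a, b, c));
        System.out.println(collect(heap, List.of(a)));  // prints [c]
    }
}
```

Minsky's refinement -- copying the marked objects into a fresh space instead of sweeping in place -- is what buys the compaction the talk describes.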
So the idea is you split memory into generations such that you can minimize the number of CPU cycles allocating a new object, and you can minimize the number of CPU cycles to remember and keep track of the old ones. So the idea is that old objects actually often don't change that much in terms of what objects they hang on to. And if you never have to worry about an object, if it never changes, then you don't have to spend cycles even remembering that it's still alive. So in order to make this really happen, the compiler and the interpreter implement what's known as a write barrier, such that if an object in one generation gets stored into an object in another generation, we keep track of that to say, hey, you'd better go look at these objects over here, because we might have had an intergenerational reference here -- so that we can keep track of which objects are alive. So that's sort of the basic background technology. So what I want to talk about is: how is that employed in HotSpot? HotSpot has four generations running at once. The first generation, the Eden, is where you allocate objects. And basically, it's as simple as: you've got a pointer to the top of memory, you add the size to it, and you're done. You have an allocation. The only complication here is that the assignment for the memory, the mem += size, is an atomic compare-and-swap, because you've got multiple threads that may be going after that, and you might have missed. So there's actually a little loop to make sure that you stored what you wanted, and you have to loop back up and see whether or not you have to re-add from the new top. It's very, very fast. The so-called new generation is where objects go that survive the first round. I mean, the only thing you do with objects in Eden is allocate them. You never worry about them again, because the only way they stay alive is if they got stored into an older object.
Other than that, they're dead. So you just assume that everything that's in the Eden space is dead, because you actually keep track of what objects stay alive in the other generations. So the new space is a two-space copying collector -- you know, Marvin Minsky kind of technology from 1963 -- where objects that are in this space just get copied over to another one and compacted. And if they survive this kind of back-and-forth for a while, then we say they're no longer a child, they're an adult. We push them into what's known as the tenured generation, and they stay there from adulthood till death. Actually, they can die at any stage, but the adult objects go there. HotSpot actually has two different algorithms for maintaining the tenured generation. The one we ship with is a pretty classical mark-and-sweep algorithm. There's another one called the train collector, which you can get to with -Xincgc, I believe it is. We haven't done much experimenting or much qualification on that. We intend to, though, because the virtue of the train collector is it spends more cycles keeping track of your objects, but you have less pause time when it goes to find some dead memory. So pause time is actually kind of important for GUI-based apps, isn't it? So we're going to work on the train collector and see if we can get it into shape to ship with. You're invited to go play with it yourself. Maybe it works just fine for you today. There's another generation, however, which is used for support objects. Those 200,000 lines of C++ code are possible because those objects are, for the most part, garbage collected. They're garbage collected with the same collector that is used for the rest of your Java objects.
So HotSpot eats its own dog food. It implements its own collector and uses it for its own purposes. So the permanent generation is where the support objects for the program are kept. In other implementations, those usually just come out of the malloc heap. But in HotSpot's case, they come out of the so-called permanent generation. And that uses a mark-and-sweep algorithm. Objects there rarely die, so it's rare that we actually worry about those too much. Let me shift gears a bit and talk about the things we do -- the things we do to make Java better.
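Before shifting gears, the Eden bump-pointer allocation described a couple of paragraphs back can be sketched like this -- offsets standing in for real memory, with invented sizes. Grab the current top of the space, try to advance it by the object's size with a single compare-and-swap, and retry if another thread won the race.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of Eden-style "bump pointer" allocation: the whole allocation is
// an atomic "mem += size", with a retry loop in case the CAS misses.
public class BumpAllocator {
    private final AtomicLong top = new AtomicLong(0);
    private final long limit;

    BumpAllocator(long capacity) { this.limit = capacity; }

    /** Returns the offset of the new object, or -1 if the space is full. */
    long allocate(long size) {
        while (true) {
            long old = top.get();
            long next = old + size;
            if (next > limit) return -1;            // exhausted: time to GC
            if (top.compareAndSet(old, next)) {     // the atomic mem += size
                return old;                          // we own [old, next)
            }
            // CAS missed: another thread allocated first -- loop and retry.
        }
    }

    public static void main(String[] args) {
        BumpAllocator eden = new BumpAllocator(64);
        System.out.println(eden.allocate(16));  // prints 0
        System.out.println(eden.allocate(16));  // prints 16
        System.out.println(eden.allocate(64));  // prints -1 (would not fit)
    }
}
```

The common case is one atomic add, which is why allocation in a collected heap can beat malloc-style free-list searches.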
First of all, from the VM perspective, one of the things we do is provide better integration. Better language integration lets you see more APIs to use to write your programs. We like to provide better performance. Performance is critical to how your program looks, how it behaves, and we really believe in better performance.
There are general observations about performance -- there are all different ways to think about performance. In general, you want to do more with less memory. Other implementations had an extra word per object just to keep track of whether or not a lock was around for that object. And they had another data structure, the handle, to keep track of where the object really was, so you stored handles and everything. Doing away with all that pays off. HotSpot runs in about 10% smaller memory, simply because it doesn't use handles, and it doesn't have extra data space for that rarely used monitor on every object.
For the client -- Steve Naroff talked about how much effort is being spent on the server. Well, for the client, we think that scalability means running more apps in the same amount of memory. When he got up there and said -- I can't remember what the graph said -- it takes 60-some megabytes to run two Java applications... well, we sell systems with 64 to 128 megabytes of memory, and we would love for you guys to write apps and ship them and have them run well on our out-of-the-box configuration. So for us, that means we have to make sure we use that memory in the most efficient way we can. Another attribute of performance that we work on is launch time. Nobody wants to buy an app and sit there and wait 20 seconds for it to launch. I mean, they put up with it, but it's not one of the things that they're happy about. So if we can make launch times faster, we're going to do it for you. And of course, faster running time. You know, especially from the VM perspective, the fewer cycles we spend thinking about what you're supposed to be doing means more CPU cycles for you to actually do it. So, better language integration. JNI is the standard. There used to be others, but JNI is now the standard. If you've programmed to JNI, it can be a little cumbersome, right? Because you can't really get to an array. You know, with an array, you've got to copy the array contents over, muck with them, and copy them back. And it's just cumbersome. You get these jobject references and stuff like that.
But the value of that is that it allows that precise collection to go on within HotSpot. Since you never see a pointer to a real object, we can move it around. We don't have to examine all of memory and try to figure out whether or not a bit pattern really represents a pointer to one of our objects, or just happens to be, you know, the current net value of your portfolio sitting in MoneyDance. So that's the benefit you get from JNI. So we do two things to help extend the ability to program to JNI. We provide JDirect. We use that internally for Swing and AWT somewhat. QTJava uses that as well. And that lets you sit in your Java code and get to the C routines. We've talked about that in the past, so I'm not going to go into it in too much more detail -- I'll have a code slide a little bit later. But in using it, what happens is you just kind of write your little wrapper class for the C functions you're going to be using, and then there's one piece of code that you do: you ask JDirect to build you a library. And so it generates -- it writes the JNI stub code, links it in, and then you just start using your code. So in your static initializer, you just say, load me in, and JDirect does the rest. That's pretty sophisticated.
We also have a Java bridge, which implements the technology that lets Cocoa Java happen. There's a standalone tool, the Bridget tool, that is used for this. It starts with a mapping file that says: this Java class maps to the Objective-C class underneath it. And the benefit of that is that, for the most part, those Objective-C frameworks can now be subclassed in Java, because the Cocoa frameworks are used with setters and getters. So whenever they do a setter, it comes across to the Java side and does things. And when it implements methods, we transform the method names and actually dispatch on the Java side, and vice versa.
So when you do super in Java, it actually gets translated by the bridge and gets dispatched into Objective-C below. So, an example. I don't know how well you all can see this. Not too bad. This is the JDirect3 example. I pulled this pretty much straight off the web at developer.apple.com/java.
And as you see, the first line of code, the public static linkage, typically needs to be done in a static initializer -- for some reason that didn't show up here. The new Linker part is the part that you should do in your static initializer, and it tells JDirect to go fabricate something for the class Prime. It's a reference to itself, right? And so JDirect goes and finds, through reflection, what static native methods are in there, what their names are, and what the types of their parameters are. Then it goes and looks up in the runtime and says, hey, is there -- in this case -- a computePrime function around? If you haven't loaded the library, it'll actually look for that magic string and load that library for you, in case your stuff's out there. And so that's it. From that point on, you can now call Prime.computePrime, send it a short, and it'll return a long long.
And you're in business. You're writing in Java, and you're using this C-based library underneath you. The counterpart for Cocoa: Cocoa's a pretty rich framework. Steve Naroff says he's been working, like Avie, with Steve Jobs for 15 years. My tenure isn't quite that long -- it's only about 11 -- but I had something to do with some of the Cocoa APIs in a role previous to the one I have now, and I wanted to pull up just a little bit of something I did a long time ago.
I can get to it from Java. There's a date formatter there. The date formatter takes a string and turns it into a date. And more than that, it can take a date and turn it into a formatted string. So this is an example that does that. The key element here is -- let's see where to go -- the "next Tuesday at dinner". That's a pretty simple little English string. It was a weekend's worth of hacking. It's kind of fun.
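The standard Java library can't parse natural language like "next Tuesday at dinner" the way that Cocoa formatter can, but the basic string-to-date-and-back round trip it performs looks like this with java.text.SimpleDateFormat (the pattern and the date are arbitrary examples):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// String -> date -> string with a fixed pattern.
public class DateRoundTrip {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US);
        Date d = fmt.parse("2001-05-15 18:30");     // string -> date
        System.out.println(fmt.format(d));           // prints 2001-05-15 18:30
    }
}
```

The Cocoa bridge gives you the richer natural-language behavior on top of exactly this kind of round trip.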
But that actually turns into a real date. So you can actually get to Cocoa from your Java and make use of it without having to wade through Objective-C, without having to wade through JNI. And I invite you to take a look at the Cocoa examples that are shipped. Under Developer, Examples, Java, AppKit, there are actually two or three programs completely written in Java. There's a game called BlastApp. There's the Sketch program, which is a simple MacDraw kind of program. And there's a text editor in there. So go play with Cocoa. It's kind of fun. Let me talk now about better performance. I said we try to innovate in two areas. One was better language integration.
The next one is better performance. Better performance -- we all want it, right? The question, of course, is how? I mean, it's not like you just walk up to your program, unless you have Optimizeit, and say, how do I make it faster, and it's obvious. (With Optimizeit, it actually is.) In our case, we had to scratch our heads a little bit, right? We said, what are the basic principles of performance? Well, if you've ever done performance work before, you should know that memory is evil. If you are wasting memory, you are going to spend more time taking it away from a system that might not have it. You might have to bring it in from disk. You might have to -- I mean, just, memory is evil. If you can use less memory to get your job done, your system's going to run faster. The disparity between the rate of increase of CPU cycles and memory bandwidth just keeps getting larger and larger. And to ameliorate that, we keep putting more and more caches onto the chip, because memory has to be really close to the CPU. So just think: memory is evil. Remember that. The next thing is that, of course, you should steal good ideas. I mean, why invent totally new stuff if there are already some good ideas out there?
So if we think about memory and we think about good ideas, what do we come to? When you talk about C technology: a long time ago, they put shared libraries into the system. Shared libraries are a mechanism for C libraries, for programs, to share instructions, right? So, you know, we keep thinking.
The C libraries -- what do they share? Well, they share their instructions, right? That's the dominant cost. There's a little bit of utility in that, and with a dynamic shared library, you can swap implementations out without having somebody relink. So there's a little bit of code portability in there, but sharing the machine code, the actual assembly instructions, is the dominant savings for shared libraries. Another large savings, though, is the data that goes along with it. And so, obviously, how do we reapply this? What about building some kind of sharing for jar files? So if we look at an initial memory configuration for your running app, this is the memory layout for something that's just getting started. I actually put in some realistic numbers, real numbers, for your Java application. The point here is that the Eden space is actually pretty large to start out with. The new space, where the little back-and-forth copying happens, is fairly small.
The tenured generation, remember, that's the one where your objects live in adulthood. And then there's that permanent generation -- that sort of "you don't know about it, but it actually costs you" kind of place, right? When you get running, that whole space gets dwarfed. HotSpot keeps the ratios of Eden to new, and of the total of that to tenured, the same. But in this case, a 35-megabyte application, the tenured space is where most of your stuff lives. But doggone it, that permanent generation, the place where we keep things like your bytecodes and stuff, takes up a fair amount of space. Now, wait a minute. Bytecodes. Wait a minute. What about all the bytecodes for things like Swing, things like java.lang.String? I mean, does your program have a different version of the bytecodes for java.lang.String? Of course not. It's the same bytecodes. Well, why does your program in memory have a different copy of it? No good reason whatsoever. So when we took a look at what we could share, we figured out that it's that space for the bytecodes -- it's that space for the metadata for your program that comes out of the standard shipping system libraries. So what we did was -- imagine that red space. That red space gets split up into three sections.
There's a section that is completely shareable, the completely read-only part. There's a section that is mostly shared -- it can be touched, but it is mostly shareable. And then there are still your classes -- the bytecodes for your classes that aren't really shareable to anybody. So this is a review slide. What we did was we added a new generation. We call it the shared generation.
It has no CPU cost to maintain, because it's there to start out with. It doesn't die, because these objects are immortal. So that's pretty cool. If we don't even have to build these objects, and we don't have to maintain them, that offers us a CPU savings as well. So in addition to reducing memory, we get to reduce the CPU cycles to get to this initial configuration and to maintain it during the running time of your program. So let me talk a bit about the shared generation. It's based on the observation that some objects never change and never die. So those are the objects we want to share. Those are the objects we maintain on your behalf for the bytecodes, for the strings and stuff in your jar files -- or in the system jar files, at least. What we do is we process those standard jar files once. We have an ordered list of the classes that typically get used in a Swing application. We load them into the VM using a special option, which I'm not going to tell you about. A key point here is that we don't execute any bytecodes. Typically, when you load classes into HotSpot, you, of course, run the static initializers. Well, the static initializers can do things like look at your command line arguments, and go look at the disk and memory. They can run arbitrary code, right? And so that would change the state of the program. So the idea here is that we want to just preserve the jar file. We just want to have an in-memory version of the jar file -- the useful part of the jar file is the part we want to save and share. And so we don't execute any bytecodes. And then we use that fabulous garbage collector technology. There's a little part of it that just says: iterate over every object in this generation and do something to it. Something like a closure, only it's written in C++.
But anyway, we reapply that garbage collection technology to pack all the objects that ever got created into these two spaces, the shared read-only space and the shared read-write space. And then, of course, we write those spaces to disk. The next time you start Hotspot, you just map that into memory, do a little bit of fix-up, and you're running. Piece of cake, simple. This is called, pickling? Swizzling? No, not swizzling, and it's not pickling. There's a term for that. Map and go, maybe that's the right term. The shared generation benefits, I think I hit on some of those already. There are virtually no CPU cycles used for the shared generation. That's what the asterisk is about: for the read-only part that's strictly true; for the read-write part we do spend some cycles, actually a few more than we need to, but it's almost totally free.
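Here's a rough, purely illustrative sketch of the map-and-go idea in plain Java, using a made-up image file name and format (the real shared archive is internal to the VM): do the expensive processing once, write it out, and on later starts just memory-map the result read-only instead of re-parsing the jars.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy "map and go": pay the processing cost once, then every later
// start maps the saved image straight into memory.
public class MapAndGo {

    // "First start": do the processing once and save the image to disk.
    static void coldStart(Path image, String processedMetadata) throws IOException {
        Files.write(image, processedMetadata.getBytes(StandardCharsets.UTF_8));
    }

    // "Hot start": map the saved image read-only, no parsing at all.
    static String hotStart(Path image) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(image.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path image = Files.createTempFile("shared-gen", ".img");
        coldStart(image, "precomputed class metadata");
        System.out.println(hotStart(image)); // prints: precomputed class metadata
    }
}
```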
We rarely read the standard jars. Classes.jar, ui.jar: we don't even read them to get you started. That saves the cycles to process them. It saves the memory it takes to map in the jar file's index, wander through it, and copy things out to make our own versions of them. And obviously it saves the disk I/O to get those things off the disk. And if you never have to read them, they're not sitting in your disk cache, so that helps the rest of your system's performance as well. One of the benefits of that is a hot start: the second start of any Java program is always faster, because we've saved all those cycles to begin with.
A secondary benefit of this technology is that we can be smarter about how we lay those runtime data objects out in memory. For example, there are linkage strings: whenever you reference another class, there's a little linkage string laid down in the metadata that says java.lang.something-or-other, or the name of one of your classes. Those strings are rarely used, but normally your bytecodes are sandwiched between them. What we do, by pulling the strings out into their own space that's hardly ever touched and keeping your bytecodes together where it's hot, is make sure we never even pull in the pages off disk that hold data you never use. So your working set actually gets smaller, because we've packed the hot data into the memory pages you're actually pulling off disk. Those disk I/Os pack more punch because they bring in more usable data, due to this packing benefit.
This sharing benefit was the one I started out with; it's the last one I want to talk about. Steve showed you how, altogether, the combined benefits were 20 megabytes for two applications. And the benefits for three and four and five applications are the same. Sharing alone, we've measured, saves three to six megabytes.
The other reductions in the working set add up to some of those other benefits. So if you're writing a Swing app, and most of you are, you're going to get that for free using our shared generation technology. There are just a few caveats. We don't yet know how to share your application jars. Well, your application jars actually aren't shared all that often, but getting that launch-time benefit would be pretty cool, so we're going to at least try to figure out how to map-and-go your stuff so that it launches faster. The first start of a Java application is actually a little slower, because we have to do some processing for all those Swing classes up front that we'd typically meter out as you load them on demand; we're working on ways to not have to do that. And the interpreter, for those bytecodes that you execute, has to be slightly slower. But since in Hotspot you spend 90% of your time in compiled code, slowing down the 10% you spend interpreted by 1% or 2% isn't a big deal. I just want to be truthful. One more caveat: what we share are the classes on your boot class path.
Now, for programs that alter the boot class path, Hotspot takes a look at that and says, uh-oh, we don't know what they're doing. In JBuilder's case, for example, they've provided their own implementation of certain AWT classes so that they can use them in their designer, and their designer is a great tool. So if you're using JBuilder for designing Swing applications, here's a tip. You can get sharing back for JBuilder 5, which was just announced and is in your bags, by adding a line: addskippath ./lawt.jar. That lawt.jar is their jar file that gives them a better AWT.
That line goes in a config file; I called it jsa, but you can call it anything you want. The magic is the .config extension, in the Open Tools area of JBuilder 5. So, directions for our sharing work. First of all, we know how to improve the hot-start launch time even further. We know how to eliminate, for the most part, that first-start penalty. And we want to extend this fast-start launching behavior to all the jar files, or at least the ones we're told to, that live in the extensions directory.
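As best I can render it from the talk, the config file's contents would be a single line something like this (the directive spelling and the jar path are as spoken; check your own JBuilder 5 install for the exact form):

```
addskippath ./lawt.jar
```

Drop that into a file with any name ending in .config in the Open Tools area of JBuilder 5, and Hotspot's sharing should kick back in.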
We were really pleased with that second-order benefit of packing data. So what we'd like to do, rather than gathering all the data for a class and jamming it in together, is make the observation that some methods in a class are never used. The bytecodes for those methods shouldn't be on the pages that get brought in. So we want to start packing based on the methods that are used, not just the classes that are used. That may well double the benefit of our sharing, by reducing your working set even more.
We of course have to finish the GC work on the read-write shared generation. And we could of course figure out how to share more of the runtime data structures that live in the permanent generation. So, going back to the last slide, there are a few things we're not trying to do right now: non-directions. It's important, when you're setting out to build something, to know what your goals are, and, if you can, to identify the goals that you're not going to worry about. The biggest one for us: we're not going to share the machine code that gets compiled from your bytecodes. I mean, that's the first thing that other shared libraries, the traditional C libraries, share. But in Hotspot's case, you've got to remember what Hotspot's about. Hotspot is about compiling the methods that you're actually using. Not only compiling them, but inlining the methods they call, so that you get one long run of code that is really hot, because it has everything it needs to get its job done. That tight code is really good for you. And when we've measured how much code we compile, it has never exceeded two megabytes. Running applications like JBuilder with Hotspot, we never end up compiling more than about two megabytes of code. That code is not worth sharing; it's the stuff that's hot for this particular run. Because every time you run an app, of course, you get different hot spots, right? You shift into this area, and it needs to do that, and then you shift into another area. It's all based on the work the program is doing.
So the idea with Hotspot is that it optimizes what your program is doing right now, and if we tried to share that, we wouldn't do as good a job. So we're not going to share the compiled machine code that we've built on your behalf. The other reason is that it's kind of hard, right? Because if you do try to share it, it has to carry relocation data, and so rather than folding in a branch to a direct address, you have to fold in an indirect one. It just gets messy; it's not very good. There's only one place where sharing compiled bytecodes might make a difference, and that might be, say, the static initializers, the code that you actually run to get up and running. If the interpreter is a dominant cost in getting a program started, it might be better to start out with precompiled code: not as good as what Hotspot would produce normally, but compiled better than the interpreter. But that's sort of a pre-compilation story, and I wouldn't characterize it the same way. So we might look at that.
Another thing we just decided at the outset was we are not going to try to share-- have any kind of shared buffer, shared read/write buffer of loaded class information, a shared read/write buffer of compiled code information, a shared read/write buffer of anything. Because you know what happens when you have a shared read/write buffer of something? Some other app can make you crash. We do not want that to happen. So that's just not a design point we're going to provide.
The status of the shared generation: this code I'm talking about is in Mac OS X. We shipped it on March 24th, so you're getting it already if you're using Java. As for Merlin: we talked to Sun about a year ago and said, you guys really ought to do something about sharing, because that's what scalability means for the client. And they said, well, the way you do this is you file a little, oh, I can't remember the term, not a JSR.
You put a feature request into Merlin through the open community process. So we sponsored one of those, and it's a feature request in Merlin, which is their code name for JDK 1.4. More than that, we've talked with these folks; the VM teams know each other. We said, have you thought about doing this? What about that? So we've worked with them, testing our designs out with them as we developed this thing, and we've provided the code back to Sun so that they can use it for their implementation of that little feature request. For current status, well, you'd have to talk to Larry about that. We've had very positive interactions with Sun on this work, and it comes about in two ways. With WebObjects stress testing, for example, they ran into some bugs, and we chased them down and went, hmm, this is a bug in what we call the portable code. So we call up our friends across the street and say, did you know about this? They go, hmm, no, we didn't. So we're actually feeding bug fixes through those indirect channels and making Hotspot, as shipped by Sun, better for everybody. And of course they've given us feedback on approaches to take when we ran into problems. So the feedback goes both ways. I'd like to spend the last section of this talk, before Q&A, on what's in Developer Preview 1, which you're going to be getting either today, tomorrow, or the next day, before Friday.
The JVM in Java DP1 has basically two fixes since we shipped it in Mac OS X, and I just talked about them, actually. The WebObjects stress testing showed us two things once they started kicking it off, and we've since upped our mean time to failure to at least days; I'm not sure exactly, we don't know of a failure right now. But it used to be measured in hours: after about 48 hours of continuous hammering, a bug would show up. That's the bug I alluded to that we figured out with Sun's help. The other thing that was not quite right in Mac OS X GM was that debugging was really slow.
Painfully slow. And profiling didn't work at all, which is kind of bad. So in DP1 we fixed both problems: we fixed profiling, and we fixed the speed of debugging. The way we did it was to take Hotspot 2.0 from the 1.3.1 technology train and package it as an extra VM, sitting somewhere in that little implementation space I told you about. So there are actually two Hotspots in DP1: the one that's configured for normal use, and the one that is quietly used whenever you do debugging or profiling. Now, why would we do that? Where did we get that VM from? Well, obviously we're working on 1.3.1, right? So we wanted to get 1.3.1 out to you in some way, especially for debugging and profiling, because we think those are really important. The benefit of Hotspot 2.0 is, again, the client compiler technology from Sun. They also have a server compiler.
Debugging, as I said, is now fast. Profiling works, whereas before it didn't work at all. Hotspot 2.0 is, you know, the next generation of the next-generation stuff. It has register allocator technology in there that we can use right away, and we do, so we get better register allocation when we're compiling. And it foreshadows the compiler they're working on for 1.4, which does even better code gen.
So we're prepping ourselves to get on board with the 1.4 work. But we didn't stop there. I mean, this is Apple, right? I want you to come to expect more from us than just what you can read on the web pages at Sun. So what we've done since we shipped Mac OS X GM is put some smarts in to recognize when you're on a G4. Now, what could you do differently on a G4? Well, a G4 comes with this thing called the Velocity Engine. And what's the Velocity Engine? You're supposed to do graphics with that, right?
Well, it's a special processing unit for doing high-speed pipelined graphics operations. And to do pipelined graphics operations at high speed, you've got to read memory like mad off the bus. Well, if you can read memory like mad off the bus, you can use it for simple things like copying memory, can't you?
So we put in a copy-memory implementation that, on G4s, uses AltiVec, and it is dramatically faster than any kind of C loop or assembly loop you could write for PowerPC yourself. So we have that, and it's in our 1.3.1 version of Hotspot, Hotspot 2.0. We also put in an optimized instanceof.
I mean, this is just an example of lots of little things we do for you that you'll never hear about. All you'll ever see is that your runtime improves. But when you do instanceof, how would you typically do it? Well, if it's not this class, I've got to look at the parent class, then its parent class, and so on. Well, we put a table in there such that it's a constant-time operation. instanceof just works, and it's fast.
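A rough sketch of how a constant-time subtype check can work (this is illustrative Java, not Hotspot's actual implementation, which is C++ inside the VM): each type carries an array of its ancestors indexed by depth, so the check is one bounds test plus one array load instead of a walk up the superclass chain.

```java
import java.util.Arrays;

// Toy model of a constant-time "instanceof": ancestors[d] holds this
// type's ancestor at depth d, so no loop over superclasses is needed.
final class Type {
    final String name;
    final int depth;        // distance from the root type
    final Type[] ancestors; // ancestors[depth] == this

    Type(String name, Type parent) {
        this.name = name;
        this.depth = (parent == null) ? 0 : parent.depth + 1;
        this.ancestors = (parent == null)
                ? new Type[1]
                : Arrays.copyOf(parent.ancestors, depth + 1);
        this.ancestors[depth] = this;
    }

    // One bounds check and one array load, regardless of hierarchy depth.
    boolean isSubtypeOf(Type s) {
        return s.depth < ancestors.length && ancestors[s.depth] == s;
    }
}

public class InstanceOfDemo {
    public static void main(String[] args) {
        Type object = new Type("Object", null);
        Type component = new Type("Component", object);
        Type button = new Type("Button", component);
        System.out.println(button.isSubtypeOf(object)); // true
        System.out.println(object.isSubtypeOf(button)); // false
    }
}
```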
We put in even better register allocation than what came from 1.3.1, which still doesn't deal with floating-point registers very well. So now we have a better floating-point register allocation method for when you're doing those graphics operations. Sharing is not in this little debugging-and-profiling-only VM. We actually know how to make startup times even faster, but since that's part of sharing, it's not in there yet either. And 1.3.1 also has a technology known as per-thread allocation pools. Remember that very fast Eden technology I talked about, where you bump the pointer, but the reservation of the memory was a compare-and-swap instruction?
Well, with a per-thread allocation pool, you don't even need the compare-and-swap. So it really is just about three instructions to allocate memory, instead of a stall-the-processor, check-with-the-other-CPUs-next-to-you, make-sure-they're-not-using-this-memory kind of instruction. So allocation is going to be really, really fast. So... I put this up here because we're starting this kind of beta train with DP1, and I want you to play with it. For casual use, run it from the command line like this: you say java -hs131. The -hs131 flag will run this new version of Hotspot on any program you throw at it. If you really like it, try using it all the time.
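To make the per-thread allocation pool idea from a moment ago concrete, here's a toy Java sketch (the real thing lives in the VM's native code; the chunk size and names here are invented). A shared Eden needs a compare-and-swap per allocation so two CPUs can't hand out the same memory; a per-thread pool reserves a whole chunk with one CAS and then serves each allocation with a plain pointer bump.

```java
import java.util.concurrent.atomic.AtomicLong;

// Slow path: the shared Eden. Every allocation does a CAS so that
// concurrent threads never receive overlapping addresses.
final class Eden {
    private final AtomicLong top = new AtomicLong(0);
    private final long limit;

    Eden(long limit) { this.limit = limit; }

    long allocate(long size) {
        while (true) {
            long old = top.get();
            if (old + size > limit) return -1; // a real VM would GC here
            if (top.compareAndSet(old, old + size)) return old;
        }
    }
}

// Fast path: a thread-private pool. One CAS refills a whole chunk;
// allocations inside the chunk are just a plain pointer bump.
final class ThreadLocalPool {
    private static final long CHUNK = 4096;
    private long cur = -1, end = -1;

    long allocate(Eden eden, long size) {
        if (cur < 0 || cur + size > end) {    // pool empty or exhausted:
            long base = eden.allocate(CHUNK); // one CAS reserves a chunk
            if (base < 0) return -1;
            cur = base;
            end = base + CHUNK;
        }
        long addr = cur; // no CAS: only this thread touches cur
        cur += size;
        return addr;
    }
}
```

In a real VM each thread owns one pool, so the fast path is roughly the "about three instructions" the talk mentions: compare, add, store.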
There's a symlink, and I'll let you go explore, under the JavaVM framework that points to the version of Hotspot that actually gets used: the libjvm.dylib symlink. You'll find an hs131 dylib somewhere; repoint the libjvm.dylib symlink at it, and you'll get Hotspot 1.3.1 all the time. Tell us about it.
So, when is this thing going to be available? I wish I knew. It's real soon now. To get it, you sign up at developer.apple.com; you're all developers, you're here, right? Okay. Then you go to connect.apple.com and download it. When you download it, what does it do? It preserves your existing 1.3 implementation; 1.3 is a subdirectory under the Java frameworks, so it pushes that aside in case you don't like what you got. It preserves what we find under Java Home that we think you've augmented, specifically the stuff in lib, including your extensions. Any third-party stuff, even QuickTime, is in there, right? Stuff we ship gets packaged up in extensions, so we preserve everything in extensions, because we actually put some other stuff in there. And we preserve everything we find in Java Home's bin. That's the main motivation for that first set of slides telling you which stuff we consider our implementation and which stuff we think you should extend. Because we do need to upgrade, you want us to upgrade, and we've got to agree on some rules as to the stuff we can upgrade and the stuff that we shouldn't.
So there is a mailing list, javadev, that you can get on. Go to, as I said, the page where I pulled the JNI and JDirect examples from, developer.apple.com/java; there's a section there about the javadev mailing list. Sign up. Members of the extended Java team read it and respond to it. We've found it very useful, and we appreciate your comments there. So, a quick roadmap.
The first one: wrapping Mac OS APIs as beans. If you went to Steve Naroff's session, you saw Steve Llewellyn. Steve Llewellyn, I'm proud to say, works for me. I've empowered him to go do great stuff making more Java happen at Apple, and he came up with some great APIs. What you saw there are APIs, they're beans; you can use them inside JBuilder to add that kind of technology to your apps.
So find out all about it at session 502; that's today at 5 o'clock. Java development tools, which Steve Naroff talked about, is tomorrow at 10:30. Java performance: performance is critical to us, so we have a whole session on how you can add performance to your programs, how you can measure it, things to avoid, things to do. Part of the Java development tools talk is the OptimizeIt demonstration and JBuilder and Project Builder debugging. And I put the JBuilder reference up here as well, because JBuilder is just an awesome tool for building your Java applications. That's about it.
Ah, how about that? There's the feedback forum as well, on Friday at 10:30; that should have been on the first slide. So please come tell us what you like, what you don't like, and give us suggestions for what you'd like to see done even better. Alan Samuel is the contact; he was the guy who introduced Steve Naroff. Find him as Blucher1 at Apple.com.