Java Web Services - WWDC 2002

Java • 54:59

This session focuses on the web service available in Mac OS X with standard Java extensions and open source Java software. Learn how to use these technologies to get to web services rather than publish them. Topics include UDDI, SOAP Axis, and XML parser libraries from Apache.

Speakers: Blaine Garst, Greg Parker

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good afternoon. We're going to hear about the Java Virtual Machine. I am trying to gear this talk a little bit towards those who may never have seen Java on Apple and those of you sort of in the middle who have seen a little bit of Java, particularly on Apple.

And then some of the talk should be for those of you who have seen Java on Apple for quite a while now and just want to know what's really new and what's interesting and what's different. So as a little bit of a gauge, I want to know whether anybody in this audience, other than the people who work for me, actually saw my talk last year.

Okay, so, all right. A few. Well, you're not going to hear last year's talk, although I did steal one slide from it. But today, this year, what we're going to talk about is an overview of what the VM is all about, what it does, what it does well. We're going to talk a little bit about where we've gone from last year at this time.

[Transcript missing]

First assertion. So what does a Java VM do? A Java VM basically does four things.

First of all, it executes bytecodes. You write your Java code once, right, and it gets turned into .class files which get sucked into jar files. And in theory, you're supposed to be able to ship those jar files anywhere on the planet on any machine. And they're supposed to run and they're supposed to do the same thing. So in order for those jar files to do the same thing, you need an engine to actually interpret those bytecodes and do it in exactly the same way on that machine.

So that's mostly what the virtual machine is aimed at doing, is implementing those bytecodes. That's what everybody thinks about the first time when they think about what a Java VM does. The other things it has to do, though, is it has to manage its own threads, you know, the threads on the system, because part of Java is lots of threads, right, synchronization and all those kinds of things.

VM has to deal with memory management. Those of you coming from a C or C++ environment know all about what I'm talking about here. We take care of your garbage for you. We spend our mental cycles figuring out how to do that so that you don't have to spend cycles on that and you can spend your cycles thinking about the problem you're trying to solve.

The last thing I consider a VM, in a broad sense, of having responsibility for is going outside the machine into, back into C code for one reason or another. And both going out into C, but also being able to be called into from C at times. So a VM manages these four things.

It has to do it well, or we want it to do it well, so that your code runs and runs very efficiently on the system that we give you with Mac OS X. So the real question, or the thing that I want to try to get across in this part of the talk is I want to give you a measure a little bit about how well we do Java, the Java virtual machine on Mac OS X.

A few weeks ago, we had JavaOne, which is where you learn all about Java. And I was kind of surprised to see that Java Pro had picked us as one of their finalists for the best Java VM. I thought that was pretty neat. And so -- We were finalists.

We didn't actually walk off with the total reward, but that got me to thinking, and that led to this slide, in fact. We are, by perhaps some people's opinion, you know, the best of breed for Java VMs. I put a question mark up there, not because I doubt it, because I know, but it's really for you guys to decide, right?

So I'd like to make this assertion, see whether or not I can, you know, this hypothesis or whatever, and I'll let you guys figure out, and you can come to your own conclusions as to whether this is true. So I'm going to be talking about these four things, or these four areas, and in fact, these are going to support those four theses that I think that a Java VM has to do altogether. So first of all, advanced language integration.

We have, in addition to the JNI that you all know and love from any other platform, we have two other systems that allow you to not even think about JNI because we do the JNI thinking for you. The first of those is JDirect. JDirect allows you to, from Java, access C function calls. Well, C++ if you could really get the naming right, but for the most part, C libraries and whatnot.

So you can write a little bit of Java code that basically says, open a file. And you'll get back an integer that's your file descriptor. And you can write data to it. And you can get, well, you can't get to errno, but you can get the read result back from that. And the fun thing about that is you don't have to write any JNI code to get to that. And that's because we do that for you with a facility we call JDirect.

It writes the stubs on the fly using the type information that you provide in your declaration of a Java native method. And we build the JNI stubs for you on the fly. A different category of language integration we provide is that from the Java Bridge.

It turns out that... If you went to the Java overview, you heard James Gosling say that, you know, there were a few guys from NeXT who went over to this thing called FirstPerson and they took some of their ideas along. Well, it turns out that a lot of those ideas got reflected in Java.

So the Objective-C runtime model, the object model and runtime model are almost identical to Java. So we've been working with this technology for a long time. And what that allows us to do is essentially map one for one the Cocoa classes and methods into Java counterparts. And we use the Java Bridge technology to write what for us are those JNI wrappers in a very neat way such that you can actually subclass. So all of Cocoa comes up with a Java face. So there's Java, there's an NSApplication, there's an NSView, and you can subclass it in Java.

And, you know, when you do super, the method goes across the bridge into the native Objective-C stuff and things happen. So you can subclass NSView from Java and we make use of Cocoa underneath. Pretty neat trick. That's the kind of language integration that we think is a big value add.

This is nothing new for us. We've had both of these technologies for several years now and they both shipped last year with Mac OS X 10.0. I won't go into them anymore. There's release notes and our email addresses and all the discussion lists and stuff that can talk about this. Let me talk about the Java virtual machine itself. We did superb language integration. The next thing is the virtual machine itself.

We use the Hotspot virtual machine from Sun. Sun provides us with this huge pile of code, 700 files of C++, tightly designed, well abstracted. It's our job to make it run on our PowerPC architecture on our Darwin Mach BSD system underneath. The neat thing about the Hotspot virtual machine technology is that it is years, has been years in refinement.

We get most of the benefits of that refinement without us having to add all that much innovation in terms of how it decides when to do a bytecode, when to do this, when to do that. We get it up and running on our Mac OS X system and do the small job of the PowerPC part, which turns out to be 70% of the work.

In any case, it's tuned such that only 10% of your time is really running on the Hotspot virtual machine. The next thing we want to talk about is the interpreter. The interpreter, as I say, is on the fly built, which means that we can take advantage. Every time you start up Java, we create a new interpreter for you.

We make that interpreter different based on whether you're on a G3 or a G4 or whether you're on a single processor or on a multi-processor. So that interpreter can be really tuned. The other one that we see a lot is whether you're running debug mode or not. If you're running debug mode, the interpreter for every little, you know, bytecode has an extra branch in there to find out whether or not you're being--whether you have a break point or other things going on.

These--this is very advanced technology even for the interpreter. So it has to run fast but the real bulk of your code, the bulk of time spent in your code is done in the compiled stuff, where we take those bytecodes and we on the fly compile them, hopefully without any pause times while you're running your app, visible pause times that is, into a fairly small amount of code relative to the size of your application.

We've really only measured about two megabytes worth of compilation time. So you can compile code at any time. That is just really awesome stuff. I'll talk a little bit more about compiler stuff later in this talk and then on Friday there'll be another talk where you'll learn--you can learn even more about--and about why you need to know a little bit about it.

The other thing about Hotspot-- It has a patented low cost synchronization mechanism such that if you have synchronized methods, there's virtually no cost to using them. So you use them liberally. Use them freely. If there's no two threads contending for it, it's virtually costless. We make it as cheap as we can for the CPU you're on. For the PowerPC, that is darn cheap.

There's only a traditional expense of a big, heavy, go to the kernel and wait kind of stop when there's actual contention on things. So the finer grained your locks are, the less likely you're going to get contended. Part of the patent has to do with the fact that there's no memory overhead for the synchronization either. You can synchronize on any object within Java, but we don't have an extra lock word for that.
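A minimal sketch of the usage pattern being described, in standard Java. The class name `SyncCounter` and its methods are illustrative assumptions, not anything shown in the talk. With one thread the lock is never contended; with several threads on one counter it is, which is the case where the heavier kernel wait can occur.

```java
public class SyncCounter {
    private long count = 0;

    // Locks on 'this'. With a single caller the lock is never contended.
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }

    // Starts 'threads' workers that each call increment() 'perThread' times,
    // waits for all of them, and returns the final count.
    public static long run(int threads, int perThread) {
        final SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1, 100000)); // uncontended case
        System.out.println(run(4, 100000)); // contended case
    }
}
```

Giving each thread its own counter object, rather than sharing one, is the "finer grained locks" advice in practice: the locks then never contend.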

We build the lock word on the fly, on the stack. It's very neat. And of course, Hotspot brings with it a state of the art generational GC. Last year I talked about that in a fair amount of detail. This year I'll talk a little somewhat less about it, but give you some of the high points.

Tight OS integration. We know our OS. Hotspot comes, I mean, it's like a hand in a glove. Hotspot on top of Mac OS X is just so tight, it's beautiful. The threads are one to one, files are one to one, I/O is one to one. It is like the Java virtual machine is just a thin layer above the operating system.

It's really neat because that way the operating system guys get to do the scheduling stuff. The operating system guys get to do the I/O buffering. The operating system guys get to do all the stuff that they do well. And, you know, from the VM viewpoint, we just hand that work off to them.

We like that part a lot. We use, for example, we have some tricks. We know the kernel guys. We ask them for favors. So in certain places where on other VMs you would see, you know, code expansion to do, you know, like null pointer exceptions, that's what NPE is there. We actually put a little trap word in place--well, we actually take a fault. We take a memory access fault from the kernel.

It's turned into a Mach exception. We intercept it. We say, "Hmm, this is an area that we know about. That was a null pointer exception." In other places we insert actual trap instructions and take those, fault those, handle those, and deal with that. What that gives us, again, is gives you very fast code paths and you only pay for them when you use them. So the basic advice here as you've read in any Java program is don't program using exceptions. And it's because they're very expensive and that's the design point.
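To make the advice concrete, here is a small sketch contrasting the two styles. The class and method names are hypothetical. Both methods return the same result; the difference is that the first one takes the expensive exception path every time it sees a null, while the second never raises an exception at all.

```java
public class NullHandling {
    // Discouraged: every null input goes through the costly exception path.
    public static int lengthViaException(String s) {
        try {
            return s.length();
        } catch (NullPointerException e) {
            return 0;
        }
    }

    // Preferred: an explicit test keeps the common path exception-free.
    public static int lengthViaCheck(String s) {
        return (s == null) ? 0 : s.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthViaException(null) == lengthViaCheck(null));
        System.out.println(lengthViaException("abc") == lengthViaCheck("abc"));
    }
}
```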

The last thing we do is a little thing, but one of the designers of Mach was Avie Tevanian, who you may have heard of or seen before. And one of the things he did with Mach is give it a great virtual memory subsystem. So part of the virtual memory subsystem is the idea that when our garbage collector is done with a whole chunk of memory and we no longer want it, it's in our address space and typically it would get swapped out to the disk once we stopped using it. But what we do is we tell the OS, "Hey, we don't need those pages anymore. We want to keep the addresses, you know, reserved. But we don't need the dirty bits behind it." And so the OS just kind of starts using those pages right away.

The next time we start faulting it, it gives us zero-filled pages. So it's just little tricks like that. We try to, you know, use them wherever we can so that you guys, who program in Java, just don't have to worry about any of these little nasty little details that make a lot of difference.

CPU utilization. Not only do we know our OS, we know our CPUs. As I said before, the code gen that our compiler does as well as our interpreter varies based on whether you're on a multiprocessor system or on a single processor system, whether you're on a G3 or whether you're on a G4.

What could be better? You don't have to pick your processor type. You know, as a compiler group, we don't have to pick something that works everywhere even though--and best on one but, you know, not as great on another kind of thing. We can do the best job for you at runtime, and we try to.

For example, last year we introduced the use of the Velocity Engine, which is on G4 chips, for fast copying. The Velocity Engine can soak the memory bandwidth coming out of the chip like nothing else can. A tight loop cannot drive memory faster than the Velocity Engine. We use that to copy bits.

It was so good, in fact, that we've handed that code off and it's now at the base of Mac OS X and is used for -- is it memcpy or bcopy? It's memcpy. It's the base for memcpy in Jaguar. Some of our ideas are done so well that they're even getting picked up elsewhere within our system. On the G4, the G4 has a lot of registers.

So when we compile things, we don't have to follow the ABI that was laid down, you know, how many years ago when they designed the PowerPC chip. We have our own ABI, you know, where registers get, you know, where the parameters go, where locals go. And so we actually dedicate the use of a few registers for things that make interpreting and compiling Java and running Java even faster.

One of the other uses for registers is cache locals, of course. This is a standard compiler technique. If you have a lot of registers, you just dedicate or you get to use some of them for local variables and whatnot. We've introduced that in our hotspot technology in the 10.1 timeframe.

We, another little nifty-ism, the CPU has, the CPU chip has sort of a countdown timer on it or actually a continually clocking timer. And so we gauge and make use of that for use when we do get time millis. So when you need to do timing of your stuff, we don't have to go to the system to make use of that.

We can interpolate from the system and use of the clock register directly such that our get time millis is bloody fast. We were actually talking in the hallway earlier this afternoon about not even having to go to a subroutine for that, but actually inlining it every time you use it inside your compiled code. No promises on that one, but it's an idea for us to explore.
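As a sketch of the kind of timing code that benefits from a cheap clock read, assuming the call the speaker means is the standard `System.currentTimeMillis()`. The `ElapsedTimer` class and its `timeMillis` helper are illustrative names only.

```java
public class ElapsedTimer {
    // Returns the wall-clock milliseconds spent running 'work'.
    public static long timeMillis(Runnable work) {
        long start = System.currentTimeMillis();
        work.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                sum += i;
            }
            // Use the result so the loop is not trivially dead code.
            if (sum < 0) {
                System.out.println("unexpected overflow");
            }
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```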

Let me talk about this. So the thesis is we do a really good job with the compiler, with the OS, with the CPU. What better way to show it than to cook up a micro benchmark. A micro benchmark, most people call them benchmarks.

[Transcript missing]

Now, the reason I like this benchmark in particular is that it's multi threads, there's contention, there's that garbage collection going on, and then you got the effect of the compiler coming in when you run it enough so you're actually compiling this code. So it shows all those four things I think a VM has to do well. We have to run it on a dual processor machine, of course. And the famous note about micro benchmarks, it's a contrived usage pattern.

Your mileage will be way less. Don't look at this as being anything representative of speed advantages you're going to see in your program. But let's see what happens with this. So between C, C++, and Objective-C, let me explain the slide a little bit. The vertical column measures peak allocations, cumulative, I mean total allocations per second.
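The benchmark itself is not preserved in the transcript, so the following is only a guess at its general shape: several threads allocating small short-lived objects, with the total allocation rate reported per thread count. The class name `AllocBench`, the 64-byte object size, and the iteration counts are all assumptions made for illustration.

```java
public class AllocBench {
    // Runs 'threads' workers that each allocate 'perThread' small arrays.
    // Returns the total number of allocations performed.
    public static long run(int threads, int perThread) {
        final long[] counts = new long[threads];
        // Storing each array here keeps the allocation from being optimized away.
        final Object[] last = new Object[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    last[id] = new byte[64]; // the previous array becomes garbage
                    counts[id]++;
                }
            });
            workers[i].start();
        }
        long total = 0;
        for (int i = 0; i < threads; i++) {
            try {
                workers[i].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            total += counts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        for (int threads = 1; threads <= 4; threads *= 2) {
            long start = System.nanoTime();
            long total = run(threads, 1_000_000);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d thread(s): %.0f allocations/sec%n",
                    threads, total / seconds);
        }
    }
}
```

As the speaker warns, numbers from a loop like this say little about a real application; they mostly show how the allocator behaves under contention.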

[Transcript missing]

When we compare Java to this, however, we have to change the scale on the graph.

The, what we provided to you in 10.1 for a single threaded was 8,000 allocations a second as opposed to 700. So that's over 10 times faster. Note that as soon as we go to a multi-threaded case though, Java's allocation rate drops significantly, more than half. But even so, we get 3,000 allocations per second in the multi-threaded contended case.

And you see that it's basically linear. No matter how many threads you get, the, you know, the total number of allocations really doesn't drop off. That's really what you want to see. For We've done a little bit better. In fact, we've doubled the contended allocation rate. And I'll tell you a little bit about how we did that as we go on. Memory utilization.

Pre-Mac OS X 10.0, and all other Java systems today, have the following issue. JAR files are not shared libraries. The classes.jar that contains Swing is not a shared library. Every app has to read those system JARs, has to process those system JARs. They're compressed, remember. Has to build up data structures to make use of the data that's within them. And, you know, literally it keeps a copy of all those bytecodes in every application. In Mac OS X 10.0 last year, we introduced a technology known as the shared generation to help resolve some of these issues.

The Hotspot garbage collector that we get is a very sophisticated garbage collector. What it tries to do is spend as few cycles allocating an object as it can, because most objects die young. And if they're not dead, it spends as few cycles as possible remembering that they're alive. And so what they do is they provide essentially four stages, four different types of generations where in the first stage, the first generation, the fastest generation, it's essentially a pointer bump. Now, in Mac OS X 10.0 and 10.1, there was a little bit of a lock around that pointer bump.

And in Jaguar, actually in the Java update, there is no lock for now. And that's where we got our doubled performance. The other generations, two-space copy is where things live for a little while. And tenured is where your objects live when they're adults. Now, the permanent generation is kind of a hotspot implementation detail, in that all those byte codes and stuff that you get out of class files, that stuff lives, that metadata lives in the permanent generation.

The shared generation idea we introduced last year was that those metadata objects, those bytecodes, many of them don't even change and they essentially never die. So we added sort of an immortal generation. What we do is we pre-process the system jars once, reading in all those byte codes and strings and class tables and all that stuff.

And if you ever watch your Mac OS X machine, you might see this fleeting message, building Java shared archive. Well, that's actually when we cook it. We cook it when you boot your system. So we cook up all that data, we write it out to disk, and every other time you launch Java, we just mmap that region in, do a little bit of patch up, and run.

And so, guess what? None of that time is spent reading those IOs. We get to share some of that memory. It's just a total win. I'll talk a little bit more about it. So Apple's Hotspot Garbage Collector has that shared generation as a new technology at the bottom.

So this is a very busy graph. I won't go into it in detail. If you look at the middle graph, that purple area is the region we use for all those byte codes and things like that. And as I said before, we split it into a couple regions. One's totally read-only, totally shared. The other one is shared until it's written to. And then for dynamically discovered class files and stuff, you still have a bit of purple. So your code goes in the area, the purple area at the very beginning.

Now, the really neat thing about this

[Transcript missing]

and we spend zero cycles on them. I mean, how fewer cycles can you get, right? Zero is zero. It doesn't get better than that. And so we see that in terms of running time of your application. Our full GC times are lower when we're running the shared GC stuff.

We get a, there's some benefit, not as much as we'd like in terms of packing because of the way we arrange, you know, we can load methods that are actually used instead of the entire class and including the methods that you don't use. So there's some memory packing benefits as well. The coolest thing on here is that Apple's shared generation technology may well become Sun's.

[Transcript missing]

Timeline. Let me shift. Let me talk a little bit about, okay, so this is the end of the part where, you know, best of breed.

You've seen this stuff. You can, you know, sit back, think about it. Make your own decisions. Let me talk about a few of the things that we've done since we spoke to you last. In September, we shipped Java 1.3.1. 1.3.1 was a huge amount of work for us because it was a whole different generation of the Hotspot compiler. So we actually had to work long, hard hours to get 1.3.1 out to you, even though it was a .1 release. It provided practical debugging. 1.3.0 did not. 1.3.0 did not provide much of a working JVMPI, the profiling interface. 1.3.1 did.

We not only got all that stuff going, but we introduced the Velocity Engine use, we introduced cache locals, and we improved our shared generation technology above and beyond what we did in Cheetah 10.0. We've been working on 1.4, as you know, but that didn't stop us from doing some things and shipping them in our 1.3.1 update that went out just a couple months ago.

We extended the shared generation to include constant pools. If you know what class file formats are, you know what this is. But basically, we figured out a way to avoid even more garbage collection cycles by sharing even more data. And so those benefits are just there without you guys having to lift a finger.

We added something called thread local Eden. It's not on by default. You have to trigger it yourself in this update. And the way you trigger it is that little, that funny little command line argument. We believe we're going to be providing a handout at the, at our talk on Friday and we'll try to put some of those tricks on that handout.

If not, well, you'll know where to find me because my email is going to be up on the slide. The other thing we did was we had some complaints. People would bring scripts over, you know, jar files that just worked except the shell script would like die because they had -server on their, you know, Java invocation line. So we decided we ought to at least fix -server such that it didn't complain. But we did a little bit more than that and we actually started tuning it for what we consider to be server usage.

java -server is not, as it is on other systems, a different compiler. It is, at the moment, just a bit of tuning. We do realize, though, that servers run longer and we probably should do more in terms of compilation than we do right now. But java -server is just the start of where we're going to be going with that.

If you saw Steve's keynote, it appears that we're actually maybe going to start selling some, you know, server hardware as well, beyond the server software that we have. So we're going to see more and more interest in that as we go on. The thread local Eden, and let me back up a bit, that was, as I said before, the main mechanism by which we got that doubled allocation performance.

We have provided multiple JDK capability. This means... How we do this is pretty much the way we've done it before. We have one symlink in the system in the magic place that points to either a 1.3.1 tree of things or a 1.4 tree of things. The only difference that you should probably see is that the header files are actually now part of that tree. So there's a 1.3.1 jni.h and a 1.4 jni.h and they're in different places just as they should be.

We have some support for actually allowing you to bind an application to one or the other of these VMs. You should look for that in the release notes. We're not sure we're ever going to ship this this way, but we're at least thinking maybe we might. We'd like to get your input as to whether or not you see that as a requirement. Or whether you think that's nice to have or what you think about that.

The other thing we're doing for Jaguar is providing or making use of the two-level namespace feature that got introduced in 10.1. So the main impact to you folks is that if you build JNI libs, they no longer have to be bundles. They can be dylibs. So they can export symbols to other libraries and other frameworks as well. So a consequence of two-level namespace is that you really should never link directly against a particular version of Hotspot. You should only link against the JNI stub library that we have there. And then our launcher will pick the right VM for you to use.

A little thing, a nice thing. We are taking advantage of some rearrangements of memory that are going on within the rest of Jaguar. We help motivate some of that, such that in terms of the four gigabyte address space, Apple is no longer scattering things all through it. So we're trying to clean up some big chunk of it so that the net effect for you folks who buy 1.5 gigabytes of memory in your servers is you can use most of it. So I think, I don't think that's quite in place in the Jaguar that you see right now, but you should just know that we're working on that. So that will be there by the time Jaguar ships. We, you know, if only I had good sound effects.

EIEIO is an instruction on the CPU. And we figured out how to use it. It turns out that a combination of that instruction and one other is actually faster for doing synchronization than what we've been doing before. So, like I said, when we figure these things out, we can roll them in and your code gets better without you touching it.

So, if only I could, you know, bring in a McDonald's song in here. But anyway, we could all break out in chorus. The last thing we did, we updated to 1.3.1_03. Every now and then we get these little security updates and stuff like that. And so we roll them in, fold them in, and get them out as quick as we can. So those are the things you're going to see in Jaguar.

Oh yeah. Last year I mentioned that there was a third collector, the train collector, the incremental GC, and that we hadn't done really much testing at it. And I said, test it. If this works for you, ship with it. Well, somebody went off and actually did that, and they found out that it didn't work.

So again, if you have code coming in from other systems, what we've done is, unfortunately we haven't actually fixed it, but we've at least disabled it so that it can work. So -Xincgc doesn't eventually cause your program to crash. So that's an outstanding bug on our part. But you can use the option. It's just ignored.

[Transcript missing]

First of all, there's a whole talk. 1.4 is coming along with some technology that we're going to use to make running your programs better, and there's going to be stuff that you have to use to make your programs run better. You'll have to go to that talk to find out more of the detail. I'll give you two slides worth of an overview of what's going on in that talk.

If you want to make your program run faster, my best advice is to go buy a professional tool for that. We've worked with the folks at OptimizeIt and they have recently been acquired by JBuilder and that does really nice things. If you don't--you do all of these if you want. More the better, right? I believe HP has a public domain HP Prof program that interprets the standard output of Java stuff. I believe that's a free download, and it uses just the standard APIs as well.

Of course, JVM Profiling Interface. If you really know what you're doing, you can write your own, you know, event monitoring things and keep track of particular heavy, you know, items yourself. The built-in profilers are, you know, basically no different except we've actually tested them and made sure they work. And they come in three flavors.

There's basic CPU and monitor profiling with

[Transcript missing]

yet another more single thread profiling facility called -Xprof. The interesting thing about that one is it actually shows you when methods get compiled and when they're interpreted and stuff, and so you can use that for various diagnostics. So the details from all of these are really tedious to learn and understand and make use of, and that's why at the top of the slide before I said go buy a tool, because that integrates it, turns it into graphs, gives you a visual sense of what's going on. That's really my recommendation on this one. Beyond Jaguar. 1.4. 60% more stuff in 1.4 than what was in 1.3. There's just piles of stuff.

The guy who wrote, writes the Java in a Nutshell series, Flanagan, David Flanagan, I believe, has a list out on the web. That's the URL to it. It's his top ten features in Java 1.4. There's parsing and transforming of XML. There's a new preferences API. There's logging API. There's sockets, secure sockets. He had a real thing for linked hash map. I don't know why. I think it was because he needed a tenth thing.
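For a sense of why `LinkedHashMap` might make such a list, here is a short sketch of its defining behavior: iteration follows insertion order, which a plain `HashMap` does not promise. The class name `InsertionOrder` and the keys are illustrative, and the generics and diamond syntax are from later Java versions than the 1.4 release under discussion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InsertionOrder {
    // Returns the map's keys in the order the map iterates over them.
    public static List<String> keysInOrder() {
        Map<String, Integer> features = new LinkedHashMap<>();
        features.put("xml", 1);
        features.put("prefs", 2);
        features.put("logging", 3);
        return new ArrayList<>(features.keySet());
    }

    public static void main(String[] args) {
        // Keys come back in the order they were inserted.
        System.out.println(keysInOrder());
    }
}
```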

Read the article. Memory-mapped files, non-blocking I/O, regular expressions, and language assertions. These are all really neat things in 1.4, and they're all available in our preview. Top 10 things are there in our preview for... The preview contains the Hotspot Client Compiler. The Client Compiler is not done yet. It doesn't make any mistakes that we know of, but its performance is not...

[Transcript missing]

Almost all of those classes, all the classes you saw on the previous slide plus bunches of others we didn't have room for, those are in the classes.

The UI, the GUI stuff, the new classes for GUI are not there. What we've done instead is taken the 1.3.1 GUI classes and repackaged them. Now there's a separate copy of them. It's not like we're just shifting the VM underneath. So there's a separate copy of the UI classes. The reason this is important, I think, is that for you to take advantage of all those NIO facilities, to start writing your XML stuff and doing all that kind of stuff, it would be good if you could have an app that basically ran.

So you could take your existing 1.3 based app, start making use of all those other APIs that are now available to you, and when we're ready to show you the 1.4 UI classes, then you'll be three steps ahead. So that's the idea in terms of putting together this packaging.

You get it by going to developer.apple.com/java. There's a download section. Look for WWDC downloads. And actually I had a question. Has anybody actually gone and done this? Has anybody installed Jaguar and gone and put 1.4 up? We have one brave soul in the audience. Okay. You can ask him questions later. Use it. We have a little, a trivial little shell script that flips that one magic symlink between 1.3 and 1.4.

Compiler. Now this is the slide, one of the two slides that kind of is a preview of what you'll hear about in more detail later. There's a big feature in the 1.4 compiler called deoptimization. What this allows is for your compiled and running code to have really deep inlining so that this method calls that method, calls that method. We can just take all of those, roll them up into one big special purpose function and execute it. So we don't have to do all those subroutine call overheads.

In 1.3 we can only do that for final methods and for static methods. In 1.4 we do it for almost any method. The only problem we would get into is if somebody dynamically loaded a jar that had a class, a subclass of something that we aggressively inlined, that in fact re-implemented one of those methods, such that our compiled code isn't going to call out to that dynamically loaded class. So what do we do? We've got this code and it's running, it's compiled, and so what we do is we throw it away.

And we replace the call frames on the stack with calls into our interpreter, such that when you fall back down the stack you start executing in the interpreter for all of those methods. So if you're in the middle of this inlined code, you'll end up with, say, five frames of interpreter on your stack. So this gives us the ability to aggressively inline non-final methods.

How many of you saw the Grand Canyon fly-through demo? That was based on GL for Java. There's a part of that where they'd flip a switch and it would go to 1.3. And the truth of it is, it was really only using the I/O facilities from 1.3, because our compiler could not have compiled that innermost loop, and you would not have gotten that speed, even half the speed that you saw there, for the 1.3 part.
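The situation that triggers deoptimization can be sketched in a few lines of Java. This is only an illustration of the hazard described above, not the VM's actual mechanism: the class and method names (`InliningSketch`, `Shape`, `DoubledShape`, `sum`) are made up, and whether a particular JIT really inlines `scale()` and later discards the compiled code is up to the VM and not observable from the program's output.

```java
// Hypothetical names throughout; the point is the shape of the problem.
public class InliningSketch {

    static class Shape {
        // Non-final: a subclass that shows up later may override this.
        double scale() { return 1.0; }
    }

    static class DoubledShape extends Shape {
        @Override
        double scale() { return 2.0; }
    }

    // Hot loop. While Shape is the only implementation the VM has seen,
    // a JIT may choose to inline scale() here rather than emit a virtual call.
    static double sum(Shape s, int n) {
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += s.scale();
        }
        return total;
    }

    public static void main(String[] args) {
        // Phase 1: only Shape is in use, so aggressive inlining is safe.
        System.out.println(sum(new Shape(), 1_000_000));

        // Phase 2: DoubledShape is loaded on first use and overrides scale().
        // Any compiled code that assumed "scale() always returns 1.0" is now
        // wrong, so the VM must fall back to correct virtual dispatch.
        System.out.println(sum(new DoubledShape(), 1_000_000));
    }
}
```

Either way the program prints the correct totals; deoptimization is what lets the VM keep that guarantee while still inlining optimistically.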

So that compilation part's really important. The other thing that it allows is full-speed debugging, which means that your code runs and gets compiled, and when you hit a breakpoint, only the routines that need the breakpoint are interpreted. That is a lot faster than what you have now, where debugging has to be done wholly in interpreter mode. So we're really excited about this feature. Oh, I wish I could go back.

Let me try down. The second big thing is the low-level intermediate representation. Basically, inside the compiler there's another layer that maps better to our instruction streams, so we can do better optimizations at the bottom. You'll hear more about that, and why, at the Java performance talk on Friday.

There are three of those many APIs I want to talk about a little bit here. The first one is native buffers. What are native buffers all about?

You've got memory. You fill up memory in the regular heap with data of some kind, and to use it in Java you have to copy it into the Java heap. Once it's in the Java heap, it might migrate from one generation to another, so it gets copied yet again. This is a lot of copying.

So native buffers allow you to represent bits that are allocated out of the C heap with a Java object. The Java object is not a language array. It's like an NSArray: you have to use methods to get to the bytes and words and stuff within it. But we use a compiler trick such that whenever we compile code that uses those accessor methods, we compile in the access directly, because we recognize that magic class, so you get array-like performance. So this is another, almost kind of a language integration feature. And that gives us a really high-speed ability to map and do I/O and not waste memory for that. The new I/O package makes use of those native buffers for its I/O operations.

There's also support for IPv6 and support for non-blocking I/O. We put together an implementation of preferences. And in fact, to show you and talk a little bit about that stuff, I thought we'd have our own little demo. So at this time, I'd like to bring up Greg Parker, who actually implemented most of that stuff.
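In the `java.nio` API, the native buffers described above correspond to direct `ByteBuffer`s. The following is a minimal sketch of the accessor-method style the talk mentions, assuming nothing beyond the standard library; the class name `DirectBufferSketch` and the helper `fillAndSum` are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferSketch {

    // Writes the ints 0..count-1 into memory allocated outside the Java heap,
    // then reads them back and returns their sum.
    static long fillAndSum(int count) {
        // allocateDirect asks for a buffer backed by native (C heap) memory.
        ByteBuffer buf = ByteBuffer.allocateDirect(count * 4);
        buf.order(ByteOrder.nativeOrder());

        // There is no array indexing: all access goes through methods.
        for (int i = 0; i < count; i++) {
            buf.putInt(i);
        }

        buf.flip(); // switch from writing to reading
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.getInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(1000));
    }
}
```

The `putInt` and `getInt` calls are the accessor methods that a compiler can treat specially to approach array-like speed.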

In the Grand Canyon demo you saw how to do memory mapping and the ability to represent objects out in the video card's memory. What Greg's going to talk about is network I/O and how we use this package to do faster networking. Take over.

All right, so the first thing I'm going to show you is a little bit of that JUs command that we told you about that allows you to switch between the two virtual machines. So this is Jaguar. This has the developer preview installed on it. And we can just say JUs 131. We have to type our password because this is mucking with stuff in the system, but now we have 1.3.1 enabled. java -version.

There's 1.3.1. I can do the same thing with 1.4. JUs 1.4. java -version. There it is.

So, what I'm going to show you now is a taste of what you can do with Java 1.4. I've written a simple streaming video server in Java. It doesn't really do much. It reads GIFs from a file, writes them to the network, and then we have a client on the other side that renders those GIFs. So what I'm going to do is run it. I have two versions of it, one of them written in 1.3.1, one of them written in 1.4.

So this one is going to run the 1.4 version. There it is, up and running. And over in this window, which you probably won't be able to see, I'm going to use JUs to switch over to 1.3.1, and I will run the 1.3.1 version of the server, with the 1.3.1 classes, without NIO, all that kind of stuff. So, let's switch over to the client.

This is the client program here. It's an Objective-C program. It has threads that read the GIFs off the network from the Java program on the other machine. One of those threads draws to the screen to display the animation, and the other threads just read the data and throw it away. So we can run a lot of threads, a lot of server connections, and try and stress test the server.

So let me show you the 1.3.1 server here on the left. This is simply using one thread per connection with blocking I/O. It's using file input streams to read the GIFs off the disk. And it's using simple sockets and output streams to write the data. So the standard Java mechanism. So we can run this. It runs pretty fast, actually. Java 1.3.1 is not that slow.

We can run several connections. These background connections are reading data very fast and throwing it away very fast. And we can see that on this connection, this is using a gigabit Ethernet network on Apple's hardware. Whoops. Oh, damn. Segfault, cool. I haven't seen that before. So that's probably an Objective-C fault. This is not a Java program.

So we see it getting about 170 megabits per second. This is a 1000 megabit per second wire, theoretically. So that's reasonable performance. The 1.4 version is using the new APIs available in 1.4. It's using a single thread to serve all the connections using non-blocking I/O. So that's going to be a lot less threading overhead. We don't have to do thread switches as often.
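The single-threaded, non-blocking design can be sketched with a `Selector`. This is not the demo server's code, which was not shown. It is a minimal sketch under stated assumptions: the names `SelectorServerSketch`, `serve`, and `demo` are hypothetical, each client simply receives one payload and is then disconnected, and the blocking client loop exists only to exercise the server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorServerSketch {

    // One thread serves every connection: sends `payload` once to each of
    // `expected` clients, then returns.
    static void serve(ServerSocketChannel server, byte[] payload, int expected)
            throws IOException {
        Selector selector = Selector.open();
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        int finished = 0;
        while (finished < expected) {
            selector.select(); // wait until some channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        // Per-connection state is just a buffer position.
                        client.register(selector, SelectionKey.OP_WRITE,
                                ByteBuffer.wrap(payload));
                    }
                } else if (key.isWritable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer pending = (ByteBuffer) key.attachment();
                    client.write(pending); // writes what the socket accepts now
                    if (!pending.hasRemaining()) {
                        client.close(); // closing also cancels the key
                        finished++;
                    }
                }
            }
        }
        selector.close();
    }

    // Test harness: runs the server on a loopback port and returns the total
    // number of bytes received by simple blocking clients.
    static int demo(int clients, int payloadSize) {
        try {
            ServerSocketChannel server = ServerSocketChannel.open();
            server.socket().bind(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            int port = server.socket().getLocalPort();
            Thread t = new Thread(() -> {
                try {
                    serve(server, new byte[payloadSize], clients);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            t.start();
            int total = 0;
            byte[] scratch = new byte[8192];
            for (int i = 0; i < clients; i++) {
                try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
                     InputStream in = s.getInputStream()) {
                    int n;
                    while ((n = in.read(scratch)) != -1) {
                        total += n;
                    }
                }
            }
            t.join();
            server.close();
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(3, 100_000) + " bytes received");
    }
}
```

Because a partially completed write just leaves the buffer's position where it stopped, the one thread can move on to other connections and come back when the socket is writable again.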

It's using the byte buffers, in particular memory mapped byte buffers, to read in the files. So we read in the files once and the data stays in the C heap so you don't have to copy it into Java space. Also, all the connections can be served using that same memory. We don't have to copy the memory for each connection.

And finally, we're using the NIO socket channel to actually send the data out across the wire. That also knows about the new byte buffer APIs, which means that we don't have to copy the data in Java, from when we read it from the disk, between the various Java arrays or objects, and then finally to the network. So when we run it, it's going to go straight from the file into memory into the network subsystem. So less memory copying, less memory overhead, no GC overhead for these blocks of memory.
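The read-once, share-everywhere pattern described here can be sketched with `FileChannel.map` and `ByteBuffer.duplicate`. Again this is an assumption-laden illustration, not the demo's source: `MappedSendSketch`, `map`, `send`, and `demo` are invented names, and an in-memory channel stands in for the `SocketChannel` a real server would write to. With that stand-in the bytes do get copied into a Java array, so the copy savings apply only when the destination is a real socket channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSendSketch {

    // Maps the whole file read-only. The mapping stays valid after the
    // channel used to create it is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Sends the shared mapping to one connection. duplicate() shares the
    // underlying bytes but gives this connection its own position and limit,
    // so many connections can be served from the same memory.
    static long send(ByteBuffer shared, WritableByteChannel out) throws IOException {
        ByteBuffer view = shared.duplicate();
        long written = 0;
        while (view.hasRemaining()) {
            written += out.write(view);
        }
        return written;
    }

    // Creates a temporary file of `size` bytes, maps it once, and sends it to
    // `connections` stand-in channels. Returns the total bytes written.
    static long demo(int size, int connections) {
        try {
            Path file = Files.createTempFile("frame", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[size]);
            MappedByteBuffer shared = map(file);
            long total = 0;
            for (int i = 0; i < connections; i++) {
                // Stand-in for a SocketChannel (assumption for this sketch).
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                total += send(shared, Channels.newChannel(sink));
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4096, 3) + " bytes sent");
    }
}
```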

So we can run it. One connection is still pretty fast. The animation is actually slowing us down here so it's just as fast as the 1.3.1 was. When we increase to two connections or more connections, it's running a lot faster. If you remember the previous one, it was about at 160, 170. In fact, it stopped there at 170.

The new version with a single thread and less memory copying is getting a significant speed increase in data rate that the server is managing to send out. So this is maybe 40, 50% faster, something like that. The other thing I want to show you on the right, this is a process listing on the server machine.

So this is the 1.4 Java server process. It's only one thread, so it's using 100% CPU on that thread. This is a dual-processor machine, so the other processor is busy in the kernel doing the networking. But it is not using all the CPU. You can't see it. It's off to the right. But that second CPU is significantly unused right now, which is good. So we're getting good data performance.

Objective-C is not that bad. Objective-C is nice. Oh, man. Objective-C is not perfect. So it's getting a better data rate with 100% of one CPU, the other CPU not fully used. If we look at the 1.3.1 version, it's getting a worse data rate, like we saw. Note its CPU usage: 166%. So it's using one CPU 100% of the time. It's using the other CPU 70% of the time.

And that last 30% is actually the kernel doing the actual network IO. So the 1.3.1 version is totally loading the system and still getting less data throughput than the 1.4 version does. So these 1.4 APIs allow us to get better IO rates with less CPU usage. That's pretty cool. We like that.

This is available in 1.4. This is using the 1.4 developer preview you have right now. We'd love to hear back from you whether this stuff works. It's still under development. It works pretty well except for the Objective-C side, but that's not our fault. We'd like to get your feedback on the NIO classes and also the other classes in 1.4: the Preferences API that shipped, the IPv6 that's available. Oh, by the way, the server can serve IPv6 connections with zero extra code.

I don't have a demo for that, but that's pretty nice. If you're using any of these things, we'd like to hear back from you, hear your feedback, your bug reports, how well it works for you so we can make sure that our final 1.4 release is as good as we can make it.

I don't have much more to say. Best of breed: that's for you to believe the assertion or not. As you can see, the system is getting better all the time and it's going to be even better in Jaguar. We want you to try out the 1.4 VM. There is an issue, though, about talking about it on Java dev. You're under NDA. 1.4 is only available to folks who have Jaguar, and we don't really want you talking about this stuff on Java dev, but we are interested in your feedback on it.

Still to come, obviously, the GUI for 1.4. Even better, shared generation stuff. The folks at Sun have got some interesting ideas on how to move the stuff even further. We hope to not only help them do it, but fold it right back in and ship it.

And compiler tuning, of course. As you have seen and will probably see again and hear about in more detail, the Grand Canyon fly-through makes use of both NIO as well as that greater compiler technology that is just beginning to come through for us. Who do... Oh, yeah, where do we go from here? Well, we actually have our feedback forum right after this. Most of our Java talks are done. We only have our Java performance talk left, so come give us some feedback right after this over in room J, isn't it? J1.

And then the Java performance talk on Friday at 9 a.m. I know it's a little early, but make it your first and best talk on Friday. Who to contact: that's where you can reach me. That's where you can reach Greg. Alan Samuel is our Java Technologies evangelist. I'm going to bring him up and let him lead some Q&A, I think.