Performance and Graphics Tuning Your Java Application - WWDC 2006

Information Technologies • 1:09:49

Maximizing speed and performance is important for any application. See how Apple's Java engineers use powerful Mac OS X profiling tools such as Shark and Sampler to identify performance issues and then make the necessary corrections, resulting in better application performance. Bring your laptop and work along with the Java team.

Speakers: Viktor Miladinov, Chris Campbell, Rick Altherr

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Welcome, everybody. Good morning. My name is Viktor Miladinov, and I'm one of the graphics engineers in the Java team at Apple. And today, we're going to talk about performance and graphics tuning your Java application. So this is an early session, and I kind of wanted to capture your attention.

And I wanted to come up with a catchy slogan. So I sat down and started thinking about a slogan, and for some odd reason, the only thing stuck in my head was this cheesy quote: "15-minute call to Geico can save you 15% or more on car insurance." I tried to shake it off, and I couldn't. So I decided to play along with this cheesy theme. So here's the catchy slogan for this session.

60 minutes of paying attention to this session can increase your Java application performance by 60% or more. Right. And in the worst case, you might just have to buy one of those new Mac Pros, and you're pretty much guaranteed 60% out of the box. But let's try to do it the hard way.

Old school. So how are we going to do that? We're going to structure this talk in three parts. In the first part, we're going to talk about the Java 2D graphics pipelines in Mac OS X; there are now several pipelines you can choose from. In the second part of the talk, we're going to talk about resolution independence and how it affects your Java application.

And last but not least, we're going to talk about Shark for Java. For those of you who don't know, Shark is a very cool profiling tool for Mac OS X, and it has Java support. And for those of you who do know about Shark, we have some new things to tell you about this year.

So this session is going to be pretty packed in terms of content: we have about 100 slides and three demos, so it's going to be fairly intense. However, I only want you to get a few key things out of this session.

So when this session is over, I would like you to be able to answer these questions: What are the different Java 2D graphics pipelines in Mac OS X? What are the pros and cons of each pipeline? What are the performance characteristics of each pipeline? How do they compare against Windows XP for some simple primitive drawing? And how can I take advantage of OpenGL hardware acceleration from Java?

And is Java going resolution independent? Like I said, this is going to be a pretty intense session, but these are some of the things I want you to pay attention to. However, the most important question I want you to be able to answer after this session is this: what is the best pipeline for my app?

So we have several pipelines, and choosing the right one is very, very important. You can actually get a 60%, 100%, or 200% improvement if you choose the right pipeline, so we're going to look at that. The second most important goal of this session is that you should all know how to use Shark to profile and optimize your Java app. It's very easy, and we're going to go over that.

Finally, I also wanted to remind you that we're here to help you. So if there is something that's not clear from this session, or you're having a performance issue with your Java application, come talk to us afterwards or during the labs. We'd love talking to you. So now we're going to actually dig into the meat of the first part of the talk, the Java 2D graphics pipelines.

Before we do that, let's talk a little bit about what we mean by graphics pipeline. For the purpose of this slide, a graphics pipeline is an orange rectangle. It has one input and one output. The input is a drawing command, for example Graphics.drawLine, and the output is the rasterized version of that command. So the graphics pipeline does most of the work for you: it does the clipping, the transformation, the rasterization, and so on.
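To make the input/output picture concrete, here is a minimal sketch (not code from the talk) that feeds one Graphics.drawLine command into an off-screen BufferedImage and prints the rasterized result as text. The class name DrawLineOffscreen is hypothetical, and which pipeline actually does the rasterizing depends on the platform and launch options.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class DrawLineOffscreen {
    // Input: one drawing command. Output: the rasterized pixels in an off-screen image.
    static BufferedImage renderLine(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.drawLine(0, height / 2, width - 1, height / 2); // horizontal line across the middle
        } finally {
            g.dispose(); // release the graphics context
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = renderLine(16, 8);
        for (int y = 0; y < img.getHeight(); y++) {
            StringBuilder row = new StringBuilder();
            for (int x = 0; x < img.getWidth(); x++) {
                // '#' for a lit pixel, '.' for the untouched black background
                row.append((img.getRGB(x, y) & 0xFFFFFF) != 0 ? '#' : '.');
            }
            System.out.println(row);
        }
    }
}
```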

[Transcript missing]

So now we all know what's a Java 2D graphics pipeline, and we're going to talk about the first graphics pipeline, which is the Quartz-based Java 2D pipeline. For those of you who don't know, Quartz is basically the renderer of choice for pretty much most apps in Mac OS X. Cocoa is written on top of Quartz, so Quartz does the heavy lifting for Cocoa. And we actually implemented the Java 2D pipeline on top of Quartz.

So this graph should look familiar to you. We have the Java 2D APIs, and at the end we have the Quartz rendering engine. In the middle we have the glue code that we, the Java team at Apple, wrote to connect the Java 2D APIs and the Quartz renderer. One thing to keep in mind from this slide: I'm going to use color to mean something. Blue means code written by Sun, and orange means code written by us at Apple.

This slide should be familiar to everybody. Let's see how the Quartz pipeline deals with drawing a line. It's actually very simple. On the Java end, there is a call, Graphics.drawLine. On the other end, the graphics pipeline spits out the Quartz command CGContextStrokeLineSegments, which is a native C call. So this is a very simple case: we have a clear one-to-one mapping from the Java side to the native side. I'm going through this for a reason, which we're going to see in the next slides.

To describe Quartz, I like to use the phrase "a veteran warrior." It's been with us since 1.3: we used it in 1.3, 1.4, 5.0, and 6. And it's been getting better and better with each release, and by better I mean fewer bugs and faster performance. So Quartz has served us well over the years.

However, there is a problem, and it's a rather big one. To describe this problem, I'm going to borrow a term from electrical engineering and call it impedance mismatch. Here's a graphical representation of what I mean by impedance mismatch: there's a certain subset of Java 2D APIs for which there is no clear mapping or support in Quartz. As we saw, with Graphics.drawLine there was a very easy one-to-one mapping between Java 2D and Quartz.

However, there is a certain set of APIs that Quartz does not support. But Java needs to run on Mac OS X, and it needs to support all the Java 2D APIs. So we, the Java team at Apple, had to put a lot of work-arounds in place to make those things work. So why is this important for you? Impedance mismatch is really bad: it results in poor performance and buggy software.

Now, let's look in detail at some cases of impedance mismatch. The number one culprit is the fact that Quartz does not support direct pixel access to its images. We all know that with buffered images you can call getRaster() and getDataBuffer() and basically muck with the pixels yourself. Well, Quartz doesn't support that, so we go to great lengths to make this work on Mac OS X.
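Here is a minimal sketch of the direct-pixel-access pattern described above, assuming a TYPE_INT_RGB image so the data buffer can be cast to DataBufferInt. The class and method names are hypothetical. Per the talk, this is exactly the kind of call sequence that hits the slow path in the Quartz pipeline, so it is shown to identify the pattern, not to recommend it.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

class DirectPixelAccess {
    // Builds a left-to-right red gradient by writing into the image's backing int[].
    static BufferedImage redGradient(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        // getRaster().getDataBuffer(): the direct-access call sequence discussed above
        int[] pixels = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int red = (x * 255) / Math.max(1, width - 1);
                // Freshly created image: scanline stride equals width, layout is 0x00RRGGBB
                pixels[y * width + x] = red << 16;
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = redGradient(4, 2);
        System.out.printf("leftmost=%08X rightmost=%08X%n", img.getRGB(0, 0), img.getRGB(3, 0));
    }
}
```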

And it's pretty painful. This is kind of an unofficial statistic, but I work on the graphics team, so I screen most of the bugs, and from my experience I would say that 50% or more of the bugs we get on Mac OS X are related to this particular issue. So this is a big issue for us, and it's hard to solve.

The second issue is another case of impedance mismatch, which I'm going to call the unfriendly image types. We all know buffered images have a dozen or so image types: ARGB, RGB, 3-byte BGR, and so on. Well, Quartz only supports a handful of them.

So when you use one of those unfriendly image types, like TYPE_USHORT_565_RGB, you're hitting the slow path on Mac OS X, even though you might not pay any penalty for it on other platforms. Another issue is XOR mode. Quartz does not support XOR, so we have to go to great lengths to support it.
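For readers who have not used XOR mode, here is a hedged sketch (hypothetical class XorModeDemo) using the standard Graphics.setXORMode call. It relies only on the documented property that XOR drawing is reversible: painting the same shape twice restores the original pixels. It makes no claim about how fast either pipeline executes it.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class XorModeDemo {
    // Fills a blue image, then XOR-fills a red square `passes` times and
    // returns the RGB value of a pixel inside the square.
    static int pixelAfterXorPasses(int passes) {
        BufferedImage img = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.BLUE);
            g.fillRect(0, 0, 8, 8);       // background, painted normally
            g.setXORMode(Color.WHITE);    // switch to XOR painting mode
            g.setColor(Color.RED);
            for (int i = 0; i < passes; i++) {
                g.fillRect(2, 2, 4, 4);   // each pass toggles the same pixels
            }
        } finally {
            g.dispose();
        }
        return img.getRGB(3, 3) & 0xFFFFFF;
    }

    public static void main(String[] args) {
        for (int passes = 0; passes <= 2; passes++) {
            System.out.printf("%d pass(es): %06X%n", passes, pixelAfterXorPasses(passes));
        }
    }
}
```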

And there are many more cases, but we're not going to go into them, because I just wanted to give you an idea of some examples of impedance mismatch. And again, the same slide as before: really bad. We're going to see why. Really poor performance and buggy software.

So if this is really bad, what have we done to solve it? We worked really hard on this issue, and with every Java release we've gotten better and better at it. The second thing we have been doing is educating developers. If you were at this session last year, or the year before, or three years ago, you probably heard us saying: on Mac OS X, try not to call raster.getDataBuffer(), and don't use setPixel/getPixel. It's going to make your application really slow.

Or you probably heard us saying: always call createCompatibleImage, and never use any of the other image types. By the way, that second one is a good suggestion nonetheless. And the last one: don't use XOR. This is not right. Java is a cross-platform API. You write your code on Windows, Linux, or whatever platform, and it should just work on Mac OS X. You shouldn't have to worry about "don't do this on a Mac" or "this is slow on a Mac," and you shouldn't have to special-case your code for the Mac.
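The createCompatibleImage advice can be sketched as a one-time conversion helper (hypothetical name CompatibleImages.toCompatible): copy an image of any type, such as TYPE_USHORT_565_RGB, into an image created by the default screen's GraphicsConfiguration, then draw the copy from then on. The TYPE_INT_ARGB fallback for headless environments is an assumption made so the sketch runs without a display, not a recommendation from the talk.

```java
import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;

class CompatibleImages {
    // One-time conversion: copy `src` into an image whose layout the screen prefers.
    static BufferedImage toCompatible(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst;
        if (GraphicsEnvironment.isHeadless()) {
            // Assumption for this sketch: with no display, fall back to a common type.
            dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        } else {
            GraphicsConfiguration gc = GraphicsEnvironment.getLocalGraphicsEnvironment()
                    .getDefaultScreenDevice().getDefaultConfiguration();
            dst = gc.createCompatibleImage(w, h, src.getColorModel().getTransparency());
        }
        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(src, 0, 0, null); // pay the conversion cost once, here
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage legacy = new BufferedImage(64, 64, BufferedImage.TYPE_USHORT_565_RGB);
        BufferedImage converted = toCompatible(legacy);
        System.out.println("type " + legacy.getType() + " -> type " + converted.getType());
    }
}
```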

So I'm happy to say that I think we've finally solved this problem, and your experience on the Mac when you call these APIs is not going to be degraded. The solution comes in the form of a completely different pipeline. We talked about the Quartz pipeline; now we're going to talk about a completely different one, which we call the Sun 2D pipeline.

So we sat down to think about how to solve this impedance mismatch problem, and we realized that Sun had already solved it. Sun wrote its own rendering engine in C, and since it's written by Sun, it maps pretty much directly onto the Java 2D APIs. So it supports the Java 2D APIs really well.

So we said, why don't we just port that code to Mac OS X? So after we ported it, that gave birth to this completely different pipeline. And for lack of a better term, we called it a Sun 2D pipeline, because most of the code is the software render that's written by Sun.

Two very important points about this pipeline. One, it should be bug-for-bug compatible with Windows, Linux, and Solaris. Should be. This is mainly in the off-screen case, because I think when Sun's implementation draws on screen on Windows or the other platforms, it might use GDI or DirectDraw and whatnot.

But if you draw off-screen, you should be bug-for-bug compatible. The second thing, and I think it's the most important one, is that there is no impedance mismatch. The Sun 2D pipeline is immune to everything we just talked about. So this is very important.

Now, let's look at a graph of the Sun 2D pipeline. What I'd like you to pay attention to here is the color of the boxes: blue means code written by Sun, orange means code written by us. Most of the code is actually written by Sun and just ported to Mac OS X with very, very few modifications.

We still have to write a little bit of glue code to let the Sun renderer talk to our native windows and so on, but it's very minimal. So it should be very similar to the code paths your app takes on other platforms. Now, here they are side by side. The big thing to pay attention to is the renderer at the end: the Quartz renderer in the Quartz pipeline, and the Sun 2D renderer in the Sun 2D pipeline.

This should make you happy and scared at the same time. It should make you happy because if you had a bunch of bugs in the Quartz renderer that were not solved, now you can try a completely different renderer, with completely different code paths, and all of your bugs might just disappear.

It should make you scared because if things are working fine in Quartz, then testing on a completely different pipeline should worry you a little bit. However, you can choose one or the other, so you shouldn't worry that much. So it should only make you happy.

Let's look at the history of the pipelines a little bit. Quartz has been the default renderer in 1.3, 1.4, and 5.0, and we actually introduced the Sun 2D pipeline in Java 5. It was optional and in a beta state, so we didn't make a big deal about it. Today we're going to focus on this.

We're going to make a big switch, and I want you to pay attention to this right now, because it's going to impact you. We're going to switch the default rendering pipeline in Java 6, from Quartz to Sun 2D.

And the main reasons we did that are the two I mentioned earlier. First, bug-for-bug compatibility with Windows. And second, you should hit none of these impedance mismatch problems. However, we still want your feedback. This is a decision we made for our beta and for Leopard, but we're not set on it. So please run your app with both pipelines and let us know what you think. If you think we're crazy for making this renderer the default, let us know. We'd like to know if we're crazy.

How do I make it run? How do I switch? One thing to keep in mind: you cannot switch at runtime, meaning after your app is running, so you have to make the decision before you launch your app. You can use this command-line option: -Dapple.awt.graphics.UseQuartz=true. That makes sure your app runs through the Quartz pipeline. With UseQuartz=false, you run through the Sun 2D pipeline. If you specify no option, on Java 6 you get Sun 2D, and on 5.0 you get Quartz. So this is important if you want to play with the pipelines.
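A small sketch of the launch-time switch. Because the pipeline cannot be changed after the app is running, the flag goes on the command line, for example java -Dapple.awt.graphics.UseQuartz=false -jar MyApp.jar (MyApp.jar is a placeholder). The hypothetical helper below only reports which pipeline the talk says a given setting selects; it does not switch anything itself, and its case-insensitive parsing is an assumption.

```java
class PipelineChoice {
    // Hypothetical helper: maps an apple.awt.graphics.UseQuartz value and a Java
    // major version to the pipeline the talk says Apple's Java would use.
    static String describe(String useQuartz, int javaMajorVersion) {
        if ("true".equalsIgnoreCase(useQuartz)) {
            return "Quartz";
        }
        if ("false".equalsIgnoreCase(useQuartz)) {
            return "Sun 2D";
        }
        // No option given: Java 6 defaults to Sun 2D, 5.0 and earlier to Quartz.
        return javaMajorVersion >= 6 ? "Sun 2D" : "Quartz";
    }

    public static void main(String[] args) {
        // Per the talk, the choice is fixed at launch; this only reports the flag.
        String value = System.getProperty("apple.awt.graphics.UseQuartz");
        System.out.println("UseQuartz=" + value + " -> " + describe(value, 6));
    }
}
```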

So remember the question that I said is very, very important? Now we're going to try to answer it. First I'm going to give you a lame answer: well, it depends. So I'm just waving my hands. Let's see what it depends on. What are the factors that might impact your decision?

Number one, bugs. Like I said, these are two different pipelines, and they have completely different sets of bugs. So you're probably going to pick the pipeline that has fewer critical bugs for you. I want to go on a tangent here and mention two other points.

Bugs. When you file bugs with Apple from now on, please specify which pipeline you used and whether the bug reproduces on the other pipeline. It will make our lives a lot easier. Also, say you have a critical bug in Quartz that's a showstopper for your app, and you try the Sun 2D pipeline and things just work great. So you start using the Sun 2D pipeline and continue happily with your life. What you should have done, though, is file that showstopper with us, the Java team at Apple, because we'd like to fix all bugs. So, bugs.

The second factor in your decision is the impedance mismatch penalty. This one is very simple: if you're hitting the impedance mismatch penalty, in any of the cases I outlined earlier, you're pretty much going to use the Sun 2D pipeline. You can trust me: Quartz is not there for you if you're hitting this penalty. We're going to see a little more of why in the performance part of the talk.

Two different renderers, two different pixel coverages. We're going to see a little bit of this on the next slide. and last but not least, it's gonna be very important, two different renderers, two different performance characteristics. We're gonna look at them too. So let's look at pixel coverage first.

So I drew a circle with the Sun 2D renderer and with the Quartz renderer. From far away, they might look identical to you, but if you zoom in on particular details, you'll see that they have different pixel coverage. So what does this mean for you? Which renderer should you use? If you're optimizing your app to run on Windows, the Sun 2D renderer is more likely to be a good choice for you.

Other than that, I cannot give you any advice. Basically, you just have to run your app and see whichever looks better. Like if you ask me which of this circle looks better, you know, I have no idea. I might say quartz, but other than that, I have no idea.

They also differ in anti-aliased pixel coverage. Most of you should know what anti-aliasing is, but a few people in the audience might be asking, what is anti-aliasing? I'm not going to go into digital signal processing to explain it; I'm going to keep it very simple with one definition. Aliasing is the effect that makes this line look jaggy, like a staircase.

Anti-aliasing is a technique in computer graphics that partially fills the nearby pixels to make this line look smooth. Here's how it looks once anti-aliasing is applied. You might not be able to see much on this slide, but if you squint, the anti-aliased line should look more like a solid line. It looks really good on LCD monitors.
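Here is a minimal sketch of the difference, using the standard RenderingHints.KEY_ANTIALIASING hint. It draws the same diagonal line with anti-aliasing off and on into an off-screen image, and counts pixels whose intensity lies strictly between the background and the line color. Aliased drawing should produce none; anti-aliased drawing produces partially covered pixels. The class name is hypothetical, and the exact anti-aliased count depends on the renderer, which is the point of the pixel-coverage discussion.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class AntialiasDemo {
    // Draws a white diagonal line on black and counts pixels that are only
    // partially lit (intensity strictly between 0 and 255).
    static int partialPixels(boolean antialias) {
        BufferedImage img = new BufferedImage(32, 32, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    antialias ? RenderingHints.VALUE_ANTIALIAS_ON
                              : RenderingHints.VALUE_ANTIALIAS_OFF);
            g.setColor(Color.WHITE);
            g.drawLine(2, 3, 29, 17); // shallow diagonal: a staircase when aliased
        } finally {
            g.dispose();
        }
        int partial = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int green = (img.getRGB(x, y) >> 8) & 0xFF; // gray, so any channel works
                if (green > 0 && green < 255) {
                    partial++;
                }
            }
        }
        return partial;
    }

    public static void main(String[] args) {
        System.out.println("aliased partial pixels:      " + partialPixels(false));
        System.out.println("anti-aliased partial pixels: " + partialPixels(true));
    }
}
```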

So, different anti-aliased pixel coverage. Again, due to the projector, you might not be able to see the difference, but if you zoom in at the end pixels, you can see that they have different intensities. These pixels also have different intensities, and I'm sure you cannot see this pixel.

But all in all, anti-aliased pixel coverage will also differ between the two pipelines. So let's summarize pixel coverage. It was very brief, but point number one: Sun 2D is, or should be, pixel-for-pixel compatible with Windows, Linux, and Solaris. If you optimized your app to work on Windows, then Sun 2D might be a good choice for you.

Other than that, there is no right or wrong choice. Run your app and see which renderer gives you the best look or pixel coverage. And here's a piece of advice: if you work at a bigger company with a UI design team, run it by them. In my experience, UI designers are very good at spotting these pixel differences, and they get particular about where their pixels go. So run it by them.

Okay, now we're going to switch gears and talk for a while about performance. Like I said, two different pipelines, two different performance characteristics. So we're going to compare the Sun 2D pipeline and the Quartz pipeline on Mac OS X against each other, and on top of that, we're going to compare how they perform against Windows XP.

Performance challenge number one: how do we accurately compare against Windows? It used to be hard. People would argue, is my G4 as fast as your Pentium 4? And these debates would go on forever. It's very easy now: with Intel Macs and Boot Camp, it's the same machine. You get an apples-to-apples comparison, pun intended.

The test machine I'm going to present the numbers from is just a Mac mini. It's running Windows with Sun's Java 6 beta, build 85, and on Mac OS X, DP4, build 85. Performance challenge number two: finding a good benchmark. We have a bunch of internal benchmarks at Apple, but the numbers wouldn't mean anything to you if we listed them.

So we wanted to find a public benchmark, and luckily, we found one: J2DBench. For those of you who don't know J2DBench, it's a collection of graphics micro-benchmarks; you can run thousands and thousands of them. The cool thing about it is that it's part of the Mustang source drop.

So if you download the Mustang source drop, you can compile J2DBench, run it, and get the numbers yourself. You can find it under src/share/demo/java2d/J2DBench. So try it for yourself and play around with the numbers.

There's one caveat on J2DBench: it is not fair to compare on-screen results against Windows XP. You're probably dying to know why, so here's the answer. When I draw a line on screen, J2DBench tells us that the Mac OS X Quartz pipeline is 14 times faster than Windows, so 1,400%. Well, this is not true.

Let me explain why. You need to know how things work behind the scenes to see why it's not true. All windows in Mac OS X are double buffered, so when you draw on screen, you're actually drawing into a back buffer, and the updates are periodically flushed to the screen.

Whereas on Windows, windows are single buffered, so you might be drawing directly on screen. So on Mac OS X you have this queuing strategy, while on Windows you draw directly on screen. To make a long story short, just trust us: it's not a fair comparison, even though we would love to be able to publish these kinds of numbers.

So we're only going to look at the off-screen tests, which means drawing into a buffered image. It's kind of lame, but that's all we can compare apples to apples. There are gazillions of micro-benchmarks, but I picked four of the most common primitive drawing routines: drawing an image, drawing a rectangle, drawing a line, and drawing text. So let's dive into the first one: drawing an opaque image. I'm going to pause so you can take a look at the graph, and I'll get a sip of water.

What this slide tells us is that the Sun 2D and Quartz renderers are about 25% faster than Windows. I'm also going to explain this pattern, because all of the other slides, and we have at least 10 more graph slides, follow it.

We're always going to have Windows at 100%, with the Sun 2D and Quartz renderer numbers shown relative to Windows. So can we claim that drawing an opaque image is faster on Mac OS X than on Windows? Well, not quite. It's too vague a statement, and to see why, we have to look at the details, or parameters, of the test.

All of the slides have this little line where you can see the parameters of the call. Here the source is an RGB image, drawn onto a compatible opaque destination. The size of the image is 250 by 250, and it goes through an identity transform.
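Here is a rough, J2DBench-style sketch of how such a parameterized test might be timed. It is not J2DBench itself: the class name is hypothetical, a TYPE_INT_RGB destination stands in for the "compatible opaque" destination so the code runs headless, and timing with System.nanoTime after a short warm-up is a simplification. Changing the size argument from 250 to 20 reproduces the parameter change discussed next.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class DrawImageBench {
    // Draws a size x size RGB source onto an opaque RGB destination at the
    // identity transform and returns drawImage calls per second.
    static double drawsPerSecond(int size, int iterations) {
        BufferedImage src = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        BufferedImage dst = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            for (int i = 0; i < iterations / 10 + 1; i++) {
                g.drawImage(src, 0, 0, null); // untimed warm-up
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                g.drawImage(src, 0, 0, null);
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            return iterations * 1e9 / elapsed;
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        for (int size : new int[] {250, 20}) {
            System.out.printf("%dx%d: %.0f draws/s%n", size, size, drawsPerSecond(size, 2000));
        }
    }
}
```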

This is really important, because varying any of these parameters might completely change the results of the benchmark, and also the results in your app. So we're going to be very careful when we specify these results. The next slide shows why. This is drawing an opaque image, same as before; the only difference is the size, which is now 20 by 20. And you can see that the characteristics are completely different.

Quartz is now three times slower than Windows. So to summarize these two slides: the Quartz pipeline is good at drawing big images, but not as good at small ones. And the Sun 2D renderer trails a little bit behind Windows. Now we're going to look at drawing a translucent image. Same as before; the only difference is that instead of an RGB image we're drawing an ARGB image, so there's some blending involved.
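The translucent case can be sketched by drawing a half-transparent ARGB image onto an opaque RGB destination with the default SrcOver composite. The hypothetical helper below returns the blended pixel; 50% red over black should land near a red value of 128. This shows what the blending step computes, not how either renderer vectorizes it.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class TranslucentBlend {
    // Draws a roughly 50%-alpha red ARGB image onto an opaque black RGB image with
    // the default SrcOver composite and returns the blended pixel as 0xRRGGBB.
    static int halfRedOverBlack() {
        BufferedImage src = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 4; x++) {
                src.setRGB(x, y, 0x80FF0000); // alpha 0x80 (about 50%), pure red
            }
        }
        BufferedImage dst = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB); // starts black
        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(src, 0, 0, null); // blending happens here
        } finally {
            g.dispose();
        }
        return dst.getRGB(1, 1) & 0xFFFFFF;
    }

    public static void main(String[] args) {
        System.out.printf("blended pixel: %06X%n", halfRedOverBlack());
    }
}
```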

So this slide looks good. Drawing a big image, Quartz is about three times faster than Windows, and the Sun 2D renderer performs about like Windows. I happen to know why this is happening, so I'll tell you. The Quartz renderer is optimized for the CPUs that we ship.

A lot of the blending code is vectorized: on the G5 there's AltiVec code, and on Intel processors there's SSE code. The Sun 2D renderer, by contrast, is plain C code that's not optimized for a particular CPU. So Quartz looks really good here: if you have a bunch of these images, it's very likely that Quartz is going to be about three times faster than Windows.

Now, with small images, we already know Quartz isn't as good, but its blending is so good that it still comes out a little bit ahead of Windows. We can call this a tie. And Sun 2D lags a little bit behind Windows.

So we're gonna kind of jump through the, we're gonna skip the image part and we're gonna go into drawing primitives, fill rect. And I'll kind of tell you what to expect now. There's gonna be a big difference whether we draw aliased or anti-aliased. And we kind of know now what's aliasing and anti-aliasing. So keep that in mind, aliased, anti-aliased, big difference.

Doing a fillRect. An aliased fillRect with Quartz is about three times slower on Mac OS X than on Windows. The key word here is aliased, because once we switch to anti-aliased, Quartz is about four times faster than Windows. You're wondering why, and I happen to know the answer to this too.

Everything in Mac OS X is drawn anti-aliased, because we like our interfaces to be pretty, and the Quartz team has spent a lot of time optimizing the anti-aliasing path. That's why anti-aliasing is fast on Mac OS X. And you can see that the Sun 2D renderer, which is the same code as on Windows, performs in about the same ballpark.

Drawing a line. You're going to see exactly the same thing we saw with filling a rect, and this one is kind of painful. Drawing an aliased line, Quartz is about 10 times slower than Windows. Really, really bad. But for an anti-aliased line, it's about three times faster than Windows. The Sun 2D renderer, on the other hand, is the same or similar code as Windows and performs about the same.

So I'm going to stop here and summarize those two slides, rectangles and lines. Anti-aliased, Quartz is really, really good. Aliased, Quartz is bad. With drawString we see exactly the same thing, so I'm just going to breeze through this slide. Aliased strings: Quartz is about three times slower, not so good. Anti-aliased strings: it's a little bit faster, better. So we see exactly the same pattern.
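To check the aliased versus anti-aliased pattern on your own machine, a hedged sketch (hypothetical class LineAaBench) can time the same drawLine loop with the anti-aliasing hint off and on. The absolute numbers, and even which setting is faster, will depend on the pipeline selected at launch; the sketch only sets up the comparison.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class LineAaBench {
    // Times diagonal drawLine calls into an off-screen image with the
    // anti-aliasing hint off or on and returns lines drawn per second.
    static double linesPerSecond(boolean antialias, int iterations) {
        BufferedImage dst = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    antialias ? RenderingHints.VALUE_ANTIALIAS_ON
                              : RenderingHints.VALUE_ANTIALIAS_OFF);
            g.setColor(Color.WHITE);
            for (int i = 0; i < iterations / 10 + 1; i++) {
                g.drawLine(0, i % 256, 255, 255 - i % 256); // untimed warm-up
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                g.drawLine(0, i % 256, 255, 255 - i % 256);
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            return iterations * 1e9 / elapsed;
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        System.out.printf("aliased:      %.0f lines/s%n", linesPerSecond(false, 20000));
        System.out.printf("anti-aliased: %.0f lines/s%n", linesPerSecond(true, 20000));
    }
}
```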

So first I'm going to apologize for breezing through those micro-benchmarks. I did it on purpose, because I didn't want us to focus on the individual micro-benchmarks and parameters. I want us to see the big picture of how the two renderers perform against each other. So I took all the tests and put them on one graph. Now they're flipped from horizontal to vertical and stacked next to each other.

This graph didn't tell me much. So what I ended up doing, I ended up doing a plot chart and I connected the actual lines. And for those of you who are chart purists, yes, the line between the two points has no meaning. However, I like this slide a lot. So if you're actually kind of falling asleep now, this is the most important slide of the performance part. So let me explain what this means. So this is the time to pay attention. So we have Windows at 100%.

And we see the Sun 2D renderer on Mac OS X following Windows; again, it's the same code. Sometimes it performs faster, sometimes slower, sometimes the same. We're currently investigating why, but we believe it's basically because different compilers produce different code: Windows is compiled with Visual Studio, and Mac OS X with GCC. We're still investigating to prove this. But you can see that the curves are fairly similar, which gives you a sense that it's the same renderer.

Quartz is a completely different story. It's like a roller coaster ride: sometimes it does really, really well, and sometimes it does really, really badly. The moral of this story is that if you're using anti-aliased drawing, big images, or lots of translucent images, Quartz will give you much better performance than you might be getting on Windows. On the other hand, if you're drawing aliased graphics and you have a bunch of small images, then Quartz might not be the renderer for you.

OK, I've been harping on this impedance mismatch, and you're probably bored of it. So let me prove to you how bad it is. Remember the call raster.getDataBuffer()? You call image.getRaster(), then raster.getDataBuffer(). And I've been saying you get really poor performance.

So I reran all the benchmarks, and on all the images, I called raster.getDataBuffer() before running them. Here are the results. They should be staggering, I think: all the Quartz benchmarks drop below 1% of Windows, so they're about 100 times slower than Windows. So when I say the experience is suboptimal, it's actually really bad.

So, as you saw, Java graphics on Mac OS X with the Quartz renderer is not that bad; often it's actually faster than Windows. But sometimes we get a bad reputation, because when you hit one of these cases of impedance mismatch, our performance is so bad that people just throw their hands in the air. Well, not anymore.

We have a completely different renderer now: the Mac OS X Sun 2D renderer. It's completely unaffected by calls to getDataBuffer, getPixel, setPixel, and so on. So if you're hitting one of these cases, the Sun 2D renderer can give you something like a 100 times improvement in performance.

OK, let's summarize this. This is one of the questions I wanted you to answer. What is the fastest pipeline for my app? Same lame answer. It depends. And here is basically summarized what I've been kind of going through in the previous slides. Quartz, really good at anti-alias drawing, big image fills, translucent images.

Where Quartz is not so good, and where Sun 2D has the advantage, is aliased primitives, small images, direct pixel access, or when you use one of those unfriendly image types. So keep this in mind, depending on what your app is doing. Like I said, forget a 60% improvement: you can get something like a 100 times improvement if you pick the right renderer. So keep this in mind, play around, and see how your app performs. At least we give you a choice now.
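The summary slide can be encoded as a rough rule of thumb. The sketch below is a hypothetical heuristic, not an Apple tool: impedance-mismatch traits (direct pixel access, unfriendly image types) decide in favor of Sun 2D outright, and otherwise Quartz strengths are counted against Sun 2D strengths. A tie returns "measure both", which matches the talk's real advice: run your app on both pipelines.

```java
import java.util.EnumSet;
import java.util.Set;

class PipelineHeuristic {
    enum Trait {
        ANTIALIASED_DRAWING, BIG_IMAGES, TRANSLUCENT_IMAGES, // Quartz strengths
        ALIASED_PRIMITIVES, SMALL_IMAGES,                    // Sun 2D strengths
        DIRECT_PIXEL_ACCESS, UNFRIENDLY_IMAGE_TYPES          // impedance mismatch
    }

    // Hypothetical rule of thumb distilled from the summary slide.
    static String suggest(Set<Trait> traits) {
        if (traits.contains(Trait.DIRECT_PIXEL_ACCESS)
                || traits.contains(Trait.UNFRIENDLY_IMAGE_TYPES)) {
            return "Sun 2D"; // mismatch penalties can dwarf every other factor
        }
        int quartz = 0;
        int sun2d = 0;
        for (Trait t : traits) {
            switch (t) {
                case ANTIALIASED_DRAWING:
                case BIG_IMAGES:
                case TRANSLUCENT_IMAGES:
                    quartz++;
                    break;
                case ALIASED_PRIMITIVES:
                case SMALL_IMAGES:
                    sun2d++;
                    break;
                default:
                    break;
            }
        }
        if (quartz == sun2d) {
            return "measure both";
        }
        return quartz > sun2d ? "Quartz" : "Sun 2D";
    }

    public static void main(String[] args) {
        System.out.println(suggest(EnumSet.of(Trait.ANTIALIASED_DRAWING, Trait.BIG_IMAGES)));
        System.out.println(suggest(EnumSet.of(Trait.ANTIALIASED_DRAWING, Trait.DIRECT_PIXEL_ACCESS)));
    }
}
```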

This is the slide I liked a lot, exactly the same as before. This is where we are today. I want to talk a little bit about where we'd like to be. This is not something we're working on, but we asked ourselves: what if we took the best of the Quartz renderer and the best of the Sun 2D renderer?

You'd basically get a graph like this green line: most of the time you outperform Windows, and in some cases you perform as well as Windows. So this is what we'd be shooting for. I'll be honest with you: I don't know if it's possible, or if we'll be able to mix and match the two renderers.

But if we can, you'll get really, really good performance and pretty much never have to pay this impedance mismatch penalty and get all the benefit of Quartz of anti-aliased drawing or blending and so on and so on. Again, this is not something that we have in store, but something that we might explore.

So we talked about two renderers, Sun 2D and Quartz. And if that wasn't enough for you, we're going to confuse you even more by talking about a completely different architecture, and that is OpenGL. More particularly, OpenGL and Java: how do they talk to each other?

But before we talk about that, we should probably hear what Sun Microsystems has been up to. We actually invited a special guest to tell us that story: Chris Campbell from Sun Microsystems, one of the engineers who works on OpenGL and Java at Sun. So everybody, welcome Chris.

Thanks, Viktor. As Viktor said, my name is Chris Campbell. I'm an engineer on the Java 2D team at Sun. And primarily, I'm responsible for the OpenGL-based Java 2D pipeline that I'll tell you more about today. Now, OpenGL has been playing an increasing role in the Java graphics story over the past, say, two or three years.

For those of you who don't know OpenGL, it's basically a cross-platform API that lets you get really close to the graphics hardware. In many cases there's a one-to-one mapping between OpenGL calls and what modern hardware provides. So the big benefit, obviously, is performance. The OpenGL-based Java 2D pipeline that I'm responsible for was first introduced in Sun's JDK 5 release, a couple of years back.

Now, like I said, the obvious reason for implementing this OpenGL pipeline was performance. The picture to keep in your head is similar to what Viktor showed you earlier with Apple's Quartz renderer and the Sun2D pipeline: the OpenGL-based Java 2D pipeline is just another pipeline. It's an implementation detail. This means you don't need to change your existing Swing or Java 2D application in any way. Behind the scenes, we take those high-level Java 2D API calls and translate them into low-level OpenGL commands.

[Transcript missing]

In the new architecture, we're able to avoid a lot of the JNI overhead we had previously. This results in faster rendering, especially for small primitives like fillRect and drawLine. The other thing is that the new architecture can batch up similar primitives and send them down to OpenGL in one pass. As some of you may know, graphics hardware likes to work on hundreds of thousands of small triangles at the same time, so this is another area where we get a big win in JDK 6.
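Batching works best when consecutive primitives share the same rendering state. The talk does not describe the pipeline's actual batching rules. This hedged sketch simply assumes that a color change is what breaks a batch. It groups non-overlapping rectangles by color, so each color is set once and its `fillRect` calls run back to back. `BatchedFills` and `ColoredRect` are hypothetical names.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchedFills {
    // A rectangle and the color it should be filled with.
    static final class ColoredRect {
        final Color color;
        final Rectangle bounds;

        ColoredRect(Color color, Rectangle bounds) {
            this.color = color;
            this.bounds = bounds;
        }
    }

    // Groups rectangles by color, keeping the order in which colors first appear.
    static Map<Color, List<Rectangle>> groupByColor(List<ColoredRect> rects) {
        Map<Color, List<Rectangle>> groups = new LinkedHashMap<>();
        for (ColoredRect r : rects) {
            groups.computeIfAbsent(r.color, c -> new ArrayList<>()).add(r.bounds);
        }
        return groups;
    }

    // Fills every rectangle, setting each color only once. Only safe when the rectangles
    // do not overlap: reordering overlapping fills would change the final image.
    // Returns the number of setColor calls made.
    static int drawGrouped(Graphics2D g, List<ColoredRect> rects) {
        int colorChanges = 0;
        for (Map.Entry<Color, List<Rectangle>> group : groupByColor(rects).entrySet()) {
            g.setColor(group.getKey());
            colorChanges++;
            for (Rectangle b : group.getValue()) {
                g.fillRect(b.x, b.y, b.width, b.height);
            }
        }
        return colorChanges;
    }
}
```

Take ten rectangles that alternate between red and blue. Drawn in their original order, they need ten color changes. Grouped, they need two.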

So stability is improved, and performance is greatly improved. Along the way we've been fixing bugs in the pipeline itself, so quality is much better than it was in JDK 5. Interestingly, we've also worked quite closely with driver teams from ATI, NVIDIA, Sun, Intel, and others to really improve their drivers.

Every once in a while, if you turned on the OpenGL pipeline, your application might trip across rendering artifacts or even crashes. Sometimes there was no way for us to work around those, so the solution was to work closely with those teams and raise the level of quality across the board. I'm happy to say that in the past year or so, driver quality has really improved with respect to the OpenGL-based Java 2D pipeline.

Now, that's the OGL pipeline in a nutshell. I could probably talk for weeks on end about it, since that's what I work on, but I'll spare you the pain. I'll switch gears now and talk about another API called JOGL, which stands for the Java Bindings for OpenGL.

It's a completely separate API that's been going through the JSR process for the past couple of years. If any of you have ever done graphics programming with OpenGL in C, JOGL's API should look quite familiar to you, because it's essentially a thin wrapper around the existing C-based OpenGL libraries.

But JOGL's API also offers a number of high-level classes that let you integrate 3D rendering into your existing Java application, whether it's an AWT app or even a Swing app. I think one of the coolest things in JOGL's API is something called GLJPanel. It's a hardware-accelerated implementation of the Swing JPanel component that lets you do 3D rendering directly in your Swing application. Probably the easiest way for me to explain this is to switch over to the demo machine.

What I'm going to show you now is one of the demos available from the JOGL demos website. It's called JRefract. At first, it looks like a very typical Swing application. This is running on Apple's latest Mustang developer preview bits, so it's the latest and greatest.

So it's pretty boring at first, but I'm going to open up one of the 3D demos here. This is the standard OpenGL Gears demo that you may have seen before. You might notice there are some artifacts in here. Like I said, it's a Mustang early access release, so there's still more work to be done on the Apple side.

What you can see here is that performance is reasonable. The interesting thing is that I can drag other internal frames over it. You'll notice there are no issues with the lightweight/heavyweight mixing you may have heard about in the past, and no issues with the Z order of the internal frames. Everything renders correctly. Performance looks reasonable at this size, but if I make the internal frame a lot larger, the frame rate dips below 15 frames per second. Still relatively usable, but definitely not ideal.

I should mention that, prior to Mustang at least, the GLJPanel implementation uses a two-step process. It renders the 3D scene into a separate off-screen surface. Then it has to go through a very slow path of reading the pixels back and copying them into the Swing back buffer. That second step is the one that really kills performance, and that's the effect you're seeing here.

In the Mustang timeframe, we worked closely with Ken Russell from the JOGL team at Sun. He came up with some pretty novel techniques that let the OpenGL-based Java 2D pipeline interoperate with JOGL's GLJPanel implementation. I can demonstrate this by bringing up the same demo.

The only difference here is that on the command line we're passing the sun.java2d.opengl flag, set to true, which turns on the OpenGL-based pipeline. The first thing you'll notice is the look and feel: last time the demo came up in Aqua, and here it's coming up in Metal. Viktor will tell you a bit more about that later. There are some bugs to be worked out with the Aqua look and feel first.

As you can see, the demo looks basically the same. But since the OpenGL pipeline is turned on, we're using OpenGL behind the scenes to render even the most basic Swing elements, like the text and the internal frame itself. The real power shows when we open up the Gears demo again.

You'll see that the frame rate has jumped considerably. I think it's about 50% faster than before. We still have the same artifacts, and those will be fixed. But if I increase the size of the internal frame, performance is, again, at least 50 to 60% faster.

This is still early. On our own platforms, Linux, Solaris, and Windows, we see improvements in this case of up to 3 or 4x. So this is just a taste of what Mustang can offer in terms of mixing 2D and 3D elements. I think the really powerful thing for developers is that you no longer have to think in terms of creating an OpenGL application, or a Swing application, or a Java 2D application.

It's not an either/or situation anymore. It's really possible now to integrate 3D rendering, and even more complex 3D demos will run together inside the same window. For example, I can open up this refraction demo behind the scenes. You'll see that, again, there are no issues with mixing: the Z order of the frames is respected.

Going back to this demo, I can show you that the background back here is rendered with a Java 2D gradient. There's a 3D scene rendered on top of that, and there's some more heads-up display in front, with images and text. So the nice thing is that you can mix 2D and 3D elements, even in the same component or the same window. If anyone saw the Aerith demo shown at the Java session on Tuesday, you saw this in action: you can mix 2D and 3D in the same application, and it's seamless to the user.
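The 2D layers of a composition like this can be sketched with plain Java 2D. This is a minimal example under stated assumptions. It draws a `GradientPaint` background and a translucent heads-up-display panel into a `BufferedImage`. A comment marks where a JOGL-rendered 3D layer would go, since that part needs JOGL itself. HUD text is left out so the sketch also runs on headless machines without fonts. `LayeredScene` is a hypothetical name.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class LayeredScene {
    static BufferedImage render(int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            // Layer 1: Java 2D gradient background, dark blue at the top fading to black.
            g.setPaint(new GradientPaint(0, 0, new Color(0, 0, 128), 0, h, Color.BLACK));
            g.fillRect(0, 0, w, h);

            // Layer 2: a 3D scene would be rendered here (for example with JOGL); omitted.

            // Layer 3: heads-up display, a half-transparent white panel near the bottom.
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.5f));
            g.setColor(Color.WHITE);
            g.fillRoundRect(10, h - 50, w - 20, 40, 12, 12);
            g.setComposite(AlphaComposite.SrcOver); // restore opaque drawing for later layers
        } finally {
            g.dispose();
        }
        return img;
    }
}
```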

[Transcript missing]

Greatly improved performance, improved stability, much better quality, and it opens the door to this 2D and 3D mixing. And I think the coolest thing for this audience is that it's not just for the platforms Sun supports anymore: it's coming to Mac OS X very shortly, just like I showed you. So I'll hand it back to Viktor, and he'll tell you more about the work Apple's team is doing in this area.

Thanks, Chris. Yeah, OpenGL and Java running on a Mac. Exciting. So I'm going to talk to you about the OpenGL pipeline on the Mac. We just announced it in DP5, which went out on Tuesday, I believe. Sun has had this pipeline since Java 5, and we've now implemented it, so it's running in Java 6. Currently it has a beta sticker, and I'm going to talk about what that beta sticker means.

First, this whole graph should be familiar to you. The only thing I want you to pay attention to right now is this: we have the Java 2D API at the top, and at the end we have the OpenGL rendering engine. In the middle is something we didn't have in the previous slides, the Sun Java 2D OpenGL layer. This is the code that Chris has been writing at Sun. Again, it's blue, so it's written by Sun. We ported it to Mac OS X and had to write a very, very thin layer of Mac glue code to connect it to our OpenGL implementation.

So this is very similar to the Sun2D renderer: we're leveraging most of Sun's work and just writing a thin layer that sits on top of our OpenGL drivers. This is exciting, because you'll get the same code running fairly well on all platforms.

The OpenGL pipeline has many pros and cons, and I'm going to focus on three of them. The number one advantage, as Chris mentioned, is speed. Everything should be running in hardware, so it should be much faster. The disadvantage is that, precisely because it's running in hardware, your app will be hardware dependent.

So this is very important; pay attention. Take the same app and the same code, running with the same Java release on the same operating system. It might still behave differently on two different machines with two different graphics cards. So this is something to watch out for.

And if you're going to ship your Java app with the OpenGL pipeline, you'd better have a pretty good test suite and test on every single graphics card you're going to ship on. Another disadvantage is that there are some technical challenges, which I'll go into right now. That's why it has a beta sticker.

It could be an alpha; we just like the word beta. Currently, applets don't work with the OpenGL pipeline. The OpenGL pipeline doesn't work with resolution independence in Leopard yet. We have some issues with overlapping components when mixing lightweight and heavyweight components. And as you saw in Chris's demo, the Aqua look and feel doesn't work with OpenGL yet. We're working on all of these, and we're hoping to have them resolved by the time we ship Java SE 6.

So this is the history of the pipelines. We have the Sun2D and Quartz pipelines, same as the graphic before. In Java 6, we're introducing an optional pipeline, the OpenGL pipeline. I want you to pay attention now to the 1.3 column, which says "OpenGL optional." For those of you who don't know the history, Apple already shipped an OpenGL implementation. We were the first Java that actually had an OpenGL pipeline.

But we decided to move away from that effort and leverage most of the work Sun has been doing. Why have two separate implementations when we can have one, and work together to make it better? So just to be clear, these are two different implementations, but yes, we did have OpenGL acceleration in 1.3. How do I make it go? Same flag on all platforms: -Dsun.java2d.opengl=true. So now let me give you a demo of the OpenGL pipeline on Mac OS X.
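Besides passing the flag on the command line, an application can set the same system property itself, as long as it does so before any AWT or Swing class initializes the graphics pipeline. This is a minimal sketch. It assumes the property is read once at toolkit startup. It leaves any value the user already supplied untouched, so `-Dsun.java2d.opengl=false` still wins. `PipelineFlags` is a hypothetical name.

```java
public class PipelineFlags {
    static final String OPENGL_PROPERTY = "sun.java2d.opengl";

    // Turns the OpenGL pipeline on unless the user already chose a value.
    // Must run before any AWT/Swing class initializes the graphics pipeline.
    static String enableOpenGLPipelineUnlessSet() {
        if (System.getProperty(OPENGL_PROPERTY) == null) {
            System.setProperty(OPENGL_PROPERTY, "true");
        }
        return System.getProperty(OPENGL_PROPERTY);
    }

    public static void main(String[] args) {
        String value = enableOpenGLPipelineUnlessSet();
        System.out.println(OPENGL_PROPERTY + "=" + value);
        // Only after this point create windows, e.g. via SwingUtilities.invokeLater(...).
    }
}
```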

So I have some notes here. We're going to run it first with Java 1.5, which uses the Quartz renderer. What we're seeing here are bouncing balls. The demo keeps adding balls to the screen as long as it can stay at 30 frames a second. We'll wait a little until it stabilizes. And we see we've reached 30 frames a second with about 1,800 bouncing balls. Now we're going to run this with the OpenGL pipeline.

Let's see what happens. We'll wait a little while it keeps adding balls until it settles at 30 frames per second. We're getting lots and lots of balls, and eventually it stabilizes at about, I don't know, 10,000. So we get about a five-times improvement with the OpenGL pipeline. For some odd reason, people get a kick out of these bouncing-ball demos. I don't know why, but they like to see bouncing balls. Can we switch back to the slides?
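The demo's adaptive loop keeps adding balls while the frame rate stays above 30. It can be sketched as a small controller. This is not the demo's source. It is a hypothetical sketch that assumes a ±5% tolerance band, roughly 10% growth steps, and 5% back-off steps. It also includes a toy cost model in which frame time grows linearly with the ball count.

```java
public class BallCountController {
    // Returns the next ball count: grow while comfortably above the target frame rate,
    // back off while below it, and hold steady within a ±5% band.
    static int nextCount(int current, double measuredFps, double targetFps) {
        if (measuredFps > targetFps * 1.05) {
            return Math.max(current + 1, (int) (current * 1.1)); // grow ~10%, at least +1
        } else if (measuredFps < targetFps * 0.95) {
            return Math.max(1, (int) (current * 0.95)); // back off ~5%
        }
        return current; // close enough to the target: considered stable
    }

    // Toy model: each ball costs a fixed number of nanoseconds per frame.
    static int simulate(double nanosPerBall, double targetFps, int iterations) {
        int count = 1;
        for (int i = 0; i < iterations; i++) {
            double fps = 1e9 / (count * nanosPerBall);
            count = nextCount(count, fps, targetFps);
        }
        return count;
    }
}
```

Suppose the toy model is calibrated so that 1,800 balls take exactly one 30 fps frame. The controller then settles inside the tolerance band around 1,800.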

So the take-home message is: if you have a lot of bouncing balls, use the OpenGL pipeline. Seriously, though, we're working on the performance numbers. As Chris said, sometimes we'll be faster and sometimes slower, but this gives you a glimpse of the future. Now we're going to switch gears and talk about resolution-independent Java. You've probably all heard about resolution independence, in the Java overview or in general, and you're probably wondering how it affects your Java app.

So here's a little story about how it will affect you. This is a Java app, IntelliJ IDEA, running at 72 DPI, a 1x scale factor. Let's say you go out and buy a fancy new display that doesn't exist yet, with 216 DPI, which is three times the number of dots per inch. You plug in your monitor, you fire up IntelliJ, and it's going to look like this. You won't be able to see anything; it's so small.
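Here is the arithmetic behind the story. 216 DPI divided by the 72 DPI baseline gives a scale factor of 3, so anything laid out in raw pixels appears one third of its intended physical size. This hedged sketch assumes a 72 DPI baseline and uses hypothetical helper names. It keeps coordinates in points and lets a `Graphics2D.scale` transform map them to device pixels, so vector drawing stays crisp at any scale.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class PointsAndPixels {
    static final double BASE_DPI = 72.0; // assumed baseline: 1 point == 1 pixel at 72 DPI

    static double scaleFactor(double displayDpi) {
        return displayDpi / BASE_DPI;
    }

    // Converts a length in points to device pixels, rounded to the nearest pixel.
    static int pointsToPixels(double points, double scale) {
        return (int) Math.round(points * scale);
    }

    // Physical size in inches of a run of device pixels on a display of the given DPI.
    static double physicalInches(double pixels, double displayDpi) {
        return pixels / displayDpi;
    }

    // Renders a 100x20-point "button": drawing code stays in points, and the
    // Graphics2D transform maps points to device pixels.
    static BufferedImage renderButton(double scale) {
        int w = pointsToPixels(100, scale), h = pointsToPixels(20, scale);
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.scale(scale, scale);
            g.setColor(Color.LIGHT_GRAY);
            g.fillRect(0, 0, 100, 20);
            g.setColor(Color.DARK_GRAY);
            g.drawRect(0, 0, 99, 19);
        } finally {
            g.dispose();
        }
        return img;
    }
}
```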

[Transcript missing]

Yeah, it looks fairly crisp, and primitives like these lines will also be crisp. There's going to be one problem, though: images will be scaled up, or blown up. This happens because buffered images are currently not aware of resolution independence. We'll need some help from Sun, in the form of new APIs for providing buffered images at different resolutions. Until then, all buffered images will be blown up.
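Java did eventually gain an API of the kind described here, though long after this session: `java.awt.image.BaseMultiResolutionImage` (Java 9, JEP 251). It holds several resolution variants of one image and picks a suitable one for the destination size. It does not exist in the Java 6 discussed in this talk. The sketch below only illustrates the idea, using hypothetical blank 16, 32, and 48 pixel icons.

```java
import java.awt.Image;
import java.awt.image.BaseMultiResolutionImage;
import java.awt.image.BufferedImage;

public class IconVariants {
    // Hypothetical blank icon of the given pixel size.
    static BufferedImage icon(int sizePx) {
        return new BufferedImage(sizePx, sizePx, BufferedImage.TYPE_INT_ARGB);
    }

    // Variants are listed from smallest to largest.
    static final BaseMultiResolutionImage ICON =
            new BaseMultiResolutionImage(icon(16), icon(32), icon(48));

    // Picks the variant for a 16-point icon drawn at the given scale factor:
    // the first variant at least as large as the destination, else the largest one.
    static Image variantFor(double scale) {
        double px = 16 * scale;
        return ICON.getResolutionVariant(px, px);
    }
}
```

At 3x the 48-pixel variant is used, so the icon stays sharp. At 4x no variant is large enough, and the largest one is returned and scaled up, which is exactly the blown-up look described above.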

So this is how the same app, IDEA, looks at 72 DPI, a scale factor of 1x, and this is how it looks at 3x. Let's see if you can spot the difference, the actual jaggedness. I'm going to flip between the two.

High DPI, low DPI, high DPI, low DPI. Just pay attention to the text: you have so many more pixels to draw it with. I don't know about you, but I'm excited about those new monitors that aren't out yet. So let's summarize this. First, we need some help from Sun to make Java resolution independent. And currently, only Java 6 on Leopard is resolution independent, and only with the Quartz renderer. We're working on making the Sun2D renderer work with resolution independence.

Time to summarize. In the interest of time, I'm going to breeze through these slides really quickly. Remember those questions I asked at the beginning? We're going to answer them now. If you haven't been paying attention to this talk at all, now is the time, because this is what you should get out of it. Different pipelines on Mac OS X: we have three, Quartz, Sun2D, and OpenGL.

Pros and cons of each pipeline. Quartz: we know it's good at anti-aliased, high-quality drawing. One huge problem is the impedance mismatch that we've beaten to death. Sun2D: bug-for-bug and pixel-for-pixel compatibility with Windows, which is really good, and you pay no penalty for the impedance mismatch. However, you get some slower...

[Transcript missing]

Simple primitive drawing compared to Windows XP. This is important.

Sun2D should give you performance similar to Windows. Quartz, on the other hand, as we saw, is a roller-coaster ride. It beats Java on Windows pretty handily for anti-aliased primitives, big images, big fills, and translucent images. For everything else, like aliased drawing, it doesn't do as well.

How can you take advantage of OpenGL acceleration from Java? Chris talked about this. We have the OpenGL pipeline, and we have the JOGL bindings. Then we have the bridge Chris talked about between the OpenGL pipeline and JOGL, where they all draw into the same context. Is Java going to be resolution independent? Yes, on Mac OS X. We're going to get crisp text and crisp primitives, but images will be scaled, which means blown up. Basically, we need some help from Sun: Java needs to be able to distinguish between points and pixels.

Currently, there is no such mechanism, but hopefully it's coming. The most important question for this session, and we talked about it a lot, was: what is the best pipeline for my app? The big answer is: try it yourself, and use these criteria to evaluate. The second question was: how can I use Shark to profile my Java app? Well, that's a trick question, because we haven't talked about Shark yet. To tell us about Shark now, I'd like to invite Rick Altherr from the Architecture and Performance Group. So welcome, Rick.

So Shark for Java is something we introduced a couple of years ago. Let's quickly talk about what Shark is. Shark is a profiling tool developed by my team at Apple. It measures a variety of performance characteristics, helps you understand the performance behavior of your application, and helps you find performance problems or bottlenecks. It supports a large number of languages, mainly native machine languages, but we also support Java.

We offer both a GUI and a command-line version of the tool, so you can run it in automated test suites and various configurations. We also have a new version available for download as of yesterday, or today, at this website. It's newer than what's on your Leopard seed disks or in Xcode 2.4.

So why do we have a special thing for Java? When you talk about profilers, the standard approach is to look at what's executing on the processors, collect information, and figure out what was actually running. The problem with Java, of course, is that you're looking at the VM. You see the JIT-compiled code running, or the interpreter, or whatever else is happening.

That doesn't really tell you what's happening in your application, only what's happening inside the VM. So Shark for Java is a JNI extension that gets loaded into the VM. That lets us retrieve information about what is executing inside the VM and show you which class, which method, and so on.

So how do we do this? There are two different methods we use. The reason is that we started with the Java Virtual Machine Profiler Interface (JVMPI). The Profiler Interface was an experimental interface available in Java 1.4 that provided some pretty rudimentary hooks, but it was enough to get basic profiling information: what was executing, and where you were running. It's deprecated in Java 5 and completely gone in Java 6.

So we had to move to something new. Sun combined the Profiler Interface and the old Debugger Interface into a unified Tool Interface (JVMTI). This gives us a lot more flexibility, and we can do a lot of new kinds of instrumentation, but it also took a bit of work to make the change.

We're not completely at feature parity yet, but we're pretty close. The important thing is that Shark for Java works with both. You can use the same version of Shark with JVMPI on Java 1.4 or Java 5, and with JVMTI on Java 5 or Java 6.

So we offer three main types of profiling for Java. The first is our classic time profile, which shows where you were executing and spending your time. We show you a couple of things. First, the number of samples during the session that landed in each function or method. Second, the package, class, and method each one belongs to.

So this way you can look at... here it is. In fact, there's the little triangle there, which I'll show you in a bit. You can expand the backtrace, or stack trace, and see how a method was invoked and by which other methods.
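Shark's time profile is a native sampler paired with a JVM agent, but the statistical idea can be approximated in pure Java. This hypothetical sketch is not how Shark works. It periodically calls `Thread.getStackTrace()` on a target thread and counts the top frame. Note that on HotSpot, `getStackTrace` typically observes threads only at safepoints, so its samples are biased in ways a native sampler's are not.

```java
import java.util.Map;
import java.util.TreeMap;

public class PoorMansSampler {
    // Samples `target`'s top stack frame `samples` times, `intervalMillis` apart,
    // and returns how many samples landed in each "Class.method".
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) { // empty if the thread hasn't started or has finished
                StackTraceElement top = stack[0];
                counts.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return counts;
    }
}
```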

The problem with this is that it's statistical sampling. There are cases where it can't accurately depict what was happening, because things happen too fast. So we offer another method called Call Trace, which records the entry and exit points of methods.

This significantly slows down the execution of your program, but it gives you an accurate count of method invocations as well as the time spent in those methods. You will literally see everything that happened from the moment you click Start to the moment you click Stop. The catch is a large performance penalty. With the time profile, being statistical, your app runs very close to how it would without Shark at all.
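Exact counts, in the spirit of Call Trace, can be gathered by wrapping the code you care about. This is a minimal hypothetical sketch, not Shark's mechanism, which hooks method entry and exit through the VM interface. Every traced call increments a counter and adds its elapsed `System.nanoTime()`. It is single-threaded. For recursive methods the time totals are inclusive, so nested calls are counted more than once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class CallTracer {
    static final class Stats {
        long calls;
        long totalNanos; // inclusive time; overlaps for recursive calls
    }

    private final Map<String, Stats> stats = new HashMap<>();

    // Runs `body`, recording one invocation and its elapsed time under `name`.
    <T> T trace(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            Stats s = stats.computeIfAbsent(name, k -> new Stats());
            s.calls++;
            s.totalNanos += System.nanoTime() - start;
        }
    }

    long calls(String name) {
        Stats s = stats.get(name);
        return s == null ? 0 : s.calls;
    }

    // Example workload: naive recursive Fibonacci, with every call traced.
    static int fib(CallTracer t, int n) {
        return t.trace("fib", () -> n < 2 ? n : fib(t, n - 1) + fib(t, n - 2));
    }
}
```

For this naive recursive Fibonacci, `fib(10)` makes exactly 177 traced calls, a number a statistical sampler could only estimate.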

Now, these are great for finding where you spend a lot of time, but in Java one of the larger performance bottlenecks can be memory allocation. So we have a trace that records where objects are allocated, and we summarize it as the total size of the allocations made in each method.

This way you can see, for example, that one method allocates 16 KB over the course of the sample. That's not too bad, but I've seen applications that constantly allocate multiple tens of megabytes in one function. That's just the way things happen.
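On current HotSpot-based JDKs, released well after this session, per-thread allocation totals are exposed through `com.sun.management.ThreadMXBean.getThreadAllocatedBytes`. The hedged sketch below uses it to measure how many bytes a block of code allocates. It is HotSpot-specific, returns -1 when the feature is unavailable, and is not what Shark's allocation trace does. `AllocationMeter` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;

public class AllocationMeter {
    // Keeps allocations reachable so the JIT cannot optimize them away in examples.
    static volatile Object sink;

    // Returns the bytes allocated by the current thread while running `work`,
    // or -1 if this JVM does not support per-thread allocation accounting.
    static long allocatedBytes(Runnable work) {
        java.lang.management.ThreadMXBean base = ManagementFactory.getThreadMXBean();
        if (!(base instanceof com.sun.management.ThreadMXBean)) {
            return -1; // not a HotSpot-style VM
        }
        com.sun.management.ThreadMXBean mx = (com.sun.management.ThreadMXBean) base;
        if (!mx.isThreadAllocatedMemorySupported() || !mx.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = mx.getThreadAllocatedBytes(id);
        work.run();
        return mx.getThreadAllocatedBytes(id) - before;
    }
}
```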

The allocation trace is currently only available with the Profiler Interface. We're working on getting it into the Tool Interface, but that requires a bit more work that we haven't had time to finish for today's release. It'll be coming soon. So let's take a little guided tour of how you might use Shark.

So we have this demo program. All I'm doing is invoking Java with the JVM Tool Interface command-line option for Shark, which loads the Shark JNI extension, or Shark agent. The agent itself does nothing in the background; it's just loaded into the process.

What we have is a multi-threaded ray tracer written in Java, and we'll let it run. It does two passes: first the actual rendering of the scene, and when that completes, a full anti-aliasing pass. Along the way, it visits every pixel multiple times and does a lot of calculations on each one.

As you can see, performance for a ray tracer isn't terrible; it does a pretty good job. But there are certain spots where it really slows down, and we should be able to make some improvements there. We'll let it run to get a baseline of how long this takes. It's almost done.

So here we see I have a little drawing problem in my app, but it took 41 seconds to render the whole scene. So what was actually happening? Here I have Shark set to Java Time Profile on my Java process, and I can click Start over here.

And I'm going to use a hotkey: I can hit Option-Escape on my keyboard to start Shark without even having it in focus. I'll let it run for a little while. You'll notice that the renderer is still running at pretty decent performance. I stop it, and now it's collecting the samples. Here we can see I'm spending most of my time in four places: some SHA-1 calculations, and the math library calls pow and floor.

So in this case we know a lot about how the execution is happening, but you think: well, there's nothing I can do about how floor is implemented. Shark has an option called Charge to Callers. Instead of reporting these samples in StrictMath.floor, I can say: I can't do anything there, so make it look as if the samples came from whoever called it. That moves them to Math.floor.
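Charge to Callers is easy to express over raw samples. This hypothetical sketch assumes each sample is a stack of `package.Class.method` strings with the innermost frame first. Each sample is credited to the first frame that does not match a hidden prefix. The prefix `java.lang.StrictMath.` mimics charging one method to its callers, and `java.lang.` mimics charging a whole package. The frame names in the test are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChargeToCallers {
    // Credits each sample to its innermost frame that does NOT start with `hiddenPrefix`.
    // If every frame is hidden, the sample is credited to the outermost frame.
    static Map<String, Integer> charge(List<List<String>> samples, String hiddenPrefix) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> stack : samples) {
            if (stack.isEmpty()) {
                continue;
            }
            String charged = stack.get(stack.size() - 1); // fallback: outermost frame
            for (String frame : stack) { // innermost first
                if (!frame.startsWith(hiddenPrefix)) {
                    charged = frame;
                    break;
                }
            }
            counts.merge(charged, 1, Integer::sum);
        }
        return counts;
    }
}
```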

But, you know, that's not interesting either. In fact, I can't do anything with the entire package, so I can charge the entire package to its callers. Now a new method pops up, hit transform. Let's take a look at what's happening. I can double-click, and it takes me to the actual lines of source code in the application and shows me which source lines were involved. We change background colors in the table to indicate, I don't know,

[Transcript missing]

So we can go ahead and remove a lot of these calls, and we'll see that performance... not really; there might be a little improvement there.

But what we find is that floor and pow are being called a significant number of times. Because Shark is a statistical sampler, the Nyquist sampling theorem comes into play: we keep landing in those functions even though each call doesn't take much time.

If we go back to the profile, there were some other things we saw, namely SHA-1. What's happening is that a number of the textures you see rendered in the scene are generated using SHA-1 hashes, as a way to get random numbers for texturing.

These are calculated for every pixel multiple times in both passes, the render phase and the anti-aliasing phase. SHA-1 itself is pretty difficult to optimize any further, so instead we looked at where it's being called from. We see that this one particular method makes a number of calls to SHA-1.

And in fact, the same numbers get passed to SHA-1 over and over again. Each time a new pixel is visited, you can see that the inputs to the calculation are very similar. So I've gone ahead and implemented a cache for these SHA-1 values so that we don't have to recompute them every time. Let's start it. And this one... didn't actually have the SHA cache.
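A cache like the one described can be sketched with a concurrent map, since the ray tracer is multi-threaded. This is a hypothetical sketch, not the demo's code. It assumes the SHA-1 inputs can be keyed by a `long`. It uses a per-thread `MessageDigest`, because instances are not thread-safe. It never evicts, so memory grows with the number of distinct inputs.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Sha1Cache {
    // MessageDigest is not thread-safe, so each rendering thread gets its own instance.
    private static final ThreadLocal<MessageDigest> SHA1 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    });

    private final ConcurrentHashMap<Long, byte[]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger computations = new AtomicInteger();

    // Returns SHA-1 of the 8-byte big-endian encoding of `key`, computing it at most once.
    byte[] digest(long key) {
        byte[] d = cache.computeIfAbsent(key, k -> {
            computations.incrementAndGet();
            return SHA1.get().digest(ByteBuffer.allocate(Long.BYTES).putLong(k).array());
        });
        return d.clone(); // defensive copy so callers cannot corrupt the cached value
    }

    int computations() {
        return computations.get();
    }
}
```

Whether this pays off depends on how often inputs repeat. In the demo, the anti-aliasing pass revisits the same pixels, so most of its lookups should hit the cache.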

So the render phase has to go through, generate all these values once, and cache them. The interesting aspect is that the anti-aliaser uses them again, so the anti-aliasing phase happens much, much faster. All of this was found just by taking a very simple look at where we were spending our execution time. I'll let this finish so we can see how long it takes, and then I'll come back and show you a new profile of what we find.

And there we see it took 32 seconds, so we shaved off about 9 seconds. Now if I run it again and take another profile, we can see we're still calling floor a fair number of times. But now we're spending a fair amount of our time rendering the image itself. You could repeat this process and look at the different things happening under the covers.

Back to slides. So that's a brief tour of Shark. For more information on the Java pipelines and other topics, there's the WWDC website with documentation and sample code. There's also Chris Campbell's blog. We're a little low on time, so I think that's all we have time for.