Enterprise IT • 1:08:36
Whether it's the PowerBook G4 for UNIX to go, the Power Mac G5 workstation for serious computational horsepower on the desktop, or the immense power and scalability of Xserve G5 and Xserve RAID, Apple delivers an ideal platform for scientific computing with an exceptional price/performance ratio. Learn how Apple products are driving momentum in scientific markets and hear how scientific developers are using Mac OS X Tiger technologies to deliver innovative research tools.
Speakers: Bud Tribble, Osman Ratib
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper and may contain transcription errors.
Hi, everyone. Welcome to Mac OS X and Scientific Computing. I'm Bud Tribble, Vice President of Software Technology for Apple. And I've got a lot of great stuff to go over today. I'm going to be talking about Mac OS X and scientific computing in general, but we've also got sort of the meat of this presentation, which is some great demos from some third parties that I'm sure you'll enjoy. To get started, it's been a year since I was last up here talking about Mac OS X and scientific computing and pointing out some of the momentum that we were starting to see in the science market with the Mac. And I'm pleased to report that that momentum has not only continued, but it's on an accelerating course. I'd just like to take a few moments to highlight a few things here. First thing: in August of 2004, we introduced the iMac G5, 17-inch and 20-inch. Now, why is this interesting for science? Well, any of you who have done any bench science know that you like to have your computer on the bench, but bench space is sort of at a premium. And so this has been pointed out to me many times as I've wandered around and talked to various labs.
This is a great machine for bench science because it sort of sits up there on its aluminum stand. You can shove the keyboard under it. Someone pointed out to me that it was sort of impervious to spills on the bench. And I mentioned, well, that sort of depends on exactly what is spilling on the bench. But this is a great machine for the scientist.
Xserve RAID, I'll come back to this subject. More and more, science is about big storage. Apple originally got into the storage business partly because of the high-end media market. Those guys eat terabytes for breakfast. But this is more and more the story of science: the ability to collect data and store it, data that is generated during simulations, data that has to be analyzed. Almost any lab you walk into these days, there's a need for big storage. And if you look around the market and ask, well, what's the most cost-effective way to do that? It turns out Xserve RAID is there. It's at between $2 and $3 per gigabyte of fully RAIDed storage with Fibre Channel connectivity. In a nice 3U rack-mount system, you can get 5.6 terabytes. So it's the kind of thing that no lab should be without. If you're not generating terabytes of data, you're probably not doing real science these days.
Just a couple other things to point out. Society for Neuroscience in 2004: we had a nice presence there, and it was very well attended. Sort of showing our colors in science generated a lot of interest. Some key products were launched: MATLAB DCE, VIBE, GridMP in November 2004. Continuing the momentum, it was great to see National Instruments start to bring their products over to the platform in January. National Instruments, as anyone involved in data collection knows, is one of the key vendors there, and they're bringing over their M Series, which are their card-based PCI-X systems, and their USB devices for data collection. So this is a big, big plus to have all of this available on the Mac. In January of 2005, we introduced Xsan. And again, this gets back to the storage theme, the big storage theme. Xsan lets you basically manage your storage in a way that you don't have to worry about reformatting volumes or erasing volumes and kind of reorganizing your back-end storage. If you need more storage, you just throw another volume on the back end. It's fronted by Fibre Channel to the front-end servers and the metadata server. An excellent solution if you're up into the terabytes or even above. Now, the great thing about Xsan is that Xsan 1.1, which is a free upgrade, gets you a 64-bit file system when you run it on Tiger, which means that you can manage up to two petabytes of data. And there are scientific projects around that can easily get up into that range. So, again, Apple has great solutions in the storage realm.
And if you look and compare price per gigabyte or price per terabyte, I think you'll find that Apple is going to be the most cost-effective solution out there. And Xsan makes it so easy to manage that it's the kind of thing that no lab should be without. Mac mini, I'll just mention, was introduced on January 11, 2005. The Mac mini is interesting because it can be so many things in the mind of the beholder. Our idea in introducing it was that this would be a great way for someone who maybe has a Windows or Linux box. They've already got a monitor. They've already got a keyboard. They've already got a mouse. Why not, for $499, grab a Mac mini, hook it up, try out Mac OS X, find out that it's really Unix-based, find out that it actually runs the applications that you need in your lab? But what people have done with the Mac mini goes way beyond that. People have used the Mac mini as just a controller in the lab for experiments. People have used the Mac mini as just a data collection device. It's sort of a utility appliance that you can utilize many places throughout the lab. So you're only sort of limited by your creativity there.
Portability. So Apple obviously has always had a great lineup of portable computers. I can't tell you the number of scientists that have come up to me and said, you know, it's just so great that I can have Xserves in my lab, or clusters or grids in my lab, running my simulations or running my analysis.
But, you know, if I'm on a plane to a conference, I just take my PowerBook with me and it's exactly the same environment. That is such a productivity boost for anyone involved in research. And with the PowerBook G4 as updated in 2005, you get 1.67 gigahertz. It gets slightly warm, but that's okay. And then a key product launch happened in February 2005: BioTrue. BioTrue is a collaborative system for sharing data, very useful in the biosciences.
Continuing on, BioIT World: we got the cover of BioIT World. And as you probably know, bio-IT or biotech is one of the first places where Apple really got a strong foothold, starting a couple years ago. We got our foot in the door, and the door is now wide open. I don't know if you'd call us dominant, but certainly it's hard to go into a bioinformatics lab without seeing Macs around, without seeing Macs used for analysis, for genetic analysis, biochemical modeling, et cetera.
Continuing in that vein, what we're starting to see this year: in March 2005, we were at the American Chemical Society. What this represents is Apple moving into other parts of the pipeline for drug development. We sort of started in the genetics part of that pipeline, genetic modeling and some of the biochemical modeling. The American Chemical Society represents a different part of that pipeline, and we're coming on extremely strong in that area, and I expect that trend to continue. Again, more products launched in April of 2005: molecular imaging from Kodak and SAS for drug development, just to mention a few of them.
So what we're seeing here is building momentum. Last time I was here, I spent a lot of time talking about all the open source applications that were available for Mac OS X. That continues to be the case, but the big news over this last year is that we have penetrated this market enough to the point where the commercial applications are moving over in large numbers. So that's just really exciting for us.
In April of 2005, of course, the Power Mac G5, 2.7 gigahertz. For any real number crunching, this is a must. With the Velocity Engine and the acceleration libraries, for any sort of simulation or analysis that involves floating point or double-precision floating point, this is probably the top system that you can get today.
Just spending a little bit of time on Mac OS X Tiger and the things that really matter for science. Some of the general things we put in there just kind of make life easier for the scientist. An example is Spotlight. Look at any scientist's computer: they've downloaded PDFs and papers, and the first thing you know, they can't find things on their own 60 or 120 gigabyte disk drive. Spotlight is a godsend. Spotlight, as you know, not only indexes all of the metadata associated with a file, metadata that includes when the file was created, who the creator was, and, if it's an email message, who it came from, but it also includes full-text indexing. So if it's a PDF file, Spotlight will do a full-text index the moment it sees that file, the moment it's created or dropped on the disk. And then when you go to find it in Spotlight, you're finding it via the full-text index.
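As a rough illustration of what full-text indexing buys you, here is a toy inverted index in Python. The file names and contents are invented, and Spotlight's real implementation is of course far more sophisticated; this only shows the core idea that documents are tokenized once, at indexing time, so that queries become cheap lookups.

```python
import re
from collections import defaultdict

# Toy inverted index: each document is tokenized when it is "dropped on
# the disk", and a query is then just a dictionary lookup.
index = defaultdict(set)

def add_document(name, text):
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        index[word].add(name)

def find(word):
    """Return the names of all documents containing `word`."""
    return sorted(index.get(word.lower(), set()))

# Invented example files and contents.
add_document("paper1.pdf", "Flow cytometry analysis of T cells")
add_document("notes.pdf", "Aim 1: analysis of imaging data")

print(find("analysis"))   # both documents mention "analysis"
print(find("cytometry"))  # only one does
```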
Widgets and Dashboard. Widgets and Dashboard play a very interesting role as a sort of heads-up display, especially for monitoring experiments that run over long periods of time or in real time. And they're very easy to create: all they are is HTML and JavaScript. They can hook up on their back end to instrumentation, so you can have your computer doing browsing or whatever you want while you check in on your experiments as they're running. You just hit F12 or hit the hot key and up comes Dashboard, which can give you current status on whatever you want to be monitoring in the lab. The last thing on Tiger to mention, which is super important, is 64-bit. As you know, prior to Tiger, the largest process that you could have was limited by a 32-bit address space, which is a 4-gigabyte limit. And there are many times, especially in simulation, when really the only way to get the job done is to have more than four gigabytes resident in RAM, in memory, in one process space. Tiger takes the limit off that and goes to 64-bit pointers, 64-bit addressing. So, for example, you can have an Xserve loaded up with 16 gigabytes of physical memory, and you can utilize all of that in a single simulation. And we're going to see some examples of this later on when we get some of the demos going.
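To make the 4-gigabyte figure concrete, here is a small Python sketch (not tied to any Apple API) showing how the per-process ceiling falls directly out of pointer width:

```python
# Address-space limits follow directly from pointer width: a 32-bit
# pointer can distinguish 2**32 byte addresses, a 64-bit pointer 2**64.
def address_space_bytes(pointer_bits):
    return 2 ** pointer_bits

GiB = 2 ** 30

print(address_space_bytes(32) // GiB)      # 4 (the pre-Tiger per-process limit, in GiB)
print(16 * GiB > address_space_bytes(32))  # True: a 16 GB working set cannot fit
print(16 * GiB < address_space_bytes(64))  # True: it fits easily in a 64-bit space
```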
This one I won't spend too much time on because this is one of the examples we're going to see, but June 2005 we saw Wolfram Mathematica launch with 64-bit. You're going to see that here today and in some great detail. And the last thing I want to mention is macresearch.org. So this is a great website if you're using Macs at all in science and want to get hooked in with the community and find out what other people are doing and find out hints on how to make things work or what applications are available. macresearch.org is a great resource for that.
Just a quote that I think is really reflective of the current situation with respect to Mac OS X and science. This is from Mario Roederer at Stanford University: "In research life sciences, probably 50 to 70 percent of research laboratories use Macs. It's by far the most common analysis platform." This is amazing. I mean, five years ago, if you had said this to someone, they would have thought you were crazy. This is absolutely the case now. Mac OS X is the platform that scientists love to use in computing. We're showing up first in life sciences, but I expect this trend to continue throughout chemistry, physics, et cetera.
Now to the fun part. I'd like to start out by introducing Dr. Osman Ratib. Dr. Ratib is Professor of Medicine and Vice Chair of Information Systems in the Department of Radiology at UCLA. He's a board-certified cardiologist and radiologist who obtained his medical degree at the University of Geneva, degrees in biophysics, and a PhD in medical imaging from the University of California, Los Angeles. He's responsible for coordinating the development of an enterprise-wide strategy and infrastructure for image management and communication. His clinical activities include cardiovascular MR and CT imaging procedures, combined PET/CT imaging, and advanced cardiovascular imaging. And I'm going to ask him to come up and show you some of the very exciting stuff he's doing. So, Dr. Ratib, welcome.
Thank you very much, Bud. Good afternoon. Yes, my name is Osman Ratib, and as you heard, I have a lot of responsibilities at UCLA, and we are very excited to be here. I would like to first thank Apple for giving us the opportunity to share with you a very exciting project that we've had for about a year, a year and a half now, developing what we think is a killer application in open source and medical imaging. Thanks to OS X, and Tiger now, we were able to bring that application to a very high level, and thanks to all the features that we were able to put in our software, it became really very popular. So let's start with a brief introduction of our software.
In a few words, it's a 3D viewer of medical images. We call it 3D, 4D, and 5D because medical imaging now comes in very large data sets that are acquired in three dimensions and can be acquired over time. That gives you a fourth dimension. And you've heard about things like molecular imaging and functional imaging, which add more dimensions to the data. So that is the purpose of the software: to provide the tools that are necessary to visualize, manipulate, and interpret those images. We intended to make it open source so we could benefit from a large community of academic people who would contribute to it, and it became very popular, as I will explain in a few minutes; people have now joined our group, developed other parts of the software, and are contributing to it. It is developed by physicians. I'm a cardiologist and a radiologist by background. Antoine Rosset is here with me. He's a board-certified radiologist from the University of Geneva, and he spent one year with me at UCLA as a research fellow. Again, his clinical background helped him develop a platform designed for clinicians. And I must say about five or six other physicians from different universities have now joined our group in open source and have contributed very clinically oriented applications or extensions of the software. It is already used by thousands of users. We did a survey in December last year, which brought back about 2,000 responses from people that are actively using it. We have tens of thousands of downloads, but I would say we now estimate about 6,000 active centers in the world that are using the software for either clinical or research applications. It is really intended to meet very high-end demands for performance, because it is for the new generation of medical imaging coming from scanners like CT scanners, MRI scanners, and ultrasound scanners.
These machines nowadays generate huge amounts of data that are very hard to handle if you don't have access to very high-end, very expensive workstations. And we wanted to fill the gap: to provide users that don't have access to these high-end machines with something they can use on their laptop or on their own computer to visualize those images. And it is intended really as a collaboration tool. That's very important in medicine. We know that one of the difficulties we have is to convey information from one physician to another, and we think that software will help us do that. The challenges are multiple in medicine, especially in medical imaging, because we are now faced with a huge amount of information that we have to deal with for every patient. Every patient gets a lot of images, a lot of studies, and as physicians, we have to go through a lot of effort in trying to interpret, review, and analyze those data. We also have a problem, as I mentioned, communicating with other physicians, because we're dealing with a lot of complex data we have to share, and that's sometimes not very easy in the usual environment that we currently have in our institutions. Performance is very important, and advanced processing tools are required to process those data.
You will see that in the demo. It's very critical to have the performance and the speed to create things like dynamic 3D visualization, rendering, and sculpting, and we'll show you that in a minute. But most importantly, I think we are under heavy pressure in healthcare today to have very high performance and very high throughput. So we need to think about automating those things and making it easy for the users. We don't want users to have to be IT specialists to be able to use the workstation. So that's where the Mac comes in. That's where the OS X user interface comes in. That's where all the consumer market tools that we integrate in our software make life very easy for physicians, so they can be very productive, focus on what they do best, which is clinical diagnosis and clinical decisions, and don't have to learn all the complexity of the software behind it. So based on that, we incorporated in our software as much as we could of the existing tools that are provided in Tiger and OS X in general.
For performance, of course, we use things like AltiVec and Quartz and other tools. OpenGL was very useful for very rapid image manipulation. Things like QuickTime VR allow us to generate 3D VR objects that can be exported and sent to referring physicians. .Mac was key here. We integrated things like email, and we integrated exporting to an iDisk so that physicians can use it as a storage medium for their data. I know physicians that are using it for being on call at home: they have their technologist upload the images to their iDisk, and they can download them and just review them using OsiriX. We incorporated iChat; we'll demo that in a minute. We incorporated things like being able to export to your iPod and carry your data with you. Forty gigabytes in your pocket is pretty good for carrying several studies that you can't put on a CD or a DVD. And of course, we integrated things like Xgrid, because performance is key to being able to expand and to process and analyze very, very large sets of data. So I think the best thing is to move on to the demo that Antoine is going to be giving here, and we'll show you some of the features of this software. Of course, these are some of the open source components we have. Starting the software gives you a database browser of the patient list and data list that we have on the disk right now. Every patient can have multiple studies, every study can have multiple images, and these are hundreds and thousands of images for each patient. In order to show these series or sets of images, we have thumbnails that dynamically browse very rapidly through those images, so you can rapidly see which set you want to look at, and you have a preview in the lower corner that shows you a larger view. So the icons in the lower corner allow the user to browse through a very large set of data very quickly. Then the next step is to open those images.
And you can open multiple sets together at the same time. So here Antoine is going to show you two sets of data from different modalities that were acquired in the same patient. The one in color is actually a PET. It's a functional metabolic image. And the one in black and white is a CAT scan, a CT scan. These two sets are acquired in the same orientation, so they can be fused together. That's what we do more and more now in medicine. We combine information from different studies. To make it simple, we just drag and drop one set over the other. It can be anything over anything, and the physician will decide what he wants to overlay. And now we have both sets put together and overlaid together. So now we can see that the hot spot on the left actually corresponds to a tumor.
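For readers curious what "drag and drop one set over the other" amounts to computationally, here is a minimal Python sketch of fusing two co-registered slices by alpha blending. The intensity values are invented, and real fusion also involves resampling and color mapping; this only shows the per-pixel blend.

```python
# Minimal sketch of image fusion: once two scans share the same geometry,
# overlaying them is per-pixel alpha blending. Values below are made-up
# intensities, not real PET or CT numbers.
def fuse(ct_slice, pet_slice, alpha=0.5):
    """Blend a functional (PET) slice over an anatomical (CT) slice."""
    assert len(ct_slice) == len(pet_slice), "slices must be co-registered"
    return [(1 - alpha) * c + alpha * p for c, p in zip(ct_slice, pet_slice)]

ct  = [100, 120, 110, 90]   # anatomy (grayscale)
pet = [0,   0,   255, 0]    # tracer uptake: a "hot spot" at index 2

fused = fuse(ct, pet, alpha=0.4)
print(fused)  # the hot spot stands out against the anatomy
```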
And that's where the activity of the tracer has been. But you can also see the anatomy behind it and see what parts of the bone are involved, and so forth. Most importantly, one thing we wanted is that everything we do in the software can be easily communicated to other physicians. You can export it by email, but you can also start an iChat session. If you don't have the tool, this just shows that the software can be customized: there are a lot of tools that don't necessarily have to be there, and you can just drag and drop them when you need them. So every user can have his own customized environment. Starting an iChat session here, Antoine is going to call me on a laptop. So pretend I'm thousands of miles away from Antoine, who's just here. And he's going to call me to show me an image. That's the beauty of integrating iChat: it's a very simple tool to use. He has me on his buddy list and he's calling me, and all I have to do is accept or not accept. If I'm busy, I just say no. And now I'm here accepting the session. So now I have on my screen what he has in small on his screen, and he can see me. And with the new iChat, we can have three or four physicians together conferencing over a case. This is great. I've been trying to do this for years. It used to take very specialized, dedicated, proprietary tools that were very expensive. Now, with iChat, we have this performance and this facility practically for free. You can use a computer, a high-speed Internet connection, and iChat, and here you are: you can do medical consultation remotely, what we call teleradiology or telemedicine.
And that, again, is using consumer market tools to meet the demands of the practice of medicine. Well, the software was basically designed for doing 3D and 4D, so I'm going to show you some of those features. And, again, we use a lot of open source components. That's the beauty of open source: there's a lot of open source software out there for 3D and 3D rendering and visualization. So we used open source components and integrated them into our software. Images come in slices, so basically that's what the raw data looks like: these are just cross-sectional slices through a part of the body from a CAT scan. Now, the 3D rendering tools, and we have multiple of them, we're going to show you one or two, allow a different view of that, which is much more useful, especially if you're trying to visualize topology and orientation. Well, this is a heart. Being a cardiologist, I chose to start with a heart. Here you can see the whole chest pretty much rendered in 3D, and these are rendered in real time. This is nothing pre-calculated. This is rendered on the fly. Antoine is basically moving it up and down, and you see now we have the chest and ribs in front of the heart.
So what he's going to do is just sculpt this out. He's going to do a little surgery. Don't worry, it's not going to be bloody. It's just like what you do in Photoshop: take a region of interest, cut through the 3D volume, and let the computer recalculate the rendering with that piece taken out. So this is how easy it is for physicians to go and process the data. We wanted it to be very, very simple. Now we've taken the ribs and the chest wall off, and we can see the heart, we can see the coronary arteries, we can zoom in and out, we can change the contrast. All these tools work in real time, and it's very high performance.
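Conceptually, the sculpting step is simple: every voxel whose projection falls inside the drawn region is cleared, and the volume is re-rendered. Here is a minimal Python sketch with a rectangular region and made-up density values standing in for a real polygonal ROI:

```python
# Sketch of volume "sculpting": every voxel whose (x, y) projection falls
# inside a user-drawn region is cleared to an "air" value, after which the
# volume would be re-rendered. The ROI here is a simple rectangle; real
# tools use arbitrary polygons. -1000 is a CT-style air value, chosen for
# illustration only.
AIR = -1000

def sculpt(volume, x0, x1, y0, y1):
    """Clear a rectangular column through every slice of a z-stack."""
    for sl in volume:
        for y in range(y0, y1):
            for x in range(x0, x1):
                sl[y][x] = AIR
    return volume

# A tiny 2-slice, 4x4 volume of made-up densities.
vol = [[[50] * 4 for _ in range(4)] for _ in range(2)]
sculpt(vol, 1, 3, 1, 3)           # cut out the central 2x2 column
print(vol[0][1][1], vol[0][0][0])  # -1000 50
```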
If you're not into the heart and you don't understand the heart, well, we'll show you something that you may understand. Again, here are the cross-sectional images. Well, I'm not sure anybody will really be able to pick out where exactly we are in the body. This looks like a piece of a head with eyes in it. Well, we go to the 3D rendered image. It takes a few seconds; these are hundreds and hundreds of images. Voila. This is the head of a patient. And you can adjust the contrast intensity to go in and out from the skin to the bones to the muscles. And this is real time. So this is all calculated and manipulated on the fly. You can cut this out.
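The "contrast intensity" adjustment described here is, in imaging terms, a window/level mapping: raw scanner intensities are linearly rescaled into display gray levels, and moving the window takes you from skin to bone. A sketch in Python, with illustrative window settings rather than clinical presets:

```python
# Window/level mapping: raw intensities are linearly rescaled into 0-255
# display gray levels. Values outside the window saturate to black/white.
def window_level(value, level, width):
    """Map a raw intensity to a 0-255 gray level for display."""
    lo, hi = level - width / 2, level + width / 2
    if value <= lo:
        return 0
    if value >= hi:
        return 255
    return round(255 * (value - lo) / (hi - lo))

# The same raw value renders very differently under two windows
# (settings below are illustrative, not clinical presets).
print(window_level(300, level=40, width=400))    # saturates white in a narrow window
print(window_level(300, level=500, width=2000))  # mid-gray in a wide window
```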
And last but not least, it's open source, free. So please join the group. Anybody out there that has great ideas about developing 3D or developing any image processing tools, please go on the website, join the development team. This is going to be a great platform for developing new applications for the future. And we also provided a way to do plug-ins for those who want to keep their intellectual property. They can also have binary plug-ins. If you have the greatest segmentation or rendering tool you want to put in there, you can still protect it or even sell it as a separate plugin. Thank you very much.
I really want to thank Dr. Ratib. I mean, you're witnessing an incredible revolution in medical imaging here. I think it's great that it's open source. I think it's great that it takes advantage of all the things in Tiger. I personally, and I'm sure Dr. Ratib remembers this, I'm originally a medical doctor, and when I did my training, just the medical images associated with one patient, which were at that time physical chest x-rays and so forth, just to carry them around: these things were heavy. These were big acetate things with silver halide, and it was the intern's job to carry them around. Now they come through iChat. The medical images associated with a single patient would literally fill this stage if they had to be physical. So I think this is an area, and again I'll come back to storage, where storage of images like these, and manipulation of them with high-powered computing, is going to be the story of the future. Next.
I'd like to introduce Rob Raguet-Scofield. He is the guy behind Mathematica from Wolfram Research on the Mac. He was hired out of college to port Mathematica to what was at that time going to be the newly introduced Mac OS X, after Mac OS 9, and he has been associated with that project ever since at Wolfram Research. So I'd like to ask Rob to come up here and show us the latest in Mathematica. Thanks.
Can we have the slides, please? Hi. Thanks, Bud, and thanks, Apple, for asking me here. My name is Rob Raguet-Scofield, and I'm the primary Mac OS X developer at Wolfram Research working on Mathematica. Today I'm going to talk a little bit about some of the Mac OS X features we utilize in Mathematica to add value for our customers.
So first of all, I'll just give a brief overview of what Mathematica is, in case you're not familiar with it. At the most basic level, it is a calculator. It has symbolic, numeric, and graphic visualization functions completely integrated into one package. Next, Mathematica is also a programming language. It allows for functional, object-oriented, and more traditional procedural styles of coding. And then the Mathematica user interface has very extensive support for text, graphics, and typeset formulas, all completely integrated, which makes it a very capable technical word processor.
So Mathematica began its life on the Macintosh in 1988. It had a great UI which talked to a computation engine that ran in the background. This computation engine also ran standalone on several Unix platforms for a number of years with no user interface. Because of their more advanced multitasking and memory management, Mathematica generally ran better on these Unix platforms than it did on the Macintosh. For a number of years this was the case, until Mac OS X came along. The merger of the superior Mac user interface with the more advanced Unix core OS really created the ideal platform for Mathematica.
So here are some of the things our customers are trying to accomplish and some of the challenges they face in doing so. First of all, Mathematica allows you to run computations interactively, so you can examine the results at each step along the way. This is very helpful: for example, you don't have to run a very lengthy computation only to find out at the end that there was an error somewhere along the way. The interactive environment helps a lot with that. Users want to utilize sophisticated algorithms without having to rewrite everything themselves, reinventing the wheel. So Mathematica includes what is the most extensive set of algorithms of any software package in existence today.
Users also want extreme performance, and so if they have parallelizable tasks, they would like for those tasks to be distributed across a grid of computers so they complete much more quickly. Our product gridMathematica allows certain types of operations to be distributed automatically. Users also want to explore their data. They want to dig into it, and they want to transform it into something useful. Mathematica has a really good, intuitive environment for that: the interactivity is great for that sort of thing, and the advanced pattern matching in the Mathematica language is very well suited to this type of task.
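The idea of farming out parallelizable tasks can be sketched in a few lines of Python. gridMathematica distributes work to remote kernels; this toy version just maps independent tasks across local worker threads, which is enough to show the shape of the approach:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy of grid-style distribution: a parameter sweep is split into
# independent tasks and mapped across workers. A real grid farms these
# tasks out to remote machines; here they run on local threads.
def simulate(param):
    # Stand-in for an expensive, independent computation.
    return param * param

def parallel_sweep(params, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves the order of the inputs.
        return list(pool.map(simulate, params))

print(parallel_sweep(range(6)))  # [0, 1, 4, 9, 16, 25]
```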
Users want to work with very large sets of data, and they want to work with lots of different sets of data at the same time. So the 64-bit optimizations that are included with Mathematica 5.2, which is going to be shipping in a few weeks, allow users to break the 4-gigabyte barrier and work with massive data sets.
And finally, users want to visualize their data so they're not just looking at the screen full of raw numbers and text. So Mathematica includes extensive 2D and 3D plotting functions. And then finally, users want to publish their results and make them look good. So Mathematica's top-notch mathematical typesetting and numerous export formats make it a breeze to publish your results to the web, to print, or to anything else.
So some of the technologies we take advantage of in Mac OS X are shown here, and I'm going to give demos of some of these things. We take advantage of Apple's highly optimized linear algebra libraries that are part of vecLib to make sure that numerical linear algebra is as fast as it can be. We have a great native Aqua user interface. And the rest of these things we're going to show, so I'll just get to that. The first thing I want to talk about is the 64-bit support that's in Mathematica 5.2.
There are a couple of areas where 64-bit support provides advantages over 32-bit computing. One example is arbitrary-precision mathematics. Here we're calculating the first one million digits of pi, and doing so on the 64-bit version of Mathematica is nearly twice as fast as on the 32-bit version. I'm going to show that right now. So can we cut to demo machine one, please?
Great, so this is Mathematica 5.1 running in 32-bit mode. It's calculating the first million digits of pi right now, and we're just going to see how long that takes. And it's a little over 9 seconds. And we're going to run the exact same computation on Mathematica 5.2 running in 64-bit mode, and the result is going to come back much more quickly, just a little over 5 seconds. So there's a tremendous performance increase for the 64-bit version for this particular type of operation.
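For a flavor of the kind of arbitrary-precision computation being timed here, this Python sketch computes digits of pi with Machin's formula on big integers. It is only an illustration of bignum arithmetic, not the (much faster) algorithm Mathematica actually uses:

```python
# Arbitrary-precision pi via Machin's formula:
#   pi = 16*arctan(1/5) - 4*arctan(1/239)
# computed entirely in Python's big integers, scaled by 10**digits.
def arctan_inv(x, digits):
    """arctan(1/x) scaled by 10**(digits+10), by the Taylor series."""
    one = 10 ** (digits + 10)      # 10 guard digits against rounding error
    term = one // x
    total, n, sign = term, 1, 1
    while term:
        term //= x * x
        n += 2
        sign = -sign
        total += sign * term // n
    return total

def pi_digits(digits):
    pi = 16 * arctan_inv(5, digits) - 4 * arctan_inv(239, digits)
    return pi // 10 ** 10          # drop the 10 guard digits

print(pi_digits(20))  # 314159265358979323846
```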
The other advantage of 64-bit support is being able to work with massive data sets or other memory-intensive computations. And this is something that I can't do in real time because it takes a couple hours to complete. This is a simulation of a tsunami that is going to start in the middle of the screen here. The unique property of tsunamis is that the energy they carry moves along all the way down to the bottom of the seafloor. That means that variations in the seafloor, like the mountains in this example, can cause disturbances all the way back up at the surface. And that is what this simulation is modeling. When we ran this on a 32-bit computer, we had to back the resolution off a little bit so that the simulation could complete in the 4-gigabyte address space of the application. And as you can see, there are some artifacts that show up here. This isn't because the calculation is wrong; it's just because it's run at a lower resolution, which causes slight variations to be somewhat exaggerated at times. So it obviously doesn't look correct. We can do better on a 64-bit system: we increased the resolution and reran the same simulation using about 6 gigabytes of memory. And as you can see, all the artifacts disappear and we get pretty much what we would expect.
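A back-of-the-envelope calculation shows why the higher-resolution run needs more than a 32-bit address space: memory for a dense simulation grid grows with the square of the resolution. The grid sizes and the three-variables-per-cell assumption below are illustrative, not the simulation's actual dimensions:

```python
# Memory for a dense n x n grid grows quadratically with resolution.
# Assumptions (for illustration only): double-precision values, three
# state variables per cell (e.g. height and two velocity components).
BYTES_PER_CELL = 8

def grid_gib(n, fields=3):
    """GiB needed for an n x n grid carrying `fields` state variables."""
    return n * n * fields * BYTES_PER_CELL / 2**30

for n in (4_000, 10_000, 20_000):
    print(f"{n:>6} x {n}: {grid_gib(n):6.1f} GiB")
# Doubling the resolution quadruples the memory; somewhere past
# 10,000 x 10,000 this sketch bursts a 4 GiB address space.
```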
So that is another advantage of 64-bit systems over 32-bit systems. As you saw in the keynote the other day, one of the features we're taking advantage of in a future version of Mathematica is OpenGL for interactive 3D graphics. Now, this is pretty neat, but lots of applications can do things like that. Because Mathematica is a completely general system, though, we handle very complicated things as well. And lots of times more specialized packages fail when they get more complicated inputs such as this one here. So that is OpenGL.
Next is Java. We have a Java connection technology built into Mathematica called J/Link. And on top of that, we have a package called GUIKit, which enables you to build Java interfaces that utilize Mathematica functionality from within them. What this is is an interface that allows you to plot different starting points and adjust parameters, and then it will numerically solve differential equations based on the parameters that you set. And this is all written in Mathematica, utilizing the Java environment.
And just a couple of other quick things. Mathematica includes a library called MathLink and an API that allows you to call into Mathematica from external applications, or call external applications from Mathematica. So what we have here is an external application that is calling into Mathematica. It's a simple Automator plug-in that will do one of two things: it will either just evaluate some text as part of an Automator action, and the result of the action will be the evaluated version of that text, or you can use it to process the input text by calling Mathematica functions on it. And so I'll run another example here.
And what this code here is going to do is capitalize the first letter of each of the words. So that is calling into Mathematica. One more example of that, just because it's interesting: there's also a service that you can download from our website and use from any Services-aware application, which allows you to evaluate the text right in place. There's even a keyboard shortcut for it. So you can immediately evaluate Mathematica commands on text from any application, or there's a version that will create a graphic as a result.
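As a toy stand-in for the demoed action (plain Python, not the MathLink API), the text transformation itself is simple: take the input text, capitalize the first letter of each word, and hand back the result, which is exactly the shape of a text-processing Automator action.

```python
# Toy stand-in for the demo's Automator action: capitalize the first
# letter of each word in the input text, leaving the rest untouched.

def capitalize_words(text):
    return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))

print(capitalize_words("mac os x and scientific computing"))
# Mac Os X And Scientific Computing
```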
And the one last thing I'll mention here is that we have a Spotlight plug-in that knows how to read our documents. So you can quickly find text inside of your Mathematica notebook files. Spotlight will catch up with me in a moment. So here are some of the Mathematica files that include the Plot3D command. But also, because we include different types of metadata in our files, you can actually do advanced searches on different types of metadata. So if you were working with a certain file and you don't remember where it is or exactly what it was doing, but you just remember it had a lot of graphics in it, you can actually search for all the notebook files that have, for example, more than 50 graphics in them. So you can quickly find what you're looking for. And with that, I will give it back to Bud.
Thanks, Rob. Well, I think that's a really great example, as are these other examples, of what you can do when you take advantage of the unique capabilities of Mac OS X, and especially Mac OS X Tiger. And I just want to stress one of them, which is the tsunami demo. That is the perfect example of how to take advantage of the 64-bit address space because, as you probably know, the front end of the application, the Cocoa or the Carbon front end, is still running in 32 bits. But the computational back end, where you're using the full six gigabytes so that you don't have those artifacts, runs on top of all of our new 64-bit APIs that take 64-bit pointers in Mac OS X Tiger. So that's the canonical example of the best way to take advantage of that full address space. And you see the advantage in that those artifacts are gone, because you can finally do that tsunami simulation at a high enough resolution.
Next, it's my pleasure to introduce Dr. Dean Dauger of Dauger Research. Dr. Dauger has a Ph.D. in physics from UCLA, where he created the very first Mac cluster in 1998. He's also the award-winning author of Atom in a Box and Fresnel Diffraction Explorer. And this I didn't know: he co-authored the original award-winning Kai's Power Tools, so that's pretty cool. Today his pursuits include computational physics, particularly the dynamics of complex quantum systems, and high-performance computing and visualization. So welcome, and I'll hand it over to you. Thank you very much.
Well, thank you very much, and thank you all for coming. I'm very glad to see you all here today, and thanks to the kind people at Apple for the chance to speak to you today about Pooch and how you can use it to build plug-and-play clusters on Mac OS X. Thank you.
So first, to jump into this by describing what Pooch is: basically, Pooch is a piece of software that manages clusters and constructs them out of Macintoshes that are connected together over a network, over any kind of TCP/IP network. One of the things that it supports is essentially supercomputer-compatible calculations, so that you can take the codes that you ran, or intended to run, on a supercomputer, say at the major supercomputing centers across the country, and instead run them on the Macintosh, or optimize, debug, and develop them there as well. One of the key features that enables that kind of capability is that it supports MPI, the Message Passing Interface. Pooch was the first software to support MPI on Mac OS X, and it supports five different MPI implementations now. But it also supports supercomputing behaviors such as queuing, launching these kinds of parallel jobs, keeping track of them, keeping track of their CPU time, and making them accessible to users, so that the clusters are plug and play and you can go ahead and use your software, use your application, and accomplish your work as you need to. The kinds of users that we have are from all disciplines, such as physics and chemistry, as well as mathematics and biology, and across all the different kinds of customers, in academia, government, and industry as well.
So to give you an idea of what we did to essentially reinvent the cluster computer: this is how we support five different MPIs, open source as well as commercial, but we also dynamically manage the cluster, and we use Bonjour as well as Service Location Protocol to discover the nodes out on the cluster, discover the computational resources, and make use of them on the fly. It also supports a number of diagnostics for development and optimization, so you can make your code run really well on the Mac cluster, or run that much better on some other hardware. So you can mix and match, and actually have the Mac cluster complement some of the big iron if you don't have a large system yourself. The idea of this is to bring high-performance computing to the mainstream user, and to do it in a way that's completely independent of any kind of shared storage, remote command-line login, or mucking around with the lower parts of the operating system, so you can focus on getting your work done. And that's really the point: it's the lowest-barrier answer that we're familiar with for accomplishing this kind of work, and so it results in savings of time and money for users like myself, as well as for everyone else I know who uses Pooch. Which is the whole point of cluster computing in the first place: to provide computational resources for users. Some of the challenges of high-performance computing clusters: they're really meant for problems that are simply too large to solve on one machine. It either simply takes too long, or you simply can't fit it within the RAM of one machine.
But other, previous cluster types are very fragile and very hard to use, because they rely on configuring things at a low level in the operating system, and so they require a lot of technical expertise to understand what to do and what not to do when setting up these kinds of low-level features. And because of the way these things are put together, with previous cluster types the users become responsible for solving any incompatibilities or any other bugs in the system. So one of the ways we try to solve that is to take advantage of a lot of the things that are in desktop computing, to bridge the cluster to you as the user. For example, we support MPI, so we support the same calculations that would run on a supercomputer. We use Bonjour to dynamically discover the health and status of the nodes out there on the cluster, so we can dynamically respond to any problems that might occur. We provide AppleScript and Automator support to make it easier to launch jobs on the cluster, as well as different kinds of windows into the cluster, such as Spotlight and Dashboard, and other ways to access the cluster and combine these things together. So if you wanted to do it yourself, you know of various books on building clusters; ours would basically start with this, the Mac cluster recipe. You take a bunch of Macs or Xserves, G4s or G5s, get an Ethernet switch and some cables, and the directions are simply to connect the hardware together, download Pooch, and go ahead and install it. And it literally takes seconds per machine to install the software. So what I'd like to provide for you today is a demonstration, a quick demonstration, we hope. So I'll switch to demo number three. Let's see. Let me just get that going.
Okay, so the first thing I'd like to show you, to give you an idea of a numerically intensive application that takes a little while and stresses the processor a bit: this is a z-to-the-fourth iteration fractal, a little different than the Mandelbrot fractal, that produces a nice graphical result. And it achieves about five gigaflops on my one PowerBook here, a 1.5-gigahertz G4 in this case. And so that's all well and good. And if I wanted to make it go faster, well, if I had a second processor, it could make use of multitasking. But what if I want to get beyond a factor of two? That's where you need to get outside the box, and that's where parallel computing comes in. That's where Pooch comes in. So to install Pooch, this is how long it takes.
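The iteration Dean describes is the Mandelbrot recipe with a fourth power: instead of z → z² + c, iterate z → z⁴ + c and count how long each point takes to escape. A minimal sketch (parameter choices here, like the bound and iteration cap, are illustrative):

```python
# Minimal sketch of the z -> z**4 + c escape-time iteration from the demo.
# The escape radius (2) and iteration cap are illustrative choices.

def escape_time(c, max_iter=100):
    """Iterations until |z| exceeds 2 under z -> z**4 + c (0 if it never escapes)."""
    z = 0j
    for i in range(1, max_iter + 1):
        z = z ** 4 + c
        if abs(z) > 2.0:
            return i
    return 0

# One scan line of the image: escape counts across a row of the complex plane.
row = [escape_time(complex(-1.5 + x * 0.1, 0.5)) for x in range(31)]
print(row)
```

Every pixel is independent of every other pixel, which is what makes this an ideal first demonstration of parallel speedup: the rows can be handed out to as many processors as you can find.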
And there it is. So that's how long it takes to install Pooch, and you do that on each node that's available. And so, actually, let's go into it this way. I can use Spotlight. I know that I queued up some jobs a little while ago, and we have a Spotlight plug-in that exposes the queuing system inside Pooch to Spotlight, so you can go ahead and search for other jobs that are out there. So I can pull up the nodes of the job that's there, but let's edit this to run this particular application. I can remove that, go ahead and drag in the fractal program, and let's remove these and select some nodes. I can open up and select some nodes, and I can see, oh, I've discovered a couple of other machines here. There are a couple more G5s right behind the scenes here, but I can see that they're actually reporting as busy. Oh, actually, let's have a look at what kinds of processors they are. I can see that they're dual-processor G5s at 2.7 gigahertz, and it actually uses this information to rate each one of these units.
But of course the machine is busy, so I can actually see, gee, what is it running? I can see, okay, there's a running job there. So I want to go ahead and kill that job. And as soon as I do a refresh, I can see what's there and make use of the nodes.
So let me go ahead and, oh yeah, it just takes a little while to submit the data that's there. Okay, so I can see that it rates each one of these nodes, and I see that it gives the G5s a pretty high rating, so let me go ahead and select some of those and bring those in. Some of the other features: you can delay the launch until some later time of day, so you can use a colleague's machine after she's gone home from work. It supports five different MPIs, as well as grid-type applications, compute jobs, and so forth. So to launch the job, I just click on Launch Job. So this copies the executable out to the other machines. You might have heard that.
That was Pooch barking to tell you that it launched correctly. And when it passes control to the executable, you can see that it went quite a bit faster, something like 26 gigaflops or so. So that's all well and good, but what if that wasn't enough? I can go ahead and pull in some more nodes by essentially extending Bonjour. What I can do is select, let me see.
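Pooch distributes the work across machines with MPI; as a single-machine stand-in, the same decomposition idea can be sketched with Python's multiprocessing (not Pooch's actual mechanism): the master hands each worker a slice of the rows, and reassembles the finished image.

```python
# Single-machine sketch of the decomposition idea behind the demo:
# rows of the z -> z**4 + c image are farmed out to worker processes,
# and the master reassembles the results. (Pooch itself uses MPI
# across machines; this stand-in uses multiprocessing on one machine.)

from multiprocessing import Pool

def escape_time(c, max_iter=60):
    z = 0j
    for i in range(1, max_iter + 1):
        z = z ** 4 + c
        if abs(z) > 2.0:
            return i
    return 0

def compute_row(y):
    """One scan line of a small fractal image."""
    return [escape_time(complex(-1.5 + x * 0.05, -1.0 + y * 0.05))
            for x in range(60)]

if __name__ == "__main__":
    with Pool(4) as pool:                 # 4 workers, like a 4-processor run
        image = pool.map(compute_row, range(40))
    print(len(image), len(image[0]))      # 40 60
```

Because the rows are independent, the speedup scales with the number of workers, which is why the demo jumps from about five gigaflops on one PowerBook to about 26 on the small cluster.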
That's right, select some machines over at UCLA. So this is something like 300 or 400 miles away. I can go ahead and pull in some of the machines that are there and make use of those. So let's see what we can do. I can see that we have some G5s over there. And to get through the firewall, I need to use a particular port number. Yeah, there we go.
And let's see. So it's now actually copying that executable. Whoops. It should be copying that executable over to UCLA. And one of the other things I have here is sort of an iTunes-like interface, where I can select the local network from essentially our node playlist on the left side. And if I make use of that, there we go. It's now combining the nodes that are here with the ones at UCLA. So this is a 15-processor cluster that's distributed over about 400 miles or so, and I get something like 88 gigaflops. So this is a substantial improvement. My distance record right now is from Munich, Germany, about 6,000 miles, back to UCLA, combining these kinds of nodes together. So just to show you that this works not just for fractals, what I'll do is pull in a physics code. Let's see. Actually, let me give you an idea of what this looks like when we run it on just a single processor.
There you go. So this is actually a million particles, a million charged particles, all interacting electrostatically, showing the electric potential as a function of time. It goes from one frame to the next frame to the next, roughly two seconds or so per frame. And we can see that it's running a little bit slower than we'd like. So let me go ahead and quit out of this.
There we go, good. So I drag and drop that into Pooch and go ahead and select my little cluster here. There we go. Okay. So now the computations are being done on the four processors that are here: one, two, three, four. So it's actually going roughly four times as fast. And you can see the live message-passing pattern on the lower right of the screen here. It's showing the live message-passing pattern as a function of time, as well as a histogram of the messages being sent and received as it goes. So while that's running, the last thing I can show you is the features in Automator.
Let's see. So that's definitely a computationally intensive job taking up what's here. I can use Automator to have the Finder select some items and then combine that with a number of different Pooch actions. So I can have it, say, launch an executable, choose an executable, and launch it on a four-node cluster, or run it as a single task, or get the nodes of a Pooch cluster. But in this case, I'll choose to distribute single tasks onto a Pooch cluster. And if I run this, I can go ahead and select a particular executable, something simple. And Automator is actually going ahead and submitting an executable as single-node tasks into the Pooch queuing system, which Pooch will later launch on the local nodes here after the other job runs.
So that completed the queuing into the system. Oh, and one other thing that's also in here: we also feature a Dashboard widget. So we're able to have the Dashboard be a window into the cluster. We can see that it shows the history of the job activity as a function of time over the last couple of days. It also shows the cluster capacity; it estimates that I'm using pretty much all of the cluster capacity of the local nodes that it finds here, and it also lists how many nodes there are and some of the current status of the cluster. So with that, that's the conclusion of the demo. I want to thank you very much for coming here today, and thanks for your attention. Thank you.
Thanks, Dean. Well, I think one of the messages there is that before Pooch, and before Pooch on Mac OS X, doing something like that in a lab would probably require a programming staff, a lot of time, and a lot of effort before you even began to get to the science part of it. And the beauty of something like Pooch on Tiger is that that is all done for you. You saw the Automator demo there: basically, grid computing with drag and drop. It's an incredible boon to scientists everywhere, I think.
So next, I'd like to present to you a challenge. What I believe is that the Mac platform really is a platform that allows you to push the limits, and I want to talk about what those limits are and where you should be pushing over the next 10 years on this platform. The first is really big computing, and we saw examples here. It's computing that is tackling very large problems, problems that could not even be thought about 10 years ago, and it's taking advantage of 64-bit computing, not just in terms of the length of integers, doubles, and floating-point values, but taking advantage of the fact that with Tiger, you can break through the 4-gigabyte barrier and have 64-bit pointers. And with the tsunami demo, you saw a perfect example of the kind of breakthroughs you can get with that.
Really big computing also means cluster computing. It also means grid computing. Both Mathematica and Pooch are great examples of that. We just saw here, put together in a few seconds, a grid that extended beyond the walls of this room and could easily extend around the globe with that same simple drag-and-drop metaphor. So there are problems that can be tackled with really big computing that are ideal for the Mac platform: cluster computing, grid computing, high-end numerical computing. We put a lot of work in our engineering group into making sure that the libraries, whether it's the Accelerate libraries or the AltiVec libraries, are available with simple APIs so that you can write apps that take advantage of them. Our numerical team makes sure they get the right answer. And we're watching every single instruction cycle; we're making sure that the cache line loads are completely optimized. So please take advantage of that. There are lots of sessions here that will go into a lot of detail on that. So for any numerically intensive scientific application, the Mac has a lot to offer.
Second big area, and this really applies to any area of scientific computing, but we saw a perfect example of it here with Dr. Ratib and medical imaging. In the early part of this century, we heard a lot about the human genome, the genome project, and the online databases that have the genomes of humans and other organisms. Well, it turns out that the databases in medical imaging and functional medical imaging completely dwarf what the genetics guys are dealing with. I mean, those guys just have a four-character alphabet to deal with. What we're talking about with medical imaging are huge databases. And as you get into simulation, keeping the results of simulations around, comparing them, sharing them around the scientific community, and visualizing them, these are going to be very important problems. Now, Apple is known for our ability to visualize. We have nice 30-inch displays that I'm sure all of you have back at home. This is an area where Apple spends a lot of time making sure that things are rendered properly on the screen. We support OpenGL to the hilt and will continue to do so. We have Quartz Extreme for dealing with image processing in real time. So this is an area where the Mac is uniquely situated to provide a great contribution to science. Humans are just not good at looking at terabytes of numerical values. You have to turn the data into a visualization if you're going to have it interpreted, have it actually contribute to scientific breakthroughs.
The last item: really big storage. I believe this is going to be the story of the rest of this decade, certainly: the increasing rate of data being generated that needs to be stored and analyzed. This is an area where you hear a lot about businesses needing big databases and big storage, but actually science has an even greater need for dealing with large quantities of data. And luckily, the technology here is in our favor. If you look at the cost of storage, per bit or per petabyte, however you want to measure it, we've actually been exceeding Moore's Law. Moore's Law, in terms of processing, says that every 18 months you're going to double the amount of processing you get for the same number of dollars. In terms of storage, we've been on about a 12-month doubling cycle.
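The arithmetic behind that comparison is worth making explicit: compounded over a decade, an 18-month doubling cycle and a 12-month doubling cycle diverge enormously.

```python
# Compounding the two doubling cycles Bud mentions over a decade.

def growth_factor(years, doubling_months):
    """How many times capability per dollar multiplies over `years`."""
    return 2 ** (years * 12 / doubling_months)

print(growth_factor(10, 18))   # processing: roughly 100x in 10 years
print(growth_factor(10, 12))   # storage: 1024x in 10 years
```

So over ten years the 12-month cycle delivers about ten times more improvement than the 18-month cycle, which is why storage capacity per dollar is outrunning processing.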
And with some of the new phase-change modalities coming along in storage (I'm sure you've heard about things like Blu-ray for the next generation of DVDs), and vertical magnetic recording coming along in rotating storage, storage is going to stay, I believe, on this 12-month doubling for the rest of this decade. And you and the scientists are going to absolutely eat this up. We're committed to making sure that on the Mac platform you have all the tools you need to deal with this storage. One of the examples in Tiger that really just scratches the surface here is Spotlight. Here you've got a system with 100 gigabytes of your stuff on it. How do you find your stuff? Well, Spotlight will immediately index a file the moment it hits the disk. The kernel intercepts that write and says, okay, let's grab the metadata, let's do a full-text index.
Let's make sure that when a person types into Spotlight, they see exactly what they're looking for. The message to you as developers is: make sure that your applications create the correct plug-ins for Spotlight, so that the particular metadata for your application gets put into the Spotlight world. You saw a great example of that with Mathematica, where you could say, I want to see all of the files that have at least 50 plots in them. I'm sure every single application written in the scientific community probably has specific metadata. And we designed the metadata schema to be extensible, so that you can add your own data types and have them plug in automatically to Spotlight, so that your customers and the people who use your applications can find their results, can find their data, on these huge disks, and huger disks as time goes on.
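The idea can be sketched in miniature. This is an illustrative toy in Python, with a made-up file format and metadata keys; it is not the real Spotlight importer API. An importer extracts app-specific metadata, like how many graphics a notebook contains, so queries like "notebooks with more than 50 graphics" become possible.

```python
# Toy sketch of the Spotlight-importer idea: a plug-in extracts
# app-specific metadata into an index, and queries run against it.
# The "notebook" format and metadata keys here are invented for
# illustration; this is not the real importer API.

def extract_metadata(notebook_text):
    """Pull simple metadata out of a toy notebook format."""
    return {
        "full_text": notebook_text,
        "graphics_count": notebook_text.count("[Graphics]"),
        "cell_count": notebook_text.count("[Cell]"),
    }

def query(index, key, minimum):
    """Return paths whose indexed metadata value meets a threshold."""
    return [path for path, meta in index.items() if meta[key] >= minimum]

index = {
    "plots.nb": extract_metadata("[Cell][Graphics]" * 60),
    "notes.nb": extract_metadata("[Cell] mostly prose [Graphics]"),
}
print(query(index, "graphics_count", 50))   # ['plots.nb']
```

The design point is the separation of concerns: the application knows how to read its own files, the system owns the index and the query interface, and users get one search box over everything.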
We have a lot of things going on at the developer conference, and I want to point out a few of them. First of all, we've got the Science Connection room, room 3014, with a lot going on there; I believe, if I'm not mistaken, there's going to be some discussion there after the HPC presentation. Science discussions going on around HPC and the sciences Wednesday and Thursday. Scientific and medical imaging on Mac OS X: we saw a little bit of that here today, but there's going to be a lot more in depth. On Friday, a science feedback roundtable; it's important to hear from all of you and get feedback. Apple Design Awards: there's going to be an award for the best scientific computing solution. That should be interesting. A BioCocoa group meeting Wednesday at 6:30 PM in the Science Connection room. Many sessions and many labs relevant to creating great science apps.
And you've all got your booklets; there are a number of sessions that I encourage you to attend specifically for science. Those of you who have been coming to this developer conference for a while have probably noticed the sort of exponential growth curve we've been on. That continues, and it really is because the Mac, in my opinion, is the best machine ever created for scientific productivity, everything from doing the analysis and getting the results to publishing your data and getting your grants. So we're really pleased to see this market bloom, and we encourage all of you to go out and create great applications so it continues to bloom. A couple of contacts, people you can get in touch with: the people who presented today will probably be milling around here for a little bit. I think there's a presentation right after this, but we have a few minutes when you can come up and exchange cards with people. So with that, I want to thank you very much, and see you again next year.