Enterprise IT • 1:08:36
Whether it's the PowerBook G4 for UNIX to go, the Power Mac G5 workstation for serious computational horsepower on the desktop, or the immense power and scalability of Xserve G5 and Xserve RAID, Apple delivers an ideal platform for scientific computing with an exceptional price/performance ratio. Learn how Apple products are driving momentum in scientific markets and hear how scientific developers are using Mac OS X Tiger technologies to deliver innovative research tools.
Speakers: Bud Tribble, Osman Ratib
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper and may contain transcription errors.
Hi, everyone. Welcome to Mac OS X and Scientific Computing. I'm Bud Tribble, Vice President of Software Technology for Apple. And I've got a lot of great stuff to go over today. I'm going to be talking about Mac OS X and scientific computing in general, but we've also got sort of the meat of this presentation, which is some great demos from some third parties that I'm sure you'll enjoy. To get started, it's been a year since I was last up here talking about Mac OS X and scientific computing and pointing out some of the momentum that we were starting to see in the science market with the Mac. And I'm pleased to report that that momentum has not only continued, but it's on an accelerating course. I'd just like to take a few moments to highlight a few things here. First thing: in August of 2004, we introduced the iMac G5, 17-inch and 20-inch. Now, why is this interesting for science? Well, any of you who have done any bench science know that you like to have your computer on the bench, but bench space is sort of at a premium. And so this has been pointed out to me many times as I've wandered around and talked to various labs.
This is a great machine for bench science because it sort of sits up there on its aluminum stand. You can shove the keyboard under it. Someone pointed out to me that it was sort of impervious to spills on the bench. And I mentioned, well, that sort of depends on exactly what is spilling on the bench. But this is a great machine for the scientist.
Xserve RAID, I'll come back to this subject. More and more, science is about big storage. Apple originally got into the storage business partly because of the high-end media market. Those guys eat terabytes for breakfast. But this is more and more the story of science: the ability to collect data and store it, data that is generated during simulations, data that has to be analyzed. Almost any lab you walk into these days, there's a need for big storage. And if you look around the market and ask, well, what's the most cost-effective way to do that? It turns out Xserve RAID is there. It's at between $2 and $3 per gigabyte of fully RAIDed storage with Fibre Channel connectivity. In a nice 3U rack-mount system, you can get 5.6 terabytes. So it's the kind of thing that no lab should be without. If you're not generating terabytes of data, you're probably not doing real science these days.
Just a couple other things to point out. Society for Neuroscience in 2004: we had a nice presence there, and it was very well attended. Sort of showing our colors in science generated a lot of interest. Some key products were launched: MATLAB DCE, VIBE, GridMP in November 2004. Continuing the momentum, it was great to see National Instruments start to bring their products over to the platform in January. National Instruments, as anyone involved in data collection knows, is one of the key vendors there, and they're bringing over their M Series, which are their card-based PCI-X systems, and their USB devices for data collection. So this is a big, big plus to have all of this available on the Mac. In January of 2005, we introduced Xsan. And again, this gets back to the storage theme, the big storage theme. Xsan lets you basically manage your storage in a way that you don't have to worry about reformatting volumes or erasing volumes and kind of reorganizing your back-end storage. If you need more storage, you just throw another volume on the back end. It's fronted by Fibre Channel to the front-end servers and the metadata server. An excellent solution if you're up into the terabytes or even above. Now, the great thing about Xsan is that Xsan 1.1, which is a free upgrade, gets you a 64-bit file system when you run it on Tiger, which means that you can manage up to two petabytes of data. And there are scientific projects around that can easily get up into that range. So, again, Apple has great solutions in the storage realm.
And if you look and compare price per gigabyte or price per terabyte, I think you'll find that Apple is going to be the most cost-effective solution out there. And Xsan makes it so easy to manage that it's the kind of thing that no lab should be without. Mac mini, I'll just mention, was introduced on January 11, 2005. The Mac mini is interesting because it can be so many things in the mind of the beholder. Our idea in introducing it was that this would be a great way for someone who maybe has a Windows or Linux box. They've already got a monitor. They've already got a keyboard. They've already got a mouse. Why not, for $499, grab a Mac mini, hook it up, try out Mac OS X, find out that it's really Unix-based, find out that it actually runs the applications that you need in your lab? But what people have done with the Mac mini goes way beyond that. People have used the Mac mini as just a controller in the lab for experiments. People have used the Mac mini as just a data collection device. It's sort of a utility appliance that you can utilize many places throughout the lab. So you're only sort of limited by your creativity there.
Portability. So Apple obviously has always had a great lineup of portable computers. I can't tell you the number of scientists that have come up to me and said, you know, it's just so great that I can have Xserves in my lab, or clusters or grids in my lab, running my simulations or running my analysis.
But, you know, if I'm on a plane to a conference, I just take my PowerBook with me and it's exactly the same environment. That is such a productivity boost for anyone involved in research. And with the PowerBook G4 as updated in 2005, you get 1.67 gigahertz. It gets slightly warm, but that's okay. And then a key product launch happened in February 2005: BioTrue. BioTrue is a collaborative system for sharing data, very useful in the biosciences.
Continuing on, BioIT World: we got the cover of BioIT World. And as you probably know, bio-IT or biotech is one of the first places where Apple really got a strong foothold, starting a couple years ago. We got our foot in the door, and the door is now wide open. I don't know if you'd call us dominant, but certainly it's hard to go into a bioinformatics lab without seeing Macs around, without seeing Macs used for analysis, for genetic analysis, biochemical modeling, et cetera.
Continuing in that vein, what we're starting to see this year: in March 2005, we were at the American Chemical Society. What this represents is Apple moving into other parts of the pipeline for drug development. We sort of started in the genetics part of that pipeline, genetic modeling and some of the biochemical modeling. The American Chemical Society represents a different part of that pipeline, and we're coming on extremely strong in that area, and I expect that trend to continue. Again, more products launched in April of 2005: molecular imaging from Kodak and SAS for drug development, just to mention a few of them.
So what we're seeing here is building momentum. Last time I was here, I spent a lot of time talking about all the open source applications that were available for Mac OS X. That continues to be the case, but the big news over this last year is that we have penetrated this market enough to the point where the commercial applications are moving over in large numbers. So that's just really exciting for us.
In April of 2005, of course, the Power Mac G5, 2.7 gigahertz. For any real number crunching, this is a must. With the Velocity Engine and the acceleration libraries, for any sort of simulation or analysis that involves floating point or double-precision floating point, this is probably the top system that you can get today.
Just spending a little bit of time on Mac OS X Tiger and the things that really matter for science. Some of the general things we put in there just kind of make life easier for the scientist. An example is Spotlight. Look at any scientist's computer: they've downloaded PDFs and papers, and the first thing you know, they can't find things on their own 60 or 120 gigabyte disk drive. Spotlight is a godsend. Spotlight, as you know, not only indexes all of the metadata associated with a file, metadata that includes when the file was created, who the creator was, and, if it's an email message, who it came from, but it also includes full-text indexing. So if it's a PDF file, Spotlight will do a full-text index the moment it sees that file, the moment it's created or dropped on the disk. And then when you go to find it in Spotlight, you're finding it via the full-text index.
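As a rough illustration of what full-text indexing buys you, here is a toy inverted index in Python. The file names and contents are invented, and Spotlight's real implementation is of course far more sophisticated; this only shows the core idea that documents are tokenized once, at indexing time, so that queries become cheap lookups.

```python
import re
from collections import defaultdict

# Toy inverted index: each document is tokenized when it is "dropped on
# the disk", and a query is then just a dictionary lookup.
index = defaultdict(set)

def add_document(name, text):
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        index[word].add(name)

def find(word):
    """Return the names of all documents containing `word`."""
    return sorted(index.get(word.lower(), set()))

# Invented example files and contents.
add_document("paper1.pdf", "Flow cytometry analysis of T cells")
add_document("notes.pdf", "Aim 1: analysis of imaging data")

print(find("analysis"))   # both documents mention "analysis"
print(find("cytometry"))  # only one does
```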
Widgets and Dashboard. Widgets and Dashboard play a very interesting role as a sort of heads-up display, especially for monitoring experiments that run over long periods of time or in real time. And they're very easy to create: all they are is HTML and JavaScript. They can hook up on their back end to instrumentation, so you can have your computer doing browsing or whatever you want while you check in on your experiments as they're running. You just hit F12 or hit the hot key and up comes Dashboard, which can give you current status on whatever you want to be monitoring in the lab. The last thing on Tiger to mention, which is super important, is 64-bit. As you know, prior to Tiger, the largest process that you could have was limited by a 32-bit address space, which is a 4-gigabyte limit. And there are many times, especially in simulation, when really the only way to get the job done is to have more than four gigabytes resident in RAM, in memory, in one process space. Tiger takes the limit off that and goes to 64-bit pointers, 64-bit addressing. So, for example, you can have an Xserve loaded up with 16 gigabytes of physical memory, and you can utilize all of that in a single simulation. And we're going to see some examples of this later on when we get some of the demos going.
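To make the 4-gigabyte figure concrete, here is a small Python sketch (not tied to any Apple API) showing how the per-process ceiling falls directly out of pointer width:

```python
# Address-space limits follow directly from pointer width: a 32-bit
# pointer can distinguish 2**32 byte addresses, a 64-bit pointer 2**64.
def address_space_bytes(pointer_bits):
    return 2 ** pointer_bits

GiB = 2 ** 30

print(address_space_bytes(32) // GiB)      # 4 (the pre-Tiger per-process limit, in GiB)
print(16 * GiB > address_space_bytes(32))  # True: a 16 GB working set cannot fit
print(16 * GiB < address_space_bytes(64))  # True: it fits easily in a 64-bit space
```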
This one I won't spend too much time on because this is one of the examples we're going to see, but June 2005 we saw Wolfram Mathematica launch with 64-bit. You're going to see that here today and in some great detail. And the last thing I want to mention is macresearch.org. So this is a great website if you're using Macs at all in science and want to get hooked in with the community and find out what other people are doing and find out hints on how to make things work or what applications are available. macresearch.org is a great resource for that.
Just a quote that I think is really reflective of the current situation with respect to Mac OS X and science. This is from Mario Roederer at Stanford University: "In research life sciences, probably 50 to 70 percent of research laboratories use Macs. It's by far the most common analysis platform." This is amazing. I mean, five years ago, if you had said this to someone, they would have thought you were crazy. This is absolutely the case now. Mac OS X is the platform that scientists love to use in computing. We're showing up first in life sciences, but I expect this trend to continue throughout chemistry, physics, et cetera.
Now to the fun part. I'd like to start out by introducing Dr. Osman Ratib. Dr. Ratib is Professor of Medicine and Vice Chair of Information Systems in the Department of Radiology at UCLA. He's a board-certified cardiologist and radiologist who obtained his medical degree at the University of Geneva, degrees in biophysics, and a PhD in medical imaging from the University of California, Los Angeles. He's responsible for coordinating the development of an enterprise-wide strategy and infrastructure for image management and communication. His clinical activities include cardiovascular MR and CT imaging procedures, combined PET/CT imaging, and advanced cardiovascular imaging. And I'm going to ask him to come up and show you some of the very exciting stuff he's doing. So, Dr. Ratib, welcome.
Thank you very much, Bud. Good afternoon. Yes, my name is Osman Ratib, and as you heard, I have a lot of responsibilities at UCLA, and we are very excited to be here. I would like to first thank Apple for giving us the opportunity to share with you a very exciting project that we've had for about a year, a year and a half now, developing what we think is a killer application in open source and medical imaging. Thanks to OS X, and Tiger now, we were able to bring that application to a very high level, and thanks to all the features that we were able to put in our software, it became really very popular. So let's start with a brief introduction of our software.
In a few words, it's a 3D viewer of medical images. We call it 3D, 4D, and 5D because medical imaging now comes in very large data sets that are acquired in three dimensions and can be acquired over time. That gives you a fourth dimension. And you've heard about things like molecular imaging and functional imaging, which add more dimensions to the data. So that is the purpose of the software: to provide the tools that are necessary to visualize, manipulate, and interpret those images. We intended to make it open source so we could benefit from a large community of academic people who would contribute to it, and it became very popular, as I will explain in a few minutes; people have now joined our group, developed other parts of the software, and are contributing to it. It is developed by physicians. I'm a cardiologist and a radiologist by background. Antoine Rosset is here with me. He's a board-certified radiologist from the University of Geneva, and he spent one year with me at UCLA as a research fellow. Again, his clinical background helped him develop a platform designed for clinicians. And I must say about five or six other physicians from different universities have now joined our group in open source and have contributed very clinically oriented applications or extensions of the software. It is already used by thousands of users. We did a survey in December last year, which brought back about 2,000 responses from people that are actively using it. We have tens of thousands of downloads, but I would say we now estimate about 6,000 active centers in the world that are using the software for either clinical or research applications. It is really intended to meet very high-end demands for performance, because it is for the new generation of medical imaging coming from scanners like CT scanners, MRI scanners, and ultrasound scanners.
These machines nowadays generate huge amounts of data that are very hard to handle if you don't have access to very high-end, very expensive workstations. And we wanted to fill the gap: to provide users that don't have access to these high-end machines with something they can use on their laptop or on their own computer to visualize those images. And it is intended really as a collaboration tool. That's very important in medicine. We know that one of the difficulties we have is to convey information from one physician to another, and we think that software will help us do that. The challenges are multiple in medicine, especially in medical imaging, because we are now faced with a huge amount of information that we have to deal with for every patient. Every patient gets a lot of images, a lot of studies, and as physicians, we have to go through a lot of effort in trying to interpret, review, and analyze those data. We also have a problem, as I mentioned, communicating with other physicians, because we're dealing with a lot of complex data we have to share, and that's sometimes not very easy in the usual environment that we currently have in our institutions. Performance is very important, and advanced processing tools are required to process those data.
You will see that in the demo. It's very critical to have the performance and the speed to create things like dynamic 3D visualization, rendering, and sculpting, and we'll show you that in a minute. But most importantly, I think we are under heavy pressure in healthcare today to have very high performance and very high throughput. So we need to think about automating those things and making it easy for the users. We don't want users to have to be IT specialists to be able to use the workstation. So that's where the Mac comes in. That's where the OS X user interface comes in. That's where all the consumer market tools that we integrate in our software make life very easy for physicians, so they can be very productive, focus on what they do best, which is clinical diagnosis and clinical decisions, and don't have to learn all the complexity of the software behind it. So based on that, we incorporated in our software as much as we could of the existing tools that are provided in Tiger and OS X in general.
For performance, of course, we use things like AltiVec and Quartz and other tools. OpenGL was very useful for very rapid image manipulation. Things like QuickTime VR allow us to generate 3D VR objects that can be exported and sent to referring physicians. .Mac was key here. We integrated things like email, and we integrated exporting to an iDisk so that physicians can use it as a storage medium for their data. I know physicians that are using it for being on call at home: they have their technologist upload the images to their iDisk, and they can download them and just review them using OsiriX. We incorporated iChat; we'll demo that in a minute. We incorporated things like being able to export to your iPod and carry your data with you. Forty gigabytes in your pocket is pretty good for carrying several studies that you can't put on a CD or a DVD. And of course, we integrated things like Xgrid, because performance is key to being able to expand and to process and analyze very, very large sets of data. So I think the best thing is to move on to the demo that Antoine is going to be giving here, and we'll show you some of the features of this software. Of course, these are some of the open source components we have. Starting the software gives you a database browser of the patient list and data list that we have on the disk right now. Every patient can have multiple studies, every study can have multiple images, and these are hundreds and thousands of images for each patient. In order to show these series or sets of images, we have thumbnails that dynamically browse very rapidly through those images, so you can rapidly see which set you want to look at, and you have a preview in the lower corner that shows you a larger view. So the icons in the lower corner allow the user to browse through a very large set of data very quickly. Then the next step is to open those images.
And you can open multiple sets together at the same time. So here Antoine is going to show you two sets of data from different modalities that were acquired in the same patient. The one in color is actually a PET. It's a functional metabolic image. And the one in black and white is a CAT scan, a CT scan. These two sets are acquired in the same orientation, so they can be fused together. That's what we do more and more now in medicine. We combine information from different studies. To make it simple, we just drag and drop one set over the other. It can be anything over anything, and the physician will decide what he wants to overlay. And now we have both sets put together and overlaid together. So now we can see that the hot spot on the left actually corresponds to a tumor.
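For readers curious what "drag and drop one set over the other" amounts to computationally, here is a minimal Python sketch of fusing two co-registered slices by alpha blending. The intensity values are invented, and real fusion also involves resampling and color mapping; this only shows the per-pixel blend.

```python
# Minimal sketch of image fusion: once two scans share the same geometry,
# overlaying them is per-pixel alpha blending. Values below are made-up
# intensities, not real PET or CT numbers.
def fuse(ct_slice, pet_slice, alpha=0.5):
    """Blend a functional (PET) slice over an anatomical (CT) slice."""
    assert len(ct_slice) == len(pet_slice), "slices must be co-registered"
    return [(1 - alpha) * c + alpha * p for c, p in zip(ct_slice, pet_slice)]

ct  = [100, 120, 110, 90]   # anatomy (grayscale)
pet = [0,   0,   255, 0]    # tracer uptake: a "hot spot" at index 2

fused = fuse(ct, pet, alpha=0.4)
print(fused)  # the hot spot stands out against the anatomy
```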
And that's where the activity of the tracer has been. But you can also see the anatomy behind it and see what parts of the bone are involved, and so forth. Most importantly, one thing we wanted is that everything we do in the software can be easily communicated to other physicians. You can export it by email, but you can also start an iChat session. If you don't have the tool, this just shows that the software can be customized: there are a lot of tools that don't necessarily have to be there, and you can just drag and drop them when you need them. So every user can have his own customized environment. Starting an iChat session here, Antoine is going to call me on a laptop. So pretend I'm thousands of miles away from Antoine, who's just here. And he's going to call me to show me an image. That's the beauty of integrating iChat: it's a very simple tool to use. He has me on his buddy list and he's calling me, and all I have to do is accept or not accept. If I'm busy, I just say no. And now I'm here accepting the session. So now I have on my screen what he has in small on his screen, and he can see me. And with the new iChat, we can have three or four physicians together conferencing over a case. This is great. I've been trying to do this for years. It used to take very specialized, dedicated, proprietary tools that were very expensive. Now, with iChat, we have this performance and this facility practically for free. You can use a computer, a high-speed Internet connection, and iChat, and here you are: you can do medical consultation remotely, what we call teleradiology or telemedicine.
And that, again, is using consumer market tools to meet the demands of the practice of medicine. Well, the software was basically designed for doing 3D and 4D, so I'm going to show you some of those features. And, again, we use a lot of open source components. That's the beauty of open source: there's a lot of open source software out there for 3D and 3D rendering and visualization. So we used open source components and integrated them into our software. Images come in slices, so basically that's what the raw data looks like: these are just cross-sectional slices through a part of the body from a CAT scan. Now, the 3D rendering tools, and we have multiple of them, we're going to show you one or two, allow a different view of that, which is much more useful, especially if you're trying to visualize topology and orientation. Well, this is a heart. Being a cardiologist, I chose to start with a heart. Here you can see the whole chest pretty much rendered in 3D, and these are rendered in real time. This is nothing pre-calculated. This is rendered on the fly. Antoine is basically moving it up and down, and you see now we have the chest and ribs in front of the heart.
So what he's going to do is just sculpt this out. He's going to do a little surgery. Don't worry, it's not going to be bloody. It's just like what you do in Photoshop: take a region of interest, cut through the 3D volume, and let the computer recalculate the rendering with that piece taken out. So this is how easy it is for physicians to go and process the data. We wanted it to be very, very simple. Now we've taken the ribs and the chest wall off, and we can see the heart, we can see the coronary arteries, we can zoom in and out, we can change the contrast. All these tools work in real time, and it's very high performance.
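Conceptually, the sculpting step is simple: every voxel whose projection falls inside the drawn region is cleared, and the volume is re-rendered. Here is a minimal Python sketch with a rectangular region and made-up density values standing in for a real polygonal ROI:

```python
# Sketch of volume "sculpting": every voxel whose (x, y) projection falls
# inside a user-drawn region is cleared to an "air" value, after which the
# volume would be re-rendered. The ROI here is a simple rectangle; real
# tools use arbitrary polygons. -1000 is a CT-style air value, chosen for
# illustration only.
AIR = -1000

def sculpt(volume, x0, x1, y0, y1):
    """Clear a rectangular column through every slice of a z-stack."""
    for sl in volume:
        for y in range(y0, y1):
            for x in range(x0, x1):
                sl[y][x] = AIR
    return volume

# A tiny 2-slice, 4x4 volume of made-up densities.
vol = [[[50] * 4 for _ in range(4)] for _ in range(2)]
sculpt(vol, 1, 3, 1, 3)           # cut out the central 2x2 column
print(vol[0][1][1], vol[0][0][0])  # -1000 50
```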
If you're not into the heart and you don't understand the heart, well, we'll show you something that you may understand. Again, here are the cross-sectional images. Well, I'm not sure anybody will really be able to pick out where exactly we are in the body. This looks like a piece of a head with eyes in it. Well, we go to the 3D rendered image. It takes a few seconds; these are hundreds and hundreds of images. Voila. This is the head of a patient. And you can adjust the contrast intensity to go in and out from the skin to the bones to the muscles. And this is real time. So this is all calculated and manipulated on the fly. You can cut this out.
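The "contrast intensity" adjustment described here is, in imaging terms, a window/level mapping: raw scanner intensities are linearly rescaled into display gray levels, and moving the window takes you from skin to bone. A sketch in Python, with illustrative window settings rather than clinical presets:

```python
# Window/level mapping: raw intensities are linearly rescaled into 0-255
# display gray levels. Values outside the window saturate to black/white.
def window_level(value, level, width):
    """Map a raw intensity to a 0-255 gray level for display."""
    lo, hi = level - width / 2, level + width / 2
    if value <= lo:
        return 0
    if value >= hi:
        return 255
    return round(255 * (value - lo) / (hi - lo))

# The same raw value renders very differently under two windows
# (settings below are illustrative, not clinical presets).
print(window_level(300, level=40, width=400))    # saturates white in a narrow window
print(window_level(300, level=500, width=2000))  # mid-gray in a wide window
```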
And last but not least, it's open source, free. So please join the group. Anybody out there that has great ideas about developing 3D or developing any image processing tools, please go on the website, join the development team. This is going to be a great platform for developing new applications for the future. And we also provided a way to do plug-ins for those who want to keep their intellectual property. They can also have binary plug-ins. If you have the greatest segmentation or rendering tool you want to put in there, you can still protect it or even sell it as a separate plugin. Thank you very much.
I really want to thank Dr. Ratib. I mean, you're witnessing an incredible revolution in medical imaging here. I think it's great that it's open source. I think it's great that it takes advantage of all the things in Tiger. I personally, and I'm sure Dr. Ratib remembers this, I'm originally a medical doctor, and when I did my training, just the medical images associated with one patient, which were at that time physical chest x-rays and so forth, just to carry them around: these things were heavy. These were big acetate things with silver halide, and it was the intern's job to carry them around. Now they come through iChat. The medical images associated with a single patient would literally fill this stage if they had to be physical. So I think this is an area, and again I'll come back to storage, where storage of images like these, and manipulation of them with high-powered computing, is going to be the story of the future. Next.
I'd like to introduce Rob Raguet-Scofield. He is the guy behind Mathematica from Wolfram Research on the Mac. He was hired out of college to port Mathematica to what was at that time going to be the newly introduced Mac OS X, after Mac OS 9, and he has been associated with that project ever since at Wolfram Research. So I'd like to ask Rob to come up here and show us the latest in Mathematica. Thanks.
Can we have the slides, please? Hi. Thanks, Bud, and thanks, Apple, for asking me here. My name is Rob Raguet-Scofield, and I'm the primary Mac OS X developer at Wolfram Research working on Mathematica. Today I'm going to talk a little bit about some of the Mac OS X features we utilize in Mathematica to add value for our customers.
So first of all, I'll just give a brief overview of what Mathematica is, in case you're not familiar with it. At the most basic level, it is a calculator. It has symbolic, numeric, and graphic visualization functions completely integrated into one package. Next, Mathematica is also a programming language. It allows for functional, object-oriented, and more traditional procedural styles of coding. And then the Mathematica user interface has very extensive support for text, graphics, and typeset formulas, all completely integrated, which makes it a very capable technical word processor.
So Mathematica began its life on the Macintosh in 1988. It had a great UI which talked to a computation engine that ran in the background. This computation engine also ran standalone on several Unix platforms for a number of years with no user interface. Because of their more advanced multitasking and memory management, Mathematica generally ran better on these Unix platforms than it did on the Macintosh. For a number of years this was the case, until Mac OS X came along. The merger of the superior Mac user interface with the more advanced Unix core OS really created the ideal platform for Mathematica.
So here are some of the things our customers are trying to accomplish and some of the challenges they face in doing so. First of all, Mathematica allows you to run computations interactively, so you can examine the results at each step along the way. This is very helpful: for example, you don't have to run a very lengthy computation only to find out at the end that there was an error somewhere along the way. The interactive environment helps a lot with that. Users want to utilize sophisticated algorithms without having to rewrite everything themselves, reinventing the wheel. So Mathematica includes what is the most extensive set of algorithms of any software package in existence today.
Users also want extreme performance, and so if they have parallelizable tasks, they would like for those tasks to be distributed across a grid of computers so they complete much more quickly. Our product gridMathematica allows certain types of operations to be distributed automatically. Users also want to explore their data. They want to dig into it, and they want to transform it into something useful. Mathematica has a really good, intuitive environment for that: the interactivity is great for that sort of thing, and the advanced pattern matching in the Mathematica language is very well suited to this type of task.
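The idea of farming out parallelizable tasks can be sketched in a few lines of Python. gridMathematica distributes work to remote kernels; this toy version just maps independent tasks across local worker threads, which is enough to show the shape of the approach:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy of grid-style distribution: a parameter sweep is split into
# independent tasks and mapped across workers. A real grid farms these
# tasks out to remote machines; here they run on local threads.
def simulate(param):
    # Stand-in for an expensive, independent computation.
    return param * param

def parallel_sweep(params, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves the order of the inputs.
        return list(pool.map(simulate, params))

print(parallel_sweep(range(6)))  # [0, 1, 4, 9, 16, 25]
```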
Users want to work with very large sets of data, and they want to work with lots of different sets of data at the same time. So the 64-bit optimizations that are included with Mathematica 5.2, which is going to be shipping in a few weeks, allow users to break the 4-gigabyte barrier and work with massive data sets.
And finally, users want to visualize their data so they're not just looking at the screen full of raw numbers and text. So Mathematica includes extensive 2D and 3D plotting functions. And then finally, users want to publish their results and make them look good. So Mathematica's top-notch mathematical typesetting and numerous export formats make it a breeze to publish your results to the web, to print, or to anything else.
So some of the technologies we take advantage of in Mac OS X are shown here, and I'm going to give demos of some of these things. We take advantage of Apple's highly optimized linear algebra libraries that are part of vecLib to make sure that numerical linear algebra is as fast as it can be. We have a great native Aqua user interface. And the rest of these things we're going to show, so I'll just get to that. The first thing I want to talk about is the 64-bit support that's in Mathematica 5.2.
There are a couple of areas where 64-bit support provides advantages over 32-bit computing. One example is arbitrary-precision mathematics. Here we're calculating the first one million digits of pi, and doing so on the 64-bit version of Mathematica is nearly twice as fast as on the 32-bit version. I'm going to show that right now. So can we cut to demo machine one, please?
Great, so this is Mathematica 5.1 running in 32-bit mode. It's calculating the first million digits of pi right now, and we're just going to see how long that takes. And it's a little over 9 seconds. And we're going to run the exact same computation on Mathematica 5.2 running in 64-bit mode, and the result is going to come back much more quickly, just a little over 5 seconds. So there's a tremendous performance increase for the 64-bit version for this particular type of operation.
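For a flavor of the kind of arbitrary-precision computation being timed here, this Python sketch computes digits of pi with Machin's formula on big integers. It is only an illustration of bignum arithmetic, not the (much faster) algorithm Mathematica actually uses:

```python
# Arbitrary-precision pi via Machin's formula:
#   pi = 16*arctan(1/5) - 4*arctan(1/239)
# computed entirely in Python's big integers, scaled by 10**digits.
def arctan_inv(x, digits):
    """arctan(1/x) scaled by 10**(digits+10), by the Taylor series."""
    one = 10 ** (digits + 10)      # 10 guard digits against rounding error
    term = one // x
    total, n, sign = term, 1, 1
    while term:
        term //= x * x
        n += 2
        sign = -sign
        total += sign * term // n
    return total

def pi_digits(digits):
    pi = 16 * arctan_inv(5, digits) - 4 * arctan_inv(239, digits)
    return pi // 10 ** 10          # drop the 10 guard digits

print(pi_digits(20))  # 314159265358979323846
```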
The other advantage of 64-bit support is being able to work with massive data sets or other memory-intensive computations. And this is something that I can't do in real time because it takes a couple hours to complete. This is a simulation of a tsunami that is going to start in the middle of the screen here. The unique property of tsunamis is that the energy they carry moves along all the way down to the bottom of the seafloor. That means that variations in the seafloor, like the mountains in this example, can cause disturbances all the way back up at the surface. And that is what this simulation is modeling. When we ran this on a 32-bit computer, we had to back the resolution off a little bit so that the simulation could complete in the 4-gigabyte address space of the application. And as you can see, there are some artifacts that show up here. This isn't because the calculation is wrong; it's just because it's run at a lower resolution, which causes slight variations to be somewhat exaggerated at times. So it obviously doesn't look correct. We can do better on a 64-bit system: we increased the resolution and reran the same simulation using about 6 gigabytes of memory. And as you can see, all the artifacts disappear and we get pretty much what we would expect.
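A back-of-the-envelope calculation shows why the higher-resolution run needs more than a 32-bit address space: memory for a dense simulation grid grows with the square of the resolution. The grid sizes and the three-variables-per-cell assumption below are illustrative, not the simulation's actual dimensions:

```python
# Memory for a dense n x n grid grows quadratically with resolution.
# Assumptions (for illustration only): double-precision values, three
# state variables per cell (e.g. height and two velocity components).
BYTES_PER_CELL = 8

def grid_gib(n, fields=3):
    """GiB needed for an n x n grid carrying `fields` state variables."""
    return n * n * fields * BYTES_PER_CELL / 2**30

for n in (4_000, 10_000, 20_000):
    print(f"{n:>6} x {n}: {grid_gib(n):6.1f} GiB")
# Doubling the resolution quadruples the memory; somewhere past
# 10,000 x 10,000 this sketch bursts a 4 GiB address space.
```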
So that is another advantage of 64-bit systems over 32-bit systems. As you saw in the keynote the other day, one of the features we're taking advantage of in a future version of Mathematica is OpenGL for interactive 3D graphics. Now, this is pretty neat, but lots of applications can do things like that. Because Mathematica is a completely general system, though, we handle very complicated things as well. And lots of times more specialized packages fail when they get more complicated inputs such as this one here. So that is OpenGL.
Next is Java. We have a Java connection technology built into Mathematica called J/Link. And on top of that, we have a package called GUIKit, which enables you to build Java interfaces that utilize Mathematica functionality from within them. What this is is an interface that allows you to plot different starting points and adjust parameters, and then it will numerically solve differential equations based on the parameters that you set. And this is all written in Mathematica, utilizing the Java environment.
And just a couple of other quick things. Mathematica includes a library called MathLink and an API that allows you to call into Mathematica from external applications, or call external applications from Mathematica. So what we have here is an external application that is calling into Mathematica. It's a simple Automator plug-in that will do one of two things: it will either just evaluate some text as part of an Automator action, and the result of the action will be the evaluated version of that text, or you can use it to process the input text by calling Mathematica functions on it. And so I'll run another example here.
And what this code here is going to do is capitalize the first letter of each of the words. So that is calling into Mathematica. One more example of that, just because it's interesting: there's also a service that you can download from our website and use from any Services-aware application, which allows you to evaluate the text right in place. There's even a keyboard shortcut for it. So you can immediately evaluate Mathematica commands on text from any application, or there's a version that will create a graphic as a result.
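As a toy stand-in for the demoed action (plain Python, not the MathLink API), the text transformation itself is simple: take the input text, capitalize the first letter of each word, and hand back the result, which is exactly the shape of a text-processing Automator action.

```python
# Toy stand-in for the demo's Automator action: capitalize the first
# letter of each word in the input text, leaving the rest untouched.

def capitalize_words(text):
    return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))

print(capitalize_words("mac os x and scientific computing"))
# Mac Os X And Scientific Computing
```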
And the one last thing I'll mention here is that we have a Spotlight plug-in that knows how to read our documents. So you can quickly find text inside of your Mathematica notebook files. Spotlight will catch up with me in a moment. So here are some of the Mathematica files that include the Plot3D command. But also, because we include different types of metadata in our files, you can actually do advanced searches on different types of metadata. So if you were working with a certain file and you don't remember where it is or exactly what it was doing, but you just remember it had a lot of graphics in it, you can actually search for all the notebook files that have, for example, more than 50 graphics in them. So you can quickly find what you're looking for. And with that, I will give it back to Bud.
Thanks, Rob. Well, I think that's a really great example, as are these other examples, of what you can do when you take advantage of the unique capabilities of Mac OS X, and especially Mac OS X Tiger. And I just want to stress one of them, which is the tsunami demo. That is the perfect example of how to take advantage of the 64-bit address space because, as you probably know, the front end of the application, the Cocoa or the Carbon front end, is still running in 32 bits. But the computational back end, where you're using the full six gigabytes so that you don't have those artifacts, runs on top of all of our new 64-bit APIs that take 64-bit pointers in Mac OS X Tiger. So that's the canonical example of the best way to take advantage of that full address space. And you see the advantage in that those artifacts are gone, because you can finally do that tsunami simulation at a high enough resolution.
Next, it's my pleasure to introduce Dr. Dean Dauger of Dauger Research. Dr. Dauger has a Ph.D. in physics from UCLA, where he created the very first Mac cluster in 1998. He's also the award-winning author of Atom in a Box and Fresnel Diffraction Explorer. And this I didn't know: he co-authored the original award-winning Kai's Power Tools, so that's pretty cool. Today his pursuits include computational physics, particularly the dynamics of complex quantum systems, and high-performance computing and visualization. So welcome, and I'll hand it over to you. Thank you very much.
Well, thank you very much, and thank you all for coming. I'm very glad to see you all here today, and thanks to the kind people at Apple for the chance to speak to you today about Pooch and how you can use it to build plug-and-play clusters on Mac OS X. Thank you.
So first, to jump into this by describing what Pooch is: basically, Pooch is a piece of software that manages clusters and constructs them out of Macintoshes that are connected together over a network, over any kind of TCP/IP network. One of the things that it supports is essentially supercomputer-compatible calculations, so that you can take the codes that you ran, or intended to run, on a supercomputer, say at the major supercomputing centers across the country, and instead run them on the Macintosh, or optimize, debug, and develop them there as well. One of the key features that enables that kind of capability is that it supports MPI, the Message Passing Interface. Pooch was the first software to support MPI on Mac OS X, and it supports five different MPI implementations now. But it also supports supercomputing behaviors such as queuing, launching these kinds of parallel jobs, keeping track of them, keeping track of their CPU time, and making them accessible to users, so that the clusters are plug and play and you can go ahead and use your software, use your application, and accomplish your work as you need to. The kinds of users that we have are from all disciplines, such as physics and chemistry, as well as mathematics and biology, and across all the different kinds of customers, in academia, government, and industry as well.
So to give you an idea of what we did to essentially reinvent the cluster computer: this is how we support five different MPIs, open source as well as commercial, but we also dynamically manage the cluster, and we use Bonjour as well as Service Location Protocol to discover the nodes out on the cluster, discover the computational resources, and make use of them on the fly. It also supports a number of diagnostics for development and optimization, so you can make your code run really well on the Mac cluster, or run that much better on some other hardware. So you can mix and match, and actually have the Mac cluster complement some of the big iron if you don't have a large system yourself. The idea of this is to bring high-performance computing to the mainstream user, and to do it in a way that's completely independent of any kind of shared storage, remote command-line login, or mucking around with the lower parts of the operating system, so you can focus on getting your work done. And that's really the point: it's the lowest-barrier answer that we're familiar with for accomplishing this kind of work, and so it results in savings of time and money for users like myself, as well as for everyone else I know who uses Pooch. Which is the whole point of cluster computing in the first place: to provide computational resources for users. Some of the challenges of high-performance computing clusters: they're really meant for problems that are simply too large to solve on one machine. It either simply takes too long, or you simply can't fit it within the RAM of one machine.
But other, previous cluster types are very fragile and very hard to use, because they rely on configuring things at a low level in the operating system, and so they require a lot of technical expertise to understand what to do and what not to do when setting up these kinds of low-level features. And because of the way these things are put together, with previous cluster types the users become responsible for solving any incompatibilities or any other bugs in the system. So one of the ways we try to solve that is to take advantage of a lot of the things that are in desktop computing, to bridge the cluster to you as the user. For example, we support MPI, so we support the same calculations that would run on a supercomputer. We use Bonjour to dynamically discover the health and status of the nodes out there on the cluster, so we can dynamically respond to any problems that might occur. We provide AppleScript and Automator support to make it easier to launch jobs on the cluster, as well as different kinds of windows into the cluster, such as Spotlight and Dashboard, and other ways to access the cluster and combine these things together. So if you wanted to do it yourself, you know of various books on building clusters; ours would basically start with this, the Mac cluster recipe. You take a bunch of Macs or Xserves, G4s or G5s, get an Ethernet switch and some cables, and the directions are simply to connect the hardware together, download Pooch, and go ahead and install it. And it literally takes seconds per machine to install the software. So what I'd like to provide for you today is a demonstration, a quick demonstration, we hope. So I'll switch to demo number three. Let's see. Let me just get that going.
Okay, so the first thing I'd like to show you, to give you an idea of a numerically intensive application that takes a little while and stresses the processor a bit: this is a z-to-the-fourth iteration fractal, a little different than the Mandelbrot fractal, that produces a nice graphical result. And it achieves about five gigaflops on my one PowerBook here, a 1.5-gigahertz G4 in this case. And so that's all well and good. And if I wanted to make it go faster, well, if I had a second processor, it could make use of multitasking. But what if I want to get beyond a factor of two? That's where you need to get outside the box, and that's where parallel computing comes in. That's where Pooch comes in. So to install Pooch, this is how long it takes.
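The iteration Dean describes is the Mandelbrot recipe with a fourth power: instead of z → z² + c, iterate z → z⁴ + c and count how long each point takes to escape. A minimal sketch (parameter choices here, like the bound and iteration cap, are illustrative):

```python
# Minimal sketch of the z -> z**4 + c escape-time iteration from the demo.
# The escape radius (2) and iteration cap are illustrative choices.

def escape_time(c, max_iter=100):
    """Iterations until |z| exceeds 2 under z -> z**4 + c (0 if it never escapes)."""
    z = 0j
    for i in range(1, max_iter + 1):
        z = z ** 4 + c
        if abs(z) > 2.0:
            return i
    return 0

# One scan line of the image: escape counts across a row of the complex plane.
row = [escape_time(complex(-1.5 + x * 0.1, 0.5)) for x in range(31)]
print(row)
```

Every pixel is independent of every other pixel, which is what makes this an ideal first demonstration of parallel speedup: the rows can be handed out to as many processors as you can find.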
And there it is. So that's how long it takes to install Pooch, and you do that on each node that's available. And so, actually, let's go into it this way. I can use Spotlight. I know that I queued up some jobs a little while ago, and we have a Spotlight plug-in that exposes the queuing system inside Pooch to Spotlight, so you can go ahead and search for other jobs that are out there. So I can pull up the nodes of the job that's there, but let's edit this to run this particular application. I can remove that, go ahead and drag in the fractal program, and let's remove these and select some nodes. I can open up and select some nodes, and I can see, oh, I've discovered a couple of other machines here. There are a couple more G5s right behind the scenes here, but I can see that they're actually reporting as busy. Oh, actually, let's have a look at what kinds of processors they are. I can see that they're dual-processor G5s at 2.7 gigahertz, and it actually uses this information to rate each one of these units.
But of course the machine is busy, so I can actually see, gee, what is it running? I can see, okay, there's a running job there. So I want to go ahead and kill that job. And as soon as I do a refresh, I can see what's there and make use of the nodes.
So let me go ahead and, oh yeah, it just takes a little while to submit the data that's there. Okay, so I can see that it rates each one of these nodes, and I see that it gives the G5s a pretty high rating, so let me go ahead and select some of those and bring those in. Some of the other features: you can delay the launch until some later time of day, so you can use a colleague's machine after she's gone home from work. It supports five different MPIs, as well as grid-type applications, compute jobs, and so forth. So to launch the job, I just click on Launch Job. So this copies the executable out to the other machines. You might have heard that.
That was Pooch barking to tell you that it launched correctly. And when it passes control to the executable, you can see that it went quite a bit faster, something like 26 gigaflops or so. So that's all well and good, but what if that wasn't enough? I can go ahead and pull in some more nodes by essentially extending Bonjour. What I can do is select, let me see.
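Pooch distributes the work across machines with MPI; as a single-machine stand-in, the same decomposition idea can be sketched with Python's multiprocessing (not Pooch's actual mechanism): the master hands each worker a slice of the rows, and reassembles the finished image.

```python
# Single-machine sketch of the decomposition idea behind the demo:
# rows of the z -> z**4 + c image are farmed out to worker processes,
# and the master reassembles the results. (Pooch itself uses MPI
# across machines; this stand-in uses multiprocessing on one machine.)

from multiprocessing import Pool

def escape_time(c, max_iter=60):
    z = 0j
    for i in range(1, max_iter + 1):
        z = z ** 4 + c
        if abs(z) > 2.0:
            return i
    return 0

def compute_row(y):
    """One scan line of a small fractal image."""
    return [escape_time(complex(-1.5 + x * 0.05, -1.0 + y * 0.05))
            for x in range(60)]

if __name__ == "__main__":
    with Pool(4) as pool:                 # 4 workers, like a 4-processor run
        image = pool.map(compute_row, range(40))
    print(len(image), len(image[0]))      # 40 60
```

Because the rows are independent, the speedup scales with the number of workers, which is why the demo jumps from about five gigaflops on one PowerBook to about 26 on the small cluster.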
That's right, select some machines over at UCLA. So this is something like 300 or 400 miles away. I can go ahead and pull in some of the machines that are there and make use of those. So let's see what we can do. I can see that we have some G5s over there. And to get through the firewall, I need to use a particular port number. Yeah, there we go.
And let's see. So it's now actually copying that executable. Whoops. It should be copying that executable over to UCLA. And one of the other things I have here is sort of an iTunes-like interface, where I can select the local network from essentially our node playlist on the left side. And if I make use of that, there we go. It's now combining the nodes that are here with the ones at UCLA. So this is a 15-processor cluster that's distributed over about 400 miles or so, and I get something like 88 gigaflops. So this is a substantial improvement. My distance record right now is from Munich, Germany, about 6,000 miles, back to UCLA, combining these kinds of nodes together. So just to show you that this works not just for fractals, what I'll do is pull in a physics code. Let's see. Actually, let me give you an idea of what this looks like when we run it on just a single processor.
There you go. So this is actually a million particles, a million charged particles, all interacting electrostatically, showing the electric potential as a function of time. It goes from one frame to the next frame to the next, roughly two seconds or so per frame. And we can see that it's running a little bit slower than we'd like. So let me go ahead and quit out of this.
There we go, good. So I drag and drop that into Pooch and go ahead and select my little cluster here. There we go. Okay. So now the computations are being done on the four processors that are here: one, two, three, four. So it's actually going roughly four times as fast. And you can see the live message-passing pattern on the lower right of the screen here. It's showing the live message-passing pattern as a function of time, as well as a histogram of the messages being sent and received as it goes. So while that's running, the last thing I can show you is the features in Automator.
Let's see. So that's definitely a computationally intensive job taking up what's here. I can use Automator to have the Finder select some items and then combine that with a number of different Pooch actions. So I can have it, say, launch an executable, choose an executable, and launch it on a four-node cluster, or run it as a single task, or get the nodes of a Pooch cluster. But in this case, I'll choose to distribute single tasks onto a Pooch cluster. And if I run this, I can go ahead and select a particular executable, something simple. And Automator is actually going ahead and submitting an executable as single-node tasks into the Pooch queuing system, which Pooch will later launch on the local nodes here after the other job runs.
So that completed the queuing into the system. Oh, and one other thing that's also in here: we also feature a Dashboard widget. So we're able to have the Dashboard be a window into the cluster. We can see that it shows the history of the job activity as a function of time over the last couple of days. It also shows the cluster capacity; it estimates that I'm using pretty much all of the cluster capacity of the local nodes that it finds here, and it also lists how many nodes there are and some of the current status of the cluster. So with that, that's the conclusion of the demo. I want to thank you very much for coming here today, and thanks for your attention. Thank you.
Thanks, Dean. Well, I think one of the messages there is that before Pooch, and before Pooch on Mac OS X, doing something like that in a lab would probably require a programming staff, a lot of time, and a lot of effort before you even began to get to the science part of it. And the beauty of something like Pooch on Tiger is that that is all done for you. You saw the Automator demo there: basically, grid computing with drag and drop. It's an incredible boon to scientists everywhere, I think.
So next, I'd like to present to you a challenge. What I believe is that the Mac platform really is a platform that allows you to push the limits, and I want to talk about what those limits are and where you should be pushing over the next 10 years on this platform. The first is really big computing, and we saw examples here. It's computing that is tackling very large problems, problems that could not even be thought about 10 years ago, and it's taking advantage of 64-bit computing, not just in terms of the length of integers, doubles, and floating-point values, but taking advantage of the fact that with Tiger, you can break through the 4-gigabyte barrier and have 64-bit pointers. And with the tsunami demo, you saw a perfect example of the kind of breakthroughs you can get with that.
Really big computing also means cluster computing. It also means grid computing. Both Mathematica and Pooch are great examples of that. We just saw here, put together in a few seconds, a grid that extended beyond the walls of this room and could easily extend around the globe with that same simple drag-and-drop metaphor. So there are problems that can be tackled with really big computing that are ideal for the Mac platform: cluster computing, grid computing, high-end numerical computing. We put a lot of work in our engineering group into making sure that the libraries, whether it's the Accelerate libraries or the AltiVec libraries, are available with simple APIs so that you can write apps that take advantage of them. Our numerical team makes sure they get the right answer. And we're watching every single instruction cycle; we're making sure that the cache line loads are completely optimized. So please take advantage of that. There are lots of sessions here that will go into a lot of detail on that. So for any numerically intensive scientific application, the Mac has a lot to offer.
Second big area, and this really applies to any area of scientific computing, but we saw a perfect example of it here with Dr. Ratib and medical imaging. In the early part of this century, we heard a lot about the human genome, the genome project, and the online databases that have the genomes of humans and other organisms. Well, it turns out that the databases in medical imaging and functional medical imaging completely dwarf what the genetics guys are dealing with. I mean, those guys just have a four-character alphabet to deal with. What we're talking about with medical imaging are huge databases. And as you get into simulation, keeping the results of simulations around, comparing them, sharing them around the scientific community, and visualizing them, these are going to be very important problems. Now, Apple is known for our ability to visualize. We have nice 30-inch displays that I'm sure all of you have back at home. This is an area where Apple spends a lot of time making sure that things are rendered properly on the screen. We support OpenGL to the hilt and will continue to do so. We have Quartz Extreme for dealing with image processing in real time. So this is an area where the Mac is uniquely situated to provide a great contribution to science. Humans are just not good at looking at terabytes of numerical values. You have to turn the data into a visualization if you're going to have it interpreted, have it actually contribute to scientific breakthroughs.
The last item: really big storage. I believe this is going to be the story of the rest of this decade, certainly: the increasing rate of data being generated that needs to be stored and analyzed. This is an area where you hear a lot about businesses needing big databases and big storage, but actually science has an even greater need for dealing with large quantities of data. And luckily, the technology here is in our favor. If you look at the cost of storage, per bit or per petabyte, however you want to measure it, we've actually been exceeding Moore's Law. Moore's Law, in terms of processing, says that every 18 months you're going to double the amount of processing you get for the same number of dollars. In terms of storage, we've been on about a 12-month doubling cycle.
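The arithmetic behind that comparison is worth making explicit: compounded over a decade, an 18-month doubling cycle and a 12-month doubling cycle diverge enormously.

```python
# Compounding the two doubling cycles Bud mentions over a decade.

def growth_factor(years, doubling_months):
    """How many times capability per dollar multiplies over `years`."""
    return 2 ** (years * 12 / doubling_months)

print(growth_factor(10, 18))   # processing: roughly 100x in 10 years
print(growth_factor(10, 12))   # storage: 1024x in 10 years
```

So over ten years the 12-month cycle delivers about ten times more improvement than the 18-month cycle, which is why storage capacity per dollar is outrunning processing.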
And with some of the new phase-change modalities coming along in storage (I'm sure you've heard about things like Blu-ray for the next generation of DVDs), and vertical magnetic recording coming along in rotating storage, storage is going to stay, I believe, on this 12-month doubling for the rest of this decade. And you and the scientists are going to absolutely eat this up. We're committed to making sure that on the Mac platform you have all the tools you need to deal with this storage. One of the examples in Tiger that really just scratches the surface here is Spotlight. Here you've got a system with 100 gigabytes of your stuff on it. How do you find your stuff? Well, Spotlight will immediately index a file the moment it hits the disk. The kernel intercepts that write and says, okay, let's grab the metadata, let's do a full-text index.
Let's make sure that when a person types into Spotlight, they see exactly what they're looking for. The message to you as developers is: make sure that your applications create the correct plug-ins for Spotlight, so that the particular metadata for your application gets put into the Spotlight world. You saw a great example of that with Mathematica, where you could say, I want to see all of the files that have at least 50 plots in them. I'm sure every single application written in the scientific community probably has specific metadata. And we designed the metadata schema to be extensible, so that you can add your own data types and have them plug in automatically to Spotlight, so that your customers and the people who use your applications can find their results, can find their data, on these huge disks, and huger disks as time goes on.
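The idea can be sketched in miniature. This is an illustrative toy in Python, with a made-up file format and metadata keys; it is not the real Spotlight importer API. An importer extracts app-specific metadata, like how many graphics a notebook contains, so queries like "notebooks with more than 50 graphics" become possible.

```python
# Toy sketch of the Spotlight-importer idea: a plug-in extracts
# app-specific metadata into an index, and queries run against it.
# The "notebook" format and metadata keys here are invented for
# illustration; this is not the real importer API.

def extract_metadata(notebook_text):
    """Pull simple metadata out of a toy notebook format."""
    return {
        "full_text": notebook_text,
        "graphics_count": notebook_text.count("[Graphics]"),
        "cell_count": notebook_text.count("[Cell]"),
    }

def query(index, key, minimum):
    """Return paths whose indexed metadata value meets a threshold."""
    return [path for path, meta in index.items() if meta[key] >= minimum]

index = {
    "plots.nb": extract_metadata("[Cell][Graphics]" * 60),
    "notes.nb": extract_metadata("[Cell] mostly prose [Graphics]"),
}
print(query(index, "graphics_count", 50))   # ['plots.nb']
```

The design point is the separation of concerns: the application knows how to read its own files, the system owns the index and the query interface, and users get one search box over everything.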
We have a lot of things going on at the developer conference, and I want to point out a few of them. First of all, we've got the Science Connection room, room 3014, with a lot going on there; I believe, if I'm not mistaken, there's going to be some discussion there after the HPC presentation. Science discussions going on around HPC and the sciences Wednesday and Thursday. Scientific and medical imaging on Mac OS X: we saw a little bit of that here today, but there's going to be a lot more in depth. On Friday, a science feedback roundtable; it's important to hear from all of you and get feedback. Apple Design Awards: there's going to be an award for the best scientific computing solution. That should be interesting. A BioCocoa group meeting Wednesday at 6:30 PM in the Science Connection room. Many sessions and many labs relevant to creating great science apps.
And you've all got your booklets; there are a number of sessions that I encourage you to attend specifically for science. Those of you who have been coming to this developer conference for a while have probably noticed the sort of exponential growth curve we've been on. That continues, and it really is because the Mac, in my opinion, is the best machine ever created for scientific productivity, everything from doing the analysis and getting the results to publishing your data and getting your grants. So we're really pleased to see this market bloom, and we encourage all of you to go out and create great applications so it continues to bloom. A couple of contacts, people you can get in touch with: the people who presented today will probably be milling around here for a little bit. I think there's a presentation right after this, but we have a few minutes when you can come up and exchange cards with people. So with that, I want to thank you very much, and see you again next year.