
WWDC03 • Session 615

Building Computational Clusters

Enterprise IT • 1:03:01

Learn how to build powerful computational clusters with Mac OS X Server, Xserve, and Xserve RAID. Hear how customers are designing and deploying large installations to solve a variety of complex computing problems.

Speakers: Douglas Brooks, Michael Athanas, Theodore Gray

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.

Thank you and good afternoon. So we're going to talk this afternoon a little bit on building computational clusters with Mac OS X Server and with XServe. And I think it's really important before we get too far into the presentation to make sure we are all here understanding the right thing. Clustering is one of those really powerful, overloaded words that means a lot of things to a lot of people. And I get asked all the time, ooh, I want to cluster my Xserves.

Well, what does that mean? And so I think it's important to recognize that clusters are used, the term clustering is actually used in two very distinct areas. The first is clustering for high availability. I want to take a server service and cluster two or more servers together for high availability such that if one would go down, the other one takes over its place with ideally no network interruption. That's something very different than what we're here to talk about today, which is clustering for computational ability.

Aggregating the computational performance of several servers together to generate a larger compute farm, per se. So just to be clear, that's what we're going to be talking about this afternoon. Anyone interested in learning a little bit more about applications for high availability with Xserve, I invite you to my session tomorrow afternoon on deploying Xserve, where we'll touch on some approaches to the other kind of clustering.

So let's talk a little bit about what we want to cover in this session and dive right into it. So obviously the first thing might be why I might want to build a cluster. What goals would it deliver for me? How can we use Xserve and Mac OS X Server in a cluster? What are the benefits and advantages? Typical cluster architecture. What does it mean to build a cluster? What is the topology, the network, the basic system requirements, and physical requirements needed to do that? We'll talk a little bit about deploying applications on a cluster, what kind of things can be distributed, and some approaches to that.

Physical network concerns. When you bring a large number of CPUs into a certain area, there are obviously requirements that go above and beyond the number of machines. It requires planning on power, cooling, and network requirements. We'll touch on the requirements in those areas as well. Tools and techniques for deploying a cluster. What resources are available? What tools are available? We'll highlight some of those capabilities.

[Transcript missing]

The other advantage of clusters is that, and we'll talk more about this, is that it's very easy to scale a cluster to both budget and problem. So, you know, based on a given budget, you can, you know, very granularly add more computing power by adding more compute elements to a problem. And, of course, it's also easy to scale out that cluster based on the size of your problem. And so, if you need more computational power, add some more modular servers to your cluster deployment. So it makes it very flexible in these areas.

Some example applications, so take some of the more obvious ones first. Image manipulation, rendering, compositing: the only way the digital effects that are being shown in theaters today are being generated is by massive computation behind the scenes. And again, clustering is the way to achieve some of these goals.

Simulation, biology, simulating car crashes, airplanes, financial models of the markets: these are things that map very well onto compute cluster environments. And of course, one of the mainstays in the cluster market, and some of the areas where we're seeing some of the biggest, strongest, earliest successes with Xserve, is in the life sciences with genomics and other kinds of life science analysis. Let me give you some examples.

So, let me give you some more dramatic examples. So, Pixar's Toy Story: they have an amazing little section up on their pixar.com website. And there's a whole category called "How We Do It," you know, behind the scenes. And on that site, they talk about what it took to build Toy Story 2.

And the example they give is that the average frame in Toy Story 2 took six hours to render. Now, I'll stress average because they also highlight that some of the more complex work took closer to 80 hours a frame. So, that's a lot of work. So, let's talk about the average frame.

But just as an example, six hours a frame, if you do a little bit of quick math and you multiply that by 24 frames a second times 60 seconds in a minute times 92 minutes in the feature film, that turns out to be around, you know, shy of 800,000 hours or, you know, roughly 92 years of computation behind that. And that's, you know, just for the rendering piece of the film.
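(For the curious, here is that back-of-the-envelope arithmetic written out as a quick sketch, using only the figures quoted in the talk; the exact totals depend on rounding.)

```python
# Back-of-the-envelope check of the render-time figures quoted above.
hours_per_frame = 6        # quoted average render time per frame
frames_per_second = 24     # film frame rate
minutes_of_film = 92       # quoted running time

total_frames = frames_per_second * 60 * minutes_of_film   # 132,480 frames
total_hours = total_frames * hours_per_frame               # 794,880 hours, "shy of 800,000"
total_years = total_hours / (24 * 365)                     # roughly 91 years of computation

print(f"{total_frames:,} frames -> {total_hours:,} CPU-hours (~{total_years:.0f} years)")
```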

So, you can imagine that the only way to deliver this kind of result is with massive computational clustering power behind the scenes.

It's also interesting that this is Toy Story 2, which is a number of years ago. I can only imagine what Finding Nemo took to render. Let's look at another example in the genomic life sciences environment. The fact is that the data that needs to be analyzed is growing significantly faster than Moore's law.

Processors aren't getting faster as quickly as the data is growing. Moore's law says processor speed doubles every 18 months. This is data from GenBank, NCBI. Roughly in the same timeframe, the amount of data is growing on the order of 8x. The only way to keep up with this is by aggregating computing power.

Let's talk a little bit about Xserve. Why do we think Xserve has a story in the computational space? Well, a couple of different reasons. First and foremost, in the 1U form factor, we're able to deliver quite a bit of processing power in a small package. When we couple that with the Velocity Engine in the G4, Xserve becomes a very potent machine for computational analysis. I'll save the question now, which is the G5 question with Xserve. A lot of people ask me that. I'm sure that will come up in Q&A. I'll mention it now: I'm not able to comment and answer that question for you today.

But anyone who's seen the G5 machines when they were here earlier in the week, the heat sink is rather significant. And so I have a little bit of a challenge to get this into an Xserve, but it will be an interesting challenge. Regarding Mac OS X, Mac OS X is a tremendous advantage for Xserve in the cluster space in that we have a very powerful, open source, BSD-core Unix operating system.

That can leverage the major open source projects and compile major applications out in industry, and yet take advantage of an operating system that's very easy to deploy and easy to maintain. This is a real highlight of Xserve and Mac OS X Server. And we'll actually talk about some of the ways you can very easily and rapidly deploy Mac OS X Server and Xserve in this space.

And finally, remote management. Rack-mounted servers are meant to live in a rack, and not with the system administrator in front of the rack at all times. And clusters are even more so in that environment. And so having powerful remote management tools is essential to be able to maintain and monitor the status of a cluster.

[Transcript missing]

Let's talk a little bit about a typical cluster deployment. What does it typically look like when we deploy Xserve in a cluster environment? There are three main pieces in Xserve cluster deployments. We'll start with what we'll call the top, which is the head node. And actually, the cluster we have here on stage is built on this model, with the head node as the top machine.

Typically this is a full XServe with multiple drive bays. This is really the machine responsible for managing the cluster. Obviously it typically runs the software to distribute the load, manage user access, provide storage to the compute elements in the cluster. That could be through internal storage or through external storage. And again, in this example here we have on stage an XServe RAID providing the storage for that head node.

One of the most important pieces in a cluster is, of course, the interconnect network. So typically, the head node is the only machine that you have connected to your campus or corporate network, production network per se. It's typically the only thing that's directly accessible to end users on the network. The interconnect network actually connects the head node to the compute elements. This can be done in a number of ways.

The most traditional fashion is using Ethernet networking technologies, 100-megabit networks or gigabit networks for added performance. We also have Myrinet capabilities on Mac OS X and Mac OS X Server. Myrinet is a high-performance PCI interconnect card with extremely low latencies between nodes. And for certain kinds of applications, this is very, very important. And so Myrinet is an option available.

And so there's also an accompanying Myrinet switch that allows that interconnect. And actually, FireWire becomes very interesting for smaller clusters as an interconnect, since for the cost of a cable, you can chain several Xserves together, especially with FireWire 800 built in on Xserve. And that's actually why, when we introduced FireWire 800 on Xserve, we made sure that there were two FireWire 800 ports on the back, so that we could chain down the back of a small cluster and have FireWire connectivity.

Now, FireWire has a lot of interesting properties as an interconnect network. It has extremely low latencies. It has DMA capability between nodes, which means one node can DMA memory out of another node without any processor intervention by the secondary host. We have also, in the latest revisions of Mac OS X Server, added IP over FireWire. So we have the ability to run standard IP networking, and any application that uses standard IP stacks can take advantage of the FireWire network built right into Mac OS X Server.

So this becomes very interesting: again, for the cost of a FireWire 9-pin-to-9-pin cable, you can chain down a small cluster, you know, somewhere between four and eight machines. This is really ideal. No additional switching costs. And of course, the workhorse of the cluster is the compute elements. Any number of compute elements can be added to the environment to scale up to your specific tasks and problems.

So the way we'd see this deployed with Xserve hardware is, of course, an Xserve standard server configuration as the head node, providing storage and network access. Optionally, an Xserve RAID for up to two and a half terabytes of RAID-protected storage, again through Fibre Channel, which is represented by the heavy white lines. Xserve compute node configurations, and, of course, in this particular example, a 10/100 Ethernet switch.

So if we look beyond the hardware now, let's talk about some of the other issues that we have to deal with when we look at deploying clusters, which are some of the most interesting things from an Xserve perspective: deployment and management. So obviously the first things we have to look at are physical concerns. You know, where is this equipment going to live? What are the power, environmental, and networking requirements? And also logical concerns: software installation, configuration, management.

So let's take each of these individually. Let's first talk about power and environmentals. Xserve actually has quite an interesting advantage in that, with the G4 processor, it has quite a low power consumption and heat output compared to other competing processors in its class. So when we look at the Xserve, off the back of the data sheet, we rate it at 3.6 amps and 345 watts.

Now this, I should add, is for a fully, fully loaded system, if you stuff every possible thing you could in the box, and that actually has margin on top of that. What we've actually found is that when you actually measure real-world usage, when you max the processor at 100%, running velocity engine optimized code, really leveraging the processing power out of the Xserve, you're going to really have to work hard to draw around 2 amps at 134 watts. So the actual real-world power consumption, is actually much lower than the actual system rating. And we actually publish these numbers in our knowledge base, kbase.info.apple.com. And we also publish BTUs per hour, which is just shy of 460 BTUs an hour for a dual processor system.

So for one system, that's not too much of a challenge. You can pretty much plug an XServe in just about anywhere and not have a problem. But what happens is when you rack a whole bunch of these in a space and multiplying these out, you've got to really take into account both power and heat requirements.

So if you multiply those by 16, you start getting nearly 60 amps and about 5,500 watts of power. Obviously much lower than that in real world consumption. But you do have to plan up for startup currents, which are actually close to rated consumption. And one of the things that you can do is actually stagger the startup in the cluster to prevent maximum current load at power up time. And of course, over 7,000 BTUs an hour. So from an environmental space, making sure you have adequate cooling for a room that's going to host a cluster is, of course, critical.
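To make that planning arithmetic concrete, here is a minimal sketch using the per-node figures quoted above (rated 3.6 A / 345 W, measured roughly 2 A / 134 W, and about 460 BTU per hour for a dual-processor system); the 16-node count is just an example.

```python
# Back-of-the-envelope power and cooling budget for a rack of compute nodes,
# using the per-node figures quoted in the talk.
RATED_AMPS, RATED_WATTS = 3.6, 345        # data-sheet rating (fully loaded, with margin)
MEASURED_AMPS, MEASURED_WATTS = 2.0, 134  # observed at 100% CPU running Velocity Engine code
BTU_PER_HOUR = 460                        # published heat output per dual-processor system

def rack_budget(nodes: int) -> dict:
    return {
        "startup_amps": nodes * RATED_AMPS,       # startup draw is close to the rating; stagger power-up
        "rated_watts": nodes * RATED_WATTS,
        "typical_watts": nodes * MEASURED_WATTS,  # what you actually see under full computational load
        "btu_per_hour": nodes * BTU_PER_HOUR,     # cooling requirement for the room
    }

print(rack_budget(16))
# -> roughly 57.6 A, 5,520 W rated, ~2,144 W typical, 7,360 BTU/hr
```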

So, you know, the big thing here is, of course, that you need to plan for these requirements. Your UPS becomes critical, and you need to be able to plan appropriately for that. One of the strategies that we're starting to see is that typically the most important thing in a compute cluster is providing backup power for the head node and the storage. Very rarely do you actually put all the elements on protected backup power, since if you have resource management software, any terminated computation will be restarted automatically when an element's available. So the head node is really the critical piece in this whole puzzle.
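The "restarted automatically" behavior described above comes from the resource management software, but the idea is simple enough to sketch. Below is a toy head-node work queue, purely illustrative and not any particular DRM product, that re-queues a work unit when the compute element running it fails:

```python
import itertools
import queue

def run_cluster(work_units, nodes, execute):
    """Toy head-node scheduler: hand work units to compute nodes, re-queue on failure.

    `execute(node, unit)` stands in for dispatching a job to a compute element;
    it returns a result, or raises if that node dies mid-job.
    """
    pending = queue.Queue()
    for unit in work_units:
        pending.put(unit)

    node_cycle = itertools.cycle(nodes)   # naive round-robin placement
    results = []
    while not pending.empty():
        unit = pending.get()
        node = next(node_cycle)
        try:
            results.append(execute(node, unit))
        except Exception:
            pending.put(unit)   # compute elements are disposable: redo the work on another node
    return results

# Example: one flaky "node" still lets the whole batch complete.
if __name__ == "__main__":
    flaky = {"node3"}
    def execute(node, unit):
        if node in flaky:
            flaky.discard(node)           # fails once, then behaves
            raise RuntimeError(f"{node} went down")
        return (node, unit * unit)
    print(run_cluster(range(8), ["node1", "node2", "node3"], execute))
```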

So the next thing becomes an issue of the networking. What kind of problems do you have, and what kind of interconnects are required to be able to manage this cluster? And so one of the big factors becomes how much I/O the particular compute task does, and whether you get the bang for the buck deploying gigabit Ethernet, as an example, across a series of compute elements.

If there's heavy I/O, there might be dramatic advantages to adding that kind of network behind the scenes. Other types of compute jobs don't require that. They're very compute focused and sit and process a small amount of data. A great example of that, kind of a more dramatic example, is SETI@home. If you've ever run the SETI@home client, even over a very low-speed modem connection, it will download a block of data and it will sit and churn on it for hours. So having a high-performance network back to the machine doling out the work is not real critical. However, for other problems, that's not the case. The other question becomes that of latency.

A lot of computational problems are what you might call embarrassingly parallel in that a job can be sent out across a whole bunch of nodes and with no dependencies on it, they'll just sit and churn and then return the results. Other problems have very tight dependencies that the results of one computation will get fed into the results of another computation and they'll be tightly coupled. And so having low latency, not necessarily bandwidth, but low latency between the machines becomes very, very critical.

And this is where solutions like Myrinet become important because of that low-latency interconnect. And of course, the other thing that you'll always want to manage is just who can connect to this cluster. And typically, access is provided through the head node and secured through the head node. So you authenticate into the head node, submit your job to the head node, and the work begins to be managed from there.
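To make the embarrassingly-parallel-versus-tightly-coupled distinction above concrete, here is a small message-passing sketch using mpi4py, a Python MPI binding chosen purely for illustration (the stacks mentioned later in the session are C libraries such as MPICH or LAM/MPI). The first pattern scatters independent work and gathers results once, so interconnect latency hardly matters; the second exchanges a value with its neighbor on every iteration, which is exactly where a low-latency interconnect such as Myrinet earns its keep.

```python
# Run with something like: mpiexec -n 4 python patterns.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# 1) Embarrassingly parallel: scatter independent work units, crunch, gather once.
#    A single round trip, so even a slow network is acceptable.
chunks = [list(range(r * 1000, (r + 1) * 1000)) for r in range(size)] if rank == 0 else None
work = comm.scatter(chunks, root=0)
partial = sum(x * x for x in work)
totals = comm.gather(partial, root=0)
if rank == 0:
    print("sum of squares:", sum(totals))

# 2) Tightly coupled: exchange a boundary value with a neighbor on every step.
#    Thousands of tiny messages make latency, not bandwidth, the bottleneck.
value = float(rank)
left, right = (rank - 1) % size, (rank + 1) % size
for _ in range(1000):
    neighbor = comm.sendrecv(value, dest=right, source=left)
    value = 0.5 * (value + neighbor)
```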

Let's take a look at some of the logical concerns, the management. And this is an area where we actually have a lot of advantages with the tools that are provided in Mac OS X Server. What's interesting is that a lot of the tools that we provide in Mac OS X Server for desktop management become very applicable for cluster management. And so tools like cloning tools, NetBoot, and Network Install become very, very valuable in cluster management.

The reality is that for a cluster, more often than not, every compute element needs to have the exact same system image, the exact same access to tools. And so NetBoot becomes a very, very viable way to manage and have basically a single system image across your entire cluster. And actually, when we add some of the headless tools that we provide out of the box with Mac OS X Server and with Xserve, literally you can take a brand new Xserve out of the box, hold down a button on the front panel, and have it NetBoot off your head node. It really provides quick out-of-the-box deployment for Xserve.

Of course, remote management tools are essential. Again, clusters are meant to live in a rack somewhere, and you should be able to access them from anywhere. You may choose to provide command-line access with things like SSH, or provide web tools, web interfaces. And you'll see some examples of that in this session.

And finally, user accessibility. I actually just touched on that: web interface and terminal access. And finally, backup. Being able to have a backup strategy for the head node, being able to back up that critical data. Again, typically in these deployments, the compute elements themselves are thought of as disposable. They're quick to replace or re-image. It's really the head node and the storage that's provided there that become critical, particularly the computation programs and the results.

So before I introduce my next speaker, I wanted to highlight and we'll kind of bring up some of the things that we typically say for the end of these presentations up to the front. I wanted to really highlight some of the key cluster resources that are available for Mac OS X and Mac OS X Server. In the SciTech lunch yesterday, someone asked about Apple providing an MPI stack and whether they thought that was important.

I think one of the interesting things you'll see here is that this is an area where we actually have a wealth of solutions to choose from. And we actually prefer having a large number of excellent open source and third-party solutions available. So if we look at some of the examples, one of the keys of any cluster deployment is DRM software, Distributed Resource Management.

And we have a number of solutions to choose from: Platform LSF, Sun Grid Engine, OpenPBS, and PBS Pro are some of the top examples. High-performance computing tools. So again, in the MPI stack area, MPICH, MPI/Pro, LAM/MPI, Linda, Paradise, and Pooch from Dauger Research are all great examples of solutions in this space.

They're all available for Mac OS X. Grid computing tools: GridIron Software, a really excellent distributed application resource. The Globus Toolkit was actually ported to Mac OS X by the BioTeam, and a port is available from them. Science and research applications: you'll see a demo of Grid Mathematica a little later in this session, an excellent tool for a number of kinds of computation. Life science applications: TurboBlast, TurboHub, and TurboBench from TurboWorx.

The Bioinformatics Toolkit, which is a set of life sciences applications that have been ported to Mac OS X, kind of a single-click, double-clickable installer for Mac OS X. And iNquiry, those same solutions wrapped up into a really easy deployment solution. You'll hear more about that in a minute. So, with that said, I'd like to introduce our next speaker, Michael Athanas from the BioTeam. Michael is a principal investigator and a founding partner, and is going to talk to you a little bit more about accessible clustering.

Michael. Thanks. Does this work? Again, my name is Michael Athanas, and I'm a scientist and co-founder of a bioinformatics consulting group called the BioTeam. One of the interesting things I learned from one of my hosts here at the conference is that one out of nine attendees is a scientist. Is that true? Scientists here? I'm actually quite impressed at the way this platform seems to resonate with the scientific community. It's very well matched.

Anyway, So what motivates me as a bioinformatics consultant in the morning is how to take advantage of boundless computing. In this presentation, I'm going to briefly talk about some of the pressures in life science computing, just expand a little bit beyond what Doug talked about. Briefly talk about computing solutions with emphasis on clustering, and then talk about instant clustering. I'll explain that as we go along.

Quickly, the BioTeam is a group of scientists focused upon delivering life science solutions. The group, what makes us somewhat unique is that we're somewhat vendor agnostic. We work with all sorts of platforms, including Apple platforms as well. The principals of BioTeam have been working together on several projects over the past few years, and most recently a great deal of projects with Apple. This list here shows some of the clients that we've worked with, and actually I've bumped into representatives of some of these organizations at this conference as well.

So again, what motivates me as a bioinformatics consultant is to get my clients to think about what you could do with boundless computing. That is, what if CPU was not a limitation in modeling and simulation? What if you had very fast access to terabytes of information? And what's the most appropriate way to visualize the knowledge that is derived from these data analyses? In doing so, the computing has to be accessible.

And in order to do that, some level of abstraction has to be defined. Apple seems to be great at this: in terms of, for example, the Finder, the emphasis is on the user experience as opposed to the nuts and bolts of the computing behind the scenes. The same is true for enabling scientific computing as well. It's not about the computers, it's about the applications and pipelines involved in the scientific computation.

So what is important from a scientist perspective is quick data access, reliable fast execution, and application interoperability. What is not so important in order to carry out science is the nuts and bolts of the computing, you know, the details of the storage, how the storage is laid out, or even what type of processor is used. It really doesn't matter, and it shouldn't matter from a scientific perspective.

[Transcript missing]

computing problems and clustering can be an appropriate solution. There are benefits to clustering, as Doug pointed out, and perhaps some arguments against clustering. And I'm going to augment what Doug was talking about in terms of why clustering. Well, I think one of the most compelling reasons to go for clustering is scalability, because in terms of scientific research, it's very difficult to forecast what you're going to be doing tomorrow. Clustering inherently is a blueprint for growth. If you architect the system properly, you can increase your computing power in step with your computational need.

Another compelling reason is price performance of commodity hardware. We've seen that, you know, with the announcement of the G5 dual processor box for only $3,000, That same computing power, if I wanted to buy it four or five years ago, would probably be tens, if not a hundred thousand dollars for the same thing.

Computing is getting ridiculously cheap. If you look at the curve, they're going to be giving it away pretty soon. So the trick is, how do we take advantage of it?

[Transcript missing]

Construct your architecture based upon the scientific demands of the applications in your infrastructure. There are many parameters that you can tweak from a hardware perspective. For example, as Doug was pointing out, there are network alternatives.

That decision is based upon the applications that are within your workflow. Also, there are storage options as well. Do you take advantage of local caching on the individual nodes, or do you use some kind of network or SAN-available storage? And even the processors: some applications may take advantage of different processors or accelerators better than others.

And I think one of the interesting flexibility issues that we're seeing in the industry, and a compelling reason, is that clusters kind of transcend the single-vendor solution. You're allowed to build a cluster of components that are optimal to your workflow, as opposed to a single vendor providing everything, where some of the components may be ideal and some may not.

Reliability is another reason for clustering, and this may not be so obvious. Doug pointed this out a little bit, but with careful architecture of a cluster, careful identification of single points of failure, your cluster can have extremely high availability, high uptime. The architecture can be such that if a compute element dies, you just pop it out like you would replace a light bulb in your home. You wouldn't have to shut down the grid of your home and unsolder the light bulb and solder a new one in. You just unscrew it and plug it in.

But clustering is not appropriate for all applications or all workflows or types of scientific research. Not all applications map to loosely coupled architectures. An example of this could be relational database engines. Another reason why clustering may not work out very well is management complexity. If you don't pay careful attention to the initial architecture, the effort required to maintain your system may scale with the number of elements within your cluster. That's a sign of failure.

I think one of the more compelling reasons why clustering can be difficult is user application complexity. You may have a really good application that runs great on a single processor, but you need a thousand times that. How do you break that up and run it in parallel? It really depends upon the application and the tools that were used to construct it. There isn't a silver bullet that you can use to automatically parallelize an application. It still takes special skill to do that.

Reliability is also on this list because if the architecture is not correct and you don't pay attention to single points of failure, it may be very difficult to maintain. Also, achieving high utilization seems to be an important consideration. It comes back to the user application. If you don't get the degree of parallelization necessary to utilize the cluster, then you're kind of wasting all your compute elements.

This is a different topological view of what Doug showed earlier, in terms of what we call the portal architecture. And again, this is a great way of abstracting the computing resources, both from an administrative perspective as well as a user perspective. Neither administrators nor users access the individual nodes within the private subnet on which the cluster elements reside. And because you do this, the nodes become anonymous. That allows you to replace them if something goes wrong. Let's skip this.

So I mentioned scalability. Again, I think scalability is a crucial issue that clustering can address. But there are many characteristics of what scalability is. Scalability in terms of quantity of data that you're distributing on the cluster. Scalability in terms of number of users that are hitting the system.

All those have to be addressed in terms of the architecture. I think one of the more significant components of achieving scalability is fault tolerance. How is your system going to be aware of some kind of adverse event within your cluster? The flip side of fault tolerance is automation.

How are you going to respond to that adverse condition so that you can continue to process so you don't need extensive management or monitoring capability to ensure completion of your workflow? Okay, so I wanted to contrast two different approaches for computing. The mainframe SMP monolithic approach compared to the clustering approach. Again, one thing that's going against the clustering approach is application complexity.

It's definitely more difficult to make full usage of a cluster than if you had an SMP type machine. However, what's going against the mainframe SMP approach to computing is the upfront cost. A mainframe type system with comparable compute power of a cluster can be about 4 to 20 times more expensive.

As I mentioned, a cluster architecture can give you better scalability, but countering the upfront cost of the mainframe system is the total cost of ownership of maintaining that system. If you don't architect your cluster correctly, then it can be extremely expensive to maintain. So if you can address the application complexity and the management complexity, then the clustering solution can be very compelling.

Okay, I'm going to switch gears a little bit and talk about the BioTeam's iNquiry. iNquiry won an award last night, an Apple Design Award in the server category. So what is iNquiry? The concept behind iNquiry is instant scalable informatics: just add hardware. So the concept is to provide a fully functioning informatics solution, and you start out with an empty cluster. And the fun part is that we can do this in about 20 minutes.

So what we do with iNquiry: we've deployed many clusters, many types of clusters, but there are some common denominators that we see in our deployments. We've taken these best practices in terms of network configuration, OS configuration, various optimizations, deploying the right administration tools, monitoring tools, but we don't stop there.

If we stopped there, that would be a fine cluster that you could use from an IT basis, but we go beyond that. Our goal is to enable the scientists, in this case, the bioinformatics scientists. The iNquiry cluster is loaded with more than 200 open source applications, which are all cluster enabled, and we provide a consistent user interface, a web interface, to all these applications. And on top of that, we deploy about 100 gigabytes of genomic data. So as soon as the cluster is up, you're ready to fire.

So the idea is to go from the many-computer concept to a single virtual computing resource that is usable by a scientist. But we go beyond that. Like I said before, we don't want to necessarily train scientists to become computer scientists. We want to empower them by abstracting the command line into something that is more accessible.

So iNquiry is an orchestration of many open source tools and utilities. Just to mention a couple of the components behind iNquiry: the first one is Pise. Pise is a very cool tool from the Pasteur Institute. The heart of Pise is essentially a collection of XML documents describing a bunch of command-line bioinformatic tools. Starting from this set of XML documents, we can render an interface, whether it be a web interface or a web services interface.

So now that we've presented the application, we connect the execution of that application to the cluster using Sun Grid Engine, or we can use Platform LSF. And that's all completely abstracted away from the user and integrated within iNquiry. We've also deployed several monitoring and administrative tools that are commonly available in the open source domain.

For example, this is Ganglia, and it provides a very nice snapshot of your cluster and allows you to drill down to get more detail of the health of various nodes within your cluster. In addition, we provide another perspective of your cluster from the load management system. So how are jobs running on your system? Who's running jobs? And what jobs are pending? And everything from the user job submission perspective.

Because we're using Pise, we have a great deal of flexibility in terms of how the application interfaces are presented to the user. For each application within iNquiry, we provide two interfaces. A simple view, which gives you just the bare bones of what you need in order to execute that application. In addition, we provide an expert view with all the bells and whistles of that application, along with complete documentation for each of those flags, for every application within iNquiry.

Also within iNquiry, we manage results that are generated from the various applications. Results calculated at different times are accessible and can be retrieved and either piped into other applications or just reexamined. Okay, so one of the fun things about iNquiry, like I said, is that we can deploy this within 15 or 20 minutes. And this is iNquiry: we deploy it on an iPod.

Okay? And the idea behind that is, first we take the iPod, we plug it into the head node, and we mount the iPod and run an application called the Cluster Configuration Tool, as shown here. And this tool provides a way of setting the number of nodes in the cluster, external IP addresses, just the external things that are needed to describe that cluster.

Step two in configuring your cluster is then you boot off the iPod from the head node. And that takes about five or six minutes. And when that's done, images for the entire cluster are loaded onto the head node. You're essentially done with the iPod at that point. And the third step is to boot each individual node of the cluster from the head node.

And that can take anywhere from a few minutes up to ten minutes, depending upon the network that you've deployed within the cluster, as Doug pointed out. But that can be done in parallel. So when you look at the aggregate, it takes about 15 to 20 minutes before you have a fully working cluster. And that's it.

Congratulations on your award. I'd now like to introduce Theodore Gray, Director of User Interfaces for Wolfram Research, to talk a little bit about Grid Mathematica and its solution. We have the demo machine. Mathematica is a presentation tool, so we use it instead of Keynote, and I just typed in my presentation. Isn't that typical? Okay, so the first thing I should say is that this is actually not my talk. Ordinarily it would be given by Roger Germundsson, our director of R&D, but he couldn't be here, so I'm giving the talk, which should be interesting.

At least I didn't have to prepare it. That's one plus. So Grid Mathematica is basically the sort of grid version of Mathematica. Mathematica itself is a desktop application. You can buy it. It's a very general purpose programming language and system for doing mathematics. There's a product called Network Mathematica, which is basically a network license server.

And there's an application package called the Parallel Computing Toolkit, which is an application pack that lets one Mathematica session manage multiple ones on a network. And then Grid Mathematica is essentially a marketing concept of those two together, and it's cheaper per node than the regular copy of Mathematica. But you can actually put together those different elements separately.

So basically, one of the goals of Grid Mathematica is, like has been mentioned several times before here, to try to abstract as much as possible the details of the configuration of your cluster and sort of what brand of computer it is and things like that. So from a system point of view, you think of a cluster in terms of you have some processors, you start processes on them, you schedule them, and you exchange data.

In the Mathematica view, you think of having kernel processes, Mathematica kernel, that's what we call the computational engine, and you have expressions in the Mathematica language that you want to have evaluated. And the sort of grid clustering element is to distribute those processes, those Mathematica expressions, to different kernels running on a cluster.

And, you know, we try to be buzzword compliant. So we have the same sort of general arrangement, where you have a head machine, which is the master, and you have these multiple ones which are not accessible from the outside world. And you have Mathematica handling that communication strictly between Mathematica processes, not involving any other sort of resource management software.

The system is written entirely in top-level Mathematica code, which means it's completely machine-independent, completely platform-independent, and you're not restricted to any of the sort of C data types or anything like that. You can use arbitrary Mathematica expressions, which could be numbers or arrays of numbers, strings, but also structured symbolic expressions that represent either mathematical objects or protein structure or whatever. It's sort of a general thing.

So it's not just for sort of numerical or data analysis type things. You can do, well, abstract mathematical sorts of things too. The communication between processes is through MathLink, which is our sort of high level communication protocol. It uses whichever of the underlying protocols you'd like. I think in this case we have it configured to use TCP between nodes, but if you have different, you know, we have devices for various different kinds of networks. So you can have either relatively tightly clustered things or you could have them on, you know, distant, more loosely clustered things.

Our,

[Transcript missing]

and you can do that. Which I have to admit, I'm not an expert in cluster computing, but it sounds great. And it also, the sort of concurrency controlling structures deal with a kernel that dies or doesn't come back or whatever. You can sort of shuffle things around, which we'll also see in a minute.

Okay, so let's actually do this. We're gonna start Mathematica here, yes. So these evaluations are running on the head machine. It just told us that the name of the head machine is XSERV0. We'll kind of use this machine name as a way of telling where a calculation is going. And this is an OS X version. So this is kind of some configuration which took most of yesterday to get right, but that's because I didn't know how to do it. The little bit about plugging in an iPod and it's automatic, that would be great.

And so now what we're going to do is actually launch all, we're launching 10 kernels, and that's because there's five machines with two processors each, so we're kind of putting one process on each computer. So that's finished. Now we're going to do a little thing. This command says, take this expression and evaluate it on each of the clients.

So you see it's returned XSERV 1, 2, 3, 4, 5 twice, and each of them is a Mac OS version. So I should note at this point that from here on out, absolutely nothing would be different about any of these demos in any way, shape, or form if this had returned a list of, you know, Sun and PC or Linux or, you know, anything else. There's absolutely nothing machine-dependent or hardware-dependent or platform-dependent or anything.

Which, you know, it's sort of, it's a nice advantage because if you've built some cluster and then you find out that, oh, you built a big Linux cluster, but now you can get Macs that are cheaper per, you know, per CPU cycle, you could just add some Macs to it or vice versa.

So let me show you some simple examples of how you actually use the parallelism. So this is just running on the local machine, and this says run the same command on this particular node, number one, and this means run it on all of them, so we can see. And obviously you could put something more interesting, you could do 100 factorial on each one and get that back.

Um, For those of you who are familiar with Mathematica, this probably will make somewhat more sense than to those who aren't, but this is just showing some basic Mathematica commands. Table builds a table of expressions like this. And here we're building just sort of to demonstrate a table of machine name always on the same machine.

And now we're going to do it, farming that out to the processors. Now you notice that it's used the same machine over and over again. And that's actually because, as I discovered just a few minutes ago, this command is too fast. It's done so quickly that the load management basically says it's done already, so we'll just do it on the same machine.

But if we slow this down a little bit, if we put in, let's say, 1000 factorial and suppress the output... it's still too fast. Thank you. All right, so now, you see, it's distributing a little bit more because the processes are not actually finishing instantly.

Okay, so another function is map. Map takes a function and applies it to each of the arguments in a list. We do the same thing here. And again, it's kind of boring because it's just too fast. But you get the idea that many of the sort of programming constructs that you have in Mathematica for building tables or for applying functions to data can be parallelized very easily.

And as long as there isn't, you know, a data dependency between the instances of that function, it'll just work. And there's a host of other sorts of commands that are built in, dot products, inner products, animations, plotting, things like that, that are automatically -- or where there are prepared, sort of parallelized versions.

This is an example of how you distribute data and code. This is a Mathematica program. Actually in Roger's version of the talk, it just added up the numbers 1 through n. I thought that was silly. What I had to do is add up the numbers 1 through n and then add the process ID and then take the factorial of that just so we would get a better number. Also, it proves that I have great confidence in the system because I have no idea what the process IDs are going to be.

So we execute this in the head machine. We've made that definition on the head machine. And now we execute it. And we get a number that is involved with the process ID in some way. And now what we're going to do is export this definition. And that command took the definition that we made in the head machine and distributed it to all the nodes.

Which you can do because it's not a C program. It's not something you have to compile. It's a Mathematica expression that can be interpreted by the Mathematica interpreter. And now we'll go and evaluate this machine name command. That will let us see which one each one executed on. And we do that. And so now we have the ten results. And you'll see each number is a little bit different because it had a different process ID.

So, how are we doing for time? Let me skip these. And this one, this is basically showing the lower-level operations, where rather than just do a map, you can actually set up a queue. I'm not going to go through the details here, but you basically tell it to queue up these processes, and then you can ask it to wait for certain ones to finish, or you can wait on a list, in which case whichever one finishes first will return, sort of like a select call if you're familiar with Unix. And that allows you, that's sort of a foundation on which you can build your own, more sophisticated load balancing and process managing things.
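For readers who don't use Mathematica, the queue-then-wait pattern being described maps closely onto a generic futures API. Here is a rough Python analogue using the standard concurrent.futures module; it is not Wolfram's API, just an illustration of the same map-over-data and wait-for-whichever-finishes-first ideas:

```python
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

def render_frame(i):
    # Stand-in for an expensive per-frame computation, like the fractal demo.
    return sum(k * k for k in range(200_000)) + i

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=10) as pool:   # roughly: launching 10 kernels
        # Map-style parallelism: the analogue of the parallel table/map commands above.
        frames = list(pool.map(render_frame, range(20)))

        # Queue-style parallelism: submit work, then wait for whichever job finishes
        # first -- the select-like behavior described above.
        futures = {pool.submit(render_frame, i) for i in range(20, 30)}
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        print(len(frames), next(iter(done)).result())
```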

So, now here's an example. And as I mentioned, this is actually Roger's talk, and I don't actually know that much about parallel computing, but I thought I would make a little example and sort of whip something up for the demo to see if I could do it. And what this does is it recreates the keynote demo fractal that I used a couple days ago, and this is the code for that fractal.

And so, here, this will run this example now just on the head machine as a single process. And you see it goes through, and this is a little sort of graphical animation progress monitor thing that I wrote a while ago. And as you can see, it's kind of poking along. You may notice it's not much slower than the G5 demo. That's because I'm not computing as many points, and because it's also not doing the bignum calculation in the background. It's not, in fact, the case that this is as fast as a G5.

And there, it's put the animation together. So, now let's run it on the grid. And for those of you who can see the lights, look at the lights. Here we go. It's very important always to watch the lights on your cluster. And here we go. Now, notice the first frame, you don't get any faster because it takes, you know, they're all doing it at the same time. But then once it gets going, you basically get ten at a time.

So if I'm reading my clock right, I really need to hurry. So basically the advantages are, it's much, much cheaper than buying separate copies of Mathematica, if you buy the nodes. It works in a completely open-ended heterogeneous environment, absolutely no restrictions at all as long as it runs Mathematica.

We have sort of high-level symbolic representation of the parallel structures and the parallel control that you need to do, or the controls to get good performance. It's pretty easy to take existing code, and as long as it's suitable for parallelization, it's easy to do that. And you can do it, you can do this sort of in the rich world of Mathematica rather than the sort of more limited, you know, worlds of C and Java or whatever, where you have to do a lot more sort of by yourself. And as a prototyping environment for parallel algorithms, of course, it's very nice. And I guess that's about it. And we all have to remember to close these, otherwise we leave things running. Thank you.

Okay, well, I would like to basically wrap up by pointing to quite a variety of resources for more information. So, first of all, there's been a lot of tracks and sessions that are relevant to this topic. And, unfortunately, a lot of them were earlier in the week. So, hopefully, you had a lot of opportunity to go see some of these sessions this week on the Enterprise IT track. If you don't, I encourage you to watch the videos since there were some excellent sessions.

There will be a session tomorrow on deploying Xserve RAID in the afternoon. I'm sorry, deploying Xserve tomorrow afternoon and Friday afternoon on deploying Xserve RAID. I encourage you to attend those sessions. On the developer side, there's some excellent sessions on development tools for the Unix layer and performance tools. And those were, again, either earlier today or yesterday.

But, again, I encourage you to watch the videos if you weren't able to attend those sessions. Who to contact? So, again, for information or follow-up, you're welcome to contact me. Again, Doug Brooks, my email is up there. Michael and Theodore have contacts as well. And Skip Levens, who's our server technology evangelist, wasn't able to attend the session. But from a developer perspective, he's your contact from a server technology point of view. So, he's a great person to talk to.

Additional resources. So we gave you a list of solutions earlier. I wanted to point you to a key page that we recently put up, about two or three weeks ago, which is the Compute Cluster Solutions page. If you go to the apple.com/server page, you'll find quite a number of solution-specific pages, one specifically on clustering, which highlights all of the key solutions that were referenced today. Again, the same page can provide product information and, of course, information on both the BioTeam and Grid Mathematica.

This is an area where there's a wealth of community support from mailing lists. And so I wanted to make sure you're well aware of several key mailing lists that Apple and some third parties host. Apple hosts the SciTech and the Unix porting list, which are both very relevant to the cluster space. And bioinformatics.org has some excellent mailing lists for BioClusters and BioDarwin development under Darwin.