
WWDC04 • Session 618

Distributed Computing Made Easy with Xgrid

Enterprise • 43:35

Xgrid makes it easy to use a group of distributed Macs as your own personal supercomputer for performing CPU-intensive calculations. This session will cover how scientists, animators, and developers can use Xgrid to distribute their applications. We describe the different protocols, developer APIs, and services provided by Xgrid, as well as provide guidelines for integrating it with MPI and other clustering solutions. If you are developing software that can easily be split into multiple parallel tasks, you won't want to miss this session.

Speakers: David Kramer, Charles Parnot

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and may contain transcription errors. We are working on an improved version.

Good afternoon. This is 618 Distributed Computing Made Easy with Xgrid. And I am David Kramer. I'm the engineering manager for the Xgrid group. We're having a little bit of fun with the names down at the bottom. There are two of us in the Xgrid group, David Kramer and David Kramer. There will be a test at the end. Don't get us confused. So this is the first session after lunch. We're going to do a little bit of an exercise. How many folks here have downloaded and/or played with Xgrid? Raise your hand.

How many folks have not downloaded or played with Xgrid? OK, so we've all just done a distributed computing problem, an embarrassingly parallel one, and we'll get to that. Thank you very much for the input. So what we're going to talk about today-- Xgrid is Apple's solution for cluster computing. And we believe that it helps make distributed computing easy, although full disclosure, "easy" is a little bit of a relative term, of course, but significantly easier than it's been in the past.

And most importantly, you can today start using the technology preview of XGrid, as well as what is going to be coming with Tiger, to start grid enabling your applications and your workflows, whether you're an engineer or a user of the technology. Today we're going to talk about the system architecture and a general overview of the capabilities and how XGrid works. We'll talk about some new APIs that we'll be introducing with Tiger, the Grid Foundation APIs, and then discuss how you can use those in your applications, as well as some of the other components of XGrid to grid enable your workflow. So let's get started with the overview.

We had a couple of goals with Xgrid. So we wanted to make distributed computing painless and easy. And by easy, we mean easy to install, configure, use. You obviously have to have enough of a brain to understand some of the concepts, but you shouldn't have to have that much brain and fingers to go through all of the pain of setting up 10 different computers, 1,000 different computers. We want to take care of the hard things for you. Secondly, it should be non-intrusive. One of the goals we have with Xgrid is to be able to do what we call desktop recovery.

That is to take advantage of some of the unused CPU resources that are sitting around your environment. And if that caused your friend or colleague or student's computer to crash, or if they noticed that there was something wrong or the behavior was a little bit different, that would be a bad thing. So we want to make sure we don't mess that up, obviously.

And we want to solve a range of problems with a range of architectures. We sort of glibly say from Beowulf to SETI at home. That is, from pretty massively parallel kinds of problems to very highly distributed problems. And we'll go through a couple of examples of that.

Our target customers for these products are really scientists, engineers, and creative professionals. Those are the people who really can take advantage because those are the people who have the kinds of problems that take a long, long, long time to execute and tend to be somewhat parallelizable. And again, we'll talk about those problems in a moment. In support of those folks are typically the software developers and system administrators who need tools to help them solve their problems.

So hopefully you fit into one or perhaps more of those camps. If not, hopefully you'll learn something here that will be just fun and interesting to use anyway. So we solve two primary classes of problems with Xgrid. First of all, what we call embarrassingly parallel problems, problems that are fairly easy to divide up, where I have the same executable that I just want to run on lots of different machines with slightly different or maybe vastly different inputs. And then I want to collect together the outputs. Maybe I'm looking for a single output.

Sort of a needle in the haystack kind of problem. Or I want to collapse together all the different outputs. Or I just want to go off and process something and get the result back. A good example of that is like I have a batch image filter and thousands of images I want to apply that same filter to. Maybe I want to make a thumbnail, for example, or something like that.

I can use the services of Xgrid to distribute that work across a large number of computers so that I don't have to sit and wait for it to get done. So that's what we call embarrassingly parallel or embarrallel problems. A second class of problems that Xgrid is good for are tightly coupled problems.

These tend to be more in the science domains, physics and simulation kinds of problems. These are problems that you have to think really hard about and how to parallelize your algorithm. And then use tools, things like MPI, we'll talk a little bit about that later as well, to enable these massively parallel communications between different nodes in the cluster.

So this is not sort of I have an image and I want to just transform it. This is I have a simulation problem where this proton is hitting that proton is hitting this proton is hitting that proton. And I want to make sure that I get the math right across a large number of computers. We can help you there too. That's the message.

So how do we accomplish that? What's the basic functionality that you'll get as part of Xgrid or that you do get now in the TP2? First and foremost, we collect all the nodes and group them together into a cluster. We should back up for a second. We use the term grid and cluster somewhat interchangeably. In the sort of more in-depth grid cluster world, there are slightly different definitions that may be important to you. In this case, we're just talking about perhaps different kinds of computers.

In any case, we gather them all up together and identify them, bring them together in one spot so that you can start adding work to each of those computers. And then we identify a queue of jobs and perhaps subtasks of each job that we're going to be doling out to each of those computers.

We manage and monitor the availability of those computers and then, of course, dole out the tasks to each of them as the fastest one becomes available. David will talk a little bit more about scheduling. And we make sure that the right executables are on the right nodes. So that's an important task that's often overlooked. It's interesting if you actually step back and look at what people who run clusters do.

A very relatively small percentage of their time is spent doing the interesting parallel cluster work. And a very large percentage of their time is doing just almost mind-numbing system administration because I got to do it on node one and node two and node three and node four. So we really want to help you with that.

And one concrete thing we do right now is distribute, if you want it to, the executables around and the data sets around to all of the nodes that are going to be doing the computation. And then of course we gather up the output from the results from each of those nodes and bring it back to the client machine. And we'll talk a little bit more about how that works too.

So history of how we got to this point. As you can see here, we released a technology preview one, and then fairly recently, a technology preview two of this. We've been trying to develop this project somewhat in the open and gather feedback from the community, because many of our customers have very specialized kinds of tasks, and we want to make sure that the tool is applicable to those tasks. Today we're at WWDC, and we'll be talking about features of the Xgrid that will be part of Tiger and Tiger Server, especially.

[Transcript missing]

Hi, so I'm David Kraemer. I'm an Xgrid engineer. So the Xgrid system architecture has three tiers. There's a client, a controller, and agents. And the agents are the ones that do the work. The client is the one that has the work to be done. The controller is in the middle to make sure everything's going the way it's supposed to go. So the way it works is the client submits a job to the controller.

The controller takes this job, splits it up into tasks. This isn't really that fancy or exciting. What's happening is that a job is just a list of tasks. So it splits it up and starts scheduling it on the available computational resources, the agents. So the agents start to compute the tasks, and as they finish, the results go back to the controller. And the controller collects them all and sends them back as the job results to the client.

So why have three tiers? Traditionally you might imagine having two tiers, where you have a client and then you have the computational agents. And you send out the work and you get the results. But there's situations in which that's not the way you want it to work, especially if you have clients that aren't always online, for example laptops, and you want to submit a job and it's going to take a day to run. You don't want to have to leave your laptop plugged in, you want to take it home.

So the controller is there to keep track of what's going on to make sure it all gets done and to collect the results for you. When you come back the next day, you can get the results. It also allows you to have multiple agents, multiple clients, and they can all find each other by way of the controller.

It also supports a variety of distributed computing styles. So in one case, you could have a dedicated rack of Xserves, and they just sit in the closet, and they're always being used for computational tasks. Or you might just have some iMacs and some laptops sitting around that sometimes are plugged in, and sometimes they're used by their users, and sometimes they're just sitting there idle. And so you can recover these resources at night or at lunch or whenever they're not being used.

So the first tier is the client tier. You can either create an application using the Cocoa framework and have that be a client, and with TP2 there is an application called Xgrid that is that kind of client. Or you can use the command line tool, Xgrid, to submit jobs, monitor the grid, and get your results. So that's basically all there is to it, submitting, monitoring, and retrieving.

The agent on the other side of the three tiers is a background daemon. It is configured with preferences. There's a plist file. There's some scripts to start and stop the agent. You can run it in dedicated mode, like on the Xserves, or you can run it in screensaver mode. In that mode, it only accepts computational tasks when the computer is idle.

So the controller in the middle is also a background daemon. It's a server. It listens on a socket and accepts connections from agents and clients. It handles all the resource management and scheduling, monitors the agents, accepts the jobs, splits them up, submits them to the agents, collects the results, and returns them to the client.

There's also a limited controller built into the client framework. This means you don't get the benefits of three tiers now. You've just reduced it down to a two-tier system, but what you gain is you don't need a dedicated controller. So you can just walk up to a network that has some agents on it and start submitting jobs and getting results. But again, if you need to go home and take your laptop home with you, you're not going to be able to continue doing the work because there's no dedicated controller.

So the controller does all the scheduling, as I said. And so the scheduler knows about files, jobs, tasks, and nodes. It schedules the jobs as they come in, they're in a queue. And it takes the one off the top of the queue, looks at what tasks it has to be done, and sends them off to the fastest available computer. It's also fault tolerant, which means that if one of the agents goes offline while it's doing the work, the controller will notice this and will reschedule the work on the next available agent.

The scheduler also handles dependencies, which means that jobs or tasks can depend on arbitrary URIs, universal resource identifiers. Each job, and each task, and each data set is given a unique resource identifier. And so in the first example here, you can have a job depend on another job.

So you have a second job here with two tasks, and none of those tasks will run until all three tasks of the first job have completed. In the second case, you have one job, but it has these five tasks, and the first task runs. This might be a preprocessing task. And then the next three tasks will run once that one completes, and those are doing the work. And then the final task won't run until all of the rest of the tasks finish, and that could be your post-processing stage.
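As a rough sketch of that second example (one preprocessing task, three worker tasks that depend on it, and a post-processing task that depends on all three), the dependency graph could be expressed as per-task dependency lists. The key names below ("name", "dependsOn") are purely illustrative stand-ins, not the real TP2 specification keys.

```objc
// Illustrative only: these key names are made up to mirror the dependency
// graph described above, not the actual Xgrid specification keys.
NSArray *taskList = [NSArray arrayWithObjects:
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"preprocess", @"name", nil],
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"work1", @"name",
        [NSArray arrayWithObject:@"preprocess"], @"dependsOn", nil],
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"work2", @"name",
        [NSArray arrayWithObject:@"preprocess"], @"dependsOn", nil],
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"work3", @"name",
        [NSArray arrayWithObject:@"preprocess"], @"dependsOn", nil],
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"postprocess", @"name",
        [NSArray arrayWithObjects:@"work1", @"work2", @"work3", nil], @"dependsOn", nil],
    nil];
```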

So the Xgrid software architecture looks like this. You've got the daemons and the applications up top, and they're all built on the XG framework and the XG protocol. And that, in turn, is built upon the Blocks Extensible Exchange Protocol, RFC 3080. And all of this is built on top of BSD sockets, the core foundation libraries, and the foundation framework.

So in Xgrid, various aspects of the communication that are important to note are that the controller advertises via Rendezvous. And the controller is the only tier that actually accepts connections. It's the server. So the agents and the clients sit around browsing for services, and when they find a controller, the agent automatically connects to the controller that it's configured to connect to.

So the agent and the client can be configured to either connect to a Rendezvous service name, or you can just use standard host names or IP addresses. And so you can do this on a local subnet with Rendezvous, or you can do this across the Internet. Another thing to note is the agents and the clients leave the connection open after they've connected to the controller, and this allows for asynchronous peer-to-peer communication. So any time an event occurs to the job, when the job completes, the client is immediately notified. There's no polling.

So the controller stores and streams the data from the agent and to the client. So this means that as the agent is running the task and generating output, that gets streamed back to the controller while the task is still running. And that is then, in turn, streamed back to the client. So you can start getting results before your tasks have completed.

And then one more note about MPI that is a little bit tricky is that when we run an MPI task with XGrid, you don't know ahead of time which computers you are going to run on because that's managed by the controller. So you just submit the job. And then the controller schedules it.

And there's some communication that occurs between the nodes at this point to find each other. And the master node collects all the IP addresses and saves it to disk and then runs the actual MPI executable which uses that configuration file to connect to all the rest of the processes that are running on the other agents.

So there's two models for security in Xgrid. The first is the ad hoc model, and this is what you see in the technical preview two. It's password-based mutual authentication. So if you want to have someone join your grid, you can tell them, or they can tell you the password for their agent, and then you can allow your controller to connect to it.

There's also, the second model is the managed security, and this is where all of the components, the client controller and the agent, are bound to an open directory administrative domain. So there's the same set of users on all these computers. In both of these models, the agent and controller can be protected. What that means, what we're protecting against here is unauthorized use.

So the agent can be configured to only allow connections from trusted controllers, and the controller can be configured to only allow connections from trusted clients. And in that way, you are assured that you're not doing work for someone who's not authorized to use the resources.

So in the ad hoc security, the communication occurs in the clear, although it may be encrypted with SSL. But at no point are passwords sent in the clear. There's a two-way random protocol used to make sure that it's unintelligible to packet sniffers. The agent runs tasks as an unprivileged user because there's no sense of shared users here.

So what this means is that if you're running a computational task on this agent in an ad hoc security mode, that you don't get any connections to the window server, you don't have a home directory, and so you only get world access to the files on the disk. So this means you can write into /tmp, you can read various public system configuration files, but you don't have the ability to read private files in home directories.

In the managed security case, all the components, as I said, are bound to open directory, and so the agent can run tasks as a privileged user, and this would be useful to run the tasks as the person who initially submitted them. So this requires delegation of credentials, but it allows you to access home and network directories as the user who submitted them, so you don't have to make all of your files world readable and world writable to have them be used in a distributed computation.

So next, I would like to talk to you about how to actually develop using the Xgrid APIs. So first, there's the Xgrid command line tool that you can use. And you can use this from shell scripts. You can use this from your applications in Cocoa using NSTask, any way you want to do it.

What you do is you factor your computational code into a command line executable, and then you use Xgrid to submit this--to submit a job. You can include the executable with the job if it's not already installed on the remote computers. And then you can collect the results. And you can do all this with the command line tool, and I'll show you an example in a moment.

And so that is how the Blender example that David talked about was done. You can also integrate Xgrid with your application, so you don't have to write a new application or write a shell script. You can just link in the Cocoa framework and then use it to distribute the tasks when the grid's available and monitor the status of your work and retrieve the results.

So as you've probably caught on to now, the lifecycle of a job is that you submit it, you monitor it, and you get the results. And that's the theme of the talk here today. So the command line example here. First, you submit the job. And so in this case, we're running the cal program, which prints out a calendar. And we're saying we want June 2004. And so you submit it, and what gets returned is a job identifier.

And with that job identifier, you can retrieve the status of the job. So in this case, we're getting back the status and we see when the job started, when it was submitted and what its status is. In this case, the job has already finished. It didn't take very long. It is just putting out a calendar after all.

And so we'd like to retrieve the results, which we do like this. And so again, you use that same identifier that was returned in the submission, and it just prints out the results to standard output here. And then you delete the job. The job sticks around in case there's an error retrieving the results; it doesn't get automatically deleted. You have to explicitly delete it yourself.
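Since the command line tool can also be driven from Cocoa with NSTask, as mentioned a moment ago, here is a minimal sketch of that approach. The helper below is hypothetical, the controller name is a placeholder, and the xgrid flags in the commented example follow the Tiger-era syntax, which may differ from what TP2 accepts.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper that shells out to the xgrid command line tool and
// returns whatever it prints. Assumes the tool is installed at /usr/bin/xgrid.
static NSString *RunXgrid(NSArray *arguments)
{
    NSTask *task = [[[NSTask alloc] init] autorelease];
    NSPipe *output = [NSPipe pipe];

    [task setLaunchPath:@"/usr/bin/xgrid"];
    [task setArguments:arguments];
    [task setStandardOutput:output];
    [task launch];
    [task waitUntilExit];

    NSData *data = [[output fileHandleForReading] readDataToEndOfFile];
    return [[[NSString alloc] initWithData:data
                                  encoding:NSUTF8StringEncoding] autorelease];
}

// Submitting the cal example from the talk. The flags are an approximation of
// the Tiger-era syntax, and "myController" is a placeholder. The returned text
// contains the job identifier, which you would then pass back to the tool to
// check status, fetch results, and finally delete the job.
// NSString *reply = RunXgrid([NSArray arrayWithObjects:
//     @"-h", @"myController", @"-job", @"submit",
//     @"/usr/bin/cal", @"6", @"2004", nil]);
```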

So the Grid Foundation APIs contain a number of classes, and I'll talk about them in detail in a minute. But this is the big list. The one you notice here is XGResource, which is the base class for a lot of these other classes here. The XGResource classes all represent a remote resource that you're monitoring with your application. So they act as proxy objects. So they aren't the actual job, they aren't the actual node that lives in the controller. They are your sort of view onto those things.

So, using the client APIs, what you begin with is you connect using the XG connection object, and you authenticate using the XG authenticator object. And then you create a job first by creating a specification of what that job entails, and that would be the command and the arguments and what files it depends on.

And then you create a submission using that specification and submit the submission. And then as that -- if that submission succeeds, you receive back an XG job object. You then can monitor the job using action monitors and the related object, the update monitor. And so you use these to get your asynchronous callback to find out when the status of the job has changed.

And then finally, when you're ready to download the results, you use the XG file download object. which allows you to retrieve the results. So first of all, you want to connect and authenticate. So use the connection, and we use a subclass of XG Authenticator in the technical preview called the Two-Way Random Authenticator.

So first you create the connection. In this case, we're using a host name and a port number, and zero means use the default port number. You can also use an NSNet service if you're using rendezvous and you've browsed for a service. And then you create an authenticator, set the username and set the password. You may get these from Keychain. And then you set the authenticator on the connection, and you tell it to open.

Which we do right here. And so before you do that, you create a grid controller, which is a subclass of XGResource. So it's your view into the controller that's running on another computer. And so you create the controller object, you set yourself as the delegate so you can get callbacks when interesting events occur, and then you set the connection on the controller when you initialize it, and you open the connection.
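Pieced together, that connect-and-authenticate sequence might look roughly like the following. The selector names are paraphrased from this description of the pre-release API and may not match the shipping XgridFoundation headers exactly; the host name and credentials are placeholders, and connection and controller are assumed to be instance variables of the client object.

```objc
// Sketch of connecting and authenticating, paraphrasing the pre-release API
// described in the talk; selectors are approximations, not verbatim headers.
- (void)connectToController
{
    // Host name and port; 0 means "use the default port". An NSNetService
    // found by browsing for the Rendezvous service could be used instead.
    connection = [[XGConnection alloc] initWithHostname:@"controller.example.com"
                                             portnumber:0];

    XGTwoWayRandomAuthenticator *authenticator =
        [[[XGTwoWayRandomAuthenticator alloc] init] autorelease];
    [authenticator setUsername:@"xgriduser"];  // placeholders; in practice these
    [authenticator setPassword:@"secret"];     // might come from the Keychain
    [connection setAuthenticator:authenticator];

    // The grid controller object is our local proxy (an XGResource subclass)
    // for the controller running on another computer.
    controller = [[XGController alloc] initWithConnection:connection];
    [controller setDelegate:self];

    [connection open];
}
```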

Then you'd like to know when you've actually found out what the grid controller is doing, what tasks it's running, what jobs it's running, what file sets it's aware of, which nodes it's aware of. And so you ask the grid controller to update, and you get back an update monitor object. And then when you'd like to know when that update has succeeded or failed, your delegate will get a callback.

So here's the callback, the update monitor, resource did update method that your delegate should implement. And so in this case, we check to see that the resource is the grid controller, and if so, we see if the grid controller is available. And again, if it is, we call a method, which I'll describe in a minute, submit job. And that isn't part of the API. This is a method that you would implement yourself. So when you submit a job, you use the specification object, XG specification, and use XG submission, and again, you use XG controller many times.
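In code, that delegate callback might look something like the sketch below. The selector is a guess based on the wording above, the availability check is an approximation of however availability is actually exposed, and submitJob is our own method, as just noted.

```objc
// Approximate "resource did update" delegate callback, paraphrased from the
// talk; the exact selector in the pre-release framework may differ.
- (void)updateMonitor:(XGUpdateMonitor *)monitor resourceDidUpdate:(XGResource *)resource
{
    if (resource == controller && [controller isAvailable]) {
        [self submitJob];   // submitJob is our own code, not part of the API
    }
}
```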

So to create a specification, first you need the job info. And so rather than showing you some code on how to create this dictionary, I'm showing you the actual property list here. So info dictionary contains a job dictionary, contains information about the default task. And so these are parameters that apply to all the tasks.

And then you actually have a list of your tasks. So in this case, we want all of the tasks to use cal as the executable, but each task should have a different set of arguments. And in this case, we're getting June 2004 and June 2005. And then the type of this job is unordered tasks. The scheduler is free to run these in any order simultaneously, sequentially, whatever it needs to do to make the best use of the resources.
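Rebuilt in code rather than as the on-slide property list, that job info might look roughly like this. The key names and values ("job", "defaultTask", "taskList", "unorderedTasks", and so on) are illustrative stand-ins rather than the actual TP2 property-list keys.

```objc
// Roughly the job info described on the slide, with made-up key names.
NSDictionary *defaultTask = [NSDictionary dictionaryWithObjectsAndKeys:
    @"/usr/bin/cal", @"executable", nil];

NSArray *taskList = [NSArray arrayWithObjects:
    [NSDictionary dictionaryWithObject:[NSArray arrayWithObjects:@"6", @"2004", nil]
                                forKey:@"arguments"],
    [NSDictionary dictionaryWithObject:[NSArray arrayWithObjects:@"6", @"2005", nil]
                                forKey:@"arguments"],
    nil];

NSDictionary *jobInfo = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"unorderedTasks", @"type",        // scheduler is free to run tasks in any order
        defaultTask,       @"defaultTask", // parameters shared by every task
        taskList,          @"taskList",    // per-task arguments
        nil], @"job",
    nil];
```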

So with this job info, we create a job specification. So you can imagine that that info dictionary would be returned by the job specification info method in the second line of this method. So you create the specification object using a type, the info, an application identifier, and application info. The application identifier and info are completely uninterpreted by Xgrid. They're just there so you can sort them.

So with this application identifier, you can filter out jobs so that you only look at jobs that you've submitted and that you don't have to worry about jobs that have been submitted by other clients. And the application info would be sort of information about the job that's not relevant to how Xgrid uses it, but is relevant to how you might want to process it when it's done.

So using that job specification that we created in that method, you then create a job submission and a job submission monitor. And so you set yourself as a delegate on this, and when the submission succeeds or fails, you will get a callback. So here we see the callback.

Again, it's the submission monitor resource did submit. And we check to see that it's the submission monitor we're expecting. And we take the resource, which we expect to be a job in this case, because we're submitting a specification with a type of job. And with this job, we tell it to update continuously.

We don't want to just find out what it's doing right now. We want to know everything that happens to it from now until we don't care anymore, until the job is done. So, we get a job update monitor object and we set ourselves as the delegate. So we've now submitted the job, and it's time to monitor it to see what happens. So we're going to use the action monitor classes. We're going to use the job, task, node, and node list objects to monitor what's going on with the grid while this is running.
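A sketch of that submission callback follows, with the same caveat that the selectors are paraphrased from this description; submissionMonitor, job, and jobUpdateMonitor are assumed to be instance variables.

```objc
// Approximate "resource did submit" callback. We expect the resource to be a
// job because a job specification was submitted, and we then ask it to update
// continuously so we hear about every change until it finishes.
- (void)submissionMonitor:(XGSubmissionMonitor *)monitor
        resourceDidSubmit:(XGResource *)resource
{
    if (monitor != submissionMonitor) {
        return;                              // not the submission we're expecting
    }

    job = [(XGJob *)resource retain];        // we submitted a job spec, so this is a job
    jobUpdateMonitor = [[job updateContinuously] retain];  // approximate selector
    [jobUpdateMonitor setDelegate:self];
}
```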

So the XG Grid Controller object can be used to get a list of all these other resources. The NodeList object is used to get a list of the nodes. So a NodeList, you might consider, is kind of like a virtual cluster. It's just a collection of nodes that are meaningfully grouped together.

So you can submit jobs to run on specific NodeLists or you can submit jobs to just run on all of the nodes. And then the XG Node object actually contains the information about the node, such as how many processors it has, how fast it is, that kind of thing.

So when you're monitoring a job, you of course use the submission object to first get the job, and then you use the job object to get a list of the tasks. And that is available once the job has started running. And then with these XG task objects, you wait until they're done executing, and then you can retrieve the results.

So, first thing, it's waiting for the job to update. And again, we use this update monitor, resource did update method. And here we check to see if the resource is the job. And if the job is finished, we retrieve the tasks. So this is what retrieving the tasks looks like.

You ask the job for a list of tasks, and then you walk through that list and ask each task to update, and you would want to set the delegate on that, too. So when you get the callback, you're going to want to retrieve the results. And to do that, you're going to use XG file and XG file download.
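That task-retrieval step might look something like the sketch below, again with paraphrased selectors; job and the delegate wiring are carried over from the earlier snippets.

```objc
// Once the job reports it has finished, walk its tasks and ask each one to
// update so their results become available. Selector names are approximations.
- (void)retrieveTasks
{
    NSEnumerator *taskEnumerator = [[job tasks] objectEnumerator];
    XGTask *task;

    while ((task = [taskEnumerator nextObject]) != nil) {
        XGUpdateMonitor *monitor = [task update];
        [monitor setDelegate:self];        // so we hear when each task has updated
    }
}
```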

So as I mentioned before, you get the standard output and the error streams from a task. And these are treated as files, special files, files that have no attributes. So you still use the XG file object to retrieve these streams. An XG file is a reference to a file's attributes and content. It's not the actual content itself.

To get the content, you use the XG file download object. And this lets you do an asynchronous download of the file from the controller to the client. And you can download either into memory or onto disk. You might want to download into memory if you wanted to do some further processing on this data or immediately display it to the user.

And you might want to save it to disk if it's really large. If you're downloading four gigabytes of output, clearly, you probably don't want to load it all into memory just so that you can save it to disk later. So you can download directly to disk. And the XG file download object can also be used to begin downloading standard output of a task that hasn't completed yet. It will just continue downloading, and if there's no more output ready but the task is still running, it'll just wait for more output.

So here's an example of how to start downloading files. First, you get the output files from the task, and then you enumerate them, and we'll walk through them. But before we do that, we create a base path, and this is where we're going to be saving all the results.

So for each file, we create a file path by using the path of the file, which is in most cases just a file name, and we append that to the end of the results directory path. And then we create a file download object with the output file, and we set a delegate, and then we set a destination for where we want this file download to go. And finally, we need to retain this file download object. So we have a set here, and we just add the file download to the file download set. And we properly manage our memory.
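Put together, that download loop might look roughly like this. The results path, the selector names, and the fileDownloads mutable set used to retain the downloads are all assumptions layered on the description above.

```objc
// Start an asynchronous download of every output file of a finished task.
// "~/XgridResults" is a placeholder, and the XGFileDownload selectors are
// paraphrased from the talk rather than copied from the headers.
- (void)downloadResultsForTask:(XGTask *)task
{
    NSString *basePath = [@"~/XgridResults" stringByExpandingTildeInPath];
    NSEnumerator *fileEnumerator = [[task outputFiles] objectEnumerator];
    XGFile *file;

    while ((file = [fileEnumerator nextObject]) != nil) {
        // The file's path is usually just its name; append it to the results directory.
        NSString *destination =
            [basePath stringByAppendingPathComponent:[file path]];

        XGFileDownload *download = [[XGFileDownload alloc] initWithFile:file];
        [download setDelegate:self];
        [download setDestination:destination];

        [fileDownloads addObject:download];   // keep it alive until it finishes
        [download release];
    }
}
```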

So when the file download finishes, the delegate gets this callback. And so we make sure it's one of the file downloads that we're paying attention to. And we remove it from our set. And then if there's no more file downloads left, we know that we've successfully downloaded all of the files. And so we're ready to notify the user that the job results have all been saved. And at this point, I would like to turn over to Charles Parnot to talk about actually using Xgrid to fit biochemical models.

I would like to thank David for giving me the opportunity to talk here. So I want to present to you today some of the work that I'm doing in Brian Kobilka's lab at Stanford University. And I will start with a short introduction with quite a bit of biology to explain to you our scientific goal and the kind of biophysical studies we're doing to achieve these goals. And then I'll turn to the challenge that we're now facing in terms of data analysis and how we use Xgrid to try to deal with this challenge.

You probably all know that the brain regulates the heart activity by releasing adrenaline to it, particularly when you're a little stressed, like I am right now. But what you probably don't know is that the adrenaline is actually released by the neurons at the surface of the heart cells. And the adrenaline is recognized there by the beta-2 adrenergic receptor. And this is the receptor we are interested in.

This receptor is a protein with a very precise 3D shape, and here's a schematic of it. So on the top is the outside of the cell. That's where the adrenaline comes from. And when the adrenaline binds, the receptor goes through a series of changes in shape in 3D structure through a series of states. And we want to really understand this process because this is what ultimately regulates the heart rate. And if we understand these changes, we'll be able to develop new drugs for heart disease.

So the questions we're asking are quite simple. Is this model true? And if it is true, how many states do we have? And how fast are the transitions? And the way we try to answer these questions is by using fluorescent probes that we can attach at a very precise location in the receptor.

And then the nice thing with these probes is that each state then has different brightness. One state, one brightness. And we can monitor the transitions. And now imagine you put several trillion of these molecules in a tube, and you're measuring the average fluorescence of all this population of receptor. And here is some real data obtained in the lab, where you have fluorescence as a function of time, where you add the drug at time zero, and you can generate more data by using different concentrations.

So now back to the model. This is the model I showed you before. What we really want to do is take this data in green and try to fit it with the model in red. For that, we have a Mac OS X application that we have written to do the simulation and the fitting. But the problem that we rapidly faced is that we have really many parameters to fit.

So for each state, as I told you, we have a different brightness. And then for each reaction, back and forth, we have a different rate. So here's the problem. We have many parameters to fit. And then when you do a fit like this, you want to give the computer an initial guess, initial values that are not too far from the actual best fit that you can find. And this is very difficult with so many parameters. So it boils down to one simple problem. We have a very large parameter space that somehow we need to scan. And this means a lot of computer time.

To address this problem, of course, we opted for Xgrid to dispatch the work, the load, onto several computers. And the reason why we turned to Xgrid was, first of all, because we're no cluster experts. So that saved us a lot of time, because Xgrid is really easy to install and to use. And then we have a typical embarrassingly parallel problem, because we just want to run many independent fits, each with different starting values.

And finally, we already had a Mac OS X application written, so that meant very little additional code to write with some familiar APIs. This application, Biokin, has a Cocoa-based graphical front-end, and there's also a command-line version of it that we bundled into the application package. So here is how the tasks were designed to run in Xgrid.

Each parameter was sampled over a reasonable range of values. Then each combination of the parameters constitutes a set of starting values for one fit. Then each task, run by one Xgrid agent, typically consists of about 100 fits. And the positive results can be selected based on a threshold set by the user.

We first implemented this using the technology preview 2 with the plugin architecture. So we had to use a different format for the jobs that could be created inside a main application, that could then be read and processed by the plugin, saved back to the file, and then could be displayed and analyzed further in the main application. Now we are implementing the new version of it with the Xgrid framework, which makes for a much simpler and much better integration of the Biokin code with the Xgrid code.

We have this command line tool that is in the application package that we can use to submit a job that contains this executable together with a temporary file that contains the description of the fits that we want to run. Of course, that's just one job, and what we do is we send several hundred of these jobs to Xgrid, which then takes care of dispatching the jobs to all the agents, as David showed you. And Xgrid also takes care of retrieving the successful jobs and sends them back to the application for further analysis and to display the results.

So far, using XGrid, we've been able to test really extensively those two models that we call the three-states model and the four-states model. And it was a big surprise for us, because we were not able to fit the data. But because we used XGrid and we did all this computation, we're really sure that those models don't fit. We have scanned the parameter space extensively. So we can now dismiss those models. And this is already a very important result for us, a surprising result, but an important result. And of course, now we are testing other models.

So I want to thank the people in my lab, particularly Brian, Gayatri, Xavier, and Aaron, who were very helpful for this project. But I also want to thank all the people at Stanford who have contributed-- they are listed here-- who contributed CPU time to the cluster. And finally I want to thank all the people around the world that have contributed to this project. And this has been really amazing. Now we're reaching 40 gigahertz, actually.

And so I want to thank them all, and maybe some of you are in the room. And I want to encourage you to join the cluster, send me an email. And finally, all of this wouldn't have been possible without XGrid. So thank you, Apple. Thank you, David. And thank you, David.

So, I just wanted to, I love this slide, I absolutely love this slide. The first time when Charles sent me his slides, I saw this one. I knew right away, and I forwarded it off to David and to Richard Crandall, who was sort of the originator of this thinking, that we had essentially achieved the, if you will, SETI at home for the rest of us.

This is something that Charles, who as you can tell is a scientist, he has a task he's trying to get done. He is not interested in doing distributed computation from the inside. He's interested in using the capabilities of doing distributed computation to solve his task. He managed to assemble this amazing grid of people through a bunch of essentially social engineering, sending out a really nice email message to his friends and their friends and their friends and so on, and built himself a 40 gigahertz cluster just by being a nice guy. So.

Here we are. Key issues and takeaways. So, Technology Preview 2 is available right now. You can download it off of the website. There's some URLs at the end. And, of course, you can always just go to apple.com slash acg slash xgrid, and that URL is coming, and get it. This is part of Tiger Server. There will be enhanced functionality that is part of Tiger Server, as well as some components that will be part of Tiger. So stay tuned to see sort of how we package this up. But Xgrid is part of Tiger and Tiger Server.

The agents themselves will run on Panther as well, because we realize that a lot of you have Panther clients out there running, and it's not feasible to imagine that every single Panther client will be updated instantly to Tiger, although we certainly hope that's the case. So if you're doing desktop recovery, you can still use those Panther machines as agents.

And we really, really, really want your feedback. That's why we put TP1 and TP2 out to the whole community. Use it. There's bugs that we know about. I'm sure there's bugs we don't know about, and we want to fix those and get this to be great quality.

So first of all, download TP2 of Xgrid right now, or take the information you've learned today, think about it, and send your feedback to the addresses that are shown here. If you're a user of Xgrid, there's a great user list and an archive through our email system that you can look through, with lots of posts from Charles and others on things that they found out, as well as just the generic feedback that David and I read.

And if you need more information, of course, go to the website. This is obviously in your kits and on the web page. And give us a call. Lastly, I want to also thank and bring up here for the Q&A James Reynolds from the University of Utah, who has kindly worked with us to put together the demonstration that you can see running right now in the Enterprise IT Lab.

And we have harvested the power of a number of machines running around WWDC right now. And we've built a grid that's roughly plus or minus 10, depending on whether these machines are being used by someone else or not, roughly about 100 gigahertz. And actually, it's a little bit more than that.

And what James has done is provide us with a number of scripts and some interesting models. And he's doing a massively distributed POV-Ray render of a 3300-frame movie, I think. And as the show has been going on, that movie has been getting longer and longer. It's really quite beautiful. So check that out in the Enterprise IT Lab.