Develop and Deploy with Xgrid 2 - WWDC 2006

Information Technologies • 44:37

Xgrid 2, part of Leopard and Leopard Server, makes Apple's groundbreaking distributed resource technology for clusters and grids more powerful and even easier to use. We will explore the newest features of Xgrid 2 while surveying best practices in building and deploying applications for your IT infrastructure, from loosely coupled 'ad hoc networks' to tightly integrated clusters.

Speakers: David Kramer, Steve Simon

Unlisted on Apple Developer site

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

session of the conference. I'm glad you chose to come to my session. I think you'll This is really the capstone of the entire conference. So I hope you enjoy it. My name is David Kraemer. I am with Xgrid Engineering. Steve Simon will be joining me later to demo one of the new features that I'll be talking about now.

We're going to start with an overview. This is going to be an introduction for those of you who haven't used Xgrid before, haven't attended one of my sessions. Show of hands, how many people have never heard of Xgrid or haven't really learned anything about Xgrid? Alright, for the four of you, this is going to be great. The rest of you, just a refresher course.

And we're going to get through this real fast and move on and talk about new features. So, let me get back to this. The main thing we want to let you know is that Leopard includes Xgrid 2. Xgrid 1 already was included with Tiger and we've made some improvements in Leopard and we're calling it Xgrid 2.

All of the primary components of Xgrid are included with base Leopard systems, but the administrative tools that make it easy for you to deploy your grids are included with Leopard Server. And my recommendation is that you should be able to use Leopard Server to deploy your grids. And my recommendation is that to use, to get these administrative tools, the best way is to buy an XServe or 8 or 16 because it's going to come with Leopard Server, especially one of those new Xeon, XServes, QuadCore. Sounds great. Buy a bunch, please. You get the administrative software, it would be really easy to set up your cluster.

So as I said, we're going to start with the overview, then we're going to move on to developing for Xgrid, talk about some of the new API features we've added, and talk about the administration. And those two sections are going to be completely all about new features in Leopard. Stick around.

So why would you want to use Xgrid in the first place instead of something else? There's lots of options out there. I mean, you can use SSH, you could hire some grad students to run around and start stuff on a bunch of computers. You can do it any way you want, but we think you should use Xgrid and here's why. Well, one is it makes distributed computing painless and I'll talk a little bit more about that later, but the goal here is to make it drop dead simple for you to deploy your clusters, for your users to use the distributed resources that you have.

The next point I'd like to bring up about why you should use Xgrid is that it allows people-friendly sharing of resources. And so we have a mechanism for letting people's computers get used when they aren't using it. But once they return and they are using the computer, Xgrid backs off and it's no longer monopolizing the computer. And so people can feel good about sharing their resources with you.

And one more reason why Xgrid is great in our opinion is that it supports a wide variety of styles of distributed computing. So one style is sort of the Beowulf style where you just buy some components, put it in a rack, hook it all together with Ethernet, and you have a dedicated cluster, a closet or somewhere that just contains all these computers and you use it just for your one task.

Maybe you get this cluster because you write up a grant proposal and you get this nice cluster. You set it up. You run Xgrid on it. And you do your work. But we also support other styles. At the other end of the spectrum, we have the more SETI at home style.

And this is where you have machines all across the wide area network, across the Internet. And they're connecting to the central authority for Xgrid and getting their work and doing it that way. And so that's SETI at home. You can get a lot of people involved. Charles Parnot's Xgrid at Stanford project is a great example of that kind of thing.

And then in the middle, we have the sort of cycle recovery at an organization. And that's like where you have a computer lab or just a bunch of computers on desks. And at night, they remain unused. And so in that case, it would be nice to make use of those computers for computing purposes.

So that's why we think you should use Xgrid, but who should be using Xgrid is the next question, and there's a number of different people we'd like to be using Xgrid. We think institutions and organizations can get a lot of benefit out of using Xgrid because we're making setting up distributed computing resources much easier and much simplified.

So setting it up is just a few clicks in server admin. Maintaining it is easy. There's some really nice graphical utilities for doing that. And then once it is set up, the grid becomes a service, and it's a service like mail or print or file sharing, and the IT department maintains it.

So the people who are good at maintaining servers maintain the servers, and the people who are good at doing distributed computing get to submit their work to the grid and do it. And they don't have to worry about rebooting the servers, about access controls, about security authentication, whatever. They just get to use it. It's very easy. So scientists are clearly one group of people that have a lot of computing to do.

And so we've been working with a lot of engineers as well. I was trained as a physicist, and now I'm a software engineer in both of these professions. I have found that it's very useful to have a lot of computing resources when you have a lot of data that needs to be processed.

So one of the reasons that Xgrid is nice for these people is because -- and this gets back to the painless distributed computing concept. And this is that we have a persistent job queue in Xgrid. So this means that you can fire and forget your work. So, you know, we have a lot of data.

You can get a lot of data. You can get a lot of data. And, you know, it's very easy to get a lot of data. But we have a lot of data. And so we've been working with a lot of different engineers to be able to get a lot of data. And so we've been working with a lot of different engineers to be able to get a lot of data.

And so we've So he sets up the job and he submits the job off to the cluster, but he doesn't want to stick around all weekend to wait for this to get done. He needs to go home. He wants to spend time with his family, go to Disneyland, something, have a good time.

And so what happens with Xgrid is he can submit the work from his laptop and then disconnect, close the lid of his laptop, take the laptop home, and the job stays running at work, at school, wherever he did the submission. And it'll just stay there. It'll keep working. Xgrid will manage it, make sure everything goes right.

If anything goes wrong, it'll try to fix it. And then when the researcher returns to work on Monday, He just reconnects to the controller and the job results are waiting for him to download at his convenience. So we think that makes it much easier to use grid resources and one of the benefits of using Xgrid. And then a third group that we're seeing more people getting interested in Xgrid, these people are content producers. And so one example of content producers are people using Apple's Podcast Producer, which we announced on Wednesday.

And so clearly here they get to use Xgrid as the distributed processing engine for doing the video transcoding for running the workflows. And so they can get a lot of benefit out of having Xgrid set up without actually needing to know how to write software for Xgrid. And then there's other third party applications out there that use Xgrid. For example, VideoHub uses Xgrid to, I think it runs FFmpegX across your network to do transcoding of video as well. as well.

So before we get too far into this, I wanted to talk to you a little bit about terminology for those of you who aren't familiar with the terms I'm going to use. Actually it's simple and there's only a few terms here, but I just want to make sure we're all on the same page so there's no confusion.

We have clients, controllers and agents. These are the primary operators in this process, the primary people that are working together. So the client is the one that has the work that needs to be done. This is the scientist, the researcher. The controller is the process in the middle that maintains the job queue and it receives the work and it distributes it out. And then finally the agents are the ones that do the actual work.

It's a little bit dry. I have a picture here to make it a little clearer. So over here we have the clients. These are people using their computers, laptops, iMacs, G5s, Xeons, whatever you have. They're the ones with the work. They set up the jobs and they run an application to submit the work. And the work gets submitted to the controller. The controller is in this middle tier of this three-tier architecture.

And it's the one that manages everyone on the other two tiers and makes sure everyone's playing together right. So, finally, once your work's been sent to the controller, then the controller finds available agents and sends the work out to them to get it done. So, agents, I mean, agents can be any kind of computer. It could be the same computers as the clients.

You could have an Xgrid set up where all of the clients were also agents. You can have a controller that's an agent. You could have all of these tiers in one computer and just have a one computer grid. Not so exciting. I think you should buy those 16 Xers I talked about before. But, you know, this is what we got. So, the three tiers all in one or distributed across lots of machines.

So just a little bit more, and this is kind of like a dictionary, I'm almost done with the terminology. What we're actually managing here in the controller are grids, jobs, and tasks. And so the grid is the collection of agents that the controller is maintaining. And you can have multiple grids that you set up and you can drag your specific agents that you want to each separate grid.

And then you can tell people, "Okay, well, you guys, you submit to this grid, and you guys submit to that grid." And then you both get some of the resources that we have on the network. And then the grid is also the job queue, and this is what manages the list of work that needs to be done and makes sure that it gets done in the right order.

So a job, which is what is stored in the job queue, is the set of tasks that you want to get done on the agents. And it also includes the input and output data of the tasks. So once the task is finished, the job includes the output data, which can be retrieved by the client.

[Transcript missing]

So, that was the workflow, but what else does Xgrid do? One thing it does that I didn't mention before is that it enforces authentication and authorization policies. So this means that you can rest assured that if you set up your service ACLs that only authorized people have access to your resources.

As I said before, it groups agents into grids, it monitors agent availability, manages the queues of the jobs, and then one thing I didn't mention was that jobs can have dependencies. So you can submit multiple jobs and say that these following jobs should not even be started until the first job completes. And so the scheduler schedules runnable jobs when there are available agents to run.

Xgrid also handles data staging. As I said, the jobs and the tasks include the input data and then collect the output data. Xgrid will move all the data around for you if you would like it to. And then the really important thing is that it's a very easy way to run your data. So you can run your data in a very simple way.

So you can run your data in a very simple way. And then the really important thing that I think Xgrid does that I like, my favorite feature I think, is that it recovers from failures. And so that's the agent going offline. It can deal with that and send the work over to another agent.

And then another way it can recover from failure is that if the actual controller were to crash, heaven forbid, or if the machine were to be shut down or maybe you need to restart for a security update, you can reboot. The jobs will persist. They're stored on disk in a database.

And when the server comes back up, the jobs are all there and they will continue running. And the pieces that have been finished ahead of time, all those results are still there. The tasks that were in process at that point will get resubmitted once the agents reconnect to the controller.

So that was the overview. And now let's talk about what is new here in Leopard for Xgrid. So we've addressed three areas of improvements here. Ease of developer adoption, and for that we have three new features: Xgrid Anywhere, Xgrid Scoreboard, and Task Feedback, and I'll talk more about those in a moment. We've also tried to address ease of setup, and to this end we've created a service configuration assistant for Xgrid to make it even simpler to deploy Xgrid on your servers.

It was pretty simple before, there was only, I don't know, six clicks, I think we've got it down to three or four now. So it's pretty easy to get going with Xgrid. And then the final piece that we've addressed is ease of porting, and this isn't exactly an Xgrid feature, but OpenMPI will be included with Leopard, so if you have existing distributed computing applications that make use of the MPI, API, you can use OpenMPI in Leopard and it will automatically use Xgrid if you have Xgrid configured.

And I see we have some open MPI fans, so that's great. So, here's the good stuff. What's new in Leopard for writing your Xgrid software? So, three features: Xgrid Anywhere, Xgrid Scoreboard, Task Feedback. Xgrid Anywhere is a feature that allows your Xgrid-enabled software to run no matter what.

And by no matter what I mean, even if there are no controllers or agents on your network. So, you're just on a standalone machine, you're flying 20,000 feet, heading back home tonight. You can start writing Xgrid software on your desktop or your laptop, your portable machine. And you don't need to set up any controller or agent.

And you can just start actually testing the job submission and results retrieval right then and there. Xgrid Scoreboard is a really exciting new feature that we've developed based on a lot of feedback we've gotten on the mailing list and at previous sessions and Q&A. Where people have wanted to be able to specify which resources are used for a particular job.

And so, we've addressed that with Xgrid Scoreboard and we'll be talking about that more soon. And then Task Feedback is the third new feature that we've added, which lets the scheduler make more informed decisions about where you want your tasks to run based on the current conditions. And that might sound a lot like Scoreboard and I'll explain sort of the differences between those two. These are just the three features that we're talking about today and that are included with your seed. We have more enhancements for Xgrid that we intend to get in there, but we're going to talk about these today.

The first one is Xgrid Anywhere. And basically the problem that has come up is that we'd like everyone to use Xgrid, and everyone's like, "Well, yeah, if I had five computers, everything should just go five times faster. That sounds like a good plan." And they say, "So can I just drag my application onto the Xgrid icon in the dock, and it'll go faster?" And it sounds good. I'd like that. But that's unfortunately not how it works right now. Sorry.

So there's no free ride. You do need to adopt Xgrid. And the problem there is that adoption is hard. And I don't mean that Xgrid, in particular, is hard. I just mean that adopting things is work. You're going to have to learn something new. There's a new API.

You're going to have to change your code. And then once you do change your code, the real problem here is that you still need to maintain all of the code that you wrote before to do it all locally, because you don't know if your users are going to have Xgrid set up ahead of time if they're going to have a cluster.

So with Xgrid Anywhere, we've decided to solve this by making it so that Xgrid is available anywhere you are, no matter what operating system of Mac OS X Leopard you are using, desktop or server. And you can just run the application, and it'll work. So now you can rewrite your code to Xgrid, and you only need one code path. It's always going to work, even if there are no controllers or agents available on the network.

So there's two pieces that make up the Xgrid Anywhere feature. The first one is Xgrid here, I'll call it. And this is the private controller. And basically, there's a new API, private controller, on the XG controller class, which you can call, which provides your application with its own private in-memory space controller and agent. These controller and agent are basically running inside your application's process space.

And you can use them as if they were a network controller, a network agent. Everything is going to work exactly the same, job activities, submission, monitoring, retrieval, all exactly the same. So you just need one code path that always will work, whether or not there's any network set up. You also don't need to use this new API. If you already have a host name entry field in your application, you can just type in this special host name, colon, private, colon, and you will connect. It will instantiate this private controller and connect the application to it.

The next piece of Xgrid Anywhere is Xgrid There. And this is the default controller. So what we've done is created a system and a user preference that you can set that says, if I don't know what controller to use, use this one. And so this makes it easier for end users at a large organization who don't know how to configure Xgrid to connect to a particular controller. They don't know which one they're supposed to use. You can push these settings out and then these users will just automatically connect to the right one. And so to make use of this feature, your application should call the default controller API.

And this will return the default controller that's been connected up based on what this setting has been set to. And then the really interesting part here is that if no default controller has been set up, presumably because there are no controllers on the network, you just automatically get the private controller.

So, you take Xgrid here, and you take Xgrid there, and you put them together and you get Xgrid anywhere. This makes your application easier to test and easier to use. And I've really enjoyed using this new feature myself because, like when I was developing some of the demo code for this session, I didn't have this rack in my cube.

So, I needed to make sure the job submission and results retrieval was working, and I didn't have to set up a grid. I just could start the application, type in colon, private, colon, and I was going, I could do all, test all of my application code without having to set up a grid. Now, without having to worry about Kerberos or passwords or anything like that.

So, if you use the default controller in your application, it's always going to work the way your users want it to do. Either it's going to use a private controller and it's just going to work without a network setup, or it's going to use their default controller, which is the one they want.

[Transcript missing]

The way that you specify to the controller what you care about is by including an agent ranking tool with your job submission. And this is a tool that you write. It can be a really lightweight script or an executable. And you just include it along with the job.

And what the controller will do is make sure that this ranking tool gets run out on all of the agents in your grid before it schedules any of your actual tasks, your computational tasks out there. And the agent ranking tool gets to run and it evaluates the conditions that you care about. So it could look at syscadol, it could do benchmarking, it could look for specific hardware devices, licenses. Whatever you care about and whatever is important to you, your tool can evaluate these and then generate a score.

[Transcript missing]

And then you can use more than one agent ranking tool and include them with your job submission. And then what we'll do is we'll just take all those scores and we'll multiply them together and normalize them so that basically, I mean, if any of your arts return a zero score, then that agent's not going to be used, but otherwise we're going to multiply it together and do the prioritization that way.

Kind of a dry description, so I have a picture here to show you how this works. So you start out, the client has the job and the agent ranking tool. The agent ranking tool is that nifty-looking, spinning magnifying glass. I'm not quite sure what that is, but it looks pretty cool.

So it gets sent off to the controller, and the controller monitors and looks for the agents that are available, and once it's found them, it sends the agent ranking out to them first. So the art is out at the agents and it's running, and it generates a score. And so in this case, it's returned a non-zero score for the first two computers, but not the last one. Apparently this art doesn't really like IMAX.

I don't know. But so once the controller has retrieved these results and it knows which scores have come back from which agents, it can then send the tasks out. And so in this case, the tasks only go out to the ones that have been shown as acceptable by the agent ranking tool.

And then finally, once the agents are done, the results go back to the controller. And then the client can retrieve those results. So that's the basic scoreboard workflow. It's pretty handy to be able to use this to limit where your job runs onto a very specific subset of the grid.

The third feature that I want to talk about today that we've added for developers is task feedback. And the issue here is that the scheduler doesn't know everything about what's going on on the agents. I mean, it knows whether the agent's there or not, but it doesn't really know about the environment, and it doesn't know about how the task is running in that environment.

So sometimes the task can detect itself a condition that makes that particular agent that it's running on at that moment unsuitable. For instance, maybe the disk fills up. You're doing some sort of transcoding, and then you realize, "Hey, I don't have enough space to actually finish writing out this file." That's not going to work.

And so, right now, your task could just fail, and then your job would fail, and then the user would see the job had failed, and that wouldn't be very exciting or make anyone happy. So what you really want to have happen there is to have the task restart. But what's going to happen here is that if we just told the controller to restart the task, it would say, "Oh, well, that computer that was just running the task is available now, I'll just send the task right back to that same guy."

And that's not what you want, because the disk is still full over there. So you want to make sure that you can say, "Well, let's restart this task, but don't do it here, please." And so now you can provide this information back to the controller from your task about the current situation on the agent and give the scheduler more information about what you want it to do with your task to make better decisions about where to send it in the future.

So how this works is that agents can send the task back to the controller, and then the controller can send it back to the controller. So what this works is that agents can, to their standard out, send -- in addition to any logging information or output that you're sending there, they can enclose some data in a property list inside an xgrid XML element.

And the content of this element is a property list. It says, "Hey, what's going on on this agent?" It tells the controller what it needs to know. And so there's a few messages you can send here. And so the first one sort of gets back to that example I gave you, which is that you could say this agent is unsuitable for this task. The task has determined that something's wrong with this agent as far as it's concerned, and it just doesn't ever want to run on that agent again.

And then if you know that all of the tasks in your job have very similar characteristics and they're going to have the same opinion about that agent, you can say, "Well, just don't run any more tasks from this job on this agent." Or if that's not the issue, maybe you just detected a transient error condition in your task. And so you could just say, "Just retry it," and then the controller will have the opportunity to retry it wherever it wants, and it may want to retry it on that same agent.

You could also say, "Retry job," which basically throws out all of the results of all of the tasks that have completed so far, and then rewinds the job back to the beginning and resubmits all of the tasks out again, and they start working again. Now, if your tasks keep telling the controller to retry the tasks or retrying the jobs, eventually the controller will say, "Hey, I've retried too many times. We're just going to give up." So that's a configurable value. You can say how many retries are acceptable to you.

And then finally, you can just fail the job. And this is already the behavior, but we've made it explicit where if the task just says, "You know what? Things are messed up. I was trying to connect to this file server, and apparently it's not even available. Let's just fail it so we can let the user know as soon as possible that something is really wrong that they need to address out of band from Xgrid.

So, task feedback and scoreboard are pretty similar and in fact you could get most of the features of scoreboard by using task feedback. Basically your task would go out there and start running, it would use syscadl or do its benchmark, whatever it needed to do and determine, hey, this computer is unsuitable, never run any of the job tasks here again.

But that's not the only thing you can use it for and in fact you can actually use these together and I have an example here of why. So, it might be that memory integrity is really important to you, like critical to you that just no bits get flipped ever.

And so you've already bought some Xservs that have ECC memory, but not all of your Xservs have it. And so you want to use scoreboard to say only run my tasks on the computers that have ECC memory. And so you'd have an art that just looks at that system profile value and determines what kind of memory there is and then returns it.

So, for example, if you have a computer that has ECC memory, you can run it on the computer that has ECC memory, but you don't have to run the task on the computer that has ECC memory. So, you can run the task on the computer that has ECC memory. And so you can use scoreboard and see what kind of memory there is and then returns either 1, yes, it's ECC memory or 0, no, don't run the task here.

But once the tasks do go out to those agents, the tasks may also have their own sort of checksumming or error correcting or maybe they just run the calculation five times because you just absolutely have to be sure that cosmic rays did not give you the wrong answer.

And so what you're going to do here is that you're going to do it a bunch of times and compare the results. And if you determine that, well, the results aren't the same after all these runs, maybe this agent isn't really as reliable as we were hoping. Maybe this ECC memory, there's something more to it than just ECC memory and something's going wrong here. So let's just not use this agent anymore for this job just to be safe.

And so this would be a way that you can use these two features together to really control where your tasks get sent to initially and then also where they remain and where they get scheduled. And so this would be a way that you can use these two features together to really control where your tasks get sent to initially and then also where they remain and where they get scheduled. at in the future.

So, at this point I would like to invite Steve Simon up on the stage. He's going to show you a demo of Scoreboard in action. And so we're going to be using this rack down here. I realize not all of you can see what's going on with this rack, so we have set up a video camera that will be showing on the screen what exactly you have, what we have going on on this rack, and you'll be able to see which computers are being used when.

Okay, so for the demo, the first thing that I would like to show you is all we've done is added one line to the job specification. We get the ART data and insert it into the job. Very simple. And then there's one other step, which is later on when we set the conditions, we just insert a key that says we want them to be exactly equal.

And if we take a look at what an art controller looks like, this is a very simple one. In the rack of XSERVs here, we have three types. We have red, green, and blue. And each of them will return a score based on this program here. So if it's red, it will return 1. Greens are 2, and blue is 3. And if it failed to return a score, it would give us a 0, and then nothing would run on that one.

Xgrid admin, you can see we have a grid here. And I have GridColor. GridColor is a small application that's going to run our jobs. So first I'm going to pick all the Xserves with the blue profile and submit a job. And you can see over here that the, in fact, the blue Xs are running and the others are not. And then you can also fire off a green job. And now the green ones are lighting up. And then you can also fire off a green job. And now the green ones are lighting up.

And then you can also fire off a green job. And now the green ones are lighting up. Thank you, Steve. So that's Scoreboard. We're really excited about it. It's based on your feedback. Like I said, we didn't really think you wanted to care where your stuff ran, but if you want to run all your stuff on red Xservs, be my guest.

So the last section of this talk is to discuss the new features that we've added for making your life easier to deploy Xgrid. And so what we've done is Well, first, let's say what other administrative tools already exist for xGrid that are already in Tiger and continue in Leopard. And so first of all, there's the sharing preferences.

And this is where you configure the agent on Leopard systems, on non-server Leopard systems, and also Tiger systems. There's a little check box, xGrid. There's a Configure button. You can use it to set up xGrid. There's also server admin, and we still have this, although the UI has changed a little bit for Leopard. This is used to configure both the agent and the controller on Leopard server. You can turn it on and off, set what kind of authentication you care about, and a couple of settings.

And then we have Xgrid Admin, which Steve used briefly there. And this lets you monitor, once you've deployed your grid, this lets you monitor the actual state of your grid, look at the job queues, arrange agents into grids, and see their status and see if they're offline or if there's some problem. So what we've done now is added a new simplified server admin setup for Xgrid to just make this easier for you. So there's fewer steps to deploy Xgrid on your machines.

So basically what this gives you the choice of setting a couple of behaviors. You can either say you want a particular machine to host the grid or you want it to participate in a grid. And then we also give you the option of turning Xgrid off using the same mechanism.

So the simplified server admin setup does require existing network infrastructure. We don't set that up for you. You're going to need DNS, you're going to need open directory because we're going to be using Kerberos authentication, so we need open directory or some other directory service. So you can set up a server admin setup, and you can use server configuration assistance to set up OD.

So you can actually go up to the menu in server admin, choose service configuration assistant, and choose, To set up OD, check the open directory box and then you check the Xgrid box and It'll set up OD for you. You just make, say, Open Directory Master and click Continue, and it'll set that up for you, and then it'll run you through the Xgrid setup. And so at this point, as long as you have DNS set up somewhere on your network, you can get your box hosting and grid and running in Open Directory Master with just a few clicks using the system.

The one more thing that the simplified server admin setup does is it sets up a shared file system for you. And so let me digress for a moment here to explain what this is all about. So what we're doing here is we're sort of adding a new data staging mechanism for Xgrid. As I said before, you can include your job executables, you can include input data with Xgrid, with your job submissions, and they'll get moved out across the network to the agents.

And this is very convenient, but it's not the most efficient mechanism. And so there can be a lot of memory overhead for the controller to do this. It's parsing XML that contains this data. And it's going to have a lot of data. It's going to have to copy this data to all of the agents.

And it's actually going to have to copy it to the agents before the agents even begin running the task. So if this is a large file, a one gigabyte file, you're going to have to wait until the entire thing gets there before the task starts, even if the task maybe just wanted a little piece of that large file or maybe just wanted to start reading at the beginning before the end arrived. So rather than trying to solve that with an Xgrid, we want to acknowledge that other people have solved this problem before, and they've created network file systems.

And so what we're doing is giving you an area where you can put your job data. You just have to put it there and then refer to it at that path in your job submission. And we're going to make sure that all the agents that you've configured to connect to your controller using the simplified setup and that are bound to this directory domain will get the mount records and will be able to access these shared file system in exactly the same place across all the computers.

And so now you can put your large data there, and when it's time, for the tasks to run, they just get to grab and stream back the data that they care about from the file server. "The time for the tests to run, they just get to grab and stream back the data that they care about from the file server."

So, what does it look like? First of all, when you begin to do the service configuration assistant, you're given an explanation of what the prerequisites are. This is the DNS, the open directory. And you're just told what's going to happen. And when you click continue, it goes out and checks to make sure that all those prereqs have been met.

And then you're given the option to choose what exactly it is you want to do. Do you want to host a grid? Do you want to join a grid? What do you want to do? So, in this case, we're going to host a grid. And then we just need to enter our directory administrator username and password. And we're going to use this credential to create the export and the mount record in the directory so that everyone can get the shared file system.

Once you've entered that, that's basically it. Now we're giving you an option to confirm all the settings you've chosen. There weren't a lot to choose. And as soon as you click continue here, we're going to actually go out and do the work. And so what work do we do?

Well, we enable the controller, which is like checking the box in the old server admin UI. We're also going to enable the agent, and we're automatically going to point the agent at that same controller that we just configured on that computer. You get both. You get a full operating Xgrid controller and agent connected to each other using Kerberos authentication just by doing this one step.

As I said, Kerberos authentication is enabled so you can feel secure that your data is private and confidential and secure. We're also going to create the export record on the controller, and we're going to put the mount record in the directory so all of your agents that are bound to that directory will automatically mount that file system. We're sharing it over NFS.

We're going to put the file system in the Leopard seed, and we'll be evaluating what we're going to do with that as we move on with the Leopard schedule. So the file system is created. There's a folder created at /Xgrid, and then the file system is mounted on all of the agents and the controller at network Xgrid.

Joining a grid is very similar to this. You just choose the second behavior, join a grid. In this case, you're going to have to choose which controller to use on your network. You can either browse Bonjour discovered controllers or you could, if you have a controller on another subnet, you could just enter the host name or the IP address right there. Although, you actually do need a host name because Kerberos is going to use that host name to make sure that it's connecting to the right server.

So, then if that server that you're configuring hasn't already been Kerberized and doesn't already have Kerberos principles, we're going to do that for you. And so we need the directory administrator password again. But if that server has already been set up with Kerberos principles, then we're going to skip this step. And then finally we give you the option to confirm. And again, there wasn't really many choices to make, so you're just confirming that you chose the right controller here. And finally you hit continue and it does the work for you. The agent is enabled.

It's pointed at the controller you chose. Kerberos authentication for the agent is enabled. And we made sure that the principle was created if necessary. So that's about it for this talk. Those are the new features that we have. So in summary, we've added XRED scoreboard. That's the big new feature I think you guys are going to be excited about to let you choose which of your machines to use for a particular job. We've added task feedback to evaluate the conditions as they change there.

We have XRED Anywhere, so it's easier to write your software for people to use so that they don't need to set up XRED to get started with it. And then once they're ready to scale beyond single computer performance, they just need to set up XRED and your application is going to work and act exactly the same. And then we have this new service configuration assistant to make it even easier to deploy XRED. So all that together, that's what we have for you in the Leopard Seed. You can look forward to some additional improvements. And I hope you guys enjoy.