Using Xgrid to Create and Deploy Distributed Computations - WWDC 2005

Enterprise IT • 51:28

Xgrid simplifies the task of distributing CPU-intensive computations across your existing hardware, from racks of Xserve G5s to rooms of PowerMacs and iMacs--even Mac minis! This session will start by describing Xgrid's architecture, which not only allows ad-hoc grids via Bonjour and Dynamic DNS, but can leverage the administrative and security capabilities in Mac OS X Server for actively managed grids. It will also cover the basics of how to run jobs and manage clusters using the command-line, as well as tie-ins to Xsan, OpenDirectory, and ActiveDirectory. Plus, we will show you how to quickly build rich front-ends to your computations using Xgrid's Cocoa API.

Speakers: David Kramer, David O'Rourke

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it may have transcription errors.

My name's Dave O'Rourke. We're gonna be talking about Xgrid here for a little bit. A little later on, I'm gonna be bringing up my coworker, David Kramer, and he's gonna be doing some demos and showing you some sample code. Those are our names. So as with most presentations, we're going to start out with an overview. I'm going to give you a quick brief introduction where Apple positions Xgrid, what we think it's good for, and generally try to bring people up to speed that may not have dealt with the previous previews. We had a technical preview and a preview two. We'll cover that in a bit.

So what will you learn today? First of all, we're going to have an overview. I've discussed that. We're going to go a little bit into the architecture. Not everyone's familiar with grid architecture, so we're going to do some diagrams for you and hopefully show you guys how the grid, and actually how simple it actually is, but how much power you can derive from it. We'll then go into administration. We have a host of administration tools that we've included with Tiger. That makes things a lot easier to set up, manage, maintain, and monitor. We'll be talking about job submission. There is no grid without job submission. It's an important topic. We'll be diagramming the workflow for you there. Hope to give you a mental model of how the grid's actually taking code and executing it out on someone's iMac. We're gonna talk about how to develop. You will leave this room with sample code for a grid-enabled Cocoa application. The sample code's on the DVD. You guys can play along while we're coding up here on screen.

So introduction. Xgrid is Apple's distributed computing solution. The slide says that. You didn't need me to say that. But what it does is we've built this into the operating system. So by saying it's Apple's grid solution, you as a developer and you as a system administrator can rely on it being there. This allows distributed processing of grid jobs to make workflow go faster. Hopefully that's what people realize. If you're going to grid enable your software, you're going to do things to try to take advantage of the grid resources and make things go much quicker. Again, the emphasis here is this is now built into Tiger. It's Apple's solution for grid computing. You can start relying on this technology being present. Tiger Server comes with Xgrid. The Xgrid agent is on every Tiger machine. And we have also provided an agent install for Panther. So if you have a mixed environment, some Panther machines, some Tiger workstations, you can install the Xgrid agent on Panther, and you can fully utilize your entire computing network.

The Tiger development tools are also included and support all the Xgrid frameworks, all the Xgrid functionality. We'll be going into that in a lot more detail later on in the presentation. But you can use Tiger with Xcode to develop grid-enabled software today. There's nothing else you need to download. For those of you who have been with Xgrid a little longer, we thought we'd put up a timeline to kind of orient you as to where we are. Back in January 2004, we had a technical preview. That came out. A lot of people jumped on and immediately started using it, and gave us some excellent feedback. We love the feedback we got from the technical preview. That led to technical preview 2. Again, the community got a little bigger then, we got a little more feedback, and a lot of the feedback from technical preview 2 is now being delivered to you in commercial quality form in Tiger. So this is an ongoing effort.

We hope to get some feedback at the end of this session, and ongoing feedback as more and more people adopt Xgrid. Now that it's built into Tiger, this is just the beginning for Apple, and we think the grid-enabled technology is a core differentiator for our operating system to come with standard.

So what sort of solutions does Xgrid enable? Well, the first solution it enables is we wanted to do what Apple always does, is we wanted to make this so average humans can set grids up. I don't know about you, but I didn't know how to set up a grid 12 months ago, and I can set up an Xgrid. I've seen a lot of people who didn't think they could set up a grid able to set up a grid. So Apple not only built grid-enabled software into their operating system, which I'm surprised there isn't more call-out for that, but we've made it so that mere mortals can set it up.

We've built the support into Tiger Desktop. We've built the support into Tiger Server. You need nothing else to deploy a grid. The administration tools are provided with Tiger Server. They're integrated into the Tiger Server administration tools, so there's no separate set of administration tools for configuring the grid or managing the controller.

The Xgrid agent is available for install on Panther, as I mentioned earlier, so this eases your deployment burdens. You don't have to upgrade your entire network to Tiger all at once, although we highly recommend that and get a site license. Ernie can talk to you after the conference. But if you can't do that or aren't willing to do that, we do provide the grid agent for Panther.

We also wanted to make sure that if the grid agent's enabled, it doesn't just take over the user's machine. If a user's volunteering their machine to be part of the grid, they still want iPhoto to work well, they want Pages to work well, they want Keynote to work well, they want Final Cut Pro to work well. So the grid has knowledge and accommodations built into it to not do grid tasks when users are using the computer. This is obviously configurable. You can turn it off. We'll go over that later. But the goal was to allow people to be part of the grid without it dominating their daily workflow. We support various grid computing styles. For those of you who were here last year, I wasn't up on stage last year. That's because I wasn't the Xgrid manager last year.

So when I inherited Xgrid, one of the first things that David Kramer sat down and did was tell me that we supported various computing styles. I didn't know what those were. So this will be a quick tutorial. It was useful for me. Hopefully it's useful for you.

The first and most obvious computing style for a grid is dedicated participants. That's represented by the grid over here you see in the gray box on the end of the stage. This is sometimes referred to as the Beowulf model. This is writing a check to Apple Computer for a very small sum of money for a very large amount of compute power to buy 20 rack machines and dedicating those to doing some DNA analysis, doing some financial analysis or something like that. Xgrid fully supports that model. This is racks of Xserves or closets of Mac minis. Yes, we've had questions from people wanting to run Mac minis as grid agents. They'll work perfectly fine. They're a great space-saving utility. I personally like the G5 for the big heavy iron stuff a little bit better. But yes, Xgrid would fully support a closet full of Mac minis. I actually haven't done the math. I wonder how many minis you could fit in the average closet.

Part-time participants. How many of you run SETI@home? Okay. You're volunteering your computer's resources when you're not using your computer. Xgrid fully supports this model as well. People can volunteer their computer, the grid will know about it, and if it gets any jobs, it will schedule them on the volunteer's computer. So we support that type of computing model. This is great for underutilized office and university computers. Think of how much compute power is on your campus or your work site or your data center on the weekends when everybody breaks on Friday at 3:00 p.m. and doesn't come back until Monday at 10:00 a.m. You know, there's a lot of compute power over the weekend. So this is a great way to more fully utilize your computer resources by running weekend jobs or running overnight jobs, even when people break at 5 and don't come back until the next morning at 9 or 8:30.

Idle cycle recovery on a global scale. Yeah, on a global scale. We can recover idle cycles from Japan. My manager, Kazu, is setting up a grid, and his grandmother apparently is donating her computer from Japan, and he runs grid jobs on her computer, so he can harvest that extra 1.8 gigahertz G5 from all the way across the country. But we can do that. It's all based on TCP/IP. The agents can be located anywhere in the world.

So who can benefit from Xgrid? The first thing that we've done with grid computing is, by building it into Tiger Server and building it into the desktop, we've made grid computing like a print service. It's a service that the IT department can set up, maintain, but they don't have to write the jobs. They don't have to do it. It's a service that they can make available to a department, and yet they can host it in the data center. They can manage it. They don't have to know anything about grid computing, but they can put the data in it. So it's just like mail, file, or print. Grid is now an option that you can deploy right after you set up your mail server. So the IT department can host it, manage it, but the scientists can use it, just like the scientists use the color printers.

Software developers. If you're a software developer, you can use the grid to make your software run faster, or in some cases, not even run faster, but do more options. An idea we're kicking around is, you know, what if you were to render things and render multiple copies of the same thing so that when you get it back, you can actually review multiple copies? GraphicConverter, if that developer's in the audience, you know, he could do multiple conversions in parallel on the grid and convert a JPEG to five different formats and do it much faster by distributing the work out to the grid. Scientists and engineers, this is the most obvious and historic market. The scientists and engineers like the grid because we built in a lot of features for them. One of the features they wanted was persistent job queue submission. Something that people don't quite realize, and I'll point it out when we go through the architecture slides, is that when you submit a job, on the computer that submitted the job to the grid you can slap the lid closed and take off for the week, and the grid will continue working on your job. So the job queue, just like a print job, is persistent.

The grid will continue working the job. It will collect the results, and it will hold the results until that computer reconnects to the grid and collects the results back from the controller. This is fantastic. Big compute job, going to work for several hours or several days on the grid, going to do a lot of computation. I don't want to have to have my PowerBook where I submitted the job sitting there for that entire time. I'm going to close it, I'm going to take it home, and I'm going to play World of Warcraft on it. So the grid works on your job while you're away or while you sleep.

Creative professionals. Apple being so strong in the creative market, we feel there's huge opportunity for third-party developers to start treating the grid as a built-in support component for developers' creative applications. There's so much you can do with the grid. If you can assume you have, you know, 5, 10, 15, 20, 200 computers at your disposal, you can start doing things you never even considered. We have Cocoa APIs, we have Objective-C APIs. You can build it right into your application so that the user doesn't even have to know they're using a grid. You can just make your application go faster.

Terminology. When you're sitting down and talking about a new technology like Xgrid, it's always useful to get the terminology out so that everyone's talking about the same things. Xgrid's simple, but, you know, the terms will help us have a further conversation throughout this presentation. So the first terminology that we tend to throw around is the client. Now, for the purposes of this presentation, the client is any computer on the net that's doing a job submission to a grid. That's the equivalent of the user choosing print and submitting it to the print spooler, that's the client. User choosing send my job to the grid, they're the client.

These are the people that have work that needs to be done. The controller. The controller accepts the job, parses the job, and figures out, well, who can I get to schedule this work on? It figures out, okay, I've got 200 computers. This task has 10 things. All right, how many computers do I need to use to do that? It does all the matching or all the brokering for, you know, given a job, how do I make it all, get it all scheduled and get all the results out and collected. So it manages the job scheduling, the data movement, and the agents. The agents are the computers that have been joined to the grid and they're the ones that are going to do the actual work. These are the idle G5s in your computer labs at universities. They join the grid, they are now available to the grid to schedule works that the clients submit as jobs. So these can be anything from high-end dual 2.7 G5s to older, underutilized blue and white G3s or a closet full of Mac minis. So the agent is the place where the job actually executes, where the computational resources are consumed, the results are collected from.

Work is scheduled and executed on the agent. If your grid job has been properly composed into multiple jobs, the grid will execute as much work in parallel as possible. So you're not submitting a job to the grid and just having it execute task one on agent one and then waiting for that to finish and task two goes on agent two. The grid will figure out as much work as possible that can be done in parallel. This is where you pick up the grid's performance capabilities.
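
As an illustration of running independent tasks in parallel rather than one after another, here is a minimal Python sketch. It is not Xgrid's scheduler: the `run_job` function, the thread pool standing in for agents, and the sample job are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(tasks, agent_count):
    """Run every task of a job on a pool of `agent_count` workers.

    Each worker thread stands in for one agent; the pool plays the role
    of the controller, handing out tasks as workers become free.
    """
    with ThreadPoolExecutor(max_workers=agent_count) as pool:
        # map() returns results in task order even though the tasks
        # themselves may finish in any order.
        return list(pool.map(lambda task: task(), tasks))


# Hypothetical job: ten independent tasks that each square a number.
job = [lambda n=n: n * n for n in range(10)]
results = run_job(job, agent_count=4)
```

With four workers and ten tasks, no task waits for an unrelated task to finish before it can start, which is the behavior described above.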

You can have the grid working on things in parallel. Huge, huge computational benefit.

Client, which we discussed. You have the controller, and you have agents. But hopefully with your grid, this is a lot of overhead just to have one agent. Hopefully you have more than one agent. You have as many agents as you can lay your hands on. And you don't have just one client. There's not just one scientist, one financial analyst, or one creative content professional doing renders. There are multiple people that are using the grid, and they don't have to worry about how many agents there are. They don't have to worry about where the controller is. They don't have to worry about what jobs are running local, what jobs are running remotely. They can just use the grid as a resource to make their workflow go faster or to accomplish a task such as, you know, calculating pi to the 10 billionth digit or something along those lines or something much more meaningful such as curing cancer.

The terminology for the grid is the grid is a set of agents and a job queue. The job queue controls the list of work that's going to be scheduled on the agent. There are jobs. A job is a set of tasks that are definable and atomic. It's a collection of input and output data. This is huge because we put all the data onto the agents. And a task. This is the smallest piece of work done. This is the little thing that takes, you know, a number, runs a fast Fourier transform on it, and transforms it into something else. These are command line executables. They can take arguments. They take working directory files, and they also work with standard input.

So with the three-tier architecture, not only do you have clients, controllers, and agents, you can subdivide the agents, if you want to, into virtual grids. So you could take all of your computing resources on campus, join them to the controller, and I can dedicate, you know, the top three iMacs to the physics grid, the middle to the biology grid, and finance can have all the Mac minis in the closet at the bottom. Or you can combine all the resources into one uber grid and use all the resources you want. The point is you as the administrator have control over how the agents are used and how they're scheduled.
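
The idea of subdividing one pool of agents into virtual grids can be pictured as a simple grouping step. The Python sketch below is only an illustration: `partition_agents`, the agent names, and the grid names are hypothetical and say nothing about how the controller stores this internally.

```python
def partition_agents(agents, assignments, default_grid="default"):
    """Group agent names into named grids.

    `assignments` maps an agent name to the grid the administrator chose
    for it; agents without an assignment land in `default_grid`.
    """
    grids = {}
    for agent in agents:
        grid = assignments.get(agent, default_grid)
        grids.setdefault(grid, []).append(agent)
    return grids


# Hypothetical campus: two iMacs for physics, one for biology, minis unassigned.
agents = ["imac-1", "imac-2", "imac-3", "mini-1", "mini-2"]
assignments = {"imac-1": "physics", "imac-2": "physics", "imac-3": "biology"}
grids = partition_agents(agents, assignments)
```

Passing an empty `assignments` mapping puts every agent in one grid, which corresponds to the single combined grid mentioned above.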

So what does Xgrid do? It groups agents into grids, as seen by the previous slide. It monitors agent availability. This is very big. While the agents are a member of the grid, the grid controller knows whether the agent is currently awake, asleep, idle, so on and so forth. So the controller, when it receives a job, already knows what agents are available for it to schedule work on. This is a huge deal. This means we're not farming tasks out to agents that have been put to sleep ages ago. We know that the agent's up. We know it's alive. We know it's ready to work. It manages queues of jobs and their dependencies.

This is, in my opinion, somewhat obvious, but that's what the controller does, is it manages the jobs. There's ways you can describe to Xgrid that this job has to complete before this other job's appropriate to run so you can manage the dependency. The controller makes sure that everything executes in the proper order, and then it has all the necessary data that it needs to keep everything flowing. It schedules runnable jobs, so it knows which agent has a job. It schedules it on it. It monitors the agent's results and collects them when they're done. And I can't emphasize this fifth slide-- this fifth bullet enough. We on Xgrid handle the data staging. One of the big things about grid computing is getting the input data onto the agent, and there's all sorts of different ways to do that. Some people set up NFS mounts, other people write FTP scripts, so on and so forth. Xgrid's really beautiful in this particular area. We take care of staging the input data onto the agent, so you can have an agent with absolutely nothing on it.

And when someone submits a job to Xgrid, the Xgrid controller makes sure the agent has the executable, makes sure it has the input data, and we collect up all the output data. So the agents don't have to have anything pre-installed on them. The Xgrid controller will get everything down to the agent that it needs to run the job, collect it all up, clean it up, and leave the agent clean when it's all done with the process. So you don't have to wander around to 1500 machines and pre-install all of your grid-enabled software. The controller will take care of getting all of that to the agents. So you can take a Tiger machine out of the box, plug it in, do any additional data install, join it to the grid, and if there's jobs being scheduled on that grid, that agent will immediately be put to work if it meets the criteria for the job scheduling. No additional software installs are required. This is a huge thing that goes-- not only do we think it's obvious, but it also is one of the ease-of-use things that we've put into the grid.
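
The stage, run, collect, and clean-up cycle described above can be sketched in a few lines of Python. This is a local simulation under stated assumptions, not Xgrid's implementation: `run_staged_task` is a hypothetical helper, a temporary directory stands in for the agent's working directory, and the sample task is invented for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_staged_task(command, input_files):
    """Stage inputs into a scratch directory, run the task, collect outputs.

    `input_files` maps file names to bytes. Returns (stdout, outputs), where
    `outputs` maps the names of files the task created to their contents.
    """
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        for name, data in input_files.items():
            (work / name).write_bytes(data)          # stage the input data
        done = subprocess.run(command, cwd=work, capture_output=True, check=True)
        outputs = {
            path.name: path.read_bytes()
            for path in work.iterdir()
            if path.is_file() and path.name not in input_files  # new files only
        }
    # Leaving the `with` block deletes the scratch directory, so the
    # simulated agent is left clean.
    return done.stdout, outputs


# Hypothetical task: upper-case input.txt into output.txt.
script = (
    "data = open('input.txt').read().upper()\n"
    "with open('output.txt', 'w') as out:\n"
    "    out.write(data)\n"
)
stdout, outputs = run_staged_task([sys.executable, "-c", script], {"input.txt": b"xgrid"})
```

Nothing has to exist on the "agent" beforehand: the inputs arrive with the task, and the scratch directory is gone once the outputs have been collected.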

And it recovers from failures. It wouldn't do the scientist a lot of good if he submitted the job and just because one of the agents got put to sleep or someone tripped over the power cord that the job didn't run. The controller knows when the job's done. It knows the job was scheduled. If the job doesn't finish executing and the agent goes away, it'll reschedule the job on another agent. So we are always working-- we always make a best effort to fully execute the jobs as they are submitted. And because we have the agent status, we can do that.
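
To make the rescheduling idea concrete, here is a small Python simulation. It is a sketch under assumptions, not the controller's actual algorithm: `run_with_rescheduling`, the round-robin policy, and the agent and task names are all made up for illustration.

```python
from collections import deque


def run_with_rescheduling(tasks, agents, failing_agents):
    """Hand out tasks round-robin; requeue any task whose agent disappears.

    `failing_agents` is the set of agent names that go offline as soon as
    they are given work. Returns {task: agent that completed it}.
    """
    queue = deque(tasks)
    available = deque(agents)
    completed = {}
    while queue:
        if not available:
            raise RuntimeError("no agents left to run the remaining tasks")
        task = queue.popleft()
        agent = available.popleft()
        if agent in failing_agents:
            # The agent went away mid-task: forget the agent, requeue the task.
            queue.append(task)
            continue
        completed[task] = agent
        available.append(agent)  # the agent is free for more work
    return completed


done = run_with_rescheduling(
    ["t1", "t2", "t3"], ["a1", "a2", "a3"], failing_agents={"a2"}
)
```

Here task `t2` is first handed to `a2`, which drops out, so the task goes back on the queue and later completes on `a1`.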

What are the new features in Tiger? For those--this is mostly a slide for people that were familiar with Technical Preview 1 and Technical Preview 2. The first thing we added was authentication. Well, grids are a computing resource. So, you know, you might not want everyone on your campus being able to submit a job to the grid, so we've added Kerberos-based authentication so that you can restrict the grid to only being used by certain people. This fits in with Open Directory and ties in with the rest of the server architecture. Cocoa Developer APIs. You can now, as a Cocoa Developer, very easily integrate the grid into your Cocoa application such that using the grid is seamless to your user. We'll be going through a code example later in this presentation. Server administration integration. I mentioned this earlier. We provide a full suite of server administration tools to administer the agents and the controller.

Xgrid Admin Application. This is what lets you monitor your jobs--excuse me-- lets you monitor your jobs and see their job status, so on and so forth. Multiple tasks per job, didn't have that. We now allow a single job to say I need to do multiple tasks. It seems kind of obvious, but we previously didn't have that. So we now can have multiple tasks be part of a single job. Task and job dependencies is a huge feature. This way you can post a single job that has a number of dependencies and needs to be run in segments. The controller won't schedule the jobs before they're ready to be run.
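
One way to picture dependency-aware scheduling is as a topological ordering of the work. The Python sketch below uses the standard library's `graphlib` (Python 3.9 or later); the `schedule_order` helper and the task names are hypothetical, and the talk does not say how the controller actually orders work.

```python
from graphlib import TopologicalSorter


def schedule_order(dependencies):
    """Return task names in an order that respects `dependencies`.

    `dependencies` maps each task to the set of tasks it must wait for.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical job that has to run in segments.
deps = {
    "split": set(),
    "render": {"split"},
    "encode": {"render"},
    "merge": {"encode", "render"},
}
order = schedule_order(deps)
```

No task appears in `order` before the tasks it depends on, which is the guarantee described above: work is not scheduled before it is ready to run.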

Where are the components? Well, Tiger comes with the agent already installed. How many of you have Tiger on your PowerBook? If you open it -- oh, wow. Thank you, thank you very much. If you open up sharing, you'll see Xgrid on the sharing panel. Xgrid's installed on Tiger desktop.

If I gave you the password to the controller up here on campus, you could all become agents and we'd schedule all sorts of jobs on your PowerBooks while you're watching this presentation. We have a client framework and CLI tool, and there are code examples in the developer tools. In addition, Tiger Server comes with the controller and all the administration tools. So if you have Tiger Desktop, you already have an agent. If you have Tiger Server, you already have everything you need to set up a grid. So what are you waiting for? And again, for those of you who can't go to Tiger, but you still have Panther, we do have an agent available for install on Panther. David, is that on the Tiger server, or do they download that from the web? Download? Yeah, so you download the Panther agent as an installer package from the web.

We do support message passing. For those of you who do grid programming, MPI is a very big deal. We have sample code to show how to submit MacMPI jobs, which is a legacy or pre-existing grid technology. Those are at this URL. And Open MPI, a big grid consortium of people who do a lot of MPI development, have actually modified their mpirun command to use Xgrid. You can get more information about that from www.openmpi.org. They're big fans of Xgrid. We've been working with them, and we're looking forward to their feedback. So at this point in time, I'd like to bring up the author and primary contributor to Xgrid, David Kramer, and he'll be taking you through some demos and additional information. Thank you.

Thank you. Thank you, Dave. So I'm going to talk to you about Xgrid architecture. So as Dave mentioned, it's a three-tier architecture. There's a client, a controller, and an agent. The controller is the heart. It's in the center. It's the one that does the splitting and the monitoring and collects the results. And when the client is ready to download the results, it can get them.

So here's your client. It's detachable. As Dave mentioned, you can shut the lid on the PowerBook, take it home, and the job remains queued on the controller. Having the jobs queued on the controller with no one to do the work isn't very useful, though. So of course, you can have agents. We have the full-time agents. It's our rack right here. You can have part-time agents. That's your computer labs, your workstations that aren't used at night. They can come when the user isn't using the computer and join the grid. And then finally, you sort of have another kind of part-time agents, which are the internet agents. And these are the volunteers across the world who are participating in your grid and helping you solve your problems.

So let's talk about security. A lot of people have questions about how secure Xgrid is or what the model is and how it controls access. So first there's the authentication. And you have three choices. One is none. This is not very secure, of course. But it is an option if you're doing development or testing or you have a private network. That might be suitable for you. There's also password and Kerberos. And I will talk more about these authentication methods in the next slide.

There's also privilege separation. So the controller runs as its own user, and so does the agent, but the tasks that get run by the agent don't run as that user. And so the agent uses a helper tool that runs as root to launch the tasks as an unprivileged user, or in the case of using Kerberos authentication, you can have the task run actually as the user who submitted it. So the tasks, as I said, run as nobody under password authentication or no authentication. Otherwise they run as the submitting user.
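
The rule for which account a task runs under can be written as a small lookup. This Python sketch only restates the rule from the talk; the `task_user` function and the lowercase method names are assumptions for the example, not Xgrid configuration keys.

```python
def task_user(auth_method, submitting_user=None):
    """Return the account a task would run as for a given authentication method.

    With no authentication or password authentication, tasks run as the
    unprivileged user "nobody"; with Kerberos, as the submitting user.
    """
    if auth_method in ("none", "password"):
        return "nobody"
    if auth_method == "kerberos":
        if not submitting_user:
            raise ValueError("Kerberos authentication implies a known submitting user")
        return submitting_user
    raise ValueError(f"unknown authentication method: {auth_method!r}")
```

For example, `task_user("password")` gives `"nobody"`, while `task_user("kerberos", "alice")` gives `"alice"`.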

So no password, suitable for private networks only. I use this in my own testing because it's more convenient not to enter a password. However, if you go to Kerberos, then you've got single sign-on, so you only need to enter your password once at the beginning of the session. And then it's just as easy to use as no password. So I would highly recommend getting Tiger Server, setting up an Open Directory master, turning on Kerberos, and using that with Xgrid. That will be most secure and you'll get the most benefit out of that.

In the password case, we don't send a password in the clear over the network. There's a two-way random challenge response, mutual authentication protocol used. One more thing to mention about this password is that it's a single password. It's not per user. So if you want 10 people to be able to use the grid, you're going to give all 10 people the same password. This may not be what you want, in which case if you want to go to per user authentication, you need to go to Kerberos.
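
The talk does not spell out the challenge-response protocol, so the following Python sketch shows only the general idea of mutual authentication with a shared password that never crosses the wire. HMAC-SHA256, the 16-byte challenges, and the function names are assumptions chosen for illustration, not Xgrid's actual wire format.

```python
import hashlib
import hmac
import secrets


def respond(password: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the password without sending it: MAC the challenge."""
    return hmac.new(password, challenge, hashlib.sha256).digest()


def mutual_authenticate(client_password: bytes, controller_password: bytes) -> bool:
    """Simulate both sides challenging each other; True if both are convinced."""
    # Each side sends a fresh random challenge to the other.
    client_challenge = secrets.token_bytes(16)
    controller_challenge = secrets.token_bytes(16)
    # Each side answers the other's challenge with its own copy of the password.
    controller_answer = respond(controller_password, client_challenge)
    client_answer = respond(client_password, controller_challenge)
    # Each side compares the answer with what it would have computed itself.
    client_trusts_controller = hmac.compare_digest(
        controller_answer, respond(client_password, client_challenge)
    )
    controller_trusts_client = hmac.compare_digest(
        client_answer, respond(controller_password, controller_challenge)
    )
    return client_trusts_controller and controller_trusts_client
```

Only the random challenges and the MAC values travel between the two sides. A production protocol would also bind each answer to the role of the sender to rule out reflection attacks; that detail is omitted here.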

As I said, Kerberos is the most secure. It also provides confidentiality, so we do encrypt all communication going over the wire if you use Kerberos authentication. So you don't have to worry that someone else is sniffing your results from your thesis project that you're going to publish next week. Thank you. It requires a directory system and a KDC. We recommend Open Directory Master. However, it does work also with Active Directory.

The client access is controlled by ACLs. So you can specify exactly which users or groups are allowed to connect to the controller and submit jobs and retrieve results. The agent access is also controlled, but it's only controlled by the presence of a principal in the KDC database. So if the agent has a principal, it will be able to connect. At this point, there is no way to specify that only certain agents are able to connect.

So I mentioned the privilege separation before. The Xgrid agent runs as the user xgridagent, and it's launched by launchd at startup, or you can start it manually using the xgridctl command. Because it's launched by launchd, it starts as root, but it immediately lowers its privileges. But before doing that, it launches a helper tool, and that helper tool also runs as root, and it remains running as root. So the helper tool doesn't have any sockets open on the network. It's not talking to anyone else except the Xgrid agent, so you've got a little bit more separation there, a little bit more protection. When the tasks actually run, they get started by the helper, they run as nobody or as the submitting user in the case of Kerberos authentication. And the results actually get sent directly back to the agent. There's not an extra copy going through the helper because of the wonders of BSD file descriptor passing.
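
File descriptor passing lets one process hand an open descriptor to another over a Unix domain socket, so data can flow directly without being copied through the sender. The Python sketch below demonstrates the mechanism inside a single process, assuming Python 3.9 or later on a Unix-like system; `pass_descriptor_demo` and the helper, agent, and task roles in the comments are illustrative, not Xgrid code.

```python
import os
import socket


def pass_descriptor_demo(message: bytes) -> bytes:
    """Send a pipe's read end over a Unix socket, then read through the copy."""
    helper_sock, agent_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()
    try:
        # "Helper" side: pass the read end of the pipe as ancillary data.
        socket.send_fds(helper_sock, [b"x"], [read_fd])
        # "Agent" side: receive a new descriptor that refers to the same pipe.
        _data, fds, _flags, _addr = socket.recv_fds(agent_sock, 16, 1)
        # The "task" writes its output and closes its end of the pipe.
        os.write(write_fd, message)
        os.close(write_fd)
        write_fd = -1
        # The agent reads the output directly through its own descriptor.
        with os.fdopen(fds[0], "rb") as stream:
            return stream.read()
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)
        helper_sock.close()
        agent_sock.close()
```

In a real helper and agent the two socket ends would live in different processes; the received descriptor keeps working even after the sender closes its own copy.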

So the discovery and authentication is also an area where people wonder about how it works. A lot of times they think that maybe they need to connect directly to the agent to tell the agent what to do, or they want to go out on the network and gather all the agents. And that's not how Xgrid works. The way Xgrid works is that the controller is the one that opens the socket on the network. It's the only one that accepts network connections. Everyone else has to find the controller.

So the first thing the controller does when it starts up is it advertises via Bonjour and says that, hey, I'm here on the network. Now the agents can be configured to connect to this service, or they can be configured to connect to an IP address or a DNS name. But using Bonjour, they get instant notification that the controller is available and that they can connect. Once they connect, the mutual authentication occurs.

Both the controller and the agent determine whether they are talking to who they expect to be talking to. The same thing happens with the client in parallel. The client gets the notification that there's a controller online. Presumably, if it's an application, you've got some sort of browser, or it's been preconfigured to use that controller. And so again, the authentication occurs. It's mutual. And everyone's happy at this point. We're all connected, and we're ready to actually do some work.

So the workflow follows a mantra that we call the submit, monitor, and retrieve. So the first thing is the client has some work to do and packages it up as a job and sends it off to the controller. And the controller splits the job up into tasks and sends the tasks off to the agents.

So now that the agents get the work, they start doing the work, and once they're done spinning the little icons, they presumably have finished the computation. And they can send the results back. But in the meantime, if one of these agents had gone offline, the controller all this time is monitoring the progress of these computations and will reschedule the work if necessary.

Once all the work is done, it gets sent back to the controller. The controller collects the results and stores it waiting for the client to retrieve the results. It notifies the client as soon as the results are available so the client doesn't have to keep polling asking, "Are we there yet? Are we there yet? Are we there yet?" So you retrieve the results, and that's the entire workflow. Submit, monitor, retrieve. And-- sorry.
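
The submit, monitor, retrieve cycle, including the notification that spares the client from polling, can be sketched as a tiny in-memory model. This Python class is an assumption-laden illustration: `Controller`, its method names, and the callback are hypothetical and unrelated to the real Xgrid client framework.

```python
class Controller:
    """Toy controller: queues jobs, runs them, and holds results for pickup."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def submit(self, tasks, on_finished):
        """Queue a job (a list of callables); return its identifier."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"tasks": list(tasks), "results": None, "notify": on_finished}
        return job_id

    def run_pending(self):
        """Run every unfinished job, then notify whoever submitted it."""
        for job_id, job in self._jobs.items():
            if job["results"] is None:
                job["results"] = [task() for task in job["tasks"]]
                job["notify"](job_id)  # push notification instead of client polling

    def retrieve(self, job_id):
        """Return a finished job's results and drop it from the queue."""
        if self._jobs[job_id]["results"] is None:
            raise RuntimeError("job has not finished yet")
        return self._jobs.pop(job_id)["results"]


# Submit, let the controller work, get notified, then retrieve.
finished = []
controller = Controller()
job_id = controller.submit([lambda: 1 + 1, lambda: 2 * 3], on_finished=finished.append)
controller.run_pending()
results = controller.retrieve(job_id)
```

The client's only involvement after `submit` is the callback, so it could disconnect while the job runs and come back later for the stored results.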

We'll get there. Almost there. All right. And now I have a demo to show you of doing audio encoding on the grid. So can I have demo one, please? So we have this little demo app that we put together. And the first thing it does is it allows us to browse for controllers here. And so we're going to connect to the top Xserve over in our rack here. And we've got a job queue up there. Not much going on yet. We're going to drag some audio files in there. And they're AIFF. And we're going to encode them into AAC. We're using the afconvert sample code that should come with all your developer tools. Nothing real special going on. We're just doing the encoding. I ran the encoder on my machine last night to see how long it took to do these files serially on one computer. And it took about two and a half minutes to encode 600 megabytes of AIFF on a G5.

So, presumably on this XServe rack we're going to go a little bit faster than two and a half minutes. Hopefully all the lights are lighting up over there and we can see all the jobs have started running and hopefully a little progress will go by on a few of them and they keep going and it looks like a couple of them are done. Looks like they're just about all done. Almost getting there. So instead of 2 1/2 minutes, it looks like we got this down to about 20 or 30 seconds. So that's the power of using the grid.

And there you go. You see we've got our AIFFs and we've got our AACs, which are about 10 times smaller. All right, let's go back to the slides. Thank you.

So now, how do you administer your grid? It's clear how to submit jobs. You just drag files into a window, right? But administering is a little bit more complicated. You want to think about how you want to partition your resources up, and also how to have them all configured to find the controller, because that's the one thing: everyone needs to know where the controller is, or at least know the name of the controller.

So we have various administration avenues. The first is the Sharing preferences, which configure the Tiger agent. There's also a separate pref pane that configures the Panther agent. There's the Server Admin application, which you use to actually start the processes, to start the controller and start the agent. You can also configure the agent there, and it gives you the same configuration options as the Sharing preferences pane. I'll go into the details more later. There's also the Xgrid Admin application. Once you've started the controller and configured all the agents to find the controller, you use Xgrid Admin to partition the agents into grids, to monitor the job queue, and to see how much of the performance you're utilizing, just to get a broad overview. It's the administration application to monitor the grid while it's in use.

So here's a screenshot from the Sharing preferences. Down there at the bottom you've got the Xgrid item, and that's selected. And so you've got a Start/Stop button and a Configure button. Pretty simple. When you configure things, these are your choices.

You specify the controller. You can either enter a Rendezvous or-- excuse me-- Bonjour service name, or a DNS name, or an IP address. You also get to specify whether it's a part-time or dedicated resource. In the part-time case, you only accept tasks when the computer's idle. Idle is determined by 15 minutes of no user activity. It doesn't look at network or disk access. It looks purely at mouse and keyboard activity to determine if the computer's idle. If a user comes back and moves the mouse or types something after a task has started, that task will complete, but no new tasks will be accepted by the agent. And then finally, you get to choose which authentication method you want to use to connect to the controller. And this gives you the choice of none, password, and Kerberos.
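The part-time rule is simple enough to write down as a small sketch. The function and constant names below are made up for illustration, and treating exactly 15 minutes as idle is an assumption; the only facts taken from the talk are the 15-minute threshold, that only mouse and keyboard activity counts, and that a task already running is allowed to finish.

```python
# Threshold stated in the talk: 15 minutes with no mouse or keyboard activity.
IDLE_THRESHOLD_SECONDS = 15 * 60


def accepts_new_tasks(dedicated, seconds_since_last_input):
    """Return True if the agent would take a new task (hypothetical helper).

    A dedicated agent always accepts work. A part-time agent accepts work
    only once the user has been away for the idle threshold. Network and
    disk activity are deliberately not inputs to this decision.
    """
    if dedicated:
        return True
    return seconds_since_last_input >= IDLE_THRESHOLD_SECONDS


def keeps_running_task(task_already_started):
    """A task that has already started runs to completion even if the user returns."""
    return task_already_started
```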

In Server Admin, you get some of the same agent configuration controls, but you also get an overview of both the controller and the agent, as well as settings for the controller. So first you've got to add your server with Server Admin. You select the service. You choose which tab you want to look at. In this case, we're looking at the overview. There's also the logs, which let you see the messages that are put in the system log by Xgrid, and then the actual settings where you get to set up the agent and the controller. And then finally, once you've set up the agent and controller, you need to press the Start button. In this case, it's already started, so you would click the Stop button to stop the services.

On the Overview tab here, we see at the top, you've got your agent information, and at the bottom, you've got the statistics about the controller. So once you've actually gone to the Settings tab and you choose the Agent Settings, you can enable the agent. You can set the exact same settings as in the Sharing prefs-- dedicated, part-time, and which authentication method.

The controller settings are very similar. You can enable the controller, and then you get to choose the authentication method used for clients and the authentication method used for the agents. And these can be different if you want. If you do want to have the Kerberos support for running tasks as the user that submitted them, you need end-to-end Kerberos authentication. So both clients and agents need to be using Kerberos.

One more thing to talk about in Server Admin is the service ACLs, and these allow you to control which clients can connect to the Xgrid controller. So you select the server. You choose the settings, you choose access, you choose the Xgrid service from the services, and then over here on the right, you get to add the users and the groups that you want to allow to use the service.

Next is the Xgrid Admin application. This is the one you're going to use to actually monitor the status of the grid while it's running. This is the overview tab of the application. We've got a tachometer here that gives you the currently active CPU power of the grid, all the grids in your controller in this case. The CPU power is simply the sum of the clock speeds, in gigahertz, of all the CPUs. So if you've got a dual 2.0, that's 4 gigahertz of CPU power. If you've got a G4 400, that's 400 megahertz of CPU power, one-tenth of that amount.
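The tachometer arithmetic works out as follows. This is only a sketch of the sum described in the talk; the function name and the (processor count, clock speed) tuple layout are assumptions chosen for the example.

```python
def cpu_power_ghz(machines):
    """Sum clock speed over every processor in the grid.

    `machines` is a list of (processor_count, clock_ghz) pairs, a layout
    invented for this example.
    """
    return sum(count * clock_ghz for count, clock_ghz in machines)


dual_g5 = cpu_power_ghz([(2, 2.0)])   # dual 2.0 GHz machine
g4_400 = cpu_power_ghz([(1, 0.4)])    # single 400 MHz G4
```

Here dual_g5 comes to 4 gigahertz and g4_400 to 0.4 gigahertz, the one-tenth ratio mentioned above.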

You get to add the server, the controller you want to connect to with this app. Then you get a list of all the grids. You can add more grids. You can rename the grids. You can choose which grid is the default grid. When someone submits a job to the controller and doesn't specify which grid they want to use, it goes to the default grid.

And then you've got the overview, the agents list, and the jobs list. And I will show you those in a moment. On the Overview tab, in addition to the statistics, you have the tachometer, as I said. The statistics are very similar to what you saw in Server Admin. When you go to the Agents tab, you get a list of the agents that are in the grid or being managed by the entire controller.

You can use the search widget up there in the toolbar to narrow the list if you have a lot of agents and you're looking for the status of a particular one. And then down at the bottom, when you select an agent, you get a little bit more information about that agent, how many processors it has, and how many of those are currently in use.

And then the job queue, probably the most interesting part of this application, lets you actually see which jobs are running. It shows you the progress, when they started, when they finished if they finished, or whether they failed or were canceled. And you have control to pause jobs and cancel them using the buttons down at the bottom.

So you have the job list. Again, you can use the search widget to find a job you're particularly interested in. And then down at the bottom, you get the statistics about the job. One thing to notice here is that you can see what the identifier of the job is. The identifier is just the string used by the controller to uniquely identify the job. This normally doesn't need to be exposed to the user, but if you're going to use the command line tool to work with the grid, the identifier is how you refer to this particular job.

Now I'm going to do a little demo. I'm going to actually write some code for you up on stage, show you how to make your own job submission interface using the Cocoa API. So last year, I think I showed people how to run cal. Cal is a simple tool that takes a couple of arguments, the month and the year, and then it calculates a text calendar and prints it out. And let me show you.

It doesn't take very long, but I would like you to ignore the computational power of this and think more in terms of how you would factor your own computations out into a separate small executable that takes various arguments and, depending on the arguments, does different things. Presumably, if you can submit the same executable to the grid with slightly different arguments and run them in parallel, you can get a lot of results back that you can use.

Scientists might think of this as a parameter study, but for video encoding, you can imagine that the range of frames is the argument to the tool, and then maybe also the file name that you're going to be transcoding. So first, I have this GridCalendar example, which I haven't really made any changes to from the sample code that came with Xcode. So it connects-- let's get rid of these guys from our last demo.

And create new job. This is just GridSample, basically, with the name changed to GridCalendar so far. I haven't made any code changes, so... The sample code lets you submit a job that just runs the shell with whatever you type in the command field here as the argument to -c of sh. So I can type echo hello world, submit the job, it runs, it finishes, and I see the results and it echoed hello world out. Not too exciting, but at least it works. However, for your own application, you probably don't want to force people to type a command line into a text field to submit to the grid. You'd like there to be a more interesting user interface that is problem specific. So in this case, we're going to use the new date picker control that's in Tiger to choose which months and which years we want to get the calendars for.

So the first thing I'll do is add the code to these files, which don't have any code in them yet. So this is the application delegate. It's just a subclass of the GridSample application delegate that's already in your developer tools examples. In case you're wondering, it's in /Developer/Examples/Xgrid/GridSample. That was the interface. We're not declaring any new instance variables, so there's really nothing there to show. Here we're overriding one method just to choose a new class for the job submission interface controller. This is just the way the GridSample project is set up. It's very easy to override the class that's going to be the window controller for the user interface so that we can create our own method that responds to the Submit button and takes what's in the user interface and turns it into a job specification to submit to the grid.

I'm also going to fill in the stubs on the new window controller here. In this case, the interface is going to have a start date and an end date that we're going to use to determine which months to calculate with cal.

I'm going to leave the implementation empty for now. We're just going to make sure I didn't introduce any typos. Everything still builds. And I need to change a couple of classes in the nib file now. So I created a new application delegate. So I need to change the class of this guy. It's currently GridSampleApplicationDelegate. I want it to be GridCalendarApplicationDelegate. GridCalendar. That's it for the main menu.

And then we have the new job nib. And this is the one that actually gets shown when you click New Job in the toolbar. This looks pretty good, but we don't actually want to specify the name or the command. So we can get rid of these guys. We want a start date. We want an end date. And we need a date picker. So we've got one of these, another one of these. And let's see, we don't need any time, since we're just choosing months and years.

So that's about it for the window. I also need to change the file's owner class here. It's currently GridSampleNewJobWindowController. I want it to be GridCalendarNewJobWindowController. And so it looks like that nib's looking pretty good. However, nothing's connected yet. So I'm going to use Cocoa Bindings to bind these controls directly to the instance variables that I defined. So we've got the owner object. We've got a start date.

End date. And that should be it for our new job. So before I write any code, let's just make sure I didn't make any mistakes. We'll run the app. And sure enough, it looks like everything's working just fine. So now we have to actually take what's in the user interface and turn it into a job specification. A job specification is a dictionary that contains a number of well-known keys and values.

So the way this code is set up is that it will call a job specification method when the Submit button is clicked. But before that, we need to set up some initial values so that the user interface is just ready to go. So it's a good demo just to click Submit without having to enter anything. So in this case, we're just creating a new start date, a new end date, and setting them as the initial values for those instance variables.

We also need to manage our memory, as all good Cocoa programmers know. And finally, this is the interesting method here. We're gonna return the job specification from this method using what's in the user interface. So first thing I'm gonna do is put the dates in ascending order just for programming convenience.

So if the user enters a start date that comes after the end date, we'll just swap them. It's very easy to do. We have to do this conversion here because the date picker instance variables are NSDate, but it's much easier to work with NSCalendarDate when all you care about are months and years.

Then we just compare them, and if they're in descending order, we swap the order so that they're in ascending order. Next thing we're going to do here is create the name of the job. So rather than have the user enter the name of the job, we're just going to calculate it based on the structure of the job specification.

Then we create the task prototype. Since we're going to submit a single job that does, say, 12 months, we're going to have each month be a separate task so that they can run in parallel. But since each task is going to be running exactly the same command, it's convenient to set up a prototype that all of the tasks that we submit are based on. In this case, it's a very simple prototype. We're just specifying an absolute path that we know exists on all of the computers. However, you could include input data, the actual command data, with this task prototype, which makes it much more efficient to submit. If you're submitting 1,000 tasks that all run the same command, you don't want to submit that command 1,000 times. You don't want to submit the executable code 1,000 times. You just want to submit it once and have them all use that. So here we create the specification.

We set the command using the command key and we create a prototype identifier, which is just the name of the command in this case. Because we only have one prototype, it doesn't really matter what the name is. But you can have any number of prototypes, each keyed off a different name, and then you can have some tasks based on one prototype and other tasks based on another prototype.

So now with the prototype, I'm going to create a task specification. So again, we start with the empty dictionary. We've got those integers for the start year and the start month that we determined before when we were creating the name. We're going to use them now as a counter. So we're going to assign them to a current variable. And we've got this while loop here. And we're just going to keep looping over, creating tasks. And then we're going to increment the month, and then we'll increment the year when we get past December.

So in this case, we turn the month and the year into strings because arguments are always strings. We put them in an array. Then we create the specification. In this case, we're saying that it's based on that task prototype that we generated a few lines up. And we're going to use those arguments. So each task is going to have unique arguments but the same command. Then the task identifier is any identifier you want to use to identify the task within the job. It must be unique within the job, but it doesn't need to be unique across jobs. It's solely for the person or the application submitting to be able to refer back to specific results when you retrieve the results and you want to find the results of task seven or task B or whatever you called the task. The identifier is an opaque string.

We add the task specification to the larger task specifications dictionary using the task identifier as the key. And then we look for the end condition. If we're not there, we increment the month. If we're off into month 13, we're probably really meant to be in January of the next year.
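The loop just described, including the earlier step of swapping the dates into ascending order, can be sketched in Python rather than the Objective-C used on stage. This is an illustration only: the dictionary keys "taskPrototypeIdentifier" and "arguments" are stand-ins for the well-known keys the talk mentions, the function names are invented, and the task identifiers are simply a running count.

```python
def month_range(start, end):
    """Yield (year, month) pairs from start to end inclusive.

    Both arguments are (year, month) tuples. If they arrive in descending
    order they are swapped first, as in the demo.
    """
    if start > end:
        start, end = end, start
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month == 13:          # month 13 really means January of next year
            year, month = year + 1, 1


def build_task_specifications(start, end, prototype_id="cal"):
    """Build one task per month, all based on the same prototype."""
    specs = {}
    for index, (year, month) in enumerate(month_range(start, end)):
        task_id = str(index)     # opaque string, unique within this job
        specs[task_id] = {
            "taskPrototypeIdentifier": prototype_id,  # assumed key name
            "arguments": [str(month), str(year)],      # arguments are strings
        }
    return specs
```

For January 2005 through December 2007 this yields 36 task specifications, one per month, each with its own arguments and a shared prototype identifier.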

So we handle that too. So that's almost all of it. The last part is putting it all together into a job specification. So you can specify an application identifier when you submit jobs. You can also specify, when you write an application using the Cocoa API, that you are only interested in jobs with a particular identifier. And this means that your job list in your application, if you do have a job list, will only be displaying the ones that are relevant to that application and not ones that were submitted by other applications or by the command line tool. So we set the name that we determined before, we set the application identifier, we set the prototypes, and we set the specifications, and that's it. That's the whole job specification that we've put together. So let's see how that works.
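Put together, the job specification is a dictionary with four entries. The sketch below shows that shape in Python under stated assumptions: the key strings, the application identifier "com.example.GridCalendar", the function name, and the path /usr/bin/cal are all illustrative choices, not values confirmed by the talk, and the consistency check is an addition for the example.

```python
def build_job_specification(name, task_specifications,
                            prototype_id="cal",
                            command="/usr/bin/cal",
                            application_id="com.example.GridCalendar"):
    """Assemble name, application identifier, prototypes, and tasks.

    Key names and default values are assumptions made for illustration.
    """
    # The command is stated once in the prototype and shared by every task.
    task_prototypes = {prototype_id: {"command": command}}

    # Sanity check added for this sketch: every task must name a known prototype.
    for task_id, spec in task_specifications.items():
        if spec["taskPrototypeIdentifier"] not in task_prototypes:
            raise ValueError(f"task {task_id} refers to an unknown prototype")

    return {
        "name": name,
        "applicationIdentifier": application_id,
        "taskPrototypes": task_prototypes,
        "taskSpecifications": task_specifications,
    }
```

Keeping the command in the prototype is what avoids repeating it in each of the task entries.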

So we've got my start conditions in here already. Let's do three years here. We're going to try from January 2005 to December 2007. Submit the job. So you see the name was generated based on that. And you see the progress is going by. As each task finishes, the controller recognizes that and increments the progress a little bit further. It's going purely by task count: if there are 36 tasks and 18 of them are done, we're at 50%. So this is going pretty quick. As you saw, cal doesn't take any time to run.
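The progress figure is plain task counting, which a couple of lines make concrete. The function name is made up for this sketch; it shows only the arithmetic described in the talk, not how the controller reports progress.

```python
def job_progress_percent(finished_tasks, total_tasks):
    """Progress as the share of finished tasks, ignoring how long each one takes."""
    if total_tasks <= 0:
        raise ValueError("a job needs at least one task")
    return 100.0 * finished_tasks / total_tasks


halfway = job_progress_percent(18, 36)  # the 36-month job with 18 tasks done
```

One consequence of counting tasks is that a job whose tasks vary widely in duration will show uneven progress over time.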

But we can look at the results now. And here they all are. We've got January 2005 at the top. And we've got December 2007 at the bottom. So that's really how easy it is to create your own user interface in your Cocoa apps to submit jobs to the grid. And that is it for that demo.

That pretty much wraps up our talk today. I think I'll invite Ernie Prabhakar and Dave back up on stage to take any questions. But before we do, let me just remind you to take a look at the session website for this. The GridCalendar final code is available on the session site, as well as some documentation for the XgridFoundation framework. To see more of how it is used, I highly recommend looking at the example code that comes with the developer tools.