Information Technologies • 54:45
The NetBoot services built into Mac OS X Server allow system administrators to manage a group of computers as easily as managing a single computer. But what happens when that group grows to be several hundred or thousand? Discover how to scale your NetBoot environment beyond the traditional workgroup size from those who have done it before.
Speakers: Mike Bombich, Joel Rennich, Gavin Cook
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
Welcome, everyone. As I said a couple years ago, it's practically impossible to give a very dynamic and thrilling presentation after the great Steve Hayman. Alas, here we are. My name is Mike Bombich. I am a systems engineer with Apple. I've been with Apple for four years. For those of you that don't know me, I've got a sick obsession with Mac OS X deployment technologies. And one of those is, of course, NetBoot. And that's what we're here to talk about today, especially in the context of large networks.
In a nutshell, NetBoot is a collection of server services that allow a client machine to discover and boot from a disk image that's kept on that Mac OS X Server. That's a really basic overview of what NetBoot is. If you're not familiar with it, I encourage you to take a look at the Mac OS X Server documentation, the system image documentation at the Apple Server documentation website.
I also encourage you to take a look at the bootpd man page. There's a lot of really good technical detail about how NetBoot works on Mac OS X Server in there. The rest of this presentation, though, is like drinking through a fire hose. So if you're not familiar with that, hang on to your seat.
In order to get a really good idea of how to leverage NetBoot technology on a large network, it helps to have a really in-depth understanding of how the NetBoot process works. Today I'm going to talk about the server services that are involved with NetBoot on Mac OS X Server and some of the NetBoot support files. I'm going to break down NetBoot into four distinct stages based upon what you see when you're sitting at a NetBooting client.
I'm going to go into some real detail on the BSDP exchange between a client and a server during the NetBoot process. And then I'll wrap up the last three stages of the NetBoot process by talking about the role of those NetBoot support files in the rest of the boot process.
With that under your belt, I'll hand it over to Joel Rennich with Apple and Gavin Cook of Genentech to talk about how they've used NetBoot in real world scenarios. So with the intro done, let's go ahead and dig in. This is going to be a lot of fun.
Server Admin can be a little deceiving when it comes to enabling the NetBoot service. It's not actually just one service that you're enabling when you click on that Start Service button. There are several different services that are enabled in the background and configured on Mac OS X Server. For starters, the bootps launch daemon is loaded by launchd to listen for traffic on port 67. That's the BOOTP port. bootpd is the daemon that's launched when traffic is received on this port to communicate with a client that's issuing a DHCP or BSDP request.
TFTP, or Trivial File Transfer Protocol, is the protocol that's used by clients to download the NetBoot support files. NFS is used to mount the NetBoot disk image. Some people choose to use HTTP in networks where NFS is not a good choice. And finally, in the case of diskless NetBoot, AFP is used to attach that NetBoot shadow file.
In order for a client to boot from a NetBoot server, that client needs basically a NetBoot set, which is just a folder of five files that reside on your NetBoot NFS share point. In that folder, there's the NBImageInfo.plist. This is a NetBoot set configuration file. It contains details like a unique identifier for that NetBoot set, the name of the NetBoot set, and where various files within the NetBoot set are located. They can be on other servers, for example.
It also lists what architectures are supported by that NetBoot set, and what hardware can actually use that NetBoot set. There's a disk image file. This is just a disk image of a fully configured Mac OS X installation for the case of standard NetBoot, or a disk image of, for example, the Mac OS X installation DVD.
There's a booter file. This is your secondary loader. This is just a bridge between the firmware of the machine and the executed kernel of the machine. This kick starts the NetBoot process. mach.macosx is your kernel file. This is what you would find at mach_kernel at the root level of the file system. mach.macosx.mkext is your kernel extension cache. Basically, that's just a cache of drivers that the client needs to be able to boot from a disk image that's hosted on a network server.
So these last three files are responsible for the most primitive part of the boot process. They really communicate with all the pieces of the hardware at the very lowest level. So it really should come as no surprise that these last three files are architecture specific. You may have noticed that the 10.4.4 server admin update allows you to NetBoot Intel clients from your Mac OS X Server.
And basically the way this works is that the architecture-specific files, if they're PowerPC, can either exist at the root level of your NetBoot set or in a folder labeled ppc. The Intel architecture-specific files need to be located in a folder called i386, as you can see in the screenshot here.
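The folder layout just described can be sketched in a few lines of Python. This is only an illustration, not how the server itself is implemented: the three file names come from the talk, while the helper names, the `Example.nbi` folder, and the order in which the PowerPC locations are tried are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

# The three architecture-specific support files named in the talk.
ARCH_FILES = ("booter", "mach.macosx", "mach.macosx.mkext")


def support_file_dirs(arch: str) -> List[str]:
    """Return candidate subfolders (relative to the NetBoot set) for an architecture."""
    if arch == "ppc":
        # PowerPC files may sit in a ppc folder or at the root of the set.
        # Trying the ppc folder first is an assumption made for this sketch.
        return ["ppc", "."]
    if arch == "i386":
        # Intel files must sit in the i386 folder.
        return ["i386"]
    raise ValueError(f"unknown architecture: {arch}")


def find_support_files(netboot_set: Path, arch: str) -> Optional[Dict[str, Path]]:
    """Return paths to all three files for arch, or None if any of them is missing."""
    for sub in support_file_dirs(arch):
        folder = netboot_set / sub
        paths = {name: folder / name for name in ARCH_FILES}
        if all(p.is_file() for p in paths.values()):
            return paths
    return None


def demo() -> tuple:
    """Build a hypothetical set with PowerPC files at the root and probe both architectures."""
    with tempfile.TemporaryDirectory() as tmp:
        nbi = Path(tmp) / "Example.nbi"  # hypothetical NetBoot set folder
        nbi.mkdir()
        for name in ARCH_FILES:
            (nbi / name).write_bytes(b"")
        return (find_support_files(nbi, "ppc") is not None,
                find_support_files(nbi, "i386") is not None)
```

In this demo the PowerPC lookup succeeds from the root of the set, while the Intel lookup fails because no i386 folder exists.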
When you NetBoot a client, the first thing you see is a flashing globe that is replaced with the Apple logo and the spinning globe beneath it. That spinning globe is eventually replaced with a progress indicator that you would see on any typical boot. And eventually you see a blue screen indicating that the WindowServer has loaded. And finally you see either a login window or the installer interface.
We can use these visual indicators to actually break down the NetBoot process into four stages. First of all, why would we want to do this? Well, NetBoot involves a whole slew of communication protocols and different services hosted on different machines. Who here has had a NetBoot problem? Okay.
More often than not, it's actually your network. And I'll tell you why in a little bit. But by breaking down the NetBoot process into four stages, using those visual indicators as cues, we can really get a better grip on troubleshooting. For example, if my machine is stuck at the blinking globe and it never proceeds, I know that I'm in the DHCP and BSDP exchange, and maybe there's a problem with that exchange. So I can use some specific troubleshooting tools to dig into that issue.
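As a rough troubleshooting aid, the four visual cues can be written down as a lookup table. This is a sketch in Python that summarizes the stages as they are walked through in the rest of this session; the dictionary keys and the `diagnose` helper are made up for illustration.

```python
# Visual cue at the client -> (stage number, what is happening at that point).
STAGES = {
    "flashing globe": (1, "DHCP and BSDP exchange"),
    "apple logo with spinning globe": (2, "TFTP download of the booter, kernel, and kext cache"),
    "progress indicator": (3, "kernel loads the kext cache and mounts the disk image over NFS or HTTP"),
    "blue screen": (4, "WindowServer has loaded; login window or installer comes next"),
}


def diagnose(stuck_at: str) -> str:
    """Map the last thing seen on screen to the stage worth investigating."""
    stage, activity = STAGES[stuck_at.lower()]
    return f"stage {stage}: {activity}"
```

For example, `diagnose("Flashing globe")` points at the DHCP and BSDP exchange, which is where a packet trace is the right tool.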
So the first stage of the NetBoot process, the blinking globe, is initiated by, first of all, the firmware determines that our boot device is a NetBoot disk image. So the first thing that firmware does is it sends out a DHCP discover. This initiates the BSDP and DHCP exchange.
This is actually the most important part, the most important stage of the NetBoot process. This is where the client gets all of its information about how to boot from our network server. This is also the stage that most frequently fails, especially on a large network. You have to do some specific router configuration to make this work, quote unquote, out of the box.
So I want to go into a little more detail in this particular part of the NetBoot stage. And actually, I want to first give you a 30,000 foot overview of what kind of happens between the server and the client during this initial exchange. But I want to do it for two different scenarios. In our first scenario, the client has never NetBooted before. The client has never indicated a preference for a specific NetBoot set.
In the second scenario, we do have a preference. Either the client has booted from that NetBoot server before or we've gone into the Startup Disk pref pane and actually chosen a NetBoot set.
The exchange between the client and server is actually quite different in these two different scenarios. So in our first scenario, our initial boot sequence, the client issues via broadcast a DHCP discover. Our DHCP server responds with an offer. We request, and the DHCP acknowledges that request. This is a standard DHCP exchange. There's nothing special about it here.
Not having received a response from any NetBoot servers yet, our client now goes into a collection mode. It issues a BSDP INFORM[LIST] via the broadcast address to get a list of NetBoot servers that are out there on the network that have a NetBoot set for me. I'll play the client.
Our NetBoot server responds, if it has a NetBoot set for our client, with a BSDP ACK[LIST]. I'm a NetBoot server and I've got this NetBoot set for you. After a selection process, the client issues a BSDP INFORM[SELECT] to the NetBoot server. And the server responds with our really juicy packet, the BSDP ACK[SELECT], which contains all the information our client needs to actually boot from the NetBoot server. And then the client will go ahead and start the next phase of the NetBoot process.
So that's how it works for a client that has never NetBooted before. What if we do have a preference? What if I want to boot from a specific NetBoot image? And more importantly, where could I even store that preference? If you think about it, we have this thing called diskless NetBoot.
And if I don't have a disk in my machine, how do I store a preference for what NetBoot server I want for this specific client? We could store it in firmware, but that would be kind of complicated from a UI perspective. But what we actually do is the NetBoot preference is stored at the server in the NetBoot bindings database. This is a flat file located at /var/db/bsdpd_clients at the server. And the server will create a record for a client any time it receives a BSDP INFORM[SELECT].
So there's actually two different times that we'll get that BSDP INFORM[SELECT] from a client. We saw one in the last slide. I pulled the machine out of the box. I held down the N key. It went through the DHCP and BSDP exchange. And the last thing the client sent was a BSDP INFORM[SELECT]. My client implicitly selected the default boot image ID on my NetBoot server.
So the other place that this can happen is there's actually a BSDP client built into the Startup Disk preference pane. When you open up the Startup Disk preference pane, and there's NetBoot servers available, the first thing the preference pane does is issue a BSDP INFORM[LIST]. All of the NetBoot servers within range will respond with a list of all of the NetBoot images that they have to offer for that client's architecture. You then click on one, and the Startup Disk preference pane sends out a BSDP INFORM[SELECT]. And the NetBoot server will create a binding for that client in that NetBoot bindings database.
So it's important to point out both of these scenarios because if you were to pull a machine out of the box, hold down the N key, boot from your default NetBoot set, go on about your business, and then go to Server Admin and change your default NetBoot set to something else, and then go back to that client and hold down the N key, what's it going to do? It's going to boot from the previous default NetBoot set because that client had implicitly made a selection.
It had made a preference for that NetBoot set. So that tends to confuse some people. You can actually hold down Option-N on the new Intel Macs to boot from the real default NetBoot set and ignore that binding.
So if I do have a preference, my client is going to send out that DHCP discover, just like it did before via broadcast. The DHCP server is going to pick up on that. But now my NetBoot server is saying, hey, I've got a record for this client in my NetBoot bindings database. It's going to respond with a BSDP offer immediately.
And of course, this has to occur via broadcast, because our client doesn't have an IP address yet. The client's going to go ahead and request that IP address, get an ACK from the server. And now, because I have everything I need to start the NetBoot process from my BSDP offer packet, I'm going to go ahead and immediately initiate the NetBoot process.
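To keep the two scenarios straight, here is a small Python sketch that lists the messages in the order described above (the "inform list", "ack select", and similar messages are written as INFORM[LIST], ACK[SELECT], and so on). It is a summary of the talk, not a protocol implementation; the function name and the tuple layout are illustrative, and the DHCP OFFER in the second scenario is implied rather than stated.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (sender, message)


def first_stage_exchange(has_binding: bool) -> List[Message]:
    """Return the stage-one messages for a client with or without a stored binding."""
    messages: List[Message] = [("client", "DHCP DISCOVER")]
    if has_binding:
        # The NetBoot server already has a record for this client,
        # so it answers the discover right away.
        messages += [
            ("dhcp server", "DHCP OFFER"),  # implied by the talk
            ("netboot server", "BSDP OFFER"),
            ("client", "DHCP REQUEST"),
            ("dhcp server", "DHCP ACK"),
        ]
    else:
        # Standard DHCP exchange first, then the client collects NetBoot servers.
        messages += [
            ("dhcp server", "DHCP OFFER"),
            ("client", "DHCP REQUEST"),
            ("dhcp server", "DHCP ACK"),
            ("client", "BSDP INFORM[LIST]"),
            ("netboot server", "BSDP ACK[LIST]"),
            ("client", "BSDP INFORM[SELECT]"),
            ("netboot server", "BSDP ACK[SELECT]"),
        ]
    return messages
```

Comparing the two lists makes the difference obvious: with a binding, the client never has to go into collection mode.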
So those are the two phases of NetBoot. And that's the 30,000 foot overview. That was the easy part. So now you're stuck. You're at the flashing globe, and you can't figure out why you're not getting past it. You're stuck somewhere in the middle of this process. What can we do to get more information about what is passing between the client and the server? How can we get more details on this? Hands down, one of the easiest things that you can do. Oh, that's beautiful. I've never seen it that big.
One of the easiest things that you can do is a packet trace. This is a very simple thing. You can tell how simple it is based on that tcpdump command. And you can do that at the server and you can see all of the packets fly between your client and the DHCP server and your client and the NetBoot server.
And if we pick this apart, we can see the IP header, the UDP header, our RFC 951 standard DHCP packet information, a little magic cookie that says that the rest of this packet is our DHCP vendor options, and our BSDP vendor options are stuck in there in a little opaque value. Now, I got really good at reading hexadecimal. In fact, I printed out a little cheat sheet and taped it to my desk. My wife called me a geek, and I think we're all there though.
For those that aren't really good with the hexadecimal, there's a little bit of ASCII on the right there that we can read some things like the boot file, the server name. And we can maybe try and discern what kind of information is passing between the client and the server. Decoding those vendor options, especially the BSDP vendor options, could be a real challenge if this was the approach that we took. So there's actually an easier way.
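For anyone who would rather not read the hex by hand, the layout just described can be picked apart with a short script. This is a sketch in Python under a few stated assumptions: it takes only the UDP payload (IP and UDP headers already stripped), it relies on the standard 236-byte BOOTP header followed by the DHCP magic cookie and type-length-value options, and the sample packet it builds is synthetic, with placeholder bytes standing in for the opaque BSDP value in option 43.

```python
from typing import Dict

BOOTP_FIXED_LEN = 236                    # fixed-size BOOTP header (RFC 951)
MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the DHCP options

OPT_PAD = 0
OPT_VENDOR_SPECIFIC = 43   # opaque value that carries the BSDP options
OPT_MESSAGE_TYPE = 53
OPT_VENDOR_CLASS = 60
OPT_END = 255


def parse_dhcp_options(payload: bytes) -> Dict[int, bytes]:
    """Return {option code: raw value} for the options that follow the magic cookie."""
    cookie = payload[BOOTP_FIXED_LEN:BOOTP_FIXED_LEN + 4]
    if cookie != MAGIC_COOKIE:
        raise ValueError("magic cookie not found; not a DHCP packet")
    options: Dict[int, bytes] = {}
    i = BOOTP_FIXED_LEN + 4
    while i < len(payload):
        code = payload[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(payload):
            break  # truncated option, stop quietly
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options


def build_sample_payload() -> bytes:
    """Build a synthetic discover-like payload for demonstration only."""
    vendor_class = b"AAPLBSDPC/i386/MacBookPro1,1"
    opts = bytes([OPT_MESSAGE_TYPE, 1, 1])            # 1 = DHCPDISCOVER
    opts += bytes([OPT_VENDOR_CLASS, len(vendor_class)]) + vendor_class
    opts += bytes([OPT_VENDOR_SPECIFIC, 3, 1, 1, 1])  # placeholder BSDP bytes
    opts += bytes([OPT_END])
    return bytes(BOOTP_FIXED_LEN) + MAGIC_COOKIE + opts
```

Running `parse_dhcp_options(build_sample_payload())` returns the message type, the vendor class string, and the raw option 43 bytes, which would still need BSDP-specific decoding.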
If you recall, bootpd is the server process that responds to DHCP and BSDP requests from our client. If we unload that bootps launch daemon and then load bootpd manually in verbose mode, we can actually get a lot more information about the communication between the client and the server. You can use these commands to put it into verbose mode. Don't forget to take it out of verbose mode, otherwise you will have NetBoot problems.
And then, so, I don't want to give away too much yet. For the next several slides, what I've done is I've taken my bootpd server, I put it into verbose mode, and I NetBooted one client against my NetBoot server. We're going to see all of the communication, one packet at a time, between the client and the server. I went ahead and color-coded things and broke it down a little bit to make what is very dry a little bit more interesting and a little more digestible.
So in our first packet, our client issues that DHCP discover out to the broadcast address. bootpd, sitting on port 67, says, hey, there's a client request, and it parses all of that hexadecimal, all of that information, and spits out our DHCP information, the standard RFC 951 stuff.
I cut some of it off here because it's not particularly interesting, and I needed some space in my slide. Our DHCP vendor options, we've got some interesting things here. The message type is discover. We've got a parameter request list, client identifier, vendor class identifier, and then our vendor-specific options, our BSDP stuff. And then bootpd further breaks that down by parsing out the BSDP options. So we see lots of information here. We see exactly what conversation our client and our server are starting to have. In the red there is what you normally see in the system log.
Raise your hand if you've ever tried to use the bootpd messages in the system log to actually troubleshoot NetBoot. Okay, that's great because that's kind of a waste of time. It's interesting, but it's not actually very useful, and it shouldn't be. We shouldn't typically dump debug information into the system log.
So bootpd sees this packet and it actually recognizes it as a BSDP discover in addition to a DHCP discover because our client passed on two critical pieces of information. Our client passed on that vendor class identifier saying that I am an Apple Boot Service Discovery Protocol client. My architecture is i386 and my hardware identifier that I pulled out of firmware is MacBookPro1,1. I've also asked for two specific vendor options from the NetBoot server. I've asked the server to identify itself using that vendor class string and I've also asked for some specific BSDP parameters.
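The vendor class identifier is just a slash-separated string, so pulling the architecture and hardware identifier out of it is straightforward. A minimal Python sketch, assuming the `AAPLBSDPC/<architecture>/<system identifier>` form that BSDP clients send; the `BSDPClient` type and `parse_vendor_class` helper are made-up names.

```python
from typing import NamedTuple, Optional

BSDP_PREFIX = "AAPLBSDPC"  # identifies an Apple BSDP client


class BSDPClient(NamedTuple):
    architecture: str
    system_id: str


def parse_vendor_class(value: str) -> Optional[BSDPClient]:
    """Return the client's architecture and system ID, or None if this is not a BSDP client."""
    parts = value.split("/")
    if len(parts) != 3 or parts[0] != BSDP_PREFIX:
        return None
    return BSDPClient(architecture=parts[1], system_id=parts[2])
```

A plain DHCP client with some other vendor class string comes back as `None`, which is the case where the server would treat the packet as ordinary DHCP.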
In this case, my NetBoot server is not providing the DHCP service. So while that packet was interesting, and I'm going to show it to you and parse it, bootpd is not going to respond to it. It's just going to ignore it because we're not providing that DHCP service.
In a case like this, I like to also do a tcpdump at the same time. I don't run the DHCP server. Joel down the hall runs the DHCP server. And he's kind of shady. So I want to make sure that my DHCP server is giving my client an IP address in that offer and a pingable default gateway.
Our client is going to respond with a DHCP request. This is going to go out via broadcast. So again, my bootpd server is going to see that client request. Again, this is interesting insofar as it's coming in on port 67. But again, I'm going to ignore this because I'm not providing the DHCP service. And further, I'm not the server identifier that's mentioned in this packet.
So I'm not supposed to respond to this. So I'll just log this and move on. Again, back to our tcpdump output. I get a DHCP ACK from my DHCP server. So I should have some confidence at this point that the DHCP portion of the first stage of NetBoot is working properly. My client should have a configured network interface.
Not having received anything from a NetBoot server at this point, my client is now going to go into collection mode. It's going to issue a BSDP INFORM[LIST] out to the broadcast address, and it's going to ask for a list of the NetBoot servers out there. It's a very simple request.
Before the NetBoot server simply responds, though, the NetBoot server is going to qualify this client. If you've ever looked in Server Admin, you see you can set up a NetBoot filter. You can filter out specific clients to prevent them from booting from your server. The first thing that bootpd does is it determines if this client's MAC address, its client identifier, is in one of these NetBoot filters. If it is, it'll just ignore the packet.
If not, it's going to go ahead and loop through each enabled NetBoot set and further qualify this client. First, does this NetBoot set support this client's architecture? Is this client's system ID supported by this NetBoot set? Both of those pieces of information, bootpd is going to pull out of that NBImageInfo.plist file from that NetBoot set.
It's also going to check to see that there is a booter file for this client. This one's a little obscure. It seems like an odd requirement. But bootpd will actually look to see if that booter file exists for this client's architecture. And if it doesn't, even if the NetBoot set says it supports that client architecture, it's not going to respond. So if for whatever reason there isn't a booter file, say Joel deleted it from my server, then our server's not going to respond.
And finally, bootpd will determine if this NetBoot set is the default NetBoot set for this client's architecture. This isn't a requirement. It is optional. But if you have several different NetBoot sets that meet the criteria for your client, by specifying a default NetBoot set, you're going to reduce some ambiguity. You can be sure that your client is getting the right NetBoot set.
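The qualification steps above read naturally as a filter followed by a loop. Here is a sketch of that logic in Python. It is not bootpd's actual code: the `NetBootSet` fields are hypothetical stand-ins for what the server reads from the set's configuration file and from disk, and falling back to the first qualifying set when none is marked default is an assumption made only so the example returns something.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class NetBootSet:
    name: str
    enabled: bool
    architectures: Set[str]                                     # architectures the set claims to support
    disabled_system_ids: Set[str] = field(default_factory=set)  # hardware that may not use the set
    booter_archs: Set[str] = field(default_factory=set)         # architectures with a booter file on disk
    default_for: Set[str] = field(default_factory=set)          # architectures this set is the default for


def qualifies(nb: NetBootSet, arch: str, system_id: str) -> bool:
    """Apply the per-set checks: enabled, architecture, system ID, booter present."""
    return (nb.enabled
            and arch in nb.architectures
            and system_id not in nb.disabled_system_ids
            and arch in nb.booter_archs)


def choose_set(sets: Iterable[NetBootSet], mac: str, arch: str,
               system_id: str, filtered_macs: Set[str]) -> Optional[NetBootSet]:
    """Return the set to offer this client, or None if the server should stay silent."""
    if mac in filtered_macs:
        return None  # client is filtered out: ignore the packet
    candidates = [s for s in sets if qualifies(s, arch, system_id)]
    for s in candidates:
        if arch in s.default_for:
            return s  # a default set removes the ambiguity
    # Assumption for this sketch: with no default, take the first qualifying set.
    return candidates[0] if candidates else None
```

Note how a set that lists an architecture but has no booter for it never qualifies, which matches the "obscure" requirement described above.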
So our server will issue a BSDP ACK[LIST], which includes two important pieces of information, the default boot image ID that it's providing to the client, and then server priority, which I'll talk about next. Our client is also going to qualify the servers that respond. It won't just pick the first one that responds. It actually issues that BSDP INFORM[LIST] several times to make sure that it gets a very thorough list of NetBoot servers that are out there and the default images that they're hosting.
Once it gets that thorough list, it's going to pick the NetBoot server and the default NetBoot set on that server that has the least load. And it does that by using that server priority tag that it gets from that last packet from the NetBoot server. What that is, is basically just the number of clients that are in that NetBoot bindings database for that NetBoot server.
So if server A has three clients bound to it, server B has five, we're going to go ahead and pick server A because presumably there's fewer clients actually NetBooting from server A. So this is an important point. If you're doing some stuff with load balancing and you're getting some odd results, say you've got no clients booting from one server and like 50 from another server, you may need to clear out that bsdpd_clients file because that contains some information about previous clients that had booted from that server.
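The selection rule as described here boils down to picking the minimum. A tiny Python sketch, following the talk's description of server priority as a count of bound clients; the `pick_server` name and the dictionary input are illustrative, and how ties are broken is not covered in the talk (this sketch simply keeps the first server it saw).

```python
from typing import Dict


def pick_server(bound_clients: Dict[str, int]) -> str:
    """Pick the server with the fewest clients in its bindings database."""
    return min(bound_clients, key=lambda name: bound_clients[name])
```

With `{"A": 3, "B": 5}` this returns `"A"`. It also shows why stale bindings skew load balancing: a server whose bindings file still lists 50 old clients keeps losing to a server with an empty one, whether or not those 50 clients are still booting from it.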
Finally, our client has made a choice. It's going to issue a BSDP INFORM[SELECT], indicating that it's selecting. It's going to indicate a specific server identifier and that default image ID that was provided from that server. Finally, our server issues a BSDP ACK[SELECT]. This, of course, is the juiciest packet. It's got all sorts of information. We've got a TFTP path to our boot file, that booter.
We've got a root path, which is the NFS or HTTP path to our NetBoot disk image. And then we've got, of course, that selected boot image ID again. And we also have a name that we can assign to ourselves when we get into the boot process. At this point, the client has all of the information that it needs to move on to the next phase of the NetBoot process. We're done with the BSDP exchange.
This is where we see the Apple logo and the spinning globe. This is where the client will download those NetBoot support files. The first thing that happens here is firmware will take that boot file path from our BSDP ACK[SELECT] packet and it will download the booter and execute that. As soon as it executes the booter, firmware is done with the machine. Firmware is going to unload so it can hand off all of the resources back to the rest of the OS. And booter will assume control of the rest of the boot process.
Booter is then going to determine the TFTP path to the other NetBoot support files, the kernel and the kernel extension cache. It's then going to download those. Finally, the booter will execute and then transition control to the kernel. And that's really all there is to this particular stage of the NetBoot process. During this stage, that spinning globe is kind of a progress indicator, indicating that you're downloading files via TFTP.
The next stage, of course, we see that progress indicator. This is where the client is loading our NetBoot support files. Again, the kernel has control of the system here. The kernel is going to replace that spinning globe with the progress indicator that we usually see when we boot up Mac OS X.
The kernel will then attempt to load the kernel extension cache because we need some very basic driver support for accessing the network and mounting a disk image over that network. Once we've loaded that extension cache, we're going to attempt to mount the NetBoot disk image either via HTTP or NFS.
These are two very important things because you can tell just based on these steps two things that could go wrong in this process. If we've got a bad kext cache or we've got a kext cache for an inappropriate architecture or inappropriate hardware, you might get a kernel panic here when the kernel tries to load that cache.
This happens all the time when you get that new Mac Pro, you try to fire it up from your old NetBoot image and it instantly kernel panics right after you get that spinning progress indicator. That's because you need to update your kernel extension cache, or you can just create a new NetBoot set.
People also see problems here when they attempt to mount that NetBoot disk image over NFS and they've either got ports blocked or NFS isn't even running on their server for some odd reason. So if we succeed in these two steps, we've got a set of drivers and we've got a mounted root file system, we're ready to roll. The kernel is going to go ahead and execute launchd and launchd is going to initiate the rest of the boot process by firing off the /etc/rc script.
In the case of a standard NetBoot, the rc script is also going to execute the rc.netboot script. This script is responsible for attaching a shadow file to our NetBoot disk image. If you've ever looked at the NFS configuration for your NetBoot share point, you may have noticed that it's exported to world, but it's exported as read-only.
This, of course, is a pillar of NetBoot. I want a bulletproof disk that I can NetBoot my clients, they can make whatever changes they want, and I can just reboot that to wipe out all those changes. That's because those changes aren't occurring at the server, those changes are occurring in that shadow file.
Typically, I'm going to look to the first local hard drive to create that NetBoot shadow file because of performance reasons. It's much faster to make a lot of small writes to that local disk. In the case of diskless NetBoot, though, in some areas, perhaps, say, the United States government that has very sensitive information and they don't want anything ever written to an internal drive, they actually remove the drives altogether from those systems.
In that case, I don't have that internal drive available to me. I'm going to use a network share point to create that shadow file. And the way that I do that is, one thing I didn't tell you before, is that BSDP ACK[SELECT] packet that I got from the server is actually stored in RAM. And there's tons of information in it, a little bit more than I was forthcoming in that last slide. We also have a shadow mount path, which is an AFP path, including a username and a dynamically generated temporary password, a server address, and the AFP share point that I can use.
I'm going to go ahead and mount that share point, and then I'm going to use the shadow file path parameter out of my NetBoot packet to determine where on that share point I can create my shadow file. So, assuming all of this occurs correctly, I should have a mounted file system that appears to me as read-write. And I can go ahead and conclude the rest of the boot process, which is characterized by the loading of the WindowServer and our sweet little Dock.
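The choice between a local and a network shadow file can be summarized as a simple fallback. The Python sketch below only illustrates the decision described in the talk; it is not the rc.netboot script, and the `AckSelectInfo` fields, the return strings, and any AFP URL passed in are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AckSelectInfo:
    """The two shadow-related values the talk says arrive in the select acknowledgement."""
    shadow_mount_path: Optional[str]  # AFP path with user, temporary password, server, share point
    shadow_file_path: Optional[str]   # where on that share point the shadow file goes


def shadow_location(local_volume: Optional[str], ack: AckSelectInfo) -> str:
    """Prefer the first local hard drive; fall back to the AFP share point when diskless."""
    if local_volume is not None:
        # Local disk is preferred for performance: lots of small writes.
        return f"local:{local_volume}"
    if ack.shadow_mount_path and ack.shadow_file_path:
        return f"afp:{ack.shadow_mount_path} -> {ack.shadow_file_path}"
    raise RuntimeError("no writable location available for the shadow file")
```

A diskless client passes `None` for the local volume and ends up on the AFP share point; a client with an internal drive never touches the network share for its writes.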
So I hope from this that you've learned that there are some challenges to the NetBoot process, but by breaking it down into specific stages, you can really pick specific troubleshooting methods and not just try all of them or go about like a chicken with its head cut off.
Most notably, like DHCP, BSDP also makes extensive use of broadcast. Joel will talk a little bit more about how this can be overcome. But if you don't configure your routers to pass on these broadcast packets to your NetBoot server, well, the very first stage of NetBoot is going to fail if you attempt to NetBoot across subnets.
Again, several communication protocols are used, often from several different servers, managed by several different people. So not only are there technical issues with managing several different servers, but there's often political issues that you have to deal with. Of course, you all are going to have to work those out.
And finally, DHCP is a requirement. On large networks, this should be a no-brainer. Raise your hand if you manually configure IP addresses for 15,000 machines. But there are certain cases at Penn State University that are forced to use manual IP address allocation. This is actually potentially solvable, I learned just in the last couple of days.
As of right now, DHCP is a requirement, but that is something that hopefully we can move on from. So now that you all are experts on NetBoot, I'm going to go ahead and hand the reins over to Joel Rennich to talk about how he's used NetBoot in Japan.
Thank you, Mike. Joel Rennich with Apple Enterprise. And I won't do any packets. We're done with packets. So first I want to talk about some NetBoot fallacies. Over the years some myths and other stuff about NetBoot have come up, and so sometimes people think it's a little different than what it really is. You don't need DHCP and NetBoot on the same box. We haven't done that since colored plastic.
So once we moved into the whites and the blacks and the silver cases, you don't need to have your NetBoot and your DHCP server on the same box. So don't worry about that. You don't need an Apple DHCP server. As Mike talked about, any DHCP will do. We don't care where it comes from.
We just add some extra stuff afterwards once you've already got the IP address that you need. So no, you don't need an Apple DHCP server. Another fallacy is that the larger the NetBoot image, the longer it's going to take to boot, the longer it's going to take to use stuff. No, that's really not the case.
That one kind of surprised me, but we found with very, very large NetBoot images that since the NetBoot client only grabs what it needs, who cares if you've got 30 gigs sitting there on the server on that disk image? We'll only need the pieces that we use. It's not like we're transferring the whole file down to the NetBoot client. Size really doesn't matter. Just keep that in mind.
Another issue: a lot of people thought you could only have 16 nfsd processes running. Obviously NetBoot heavily leverages NFS if you're using NetBoot over NFS. And in the Server Admin GUI you could only set 16 of these in there. That was kind of a bug. It was a little UI limitation. So you could actually go into NetInfo and edit one of the config files to raise that. Or now starting with 10.4.7 I think we fixed that. So you should be able to put a lot more into there.
The idea that the NetBoot server needs to be on the same subnet as the clients is not true whatsoever. So you can use some stuff called a DHCP helper, or you can actually statically assign it if you wanted to. DHCP helper is much, much easier. And I even have a little graphic that kind of shows you how this works. And you've got your iMacs up there. And you've got a switch. Typically this is done in the switch, sometimes in the router. If you have a layer 3 switch, a DHCP helper is going to allow you to broadcast DHCP requests to different subnets.
And it's typically used so that you can put your DHCP servers on the other side of the building or something like that. But you can also use that exact same technique or tool to broadcast BOOTP requests for the BSDP stuff. All right. So in this case, the clients go off and they fire off DHCP requests. And that gets echoed to both the DHCP servers and to the NetBoot servers. And then each one responds with what it needs. The DHCP gives DHCP information. The NetBoot gives NetBoot information there.
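Conceptually, the helper just copies each client broadcast to every configured server address. The Python sketch below models only that fan-out; it is not switch or router configuration, and the function name and the example addresses (taken from a documentation-only range) are hypothetical.

```python
from typing import List, Tuple


def relay_broadcast(payload: str, helper_addresses: List[str]) -> List[Tuple[str, str]]:
    """Return one (destination, payload) copy of a client broadcast per helper address."""
    return [(addr, payload) for addr in helper_addresses]


# Hypothetical addresses: one DHCP server and one NetBoot server on other subnets.
HELPERS = ["192.0.2.10", "192.0.2.20"]
```

Calling `relay_broadcast("DHCP DISCOVER", HELPERS)` yields two copies, one per server, which is why both the DHCP server and the NetBoot server get a chance to answer the same client request.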
If you were here at WWDC last year, you probably heard us talk a little bit about a bank in Tokyo that we've been doing some work with. There are some people in the audience who are even very involved in this. So hopefully I don't get anything wrong, because I only go there on weekends now. So I wanted to give you some ideas.
This is a fairly large NetBoot environment. So I'll give you some numbers about how we're doing stuff, where all the pieces fit together. I've been accused by a fellow presenter that this is in the PowerPoint 2.0 style, that it's very fast. So we'll see if you stay with me.
So 1,500 is the number of NetBoot clients that they're using right now. They're going to scale that up to 2,500 over the next couple of months, and that's about full, 2,500 NetBooting iMacs. And they use one NetBoot image. So all 2,500 are using one NetBoot image. Cool stuff. That image is 15 gigs.
15 gigs of space is the NetBoot image for all those 2,500 iMacs. 150 to 200: the number of NetBoot clients per server. All right, so tweaking the NFSD processes, doing some other stuff, and getting more servers in there is possible when you need to scale up more. But it's been very solid and a very good technology to work with.
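As a rough worked example of those numbers, here is a minimal sketch that estimates how many NetBoot servers the 150 to 200 clients-per-server figure implies for 2,500 clients. The function name is made up for illustration, and the result is only arithmetic on the figures quoted in the talk, not the bank's actual server count.

```python
import math


def servers_needed(clients: int, clients_per_server: int) -> int:
    """Smallest server count that keeps every server at or under the per-server load."""
    return math.ceil(clients / clients_per_server)


# Figures quoted in the talk: 2,500 clients, 150 to 200 clients per server.
for per_server in (150, 200):
    print(f"{per_server} clients/server -> {servers_needed(2500, per_server)} servers")
```

With these inputs the estimate lands somewhere between 13 and 17 servers, before allowing any headroom for failures or mass restarts.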
120 seconds is a typical boot process. So not nearly as fast as you would be off a local drive, but that makes sense. Still not too annoying. There are some things that you can do that can severely degrade that. So hopefully you don't do those things like mess with your network, remove your DHCP servers and things like that. Helps you out a lot.
10 minutes: the time to NetBoot 250 systems at the same time. That was kind of fun when we first started off. We grabbed a whole floor of iMacs, used ARD, and booted them all at the same time. And this big startup chime kind of echoed throughout the floor. So when you hit them all at the same time and you really kind of hammer the two NetBoot servers that we had supporting these 250 clients, it took about 10 minutes from start to finish. A lot of machines started up early. Some machines took the full 10 minutes to get there. So just keep in mind that if you ever had to restart them all at once, it's going to take a lot longer than individually.
40 kilometers, which is about 24.8 miles, is the distance between one of the data centers and where some of the clients are. We are actually NetBooting across WAN links. WAN links in this case is like gigabit fiber. That's a little misleading. However, they boot off those connections without even knowing that they're booting from across town.
There are, however, some remote sites that have a 50 megabit link to some of the branch offices. And we were very hesitant about this, but they are NetBooting off those links. Not a whole lot of machines. We just have a couple in each of the branch offices that are doing this. But they do go over 50 megabit links, and they're usable. They're not speedy. They take longer to boot up and some of the other stuff, but they can do it.
This kind of shocked our NetBoot engineering group that they were even doing that, let alone that it was working. 12 hours. Important, important thing to keep in mind. Because that's the time of the flight from Chicago to Tokyo. And since I've been almost commuting with some of the other people that have been out there, you understand that pretty well.
So here's a little picture of one of the floors full of iMacs. They look all pretty. This is a little older picture. These are the G5s. They're transitioning to all Intel to keep it all on one processor architecture. A typical kind of desk there. Some pieces there, a little iSight in all the machines.
And a really kind of messy but hard to get any less complex than this kind of topographical map of their network. We were going to point some things out, but I think it has more effect if you just kind of stare at it and look at all the lines.
The graphics guy was like, yeah, we could highlight some of this stuff, but that'd be tough. So the idea here, though, is that all the iMacs are NetBooting. We've got an Active Directory environment set up in a Kerberos cross-realm. Open Directory is the top dog in this. We have some Oracle, Documentum, and Red Hat clients all using LDAP through Open Directory. Home shares on OS X servers. We're probably looking at some other stuff for that.
NetBoot servers, LDAP management, Kerberos authentication, kind of the whole ball of wax, which would have been hard enough to begin with. But now that you throw in NetBooting, it really becomes interesting. But some very cool stuff so far. Things have been going pretty well there. So some ideas of a practical NetBoot application. So next I'm going to hand it off to Gavin Cook from Genentech to talk about some of the stuff that he's doing.
Thanks, Joel. So I'm Gavin Cook, Macintosh Systems Architect at Genentech. And building on what Mike kind of showed, we've basically extended NetBoot so we could use NetBoot as a method for distributing 10.4 out to all of our users. We use NetBoot to roll out 10.4 because it allows us to basically build a virtual upgrade CD, but instead of having to go replicate thousands of these CDs, or actually would be DVDs at this stage, or in the past we'd use FireWire drives with a team of techs that would go out, meet with the customer, oh, the customer didn't show up, oh, no, this is a bad time, and got this new surprise meeting. It took us forever to get 10.3 rolled out due to these little unseen incidents.
We'd spent so much time scheduling people and getting things set up for the right time, people would miss. So this was great. This allowed us to just basically give it to the users. They could run it when they needed to. I was able to get it so it's more convenient for them, cost us less money, we had less staff involved. It was a really good upgrade. It also gives us a lot of flexibility. It's just a little bit more control than the default network install technique does.
We're basically using a true NetBoot. We're loading a new OS. We've completely tuned and customized that OS that we boot, and then that OS is essentially doing an upgrade of the user's system. So we're kind of running with no backups, no net, as it were. It worked out really well.
What we did to extend this is we've built a custom AppleScript Studio application that we pass out to the users from our web server. They download it. It's basically a single button for the most part. And it connects up to the NetBoot system. I'll kind of go through that.
We're also using PHP scripts in the back end to manage the client connections and to get a picture of how many clients are connecting, where they're connecting to. All the PHP scripts manage the whole flow of the system. A MySQL database stores the session information. We're tracking what clients are upgrading, what they're upgrading from, how long it took them to upgrade. And that data just ran all over the place.
The fastest complete, beginning-to-end system we had, I'm sure it was a desktop, was I think 12 minutes. Blew me away. And the worst I think I saw ran about 45 minutes. So I'm assuming that was probably an older iMac. We're using a custom rc.local at the shell level: once that system boots up and we're booted off the NetBoot image, we're able to use that custom rc.local file to start communicating back up to the PHP scripts and MySQL.
Actually, sorry. We also did a custom startup item that would install additional packages that we had. So we have some management stuff we wanted to use. We also wanted to get everybody up to 10.4.4, which is what we were using at the time. We're upgrading that now to go to 10.4.7. So we have this final round that kind of does some light housekeeping.
[Transcript missing]
We're also checking the system specs to be sure they have an Ethernet network connection, to be sure they have power, to be sure they have enough RAM, enough free hard drive space. Putting this client in allows us to do so much more than what we can just do by holding down the N key. We're able to do a lot of preliminary stuff before we just throw them onto this image. So it allows us to be sure that that system is really ready to handle what we're about to do with it.
Again, as I mentioned, this integrates the NetBoot sessions with the PHP MySQL server and setting the firmware, just the whole network thing. Can't stress that enough. So basically once you download this little application, you launch it, this is what it will look like. Ask for your admin password.
All of our local users are admins so they can install their own apps. Essentially you need admin privileges on that system in order to set the firmware. So if you don't have that on your systems, that could be a stumbling block that's a little harder to get over.
Basically, it checks to be sure you have power. In the case of a laptop, you might not have a power source connected, so it'll ask you to connect it to a power adapter, and then you quit and try again. Let's say you don't have a network connection, so we've got to have Ethernet. Please connect Ethernet. Try again. Oh, you're trying to be sneaky. You're on wireless. It's not going to work. Got to have Ethernet. Try again.
Basically, if you get through all this stuff, we've also checked the system's RAM, we've checked the available disk space, you're ready to upgrade. We tell the users this is going to take from 30 to 50 minutes, just because we really want to set the expectation a lot higher, and then we can be so much faster than that, they'll just be so impressed.
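The checks described above can be sketched as a simple preflight function. This is only an illustration in Python, not the AppleScript Studio code from the talk: the `SystemSpecs` fields, the threshold values, and the message wording are all assumptions made up for the example, and gathering the real values from the machine is left out.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical minimums; the talk does not state the actual thresholds.
MIN_RAM_MB = 256
MIN_FREE_DISK_GB = 5.0


@dataclass
class SystemSpecs:
    on_ac_power: bool          # laptop plugged into a power adapter
    ethernet_connected: bool   # wired link is up (wireless does not count)
    ram_mb: int
    free_disk_gb: float


def preflight(specs: SystemSpecs) -> List[str]:
    """Return the list of problems to show the user; an empty list means ready to upgrade."""
    problems = []
    if not specs.on_ac_power:
        problems.append("Please connect a power adapter and try again.")
    if not specs.ethernet_connected:
        problems.append("Please connect Ethernet and try again.")
    if specs.ram_mb < MIN_RAM_MB:
        problems.append("Not enough RAM for the upgrade.")
    if specs.free_disk_gb < MIN_FREE_DISK_GB:
        problems.append("Not enough free disk space for the upgrade.")
    return problems


print(preflight(SystemSpecs(True, False, 512, 20.0)))
```

Returning every failed check at once, rather than stopping at the first one, is a design choice for the sketch; the application in the talk appears to prompt for one fix at a time.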
We really, in testing this, we found that we had a huge issue, not with our servers and how big we made our servers, or how many servers we had, or how many clients, it was really about the client on the end. So, in a very, you know, simple environment where all systems are the same, similar to what Joel was working with, that's going to be a lot tighter number.
You can really have a much better idea of what that picture looks like. In an environment where you're completely heterogeneous and you have anything from, you know, the old colored plastic all the way on up, it can really be a challenge. And so we wanted to set the user expectation with that.
Basically, once the client reboots, it's going to go into the NetBoot image, it will be installing Tiger. Once you're done with that, it's going to restart again, and then we throw up a custom display screen that just basically has a progress indicator. They never see the OS, and it says installing additional packages. So instead of seeing the Macintosh is booting and you have the little progress indicator, we kind of hijacked that whole thing with a custom RC file.
And that allows us to install these additional packages kind of very cleanly without the user having to interact with it. So once you press restart, you can basically walk away from your system, and you come back in an hour after a meeting or something, you're going to have a new OS.
Really our big concern was our network. We wanted to protect the network and protect the bandwidth of the network. We're basically running one gigabit links from all of our edge routers and switches back up to the core of the network. So we're very concerned about having, you know, every subnet we can basically have up to like 240 systems. 255, but yeah, we've got some devices. So we really wanted to ensure that we're not going to flood that connection and affect the network performance for all of the other users. So really the big focus for our system was shaping the network traffic.
Each client will be copying the full 2.5 gig image during the upgrade because you're installing the OS. Everything on there is going to get touched. We placed our servers really in areas of high Mac concentration. We ran a number of network scans where we're basically analyzing how many Macs we had on different subnets, set a percentage to each of those areas, and that's how we picked where to put our servers. The server name is basically based off the building that they serve and we kind of use Zen as our prefix for the whole system because we want it to be very smooth. And so everything would be like Zen 1, Zen 5.
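To see why the traffic shaping matters, here is a back-of-the-envelope sketch of how long a one gigabit link stays saturated when several clients each pull the full 2.5 gig image. It assumes decimal gigabytes, ignores protocol overhead, and assumes every client shares the same link, so treat the result as a lower bound rather than a measurement. The function name and the client counts are illustrative.

```python
def min_transfer_seconds(clients: int, image_gb: float, link_gbps: float) -> float:
    """Lower bound on the time to move one full image per client over a shared link."""
    total_bits = clients * image_gb * 8e9   # 1 GB taken as 10^9 bytes
    return total_bits / (link_gbps * 1e9)   # link speed in bits per second


# Illustrative client counts on a single 1 gigabit uplink.
for clients in (20, 240):
    seconds = min_transfer_seconds(clients, image_gb=2.5, link_gbps=1.0)
    print(f"{clients} clients: at least {seconds / 60:.1f} minutes of a saturated link")
```

Under these assumptions a full subnet of about 240 systems upgrading at once would fill the uplink for over an hour, which is the kind of flood the queueing is meant to prevent.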
Then we also placed two servers in our data center to serve the other buildings. Again, similar to Joel's environment, we're booting from our data center over a dark fiber WAN link, so no problems with that. Works really well. This is a picture of the environment. You can see that at the top we have that Zen server, which is basically our database server running the PHP scripts and the MySQL.
It manages the entire process. So when a user installs this application and runs it, the first thing that client does is go and communicate with the Zen server. And it says, hi, I'm so-and-so's machine. This is the building I live in. We determine the building based off the subnet. We basically built a table of all the subnets in our network and then mapped out which subnets are in which buildings. Again, we have over 400 subnets. Our network topology is pretty complex.
It's a standard Cisco core edge style layout. So one of the big things is just figuring out how to flow these clients, where the client's supposed to go. Once you've determined that-- so for this example, we have the other building. If you're in a building that doesn't have a dedicated server, you'll talk to the Zen server. It's going to say, oh, you're out in some odd place. So we're going to send you to the data center.
So you'll boot off the Zen 91 or Zen 92 servers that are basically in the building 90. If, on the other hand, you're in a building that has a dedicated server, such as the building three client, then it's going to say, oh, you're in building three. We're going to send you straight to Zen three. So that NetBoot server is basically installed in the network closet. So your connection is really short, really high speed. It's worked out incredibly well, almost too well.
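The routing decision just described can be sketched in a few lines. The production logic lives in PHP scripts backed by a MySQL subnet table; this Python version is only an illustration, and the subnets, building numbers, and host names below are invented placeholders, not Genentech's real values.

```python
import ipaddress
from typing import List, Optional

# Hypothetical subnet-to-building table (the real one covers over 400 subnets).
SUBNET_TO_BUILDING = {
    ipaddress.ip_network("10.3.0.0/24"): "3",
    ipaddress.ip_network("10.5.0.0/24"): "5",
    ipaddress.ip_network("10.40.0.0/24"): "40",
}

# Buildings with a dedicated NetBoot server in the network closet (placeholder names).
DEDICATED_SERVERS = {"3": ["zen3"], "5": ["zen5"]}

# Everyone else boots from the two data center servers.
DATA_CENTER_SERVERS = ["zen91", "zen92"]


def building_for(ip: str) -> Optional[str]:
    """Look up the building whose subnet contains this client address."""
    addr = ipaddress.ip_address(ip)
    for network, building in SUBNET_TO_BUILDING.items():
        if addr in network:
            return building
    return None


def servers_for(ip: str) -> List[str]:
    """Dedicated server if the building has one, otherwise the data center pair."""
    return DEDICATED_SERVERS.get(building_for(ip), DATA_CENTER_SERVERS)


print(servers_for("10.3.0.17"))   # client in a building with its own server
print(servers_for("10.40.0.9"))   # client in a building without one
```

A linear scan is fine for a few hundred subnets; addresses that match no known subnet fall through to the data center servers in this sketch.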
So here's some of the code that I wrote in our AppleScript Studio application. This is basically--this is AppleScript code, but it's forming a shell command. Basically, that shell command is a curl, so it's just command line URL retrieval. What this does is this is forming basically an HTTP request that it posts up to the database server.
You can see here how we're able to capture all these different variables, and we're using just all the utilities available on the Mac to dig out this information. We're looking at a lot of different places to find the hard drive name, machine name, amount of memory, amount of disk space, the user account. We take all these details and post that up to the database. And again, one of the really important things here is the IP address, which we use to determine where the client's at. And that's actually in a different post.
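The slide's code is not reproduced in this transcript, so here is a minimal sketch of the same idea: collect the client details and form a curl command that posts them to the database server. It is written in Python rather than AppleScript, and the URL and field names are hypothetical. The only curl options used are `--silent` and `--data-urlencode`, which sends the fields as a URL-encoded POST body.

```python
import shlex
from typing import Dict, List


def build_curl_command(url: str, fields: Dict[str, str]) -> List[str]:
    """Build the argument list for a curl POST with one URL-encoded field per detail."""
    argv = ["curl", "--silent"]
    for name, value in fields.items():
        argv += ["--data-urlencode", f"{name}={value}"]
    argv.append(url)
    return argv


# Hypothetical endpoint and field names, for illustration only.
client_details = {
    "machine_name": "Lab iMac 12",
    "drive_name": "Macintosh HD",
    "ram_mb": "512",
    "free_disk_gb": "20",
    "user": "jdoe",
}
command = build_curl_command("http://zen.example.com/session.php", client_details)

# Print a copy-pasteable shell command instead of running it.
print(" ".join(shlex.quote(arg) for arg in command))
```

Building an argument list and quoting it only for display sidesteps the escaping problems that come from concatenating values such as drive names with spaces into a single shell string.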
So the MySQL database will store all this information, keeps track of session times, where clients came from, how long it took them. And then the PHP scripts actually control the flow of the queue and determine which server a client should go to. This is kind of a walkthrough of the session flow.
Basically, the user will launch the client, which collects the data and posts that up to the database, and then the server will reply with a session ID. And we're just using this to keep track of the client all the way through the process. Again, we're going through three different reboots, so we want to write that session ID down to the hard drive so we can keep track of it.
Basically, once we've done all those checks and the user clicks begin on the application, the client's going to request a position in the queue. That's going to go up to the database server and the PHP server and ask for a position in the queue. It's going to look up the IP address. It's going to compare our subnet table to say where is this IP address, where should we send it to.
It's going to then check the server queue and see if that server has open slots. Basically, what we did is we said we never want more than 20 clients booting off of a single server at a time. So, the day we announced this, everybody's so excited to get their hands on Tiger. Wow, spotlight.
We were really concerned about having a huge flood on that day. So, we really wanted to be able to back that off. So, what will happen is if the client can't get a position in the queue, it will basically pass that back down to the interface for the user and say you're queued up. You're user 25 out of 20 people. So, as they're waiting to get a position, they'll see that tick down. And once people come off that NetBoot server, then we free that up right away.
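The queueing behavior can be sketched as a small per-server admission queue. In the talk this is handled by PHP scripts and MySQL; the in-memory Python class below is only a model of the idea, with the limit of 20 active clients taken from the talk and every name invented for the example.

```python
from collections import deque

MAX_ACTIVE = 20  # never more than 20 clients booting off one server at a time


class ServerQueue:
    """Per-server admission queue: a fixed number of active slots plus a waiting line."""

    def __init__(self, max_active: int = MAX_ACTIVE):
        self.max_active = max_active
        self.active = set()
        self.waiting = deque()

    def request_slot(self, session_id: str) -> int:
        """Return 0 if the session holds an active slot, else its 1-based waiting position."""
        if session_id in self.active:
            return 0
        if session_id not in self.waiting:
            if len(self.active) < self.max_active and not self.waiting:
                self.active.add(session_id)
                return 0
            self.waiting.append(session_id)
        return self.waiting.index(session_id) + 1

    def release(self, session_id: str) -> None:
        """Called when a client finishes its NetBoot stage; promotes waiting clients."""
        self.active.discard(session_id)
        while self.waiting and len(self.active) < self.max_active:
            self.active.add(self.waiting.popleft())


queue = ServerQueue(max_active=2)
print([queue.request_slot(s) for s in ("a", "b", "c")])  # third client has to wait
queue.release("a")
print(queue.request_slot("c"))  # promoted as soon as a slot frees up
```

A waiting client would call `request_slot` again on each poll and watch its position tick down, which matches the countdown the user sees in the interface.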
So once they have an active slot, they basically go to the NetBoot start. At this point, the system will reboot. It's booting off of the IP address that was pressed into the firmware. Once you go through that, it'll start up our custom rc.local on the NetBoot image. That rc.local file will post up to the database as soon as the client's done with the NetBoot.
So at this point, we're one step away from restarting the system. And we basically say, hey, we're out of the queue. So that'll free up a spot in the queue for another system. Then that client reboots and goes to another rc.local, and that is basically the rc.local that throws up the custom installation screen. It says we're installing additional packages. Here's where we do 10.4.7, 10.4.4. We do NetBoot, LANDesk, a lot of different pieces. Some of the NetBoot cleanup. We remove that session ID from the drive, all that kind of stuff. So then it does a final reboot, and you're done.