From Power On to Login: Inside the Mac OS X Leopard Boot Process - WWDC 2007

Mac OS X Essentials • 59:12

Gain insight into the Mac OS X startup process from the time the power is turned on until the login prompt appears. Developers of file system plug-ins can learn how to boot from volume formats not supported by the Mac boot ROMs while avoiding the need to write Extensible Firmware Interface (EFI) or Open Firmware drivers.

Speakers: Curtis Galloway, Soren Spies

Unlisted on Apple Developer site


Transcript


My name is Curtis Galloway, and welcome to the Power On to Login session, number 112. If that's not the session you think this is, then turn around now. I'll be presenting today with Soren Spies; he'll do the second half. It looks like there are a lot of people here today, so I'm very impressed that everyone is interested in booting.

( Laughter )

I guess you do it a lot.

So why this session? The purpose of this session is to take some of the mystery out of the boot process. We want to help you debug more effectively when something goes wrong during that whole sequence, design your products so they interact well with the standard boot process, and reduce your support costs and the confusion for you, for us, and for your users.

What we'll cover today is basically everything that happens from when you power on your Mac up to the login window. We'll also cover some of the things that can go wrong, how to tell what's going on, and how to get better debugging information. And we'll cover some places where you can hook into the boot process using supported APIs, along with some guidance on whether you should do that at all.

I should note that much of what we present today is purely informational. Not everything we show is something you should hook into and try to use, so, as always, take care to use supported interfaces. And although the title of this session might lead you to think that it's going to cover loadable file systems, we're not really going to cover that today; if that's your interest, come and see us afterwards. And please read the prospectus carefully before you invest or send money.

So the basic boot process, in a nutshell, has four stages, in OS X or basically any modern operating system. First you have the firmware, or ROM, which is the very first thing that runs when you turn on your computer. Next is the boot loader, which is part of the operating system but runs at a very low level. Then there's the kernel, which manages the resources of the system on behalf of user processes. And finally you have user processes, like your applications, the window server, or loginwindow. We'll go through each of these stages in turn.

In a little more detail, the EFI boot ROM stage includes both initializing the hardware and searching for something to boot. The booter loads the kernel environment and then switches into the kernel. Then the kernel does device discovery through I/O Kit, loads device drivers, and searches for the root device.

Once you're at user level, you have launchd and loginwindow, and all of the other applications that come up. Each of these stages has signs you can use to tell, just from the outside, where you are in the process. For example, when the light on the front of the system comes on, you know that EFI has started and is initializing the system. The boot chime indicates that it's loading and executing the EFI device drivers.

The gray screen that appears when the display lights up is also EFI, but the Apple logo is actually drawn by the boot loader. So once you see the Apple logo, you know you're in the booter. The spinning gear is drawn by the kernel, which is the next phase.

And of course, once the blue screen and the login window come up, you know that you're in userland. We'll go through each of these stages in more detail, including exactly when each of these signs appears, so you'll be able to identify more clearly what's going on.
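The signs just described can be written down as a lookup from what you observe to the stage that produced it. This is a minimal sketch for quick reference. The sign labels, the `Stage` enum, and the `furthest_stage` helper are our own illustrative names and do not come from Apple software.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The four boot stages, in the order they run."""
    FIRMWARE = 1   # EFI boot ROM
    BOOTER = 2     # EFI boot loader shipped with the OS
    KERNEL = 3     # kernel and I/O Kit
    USERLAND = 4   # launchd, loginwindow, applications


# Hypothetical labels for the externally visible signs described in the talk.
SIGNS = {
    "front_light": Stage.FIRMWARE,
    "boot_chime": Stage.FIRMWARE,
    "gray_screen": Stage.FIRMWARE,
    "apple_logo": Stage.BOOTER,
    "spinning_gear": Stage.KERNEL,
    "blue_screen": Stage.USERLAND,
    "login_window": Stage.USERLAND,
}


def furthest_stage(observed):
    """Return the latest stage implied by the signs seen so far, or None."""
    reached = [SIGNS[sign] for sign in observed if sign in SIGNS]
    return max(reached) if reached else None


# A machine that chimed and drew the Apple logo but never showed the gear
# stalled somewhere in the booter.
print(furthest_stage(["boot_chime", "gray_screen", "apple_logo"]).name)  # prints BOOTER
```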

So first, let's concentrate on stage one, the ROM. The ROM really isn't read-only any more. It's stored in flash, so it can be modified and updated. But conceptually it's the thing that's built into your system and doesn't go away when you turn off the computer. Its job is to initialize the hardware when there's no software present on the machine. The very first thing that happens, even before you turn on your machine, is that a processor called the SMC (System Management Controller) is sitting there listening to the power button.

So even when everything else is off, the SMC stays alive and sits there waiting for the power button. Once you press it, the SMC sends the signal to power up the main processor, the memory, and all the rest of the hardware.

The processor then does the first thing it's programmed to do, which is to jump to the reset vector, common to all Intel processors. That runs a little bit of code built into EFI that knows how to get the processor into the right mode, do some very early initialization, and turn on the memory.

If you're familiar with EFI terminology, that's called the PEI, or pre-EFI initialization, environment. It's a very small section of code, usually written in assembly language. It does just the most basic initialization needed to reach a state where you can run what you'd consider normal device driver software.

Once the processor, chipset, and memory are initialized, you go into EFI's DXE phase (pronounced "Dixie"), the Driver Execution Environment. This is where all of the EFI device drivers built in for the hardware on your machine run concurrently. It's actually a lot like I/O Kit: things happen asynchronously so that the machine initializes in the least amount of time possible.

This is when the boot chime plays, for example. One of the first drivers to load is the sound driver, which starts playing the boot chime while the other EFI drivers are still initializing, so all of this happens in parallel as your machine comes up. Any PCI add-on cards that have option ROM drivers are run at this point too; they're found and run along with the built-in EFI drivers.

Okay, so what can go wrong at this point? The most common failure mode is a RAM failure. EFI detects that during the power-on self test, sounds a tone, and flashes the hardware light on the front of the machine. That's how most of these kinds of failures are reported: with a flashing light, plus a tone if the sound hardware is working.

EFI does a very basic check of your hardware, and if initialization of any of the pieces fails, it sounds an error. EFI can also detect failures in itself. Suppose a firmware update was interrupted halfway through, or the image was corrupted for some other reason. EFI would detect that one of its own internal images was faulty and abort. It would then go into a recovery mode and load a recovery image from a different part of the flash, so you can come back up and retry the firmware update.

The EFI environment is a little bit different from the standard kernel environment. The processor is running in a virtual-memory mode, but physical memory is all mapped one-to-one. EFI maps in all of physical memory and loads programs at addresses it picks, and devices are mapped in somewhere in memory that EFI controls. So it's similar to an operating system environment, but not exactly. Hopefully you won't be writing any EFI applications, so you won't care about this. If you are, see us afterwards. We know who you are.

Since our first Intel machines we've had EFI 1.0 compliance, and any EFI option ROMs that follow the EFI spec should just work as you'd expect. We do encourage you to use the EFI Byte Code (EBC) compiler, which lets you run on both 32- and 64-bit machines. That's very important now that we've started introducing 64-bit EFI with the new MacBook Pros that came out recently.

In the future we do plan to meet the UEFI 2.0 spec. If you go to tianocore.org or uefi.org, you can download that gigantic spec and see just how much work that is, but it has some nice features that we've been working on. We're a member of the UEFI consortium, and we'll be working towards making all of our products compliant with it.

That also means that if you're developing a graphics card, we'd encourage you to use the new Graphics Output Protocol (GOP), which replaces the old UGA, if you know what that is. If you don't know what that is, that's probably a good thing.

EFI's whole job is to do one thing: boot your system. What does that mean? It means loading some program and running it once. To EFI, a program is an EFI application, a PE/COFF executable with the EFI header on it. That could be the OS loader, but it could also be a diagnostic program, a partition utility, or a firmware updater.

That's how we update the firmware on your machine: we actually execute a separate EFI program that does it. It could be a shell (you can download the shell from the TianoCore website), it could be Pong, it could be anything. But in normal usage it's going to be the OS loader.

By default, EFI has a search path built in that it uses to find the one application it's going to load. It goes through the local disks and looks at each partition. First, if it's an HFS partition, it looks for a blessed file.

If you're not familiar with HFS, a blessed file is just a file that is marked in the volume header so that a simple boot loader program can find it easily. If there's no blessed file, EFI looks in a series of paths on the disk, which are also specified by the EFI specification.

You can also tell it explicitly what to look for. Set an NVRAM variable that points directly to the file you want, or to a partition or device, and EFI will then prefer it in its search order. You can also affect the search order interactively. Hold down the Option key to get the boot picker, the N key to force it to look on the network first, the C key (which used to mean CD but now means any optical media), or the D key to prefer loading a diagnostic program.
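The search order described above can be sketched as a small function. On each HFS volume it checks for the blessed file first, then tries a spec-defined fallback path. An NVRAM preference moves one volume to the front, and a held key moves one kind of device ahead of that. This is a model under stated assumptions, not Apple's firmware logic. `Volume`, `boot_program`, `search`, and the device "kinds" are hypothetical names. The fallback path `\EFI\BOOT\BOOTX64.EFI` is our assumption, based on the UEFI removable-media default. The D key is modeled as a device kind for simplicity, and the Option-key boot picker is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Assumption: the UEFI default removable-media path on x64. The talk only
# says EFI checks "a series of paths" defined by the EFI specification.
FALLBACK_PATHS = ["\\EFI\\BOOT\\BOOTX64.EFI"]

# Held keys that move one kind of device to the front (hypothetical labels).
KEY_PREFERENCE = {"N": "network", "C": "optical", "D": "diagnostic"}


@dataclass
class Volume:
    name: str
    kind: str                              # "local", "network", "optical", ...
    is_hfs: bool = False
    blessed_file: Optional[str] = None
    files: Set[str] = field(default_factory=set)


def boot_program(vol: Volume) -> Optional[str]:
    """Blessed file first (HFS only), then the spec-defined fallback paths."""
    if vol.is_hfs and vol.blessed_file:
        return vol.blessed_file
    for path in FALLBACK_PATHS:
        if path in vol.files:
            return path
    return None


def search(volumes: List[Volume], nvram_preferred: Optional[str] = None,
           held_key: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (volume, program) candidates in the order EFI would try them."""
    order = list(volumes)
    # Python's sort is stable, so each sort only pulls its preferred volumes
    # forward; the sort applied last has the highest priority.
    if nvram_preferred is not None:
        order.sort(key=lambda v: v.name != nvram_preferred)
    if held_key in KEY_PREFERENCE:
        order.sort(key=lambda v: v.kind != KEY_PREFERENCE[held_key])
    candidates = []
    for vol in order:
        program = boot_program(vol)
        if program is not None:          # volumes with nothing bootable are skipped
            candidates.append((vol.name, program))
    return candidates
```

With this ordering, holding N beats a stored NVRAM preference, and a volume with neither a blessed file nor a fallback file never becomes a candidate.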

These are all ways to affect the order in which EFI itself looks for the boot program. One thing that can go wrong here is that it can't find a program at all: it looks on all these partitions and finds nothing that looks like an EFI boot program. Or you could have pointed it at a partition that has no blessed file, and it couldn't find a file in any of the locations it thought it should look in.

Or it could load the program, and the program could fail. For example, if the boot loader runs but can't load any of its resources, it returns an error code. EFI then says, okay, I'll give up on that one, and continues along its boot path. So if you had two partitions and the boot loader from the first one failed, EFI would fall back to loading from the other partition. It's always trying to find something to do.
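The fallback behavior can be sketched as a loop over candidates. EFI tries each one, treats any non-success status as "try the next one", and flashes the question-mark folder on each failure. If nothing works, the folder stays on screen. This is a simplified model. `boot_first_working` and its callback parameters are hypothetical. `start_image` stands in for EFI's `StartImage()` boot service. A real booter that succeeds never returns to EFI, because it jumps into the kernel, so returning success here stands in for that.

```python
from typing import Callable, Iterable, Optional, Tuple

EFI_SUCCESS = 0  # any other status is treated as a failure in this model


def boot_first_working(candidates: Iterable[Tuple[str, str]],
                       start_image: Callable[[str, str], int],
                       flash_question_folder: Callable[[], None]) -> Optional[str]:
    """Try each (volume, program) in order until one succeeds.

    Returning EFI_SUCCESS stands in for "the booter took over"; a real
    booter that succeeds jumps into the kernel and never comes back.
    """
    for volume, program in candidates:
        status = start_image(volume, program)
        if status == EFI_SUCCESS:
            return volume
        flash_question_folder()  # brief flash for each failed attempt
    return None  # gave up: the folder with the question mark stays up
```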

It doesn't give up, because it knows your machine isn't very useful sitting there in EFI. EFI tells you that something failed by showing the folder with a question mark, which it flashes on every failure. Once it gives up and can't find anything, you see the flashing folder sitting there looking very sad. But let's assume it's able to find something to boot, which is the normal case. That brings us to stage two, the boot loader, which loads the kernel environment and switches to the kernel.

The EFI booter is an EFI application that's actually delivered along with the OS. Conceptually it's part of the OS; the booter is on the DVD you got with Leopard. It's responsible for getting the machine into the real part of the operating system. It also insulates the kernel and the rest of the system from knowing too much about how EFI works, because most of the knowledge of EFI interfaces is encapsulated in the boot loader.

As I mentioned before, it's the booter that draws the Apple logo on the screen, using the frame buffer, as one of the first things it does. So once you see the logo, you know that the booter was found and ran far enough to find the screen and draw something on it. Its job is to load the kernel and the drivers, or kexts, and execute them so that you can actually do something more useful.

How it loads these things depends on a couple of factors. By default, the booter looks on the same device it came from, which is the normal case. If you have one partition with the kernel, the booter, and everything else on it, the boot partition is the same as the one you load the kernel from. You can override that and tell it to load the kernel from some other place, specified as an EFI device path. So all of these behaviors can be overridden, but the default is to do the usual thing.

It loads the kernel into memory at the specific address where the kernel wants to go. Then it loads the drivers after the kernel in one of three ways. The first way loads the kernel and drivers together as one package, called a kernel cache. That's a single file containing the kernel and all the drivers, with all the symbols resolved, and it's the best, fastest, and usual way when you boot your system. The second way loads the kernel as a separate file, plus a driver package called an mkext, which is basically all of the boot drivers present on the system.

That's the next best way, because loading one file is faster than picking through the disk the third way, which is to load the kernel and then each driver individually. In that case the booter actually scans through the Extensions folder, looks at each driver's plist, decides whether it should be loaded, and then loads its binary into memory.
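The three load strategies form a simple preference order, which this sketch makes explicit. It includes one detail from later in the talk: in safe mode (Shift) the booter ignores the kernel cache and the mkext. The function name and the string labels are our own and are not identifiers from the booter.

```python
from typing import List, Set


def plan_kernel_load(available: Set[str], safe_mode: bool = False) -> List[str]:
    """Pick the fastest load strategy the booter can use.

    `available` holds labels for what exists on disk and is valid:
    "kernelcache", "kernel", "mkext". The labels are ours, not Apple's.
    """
    if safe_mode:
        # Safe mode ignores both caches, forcing the slowest path.
        available = available - {"kernelcache", "mkext"}
    if "kernelcache" in available:
        # One file: kernel plus boot drivers, symbols already resolved.
        return ["kernelcache"]
    if {"kernel", "mkext"} <= available:
        # Two files: the kernel and one archive of all boot drivers.
        return ["kernel", "mkext"]
    if "kernel" in available:
        # Slowest: scan the Extensions folder and load each kext separately.
        return ["kernel", "individual kexts"]
    raise FileNotFoundError("no kernel to load")
```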

I'll discuss each of these in turn. The kernel cache is the preferred method because it's the fastest. It lives in /System/Library/Caches/com.apple.kernelcaches, which, if you say it three times fast... no. As I said, it contains both the kernel and the drivers, and the file name of the cache is a hash of a couple of different things.

One input is the path of the kernel, and the other is a value passed in by EFI that represents the state of the system. The point is to make it easy to tell whether the cache is valid. The cache has to match both the kernel path and the system, so that you know its drivers are right for the current state of this system.

The OS builds this cache once the system is all the way up, by examining the current state of the drivers in the system. It waits a little while after your system comes up to give things a chance to settle. Then it takes all of the currently running drivers, links them together with the kernel on disk, and stores the result in this cache file. You end up with a kernel and set of drivers that's good just for this one machine; the cache is intended to be local to that machine.
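Because the cache's file name is derived from the kernel path and a system-state value, checking whether a cache is valid reduces to computing the expected name and seeing whether that file exists. The talk does not say which hash is used or exactly how the name is formatted. The sketch below uses Adler-32 (`zlib.adler32`) and a `kernelcache.XXXXXXXX` name purely as stand-ins, and `kernelcache_path` and `cache_is_usable` are hypothetical helpers.

```python
import zlib

CACHE_DIR = "/System/Library/Caches/com.apple.kernelcaches"


def kernelcache_path(kernel_path: str, system_state: str) -> str:
    """Derive a cache file name from the kernel path and a system-state value.

    Assumption: Adler-32 and the "kernelcache.XXXXXXXX" format are stand-ins;
    the talk only says the name is a hash of these two inputs.
    """
    digest = zlib.adler32(system_state.encode("utf-8"))
    digest = zlib.adler32(kernel_path.encode("utf-8"), digest)
    return f"{CACHE_DIR}/kernelcache.{digest:08X}"


def cache_is_usable(files_on_disk, kernel_path: str, system_state: str) -> bool:
    """A cache built for different hardware simply has a different name."""
    return kernelcache_path(kernel_path, system_state) in files_on_disk
```

Moving the disk to another machine changes the system-state input, so the booter looks for a different name, misses, and falls back to a slower strategy. No separate invalidation step is needed.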

The second method is a multi-kext archive of all the drivers, which is usually present on your system. It gets rebuilt whenever you change anything in /System/Library/Extensions. This Extensions.mkext cache holds the drivers in the Extensions folder whose OSBundleRequired property has some value other than Safe Boot. That sounds kind of counterintuitive, but it's right, because it picks up anything needed for a local root, a network root, or a console.

So it's any driver that could be used during the boot process. That leaves out things like sound drivers, or anything else that isn't needed either to load a file to boot from or to interact with the user at boot time. That's potentially a bigger set of drivers than this particular machine strictly needs. But that's useful if you have a group of machines, for example if you're net booting every machine that can run Leopard; then it's probably what you want.

So it's good in that case. But you may still need to load other drivers later. The sound driver I mentioned is an example: it wouldn't be in the mkext, but it might get loaded at run time. And then the worst case is loading drivers individually.

The caches can be missing or invalid for several reasons. You might have taken the disk and plugged it into a different machine, or rearranged the hardware in your machine so that the hash of your hardware state no longer matches. In that case the booter falls back to loading every driver that meets basically the same criterion the mkext is built with: OSBundleRequired not equal to Safe Boot. That's quite slow. The HFS implementation in the firmware is not OS-level, and it's not intended to be.
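The selection rule shared by the mkext and by individual loading can be expressed as a filter over each kext's Info.plist. A kext is included if it declares `OSBundleRequired` with any value other than `Safe Boot`; documented values include `Root`, `Local-Root`, `Network-Root`, and `Console`. This is a minimal sketch using Python's `plistlib`. The helper names and the kext names in the usage are hypothetical.

```python
import plistlib
from typing import Dict, List


def needed_at_boot(info_plist: bytes) -> bool:
    """True if a kext declares OSBundleRequired with a value other than Safe Boot."""
    info = plistlib.loads(info_plist)
    required = info.get("OSBundleRequired")
    return required is not None and required != "Safe Boot"


def boot_kexts(kexts: Dict[str, bytes]) -> List[str]:
    """Names of the kexts that would go into the mkext (or be loaded one by one)."""
    return sorted(name for name, plist in kexts.items() if needed_at_boot(plist))
```

A sound driver with no `OSBundleRequired` key is left out, matching the talk's example of a driver that is loaded later at run time.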

As a special case, let me discuss safe sleep, which is almost just like regular booting; most operating systems call this hibernation. As you may know, when you put your machine to sleep it writes a hibernation image, so that if your battery fails you'll be able to come back and recover.

When you come back up after that, it's a lot like a regular boot, in that there's a file on disk that has the kernel and the drivers; it just happens to have a lot of other stuff in it too. The booter notices that the kernel set a special flag saying, by the way, this is a hibernation boot. So the booter says, okay, I know what to do with that, and goes and finds the hibernation file.

It loads that file into memory in preference to the usual kernel cache or anything else. The file is bigger, but the booter loads as much of it as it can, leaves some notes behind, and shuffles the EFI runtime pages a little. Then it jumps into the kernel, a lot like a normal boot.

So it's really not much of a special case, except that the booter draws the grayed-out screen with the little progress bar. The kernel isn't there yet to do it, and the restore takes a little longer than would be comfortable if you didn't know something was happening. From the I/O Kit perspective it looks just like a wake from sleep: your driver says, oh, I woke up, and you have to reset your hardware. So it's not that much different.

Net booting is another special case. Again, it's very similar to a local boot, because you do basically the same set of things: load the drivers and the kernel, and jump to them. But the EFI APIs for loading files over the network are a little different, so the booter has to know to call those instead. It also knows that it's probably not worth trying to load individual drivers, so it never tries that; it will only ever load either the kernel cache or the kernel plus the mkext.

It does load its configuration file, so you can tell it what to do. Other than that, it loads everything into memory and jumps. It also leaves a little extra information for the kernel that says, I net booted, and here's the packet I got from the server. Otherwise the flow is pretty much the same.

So what knobs can you turn on the booter? As I mentioned before, there are some NVRAM variables you can set that control the booter's behavior. You may know about boot-args, which is usually thought of as controlling the kernel. But the booter also checks it for the verbose flag, and if that flag is set, the booter clears the graphical screen and draws a text console showing what it's doing.

If you've never used verbose mode before, it's a really good debugging tool. Verbose mode shows you what's happening, so you can tell very clearly: did the booter fail? Did the kernel fail? Where in the process did something go wrong? That's much more helpful than a screen with just a spinning gear.

Other things you can set are the path to the kernel cache, the path to load the kernel from, and the path to the mkext; on PowerPC with Open Firmware you can also set the kernel path. In case you didn't get it already, this talk is mostly about Intel and Leopard.

So I haven't really said anything about Open Firmware; hopefully that's okay with everyone. The booter also looks at a plist file, com.apple.Boot.plist in /Library/Preferences/SystemConfiguration. You can use it to set pretty much everything you can set with NVRAM variables.

But the booter prefers NVRAM over the file, so that you can override settings from the command line. Normally all of this is set by the bless command-line tool. So if you're thinking about doing this to change how the startup disk is set, please use bless.

Bless knows the right thing to do, and it has a pretty good interface for setting all this stuff. Check out the bless man page to see what it can do. It gives you a lot of control, including the ability to reboot once into a special program, such as the firmware updater.
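The precedence rule, with NVRAM winning over the plist, can be sketched for kernel flags. `boot-args` is the NVRAM variable named in the talk. `Kernel Flags` is the com.apple.Boot.plist key as we recall it, so treat it as an assumption and check it against your own system. The helper name is hypothetical, and the simple "NVRAM replaces the plist value" rule is our simplification. We don't claim this is how the booter merges the two sources internally.

```python
import plistlib
from typing import Dict


def effective_kernel_flags(boot_plist: bytes, nvram: Dict[str, str]) -> str:
    """NVRAM boot-args, when present, win over "Kernel Flags" from the plist.

    Assumption: a plain override; not necessarily the booter's actual merge.
    """
    if "boot-args" in nvram:
        return nvram["boot-args"]
    settings = plistlib.loads(boot_plist)
    return settings.get("Kernel Flags", "")
```

On a real machine, the NVRAM side is what `sudo nvram boot-args="-v"` changes, which is why a command-line setting overrides the file.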

But if you know what you're doing, you can set NVRAM variables explicitly with the nvram command. This deserves a little digression, because NVRAM variables work a little differently than they did in the days of PowerPC. EFI defines what it calls non-volatile variables. Each has a name and a value, and also a GUID, which acts as a namespace so that a whole set of variables can be defined without collisions.

The intention is that each vendor has its own namespace. So we defined one namespace for the variables that are visible from the operating system, and the nvram command-line tool operates on all the variables in that space.

But the variables that control, for instance, how EFI finds the boot file are in a separate EFI namespace. So how do you set those? The NVRAM driver does a little bit of magic behind the scenes. If you set a variable whose name starts with "efi-", it knows that this is one it needs to treat specially.

It does two things. First, if the variable is efi-boot-device, it sets the EFI-namespace variables that control how the firmware loads the boot loader. Second, it translates efi-something into efi-something-data. It takes the matching dictionary stored in the variable, finds the I/O Kit device it refers to, and translates that into an EFI device path. I'll show you an example of that.
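The namespace arrangement and the "efi-" special case can be modeled as a store keyed by (GUID, name). The two GUIDs are given as we recall them: Apple's OS-visible namespace and the UEFI global-variable namespace. Verify them before relying on them. `NVRAMModel` and its methods are hypothetical. The `to_device_path` callback stands in for the driver's real work of finding the I/O Kit device and building its EFI device path. Storing the raw path in `Boot0080` is a simplification, since a real boot option wraps the path in a load-option structure.

```python
import uuid
from typing import Callable, Dict, Tuple

# GUIDs as we recall them; verify before relying on them.
APPLE_NVRAM_GUID = uuid.UUID("7C436110-AB2A-4BBB-A880-FE41995C9F82")  # OS-visible
EFI_GLOBAL_GUID = uuid.UUID("8BE4DF61-93CA-11D2-AA0D-00E098032B8C")   # firmware's own


class NVRAMModel:
    """Toy variable store keyed by (namespace GUID, name), like EFI variables."""

    def __init__(self, to_device_path: Callable[[str], bytes]):
        self.vars: Dict[Tuple[uuid.UUID, str], object] = {}
        # Stand-in for "find the I/O Kit device, build its EFI device path".
        self.to_device_path = to_device_path

    def set(self, name: str, value: str) -> None:
        """What setting a variable from the OS side does in this model."""
        self.vars[(APPLE_NVRAM_GUID, name)] = value
        if not name.startswith("efi-"):
            return  # ordinary variable: stored as-is
        path = self.to_device_path(value)
        self.vars[(APPLE_NVRAM_GUID, name + "-data")] = path
        if name == "efi-boot-device":
            # Also write the firmware-namespace boot option EFI reads
            # (really a load option wrapping the path; simplified here).
            self.vars[(EFI_GLOBAL_GUID, "Boot0080")] = path

    def visible_to_os(self) -> Dict[str, object]:
        """What the nvram tool would list: only the Apple namespace."""
        return {n: v for (g, n), v in self.vars.items() if g == APPLE_NVRAM_GUID}
```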

And again, bless knows how to do this. If you want to see an example, go to your own machine and use Startup Disk to set the startup disk. Then look at the variables that bless set for you. Maybe you're doing it right now.

Here's an example. If you wanted to boot off of disk0s3, efi-boot-device would be set to this big, ugly matching dictionary. If you're familiar with using I/O Kit from user space, it's basically how you would construct a dictionary that matches the properties of a specific device in the device tree. In this case it refers to a child of an IOMedia device with a particular UUID.

The platform expert was able to find that device and translate it into this ugly-looking binary value. If you speak EFI, you'd see that it refers to some PCI devices and a partition, and it just works. The driver also translates this into a variable called Boot0080 in the EFI namespace, which is what EFI is looking for.
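The matching dictionary at the heart of that example can be built as an ordinary dictionary and serialized to XML. `IOProviderClass` and `IOPropertyMatch` are standard I/O Kit matching keys. The value bless actually writes into efi-boot-device wraps this in extra structure that this sketch does not reproduce. The `media_matching_dict` helper is hypothetical.

```python
import plistlib


def media_matching_dict(volume_uuid: str) -> bytes:
    """A standard I/O Kit matching dictionary for an IOMedia with a given UUID.

    bless stores something of this shape in efi-boot-device, wrapped in
    extra keys that are not reproduced here.
    """
    match = {
        "IOProviderClass": "IOMedia",               # class the device must be
        "IOPropertyMatch": {"UUID": volume_uuid},   # property that must be equal
    }
    return plistlib.dumps(match, fmt=plistlib.FMT_XML)
```

To compare with the real thing, `nvram efi-boot-device` prints the value currently stored on a Mac.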

There are more booter knobs. You can hold down Command-S or Command-V to tell the booter you want to boot in single-user or verbose mode, respectively. Command-V is the simplest way to just see what it's doing, so try that. The booter interprets these keys and adds them to the boot arguments so the kernel notices too, and they switch both the booter and the kernel to the text console.

Holding Shift sets safe mode, which makes the booter ignore any caches, like the kernel cache or the mkext. It also adds -x to the boot arguments so the kernel knows to do the same. Other keys are actually interpreted by EFI itself before the booter runs: the N, T, C, and Option keys are all handled by EFI, not by the booter.
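The key handling just described amounts to appending flags to the boot arguments. The `-x` flag for safe mode comes from the talk. `-v` and `-s` are the standard verbose and single-user flags. The key labels and both helper functions are our own hypothetical names. Keys that EFI consumes never reach this code.

```python
from typing import Iterable

# Startup keys the booter handles, and the flag each adds (key labels are ours).
KEY_FLAGS = {"cmd-v": "-v", "cmd-s": "-s", "shift": "-x"}


def boot_args_for(nvram_boot_args: str, held_keys: Iterable[str]) -> str:
    """Append the flag for each held key, without duplicating existing ones."""
    flags = nvram_boot_args.split()
    for key in held_keys:
        flag = KEY_FLAGS.get(key)   # N, T, C, Option are handled earlier by EFI
        if flag is not None and flag not in flags:
            flags.append(flag)
    return " ".join(flags)


def use_caches(boot_args: str) -> bool:
    """Safe mode (-x) makes the booter skip the kernel cache and the mkext."""
    return "-x" not in boot_args.split()
```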

So that's pretty much the end of the booter story. At this point, having loaded the kernel and the drivers, the booter leaves behind some notes for the kernel. These include the memory map it got from EFI and a pointer to the ACPI tables, standard PC-style tables that describe hardware you can't easily probe on the system. They also record where it loaded the drivers in memory, the UUID of the boot volume, and the path the booter came from.
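The handoff can be pictured as one record. It holds the notes just listed, plus the frame buffer location and screen resolution that the booter also records. This is a hypothetical data structure for illustration only. The field names are ours, and the talk says the booter actually leaves this information in the device tree. The `runtime_pages` method reflects the small amount of memory EFI keeps for its runtime services after it shuts down.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryRange:
    start: int
    pages: int
    kind: str   # our labels, e.g. "free", "runtime", "kernel"


@dataclass
class BooterNotes:
    """What the booter hands the kernel. Field names are hypothetical."""
    memory_map: List[MemoryRange]          # as reported by EFI
    acpi_tables: int                       # physical address of the ACPI tables
    loaded_kexts: List[Tuple[str, int]]    # (kext name, load address)
    boot_volume_uuid: str
    booter_path: str
    framebuffer: int                       # frame buffer base address
    screen: Tuple[int, int]                # width, height in pixels

    def runtime_pages(self) -> int:
        """Pages EFI keeps after shutting down, for runtime services like NVRAM."""
        return sum(r.pages for r in self.memory_map if r.kind == "runtime")
```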

It leaves all of these in the device tree, along with the location of the frame buffer and the screen resolution. Then it tells EFI, all right, I'm done with you, go shut yourself down. EFI closes itself down and frees all the resources it was using. The only exception is a very small amount it keeps for the runtime services used to access NVRAM and perhaps a few other things. Then the booter jumps into the kernel, and that's where the next phase of the story starts. For that, I'll turn you over to Soren Spies: the kernel, rooting, and userland.

( Applause )

Thank you, Curtis.

So we're up to stage three of the boot process. We've got the kernel in memory, we've just jumped to it, and we've gotten a few little hints about what the system looks like. The main thing the booter did for us was load the right Mach-O code for this processor out of the mach_kernel file and jump to the right starting address. On a PowerPC machine we'd be executing PowerPC code, but we're going to talk mostly about Intel. So the booter put us at the very beginning of the low-level assembly that starts our Intel initialization.

So the kernel comes up with a few clues, but basically everything EFI did is gone: it tore itself down, and all its device drivers are gone. The first thing you'll see is the little gear. When the gear starts spinning, we've at least found the frame buffer EFI told us about and gotten scheduling and threads started.

Some of the things we have to do are reinitialize virtual memory to get it into the model we want for the kernel and start timers. We also install our exception vectors, so that if the processor needs to take a trap, it can call the kernel and say, hey, we had this kind of fault, what do you want me to do about it?

Then we fire up I/O Kit. Some of you who write kexts know something about I/O Kit, and so do some of you who write userland code that probes devices. But the important thing from a booting standpoint is that I/O Kit goes out and finds devices.

Meanwhile, the main thread continues on into bsd_init. We've basically started up the scheduler and the Mach task machinery, so we have multiple threads running in the kernel, and bsd_init starts bringing up more of the BSD part of the world. So let's talk a little bit about how I/O Kit determines what's on the system.

We mentioned that we get the ACPI tables, among other bits of information, from the booter. The main thread is waiting over here, because it can't do anything until I/O Kit finds devices. Meanwhile a whole bunch of other threads go out and start matching.

Matching starts from the ACPI tables as interpreted by the platform expert, a special driver that is more or less the first driver. It takes the ACPI tables and builds a nice tree of device information that you can look at: ioreg or IORegistryExplorer will show you that same I/O Kit registry.

So we start looking at all these devices and asking which drivers want to drive them. We brought all those boot drivers into the kernel, so we have all their personalities. Those are the plist data that say, I'm a USB driver, or I'm a hard disk driver, or I'm a sound card driver. The sound card driver is being ignored right now, though, because sound cards aren't important for booting. The only noise we made was the chime a long time ago in EFI.

So we bring up these personalities and see what each driver claims it can drive. Meanwhile our threads are looking at all the different devices. When a device looks like what a driver describes, that's a match, and we connect that driver and that device together. So all the threads are matching devices against all of the driver personalities.

In some cases we have what's called active matching. There, it's not just the driver's description that has to match the device. Your driver can actually run a little bit of code, send a command to the device, and see whether it responds in a certain way: am I the right vendor, that kind of thing. So you can probe the device actively. But the important thing is that these threads are running, and devices are appearing.
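Passive and active matching can be sketched in two steps. First, a personality's properties must appear on the device. Then each remaining candidate either bids a score or, through an active probe, declines, and the highest bidder wins. This is a loose model inspired by the probe scores that I/O Kit personalities carry, not I/O Kit's actual algorithm. In the kernel this runs concurrently on many threads, while the sketch is sequential. `Personality`, `passive_match`, `pick_driver`, and the property names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Personality:
    driver: str
    match: Dict[str, object]                 # properties the device must have
    score: int = 0                           # bid used when there is no probe
    probe: Optional[Callable[[dict], Optional[int]]] = None  # active matching


def passive_match(p: Personality, device: dict) -> bool:
    """Every property in the personality must be present with the same value."""
    return all(device.get(key) == value for key, value in p.match.items())


def pick_driver(device: dict, personalities: List[Personality]) -> Optional[str]:
    """Passive match first, then let candidates probe; highest score wins."""
    candidates = []
    for p in personalities:
        if not passive_match(p, device):
            continue
        # An active probe can talk to the device and decline (None) or bid.
        score = p.probe(device) if p.probe else p.score
        if score is not None:
            candidates.append((score, p.driver))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```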

Let's take a little picture of this. The important thing to notice is how this differs from EFI. EFI's goal was to boot, so all it wanted was to get at RAM, which it mapped one-to-one, and at the graphics card so it could draw a few things on the screen. All it really cared about was the boot path, meaning the smallest number of device drivers it had to start to reach the disk. The kernel, on the other hand, wants all drivers to match and load.

So it initializes everything, and all the drivers, or at least all the boot drivers, go ahead and do their thing. We start bringing up network cards and other hardware that we might not need for this particular boot, but that a boot could need. We have a four-gigabyte address space, and we've begun mapping things in using the more traditional virtual memory style.

If we need pages of physical RAM, we map them into the address space at a particular spot and use them as necessary. The video card stays mapped into a frame buffer area that we can write to. So let's talk a little more about the matching that's going on.

While all that matching goes on, what are we waiting for? We have the description of the boot device that came from the booter, running in the EFI environment. The booter said, I booted off of such-and-such a volume, and here's a way of describing it, or I booted off the network, and here's a way of describing that. So one thread, the bsd_init thread, goes through setconf and gets stuck in IOFindBSDRoot. There it calls waitForService with a dictionary that describes the device.

If it was a network device, it's a network matching dictionary; if it's a disk device, it's a disk matching dictionary. This is very similar to the user-land calls you can make: you say, I'm waiting for a certain device to come online, and it looks like this. The kernel does the same thing internally, and waits. So threads are going around looking for devices, and this one BSD init thread is waiting for the root device to appear.

Eventually, ideally, that device does appear. Matching continues after that; we just let it go. But to make progress on the boot, we say aha, here's our root device. We look at it I/O Kit style, see that it has certain BSD information, and pass that off to the BSD code: here you go BSD, fire up your virtual file systems, and here's slash, so you can start building up everything that goes on top of that layer. So we're able to do a mount: our root volume, normally HFS, is now mounted on slash.

We also create /dev and attach it to I/O Kit, so as I/O Kit matching continues you'll see things dynamically appearing in /dev; new disks that you plug in will appear there. If this step goes wrong, what you'll tend to see in a regular boot is the gear spinning forever. Well, not forever, but for quite a long time.

Eventually the kernel, even though it's still waiting, will put up a circle with a slash through it, meaning: I don't think I'm ever going to find anything, but I'm still waiting. If you do a verbose boot, you'll see "Still waiting for root device" every thirty seconds. That's because the call to waitForService has a thirty-second timeout, and we just keep looping on it.
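A minimal Python model of the root-device wait just described, assuming a toy `Registry` in place of the real I/O Registry and plain dictionaries in place of I/O Kit matching dictionaries. The kernel loops forever; the `max_tries` parameter exists only so the sketch can terminate, and the class and function names are hypothetical.

```python
import threading


class Registry:
    """Toy stand-in for the I/O Registry: services are published as property dicts."""

    def __init__(self):
        self._cond = threading.Condition()
        self._services = []

    def publish(self, props: dict) -> None:
        with self._cond:
            self._services.append(props)
            self._cond.notify_all()  # wake anyone blocked in wait_for_service

    def _find(self, matching: dict):
        for service in self._services:
            if all(service.get(k) == v for k, v in matching.items()):
                return service
        return None

    def wait_for_service(self, matching: dict, timeout: float):
        """Return the first matching service, or None if `timeout` seconds pass."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(matching) is not None, timeout)
            return self._find(matching)


def find_root(registry, matching, timeout=30.0, log=print, max_tries=None):
    """Loop on a timed wait, logging each timeout, like the verbose-boot message."""
    tries = 0
    while True:
        root = registry.wait_for_service(matching, timeout)
        if root is not None:
            return root
        log("Still waiting for root device")
        tries += 1
        if max_tries is not None and tries >= max_tries:
            return None
```

Using a timed wait inside an endless loop, rather than one unbounded wait, is what gives the periodic message in verbose mode.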

So now we've got our root device, which means we've got our volume, probably HFS. It's mounted, and now we want to start up user land. Basically we're going to get launchd running, and launchd is going to run a whole bunch of other processes to eventually get us logged in.

We know we have that one four gigabyte virtual address space with its mappings; that's nice. And most of us know that each process we run has its own separate address space. We have to get from here to there.

The way we create user land is sort of by pretending we had a process. We ask: if we had a process in user land, what would we want it to do? We'd want it to start launchd for us, because that's what we want to have happen.

So we set up a second address space. We create a process from the BSD standpoint and a task from the Mach standpoint, and we give it a thread. At this point we have an empty address space with no mappings, and we say: hey thread, jump into that process, into that address space, and go run there.

Jumping into an unmapped address space with your thread is generally a bad idea, because it's just not going to work. But we also set something called an asynchronous system trap (AST), which came from BSD. It lets you run a little bit of code right before you finish a system call. Normally you call in to make a system call, the system call runs, and then you jump back out and switch contexts.

So the AST says: after you've switched into the kernel context of this process, but before we're actually in its address space, run some code. The code we run is called load_init_program, and it's the one that magically gets us into launchd. It allocates a little bit of memory, one page, inside this address space.

It copies the string /sbin/launchd into this otherwise empty address space, and then makes it look as if this empty address space were about to call exec. As you probably know, exec is how a Unix process says: here's my process, I'm the shell, and I really want to run ls.

What generally happens is that you make a copy of yourself with fork, and then the copy calls exec, which replaces it with the program you want to run. So you fork, and the copy execs ls; ls replaces that copy of the shell. If the shell itself didn't want to exist any more, it could exec ls directly and simply be replaced by it.
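The fork-then-exec pattern just described, as a small POSIX-only Python sketch (it relies on `os.fork`, so it won't run on Windows). `spawn` is a hypothetical helper name, and exiting with 127 when exec fails follows the usual shell convention.

```python
import os
import sys


def spawn(argv):
    """Fork a copy of this process, replace the copy with argv via exec, and wait for it."""
    pid = os.fork()
    if pid == 0:
        # Child: the copy of "the shell". A successful exec replaces this whole image.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: the original "shell" survives and waits for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, `spawn(["ls"])` runs ls in a child while the calling process keeps going. Calling `os.execvp` directly, without forking, is the "shell that doesn't want to exist any more" case: the caller is replaced outright.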

So we call execve. It's almost as if there were a real process that had put that string out there and then called into us asking to exec it. By the time we return, we've gone through the exec path, and the exec path blows away this whole address space that barely existed in the first place.

Then it reads in the Mach-O file, maps in the text segment to execute, and starts dyld, the dynamic linker, to link in all the other libraries. So we went from an empty address space, pretended it called exec, and exec took care of tearing down the address space and building us a new process. We use the same code to build that first process as we use to build any other process.

If this goes wrong, we'll panic here and say that we couldn't launch process number one. Those of you familiar with Unix probably think of init as the first process; launchd is the new init. There's a session on it later where you can learn all about how it works. It's been around for a little while, since Tiger, but it does even more in Leopard.

It has replaced init, and mach_init for that matter, which was a Mach thing. When it fires up, it checks for system tuning files in case it needs to do any system tuning, and it kicks off something called BootCache, which tries to optimize how the blocks needed at boot are read off the disk. I mention BootCache because, say, you've got ten things that need to run to boot the system.

We watch all the blocks coming off the disk, and whichever blocks were needed by those ten things, on the next boot we fetch them all in the order they sit on the disk. So if you looked at how launchd launched the jobs, you might expect the first job launched to run first, and the second job to make progress a little more slowly. But in fact BootCache basically lets whoever is first on the disk run first, because all of these processes immediately block waiting for pages to come off the disk so they can execute.
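BootCache's real on-disk format and kernel interfaces aren't covered in this talk, so this is only an illustrative Python model of the effect described: record which blocks each job read during one boot, then on the next boot prefetch them in disk order and see which job has all of its pages resident first. Job names and block numbers are made up.

```python
def build_playlist(trace):
    """trace: (job, block) reads observed during the previous boot -> unique blocks in disk order."""
    return sorted({block for _, block in trace})


def ready_order(trace, playlist):
    """Order in which jobs have every block they need, if blocks arrive in playlist order."""
    needed = {}
    for job, block in trace:
        needed.setdefault(job, set()).add(block)
    order, loaded = [], set()
    for block in playlist:
        loaded.add(block)
        for job, blocks in needed.items():
            if job not in order and blocks <= loaded:
                order.append(job)
    return order
```

If the job launched first lives at the end of the disk, it becomes runnable last: with reads at blocks 900-901 for "first", 10-11 for "second", and 500 for "third", the order comes out second, third, first.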

For those of you who don't know anything about launchd, the very short summary is that there are two kinds of jobs. One is on demand, and this is generally the preferred kind: it doesn't run until there's something to do. For example, launchd binds the SSH port for you, and only when a connection comes in does it launch the sshd daemon. Otherwise sshd doesn't run.

The other kind of job is called keep alive. There are a few critical jobs we have to keep running all the time for the system to work, and one of them in this particular case is kextd. kextd is the system daemon that talks to the kernel; if the kernel needs a kernel extension loaded, it says: hey kextd, can you go get that kernel extension for me?
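To make the two kinds of launchd jobs concrete, here is a Python sketch that writes job definitions as property lists with the standard `plistlib` module. `Label`, `ProgramArguments`, `Sockets`, `SockServiceName`, `inetdCompatible`, `KeepAlive`, and `RunAtLoad` are launchd.plist keys; the labels and the daemon path are hypothetical, and the `is_on_demand` rule is a simplification for illustration only.

```python
import plistlib

# On-demand job: launchd owns the listening socket and starts sshd per connection.
ssh_job = {
    "Label": "com.example.sshd",  # hypothetical label
    "ProgramArguments": ["/usr/sbin/sshd", "-i"],  # -i: sshd is handed an already-accepted connection
    "Sockets": {"Listeners": {"SockServiceName": "ssh"}},
    "inetdCompatible": {"Wait": False},
}

# Keep-alive job: launchd starts it and restarts it whenever it exits.
critical_job = {
    "Label": "com.example.critical-daemon",  # hypothetical label
    "ProgramArguments": ["/usr/libexec/example-daemon"],  # hypothetical path
    "KeepAlive": True,
}


def is_on_demand(job: dict) -> bool:
    """Simplified rule for this sketch: no KeepAlive and no RunAtLoad means launch on demand."""
    return not job.get("KeepAlive", False) and not job.get("RunAtLoad", False)


ssh_xml = plistlib.dumps(ssh_job)  # XML plist bytes, as found in a LaunchDaemons file
```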

At early boot time, kextd has an important job. Remember that we only had the boot personalities, only the boot drivers, available. So kextd's job at this point is to say: hey kernel, here are all the other personalities that weren't boot personalities; you might need them for other devices that get plugged in.

Once kextd sends those down, that's the signal to the kernel that user land is running and it can let kextd fetch kexts for it. We also decrement a busy count at that point, so that once all the matching threads are done, we know that we're done. Imagine you only had the boot personalities: you run around, find everything you can drive with the boot kexts, and you might still be waiting for that sound card to come online.

If somebody in user space looked for the sound card before it was there, it might look as if we had lost it or it had disappeared, when in fact we just hadn't loaded its driver personality yet. So we pretend the I/O Registry is busy until we've actually sent down the rest of those personalities. This is also a place where you can plug in, by writing launchd jobs, which ideally should run on demand.
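In I/O Kit the busy state is tracked with IOService::adjustBusy() and waited on with waitQuiet() (IOKitWaitQuiet() from user space). The Python class here is only a toy model of that bookkeeping, assuming a single count for the whole registry; it is not how the kernel implements it, and the class name is hypothetical.

```python
import threading


class BusyTracker:
    """Toy model of a registry busy count: quiet only when the count returns to zero."""

    def __init__(self):
        self._cond = threading.Condition()
        self._busy = 0

    def adjust_busy(self, delta: int) -> None:
        with self._cond:
            self._busy += delta
            if self._busy < 0:
                raise ValueError("busy count went negative")
            self._cond.notify_all()

    def wait_quiet(self, timeout=None) -> bool:
        """True once nothing is busy; False if `timeout` seconds pass first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._busy == 0, timeout)
```

One increment held at boot until kextd sends the remaining personalities, plus one per in-flight matching thread, means a client that waits for quiet won't wrongly conclude the sound card is missing.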

We take advantage of on-demand launching ourselves. If you look at ps on a Leopard system, you'll see that the process ID of loginwindow is actually lower than that of WindowServer. loginwindow gets run out of /etc/ttys (I think it's still there, anyway), and basically says: okay, I'm ready for a login on this system. A server with just text logins might not even run loginwindow.

So loginwindow runs first, and WindowServer has simply told launchd: I'm available if anybody needs me, but I'm going to stay on disk and not run until somebody connects. loginwindow runs, tries to connect to WindowServer, and that actually launches WindowServer, which you can see in ps.
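To check the PID ordering on your own machine, you could parse ps output. This minimal Python sketch assumes lines of the form produced by `ps -axo pid=,comm=` (a PID, then the command path); the helper name is hypothetical.

```python
def pid_of(ps_output: str, command_suffix: str):
    """Lowest PID whose command ends with command_suffix, or None if there isn't one."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and parts[1].endswith(command_suffix):
            pids.append(int(parts[0]))
    return min(pids) if pids else None
```

On a Mac you might feed it `subprocess.run(["ps", "-axo", "pid=,comm="], capture_output=True, text=True).stdout` and compare `pid_of(out, "loginwindow")` with `pid_of(out, "WindowServer")`.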

Once loginwindow is running, many other on-demand jobs come up; disk arbitration, for example, comes up to mount the other disks. But that's not strictly necessary to get you logged into the Finder. We try to take the short path, with loginwindow pulling in just the services it needs, so we can get you logged in and running apps as soon as possible. Then loginwindow either asks you to log in or, as for many users, logs you in automatically, and you're running the Finder and any other apps.

So that gets us through the kernel: we started launchd, and we got up to user space. Here's what that looks like. We've got a bunch of address spaces; some might be 64-bit, though most are still 32-bit. Our drivers are all matched up, and the video card and everything else is available to us. But now we have many-to-many mappings not just within one address space, but across all of the address spaces.

Now that we've covered the basic boot process, let's talk a little more about rooting. We know that the kernel stops and waits for the root device, and we know that the device it's waiting for was described to it by the booter.

So what if you want to change it? We mentioned earlier how you can load the kernel or the booter from a different place. You can also root from a different place. You can put this in the com.apple.Boot.plist file, because the booter reads it and adjusts the hints it sends to the kernel. You can give it a UUID; if you run diskutil info, you'll see a UUID for each of your volumes, and that's what the booter normally passes along.

If you store a specific UUID in your boot plist, you'll root off that volume. You can also give it a whole I/O Kit matching dictionary, if you want it to match some unusual device and root off that. And for one-off debugging, you can again use the boot-args NVRAM variable.

You can set the root path to a firmware path that points to something like an NFS or HTTP server for network rooting. You can also use rd=, which takes a more BSD-style device name. And if you say rd=uuid, you can put a boot-uuid right on the boot-args command line, either from your EFI shell or with the nvram tool.
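A small Python sketch of how a boot-args string of the kind just described breaks down into flags and key=value settings. The `rd=uuid` plus `boot-uuid=` and `rd=<BSD name>` conventions come from the talk; the helper names and the "booter-hint" label for the default case are hypothetical, and this is not the kernel's own parser.

```python
def parse_boot_args(boot_args: str) -> dict:
    """Split a boot-args string into flags (like -v) and key=value settings."""
    flags, settings = [], {}
    for token in boot_args.split():
        if "=" in token and not token.startswith("-"):
            key, value = token.split("=", 1)
            settings[key] = value
        else:
            flags.append(token)
    return {"flags": flags, "settings": settings}


def root_request(boot_args: str):
    """How the root device would be requested under the rd= conventions described above."""
    settings = parse_boot_args(boot_args)["settings"]
    if settings.get("rd") == "uuid":
        return ("uuid", settings.get("boot-uuid"))
    if "rd" in settings:
        return ("bsd-name", settings["rd"])
    return ("booter-hint", None)  # nothing overridden: use what the booter described


def nvram_command(boot_args: str) -> list:
    """argv for persisting boot-args with the nvram tool (run as root on a Mac)."""
    return ["nvram", f"boot-args={boot_args}"]
```

For instance, `"-v rd=uuid boot-uuid=<volume UUID>"` yields a verbose boot that roots from the volume with that UUID.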

One place we might use this alternate rooting, rather than the usual case where the booter came off a disk, described it to the kernel, and the kernel roots off that same disk, is NetBoot. If you wanted to, as I mentioned, you could point it at some NFS or HTTP server.

But what normally happens is that we get the name from the Boot Service Discovery Protocol (BSDP) packet that Curtis mentioned earlier, where a NetBoot server knows all the root images that should be used. This is the classic example of rooting on something other than a basic disk device: IOFindBSDRoot waits for and matches a network device, and mounts a root from it.

So we've talked about how boot and root are independent, and we've actually formalized that, first with the Mac Pro and now more broadly in Leopard, into something we call Boot≠Root: boot, a not-equal sign, root. It's a new technology that takes advantage of this separation.

When the firmware, the ROM, first comes up, it needs to get to the booter, but it doesn't need to do much more than that. The booter runs on top of the firmware and needs to get to the kernel, so it's good to keep the booter and the kernel together; they should probably live on HFS to keep things simple in the early boot environment. But the firmware doesn't need to get to the Finder, right?

It has no use for the Finder. The Finder isn't a firmware app, and it doesn't help bootstrap the system the way the kernel does. So the Finder could live on some virtual volume, some network volume, or some other kind of storage that the firmware doesn't know anything about.

So we've formalized this distinction with Boot≠Root: the OS can live on a so-called exotic volume, while the firmware looks for little helper partitions. As far as the firmware is concerned, it's got a booter, and that's all it cares about. It says hey look, there's the booter, let's go. So it can boot from the little helper partitions.

In the helper partition we store a boot plist that gives the UUID of the virtual volume we actually want to root from. So to boot some virtual volume, you point the firmware at a little helper partition and put the volume's UUID in the helper's boot plist; the booter reads it and passes it along to IOFindBSDRoot, and you end up rooted off that other volume.
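A Python sketch of a helper partition's boot plist, built with the standard `plistlib` and `uuid` modules. The key names `Root UUID` and `Kernel Flags` are assumptions for illustration rather than something stated in this talk, and the helper function names are hypothetical.

```python
import plistlib
import uuid


def make_helper_boot_plist(root_uuid: str, kernel_flags: str = "") -> bytes:
    """Boot plist pointing the booter at the real root volume (key names assumed)."""
    uuid.UUID(root_uuid)  # raises ValueError if this isn't a well-formed UUID
    return plistlib.dumps({"Kernel Flags": kernel_flags, "Root UUID": root_uuid.upper()})


def root_uuid_from(plist_bytes: bytes):
    """Read back the root volume UUID the booter would hand to the kernel."""
    return plistlib.loads(plist_bytes).get("Root UUID")
```

Validating the UUID when writing the plist matters because, as the talk explains, a wrong value here leaves the kernel waiting for a root device that never appears.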

So where do we use this? Apple RAID. Most of you are probably familiar with having multiple disks acting as one. Here's the Apple RAID partition layout. We've got the Apple_RAID partitions, which the Apple RAID kext matches on. It absorbs all of those and then publishes a single volume, the RAID boot volume: here you go, system.

But that's completely at the kernel level. The firmware has no idea about the RAID boot volume; in this case it only knows about these little Apple_Boot partitions. The firmware is willing to boot from basically two kinds of partitions: Apple_Boot and Apple_HFS.

If you hold down the Option key on a system like this, you'll actually see four volumes, labeled RAID boot one through RAID boot four, representing each of the Apple_Boot partitions. From the firmware's perspective there really are four bootable volumes: it sees four partitions that it knows how to boot. So what do we put in them?
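A toy Python model of the two views of the four-disk RAID just described: the firmware lists only Apple_Boot and Apple_HFS partitions, while the RAID kext consumes the Apple_RAID members. The disk layout and partition names are hypothetical, and a real firmware considers more than the partition type.

```python
FIRMWARE_BOOTABLE = {"Apple_Boot", "Apple_HFS"}


def firmware_view(partitions):
    """Partitions the firmware would offer in the Option-key boot picker (modeled)."""
    return [name for name, ptype in partitions if ptype in FIRMWARE_BOOTABLE]


def kernel_raid_members(partitions):
    """Partitions the Apple RAID kext absorbs to publish one virtual volume (modeled)."""
    return [name for name, ptype in partitions if ptype == "Apple_RAID"]


# Hypothetical four-disk RAID: each disk carries a RAID member plus an Apple_Boot helper.
layout = []
for i in range(4):
    layout.append((f"disk{i}s2", "Apple_RAID"))
    layout.append((f"RAID boot {i + 1}", "Apple_Boot"))
```

The same partition map yields four entries for the firmware and one RAID volume for the kernel.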

We put in the booter; we put in the kernel so the booter can get at it easily; we put in the mkext, which we talked about as holding all the drivers necessary to boot; and we put in the boot plist, which has the UUID in it. Finally we put in the disk label, so if you have a custom icon for your RAID boot volume, you'll see four custom icons in EFI.

So how does this work? We have to rebuild the caches automatically. kextd is now watching /System/Library/Extensions with a kqueue. Any time you touch /System/Library/Extensions, which is what you're supposed to do when you install a driver, it says aha, there's something new, and fires up kextcache to rebuild your mkext and update anything else that needs updating, and kextcache updates all those Apple_Boot partitions.

So when you change a system component, which would normally be a kext, though you can also touch mach_kernel ("oh, mach_kernel changed, better copy it down to the Apple_Boots"), kextd notices, fires up kextcache, and copies it down.
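The "touch" doorbell and the rebuild decision can be sketched in Python with file modification times. Comparing the newest mtime under the extensions directory with the cache's mtime is an assumption made for this sketch, not a description of kextd's exact rule, and the function names are hypothetical.

```python
import os


def touch(path: str) -> None:
    """Equivalent of `touch`: bump the modification time to now (the 'doorbell')."""
    os.utime(path, None)


def newest_mtime(root: str) -> float:
    """Newest modification time of root or anything beneath it."""
    newest = os.stat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            newest = max(newest, os.stat(os.path.join(dirpath, name)).st_mtime)
    return newest


def cache_is_stale(extensions_dir: str, cache_path: str) -> bool:
    """True if the cache is missing or older than anything under extensions_dir."""
    if not os.path.exists(cache_path):
        return True
    return newest_mtime(extensions_dir) > os.stat(cache_path).st_mtime
```

An installer would call `touch("/System/Library/Extensions")` (with root privileges) after copying in a kext; a watcher that sees the cache go stale would then run the rebuild.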

This eliminates the need to boot from individual kexts, because we're always updating the mkext: about five seconds after you touch /System/Library/Extensions, we build a new one. So we're basically always booting from the mkext. And while we're at it: some of you may know that if you install a brand new kext, you need to send a HUP signal to kextd, to say hey, there's a new kext here with some new personalities, so now you should know about my device. Some installer scripts send SIGHUP to kextd. That still works, but on Leopard and later, if you touch /System/Library/Extensions, kextd takes care of that for you.

Well, it should take care of that for you, but there may be some cases where it doesn't, so keep sending SIGHUP.

( laughter )

Talk to me afterwards; it just occurred to me. So it's important to think of the Apple_Boot partition as just a copy of what's in the root partition. You always install to the root partition, and if any system bits need to go into the helper partition, we take care of that.

And if the helper partitions get destroyed and you can't boot, you can boot from your DVD and repair the disk, and we'll repopulate those Apple_Boot partitions. There's never anything unique in there that we can't rebuild from the main root volume. Boot≠Root is enabled for Apple RAID and other devices with multiple data partitions through a property on your IOMedia object (IOMedia is just a type of object in the I/O Kit registry) that describes the data partitions being used. From those data partitions we derive the Apple_Boot partitions that need updating.

So there are a few new behaviors. There's a new kextcache flag, -u, which updates anything and everything on a volume: you say kextcache -u, give it a volume, and it will update your mkext. You may not have noticed, but there are a lot of flags you now have to pass to kextcache to build an mkext, such as building for all the architectures. kextcache -u takes care of all that for you.
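If you script cache updates, a thin Python wrapper around `kextcache -u` might look like this. It assumes Mac OS X 10.5 or later and root privileges when actually run; the function name and the `dry_run` default are choices made for this sketch.

```python
import subprocess
import sys


def update_volume_caches(volume: str = "/", dry_run: bool = True) -> list:
    """Build, and optionally run, `kextcache -u <volume>` to refresh that volume's caches."""
    argv = ["kextcache", "-u", volume]
    if not dry_run:
        if sys.platform != "darwin":
            raise RuntimeError("kextcache is only available on Mac OS X")
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv
```

Letting `-u` pick the architectures and outputs avoids hard-coding the longer kextcache command lines mentioned above.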

kextd also notices a safe boot and touches /System/Library/Extensions, which makes safe boot a somewhat stronger way to reinitialize all your caches: not just on that boot, but on subsequent boots you'll be using fresh caches. On Boot≠Root volumes we watch /System/Library/Extensions and automatically rebuild the mkext for you. And since you can put anything you want in the mkext, and we now always boot from it via those Apple_Boot partitions, the kernel actually verifies OSBundleRequired.

So if your OSBundleRequired key isn't right, or isn't there, the kernel will ignore the kext even if it's in the mkext. We're also dependent on volume IDs. Volume IDs are how we identify volumes, and they're supposed to be unique: every volume should have a different volume ID.
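A Python sketch of the kind of check described here, using `plistlib` to read a kext's Info.plist. The accepted values listed are the documented OSBundleRequired values; which value a particular boot actually requires (local root, network root, safe boot) isn't modeled, and the function name is hypothetical.

```python
import plistlib

# Documented OSBundleRequired values; a kext needs one of these to be considered at boot.
BOOT_VALUES = {"Root", "Local-Root", "Network-Root", "Safe Boot", "Console"}


def boot_eligible(info_plist: bytes) -> bool:
    """True if a kext's Info.plist declares a recognized OSBundleRequired value."""
    info = plistlib.loads(info_plist)
    return info.get("OSBundleRequired") in BOOT_VALUES
```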

The volume ID is stored by HFS, and the UUID you see in diskutil info is derived from it. If volume IDs are incorrect or inconsistent, you may not be able to root. So be careful with them: don't change them, and don't make disks with duplicate IDs. If you do need to change one for some reason, unmount and remount the volume right afterwards, because, remember, we store that little cookie, the boot UUID, in the boot plist.

If that value is stale, the kernel comes up and waits: where's that device? Where's that device? I don't see it. The device is right there, but you changed its UUID without letting us see the change, so we'll wait forever and never be able to root.
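A Python sketch of two sanity checks implied here: finding volumes that share a UUID, and detecting a boot plist UUID that no attached volume has. The function names, volume names, and UUIDs are hypothetical; in practice the UUIDs would come from `diskutil info`.

```python
from collections import Counter


def duplicate_volume_uuids(volumes: dict) -> set:
    """volumes maps volume name -> UUID string; return UUIDs shared by more than one volume."""
    counts = Counter(u.upper() for u in volumes.values())
    return {u for u, n in counts.items() if n > 1}


def root_hint_is_stale(hint_uuid: str, volumes: dict) -> bool:
    """True if the UUID stored in the boot plist matches no attached volume,
    the situation in which the kernel keeps waiting for its root device."""
    return hint_uuid.upper() not in {u.upper() for u in volumes.values()}
```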

So where is this used? We shipped it first on the Mac Pro, where kextcache -u is used for Apple RAID. In Leopard it's used for Apple RAID on both PowerPC and Intel, and it's also used for case-sensitive HFS.

You may or may not know that the HFS implementation in Open Firmware doesn't understand case-sensitive volumes; it predates them significantly. So we create a little Apple_Boot helper partition for every case-sensitive volume and do the full Boot≠Root routine: copy the booter, the kernel, and the boot plist into that Apple_Boot partition.

If you run diskutil list, you'll see the Apple_Boot partitions on your PowerPC case-sensitive volumes or your Apple RAID setups. This is also part of content filter driver solutions going forward. If you have a device where you're basically just descrambling its contents, or using a different partitioning scheme, something like that, Boot≠Root is how third parties will be able to get their kexts loaded. You don't need to run at firmware time, and you don't have to worry about firmware drivers for certain classes of content scrambling.

Since we can load the kernel and the kexts, you match on your partition, descramble it, and present the descrambled version to the OS. So this eliminates one class of reasons to write EFI drivers. At this point, however, we can't root from file systems that aren't built into mach_kernel.

That's mainly an interface issue and a code-calling issue, but there are a couple of reasons we can't root from them: if you've got a file system kext, sure, the kext is in memory, but we can't necessarily get it connected as the root volume, so that's not supported.

So let's do a quick overview of the tools. BootCacheControl, which I mentioned, is the tool that communicates with the BootCache driver in the kernel to optimize reading all the blocks you need to boot. bless is how you set your device pointers: I want to boot from that, or I want to boot from this. It can also change the volume's pointers to the blessed file or the blessed folder.

kextd watches over everything and responds to requests from the kernel for kexts. It watches /System/Library/Extensions and rebuilds the mkext and the prelinked kernel, and kextd's worker bee is kextcache: when kextd notices something needs to be rebuilt, it fires off kextcache to do the work.

And finally we have the nvram tool, which, if you really know what you're doing and really want to muck with things, lets you set NVRAM variables directly. For example, you might want to boot verbosely all the time: nvram is a great way to add -v to the boot-args variable, and then you can see all the Unixy goodness pouring forth from the Mac OS X kernel and drivers.

We've talked about lots of files and how things work, so I just want to remind everybody: don't change the booter, and don't change the kernel. We want the system to be increasingly able to heal itself and to verify that its files are correct, and software updates come along and replace these things. So play around as much as you want, but don't make any products rely on changing system files. And don't get any ideas about those Apple_Boot partitions either, because we essentially erase them every time we update them.

But there are some things you can do. You can use the debug knobs, you can create on-demand launchd jobs, and you can write kexts if you have drivers that need to be written, although there are nice user-land driver models for several things, including USB. You can write EFI applications if you want to; if you want that DOS feeling, you can go get the EFI shell and run that. And if you want to use Boot≠Root for exotic device rooting, or write EFI ROM drivers, please talk to us.

It's possible, but there are many things you need to know, and we'll be happy to talk to the several of you who are interested. At least one person has come up, so I'm excited, sort of.

( laughter )

There's at least one, one third party has adopted it, so it is possible to do.

Okay. You should now know how to debug like a pro, because you know the difference between the Apple and the gear. Everybody remembers the difference: Apple, gear. Apple, booter; gear, kernel. Repeat after me.

( laughter )

Booter, kernel. Booter, kernel. Okay, good. Don't muck with those file system IDs: we use them for Boot≠Root, and we also use them for Spotlight.

It's sometimes tempting to say, well, I think I know what's going on. But file system IDs should be unique, and ideally they shouldn't change. If you install something new, please touch /System/Library/Extensions. We like to think of that as our doorbell: it's the doorbell to kextd saying ding dong, something new, go check things out.

So do that. If you're building caches somewhere and you're adopting Leopard, feel free to use kextcache -u to build those caches without having to figure out what all the command lines should be. And finally, use bless. bless is really your friend. It's the back end of Startup Disk, and of several other pieces of the system that need to set these NVRAM variables to make sure we boot from this volume or that volume.

If you need more information, there's our I/O technology evangelist and our desktop hardware evangelist. We also, amazingly, have some documentation on this: System Startup Programming Topics, "The Boot Process." And the websites we mentioned earlier: TianoCore is Intel's open source implementation of EFI, a great place to get that DOS feeling from your shell.

And the UEFI Forum is the standards body driving EFI adoption forward. As for mailing lists, there's a new one, boot-dev@lists.apple.com, and you should feel free to join it. There are also great man pages installed on your systems for bless, kextcache, and nvram.