Mac OS • 59:18
The Mach kernel is the heart of Mac OS X. In this session we shed light on the kernel basics by discussing the services it provides, such as memory management and interprocess communications.
Speaker: Jim Magee
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
I want to thank you all for coming this early in the morning to talk about something so wildly exciting as the kernel. Okay, what are we going to talk about in this session? Well, mostly we're going to talk about the Mach kernel, what the basics of the Mach kernel are, what the basic abstractions are, when you want to program to those, when you don't, but most importantly, how the Mach kernel and its environment affects the rest of your development.
For the majority of you, you may be bringing over a Carbon application, and for a lot of those applications, you fix a couple of things that Carbon Dater tells you are wrong with your application, and you're done. But you're not, because things don't behave exactly the same on X as they do on 9, and a lot of those differences are based on what happens in the Mach kernel. And so we want to give you a little background about what's happening under the covers so you can understand your app.
Okay, you're probably kind of tired of seeing this picture, but it's basically the layout of all applications in Mac OS X. Your application is going to fall into one of the basic categories of a Classic app, a Carbon app, or a Cocoa app. At the bottom of that is the Darwin layer. And inside the Darwin layer is where the Mach kernel resides. The Mach kernel is the foundation of the OS. It provides the basic abstractions for getting the system running: things like your tasks and your threads, the CPU and memory abstractions of the system.
Mach is intended to provide a rich set of semantics for those kinds of services. One of the things you'll find is that Mach was intended to let you develop lots of different operating systems on top. It was originally designed as a foundation for Unix operating systems, but then other operating systems came along; people provided DOS boxes running on top of Mach, provided specialized operating systems or embedded environments on top of Mach, and over the years the Mach semantics have become richer and richer and a little bit more formal each time. And that ends up giving you a set of semantics that lets us do some things in Mac OS X that you wouldn't otherwise see. Mach, you know, intends to be policy neutral. It leaves the policy decisions to higher layers of software.
In our situation, a lot of those policy decisions end up being implemented in the BSD layer. Some bubble up into higher layers, so Classic has some of its own policy decisions. They use Mach services, but the Mach services try to be policy neutral. They give you a set of mechanisms, but not the policy.
So that's what Mach is. What Mach isn't is an operating system. Alright. Mach is just the foundation for building operating systems on top. Mach does not provide I/O. We have I/O Kit for that, and it provides that set of abstractions. Mach does not provide any networking. Although people have taken Mach and networked some of the Mach services, Mach itself does not provide any network services. Mach doesn't do file systems, as you saw yesterday if you were in the file system session. The file systems are implemented inside the BSD layer.
Mach doesn't do any security policies either. Again, Mach tries to provide mechanisms. It provides lots and lots of security mechanisms and some really fundamental ones. But it makes no decisions about how they get applied. In our situation, a lot of those policy decisions again fall to the BSD layer.
So you're going to see that we use the BSD style of user ID and permissions. BSD then restricts your access to certain Mach resources based on those attributes. But the Mach services really are all based on whether you have access to those services, and don't care what your user ID is or what permissions you have or who you logged in as. We basically have a set of services, a set of access points, and if you're granted access to those services by the policy maker, the decision maker, in this case BSD, then you get to have access.
Otherwise not. Mach doesn't really provide application APIs either. While you may dip down sometimes into the Mach APIs in some of your applications, you're not going to be programming to Mach itself. Mach is intended to be the layer that the operating system pieces converse with. All right, and all of those abstractions that we just talked about are actually provided in higher levels of Mac OS X, or Darwin.
Okay, so why are we doing this Mach thing? It's the number one question people ask when, in the Darwin mailing lists or otherwise. Why Mach? Well, we do have a complex system in Mac OS X. We have lots of environments to program to. We have the classic environment, the Carbon environment, the Cocoa environment.
We have BSD, we have I/O Kit, we have Java. Others are coming that people may be working on in here. People are doing DOS emulators or doing other environments for this system. Each of these has a set of semantics and policies that they implemented and they control, and they cannot change. It's legacy.
Mach provides a common set of abstractions that sits below all of those and allows you to implement all of them. It's been tuned over the years to give a fairly good match to each of those environments. There are some environments which people have tried to put on top of Mach and haven't had such a good match.
But for all of the ones like the traditional operating systems and the new Java VM, there's been a lot of work to tune those layers between Mach and those environments to provide a good match, a good impedance match between them. But Mach does more than that. It makes sure that those services are consistent with each other, so that a Carbon application using memory in a certain way doesn't destroy a Classic application doing the same thing, or a BSD application doing the same thing.
Another reason for Mach: it's nothing new to the research world. It's new to the Mac OS environment, but it's been around for 15 years now. Lots and lots of research. Obviously the original research done at Carnegie Mellon was intended to prove that you could build an operating system by defining a layer underneath that provided the basic services and layering the operating system on top.
More work was done at the University of Utah, kind of jumping forward in time. They took the Mach work that had been done before them and tried to make it more modular and embedded and add a bunch of real-time services to it. And then a lot of research was done at the Open Group/Open Software Foundation, depending upon your time frame.
There were research institutes, both in Cambridge and in Grenoble, France. Lots and lots of real-time work done for government contracts and otherwise. And as some of you may know, OSF focused a lot of their research on cluster work, building up complex systems out of modular boxes. A single system image, if you will, across many, many boxes.
And there was a lot more research in Mach. Mach was one of the first publicly available source codes for something other than Unix. Most people had been using BSD in the research community forever. But if you've tried to do research on BSD, you realize that there are a lot of knots and a lot of things tied together in BSD that make it difficult to pick a piece out, do your research, and replace it with a different piece.
Mach is really good at allowing pieces of the system, higher level systems, to be picked apart and replaced. So a lot of that research occurred on Mach. And most of that research is available to us and to others. And we've been picking select pieces of it to include in Mac OS X.
And another obvious reason: for some of you that don't know, Dr. Avie Tevanian, our senior VP of software, was one of the original authors of Mach. And obviously he has a fondness for it. But that's not a misplaced fondness, in my opinion. Okay, so which Mach? If we're going to choose Mach because it allows us to build complex, multi-environment systems on a single box that maintain consistency and yet provide good performance to each of the environments, there are several Machs to choose from. Lots of research produces lots of Machs.
The original Mach was Mach 2.x; most of you probably know the 2.5 version of Mach. It's the one that originally got out from CMU and found its way around. It was the original work at CMU to prove that you could build a layered operating system, that there were common services underneath that a traditional operating system provides, and that you could formalize those. But Mach 2.5 was monolithic in nature. It was a layer underneath of BSD, inside of BSD, if you will. They picked apart some of the pieces, but built it as a single unit.
The area where that kind of showed through the most is the virtual memory system. They provided an abstraction for virtual memory inside of Mach, but the lower levels of that virtual memory system had hard-coded dependencies on what BSD did for file systems and the like. It also had hard-coded dependencies on a lot of the other parts of BSD, for signal handling and the like, that were kind of intertwined with Mach. And for I/O, it had direct dependencies on the BSD I/O style.
That kernel was also not really available in a form useful for SMP. And when they did the work, their main goal was to see if you could take a Unix operating system, with its current constraints, and implement it in a layered fashion. They did not try to extend it in any real-time fashion.
Next on the horizon was Mach 3.0. And again, that was started at CMU to say, "Can we take this a little bit further? Can we formalize the rest of the interfaces so that we have no dependence between the Mach portion of the kernel and the rest of the kernel?" So they formalized all those interfaces which had been hard-coded dependencies in 2.5, and they ended up with a more modular architecture as a result. And there was a lot of work that occurred once this basic work happened. There was a lot of research that came along in the areas of real time and in the areas of MP.
And most of that work got folded back in. Some of it didn't. There was a bunch of work at CMU on real time, if any of you are familiar with Dr. Tokuda's work on real time. A lot of that did not get directly into the standard Mach 3.0 base, but found its way in over time through various other projects. And Mach 3.0 was where MP became a very formal part of the system. And of course that's quite interesting to us.
Okay, Mac OS X Server, the server product, and the Darwin 0.x variants of the operating system were based on that original Mach 2.5 work. It included, if any of you have looked at the Darwin source, a lot of the features, the niceties at the user level, from the Mach 3.0 work.
A lot of the Mach 3.0 IPC interfaces and the like had been brought forward into that kernel, mostly by NeXT, but we picked a lot of that up at Apple, and that was in the server kernel. For Mac OS X, and you've got to excuse me, a lot of times I'll say Mach OS X, a bias.
[Transcript missing]
Basically, a lot of distribution was what they were going after. But there was a bunch of work going on at OSF also to fold back in some of the research that had been happening at other places. After CMU stopped working on Mach 3.0, a lot of the Mach work drifted over to the University of Utah. And they were working on their Mach 4.0 base. A lot of people think 4.0 may be more advanced than 3.0, but it's not.
It just had different goals and they chose a new name because 3.0 had been used so much. So a lot of that work from that environment had been brought back by OSF and integrated into their kernels, but not the ones that they made publicly available, the ones that were used internally and given to their partners.
Well, when we started our work, we had that code brought back and merged with the work we had done with our MkLinux Mach 3.0 kernel, and that became our base. So in essence, we actually pulled a lot of that private code that had been done at OSF and only available to OSF partners, and through our licensing of that, have made it available in the public source again, where it had not been available before.
And besides the work that we had done on MkLinux at Apple, we have done lots and lots of things to it since then, and you'll see some of that work in the further slides. Okay, so, a lot of people say, but that Mach 3.0 thing, it's inefficient.
I've heard it. Everyone tells me it's inefficient. The general goal of Mach 3.0 was, in order to prove that the Mach kernel could provide a rich set of abstractions, to prove that you could run the Mach kernel separate from the other environments in the system. So BSD ran as a user-level task in that environment.
[Transcript missing]
Okay, so how do you guys access all these powerful abstractions that allow all of these funny things, all these wonderful things to happen in Mac OS X? Well, for the most part, you don't. Most of the Mach abstractions are used directly by lower layers of software than the application. So for the most part you should be programming to Carbon, to Classic, to Cocoa, and sticking at those layers, even BSD or Java, and using those.
If you need to get a little bit lower in the system, a little more detail and a little more control, you're typically going to drop one layer down from that and maybe you're going to access some core foundation things directly, some bundle stuff, or access some of the CF run loop stuff in the system.
Right. But you're going to possibly stop at the CF level as well, the core foundation level. Or you're going to use I/O Kit to do things. Or you're going to use the BSD file system extensions, or you're going to use the BSD networking extensions. But typically you don't end up having to program to Mach directly.
Okay. If you do program to Mach directly, one thing you've got to realize is it's not portable. If you have a Carbon application that tries to reach under the covers and access Mach, obviously that Carbon application is not going to go directly back to 9. There are tricks you can play with Gestalt and trying to load extensions and load bundles to do some of that work, and if you really need to, you can do it, but again, it won't be portable and you've got to go out of your way.
Some of these interfaces are not final at the Mach level. As we build up the Mac OS X environment with all the stacks coming up now to their, pretty much their final form, right, we're taking a look at the abstractions that Mach provides and we're trying to find more efficient paths and better matches to the semantics of all those environments.
So we're tweaking them a bit as we go along. You're gonna see some of that work happen between now and when the product's final. And so some of these interfaces are gonna change on you. But again, the majority of them are correct and final, or reasonably final. There aren't gonna be that many changes.
So then why should I care about Mach? Right. You care about Mach because you need to understand how your application works. You need to understand how the frameworks are built, mostly so that when you go to debug your application and you see that all of your threads are blocked in a certain location: oh, what does that mean? Why would that happen? And you need to understand how your application and those frameworks might interact. And a lot of the things that you will see end up being Mach things.
So what are these wonderful abstractions that let us build all these different environments? Tasks and threads are fairly common, fairly well understood abstractions. In Mach, a task owns all the resources for an application, so all resources are task-wide and available to every thread inside the task. Then you have threads. Threads are the unit of execution, the things that the operating system schedules. And virtual memory provides protection for loads and stores to your application.
Then once we build up all of those environments, that gives you all the isolation that Mac OS X is supposed to give you to a Carbon application, let's say. You've now got a nice isolated application. It's got its own private virtual address space. It's got its own set of resources. They're protected into the task itself. And you've got another Carbon application on the other side.
Now, for the first time, you've got to have formal ways of communicating between them. Right, you used to be able to cheat and poke this location in memory and check that when you're coming around your next loop in the Carbon application if you need two to communicate. Right, but now they're isolated. So you need some formal way for them to communicate. And so there's a formal task-to-task communication mechanism in Mach.
Okay, so a Mach task, as I said before, owns all resources. It sets up an environment for threads to run. All your threads have equal access to all of the task resources. The core resources are virtual memory and a port namespace, and we'll get into what the port namespace is in a little bit. Virtual memory, I think most of you understand: it's a protected address space for each application, each task.
But there are other resources associated with a task. We won't get into them too much, but there's task-wide exception handlers, and a bootstrap server reference for a task. That's basically how a task understands what kind of task it is. When Mach gets a task created, all Mach tasks kind of look alike.
So how do you distinguish a Java task from a Carbon task? Well, a lot of that happens through the bootstrap server. The very first thing a task does when it executes is go converse with its bootstrap server, and that gives it some of its original, primordial references to things. And then it bootstraps itself up from that and all of a sudden becomes one of the environments or the other.
So when you look in Mac OS X, you're gonna have your application; maybe it's a Carbon application. Well, each Carbon application is actually a BSD application. It's a BSD process. And a BSD process has two basic sets of resources with it: the Mach task part and file descriptors. There are other BSD resources, shared memory ID space and the like, but most of the things in BSD processes are file descriptors or tasks.
You end up with that environment. But inside a Mach task you have multiple threads executing in a single task, sharing a virtual memory space and a port namespace. The thread is the unit of execution in Mach. Threads are preemptively scheduled against each other. This is how Mac OS X gives you preemptive scheduling.
Mach does all of the scheduling work for those threads. Carbon and Classic don't have to do their own scheduling work. Again, they have equal access to the system resources. An important thing to note, at the Mach level, a thread is nothing more than a register state and some scheduling attributes.
There's no such thing as a stack to a thread at the Mach level. We don't care whether there's a stack there or not. You give us a register state to start the thing, we start it running, we'll preemptively schedule it, take it off the processor, restart it at that state. Something else is responsible for doing those other layers. Right. So this is what a thread ends up looking like.
Each application in Mach is going to have threads, multiple threads. In Mac OS X, each of those threads actually is a POSIX thread. And the POSIX thread is the part that's responsible for, and does the work of, allocating the stacks and setting up the things that you would expect from a higher-level concept of a thread.
And even inside of that, in a Carbon application when you have cooperatively scheduled threads in Carbon, those are Pthreads, which are Mach threads, and Carbon does some work to make sure that Mach schedules them in a way that they look like they're cooperative rather than preemptive.
One of the things you also have to be careful about when you're developing your applications: you may create a simple Carbon application today, a single-threaded Carbon application, and you go and look in the debugger, and lo and behold there are two or three threads in there. Where'd those other threads come from? Well, they're actually created by the frameworks.
The frameworks sometimes have to create threads to bridge the gap between the semantics of the services below them and what the Carbon application may expect up above. So you're going to end up with some threads that you're not aware of, and you're going to have to worry about those when you're doing your debugging, so be aware they come. A lot of the work we're doing for semantic matching is trying to eliminate some of those worker threads.
And all of our tools are thread aware. So you're going to see that our debuggers will show you dumps and tracebacks of all the threads. Our sampler tool will show you what your thread is doing, but it'll also at the same time show you what the worker threads that are provided by some of the frameworks are doing.
And so you end up with a picture like this: you have Mach threads. Mach threads are wrapped at user level by Pthreads to provide stacks and the like. Inside of a Carbon application you may have those Thread Manager threads like I talked about that are cooperatively scheduled against each other through the assistance of Carbon.
But you then have MP threads which are free to run and execute, and Mach is the only one who schedules those. So obviously if you're worried about getting true MP performance and true parallelism, the Carbon guys are trying to push people a little bit towards the MP threads. If you're doing a Cocoa application, all of the Cocoa threads are POSIX threads. There's no cooperatively scheduled notion in those, so you don't have to worry about the cooperatively scheduled part.
What's new for threads in Mac OS X versus Server, and Darwin 1.0 versus previous Darwins? Well, Pthreads replaces Cthreads. Cthreads was something that was explicitly for Mach from way back when. Pthreads are a more standard version of a lot of the same services, so we've switched to Pthreads. We also have scheduling frameworks in the Darwin 1.0 and Mach 3.0 code that we've got.
Scheduling frameworks allow for different scheduling policies for threads, rather than just the standard timeshare scheduling policy that's trying to gain fair access to everything. There's also some fixed priority real-time scheduling policies in there, and you can select those from user state. And one of the big things, obviously on the PowerPC processor, is G4 support for Velocity Engine is now in.
What's coming soon for threads? Additional real-time work, additional policies, isochronous policies that some people are looking for are being investigated, no promises, but we're looking into those. There's been a lot of research in Mach, again, with being able to leverage work that other people have done. Real-time priority inversion avoidance.
A lot of the work that came from the University of Utah was about trying to avoid those situations for real-time applications where you're dependent upon a resource and that resource is currently consumed by some other thread in the system, one that you don't really care about, but you've got to wait for it.
And that thread may be so low a priority that it's not actually making any progress toward getting rid of its hold on that resource. So the system-wide approach of promoting those threads in order, so that they can let go of their resources and then let the high priority thread continue, is something that's being extended. It's already in the code base that you're looking at now in Darwin 1.0, but it's being extended and deployed system-wide. Kernel preemption is something that was built into the Mach 3.0 code that we've got. It's currently disabled, but it's there and enableable, and it will be turned on in the near future.
Another service in Mach is virtual memory. Again, it provides a protected address space for each application. It provides a flexible way to construct those address spaces. And it allows for controlled sharing. Don't go hog wild jumping down there and doing wild sharing just because Mach lets you do sharing; we do sharing in a controlled way in Mac OS X.
One of the key features of virtual memory in Mach is copy-on-write behavior. If you look at each Carbon application, you can almost consider a Carbon application as a unique instance of the Mac OS X environment. A lot of the frameworks that used to get loaded once in Mac OS 9 get loaded into each application as it starts.
If each application had to have a unique copy of those frameworks loaded into it, it would take forever to load, and huge amounts of memory would be required. So Mach provides a way to flexibly share those spaces and to flexibly provide copy-on-write semantics for them. So that unless you modify a page, you get to share a common page. But as soon as you modify it, only that one page gets to be a unique copy for you. And that's done lazily, as you touch it, rather than up front.
So when you look at the virtual memory system, each task ends up having its own virtual memory space. And inside that virtual memory space are little regions of mapped memory objects. Well, what do you map as a memory object? For most of you, you make calls to frameworks and they do this magically for you, but let's take an example where you map a file descriptor with a BSD mmap call.
That file gets converted to a memory object, an abstract memory object, inside the kernel. Won't go too much into that, but there's a general concept called an abstract memory object. When you map that into an address space, the kernel creates an object called a virtual memory object. It's the manager of the cache of that data associated with that file.
So we don't bring the whole file in, we don't even know what the file is, but we know that there's some object that we have to cache data for. And so we create a virtual memory object for it. And as you touch pages or access locations in your address space, we bring in some of those pages and cache them, and then make them available to your application.
So here's an example of that. Now I've come along and touched a location in that second page. When I do it, that causes a fault in the operating system, right, at the lowest level. That gets translated: we look up the address space, we look up the mapping, the region, and figure out which object, which range of that object, and with which permissions you have access, and we go and check the virtual memory object.
If it has that page, we just map it into your address space and everything's fine. There's only one version of that data in the cache. If you were in the file system session yesterday, you heard some of the ways that we make sure that not only are we maintaining consistency at the VM level, but we also maintain consistency at the file level.
But sharing isn't the only thing we do. As I said before, you've got a situation where you don't want to have to bring in whole copies of things that you want unique copies of, like the data section for each file or framework that you have loaded. We don't want to bring that all in at process initialization or task initialization time. So we want to do copy on write.
We want to access the original object as much as possible and only bring in unique copies of pages for those that we modify. So here's how that works. When you map something copy on write, rather than having a direct reference to the object, there's a new object that gets created and stuck in the middle. And when you take a write fault on one of the pages, we go to that virtual copy object.
Instead of going directly to its abstract memory manager, it first checks the thing that it's a virtual copy of and says, "Do you have that page?" In this case, it's really that second page that we just talked about, and of course it does. So it sends us a copy of that page back, and we cache only that one copied page, and we make that available. So as you run a bunch of the tools in the system, you're going to see things like private pages, alias pages, virtual pages, right? The private pages are those things of which we have pulled our unique copies.
At the lower level below that abstract memory manager, what happens there? Well, as I said before, typically each one of those things is a file that got mapped, and so those go through the VNode pager and they access the BSD file system directly for each file type. But there's also those other kinds of objects which aren't files.
Those things, when you allocate memory or you just did one of those copy operations, we end up creating some magical data for you: a new object that you didn't know about, that nobody really has an explicit handle to. All of those objects are actually managed by a pager in the system called the default pager.
And what it does is take all of those objects, which are very sparse objects; they may have one page touched at the beginning of the object, and another page touched way down inside the object, you know, at offset ten thousand. We don't want to create unique files for each one of those objects you've created. There are thousands of these in a running system.
So the default pager creates a set of swap files, maps the data from those sparse objects down into a swap file, and writes that through the file system to disk, or across the network, or wherever your file systems happen to be. And there's a daemon in the system that gets notifications from the default pager.
If one swap file is filling up, we send a notification up to that daemon, and it allocates and creates another swap file, and we start swapping into that. One really neat feature that's new to Mac OS X, for those who are used to Mac OS X Server, is that now swap files can go away.
And they do, automatically. So if you burst up in your system and access huge amounts of written data, anonymous data, and then all of a sudden deallocate it all, then as the data gets freed, space gets freed up in the swap files, and then the swap files are free to coalesce.
And you can actually take a swap file offline. So if you want to put a swap file on one volume, you can; you could even put it on removable media. Start swapping to it, then when you want to unmount that media, you can say, okay, unmount that swap file, and the default pager moves all the data from that swap file into the other swap files, maybe making space available somewhere else for that data to move, and then frees up that volume. So that's a neat, unique feature that we've got in the current kernel.
Okay, so how does all this funny stuff show up for you? Well, what you're going to see when you look at your applications is this long range of memory regions in each application's virtual memory space. You're going to see a little bit that got created here for mapping this framework and a little bit that was here for that framework. All right, they're going to be sparse. They're typically going to be copy on write for things like frameworks, and most of them you never touch; even all your text is mapped copy on write.
Well, I don't write text. Why should I map text copy-on-write? Well, debuggers do. When you run a debugger on your application, you would hate it if sticking a breakpoint at a particular location in a framework in one application caused every application to take that breakpoint. All right, so by mapping everything copy-on-write, you have the ability to come along and modify just one application.
You're going to see lots of tools. Hopefully some of you will be able to go to the tools session later today or the debugging session first thing tomorrow morning. You're going to see a lot of tools around managing your virtual memory space of your application. malloc-debug is probably going to be the one that you're going to go to first. Well, maybe second.
You're probably going to go to top first. Top is going to tell you how much memory you've got, how many regions you've got in your application. And if you see that starting to leak, you're going to turn around and go to malloc-debug. And malloc-debug is going to help you walk through each of these areas in your application.
There's another neat program that's not--it's a demo program, a helper program right now, it's not finalized. But I really like it, not necessarily as a debugging tool, but as a tool to understand how an address space or an application is put together, and that's the pagewatch.app. PageWatch is in debug developer tools, I think, or debug developer applications.
Sorry, local developer applications on your CD. And when you bring it up, it gives you a visual image of what your address space looks like, a little square for each page in the system in your application. And it tells you whether that page is present or not. And you can set it to scan at certain intervals, right? And you can see pages faulting in and out. You can also click on another view.
Now, that's the VM summary view. There's another view that lets you see each of the individual things that got mapped into your application, each library, and which pages are being used and how many pages are being used out of each one of those. It's really neat just to get a sense of how an application's put together.
Okay, what's new in Darwin 1.0? We've got the external memory manager interface. That was something that wasn't in Mac OS X Server. Now, we don't currently have any need for user-level pagers, but this is an exportable interface from the kernel so we can call out to a pager in user space. And a lot of people will go, oh, really neat, I can go ahead and build myself my own private pager, and I can do really neat things with that.
Well, in most cases you don't want to do that. In most cases what you would rather do is build a file system layer. Right? Because you don't want something that just behaves that way. Let's say you wanted to do an encryption pager. Oh, I could do encryption on my data. Well, but you want reads and writes to see that data encrypted or decrypted appropriately as well.
Right? If you do it as a file system extension, the VNode pager I showed you earlier allows you to go ahead and access those things for free. It becomes a pager for free. Right? If you do it just as a pager, well, you've got no way to access it the other way. So there's very limited use for pagers through that formal interface. But it gives a nice separation between the two. And you can access those things even inside the kernel.
Right? Another thing that's in Darwin 1.0: rather than just having an address space be a linear list of memory regions, you may have a bunch of things in common between a bunch of applications. And so rather than mapping each one into each application, you may want to just create one memory address space that has all of those files mapped into it.
And then rather than mapping, again, each individual file into each address space, you can take that one address space and recursively map it into each of the other address spaces. So as data appears in that single address space, as things get mapped in and out, they magically appear in the other address spaces as well.
It's a really neat feature. Most users won't have direct need for it, but coming soon, if you look at the bottom one, the shared memory part, the shared region part, we're going to use that to deploy the system frameworks throughout the system. They're all going to be mapped into this common region, and then that common region is mapped into each address space. It provides a level of efficiency that we don't have today.
Other things coming in virtual memory: if you were in the file system session yesterday, they talked about how the file system part of the system is already enabled for 64-bit files, but the VM subsystem keeps them from being able to be mapped. So one of the first things we're doing is adding 64-bit object size support.
And if people are familiar with the PowerPC roadmap, there are 64-bit PowerPC processors coming right down the pike, and we're looking at ways to add 64-bit address support. Now, this is a little bit different than traditional 64-bit systems, because in the PowerPC roadmap you can mix 64-bit applications and 32-bit applications on a single running system. So we need to be able to flexibly support both, and that kind of work is ongoing.
And there's also some changes in that EMMI layer to add some efficiencies there. Okay, so we now have a task, and it's isolated and it has one or more threads in it, and it has its virtual space, and we may have two of those tasks. Alright, but how do they talk to each other? Now that they're isolated, we've got to let them talk.
And the way they talk is through IPC. IPC in Mach has three basic abstractions. It has ports, which are the endpoints of communication between tasks. They're highly restricted: you cannot send a message to a port unless somebody hands you a right to send a message to that port.
And through that restriction we actually get the basics of how the security model in the system works. Right? You're only granted access to those things which the policy makers decide that you're allowed to have access to. And by not having access to the task port for some other task, some other process like a system process, you can't do things to that process.
A port set is a collection of ports. You can have both send and receive rights to ports, so you can implement something so that someone can send you a message. And a port set allows you to collect those receive rights into a single spot, so you can have a single thread wait and receive on any one of those ports. It's kind of like select sets or FD sets for file descriptors, except that they exist forever. The kernel maintains them, and you can add things to them, so you don't have to construct them on each call.
And where are all these ports maintained? There's a namespace for each task that maintains that task's port space. So here at the bottom you have this port space, right, and you may have a port which gives one task permission to send a message to another, right. And that task may own a port of its own, right, that it owns the receive right for.
And so it can construct a message that not only sends data, but can also send rights to ports to that other task. And it can also send memory, copy-on-write memory, to that other task. So all of these systems inside of Mach are kind of recursively defined. They each depend upon each other, and they each take advantage of each other's services.
VM uses IPC to implement itself in certain areas, and IPC uses VM to send data to other tasks. So when you send that message over to the other side and the other guy receives it, new virtual memory appears in that other task, and now he sees a virtual copy of what was sent by the other process. And you also received that port, so you can now send a message back the other way.
And through this mechanism, building up these little individual ports and sending rights and sending messages, all of the interfaces to the Mach part of the kernel use that mechanism to implement their services. So when you say task or thread create, the first parameter to that is the task reference. Well, that's really a port right that gives you the right to send a message to a task port.
Right. That sends a message to the kernel, and the kernel processes that request. If you didn't have a right to that task port, you couldn't create a thread in that task. But by having a right to send to that port, you have the right to do that operation. That goes into the kernel, he does the work, and as a result he sends you back a right to manipulate that thread.
That's how the kernel works, and it's how the window server works. So when you converse back and forth between the window server and applications, the window server gives you rights to access certain Core Graphics fundamental pieces of the system. If you don't have the right to access something, you can't. So you can't trounce somebody else's window.
As I said, by using the protection of Mach ports, you're going to see that each application ends up having a pile of ports. A typical application will have 50 or so ports. These are reference handles to objects implemented somewhere else, either in the kernel or inside the window server.
But they can also be things that you implement and reference, and give references to other people. Exception handlers are one of the common examples from the Mach perspective. When an exception happens in a task, we'll send a message to whoever the exception handler is for that task, whoever registered a port with us, and we'll send a message that says, "This exception happened." And that may be your own task, or that may be GDB.
You're going to see that each application, almost every time you look at your stacks or your tracebacks inside of Sampler App, you're going to see that they're all blocked waiting on IPC operations, right, at the very highest level. That's because almost everything in the system is, at its fundamental basis, an IPC operation.
So when you're waiting for a semaphore to go off, or you're in your event loop, right, for Carbon or whatever, at the very lowest level you're going to see that thread is in mach_msg_overwrite_trap. Today you're going to see that over and over and over again. And you've got to realize that, okay, that's because everything degenerates down to a message eventually.
And you're going to see those things through top, and you're going to see them through sc_usage. sc_usage is really nice for seeing what you're doing in the system, and that should be coming up in the tools session later. Okay, so how does this show up for you? Like I just said, each of your applications is going to have a set of ports for accessing other things. But each of your applications is actually also going to have a set of ports for delivering events to it.
Right? And at the lowest level they are a collection of ports. They're wrapped up in port sets so that a single thread in your application can receive them. Typically your application's going to be using something like a Carbon Event Queue. The Carbon Event Queue actually internally has a run loop associated with it. And that run loop has a port set and that thread, eventually when it's waiting on that event queue, you're going to see at the lowest level it's waiting for a message to come in on that port set.
All right, so all of these things that deliver events to you are typically created as ports. For those that aren't, there are certain things in CFRunLoop that aren't delivered that way. CFRunLoop actually creates a port, and possibly a thread sitting in that other box, to receive that event and then pass it on as a message, a simple message that says this event happened, to one of those other port sets, because that's where your application's thread is waiting, in that one spot.
And it just tells him to come over and look at the others. Right, and the same thing in the Carbon event loop model, or the Carbon event queue model. Things will happen that aren't delivered by ports, but because your thread is waiting for a message, a port is created to send the notification that that happened.
What happened in Darwin 1.0, or Mac OS X? Well, there were some IPC changes. One of them allowed that discussion we just talked about, where you may have a CFRunLoop that may have more than one mode or event queue. So you're in a certain mode, you're in a modal environment where you only have a few things you want to wait on.
Before, lots of activity had to happen in order to change the set of things that you were waiting on. We added a facility that lets you wait, lets you put those events into multiple port sets simultaneously and then you just have to switch your set that you're waiting on. That all happens under the covers, but you'll see that if you look at the code.
Coming soon: you saw that there were certain things that cannot be delivered efficiently through the port mechanism, and since that's where your application is waiting, we had to come up with some other mechanism to ping-pong them over there. Well, we can do that more efficiently, and so we're creating additional channel types, other than a message queue, so that we can deliver those events directly to the waiting thread rather than having to ping-pong them.
And there are new IPC APIs coming with those, and related MIG enhancements to come with them. Well, you saw earlier that I'm a part-time MIG pilot. What is MIG? MIG is a tool that allows you not to have to worry about the details of how this actually works. It's used for system services.
You typically won't have to deal with it, but a lot of the interfaces you may want to call may be defined with it. Graphics services use it. The low-level Mach services use it. And it's just an IDL that creates message formats for you and packages them up so that you can just use a remote procedure call model.
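For a flavor of what that IDL looks like, here's a hypothetical counter.defs; the subsystem name, ID, and routine are made up for illustration. Running it through mig generates client and server stubs that do all the Mach messaging for you, so the caller just sees an ordinary C function taking a port and returning a value.

```
subsystem counter 400;

#include <mach/std_types.defs>
#include <mach/mach_types.defs>

routine counter_get(
        server  : mach_port_t;
    out value   : int);
```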
Again, you'll typically use higher-level things. You'll use AppleScript or Apple Events in your Carbon application, or you may get down to a lower level and use CF messages. But that's typically where you'd stop. If you want to go below that and use Mach, well, we suggest you don't use Mach messaging directly, but try to stick with the interface generator.
There are other things that Mach provides. We won't really have time to get into those, but it provides all of the host resources as well, the system-wide resources. Security management, control, rebooting, all that stuff is a Mach function. And processor control and statistics. Coming soon to a processor near you is power management and thermal management. A lot of that work is pretty much ready to go.
Obviously, all of this stuff is not only the core of Mac OS X, it's also Darwin. Right? It's open source. This is in keeping with the Mach legacy. Mach has pretty much always been an open source thing. There was that brief period where OSF did a lot of research and didn't make it available to anyone except their paying partners. But again, we've taken that legacy, and now we're extending it by adding all of our changes to Mach into the open source environment. Obviously, when all else fails, having that source is a wonderful form of documentation.
What is going on? Why is my application not making any progress? Well, if it really comes down to that, you can dive down and look at the source, because you know that thing's blocked in, you know, mach_msg_overwrite_trap. Well, what am I waiting on? Right? And in really, really, really bad situations, because we have open source, you can actually be debugging the kernel at the same time you're debugging your application, so you can kind of watch it go through the kernel. That requires two machines, but again, if you're really stuck, it's an interesting way to look at things.
The roadmap. Most of the things we talked about, a lot of the BSD and file system stuff, have already been completed, but there are some interesting sessions left. There's the performance tools session later today. You definitely want to look at that, because when you're tuning your application, you're going to be looking at things that show all of these Mach pieces.
There's the feedback forum tomorrow. Actually, tomorrow morning at 9 o'clock is the Debugging Your Mac OS X Application session in the big hall. I didn't add this to that slide, but it's probably a really good session to go to, and they will dip into a minor amount of these kinds of things: how, as you're debugging, you're going to see the Mach pieces showing through. And then there's a session also later tomorrow on BSD and how the BSD environment is made available.
Where else can I find out about this stuff if I care? On your CD there's a Kernel Environments book, and you can look at that. It actually discusses a lot of these pieces in more detail. Obviously there's the Darwin open source; you can go get that. And the Mac OS X homepage talks quite a bit about what Mach provides. There's also a book, Programming Under Mach. It's getting a little bit dated, and you may have trouble finding it in some of your bookstores, but you can go to libraries and pick it up there. It gives you a good overall concept of how Mach is put together.
And who to contact? John Signa, if you have business questions. But you also have the Darwin mailing list to contact us on technical issues. Most of us are on the Darwin mailing list all the time, watching what's going on. And with that, I'd like to invite the rest of the team up and we can have some Q&A.