Leveraging BSD Services in Mac OS X - WWDC 2001

Mac OS • 1:14:36

Wrapped around the Mac OS X kernel is a custom version of BSD 4.4 that includes many of the POSIX APIs as well as abstractions for both networking and the Darwin file system. Darwin's BSD enables developers to take advantage of a large library of existing applications and tools. Learn how to exploit this power to bring BSD-based applications to Mac OS X.

Speakers: Brett Halle, Eric Peyton, Jordan Hubbard

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Thank you and welcome to session 139, Leveraging BSD Services in Mac OS X. Hope you all had a good lunch. We'll try and keep you awake this afternoon. As most of you know, wrapped around the Darwin kernel in Mac OS X is our own customized version of BSD. What that means for you as developers is that you now have a whole library of BSD tools and applications available to you. And you can also exploit the power of our BSD to bring your BSD applications to our platform. So without further ado, I'd like to introduce the Director of Core OS Engineering, Brett Halle.

Good afternoon. I'm impressed. Here it is, Friday afternoon after lunch right before a three-day weekend and you're in here. So you must be the real hardcore guys. So we're going to spend some time this afternoon and talk a little bit about what BSD is in Mac OS X, what kind of role it plays within the system, how you can actually get involved in programming with it on X, and a bit about the BSD community because it's more than just the technology.

By now, if you've been to a number of our sessions, you've seen the architectural diagram for where Darwin and Core OS fit into the architecture of Mac OS X. There's a lot of technology down in the kernel land. And we're going to focus today on just the BSD portion of this. A little later today, there'll be some discussion on the BSD kernel pieces and the Mach kernel right after this session.

First we want to talk a little bit about BSD from the kernel perspective. Understand that as far as the Darwin kernel architecture is concerned, BSD, even at the kernel level, is layered on top of our kernel architecture. It has the file system technology, the networking layer, and basically provides the POSIX, if you will, functionality layer for the system.

I/O Kit within the kernel space is our answer for dealing with I/O and drivers and basically abstractions for devices on the system. And underneath that sits Mach, which is fundamentally responsible for abstracting the processor, dealing with low-level task and thread abstractions, memory management, and again dealing with the processor, or processors in the case of an MP system. As for BSD, you need to realize that rather than being just a subcomponent of Mac OS X, BSD is really an operating system in and of itself.

It is a kernel environment that sits within our kernel space of the system. It's also a set of user land libraries and services that are available to the applications. And it's also an application environment in and of itself, the command line, if you will. From the kernel perspective, it's worth noting that there's some history to how BSD actually came to play in Mac OS X. And we'll talk a little bit more about that later. But what we ship is based on the BSD 4.4 environment, as well as being integrated with Mach and I/O Kit.

It provides the personality APIs for the system. Mach, as it sits on the bottom, is very much an abstraction layer dealing with processor management and memory management. It doesn't actually provide any, if you will, policy to the system.

The BSD kernel, however, does. It very much represents the OS policy of the system. It is responsible for the process model that defines how each and every application space runs. It also provides the basic security policy for the system, whether it be the concept of individual users or, as it's abstracted down to the file system, access to files and other services of the OS.

From a process model perspective, there are two ways of looking at this within Mac OS X. Again, at the very lowest level, Mach is responsible for the abstraction of the processor and the memory management. And it has an abstraction or model called a task, which provides the container that maps to memory: basically an address space, or a chunk of memory, that represents an application.

The BSD process sits on top of those very low-level primitives and provides a considerable amount of additional state to the system. It's responsible for much of the OS resource management: things like file descriptors, network services and other network resources, and high-level memory abstractions.

The BSD process is responsible for that, and when a BSD process terminates for some reason, either because the application quits or because there's some type of fault condition, it's the responsibility of BSD to actually reclaim all the resources that are associated with a given application. In addition, there are other ancillary services that the process provides, things like environment variables and signal delivery, a way of being able to provide some type of low-level interrupt dispatch, if you will, at the application level.
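As a rough illustration of those two per-process services, here is a minimal C sketch using the standard POSIX calls `sigaction`, `raise`, and `getenv`. The function names `deliver_signal_to_self` and `env_or_default` are made up for this example and are not from the session.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Counter updated by the handler; sig_atomic_t is the integer type
   that is safe to modify from inside a signal handler. */
static volatile sig_atomic_t signals_seen = 0;

static void on_usr1(int signo)
{
    (void)signo;
    signals_seen = signals_seen + 1;
}

/* Install a SIGUSR1 handler, send that signal to this process, and
   return how many times the handler ran. Returns -1 on error. */
int deliver_signal_to_self(void)
{
    struct sigaction sa;
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    signals_seen = 0;
    /* raise() does not return until the handler has finished. */
    if (raise(SIGUSR1) != 0)
        return -1;
    return (int)signals_seen;
}

/* Look up an environment variable, falling back to a default value
   when the variable is not set for this process. */
const char *env_or_default(const char *name, const char *fallback)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

Both the handler table and the environment are state that belongs to the BSD process, which is why they go away when the process does.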

[Transcript missing]

Other aspects of security policy involve the file system. Again, the concept of users in the system is actually reflected down into the file system. Each and every file on the system has an owner, a group that it's part of, and permission access to those files that are associated with it. And those same types of capabilities can be applied to other kinds of capabilities in the system.
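To make the owner, group, and permission model concrete, the following C sketch reads them back with the POSIX `stat` call. The helper name `describe_access` is hypothetical; it simply formats the nine permission bits the way `ls -l` does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Fill `out` (at least 10 bytes) with an ls-style permission string
   such as "rw-r--r--" for the file at `path`, and report the owning
   user and group IDs. Returns 0 on success, -1 if stat() fails. */
int describe_access(const char *path, char out[10], uid_t *owner, gid_t *group)
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* owner */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group */
        S_IROTH, S_IWOTH, S_IXOTH    /* everyone else */
    };
    static const char letters[] = "rwxrwxrwx";
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;

    strcpy(out, "---------");
    for (int i = 0; i < 9; i++) {
        if (st.st_mode & bits[i])
            out[i] = letters[i];
    }
    if (owner != NULL)
        *owner = st.st_uid;
    if (group != NULL)
        *group = st.st_gid;
    return 0;
}
```

Calling `describe_access` on a file whose mode is 0640 would fill the buffer with `rw-r-----`: read and write for the owner, read for the group, nothing for others.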

BSD is also kind of the environment within which the file system sits. It's based on a VFS architecture, a standard BSD file system architecture, which supports a number of different file system plug-in types and things like that. And earlier in the week there was a session on file systems that went into this in more depth, but it's important to note that the file system environment is a subset, if you will, or rather a subsystem of the BSD environment.

The same is true for networking. The capabilities of networking on Mac OS X are built on the BSD Sockets APIs. And if you're interested in learning more about either of these particular subsystems of BSD at the kernel environment, there's some really great books out there by McKusick and a couple others that actually go into some of this architecture. Stevens is a good one for networking. And there's, again, both of these are based off a standard BSD architecture.
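For readers who have not used the sockets API, here is a small C sketch that opens a TCP connection to itself over the loopback interface and sends one message. It uses only standard BSD socket calls; the function name `loopback_round_trip` is an assumption of this example, and a real program would normally split the listening and connecting sides across two processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `msg` over a TCP connection to ourselves on 127.0.0.1 and read
   it back into `out`. Returns the number of bytes received, or -1. */
ssize_t loopback_round_trip(const char *msg, char *out, size_t out_size)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    ssize_t got = -1;
    int listener = -1, client = -1, server = -1;

    assert(msg != NULL && out != NULL && out_size > 0);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a free port */

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) != 0) goto done;
    if (listen(listener, 1) != 0) goto done;
    /* Ask which port the kernel actually assigned. */
    if (getsockname(listener, (struct sockaddr *)&addr, &len) != 0) goto done;

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof addr) != 0) goto done;
    server = accept(listener, NULL, NULL);
    if (server < 0) goto done;

    if (send(client, msg, strlen(msg), 0) < 0) goto done;
    got = recv(server, out, out_size - 1, 0);
    if (got >= 0)
        out[got] = '\0';

done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return got;
}
```

The same `socket`, `bind`, `listen`, `connect`, and `accept` sequence is what the books mentioned above walk through in detail.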

Before I get on to the user environment, I do want to make sure people are reminded that immediately following this session, there is a session on the Darwin kernel itself to get more into the low levels of the system where they'll talk a lot more about Mach and the BSD kernel. And that's here in this room.

Moving on to the BSD user environment, and that's really what's kind of important from the standpoint of most people who are going to be writing code against BSD. The user environment could be considered to be another peer of Cocoa and Carbon and Classic, if you will. It's another application environment available to the system, and it has a number of services and facilities that are part of it.

Certainly, it includes the command line and shell and those kinds of things that you would expect to be as part of a standard Unix environment. But it is also where most of the network client tools, things like SSH...

[Transcript missing]

They include things like the POSIX style of APIs. Again, things like Pthreads and other things like that all exist there. Your standard math libraries, C libraries, all of that, those are the kinds of things that are part of the system framework.

It's worth talking a bit about how POSIX actually fits into the plan here for Apple. From our perspective, it's important to try and make sure that the APIs are compliant. So as people are working with the system, we find places where we need new APIs or things like that, we actually will use POSIX APIs as a reference point. That's usually the basis. As Steve said at the keynote on Monday, the focus around staying within standards is a good thing.

However, even though compliance is a goal, from our perspective, certification is not something that we're trying to do. So as you're looking for things, there are probably going to be a lot of little details where we may not necessarily be completely POSIX, because that isn't a goal for Apple. But again, where there are APIs, when we introduce things like Pthreads, or the POSIX shared memory facilities, as we've introduced those into the system, we'll tend to reference the POSIX APIs as our starting point.

It's worth noting that the BSD API set is in fact a first class citizen in Mac OS X. Just like Carbon and Cocoa, it is another application environment. It can in fact coexist with Carbon and Cocoa, and in fact Java, because all of the application environments themselves are actually built on these APIs.

Even though it's a peer of Carbon and Cocoa if you will and the ability of writing kind of BSD and Unix types of applications, the fact is this is the abstraction layer that actually represents the operating system and OS personality of the system and all of these services are what the other application environments use.

So for things like Cocoa in particular, you have pretty direct access to the BSD and POSIX and system framework APIs.

[Transcript missing]

There's a number of little hints, if you will, about porting your app over. One thing, for example, is we basically discourage the use of common variables and such. Our implementation for dynamic libraries is very, very different.

We also do not support the use of the C++ precompiler, at least for the purposes of porting. Our precompiler for C++ is very different than the other environments, and you should really try and avoid its use. Again, dynamic libraries are very different under X, so use of things like dlopen is not recommended. That won't work. You should actually look at dyld and some of the other facilities.

Try and create analogous solutions with CFPlugins and other kinds of services that exist. Those dynamic library services are not the same on X. One thing to note is that GNU Make is the default make in the build environment for BSD apps. There is BSD Make on the system as well. You need to explicitly use it if that is what your app depends on.

Also, use of Autoconf, which is a very common portability solution that's available for many BSD tools and applications, usually works. And in fact, a lot of the various ports collections and things like that have been modified to run under and build under Mac OS X already. However, if you find that it doesn't, we actually include as part of the developer tools, in /usr/libexec, a set of config recipes, if you will, that will usually allow your app to be ported over completely.

One other thing to note is that in terms of how the system is laid out and when you're building your application, remember it is built on a BSD, basically as a BSD system. If you compare this with how, for example, Linux files are laid out, you're going to find that the file system layout is a little bit different between these two environments.

And our environment certainly references much more off of the BSD environment, and if you're going to be modifying the build, you want to lean more towards a BSD set of build variables and settings in order to get your build to work. All of these directories that are mentioned here certainly exist on the X system, but you'll probably notice they're invisible to the end user. This is certainly something that we're not trying to encourage people to use in terms of the layout.

Some tools and commands that you get are dependent on some of these file system layouts, but from the perspective of applications that you package up, we would actually try and encourage you to package them up as part of a GUI app and in other places. But these standard install places exist. Now, to show a little bit more about how you might actually go about porting an application to X, I'd like to bring Eric Peyton up.

And Eric, one of the things that I'm interested in seeing is your being able to take an app right off the net and build it for X. How do you go about doing that?

Sure, not a problem. So we-- see, do we have the right-- yeah, we do. About a week and a half ago, two weeks ago, Brett called me and said, "Eric, go to the ports collection and do something cool."

And I said, okay. And I went up to the ports collection, which is something that I'm not intimately familiar with, but I've played around in there a couple times. And I had a mission. I had been playing around and I have a very large collection of MP3 files.

And they were spread over multiple different directories, multiple different disks, and I wanted to get them all into my nice big server. So I needed to find ones that were duplicated all over the place. And I knew that I could do this with, you know, a little bit of shell scripting, but that's not really the point here.

The point here is to go get a tool to help me do that, bring it down, make it, maybe put a GUI on it or something nice like that. So what I did is I went up into the utilities directory up on the ports collection, up on FreeBSD.org, I believe, or something, and I grabbed samefile. And what samefile does is, in multiple different ways, it looks at a collection of files and determines which ones match. And one of the interesting things about samefile is that it does it from standard in.

And versus a lot of the other ones where you have to like enumerate all your stuff on your command line. And I thought it'd be an interesting technical challenge to bring this down, write a little, make sure that it builds, and do something interesting with it. So I downloaded it, just, you know, OmniWeb or IE or whatever I used, I can't even remember. And I brought it down and... Let's unpackage it. Now this is one of the well-behaved packages in the fact that I can just, oops. Once I get in the directory.

I can just configure it and it works out of the box. Like Brett said, the config files in /usr/libexec are quite often needed to bring across a package that hasn't been touched in a while, something that hasn't been updated recently with the more recent config files. But this one works out of the box. And I type make.

And if you notice, I have an executable that I built 25 seconds ago or whatever it was. And that's just pulling down directly off of the ports collection. This is pulling directly off the ports collection. No work on my part. No messing with make files. Nothing. Just everything works.

In a little bit, we'll talk a little bit about what the ports collection is because certainly one of the advantages of being part of the BSD community is there are actually thousands of interesting little tools and utilities that exist out there within this thing called the ports collection. Yeah.

And the interesting thing, like I said, was that samefile uses standard in to receive its input. And so what you do in this case, I've got a little command line find here that will find all the files from here on down and pipe it into samefile. And you notice here, we get, if the font wasn't so big, it might actually look like lines, but it's pretty darn ugly. This is not a Mac user experience. Yeah, this is not the user experience that we're looking to provide for a Macintosh application.

So Eric, I expect better than that. Yeah, we're gonna try to do something a little bit better than this. You'll notice here, before we move on, you'll notice the first column, if you look along this side here, this side here is the size of the file that it found.

The next column is the first file, the column after that is the second file, and there's some extraneous stuff on the end, which doesn't really mean much unless you read the readme and start playing around with the different options that are available to it. But that's not really what I'm looking for here. I just want to find how big my files are and which ones match up and are the exact same size. So let me get this right.

This tool basically takes two big clumps of files, directories full of files, and points out which ones happen to be the same ones, regardless of what they're called or anything. Yep, regardless of file name and that kind of good stuff. And like I said, this could easily have been done with shell scripting, but it's not nearly as interesting.
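To illustrate what "the same, regardless of name" can mean in practice, here is a minimal C sketch that compares two files by size first and then byte by byte. This is an assumed approach for illustration only, not a description of how the tool in the demo is implemented, and the name `files_identical` is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if the two files have identical contents, 0 if they
   differ, and -1 if either file cannot be examined. */
int files_identical(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    if (sa.st_size != sb.st_size)
        return 0;                      /* cheap check: sizes differ */

    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = (fa != NULL && fb != NULL) ? 1 : -1;

    while (same == 1) {
        char buf_a[4096], buf_b[4096];
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
        if (na != nb || memcmp(buf_a, buf_b, na) != 0)
            same = 0;                  /* contents diverge */
        else if (na == 0)
            break;                     /* both files ended together */
    }

    if (fa != NULL) fclose(fa);
    if (fb != NULL) fclose(fb);
    return same;
}
```

Checking the size first is why the size column in the tool's output is useful: files of different sizes can be ruled out without reading them.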

So I've already, let's go ahead and hide that. I've already created a project, but the first thing we want to do, when you're coming at something from a Cocoa perspective, a lot of times you just want to make a UI first. So I created a project which provided the MainMenu.nib, which, if any of you had been to any of the Cocoa sessions throughout the week, you would have probably seen multiple times by now.

And I've already dragged my files over and I'll show you all the code in a few minutes. But let's just start off with something simple here and let's create a little UI. Add a window. Make it nice and big so that we can see these long paths all over the place. We will add a table so that we can see the output. And you've got to love the blue lines.

And what we really cared about was that in that output we had three columns that mattered. We had the file size, the first file name, and the second file name, and that provides us with some interesting stuff. So if we ended up with output that looks something like this in a table view that we could sort, that we could mess around with, that'd be a little closer to a reasonable UI.

Can you make the second column bigger?
Yeah.
There we go.
So there we've got some nice big fat columns so we can see all of our paths. And we can, for example, add a field that will let us put in a path that we can type in or, how about even better, we'll add a little button so that we can select a directory to start from. And I'll throw another button up here that allows us to search for duplicates. Let's clean this up a little bit. Nothing really major, we're just playing around.

[Transcript missing]

So the thing with programming in Cocoa here is basically what you're spending all the time doing is laying out the UI. Yeah, this is the fun part. Spend a lot of time doing the fun stuff, laying it out, making it all nice and pretty. And Interface Builder, if you missed the Interface Builder tutorials and sessions this week, you missed a lot of really good stuff. It is one really cool tool. So here we've got a UI that I would consider somewhat usable for us Unix people.

Something that we can work off of. Let's go ahead and stop right here and start hooking stuff up and figuring out what's going on. Stop the window. Yeah, we want the window to be visible when we launch the application. I've already created a lot of the code and we'll walk through that together here in a second. But let's start off by creating our controller object. Which I've already imported into.

We have a SameFile controller object. I've already imported the code, which we'll look at in a minute. Let's go ahead and hook up the portions of that object that matter to us at this time. Once again, like I said, if you missed the Interface Builder talks, I really recommend that you spend some time playing around with this tool. It's pretty cool. So we've got a table for output. We have our three columns.

A size column, a column for our first file, a column for our second file. We have a text field that contains our path. We have our button that begins and ends our tool. We can go ahead and stay here. Everything's hooked up coming out of the controller. We know what all the output parts of the UI are gonna be. Let's hook up what's going into the controller, the things that make stuff happen. So there's two major portions here that matter. And of course the window's so big you can't see what you're doing here, hold on.

This is an easy way to do it. We want to tell the system to start my tool. We want to tell the system, let me select a directory. We'll save our little nib in our project here. And for now, this is all we really need to do.

So let's go ahead and go off and open up our project that I previously had created and let's walk through what the code actually does to actually get this to output information to the GUI. So at this point this is all pretty standard Cocoa stuff. Yeah. Nothing really unusual at all. Nothing unusual at all.

However, it's not hooked up to BSD. There's no tools involved or anything like that. First thing we want to do, let's go ahead and add our, it's already here. I'll show you what it is. I've already included the samefile binary as a resource of my project. Now this is the one you built that was the command line tool. Yeah. This is the one that I built four minutes ago or whatever. It's the command line tool.

The reason that I'm putting it in the resources directory of my project is so that I don't have to have separate install instructions for my user to get the samefile UNIX binary installed on their system in /usr, /usr/local/bin, or anything like that. If I include it as part of my project, they don't ever have to worry about it. It just becomes a cohesive whole. It all works together.

Nothing else has to ship and they don't have to install files into locations that they can't delete them from or don't even know that it's happening in the first place. So basically if you're going to wrap something like a UNIX tool, unless it's something that already exists on the system, you can bundle, put it inside your application bundle and make it, you know, just another resource, if you will, of your application. Most definitely. And we really suggest that you do that to avoid situations where you're installing fragmented portions of your package all over the hard disk. It's not very Mac OS-ish to do that. It's something that we're looking to see improvement from you.

Now, earlier we were talking about my different classes and I had the SameFile class. And the SameFile class is the portion of my code that is the controller object for what's going on in my interface. It consists of two major parts, two methods: a method for beginning my search and a method for finding files for the user.

So they press that button and it fills in a path so that stuff can happen. Let's start off with the selector. So the user runs the application. The first thing they're going to do is they're going to go off and select a directory. And so what I've done here is I've added in a simple little method here, five lines of code, creating an open panel, telling them that they can't choose files but they can only choose directories. I don't want multiple selection at this time. It would be an easy enhancement to add with some more UI later. And then I tell my system to begin a sheet for a directory.

You'll notice here it's modal for the window and I've got a selector that happens at the end. So basically this just pops up a sheet, lets them pick a directory. This pops up a sheet, lets them do what they want to do and everything works.

Up above, the only other method in this class is the toggle tool method, which will start or stop your execution. If it's not executing, go ahead and look at the path that we had set up in the UI. The one the user just selected. The one the user selected or the one that the user had typed in.

Let's create an array of our arguments. The first argument is the path to our samefile resource. That's our binary that we had included. That's the straight Unix binary, nothing special there. Then we add in some arguments. You'll notice here that I have if verbose check state. So if the verbose checkbox is checked, add an object. Well, I never added a verbose checkbox. And to show you a little bit about how easy some of this stuff can be, what I'll do is I'll come back here and I'll add a little checkbox onto my UI.

Verbose, or you could name it something much more concise than that if you wanted to. And we'll hook up the verbose check there. And what this would do, if you go back to the code that we're looking at, all it's going to do is add a -v object into my array. And we'll talk about why this array looks the way that it looks in a minute when we get to the process, what really happens.

So basically, like most Unix tools, there's a half a bazillion little options and stuff, and it's very easy in the UI to be able to add checkboxes and things like that. Most definitely. To kind of hide that kind of stuff away from having people having to build command lines.

It's a lot easier once you have all this set up, and you'll see, it's not very much code. Once you have it set up, command line switches can turn into GUI switches very rapidly. Very little work whatsoever. So you'll notice here that at the beginning of this code segment, I've got a current process equals Process alloc, initWithController: self, arguments:.

And what that is, is for every time that we run this, we want to have a process run. Something is going to launch this Unix tool, get the output, and all that. We feed it in with the arguments at the end. I just called them argv, but you could name them whatever you'd like.

Down after that, you'll notice I have an appendOutput. This is just a little log so that you can see that it's begun running. And then in this block here, you will notice that what we do is we take the path that the user had entered or selected and use the NSFileManager class, which, as a BSD programmer, will probably become a very good friend of yours. It's a general purpose file manager class. We retrieve all of the paths at, or the sub-paths from, that path.

We even do some nice stuff like we expand the tilde in path. If you have BSD users, they very well might want to use tilde for their home directory and not have to go selecting all over the place with a file browser because some people type faster than they click.

Usually the tilde is something expanded by the shell. Yeah, we have no shell performing any of the actions here, so there's no possibility for expansion there. If I can interrupt, one thing that's worth noting is there's a huge number of facilities within the Foundation portion of Cocoa that actually wrap and do a lot of the services that you expect to need to be able to talk to background Unix apps. Things like expanding tildes, things like breaking up command lines into specific... individual arguments. Yeah, breaking up command lines, finding your home directory, finding the current user, all that kind of stuff are all portions of Foundation.
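When no shell is involved, the tilde has to be expanded by the program itself. Foundation does this for Cocoa code; for comparison, here is a minimal C sketch of the same idea using only POSIX calls. The function name `expand_tilde` is hypothetical, and the sketch handles only a leading `~` or `~/` for the current user.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Expand a leading "~" or "~/" to the current user's home directory.
   Other forms, such as "~otheruser", are copied through unchanged.
   Returns 0 on success, -1 if the result does not fit in `out` or no
   home directory can be found. */
int expand_tilde(const char *path, char *out, size_t out_size)
{
    int n;

    assert(path != NULL && out != NULL);
    if (path[0] == '~' && (path[1] == '/' || path[1] == '\0')) {
        /* Prefer $HOME, as a shell would; fall back to the password
           database entry for the current user. */
        const char *home = getenv("HOME");
        if (home == NULL) {
            struct passwd *pw = getpwuid(getuid());
            if (pw == NULL)
                return -1;
            home = pw->pw_dir;
        }
        n = snprintf(out, out_size, "%s%s", home, path + 1);
    } else {
        n = snprintf(out, out_size, "%s", path);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With `HOME` set to `/Users/demo`, expanding `~/Music` would give `/Users/demo/Music`, while an absolute path such as `/tmp/x` passes through untouched.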

This specific case is a method on NSString, but a lot of them are in different portions of the foundation classes, and they're very, very useful to somebody who's trying to leverage BSD. You'll notice here what we have is to our process, so we get all of our paths, we enumerate through our paths, and we pass our process on standard in the string that we had just expanded.

And then when we're finished, once we've spun through all of them, we tell the process, okay, I'm done writing, go ahead and go off and do your thing. This close writing is, I guess you could say, analogous to hitting Control-D at the end of messing around with cat. I'm done messing around, go do something.
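The Control-D analogy comes down to how pipes signal end-of-file: a reader only sees EOF once every write end has been closed. A minimal C sketch of that behavior, using a pipe inside a single process, might look like this. The name `drain_after_close` is made up, and the message is assumed to be small enough to fit in the pipe's buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `msg` into a pipe, close the write end, and drain the read
   end. Returns the number of bytes read before read() reported
   end-of-file, or -1 on error. */
long drain_after_close(const char *msg)
{
    int fds[2];               /* fds[0] is the read end, fds[1] the write end */
    size_t len;
    long total = 0;
    char buf[256];
    ssize_t n;

    assert(msg != NULL);
    if (pipe(fds) != 0)
        return -1;

    len = strlen(msg);
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    /* No writers remain after this, so the reader will see EOF
       instead of blocking once the buffered data is consumed. */
    close(fds[1]);

    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    return n == 0 ? total : -1;   /* n == 0 means end-of-file */
}
```

If the write end were left open, the final `read` would block forever, which is exactly what a tool reading standard in does until its input is closed.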

And that's all it is in our SameFile class itself. There's nothing else to it really, except for handling of the output, which we've abstracted into the superclass. So we've got a class that controls them pushing buttons, but what happens? Where does all this go on? I don't see any work going on here.

So you'll notice that I've got another class, Process, and a class, SimpleTool. SimpleTool is the parent class for SameFile. And the reason that it's abstracted out is that the SimpleTool class is an abstraction that I've used in multiple different places and it made it very quick and easy. I just literally dumped this class in: object-oriented programming, let's use it again. And what it does is it's the thing that handles the output and passes the output back to the UI.

And there's a very common set of abstractions when you're trying to wrap a Unix tool, which is passing a set of arguments and information into the tool and then grabbing the output of the tool and processing in some way that's appropriate for the UI. So you'll notice here that the only thing that this thing really does is have, the process has started, okay, let's create some strings to stuff the data in, and the process is finished, let's create some arrays to stick the output in. But we still don't have the glue. Okay, we can get the output, we can tell them what the input is, but what's in the middle? And that's what my process class is for.

So, Cocoa provides in its Foundation framework a couple of fundamental tools that could be used by BSD programmers: NSPipe, NSFileHandle, NSTask, and there's a bunch more. I expect that all of you will go home and do your homework. Start playing around with them. In the case here, if you remember earlier, we created our array of arguments and passed it to initWithController for the Process class. And here you can see exactly what we do. We just retain those arguments and we create a write pipe.

The write pipe, you see there, it's [[NSPipe pipe] retain], creates a pipe that we can use to pipe to standard in. If you look down into the next major function, you'll see when they click on start, when they click on toggle tool and it begins, we gather all the arguments, we tell it to start the process.

We create an NSTask. One of the main purposes of NSTask is to wrap this kind of work for you so that you don't have to do interesting, neat, but boring things like fork and exec and sitting around and waiting for output. NSTask does it all for you and provides all these tools in an object-oriented manner so that you can access them. You can set the standard output. You can set the standard error.

You can set the standard input. In our case, like I said earlier, samefile presented an interesting case because it needed standard in, versus a lot of tools that pass everything on the command line. So we've got a pipe for standard in, and we've got the same pipe for standard output and standard error.
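For a sense of what is being wrapped here, the following C sketch does the equivalent work by hand with `pipe`, `fork`, `dup2`, `execvp`, and `waitpid`. It is an illustration under simplifying assumptions, not the code from the demo: the name `run_tool` is made up, and it writes all input before reading any output, so it only suits small inputs and a child that starts successfully (a robust version would also deal with SIGPIPE and interleave reads and writes).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments, feed `input` to its standard
   input, and collect its standard output and standard error in `out`.
   Returns the child's exit status, or -1 on error. */
int run_tool(char *const argv[], const char *input, char *out, size_t out_size)
{
    int to_child[2], from_child[2], status;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    assert(argv != NULL && input != NULL && out != NULL && out_size > 0);
    if (pipe(to_child) != 0)
        return -1;
    if (pipe(from_child) != 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wire the pipes to stdin, stdout, and stderr. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        dup2(from_child[1], STDERR_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127);                    /* only reached if exec failed */
    }

    /* Parent: keep only the ends it uses. */
    close(to_child[0]);
    close(from_child[1]);
    n = write(to_child[1], input, strlen(input));
    close(to_child[1]);                /* child now sees EOF on stdin */
    if (n < 0)
        out_size = 1;                  /* nothing was sent; collect no output */

    while (used < out_size - 1 &&
           (n = read(from_child[0], out + used, out_size - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(from_child[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_tool` with an argument vector of `{"cat", NULL}` and some input text would hand that text back unchanged, which is a handy way to check the plumbing before pointing it at a real tool.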

We tell it the launch path. You'll notice here it's arguments objectAtIndex: 0, which is somewhat hard-coded in this case. But if you remember back when I created the array, the first thing that I passed was the path to the samefile binary. So my launch path is the entire path to the samefile binary, the little bitty Unix tool. You're building up the command line. I'm building up the command line right here. The next thing I do is set the arguments. And I want everything else except the path to the Unix command line tool.

So I use subarrayWithRange, and I start at one, and I just go to the end. So this one line right here will easily allow you to just keep on adding on arguments, -v, -d, -4, 1000, or whatever the arguments need to be to that Unix tool. You can add them on without having to change your Process class in any way whatsoever.

You then need to register with the notification center. And if you go in and look at NSFileHandle, NSPipe, and NSTask, you'll notice there are some standardized themes throughout all of them. One of them is the fact that you can register with the notification center to see when something's done, to see when a task terminates, and that kind of stuff. Here you'll see that I'm registering for NSFileHandleReadCompletionNotification. When the file handle has received some output, let me know.

You're basically wanting to know when all the output from the tool comes along. Sure. When the tool starts spitting stuff at me, I want to know what it is. The next thing I want to do is tell it to go. And that consists of two parts in this situation. We tell it to readInBackgroundAndNotify. That tells it: your job is to wait in the background for data and then tell me. And then we tell it, launch.
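The register, read in the background, then launch pattern can be approximated without Cocoa. The sketch below is a Python analogue under stated assumptions: `launch_and_notify` is a hypothetical helper, the callback plays the role of the notification selector, and a background thread stands in for the notification center (so the callback runs on that thread, not on a run loop).

```python
import subprocess
import sys
import threading

def launch_and_notify(argv, on_data):
    """Start argv and call on_data(text) for each line of output as it
    arrives, from a background thread. Returns (process, reader_thread)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    def reader():
        for raw in proc.stdout:          # blocks until the child writes
            on_data(raw.decode("utf-8"))
        proc.stdout.close()              # child closed its end: we're done

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    return proc, thread

received = []
proc, thread = launch_and_notify(
    [sys.executable, "-c", "print('one'); print('two')"],  # stand-in tool
    received.append,
)
thread.join()      # wait until the child closes its output
proc.wait()
```

Stopping early would be the mirror image: stop listening, then call `proc.terminate()`.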

Pretty simple. Stop process is even simpler. We tell it, OK, remove my notification for reading, and let's terminate the task. The only other interesting thing in here is what happens when I get data. When I told it that I wanted to register for notification, I gave it a selector to a method. Notify me when I get data by calling my getData method. And you'll notice here all it does is take all this stuff and pass it back, through appendOutput, to the controller that I registered at the beginning.

So we can get the output. This class does all the work. The only other... I guess this is another small but interesting thing. You'll notice here the insert string method. This is the method that goes to the write pipe and acts, for all intents and purposes, as standard in to our little Unix tool. And all it does is write the data in UTF-8 encoding, and then it passes a \n, a newline, on so that the tool knows that line is done.

And then the very last thing we tell it is to close writing. That's the Control-D I explained earlier: let's close our file. We're done piping stuff to standard in. Go off and run and do stuff.
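The two steps just described, write a UTF-8 line and then close the write end so the tool sees end-of-file, look like this in a Python analogue. The helper names `insert_string` and `close_writing` echo the talk but are hypothetical here, and the child is a stand-in script that only prints once its standard in is closed.

```python
import subprocess
import sys

def insert_string(proc, text):
    """Write one line to the child's standard input as UTF-8."""
    proc.stdin.write(text.encode("utf-8") + b"\n")
    proc.stdin.flush()

def close_writing(proc):
    """Close our end of the pipe; the child sees end-of-file on stdin,
    the same effect as typing Control-D at a terminal."""
    proc.stdin.close()

# Stand-in child: counts the lines it receives, printing only after EOF.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
)
insert_string(proc, "/tmp/a")      # placeholder paths
insert_string(proc, "/tmp/b")
close_writing(proc)
line_count = proc.stdout.read().decode("utf-8").strip()
proc.stdout.close()
proc.wait()
```

Without the close, a tool that reads standard in to the end would wait forever, which is why the close is the last step.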

So 99% of what you just talked about is basically standard for talking to any Unix application. There's nothing unusual about this particular tool that we're running. It's basically, there's a couple of small methods that you use to be able to pass information to the background tool. There's a small number of methods to be able to read it in and parse it.

And some basic stuff for wrapping the management of that particular task. But the stuff that you've shown me so far doesn't have anything really at all to do with... No, this is very tool agnostic. If you know the... We ship some applications that are based on the same style of concept. Network Utility, for example, uses multiple different Unix tools to do the same type of work. And this is the easiest, fastest, and most productive way to do this kind of thing.

So now we know we have the controller that gets the information from the user and tells the process to run. The process runs and can tell the controller the output. The only other interesting thing left is back in here we have a tool. So simple tool, which was our parent class, knows how to put output into a table view. And for this specific case it's a little hard-coded because it's a demo. Basically all it's doing though is just taking the output and formatting it up for the UI.

It takes the output, it creates a row, an array of rows. Those rows contain arrays of output: the size of the file, the first file name, and the second file name. And this is a table view delegate, which if you look back here in the UI, we had our table view.

And our table view needs to be hooked up to a delegate and a data source. Now, what that does, if you're familiar with AppKit programming, a data source provides data to an object that requests it. Table views, browsers, those kind of, I guess you could call them heavier-weight Cocoa objects. Quite a few of them have data sources. Delegation is a concept we won't go into here, but it's a way of having an object perform actions on behalf of other objects.

Our table view needs to get the data from something. And so the way we set it up is that it will get the data from our... It just knows to ask your class. It knows to ask my class for the data when it's ready to draw. There are only two methods that truthfully matter. First, the table view wants to know how many rows it has, so it can start setting up its scroll bars, know the amount of room it needs to allocate, and so on.

And then what value do I put into this row for this column? And you notice here, if table column is the size column, then I want from my row array the very first indexed object. I want my size. You can use table column identifiers to get much better granularity than this in slightly less hackish terms, but this will do just as well. Okay, I'm in management, I'd like to see the end result, show me. End result. So let's build our project here.
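The data source idea boils down to answering two questions: how many rows, and what value goes in a given row and column. A minimal Python sketch of that contract follows. The class name, the column identifiers, and the assumed "size first second" line format are all hypothetical; the real tool's output format is not shown in the talk.

```python
class DuplicateTable:
    """Holds parsed rows and answers the two questions a table view asks."""

    # Hypothetical column identifiers mapped to positions in each row.
    COLUMNS = {"size": 0, "first": 1, "second": 2}

    def __init__(self):
        self.rows = []

    def append_output(self, text):
        """Parse tool output; assumes 'size first second' per line."""
        for line in text.splitlines():
            fields = line.split()
            if len(fields) == 3:          # skip anything that doesn't parse
                self.rows.append(fields)

    def number_of_rows(self):
        return len(self.rows)

    def value_for(self, column, row):
        return self.rows[row][self.COLUMNS[column]]

table = DuplicateTable()
table.append_output("2048 /a/tool /b/tool\n512 /a/readme /c/readme\n")
```

Looking columns up by identifier, as here, is the "better granularity" alternative to comparing against one hard-coded column. Splitting on whitespace would break on file names containing spaces, so a real parser would need a sturdier delimiter.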

And he's in management. You'll all notice that I built with no warnings, no errors. So let's go ahead and select a directory. And this should all work. Let's select a directory that we know has some duplicates, otherwise this would be very boring. And you'll notice here I had a select directory. I filled in the path. I could have typed that path in as well, but got to show that the sheet works.

Let's see if this works. Ooh, there it goes. I must not have waited long enough. So you'll notice here what it did is it went off and it grabbed all that output. And this is essentially the exact same output with some typo errors, it looks like, stuffed in there. Probably parsing. It's, uh, that's bug fixing. This is feature complete, but not bug finished.

You'll notice here that it went off and it noticed that I have a same file binary inside my app, which is what I had requested. And if you look here in the second column, there's also a same file binary in the root of my app, which I had copied there earlier by accident, and a same file binary in the same file directory of what I downloaded. You'll notice that they're all exactly the same. And in this case, they all have the exact same name and everything.

But if we would have renamed one of them, it would have shown it up anyway and done its magic. If we had turned on the verbose flag, in this case, we don't have any place for the verbose output to really go. We could have dumped it to standard out. But the verbose output doesn't really belong inside of a table view.

The next steps for what you could do in here are things like adding in flags for size. The dash S flag, I believe, is the same file. Then you put in 10,000. That means I don't care about anything underneath 10,000 bytes. So all it would have shown were these first two rows. That would have literally been four more minutes worth of work, but my time is over. Thank you.

Thank you, Eric. I think the important key here is that there's, as you see, a bit of work around kind of setting up the standard structure for being able to talk to a background BSD application in terms of being able to manage sending data to it and getting it back. But Cocoa does an enormous amount of the work here in terms of being able to allow you to parse the arguments and tear them apart and basically send them to a UI. At this particular point, it's UI polish, and that really is the fun part.

But the neat thing is being able to take, you know, something like a BSD background application and, even just starting with this as a starting point, being able to support just about anything you can imagine that runs there. And this is a particularly valuable and important feature for Mac OS X, being able to leverage these kinds of capabilities and tools in the background of your application and being able to provide them forward to our end users in such a way that doesn't require them to learn, you know, something like Emacs or the command line. So I can move on. Let's move on back to the slides. Thank you.

[Transcript missing]

Does this work? Yeah. So as Brett said, they incorporated quite a bit of FreeBSD technology into Darwin, I guess, several years ago when the project was first launched. But I don't think that collaboration certainly ends there. And there are a lot of interesting things that we are doing at FreeBSD and have done. And I think that there's quite a bit of room for collaboration in both directions. So I'll talk about it a little bit from the FreeBSD perspective.

So one of the nicest things I find, as a developer, about FreeBSD is the source tree organization. And that may seem like kind of a small thing, but it's actually of tremendous value for a lot of reasons. One thing it gives us is a common taxonomy. It gives us a place for everything to go. It gives us a place for new things to be incorporated into the tree without having to argue for four months about it first.

And it also means that there's a one-to-one correspondence between where binaries are and where their sources are. For example, /bin/ls, you know it's going to be in /usr/src/bin/ls. So it's very easy to find stuff. And it also segregates all of the code that's, for example, under the GPL or is export restricted into concise areas of the tree where you know exactly where it is and you can firewall it off if you're making a product that requires the BSD license, for example, or you're exporting it to Croatia and there are issues behind that.

The source tree also encodes all of the bootstrapping issues, so that if a binary depends on a certain library, it is smart enough to go build that library first before building the binaries in question. And that removes a lot of the burden on the developer of figuring out exactly how things are tied together. And that's very important. We also have a distinction between vendor-supplied software and stuff that we maintain ourselves.

We use CVS very aggressively. And one of the things we do, for example, is when a new version of GCC comes out, which is a vendor-supplied bit of software, we import it on a vendor branch and we have all of our own changes on the project branch. So we can keep the two change sets distinct. And so we know exactly what the GNU people have changed. Whoops, my microphone fell off. I'll just hold it. We know what the GNU people have changed versus what we've had to change.

And this makes ongoing maintenance a lot easier. Also, we provide a series of targets in the source tree, like world and kernel and update, which remove a lot of the hair of knowing exactly what order to build things in again. So it gives us a common reference point. If we have developers who are complaining that they're seeing some strange anomalous behavior, we can say, well, when did you last build world? And we know exactly what that means. We know that they've gone and they've updated all of it.

They have their binaries, their include files, their libraries and so forth from a common reference point. It also makes it easier for newbie developers, people who are maybe not so familiar with the project, they may be very skilled at C or whatever, but they don't want to have to learn the ins and outs of our build system. They just want to grab the source tree, type make world, watch the whole thing work, and be done with it. So these are all things I hope that will eventually in some way, shape or form, migrate over to Darwin.

As I said, we also, using CVS very aggressively, have come up with a number of interesting ways of distributing the bits. We have something called CVSup, which understands CVS natively. That means that when you run it, it pulls over just the deltas that have been added to the repository since the last time you ran it. And that means that it preserves your own local changes. So if you have a local branch in your own CVS repository, it's not just going to come and splat a file on top of it.

It's going to interleave the deltas together. And that's very valuable. It also uses rsync and some other advanced protocols, so it's very fast and very efficient. And if you run it on a daily basis, it takes maybe five, ten minutes to run, and you're completely synced up. It also understands how to check things out. So you can use CVSup to check out a branch of the repository. So if you don't want the repository bits, but you want a certain branch of FreeBSD, you can ask CVSup just to give you that and keep it up to date.

We also support some esoteric methods of getting the bits, like CTM, which actually bundles the patches together and checksums them and sends them through mail. So some of our developers in Estonia and places like that who don't have good hardwired connections or who have very infrequent connectivity can sync up that way. And then, of course, we support the classic AnonCVS methods. And we also have a CVS web interface so that you can see very colorfully what's changed at any particular point.

We also support multiple branches of development, as I said earlier. We have stable branches, which are sort of more tried and tested branches that we aim at the end users. And we have current, which is the bleeding, spurting, severed artery edge of development, which is guaranteed to hurt you. And so we aim the developers more at that.

But we do supply both. So this means that we don't have to freeze out developers from doing active and, in some cases, experimental development. That development can always occur at whatever pace it wants to. But the people, the yahoos of the world, don't have to suffer from that.

They get their changes backported from the current branch at a much slower pace. And they also get things tested. You don't just bring things immediately over into the stable branches. There's a methodology that requires a certain amount of testing and integration. On the flip side, it's also a real pain in the butt to backport stuff. And CVS isn't tremendously good at holding your hand in that.

So that's an ongoing project cost that we incur. Nor is it very easy to synchronize things across repositories. So we have had a lot of initiatives that have kind of fallen flat to share code, say, between NetBSD and FreeBSD, or FreeBSD and OpenBSD. Because they have stuff in their repository. We have stuff in our repository.

And it's just a lot of pain to manually bring those changes back and forth. So we're looking at ways in perhaps automating that somehow. Putting pointers in CVS, for example, that say whenever you make a change to this particular subtree, it gets bundled up and sent over to a neighbor repository and automatically checked in or something like that.

There's a lot of low-hanging fruit, I think, also in the FreeBSD source tree that the Darwin community can benefit from. We've added a lot of creature comforts to our userland. Just for example, our FTP client does file name completion. So whenever I'm on my OS X box, I'm always whacking tab and kind of frowning that it doesn't work. So things like that would certainly make a positive difference.

We have a lot of really interesting libraries that front-end things like fetching bits over FTP or HTTP that make writing clients a lot easier. And then obviously we have some of the large-scale subsystems and whatnot. OpenSSH and OpenSSL get updated, I think, on a much more periodic basis. We have a lot of security work that's ongoing. We've added POSIX ACLs for giving you much more granular access control than simply the one big root.

We've added obviously a lot of ongoing auditing work as well. And I would like to see those changes getting into Darwin on a more frequent basis, as I'm sure the Darwin people would as well. Finally, even though we do use a very different driver model, Darwin using I/O Kit and us using newbus, I think that obviously having working drivers to look at is of tremendous benefit to Darwin developers. And I hope that they will continue to use them.

So, getting away from the bundled apps, we also have the ports collection, which you've heard referenced a couple of times already. I started it on August 20th, 1994. I just wanted to kind of prove a concept, which was that you could create an expert system, as it were, for porting software. That is to say, I found whenever I went onto a new box, the first thing I'd do is I'd bring over Emacs, I'd bring over Bash, I'd bring over a couple of tools I was familiar with.

And that involved, of course, remembering where the reference repositories on the net were, FTPing them, unpacking them, configuring them, building them, installing them, the whole nine yards. And it occurred to me that, you know, there was kind of a very common process running across all these different types of software. And maybe I could encode the smarts for doing that, in addition to any patches that I might need to make, into some sort of taxonomy. And this became the ports collection. I think by the end of the year it was 200 ports or so.

And today it's over 5,100 ports in 52 different categories. There are languages, localized versions of software, German, Vietnamese, you name it. It's just a huge, huge collection of stuff. And it grows at a rate of about 50% a year so far, and there's really no end in sight, unfortunately.

There have been some forks of the ports collection. The NetBSD and the OpenBSD folks have taken them, made some slight tweaks, and the OpenPackages effort was launched by Daemon News in an effort to kind of bring all that back together again and create one common standard for the ports collection. And so they've been doing very good work there, and I believe they've gotten Darwin to sign on as well, and so Darwin will be substantially leveraging a lot of that effort.

As I said, what the Ports Collection does is it automates the fetching, the configuring, the building, the installation, and all of the dependency checking. So if you build some little GNOME client, for example, it goes off and installs reams and reams of stuff. It goes off and gets libraries, gets support infrastructure, a tremendous amount of stuff. Basically drags all of GNOME in with it, and you don't have to know about all of that. And it's a tremendous time saver.
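The dependency-checking step can be illustrated with a small sketch: given which ports depend on which, compute an order in which everything a port needs is handled first. This is a generic depth-first ordering in Python, not how the ports collection is implemented (that lives in makefiles), and the dependency data is made up for illustration.

```python
def install_order(port, depends_on):
    """Return ports in an order where every dependency comes before
    the port that needs it (depth-first post-order)."""
    order = []
    state = {}                      # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return                  # already scheduled
        if state.get(name) == "visiting":
            raise ValueError("dependency cycle at " + name)
        state[name] = "visiting"
        for dep in depends_on.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)          # all dependencies are already in order

    visit(port)
    return order

# Made-up dependency data for illustration only.
DEPENDS = {
    "gnome-client": ["gtk", "libxml"],
    "gtk": ["glib"],
    "libxml": [],
    "glib": [],
}
order = install_order("gnome-client", DEPENDS)
```

Asking for one small client pulls in its whole chain of libraries, which is the "drags all of GNOME in with it" effect described above.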

It also is the way that we build all of our packages. Because you've got the dependency information already encoded in the Ports Collection, it's very easy to pass it on to the package system, which was actually written after the Ports Collection was done. And it gives us a great way of automating the package building. We have a cluster of 16 or so machines which do nothing but 24 hours a day, build all 5,100 packages, and put them off into an FTP site.

And you can access this very easily just by doing pkg_add -r and the name of the package, and it will go off and figure out what version of the operating system you're running, what the closest FTP site is, and suck it down along with all of the dependencies. So if you don't want to build the stuff from ports and go through all the compilation time, you don't have to do that.

We also use the packages to do more than just,

[Transcript missing]

So like I say, it's really great that it saves a lot of time and makes it really easy to go build really complicated pieces of software without having to know anything about the dependencies or where the tarballs are stored or what particular weird configuration arguments you need to use. It's all pretty much hands off.

What sucks about it is all of this is encoded in Berkeley Make. And if I had it to do over again, I probably would have put all the meta information in XML or something and written a tool which just extracted the information and built the makefiles on the fly or something.

So I am actively looking now at some way of doing a ports next generation effort or something which makes it a lot easier to build documentation trees, web browsable pages and whatnot which describe the ports collection. And that's something that I'm really excited about. I hope maybe we can do hand in hand with Darwin.

So I think our community is at least as important as our bits. Obviously we've been around for nine years and we wouldn't have been around this long if we hadn't had some ways of keeping the whole thing cohesive. So one of the things that we're proud of is our core team, which is now a democratically elected body that is chosen by the majority of committers. And committers are people who have write access to the CVS repository. So if you're a committer, you can stand for election and every two years we choose a majority. It's a pretty good system and I think we should try it in this country sometime. Anyway.

We have 267 CVS committers now who are segregated in three different categories. The docs people who just work on docs, people who work on the bundled sources, and the people who work on the ports collection. So you don't need to be a cross-disciplinary person. You can choose an area to specialize on. And committers are approved by core. So it's not quite as difficult as it is to become a Darwin developer, for example, but you do need to have the unanimous approval of core before you make it in. So there's some level of quality control.

We also make allowances for sort of collective versus individual ownership. In the large picture, FreeBSD as a project owns the sources and nobody can really put an exclusive lock. But we do have kind of a non-exclusive lock model where somebody says, hey, I'm really maintaining this. Everybody should lockstep their changes through me because I understand it better than anybody else. And so we have maintainer bits that allow for that. And this works very well, actually, because it lets people feel a certain degree of ownership over it but without destroying sort of the collective ownership of the sources, which keeps the project together.

I think it's really important to get developers' hearts and minds if you want to keep them interested over the long term. What I mean by that is you can't just get a pile of bits thrown over the wall periodically from some developer who doesn't buy off on your vision or your overall strategy because that pile of bits will just hit the ground with a dull thud and start to smell almost immediately.

We tried that with BSD/OS and some other groups where code was thrown over the wall, but we didn't get the developers that came along with those bits, and that was a critical mistake in all cases, and in such cases, those projects essentially died off. So if I could say something to the Darwin community, it would be it's not just about the bits. It's about communicating your vision and really getting people to buy into it, and that's what gives you longevity.

So our identity, as it were, is also determined largely by the users. And this was kind of a shock to us. We started off as a bunch of hobbyists with a computing problem, and we did it really for our own edification and ego gratification and whatnot. But what we found is that we have pretty much been driven over the last five or six years by what the users want because they yell very loudly for it, and we have some pride in our work.

And so we have really become extremely user-driven. The yahoos of the world pretty much tell us what to do. They ask nicely because they've learned their lesson, but they do essentially drive our direction. And so our vision is essentially nothing more than listen to what the users ask for and try and fulfill that as best possible.

And still try to have fun, obviously, and people will still work on things that interest them the most. But if I had to point to the overall direction, it would be listen to the users. And I think that's what's going to happen with Darwin as well. The needs of OS X will dictate Darwin's direction if they follow a parallel line of evolution.

So where I think that the FreeBSD project can work with Apple on an ongoing basis is, one, I think that Apple is helping us to shatter the desktop myth, which is that BSD or FreeBSD are only good for servers: that because they run the Yahoos and Hotmails and companies like that, they're no good as a desktop OS. And we certainly know that that's not true, but Apple is helping to really drive the point home with a very large hammer, and we appreciate that.

It's a great foundation for a desktop operating system. And I think that having the same OS run on your server and your desktop also means that interoperability is more or less assured. You're going to have the right tools for doing network communications between the two. It's going to be much easier to administer a heterogeneous collection of servers and desktops, and I really think that there's a strong case to be made for one operating system being both.

And also, paradoxically, I think Darwin has more freedom to innovate now than we do. We have been around for nine years. We have a very stodgy and somewhat conservative user base. And as I said, they tend to drive our direction. So we can't make any sudden U-turns or swerve in violent directions without pissing a lot of people off.

Whereas Darwin is still comparatively young and they're well firewalled in some respects from OS X by this nice GUI desktop. And I think that Darwin has already done quite a bit of interesting stuff with I/O Kit and even the stuff pulled across from Mach that addresses some of the old problems in very new ways.

And that's something that I think we can learn from. Darwin can learn a lot from some of our methodologies and they can learn a lot from our technologies. But I think we can learn the most from Darwin's innovation. And I look forward to working with them in the future. Thank you.

Thank you, Jordan. So kind of in closing here, again, you really need to realize that BSD is an amazingly powerful part of Mac OS X. We've done a really good job of hiding it, but for you as a developer, it's there. It's something you can take advantage of and really should.

However, you should be careful about exposing the end user to the user experience. It is a pretty different world underneath the covers there. This is a very powerful tool for you to use as a developer. It's a powerful tool to help leverage new technology into Mac OS X. But it's up to you as the developers to be very cognizant of keeping it Macintosh in terms of ease of use and the kinds of things that you would expect on a Mac. And there's a very interesting juxtaposition there of goals.

It's also important for you to realize that many of the BSD parts of the system are actually optional in terms of how they're installed. So if you're writing an application on Mac OS X and you're depending on bits and pieces of BSD, make sure that you include the pieces that you need and that you know what's actually installed in the base system install. The BSD package that exists in the installer may not actually be installed by the user, and in some environments they may decide not to install it.

So be very, very careful about making assumptions about what pieces are actually in the base install of the system. And there are things you can do like packaging your tools as part of your application to help protect you from that. And we're working on additional things for, you know, long term for being able to do things like package management and stuff like that to help out. But it's something you should be very conscious of.
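One defensive pattern that follows from this advice is to look for a copy of the tool shipped with the application first, and only then fall back to whatever the system provides, treating "not found" as a normal case. The Python sketch below shows the idea; `find_tool` and its `bundled_dir` parameter are hypothetical names, and a real Mac OS X application would resolve the bundled path through its own bundle APIs.

```python
import os
import shutil

def find_tool(name, bundled_dir=None):
    """Return a path to an executable called name, preferring a copy
    shipped alongside the application; None if it isn't available."""
    if bundled_dir is not None:
        candidate = os.path.join(bundled_dir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which(name)       # fall back to whatever is on PATH
```

A caller that gets `None` back can disable the feature or tell the user what is missing, instead of failing when the optional BSD package was never installed.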

Make sure you leverage Darwin. I think this is an important part of Mac OS X, and I mean Darwin from the open source perspective. This is a pretty new paradigm for the Apple community to use, and something that we consider to be a real advantage that Mac OS X brings to the table. Take advantage of it. Also, leverage the FreeBSD community. They're very much part of our extended community.

Jordan and his team really bring a lot of experience and stuff to this aspect of the system, and we're really happy to be partnering with them and being part of that community. I would very much encourage you to leverage some of the knowledge and technology that exists there.

[Transcript missing]