Developing for UNIX on Mac OS X - WWDC 2007

Mac OS X Essentials • 1:03:24

Whether you're new to the platform or a veteran Mac OS X developer, find out how to adapt to the new POSIX-conformant industry-standard APIs available in Leopard, as well as how your UNIX, BSD, and Linux experience (and applications) translate to the Mac OS X universe.

Speaker: Kevin Van Vechten

Unlisted on Apple Developer site

Transcript

This transcript has potential transcription errors. We are working on an improved version.

[Kevin Van Vechten]

Good afternoon everyone. My name's Kevin Van Vechten. I'm the BSD Team Manager here at Apple in Core OS. And today I'll be talking to you about developing for Unix on Mac OS X. Our hope is this session will be a good introduction to the Mac OS X platform for those of you who may not have developed for it before, particularly those of you who have a Unix background.

And we'll be going over some of the high level features of Mac OS X from a programming perspective and some of the design decisions that we have and we hope you'll adopt in your applications to integrate successfully with Mac OS X. So what are these concepts of Mac OS X? Well, I'm going to give a little bit of a laundry list and these are the topics that we'll cover during the course of the talk.

First, Mac OS X has a lot of open source and we really like to adopt open standards and we hope you'll interoperate with open standards, too. Mac OS X has a very rich structure in terms of how it's organized on disk and how the APIs are layered with each other. Mac OS X is very dynamic. Most of the computers that it runs on are portables, the MacBooks.

They change networks a lot, they have a lot of different configuration changes all the time, so you need to be dynamic in your applications. And Mac OS X has a very sophisticated API that we hope will help you deal with this dynamic environment. Internationalization and accessibility are also important facets of Mac OS X. And Mac OS X has an event-driven architecture, which we feel is very good for the performance of the system, for dealing with low power situations, and just encourages a good overall design.

So, open source. Well, virtually all of Mac OS X's Unix layer is open source. So if you're dealing with the command line, libraries, or the kernel, almost everything you need is open source, which can be a good place to look at the code to see how things are really working, really dig down with the debugger, see how well the subsystems are interoperating.

And it truly is open source under a variety of open source and free software licenses. We have our Apple Public Source License 2.0, the Apache 2.0 is used for some projects. And then of course any projects that we bring in from the open source community we release under the same license that they were originally brought in under.

There's also literally thousands of open source projects available from the MacPorts and Fink collections of open source. And so if there's anything that we haven't already packaged in Mac OS X, hopefully you'll be able to go download it there easily and install it. Open standards are probably even more important than open source to be interoperable in a larger environment. And I'm really pleased to announce that due to a lot of hard work of people in Core OS, we actually are UNIX for the Leopard release.

( Applause )

[Kevin Van Vechten]

So not only do we adhere to the standards of UNIX, but we are a registered UNIX, and this is for the C API, the shell utilities, and our threading model. The good thing about standards, of course, is there are so many to choose from and we like to choose them all. So here's just a little laundry list of some of the various standards that we employ in our different features of the operating system.

So you may be wondering what I was talking about when I said Mac OS X is really dynamic. Well, Mac OS X really challenges a lot of the norms that you might be used to on an historic Unix system, the big server setting where everything pretty much always stays the same. Your IP address is probably always the same because you're serving up web pages and that just is what it is.

All of your devices are probably always the same. You have your RAID array plugged in, nobody ever unplugs it, or if they do there's a lot of angry phone calls. And you're probably operating 24 hours a day, 7 days a week, scheduling cron jobs at 2 in the morning because that seems like a perfectly good time to run.

Well, Mac OS X operates in an environment that really challenges all of these assumptions. And they just don't apply for most users in Mac OS X. In particular, most users are on MacBooks and they're roaming from network to network. As we are here, attending the conference, and you walk between buildings or walk down the street, you might be leaving the network provided by Apple and joining the network provided by Starbucks down the street, so you need to be able to adapt in your application to servers coming and going, services being available, not being available, not being available, being available again.

In addition to that, all of the hardware on our system, for the most part, is hot pluggable: USB, FireWire, SATA. 802.11 isn't even a plug, but you know, the network can associate and disassociate. And most of our computers in their default configuration will go to sleep, wake up, hibernate, so that assumption that a job will run at 2 in the morning doesn't really hold true because very few users have their desktop machine on at 2 in the morning. So you may need to be a little bit more flexible in how you time some of your events.

And the way to address all of these is to subscribe to notifications, to be aware of some of the changes that are happening in the system. One of the important things to do is invalidate your caches when you get notices that hint you should do so because perhaps some IP address that you have cached isn't any longer reachable. You might want to go look that up again and find a different route to it.

In those terms, you should be willing to try things again later. If a user tries to save a file, you might be able to save it in a few moments when the network's back up and the remote file server's available again. So it's a better experience to give the user some sort of option to retry something later rather than failing and quitting and leaving them no recourse. And in terms of network connections, be willing to renegotiate them if necessary. Again, if the service isn't there now, it doesn't mean it's never going to be available.
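The "try again later" advice can be sketched in Python as follows. This is a minimal illustration, not something shown in the session: the function name `retry_later`, its parameters, and the doubling delay are all assumptions made for the example.

```python
import time


def retry_later(operation, attempts=5, first_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    OSError covers the usual "server went away" failures, such as a
    refused connection or an unreachable host.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: let the caller offer the user a choice
            sleep(delay)
            delay *= 2  # wait longer each time so a dead network is not hammered
```

A caller would pass in the save or connect step as `operation` and, if the final attempt still fails, present the user with a retry option instead of quitting.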

So Mac OS X has a very rich structure and I'll start discussing this at the file system layer, since that's one of the first places you'll encounter when you're using a Unix system. We have a very standard file system hierarchy that's pretty similar to a BSD or Linux system. It has all the usual directories: /bin, /usr, /var, /tmp.

But we also have a bunch of other directories, which you'll see in the left hand column, which aren't very familiar from other Unix systems. Well, those are the directories that are actually visible in the Finder to most users. And all of the Unix directories, while they're available in the terminal or available from your programming API, aren't shown by default to the user in the Finder and there are no icons for them.

So you may be wondering what all of these directories are that Mac OS X has added to the hierarchy. Well, they're organized into file system domains. And file system domains are essentially a search path for resources on the system. When you want to run an application and it needs to load a dynamic library, the system will start iterating through the file system domains to see where that library can be loaded from. Or if there's a picture resource or a sound resource or something like that.

So the first place that's checked in the search path is the user domain. And inside the user's home directory we might have a Library directory and inside there, there might be frameworks or preferences or other user specific settings or user specific installed applications. If something's not found there, it'll proceed to the local domain, and the local domain is the / directory of the root volume. And in there there's a Library directory where there's the same structure as you found in the user domain. There are frameworks and preferences, fonts and other types of resources.

So this is basically where everything that should be available to all users on the local machine should be stored. But perhaps you're administering a site of several different Macs. It is possible to provide resources on the network to all the Macs in your site, and that appears in the network domain with the same structure again.

Finally, if all those other search paths fail, there is the system domain. And we really prefer that third parties treat the system domain as immutable. By default, everything in there is owned by root:wheel and the permissions are locked down sufficiently that no other user should be able to modify them without escalating privileges first.

This is the domain that Software Update will be modifying when you install future software updates. And if you make changes here, there's a good chance it'll trip up a future update, or worse, one of our future updates is going to trip up your application. So it's really best to just stay out of the system domain and just use one of the other domains for your data.
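The domain search order described above (user, local, network, system) can be sketched in Python. This is an illustration only: the function name `find_in_domains` is made up for the example, and the four directory paths are the conventional Library locations for each domain rather than something read from the system. Real applications would use the system APIs to enumerate domains.

```python
import os

# Search order from the talk: user, local, network, system.
DEFAULT_DOMAINS = [
    os.path.expanduser("~/Library"),
    "/Library",
    "/Network/Library",
    "/System/Library",
]


def find_in_domains(relative_path, domains=DEFAULT_DOMAINS):
    """Return the first existing path for relative_path, or None."""
    for domain in domains:
        candidate = os.path.join(domain, relative_path)
        if os.path.exists(candidate):
            return candidate
    return None
```

For instance, `find_in_domains("Fonts/Example.ttf")` (a hypothetical font name) would prefer a copy in the user's own Library over one installed for all users.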

So inside each of these domains we have something called bundles. And bundles are regular directories and files on the file system, but presented in a specific structure, and we recognize them as applications or frameworks or documents. They show up as a single icon in the Finder. The user can drag them, drop them, move them around. And behind the scenes, of course, it's a bunch of directories and files.

So application bundles, they should use relative paths to the resources that they contain. So there's a high level directory and then inside of that is your executable code, maybe strings that are your localizations into other languages, any pictures or sounds or QuickTime movies that you use in your application. The executable code should be using a relative path to access these resources so that when the user drags the application to a different location, they can still be found.
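A small Python sketch of the relative-path idea follows. It assumes the conventional application bundle layout (executable under `Contents/MacOS`, resources under `Contents/Resources`), which the session does not spell out; `bundle_resource` and the example names are hypothetical. In practice the CFBundle or NSBundle APIs do this lookup for you.

```python
import os


def bundle_resource(executable_path, resource_name):
    """Locate a resource relative to the running executable.

    Assumed layout:
        Example.app/Contents/MacOS/Example     (executable)
        Example.app/Contents/Resources/...     (pictures, sounds, strings)
    """
    macos_dir = os.path.dirname(os.path.abspath(executable_path))
    contents_dir = os.path.dirname(macos_dir)
    return os.path.join(contents_dir, "Resources", resource_name)
```

Because the result is derived from where the executable currently lives, it stays correct after the user drags the bundle somewhere else.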

And new in Leopard, application bundles can be signed for integrity. This is a pretty neat new feature. Something that allows us and you to check that the application is complete and unaltered and really what you had installed initially, so you can detect corruption or detect when the user has gone and moved something out of place and hopefully be able to deal with that situation more gracefully, knowing that that's indeed what happened.

Framework bundles are our organization of dynamic libraries. And framework bundles do indeed contain an ordinary dynamic library. But also inside the bundle, we package all of the header files, so instead of scattering a bunch of header files all over /usr/include that belong to all the different libraries on your system, we package them neatly in bundles. And when you link against a certain dylib, you're also going to be compiling against the correct set of header files for that dylib.

There's also a convention we have for storing multiple versions of the dylib in the same framework for compatibility. So, for example, in Leopard we have the Python framework. And on Tiger, we had Python 2.3. On Leopard we have Python 2.5, and they aren't perfectly binary compatible. Well, applications that linked against 2.3 can continue to access it at the historic 2.3 location. Applications new in Leopard, when they're compiled, they'll read the Current symlink in the bundle to see that they should link against the 2.5 version and then that's their compatibility version going forward.
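The versioning convention can be illustrated with a short Python sketch. It assumes the usual framework layout of a `Versions` directory holding one subdirectory per version plus a `Current` symlink; the helper names are invented for this example.

```python
import os


def current_version(framework_dir):
    """Return the version name that Versions/Current points at."""
    link = os.path.join(framework_dir, "Versions", "Current")
    return os.path.basename(os.path.realpath(link))


def library_for(framework_dir, name, version=None):
    """Path to the library for a pinned version, or for Current if None."""
    version = version or current_version(framework_dir)
    return os.path.join(framework_dir, "Versions", version, name)
```

An older application behaves like `library_for(fw, "Python", "2.3")`, while a newly built one resolves `Current` once at build time and records that version.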

It's also possible for framework bundles to contain other resources in the same way that applications do, but more importantly it's possible for them to contain command line tools. And you'll see with the Python framework, the Tcl framework, the Ruby framework, some of these other frameworks on the system, they contain command line tools and it's a good way to keep things organized neatly instead of scattered across /usr/bin.

Document bundles are where you can store your application specific data. Generally there's a primary document such as an XML file and then again, you can have auxiliary resources and other files in the bundle. So as for the API for dealing with these bundles, there are several different ways. First we have the NSSystemDirectories.h header file, and this is what allows you to enumerate the file system domains to look for where you should find bundles to begin with.

Then you can use the POSIX API for unstructured access to the bundle, that's manipulating the bundle in terms of being a directory containing a bunch of files. But if you want more structured access that provides some convenience functions for you, you should look at Core Foundation's CFBundle header or Foundation's NSBundle header to get some of these higher level APIs for manipulating bundles.

So these higher level APIs lead me into the discussion of sophisticated API on Mac OS X. The API on Mac OS X is organized in several layers, all of which are available for your application to use. But at the high layer, we have Carbon, Cocoa, X11, and Java.

And these are all interfaces that present some UI on the system and interact with the user in a GUI fashion. At the mid layer is Core Foundation, and this is what Carbon and Cocoa are layered on top of. Core Foundation is a nice API to use when you want to maybe just be a command line utility or a system service, but also interact with a lot of the high level features of Mac OS X. And of course all of this is layered on top of the Unix layer, which is the UNIX standard.

So to develop with our APIs you need to use Xcode Tools, which is our name for the compiler, the debugger, and these other tools on the system. Basically the GNU toolchain at the command line level. It's an optional install in general; not everyone has the Xcode Tools installed.

So if you're new to the Mac, you might be surprised to see that it's a Unix that doesn't have a C compiler installed by default. In addition to that, in Leopard, the command line tools of Xcode aren't even always installed. So when you're installing, make sure that the command line tools package is checked; otherwise, just installing Xcode Tools isn't enough. You'll get Xcode, but you won't have /usr/bin/gcc.

But when you've installed Xcode and its command line tools, you'll have gcc, gdb, make, libtool (which, for legacy reasons, we call glibtool on our system to differentiate it from the libtool that we have internally), autoconf, automake, all the usual GNU toolchain that you'd expect.

So frameworks, as I discussed earlier are a Mac OS X concept of packaging up dynamic libraries into a bundle. Well there are GCC extensions that we have added that deal with frameworks directly. So when you include a header file in your sources, you can actually include with a path that is the name of the framework and then the name of the header. And this actually gets examined automatically by GCC. It looks in whatever your current framework search path is and then looks in the current version of the headers for that framework.

There are also some GCC flags that you can use to manipulate the standard search paths for frameworks, so if you're linking against a framework that's not in the standard /Library/Frameworks location or /System/Library/Frameworks location, you can manipulate the search paths and find the frameworks where you need to find them. And these basically work equivalently to the -L or -l flags that you're more familiar with in GCC. Now we also have a concept on Mac OS X called software development kits, or SDKs.

And in a weird way I guess you could think of them as a bundle of all the bundles on the system. So if you took all the frameworks for a current release and packaged them up into an SDK then you have a snapshot of all the interfaces for that release. And what this lets you do is on a 10.5 system compile in a way that will be compatible with 10.4 or 10.3 or some earlier release.

Now what this does behind the scenes is GCC has some extra flags that were added, -isysroot and the linker argument -syslibroot. And basically these paths just get prepended to any of the normal search paths. So it goes into the SDK that you specify and then looks for the system libraries and headers when compiling and linking your application.

Deployment targets are a mechanism that we use within one SDK, or when you're just compiling against the base system without an SDK. And this lets you declare your target environment in terms of which APIs are available. So there's the GCC flag, the -mmacosx-version-min flag. And when you set this to a version, let's say 10.4, it'll do two things. It'll make all of the functions that were defined later than 10.4 invisible for the purpose of this compile, so you won't accidentally link against any functions that are too new for that release.

It'll also warn you about any functions that were deprecated in 10.4. So perhaps there's a better function available that you should be using. The function was valid in 10.3, but in 10.4 discouraged, so you'll get a warning. And this is all implemented in the AvailabilityMacros.h header file. And in previous releases, you would set the MACOSX_DEPLOYMENT_TARGET environment variable, which still works in Leopard but is deprecated; we recommend setting the -mmacosx-version-min flag on GCC.

So when you compile your binaries, they'll get compiled into the Mach-O file format, which is our executable file format. We don't use ELF on Mac OS X. It behaves pretty similarly to ELF. There are a few small semantic differences. Probably the most likely one you'll encounter is that in Mach-O there is a difference in how we treat plug-ins and dylibs.

Plug-ins are what we refer to as bundles, not to be confused with the bundles I had been talking about earlier. And dylibs are pretty much the same as an ELF shared library. And the difference here is the bundle actually has some knowledge about what application is intended to load it.

We also provide a two-level namespace in Mach-O. So not only do we store the symbols that a binary will reference from a library, but we also store the name of that library. And what this achieves is any library or any plug-in can define the same symbol and it's unambiguous which symbol the application is going to get. If you run into trouble from the two-level namespace, it is possible at compile time to specify a flat namespace, and this will behave more similarly to other Unix systems.

One of the big features that Mach-O gives us is a universal binary. And this is one file that supports many architectures. And currently on Leopard, we're supporting four architectures in Mac OS X. We have PowerPC and Intel, but we also have the 64-bit varieties of both of these.

And one of the points I'd really like to stress about universal binaries is it's quite literally the same sources that get compiled multiple times to produce the same binary that can run on many architectures. And the way you achieve this is through the GCC flag -arch, and you can specify as many or as few architectures as you want. We encourage you to specify all the possible architectures so that your application will run anywhere. Now sometimes it is necessary to know a little bit about the platform that you are compiling on.
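To make "one file that supports many architectures" concrete, here is a Python sketch that lists the architectures recorded at the start of a universal file. It is an editorial illustration, not material from the session: it assumes the fat header layout from the Mach-O headers (a big-endian magic and count, followed by 20-byte entries), and `fat_architectures` is a made-up name. On a real system the `lipo` and `file` tools report the same information.

```python
import struct

FAT_MAGIC = 0xCAFEBABE        # big-endian magic at the start of a universal file
CPU_ARCH_ABI64 = 0x01000000   # flag OR-ed into the CPU type for 64-bit variants
CPU_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    18: "ppc",
    18 | CPU_ARCH_ABI64: "ppc64",
}


def fat_architectures(data):
    """Return the architecture names listed in a universal binary header.

    Returns an empty list when data does not start with the fat magic,
    for example for a single-architecture Mach-O file.
    """
    if len(data) < 8:
        return []
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return []
    names = []
    for index in range(count):
        start = 8 + index * 20
        if start + 20 > len(data):
            break  # truncated header: stop rather than read past the end
        # Each entry holds: cputype, cpusubtype, offset, size, align.
        cputype, _sub, _offset, _size, _align = struct.unpack_from(
            ">iiIII", data, start)
        names.append(CPU_NAMES.get(cputype, "cpu %d" % cputype))
    return names
```

Passing the first few hundred bytes of a four-way universal binary would yield the four architecture names in the order they were stored.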

To that end we have some preprocessor macros that are available. __BIG_ENDIAN__, __LITTLE_ENDIAN__, and __LP64__ are the ones we encourage you to use. So if you have a data structure that is dependent on the endianness of the architecture, you can test for these macros. Or if you need to make data structures wider for 64-bit, you can do that.

On the off chance that you're actually doing some assembly programming or literally need to know which architecture you're on, you can test for architecture specific macros as well. But it's probably a better idea wherever possible to use the more generic big- or little-endian macros or the LP64 macro so that you're insulated against any future changes.
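The macros in the talk are C preprocessor symbols, and Python has no preprocessor, so the following sketch shows the same concern at run time instead: keeping a data layout independent of the host's byte order, and asking which order the host uses. The record format and function names are invented for illustration.

```python
import struct
import sys


def encode_record(record_id, length):
    """Serialize two 32-bit fields in a fixed (big-endian) byte order."""
    return struct.pack(">II", record_id, length)


def decode_record(data):
    """Inverse of encode_record; works the same on any architecture."""
    return struct.unpack(">II", data)


def native_is_big_endian():
    """Runtime counterpart of testing a big-endian compile-time macro."""
    return sys.byteorder == "big"
```

Data written this way by a PowerPC build can be read back unchanged by an Intel build, which is the property the generic macros are meant to help you preserve in C.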

We support static and dynamic linking on Mac OS X. As a general rule, all the libraries that Apple provides will be dynamic. And we do this so that we can ship a software update and provide any bug fixes that we might have. If we provided static libraries, those bug fixes wouldn't really help third party applications.

When you are linking, we really recommend that you use the GCC linker driver directly. Don't use ld or ld64. GCC has some smarts. It can do the right thing, which might not be immediately obvious when invoking ld manually. And the way to use the GCC linker driver is as simple as taking all of your object files and linking them into an executable with gcc. For dynamic loading we use the dlopen interface. A compatibility version was taken from open source and introduced in Panther. It became native in Tiger. And now, actually, in Leopard it is the preferred solution. If you're going to be dynamically loading code, you should use dlopen.
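As a small illustration of the dlopen model, Python's `ctypes` module wraps `dlopen` and `dlsym` on POSIX systems. The sketch below opens the running process image and looks up a standard C function by name; the wrapper name `lookup_strlen` is an assumption for the example, and a C program would call `dlopen` and `dlsym` directly.

```python
import ctypes


def lookup_strlen():
    """Open the running process image and look up strlen by name.

    ctypes.CDLL(None) is the ctypes spelling of dlopen(NULL, ...) on
    POSIX systems; attribute access performs the symbol lookup.
    """
    process = ctypes.CDLL(None)
    strlen = process.strlen
    strlen.argtypes = [ctypes.c_char_p]
    strlen.restype = ctypes.c_size_t
    return strlen
```

Calling the returned function with a byte string goes straight to the C library implementation that was resolved at run time.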

So some of the other Unix API that we have on the system and some of the changes that we've been making are that we now conform fully to the UNIX 03 standard. So UNIX 03 command behavior is the default in Leopard. We did keep non-standard behavior wherever we could: if it didn't directly conflict with the standard and it was something that was being used by scripts or programs, then we tried to keep it in for compatibility.

If you run into any cases where there is an incompatibility with a new behavior, it's possible to set an environment variable known as COMMAND_MODE. By default, if it's empty or it has the value unix2003, that's what you're going to get on Leopard. If you set it to the string legacy, it'll make commands behave like they used to behave on 10.4 or earlier.

There are also a few places where we automatically set this environment variable for compatibility. The first of these places is in installer postflight and preflight scripts. If you have an installer package that was created on Tiger or an earlier version, then when the installer runs it'll go ahead and set this environment variable. So hopefully you won't have any problems with any of your existing installer scripts. However, if you create a new package on Leopard, we won't set this. We hope you are using the standard behavior of the commands. And so we'll assume that.

We also set this environment variable for compatibility for any applications that link against Core Foundation or a higher level framework. And if we can detect that they were linked on 10.4 or earlier, we'll set the environment variable so that if they fork and exec, or popen, or call system, or any of these other APIs that reach out to the shell, hopefully, again, the shell will behave in a manner that they expect.

But for any applications compiled on Leopard, we hope that you have done testing on Leopard and are using the new standard behaviors. So to give a little bit of a practical example of how this might impact you, on Leopard now, with the UNIX standard, echo -n hello world quite literally echoes -n hello world.

So if you wanted to revert that temporarily to the legacy behavior, you could set COMMAND_MODE equal to legacy, run the same command, and you would get hello world without the trailing newline. But as I said earlier, wherever things didn't directly conflict with the standard, we tried to preserve them for compatibility.

If you're not using the POSIX shell, but let's say you're using Bash, which is outside the scope of the standard, then we've preserved the standard Bash behavior, which is to honor the -n and not print a trailing newline. And of course, what we would hope is that you do it the POSIX standard way, suppressing the newline by including a \c at the end of the string, and then hopefully that'll work in the widest variety of places.
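Setting the variable for one invocation only, as in the echo example, can be sketched in Python with a per-child environment. The helper name `run_with_command_mode` is hypothetical; the variable name and its `legacy` value come from the talk, and the variable only has an effect on Mac OS X.

```python
import os
import subprocess


def run_with_command_mode(argv, mode):
    """Run argv with COMMAND_MODE set for that one child process only."""
    env = dict(os.environ)        # copy, so the parent environment is untouched
    env["COMMAND_MODE"] = mode
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

For example, `run_with_command_mode(["/bin/sh", "-c", "echo -n hello world"], "legacy")` requests the pre-Leopard behavior for that single command without changing how the rest of the program's children behave.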

So not only are there command changes in Leopard, but we also had some source compatibility changes. And you'll probably get some compiler warnings and compiler errors that you didn't use to get before Leopard. Most of them tend to affect system level services, not so much application level services, so hopefully a lot of you won't even notice.

But if there is a case where you absolutely cannot live with the new standard behavior, it is possible to define the _NONSTD_SOURCE preprocessor macro and that will opt out of the UNIX 03 behavior. But I should mention that _NONSTD_SOURCE is not available on 64-bit. Since 64-bit is a new platform with Leopard we really wanted to start with a clean slate. And 64-bit is completely UNIX conforming from the beginning.

So the difference of setting or not setting this preprocessor macro is that functions that have changed behavior are invisibly suffixed with $UNIX2003. So the non-suffixed versions still exist, and that's what applications compiled on Tiger or earlier continue to call, so they continue to behave as they always did. But if it's compiled on Leopard or later and it's using one of these functions that changed due to the UNIX standard, then it'll have an invisible $UNIX2003 appended to it and get the new UNIX behavior. This has a few implications that I'll discuss in a moment.

The first of these implications is that you really need to include header files and not rely on implicit function declarations. Because if you do, you're not going to get the magic that invisibly suffixes $UNIX2003. In the worst cases, in some places in your code you might have the header file in scope and be getting the new behavior, and in other places in your sources you might not have the header in scope and be getting the legacy behavior, and mixing and matching legacy behavior with the new behavior could lead to unpredictable or undesired results. So you really should warn about missing prototypes. If you see missing prototypes for standard functions, go see what header file needs to be included to make sure that you're getting all the right preprocessor magic.

The other thing this affects is binary compatibility. Binaries that were compiled before Leopard still should run on Leopard as mentioned. But if you compile a binary on Leopard and don't use an SDK, it might not run on an earlier version of Mac OS X. This was always the case, but we've just introduced a whole lot of additional ways this could fail with the UNIX compatibility.

And what the symptom would be if you do choose to build a Leopard binary without an SDK and run it on an earlier system is that you will get undefined symbols at run time, and most of these undefined symbols will have the $UNIX2003 suffix on the end of them.

Now all the Unix API we actually package into one library, libSystem. And for compatibility, since this is often in multiple different libraries on other Unixes, we have symlinks to libSystem, so hopefully you won't have to change too many of your makefiles. But for performance reasons, we do have it all packaged in a single dynamic library.

Another difference in the Unix behavior from other Unixes is that our standard calls, getaddrinfo, getpwent, getgrent, actually talk to our directory service, and that's part of our Open Directory infrastructure. What this means is that unlike other systems where you might just be reading the files in /etc, you know, /etc/passwd or /etc/group, on Mac OS X you'll also be consulting the directory service local database, you'll be consulting LDAP if you're bound to an LDAP network, Active Directory if you're bound to an Active Directory network, Bonjour for host names on the local area net, and so basically you get all of this additional information free of charge from the standard Unix APIs.
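The practical upshot is that code should go through the standard lookup calls rather than parse files under /etc. Python's `pwd` module wraps the C library's getpwnam/getpwuid/getpwent family, so a minimal sketch looks like this (`describe_user` is a name made up for the example):

```python
import os
import pwd


def describe_user(uid=None):
    """Look up an account through the C library's getpwuid().

    Returns (name, home directory, shell), or None when no directory
    source knows the uid.
    """
    if uid is None:
        uid = os.getuid()
    try:
        entry = pwd.getpwuid(uid)
    except KeyError:
        return None
    return entry.pw_name, entry.pw_dir, entry.pw_shell
```

Because the lookup goes through the system's resolver, whichever directory sources the machine is bound to answer the query, with no change to the calling code.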

Another big change in Leopard, with respect to Open Directory, is that NetInfo was removed from Leopard and everything has been consolidated on Open Directory. So if you have scripts that use nicl or niutil to manipulate the NetInfo data store, you should be changing over to use dscl, that's the Directory Service Command Line tool. It will let you manage your user and host records on the local data store. It's also been present since Panther, so you should be able to use it compatibly across several releases.

There are also alternative GUI methods that were added in Leopard to do some common tasks. One of these is it's now possible to enable the root user, should you need to do so, using Directory Utility in the Utilities folder. It's also possible to control-click an account in the Accounts pane of System Preferences and that will bring up a sheet that lets you edit a lot of the advanced characteristics of the user account, including the user ID, the home directory path, and some of these other things that aren't normally editable.

Another interesting note about Open Directory on our platform is that Open Directory is really what handles all of our authentication and PAM is mostly a shim to Open Directory. So it takes the standard PAM API and calls out to Open Directory to perform authentication. We do this because there are a lot of standard Unix utilities, ssh, ftp, login, that all rely on PAM and we can bridge them to Open Directory in a unified way. However, loginwindow does not use PAM, so if you're expecting your PAM modules to work with loginwindow, it's just not going to work. There are login authentication plug-ins that are a different architecture.

So some of the higher level Mac OS X API that we have layered on top of the Unix API has a few concepts that I'd really like to emphasize. And the first of these is Unicode. Unicode is a big part of what we do in Mac OS X. Unicode, as you probably already know, is a standard that provides a unique number for every character. All the characters for all the different encodings that are out there were unified into Unicode, which hopefully lets you display characters from any language in the same encoding.

One of the ways this most directly impacts the Unix layer of Mac OS X is that HFS+ file names are Unicode. They're not a random, or I shouldn't say random, an arbitrary blob of data like they might be on UFS, where you can store pretty much anything you want in there aside from slash, I guess. But on HFS, there is an encoding. It's Unicode, specifically UTF-16.

And our convention for the POSIX APIs is to use UTF-8. And the reason we use UTF-8 is it doesn't have any embedded null characters, which means it works with the standard C string APIs such as strlcpy or strlcat. And yes, you should be using strlcpy and strlcat to avoid any buffer overflows that you might get with strcpy and strcat.

Another way that Unicode directly impacts your Unix applications is that Terminal defaults to UTF-8. So all of the command line input that you get in your argv array is going to be UTF-8 strings by default on almost everybody's Mac. Any standard input that's typed into the terminal and piped to your process is going to be UTF-8.

And of course you should produce UTF-8 because that's probably what all of the other tools are going to expect and that's what the file names should be opened as. So if everybody plays with UTF-8, then for the most part you'll have maximum compatibility in terms of character encodings with all of the other command line tools on the system.

I don't know if you're too familiar with UTF-8, but it breaks some of the common assumptions that have held true for a long time in Unix programming in the United States, which was that one character was one byte. No, that's not any longer the case. UTF-8 is a multibyte encoding. So characters can be anywhere from one byte to 6 bytes or so. It's also possible for a character to take up multiple columns on the display.

So what this means is in the world of UTF-8, the strlen function is really only useful to determine the size of a buffer. It doesn't tell you much about the string at all. It doesn't tell you the number of characters, it doesn't tell you the number of columns that it'll take to display.

And you should also be aware when you're manipulating strings that picking an arbitrary location in the string to break it into two pieces might split the middle of a sequence that represents one character, which leaves you with an invalid character on the end of one string and an invalid character on the beginning of the second substring.
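As a minimal sketch of the two pitfalls just described (this is not code from the session), the following C helpers count code points and find a safe place to split a UTF-8 string without relying on the locale. The function names `utf8_code_points` and `utf8_safe_split` are hypothetical, and the sketch assumes the input is already valid UTF-8. It counts code points only; it says nothing about display columns.

```c
#include <stddef.h>
#include <string.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Count code points in a valid UTF-8 string: every byte that is not a
 * continuation byte starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (!is_continuation((unsigned char)*s))
            count++;
    }
    return count;
}

/* Given a desired byte offset, move it back to the nearest offset that
 * does not fall in the middle of a multibyte sequence. */
size_t utf8_safe_split(const char *s, size_t offset)
{
    size_t len = strlen(s);
    if (offset > len)
        offset = len;
    while (offset > 0 && is_continuation((unsigned char)s[offset]))
        offset--;
    return offset;
}
```

For a string made of three characters that each encode to three bytes, `strlen` reports 9 while `utf8_code_points` reports 3, and asking to split at byte 4 is moved back to byte 3.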

So what can you do to avoid this? Well we have the Unix standard wide character strings API. And the fundamental unit of the wide character strings is the wide character type. And a wide character type represents a single Unicode code point. And there are APIs available to convert from Unicode to wide character strings.

And a bunch of APIs available for manipulating wide character strings once they're in that format. And then you can convert back to UTF-8 and print them to the display and get the results you wanted. So I'm going to give a real world example of where wide character strings come in handy.

In Leopard we brought in the ncal application, which prints this nice little calendar on the terminal. And as you can see it's printing the first two characters of every day of the week in English. Well, what happens if you set the locale to a multibyte locale? The idiom I'm using of LANG= sets the environment variable LANG for the single invocation of cal.

You can see all the days of the week are question marks. Well, why did this happen? It turns out that, let's say, a day of the week character has a Unicode code point that's represented by a three byte UTF-8 sequence. And the ncal program was simply copying two bytes of the string into the destination buffer because it only wanted the first two characters. So what you ended up with is question mark, question mark because you had this illegal UTF-8 sequence. It wasn't a complete character.

The terminal didn't know how to render it. It just put a question mark there as a placeholder. So what can we do to fix that? Well it turns out it's possible to use the wide string API and in the first block of code you see up top you can convert from UTF-8 into a wide character string.

In the middle block we're actually iterating over the string and using the wcswidth function, which is determining the display width of the string. And what we want to do is keep chopping characters off the end of the string until we get the display width down to two. And then we can convert back to UTF-8 and print to the screen.

And as it turns out, that makes the display much nicer. Another interesting thing to note in this example is each of these characters is rendered by Terminal as two columns wide. So we were limiting the display width to two, we ended up with a three byte sequence, and it's actually only rendering as a single character on the screen.
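The slide code is not reproduced in this transcript, so the following is only a sketch of the three steps described: convert to a wide string, chop characters off the end until `wcswidth` reports a narrow enough display width, then convert back. The function name `truncate_to_columns` and the fixed 64-character working buffer are assumptions made for illustration. The caller is expected to have called `setlocale(LC_ALL, "")` so that the multibyte conversions follow the user's locale.

```c
#define _XOPEN_SOURCE 700   /* exposes wcswidth on some C libraries */
#include <stdlib.h>
#include <wchar.h>

/* Copy src into dst, shortened so it occupies at most max_cols display
 * columns. Returns 0 on success, -1 if the text cannot be converted,
 * contains non-printable characters, or does not fit in dst. */
int truncate_to_columns(const char *src, int max_cols,
                        char *dst, size_t dst_size)
{
    wchar_t wide[64];

    /* Step 1: multibyte (UTF-8 in a UTF-8 locale) to wide characters. */
    size_t n = mbstowcs(wide, src, 63);
    if (n == (size_t)-1)
        return -1;
    wide[n] = L'\0';

    /* Step 2: drop whole characters from the end until the display
     * width fits. wcswidth returns -1 for non-printable characters. */
    while (n > 0) {
        int width = wcswidth(wide, n);
        if (width < 0)
            return -1;
        if (width <= max_cols)
            break;
        wide[--n] = L'\0';
    }

    /* Step 3: wide characters back to multibyte for printing. */
    size_t written = wcstombs(dst, wide, dst_size);
    if (written == (size_t)-1 || written >= dst_size)
        return -1;
    return 0;
}
```

Because the trimming happens on whole wide characters, the result can never end in a partial byte sequence, which is what produced the question marks in the original output.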

Aside from the Unix standard wide character string API, there is some higher level API on Mac OS X for dealing with Unicode strings. These are CFString and NSString. And the reason you might want to use these is because they work very well with all the other CF and NS types.

So if you're doing a lot of programming at the higher layer, you probably want to use their string types so that you can mix and match with their other objects. And CFString is actually an abstract representation of a UTF-16 string and it supports many character encoding conversions. So wherever your source data is from you can create a CFString from that and you can also convert to any encoding that you need.

And this is just a simple example of creating a CFString from a UTF-8 encoded argv vector. So whatever argument you are passing to your command line tool, we create a CFString from that, then convert the string back to a C string so we can print it out.

In addition to Unicode, Mac OS X has an event driven architecture. And the event driven architecture really focuses around the concept of a run loop. And the run loop provides much more than select. If you're used to the standard command line programming, then you'll probably have a select loop, which tells you when data is ready to be read from a file descriptor or written to a file descriptor. If you're used to using a GUI toolkit like KDE or GNOME, they have more advanced concepts than a select loop.

And we have run loops on Mac OS X. And run loops accept a variety of sources, more than just file descriptors, but they do accept file descriptors, too. And in Leopard we have a new CFFileDescriptor type, which should help with compatibility. But you can also be waiting for events regarding network sockets, timers, so you can have a timer fire at an arbitrary point of time in the future, Mach IPC messages, and there are many more sources.

You can even create your own sources. So if you do have a select loop that you absolutely have to have in your application, and there's really no way to integrate it with a run loop, it might be possible to create a custom source. Do your select loop on a separate thread.

Do whatever type of event recognition you were historically doing on a separate thread and then send a message over to the main thread. One of the great advantages of run loops is that they do automatic event dispatch via callbacks. So when you make a select loop, yeah, it will block until there is data waiting to be operated on on a socket, but then you have to figure out which socket it was on and then decide what you're going to do with it.
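To make the contrast concrete, here is a rough sketch of the hand-written dispatch a select loop needs; it is illustrative and not taken from the session. The `fd_source` structure, `select_dispatch_once`, and `drain_fd` are hypothetical names. The code itself has to pair each descriptor with a handler and work out which one fired, which is the bookkeeping a run loop does on your behalf.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical pairing of a descriptor with the code that handles it. */
typedef void (*fd_handler)(int fd);

struct fd_source {
    int fd;
    fd_handler on_readable;
};

/* Example handler: read and discard whatever is currently available. */
void drain_fd(int fd)
{
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);
    (void)n;
}

/* Wait once for any source to become readable, then dispatch by hand.
 * Returns the number of handlers invoked, 0 on timeout, -1 on error. */
int select_dispatch_once(const struct fd_source *sources, int count,
                         int timeout_sec)
{
    fd_set readfds;
    int maxfd = -1;

    FD_ZERO(&readfds);
    for (int i = 0; i < count; i++) {
        FD_SET(sources[i].fd, &readfds);
        if (sources[i].fd > maxfd)
            maxfd = sources[i].fd;
    }

    struct timeval tv = { timeout_sec, 0 };
    int ready = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;

    /* select only says that something is ready; finding out which
     * descriptor it was and what to do about it is up to us. */
    int dispatched = 0;
    for (int i = 0; i < count; i++) {
        if (FD_ISSET(sources[i].fd, &readfds)) {
            sources[i].on_readable(sources[i].fd);
            dispatched++;
        }
    }
    return dispatched;
}
```

Every library that wants events in this model has to hand its descriptors to whoever owns the loop, which is the coupling the run loop design avoids.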

The way the run loop is modeled is that when sources are installed into the run loop, they each have a callback associated with them and you just get your callbacks called whenever an event happens. There's really no dispatch that you need to take care of at all. It's all taken care of for you by the framework.

So, what does that get us? Well it gives us seamless interaction between applications and frameworks. One of the advantages of that is applications and frameworks won't be stealing each other's events. They can all install sources cooperatively into the run loop and they can all handle the events that they're supposed to handle. They're not going to be clobbering each other.

It also gives frameworks an amount of autonomy. It's possible for a framework to install a source into the run loop without any knowledge on the part of the application. If you're doing a select loop, the framework's going to have to give you a file descriptor or some other type of handle that you can select on, but with the run loop, they can just install their sources into the run loop. The application doesn't even have to know. So the framework can be doing it at initialization time. It can get every event that it needs.

The application can be compatible across multiple versions of the framework as the framework changes behind the scenes how it deals with events. And the way frameworks can do this is they can access the run loop, which is stored as a per-thread global and is accessible via the CFRunLoopGetCurrent API. That returns a reference to the current run loop and that's where you can install all of your event sources.

Another advanced feature of run loops is that it's possible to run them in a mode. And a mode is essentially a subset of all the installed sources in the run loop. So perhaps you want to listen for all events and then once you received one you only want to listen to a subset of events until that operation's done. That's what run loop modes can be used for. We also have a concept of observers in the run loop. And observers are basically callbacks that get called in each point of the cycle of the run loop regardless of whether any events have been delivered or not.

So what are some of the best practices for using run loops? Well, you shouldn't block the main thread. If you do, you're going to get that spinning beach ball cursor that we're all so fond of. And to this effect, you should keep your callbacks relatively fast. The more work you do in your callback, the more likely you're going to be blocking the main thread for a period of time long enough that it's going to cause the spinning beach ball cursor to appear. The window server is what displays that when it feels that your application's no longer responsive.

The best way to keep your callbacks fast is to post notifications or queue work for another thread to do so that the main thread is then free to respond to future events. And the new NSOperation API in Leopard is a great way to queue those types of tasks for other threads to handle.

So here's a real world example of using a run loop in a command line program for Unix. We have the iostat tool, which will print statistics from time to time about disk activity on the current machine. And one of the things we noticed is that it only reported activity about whatever disks were plugged in at the time the command was invoked. If you added another disk to the machine it wouldn't print any statistics. If you removed a disk from the machine it would crash iostat.

So what we were able to do is use the I/O Kit framework and install some notifications. So first we established a connection with the I/O Kit framework and then we created what is known as a matching dictionary, saying that we wanted to receive notifications about all events pertaining to whole media devices. So that's an entire physical disk or logical disk, not one partition on it, but the whole disk itself.

And once we've created this matching dictionary, which indicates what type of events we're interested in, we can add our callbacks. And in this example we're adding two callbacks. We're adding a callback called the add disk callback, which gets invoked anytime one of these whole media devices appears for the first time on the computer. And then we also added a callback called the remove disk callback, which will get called anytime one of these whole disk devices is removed from the current device tree.

Now we will get a reference to the current run loop and add the I/O Kit notifications as a source to the current run loop in the default mode. So once we have all of our callbacks set up we can go ahead and kick off the run loop. While the iostat tool's running, we'll set a timeout of one second, actually not by default, but in one mode of operation it will update the display every second. And so instead of just doing a sleep for that second, we'll actually run the run loop for that period of time, and if any disks appear or any disks disappear our callbacks will get invoked and we'll update our internal data structures. After the period of time has elapsed we'll update the display and loop around and do it again, wait for more disks to appear or disappear, and in the meantime keep printing all the statistics of the current disks. I would like to give you a demo of this in action.

(Pause in speech 0:46:21.8 to 0:46:43.2)

So the -d argument is going to suppress a little bit of extra information and then I'll have it repeat for a really high count so it probably won't finish before the demo's over. So you can see right now it's printing statistics about two disks and we can take a disk image file here and mount the disk image and it shows three disks.

Well, wrong icon. Underneath there I have a second disk image. You can see that there are four disks mounted. We can eject one of these disks, it goes back down to three, and eject another, goes back down to two. So this is a real world example of a command line utility that actually can benefit from using a run loop, receiving event notifications and dealing with Mac OS X in a dynamic way as opposed to just assuming a static configuration.

(Pause in speech 0:47:39.0 to 0:47:47.0)

Turn back to the slides.

(Pause in speech 0:46:47.2 to 0:47:54.0)

So there are a lot of frameworks on Mac OS X that provide a lot of run loop sources for events that you probably would be interested in, in your applications. I/O Kit is the one that I just demoed, and one of the reasons to use I/O Kit is that it provides everything in the device tree. Not everything in Mac OS X appears in /dev. Unlike Linux or unlike BSD, where pretty much all devices appear there, I/O Kit has its own device tree and only a few things are exported to /dev for compatibility.

And I/O Kit will give you these callbacks anytime a device appears or disappears, and while the matching dictionary I showed you was relatively simple, we were just matching on any whole device, we could really fine tune that search and say only USB devices, only FireWire devices, only FireWire devices with certain vendor IDs. There's all sorts of things you can match on.

Built upon the I/O Kit framework is the Disk Arbitration framework, which doesn't operate on the devices themselves but more on the logical volumes and the partitions that devices contain. Disk Arbitration has API for mounting disks and ejecting disks, and more specifically run loop callbacks to let you know anytime a disk has appeared or a disk has disappeared.

There's also the System Configuration framework in Mac OS X, and this deals with all sorts of local machine settings, particularly network related settings. Things like the host name, the IP address of the current machine, what your web proxies are on a system wide level. And you can get run loop callbacks anytime one of these preferences changes, so your application can start using a proxy if one has just recently been specified.

Or you might be monitoring the status of network connections, such as PPP connections or VPN connections, to know that you should or shouldn't be trying to talk to that server because the connection has gone away. And there is also a more generic network reachability API, which tells you whether a particular IP address is routable or not, so you can decide, if it's not routable, there's probably no sense trying to contact it. And more importantly, if you do have some type of UI or you print some sort of diagnostic messages, you can inform the user in a more intelligent manner that the server that they want to access isn't reachable with the current network configuration.

We have the Notify API in the BSD layer, which is just a simple lightweight notification mechanism. It's analogous to Unix signals but we use name strings, which gives you a much wider name space for sending notifications to other applications. But like signals, if multiple notifications are sent relatively close to one another, they'll all be coalesced into a single notification.

The Notify API has multiple delivery mechanisms. We tried to make it as easy as possible to incorporate these notifications into whatever architecture your existing application has, so you can receive notifications via a file descriptor that you can plug into your select loop if that's what you are still using, you can receive them over a Mach port, you can receive them as Unix signals, you can receive them by polling, although we really do discourage that. And for all the details about the Notify API, you can view notify.h. Also new in Leopard, we've published some of the notification strings that we're using around the system, so you are free to monitor some of these notifications in your applications as well.
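As a hedged illustration of the file descriptor delivery mechanism (not code from the session), the sketch below registers for a named notification and waits on the resulting descriptor with `select`. The helper name `wait_for_notification` and the notification name used in the usage line are hypothetical. `notify.h` is specific to Darwin, so the sketch compiles to a stub that reports failure elsewhere.

```c
#include <stdint.h>
#include <sys/select.h>
#include <unistd.h>
#include <arpa/inet.h>

#ifdef __APPLE__
#include <notify.h>
#endif

/* Wait up to timeout_sec seconds for the named notification.
 * Returns 1 if it was posted, 0 on timeout, -1 on error or when the
 * Notify API is not available on this platform. */
int wait_for_notification(const char *name, int timeout_sec)
{
#ifdef __APPLE__
    int fd = -1;
    int token = 0;

    /* Ask for delivery through a file descriptor we can select on. */
    if (notify_register_file_descriptor(name, &fd, 0, &token)
            != NOTIFY_STATUS_OK)
        return -1;

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval tv = { timeout_sec, 0 };

    int result = 0;
    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0) {
        result = -1;
    } else if (ready > 0) {
        /* The token arrives on the descriptor in network byte order. */
        uint32_t value = 0;
        if (read(fd, &value, sizeof value) == (ssize_t)sizeof value &&
            (int)ntohl(value) == token)
            result = 1;
    }

    notify_cancel(token);
    return result;
#else
    (void)name;
    (void)timeout_sec;
    return -1;
#endif
}
```

A caller might write `wait_for_notification("com.example.myapp.reload", 5)`, where the name is made up for this example; another process would trigger it with `notify_post` using the same string.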

These include the directory service cache invalidation, to let you know that the user and group caches have just been purged in the DirectoryService daemon. Perhaps the machine moved from one LDAP domain to another or got taken off an LDAP domain entirely, and so your application may be running with some credentials that are no longer applicable in the current environment and might want to re-evaluate all the users and groups that it knows about. We also send out notifications whenever a new mount appears at the VFS layer or when there's low disk space on the root device or other devices, and you can receive notifications about host name changes and time zone changes.

Bonjour is another source of events for your run loop. Bonjour lets you advertise your services on the local area network, and I think on the wide area network now with DNS-SD, and this lets you find other services on the network without knowing their IP address or port ahead of time.

It's really kind of an ad hoc DNS system, and the way this feeds into your run loop is that as new services are published by other machines on the network, you will get notifications on your run loop and can present more options to the user about binding to these other services, if that's what makes sense for your application.

Another important consideration in terms of being dynamic with Bonjour is that you should cache the service name, not necessarily the IP address or the port. So remember what service you wanted to connect to, not specifically what the port was at the time, because that might change in the future.

We also have some new API in the Mac OS X layer in Leopard. One of these has been around in the Tiger time frame but is newly published as public API for you to use, and that is our copyfile API. And copyfile lets you copy files while preserving their extended attributes, access control lists, resource forks, all that other additional information aside from the data of the file.

It also offers some compatibility for file systems which don't support these characteristics natively. It will automatically take a file that contains extended attributes or some of these other bits of information and create an AppleDouble file, which you're probably more familiar with seeing with a dot underbar prefix in your directory listings.

Then as these are moved around and copied back to a file system that does natively support these characteristics, those two files will get merged back into a single file. And copyfile is the engine that's used by cp, tar, gzip, rsync, all of our standard tools that will preserve this information, and we encourage you to use it too so that all applications can copy files around in a manner that preserves any extended attributes.

As soon as we add new types of information to the file we can add that support to copyfile, and your applications should be able to get it for free because the engine's taking care of copying all the things that you may never have even heard about at the time of writing the application.
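A minimal sketch of calling copyfile from C might look like the following; it is an illustration rather than code shown in the session. The wrapper name `copy_with_metadata` is hypothetical. `copyfile.h` exists only on Darwin, so on other platforms the sketch simply fails with `ENOTSUP` instead of attempting a plain data copy.

```c
#include <errno.h>

#ifdef __APPLE__
#include <copyfile.h>
#endif

/* Copy src to dst along with its metadata. Returns 0 on success and a
 * nonzero value with errno set on failure. */
int copy_with_metadata(const char *src, const char *dst)
{
#ifdef __APPLE__
    /* COPYFILE_ALL requests the file data plus POSIX metadata,
     * extended attributes, and access control lists. Passing NULL for
     * the state argument lets copyfile manage its own state. */
    return copyfile(src, dst, NULL, COPYFILE_ALL);
#else
    (void)src;
    (void)dst;
    errno = ENOTSUP;
    return -1;
#endif
}
```

Because the set of things to preserve is expressed as flags rather than as code in the caller, new kinds of metadata can be picked up by the library without changes to the application, which is the benefit described above.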

Mac OS X also has a security framework, and the security framework is our API for authentication and authorization. One of the main parts of the API is the Keychain. The Keychain provides secure credential storage. It encrypts your web passwords and other sensitive bits of information. There's also an authorization API, which is a rights-based system. We really would encourage you to move away from the old historic Unix assumption that if the process is running with the same UID as I am running as, everything's fair game. We're trying to introduce finer grained controls into the system.

One thing the authorization API will do is prompt for a password if that's necessary, but the rights also have a lifetime to them, which is an administratively configurable setting, so sometimes you can do the same operation several times in quick succession and you won't need to be prompted for a password if you're configured that way.

And like I said the administrator does have control over a lot of the aspects of the authorization API so you can grant certain rights to certain users. You can deny certain rights to certain users, you can set timeouts individually. You really get a lot of fine grain control as an administrator.

So what does this authorization API give us? Well, one of our long term goals in Mac OS X is to avoid the use of setuid binaries in Unix. Instead we really advocate using IPC and launchd. Setuid binaries are a very difficult thing to secure. You have to know everything in your execution environment and assume that whoever's allowed to execute the binary isn't going to be able to attack you by setting an environment variable that changes some behavior in a library, or a lot of things that are just really hard to anticipate.

On the other hand, if you use IPC and launchd you can send a message to a launch on demand helper which will perform some action. launchd can start this process privileged, but it can start it from a clean execution environment. And in order to know whether this privileged process should carry through with the task that's been requested, it can use the security framework's authorization API.

And the way you can do that is by taking an authorization token and serializing it, passing it in the message to the daemon, the daemon can extract the serialized authorization right, ask the security framework is this valid? Is this something I should perform? And if it is, I can go ahead and perform the operation.

And really the root of all of this is that protocols are something that can be secured easily. You can look at your protocol, you can look at what parses the protocol, you can figure out what it's allowed to do and ensure that it is only letting those valid requests through. But securing the runtime environment is really an intractable problem. So that's why we'd like to shift away from setuid binaries to launch on demand privileged helpers.

And another big change in Mac OS X is that X11 has been updated to X11R7.2 from the X.org tree. It's an optional install, but it's on by default in Leopard. We've added in launch on demand support from Terminal, so in a way, X11 is a launch on demand helper of any of the X11 binaries that you invoke on the command line. X11, like all of our other BSD libraries, is available as 64-bit.

We did move the path from /usr/X11R6 to /usr/X11, so that we won't be dated in future updates. And another piece of great news about the X.org changes is that we've been committing them back to the X.org repository, so when we ship Leopard we fully anticipate that if you go to X.org, download the sources from their repository, build those sources, you'll get exactly what you would have gotten if you'd installed it from the Leopard DVD. And now I'd like to give you a demo.

( Applause )

[Kevin Van Vechten]

So in the spirit of not running things until they're needed, to conserve battery power, by default the X11 environment is not running. However, we do have a DISPLAY variable set. So when we run an X11 application like xterm, X11 runs.

( Applause )

[Kevin Van Vechten]

And it's the X11 that you all know and love.

( Applause )

[Kevin Van Vechten]

That's the extent of the demo. It's real exciting, I know. But behind the scenes there's a lot of integration that goes on to make this the case. Actually I can give another quick demo in terms of privilege separation. One of the things we wanted to do in Leopard is provide a more secure environment for temp files. So you might have always had temp files go into the /tmp directory.

Well, that can be racy. Other applications, other users can create files before you. You have to do some special checks to make sure that the files don't already exist. So we've given every user their own temp directory on the local volume that is created securely so that no other user can interfere with it. And when an application wants to access this directory, it can use the confstr API, which also is manifested on the command line as getconf.

So if I ask for the Darwin user temporary directory, you can see there's a path, /var/folders, it's based on the UUID of my currently logged in user, temporary items. And if that directory didn't exist ahead of time, invoking this API to look up the path will actually call out to a privileged helper to create it for the first time. If it already exists, there's nothing to do. It doesn't need to start another process.

But if we look at the permissions on /var/folders, you can see /var/folders is owned by root, wheel. We've divided up the name space so we don't get too many directories in one directory. But if we look in there we can see that the temp directory is in fact owned by the user. So we don't have a setuid binary anywhere, but we can create a directory in a secure location via a launch on demand helper.
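As a sketch of the confstr lookup described here (not code shown in the demo), a portable program could ask for the per-user directory where the `_CS_DARWIN_USER_TEMP_DIR` name is defined and fall back to `TMPDIR` or `/tmp` elsewhere. The function name `user_temp_dir` and the fallback policy are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write the preferred temporary directory into buf.
 * Returns 0 on success, -1 if buf is too small. */
int user_temp_dir(char *buf, size_t buflen)
{
#ifdef _CS_DARWIN_USER_TEMP_DIR
    /* confstr returns the size needed including the terminator, or 0
     * on error; a value larger than buflen means it was truncated. */
    size_t needed = confstr(_CS_DARWIN_USER_TEMP_DIR, buf, buflen);
    if (needed > 0 && needed <= buflen)
        return 0;
#endif
    /* Fallback for other systems: TMPDIR if set, otherwise /tmp. */
    const char *dir = getenv("TMPDIR");
    if (dir == NULL || *dir == '\0')
        dir = "/tmp";

    int written = snprintf(buf, buflen, "%s", dir);
    if (written < 0 || (size_t)written >= buflen)
        return -1;
    return 0;
}
```

On the command line, the equivalent lookup is `getconf DARWIN_USER_TEMP_DIR`.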

( Applause )

[Kevin Van Vechten]

Going back to the slides. So for more information there are mailing lists on the Apple Mailman list server. In particular there's the Unix porting list, where we'd be happy to answer a variety of common questions that come up when porting Unix applications to Mac OS X. There's the X11 users mailing list for specific X11 questions.

And then for any of you who are interested in manipulating the open source layers of the system itself, there's the Darwin development mailing list where we discuss aspects of the Darwin system itself. And for more information you can contact Ernest Prabhakar, the Unix and Open Source Marketing Product Manager. We have our open source website at developer.apple.com/opensource and we have documentation, sample code and other Unix resources on the developer.apple.com website.