Darwin • 25:11
Want your UNIX application to reach a whole new set of customers? This session covers the foundation of how to port your UNIX app to Mac OS X. Developers get specific guidance on what to do, the tools that are available, and helpful porting tips and techniques.
Speakers: Dave Zarzycki, Jason Evans
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
So thanks for coming to the Porting UNIX Apps to Mac OS X talk. How many of you were here for Ernie's talk? Ask it the other way around. How many of you were not here for the last talk? Okay, good. Enough of you. There's some overlap between what Ernie talked about and what we're going to talk about.
So we're going to take this a little bit quicker than we had originally planned to spare those of you that have already heard a lot of this content. There will be plenty of time for Q&A at the end, and we have some pretty experienced people porting people. So keep the questions in mind for the end of the talk. My name is Jason Evans.
I'll be giving the first half of the talk, and Dave Zarzycki will be doing the second half. We work in the BSD Technologies Group, which is part of the Core Operating Systems Group. We spend a lot of time porting UNIX applications to OS X, maintaining them, feeding patches back to the original maintainers of that software when it's a fitting thing to do.
So, we're going to talk about some of the gotchas that you'll run into when porting applications. Not really going to talk about the normal process you would run into, you know, go through in porting a UNIX application. Going to talk about the things that you would stumble on with OS X that are unique. We're not going to talk about drivers. There are other sessions this week. We'll refer to some of those at the end of this talk.
Graphical user interfaces. Not going to talk about that here either. There are other sessions you should check out. You're highly encouraged to, if you have a graphical application, say that uses X, you're encouraged to go ahead and use the Aqua frameworks. Check that out. Multimedia we won't be talking about. Printing, that doesn't affect a number of UNIX applications. There's another session I think tomorrow afternoon that you should check out. We're not going to talk about font handling and pretty much anything you can see we're not going to talk about in this session. So.
Brief history. Ernie gave some history on this. A little bit of background kind of from a technical perspective. Next took, back in the early 90s, they took BSD 4.3 from Berkeley and they took Mach 2.5 and they merged that together, made a number of modifications, enhancements, and they maintained that over a number of years.
When Mac OS X development work was started on that, they wanted to sort of refresh that. So they took the final drop from Berkeley, the 4.4 BSD Lite, Lite 2 distribution, and then Mach 3.0. They merged that together, merged some of the next modifications into that, and that was sort of the foundation for Mac OS X.
The software comes from a number of different places, and since then, we've also taken some of the technology that's been developed by the FreeBSD project. We've been working on some of the features that we've been working on in the past, in FreeBSD 3.2 and more recently in 4.4 and 4.5. It's kind of hit and miss, though, which parts came from which versions.
Here's a basic system diagram from the perspective of someone that's actually porting a UNIX application. A lot of the higher-level frameworks are left off in this diagram. This is really kind of the low-level command line view of things. So your application will typically sit on top of libraries, which you're surely familiar with, and frameworks, which are a conglomeration of headers and libraries. It's kind of a neat little packaging for an integrated development environment programming.
You'll potentially run into some of this if you're using the printing framework, for example. You hear of frameworks in OS X. That's basically what they are, and there's some special flags that you pass to the compiler in order to use those. Under that, we have libsystem, which is a conglomeration of a number of low-level system libraries. libc, libm, which is the math library, libinfo, which gives you net info access. About a half dozen libraries in total. We merge those all together, essentially, to reduce the amount of shared library overhead.
There's, of course, a certain amount of overhead for every shared library that you load into your application. And pretty much every application uses most of these libraries, and so we reduce the overhead, and it gives us a pretty significant overall system savings. At the bottom level, you have the kernel, and you can see in this diagram, BSD and Mach are side-by-side. We don't actually have a microkernel architecture.
We have the APIs for both of those, but they are merged together into the same address space. I/O Kit is our driver framework. The I/O Kit is essentially how you would write drivers. It's substantially different from typical Unix drivers. So there are other sessions that talk about that in detail.
OS X comes from a number of different places. The typical distribution that you'd be familiar with is the one that comes on most systems. It comes with the Mac OS X client software. We also have a server product which has some enhanced management tools, also has a few extra pieces of software.
This really isn't going to be that important to you unless you find yourself developing on Mac OS X's server. Be aware that not all of the software is available on the client product. For example, MySQL I don't think is a default part of the client product. If you end up using that, keep in mind that you're limiting yourself to the server product.
Darwin is a distribution that we do of the low levels of the operating system. That's our kernel, most of the drivers, all the user land, the low-level libraries. We provide that in source and binary form. We usually try to get that out about the same time as we get out our major releases. So you're able to get at all the source code, pretty much, of the low level. This ends up being a viable development platform if you're used to a traditional UNIX environment. Pretty much everything that you're used to is there.
Third-party software is provided as ports from a couple of different places. The Fink project, which is hosted on SourceForge, they have several hundred projects that they've ported over, and they provide Debian packages that you can install on your OS X or Darwin system. This is interesting from the perspective of porting, basically because there are a lot of examples there.
If you have an application similar to something that has already been ported by Fink, you can take a look at how they've done it. We're also working on our own infrastructure, which is going to be available on the opendarwin.org site sometime soon. Keep a look for it in the next week or so.
File system portability issues. Some of these are general issues. Some of them are specific to OS X. POSIX file systems are normally case sensitive. Probably any UNIX operating system you've worked on has had a case sensitive file system. Sparse files, although not universal, are available on some operating systems so that you can create very large files that are zero-filled, and then as you actually write the blocks of data, those get filled in on the disk, and it doesn't end up taking a lot of space overall. So you could create a two-terabyte file, sparsely fill it, and only end up using a few megabytes of space. Keep in mind that although our UFS file system does that, our HFS Plus file system does not. So HFS Plus is case-sensitive.
That case-sensitive file system doesn't really cause as many problems as you might think, but there are a few gotchas. One of them is, say that you have a build system where you would have a make file, and you would type make, and then you would type make install to actually install the software. Well, a lot of the GNU software, as an example, has an all-capitals install file in its top-level directory. And so you type make install, and it tells you that the file is up to date.
Not ideal. Basically what happens there is that the file make is looking for the install file to see if it's up to date, and it finds it. It tried all lowercase install, and the operating system, the file system, said, yeah, it's there. So there are a couple of ways around that. You can create a .phony target in your make file. You can move the file aside.
It's not that big of a deal, but it's a pretty surprising result if you don't realize what's going on. Another thing that you cannot do, you cannot have, for example, a capital M make file and a lowercase m make file in the same directory, because they're treated as the same file with the HFS Plus file system. HFS Plus also has resource forks.
This is something that we're not really, it's kind of a legacy thing. We're not really using it that much. It's something to be aware that it exists. When you're moving files around, if you're trying to preserve somebody's data, you might need to be aware that the resource forks actually exist there.
Hard links are not supported by all the file systems. UFS supports them. HFS Plus, after a fashion, supports them. There's some trickery that goes on behind the scenes to actually make that work. From an external perspective, it effectively does the right thing. But there are other file systems like MS-DOS, ISO 9660.
If you need your software to run on those, keep in mind that you're not going to have hard links in every case. Okay, and at this point, Dave is going to come out, and we should have plenty of time for Q&A at the end, so keep your questions.
ready?
[Transcript missing]
Also in Mac OS X, we have something we call the two-level namespace. What that means is that your application or your library, whatever you're building, has every one of its symbols that it uses that it doesn't provide. What library provides that is recorded in your application. So, for example, if you use Sturcat from LibSystem, it's recorded in your library that Sturcat comes from LibSystem. So in that case, if one of the other libraries that you use starts providing a Sturcat, you won't get the wrong Sturcat.
You get the one you know you were linked against. This can provide a problem, though, for legacy UNIX developers who rely on what the inverse of a two-level namespace is, is the flat namespace. So, for example, if you want to override the malloc libraries, malloc, free, realloc, et cetera, and you want every library to take advantage of those symbols, you need to compile your application differently, and that's through the dash flat underscore namespace compiler flag, and that'll provide you, again, with a flat namespace. Amen.
Also, due to our Next heritage, where dynamic-- dynamicism is heavily encouraged, static linking is seldom used in the system and, in fact, is heavily discouraged. You'll find it very hard to find a .a file anywhere on the system, so get used to dynamicism. Also, in Jaguar, we've added Mac OS 9-style weak symbols.
Weak symbols are a way for-- at runtime to determine if a library is available. So, for example, if you need some function called foo from some library, you can just say if foo, and if foo is non-null, you can then use foo. And by testing the-- for symbols for non-zero, you can know whether or not the symbol is available for use.
Frameworks is the, frameworks are bundles. They're basically everything you think of as a library is now a framework. The one that the BSD team deals with the most is lib system and that's your traditional UNIX libraries as both Ernie and Jason have covered. It covers lib C, lib M. It used to cover lib curses under PUMA and previous systems.
That's been broken out now into a separate library for performance reasons and compatibility reasons. And there are a few other scattered libraries that you can probably think of that are in Other runtime issues. Configuration resource files. Open directory is the preferred way from Apple to store your system preferences. Today, that'll back into a NetInfo database for local system information.
For personal user information, CF/NS preferences is the way to go. The preferences files will actually end up in your home directory/library/ and David In the case of startup items and scripts and whatnot, the old BSD way of just etc, rc and whatnot and static scripts doesn't scale for a dynamic system like Mac OS X where users just want to drag startup items to their system. So considering that, they live somewhere different. They're in system library startup items. They're bundle-based, but they're not real CFBundles. And they're composed of an executable and a property list or a plist.
This property list contains dependency and provides information to allow our runtime startup system to determine when to launch startup items in the correct order. It also allows for determining parallelism so that potentially some, you know, two network services could start up at the same time to potentially increase -- improve startup time.
More runtime issues. Pre-binding. Pre-binding is an optimization technique to make applications launch faster. What it involves is taking all the symbols that an application uses that are provided by libraries and pre-computing the location the libraries will load at in your address space. This allows for increased better launch times because no computation needs to be done. It also provides decreased memory usage because less memory is used.
The one con, though, that many UNIX people might run into is that it modifies your executables and libraries. This consequently confuses some applications such as backup and security tools like Tripwire. Having all your programs change every time you update a library is kind of annoying from a Tripwire perspective.
The other important thing to note is our bin SH is now bash in Jaguar. It was Zish. Those of you that are good script writers won't notice. Portable scripts won't have a problem. But there were, as we discovered during the transition, there were a few Zishisms that crept in that bad people made mistakes. But that was -- you good people won't have that problem.
On to the topic of frameworks to get into more detail. It's a bundle-based alternative to the UNIX hierarchy, so you'll find things in the paths that are listed there. It contains the libraries and the headers and potentially different versions of the same library, too, for compatibility's sake. It potentially contains other resources such as sounds, pictures, anything you can think of that a library might need. In a typical UNIX system, you would find these in user, share, or some directory.
Frameworks can live in the following locations. There's system library frameworks for the things that Apple provides. There's library frameworks for the things the system administrator installs. There's network library frameworks for kind of a network-wide location. There's your home directory yourself if you want to personally install a framework for your own apps. And applications that need a framework can also install an application inside-- a framework inside their application bundle, since applications are bundles themselves.
On the topic of authentication and authorization, Apple's preferred framework is the security framework. It provides a capability and rights-based system for users to control the ability to do things in a secure manner. We do provide PAM for a portability sake. If you need to port your application to 10, you can call PAM. And what PAM does is just quickly call into the security framework. It's just a thin shim.
In the case of the development environment, we are mostly GNU toolchain. This includes GCC, GDB, Make. The GNU binutils is not there. We've got a very heavily modified version of it that it's basically not the same anymore. In the case of the C preprocessor, we have a technology called precompiled headers.
Precompiled headers provides a massive speed improvement for C++ and Objective-C applications, but negligible speed improvement for C applications. And considering that the bulk of Apple's developers are C++ and Objective-C developers, we decided to optimize for their case. This can create a problem, though, that our C preprocessor that understands precompiled headers does not understand some of the GNU C preprocessor extensions that some of you might be using. So you might need to use the --no-cpp-precomp flag to get your code to compile. We also have the lovely project builder and interface builder that you can find out more about at the rest of the show.
But it provides a GUI editing environment for not only taking your legacy projects, like in project builder, and using our GUI editors and tools there. But it also provides a GUI editing environment for not only taking your legacy projects, like in project builder, and using our GUI editors and tools there, but also for the application.
And we have a lot of great tools out there that we can use to build our project. And we have a lot of great tools out there. Dave Zarzycki, Jason Evans In the case of the C preprocessor, we have a lot of great tools out there. But if you want to build GUI apps, too, you can use our interface builder to design your GUI in a GUI, which is a novelty for the Unix developers.
API considerations. We do not have pull. It would break binary compatibility, and we consider binary compatibility very important for our customers. Pthreads is partially implemented, but if you find a problem, please let us know. It is an ever-evolving suite of APIs, and they're getting better with each release. :Asynchronous I/O is missing, but we do have P-read and P-write. This allows for multiple threads to read and write from files without creating a lock around the L-seq call. OpenSSL is available, but CDSA is preferred for implementing encryption and secure transport of data.
There's a good reason for this, too, is that CDSA can abstract-- Well, in the case of OpenSSL, keys are handled directly by the library in the application, and that's just not always possible sometimes. Sometimes your key might be on a hardware device, such as a token or some off-board or PCI device. And with OpenSSL, that simply doesn't work. We also have Objective-C in the system. It's--if some of you have ever dealt with a small talk language, it's very similar.
If you want to start diving into GUI applications, such as Objective-C--COCO, you will be able--you will need to learn Objective-C. It just takes a day. It's not hard. And it's really nice, in fact. Also, in the case of lib tool, we actually had a tool called lib tool. We had it first. So in the case of the GNU lib tool, we've called it glib tool. And that's just something to run into when you type lib tool from the command line.
Also, in the case of mockifying your code, sometimes the POSIX APIs might fail you and you need to drop down to the Mach APIs. We hope that doesn't happen, but if it does, please let us know. But some things to be aware of is fine-grained control over process priority in the case of real time.
You might need to use the Mach APIs. In the case of bootstrap ports, it's a Mach concept, you might find that if you're daemonizing yourself on a pre-Jaguar system, that your bootstrap port is--no longer works. And that was an implementation detail that kind of broke daemon's launch from a GUI context.
So you might need to be careful about which context you launch your daemons from or directly deal with the Mach bootstrap port situation. Also, in the case of the P-Trace system call, we basically only implement the attached APIs. Past that point, you need to use Mach to control an application, but this really shouldn't affect anybody except GDB and possibly a few other apps.
But... In the case--moving on to mailing lists to contact us. We have lists.apple.com. There's more lists than are listed here, but the important ones to know about are the Apple CDSA list for talking about our CDSA technology. We have a Unix porting list for those of you that want to ask porting questions after the show. We have a Darwin development list, which is our kind of general forum for the development of Darwin. We have a Darwin OS users list. If you're just a user and you want to poke around and ask questions, this is the place to go.
We have a Darwin kernel list if you want to ask very kernel-specific questions, like possibly about the funnel or implementing a kernel and networking or I/O Kit extension. And in the case of development of the user level, we have a mailing list dedicated to that. So if you're more worried about libraries and, you know, bringing lib system up to date with FreeBSD and questions about that, that would be the list to go.
Websites, we have our premier open source website hosted at Apple called www.opensource.apple.com. We have the recently announced OpenDarwin site, which is a community-run website and development site for pushing Darwin forward from a community perspective. And as Jason talked about earlier, there's the Fink project for getting ported software already, and that's fink.sourceforge.net.
In the case of books that will help you along in your porting efforts, there is the UNIX for Developers from O'Reilly. You can find out where to get it from our developer website, TechPubs. There's the Inside Macintosh Macro Runtime Architecture, for if you want to get even more in detail about the Macro Runtime. There's the additional developer toolchain documents, so the typical info pages and whatnot you can get for GCC and GDB and whatnot are available on the developer website.
For the further roadmap, we have the Darwin Kernel session tomorrow morning, bright at 9:00 a.m. We have the security session, also in this same building, at 2:00 p.m. tomorrow. And we have our feedback forum tomorrow at 3:30 in room J1 of the main developer conference. So who to contact if you have some questions? I'm Dave Zarzycki. Jason was who was just up. And Rob and Jordan are going to be up for the Q&A.