Porting Unix Applications to Mac OS X - WWDC 2004

OS Foundation • 32:54

Porting Linux and Unix applications to Mac OS X is easy. This session discusses tools and APIs designed to make it even easier. We cover all the basic concepts in bringing Unix applications to Mac OS X.

Speaker: Dave Zarzycki

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Thank you very much, and now if you would please welcome to the stage Dave Zarzycki. Good afternoon. Today, I'm going to talk to you about porting Unix applications to Mac OS X. One might say this is about porting, learning about the unique differences our platform provides if you're coming from maybe a Linux or a Solaris box or your favorite Unix platform of the day. Now, I assume that, since you're coming from another platform, you have experience with Unix. So when I use certain terminology, it won't be too foreign to you.

So what are we going to cover? We're going to cover the history of Unix a little bit, just to show where Apple comes from. We're going to talk about bundles, a unique Apple invention for packaging stuff, and packaging itself for software distribution. We're also going to talk about some interesting daemons on our platform that you may notice when doing a ps, or wonder about when you're just poking around.

We're going to talk about standards conformance, since that's a very important thing to so many of you, since you are using standard APIs most of the time. We're going to be talking about the linker, the runtime, and what we call frameworks. It's the whole stack of things you link against. Then file system portability. The file systems we have on our platform vary widely: some support certain options, and others don't.

We're also going to talk about authorization and authentication. Obviously, in a multi-user environment, you need to be able to allow certain users to do certain things and not allow certain users to do certain other things. Finally, there's the development environment we provide, Xcode. And lastly, we're going to talk about Mach.

Now, topics we're not going to cover. We're not going to talk about writing drivers or I/O Kit. We're not going to talk about the GUI in any way, nor multimedia, printing, or font handling. Basically, anything you can see, we're not going to talk about. It's just about the bottom layers of the system.

Now, Apple's Mac OS X. We've got a long legacy of different OSs in the platform. We have some of our roots from BSD, the Berkeley Software Distribution of Unix, from way back in the day. We have Mach out of Carnegie Mellon, which was the microkernel work and took large advances into virtual memory and interprocess communication.

Finally, NeXT brought these two together and tried to make a product out of it called NeXTSTEP. And they succeeded fairly well in accomplishing the goals they set out to do. So much so that Apple bought it and produced Mac OS X, what you're using today. And Mac OS X brings in its own unique flavor of things to add to the mix with our Mac OS 9 legacy, with our Mac apps and our legendary ease of use.

Now, let's start getting into some of the details that make us unique. First and foremost, we have a notion called bundles. You're going to see them everywhere on the system. A bundle is just a user-visible atomic blob. When you see an application, you click it, you drag it. You can't tell, but behind the scenes, it's actually a folder. The folder, or directory, whatever you want to call it, contains executables, resources, pictures, sounds, you name it. Anything needed to support it.

The application is in that bundle, that one directory. Now, in the Unix world, you might stuff some of these things in /usr/share. You might stick some of the libraries in /usr/lib. They're all scattered throughout the system. We like bundles because they bring everything together in one directory that users can move around, add to their system, and delete at their discretion. Finally, we have two variants of bundles, given our history. We have the traditional bundle. We have the name of your application, foo.app/foo.

That's how we'd identify it as a bundle. Now, in the Mac OS X timeframe, we invented the notion of a CFBundle, which is the name of your bundle slash, you know, the name of your application.

[Transcript missing]

Now, again, I want to drill down on the fact that we're talking about the base part of the system. We're talking about that layer right in there that you see around libSystem and the top part of the kernel, meaning system calls, things like libc, and some of the other various libraries like libm and whatnot that you're already familiar with.

What makes us unique on our platform is that not only do we have libraries, we also have this notion of frameworks. Frameworks are really just libraries, but they live in a bundle and they live in a different location than /usr/lib. We'll talk more later about where they live on the system.

Now, if you want to get third-party software, there are various options available to you. We have the Fink project. I don't know if you've heard of it. But if you're porting, let's say, an existing open source project, let's say you grab screen and you're trying to run it on Mac OS X, well, someone may have already done that work for you.

If you check with Fink or DarwinPorts, you'll probably find that the project you're trying to port is already ported to the platform, saving you lots of work. Also, as far as packaging is concerned, for our own product, we have Mac OS X, our client distribution, and we have Mac OS X Server for our server customers. The server product is mostly value-add based around our file server technology and additional components to make the server experience more pleasurable.

Those components are not only GUI applications for controlling things, but also additions to the web server environment like blogging software or whatnot. Finally, we also have Darwin, which is Apple's proof that we are committed to open source by releasing the foundation of the operating system in a fully bootable fashion that you can run on your Apple box should you desire.

Now, you're coming from a different Unix. You do a ps and you see some interesting daemons floating around. I want to talk about at least a few of them so you can understand what we have on our platform, why we have them, and why they're important. Let's start with notifyd. notifyd is a way that daemons can send messages to each other without necessarily agreeing on the technology used to send the message. For example, our networking team has put together a daemon that actually controls how interfaces come up and get set up.

Well, some daemons want to find out when the networking changes, but they really don't need to know the finer details of how or why. So what happens is we send a Mach message over to notifyd saying the network's changed. And notifyd then sends, let's say, a signal over to one daemon.

Or it might send a message over a file descriptor to another. A lot of the daemons already on the system actually use the signal method to get restarted when network change events occur. They request that a HUP signal gets sent to them and, ta-da, anytime the network changes, they get HUPed. Next is configd. It is responsible for configuring and controlling various pieces of hardware, but mostly centered around networking.

It is the home of network policy. It's where our DHCP client lives. It's where some of our other networking policy decision making happens. It's a very critical part of the system, and I would recommend that if you want to learn more about it, you look at our System Configuration framework, which provides the external APIs for talking to configd.

Also, if you look, you might see something called mDNSResponder. That's also known as Rendezvous. This is what implements it. Some things talk to it to either browse the network or to send out advertisements. And that's its sole function in life. Finally, lookupd. lookupd is equivalent to nscd if you're coming from a Linux or Solaris box. It's our name service caching daemon. It caches DNS lookups, it caches name lookups, group lookups, pretty much anything you can think of that's being looked up. It also implements getaddrinfo, for example, and does the advanced queries for that.

Now, let's talk about standards. Our policy, you know, is that we like standards. We want to comply with them, but we're not making a serious effort to go out and test against all the standards. But, but, but, if you find a bug where we do not conform to a standard, please, please let us know.

We will try our best to fix it, because we do believe that standards are best for all involved, us and you as a developer. Now, for example, we do know about some bugs that we haven't gotten around to fixing, and one that might affect you when you're porting an application is that we don't set errno in the math library. It's unfortunate, but true, and it's something you'll have to work around.
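One way to work around the missing errno is to inspect the result instead. The following is a minimal sketch, not an Apple-provided API: the helper name classify_math_result is hypothetical, and it only catches domain errors and overflow or pole errors (it does not detect underflow).

```c
#include <errno.h>
#include <math.h>

/*
 * Classify the result of a one-argument math call without relying on
 * the library having set errno. Returns 0, EDOM, or ERANGE.
 * Hypothetical helper for illustration; not part of any system library.
 */
int classify_math_result(double input, double result)
{
    if (isnan(input))
        return 0;        /* NaN in, NaN out: not a new error */
    if (isnan(result))
        return EDOM;     /* e.g. sqrt(-1.0) or log(-1.0) */
    if (isinf(result) && !isinf(input))
        return ERANGE;   /* e.g. exp(1000.0) or log(0.0) */
    return 0;
}
```

A caller would write something like `double r = log(x); int err = classify_math_result(x, r);` and branch on err where the original code tested errno.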

Now, let's talk about some other interesting details we have. We have a different linker than you might find on a Solaris or Linux box. We have Mach-O as our file format and dyld as our dynamic linker. A Linux box might use ELF as a file format and ld.so as its dynamic linker. What does this mean? Well, some behavioral differences. For example, versioning.

On our platform, what might be considered the major number in the Linux or Solaris world is the path on our system. If you do an otool -L against an executable, you'll see the full path to each library on the system. If you want to make a binary-incompatible change, change the name. This is a big difference, because it means that if you redirect through a symlink, you'll get a performance hit, unlike the Linux world, which has an ld.so cache that caches the symlink translations.

Also, we have bundles versus libraries. In the Linux and Solaris world, there is no difference between a plugin and a library. It's just whether you link to it at link time or you load it at run time. It can be the same file. On our platform, that's not true. You have to specifically compile a plugin as a plugin to load it in that manner.

Now, what does that mean? You need to use the -bundle flag. Also, if you want to make your life easier as a developer of a plugin or a bundle, depending on what you want to call it, you can use the -bundle_loader flag. What that does is specify the actual executable which is meant to load your plugin. That way the linker can resolve symbols and make sure that your plugin is fully resolved at link time, rather than just letting it slide, leaving some symbols undefined, and hoping that it works out at runtime.

Also, a unique difference that we have is the two-level namespace. We invented this technology to help us deal with binary compatibility going forward. What it means is that a library, let's call it library foo, might use malloc from our libSystem. Now, let's say you include a malloc in your product, in your application. Well, only your code is actually going to end up using it.

Library foo will not see your malloc symbol. There is a direct reference from that library to the malloc symbol in libSystem, rather than a lookup in a global namespace. If you need the semantics that, when you define a malloc, it is overridden everywhere in your address space, you can say -force_flat_namespace. This will tell the dynamic linker to collapse things down and make sure that symbols are the same everywhere.

Also, a very interesting thing about our platform is that we're far more dynamic when it comes to dynamic symbol resolution at runtime. What this means, let's describe a classic bug, is that Sendmail, for example, had a signal handler. The signal handler wasn't resolved fully in the sense that the functions it called hadn't been resolved yet. Now, Sendmail's running along, calls a function which triggers the dynamic linker to start resolving it, and then the signal fires.

Well, we're already in the dynamic linker trying to resolve a symbol, and what happens is now the signal handler's running, and it reaches a symbol that needs to be resolved, and now we've got two instances where we're trying to be in the dynamic linker at the same time on the same thread.

And you deadlock. This is a neat difference. We're far more dynamic, but that can cause problems if you're porting code. What I recommend is you just use -bind_at_load, and that'll tell the dynamic linker to make sure that all your symbols are resolved before you hit main.
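The -bind_at_load flag addresses the linker side of this bug. A complementary habit, which the session does not prescribe and is offered here only as an illustration, is to keep the signal handler free of function calls, so there is nothing for the dynamic linker to bind lazily inside it. A minimal sketch, with hypothetical names install_hup_handler and take_reload_request:

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations in strict ISO C mode */
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t reload_requested = 0;

/* The handler calls no functions at all, so it can never need the
 * dynamic linker to resolve a symbol while it runs. */
static void on_hup(int signo)
{
    (void)signo;
    reload_requested = 1;
}

/* Install the SIGHUP handler; returns 0 on success, -1 on error. */
int install_hup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_hup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Returns 1 once if a SIGHUP arrived since the last call, else 0. */
int take_reload_request(void)
{
    if (!reload_requested)
        return 0;
    reload_requested = 0;
    return 1;
}
```

The main loop would call take_reload_request() at a convenient point and do the real work (rereading configuration, for example) outside of signal context.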

Now, when it comes to forward and backwards compatibility on our platform, we have something that will make your life easier. We have something called Mac OS 9 weak symbols. What it means is that you can say if symbol, call symbol. So there might be a function you want to use, but it's not available on a previous release. That's okay. Just test for it. And if it's there, you can use it. And if it's not, you can't. It's a very powerful tool for dealing with forwards and backwards compatibility.

Next, static linking. Static linking is a very common problem for people coming to the Mac OS X platform. Static linking of the standard libraries is impossible. We don't ship them. Sorry. It's just something that we find is best for our needs in supporting you, and, yeah, it's best for all involved, we believe.

Now, I'm not done talking about the linker yet. There are some other interesting behaviors that are worth talking about. Symbols in an object file must be overridden together. In the ELF, Linux, Solaris world, you can, for example, override malloc and not override free. Now, that's okay if you never call free, but if you actually do call free, you're going to get a different implementation. Their backing stores might not even be the same.

One can imagine that horrible problems will start to ensue when you do that. Well, on our platform, the way we solve that is we put malloc and free and realloc and the other associated functions all in the same .c file, which get compiled down to the same .o file, obviously. And the rule that our dynamic linker enforces is that if you override one symbol in a .o file, you can't use any of the others there. You have to override them, too.

Now, that might not affect you. You might not be overriding symbols all that much, but it can potentially affect you if you put unrelated pieces of code in the same .o file. It's something to watch out for. Also, on our platform, common symbols are problematic. I just recommend that you use -fno-common. Try to make sure that you're just not using common symbols. They're a bad idea in general anyway.

I have a couple more things to talk about. Bundle unloading is currently unimplemented. Sorry, you can't unload a plug-in, basically. It will still reside in your address space. Another interesting issue to be aware of is that if you're using C++ in a bundle, static initializers are not being called at the moment. Again, it's an issue to worry about when porting code.

Now, if we can move outside of an actual address space and just talk about general runtime issues: on Mac OS X, we don't necessarily store configuration and resource files in the same places you're used to. We store them in some different locations. We have Open Directory, which is our great multiplexing, demultiplexing engine for getting preferences from different places.

And we'd recommend that you use those APIs if you want to be a good citizen on our platform to find out about your various configuration parameters or general system configuration parameters. Also, instead of having a lot of dot files in your home directory, each using a different parser and altogether different code, we have code for that.

It can make your life easier. It's called the CF or NS preferences APIs, and they put preferences in ~/Library/Preferences, and it'll save you a lot of work dealing with how to save preferences and restore them on our platform.

Now, when it comes to system startup, we have some differences compared to a Linux or FreeBSD or Solaris system. On those kinds of platforms, everything essentially calls out from /etc/rc, and you'll just have shell scripts calling shell scripts calling shell scripts. And when it comes to what order to run things in, the Unix world is pretty static still. All they really do to accomplish proper initialization is make sure the shell startup scripts are named in serial order. So you might have one foo, two bar, three baz. Eh, that's, you know, not very dynamic. And we've done some things to improve that.

We created startup items. Again, they're bundles. We've talked about these before. They're stored in /System/Library/StartupItems. Each one of those bundles contains a little blob of data specifying dependencies. This means we can dynamically start up the system and boot things when they're ready. In fact, we even boot some startup items in parallel, since they don't have dependencies on each other.

Now, how we boot up the system is changing. Unfortunately, the session that talked about that was yesterday, session 106. The existing mechanism is being replaced by something called launchd. One can think of it as inetd on steroids. It's going to be able to support any daemon on the system and make sure that it gets launched on demand when any configuration detail of it changes.

Some more runtime issues. Pre-binding. In fact, I just talked about pre-binding with one of the Xcode people right before this session, so a lot of this slide doesn't really matter anymore. But for the purpose of this session, I'll still talk about it. Pre-binding is a method we have to increase the runtime performance of your application.

What we did is, since we know of all the libraries on the system, we can pre-calculate where they will load in your address space. Once we do that, we know where all the symbols are, and we can then record the linking information in your executable. Let's say, again, you use malloc.

Well, we know that's going to be at 9 kabillion something, and we can record that in your executable. That way, when we load, we check that the state of the world hasn't changed, and if it hasn't, we just let your application go because we've already pre-linked you. Unfortunately, what this means is that we modify executables and libraries all the time. Whenever a library changes, we go modify everything and re-prebind everything.

That's a problem for some people. It means that backup and security tools get false positives for changes. Things like Tripwire, which do intrusion detection, don't work because, again, they get false positives for changes. How is this all changing? Well, we've written a much, much, much faster dynamic linker in Tiger, so much so that we believe that you as developers no longer need to worry about pre-binding. I'm hoping that will make a lot of you happy.

Now, this is more of an issue if you're thinking of a 10.0 or 10.1 box, but our /bin/sh changed. It used to be zsh. zsh was almost POSIX compliant, but not quite. We switched to bash. Bash at the time happened to be faster, too, which was a nice perk, but that's not true anymore. But bash is POSIX compliant, and we like that. If you were using some zsh-isms, you might need to accommodate the change. If you're writing a portable shell script, obviously, hopefully none of this should matter to you.

Frameworks. Frameworks are something I touched on earlier. A framework is a bundle-based alternative to the hierarchy of libraries and resources. It contains the library and headers, but it can contain other shared resources too. Let's say your library is a GUI library and it has pictures and sound. Those can be in your framework. When it comes to actual compilation, though, there is a difference. You don't say -lfoo to link against the framework. You say -framework foo. That tells the compiler to look in a different location for it.

On the topic of file system portability, POSIX file systems are typically case sensitive and they support sparse files, although that's not necessarily universal. The default file system for Mac OS X is HFS+. It's case insensitive but case preserving. That means that you can have big Foo or little foo, but you can't have both at the same time.
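If ported code depends on two names that differ only in case, it can help to probe the target directory at runtime. The sketch below is one way to do that under stated assumptions: the function name dir_is_case_insensitive and the probe file names are hypothetical, and the directory must be writable. It creates a lowercase file and checks whether the uppercase spelling resolves to the same file.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations in strict ISO C mode */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if names in dir are matched case-insensitively,
 * 0 if they are case sensitive, and -1 if the probe could not run.
 */
int dir_is_case_insensitive(const char *dir)
{
    char lower[PATH_MAX], upper[PATH_MAX];
    struct stat a, b;
    long pid = (long)getpid();

    int n1 = snprintf(lower, sizeof lower, "%s/case_probe_%ld.tmp", dir, pid);
    int n2 = snprintf(upper, sizeof upper, "%s/CASE_PROBE_%ld.TMP", dir, pid);
    if (n1 < 0 || n2 < 0 ||
        (size_t)n1 >= sizeof lower || (size_t)n2 >= sizeof upper)
        return -1;

    /* Create the lowercase probe file; fail if it already exists. */
    int fd = open(lower, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    /* Same device and inode under the uppercase name means the
     * file system folded the case for us. */
    int result = 0;
    if (stat(lower, &a) != 0)
        result = -1;
    else if (stat(upper, &b) == 0 &&
             a.st_dev == b.st_dev && a.st_ino == b.st_ino)
        result = 1;

    unlink(lower);
    return result;
}
```

The answer is per directory, not per machine, since different volumes can behave differently.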

We also support resource forks. The API for accessing them used to be out of the way, and Unix applications wouldn't know about it, which created issues for backup tools or anybody copying files using Unix tools. This has changed in the Tiger timeframe, because resource forks are now going to be exposed as extended attributes, and any Unix tool that is aware of extended attributes can deal with resource forks correctly.

But it is something to be aware of, considering that extended attributes are now coming into [Transcript missing]

Finally, hard links are not supported by all file systems. It's something that your application might need or use, but you need to be aware that, while they are supported by HFS+, some of the other file systems we have on the system, like the WebDAV file system, don't necessarily support them.
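Code that relies on hard links can degrade gracefully on volumes that lack them. The following is a sketch of one possible fallback, not a prescribed approach: link_or_copy and copy_file are hypothetical helpers, and the set of errno values treated as "links unavailable here" is an assumption that may need adjusting per file system.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations in strict ISO C mode */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    char buf[8192];
    size_t n;
    int ok = 1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    if (!ok) {
        remove(dst);   /* do not leave a partial copy behind */
        return -1;
    }
    return 0;
}

/*
 * Try a hard link first; if the file system refuses, fall back to a copy.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int link_or_copy(const char *src, const char *dst)
{
    if (link(src, dst) == 0)
        return 0;
    if (errno == ENOTSUP || errno == EOPNOTSUPP ||
        errno == EPERM || errno == EXDEV)
        return copy_file(src, dst);
    return -1;
}
```

Note that a copy does not share storage or later modifications with the original, so this fallback only suits code that uses links as a cheap way to duplicate a file.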

Now, something that is, again, unique to our platform when it comes to the file system hierarchy is the way we do scoping. We have four primary scopes on the system. We have the system scope, essentially software that's shipped by us that you shouldn't ever need to touch or manage. We have the network scope. It's where a network administrator might put anything they want to show up on all machines. We have the local scope, anything on the local machine. And finally, the user scope, something in the user's home directory.

How does this affect anything? Well, you remember how we talked about frameworks? They can live in all four of these scopes. You have /System/Library/Frameworks. We have /Library/Frameworks, which is the local case. We have /Network/Library/Frameworks for the network case. And you can, as far as I know, put frameworks even in the user's home directory, should you desire. Again, ~/Library/Frameworks. When you're developing, let's say, a framework or really any application that wants to put files on our file system, we would recommend that you try to conform to our file system hierarchy for where to place files.

On the topic of authentication and authorization, we have the security framework. It's our preferred API for doing things. It has a lot of support for advanced technologies like smart cards and other interesting authentication mechanisms like fingerprint readers, you name it. They're trying to support a lot of the advanced things that many organizations want.

It's also a capability and rights-based system. I won't go into the details of that, but it is important to keep in mind if you need to do authentication and authorization. And finally, while we do have the security framework, we do have PAM available if you need a compatibility solution. Either you have a PAM plugin or you're using the PAM APIs. We make sure that the authentication and authorization can route correctly if you need any kind of PAM compatibility.

Our development environment is a lot of familiar things and some different stuff at the same time. It's mostly a GNU toolchain. It's GCC and GDB and make, things you're familiar with. But we do have some differences that are good to be aware of. For example, we have the C preprocessor. We've modified it to support precompiled headers. You might see these in /usr/include when you maybe do an ls *.p.

You'll see some precompiled headers there. It helps us when compiling large applications that include lots and lots of headers, so we don't have to redo all the C preprocessor passes. Unfortunately, our C preprocessor doesn't support all of the GNU extensions to the C preprocessor, which some of you might have consciously or subconsciously started using. How do you work around this? Well, it's -no-cpp-precomp. That'll get you your old GNU preprocessor back.

Now, Xcode, obviously. There's been many, many sessions about it this week. We highly recommend that you use it. In the case of debugging, it has a very powerful visual debugger that we would highly recommend that you use. And you don't necessarily need to go to a lot of effort to port your application's build system into Xcode. Xcode supports legacy targets. You can just say, look, there's a makefile over there.

Just go build against it. And you can keep your legacy build system, which is probably desirable to you if you still want to maintain portability. But you can take full advantage of everything that Xcode has to offer with code searching and debugging and lots of yummy things like that.

Finally, some of you have some very, very, very big apps out there. So much so that you run into a nice, interesting architectural anomaly on our platform. It just requires some different code generation. If you need more than 16 megabytes of text, you need to use -mlongcall on the compilation line to make sure that your code gets generated correctly. Otherwise, it will fail to link. Again, it's just something to be aware of if you have a large executable.

If we talk about APIs now, we're not completely like other platforms. We have some things you need to be aware of. If you're using poll, it happens to be emulated on our platform, and we highly recommend you use kqueue instead. If you can't, though, at least it's still there.

But at the moment, it's emulated via select. So if you're familiar with those two APIs, you might be able to imagine the scaling problems we have with that current emulation technique. But if you don't need a large number of file descriptors, it's okay to use our poll emulation at the moment.
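For the small-descriptor-count case described above, plain poll remains the portable choice. As a minimal sketch (read_with_timeout is a hypothetical helper name, and the -2 timeout code is an arbitrary convention chosen for this example), waiting on a single descriptor looks like this:

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations in strict ISO C mode */
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the byte count from read() (0 means end of file),
 * -1 on error, or -2 if the timeout expired first.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    int ready = poll(&p, 1, timeout_ms);
    if (ready < 0)
        return -1;      /* includes EINTR; callers may want to retry */
    if (ready == 0)
        return -2;      /* nothing arrived in time */
    return read(fd, buf, len);
}
```

Code that watches many descriptors at once is where the emulation's scaling cost shows up, and where moving to kqueue is worth the platform-specific code.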

Also, dlopen for loading plugins: again, it's emulated. Does this affect you? Well, again, as we talked about, unloading plugins is not supported. And if you're taking advantage of the RTLD_NEXT functionality of dlopen, we don't support that at the moment. That's something to be aware of.

Another issue you should be aware of is that Pthreads is only partially implemented. It gets better and better with each release, but you need to be careful when you look for functionality. The biggest thing to be aware of with Pthreads is that we don't necessarily support inter-process sharing of locks, for example. Something to be aware of.
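Rather than assuming process-shared locks exist, ported code can check at build time and at run time. This is a sketch under assumptions: process_shared_mutexes_supported is a hypothetical helper, and it relies on the standard _POSIX_THREAD_PROCESS_SHARED feature macro being an accurate indicator on the target system.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations in strict ISO C mode */
#include <pthread.h>
#include <unistd.h>

/*
 * Returns 1 if a mutex attribute can be marked process-shared on this
 * system, 0 if the option is missing or the call is rejected.
 */
int process_shared_mutexes_supported(void)
{
#if defined(_POSIX_THREAD_PROCESS_SHARED) && _POSIX_THREAD_PROCESS_SHARED > 0
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0)
        return 0;
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_destroy(&attr);
    return rc == 0;
#else
    /* The option is not advertised, so do not even reference the call. */
    return 0;
#endif
}
```

When this returns 0, the application needs a different cross-process synchronization mechanism, such as file locks or a pipe-based token.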

Also, POSIX message queues are missing. If you really, really need that kind of functionality, our current story is that we recommend that you use maybe Mach ports. But we hope that you can find a different API. Also, when it comes to System V shared memory, we support it, but it's very weak at the moment.

If you look at our boot-up in /etc/rc, you'll see that we statically set the variables for that. And once those System V shared memory variables are set, they can't change for the life of the boot. So if you need those changed, you need to be aware of that.

Finally, well, not finally, while we do have OpenSSL on the platform, we would recommend that you use CDSA. OpenSSL has some architectural limitations when it deals with, for example, smart cards. OpenSSL will hand the key around directly, whereas CDSA supports having the key being represented as a handle, which could be connected up to a smart card somewhere else where you don't actually even have a physical access to the key.

A unique language on our platform, a language opportunity, is Objective-C. Some of our frameworks are implemented in it, and you might need to use it. I actually recommend it. It's kind of fun to use. It looks like Smalltalk, and it'll probably take you less than a day to learn, so I wouldn't be scared of it. And you might enjoy it.

Finally, libtool versus GNU libtool. It's unfortunate, but these things happen. We have a name conflict. If you run libtool on our platform, you'll get a tool that we wrote many, many, many years ago at NeXT called libtool for dealing with library generation. It has nothing to do with GNU libtool. They're not even really functionally overlapping; it's just a conflict in names. And that means that on our platform, if you need to use GNU libtool, you call it glibtool.

Now the last thing we're going to talk about today is Mach. Our story is: use it only if you need to. We don't want you using Mach if you don't need to. Mach has APIs, for example, for allocating memory in your address space: vm_allocate and vm_deallocate. We'd rather you not use those APIs. We have perfectly fine POSIX APIs for doing that. It's mmap, for example.

But sometimes you do need to use it. If you need advanced control of your process priority, that's currently the best mechanism for doing that. Also, if your process ends up using libraries that use Mach, you need to be aware of the bootstrap namespaces. It's a Mach-ism. I really don't want to go into it at the moment. What it means is that different processes in your login session have a different context, Mach-wise, than, let's say, a system daemon.

So if you need to talk to another daemon, this might cause problems for you. Finally, when it comes to traditional Unix APIs that are non-standardized, ptrace is only partially implemented on our platform. We only implement it enough to do an attach. After that, GDB on our platform, for example, uses the Mach APIs for getting and setting and controlling the process it's introspecting.

I doubt very many of you are using ptrace, so that probably isn't something you need to worry about. Who do you want to talk to after this session is over and after you leave WWDC? You want to contact Jason Yao, our Track Manager. He should hopefully be able to direct your inquiries to the right people around Apple if you have any questions.

Also, we do have some documentation offline if you need it. There's our Porting UNIX/Linux Applications to Mac OS X guide; the slide tells you where it is. We also have the Porting Drivers to Mac OS X guide, if you're coming from a different platform. And again, these are the locations for those documents.