Designing for Security - WWDC 2006

OS Foundations • 1:03:35

Mac OS X is the first mass-market operating system built from the ground up with security in mind. Discover how to utilize best practices for key management, code execution, default configuration, and controlled execution in your secure application or network service. If your application manages sensitive data, provides a network service, or accesses the network, you can't afford to miss this session.

Speakers: Simon Cooper, Conrad Sauerwald

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it may have transcription errors.

Welcome to session 411, Designing for Security. We’ve got a lot of very exciting things to tell you about today. There’s going to be two presenters. Conrad Sauerwald from the Data Security Group will be talking first, and then I’ll be taking the latter half of the session and talking about some other exciting things. So here’s what we’re going to cover.

We’re only going to be highlighting some key points. We don’t have a lot of time to go into great detail. We’re going to talk about system facilities and techniques you can use, and hopefully we’ll give you some takeaway points that will at least help you with some of your issues. So here are the areas we’re going to cover.

We’re going to cover data security, then system security, secure coding, and then we’re going to talk about process sandboxing, which is a technology preview of what we hope to put into Leopard. So here we go. I’d like to introduce you to Conrad Sauerwald, who’s going to talk about data security.

Hello there. So I’m first going to talk about a topic that doesn’t come to mind that quickly when people talk about security, which is data security. When you say security, you’re often immediately looking at buffer overflows and such. And while as a user you prefer that your system runs smoothly, beyond that, if you start getting a little bit into the user’s perspective on things, you actually start looking at the user’s data and seeing how that is actually really valuable and not that often thought of when we talk about security. The topic obviously ought to be like you’re writing an application and you’re managing the user’s data. What can you do to keep that data safe?

So first question, do you manage sensitive data? Well, you know, that’s kind of a question that has multiple answers because obviously often you will have data that can be sensitive and other times it may not be. It really depends on what the user may do with your application. If you write a word processor, well, you know, they are obviously going to write a sensitive document with it at some point in time, but it may not necessarily be the core strength of your application.

In those cases, users have, of course, the possibility of using the built-in facilities like FileVault and encrypted disk images to take care of keeping their data safe themselves. Beyond that, applications may not just manage the actual content, the actual data to be kept safe, but also passwords, symmetric keys, and more these days as well, private key pairs or identities for the user that they use to use certain services and that can then be stored in the keychain.

To go a little bit beyond using a keychain for passwords, the benefit of keeping identities in there as well is of course that if you use a public private key pair that is stored in the keychain, then the actual crypto can be done in the security server so that the secret is never given out, so that is an extra strength of using the keychain for those types of things.

Now beyond that you may go like, “Well, that’s all great and nice, but why don’t I just roll my own?” Well, we really want to discourage you from doing that because it’s just hard. So I’m going to go through a couple of slides and try to give you a better perspective on what we offer and what things you’d need to take care of in case you actually think of doing this.

So first one, that’s really one that you cannot stay away from. It’s like the data handling. Once you hold the sensitive data, you’ve got to make sure that there’s no traces left behind, and this is really hard to do because often you’ll take a piece of sensitive data, you’ll use it through an API, it creates a nice object for you.

You didn’t get to say what allocator to use, how this was going to be stored, and when you say like, “Hey, this object I no longer need it,” how is it going to be cleaned up? So that’s a really hard area, and the only reasonable possibility you have right now to handle that is really system-wide and just turn on secure VM to make sure that anything gets swapped out to disk gets encrypted.

But beyond that, you get into areas where if you’re rolling your own and you have a deficiency in your protection scheme, well, there you go. There’s a big portion of maintenance for you as well as now you’re dealing with the key material yourself instead of having security server, as I mentioned, when you use it through the Keychain as a third process that can handle this so that it only gives you results that were produced out of using this key.

So why use the system facilities? First of all, this is a lot of work. It’s a lot of code to write. Now, once you’re done with writing it, the next step comes, which is really testing this whole stuff. And testing is actually not as simple as you test your application.

Your application works or it doesn’t work. Testing it also becomes partially seeing if you can break it, if you can circumvent it and all these other things. And that’s going to add to the time that it takes to test it. And then, of course, there’s the maintenance. And maintenance is not just make sure that it works, but you’re working with a lot of stuff that changes in the world. Certain hashes are suddenly not that good anymore. Now you sign up for the work of keeping this secure and trying to run that arms race with what is secure today.

So system facilities then also provide all of their stuff through an API that kind of hangs around, and it goes through multiple versions of your application that you roll out, so it will always be there. And as it improves, it will try to implement the best it can do on the back end while you just have this one API to work towards.

Another positive aspect of putting passwords in the keychain is that when people start using a particular application and they use a username password to connect to a particular server for some service that they use, then they switch to a different application. Say you’ve created a much better application that they’d really like to use.

If the password is already in the Keychain, that actually makes it really easy for the user to just say, it’s OK that that application starts using it too. They’ll get a dialogue. They can say that’s OK and then be on their way. So that’s a good usability aspect. And the other thing that even if you built a whole security system, the one thing that you probably don’t have time for is to also scale it, make it appropriate for your audience.

If you look at a standard install of OS X, the first thing that happens is you have an auto logged in account. Well, it’s all great and all, but your keychain is now unlocked, and that may be perfect for you because you’re not connected to a network. You don’t really care.

Now, if the user decides, I have a funny feeling about this. I’d really like to just have a password to log in. They can turn auto login off. As you continue, you can start setting stricter access control lists on the items so that applications that want to use your passwords at least show up as trying to use them.

And from there on out, you can start having multiple Keychains to have different classes of information that have different strengths of passwords. One can be a simple one you can type easily. One can be a much longer one that you don’t use that often, and then it all remains usable.

So if that doesn’t scare you and you’re continuing on, let me point out a couple more things that you run into while you’re implementing this yourself. Obviously, you can read up on this because every system that’s out there has been written about to prove to the world that this is it. This works. Everyone has seen this, so I’m pretty sure that there’s no problems with this. But that’s a lot of work.

The second one is that-- Cryptographic protocols go through a lot of revisions. I mean, most of the protocols we use today have had changes happen to them. And you have to go through those revisions. So this goes back to that maintenance work. After you implement the first one, there’s the second one that you already have to do.

Third one, you may want to make it easy. A user can enter a password to use the service. Well, that’s great. Passwords are, however, not good key material. You have to do something with that password to make it good. I mean, someone can type an utterly long password, but that doesn’t make it a key yet.

I’m going to skip the password-based key derivation function for the moment and just continue on saying, if you have a lot of data lying around, say like a FileVault image, and the user decides, you know, I don’t like my password, I really want to change it, you really want to deal with a master key, a key that actually encrypts the data, where the user’s password only gives access to this master key.
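The master-key idea can be sketched in a few lines. This is a minimal illustration in Python rather than the C APIs discussed in the session, and it rests on loud assumptions: the function names, the iteration count, and the blob layout are all hypothetical, and the XOR step is only a stand-in for a real key-wrap cipher. What it shows is that a password change re-wraps the small master key, while the bulk data encrypted under that master key never has to be touched.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor (assumption), tune for your hardware


def derive_keys(password, salt):
    """Stretch a password with PBKDF2-HMAC-SHA256 into a 32-byte pad and a 32-byte check key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=64
    )
    return material[:32], material[32:]


def wrap_master_key(master_key, password):
    """Protect the master key under a password (hypothetical blob layout)."""
    if len(master_key) != 32:
        raise ValueError("this sketch expects a 32-byte master key")
    salt = os.urandom(16)  # fresh salt per wrap, so the pad is never reused
    pad, check_key = derive_keys(password, salt)
    # XOR is a stand-in for a real key-wrap cipher; do not ship this as-is.
    wrapped = bytes(a ^ b for a, b in zip(master_key, pad))
    tag = hmac.new(check_key, wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unwrap_master_key(blob, password):
    """Recover the master key, refusing if the password or the blob is wrong."""
    pad, check_key = derive_keys(password, blob["salt"])
    expected = hmac.new(check_key, blob["wrapped"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong password or corrupted key blob")
    return bytes(a ^ b for a, b in zip(blob["wrapped"], pad))


def change_password(blob, old_password, new_password):
    """Re-wrap the same master key; the data encrypted with it stays as it is."""
    return wrap_master_key(unwrap_master_key(blob, old_password), new_password)
```

The random master key is generated once (for example with `os.urandom(32)`) and used for the data; only the small wrapped blob changes when the user picks a new password.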

So beyond that, though, some of you may not be looking to really just write all of the system. You are just coming to the platform with a bunch of code that needs to use some basic algorithms. And we’ve obviously advertised CDSA many years as a very generic API through which you can select whatever algorithms you want to use for doing all your hashing and your encryption purposes.

But that is not a very drop-in replacement, not something that’s very convenient to use. As CDSA was used through the system, obviously it goes to many lower levels. And at these lower levels, you don’t want all that complexity of the stack anyway. You actually want some more raw performance.

So out of all of that, we’ve basically distilled the algorithms and pushed them all the way down to CommonCrypto, which is actually a library that becomes part of libSystem that contains the algorithms. Now the rest of the system actually uses this same library. So it’s already there. You don’t have to link in another library that you then have to stay compatible with. It’s just there. And it contains algorithms.

Since Tiger, it has had cryptographically secure hashing algorithms in there; you may remember CommonDigest from there. For Leopard, we’ve added-- and this is actually on the seed already-- HMAC. So if you’re choosing to use hashing as a method of authenticating a piece of text that you want to do integrity checking on, you want to use HMAC. And encryption. The API is generally pretty much compatible with OpenSSL. It’s just initialize, feed data, final step. That’s it.
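The initialize, feed data, final step pattern for a keyed hash can be sketched as follows. This uses Python's standard `hmac` module purely as an illustration of the shape of such an API; it is not the CommonCrypto interface itself, and the helper names and the choice of SHA-256 are assumptions.

```python
import hashlib
import hmac


def hmac_incremental(key, chunks):
    """Compute an HMAC over data that arrives in pieces."""
    mac = hmac.new(key, digestmod=hashlib.sha256)  # initialize with the secret key
    for chunk in chunks:
        mac.update(chunk)  # feed data as it becomes available
    return mac.digest()  # final step: produce the authentication tag


def verify(key, chunks, tag):
    """Check a received tag using a constant-time comparison."""
    return hmac.compare_digest(hmac_incremental(key, chunks), tag)
```

A plain hash only tells you the text was not changed by accident; anyone can recompute it. The keyed variant is what makes it usable for authenticating the text, because producing a valid tag requires the key.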

To contrast CommonCrypto to OpenSSL a little bit and give you an idea of what our thoughts are around that, OpenSSL will obviously be there. It’s there for portability. Some of its algorithms we also provide; some of them we don’t. Depending on your application, it can be much easier just to use it. It does, however, come with the out-of-the-box performance. We have the benefit of looking in CommonCrypto and seeing what alternative implementations of these algorithms have been done that are possibly faster.

We have ways of modifying it to our own content, so to speak. And the last one is really a public service announcement. If you are using OpenSSL, OpenSSL has recently changed from a version 0.9.4-- well, not recently, but we’ve recently updated to 0.9.7 from 0.9.4. And unfortunately, those are not API compatible, although there is another symlink that points to a 0.9 version. So technically, that should have been a major version. So you may run into some linking issues as you go from Tiger to Leopard if you’re currently using libcrypto.

With that, I want to quickly show you the new APIs that we have. So for encryption, we obviously offer the basic suite of algorithms. One thing about all the algorithms that are in here that I want to make clear is they’re there for older purposes, too. If you just happen to need some really ancient algorithm, it’s in there, too, just for that reason. It’s not something that we recommend over the other ones. You still need to do your homework and figure out which ones you want to use.

When you use encryption, it also turns on cipher block chaining by default for the block ciphers. There is still the ECB mode in there that avoids the cipher block chaining if you want to implement alternate encryption modes. And if you want to use it, it also provides padding, because the amount of data that you’re trying to encrypt basically will never be exactly a multiple of the block size of your block cipher.

So padding takes care of that to basically put in the message some pointer to kind of suggest what the actual size is. Now you don’t really have to understand that because as you use this, there is the get-output-length function (CCCryptorGetOutputLength) that will tell you what the actual length was, and that will look at the padding for you. Now if you are trying to encrypt something and you are worried about the data handling, as I mentioned earlier, there’s also a way for you to provide the memory that’s going to be used for the context.
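How padding records the real length can be shown with a small sketch. It assumes PKCS#7-style padding, where every padding byte holds the number of bytes added; the function names and the 16-byte default block size are illustrative choices, not part of the session.

```python
def pkcs7_pad(data, block_size=16):
    """Append 1..block_size bytes, each equal to the number of bytes appended."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded, block_size=16):
    """Read the last byte to learn how much padding to strip, and validate it."""
    if not padded or len(padded) % block_size:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]
```

Note that input which is already a whole number of blocks still gains a full block of padding, which is why the output of an encryption can be up to one block longer than the input.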

So the one that you already know from Tiger is that we have the secure hashing functions in there. The difference is that that one was named with the actual algorithms in the name instead of actually taking that as a parameter in the init. The more important thing that I wanted to point out is this: if you have a program that already uses OpenSSL and you just want to give CommonCrypto a whirl, you can do a #define of COMMON_DIGEST_FOR_OPENSSL and then include that file, and it will define some aliases, which will make all the invocations of OpenSSL that we support go directly through CommonCrypto instead.

And last, of course, HMAC. Not much to mention about that. The common digests are all included. So that is all I have for data security. Let me continue with system security a little bit. Now the one bug to destroy it all sounds a little bit cynical, but if you look at it, it starts really with the whole story. Your system is as secure as the whole is.

So you only have to find that weakest link. And if you’re writing some secure system, everyone is looking for it. So it rings true when you are trying to write a crypto system or you’re even just trying to deal with the use of resources and operations that are restricted.

So I’m going to show you a couple of example system services that actually do some of this handling through trampolines to give you an idea of how we already support access to some of these privileged resources and operations that you can also use without having to write your own programs to deal with those. And after that, I will go into a more general description of how you can structure your application to have basically a similar architecture.

So the privileged operations should be pretty well known to you at this point. They’re common for most of the Unixes. There’s, you know, the permissions that can restrict you from modifying files. There’s the ownership that requires privileges. There’s ports below 1024 and raw sockets that need it. One note to make with that, of course, is that that is not a security function. Don’t ever use, you know, “hey, someone came from a port below 1024” as a good way to find out whether they are a trusted user of that system.

Then there’s also process management. Normally, you shouldn’t be running into having to kill this process, kill this client, if you actually manage all the pieces, because you can obviously, within your own protocol, tell these pieces to go away. But sometimes you’re wrapping around something else, and then, of course, that comes into play.

And the more obscure last ones are really like if you want to tune the kernel parameters or if you’re trying to load kernel extensions. But you don’t need to be root for all of these because we have a couple of these trampolines already in the system, and they will actually deal with some of these aspects. They do this by having a trampoline, which is kind of like an extra process that has the privileges to do this restricted operation.

Now, as you want to ask this process to do it, you basically tell it, for example, the authopen one, you tell it, hey, I want to open this file in read/write mode, can you do that for me? authopen then looks and uses authorization and then asks the user to authorize opening it. And then through a Unix domain socket, it manages to pass back the file descriptor to the original process, which can then continue to function. Permissions are checked on open. So from that point onward, you can do whatever you want.
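Passing an already-open file descriptor over a Unix domain socket can be sketched like this. It is a Python illustration of the general mechanism (SCM_RIGHTS descriptor passing), not of authopen's actual implementation; the function names are hypothetical, the authorization step is omitted, and `socket.send_fds` and `socket.recv_fds` require Python 3.9 or later.

```python
import os
import socket


def send_open_file(sock, path):
    """Helper side: open the file and hand the descriptor to the peer."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The descriptor travels as ancillary data next to a small payload.
        socket.send_fds(sock, [b"fd"], [fd])
    finally:
        os.close(fd)  # the receiver gets its own reference to the open file


def receive_open_file(sock):
    """Client side: receive exactly one descriptor from the helper."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    if len(fds) != 1:
        raise RuntimeError("expected exactly one file descriptor")
    return fds[0]
```

Because permissions are checked at open time, the receiving process can read through the descriptor even if it could not have opened the path itself, which is exactly why the helper has to authorize the request before opening anything.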

The trampoline at this point has done its work for you based on the user authorizing it. Same goes for bind. If you have a port below 1,024-- and that has a nice comma in there-- we have a trampoline that also will do the binding for you and pass you back the file descriptor for the socket that is now bound to it. It works by the same mechanism as authopen. One that might be more familiar to you is the trampoline we have to execute applications to run as root.

Now we’ve talked often about the problems with using a trampoline like that and how the AuthSample describes to you how to avoid this, but if there’s only one initialization you do, it’s a perfectly feasible mechanism for running this initialization step without having to ship a setuid binary.

So that’s a privileged trampoline that works slightly differently. You run the trampoline and tell it what you want to run, and then the trampoline authorizes and then execs your tool again so that there is a chain between you and the process you asked to start. What it can also do is then inherit a file descriptor through this chain of processes so you can communicate over it. So that’s a different way of doing a communication. And then finally, kind of obvious, but if you have files to install and you need them to be laid down in the file system, installer is in the world. That’s its only purpose.

So let’s have a look at how you would structure your own application, and maybe it gets a little clearer on how we arrived at this trampoline model. So in the beginning, there’s your application. It’s nice. It’s happy. It’s sitting there. But you want to use a resource or an operation, and those are technically limited, because, well, otherwise we wouldn’t be talking about this.

So as your application needs to access this resource or operation, well, the most obvious thing is like, hey, let’s run it with privileges. Oops. This is a big application. It contains a lot of code. As you remember in the beginning, I talked about having one bug in the system is going to take everything down. Well, that application is a pretty big system right there.

This privilege is very crude. You have the privileges or you don’t have them. You cannot really temporarily turn them off securely and then just regain them at a later point in time. That’s just the reality of the system we deal with right now. But if you look through your application, you can probably iterate through the right items in there that actually need these privileges. You can define the operations that you need to do that require the privileges. So if you structure your application such that you can identify those as individual functions like do it here, then you can start lifting them out and putting them in a separate process.

So we put them in a tool and we move them out of the application. And now the application can run without these privileges again. The tool is the one that has the privileges and that will do some limited work. It has a limited amount of code, so it’s easier for you to audit that it will behave as you expect it to.

Now, as you move it out, there is another point that you need to take into account. Your application may have been linking with an external framework. And these frameworks, generally, when you start with an application, are fairly high level. High-level frameworks are nice and flexible. They have lots of functions, and they have lots of interesting implicit behavior that you maybe did not know about. So when you’re trying to create an application, or in this case, the tool with the privileges, to operate in a way that you accept, it’s going to be hard.

So the framework in the application, that’s just fine. But if you move that tool out and have that tool use a high level framework, you’re in for a lot of ugly surprises. So you have to try to keep that tool limited to using the lower level frameworks. I cannot really give you a safe frameworks list because we don’t really keep one. But you can see what the reality is on the system as it exists.

So let’s move that resource or operation thing out of the way for a second because the tool has kind of taken its place as being the one that has the privileges and that we have to focus on for a little bit. So now that the tool is outside of the application, we have one other thing that we have to fix here.

And that is, well, the application needs to make sure that it talks to this tool. It needs to make sure that it is talking to the tool that it expects to do the work that it wants to do. There has to be a relation where the application knows who’s going to do this work.

And on the other side, the tool, of course, has to go ahead and see, well, who am I doing this for? It now completely lost context. It’s out there by itself. It doesn’t know who it’s working for. So somehow the tool has to also be given a form of context so that when it authorizes operations or authenticates people, that it knows who I’m doing this for. So let’s talk a little bit about this inter-process communication.

There’s a bunch of low-level communication mechanisms that you can use. Now, one that I already described a little bit and I will go into now is the one where you can inherit a file descriptor. Now, if I make this tool such that my application launches the tool, then I have the parent-child relationship through which I can gain a little bit of information.

But one other thing that is possible in those situations is that if the application opened a file descriptor to a file that does not have a backing store in the file system, it can have put some information in there. Now that information is only accessible through this file descriptor.

And as you execute into the tool, the file descriptor goes to the tool, and the tool can now also read it. So if the two have an agreement that the application writes it in, starts the tool, and the tool can read it out again, then you have a way of passing some information through.
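The inherited-descriptor handoff described above can be sketched as follows. This is a Python illustration under stated assumptions: the "tool" is a tiny inline script standing in for a real helper, the convention of passing the descriptor number as the first argument is hypothetical, and the context is stored in an unlinked temporary file so that it is reachable only through the descriptor.

```python
import subprocess
import sys
import tempfile

# Stand-in for the privileged tool: read the context from the inherited descriptor.
TOOL_SOURCE = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "sys.stdout.write(os.read(fd, 4096).decode('utf-8'))\n"
)


def launch_tool_with_context(context):
    """Write context into a file with no name on disk, then let the child inherit it."""
    with tempfile.TemporaryFile() as f:  # already unlinked on POSIX systems
        f.write(context)
        f.flush()
        f.seek(0)  # the child shares the file offset, so rewind before launching
        fd = f.fileno()
        result = subprocess.run(
            [sys.executable, "-c", TOOL_SOURCE, str(fd)],
            pass_fds=[fd],  # keep just this descriptor open across exec
            capture_output=True,
            check=True,
        )
    return result.stdout
```

This is a one-shot channel: the application writes, starts the tool, and the tool reads. Anything conversational needs a socket instead.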

One particular instance that uses this is the AuthorizationExecuteWithPrivileges trampoline. And of course, this is all in the Darwin sources. So if you want to have a look at how that would work for real, you can have a look at that one. Another mechanism you can use is communicating over Unix domain sockets. Now this has very limited possibilities. It’s a little bit better than a file descriptor, which really is kind of a one-shot operation, while with communicating over domain sockets, at least you can keep a couple of sessions going and you can go through a couple commands.

With that, though, you have the opportunity of finding out who your client is. Because if you use, for example, a streaming socket, which is the one that you have to use to get this, local credentials can be passed along. So your tool getting this request can actually figure out what the UID was of the user on the other side that asked you to do this operation.

A much higher level one, and you notice that I’m skipping Mach messaging here, is distributed objects. That’s a very high level tool. I put it on here not as a glowing recommendation of trying this out, but that’s one that you may already be familiar with. The unfortunate problem with it is that it’s so flexible that for you to try to use it safely requires a lot of research and a lot of work.

So it’s very, very tricky. So if you don’t know it in every detail, it’s going to be hard because, well, it’s a very flexible system. You send messages. They get delivered to whoever actually can implement them. Do you know whether you’re talking to the proxy? Are you talking to the real object?

What you find beyond that is, of course, that these invocations are very much unstructured. Anything could be sent across, but you want to make sure that only the things that you allow in your protocol can actually be sent across, and for that you’d have to use a protocol checker. That, in short, is my recommendation against it. All in all, this is really about trying to secure the endpoints of this authentication between these two parties, and for the server to find the tool and for the tool to figure out who its server is.

So one other mechanism you can use, I’ve described these tool and server connections where the only thing that they really shared was the UID, which is nice and all, but that doesn’t really tell you much about the fact whether the user did this himself. If he installed a piece of malware that started running, then the UID might have just been-- the user never accepted that this was going to run, that this was going to happen. The application just ran with the privileges of the user that came across it. A way to make this more flexible would be to use authorization.

I’m going to quickly go over the principles to give you an understanding of what it is, because of course, we’ve talked about this in many sessions, but there’s new people coming to the platform every day. So authorization is basically about the question, do I have a certain right, and getting the answer yes or no in response to it.

Now, it does not imply any privileges in itself. If you have the right, there’s no way that anyone is going to see this. You can share this handle that you have for this authorization, pass it to someone else, and they can then figure it out. But the system by itself does not imply that you have privileges. In that, we have a third party, the security server, which is the process called securityd these days, that operates as the middleman.

So it will sit in the middle. You will ask it to get authorization to get a right to perform an operation, then you pass it to the tool out of our example, and the tool will be able to ask the impartial middleman, like, hey, what do you think? Does he have the right to do this?

And if everything went well at that point, the answer will be yes, and then the operation can be performed. Now the right in itself does not do anything, so securityd has a database in the back end in /etc/authorization that contains a description that tells it what to do for this particular right. So that basically gives you an out.

And for administrators, you could ship, for example, a very strict policy, just for your own benefit, so that you don’t have to deal with shipping your application insecurely, while allowing an administrator who may not have such high restrictions to lower it. The policy now lives outside of your application, outside of the tool. It’s kind of inspectable by the administrator to see what you normally require and how they might want to change that. Another thing it does for you is transparently handle the UI.

So the UI we handle through a different process. So neither your application nor your tool actually needs to have real possibility of providing GUI. And both could be just command line tools. The only thing that is required is, for right now, that the client that is in this operation actually lives in a GUI context. Because the only way that we know how to interact with the user is in a GUI kind of a way. In the future, there might be ways that we improve that, but for now, there’s not.

And a third feature that is kind of interesting is instead of giving one user the authorization to do a particular operation, you could actually just avoid it. And for once or twice that he requires that kind of authorization, you could come in there, type your username and password for them, and allow them to do this operation once. So you can restrict it even further than just giving people access to it all the time.

So now quickly moving back to the situation that we ended up with, we have the tool living outside of the application. Now, how does that tool have these privileges? Well, obviously, it needed to get them somehow. So most often, the tool will have the setuid bit set.

Well, this is, of course, the start of a new problem because that tool lives in a hostile environment. Everyone can just run it, and it will still have those privileges. It’s not just your application that’s trying to use it. So let’s have a look at a couple of the threats that exist.

So there’s the tool, and there’s some environment, and well, there’s a couple of problems that happen with this. Some are more obvious. Some are slightly more intricate. The ones on the left, arguments and environment variables, often seen as basic user input that you’d have to check. Well, that’s all good and well, but it goes a little further than that. Say that you have a framework that has some implicit knowledge of, hey, if that argument gets passed, I’ll do something special.

So an argument can have unexpected results for you, and that’s a reason why that needs to be kept in check. Same goes for environment variables. I was just talking about high-level frameworks, but dyld supports an option where you can kind of insert like a piece of code and say, like, go ahead, take that along with you.

Now, I have to say, of course, with that one, this situation doesn’t exist because the framework will actually, the library in this case, will actually figure out, like, hey, you’re running with setuid privileges. I’m not taking any of this random user input that might have been faked. I’m just leaving that alone, and we’ll just continue as normal. I mean, we’re turning that feature off for this particular situation.
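One way to keep arguments and environment variables in check is to allow only what the tool explicitly understands, rather than trying to list everything dangerous. The following Python sketch illustrates that approach; the allowlist contents, the fixed search path, and the function names are all hypothetical choices.

```python
ALLOWED_VARIABLES = ("HOME", "LANG", "TZ")  # hypothetical allowlist for this tool
SAFE_PATH = "/usr/bin:/bin:/usr/sbin:/sbin"  # fixed search path, never the caller's


def sanitized_environment(env):
    """Keep only the variables the tool needs; everything else is dropped."""
    clean = {name: env[name] for name in ALLOWED_VARIABLES if name in env}
    clean["PATH"] = SAFE_PATH
    return clean


def check_arguments(args, known_options):
    """Reject any option the tool does not explicitly know about."""
    for arg in args:
        if arg.startswith("-") and arg not in known_options:
            raise ValueError(f"unexpected option: {arg!r}")
    return list(args)
```

With an allowlist, a variable such as `DYLD_INSERT_LIBRARIES` never reaches anything the tool launches, and an option that some linked framework would silently interpret is refused before that framework sees it.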

The third one, kind of surprisingly because I’ve described it as a way to securely talk to your tool, are file descriptors. Now the case may be that your tool wants to use standard in, standard out, or standard error. Well, those have the notion of canonically sitting at these file descriptors 0, 1, and 2, but that’s just convention. They don’t necessarily have to be there. So if you do not use standard in, standard out, and standard error, you may want to dup those to /dev/null and make sure that nothing leaks out of there accidentally.
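Pointing the standard descriptors at /dev/null can be sketched as below. This is a minimal Python illustration with a hypothetical function name; a real tool would do this at the very start, before opening anything else, since a file opened while descriptor 1 or 2 is free would land on that number and receive stray output.

```python
import os


def point_at_devnull(fds=(0, 1, 2)):
    """Make each given descriptor refer to /dev/null instead of whatever it was."""
    devnull = os.open(os.devnull, os.O_RDWR)
    try:
        for fd in fds:
            os.dup2(devnull, fd)  # also closes whatever fd referred to before
    finally:
        # If 0, 1 or 2 was closed at startup, the open above may have landed
        # on one of them; in that case it must stay open.
        if devnull not in fds:
            os.close(devnull)
```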

And lastly, process limits, which is even more intricate. Every process can put certain limits on itself. And well, those limits may actually cause the program to work slightly differently than it did before. So say that I limit the process to not be able to open any files. Now if you don’t do enough error checking and you go through your code path and you try to open the file and it doesn’t work but you didn’t do the error check, now we have basically created a new code path through your code that you may not have tested for and it may have unexpected results.
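The effect of a hostile process limit can be reproduced in a few lines. In this Python sketch the function names are hypothetical, and the limit is lowered inside the same process only to simulate what a malicious parent would set before launching the tool. The point is that every open is checked and the tool stops, instead of carrying on down a path nobody tested.

```python
import errno
import os
import resource


def open_checked(path):
    """Open read-only, and stop with a clear error instead of continuing blindly."""
    try:
        return os.open(path, os.O_RDONLY)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        raise RuntimeError(f"refusing to continue: cannot open {path}: {name}") from exc


def with_hostile_fd_limit(func):
    """Run func with the soft open-file limit at zero, then restore the old limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (0, hard))
    try:
        return func()
    finally:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

In C the failure is easier to miss, because `open` just returns -1 and the program keeps running unless the return value is tested.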

So one way to go is say I’m going to avoid this whole setuid business, and I’m going to let launchd do this work. Now, launchd is nice in such that you can basically set up your helper to go. You can tell it what privileges to run with.

It will sanitize the environment, and all of that is under control now. It doesn’t always have to run. It can launch it on demand when it’s necessary again. So those are really nice features. But of course, the problem changes with that in terms that you only tell launchd where the process is that needs to be run. You give it a path.
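A job description for such a helper might look like the output of the sketch below. It uses Python's `plistlib` only to generate the property list; the label and tool path are hypothetical, and the key names (`Label`, `ProgramArguments`, `UserName`, `OnDemand`) are taken from the launchd.plist man page of that era, so they should be checked against the system you target.

```python
import plistlib


def launchd_job(label, tool_path, user="root"):
    """Build a launchd job description as XML property-list bytes."""
    job = {
        "Label": label,                   # unique job name
        "ProgramArguments": [tool_path],  # launchd is only ever given this path
        "UserName": user,                 # which privileges the helper runs with
        "OnDemand": True,                 # start it when needed, not permanently
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    print(launchd_job("com.example.helper", "/usr/local/libexec/example-helper").decode())
```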

If the directory where that tool lives is not very secure and someone can just put a replacement in there, launchd will launch that for you, and that’s fine as far as launchd is concerned. So that is another way of solving it, and it has its own problems that you’re going to have to deal with.
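One way to look for that problem is to check permissions on the tool and on the directory that holds it. The C sketch below is only a partial check under stated assumptions: the function names are hypothetical, it looks at group and world write bits only, and a real check would also verify ownership and walk every parent directory up to the root.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if `path` is writable by group or others, 0 if it is not,
 * and -1 if the path cannot be examined. */
int writable_by_others(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    return (st.st_mode & (S_IWGRP | S_IWOTH)) ? 1 : 0;
}

/* Hypothetical pre-install check: both the helper binary and the
 * directory holding it must be locked down. Returns 1 if safe. */
int helper_location_is_safe(const char *tool_path, const char *dir_path)
{
    return writable_by_others(tool_path) == 0 &&
           writable_by_others(dir_path) == 0;
}
```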

So if you’re interested in more of these intricacies about code being changed and how to protect against that, there’s a session coming up after this on launchd, as well as a session on code signing. If you have any interest in that area, I suggest you go see those. Those are sessions 413 and 414, respectively.

So with that, quickly some documentation. You can look up about the things that I’ve talked about via the Apple Developer website. Follow the breadcrumb trail, because the URL is hard to parse. You go through the reference library, the Darwin section, then the security subsection, and you can find security overview that will tell you a little bit about the facilities that are in the system.

And AuthSample describes more or less how the AuthorizationExecuteWithPrivileges trampoline works. So that will give you an idea of how you can implement such a thing. For launchd, I have to refer you to the man pages. There might be some documentation coming up. I’m not aware of that.

And for authorization, of course, look at the File and Security Framework. And then any questions in the area of data security and authorization, you can join the Apple CDSA mailing list and talk with the whole data security group, where we’ll happily answer your questions. And with that, I want to ask Simon back on stage.

Thank you, Conrad. So we’re going to switch gears a little bit here. We’re going to talk about secure coding and some easy ways to eliminate frequent flaws that show up. So I’m going to emphasize the reference library again. In May 2006, Apple released a secure coding guide, which is there, which covers all of the material I’m going to be talking about and a whole number of additional things. So it also contains references to other books and online resources which can be useful. The guide that Apple has published is specific to the Apple platform.

So I’m going to talk about two things in the secure coding section. I’m going to talk about buffer overflows and integer overflows. And I need to say, why did I pick those two things? If you take the big bundle of security vulnerabilities and you throw out the 52% of things which were stupid mistakes in your coding, you just got the logic wrong, or you just did something wrong, you’re left with 48% of technical vulnerabilities. 40% of those vulnerabilities are either buffer overflows or integer overflows. So if you can eliminate those, you can eliminate a huge number of vulnerabilities in programs.

Now, also, buffer overflows and integer overflows are a big risk to your customers and to your users. So why is that? A buffer overflow usually kind of manifests itself as like a crash or a data loss, and, you know, it’s an irritant. And you want to swat it, and you say, ah, horrible. It’s mostly harmless at this point. Problem is that there are things with bigger bite. There can be a vector for a worm, virus, or spyware.

So the only thing that takes you from the irritation, really, to the big bite is motivation, experience, and time. And the bad guys out there have that. That’s what they do. They look for crashes. When they get a crash, they start poking around, and they start manipulating it. So their target is to take things which are an annoyance and turn them into a vector for a worm, virus, or spyware.

So what’s a sign that you have an overflow? This is an example where I took a little program that I hacked up-- it took me five minutes to do in Xcode-- and it had a text field. And I copied it into a stack-based buffer, which was too short. And I just entered a whole bunch of characters, capital letter A. And this is a PowerPC crash.

And you can see there that the capital As, the 41, 41, 41, appear in the crash dump. And they appear in the crash dump in quite a few places. So this is a sure sign that you’ve got a buffer overflow, if you start seeing them in the link register and various other registers. You can get a similar dump on an Intel platform.

So usually engineers, when they see this, kind of get mad and go, why is somebody doing this? They go through the stages of grief: they get the anger, and then they get the denial. So they go and say, I need to prove that this isn’t exploitable. This isn’t a problem for me.

Well, doing that is very, very hard. The cone of effort you have to go to to prove that a buffer overflow or an integer overflow isn’t exploitable is huge. It escalates very, very rapidly when you start looking at the code. And so what does a bad guy have to do?

Well, all they have to do is find one code path, right? It’s much easier for them to find a vulnerability than for you to prove that it couldn’t possibly be exploited. So one thing to take away here is that it’s almost always faster to fix the problem than prove it’s benign.

So let’s talk about stack frames. This is the usual place where buffer overflows occur. I’ve got a PowerPC stack frame here and an Intel stack frame. They’re laid out a little differently, and the names are not quite the same. But generally the same idea holds. So what happens in a buffer overflow? You usually have a local variable that’s allocated on the stack, and you overflow it, and you corrupt either saved registers or the bookkeeping that the compiler does for returning and going between functions.

Now, for Leopard, we do, in fact, have no execute stack turned on for the Intel platform and for G5 processors. We can’t do that for G4s, unfortunately. But that will be on by default. This does cause some problems for certain types of functions. GCC, for example, uses stack trampolines for nested functions, which means if you’re using those, you’ll have to turn off the no execute stack. Now, no execute stack doesn’t solve all your problems. The bad guys, well, you know they’ve got lots of time.

They figured out ways to get around all of the no execute stack stuff. They do return into libraries. And then they also do heap overflows. They can also try and switch their area of focus to heap overflows. So what you’re seeing here is a layout of the Mac OS X malloc allocator. It’s a little different than traditional Unix allocators. There are no pointers between the buckets; that data is held in a separate area.

So you have contiguous memory regions. And then when you get very large allocations, they get to go directly to the VM subsystem. So here’s a picture of a buffer. And when you overflow it, it goes into the red area. And for Objective-C and C++ objects, that’s very bad. So why is that very bad? Well, for both of these objects, the first thing at the beginning of the object is a pointer.

It’s a code pointer. So if you can overflow and influence that pointer, you can make the program jump somewhere else, maybe back into the data again. So you’re basically going to get exploited. This is a little harder, but again, the bad guys have the time and the effort and the will to go and do this.

So what’s the easiest way in C-like languages to eliminate buffer overflows? And we’re talking about C here. If you’ve got a high-level language that has constructs for strings that have lengths, then please use those. What I’m talking about here is very frequent mistakes that people make in just C. They use functions on the red side.

Don’t use those functions. They’re really bad. They’re going to get you. Everybody will say, yeah, I know how to calculate this. You know, if I multiply this by this and add one and take off one when I do the malloc, because I have to account for the zero, then maybe it’s all right.

It’s very, very difficult to get these things right. So what I recommend is you use the stuff on the green side and use strlcat and strlcpy in place of strcat and strcpy. The n functions, I don’t recommend using those. And I’m going to show you why in a second.

For sprintf, use the n versions where you’re actually accounting for the space that’s used. So another takeaway point here is use unsigned variables for calculating sizes. You never get a negative size buffer. They’re always positive. They always have real value. And use the safe functions for all new code.
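A minimal sketch of the snprintf advice in C, assuming a caller that wants a `user@host` string. `format_address` is a hypothetical name for this illustration; the size is carried in an unsigned `size_t`, and the return value of `snprintf` is used to detect truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "<user>@<host>" into dst, which holds dst_size bytes.
 * Returns 0 if the whole string fit, -1 on error or truncation.
 * On truncation dst still holds a terminated prefix. */
int format_address(char *dst, size_t dst_size,
                   const char *user, const char *host)
{
    assert(dst != NULL && dst_size > 0);

    /* snprintf never writes more than dst_size bytes and returns the
     * length the full result would have had. */
    int needed = snprintf(dst, dst_size, "%s@%s", user, host);
    if (needed < 0)
        return -1;
    if ((size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

Passing `sizeof buf` for a local array keeps the size next to the buffer it describes, so there is no separate length arithmetic to get wrong.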

Sometimes it can be very difficult to go back and retrofit old code. But if you can write new code, then use the new functions. In fact, if you read papers on software quality, a way to improve your code by 75% is to just introduce new practices and change the way that you write code.

So let’s compare strcpy, strncpy, and strlcpy. Now we’re going to start out with a tiny source string that’s actually going to fit. So we’re going to take those three functions, and then we’re going to see what the result of running those functions is. Now, in each case, we’re fine. There’s no buffer overflow. An overflow would show up in the red area.

So let’s start again. This time we’re going to take the source string to be a little larger. It’s the same functions again. Now what happens with strcpy? Well, you get owned. You’ve overflowed the buffer. You’re in the red area. You’re running somebody else’s code. That’s game over. So that’s bad. What happens with strncpy? Well, you’re OK. You don’t actually overflow the buffer. But there’s something a little fishy about this.

That’s not a zero. This isn’t a correctly terminated C string. So if you try and use that string in additional code later on, you’re going to go into the red area. So you’ve got an overflow here, but it’s a little bit more subtle than a buffer overflow. So this is also bad. And there have been hacks that have used this to actually cause problems.

So what happens with strlcpy? Well, strlcpy does the perfect thing. It doesn’t overflow the buffer, and it always zero terminates. So you don’t have to add that little extra piece there that either puts the zero in before you do the strncpy, or afterwards, to actually make sure that the last character is zero. So this is good. Definitely use strlcpy and strlcat.
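The difference between the n and l behavior can be sketched in C. To keep the example portable, `bounded_copy` below is a stand-in written for this illustration with strlcpy-style semantics; on Mac OS X you would call `strlcpy` itself. `strncpy_terminated` and `is_terminated` are likewise hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with strlcpy-style semantics: copies at most dst_size - 1
 * bytes, always terminates when dst_size > 0, returns strlen(src). */
size_t bounded_copy(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL);

    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}

/* The strncpy idiom: the caller has to add the terminator by hand,
 * because strncpy leaves it out when src fills the buffer. */
void strncpy_terminated(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Returns 1 if buf contains a zero byte within its first buf_size bytes. */
int is_terminated(const char *buf, size_t buf_size)
{
    return memchr(buf, '\0', buf_size) != NULL;
}
```

Calling plain `strncpy(buf, src, sizeof buf)` with a source longer than the buffer leaves `is_terminated` returning 0, which is the fishy case described above; `bounded_copy` with the same arguments leaves a terminated prefix.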

So in summary, think of the L as meaning the total length of the buffer. This is a subtle difference between the n versions and the l versions. The strl versions always use the total length of the buffer when doing calculations. This can simplify the arithmetic in your code amazingly. You can get rid of all those, if I add one, take off one, to adjust for the end of the buffer. You don’t have to do any of that. You just give it the size of the buffer. You always give it the size of the buffer.

So this reduces the need for complex length tracking. If you do, in fact, overflow the buffer, and it couldn’t put all the characters into the buffer that you wanted to put in there, the return value from the L functions is a little different. And it tells you the actual length you would have needed. So if you wanted to go back around and allocate some more memory and really use that length string, you can.
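As a hedged sketch of using that return value in C: `bounded_copy` is a portable stand-in written for this example with strlcpy-style semantics (on Mac OS X you would call `strlcpy`), and `copy_or_grow` is a hypothetical helper that falls back to the heap when a fixed buffer is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in with strlcpy-style semantics: always terminates when
 * dst_size > 0 and returns the length of src. */
size_t bounded_copy(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL);

    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}

/* Copy src into the fixed buffer `small` if it fits. Otherwise allocate
 * exactly the space the first call reported and copy again.
 * Returns `small`, a heap pointer the caller must free, or NULL. */
char *copy_or_grow(char *small, size_t small_size, const char *src)
{
    size_t needed = bounded_copy(small, src, small_size);
    if (needed < small_size)
        return small;               /* no truncation happened */

    char *big = malloc(needed + 1); /* +1 for the terminator */
    if (big == NULL)
        return NULL;
    bounded_copy(big, src, needed + 1);
    return big;
}
```

The truncation test is a single comparison of the return value against the buffer size, with no add-one or subtract-one adjustments.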

Now for Leopard, we actually implemented these functions in the kernel. So for Leopard, you can use these functions in a kext. And in fact, we would very much like to deprecate the use of the non-n versions. And people do use those versions in the kernel, scarily enough. Please don’t use those versions in the kernel. Use the l versions.

So let’s move on to integer overflow. They’re kind of related to buffer overflow in that if you get an integer overflow, you normally end up triggering a buffer overflow. So here we have some signed arithmetic. We’ve got 123, and we added 1 to it, and we got 124. Now we’ve got another integer, and we add 1 to it, and suddenly it’s become a negative number.

And in hex, if you remember your two’s complement arithmetic, the last line shows you why that is. The top bit suddenly flips on, and all the rest are zeros. So in that last line of that box there, 1024 plus 2147483647, is that less than zero? If you’re using signed arithmetic, that will actually turn out to be a true statement.

Now if that data is coming from the user, then you’re basically going to be trying to fit something that’s too big into a small spot. Because if you then feed that to malloc to try and get some area to actually store the data, you’re going to get a small data region. And the data that you’re going to try and put into it is larger, so you’re going to get an overflow.

So the real problem here is external data, things that are provided to you by the user. Maybe you don’t really think of it as being provided by the user, it’s an image on a web page, for example, and you’re using WebKit or something like that, or you’ve got your own image code for processing images, and you have a width and a height and a number of pixels, the depth of pixels. Well, if you multiply those things together, they overflow very quickly. And I’ve forgotten the number.

It’s not very large. When you multiply numbers together like that, you run out of space really quickly. In fact, probably in the next couple of generations of cinema displays, you’ll be able to overflow that if you multiply those numbers together. I’m sure about it. So another common thing that people do is they take a count and they multiply it by the size of an object because they’re going to create an array of structures.

If you can magically make that product come out to be zero, the malloc of 0, although it allocates zero space, doesn’t actually return a null pointer. So if you’re testing for null in your code to see whether the malloc failed, it won’t fail. It will succeed. So you really do need to be careful with integer overflows. Now unfortunately here, this is a little bit of a tease.
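The width, height, and depth case can be sketched in C with a checked multiplication on unsigned sizes. This is an illustration under assumptions: `checked_mul_size` and `alloc_pixels` are hypothetical names, and rejecting zero dimensions is one possible policy for avoiding a zero-byte allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiply a and b. Stores the product in *out and returns true only
 * when it fits in a size_t. */
bool checked_mul_size(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);

    /* a * b overflows exactly when b > SIZE_MAX / a (for a != 0). */
    if (a != 0 && b > SIZE_MAX / a)
        return false;

    *out = a * b;
    return true;
}

/* Allocate width * height * bytes_per_pixel bytes.
 * Returns NULL for a zero dimension, an overflowing product,
 * or an allocation failure. */
void *alloc_pixels(size_t width, size_t height, size_t bytes_per_pixel)
{
    size_t total;

    /* Refuse zero sizes so a zero-byte allocation is never mistaken
     * for a usable buffer. */
    if (width == 0 || height == 0 || bytes_per_pixel == 0)
        return NULL;

    if (!checked_mul_size(width, height, &total) ||
        !checked_mul_size(total, bytes_per_pixel, &total))
        return NULL;

    return malloc(total);
}
```

The same `checked_mul_size` call covers the count-times-object-size case when allocating an array of structures.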

Compilers don’t and cannot help you with integer overflows. This is a runtime problem, and the C specification doesn’t have any help for you in detecting overflows. It kind of leaves that as a sort of processor implementation detail. So you have to either code it yourself or get some help from external libraries. The publicly available tools don’t solve the problem very well. You can find some C++ classes that try to help you with integer overflows. There’s one common library, and it uses template overloading. If you’re using C++, you could potentially use that.

The C ones that are out there force you to use unsigned ints and aren’t very helpful. So we actually would like to provide something for everybody in Leopard to help with this problem, but we weren’t quite ready for WWDC. So I apologize for that. This is a little bit of a tease. There’s more coming in this area. We know it’s a big area, and it’s important.

So here’s a summary for overflows. You have to be very careful about statically sized stack-based buffers. They are the real red area for buffer overflows. Avoid dangerous string functions. Only use the safe ones. If there are higher-level language features or frameworks available to you, such as CFString or other mechanisms which keep track of buffer sizes, then use those. Check for arithmetic overflow.

And use unsigned arithmetic when manipulating buffers. They always have positive size. Do not trust user input. Most of the time, users will provide you very nice input. The bad guys don’t. That’s the first thing they go after. They have these things called fuzzers, which try to manipulate the data so that it goes outside the bounds you would have normally programmed for. So again, a pointer to the ADC reference library. There’s a secure coding guide there.

So okay, now we’re going to talk about sandboxing. This is a technology preview. This is new in Leopard. It doesn’t exist anywhere else. This is very experimental. So here we go. Why do you want sandboxing? Why is sandboxing important? Well, on any kind of operating system, and in particular Mac OS X, your application has access to all of the resources that the operating system provides you. So you can access files, there is networking, and then there are other applications running on the system.

And it will interact with them. Now what happens if you have one of these flaws that allows somebody to inject code into your application? Well, once they’ve injected that code into the application, then that rogue code has access to all of those resources, everything. If you’ve got photographs somewhere off in a particular place, that rogue code has access to your photographs. It may not be photographs, it may be credit card numbers, it may be anything, right? It has access to everything that application has access to.

So as users, what would you really like the behavior to be? If you have an application, you want that application-- let’s pick Safari. You want the application to behave like a web browser. You want it to do web browser type things. It can maintain a cache of its image files. You want to be able to save pages.

And you want to be able to download stuff to the desktop. You want it to go out onto the network so that you can retrieve and browse around and have a great rich experience. And sometimes you want to make it talk to other applications to have hyperlinking through.

So in this environment, what does a user really want? What do they expect? And so I think it would be reasonable to expect that if you get rogue code inside the application, that that rogue code, the only things it can do is do damage to the things that Safari can do.

So it can only modify the cache files, maybe, that Safari’s keeping. It doesn’t necessarily have carte blanche access to download things or modify files anywhere that Safari is not normally allowed to do. And it can’t talk to other network places. It can’t phone home to Bulgaria and squirt data over there, although that’s a very hard problem for Safari.

And for other applications, you don’t want it to be able to go around and fire up all sorts of other applications on the system. So that’s kind of what the user would really want to have happen. Well, that’s what I would want to have happen. I don’t know about you guys. So we don’t want-- the bad code to access any files that are outside of what Safari could access. We don’t want them to be able to go and talk to websites or particular services on the network that we don’t want them to talk to. And we don’t want it to talk to other applications.

So how are we going to manage that? So here’s the sandboxing implementation that we currently have. It’s based on mandatory access control. This was talked about a little bit in the kernel session. Each process can have its own sandbox. There is no limit on the number of sandboxes. It’s not predefined.

It doesn’t require privilege to use. So an unprivileged application can put itself in a sandbox. But you cannot escalate privilege. So you can’t say, I’m going to put this thing in a sandbox, but I’m also going to give it access to write into the root directory. You can only create sandboxes that have less privilege than you currently have.

This is included in the developer seed. The sandbox is defined right now by a text profile. There are three examples in the seed in /usr/share/sandbox. And the whole sandboxing mechanism is designed to be cooperative with apps. So we designed this in a way so that if you’re building an app, you would think about building a profile, a sandboxing profile, to go along with that app.

So let’s talk a little bit about the profile language. I said that it was a text-based language. It is very provisional. It’s subject to change. There are things we would like to add to it. We’re sure there are things that you would like us to add to it. So we want your feedback. We definitely want your feedback on this.

The profile at its basic level consists of rules and filters. It ends up getting compiled for the policy engine, which executes up in the kernel. So there’s strong enforcement here. There are three examples. As I say, they’re shipped on the developer seed. You can go look at them there. And after this session, there will be a little cheat sheet and a language definition for the profile language available at this session’s developer page.

So here’s a very simple example. I’m going to give a demonstration of this example a bit later on. So you’ll have to memorize everything on the screen here to make sure I’m not cheating. At the top here, there’s a version number. We’re definitely going to make some changes.

So we’re trying to point out that this is going to change and subject to change. So don’t build any products based upon this language. There is a debugging facility, which we’ve got turned off right now. In this profile, everything is denied by default, so you get no access to any operating system resources by default.

Parentage allow, that allows things like fork, exec, and spawn to actually operate. And because this example and the demonstration is going to be running a shell, all of the sub-processes need to get launched. So I need that. Sysctl read-only, that’s a little bit of boilerplate that we need for dyld.

That’s reading some system configuration information. And we’re not going to allow any network access. So in this demonstration, it won’t be possible to make any network access off the machine for this process. So then there’s a couple of lines to allow dyld to load, and then for applications to load and run. And then here’s the little example. I’m going to allow reading the file readme, but not writing or removing it.

So that’s any path on the system that says readme, I’m going to be able to read it, but not write to it or remove it. I’m going to deny reading any file named secret, but I’m allowed to write to that file. And lastly, any other file I can read on the system.

So that was a little introduction. The general form of the language is a sequence of operations followed by an action, and that currently is allow or deny. And then you can apply filters to that action. And those filters for path names and string-like objects are regular expressions. And that’s also true for network addresses and ports.

It’s a little regular expression-y. Wildcards, sorry. I’m getting a look from one of the implementers down here. So here is the list of operations that can be performed. The actual documentation and explanation of what all these mean is in the documentation guide that you can download from the session website.

The regular expressions are mostly POSIX.2. There’s no backtracking allowed in regular expressions. And there are some additional character classes that we’ve defined that make construction of the strings a little easier. You’re not allowed to-- so in these examples, though, there’s a filename and paths which eliminate access to dot and dot dot. So you can’t escape out.

So this is an excerpt from the profile for NTPD that’s shipped on the Leopard seed. And what this actually does here is allow NTPD to access the network. It allows it to modify the NTP configuration file and the key files. It allows it to read those two files, I’m sorry. And it allows it to write the NTP drift file and a temporary file when it’s renaming that, and it also allows it to create the PID file.

It also allows the time to be set. So take this example and apply it to NTPD. Let’s suppose that tomorrow somebody finds a buffer overflow in NTPD. If you were running under this profile, the attacker would not be able to exec any programs. The only files they will be able to write to are the /etc/ntp.conf and keys file, the drift file, and the PID file. No other files on the system will be accessible by that program.

And that’s pretty much what they could do. So the only things that they could do if they exploited NTPD, they managed to generate a buffer overflow to try and exploit it, is they could do the same things as NTPD, which is not very exciting. You’re not going to generate a self-propagating worm that does any damage to a system using NTPD like that. OK. You can do a denial of service, maybe.

OK, yeah, we’re going to have some questions later. Sorry. You can’t harm the user’s system. So here’s a usage example. This tool is also on the Leopard seed. There is a sandbox-exec program, and you can provide it either a path to a profile or a profile string in its entirety.

You can also use this mechanism to launch programs via launchd by prepending sandbox-exec to the launchd command. And we also have a little API for launching things into a sandbox so that an application can also launch itself into a sandbox and get some cleanup. So now I’m going to show you a little demonstration.

Alrighty, so now we’re going to, just to show you that we’ve got network access here. OK, that’s a text version of the Google web page. Very exciting. Oops. I have a readme file. I have the secret file, and I have the profile. So this is the same profile that was up on the slide.

And now we are going to-- so the profile doesn’t allow-- I’ll just go over this again. It allows fork and exec. So when I run this profile, I’m going to be running bash. And so I want to allow subprograms to run. They inherit the sandbox. I didn’t actually say that earlier. I should have.

It does not allow any network access. It allows DYLD to operate so that it can load programs. And I can read the readme, but I can’t write to it or remove it. And I can’t read the secret file and every other file I can read on the system.

So here we are in the sandbox. So let’s show that-- so we’re not able to connect to the network. So it says permission denied. So here’s the README file. It allows me to do that still. I can’t access the secret file. So I also am not able to remove the readme file.

So I’m going to add something to the secret file. And that allowed me to add something to the secret file. But-- whoops. I still can’t read the secret file. So let’s exit out of there. So we’re now back at the regular shell. And oops, I guess I don’t have completion turned on. So there’s the secret file again out of the sandbox.

Can we go back to the slides, please? So here’s a summary. This is a technology preview. This is subject to change. We would like you to play in the sandbox, do some things with it, send us feedback. We have documentation, which should be up after the session, in the session area. There are some sample profiles in /usr/share/sandbox. There’s a header file, /usr/include/sandbox.h, for using the API.

There’s a binary on your system that you can play with, /usr/bin/sandbox-exec. There is a mailing list which does not quite exist yet. I’m not sure when it’s going to show up. Sandboxing dev at lists.apple.com, that’s the list we would like people to use to send us feedback and suggestions on this technology. Thank you.