Introducing Time Machine - WWDC 2006

Application Technologies • 27:34

Leopard introduces Time Machine, automatic backup built in to Mac OS X with an intuitive time based visual display to search back through time to find and restore anything on the Mac. Learn more about Time Machine, how it works, and how you can use time based search and restore in your application.

Speaker: Robert Ulrich

Unlisted on Apple Developer site

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript was generated using Whisper, it may have transcription errors.

Good afternoon, and welcome to session 135. This is Introducing Time Machine. My name is Robert Ulrich, and I'm one of the engineers working on Time Machine. And so today, here's what we're going to cover. Rather than spend a lot of time doing a demo, which hopefully you've already seen, and hopefully you've actually installed this stuff on your machine and have tried it, there's really not a lot I could show you better than what Scott showed you on Monday. So I'm not going to spend a lot of time showing you the UI, because hopefully you've already seen that. Instead, what we're going to do is we're going to talk a little bit about what it is that we've done with Time Machine. Some of our design goals, the features, and why we've done them the way we've done them. how Time Machine works internally, and how to make yourself play nice with Time Machine, a few tips for your app, and then we'll have plenty of time for Q&A.

So why did we do Time Machine? As Scott kind of mentioned on Monday, we did a survey of a bunch of users, and we asked them if they thought they should be backing up their data. And what we found out was that 80% of the people said that they believed they should be backing up their data. So we then asked them, OK, so how many of you do it? And we got this answer, 26%. So this means that over half the users not only know they should be backing up their data, they're not doing it. So in addition, the people who do it on a regular basis turns out to be 4%. And that's kind of not a good number, because you want to protect your data. It is important. And mistakes can happen. Things can get deleted. And hardware can crash.

Why don't people back up today? Well, a lot of the reasons-- there's many reasons. Here's a few of them. Existing solutions can be intimidating to set up. They have a lot of options. You see huge lists of files, and you don't know which ones you really need to back up and which ones you don't. It doesn't happen for you automatically, so it's not just part of your life. It's something you have to think to do. It's not convenient, because you've got to remember when to do it, and you've got to set everything up to do it at that time. And it can be really hard to find those files later when you do actually need them. You're fumbling through different disks.

You don't know where it's at, and it can be very difficult. So we've tried to make things really simple. That's kind of a goal. So the main design goals we have are shown here. We want it to be really simple to configure, integrated with the system and just part of your life, automated so that it just happens for you. You don't have to think about it, very convenient to access your data and get to those files later, and compatible with the system you use today, not something that's hacked on.

So here's what we did to tackle the configuration. Some of you who've hopefully installed Leopard by now and tried it, you've probably seen this. But we think we did a pretty good job of making things really simple to configure. Because in fact, all you need to do in order to run the Time Machine is simply select a disk and flip the switch.

And then you're running Time Machine. So that hopefully will encourage plenty of people to do it that are not doing today. And of course, one of the major features of Time Machine is the integration with the system. We're fully integrated with the Finder. Hopefully you've already seen the Time Machine browser. And we think this is really, really a leap forward in terms of how to interact with backup data.

In addition, we're fully integrated with the installer. And we think this is another really important thing. So when you go to do an install, we've been telling people for years, before you install your machine, go out and back up your data. And nobody does it, because they don't know how. So today, now we can say, go out there and back up your data before you do an install. And just offer them a checkbox that says, back up my data for me, and it just happens.

And then in addition, on the other side, you can also, when you go to install a new system, you can choose to install it directly from a backup, which is really nice. It's automated. By default, everything is backed up. And it just happens once per day. And in addition, we maintain rolling backups for the last couple of hours. And I'll get into that in a couple of slides why that's useful and important.

So here's the convenience. Because of the state of the art, how disks have gotten so big and so cheap, we've really been able to move to kind of a new paradigm, which is to use high speed, high bandwidth storage devices, which in our case is either locally attached external disks or AFP servers over a local net. And this is really important because it really brings your backups to where they're online. They're always accessible. They're always there. And they just become part of your system rather than being something on the side.

So we use a compatible archive format, and it's in fact so compatible that you're already using it. And that is HFS+ with journaling on. And you can view the disks on Tiger and Panther systems because it's just HFS. And we don't encode or compress anything, and we don't put everything into a huge data blob to make it hard to go through later. So let's talk about a few of the features of Time Machine.

Automatic backups. Well, we backup changes on a rolling basis each hour. So what this means is for the last couple of hours, we always have the data at the state it was about an hour ago. And this provides extra protection, obviously, because if you should happen to need to get to data that you had, you've only got an hour of lag. And at the end of the day, we then take the last one of and we roll it over into a daily backup. So we preserve those over time, so that lets you go back and find older versions over time, but you've always got a backup that's no more than an hour old. In addition, you can specify if you happen to be somebody who doesn't want the machine doing it hourly, you can specify that you only want it once per day using the option in the control panel.

So as you go and you create multiple backups, you're going to end up using more and more disks, obviously, on the backup disk. And by default, we always keep all backups all the time. So you can always go back a year, two years, however long, and look at every backup that you've made. If, however, you decide that for me, I only care about six months of backups, anything older than that, I figure that I probably don't need or something like that, and you'd like to allow yourself to get more data onto your backup disk, you can specify a time period to keep backups. And in that case, we'll still keep all your backups all the time until we start running out of disk space. And in that case, we'll start removing older backups to make room for new ones. And of course, you always have the option, if you start running out of disk, to buy a larger disk, which is always a good idea anyway, because by the time you get to that point, it may just be time to get a new disk because they just keep getting bigger and cheaper, right? And in addition, you can change the period to keep backups or manually delete ones that you don't want.

So one of the big features of Time Machine is that you don't have to figure out what files are important. By default, we back up everything that you need to restore a system disk. So this means that you don't have to go and worry about, well, what if I didn't get this one file that I need, and it's going to bite me two years later? By default, we grab everything. And this includes system updates. And now a lot of people, when I first told them that, said, well, but system updates, is that going to be too big and use too much space? But it turns out that system updates are usually in the tens of megabytes on a hundreds of gigabytes disk. So it turns out that it's really easy to grab system file changes. The other thing is that system files only change every couple months or so. So it's really not a problem to do that.

So here is an example of the layout that we use for Time Machine. On the backup disk, inside of there is a backup directory. And inside of that, you'll see for each machine that has been backed up on this backup disk, you'll see a folder. And inside that folder will be a series of dated backups. And inside of each one of those will be the disks that that machine has got backed up. So on top of that, we use special access control lists on top of the files we lay down that make these read-only. These cannot be modified. You can't go and delete stuff. You can't rename stuff. The file system itself is protected and can't be modified.

So when you go to restore with Time Machine, you can restore using the great Time Machine interface on Leopard. You can also browse the structure that I just showed you with your average file management tools. Or you can use the installer to restore an entire system from a backup you've made before. Furthermore, if you buy a new system and you want to move your files over to a brand new system, you can make a backup, buy your new system, and then use the migration assistant to migrate directly from that backup disk, which is really a nice feature.

So Spotlight-- well, our backups, how do they interact with Spotlight? Spotlight does index your backups. And so-- Obviously, the question is, well, when I search with Spotlight, what happens to those backups? Because I don't want to search for some text file and find the last 20 revisions of that text file in there. So what happens is, by default, Spotlight does not include backups in any of your queries. And this is not implemented yet, but by the time we ship, there will be special options added to Spotlight, which will allow you to tell Spotlight that you want to search backups, and you'll be able to use Spotlight on your backups to find files from the past.

So the requirements in order to use Time Machine are here. First of all, you must be running Leopard. That's probably not a surprise. You must have either an external hard disk or an Apple-shared server on a local disk, on a local network. And we do require, once again, journaled HFS+ disk. On the external disk, it should be formatted that way. On a server, we use a disk image file with a journaled HFS+ disk image file.

So that's kind of an overview of what Time Machine is. Now let's talk about a little bit behind the scenes about how it works. So when we first make up a backup, as should be pretty obvious, everything needs to be copied because we don't have anything yet. So the first time through, we simply copy everything.

After you've made the first backup and you go to make additional backups, what happens is we use the new facility in Leopard called FSEvents, and hopefully you at least have heard about this. This is a new feature of Leopard where the file system itself can broadcast events that happen to you. And so you can find out, for example, everything in this file system that has changed since a given time period. And this is really useful. This is what we use in order to say what has changed since the last backup. And in this case, we have four files, and FSEvents has reported to us that files B and D have changed since we last made a backup.

So we simply create a new backup, and we copy files B and D over to that backup. So what about the other files? What we do is we don't want to copy those. They already exist. So we use file system hard links to point back to the previous versions in the former backup.

So this has a really nice benefit, because to the user, there appear to be two separate directories, each of which contain four individual files. But in fact, on disk, there is only six files. So this allows us to fit a lot more stuff on the disk. And obviously, that means we can make backups and backups and backups and backups, even for lots of data. So if we did that, that's great, but it still means that we create a ton of links, because everything in every folder that hasn't changed needs to be linked to the file that's in the previous backup. So we don't really want to do that, so we don't.

So what you'd really want to do in this case is you'd want to say, well, everything in those two top-level directories, nothing has changed. So why do I need to do anything inside the directories? Why don't I just say, make that directory be the same as the one before? And in fact, that's what we do.

We've got a new special kind of links that we added, which are called archive directory links. And instead, any time we've got a top-level folder which has not changed, we simply point it to the previous one. And this lets us, in the case where there's very few file system changes, to create an entire backup using very little data. And this means that this is how we're able to fit so much data on a backup disk. Thank you.

So let's talk about integrity. One of the issues for doing backup is there's a lot of things on the system, and the primary one are bundles, where you've got files that are related to each other. And as we go to backup, obviously all the files don't get copied across the link at once. They're copied serially. And as things are copied, it takes time to occur. And at some point, the user can come along and modify a file before we're finished, after we've started and before finished. So when we do this, once again, after we've completed, we ask FSEvents, well, what changed while we were doing the backup? In this case, it came back and said, well, the user changed file B. So we use that information to then go and recopy B to make things stable.

And this process repeats so long as the files keep changing. But what happens is over time, we start with a very large number of files getting copied. And then we say, FSEvents, well, we copied this 100 files, what's changed? And it says, OK, well, these two files have changed while you were doing that. So we go through and we then recopy those two files. And then we ask FSEvents again, well, how many have changed now? And hopefully it comes back zero. But what happens is that the first time, the first copying window might take, say, five minutes to copy all of those files. But then the second one, only two or three files have changed during that period. And the copying period during that might be a matter of seconds. So in that time period, we only have a small window to worry about things changing. And that window continues to get smaller. And hopefully it gets stable. But we will time out in the case that there is extremely volatile files. But that should be really rare. So how can you play nice with Time Machine?

First of all, these are the top two things that you can do. Number one is to tell us which files you have which we shouldn't back up. The other one is to add support for Quick Look. So excluding files. Here are examples of files that you might want to exclude from the system.

These are temporary files that you know they're just there for a bit, cache files, which just cache data from your other files, index files, which reference other files if they're rebuildable. And in fact, any large rebuildable file, you should mark and tell us, if you can rebuild it, you're better to rebuild it after a restore than to have that backed up in every copy. So here's an example where we have the-- for Safari, Safari has cache files. This was shown by Scott earlier this week. Safari has cache files here that we obviously don't want to back up because they're changing all the time and there's no reason to keep them around.

So the first thing you can do is to use one of the well-known existing locations for cache files or temporary files. And this is /temp or /tilde library caches or /library caches. In the case of Safari, it puts its temporary files into library caches Safari, and by default, we just don't pick them up, and Safari doesn't have to do anything else.

The other thing you can do is to use this API, csbackup set item excluded, to exclude your items. And there's two ways you can use this. The first one is to exclude by path. In this case, you specify a URL to a path, and you pass true for exclude by path. When you do this, the path is added to a list of excluded locations, and we won't back that location up.

It should note, though, that this is stored by path. So if the contents in that path happen to move somewhere else on the disk, they will get backed up. The other way is to exclude a specific item. If you make the same call but specify an item URL-- and this can be either a file or a folder-- and false for exclude by path, that specific item will be marked as excluded. And it will be excluded any time we hit it anywhere in the system. So even if it moves to another location, it will still be excluded.

On the other side, you can find out if a particular item is currently being excluded by making a call to CSBackupItem is excluded. And this will return true if the item is currently excluded from the backup. And excludeByPath in this case will be set to false if the specific item has been marked as excluded by the previous API. Otherwise, it will return true. And that means that it's excluded because of its current location. But that can mean either that it's at a path which has been excluded, or it's contained within a folder which has been excluded by item rather than by path.

The other thing you can do to help with Time Machine-- this is probably the most important one in addition to the other one-- is to add Quick Look support to your app. What this does is this allows us to preview your app contents from older versions without having to load your app. And so the user can go back, can find that old document from your app from a month ago, hit a button, bring it back, and you're done.

If you happen to use standard data types like HTML or RTF or PDF, we have a set of these that we'll be shipping and you'll already work. But if you have a proprietary format, you should look into writing a Quick Look importer. And this will also help you with anybody else who uses Quick Look, which in Leopard will include Spotlight and a bunch of other stuff.

So copying from backups. In general, you should never need to do this. By default, the user goes to Time Machine, finds the file they want, brings it forward. It just happens entirely outside of your app space. But there may be cases where your app decides that it really needs to pull something out of the backup store.

If you do that, it's important that you use fscopyobject to do it. You can't just go and do a simple copy. And this is because it handles things like the archive directory links correctly and will rebuild the directory tree so that it looks right. It'll also restore all the original files permissions. It's-- and it will also remove all of the access control lists and set all that stuff right. And it should be noted that this is exactly the same copy engine which is currently used by Finder and Time Machine, so you're guaranteed to work into the future.

So here's an additional few helpful tips for your application. When you use index files, if possible, make the index rebuildable. Don't put any non-rebuildable information into the index. Make it simply a cache of information that you can rebuild from the items themselves. And if you do that, it's good because then you can then go and mark that file as not needing to be backed up, and it will just never be backed up.

If you do have a backup file-- in this case, the example is for a music system where you have an index file that has a bunch of cast information about all of the music files. In this case, it's got the album name, the group name. But in this case, they've included play count. Well, what happens is that every single time that you play a song, that index file gets touched and play count gets bumped up. Well, what happens is that that index file might be multiple megabytes of file, because I might have thousands of songs. And what happens is every time that you play a song, we have to then go and re-backup that file. And so you're going to get in every single backup, you're going to get this huge five megabyte blob that gets moved into the backup when it's not even needed. So what you should do is any kind of data which you know is going to be changing on a regular basis, take out of your index files. Make the index files big static caches that you can mark as not needing to be backed up. And then move any data which is going to be changing on a regular basis into smaller little files that we can back up and that don't take a lot of space.

One option to consider that will help you with this is, it doesn't make sense in every case, but if it makes sense for you, you can look at using bundles for your app data. In this case, this gives you a really simple way to separate any metadata, things like play count or user rating or anything like that, from the actual data itself. So in the case of, let's say, for a song file, let's say the metadata has a user rating and it has a play count in it.

Well, in this case, by doing this, we've separated the song data from the metadata itself, from the user rating and all that. And if the user marks it changed, we don't have to back up all the song data. We don't have 12 megabytes of song data. We just have 150k file or something instead.

The other advantage to this is because it appears as a single icon within the finder, if the user moves it to another directory, all the metadata will go with the file. And when you rename it or restore it with Time Machine, it will all come back as one unit.

The other thing you can do if you happen to have a database-based file is really you should consider adopting a document model. This means you don't assume the database is a specific path and you can only have one open. And when you open your app, you always open this database of this path. Instead, allow selecting more than one database and having more than one database open at a time so that you could, in the future some way down the road, be able to open both of your database, one from backup and one from current, and be able to merge records from one into the other.

So a final note, the API that we've talked about here today can be found in backup.core, which is a new header in carboncore.framework. And it should be noted that this is one of the newest pieces of the system. It is rapidly evolving. Features are subject to change. And many changes already have been made, even since we froze the conference DVD. So please keep your eyes open for seed updates that will be coming, and this code is rapidly evolving, so just be aware of that. And if you want more information or you have comments, this is the person you should contact, John Galenze and his galenze at apple.com.