Leopard Innovations • 54:36
Time Machine, with its automatic backups and intuitive file recovery, transforms the experience of using backup software. Gain in-depth knowledge about Time Machine, including detailed information about its infrastructure and considerations that affect developers.
Speakers: Deric Horn, Robert Ulrich, Eric Weiss
Unlisted on Apple Developer site
Transcript
This transcript has potential transcription errors. We are working on an improved version.
Hello. Good morning. (Applause). Welcome to Time Machine In-Depth. I am Deric Horn. I'm the Application Technologies Evangelist, and before joining evangelism I used to be a member of the file system engineering team. And being here today really reminds me of a time when we were developing HFS+. At that time we had gotten HFS+ up to the point where we were able to finally copy a file onto an HFS+ volume, and this was really a huge accomplishment for us.
But, about that time as well, WWDC was right around the corner, and I don't know if you guys know this, but there's some sort of California law that every presentation at the conference has to have at least one demo, and I think in my mind I was picturing what our demo would look like. Sitting there in the Finder and copying a file onto a volume.
Not much of a demo. So that's really one of the reasons why I really love Time Machine. It really emphasizes the use of the file system, and it's really, I think, going to be probably the number one user feature that's going to drive adoption of Leopard to our users, and cause people to upgrade from Tiger to Leopard. After all, the most important thing about our machines is our data.
So, how many of you out there have actually tried Time Machine yet? Wow. Okay. Good. About half of you. I don't know if your experience was exactly like mine, but the first time I plugged in that volume, and I did that backup, it was almost like a-- this-- this new feeling I had. This different relationship I had with my computer.
It was almost like I just unloaded this backpack full of rocks on the ground. I mean, it was like, now I have this safety net. I felt really free to go around and try things I had never tried before, going into Photoshop, trying new filters, plug-ins. I didn't care if undo was supported or not, because I felt like I always had undo. I could always go back to the version of that file I just had.
Right away I found myself on like Version Tracker, downloading new shareware, trying new things. So it was really this kind of this-- this enlightened feeling I just had with my machine after that, because I always felt safe right next to me, all my files right there. I could always go back to that stable state I was just in.
By now a lot of you have probably heard about Xcode Snapshots. So, Xcode Snapshots are the ability, when you're working on a project in Xcode, to take a snapshot and try something new; if you screw up, you can always go back to that safe snapshot. Well, Time Machine implements this idea of snapshots across the board for all your applications. So whether you are in Photoshop or any other application, if you screw something up, you just go right back to that safe place where you were, that safe snapshot.
So what were some of the goals we had when we designed Time Machine? Well, obviously the goal was to protect our data. To provide the safety net. But we wanted to do something that really provided the seamless integration of a backup solution onto your system. Something that went beyond what normal backup solutions do and really made it intuitive for all users to understand how to use and to be able to retrieve their data. We wanted to be able to back up our data and keep it in a file format that everyone understands. We're not going to use encryption, we're not going to use compression.
This way in the future you don't have to worry about, you know, do I have a method to unencrypt this file? Do I have a method to unstuff this file? Is the company that developed this encryption technology still around? Your files just sit on a regular HFS+ volume. It's as easy as copying them back over, so it makes it very easy, very intuitive to use and to understand.
Now that being said, Time Machine 1.0 isn't going to be everything for everybody. It's not designed to be an enterprise-level solution. We don't have these goals in mind of I have my university of 4000 machines, I need some encrypted offsite storage, and Time Machine's going to provide all the functionality you need. No. Time Machine was designed for our typical usage in mind. Small businesses, users, home usage, and so forth.
And probably one of the neatest things about Time Machine, and I'm going to say this about a lot of the technologies in Leopard, is that it's really well integrated across all of Mac OS X. So, the typical place where we're going to be using Time Machine is directly within the Finder. Go ahead and find that directory that contains those files that you're going to--that you want to find these older versions of these files.
Once you find this directory, this folder, this window, you enter the, you click on the Time Machine icon and you enter this Time Machine presentation mode. Now, I think from there that's where this intuitive user interface comes in. I think it's very intuitive to understand that depth represents time. The further back you go, the older in time you go. All the way back to when the Big Bang was.
We also have integration of Time Machine into a number of our applications in Mac OS X. So again, I may have added a number of contacts into my Address Book before this conference. And then maybe-- and after the conference I delete those contacts. A month later I want to go back and find that person that I wanted to have contact with, not in my Address Book anymore. I go ahead and open up Address Book, from the Address Book application itself, I click the Time Machine icon, and I'm presented with that same familiar Time Machine user interface. And I can go ahead and go back in time and reconcile those records.
Now the interesting thing about this is that Address Book, behind the scenes, actually has to have two versions of its database open at the same time. So, it's not working on a file-by-file basis. And it's going to go ahead and reconcile those individual records and move them across. Similarly we have integration with iPhoto, as well as the Installer.
So, we've all been told, make a backup before you install new software. Nobody does it. I've never done it. But right now we make it very easy to do. It's as simple as clicking a checkbox before you install your new software. If things get screwed up you can always go back to that safe place where you were right before you installed that software. In addition, we have integration with the Migration Assistant.
So, you may have all of your data on a MacBook Pro, it slips out of your lap, and the drive gets broken. Now, unfortunately you have to go out and buy a new MacBook Pro, but the good thing is with the Migration Assistant, you can now move all of your data from one of those Time Machine snapshots back onto your new volume.
So, now that I've described a little bit about the technology, you're probably wondering why did it take us so long to write Time Machine? Well, it turns out Time Machine really leverages a number of technologies which are really just now coming to life in Leopard. So, a typical backup scenario involves scanning your entire source volume and your entire destination volume. So really what I mean by that is we might have a million files on our volume.
The backup solution now has to go ahead and scan every single file in our source volume, every single file in your destination volume, compare things like modification dates, attributes, file sizes, when it finds a difference copy those across. Now that takes a long time to scan 2 million files. So now in Leopard we have a new file system technology called FSEvents. This provides file system event notification.
So for instance, when we want-- when we launch Time Machine now, Time Machine can then say, "Tell me all the files that have changed in the last day." FSEvents will then send us a list of directories containing only those changed files. So that accounts for the blazing speed we see in Time Machine when we do those incremental backups.
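To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is not Time Machine's code and it does not call the FSEvents API; it only models a volume as a dictionary of directories, each mapping file names to modification times. The function names and the `changed_dirs` list (standing in for what an event log would report) are hypothetical, and deleted files are ignored for brevity.

```python
def full_scan(source, backup):
    """Compare every file on the source against the backup copy."""
    examined = 0
    to_copy = []
    for directory, files in source.items():
        for name, mtime in files.items():
            examined += 1
            if backup.get(directory, {}).get(name) != mtime:
                to_copy.append(f"{directory}/{name}")
    return sorted(to_copy), examined


def incremental_scan(source, backup, changed_dirs):
    """Examine only the directories reported as changed since the last backup."""
    subset = {d: source[d] for d in changed_dirs if d in source}
    return full_scan(subset, backup)


# Hypothetical volume: 1000 untouched system files and one edited document.
source = {
    "/System/Library": {f"file{i}": 100 for i in range(1000)},
    "/Users/deric/Documents": {"talk.key": 250, "notes.txt": 120},
}
backup = {
    "/System/Library": {f"file{i}": 100 for i in range(1000)},
    "/Users/deric/Documents": {"talk.key": 200, "notes.txt": 120},
}

full_result = full_scan(source, backup)
incremental_result = incremental_scan(source, backup, ["/Users/deric/Documents"])
```

Both scans find the same changed file, but in this model the full scan examines all 1002 files while the event-driven scan examines only the 2 files in the reported directory.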
We also leverage another-- a number of other technologies in the file system. Things like EAs-- Extended Attributes. That-- this allows us to tag some extra metadata onto the files. This metadata then can be searched by Spotlight, maybe the metadata contains information such as whether the Time Machine backup is in progress, the last time of the backup and so forth. Eric Weiss will go into details about this a little bit later.
We actually had to change the volume format of HFS+ to accommodate Time Machine. So when you think about it, on your system, out of those million files, there's really only a few files that actually change over time. So, for instance, everything in that /System directory, it rarely changes. So to get better performance out of Time Machine, we introduced a new concept, and this is the concept of being able to create hard links to directories themselves.
Again, Eric will talk more about this in a little bit, but the idea here is that if nothing changes in my /System directory, we don't go ahead and create all these folders or links to files that already exist, for better speed. Let's just say that /System is exactly the same, and create a directory hard link, or what we call an archive directory link, to this original /System.
Come on. Quick Look. So by now you guys have all heard of Quick Look, I'm sure. Quick Look is another one of those technologies that is pervasive throughout all of Mac OS X. It's used in Spotlight, it's used in the Finder, it's used in your open save panels to provide those little previews, and probably what I think is the most important place where Quick Look is used, is directly in Time Machine.
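As a rough illustration of the linking idea, the Python sketch below builds a second snapshot directory by hard-linking the unchanged files of the first one, so no file data is copied. This is only the file-level version: Python's `os.link` does not create hard links to directories, so the sketch cannot show the directory links described above, which replace all of this per-file work for an unchanged folder with a single link. The function name `snapshot_with_links`, the flat directory layout, and the file names are hypothetical.

```python
import os
import tempfile


def snapshot_with_links(previous, current, changed):
    """Create `current` from `previous` by hard-linking every unchanged file.

    Assumes `previous` is a flat directory containing only regular files.
    """
    os.makedirs(current)
    for name in os.listdir(previous):
        if name in changed:
            continue  # a changed file would be copied fresh instead
        os.link(os.path.join(previous, name), os.path.join(current, name))


root = tempfile.mkdtemp()
first = os.path.join(root, "backup1")
second = os.path.join(root, "backup2")
os.makedirs(first)
with open(os.path.join(first, "kernel"), "w") as handle:
    handle.write("unchanged system file")

snapshot_with_links(first, second, changed=set())

stat_first = os.stat(os.path.join(first, "kernel"))
stat_second = os.stat(os.path.join(second, "kernel"))
same_file = stat_first.st_ino == stat_second.st_ino  # one file, two names
link_count = stat_second.st_nlink
```

Both snapshot folders show the file, but it is stored once; on a file system that supports hard links the two names share one inode.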
So currently the workflow using a backup solution would involve something like I'm a graphic artist. I drew a picture and I remember that picture had maybe a small red balloon on it. Okay, go over on my backup, retrieve this file, put it on my volume, double click it, launch Photoshop, no, not the right file. Go back to find the other backup, and repeat this process.
It's not a great user experience. So now with Quick Look, directly from this Time Machine presentation mode, you can go ahead and find these files, hit the space bar, and it presents a Quick Look preview. Not the one I'm looking for? Go back one more version, hit the space bar. It really increases productivity and that user workflow.
And lastly, Core Animation. So I don't believe that Core Animation is just eye candy. I think that Core Animation is really responsible for providing this rich, kind of intuitive user interface. This is the first time that we're seeing a backup solution that doesn't use things such as tables and lists to portray the data. It's portraying the data in a whole new way and that's through Core Animation being able to animate and show you your snapshots over time.
So now let's talk about Time Machine 2.0. The ability to get your documents before you write them. So what do we need for Time Machine 2.0? (Laughter) Okay. Wrong presentation. I must have made a mistake here. Let's go ahead and look at Time Machine and see if we can get the right version.
So I'm just going to give you a quick little tour of how Time Machine works. The most common scenario, we'll find a directory here and we want to go ahead and find the older version of this file. So in this one I am going to go ahead and look at a representation of my presentation. Simply click on the Time Machine icon, and we are presented with this user interface.
The first thing you'll notice is on the right-hand side, a number of these tick marks are white, and as we go back in time you will notice the grey ones. The white ones represent the snapshots that we have made and the grey ones are where we don't have that data. So as you can tell my first backup was probably made on May 30th, for instance.
And there's a number of different ways we can navigate within this Time Machine presentation mode. For one, you can notice it's very similar to the Finder. If we-- if I decide this isn't the right folder I can go ahead and click around from here. You will notice that all the other windows now change their titles. So, I can go ahead and navigate back to where I was.
By using these arrows over here, it will automatically zoom us back to the version of that directory which has something different in it. So as I click back, nothing has changed for a number of days here and now all of a sudden we will see, okay, we have 2 files in this directory. We can also go ahead and click on individual directories over here.
But I'm going to go ahead and try and find that file that I'm looking for. I don't know, was this it? We can check just by hitting the space bar, and again we'll be presented with this Quick Look preview of what was in the document. It's that easy. Directly from Time Machine.
I know that's still the-- that's still the wrong presentation, we'll hit the space bar again. Go back in time a little further. Let's keep on going back. Okay. I think this is the right one, let's go ahead and check it out in Quick Look. This is the right presentation.
So to restore it, we'll just go ahead and click the restore button, and as you can see through animation, it will just bring us right back into today's time. And ask us if we want to replace this original file. So it doesn't just replace that original file, we can keep both of them, replace it, or keep the original one.
In this case I'm going to go ahead and replace, I'll go ahead and keep both of them. So we have the original one and the new one. That's a little bit to see how Time Machine works. (Applause). At this time I'd like to welcome Robert Ulrich on stage to talk a little bit more about the details of Time Machine.
(Applause)
Thank you.
(Applause). So, hello. I am Robert Ulrich. I am an engineer. I work on Time Machine. And, here's what we're going to talk about here. What's new with Time Machine, how it works, and tips for your application. I'm going to cover more of the high-end sort of user experience, and Eric will talk about some of the low-level details. So, what's new this year in Time Machine? Well, we've added a lot of stuff. We have simplified the options, we've added automatic setup, we have added backup to the servers and AirPort disks. A lot of stuff. So we'll cover some of that now.
So, we've simplified the options this year in Time Machine, so now everything is much simpler and much easier to set up. You'll see this in your seed, it's really, really easy to deal with. And here's some of the stuff that we've done. First of all, we've eliminated the on/off switch. There is no more on/off switch.
Time Machine is always running, there is-- it's just a question of whether you want to do it manually or automatic. So, if you turn on the automatic switch, we just back it up. You no longer need to set the backup time, it just happens for you. And that's all you need to do.
So, when you do automatic backups, what happens now is your machine is automatically backed up every hour. That's why you no longer need to set any time, it just happens every hour. The hourly backups are kept on a rolling basis for 24 hours, so after the file has passed 24 hours we'll-- we'll remove the oldest one and we'll make a new one.
But the first one that happens every day after midnight will become your daily backup. So, one of those every day gets saved as a daily backup, and then we have a rolling list for the last 24 hours. We've also added support for backing up when a FireWire or USB disk has been attached and we haven't backed up for a while, which is really useful.
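The hourly and daily rule described above can be sketched in a few lines of Python. This is an illustration of the stated policy only, not backupd's implementation: it keeps every backup from the last 24 hours, plus the first backup of each calendar day. The function name `backups_to_keep` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def backups_to_keep(backups, now):
    """Keep all backups from the last 24 hours plus the first backup of each day."""
    keep = set()
    first_of_day = {}
    for stamp in sorted(backups):
        if now - stamp <= timedelta(hours=24):
            keep.add(stamp)  # rolling hourly window
        first_of_day.setdefault(stamp.date(), stamp)  # earliest backup that day
    keep.update(first_of_day.values())  # promoted to daily backups
    return sorted(keep)


# Hypothetical history: one backup per hour for the last three days.
now = datetime(2007, 6, 14, 12, 0)
history = [now - timedelta(hours=h) for h in range(73)]
kept = backups_to_keep(history, now)
```

With this history, the 25 backups inside the 24-hour window survive, along with one older daily backup for each earlier day.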
So, if you turn off automatic backups for some reason, you can still do manual backups and you just go to the control panel and press Back Up Now and we'll back it up. In this case if you turn off automatic there's no hourly backups done, they are just done on a demand basis when you press the button.
One of the features that we have had a lot of requests for and we've added this year is Stop and Resume, which is very useful. So, if your machine is in the middle of doing a backup and you need to go, you can now stop your backup. You can-- and then we'll remember where we were and we'll pick it up later, so if you come back later and just start, you can just go back and press Start Back Up again, and you'll continue from where you are going-- from where you left off. And this is especially useful if your machine is in the middle of doing a backup and you need to shut it down or take it on the road, then we'll just stop what we're doing and we'll-- we'll pick it up later when we do a backup again.
So, whenever we back up your system, we always back up everything you need to do a single-click reinstall and get an entirely bootable system with all of your user accounts and all your system files and everything. When we do this, we keep the system data for everything and that turns out to be on the order of a couple of gigabytes, but because it doesn't ever change, it stays roughly constant.
So, you know, your system files are not huge. You've got like a 500 gigabyte hard disk and you're talking about a few gigabytes, so it's not a big penalty to keep your system and always have a full rebootable system, which we think is really important.
But if you decide that you really don't want to keep your system files (this is primarily useful if you always do a full install off of a DVD and you prefer to do that versus restoring your full system), you can choose to skip system files now.
And in this case, you'll-- you'll save that few gigabytes from your system, but it means that you won't have a full rebootable-- a fully bootable restore from one button. Now, fortunately, the-- the important thing here is we're still-- we still work right with the Migration Assistant and the Installer, so that we haven't saved your system files, but we're able to restore all your user files on top of a fresh DVD install. So in this case you're still-- you're still able to get back to a full system with all your user accounts and all the permissions are set up and everything.
So this is really important. If you do turn on this option, by the way, it will speed up your initial backup because we won't have to grab any of those system files on your initial backup, but it really doesn't affect any incremental backups. So, because the system files don't change for the most part, unless you do a system update, most of the time this makes no difference in performance, so you're not penalized for keeping your system files around, which is really, really a good thing.
So, even if you decide to keep everything, I just wanted to point out there are some directories which we automatically skip. And these are a few of them here. Primarily they are things that are cached, like ~/Library/Caches, /Library/Caches, Spotlight metadata, things like that, that can be rebuilt later.
Temporary directories, of course, because they have got files that have just been written for temporary reasons. Device mounts, because individual devices are backed up as individual devices. And network mounts, because we don't want your backup to be traversing your local network and backing up other people's machines. And FileVault images, and I'll get into that a little bit later, because they're handled separately.
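A skip list like the one just described amounts to a prefix check on each path. The Python sketch below shows that check only; it is not the actual exclusion list Time Machine ships with. The two cache locations come from the talk, while the temporary, device-mount, and network-mount paths, the `home` default, and the function name `is_skipped` are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath

# Cache paths are named in the talk; the remaining entries are assumed stand-ins
# for "temporary directories", "device mounts" and "network mounts".
SKIPPED_PREFIXES = ("/Library/Caches", "/tmp", "/private/tmp", "/Volumes", "/Network")


def is_skipped(path, home="/Users/example"):
    """Return True if `path` is one of the skipped directories or lies inside one."""
    candidate = PurePosixPath(path)
    prefixes = [PurePosixPath(p) for p in SKIPPED_PREFIXES]
    prefixes.append(PurePosixPath(home) / "Library" / "Caches")  # ~/Library/Caches
    return any(candidate == p or p in candidate.parents for p in prefixes)


checks = {
    path: is_skipped(path)
    for path in (
        "/Users/example/Library/Caches/com.example.app/data",
        "/Library/Caches/fonts",
        "/Users/example/Documents/report.txt",
        "/Library/CachesOld/fonts",
    )
}
```

Comparing whole path components rather than raw string prefixes keeps a sibling such as `/Library/CachesOld` from being skipped by accident.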
So, now we have an even simpler configuration than before. You simply go and choose a disk, and you are done. There is no step two. (Laughter) So, this is even easier than last year. You don't have to set times to backup, you don't have to do anything, you just choose disk.
And we have even gone a step further: as Steve mentioned in his keynote, we've now added single-click backup setup. So, what I've got here is a FireWire hard disk. And I'm going to take a machine here, this machine has never been set up for Time Machine. And I'm just going to plug this disk onto here. Like so. And the disk appears on the Desktop. We notice this. And...okay.
Well...Maybe-- maybe somebody did plug a disk on here before. Let's try this-- another one. I didn't. Okay. Well...I guess I didn't pray to the right demo god today. So, what would have happened-- (Laughter) -- had somebody not already set up Time Machine on those machines...is you would have seen this dialog.
Do you want to enable Time Machine? So by default, now, you simply plug a FireWire disk or a USB disk onto your computer, and you'll be asked, would you like to enable Time Machine? And you say, yes, and you are done. That's it. There is no longer any step one even: we detect the new disk, we configure things for you and you're just set up to go.
So, part of how we do this is we changed the default configuration of Time Machine. Now automatic is turned on for you, automatically, so as soon as we have a-- as soon as we get a disk set, we-- we're running, so automatic is turned on. All your system files are backed up, your boot volume is backed up, and all your volumes containing user accounts are backed up. So, if you have to-- if you happen to have user accounts which don't reside on your boot disk, we'll grab those volumes as well so that we get all your user accounts.
So, we've added Spotlight integration now, which you should have seen in Steve's keynote. So that if you don't happen to find this on your disk, if you don't happen to find a file on your disk, you can start searching your backups and see if it happens to exist on your backup.
And Spotlight knows when a particular file may be referenced in multiple backups, so it won't return multiple hits. If a file only exists once but it's in many backups, you'll only see it once, which is kind of useful.
And one thing I wanted to mention, this is not included in the WWDC build, this is fresh, bleeding-edge code here. So, as you go on and use your backups these days, now, especially with automatic backups, what's going to happen is over time you're going to get more and more backups. It's going to fill up your disk, so we have added an option to automatically remove old backups for you to recover their space. (Applause).
Thank you, I guess. (Laughter) So let's talk about how all that works. So, to begin with, whenever you delete a backup, you don't delete everything that you apparently have in your backup. So, as I make a backup here, as we can see, we've got Backup number 1 and it has 3 files in it, A, B and C. So, if I go work on my computer for a while and then I make another backup, in this case I've removed file A, so it's not in Backup 2, I've left file B the same, so it's still there, I've modified C, so now I have a new version of C in Backup 2, and I've added a file D. So here's a case where I've got two backups. There are 5 files existing on disk, and Backup 1 appears to have 3 files.
But if I remove Backup 1 now, those go away and I lose file A and the old version of C, but B is still around, even though it was in Backup 1. So because of this, removing a particular backup doesn't remove everything that you see in that backup, it removes only data that is unique to that backup.
So, if we look at it that way, rather than seeing one huge backup, and then, you know, other backups that are also huge, rather we have a set of-- of individual backups where each one contains a certain amount of unique data. So if I remove any one, I'll only remove some amount of data and the amount of data depends on how much stuff changed in that particular backup when I made it.
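The A, B, C, D example can be worked through with a small reference-counting sketch in Python. Each backup maps a file name to the stored version it points at, and a version is unique to a backup when nothing else references it. The version labels such as `"C-v1"` and the function name `unique_to` are hypothetical; this models the bookkeeping, not the on-disk format.

```python
from collections import Counter


def unique_to(backup_name, backups):
    """Return the stored file versions that only `backup_name` references."""
    references = Counter(
        version for files in backups.values() for version in files.values()
    )
    return sorted(v for v in backups[backup_name].values() if references[v] == 1)


backups = {
    "Backup 1": {"A": "A-v1", "B": "B-v1", "C": "C-v1"},
    "Backup 2": {"B": "B-v1", "C": "C-v2", "D": "D-v1"},
}

# Distinct versions actually stored on disk across both backups.
stored = {version for files in backups.values() for version in files.values()}
freed_by_removing_1 = unique_to("Backup 1", backups)
freed_by_removing_2 = unique_to("Backup 2", backups)
```

Five versions are stored in total. Removing Backup 1 frees only A and the old C, because B is still referenced by Backup 2.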
And so, if we look at that over time, we end up with something that looks like this, where over time I've got a bunch of backups and each one of them contains a certain amount of unique data to that backup that would go away if I removed it. So the first thing is, that we do, is we always keep the backups that you specify. So if you say keep, you know, couple weeks, then we-- we lock those down and we never touch them.
Then, using a non-linear distribution, we remove backups, with a tendency to remove older backups sooner than newer ones. And you'll see as this goes on we'll drop out more backups, we'll drop out more backups, and we end up with something looking kind of like this.
Where we have the ones that are guaranteed up front, and then we have a bunch of backups left that cover the entire time span. So we haven't done anything like just lop off the last N snapshots, but we've actually kept them over time, and there's more of them up front because those tend to be the files that you are actively working on, where you might need to go back to one from a day ago, versus one from 6 months ago, which you're probably only going to need to go back to as an archive measure.
So, another feature we have added is encrypted backups. This is really useful because nowadays we've all got those little disks and they're everywhere, and we can either misplace them, somebody could break into the house and steal one, or, you know, your roommate could just decide he wants to look at all your files and he can't get on your machine, but he can get the hard disk because you are not home and plug it onto his machine and try to read it.
So, in this case, we back up onto an encrypted HFS+ disk image. This is protected by a single machine-wide password; if you don't know it, you can't mount the disk. But we write a key into the system keychain so that when it's attached to your machine, we can read it, we can do backups to it, we can restore from it without any user intervention.
So, for those of you who know what FileVault is, this is our account encryption facility, where you can have your entire account encrypted and not readable by anyone else on the machine. So if you have been using FileVault, we now support backing up FileVault accounts. FileVault accounts are backed up only when the user is logged in, because normally it's just an encrypted disk that we can't read.
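The talk does not say which non-linear distribution is used, so the Python sketch below substitutes one simple possibility purely for illustration: everything inside a protected window is kept, and beyond it the minimum spacing between kept backups doubles each time, with the oldest backup always retained so the whole time span stays covered. The function name `thin_backups` and the parameters `protected_days`, `first_gap`, and `growth` are all assumptions.

```python
def thin_backups(ages_in_days, protected_days=14, first_gap=2, growth=2):
    """Return the backup ages to keep, denser for recent backups than old ones.

    Assumed policy, not Time Machine's: spacing between kept backups grows
    geometrically once a backup is older than the protected window.
    """
    ages = sorted(ages_in_days)
    kept = []
    next_allowed = protected_days
    gap = first_gap
    for age in ages:
        if age <= protected_days:
            kept.append(age)  # guaranteed window: never removed
        elif age >= next_allowed:
            kept.append(age)
            next_allowed = age + gap
            gap *= growth  # older backups are spaced further apart
    if ages and ages[-1] not in kept:
        kept.append(ages[-1])  # keep the oldest so the full span is covered
    return kept


# Hypothetical history: one backup per day for 100 days (age 0 is today).
kept = thin_backups(range(100))
```

With 100 daily backups, the 15 most recent survive untouched, and the older ones thin out to ages 15, 17, 21, 29, 45, 77, and 99 days.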
But when you are logged in it'll, it'll occur, and each backup is-- each account is backed up on its own encrypted disk image which is encrypted individually, but we also put just as with regular FileVault accounts, we put a mass-- the system master password on top of it, so that the admin is able to restore all the-- the entire system including all FileVault accounts.
So, this year we've added support for backing up to servers. And we back up to servers by backing up to an HFS+ disk image on the server for each machine. So each machine has its own individual backup image. And each-- each machine-- so you can have multiple machines back up on a single server, and as-- as Deric mentioned earlier our focus is not on the enterprise but on the individual computer and LAN workgroups, so you have, hopefully, a few of these but not thousands.
So, for working with MacBooks, of course, the-- the-- being able to backup to the server is really great because now when you come home you can turn on your machine and you can just backup to the server or the AirPort Extreme that you've got in your house. You can also-- also as I mentioned earlier, we have added support for automatic backup on disk connection.
So that if you connect a FireWire or USB disk and you haven't backed up for a while because you've been out on the road, we'll notice that and we will say it's been a while, so we'll go ahead and start a backup even if it's not on the hour.
So, finally I wanted to put a plug in for how you can work with Time Machine, and the answer is Quick Look. Quick Look is very important to Time Machine. It is key to how Time Machine operates. When your users go into their old backups and they want to find that file that they have lost, they might have 5 files named the same thing that are different ages. How are they going to find the one that they want? By using Quick Look.
So if you have a file format which isn't a standard file format, please add a plug-in for Quick Look so that users can see their files in Time Machine. And as a bonus, it works with Finder, it works with Spotlight, it works with iChat Theater, so, you know, this is important. Write a plug-in! And, just so that you will, we have sessions tomorrow; there's a session at 3:30 in Mission 2,0 that's Getting Started with Quick Look.
So that will go over everything you need to do to write one, and then on Friday there's a Lab at noon that will actually help you write your first one. So now I'd like to do a quick demo of this, assuming it works better than the last one.
So, I have-- I was looking for some cars earlier, because I was thinking about, you know, it's time to get a new car. And I was talking to the loan guy, it was ACME was the name of the company, and here's some of the stuff I did, and now I can go back in Time Machine, because I know that I had some quotes on several different cars, and I don't remember which one, but I can go back and I can find, I've got a bunch over time, and I, oh, was this the one I wanted? I'm not sure, I can look at it, and I can easily just go and look at it with Quick Look, say, the Porsche Carrera, that might be a little pricey. I had some ones before I was looking at. Let's see.
This one? Oh, yeah, that's the one I was looking at, the Infiniti G5. It's a little bit less than the 911, and I think I can swing that price. Now the important thing to notice here is this is a machine that does not have Excel installed. So I'm still able to look at this as Steve showed in his keynote and I can look at all the information.
This is something that was sent to me by somebody else, and I'm still able to look at it and view it, and this is the kind of thing that your users can get when you add a plug-in to Quick Look. So now I would like to go ahead and bring up Eric Weiss, who is going to talk about some of the low-level implementation details of Time Machine and how it all works.
(Applause)
Thank you.
Hello there. My name is Eric Weiss. I'm a software engineer. I work on Time Machine's backend, specifically on a process called backupd whose job it is to wake up, make a backup for you, and go away, hopefully, without getting in your way. Here we go. So I'm going to give you a little bit more detail on how it is that Time Machine actually works when it's going about making backups, and hopefully give you an idea of how you guys as application developers can take advantage of it.
So, there are 4 things I'm going to talk about. First, we're going to take a quick look at what it is that a backup actually looks like on your disk. So, obviously the preferred way for you as a user to go and recover and view your backups is through the Time Machine UI.
But, if you, say, want to take your backup disk and plug it into a Tiger machine and look at your backups, you are actually capable of browsing your backups entirely through the Finder as though it were a live disk that you were looking at, so I'll show you exactly how that looks. Also, if you're going to interface with the Time Machine backup library programmatically, you'll want to know what the layout looks like.
Second, I'm going to discuss some features that we take really, really heavy advantage of in Time Machine. So, not things that backupd itself implements, but public features available to all of you guys that let us get backups done and get them done very, very quickly. If you don't know about them, maybe you'll want to learn about them and take advantage of them in your own applications.
Third, with those features in mind, we're going to run through very quickly how it is that backupd goes and makes a backup for you. And finally we'll talk about what you guys as App Developers, as Tool Developers can do to work with Time Machine and help each other out.
So, first, let's take a look at the structure of a backup on your disk. At the top level you have your disk. As Robert mentioned, we need to be backing up to an HFS+ volume, because there are lots and lots of features, some added to HFS+ in Leopard, some in Tiger, that we rely on very heavily within Time Machine, and so we make you back up to an HFS+ volume. At the top level of the volume you'll see a folder called Backups with the extension backupdb.
This is the only place Time Machine writes anything, so if you're looking for Time Machine content, that's where it goes. Underneath there you'll see a folder associated with every computer that you're backing up onto that backup disk. In this case, I'd just be backing up my own Mac Pro, but say I were in a small home, I might have 2 or 3 other machines all being backed up to that same disk, and here's where you would see the content for each of those individual machines. Underneath that you'll see the actual library of backups for that machine. So, each of these 3 folders underneath Eric's Mac Pro is a backup.
The name of each of the folders corresponds to the time that backup was completed. So, for instance, the middle one, the highlighted one, would be a backup completed on June 12th at noon. And finally, if you look inside one of those backups, you'll see the disks that you were backing up on that date and time. So, once you get down to this level, you can browse those disks as though you were browsing them as a mounted live volume in the Finder.
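As an illustration of the layout just described (disk, then Backups.backupdb, then one folder per machine, then one folder per backup, then one folder per backed-up disk), here is a minimal Python sketch that walks such a tree. The function name `list_snapshots` is made up for this example, and sorting the backup folders by name is only an assumption about how their names order; this is not how backupd itself reads the store.

```python
from pathlib import Path


def list_snapshots(backup_disk):
    """Map each machine folder to its backups and the disks inside each one.

    Returns {machine_name: [(backup_folder_name, [disk_names]), ...]}.
    """
    db = Path(backup_disk) / "Backups.backupdb"
    result = {}
    for machine in sorted(p for p in db.iterdir() if p.is_dir()):
        backups = []
        # Assumption: sorting by folder name gives a sensible order.
        for backup in sorted(p for p in machine.iterdir() if p.is_dir()):
            disks = sorted(d.name for d in backup.iterdir() if d.is_dir())
            backups.append((backup.name, disks))
        result[machine.name] = backups
    return result
```

Calling `list_snapshots("/Volumes/Backup")` (a hypothetical mount point) would return one entry per machine backed up to that disk.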
Here we go. There are a couple of special cases, one of which is FileVault accounts that you are backing up. As Robert mentioned, we added support for backing up FileVault accounts since last WWDC, and if you look for FileVault backup content, it will all be in a folder called FileVault Accounts at the top level of your backups database, or your machine database. And inside of there you'll see an encrypted disk image for each user you are backing up.
As I think Robert mentioned, we can only back up FileVault accounts when the user is logged in, because that's the only time that the disk image is mounted. But if you go and look at that disk image, you'll see a Backups.backupdb directory at the top level and then each of the backup snapshots for that user. Keep in mind there won't necessarily be a one-to-one correspondence of system backups to FileVault account backups, because of that requirement that the users be logged in.
And finally, if you are backing up to a network location: we require HFS+ to back up, as I've said a couple times already, but we want people to be able to back up over the network. So the way we do this is to create disk images at the top level of the network volume, which are themselves formatted HFS+, and which we can write into as though they were local volumes, as though we had a sandbox.
So, we support AFP, for instance, if you have a fancy new AirPort Extreme and you can plug in a disk, or SMB, or NFS, and in that case, yeah, it's pretty simple, top level of the volume you'll see a disk image, encrypted or not according to your preference, and if you mount that disk image it will look exactly like the previous two cases.
All right. So now I'm going to talk about some of the publicly accessible core technologies that backupd uses in order to get backups done and get them done quickly. There are 4 things that I want to talk about, sorry, 5. The first is Extended Attributes. Extended Attributes are not new to Leopard, they were added in Tiger to HFS+, and they allow you to apply key/value pairs to file system objects, files and directories, which essentially lets you treat them, at least from my perspective, as though they were programmatic objects.
We use the file system extremely heavily in Time Machine, and so we need to store lots of data on files, and this is the way we do it. Next, Access Control Lists, ACLs, again, added in Tiger. But if you don't know about them, you might be curious. This is how we protect your backups against accidental tampering.
Next the FSCopyObject API, the final feature I'll talk about added in Tiger. It's been enhanced a whole bunch in Leopard and this is a fantastic Copy API. If you need to copy things, whether or not you're doing it from a backup, I highly recommend FSCopyObject. It's super fast, it's robust, and it's used by all sorts of applications on the system.
Time Machine uses it to copy everything, Migration uses it to copy out of backups, and Finder uses it to copy everything whenever it makes a copy. New to Leopard is a feature called Archive Directory Links, added to HFS+, which essentially gives us the ability to hard link a folder, which is how we represent an unchanged hierarchy of files and directories very, very quickly.
And the FSEvents API. So there's a session on the FSEvents API, and for a whole lot of detail I recommend you go to that. It's in Pacific Heights at 3:30, I think it's called Introduction to FSEvents. FSEvents is a notification mechanism for file system events. Whenever you change a file on disk, you can not only be notified live of that change happening, but in Time Machine's case, we wake up at some point and we want to say, what's happened on these two disks over the last week, which is the last time we ran a backup? And we'll get a list of all the directories that we care about without needing to scan them all individually.
Looking a little bit closer at each of these features, first, Extended Attributes. As I mentioned, they're just key/value pairs that you store on file system objects. And the thing that we really like about them is that, since it's a file system feature, obviously they're stored persistently across reboots, and they let us query whatever information we need to about the backups that we have. For instance, here are some things we use them for. One is that when we go to make a backup, we want to figure out what the last backup was very quickly.
We do this by remembering an attribute on each of the backups saying what date it was completed, so we look up the folder that has the most recent completion date, and that's our most recent backup. And finally, if you look at the directories associated with each volume you're backing up, you'll see a bunch of Extended Attributes on them. Oh, I should mention that these are certainly not things I would recommend relying on existing here. I'm just giving you a couple of examples of what we use them for.
But, right, if you look at the directory containing a backup of a volume, you'll see a volume UUID. Every HFS volume has a UUID associated with it, which ends up being super useful for our case, because when we have a backup of your volume, it might have a certain name, but if you rename your volume we don't want that to force us to make a whole new backup of that volume again. So, rather than relying on the name of your volume, we rely on the UUID not changing.
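The two lookups described above (find the backup with the latest completion date, then match a live volume to its backup folder by UUID rather than by name) can be sketched as follows. This is only a model: plain Python dictionaries stand in for the extended attributes, and the key names `completed`, `volumes`, and `uuid` are invented for this example rather than taken from Time Machine.

```python
from datetime import datetime


def most_recent_backup(backups):
    """Pick the backup whose recorded completion date is latest."""
    return max(backups, key=lambda b: b["completed"])


def find_volume_backup(backup, live_uuid):
    """Find the folder for a live volume by UUID, ignoring its current name."""
    for folder_name, attrs in backup["volumes"].items():
        if attrs["uuid"] == live_uuid:
            return folder_name
    return None  # no previous backup of this volume


# Hypothetical data: dictionaries stand in for attributes on backup folders.
backups = [
    {"completed": datetime(2007, 6, 12, 11, 0),
     "volumes": {"Macintosh HD": {"uuid": "AAAA-1111"}}},
    {"completed": datetime(2007, 6, 12, 12, 0),
     "volumes": {"Macintosh HD": {"uuid": "AAAA-1111"}}},
]

latest = most_recent_backup(backups)
# The live volume may since have been renamed; the UUID still matches.
previous_folder = find_volume_backup(latest, "AAAA-1111")
```

Because the match is on the UUID, a renamed volume still resolves to its existing backup folder and does not trigger a full new backup.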
Archive Directory Links. These guys are new to Leopard, as I mentioned, and if you know what a hard link is, these guys are hard links to folders. So, what they do for us is they let us super quickly represent a directory of files and folders that could be N levels deep, in a single operation. The thing that's very nice about them is that, since they're implemented in the file system, they're browsable seamlessly through Finder and through Terminal. If you go and look at them, they look exactly like normal directories.
And they degrade to aliases on Tiger systems. So you can take your backup volume, plug it into a Tiger system and still navigate it and view your backup content on it. If you don't know what a hard link is I will just very quickly explain what it is. So, normally you have a file which gives you a reference to some blocks on your disk and its link count is originally 1.
So if you create a hard link to that file, you end up with another reference to those actual blocks on disk. This would be as opposed to a symlink or an alias; a symlink would essentially be a small file that just includes the path to the original file. Instead you actually have another reference to those blocks on disk, which lets you do the neat trick of deleting one of them and still being able to access that content on disk through the other hard link.
And finally, when you get rid of the last hard link, there go the actual blocks on disk and they're freed to be reclaimed. So this is the basic principle that we rely on in order to compress our backup content and keep it small. In the case of an Archive Directory Link: let's say you are backing up this one directory and nothing changed inside it. A lot of common backup schemes that involve hard links will hard link each of the files underneath that directory, which requires you to make N hard links for N files underneath that top level. What we do in Time Machine is replace those three links, just to make it painfully obvious, with a single hard link to the top level of that volume.
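The plain file hard-link behavior described above can be tried out with a short Python sketch. It uses only standard library calls (`os.link`, `os.stat`, `os.remove`); the file names are arbitrary. Hard links to directories are not something ordinary programs can create this way, so the sketch covers file links only.

```python
import os


def demo_hard_link(workdir):
    """Create a file, hard link it, delete the original, read via the link."""
    original = os.path.join(workdir, "report.txt")
    link = os.path.join(workdir, "report-link.txt")

    with open(original, "w") as f:
        f.write("quarterly numbers")

    count_before = os.stat(original).st_nlink  # link count of a fresh file
    os.link(original, link)                    # second name, same blocks
    count_after = os.stat(original).st_nlink
    same_file = os.stat(original).st_ino == os.stat(link).st_ino

    os.remove(original)                        # drop one of the two names
    with open(link) as f:
        surviving = f.read()                   # content is still reachable

    return count_before, count_after, same_file, surviving
```

The content only becomes reclaimable once the last remaining name is removed as well.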
Access Control Lists. These guys were added in Tiger, again, but I'm not sure how many people actually use them, and they're super useful. We use them in Time Machine for essentially two reasons. First of all, they can be applied on top of and in coordination with POSIX permissions and other ACLs. So when you copy something into the backup store, we want to be able to protect it. We don't want someone to accidentally browse through their backups, not realize they're inside their backups, and open something up, modify it, delete it, and whatnot.
So we want to protect the content of the backups. So we apply these ACLs to them that tell people, what do we have on there, don't add anything inside here, don't delete it, don't modify it, etc. And the nice thing is, since we're not doing that through POSIX permissions, when you go and restore that content later we don't have to worry about it being out of sync with what you had originally put in your backup content. All we have to do, and we do this through the FSCopyObject API, is strip off the ACLs that were put on there as they were copied into the backup store.
There we go. And finally the FSEvents API. Again, I highly recommend going to the session on it. The FSEvents API, managed by fseventsd, is the guy that tracks the list of events, the event history that's happened on all the drives that you are backing up. So you can get events live, you can get events historical, and this is how Time Machine very quickly figures out what it is that we need to back up.
Now I wanted to mention one thing about this that I think illuminates the greatness of this feature, which is that there are situations where you might not have an FSEvent record. For instance, let's say I take my backup disk, remove it from Leopard, plug it into a Tiger machine.
So, we don't even have FSEvents running on Tiger, and so what happens when I go and make changes there? Well, nobody knows about those changes, and when I go and plug it back into the Leopard machine and make another backup, I wouldn't want to have just a bunch of stuff that I had done that Time Machine knows nothing about. So, the FSEvents framework is able to tell me that that drive was modified elsewhere and, sorry, it can't tell me what those changes were.
That requires Time Machine to have this separate fallback path that actually does the manual scanning of your previous backup along with the entire contents of all your disks being backed up. And so, if you want to see how much faster you get to be when you take advantage of the historical event record, give that a try and then press backup, and you'll see that it takes quite a long time, on the order of 5 to 10 minutes, to scan a very large disk.
That speaks volumes about the effectiveness of FSEvents. Now, we'll put those features together and briefly discuss how it is we go about making a backup. The first backup is pretty simple. There's not too much trickery you can do with that. We basically take all your disks and we are going to copy everything from disk A to backup A, obviously filtering out the directories Robert mentioned which have your temporary stuff, your cached stuff.
And all the copying happens through the FSCopyObject API which takes care of slapping the ACLs on all of your backup content so that we don't let people modify it later. The more interesting case is an incremental backup where the first thing we do is we wake up, we find the most recent backup that we have, and we query the FSEvents daemon. We say what's changed in my backup structure.
Let's say I have 4 items that I'm backing up, 4 folders. The FSEvents API might say folders B and D changed. We then know that those are the only two places that we have to look for any other changes, so we'll scan inside B, we'll scan inside D, we will copy whatever it is that we have to into the new backup.
Finally we perform 2 other operations to say hard link A, hard link C. These could be directories, these could be enormous directories like /System, and that's it and we're done making a backup. Except, there are a couple other things we do. One is that we want to guarantee that at the point that we finish a backup, you have a coherent piece of information that says this is exactly what my disk looked like at this given point in time.
So once we've finished a backup we do this extra pass where we ask the FSEvents API, did anything change during the course of this backup that we actually care about? And if so, we'll do another pass, making another backup of that smaller set of content, and the nice thing is this will eventually catch up quite quickly, because as we complete backups, there is less stuff that's changed and so we have fewer things we have to look at and we finish even quicker. And finally, the last thing we do is put a bunch of attributes on the backup that we have, erase all the ones that we made in the middle, and make that your new backup. So, I am going to make a quick demo for you.
This is largely because Deric, in his desire to show off HFS+ back in the day, always really wanted the demo copying a file. So now I am going to, in fact, demo copying a file. Okay. So, here's the machine I've been looking at earlier. Let's just do something to generate some kind of file system activity. I'll duplicate the presentation, I believe this is the presentation from 2008 that you saw earlier. We open up the Time Machine preference. (Laughter) Not that copy, the next one. Oh no. What's happening? Huh.
(Applause). Let's see if I can give this one more shot. Go away. Work this time, okay? Oh, man. Okay. I guess copying a file will, in fact, have to wait for yet another year. (Laughter) Trust me, it's very cool though. But anyway, what you would have seen had I been able to actually make a backup is that despite the fact that we're backing up an entire system and there are, what is it? As of a couple days ago when I took a look, there was something like 92,000 files and 400,000, sorry, 400,000 files, 92,000 directories in a typical install. Oh, could I have the slides back? Oh. Well, I hear that it's not spinning anymore. So, could I have the demo back one more time? Sorry. Yes. All right. Okay. If it spins again, it's your fault, Robert. All right. (Applause) There we go.
So, again, this disk has about 92,000 directories, about 400,000 files, and the only thing I actually changed was that one file on my desktop. And hopefully, through the magic of FSEvents, we see that we have already figured out what it is that we need to back up. Now we're copying those files, the ones that were changed.
So it's probably taking a little bit longer because there's a bunch of stuff that actually changes in the background just in the course of keeping your machine up and running. And so it has to come back and get all of that. So we've just made a backup.
(Applause)
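The incremental pass described before the demo (copy the top-level items reported as changed, hard link the unchanged ones against the previous backup) can be sketched in Python as follows. This is a simplified model, not backupd's implementation: the set of changed names is passed in by the caller instead of coming from FSEvents, a changed item is copied wholesale instead of being scanned file by file, symlinks are ignored, and an unchanged folder is hard linked file by file because ordinary programs cannot hard link a directory. The function names are made up for this example.

```python
import os
import shutil


def link_tree(src, dst):
    """Recreate src under dst, hard linking every file.

    Stand-in for a single Archive Directory Link to the whole folder.
    """
    os.makedirs(dst)
    for entry in os.scandir(src):
        target = os.path.join(dst, entry.name)
        if entry.is_dir(follow_symlinks=False):
            link_tree(entry.path, target)
        else:
            os.link(entry.path, target)


def incremental_backup(source, previous, new, changed):
    """Build backup `new` from `source`, reusing `previous` where possible.

    `changed` is the set of top-level names reported as modified.
    All three directories must live on the same file system.
    """
    os.makedirs(new)
    for name in sorted(os.listdir(source)):
        live = os.path.join(source, name)
        old = os.path.join(previous, name)
        dest = os.path.join(new, name)
        if name in changed or not os.path.exists(old):
            # Changed or brand new: take a fresh copy from the live disk.
            if os.path.isdir(live):
                shutil.copytree(live, dest)
            else:
                shutil.copy2(live, dest)
        elif os.path.isdir(old):
            link_tree(old, dest)   # unchanged folder: reuse previous backup
        else:
            os.link(old, dest)     # unchanged file: reuse previous backup
```

With four top-level folders A through D and `changed={"B", "D"}`, only B and D are copied; A and C in the new backup share their file content with the previous one.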
And the final thing I'd like to talk about is how it is that you as app developers and tool developers can work with Time Machine. So, lots of ways you can do it. One: if you have any interest in copying things out of the backup store, you can totally do this. This is just fine, but you have to use the FSCopyObject API.
That's actually a good thing because the FSCopyObject API is really, really good. And it's really well tested, and it's been improved a whole bunch in Leopard simply because Time Machine copies an enormous amount of stuff and we've been working very closely with them to make it as good as it possibly can be.
It's extremely fast, it's robust, it knows about all the stuff that I've talked about. It knows about the Archive Directory Links, it won't have a problem copying them. It knows about the ACLs because it put them on in the first place and it can strip them off on the way out, and it's used by all sorts of applications throughout the system.
Secondly, say you're an app developer and you use the file system. Let's say you have documents that you create or you write a bunch of files. Here are a bunch of things that you can do. There are 3 that I'd like to talk about. One is, if you write a lot of files or you write big files, exclude the ones that you don't need to have backed up. Second, try not to have directories that have huge numbers of files in a flat directory structure. That just makes backup a little bit slow.
It's not tragic if you do it, but it will slow things down. And finally, if you have really big files that change constantly, try and avoid those, because that will eat people's disk space very quickly. And most important, as Robert and Deric both mentioned, please support Quick Look if you have document types.
So, if you're excluding stuff, there are things that you should exclude: obviously temporary files, cache files, index files, anything you can rebuild. The easiest way to get this stuff excluded is just to put it in the standard directories for these sorts of things. If you do that, Time Machine has logic built in where it will not go ahead and back up anything that you're putting in your /tmp directories or your cache directories, so that's probably the easiest way. But if that doesn't make sense in your particular case, we have some API for you. There are 2 versions of this API.
CSBackupSetItemExcluded. What you do is you call this on a path, and depending on the value you're giving to excludeByPath, either Time Machine will remember that specific path and never back up whatever happens to live there at that time, or, if this is an item that you don't want to back up that might move across the file system, pass false to that, and we'll set an extended attribute on this specific item that you don't want to back up, and we'll track that wherever it goes and make sure that it never gets backed up.
Thirdly, if you're using the file system heavily, again, try not to create really, really big files that need to change all the time. The problem that we have with them is that we're making backups every hour, and when we do that and you have a several gigabyte file, unless people have enormous hard drives we are going to fill those up pretty quickly. The canonical example of something to avoid is an enormous file, most of which is static, but where you have some indexable content that you are constantly updating in that file.
And finally, yeah, if you have a single folder that has a huge number of files inside it, this is going to slow backup down, because, as I mentioned a couple of times, we're creating these Archive Directory Links, and one of our really major performance advantages is being able to skip these really big directories as quickly as possible.
But if you have a very flat structure and a single thing changes in it, so you have a 10,000 file folder and one file in it changes, we can't create a directory link to that. We need to scan the entire contents of that folder, we need to scan the entire contents of your previous version of that folder, and make 9,999 hard links just so we can copy over a new version of one file.
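The arithmetic behind that example can be checked with a small counting sketch in Python. It models a folder as a nested dictionary and assumes one link operation per unchanged sibling, whether that sibling is a file or an entire folder, which matches the description above but is a simplification; the function name `link_operations` is made up for this example.

```python
def link_operations(tree, changed_path):
    """Count link operations needed when exactly one file has changed.

    tree: nested dict; a dict value is a folder, anything else is a file.
    changed_path: list of names from the top folder down to the changed file.
    """
    name, rest = changed_path[0], changed_path[1:]
    ops = 0
    for child, value in tree.items():
        if child != name:
            ops += 1  # unchanged sibling: one link, file or whole folder
        elif rest:
            ops += link_operations(value, rest)  # descend toward the change
        # otherwise this is the changed file itself: copied, not linked
    return ops


# Flat layout: 10,000 files in one folder.
flat = {f"file{i}": None for i in range(10_000)}

# Nested layout: the same 10,000 files spread over 100 subfolders.
nested = {
    f"dir{d}": {f"file{i}": None for i in range(100)}
    for d in range(100)
}

flat_ops = link_operations(flat, ["file0"])              # 9,999 links
nested_ops = link_operations(nested, ["dir0", "file0"])  # 99 + 99 = 198 links
```

Under this model, spreading the same files over subfolders cuts the work for a one-file change from 9,999 links to 198.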
So, again, just slows us down a little bit, it's not tragic, but if you can avoid it, that would be fantastic. And, finally, again, support Quick Look. If you have your own document style this is the best way to look fantastic in Time Machine, and they are pretty easy to write. So, that's all that I have for you. And, thank you very much, we're going to move on to Q&A now.
(Applause)