OS Foundations • 1:01:07
If you read or write lots of files, or care about the precise way your data is laid out on disk, you'll want to attend this session. This is the best place to find out about the latest features of Mac OS X's filesystem architecture. We will cover change notification, 64-bit inodes, extended attributes, ACLs, "safe save", and the other APIs, formats, and structures you need to know to make optimal use of Mac OS X's many filesystems.
Speakers: Deric Horn, Dominic Giampaolo
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
[Deric Horn]
Good afternoon. Welcome to session 415, What's New in the File System? I'm Deric Horn, the Application Frameworks and File Systems Evangelist, and I'll be joined a little bit later by Dominic Giampaolo, File Systems Engineer. So today I'd like to start off by going over a little recap of some of the important features we introduced in Tiger. Those include the copy engine, extended attributes, introduction of access control lists. Then we're going to talk about some of the really important new features that we're introducing today in Leopard, things like safe save, move to trash functionality. And then Dominic's going to come up on stage and talk about file system event notification. This is a very heavily requested feature from you, the developers, and it's really a very elegant API. And we'll have plenty of time for Q&A afterwards.
So I'd like to start off with a little file systems architecture diagram. Here on the upper left, we see a typical Cocoa application. Cocoa applications, being object-oriented, call down into NSFileManager, which offers object-oriented APIs. In turn, NSFileManager calls directly down to the BSD layer and also down to the Core Services File Manager. I just want to take a minute and talk a little bit about the Core Services File Manager. The Core Services File Manager is what we used to call the File Manager, and it contains those familiar APIs: the FSRef-based APIs, the old FSSpec-based APIs, the APIs located in Files.h.
So the Core Services File Manager offers an abstraction layer above the BSD APIs and the volume-format discrepancies underneath, like how resource forks are stored and how extended attributes are stored, so we don't have to bother with that. We use one familiar API. And it's really a thin layer right above the BSD APIs, actually located in the CarbonCore framework. So obviously, we have Carbon applications, which call down directly to the Core Services File Manager.
And then on the right-hand side, command line utilities. Typical command line utilities will call directly down to the BSD APIs. But we often find that command line utilities also like to take advantage of the abstraction that the Core Services File Manager offers, so they also call down into the Core Services File Manager.
So I've got a long history with the file system. I started out on the development side back in the HFS+ days, moved over to the file system work on the Carbon side, and then moved over to Developer Relations, where I got to work with a lot of you. And by far, I'd say the most requested feature from you developers has been "give me access to the same copy engine that the Finder uses." Well, in Tiger, we introduced the FSCopyObject API. This is a high-performance API, which is tuned for performance across all volume formats. And it's the same copy engine that the Finder uses.
During its copy, it manages copying all the metadata for you: the data fork, resource fork, extended attributes, and so forth. And it comes in those four familiar sets of APIs: the FSRef-based synchronous and asynchronous APIs, as well as the path-based synchronous and asynchronous APIs. So we have FSCopyObjectSync for the FSRef-based synchronous version, versus FSPathCopyObjectAsync for the path-based asynchronous version.
So here's a little snippet of how to use the FSPathCopyObjectSync API, and underneath it, its counterpart, the FSPathMoveObjectSync API. I'm not going to go into the code, but periodically you'll see these yellow boxes at the bottom of the slides, and they will detail where you can find examples. So if you look in the WWDC attendees site, in the area associated with session 415, you should find a sample called FSFileOperation. And this is going to have details of the implementation of the copy object APIs.
So extended attributes. Extended attributes are really a great way of kind of associating some metadata with any file or directory. So if you want to leave a little breadcrumb with any file, whether you own it or not, and just attach it to a file, you can do this with extended attributes.
It's supported across all volume formats. And we typically recommend that you use the reverse-DNS naming convention. So if you were to look at your file system today, you'd notice a lot of files have the com.apple.FinderInfo extended attribute. And that's going to contain just what you'd expect: things like the type, creator, and maybe label information that the Finder is going to save.
I also want to mention the ctime that you get back from stat. When you stat a file, it will return the ctime, and I just want to clarify a little bit what this is. It actually represents the status change time. So the last time you changed any attribute on the file, the ctime was updated. For instance, if you're writing a backup application, you might want to compare the content modification date to the ctime to see if you can avoid backing up the actual contents of the file, the actual data portion. It may be sufficient to back up just the attributes, extended attributes, and other metadata associated with that file.
But you should be aware of the current limitations of extended attributes. In general, extended attributes are going to be under 4K; I believe on HFS+ volumes it's 3,802 bytes. I think that's been drilled into me now. It's set up so that you're going to read an entire extended attribute in one fell swoop. It's not a stream-based call: you're going to allocate a buffer and read it all in at one time. And likewise, when you write an extended attribute, you write it all out at once. Now, I said this is kind of a generalization. The one exception is the resource fork. The resource fork you can access through extended attributes, and at this time, you can access it with kind of a stream-based API. What I really mean by that is you can set the file position to, say, halfway through the resource fork and continue reading.
Whether or not you deal with extended attributes directly, you should always be mindful of them. Whether you're writing an FTP client, an archiver, or so forth, there are always three rules that you have to be aware of. And that is to be mindful of the data of the file, extended attributes of the file, and security information. So sometimes I wish all Apple engineers were mindful of these three rules.
So let's take a deeper look at the APIs. The APIs to access extended attributes are the getxattr, setxattr, listxattr, and removexattr APIs. These are all BSD-level APIs. It really shouldn't matter whether you're Carbon or Cocoa; you can still call BSD APIs directly. The method for getting the size of an extended attribute is quite easy: call getxattr and specify NULL for your buffer, at which time it will return the size of the extended attribute. You can then allocate your buffer to be that size, call getxattr again, passing in that buffer, and read in the entire attribute. Again, for samples of how to use the extended attribute APIs, we have the file system examples sample.
So, access control lists. Typical Unix permissions are what we all know of as owner/group/other read, write, and execute, or as I call them, user/group/world read, write, and execute. Very limited. A user isn't allowed to be in more than 16 groups, and we can't nest groups.
So in Tiger, we introduced access control lists, or ACLs. They provide much finer-grained security permissions for a file. So for instance, we can either allow or deny a user or group the ability to append to a file. We can either allow or deny someone the ability to read the ACL information on a file or directory.
So access control lists were introduced in Tiger, but we had them turned off by default. Now in Leopard, we're turning them on by default. There are a few tools that you can use to access your access control information. A typical tool is ls: if you specify the -e option, it'll print out the access control list information associated with each file or directory. Likewise, chmod with the +a option is a great way of adding ACL entries that allow or deny access for users or groups.
You should also be aware, if you're writing your application now and you want to make use of access control lists on Tiger, that you can use the fsaclctl tool to turn access control lists on or off for a volume. As I said, in Tiger they're turned off by default; use the fsaclctl tool, and you can turn them back on on that Tiger volume.
Again, these are BSD-level APIs. And I wanted to mention that they're used heavily on the .Mac servers. So this was really a big push, to control access information and permissions for users and groups on the .Mac servers that we all use. And now in Leopard, we have full Finder integration. So when you do a Get Info on a file, you're going to see all the ACL information, more detailed ACL information. Again, you can find samples of this in the file system examples sample.
OK, now on to some of the fantastic new features that we're introducing today for Leopard. First one: safe save. We're all pretty familiar with how this works, right? You have your original file. You write out a new file. Once it's coherent, we want to replace the original with our new file. Up until now, I would say it's been almost a burden to implement safe save for yourself, given all the different volume formats and the details of the underlying formats and which options they support and which ones they don't. So now we're releasing a higher-level API, FSReplaceObject, available through the Core Services APIs in Files.h. It's supported on all file systems, and it's largely based on rename semantics. It's designed to properly preserve all your metadata.
By and large, in the past, maybe on Mac OS 9, we used something like FSExchangeObjects. And this was a great way to do safe save. But the big problem on Mac OS X is that this was really limiting. It only worked on files, and it was really only designed to work on HFS+ type volumes. So we weren't able to use it on NFS volumes, for instance. Releasing a new API gives us the opportunity to do what's best on each particular volume format. So for instance, on HFS+ volumes, we can go ahead and call exchangedata() under the hood if that's the best way to do it.
I want to mention that now in Leopard, if you're a Cocoa developer, it's used by default in NSDocument. So in the case of a catastrophe while you're saving a file using NSDocument, you may notice that a file gets created called "(A Document Being Saved By TextEdit)", in brackets. This would be your original file. Maybe while you're writing it out, the network cable gets pulled, the power goes out, something along those lines. You can get more information on that in the AppKit release notes; specifically, there's a section in there called "Advice for Overriders of NSDocument Reading and Writing Methods".
So the FSReplaceObject API is a Mac OS X API, which does what we'd really expect from a Mac OS X API. What I mean by that is we can not only replace files with files or directories with directories, but we can replace files with directories or directories with files. This is especially useful if you've ever, say, edited a README file in TextEdit.
You're editing your README file, and then you decide to start pasting some pictures in there, and you notice that your file changes from README.rtf to README.rtfd. That file is actually a document package. So for instance, if you're saving a file to your iDisk in this case: you have your original file; go ahead and make your entire document package hierarchy and make everything coherent. Once you have the entire document hierarchy coherent, replace the original file with your document package. There can be only one.
What I mean by that is the way the API works by default: when you call FSReplaceObject, you have your original file and you have your new file. After FSReplaceObject is called, you end up with your new file in the original file's location, one file object when you're done. Of course, we have plenty of options to override the default behavior. "Save original as backup" is one of them, in which case you might want to specify a directory to save your original in as a backup. So we have an accessory routine called FSGetTemporaryDirectoryForReplaceObject. This is kind of a higher-level API, which might sit on top of FindFolder, and it will return the best place on a given volume to save your files as a backup. But it's not only a good API for finding the best place to save your backup. One of the things, as you can imagine about FSReplaceObject, is that your new file and your original file have to be on the same volume. We can't have our new file on our local disk and our original file on our network disk and expect this to be one operation. So by calling FSGetTemporaryDirectoryForReplaceObject, you get back a suitable location on the same volume as the original file.
The default behavior of FSReplaceObject is to actually merge the extended attributes of the original file and the new file. In the case of a collision, priority is given to the new file's extended attributes. With ACLs, we take quite the opposite approach: we take the security information from the original file, move the new file into that location, and then apply it to the new file. Again, we have options to override both of these behaviors.
There is actually sample code for this as well. I believe the sample is called FSReplaceObjectSampleCode, found in the same location. Move to trash. OK, this is actually quite a complicated problem. When someone wants to move a file to the appropriate trash, it's hard to know what the appropriate trash is, right? We have ~/.Trash, the per-volume .Trashes directories, Network Trash Folders, I believe. And some volumes don't even have trash cans. So typically what developers have done in the past is maybe create an AppleScript that says something like "tell application "Finder" to move fileObject to trash". This is really a suboptimal approach for many reasons. Once the Finder does this, you have no idea where the file went. You have no idea if it succeeded or if the Finder is presenting a dialog in the background. So now in Leopard, we're introducing the FSMoveObjectToTrash API. It will move your file system object to the proper trash, and it will return a reference to the moved item. One of the reasons why it returns a reference to the moved item is that it can potentially rename that file system object en route to the trash in case of a name collision.
The FSMoveObjectToTrash API will return a -120 error, directory not found (dirNFErr), in the case where it cannot find a trash can. In this case, the Finder typically presents a dialog that says something like, "The item main.c will be deleted immediately. Are you sure you want to continue?" You can do the same thing.
Here's a list of other information that I wanted to get out for Leopard, kind of a grab bag of small tidbits here. In Leopard, we are finally moving to 64-bit file IDs. What this really means is that the Alias Manager will automatically resolve your 64-bit file IDs for you and so forth. Integration is already built into the Finder. And we will be releasing higher level APIs to gain access to the 64-bit file IDs. UFS. With the introduction of case-sensitive HFS+ or HFSx volumes, there is no longer a big reason to have UFS disks around. So starting in Leopard, we will allow you to read and write UFS disks, but we will no longer format disks as UFS.
Last year at WWDC, we announced the deprecation of non-thread-safe file systems. Through the KPI right now, there's a flag to specify whether or not your file system is thread-safe, and we jumped through hoops in the VFS layer to make sure that your file system did the right thing. Now, starting in Leopard, those non-thread-safe file systems are going to be unsupported.
And things that I know you would never do. Dot-underscore (._) files, right? I think most of you know what those are. Those are the AppleDouble files that we sometimes have to save the resource fork or extended attribute information in. We store them in ._ files, for instance, if you're saving to an NFS volume. And we've all seen these kind of lingering around.
In the past, a lot of this functionality was in the Core Services File Manager. So the way we did things was, for instance, when we touched the resource fork, we would write out the size of the resource fork into the header of the AppleDouble file, and then we'd go ahead and write out the entire resource fork. Well, as we move this functionality out of the Core Services File Manager and push it down into the kernel, we operate under different constraints. So now we're going to effectively write out the size information and the resource fork at the same time, when the file is actually closed.
But you shouldn't be accessing the ._ file directly anyway. The /rsrc suffix: if we have a file named foo on an HFS+ volume, for instance, we could always say cat foo/rsrc, and this would actually cat out the entire resource fork of the file foo. This was kind of a hack, as I mentioned, because it really only worked on certain volume formats. And now, since we've pushed a lot of this functionality down into the kernel, we're removing that back door, or that hack, anyway.
Removal of volfs. In the past, if you ran the fs_usage tool, you would notice a whole bunch of reads and writes going to the /.vol directory, and you'd typically see a file ID after that. We're starting to close the hole on volfs, so you're no longer going to see that. We have a great tech note, TN1113, describing /.vol and volfs. And one of the notes in there says: under no circumstances should your application construct paths with /.vol. That still applies. At this time, I would like to welcome Dominic Giampaolo.
All right, I'm on. I'm Dominic Giampaolo. I'm a file system engineer, and I'm here to talk about the File System Events framework. First, I'll go over the agenda of what we're going to cover today. I'm going to start with some history and background about what the problem is, or how we see it, what's difficult about it, and what we were trying to do with the FSEvents framework. I'll introduce the FSEvents framework API itself. And of course, since this is a developer conference, we're going to go through some code and an example application that uses the FSEvents framework, and then review and wrap up.
What is the problem? Well, there's sort of two classes of problems that we see or we get a lot of requests for. Did anything change in this hierarchy? So you have an application like iPhoto or iTunes, and it manages some large hierarchy. Did something change underneath of there? You have other applications where they're running live, including something like iPhoto or iTunes or the Finder. Is anything changing in this hierarchy? So there may be some window that's open that represents some current file system state, and you want to know, is anything currently changing underneath there because I need to update my display? So those are actually really difficult questions. And historically, there has been no good answer to those questions. We also see that different applications want different kinds of answers. What changed differs depending on what type of application you are.
So what exactly is a file system event? At a low level, a file system event is a create, a rename, a delete. It's other more subtle things, such as an update to the mod time or the ownership or permissions of a file. It's when a file changes. It's really quite a lot of low-level events. At a high level, an event is something like, well, I saved a document, right? From my perspective, that's one thing. As a user, I hit Save, and there's a file there now. In reality, when you change a file and you do a safe save, as Deric alluded to, there may be a whole bunch of operations that go on under the hood.
And in practice, if you've ever looked at it from the underside, as I'm frequently doing, you'll see that a file save operation from an application may be 5, 10, or even 20 actual events that come through at a low level. The raw stream of events that we see generated is really difficult to manage. It's a big fire hose, and it's quite complex. Now, clearly everybody hasn't just sat around for 20 years waiting for us to introduce the FSEvents framework. They've come up with other solutions.
At one level, you have polling. Did this file change? Did this file change? Did this file change? And so on. Or a full rescan. If you have a backup application, typically it walks the whole file system hierarchy to find out what is different and what needs to be backed up.
Core Services introduced the FNSubscribe and FNNotify APIs. These are opt-in APIs where you can say, I would like to subscribe to notifications about changes to this hierarchy or this file, and other applications can call FNNotify to say, I changed this. That's great, but obviously, as an opt-in kind of thing, people can either forget to do it or choose not to do it, and you won't be notified. So it's good if everybody plays by the rules, but not everybody does. Typically, you want something a little bit more comprehensive. In Panther, I believe it was, we introduced the BSD mechanism called kqueues. This allows you to watch a file or directory: if you have a file descriptor for a file or directory, you can say, alert me of changes to this. Now, this is a step in the right direction, because it is event-driven. You don't have to poll; you can just say, tell me when something happens. Then in Tiger, the kauth subsystem was introduced. And this is a pretty big hammer. It allows you to essentially interpose on every single file system operation and authorize or deny it based on whatever criteria you want, or simply observe that it happened. Now, what are the problems and issues with all this? Well, clearly, in the polling case or the full rescan case, it's slow. Walking the file system hierarchy of a standard Mac OS installation can involve touching hundreds of thousands of files, literally. And there's no way to watch an entire hierarchy. While kqueues give you an event-driven mechanism, it's just for a single file or directory for which you have a file descriptor. If you wanted to watch a whole deep hierarchy, it doesn't provide any support for doing that.
There's no history of events. You can't say, well, this backup application, or this synchronization app or what have you, hasn't run for a week; what changed in the last week? So all of these things were live-only, and not having any history was a bit of a problem. And the kauth subsystem is extremely powerful, but kernel-level monitoring is not practical for most applications. Typically, people are not going to write a kext to go along with their user-level app.
Something else that appeared in Tiger was called /dev/fsevents. Well, we didn't really introduce it, but a number of you found it anyway. It was implemented for Spotlight specifically, and it provides a raw stream of events to support the functionality that Spotlight needs. It's all locally generated events: any change to any file system that's mounted locally, or any change that's produced by a local application, produces an event.
The events are very raw. As I said, you see everything that comes out of the kernel. So a stat change, a permission change (well, a permission change is a stat change), an ownership change. When a file is closed after being written, you get a modification. Renames, deletes. It's everything. Because it's a kernel-based mechanism, it's also sensitive to slow clients.
For some reasonably complicated reasons, it has limited buffer space in the kernel, and that space is shared by all the clients. If someone is slow at reading events, things back up, eventually the buffer runs out of space, and events are lost, which is fairly catastrophic. So because it's sensitive to slow clients, it's not really appropriate to open it up to general-purpose applications. Besides, with the event stream being as raw as it is, it's not that useful either. With all these problems, we clearly had an opportunity on our hands.
So we started thinking about it, and we decided to come up with the FSEvents framework. I'd like to talk a little bit about our design rationale so you understand where we're coming from. We thought about some of the different clients. Clearly, as you can see from the Leopard preview, backup is important. Sync is another application, and of course, the venerable Finder. Backup and sync will ask the question, did anything change in this hierarchy? The Finder is more of an online, live, is-anything-changing-in-this-hierarchy type of application. Clients we know that we aren't going to satisfy are virus checkers. Virus checkers really have very sophisticated requirements that can't be met with this API, and the kauth subsystem is actually more appropriate. So we're not trying to be all things to all people. We also knew that we had to put some limitations and constraints on the problem to make it a little bit more tractable.
Storing a complete log of events is just not possible. Your computer is designed to run user applications and to do things, not to sit around storing a record of all the work you're trying to do and interfering with that work in the process. So we have to filter the event stream down to something manageable, something digestible that makes sense to applications. If you look at the raw event stream, it's extremely complicated to understand that those 20 operations were actually just a safe save of a single file. And as I said, we can't be all things to all people, so we had to narrow things down a bit. So without further ado, introducing the FSEvents framework. This is a Core Foundation-based API that lets you watch an entire file hierarchy for changes. You get directory-level notifications of changes that happen within that hierarchy. So if you say, I want to watch the user's home directory (and I'll go into more detail on this), you'll find out the specific locations that have changed within it. We offer a persistent change history. And again, I'm going to go into more detail about all of these. And we have fine control over the frequency of updates that you get. What don't you get? You don't get events for changes to specific files. I'll talk about this more in the later slides, but you can't say, did file foo change?
What does it look like architecturally? This sort of helps you understand what's going on under the covers. At the top level, FSEvents is obviously the framework, the API, that you as developers can call. There's a corresponding daemon, fseventsd, that orchestrates things behind the scenes. fseventsd is based on the /dev/fsevents device that provides the raw stream of events from the kernel. fseventsd filters the event stream into something that is coherent, sends the updates through a Mach message to client applications, and it also keeps the historical records.
Actually, I should go back. So as you can see, we have the event history that's being stored away on disk. fseventsd is reading the events out of the kernel and passing them over to the FSEvents framework, which in turn hooks into the Core Foundation-based API. So what are the concepts behind the FSEvents framework?
You can monitor a path in the file system namespace, and you get only events that happen beneath that path. So yes, you can watch /, and you'll see all the directories that have changes made to them anywhere in the file system. All events that you get have a corresponding event ID. Event IDs are 64-bit, so they never get recycled, at least not for, let's say, about 5,000 years. You can ask for all the events since a specific event ID. As I said, event IDs are persistent, so they last across a reboot.
When you get an event, you're supposed to use it to figure out what happened that's of interest to you, and then, of course, do the right thing. The path that you watch doesn't actually have to exist. It's really just a string-based prefix match, so you can watch something that doesn't exist.
And when it does come into existence, you'll start receiving events for the changes that happen under that path. For now, the path has to be an absolute path; even if in the future we allow you to specify a relative path, we're just going to turn it into an absolute path. As I said, you get events for anything that changes under the path. You can watch any number of paths that you want, within reason, of course. And security: right now there is none, but obviously there will be, so you're not going to be able to watch things that you don't have permission to see.
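Putting these concepts together, here is a minimal sketch of setting up a watch. This is hedged: it's based on the FSEvents C API as it appears in the Leopard headers (FSEventStreamCreate and friends), it only builds on Mac OS X, and the watched path, latency, and callback body are placeholders.

```c
#include <stdio.h>
#include <CoreServices/CoreServices.h>

/* Called with one changed-directory path per event. */
static void callback(ConstFSEventStreamRef stream, void *info,
                     size_t numEvents, void *eventPaths,
                     const FSEventStreamEventFlags flags[],
                     const FSEventStreamEventId ids[])
{
    char **paths = eventPaths;
    for (size_t i = 0; i < numEvents; i++)
        printf("change in %s (event id %llu)\n",
               paths[i], (unsigned long long)ids[i]);
}

int main(void)
{
    CFStringRef path = CFSTR("/Users/foo");     /* hypothetical watch path */
    CFArrayRef pathsToWatch =
        CFArrayCreate(NULL, (const void **)&path, 1, &kCFTypeArrayCallBacks);

    FSEventStreamRef stream = FSEventStreamCreate(
        NULL, callback, NULL, pathsToWatch,
        kFSEventStreamEventIdSinceNow,  /* or a saved event ID to replay history */
        1.0,                            /* latency in seconds */
        kFSEventStreamCreateFlagNone);

    FSEventStreamScheduleWithRunLoop(stream, CFRunLoopGetCurrent(),
                                     kCFRunLoopDefaultMode);
    FSEventStreamStart(stream);
    CFRunLoopRun();                     /* deliver events until interrupted */
    return 0;
}
```

Passing a previously saved event ID instead of kFSEventStreamEventIdSinceNow is what gets you the persistent history described above.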
Here are some specific examples. If you watch /Users/foo, and then someone creates the file /Users/foo/Documents/MyCoolStuff/NewFile, you get an event. The event that you get is for the directory /Users/foo/Documents/MyCoolStuff. If someone creates /tmp/ignoreme, you don't get any event; nothing comes through to you, because you're not watching that, you're only watching /Users/foo. And you don't find out about changes to the root of the path.
So for example, if you chose to watch a directory which didn't exist, let's say /Users/foo/somedir, and then someone creates it, you don't get an event for that. If they then create something inside of there, then you do get an event. Now, as it says, this is open to discussion, because we've kind of gone back and forth. Clearly it's something that's easy to work around: you just watch one level above for the thing that you're looking for. But it's something that we'd like to get feedback on.
The event history. The FSEvents framework, or more specifically fseventsd, stores all events. Yes, really, all events for all time, because currently I'm not deleting them. The event history, it turns out, is actually really small, and it compresses very well. What I see on my machine, typically, is that a day's worth of logs is at most 100K or something like that. So it just doesn't make sense to delete them. And it's persistent across reboots. So you could say, I haven't run for a month; here's my last event ID; tell me everything that's changed since then. This allows you to not have to worry about doing a full rescan. When you receive historical events, you will see the same directory potentially modified many times.
If you are asking for all events since a particular event ID for a user's home directory, and it's been a month since you last ran, well, the home directory has probably been modified more than once. So the events that you get are not unique. That's something that your application needs to take care of.
We also had a few assumptions, and I want to detail these so you understand where we're coming from. First off, clients that want to monitor some large file hierarchy have to have a mechanism to generate their initial state about the parts of that hierarchy they care about. So you have to have code that does a full scan and knows how to process that and build state. If you have code that does a full scan, it's pretty straightforward to turn it into code that does a partial scan, by saying, "just go update this particular directory." And given a directory that changed, you can scan that directory and update your internal state based on whatever differences you find that are interesting to you. So we have, like I said, those assumptions about what clients will have to do to maintain their state given the event notifications they receive.
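The partial-scan idea can be sketched concretely: given the one directory that changed, revisit just that level and recompute the piece of state it contributes. Here's a hedged, portable POSIX C sketch of the kind of helper the watcher demo below relies on (the name and the one-level-only behavior are my simplification, not the demo's exact code):

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Sum the sizes of the plain files directly inside `dir`.
   After an FSEvents notification, a client only needs to revisit
   the one directory that changed, not the whole hierarchy. */
long long directory_size_one_level(const char *dir) {
    DIR *d = opendir(dir);
    if (d == NULL) return -1;
    long long total = 0;
    struct dirent *ent;
    char path[4096];
    struct stat st;
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
        if (lstat(path, &st) == 0 && S_ISREG(st.st_mode))
            total += (long long)st.st_size;
    }
    closedir(d);
    return total;
}
```

The full-scan version is just this applied recursively; the event callback then calls the one-level version for each changed directory and patches the running total.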
Now we'll go to this demo machine and walk through a simple example that shows the raw event stream that comes out of the FSEvents API. We have this tool down here called fsevents, and it just prints out all the events that it receives. As you can see, I'm specifying a latency option -- "-latency 1.0" -- and I'm going to give it the path "/" so that we see everything that changes on the system.
So we may actually see some other stuff, depending on what daemons are running. Now, I have this other terminal window, and I'm going to make a directory; I'll call it foo. Wonderful name. About a second later, you see that we got an event for /Users/apple/dbg, because that's the directory I'm in, and that's where I created foo. If I go into foo and I touch a new file, we get an event for the directory foo, because that is what changed. Another example -- whoops, actually, I'll do it. I'm going to geek out here for just a second and create a whole bunch of files.
So we're going to create a couple thousand files. And I want you to observe-- so that created-- this machine's too fast. Drats. Need to slow it down. Well, all right. I can play this game too. We'll create a lot of files. Oh. All right, thwarted. All right, so now what you see is that we're getting multiple events as I'm creating many thousands of files here. And they're coming through. And what I wanted to demonstrate is if I went down here and changed this latency-- like you may say, oh, one second, that's too long. I can't tolerate that kind of latency. So I'm going to have a 0.1 second latency. Now if I go up here and create-- Well, I'm not going to create that many files this time. You see, we get a whole lot more notifications.
Now, this is obviously good if you need to update in that kind of real time. But typically you want a latency that's long enough that events get compressed. 4,000 files were created in this directory, but really I only want to do one rescan at the end, when the dust has settled, so to speak. So if I change this latency to, say, 2.0 seconds, and then go and create all those files again -- whoops, that one -- what you'll see is that it runs and I might get one or two events instead of 10 or 15. In fact, I only got one event. Now, you see these event IDs. I'll show you the history. So let's do a latency of 1.0, and I'll specify a -sincewhen option, and I'm going to give it a value, because I was paying attention earlier.
So this is a fair number of events coming through for the history since, let's say, an event ID of 320,000, which is roughly where the event IDs were when I came into the room. Whoops, I have to give it a directory to watch. Here we see everything that's changed since then: you can see Spotlight was busy; something changed in /dev, which is probably worth ignoring; and these are all the other directories. In fact, we can see that fseventsd was writing to its own directory, and Temporary Items and Preferences got modified. So again, this is everything that's happened since that particular event ID. And you can see that /Users/apple/dbg/foo got a few updates as well. These get coalesced on a 30-second granularity -- that's not super interesting, but you can get multiple events for the same directory. So this just gives you an example of the parameters and options that you have with the FSEvents framework. So if we go back to the slides -- now we're going to get into the specifics of the API. It's a very simple API, with just a couple of data types to worry about. The first thing you have to know about is the FSEventStream. This is the channel on which you receive notifications when there are changes. It's a CF run loop provider, so you can create it and schedule it on a run loop; we'll go into those details too. The next data type is not really a data type: it's the FSEventStreamCallback, a typedef for your callback function, which is what gets called each time there are changes.
FSEventStreamCreate, as I said, is a standard CF run loop provider, so it takes a variety of the template arguments you'd expect: the CFAllocator, a context pointer. The arguments of interest for the event stream itself are, first and foremost, the callback -- or perhaps the paths are even more important. The paths argument is a CFArray of CFStrings, the paths you want to watch. The sinceWhen parameter, a 64-bit ID, is the starting event ID, and you have a couple of options for it. You can specify "since now," which means you want to receive events only from the current time forward. You could specify zero, which is from the beginning of time -- but you probably don't want to do that. Or you can specify an actual event ID, the last one that you saw.
The latency is how frequently you want to receive updates -- how often, at most, you want to be called. As you saw, I was specifying 1.0 for one second, or 0.1 seconds, and you'll receive events at most that frequently. There's also a flags argument, but we don't have any flags yet, so you just pass 0 or the constant for none.
The lifecycle of an FSEventStream looks like this. You create it first, specifying the arguments that you want. Then you schedule it on a run loop. Then you have to start the event stream. We have very explicit phases, so you have good control: you create the stream, and then you start it only once you know everything in your application is initialized, at which point your callback starts being called. You can stop an event stream, which allows you to do a clean teardown, so you don't have to worry about race conditions along the lines of "I'm still getting callbacks, and I need to stop." This is a very straightforward way to do it. Once you stop an event stream, you can restart it if you want. Once you call invalidate, the thing is basically headed for the bit bucket, and the only other thing you can really do is call release.
The callback itself gets a couple of arguments. First, there's the number of events; you may receive more than one, and if you have a very long latency you may get 10 or 20 or 30 events -- I forget the maximum passed at a time, but you can receive quite a few. You get three parallel arrays. First, the paths for each event: the first entry in the event paths array is for the first event, and it tells you which directory, under the hierarchy you're interested in, was modified. For each event you also get a flags value and an event ID. The flags are very important. Normally the value is just 0, which is good: it means nothing of interest happened. But you may also receive one of three flags. The first one, MustScanSubDirs, means that, well, there were some problems.
As I said, /dev/fsevents in the kernel has limited buffer space, and when space starts to get tight, it will start to combine events. So what can happen is that you receive an event for some directory within the hierarchy you're watching, but this flag means that you need to rescan from that point down. This is an unfortunate circumstance, but it's clearly better than having to rescan the entire hierarchy. Again, we go to great lengths to avoid having this happen. But if it does, you need to be aware that when that bit is set, you need to rescan from that point down to find all the changes that happened. The next two flags are a little more serious, and hopefully you'll never see them.
The first one is called UserDropped. What this means is that your application was not reading events fast enough: the callback wasn't processing the data fast enough, things got clogged up, and we had to drop events for you. Something changed, and we can't tell you what, because you weren't reading fast enough. This is catastrophic in the sense that you now have to do a full rescan of the entire hierarchy you're interested in. It's a very clear indication that you need to make sure you're processing the callback fast enough. The next flag, KernelDropped, there's nothing you can do about: things got so clogged up that the kernel couldn't even keep up with the stream of events being generated. Again, this is catastrophic in the sense that you have to do a full rescan. We go to great lengths to avoid these things, but realistically they can happen, and you need to be aware of that.
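The flag handling reduces to a small decision: rescan just the reported directory, rescan from that directory downward, or fall back to a full rescan from the root. A portable sketch (the constant values mirror the flags in FSEvents.h; the function name and locally-defined macros are my illustration, not the API):

```c
#include <string.h>

/* Values matching these flags as defined in FSEvents.h. */
#define FLAG_MUST_SCAN_SUBDIRS 0x00000001u
#define FLAG_USER_DROPPED      0x00000002u
#define FLAG_KERNEL_DROPPED    0x00000004u

/* Given one event's flags, pick which directory to rescan and whether
   to recurse below it. Dropped events force a full recursive rescan
   from the root of the watched hierarchy. */
const char *decide_rescan(unsigned flags, const char *event_path,
                          const char *root_path, int *recursive) {
    if (flags & (FLAG_USER_DROPPED | FLAG_KERNEL_DROPPED)) {
        *recursive = 1;           /* catastrophic: rescan everything */
        return root_path;
    }
    if (flags & FLAG_MUST_SCAN_SUBDIRS) {
        *recursive = 1;           /* events were coalesced below this point */
        return event_path;
    }
    *recursive = 0;               /* normal case: rescan one directory */
    return event_path;
}
```

This is essentially the logic the watcher demo's callback implements, shown later in the session.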
The last array is the event IDs: for each path, you get the flags and an event ID. These are just monotonically increasing numbers. They're, in essence, a timestamp, but they bear no relation to wall clock time or anything like that; it's just a number, and you can store it away for later use. So when your application is shutting down, you store the ID of the last event you received, and then you can pick up from that point in the future.
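Since an event ID is just a 64-bit number, persisting the last one you saw is trivial. A hedged sketch (the file format and function names are my own choice, not anything the framework prescribes):

```c
#include <stdint.h>
#include <stdio.h>

/* Save the last-seen event ID so a later run can resume from it. */
int save_last_event_id(const char *file, uint64_t id) {
    FILE *f = fopen(file, "w");
    if (f == NULL) return -1;
    int ok = fprintf(f, "%llu\n", (unsigned long long)id) > 0;
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 0 and fills *id on success; nonzero means "no saved state,
   fall back to a full scan and start from the current time". */
int load_last_event_id(const char *file, uint64_t *id) {
    FILE *f = fopen(file, "r");
    if (f == NULL) return -1;
    unsigned long long v = 0;
    int ok = fscanf(f, "%llu", &v) == 1;
    fclose(f);
    if (ok) *id = (uint64_t)v;
    return ok ? 0 : -1;
}
```

On the next launch, the loaded value is what you'd hand to FSEventStreamCreate as the sinceWhen argument.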
Now, going over to the watcher example. Let's see. OK. This program called Watcher, that I'll start going through the code in a second, is, well, sadly, a command line application, because that's all I'm capable of. And I took a Cocoa programming class, but it's just-- I don't know. Anyway.
Yeah. What it does is monitor a file hierarchy and keep track of its size. If something changes inside the hierarchy, it updates its state and tells you the new size, and it does that by rescanning the one directory that changed, so it doesn't have to do a full rescan. So I can watch a particular directory hierarchy; it will build its initial state and then update that state as things change. There's a little support code that I'm not going to go into: the code that walks the directory hierarchy to build the initial state -- you can figure out that that's not too exciting -- and the code that parses the command line options. Anyway, starting here in main, we're going to go down. As I said, the first thing is that we have to get an absolute path, so we call realpath() on the path that came in on the command line. We only support monitoring a single directory, as a simple example. If realpath() fails -- because, let's say, the directory doesn't exist -- we just copy the path in as-is. One other thing to point out: a frequent thing people do with the FSEvents APIs is watch /tmp and then say, "I don't get any events." Well, that's because /tmp is actually a symbolic link. If you call realpath(), it resolves the symbolic link. Now, the first thing that we do is get the directory size for that full path.
So that prints out what it sees the initial size is, and that get-directory-size function builds some internal state for each of the directories it encountered; you'll see how we use that later. Now we're going to create the event stream. Up here we have the FSEventStreamCreate call, inside a pretty simple wrapper function in this case. The first thing I want to point out is the context, the FSEventStreamContext. Your callback function obviously needs some hook back into your application's data structures.
The FSEventStreamContext is how you do that. In this case, we set our info pointer to be the path that we're monitoring, the root of the hierarchy we want to watch. There are some other fields you can fill in if you're a sophisticated Core Foundation app; in this case, we don't have any need for that. Next, we create a CFArray of CFStrings, basically massaging the string that we got into that form. Then we call FSEventStreamCreate. As I said, you have the callback -- we have a function called fseventsCallback -- and we pass the context pointer, which is our hook back into the application data structures. We pass in that CFArray, and then the settings that were specified on the command line for sinceWhen, latency, and flags. In this case, most of these, except for the latency, are not terribly interesting.
Back in main, once we've got our stream, we schedule it on the run loop, and then we start the event stream we just created. We're not going to talk about the flush-seconds part; that's a separate thing. Then we call CFRunLoopRun, and off it goes to get our events. Now, first off, there's a bug in this program. I'll just tell you that right now, but I want to show you how it works in practice first. So down here, I will run watcher with an aggressive latency on the current directory.
So it tells me that the initial size is somewhere over 2 megabytes for this directory. Now, up in this other window, I'm in a subdirectory there, and -- oops, gee, I've got stuff left over. Right, so it's removing all these things, and the size of the directory has gone down a small amount. Now what I'm going to do is create a file, and again I'll geek out. Whoops -- dd, output to junk, bs=512k. So I'm going to create a 512K file. Whoops.
Did I say that right? Anyway, I can't think straight right now. count=1. And we see our size update by roughly 512K. If I remove that file, the size goes back down: remove junk, and you can see that it re-scanned just the one directory where the changes were made. The size goes back down to what it was before, and everything is OK. Now, if I open this in the Finder, we have an empty directory, and you can see that a change was made to the directory up above, by the Finder. If I get a new Finder window, go to Applications, take the Calculator, and drag it in here, we get a whole bunch more events, and the size goes up by about five megabytes. If I delete that, our size goes back down -- yeah, five or six megabytes. Empty the trash -- yes, I am sure -- and it's gone.
So it plays well regardless. My point in showing the Calculator application is that it's actually a bundle with a whole bunch of subdirectories, and the FSEvents stream tells us about all those changes. If I mkdir a series of directories -- say, a/b/c/d -- we get a series of events, and the changes in size are small because it's just small directories being created. But if I go into a/b/c/d and then do the dd again, we get an event, and even though we have a particularly deep hierarchy, we only have to rescan the one directory that actually changed. That's the whole point: it allows your application to be more efficient. So again, if I delete it, the size goes back down. Now, going back to the code, I want to point out what the bug is.
Here, you saw that I got the initial size of the directory, then created the stream, and then started the stream. The problem is that by getting the directory size first and then starting the event stream, we've left a window open -- and we all know there can be problems with windows. You want to make sure you're receiving events from the moment you've created your initial state. So the right thing to do is to take this code and put it after we've started the event stream, so that once we have our initial size, there's no window in which things can change without us receiving notifications about it. That takes care of that. If we go back to the slides--
All right. So that's a summary of what we did. The application we showed is very basic FSEventStream usage: it monitors a single path, with no complicated run loop business -- a pretty simple guy. For pedagogical purposes, we kept it straightforward. The application creates its initial state and lets the callback drive updates to that state. As I said, whenever a change was made to a directory somewhere deep in the hierarchy, it would update that portion of the state and reflect the new total size of the hierarchy. More advanced apps could do a lot of different things. They could use the event history, which we didn't use in this case. They could watch multiple hierarchies if they needed to. They could create multiple streams: you may have a highly multi-threaded application where different parts need to watch different hierarchies, and they have different run loops and need different event streams. You can do fancy scheduling with start and stop; as I said, that's a good way to do clean startup and teardown of your event stream. You can use custom CFAllocators if you want to, and so on. Now, some advice and other things: you have to look at the flags for an event. Oh -- I suppose one thing I didn't show was the callback. Can we go back to the code?
I kind of forgot the punchline, so to speak. The callback, as I said, gets a couple of standard things. The context pointer is passed to it, which in our case is the full path that we are monitoring. Then, for each of the events, it goes through and takes a look. Now, in this case -- let me make this just a wee bit bigger.
We take the path that we were given and chop off the trailing slash, if there is one. Now we're looking at the flags. If we have the MustScanSubDirs flag, we set this recursive flag to one. If we got the UserDropped flag, that's pretty bad news: we set recursive to one, and then we copy over the path that's there and put the full root path in its place.
If we got KernelDropped, that's really bad news, and we set recursive to one and copy over the full path again. Otherwise, there's no need to go recursive; we just call get-directory-size for that particular directory -- that's how things get updated -- and then it prints out the new size. So that's what the callback looks like. Sorry, going back to the slides.
So dropped events are rare, but you do need to handle them, as I just showed in the callback. The types of rescanning you may have to do are either from some point in the hierarchy on down, or everything -- which are really just variants of the same thing. Another thing we realized you can do is create token events. You don't have to be passive and just listen to what happens; you can actually do things and observe those events coming through the pipeline on the other end. You can use this to wrap the beginning and end of some other operation that someone may be doing on your behalf. You create some directory or file with a well-known name, you see that the modification comes through, and you observe that it's there. Then a series of other events may happen, and when that operation is complete, you do a token event at the end. So you have begin-end pairs to know when you are finished.
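The token-event trick can be sketched in a few lines: drop a well-known marker file into a watched directory before and after an operation, and the watcher brackets everything it sees in between. A hedged portable sketch (the function and marker names are illustrative, not anything the framework defines):

```c
#include <stdio.h>

/* Drop a well-known marker file into a watched directory. The watcher
   receives the resulting directory event, checks for the marker by
   name, and knows the bracketed operation has started or finished. */
int drop_token(const char *watched_dir, const char *token_name) {
    char path[4096];
    snprintf(path, sizeof(path), "%s/%s", watched_dir, token_name);
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    fclose(f);
    return 0;
}
```

Usage would be something like drop_token(dir, ".op-begin"), do the work, then drop_token(dir, ".op-end"); the watcher treats events arriving between the two markers as part of one operation.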
As I mentioned, you want to be careful of leaving windows open between when you generate your state and when you start receiving events that would update that state. It's very important because they may be small windows, but you don't want to miss things that would cause you to have out-of-date state.
Now, FSEvents isn't the only option. It's a good one, but it's not appropriate in all cases. A lot of times you go, "oh wow, new API, got to use it." The FSEvents framework is a good solution, but it's not the only one. You may want to consider just using a kqueue: there's no need to change if you're only monitoring a single directory. If you have a spool directory with no subdirectories or anything like that, a kqueue is probably lighter weight, more efficient, and less trouble than the FSEvents framework. If you just have to check a couple of files -- did some particular preference file change while I wasn't running? -- you wouldn't use the FSEvents framework; simply stat the file and get the information that you need. FSEvents is best when you've got a really big hierarchy you want to monitor, or you need a history of changes. If you don't need the history and you only have a few files, or you only need directory changes for a single level of the hierarchy, it's probably more straightforward to use a kqueue. Thank you.
Now, as we've been developing things, we've noticed some issues, so there are going to be some changes coming. We realized that an event stream will need a GUID to uniquely identify it, so that an event ID is paired with the event stream from which it came. If, for some reason, you stored an event ID, and then, let's say, someone drag-copied the data to another machine, that event ID is no longer really meaningful.
You have to have a way to know that the underlying event stream you're getting is different. So it's really going to be a pair: a GUID for the event stream, plus the event ID that came from it. We're probably also going to introduce a special event that indicates a volume was modified outside of our purview.
That is, somebody changed it. Like if you have an external FireWire drive and it's taken somewhere else and modified and then comes back to a Leopard system, you need to know about that. Because it means that you're going to have to do a full re-scan. And there's nothing that we can do about that. There's no magic in the world in the sense of if somebody else makes changes and they don't record the FS event history, there's no way to know about that.
We will probably also have some API cleanups so that some little things like the paths to the callback may become CFStrings instead of just being standard C strings like they are now. On the way in, you specify a CFArray of CFStrings, but then the callback just gets C strings. So there's some little inconsistencies there that we'll probably try to resolve. Again, we're also very open to feedback and hearing what people are interested in.
So in summary, the FSEvents framework provides you with a change notification for any file hierarchy or file hierarchies that you want to watch. Monitoring a file hierarchy provides notifications for changes to directories within that hierarchy. Events are coarse-grained. As I said, they're directory-level changes. And it gives you a full event history.
The purpose of all this is to allow your application to be more efficient at processing updates and changes, so that you don't have to rescan an entire hierarchy, or poll, or periodically stat things to find out about changes. So that's the FSEvents framework. I guess that concludes things, and I think we have about 15 minutes for Q&A.
I also wanted to mention that we're starting a file systems dev list, a public list. You can find it at lists.apple.com, where you can have discussions with engineers like Dominic. The file systems engineers will be available tomorrow morning, 9 a.m., in the Kernel Extensions Lab, and also at the Campus Beer Bash. Thank you very much.