Introduction to the FSEvents Framework - WWDC 2007

Mac OS X Essentials • 55:05

The FSEvents framework helps applications efficiently monitor large directory hierarchies for changes. Applications can receive live updates as well as the history of changes made since the last time the application ran. If your application manages files or needs to track modifications to the file system, come find out how the FSEvents framework can help.

Speakers: Dominic Giampaolo, Brent Knight

Unlisted on Apple Developer site

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Good afternoon. I'm Dominic Giampaolo and this is my imaginary friend Brent Knight, who is actually in San Antonio, Texas because his wife and his child are sick. We worked on the FSEvents framework and that's what we're going to talk to you about today; well, Brent isn't, I will. Okay. So what's the agenda? We're going to start with the history and background. If you were here last year, some of this will be a little rehash, but there is quite a bit of new material.

So don't totally fall asleep. After the history and background, we'll go over the FSEvents framework, the API, kind of the concepts, what's involved, and then, of course, since this is WWDC, we'll be doing some example code, a sample application that uses the FSEvents framework, not too complicated, but it gives you the basics. Then we'll move on to some advanced topics, not so much in the code, but speaking about different parts of the API. This is some of the stuff where a lot has changed since last year. And then, of course, a review and wrap-up.

So, history and background. As I'm fond of saying, any engineering problem starts with defining what it is you are trying to solve before you go and start writing code. So in this case for FSEvents framework, the problem that we were trying to solve is, did anything change in this hierarchy or is anything changing in this hierarchy?

And those are actually pretty hard questions, because that hierarchy can be big and it's not something that is easy to necessarily just watch everything. So that's what we were trying to solve. We also realized that different applications may want different types of answers. Like as you noticed, did anything change, sort of has this historical bent to it and is anything changing is more of a live kind of question.

Now, of course, everybody wasn't just sitting around waiting for us to do this for the last 25 years, so there are various techniques that people have used. So let's kind of go over those to set the stage for what's wrong with them or what the deficiencies are and what FSEvents brings to the table.

The first thing is polling with stat(): you can ask, did this change? Did this change? Repeatedly, and drain the battery and spin up the hard disk and so on and so forth. A step above that is kqueue(), which was introduced, I think, definitely by Tiger; probably it was Jaguar that it was first introduced in. And this is certainly a better mechanism because it is event driven.

That is, you say I'm interested in this object, tell me when something happens to it. So that's a step above. Then there are the FNSubscribe and FNNotify APIs, which existed all the way back to Mac OS 9. Those are more kind of opt-in APIs where someone can subscribe to notifications and someone else can publish them with FNNotify. And then, of course, there is the kauth subsystem that was introduced in Tiger. Now, what are the problems with these? Full rescans are slow, so anything you do with stat(), where your application starts up and goes, hmmm... did anything change? Let me just go walk the whole hierarchy.

That has a lot of performance implications, and can potentially be very slow in addition to using a lot of CPU time, burning down the battery, and spinning up the hard disks more than is necessary. There is no way to watch an entire hierarchy with kqueue(). So you can say I want to watch this individual object, and that's great, especially because it's event driven, but if that object happens to be the root of a large hierarchy, you won't find out if something ten levels down changes. You have to open file descriptors for every single thing. If there are 100,000 files you can't do that.

Opt-in APIs like FNSubscribe and FNNotify are good in that they are event driven, but they are opt-in, so you don't necessarily find out about all the changes. kauth is a kernel-level subsystem and that's not really practical for most applications: developing kexts, installing them and so forth is a bit of a hassle for just wanting to find out what changed.

And with all these systems there's no history of events. So unless your application is running you won't necessarily know. So you would have to do a full rescan. So it would be nice to actually have a historical way to ask for what did change while I wasn't running.
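To make the cost of polling concrete, here is a minimal sketch of the stat() approach described above. The helper name `poll_changed` is hypothetical and not part of any Apple API. It only compares modification times, so it has to be called again and again for every file of interest, and it knows nothing about what happened while the program was not running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns true if the modification time of `path`
 * differs from *last_seen, and remembers the new time. Returns false if
 * nothing changed or if the path cannot be stat'ed. */
bool poll_changed(const char *path, time_t *last_seen)
{
    struct stat sb;

    assert(path != NULL && last_seen != NULL);
    if (stat(path, &sb) != 0)
        return false;               /* missing or unreadable: report no change */
    if (sb.st_mtime != *last_seen) {
        *last_seen = sb.st_mtime;   /* remember what we saw for the next poll */
        return true;
    }
    return false;
}
```

A caller would keep one `time_t` per watched file and call `poll_changed` on a timer, which is where the battery and disk cost comes from.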

So, the first step in solving this was /dev/fsevents, which got introduced in Tiger for Spotlight, and some of you discovered it and started writing programs that fiddle around with it even though you're not supposed to, but that's okay. And this provides a raw stream of events. When I say raw stream of events, it's all locally generated events, at a fairly fine granularity: this file had its stat information changed, that is, like the times were updated, this file was modified, this file was created, deleted, renamed, and so on.

So even local changes to network file systems generate events. These events are very raw, and the kernel mechanism by nature and by design is lossy, in that if there's no room to put the events in, they will be dropped, because you can't block generating an event waiting for someone to consume it; otherwise you wind up with these nasty three-way or multi-way deadlocks that you can't break out of. So, it's sensitive to slow clients. If someone doesn't read the events out of the kernel, because it's a shared queue, everybody loses. So, that's not the kind of thing we would want to expose to regular user applications.

The internal clients that use /dev/fsevents, like Spotlight, taught us quite a bit about it as well, and then we also got a lot of feedback from WWDC last year when we first introduced the FSEvents framework. And in fact it's pretty remarkable: a lot of the feedback we got from that, as well as from internal use, really played a significant part in some changes that were made.

So, setting the stage here, the design rationale that we used, we considered a few types of clients. Time Machine and Backup and all sorts of things, did anything change in this hierarchy so that I can do X, Y or Z. File Sync, which is also a very similar type of application. Did anything change that I need to know, synchronize with another machine?

Now Finder is asking the "is anything changing" type of question. When you have folder sizing turned on, if you have a folder displayed and something gets dropped into a folder way down inside that hierarchy, the folder size for that folder should update.

So the Finder uses it as well. We know that this is not going to satisfy virus checkers. A lot of people think I'm going to find out about changes and go check on stuff. This is much more informational as opposed to interposing; something like a virus checker would probably use kauth.

Limitations and constraints: now, we know that we can't store a complete log, a 100% accurate log of everything that has happened. Your computer is intended to be used by you for various applications, not for storing file system events. I think you'd be unhappy if we were eating up significant fractions of the CPU and disk bandwidth.

We need to filter the event stream into something that is manageable, something that applications can more easily digest. We can't be all things to all people, so let's not even try. I tend to keep things simple in design, just so that it is more manageable and I can get my head around it, and that's kind of the rationale behind it.

So, some assumptions we had to make. First off, when you have a client that is watching some hierarchy, it has to have a way to generate its initial state. So you have to have code that does a full scan. Point it at a hierarchy and go, build up some internal state.

So if you have that, then it's pretty straightforward to say, well okay, just go look at this piece here and do a partial scan. Don't actually go recursive and descend into the hierarchy. So, given those two pieces, if you have some current state, some initial state, you can compare that to what is out there currently when something changes, and you can update your state when you get a notification that something has changed.

So kind of the mental model is you build an initial state or whatever, metadata you need to extract from a hierarchy and then when changes happen to it, you go update just the specific part of the hierarchy that has changed. We are going to go into this in quite a bit more detail. So, the FSEvents framework is a CoreFoundation API that lets you, gives you the ability to watch a large hierarchy for changes, get directory-level notifications of changes.

All local changes produce events. So you can watch a remote file system if you want; locally generated events will trigger updates. Now, if someone changes something on the server through another client, you won't hear about it. There's a persistent change history, and this is a key feature, so you can say my app last ran at this event ID, which I'll explain in a minute, what happened since then, and find out about it, even if that was a week ago or a month ago.

You have fine control over the frequency of updates. So it's kind of a pretty decent package. You don't get events for changes to specific files. So, if file foo changes, you're not going to receive an event for foo; you are going to receive an event for the directory that contains it.

Okay, let's talk a little bit about the architecture. Sometimes it's useful to understand how all the pieces fit together. So as I've said, FSEvents is a framework, a CoreFoundation-style framework, the API that you guys can call, and it's backed by a corresponding daemon, fseventsd. fseventsd reads the events out of the kernel and kind of multiplexes that stream out to clients. It filters it, massages it, and does a bunch of other work to make it something that's more manageable.

Graphically, it looks like what we have here, where the kernel is spewing events up into fseventsd, which squirrels them away in the event history and then passes them through to the FSEvents framework, and they eventually make their way over to the client application. So that's kind of how all the pieces fit together.

Now, a lot of times, new API, oooh, I got to use it, I got to find a way to make it, I got a hammer, now I got to find a nail. The FSEvents framework is good, but it's not always the appropriate thing to use. If you have, let's say, a few tens of files at most, kqueue() may actually be more appropriate.

You're going to find out very specifically this is the file that changed. Once you get beyond, if you wanna say tens, twenty, twenty-five files, then it starts to get a little bit more unwieldy. If you just need to check a few files, did my preference file change or did some other little file change on the disk, just use stat() and check it.

That's actually probably going to be more efficient than asking to churn through the history of all the changes that have happened to look if a particular file changed. FSEvents is best when you have basically a large hierarchy that you'd like to watch. You know, i.e., a large repository of media files, did someone go in there and monkey around with stuff, a large package that contains a lot of subfolders, you need a full history of changes, or you want to watch something that may not exist yet and I'll get to that too in a minute. All right, so let's go over to the demo machine real quick and I will show it to you in practice. This isn't going to be super exciting. Can we switch to the demo machine?

Okay. There we go. All right, so I have two sort of not very nicely colored windows here. We have a little test program that we use internally, that is called FSEvents, creatively enough, that lets you play with various API options. It's an internal testing tool. So I'm going to say -latency, and I'll go through what all this means, 1.0, for one second latency, and I'm going to watch ~/dbg; I've made a subdirectory for myself here. And so it prints out this new ID for the event stream for this path, and what's a good way to do this?

Let's make this guy a little bit smaller. Now I will mkdir tmp and you'll see an event comes up that the users apple$ dbg directory was modified because I've made this subdirectory called tmp. Now cd into there and I have this little alias called make-files, which will create, I think it's about 5,000 files and we got an event that users dbg tmp was modified and there's 5,000 files in there.

And it's kind of nice that even though all this churn happened, this huge burst of activity, got one event and I go and I can rescan that directory if I wanted to, to find out what happened. As opposed to getting 5,000 events and you know, competing with the guy who's trying to do the work to get in there. If I remove them all then I actually, in this case, get a couple of events because of the way things work. And that's what I was going to show you next.

So if I was to change the latency to let's say, some ridiculously small latency of a hundredth of a second and now I go and create all those files again, I get a whole bunch of events. So what this points out is the fine control over the delivery of events. If you have an interactive application, you may want to have the display update a bit more frequently than once a second.

So you might ask for half a second of latency or maybe even a tenth of a second, certainly not much less than what the display update interval is. There's no point in getting a hundred updates a second if you can only update the screen 60 times a second. So, in this example, a hundredth of a second generated way too many events for the creation of 5,000 files.
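The effect of the latency setting in the demo can be pictured with a small simulation. This is a simplified model and an assumption on my part, not fseventsd's actual algorithm: the first event opens a window that stays open for `latency` seconds, and every event that falls inside the window is delivered in the same callback. The function name `count_callbacks` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified batching model (an assumption, not the real implementation).
 * `times` holds event timestamps in seconds, sorted ascending.
 * Returns how many callbacks the client would see. */
size_t count_callbacks(const double *times, size_t n, double latency)
{
    size_t callbacks = 0;
    double window_end = 0.0;

    assert(times != NULL || n == 0);
    assert(latency >= 0.0);
    for (size_t i = 0; i < n; i++) {
        if (callbacks == 0 || times[i] >= window_end) {
            callbacks++;                      /* this event opens a new batch */
            window_end = times[i] + latency;  /* later events join until then */
        }
    }
    return callbacks;
}
```

With five events at 0.0, 0.1, 0.2, 0.9 and 1.5 seconds, a latency of 1.0 yields two callbacks in this model, while a latency of 0.05 yields five, which mirrors the burst of events seen with the tiny latency in the demo.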

So you would almost never use a latency that small. Another thing to notice is that if I watch this ~/dbg directory, and I didn't have to stop it, and I go back to the home directory and I mkdir foo, it will not get an event. It happened somewhere else. It was not in the path or the hierarchy that I was watching, and so it doesn't notify my application.

Another thing to notice is that if I was to watch /tmp, and I'll harp on this issue a few times, and touch boo, the path that comes through is actually /private/tmp. You need to be aware of things that are symlinks or that would cause you to traverse to some other place. So the path gets resolved in this situation, but you may be surprised when you watch /tmp and you actually get an event for /private/tmp, because that is where it really points to.

So, I think that's a reasonable demo of the FSEvents tool itself. So if we switch back to the slides, because we'll get a bit more into stuff with the example application. So, if we can go back to the slides. Okay. So that was, you know, the standard Dominic command-line demo.

Now, if you were here last year, like I said, don't fall asleep; it's time to wake up. This is the stuff that is new. I'm just going to briefly highlight it, and it gets covered in line with the rest of it. So you have to stay awake the whole time. There's a bunch of new device-specific, well, not a bunch, there are new device-specific APIs. There's a per-device UUID: that kind of big ugly string that got printed at the very beginning by the FSEvents tool is a UUID that identifies a particular event stream for a device.

There's better handling of historical events, we noticed that it was kind of difficult to process historical events, like when did they stop when am I getting live events, how will I know when it switched? We have a new HistoryDone flag. There's more control over event delivery. There was this bit of contention over the latency, the interpretation of the latency parameter, there's a NoDefer flag and as well as these flushing functions.

There's better handling of changes to the root of a watched hierarchy. And again I'll go into this in more detail later on. Better handling of volumes coming and going, because if you're watching a specific volume with a device-specific API, you need to know when that volume goes away. Security, which wasn't implemented at all last year, is, of course, in place, and for a variety of reasons it's much more robust and has a lot fewer dropped events than it did last year. So, let's get into the concrete details of the API.

The data types and the concepts that you have to master are basically three things. You have an FSEventStream, which is the mechanism that sends you the notifications about what changed. It's a CFRunLoop provider. So if you're familiar with CFRunLoops and how they work, and if you program on the Mac I imagine you are, this is pretty basic stuff, and that's how you get hooked into the event stream. What happens is the notifications call a callback, and that's your FSEventStreamCallback, the second thing that you must "master" to understand the API. It's called every time there are changes. And then there are the events you get. The events have three pieces of information each: where the change happened, that is, the path to it; "when" the event happened, and I put that in quotes and I'll explain later; and some flags about the event that you need to look at.

As I said, there are two types of streams. There are device-relative streams, which is the preferred type of stream, as we learned over the course of the last year, and you must use them when your application stores event IDs. Paths specified with device-relative streams are relative to the root of the device. So if you are watching /Volumes/MyDisk, the path is /Users/freddy, with /Volumes/MyDisk/Users/freddy being the full path.

You don't have any problems with volume renames also when you have the device relative stream. Absolute path streams are useful if you are doing live monitoring, but we sort of don't really recommend them because of the, especially when you have a deep path, if you're not watching slash it can get tricky as to what happens when things change up above it.
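As a rough illustration of what "relative to the root of the device" means for a path string, the sketch below strips a mount point from a full path. The function name `device_relative_path` is hypothetical, and keeping the leading slash follows the /Users/freddy example from the talk; whether the shipping API expects that leading slash is an assumption to check against the FSEvents header documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the part of `full_path` that lies below
 * `mount_point`, or NULL if the path is not on that volume. The result
 * points into `full_path` (or at a string literal) and must not be freed. */
const char *device_relative_path(const char *mount_point, const char *full_path)
{
    size_t n;

    assert(mount_point != NULL && full_path != NULL);
    assert(mount_point[0] == '/');          /* absolute mount points only */
    n = strlen(mount_point);
    while (n > 1 && mount_point[n - 1] == '/')
        n--;                                /* ignore trailing slashes */
    if (strncmp(full_path, mount_point, n) != 0)
        return NULL;
    if (n == 1)
        return full_path;                   /* root volume: already relative */
    if (full_path[n] == '\0')
        return "/";                         /* the mount point itself */
    if (full_path[n] != '/')
        return NULL;                        /* e.g. /Volumes/MyDisk2 */
    return full_path + n;
}
```

For the example in the talk, `device_relative_path("/Volumes/MyDisk", "/Volumes/MyDisk/Users/freddy")` gives "/Users/freddy", and the result stays the same if the volume is later mounted under a different name.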

Now, device UUIDs, these uniquely identify the event stream for a given device. So, there's a way to get them, of course, FSEventsCopyUUIDForDevice() and it's something you would use to verify that when you start up a second time is the event stream that's out there now the same one I saw at the last time, because they can change. This UUID is not the same as a DiskArb UUID for a disk. It's not intended to be a replacement; it's to uniquely identify the event stream and the event ID numbers that are there match what you have.

The UUID for device can change for several reasons. Even though it is a 64-bit value, the event ID could wrap, although in practice it wouldn't, but a nefarious user could create a situation where they put an event ID on the disk that causes it to wrap fairly quickly. If someone purges the history on the disk, and I'll talk about that later, that could also cause the event ID, the UUID to change and therefore when a UUID changes it means the event IDs are no longer valid.

Other catastrophic situations, if the kernel were for some unforeseen reason to drop events that causes the UUID to change or the event IDs, we don't know when events get dropped, we don't know what happened. So we have to say there's been a discontinuity. Not so catastrophic situations are things where, for example, a disk is taken, a removable disk is taken from a Leopard system to a Tiger system modified there, the changes are not updated in the FSEvents history and that's brought back to a Leopard system.

Hopefully this will not be a very common sort of situation, but it could happen. Also, there are devices such as MS-DOS volumes where we don't know if they've been modified offline; they are not able to store a permanent record. So, FSEventStreamCreate(): there are two forms, as I said. The absolute path version is FSEventStreamCreate(), which is what existed last year, and then we have the new function FSEventStreamCreateRelativeToDevice(). The arguments are pretty straightforward. The array of paths you would like to watch.

Again, if it's relative to device, the paths should have the volume prefix stripped off, so that it's just from the root of the volume down. The callback, which is the function you want to get control. sinceWhen is a starting event ID, and I'll get into more detail about that in a minute. Latency, as you saw, is a floating-point value that specifies the interval of time, in seconds, you would like before you get events, or how they are batched.

Flags, which control the behavior of the stream. There are a lot of bits that you can set to control different features, and for the relative-to-device function you pass a dev_t. See, this is where my UNIX heritage kind of bubbles up into a CoreFoundation API. You pass a dev_t, which is the device you want to watch.

FSEventStream, as I said, is a CFRunLoop provider, so it follows a pretty standard life cycle. You create it, you schedule it on a particular run loop, then once you start it, your callback will begin being called when the run loop is run. You can stop a stream, which allows you to do an orderly shutdown, so that you don't have to worry about race conditions with the callback being called while you are trying to tear down the stream, and so on and so forth. So these are clean ways to start and stop a stream. Typically after you stop the stream, you would just call invalidate and release on it, although you can actually restart a stream if you'd like.

Although, if you're watching historical events in the WWDC seed, you'll get an extra HistoryDone event. Anyway, don't worry about that yet. That will be fixed. So the paths that you pass in: the path does not have to exist yet. This is kind of a neat feature, because it's all sort of directory-prefix based matching, string-based matching on the names. You could watch /Users/foo/Documents, which does not exist, and when that directory comes into existence and something is created inside of it, you'll start receiving events. So it's kind of convenient in that way.

Device-relative streams, as I mentioned a few times now, have to use relative paths. Anything that changes under the path that you are watching produces an event. You can watch any number of paths. Typically each stream corresponds to one path that is watched. But you could put five, ten, whatever, fifteen paths in an array and watch them.

Non-root FSEvents clients do have permissions checked. That is, fseventsd will say, okay, you're not root, you're UID 501, I'm going to validate that you can actually see the directory that is being modified, so you can't decide to watch something that you don't have permission to. Again, you have to be careful of symlinks and hardlinks.

If there is a hardlink, if there are multiple hardlinks to a particular piece of data and you're watching it in hierarchy A and it gets modified in hierarchy B you don't get an event because the path in hierarchy A was not modified. Again, the paths are matched on strings.

realpath() is used internally to resolve symlinks. Like I said, if you watch /tmp, this was one of the first things that everyone did: they watched /tmp and they didn't get any events, because we were looking for the string /tmp, but actually it is /private/tmp. So we use realpath() to resolve things. Again, with modifications through hardlinks or symlinks, if something traverses through a symlink and goes outside of the hierarchy that you're watching and gets modified outside of there, you wouldn't get an event.
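A client can apply the same resolution to its own idea of the watched path, so that the paths it expects line up with the paths it receives. The sketch below uses only the POSIX realpath() call; the wrapper name `resolve_watch_path` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: writes the symlink-free form of `path` into `out`.
 * Returns 0 on success, -1 if the path cannot be resolved or does not fit. */
int resolve_watch_path(const char *path, char *out, size_t out_len)
{
    char resolved[PATH_MAX];

    assert(path != NULL && out != NULL);
    if (realpath(path, resolved) == NULL)
        return -1;                  /* e.g. the path does not exist yet */
    if (strlen(resolved) >= out_len)
        return -1;                  /* caller's buffer is too small */
    strcpy(out, resolved);
    return 0;
}
```

On the demo machine, resolving /tmp this way would be expected to give /private/tmp. Note that realpath() fails for a path that does not exist, so a program that watches a directory before it is created needs a fallback, such as resolving the nearest existing parent.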

The callback gets what you might expect, which is the number of events that happened, an array of paths, an array of event flags, and an array of event IDs, one event ID for each event; the entries across a row correspond to each other. Now a little bit more about the paths that you'll get in your callback. If you watch /Users/foo and you modify /Users/foo/Documents/MyCoolStuff/newfile, you get an event for /Users/foo/Documents/MyCoolStuff. If you create /Users/foobar/blah, you don't get an event. That's what I showed in the demo as well.
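The two rules in this example, directory-level events and string matching on directory boundaries, can be sketched in a few lines. Both function names are hypothetical, and the code illustrates the behavior described in the talk rather than how fseventsd implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `changed` is `watched` itself or lies beneath it. This is a pure
 * string comparison on directory boundaries, so /Users/foobar does not
 * match a watch on /Users/foo. */
bool path_is_under(const char *watched, const char *changed)
{
    size_t n;

    assert(watched != NULL && changed != NULL);
    n = strlen(watched);
    while (n > 1 && watched[n - 1] == '/')
        n--;                                /* ignore trailing slashes */
    if (strncmp(changed, watched, n) != 0)
        return false;
    if (n == 1 && watched[0] == '/')
        return true;                        /* watching "/" matches everything */
    return changed[n] == '\0' || changed[n] == '/';
}

/* Copies the directory that contains `file_path` into `out`; this is the
 * path an event would name when that file changes. Returns 0 or -1. */
int containing_dir(const char *file_path, char *out, size_t out_len)
{
    const char *slash;
    size_t n;

    assert(file_path != NULL && out != NULL);
    slash = strrchr(file_path, '/');
    if (slash == NULL)
        return -1;                          /* not a path with a directory part */
    n = (slash == file_path) ? 1 : (size_t)(slash - file_path);
    if (n >= out_len)
        return -1;
    memcpy(out, file_path, n);
    out[n] = '\0';
    return 0;
}
```

For /Users/foo/Documents/MyCoolStuff/newfile, `containing_dir` yields /Users/foo/Documents/MyCoolStuff, which `path_is_under` accepts for a watch on /Users/foo, while /Users/foobar/blah is rejected.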

Normally you don't get events for changes to the root of the hierarchy. Say you watch /Users/foo/somedir, it didn't exist, and you create it. There's no event for that, normally. And then when you create something inside of it, that will produce an event. If you use kFSEventStreamCreateFlagWatchRoot when you create the stream, then that will produce events when the root changes or any of its parents. As I keep saying, I'll talk more about that further on in the talk.

The flags that you can get for each event: normally, if everything is going straightforwardly, you'll get kFSEventStreamEventFlagNone, which means nothing special. There's nothing you need to do or be aware of. If for some reason something was dropped along the way, the flag will say MustScanSubDirs.

So starting at the path that you received, go recursive from there down. You know, may be a small branch of the hierarchy, it may be further up. Hopefully you'll never see that. There's two additional informational flags that tell you why it happened. UserDropped means that your client was not reading things quickly enough.

Then there's KernelDropped, which I expect that no one will ever see unless you run these crazy little test programs like I do, sometimes. Even then it's hard to do. KernelDropped means the kernel simply couldn't keep up with the volume of changes that were happening and it ran out of space to keep track of events. Now, EventIdsWrapped, as I alluded to: if you were to just generate events normally with the 64-bit event ID counter, it would take you probably somewhere on the order of 100,000 years to cause them to wrap. So that's not likely to be the case.

However, someone could create a situation where they, you know have an event ID of all F's and the next event will cause it to wrap. And so we have to deal with that, because when event IDs wrap you have no way to know and so we have to reset things.

When I say you have no way to know, I mean your event IDs are no longer going to be able to be compared relative to each other. HistoryDone comes through as a fake event if you've asked for historical events with the sinceWhen parameter. You'll get the HistoryDone flag set, so you'll know, okay, now I'm getting live events.

RootChanged is if you asked for the watch root bit when you created the stream. RootChanged means that the root of the hierarchy that you were watching is no longer there, or something about it has changed, and you need to go reexamine it. And Mount and Unmount mean a device has come or gone. And I'll talk about that more later too; it's not a replacement for DiskArb.
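One way to picture how a callback might react to these flags is the sketch below. The `EV_` constants are local stand-ins named after the flags in the talk; the real constants and their values come from the FSEvents header and are not assumed here. The action names and the order in which the flags are checked are also my own assumptions.

```c
#include <assert.h>

/* Stand-in flag bits (illustrative values only). */
enum {
    EV_NONE              = 0,
    EV_MUST_SCAN_SUBDIRS = 1 << 0,
    EV_USER_DROPPED      = 1 << 1,
    EV_KERNEL_DROPPED    = 1 << 2,
    EV_EVENT_IDS_WRAPPED = 1 << 3,
    EV_HISTORY_DONE      = 1 << 4,
    EV_ROOT_CHANGED      = 1 << 5,
    EV_MOUNT             = 1 << 6,
    EV_UNMOUNT           = 1 << 7
};

typedef enum {
    ACT_SCAN_DIR,        /* rescan only the directory named in the event */
    ACT_SCAN_RECURSIVE,  /* events were dropped: rescan from this path down */
    ACT_RESET_STATE,     /* event IDs wrapped: stored IDs are no longer comparable */
    ACT_SWITCH_TO_LIVE,  /* history replay finished, live events follow */
    ACT_RECHECK_ROOT,    /* the watched root moved, vanished or changed */
    ACT_VOLUME_CHANGED   /* a device was mounted or unmounted */
} event_action;

/* Maps one event's flags to the single most urgent action. */
event_action action_for_flags(unsigned flags)
{
    if (flags & EV_EVENT_IDS_WRAPPED) return ACT_RESET_STATE;
    if (flags & EV_HISTORY_DONE)      return ACT_SWITCH_TO_LIVE;
    if (flags & EV_ROOT_CHANGED)      return ACT_RECHECK_ROOT;
    if (flags & (EV_MOUNT | EV_UNMOUNT)) return ACT_VOLUME_CHANGED;
    if (flags & EV_MUST_SCAN_SUBDIRS) return ACT_SCAN_RECURSIVE;
    return ACT_SCAN_DIR;              /* the normal case: nothing special */
}
```

UserDropped and KernelDropped carry no action of their own in this sketch, since the talk describes them as informational companions to MustScanSubDirs.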

Event IDs, as I said, are 64-bit numbers. They're monotonically increasing, they are never recycled for all practical purposes, but they have no relation whatsoever to any kind of wall-clock time. They're just a number that sequentially increases. So for a given event stream you can ask whether this one happened before this other one, yes or no.

You can ask for everything from this last event ID that I saw, but it doesn't have any relation to wall-clock time. Now, the event history: a few people may be wondering, well, what do you really store? We really do store all events, and for all time, unless something bad happens or we're asked to purge it.

So the event history is pretty small and it compresses well, because of the granularity of time and the window of time we look at changes in, and the strings that tend to be modified tend to compress nicely. You can actually just store everything forever without really eating up a lot of disk space. When I run tests that go 7 x 24 and really hammer this system, I generate maybe, I don't know, I think it's about three or four megabytes of logs a day. So hardly a significant imposition.

The event history is persistent, so you reboot, no problem, as long as Leopard-only systems modify the disk. And given a previous event ID and device UUID, you can check that the device UUID matches what's out there currently. So the one you stored matches what's out there, and you can ask for all the events that happened since that previous event ID. And as I said, the last historical event that you'll get is fake, and it just means that you get the HistoryDone bit and now you've switched from historical events to live events.
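The startup decision described here can be summarized as a small function. This is a sketch under stated assumptions: UUIDs are represented as plain strings rather than CoreFoundation objects, `plan_startup` and `startup_plan` are hypothetical names, and `SINCE_NOW` is a local stand-in for the framework's own "start from now" constant (kFSEventStreamEventIdSinceNow in the FSEvents header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SINCE_NOW UINT64_MAX   /* stand-in: no history, live events only */

typedef struct {
    int      full_rescan;  /* 1 if the saved state cannot be trusted */
    uint64_t since_when;   /* event ID to hand to the stream as its start */
} startup_plan;

/* `stored_uuid` may be NULL when no state was saved on a previous run. */
startup_plan plan_startup(const char *stored_uuid, const char *current_uuid,
                          uint64_t stored_event_id)
{
    startup_plan plan;

    assert(current_uuid != NULL);
    if (stored_uuid != NULL && strcmp(stored_uuid, current_uuid) == 0) {
        plan.full_rescan = 0;               /* same event stream as last time */
        plan.since_when = stored_event_id;  /* replay history from here */
    } else {
        plan.full_rescan = 1;               /* stored event IDs are meaningless */
        plan.since_when = SINCE_NOW;
    }
    return plan;
}
```

When the UUIDs match, the client loads its saved state and lets the historical events bring it up to date; otherwise it rebuilds its state with a full scan and listens for live events only.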

Now, this is another example of things that we got feedback from last year. You can ask for an approximate event ID corresponding to a wall-clock time. So as I said, even though event IDs themselves do not correspond to any particular time base, there's a way to map approximately so you can say, what is the closest event ID that you know about to this particular time and we look at that. And the way we determine that is the time stamps on the log files and we just sort of find the one that's the closest match for it.
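The closest-match lookup can be sketched as a linear search over the log files' timestamps. The `history_log` structure and the function name are hypothetical; the talk only says that the time stamps on the log files are used, so what is recorded per file is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical summary of one history log file. */
typedef struct {
    time_t   stamp;          /* time stamp on the log file */
    uint64_t last_event_id;  /* assumed: last event ID recorded in that file */
} history_log;

/* Returns the event ID of the log whose time stamp is closest to `when`,
 * or 0 if there are no logs. The answer is approximate by construction. */
uint64_t approx_event_id_for_time(const history_log *logs, size_t n, time_t when)
{
    uint64_t best_id = 0;
    double best_gap = 0.0;

    assert(logs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        double gap = difftime(logs[i].stamp, when);
        if (gap < 0)
            gap = -gap;                     /* distance in seconds, either side */
        if (i == 0 || gap < best_gap) {
            best_gap = gap;
            best_id = logs[i].last_event_id;
        }
    }
    return best_id;
}
```

Because the granularity is one log file, the returned ID may be somewhat earlier or later than the requested time, which matches the "approximate" wording in the talk.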

You can find out the last event ID for this system, so that you know the greatest one that's come along so far, and you can purge the history up to a given event ID. This was actually kind of very, I don't know if touching is the right word, but after WWDC last year, this Russian fellow comes up to me and says, it is very important that you be able to erase the history, because your government may be trying to persecute you.

I was like, he's not joking. I actually took that pretty seriously and made sure that you have a way to erase the history should you have something that you're afraid your government will persecute you for. So, switching back over to the sample code, and hopefully nobody here has any of those kinds of problems. Could we switch to the demo machine? Okay. So, stop that. So, I have this sample program called Watcher. I'm just going to set up a little better here.

Don't think I need that window. Okay, what Watcher is, aside from all the disclaimers, is a simple example of using the FSEvents framework to watch a hierarchy and monitor for changes to it so that you can update what the total size is. So, the state that it builds is the total size of this directory that you're watching, and then when changes happen inside of it, it will update that state. So just to show you how it works in practice, if I do watcher large-hierarchy, it will-- it had the stored size, and amazingly enough there's 850,000 bytes exactly in this hierarchy. If something, I did actually need that other window, lovely.

Well, apologies. You won't be able to see what I'm doing in the window, that's okay. Maybe now you can. All right, so if I-- dd if= --I'm just going to create a file this specific size, to-- =10kfile -- that's actually not 10K. This will create a file and now you can see the size change to 860,000 bytes and if I was to remove the 10kfile, it goes back.

Now, again the whole point of FSEvents is that if changes happen while your application isn't running, that you will find out about them and it gets, the stored total size as you can see was 850,000, but then there was a change made and so then the current is done processing the historical events that happened while the program wasn't running and new size is 860,000. So this is pretty basic functionality, but let's go through and see how it actually works.

Let's jump down to where all good C programs start, main. There's a lot of support code that I'm not going to go into, such as iterating the hierarchy and stat-ing the files and figuring out the size that's in there. It's sample code that you can download and look at, but we're not going to cover that stuff today. There is one thing, hold on a second. This is going to drive me nuts.

Yeah, yeah, yeah, I know. I can see everybody groaning. Sorry, we have four space versus eight space tabs. Anyway, all right, the first thing the program does is some standard command-line processing, filling in the settings structure that keeps track of the state for the application, and then we have a path, and as I said, we call real. Why is that, oh, it's not, okay, there we go.

We call realpath to find out what the actual path is that we're monitoring and this application cares, because if we were watching something that was a symlink, the events that you get will be for the real, the true path and so you wouldn't want to have differences or discrepancies between what you think the paths you should be getting are and what they really come through as. So that's just some initial set up. We call this function watch_dir_hierarchy, which is where most of the work happens, the set-up work. So, we call get_dev_info, which is going to find out what the device is that we're watching.

That function, and I'll just pop to it this way, does a little bit of wonky string processing: if the path didn't exist, it walks further back up it to find the piece that does, and gets the dev_t out of the stat structure. It calls lstat and gets the dev, which we set in our settings structure. Then we call FSEventsCopyUUIDForDevice. So this is the UUID that that device currently has. If we have saved state, we're going to compare the UUIDs, so we have to do this first. If things fail and, you know, really suck, we also get the full mount point.

So, going back to watch_dir_hierarchy, we've gotten our dev info and now we're going to load the stream information for this hierarchy that we're watching. And again, this is a simple program, so when we save and load state, it only knows about watching a single hierarchy. So that will get a UUID that was stored and then we check if they're equal. So we call CFEqual on uuid_ref and what's in the settings: the settings hold the current UUID, and uuid_ref is the one that we have stored. If they match, well, then we can go and load our previously saved state, and we have this utility function, load_dir_items, and it loads it up.

If they mismatch, then we will get rid of the stored history and do a full rescan. For example, one way I can force that: I'll stop this program and it will save its state, and then I just change this to something that won't match. Now when I run the Watcher program again it is going to complain. There's a UUID mismatch, because what was stored doesn't match what's currently out there, and so it goes and does a full rescan. Just to show you that code really works. We go and create a CFArray from the path.

In this case we are only watching a single path, so it's pretty straightforward, and now here is the real meat of things. Even though I have a dev_t, I was going to go do a more complicated example, but as it is I'm going to run out of time already.

So, we just call the plain FSEventStreamCreate. We pass it our callback, the array of paths, and the since_when parameter, which is the last event ID that we stored or the symbolic constant for since now, kFSEventStreamEventIdSinceNow; I forget the exact name. Then the latency, and we're not going to pass any particular flags, although we could pass WatchRoot.

Once that comes back, if we got a stream_ref we schedule it on a run loop and then we start it. Then if we need an initial scan, we go through and do that. Otherwise we just fall into the standard CFRunLoop. Now one thing that's important to point out here is that we start the stream before we do the initial scan and the reason that's an important point is that if you were to go and do your initial scan and then start the stream, events could happen in between that you would miss.

So you always want to be aware of the ordering of these kinds of operations. Don't generate your state before you are going to be receiving events, because things could change and then you do not have any way to know that your "initial state" is actually already out of date.

So, once we call CFRunLoopRun(), then all control really comes through the callback function, and the callback function right here, as I said, gets these three arrays, the event paths, the flags, and the event IDs, and as you might expect it just simply loops through them. We make a copy of the path so that we can monkey around with it if we want, and then we look at the flags. Like I said, it is very important that you do check for things like dropped events and the MustScanSubDirs flag. So the first thing is the HistoryDone event, which, as I said, is a fake event; if we get it, we just say we are done processing historical events and continue onward.

If you were watching the root and the root changed, then what we do is stat it. If it exists, well, then we know we're back in business and we can watch the hierarchy again. If the stat fails, then we need to disregard all our state, because the hierarchy that we were watching no longer exists.

Now, in the case that the MustScanSubDirs flag is set, we set the recursive variable, and informationally we print these other things, and if recursive is set then, for that given path, we go ahead and scan it. If it's not set, the check_children_of_dir function will just iterate at that level in the hierarchy. It doesn't go down, it doesn't have to descend any deeper, it just looks at that one level and finds out what has changed.
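The flag checks in the callback can be condensed into a small decision function. This is a sketch under stated assumptions rather than the Watcher code: `event_action` and `action_for_flags` are hypothetical names, and the fallback enum exists only so the block builds off macOS; its values are meant to mirror FSEvents.h and should be treated as an assumption.

```c
#include <stdint.h>

#ifdef __APPLE__
#include <CoreServices/CoreServices.h>
#else
/* Stand-ins for building without CoreServices. The values are intended to
 * mirror FSEvents.h; on macOS the real header above is used instead. */
typedef uint32_t FSEventStreamEventFlags;
enum {
    kFSEventStreamEventFlagMustScanSubDirs = 0x00000001,
    kFSEventStreamEventFlagUserDropped     = 0x00000002,
    kFSEventStreamEventFlagKernelDropped   = 0x00000004,
    kFSEventStreamEventFlagHistoryDone     = 0x00000010,
    kFSEventStreamEventFlagRootChanged     = 0x00000020
};
#endif

/* Hypothetical set of things a callback might do for one event. */
typedef enum {
    ACTION_HISTORY_DONE,   /* marker only: historical replay has finished */
    ACTION_ROOT_CHANGED,   /* stat the watched root, then keep or drop state */
    ACTION_SCAN_RECURSIVE, /* rescan the path and everything below it */
    ACTION_SCAN_ONE_LEVEL  /* re-check only the direct children of the path */
} event_action;

/* Map one event's flags to an action, in the order described in the talk.
 * The dropped-event flags are informational here: they arrive together
 * with MustScanSubDirs, which is what selects the recursive scan. */
static event_action action_for_flags(FSEventStreamEventFlags flags)
{
    if (flags & kFSEventStreamEventFlagHistoryDone)
        return ACTION_HISTORY_DONE;
    if (flags & kFSEventStreamEventFlagRootChanged)
        return ACTION_ROOT_CHANGED;
    if (flags & kFSEventStreamEventFlagMustScanSubDirs)
        return ACTION_SCAN_RECURSIVE;
    return ACTION_SCAN_ONE_LEVEL;
}
```

Inside a real callback this would be called once per index while looping over the paths, flags, and event IDs arrays.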

And then, of course, we print out what the total size is. Like I said, this is a pretty simple example where the callback drives updates, and, you know, these work functions actually do a fair bit to go and recalculate the size, they have to iterate the directory, but it's not terribly exciting stuff.

So that's the main way it works. Now, again, if something happens in the hierarchy, let's see, then you see the size change. If something happens while we are offline, now I'll actually go ahead and get rid of a whole bunch of stuff here, let's go and erase this whole hierarchy.

Now as you notice, this program, the Watcher program, is not actually running, but if I rerun it, it processes the historical events that happened. I had erased that Documents directory and I had erased a file in there, and that updates its size as opposed to having to rescan the entire thing. So let's switch back to the slides.

That was the Watcher sample code. Now it's pretty basic FSEventStream usage. You saw how we created an initial state after we had created and started the stream, and then it lets the callback just naturally drive updates to the internal state. Now a few things you definitely have to, I don't want to sound like a broken record, you have to look at the flags for an event. Dropped events may be rare, but they can happen, and when they do happen you need to do the rescan from the path that it tells you to.

Again, it's difficult to make this happen in practice, but make sure the code is there and that you at least test it. You start at the specific directory and scan down from there. Normally you just scan that level and you don't descend, but if MustScanSubDirs is set, then you have to go down.

Token events: with the HistoryDone event this is not as necessary, but token events can sometimes be handy. It's a technique where you can sort of bracket a set of other file system operations. So let's say you create a directory called /tmp/starting operation and you create a file inside of there. Now you go and do a whole bunch of other stuff, and then you go and delete it. Your application will get events in the /tmp/starting operation directory, and when it sees that that file has been deleted in there, it knows the other operations are complete.

So it's a way to kind of bracket a more complex set of operations. As I said, be careful about leaving windows open when you generate your initial state: you should make sure you're going to be getting the events that correspond to it, so that you're not out of date. And, of course, symlinks and hardlinks are an issue as well.

Now, Advanced FSEvents topics. Not that advanced, but it's sort of not the standard stuff that you have to necessarily worry about. We're going to cover a couple things. I'm starting to run low on time, so I may rush a little bit. Working with the event history a little bit more, the WatchRoot flag.

The different ways that you can use that, mount and unmount events, and some additional stream operations that may be useful to you. As I said, there is a way to get the approximate event ID corresponding to a wall-clock time, and that one is a mouthful: FSEventsGetLastEventIdForDeviceBeforeTime(). It's like the documentation is built into the function name.

It's not exact, but it will give you an approximate event ID, so you can say, well, what happened since last Tuesday at, let's say, noon. You'll get an event ID that will be approximately at that time. You can purge events with FSEventsPurgeEventsForDeviceUpToEventId(). Again, this is the way you can clear out the history. This is useful perhaps hand in hand with the GetLastEventIdForDeviceBeforeTime function.

You can pass the since-now event ID, and that will purge everything; it basically cleans it out and will reset the device UUID, or the event stream UUID. As with a lot of things that I do at Apple, everybody just wants to disable it. I worked on Spotlight too.

So there is a way to do this, I understand the need for it. You have /Volumes/VOLNAME/.fseventsd, that's where the logs are stored. If you put a file named no_log in there, then no logs will ever be written to that device, and so you don't have to worry about stuff persisting when you don't want it to.

Now the WatchRoot flag. This allows you to track changes to the top level that you're watching. An example that came up internally was that the Console app wanted to watch /Library/Logs, and whenever /Library/Logs got erased it wanted to blow away its internal state and any logs that it may have cached in memory and reset that, and so this is a way that it can do that. When any component of the path changes you'll get an event.

So the tracking is path based. As I said, we're looking at strings and we're doing directory prefix matching, so if, for example, you wanted to watch, let's say, /Users/foo/MyDocuments/blah and then MyDocuments was renamed to just Documents, you would no longer get events. Now if you have the WatchRoot flag you'll find out, well, that path you asked to watch doesn't actually exist any more. So this is kind of useful in a number of situations. If you want to follow a directory if it moves.
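To make the consequence of string-based matching concrete, here is one way directory prefix matching can be written. It is an illustration of the idea only, not the framework's implementation, and `path_is_under` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when `event_path` is `watched` itself or lies beneath it.
 * The comparison is on whole path components, so "/a/bc" is not
 * considered to be under "/a/b". */
static bool path_is_under(const char *event_path, const char *watched)
{
    size_t n;

    assert(event_path != NULL && watched != NULL);

    n = strlen(watched);
    while (n > 1 && watched[n - 1] == '/')  /* ignore trailing slashes */
        n--;

    if (strncmp(event_path, watched, n) != 0)
        return false;
    if (n == 1 && watched[0] == '/')        /* "/" covers every absolute path */
        return true;
    return event_path[n] == '\0' || event_path[n] == '/';
}
```

With a watched path of /Users/foo/MyDocuments, an event for /Users/foo/MyDocuments/blah/ matches, while after the rename the same activity is reported under /Users/foo/Documents/blah/ and no longer matches, which is the silent loss of events described above.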

What you should do is open a file descriptor on the root of that hierarchy, and then when you get a root changed event you would find out where it moved to by calling the F_GETPATH fcntl on that file descriptor that you have open, and that will sort of track it wherever it moves in the hierarchy. That's kind of very sticky, and you have to understand the implications of that kind of behavior, and whether that's exactly what the user wants.

In that case you would create a new stream for the new path and rescan it, or, as I said, in the case of the Console app, you can just reset your internal state when the root moves and then wait for it to come into existence again. As we saw in the Watcher example app, if you discard your internal state when the directory hierarchy goes away and then recreate it when it comes back into existence, you're all set.

Now, you also have to be a little bit careful when you use it with device-specific streams. It's best if your current working directory is the root of the hierarchy that you are watching. The monitoring happens client side. This is an implementation detail, but it's important and relevant because there are open file descriptors on that volume in that process.

So that would mean that if you were watching something on a removable volume and you're not participating with the DiskArb framework, that volume will not be able to be unmounted while your application is running. So just keep that in mind if you use the WatchRoot flag.

Now, mount and unmount events. It's definitely not a replacement for the DiskArb framework. It's more informational: if you're using a device-relative stream, it's useful to know, well, that device isn't there anymore. And so the unmounted event comes after the volume is completely gone, whereas with DiskArb you have a chance to clean up your act and get everything closed so the volume can be unmounted, and DiskArb is just a more sophisticated way to know about volumes coming and going.

But if you are just watching stuff in a more observant fashion, as opposed to, you know, having open files on the volume, then use it. When you see the unmounted event come through, well, there's no point watching that stream anymore, because it ain't there, so let's get rid of it.

The unmounted events are important also because device IDs can be recycled. So if I have external FireWire drive A and it has, let's say, device ID 1,000, it gets unplugged and the unmount comes through. Now a new device comes in, FireWire drive B, and that could get the same device ID. So you need to be careful that you are not burned by recycled device IDs. When you get an unmount event, close down the stream.

There's a way to get the latest event ID. This is, of course, useful if you want to just record what's the last event ID in the system. There's FSEventStreamGetLatestEventId(), which is for a particular stream, or FSEventsGetCurrentEventId(), which is the greatest event ID in the entire system. So notice the distinction: one is the last event ID that has come through on a particular stream, which may not be the most current event ID in the system, while GetCurrentEventId is the most current event that exists in the system. There are some flushing functions, of which you would pretty much only want to use the FlushSync version, and this will ask the server to send any pending events.

So if you were in the process of shutting down, it is possible that some events were, you know, queued up, because you have a very long latency and they haven't been sent yet. This will cause them to come through, so that you make sure that when your app shuts down you've seen all the latest stuff. Again, generating sentinel events allows you to make sure you didn't miss anything. So you can have an event stream watching some hierarchy, create something, and then look for that coming through in the event stream to know that something has started or has finished.

So, well, I actually rushed through it and I'm going to finish a few minutes early, but wrapping up. The FSEvents.framework, what is it? It's an event-driven mechanism to find out about changes to the file system. So there's no more polling. This is an important thing as we move into new devices where battery usage and hard drive energy usage is important. You don't want to do large rescans if you can avoid it.

It gives you a full history, so your application doesn't have to be running all the time. You can say what happened since the last time I ran. Last time I ran was last Tuesday or more specifically event ID one billion whatever and you'll get everything that has happened since then.

The FSEvents.framework enables things like Time Machine and FileSync. Again, no more full rescans, and that enables them to operate very efficiently. Internally we've even experimented with doing things where you have continuous backup, which is every couple of minutes, because you don't have to do full rescans of the entire home directory to find what changed.

Of course, you have the Finder and its live monitoring and things. And then there's the unknown. There's you guys. How will you use it? I actually expect to be surprised. A lot of times people find creative uses, and that's what we are hoping to see with the FSEvents.framework.