OS Foundations • 1:01:07
If you read or write lots of files, or care about the precise way your data is laid out on disk, you'll want to attend this session. This is the best place to find out about the latest features of Mac OS X's filesystem architecture. We will cover change notification, 64-bit inodes, extended attributes, ACLs, "safe save", and the other APIs, formats, and structures you need to know to make optimal use of Mac OS X's many filesystems.
Speakers: Deric Horn, Dominic Giampaolo
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it may contain transcription errors.
[Deric Horn]
Good afternoon. Welcome to session 415, What's New in the File System? I'm Deric Horn, the Application Frameworks and File Systems Evangelist, and I'll be joined a little bit later by Dominic Giampaolo, File Systems Engineer. So today I'd like to start off by going over a little recap of some of the important features we introduced in Tiger. Those include the copy engine, extended attributes, introduction of access control lists. Then we're going to talk about some of the really important new features that we're introducing today in Leopard, things like safe save, move to trash functionality. And then Dominic's going to come up on stage and talk about file system event notification. This is a very heavily requested feature from you, the developers, and it's really a very elegant API. And we'll have plenty of time for Q&A afterwards.
So I'd like to start off with a little file systems architecture diagram. Here on the upper left, we see a typical Cocoa application. Cocoa applications, being object-oriented, call down into NSFileManager, which offers object-oriented APIs. In turn, NSFileManager calls directly down to the BSD layer and also down to the Core Services File Manager. I just want to take a minute and talk a little bit about the Core Services File Manager. The Core Services File Manager is what we used to call the File Manager, and it contains those familiar APIs: the FSRef-based APIs, the old FSSpec-based APIs, the APIs located in Files.h.
So the Core Services File Manager offers an abstraction layer above the BSD APIs and the volume-format discrepancies underneath, like how resource forks are stored and how extended attributes are stored, so we don't have to bother with that. We use one familiar API. And it's really a thin layer right above the BSD APIs, actually located in the CarbonCore framework. So obviously, we have Carbon applications, which call down directly to the Core Services File Manager.
And then on the right-hand side, command line utilities. Typical command line utilities will call directly down to the BSD APIs. But we often find that command line utilities also like to take advantage of the abstraction that the Core Services File Manager offers, so they also call down into the Core Services File Manager.
So I've got a long history with the file system. I started out on the development side back in the HFS+ days, moved over to the file system work on the Carbon side, and then moved over to Developer Relations, where I got to work with a lot of you. And by far, I'd say the most requested feature from you developers has been "give me access to the same copy engine that the Finder uses." Well, in Tiger, we introduced the FSCopyObject API. This is a high-performance API, which is tuned for performance across all volume formats. And it's the same copy engine that the Finder uses.
During its copy, it manages copying all the metadata for you: the data fork, resource fork, extended attributes, and so forth. And it comes in those four familiar sets of APIs: the FSRef-based synchronous and asynchronous APIs, as well as the path-based synchronous and asynchronous APIs. So we have FSCopyObjectSync for the FSRef-based synchronous version, versus FSPathCopyObjectAsync for the path-based asynchronous version.
So here's a little snippet of how to use the FSPathCopyObjectSync API, and underneath it, its counterpart, the FSPathMoveObjectSync API. I'm not going to go into the code, but periodically you'll see these yellow boxes at the bottom of the slides, and they will detail where you can find examples. So if you look in the WWDC attendees site, in the area associated with session 415, you should find a sample called FSFileOperation. And this is going to have details of the implementation of the copy object APIs.
So extended attributes. Extended attributes are really a great way of kind of associating some metadata with any file or directory. So if you want to leave a little breadcrumb with any file, whether you own it or not, and just attach it to a file, you can do this with extended attributes.
It's supported across all volume formats. And we typically recommend that you use the reverse-DNS naming convention. So if you were to look at your file system today, you'd notice a lot of files have the com.apple.FinderInfo extended attribute. And that's going to contain just what you'd expect: things like the type, creator, and maybe label information that the Finder is going to save.
I also want to mention the ctime that you get back from stat. When you stat a file, it will return the ctime, and I just want to clarify a little bit what this is. It actually represents the status change time. So the last time you changed any attribute on the file, the ctime was updated. For instance, if you're writing a backup application, you might want to compare the content modification date to the ctime to see if you can avoid backing up the actual contents of the file, the actual data portion. It may be sufficient to back up just the attributes, extended attributes, and other metadata associated with that file.
But you should be aware of the current limitations of extended attributes. In general, extended attributes are going to be under 4K; I believe on HFS+ volumes it's 3,802 bytes. I think that's been drilled into me now. It's set up so that you're going to read an entire extended attribute in one fell swoop. It's not a stream-based call: you're going to allocate a buffer and read it all in at one time. And likewise, when you write an extended attribute, you write it all out at once. Now, I said this is kind of a generalization. The one exception is the resource fork. The resource fork you can access through extended attributes, and at this time, you can access it with kind of a stream-based API. What I really mean by that is you can set the file position to, say, halfway through the resource fork and continue reading.
Whether or not you deal with extended attributes directly, you should always be mindful of them. Whether you're writing an FTP client, an archiver, or so forth, there are always three rules that you have to be aware of. And that is to be mindful of the data of the file, extended attributes of the file, and security information. So sometimes I wish all Apple engineers were mindful of these three rules.
So let's take a deeper look at the APIs. The APIs to access extended attributes are the getxattr, setxattr, listxattr, and removexattr APIs. These are all BSD-level APIs. It really shouldn't matter whether you're Carbon or Cocoa; you can still call BSD APIs directly. The method for getting the size of an extended attribute is quite easy: call getxattr and specify NULL for your buffer, at which time it will return the size of the extended attribute. You can then allocate your buffer to be that size, call getxattr again, passing in that buffer, and read in the entire attribute. Again, for samples of how to use the extended attribute APIs, we have the file system examples sample.
So, access control lists. Typical Unix permissions are what we all know of as owner/group/other read, write, and execute, or as I call them, user/group/world read, write, and execute. Very limited. A user isn't allowed to be in more than 16 groups, and we can't nest groups.
So in Tiger, we introduced access control lists, or ACLs. They provide much finer-grained security permissions for a file. So for instance, we can either allow or deny a user or group the ability to append to a file. We can either allow or deny someone the ability to read the ACL information on a file or directory.
So access control lists were introduced in Tiger, but we had them turned off by default. Now in Leopard, we're turning them on by default. There are a few tools that you can use to access your access control information. A typical tool is ls: if you specify the -e option, it'll print out the access control list information associated with each file or directory. Likewise, chmod with the +a option is a great way of adding ACL entries that allow or deny access for users or groups.
You should also be aware, if you're writing your application now and you want to make use of access control lists on Tiger, that you can use the fsaclctl tool to turn access control lists on or off for a volume. As I said, in Tiger they're turned off by default; use the fsaclctl tool, and you can turn them back on on that Tiger volume.
Again, these are BSD-level APIs. And I wanted to mention that they're used heavily on the .Mac servers. So this was really a big push, to control access information and permissions for users and groups on the .Mac servers that we all use. And now in Leopard, we have full Finder integration. So when you do a Get Info on a file, you're going to see all the ACL information, more detailed ACL information. Again, you can find samples of this in the file system examples sample.
OK, now on to some of the fantastic new features that we're introducing today for Leopard. First one: safe save. We're all pretty familiar with how this works, right? You have your original file. You write out a new file. Once it's coherent, we want to replace the original with our new file. Up until now, I would say it's been almost a burden to implement safe save for yourself, given all the different volume formats and the details of the underlying formats and which options they support and which ones they don't. So now we're releasing a higher-level API, FSReplaceObject, available through the Core Services APIs in Files.h. It's supported on all file systems, and it's largely based on rename semantics. It's designed to properly preserve all your metadata.
By and large, in the past, maybe on Mac OS 9, we used something like FSExchangeObjects. And this was a great way to do safe save. But the big problem on Mac OS X is that this was really limiting. It only worked on files, and it was really only designed to work on HFS+ type volumes. So we weren't able to use it on NFS volumes, for instance. Releasing a new API gives us the opportunity to do what's best on each particular volume format. So for instance, on HFS+ volumes, we can go ahead and call exchangedata() under the hood if that's the best way to do it.
I want to mention that now in Leopard, if you're a Cocoa developer, it's used by default in NSDocument. So in the case of a catastrophe while you're saving a file using NSDocument, you may notice that a file gets created called "(A Document Being Saved By TextEdit)", in brackets. This would be your original file. Maybe while you're writing it out, the network cable gets pulled, the power goes out, something along those lines. You can get more information on that in the AppKit release notes; specifically, there's a section in there called "Advice for Overriders of NSDocument Reading and Writing Methods".
So the FSReplaceObject API is a Mac OS X API, which does what we'd really expect from a Mac OS X API. What I mean by that is we can not only replace files with files or directories with directories, but we can replace files with directories or directories with files. This is especially useful if you've ever, say, edited a README file in TextEdit.
You're editing your README file, and then you decide to start pasting some pictures in there, and you notice that your file changes from README.rtf to README.rtfd. That file is actually a document package. So for instance, if you're saving a file to your iDisk in this case: you have your original file; go ahead and make your entire document package hierarchy and make everything coherent. Once you have the entire document hierarchy coherent, replace the original file with your document package. There can be only one.
What I mean by that is the way the API works by default: when you call FSReplaceObject, you have your original file and you have your new file. After FSReplaceObject is called, you end up with your new file in the original file's location, one file object when you're done. Of course, we have plenty of options to override the default behavior. "Save original as backup" is one of them, in which case you might want to specify a directory to save your original in as a backup. So we have an accessory routine called FSGetTemporaryDirectoryForReplaceObject. This is kind of a higher-level API, which might sit on top of FindFolder, and it will return the best place on a given volume to save your files as a backup. But it's not only a good API for finding the best place to save your backup. One of the things, as you can imagine about FSReplaceObject, is that your new file and your original file have to be on the same volume. We can't have our new file on our local disk and our original file on our network disk and expect this to be one operation. So by calling FSGetTemporaryDirectoryForReplaceObject, you get back a suitable location on the same volume as the original file.
The default behavior of FSReplaceObject is to actually merge the extended attributes of the original file and the new file. In the case of a collision, priority is given to the new file's extended attributes. With ACLs, we take quite the opposite approach: we take the security information from the original file, move the new file into that location, and then apply it to the new file. Again, we have options to override both of these behaviors.
There is actually sample code for this as well. I believe the sample is called FSReplaceObjectSampleCode, found in the same location. Move to trash. OK, this is actually quite a complicated problem. When someone wants to move a file to the appropriate trash, it's hard to know what the appropriate trash is, right? We have ~/.Trash, the per-volume .Trashes directories, Network Trash Folders, I believe. And some volumes don't even have trash cans. So typically what developers have done in the past is maybe create an AppleScript that says something like "tell application "Finder" to move fileObject to trash". This is really a suboptimal approach for many reasons. Once the Finder does this, you have no idea where the file went. You have no idea if it succeeded or if the Finder is presenting a dialog in the background. So now in Leopard, we're introducing the FSMoveObjectToTrash API. It will move your file system object to the proper trash, and it will return a reference to the moved item. One of the reasons why it returns a reference to the moved item is that it can potentially rename that file system object en route to the trash in case of a name collision.
The FSMoveObjectToTrash API will return a -120 error, directory not found (dirNFErr), in the case where it cannot find a trash can. In this case, the Finder typically presents a dialog that says something like, "The item main.c will be deleted immediately. Are you sure you want to continue?" You can do the same thing.
Here's a list of other information that I wanted to get out for Leopard, kind of a grab bag of small tidbits here. In Leopard, we are finally moving to 64-bit file IDs. What this really means is that the Alias Manager will automatically resolve your 64-bit file IDs for you and so forth. Integration is already built into the Finder. And we will be releasing higher level APIs to gain access to the 64-bit file IDs. UFS. With the introduction of case-sensitive HFS+ or HFSx volumes, there is no longer a big reason to have UFS disks around. So starting in Leopard, we will allow you to read and write UFS disks, but we will no longer format disks as UFS.
Last year at WWDC, we announced the deprecation of non-thread-safe file systems. Through the KPI right now, there's a flag to specify whether or not your file system is thread-safe, and we jumped through hoops in the VFS layer to make sure that your file system did the right thing. Now, starting in Leopard, those non-thread-safe file systems are going to be unsupported.
And things that I know you would never do. Dot-underscore (._) files, right? I think most of you know what those are. Those are the AppleDouble files that we sometimes have to save the resource fork or extended attribute information in. We store them in ._ files, for instance, if you're saving to an NFS volume. And we've all seen these kind of lingering around.
In the past, a lot of this functionality was in the Core Services File Manager. So the way we did things was, for instance, when we touched the resource fork, we would write out the size of the resource fork into the header of the AppleDouble file, and then we'd go ahead and write out the entire resource fork. Well, as we move this functionality out of the Core Services File Manager and push it down into the kernel, we operate under different constraints. So now we're going to effectively write out the size information and the resource fork at the same time, when the file is actually closed.
But you shouldn't be accessing the ._ file directly anyway. The /rsrc suffix: if we have a file named foo on an HFS+ volume, for instance, we could always say cat foo/rsrc, and this would actually cat out the entire resource fork of the file foo. This was kind of a hack, as I mentioned, because it really only worked on certain volume formats. And now, since we've pushed a lot of this functionality down into the kernel, we're removing that back door, or that hack, anyway.
Removal of volfs. In the past, if you ran the fs_usage tool, you would notice a whole bunch of reads and writes going to the /.vol directory, and you'd typically see a file ID after that. We're starting to close the hole on volfs, so you're no longer going to see that. We have a great tech note, TN1113, describing /.vol and volfs. And one of the notes in there says: under no circumstances should your application construct paths with /.vol. That still applies. At this time, I would like to welcome Dominic Giampaolo.
All right, I'm on. I'm Dominic Giampaolo. I'm a file system engineer, and I'm here to talk about the File System Events framework. First, I'll go over the agenda of what we're going to cover today. I'm going to start with some history and background about what the problem is, or how we see it, what's difficult about it, and what we were trying to do with the FSEvents framework. I'll introduce the FSEvents framework API itself. And of course, since this is a developer conference, we're going to go through some code and an example application that uses the FSEvents framework, and then review and wrap up.
What is the problem? Well, there's sort of two classes of problems that we see or we get a lot of requests for. Did anything change in this hierarchy? So you have an application like iPhoto or iTunes, and it manages some large hierarchy. Did something change underneath of there? You have other applications where they're running live, including something like iPhoto or iTunes or the Finder. Is anything changing in this hierarchy? So there may be some window that's open that represents some current file system state, and you want to know, is anything currently changing underneath there because I need to update my display? So those are actually really difficult questions. And historically, there has been no good answer to those questions. We also see that different applications want different kinds of answers. What changed differs depending on what type of application you are.
So what exactly is a file system event? At a low level, a file system event is a create, a rename, a delete. It's other more subtle things, such as an update to the mod time or the ownership or permissions of a file. It's when a file changes. It's really quite a lot of low-level events. At a high level, an event is something like, well, I saved a document, right? From my perspective, that's one thing. As a user, I hit Save, and there's a file there now. In reality, when you change a file and you do a safe save, as Deric alluded to, there may be a whole bunch of operations that go on under the hood.
And in practice, if you've ever looked at it from the underside, as I'm frequently doing, you'll see that a file save operation from an application may be 5, 10, or even 20 actual events that come through at a low level. The raw stream of events that we see generated is really difficult to manage. It's a big fire hose, and it's quite complex. Now, clearly everybody hasn't just sat around for 20 years waiting for us to introduce the FSEvents framework. They've come up with other solutions.
At one level, you have polling. Did this file change? Did this file change? Did this file change? And so on. Or a full rescan. If you have a backup application, typically it walks the whole file system hierarchy to find out what is different and what needs to be backed up.
Core Services introduced the FNSubscribe and FNNotify APIs. These are opt-in APIs where you can say, I would like to subscribe to notifications about changes to this hierarchy or this file, and other applications can call FNNotify to say, I changed this. That's great, but obviously, as an opt-in kind of thing, people can either forget to do it or choose not to do it, and you won't be notified. So it's good if everybody plays by the rules, but not everybody does. Typically, you want something a little bit more comprehensive. In Panther, I believe it was, we introduced the BSD mechanism called kqueues. This allows you to watch a file or directory: if you have a file descriptor for a file or directory, you can say, alert me of changes to this. Now, this is a step in the right direction, because it is event-driven. You don't have to poll; you can just say, tell me when something happens. Then in Tiger, the kauth subsystem was introduced. And this is a pretty big hammer. It allows you to essentially interpose on every single file system operation and authorize or deny it based on whatever criteria you want, or simply observe that it happened. Now, what are the problems and issues with all this? Well, clearly, in the polling case or the full rescan case, it's slow. Walking the file system hierarchy of a standard Mac OS installation can involve touching hundreds of thousands of files, literally. And there's no way to watch an entire hierarchy. While kqueues give you an event-driven mechanism, it's just for a single file or directory for which you have a file descriptor. If you wanted to watch a whole deep hierarchy, it doesn't provide any support for doing that.
There's no history of events. You can't say, well, this backup application, or this synchronization app or what have you, hasn't run for a week; what changed in the last week? So all of these things were live-only, and not having any history was a bit of a problem. And the kauth subsystem is extremely powerful, but kernel-level monitoring is not practical for most applications. Typically, people are not going to write a kext to go along with their user-level app.
Something else that appeared in Tiger was called /dev/fsevents. Well, we didn't really introduce it, but a number of you found it anyway. It was implemented for Spotlight specifically, and it provides a raw stream of events to support the functionality that Spotlight needs. It's all locally generated events: any change to any file system that's mounted locally, or any change that's produced by a local application, produces an event.
The events are very raw. As I said, you see everything that comes out of the kernel. So a stat change, a permission change (well, a permission change is a stat change), an ownership change. When a file is closed after being written, you get a modification. Renames, deletes. It's everything. Because it's a kernel-based mechanism, it's also sensitive to slow clients.
For some reasonably complicated reasons, it has limited buffer space in the kernel, and that space is shared by all the clients. If someone is slow at reading events, things back up, eventually the buffer runs out of space, and events are lost, which is fairly catastrophic. So because it's sensitive to slow clients, it's not really appropriate to open it up to general-purpose applications. Besides, with the event stream being as raw as it is, it's not that useful either. With all these problems, we clearly had an opportunity on our hands.
So we started thinking about it, and we decided to come up with the FSEvents framework. I'd like to talk a little bit about our design rationale so you understand where we're coming from. We thought about some of the different clients. Clearly, as you can see from the Leopard preview, backup is important. Sync is another application, and of course, the venerable Finder. Backup and sync will ask the question, did anything change in this hierarchy? The Finder is more of an online, live, is-anything-changing-in-this-hierarchy type of application. Clients we know that we aren't going to satisfy are virus checkers. Virus checkers really have very sophisticated requirements that can't be met with this API, and the kauth subsystem is actually more appropriate. So we're not trying to be all things to all people. We also knew that we had to put some limitations and constraints on the problem to make it a little bit more tractable.
Storing a complete log of events is just not possible. Your computer is designed to run user applications and to do things, not to sit around storing a record of all the work you're trying to do and interfering with that work in the process. So we have to filter the event stream down to something manageable, something digestible that makes sense to applications. If you look at the raw event stream, it's extremely complicated to understand that those 20 operations were actually just a safe save of a single file. And as I said, we can't be all things to all people, so we had to narrow things down a bit. So without further ado, introducing the FSEvents framework. This is a Core Foundation-based API that lets you watch an entire file hierarchy for changes. You get directory-level notifications of changes that happen within that hierarchy. So if you say, I want to watch the user's home directory (and I'll go into more detail on this), you'll find out the specific locations that have changed within it. We offer a persistent change history. And again, I'm going to go into more detail about all of these. And we have fine control over the frequency of updates that you get. What don't you get? You don't get events for changes to specific files. I'll talk about this more in the later slides, but you can't say, did file foo change?
What does it look like architecturally? This sort of helps you understand what's going on under the covers. At the top level, FSEvents is obviously the framework, the API, that you as developers can call. There's a corresponding daemon, fseventsd, that orchestrates things behind the scenes. fseventsd is based on the /dev/fsevents device that provides the raw stream of events from the kernel. fseventsd filters the event stream into something that is coherent, sends the updates through a Mach message to client applications, and it also keeps the historical records.
Actually, I should go back. So as you can see, we have the event history that's being stored away on disk. fseventsd is reading the events out of the kernel and passing them over to the FSEvents framework, which in turn hooks into the Core Foundation-based API. So what are the concepts behind the FSEvents framework?
You can monitor a path in the file system namespace, and you get only events that happen beneath that path. So yes, you can watch /, and you'll see all the directories that have changes made to them anywhere in the file system. All events that you get have a corresponding event ID. Event IDs are 64-bit, so they never get recycled, at least not for, let's say, about 5,000 years. You can ask for all the events since a specific event ID. As I said, event IDs are persistent, so they last across a reboot.
When you get an event, you're supposed to use it to figure out what happened that's of interest to you, and then, of course, do the right thing. The path that you watch doesn't actually have to exist. It's really just a string-based prefix match, so you can watch something that doesn't exist.
And when it does come into existence, you'll start receiving events for the changes that happen under that path. For now, the path has to be an absolute path; even if in the future we allow you to specify a relative path, we're just going to turn it into an absolute path. As I said, you get events for anything that changes under the path. You can watch any number of paths that you want, within reason, of course. And security: right now there is none, but obviously there will be, so you're not going to be able to watch things that you don't have permission to see.
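Putting these concepts together, here is a minimal sketch of setting up a watch. This is hedged: it's based on the FSEvents C API as it appears in the Leopard headers (FSEventStreamCreate and friends), it only builds on Mac OS X, and the watched path, latency, and callback body are placeholders.

```c
#include <stdio.h>
#include <CoreServices/CoreServices.h>

/* Called with one changed-directory path per event. */
static void callback(ConstFSEventStreamRef stream, void *info,
                     size_t numEvents, void *eventPaths,
                     const FSEventStreamEventFlags flags[],
                     const FSEventStreamEventId ids[])
{
    char **paths = eventPaths;
    for (size_t i = 0; i < numEvents; i++)
        printf("change in %s (event id %llu)\n",
               paths[i], (unsigned long long)ids[i]);
}

int main(void)
{
    CFStringRef path = CFSTR("/Users/foo");     /* hypothetical watch path */
    CFArrayRef pathsToWatch =
        CFArrayCreate(NULL, (const void **)&path, 1, &kCFTypeArrayCallBacks);

    FSEventStreamRef stream = FSEventStreamCreate(
        NULL, callback, NULL, pathsToWatch,
        kFSEventStreamEventIdSinceNow,  /* or a saved event ID to replay history */
        1.0,                            /* latency in seconds */
        kFSEventStreamCreateFlagNone);

    FSEventStreamScheduleWithRunLoop(stream, CFRunLoopGetCurrent(),
                                     kCFRunLoopDefaultMode);
    FSEventStreamStart(stream);
    CFRunLoopRun();                     /* deliver events until interrupted */
    return 0;
}
```

Passing a previously saved event ID instead of kFSEventStreamEventIdSinceNow is what gets you the persistent history described above.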
Here are some specific examples. If you watch /Users/foo, and then someone creates the file /Users/foo/Documents/MyCoolStuff/NewFile, you get an event. The event that you get is for the directory /Users/foo/Documents/MyCoolStuff. If someone creates /tmp/ignoreme, you don't get any event; nothing comes through to you, because you're not watching that, you're only watching /Users/foo. And you don't find out about changes to the root of the path.
So for example, if you chose to watch a directory which didn't exist, let's say /Users/foo/somedir, and then someone creates it, you don't get an event for that. If they then create something inside of there, then you do get an event. Now, as it says, this is open to discussion, because we've kind of gone back and forth. Clearly it's something that's easy to work around: you just watch one level above for the thing that you're looking for. But it's something that we'd like to get feedback on.
The event history. The FSEvents framework, or more specifically fseventsd, stores all events. Yes, really, all events for all time, because currently I'm not deleting them. The event history, it turns out, is actually really small, and it compresses very well. What I see on my machine, typically, is that a day's worth of logs is at most 100K or something like that. So it just doesn't make sense to delete them. And it's persistent across reboots. So you could say, I haven't run for a month; here's my last event ID; tell me everything that's changed since then. This allows you to not have to worry about doing a full rescan. When you receive historical events, you will see the same directory potentially modified many times.
If you are asking for all events since a particular event ID for a user's home directory, and it's been a month since you last ran, well, the home directory has probably been modified more than once. So the events that you get are not unique. That's something that your application needs to take care of.
We also had a few assumptions, and I want to detail these so you understand where we're coming from. First off, clients that want to monitor some large file hierarchy have to have a mechanism to generate their initial state about the parts of that hierarchy they care about. So you have to have code that does a full scan and knows how to process that and build state. If you have code that does a full scan, it's pretty straightforward to turn it into code that does a partial scan, by saying, "just go update this particular directory." And given a directory that changed, you can scan that directory and update your internal state based on whatever differences you find that are interesting to you. So we have, like I said, those assumptions about what clients will have to do to maintain their state given the event notifications they receive.
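The partial-scan idea can be sketched concretely: given the one directory that changed, revisit just that level and recompute the piece of state it contributes. Here's a hedged, portable POSIX C sketch of the kind of helper the watcher demo below relies on (the name and the one-level-only behavior are my simplification, not the demo's exact code):

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Sum the sizes of the plain files directly inside `dir`.
   After an FSEvents notification, a client only needs to revisit
   the one directory that changed, not the whole hierarchy. */
long long directory_size_one_level(const char *dir) {
    DIR *d = opendir(dir);
    if (d == NULL) return -1;
    long long total = 0;
    struct dirent *ent;
    char path[4096];
    struct stat st;
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
        if (lstat(path, &st) == 0 && S_ISREG(st.st_mode))
            total += (long long)st.st_size;
    }
    closedir(d);
    return total;
}
```

The full-scan version is just this applied recursively; the event callback then calls the one-level version for each changed directory and patches the running total.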
Now we'll go to this demo machine and walk through a simple example that shows the raw event stream that comes out of the FSEvents API. We have this tool down here called fsevents, and it just prints out all the events that it receives. As you can see, I'm specifying a latency option -- "-latency 1.0" -- and I'm going to give it the path "/" so that we see everything that changes on the system.
So we may actually see some other stuff, depending on what daemons are running. Now, I have this other terminal window, and I'm going to make a directory; I'll call it foo. Wonderful name. About a second later, you see that we got an event for /Users/apple/dbg, because that's the directory I'm in, and that's where I created foo. If I go into foo and I touch a new file, we get an event for the directory foo, because that is what changed. Another example -- whoops, actually, I'll do it. I'm going to geek out here for just a second and create a whole bunch of files.
So we're going to create a couple thousand files. And I want you to observe-- so that created-- this machine's too fast. Drats. Need to slow it down. Well, all right. I can play this game too. We'll create a lot of files. Oh. All right, thwarted. All right, so now what you see is that we're getting multiple events as I'm creating many thousands of files here. And they're coming through. And what I wanted to demonstrate is if I went down here and changed this latency-- like you may say, oh, one second, that's too long. I can't tolerate that kind of latency. So I'm going to have a 0.1 second latency. Now if I go up here and create-- Well, I'm not going to create that many files this time. You see, we get a whole lot more notifications.
Now, this is obviously good if you need to update in that kind of real time. But typically you want a latency that's long enough that events get compressed. 4,000 files were created in this directory, but really I only want to do one rescan at the end, when the dust has settled, so to speak. So if I change this latency to, say, 2.0 seconds, and then go and create all those files again -- whoops, that one -- what you'll see is that it runs and I might get one or two events instead of 10 or 15. In fact, I only got one event. Now, you see these event IDs. I'll show you the history. So let's do a latency of 1.0, and I'll specify a -sincewhen option, and I'm going to give it a value, because I was paying attention earlier.
So this is a fair number of events coming through for the history since, let's say, an event ID of 320,000, which is roughly where the event IDs were when I came into the room. Whoops, I have to give it a directory to watch. Here we see everything that's changed since then: you can see Spotlight was busy; something changed in /dev, which is probably worth ignoring; and these are all the other directories. In fact, we can see that fseventsd was writing to its own directory, and Temporary Items and Preferences got modified. So again, this is everything that's happened since that particular event ID. And you can see that /Users/apple/dbg/foo got a few updates as well. These get coalesced on a 30-second granularity -- that's not super interesting, but you can get multiple events for the same directory. So this just gives you an example of the parameters and options that you have with the FSEvents framework. So if we go back to the slides -- now we're going to get into the specifics of the API. It's a very simple API, with just a couple of data types to worry about. The first thing you have to know about is the FSEventStream. This is the channel on which you receive notifications when there are changes. It's a CF run loop provider, so you can create it and schedule it on a run loop; we'll go into those details too. The next data type is not really a data type: it's the FSEventStreamCallback, a typedef for your callback function, which is what gets called each time there are changes.
FSEventStreamCreate, as I said, is a standard CF run loop provider, so it takes a variety of the template arguments you'd expect: the CFAllocator, a context pointer. The arguments of interest for the event stream itself are, first and foremost, the callback -- or perhaps the paths are even more important. The paths argument is a CFArray of CFStrings, the paths you want to watch. The sinceWhen parameter, a 64-bit ID, is the starting event ID, and you have a couple of options for it. You can specify "since now," which means you want to receive events only from the current time forward. You could specify zero, which is from the beginning of time -- but you probably don't want to do that. Or you can specify an actual event ID, the last one that you saw.
The latency is how frequently you want to receive updates -- how often, at most, you want to be called. As you saw, I was specifying 1.0 for one second, or 0.1 seconds, and you'll receive events at most that frequently. There's also a flags argument, but we don't have any flags yet, so you just pass 0 or the constant for none.
The lifecycle of an FSEventStream looks like this. You create it first, specifying the arguments that you want. Then you schedule it on a run loop. Then you have to start the event stream. We have very explicit phases, so you have good control: you create the stream, and then you start it only once you know everything in your application is initialized, at which point your callback starts being called. You can stop an event stream, which allows you to do a clean teardown, so you don't have to worry about race conditions along the lines of "I'm still getting callbacks, and I need to stop." This is a very straightforward way to do it. Once you stop an event stream, you can restart it if you want. Once you call invalidate, the thing is basically headed for the bit bucket, and the only other thing you can really do is call release.
The callback itself gets a couple of arguments. First, there's the number of events; you may receive more than one, and if you have a very long latency you may get 10 or 20 or 30 events -- I forget the maximum passed at a time, but you can receive quite a few. You get three parallel arrays. First, the paths for each event: the first entry in the event paths array is for the first event, and it tells you which directory, under the hierarchy you're interested in, was modified. For each event you also get a flags value and an event ID. The flags are very important. Normally the value is just 0, which is good: it means nothing of interest happened. But you may also receive one of three flags. The first one, MustScanSubDirs, means that, well, there were some problems.
As I said, /dev/fsevents in the kernel has limited buffer space, and when space starts to get tight, it will start to combine events. So what can happen is that you receive an event for some directory within the hierarchy you're watching, but this flag means that you need to rescan from that point down. This is an unfortunate circumstance, but it's clearly better than having to rescan the entire hierarchy. Again, we go to great lengths to avoid having this happen. But if it does, you need to be aware that when that bit is set, you need to rescan from that point down to find all the changes that happened. The next two flags are a little more serious, and hopefully you'll never see them.
The first one is called UserDropped. What this means is that your application was not reading events fast enough: the callback wasn't processing the data fast enough, things got clogged up, and we had to drop events for you. Something changed, and we can't tell you what, because you weren't reading fast enough. This is catastrophic in the sense that you now have to do a full rescan of the entire hierarchy you're interested in. It's a very clear indication that you need to make sure you're processing the callback fast enough. The next flag, KernelDropped, there's nothing you can do about: things got so clogged up that the kernel couldn't even keep up with the stream of events being generated. Again, this is catastrophic in the sense that you have to do a full rescan. We go to great lengths to avoid these things, but realistically they can happen, and you need to be aware of that.
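The flag handling reduces to a small decision: rescan just the reported directory, rescan from that directory downward, or fall back to a full rescan from the root. A portable sketch (the constant values mirror the flags in FSEvents.h; the function name and locally-defined macros are my illustration, not the API):

```c
#include <string.h>

/* Values matching these flags as defined in FSEvents.h. */
#define FLAG_MUST_SCAN_SUBDIRS 0x00000001u
#define FLAG_USER_DROPPED      0x00000002u
#define FLAG_KERNEL_DROPPED    0x00000004u

/* Given one event's flags, pick which directory to rescan and whether
   to recurse below it. Dropped events force a full recursive rescan
   from the root of the watched hierarchy. */
const char *decide_rescan(unsigned flags, const char *event_path,
                          const char *root_path, int *recursive) {
    if (flags & (FLAG_USER_DROPPED | FLAG_KERNEL_DROPPED)) {
        *recursive = 1;           /* catastrophic: rescan everything */
        return root_path;
    }
    if (flags & FLAG_MUST_SCAN_SUBDIRS) {
        *recursive = 1;           /* events were coalesced below this point */
        return event_path;
    }
    *recursive = 0;               /* normal case: rescan one directory */
    return event_path;
}
```

This is essentially the logic the watcher demo's callback implements, shown later in the session.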
The last array is the event IDs: for each path, you get the flags and an event ID. These are just monotonically increasing numbers. They're, in essence, a timestamp, but they bear no relation to wall clock time or anything like that; it's just a number, and you can store it away for later use. So when your application is shutting down, you store the ID of the last event you received, and then you can pick up from that point in the future.
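Since an event ID is just a 64-bit number, persisting the last one you saw is trivial. A hedged sketch (the file format and function names are my own choice, not anything the framework prescribes):

```c
#include <stdint.h>
#include <stdio.h>

/* Save the last-seen event ID so a later run can resume from it. */
int save_last_event_id(const char *file, uint64_t id) {
    FILE *f = fopen(file, "w");
    if (f == NULL) return -1;
    int ok = fprintf(f, "%llu\n", (unsigned long long)id) > 0;
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 0 and fills *id on success; nonzero means "no saved state,
   fall back to a full scan and start from the current time". */
int load_last_event_id(const char *file, uint64_t *id) {
    FILE *f = fopen(file, "r");
    if (f == NULL) return -1;
    unsigned long long v = 0;
    int ok = fscanf(f, "%llu", &v) == 1;
    fclose(f);
    if (ok) *id = (uint64_t)v;
    return ok ? 0 : -1;
}
```

On the next launch, the loaded value is what you'd hand to FSEventStreamCreate as the sinceWhen argument.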
Now, going over to the watcher example. Let's see. OK. This program called Watcher, that I'll start going through the code in a second, is, well, sadly, a command line application, because that's all I'm capable of. And I took a Cocoa programming class, but it's just-- I don't know. Anyway.
Yeah. What it does is monitor a file hierarchy and keep track of its size. If something changes inside the hierarchy, it updates its state and tells you the new size, and it does that by rescanning the one directory that changed, so it doesn't have to do a full rescan. So I can watch a particular directory hierarchy; it will build its initial state and then update that state as things change. There's a little support code that I'm not going to go into: the code that walks the directory hierarchy to build the initial state -- you can figure out that that's not too exciting -- and the code that parses the command line options. Anyway, starting here in main, we're going to go down. As I said, the first thing is that we have to get an absolute path, so we call realpath() on the path that came in on the command line. We only support monitoring a single directory, as a simple example. If realpath() fails -- because, let's say, the directory doesn't exist -- we just copy the path in as-is. One other thing to point out: a frequent thing people do with the FSEvents APIs is watch /tmp and then say, "I don't get any events." Well, that's because /tmp is actually a symbolic link. If you call realpath(), it resolves the symbolic link. Now, the first thing that we do is get the directory size for that full path.
So that prints out what it sees the initial size is, and that get-directory-size function builds some internal state for each of the directories it encountered; you'll see how we use that later. Now we're going to create the event stream. Up here we have the FSEventStreamCreate call, inside a pretty simple wrapper function in this case. The first thing I want to point out is the context, the FSEventStreamContext. Your callback function obviously needs some hook back into your application's data structures.
The FSEventStreamContext is how you do that. In this case, we set our info pointer to be the path that we're monitoring, the root of the hierarchy we want to watch. There are some other fields you can fill in if you're a sophisticated Core Foundation app; in this case, we don't have any need for that. Next, we create a CFArray of CFStrings, basically massaging the string that we got into that form. Then we call FSEventStreamCreate. As I said, you have the callback -- we have a function called fseventsCallback -- and we pass the context pointer, which is our hook back into the application data structures. We pass in that CFArray, and then the settings that were specified on the command line for sinceWhen, latency, and flags. In this case, most of these, except for the latency, are not terribly interesting.
Back in main, once we've got our stream, we schedule it on the run loop, and then we start the event stream we just created. We're not going to talk about the flush-seconds part; that's a separate thing. Then we call CFRunLoopRun, and off it goes to get our events. Now, first off, there's a bug in this program. I'll just tell you that right now, but I want to show you how it works in practice first. So down here, I will run watcher with an aggressive latency on the current directory.
So it tells me that the initial size is somewhere over 2 megabytes for this directory. Now, up in this other window, I'm in a subdirectory there, and -- oops, gee, I've got stuff left over. Right, so it's removing all these things, and the size of the directory has gone down a small amount. Now what I'm going to do is create a file, and again I'll geek out. Whoops -- dd, output to junk, bs=512k. So I'm going to create a 512K file. Whoops.
Did I say that right? Anyway, I can't think straight right now. count=1. And we see our size update by roughly 512K. If I remove that file, the size goes back down: remove junk, and you can see that it re-scanned just the one directory where the changes were made. The size goes back down to what it was before, and everything is OK. Now, if I open this in the Finder, we have an empty directory, and you can see that a change was made to the directory up above, by the Finder. If I get a new Finder window, go to Applications, take the Calculator, and drag it in here, we get a whole bunch more events, and the size goes up by about five megabytes. If I delete that, our size goes back down -- yeah, five or six megabytes. Empty the trash -- yes, I am sure -- and it's gone.
So it plays well regardless. My point in showing the Calculator application is that it's actually a bundle with a whole bunch of subdirectories, and the FSEvents stream tells us about all those changes. If I mkdir a series of directories -- say, a/b/c/d -- we get a series of events, and the changes in size are small because it's just small directories being created. But if I go into a/b/c/d and then do the dd again, we get an event, and even though we have a particularly deep hierarchy, we only have to rescan the one directory that actually changed. That's the whole point: it allows your application to be more efficient. So again, if I delete it, the size goes back down. Now, going back to the code, I want to point out what the bug is.
Here, you saw that I got the initial size of the directory, then created the stream, and then started the stream. The problem is that by getting the directory size first and then starting the event stream, we've left a window open -- and we all know there can be problems with windows. You want to make sure you're receiving events from the moment you've created your initial state. So the right thing to do is to take this code and put it after we've started the event stream, so that once we have our initial size, there's no window in which things can change without us receiving notifications about it. That takes care of that. If we go back to the slides--
All right. So that's a summary of what we did. The application we showed is very basic FSEventStream usage: it monitors a single path, with no complicated run loop business -- a pretty simple guy. For pedagogical purposes, we kept it straightforward. The application creates its initial state and lets the callback drive updates to that state. As I said, whenever a change was made to a directory somewhere deep in the hierarchy, it would update that portion of the state and reflect the new total size of the hierarchy. More advanced apps could do a lot of different things. They could use the event history, which we didn't use in this case. They could watch multiple hierarchies if they needed to. They could create multiple streams: you may have a highly multi-threaded application where different parts need to watch different hierarchies, and they have different run loops and need different event streams. You can do fancy scheduling with start and stop; as I said, that's a good way to do clean startup and teardown of your event stream. You can use custom CFAllocators if you want to, and so on. Now, some advice and other things: you have to look at the flags for an event. Oh -- I suppose one thing I didn't show was the callback. Can we go back to the code?
I kind of forgot the punchline, so to speak. The callback, as I said, gets a couple of standard things. The context pointer is passed to it, which in our case is the full path that we are monitoring. Then, for each of the events, it goes through and takes a look. Now, in this case -- let me make this just a wee bit bigger.
We take the path that we were given and chop off the trailing slash, if there is one. Now we're looking at the flags. If we have the MustScanSubDirs flag, we set this recursive flag to one. If we got the UserDropped flag, that's pretty bad news: we set recursive to one, and then we copy over the path that's there and put the full root path in its place.
If we got KernelDropped, that's really bad news, and we set recursive to one and copy over the full path again. Otherwise, there's no need to go recursive; we just call get-directory-size for that particular directory -- that's how things get updated -- and then it prints out the new size. So that's what the callback looks like. Sorry, going back to the slides.
So dropped events are rare, but you do need to handle them, as I just showed in the callback. The types of rescanning you may have to do are either from some point in the hierarchy on down, or everything -- which are really just variants of the same thing. Another thing we realized you can do is create token events. You don't have to be passive and just listen to what happens; you can actually do things and observe those events coming through the pipeline on the other end. You can use this to wrap the beginning and end of some other operation that someone may be doing on your behalf. You create some directory or file with a well-known name, you see that the modification comes through, and you observe that it's there. Then a series of other events may happen, and when that operation is complete, you do a token event at the end. So you have begin-end pairs to know when you are finished.
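The token-event trick can be sketched in a few lines: drop a well-known marker file into a watched directory before and after an operation, and the watcher brackets everything it sees in between. A hedged portable sketch (the function and marker names are illustrative, not anything the framework defines):

```c
#include <stdio.h>

/* Drop a well-known marker file into a watched directory. The watcher
   receives the resulting directory event, checks for the marker by
   name, and knows the bracketed operation has started or finished. */
int drop_token(const char *watched_dir, const char *token_name) {
    char path[4096];
    snprintf(path, sizeof(path), "%s/%s", watched_dir, token_name);
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    fclose(f);
    return 0;
}
```

Usage would be something like drop_token(dir, ".op-begin"), do the work, then drop_token(dir, ".op-end"); the watcher treats events arriving between the two markers as part of one operation.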
As I mentioned, you want to be careful of leaving windows open between when you generate your state and when you start receiving events that would update that state. It's very important because they may be small windows, but you don't want to miss things that would cause you to have out-of-date state.
Now, FSEvents isn't the only option. It's a good one, but it's not appropriate in all cases. A lot of times you go, "oh wow, new API, got to use it." The FSEvents framework is a good solution, but it's not the only one. You may want to consider just using a kqueue: there's no need to change if you're only monitoring a single directory. If you have a spool directory with no subdirectories or anything like that, a kqueue is probably lighter weight, more efficient, and less trouble than the FSEvents framework. If you just have to check a couple of files -- did some particular preference file change while I wasn't running? -- you wouldn't use the FSEvents framework; simply stat the file and get the information that you need. FSEvents is best when you've got a really big hierarchy you want to monitor, or you need a history of changes. If you don't need the history and you only have a few files, or you only need directory changes for a single level of the hierarchy, it's probably more straightforward to use a kqueue. Thank you.
Now, as we've been developing things, we've noticed some issues, so there are going to be some changes coming. We realized that an event stream will need a GUID to uniquely identify it, so that an event ID is paired with the event stream from which it came. If, for some reason, you stored an event ID, and then, let's say, someone drag-copied the data to another machine, that event ID is no longer really meaningful.
You have to have a way to know that the underlying event stream you're getting is different. So it's really going to be a pair: a GUID for the event stream, plus the event ID that came from it. We're probably also going to introduce a special event that indicates a volume was modified outside of our purview.
That is, somebody changed it. Like if you have an external FireWire drive and it's taken somewhere else and modified and then comes back to a Leopard system, you need to know about that. Because it means that you're going to have to do a full re-scan. And there's nothing that we can do about that. There's no magic in the world in the sense of if somebody else makes changes and they don't record the FS event history, there's no way to know about that.
We will probably also have some API cleanups so that some little things like the paths to the callback may become CFStrings instead of just being standard C strings like they are now. On the way in, you specify a CFArray of CFStrings, but then the callback just gets C strings. So there's some little inconsistencies there that we'll probably try to resolve. Again, we're also very open to feedback and hearing what people are interested in.
So in summary, the FSEvents framework provides you with a change notification for any file hierarchy or file hierarchies that you want to watch. Monitoring a file hierarchy provides notifications for changes to directories within that hierarchy. Events are coarse-grained. As I said, they're directory-level changes. And it gives you a full event history.
The purpose of all this is to allow your application to be more efficient at processing updates and changes, so that you don't have to rescan an entire hierarchy, or poll, or periodically stat things to find out about changes. So that's the FSEvents framework. I guess that concludes things, and I think we have about 15 minutes for Q&A.
I also wanted to mention that we're starting a file systems dev list, a public list. You can find it at lists.apple.com, where you can have discussions with engineers like Dominic. The file systems engineers will be available tomorrow morning, 9 a.m., in the Kernel Extensions Lab, and also at the Campus Beer Bash. Thank you very much.