Spotlight: Search and Be Searched - WWDC 2008

Integration • 1:04:37

Spotlight is an integral feature of Mac OS X that enables users to search documents and data throughout the system by name, content, or metadata. Making your own documents and data searchable is a critical part of providing a great Mac experience. Find out how to produce searchable metadata for the Spotlight engine and leverage Spotlight from your own application so your users can find content on demand.

Speakers: Andy Carol, Kaelin Colclasure

Unlisted on Apple Developer site



Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good morning. My name is Andy Carol. I'm an engineer here at Apple, and this session is Spotlight: Search and Be Searched. What does it mean to search and be searched? A lot of your applications already search or could benefit from search. Installers want to know where previous versions of applications are. Productivity apps want to know where fonts, sounds, and other files are. A lot of programs can benefit from this.

And be searched. Your customers want to be able to find your data. How can you help them find your data? Let Spotlight do that. Let Spotlight help your customers find your data, and let Spotlight help your programs find the data that they're looking for. But what is Spotlight?

It's a lot of things. Spotlight is a top-to-bottom search facility built into the system. It's part of the user experience in the most common areas that the user works. It's also expandable. It's designed to bring your data into that search capability, and it provides mechanisms for your programs to be able to find the data that they're looking for. Let's take a look at that in a little bit more detail.

At its core, Spotlight is basically a server with a database attached to it. And it has a database that contains a lot of information. The most important information is the metadata storage. Let's talk about that a little bit. What is metadata? Metadata is information about information. Let's take a look at a good example. The Finder info window is probably the most common user visible source of metadata in the system.

Let's take a look at a little bit more detail. From this, we can see the size of the photograph. We can see that the photograph was taken with a Panasonic camera. And the exposure time is 1/125 of a second. Now, this information isn't the photograph, but it's information about the photograph. And it's the kind of information that your users would want to use to find that photograph.

Let's take a look at how that's implemented in the system. It's very simple. It's key-value pairs. We have an attribute and we have a value. For example, the pixel size, the file name, et cetera, are all attributes, and they typically are named kMDItem followed by whatever name is appropriate for the information being described.

You can store not only information like this, but you can internationalize it. Information can be provided rather than as a flat value. It can be provided as a dictionary. And that dictionary can be keyed by the localization code so that users around the world can see the appropriate information for that metadata.
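As a rough illustration of the key-value model just described, the metadata for one file can be pictured as a dictionary. The sketch below uses Python purely for illustration. The attribute names are real kMDItem keys, but the values, the locale codes, and the `localized` helper are assumptions and not taken from the session.

```python
# Illustrative only: Spotlight metadata pictured as key-value pairs.
# The attribute names follow the kMDItem... convention; the values are made up.
photo_metadata = {
    "kMDItemFSName": "beach.jpg",
    "kMDItemPixelWidth": 3264,
    "kMDItemPixelHeight": 2448,
    "kMDItemAcquisitionMake": "Panasonic",
    "kMDItemExposureTimeSeconds": 1 / 125,
    # A localized attribute is a dictionary keyed by localization code
    # instead of a flat value (the codes used here are assumptions).
    "kMDItemDisplayName": {"en": "Beach", "fr": "Plage", "de": "Strand"},
}


def localized(value, locale, fallback="en"):
    """Hypothetical helper: pick the entry for `locale` if the value is localized."""
    if isinstance(value, dict):
        return value.get(locale, value.get(fallback))
    return value


print(localized(photo_metadata["kMDItemDisplayName"], "fr"))
```

A flat value passes through `localized` unchanged, which mirrors the idea that an attribute may be supplied either as a plain value or as a per-locale dictionary.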

But Spotlight is more than metadata. It's also content. Content from your files. The kMDItemTextContent field represents all of the user-searchable text that would be found in a particular file. Now for a text file, this is pretty obvious. You provide the content of the text file. Spotlight will let people search for words and phrases in that content. But it can be more than that.

A lot of files don't really have text content in the traditional sense, but they have a lot of value that can be construed as text content, such as song lyrics. Or, for example, a CAD program might have descriptions for each room, and the descriptions could be concatenated and searched from the content index. It's important to note that this information is writable, that is, it can be imported into the system, and you can perform queries against it. But it's not generally easy to fetch back the text content of a file. It's used only for searching.

Spotlight also has a built-in UI that a lot of users interact with. This is the Spotlight menu, the Finder, and File/Save/Load dialogs. It's also extendable. We do this with another part of the architecture called the importers. And the importer has plugins. There are plugins to bring in each common file type already provided with the system.

We have a list here. It's pretty comprehensive: images, sounds, video, file formats for text, etc. But the one thing that's missing is your data. And we're going to talk today about how to get your application's data into Spotlight. We do that by your providing a plug-in for your users that will know how to index your data files so that your customers can find data that's created with your program.

[Transcript missing]

Now let's take a look at the Info.plist. This is sort of the core file in your project that describes the capabilities of what you're building to the rest of the system. And there's one important field that we're going to want to edit. It's the supported UTI type. We need to define a UTI type for the kind of data that your importer plug-in is going to process.

What is a UTI type? It's a uniform type identifier. This lets the system know in a universal way what kind of file you have and what its capabilities are. There's a naming convention for this. It's basically your domain name backwards. This is done to prevent collisions with other companies' data types and to make it so that there's no bureaucracy. You can choose a UTI type that's appropriate for your file without clearing it with anyone else.

There's one exception. UTIs that start with the word "public" are meant for the simple built-in data types that are universal and not owned by any one application. And public is reserved for Apple. Apple's the only source for UTI types that start with public. But anything with your domain name is yours to work with.

Now how do we declare these? Really, it's your application that owns the UTI type. So it's important that your application declare the types of files that it--that it's going to be using. It's important that your application export your UTI type. That makes it the authority for the rest of the system on the behavior for that UTI type. What application is launched when you double-click on it? What icon does it have? This is all very important information, and your app is the final authority.

But what about when you have an importer plug-in for Spotlight and you're not sure if your app is going to be there? Your plug-in can also declare the UTI type for the system, but instead of exporting it, your plug-in will import it. And that's just as good to the system. The system will know everything about the UTI type, but the system also knows that this is a fallback declaration, and that if your app is present, the application will take precedence. So it's a good idea to import one for your plug-in as a fallback.
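To make the exported versus imported distinction concrete, here is a minimal sketch that generates the imported declaration an importer plug-in might carry in its Info.plist. It uses Python's `plistlib` only to produce the XML. The UTI `com.example.fortune-cookie`, the description, the conformance list, and the `fortune` extension are placeholders assumed for illustration. An application that owns the type would use the `UTExportedTypeDeclarations` key instead.

```python
import plistlib

# Hypothetical identifier and extension, used only for illustration.
FORTUNE_UTI = "com.example.fortune-cookie"

imported_declaration = {
    # The plug-in imports the type as a fallback; the owning application
    # would declare the same dictionary under UTExportedTypeDeclarations.
    "UTImportedTypeDeclarations": [
        {
            "UTTypeIdentifier": FORTUNE_UTI,
            "UTTypeDescription": "Fortune Cookie document",
            "UTTypeConformsTo": ["public.data", "public.content"],
            "UTTypeTagSpecification": {
                "public.filename-extension": ["fortune"],
            },
        }
    ]
}

# Serialize to the XML plist format so it can be compared with a real Info.plist.
xml_fragment = plistlib.dumps(imported_declaration).decode("utf-8")
print(xml_fragment)
```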

So now we go and enter the UTI type, a made-up one for this particular plug-in. Now we have to talk about the schema. A schema is metadata about metadata. It describes the metadata attributes that your plug-in will be providing to the system. Now, the system provides quite a few predefined attributes, and you can declare that your plug-in will provide them.

In this case, we're providing a title, a comment, and other fields. Now, you can also specify which fields from your data types should appear in Finder info windows. So when the user opens an info window, the bottom two fields, the title and the comment, are the fields they would see from the Finder.

If you need to define your own data types, the schema.xml file contains comments to describe how to define your own data types and inform the system as to their structure. But we really recommend that whenever possible, you use the existing data types unless there's really no way to describe your content with the existing ones.

The last thing we have to do is implement the callback function. This is really the meat of the importer. The callback function will be called with a path to the file the system wants to import, and your plug-in will open the file, parse it or input it in any way that's appropriate, pulling out the attributes you want to inform Spotlight about, and then you'll put those attributes with their values in the dictionary that you're provided. That dictionary will be sent back to the server, and users can now search and find this specific file based on the attributes that you've provided.

Kaelin will now give us a demo of how to go ahead and build an importer plug-in.

Good morning. I'm Kaelin Colclasure. And as Andy said, what we'll be looking at doing this morning is actually taking an existing application and looking at how simple it is to build an importer plug-in and expose that application's content to the user's desktop via its metadata.

For our demonstration here, we started with a little application called Fortunes. For those of you with a bit of a Unix-y background, this is inspired by the classic fortune cookie program that shipped in Unix for decades now. It basically allows the users to accumulate a nice little collection of witty sayings and store them away, and then it can spit them out when your terminal window logs in or something like that. So the classic Fortunes--

[Transcript missing]

Yay. Sorry about that.

OK. So our Fortunes application, I'll go ahead and launch it here, and as we see, it gives us a nice little Cocoa table view with a list of the Fortunes that I've happened to put together so far. Now for the purposes of making this interesting content to index with Spotlight, I've actually, instead of using sort of the classic Fortunes file format, where you have a single monolithic file that has a simple text delimiter, I use individual files to store the Fortune data. So let's go ahead and look at one of those.

And we can see here this is just the standard XML plist format file. Each one of these files holds a single entry in our Fortune database, and we'll basically store individual files like this every time we add a new entry to the database. So without further ado, let's go ahead and see what it takes to get that content indexed and searchable from the Mac OS X desktop.

So the first thing we want to do in Xcode is create a new Spotlight plugin project. We just create a new project, and we navigate down into our Mac OS X standard Apple plug-in section and select the Spotlight plug-in template. Choose that. We're going to save it as Fortune Cookie.

And the actual target for this project will be fortunecookie.mdimporter. So now, as Andy mentioned, one of the first things we'll need to do with our new Fortune Cookie project is update this Info.plist file to basically expose to the system and launch services the UTIs that this particular importer is going to support and also declare to the system what metadata elements are going to be introduced by this importer. So we'll double-click to edit that.

And as Andy was mentioning also, one of the first things we want to do as we edit our Info.plist is to make sure, since we're building a standalone importer project, that we have an imported declaration for our document UTI type. So we'll go back to our application here, actually open its Info.plist quickly, and find its declaration for exported UTI types. And copy that. Then switch back to our Fortune Cookie project and its Info.plist. We're going to create a new declaration here of imported UTI types.

And just by pasting there, getting rid of the template, if I expand this out, we can see that this basically contains the declaration of the UTI tree that this particular content conforms to and also declares what extension we're going to use for our files as we save them on disk.

With that out of the way, we can move on to setting the actual supported UTI type for this importer. Now, your importer plug-ins can actually handle multiple UTIs. If you have an application that has more than one type of document, you can handle all of those with a single importer. So the document types we're going to edit here actually declare an array of supported UTI types. In this case, we've got only a single type, so we just fill that in.

example.fortune-cookie. And with that done, we're ready to save this file. Okay, the next thing we want to do is actually edit our project so that we can use Objective-C to implement our plug-in. To do that, we need to add to our external frameworks the existing Foundation framework.

Then we go ahead and change the extension of our source file. We see here in the template we have a standard source file called GetMetadataForFile.c. We can just change that extension to .m, and that sets us up to use Objective-C.

Click to edit that file, and the first thing we want to do now is go ahead and import the Foundation framework. And then we can see here the template includes an enumeration of the steps that we're actually running through right here. So we'll just come down and paste in an implementation for our importer here.

Since we're using Objective-C, we, of course, want to declare an autorelease pool. Our documents on disk consist of a simple plist format that the Foundation dictionary class has a convenience method to read for us. So we just call that convenience method with the path that's passed in when our importer is called.

And we fill in the attributes dictionary, which is a mutable dictionary reference, from the elements in our dictionary that we want to include as metadata for our document type. If we succeed in loading the dictionary and setting all those attributes, we go ahead and are going to set a status value to true, which tells the Spotlight infrastructure that this importer plug-in succeeded for this particular document. If there's any problem at all, all you can do here is return false. There's no UI or anything presented from an importer plug-in. So then we go ahead and return that status, and that's all there is to the plug-in.
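The flow of the callback described above (read the plist at the given path, copy selected fields into the attributes dictionary, report success or failure) can be sketched as follows. This is Python rather than the Objective-C used in the demo, so it shows the logic only and is not the actual plug-in entry point. The document keys `message`, `created`, and `id`, and the custom attribute name `com_example_fortune_id`, are assumptions.

```python
import plistlib


def get_metadata_for_file(attributes, content_type_uti, path):
    """Fill `attributes` from the plist at `path`; return True on success.

    Mirrors the shape of the importer callback: a mutable dictionary to fill,
    the UTI of the file, and the path to import.
    """
    try:
        with open(path, "rb") as handle:
            document = plistlib.load(handle)
        # Hypothetical document keys; a real importer would use its own format.
        extracted = {
            "kMDItemDisplayName": document["message"],
            "kMDItemTextContent": document["message"],
            "kMDItemTimestamp": document["created"],
            "com_example_fortune_id": document["id"],  # assumed custom attribute
        }
    except Exception:
        # There is no UI in an importer: on any problem, just report failure.
        return False
    # Only touch the caller's dictionary once everything was read successfully.
    attributes.update(extracted)
    return True
```

Building the values in a local dictionary first means a failed import never leaves a half-filled attributes dictionary behind.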

Now, we do have one further thing we need to do. And you can notice here, in our plug-in, we've basically used a set of standard attributes that we got from MDItem.h that sort of capture the essence of the metadata that we wanted to have for each of our documents.

But for example purposes, we're also going to go ahead and define a custom attribute. For a custom attribute, you use the same sort of reverse DNS naming convention that you use for the UTI. The notable exception is that where, in a reverse DNS name, you would have dots in the name, you use underscores instead, so that the name is key-value coding compliant for Cocoa.

Okay, so we're done with that file. And now we can edit our schema to go ahead and tell-- in the schema, declare the same information we just imported in our importer so that Spotlight knows what to expect when this importer plug-in is run. So under Resources, we have a schema.xml file.

And as the comment here says in the attributes section, you can put in any declarations of custom attributes that you need. Now, I would remind everyone, in general, it's unusual for you to need a custom attribute. There's a pretty complete set defined in MDItem.h, and that's definitely your first place to look as you're defining the metadata types for your document.

But again, for example purposes here, we'll go ahead and define our custom com example fortune ID attribute, and we make it a simple, single-valued attribute of type CFString. And then the next thing we want to do: we've got an allattrs element, which, if you remember from Andy's slides, basically tells Spotlight all of the attributes that are going to be returned by this importer plug-in.

It's not the correct declaration. Let me just fix that real quick. There we go. Here we declare our custom attribute along with the three we selected from MDItem.h: timestamp, display name, and text content. The last thing we need to edit is our display attributes. And this is the subset of attributes that the Spotlight menu will display in a tooltip, or the Finder will display in that little More Info panel in the Get Info window. And here we're just going to use display name.

So with that saved, we're actually ready to build our plug-in. And the next thing we want to do is actually set up so that we're able to debug our plug-in right here in Xcode. So to do that, under your project window, you'll create a new custom executable and point that at the mdimport tool.

Now, once we've got a custom executable pointed at mdimport, we basically need to set up the arguments that we're going to pass every time mdimport is invoked. And we'll also be taking a look at how to run mdimport directly from the command line. But the basic arguments we want for debugging are these: we'll add a -d2, which will tell mdimport to spit out some nice diagnostic information to the console window. And we'll add a -g, with the path to our built importer. And the last thing we want to do is set the path argument.

In this case, I've got a preexisting... file, and I just want to drag and drop. Good, 'cause I didn't want to try to type that. And with all that set, we can actually go ahead and save our custom executable and switch back to our source files. And we'll set a breakpoint right in Xcode where we would want to check to see that we've successfully loaded our document from disk. Now if we select Build and Go... So for some reason it didn't hit the breakpoint, but we'll pretend it did.

So normally Xcode would break right here, and you'd be able to actually examine the values that were pulled in by your importer and look at them right here in Xcode. The other way to debug your importer is actually using a terminal window. There are some handy options provided by the mdimport tool that you can use to look at the scalability of your importer, see that you're getting the data you expected, et cetera.

So one of the first options we'll look at here is the -p option. -p basically tells mdimport to collect and format statistics about the performance of your importer plug-in. So what I'm going to do here is actually tell it to run the importer plug-in we just built on all of the cookies that I've currently accumulated on my drive here.

And we can see that the total processing time for this plug-in for all the files, for all 31 files, was, you know, under a second, which looks good. And we also get a nice summary of the most expensive files for your importer plug-in. So if you do have performance issues on an importer, this is a really good tool for sort of isolating, you know, if it's a certain category of files that you have issues with or if it's a specific file.

Another nice tool to work with is mdls. mdls will basically list metadata that's been stored already in the Spotlight index and let you see that. So let's take a look at an mdls command for one of the files we just imported. And we can see a nice little summary. We can see our content type tree.

We notice that our com example fortune ID custom attribute was imported. That's a good sign that our importer was running. And we also got a non-default display name with the text of our fortune message. So that's another positive indication that our importer has indexed this content. And the last thing you want to do is test with mdfind.

So basically, mdfind will let us search here for any item in Spotlight's database which has a com example fortune ID attribute of any value. And in this case, it's only going to be our document type, since that's a custom attribute. And we can see how that shows us all of those on the locally attached storage.
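The command-line checks from this part of the demo can be collected in one place. The sketch below only builds and prints the command lines, so it can be read on any machine. The plug-in path, the sample file path, the attribute name `com_example_fortune_id`, and the exact form of the mdfind query are assumptions, while the flags (-d2, -g, -p) are the ones named in the session.

```python
import shlex

# Hypothetical paths, for illustration only.
PLUGIN = "build/Debug/FortuneCookie.mdimporter"
SAMPLE = "Fortunes/example.fortune"

commands = [
    # Import one file with a specific plug-in, printing level-2 diagnostics.
    ["mdimport", "-d2", "-g", PLUGIN, SAMPLE],
    # Same import, but collect performance statistics instead.
    ["mdimport", "-p", "-g", PLUGIN, SAMPLE],
    # List the metadata now stored in the index for that file.
    ["mdls", SAMPLE],
    # Find every item that has the custom attribute, whatever its value
    # (attribute name and wildcard form are assumptions).
    ["mdfind", 'com_example_fortune_id == "*"'],
]

for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping each command as an argument list avoids quoting mistakes around the query string, which contains both spaces and double quotes.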

So the final place to look, of course, is what your end users are going to see after you've gone through the work of building this plug-in. And we can see if we now do a search, In the Spotlight menu, one of our hits is indeed one of our Fortune files. And if we let the tooltip pop up, we can see the display name, and we can also see the path to where that's stored on our drive. And that concludes the first demo. Thank you.

Thank you, Kaelin. Now, when you've built your plug-in, I want to reinforce that you're going to want to test it. Kaelin showed how you do your basic testing. You can do it from within Xcode, or you can do the testing from the terminal command line. It's important you test against all the architectures. You're going to want to test PowerPC, 32-bit Intel, and 64-bit Intel.

If your machine is running 64 bits, you can force the importer into 32-bit mode by prefixing it on the command line with arch -i386. That prefix forces the following command to always run in 32-bit mode. The -d option is also extremely useful. Those are debugging options that will display progress that the importer is making as it's working through your files. It's very, very useful when you're having problems with your plug-in. And of course, the -p option gives you performance statistics. Now, here are some goals for testing.

Performance is extremely critical. Oftentimes, people think in terms of, "Well, I ran the importer against my top 10 files, and it works fine." But your users may not have 10 of your files. They may not have 100 of your files. They might have thousands, tens of thousands, or even hundreds of thousands of files. And the cumulative time it takes to index all of those files can be very painful for a user. I really want to encourage people to use every performance tool to really streamline the process of importing your data.

And because your importer plug-in will be run many, many times, it's important that you don't leak memory, file descriptors, or any other resource. And the worst sin of all is crashing. The time it takes to handle a crash and relaunch the importer with your plug-in again can add up to a lot of time if there are frequent crashes.

So test against a lot of files, and test against edge case or odd or even corrupt and broken files to make sure that your plug-in is stable and good for the users. And again, test it on every architecture. Three-way testing is very important. Now where do you put the plug-in when you're done building it?

The primary location is /Library/Spotlight. This is where most people will put their plug-ins. You can also put it directly into your application. That way, if the user has your application, they have your plug-in. But it's very important when you put it there that you copy your plug-in to where it's going to finally be.

Don't use a symlink or other trick to make it appear there. That's because Spotlight watches the locations where your plug-in is allowed to be, and when it notices a new plug-in arriving, it will add it into the pool of available plug-ins. If you simply put a symlink there, Spotlight might not notice the addition of the new plug-in.

Because there are multiple places to put a plug-in, there's a precedence issue. Spotlight will first look in /Library/Spotlight. Then it will look in /System/Library/Spotlight. And then, failing that, it will look inside of an application to see if you've got an embedded plug-in. So if there are two different plug-ins that both want that UTI type, the first one found will be the one used to import that file. You cannot daisy-chain or combine importers. One plug-in will get the go. That plug-in imports that particular file.
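The first-found-wins rule can be modeled in a few lines. This is only a sketch of the lookup order as stated in the session, not how Spotlight is implemented. The plug-in names, the `installed` table, and the "application bundle" label are made up.

```python
# Search order as described in the session; everything else is illustrative.
SEARCH_ORDER = [
    "/Library/Spotlight",
    "/System/Library/Spotlight",
    "application bundle",
]


def pick_importer(uti, installed):
    """Return the name of the first plug-in, in search order, that claims `uti`."""
    for location in SEARCH_ORDER:
        for plugin in installed.get(location, []):
            if uti in plugin["utis"]:
                # Importers are never combined: the first match handles the file.
                return plugin["name"]
    return None


# Hypothetical installation with two plug-ins claiming the same type.
installed = {
    "application bundle": [
        {"name": "Embedded.mdimporter", "utis": ["com.example.fortune-cookie"]},
    ],
    "/Library/Spotlight": [
        {"name": "FortuneCookie.mdimporter", "utis": ["com.example.fortune-cookie"]},
    ],
}

print(pick_importer("com.example.fortune-cookie", installed))
```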

The last thing that Spotlight brings to the equation is an API to allow you to search within your application. Now, how does that work? We take the existing architecture and we add in the ability to have application queries. Those queries will come from your application. Now, you're going to use the Spotlight query language, which for people who program in C should look reasonably familiar. You're going to compare attributes to values, and the files that match that query statement will be the files returned from the query.

You have basic data types: strings, numbers, dates. They can be scalar, that is, individual values, or they can be arrays of values. And you have the expected string operators, equality. You can test a wildcard to test if a string partially matches a value. And you have your normal comparison values that you're used to for numerical values, as well as a range detection, which allows you to specify a range of numbers or dates that will match your test.

You can group these things together to make more complicated queries, queries that match multiple tests. And you can use the asterisk as a wildcard to mean any metadata field. In the last example, if any metadata field has the word Boston in it, that will return true. It's important to note, however, that kMDItemTextContent doesn't fall under the star. That specific attribute must be tested explicitly.

Now, string comparisons, it's actually fairly sophisticated. Your users might be surprised or actually disappointed if you do your string compares as exact matches, which is the default, because case may differ or other minor things that the user wouldn't think ought to make the test fail will fail. So you can make your comparison case insensitive.

You can make it diacritic insensitive if you're going to be having foreign language words. And you can do word-based testing, where you can say that if the word appears anywhere in a phrase, for example, you want to go ahead and match. This type of comparison is always done for kMDItemTextContent. And it's important to note that these modifiers can be combined, and we'll see how that works in a moment.

You can also do dates. Now, as a convenience, there's multiple ways to specify a date. You can specify a date of right now or this week or last week. You can provide your own explicit date by using an ISO format. You can say today. You can also provide for all of the different date formats a range, a relative offset. So you could say in the bottom example, today minus 12, which really means 12 days ago.

You can combine all of these capabilities to make queries. You can, for example, query if the number of pages in a document is larger than some value. Or if any author in a potential array of authors matches Frank Burns, and you're not really worried about case or diacritics and you're willing to do a word search. Or you can see the kind of query that's generated by the Spotlight menu, where basically, if you were to search for the word apple in the Spotlight menu, it would actually say: if any metadata field matches the word apple followed by anything, case and diacritic insensitive, or if any text content matches apple followed by anything, case and diacritic insensitive, return that file. So that's how the Spotlight menu works. And last, this is a query that says you want to find all of the text files that have been opened in the last 12 days. It's pretty useful.

You can make more complicated queries. You can find all the photographs taken with a Canon camera and a specific lens. You can find all the files that have a specific DPI range. Or you can look for sound effects that mention the word "thunder" that are between one and five seconds in length.
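The example queries described above can be written out as query strings. The sketch below builds them with small helper functions. The helpers (`compare`, `all_of`, `any_of`) are hypothetical, the page-count threshold of 10 is an arbitrary stand-in for "some value", and the attribute names and `$time.today(-12)` form are given as I understand the query syntax rather than copied from the slides.

```python
def compare(attribute, op, value, modifiers=""):
    """Build one comparison clause of a Spotlight query string."""
    if isinstance(value, str) and not value.startswith("$time."):
        # String values are quoted; modifiers (c = case insensitive,
        # d = diacritic insensitive, w = word based) follow the closing quote.
        # Assumes the value itself contains no double quotes.
        return f'{attribute} {op} "{value}"{modifiers}'
    # Numbers and $time expressions are written without quotes.
    return f"{attribute} {op} {value}"


def all_of(*clauses):
    """Every clause must match."""
    return "(" + " && ".join(clauses) + ")"


def any_of(*clauses):
    """At least one clause must match."""
    return "(" + " || ".join(clauses) + ")"


many_pages = compare("kMDItemNumberOfPages", ">", 10)
by_author = compare("kMDItemAuthors", "==", "Frank Burns", "cdw")
# Text content is not covered by "*", so it is tested explicitly.
menu_style = any_of(
    compare("*", "==", "apple*", "cd"),
    compare("kMDItemTextContent", "==", "apple*", "cd"),
)
recent_text = all_of(
    compare("kMDItemContentType", "==", "public.plain-text"),
    compare("kMDItemLastUsedDate", ">=", "$time.today(-12)"),
)

for query in (many_pages, by_author, menu_style, recent_text):
    print(query)
```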

Now, to actually do the queries, you're going to have to use the API. And there are two basic APIs that you have a choice of. The MDQueryRef API is a lower-level, more flexible API at the Core Foundation level. And you have NSMetadataQuery, which is a higher-level, KVO-compliant API.

And KVO is key-value observing. This is useful where you're doing AppKit programming and you want to be notified when key values change; your observer will be notified. And that makes it very easy to put together higher-level applications. We're going to take a look at the lower-level API first.

There's three basic flavors of query. You can do a synchronous query, where your code will stop until the query is complete. You can do an asynchronous query, where your code will continue to run normally, and then the query will complete and you'll have your answers. Or you can do a live updated query, where your query will run as long as you want it to run in the background. And as the user lives on the system, and files are moved and changed and updated, your query will be updated with the results live as the user's doing things. Let's talk about the synchronous query first. It's the most simple.

There are three basic steps. We're going to create an MDQueryRef. We're going to ask it to execute the query. And we're going to take a look at what Spotlight sends back as the results. Creating the query is very straightforward. We're going to create the query, and we're going to give it a query string which basically says we want to find all the text files, the plain text files on the system. Very simple query.

Then we're going to execute the query, and we're going to tell Spotlight that we want the query to execute synchronously. This will block until all of the query results are available. And then when the results are in, you can get a count of how many results there were. And you can iterate over those results, taking a look at each MDItemRef that's returned in turn.

An MDItem is basically a stand-in for each file that's being returned. And you can ask that MDItem for any attributes that you want about that file. And in this case, we're going to iterate over all of the results, and we're going to get the display name for every file that was returned from this query. Very, very simple.
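The blocking behavior of a synchronous query can be tried out without writing any C. The sketch below swaps in the mdfind command-line tool for the MDQuery API, so it illustrates the create, execute, iterate pattern rather than the calls shown on the slides. The `find_paths` function and its injectable `run` parameter are hypothetical, and the real command only works on a Mac with Spotlight available.

```python
import subprocess


def find_paths(query, run=subprocess.run):
    """Run `mdfind` with `query`, block until it finishes, return matching paths."""
    # check=True raises if mdfind exits with an error; the call does not
    # return until the whole result set has been produced.
    completed = run(["mdfind", query], capture_output=True, text=True, check=True)
    # mdfind prints one path per line.
    return [line for line in completed.stdout.splitlines() if line]


if __name__ == "__main__":
    try:
        # Same query as in the session: every plain text file on the system.
        results = find_paths('kMDItemContentType == "public.plain-text"')
    except (OSError, subprocess.CalledProcessError):
        results = []  # mdfind is unavailable or failed on this machine
    print(len(results), "results")
    for path in results[:5]:
        print(path)
```

Because the call blocks, an interactive application would normally prefer the asynchronous form discussed next.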

But your users probably don't want your application to come to a dead stop while you're doing a query. So you might find more value in an asynchronous query. Now when you do an asynchronous query, I first want to introduce some notifications. Because your query is going to run in the background, you're probably going to want to be informed as to the progress of your query.

So you can get a progress notification, which will occur at intervals while the query is ongoing to let you know incremental results, and then you're going to get a did finish notification that lets you know that the query is complete and you will not be receiving any more results. Now let's take a look at how we're going to do this. We're going to create a query like we did before. This time we're going to register some callbacks for those notifications.

We're going to define those callbacks. We're actually going to run the query, and then we're going to handle the results when the query is complete. Let's take a look at that. We're going to do the same query we did before. We want to find all the plain text files on the system.

And then we're going to register for our callbacks. We want our progress notification that will let us know as the query is being handled and results are incrementally being returned. And we're going to want to register for our Did Finish notification so that we know that all of the results are available and that the query is complete.

And then we're going to define the callbacks. Let's take a look at the did finish. When the notification comes in, it's easy to obtain the query from the notification, and then you can call your own function that will process the results when you are done. I'm not going to cover the progress notification yet. I'm going to come back to that in a few moments.

When you actually want to execute the query, this time we're going to pass zero instead of the synchronous statement. So your query will execute, but your code will continue to run in your run loop. This code will not block here. This will run in the background and the remainder of your code will execute as normal.

When the results come in, your notification will be called and then your notification could call a routine that might look something like this. Just like with the synchronous case, we're going to take a look at the result count. We're going to iterate over all of the results and pull out whatever fields that you find appropriate for that query. That's it.
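The slide code for this flow is not in the transcript. As a rough illustration only, here is a toy Python model of the same control flow: register callbacks, start the query without blocking, then read the results in the did-finish handler. It is not the Spotlight API. `ToyQuery`, `run_loop_turn`, and the notification names are hypothetical, and the explicit loop stands in for the application's run loop.

```python
from collections import defaultdict


class ToyQuery:
    """Toy stand-in for an asynchronous query (hypothetical, not Spotlight)."""

    def __init__(self, matches, batch_size=2):
        self._pending = list(matches)  # results the "server" has not delivered yet
        self._batch_size = batch_size
        self.results = []              # results delivered so far
        self.finished = False
        self._observers = defaultdict(list)

    def add_observer(self, name, callback):
        self._observers[name].append(callback)

    def _post(self, name):
        for callback in self._observers[name]:
            callback(self)

    def execute(self):
        """Start the query and return immediately; nothing is delivered yet."""
        return True

    def run_loop_turn(self):
        """Deliver one batch of results, as a run loop would between events."""
        if self.finished:
            return False
        n = self._batch_size
        batch, self._pending = self._pending[:n], self._pending[n:]
        self.results.extend(batch)
        if batch:
            self._post("progress")
        if not self._pending:
            self.finished = True
            self._post("did_finish")
        return True


progress_counts = []
found = []


def on_progress(query):
    progress_counts.append(len(query.results))


def on_finish(query):
    # Same shape as the synchronous case: get the count, then iterate.
    for i in range(len(query.results)):
        found.append(query.results[i]["display_name"])


query = ToyQuery([{"display_name": n} for n in ("a.txt", "b.txt", "c.txt")])
query.add_observer("progress", on_progress)
query.add_observer("did_finish", on_finish)
query.execute()
started_empty = (found == [])  # execute() returned before any results arrived
while query.run_loop_turn():
    pass
```

The point of the model is the ordering: `execute()` returns with no results, progress callbacks fire as batches arrive, and the results are only read in full once the did-finish callback runs.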

Let's move to a more advanced query, though. Let's take a look at a live query. This is really the most interesting of them. We have to introduce a new notification. You're still going to receive progress notifications as the results are initially coming in. And you're still going to get a did finish when the initial results are complete, when the first phase of the query is done and you have what's currently on the disk that matches your query.

But after that, you will receive did update notifications. And these notifications will let you know that something's changed on the file system and files now match your query or they've been modified in your query or that they no longer match your query so that you can do something live with the results.

Creating this kind of query is a little more complicated than the previous, but not by much. We're going to create the query. We're going to register for our notification callbacks. We're going to define the initial results finished callback, which indicates that all the initial results are done. Once those are set up, we're actually going to execute the query. We're going to handle the initial results, and then we're going to listen for updates. That's the new step. We create the same query we did before. We want to find all of the plain text files.

We're going to register for a progress notification because we want to see the results as they're coming in. We're going to register for the Did Finish notification because we want to know when the initial results are done, when the initial scan of everything that's already on the disk is complete. And then we're going to want the Did Update notifications for the liveness of the query. This is telling us what's happened after the query was done so that we can keep up live as it changes.

The results finished callback looks very much the same as the previous one. We're going to call our handler when the initial stage of the query is done. And then we're going to execute our query. And this time we're going to tell Spotlight that we want updates. This call will not block. Your code will continue to run as you expect it to. But now you're going to be getting your query results as well as updates to the query as it progresses.

You're going to handle the query results like we did before, but notice there's two extra calls. We're going to disable updates at the beginning. We're going to look at the results and then we're going to re-enable updates at the bottom. This is very, very important. When you're looking at these results, this is a live query.

And right then, the user may be changing the file system in ways that cause new files to be added into your results or files to be removed. And by disabling updates, you prevent the results from changing while you're looking at them. When you're done looking at them, you would re-enable updates and they will continue to come in.

And if you remember when I said I would talk about the progress notification later, this is when I want to talk about it. Because the progress notification is called while your query is being executed, more results may be coming in at any time. So in any progress handler that you write for synchronous or asynchronous or live query, you should always disable updates and enable updates when you're done if you choose to look at the results during the progress update.
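In the Core Foundation API this bracket is the pair `MDQueryDisableUpdates` and `MDQueryEnableUpdates`. The sketch below is only a toy Python model of why the bracket matters, not Spotlight code: `ToyLiveResults` and `updates_disabled` are hypothetical names, and holding changes back until updates are re-enabled is an assumption of the model.

```python
from contextlib import contextmanager


class ToyLiveResults:
    """Toy live result list (hypothetical, not the Spotlight API)."""

    def __init__(self, items):
        self.items = list(items)
        self._disabled = 0   # nesting depth of disable calls
        self._deferred = []  # changes held back while updates are disabled

    def disable_updates(self):
        self._disabled += 1

    def enable_updates(self):
        self._disabled -= 1
        if self._disabled == 0:
            for change in self._deferred:
                change(self.items)
            self._deferred.clear()

    def file_system_changed(self, change):
        """Apply a change now, or hold it if updates are disabled."""
        if self._disabled:
            self._deferred.append(change)
        else:
            change(self.items)


@contextmanager
def updates_disabled(results):
    results.disable_updates()
    try:
        yield results
    finally:
        # Always re-enable, even if the reader raises.
        results.enable_updates()


results = ToyLiveResults(["a.txt", "b.txt"])
seen = []
with updates_disabled(results):
    for i in range(len(results.items)):
        if i == 0:
            # A file stops matching while we are in the middle of iterating.
            results.file_system_changed(lambda items: items.remove("b.txt"))
        seen.append(results.items[i])
```

Without the bracket, the removal would shrink the list mid-loop and the read at index 1 would fail. With it, the loop sees a stable snapshot and the removal lands once updates are re-enabled.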

Now, you're going to get your callback that indicates the query updated. Something has changed in the results of your query. When you make the callback, you can obtain the query from the notification and you can take a look to see what items were added into this query result a moment ago. For example, a file might have been saved that suddenly matches your query. You'd be notified about the new added item.

You can also take a look at what items changed. These are files that used to match your query and still match your query, but something in the file changed that you might want to take a look at. And that would be listed here. You could look through this array and find the list of all of the items that actually changed.

And last, files might have been removed from your query. That is, they no longer match your query. As an example, the user updated a file, but he changed the contents of the file in a way that your query is no longer interested in that file. And so you'll be notified that files have fallen out of your results and no longer match your query.
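In the MDQuery API the three lists arrive in the update notification's user info dictionary under `kMDQueryUpdateAddedItems`, `kMDQueryUpdateChangedItems`, and `kMDQueryUpdateRemovedItems`. As a language-neutral illustration, the following Python sketch shows one way an update handler might fold such a payload into a local cache. The `apply_update` function, the dictionary layout, and the `path` key are all hypothetical.

```python
def apply_update(cache, update):
    """Fold one did-update payload into a cache keyed by path (toy model)."""
    for item in update.get("added", []):
        cache[item["path"]] = item       # newly matching files
    for item in update.get("changed", []):
        cache[item["path"]] = item       # still matching, attributes changed
    for item in update.get("removed", []):
        cache.pop(item["path"], None)    # no longer matching
    return cache


cache = {
    "/tmp/a.txt": {"path": "/tmp/a.txt", "size": 10},
    "/tmp/b.txt": {"path": "/tmp/b.txt", "size": 20},
}
apply_update(cache, {
    "added": [{"path": "/tmp/c.txt", "size": 5}],
    "changed": [{"path": "/tmp/a.txt", "size": 11}],
    "removed": [{"path": "/tmp/b.txt", "size": 20}],
})
```

Handling the three lists separately lets a user interface insert, refresh, or delete individual rows instead of reloading everything.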

Now let's move on to the higher-level NSMetadataQuery. This is the Cocoa approach instead of the Core Foundation approach. It has two flavors. You can't do a synchronous query here, but you can do an asynchronous query, and you can also do an asynchronous query with updates, a live query. Let's take a look at how we're going to do that.

There's four basic steps. We're going to define and initialize our class. This is a little more sophisticated. We're using object-oriented programming. We're going to listen for notifications just like we did in the earlier examples. We're going to set up and execute the query, and then we're going to process the query results. The defining is pretty straightforward. In whatever class you define, you're probably going to want to put an NSMetadataQuery reference as part of your class declaration.

And then in your constructor for your class, in your init method, you'll do whatever you would normally do for your own class, but you'll create the query object, you'll set up the notifications, and in this case, we're going to listen to the gathering progress notification. This is the notification you get while the results are being gathered incrementally. And you can also register for the Did Finish notification because you're going to want to know when those results are complete.

Now we've set things up, we have to actually provide the notification handler. This is a simple handler for the did finish notification. Now notice we have to do something a little different here. When we're told that our initial query is complete, we need to stop the query. Otherwise, this is going to be a live query and it's going to continue. But by stopping it here, we've done an asynchronous query that completes when your results are gathered. And when the results are available, you'll notify yourself with the query did finish method.

Now we're going to actually prepare the search itself. We need to set up sort descriptors. And what sort descriptors tell the system is, "I'm going to get a lot of results back, but I want them sorted in a particular order," which is actually very convenient because it eliminates the need for you to have to sort. They're going to come back in a specific order. In this particular case, we're going to sort by the item display name. You can sort by as many fields as you want, and they'll be sorted and sub-sorted as appropriate.
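In Cocoa each sort descriptor is an `NSSortDescriptor` carrying a key and an ascending flag. To make "sorted and sub-sorted" concrete, here is a small Python sketch of the same ordering rule applied to plain dictionaries. It is not how Spotlight sorts internally, and `sort_results` and the `(key, ascending)` tuple format are hypothetical.

```python
def sort_results(results, descriptors):
    """Order by the first descriptor, breaking ties with the later ones.

    descriptors is a list of (key, ascending) pairs. Applying stable sorts
    from the last descriptor to the first yields sort-then-sub-sort order.
    """
    ordered = list(results)
    for key, ascending in reversed(descriptors):
        ordered.sort(key=lambda item: item[key], reverse=not ascending)
    return ordered


items = [
    {"title": "B", "display_name": "z.jpg"},
    {"title": "A", "display_name": "y.jpg"},
    {"title": "A", "display_name": "x.jpg"},
]
by_title_then_name = sort_results(
    items, [("title", True), ("display_name", True)]
)
```

Items with the same title stay grouped, and within each group the display name decides the order.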

Then you're going to set your predicate. That's your search term. And in this particular case, we're going to search for any kind of image on the system. Now, this uses NSPredicate, which is very similar to the Spotlight query language, but it's not quite identical. You're going to want to take a look at the NSPredicate documentation to see the differences in the query syntax. And then we're actually going to start the query.

Now, when the results come back and your call is activated to indicate that the results are complete, just like before, you can get the result count, iterate over the results, and pull out whatever attributes for each file are important for you. In this example, we're going to pull out the display name for every file that matches our query.

Now let's move on to a live query. It's very similar. The only difference is that we're going to be listening to updates. We're going to define and initialize our class in the same way. We're going to do a results finish callback. We're going to actually execute the query, process our results, and then we're going to go into a mode where we're listening for these updates.

We define the class exactly the same: provide an NSMetadataQuery in and amongst your own fields. We're going to set up the class the same way. We're going to ask for the same notifications, the progress, the did finish. We're going to add in a new notification. We want to get the did update notification that lets you know that something's changed and your query should be updated.

When we get our "Did Finish" notification, there's an important change here: don't stop the query. If you stop the query, you've made this into a simple asynchronous query. We want this to be live, so if you don't stop it here, it will just naturally become a live query, and you will receive the updates as you expect as the file system changes. The search gets set up the same way. We're going to look for all images, and then we're going to start our query.
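In Cocoa the stop call is `-stopQuery` on `NSMetadataQuery`, made (or not made) from the handler for `NSMetadataQueryDidFinishGatheringNotification`. The difference between the two flavors can be sketched with a toy Python model. This is not the Cocoa API: `ToyMetadataQuery`, `file_appeared`, and the callback attributes are hypothetical.

```python
class ToyMetadataQuery:
    """Toy query whose liveness depends on whether it is stopped."""

    def __init__(self, initial_results):
        self.results = list(initial_results)
        self.stopped = False
        self.on_finish = None
        self.on_update = None

    def start(self):
        # Pretend the initial gathering phase completes right away.
        if self.on_finish:
            self.on_finish(self)

    def stop(self):
        self.stopped = True

    def file_appeared(self, name):
        """Simulate a newly matching file showing up after gathering."""
        if self.stopped:
            return                  # a stopped query delivers nothing more
        self.results.append(name)
        if self.on_update:
            self.on_update(self)


updates = []

one_shot = ToyMetadataQuery(["a.png"])
one_shot.on_finish = lambda q: q.stop()   # stop: plain asynchronous query
one_shot.on_update = lambda q: updates.append(("one_shot", len(q.results)))

live = ToyMetadataQuery(["a.png"])
live.on_finish = lambda q: None           # do not stop: query stays live
live.on_update = lambda q: updates.append(("live", len(q.results)))

for q in (one_shot, live):
    q.start()
    q.file_appeared("b.png")
```

Only the query that was left running sees the new file, which is the whole difference between the two NSMetadataQuery flavors described here.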

Now, because this is a live query, we're going to want to disable updates when the results are complete. Otherwise, the results might be changing out from under us and lead to very unexpected results. So we're going to disable the updates, and then we're going to iterate over each result returned.

And in this case, we're going to pull out the display name attribute from each file. You can pull out, of course, any attribute that's interesting to you or useful for your query. And then when we're done looking at the results, we're going to re-enable the updates so that they can continue to be monitored.

Now, when the updates come in, you're going to be notified through the query update notification. But unlike the lower-level MDQueryRef, you can't find out directly which files were added, removed, or changed. You simply know that files were changed and you should re-look at the results. When you do things higher level in Cocoa, though, because of the way you can do KVO, key-value observing, you can have the results automatically update screen widgets, et cetera, and Kaelin will show an example of that in a few minutes.

Now, when you do your queries, there's a lot that you can do to make them run significantly faster.

[Transcript missing]

We can also dramatically increase the performance of the query if we fetch the attributes with the query and we do the sorting for you. If you recall, when we created the query, the last argument was NULL. We had a query string, but we didn't specify the last possible value. Those are the sorting attributes. And in this particular case, we're going to provide an array to say that we want to sort our results by the title and then sub-sort the results by the display name.

This has two benefits. Spotlight will return the results to you already sorted in the way that you want it sorted. And those two attributes will be returned directly with your query. When you ask for information about the returned items, those two fields will already be available to you, making it significantly faster for you to actually get the attributes for what you're looking up.

If you don't specify it, you're not going to get the attributes. If you don't specify a sorting order and you ask for an attribute for a returned item, that will require a round trip to the server, which adds quite a bit of expense to your query. Now we're going to see how we take advantage of that.

When you're fetching your attributes, if you recall, we got the result count. And then we iterated over every result, taking a look at the MDItem, and then asking that MDItem for a specific attribute. In this case, we're asking for the display name. Now this is slow because it makes a round trip to the server for each display name. What I'm going to recommend you do is, if you've specified sorting attributes, you get the attributes this way.

Taking the query, you say you want the display name for each item, and we will directly give you the display name that has been prefetched. This is a significant optimization for how fast you can get results for your queries. Now Kaelin will have a demo for us of writing code to do a live query.
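In the MDQuery API the slow path is a per-item call such as `MDItemCopyAttribute`, and the prefetched path is `MDQueryGetAttributeValueOfResultAtIndex`. The cost difference described before the demo can be made concrete with a toy Python model that simply counts server round trips. It does not measure Spotlight itself; `ToyServer`, `ToyResults`, and the attribute names are hypothetical.

```python
class ToyServer:
    """Toy metadata server that counts per-attribute round trips."""

    def __init__(self, store):
        self.store = store
        self.round_trips = 0

    def fetch(self, item_id, attribute):
        self.round_trips += 1
        return self.store[item_id][attribute]


class ToyResults:
    """Toy result set; prefetched attributes travel with the results."""

    def __init__(self, server, item_ids, prefetched=()):
        self._server = server
        self.item_ids = list(item_ids)
        self._cache = {
            (item_id, name): server.store[item_id][name]
            for item_id in self.item_ids
            for name in prefetched
        }

    def attribute(self, index, name):
        item_id = self.item_ids[index]
        if (item_id, name) in self._cache:
            return self._cache[(item_id, name)]    # already in hand
        return self._server.fetch(item_id, name)   # costs a round trip


store = {i: {"display_name": f"file{i}.txt"} for i in range(3)}

slow_server = ToyServer(store)
slow = ToyResults(slow_server, store)
slow_names = [slow.attribute(i, "display_name") for i in range(3)]

fast_server = ToyServer(store)
fast = ToyResults(fast_server, store, prefetched=("display_name",))
fast_names = [fast.attribute(i, "display_name") for i in range(3)]
```

Both paths return the same names, but the first pays one round trip per result while the second pays none, and that gap grows with the size of the result set.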

Okay, so you'll recall our Fortunes application. Let's take a look and see how we've actually implemented that. We started with a simple Cocoa project template, and you can see here we've got our main window with a table view and a nice little search control here. So the way this application works is that, when the application is launched, we run a Spotlight query that finds all of the Fortune files and gathers all their display names, and then we just plop that into the table view. We take advantage of the NSMetadataQuery class's key-value observing compliance to be able to wire all this up with Cocoa bindings and very little code. So let's take a look at what's actually involved there.

First thing we'll look at is our Fortunes app delegate class. So in typical Cocoa fashion, all of our code for this application really just lives in the application's delegate that we've written here. And if we pop open the delegate, we can see here it's got a slot for the NSMetadataQuery, which is the Cocoa version of a query class that Andy outlined in the second part of his slides. We also have an IBOutlet here for wiring up the search control to a search string so our code can get at it. And we have an IBOutlet for getting at the table view.

So it's really a pretty simple application delegate. There's a little more work to setting up the nib, and some of it's a little bit less intuitive. But basically, to make bindings work with an NSMetadataQuery, what you want to do is set up an array controller to contain the results.

So what we've done here is actually dropped an NSArrayController into our nib file, and we've set up a little bit of configuration. We set its mode to be Class, which, when the object is reconstituted and instantiated from the nib, basically configures it to display objects of a particular class. And we set the class to NSMetadataItem.

Each result from an NSMetadataQuery gives you an NSMetadataItem. And the other really nice thing we can do to facilitate key-value binding is to come down here and just add individual keys for the attributes that we want to be able to bind to. So, you can see here I've pre-populated this with keys for our com example fortune ID attribute that we defined and also kMDItemTimestamp and kMDItemDisplayName. We use the timestamp for sorting, and the display name is what we actually want to populate the table view with.

Switching over to look at the bindings, we can see that the content array for this particular array controller is bound to query.results. So that key path basically is going to go back to our app delegate, find the query attribute that we were looking at, and then bind to its results path.

And sort of through the magic of KVO, as our live query executes and new results come in, these bindings will fire, the array controller will update, and that in turn will update our UI, all without us writing any code to explicitly do that. And then the other thing we do is we have the table column actually referencing this same array controller to get values for each of the table column cells.

So now, let's, without further ado, get to the code. The actual app delegate is also pretty simple. In our initialization method, we go ahead and allocate an NSMetadataQuery and bind that to ourself. And then we prepare the query by setting an array of sort descriptors. So if you're familiar with Core Data, this is exactly the same sort descriptor class that Core Data uses in its infrastructure.

And basically, we create an array of these and set it up so that our query results will be returned to us sorted first by the timestamp, then by the display name, then by the Fortune ID. Now, as Andy was alluding to earlier, this has the side effect of also configuring the framework so that as results are fetched from the live query, these attribute values also come back with the results without an additional round trip to the server.

Looking at how we actually start the query running, we have a convenience method here because we call it from a couple of different places. I named it Reset Query Predicate. Basically, we start out with a simple predicate that will search for all items in Spotlight with a content type of com example fortune cookie. So what this gets us is, by default, a simple predicate, or a simple query predicate, that finds all the fortune cookies on all of our attached local storage and displays those in our table view.

Now, if the search string's been filled in through the Interface Builder binding that we looked at in the nib file, we actually build a more complex predicate, a compound predicate that has a sub-predicate of kMDItemTextContent LIKE, case- and diacritic-insensitive, whatever the user's entered search string was with an asterisk appended. So basically what we're asking Spotlight for is a case-insensitive, diacritic-insensitive, word-matching prefix search. This is the same kind of search that the Finder and the search menu do by default, and it's what users will typically expect to see if your application has a search field.

Now, this is a simple version of this. In a more complex application, with richer types of metadata about the content, you might have other options as well, but this is kind of a good example of setting up a default search. Again, if the user has provided a search predicate, we go ahead and create a compound predicate just by ANDing together these two predicates. And then one way or another, we set the query's predicate and we start the query running.
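The demo's actual source is not in the transcript. As a rough sketch of the predicate logic just described, the Python below builds the equivalent NSPredicate-style format string: a content-type test alone, or that test ANDed with a case- and diacritic-insensitive prefix match when the user has typed something. `build_predicate`, `quote`, the placeholder content type, and the quoting rule are assumptions. Real Cocoa code would more likely pass values through `%@` substitution than quote them by hand.

```python
def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes.

    This quoting rule is an assumption made for the sketch.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_predicate(content_type, search_string=""):
    """Return a predicate string for the default or the filtered search."""
    base = f"kMDItemContentType == {quote(content_type)}"
    text = search_string.strip()
    if not text:
        return base  # default search: every item of this content type
    # Appending an asterisk turns the user's text into a prefix match;
    # [cd] requests case- and diacritic-insensitive comparison.
    prefix = f"kMDItemTextContent LIKE[cd] {quote(text + '*')}"
    return f"({base}) AND ({prefix})"


# "com.example.placeholder" is a made-up content type for illustration only.
default_query = build_predicate("com.example.placeholder")
filtered_query = build_predicate("com.example.placeholder", " luck ")
```

Keeping the content-type test in both branches means an empty search field still shows every document of the application's type, which matches the default behavior described in the demo.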

[Transcript missing]

Now, to demonstrate that this is actually a live search going on, let's do a search that doesn't have any matches, and then we'll flip to the Finder. I've actually got a fortune on my disk that does have that text in it, but it's got the wrong extension right now.

So this sort of helps drive home the significance of how the UTIs and extensions and everything map into the Spotlight infrastructure. You'll see when we come into the Finder and change the extension on this file, Finder pops up this dialog. Now, one of the reasons that it's doing this is we're actually switching the UTI definition, and that's gonna have side effects. For one thing, it's gonna cause the importer that we wrote in our first demo to be executed on this file because it's now a different UTI. So we'll go ahead and click Use Fortune.

And when we do that, if we bring our app back to the front, you see the importer ran, the plugin imported all the metadata that we had defined. It pushed that into Spotlight. That new data, that new item, now matches the live query that we have executing. So those results get sent to us, and KVO populates it into our table view, and it all just kind of happens, which is pretty cool. And that concludes our second demo. Thank you.

Thank you, Kaelin. Using Spotlight, you can really add a lot to your applications. You can help your users find your data, which they're going to really want to be able to do, especially if they've got a lot of it, and you'll be able to have your application use the power of the search capability of Spotlight to help your users find their data or other data that's beneficial to your application.

I want to stress again the importance of testing it three-way universal. Test it against PowerPC. Test it against Intel 32-bit. And please test it against Intel 64-bit. And test the hell out of it. Test performance. Test that it's stable. Make sure that your user's going to have a good experience importing your data.

And consider adding more Spotlight searches to your applications. Your users can really benefit from it. We're going to have a lab today at 5:00. And if anybody has any questions about how to import files or how to do queries, I want to encourage everyone to come down to the lab. Lots of times at these labs, people have left with fully functional importers ready to go for their data. So if you have that kind of work you need to do, come on down and talk to us.

We would really love to see you. If you need more information, this is the contact information for our evangelist, as well as a pointer to our documentation. If you go to developer.apple.com and search for Spotlight overview or UTI information, you'll find the links that you need to develop applications for Spotlight.

Other sessions, which probably have already happened by now, but you'll be able to see on the DVD that you're going to get for attending, these other things may help. Using Quick Look, how to use the file system efficiently, and as well, of course, an encouragement to come to our lab.