Using the Spotlight Framework - WWDC 2004

Application • 52:53

We discuss how to integrate with Spotlight to enhance the search user experience of your application on Mac OS X. Don't miss this opportunity to learn how to use the Spotlight APIs and write a Spotlight importer for your documents.

Speaker: Dominic Giampaolo

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

I'm Dominic Giampaolo, a member of the Spotlight team, and this talk is working with Spotlight. First, I'd like to go over what we're going to cover today. So the agenda that we have is why Spotlight, how it works, what is it that we were trying to solve, what did we try and accomplish.

Then the next piece, integrating your app with Spotlight, what does that mean, what are the different parts of that, searching with Spotlight, and of course, working with metadata. The main focus of the talk is really going to be about what it means to integrate your app with Spotlight.

First off, what's the problem? With any engineering task, before you start it, you of course want to define what it is, put it in a box. What is it that we're trying to solve here? It's hard to find things on your computer. I think we've all run into this. So, why is it hard? There's too many files.

If you're anything like me, you've accumulated a few files over the years, a couple tens of thousands, and then, well, there's digital cameras. Oh, that's another couple thousand, ten thousand files. And, oh yeah, all those MP3 files, that's another bunch of files. Any movies that you've created or downloaded, it starts to accumulate. Right?

If you've bothered to put things into some kind of organization, that's really nice. You have, like, I'm a little bit of a librarian in a sense, you know, nice broad hierarchy, but everything's fixed into a single location, and that's not always what you want. You may have files that fit into multiple categories.

I have just got back from a trip, and I took lots of pictures of flowers and lots of pictures of mountains, but in different countries. If I want to say, show me all the flower pictures I took in France, well, that's one thing, but if I want to say, just show me all the flower pictures from the trip, including France and Italy and wherever else, can't do it. It's not an easy way to organize things.

Next, there's also a lot of rich information about files, and we're just not using it. So even though MP3 files are tagged with a lot of rich metadata, email has quite a bit of metadata, JPEG images, of course, the EXIF information from the camera is quite a bit. You may want to say, show me everything with an aperture less than f-stop of 3.0 for things that you'd shot as close up or long distance. But there's no way to easily search for that. So there's just no easy way to find this information.

What's the Spotlight solution? Obviously, we want to make it fast and easy to find files, as opposed, of course, to making it confusing and difficult, which I suppose if that was the goal, we're already done. So how do we want to do this? By using metadata to enable richer searches. Metadata is information about the data or the contents of a file. We want to use that to enable users to search in a more natural way.

This allows you to organize files in multiple ways so that you can say, like the example I was giving before, mountains in France or French pictures, pictures I took in France. These are different axes that you can flip around. We'd also like to allow for additional metadata, things that maybe weren't originally envisioned as being associated with the file, but that you need, such as workflow state.

And the last point is a very key one. We don't want to require apps to change. It would be great if we could wave our hands. Ta-da! New world! Everyone's rewritten all their apps. Great. We all work with metadata and we're all very happy, but the world doesn't work that way. You have lots of code. It's difficult to change. So the minimal amount that we can ask you to do, the better.

Now I'm going to go through a quick demo of Spotlight to cover a couple of things that I'd like to highlight for you. First, of course, we'll start with the Finder. So some of this you saw in the keynote, but there's a couple of subtleties that I wanted to point out. So first query, type, of course, HTML.

We find 779 items out of whatever are on this disk, and that's pretty fast. If I was to type something like JPEG, like this, we find 1183, which is a little bit bigger. A few people missed this in the keynote, or at least I heard, but there are smart folders. So if I type JPEG down here, now I have a smart folder of JPEG images. I have another one of HTML as well. So you can save your searches and come back to them, and of course, they're re-executed at that time.

Another thing, if I was to type something like Frederick, okay, nothing matches. If I come over here, I'll just position these windows appropriately, and I go and create myself a new folder. But I'm not going to type Frederick normally. I'm going to do it the French way because there's so many wonderful French people at Apple. So we type it with a few extra accents there.

And when I hit return, notice that it showed up automatically in this query over here. So even though I typed the E's without the accents, we're doing case and diacritic insensitivity on the matches, and you saw that it was live as well. It showed up when I created it. And if I was to drag that to the trash, it disappears from the query. Next, I'm going to show you a little application.

Well, not so little, but a very nice application that we wrote internally called Bullsearch to demonstrate another feature that we have called grouping. So if I was to search for JPEG, again, we find, there's a bunch of items here, and I'm going to organize. There's an option for grouping. And so if I choose to organize these by title, some of these things were actually QuickTime movies that were compressed with JPEG, photo JPEG compression.

So now you see there's kind of these set of virtual folders that were automatically created based on the title. So there were five different versions of the Dungeons & Dragons trailer, and they get grouped together because they all have the same title. So it's sort of synthetic virtual, and then you have virtual folders that get created on the fly based on the set of attributes that you're grouping by. And this is a very powerful way to kind of build virtual hierarchies on the fly. So Finding Nemo, again, there's only three versions of that film.

These are different resolutions or bit rates for the web or so on, which is kind of a nice way to organize things. Now, the key part of this demo, actually, let me just quit that, is that I'm going to run Microsoft Word. Fire that up here for a second, and I'm going to bring up another Finder window.

I'm going to type the word "outrageous." Nothing matches the word "outrageous." Now, in Word, I type "Outrageous document, baby." So I've just created this. If I save it, and I'm going to call it "whatever," because if it had the word outrageous in the title, that would be too easy. And so, of course, it automatically matched on the content and showed up.

Key thing to observe, we didn't change Word, right? We don't have access to Word, so we didn't actually do anything. The find by content, that's pretty straightforward. Another thing, though, that is a little bit more subtle, if you pull up the property sheet, and this was alluded to in Bertrand's demo, you can see that there's some metadata here that was automatically filled in for me, both in the title and the author field. So just click OK there.

If I was to come back over here, and I'm just going to type something else that matches nothing, so you can see there's nothing there. If I type my last name, it matches the document, because we extracted that metadata. And again, this is the role of importers, which is a key thing that we're going to talk about here in a second. And this is how you get your app integrated into Spotlight.

So without any changes to Microsoft Word whatsoever, with the simple addition of this importer, we've managed to get it integrated very seamlessly so that you can search for things by their author, and so on and so forth. Okay, so let's quit out of here, clean this up, don't bother saving that. And so going back to the slides, if we can for a second.

Why do you care? Spotlight enriches the user experience, plain and simple. It makes your documents easier to find. When you're integrated properly by the presence of an importer, your documents, users can find them based on things that they remember about them, not just the title. So that can take many different forms, some of which we're going to go over. It doesn't require any code changes in your app.

There are things that you can do to take additional advantage of Spotlight if you want, but without doing anything at all, for example, like we did with Microsoft Word, you can take advantage of, you can get integrated into the Spotlight system. Users can find their documents more easily. It's like an additional feature for almost no work whatsoever. And it's another way to share data with applications so that applications don't have to necessarily know everybody else's file format for the information, the metadata that's important, which can be published by the importer.

Then it's more easily accessible to other applications, and they don't have to go through and parse your file format. There's a uniform way to access it. Now we're going to talk a little bit about the Spotlight architecture so you can understand kind of how it's put together and where you fit into the equation.

Spotlight is a system for storing and retrieving and querying and getting information about files. It's composed of a server, which runs in the background, daemons that help the server, and of course, importers. And I should not forget to mention the client API, which is part of Core Services. The importers are the sort of connection from the rest of the world to the system that stores it. So what does it look like?

Let's see if this can get this to work. So over here on the left side, you have an application, which goes and writes a file. When that file is written, the system notices this, and an importer is run to extract metadata from that file, which is then connected up to the Spotlight server, which stores it into the system's store.

On the right-hand side of the picture, you have the Finder icon, which could be any application, which issues queries and receives results and can display those. Not a lot of apps have a need for that, but for those that do, that's the sort of final piece of it.

There's three main concepts in the Spotlight system. Of course, you have importers, which I've mentioned here using the actual code terminology, MDImporter, which is how you extract metadata from a file and publish it to the system. That's pretty straightforward, and we're going to write one later on in the talk, in a minute here.

You have an MDQuery, which is a way to write an expression about the attributes or the files that you want to find, and retrieve them. And then the leaf items are MDItems, which represent files, and items are made up of attributes. And I use the words metadata and attributes sort of interchangeably. An attribute is a name, a type, and a value, and represents some information about the file.

Ways to integrate with Spotlight. You can write an importer if you have a custom file format. So if you work with standard file formats such as JPEG or AIFF or MP3, you don't have to do anything. We're going to cover the basic data types that Apple supports natively.

So there's no work to be done if you work with standard file formats. With some caveats in the sense that you want to put metadata in there if you can. But the first thing you can do if you have a custom file format is to write an importer. This is what enables sophisticated searches for your documents.

You want to put useful metadata in your documents. So that's sort of what I was saying is that if you can, for example, the EXIF data that comes in a camera, preserve it, make sure it stays in there or put additional information in there that we can extract because a lot of file formats already have support for a variety of metadata. And then if you need, you can, the final level of integration is to actually use Spotlight queries for tracking documents or you know, displaying results.

Now we're going to switch to talking about importers. And in this section of the talk, we will actually go through and create one and write it, install it, and show you how it works. What are the rules of the game? Importers need to publish metadata that helps users search.

Kind of harping on this. You want to allow for richer previews, which in the sense that some attributes are difficult to compute. So the length of a song. If you have a variable bit rate file, computing how long it takes, you know, what's the duration, is difficult. So you would want to compute that once and store it as an attribute so that we can say, oh, find me songs that are longer than three minutes. That's useful.

You want to avoid publishing metadata that's private data, binary data, icon previews. This is not what Spotlight is about. Spotlight is a sort of fast and efficient way to search for user-oriented data, things that users remember: the labels of layers in a Photoshop document, the names of tracks in a multitrack audio editor or movie editor. These are the things that you want to publish.

You want to be publishing data that people would remember, that they would want to search for. A chunk of some data structure that's internal to your app, that's binary, that the user has no connection to, no, that's not something that they'd want to search for. And at the other end of the spectrum, too much noise, too many attributes, can confuse the user. If you have 500 attributes, that's probably not the right approach.

So what are attributes? Examples of good attributes: copyright, title, author, dimensions. There's a special attribute called kMDItemTextContent, which we use to represent the text content of a document, and this is how we do the full text searches. So that can take a couple of different forms, and I'll cover that in a couple of minutes.

Some bad attributes would be, you know, app-specific implementation details or binary data that the user can't easily search on. We've predefined a whole bunch of attributes. So you can see the list here, kMDItemTitle, authors, keywords, projects, and so on. There's quite an extensive list. It doesn't cover everything, of course, but it's a fairly broad set of things. And if you look in the include file, Metadata/MDItem.h, you will see the full list of these attributes.

So writing an importer. This is where we're going to actually step through the process of writing one. What do you have to do to do it? In Xcode, we've got a metadata importer template. There's one function to implement. So it's not that difficult. You can use your existing document reading code with the caveat that you don't want to have some piece of code that goes and inflates the entire data structure. You have some multi-megabyte data image or whatever, and it gets pulled into memory and exploded and decompressed.

That's not what you would want to do. You would want to sort of scrape the file, get the interesting bits of metadata, and then publish that. So a lightweight version of your document reading code. If you have a custom file format, you know how to read and write it. So you probably have code that you can use. And you return a CFDictionary of the attributes that you would like to publish for that document. So those are sort of at a high level what it takes to write an importer.

There's three steps using the MDImporter template. You have to create and define a GUID, edit the Info.plist, and then implement the code. One, two, three. For defining the GUID, there's a command line tool called uuidgen. Type that in the terminal, run it, you get a string, put that into the code, edit the Info.plist to associate that GUID with the code, and then identify the UTI types that your plug-in handles.

So if you have a custom file format with a custom file type, you would put the file type for that document into the LSItemContentTypes key in the Info.plist, and that's how the system knows to associate your importer with that data type. And we're going to go through the code. Then, of course, you have to implement the code. There's a function, GetMetadataForFile, and that's that. So let's write an importer. And you'll see this doesn't actually take too, too much. So I'll run Xcode. Oh, can we switch over to the code machine? Okay.

Okay, we'll create a new project. And this is an Apple standard plug-in. We have metadata importer predefined. And we'll call this the source importer, although that's just because that's what I've been typing. It's not actually a source importer. Well, it sort of is. Let's see. So if we pull up main.c, you can take a quick look here.

We have a template, and there are the three steps that I talked about. First thing it says is create a unique UUID for your importer. So fire up Terminal. I type uuidgen. And I get this very beautiful 128-bit ID. And I will push that down there. And I paste that in there. Okay. And following the instructions, go to step two, edit the Info.plist. All right. I can do that. And I come back over here, and there is the metadata importer plug-in ID.

And I will paste in there. And once again, this other part down here. And then the last thing that it said to do was to change the UTI type. So in this case, I'm going to say that we edit, or we support, public.c-header files. Save that.

And now, the third step is to write the code. So of course, I'm going to sit down and write a big chunk of code right now that parses C header files. No, we're just going to cut and paste a little bit. There's a couple of header files that I need to throw in here. I'll put those up here at the top. And I will put in the prototype for my function, which I wrote ahead of time.

And I'm just going to cut and paste a nice big chunk of code down here below that is very rudimentary, but does enough work to parse a C header file and make this demo work. The last thing it said: implement GetMetadataForFile. So here we have that piece of code that's empty at the moment. You're passed a couple of different arguments, and the main one, of course, is the attributes dictionary ref that you get.

This is what we're going to fill out with the information that we would like to publish for this file. We're also told the content type UTI, so if you have an importer that handles multiple types, you'll know what we think the type of the file is. And the last thing, most important, is a reference to the file, the path to the file that we would like you to parse. Now, I've already gone and filled in the bit of code that does all this. I'm just going to cut and paste this. Into here, replacing this empty bit, and then we'll go through it really briefly.

And so this, we get the full path and then we have this function getTypeDefNames, which given a path, returns to us the number of type defs in a C array and then does the magic to convert that into a CFArray, which we then add as a dictionary value with a particular, this is the attribute name, comAppleSourceTypeDefName.
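The helper shown on stage is not reproduced in the transcript. As a minimal sketch of what such a rudimentary typedef scanner might look like, the function below (a hypothetical `collect_typedef_names`, not the speaker's code) works on header text already in memory and takes the identifier just before the semicolon that ends each `typedef`. It assumes simple declarations: function-pointer and array typedefs are skipped, and a `typedef` inside a comment would confuse it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 64

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: scans C source text and copies each typedef'd
 * name into names[]. Returns the number of names stored (at most
 * max_names). */
size_t collect_typedef_names(const char *src,
                             char names[][MAX_NAME_LEN],
                             size_t max_names)
{
    static const char keyword[] = "typedef";
    const size_t klen = sizeof keyword - 1;
    size_t count = 0;
    const char *p = src;

    while (count < max_names && (p = strstr(p, keyword)) != NULL) {
        /* Require a whole-word match, so "mytypedef_t" is ignored. */
        int word_start = (p == src) || !is_ident_char(p[-1]);
        int word_end = !is_ident_char(p[klen]);
        p += klen;
        if (!word_start || !word_end)
            continue;

        /* Find the ';' that ends the declaration, skipping semicolons
         * nested inside a struct/union/enum body. */
        int depth = 0;
        const char *q = p;
        while (*q != '\0' && !(*q == ';' && depth == 0)) {
            if (*q == '{')
                depth++;
            else if (*q == '}' && depth > 0)
                depth--;
            q++;
        }
        if (*q == '\0')
            break; /* unterminated declaration */

        /* Walk backwards from the ';' over whitespace, then over the
         * identifier that names the new type. */
        const char *end = q;
        while (end > p && isspace((unsigned char)end[-1]))
            end--;
        const char *start = end;
        while (start > p && is_ident_char(start[-1]))
            start--;

        size_t len = (size_t)(end - start);
        if (len > 0 && len < MAX_NAME_LEN) {
            memcpy(names[count], start, len);
            names[count][len] = '\0';
            count++;
        }
        p = q + 1;
    }
    return count;
}
```

In an importer, the names collected this way would then be converted to a CFArray and stored in the attributes dictionary under the custom attribute name, as the walkthrough describes.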

And there's one thing I have to do here because we have a custom attribute name. And we pass in the CFTypeDefs, which is a CFArray. So I can save this. Now the last thing that I have to do, because we're, we have a custom attribute name, we

[Transcript missing]

So I need to call this com-- sorry, I called it this. I copied this.

This is where we actually really say what it is. I'll talk about this again in a second. I'm just going to gloss over this for a moment. That's all taken care of. Save that. We've saved this and we're going to build it. If I didn't screw anything up, okay, good job. It built. Now, what do we do with it? We go into the source importer directory. If you look in the build directory, there's a sourceimporter.mdimporter.

If I cp -r sourceimporter.mdimporter into ~/Library/MDImporters, I'm just putting it in my home directory here. I'll talk about where else you can install it. I'm just putting it here for the second. We drop it in there. Basically, that's all it took to install it. Now, in Developer Tools, there's a program called mdimport. We do -L. If we did everything properly, it should show up in the list, which it did not. Okay, so I need to run mdcheckschema.

Check schema, and that is on schema.xml: successfully parsed. Oh, that's right. Thank you. Yeah, so clearly I've used Unix for only about six months, and of course being up on stage helps a lot. Whoever said that, I really appreciate it. I would have spent another ten minutes realizing that. Okay. So now we will successfully install it properly. And if we run mdimport from Developer Tools -- ah, there we go. Beautiful.

[Transcript missing]

I had a sample header file in this test directory. If I run mdimport again, with the -d3 option so it will print out loads of information, I have this file myheader.h. First off, if I type -- let me just go ahead and run it.

What we can see happened here is it says importing data from file, and it tells me exactly what file and what type it thinks it is, public.c-header, which is useful to see that it matches what we defined ourselves as. And then we can see that, hey, com.apple.source.typedefs, that's the name that we defined for our attribute. And there's three typedefs, myinteger, mybiginteger, and foostruct.

And if we were to look at myheader.h, we can see that there are three typedefs in here that were extracted properly and published as header file. So this is a way that, you know, we've just defined a new importer and installed it in the system and successfully had it publish metadata, which, if we'd like to, go into Finder, and if I say, what file defines mybiginteger, we see that myheader.h shows up. So we're fully plugged into the system.

[Transcript missing]

So we wrote an importer. There's a couple of things we still need to talk about, though. MDImporters run in several different contexts. So in the case that I showed there, we ran mdimport by hand on a single file. It ran, extracted the metadata. That's all very nice, well, and good. However, mdimport can run in a couple of different scenarios. For example, if someone takes and plugs in a FireWire hard drive with 100,000 files on it, we've got a lot of work to do, and so it can be part of a slightly longer running process.

This, of course, has implications. So when you run it once and it works, you're all happy, that's great, and you don't notice anything necessarily, because even if it goes and allocates a lot of memory, you may not feel the impact. However, when it's running, excuse me, as part of a longer-lived process, if it has leaks or trashes memory, you're going to start to notice these things. So you need to be a good citizen. You need to pay attention to this. We're also taking defensive measures as well. So if you're not a good citizen, we'll make you be a good citizen.

However, you want to avoid using a lot of memory if you can. You have to pay attention to things like leaks, and we have a lot of tools to do this. And you want to use some caution when reading large files. So in some cases, like I said, when you plug in a drive with a whole bunch of files on it, you don't want to necessarily just read the file like you would normally, because you can pollute the buffer cache of the computer, which can cause a lot of unnecessary paging activity, because in that scenario, data's not likely to be used again.

So if you're working with standard POSIX file descriptors, you can call fcntl with F_NOCACHE. If the data is in the cache, because it was a recently saved document, you'll get it from there. If it's not in the cache, you won't waste time polluting the cache with data that you're never going to read again. So that's always a win. If you're using Carbon, you can use the noCacheMask. If you're using Cocoa, you can get the raw file descriptor and call fcntl.
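For the POSIX case, a minimal sketch might look like the following. The function name `open_without_caching` is made up for illustration. `F_NOCACHE` is a Mac OS X `fcntl` command, so the call is guarded and compiles away on systems that do not define it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens a file for reading and asks the kernel not
 * to keep its data in the buffer cache. Returns the descriptor, or -1
 * if the file cannot be opened. */
int open_without_caching(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

#ifdef F_NOCACHE
    /* A non-zero argument turns caching off for this descriptor.
     * Failure here is not fatal to the import, so the result is
     * deliberately ignored. */
    (void)fcntl(fd, F_NOCACHE, 1);
#endif
    return fd;
}
```

The importer would then read from the returned descriptor as usual and close it when done.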

[Transcript missing]

As you saw, I installed it into ~/Library/MDImporters, which works pretty well for initial testing and debugging, but most likely you would want to install your importer into /Library/MDImporters. For debugging, I used mdimport -L, which lists what importers are installed. That's a quick test to see that it got there, which is where I had that heart attack earlier.

When you're testing it to see what's happening, you can use mdimport with the -d option to get different levels of debugging. -d4 is probably way too much. You can give it a path to a hierarchy of files, or you can give it a specific file as well. It's in developer tools, and it's the way you would test things out.

If you need to define new attributes, and this is what I kind of glossed over in the code walkthrough, there's a schema.xml file that's part of the project. And you can define new attributes in a couple of different ways. So depending on what your needs are, the first one, we have a string attribute, which we define as a type of CFString, and we give it a name. You can have number, CFNumber.

And the last one is kind of an interesting one, and this is what I used in the source importer. It's a multivalued string. What this means is you can think of the attribute as an array of individual values. So foo, bar, blah, those are all separate entities that are in an array for that attribute.

Then you would localize, or you can provide localization for your attribute with the schema.strings file, which is again a standard convention. It's a UTF-16 file. And you map. You can also map what you wanted to call, or what the name you gave it in the file, which is not something you would display to the user.

And then, in my favorite language, the only other language I know, Italian, what you would want it displayed as. And you can check this with mdcheckschema. You notice that we're using a kind of funky naming convention here, the reverse DNS style naming, but with underbars instead of periods. That's because we wanted to keep these attribute names compatible with Cocoa key-value coding, which doesn't allow for periods in the name. So that's why we did it that way.
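To make the naming convention concrete, here is a small sketch that turns a reverse-DNS name into the underbar form. The function name `make_attribute_name` and the sample attribute name are illustrative assumptions, not part of the Spotlight API.

```c
#include <stddef.h>

/* Hypothetical helper: copies a reverse-DNS name into out, replacing
 * each '.' with '_'. Returns 0 on success, or -1 if out is too small. */
int make_attribute_name(const char *dotted, char *out, size_t out_size)
{
    size_t i;

    for (i = 0; dotted[i] != '\0'; i++) {
        if (i + 1 >= out_size)
            return -1; /* no room for this character plus the terminator */
        out[i] = (dotted[i] == '.') ? '_' : dotted[i];
    }
    if (out_size == 0)
        return -1;
    out[i] = '\0';
    return 0;
}
```

Given "com.apple.source.typedefs", this yields "com_apple_source_typedefs", a name with no periods in it.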

Apple has written a whole bunch, well, a whole bunch, a couple of importers for the standard file formats that we support natively, and you can expect us to continue to do that. So things like JPEG, PNG, TIFF, so on, we have that covered. QuickTime, of course, you would expect that. PDF. And then things that the application kit can open for text documents, which includes RTF and RTFD and Word documents, we support that as well. So you don't have to do those.

So in summary, importers are pretty simple to write. There's not a lot to it. There's a bit of glue code that you have to get together. We provide that in the template. It's a CF plug-in, so it's not any great magic. It makes your documents easier to find. This is the connection.

This whole system, the whole Spotlight system, lives and dies by the quality of the metadata that's there and how easy it is for users to search for things. So it makes your documents easier to find. It's in everybody's best interest to do it. It handles full text indexing with the kMDItemTextContent attribute. And it's the sort of thing you could go home and write one tonight. So with that, let's talk about queries and searching.

Who needs queries? Well, not a lot of people, actually. There's not that many finder applications that need to be written. But apps that have a custom UI where the focus is working with groups of files, so take some of these things and do some of this stuff over here, and that's the main focus of your UI where you're not going through a traditional open save panel, those are the kinds of things that would benefit from working with queries. So asset management, workflow, or file type management applications.

Even something like Soundtrack, which you may not think of as working with files, but in fact, I don't know if you're familiar with the Soundtrack application, but you select different sets of instruments and it issues queries to do this. This is something actually that could take advantage of the Spotlight system to do queries on the attributes about the instruments that it's searching for.

Queries find items based on their attributes. Attributes that you can search on are the metadata that's published by the importers, of course, file system attributes, that's what we've always been able to search on, the file size, last modification time, all those boring things that you don't really always think about but are useful sometimes, and of course full text content.

What does the query language look like? It's a simple C-like expression with standard operators like equals, not equals, greater than, what you would expect. You can have parentheses for grouping. So what does it look like in an actual expression? We have two of them there. kMDItemKeywords equals star foo star, and that's how you would do a substring match.

And then the bottom example is slightly more complex. When I did the example of searching for Frederic in the Finder before, you noticed that it matched even though the name had accented E's and I hadn't typed an accented E. And what you see at the end here is, whoops, I went too far.

Okay. That little cd at the end stands for case insensitive and diacritic insensitive. And because we have the asterisks around both ends of it, it's a case and diacritic insensitive substring match. Now, how do you write a query? There's three parts to it, really. You first create an MDQueryRef, and you have the standard kCFAllocatorDefault. And the string you pass in is the expression that we had just on the previous screen. In this case, we're saying kMDItemTitle equals star tiger star, and we have the cd.
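Written out, the expression described here is `kMDItemTitle == "*tiger*"cd`. The sketch below builds a string of that shape from an attribute name and the user's text. The function name `build_substring_query` is hypothetical, and rather than guess at escaping rules it simply rejects text containing a quote, backslash, or asterisk.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes  <attribute> == "*<text>*"cd  into out.
 * Returns 0 on success, or -1 if the text contains characters this
 * sketch does not handle or if out is too small. */
int build_substring_query(const char *attribute, const char *text,
                          char *out, size_t out_size)
{
    /* Quotes, backslashes and asterisks are significant in the
     * expression, so refuse them instead of escaping them. */
    if (strpbrk(text, "\"\\*") != NULL)
        return -1;

    int n = snprintf(out, out_size, "%s == \"*%s*\"cd", attribute, text);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The resulting string is the kind of expression that gets passed in when the query is created.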

And then we have some additional options for grouping and sorting, which I showed in full search, but we're not going to cover here today. We'll pass NULLs for those. Then you start the query running with MDQueryExecute. And in this case, we've specified that we want updates, which is a live query.

If you just want to issue a one-shot query, you pass zero, I believe, for that argument, and then you don't get any updates. It just, that's the results, that end of story. Then you read the results. When you get notifications that there's results available, you get the result at query index I, and there you have it. Thank you.

Queries are designed to work with CFRunLoops. So there's three phases, really. There's progress, okay, you're getting results, things are coming in from the initial set, we're going through. Then you get a finish notification that says, okay, that's the initial set. If you selected live queries, then you'll start to get updates as things come and go from the query set.

Now when you saw, like, the liveness things, I did a query, nothing matched, then something popped in, that's an update notification coming in saying, hey, there's a new result. You have, like I mentioned, one shot or live queries, and the sorting and grouping features, which, again, unfortunately we're not going to cover today because we will not have time. So if I can come back over here, we're going to go through a little sample program that we have that does queries.

I'm not going to write the code, but we will go through it briefly to kind of see what it looks like. I have a little application that looks like this guy right here. And there's a search field which is hooked up in the code to a search now method. So when I type in a string, that gets passed to this method here, search now, in the code. Let me slow down a little bit.

First thing we do is set the title of the window, not very interesting. We create an NSString, since this is a Cocoa application, and we do a * ==, and then we put in the string that they typed, that's the %@, which is what was typed into the search field, and we put it in quotes and make it a substring match with the stars on either end, and we say cd for case and diacritic insensitive. Then we pass that on to start query, which is another method down below, and that's right here.
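The string that gets built here can be sketched outside Cocoa as well. The following is a minimal Python sketch, not the sample's actual code: substring_query and escape_value are hypothetical helpers, and the backslash escaping of quotes, backslashes, and asterisks is an assumption about how to keep typed text from being read as query syntax.

```python
def escape_value(text):
    """Backslash-escape characters that would otherwise act as query syntax.

    Assumption: a backslash, double quote, or asterisk typed by the user
    should be matched literally rather than interpreted.
    """
    out = []
    for ch in text:
        if ch in ('\\', '"', '*'):
            out.append('\\')
        out.append(ch)
    return ''.join(out)


def substring_query(attribute, text, modifiers="cd"):
    """Build an expression of the form: attribute == "*text*"modifiers

    The asterisks on both ends make it a substring match; "c" and "d"
    request case- and diacritic-insensitive comparison.
    """
    return '{} == "*{}*"{}'.format(attribute, escape_value(text), modifiers)


print(substring_query("kMDItemTitle", "tiger"))
print(substring_query("*", "HTML"))
```

The first call reproduces the title expression from the slides; the second uses the wildcard attribute the sample application uses to match against any attribute.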

Here we add notifications. This is the very first query that we've run. We add some notification observers for progress, finish, and update. Then we call MDQueryExecute just like I mentioned on the slides earlier. Now, when we run this program, let's go ahead and build it and run it. Here's what it looks like, and if I type HTML, we'll get the same results we'd get in any of the other applications, and we have 779 results.

All right, pretty straightforward. Where did that all come from? When we got updates, we asked the TableView to reload its data, and I'm going to talk about that, how we actually display the data later on in the last bit of this talk. And when we get the done notifications, we don't have to actually do anything, we just note that it's done.

And that's basically all there is to it, to issuing a query. When you get updates, you tell yourself to process them, and in this case, like I said, we just asked the TableView to reload the data, which is where we actually go and display it. And that's that.

So, I'll actually, I'll leave that, I'll bring that back 'cause I'm gonna need it next. If we can go back to the slides, actually. So really, few apps need to perform queries. If you need to do it, it's not that hard, but it's not the sort of thing where you have to think, oh, what do I have to do to adopt Spotlight? I need to do this. Not everybody needs to. It's great if you do, and it's not very hard, but do it only if it's appropriate for your application.

Queries are C-like expressions about the attributes that you want to search. So you saw we had some very simple expressions with standard equals. You can build much more complex ones with parentheses for grouping, so you can do ors and ands and so on. The modification date is greater than this date and less than this other date, or it's in this other date range. I mean, you can build some fairly sophisticated things if you'd like. It's well integrated with CF run loops. It would be kind of a pain if this was bolted on the side and you had to jump through hoops and do contortions to make it work with your application.
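The grouping with parentheses, ands, and ors can be sketched as plain string composition. This is a hedged Python illustration: all_of and any_of are hypothetical helpers, and the attribute names and values in the example are chosen only for illustration.

```python
def all_of(*clauses):
    """Join clauses with && and wrap the group in parentheses."""
    return "(" + " && ".join(clauses) + ")"


def any_of(*clauses):
    """Join clauses with || and wrap the group in parentheses."""
    return "(" + " || ".join(clauses) + ")"


# Illustrative only: title or author matches, and the image is tall enough.
expression = all_of(
    any_of('kMDItemTitle == "*tiger*"cd', 'kMDItemAuthors == "*dominic*"cd'),
    "kMDItemPixelHeight > 600",
)
print(expression)
```

Wrapping every group in parentheses keeps the precedence explicit, so nested groups can be combined without thinking about how && and || bind.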

But, you know, we've worked very closely with the Finder team to meet their needs and those of other applications like the Bullsearch demo or that Ask Mac demo. So we kind of understand how it should be integrated properly. There are options for doing live queries. So if you want to continue to receive updates and notifications on the fly, we support that. And as I mentioned or alluded to and sort of demoed with Bullsearch, there are sorting and grouping features, which can provide you with some pretty advanced functionality if you require it.

Now, displaying metadata. As you saw when I ran Ask Mac, it displayed some information about the files in that it was displaying the names, but it got that through the Spotlight system. It's pretty straightforward to display metadata. You have to have an item reference. You can get an item reference in one of two ways.

You can either first get it as a result from a query, or you can create it for an explicit path. So if you know the path through some other mechanism, it was something returned to you via a file open save panel or what have you, you can just explicitly create the item.

Once you have the item reference, then you can get a list of attribute names about the item, a dictionary, an array, I'm sorry, of the names of attributes that exist for that item. So if you don't know anything about it and you want to display arbitrarily what's there, you can get that list.

And then go through and get the actual values. Or if you know exactly what you want, you can use the MDItemCopyAttribute family of calls. And I say it's a family because there are different variants depending on whether you want to get one, a few, or all of the attributes for an item. And then you can use that information. And it all works with standard CF types. So if you have a multivalued string, you'll get back a CFArray with all of the values for that attribute name.

So going back to the source importer, if I call MDItemCopyAttribute for this specific attribute, com, apple, source, type defs, I would get back a CFArray that contains the values for that attribute name. Of course, if the attribute doesn't exist, you'll get a NULL, so you need to be aware of that.

You want to use the one that's most appropriate. Of course, the bulk calls are better in the sense that if you're going to get five attributes and you know you're always going to get five attributes, build a CFArray with those five attribute names and get all five of them at once. Sort of standard best practices. That way you avoid round trips back and forth to the server. So let's show you how we display metadata back in the Ask Mac application.
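The bulk-fetch advice and the missing-attribute case can be illustrated with a toy model. This Python sketch is not the MDItem API: a plain dictionary stands in for an item, copy_attributes and display_value are hypothetical helpers, and the assumption that a bulk fetch simply leaves out attributes that do not exist is made for the sake of the example.

```python
def copy_attributes(item, names):
    """Toy stand-in for a bulk fetch: one call returns every requested
    attribute that exists on the item; missing ones are simply absent."""
    return {name: item[name] for name in names if name in item}


def display_value(value):
    """Turn an attribute value into text for a table cell."""
    if value is None:                      # attribute does not exist
        return ""
    if isinstance(value, (list, tuple)):   # multivalued attribute
        return ", ".join(str(v) for v in value)
    return str(value)


# Made-up item contents, for illustration only.
item = {
    "kMDItemTitle": "Tiger notes",
    "kMDItemAuthors": ["A. Author", "B. Author"],
}
wanted = ["kMDItemTitle", "kMDItemAuthors", "kMDItemKeywords"]
fetched = copy_attributes(item, wanted)    # one request instead of three
for name in wanted:
    print(name, "=", display_value(fetched.get(name)))
```

The display code checks for a missing value before formatting, which mirrors the need to handle a NULL result for attributes an item does not have.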

So going back down to that mysterious function that I alluded to about reloading the data. Here we have tableView:objectValueForTableColumn:row:. And in this case, you see we come in and we take the identity field of the

[Transcript missing]

Can we go back to the slides? No, thanks. Okay.

So in summary, items consist of a list of attributes. Items are the representation of a file in the Spotlight system, and it's a list of attributes, and attributes are name, type, and value. You can get a list of all the attributes for an item. So if you know nothing at all about it and you want to find out everything that's there, you can get the full list of names, and then you can go through and retrieve the actual values for each one.

There are calls in the MDItemCopyAttribute family for one, some, or all of the attributes that are associated with the file, and you want to use the bulk calls when possible. One attribute that I should mention that is special is kMDItemTextContent. You can't retrieve that in the sense of give me the text content for this document. It doesn't work that way, just so that you're aware of it.

Now, we've talked a bit about the C API, and there's also a Cocoa API that I'd like to mention, although we're not going to go into any code samples for it. As you might expect, the lower-level Core Services API is very straightforward and procedural. There's the Cocoa API, which is based on NSMetadataQuery, and as expected, is higher-level and object-oriented. It manages queries and results.

Use the NSPredicate class, which should have been in blue, but anyway, to initialize an NSMetadataQuery. NSPredicate is an expression about the attributes that you want to find, and that's how you would build the expression as opposed to just using a straightforward string. NSMetadataQuery also offers a grouping feature, and it's key-value coding and observing compatible, so you can hook things up to NSArrayControllers. You can also use NSTreeControllers for automatic connections between queries, their results, and their display.

Another factor that I'd like to talk about is full text indexing. I've mentioned a few times throughout the presentation that Spotlight uses full text indexing. Search Kit has undergone some dramatic improvements for Tiger. Content indexing is considerably faster. Incremental search, which is something that wasn't really doable before, is up to 20x faster.

So you don't have to wait for all the results to be relevance ranked before you get results. We can start getting results on the fly, which kind of gives you that find-as-you-type functionality. And when you do want relevance ranking, they've improved the relevance ranking quite a bit.

Now why am I mentioning this? In some cases it's appropriate to use Search Kit directly for your own private index, such as the help content or the Xcode documentation. These are things that are more appropriate to a private or app-specific index. And the Search Kit APIs, which were made public in Panther and have been enhanced in Tiger, are there for you to use and are fully documented.

What's the current state of things? So obviously this is not a final release, so we're not done yet. There's going to be issues and things that you'll run into. There's some limits on attribute size that we've kind of self-imposed. We're sort of proceeding very cautiously with this whole project because this is sort of new territory in a lot of ways.

I mean, I know some of these things have existed before, but we don't want to put ourselves into a situation where we wind up with something that's not sustainable in the future. So we're kind of defining a fairly tight envelope, and then where we bump into it, we look at it.

Well, why did we bump into that limit there? Is this the right place to expand things, to push the boundary? And when appropriate, yeah, we'll push it. So like I said, there are some limits on the attribute size and number of attributes. When you run into these, talk to us.

Let us know what it is that you're trying to do. Why is it not working? Sometimes it's the right thing to increase the limits, and sometimes it's like, no, maybe that is an indication that things should be done in a different way. We need your feedback. Like I said, this is kind of new territory, having this kind of metadata functionality in such a broadly available general-purpose operating system. So we want to hear what people are looking for, what they need, what they're missing, what doesn't work for them with what we have today so that we can build the system better and expand the system to meet those needs.

So summarizing what we've talked about today, importers are the main connection from your application file format to the Spotlight system. So importers publish metadata from files. Spotlight takes that metadata and makes the documents easier to find and allows them to be displayed more richly. Spotlight also allows applications to interact in more sophisticated ways.

So as I mentioned before, you don't have N applications that have to know about M other applications and all their file formats. They can just sort of ask for the attributes about the file. They don't have to bother going to parse it because the data has been published.

What do you need to do? What is the end result of this? If you have a custom file format, write an importer. That's the biggest thing, that's the biggest favor you can do for your users, for us, and for yourself. Put useful metadata in your documents. So a lot of file formats already have support for various types of metadata, like I showed with Word, there's that property sheet.

Make sure to populate that where you can with things that are interesting, things that would help the user find that document later on. And if you're doing special things, manipulating a document and copying it, or doing a save as, preserve the metadata where possible, or when appropriate. So if there's an EXIF chunk in a JPEG file, and you haven't modified the document so completely that it no longer makes sense, preserve that, copy it, as part of the file format.

So now, where can you find out more about this? Because I've gone through this pretty quickly and, you know, it's not like you're going to necessarily have everything in your head right now. There's a whole bunch of example code and documentation online as well as some updates to what's online, and that disk image is where?

Connect.apple.com. Okay, so there's an additional disk image of documentation on connect.apple.com. So here we have the different Spotlight importers, where you would find that documentation, the MDImporter reference. The template that's there pretty much describes it all, so there's not too, too much that you have to worry about. Oops, put that in the right place. MDItem, to find out how you would manipulate that, what the functions of that class are, not class, but what the family of calls are, how you would make use of them. The query reference, schemas, and so on.

[Transcript missing]