Application Technologies • 58:29
Modernize the way you search for files in your application by leveraging the Spotlight Query APIs in Mac OS X Tiger. Watch and code along as we write a sample application, integrating Spotlight search capabilities and enabling complex queries on the Spotlight data store.
Speakers: Xavier Legros, Vince DeMarco
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it may have transcription errors.
[Xavier Legros]
Welcome to session 104. My name is Xavier Legros, and I'll be talking to you in the next hour about using the Spotlight query APIs. So what are we going to do? What we're going to try to do first is give you an overview of the architecture, the technology behind Spotlight. This is going to be pretty much a follow-up to this morning's talk, where Dominic went through a couple of details about how to write a Spotlight plugin. What we're going to do is pretty much implement the second part of the architecture diagram, if you were at the session this morning. First, we're going to do getting metadata out of a file. So we're going to teach you how you, as a developer, with a very simple set of APIs, can query the data store to retrieve metadata attributes on files. And then we'll finish with Vince, who's going to be coming up here. He will show you another part of querying the data store. In this case, it's pretty much the contrary of querying a file: we have a set of search parameters, and we want to get back all the files that satisfy this search.
All right, let's get started. So, understanding the technology. What's very, very important for you as a developer to understand, because there are many solutions out there that try to mimic what Spotlight is doing, is that Spotlight is not bolted onto the system. Spotlight is fully integrated inside Mac OS X Tiger. So what does that mean? Well, it means first that at the heart of the Spotlight technology is actually a server. And that server is the central piece that you as a developer will be interfacing with. So let's see the two different sets of APIs that we have, and the two different feature sets we have for Spotlight.
On your right, you're going to see the first thing is that we have a bunch of documents on our hard drive. It could be on the desktop, it could be on the document folder. And the first part of the Spotlight technology enables you as a developer to pretty much extract that metadata from the files that you know to actually put them back in the Spotlight server. What the Spotlight server is going to do when your plugin sends back that metadata, it's going to do a couple of things. First thing is that we're going to store the metadata attributes inside the metadata store.
And in that data store, these are going to be attributes such as, you know, kMDItemTitle, kMDItemAuthors, and Dominic went into detail this morning about how to write that first piece of software, the Spotlight plugin, the Spotlight importer. And then we have another store, the content store, where what we're going to store is more like the text content. The main idea here, by the way, does not matter too much for you as a developer; it doesn't impact you as a developer using the technology. But what's important for you to understand is how things fit into the system. The content store will store pretty much a binary representation of the text that you're going to pass with the kMDItemTextContent attribute.
All right, the second part of this technology, which we didn't talk about this morning, is the query APIs that you use to query the data store. And when I say the data store, it could be the metadata store or the content store. But pretty much here, it's a simple set of APIs to interface with the Spotlight server to start queries. And this is what we're going to be doing here in this hour. So just in case you didn't go to the talk this morning about writing a Spotlight plugin, let me just go quickly over how things work inside the system. OK. So first, here is your application.
Your application is going to write a document. When you close the document, what's going to happen is that the kernel is going to send a notification to the Spotlight server. Once again, this is another example where you can see that what we did with Spotlight is really to integrate at as low a level as possible, to be as powerful and as fast as it could be. So here, the kernel is going to send a notification to the Spotlight server. And the Spotlight server is going to go and say, OK, so what's that file? What's the file type? Is it com.apple.blah blah blah? Or is it public.text? The first thing the Spotlight server is going to do is say, do I have a Spotlight plugin that knows how to read that file format? In this case, because you're good developers, you already have a Spotlight plugin available for your file format. So what the Spotlight server is going to do is take a reference (in this case, we're going to pass you a CFString), call your plugin, and pass it the reference to the file. When I say reference, I mean something like a path to the file. Once again, Dominic this morning went into great detail to explain to you how to implement that part of the software. So that's great. You export the data. Boom, the Spotlight server stores it. Data store or content store, it doesn't matter for you. It's pretty much transparent.
Now the second part is querying the data store. And here in this case, represented with orange, I guess, you'll have a set of APIs that enable you to do the query. So as a developer, if you really want to simplify, you can be green, or you can be orange. Isn't that great? That was my joke for the session, by the way.
All right, so what do we have? As a developer, you have three different set of APIs to help you interface with Spotlight, okay? The plugin APIs first, and once again, we went through great details this morning. The key point here is that if, as a developer, your application or your tool or whatever your business is on Mac OS X generates a custom file, what I mean by that is a file that only your application knows how to read, you have to write a Spotlight plugin to interface with our system, okay? And once again, this is very straightforward. If you didn't go this morning, come back to the lab and we'll show you. This is very, very straightforward.
The second set of APIs, which is kind of the first part of querying the Spotlight server, is what I call the get metadata APIs. Here, you have a file, and what you want to do is find out what metadata is associated with that file. So you have an FSSpec, you have a CFURL, whatever you want. And you want to find out maybe the creation date, maybe who the author is, what the keywords are. So that's going to be the first thing we're going to do today.
And the last part, which I think is actually a very powerful set of APIs and very simple, is you want to query the database and get back results like a set of files. So you want to say, show me all the files on my system that are JPEGs, that have the flash turned on, and that type of resolution. And then we send you back a set of files. You know, a little bit like what the Finder is doing, obviously.
As a developer, what's important for you to understand is that in your project, Carbon or Cocoa, it doesn't really matter. In Xcode, what you have to do is link against CoreServices. The metadata framework is actually present inside the CoreServices framework. And I suppose most of you are already linking against that framework. All right, so I gave you a quick overview of the architecture and the different sets of APIs you can use depending on what you're trying to achieve. Remember, if you have a custom file format, write a Spotlight plugin. Then after that, you want to do queries. We have two different types of queries. The first type of query is that you have a file and you want to get the metadata. And that's what we're going to do here in this part of the presentation. All right, so first, as a developer, why would you want to query the Spotlight server to get back metadata? Well, a couple of cases. Case number one, you're an application, and it could be you're doing 2D content or you're doing 3D content, but your application generates files and also uses different other file formats. What I mean by that is, take for instance a 2D drawing application, and the guy, the user, goes and selects a JPEG file, drags and drops it inside your application.
Maybe you want to keep a reference to that file, and maybe you want to present the metadata associated with that file. So this is the type of API you will be using. Another idea would be, obviously, if you wanted to display the properties for your own document, or if you're in the business of tracking documents. For instance, maybe your application manages different revisions of files across the product cycle or the project cycle and you wanted to query, well, I'm working on advertising for the Apple computer right now on the T1 and I want to get all my PDF files and I want to get all my JPEG files and all my PostScript files. Maybe you want to do a query and get back specific attributes from these files inside your project. This once again will be the type of API you will be using to implement that feature. A couple of examples that we have. Remember this morning (I don't know if you were here this morning), in Dominic's talk, he went to the Finder. And in the Finder, when you press Command-I to get the information for the file, we get pretty much the metadata information for the file. That would be the type of feature you could implement in your application, a Get Info type of panel. All right.
Before we go and dig a little bit more into the APIs and how you're going to be able to implement that, how you're going to be able to architect your application to achieve these features, I need to talk to you a little bit about the MDItemRef. The MDItemRef, think of it as this nice little box that the Spotlight server manages. Each file on the system is represented in the Spotlight data store by the Spotlight server as an MDItemRef.
So think of it as a representation of a file on the disk. And each MDItemRef can contain several metadata attributes. It could have a kMDItemAuthors, it could have the pixel resolution, the width, the height, but as we saw this morning, it could contain file system attributes as well. And here in this case, I don't have any, but you have to understand that inside the data store, we have the file system creation date, the modification date, the file size. So all these attributes, which are available as well through that simple set of APIs, are stored in the MDItemRef. So this is pretty much the key object you're going to be using, you're going to be manipulating, in order to query the Spotlight server.
All right. So remember here, what we're trying to achieve is: I have a file, and I want to get back all the metadata, all the attributes that are associated with that file. There are three simple steps in order to implement that feature. First, you're going to create an MDItemRef for the file. Logical. I told you the MDItemRef is pretty much the file representation inside the data store.
So we're going to have to create that for our file. Step number two, what we're going to do is that we're going to query the Spotlight server to get back all the keys, all the data keys, all the metadata keys that actually are stored. What you have to understand is that certain files will be exporting the width of a picture, the height, if the flash was on. But others will be exporting things such as the author, the keywords, and that type of data.
So the second step is to get back a list of all these different keys that stored inside the Spotlight server. Once you get these keys, what you do is that you select the keys you want, because maybe you're not interested in the file system attributes. Maybe what you want to do is like, I don't care about all these flags. If the flash was on, it doesn't matter. I just want to get back the width and the height of the picture. So what you'll do is that you'll manage that and query the Spotlight server again for specifically the keys you want, and you retrieve the data. So three easy steps.
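The three steps above can be sketched in a few lines. This is only a toy model in Python, not the CoreServices API: the in-memory `TOY_STORE`, the sample path, its values, and the helper names (`item_create`, `copy_attribute_names`, `copy_attributes`, `picture_size`) are all assumptions made for illustration. The comments note which real call each helper stands in for.

```python
# Toy stand-in for the Spotlight metadata store: path -> attributes.
# The path and the values are made up for illustration.
TOY_STORE = {
    "/Users/demo/Pictures/car.jpg": {
        "kMDItemContentType": "public.jpeg",
        "kMDItemPixelWidth": 1600,
        "kMDItemPixelHeight": 1200,
        "kMDItemFlashOnOff": 0,
        "kMDItemFSSize": 482133,
    },
}


def item_create(path):
    """Step 1 (stands in for MDItemCreate): a handle, or None on failure."""
    return path if path in TOY_STORE else None


def copy_attribute_names(item):
    """Step 2 (stands in for MDItemCopyAttributeNames): keys only, no values."""
    return sorted(TOY_STORE[item])


def copy_attributes(item, names):
    """Step 3 (stands in for MDItemCopyAttributes): values for chosen keys."""
    attributes = TOY_STORE[item]
    return {name: attributes[name] for name in names if name in attributes}


def picture_size(path):
    """Run the three steps, keeping only the width and height keys."""
    item = item_create(path)
    if item is None:  # e.g. the file is missing or unreadable
        return None
    names = copy_attribute_names(item)
    wanted = [n for n in names if n in ("kMDItemPixelWidth", "kMDItemPixelHeight")]
    return copy_attributes(item, wanted)
```

Filtering the name list before the second query is the point of step three: a file may carry dozens of attributes, and only the ones the caller asks for are fetched.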
All right, so what does it look like as code? Once again, very, very straightforward. Step number one, remember, MDItemCreate to get an MDItemRef representation of a file. Here in this case, we hard-coded something with CFSTR, the macro to create a CFString on the stack. But in your case, if you have an FSSpec, you go to a CFURL, and from the CFURL, you get back the CFString, and you're in business. So you pass that to MDItemCreate: the default allocator, we don't care, and the path. And you get back this neat little object, an MDItemRef. Great. So that's step number one. That's cool. At that point, what do we have? Very simple. Just the link between an MDItemRef that lives in the Spotlight server and our file on disk. OK?
All right, step number two, remember, now we're going to go and query the data store. We're going to query the server to say, okay, tell me all the different keys that that file contains. Tell me all the metadata keys that that file contains. Very straightforward, we're going to define a CFArrayRef to hold all the names of the keys. And remember, when I say the names of the keys here, I'm talking about the kMDItem blah, blah, blah, whatever keys you will be using. What we're going to do is call MDItemCopyAttributeNames and pass the item ref. Very simple: that sends us back an array, and inside this array we have all the keys that are present. We have the keys at that point in time, but we don't have the values associated with them. So from there you can do one of two things. You could go back and call MDItemCopyAttributes with the whole array, and we send you back all the keys with all the associated values. But maybe in your case, you want to be a little bit more clever, I would say, and just request a couple of keys. Some files may have something like 40 attributes, and in this case, you don't want to fetch them all. So between the MDItemCopyAttributeNames and the MDItemCopyAttributes calls, you could just go and walk the name array and remove some of the keys in there. Very straightforward. So at that point, what do we have?
Very simple. We get back our results, which is pretty much a dictionary of keys and data. Here in this case, you can see that Vince has been doing some trading and has $10 billion probably in his account balance, which is pretty good, we think, for an Apple engineer. And we have a couple of keys associated with all that data. So that's pretty nice, but there are two things that are weird here. The first thing is, how did Vince get $10 billion from day trading? But that's just a side question. The real question for this session is: wouldn't it be better if we could have real text? What you want to see is what that key really means, so when you send back the result to the user, you can display real data. Obviously, you don't want to send back that raw key and show it to the user. So, got text? Absolutely, we've got text. And the main idea here, what we're going to achieve now, is to get the text that is associated with the key. But there is actually something better.
So what you can do with a very simple API is get back the localized names of attributes. The key point here is that, instead of getting back the kMDItemAuthors key, what you get back is "Authors". And obviously, if it's "Authors" in English and you're running in French, what you would like is "auteur". Makes a difference? In French.
And what is very cool is that if you've been to this morning talk, you know that people writing Spotlight plugins for their file format can actually define their own keys. And when you define your own key in the schema file, you're going to be able to actually define a description as well. So you could have the fact that this key is author and give a description. This is the author for the document, blah, blah, blah, blah, blah, blah.
All right, so how are we going to do that? How do we obtain localized names for our attributes? First thing, you're going to call MDSchemaCopyDisplayNameForAttribute, and I think it's probably easier for you to just read it on the screen, and you're going to pass the key that you got back. In this case, I just hard-coded it here, but what you will do is probably pass an item from the CFArray that we got back previously, okay? Very easy. And then you get back a CFStringRef, which is pretty much the full name. Excellent. So that will send us back, for instance, in this case, if I were to pass kMDItemVideoBitRate, "Débit vidéo" if I was running in French, and "Video bit rate" in English. Sorry, my English is a little bit rusty.
What's very, very important is that if you're writing a Spotlight plugin for your file format, you offer translations for all of these keys. I know that some of you may think that there is no French person that would use your software, but trust me, we do use a bunch of US software, and it's great when it's localized for different countries. So if you're writing your Spotlight plugin, remember you can localize all these keys. So please pay attention to that. So now remember what I said: in this case, what we got back is not the description but pretty much what the key is about. So in this case, author. What is nice is that you may want to get the full description of what that key does as well. And here, once again, there is a very simple API that will send you back the localized version of the description: MDSchemaCopyDisplayDescriptionForAttribute.
We're going to try to make longer API names next year. And you pass the key. And once again, as before, what you will probably be doing here is passing an item from the CFArray that was sent back by the previous API. Okay? So what did we achieve here? We get back, for instance in this case, "the name of the media file", or whatever full description you have in your schema file.
So in this first part, what we did is that we got an MDItemRef, remember. From there, we queried and got back the list of all the keys that are stored in the data store. And then after that, I queried the Spotlight server to send me back only the data that I wanted for those keys.
Once you have that, I showed you very simple APIs that enable you to get the localized name of the key and the localized description. To show you that in action, I'd like to invite on stage Vince, Monsieur Vince, as we call him. Hello. And Vince is going to walk you back through the code and show you the second part of this presentation.
[Vince DeMarco]
Thanks. Oh, can you go back to the slide again? So hi, I'm Vince DeMarco. Can you go back to the presentation?
So I'm Vince DeMarco, and I'm a member of the Spotlight Engineering team. So what I'm going to do today is show you what Xavier just showed you a little while ago. So I'm going to make a graphical version of mdls. mdls is a command line tool that lets you see all the metadata associated with a file. Except in this version, what's basically going to happen is you take a file from the Finder and drag and drop it on the window. So all you see in the top half is a little text field with the path of the file and then a table view showing all the data. Can you switch to demo one?
So I'm not going to describe all the code that's going on here, but what happens as soon as the file is dropped from the Finder onto this window is that this method, File Dropped, is going to get called. So within the notification that gets passed, in the user info, is a list of all the paths that the person selected. I don't really care about all the paths, so I'm just going to grab the first one. So the first thing I'm going to do is just set the text field at the top of the window with the path the user selected. So the first step I want to do is get the MDItemRef for the file that the person just dragged and dropped there. So I get that. Before I go any further, I really need to check to make sure that the ref that I got back is valid. So if any error occurs, we end up returning NULL. A possible error is that the person does not have permission to read the file, or the file may have been deleted in the process of doing the drag. So as the drag was released, the file went away. So now that we know that we have an MDItemRef, the first step is to get all the interesting attribute names. So Xavier showed that. So all this window is going to do is show all the possible attributes, not any select set.
So this will return an array of all the attribute names. The next step, I want to get a list of all the attribute values. So I'm going to get the CFDictionary back of everything that I want. So now that I've got all the data, I'm basically going to tell the table view to update.
So table view in Cocoa basically shows rows and columns of data. So in this case, I've got two columns and n number of rows. And the number of rows is equal to the number of attributes in the array. So I'm just gonna return that as the count. So the number of attribute names, if that has been set, it's just equivalent. So the number of items in the array is equivalent to the number of rows in the table view. So the last step here is to actually just try to display all the data in the two columns. So the first column is the name of the property that I'm interested in.
Actually, the first thing I want to do is I want to get the attribute name for that particular row. So TableView as it's loading its data calls you and tells you that it wants the data for a particular row. So in this case it's like row n. So the first thing I want to do is I'm going to grab the attribute out of the array that I'm interested in. So I'm going to grab attribute name out of the inspected ref attribute names. So next I want to return this property back to the TableView to display it.
So I'll just simply return that. And then the last thing I want to do in the second column is I actually want to display the individual pieces of data. So that entails just looking it up in the dictionary. So I'm going to grab the value out of the inspected ref attribute values, which is the dictionary of the keys and values, and then return it. This code down here is just basically trying to reformat the data to make it look a little nicer. Here, so I'll compile this and we can go. So here's the window that I just ran from the program. So we'll go into Finder and I'll drag and drop a file onto it.
So here's the file that I just selected. So here's all the attributes associated with the file. So the things that are interesting are: this file's got two keywords, "Lotus" and "Elise". And this file was taken with a Minolta camera. It's a JPEG image. So you can see all the content types. But the first column, the name of the property, is not in English, even though we have API for that. So the next step would be to localize these values so they display a little nicer. So this is really easy. So right here, instead of grabbing the original value-- attribute name. Oops. Oh, jeez.
So instead of returning the attribute name like I was doing in the past, I simply call MDSchemaCopyDisplayNameForAttribute for the attribute that I was going to display. The thing to note here with MDSchemaCopyDisplayNameForAttribute is that right here I'm checking whether it has actually returned any value. If it returns a value, then it's okay to display that to the user. If it doesn't return anything, the intent is that it shouldn't be a user-visible value. It's not interesting to the user to see it. So I will show you this.
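The check described above (show a row only when a display name comes back) can be modeled as follows. This is a Python sketch under assumptions, not the demo's Cocoa code: `DISPLAY_NAMES` is a hypothetical lookup table standing in for MDSchemaCopyDisplayNameForAttribute, the localized strings in it are illustrative, and skipping rows without a name is one possible reading of "it shouldn't be a user visible value".

```python
# Hypothetical display-name table: attribute key -> {language: label}.
# A real program would ask MDSchemaCopyDisplayNameForAttribute instead.
DISPLAY_NAMES = {
    "kMDItemAuthors": {"en": "Authors", "fr": "Auteurs"},
    "kMDItemKeywords": {"en": "Keywords", "fr": "Mots-clés"},
}


def display_name(key, language="en"):
    """Return a localized label, or None when the key has no display name."""
    labels = DISPLAY_NAMES.get(key)
    if labels is None:
        return None
    # Fall back to English when the requested language is missing.
    return labels.get(language, labels.get("en"))


def visible_rows(attributes, language="en"):
    """Build (label, value) rows, leaving out keys without a display name."""
    rows = []
    for key, value in attributes.items():
        label = display_name(key, language)
        if label is not None:
            rows.append((label, value))
    return rows
```

For example, `visible_rows({"kMDItemAuthors": ["Vince"], "kMDItemFSNodeCount": 0}, "fr")` keeps only the authors row, because the second key has no entry in the table.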
So if we run the same program one more time with the same file. So all the interesting user-visible things are there, so instead of saying kMDItemAcquisitionMake, now it says Device Make here. And then the keywords, instead of saying kMDItemKeywords, it says Keywords in English. There's other things like exposure time, so it's all-- The one thing I didn't do in this program, which you may have noticed, is that I'm leaking all the MDItemRefs up above. So if you drop the second file in, you'll leak the first one. So to be nice to the system, I should probably clean this up, and I'll just do this right now. So all you have to do here is, if we have an old inspected ref, CFRelease the old one and then set it to nil. And then I should get rid of the attribute names and the values.
And so that I know nothing else is happening, I'll set them to NULL. OK, that's it for that. I won't bother running it, because I just did. So can we switch back to the slides now? So that's basically how you get all the data for individual items. So the next step would be to perform a query to find a file that's interesting to you.
So why would you even want to search for any files in Spotlight within your application? Well, the first thing that comes to mind is it's a cool feature and you should probably just do this in your application. But a more legitimate thing would be to enable search within your application. So it allows the user a new way to find interesting files. So one example that comes to mind is if you're writing a 3D modeling program.
And the person wants to find a skin texture to apply one of their objects in the modeling program, but they have thousands of textures. It would be impossible for them. They really don't want to get a preview for all 1,000 files, try to find which one they're actually interested in. So it would be nice if some part in your UI they could type skin, narrow the search list down to 10, and then only have to get a preview of 10 individual items instead of 1,000.
Another thing is to find related documents. So now more and more people are working on more and more documents to get their individual work done. So an example of this would be making a magazine. If you make an article in a magazine, you need the desktop publishing program to put all the files together, and you need the text for the article and any associated pictures. So it would be nice if within the desktop publishing program they could say, "Tiger article spotlight by me, Vince DeMarco," and then find all the related files and then pop them all together and make the finished presentation.
So how do you actually search within Spotlight? It's actually--it is as simple as Xavier just showed you. So there's just a few key objects you need to understand. You need to understand how the query language operates. You need to understand the different modes of the queries. And then we're going to go together, bring this all together, and implement an example to show you how to do this.
So searching with Spotlight requires two key objects. The first one is the MDItemRef, which represents a file on disk. I'm not going to go into it any further, because Xavier explained that a little earlier in the presentation. The next item that's of interest is the MDQueryRef. This is an encapsulation of the query results and the query string itself.
So the query language within Spotlight is a simple C-like expression. You have an attribute on one side that is equal to a particular value. So in this case, I'm searching for kMDItemContentType, which is the type of the file, and it's equal to public.rtf. This will basically return all RTF files stored within the system. The attributes can be of different types (numbers, strings, and dates), and we have all the standard operators that you have in C. The only one of note that's interesting is that you can do ranges of numbers. You can group the expressions with ands, ors, and parentheses, and you can do logical nots of a big group of expressions.
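To make the shape of these expressions concrete, here is a small Python sketch that assembles query strings of the kind just described. The helper names (`literal`, `compare`, `all_of`, `any_of`) are hypothetical, and the sketch deliberately rejects values containing quotes or backslashes rather than guess at escaping rules. It only builds text; it does not talk to Spotlight.

```python
COMPARISON_OPERATORS = ("==", "!=", "<", "<=", ">", ">=")


def literal(value):
    """Render a Python value as a query literal (numbers bare, strings quoted)."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    if '"' in text or "\\" in text:
        # Escaping is out of scope for this sketch.
        raise ValueError("quotes and backslashes are not handled here")
    return f'"{text}"'


def compare(attribute, operator, value):
    """Build a single 'attribute operator value' comparison."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{attribute} {operator} {literal(value)}"


def all_of(*expressions):
    """Group expressions with a logical and."""
    return "(" + " && ".join(expressions) + ")"


def any_of(*expressions):
    """Group expressions with a logical or."""
    return "(" + " || ".join(expressions) + ")"


# The RTF example from the talk.
rtf_query = compare("kMDItemContentType", "==", "public.rtf")
```

Keeping each comparison as a plain string makes grouping trivial, at the cost of no validation of attribute names; a real application might check names against the schema first.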
So some examples of queries. Once again, if I want to find all the plain text documents on my system, I have kMDItemContentType. That's the attribute name, and I'm searching for where its value is public.plain-text. The second example is if I want to find all the documents on my system that have the word WWDC. So I'm searching for my presentation. So I'm searching for kMDItemTextContent, the attribute, is equal to the value, or in this case, contains the value, WWDC. In the third example I'm searching in an array. In Dominic's talk this morning, he described that the keywords are actually an array of strings, so a document can have many keywords associated with it, not just one.
So in this case, I want to find all the documents that have the keyword "important" associated with them, and it'll return a whole bunch of them. In the last example, I'm basically searching for the attribute "display name," and I want to find all documents that start with an uppercase A followed by any number of characters, of any length.
So in all these examples I'm showing, I'm actually finding exact matches. In the last one I'm finding any document that starts with the uppercase A, but of any length. It would be interesting if I could narrow the search down without having to be really explicit in my query.
So to do this, we have string match modifiers. So there's three basic string match modifiers. At the end of every query, you can specify a C, which means that the query is case insensitive. So in this case, spotlight all lower case is equivalent to spotlight all upper case.
The second string match modifier is diacritic insensitivity. So if I have a bunch of English documents and a bunch of French documents on my system, since I'm a native English speaker, I'm not going to spell élégant correctly in French with the accents on both the E's. I'm probably going to spell it in the English form. Same if I'm searching the document. If I'm a French user, I'm going to put the accents. If I'm an English user, I'm not going to really bother. So this is so you can say elegant is equivalent to élégant with the accents.
The last modifier is the word-based modifier. It detects transitions in case. Lots of times people end up camel-casing words. So you write SpotLight with a big S and a big L. And they're actually using that to signify that those are two separate words, but they're written together. Same thing with TextEdit or Interface Builder. So in this case, light is equivalent to SpotLight. So that word exists in that match.
But in the second example, Paris is not equal to comparison. This would be a match if comparison had an uppercase C, an uppercase P in the Paris, and an uppercase O in the on; then it would match. So if we put this all together, I want to find all the files on my system that have the attribute kMDItemTitle containing the word light, and I'm going to search it word-based, diacritic insensitive, and case insensitive.
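The three modifiers can be approximated in a few lines of Python. This is a rough model of the behavior described in the talk, not Spotlight's actual matching algorithm: the function names are hypothetical, the word splitter only looks for lowercase-to-uppercase ASCII transitions plus spaces, underscores, dashes and dots, and with the word modifier a term matches only when it equals a whole word.

```python
import re
import unicodedata


def strip_diacritics(text):
    """Remove combining accents, so that 'élégant' becomes 'elegant'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_words(text):
    """Split on case transitions, whitespace, underscores, dashes and dots."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return [word for word in re.split(r"[\s_.\-]+", spaced) if word]


def normalize(text, modifiers):
    """Apply the 'd' (diacritics) and 'c' (case) modifiers to a string."""
    if "d" in modifiers:
        text = strip_diacritics(text)
    if "c" in modifiers:
        text = text.casefold()
    return text


def matches(term, text, modifiers=""):
    """Check whether term matches text under the given modifier letters."""
    # With 'w', compare against each word; otherwise against the whole text.
    candidates = split_words(text) if "w" in modifiers else [text]
    wanted = normalize(term, modifiers)
    return any(normalize(candidate, modifiers) == wanted for candidate in candidates)
```

Note that the splitting happens before case folding, since the case transitions are exactly what marks the word boundaries.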
Oh, and the word match too, it also applies to spaces, underscores, and dashes and dots too. So if we want to-- it would be interesting to put this all together. All the type information in Spotlight is based off of the UTI hierarchy. And in UTI hierarchy, everything is inherited. So we have-- at the top of the hierarchy, we have public.data or public.content. And below that, somewhere, we have, like, public.image. And all the image file formats inherit from public.image.
So this allows you to find, for example, all the audio content or all the image content. So instead of having to find all the public.jpeg, GIF, TIFF, et cetera, all the hundreds of different file formats, you can say it in one query. Oh, they edited that wrong. So I can find kMDItemContentTypeTree equals public.image. Ignore the equals equals WWC at the end; that's wrong. So this would find all the image files known to the system with one query, without having to specify each individual type. But you could do that if you needed to and you only wanted TIFF or JPEG images.
So you can put these all together to make a more complicated query. In this case, I want to find all the images on my system that have an alpha channel and have a height DPI greater than or equal to 300. So it's kMDItemContentTypeTree is equal to public.image; that's all the image files. And kMDItemResolutionHeightDPI is greater than or equal to 300. And kMDItemHasAlphaChannel is equal to 1, so that attribute is set.
So we'll do a little demo using mdfind. Can you switch to the demo machine?
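As a small illustration of the compound query just described, here is a sketch that assembles the query string in plain C. The helper name and the min_height_dpi parameter are hypothetical; the attribute names are the ones mentioned in the talk, and the exact spelling of the final string is worth checking with mdfind before relying on it.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes the image query into buf.
 * Returns 0 on success, -1 if buf is too small.
 * Hypothetical helper, not part of any Apple API. */
int build_image_query(char *buf, size_t size, int min_height_dpi) {
    int written = snprintf(buf, size,
        "kMDItemContentTypeTree == \"public.image\""
        " && kMDItemResolutionHeightDPI >= %d"
        " && kMDItemHasAlphaChannel == 1",
        min_height_dpi);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

The resulting string can be pasted into mdfind on the command line to see whether it narrows the results the way you expect.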
So mdfind is a little command line tool that lets you type in queries and get back results. So what I'm going to do as a query is I'm going to find all the application files on the system. So remember, a couple of slides ago, I described that the UTI hierarchy is inherited, so we have different kinds of applications on the system. You can have a packaged application or a non-packaged application, but they all inherit from com.apple.application. So we can find all application files on the system. The item.
So that'll be all the applications on this machine. And there's quite a few of them. I'm not going to read each individual one out to you, because that'll take too long. So we've got lots of files there. So I want to narrow down the query to find only applications that have an A in their display name. So I'm just doing *A* and then making it word, diacritic, and case-insensitive.
So if we run this query again, the list is a little bit smaller. So in the last step, I want to try to narrow this down even more. I want to find all applications that are copyright by Apple. So we'll just keep extending this query.
So I'm just going to find the word Apple within the word, within the document, within the copyright string. So I'm going to do this word insensitive, diacritic insensitive, and case insensitive too. So let me clear this screen. So now we have a much smaller list again. So let me just run this query again with a smaller list. So if we do only applications that have to start with A instead.
So that's the little demo, and we have a lot fewer left. Can you switch back to the slides? The only real thing of note to take from that demo is that while you're making your query, it's best if you just try it out in mdfind. You don't have to write any code. You're basically trying to narrow the query down to a small set of files, so you can try your query in mdfind and see if it narrows down to the files you're actually interested in. So the next step, now that you know how to write some simple queries, is to implement the search within your application.
There's basically two ways you can implement search within your application. You can open the Spotlight search window. So that's that little menu in the top right-hand corner: if the person types in a string, they get the results, and they can select Show All, which brings up the Spotlight results window. The second way of doing this is just to simply perform a query within your application. So opening the Spotlight window is really, really simple.
And it provides a really easy way for you to integrate Spotlight within your application with a minimal amount of work. So all you need to do in this case is call HISearchWindowShow with the string you want to search for. This is the same string that the user would have typed into the field in the Spotlight window.
This is already integrated in a couple of applications. For example, Address Book. There's a little action menu at the top window of Address Book. It says Search in Spotlight. It's basically doing this. And in TextEdit, if you select a word within your document, you can go Spotlight, and you find all occurrences of that word everywhere on your hard disk. This is great to present related items to the user, but you as an application developer cannot interact with this window in any way. You can simply present it, and that's it.
So if you actually want to do anything further, you need to write a query within your application. Executing a query in your application is three simple steps again. The first step is to create an MDQueryRef. The second step is to register the callback so you get notified of updates and changes to the query. And the third is to execute the query.
So step one is to create the MDQueryRef. So all I'm doing here is I have a query string. In this query, I'm searching for kMDItemContentType is equal to public.plain-text. I'm searching once again for all plain text documents on my system. And you just call MDQueryCreate. The second parameter is the query string. This creates a standard CF type object, which you can retain and release. And you can put it in all the CF collection classes.
So the second step after you've created your query is to register for the callbacks. The reason that this is the second step and not the first step is that in CFNotificationCenterAddObserver, you're basically telling it which query you want to observe. It's the second to last parameter. It says query.
I want to be notified of any changes on this query. You can also pass NULL and get notified of any query happening in your application. The other thing that's really interesting to note here is that I called CFNotificationCenter and got the local center, not the distributed notification center. This just got added into Tiger recently. You would do the same thing in a Foundation class. You would just register with NSNotificationCenter, the default center. So in my example, I'm only listening for the finish notification.
But you'd also typically listen for the progress and the update notifications. So of the three notifications we sent, the first notification we send is the progress notification. So as your query is running and the server is gathering as it sends the results back to your client application, you get the progress notification, so you could do something in your UI. So you could update a table or increase the size of a menu and show the person the results as they're being fetched.
The second notification we send is the finish notification. So when the server's finished and completed your query, and it's gotten all the results it can at this point, it's going to send you this notification saying it's all done. So maybe you could stop the spinny cursor, I mean the little spinny progress indicator, or notify the user that they're done.
The last notification that we send is the update notification. So after the query is finished, it goes into the update phase. So as the user creates or destroys files that match or don't match your query, the result set will change as this is happening. The only thing to note here: this only gets sent if the query is live, which I'll explain in just a second.
So step two and a half, you have to implement the notification callback. So this is what you would do in CF. So this is the standard thing you would do in CF. There's really nothing special here. You could do the same thing in foundation. It'd just be Objective C classes instead. So the thing that's interesting here, and the same thing applies to in foundation, is the object that sent the notification is your query. So you can use this to look up whatever information that you need. You might need to get from the query to know what's happening.
So the last step is to actually execute the query. This is simple. We simply call MDQueryExecute. The first parameter is the query and the second one is flags. By default, you could pass zero, and I'll explain what that means. The flags can be one of two things, and you OR them together. The first is kMDQuerySynchronous: the call will not return until it's fetched all the results that are available, basically until you would have gotten that finished notification.
The second flag is kMDQueryWantsUpdates. This is saying, after the query is finished, if the user deletes or creates any new files, let me know about it. So by default, if you set the flags to zero, your query will run asynchronously and you will not get updates.
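The three steps (create, register, execute) could be put together roughly as follows. This is a sketch under stated assumptions rather than the code from the slides: the function name is hypothetical, the observer pointer is used here simply to hand a counter to the callback, and the 30-second run loop timeout is an arbitrary safety limit. The #ifdef __APPLE__ guard is only there so the file compiles on other platforms, where the function returns -1; on Mac OS X it needs to be linked against the CoreServices framework.

```c
#include <stdio.h>
#ifdef __APPLE__
#include <CoreServices/CoreServices.h>

/* Step 2.5: the callback. The notification's object is the query itself. */
static void query_did_finish(CFNotificationCenterRef center, void *observer,
                             CFStringRef name, const void *object,
                             CFDictionaryRef user_info) {
    (void)center; (void)name; (void)user_info;
    MDQueryRef query = (MDQueryRef)object;
    long *count_out = (long *)observer;   /* assumption: observer carries our counter */
    *count_out = (long)MDQueryGetResultCount(query);
    CFRunLoopStop(CFRunLoopGetCurrent());
}
#endif

/* Returns the number of plain text documents found, or -1 on failure
 * (always -1 when not built on an Apple platform). Hypothetical helper. */
long count_plain_text_documents(void) {
#ifdef __APPLE__
    long count = -1;

    /* Step 1: create the query from a query string. */
    MDQueryRef query = MDQueryCreate(kCFAllocatorDefault,
        CFSTR("kMDItemContentType == \"public.plain-text\""), NULL, NULL);
    if (query == NULL) return -1;         /* malformed query string */

    /* Step 2: observe the finish notification for this query only. */
    CFNotificationCenterRef local = CFNotificationCenterGetLocalCenter();
    CFNotificationCenterAddObserver(local, &count, query_did_finish,
        kMDQueryDidFinishNotification, query,
        CFNotificationSuspensionBehaviorDeliverImmediately);

    /* Step 3: execute asynchronously, asking for live updates. */
    if (MDQueryExecute(query, kMDQueryWantsUpdates)) {
        CFRunLoopRunInMode(kCFRunLoopDefaultMode, 30.0, false);
    }

    CFNotificationCenterRemoveObserver(local, &count,
        kMDQueryDidFinishNotification, query);
    MDQueryStop(query);
    CFRelease(query);
    return count;
#else
    return -1;
#endif
}
```

In a real application the run loop is already running, so the callback would typically update the UI instead of stopping the loop, and the query would be kept alive to receive update notifications.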
So the next step, after you've executed the query and you've gotten the notification, is to retrieve the individual results. Retrieving results basically involves two calls. It's only two calls because conceptually you can think of an MDQuery as an array. A read-only array really has only two appropriate calls: you can ask it how many things are in the array, and you can get an individual item anywhere within the array. So hence, we have these two calls, MDQueryGetResultCount and MDQueryGetResultAtIndex. MDQueryGetResultCount simply returns the number of items in the result set.
And there's really not much to that. You just get the count and you can iterate over them. As you're iterating over them, you can get the item at a particular index. So you simply call MDQueryGetResultAtIndex, and you get the MDItemRef at that particular index value.
So the thing to note with retrieving results is that if you've started your query in live mode, the query result set is constantly changing as your query is executing. So if at one point you ask it, give me the item at index 4, and you stash away that index, then later in your program, after some period of time, you go to index 4 again, it might not even be there, and it might be out of bounds now. To get around this problem while you're iterating, you can disable and enable updates on the query. So basically everything is kind of freeze-dried for that period of time so you can look at it.
You don't always need to do this. If you're only going to be looking at the result set within the notification callbacks, you don't really have to disable and enable the query. That's done for you. And enabling and disabling a query is stacked. So if you've done four enables, you need to do four disables. Or, I'm sorry, backwards. If you do four disables, you need to do four enables for it to get turned back on again. If they don't match, it'll stay in the last state that it was. So the biggest thing to take away from this is that the results are live. Everything can come and go as you're working on them, so be aware of that.
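A minimal sketch of the retrieval loop, with updates disabled while iterating, might look like this. It assumes a synchronous query for simplicity (so the disable/enable bracket is not strictly needed here, but it shows where the calls go for a live query). The function name and the max_to_print parameter are hypothetical, and the #ifdef __APPLE__ guard only exists so the sketch compiles elsewhere, where it returns -1.

```c
#include <stdio.h>
#ifdef __APPLE__
#include <CoreServices/CoreServices.h>
#endif

/* Prints up to max_to_print paths of plain text documents and returns the
 * total number of results, or -1 on failure or on non-Apple platforms. */
long print_plain_text_paths(long max_to_print) {
#ifdef __APPLE__
    MDQueryRef query = MDQueryCreate(kCFAllocatorDefault,
        CFSTR("kMDItemContentType == \"public.plain-text\""), NULL, NULL);
    if (query == NULL) return -1;

    /* kMDQuerySynchronous: do not return until the initial results are in. */
    if (!MDQueryExecute(query, kMDQuerySynchronous)) {
        CFRelease(query);
        return -1;
    }

    /* Freeze the result set while we walk it; calls must be balanced. */
    MDQueryDisableUpdates(query);
    CFIndex count = MDQueryGetResultCount(query);
    for (CFIndex i = 0; i < count && i < max_to_print; i++) {
        MDItemRef item = (MDItemRef)MDQueryGetResultAtIndex(query, i);
        CFStringRef path = (CFStringRef)MDItemCopyAttribute(item, kMDItemPath);
        if (path != NULL) {
            char buf[1024];
            if (CFStringGetCString(path, buf, sizeof buf, kCFStringEncodingUTF8))
                puts(buf);
            CFRelease(path);              /* Copy rule: we own the value */
        }
    }
    MDQueryEnableUpdates(query);

    CFRelease(query);
    return (long)count;
#else
    (void)max_to_print;
    return -1;
#endif
}
```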
So we'll do a little demo to do a query. So this is going to be this application, which is a simplified version of the search window. So basically in that top search field you're going to type some string that you're looking for and then you'll get the results below. So can we switch to demo one please?
Let me just get a drink of water. So the first thing: as the user is typing into the top search field, what's happening is this start search method is going to get called. So as they type in a character, after the appropriate delay, however the Cocoa search field operates, we get the string that the user entered. So the first thing we want to do is create a query string from what the user just typed in. We're going to do a very simple query.
We're going to find any metadata attribute, star, equal to what the user typed in, and we're going to be word insensitive, case insensitive, and diacritic insensitive. And we're also going to search where the text content equals what the user typed in, and we're going to do that case and diacritic insensitive. The thing to note here is I'm not checking if they've typed any special characters like quotes and any of that kind of stuff. So if they type a quote, the query is probably going to be malformed, but for this demo it'll be just fine. So now that we have the query string, the next step is to create the MDQueryRef.
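The demo skips escaping, so here is one possible way to handle it, sketched in plain C. It assumes that quotes and backslashes inside a quoted value are escaped with a backslash, which is how the query expression syntax is generally documented, but verify this against the current documentation. Both helper names are hypothetical, and the query shape is reconstructed from the spoken description rather than copied from the demo source.

```c
#include <stdio.h>
#include <stddef.h>

/* Copies in to out, prefixing quotes and backslashes with a backslash.
 * Returns 0 on success, -1 if out is too small. Hypothetical helper. */
int escape_query_value(const char *in, char *out, size_t out_size) {
    size_t o = 0;
    if (out_size == 0) return -1;
    for (size_t i = 0; in[i] != '\0'; i++) {
        char ch = in[i];
        int needs_escape = (ch == '"' || ch == '\'' || ch == '\\');
        /* Leave room for this character, its escape, and the terminator. */
        if (o + (needs_escape ? 2 : 1) >= out_size) return -1;
        if (needs_escape) out[o++] = '\\';
        out[o++] = ch;
    }
    out[o] = '\0';
    return 0;
}

/* Builds: any attribute matches (word, case, diacritic insensitive)
 * OR text content matches (case, diacritic insensitive). */
int build_search_query(const char *user_text, char *out, size_t out_size) {
    char escaped[512];
    if (escape_query_value(user_text, escaped, sizeof escaped) != 0) return -1;
    int n = snprintf(out, out_size,
        "* == \"%s\"wcd || kMDItemTextContent == \"%s\"cd", escaped, escaped);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With this in place, a stray quote typed by the user ends up as an escaped character inside the value instead of terminating the string early. Wildcards such as * are left untouched on purpose, so the user can still type them.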
So given the query string right here, I'm gonna create the MDQueryRef with the default allocator. Once again, we need to make sure that the person hasn't typed in just some garbage. They could type hello followed by a quote, which in this case wouldn't parse, because we wouldn't have enough quotes. You really would have to escape it, but I'm not doing any of that. So what we're gonna do is check if the query is okay. If the query is okay, the next step is to register for all the notifications that I'm interested in. In this case, I'm gonna do this in Objective-C, with the default notification center.
So I'm registering for the finished notification, the progress notification, and the update notification. And I'm going to have them call my update data method. And I'm only interested in the current query that I'm executing. So now that I've registered for the notifications, the next step is to actually execute the query.
So I'm going to execute the query and I'm going to tell the system that I want to be notified of updates. I'm also running it asynchronously, because I haven't passed the synchronous flag. So now the query's going to start running. The next step is to update the UI. As the query is executing, my updateData: method is going to get called. And in this case, I'm going to reload the table view and then set the title of the window.
For the title, I'm just gonna show the query that the person entered and how many matches they currently have at this point. For the number of matches at this point, I'm just gonna call MDQueryGetResultCount, which will say that I have 10 or 14 or however many I have at that point. So the final step is to update the table view with the query results. The number of rows in the table view is equal to the number of items in the query. So if the person did enter a query, I can ask the query for its result count. If we don't have a query, then I'll simply return 0.
So then the last step in the table view: the table view wants to get the data that it wants to display. So it's going to pass me the column that it's interested in, which is that, and then the row that it's interested in. So the first thing I'm going to do is, given the row, I'm going to get the query result at that particular index. So I'll get the MDItemRef. You'll notice here, in this code, I really should be disabling and enabling the query. I'm not doing that here. And technically, that's incorrect too. But it'll work for this demo. So if I'm updating the first column, all I'm gonna do is get the path of the individual item, calling MDItemCopyAttribute with the MDItemRef that I'm interested in, which I've just gotten above right here, and the path attribute. So I'm going to take the path, and then I'm going to call NSWorkspace and get the icon for that particular path. So I'm going to have two columns in the table view. The first one's going to be the icon, and the second column will be the display name of the file. The display name is equivalent to what you would see in the Finder if you were looking at it there.
So I'm just gonna do the same thing again. I call MDItemCopyAttribute with the MDItemRef of interest, which I got above again, and I get the display name. And I set that as the object and then release them. So if we build this one... Whoops. Oh, so I have a syntax error here. There we go. This is just to show you that it's all live and I'm not faking this all.
So here's the query. If I type in lotus, I end up getting 18 matches, and here's all the files. In this case, they're all pictures. I can go a little further and type address book, and I get some cards from Address Book, applications, and then any source code that happens to be on the system. Okay, can we switch back to the slides, please?
So now you know how to do all the simple type queries. Now we need to take this a step a little further and do something more interesting. So this is where I'm going to talk about some more advanced topics. The first thing you noticed in my little demo, I didn't do any sorting.
They just came back in the array in the order that they happened to come back from the server. So the metadata library provides some simple sorting you can do. The only sorting that it actually does is sort in ascending order. So in this case, the last parameter of MDQueryCreate is a CFArray containing the names of the attributes you want to sort on, in ascending order.
But you can do further sorting, but you have to provide your own callback function. So, for example, the search menu sorts the dates in descending order and names of the files in ascending order. You can do that kind of stuff. You have to provide the callback on your own.
The other thing you'd like to do is scope the query to a particular directory. So maybe you only want to search the person's home directory or only a particular volume or only a particular hard drive. Finder lets you do this when you set the search scope in the little search slices. So we'll go back to the demo machine again, and I'll add sorting and scoping of the individual queries.
The thing to know about the sorting and the scoping, you need to do this before the query execute. After the query is executed, you cannot change any of these values. So if we go back here, now that we've created-- so now we've created the query. The first thing I want to do-- oh, sorry. Instead of creating the query-- so in the past-- oops. Oh, jeez. OK.
So I'm going to create the query now. So the last parameter, I'm going to pass it a CFArray, or an NSArray in this case, and I want to sort by display name. So that's just the array containing all the strings. I could have easily added more and more attributes.
So that's that, so now they'll be sorted. And then the next step is that I want to limit the scope of the query to only search in the person's home directory. So in this case, I have the query again. I call MDQuerySetSearchScope. The first parameter is the query. The second parameter is an array of CFStrings containing the paths, or CFURLs pointing to the paths, where you want to search for your files. Here there's also a bunch of known constants you can pass. In this case, I'm passing kMDQueryScopeHome to limit it to the person's home directory. There are some to limit it to just the network or the entire computer.
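The sorting and scoping steps described here could be sketched in C as follows. This is not the demo's Objective-C code: the function name is hypothetical and the query string is just an example. Note that both the sort attributes and the scope are set before MDQueryExecute is called. The #ifdef __APPLE__ guard only lets the sketch compile on other platforms, where it returns -1.

```c
#include <stdio.h>
#ifdef __APPLE__
#include <CoreServices/CoreServices.h>
#endif

/* Counts images in the user's home directory, sorted by display name.
 * Returns the result count, or -1 on failure or on non-Apple platforms. */
long count_home_images_sorted(void) {
#ifdef __APPLE__
    /* Last parameter of MDQueryCreate: attributes to sort by, ascending. */
    const void *sort_keys[] = { kMDItemDisplayName };
    CFArrayRef sorting = CFArrayCreate(kCFAllocatorDefault, sort_keys, 1,
                                       &kCFTypeArrayCallBacks);
    MDQueryRef query = MDQueryCreate(kCFAllocatorDefault,
        CFSTR("kMDItemContentTypeTree == \"public.image\""), NULL, sorting);
    CFRelease(sorting);
    if (query == NULL) return -1;

    /* Limit the search to the home directory; must happen before execute. */
    const void *scopes[] = { kMDQueryScopeHome };
    CFArrayRef scope = CFArrayCreate(kCFAllocatorDefault, scopes, 1,
                                     &kCFTypeArrayCallBacks);
    MDQuerySetSearchScope(query, scope, 0);
    CFRelease(scope);

    long count = -1;
    if (MDQueryExecute(query, kMDQuerySynchronous)) {
        count = (long)MDQueryGetResultCount(query);
    }
    CFRelease(query);
    return count;
#else
    return -1;
#endif
}
```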
So if we do the same query again and I type lotus, I'll get nine matches, instead of the previous time, when I think I got 18 because it found the rest of the files on the rest of the hard drive. So these are just the pictures in my home directory, without the picture of a lotus flower that was somewhere else. Okay, so can we switch back to the slides again, please?
So the last thing I'm going to talk about is fetching the query attributes. So basically, all this entails is that as your query is executing, the query is actually sorted in the client library. So in order to make this more efficient, when we send the request to the server to get the query results, we send it another message saying, please send us this list of attributes along with the results. So conceptually you can think of the query as storing the array of results, and alongside it, it's got another array. So the first array is an array of MDItemRefs.
And alongside it is another array of the attribute values. So if you're sorting by display name and author as an example, which-- so in this case, I'm doing it by content type. So when I sort by these values, I've also got them locally in the client library. So it would make sense for you to have access to this. So this is all basically a bulk call from the server. So as I get the results, send me back these values, because I'm going to need them immediately.
The big reason why you want to do this: we do this for the sorting because, to sort them, we would need the value, and to get the value we'd have to make a round trip back to the server, ask it for the value, get the value back, and sort them. If we were sorting 100,000 items, we would be sending 100,000-plus messages as we're doing the sort.
So to get these values, there's only two basic calls. So if you--once again, if you conceptually think of the query as storing off to the side, so we have an array of the MD item refs that we're interested, and then off to the side, we have another array of the attribute values. So it seems that we only need two calls.
So we call MDQueryGetIndexOfResult, given the query and the MDItemRef that we're interested in, and it will give you the index in that second array of the value you want. And then to look up values within that sort of subarray, we call MDQueryGetAttributeValueOfResultAtIndex, where the first parameter is the query, the second parameter is the attribute that I want, content type, and the last is the index. If the value is not there or it's empty, you'll get a NULL back. And that was the talk. So I'll invite Xavier back. As you notice, it's not that much code in your program to do a query. It's very, very simple, so you should all try to integrate it quickly within your application.
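The two lookup calls described above might be used like this. It is a sketch under assumptions: the function name and max_to_print parameter are hypothetical, the query string is an example, and the attribute is requested up front through the third parameter of MDQueryCreate so its values travel with the results. The #ifdef __APPLE__ guard only lets the sketch compile elsewhere, where it returns -1.

```c
#include <stdio.h>
#ifdef __APPLE__
#include <CoreServices/CoreServices.h>
#endif

/* Prints the content type of up to max_to_print image results using the
 * values fetched alongside the results. Returns the result count, or -1
 * on failure or on non-Apple platforms. */
long print_image_content_types(long max_to_print) {
#ifdef __APPLE__
    /* Third parameter of MDQueryCreate: attributes to fetch with each result. */
    const void *attrs[] = { kMDItemContentType };
    CFArrayRef value_attrs = CFArrayCreate(kCFAllocatorDefault, attrs, 1,
                                           &kCFTypeArrayCallBacks);
    MDQueryRef query = MDQueryCreate(kCFAllocatorDefault,
        CFSTR("kMDItemContentTypeTree == \"public.image\""), value_attrs, NULL);
    CFRelease(value_attrs);
    if (query == NULL) return -1;

    if (!MDQueryExecute(query, kMDQuerySynchronous)) {
        CFRelease(query);
        return -1;
    }

    CFIndex count = MDQueryGetResultCount(query);
    for (CFIndex i = 0; i < count && i < max_to_print; i++) {
        const void *item = MDQueryGetResultAtIndex(query, i);
        /* Map the item back to its index, then read the cached value. */
        CFIndex idx = MDQueryGetIndexOfResult(query, item);
        CFTypeRef value = MDQueryGetAttributeValueOfResultAtIndex(
            query, kMDItemContentType, idx);
        /* Get rule: the value is not owned, so it is not released here. */
        if (value != NULL && CFGetTypeID(value) == CFStringGetTypeID()) {
            char buf[256];
            if (CFStringGetCString((CFStringRef)value, buf, sizeof buf,
                                   kCFStringEncodingUTF8))
                puts(buf);
        }
    }

    CFRelease(query);
    return (long)count;
#else
    (void)max_to_print;
    return -1;
#endif
}
```

Compared with calling MDItemCopyAttribute on every result, this avoids a round trip to the server per item, which is the efficiency argument made in the talk.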
Hello, hello. OK, so to summarize, I think today, between Dominique's talk this morning about plugins and our talk on MD queries, I hope we gave you a good overview of how you, as a developer, can integrate with this great technology. I mean, you've all seen all the marketing we've been pushing behind Tiger; Spotlight is obviously number one. I think there are really a lot of things that you guys can do on your side of the fence in your application to take advantage of that, to distinguish yourself in the marketplace, but as well to bring tremendous innovation to the platform with your application. So there are a couple of things I want you to remember from today's talk. The first one is that Spotlight is totally integrated inside Mac OS X Tiger. So for you as a developer, number one, if you have your own file type, if you have a custom file format that your application generates, please do write a Spotlight importer. OK, very, very important. Then after that, I hope today we gave you a quick overview and convinced you that adding the query APIs inside your application could make a lot of sense, and hopefully we showed you the different ways you can integrate that in your application.
For more information, we have a couple of new features. Today, this afternoon, there's the UTI session. Obviously, as we said this morning, UTIs are very important across Mac OS X now, the Uniform Type Identifiers. And so Chris will be talking later on at 5:00 about UTIs and how you can declare them in your plugins, inside your application, and the different things you should look for. Then the lab, which starts, if I'm not mistaken, not at noon, but at 5:00, I think. Is that correct?
3:30, very soon in all cases, so check your little agendas. Where you have actually the spotlight team pretty much there to answer any questions you may have. It could be on plugins or on the MD Query APIs. It doesn't matter. Just come by, talk to us. I know this morning a couple of folks had questions. And so this is a great way to get your questions answered.
And if you want to learn more about the file system, we have a session as well on Thursday, today at 5:00 PM, and Dominique will be part of that. Correct. And thank you for reminding me that there is another lab tomorrow morning starting at 9:00. And we hope to see you there. And thank you.
One thing that is very important, whatever you're doing, if you're going to write using the MD Query APIs, if you're going to do a Spotlight plugin, please send me an email. We want to track who's doing what, and we hope we can help you actually promote the integration of Tiger technologies inside your application.