Easy and Powerful XML Processing with Cocoa - WWDC 2004

Application • 39:25

From the operating system to file formats to the exchanging of data on the web, XML plays an increasingly important role wherever an open, cross-platform, scalable solution is needed. This session is for developers who are familiar with XML and want to learn about how new Cocoa APIs in the Foundation framework can enable processing, creating, or transforming any of the many XML-based file formats, web protocols, or representations in use today.

Speakers: Sarah Wilkin, Helena Ju

Unlisted on Apple Developer site

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Ladies and gentlemen, please welcome software engineer Sarah Wilkin. Good morning, everyone. So I can probably guess why you're all here, which is I got this quote from the Cocoa mailing list. And someone asked-- oh, well, that's me. Am I missing something, or is there actually no XML object for Cocoa? Well, I mean, there isn't until today.

So what you've had so far is there's the SACS parser, and with that, you basically get a bunch of callbacks as you go through your XML. And that's useful to an extent where you want to build your own custom objects, but often you just want that final thing at the end. And then there's CFXML tree, which has a lot of limitations.

In particular, it doesn't have any sort of namespace handling. So again, these are some more quotes from the developer list. People are asking not just for an XML object, but they're asking, oh, well, you know, I want XPath, and I want XQuery and XSLT, and I want it to fix my malformed XML, and also I'd really love if it could do transformations and all sorts of magical things. So that's what I'm going to give you today. We're going to introduce... And SXML.

So what is NSXML? Well, it's a set of five classes in Foundation. You can produce XML from scratch. You can read in an XML document. Or you can start from HTML, convert it into XML, find all your leaf nodes, convert it, pull things out, put other things back in. And then just in one line of code, you can apply XQuery, which is a new W3 standard, or XSLT.

This is all you need to know. If you know a little bit of Cocoa-- because I'm going to be showing you quite a bit of code later on-- and if in Safari you can go view source and say, hey, you know, this is the text element I'm looking for, this is the attribute I'm looking for, then you should be good to go. If you're starting from scratch, also you just should know something about the DTD or the schema so you can build valid XML.

So XML, it's everywhere, right? Everywhere you look. So you have XML, it's just straight XML. If you go to the w3.org website, you'll see there's all these different standards. And you can go all over the web and find there's information everywhere. And we want you to use it.

Then on top of that, you have the file formats. So Keynote actually stores its data as XML. It's not even exported. It's actually already XML. And Illustrator, you can export as SVG and Final Cut Pro, which we're going to be showing later on. Then you have PLists and also Services.

So scraping. I know you all do it. And I'm trying to put a stop to that. As I mentioned, you can do this custom parsing, which is not that great. But otherwise, you're left scraping these strings out. And you're trying to create this XML, but you know you're going to end up forgetting an end tag, or you're going to do something a little bit wrong. And then somebody else wants to use your XML, and it's malformed. So please, stop. Use this instead. This is NSXML.

So at the top level, we have NSXML node, which everything inherits from. So it covers just your basic things-- attribute, namespace, processing instruction, and text. And these things may or may not have a name, and they'll have an object value and say where they are in the tree. Then you have NSXML document, which holds everything. And it will also have a version and a character encoding.

Then there's NSXML element, where you have your list of attributes and your list of namespaces-- very important. And for those of you that are into DTDs, we have NSXML DTD, where you have your system and public ID. And it stores a whole bunch of DTD nodes. So they'll tell you what can be valid or not.

So before I talk more about this, let's show you some code. So it's really easy. You just have your URL. And then there's three different ways you can start creating your document. You can either init with contents of URL, init with data, or if you've created a root element from scratch, just use that. So I'm going to show how easy it is by walking all the way over to this demo machine.

So here I have basically the same lines of code I was just showing. And what I'm doing is I'm getting this from apple.com/rss. So as you may know, RSS is this really big thing. We just announced Safari RSS. And there's all this information out there available to you. So normally, now that Safari RSS is here, if we click on this, we get this cool view. But we just want the data.

So I've saved this URL, and this is it here. And then I just allocate it, and I'm turning on Pretty Printing, which is a special option to make all the indenting go in. And then there's also a root element on the document. So I'm going to go straight into the debugger.

And let's see if I can-- make sure you can see this all right. All right, so let's say I want to print out-- I said it has a character encoding, right?

[Transcript missing]

Child count, and of course, there's all the things you'd expect, like child.index and replace child.index. All right, let's go back to slides.

So instead of doing this, please, NSXML is for Cocoa lovers. Use this. Be happy. Instead of doing this custom parser with SACS, we want you to use these first-class objects. Instead of doing all this scraping, just use the methods that we have. Instead of being difficult, it's easy. And instead of it being error-prone and malformed, hey, we'll even fix it up for you. So this is, you know, instead of Apple starting from scratch, obviously we're going to use these great libraries that are already out there.

So already, LibXML2 was in the system for Panther. Well, now we have LibXSLT and Tidy. Then on top of that, we have all of our standards. So if you go to the W3 home page and look at their specifications, well, we've tried to adhere to that pretty well. And as I mentioned, we've got this whole XQuery engine in there, which I'm going to show you some XQuery later on, because I think it's a really powerful language for processing XML.

And previously, if you've developed any Sherlock channels, that's what we were using it in. We were using XQuery and JavaScript for channels. And last year, we had a Sherlock session. Everyone's like, hey, why can't I use XQuery outside of Sherlock? Well, here you go. And then NSXML encompasses all of that.

So we're going to go into a pretty long demo. But before we get there, this is sort of what I think is what you're regularly going to do with NSXML. Is you have some data out there. You find it. You're pulling a little bit out of it. And then you have some sort of template. And you're pushing in that new data. Now I'd like to invite Helena Ju up to the stage. Helena works on Final Cut, which can now import and export XML with Final Cut Pro HD.

We're not ready for the demo machine yet. If we could go back to slides. Thank you very much. OK. Let's just say what our problem is. Well, it's a typical problem. You've got some sort of person, an artist that's done a really great template. And then you have this database, or you have all this information somewhere, and you want to put it in. So typical sort of separation of content and display. So in particular, I already showed you the iTunes Music Store RSS. And we're going to use that again for this demo.

So we want to have some sort of template where it has the song name, artist name, the rank. And we're going to add in some new information, which is the rank change. So it's always bugged me that I can check out, well, what are the top 10 songs?

But are they new today, or did they change from yesterday? Is number one going down, or is it going up? So we're going to add that in. So go to the RSS. And then this is what we want to make, is put this information into the template. So now we'll go to the demo machine.

And one way we could do this is just make some simple XML and apply a style sheet to it. And Safari can actually deal with that. So you have this processing instruction that says, this is my style sheet. And you don't even need to bother writing HTML, and it can look great. And we've made this demo for you that will be available at approximately 6:30 this evening. So you can try this out yourselves. And we're just going to modify it a bit to make it better for stage. So we're just going to build and run this.

And here we go. So this is XML with a style sheet. It's not HTML. So let's just check out, view the source. Right? So we're going to do something really similar to this. But the web, it's good to a certain point. Can we go back to the slides now?

But really, we want to do something more exciting. We want to make a video. So we're going to do something in Final Cut Pro instead. So I just want to give you a summary of the algorithm we're going to use so you don't get lost along the way. So this is what the RSS looks like. I briefly showed it. No, I'm just showing it to you now for the first time. So we have a whole bunch of items.

And there's this namespace, ITMS. So instead of just being a straight, regular RSS, we've added in this namespace. And it contains a whole bunch of information attached to it. We have the artist name, the album, the link. And we're going to be using the link as our unique ID to compare between the two different versions of the XML.

Then we're going to use XQuery and extract out the information we want and, at the same time, make the XML a lot simpler to understand. So this is at the point that I left the XML for the sample code that you'll be looking at. So this is sort of as far as you go. And then you apply the and you're done. But here we're going to go a bit further, where we're actually going to go into Final Cut Pro and export some XML from a template and then pull all those things together.

So our data source. As I mentioned, XML is everywhere. There's lots of places to find it. I really encourage you to-- If you find one data source, to keep looking, because the coolest applications of this are really when you can bring together lots of different sources. So instead of stopping at showing the top 10, well, maybe I should go look up information on the artist, or find related tracks and also display that. We've already found the data source, so we're not going to bother showing you that on the demo machine. Next, we are going to extract the information.

So here, we're going to use XQuery. And of course, to use XQuery, we need to know a little bit about XQuery. So XQuery, at its basic, is it incorporates XPath. And XPath is quite straightforward. So if you're in your shell and you can move up and down directories, you can do basic XPath.

Dot is your context item, just like it is in your directory structure. And if you want to go dot slash with a name, well, that goes one element down the tree. So say I went dot slash a, that would give me all my links. If I went dot slash slash a, that would give me every single link in the tree, instead of just the children names that match. And dot dot goes up.

So that's really straightforward. But then we can add these predicates in, which makes it much more powerful. Because inside a predicate, you can have any XPath or XQuery function you want. And there's really a lot of them, which I'll show you in a minute or two. So you can do string matching, including regular expressions based on the Perl-compatible regular expression library. And what else? Match based on what kind of element it is, text or attribute. So we'll go back to the demo machine now and show how easy this can be.

So with our iTunes RSS, which here it is, in Objective-C we could say, oh, give me my first child, move down, give me my next child, walk through the tree, try to find this, try to find all the items. And the items are-- could you just scroll down a bit? Yeah. So we have this whole series of items.

And if all you have is items as your children, well, that's fine. But what if there's a comment in between? Well, now suddenly you have to walk through all of the children just looking for the ones that are items. So XQuery cuts out this whole stuff. So if we go to the XQuery tab, if we want to find all the items, all we type is ./item.

And there they are. And say we want all of the song names on those items. Well, that's itms:song. Okay, there's all of our song names. Say we want the song Links, well, we just add on Link. So it's really, it's that simple. I mean, that's a lot easier probably than whatever you've been doing. So if we can go back to slides now.

Now, if we want to-- so as mentioning all those functions, here's some of them. There's around 144 of them. I really encourage you to use xQuery when dealing with XML as a more fully fledged language. So if we want to do something to each of those items, then we need some sort of iterator, right? And in xQuery, that's called the flower statement.

So it's really similar to an SQL. You're a select, order by-- I haven't used SQL in forever. I like sQuery a lot. So your flower statement here, you have for, let, order by, where, and return. And a lot of xQuery, you'll see, will just have a whole bunch of for and let statements. So for is your iterator, and let is your assignment statement.

Then you order by and you always need to have a return, which is sometimes confusing for new people to the language. You can't go let, let and return. You have to have the return. If you have a really big XQuery file, we also encourage you to put it in a separate file, which we'll be doing. Whereas if you just have a one-line XPath, you should just write it straight within the code. Back to our demo.

So back in NSXML browser, we just want to show a simple flower statement before we get into the code here. So if we want to loop over all of the items, we want to say $4 item. Dollar is the way of denoting variables in XQuery. And we're going to grab each of the song names. And also get the album title and then concatenate them together and just return that and then join the whole sequence together with returns. So if we run that.

So in XQuery, everything is a sequence. Even if it's one item, it's still a sequence of one. And you can't nest sequences. So in our return statement, if we'd written a return of a sequence there, it would just end up appending them all together. It's just a little gotcha that you want to watch out for. So now we're going to go straight into the code. And go back to-- I talked about step one, but we didn't show the code because the code is so similar to what I did in the first demo.

So here we are getting our old RSS and our new RSS. And these are sort of pre-canned versions I saved. I think one of them is from last Friday and one is on Monday. So that will be the two lists that we're comparing. And we go to step two. Step two is where we're actually applying the X query.

So here we're getting, as I mentioned, we want to store it in a separate file because our X query is quite large, which we're going to go through in another minute. The NSBundle, we're just grabbing it. And then the next thing you'll notice is this constants dictionary. So XQuery, as I mentioned, since it's just a one-line function call, it can only be really applied to your current, you know, that one object that you're calling it on.

So we can say new RSS objects for XQuery, but we need some way of getting the old document in there. So to do that, we have this constants dictionary. And this key is going to match something that we declare in the XQuery. So it's a way of just passing information in between from Objective-C into XQuery.

And then the last line is our one line that we do to actually apply the X query, and we're going to get back this big sequence of items. So now, instead of looking at the XQuery in Xcode, which has no XQuery syntax coloring, we're just going to look at a canned RTF file.

Here it is. So XQuery comments are these really cute, smiley faces. So you're always happy when you're working on XQuery. So this is what I was talking about, this first line here. This is where we're bringing in that object that we declared in our Objective-C. So now we've got these two documents. One is the context item, which is dot. And then the other one is $olddoc. Then we're going to loop through. So this is just a really big version of the small snippet of XQuery I showed you just a couple minutes ago.

So we're getting a few more attributes than we did before. In addition to the song and song link, we have the artist and album. And you'll see here where it goes slash text, parentheses, that's matching a text node. So you can also have slash attribute, slash element. So it's really easy to pick out. We'll say I want all the comments. I go slash, slash, comment.

Then we want to find how the old list compares to the new list. So there's that song link, remember, which we can use as the unique ID. So we're using this predicate statement, which, based on the song link, in that entire array of old items, it's going to find just the one that matches. And of course, there's the chance it doesn't match if the song is brand new. And we'll deal with that.

So then we have this little if statement, you know, standard stuff. And we're just building up this string that says, well, did the song go up a notch or down two? Is it brand new or is it unchanged? Then the last thing we do, if you can see that far. Helena, do you want to just move the whole thing up a bit, the whole window?

So another great thing about XQuery is instead of having to write in line, you know, left, blah, blah, blah, blah, right, and make sure you match your start and end names, is they have these constructors. So this is an element constructor. It's going to build an element with some attributes. And our attributes, we're giving special names here, because we're going to use these to match the names that are in Final Cut Pro.

And here, also, a neat thing about the Compute Constructors is you can put any variable you want in there. I mean, it's not like hard-coded XML, right? So at the very end of this, our loop is done. We've created this list. It's a list of items which have these four attributes, which contain information from the original RSS plus the rank change. And now, if we can go back to slides.

So now, Final Cut Pro, we want to create our template. So again, there's lots of different ways of creating this template. If you look at applications now, they can do so much XML that it's crazy. And when you go in Illustrator and you've created this beautiful graphic, and then you can export it as SVG and programmatically turn it into whatever you want.

And I'm really excited about that because that's one of the annoying things for me is you get into this program, and instantly you want to do this. You want to do something really repetitive, especially in Final Cut Pro. And now you can just export the whole thing, use a little bit of NSXML and XQuery, and import it back in, and it's all done for you. So this is a couple of things we're going to be looking for in our exported Final Cut Pro XML. A generator item is going to be our text elements, and then the item will be the actual text. Let's go back to the demo machine.

Here we are in Final Cut Pro. I know for most of you like me, this was my first time using Final Cut Pro for this presentation. And it was really fun, and it wasn't that difficult to get something going. And then once I had this little snippet there, well, then I could manipulate it with my mad Objective-C programming skills and build this whole really cool presentation.

So here we have our template item, and we've got these four text elements, each on their own track. So this is important to keep in mind, because what we're going to do is duplicate each of these along the track. So we're moving forward in time, and we're going to change this to actually be a real countdown. And underneath, we have a little bit of audio and a little bit of video to spice it up. So we can just export this.

And Final Cut Pro, it actually exports everything that you do in it. You think, well, maybe I'm doing this really cool transition, and it's not going to be able to describe it in XML. But Helena and others have done such an excellent job. It's all there. So we'll just take a look at what the exported file looks like. So we're giving it this name that we want to remember when we're back in our code. And we're just going to search for one of those strings.

So if we scroll up a bit until we hit a generator item. So there's that generator item. This is our generated piece of text. And above that is track. So that's the track that the generated item is on. So what we want to do is take out this template item, because we really don't want something that says song name, and change something inside it, then push it back in multiple times for each of the four things on their four different tracks. Now if we can go back to slides.

Okay, this is our last step. I'm realizing now you'll be out quite early. So we're going to insert our changed copies. So we had the XML that we created from XQuery, and we have this XML that we've exported from Final Cut Pro. Then we're going to put it all together.

We're going to use some of these NSXML functions. So on the left-hand side, you have all the sort of attributes of your things. And then on the right-hand side, you have everything dealing with children. So I mean, as you can probably guess, these things end up being KVO compliant. So we'll go back to the demo now.

I forgot a step. Step three. Okay, after we exported the Final Cut Pro XML, and I showed you where that generator item and the track are, now we want to find those. So we're importing the template, then we have our array with the four keys. And remember, we named in Final Cut Pro, we named our text with the same names as the attribute names that we export from XQuery.

So we're just going to loop through each of those four things. And we want to build one array that's the generator items and another array of the tracks, because ultimately, the generator items are going to disappear from the original, and we want to insert things into the tracks.

And there's that little bit of XPath there, which I should explain. So we want to find within that big exported file just that one particular generator item, right? And to do that, we go look for all the generator items. Then we have this predicate where inside there, we look for a text element that matches based on the name of wherever we are inside our array. Then we add the generator object and also its parent to text nodes and track nodes.

The last thing we do in the loop is detach it, which is obviously another NSXML function, which just takes something right out of the tree, basically setting its parent to nil and deleting it from the parent. Okay, step four. This is the last step. So you see length there. This is so we can move ahead in time. We're going around 30 frames a second. We want each thing to play for two seconds.

Then we get that first item. So we've got these 10 items from the XCore, remember? So that's current. Then we're looping through our four special keys. And the first thing we duplicate is the actual generator item, because that's the one we want to insert our new text in. Then we come up with our replacement string, which is the attribute with the same name matching one of those keys.

And now we find just the text node. So we do a little bit more XPath just to find the text node. Because remember, we don't want to replace the whole text of the generator item. We just want to replace the text where it belongs. So we set the string value.

And then that dot slash start and dot slash end, those are our start and end times. We want to make sure that we actually move forward in time and don't end up with our 10 clips all on top of each other. Then we add our child and release it. And we're done. So, I mean, that's really all there is to it. So let's go back. Let's run this. So see at the very end there, we just write it back to a file.

OK, and then we can go back into Final Cut Pro and import this. And it should have our 10 bits from the RSS in Final Cut moving along in time. So we can see we've added that additional information of whether it's unchanged or moved up and down. So let's just switch to our pre-canned rendered version of this and play it back and see what it looks like.

The Expendables from Santa Cruz. They're really nice. They lent us this footage and their music, so we're not stepping on anyone's feet. Now let's go back to slides. So in addition to everything I just showed you, I just wanted to show you a few more header highlights. So in addition to exporting XML, we also can export XHTML or HTML. And at this point, I will encourage you to read the release notes, because there's some things that aren't quite finished. And there's a couple known bugs that we don't want you hitting upon, but there's generally workarounds that are known. You can also set string value resolving entities.

So this is something that I really like, because it means in your DTD, you can just say, hey, this string should resolve to this other string, and then pass it in with the entity and have it resolve for you, so you don't have to parse through your string and replace it later on. I showed you objects for XQuery. There's also object by applying XSLT, which is another great language slash technology. It's sort of, in theory, similar to XQuery, but I think XQuery is a bit more powerful.

In XSLT, you can also do some other things, like you can export RTF or you can export PDF, depending on how you write your XSLT. So that's why it's object by applying XSLT instead of document, because you can get either NSData back or you can get an NSXML document.

Then there's also a resolved namespace for name. As I mentioned, we put a lot of work into having the namespaces just work. So when you add something, it will actually go and find the namespace that it's associated with, which means you can say, hey, what's the namespace for bar colon baz, and it will say, oh, it's http bar.com.

Then we have fidelity, which for those of you that are using straight libXML2 right now, is something I know you're not getting because it was quite a bit of work to get this in there. So what I mean by fidelity is just that your input, you know, you probably want it to look like your output. So if you've got white space in between your elements, you might want to keep that. If you use single quotes or double quotes on your attributes, you know, you might have some reason that you want to keep those around.

So that's a big fix as entities. So all of this stuff. Also keep in mind, though, that this will be a performance hit. I've tested this. If you turn on all of these options, you'll get about a 50% performance hit on parsing. So just use the ones that you really need. Then, bindings. So you briefly saw the little NSXML application. And since we have a minute, I'm just going to show you a little bit more of it. If we could go to the demo machine.

So there's a whole tab here, which I didn't show you, which is where the whole KVO, KVC comes in. So we had our source here, and there's this Editor tab. And you can just move down the tree, move up the tree, and edit things. Here you can see that there's some namespace and attributes.

And find information about the document. And it really didn't take me long to put this together. It was an exercise in seeing if our bindings worked. And they do. I mean, it's great. It's basically-- you can build a whole XML editor really quickly. And it also wouldn't take much effort if you guys went to the What's New in Cocoa talk. They said that there's new bindings for trees. You should be able to just put an extra class in there that steps to this and be able to do an outline view editor in a day. And let's go back to slides.

That's it. I want to see what you can do. We're giving you all of the stuff. There's that option NSXML document tidy HTML, which takes your HTML and converts it into valid XML. You can also tidy up XML. And as you may have guessed, we're using the actual HTML tidy library, if you've heard of it. It's also available on BBEdit.

There's our five classes, NSXML document, node, element, DTD, and DTD node. And one of the first XQuery engines, as I mentioned, XQuery is now... Its final call happened in February, so it's really getting to be a stabilized language. We don't expect major things happening like they have been the past couple years.

Please, file bugs. I know of a bunch, but you're probably going to find a bunch more. So I can't find out about them unless you file them. So get on your bug reporter, file them in. Also, enhancements. We know that we haven't covered everything that you're interested in. I'm especially interested in hearing what people want more from the DTD stuff and validation. Matthew Formica is our Cocoa evangelist. He should be your first sort of email contact. And then there's John Galenzi, who you probably met at the HHI talk.

We have a bunch of great documentation that Terry put together, who you'll see in Q&A. In addition to the regular Foundation web page, where you'll see the new classes, there's also this whole prose-based, hey, what's NSXML? And it goes into how XML browser was put together, how the bindings go together, little code snippets.