Getting Started with Xray - WWDC 2007

Developer Tools • 44:50

Xray is a new application in Leopard that lets you visualize what your application is doing as it runs. Discover how Xray will help you to better understand, debug, and optimize your application. You will learn how to use the included instruments to trace memory and CPU usage, how to create an automated testing template, and even how to create your own instrument using the power of DTrace.

Speakers: Steve Lewallen, Daniel Delwood

Unlisted on Apple Developer site

Transcript

My name is Steve Lewallen and this is session 309, Getting Started With Xray. I'm the Performance Tools Manager within Development Technologies at Apple and I'm here to talk about Xray. So on today's agenda we're going to define for you what Xray is and we're also going to briefly cover a technology you will probably hear about often, hand in hand with Xray, around here at the conference, and that's DTrace.

Then we're going to highlight the key elements of Xray that you need to understand how to use in order to make basic use of Xray itself. And finally we're going to have three demos. The first demo that I'm going to give is just a general usage demo. We want to familiarize you all with the different parts of Xray and how you use them, what a work flow is within Xray.

The second demo is more about solving an interesting problem and integration with Xcode. We'll also use an advanced feature of Xray called the UI recorder. And finally we're going to have a talk on memory analysis and demonstrate our memory analysis tools in Xray. So what is Xray? I like to call Xray a meta analysis tool. Most performance tools you have are focused on one particular metric or genre of data such as memory allocations or how much time your code is spending in one function.

But Xray can do all of these things within one unifying interface and part of this unifying interface theme has been to bring in some of our older legacy tools. Over the years, we've developed many different tools. They come from different paths in the company and different histories and they have different UIs that you need to learn and we need to keep upgrading them and some of them have started to really lack polish.

Certainly not looking like a Leopard app. So we've built a great UI in Xray, polished it and are supporting all of these tools within Xray itself. And of course I mentioned that we include support for DTrace as well. And finally, Xray is exclusively available on Leopard. It will not run in Tiger.

So why is Xray so great? Well, take those disparate types of data I talked about. Xray can measure many different types, and you can look at any one of them, but you can also view them simultaneously over time and use time as one way to correlate data together. Now instead of saying well, my app uses so much memory around here, you can say my app spikes in memory between the time I open this file and close it, and now I have another context with which to look at that memory spike.

It also provides the ability to mine the data you gather. We have various tools to flip the data around, upside down, inside out, scope it down, to find your particular problem you're interested in looking at. Xray also streamlines the edit, build, analysis cycle and we'll take a look at that in demo number two. And finally Xray simplifies the use of DTrace itself.

So what is DTrace? Well, this is a quote actually from Sun's DTrace manual, updated slightly to fit our needs here. Sun did a great job, very innovative technology in the last few years in Solaris and they open sourced it. We said well that's great technology, we want to use it at Apple and so a bunch of people at Apple from people in the kernel team to the performance team, all over Apple, worked their butts off to port this to Leopard in a year, both for Intel and PowerPC and that wasn't an easy job, but they did a great job at it. Now, it's a dynamic tracing facility that allows you to trace any instruction on the system from the kernel on up and concisely answer any arbitrary question you may have about the system and the behavior of your app and ours.

So why is DTrace so great? What attracted us to DTrace? Well one of the great things is zero disabled cost. Now I was actually sitting in the room during the OpenGL talk and they said something that I've said often about why DTrace is great. They were talking about an error API that they put in their programs and often times developers accidentally leave that in and that slows down the performance of their OpenGL apps. Well this is a reason why DTrace is great, because it has zero disabled cost.

Instead of specifically putting in code that gives you error reporting and performance metrics in your own code, you can use DTrace to dynamically attach probes, gather some information, and when you're done pull them out, and it doesn't affect your app. You can do that with apps that are in production and already shipped. You don't need to recompile it to turn this on and off.
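The attach, gather, detach idea can be illustrated with a loose analogy in Python. This is not DTrace and says nothing about how DTrace is implemented; `attach_probe`, `parse_record`, and `entry_points` are hypothetical names made up for this sketch. The point it tries to show is that once the probe is detached, the original function is back in place and no instrumentation remains on the call path.

```python
import functools


def attach_probe(table, name, log):
    """Wrap table[name] so each call is logged; return a function that detaches."""
    original = table[name]

    @functools.wraps(original)
    def probed(*args, **kwargs):
        log.append((name, args))          # gather some information
        return original(*args, **kwargs)

    table[name] = probed

    def detach():
        # Put the original object back: nothing of the probe stays behind.
        table[name] = original

    return detach


def parse_record(text):
    """Hypothetical application function."""
    return text.strip().upper()


# Hypothetical table of application entry points that callers go through.
entry_points = {"parse_record": parse_record}

calls = []
detach = attach_probe(entry_points, "parse_record", calls)
entry_points["parse_record"](" acgt ")    # logged by the probe
detach()
entry_points["parse_record"](" ttga ")    # not logged: the probe is gone
```

Note that the application function itself is never edited or rebuilt here, which is the property the talk attributes to DTrace probes.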

So it's dynamic and it's system wide from the kernel on up. We have a technology now that bridges the gap. It used to be that there's the kernel developers and they have their debugging tools and then there's the user developers and there was this gap and DTrace bridges that gap for all of us.

Now a user, a user app developer can actually see what's going on from the kernel on up in his application. That's really phenomenal. So again, it's always available as well. We ship DTrace in Leopard on the user install, on the developer install and on the server. You don't need to install anything to use DTrace itself in Leopard.

And finally, it's scriptable. So we call technologies that we use in Xray to gather different pieces of data instrumentation. So what types of instrumentation does Xray have? Well it has all this technology built on the Darwin Foundation and most of it built on Leopard frameworks, such as, for example, object allocation and leak instruments and general memory usage instruments.

It has instruments to measure where you're spending your time in your app a.k.a. sampler or if your app is spinning, what's going on when it's spinning. Is it deadlocked? Is it just taking a long time with the event loop? Then we have some new innovative instruments one of which is the UI recorder which will allow you to record the user events that are being sent to your application.

You can use this in a couple of ways. One, you can say well, is my app actually responding to the events it should be responding to? But two, we can flip the coin here and play those events back and automate you driving your own app and we'll take a look at that in demo.

We have another app called the OpenGL Profiler and this will help you determine how efficiently you're using GL on the system. Then of course as I said, we have DTrace also and we've built several instruments around DTrace such as file activity instruments, garbage collection instruments and finally you can build your own DTrace instruments in Xray.

So what are the key elements that you need to understand to make best use of Xray? Well first and foremost is the instrument itself. This is the key piece of technology that gathers data on a particular metric such as memory usage, I/O, etcetera. So where do you get an instrument in Xray? Well you get it out of the instrument library. The instrument library hosts all of the instruments that we include in Xray and if you build any instruments yourself, that's where they will appear as well.

Once you have an instrument out of the library, what do you do with it? You apply it to the trace document. This is where all the instruments that you've selected to measure against your particular target app will appear, as well as all the data that they gather and this is where you will spend your time mining the data and finding your problem.

Now once you've applied an instrument to the trace document, it has certain default settings usually and we've tried our very best to pick good, solid default settings. But if you want to tinker with that and change them, you use the inspector on the instrument. And finally, where else can you get instruments?

Well you also can get instruments from Xray trace templates. These are collections of instruments that we've put together to look at particular problems and you can build trace templates yourself as well. And you can set these templates onto problems from Xray itself, from Xcode, from what we call "quick start" keys and from the command line.

Now one caveat here or something to mention is that as I said, we're trying to unify some of our smaller tools into Xray itself. Well a couple of tools you may have used, Sampler.app and ObjectAlloc.app are no more and in their place, in the performance tools folder, are Xray templates that look pretty similar to those apps with a little template, cool template icon on them. So don't be surprised when you click on those that Xray actually opens instead and you can use those with great effect.

They have even more power than the apps themselves. So let's go to demos. Let me give you a general usage demo. I want to show you how you use Xray and we're not going to solve a real problem, we're just going to kind of play around and find some issues and show Xray itself.

So I have Xray on my dock here, it's normally in developer applications. I'm going to click it and up pops Xray and on the front of it is the Xray template we just saw on the slides. Now for the purposes of this demo, I'm actually going to choose the least exciting template, the blank template, because I want to show you how to use the library as well.

So I have the blank template selected and I'll click choose. Now we have an empty trace document and where do I get the instruments? Again I get them from the library. So I'll go up to the upper right in the tool bar and click on my library button and my library appears. Now these are all of the instruments that we actually include with Xray out of the box.

And I can find different things, I can search in the library, I can say what has to do with memory? Well there's a bunch of instruments that have to do with memory, oh how about file? There's a bunch of instruments that have to do with file. It will search all the key words in the instrument to get these hits.

I can also search by category. So I'm going to go to the top of the library and drag down the splitter here and expand the library node and these are all the different categories of instruments that we actually have and I am actually interested in this category, the system category. Because I'm going to do this first demo with a sampler instrument.

So to apply an instrument to a trace document, I select it and drag it out of the library and drop it onto the document itself. Now I've applied the sampler instrument to the document and I'm going to go and close the library. Now the next thing I want to do is I want to change the sampling rate of sampler from ten milliseconds to five milliseconds. So I'm going to go and click on the small I here on the right of the instrument and that brings up the inspector.

So I'm going to change this from ten to five and now I'm done. Okay. So now we have an instrument, we're all ready with that, now we need to actually pick something to gather data on, to run this sampler against. So I'm going to go up to launch executable and I could attach to a running process for example but I want to actually choose an executable. So I'm going to select that and my chooser comes up and I'm just going to pick chess.

Just so we can play around with that a little bit. Before I exit this dialog, I want to point out that if you have a special setting that you like your application, your target to run with, you can set environment settings in this box as well as command line arguments and those will be remembered for that executable as long as you don't go back and change them.

So I'll select open now on chess and we've selected our target. Now I'm actually ready to run the trace, using the sampler instrument against my chess target. So let me do that now. I'm going to use the upper left hand button here, record and I'm going to press that.

Xray is going to launch chess and start running the sampler instrument against it and let me just put chess through its paces again. We're just doing a little usage demo here. Okay that's good, I'll quit. Okay so what do we see? Well a lot of things have changed in Xray now, so let's work from the top down and discuss each of those. So center top is the time display. This is actually showing me the running time of my recording and chess was in the foreground but it was ticking away as the trace was going and data was being populated as we performed the trace.

So we ran for twelve seconds. Now let's move down a level and we see that there's track data next to the sampler instrument. This is CPU usage data basically, spikes, and different instruments will put different kinds of graphical representations and graphs in these track views. Next down is my detail view. Now we saw that automatically disclose as soon as we started the trace. This actually contains the meat of my data. The track view is more of a high level navigation tool.

So I want to go dig into this a bit. I want to answer what is happening in this first spike here when the chess app starts up. So I'm going to use a few techniques to narrow that down. So the first thing I want to do is I want to focus in on just the main thread. That's what I'm interested in, just what happened on the main thread. So I'm going to go to the lower left hand corner and under active thread, I'm going to select main thread.

Now we see in the detail view that all the threads other than the main thread have been eliminated from the sampling data. The next thing I like to do is invert the call tree. Because that gives me the interesting leaf nodes of the sample. So I'm going to go about two thirds of the way down to this invert call tree check box here under the call tree options and I'm going to select that. Now I've inverted the call tree and I want to eliminate anything that's not Objective-C. So I'm going to go down a few check boxes and select show Objective-C only.

Okay, well here we are. We've filtered this down some, flipped it over, now I want to focus in on just that one range of time in the track view up above. So what I can do is I can filter on time by click and dragging in the track view.

So if I hold down my option key on the keyboard and then I click and drag my mouse, I'll actually open a small region of time here and we saw the detail view filter down a bit and that's because it's filtered out everything that's not in that time range.
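The three filters used so far (one thread, a time range, and an inverted call tree) can be sketched on toy data. This is only an illustration under stated assumptions: the sample tuples, frame names, and the `hottest_leaves` helper are invented here and do not reflect Xray's data format.

```python
from collections import Counter

# Each sample: (timestamp in seconds, thread name, stack from outermost to
# innermost frame). All values and frame names are made up for illustration.
samples = [
    (0.10, "main", ("main", "loadNib", "lockForWriting")),
    (0.15, "main", ("main", "loadNib", "lockForWriting")),
    (0.20, "worker", ("thread_start", "search")),
    (0.25, "main", ("main", "loadNib", "unlock")),
    (5.00, "main", ("main", "runLoop", "drawBoard")),
]


def hottest_leaves(samples, thread, start, end):
    """Count innermost frames for one thread within the range [start, end]."""
    counts = Counter()
    for timestamp, sample_thread, stack in samples:
        if sample_thread == thread and start <= timestamp <= end:
            counts[stack[-1]] += 1    # inverted view: start from the leaf frame
    return counts.most_common()


print(hottest_leaves(samples, "main", 0.0, 1.0))
# [('lockForWriting', 2), ('unlock', 1)]
```

Narrowing the time range drops the later `drawBoard` sample, which mirrors how the detail view shrinks once a region of the track is selected.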

There's also some interesting ways to create that time filter that we won't go into today. So on the top I see the view hierarchy lock, lock for writing. Well what's going on? Why is that one of the hot points in the beginning here when we loaded chess? Well I'm going to look at another view in Xray called the extended detail view and often times, the extended detail view is loaded with stack data.

So I'm going to go to the upper right tool bar item again, this time using the view menu and we see that the detail menu is already disclosed and I'll now select the extended detail view and out slides the stack trace and I can see, I can also invert this, reverse the order of the stack and I can see that lock for writing is being called well basically because we're loading bundles, we're loading the nibs and that makes perfect sense.

All the UIs coming into the app and so things are being locked and unlocked as all that's being built up. So that makes perfect sense. So that is actually the conclusion of this demo. Just to show you around how you actually run Xray. Now let's go back to the slides and we'll review the steps that we took to do that.

So, step one was we had to choose a template and in this case we chose the most boring one of the templates to show you the library. So step two was to pick an instrument, in this case sampler, out of the library and apply it to the trace document. Then we configured the instrument and we chose our target, chess.

And step four we actually started the recording, Xray launched chess for us, we gathered the data, we stopped the trace and then we used our various tools within Xray to narrow down onto the issue we were interested in which is start up time. So now let's go to a more interesting demo, now that we're more familiar with how we basically use Xray. So here's the scenario for the demo.

You are an engineer at a large biotech company and they have these DNA records, they have millions of DNA records and they had someone write an app that would sort these DNA records and the person who wrote those did a simple string comparison type of sort to sort the data and you thought you could do a better job. So let's go back to demo machine one again and let me start that and I'm going to run it.

Okay, so this is our simple app and on the left hand side are unsorted records. For this example data set, we just have 100,000 and on the right should be our sorted records. So when you've got the app, you perform the default string sort and it took two thirds of a second or so, so then you added your UID Sort. You thought that if you just added a unique identifier to each DNA record, it would be a lot faster. So you just had this idea and you just implemented it and let's see where you got.

Well you didn't do so well and actually, this isn't totally uncommon. Often times we get an idea of how to solve our performance issue without really digging into it a lot and we may have a sound concept overall, but something about the execution of that actually didn't result in the performance win that we had hoped.

So you think to yourself wow okay, I thought this was going to be easy, this is going to be a lot tougher. So how do I improve this, I'm going to have to iterate over time probably. I don't want to have to go and drive this app every time and regather the performance data myself, I'm going to use Xray and Xcode together to help me automate all of this. So let's go back to the demo app and quit it and let's go up to Xcode's run menu and take a look at what's under start with performance tool.

Now in start with performance tool, we have a bunch of templates here. These are all the same templates we saw in the template chooser and in fact any new templates you see here, you'll see in the chooser. So I'm going to select this template at the bottom, UI recorder. I'll select that and let's see what happens. Well Xray starts up and it starts up our demo app. Now I'm going to go through the same thing I did before and do this UID sort that's so slow and now I'll go and quit my demo app.

So what happened there? Well the UI recorder instrument recorded all of the user gestures that I sent to my target app and it can replay those for me. So I'm going to use that feature and I'm going to combine it with that sampler instrument we saw before. So let me go back to my library and let's look for sampler instrument again, there it is, I'll drop it on the trace document again, I'll close my library. Now I'm going to save this with these two instruments, along with the UI recording data which we call in this mode the driver data, as an Xray template. So I'm going to go to the Xray file menu and say save as template.
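The record-then-replay idea behind the UI recorder can be sketched as follows. This is a minimal sketch, not how Xray captures or injects events: `EventRecorder`, its methods, and the string events used with it are hypothetical names chosen for illustration.

```python
import time


class EventRecorder:
    """Record (delay, event) pairs, then replay them against any handler."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = None
        self.events = []

    def record(self, event):
        """Store the event along with the time elapsed since the previous one."""
        now = self._clock()
        delay = 0.0 if self._last is None else now - self._last
        self._last = now
        self.events.append((delay, event))

    def replay(self, handler, sleep=time.sleep):
        """Feed the recorded events to handler, keeping the original pacing."""
        for delay, event in self.events:
            sleep(delay)
            handler(event)
```

Storing the delays with the events is what lets a replay wait for a slow operation, such as the long-running sort, before sending the next gesture. The `clock` and `sleep` parameters exist only so the sketch can be exercised without real waiting.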

And I'm going to type in sort test and I'll just say automated sort test of DNA app and I'll hit save. Now I'm going to close Xray completely. I don't want to save it as a document itself. Now let's go back and look at that performance tools menu again in Xcode.

Start with performance tools, now we see a new template, sort test. So what happens if I select sort test now? I'll do that and Xray is going to start up. It's going to start the demo app, then it's going to drive the target app for me. So it's going to perform the first test and then the second test which took longer and then it's going to go and quit the app.

So that's a nice little automation. ( applause ) So what's really cool, one of the first people that saw this was someone who works with game developers a lot and they described to me and said wow you know, often times these game developers, they get this bug report that says like it's Quake or something.

Well I went down this hallway, I turned left, I shot that guy, I turned right, I shot this other guy, etcetera, etcetera and that's difficult for them to reproduce. Often times they hack into their code just so they can get to that point in the game and so this is neat to automate all sorts of applications.

So now let's look at the data it actually collected. So this first large blip here, this was the amount of time it took to do string sort and this rather bigger blip here, that was the time it took to do what was supposed to be the faster sorting routine, the UID sort.

So let's use the same techniques that we learned in the previous demo to find out what the problem is. Okay so the first thing I do is select the main thread, and in this case this is a single threaded app so it doesn't have a lot of effect. I'll go now and invert the call tree as we did before and I'll say show Objective-C only. So at the top here is RSRecordcalculateUID, so let's disclose our extended detail view to see the call stack.

I'll go back to the view menu at the top and say extended detail. And now we can see that at the top is RSRecordcalculateUID and we can see by the subtext below there, all the frames that have source code to them. So let's click on calculateUID, double click on it and bring it up in Xcode. So I look at this and I say, well you know, this is pretty efficient, I don't think I can really optimize calculateUID. So let's close that and let's go down a frame. What about the UID method itself?

Okay well this is passing back a number with the calculated UID, but how is all this being used? Well if you look in the next few frames, we'll see that an array is being sorted, there is a comparison function called UID compare and that is getting the UID, which computes the UID itself for both left and right and the comparison.

So that could be a lot of comparisons and a lot of computations of the UID and that doesn't change. I mean a DNA strand remains the same the entire time. So what I'm going to do is go back to the way we provide the UID. Let's go back to that frame and I am going to replace this with a new method that actually caches the UID.
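The difference between recomputing the UID in every comparison and caching it can be sketched like this. The demo app is Objective-C and its source is not shown in the talk, so everything below is a hypothetical stand-in: the `Record` class, the toy `calculate_uid` formula, and `sort_by` are assumptions made for illustration.

```python
import functools


class Record:
    """Hypothetical stand-in for the demo's DNA record."""

    computations = 0    # counts how often the expensive step runs

    def __init__(self, strand):
        self.strand = strand
        self._uid = None

    def calculate_uid(self):
        Record.computations += 1
        # Toy formula; the real UID calculation is not shown in the talk.
        return sum((i + 1) * ord(c) for i, c in enumerate(self.strand))

    def uid_uncached(self):
        return self.calculate_uid()

    def uid_cached(self):
        # The strand never changes, so the UID is computed once and kept.
        if self._uid is None:
            self._uid = self.calculate_uid()
        return self._uid


def sort_by(records, uid_getter):
    """Sort with a comparison function that asks both sides for their UID."""
    def compare(left, right):
        a, b = uid_getter(left), uid_getter(right)
        return (a > b) - (a < b)
    return sorted(records, key=functools.cmp_to_key(compare))
```

With the uncached getter every comparison triggers two computations, so the count grows with the number of comparisons. With the cached getter it is bounded by the number of records. In Python specifically, passing a `key=` function to `sorted` gives the same effect, since the key is computed once per element.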

So I'll go and delete this now and I will drag in the new method, save that and then I go to my header and add the data member there, save that, close, close and build and now I'm going to use another cool feature of Xcode which basically is restart.

Xcode always knows the last way you ran your app and if you notice, Xray is also running in the background. It still has that other template that we have the previous data with before we made our fix. So I'm going to go back to the run menu, now I can just say go sort test. I don't even need to find the template anymore. So say go sort test and Xray is going to start up and start the demo app again and put it through its paces once again.

That took about the same amount of time. This takes a lot less time. Now the UI recorder says I've gotta wait around here, this took awhile so it's going to hang out there for a second and now it's going to quit the app again. So now we have a performance win here. Now let's compare before and after by using yet another feature of Xray called runs, multiple runs. So let's go back to Xray and click on the instrument.

Move this detail down, click on the inspector and say show all runs and hit done. And now what we can see is the bottom graph is what was before. This is what took so long. The upper graph is what was improved. So let's review this really quickly in slides. So back to the slides.

So the first step was to generate a UI recording and then we can store that away and use it whenever we want. So we used Xcode to launch the UI recorder template against the current executable in our Xcode project. We manipulated our app to teach Xray how to do it itself and then we added the sampler instrument so that whenever we ran, we gather actual performance data. We save it as a template and then we played it back from Xcode seeing that our new template was there in the performance menu. We identified our hot spot and we corrected the performance issue by caching the UID away.

And then we restarted from Xcode using the restart function and verified our performance win. Now I sensed that you thought we had failed there because it was still longer. So actually, we improved the performance some using one of our instruments but we can improve it more. So I'm going to invite Daniel Delwood, engineer on the Xray team, who is going to talk about memory analysis and see what he can do to meet our goal of beating the string sort by taking the angle of memory analysis. So Daniel.

Thanks Steve.
Take it away.

( applause )

Well howdy. I'm Daniel Delwood, Software Radiologist and I'm here to talk to you today about memory analysis and Xray. So first of all, if you remember one thing I say today, remember that memory is critical to performance, okay. If you remember two things, remember that Xray can help with that. So without further ado, let me jump right into what memory analysis tools Xray brings to the table and sort of introduce you to them.

So first of all, there's ObjectAlloc. Now this is an instrument that tracks all the malloc calls and the free calls in a target application and records all that data. Now it is a lot of data, but that gives you a lot of power because later on you can go back and get detailed address histories for virtually any pointer. It used to be a stand alone application and we've since improved it by speeding it up and adding features and including it in Xray and I'll talk about this a little bit later before the demo.

Next is leaks. This is a tool that checks for unreferenced blocks of memory. If you've allocated a block and since lost all references to it, you obviously can't deallocate it and it's a leak and it returns that and it also tells you the allocation point at which the block you know, is created. So that's another really powerful tool.

The memory usage monitor is a slightly different tool. It's very, very useful because it records high level statistics such as virtual size, resident size, the number of page faults, number of copy on write faults that your target application makes and graphs them over time. So the good part about this is you can visually look at your data and say, okay when did my memory usage start to spike? Was it because I clicked a button? Was it very slow over time? Memory usage monitor will help you find that.

And lastly I just wanted to mention that DTrace, which Steve talked about earlier, is a very, very powerful technology for you as the application developer to use the knowledge you have to solve your memory problems. Because you know the most about your code, you can ask specific questions, probe specific functions and find out well why is your memory spiking, why is it not being used well? So what are some, what's the benefit of these all being together? Well you can use them all at the same time in Xray. All in the same document.

So what are some common memory issues? Well leaked memory like I said, you can use the leaks plug in or the leaks instrument along with ObjectAlloc to get more detailed histories for those leaked blocks. If you're facing large footprints, you can use the memory usage monitor to find out was it a slow, gradual growth over time possibly due to leaks or was it something that you did suddenly? If you're facing pointers to freed memory, getting EXC_BAD_ACCESS, I'd use ObjectAlloc, leaks, DTrace and really search this.

And then dynamic memory usage is a little bit less known of a memory problem. This is when you allocate an object or just a block and then you free it and then you allocate and free and allocate and free and do this a hundred thousand times. It's a really big performance problem and it's something you just don't need to do. You can create an object and use it for the whole lifetime or just create a buffer and so that's what dynamic memory usage is about and you can use ObjectAlloc to help track that down.
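The allocate-and-free churn described here, and the reuse alternative, can be sketched with a toy allocator that only counts how many buffers it hands out. All names (`CountingAllocator`, `process_with_churn`, `process_with_reuse`) are hypothetical, and the sketch assumes every chunk fits in the scratch buffer.

```python
class CountingAllocator:
    """Toy allocator that only counts how many buffers it hands out."""

    def __init__(self):
        self.allocations = 0

    def allocate(self, size):
        self.allocations += 1
        return bytearray(size)


def process_with_churn(allocator, chunks, size):
    """Allocate a fresh scratch buffer for every chunk."""
    total = 0
    for chunk in chunks:
        scratch = allocator.allocate(size)    # new buffer each iteration
        n = len(chunk)                        # assumed to be <= size
        scratch[:n] = chunk
        total += sum(scratch[i] for i in range(n))
    return total


def process_with_reuse(allocator, chunks, size):
    """Allocate one scratch buffer and reuse it for the whole run."""
    scratch = allocator.allocate(size)        # single buffer
    total = 0
    for chunk in chunks:
        n = len(chunk)                        # assumed to be <= size
        scratch[:n] = chunk
        total += sum(scratch[i] for i in range(n))
    return total
```

Both functions produce the same result, but the first asks the allocator for one buffer per chunk while the second asks once. With a hundred thousand chunks that is the kind of difference an allocation-tracking instrument would make visible.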

And also, if you're tracking down memory fragmentation, this is a very, very hard problem to track and if you use all these instruments together, you can actually get a handle on it and it's really amazing. So I just wanted to mention that as well. There's a little bit more in ObjectAlloc before we go to a demo of it.

It's not just for objects, it's sort of a misnomer there, but it tracks all of the malloc calls, all of the free calls and if you want, all the retain/release/autorelease events too. This allows you to really get that detailed information. You can get statistics on what type of objects were getting created.

Was it just those three really, really big buffer allocations that are using up your memory and you forgot to get rid of them or was it the hundred thousand NSStrings that you accidentally left around? You can find out that information as well as view these as call trees to find out what point in your code was responsible for those allocations. Also, you can get pointer histories by using all these events and giving it a pointer, you can find out okay, what was the life cycle of my objects. That's what ObjectAlloc is all about, it's a memory life cycle tool.
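The two views described here, per-category statistics and per-pointer histories, both fall out of a plain log of allocation events. The sketch below assumes a made-up event format and invented addresses and category names; it is not ObjectAlloc's actual data model.

```python
from collections import defaultdict

# (event, address, category) tuples in the order they happened.
# Addresses and categories are invented for illustration.
events = [
    ("malloc", 0x1000, "RSRecord"),
    ("malloc", 0x2000, "CFNumber"),
    ("free",   0x2000, "CFNumber"),
    ("malloc", 0x2000, "CFNumber"),    # address reused by a later allocation
    ("free",   0x2000, "CFNumber"),
]


def summarize(events):
    """Return {category: (total allocated, still living)}."""
    total = defaultdict(int)
    living = defaultdict(int)
    for kind, _address, category in events:
        if kind == "malloc":
            total[category] += 1
            living[category] += 1
        elif kind == "free":
            living[category] -= 1
    return {category: (total[category], living[category]) for category in total}


def history(events, address):
    """Every event that touched one address, in order."""
    return [(kind, category) for kind, addr, category in events if addr == address]


print(summarize(events))
# {'RSRecord': (1, 1), 'CFNumber': (2, 0)}
```

A category with a large total but few living instances is a sign of short-lived churn, while a living count that only grows suggests objects that are never released.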

And finally, it does work with garbage collected apps. The only difference between a garbage collected app and a normal application when viewing it with ObjectAlloc, is that you don't actually contain, or you don't actually control when the frees occur. So under a garbage collected app, the free is due to the garbage collector cleaning up after you and any other blocks that you allocate manually, you'll definitely want to watch as well.

One last note on leaks before we go ahead and demo is that it does a static memory analysis of your target application which is really, really accurate except that it's not for garbage collected environments. This is something I just wanted to mention since garbage collected environments and Objective-C 2.0 world are a big thing.

What it does is it looks for all the malloc'd regions in your target application and searches through them and tries to find all the blocks that are referenced and anything it can't find as referenced, it returns as leaked and that's how you can really track down those memory problems. It was a command line tool only and now we are both including the command line tool and the leaks instrument and it works alongside ObjectAlloc to give you those really detailed histories for your leaked blocks.
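The search for referenced blocks can be pictured as a reachability walk. This is a simplified sketch under assumptions: the real tool scans raw process memory, whereas here the block contents are given as explicit lists of addresses, and `find_leaks`, `blocks`, and `roots` are hypothetical names.

```python
def find_leaks(blocks, roots):
    """Return allocated addresses that nothing reachable refers to.

    blocks maps an allocated address to the list of addresses stored inside
    that block. roots are addresses found outside the heap, for example in
    globals and on thread stacks.
    """
    reachable = set()
    pending = [address for address in roots if address in blocks]
    while pending:
        address = pending.pop()
        if address in reachable:
            continue
        reachable.add(address)
        # Follow every pointer inside this block that lands on another block.
        pending.extend(a for a in blocks[address] if a in blocks)
    return sorted(set(blocks) - reachable)


# Block 1 is referenced from a root and points at block 2.
# Blocks 3 and 4 only point at each other, so nothing live can reach them.
blocks = {1: [2], 2: [], 3: [4], 4: [3]}
print(find_leaks(blocks, roots=[1]))
# [3, 4]
```

Each leaked address found this way could then be looked up in an allocation event log to recover where the block was created.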

So anyway, we move on to a demo. Steve showed us the DNA sorting demo and we didn't quite reach our goal. Our goal was to be faster than the alphanumeric, case-insensitive string compare there and we just didn't get there. So what I'm going to do is I'm going to show you some of the memory analysis tools in Xray and we're going to look at just the memory usage with really no concern of performance and at the end we'll go back and say, well how much did this really help and we'll take a look at that.

So I've got the same application that he has and I'm going to go up to my run menu and instead of using his really cool sort test, I'm going to use object allocations plus sampler. It's one of my favorite templates and it lets me set my own settings and save them for next time I want to use this template on this problem or other problems.

So I select the template and Xray starts up and in the background you can see that ObjectAlloc is graphing the number of bytes allocated in my target application and so if I go ahead and press the UID sort button, whoa, look at that spike. That's probably bad. So we need to look into that and I think I've got enough data, but one thing to point out here is that while Steve was using 100,000 records, I'm only using 10,000 records.

ObjectAlloc really does record a lot of data and it does have a little bit of an overhead because you're recording every single event and so while I could have done 100,000 records, it would have taken longer and it doesn't give me the information I need because 10,000 records is just sufficient.

So I'll quit my application and, looking at the ObjectAlloc instrument, I can see that the detail view down here is showing me the summary of all the objects I created and the categories and so I can take a look here at the, let's see here, RSRecords and you can see that 10,000 were allocated over the course of the application's lifetime and 10,000 were still alive at the end of the application when I quit it.

The only thing that's unexpected are these CFNumbers here. Look at this, only 432 were still alive at the end of the application's run and I created over 279,000 or 278,000 of these while doing the sort. Hmm, that's probably a problem. The other thing that's interesting to note is that by using sampler and the graph as well, you can see that it really did take a lot of processing power when the memory spiked. Interesting.
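The two numbers being compared in the summary, how many objects of a category were created and how many were still alive at the end, can be derived from a stream of allocation events. The sketch below is a hypothetical illustration in Python, not ObjectAlloc's data model: the `(kind, category, address)` event shape and the `summarize` name are assumptions.

```python
from collections import Counter


def summarize(events):
    """Return {category: (created, still_living)} for a list of events.

    events: iterable of (kind, category, address) where kind is 'malloc'
    or 'free'; the category of a 'free' event is looked up by address.
    """
    created = Counter()
    living = Counter()
    category_of = {}  # live address -> category it was allocated as
    for kind, category, address in events:
        if kind == "malloc":
            created[category] += 1
            living[category] += 1
            category_of[address] = category
        elif kind == "free" and address in category_of:
            living[category_of.pop(address)] -= 1
    return {c: (created[c], living[c]) for c in created}


events = [
    ("malloc", "RSRecord", 0x10),
    ("malloc", "CFNumber", 0x20),
    ("free", None, 0x20),
    ("malloc", "CFNumber", 0x20),  # the same address reused by a new object
    ("free", None, 0x20),
    ("malloc", "CFNumber", 0x30),
]
print(summarize(events))  # {'RSRecord': (1, 1), 'CFNumber': (3, 1)}
```

A category whose created count is far above its living count, like the CFNumbers in the demo, points at short-lived temporaries rather than a leak.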

So I think it's probably the CFNumbers that are at fault. I'm going to select the second view, the call tree view here to find out what point in my code is responsible for these allocations. And so I could just go in from here, but I'll focus on CFNumbers since I'm not really interested in anything else at the time.

So I click the arrow and here we go. I'll bring in the extended detail view so we can see a stack view, oops, and there we go and since I like a customized heaviest stack trace view here, I'm going to go ahead and pull down on this gear and remove the source location and remove the library name and I'll go ahead and invert the stack so it shows as I would more expect.

So anyway, we see here that 4.26 megabytes worth of CFNumbers were allocated in the program's execution. So if we follow this down on the right, we see a lot of these BSD qsort frames and it looks like it's sort of branching out through there because it's a recursive function. Well we've got data mining so I'm going to go to the left and select flatten recursion and boom, there we go. So I'll select a frame here and watch that 4.24 megabytes comes all the way through to our code in RSRecord UID. Hmm, that's interesting.
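One way to picture what flattening recursion does to a call tree is to collapse runs of the same frame before adding up bytes per stack. This Python sketch is an assumption about the general idea, not Xray's algorithm; the frame names and both function names are hypothetical, and it only handles direct recursion where the repeated frames are adjacent.

```python
from collections import Counter


def flatten_recursion(stack):
    """Collapse consecutive repeats of the same frame into a single frame."""
    flat = []
    for frame in stack:
        if not flat or flat[-1] != frame:
            flat.append(frame)
    return tuple(flat)


def bytes_by_stack(samples, flatten=False):
    """Sum allocated bytes per distinct stack (outermost frame first)."""
    totals = Counter()
    for stack, nbytes in samples:
        key = flatten_recursion(stack) if flatten else tuple(stack)
        totals[key] += nbytes
    return totals


samples = [
    (["main", "sortByUID", "qsort", "compare", "UID"], 16),
    (["main", "sortByUID", "qsort", "qsort", "compare", "UID"], 16),
    (["main", "sortByUID", "qsort", "qsort", "qsort", "compare", "UID"], 16),
]
print(len(bytes_by_stack(samples)))                # 3
print(len(bytes_by_stack(samples, flatten=True)))  # 1
```

Without flattening, each recursion depth is its own branch with a small share of the bytes; with flattening, the branches merge and the full total lands on one path down to the allocating frame.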

If I click right one frame below that, I can see that they all come from NSNumber numberWithUnsignedLongLong and it looks like we can see the problem. It looks like what we're doing is creating an NSNumber every single time our UID method gets called, which as Steve showed earlier, was not a good time to calculate the UID and probably not a good time to create an NSNumber either. So what I'm going to do now is look at the lifetime of one event or one object and find out why the spike occurred instead of sort of a flat up and down.

So I'm going to go to the third view, the list view and I'm going to go up to the graph and drag my inspection head and if you can see below, the events are going by and matching the time of my inspection head. And so here is a bunch of CFNumbers and I'll just select one of their addresses and boom, right in the middle, malloc, autorelease, CFRelease and free, there's the allocation life cycle of this object.

So it was malloc'd in RSRecord UID, autoreleased in RSRecord UID. Hmm. Interesting. So what it looks like we're doing is using the NSNumber numberWithUnsignedLongLong method. We're creating an autoreleased object every single time and it waits until the end of the event loop to release all those objects. Which is why we see the very, very visible spike. Let's go directly back to the code and fix this if we can.
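The reason deferred release shows up as a spike rather than a flat line can be shown with a small simulation. This is a hypothetical Python model, not the Cocoa autorelease implementation: `peak_live_objects` and its parameters are invented names, and a "drain" simply frees every temporary created since the previous drain.

```python
def peak_live_objects(comparisons, temporaries_per_comparison, drain_every):
    """Peak number of live temporaries when releases wait for a pool drain.

    drain_every: comparisons between drains; None means the pool drains
    only once, after all the comparisons have finished.
    """
    live = 0
    peak = 0
    for i in range(1, comparisons + 1):
        live += temporaries_per_comparison  # objects created, release deferred
        peak = max(peak, live)
        if drain_every is not None and i % drain_every == 0:
            live = 0  # pool drained: every deferred release happens now
    return peak


print(peak_live_objects(1000, 2, drain_every=None))  # 2000
print(peak_live_objects(1000, 2, drain_every=1))     # 2
```

When nothing drains until the sort finishes, every temporary is alive at once, which matches the shape of the spike in the graph.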

So I'll double click, pop right back to the code that Steve saw earlier and I'll replace this method with a method that caches the NSNumber and of course, I'll need to go back to the header file, add an instance variable for that and I should probably go up to the dealloc method and add a release just so I'm a good citizen. So I'll go ahead and add that, build my entire, build my application and go back to my run menu and hit go.
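The shape of this fix, creating the boxed UID once per record instead of once per call, can be sketched outside Objective-C. The Python below is an illustrative stand-in, not the demo's source: `BoxedNumber`, `UncachedRecord`, `CachedRecord` and `boxes_created_by_sort` are hypothetical names, and the counter plays the role ObjectAlloc played in the demo.

```python
import random
from functools import cmp_to_key


class BoxedNumber:
    """Stand-in for a heap-allocated number object; counts instances created."""
    created = 0

    def __init__(self, value):
        BoxedNumber.created += 1
        self.value = value


class UncachedRecord:
    def __init__(self, raw_id):
        self.raw_id = raw_id

    def uid(self):
        # Allocates a fresh box on every call.
        return BoxedNumber(self.raw_id)


class CachedRecord:
    def __init__(self, raw_id):
        self.raw_id = raw_id
        self._uid = None  # cached box, created on first use

    def uid(self):
        if self._uid is None:
            self._uid = BoxedNumber(self.raw_id)
        return self._uid


def boxes_created_by_sort(record_type, count):
    """Sort `count` shuffled records by uid and report how many boxes were made."""
    BoxedNumber.created = 0
    ids = list(range(count))
    random.Random(0).shuffle(ids)  # fixed seed keeps the run repeatable
    records = [record_type(i) for i in ids]

    def compare(a, b):
        x, y = a.uid().value, b.uid().value  # two uid calls per comparison
        return (x > y) - (x < y)

    records.sort(key=cmp_to_key(compare))
    return BoxedNumber.created


print(boxes_created_by_sort(UncachedRecord, 10000))
print(boxes_created_by_sort(CachedRecord, 10000))  # 10000
```

With caching, the number of boxes equals the number of records no matter how many comparisons the sort makes; without it, the count grows with the number of comparisons.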

All right, so it starts up the application again just like last time, but to have more of a side by side comparison, I'm going to go ahead and select the show all runs option right now. So if I hit the I and hit show all runs, you can see that it's comparing them side by side and showing the memory usage of my application. So if I hit UID sort now, wow, it's done and no memory spike. Excellent. That's exactly what we wanted to fix.

So all right, just to verify that our objects were created in a more reasonable fashion, I'll go back to the summary view and remove the extended detail view so you can see this, but there are only 10,964 CFNumbers created during the run. Excellent. All right and just for the last comparison, I'm going to go back to the run menu from Xcode and just run the application separately so that we can see what the performance impact of doing a memory analysis on this application was. So I'll set it back to 100,000 records, string sort it, 0.7 seconds. UID sort it, 0.48. There we go. We got our win.

So back to slides.

( applause )

So what did we just see here? Well we started our application from Xcode by using the run with performance tool and selected my custom template that I can use on multiple different documents, or multiple different applications. From there we took it to ObjectAlloc inside Xray and looked for the summary view to hunt for the overall problem. To find out where our memory usage was going wrong.

Then we went to the call tree view to find out the culprit, to find out what part of my code was responsible for this and then because we were interested in why the spike occurred and the life cycle of one of these objects, we went to the object list and really got the details and got the exact lifetime of our object.

Finally we went back to our code, found the offending line and made a fix that cached the UID so that only one of them would be created per object, not 27, and then we went back to Xray and verified that the fix did in fact solve our problem.

So what did we learn? Well memory analysis is critical to performance. Just by simply caching that one instance variable, we gained a 4x improvement in performance. That's great. So Xray has really powerful tools to help you look at the memory usage of your application, ObjectAlloc, leaks, memory usage, DTrace and more. You can create your own.

The side by side instrumentation allows you to use these all at the same time and really correlate the data and you also get the common Xray benefits of time scoping, of persistence, document persistence, saving it, emailing it to one of your friends and more. So with that I'll return it to Steve to give us some closing thoughts.

( applause )

Hey Daniel, good job. Thank you.

So that was a great talk on memory analysis in Xray. So in closing, just to summarize, Xray is a great way to visualize and mine your performance data. It has a lot of different instruments in it. You can use it to correlate disparate types, a new unique power that you haven't had in other tools before. You can use it to leverage DTrace by building custom instruments and you can use it to automate your work flow.

Now speaking of DTrace, there is a talk tomorrow dedicated entirely to DTrace and I urge you to go to that and then on Friday there is a talk about Xray and DTrace combined. This is sort of our advanced session and we're going to talk about a lot of cool things in there, including other ways to invoke Xray on your targets that are novel and interesting and helpful. DTrace, building custom instruments in DTrace, in Xray and doing remote tracing.

Building instruments in Xray, exporting the script as a D script, running it, generating output and importing that data back into Xray. So I urge you to go to that session as well. It's really going to be exciting. And finally, you can seek out Matt Formica, our 64-bit Dev Tools Evangelist for more information or you can go to the developer website at Apple and you can go to Sun's website for their DTrace manual and there's also a lot of good information on the web about DTrace.