Mac • 57:10
Instruments is a versatile and powerful analysis tool for visualizing, understanding, and optimizing your Mac or iPhone application. Discover how Instruments has evolved to analyze Grand Central Dispatch, profile launchd, perform fine-grained sampling, and offer other new data collection views for tuning your application.
Speaker: Lynne Salameh
Transcript
This transcript has potential transcription errors. We are working on an improved version.
Good afternoon everyone. My name is Lynne Salameh and I'd like to welcome you to What's New in Instruments.
[ Applause ]
All right, here's today's agenda. Now there's a lot of ground to cover and a lot of new features that we've introduced in Instruments in Snow Leopard, but really our goal was to give you the best user experience you've had with Instruments to date. We want to make it very easy for you to find the data that you're looking for, data mine it, and display it in Instruments.
We've introduced a whole set of new instrumentation in Snow Leopard and we've improved the overall performance of Instruments. And without further ado, let's go ahead and jump right in and begin with a demo of Instruments in Snow Leopard. My demo application is an application called ImageWare, which I just brought up in Xcode. This application takes PDF pages and renders them into thumbnails on screen. It also goes through each PDF page's metadata, tries to determine the number of images that each PDF page contains, and displays that as the subtitle of each thumbnail.
So I'm going to go ahead, build it and run it and see how it behaves. So if I just click start, my application is going to start reading in PDF thumbnails and suddenly it slows down and I've got this big spinning beach ball. So this is obviously an area that we'd like to investigate in terms of, in order to improve the performance, and now it's done.
So in Snow Leopard, if I bring up Instruments over here, we've introduced a new instrument called Time Profiler, which allows you to do very strict statistical sampling of your target applications and interferes minimally with their execution, so all the timing data that you get back is very precise.
Here in my new target chooser in Instruments, I'm going to select the CPU category from the left and double click on Time Profiler to bring up the Time Profiler instrument and let me just enlarge this. And over here from the executables I'm going to launch ImageWare and hit record.
All right, if I bring up my app, hit start and you can see over here in Time Profiler we saw a spike in CPU usage as we began rendering these PDF thumbnails onto the screen and my application is actually hung right now. When it completes the rendering onto the screen, the CPU usage drops right back down. So we've collected all the information we want.
Let's stop recording and take a look at this data that we've just collected and try to see where we can find areas where we can optimize. All right so the data that I'm seeing over here in my Call Tree View is frames from stack traces that I've collected in my Time Profiler sample. Right now it's displayed in inverted order which is a setting that I can set over here on my left corner, so invert Call Tree, which means that my deepest stack frame is displayed first and I can view my call history in reverse.
What I'm going to do is I'm going to bring up the extended detail view to look at what each of these stack frames, which back traces each of these stack frames came from and to do that I'm going to go up here to the right corner and bring in my extended detail view.
All right, so if you look at these, and I'm not sure if you can read them, let me just zoom in a little more. So if you look at these frames, a couple of them seem to be occurring in libraries that I don't control. For example this frame, PDF lexer scan, is getting called from the CoreGraphics library, and inflate_fast and adler32 are getting called from libz.1.dylib. And if we take a look at the extended detail view for one of these frames, and zoom in a little bit, you can see that these frames are getting called from somewhere in our code over here in ImageWare, calling into libraries and frameworks that are system frameworks that I don't control.
Well, in Snow Leopard we've introduced this new concept of data mining that will allow you to manipulate your call trees in order to let you focus in on areas in your code that you can optimize. So to show you how that works I'm going to control click on this symbol, PDF lexer scan and it brings up a bunch of options that I can perform on my Call Tree.
I'm going to charge the symbol to its caller, meaning that I'm going to take the time that I spent in the symbol, currently around 4 seconds, and charge it to its caller. As we can see, now CGPDFScanner appears to be the heaviest or the symbol that we're spending the most time in.
Well, I'm going to go ahead and charge all of CoreGraphics, the entire library, to its callers. So I'm going to hit Charge Library to Callers, and with these two simple steps I can now see that I'm spending 36% of my time, let me zoom in again so you guys can see better, 36% of my time, about 5 seconds, in this method, ImageItem's imagesInPage, which is in my code, in ImageWare itself.
So what's going on here? From the extended detail view I can actually double click on a frame and it will take me directly to my source code within Instruments. I just went to imagesInPage over here, and Time Profiler provides performance annotations about where I'm spending the most time. So within imagesInPage I'm spending about 99 percent of my time in CGPDFScannerScan.
So what does this function do really? This function is reading through the PDF page metadata and creating the scanner which goes through and determines the number of images in this page. Now who's calling this function? Well, if I hit the info button over here, scroll down, I can see that image subtitle is actually calling this and if I double click on that it will take me directly to that function call.
[ Applause ]
So what's happening over here? Every single time we're trying to display the image subtitles we're going through creating a new PDF scanner and trying to determine the number of images in that page. Now this seems rather wasteful so let me just go back to Xcode over here, and I can do that by clicking this little Xcode icon in the top, well the lower right and here we go.
Xcode brings up my code and, let me just double click on the Xcode icon, one second, right. So instead of recalculating the number of images in my page every single time I'm trying to find the subtitle, why don't I go ahead and cache this value, so that once I've found the subtitle the first time I can just look up that number later on and I don't have to do all this work every single time.
So to do that, I'm going to add a new instance variable to my image items, and this instance variable I'm going to aptly call numImages. Save that, go to my ImageItem, and when I initialize my image items I'm going to initialize the value of numImages to NSNotFound.
Over here, all right, and down where I'm trying to calculate imagesInPage, I'm going to check whether the number of images is NSNotFound first, meaning it hasn't been calculated yet. And if it hasn't been calculated, I will go ahead and find the number of images in the page and then return it.
Later on, when I'm trying to look up my subtitle, I can read the cached number of images in the page and I don't have to recalculate it every time. So let me save this, build, and go back to Instruments to verify that this change has actually achieved the performance win that I'm looking for.
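The caching change described above is a classic memoization pattern. Here's a minimal Python stand-in for the Objective-C code (class and method names are illustrative, not the actual ImageWare source), using a sentinel value the way the demo uses NSNotFound:

```python
NOT_FOUND = -1  # stands in for NSNotFound in the demo

class ImageItem:
    def __init__(self, page_metadata):
        self.page_metadata = page_metadata
        self.num_images = NOT_FOUND  # the new instance variable from the demo
        self.scan_count = 0          # tracks how often the expensive scan ran

    def _scan_page_for_images(self):
        # Stand-in for the expensive CGPDFScanner pass over page metadata.
        self.scan_count += 1
        return len(self.page_metadata)

    def images_in_page(self):
        if self.num_images == NOT_FOUND:   # compute only the first time
            self.num_images = self._scan_page_for_images()
        return self.num_images

    def subtitle(self):
        # Every subtitle lookup after the first reads the cached value.
        return "%d images" % self.images_in_page()
```

No matter how often the subtitle is requested, the scan runs once per item.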
So let me just go back to my Call Tree view. [inaudible background talking] I didn't build it? Oh, sorry, never mind, here we go. Delete the old method. That is key. [laughter] Save it, build it, all right. Go back to Instruments [laughter] well thanks for that and launch it again.
All right, so bring up my app, hit start, and now I don't see the hanging anymore; it's actually much faster. [applause] So I can go back to Instruments and stop recording. Now, performance optimization is really an iterative process: we just eliminated the lowest-hanging fruit, but really, you should continue to iterate through your performance analysis and find more areas where you can optimize. In this case we're going to go right ahead and, in the configuration view on the left, hide our system libraries, which is really going to charge all our system libraries to the calls that we make in our own code.
And after we do that, you can see that now I'm only spending 355 milliseconds in ImageItem's imagesInPage, but my heaviest hot spot is now occurring in ImageWareController's createImageFromPDFPage; I'm spending about 3 seconds or so in that code, which is the code that goes through and renders the PDF pages as thumbnails. Well, we're going to come back to this later. So to recap what we saw in the demo: what is Time Profiler? It's the next-generation statistical time sampler that we've incorporated into Instruments.
You can use it in Instruments, or you can use it standalone whenever your Instruments icon is in the Dock, regardless of whether Instruments is running or not. All you need to do is click on the Instruments icon and profile either one process or all the processes on your system. There are also two other ways in which you can launch Time Profiler.
You can set it up so that it automatically detects spins on your system, or you can use a set of hot keys that you can trigger whenever a certain application is exhibiting slowness or hanging, in order to begin profiling it. Finally, Time Profiler saves its documents in a special file format that is compact, and it doesn't symbolicate up front: stack traces aren't symbolicated until the document is opened.
So unopened documents have a short shelf life and if you change your binaries the actual symbol addresses are not going to be valid anymore, so keep that in mind. All right, so why would you use Time Profiler? Well, Time Profiler has very low impact in terms of CPU and memory usage which means that it interferes very minimally in your target application and the timing data that you get back is very precise.
It performs stricter, more precise time sampling from within the kernel and this is new in Snow Leopard. You can use it to profile all thread states or only thread states that are running on your CPU. And finally you can use it to profile all your system processes that are running. So how does Time Profiler work? We kind of glossed over that in the demo but let's go into that in more detail. Time Profiler collects stack traces from your running threads at regular time intervals.
Here we have a bunch of example stack traces that are very similar to the ones we saw in the demo but they're slightly simplified. And Instruments is going to take these stack traces and aggregate them into a more readable form that will allow you to pinpoint the areas that you need to optimize.
So how does it do that? So if I assume every single one of these stack traces took 1 millisecond to execute, I'm going to build something called an inverted Call Tree which, as we saw in the demo, displays the deepest stacked frame first as the roots of the trees and then it builds it backwards so that I can view my call history in reverse.
So for example, let's start off here, you can see that I spent 2 milliseconds in the symbol inflate, 4 milliseconds in lexer but 2 of these milliseconds went ahead and called symbol inflate and so forth with CGPDFScanner, imagesInPage and then finally main, which was the entry into my code, the entry point.
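The aggregation described above can be sketched in a few lines. This hedged Python example (illustrative only, not Instruments' actual implementation) builds an inverted call tree by crediting each sample's cost to every prefix of the reversed stack, so the deepest frame becomes a root:

```python
from collections import defaultdict

def inverted_call_tree(stacks, ms_per_sample=1):
    """Aggregate sampled stacks (listed caller-first) into an inverted
    call tree: every prefix of the reversed stack (deepest frame first)
    accumulates that sample's cost."""
    weights = defaultdict(int)  # path (deepest frame first) -> milliseconds
    for stack in stacks:
        rev = tuple(reversed(stack))
        for depth in range(1, len(rev) + 1):
            weights[rev[:depth]] += ms_per_sample
    return dict(weights)

def symbol_total(weights, symbol):
    # Total time in a symbol = sum over all inverted paths ending at it,
    # i.e. over all of its call sites.
    return sum(ms for path, ms in weights.items() if path[-1] == symbol)
```

With the talk's example samples (two stacks bottoming out in inflate, two in lexer), lexer totals 4 ms, of which 2 ms went on to call inflate, just as described.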
All right, so how does Instruments display these Call Trees? If we focus on the middle Call Tree over here, Instruments actually displays it in the form of an outline view as you can see on the right. Now Call Trees can get very complicated and a lot of times you see stack frames from libraries and frameworks that are not your code but what you really are looking for is areas in your code that you want to optimize and you want to be able to simplify your call trees. So in Instruments and Snow Leopard we've introduced the very powerful concept of data mining that will allow you to do just that.
It will allow you to prune your call trees, simplify them so that you can find areas of your code that you can change and optimize. Now in Snow Leopard we have two types of data mining operations. We have Library and Symbol Operations and let's just step through them really fast and, you know, show you how they work. So the first of these that we saw was charging a symbol to its caller.
What this means is that you take a symbol and attribute the time that is spent in that symbol to the frames that called it. Like for example if you take lexer over here and attribute it to its callers we can then go ahead and aggregate those two stack frames and you can see that now I see 105 milliseconds spent in CGPDFScanner.
You can also charge an entire library to its callers. For example, if I charge CoreGraphics to its callers, I take the time spent in CoreGraphics and attribute it to its calling frames. And once again I can take these two stack frames, on the far right and the far left, and aggregate them. A lot of times you'd like to discard the amount of time spent in a certain symbol from the rest of your Call Tree. You want to say: I want to ignore this symbol and not look at what time it contributes to the rest of my Call Tree.
And to do that, you prune the symbol and its subtree. So for example, if I decide to prune createImage, I can discard its entire weight in my Call Tree, and you can see that I'm now spending 76 milliseconds in inflate whereas before I was spending 98.
And finally, a lot of times you see your code call into a lot of stack frames and system libraries and then call into some other code. And what you really care about is where that library called into the rest of the code. So what you can do is flatten your library to its boundaries.
And in this case if we flatten CoreGraphics to its boundaries we're just left with the boundary frames that call in to the rest of the code. So there you have it, data mining in a nutshell, simplifies the way you view your Call Trees and helps you find areas in your code that you can optimize.
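As a rough model, all four data mining operations can be expressed as transformations over the raw sampled stacks before aggregation. The following Python sketch is illustrative only (not how Instruments implements them); `library_of` is an assumed helper mapping a frame name to its library:

```python
def charge_symbol_to_callers(stacks, symbol):
    # Drop the symbol's frames; its samples now count toward whoever called it.
    return [[f for f in s if f != symbol] for s in stacks]

def charge_library_to_callers(stacks, library, library_of):
    # Same idea for a whole library at once.
    return [[f for f in s if library_of(f) != library] for s in stacks]

def prune_symbol(stacks, symbol):
    # Discard the symbol's entire weight: drop every sample that went through it.
    return [s for s in stacks if symbol not in s]

def flatten_library(stacks, library, library_of):
    # Keep only boundary frames of each run of library frames: frames
    # adjacent to code outside the library survive, interior frames don't.
    flattened = []
    for s in stacks:
        kept = []
        for i, f in enumerate(s):
            if library_of(f) != library:
                kept.append(f)
            else:
                prev_outside = i == 0 or library_of(s[i - 1]) != library
                next_outside = i == len(s) - 1 or library_of(s[i + 1]) != library
                if prev_outside or next_outside:
                    kept.append(f)
        flattened.append(kept)
    return flattened
```

After either charge operation, identical stacks aggregate together, which is exactly how the demo's two steps surfaced imagesInPage as the hot spot.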
Now moving on, [applause] in Leopard you've seen Instruments display their data in one of three typical views that are shown here and that you can access from the lower left corner of your Trace Documents. And for example, in Time Profiler it really had two views that it used, it used the sample list view and then the Call Tree view which we see over here.
Now in Snow Leopard we've introduced two new views that will help you improve your workflow, and these views are available for all Instruments. The first of these is the console view, which will display standard out and standard error right within Instruments. You can correlate it in time with the data that you're getting in your track and in your other views, and Instruments will save this console output per run, per document, so you can access it later.
[applause] And as we've seen in the demo, we've also introduced the source view, which allows you to view your source code directly in Instruments, annotated with the performance data that Instruments collect. For example, in Time Profiler we saw that we were spending 99% of our time in CGPDFScannerScan, and this greatly improves your workflow in Snow Leopard.
So let's go back to our ImageWare application. At the end of our previous demo, we saw that we were spending most of our time in createImageFromPDFPage, which was really calling a bunch of system API to render the thumbnails onto the screen. Now, in order to improve the performance of rendering the thumbnails, we're going to follow a different approach this time around: we're going to make use of the fact that we have about 16 logical cores on this machine, leverage the concurrency of our hardware, and leverage the API that we've introduced in Snow Leopard to parallelize the process of rendering these PDFs as thumbnails. So in this part of the demo, we've changed the way we render the thumbnails: we've created a number of dispatch queues that will be performing the rendering operation.
So let's see how that runs and see whether, you know, we can improve that. I'm just going to select to run this from Dispatch, using Dispatch, and hit start. All right, so it seems to me that, you know, I'm not seeing the speed up that I'm expecting on such a powerful device.
And really I must be misusing the API in some way, so in Snow Leopard we've introduced a new instrument called the Dispatch Instrument that will allow you to analyze the behavior and the performance of your target applications that are running Dispatch. So back in Instruments I just brought up a new template chooser and I've already selected the CPU category over here on the left.
I'm going to go ahead and select the Dispatch Instrument. All right, I'm going to launch my executable and just go through the same process of rendering my PDF thumbnails onto the screen. Just remember to select Dispatch, and here we go. All right, so as it's collecting the data, back in Instruments I'm going to stop my recording because I've collected all the data that I needed, and down in the left corner I'm going to switch to my queues view.
Let me zoom in a little bit, actually let me just resize these for a second and zoom in. There we go. So the queues view shows you all the queues that your target application is using. Some of these queues, as you can see over here, like com.apple.root.low-priority, are the global concurrent queues that libdispatch provides you, and you have another global queue which is the main queue, the queue serviced by the main thread. And down here you can see a whole bunch of queues called imageBrowser queues, and these are the queues that I had created in ImageWare to process my PDF thumbnails.
Now there's a bunch of statistics that the queues view shows you as well. For example, it shows you the total number of blocks processed by each of these queues, and you can see that on average the imageBrowser queues processed about 32 to 34 blocks. It also tells me about the latency and the total CPU time that I'm spending on each queue. There are two things to note over here. One is the latency, which is really the average amount of time between when a block is enqueued, propagates across the queue, and is then invoked. It's actually quite large.
It's about 3 seconds in general, and that's one thing to note; the fact that my queues have this high latency might be the reason I'm not seeing the speed-up in my target application. Another thing to note is that if we look at the main queue over here, it seems that I'm enqueuing about 526 blocks synchronously on the main thread. So let's take a look and see what sort of blocks I'm enqueuing there and what's going on. So I'm going to zoom out, hit the focus button for the queue, and focus on the main queue.
It's going to take me to the blocks focus view, which shows me all the blocks that I've explicitly enqueued on my main queue; it sorts them by when they were executed and tells me how long they took executing on the CPU. Right. Let me just select one of these blocks, because it seems that I'm enqueuing a lot of the same block on the main queue, and bring up the extended detail view by clicking the icon in the lower left corner.
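For a concrete picture of that latency statistic, here is a toy Python queue (purely illustrative, nothing like libdispatch's real machinery) that records enqueue-to-invoke time per block, with an injectable clock so the numbers are deterministic:

```python
class InstrumentedQueue:
    """Toy serial queue recording what the Dispatch Instrument calls
    latency: the time from when a block is enqueued to when it's invoked."""

    def __init__(self, label, clock):
        self.label = label
        self.clock = clock        # injected time source, e.g. time.monotonic
        self.pending = []         # (block, time it was enqueued)
        self.latencies = []

    def enqueue(self, block):
        self.pending.append((block, self.clock()))

    def drain(self):
        # Invoke blocks in FIFO order, recording each one's latency.
        while self.pending:
            block, enqueued_at = self.pending.pop(0)
            self.latencies.append(self.clock() - enqueued_at)
            block()

    def average_latency(self):
        return sum(self.latencies) / len(self.latencies)
```

A high average here means blocks sit in the queue long before running, which is exactly the symptom the queues view surfaced for the imageBrowser queues.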
For each of the blocks that I have enqueued on the main queue, which as you can see I'm enqueuing synchronously, the extended detail view is going to show me stack traces for when the block was enqueued and for when it was invoked. So you can see that I'm enqueuing my block in ImageWareController's importFilesInGCD, and executing my block somewhere from within the libdispatch machinery. So let me see what sort of block I'm actually enqueuing and executing.
I'm going to double click the frame in my invoke stack trace, and it will take me directly into my code, where I can see the blocks that I've created and enqueued. So the first block I create over here is really the block that goes ahead and calls createImageFromPDFPage, which renders the thumbnails.
And later on this block is synchronously dispatching something onto the main queue and if we take a look at what it's doing it's actually, let me just go back, I didn't mean to click that, all right, so it's actually refreshing the UI and asking the runloop to redraw the screen to refresh the UI in that case. So really I don't need to do this operation synchronously because I don't really need to wait on the main thread to complete its UI rendering.
So very easily, back in Xcode, if I bring up ImageWareController and go down to the code which is refreshing the UI synchronously, it's just a one-word change: I'm going to change that to dispatch_async, because I'm telling the main queue to update the UI in its own time; I don't have to wait on it to complete. So I'm going to build, save all, and back in Instruments let me run the Dispatch Instrument again and verify that the change I've made actually had an effect.
So I'm going to hit record, bring up ImageWare, select Dispatch and hit start. I don't know about you, but I didn't see that much of an improvement by changing the call from dispatch_sync to dispatch_async. So let's investigate this further and see what Instruments tells us about this.
Let me go back to Instruments and stop recording. All right, well the first thing to note, if we zoom in, is that for my imageBrowser queues the latency has gone down dramatically. Whereas before I was spending about 3 to 4 seconds between when a block is enqueued and executed, now it's down to around 800 milliseconds. You can also see that for my main queue I'm no longer dispatching synchronously onto the main queue.
But really, what's going on here? If I zoom out, let me select two of my imageBrowser queues to plot in my track view, deselect some of the global queues so that we can see what's going on, and zoom in, pull this up a little bit.
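The one-word change matters because of how the two enqueue calls differ. This small Python model of a serial queue (an analogy, not libdispatch itself) shows that dispatch_sync stalls the caller until the block has run on the target queue, while dispatch_async returns immediately:

```python
import queue
import threading

class SerialQueue:
    """Toy serial queue: one worker thread draining a FIFO of blocks."""

    def __init__(self):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            block, done = self._work.get()
            if block is None:          # shutdown sentinel
                break
            block()
            if done is not None:       # a sync submitter is waiting
                done.set()

    def dispatch_async(self, block):
        self._work.put((block, None))  # fire and forget; caller continues

    def dispatch_sync(self, block):
        done = threading.Event()
        self._work.put((block, done))
        done.wait()                    # caller stalls here until block ran

    def shutdown(self):
        self._work.put((None, None))
        self._thread.join()
```

In the demo, every thumbnail's worker block was stalling on a dispatch_sync to the main queue; switching to dispatch_async lets the renderers keep going while the UI catches up on its own time.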
OK. So my image browser queues are the ones over here at the bottom and what I see in my track view is the number of blocks processed on those queues. So my imageBrowser queues are completing their rendering the thumbnails all the way over here. And my main queue is actually continuing to, you know, process the blocks that are asking for the UI updates, even long after my PDF rendering has completed.
So this seems like, you know, it's an area to optimize. Another thing to point out is I'm spending about, you know, 1 second over here on my main queue executing the blocks that are performing the UI update. So let's step back for a second and take a look at this information from a different perspective. If you go down to the bottom left corner and select the blocks view, the dispatch instrument will show you all the blocks that have been executed in a queue context.
Now in this case I only care about blocks that I've created explicitly, so I'm going to go down into the right corner and filter by ImageWare. Oops, all right, and the block that I care about is the block that has been dispatched on the main thread, which is importFilesInGCD, block 2.
Let me just also select that to plot it on the screen. So you can see that I've invoked 526 instances of block 2, the block that I'm executing on the main thread, and on average each of these blocks took about 1 millisecond to execute. And if I look at the extended detail view, it shows me the invoke stack trace for this block. Let me go ahead and double click on it and see why I'm spending so much time in this block.
Well, what is it doing over here? It's asking the image to refresh, the view to refresh itself and then it's asking runloop to explicitly redraw the view. So as an improvement here it seems that we can reduce the amount of UI update that we're doing. And what we could do is if we go back to Xcode, instead of asking the view to redraw itself after every thumbnail that we've rendered, let's instead delete these two and bring in this code.
So we're going to ask the run loop to refresh the ImageView at the end of its pass, and we're going to cancel any pending previous requests to refresh the ImageView. Save, rebuild and run, and see whether we've improved the performance of our application. So let's select Dispatch, hit start, and it's done.
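The fix above coalesces redundant refresh requests into one redraw per run loop pass. A minimal Python sketch of that idea (mirroring, but not reproducing, the Cocoa cancelPreviousPerformRequests pattern; the class name is invented):

```python
class CoalescingRefresher:
    """Collapse many refresh requests between run loop passes into one redraw."""

    def __init__(self, redraw):
        self.redraw = redraw    # the actual (expensive) UI update
        self.pending = False

    def request_refresh(self):
        # Called once per rendered thumbnail; cheap, just marks a flag.
        # Repeated calls before the next run loop pass coalesce into one.
        self.pending = True

    def run_loop_pass(self):
        # End of the run loop: perform at most one redraw.
        if self.pending:
            self.pending = False
            self.redraw()
```

With this scheme, the 526 per-thumbnail update requests from the demo cost one flag write each, and the screen is redrawn only once per pass through the run loop.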
So there you have it. [applause] What we've seen is part of the multi-core template that we've added to Instruments to help you understand the performance and behavioral aspects of your applications that run on libdispatch and are concurrent. As part of the multi-core template we've introduced, new in Snow Leopard, the Threads Instrument, which we haven't seen in the demo but which will show you the thread states of all the threads that have been created in your target application. It will show you state transitions between when a thread was running, waiting, suspended, idle, etcetera. It will also tell you what type of threads are running in your target application, whether they're BSD threads, Dispatch threads or your main thread.
And it will tell you things about the parent and child relationships between these threads, whether each thread is living or terminated, the number of context switches each thread performs, and finally the total amount of CPU time a thread consumes. Because really the question you're trying to answer is: how many threads do you want to create in your target application to achieve concurrency without creating a large overhead? So let's suppose that this is the number of runnable threads that our target application is using, plotted against time, and let's suppose that the machine I'm running on has four logical cores.
Now the first area over here where I have two threads running on my 4-core machine I'm underutilizing my machine. I have two idle cores that could be doing more work but aren't doing any work so really I can be doing better. In the case of four runnable threads on a 4-core machine, that's really the optimal that I'm trying to achieve.
And then finally if I have 6 threads running on a machine with four logical cores I'm actually overcommitted. At any one time two of these threads are going to be waiting while the other threads are executing on the cores. So usually there's a tendency to try to create the same number of threads as you have units of work. But the rule of thumb here is try to make the number of threads closer to the number of cores on your machine rather than the units of work.
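The rule of thumb above, size for cores rather than units of work, can be captured in one small helper. This Python sketch is illustrative only; libdispatch makes this decision for you at runtime:

```python
import os

def worker_count(units_of_work, logical_cores=None):
    """Pick a worker count from the hardware, not the number of work items."""
    cores = logical_cores or os.cpu_count() or 1
    # Never spin up more workers than there are cores (overcommitted) or
    # more than there are work items (idle workers).
    return max(1, min(cores, units_of_work))
```

So with 526 thumbnails on a 4-core machine you'd still want roughly 4 workers, not 526 threads, which is exactly the overcommit scenario the slide warns about.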
And using libdispatch actually makes that very simple for you: you don't have to care about the hardware that you're running on. All right, so the second of the Instruments in the multi-core template is the Dispatch Instrument, and this allows you to analyze the blocks and queues in your target applications.
We saw that there were two ways of representing this data, from a queues perspective and from a blocks perspective. If we start off with the queues perspective, over here you can see that the Dispatch Instrument detects when blocks are enqueued asynchronously onto a queue and when they're enqueued synchronously, and it also detects when they're invoked.
So here we enqueue a block onto queueA and then we go ahead and enqueue a bunch of blocks onto queueB. And you can see that my statistics are updated: I can see that I enqueued one block synchronously onto queueB, and the latency has changed as well. And you can also see this from a blocks view.
In this case, for queueB, I can see that I've invoked one block of foo and four blocks of bar, but there's this extra block, baz, and I don't really know where it came from. If you take a closer look, block baz is actually nested within block bar, and block bar has been enqueued directly onto queueB. Because block baz is invoked from within block bar, we actually see it as executing on queueB, and that's why it appears in your blocks view as well.
So to recap, you can optimize the performance of your target applications running with Dispatch by looking at the blocks with the longest duration and the blocks that are executed the most, and you can also understand the behavior of applications running on libdispatch. All right, let's switch gears and talk about some memory analysis that you can do in Snow Leopard. I'm going to bring up Daniel Delwood to tell you how you can optimize memory and performance in Snow Leopard.
[ Applause ]
Daniel: So howdy. I'm Daniel Delwood, performance tools engineer, and I'm here to change your perspective a little bit, to talk about memory. We're going to start out by talking about retain/release, which should be very familiar to you as Cocoa developers. The retain/release rules are very simple to understand, but they're hard to follow perfectly. When you over-retain something it leads to leaks, and we've provided a Leaks template for you in Instruments since Leopard; but over-releases are another type of retain/release problem, and these have historically been a lot harder to track down.
Well now there's an instrument for that. [laughter] So over-released objects: what are they, and how do they lead to a crash? Well, let's say I have my object and two other objects, A and B. When the last release gets sent to my object, it's then freed, and any subsequent messages sent to it cause a crash, because that memory's freed, it's no longer an object, and the runtime doesn't know what to do with the message.
So what the Zombies template does is change the behavior of that last release, turning the object into a Zombie instead of freeing it. The object just stays around for the lifetime of your process, right up to the end, and that way any future errant messages that get sent to it just cause the Zombie object to emit a message that we can then record in Instruments.
So the Zombies template then is for detecting those messages and then also for presenting the retain/release and autorelease histories of those Zombie objects. Now to show you what I mean why don't we go ahead and see a demo of this and show you Sketch. This is probably everybody's favorite drawing program, I'm sure. So I can draw some shapes, you know, perhaps add a color to one of these but what I want to do is add a feature to Sketch.
And while I like resizing the canvas, I wish my graphics inside Sketch would resize with them. Of course during the real life resize, if I change the bounds on them constantly, they get a little bit performance intensive and so what I've done is I've implemented this so that every time I do a live resize it creates a renderer for the view and that renderer just cashes in NSImage and stretches the image and then does the balance calculation when I let go.
And so, as you can see on that last one -- oh, I've got a crash. So I go back to GDB and take a look, and it looks like I'm crashing in objc_msgSend. So this is probably an over-release, and let me go ahead and try finding it in Instruments.
So I go to the Run menu, Run with Performance Tool, Zombies, and Instruments starts up with Sketch in the background. Now, you'll notice this is ObjectAlloc, which is our tool for recording all of the malloc and free events and all the objects created over your process's lifetime, and if I hit the inspector button there are some additional options here, including the reference count recording that we have turned on, and our Zombie detection, which changes the behavior of that last release. So I'll go back to my application, exercise my feature, and try to get it to crash.
And there it goes, it crashed right on cue. Well, Instruments puts up a flag with a message telling me that a Zombie was messaged, and if I zoom in here you'll notice it says an Objective-C message was sent to a deallocated object at this address, and at what point in time that happened.
And if I click the focus button, it transforms my detail view into the retain/release history for the object that was messaged last. Well, it turns out we've got an NSBitmapImageRep, and it was malloced at a certain point and then autoreleased and retained and released. There's quite a history here; in fact, I can bring in the extended detail view to see each of the stack traces. I'll just scroll all the way down: there were 51 events, and the last one was the Zombie event, after it was supposedly deallocated. OK, so our job is to track down what the problem is.
How does Instruments help you with that? Well, there's a Responsible Library column, which makes an intelligent guess at which library was responsible for a given reference counting event. So here's our first release that it attributes to Sketch, and you'll notice it's called directly from our code, the SKTLiveResizeRenderer snapshot-view method.
Well, let's go there and take a look and see if that's our problem. Go back to our code, and here's our snapshot-view code. Well, what do we know? We know it's an NSBitmapImageRep, and here's our image rep getting created. We've got an alloc, init, and if we scroll over, yes, there's an autorelease there, so our net retain count is zero and this will go away once the autorelease pool pops.
So go down to the only other place it's used: we draw into the image and call addRepresentation:, which will itself retain the image rep if it wants to keep ownership, and then I release the image rep. Wait a second. I already had a net retain count of zero, so this release is actually at fault; this shouldn't be happening. So I can go ahead and delete that and build, and while on many projects it would require a little more investigation, that first hint was correct in this one, because when we start up, draw a couple of shapes, and start resizing them, they resize fine.
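Putting the pieces together, the bug looks roughly like this. This is a reconstruction with illustrative names, assuming an NSImage ivar called image; it is not Sketch's actual source:

```objc
// Hypothetical sketch of the over-release in the renderer.
- (void)snapshotView:(NSView *)view
{
    NSBitmapImageRep *rep =
        [[[NSBitmapImageRep alloc] initWithFocusedViewRect:[view bounds]]
            autorelease];                 // net retain count: 0

    [image addRepresentation:rep];        // the NSImage retains the rep
                                          // itself if it needs ownership

    [rep release];                        // BUG: over-release. The rep now
                                          // dies before the autorelease pool
                                          // pops, and the pool's later
                                          // release messages a freed object.
}
```

Deleting the extra `[rep release]` restores the balance: one alloc, one autorelease, net zero from this method's point of view.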
There we go, tracked it down. [applause] So that was tracking down some retain/release problems. Introduced in Leopard was the technology of Garbage Collection. It's a very powerful technology, and one of the best parts about it is that it makes retain/release unnecessary. The lifecycles of your objects are managed by references, and these references mean that as long as you are using an object, or it's referenced from somewhere in your code, it won't get collected by the Garbage Collector, and you still don't have to worry about retain/release. It's not a magic bullet, though, and unnecessary references in your code can lead to unnecessary memory usage.
Now, how is that? Well, we have an instrument called the Object Graph instrument, and it's there to help you manage those references and understand the behavior of your application. First of all, you have to understand roots. Roots are objects in your application that the Garbage Collector will not collect because they're considered global or top level.
Well, there are three causes of a root. First, a global can point to an object, and that makes it a root. Second, it can be referenced from the stack, by the current function in use by your code at the time. And third, CFRetain and CFRelease are still respected under the Garbage Collection mechanism, though you can opt CF types into collection if you wish. So the goal of Object Graph is to help you understand what the roots are and, more importantly, what the paths to your objects are from those roots.
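As a sketch, the three root causes look like this in code; the names here are illustrative, not from Sketch:

```objc
#import <Foundation/Foundation.h>

static id gModel;                         // 1. a global pointing at an object
                                          //    makes that object a root

- (void)doWork:(CFTypeRef)cfObject
{
    NSMutableArray *scratch = [NSMutableArray array];
    [scratch addObject:@"temp"];          // 2. while this frame is live,
                                          //    'scratch' is a stack root

    CFRetain(cfObject);                   // 3. CFRetain is still honored
    // ... use cfObject ...               //    under GC; the object stays
    CFRelease(cfObject);                  //    rooted until the balancing
                                          //    CFRelease
}
```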
So, for example, the NSApplication root has a delegate, which is the SKTAppDelegate in this case, and that in turn has references to window controllers for the panels used in the background. So I'm going to go ahead and show you a demo of using the Object Graph instrument and how you can use it to understand the behavior of your app while running under Garbage Collection. Same project; I'm just going to switch it to my GC configuration, build, and start it directly from Instruments. Let me close the other document.
So, Run, Run with Performance Tool, and I'll use the GC Monitor template. Now, this template has three instruments, but for the purpose of this demo I'm just going to focus on the first, the Object Graph instrument. What it does is periodically take snapshots of my target application and all of the heap objects in it. I'm going to turn that automatic snapshotting off for now, because I'm going to be talking more than doing analysis.
And in the detail view here, you'll notice it took a snapshot of all of the objects, and it's showing me all of the objects in my target application, filtered by the block filters on the left here. Currently I'm looking at just the user-defined types, which is what I usually care about most.
In this case you'll notice I have an NSKVONotifying_SKTDocument, my GraphicView, and even my app delegate here. This view is for helping you understand the strong references that Garbage Collection tracks, so if I turn down this SKTGraphicView, you'll notice that its next responder is an NSClipView and its window is this window, and I can even turn down that window and follow its graph further, to its delegate or its border view.
Back at the top level, you'll also notice some other statistics, such as the referenced size; for instance, this GraphicView is referencing a total of about 33 kilobytes. So I'll go over to my Sketch window here and exercise my feature, and now that I've done the live resize I'll close the document. I expect that my document would be gone, my GraphicView would be gone, and, well, most other objects would be gone.
So I take a snapshot of my process, and it updates this view. If we look through here, we'll notice that the document isn't there; it's gone. But the GraphicView is still around in my application, holding on to that 30K. Why didn't Garbage Collection reclaim the GraphicView? Well, to understand that, we have to look at the roots of the GraphicView.
So I'm going to click this GraphicView's focus button and take a look at its roots. This GraphicView only has one root, and it's an NSMapTable. If I bring in the extended detail view, it will tell me why it's a root. In this case it's one of those three causes: it's a global. The global is called SKT_cachedRendererForView, and it even gives me the address.
But more important to me is the path from this root to my GraphicView, that is, why it's being kept alive, and this is really the shortest path. So my map table has keys, and these keys are GraphicViews, in this case my GraphicView. And if I go back to my code, I can look for that global, and it's defined right here: the static NSMapTable SKT_cachedRendererForView.
So, what is this map table doing? If we take a look at the creation point, it's creating a map table with standard key options and standard value options. The usage is that views get a renderer through a renderer-for-view method, so renderers are created lazily and then added to the table as a view-to-renderer key/value pair, and the contract is that a view-destroyed method will be called when the view goes away, so we can remove that key/value pair from the table, releasing the memory.
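The caching scheme just described can be sketched like this. It's a hypothetical reconstruction, not Sketch's actual source; the class and method names are illustrative. With strong key and value options, every entry roots both the view and its renderer:

```objc
#import <Foundation/Foundation.h>

// Hypothetical reconstruction of the renderer cache.
static NSMapTable *SKT_cachedRendererForView = nil;

+ (void)initialize
{
    // Strong keys AND strong values: every cached view becomes rooted
    // by this global table under Garbage Collection.
    SKT_cachedRendererForView =
        [[NSMapTable alloc]
            initWithKeyOptions:NSPointerFunctionsStrongMemory
                  valueOptions:NSPointerFunctionsStrongMemory
                      capacity:0];
}

+ (SKTLiveResizeRenderer *)rendererForView:(NSView *)view
{
    // Renderers are created lazily, then cached per view.
    SKTLiveResizeRenderer *r = [SKT_cachedRendererForView objectForKey:view];
    if (r == nil) {
        r = [[[SKTLiveResizeRenderer alloc] initWithView:view] autorelease];
        [SKT_cachedRendererForView setObject:r forKey:view];
    }
    return r;
}

+ (void)viewDestroyed:(NSView *)view
{
    // The contract: called when the view goes away. But if this is only
    // invoked from -dealloc, it never runs under Garbage Collection.
    [SKT_cachedRendererForView removeObjectForKey:view];
}
```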
However, this doesn't look like it's being called, so if I search for it in the project, I can see that it should have been called in one place, and I go there and, whoa, it's in a dealloc. Well, deallocs aren't called under Garbage Collection, so this is the source of the difference in behavior between the two modes. There are a couple of ways we can fix this. The simplest would be to write a finalize method and call it there, but finalizers really are a last resort.
There's usually a better way to do it, and in this case there is. I'll go back to the creation of that table, and the problem was that the keys were strongly referencing my views. I can change this map table, though, so that the key options only weakly reference them. So I'm going to change the personality of my map table, and here we go.
The key options are now NSPointerFunctionsZeroingWeakMemory. So I'll save that, build it, and run it right back in Instruments to see if my view goes away as I expect. Draw a couple of shapes, create the renderer, do the resize, close the document, take a snapshot, and go back to the top level. And, nope, we didn't quite fix it. The document again is gone, but the SKTGraphicView is still here. That's because we looked at the shortest path to a root, but we didn't quite get everything.
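For reference, the key-option change at the start of this step can be sketched like so. Again a reconstruction, and as just noted, it fixed the key-side references but not the value-side ones:

```objc
// Weak, zeroing keys: when a view is collected, the entry's key is
// nilled out by the collector and the map table can drop the pair.
// Values (the renderers) are still held strongly by the table.
SKT_cachedRendererForView =
    [[NSMapTable alloc]
        initWithKeyOptions:NSPointerFunctionsZeroingWeakMemory
              valueOptions:NSPointerFunctionsStrongMemory
                  capacity:0];
```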
So if I again focus on this GraphicView and look at the path to it, well, it's again the same map table, but I'll notice that it's no longer being referenced by the keys; it's being referenced by the values. The map table again maps the views to their corresponding renderers, but the renderers each have a delegate view, which is again that view. Well, this is another one of the behavior changes between Garbage Collected code and retain/release.
References like this delegate view are strong by default under Garbage Collection. So I go back to my code and take a look at the interface, and you'll notice here that I declared an NSView delegate-view ivar, and my comment is "not retained, in typical delegate fashion for Cocoa." Well, all I need to do is add the keyword __weak, save the document, build it, and test it again in Instruments.
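The interface change amounts to one keyword. Here's a sketch of what the declaration might look like; the class and ivar names are illustrative, not Sketch's actual header:

```objc
#import <Cocoa/Cocoa.h>

@interface SKTLiveResizeRenderer : NSObject {
    NSImage *cachedImage;

    // Under GC, instance-variable references are strong by default.
    // __weak restores the traditional "not retained" delegate behavior,
    // and the collector zeroes it when the view is collected.
    __weak NSView *delegateView;
}
@end
```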
Let me go ahead and stop this last trace, and we'll just run it through here, again exercise my feature, and see if the memory behavior is what I expect it to be. So I take a snapshot, go back up to the top level, and now the only two Sketch objects are the tool palette controller and the app delegate. So I've correctly gotten rid of the roots to that view, and the GraphicView is collected by the collector.
[ Applause ]
OK, so it was a little bit complex there, but our approach was to first look at what was in our application's heap, and then ask ourselves, why did the SKTGraphicView never die? Well, it was because of that one root, the map table, and the map table was rooted by a global. It had keys and values; those keys initially had strong references to the graphic views, and the values were the renderers, which themselves held the images they were caching and a strong delegate-view reference back to our graphic view as well.
Well, the first thing we did was go to the map table, and by putting in that zeroing-weak-memory key personality, we changed the keys' strong references to weak ones. Then we went over to the interface of the renderer, and by making that ivar weak as well, we made the delegate-view reference weak, which meant that when the last external strong reference to that GraphicView went away, our GraphicView could be collected. And when it was collected, those weak references were nilled out by the collector, and then the map table removed the key/value pair since the key was no longer there.
So the whole goal of all of this was to understand our GC object relationships, and you can really use the Object Graph instrument powerfully for that. Use it to understand the unnecessary dependencies among your objects, and try to eliminate the over-rootedness in your application that leads to memory growth over time where you would have expected it to remain steady or even go back down. So with that, I'd like to invite Lynne back up.
[ Applause ]
All right, thank you, Daniel. So in Snow Leopard we've introduced a couple of other features. You can now launch agent and daemon targets, and you do that by selecting Launch Executable and choosing your executable, which brings up the target chooser; in the last column, under the launchd category, you'll see the options for launching agent and daemon targets.
For example, I'm going to go ahead and launch the Dock agent. After I select that, Instruments defers to launchd to launch the Dock agent, and when you hit the Record button, instead of beginning the recording process you'll actually see a panel asking you to take the appropriate action to start that agent or daemon.
So in the case of the Dock, you just move your mouse all the way to the bottom of the screen to launch the Dock agent. All right. ObjectAlloc collects a lot of data, and it can incur a certain overhead. In Snow Leopard we've added ObjectAlloc filtering, so you can choose to exclude certain types of objects from the data that ObjectAlloc collects.
You can set this up by clicking the info button on the ObjectAlloc instrument and choosing, for example, to ignore types with the prefix NS, with the prefix CF, et cetera, so that you can minimize the amount of data ObjectAlloc collects and just focus on the objects you care about that are not created by the system. For Leaks, you can also set it up so that it throws away lifecycle-complete objects, meaning you only track objects that are malloced and still in your memory, not objects that were malloced, then released, and have gone away. All right. In Snow Leopard we've also introduced flags, which are really bookmarks that let you pinpoint areas in your track, in your execution time, where interesting things happened. We've seen these flags in action in the Zombies instrument, and there are also user flags that you can add to your documents; in this case, you can add a little comment saying, for example, "this is the point where I clicked Safari in my Dock," and bookmark that.
In Snow Leopard we have faster symbolication of stack traces, and we have symbols for lazily loaded libraries, so your stack traces will be much faster and much more complete. So Instruments is a very powerful analysis tool. We've seen a whole set of new instrumentation introduced in Snow Leopard that lets you leverage and understand Snow Leopard technologies, as well as instrumentation that deals with memory analysis, such as Garbage Collection and Zombies.
In Snow Leopard we've made it very easy for you to pinpoint and locate the data you care about, and finally, we've improved the overall performance of Instruments. So whether you've used Instruments before in Leopard or this is the first time you've seen it, I'd like you to go home and give it a spin. Try out all our new features in Snow Leopard, and if you have any questions, please feel free to email Michael Jurewitz, our Developer Tools Evangelist. There's also Instruments documentation and Concurrency Programming documentation available in the Xcode documentation on your Mac.