Video hosted by Apple at devstreaming-cdn.apple.com


WWDC11 • Session 310

What's New in Instruments

Developer Tools • iOS, OS X • 47:37

Instruments is the one stop shop for all your performance needs on Mac OS X and iOS. Discover the latest advancements in Instruments, including improvements in System Trace, Time Profiler, and memory analysis. Learn how Instruments can help your app perform even better.

Speakers: Daniel Delwood, Steve Lewallen

Unlisted on Apple Developer site

Downloads from Apple

SD Video (141.5 MB)

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

My name is Steve Lewallen. I'm the Engineering Manager for Performance Tools. So what do we have for you today? Well, first up, we're going to cover new workflow improvements we've made to Instruments to make your life using the tool much more productive. Secondly, we're going to cover a new concept called strategies.

Third, we're going to look at new profiling API that we've added to allow you to programmatically profile your own apps. Then we're going to cover System Trace, what's new, and do a deep dive on how to use it and why. And then we will cover our network analysis instrumentation that we've added for iOS. And finally, we'll conclude with our Arc support in Instruments. So let's get started.

First of all, we've added a new set of options in a sheet on Instruments to allow you to change, for example, the start time between the time you hit the recording button and when the recording actually starts. This allows you to do any manual setup you might have to do. Then we've added the ability to set a time limit on the recording, say a maximum recording time, at which point the recording will automatically stop.

And finally, we've added the ability to set deferred mode on the sheet for the trace document. Now, as I hope you know by now, we have two major modes of taking traces in Instruments. One is immediate mode. Now, immediate mode is when we collect the data and we show it to you on screen right away.

So you're profiling your app, and you do something and see a spike in the Allocations instrument or Time Profiler; you can associate that, make a mental note: "Oh, that's about the explosion that just happened in my game. I'm going to go back and look at that." But we also have deferred mode. Deferred mode is all about reducing that observer effect. Instruments is running on the same system your app is. In deferred mode, you lose the immediate display, but we take Instruments off the CPU to a very great extent, thus giving over more resources to your app.

Now, if you want to run in deferred mode all the time, you can do that too. We have a new preference in the Preferences pane. Just check that, and every new trace document and every new recording you take will be in deferred mode. We also have a new way to trigger recording in Instruments. Let's say, regardless of whether you have Instruments in the foreground or background, you want to trigger a trace. All you have to do is hit Control-Escape, and you can modify that key combo with this preference.

The frontmost document in Instruments will start recording or stop recording, depending on what it was doing before; it'll just toggle that setting. I like to do this, for example, when I'm doing a system trace. I like this because I have Instruments in the background, and bringing it to the foreground, hitting the record button, and then taking it back does a lot of different things to the system. It keeps the window server busy, et cetera. So with Control-Escape, with Instruments already in the background, we don't affect the system.

We've also added the ability to change how the cores on your Mac behave. So first of all, we have the ability to turn hardware multi-threading on and off. We also have the ability to reduce the number of active cores on your system. This is a way to get a rough approximation, but an approximation nevertheless, of how your app would perform on a lesser system with fewer cores, for example.

Now let's talk about track gestures. So when we introduced System Trace and a few other instruments, we also introduced much higher resolution in time in the track view. Now we show data right down to the nanosecond level. And because of that, that drove us to create new gestures, which we think are so important that we remind you of what they are in the bottom right-hand corner of every trace document.

So this allows you to directly manipulate time in the track view. For example, if you want to zoom in, you just hold the Shift key down and do a mouse drag; you'll get immediate feedback about the scale, that is, time per pixel, and the duration of time you selected. Release, and you've zoomed in precisely where you selected. If you've zoomed in, you'll probably want to zoom out, so you can use Control and a mouse drag to do that.

Simple. We've also added a really cool new feature called Snap Track to Fit. Let's say that you have made a recording that extends beyond the visual range of the window, or maybe it doesn't fill up much of it, and you just want to get that trace fitting perfectly in your visible window. Well, now you can do that with Control-Command-Z, and the track snaps to fit.

Great little feature, I do it all the time, and in deferred mode, we do it for you automatically at the end of the trace. And finally, of course, we've always had the ability to filter in time, and now you can use Option and Mouse Drag to do that time filtering.

So those are track gestures. Now let's move on to call tree data mining. So we've always had a powerful set of options to do data mining in the call tree view, but now we've added several more around focus. For example, you can focus on a subtree of the call tree, just to eliminate noise.

But even neater, you can focus on all the calls, for example, made by the selected symbol. Or the inverse: who calls the symbol? Or even further: who calls this library or this framework? So that's very handy when you want to see how the framework or library you're providing is doing.

So we've always had the ability to filter in detail views. This is a great feature. It brings all the information you're filtering down to a concise list. But you lose one important thing: the context around what you're filtering to. And that can be equally important. So now we have find. It works just as you'd expect. Hit Command-F, and this little find bar appears above your detail view.

You can search forward and backward in outline views and table views. And we'll even search inside of unexpanded outline view nodes and associated back traces. That's very handy. Now, incredibly minor but also very useful is discontinuous selection in all the detail views. And now you can select things and do shallow or deep copies.

And then paste it somewhere, and we will copy the column headers, the contents of all the columns, and the indentation level of the outline view. This is really helpful if, for example, you want to share some performance information with a colleague, file a bug report, or jot down a note to yourself. So that is copying.

Now, we've had for a while this great ability to see performance data right in line with your own source. This is awesome because it can say this line of code is taking up 27% of your time, or you allocated this much memory at this point. But often, especially if you're looking at disassembly, which we can also show, you can have a huge number of hits. We're telling you about a lot of stuff.

And it's been tedious and painful to navigate up and down the source view or the disassembly view to find what's the hottest frame. Now, what's the second hottest? Where was that hottest frame again? So now, if you have the extended detail view open and you're looking at source or disassembly, we will list all the performance annotations, descending or ascending, it's your choice.

You can sort them. And then you can use your arrow keys or your mouse to navigate amongst all of these. The source views will scroll up and down and will highlight the selection that you're looking at. This makes it really easy to navigate the source and all those performance annotations.

So those are the workflow improvements we've added to Instruments. Now let's talk about a brand new concept that we've added to the application called strategies. So what's a strategy? A strategy is a way to take the data that Instruments has gathered and rotate it onto a different axis. Now, we've actually all been using strategies for as long as we've been using Instruments, because we're now formally calling the first default view you see the Instruments strategy. So let's take a look at how this works.

Of course, the most common thing in instruments is time, and that's along the x-axis. And in the instrument strategy, we have instruments along the y-axis. But we collect data on other very important, common types of things. For example, cores. So now we have a CPU strategy where we list the CPUs on the y-axis.

Now, any instrument that knows anything about a particular core has a place in the track view to annotate it with its information about that core. And of course, we also have a thread strategy now. So we have threads listed on the y-axis, all the threads all the instruments have encountered, and it gives an opportunity for any instrument that knows anything about a particular thread to annotate that space in the track view.

So, what are the elements of a strategy that you'll use? Well, first of all, if we're going to give you multiple strategies, we have to give you a way to choose a strategy. So we have a strategy chooser just below the record button in the upper left. To the right of that, we have a series of highlight controls. This allows you to highlight particular data you might be interested in and sort of gray out other data. For example, show me everything about a particular process, a particular thread or set of threads.

Of course, the track view is a major part of any strategy. That's where most of the action happens. And then we have this little legend button over on the right hand side. So various strategies will use color to mean different things. So if you want to find out what a color means, just click on the legend button and you'll get a description.

So now let's look at the CPU strategy in depth. Why would I want to use a CPU strategy? Well, I might want to know how frequently, for how long, and on how many CPUs my code is running. Now in doing this, the CPU strategy can give you a quick visual indication, for example, of whether you've achieved concurrency and to what degree in your app. It can also show you how you're sharing the system and the CPU resources with other processes.

So this is a screenshot of the CPU strategy in action. We zoomed in on all the cores so you could see them nicely here. We took a simple single-threaded app, we threw it on multiple threads, and we added a big lock to guard all the data. This actually is not an uncommon approach; we sometimes see it when apps are first made multi-threaded.

The problem is in what we see: our samples are in blue, and blue in this case indicates user stack frames. We never see the same sample, or blue, on multiple CPUs at the same time. So you visually scan vertically up and down all the cores, and you want to see blue aligned across cores, but we don't. Instead, we see these troubling gaps. Now, this is a time profile across the entire system.

And so absolutely nothing is happening in these gaps. The CPUs aren't overburdened; this app simply hasn't achieved concurrency. What we want to see is ideal concurrency: blue samples on all cores all the time. That would be the perfect world. Now, our processes run on a system that shares CPUs with other processes. So even if you've created the most elegant concurrency model ever devised, what you really see is often something more like this. The gray samples are samples from other processes on a CPU.

And then you can see where the CPUs have shifted over and are processing your code, and those are your samples, interleaved. So fret not: in this case we do have concurrency, but we're sharing the system with other processes. So that is the CPU strategy. And I'd like to invite Daniel Delwood on stage to give us a demo of using Time Profiler and the CPU strategy.

Daniel? Thank you, Steve. So I'd like to start out on the iPad, just to show you an example app that I'm working on. All right, there it is. Excellent. So this application is showing me loans. Perhaps I'm a loan holder, and I want to see what the effects of the interest rate or the payment schedule are on the amount I pay over 30 years.

And so I can go ahead and drag this slider. As I do so, you'll notice it updates really slowly. That's not a great experience, and we want to improve that. Now, notice also there's a concurrency slider. I probably shouldn't turn that on now; it'll come back later in the demo.

So I'll switch back to the demo. And I want to show some of those workflow improvements that Steve was talking about. So here I am in Xcode, and I've got my project open. So how do I run it with Instruments? Well, we've got a lot better integration with Xcode 4, and that's through the use of Xcode 4 schemes.

Now, schemes are a combination of both what you want to build and then what you want to do, what action. So if I go up to the toolbar and select Edit Scheme, I get a sheet that pops down and shows me what that scheme consists of. So on the left side, we've got a bunch of actions, run, test. What we're interested in is profile.

Something interesting to note here is that the build configuration for Profile is Release by default. Now, this makes a lot of sense, because what you want to profile is the application as close as possible to the version your customers will get. Building with Release will usually compile with full optimization, replicating the final bits you'll be passing to customers.

The other thing you'll notice is that the Instruments template can be selected here. And so by default it'll ask on launch, but if you know you always want to use the same template, or perhaps you want to create a scheme just for using a certain template, you can do that here.

Now, one thing I want to quickly note about that build configuration being Release is that we need debug information in Instruments. And for all the default projects, that's not an issue. But if you have a legacy project, or you know you've changed your debug format, you may want to make sure that you set your Debug Information Format back to DWARF with dSYM File.

All right, I have my scheme. How do I run it? Well, from the menu, just pull down and select Profile, and Xcode will go ahead and build my application. It launches Instruments, which asks me on launch, and I'm going to use the Time Profiler for this demo. So my app starts in the background, and I'm actually just going to move that slider to generate some activity. And you'll notice the track view immediately starts updating with data. All right, that's probably enough.

And I want to tell you sort of what you're seeing here and how to work with this data. So Time Profiler is all about CPU utilization. And every one millisecond by default, it will take a back trace of whatever code's running on the cores of the target machine.

In this case, I've got an iPad 2, and I've got two cores. And if I want to scrub in the track view, it's going to show me, by default, the Instruments strategy with whatever statistic the instrument finds most useful. So my application was using about 96% of my CPU. Excellent.

So how do I optimize? Well, that's what the Detail View is for, giving you a lot more detailed information on what the instrument collected. And this is a call tree just aggregating all of those back traces together. So the top frame, by default, is showing me that I made a lot of message sends. Excellent. That doesn't tell me very much. But if I want to use some of the call tree options on the left here, I can dive in more.

So I'm going to choose Uninvert to see from a sort of a top-down view. And you'll notice I have a worker thread and a main thread. And if I actually just want to start turning these down, you'll see that I'm using Dispatch. OK. Hm. I wonder if I'm getting good concurrency there. Well, there's a bunch of Dispatch frames here.

And I'm not really interested in them. So I can just use one of those new call tree options: control-clicking and selecting Charge libdispatch to Callers. And it no longer appears, and now I can continue with my investigation. So here we go: loan calculate data. That's something I'm interested in. And in fact, I don't want to see the rest of the tree, so I'm just going to go ahead and select the focus arrow.

Okay, so now I'm looking at my code, the code I even suspect is going to be the one taking a lot of time, and it indeed is. And you'll notice that right above this detail view is a jump bar, much like Xcode 4's. And so if you want to get back to where you came from, you can just click back in the jump bar and get right back there.

All right, so what do I see? Well, I see a lot of NSNumber, NSDate, and NSDecimalNumber frames. And I'm interested in that. If I'm going to optimize, maybe I could use fewer wrappers. I'm not sure. But I'm also interested in how I'm using the autorelease pool. Perhaps I'm using that poorly. So I'll hit Command-F, and that find bar that Steve was talking about jumps down. So I'll type in autorelease.

And you'll see I've got a pop autorelease pool. But if I even want to jump into sub-parts of this tree, I can just select auto-expand and start searching for it. So here we are at -[NSObject autorelease]. How many different places in my code am I using this? Or at least, where is it showing up hot in the time profile? Well, I'll control-click on it, and I can select focus on all callers of autorelease.

Excellent. And so you'll see that I have spent about 89 milliseconds in my application just calling autorelease. So perhaps I need to do something different with my object wrappers. Now, that's a way to get a high-level view of sort of what my app's doing. But how do I actually improve this? Well, there's three different things I could do. First of all, I could do less work.

And that's probably the thing I really want to look into. Perhaps I need to calculate the loans differently. Maybe I need to calculate them less. Second of all, I could do my work more efficiently. Maybe I should use a different algorithm. And I can really look into that. But third, and what I want to show you today, is doing that work concurrently. So that's where the strategies come in.

And if I use the strategy control on the top left, I'm going to select CPU Strategy. So the track view reconfigures to show me both cores on my iPad 2, as well as some data that looks different from the previous view. Now, this data is now showing me each individual backtrace at a one millisecond interval. That's hard to see. So I'm going to go ahead and use the shift key and zoom in.

As I zoom in even further, you'll notice that pattern Steve was talking about of ping-ponging between cores. So from core zero to core one and back and forth, I'm doing a lot of work, but I'm not doing it at the same time. So there really is an opportunity for optimization here, and we can see that very quickly with the CPU strategy. All right, so I know I actually have that concurrency slider. What happens when I use it? So I'll go ahead and turn it on and start moving the slider.

That's probably enough. And you'll notice that the graph looks a little bit different, and if I just check the utilization, it's about 160%. But let's go ahead and take a look in the CPU strategy. So I'll zoom in again. And what we'll notice here is that we actually are doing a lot more work concurrently than we were before.

So we have a lot of work on both cores 1 and 2, and then there's some time that we're only spending on a certain core. Now if I'm curious what stacks are these, I can go ahead and click on them, or even double click to bring up the sample list, and it shows me a list of all the samples with the call stack. In this case, I was recalculating my loans. So, OK.

Excellent. When I select one of these stacks, you'll notice it turns yellow. And if you're ever curious about what something in the strategies means, you can always use the legend. And the legend will show you that red in this case means kernel, yellow means selected, and blue means user. All right.

So I've achieved some sort of concurrency, but what was that bit that wasn't concurrent? Well, I can scroll through here a little bit, and I see a lot of stacks that look like they were part of the chart view, which is drawing on the main thread. And if I'd really like to verify this, I can go ahead and go to the pop-up and select the main thread to highlight everything that's on the main thread. And you'll notice that in the detail view, we have desaturated everything that isn't on the main thread for you. Now, this is great.

Because as you zoom out a little bit using the control modifier, you'll notice that lots of my other threads were doing work concurrently, and my main thread is where I should optimize next. So thank you very much. That's a demo of using the CPU strategy, and I'll pass it back to Steve.

Thank you, Daniel. That was a great demo of the CPU strategy. And now let's move on to the new profiling API that we're introducing on Mac OS X for programmatic profiling. This is called the DTPerformanceSession framework. And it allows you to programmatically target yourself, that is, the app actually calling the API, or other processes on the entire system.

We've provided all of our major sets of instrumentation for this framework: Time Profiler, Allocations (including zombie support), System Trace, Activity Monitor, and Leaks. And you even have an API that'll allow you to programmatically post flags to the timeline. So when you're done using this framework, you will have generated a .dtps file, a DTPerformanceSession file. And you open that file in Instruments.

But Instruments needs the dSYMs that you have, along with your binary and that file, to fully make sense of all the data collected. So we ask that you don't rebuild your binaries between the time you take a DTPerformanceSession trace and the time you open it in Instruments. Otherwise, the data isn't going to be nearly as good as it could have been. Now, this framework is located in /Library/Developer/4.0/Instruments/Frameworks. Pointing out the obvious: this will not be on user systems, so don't forget not to link against it in your release build.

Okay, so why would you want to use this framework? Well, one of the things you can do is programmatically profile something only under extreme conditions. Some instance where something has gone awry, you can just kick in profiling like that, and then you'll have that data. Or you might want to indicate when extreme events have happened.

Oh, I lost network connectivity here. And therefore, when you look at the performance data, you can interpret that data with that in mind. Or you might want to write performance regression tests around certain APIs and make sure that from release to release, you're not regressing performance.

So how do you use it? It's really simple. So you create a session and you add Instruments to it. In this case, we're adding the Time Profiler instrument. And then you want to start and stop the Profiler. Now, you could do this just once. You could start at the beginning, you could stop it at the end. That's perfectly fine. But you can also start and stop it repeatedly in certain situations in your app. It's up to you.

While doing so, you can post signal flags. Again, you can note extraordinary events like, oh, network activity is lost or something. Or you can say, hey, here's the beginning of something and here's the end of that same something. And then when it's visualized in Instruments, we'll actually use that data to allow you to quickly and easily time filter, for example, along those boundaries. It just helps you dive into the data you collected. When you're done, you save the session out, note where you saved it, and then you open it in Instruments, and it's as if you took the time profile, for example, right from the app itself.
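The flow described above, create a session, add an instrument, start and stop it, post flags, and save, can be sketched roughly as follows. This is pseudocode: the identifier names are paraphrased from the talk, not verified DTPerformanceSession.framework symbols, so treat every name here as a placeholder.

```
// Pseudocode only -- names are illustrative, not real header symbols.
session = create_session(target: own_pid)
session.add_instrument(TimeProfiler)

session.start()                        // begin collecting
session.post_flag("network lost")      // mark an extraordinary event
session.post_flag_begin("level load")  // flag pairs become time-filter
...                                    //   boundaries in Instruments
session.post_flag_end("level load")
session.stop()                         // can start/stop repeatedly

session.save("MyApp.dtps")             // open this file in Instruments
```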

Now, along with this framework, we're shipping a new command-line tool called iprofiler. It's built entirely on the DTPerformanceSession framework and can do exactly the same things. So if you'd rather work this into your workflow from the command line, in scripts, for example, feel free to do so. You can do either.

So that is the new API for profiling. Now let's move on to System Trace. Now we introduced System Trace last year, but a lot has changed and improved. The biggest news is that it's now available for iOS 5. This is pretty significant to have this type of instrumentation on an embedded OS. And it's particularly timely now that we have these dual core iPad 2s. So when those games, for example, are really pushing the system, they can use System Trace to squeeze out just a little bit more.

So System Trace is all about providing a comprehensive analysis of the entire system. What are the threads doing and why are they doing that? What system calls am I making? How long are they taking? What VM operations are going on that the system has to perform that might affect my performance in my app? So, of course, System Trace has a default instrument strategy. It has three instruments.

We have scheduling. This is all of your thread information. Then we have system calls. And, of course, we have VM operations. But things get very interesting when you switch to the thread strategy view, where we have threads in the vertical. Now you can see a lot more detail about what's going on.

Okay. So first of all, we can show you thread context switches. Now, the arrows are colored to indicate a particular CPU. So how does a context switch work? Well, you have a thread, and eventually it's context switched onto a CPU. The length of its run, that period of time, is called its tenure. And then it will be switched off because the CPU has to move on to service some other thread.

And back and forth you go. So those are context switches. It's quite interesting to follow the CPU around and see what it's servicing at what point in time. You can often see ping-ponging instances where one thread is switching context with another, again and again and again. That's inefficient, so you might want to look at why that's happening.

So we also have virtual memory events. We cover a whole lot of them. And we should stop here and describe what the virtual memory system is and what it's doing. So the system manages memory in segments called pages, and these become real through faults: the system faults a page in. Now, these pages can actually be shared among processes.

Okay, everything's good. But then someone wants to write to a page, a shared page. So what happens in that case? Well, the system has to do some work. It has to make a copy of the shared page and assign it to the writer. This is what we call a copy on write page fault, and it takes a little time. So if you see a lot of these, you think to yourself, am I causing this? Is there something else I could do? Now, in addition, when pages are no longer used, they're returned to the system.

And at some point in time, somebody needs a new one. So we reuse that page. But it has the data from the previous app in it, or whoever wrote into it last, so we don't want to share that. So the system has to take a zero-fill page fault, again spending a little time writing zeros into that page before it hands it back. So page faults can take a little time, minuscule really, but they can build up while the system is servicing the needs of your app, so you should take that into consideration.

So we also show system calls. Now, system calls are calls from user space into the kernel, and they can take some time as well. The kernel is very efficient at its job, but some calls can take longer than others. You might want to ask yourself: do I really want to take this time now? Can I put it off and do it at a different point, a different juncture of execution of my program? So it's useful to see that information.

And finally, System Trace will show you thread states. A thread can be in many different states for many different reasons. It's not always running, for example. So we show these states with colors. Now, a thread can be in a runnable state: it's ready to go, but there's no CPU there to service it. Eventually a CPU comes along and says, "I'm available"; the thread is context switched onto that CPU, runs for a while, and eventually it'll be context switched off.

So this is the life of a thread, and threads are always moving in between these states, and you should be aware of when that happens and why. So now I'd like to invite Daniel back on stage for a demo of using System Trace on the iPad 2. Thank you very much, Steve. All right, so I am going to start on the iPad 2 here.

And I would like to show you an app that is a lot of fun. It is a Mandelbrot fractal drawing app, and it's actually pretty highly optimized. I mean, it's done in software, so we probably should do this on the GPU. But for the purposes of this example, we've optimized it using the Time Profiler and the CPU strategy to make sure it's achieving concurrency and is very quick.

So, I'll calculate it. And you'll notice it's calculating from left to right, doing scan lines, top to bottom. And each thread is just picking up a different scan line. And when it completes, it passes it to the main thread, displays the image, and that's what we see. So, all right, how can I use System Trace to analyze this? I'll go back to the Mac.

And here we are in my project. I'm just going to use that same profile action from Xcode 4. Compiles my application. Launches Instruments. And I'm going to select the System Trace template. Now, when I hit Profile, you notice that Instruments really doesn't do too terribly much. So it's drawing in the background, and I'll tell it to recalculate there.

And when I'm done, I can just go ahead and say stop. Now this is where Instruments starts analyzing, because the System Trace template, by default, works in deferred mode. And so that means that it tries to stay off the CPU as much as it can, do as little work as possible, and keep out of your application's way so that you get the best performance metrics you can.

So what are we seeing here on the first screen? Well, this is a trace highlight showing some high-level ideas of system usage, context switches, and the like. But if we really want more data, we can start clicking on these individual instruments and see that the scheduling instrument shows us time division ratios and how much time the thread spent running versus blocked versus interrupted, that sort of thing. System calls, much as its name suggests, will show us a count and duration, and the same with VM operations. But again, all of this data isn't terribly useful until you really combine it with the thread strategy. So in the top left, I'll select the thread strategy.

And you'll notice that we have a lot of data. Now, how do you use System Trace? Well, System Trace really requires some thought on the part of the developer. You need to come up with a mental model of what's going on in your application and use System Trace to attempt to verify that.

Now, when you notice discrepancies, that's when you have really good opportunities for improvement. So with our fractal drawer, we really expect that we've got as many threads as Dispatch will create, namely two for an iPad 2. And we'd expect that they're both running as fast as they possibly can until the completion of the drawing.

Well, looking at the thread strategy, we'll immediately notice that there's a lot more than two Dispatch worker threads. In fact, there's one, two, three, four, five, at least five Dispatch worker threads just to draw it the first time. Now, why is that? Dispatch really is a great API. Why is it creating more threads than our number of CPUs? Well, that's where we can zoom in here and find out more information about what all these events are.

So you'll notice that there's a bunch of colors, and these are all the thread states of our different dispatch worker threads. And if I show the legend, this is a great reminder for what they are, unknown, running, supervisor, et cetera. Now you'll also notice a lot of events here, and some of these are system calls. So you'll notice by the phone, it's a Mach system call, and it has a stack trace associated with it.

And here is a page cache hit from the VM Instrument. And so all of the Instruments are contributing to this one view. Now, for our application, you'll notice all these copy on write faults. Copy on write. That's why the cow is there. But you'll notice here that we've got our Mandelbrot renderer in the stack trace on pretty much every single one of these.

Okay, so what could be going on? Well, we're probably using memory in a suboptimal way. And we need to go back to the drawing board of our mental model of what's going on in our application. So dispatch is stalling because we're making all these copy on write faults.

And all we're doing is passing that image back from each completed scan line to the main thread. Well, hmm, when we pass the image back to the main thread, it doesn't actually do the copy then. It does the copy whenever we touch the pages, which would be when the next scan line continues on its vertical progression.

Okay, that gives us an idea then. Like, what happens if we did horizontal scan lines? Because the pages go horizontally, and maybe that'll actually reduce our number of copy-on-write faults. So, let's go ahead and try that. You'll notice that the application actually had both a left-to-right and a top-to-bottom mode. And so it'll render left to right to start. I'm just going to switch it to top to bottom.
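To see why the orientation matters, it helps to work through the page arithmetic. The session doesn't give the actual image dimensions, so the numbers below are illustrative assumptions:

```objc
// Assume a 1024-pixel-wide, 32-bit RGBA image and 4 KB VM pages.
// One horizontal scan line is 1024 * 4 = 4096 bytes: exactly one page.
// A horizontal line therefore dirties (and copy-on-write faults) roughly
// one page, while a vertical line crosses one page per row it passes
// through, faulting on nearly every pixel it writes.
size_t bytesPerPixel = 4;
size_t width         = 1024;                              // assumed
size_t pageSize      = 4096;
size_t pagesPerHorizontalLine = (width * bytesPerPixel) / pageSize;  // 1
// pagesPerVerticalLine is about one page per row under these assumptions.
```

With different image widths the ratio changes, but as long as rows are contiguous in memory, horizontal scan lines will always touch far fewer pages per line than vertical ones.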

And you'll have to trust me here, but the millisecond timer at the bottom showed around six seconds for the first render and about 4.4 seconds for the second render. So we actually did get a really good speedup there, and let's go ahead and verify what we thought in System Trace. So going back to the thread strategy.

We'll see again that we've got all of these dispatch worker threads, lots of COWs, and lots of events, lots of different thread states. But once we get to that idle portion, on the right, there are only really two, maybe three threads working very, very hard. So we've really actually improved the situation.

It's closer to our model of what should be happening, and we have two threads running as fast as they possibly can. Now looking at these events, we have a lot of zero fill faults here, and this is something to be expected when working with large pieces of memory, because as Steve described, when the VM system gives you a new page, it has to fill it with zeros. So either way, we've both verified what we thought was going to be happening, as well as gained a really good improvement, over 20% in our application. So thank you very much, that's System Trace. I hope you guys use it and like it.

Alrighty, that was pretty awesome. Okay, so... Let's move on to our network instrumentation for iOS. So we have two new Instruments for you. One is called the Network Connections Instrument. This is all about looking at data volume coming in and out over TCP and UDP. You can use this to debug latency issues, issues with dropped packets, et cetera. Here it is in all its glory. It's tracking a spike in incoming and outgoing traffic. But we also have another interesting Instrument called the Network Activity Instrument.

We put this Instrument in the Energy Diagnostics Template. Now why would we do that? Well, that's because using radios takes power. And when you use the network, whether it's incoming or outgoing, you want to use it as efficiently as possible. Because whenever you dribble in or dribble out data, if the radios, cellular and Wi-Fi, for example, aren't already up, they have to go into a higher power state, then they handle the traffic, and then eventually they power down. They can't do this on a dime.

So this burns up extra power. So if you can collect a lot more networking data coming in or out together, then the radios can power down sooner, not waste as much power, the batteries will last longer, and your customers will be happier, and your customers are our customers, so everybody's happy.
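The advice here can be sketched as a simple coalescing pattern. This is not an API from the session, just an illustration, and the class and the `sendBurst:` method are hypothetical:

```objc
#import <Foundation/Foundation.h>

// Illustration of the batching advice above (hypothetical class).
// Instead of dribbling each small payload out as it's produced,
// accumulate payloads and send them in one burst, so the radios
// power up once and can then power back down sooner.
@interface BatchingUploader : NSObject
@property (strong) NSMutableData *pending;
- (void)enqueue:(NSData *)payload;
- (void)flush;
- (void)sendBurst:(NSData *)burst;   // hypothetical network send
@end

@implementation BatchingUploader
- (void)enqueue:(NSData *)payload {
    [self.pending appendData:payload];  // coalesce; don't touch the radio yet
}
- (void)flush {                         // e.g. fired by a coarse repeating timer
    if ([self.pending length] == 0) return;
    [self sendBurst:self.pending];      // one large send instead of many small ones
    self.pending = [NSMutableData data];
}
@end
```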

So that's the Network Activity Instrument. I encourage you to take a look at it. It's very interesting. Now finally, we'd like to conclude with our ARC instrumentation support. So, we've introduced ARC at the conference this week, and the support that we'll be talking about in Instruments will be in the next seed.

So what is ARC again? It's automatic reference counting. Essentially, this is the compiler taking the burden of retains and releases and autoreleases, all that bookkeeping, off your shoulders and doing it itself, more precisely. This allows you to focus on the relationships between the objects in your app as you think of them for the purposes of your app, not for the purpose of managing memory.

Now, as with anything, it's not a panacea. You can still have leaks in some spots, and you can leak graphs of objects or cycles of objects. But this is where Instruments and the new support we've introduced comes in. Now, Instruments, as I'm sure you're all aware, has an instrument called the Leaks instrument. This will find memory you've allocated for which you no longer have a reference. And for ARC support, we're introducing cycle detection as well. So what's all this about? Well, here you have your app, and you have your object graph laid out, and you have references to everything.

But at some point along the way, a critical reference is released. And perhaps that will cause an entire cycle of objects to be leaked. They're lost. They're out there taking up who knows how much memory away from your app and all the other apps in the system, sitting in your main memory with none of your other references pointing back to them. So they're going to stay there until your app exits.
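A minimal example of such a cycle under ARC might look like the following sketch. The `TreeNode` class here is hypothetical, modeled loosely on the demo that follows, not code from the session:

```objc
#import <Foundation/Foundation.h>

// Hypothetical retain cycle under ARC: parent and child hold strong
// references to each other, so once the last external reference goes
// away, neither object can ever be deallocated.
@interface TreeNode : NSObject
@property (strong) TreeNode *child;
@property (strong) TreeNode *parent;
@end

@implementation TreeNode
@end

void makeCycle(void) {
    TreeNode *root = [[TreeNode alloc] init];
    root.child = [[TreeNode alloc] init];
    root.child.parent = root;   // strong back-reference: the cycle
}   // root is released here, but the cycle keeps both nodes alive: a leak
```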

So when you're using the Leaks instrument, and when you're designing your apps and using ARC, what you need to remember is that you'll probably have ARC objects, automatic reference counting objects, and you might still have some manually reference counted objects. Say you have a legacy, old-school framework that you've been using as part of your app and it hasn't been converted over yet. And among all those objects, both ARC and MRC, there will be references.

And along the way you can leak graphs, complex graphs of objects. Well, Instruments can look at those and detect the fundamental cycles. It'll find the graph itself, all the leaks, and isolate the fundamental cycles. And it will identify for you which ivars point to which objects. So A points to B through a certain ivar, and B points to C through a certain ivar, and so forth. This provides you the information that you need to surgically place, perhaps, a zeroing weak reference to eliminate that cycle.

That cycle goes away. And you proceed down your list of cycles in Instruments to get rid of all the other cycles. Until finally, again, if you were interacting with some MRC code and you have some stragglers, you use your normal Leaks workflow, you identify where you need to put a release in, you do it, and then your app is leak free. So to demonstrate this awesome technology, I'd like to invite Daniel back on stage.

Daniel Delwood: Thank you very much, Steve. So for this demo, we're actually just going to be using a very simple application on Mac OS X. I've written a custom tree node implementation, and the app is just showing whether that implementation actually works right in an outline view, and how the tree node behaves.

And so I can create one. I can replace it. And it's just randomly creating different trees. And hopefully, these will all be released properly when I replace the tree. But as you can see, the outline view is working pretty much right. So I'm running ARC. This is excellent. I didn't have to think about retain and release. But can I still leak? As Steve was talking about, yeah.

So I'll use the profile action and select the Leaks template. And Instruments pops up showing me the two instruments in the template: Allocations, which will get me the malloc, free, retain, release, and autorelease events, but the Leaks instrument as well. And so as I actually create and replace some of these trees, some bigger and some smaller, you notice that Leaks will eventually kick in in the background and notice if I have any leaks. And so there it is, a lot of leaks.

I'll go ahead and stop. So I select the leaks instrument. And by default, the leaks instrument shows us leaks by backtrace. And this is for a world of manual retain counting. You have a bunch of different leaks, and some of them happen at the same place multiple times.

This is because you have probably a missing release or some flaw in your code that's going to do the same thing over and over again. So we try to help you out by aggregating those into the backtrace where they're created. But this really doesn't tell you much about the leak.

Does this array reference a tree node? Does a tree node reference another tree node? That's what the cycle detection's for. So from the jump bar, I'm just going to pull down and select cycles. And here we go. So it's finding the cycles inside of our leaked graph. Now, you notice that I had some bigger trees and some smaller trees.

And so actually, we had a really complex cycle there because one node was holding onto another node. And the idea is that we would like to make this as usable as possible for developers. And we identify the simple cycles within the complex cycle. And so if we twist these open, you'll notice that what's shown on the left is exactly the graph shown on the right.

We have a tree node, which has a children mutable array property. And that mutable array has a reference to its list of objects, which has some malloc'd bytes, which eventually point back to a tree node. And that child node of that original tree node actually has a parent pointer, which isn't surprising. But the important part to notice here is that it's red. And that's because it's a strong reference. Now, you'll notice we have red and blue here. And that's because ARC works with strong references and weak references.

And you're going to be living in a world that has both manual retain counting as well as automatic reference counting. And so those manual retain counting references, we can't tell for sure that they're strong. And so we'll display those to you in blue. They're very likely strong references. But as your project gets converted over to ARC and more and more of your frameworks do, you'll notice that more and more of these become red and the cycle analysis becomes more accurate.

Now, the great part of this is that if you're using manual retain counting, the cycle detector should work for you as well. So, how do we go ahead and fix this? Well, we could break any one of these, but it probably makes the most sense to break this parent reference. So I'm just going to go ahead and double-click on the reference.

And Xcode popped to the front. And you'll notice here that it's taken me to the declaration of my tree node interface. And I've got my children property and, oh, there's the parent. But I didn't declare it weak. Now, if I had a property, I could just declare it weak there, but I'm actually just setting this in the initializer. So, in the ivar declaration, I just put in the magic keyword __weak. And I'll hit Profile.
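The fix, sketched below; the class and ivar names are modeled on the demo but assumed, since the app's source isn't shown:

```objc
#import <Foundation/Foundation.h>

// A __weak ivar is a zeroing weak reference: ARC sets it to nil
// automatically when the referenced object is deallocated, so the
// child-to-parent back-pointer no longer keeps the cycle alive.
@interface TreeNode : NSObject {
    NSMutableArray *_children;   // strong by default under ARC
    __weak TreeNode *_parent;    // was implicitly strong; now weak
}
@end
```

Breaking just this one up-reference is enough, because every instance of the repeated cycle went through the same parent pointer.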

And this time, those up references will be zeroing weak pointers, even safer than the plain unretained pointers before. And so as I create my tree nodes, you'll notice that the outline view updates properly, still works right, and we'll notice in the background that Instruments detected no leaks. So we solved all of those just by fixing that one cycle, which was repeated. So, thank you very much. Hope you guys really enjoy the cycle detector.

Thank you, Daniel. That was pretty awesome, too. Well, we're already toward the end of the session. In conclusion, we're thrilled with the instrumentation that we've been able to provide you in this release. It's going to allow you to create far more concurrent code and to use System Trace to make your code run more efficiently. It provides you a programmatic API that you can use to look for performance regressions. And now we have this excellent ARC support to detect leaked cycles and support your ARC development.

So, for further information, you can turn to Michael Jurewitz, the Developer Tools and Performance Evangelist. You can also turn to our documentation, which we are actively working on. And you can go to devforums.apple.com and ask questions, where you'll get answers from your peers here in the audience, in your industry, the engineers here on stage, and back at Apple. So there are some interesting new sessions for the rest of the week that you might want to attend focused on performance. They'll be using Instruments as well. Thank you very much for coming and have a good week.