Apple Performance Tools - WWDC 2001

Development Tools • 1:03:58

This session explains how to improve application performance in the Mac OS X environment using Apple performance analysis tools and techniques.

Speaker: Robert Bowdidge

Unlisted on Apple Developer site

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good afternoon. It's the last session of the day. I hope we all still have a lot of energy, because we have a tremendous amount of information to show you, a great presentation. And without further ado, I think we'll just roll right in. Robert Bowdidge, our Performance Tools guru. Thanks, Godfrey.

Hi, my name is Robert Bowdidge and I'm a member of Apple's Developer Tools group where I'm responsible for the performance tools. What I'd like to do today is tell you something about the tools and give you a quick introduction to them. Now, why are we giving this talk? Now, I'm sure all of you have great apps that you're ready to ship at Macworld New York in just about two months. And hopefully they're all pretty much feature complete, right?

Thank you. So that means that you've got about two months to go and you should probably be concerned a little about now about whether your applications actually perform as well as you want them to be performing. And more importantly that they're performing well enough so that your customers also think they're good. and therefore hopefully you're going to start using the performance tools to actually track down performance problems and make your apps as good as they possibly can be.

What I'll do is I'll show you those tools. So what you should hopefully learn today is first of all, you should find out that we actually do have some cool tools available. Secondly, you should get some, you are going to see me go through a sample program and actually find some performance bugs in it. So we're going to do some real world examples.

and hopefully you can take that home and get so excited about these tools that you're actually going to go in and dive through them. I'm not going to do a tutorial, I'm not going to go through in detail of, well you pull down this menu and you press that button and this is what it shows. I'm really going to show you what you can gain out of these in hopes of exciting you enough that you're actually going to go off and play with these yourself. Hopefully some of you will be playing with them as I speak.

If you've got questions about the frameworks, about what you should be doing in your Carbon programs to make them more efficient, I am not going to answer those sorts of questions. I am not the right person to answer those and I would probably be telling you the wrong stuff.

In those cases, you want to go to the framework talks or look at them on DVD. So for example, the Carbon talk that was at 1:00 was absolutely--or 2:00 was absolutely wonderful. Go take a look at that if you write Carbon apps. The Java talk was really good about Java performance as well.

So, just a quick summary. So what are the possible causes of poor performance? On a Mac OS X system, one of the most obvious is excessive use of memory. That is that your application requires a working set of memory that is more than it really needs. Now this can be because you have a lot of code because of dead code or stuff that isn't really necessary.

It could be because you're allocating too much memory via malloc. It could be because you're mapping in shared files, that sort of thing. There's many ways that you can increase your memory footprint. A second way that you could be affecting performance is if your application is causing too much, is basically using the CPU too much, executing too much code.

A third way is that you might not actually be doing anything. You may be waiting for something to happen, such as going off on the network or going out to disk. So we have both the case of doing too much and doing too little. And finally, there may be cases where in graphical application you may be doing too much drawing, and therefore you're doing too much computation, you're using memory, you're talking with the window manager, and everything slows down.

Almost all of these problems eventually turn into memory problems on Mac OS X, because all of them have some connection with how much memory is in the system and what memory is being touched. And the problem is that as soon as the memory footprint of your application and of the rest of the system becomes larger than you actually have physical memory, we have this wonderful thing called virtual memory.

And it just sort of gives you extra memory. And to get around that, it basically takes some of the memory out of memory and it writes it out to disk. And then it takes some other and takes it off of disk and puts it into memory. And the problem is as soon as this happens, suddenly your application is going to be judged by the speed of the disk and not by the speed of the processor. So, anything you can do to cut memory use, to keep your working set as small as possible, is a great thing.

Okay, so here's a quick summary of the tools that are available with three categories that may or may not be the most meaningful, but they seem to work. The first one is execution behavior. We have tools that help you understand what code is executing. This includes tools such as Sampler and Sample.

The second set of tools we have are for understanding heap use, for understanding how much memory you explicitly ask for during the running of your program. These include tools such as malloc-debug, object-alloc, which is a tool for understanding Objective C and core foundation use, and the command line tools heap, leaks, and malloc-history.

malloc-history is particularly cool because you can actually tell it, "Hey, in this process I've got a buffer that starts at this address. Where did I allocate that?" And it'll actually tell you. It'll give you a call stack saying where that came from. Finally, there's a set that I call system state because I don't have a better way to describe them.

These include tools for helping you do drawings, such as quartz-debug. It includes tools for understanding how you use system calls, such as SC usage or file system calls, which is FS usage, as well as programs like top, which gives you sort of an overall state of the program--of the machine. VM map, which will tell you how virtual memory is laid out for a specific process.

Okay, so how are we actually going to look at the performance tools? What we're going to do is, well, we're going to do public embarrassment. We are going to rip apart one of my own programs, a little test app that I've been working on called Thread Viewer, and we are going to see if we can find any performance problems that either I put in or that linger from code that I borrowed. Okay. Scott, do you want to come up here and show Thread Viewer? Now, the important things to know about Thread Viewer is that, first of all, it's a small Carbon app. or excuse me, it's a small Cocoa application intended for helping us do performance analysis.

Actually, we need to go to demo two. Oh, we are on demo two. and what we can do with thread viewer is as Scott just did we can say hey I want to look at a specific process in this case the doc and what it does is it shows us a running timeline showing us what's happening with specific threads in that task. So for example we can see here that doc uses two threads. The one on the bottom is the main thread, the one above is a secondary thread that's helping do computations. Now the little blocks there represent 50 millisecond intervals in the life of that process.

The green represents that the, that doc was running during that, when it was examined. The yellow indicates that it had been running in the last 50 milliseconds but it wasn't currently running. The gray indicates that it's blocking, it's waiting for something to happen. The light green and light red are, are waits also.

The green means that it's waiting in the run loop, waiting for something interesting to happen. The red indicates that the program was waiting on a lock. Now exactly what thread viewer is this? What you does doesn't really matter except to understand the pretty colors. The important thing to see is that this is a somewhat realistic program and that you can use it to examine other programs. So, was there anything else I wanted to say on that? Ah, yes. So, what's wrong with this? Any comments? Does anyone see anything that looks obviously wrong with this program?

There's a lot of green. Non-green. Oh, non-green. Actually, I was hoping on Thread Viewer, but that's an interesting thing that the doc is actually doing a fair amount, but that's mostly because Scott's playing with it. However, with Thread Viewer, hopefully there's nothing that you see here that makes Thread Viewer look bad. And that's the first lesson I want you to take home tonight, which is that usually if you just go and look at a program, you're not going to be able to tell that there's necessarily any performance problems.

In order to understand performance, you need to do measurement. You need to do software metrics. You actually need to look at the program and measure things like how much memory it's using and measuring how much time it takes to do things. You need this for two reasons. The first one is so that you can actually look at it and say whether those numbers actually match your expectations.

Did I really expect it to take two seconds to load that web page? The second thing is that when that system or as the system changes, you want to be able to note regressions. And unless you wrote down how it behaved two weeks ago, you won't actually notice that you've been losing 5% performance every single time.

Thus, you need to write stuff down. And in fact, for the projects that I work on, the ones that are relatively time critical, I actually have a checklist of things that I measure and I write them down and just stick them in a folder. It's low tech, but it works.

So we don't know how the system performs and we'd like to find that out. What we can do is we can start by looking at overall system behavior and we'll start by running a text utility called top. Now what you see on top here is at the very top of the screen you see information about the system.

The information below represents individual lines referring to a single process. So some of the things that we want to look at is-- or some of the things that I find very important or interesting that you should probably look at is, first of all, the page in and page out rates at the very top.

So what the line that says page ins, page outs is showing us is it's telling us something about the virtual memory system. Specifically, that line alone is telling us how many pages the virtual memory system is writing out to disk and how many it's bringing in. The first number represents the number of pages that have been moved since the system was rebooted.

The number in parentheses represents the number of pages that have been paged in or paged out in the last second. The important number is the one in parentheses for me. What I tend to say is that you should look at that, you should look at top every now and then when your system's running.

If you ever see those numbers go above zero for any length of time, you know, if they stay at 20 or at 50, that usually means that you're paging a lot. It usually means that the amount of physical memory you have is not enough for all the things you're running, which may indicate that your app is taking up too much memory or that something else is.

Now if you're hitting 50 or 100, then basically your system is thrashing. It's spending all of its time throwing pages out to disk and bringing them back in. And at some point around 50 or 100, you're going to be hitting the limits of the disk that it can't page any faster. And so if you ever see anything at that level, that means that your system is really hosed. Oh, that's a technical term.

The other thing that's interesting is on the bottom part. You can see that in the list of processes, there's a column that says percent CPU, which says which tasks are actually spending the time running. And we can see here that ThreadViewer is responsible for 3% or 4% of the CPU.

So it's not taking up a lot of CPU time. We can also see a column far to the right that says RPRVT. That's the other one that I think is interesting that you should look at. RPRVT stands for Resident Private Memory. It's telling how much memory is in physical memory on the computer.

and it's telling you how much is actually private to that application alone. And this number tends to be a good measure of how much memory your application needs in order to run. So go look at that number when you want to get a rough footprint. Now notice if you have a lot of things in memory, there could be a lot of memory paged out, sitting out on disk. And in that case, it would not be resident. So on a heavily loaded system, that number may not make as much sense.

Now, one thing we should do is let's show another mode of Top. Top has a huge number of modes. And in fact, this is probably as good a time as any to try to sell books. There is a book called Inside Mac OS X Performance. It's available, I believe, on the CD that you got in your bags. It's available on the web. And it's available from Fatbrain, nicely bound like this.

And it describes a lot of the performance tools. And it, along with the various manuals for Top, will tell you what some of the flags are. What I'm going to do is I'm going to show you-- or Scott's already going to show you-- A option to top, the top-d option, extremely clear and extremely user friendly, which gives us a few other pieces of information. We can see the number of page faults. The two I'm going to point at is CSW, which is the number of context switches, way to the right.

and messages sent, messages received. That's how many Mach messages are getting sent around in the system. How many have to be received. Now, there's a reason why I'm telling you this. If Scott actually plays around with Thread Viewer, I guess it is kind of, what we see is that the number of context switches, that is the number of times that something got paged out so that something else could run, is pretty high for both the window manager and for Thread Viewer. And the number of messages sent and received is relatively high. Does Does anyone know why that's happening?

Yes, I heard it. What's happening is that Thread Viewer is drawing, and the drawing is not just done by the Thread Viewer. It's actually done by Thread Viewer and Window Manager together. And so at times when the Thread Viewer is doing drawing, it actually has to communicate with the Window Manager.

So this is the second take home lesson today, which is that sometimes the program that you're running can't be measured just by looking at that program in isolation. You actually have to look at daemons and servers that might also be doing computation. In the case of drawing, you want to take a look at the window manager and at your application. There have been times where I've seen an application that looked like it was performing great. It's only taking 30 percent of the CPU doing a lot of drawing.

But the window manager was responsible for 60 percent of the drawing. And so to understand overall problems, you want to look at both. So, Scott, I actually saw something kind of interesting in that last top display. Can you switch back to it? Okay, so does anyone see anything interesting in this display for Thread Viewer?

Well, we don't know that it's leaking. What we see is it looks like-- Maybe there's a reason why we're doing that. Maybe it's intentional, but we're seeing the amount of resident private memory growing. At least I think it's growing, but luckily top has this little plus there and it actually says how much memory is going.

So it seems to be going up and it's using 37 megabytes of memory. Hmm, this looks suspicious. Let's see if we can track down that problem. So one thing I can try to do is I can try to prove that it's actually growing. Yes, I know that seems kind of silly, but let's try it anyway. So what I want to do is I want to see if I can actually prove that it's increasing.

I want to identify a trend over time. Now one of the problems is I don't really have a tool for this. Okay, I'm not doing my job. However, I can show you how you can make a tool like this. What Scott can do is because top is a command line tool, because it's a Unix non-Mac like tool, that means that you can actually use the various features of Unix to basically roll your own.

So what Scott has done here is he has typed in a little shell script. While one, call top with the -l option, which is basically a one shot, give me the data, and grep, which is a search for the line that starts with thread, and print that out. Then sleep for a second and keep looping.

And if Scott runs that, what we start getting is we get the top data only grabbing the line for thread viewer. Okay, so, and we can see here that the amount of resident private memory, which is the last column visible on where thread viewer is obscuring things, is actually growing.

So, geez, I've got a memory leak. So what you should take out of this is that command line tools, regardless of how Mac-like, Mac-unlike they may be, still have uses. You can use them to roll your own tools. You can run them via telnet from another machine if you want to inspect somebody else's machine. And you can run them even when the window manager is not responding, or when you've got full screen because you're debugging a game. So don't ignore the command line tools. They do have a place. Okay, so what are the possible causes for the memory leak?

One possibility is we're not throwing away old data. Any other suggestions? History. History? Actually, that's a good one. Well, I'm not quite sure actually, so let's see if we can track it down. One thing I'm going to do is I'm going to try another command line tool. It's called Heap. Actually, Scott can try running it, but the slide actually has the output. Heap is a tool that's intended to give you a dump of how much memory you're using via malloc.

and the important, most of it actually is just saying, oh here are all the buffers you're allocating. The important part is the very top though. Because one possibility, although a relatively unlikely one, is that we could have fragmentation of our heap. You know, we could allocate a million bytes and then free it and then allocate ten bytes that fills that space and keep going so that the amount of heap space keeps growing even though we're not actually using a lot of memory. What we can do with heap is we can look and we can see that there's actually two heaps, what we call zones, one for core graphics and one for the default malloc zone, that represent about, say, 2.3 megabytes of memory.

We can see that from the overall allocation part. We also see that the total number of nodes malloc, the amount of space we're actually using, is about two megabytes. Okay, so 2.3 minus two megabytes is about 300,000 bytes of unallocated memory. This indicates that heap fragmentation is not our problem. Okay, so this means it's memory we're actually allocating. So, another thing that we could do Thank you.

is we have two megabytes of memory that's allocated. Our virtual memory size is--our resident private size was 38 megabytes. Okay, that leaves 36 megabytes we don't know where it's coming from. So what we can do is we can run another tool. This one's called VM Map. And what it does is it gives us information on how our virtual memory system is laid out.

Most of the time this isn't something you really care about, but every now and then it might actually be interesting. And if Scott runs it, Oh yes, you've got to run that. Basically what it'll do is it'll say, okay, there's a page at address zero, it's 4,000 bytes and it's basically unreadable. There's a page at 4,096 and it has the executable and it runs for 14k or something. And there is the dump. Very friendly.

However, it's got some interesting information. And if you ever wondered about how the application was laid out in memory, this is how you'd find it out. So that when you're looking in the debugger and you find yourself at address 1016, you can find out why you might be there. However, the slide actually shows you a little more detail about what's actually there.

In this case, we see that there's a malloc buffer at address 139A000, and there's about 16,000 bytes. So that's space for the heap. A couple others. And then below the 512,000 bytes allocated for one of the stacks for one of the threads, we see address 14B6000, 4,000 bytes. 146D, 4,000 bytes. 146F000, 4,000 bytes. This may be the source of our leak.

See where it says 4K on the slide here? Sorry, actually this is a good reason why I shouldn't have done this. So over on the slide you can see the 146d 4000 bytes read/write. So the interesting thing is we keep allocating these 4,000 byte buffers. This is kind of odd. And that would have been allocated via VM allocate. That was how you'd actually create a virtual memory page.

And I went in when I found this problem and went into the debugger and said, what do I see there? And what I did, if you can, can you bring up thread viewer? Is I found that basically that page was zero, that virtual memory space was zero, except that there were a couple numbers that looked oddly like those thread IDs to the left hand side, the 5A035903.

which made me think, oh, this may have something to do with, like, you know, a system buffer or something. And it turns out that what's actually happening here is that the Mach APIs sometimes will return you basically a buffer. It'll return you a VM allocated page that you actually have to free.

And the code, which is actually in one of the libraries that ThreadViewer depends on, does something like, hey, tell me all the threads that are associated with this task, and you get a buffer back that contains it all. And you need to make sure to free it, but that yellow line wasn't there.

and by adding that line we got rid of the memory leak. Now this is not a case you will ever run into. None of you will ever, ever, ever hopefully use the Mach calls to find out what threads exist. This is something I'm doing as a performance tool.

However, what you should take out of this is first of all I had to use multiple tools to track this down and I had to compare their output to actually figure out what the problem is. This is one of the problems with performance that usually the tools aren't just going to drop an answer out in your hand. Plan on inspecting things and thinking about them to actually find out what the causes are.

[Transcript missing]

Now, we saw that we were using about 2.3 megabytes of memory in the heap. That is, in memory we allocated via malloc. And that seems a little suspicious too. So, what are some of the reasons why we might actually allocate that memory? Well, one thing is we could just have really big data structures. We could be allocating two megabytes of data structures because we want them.

A second thing is that we could be caching things. Even though we don't need them, we're actually saving them just in case we need them again. And a third case is that we could actually be creating things and just forgetting to destroy them. And there's a huge amount of other things.

But the problem is that with any of these, the more memory you allocate via malloc, basically the system will just keep giving you the memory. And eventually what happens is the system grows to the point where it won't fit in physical memory anymore. You start thrashing, or you start swapping stuff out to disk, and as a result your system slows down.

So there's two tools I'm going to show you that will help you track down why memory is being allocated. The first one's called malloc_debug, and it tries to show you via a call graph where memory is allocated so that you can track down the big allocations. The second one called object_alloc tries to refer to things according to how many objects exist. So we'll start off with malloc-debug. And Scott can bring that up. What we do with malloc-debug is we select an application and we can launch it.

and Scott can connect and say okay I want to look at the dock and we get our little thread viewer And now Scott can go back to malloc-debug and hit the update button. And we find out that to get to that point in the code, we needed 2.6 megabytes of memory. Now, ThreadViewer isn't that complex, so this seems a little suspicious to me.

Now, like I said, malloc-debug works off a call graph. So what it's doing is it's showing the various functions that were called. So for example, Scott can start clicking on start, which is a secret function in the runtime, which you don't need to know about. But start at main.

Okay, so main is the top of our program. And the number to the side of that, the 1.3 megabytes, represents the amount of memory allocated by main and everything below it by all the functions it calls. The rest of the memory is allocated on a different thread that isn't started at main. That's why you don't see it.

and what we can see is to the right of main 1.3 megabytes of memory was allocated in NS application main and 10,000 bytes was allocated in calls to allocate unneeded buffer. Both of these were called directly by main and if you went to the source code you see that.

Okay, NS Application Main. You don't need to worry about that because that is the root of the Cocoa framework, and that's how you get it started. So of course there's going to be a lot of memory allocated. The Call to Allocate Unneeded Buffer, which calls malloc, allocates a single 10,000 byte buffer.

Scott can look at the listing below where it shows the buffers, double click on it, and get a hex dump. And we can find out that that buffer, we can look at the buffer and maybe we can detect what the cause is. And it has a string that says this buffer isn't really needed. Okay, so that's obviously a buffer we can get rid of. I think.

But this shows you how you can walk through the call graph and you can actually find where there are allocations and you can say hey this looks suspicious and try to fix it. And if you understand how your program is structured top down, this makes a lot of sense. And you can walk through the tree looking for things that are interesting, because this is basically sort of a scrub your nose on the screen until you see something interesting kind of problem.

Now the other thing that we can do is there's times where we don't really care how we got there from main, but we're sort of curious about who was calling malloc directly to try to assign blame. So Scott can switch the mode of display from standard to inverted. And this means we're looking at the call tree from the bottom up. So for example, we can see here that there's 740,000 bytes allocated in calls to malloc.

and 489,000 bytes of that was when NSZoneMalloc called malloc. That's the Objective-C version of malloc. And 290,000 of that was when we called NSReadImage, which called NSZoneMalloc, which called malloc. Now, ThreadViewer is a pretty simple program. It doesn't have any icons. It doesn't have any pictures. And so the idea that NSReadImage is responsible for 290,000 bytes seems a little suspicious. And so we can start walking up the call graph to see who called what to get to NSReadImage.

Now, if we go up past all the AppKit stuff and we can see--actually, Scott, can you point at where the library and the application name are? So we can see that we end up in a function called ImageForProcess, which is in the ThreadViewer library. That's what the thing in parentheses means.

So this is my code. And this is what's being called to actually cause those NS read images to be called. Now, unfortunately I know this code and I know what that function is. That's the function that handles the attached dialog box for Thread Viewer, which Scott can bring up.

And so even though that dialog box had been closed, those icons were still around. and the reason why was that basically Thread Viewer was trying to be clever. It decided it would be really smart if it could cache those icons, if it could keep them in memory after the dialog box closed, just on the off chance it was going to need them again.

Now the--and that's nice. I don't have to read them in off of disk, they're sitting in memory. The problem is because of virtual memory, as soon as the application got a little too large, what would happen is we'd take those pages that hadn't really been touched in a while and we'd write them out to disk. And so when we actually did bring up that attached dialog box again, well, to avoid reading the disk, we'd go to memory, which would require reading the disk.

Once again, the idea of saving memory is pretty important because it's probably going to be cheaper just to get the stupid thing off disk. And so the idea of caching those icons was really stupid. What would have been much better was when the dialog box went away if we just got rid of those icons and then we recreated them on the fly the next time we opened that dialog box because it was a relatively infrequent occurrence.

Oh, one thing I haven't mentioned here that I should is that everything that you saw in malloc_debug refers to currently allocated blocks. So if something was allocated and then freed, you won't see it in malloc_debug. So that's something to keep in mind as you use it. Now, malloc-debug has a number of other features that are useful.

For example, one of the things it can do is it can help you identify leaks in your program, places where you're allocating memory and then you forget to deallocate it. Now, these are really important to track down because leaked memory, stuff that you aren't--you're forgetting to free and never free, basically sit in memory, they occupy space, they keep you from having the stuff you're using close together.

And if you have a long running app that goes for days, like a server app, you're memory just keeps growing and growing and growing until 30 days into the thing, suddenly the app crashes and you have absolutely no idea why. So, and even if you have a small, short app, this is still going to affect performance.

You know, this is a particularly strong point for me because there's at least one game that I've been trying to play lately that after about four hours of playing, it has a memory leak and eventually it sort of runs out of memory. And so please, if you write games, don't do this.

So Malictabug can help us detect this. What we can do is we can switch from show me all things, or show me the things that have recently changed, which is new. And we can switch to the definite leaks mode. Show me only the buffers that aren't referenced anywhere in memory. The way it does that is basically garbage collection. It goes scanning in memory, looking for pointers to the buffer.

and any buffer in that's been malic that it can't find a pointer to it, it assumes it's unreferenced. Because if we don't have a pointer to it, there's absolutely no way we can free it. And we see here we have about 10,000 bytes leaked. And Scott can click on the malloc and see that there's a case in image for process, interesting, where we're allocating a couple buffers and we've lost the pointers to them, so we can't free them. I actually looked at this code. What I'm doing here is, when I go and read the icons, basically, I have to make a list of-- or I have to grab the command line to find the icons. You don't want to know.

And I keep a buffer around for that list of arguments. And in most cases, at the very end of the routine, I actually free it. However, in a few cases, I would find that there was an extraordinary situation that I knew I wasn't going to find an icon. And so I would basically just exit that routine. But I forgot to release the buffer. No one else does this, right?

So, yeah, this really is just one of those things that I wanted because it helps me. So this is another case where MalictiBug helped us. And in fact, it's showing us another lesson of software engineering, which is if you keep finding bugs in the same code, maybe this means you need to rewrite this routine.

Now there's a couple little things to remember. One is that you may say, oh, you're only leaking 10,000 bytes. Who really cares? What's 10,000 bytes? However, the way that malloc-debug's leak detection algorithm works, as you remember, was if you can't find a pointer to this buffer anywhere, then it must be leaked. This first of all means it doesn't do well with circularly linked lists, because everything will have a pointer to it, and so it'll never be linked.

Or, I mean, it'll never be leaked. Also, that means that if you have a tree data structure, The root doesn't have any pointers to it, but everything else is pointed to, and so it's never leaked. So every leak that you see in malloc_debug may be important, and so you should go track it down.

Finally, one more thing about malloc-debug. This doesn't necessarily impact performance, but it does impact correctness. Malloc-debug can also help you track down pointer problems. and that's and what happens is that uh... there's a number of bugs are really subtle intermittent examples include uh... cases where you free a buffer but you continue to read and write from it even though somebody else now has the buffer these are miserable suddenly data is changing you have no idea why The second kind of bug that is really nasty is when you have buffer overruns, where you have, let's say, a string that's 40 bytes long, but you write 45 bytes into it. And so suddenly you trash the next thing after it.

Malik Debug can help you track down both of these. Excuse me, Malik Debug can help you find both of these. And the reason why I say help is because what Malik Debug does is it tries to encourage your program to crash if it does stupid things. What it does, and here Scott's brought up the memory dump.

See, do we have any? So one of the things that it does is when you free memory, malloc-debug carefully goes and it erases the contents of that buffer. And it replaces it with 55 hex. So we would have seen a lot of cases-- actually, there we are-- of 55555555.

So if you try to read data from this buffer that you've freed, you'll get garbage. And hopefully, you'll crash or you'll behave badly. If you try to treat those as pointers, it's even better because 55, 55, 55, 55 is almost always unallocated memory. And as soon as you touch it, basically your app crashes.

So if your application ever crashes in malloc debug, it's trying to tell you something. Hook up with a debugger. There's instructions in malloc debug to tell you how to track down these kind of bugs. And find the pointer problem, because it'll save you a lot of grief later.

The other thing that Malik Debug does is it puts guard words on each end of a buffer. At the beginning of the buffer it puts the string dead beef, the first part that's highlighted. So D-E-A-D-B-E-E-F. Then you have the buffer and then at the end you have the thing beef dead. I did not make this up. It just came this way.

And what will happen is that if you overrun the buffer, Malictabug checks occasionally on freeze to see whether anything happens. And if it ever finds that those bytes have been changed, what it will do is it will actually print out a warning. Unfortunately, the warning goes out to the console, so please keep the console open while you're running Malictabug. Yes, this is lame. We're working on it. We'll try to do something about it.

Okay, thanks for that Scott. The second tool I'll show you is called ObjectAlloc and it's intended for helping you understand how many objects you have rather than where they were allocated. Specifically, it's mostly useful if you're doing Objective C work or if you're doing core foundation. What we can do is again select an application. In this case it doesn't understand bundles.

and we can start the program running and what happens is it gives us a little histogram. It shows us a bunch of numbers grouped by the type of object and these little histograms for showing how many objects we've created. The first number and the darkest bar represent the current number of objects that exist. The second bar represents the peak number of objects of that type that were created during the run. Okay, so the peak number that ever existed at one time. The final bar, the lightest color, represents the total number of objects of that type.

Now, this way of organizing things is really nice for certain types of tasks. For example, Well actually, let me mention one other thing. One thing that it's good for is identifying trends. So we can see motion here, we can watch the numbers growing, and so it's very nice for seeing that memory use is expanding, for example.

A second thing is it can help us when we're trying to prove various statements about our program. So, for example, the thread viewer display basically keeps sampling the application, finding out information about how it's currently running, and throws those on the right-hand side of the display. The information that scrolls off the left disappears. And the way that's done is with a data structure called a thread data.

And what happens is the new thread data gets put on the right side of a big array, and when the information becomes out of date because it's scrolled off, they get thrown off on the left-hand side, and hopefully the objects are destroyed. So one bug I could really imagine doing is that I could be forgetting to delete them correctly, resulting in the number of thread data is growing without bounds, and eventually my system performance would degrade. So we can prove that, and Scott has actually done this. We can actually click, we can find thread data in Object Alex display and see how many objects we have.

And what we see here is that the current number of thread data objects varies. It goes between about 45 and 55. The peak number was 62, but we noticed that the total number is about 1,400 and growing. So this implies to me that I actually did this correctly.

So this isn't a bug. We're keeping the correct number of objects in sort of our ring buffer, throwing new items on the right-hand side and pulling objects off the left. So Object Alex has been able to help us prove that. One other thing, how many of you are Objective-C programmers or are interested in being Objective-C programmers?

Okay, as you do test code, make sure you use ObjectAlloc because one of the things that it's really great at, that you may have seen when we actually brought up the attach panel, is that you can tell it to keep track of every single time that you retain or you release a data structure.

Basically Objective-C has reference counting and it only deletes the object when you haven't retained it anymore. And so if you have a program, you can figure out whether you're actually destroying objects correctly and if you're not, you can find out when you retained it one too many times to keep it around in memory. So use ObjectAlloc, especially on the example code.

Thank you, Scott. OK, so that's memory. The third case of performance we might try to track down is what code is executing. Now, there's a number of ways that we could be using too much CPU. We could be executing code that we don't need to, either that it's dead code or that it somehow is not really doing a value that--calculating a value we care about. We could have an algorithm that's much more expensive than we ever expected, something that's let's say a quadratic, you know, an N squared algorithm rather than linear.

We could have cases where there's some operation that's much more expensive than we thought. One example that was pointed out in the Carbon performance session was that now that we have home directories that could be out on an NFS server, when you go and get your preferences, you might be going across the network. And so something that may have been a really quick grab that from the disk kind of operation suddenly may take seconds to actually get a result back.

and so you may not have expected that certain operations would be as time consuming as they are. As you may have seen in Avi's keynote, there's also the problem that you may be checking for events by polling, by constantly checking and seeing where the mouse is, for example, rather than waiting for something to happen and having the system say, "Hey, by the way, something changed."

In general, the normal way that you solve this kind of problem is basically track down the biggest problem, track down the most expensive routine, because if you can cut the CPU cost of that biggest routine, if you can make it faster, you're going to improve the performance of your system as a whole. So don't try doing the little things, try doing the big things first.

So what we'll do is if we want to try to improve performance, what we want to do is find the expensive calls, improve them. The tool for doing this, or at least one of them, is called Sampler, which Scott has just brought up. So with Sampler, we can select an application or we can connect to something that already exists.

And what we can do after running it is start sampling that. And what sampling means is that we stop the program occasionally and ask what's going on. So every 20 milliseconds we stop the program and we say, "Where are you executing?" And we get basically a stack backtrace.

Then we let the program continue, stop it again, get another backtrace, and keep going. And the advantage of this is that for relatively little impact on the running system, we can actually find out what code is most likely to be running. This is statistical. We don't know all the things that ran in between, but we've got some good idea of what we were actually seeing.

So what Scott, I believe, has done is he's actually done some sampling while ThreadViewer is drawing. So let's see if we can find out what ThreadViewer is doing when it's running. Personally, I'm very concerned about how much time it's taking to actually grab its samples to find out what's running in ThreadViewer. And I want to find out how much time ThreadViewer is spending drawing.

Now, I know a little about this program. I know that thread 3 happens to be where the data gathering goes on in Thread Viewer. And we see that thread 3 was found executing 486 times. That's the number of samples that were taken. All of those samples occurred in the function pthreadbody, which was calling sample threads, which is my code.

And every time that we stopped sample thread, we found one of two things. 470 of the times we found ourselves in uSleep, which is a way of basically stopping for a few microseconds to wait for something to happen, or actually wait for a fixed time. In 16 of the times we stopped it out of 476 though, we found ourselves in this function called Thread Viewer Controller.

log, which happens to be the code that actually does the logging that gets the information for Thread Viewer. and if we look at that we find on the far right a sample stack and we find out that most of the time was actually doing what's called sample once all threads which is getting the stack backtrace which ThreadViewer can actually display.

So what's happening here is that we found that basically about 4% of the time that we actually looked at thread 3, it was actually doing something. It was actually gathering data and the rest of the time it was doing nothing. This seems pretty good. This means that the data gathering is relatively cheap and that's really good for a performance tool.

So Scott can actually just sort of ignore all that because it doesn't look like there's a performance problem. We'll prune that out of the tree so we don't have to look at it. And now we can look at thread zero, which is the main thread where the drawing goes on.

and there were about seven hundred and twenty times that there was sampling or that we sampled the main thread and most of the time it was in Maine which is not surprising and then it goes down into this DPS next event and the block until next event is basically sitting in the run loop it's not really doing very much So we can see here that 570 out of the 720 times we were sitting in CFRunLoop. OK, so this is kind of interesting. So 572 samples. 552 of the times that we found ourselves in CFRunLoop run, we were actually in a function called Mach message.

Geez, why are we spending so much time in Mach message? I'll give you a hint. That actually happens to be a kernel routine. So obviously there's something wrong with the kernel because we're just sitting there in Mach message all the time. Actually, does anybody know what's going on there?

Thank you very much. We are waiting for a Mach message. Okay, so Mach message is basically saying--sending off a message, probably sending it off to like the Windows server or to whoever's giving us events saying, "Hey, let me know when something actually--that I actually care about happens, like the mouse moves or we need to do a redraw." And so most of the time we're going to find ourselves in Mach message overwrite trap.

Please do not open a bug against the kernel saying, "Hey, you guys, I keep finding my code running in here." Okay, that's why you see Mach message overwrite trap. The rest of the time, however, we find ourselves in CF run loop and if we look up the sample stack and look for where the numbering changes, where the tree diverges, so to speak, we find ourselves eventually in thread view draw rect.

And we find that only in 17 out of those 700 samples, okay, once again, maybe 3 or 4% of the time, we found ourselves in thread view draw rect, which is actually the thing for doing the drawing. and ten of the times that we sampled it and found it in DrawRect, we found it doing some NSString drawing, and the rest of the time we found it drawing rectangles.

So this implies that the drawing code is pretty efficient too, though we weren't spending very much time doing it. And this suggests that if I wanted to up the redraw speed so that Thread Viewer wasn't just sort of flashing the screen every second redrawing the display, I could probably make that animation much better. So that's a good thing to know. Okay, so we didn't find a bug, but we learned something about how we're actually going, how we could improve this application.

So one other thing, as I said, this is sampling. There are a few other tools for helping you out. There's a tool called Sample which gives you similar data that's a command line tool that's really good for finding out why the machine is hanging, for example, or why an application is stuck in a loop. You can actually run Sample and get a stack back trace.

The other tool that you may want to know about is GPROF. That's the standard Unix profiler. We actually have that on our system. It generates a text report saying here's the code that's running. If you want slightly more accurate data, GPROF is a better way to go. However, it requires you to recompile your program. Sampler, malloc-debug, and the others don't require you to do recompiles and so they're much easier to use.

Now another way that we could be having performance problems is if we're using the disk badly. That is, if we are trying to read the disk at the wrong time and so on. In the Carbon session they actually went through how important it was not to try to do disk accesses, for example, when you're doing drawing because of the possibility of blocking and slowing down your drawing.

And also to minimize the amount of reads and writes you do during application launch to try to make the application launch as fast as possible. So one thing we can imagine is trying to understand how the application uses the disk, what files it tries to access. Now, luckily there's a really cool tool that will actually help us with this. It's also a command line tool and it's called fsusage.

and what FS Usage does is it basically dumps out a text report for a given process and it actually tells us every single file system call that we do, every open and close and read and write and get DER entries and the like. Now it has to be run as root because it's actually a security hole because you can in theory find out what other people are doing with it. Remember this is the fun of multi-user operating systems.

and Scott did not expose his password unlike me in a previous demo. And what we get is we can say fs usage for thread viewer and we can see the reads and writes and in fact we can find the name of the file, we can find in the far right column the amount of time it took.

Now, this is a relatively boring example, but you can imagine if you ran this, for example, on simple text-- and actually, that's a take home-- a bit of homework for you all. Go home, try running FS usage on simple text when it starts up, and watch what file system accesses it does to go and get the list of fonts and get the resources and the like, and you'll be surprised.

So what we can see here though, if we get back to my problem, is we find that Thread Viewer is every second doing an open Fstat, a write, and a close. And all of these are taking less than a millisecond. They're taking like two ten thousandths of a second to do, according to the numbers on the far right. But this seems a little wrong to me.

And we can see the file that we're doing is /temp/threadviewer-log. And we could actually go and look at that file if we needed to to try to understand what was going on. Actually, don't worry about that. So, so, Fs usage has shown us that Thread Viewer is doing something really brain dead. So the question is, where is that brain deadness in my code? And fsusage doesn't tell us that. Luckily, Sampler has a mode that will actually help us on this problem.

Rather than just doing CPU sampling, Sampler will let us do several other things. It has a mode that helps us track down mallocs, which is very similar to malloc debug. It also has a mode called Watch for File Actions, which will, instead of stopping the program and getting a stack backtrace every 50 milliseconds, it will do that every time you call one of the system file routines. or it will crash. Let's try that again. Did we kill the doc? Oh good now.

So one of the problems with Thread Viewer is because it's a performance tool, and because we're running it, it has this tendency to stop applications when it's looking at them so that it can snarf their memory and the like. And one of the problems with demoing it, which-- I'm not sure why I was silly enough to do that, is that if ThreadViewer manages to crash when it has stopped the program, there's this nasty habit that suddenly the dock is hung, which makes for really good demos because suddenly you're trying desperately to get the system back.

Luckily this didn't happen. Okay, so Scott's got this up and can basically run the program in Sampler for a while and then can hit update and can get a list of all the places where allocations occurred. Here we get the normal call tree starting at the root, starting at main.

It's a little more interesting to go with the invert call tree option here. And here we can see that there were 380 places where we did read, 372 where we did L-seq, 128 opens and the like. And so I think we were doing opens and writes. So let's click on open and we see that of those 128 opens, we see a number in CF read byte and the like. The one that's probably interesting is the F open call there. Okay.

is that it? Yes, which happens to be in the ThreadView controller get sample array. And what's happening here is that in my code for gathering the information on the thread, for some really silly reason, I put in basically a little loop that said, I think I've got that, something like this. Open this file, write out the samples to the disk, close the file.

Okay? And because I'm doing this every second, that's relatively inefficient. It didn't affect ThreadViewer, but you can imagine if you had this in your code, you might want to know about the fact that you were opening and closing the same file a lot of times. It might be more efficient if I'd done something like just open the file once and then just kept writing to it every time I needed data.

Or I could just yank this code out, because it's actually pointless in this program. Okay, so you've seen two ways that you can use Sampler, both to stop the program occasionally to find out what code's executing, and you've seen another of the cool features of Sampler, which is to look at file system accesses.

The final type of problem I'm going to tell you about is drawing. Because all of our applications are graphics based, all the good ones that we do, Most of the good ones we do. Drawing is very important because it's how we communicate with the user. All the value of the Macintosh is basically presenting things to users in graphical ways so that they can understand things, so that they can do the creative work and let the computer do all the boring stuff. And so drawing is a key issue, and you want the drawing to be as efficient as possible so your application runs well.

The problem is that if you do too much drawing, you're going to use CPU time, you're going to use memory because you've got buffers, you're going to be using Mach messages as we saw to communicate with the window manager, which means we'll be blocking. And so too much drawing will cause a lot of blocking. It will be too much work. And so we want to minimize that. So what we can do is there's a really cool tool done by the Core Graphics team. It's called Quartz Debug.

and Scott will first bring up Thread Viewer again, our sacrificial victim. And Quartz Debug has a number of features. The two I'll show you, first of all, it has this option called Flash Screen Update. And what Flash Screen Update is, anytime that it has to redraw any part of the screen, Quartz Debug tells the Windows server to actually color that in yellow. And so that makes the amount of drawing explicit. So you can actually see what goes on in the You know, we actually do a lot of work there.

So that's really cool. And so if you had some case where, let's say, during the same drawing cycle you were redrawing the same thing twice, this would be a way to tell. Another thing it points out is that the way that I handle the drawing in Thread Viewer is that I just erase the entire portion that I'm animating and redraw the entire thing. Maybe it would be more efficient if I actually just redrew the parts that changed every time that I got some new samples.

Oh, oh, there's a really cool feature here I forgot to show you. One of the things that you can do with Thread Viewer is that if the program has something interesting there, you don't want it scrolling off the left. And I don't have history because I've forgotten to add that. So what you can do is you can press that pause button. And the pause button stops the application that I'm looking at, in this case the doc. And it also stops the sampling because the program's not running so I don't need to gather any data. Uh-oh.

However, someone wasn't very bright and when he implemented the pause button, although he pauses the sampling, or the thread data gathering, He didn't bother to stop the display. And so the display that was on a timer so that every second it would cause a redraw gets redrawn every single second. And so this is a bug that would be extremely hard to track down in any other way.

It would be very hard to see that when you hit the pause button, this happens. And if you're using a tool like Sampler,

[Transcript missing]

and so this alone is a wonderful feature. Now another feature in Quartz Debug is what's called Show Window List. And this tells you what the window manager thinks all the windows it knows about are. And as we can see, there are actually about six windows that are part of Quartz Debug, even though only once on the screen.

So ThreadViewer actually has six windows open. Some of them are off screen, some are actually--one is appearing. And the problem is that every single window we create, whether it's an off screen window, whether in this case some of those are actually windows that we created via interface builder that are just not appearing until we actually bring them up, all those windows need to have space in the window manager. And so they occupy memory and thus contribute to our memory footprint and contribute to the chance we might be swapping.

Once again, this is something you may not know. You might not realize how many windows you actually create. And therefore, Quartz Debug actually gives us a way to find that out. And it tells us exactly how many windows we've created. And so now I could go in and I could try finding out exactly which of each of those windows was. And in the case of dialog boxes, make sure I only create them when I actually need them, as opposed to keeping them up all the time.

Okay, I didn't go through all the tools today. There were a couple that you may want to examine on your own. The first one is called SC Usage. It's somewhat like FS Usage, only it looks at some of the system calls, like get time of day or Mach message, and it'll tell you how many calls you're making to that. And you may find some interesting behavior that you didn't expect in your system.

Secondly, there were a number of tools I didn't mention about heap use. So, for example, we saw a little about heap, but the idea of being able to get basically a text output describing all the buffers you've allocated may be interesting to you, and so that may be useful. There's also a command line tool called Leaks, which is like the leak detection in malloc_debug.

Its leak detection algorithm is actually a bit better than malloc_debug's. It'll actually find any buffers that aren't referenced from things that are reachable from well-known spots, so to speak. It'll only find things--it'll actually find groups of data structures that are leaked, and so it's actually more useful. There's also a tool called malloc_history, which I mentioned earlier, that will actually help you identify for a given allocation, for a given address, like, you know, 0xE1C04, who is responsible for that, who actually allocated that block. So if you're in the debugger and you find something interesting, you could actually look at that.

There are also a couple tools for understanding how your application is running. So we saw Sampler. Sampler is the command line equivalent. And we had a quick introduction to VM Map, which is useful for understanding how virtual memory is laid out in your application. If you're coming over from the 9 side, everything is completely different and it may be interesting to actually look at that and realize how memory is laid out.

As I said before, I was really trying to just tease you saying these are the cool tools. There are a number of hints that I should stress again or stress for the first time. The first one is that all these tools have a very nice property, which is that you don't need to recompile your code, you don't need to instrument it, you just run the tools and they work.

This makes them much more available. It's very easy to just sort of go in and look at your own app, look at other people's apps. You know, if you're curious about how some of Apple's own applications do disk stuff, you can actually use some of these tools to find out how they're accessing the disk.

So that's a big advantage. Take advantage of it. Second, if you're coming--if you're working on Code Warrior and you're using CFM binaries as your output so that you can work on 9 and 10, you need to do a little work to actually get the performance tools to find information in the program, to get the symbolic information.

First of all, you need to make sure that you compile your code with the inline traceback table option on. This is part of the code generation settings. And basically this says put the name of the function. Excuse me, put the name of the function immediately after the code in the binary.

Second, Code Warrior gives you the chance to actually use its version of malloc instead of using the system's version of malloc. If you use the Code Warrior version, basically it asks malloc for a huge buffer and then it subdivides it and hands it out. If you do that, then tools like malloc debug, heap, leaks, and the like won't be able to help you with memory analysis. So make sure that you actually turn on that option. And unfortunately, I'm not quite sure exactly where it is.

Third, Object ALEC, although a nice tool, does have the problem that it doesn't understand what CFM apps are, which is probably not that big a deal because it really understands core foundation and Objective C only. However, if you want to look at it and at least see how many objects of malloc size 20 you have, which it will tell you, you need to actually select the Launch CFM app hidden in the system folder and then make sure that you actually name the application you're running on the command line, just as if you were trying to to run the application from the command line.

Okay, so that's my presentation for today. As I said, in two months you're about to hopefully all ship your apps at Macworld New York and you want to make sure that you give the best impression to your customers. So start tuning your code. The primary thing you want to do is cut memory use in all ways. That's going to be the best way to actually make your programs efficient. And so take a look at heap, take a look at how you're using the heap, take a look at how you're using memory, take a look at your private memory use.

Also remember that just looking at the program in isolation isn't going to be useful. Make sure to write down some metrics. Measure how much memory you're using. Measure how long common operations take. Decide whether those are appropriate and then compare them across multiple builds so you can note regressions.

Also remember that your application is not just your binary, but it's also some of the servers that you connect to. And so make sure in the case of drawing to look at both the window manager and at your application. And go out there and please create some great apps. And I'll be looking forward to seeing them at Macworld. Thank you very much.

Thank you Scott. Damn, I forgot I got a slide. If you want more information about these tools, first of all they're all available on the Developer Tools CD. You've all got a copy of it, go off and play with them. If you want documentation, all the graphical applications have documentation built in. All the command line tools have man pages online as standard Unix tools should. There is also documentation in the release notes section.

As I mentioned before, there are also books to help you. Inside Mac OS X Performance is a really cool book that talks about how to tune your application. It gives you information from the level of how the system works to documenting the tools. And all of us engineers actually tried to contribute to this.

Also, there's a Mac OS X system overview book, and this is really good for understanding just sort of the overall ideas behind Mac OS X. And we try to suggest that people actually look at this so that they understand some of the terminology. And with that, I will turn it over to Godfrey.

Thank you very much Robert and Scott. So, last session of the day. Information resources we put up in all of our other tool sessions and the information remains pretty much the same. So a roadmap. We wanted to point you to sessions 121 and 122, even though they've already occurred, so that when you go to the DVDs that you'll receive after the show, you'll see some other areas where we talk about performance tuning. Tomorrow, our last sessions for the tools track happen in Hall 2 at 9 a.m.

That's the debugging of Mac OS X and the feedback forum for Apple developer tools at 3.30 p.m. in J1. Please attend. We're very interested to hear your feedback at the end of the day. If you have questions on tools, you can contact me. I am the Technology Manager for Development Tools. That's my information up above. And the Developer Tools Feedback at macos10toolsfeedbackgroup.apple.com.