
WWDC05 • Session 409

Performance Analysis of Your Memory Code

Development Tools • 59:09

Learn to debug and locate memory leaks in your application. You will learn valuable tips and tricks for identifying, analyzing, and squashing these common bugs. We'll also provide a valuable walk-through of a few sample applications using Shark, setenv commands, and more.

Speaker: Dave Payne

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it may contain transcription errors.

Good afternoon. So welcome to performance analysis of memory code. Many of you might have been in the Shark session previously. Shark, as many of you have worked with it, it's a great application for performance analysis. It truly excels at a lot of event-based things like time profiling or when do certain function calls get hit, malloc tracing itself, when does malloc get called, when do VM faults occur, and things like that. But there's a number of other additional tools on the system that we provide that give additional insights into memory issues. And by using these various tools and combinations, you can really find out a lot of additional things about your apps and make them shine. So in this session, we'll do a quick review of the process of performance analysis. We'll look at how to analyze your application's memory use, how to find memory leaks, and how to debug certain nasty memory usage problems. So within Apple, every time we go to release an OS, we have to really hammer home the performance points and really take a systematic approach.

It works best to do this throughout your development cycle. First, identify common use cases. What are users typically going to be doing with your applications? What are the most important things to make sure are fast, make sure are small? Given those, what are your goals for those? How responsive do you need it to be? What should the throughput for your application be, which is especially important for, say, servers and things like that? And scalability. In fact, scalability is one of the most important things for finding performance issues: if you've got N-squared algorithms, for example, it helps to have a large N in testing to really see the effects there.

So once you've got your goals, then establish specific precise benchmarks for what you'd like to hit with those so that then you can add in measurement code in the right areas of your application and measure the time or measure the memory use on a systematic basis. We really recommend that you follow through this process throughout your development from the very beginning. Don't try to tack performance on as a thing in the last week before you ship. You might have painted yourself into a corner by then. So a lot of times you can get relatively inconsistent results. So we really recommend trying to isolate your system. Take it off the network if necessary, if you're not dependent upon it. Make sure that Spotlight indexing isn't going on in the background while you're trying to measure your time. Typically measure several runs, drop out the high and low, average out. And again, don't allow regressions throughout your development period. But if you have any, then go in and focus on those and hit those hot spots using the various tools. So just an example of why this kind of thing is important, think about a Spotlight importer. For each user ID on the system, for example, Dave P on my system, there's a single mdimport process that runs, which runs all the importer plugins for all the different document types. So if your importer is slow, or if it crashes or uses a lot of resources, that matters: mdimport runs in the background and is supposed to gather the data quickly so that apps can run queries quickly. So if you crash, then mdimport has to be restarted.

So anyway, make sure you're fast, robust. Don't load in additional parts of your application that you might not need for just importing. So many frameworks you might not need. A lot of fonts if you're not gonna index that data. Don't read the file multiple times. Don't start up multiple threads if you're not going to be using them for importing. And then come up with a systematic way to test your code. So in this case, you could use mdimport -p on a large directory that you consistently use with a complex set of representative data files.

So some things to look at in the areas of benchmarking: time, memory size, I/O, graphics. Let's look at them in a little more detail. For time, you probably have some ideas of what are the most important areas that you want to run fast in your application, but also look at things like how fast your application launches. Can you defer things from the time you launch to later on in the application? When your application isn't doing anything, hopefully you're not polling; hopefully your CPU usage is zero. And responsiveness. When the user does live resize of your windows, is that snappy and efficient? Does it feel fast?

We're gonna focus most of our attention in this session on the memory size issues, both static, what is the memory use right now, and dynamic. If I do an operation, does it pop up and then back down in that operation? Because that can be expensive. And are you leaking memory over time? It might look good now, but if you have it running all night and people hammering on it, does it get much bigger overnight? I/O. So, you know, we tend to think of a file on our system, but remember it might be out on the network these days. Look at the opens and closes, reads, writes, stats, getattrlist calls, but specifically look at, are you causing lots of paging? Are you reading the same file multiple times? How many files do you have open at one time?

And for graphics, a lot of these problems can be pretty subtle. The systems are so fast, it can be hard to tell that you're drawing the same graphics onto the screen more than one time, or drawing areas that don't need to be redrawn. Quartz Debug is a great tool for looking at that. The OpenGL tools can help you look at some of the things like frame rates of games and OpenGL applications.

We won't focus on that much more here. But we do provide a lot of tools, is the message here. You definitely want to explore the tools, read the documentation about them, take advantage of them to make your applications fast. The tools are all included, free, of course, with Mac OS X, with the Xcode tools. The graphical ones are in /Developer/Applications/Performance Tools; there are also some command-line ones. The tools do work on your Developer Transition Kit systems. So they work on Intel. They can look properly at Intel binaries as well as all the Mac OS X PowerPC binaries, and support all the initiatives: Cocoa, Carbon, Unix. Some of the tools look at Java, Shark especially, not so much the ones I'll be talking about here today; we look at the C-based languages with these. But in general, you don't need to recompile for gprof or anything like that. These tools just work. And the graphical tools are integrated with Xcode, as we'll see throughout this session in various ways.

So the tools can be grouped in various ways for both looking at memory use, execution time, resources, how many files are you using, et cetera. You can both monitor, to just get a general sense of do things look okay, is there anything unusual that I need to look at, or analyze in more detail. So three tools that we're going to look at some in this session are ObjectAlloc, MallocDebug, and Shark. These are all graphical applications. There are some command-line equivalents. The heap tool is similar to ObjectAlloc in some ways. leaks does some of the same things as MallocDebug. And, for example, the sample command is a quick way of sampling your CPU. The Sampler application has similar functionality to the Shark features I show here.

So some of the primary differences in looking at memory with these applications: ObjectAlloc is very good at looking at the dynamic memory of your application. How are you using specific types of objects in specific ways? How are you retaining and releasing those at various times? You can look at specific instances and the call trees of those. MallocDebug and Shark both can provide you a call tree of your entire application and where memory is being allocated. MallocDebug is unique among these three in that it has a leaks mode; it can tell you where the leaks in your memory are. But Shark, one of the really interesting features of it is it also provides a timeline over time: where, for example, were your malloc calls made, what was the call depth of those, which really lets you see calling patterns there. So let's dive into analyzing application memory use. So the general approach, again: know your target audience. What size hardware are they typically going to have? Are you aiming at real consumer systems with 256 or 512 megabytes of memory? If so, you definitely wanna test on that. But one thing to know is, once the resident memory use of your application gets to a certain level, if you do start paging, you're gonna flatten out on the resident size of your application at that point.

And this is the amount that's actually in RAM, but your app might be needing more than that. And you'd need to know how much more so that you know how far you need to reduce it. So you might also wanna test on large memory configuration systems so you can get a feel for the peak memory use. So definitely wanna try to prevent paging because if you have to go out to disk in trying to access memory, that's gonna really slow things down. Again, general guidelines, allocate lazily, avoid repetition of the same events over and over again.

So some techniques here on this. You can look at things with the top application; top, BigTop, and Activity Monitor are all ways of looking at what's going on in the system as a whole and in multiple different processes. vmmap, which we'll talk about a fair amount, is a way to analyze the memory regions of your application.

Then we'll look in more detail at some of the tools for tracing in depth. And another technique that I find useful sometimes is to actually be stopped in the debugger and looking at the data in something like top, and then stepping through my code, and I step over a routine and I see a big spike in the virtual size of my application. Whoa, what happened down inside that routine? Let's explore that in more depth. So looking at top, on the screen here, you see some of the things that I tend to take a look at a fair amount. This is output from running top -u, which sorts the output in order of most expensive CPU time. So I look at the percentage of CPU. I look at the resident private, the middle column here, and the virtual size of my application. So many other things like resident shared include the framework data that you're sharing with other processes.

So it's a little hard for you to get a feel for how much you alone are responsible for and how to reduce that. This can also show you paging information and information about the number of threads. If you're on a dual-processor system and expecting to be multi-threaded and you only see one thread, maybe there's a problem there, and that type of thing. BigTop is a great way to see short spikes in memory usage or just trends over time. Sometimes just reading the output of the top command in Terminal, your eyes can glaze over a little bit and miss details. But BigTop shows you trend graphs. So I mentioned vmmap. This is a command-line tool that can show you what the various different memory regions of your application are, why they were allocated. So some of them are binary image sections: text, data, link edit. This now in Tiger shows you the names of mapped files in your process. It shows you which parts are malloc blocks that you would analyze with other tools. Some applications do their own vm_allocates of regions, so this shows you that as well, and also things like stacks and certain other framework-allocated regions that we identify.

So typically you just run this on the command line with vmmap and the name of your application. If there's more than one process of that name, use a process ID. There are some new arguments for this in Tiger; specifically, the -resident flag shows both the resident size and the virtual size. And you can show that in either pages or kilobytes.

So one way to use this, for example, would be to run vmmap once, save the output into a file, do the operation in your application, run it again, save into a file, and look at the differences in FileMerge. So let's switch to demo two and see an example of this in action.

So I've written a little application here called FontTest. Now inside this application is some code that a third-party developer asked us about, said, why am I seeing so many mapped files in my application? What is that? Can you give me more information? So if we run BigTop, we can say, okay, let's select, you can either do the system as a whole, or you can select specific processes. So I'm going to watch the resident size, which in this application we currently see is static at about 1.3 megabytes. The virtual size is about 350 meg. Now remember, that's lots of data and frameworks, not all paged in. So I'm gonna watch both of those at the same time. You can barely see the resident size there. Now, if I go out to Terminal, I can do a vmmap -resident of FontTest and save that into before.txt.

Bring BigTop back up to the front, and FontTest. So now let's do a list fonts here and see what happens. Well, we kind of had a big spike in virtual memory here. Was this what we expected? We've gone from about 350 up to about 400 megabytes here. The resident size has also had some change in there. So now we can come in with vmmap again, run the same command again, and look at some of the differences in FileMerge here.

And I thought I'd changed the fonts before we got here. Wrap text, excellent. Okay, so here we see a lot of the output. We can see that, in fact, let me hide this for a second and look at it in Terminal. less before.txt. So we can see that, for example, page zero is page zero; read or write that, you're going to crash. The application typically gets loaded at 0x1000.

We see several mapped files and a shared memory region here. Then a lot more frameworks and dyld. See a bunch of data regions, VM allocations, et cetera. So if we look at the FileMerge output, we can see now that we're starting to get a lot more mapped files in here this time around. In fact, these look like they were mapped from Library/Caches, ATS. So it's font data being mapped in. Can we get a sense of what that's all about? So if we go down to the bottom also, we also see a couple other frameworks brought in. CarbonCore was brought in for this. It's kind of unusual, since my impression was this was supposed to be a Cocoa application, but they share things under the covers. So here we can see that the amount of mapped file information has grown: the virtual size went from about 20 megabytes up to 82 megabytes, and the resident size from about 8.2 meg to about 12.2. It's kind of unusual for just getting the names of the fonts.

So one quick thing we can do to analyze this further is look at this again with Shark, the new system trace facility in Shark 4.2. So we'll flip over to Shark 4.2. System trace of the whole system. Run FontTest again. Use Option-Esc to start our system trace, list the fonts, stop the system trace.

So the reason I'm doing this is we didn't see a tremendous change in the resident size, but we saw a lot of change in the virtual size. This might be able to show us some more about what was going on here. So we can see that a lot of the time was idle, but there was some user time, system time. BigTop itself was taking some time, but FontTest had a fair amount. We also see ATSServer has a fair amount. So if we look at system calls and go down here to FontTest, we can see that we've got a lot of vm_copies going on, and I can expand this out. This is a reverse call tree from main. We go through a number of layers of ATS down to a vm_copy here. Now, one thing: we can do a little bit of data mining using the Advanced Settings window and say, well, let's flatten the system libraries. Collapse that down again. So now we can see the list fonts. The FontTest controller's listFonts is calling ATSFontGetTableDirectory, which underneath there is what's causing the VM changes. We can also see that in a top-down fashion as follows.

So list fonts, ATS, get font directory. So if we look at this project in Xcode, we see that actually this code is using some low level ATS calls to iterate through the fonts. We get the name, which that's not bad. Get a CFString back from that. We're releasing that, that's good. We build up a font list string, but this code that we got from the third party developer is making these additional calls that, for example, this might be a library routine. This is getting much more detailed information about the fonts that causes us to map them in. We really don't need that data at this point. So in this case, for just getting a list of the fonts, I can eliminate all that code from this path and we would eliminate a lot of paging and additional VM regions here. So shutting that demo down. Okay. Back to slides.

And where did I leave the clicker? All right, I've mentioned ObjectAlloc a couple of times. I'm gonna go into some more detail on that. This application excels at helping you look at specific object types of your application. It's got a lot of built-in support where the frameworks, Core Foundation and Foundation, talk to the allocation and logging routines, so that we know what types of objects are being allocated. We can find out about the retain and release calls on them. But in addition, it now shows the malloc blocks, which might include, for example, C++ objects, where C++ new calls into malloc. So it's dynamic. It watches your application as it runs. So it's great for seeing dynamic memory use. And you can look at specific instances of objects in this, which can be really revealing in some cases. So let's go ahead and take a look at this as well.

Now for this, I'm going to use an application that I pulled off of the web. This is an open source application called AquaLess. In the previous demo, you watched me use the less Unix pager to page down through the information there. So I've already built this application. I'm going to go ahead and run it. It doesn't do anything just sitting here. (You had a request to increase the font size, so let me fix that in the window settings.)

So I'm just gonna do an ls -lR /System. Now it's got a command-line command called aless, for AquaLess. This takes the information in from the Unix pipe and redirects it out through distributed objects to the AquaLess window here. So pretty interesting. But let's take a look at whether there's anything interesting going on here. Sometimes it helps to just run your app under the performance tools to see if there's anything unexpected. So you can do that directly from Xcode by going into the Debug menu, Launch Using Performance Tool, and in this case we'll choose ObjectAlloc.

So back to ObjectAlloc. I can just go ahead and start my application. It gives me a few options here. I'm just going to keep backtraces at this point. So let's say okay. So now, I'm going to go back. Nice, pretty bar charts here. We can change the scale with the slider on the bottom here. On the left, let's see, I'm not sure. I don't think I have a font size thing on those. Hopefully that's readable. But on the left, we have various different types of memory allocations here. So we can see CFArrays of various types, CFDatas, CFDictionaries. Now, because Core Foundation objects are toll-free bridged with Foundation objects, we don't actually know whether these were NSStrings when we created them or CFStrings. But we can sort by that name, we can sort by the current number of objects of any type, we can sort by the peak that we've ever had, or we can sort by the total. I'll explain this in a little more detail as we go through here. Let's go ahead and start some output up.

So we see that we're getting a lot of data generated here. Things run a bit more slowly under ObjectAlloc. Now we're getting something interesting. Let's go ahead and pause the application. Sort again. Now we can see that I've got 120,000 immutable CFStrings created. Whereas I've got 5,000 CFStrings. This is the total that were ever created. The current number is much lower than that, at 4,000 immutable CFStrings. The peak was a bit higher, but the total, what is going on? And why the colors? Well, the colors are very informative in that, in fact, let me change the scale here because we appear to be way off the side.

If the bar is red, that means that the current number of this type of object that we still have live is less than 10% of the total that we've ever had of that type of object. So apparently we're creating lots more CFStrings than we're actually retaining for long periods of time. Another thing here is we can look at this count in bytes. We can see that we've created over 2 million bytes of CFStrings, but right now we're using about 111,000 bytes of immutable CFStrings. Let's go back to counts. Take note here that the second best or second highest is CFString stores. We've got more than 20 times that many objects there right now. We can look at specific instances of these and the allocation events of those with the backtraces.

Or we can go into the call stacks and say, well, I want to see not just the current objects, but apparently we've got a problem with the total objects. So let's select CFString immutable and descend the maximum path here. So this shows us where the biggest counts of these were allocated in my source code.

So we see the actual allocation happen down here. And this is coming from, apparently, a factory method of NSString, stringWithCharacters:length:, that's called from this routine here. And it has a very handy little link that can take me back to the source code in Xcode. So, commit character with style. What's going on here? I thought I was calling NSString. So I've looked at this code a bit. I can Command-double-click to go to the definition of this. And what's happening here is that we've got a macro definition. This is the original code.

We're taking one Unicode character in from the input from the pipe that was coming across the distributed objects. One Unicode character, and we're trying to append it to an accumulation buffer here. To do that, the appendString: routine takes an NSString argument. So we're creating one. We're doing this using a factory method that creates an autoreleased object, which is generating lots of autoreleased objects that aren't being freed until the next autorelease pool pop. So that could cause choppy behavior in your application when the objects get autoreleased.

So a somewhat better way to do this is my version number two of this code, which is identical code except I changed from using the factory method to using an explicit allocation and release. So that does a much better job of keeping the number of objects I have live at any one time constant.

I'm not accumulating them, waiting for them to be autoreleased. But at the same time, it's still generating a lot of objects. And maybe we don't need to do that at all for what we're doing here. So I have a version 2 of this code where the Unicode character comes in. And here I just assign that as byte 0 of a 2-byte Unicode character buffer that I created. And I created one CFString that says, I'm gonna use that external buffer, and I'm managing the memory of it.

I know it's a two-byte buffer. I don't need to explicitly allocate it or free it. I'm just gonna take that to create my CFString to pass to appendString:. So I don't create any objects dynamically on the fly. So if we go and build my version two of the code here. First, let's quit the application as it's currently running. We're paused in ObjectAlloc. We can go ahead and quit that and quit ObjectAlloc. Make sure that we're built again. Okay. So let's see if that made any difference on the application. So we can run it under ObjectAlloc again. Let's go ahead and do the same thing. We'll start our same output coming out.

And we can see that that is generating output. So remember it was CFString immutable that we had so many of last time around. Sorting by total at this point, we can see that now the second place, CFStringStore is looking expensive, but it's down in the 80,000 range at this point, as opposed to the 200,000 that we had before, the 20 times more. So apparently we completely eliminated the use of one type of object there. So we could walk down through all of these red ones and look for opportunities to reduce the dynamic memory use of this application, which may also then help it run faster and help the rest of the system run faster. Okay, so... Cleaning up after this one. And back to slides, please. All right, so we've looked at analyzing the memory use of the application.

Now let's talk about finding memory leaks. This can often be a favorite topic internally at Apple because we want to try to get all the leaks out of our frameworks so that you don't get hit by those leaks. For the standard techniques that we've got for finding memory leaks, there's the MallocDebug application. There's the leaks command-line tool, which, if you use it with an environment variable that causes your running application to generate stack logging, can show you a stack backtrace as to where allocations happened.

But I'm also gonna show a third technique here that's a little advanced with our current state of the tools. I can combine the use of the leaks command with the retain/release mechanisms of ObjectAlloc to find leaks that may be more associated with not releasing things enough times, or over-retaining them, as opposed to just allocating. Now, one thing to be careful of, and we sometimes run into this: occasionally there's a mysterious leak up in the highest levels of the application, so the programmer fixes it by putting an extra release in up there. Later on, a framework programmer comes along and says, oh, my framework is leaking. I'd better stop that. So the framework programmer puts a release in, tests it many different ways, and says, great, things are leaking less.

Until we get to the application where the smart application programmer had fixed it and now they crash because they're getting an over release of an object. So you want to make sure that you're testing things fairly thoroughly and actually looking for root cause. If you aren't the thing that's leaking it, maybe you shouldn't be trying to fix it. You should be trying to find what the actual cause of the leak is.

So MallocDebug. This application shows a full call tree from the top of your application down of where all your memory is being allocated. This is the memory that's currently in use. But it also has a mode to say, well, let's find the leaked memory and show how much is being leaked from where. There were some limitations with MallocDebug in the past involving applications that required two-level namespace. That's the default, two-level namespace. But if your application required it, it couldn't work with MallocDebug, because MallocDebug used to require a flat namespace.

Now, what this means is if you have two separate, say, libraries being linked into your application and they each declare the same symbol, that normally works fine. It didn't used to work when working with MallocDebug, and the app would crash. That should now work. Another enhancement is if you have an application that forks and execs other processes, that fork and exec now properly works: the child processes inherit the libMallocDebug setting, and you can then, in MallocDebug.app, attach to those child processes. So I tried this, for example, with Terminal. I ran Terminal under MallocDebug, and then I could attach to all the shells or things that were started from the shells.

So the second mechanism here, the leaks command-line tool with the MallocStackLogging environment variable. So what you wanna do here is, when you're launching your target application, set this environment variable that changes the behavior of the system malloc to record where all the allocations occur: what are the backtraces of all of those? I'll explain a little more on how to do this on the next slide. But one thing I wanna talk about is, in general, how does our leak detection work, with both leaks and MallocDebug? What we do is we walk through all of the malloc regions of your code, and we say, okay, these are all the malloc blocks. Then we walk through all of the memory that your application is accessing, so all of the allocated code, all of your stack information, and we look for pointers to the malloc blocks. Any block that's being pointed to from things that are actually accessible from the top levels of your application is considered to not be leaked. And any malloc blocks that are left over at the end are considered to be leaked. And we do detection of cycles of leaked objects and things like this.

But there can be some issues here, in that 4-byte values that look like pointers, well, maybe they're not actually pointers. Some things that can happen: you could allocate memory but not have initialized it yet, and maybe it had some leftover data in there that was a pointer to a block. If you set the MallocPreScribble and MallocScribble environment variables, those say: write into memory when we initialize it, or write into it when we free it, a certain known bit pattern that cannot be a pointer.

So this can make your leak information a little bit more consistent. And also try to stress test your application over a long period of time, which can give you larger amounts of leak information. So I say setting Unix environment variables. Now, I'm an old Unix geek. I've been doing that for 20 years, but for those of you who may be new to that, there are some nuances here.

So the first thing to note is that the default shell when you create a new user on Mac OS X is now the bash shell. And the syntax for setting environment variables in bash is a little different than the syntax I gave previously, which is for the shell I still use, which is tcsh. So what you do in bash is say export and then the name of the environment variable: MallocStackLogging equals the value. In this case, you want 1. Then you need to launch your application in the context of that environment variable setting.

So you need to launch it from that terminal session. So you go to the command line, you change directory to where the application lives, but it's not sufficient to just say Safari.app, because what we need to do is actually get down to the application binary itself, down inside the app wrapper. So we say ./Safari.app/Contents/MacOS/Safari. The other syntax that I'm using, to try to be consistent here and show that they're environment variables, is this setenv.

So I'm not actually gonna run leaks in demo here, but this is what you might see if you ran it. This is the example application I'm going to use. I'll explain that in a second. So this is a typical output. It lists some information about how many malloc nodes you have and how many bytes they have, and then how much is being leaked. Now, in this case, it's not a lot, but over the course of time, it could add up. So it then shows the pointer where that leaked block is, what its size is, and makes an attempt to tell you what type of block it is, and then shows you the initial contents of it. Now, because we had the MallocStackLogging environment variable set when we launched this application, now I see the call trace. If you don't get the environment variables set properly, you won't see this.

So with the call stack here, what I usually do is walk from the bottom up. I look at CFAllocatorAllocate and say, well, I didn't call that. CFRuntimeCreateInstance, well, I didn't call that either. CFURL something... I just go backwards up to where I start to get to code that I'm familiar with.

And this can take some exploring around. It can be a little bit easier with MallocDebug, as we'll see in a second. But in this case, that "add text to the text view" frame sounds different from everything below it, which is Core Foundation code copying a resource URL. And because it says it's an instance of NSURL that's been leaked, that's a decent guess as to where the leak might be.

So this third technique I'm going to talk about came specifically from trying to work out the leaks in this MLTE Showcase example. Now, MLTE Showcase is one of the standard Carbon developer examples on your system; it's the multilingual text editor of the Carbon world. I ran into a leak in this where, as you'll see, the object was actually allocated when the application said "load in my nib file." About 18 levels deep, it allocates an object, and it's being leaked. Well, I thought I had no way of even getting to that object. It turns out this was a retain/release mismatch.

So I was able to launch the target application under ObjectAlloc and turn on reference counting, so that it would record the retains and releases in addition to the allocates and frees, and then run the target application. In a terminal window I ran leaks, and then I could get the pointer of the leaked object, find it in ObjectAlloc, and examine the retains and releases. So it's kind of an advanced technique that we could probably make a bit easier in the future, but it could be useful to you now. So let's see a demo at this point. We'll shift over to the MLTE Showcase application. Now, I've actually added a little bit of code in here to exercise my application multiple times. I'm going to run this under MallocDebug. Let's go ahead and do a build. Yep, it's built. Run this under MallocDebug.

We can open this window up a bit and say launch. So you see the window flash five times. This is one of our test team's favorite techniques: using AppleScript to control our applications to do the same thing over and over again all night long. Then they say, "I hammered your application all night like this, and this problem resulted." So this is a very quick way of asking: what would happen if the user ran it multiple times, closing and opening windows? (Oops, I seem to have quit my app.) In fact, essentially that's what was going on here. So, as I've mentioned, this shows by default a top-down view of how much memory got allocated where in the call tree of your application.

And I can flip that around and say I want to see the bottom-up view. But instead what I want to do is come over here and say, actually, I want to see the leaks. Are there any leaks in this application? So it goes off and does a static analysis of the state of memory right now. We can walk down through it and see that it's leaking 424 bytes. I could walk down through quite a bit, but what I like to do is invert the call tree. For those of you familiar with Shark, this is equivalent to the bottom-up or heavy view. And I can see that here's where the actual allocation occurred, and here we can see the full name of the function that I've selected. Walking backward, we see a CFURL allocation: Core Foundation, Core Foundation, Core Foundation. Okay, so here we see CFBundleCopyResourceURL. That's the same as we saw in the leaks output. And that is being called from "add image." There's a little icon here next to it that shows us we have source code available for that, so I can double-click on it and bring up the source code in Xcode. Now, that appears to be a URL that we're leaking. So it's just telling me that somewhere in this function we're calling CFBundleCopyResourceURL. Let me search for "CopyResource." And you can probably already see it.

So, imageURL here is being returned by CFBundleCopyResourceURL. So there's a copy going on here. If I select this... hmm. So we're copying, but we're not releasing. This one actually appears to be fairly simple: I can just add in a CFRelease after our last use of this, and that will probably fix the issue. Before I rebuild, is there anything else that's going to be easy to fix here? Walking back this other path, we see that CGColorCreate is being called...

from down inside HITextView's initialization, and on and on. Now, this is the one that's actually coming out of the IBCarbonRuntime after we load the nib for a new window. Ooh, that sounds hard; I'll come back to that. There's one other one here, which is right in my code: CMLTEViewData's operator new. So apparently a C++ object is being allocated. I want to see the actual call site of that.

So operator new is being called from this "set up the text view" function. The inverted view didn't tell me exactly where it did the new, so this took a little looking around. Okay, so again, this example is copied straight out of the Carbon developer examples. When we set up a text view, we attach some instance data to the side of that text view. And there's a big comment here: don't forget to dispose of this when the HITextView destructs. Well, if you build this example, you'll actually find that the destructor never gets called, because they dispose of the window without going to the trouble of pulling the object back out and deleting the data that was added to it. So I added a "tear down the text view" method here that gets the text view from the window, retrieves the instance data, and does a delete on it.

And I had actually commented out my call to that previously, so I'll uncomment it and save. So now I think I've fixed two leaks out of three. Let's go ahead and quit out of the tools. Sorry, build. Oops, that ran there. Build. Okay, let's run under MallocDebug again and see how we did.

and check for leaks. So before we had about 428 bytes; now we've got 48. That's pretty good. So specifically, ooh, it's that CGColorCreate. All I'm doing is loading the nib file in. So let's try a different approach. My allocation appears to be okay; I don't think I'm creating that color myself.

So let's run it under ObjectAlloc. This time through, we want to record the reference counting as well, so the retain and release calls, and I'll also record the general allocations by library. It runs a bit more slowly under ObjectAlloc. And here we go. So now if I come into a terminal window and run leaks on MLTE, it should find the right application. So there are a couple of leaks there. Let's grab one of these pointers and go.

Now, one of the things that ObjectAlloc does is keep these as events all the way through. So up here at the very top, let me go ahead and pause the application. There's a scroller here at the top that lets you scroll back and forth through the event history of the application, so you can actually move these bars back and forth. I want to find the last allocation event for this pointer.

Or the last events, period. So I can do a Previous, and we see that it was way back here in time. I bring up the Inspector, and I can see what the call tree of that is. It's a CGColor that's being released; it's a CFRelease event at this point in the code.

So if I sort by that, now we can see, having checked "show me the allocations by library," that there's a lot of ATS there, for example. This should be a current object. So if I come down here to CGColor, somewhere in there is going to be the event we want. (We'll try to integrate this functionality better in the future.)

But eventually you'd see that there was a retain/release problem happening here: we copy the background color into a previous-color variable, we create a different color with a modified alpha from that, we set the background color, which releases the old one and retains the new one, and then we release the new color.

But we did a copy to get the old color, and we never actually released the old color after we copied it. So this is pretty subtle. This is in my application code, where I didn't allocate this object, but I did copy it. So it required a somewhat more advanced technique to come in and find the issue. So I can release the previous color, and quit the app, which is paused in ObjectAlloc. Now if I build again and run it, say, under MallocDebug, hopefully we'll see no leaks this time around. No leaks.

So, memory usage problems. There are a variety of other things here that aren't so much performance issues, but if you have buffer overruns or underruns, or access uninitialized or freed memory, those can be nasty problems to find because they're very often intermittent: it depends on what the contents of memory are, or what you overrun into. Those can be really hard problems to fix, so what we really want to do is make them happen reliably so you can debug them. Some techniques that Mac OS X provides to help you with this are the Guard Malloc debugging library, some environment variables on the malloc system, and watchpoints in GDB that might help you find memory stompers. Watchpoints are a relatively new feature for us; I'm not going to go into them in detail here, but you can read the Xcode documentation about them.

So the way Guard Malloc works is that each and every allocation your application makes goes onto a separate virtual memory page. Each page is 4K, so a 16-byte allocation gets 4K allocated. It leaves the page before and the page after unallocated. Then, when the block is freed, we deallocate that page, and if your code tries to access a deallocated page, you get an immediate crash. Unfortunately, this causes your application to run a lot slower, because it's using a lot more pages of virtual memory, but hopefully it'll save you the human time of trying to figure out a way to find this problem yourself.

So by default, it's set up to catch buffer overruns: it aligns allocations with the end of the page, because buffer overruns are more common than underruns. So if you walk off the end of that VM page, you hit the unallocated page after it, and you get a crash.

If you need to catch a buffer underrun instead, you can try setting the MALLOC_PROTECT_BEFORE environment variable. Some of our system frameworks and some applications don't really like working with memory that we've pre-initialized to odd values, so we don't do this next one all the time, but you can set an environment variable that fills newly allocated memory with values that would cause a crash if they were used inappropriately. And if you call free twice on the same block, or if you try to free a block that was never allocated, we've changed Guard Malloc so that it immediately crashes into the debugger.

So because we're trying to help you make it crash every time, you'll normally want to use this from a debugger. The easiest way is from within Xcode: in the Debug menu, choose Enable Guard Malloc, and then just debug as normal. If you're using GDB at the command line, you can set the environment variable there with the GDB syntax "set env variable value". This says to insert the Guard Malloc library into the application: that's /usr/lib/libgmalloc.dylib. As with MallocDebug, you no longer need flat namespace, so if you were using this before and setting that variable, you no longer need to do that. The libgmalloc man page has much more information about this.

Now, I've mentioned environment variables that control the malloc system. These are all documented in the malloc man page. Previously I mentioned MallocPreScribble, which says: when I allocate a block, write a certain sort of garbage value into it. MallocScribble says: when I free a block, write into it then. If you are somehow trashing the memory pointers, the metadata of the malloc heap itself, there are variables that can help catch that. And for the large blocks, there's a way to put guard pages around them. But again, that's only the large blocks, so this won't catch nearly as many problems; it runs at the full speed of your application, though, so it could be helpful in some circumstances. So anyway, I've given you kind of a whirlwind tour of a variety of aspects here and reviewed the performance analysis process.

We encourage you to be disciplined about looking at the performance of your application. There are a lot of tools, the ones you've seen plus others, so read the documentation. They can really help you analyze your memory use, find memory leaks, and debug hard-to-catch memory problems. With those, we help you continue to turn out great applications for both PowerPC and Intel.

So for more information, the standard page here; you've probably seen it a lot of times. There's a feedback forum for developer tools immediately after this at 5:00. For contact information, you can contact Xavier Legros, our technology manager. You can also send feedback to the performance tools feedback list at perftools-feedback@group.apple.com, or the Xcode feedback list, or the xcode-users list. So there are a lot of helpful resources out there.