
WWDC05 • Session 409

Performance Analysis of Your Memory Code

Development Tools • 59:09

Learn to debug and locate memory leaks in your application. You will learn valuable tips and tricks for identifying, analyzing, and squashing these common bugs. We'll also provide a valuable walk-through of a few sample applications using Shark, setenv commands, and more.

Speaker: Dave Payne

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it may contain transcription errors.

Good afternoon. So welcome to performance analysis of memory code. Many of you might have been in the Shark session previously. Shark, as many of you have worked with it, it's a great application for performance analysis. It truly excels at a lot of event-based things like time profiling or when do certain function calls get hit, malloc tracing itself, when does malloc get called, when do VM faults occur, and things like that. But there's a number of other additional tools on the system that we provide that give additional insights into memory issues. And by using these various tools and combinations, you can really find out a lot of additional things about your apps and make them shine. So in this session, we'll do a quick review of the process of performance analysis. We'll look at how to analyze your application's memory use, how to find memory leaks, and how to debug certain nasty memory usage problems. So within Apple, every time we go to release an OS, we have to really hammer home the performance points and really take a systematic approach.

It works best to do this throughout your development cycle. First, identify common use cases. What are users typically going to be doing with your applications? What are the most important things to make sure are fast, make sure are small? Given those, what are your goals for those? How responsive do you need it to be? What should the throughput for your application be, which is especially important for, say, servers and things like that? And scalability. In fact, scalability is one of the most important things for finding performance issues: if you've got N-squared algorithms, for example, it helps to have a large N in testing to really see the effects there.

So once you've got your goals, then establish specific precise benchmarks for what you'd like to hit with those so that then you can add in measurement code in the right areas of your application and measure the time or measure the memory use on a systematic basis. We really recommend that you follow through this process throughout your development from the very beginning. Don't try to tack performance on as a thing in the last week before you ship. You might have painted yourself into a corner by then. So a lot of times you can get relatively inconsistent results. So we really recommend trying to isolate your system. Take it off the network if necessary, if you're not dependent upon it. Make sure that Spotlight indexing isn't going on in the background while you're trying to measure your time. Typically measure several runs, drop out the high and low, average out. And again, don't allow regressions throughout your development period. But if you have any, then go in and focus on those and hit those hot spots using the various tools. So just an example of why this kind of thing is important, think about a Spotlight importer. For each user ID on the system, for example, Dave P on my system, there's a single mdimport process that runs, which runs all the importer plugins for all the different document types. So if your importer is slow, or if it crashes or uses a lot of resources, that matters: mdimport runs in the background and is supposed to gather the data quickly so that apps can run queries quickly. So if you crash, then mdimport has to be restarted.

So anyway, make sure you're fast, robust. Don't load in additional parts of your application that you might not need for just importing. So many frameworks you might not need. A lot of fonts if you're not gonna index that data. Don't read the file multiple times. Don't start up multiple threads if you're not going to be using them for importing. And then come up with a systematic way to test your code. So in this case, you could use mdimport -p on a large directory that you consistently use with a complex set of representative data files.

So some things to look at in the areas of benchmarking: time, memory size, I/O, graphics. Let's look at them in a little more detail. For time, you probably have some ideas of what are the most important areas that you want to run fast in your application, but also look at things like how fast your application launches. Can you defer things from the time you launch to later on in the application? When your application isn't doing anything, hopefully you're not polling; hopefully your CPU usage is zero. And responsiveness. When the user does live resize of your windows, is that snappy and efficient? Does it feel fast?

We're gonna focus most of our attention in this session on the memory size issues, both static, what is the memory use right now, and dynamic. If I do an operation, does it pop up and then back down in that operation? Because that can be expensive. And are you leaking memory over time? It might look good now, but if you have it running all night and people hammering on it, does it get much bigger overnight? I/O. So, you know, we tend to think of a file on our system, but remember it might be out on the network these days. Look at the opens and closes, reads, writes, stats, getattrlist calls, but specifically look at, are you causing lots of paging? Are you reading the same file multiple times? How many files do you have open at one time?

And for graphics, a lot of these problems can be pretty subtle. The systems are so fast, it can be hard to tell that you're drawing the same graphics onto the screen more than one time, or drawing areas that don't need to be redrawn. Quartz Debug is a great tool for looking at that. The OpenGL tools can help you look at some of the things like frame rates of games and OpenGL applications.

We won't focus on that much more here. But we do provide a lot of tools, is the message here. You definitely want to explore the tools, read the documentation about them, take advantage of them to make your applications fast. The tools are all included, free, of course, with Mac OS X, with the Xcode tools. The graphical ones are in /Developer/Applications/Performance Tools; there are also some command-line ones. The tools do work on your Developer Transition Kit systems. So they work on Intel. They can look properly at Intel binaries as well as all the Mac OS X PowerPC binaries, and support all the initiatives: Cocoa, Carbon, Unix. Some of the tools look at Java, Shark especially, not so much the ones I'll be talking about here today; we look at the C-based languages with these. But in general, you don't need to recompile for gprof or anything like that. These tools just work. And the graphical tools are integrated with Xcode, as we'll see throughout this session in various ways.

So the tools can be grouped in various ways for both looking at memory use, execution time, resources, how many files are you using, et cetera. You can both monitor, to just get a general sense of do things look okay, is there anything unusual that I need to look at, or analyze in more detail. So three tools that we're going to look at some in this session are ObjectAlloc, MallocDebug, and Shark. These are all graphical applications. There are some command-line equivalents. The heap tool is similar to ObjectAlloc in some ways. leaks does some of the same things as MallocDebug. And, for example, the sample command is a quick way of sampling your CPU. The Sampler application has similar functionality to the Shark features I show here.

So some of the primary differences in looking at memory with these applications: ObjectAlloc is very good at looking at the dynamic memory of your application. How are you using specific types of objects in specific ways? How are you retaining and releasing those at various times? You can look at specific instances and the call trees of those. MallocDebug and Shark both can provide you a call tree of your entire application and where memory is being allocated. MallocDebug is unique among these three in that it has a leaks mode; it can tell you where the leaks in your memory are. But Shark, one of the really interesting features of it is it also provides a timeline over time: where, for example, were your malloc calls made, what was the call depth of those, which really lets you see calling patterns there. So let's dive into analyzing application memory use. So the general approach, again: know your target audience. What size hardware are they typically going to have? Are you aiming at real consumer systems with 256 or 512 megabytes of memory? If so, you definitely wanna test on that. But one thing to know is, once the resident memory use of your application gets to a certain level, if you do start paging, you're gonna flatten out on the resident size of your application at that point.

And this is the amount that's actually in RAM, but your app might be needing more than that. And you'd need to know how much more so that you know how far you need to reduce it. So you might also wanna test on large memory configuration systems so you can get a feel for the peak memory use. So definitely wanna try to prevent paging because if you have to go out to disk in trying to access memory, that's gonna really slow things down. Again, general guidelines, allocate lazily, avoid repetition of the same events over and over again.

So some techniques here on this. You can look at things with the top application; top, BigTop, and Activity Monitor are all ways of looking at what's going on in the system as a whole and in multiple different processes. vmmap, which we'll talk about a fair amount, is a way to analyze the memory regions of your application.

Then we'll look in more detail at some of the tools for tracing in depth. And another technique that I find useful sometimes is to actually be stopped in the debugger and looking at the data in something like top, and then stepping through my code, and I step over a routine and I see a big spike in the virtual size of my application. Whoa, what happened down inside that routine? Let's explore that in more depth. So looking at top, on the screen here, you see some of the things that I tend to take a look at a fair amount. This is output from running top -u, which sorts the output in order of most expensive CPU time. So I look at the percentage of CPU. I look at the resident private, the middle column here, and the virtual size of my application. So many other things like resident shared include the framework data that you're sharing with other processes.

So it's a little hard for you to get a feel for how much you alone are responsible for and how to reduce that. This can also show you paging information and information about the number of threads. If you're on a dual-processor system and expecting to be multi-threaded and you only see one thread, maybe there's a problem there, and that type of thing. BigTop is a great way to see short spikes in memory usage or just trends over time. Sometimes just reading the output of the top command in Terminal, your eyes can glaze over a little bit and miss details. But BigTop shows you trend graphs. So I mentioned vmmap. This is a command-line tool that can show you what the various different memory regions of your application are, why they were allocated. So some of them are binary image sections: text, data, link edit. This now in Tiger shows you the names of mapped files in your process. It shows you which parts are malloc blocks that you would analyze with other tools. Some applications do their own vm_allocates of regions, so this shows you that as well, and also things like stacks and certain other framework-allocated regions that we identify.

So typically you just run this on the command line with vmmap and the name of your application. If there's more than one process of that name, use a process ID. There are some new arguments for this in Tiger; specifically, the -resident flag shows both the resident size and the virtual size. And you can show that in either pages or kilobytes.

So one way to use this, for example, would be to run vmmap once, save the output into a file, do the operation in your application, run it again, save into a file, and look at the differences in FileMerge. So let's switch to demo two and see an example of this in action.

So I've written a little application here called FontTest. Now inside this application is some code that a third-party developer asked us about, said, why am I seeing so many mapped files in my application? What is that? Can you give me more information? So if we run BigTop, we can say, okay, let's select, you can either do the system as a whole, or you can select specific processes. So I'm going to watch the resident size, which in this application we currently see is static at about 1.3 megabytes. The virtual size is about 350 meg. Now remember, that's lots of data and frameworks, not all paged in. So I'm gonna watch both of those at the same time. You can barely see the resident size there. Now, if I go out to Terminal, I can do a vmmap -resident of FontTest and save that into before.txt.

Bring BigTop back up to the front, and FontTest. So now let's do a list fonts here and see what happens. Well, we kind of had a big spike in virtual memory here. Was this what we expected? We've gone from about 350 up to about 400 megabytes here. The resident size has also had some change in there. So now we can come in with vmmap again, run the same command again, and look at some of the differences in FileMerge here.

And I thought I'd changed the fonts before we got here. Wrap text, excellent. Okay, so here we see a lot of the output. We can see that, in fact, let me hide this for a second and look at it in Terminal. less before.txt. So we can see that, for example, page zero is page zero; read or write that, you're going to crash. The application typically gets loaded at 0x1000.

We see several mapped files and a shared memory region here. Then a lot more frameworks and dyld. See a bunch of data regions, VM allocations, et cetera. So if we look at the FileMerge output, we can see now that we're starting to get a lot more mapped files in here this time around. In fact, these look like they were mapped from Library/Caches, ATS. So it's font data being mapped in. Can we get a sense of what that's all about? So if we go down to the bottom also, we also see a couple other frameworks brought in. CarbonCore was brought in for this. It's kind of unusual, since my impression was this was supposed to be a Cocoa application, but they share things under the covers. So here we can see that the amount of mapped file information has grown: the virtual size went from about 20 megabytes up to 82 megabytes, and the resident size from about 8.2 meg to about 12.2. It's kind of unusual for just getting the names of the fonts.

So one quick thing we can do to analyze this further is look at this again with Shark, the new system trace facility in Shark 4.2. So we'll flip over to Shark 4.2. System trace of the whole system. Run FontTest again. Use Option-Esc to start our system trace, list the fonts, stop the system trace.

So the reason I'm doing this is we didn't see a tremendous change in the resident size, but we saw a lot of change in the virtual size. This might be able to show us some more about what was going on here. So we can see that a lot of the time was idle, but there was some user time, system time. BigTop itself was taking some time, but FontTest had a fair amount. We also see ATSServer has a fair amount. So if we look at system calls and go down here to FontTest, we can see that we've got a lot of vm_copies going on, and I can expand this out. This is a reverse call tree from main. We go through a number of layers of ATS down to a vm_copy here. Now, one thing: we can do a little bit of data mining using the Advanced Settings window and say, well, let's flatten the system libraries. Collapse that down again. So now we can see the list fonts. The FontTest controller's listFonts is calling ATSFontGetTableDirectory, which underneath there is what's causing the VM changes. We can also see that in a top-down fashion as follows.

So list fonts, ATS, get font directory. So if we look at this project in Xcode, we see that actually this code is using some low level ATS calls to iterate through the fonts. We get the name, which that's not bad. Get a CFString back from that. We're releasing that, that's good. We build up a font list string, but this code that we got from the third party developer is making these additional calls that, for example, this might be a library routine. This is getting much more detailed information about the fonts that causes us to map them in. We really don't need that data at this point. So in this case, for just getting a list of the fonts, I can eliminate all that code from this path and we would eliminate a lot of paging and additional VM regions here. So shutting that demo down. Okay. Back to slides.

And where did I leave the clicker? All right, I've mentioned ObjectAlloc a couple of times. I'm gonna go into some more detail on that. This application excels at helping you look at specific object types of your application. It's got a lot of built-in support where the frameworks, Core Foundation and Foundation, talk to the allocation and logging routines, so that we know what types of objects are being allocated. We can find out about the retain and release calls on them. But in addition, it now shows the malloc blocks, which might include, for example, C++ objects, where C++ new calls into malloc. So it's dynamic. It watches your application as it runs. So it's great for seeing dynamic memory use. And you can look at specific instances of objects in this, which can be really revealing in some cases. So let's go ahead and take a look at this as well.

Now for this, I'm going to use an application that I pulled off of the web. This is an open source application called AquaLess. In the previous demo, you watched me use the less Unix pager to page down through the information there. So I've already built this application. I'm going to go ahead and run it. It doesn't do anything just sitting here. (You had a request to increase the font size, so let me fix that in the window settings.)

So I'm just gonna do an ls -lR /System. Now it's got a command-line command called aless, for AquaLess. This takes the information in from the Unix pipe and redirects it out through distributed objects to the AquaLess window here. So pretty interesting. But let's take a look at whether there's anything interesting going on here. Sometimes it helps to just run your app under the performance tools to see if there's anything unexpected. So you can do that directly from Xcode by going into the Debug menu, Launch Using Performance Tool, and in this case we'll choose ObjectAlloc.

So back to ObjectAlloc. I can just go ahead and start my application. It gives me a few options here. I'm just going to keep backtraces at this point. So let's say okay. So now, I'm going to go back. Nice, pretty bar charts here. We can change the scale with the slider on the bottom here. On the left, let's see, I'm not sure. I don't think I have a font size thing on those. Hopefully that's readable. But on the left, we have various different types of memory allocations here. So we can see CFArrays of various types, CFDatas, CFDictionaries. Now, because Core Foundation objects are toll-free bridged with Foundation objects, we don't actually know whether these were NSStrings when we created them or CFStrings. But we can sort by that name, we can sort by the current number of objects of any type, we can sort by the peak that we've ever had, or we can sort by the total. I'll explain this in a little more detail as we go through here. Let's go ahead and start some output up.

So we see that we're getting a lot of data generated here. Things run a bit more slowly under ObjectAlloc. Now we're getting something interesting. Let's go ahead and pause the application. Sort again. Now we can see that I've got 120,000 immutable CFStrings created. Whereas I've got 5,000 CFStrings. This is the total that were ever created. The current number is much lower than that, at 4,000 immutable CFStrings. The peak was a bit higher, but the total, what is going on? And why the colors? Well, the colors are very informative in that, in fact, let me change the scale here because we appear to be way off the side.

If the bar is red, that means that the current number of this type of object that we still have live is less than 10% of the total that we've ever had of that type of object. So apparently we're creating lots more CFStrings than we're actually retaining for long periods of time. Another thing here is we can look at this count in bytes. We can see that we've created over 2 million bytes of CFStrings, but right now we're using about 111,000 bytes of immutable CFStrings. Let's go back to counts. Take note here that the second best or second highest is CFString stores. We've got more than 20 times that many objects there right now. We can look at specific instances of these and the allocation events of those with the backtraces.

Or we can go into the call stacks and say, well, I want to see not just the current objects, but apparently we've got a problem with the total objects. So let's select CFString immutable and descend the maximum path here. So this shows us where the biggest counts of these were allocated in my source code.

So we see the actual allocation happen down here. And this is coming from, apparently, a factory method of NSString, stringWithCharacters:length:, that's called from this routine here. And it has a very handy little link that can take me back to the source code in Xcode. So, commit character with style. What's going on here? I thought I was calling NSString. So I've looked at this code a bit. I can Command-double-click to go to the definition of this. And what's happening here is that we've got a macro definition. This is the original code.

We're taking one Unicode character in from the input from the pipe that was coming across the distributed objects. One Unicode character, and we're trying to append it to an accumulation buffer here. To do that, the appendString: routine takes an NSString argument. So we're creating one. We're doing this using a factory method that creates an autoreleased object, which is generating lots of autoreleased objects that aren't being freed until the next autorelease pool pop. So that could cause choppy behavior in your application when the objects get autoreleased.

So a somewhat better way to do this is my version number two of this code, which is identical code except I changed from using the factory method to using an explicit allocation and release. So that does a much better job of keeping the number of objects I have live at any one time constant.

I'm not accumulating them, waiting for them to be autoreleased. But at the same time, it's still generating a lot of objects. And maybe we don't need to do that at all for what we're doing here. So I have a version 2 of this code where the Unicode character comes in. And here I just assign that as byte 0 of a 2-byte Unicode character buffer that I created. And I created one CFString that says, I'm gonna use that external buffer, and I'm managing the memory of it.

I know it's a two-byte buffer. I don't need to explicitly allocate it or free it. I'm just gonna take that to create my CFString to pass to appendString:. So I don't create any objects dynamically on the fly. So if we go and build my version two of the code here. First, let's quit the application as it's currently running. We're paused in ObjectAlloc. We can go ahead and quit that and quit ObjectAlloc. Make sure that we're built again. Okay. So let's see if that made any difference on the application. So we can run it under ObjectAlloc again. Let's go ahead and do the same thing. We'll start our same output coming out.

And we can see that that is generating output. So remember it was CFString immutable that we had so many of last time around. Sorting by total at this point, we can see that now the second place, CFStringStore is looking expensive, but it's down in the 80,000 range at this point, as opposed to the 200,000 that we had before, the 20 times more. So apparently we completely eliminated the use of one type of object there. So we could walk down through all of these red ones and look for opportunities to reduce the dynamic memory use of this application, which may also then help it run faster and help the rest of the system run faster. Okay, so... Cleaning up after this one. And back to slides, please. All right, so we've looked at analyzing the memory use of the application.

Now let's talk about finding memory leaks. This can often be a favorite topic internally at Apple because we want to try to get all the leaks out of our frameworks so that you don't get hit by those leaks. For the standard techniques that we've got for finding memory leaks, there's the MallocDebug application. There's the leaks command-line tool, which, if you use it with an environment variable that causes your running application to generate stack logging, can show you a stack backtrace as to where allocations happened.

But I'm also gonna show a third technique here that's a little advanced with our current state of the tools. I can combine the use of the leaks command with the retain/release mechanisms of ObjectAlloc to find leaks that may be more associated with not releasing things enough times, or over-retaining them, as opposed to just allocating. Now, one thing to be careful of, and we sometimes run into this: occasionally there's a mysterious leak up in the highest levels of the application, so the programmer fixes it by putting an extra release in up there. Later on, a framework programmer comes along and says, oh, my framework is leaking. I'd better stop that. So the framework programmer puts a release in, tests it many different ways, and says, great, things are leaking less.

Until we get to the application where the smart application programmer had fixed it and now they crash because they're getting an over release of an object. So you want to make sure that you're testing things fairly thoroughly and actually looking for root cause. If you aren't the thing that's leaking it, maybe you shouldn't be trying to fix it. You should be trying to find what the actual cause of the leak is.

So MallocDebug. This application shows a full call tree from the top of your application down of where all your memory is being allocated. This is the memory that's currently in use. But it also has a mode to say, well, let's find the leaked memory and show how much is being leaked from where. There were some limitations with MallocDebug in the past involving applications that required two-level namespace. That's the default, two-level namespace. But if your application required it, it couldn't work with MallocDebug, because MallocDebug used to require a flat namespace.

Now, what this means is if you have two separate, say, libraries being linked into your application and they each declare the same symbol, that normally works fine. It didn't used to work when working with MallocDebug, and the app would crash. That should now work. Another enhancement is if you have an application that forks and execs other processes, that fork and exec now properly works: the child processes inherit the libMallocDebug setting, and you can then, in MallocDebug.app, attach to those child processes. So I tried this, for example, with Terminal. I ran Terminal under MallocDebug, and then I could attach to all the shells or things that were started from the shells.

So the second mechanism here, the leaks command-line tool with the MallocStackLogging environment variable. So what you wanna do here is, when you're launching your target application, set this environment variable that changes the behavior of the system malloc to record where all the allocations occur: what are the backtraces of all of those? I'll explain a little more on how to do this on the next slide. But one thing I wanna talk about is, in general, how does our leak detection work, with both leaks and MallocDebug? What we do is we walk through all of the malloc regions of your code, and we say, okay, these are all the malloc blocks. Then we walk through all of the memory that your application is accessing, so all of the allocated code, all of your stack information, and we look for pointers to the malloc blocks. Any block that's being pointed to from things that are actually accessible from the top levels of your application is considered to not be leaked. And any malloc blocks that are left over at the end are considered to be leaked. And we do detection of cycles of leaked objects and things like this.

But there can be some issues here, in that 4-byte values that look like pointers, well, maybe they're not actually pointers. Some things that can happen: you could allocate memory but not have initialized it yet, and maybe it had some leftover data in there that was a pointer to a block. If you set the MallocPreScribble and MallocScribble environment variables, those say: write into memory when we initialize it, or write into it when we free it, a certain known bit pattern that cannot be a pointer.

So this can make your leak information a little bit more consistent. And also try to stress test your application over a long period of time, which can give you larger amounts of leak information. So I say setting Unix environment variables. Now, I'm an old Unix geek. I've been doing that for 20 years, but for those of you who may be new to that, there are some nuances here.

So the first thing to note is that the default shell when you create a new user on Mac OS X is now the bash shell. And the syntax for setting environment variables in bash is a little different than the syntax I gave previously, which is for the shell I still use, which is tcsh. So what you do in bash is say export and then the name of the environment variable: MallocStackLogging equals the value. In this case, you want 1. Then you need to launch your application in the context of that environment variable setting.

So you need to launch it from that terminal session. So you go to the command line, you change directory to where the application lives, but it's not sufficient to just say Safari.app, because what we need to do is actually get down to the application binary itself, down inside the app wrapper. So we say ./Safari.app/Contents/MacOS/Safari. The other syntax that I'm using, to try to be consistent here and show that they're environment variables, is this setenv.

So I'm not actually gonna run leaks in demo here, but this is what you might see if you ran it. This is the example application I'm going to use. I'll explain that in a second. So this is a typical output. It lists some information about how many malloc nodes you have and how many bytes they have, and then how much is being leaked. Now, in this case, it's not a lot, but over the course of time, it could add up. So it then shows the pointer where that leaked block is, what its size is, and makes an attempt to tell you what type of block it is, and then shows you the initial contents of it. Now, because we had the MallocStackLogging environment variable set when we launched this application, now I see the call trace. If you don't get the environment variables set properly, you won't see this.

So with the call stack here, what I usually do is walk from the bottom up. I look at CFAllocatorAllocate and say, well, I didn't call that. CFRuntimeCreateInstance, well, I didn't call that either. CFURL something... I just go backwards up to where I start to get to code that I'm familiar with.

And this can take some exploring around. It can be a little bit easier with MallocDebug, as we'll see in a second. But in this case, that "add text to the text view" frame sounds different from everything below it, which is Core Foundation code copying a resource URL. And because it says it's an instance of NSURL that's been leaked, that's a decent guess as to where the leak might be.

So this third technique I'm going to talk about came specifically from trying to work out the leaks in this MLTE Showcase example. Now, MLTE Showcase is one of the standard Carbon developer examples on your system; it's the multilingual text editor of the Carbon world. I ran into a leak in this where, as you'll see, the object was actually allocated when the application said "load in my nib file." About 18 levels deep, it allocates an object, and it's being leaked. Well, I thought I had no way of even getting to that object. It turns out this was a retain/release mismatch.

So I was able to launch the target application under ObjectAlloc and turn on reference counting, so that it would record the retains and releases in addition to the allocates and frees, and then run the target application. In a terminal window I ran leaks, and then I could get the pointer of the leaked object, find it in ObjectAlloc, and examine the retains and releases. So it's kind of an advanced technique that we could probably make a bit easier in the future, but it could be useful to you now. So let's see a demo at this point. We'll shift over to the MLTE Showcase application. Now, I've actually added a little bit of code in here to exercise my application multiple times. I'm going to run this under MallocDebug. Let's go ahead and do a build. Yep, it's built. Run this under MallocDebug.

We can open this window up a bit and say launch. So you see the window flash five times. This is one of our test team's favorite techniques: using AppleScript to control our applications to do the same thing over and over again all night long. Then they say, "I hammered your application all night like this, and this problem resulted." So this is a very quick way of asking: what would happen if the user ran it multiple times, closing and opening windows? (Oops, I seem to have quit my app.) In fact, essentially that's what was going on here. So, as I've mentioned, this shows by default a top-down view of how much memory got allocated where in the call tree of your application.

And I can flip that around and say I want to see the bottom-up view. But instead what I want to do is come over here and say, actually, I want to see the leaks. Are there any leaks in this application? So it goes off and does a static analysis of the state of memory right now. We can walk down through it and see that it's leaking 424 bytes. I could walk down through quite a bit, but what I like to do is invert the call tree. For those of you familiar with Shark, this is equivalent to the bottom-up or heavy view. And I can see that here's where the actual allocation occurred, and here we can see the full name of the function that I've selected. Walking backward, we see a CFURL allocation: Core Foundation, Core Foundation, Core Foundation. Okay, so here we see CFBundleCopyResourceURL. That's the same as we saw in the leaks output. And that is being called from "add image." There's a little icon here next to it that shows us we have source code available for that, so I can double-click on it and bring up the source code in Xcode. Now, that appears to be a URL that we're leaking. So it's just telling me that somewhere in this function we're calling CFBundleCopyResourceURL. Let me search for "CopyResource." And you can probably already see it.

So, imageURL here is being returned by CFBundleCopyResourceURL. So there's a copy going on here. If I select this... hmm. So we're copying, but we're not releasing. This one actually appears to be fairly simple: I can just add in a CFRelease after our last use of this, and that will probably fix the issue. Before I rebuild, is there anything else that's going to be easy to fix here? Walking back this other path, we see that CGColorCreate is being called...

from down inside HITextView's initialization, and on and on. Now, this is the one that's actually coming out of the IBCarbonRuntime after we load the nib for a new window. Ooh, that sounds hard; I'll come back to that. There's one other one here, which is right in my code: CMLTEViewData's operator new. So apparently a C++ object is being allocated. I want to see the actual call site of that.

So operator new is being called from this "set up the text view" function. The inverted view didn't tell me exactly where it did the new, so this took a little looking around. Okay, so again, this example is copied straight out of the Carbon developer examples. When we set up a text view, we attach some instance data to the side of that text view. And there's a big comment here: don't forget to dispose of this when the HITextView destructs. Well, if you build this example, you'll actually find that the destructor never gets called, because they dispose of the window without going to the trouble of pulling the object back out and deleting the data that was added to it. So I added a "tear down the text view" method here that gets the text view from the window, retrieves the instance data, and does a delete on it.

And I had actually commented out my call to that previously, so I'll uncomment it and save. So now I think I've fixed two leaks out of three. Let's go ahead and quit out of the tools. Sorry, build. Oops, that ran there. Build. Okay, let's run under MallocDebug again and see how we did.

and check for leaks. So before we had about 428 bytes; now we've got 48. That's pretty good. So specifically, ooh, it's that CGColorCreate. All I'm doing is loading the nib file in. So let's try a different approach. My allocation appears to be okay; I don't think I'm creating that color myself.

So let's run it under ObjectAlloc. This time through, we want to record the reference counting as well, so the retain and release calls, and I'll also record the general allocations by library. It runs a bit more slowly under ObjectAlloc. And here we go. So now if I come into a terminal window and run leaks on MLTE, it should find the right application. So there are a couple of leaks there. Let's grab one of these pointers and go.

Now, one of the things that ObjectAlloc does is keep these as events all the way through. So up here at the very top, let me go ahead and pause the application. There's a scroller here at the top that lets you scroll back and forth through the event history of the application, so you can actually move these bars back and forth. I want to find the last allocation event for this pointer.

Or the last events, period. So I can do a Previous, and we see that it was way back here in time. I bring up the Inspector, and I can see what the call tree of that is. It's a CGColor that's being released; it's a CFRelease event at this point in the code.

So if I sort by that, now we can see, having checked "show me the allocations by library," that there's a lot of ATS there, for example. This should be a current object. So if I come down here to CGColor, somewhere in there is going to be the event we want. (We'll try to integrate this functionality better in the future.)

But eventually you'd see that there was a retain/release problem happening here: we copy the background color into a previous-color variable, we create a different color with a modified alpha from that, we set the background color, which releases the old one and retains the new one, and then we release the new color.

But we did a copy to get the old color, and we never actually released the old color after we copied it. So this is pretty subtle. This is in my application code, where I didn't allocate this object, but I did copy it. So it required a somewhat more advanced technique to come in and find the issue. So I can release the previous color, and quit the app, which is paused in ObjectAlloc. Now if I build again and run it, say, under MallocDebug, hopefully we'll see no leaks this time around. No leaks.

So, memory usage problems. There are a variety of other things here that aren't so much performance issues, but if you have buffer overruns or underruns, or access uninitialized or freed memory, those can be nasty problems to find because they're very often intermittent: it depends on what the contents of memory are, or what you overrun into. Those can be really hard problems to fix, so what we really want to do is make them happen reliably so you can debug them. Some techniques that Mac OS X provides to help you with this are the Guard Malloc debugging library, some environment variables on the malloc system, and watchpoints in GDB that might help you find memory stompers. Watchpoints are a relatively new feature for us; I'm not going to go into them in detail here, but you can read the Xcode documentation about them.

So the way Guard Malloc works is that each and every allocation your application makes goes onto a separate virtual memory page. Each page is 4K, so a 16-byte allocation gets 4K allocated. It leaves the page before and the page after unallocated. Then, when the block is freed, we deallocate that page, and if your code tries to access a deallocated page, you get an immediate crash. Unfortunately, this causes your application to run a lot slower, because it's using a lot more pages of virtual memory, but hopefully it'll save you the human time of trying to figure out a way to find this problem yourself.

So by default, it's set up to catch buffer overruns: it aligns allocations with the end of the page, because buffer overruns are more common than underruns. So if you walk off the end of that VM page, you hit the unallocated page after it, and you get a crash.

If you need to catch a buffer underrun instead, you can try setting the MALLOC_PROTECT_BEFORE environment variable. Some of our system frameworks and some applications don't really like working with memory that we've pre-initialized to odd values, so we don't do this next one all the time, but you can set an environment variable that fills newly allocated memory with values that would cause a crash if they were used inappropriately. And if you call free twice on the same block, or if you try to free a block that was never allocated, we've changed Guard Malloc so that it immediately crashes into the debugger.

So because we're trying to help you make it crash every time, you'll normally want to use this from a debugger. The easiest way is from within Xcode: in the Debug menu, choose Enable Guard Malloc, and then just debug as normal. If you're using GDB at the command line, you can set the environment variable there with the GDB syntax "set env variable value". This says to insert the Guard Malloc library into the application: that's /usr/lib/libgmalloc.dylib. As with MallocDebug, you no longer need flat namespace, so if you were using this before and setting that variable, you no longer need to do that. The libgmalloc man page has much more information about this.

Now, I've mentioned environment variables that control the malloc system. These are all documented in the malloc man page. Previously I mentioned MallocPreScribble, which says: when I allocate a block, write a certain sort of garbage value into it. MallocScribble says: when I free a block, write into it then. If you are somehow trashing the memory pointers, the metadata of the malloc heap itself, there are variables that can help catch that. And for the large blocks, there's a way to put guard pages around them. But again, that's only the large blocks, so this won't catch nearly as many problems; it runs at the full speed of your application, though, so it could be helpful in some circumstances. So anyway, I've given you kind of a whirlwind tour of a variety of aspects here and reviewed the performance analysis process.

We encourage you to be disciplined about looking at the performance of your application. There are a lot of tools, the ones you've seen plus others, so read the documentation. They can really help you analyze your memory use, find memory leaks, and debug hard-to-catch memory problems. With those, we help you continue to turn out great applications for both PowerPC and Intel.

So for more information, the standard page here; you've probably seen it a lot of times. There's a feedback forum for developer tools immediately after this at 5:00. For contact information, you can contact Xavier Legros, our technology manager. You can also send feedback to the performance tools feedback list at perftools-feedback@group.apple.com, or the Xcode feedback list, or the xcode-users list. So there are a lot of helpful resources out there.