Mac OS X Essentials • 51:13
Cocoa provides robust features and APIs you can use to make your application more responsive and efficient. Learn how to identify performance hot spots in your application and how to address them using the newest Cocoa features in Leopard. You'll learn about memory usage, threads, when to draw (and when not to).
Speaker: Chris Parker
Transcript
This transcript has potential transcription errors. We are working on an improved version.
Well, good morning. Welcome to Session 141, Boosting Responsiveness and Performance in Your Cocoa Application. This will actually be interesting for those of you who were here a few minutes ago for Chris Kane's presentation of partitioning your Cocoa app. Some of the things that we'll talk about here fit in very well with the things that he talked about in the previous hour. So this would work out pretty well if you've been in the room for both presentations. My name is Chris Parker. I work in the Cocoa Frameworks team. And today I'm going to be talking about, oddly enough, two things, responsiveness and performance.
So responsiveness, well, okay, responsiveness is generally how well your app reacts to user actions. So as the user is actually typing away in your application or clicking the mouse and things like that, how fast your app can actually respond to those actions and handle the things that the user is asking it to do.
And performance is about getting your application's work done efficiently. And this is about things like your memory usage and your CPU usage, as well as, on all of these new eight-core machines and things like that, keeping all of those CPUs as busy as we possibly can keep them.
Before you get started on doing any kind of responsiveness or performance work, one of the things that you should always be doing first is measuring what's happening in your app, right? So premature optimization is the root of all evil. You can wind up doing a whole bunch of work that either doesn't get you a whole lot of win for your effort or really just doesn't make a difference. So it's important to know what's happening in your application as you're working with it. So there are a lot of tools on Mac OS X that can actually help you measure the performance of your app. And the simplest to use is installed as a command-line tool.
It's called /usr/bin/sample. It puts out a nice text file in /tmp that has a statistical listing of where your program is spending its time. And one nice thing about sample is that it's actually installed on all machines. So it's even on the user DVDs. So you may be able to get reports from the field using sample that you might not be able to get with other tools, because some of those machines won't have the developer tools installed.
Another good tool is Shark. Shark is installed as part of the CHUD framework on Mac OS X. And this comes with the developer tools. And this can give you a lot of information about what's happening in your machine as far as processor usage, but also in terms of memory and a number of other traces. It has very sophisticated data mining capabilities so you can draw some correlations and compare samples, compare Shark runs and things like that. And new in Leopard is a developer tool called Xray.
Xray has a GarageBand-like interface that allows you to put instruments into the interface that will show you things like memory allocation events. So Xray has actually pulled in some of the functionality that you used to find in tools like ObjectAlloc. It also interfaces with dtrace on Mac OS X. And you can also use dtrace directly, if you'd like, to be able to introspect what's happening in your application.
But figuring out when to use the tool can be a little tricky, right? When is there a problem? Well, the OS actually has a really specific cue about when there's a problem with your responsiveness. And it's pretty obvious. This pops up, right? The beachball, it's the spinning pizza of death, right? Whatever you want to call it, this usually means that there's a problem with your app. And it's hard to tell sometimes what this means. You might get a response from using your app that says well, I clicked on something and the beachball appeared, so I killed the app.
Well, okay, that doesn't tell you much. Or I clicked on something and the beachball appeared and I let the app sit and it recovered. And suddenly the beachball went away and I could use the app again, right? The beachball actually indicates a very specific targeted problem, right? It has to do with the main event loop.
So what happens? When a user double-clicks on your app, NSApplicationMain gets called and the Cocoa machinery sort of comes into play, and the main event loop is initialized. And this contains a whole bunch of things. So the main event loop is basically how Cocoa goes through and handles events. It also has the top-level autorelease pool, so if you're not using garbage collection, all of those autoreleased objects that don't have pools around them will eventually get the release message sent by the main event loop at the end of the loop.
But the main event loop is mainly in charge of dealing with events that come in from the user, right? So the user clicks on something and that mouse click goes through the USB hardware, and then it comes up through I/O Kit. And then I/O Kit hands it off to the kernel. And the kernel goes off and it gives it to the window server. And the window server eventually puts it into a queue meant for the application.
And as those-- as the event loop comes around, we pull those events onto the event loop and handle them in sequence, right? And then when there are no more events, we just hang out waiting for new things to happen. And this isn't just user events, by the way. This is also things like Apple events that are being sent to the application from other processes. So there are a lot of things that can come in that are handled as part of the event loop. And it's important to keep track of for us.
What happens, though, if the application goes into a tight loop on the main event loop? So the user clicks on a button, and as a result of clicking on that button, a whole bunch of work starts getting done. Some long-lived computation or a download from the network, something like that. Well, the longer the application sits there, churning away, it's keeping the event loop from getting back around to pick up events.
So as events come in, they're just sitting there, hanging out, doing nothing, not getting handled. The window server keeps track of this. The window server actually keeps track of how often your app has checked in and grabbed an event and handled it. So after a certain amount of time, if the window server says wow, nobody's home, it puts up the beachball, right? So it doesn't necessarily mean that just because the beachball is up, the application is hung, that it's not getting any work done, or that the application needs to be killed, right? So this is where it's a great opportunity to get a tool, sample it, and figure out what's going on, right? If it looks like it's just waiting for events, well, then there's a problem with the app hanging and maybe being deadlocked, but if it's actually churning away in some work, it's probably time to try and get that work to be done in some other way that isn't going to block the event loop, right? This pretty much sums up the experience, right? Any time you see the beachball, well, doc, it hurts when I do this. Okay, the answer is don't do that then, right? Let's see what we can do about avoiding the beachball.
There are a lot of different ways to do it. And there are a lot of places where the beachball can show up, right? One of them is a beachball at launch, right? One of the most unsatisfying user experiences is when the user double clicks on the application and it sits there and it bounces in the dock for a while, and it bounces in the dock for a little while longer, and it keeps bouncing in the dock. Then eventually the beachball comes up, right? So other than the fact that the user now has been sitting here doing this for the entire application launch, you might see a sample that looks like this.
I took this sample, it's a 10-second sample of an application, and we got roughly 8,600 samples. If you tell sample to sample faster than it's able to, it will just sort of do a best-effort kind of thing. I was trying to get it to do a sample every millisecond. And it looks like I got about 8,600 samples here.
But the key portion of this is that 8,500 of those samples are spent in nib loading, right? So we're trying to bring in probably MainMenu.nib, right? It's got the main menu bar and everything else and we're trying to bring this in. And when you see samples like this, sometimes we can get a copy of the nib. You open the nib up, and the nib looks something like this.
The nib has everything and the kitchen sink in it. It has every window that this application is going to open up. And it has all the controllers. And it has a whole bunch of other objects in it. And this is a really good way, basically, to get the beachball to appear immediately.
It has to unpack all of this stuff in order to get the application ready because it has to resolve all of those objects. So the solution to this is hey, load smaller nibs, right? Break your nib down into smaller portions. Refactor it, mainly based on probable utilization lines, right? So if you have multiple document types in your nib, factor those out, put each document type that you're going to handle into its own nib.
The Preferences dialog is actually an interesting opportunity to do this. A typical usage pattern is that your user comes up, and they launch your app for the first time, and they poke around the Preferences dialog for the first two launches, and then they get things set right the way they want them and they never open up the Preferences dialog again, if they don't have to.
So moving that to someplace where it's only going to be loaded on demand is a great way to get some of that stuff out of MainMenu.nib, right? And you might actually try to put inspectors either in their own, if the inspector interface is something you're using in their own nib or in with the things that they're going to be inspecting.
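As a rough sketch of that kind of on-demand loading (the ivar name preferencesController and the nib name "Preferences" here are made up for illustration):

```objc
// Load the Preferences nib lazily, the first time the user asks for it,
// instead of packing those windows into MainMenu.nib.
- (IBAction)showPreferences:(id)sender {
    if (preferencesController == nil) {
        // Hypothetical NSWindowController ivar; nib loads on first use only.
        preferencesController = [[NSWindowController alloc]
            initWithWindowNibName:@"Preferences"];
    }
    [preferencesController showWindow:sender];
}
```

Until the user actually opens Preferences, that nib never gets unpacked, so it costs nothing at launch.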
And there are other classes in the kit that can help you out with some of this. NSViewController can actually help you deal with views that you're going to load out of nibs on a per-view basis, as well as hooking things up for you. It's actually a new class in Leopard. And we're quite proud of it. We're proud of a lot of things in Cocoa, but... Another problem can be document formats, right? Your application is going to save some documents.
And there may be a lot of data to save. And one document format that some people use is property lists. They're very convenient. Property lists come in two flavors: the XML property list, which is plain text, and it's a very human-friendly format if your humans like to read XML.
But it at least can be edited with a text editor, right? And the binary property list, which is a smaller, much more compact format, and that format takes advantage of things like some uniquing of objects. So if you have a string that's repeated many, many times, that's only going to appear once in the binary plist, right? Here is a plist that is on almost everybody's machine who's ever run iTunes.
This is an excerpt from the iTunes Music Library.xml file. It happens to be a nice large plist that exhibits some of the characteristics that we're talking about here. And this is actually the track that I was least embarrassed to put on the slide, right? But you notice that in the dictionary, if you open up this plist, it's a number of repeated dictionaries. Each dictionary is a track.
And the key, these keys that appear, track ID, name, artist, composer, stuff like that, these appear once for every single entry in the plist. So I happen to have in this particular library I think roughly 750 tracks. For all 750 of those tracks, each one of these appears every single time, right? So this is a lot of wasted space. Bigger files take longer to load. I actually use the plutil command to translate my iTunes Music Library from an XML plist into a binary plist. The XML flavor is about 1.6 megs.
And the binary plist version of the same file, and it's still a property list, is about 400K, right? So there's big space savings in using the binary plist format. So if you've been seeing a lot of time being spent in XML parsing or if your application is going to be loading a lot of XML plists, you might consider looking at the binary format, because that's going to load much faster. It happens to load in roughly a third of the time. So just shy of a tenth of a second for the XML plist and three hundredths of a second for the binary plist. So it's a pretty good place to shave off some time.
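A minimal sketch of writing the binary format from code (the object graph and output path here are hypothetical; plutil -convert binary1 does the same thing from the command line):

```objc
// Serialize a plist-type object graph in the compact binary format,
// using the Leopard-era NSPropertyListSerialization API.
NSDictionary *library = /* your plist-type dictionary */ nil;
NSString *errorString = nil;
NSData *data = [NSPropertyListSerialization
    dataFromPropertyList:library
                  format:NSPropertyListBinaryFormat_v1_0
        errorDescription:&errorString];
if (data != nil) {
    // Hypothetical path; binary plists load much faster than XML ones.
    [data writeToFile:@"/tmp/Library.plist" atomically:YES];
}
```

Repeated keys like "Track ID" and "Name" get uniqued in the binary output, which is where most of that space saving comes from.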
Property lists have some specific performance characteristics. You have to read and write them completely, right? So when you pull in a dictionary off of disk using initWithContentsOfFile: or something like that, it has to read the entire dictionary. When you're going to write it out, it has to write the entire dictionary out.
So if you're only changing one or two objects, well, it still has to write out all 500 other things that are in that plist. And property lists only handle plist types. So if you have more sophisticated document processing needs, this is probably not the way to go. You'll have to get stuff out of the dictionary or out of another kind of object and spend time constructing your objects.
You might shift over to the keyed archiver, right? It uses the binary plist format as sort of the backing store, but that's not a feature you should rely on. But objects are instantiated, so you get to control, as you're archiving and unarchiving, exactly what gets written out to disk.
So if you have things that you could compute on the way up, maybe that's a way to be able to solve-- spend less time working with getting the file up off of disk, right? Things like that. So you get a little more control over what's happening in the Archiver to decide how you're going to do your writing of documents.
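A small sketch of that kind of selective archiving (the class, its ivars, and the idea of skipping a recomputable cache are all made-up illustration, not code from the session):

```objc
// A hypothetical model object that archives only what it can't recompute.
// The cachedLayout ivar is deliberately not encoded; it gets rebuilt
// lazily after unarchiving, so the file stays small and loads faster.
- (void)encodeWithCoder:(NSCoder *)coder {
    [coder encodeObject:title forKey:@"title"];
    [coder encodeInt:trackCount forKey:@"trackCount"];
    // cachedLayout omitted on purpose: recomputed on demand
}

- (id)initWithCoder:(NSCoder *)coder {
    if (self = [super init]) {
        title = [[coder decodeObjectForKey:@"title"] retain];
        trackCount = [coder decodeIntForKey:@"trackCount"];
        cachedLayout = nil;   // computed the first time it's needed
    }
    return self;
}
```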
These still have to be written out, though, when you write out the entire document, as an entire block on disk, all right? So you have to spend all that time to write the whole thing out. And again, even if you change one or two objects, you still have to write out all 500 that are there. And if you find yourself spending lots of time in the keyed archiver, the next step is to move to something like Core Data, right? The documentation for Core Data I think uses the phrase "an object graph management and persistence framework."
Which is a fairly long-winded way of saying that, boy, this thing can deal with a lot of objects pretty quickly on disk. It has several store types. The XML and the binary archive stores work very similarly to the way XML and binary plists work, right? They're what the Core Data team I believe calls atomic stores. They're written out in their entirety, and if you write over them, the writes happen atomically.
The third store type, though, is probably the most interesting for this kind of situation. It's the SQLite database store, which allows partial updates, right? So if you have 50,000 objects stored in your document and you change 50 of them, Core Data only has to change those 50 on disk rather than all 50,000, right? So you may be able to switch up to using NSManagedObject subclasses to be able to get more efficiency out of your application.
You spend a lot of time reading and writing files, not just your documents, but also files that other people give you to open up for your application. And there are a lot of convenient APIs in Foundation to be able to do some of this. initWithContentsOfFile: on NSData, or initWithContentsOfURL: when you pass a file URL, are good ways to just pick up files right off of disk.
If you have a very large file, though, you're going to spend a lot of time and a lot of memory with initWithContentsOfFile: because it's going to malloc all that space and hold onto the entire file in memory. So on a modest configuration Macintosh, you might find that this 1.5GB file that you've just loaded is taking up a lot of the user's RAM, and the machine starts swapping, and that's not a very good user experience either. If you're only going to be touching portions of that 1.5GB file, you might consider something like initWithContentsOfMappedFile:.
And that allows us to go out and memory map the file so that the whole thing isn't brought into memory at once, only the bits that you actually touch that are on the pages that you touch are brought in. So you'll spend a lot less time in malloc and causing swapping in the machine, you'll be able to actually get at the file contents pretty quickly.
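A minimal sketch of the mapped-file approach (the file path and the header-reading example are hypothetical):

```objc
// Memory-map a large file instead of reading it all into malloc'd memory.
// Only the pages you actually touch get faulted in.
NSData *data = [[NSData alloc]
    initWithContentsOfMappedFile:@"/path/to/huge-file.dat"];

// Touching a small range faults in only the pages backing that range,
// so a 1.5GB file doesn't drag the whole machine into swapping.
char header[16];
[data getBytes:header range:NSMakeRange(0, sizeof(header))];
```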
This works great for file URLs. But initWithContentsOfURL: is going to do its work basically right on the event loop if you call it there. And it's going to block waiting for data if you hand it a network URL, right? So you can hand it an HTTP URL and it will happily go off and download that stuff from the Net. And if it's on the machine next to you, or if it's on a machine on your local subnet, that's great.
But, you know, the Internet is a tremendously inconsistent place. So if you're trying to download say even something as simple as a 300 or 400K file, but it's from a very slow web server or it's over an unreliable connection, initWithContentsOfURL: is just going to block waiting for data. And if there's nothing there, it's just going to sit there and wait until something comes in, right? So it's blocking the event loop. And if we sit here long enough, eventually that beachball is going to appear. So we want to avoid the beachball.
How do we do it? Avoiding the beachball with network URLs, specifically HTTP URLs, is typically done with something like this: NSURLConnection's initWithRequest:delegate:startImmediately:. NSURLConnections cooperate with the run loop, right? So avoiding the beachball means not blocking the run loop, so we're going to find an API that doesn't block the run loop. In this case, it is NSURLConnection.
What happens here is if there's no data, NSURLConnection says, oh, that's all right. And then it yields to, it basically returns from the thing that's getting the data, and it allows the event loop to come back around and pick those events up, right? So because it participates in the run loop machinery, it doesn't actually block the event loop.
You can find most of the APIs that do this by looking for the scheduling APIs, things like scheduleInRunLoop:forMode: or unscheduleFromRunLoop:forMode:, right? And this allows you to be able to put these things onto a run loop in a mode where if you need to, you can spin the run loop in a private mode to get some of your work done and then release it to allow the event loop to come around. Or you can just schedule it in the default modes and let the kit handle it, right? But scheduling APIs allow you to participate in the run loop. And that's a really key idea to not getting the beachball to come up.
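As a rough sketch of the NSURLConnection pattern (the connection and receivedData ivars are hypothetical; the delegate methods shown are the standard NSURLConnection callbacks):

```objc
// Kick off a download that cooperates with the run loop instead of
// blocking it the way initWithContentsOfURL: would.
- (void)startDownload {
    NSURLRequest *request = [NSURLRequest requestWithURL:
        [NSURL URLWithString:@"http://example.com/file"]];  // hypothetical URL
    receivedData = [[NSMutableData alloc] init];
    connection = [[NSURLConnection alloc] initWithRequest:request
                                                 delegate:self
                                         startImmediately:YES];
}

// Called repeatedly as data dribbles in; the event loop keeps turning
// in between these callbacks.
- (void)connection:(NSURLConnection *)c didReceiveData:(NSData *)data {
    [receivedData appendData:data];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)c {
    // All data received, and the beachball never showed up.
}
```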
There are a lot of other ways to do it as well. And you can time your work. So for instance, if you're using an NSNotificationQueue, you can set up idle-time notifications with NSPostWhenIdle. So you can get little bits of work done when we think there's nothing else going on.
You can also schedule repeating NSTimers. And those can get little chunks of work done every few seconds. And if you need to look at things in terms of when things are happening in the run loop, you can use run loop observers. And that's another way to be able to schedule your work at specific times.
The key here, though, is that even though these are ways to get information about what's happening in the run loop, you can still block the run loop. So you don't want to do lots of work using any of these techniques. You just want to do little bits of work so that you don't tie things up for other things on the run loop.
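A small sketch of the repeating-timer flavor of this, doing bounded slices of work so the run loop is never tied up for long (workQueue and processItem: are made-up names):

```objc
// Fire every tenth of a second and do at most ~20ms of work per tick,
// so events on the run loop still get serviced promptly.
- (void)startChunkedWork {
    [NSTimer scheduledTimerWithTimeInterval:0.1
                                     target:self
                                   selector:@selector(doChunk:)
                                   userInfo:nil
                                    repeats:YES];
}

- (void)doChunk:(NSTimer *)timer {
    NSDate *deadline = [NSDate dateWithTimeIntervalSinceNow:0.02];
    while ([workQueue count] > 0 && [deadline timeIntervalSinceNow] > 0) {
        [self processItem:[workQueue lastObject]];  // hypothetical per-item work
        [workQueue removeLastObject];
    }
    if ([workQueue count] == 0)
        [timer invalidate];   // all done; stop firing
}
```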
But specifically, the event loop, all right? Once you've sort of exhausted some of these possibilities, you may find that you're still not able to get out of blocking the event loop, so one solution is to use simple threads, right? So we need to get this work that's blocking the event loop and keeping those events from getting serviced someplace else, right? So maybe this is one of those downloads or this is a bunch of work.
So we'll say that this is-- this was triggered as a result of the user clicking on a button. And that button has an action method called startWorking, for instance. And you do a whole bunch of stuff. You're doing this heavy computation, right? We're going to shift that and try rewriting this method in a slightly different way.
The first thing we're going to do is rewrite it to call a new method called doWork:. That method is going to take an NSMutableData. And that NSMutableData is basically going to be the result; we just sort of append things onto it. And when this is called, it's going to do a bunch of work and then we're going to call ourselves back on the main thread. We're going to say performSelectorOnMainThread:@selector(finished:) withObject:workData, so it's going to call finished: with the parameter workData.
And waitUntilDone:NO, so we'll just exit out of this immediately and some additional clean-up will happen. And in our startWorking method, we're going to take advantage of some new API that's available in Leopard. Chris talked about this in his talk just before this. performSelectorIn-- oh, he didn't talk about this one.
I'm sorry. I get to introduce this one. performSelectorInBackground:withObject: This is a very simple way to be able to fork off a background thread and get a bunch of work done, right? So performSelectorInBackground:withObject: sets up a thread, right? Goes ahead and does that work, and then because we've called waitUntilDone:NO there, right, when this work is done, the thread is going to end, it's going to get cleaned up, right? So there's no having to call pthread_create, pthread_exit, and there's no instantiating an NSThread on your part. We're going to take care of all of that stuff for you.
And this is going to basically get your work done in a background thread. So what happens here is that when we call that performInBackground message, all this work that was blocking the event loop gets put onto that background thread. And now the work can happen out there. It will get cleaned up. And those events can shoot off the edge of the screen and get handled.
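A minimal sketch of the pattern just described (the doWork:/finished: names and workData follow the talk; the autorelease pool around the background work is an assumption about what a real background method needs, since these threads don't get one for free):

```objc
// Button action: hand the heavy work to a background thread and return
// immediately, so the event loop keeps servicing events.
- (IBAction)startWorking:(id)sender {
    NSMutableData *workData = [NSMutableData data];
    [self performSelectorInBackground:@selector(doWork:)
                           withObject:workData];
}

// Runs on the background thread Cocoa spins up for us.
- (void)doWork:(NSMutableData *)workData {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // ... heavy computation, appending results onto workData ...
    [self performSelectorOnMainThread:@selector(finished:)
                           withObject:workData
                        waitUntilDone:NO];
    [pool release];
    // Method returns, thread ends, Cocoa cleans it up.
}

// Back on the main thread: safe to touch the UI with the results.
- (void)finished:(NSMutableData *)workData {
    // update views, etc.
}
```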
You may find though that you need more control over what's happening, right? The simple threading mechanism isn't going to give you enough of that. So you may want to drop right to NSThreads. And when you create new NSThreads in your process, each of those NSThreads has its own run loop, right? So the main run loop has the event loop, right? The NSThreads have their own run loops that you can schedule work on, right? And one way to do that is with performSelector:onThread:, right? So because they have their own run loops, they take advantage of the scheduling APIs. Anything you can schedule, you can schedule on those other threads as well.
And you can actually give it the modes that you're going to use. So performSelector:onThread:withObject:waitUntilDone:, and the longer version, performSelector:onThread:withObject:waitUntilDone:modes:, is how you can shift work from the event loop or from one thread to another thread, right? And this is a good way to be able to get that work off the event loop, stop blocking, and don't get the beachball.
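A rough sketch of shipping work to a long-lived worker thread you manage yourself (workerThread, workerMain, handleJob:, and job are all hypothetical names):

```objc
// The worker thread's entry point has to keep its run loop alive so it
// can receive performSelector:onThread: messages. Note: -run returns
// immediately if the run loop has no input sources or timers attached,
// so a real worker usually adds at least one (e.g. an NSPort).
- (void)workerMain {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    [[NSRunLoop currentRunLoop] run];
    [pool release];
}

// From the main thread (or any thread), schedule work on the worker's
// run loop without blocking the caller.
- (void)queueJob:(id)job {
    [self performSelector:@selector(handleJob:)
                 onThread:workerThread
               withObject:job
            waitUntilDone:NO];
}
```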
Let's talk about some performance stuff, right? We'll combine some of the things I'm going to talk about in performance with some of the stuff that we were just talking about in responsiveness. Performance, though, usually means one of two things, right? Your use of memory or your use of CPU. And the trade-off is the classic trade-off, right? More memory for cache results means you only have to compute something once, right? Or computing things more often means that you don't have to spend time storing that in memory and taking up memory.
The other twist is the one I mentioned earlier, which is we have all of these machines that have multiple cores. And it's really kind of a shame to be having a lot of work to do and not be taking advantage of all of those cores, right? So let's talk a little bit about ways to do some of this.
One of the ways that you-- one of the places that you may find yourself spending time is in Text Layout, right? And this is a sample of the Tiger version of TextEdit. And it's just opened a 60MB file. I'm really not sure how HIToolbox managed to get pink as their color in Shark, but that's a little perhaps unfortunate random color choice on this run.
But if you look at where this is spending time, this is using idle time notifications. And it's using the background text layout mechanism. So what happens here is you drag a 16MB file onto TextEdit and it starts laying all this text out line by line, right? And if you grab the scroll bar in TextEdit and scrub it around, right, that work stops, it's not idle anymore.
But as soon as you let the mouse button up, that scroll bar is going to start moving again because it's actually laying out the additional text in the background, right? Out of this 5.5-second sample that I managed to get, roughly 4 seconds are spent in layout. So even though we don't have a beachball up, there's still a lot of churning going on and the CPU doing a bunch of work.
If your text layout needs are fairly straightforward and you're not doing lots of overriding and things in the text system, you can use noncontiguous layout for your text layout, right? That's NSLayoutManager's setAllowsNonContiguousLayout:. You pass YES to this. And the sample here actually shows that there's practically nothing going on after the initial drag, right? Because it only lays out the bits that it needs to lay out, right? And as soon as the user scrolls to a new place in the document, that's where the text system goes out and lays out that chunk of text, right? It doesn't lay out all the stuff in between.
It only has to lay out the things that are actually being displayed, right? So if you spend a lot of time in text layout, this is a great opportunity to cut that time down a lot, okay? Oh, I'm sorry, I forgot to mention, you can actually play with this effect in TextEdit. In the sample code that's on your DVD, you can just find the call to setAllowsNonContiguousLayout:.
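For reference, turning this on is a one-liner (textView here is a hypothetical NSTextView):

```objc
// Opt the text view's layout manager into noncontiguous layout, so only
// the visible portion of a big document gets laid out.
NSLayoutManager *layoutManager = [textView layoutManager];
[layoutManager setAllowsNonContiguousLayout:YES];
```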
Set it to NO and just drag a file in there and see how that works, right? It's a good test bed for tinkering around with some of this stuff. Another good test bed is Sketch, right? Sketch is some example code that's written to show a lot of different things. It shows NSDocument usage, but it also shows heavy usage of key-value coding and key-value observing, right? Here's a 5-second sample I took of Sketch after pasting 10,000 objects into it.
And I have to admit, I poked at Sketch a little bit to make its performance a little bit worse. So you won't see this behavior in what's on your DVD, but the idea here is to look at this particular sample. insertObject:atIndex:, right? So remember what I did. I pasted 10,000 objects into a Sketch document.
After pasting those 10,000 objects, I spent-- the beachball came up after a little while, and I found this sample. And the catch here is that for each of those 10,000 objects, that insertObject:atIndex: is getting pounded on. We're spending all our time there doing this work. One of the ways, if you see this, that you can get out of it is by avoiding it. Because it only handles a single object and it's called repeatedly for each of those 10,000 objects, this probably isn't the way to go.
It's probably one of the first things that gets implemented, but you can actually modify things a little bit. The one you're looking for here is the bulk accessor for this, right? So insert<Key>:atIndexes:; in this case, it's going to be insertObjects:atIndexes:. Or in Sketch, you might see insertGraphics:atIndexes:, right? This handles many objects, all 10,000 of them come in at once, and the index set is provided as to where to insert them.
This is a great performance opportunity to be able to take advantage of whatever your model is or what the kit's doing in Foundation in order to do these large bulk insertions, right? And by making this change in Sketch, or miraculously changing back to what Sketch does by default, insertObjects:atIndexes: is only half the time that we spend in the sample. And the beachball only came up for like a fraction of a second, right? So it's a good way to be able to get a big performance win when you're dealing with lots of objects. And staying with the bulk accessor scales fairly well.
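A small sketch of what the bulk path looks like (the graphics and newGraphics names are modeled on Sketch but hypothetical here):

```objc
// One bulk KVC-compliant insertion instead of 10,000 one-at-a-time calls.
NSIndexSet *indexes = [NSIndexSet indexSetWithIndexesInRange:
    NSMakeRange([graphics count], [newGraphics count])];
[self insertGraphics:newGraphics atIndexes:indexes];

// The bulk accessor itself can lean on NSMutableArray's own bulk method,
// so KVO observers get a single change notification for the whole batch.
- (void)insertGraphics:(NSArray *)objects atIndexes:(NSIndexSet *)indexSet {
    [graphics insertObjects:objects atIndexes:indexSet];
}
```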
Another place that you may find that you're spending time is in enumeration, right? Going over a list in memory, a collection, or iterating over a dictionary, right? And one of the things about enumeration in Tiger, this is a Tiger-style enumeration, is there's a lot of messaging going on, and there's actually a fair amount of churn in the autorelease pool. So we're creating an object enumerator from a list.
And then each time through that enumerator there, every time we call nextObject, that next object is autoreleased. And this winds up calling objectAtIndex: underneath the covers. And you spend a lot of time doing this messaging and these individual accesses, right? So just like there's bulk access for KVC and KVO, we can tighten things up here a little bit.
The first step is to realize that, hey, we have fast enumeration in Leopard, right? The for...in construct. Does everybody like this, the for...in thing? Isn't that great? That's like, I keep wanting to use this in other places, and I have to realize oh, wait, I don't have that in this language. So it's nice to be able to do it here. for (NSString *string in stringsList), all right, it's much clearer, it's much nicer.
But you'll notice that it's no coincidence there's much less messaging going on here. You see a lot fewer square brackets. And that's because for...in takes advantage of the new protocol in Foundation called the NSFastEnumeration protocol, right? So countByEnumeratingWithState:objects:count:, you're going to get a context object here, an NSFastEnumerationState structure.
And an array to fill in, and the count parameter there is telling you how many things you can fill in. You're going to return how many things you actually filled in from this. And this is the way that the for...in mechanism decreases the number of messages sent: by implementing this method on your collections and taking advantage of this structure. And the documentation talks about what each of these fields does.
There's a state field you can fill in, and the mutations pointer is how you can notify the fast enumeration machinery that a mutation occurred. So if somebody mutates the collection while you're enumerating over it, you get an opportunity to throw an exception, right? Nobody's mutating collections while they're enumerating them anymore, I hope. This allows you to take advantage of it.
And what happens is because it's a bulk accessor, you fill out say 50 or 100 things at a time. It actually gets it down to less than one message send per object in the collection being enumerated, right? And none of those things are autoreleased because you're providing direct access to the pointers for those things. So it's very, very fast. And we actually find that by shifting over to this in Foundation, we got a general speed-up across the board. So for things that do a lot of enumeration, looking at the FastEnumeration protocol is a great place to go.
We've actually implemented this for all the built-in collections. So if you're using NSArray and NSMutableArray and things like that or the dictionary classes, this is all taken care of for you. But if you have collections or if you have objects that can participate in this, here's where to go to get some extra boost out of your code.
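As a rough illustration of the point above, here's a minimal sketch of adopting NSFastEnumeration for a custom collection that happens to store its objects in a contiguous C array. The class name MyList and its ivars are hypothetical; the protocol method and the NSFastEnumerationState fields are the real Foundation API.

```objc
// Hypothetical collection wrapping a plain C array of objects.
@interface MyList : NSObject <NSFastEnumeration> {
    id *_objects;       // contiguous interior storage
    NSUInteger _count;
}
@end

@implementation MyList
- (NSUInteger)countByEnumeratingWithState:(NSFastEnumerationState *)state
                                  objects:(id *)stackbuf
                                    count:(NSUInteger)len
{
    if (state->state != 0) return 0;        // second call: enumeration is done
    state->state = 1;                       // mark that we've handed everything out
    state->itemsPtr = _objects;             // point directly at our storage: no copying
    state->mutationsPtr = &state->extra[0]; // no mutation detection in this sketch
    return _count;                          // everything in one batch
}
@end
```

Because the storage is contiguous, this hands the enumerator a direct pointer in one call, which is where the "less than one message send per object" win comes from; a collection without contiguous storage would instead copy batches into the supplied stackbuf.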
I'm going to jump back to threading here. You know, performSelectorInBackground:withObject: is an easy thing to use. And there are a lot of other easy things you can do to fire off threads. And it's kind of convenient maybe when you're writing your application to say, okay, well, I'm not going to block the event loop.
That guy, Parker, told me not to do that at WWDC, so I'm not going to do it. So what I'm going to do is I'm going to fire up another thread that will do my downloads and maybe I have another download thread going on. And while that's going on, I'm going to throw another thread in there that's going to do some other work because I don't want to block those other two things. And that's going to be some indexing. And I might be updating a database. And pretty soon, I've gone completely thread happy, and I've got all these threads running around.
And this winds up being a good way to get into a lot of trouble for a bunch of different reasons, right? You create just 35, 40 threads all at once and say, hey, you know, I'll keep them around, or I'll use them as I need to, or I'll set them all doing work and then hope that, you know, we'll just let the computer sort it out, right? It's good at that.
Threads have a very real startup and teardown cost, right? So some factors in using threads are things like how often are these threads going to be created and then thrown away? If you have a very short bit of work, it's probably not worth the cost of firing up a thread, putting the work on the thread, letting the thread run and then getting the thread torn down, right? Threads also hold resources, and they hold resources in the kernel and they hold resources in your process.
So it's a general load on the system to create lots of threads and then either tell the scheduler to figure it out or have a lot of idle threads hanging around, doing nothing, right? So there's processor contention. If all of those threads are doing work, they're all trying to do work on a much smaller number of processors, right? And you wind up with data integrity issues, right? Multithreaded programming is hard.
And then there's lock contention, right? When you're messing around with locks and suddenly all those threads need to go for the same object, everybody's pounding on that lock trying to get in, and all the threads are blocked. So you're still not getting any work done, even though you have lots of threads around. And deadlocks can block the event loop, right? Chris alluded to this in his talk earlier.
You know, you can wind up in the event loop needing some information, and you go and ask for the lock, and then some other thread has it because it's doing work behind that lock, and now you've blocked the event loop again, right? So what we want everybody to think about is creating, rather than lots of threads or enough threads for an eight processor machine or enough threads for a four processor machine, we want you to think about working with a natural number of threads. And rather than having to figure out what that natural number is, we're using NSOperation and NSOperationQueue to try and help everybody out.
NSOperation is a class that encapsulates your work into threadable units, basically. And it's a good place to encapsulate the data that that work is going to have to use, right? So rather than having data sprinkled around in a bunch of objects where they're going to have to take locks and things like that, if you can pull that all into a single NSOperation subclass, then the operation can be self-consistent, basically.
Operations can be prioritized, right? So when you put them into a queue, the queue can say, well, I've had these four high-priority operations, and I only have these two low-priority operations. I'm going to do the first four first, right? I'll do the high-priority ones first and then I'll get around to the lower-priority stuff.
And then as new things come in with different priorities, some load balancing can occur, right? And another really important thing, especially for responsiveness, is that it provides cancellation API, right? So if you've structured your NSOperation subclass to handle being told that it's been canceled, right? If you look at the NSOperation API, there's a cancel method there, right? That's how you can do things like providing a background operation and the user can click the little X button or the stop sign, right? And stop that operation if they decide that it looks like that might be taking up lots of processor time or memory.
The NSOperationQueue is the thing that actually takes care of doing all of this work. And it manages the NSOperations. It's the thing that looks at all the priorities and says is it canceled, is it running, what's happening, what are the dependencies? Right? There's dependency management also, which is very handy. And this works with the kernel. And this is a big deal for us.
NSOperationQueue is actually doing things to make sure that it can offer flow control for your operations based on the general machine state. So, well, here's an example. Let's set up a download queue, right? And let's say we have a bunch of download operations here.
So the queue is told to start its operations. And it says okay, well, there's enough resources right now in the machine to be able to touch off two of these. So I'll start those going, right? And then, oh, one of the downloads is fairly quick. And it says, okay, well, there's a completed download, so I can start off another one, right? Now, while all of this is running, let's say something changes in the machine.
Like let's say other threads in my process suddenly go to sleep, right? Or they're blocked in a read or something else is happening. And the kernel can figure this out, and it will say, okay, well, there's more resources available, so those threads aren't really using it right now, so let's fire off all of the remaining download operations and get those going. And maybe we can get a little more throughput.
So NSOperationQueue is working with the kernel in order to help you provide the natural number of threads in the system rather than trying to guess at some of that state and things like that. So NSOperationQueue is a good place to go to be able to maximize throughput of all of those processors that you might have on the machine, right? And it isolates you from, you know, having to call sysctl to figure out how many processors are available and running.
So once you've downloaded all of that data, you probably actually want to try and display it, right? So there are a lot of things about view drawing. But the biggest deal about view drawing is be lazy, right? My mother's going to hate me for saying that, but be lazy.
Avoid invoking -display... methods directly. If you actually go through and you look at the NSView API, the -display... methods in there are really big hammers to get your work done. You want to actually try and do things in as small a chunk as possible, right? So if you've got a document and you've only updated two things in that document and they're separated by hundreds of pixels, you know, only mark those rectangles that you actually dirtied. Because then the view machinery can come around and only draw in those places, right? So setNeedsDisplayInRect: is a great place to be able to cut down some of your drawing time if you find that samples are being spent in drawing.
setNeedsDisplay: is another way to be able to just tell one thing, you know, a view or something like that, to draw. And if you do have that situation where you have a bunch of small rects being drawn, don't mark the entire union as dirty, because then you're still going to have to draw all of those unchanged portions of the document. And that's going to spend a lot of time drawing things that just don't need to be drawn, right? Be opaque, right? Here's an easy one.
If at all possible, if you can return YES from isOpaque, do it. Because if you don't have to draw behind you, then the kit doesn't have to spend time figuring out what to draw. It will just stop with that view, right? NSScrollView and its clip view are opaque out of the box. And if you can do what you can to keep it that way, that will actually help you out.
And if you're drawing document views, you know, we have these nifty Mighty Mice. I don't know who uses a Mighty Mouse. There's one here. I have one on my desk. And I love the little scroll ball. You know, and you scroll around with this thing or you use the gesture on the track pad, and the scroll view jerks around, and it's going to try and display bands of content, right? A new rectangle comes into view when you scroll.
And so if you can draw your content in those bands, right, very quickly, that's another good place to be able to optimize your drawing a little bit. Don't draw from the top down, right? Only draw those chunks that you need to draw. So draw as little as possible.
Be opaque, right? Be minimalist, right? Here, I'll take advantage of the new enumeration stuff in Leopard, but if your drawRect: gets called and you're just iterating over all of the elements in your array there for what to draw, that's spending a lot of time that may not be necessary.
There's some good information coming in in the rectangle parameter to drawRect:. And a simple call to NSIntersectsRect to find out whether the bounds of the thing you're drawing intersect that rectangle can actually cut a lot of your drawing time down, especially if you have large documents, right? One place that you sometimes see issues is during live resize, right? When you grab the edge of the window and you start scrubbing it back and forth. And sometimes things can start slowing down.
One place that you might be able to find out if you're having problems in live resize is catching a sample while that's happening, right? And if the sample is spending a lot of time in CGSReenableUpdate, that's the thing that's helping out with some of the live resize machinery, that's a place to look for doing less in your live resizing behavior, right? You can check to see if the view or the window is in live resize mode at the moment. And maybe draw a lower quality preview of what's going to get drawn there.
And then when the live resize ends, draw the real thing in there, right? But don't do lots of work at that point, because if you do, you pretty much run the risk of decreasing the performance and responsiveness of your app, right? And you can also run afoul of the beam syncing in the window server. The window server is going to rate limit the drawing that it's doing to how fast the display panel, in most cases, can refresh, right? So it can rate limit some of the things that you do. For instance, it could actually rate limit -performSelectorOnMainThread:.
So if you're spamming the main thread while you're doing lots of drawing or trying to get things to be drawn, right, well, try and break your work down into chunks that can be drawn in one call to -performSelectorOnMainThread: rather than doing, you know, a thousand calls in a second. And the place to find out whether you're being rate limited by the beam syncing is if you get a sample and you're spending time in CGSSynchronizedBackingStore, that's the thing that's doing the drawing, basically, then you're probably getting limited by the window server.
Another place that you can pick up some good performance: layer-backed NSViews. And this allows really smooth animation effects. It's very spiffy, right? You saw this in the Cocoa animation talk earlier. I know you all went for the song, but there were demos before that that were pretty good. Those animation behaviors actually run in their own thread, right? And because it uses Core Animation, it's taking advantage of things like the video card and stuff like that. And you can get at this by calling NSView's setWantsLayer: with YES on it. And that actually just layer-enables your views. And then you can do all of the animation effects that you were doing before.
You shouldn't jump right to this, though, if you don't have to. If you find that you're spending a lot of CPU time computing things about your animation, then maybe it's time to look at something like layer-backed NSViews, right? 64-bit. 64-bit has some really interesting performance opportunities in some ways.
Intel 64 is faster for some tasks, partly because it has more registers available, so the compiler gets to play more tricks and, you know, reorganize your code so that you can't read the disassembly at all. You can get more things in the collections. You can address more memory, right? And it's a great way-- you saw Steve's demo with the Library of Congress modification, right? I went through and recolored the image and put some text on it. And that's a great place where a compute-heavy task can take advantage of what 64-bit can do for you, right? And I think the stat is that almost every machine that we sell is 64-bit capable now, right? Other than maybe the Mac mini. I don't remember.
But you get the opportunity right now, especially if you're starting in Leopard, to write 64-bit code from the start that will run in both 32- and 64-bit source compatibly, right? So if you use the types like CGFloat and NSInteger and NSUInteger and stick with the collection classes and, you know, don't make assumptions like an int and a pointer are the same size and things like that. These are great opportunities for you to be able to take your code that you wrote 32-bit using these types, hit the little check box in Xcode, get a 64-bit binary that you can test out and compare them side by side.
Unless you have a big graphics department writing your code for you, you probably can't do a spiffy graphics demo like Steve does, but you can at least get the same numbers or get numbers for the same executable, 32-bit and 64-bit, right? So you can actually see what the difference will make. And it's not a whole lot of pain to be able to do that, as long as you're using the types.
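A tiny sketch of what "using the types" above means in practice; the accessor method is hypothetical, the types are the real Leopard 32/64-bit-clean typedefs.

```objc
// 32/64-bit-clean: NSUInteger/NSInteger/CGFloat scale with the
// architecture, so the same source compiles correctly for both.
NSUInteger count = [array count];   // not: unsigned int count
NSUInteger i;
for (i = 0; i < count; i++) {
    CGFloat width = [self widthForItemAtIndex:i];  // hypothetical accessor
    NSInteger rounded = (NSInteger)width;          // not: int rounded
    // ... use 'rounded' ...
}
```

The assumption to avoid is exactly the one named above: on 64-bit, int stays 32 bits while pointers (and NSInteger, and CGFloat) widen to 64.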
Garbage collection. Garbage collection is going to solve all of our memory woes, right? I don't have to think about retain and release and autorelease and everything else. Well, not quite. There are still things in garbage collection that you need to think about, right? One thing is what we call unintended roots.
The collector thinks of anything that's in a stack variable or in a global variable in your code as a root for strong references, right? So it's very easy sometimes to wind up in a situation like this where some global variable is holding onto an entire tree of objects, right? And you set it and you kind of didn't think about it. And then suddenly you can't figure out why there's this 500K of memory being used in your application. The place to try and fix this is basically when you're done with the data, set that reference to nil. So what happens there is it breaks that strong reference.
And then the collector gets an opportunity to come along and grab all of that memory back, right? So even in objects that are holding big trees of things or if you have global caches, trying to be a little bit more aggressive about how you mark the usage of the memory of the objects that you're using is a great way to be able to inform the collector that hey, you know, I'm done with this, you can come along and reap it.
There are a bunch of other things to watch out for. One is scarce resource usage, right? So that's when you're working with things like Mach ports or file descriptors, things that are not collectible by the collector, which basically only knows about memory, right? Or, for instance, memory you've allocated yourself using malloc and free, right? That's not collected memory, per se.
So the scarce resources that you use take up space. And it takes time to close those down. So typically, you might see in some objects, you've tied the lifetime of that scarce resource to the lifetime of the object, right? So what happens is it's hanging onto that until it actually gets released. Well, what you probably want to do is make sure that you're actually closing that stuff out as fast as possible, right? I used it, I'm done, I'm going to close the file descriptor, right? I'm going to close the mach port.
I'm going to free that memory immediately. And that way, you don't have this sort of piling-up effect, where eventually the collector comes along and then things are finalized, right? And eliminating finalizers is another big performance opportunity in the collector, right? So, you know, there's a method, finalize. And you could clean some of these things up in finalize. But really, you want to hit that point where you don't have any finalizers at all. And that just gives the collector the opportunity to free all that memory at once. Finalization order is undefined.
You have no idea what's going on at finalize time. So objects that you may want to message in your finalizer may not be there anymore, or may be in an inconsistent state, right? Performance really suffers when your app crashes outright. So that's probably not a good way to get your work done.
And finalizing takes time. If you have lots of finalizers, especially finalizers that might take time to clean up, you probably don't want to do that at all. So if you can get it out of the finalization sequence, you'll spend less processing time, basically, cleaning up after yourself or having the collector come along and clean up after you.
And also, don't forget the C, right? Objective-C is a superset of C. And if you've got libraries or external things that do stuff a lot faster than you could probably write it, link it right into your app, right? And if you're spending too much time messaging, you can always do the things like writing some functions.
If you actually see samples being spent in objc_msgSend, you can actually break it out and write those as functions instead, right? So you don't necessarily have to get tied into absolutely everything that's happening in your code via Objective-C, right? So just to review, this is actually for that guy in the back. I can see you.
Wow, machine was sleeping too. First off, measure all your code, right? No matter what's going on, you can't make accurate assessments of where your hot spots are, where your responsiveness problems are, without being able to see what's happening. And Shark and Sample and Xray are great tools to be able to introspect the state of your code. DTrace, I've been playing a lot with DTrace, it's very nifty because it gives you a lot of insight into the whole machine state.
And these are great tools, and they give you lots of information. So be sure to use them, right? Do your work at the right time, right? If there are little small chunks, you can probably get them done at idle time notification, right? Doing things, you know, only when the user actually asks for them, right? Don't load nibs you don't need to load, things like that.
Use APIs you can schedule, right? Participate in the event loop. If you don't need to fork off a thread to be able to get your work done, well, maybe you don't want to do that. But again, they have real costs. And if you can schedule things on the run loop, on the event loop, you may be able to participate in that without getting the beachball popping up all the time.
Use threads, but use them wisely, right? Don't just go creating a hundred of them, because that's probably not going to do you any favors. Use NSOperation and NSOperationQueue. The encapsulation of NSOperation is a great way to think about your work and partition it down into chunks. NSOperationQueue is going to cooperate with the operating system, figure out exactly what the state is and say, hey, you know, this is what we can touch off at the moment, and we're going to hold off on other work until this stuff is done, right? Use bulk operations, right? So for enumeration, use the fast enumeration protocol. There are some big opportunities for performance wins there. In KVC and KVO, make sure that you're using the plurals, the insert<Key>:atIndexes: APIs. And those are great opportunities to be able to sort of sidestep some of the more naive implementations that might be around.
And finally, be lazy, right? Don't draw what you don't need to draw. Don't load what you don't need to load. Don't compute it until the user actually goes ahead and asks for it, right? There are a bunch of labs. We still have one Cocoa Open Lab today at 2:00pm. We'll be downstairs.
Performance Lab has open hours today and tomorrow. But there's a specific Performance Tuning Lab Thursday at, it looks like 10:30am, I can't read from down there, there we go, 10:30am and Friday at 2:00pm. So there's lots of opportunities to come and talk to us about the performance of your app or talk to people who write Shark or Xray and figure out how to use those tools to really get good views into what's happening in your application.