WWDC01 • Session 201

How Threading Can Benefit Your Application on Mac OS X

Hardware • 45:49

Learn to leverage single and multiprocessor hardware with Mac OS X to significantly boost performance of your application with threading. Information on the different threading models in Carbon, Cocoa, and BSD is covered in depth.

Speakers: Mark Tozer-Vilchez, Matt Watson, Robert Bowdidge, Ivan Posva

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Good morning. Welcome to session 201, How Threading Can Benefit Your Application on Mac OS X. My name is Mark Tozer-Vilchez. I'm the Desktop Technology Manager in Developer Relations. So today what I'd like to do is start the conversation with-- actually, first of all, I've gotten some feedback that our session title is the longest one at the conference this year. So I've been asked to take some input at the end of the session for title suggestions. So make sure you give me some input. I need to shorten it, is what I'm being told.

What I'd like to do is go over some of the ways that Apple Computer has been working to optimize system performance, and how you can optimize system performance through hardware and software. So let's take a look at how we introduce systems at these keynote shows like Macworld and so forth.

And aside from new features, functionality, and design, performance is one of the key messages that we deliver to our customers. Customers want to be able to say that if they're going to buy this new piece of hardware, their applications are going to run faster. So performance is a key message here.

How does Apple address performance aside from increasing the megahertz? Well, there are different ways we can do that. We can use the processor technology, the G4, which we've migrated throughout our product line; the most recent addition has been the Titanium PowerBook G4. Another way is that last year at Macworld New York, we introduced dual processor systems, so you now have two G4 processors, with two Velocity Engines available to you as well. The unified chip architecture for I/O on the bus has also been improved, so information can be passed along the bus much more efficiently.

So what are the ways that you as a developer can take advantage of these technologies? Well, there are APIs that we put out there. The MP API has been out for at least three or four years. The Velocity Engine API has also been out there. So those are ways that you can optimize your code to take advantage of some of these hardware technologies. Multitasking and multithreading are also ways that you can increase performance purely in the application, without having to wait for a faster processor.

At the root of all of this is obviously OS X. And that's what you're going to learn throughout the whole week: how OS X provides all the functionality and abilities for you to optimize your code without having to worry about when Apple is going to deliver a gigahertz processor.

Well, today, what you should really be asking yourself is: when are you going to optimize for the APIs that we have today and increase performance on today's processors? I also want to clarify some of the definitions that we use in this type of presentation and conversation about optimizing for hardware. Multitasking is the ability to handle several different tasks at the same time.

Multiprocessing is the ability of the OS to actually utilize more than one processor at the same time. SMP, or symmetric multiprocessing, is the ability of the OS to be intelligent about how it schedules the execution of code across multiple processors, to balance the load, so to speak.

Can we go to the Demo 3 machine? To give you an example of what that looks like, here's QuickTime running three different movies. And as you can see, the system is actually utilizing both processors, and they're pretty well balanced. QuickTime itself is... Mr. Schaffer, would you go to Window or Aisle? ...is actually threaded. So because of SMP, the OS basically knows how to hand off to each processor the task of pushing pixels to the screen.

If QuickTime itself was not threaded, if it was a single-threaded application or technology, you would see one processor being overloaded and the second processor only getting some time when the operating system divvied up some of the tasks. Can we go to slides, please? So what I'd like to do is introduce Matt Watson, a Core OS engineer, who will deliver today's presentation on why you should be threading your application for Mac OS X.

Thanks, Mark. So I work in the CoreOS group at Apple, and we're responsible for most of the lower-level Darwin source code. I specifically work on the user-level implementation of Darwin. So, why do you want to use threads in your application? As Mark alluded to earlier, the marketing message we've been sending out for Mac OS X is that it's fully preemptive and supports symmetric multiprocessing, so customers are going to expect that their applications will run more efficiently and scale better on multiprocessor machines.

So, to take advantage of that, if your application can perform tasks simultaneously and you're running on a multiprocessor, the customers are going to expect that added performance benefit. Since Mac OS X is fully preemptive, all the applications are getting time-sliced simultaneously, and if your application has multiple threads, it's going to get more than a single-threaded application's share of that time. So you're actually going to see the benefit directly in your application.

In some applications, you might notice that synchronous requests would block the UI. So we want to avoid that. We want to make sure that if you're coding and you're using an API that could block for an indeterminate amount of time, an extra thread in your application could help prevent the UI from blocking and let that other request complete.

Also, polling is considered bad on Mac OS X. We don't want applications to constantly check for state. One way you can avoid that is through some of the synchronization mechanisms that I'll discuss in a little bit, where you can have a thread waiting in the background for an event to occur, and it will only continue after that event has actually fired.

That said, in most cases your application may not need to be multi-threaded. There are times when the added complexity of a thread may change the logic of your application: if it was written originally to be single-threaded and you have lots of global data, your locking mechanism may not be as robust as you originally designed. There's also a little bit of added overhead for an extra thread to come into your application; there are some kernel resources associated with that.

And you basically need to decide when you're designing your app which portions of your application make more sense to be multi-threaded. There may also be other options that make sense. In a GUI application, you always have a run loop going that's polling for events. And if you can use a timer for a short-lived task, that might be a better option in certain cases.

Some of the overhead I was talking about: in a preemptive multitasking operating system, a context switch occurs when the kernel decides to switch between different threads that are running on the system. There's data associated with every thread: basically, the register state that each thread takes up needs to be saved and restored between threads. There may also be extra register state, depending on whether that thread has used floating point or even the Velocity Engine.

And that amount of data obviously makes switching between threads a little more expensive throughout your application. The thread memory footprint is also something that might be significant. In Mac OS X, every thread gets half a megabyte of virtual stack space, for instance, which is controllable when you create the thread. But you have to be aware of that. So if you're creating a thread that only calls one function, you may want to reduce the stack space, and I'll talk about the APIs to do that a little bit later.

Thread creation in and of itself is a little bit of overhead that you need to worry about because the act of creating a thread introduces those kernel resources. There's a set of Mach APIs that is used under the covers to create threads. And the calls that you use to create the stack and the actual thread itself take up a little bit of overhead.

Throughout the APIs that Mac OS X provides for multithreading, there are some common concepts. All these APIs let you create a thread and let the thread exit itself. There are synchronization primitives that let you coordinate the events that are occurring between multiple threads. And every API will have a set of thread-safe services.
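
To make those common concepts concrete, here is a minimal Pthreads sketch of that lifecycle; since every API discussed in this session is layered on Pthreads, the same ideas apply one level up. The worker function and its argument are hypothetical:

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical thread body: it does its work and returns.
   Returning from the start routine is equivalent to calling
   pthread_exit with that return value. */
static void *worker(void *arg) {
    printf("hello from thread %s\n", (const char *)arg);
    return NULL;
}

int main(void) {
    pthread_t thread;

    /* Create a thread with default attributes... */
    pthread_create(&thread, NULL, worker, "one");

    /* ...and wait for it to finish. */
    pthread_join(thread, NULL);
    return 0;
}
```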

The documentation describes things that you can and can't do for multiple threads. We're working toward making it a lot easier so that you don't have to worry about which APIs are thread-safe by making sure that everything that you call will be thread-safe, but that's not quite there yet.

In Mac OS X, threads are the scheduling primitive. It's the unit that the kernel uses when divvying up the work that needs to be done at every time slice. Our threads are fully preemptive, so the kernel will interrupt a thread that's running to start the next thread. There are some exceptions to that where we have some scheduling models where a real-time-like API can be used to specify somewhat of a deadline schedule, where you can say, I want my thread to run this long. But in most cases, the default threads that get created are normally preempted.

We use a priority-based scheduling model. So by default, all threads get the same priority. When you create threads, or after a thread has been created, you can change or modify that priority. So if you have a thread that needs to be more important than other threads, either in your application or system-wide, you can change that. You can also depress the priority of a thread if you want that thread to be more in the background, or if it's just doing some low-priority work that doesn't need to be in the user's face.
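
A rough sketch of what adjusting a priority after creation looks like at the Pthreads layer; the helper name is hypothetical, the exact policies and priority ranges are system-specific, and real code should check the return values:

```c
#include <pthread.h>
#include <sched.h>

/* A sketch of depressing a thread's priority after creation. */
static void lower_priority(pthread_t thread) {
    struct sched_param param;
    int policy;

    /* Read the thread's current scheduling policy and priority... */
    pthread_getschedparam(thread, &policy, &param);

    /* ...and push the priority down one step, staying in range. */
    if (param.sched_priority > sched_get_priority_min(policy))
        param.sched_priority -= 1;
    pthread_setschedparam(thread, policy, &param);
}
```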

We use a one-to-one threading model, which means that as the APIs I describe call down into the low-level kernel threads, there is a single kernel thread per high-level thread. So as I discuss these APIs, keep that in mind. There are other implementations out there where you multiplex several user-space threads onto one kernel thread, but the added complexity of that, and scalability on MP machines, make it a little bit more difficult to justify on OS X. One caveat: if you manage a thread's priority through the Mach API and not through the API you're using, that API may not notice it, so later on your thread scheduling might behave unexpectedly.

So this is a slide we've shown before, and what it's trying to show is an example of what happens if you take an application and run it on a multiprocessor system with more than one thread. And the numbers along the edge of the bars there are the multiplier factors for the improvement in performance. So you can see the first, third, and fourth numbers are all under two, but that second number is above two, which is kind of interesting.

And it's an exceptional case where you might wonder how I could get more than 2x performance improvement on a multithreaded application. Well, if you have two processors in the system, you also happen to have two caches. So if your data set fit in the primary cache, you wouldn't see more than 2x performance improvement.

But what happened in this case was that the data set was able to be split up across the two caches, and you actually got a little better than a 2x performance improvement. So you may be surprised, depending on the size of the data that you're manipulating, how good your performance can actually be.

The Mach implementation was designed from the ground up for full symmetric multiprocessing. So we don't have any special-case code for a single processor versus a dual processor; everything on the system is designed to run on a multiprocessor machine. The Velocity Engine context I mentioned earlier is something that has to be saved and restored across processors, and that context is a little bit expensive to save. That's what I was mentioning about the overhead you might need to worry about.

On OS X, we use the same kernel binary for either a single processor or a multiprocessor system. We don't have a special install for a UP or an MP system. This basically makes our own development a little bit easier. We don't have to QA two different kernels on the system, and it's a little bit easier for us to do the install as far as making sure that customers all have the same bits on their machines.

The Mach scheduler was inherited from the OSF, the Open Software Foundation. We use a scheduling framework that they designed, but we've modified it heavily for our own use. It has a global run queue, meaning that all the tasks on the system, the tasks that are runnable, I should say, get switched through on every context switch.

The system notices that if you're on a multiprocessor system and you have an idle processor, it will go and schedule a thread on the most idle processor at the time. So we don't really have a notion of thread affinity, a term you might have heard before, where you keep a thread running on a single processor for a long time. The kernel will basically balance the resources of the system as best it can. And as I've said before, preemption is key here: every thread gets preempted as it goes through the run queue.

The user frameworks that you'll see when writing a multithreaded application are probably just the same frameworks you're using for your application. There's nothing really special. Most of the frameworks that we have have an API for threading. In Carbon, it's the MP or MPTask API that has been out for a few years. In Cocoa, there's NSThreads, which are used both in the non-GUI and GUI frameworks.

And in Java, we have Java threads, which is just a special implementation using the underlying primitives in Darwin. All three of those APIs depend on the Darwin Pthreads, or POSIX threads, APIs. And since I work in the CoreOS group, that's the API set I'm most familiar with, and I'll talk about it in a second. Pthreads, as I said, is the basis for all the threading models. Every API that I described goes through the Pthreads layer. So when we make changes and enhancements to the Pthreads layer, all of those API sets take advantage of that.

I put "light implementation" in quotes there. The history behind that is that when we were trying to decide on an API we could use to implement those higher-level threading models, we chose Pthreads, and the implementation decisions were driven by those higher-level APIs. So if you look in the header file or in our documentation, you might see that there are some API calls missing. In general we're working toward fleshing out that API, but the design goal was to help ensure that the higher-level Carbon, Cocoa, and Java APIs could do their job. We use a one-to-one Mach-to-Pthread implementation, as I said, which reduces the complexity of our user-level code and helps scaling on an MP system.

In Pthreads, there are some common API uses and misuses. If you haven't done threading before, Pthreads is a fairly well-defined standard; you can go to your local bookstore and get a book on it. I'll talk about some of the ways you'll need to be careful when using Pthreads on our implementation. For instance, we don't have any system-wide types: the Pthreads specification provides for global, shared-memory-based mutexes and condition variables for signaling across processes. Those can be implemented in your own application, but we don't provide those APIs right now.

One thing that you might need to remember is synchronization is not cheap in Pthreads. The model that Pthreads uses has a mutex lock associated with a condition variable. And for those to be properly used in an atomic fashion, we have to use some kernel resources to do the signaling when you're synchronizing between threads.

The default behavior for a Pthread as specified by POSIX is that threads are what's called joinable, which means that when you create a thread, the operating system will hang around and wait for that thread to finish. If you want your thread to go away and do its job and you don't want to worry about it anymore, there's an API to detach that thread, which just means let it do its job, I don't want to hear about it, let it finish on its own.
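
A minimal sketch of that detach pattern; the function names are hypothetical:

```c
#include <pthread.h>

/* Hypothetical background job that nobody waits on. */
static void *background_job(void *arg) {
    /* ... do the work ... */
    return NULL;
}

/* Create the thread, then detach it. Once detached, no one can
   pthread_join it, and its resources are reclaimed automatically
   when it exits: "let it do its job, I don't want to hear about it." */
static void spawn_and_forget(void) {
    pthread_t thread;
    pthread_create(&thread, NULL, background_job, NULL);
    pthread_detach(thread);
}
```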

As I mentioned earlier, the stack space for a thread is by default half a megabyte. Now, it sounds like a lot, but it's all virtual memory that is used on demand. So if you look at a process listing, you might see your application using a lot of virtual memory, but unless it's actually been touched, the system hasn't allocated that memory for your application yet.

So it won't cost you as much as you might think. But even given that, if you have a lot of threads running in your application, you may want to create them with an attribute saying, I want my thread stack size to be smaller. And if you do that, then you can limit the visible virtual memory used by your application.
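
A minimal sketch of requesting a smaller stack through a thread attribute; the 64 KB figure is purely illustrative:

```c
#include <pthread.h>
#include <limits.h>

static void *small_worker(void *arg) {
    /* ... a thread that only calls one small function ... */
    return NULL;
}

/* Request a smaller stack via an attribute. Implementations
   require at least PTHREAD_STACK_MIN, so clamp to that. */
static void spawn_small_stack_thread(void) {
    pthread_attr_t attr;
    pthread_t thread;
    size_t size = 64 * 1024;

    if (size < PTHREAD_STACK_MIN)
        size = PTHREAD_STACK_MIN;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, size);
    pthread_create(&thread, &attr, small_worker, NULL);
    pthread_attr_destroy(&attr);
}
```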

In the POSIX specification for condition variables and signaling, there's the notion of a predicate: when you want to signal that something has happened, there's a mutex and a condition variable associated with that, but there's also some external condition that needs to be checked. So you can have a global, volatile variable that says, "This thing happened." If you just use the condition variables and mutexes without having that global variable, the API won't work properly. This is discussed in a lot of POSIX thread specification texts; you can just look it up. The most common error people make is trying to do signaling without a predicate.
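
This is the classic predicate pattern; a minimal sketch, with all names hypothetical:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool event_happened = false;   /* the predicate */

/* Waiter: blocks until the predicate is true. Using a while loop,
   not an if, also guards against spurious wakeups. */
void wait_for_event(void) {
    pthread_mutex_lock(&lock);
    while (!event_happened)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* Signaler: sets the predicate under the lock, then signals. */
void fire_event(void) {
    pthread_mutex_lock(&lock);
    event_happened = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}
```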

Then there's our implementation of pthread_cancel. When you start using threads, or if you've used threads on other systems, you might want to try to cancel or kill a thread that's currently running. And this is dangerous for a few reasons. One is that with asynchronous cancellation, you're basically telling the system, I don't care what the thread's doing, I just want it to go away. That's dangerous in the sense that there could be kernel resources associated with that thread that may not be properly cleaned up.

There may be some other data associated with that thread in some other task that is not getting cleaned up, like a file descriptor that's open or a file that's taking up space on the disk. So we recommend using pthread_cancel in its deferred, or synchronous, mode, where you basically make a request to the system that you'd like this thread to quit. And in our API, we have a pthread_testcancel call: if you're in a long-running, compute-bound loop, you can check to see whether the thread has a pending cancellation request, and if so, the thread will exit at the pthread_testcancel point.
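
A sketch of what that looks like; the work functions are hypothetical:

```c
#include <pthread.h>

/* A long-running, compute-bound loop that honors deferred
   cancellation (the default cancelability mode). */
extern int  work_remaining(void);
extern void do_one_chunk(void);

static void *crunch(void *arg) {
    while (work_remaining()) {
        do_one_chunk();

        /* If another thread has called pthread_cancel on this thread,
           it exits here, at a well-defined point, rather than being
           killed in the middle of who knows what. */
        pthread_testcancel();
    }
    return NULL;
}
```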

The POSIX specification also provides for system-defined cancellation points, like most system calls. And if you look in the Darwin source base, you'll see that we're busily working on implementing those, because it's actually a better way of doing the cancellation: if I'm in an open or a read system call and I tell that thread to cancel, it should just break out of that open or read system call, as if I had interrupted the system call.

The Pthreads documentation on our system right now is a little sparse. I usually point people to the Open Group site, because they have pretty extensive documentation on both the UNIX 98 standard and Pthreads specifically. In general, we use that as the model for our implementation. We will be providing more Pthreads documentation on the system at some point in the future.

So in the Carbon APIs, as I said, the MP Task API is what you'd probably look at for your multithreading needs. A quick overview: in Mac OS X, there's the notion of tasks and MP tasks. Tasks have classically been the process notion, where you have an application, its address space, and all of its threads.

MP tasks, on the other hand, are threads within a Carbon application. And as you know, in Mac OS X, all applications are in separate address spaces. So some of the APIs that you may have used in classic Mac OS 9 are not going to work the same way; you can't do signaling between applications right now using the MP Task API.

The API here is pretty rich. There are a lot of mechanisms to do synchronization between MP tasks. There's a semaphore model. There are message queues and event groups. All three of these can be used in different ways depending on your application. If you have some client server model, if you have a worker thread model, you can decide which one works best for you. The other API that's a little bit unique here is the Critical Region API. This basically lets you do mutual exclusion, and it also allows for recursive entry to those regions.
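
From memory, the critical-region calls look roughly like this; treat the exact signatures as assumptions and check the Multiprocessing Services headers before relying on them:

```c
#include <CoreServices/CoreServices.h>   /* Multiprocessing Services */

/* A critical region gives mutual exclusion and, unlike a plain
   mutex, allows recursive re-entry by the same MP task. */
static MPCriticalRegionID gRegion;

static void InitRegion(void) {
    MPCreateCriticalRegion(&gRegion);
}

static void TouchSharedState(void) {
    MPEnterCriticalRegion(gRegion, kDurationForever);
    /* ... operate on shared data; re-entering from this same
       MP task would be allowed ... */
    MPExitCriticalRegion(gRegion);
}
```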

There are atomic operations that are present in this API that are kind of handy for atomic increment and decrement and test and set type instructions. These are all done very efficiently, and they're probably very close to the same implementation as on Mac OS 9. Some of the APIs that exist on Mac OS X that use MP Tasks under the covers are the Synchronous File Manager APIs and the Open Transport APIs.

If you'd like to see an example of some of these APIs, the tech note that we usually refer to is 1104; the URL is kind of hard to read there. But in general, that will give you background on what we're doing with the MP Task APIs and some examples of the thread-safe services that you can use in Carbon.

The documentation specifically for Multiprocessing Services is on developer.apple.com, under techpubs, mpservices. And all these documents are evolving to reflect the current Carbon API. The second framework I'll talk about is Cocoa. It's the high-level GUI framework that Apple has presented as an object-oriented environment for doing application development.

The NSThread API is very simple to use. There aren't very many entry points to it. Basically, you can create a thread, and you can get the thread's state at any time. The preemptive nature of the thread isn't unique; all the threads in the thread models I've described are preemptive. There's an exit notification for NSThreads: even though the thread is detached, meaning that it can go away and you can forget about it, you can also register for a notification saying, I'd like to know when this thread goes away, to clean up resources. Common to most thread APIs is the notion of per-thread data, and when you extend that into an object-oriented environment, the per-thread data becomes a per-thread NSDictionary. So you can have keys and values associated with your thread using the nice high-level Cocoa APIs.
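
At the Pthreads layer underneath, the analogous mechanism is thread-specific data: one key, a distinct value per thread. A minimal sketch, with all names hypothetical:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t cache_key;

/* The destructor runs at thread exit, once per thread's value. */
static void destroy_cache(void *value) {
    free(value);
}

/* Call once, before any thread uses the key. */
static void init_cache_key(void) {
    pthread_key_create(&cache_key, destroy_cache);
}

/* Each thread lazily creates its own 256-byte scratch buffer. */
static void *thread_cache(void) {
    void *cache = pthread_getspecific(cache_key);
    if (cache == NULL) {
        cache = calloc(1, 256);          /* hypothetical per-thread data */
        pthread_setspecific(cache_key, cache);
    }
    return cache;
}
```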

In NSThread, there's an AppKit extension. Even though NSThread is defined in the Foundation classes, if you look in the AppKit, the Cocoa framework, you'll notice that there's a method in there that says detach drawing thread. What that lets you do is give the system a hint that you're going to be creating a thread that might be interacting with the window manager, and the AppKit will set up special state to make sure that all its interactions with the window manager are thread-safe. Now, if you're writing a pure Quartz application, that interaction is already thread-safe.

In fact, the way that you write a Quartz application without using any of the Cocoa frameworks is to use one connection to Quartz per thread, and that provides all the synchronization you need. You still have to protect your global data: if you have multiple threads running that are all trying to communicate with the same connection, you need to use the synchronization mechanisms that I've described to make that work. But in general, the AppKit extension does all you need for a Cocoa thread to let the system know that you're going to be doing drawing.

NSThreads are self-aware, meaning there's no global notion: you're not going to be enumerating all the NSThreads on the system, and you're not really going to be handing NSThreads off to other objects in the system. All the APIs I've described, since they're layered on top of POSIX threads, can ask for their POSIX thread.

And then once they've done that, they can perform any of the POSIX thread APIs, like changing their priority or finding out what the stack size is for the thread. But in general, NSThreads kind of stay in their own realm. They have a separate run loop. So usually when you're creating an NSThread, since you're already in a GUI application if you're using Cocoa, there's a main run loop that's associated with the first thread that's been created on behalf of the system.

But new threads that get created are going to have their own run loop, because those threads will probably have a different signaling mechanism and different events that are occurring that need to signal back and forth between other threads. So other Cocoa APIs that have to do with run loops are timers and notifications. And you can send those between threads with a little bit of synchronization.

There's a concept of an autorelease pool. If you haven't done any Cocoa programming yet, this is a wrapper for objects which get created and destroyed kind of on the fly. So you call a method, it returns you an object, and one thing that Cocoa tries to help out with is the memory management. If you don't explicitly create an object, it just comes back from a call, it gets what's called autoreleased. And the autorelease pool works on a per-thread basis.

So the main run loop has an autorelease pool. Every time through the event loop, it releases all these objects that may have been created on behalf of another method call. And in a separate thread, you need to make sure that you're maintaining an autorelease pool as well, because all the messages that get sent on that separate thread will need their returned objects autoreleased eventually.

The documentation for NSThread is also on developer.apple.com, at a very long URL. Actually, the Cocoa documentation is very well done, and there's, I believe, an O'Reilly book out that you can go get now. As for future developments: if you follow Darwin, since I'm involved in the CoreOS group, we have the ability to put all of the work that we're doing daily into the public CVS repository.

So if you go to the Darwin web pages, they will tell you how to check out any of the CVS repositories for all the work that we're doing. Specifically, the Libc project is where the Pthreads code lives, and the XNU project is where the Mach kernel and the Mach threading model live.

Some of the things we're working on, with help from the Darwin community, which is very interested in our threading implementation: priority inheritance is an issue where, if you have multiple threads, lock contention becomes a problem when a higher-priority thread is blocked waiting for a lower-priority thread to run. There are some solutions that you can use in your application to work around this problem.

In general, the system should help out by temporarily raising the priority of the thread that needs to run so that the higher-priority thread can continue. That concept hasn't been implemented yet, but we're working on it. Then there are the general API expansion issues of the Pthread spec that we've provided: we've gotten a lot of requests for specific functionality that's missing. We know about that, and we're working on it.

The other thing that we've been focusing on is performance. Like I said in the first couple of slides, when you're deciding whether you should use multiple threads in your application, or how you should use them, we don't want to be an impediment there. We don't want the choice of whether to use more than one thread in your application to be forced by a performance issue. In general, you should use multiple threads if your application has a data model that works well with that.

If you have lots of data whose processing can be parallelized, where you're splitting up chunks of data, as in a graphical application that's doing tiling (the best example we've used a lot is Photoshop, for some of its filters), those things lend themselves very well to multithreading.

And we don't want you as a developer to have to worry about whether you're even on a multiprocessor system. In fact, if your application is written properly, you should just automatically take advantage of that second processor and your customers will be happy because they paid the money for the multiprocessor box and they're actually getting the performance that they would expect.

We're working on providing more thread-safe services. As I've mentioned, in the Carbon APIs and even in the lower level Darwin APIs, we've been working pretty hard on making sure that you can call the APIs that you want without having to worry about creating your own locking mechanisms around the parts of the API that are not thread-safe. So right now we have a couple demos. First I'd like to bring up Robert Bowdidge, who's going to show an example of some of the developer tools that we're working on. Thanks, Matt.

So what Matt has done is explain to us what the APIs are for using threads, and a little about the reasons why we might use them. What I'd like to do is give some case studies and explain how some of Apple's own tools use threading. Now, the interesting problem, so let's start this up, is that this is going to be a somewhat unauthorized talk. I've talked with the groups a bit. But in general, what we're going to do is reverse engineer these apps on the fly and try to understand what's going on.

The way I'm going to do it is with a handy-dandy little performance tool. This is a pet project, not yet on the developer CD, that tries to visualize how threading goes on. Now, the first app I'm going to look at is the Finder. OK. Let's get some action here.

What we're seeing here is a timeline view in Thread Viewer. Each of the little blocks there represents about a 50 millisecond interval, and each of the bars represents a thread. And so the idea here is that you can see there are three threads currently in the Finder. Let me click around.

The colors change according to what's going on. So for example, green represents that there's currently execution on that thread. Yellow represents that execution occurred but is not currently occurring, so there was some execution during the last sample. Gray, as in here, represents that the program was waiting in the run loop. Red represents that the thread was waiting on a lock. Now, the first thing I'll show you is that there's basically one thread that does most of the action. That is the main thread, and that's how most applications run.

So most of the drawing, most of the UI logic, is going on there. The second thread from the bottom, the one that's locked, represents what the Finder people call a sync thread. The idea is that they cache a lot of the information about what's going on in each folder. However, when you enter a folder that you've already seen, you need something that will go and make sure that nothing has changed in that folder. And that's what the sync thread does.

So the idea is it quickly goes and looks in that directory on a separate thread. Notice that it always exists, and it's always blocked on a lock, so it can be started up quickly. And by having it on a separate thread, you not only get scalability, doing things in a multiprocessing way so you get quick response, but also, since you're accessing the disk, you know that if that access blocks, because let's say it's an iDisk, you're going to be able to wait for the response without actually stopping the UI thread. And that improves the user experience, as Matt was telling us.

Let's start the Finder up again. The other thing that you'll find as we click around is that each time we enter a new folder, new threads get started. So there were the remnants of one here, and then we have a thread here. And that thread only stays around for a short instant and then goes away again.

And what's happening there is that, once again, the Finder guys don't want the UI to block. The Finder would look miserable if, every time you went into a new folder and it went out to touch the disk, the entire Finder locked up. It's a horrible user experience.

And so to avoid that, as many of you probably will want to do in your own apps, they have the disk accesses going on in a separate thread. When they enter the new folder, they go and catalog it on a separate thread, and that way they know that the Finder is not going to block while they collect that data.

A third example is if we do a copy. So let's go to the home directory and duplicate an application. And what we find is, again, up here, we've created a new thread. The idea is that the copy is something that's going to be a long-running thread. It's something that's going to be touching the disk, and so it's going to be blocking, so you don't want it on the main thread.

And it's something that should be running in the background, because it's probably something that the user doesn't care whether it completes immediately, and they want to be able to do other activities. And so the idea of being able to split off this background task or background process is an important one.

So what you've seen here is that the Finder is using threading to avoid blocking on I/O, in case it's accessing disks that are remote, for example. You've seen it trying to improve the user experience by making sure that blocking work happens on separate threads, and you've seen the idea of using threads for background actions. Okay, the second example I'd like to show is iTunes. So let's go back to Thread Viewer. We'll see there are a few more threads in this one. So let's start out by getting some music playing.

Okay, so we know it's actually playing; you know that this isn't canned. And what you see here is the activity going on in the threads. The thread at the bottom, again, is the main thread. It's sitting there doing all the UI work. And if we actually had the visualization part of iTunes going, you'd see that thread basically pegged, constantly running.

The next thread up is basically the data decompression, which is actually running as a deferred task, as a Carbon thread. What you'll also see occasionally on this third thread is some green blocks going by. And what's happening there is that there's actually a separate thread being used for accessing the disk.

And so the idea is that this thread goes off, and every now and then when it needs data, it grabs as much data as it can, stores it in a buffer. Then there's a separate thread that actually does the decompression. And then the thread at the very top where you see all the activity is actually the thread that sends data off to the audio device.

On the disk, we want to do large accesses. We want to grab a huge amount of data and then do the decompression. With the audio device, we want to minimize latency. And so the idea is we want to throw a few little bytes at a time out to that audio device.

And so by having that on a separate thread, we can easily send the data that's needed, in as timely a manner as possible, so we have no dropouts. In actuality, the three threads that you see there, the decompression, the disk reads, and the audio playback, are all controlled by the sound system, the Sound Manager. So they're not threads that iTunes went to any trouble to create.

But once again, you're seeing cases of iTunes using threading to avoid blocking on disk I/O. You see it taking the tasks that are time-critical and putting them on separate threads. And you're seeing the idea of parallelizing the code by breaking the major parts of this sort of pipe-and-filter model into separate threads. Okay, those were just two examples of how Apple is actually using threading. Hopefully the people who are available for Q&A can give you some more ideas about when to use threading and when not to. Thank you.

Thanks, Robert. So the second demo we have is Ivan Posva, who works on the Java Virtual Machine. And as I said earlier, there are definitely cases where your application is more inclined to be multithreaded, especially if you have lots of data that can be operated on in parallel. So he's going to give us an example of an application where this is the case and helps out a lot.

Okay, good morning. What I have is basically a digital elevation model of Switzerland, and I wrote a Swing app that renders scenes within Switzerland. So I have a Swing UI element down here. It says one, so I'll spawn this rendering on one thread. It goes off, tiles the image into small pieces, and does that on a thread.

You see in the task manager that one CPU is mostly used, and the other CPU is the Java UI making use of the spare processing power to display. So we saw that it took 15.7 seconds to display this image. If I go to two or even three threads on this machine and restart the rendering, we can see that the UI updates much quicker.

It's pegging both the CPUs, calculating. So what else do I have? I can do the same thing with the satellite image, just taking the... and still use the UI while it's calculating the stuff. So Java itself has built-in support for threading in the language.

It's rather easy to use threading, and rather easy for us to update the UI at the same time as you do the calculation. That's about it. So what was the performance benefit you saw with your multithreaded example? For this calculation, it was about a 1.8 scale factor, so that's pretty good.

If you consider that part of the update is happening while you're calculating, it's pretty much using the processors at max speed. So it's also a feature of the Swing implementation that the developer doesn't have to worry about the locking for drawing to the UI; that just happens for them? That happens behind the scenes for the user. You just say, repaint this area, here's your new image. That's it. Great. Thanks, Ivan.

So I'll bring up Mark to finish off with information about this session and other sessions. What I wanted to do is bring up related sessions that you might be interested in to further your knowledge about tuning and about taking advantage of the hardware: session 121, Carbon Performance Tuning, which Robert will be at; session 123, Advanced Cocoa Topics; and session 140, the Darwin Kernel.

I also want to add that there will be a Birds of a Feather session at the end of the day today, at 6 p.m., here in Hall C, I believe, on the Velocity Engine. So we will discuss the updates on Velocity Engine there. We wanted to separate these two topics and allocate as much time as we could to threading. Last year I was up here on stage promoting threading to you developers, and at that time we didn't have dual processors.

I was promoting the fact that you should thread your application, without quite being able to tell you that, hey, we're coming out with multiprocessor systems. But if you thread today, as OS X matures, your application will run faster simply because the threading model within OS X is a lot more efficient than what's in OS 9. So today, with multiprocessor systems, you can see there's even more of a reason to optimize your applications. So take that into account.

So if you're bringing a single-threaded application over to Mac OS X, the customer experience is going to be, well, there might be some performance enhancement, but really it's more perceptual, because OS X is doing all the work for you. If you multithread your application, you get additional benefits, because OS X can now say, okay, great, I can move things around a lot quicker and use the thread model within the Mach kernel. Now, the next step is really optimizing your application.

So what we're doing is optimizing for MP, because just providing threads is not going to give you the efficiency of using that second processor. That's where SMP comes into play. Actually parallelizing your code, and seeing where it makes sense to have a thread go off to the second processor, is what factoring, or parallelizing, your code involves. And that's what we term optimization for MP hardware. That's what you saw in the example with QuickTime: balancing the load on each processor so that SMP itself can take care of the housekeeping there.
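
As a closing illustration of that kind of factoring, here is a toy sketch of splitting one data-parallel "filter" across a thread per processor; every name and number here is hypothetical:

```c
#include <pthread.h>

#define NUM_THREADS 2              /* e.g., one per processor */
#define N 1000000

static float data[N];

typedef struct { int start, end; } Range;

/* Each thread filters its own slice of the data,
   a toy version of the Photoshop-style tiling discussed above. */
static void *filter_slice(void *arg) {
    Range *r = (Range *)arg;
    for (int i = r->start; i < r->end; i++)
        data[i] *= 0.5f;           /* hypothetical "filter" */
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    Range ranges[NUM_THREADS];

    /* Split the data into contiguous slices, one per thread. */
    for (int t = 0; t < NUM_THREADS; t++) {
        ranges[t].start = t * (N / NUM_THREADS);
        ranges[t].end   = (t + 1) * (N / NUM_THREADS);
        pthread_create(&threads[t], NULL, filter_slice, &ranges[t]);
    }
    /* Wait for every slice to finish. */
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return 0;
}
```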