Application Technologies • 48:11
Using multithreading can significantly boost the performance and responsiveness of your application, particularly on multiprocessor Macs. Take a closer look at the threading models on Mac OS X, and learn how to leverage the threading architecture using the Carbon, Cocoa, and pthread multiprocessing APIs.
Speaker: George Warner
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Welcome to session 124, Threading on Mac OS X. If that's not where you're supposed to be, now's your chance to make a run for it. I'm not Mark Tozer, I'm not an evangelist, I don't play one on TV. So, George Warner, here I am, the schizophrenic optimization scientist. There's my email; if you ever need to get hold of me, it's [email protected].
So what we're going to talk about today is threading on Mac OS X. Our agenda is basically: we're going to talk about some terminology, so you'll understand the terms I'm using. We'll talk about the why and the why not. We'll talk about threading architectures. We'll go into some do's and some don'ts. This is pretty much stock from presentations I've made any number of times.
This next section, how to write thread-safe code: I really was kind of tasked with coming up with something really hands-on that you could take away, that would be useful when you sit down in front of your computer and actually get ready to write some code. And so I've got a whole new section there. Then I have a quick little demo, and then we'll break for some Q&A.
There we go, threading terminology. So thread versus processes. First off, a thread is an independent execution path, preemptively or cooperatively scheduled. And a process is a collection of threads plus the resources necessary to run them. Now the reason I make this distinction is we have this thing called a task. In Unix land, a task is a process, but in Carbon land, a task is a thread. So I kind of avoid using the word task. I'm going to say thread when I mean thread, and I'll say process when I mean process.
Multiprocessing is a special case of multitasking. I'll probably slip up and say multiprocessing occasionally; I usually mean multithreading. The main reason I explain this one is, I remember back when 8.6 was released, when we first released the dual processor G4s, a developer would come in with a problem, and I would suggest MP as a solution. And he'd go, but my customers don't have MP boxes. Well, MP as in multiprocessor and MP as in multiprocessing are two different things. So I'll typically try to use multitasking or multithreading.
Reentrant is a term that gets bandied about a lot, gets thrown around. Basically what it means is concurrently executing the same code, either time-sliced or actually in parallel on two different processors at the same time. So parallelism is a subset of concurrency: that's when you actually have the same code running on two processors at the same time.
So, thread states. There are five thread states that I typically use. I'll explain what these are; I kind of play fast and loose with them, I'm used to using the terms, but I want to make sure you understand what I'm talking about. Just to make it easier, I drew a cute little picture, if I can get to it.
Here we go. Oh, they stripped the text; they didn't get the graphics cleaned up. When you create a thread, it starts out waiting for a CPU. That's the orange box: it's ready. As soon as the scheduler runs, and it's of the highest priority and there's a CPU available, it'll actually get scheduled and start running, which is the green box. After its quantum, which is typically about 10 milliseconds, so 100 times a second, it'll get preempted and put back in the ready state, and someone else will get a turn.
From the running state, if it calls a blocking I/O routine, it'll go into a wait mode where it's waiting for the I/O to complete. It's basically a form of suspension. And when the I/O completes and it unblocks, it'll be woken up and go back into the ready state, where it's waiting for a CPU.
The top gray box is suspended. If while it's running or while it's ready to run, an external thread calls a suspend API, it'll be taken out of the run queue and put in the suspended queue where it's basically being held. It won't get any CPU time and it's sitting there. From either the running mode or from the suspended mode, it can be terminated, in which case it's basically a zombie until this operating system cleans it up. And that's the last gray circle that the text got wiped out of.
So why use threads? There's a customer expectation that your application is going to be responsive. And if he clicks on a button and your application starts running, and he goes to, like, move a window or access a menu or something like this, and he can't, basically he's got a nonresponsive application. He's less than happy with the performance.
It's an expectation that user interactions can happen at any time. So by using threads and putting synchronous calls and stuff like this over on their own threads, long execution, long computation cycles and stuff like this all on their own threads, you keep the main event loop responsive, you keep the user experience optimized.
No spinning beach ball. So scalability. We mentioned earlier the multiprocessors. If a guy goes out and spends the money to get one of the dual processor boxes, he kind of expects to see both processors getting used. And if he runs your application, and one processor is maxed out, and the other processor is completely idle, he's pretty much aware of the fact that he's not getting everything that he paid for. And so scalability is a good thing. MP hardware equals MP performance.
Speaking of scalability, this is a chart I drag out every year. So I think the first time I ran these numbers was in 1988, '89-- excuse me, '98 and '99, when we were working on 8.6, the MP implementation. We took a bunch of the common Photoshop filters and multithreaded them.
And for most things, motion blur, unsharp mask, Gaussian blur, we got an incremental amount of about 30%, 80%-- can't remember, can't read it-- and 90% increase. The one that's always interesting about this slide is that maximize here was 2.3 times faster than on a single processor. Well, the first time I saw this number, I thought there must be something wrong with my testing. There's only twice as much horsepower.
How could it actually be more than twice as fast? And so I went back and re-ran my data, re-ran my tests, and checked my timings and everything else. And after a bit of head scratching, I finally came to realize I didn't only have twice the horsepower, I also had twice the cache.
And my data set, what I was running the test data on, didn't fit so well in a single cache, but it fit extremely well in two caches. And so by dividing the job up between the two processors and the two caches, it actually ran better than just two times faster. And that's what we typically refer to as super-linear scaling.
So why use threads? So, preemption. More threads equals more CPU time. This is kind of a sneaky way to get more CPU time. If there are two threads running and they're both at the same priority, each is getting about half the CPU. But if you're running two threads and he's only running one, then you'll get two-thirds of the CPU time, and he'll only get one-third. So it's a sneaky way to grab more CPU time.
I call it the store shelf policy. If you ever go to the store and look at the laundry detergent, you see All and Cheer and Biz and Fizz, fourteen thousand different brands of laundry detergent. And you wonder why there are so many brands, until you realize that half of them are manufactured by one company. And it's basically their way of getting more shelf space. So this is kind of the same thing.
Synchronous request. I mentioned earlier about a responsive UI. If you've got a synchronous operation that blocks and prevents your UI from being responsive, put it in its own thread. If it blocks a separate thread, it doesn't matter. If it blocks the main thread, you're going to affect the user experience.
So polling is bad. This, to me, is the equivalent of, if you've ever gone on vacation with a nine-year-old: are we there yet? Are we there yet? Are we there yet? And for me, especially, about as annoying. So block instead. Tell them to take a nap until we get to grandma's.
I'll wake you up. So I wasn't talking to you. Not now. So why not? Well, a little fun with the clicker here. Added overhead. So it takes a finite amount of time to actually spawn a thread, to create it, and set it up, get it ready to run, get it scheduled.
We'll get there eventually. Here we go. So on a 2.7 gigahertz, our fastest G5 right now, it takes about 126 microseconds. So sure, you can do about 4,000 of those a second, but you'd probably rather be doing some computation in that time. So my rule of thumb is: anything that takes less than that amount of time to compute is a no-win to throw into a thread, when it's going to take that long just to create the thread.
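If you want to sanity-check that number on your own machine, a rough sketch like this (mine, not from the session) times pthread creation; the join is included, so it measures create-plus-teardown, and your numbers will vary by hardware and OS version:

#include <mach/mach_time.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static void *Empty(void *arg) { return NULL; }

int main(void) {
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);
    enum { kCount = 1000 };

    uint64_t start = mach_absolute_time();
    for (int i = 0; i < kCount; i++) {
        pthread_t t;
        pthread_create(&t, NULL, Empty, NULL);
        pthread_join(t, NULL);            /* include teardown in the cost */
    }
    uint64_t ticks = mach_absolute_time() - start;

    double nanos = (double)ticks * tb.numer / tb.denom;
    printf("~%.1f microseconds per create/join\n", nanos / 1000.0 / kCount);
    return 0;
}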
So the preemption time: every hundredth of a second, when a task gets preempted, it gets swapped out and another context gets loaded in; that's 30 microseconds. Now, we've improved this: on a 2.0 gigahertz system this was about 40 microseconds, and if you do the math, we did a little bit better than just linear. We've gotten that time down a little bit.
The time quantum, which I've already mentioned, is about 10 milliseconds, so about 100 times a second we do that switch. So if you've got thousands of threads, you're going to probably spend more time switching than actually doing any work. So you want to really avoid having too many threads. I'll talk a little bit more about that later.
So, why not? Memory. For every pthread that's spawned, we lock down 2K of system memory down in the kernel. And that's physical memory. So you definitely don't want to waste too many kernel resources. So, thousands of threads: probably not a good idea. Part of what we're locking down is the hardware context: 32 general purpose registers, 32 floating point registers, the floating point status and control register, the XER, link, and count registers, memory management registers, et cetera, Velocity Engine registers, pretty much the entire hardware context of that running thread. The 32 bit-- I mean, the 32 general purpose registers on a G5, obviously, are 64 bit. So that's a pretty big chunk of memory.
And the user land resources. This would be the frameworks that threads are implemented in at a higher level, like MP for Carbon, NSThread for Cocoa, et cetera. There's some user land resources. At the very least, there's a 512K byte virtual stack for each thread. Obviously, if you have a lot of threads, you can eat up a lot of your virtual address space. Not a good idea. So, added complexity.
If you have data structures that you have to protect, then you've got to write locking code. If you've ever written deadlock prevention code, it can get very complicated. If you're doing multiple things in multiple threads, you have to think of all those things at the same time. That's where I picked up the schizophrenia, I think. So shared data may require locks. And the last reason: non-thread-safe APIs. If an API is non-thread-safe, you can't call it from another thread.
Then you have to call it from the main thread. And so that's why not to use threads. So, 100 threads: I've already mentioned all of the reasons why a lot of threads is a problem. The more threads you have, the more memory you use, et cetera. So, other options. Cooperative threads. For the non-thread-safe APIs, if you need thread-type behavior, you can use the old Thread Manager from the 68K days, however many years ago that is now. Over 15 years, I think. YieldToAnyThread, YieldToThread, NewThread. Use cooperative threads to call non-thread-safe APIs. And if you have a repetitive task that repeats at a fixed interval, timers might be a better idea. So, we'll talk about some threading architectures.
So parallel threads with parallel I/O buffers. Now this is where each one of these threads and their data is completely independent of each other. You may be doing, like in a word processor, maybe grammar checking, maybe spell checking. You may be kerning. You may be doing 1,000 different things. But they're basically independent of each other.
In this case, we have parallel threads with a shared I/O buffer. This is what I call divide and conquer. This is where you take something like a huge image and break it up into little bitty chunks. Each one of those little chunks is fed off to a different thread, each one of those threads crunches its chunk in pretty much the same way, and then it's reassembled on the output to rebuild the picture with the effect.
So, sequential threads with multiple I/O buffers. I always like to call it the pizza oven, and it might be obvious why I think that. Some people call it the assembly line. I think the CS101 term is producer-consumer, or something to that effect. But basically you have multiple tasks where each task is linked to the output of the previous task, and its output goes to the input of the next task. And it doesn't look like it's immediately parallel until you realize that thread number three can be computing on the third paragraph, while thread two can be working on the second paragraph, while thread one can be working on the first paragraph.
Or the other way around: thread one can be working on the third paragraph, thread two on the second, and thread three on the first paragraph. So as the first thread finishes its work and passes it to the next one, it can start working on the next paragraph. So that's an example of sequential threads.
Now most applications have both parallel and sequential execution paths. And some examples are a driving simulator, where you've got AI, a physics engine, rendering, et cetera. An image processor, where you've got color correction, filters, special effects. A word processor: grammar checking, spell checking, kerning. All these things can happen in parallel and in sequence, et cetera. So this is about the only slide I have on implementations. I'm not going to nail down into any specific APIs. Carbon has the MP APIs, Cocoa has NSThreads, Java has Java threads. All of these are implemented on top of pthreads. That's underneath it all.
So, threading implementations. Now all these implementations have some things in common. Thread management: creating threads, deleting threads, terminating threads, suspending threads, setting thread priorities, et cetera. You can change thread behavior from round robin to FIFO, real-time threads, et cetera. Synchronization primitives: critical sections, mutexes, semaphores, spin locks, et cetera. And thread-safe services: this is basically all the toolbox things that are thread-safe, like malloc and free, et cetera. So we have some dos and some don'ts.
Do avoid the creation/destruction overhead. I mentioned it's about 126 microseconds to spawn a thread. What you can do to work around that is you can pool threads. You can preallocate them, block them against a job queue, and then they're not taking up any CPU time. When you're ready to actually do some work, you stick a job in the queue.
One of the threads wakes up, pulls the job out of the queue, executes the job. When it's done, it goes back around and blocks on the queue again. But you've avoided the creation time. The next time you do a job, signal the queue, same thing happens. You've avoided that overhead of creating the threads.
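As a rough illustration of that pattern (my sketch, not code from the session), here's a minimal pthread worker pool with a mutex-and-condition-variable job queue. The names are made up; a real pool would spawn a fixed number of Worker threads up front with pthread_create and have a shutdown path, but this shows the block-on-the-queue mechanics:

#include <pthread.h>
#include <stdlib.h>

typedef struct Job { struct Job *next; void (*run)(void *); void *arg; } Job;

static Job *gHead = NULL;
static pthread_mutex_t gLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gWake = PTHREAD_COND_INITIALIZER;

/* Each pooled worker blocks on the queue; it costs no CPU while idle. */
static void *Worker(void *unused) {
    for (;;) {
        pthread_mutex_lock(&gLock);
        while (gHead == NULL)
            pthread_cond_wait(&gWake, &gLock);   /* sleep until signaled */
        Job *job = gHead;
        gHead = job->next;
        pthread_mutex_unlock(&gLock);

        job->run(job->arg);                      /* do the work unlocked */
        free(job);
    }
}

/* Callers pay a malloc and a signal, not a 126-microsecond thread spawn. */
void SubmitJob(void (*run)(void *), void *arg) {
    Job *job = malloc(sizeof *job);
    job->run = run; job->arg = arg;
    pthread_mutex_lock(&gLock);
    job->next = gHead; gHead = job;
    pthread_mutex_unlock(&gLock);
    pthread_cond_signal(&gWake);
}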
Be data-driven. I'm kind of surprised by the number of times that I've run into really, really complicated thread code that kind of gets away from the basic way that we all pretty much learned how to write code, where we have a prologue where we open, allocate, retain our objects. We have a crunch loop where we read, modify, write our data. And then we have an epilogue where we close, dispose, release, et cetera. This is a good paradigm. It works. Do the same thing with your threads, as long as all of these calls are thread-safe.
In 8.6, when we first did this, so many of these things, like the open and allocate (NewPtr), weren't thread-safe, and the close and dispose weren't thread-safe, but the file I/O was thread-safe. So what we did is: you could write the prologue, spawn a thread to do the crunch, and then when the crunch was finished, the epilogue would happen back on your main thread again. Nowadays, with more and more of the toolbox being thread-safe, this is becoming less and less necessary. You can do all of this right there in your thread, in sequential code.
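In code, the shape he's describing looks something like this sketch (hypothetical names; the point is that everything called here must itself be thread-safe for the whole function to live on a secondary thread):

#include <stdio.h>
#include <stdlib.h>

/* A data-driven thread entry point: prologue, crunch, epilogue. */
static void *ProcessFile(void *arg) {
    const char *path = arg;

    /* Prologue: open, allocate, retain. */
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    char *buffer = malloc(64 * 1024);

    /* Crunch: read, modify, write. */
    size_t n;
    while ((n = fread(buffer, 1, 64 * 1024, f)) > 0) {
        /* ... transform buffer[0..n) and write it somewhere ... */
    }

    /* Epilogue: close, dispose, release. */
    free(buffer);
    fclose(f);
    return NULL;
}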
I've got my only "don't" on this page. Let me see. And I put it here for a reason. Don't suspend, resume, or terminate. Another operating system whose name I won't mention, for some reason thinks this is a good model. And it really drives me crazy when I have developers from that platform come over here and do it over here.
The problem with suspend, resume, and terminate is: if you look at the data-driven prologue/crunch/epilogue, if you suspend, what state is that thread in? Was it in the prologue? Is it halfway through the crunch? Is it in the epilogue? Has it started closing files? You have no way of knowing. So if you go and terminate that thread, chances are you've leaked memory, you've leaked ports, your retain counts are off; basically, your application is now in an undetermined state. So we very seriously discourage developers from using suspend/resume/terminate models.
So what can you do instead? Use synchronization primitives. You can have a queue, or even a mutex, that you signal to say abort. And in your crunch loop, however often is appropriate, you check and see if anything's in the queue. If nothing is in the queue, continue crunching. Do it often enough that you have a responsive UI, which is like two or three times a second, but don't do it too often; if you were doing a big image, you wouldn't do it on every pixel.
That's way too much overhead; then you're back into the "are we there yet" kind of thing. You just want to do it often enough that when the user hits cancel, within a reasonable amount of time your loop notices, aborts your thread, cleans up, disposes memory, falls out the bottom, and aborts the operation.
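Here's a minimal sketch of that abort check (my illustration; Image and CrunchRow are made up, and a plain volatile flag is the simplest possible "queue"; real code might use one of the synchronization primitives he mentions instead):

#include <stdbool.h>

typedef struct { int width, height; } Image;   /* hypothetical */
void CrunchRow(Image *image, int row);         /* hypothetical */

static volatile bool gAbortRequested = false;  /* set by the UI thread on Cancel */

void CrunchImage(Image *image) {
    for (int row = 0; row < image->height; row++) {
        if (gAbortRequested)    /* check once per row, not once per pixel */
            break;              /* fall out the bottom; epilogue cleans up */
        CrunchRow(image, row);
    }
    /* Epilogue: dispose memory, close files, report "aborted" or "done". */
}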
Don't over- or under-lock. I'll get into how to write thread-safe code later, and I'll talk a little more about locking. But basically, let's say you've got a tree structure. If you lock every single access point, you'll probably spend more time administering all those locks, locking and unlocking them, than you would actually spend accessing the data.
That would be over-locking. Under-locking would be having one lock for the entire data structure. So what would happen then is everybody that wanted to access would be blocked, waiting for whoever's got that one lock. That's definitely under-locking. So the answer there is to find a balance, and you know your data set's the best. You have to make the decision. You have to understand the trade-offs. You should have metrics and measure and meet your criteria for what you think is appropriate.
Don't spin wait. We've already mentioned the "are we there yet" scenario. In this particular case, what I'm talking about is when you're waiting on multiple things. One thing I see people do is they'll go: check that one, then check that one, then check that one, then delay for a little bit; then check that one, then check that one, then check that one, then delay for a little bit. And basically, they're still spin waiting. It wastes CPU cycles.
If you have multiple things that you can wait on like that, you could actually spawn three threads, have each one of those three threads wait on one of those things, and when any one of those threads wakes up, it could signal the fourth thread to say, this event has happened. And the fourth thread that's waiting on that event from any one of those other threads would then wake up, check that one, check that one, check, oh, this one's it, process that one, and then go back and wait for one of the other two.
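Sketched in pthreads (my illustration; WaitForSource stands in for whatever blocking call each event source provides), that pattern looks like this:

#include <pthread.h>

void WaitForSource(int which);     /* hypothetical blocking wait, one per source */

static pthread_mutex_t gLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gAnyEvent = PTHREAD_COND_INITIALIZER;
static int gPending[3];

/* One waiter thread per event source: it blocks, so no CPU is wasted. */
static void *Waiter(void *arg) {
    int which = *(int *)arg;
    for (;;) {
        WaitForSource(which);
        pthread_mutex_lock(&gLock);
        gPending[which] = 1;
        pthread_cond_signal(&gAnyEvent);    /* wake the merge thread */
        pthread_mutex_unlock(&gLock);
    }
}

/* The "fourth thread": sleeps until any source fires, then checks them all. */
static void *Merger(void *unused) {
    for (;;) {
        pthread_mutex_lock(&gLock);
        while (!gPending[0] && !gPending[1] && !gPending[2])
            pthread_cond_wait(&gAnyEvent, &gLock);
        for (int i = 0; i < 3; i++)
            if (gPending[i]) { gPending[i] = 0; /* ... process source i ... */ }
        pthread_mutex_unlock(&gLock);
    }
}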
Use separate threads to merge signals. There's also a nice Unix API, select() or something like that, that allows you to do the same type of thing: wait on multiple events at the same time. So, don't GUI. Now, this set of slides has been in my presentation for going on six years now, and I'm really looking forward to the day that I can remove that bullet point. It's becoming less and less true every day.
We're working diligently to get it to where we want it, where you can just do anything anywhere. But right now, you can't do user input, mouse clicks and stuff like that, anywhere but the main thread. But you can call Quartz. You can do your drawing. You can do OpenGL.
One of the tricks in this one, and Cocoa has the same type of thing, but I'll pick on Carbon this time, is PostEventToQueue. So if you've got a thread that computes a nice little graphical image, and you're ready to draw it onto the screen, you can use PostEventToQueue to tell the main event loop to update the screen.
And one of the nice things in the Carbon event system, especially for update events, is what we call event coalescing. So if you're updating something on the screen five or six times a second, but the main event loop is busy and hasn't updated the screen yet, when you call PostEventToQueue, it'll look in the queue and go, oh, there's already an event in there. It won't put another one in the queue. So you don't have to worry about overrunning the queue.
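The session doesn't show code for this, but from a worker thread the call sequence might look roughly like the following sketch. The event class and kind are made-up identifiers, and note that the automatic coalescing he mentions applies to update events; a custom event like this one would need its own already-queued check if you wanted that behavior:

#include <Carbon/Carbon.h>

enum { kEventClassMyApp = 'MyAp', kEventMyAppFrameReady = 1 };  /* hypothetical */

/* Called from the worker thread when a frame is ready; the main event
   loop sees the event and does the actual drawing. */
OSStatus NotifyMainThreadFrameReady(void) {
    EventRef event = NULL;
    OSStatus err = CreateEvent(NULL, kEventClassMyApp, kEventMyAppFrameReady,
                               GetCurrentEventTime(), kEventAttributeUserEvent,
                               &event);
    if (err == noErr) {
        err = PostEventToQueue(GetMainEventQueue(), event, kEventPriorityStandard);
        ReleaseEvent(event);   /* the queue holds its own reference */
    }
    return err;
}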
So this is our new section on writing thread-safe code. This is the part you'll hopefully take home and find most useful. When I was preparing this, I started working on it a little over a month ago, and I went on the web looking for a good definition of what thread-safe meant. And unfortunately, there are a lot of misconceptions out there about exactly what thread-safe means. But I pulled up a couple of definitions here, and I'll go through them. "Thread-safe code may be safely invoked concurrently by multiple threads."
Okay. Well, what does "safely" mean? Well, you know, I've been working on computers for 30 years. I've never ever seen one blow up and blow sparks all over the place. Regardless of what Hollywood would have you believe, I've never seen it happen. So "safely" here, I don't know, might be an OSHA rating or something, but it's not a very helpful definition.
So: "Thread-safe code can be called from multiple threads without unwanted interaction." Well, this is a little more useful. Maybe a little vague about what unwanted interaction is, but a little bit better definition. So here's one I found: "Thread-safe code is reentrant, or protected from multiple simultaneous execution by some form of mutual exclusion."
I don't know how you, you know, I know computer guys tend to be less English literate than maybe the rest of the populace. I know I certainly was. But I was always irritated when my English teacher, you know, I'd ask her about a word and she'd say, "Go look it up." I'd go look in the dictionary and I'd find five other words in the definition that I don't know what they mean either. So, I will say that I do know what these words mean, but I'll also admit I had to look some of them up.
Here's another one: "Thread-safe code is guaranteed to compute the same result regardless of whether it is run on one thread or many." This is probably the best definition, a little verbose, but once you kind of get thread safety, this is probably the definition that makes the most sense compared to what you've learned about thread safety. So I kind of munged all these together, and this is what I came up with.
Thread safe code must behave correctly in a single-threaded environment. If you've got code that crashes in the main thread, it's not thread safe, just by the fact it's not even main thread safe. So it must behave correctly in a single-thread environment. And here's the trick. It must behave the same in a multithreaded environment.
So the way it works in a single-threaded environment, that's the way you expect it to work in a multithreaded environment. If it doesn't, more times than not, it's because you've got some thread-safety issue. So... The problem with all these definitions is, I don't know if any of them really help you write thread-safe code. They just give you a definition.
In the course of researching this, I found that Joshua Bloch wrote a book on Java. I don't remember the title right off the bat; I think it's Effective Java or something like this. In that book, he actually had a section where he got a little more detailed about what thread-safe means, and he broke thread safety down into levels. So thread-safe, I've already mentioned.
Okay, I'm going to use that one. So, immutable. Immutable data is always thread-safe. If you don't have to worry about someone writing to it, you don't have to worry about it changing unexpectedly. So, things like maybe the processor type. Unless you've got some hybrid mixed Intel PC board, okay.
Chances are your processor's not going to swap out from under you in the middle of your code. So if you've got a routine that returns the processor type, chances are it's thread-safe. Immutable never requires any type of synchronization. If it's not going to change, you don't have to worry about somebody changing it on you. So, no synchronization. So, thread aware. This is code that can detect when it's being executed in a thread and behave differently.
I was trying to think of an example of that, and I forgot my example right off the top of my head. Cocoa. The Cocoa runtime is a perfectly good example. If your code never instantiates an NSThread object, Cocoa is smart enough that it doesn't instantiate all the locks and all the mechanisms that make the Cocoa runtime thread-safe.
But the moment that you instantiate an NSThread, it goes, "Oh, let's make sure that all these memory structures are protected." It instantiates all these locks and does the right thing. So this is code that's thread aware. It detects when it's executing in a thread and does the right thing.
So, conditionally thread safe. Specific usages are known not to be thread-safe. Say you've got an API with one parameter where, if you pass null as that parameter, it picks up some system default. That system default may not be thread-safe; it could be stored in a global that everybody's using. So maybe the condition on this particular API would be: as long as you don't pass null for that parameter, it's thread-safe. So it's conditionally thread-safe.
A good example of this one also is GetMainEventLoop, the Carbon API. It's only thread-safe if it's executed once to completion before another thread calls it. And the reason why is it has to instantiate the run loop. If you've never called RunApplicationEventLoop and you don't have a run loop instantiated, then the first time you call this, it instantiates the run loop for the application. And instantiating the run loop is not thread-safe.
So if you've called it one time to completion from the main thread, it's instantiated the run loop; then every time after that, it's going to return the run loop that's already instantiated, and that's always thread-safe. That's kind of like the immutable case: the first time you run it, it instantiates it, but now it's kind of an immutable object. It's not going to go away until your application quits. So the second and subsequent accesses are thread-safe.
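If you want that "first call instantiates, every later call just reads" behavior without the once-to-completion caveat, pthreads has a primitive for exactly this. A sketch with hypothetical names follows; this is not how GetMainEventLoop is actually implemented, just the general pattern:

#include <pthread.h>

typedef struct RunLoopLike RunLoopLike;     /* hypothetical type */
RunLoopLike *CreateRunLoopLike(void);       /* hypothetical, not thread-safe */

static pthread_once_t gOnce = PTHREAD_ONCE_INIT;
static RunLoopLike *gRunLoop = NULL;

static void InitRunLoop(void) {
    gRunLoop = CreateRunLoopLike();         /* the unsafe part, run exactly once */
}

RunLoopLike *GetRunLoop(void) {
    /* pthread_once guarantees InitRunLoop runs once even if many threads
       race here on the first call; afterward this is just a read of an
       effectively immutable pointer, which is always thread-safe. */
    pthread_once(&gOnce, InitRunLoop);
    return gRunLoop;
}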
So, thread compatible, or thread friendly. These are routines that inherently, in and of themselves, are thread-safe. However, the data that you pass to them may not be. But if you've properly protected it, if you can guarantee mutually exclusive access to the data that you're passing to this API, then this API is thread-safe.
And then pretty much everything else, which is just thread hostile. And if you look at all these other ones up here, I find very few toolbox routines that are thread hostile. They may not be marked as thread safe, but they're typically one of these other ones, either conditionally thread safe or thread compatible.
So we've talked about APIs and functions, routines that are thread safe. We'll talk about things that are thread safe, data that's thread safe. We already talked about immutable. Immutable routines are thread safe because the data they return are immutable. So immutable variables are, by definition, thread safe. Any constants, et cetera, are thread safe.
And just one of the things to help the compiler help you: if you've got read-only data, declare it const. You can save yourself a lot of headaches, because the compiler will keep you from generating code that writes on constant data. So, non-shared data. And by non-shared data, I don't mean globals. What I'm talking about is local variables, anything that you've declared on the stack.
Method parameters, anything that's passed to your routine in registers: the contents of those registers are thread-safe. Now, the gotcha here is you might be passed a pointer, and that pointer's thread-safe, but the data that pointer points to is not necessarily thread-safe. So we're talking strictly about the parameters that are passed, not necessarily the data that those parameters point to.
Any locally allocated memory is thread safe. You call malloc, or any one of the variants of malloc, that memory is thread safe. And you can pass it to any routine that's thread safe, or any routine that's thread friendly that can take thread safe data, and you're still thread safe.
So thread-safe code may call other thread-safe functions. Kind of makes sense. If you call something that's not thread-safe, then you're not thread-safe. And if you don't know whether it's thread-safe, then you have to kind of assume that it isn't, and you have to do things to make sure that it is. And the most common solution is locks.
Locks are kind of falling out of favor because of all the negative aspects, the deadlocking and race conditions, et cetera, that they cause. They're hard to administer, et cetera. There are some new methodologies coming around that I wish I had time to include, methods other than locks that don't have some of the problems that locks do.
So, may simultaneously access distinct data. Now, this is kind of like record locking in a file. If I've got a huge block of memory, I can have one thread working on this part, and one thread working on this part, and one thread working on this part over here. This is distinct pieces of data. All it requires is I do some kind of housekeeping to make sure that these threads don't step on each other when they're accessing this and modifying the data. So that's what I mean by distinct data.
So thread safe code has no race conditions. A race condition is something like a clock where you've got your thread waits for a second, increments seconds. If the seconds hit 60, then it resets seconds back to zero, then increments minutes. If after incrementing the minutes, the minutes hit 60, it resets minutes back to zero, and then increments the hours.
That's OK. That's all well and good. But let's say you've got another routine that's supposed to be able to read the time. What happens if the one thread that's keeping my clock running goes to increment the seconds, the seconds hit 60, but before he can reset the seconds back to zero, he gets preempted by another thread. This other thread says, I want to know what time it is, and goes out and looks at it. Well, it's 1:30 and 60 seconds.
That's not a valid time. So that's called a race condition. You've got multiple threads accessing the data, and at any point in time, the data can be in an invalid state because one thread hasn't finished manipulating the data back into a valid state. You have a race condition.
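Here's his clock example sketched in pthreads (my code, not the slide's): the mutex makes the rollover atomic, so no reader can ever observe "1:30 and 60 seconds":

#include <pthread.h>

static pthread_mutex_t gClockLock = PTHREAD_MUTEX_INITIALIZER;
static int gHours, gMinutes, gSeconds;

/* The ticker thread calls this once a second. */
void Tick(void) {
    pthread_mutex_lock(&gClockLock);
    if (++gSeconds == 60) {                 /* rollover happens under the lock */
        gSeconds = 0;
        if (++gMinutes == 60) { gMinutes = 0; ++gHours; }
    }
    pthread_mutex_unlock(&gClockLock);
}

/* Any reader gets a consistent snapshot, never a mid-rollover state. */
void ReadTime(int *h, int *m, int *s) {
    pthread_mutex_lock(&gClockLock);
    *h = gHours; *m = gMinutes; *s = gSeconds;
    pthread_mutex_unlock(&gClockLock);
}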
So, does not deadlock. Now this is a case where, let's say we've got two threads. The first thread locks one variable, the second thread locks a second variable, and now they both need the other thread's variable. So you've got two threads locked on each other, each holding the variable that the other one needs, and waiting on the variable that the other one has locked. That's a deadlock condition.
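The standard cure, sketched here (mine, not from the session), is a global lock order: if every thread that needs both locks takes them in the same order, the cycle can never form:

#include <pthread.h>

static pthread_mutex_t gLockA = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t gLockB = PTHREAD_MUTEX_INITIALIZER;

/* Deadlock recipe: thread 1 holds A and wants B, thread 2 holds B and
   wants A. Each waits forever on what the other holds. */

/* Cure: everyone agrees to take A before B, so the cycle can't form. */
void TouchBoth(void) {
    pthread_mutex_lock(&gLockA);
    pthread_mutex_lock(&gLockB);
    /* ... work on both protected structures ... */
    pthread_mutex_unlock(&gLockB);
    pthread_mutex_unlock(&gLockA);
}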
So, it has no priority failures. Let's say you've got a thread that constantly updates the screen showing what percent done a task is. Unfortunately, the task it's measuring is a low-priority task, and the fact that we're updating the screen all the time to show the percent done is affecting how much time the task gets. So it's the Heisenberg uncertainty principle: observing it slows it down. That's called a priority failure. Now, the extreme case of this is where the secondary thread can't get any time at all, and that's called starvation. So, things to watch out for.
Static or global read/write data: what can you do about it? Well, you can eliminate it. You can convert it to thread-specific data. All of the different APIs that I mentioned, all the implementations, have thread-specific variables, where you can have a global variable that says, "this is the variable ID for some piece of data that all of my threads need," and each one of those threads will use that variable ID to access its own thread-specific variable.
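In pthreads, that variable-ID mechanism is pthread_key_create and friends. Here's a sketch of a per-thread scratch buffer (names and the size are mine):

#include <pthread.h>
#include <stdlib.h>

static pthread_key_t  gBufferKey;            /* the shared "variable ID" */
static pthread_once_t gKeyOnce = PTHREAD_ONCE_INIT;

static void MakeKey(void) {
    pthread_key_create(&gBufferKey, free);   /* free() runs at thread exit */
}

char *GetMyThreadBuffer(void) {
    pthread_once(&gKeyOnce, MakeKey);
    char *buf = pthread_getspecific(gBufferKey);
    if (buf == NULL) {                       /* first use on this thread */
        buf = malloc(256);
        pthread_setspecific(gBufferKey, buf);
    }
    return buf;   /* same key everywhere, but each thread gets its own bytes */
}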
Or you can protect it via synchronization methods, an entry-and-exit kind of thing. You can use critical sections, et cetera. Basically, anybody that wants to access this is going to have to take a lock. And if someone has the lock when someone else wants to access it, they're going to have to wait for the lock. And then when you're done accessing it, modifying it, whatever, you release the lock, so that anyone that's waiting for it can take the lock and then access it.
I think I'm on the last battery here. Oh, here we go. So writing thread-safe code. Here's a little example I threw together just to give you a taste. It's non-reentrant. That's because we have a static local there. And the problem with this code-- all this code does is we pass it a string. It's got a local buffer that it copies the string to while it's converting it to uppercase. Null terminates it and returns the address of the buffer. That's all well and good.
unless you do this. So let's say you're doing a password compare. You pass in the password that was typed, and the second parameter is the password that was in a password file, let's say. What happens is the order of evaluation of those two calls inside the string compare is unspecified in C. They could happen in either order. But let's say the first one, the leftmost one, happens first, and it stores the uppercased value passed in into the buffer.
And then when he finishes, the second one executes, and he stores uppercase, the value that was in the password file, into the same buffer. So when string compare gets called, what's being returned is the address of that static character buffer, and it's the same address. So it's always gonna match, and I don't think you'll have a very secure system.
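The slide itself isn't in the transcript, but the function and the broken compare he's describing would look roughly like this reconstruction (buffer size and names are my guesses):

#include <ctype.h>
#include <string.h>

/* Non-reentrant: every caller shares this one static buffer. */
char *StrToUpper(const char *s) {
    static char buffer[256];
    size_t i;
    for (i = 0; s[i] != '\0' && i < sizeof buffer - 1; i++)
        buffer[i] = (char)toupper((unsigned char)s[i]);
    buffer[i] = '\0';
    return buffer;              /* same address returned every time */
}

/* Both calls hand back the SAME buffer, holding whichever string was
   converted second, so this "compare" always matches. */
int CheckPassword(const char *typed, const char *stored) {
    return strcmp(StrToUpper(typed), StrToUpper(stored)) == 0;   /* oops */
}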
So what can you do? One thing you can do is malloc memory, because malloc'd memory is always thread-safe, and use that instead. You do the same kind of string copy into the malloc'd buffer. That's okay. It's a better function, because it is reentrant, but it's got a problem: it requires error checking and an external free.
The person that calls this routine is going to have to know that it may fail. I may not be able to allocate that memory. I'm going to have to check that pointer that's returned and see if it's null. And it's also going to have to know that if it isn't null, I'm going to have to free it. And that's kind of a bad programming paradigm. A better solution is to let the external guy malloc his memory and worry about freeing it.
And in this case, the first parameter would be the string in, the second parameter would be the string out, and we just copy the input to the output. And it's really up to the guy that calls this routine to make sure that the buffers are allocated and freed, et cetera. That also means he's got to be responsible for making sure that what's being passed to this routine is mutable.
And then the third parameter would be the size of the output buffer, so the routine knows how much it can safely write.
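Again a reconstruction rather than the actual slide, with the size parameter reflecting my guess about that third parameter; the caller owns both buffers, so the routine is reentrant and thread-safe:

#include <ctype.h>
#include <stddef.h>

/* Caller supplies the output buffer and its size; nothing static,
   nothing malloc'd, nothing for the caller to free. */
void StrToUpper(const char *in, char *out, size_t outSize) {
    if (outSize == 0) return;
    size_t i;
    for (i = 0; in[i] != '\0' && i + 1 < outSize; i++)
        out[i] = (char)toupper((unsigned char)in[i]);
    out[i] = '\0';
}

/* Usage: two distinct stack buffers, so the compare actually compares.
   char a[256], b[256];
   StrToUpper(typed, a, sizeof a);
   StrToUpper(stored, b, sizeof b);
   if (strcmp(a, b) == 0) ... */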
So at the last minute, I had one of the QuickTime engineers that knew I was doing my threading session drop me two QuickTime slides. And so these don't quite fit in with the rest of the presentation, but I promised him that I would plug it anyway. He did a lot of work on QuickTime to make it thread safe, as thread safe as possible. And at the very least, I can share that with you.
So anybody who uses QuickTime knows that it basically uses plug-in components. Some of those components are provided by Apple, some of them are not. Most of the ones provided by Apple are thread-safe, and if they're not, they will be soon, because this particular engineer is very religious about making them thread-safe. But you can't always predict which components you'll need at runtime.
Some of these components may be third-party components that may not be thread-safe. And so how do you tell? Well, fortunately for you, we let QuickTime do the job for you. If you've written a thread-safe component and you've done all the right work, there's actually an attribute of the component that you set that basically tells QuickTime, "I am a thread-safe component." Now, you could always lie, but you'll probably find out pretty quick if you were wrong. So, I already mentioned opening a movie may invoke many components under the hood. Got a little ahead of my bullet points here. And some components are written by third parties, some are legacy. How do you know whether the components are thread-safe? Non-thread-safe components cause hard-to-reproduce crashes.
So how do you use QuickTime from other than the main thread? Well, this API: EnterMoviesOnThread. If you do this, you're telling QuickTime that you're not on the main thread and you want to do threaded movie stuff. It protects you from opening non-thread-safe components. So you might just open a movie.
But then underneath, as QuickTime goes out and starts loading other components, et cetera, it's going to check each and every one of those to make sure it's thread-safe, make sure it's got the thread-safe attribute. And if any of the components that that movie requires aren't thread-safe, it's going to return back to your thread, from NewMovie or whatever you called, this error: componentNotThreadSafeErr. And this is an indication to you that for this particular movie, you're going to have to migrate it to the main thread. Well, instead of having to throw away all the hard work you've done and start over, we actually have some APIs that make this easier.
"Detach movie from current thread and attach movie to current thread." So from the thread where you got the error, you can detach it. You can send a message to the main event queue saying, look, you're going to have to play this movie because I can't. The main event, the main thread calls attach movie to current thread, and it takes that movie and it's in whatever state the thread left it in, and is able to play it on and use it on the main thread. So it's important to check for errors returned by these APIs. They may fail if the movie uses non-thread-safe components. So we do have a new tech note out, 2125, that goes into the details of the QuickTime threading stuff for anyone that's interested.
[Transcript missing]
is a-- currently it's not threaded. By clicking the window, it spins for about ten seconds, well, maybe about five seconds. I can't drag the window. I can't open any menus. My application is basically bricked. Eventually, if it ran long enough, I'd get the spinning beach ball, but we're not getting it here. If you look down here, I'm running it in Thread Viewer, and you can see I've got one thread; when it's running, the green down here is basically that thread, the main thread, drawing this Quartz code and taking up that one thread.
If I switch it into threaded mode, now I can drag the window around, it slows down a little bit, I can look at my windows, my menus, et cetera. And you notice down in Thread Viewer here, I can actually see work being done on more than one thread. So this is an example of the difference in user expectations as far as interactivity, and why you should use threaded code to free up your event loop, et cetera.
Back to the slides. There we go. Okay, this is pretty much the canned URL for all of our sample code and everything else. I can provide the source to this one if you want to see it. It's basically Ovaltine, and I just added like six lines of code to thread the spinning part. But it's pretty simple. If anyone wants it, you can either email me at my email address, or I can post it. And there's my email address.
I did want to point out the last one, and it's really hard to read there. We do have an MP (SMP) mailing list. It doesn't get a whole lot of traffic. I know three or four of the internal engineers lurk on that list, and we have a pretty good community of developers that do MP programming that lurk on that list also. When someone posts to that list, sometimes I'll see five responses even before I see the original question. So it is actively being monitored, regardless of the low traffic. My email address is g-e-o-w-a-r at apple.com.