WWDC11 • Session 308

Blocks and Grand Central Dispatch in Practice

Developer Tools • iOS, OS X • 45:02

From processing events and callbacks to keeping your app's user interface running smoothly, block objects and GCD queues are a fundamental part of software design on iOS and Mac OS X. This session provides both an introduction to the technologies and more advanced tips and tricks you need to take advantage of blocks and GCD.

Speakers: Kevin Van Vechten, Dave Zarzycki

Unlisted on Apple Developer site

Downloads from Apple

HD Video (120.7 MB)

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Welcome, everybody. This is the Blocks and Grand Central Dispatch in Practice talk. I'm Dave Zarzycki and I'm from Developer Technologies. So where are we? We're talking about Grand Central Dispatch and blocks, and these are very low-level technologies that are available in Mac OS X and iOS for implementing a wide variety of services. Everything on the system uses them, all the way from libSystem and very traditional Unix APIs up to UIKit, Foundation, and applications.

So what are we going to talk about? Well, there are a few major sections. First, we're going to do an introduction and talk about blocks and Grand Central Dispatch. And then the latter half is about, well, once you start using these technologies, what are the subtle things you need to be aware of that can impact your design or maybe surprise you, or surprise you in great ways? So that will involve what blocks automate and what blocks don't automate as far as memory management is concerned.

So blocks, what are blocks? Blocks simplify function callbacks. To see how, let's compare functions and blocks. With a function, we have a body of code. We have our curlies, we have some statements inside. Well, guess what? With blocks, we have bodies of code. They have the same curlies. They have the same statements you already know.

Now here's where things start to diverge. You obviously know pointers in C. Well, with blocks, we have a special kind of pointer, and it uses the caret. And as you can see, we can declare a pointer to a function and a pointer to a block with these characters. In fact, this is the only difference between these two pointers.

One uses a star, one uses the caret. Now, of course, you all are good programmers, and you would never just type that out in random places in your code; you would use typedefs. The typedefs look the same. They're the same syntax, same style. And in fact, after that point, functions and blocks look the same. They have a type. You can assign to them. And in fact, you can call them the same way.
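
As a rough sketch of the syntax being described here (the slide code isn't reproduced in this transcript, so the names below are hypothetical):

```objc
// Function pointer vs. block pointer: only the * vs. ^ differs.
typedef int (*compare_func_t)(int a, int b);
typedef int (^compare_block_t)(int a, int b);

static int my_compare(int a, int b) { return a - b; }

static void syntax_demo(void) {
    compare_func_t  f = my_compare;
    compare_block_t b = ^(int a, int b) { return a - b; };

    // Assigning and calling look the same for both.
    int r1 = f(1, 2);
    int r2 = b(1, 2);
    (void)r1; (void)r2;
}
```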

So let's go back to these blocks of code. Let's make some room. We're going to add some arguments. And in fact, the arguments are the same. You declare them the same, they use parentheses, you have a comma separated list, int A, int B. Well, let's see where they start to diverge now.

Well, with a function, we need to name it. So here's my comparison function. And it needs a return type, so we're going to say int. What are we going to do for a block? Do you remember that special character we talked about, the caret? There, that's it. One character. And if you missed that, let's highlight it. That's it. Just one little character. There it is, hiding there. But it's a huge character, and it makes a big difference, as we're about to see.

So let's talk about implementing a sort function with the little comparison routine we just implemented. With a function, we would, of course, take some kind of pointer to an array of integers, maybe a size of the integer array, and then we'll take our function callback to compare them and let the sort routine do its magic. With blocks, it's the same. We're going to have our pointer to the array of integers, maybe the size of it, and our block. Okay.

Everything's similar. We know this. Well, how are we going to use it? Well, with a function, we're going to take our array, pass up the size of it, and then we're going to use that name we declared for function, my compare, and pass it to the sort array. Well, what do we do for blocks? Here's where the power kicks in.

We're going to actually just take that body of code and put it right in our sort function. This keeps the usage close to the definition and usage right in the same spot. There's no ambiguity. It's very clear. It's right when you're thinking about it anyway. And in fact, that body of code for the block disappears. They all stay together. Okay.
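
A minimal sketch of the comparison being drawn, with a hypothetical sort_array interface standing in for the slide code:

```objc
#include <stddef.h>

// Function-pointer version: the comparison must be a named function defined elsewhere.
void sort_array_f(int *values, size_t count, int (*compare)(int a, int b));

// Block version: the comparison can be written inline, right at the call site.
void sort_array(int *values, size_t count, int (^compare)(int a, int b));

static void sort_demo(int *values, size_t count) {
    sort_array(values, count, ^(int a, int b) {
        return a - b;   // the body lives where you're already thinking about it
    });
}
```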

So this is a basic block. All right, that's cool. We understand they're the same, but one might not see this as really saving a lot of time. So let's build on this and talk about trivial blocks versus non-trivial functions, or let's implement a configurable sort. So starting again with our function and block comparison, we have our sort as we just implemented, taking a block, and we can add maybe some kind of reverse argument. We want to do a reverse sort.

Well, here we go. That's it. The reverse argument is in scope in our function when we are calling sort and the block is in scope when we declare it and pass it to sort. And because of that, we can just use the reverse variable or any other variable we want inside of that block because everything is in scope.
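
Sketched in code, assuming the same hypothetical sort_array() as above; the point is that reverse is simply captured by the block:

```objc
#include <stdbool.h>
#include <stddef.h>

void sort_array(int *values, size_t count, int (^compare)(int a, int b));

static void sort_maybe_reversed(int *values, size_t count, bool reverse) {
    sort_array(values, count, ^(int a, int b) {
        int result = a - b;
        return reverse ? -result : result;   // 'reverse' is in scope; no context struct needed
    });
}
```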

Well, let's see what we'd have to do for functions now. All right. So here we're starting from a very similar point. We have our reverse argument, we have our sort, but we have a problem. That comparison function is defined somewhere else. We need to get that data to it somehow. But how? Well, we need to pack that reverse argument in a data structure.

And then we're going to have to change our sort routine. All of a sudden, we have to change APIs. We have to go then deal with the ripple effect of everybody that uses sort and now pass in this data. And in fact, we now take the address of that data and pass it in.

So you can see the address of D there, and the complexity is just beginning. Now we need to move the code down. Well, not literally, but somewhere else in our code. We need to actually go back and declare that data structure. We need to add whatever arguments we want to pass in to the sort routine. We then need to go to our sort routine again, make a lot of room.

Add a context parameter, and because this is C, of course, it's a void star. It's generic. We now need to cast it back to that data type. And then we finally use it, but it's abstracted away through pointer indirection, and we have a lot more code. And worst of all, as the slash, slash, slash shows, that code is far away from the actual implementation.

We've had to deal with the mental overhead of code over here, code over there. It's just starting to get messy. So that's it. These are basic blocks, basic non-trivial blocks, but you can see that we're already starting to save a lot of code and a lot of mental overhead of basic boilerplate of C data structures and pointer indirection.

Let's move on. Let's even build on this. Let's build an even more non-trivial block and let's extract results from it. Well, here we are with the code that we just left off with. We're going to make a little bit of room in the block and just declare another variable on the stack, a count.

Let's say we're going to count the number of comparisons we made because we're trying to do some performance analysis. Well, here we are. We just declare int count, initialize it to zero on the stack. Then all we need to do is say count++ in our code, and then at the end we can log it and see how our sort routine is performing.

That's almost true, and that's why we left a little bit of space in front of that declaration of the count. We need one more thing: __block. This keyword is the only keyword you need to know to get information out of a block. And we'll talk more about why this is the case at the end of the talk when we talk about memory management and blocks. But that's it. Three simple lines, one of which is just logging. And we've gotten data out of a block. That simple.
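
A sketch of the three lines being described, again using the hypothetical sort_array():

```objc
#include <stdio.h>
#include <stddef.h>

void sort_array(int *values, size_t count, int (^compare)(int a, int b));

static void profile_sort(int *values, size_t count) {
    __block int comparisons = 0;               // __block makes writes visible outside the block
    sort_array(values, count, ^(int a, int b) {
        comparisons++;
        return a - b;
    });
    printf("sort made %d comparisons\n", comparisons);
}
```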

Let's look at what we'd have to do with functions. You're not actually meant to read this, because what ends up happening is we're adding yet more code, and yet more code in lots of different places in our file. And in fact, this is what that code looks like.

Again, I don't expect you to read this, but I'll just tell you what's going on. We had to go back to that data structure. We had to declare that, you know, our out parameter. We had to then go to where we needed the sort and then assign the result into our temporary stack variable that we're going to pass to sort.

And then finally, in our sort routine, we now need to deal with the pointer indirection and making sure that we actually save off the result and the data structure so that way after sort returns, we can get at the result and it wasn't left behind in the sort routine function.

So yeah, this gets really complicated. And as you can see, it's a lot simpler with blocks. And in fact, it can get even more complicated with functions when we start dealing with asynchronous code, but we'll deal with that when we talk about GCD and at the end of the talk when we deal with memory management.

So in comparison, the mental overhead of blocks versus functions, we've highlighted on the right. You just need to know that caret, which may even be hard to see, but it's there, that one character, and then __block. That's all the mental overhead you need. But for functions, we have a lot. We have data structure definitions, we have pointer indirection, we have code far away from usage. It's a big mess.

So let's talk about blocks at Apple. We love them. And a lot of APIs use them. Some common patterns you'll see are things like enumeration. And I'd like to use these two examples, enumerating a dictionary and an array, to show why we like them. With the Objective-C for-each syntax, you can only do one value at a time, one thing. But in the case of a dictionary, we can iterate both the key and the value at the same time, because that's how we can define our block. And in fact, that's what the enumeration of a dictionary does here.
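
For reference, a minimal sketch of the dictionary enumeration pattern being mentioned, using Foundation's -enumerateKeysAndObjectsUsingBlock::

```objc
#import <Foundation/Foundation.h>

static void logAllEntries(NSDictionary *dict) {
    // The block receives both the key and the value in a single pass.
    [dict enumerateKeysAndObjectsUsingBlock:^(id key, id value, BOOL *stop) {
        NSLog(@"%@ = %@", key, value);
    }];
}
```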

We can do that for any container object. Much more is available in the system. We have GCD, which we'll talk about. We have lots of callbacks. And in fact, callbacks are one of the simplest and most common patterns, and they look like this. Well, almost. If we're not going to pass any arguments to the block, we can just delete them. In fact, that caret becomes the only character you need to define a block.

You're going to have to use the curlies anyway with functions or blocks, but the caret is the only overhead you need to do a callback. And with that, I would like to transition to Kevin to talk about Grand Central Dispatch and its use of these blocks and doing really powerful patterns. Good morning. My name is Kevin Van Vechten. I'm with the CoreOS team at Apple.

So today we're going to talk about Grand Central Dispatch, which builds upon the idea of blocks. And as Dave just outlined, you saw how blocks are really great in terms of C syntax and eliminating a lot of the conceptual overhead for defining blocks of code that can be called by the system as an iterator or as a callback.

Well, we built on that further and we created Grand Central Dispatch, which really treats blocks as fundamental objects. And like any objects, you can put them in interesting data structures. The fundamental data structure of Dispatch is a queue, and we'll talk about how we use these queues to efficiently provide you with tools for serialization in your multi-threaded code, for concurrency, and for asynchronous execution. So you're probably familiar with the concept of queues from, you know, common computer science literature. They're pretty straightforward things, a simple linked list. In this case, it's a linked list of blocks.

Now, the queues that we implement in GCD have some very important characteristics. The first of those is that the queues are strictly FIFO. So anytime you enqueue a block, that's always going to go on to the tail of the queue, and anytime a block is dequeued, that's going to come off of the head of the queue.

And we've taken it a step further and we've provided synchronization primitives that guarantee that the enqueue is always atomic. So you can have any number of threads in your application, and all of those threads can be enqueuing blocks onto the same queue, and GCD will automatically sort it all out and make sure that the blocks get enqueued in FIFO order and ultimately get run in FIFO order. So because it's safe to use these queues from multiple threads, the queues themselves become a synchronization primitive.

And finally, we take these queues and we actually manage the dequeuing of blocks and the execution of them automatically by the runtime. As you saw with the block that Dave showed where there were no arguments and no return result from the block, it was just a very simple callback. That means that all of the blocks conform to the same signature, and we know exactly how to call them. They just need to be invoked.

They don't need to be passed any arguments. They don't have any return results. All of that is managed by what variables were captured in scope when the block was declared. And so we can automatically dequeue these things in the background on threads that the system manages for you.

So some interesting characteristics of this approach become apparent when we compare the queues to traditional locks like spin locks or Pthread mutexes. So in real rough terms, a mutex or a spin lock is most efficient when there's no contention. In other words, if there's only one thread running in your application and it acquires the lock, then it can start executing the code right away.

Any other threads that try to acquire the lock at the same time are going to be stalled, and they aren't going to be able to make any progress until that lock is released. So really, the efficiency in terms of throughput of your multithreaded code is going to decrease roughly linearly with the number of threads you have and the amount of contention.

On the other hand, with queues, because we can do a very quick atomic operation to enqueue a block on a queue and then move on in your code, we actually have a much different curve. And that is that peak efficiency in terms of throughput is actually achieved when the system is the busiest. The more queues you have, the more blocks you're submitting to those queues, the faster they return, the more throughput you're going to get.

And this is a really interesting thing to consider when you're designing applications that run across a very wide range of hardware. So you might be designing iOS apps that run on iPhones and iPod touches, and those only have a single core, and the characteristics there are pretty good. But if we take that same code and run it on an iPad 2, which has two cores, you're going to start seeing some benefits from the approach of using queues. And of course, all of these interfaces are available on the Mac as well, and so you might have many cores on your Macintosh, and the same coding style, the same approach, literally the same binary can run from one core to many cores very well.

When the system automatically dequeues, what it does is it's guaranteeing that strict FIFO ordering of the blocks being dequeued, but if there are multiple queues that are independent, that actually presents the system with an opportunity for concurrency. So in this example, we have a couple of queues. One has three blocks.

The other has two blocks. The system might begin dequeuing some of these blocks and executing them on one CPU. If another CPU becomes available to execute the blocks, then that second queue can start and you can see there's some overlap in the timeline. That's concurrency that was available to your application.

So in order to use Dispatch, we're going to talk about four fundamental concepts. The queue, which is a data structure, and then there are three functions, sync, apply, and async. And we'll go into each of these in detail. So starting off with queues, they're a pretty standard data structure.

You're probably familiar with the semantics from a lot of other APIs on the system. When you call dispatch queue create, it returns to you a new queue object. These objects are retained and released. You should retain and release in pairs. The last release balances out with the original create, and that actually deallocates the object. You can see in the create function here that there's a name. We recommend giving a reverse DNS style name.

These names actually show up in things like crash reports, so it can be pretty useful as a diagnostic tool to give descriptive names to your queues. So you can see what was executing at the time something went wrong in your application. The second parameter, which is null in this example, has advanced attributes that you can assign to queues. We won't talk about any of those today.
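
Sketched out, with a hypothetical reverse-DNS name and the manual release that balances the create (this talk predates ARC management of dispatch objects):

```objc
#include <dispatch/dispatch.h>

static void queue_lifecycle_demo(void) {
    // Reverse-DNS names show up in crash reports and debugging tools.
    dispatch_queue_t queue = dispatch_queue_create("com.example.account", NULL);
    // ... submit blocks to the queue ...
    dispatch_release(queue);   // the last release balances the create
}
```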

So moving on to Dispatch Sync. Dispatch Sync is probably the simplest API call in GCD. And basically all it does is it takes two arguments, the first being a queue and the second being a block. And DispatchSync synchronously enqueues the block onto the queue and that's done in an atomic nature so it's safe to call DispatchSync on the same queue from multiple threads at the same time. The blocks will be serialized, they'll be executed by the system and then after the block has finished executing, DispatchSync will return to the call site in your application.

So this is a very useful thing for implementing critical sections. Maybe you have a lot of threads, you're implementing some shared data structure, you can use a queue to protect that data structure, you can use Dispatch Sync to serialize access to that data structure. And so in this code example, we're doing something, you know, classic textbook example of having some sort of account balance that's updated with the transaction, and we want to make sure that there's atomicity in this update. So pretty straightforward stuff. A slightly more advanced pattern building on Dispatch Sync is using the under under block keyword to extract results back out of the block that's synchronously executed.
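
A minimal sketch of that textbook example (the queue and balance names are hypothetical):

```objc
#include <dispatch/dispatch.h>

static dispatch_queue_t account_queue;   // serial queue protecting the balance
static double balance;

static void apply_transaction(double amount) {
    dispatch_sync(account_queue, ^{
        balance += amount;               // serialized: safe to call from any thread
    });
}
```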

So we've expanded this example a little bit. There's a return value. We're just going to have a Boolean that says whether or not the update happened successfully because maybe we want to check to make sure the balance in the account is actually sufficient that we can adjust it for the amount of the transaction. Now, one of the key things that differs here between Dispatch Sync and a traditional locking approach is highlighted by the return statement that's about halfway through the code.

If you've ever written a very complicated function with a lot of locking that checks various error conditions or checks input, you know you have to be very careful to unlock any of the locks that you've acquired before returning from your function. If you ever leave a lock locked, then a lot of times somewhere much later in your code, something deadlocks against that because the logic was left in an inconsistent state. One of the really powerful things about blocks in Dispatch Sync is that the runtime manages all the serialization.

And it does that on block boundaries. It's safe for you to return from your block at any point because it's the returning from the block that signals that that block is finished and the next block can execute. So it really lets you do some concise things for checking for error conditions and whatnot, simply returning from the block in the middle. You don't have to worry about unlocking a lot of locks before continuing execution of your program.

And then, of course, in this example, assuming the transaction was able to complete and we didn't return early, we just assign true to the result. That'll be visible to the caller when we return from Dispatch Sync, and it can be returned from the outer function.
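
Sketched with hypothetical names; returning early from the block is what lets the next block run, so there is no unlock bookkeeping:

```objc
#include <stdbool.h>
#include <dispatch/dispatch.h>

static bool apply_transaction_checked(dispatch_queue_t queue,
                                      double *balance, double amount) {
    __block bool ok = false;
    dispatch_sync(queue, ^{
        if (*balance + amount < 0.0) {
            return;            // insufficient funds; ok stays false
        }
        *balance += amount;
        ok = true;
    });
    return ok;                 // visible to the caller once dispatch_sync returns
}
```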

So there's one thing you probably should be aware of with Dispatch Sync, and that is it does what it says. It waits until the block has finished. So if you were to do something like this and call Dispatch Sync on the same queue from within a block that's also Dispatch Synced to that queue, you're going to deadlock. And the reason for that is the outer block is still executing. It hasn't returned. The runtime enforces very strict FIFO serialized nature on these queues, like I mentioned earlier.

So that inner block is really never going to get an opportunity to execute because it's still waiting for the outer block, which of course is waiting for the inner block. Now, this example is pretty straightforward. If you saw this in code, you'd probably say, well, yeah, of course that's going to happen. Though if there's a lot of layers of indirection, then sometimes it's less obvious that this might be occurring.
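
The pattern being warned about, reduced to a sketch:

```objc
#include <dispatch/dispatch.h>

static void deadlock_demo(dispatch_queue_t queue) {
    dispatch_sync(queue, ^{
        // The outer block hasn't returned, so the inner block can never be
        // dequeued from this serial queue; both sides wait forever.
        dispatch_sync(queue, ^{ });
    });
}
```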

So the takeaway point here is if you're debugging and you see things that are kind of hanging up and you look at your backtrace and there are multiple calls to Dispatch Sync in that backtrace, you probably want to go and look and see, oh, are these in fact the same queue? Might this be a pretty classic deadlock pattern there? So that's Dispatch Sync. Now we'll move on to Dispatch Apply, which really takes the exact same approach that Dispatch Sync does, but provides concurrency to your application.

And this is very useful for data level parallelism. The runtime is able to scale the blocks that run concurrently to the number of cores that are available. And it really takes a pattern very similar to a traditional for loop that you're used to using. You give it an overall count. It's going to provide an index parameter to the block. It's going to be incrementing that index as it runs. You can use that index to index into an array, something along those lines.

Now, Dispatch Apply is also beneficial in that it has knowledge of what various different parts of your application are doing. So you might imagine calling into one framework that has made the decision that, oh, well, we have eight cores on this machine. We're going to divide this image up into pieces and execute eight threads to process that image.

And meanwhile, you might be playing a sound or something, and a different framework has said, oh, we have eight cores on the machine, we can split the sound up and process it with eight concurrent threads. And pretty soon you're running 16 threads, but you only have eight cores and you aren't really gaining anything. Dispatch Apply is able to balance the competing demands from different areas of your application because it actually has visibility into what these different subsystems are doing.

The subsystems themselves can partition things in a logical manner and not really have to worry about how busy the system is at that moment. So as you see in this simple example, we're just using the index to index into an input array. We perform some computation. We write into some output array. And we're done.
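
A sketch of that simple example, assuming hypothetical input and output arrays:

```objc
#include <dispatch/dispatch.h>
#include <stddef.h>

static void compute_all(const double *input, double *output, size_t count) {
    dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply(count, q, ^(size_t i) {
        output[i] = input[i] * input[i];   // independent iterations fan out across cores
    });
}
```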

And what this looks like in practice is we might have a block and we have a CPU. We start executing some blocks on that CPU. Another one becomes available. The system just automatically fans out the blocks to the multiple CPUs. That can grow and contract over time depending on what the system looks like.

So one thing that we get a lot of questions about is why isn't Dispatch Apply working as fast as I think it should? You know, I have an eight-core machine, I did a Dispatch Apply, and I'm maybe seeing a one-and-a-half times speedup instead of close to an eight-times speedup. Well, there's a few reasons for that.

One of the most common is actually hidden locks. You might have a fairly complicated body of code in this block that you're using with Dispatch Apply, and on the other side of some function call that you make, perhaps there's a lock. In this example, we're just calling printf and printing out the index.

And if you were to try this example, you would see that it actually executes at about 1x speed, even on a multi-core machine. Well, the reason for that is because the standard C library actually has a lock inside printf, and so it doesn't really matter how many threads are running these apply blocks. They're all going to be executing one after another because they're all contending for that lock inside printf. So when you're profiling your code, this is definitely something to be aware of. Look for those bottlenecks if you aren't seeing the performance you would expect from Dispatch Apply.

Another somewhat related issue is perhaps the block you're providing to Dispatch Apply is too small, and so the cost of the overhead of actually bringing up these threads and fanning out the work is dominating the execution, and so you don't really see much benefit from doing a concurrent apply. Or, another possibility is perhaps you're accessing memory that's too close together and you're constantly invalidating the cache line between cores because they're both contending at a hardware level for the same memory accesses.

Well, striding is an approach that you can use to mitigate both of these issues. Basically, the concept of striding is take the input and divide it up into chunks and then perform the apply block on just each individual chunk instead of each individual element. Sometimes it's easy to divide things into chunks logically, like you might have an image and you might decide, okay, well, we're actually going to apply over all of the rows of the image and then iterate linearly through each of the columns within those rows. Sometimes it's a bit more arbitrary and it really is just chunking an array in a specific size.

There's not really a good rule of thumb for what the right sizes are, but typically it's pretty easy to come up with a pattern like the one on the slide where, you know, you're jumping to a major index in an array and then iterating over a subindex and you can use performance tools to tune your code, look for something that looks fairly optimal as kind of a stride, you know, a chunk size.

And one of the things you can also do is make sure that the chunk sizes are big enough that, if you are dealing with an array, you would actually be operating on cache line boundaries, so that you don't give something that's within the same cache line to two different chunks.

So you can have the same block on multiple CPUs, but they can operate on disjoint sections of data, and that will keep the contention lower. Cache line size is something that you can look up. There are sysctl calls and whatnot for obtaining the cache line size value for the current machine. So it's really just an iterative approach, measure and tune, but this can make a huge difference in the performance that you get out of Dispatch Apply.
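
A sketch of the striding pattern described on the slide; the chunk size is a tuning parameter you would find by measurement, not a fixed rule:

```objc
#include <dispatch/dispatch.h>
#include <stddef.h>

#define CHUNK 4096   // hypothetical stride; tune with the performance tools

static void compute_all_strided(const double *input, double *output, size_t count) {
    dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    size_t chunks = (count + CHUNK - 1) / CHUNK;
    dispatch_apply(chunks, q, ^(size_t c) {
        size_t start = c * CHUNK;
        size_t end = start + CHUNK > count ? count : start + CHUNK;
        for (size_t i = start; i < end; i++) {        // iterate within the chunk
            output[i] = input[i] * input[i];
        }
    });
}
```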

So the last major fundamental building block of GCD is Dispatch Async. And Dispatch Async is a bit unlike the others in that it executes blocks asynchronously. It doesn't wait for them to complete before returning. So this is also useful for implementing critical sections because remember the queues and the dequeue of the queues is what maintains the strict FIFO ordering and mutual exclusion. But in this case, because we're calling dispatch async instead of dispatch sync, we're saying we're willing to just enqueue the block, fire and forget, return immediately. We know the system will execute that at some later time.

So sometimes you know you need to perform some operation, you know it needs to be consistent with respect to whatever data is also protected by that queue, but you're not actually interested in the result of that operation. And Dispatch Async is a great tool to use in those cases.

So here we've expanded on the example that we were using previously, and we can use Dispatch Async to perform some maintenance task, like maybe we calculate some sort of interest that accumulates in the account. We don't really need to know what that was here, so we can just enqueue the block, fire and forget, move on.
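
Sketched with the same hypothetical account queue and balance as above:

```objc
#include <dispatch/dispatch.h>

static dispatch_queue_t account_queue;
static double balance;

static void accrue_interest(double rate) {
    dispatch_async(account_queue, ^{
        balance += balance * rate;   // enqueued and forgotten; the caller returns immediately
    });
}
```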

This is even more powerful when you apply it to the overall architecture of your application and move work off of the main thread. So the main thread, on both Mac OS and iOS, is what's responsible for handling events. It's running the main event loop; events are processed there.

Any code that's executing for a long period of time on the main thread prevents your application from receiving and processing additional events. And so on an iOS app, this makes it look like the app's not responding to touch events. On a Mac OS app, you actually see the cursor change into this spinning beach ball if it's in that state long enough. So anytime you have a large task, you might want to consider using Dispatch Async to defer the execution of that task off of the main thread.

And of course, anytime you defer something on a queue, and you have multiple queues, that gives the system the opportunity to execute those blocks concurrently. So you actually implicitly get multi-core benefits on machines that support that. So building on this pattern of Dispatch Async, one of the most useful features is nesting one Dispatch Async within another, inside a single block.

And using that to first take a block and defer its execution off of the main thread. And then once you have a result from that block, using Dispatch Async to get the result back to the main thread, so that you can actually use that data to perform some sort of UI update.

So here we have an example where, again, we probably want to execute with consistency of the account information we had, but we're going to call some function that renders an account statement. Maybe it returns an image that can be drawn to the screen. That might take a while. We don't want to do that on the main thread. But once we have the information available, we can do a subsequent Dispatch Async.

We can call a special function called Dispatch Get Main Queue, and that's tied in with the main event loop of your application. We'll make sure that blocks submitted to the main Dispatch Queue get executed as you would expect on the main event loop. And then you can perform the draw operation of the image on the main thread where it belongs, but you didn't have to wait for a long time in order to gather the raw data before that.
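
A sketch of the shape of this pattern; render_statement() and draw_statement() are hypothetical stand-ins for the slide's functions:

```objc
#include <dispatch/dispatch.h>

typedef struct statement_image statement_image_t;
statement_image_t *render_statement(void);       // slow; keep it off the main thread
void draw_statement(statement_image_t *image);   // UI work; main thread only

static void update_statement_view(dispatch_queue_t account_queue) {
    dispatch_async(account_queue, ^{
        statement_image_t *image = render_statement();
        dispatch_async(dispatch_get_main_queue(), ^{
            draw_statement(image);                // hop back to the main event loop
        });
    });
}
```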

So if you're going to be doing this type of nested dispatch, where you call something and have it call back, a lot of times it makes sense to actually define a function that encapsulates that pattern in the API. And so what we really recommend, as you see in the top line of code here, is providing the queue and the callback block as the last two arguments of the function. The queue provides the execution context for where that block should be invoked, and the callback block is, of course, whatever you're expecting to run after the asynchronous operation has completed.

So those are the last two parameters. The reason for putting the block at the end is that, as you saw in the earlier examples, it's very convenient to define a block in line with the function call. Keeping it at the end is a syntactic convenience. It means that you don't have to go looking for extra parameters that might be dangling past a very large block that's there in the declaration.

So we get a queue, we get a block, we call this new asynchronous function we've defined, and we pass in the queue. In this case, like the previous example, we want it to be executed on the main queue. And then now we just define a simple image draw block that executes there.

So one thing you do need to be aware of is that dispatch queues must be retained when you're using this nested block approach. So this example is really what the implementation of that asynchronous function we defined on the previous slide might look like. And because we're being passed a queue that we don't have ownership of, and because all we're really doing in this function is calling dispatch async, which returns immediately, the caller is free to release that queue as soon as it's passed it to us, which means that queue might be invalid by the time the block actually gets around to running. Because remember, dispatch async is fire and forget. It's deferred execution. Some other thread will run that at some later time. So we need to make sure the lifetime of the queue is extended.

Such that we know it's valid so that it can receive the last callback block. So here we have a queue input parameter. Before we do the dispatch async, we need to do a dispatch retain. And the dispatch retain means that we're asserting some degree of ownership of this queue. Then inside our background task block, we can deliver the callback block to that queue via dispatch async.

And once we've done that, we're done with the queue. The system will take it from there so we can actually release our ownership of the queue. So this retain release pattern, where you retain things before an async block and then release them in the async block, is necessary when you have a couple layers of indirection like this and you're working with types that are basic C types, you know, C malloced objects, you need malloc and free, duplicate strings, retain or release queues, that type of thing.
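
A sketch of that implementation under manual memory management, as in this 2011-era talk (with ARC and later SDKs, dispatch objects are retained and released for you); the function names are hypothetical:

```objc
#include <dispatch/dispatch.h>

static void do_work_async(dispatch_queue_t callback_queue, void (^callback)(void)) {
    dispatch_retain(callback_queue);   // we only borrowed the queue; extend its lifetime
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // ... perform the long-running work here ...
        dispatch_async(callback_queue, callback);   // deliver the callback block
        dispatch_release(callback_queue);           // balance our retain; we're done with it
    });
}
```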

And so to talk a little bit more about some of these memory management rules surrounding blocks and asynchronous execution, I'd like to invite Dave back up to the stage. All right. Thanks, Kevin. So now we're going to talk about the practical details of blocks and async. This is where the memory management kicks in. This is where a lot of the kind of daily details come in: how do I make everything work right and work reliably? So building on what Kevin talked about, let's take dispatch async and implement it.

We're going to implement it right here, right now. Well, we have that block parameter. What are we going to do? Well, in fact, GCD provides an alternative function-based interface. These are the same names with an underbar F at the end. It takes a context parameter, in classic C style, and it takes a function.

And that function takes that context parameter. All right. Well, here's the body of dispatch async. It just calls the underbar F version of the same API. It passes the same queue. And then it does two interesting things. It calls block copy, which is a block API.

And what that does is that copies the block to the heap. And we'll see why in a second. And then it calls this little static helper routine. What is that? Well, that's just this. It takes that context parameter, casts it back to a block, calls the block, and then releases it.

That's it. And the important thing to take away from this is that any async API should block copy and block release. This ensures that a lot of automatic memory management kicks in, and it saves you a lot of boilerplate in your code of taking that data structure that we talked about at the beginning of the talk, copying it to the heap, and figuring out when to free it. This is what blocks are really all about.
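
Roughly what that wrapper looks like, sketched on the real dispatch_async_f, Block_copy, and Block_release under manual memory management:

```objc
#include <dispatch/dispatch.h>
#include <Block.h>

// The static helper: cast the context back to a block, call it, release it.
static void invoke_and_release(void *context) {
    void (^block)(void) = (void (^)(void))context;
    block();
    Block_release(block);
}

static void my_dispatch_async(dispatch_queue_t queue, void (^block)(void)) {
    // Copy the block to the heap so it outlives the caller's stack frame,
    // then hand it to the function-based _f variant as the context pointer.
    dispatch_async_f(queue, (void *)Block_copy(block), invoke_and_release);
}
```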

So the really important takeaway, too, is let's say there's an API in the system or an API in your code. Well, guess what? You can add a block wrapper to it. It's that simple. And in fact, you don't even need to wait for Apple to do it. You can just write your own little block wrapper around any function and context pointer-based API.

So what is this block copy? What does block copy do? It does a lot of great things. It automatically copies values: integers, floats, pointers, et cetera. And that __block variable forces sharing. This is what allows the caller and the block to share data. But otherwise, that automatic copy of values is what allows things like Dispatch Async to just work. Any values that are captured are automatically brought to the heap with the block.

Block copy also automatically copies and releases other blocks. So if you just happen to use a block parameter within another block, you don't need to worry about memory managing the relationship between the two. It also automatically retains and releases Objective-C objects. As you may have noticed this week, this is a big theme of the conference, where we're automatically retaining and releasing Objective-C objects.

And blocks are one example of how this pattern has been increasingly apparent, and obvious in hindsight, at Apple. We need to automatically retain and release objects because it's just the right thing to do. Block copy also automatically calls C++ constructors and destructors. Having said that, we strongly recommend that you use our latest compiler, the Apple LLVM 3.0 compiler, to get the best C++ and blocks interaction.

All right, let's talk about what block copy doesn't do. And this is really why certainly some of you are here. Block copy doesn't read your mind. What do I mean by that? Let's say you're going to use dispatch async, and you're using it from within one of the methods in one of your classes.

And you pass an instance variable in that block, and you tell it to do something. Well, what's actually happening here? Well, implicitly, this is the way the compiler thinks about it, is that's actually taking the self pointer and dereferencing it and getting at the Ivar. And therefore, from the compiler's perspective, self is actually the variable that's captured in the block, and self is automatically retained. This might surprise you. Why is my object living this long? Well, it's because the object that you thought was being captured isn't the one you thought. There's a mismatch.

So what can you do? Real simple workaround. Just be explicit. Create a temporary variable, assign the instance variable, capture the temporary variable. Now you're explicit. Only the Ivar is captured, and only the Ivar lives long enough to do the right thing. Another side effect of this, of capturing self, is if the Ivar changes between the dispatch async and when the block starts, you may get a different value for that Ivar being used, and that might surprise you. So this is another good reason. To be really explicit, say I want this one right here, and the same one when the block actually runs.
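
A sketch of the workaround, with a hypothetical class and ivar, under manual retain/release:

```objc
#import <Foundation/Foundation.h>

@interface AccountController : NSObject {
    NSString *_name;              // hypothetical ivar
    dispatch_queue_t _queue;
}
@end

@implementation AccountController
- (void)scheduleWork {
    // Writing _name inside the block would really capture (and retain) self.
    // Being explicit captures only the value we meant, fixed at async time:
    NSString *name = _name;
    dispatch_async(_queue, ^{
        NSLog(@"working on %@", name);   // self is not captured here
    });
}
@end
```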

Let's talk about another example of what block copy doesn't do. It doesn't automatically fix retain cycles. Now, luckily, a lot of our frameworks have been designed over the years to avoid retain cycles, but it's still always possible with new code to introduce new retain cycles. Here's a really simple example. In isolation, it's also really obvious what is going on here.

We're going to take an object, we're going to set some kind of handler, and guess what? We want to use that object in the block. Well, what's going on here? The object is going to block copy the block, and that block is automatically going to retain the captured objects. And here we go, a simple cycle.

Well, what can we do about this? Well, a simple workaround is that in manual memory management, __block variables are left to the programmer to decide the policies about how they work. And __block variables are not implicitly retained at block copy time. The value is copied, but the object itself is not implicitly retained. So you can take advantage of this fact and assign the object to a temporary __block variable to prevent automatic retaining of it. And guess what? The object now just retains the block, the block keeps its hands off the __block variable, and the right thing happens.
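
A sketch of that workaround under manual retain/release; the class, property, and method names are hypothetical, and note that ARC changes the semantics of __block captures, as the ARC sessions explain:

```objc
#import <Foundation/Foundation.h>

@interface Widget : NSObject
@property (copy) void (^eventHandler)(void);   // copying the block retains what it captures
- (void)handleEvent;
@end

@implementation Widget
@synthesize eventHandler;

- (void)handleEvent { NSLog(@"event"); }

- (void)setup {
    // Capturing self directly would create self -> block -> self.
    // A __block variable is not retained when the block is copied
    // (under manual memory management), so the cycle is avoided.
    __block Widget *blockSelf = self;
    self.eventHandler = ^{
        [blockSelf handleEvent];
    };
}
@end
```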

As Kevin was illustrating earlier with dispatch sync deadlocking, it may be obvious in isolation, but with nontrivial code, this pattern can show up in practice in nonobvious ways. Retaining non-objects. This is what Kevin was talking about with queues. GCD queues are an example of a non-object, and so are CF objects.

So what do we do? Well, we just need to retain, use it, release it after we're done with it asynchronously. Pretty simple. Here we are with the CF example. Make sure to CF retain, then we maybe do something awesome with that foo thing, and then we can CF release it when we're done asynchronously using it.
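
Sketched with a CF string standing in for that foo thing:

```objc
#include <CoreFoundation/CoreFoundation.h>
#include <dispatch/dispatch.h>

static void show_later(dispatch_queue_t queue, CFStringRef foo) {
    CFRetain(foo);                    // blocks do not retain CF types for you
    dispatch_async(queue, ^{
        CFShow(foo);                  // ... do something awesome with foo ...
        CFRelease(foo);               // balance the retain once we're done with it
    });
}
```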

Well, what else does block copy not do that you need to be aware of? It's not implicitly called by non-blocks. Remember how I said that blocks can capture blocks and the right thing happens? Well, let's say you're building an array on the stack and you're trying to do something clever by creating custom blocks and putting them into an array.

We actually saw examples of this when we first started designing blocks and people were trying to experiment with design patterns. Well, guess what? This block is only valid for the scope of the enclosing for loop. Once that for loop exits, that block isn't valid anymore. So that entire array becomes invalid. The workaround for this is to use block copy inside the for loop.

Similarly, functions themselves are not blocks. They don't automatically manage the memory of blocks. Therefore, if you're trying to do something clever, and actually it's fairly reasonable, and just return a block, that won't work. You need to call block copy in manual memory management mode. And then, of course, if you block copy, your caller needs to be aware, in this last case, that it needs to block release it. So keep that in mind.
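
Two sketches of the cases just described, under manual memory management; Block_copy and Block_release are the real runtime calls, while the surrounding names are hypothetical:

```objc
#include <Block.h>
#include <stddef.h>

typedef int (^adder_t)(int);

// Storing blocks: a block literal is only valid in its enclosing scope,
// so copy each one to the heap before keeping it past the loop.
static void fill_adders(adder_t *out, size_t count) {
    for (size_t i = 0; i < count; i++) {
        out[i] = Block_copy(^(int value) {
            return value + (int)i;
        });
    }
    // The caller must Block_release() each entry when done.
}

// Returning a block: same rule; the caller owns the copy.
static adder_t make_adder(int amount) {
    return Block_copy(^(int value) {
        return value + amount;
    });
}
```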

So let's talk about better blocks. Let's talk about automatic reference counting, which is one of the big themes of this week's conference. Many of the challenges outlined about block copy and what it does not do are fixed with automatic reference counting. I strongly encourage you to check out this feature, but we won't be talking about it here. Also, some of the things are not fixed. You still need to be aware of these facts that you've learned today. Non-objects are not automatically retained and released. You still need to do dispatch retain and release or CF retain and release. See the ARC talks for more info.

So to conclude, blocks and Grand Central Dispatch are a very important part of the design patterns we use at Apple, for everything from encapsulating enumeration and concurrent for loops to asynchronous callbacks. They're simpler. They're safer. They help you avoid boilerplate code. These are already patterns that you use today.

For more information, I'd like to suggest that you contact Michael Jurewitz, our developer tools and performance evangelist, or Paul Danbold, our Core OS evangelist. We have documentation. GCD is open source if you want to check it out. Here are the related sessions: we have the Developer Tools Kickoff, the Introducing Automatic Reference Counting session, Objective-C Advancements in Depth on Friday, and Mastering Grand Central Dispatch tomorrow. Thanks for coming.