Mac • 49:51
Grand Central Dispatch (GCD) is a revolutionary system technology in Snow Leopard that allows your application to take full advantage of today's multi-core Macs. Learn about the new blocks feature for Objective-C, C, and C++ as well as the key concepts and APIs necessary to understand and use GCD. Learn to incorporate these new features into your modern, multi-core Mac OS X application.
Speakers: Blaine Garst, Dave Zarzycki
Unlisted on Apple Developer site
Transcript
This transcript has potential transcription errors. We are working on an improved version.
Good afternoon, welcome to the Programming With Blocks and Grand Central Dispatch talk. My name is Blaine Garst. I'll be talking to you about blocks and for the most part what they've been used for in other languages as well as in Snow Leopard, in sort of traditional ways of programming. But something new for us, new for you, is Grand Central Dispatch. My colleague Dave Zarzycki will be up here later to tell you all about how you can use blocks to deal with concurrency issues. But first let's talk about blocks-- start there.
[ Applause ]
...you would have been writing in Scheme or what turned into LISP. Perhaps later on in your career you saw something like this. Times repeat, you know, this funny thing in square brackets. Square brackets, what a syntax actually. And that would have been SmallTalk of course.
And more recently if you kind of ride the rails of Ruby, you also see some closures. So the funny thing about all of these languages though, is that they not only have garbage collection all the time, but they're also mostly interpreted. And so that's not kind of something that we're kind of used to. When we program in C or one of its derived languages, Objective-C or C++, or for those of you who want to do both, Objective-C++, we have not had that kind of an option.
Until now. In Snow Leopard we offer you blocks, which are our name for closures, and what they let you do is write this kind of function expression and pass it to things, and you'll see lots of things that we get to pass it to. And the neat thing about it is that it carries along local context, in this case the variable D.
So where did D come from? Well that's just a local variable in your function or something, and you pass this function expression along and it carries along that value of D so you can just write your code the way you want to write it, and it gets passed off, and it gets used as you wish it to be used. So let's take a look at this in a little more detail. So how would we implement the repeat function? So first of all, one of those blocks looks kind of like a function pointer, except we use the caret symbol instead of the star.
But otherwise this is kind of like a function pointer declaration, and so it ought to be somewhat familiar to you. We think that function types, with return values and parameters, are kind of complex to write again and again and again in your APIs; and so we quite often put a typedef around it. And so if we have this typedef for a block that takes no arguments, void, and returns nothing, returns void, we get this work block thing.
That's a very interesting concept, it's just this chunk of work, and we'll see this used in a number of places later on. But anyway to complete the story, the implementation of repeat is very simple. It takes one of those blocks as a parameter, and then it just calls it like it was a function.
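Here's a rough sketch of what that might look like in code; the repeat() helper and the printf body are illustrative, not the exact slide code:

```objc
#include <stdio.h>

// A block pointer looks like a function pointer, but with ^ instead of *.
typedef void (^work_block_t)(void);

// A sketch of the repeat function: take a block and call it like a function.
static void repeat(int count, work_block_t block) {
    for (int i = 0; i < count; i++) {
        block();    // invoke the block just like a function call
    }
}

static void example(void) {
    int d = 42;
    // The block carries along a const copy of the local variable d.
    repeat(3, ^{ printf("d is %d\n", d); });
}
```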
And whatever you passed in happens, and so it's very simple. Both simple to create, simple to use. So closures came about in the 1970s actually, so that was 40 years ago. And so there's been kind of like 40 years worth of experience with closures. And so we now, in the C language, in a compiled environment, get to take advantage of these kinds of programming patterns that have been around in other contexts for a long time. So one of them for example, is kind of what Ruby specializes in, is writing little wrapper kinds of things.
So you write this function, all lines in a file, and you pass it in the char * file name. Of course this works for Objective-C, but let's do char * for now, and you pass in a block. And what this function does is all the boilerplate: it opens the file, and for every line in the file it hands the line off to the block. This is a neat way to write. And then of course after the function is done, it closes it, and so what you get to do is just write your code.
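A sketch of what such a wrapper might look like, assuming a simple line-buffer approach (the function name and the block body here are illustrative):

```objc
#include <stdio.h>
#include <string.h>

// Boilerplate lives in one place: open the file, hand each line to the block, close the file.
static void all_lines_in_file(const char *filename, void (^handler)(const char *line)) {
    FILE *fp = fopen(filename, "r");
    if (!fp) return;
    char line[4096];
    while (fgets(line, sizeof(line), fp)) {
        handler(line);
    }
    fclose(fp);
}

// Usage: the caller just writes the per-line code inline.
static void print_todo_lines(const char *path) {
    all_lines_in_file(path, ^(const char *line) {
        if (strstr(line, "TODO")) printf("%s", line);
    });
}
```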
Now you think, think, think, think, in the compute section you use some of your thoughts earlier and stuff, and you just kind of write it, and it's a nice, neat way to concisely express the idea that you can process each line of a file with this code that's just right there. Now you don't have to write this, I mean we know how to write these kinds of things in C today. They look something like this.
You've got your computation interspersed with all this boilerplate, and it can be done, it's just, I think, more concise, easier to read, more utilitarian, reusable, it's just better off this way with a block. So another very common, classic lambda concept is the idea of a map. So here's a category on the NSArray class that does map, and so map takes sort of a transform block, a function block.
And what it does is for every element in the array it applies the transform block to that object, to that element, and accumulates the result into a result thing; and then when it's done returns the result. And so that's your transform function. It's very simple to use, it's very cool. Another one is sort of the collect, where you pass in a predicate which says for every element in the collection, test this predicate with the element, and if it's true collect it, return this back. So this is sort of an array reducing kind of thing.
So in use for example, you just pass in return [item length] > 20; and you get all strings whose length is greater than 20. Now the syntax for this one is a little bit different. Notice the BOOL there. So we won't talk about the syntax in great detail, but BOOL in this case is the return value coming back from that block.
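A sketch of what those map and collect categories might look like (the method names are illustrative, not the shipping Foundation API):

```objc
#import <Foundation/Foundation.h>

@interface NSArray (BlocksSketch)
- (NSArray *)map:(id (^)(id item))transform;
- (NSArray *)collect:(BOOL (^)(id item))predicate;
@end

@implementation NSArray (BlocksSketch)
- (NSArray *)map:(id (^)(id item))transform {
    NSMutableArray *result = [NSMutableArray arrayWithCapacity:[self count]];
    for (id item in self) {
        [result addObject:transform(item)];        // apply the transform to each element
    }
    return result;
}
- (NSArray *)collect:(BOOL (^)(id item))predicate {
    NSMutableArray *result = [NSMutableArray array];
    for (id item in self) {
        if (predicate(item)) [result addObject:item];  // keep elements the predicate accepts
    }
    return result;
}
@end

// Usage: all strings whose length is greater than 20.
// NSArray *longOnes = [strings collect:^BOOL(id item) { return [item length] > 20; }];
```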
We don't use it very often, it's kind of a corner case, but it's there if you need it. Now what I like about blocks in Snow Leopard is some of the new uses. But let's go back, if you have to do something as simple as sorting. Now let's think about sorting, I mean how many years have computer scientists been refining the sort function? Like decades, right? None of us really like to write the sort function, right? So we want to use qsort of course. There are other sort variations of course, but let's talk about qsort.
If you have to sort things according to different criteria, like in this case what I'm going to do-- well if you have to use different criteria, what they require you to do to be fully parameterized, is to pass in this callback argument which you set up and then every time you compare 2 elements, you're also passed back the thing you passed in. So you can put your context in this callback argument and vary your sort based on runtime parameters of some kind. So let's take a look at what that looks like in practice.
So the first thing you do is you declare a custom structure. OK, you put your things in here. In this case we're going to say sort some kind of a person field, or object, or whatever, and maybe age first is one of the options and maybe first or last or capitalized or not is another; but in any case, let's focus on age first.
So you set up this custom data structure somewhere, you write a custom compare function somewhere that takes this void star context coming back in, you put lots of casts in and you ask, "Well if it's age first then of course I'm going to compare the age fields." You know, you do this.
The third thing you do is somewhere near where you're going to call qsort, you actually initialize the structure, you declare an instance of the structure, initialize it with the right parameters, and then finally, finally, you get to call qsort_r. Remembering which order the context versus the callback goes in, but you know-- you call it.
So what do you have here? You've got your code in 4 different places. Now if you have to change your criteria, if you're actually developing a program and you know, you change your mind or some new option comes up, you've got to go change code in 3 different places.
It can be done, it's tedious, might be in separate files, they might-- it's just a pain. So what we like instead is a qsort_b function, which you will find on Snow Leopard, and it simply takes a block as the 4th parameter. Now notice that these are the elements; these are the things that matter to qsort.
It's the array, it's how many elements in the array, it's the width of each element in the array, and the sort block. There's no extraneous thing to confuse, what does qsort want with this funny callback thing? You don't have that, you've got the block. It's very simple to understand this API.
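A sketch of a qsort_b call along those lines; the person struct and the ageFirst option are illustrative:

```objc
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

typedef struct { char name[64]; int age; } person_t;

static void sort_people(person_t *people, size_t count, bool ageFirst) {
    // The block just uses the local ageFirst variable; no context struct, no separate callback.
    qsort_b(people, count, sizeof(person_t),
            ^int(const void *a, const void *b) {
        const person_t *pa = a, *pb = b;
        if (ageFirst) return pa->age - pb->age;
        return strcmp(pa->name, pb->name);
    });
}
```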
And when you use it, step 1 is, you set up your options, you use your variables as you wish in your block, and there is no step 2.
[ Laughter ]
Right? I mean, if you need to change parameters up in the options area, or down in the block, you just do it.
And it's much cleaner, it's much nicer. Now this is a very simple callback scenario. We have more complexities, say with the timer or with some networking APIs where you need to set in some kind of a function, and it needs to stick around for a long time and get called back.
In those cases the memory you use to set up that context pointer is often really hard to reclaim. It's hard to know when you're done with it. But you've got to figure that out, or else you're going to leak, and so that's also a problem that is very neatly solved with blocks. So the thesis here is that blocks are a far better callback API.
In the old style we saw qsort_r, but it turns out that we have lots of kinds of callbacks in our system. The performSelector:withObject: in the Objective-C world is exactly that. You want to send it some kind of code to do it, and you often write a custom selector, and you go do that.
We have this performSelector:withObject:-- with object because at the generic level we don't know how many parameters you really want to pass it, and often they're integers or void stars; they're not really objects, and you're violating the typing system and stuff, and we don't have a general language way to introduce all that stuff, or at least we didn't. Now we have blocks. Another case is the whole idea of the set target-- the target action sequence inside Cocoa.
That is also perhaps better expressed as a block. But these are the mechanisms by which we have done sort of this callback, and we see it in many, many places. There are all kinds of notification-based APIs in our system. The NSNotification for example, you can carry back arbitrary context in a user info dictionary.
Those are so much fun to set up, and to pull your values out, and think about, you know? So the new style of course is much simpler. You just set the callback, I mean every subsystem provides its own set callback API, but it takes a block and you put the code in there that you want it to do, and you're done. It's nice, it's good.
A key thing about this is that blocks are a C language extension, and so Objective-C gets to do it, C++ gets to do it, Objective-C++ gets to do it.
[ Applause ]
So if you are programming Cocoa GUI from C++ you just do it. If you are providing a C API like qsort, and you need some Objective-C goodness going on, then you just do your goodness in Objective-C and hand it to the C API. So this interoperability is actually very cool also, we're really excited about that.
Let's talk about lifetime. So blocks start out on the stack. In those interpreted languages, sometimes their stack frames are actually allocated off the heap, and if the lifetime persists, they just use garbage collection. That's way inefficient, we don't do that. Blocks start out on the stack. They can be copied to the heap, but you or the subsystem has to take an explicit action to make that happen. Blocks carry a const version of most variables that they use.
So in the previous case, that variable D, we actually made a copy of D, made it a const version inside that block so that it goes along with the data, so that wherever that executes, it's got a local copy of that local stack variable. So we don't have to worry about that stack variable.
So being const is good for 90 percent of your uses, but every now and then you need to mutate something and hand it back to the caller. And in that case, we have a new storage class called __block, and these variables are shared both by the stack frame that declares them and any blocks that reference that variable.
So here's how __block works in practice. You just stick it up there, it's sort of at the level of register and auto and static and-- maybe there's another one, and you just update your field when you need to. This is actually a real example, enumerateKeysAndObjectsUsingBlock:.
This is a new API in Snow Leopard. If you have to enumerate a dictionary and examine both key and value, with the current scheme of things, it's O(n log n), because the n is just getting the keys back, but then you have to do, in theory at least, a log n lookup for each key.
In practice it's much faster than that, unless it's a deeper table. This turns out, when you're handed both keys and values, to be the most efficient way to go through a dictionary. In this case we're pulling back the key that happens to match a particular value that's passed in. So blocks give us the ability for these collection types to offer many different ways to go through them. So there are also APIs to reverse-- to examine an array in reverse order.
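A sketch of that key-for-value example, combining the __block storage class with enumerateKeysAndObjectsUsingBlock: (the function and variable names are illustrative):

```objc
#import <Foundation/Foundation.h>

static id keyForValue(NSDictionary *dict, id target) {
    __block id foundKey = nil;     // __block so the block can assign to it
    [dict enumerateKeysAndObjectsUsingBlock:^(id key, id obj, BOOL *stop) {
        if ([obj isEqual:target]) {
            foundKey = key;
            *stop = YES;           // no need to keep enumerating once we have a match
        }
    }];
    return foundKey;
}
```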
And reverse enumeration is also the most efficient way we have for doing that. Let me talk about lifetime. So imagine-- we're about ready to call a routine that uses blocks. So we have the stack and we've got the heap. Once we enter the function we're going to have an int local, we're going to have one of these __block variables, and we're going to have a block that uses both of them. So once the block is constructed, it makes that constant local copy of int local, and it creates a reference to the stack-based shared __block variable.
If the block is copied to the heap, what happens is we actually move the __block storage also to the heap. So we've got 2 cases that are kind of interesting. When the function returns, well, we let go of our reference to that shared variable and we return, letting the heap version of that block keep the __block variable alive.
In case two it can happen that the heap object perishes first, in which case it lets go of its reference to the __block variable, and the heap object goes away but the stack is still in good shape because it has a real hard reference to that shared variable.
And so in this case when the thread is ready, when that function is ready to return, we let go of the block, the shared variable-- it goes away, in this case because it's the last reference, and then the stack, the function returns and we end up at the initial state in both cases.
So we do the memory management for you in a non-garbage collected way so that you can just use blocks, as I said they're cheap because they start out on a stack, get a little expensive if they have to move to the heap-- but they're very versatile when they do that. And so we deal with the memory management stuff for you. If you're an Objective-C programmer, I hope many of you are, we have a few extra specializations. First of all, all blocks are always Objective-C objects.
[ Applause ]
Even if you're just sitting in a plain old C program, they're always objects. And so they don't do much as objects though, but they do respond to the copy, release, and autorelease messages. What's cool about that is that as a first class citizen they can participate in our @property syntax, using the copy attribute, and they just work fine that way. So there's more about syntax, there's more about lifetime, there's some kind of interesting gotchas and fun surprises we've encountered over the time, and if you want to learn more about that, please come to the 5 o'clock talk today in Russian Hill.
So what are blocks good for? There's over 100 uses of them in Snow Leopard. We have found a lot of things to do with them. There are sort of the traditional things that I've talked about, which are for the most part blocking, they're synchronous, they're primarily single-threaded uses like the collection and iteration kinds of stuff.
We have, as I said, iterations with and without options. We don't quite have the map, but map's so easy to write, you guys can write it, but we got reduce and collect kinds of cases; and we have wrappers as I've illustrated. A new thing for us is-- sorry, callbacks. The new thing for us is using these in an asynchronous, non-blocking, multithreaded way. And that's what Grand Central Dispatch offers you. So this lets you think about concurrent computation on independent data, it lets you think about serialized computation on shared data, and all combinations of the above.
[ Applause ]
Mic? All right excellent. Thank you Blaine. We really appreciate blocks. So Grand Central Dispatch, what is it? You may have been reading all sorts of things online wondering what it is, well now you get to know. Well why GCD? Where are we coming from? Well GCD is all about asynchronous blocks, as Blaine led off with, and we'd like to give a little teaser for what that looks like. This is a basic block, no parameters, no return value, print something out.
We're going to provide a very simple API, dispatch_async, and it takes a queue. That's how you can get a basic block running in the background doing work. That's it. So why are we doing this? Well on the top end of Apple's platforms we have the Mac Pro with 16 virtual cores. We have at the very low end our Mac mini, that many customers like. It has 2 cores. Apple on the Mac platform is 100 percent multicore, top to bottom.
We also have one more problem we'd like to fix-- user experience. Every time you'd see a spinning beach ball, that's an opportunity for concurrency, to get something running in the background and keep a snappy user interface. So that's why we're doing GCD. Well what is GCD? There are 3 reasons why we're doing it.
It's fast, it's easy, it's fun. So I want to talk first about being fast. How is it that GCD's fast? Well GCD sits at the lowest layer of your process, sits right on top of the hardware, right on top of the kernel. And what we can do with some of our APIs is bypass the kernel to provide a nice speed boost, in this case the API isn't particularly important, but what happens is, your application can call into GCD and on the fast path we can potentially just return right away to your application in a small number of instructions.
However if we need to do some more complex work, we can then drop down to the actual kernel equivalent API, and do the work and then come back up. All right, so we have a nice fast path and for this particular API, I'd like to point out that it can be up to 200 times faster on a Mac Pro with 16 virtual cores. So that's a nice advantage of bypassing the kernel, give a little turbo boost.
So moving onto easy-- as I was talking about, we have a great metaphor. We have blocks and queues. They're very easy to think about, very easy to use, and again something you'll be using a lot of. A basic block, you're going to be sliding in a dispatch_async and a queue, and you're going to be running a lot of code concurrently really easily.
So then moving onto fun; I want to talk about some stories around campus that I've heard, from engineers we run into in the halls, and just as we chat up like normal. As you've heard during some of the Monday talks, we had some external developers talk about either how fast it made their code, or how easy it was.
And around campus we've heard the same story. Wow! My code runs a lot faster and that's really cool, thanks. Another thing we've heard, honestly, Why hasn't it always been this way? Developers around the company, after playing with GCD, love it. They just start running around, they add it here and there, it's just really easy to use and they find new opportunities that they never thought were there. And a quote I really like from someone in our developer tools team, "Dispatch is like writing poetry".
They enjoyed it that much, so dispatch is fun. So technology overview; what are we going to be talking about? GCD is a part of libSystem; you don't need to do a single thing to get it. It's in every process, it comes right along with basic malloc and free and other fundamental library API.
All you need to do is include one header and you get access to all our API - dispatch/dispatch.h, that's it. We have a basic object-oriented architecture, so there are some common routines shared among all Dispatch objects. We have queues, we have sources for monitoring external events, we have groups for tracking sets of blocks, and we have semaphores for more intricate synchronization problems.
And finally we have some basic non-object types: we have a concept of time in GCD, and we have a concept of running a block once and only once for the lifetime of the process. However in this talk, we are only talking about a few object APIs, a few queue APIs, a little bit of groups, and time. The rest of this will be covered in the in-depth talk tomorrow.
And those sessions-- so Understanding Grand Central Dispatch In Depth, is tomorrow morning at 9 a.m., and in the early afternoon we have Migrating Your Application to Grand Central Dispatch where we'll really focus on the comparison between existing technologies and what's provided in dispatch, and how to compare the two and adopt. We also have some more related sessions.
Yesterday there was a Designing your Cocoa Application For Concurrency, where lots of edge cases and nuances that you might run into are addressed and talked about. I hope that you might be able to catch that on video. There's a What's New in Instruments, it'll be right after this downstairs. And there's Blaine's second talk Objective-C and Garbage Collection Advancements, which will also cover blocks.
And finally there's some Advanced Debugging and Performance Analysis you might want to look into for understanding whether your particular project will find concurrency to be worth it. All right, so GCD objects. Let's talk about some shared context. Each GCD object has some-- we have some basic polymorphic API among our types. For example, Dispatch objects are reference counted. We have Dispatch retain, Dispatch release.
You can retain or release objects just like you would with Cocoa, and GCD does the right thing with the parameters that are passed to it. So if a queue is passed to dispatch_async, the queue will be retained throughout the usage of that block. I'd like to point out that Dispatch objects do not participate in garbage collection at this point in time.
And there are a few other shared functions, which I'll talk about tomorrow. Moving onto queues, our first example of a dispatch_object. We'd like to talk about the fundamentals of a queue first, and then we'll talk about specific instances of queues. A queue in Dispatch is a lightweight list of blocks. It's really lightweight.
It provides-- sorry...
[ Silence ]
It's a lightweight list of blocks, and it provides asynchronous execution of those blocks. Enqueue/Dequeue is FIFO. This is shared among all queues. And we'd like to show an animation of what this looks like. So you can have your thread, and your allocated dispatch_queue. You can then create a block and then dispatch_async to it like we've shown earlier.
It puts it on the queue, and in fact you can allocate multiple blocks, and what GCD will do is, it will notice this queue has become available with work, and it will start-- it will bind a thread to that queue and start running blocks. In fact it can reuse threads to run a few blocks in a row, thus getting a nice speed boost for recycling an available resource. So that's the basics of a Dispatch queue. This is what that code would have looked like for that animation we just saw: async, async, async. We had 3 blocks A, B and C.
That's it, it's that simple. All right so Global queues. This is our first instance of an actual queue that your code will use. The unique property about it is concurrent execution; however, it shares all the properties of queues that we've described so far. It's lightweight, provides asynchronous execution, and Enqueue/Dequeue is FIFO.
It's just that completion part that's not FIFO. So a diagram of what this looks like in practice, again you have your thread, it starts allocating blocks, they get put on the Global Dispatch queue that's available, and GCD will notice that queue has work and start binding threads to it, and then run the blocks to completion and then return the threads to the pool for other queues to use. This is the code of what we just saw.
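Roughly, the code being described might look like this (the block bodies are placeholders):

```objc
#include <dispatch/dispatch.h>
#include <stdio.h>

static void global_queue_demo(void) {
    // Get the default-priority global queue, passing 0 for the optional parameters.
    dispatch_queue_t global = dispatch_get_global_queue(0, 0);

    dispatch_async(global, ^{ printf("A\n"); });
    dispatch_async(global, ^{ printf("B\n"); });
    dispatch_async(global, ^{ printf("C\n"); });
    // Dequeue is FIFO, but the blocks may run concurrently and complete in any order.
}
```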
In this case we're getting one of the well known Global queues, there's some optional parameters. We're just going to pass 0 for now, we're going to take that queue and dispatch_async to it; just put the blocks on the queue, let the queue have the policy for what to do with the blocks; in this case, run them concurrently. So that's pretty easy. Let's talk about a different example and a different application of the Global queue. Suppose you wanted to count the characters, words, and paragraphs in a document. And let's say your document is large, very large.
[ Silence ]
All right, we got some results. Yay! All right well that's a lot of characters, that's a lot of words. That's a lot of work the computer's doing. Well how can we use GCD to make this faster? All right well let's say the code is doing something like this. We have our work, we're going to run a for loop over it, scanning the text for the words, being able to deal with some of the divisions around the document. We can aggregate the results into a set, in this case basic C array.
And then after the for loop completes we can then summarize the results and eventually draw them. Well with GCD it's real simple. We drop that C construct and we replace it with a Dispatch for loop. All we need to do is take the Global concurrent queue that we've talked about already, and take a new function called dispatch_apply that acts just like a basic for loop that you write every day; you pass in the count, you pass in the queue, and you pass in a block.
This particular block takes 1 parameter back, which is the index to do the work on. So what will happen is this block is farmed out among the CPUs, each CPU gets a different index assigned to it until we've exhausted the count, from 0 to the count, not including the count, just like a regular for loop.
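As a sketch, the dispatch_apply version might look like this; the chunk count and the count_words_in_chunk() helper are illustrative stand-ins for the real scanning code:

```objc
#include <dispatch/dispatch.h>

#define CHUNK_COUNT 8
extern int count_words_in_chunk(size_t chunkIndex);   // hypothetical scanning helper

static void count_words(void) {
    int results[CHUNK_COUNT] = {0};
    int *resultsPtr = results;   // captured by the block so it can write into the caller's array

    dispatch_queue_t global = dispatch_get_global_queue(0, 0);
    dispatch_apply(CHUNK_COUNT, global, ^(size_t i) {
        // Indices 0 .. CHUNK_COUNT-1 are farmed out across the CPUs;
        // each iteration writes a distinct slot, so no locking is needed.
        resultsPtr[i] = count_words_in_chunk(i);
    });

    // dispatch_apply waits for every iteration, so the results are complete here.
}
```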
To give you an example of the actual problem we just talked about-- here's some speed improvements you can get out of dispatch_apply. We took the basic scanning of a very large document, which took 17 1/4 seconds. We tried running on 2 CPUs and it got almost perfect linear scaling on the Mac Pro. We then broke it up into 4 chunks and 8 chunks, and again still near-linear scaling.
It's really efficient.
[ Applause ]
Thank you. However we still have a problem. That's still a very large document and sometimes it doesn't complete in time and we still have a spinning beach ball, despite reducing it from 17 seconds to a few. All right so what can we do about that? Let's talk about asynchronous design.
Let's talk about that little teaser API we've been talking about. So imagine this is your basic Cocoa callback for when that button's pressed. We're going to take your document, call the summarize method, and that summarize method would probably use dispatch_apply to do the work that we've just talked about, and it's going to take the resulting dictionary, hand it off to the model, and then update the view and release the dictionary.
What could we do? All right let's make a little bit of wiggle room, and let's slide in some code. Do a dispatch_async to a Global queue to get the intermediate code running in the background, and that's it, so now it's running in the background. OK great. We need to do one more thing.
We need to dispatch_async back to the main thread, and get the model and the view updated in a thread-safe way and coordinate it with the GUI. That's it-- 2 lines of code, 2 lines of basic boilerplate, and now we've taken an existing function, gotten code running in the background, and we've gotten the results running back where we need them; that's it.
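A sketch of that button callback after the two lines are slid in; the document, model, and view properties and the -summarize method are illustrative, not real API:

```objc
- (IBAction)buttonPressed:(id)sender {
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSDictionary *stats = [self.document summarize];   // heavy work, now in the background
        dispatch_async(dispatch_get_main_queue(), ^{
            self.model.stats = stats;                       // back on the main thread:
            [self.view setNeedsDisplay:YES];                // update model and view safely
        });
    });
}
```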
[ Applause ]
Thank you. All right, so let's talk a little bit-- so we've taken this problem and now we've replaced it with this, and your users are a lot happier. So let's talk about that main queue that we just hinted at. It executes blocks serially, unlike the Global queues.
It cooperates with the Cocoa main run loop, which your application is probably calling through NSApplicationMain. It's also usable by pure GCD programs. Instead of NSApplicationMain, your program can call dispatch_main, that's it. The main queue, here's an example usage, you might get the main queue, dispatch_async Hello World to it. Maybe after that, oh sorry here's main. Here you're going to get the main queue, maybe you're going to put a little Hello World.
It's going to wait in the queue until we actually drain the main queue, and then your application's probably going to initialize some libraries or itself; and they can also use the main queue. And then finally your program will probably call NSApplicationMain and the main queue will start draining, that block will probably run first, and then any callbacks that have been installed by libraries that the application will run. To reiterate, your program in a pure GCD world could also just call dispatch_main.
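A sketch of that pure-GCD main, assuming nothing beyond what was just described:

```objc
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    dispatch_async(dispatch_get_main_queue(), ^{
        printf("Hello World\n");   // waits until the main queue starts draining
    });

    // ... libraries or the application itself could also target the main queue here ...

    dispatch_main();               // drains the main queue; never returns
    return 0;
}
```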
Let's talk a little bit about serial queues. The main queue's one, how can you allocate one? Well this is the second type of queue. Only 2 types that we have in GCD. We have the Global queues and the Serial queues, same common properties are shared, it's really lightweight, it supports asynchronous execution, Enqueue/Dequeue is FIFO, but unlike the Global queues the completion of blocks is also FIFO. To again show an animation, we're going to have 2 threads now trying to enqueue at the same time.
One of them won the race, its block got inserted first. GCD assigns a thread, the queue starts draining the blocks, they finish serially. So that's a Serial queue. Here's an example in code of what you just saw. Thread 1 and thread 2 exist already on the system. One of them executes a dispatch_async and queues the first block, then our race happens and 1 of these threads wins the race as we saw in the animation.
Here's some more code to show what that's like. Here's how you create a queue, it couldn't be easier. You call dispatch_queue_create, you pass in a label, this label can be whatever you want, although we recommend and strongly encourage some kind of reverse-DNS style, because there's a lot of queues running around the system and it's nice for them to be distinguishable. And then the second parameter is future proofing, just pass NULL; so that's it.
That label you'll also see in future talks, like in the Instruments talk, or in your crash reports, or samples, or other debugging tools. And that's how you can help aid your own debugging and development. You can also get the label that we've just assigned to it; we have this so again when you're debugging, you might want to print out programmatically what label it is.
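For example, creating a labeled queue and printing its label back might look like this (the label is just an example):

```objc
#include <dispatch/dispatch.h>
#include <stdio.h>

static void labeled_queue_demo(void) {
    dispatch_queue_t queue = dispatch_queue_create("com.example.myapp.worker", NULL);

    printf("queue %p, label %s\n", (void *)queue, dispatch_queue_get_label(queue));

    // ... use the queue ...
    dispatch_release(queue);   // dispatch objects are reference counted
}
```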
And that's how you print it out, pointer and label. So building on the topics that we've talked about so far, I want to talk about multiple Serial queues. How do they behave? Well queues are lightweight. GCD imposes no limits other than available memory. So you could allocate 100, 1,000, 10,000.
Doesn't really matter, you can decide what's appropriate for your application. Because of that fact it's very natural to assign 1 queue per task. So if you have a lot of tasks, well assign a queue per task and then let them roam around independently. And this is how GCD achieves yet more concurrency.
Each queue can run around independently and we have an animation to show that. So you have your thread, it allocates a queue and then maybe another queue. It starts assigning blocks to queues, and GCD will intelligently assign threads to queues and allow the blocks to execute concurrently. However each Serial queue is still only processing blocks serially.
That's how you can get some task level concurrency. In code of what we just saw, there were 2 queues that were created, queue 1 and queue 2, in blue and gold respectively. There were dispatch_asyncs A through E in a row, alternating queues: queue 1, queue 2, queue 1, queue 2, queue 1.
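A sketch of that two-queue example (the labels and block bodies are placeholders):

```objc
#include <dispatch/dispatch.h>
#include <stdio.h>

static void two_queue_demo(void) {
    dispatch_queue_t queue1 = dispatch_queue_create("com.example.queue1", NULL);
    dispatch_queue_t queue2 = dispatch_queue_create("com.example.queue2", NULL);

    dispatch_async(queue1, ^{ printf("A\n"); });
    dispatch_async(queue2, ^{ printf("B\n"); });
    dispatch_async(queue1, ^{ printf("C\n"); });
    dispatch_async(queue2, ^{ printf("D\n"); });
    dispatch_async(queue1, ^{ printf("E\n"); });
    // A, C, E run in order on queue1; B, D run in order on queue2;
    // the two queues themselves may run concurrently with each other.

    dispatch_release(queue1);
    dispatch_release(queue2);
}
```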
And that was the animation we just saw. So again, each queue operates independently and serially, but we can achieve concurrency at the same time. So that's been queues and blocks, now let's talk about grouping blocks and start tracking work and build a more interesting and fun program. So let's group some blocks.
Well we have an API for that, it allows you to track a set of blocks and then get a notification when those blocks complete, the whole set. The blocks may run on different queues-- the blocks that are part of the group, and each individual block can add more blocks to the group.
And you might call this recursive decomposition. So what does that look like? We're going to take some queues that are running around the system and allocate a group. We're then going to start assigning blocks to that group and get them running on queues. All along the Dispatch group is counting how many blocks are outstanding.
So when that count hits 0, we can now get a notification callback, say, Ah-ha, all the blocks are done, we can now move onto the next stage of our project. This is, more or less, what we saw-- a Dispatch group was created, we then used whatever iteration API we want, in this case a for loop.
Your code could use an NSDictionary, or some more complex object graph to evaluate. Whatever the case may be, it's up to you. We then, instead of a dispatch_async, do a slight modification and call dispatch_group_async with the group parameter passed first, and the rest of the parameters are just like dispatch_async.
Inside of our block we can do something with our task, and that's how you get blocks into a group. Finally once we're done populating the group, we can call dispatch_group_notify, pass in the group, pass in a queue to run the block on, and then pass in a block that will get run when the group empties. And then finally we can call dispatch_release. It's worth noting that GCD will retain the group while it's in flight, so you don't need to worry about waiting for the group to complete.
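Putting those pieces together, a sketch of the group pattern might look like this; process_task() and all_tasks_finished() are hypothetical:

```objc
#include <dispatch/dispatch.h>

extern void process_task(int i);        // hypothetical per-task work
extern void all_tasks_finished(void);   // hypothetical completion handler

static void group_demo(int taskCount) {
    dispatch_group_t group = dispatch_group_create();
    dispatch_queue_t global = dispatch_get_global_queue(0, 0);

    for (int i = 0; i < taskCount; i++) {
        dispatch_group_async(group, global, ^{
            process_task(i);            // each block is counted by the group
        });
    }

    dispatch_group_notify(group, dispatch_get_main_queue(), ^{
        all_tasks_finished();           // runs once every block in the group has completed
    });

    dispatch_release(group);            // GCD keeps the group alive while blocks are in flight
}
```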
[ Silence ]
So completion, groups are a generic completion notification system. However we'd like to remind you of something we just saw earlier in the talk, which is nested blocks. It can be fewer lines of code, it can be more efficient. So what was it that we saw earlier in the talk? We saw a dispatch_async to a Global queue, we then saw some work done inside the block, and then as the last part of our block, the outer block, we then ran another dispatch_async, we picked a queue, in this case the main queue, being a very popular one. And then finally we took the results of our background operation, and did something with them.
The last part is the boilerplate to end our blocks. That's it. Two lines of code just to get some code running in the background, and the results running back where you want them. However sometimes asynchronous code isn't convenient, and code needs to wait. Let's talk about waiting.
Why do you need to wait? Why does your code need to wait? Well when you're bridging synchronous code and asynchronous code, that's where you need to wait. You can wait for a set of blocks to complete with the Group API, which is sometimes called a Fork Join model, or you can wait with 1 block with an API we call dispatch_sync, because it's synchronous, it waits.
However, groups are a more generic mechanism. So here's an example of dispatch_sync, it's like dispatch_async, but without 1 letter.
[ Laughter ]
So it's real simple. Here's an example using the syntax that was outlined earlier in the talk with the __block syntax. We're going to set up an rval to get the result of a function, we can then enqueue the block on the target queue and let it run in the proper order of all the blocks enqueued.
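A sketch of that dispatch_sync pattern; compute_result() and the queue are assumed to exist:

```objc
#include <dispatch/dispatch.h>

extern int compute_result(void);    // hypothetical function whose result we need

static int fetch_result(dispatch_queue_t queue) {
    __block int rval = 0;           // __block so the block can write the result back

    dispatch_sync(queue, ^{
        rval = compute_result();    // runs in FIFO order with the queue's other blocks
    });

    // dispatch_sync has returned, so rval is filled in and safe to use.
    return rval;
}
```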
And then finally once the block runs, it can do its work, generate the result, and once that block completes dispatch_sync will return, and now we can use the resulting value. That's it, that's how you can wait; just drop a letter. Here's the more generic example as we talked about with groups, but now we're going to use more of the advanced block syntax. So we can create a group and an array of resulting values, using the __block syntax. Again we can pick our favorite iteration technique. In this case the for loop.
We can do the group async, and now instead of a group notify we can do a group wait. Pass the group; tell it we want to wait forever, and when that returns the group is done. We can release the group, and then finally use the resulting values. So that's it, that's how you can go create a bunch of asynchronous work, and then wait for it all to complete with 1 API.
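A sketch of that fork/join pattern; the task count and compute_value() helper are illustrative:

```objc
#include <dispatch/dispatch.h>

#define TASK_COUNT 8
extern int compute_value(size_t i);     // hypothetical per-item work

static void group_wait_demo(void) {
    int results[TASK_COUNT] = {0};
    int *resultsPtr = results;          // captured so the blocks can write their results back

    dispatch_group_t group = dispatch_group_create();
    dispatch_queue_t global = dispatch_get_global_queue(0, 0);

    for (size_t i = 0; i < TASK_COUNT; i++) {
        dispatch_group_async(group, global, ^{
            resultsPtr[i] = compute_value(i);
        });
    }

    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);   // blocks until the group is empty
    dispatch_release(group);

    // All results are filled in at this point.
}
```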
All right, so what was that DISPATCH_TIME_FOREVER all about? In GCD we have a concept of time. It's an opaque type, it's not safe for arithmetic, it's not safe for comparisons, it's just a bucket of bits. There are 2 well known constants that GCD uses, and your code will probably use too. There's DISPATCH_TIME_NOW for just checking immediately and having it return with a result of either a success or timeout, and there's DISPATCH_TIME_FOREVER, which tells GCD to wait forever.
In that case it will always succeed. So using the group API that we already talked about, group wait with DISPATCH_TIME_FOREVER will wait forever, and this always returns success, which is 0. We can also just poll the group by passing an immediate timeout, and the group wait API will return non-zero for the timeout case.
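A sketch of that polling case, assuming a group that already exists:

```objc
if (dispatch_group_wait(group, DISPATCH_TIME_NOW) != 0) {
    // non-zero means the wait timed out, i.e. blocks are still outstanding
    printf("group is not empty\n");
}
```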
So this will print out group is not empty if the group is not empty. So that's Dispatch Time. We also support arbitrary timeouts, and it's also possible to defer a block just like with dispatch_async, except with an API called dispatch_after, which will run the block after a dispatch time. However we're not going to talk about it now.
We're going to talk about that tomorrow morning at our Understanding Grand Central Dispatch in Depth talk at 9 a.m. Finally I'd like to talk about Instruments. They've done a great job at making blocks easy to track and introspect. They've got some great behavioral analysis for understanding the latency of blocks from when they're enqueued to when they actually start executing. They also can help you track when they were enqueued, by which queues, and essentially just all these blocks in flight. I hope you understand that.
You can also track the latency of the block execution itself, and it can also help you track which blocks are run synchronously; which is important to understand as far as accidentally creating too many stall points in your program, where a thread did a dispatch_sync and waited for a block to run, which started much later and then completed. And it's an opportunity for switching to a callback based design. Which leads into optimization: what are those longest running blocks, what are the most executed, and how can we tune? These are going to be some teaser diagrams for what Instruments can provide.
Here's a call tree view that's-- here's a queue centric view that's keeping some accumulated statistics. Here's a block centric view of tracking how blocks are flying around, and then a block focus view. And what I'd like to point out here, which I think is really cool and we're really excited about, is on the right, the very right, this stacked backtrace. But there's a division in the middle which is the difference between the enqueue point and the execution point.
[ Applause ]
So as the clapping hands have already figured out, it makes completely decoupled code look like a straight up backtrace from where it started to where it ran. So we really think that's cool. Instruments has 2 talks, there's one right after this one. And there's an advanced one tomorrow morning. So that's GCD in a nutshell. We think it's really fast. We think it's easy, and we think it's really fun, and we hope you think so too.
[ Applause ]
All right cool, thank you. So to wrap things up, Michael Jurewitz will be getting on stage soon to handle Q & A. He's our Developer Tools and Performance Evangelist, and he'd be happy to hear from you. We have lots of great documentation showing up. There's a section on the dev forums for communication, and I'd like to point out something that's not here on the slides that we forgot: we have HeaderDoc comments which are fairly extensive, and more than anything, what I think we feel proud of is a huge section of man pages that our engineers have written that document a lot of this stuff in depth.
[ Applause ]