WWDC11 • Session 210

Mastering Grand Central Dispatch

Core OS • iOS, OS X • 42:47

Grand Central Dispatch is a powerful technology on Mac OS X and iOS that simplifies multi-threaded development. Learn about the latest advancements in Grand Central Dispatch, discover how you can adopt GCD in your own application to increase performance, and gain the real-world knowledge you need to take your app to the next level.

Speaker: Daniel Steffen

Unlisted on Apple Developer site

Downloads from Apple

HD Video (132.7 MB)

Transcript

This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.

Good morning. Welcome to Mastering Grand Central Dispatch. My name is Daniel Steffen. I'm the engineer responsible for GCD on the Core OS team. As you know, we introduced Grand Central Dispatch in Mac OS X Snow Leopard and were pleased to bring it to iOS 4 last year here at the conference. It has been very widely adopted, both in your apps and in Apple's frameworks.

And indeed, as you may have seen in other sessions this week, we are introducing a large number of new APIs in Lion and iOS 5 that build on or relate to GCD in some way. So it's becoming an ever more important core technology for you to understand if you do any kind of asynchronous or concurrent programming on our platforms. It's worth pointing out that the GCD API is identical across our platforms, so you can use the same functionality on both Mac OS X and iOS.

And GCD will scale across our hardware, from a single-core iPhone all the way up to a 24-core Mac Pro. So today we're going to give a very brief introduction to GCD. There was a more in-depth introductory session yesterday that I hope you managed to catch if you're new to the technology.

And we'll look at what is new in GCD on Mac OS X Lion and iOS 5, and go over some advanced usage of the GCD API. So what is GCD? At the very core of GCD are blocks and queues. Why blocks? They are very useful in asynchronous programming to encapsulate units of work

along with the associated state. So in this example here, we create a block that encapsulates some piece of work that we want to do asynchronously, and the block captures the state that it needs from the surrounding stack.

For instance, here it will retain the object and copy the integer argument. Once that is done, we pass the block to a function that will execute it asynchronously, and now we can safely change the integer or even release the Objective-C object without affecting the block.

Queues are the central concept in GCD. They are really just a very lightweight list of blocks, and it is very cheap to create queues, so it's fine to have many queues in your application. The enqueue and dequeue operations of our queues are FIFO, so blocks get dequeued and executed in the order that they were enqueued. And importantly, the enqueue operation itself is atomic.

So it is safe for multiple threads to enqueue onto a single queue at the same time. These properties, along with the fact that serial queues execute blocks one at a time, make these queues useful for synchronization and serialization purposes in your application. We also provide concurrent queues. These execute multiple blocks at the same time. They still dequeue blocks in order, but because the blocks can execute concurrently, they may complete out of order.

Another way to get concurrency with GCD is to use multiple serial queues. That works because all of our queues execute concurrently with respect to other queues. So this is an example of that in action. Here we have two serial queues with some blocks enqueued, and we have a worker thread on one CPU that starts to dequeue blocks and execute them. A second worker thread can come along and dequeue from the other queue, and as you can see, we get some concurrency where the blocks overlap.

Similarly, with a concurrent queue, here we have two worker threads that dequeue items from the one concurrent queue and execute them concurrently. Very briefly, the API to submit blocks to queues is dispatch_async and dispatch_sync. dispatch_async just enqueues a block and moves along, whereas dispatch_sync will enqueue the block and wait until that block has actually executed on the queue. You can also delay the submission of blocks to a queue by calling the dispatch_after API, or concurrently submit and execute a single block many times on a queue by calling the dispatch_apply API.
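
To make those four calls concrete, here is a minimal sketch (the queue argument and the work shown are illustrative, not from the session):

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

// A sketch of the four submission APIs just mentioned.
static void submit_examples(dispatch_queue_t queue)
{
    // Enqueue a block and return immediately.
    dispatch_async(queue, ^{ printf("async work\n"); });

    // Enqueue a block and wait until it has finished executing.
    dispatch_sync(queue, ^{ printf("sync work\n"); });

    // Enqueue a block after a 1-second delay.
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 1 * NSEC_PER_SEC),
                   queue, ^{ printf("delayed work\n"); });

    // Submit one block 8 times, potentially concurrently, and wait for all
    // iterations to complete (a concurrent queue gives parallelism here).
    dispatch_apply(8, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                   ^(size_t i) { printf("iteration %zu\n", i); });
}
```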

We also allow you to suspend and resume execution of a dispatch queue by calling the dispatch_suspend and dispatch_resume APIs. Suspension is an asynchronous operation. It will take effect at the earliest possible opportunity on the queue, meaning it won't interrupt any block that is already executing, but as soon as the current block has finished executing, the queue will stop execution. If you want to make this process deterministic, the way to do that is to call dispatch_suspend from a block that is actually executing on the queue itself. This way you know that the current block is the last one that will execute before suspension takes effect.
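
A minimal sketch of that deterministic-suspension idiom (the queue and function names are assumptions for illustration):

```c
#include <dispatch/dispatch.h>

// Suspending from a block running on the queue itself guarantees that it is
// the last block to execute before the suspension takes effect.
static dispatch_queue_t workQueue;   // assume created elsewhere with
                                     // dispatch_queue_create("com.example.work", NULL)

static void pause_after_current_work(void)
{
    dispatch_async(workQueue, ^{
        // ... final piece of work before the pause ...
        dispatch_suspend(workQueue);   // takes effect when this block returns
    });
}

static void resume_work(void)
{
    dispatch_resume(workQueue);        // balance every suspend with a resume
}
```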

Our queues are reference counted, and you maintain that reference count with dispatch_retain and dispatch_release. A couple of pitfalls to go over with dispatch queues. They are really intended for flow control in your application and not as a general-purpose data structure. If you have a lot of work that you may need to execute asynchronously, it is better to keep that work in an actual data structure until you're sure that you really have to execute it, and only then submit it to a queue. In particular, this is the case because once you submit a block to a queue, there is no way to cancel it; we guarantee that that block will execute. Also, as with any multithreaded programming, be very careful when you use any synchronous API.

GCD does not protect you from the possibility of deadlock. For instance, if you use our dispatch_sync API, it is very possible to deadlock yourself by waiting for a queue that is in turn waiting on the queue that you're on. Also, if you use our recommended pattern of a serial queue for each subsystem in your application, or for each shared resource that you need to protect, and you block the queue that protects that resource or subsystem, you will block the whole application's access to it. So any kind of blocking synchronous API on those queues would be bad. We have a couple of queue types.

The global queues you get with the dispatch_get_global_queue API by passing in a priority. These are concurrent queues; they execute multiple blocks at the same time, and they are global across your whole application. The main queue is another global object. This is a serial queue that cooperates with the main run loop of your application. This is typically where you would execute blocks that interact with the UI subsystem.

This is one of the queues where it is very important not to run any blocking synchronous API, of course, because you would prevent event processing in the app if you did. Serial queues you can create yourself by calling the dispatch_queue_create API and passing in a label. New in Lion and iOS 5 are concurrent queues.
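
A short sketch of getting and creating these queue types (the serial-queue label is made up):

```c
#include <dispatch/dispatch.h>

static void queue_types_example(void)
{
    dispatch_queue_t global = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_queue_t mainQ  = dispatch_get_main_queue();
    dispatch_queue_t serial = dispatch_queue_create("com.example.subsystem", NULL);

    dispatch_async(global, ^{
        // ... background work off the main thread ...
        dispatch_async(mainQ, ^{
            // ... touch the UI only from the main queue ...
        });
    });

    dispatch_async(serial, ^{ /* ... serialized subsystem work ... */ });
    dispatch_release(serial);   // the pending block keeps the queue alive
}
```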

Like the serial queues, you create these with the dispatch_queue_create API, and you pass in the DISPATCH_QUEUE_CONCURRENT attribute. Like the global concurrent queues, these will execute multiple blocks at the same time, but they can be suspended like a serial queue. And, a new concept, they support barrier blocks.

What are barrier blocks? These are blocks that represent synchronization points in an otherwise concurrent queue. They will not run until all the blocks that were submitted before the barrier block have completed, and while they're executing, nothing else can run on the queue. Any blocks that were submitted later will wait until the barrier block has executed.

You submit barrier blocks by calling the dispatch_barrier_async or dispatch_barrier_sync APIs. These are exactly like dispatch_async and dispatch_sync, except that they mark the block as a barrier block. Note that barrier blocks don't do anything special on the global concurrent queues, because the global concurrent queues are a shared resource across the whole application.

We don't want anybody to be able to monopolize those queues. So here we have the same animation as before, except that we have one of these barrier blocks in the middle of one of the new concurrent queues, submitted with the dispatch_barrier_async API.

And once threads start dequeuing blocks here, the yellow barrier block will not start executing until all the existing work on the queue has finished. Then, while the barrier block is executing, nothing else can run. And once it has finished, normal concurrent execution can continue. One important use case for these new concurrent queues is to implement efficient reader/writer schemes with GCD.
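
As a hedged sketch of what such a reader/writer scheme can look like with a private concurrent queue and barrier blocks (names and the protected variable are illustrative, not the session's code):

```c
#include <dispatch/dispatch.h>

static dispatch_queue_t rwQueue;   // created once, e.g. at subsystem init
static int sharedValue;            // the state the queue protects

static void rw_init(void) {
    rwQueue = dispatch_queue_create("com.example.rwqueue",
                                    DISPATCH_QUEUE_CONCURRENT);
}

// Readers use dispatch_sync: many reads can run concurrently on the queue.
static int read_value(void) {
    __block int result;
    dispatch_sync(rwQueue, ^{ result = sharedValue; });
    return result;
}

// Writers use dispatch_barrier_async: the barrier waits for in-flight reads,
// runs exclusively, and later reads see the new value.
static void write_value(int newValue) {
    dispatch_barrier_async(rwQueue, ^{ sharedValue = newValue; });
}
```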

[Transcript missing]

So we currently have three priority levels: low, default, and high. And if you create a serial queue, by default it will come with the default-priority global queue set as its target queue. You can then submit some work to that queue and later on potentially change its target queue, say, to the low-priority queue. Of course, you can also change the target queue immediately when you create a queue; here we set it to be the high-priority queue.

So items that you submit to the serial queue on the right will now execute on a thread that runs at high priority. New in iOS 5 and Lion, we're adding a fourth priority level, the background priority. And of course, with the new concurrent queues, you can also set the target queue; here we've set it to be this new background-priority global queue.

More on that. You get the background-priority global queue by passing the DISPATCH_QUEUE_PRIORITY_BACKGROUND constant to dispatch_get_global_queue. Blocks that you submit to this queue will run on threads at the lowest possible scheduling priority in your process, and any I/O that you do from those threads will be throttled system-wide, so it takes a back seat to any normal I/O that's ongoing.

So this queue is very useful if you have any kind of idle-time work, scanning of the disk, indexing, anything like that that should not interfere with other, higher-priority work on the system. As mentioned, target queues can form a whole hierarchy, and the way to set that up is by calling dispatch_set_target_queue. This is in itself an asynchronous barrier operation on the queue, so any work that is already enqueued on the queue and executing will not be affected by this change of target queue.
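
A small sketch of retargeting a serial queue at the background-priority global queue (the label is illustrative):

```c
#include <dispatch/dispatch.h>

// Blocks on this queue then run at the lowest scheduling priority, with
// their I/O throttled system-wide.
static void background_indexer_example(void)
{
    dispatch_queue_t scanQueue = dispatch_queue_create("com.example.indexer", NULL);
    dispatch_set_target_queue(scanQueue,
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0));

    dispatch_async(scanQueue, ^{
        // ... idle-time disk scanning or indexing work ...
    });
    dispatch_release(scanQueue);
}
```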

It's only once all the work that was submitted ahead of the call to dispatch_set_target_queue has finished that the target queue will change and affect any future blocks. And we support arbitrarily deep hierarchies if that is useful to you, but not loops. There are a couple of useful, somewhat non-trivial idioms with target queues that we'd like to go through now.

One is that you can synchronize with a serial queue by setting the target queue of your queue to be that serial queue. Synchronize meaning that there will be a block executing either on your queue or on the serial queue, but not on both at the same time. However, setting that up does not implicitly impose any ordering between the two queues. So if you submit blocks to both queues in a certain order, that doesn't mean they will be dequeued in that order.

Here we have two serial queues that come in the default state, with the target queue set to the default-priority global queue. You can synchronize them by changing the target queue of one to be the other serial queue. This means there will be a block executing on one or the other of these two serial queues, but not on both at the same time.

Of course, you can do the same with one of our new concurrent queues, and what that does is actually make the concurrent queue serial. The reason is that, remember, target queues are where the dequeue operation for a queue is executed, and because here the target queue of the concurrent queue is serial, the dequeue operation becomes serialized, so we cannot run more than one block at the same time even though the queue was concurrent. One use case for that is that if you have a reader/writer lock built on a concurrent queue, you can promote it to an exclusive lock by doing this.
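
A minimal sketch of that idiom (labels are illustrative):

```c
#include <dispatch/dispatch.h>

// Pointing a concurrent queue at a serial target queue serializes its
// dequeue operation, so it effectively becomes an exclusive (serial) queue.
static void serialize_concurrent_queue(void)
{
    dispatch_queue_t serialTarget =
        dispatch_queue_create("com.example.serial-target", NULL);
    dispatch_queue_t rwQueue =
        dispatch_queue_create("com.example.rw", DISPATCH_QUEUE_CONCURRENT);

    // After this call, blocks on rwQueue run one at a time.
    dispatch_set_target_queue(rwQueue, serialTarget);
}
```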

[Transcript missing]

As soon as that red high-priority item has finished, we resume the low-priority queue and the dequeue operation can continue as normal. So now we've managed to jump ahead of the queue. Our next topic is queue-specific data. This is a new feature in iOS 5 and Lion that allows you to associate arbitrary key-value storage with a queue. We already had dispatch_set_context and dispatch_get_context, but that was a single pointer and wasn't really sufficient if you were sharing your queues among multiple subsystems in your application.

So here we allow you, with the dispatch_queue_get_specific and dispatch_queue_set_specific APIs, to set key-value storage for any queue. We use keys that we just compare as pointers, so typically we recommend that you use the address of a static variable in your subsystem, which will be a pointer unique to your subsystem. Don't just pass in a string, because that may not be a unique pointer.

And for the setter, you pass in a value and a destructor function. That destructor function is called when the value is unset by another call to dispatch_queue_set_specific, or when the queue is destroyed. And a new capability we make available with this is that you can get the current value for a given key while you're running in a block on a queue, with the dispatch_get_specific API. You just pass in the key, and this will return the value that is set on the currently executing queue or, importantly, a value found by way of the target queue hierarchy.

If the current queue doesn't have a value set, it will go and look in the target queue, does that have a value set for this key, and so on, until it reaches the bottom of the hierarchy.

So to illustrate this, here we have three queues set up in a target queue relationship, and we set a value for a key: value one set on QA and value two set on QB, but no value for this key set on QC. If you now run a block on QB and call dispatch_get_specific there, you will see value two. But if you run it on QC, because it doesn't have its own value, it will see the value from the target queue, value one.
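
A sketch of the QA/QB/QC setup just described (labels and the key variable are illustrative):

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

static char kSubsystemKey;   // the *address* of this variable is the key

static void queue_specific_example(void)
{
    dispatch_queue_t qA = dispatch_queue_create("com.example.A", NULL);
    dispatch_queue_t qB = dispatch_queue_create("com.example.B", NULL);
    dispatch_queue_t qC = dispatch_queue_create("com.example.C", NULL);
    dispatch_set_target_queue(qB, qA);
    dispatch_set_target_queue(qC, qA);

    // Static strings need no destructor, so we pass NULL.
    dispatch_queue_set_specific(qA, &kSubsystemKey, "value one", NULL);
    dispatch_queue_set_specific(qB, &kSubsystemKey, "value two", NULL);
    // qC gets no value of its own.

    dispatch_async(qB, ^{   // prints "value two": qB has its own value
        printf("%s\n", (const char *)dispatch_get_specific(&kSubsystemKey));
    });
    dispatch_async(qC, ^{   // prints "value one": lookup falls through to qA
        printf("%s\n", (const char *)dispatch_get_specific(&kSubsystemKey));
    });
}
```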

Another new API set that we are introducing in iOS 5 and Lion is dispatch data. This is a facility that provides container objects for multiple discontiguous memory buffers. Importantly, these container objects, as well as the represented buffers, must be immutable: once you've created the container objects, you can no longer change them, and any buffers that you pass to us to be represented by dispatch data must not be changed once you've given them to us. The central goal of the whole API set is to avoid copying buffers as much as possible.

The way you create one of these dispatch data objects is to call dispatch_data_create. You pass in a buffer and a size, and it will return you a dispatch_data_t object that you manage with dispatch_retain and dispatch_release, like all the other dispatch objects. We represent that graphically here with a blue memory buffer represented by a red dispatch data object. And because the buffers are required to be immutable, we can have multiple dispatch data objects that reference the same buffer. And of course, only once all the data objects go away will the buffer be deallocated.

The way that deallocation happens is via the destructor block that you give us as the last parameter to dispatch_data_create. And there are a couple of possibilities for that. You can pass the DISPATCH_DATA_DESTRUCTOR_DEFAULT constant. This tells us: you should make a snapshot of this data, I can't really guarantee that this buffer will not change for the lifetime of the data object. So we make a copy and then manage the lifetime of that copy ourselves.

So of course, if you can avoid it, this is not the preferred choice because it involves making a copy. If you just have a buffer allocated with malloc, you pass DISPATCH_DATA_DESTRUCTOR_FREE, which is essentially equivalent to giving us a block that will call free on the buffer, but is more efficient. And if you have some custom buffer that is maybe wrapped in another object, you just pass a block that does whatever deallocation work is required. In this example, for instance, we've passed the buffer underlying a CFData object to the creation, and for the destructor, we just release the CFData.
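
A sketch of the three destructor options (buffers and sizes are illustrative; the custom-destructor case stands in for the CFData example):

```c
#include <dispatch/dispatch.h>
#include <stdlib.h>

static void dispatch_data_create_examples(void)
{
    dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    // 1. Default destructor: the system copies the buffer, because we cannot
    //    promise the stack buffer stays valid and unchanged.
    char stackBuffer[64] = "some bytes";
    dispatch_data_t copied = dispatch_data_create(stackBuffer, sizeof(stackBuffer),
                                                  q, DISPATCH_DATA_DESTRUCTOR_DEFAULT);

    // 2. Free destructor: no copy; the system calls free() on the buffer when
    //    the last data object referencing it goes away.
    char *heapBuffer = malloc(1024);
    dispatch_data_t owned = dispatch_data_create(heapBuffer, 1024,
                                                 q, DISPATCH_DATA_DESTRUCTOR_FREE);

    // 3. Custom destructor block: no copy; run arbitrary cleanup when done
    //    (e.g. CFRelease a CFData whose bytes the object wraps).
    char *custom = malloc(512);
    dispatch_data_t wrapped = dispatch_data_create(custom, 512, q, ^{
        free(custom);
    });

    dispatch_release(copied);
    dispatch_release(owned);
    dispatch_release(wrapped);
}
```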

You can also create dispatch data objects from other data objects, for instance via concatenation with the dispatch_data_create_concat API. This takes two existing data objects and makes a new object that represents the two memory buffers concatenated together. And of course, again, because of reference counting, when the original source objects go away, the memory buffers will stick around until the concatenated object also goes away.

You can also make a subrange of an existing data object, if you're only interested in part of the data, by calling the dispatch_data_create_subrange API. So here, as an example, we have a data object that represents four discontiguous memory buffers, and we're really only interested in the piece between the two light blue arrows. So we call the dispatch_data_create_subrange API with this range, which makes a new object that represents only those two pieces.

And if you now get rid of the source data object, the two buffers that are no longer required can be deallocated. So if you do any kind of incremental processing of data in your application, this can be a very easy way to get rid of data on an ongoing basis. And again, of course, when the subrange object goes away, the remaining buffers get deallocated.
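
A short sketch of deriving new objects by concatenation and subrange (the offsets are illustrative and assume the source is large enough):

```c
#include <dispatch/dispatch.h>

// Assumes data1 and data2 were created earlier; the caller owns the result.
static dispatch_data_t derive(dispatch_data_t data1, dispatch_data_t data2)
{
    // Concatenation: represents both buffers back to back, without copying.
    dispatch_data_t whole = dispatch_data_create_concat(data1, data2);

    // Subrange: keep only 16 bytes starting at offset 4; buffers that are no
    // longer referenced can be deallocated once the source objects go away.
    dispatch_data_t part = dispatch_data_create_subrange(whole, 4, 16);

    dispatch_release(whole);
    return part;
}
```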

[Transcript missing]

Let's look at an example of this in action. Here we acquire a dispatch data object from somewhere, and what we're interested in is finding the end of a header on a character-by-character basis. We want to look through the data and find the terminator character of a header; here we picked Control-Z as an example terminator. We need to fill in this position __block variable with the position of that character in the buffer.

So we call dispatch_data_apply, and our applier block gets passed four parameters: a region object, which we don't actually use in this example, but which represents the current memory area being traversed; an offset, which indicates where that area starts in the overall data object; and a buffer and a size, which give you direct access to the storage.

So here we just look for the position of that terminator character. If we find it, we calculate the overall position in the buffer, and we return a Boolean that indicates whether the iteration should continue. For instance here, once we have found the header, there's no need to go through all the rest of the buffers in the object, so we can return false and stop the iteration.
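
A sketch reconstructing that traversal (the function name and details are assumptions; only the Control-Z search follows the description above):

```c
#include <dispatch/dispatch.h>
#include <string.h>
#include <stdbool.h>

// Scan a dispatch data object for a Control-Z terminator and record its
// overall position; returns false if no terminator was found.
static bool find_header_end(dispatch_data_t data, size_t *headerEnd)
{
    __block size_t position = 0;
    __block bool found = false;

    dispatch_data_apply(data, ^bool(dispatch_data_t region, size_t offset,
                                    const void *buffer, size_t size) {
        (void)region;   // the region object is unused in this example
        const char *terminator = memchr(buffer, 0x1A /* Ctrl-Z */, size);
        if (terminator) {
            position = offset + (size_t)(terminator - (const char *)buffer);
            found = true;
            return false;   // stop: no need to visit the remaining buffers
        }
        return true;        // keep iterating over the next region
    });

    *headerEnd = position;
    return found;
}
```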

[Transcript missing]

and also allow us to track file descriptor ownership with a reference-counted object. These dispatch I/O channels have two different types, stream or random, that you have to specify at channel creation time with the following two constants. The stream-access channels perform I/O operations at the current file pointer position and advance the file pointer position.

And if there are multiple asynchronous I/O operations in flight, these will be performed serially, one after the other, because they depend on the file pointer position. Of course, if you give us a file descriptor like a socket, where reads can be performed concurrently with writes without interfering with each other, we will do that. The random-access channels, on the other hand, work at a specified offset.

The I/O operations on those channels are performed at a specified offset relative to the initial file pointer position at the time you created the channel, and these operations do not change the file pointer position. So this is like pread versus read. And in this case, any asynchronous I/O operations that you submit may be performed concurrently if it makes sense for the device in question.

And of course, any file descriptors that you give us for these kinds of channels must be seekable. So, three ways to create dispatch I/O channels: the dispatch_io_create API, where you pass in a file descriptor; the dispatch_io_create_with_path API, where you pass in a path on disk;

and the dispatch_io_create_with_io API, where you pass in an existing I/O channel. This is useful if you need to change the type of a channel, for instance, or change configuration parameters on a channel without affecting the original. All of these take a cleanup block as the last parameter. So let's look at that. This cleanup block is what allows us to do that wrapping and control of file descriptors.

When you give us a file descriptor with the dispatch_io_create API, it comes under system control, and you must not change it during that time. So you must not do any direct I/O to it, or change the file pointer position, or close it, or change any configuration parameters on it.

And you must not do any of these things until the cleanup handler is called. That cleanup handler gets called when the I/O operations that you have submitted to the channel have all completed, and either the channel has been released or it has been closed with an API that we'll see in a couple of slides.

The syntax of these cleanup handlers is very simple: we just pass in an error. These errors are only creation errors for the channel, so we only pass you errors that occur when we try to create a channel, for instance if you pass a bad file descriptor, or one of the wrong type, or a path to a volume that doesn't exist, or anything like that.

You won't get any I/O errors in the cleanup handler that occur later on when you submit operations. And the typical thing that you will do in this cleanup handler is close the file descriptor. Again, you must not do that before this point, because it would rip it out from under us.
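
A sketch of the three creation calls and a typical cleanup handler (path, flags, and queue choices are illustrative):

```c
#include <dispatch/dispatch.h>
#include <unistd.h>
#include <fcntl.h>

static void create_channels_example(int fd)
{
    dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    // From a file descriptor: the descriptor is under system control until
    // the cleanup handler runs.
    dispatch_io_t in = dispatch_io_create(DISPATCH_IO_STREAM, fd, q, ^(int error) {
        // Called once all submitted I/O has completed and the channel has
        // been closed or released; 'error' reports creation errors only.
        if (error) { /* handle channel-creation error */ }
        close(fd);   // only now is it safe to close the descriptor ourselves
    });

    // From a path: the system opens (and later closes) the descriptor for us.
    dispatch_io_t out = dispatch_io_create_with_path(DISPATCH_IO_STREAM,
        "/tmp/output.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600, q, ^(int error) {
        if (error) { /* handle channel-creation error */ }
    });

    // From an existing channel: same descriptor, different type or settings
    // (a random-access channel requires a seekable descriptor).
    dispatch_io_t outRandom = dispatch_io_create_with_io(DISPATCH_IO_RANDOM, out, q,
                                                         ^(int error) { (void)error; });

    // ... submit reads and writes, then release the channels when done:
    dispatch_release(in);
    dispatch_release(out);
    dispatch_release(outRandom);
}
```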

The I/O operations that we support are asynchronous reads and asynchronous writes. You submit an asynchronous read request, at the file pointer or at the specified offset depending on the channel type, with the dispatch_io_read API, and here you just give us the length that you would like to read from that channel.

An asynchronous write, again, occurs at the file pointer position or at the specified offset, and you just pass in the dispatch data object that you would like to write to the channel, with the dispatch_io_write API. Both of these take a handler block, which is an I/O handler. This is what allows you to do incremental processing. We will potentially call this block multiple times during the asynchronous I/O operation.

It gives you incremental pieces of data as they come in. Importantly, that data that comes in, we will get rid of it as soon as the handler returns. So if you can do any ongoing processing right there, that's great. Otherwise, you must retain that data if you need it after the handler has returned.

So just call dispatch_retain on it, or concatenate it with another dispatch data object, et cetera. The read operations are passed, in one of these handlers, the data that was read since the last time the handler was called, whereas the write operations actually pass the data that we couldn't write yet. So this is the end piece of the data, which will get smaller as the write proceeds. This is particularly useful if the write fails: you can then get at the data that we couldn't write and maybe write it somewhere else, or retry, or something like that.

The syntax of these handlers is that we pass you three parameters: a Boolean done flag, a dispatch data object, and an error parameter. You should treat all three parameters independently and check for their presence. If there is a data object, that means there was maybe some data read; you process it and handle any error that occurred. And once you get the done flag, that indicates to you that this is the last time we will call this handler.
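
A sketch of an asynchronous read with such an incremental handler (the accumulation strategy shown is just one option, not the session's code):

```c
#include <dispatch/dispatch.h>
#include <stdbool.h>
#include <stdint.h>

static void read_all(dispatch_io_t channel)
{
    dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    __block dispatch_data_t accumulated = dispatch_data_empty;

    dispatch_io_read(channel, 0 /* offset */, SIZE_MAX /* length */, q,
                     ^(bool done, dispatch_data_t data, int error) {
        if (data) {
            // 'data' is released when this handler returns, so concatenate it
            // onto our own object (which takes a reference) to keep it.
            dispatch_data_t combined = dispatch_data_create_concat(accumulated, data);
            dispatch_release(accumulated);
            accumulated = combined;
        }
        if (error) { /* handle the I/O error */ }
        if (done) {
            // Last invocation of the handler: finish processing 'accumulated'.
            dispatch_release(accumulated);
        }
    });
}
```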

So complete any processing of incremental data or any state that you've accumulated during that time. The other kind of operation we support on dispatch I/O channels is barriers. These are very similar to the barrier blocks that we discussed earlier for concurrent queues. They only execute once any pending I/O operations that you already submitted to the channel have completed.

So this is a way to allow you to synchronize I/O operations on the channel and know when they have finished. And while one of these barrier blocks is ongoing, it has exclusive access to the file descriptor underlying the channel, so we do actually allow non-destructive modifications of the file descriptor during this time.

So here we submit one of these barrier blocks with the dispatch_io_barrier API, and inside this block, while we're executing, we are sure that we have exclusive access to the file descriptor. If you don't know the file descriptor, for instance if you opened the channel from a path, we opened the file descriptor on your behalf, so you don't have it; you call the dispatch_io_get_descriptor API to get it.

And in this example, we just call fsync. For instance, if you have a bunch of asynchronous writes that you have submitted and you want to make sure they are synced to disk, you put in a barrier block that will make sure they have all completed, and then sync.
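
A minimal sketch of that barrier (assuming a channel opened from a path):

```c
#include <dispatch/dispatch.h>
#include <unistd.h>

// Wait for the writes already submitted to this channel, then fsync while
// holding exclusive access to the underlying descriptor.
static void sync_channel(dispatch_io_t channel)
{
    dispatch_io_barrier(channel, ^{
        // Fetch the descriptor the system opened on our behalf.
        dispatch_fd_t fd = dispatch_io_get_descriptor(channel);
        fsync(fd);
    });
}
```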

I mentioned closing channels before; you do that with the dispatch_io_close API. This will stop the channel from accepting any new operations, but it will let any existing, already submitted operations complete. Once they have all completed, it will invoke the cleanup handler, where you can close the file descriptor. If you need to interrupt already submitted operations, you can pass in the DISPATCH_IO_STOP flag, and we will attempt to interrupt any ongoing I/O at the earliest possible opportunity. But of course, this is asynchronous, so again you must wait for the cleanup handler to be called to be sure that all the operations have actually completed.

Any ongoing I/O handlers that are interrupted get passed the ECANCELED error so that they know they were interrupted by a stop. To go through an example that ties all of this together, we are implementing a transliteration of a maybe very, very large file that we pass in via a file descriptor; we want to transliterate that file on a character-by-character basis and write it out to a path, and we want to do it all asynchronously.

So we create an input channel. In this example, we make it a random-access channel from this file descriptor, and as you can see, the cleanup handler here just handles any creation errors and then closes the file descriptor behind us when the operations are done on this input channel. The output channel we create from a path, and importantly for this example, we make it a stream type, and we pass in the parameters that we would have otherwise passed to open to create the file for writing.

And the cleanup handler here just handles any errors. It doesn't need to close any file descriptors, because we do that on your behalf when we open from a path. Once you have all of that set up, you can submit an asynchronous read operation to the input channel, and here we pass in offset zero and length SIZE_MAX, which will actually read the whole file on disk. So it could be a 200-megabyte file or something like that.

And we have the handler block here, which gets passed those done, data, and error parameters. And remember, this gets called potentially multiple times. So each time we check: do we have a data object? If there is one, we call this translate function, which I haven't shown here, which will go through and create a new data object with the result of the transformation that we want to do and return it to us; we then pass that to an asynchronous write.

So this is where it becomes important that the output channel was a stream channel, because of course we could have multiple of these asynchronous writes in flight. And we just pass the output data piece at offset zero to be written to the channel. In the I/O handler here, we don't do anything special; we just handle any errors. We don't try to retry writing any data that couldn't be written, or anything like that.

And once we have passed the output data object to dispatch_io_write, the system will retain that object, so we can get rid of it here immediately. And the same with the input data object: remember, as I mentioned, that data object gets released as soon as the read handler returns.

And so here, if you have a very large file, we will get pieces of that file one after the other, and we will actually never have the whole file in memory. As soon as any processing and writing of a piece is done, we can get rid of it immediately. So once the read handler is done, the memory will be released, and you will have very good memory efficiency with this API.

When you get the done flag in the read handler, you know that the read request has completed, so we know that we will never need the output channel anymore, and now we can just release it. And the same with the input channel: once we have submitted the read request, the system will retain the input channel for the lifetime of that request, so we can immediately get rid of our reference to that channel.
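
Putting the walkthrough together, here is a hedged sketch of the whole flow; translate() is a hypothetical helper standing in for the transformation the session describes, and the output path is illustrative:

```c
#include <dispatch/dispatch.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

// Hypothetical helper: returns a new dispatch data object with the
// transliterated bytes of 'input'.
extern dispatch_data_t translate(dispatch_data_t input);

static void transliterate(int fd, const char *outPath)
{
    dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    // Input: a random-access channel wrapping the caller's descriptor.
    dispatch_io_t in = dispatch_io_create(DISPATCH_IO_RANDOM, fd, q, ^(int error) {
        if (error) { /* handle creation error */ }
        close(fd);   // the descriptor is ours again once all I/O has completed
    });

    // Output: a stream channel created from a path; writes are serialized
    // and the system opens/closes the descriptor for us.
    dispatch_io_t out = dispatch_io_create_with_path(DISPATCH_IO_STREAM, outPath,
        O_WRONLY | O_CREAT | O_TRUNC, 0600, q, ^(int error) {
        if (error) { /* handle creation error */ }
    });

    // Read the whole file; the handler is called with incremental pieces.
    dispatch_io_read(in, 0, SIZE_MAX, q, ^(bool done, dispatch_data_t data, int error) {
        if (data) {
            dispatch_data_t outputData = translate(data);
            dispatch_io_write(out, 0, outputData, q,
                              ^(bool wdone, dispatch_data_t remaining, int werror) {
                if (werror) { /* handle write error; 'remaining' is the unwritten data */ }
            });
            dispatch_release(outputData);   // the write holds its own reference
        }
        if (error) { /* handle read error */ }
        if (done) {
            dispatch_release(out);   // no more writes will be submitted
        }
    });
    dispatch_release(in);   // the pending read keeps the input channel alive
}
```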

So now, at the end here, we have released all our references to all the objects, and as soon as the asynchronous I/O completes, all the objects will take care of themselves and be released. So to summarize, we went over a couple of advanced examples of usage of the target queue API in GCD, then looked at the new APIs that we are introducing in iOS 5 and Lion: concurrent queues and barrier blocks, which are useful in particular for implementing reader/writer schemes; queue-specific data, which allows you to associate arbitrary key-value storage with a dispatch queue; the dispatch data API for managing discontiguous memory buffers; and the dispatch I/O API for asynchronous I/O. For more information, you can contact Michael Jurewitz, our Developer Tools and Performance Evangelist. And we have a Concurrency Programming Guide in the developer documentation, which is very informative and covers many of these APIs. And of course, libdispatch itself is open source.

We will very soon merge the source changes for the new APIs we are introducing into that repository, so if you want to go and have a look at the implementation, you can do that. And of course, there's a special forum for GCD questions on the developer forums, so we encourage you to ask any questions there. The related sessions all happened yesterday already, unfortunately, so if you didn't have a chance to go to those, please watch them on video. And I've also put up the references to the GCD sessions from last year here, which focused specifically on iOS programming with GCD. Thank you.