Core OS • iOS, OS X • 56:21
Concludes a two-part series to present the latest techniques for building reliable, secure, high-performance network applications, with a focus on iPhone OS applications. Part 2 covers the interface lifecycle, NSOperation and runloop-based asynchronous APIs, performance considerations, debugging, logging, packet tracing, and failure simulation. Find out from the experts how to improve and enhance your networking products to perform as reliably and securely in real-world implementations as in your test lab.
Speaker: Quinn “The Eskimo!”
Unlisted on Apple Developer site
Transcript
This transcript has potential transcription errors. We are working on an improved version.
[Quinn “The Eskimo!”]
Greetings. My name is Quinn "The Eskimo!" I work for Apple Developer Technical Support, answering questions from people like you, developers, about networking, file systems, threads, and other sort of Core OS little things. Welcome to Network Apps for iPhone OS Part Two. If you were at Part One, thank you for coming back. I admire your persistence.
[ Laughter ]
If you didn't make part one, that's probably okay. Most of this talk sort of stands alone. I'll make some references back to part one and you can catch it on the videos, but mostly you should be fine. I'm going to start today -- well, not today because I'm already an hour in -- with a quote.
This is from my friend at DTS who was there when I joined. And he's since moved on to debug kernel panics for a living, which sounds like fun to me but I'm going to -- and it's relevant to this talk because the previous talk was all about architecture. It was about problems and architecting your application to solve those problems. This talk is all about practical matters.
And so the practical matters we're going to cover are: number one, the big one -- asynchronous programming -- and then shorter sections on debugging and common mistakes. I have certain religious objections to some terminology, "anti-patterns" is one of them, "Cloud" is another so we'll avoid that term. So to start.
[ Applause ]
As most of you know -- well, can tell from my accent -- I'm Australian. And -- thank you. And I was looking for an icon for the practical and I wanted sort of an image of filling developers up with knowledge or something like that. So I thought, "What's more practical than beer?" And the answer is: Nothing.
[ Laughter ]
So I took this and I turned it into my cheesy graphic for the talk. And you know I'm a fan -- if you came to Part 1, something you would have missed -- you will know that I'm a fan of the cheesy graphics. And I also sort of wanted a filler bar so you guys could tell your progress along the talk. And for me -- for you guys it's a filler bar and for me it's a goal. When we get to the end, after two hours of talking about networking, I'll be well-ready for a beer. So to start -- asynchronous programming: glass is empty.
Bummer. Asynchronous programming is a big topic so we're going to break it into three. There's the basics. There's a discussion of run loops and then a short discussion of state management, which is how to connect the complicated model -- state of your model objects -- to the hopefully simple state of your front end objects.
We're going to kick off with the basics and what is more basic than a definition? Now, I had a hard time coming up with a definition of asynchronous and synchronous programming in words so I've defined it in code. People kept saying my slides were short on code so I keep adding it. I was helped.
So here you see on the left is a classic synchronous network program: the start method runs; it runs the first request; gets the results from the first request, processes them; runs the second request; gets the results, processes them; and so on. And it does all of that before returning from the start method. This is called synchronous.
On the right is an asynchronous program: the start method runs; it starts the first request and then immediately returns; and when the first request is done, another method is called -- request1Done it's called; and that processes the results from the first request, starts the second request; and the whole process continues on from there.
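The slide code is not reproduced in this transcript. As a rough sketch of the contrast just described, and not the code from the slide, here is a toy version in Python. The names fake_fetch, SyncFetcher, AsyncFetcher, and drive are all made up for illustration, and the "network" is a stub that returns canned data.

```python
# Toy illustration only: every name here is hypothetical, not an Apple API.

def fake_fetch(url):
    """Stand-in for a network request; returns canned data."""
    return "data for " + url


class SyncFetcher:
    def start(self):
        # Synchronous: both requests run to completion before start() returns.
        results = []
        results.append(fake_fetch("request1"))
        results.append(fake_fetch("request2"))
        return results


class AsyncFetcher:
    def __init__(self, pending):
        self.pending = pending   # shared list of (url, callback) in-flight requests
        self.results = []

    def start(self):
        # Asynchronous: queue the first request and return immediately.
        self.pending.append(("request1", self.request1_done))

    def request1_done(self, data):
        # Process the first result, then start the second request.
        self.results.append(data)
        self.pending.append(("request2", self.request2_done))

    def request2_done(self, data):
        self.results.append(data)


def drive(pending):
    """Minimal event loop: complete each request and call its callback."""
    while pending:
        url, callback = pending.pop(0)
        callback(fake_fetch(url))


pending = []
fetcher = AsyncFetcher(pending)
fetcher.start()
results_right_after_start = list(fetcher.results)   # still empty at this point
drive(pending)
```

In the asynchronous version nothing has been fetched when start() returns; the results only appear once something drives the event loop.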
Now, people often ask us "why?" If you've been to any network sessions this week, you'll hear constantly "program asynchronously," and there's good reasons for that. First of all -- well, the obvious resistance to that is that the synchronous programming on the left clearly is easier -- there are fewer lines of code, there are fewer methods, everything was shorter and simpler or it appeared to be.
But yet we network programmers keep nagging you about asynchronous. And why is that? And it's because really fundamentally, the network is asynchronous; you can't control the order that things happen on the network, you just have to adapt to it. And if you do your network programming synchronously, what happens is that you have a mismatch between your code, which is running synchronously, and the underlying network code, which is running asynchronously. And that causes all sorts of problems.
And in addition to that, what we find is that synchronous programs are fine for when you're writing your third-year paper at university and you want to write a simple test network program. But when it turns into writing a real program that works on the real network where it has to deal with errors and it has to deal with latencies and it has to deal with cancellation, then the balance tips and synchronous networking is no longer easier. Doing cancellation in synchronous networking is a nightmare. Doing cancellation in asynchronous networking is trivial. And so it turns out that asynchronous networking becomes easier as you move along, as you deal with all the edge cases. And that's why we recommend it.
There's another critical equation in iPhone OS, which is this thing. If you do synchronous network programming on the main thread, your application will eventually be killed. You will get crash reports from the user saying that your application has died. No matter how you do your synchronous networking, this is inevitable. And so you have to avoid this.
And the reason why this happens is because of this guy: this is the watchdog. The watchdog kills applications that have gone bad. And one definition of the application going bad is it not responding to user interface events. If it doesn't respond to user interface events, the watchdog gives it a certain period of time and then it shoots it.
[ Laughter ]
This is not just a network thing. If you take your main thread and you start calculating Mandelbrot sets in your main thread and you do it for a minute, the watchdog will kill you as well. But it's especially bad for networking because of these -- oh, actually before we get there.
Oops, slide misordering here. When you do get killed by the watchdog, this is what it looks like. You get this "ate bad food." If you look at the numbers there it says, "8badf00d."
[ Laughter ]
[ Applause ]
[ Laughter ]
Anyway, you can tell the crash reporting people on Friday because there's a great crash reporting session. So if you want to know more about crash reports, it's a fabulous session -- I highly recommend you go to it. But if you just want a quick summary, you can get this from the tech note 2151.
And it's an indication of how often we reference this tech note by the fact that I know the number, because it has a good summary of all these watchdog crash reports. And the watchdog crash report that says "ate bad food" means that you were killed because you weren't responding to these UI events.
And the reason why this is an issue is because of this: The watchdog timeout is roughly about 20 seconds, that's not an API, that's a current implementation detail. And all the networking timeouts of synchronous networking are all longer than that, in some cases much longer than that. So if you do a synchronous request and the network drops out from underneath you, then you're stuck, waiting for the network to respond. You can't get out of that; you're just stuck there.
And then at some point, 20 seconds later, the watchdog comes along and kills your application. And the answer to this is not lower the timeouts. An application that's unresponsive for 5 seconds is still a broken application. You need the application to be responsive all the time and that means the main thread can't be doing things that take long periods of time, and especially can't be doing things that take unbounded periods of time like networking.
So we return to this: Synchronous networking on the main thread is death. Now there's another "gotcha" here and that relates to this: Hidden synchronous networking. There's a whole bunch of utility methods in the operating system that do networking behind your back, things like NSArray's initWithContentsOfURL:. If you pass it a filesystem URL, it will read a plist off the filesystem and that will be fast; it won't be a problem.
But if you pass an http URL, it will go to the network. And if the network's not working properly, it can't return because it hasn't got the whole results yet. It can't error because it's not got to the point where it's timed out. It just sits there and waits and the watchdog kills you.
And in addition to that there's the DNS. Lots of people make these traditional BSD DNS calls -- gethostbyname and gethostbyaddr. Again, fully synchronous -- don't call them ever on the main thread. And another one is this NSURLConnection method -- sendSynchronousRequest:returningResponse:error:. Now that's not really hidden in the sense that it's got "synchronous" in the name. But it is one common case where we see people tripping up.
And finally, there's this notion of synthetic synchronous. And synthetic synchronous is where you call an API asynchronously and then you wait for the results. So here's some pseudocode -- this guy. I get this from folks. They say, "But I called the API asynchronously." And it's like, "No, if you're waiting for the results, it's synchronous." Now synthetic synchronous is a good idea in some circumstances. I'm going to have some examples of where you might use it later and where the operating system uses it. But it's not a miracle cure for this equation. Synchronous networking on the main thread will kill you.
So how do we break this? How do we break that equation? And the first thing you might think is, "Well, if synchronous networking on the main thread is bad, then I'll just put it on a secondary thread and that will be good." And that's rarely the answer. Sometimes it's the answer in some circumstances but for a typical iPhone OS application, it's not the answer because threads are evil. Now that was a contentious statement a few years ago but now it's fairly well-understood that threads are reasonably evil.
And I'm going to talk about that specifically in a slide or two. But for the moment, you just have to take my word for it. Now another option is Grand Central Dispatch. Grand Central Dispatch is all about asynchronous programming. It's potentially the best thing coming to networking ever in terms of programming model.
Unfortunately, today it's not a great choice and this ties back to something I had in my previous talk, which is this: here's the iPhone networking stack. We want people to be working at the Foundation layer. Now Foundation has been revved to take advantage of GCD in some places but the networking part of Foundation hasn't been. And that means you have a choice: you can either not use Foundation and use GCD, or you can use Foundation and not use GCD.
And our recommendation is you stick with Foundation; it gives you a lot of benefits above and beyond what you can get from just using GCD directly, which means that for the moment, for a typical network programmer, GCD is the future. One day it will be great, but for the moment we have to stick with the third option here, which is run loop programming. Now I'm going to talk about run loop in a lot of depth in the next section but first I wanted to return back to this idea of threads being evil.
Why are threads evil? The first point for why threads are evil is this notion of locking. If you have multiple threads in the same program, then they may share data and if they share data, you have to lock the data before you access it. And then you introduce this notion of accessing the data outside of the lock, which causes random corruption. Or if you have more than one lock in your program, you end up in deadlocks.
And in general, that sort of thing is just a mess and it's better to try and avoid it. In addition to that, there is this idea of cancellation. If you're doing synchronous networking on the main thread and the network goes away, then you can be blocked. Or even if the network's just waiting for data, you can be blocked inside the kernel waiting for the data.
The problem here is if the user hits the cancel button, how do you get out of the kernel to say you're done? And it turns out that's very hard to fix. You can fix it. You can use inter-thread signals and other crazy things but the reality is it's very hard to get right.
And it's one of these cases where making a request asynchronously starts to be a big win, because if you have an asynchronous request, then you're just waiting to be called back. And if you want to cancel, you just invalidate your run loop sources and release everything and you're done.
It's a big win for asynchronous. Timeouts are a similar issue. If you use synchronous programming, you're at the mercy of the timeouts provided by the underlying API you're using. But if you use asynchronous programming, you can time out just by using NSTimer and when it fires, cancel the request and you're done.
Also, bidirectionality -- TCP is inherently bidirectional. And if you use a protocol that is bidirectional, it's a big win doing it that way. You get advantages on the wire but it doesn't work well with synchronous networking on threads. If you're reading, waiting for data, you're stuck. You can't write data as well. Now, again, there are ways around that. You can simultaneously use two different threads to read and write. But that sort of undermines the whole benefit of using synchronous programming.
And finally, there's this issue of resource use. If you have 10 threads, they each have 8K of stack and they might be blocked in the kernel and they're consuming 24K of kernel stack. And on the iPhone, all of that memory is wired down -- it's consumed permanently. So you're looking at large amounts of memory that can't be reused for anything else and they're just stuck and they're literally doing nothing; it's no benefit to the user at all.
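To make the cancellation and timeout point concrete, here is a toy sketch in Python, not Foundation code. Request, Timer, and the simulated clock are hypothetical stand-ins. The only idea it is meant to show is that, in the asynchronous style, a timeout is just a timer whose action cancels the request, and cancelling just means you stop listening for the callback.

```python
# Toy illustration only: Request and Timer are hypothetical, not NSTimer or
# NSURLConnection. Time is simulated with a simple counter.

class Request:
    """An asynchronous request that only ever reports back via a callback."""
    def __init__(self, on_done):
        self.on_done = on_done
        self.cancelled = False

    def cancel(self):
        # Nothing is blocked in the kernel, so cancelling is just bookkeeping.
        self.cancelled = True
        self.on_done = None

    def deliver(self, data):
        # Called by the event loop when a response arrives.
        if not self.cancelled:
            self.on_done(data)


class Timer:
    """A one-shot timer that the event loop checks on every pass."""
    def __init__(self, fire_at, action):
        self.fire_at = fire_at
        self.action = action
        self.fired = False

    def tick(self, now):
        if not self.fired and now >= self.fire_at:
            self.fired = True
            self.action()


received = []
request = Request(received.append)
timeout = Timer(fire_at=5, action=request.cancel)

# Simulated event loop: the response only shows up at t=8, after the
# timeout has already fired at t=5 and cancelled the request.
for now in range(10):
    timeout.tick(now)
    if now == 8:
        request.deliver("late response")
```

The late response is simply dropped because the request was cancelled when the timer fired; no thread had to be interrupted to make that happen.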
In contrast, if you use asynchronous programming, that really -- that's not an issue anymore. You don't have threads sitting there, consuming memory doing nothing. So in my experience, threads are evil. But of course, there's a "but." There's always a "but," and that is if you're doing CPU-intensive operations.
Threads are evil for networking but for operations that need the CPU, they're a really good thing -- they're the best way to get the CPU to run concurrently with the user interface or the CPU task you're trying to do to run concurrently with the user interface. They're also fine for doing I/O operations that are both fast and reliable, like accessing the disk drive -- at least on iPhone OS the disk drive is both fast and reliable. And so it's fine to use threads for those things. And that raises the question of how do you mix and match them? How do you do threads for one thing and run loops for the other? And my recommended way of doing that is with NSOperation.
NSOperation is this abstraction for dealing with asynchronous operations. It's like start the operation and when it's done, you hear about it being done. And it turns out that you can use NSOperation -- what we call standard NSOperations for CPU-bound tasks where the NSOperation queue starts a thread for you and you run on a thread and that's good for CPU-bound tasks. And then for networking tasks, you can use what's called a concurrent NSOperation where the operation queue doesn't start it on a thread but instead organizes to run it and expects it to continue running by itself.
So NSOperation is a really good way to model a mix of CPU and network-bound intensive operations simultaneously. It's a very unique technique. The only issue with NSOperation is that doing the concurrent NSOperations for networking is a bit tricky. There's a bit of fiddly things you've got to do to get it to work properly. So I've been working on a sample that shows how to do this and it's called the LinkedImageFetcher sample.
It's not quite ready for prime time -- I didn't really have time to get it properly reviewed before the conference. But I've put it on the attendee site so you can go and grab it from there. And it shows how to do this mix and match of CPU and network operations in the same program. And I do intend -- I fully intend -- to get that made public soon after the conference.
Finally, before we leave the subject of threads, there is the idea of hidden threads. NSOperation will start threads behind your back. Similarly, Grand Central Dispatch will do the same thing. And also this Cocoa method, performSelectorInBackground:withObject:, will also start threads behind your back. And if you mix and match threads and run loops, you can fall into one big pitfall and I'll give a good example of that later in the talk. But just for the moment, the lesson here is: Watch out for these hidden threads. And that wraps up for the basics. We're going to dive straight into run loops. And that's definitions to start off with.
There's one run loop per thread, always. A run loop is an event dispatch mechanism -- it monitors a set of event sources and each event source has a callback associated with it. And when the event source fires, it calls the callback. Now, the run loop has to be explicitly run by the thread that it's associated with. So the thread runs the run loop and while it's running inside the run loop, it monitors these event sources and calls the callbacks as they fire. And if no event sources fire, it blocks, waiting for one of them to fire.
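As a minimal sketch of that definition, assuming nothing about how the real run loop is implemented, here is a toy event dispatcher in Python. Source and ToyRunLoop are hypothetical names. The one deliberate simplification is that this toy returns when nothing is pending, where a real run loop would block waiting for a source to fire.

```python
# Toy illustration only: a hypothetical model of "sources plus callbacks",
# not the actual run loop implementation.
from collections import deque


class Source:
    """An event source: a queue of pending events and the callback for them."""
    def __init__(self, callback):
        self.callback = callback
        self.events = deque()

    def signal(self, event):
        # Something happened (a timer expired, bytes arrived, ...).
        self.events.append(event)


class ToyRunLoop:
    def __init__(self):
        self.sources = []

    def add_source(self, source):
        self.sources.append(source)

    def run_until_idle(self):
        # Monitor every source and call its callback when it has fired.
        # Callbacks run one at a time, on the thread that runs the loop.
        fired = True
        while fired:
            fired = False
            for source in self.sources:
                if source.events:
                    source.callback(source.events.popleft())
                    fired = True


log = []
loop = ToyRunLoop()
timer = Source(lambda event: log.append(("timer", event)))
connection = Source(lambda event: log.append(("connection", event)))
loop.add_source(timer)
loop.add_source(connection)

connection.signal("bytes")
timer.signal("tick")
log_before_running = list(log)   # empty: signalling alone calls nothing
loop.run_until_idle()            # the thread has to run the loop explicitly
```

Signalling a source does not call anything by itself. The callbacks only run once the owning thread actually runs the loop, which is the point made above about having to run your run loops explicitly.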
In general, you must explicitly run your run loops. But as one special case, the user interface frameworks like UIKit on iPhone OS will automatically run the main thread's run loop. Now, to look at this graphically, I have a series of less-cheesy diagrams. Here's a bunch of threads -- the main thread and a couple of secondary threads and each of them has an associated run loop. The run loop sort of owns that thread. Now, if we focus on one of these threads, we can zoom into it.
And here we see the run loop and the run loop is associated with all of the run loop sources. Now these run loop sources aren't abstract notions, they're related to what you've done. So for example, here on the left, if you start a timer, we create a timer event source that's attached to the run loop. And on the right there, we started an NSURLConnection and it's created a connection source. So it's attached to the run loop. So these run loop sources don't come from out of just thin air, they come because of operations that you've done.
Here's an example of actually scheduling something on the run loop. Here's -- we started with a net service, which is a reference to a service that we found on Bonjour. We created an input stream for that service. Now that input stream by itself is not scheduled on the run loop so we explicitly schedule it on the run loop with this scheduleInRunLoop:forMode: method.
And you pass in the current run loop and you pass in the run loop mode. Now, run loop modes are a source of some confusion and I'm going to cover those in detail in a few slides. But for the moment we're just going to ignore it and choose the default run loop mode.
Then the other thing you do is you set the delegate and the delegate is effectively the callback. The real callback is internal to NSInputStream. But for the moment from your perspective the callback is the delegate. And then once the source is set up on the run loop, you kick off the open.
And then the open proceeds asynchronously in the background. At some point in the future when the open is complete, you'll start getting events to your delegate. And we'll call this method, the stream:handleEvent: method, on your delegate. And the question is, "Well, what thread is that running on?" It's running on the thread associated with the run loop that you passed in when you scheduled the stream on the run loop.
Those two are tightly-bound together and this also means that in order for this delegate callback to be called, you have to be running this run loop. Now, if you're in the main thread, that's really easy -- the UIKit does it for you. But if you're in other threads, you have to go out of your way to make sure it runs.
Now this is explicit scheduling where you explicitly tell the frameworks what run loop you want to schedule on. In addition to this, you get implicit scheduling. And here's an example of this, where the frameworks sort of decide for themselves what to schedule on. And this is NSURLConnection, it's a utility method called connectionWithRequest:delegate:. And that automatically schedules on the current run loop in the default mode. And so that's the context that this callback will run in.
Every time we have an implicit scheduling, we almost always have an equivalent method that's explicit. So here's the explicit version. You allocate the connection with the request and the callback, which is the delegate, and you pass NO to this startImmediately parameter. So it doesn't start, it doesn't schedule in the run loop automatically.
Then in the next step, you schedule it on the run loop that you want to schedule it on and then you call the start method. And from then on, at some point in the future, you'll get these delegate callbacks associated with this operation. So that's pretty much how you schedule things on the run loops. What about these run loop modes? Whenever you add an event source to a run loop mode -- to a run loop -- you actually add it in a particular mode.
And whenever a run loop runs, it always runs in a particular mode. And when it runs in that mode, it only runs -- it only monitors the event sources associated with that mode. All the other event sources are ignored. So this is just the basic facts. Here it is graphically -- well, not quite yet.
This is where we left off our run loop model. I'm going to insert the modes in there. So now we have two run loop modes in this blue layer: the default run loop mode and a tracking run loop mode. One of them, the default mode, has all the event sources associated with it, and the tracking mode only has a subset. So when you run the run loop in the default mode, we monitor all of these event sources.
In contrast, when you run the run loop in this tracking mode, we only monitor a subset of the event sources. In this case, we ignore the timer. So if the timer fires, the callback for that timer won't be called. And that's useful in a variety of circumstances. But the real question is: why do we have this whole run loop mode mess? And it's associated with a recursion. Sometimes you're in a run loop callback and you want to run the run loop again.
You might want to call an API using this synthetic synchronous model; sometimes it's important to do so. And an example of doing that is the user interface tracking that's done by UIKit. And I'll talk about in a little more depth in a few slides. But I just want to give an example of this synthetic synchronous model.
You're in a run loop callback and you want to run an async API, synthetic synchronous. So the way you do that is you set up the async call and you schedule it in a custom run loop mode. Run loop modes are just strings; you can pull them out of thin air. And generally we recommend that you use reverse DNS notation just to keep away from other people's run loop modes.
So you schedule your event source in this custom run loop mode and then you run the run loop in that custom run loop mode. And what that means is only your event source will run. All other event sources are held off until the run loop returns to the other modes. So here's an example of this. Here's where we left our run loop off with two run loop modes. What we do is we create a custom mode and we run the run loop in that custom mode.
And we add our file descriptor -- we've created a CFFileDescriptor just as an example -- and we add that -- its event source -- to the run loop in that custom mode. And when we run the run loop in that custom mode, only that event source is looked at; all other event sources are ignored. It's really useful.
In this case, if you're doing it on the main thread, this might be a bad idea because it's effectively synchronous: if you block forever, you'll be killed by the watchdog. If you're doing it on a secondary thread, it's perfectly reasonable. You can even do it on the main thread if you can limit the amount of time that you'll spend, if there's some upper bound to the amount of time you'll spend running the run loop in the custom mode.
Now, an example of where this is used in practice is user interface tracking. If you have a scroll view on screen and the user taps down on the scroll view and drags up and down, the UIScrollView class wants to run the run loop in order to track the user's finger. And it doesn't want to return to the main run loop in order to do that, to the top level, so it uses a form of synthetic synchronous.
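The mode mechanism can be sketched with a toy model in Python. This is a hypothetical ToyRunLoop, not the real API; the mode strings, including the reverse-DNS custom mode name, are made up, and each source is modelled as a plain function that is treated as always ready to fire.

```python
# Toy illustration only: hypothetical names, not the real run loop API.

class ToyRunLoop:
    """Each mode name maps to the list of sources scheduled in that mode."""
    def __init__(self):
        self.modes = {}

    def schedule(self, source, mode):
        # A source is always added in one particular mode.
        self.modes.setdefault(mode, []).append(source)

    def run_once(self, mode):
        # Only the sources scheduled in this mode are looked at;
        # sources in every other mode are held off.
        for source in self.modes.get(mode, []):
            source()


DEFAULT = "default"
CUSTOM = "com.example.myapp.sync-read"   # made-up reverse-DNS mode name

fired = []
loop = ToyRunLoop()
loop.schedule(lambda: fired.append("timer"), DEFAULT)
loop.schedule(lambda: fired.append("connection"), DEFAULT)
loop.schedule(lambda: fired.append("file descriptor"), CUSTOM)

loop.run_once(CUSTOM)
fired_in_custom_mode = list(fired)   # only the file descriptor source ran
loop.run_once(DEFAULT)               # back in the default mode, the others run
```

While the loop runs in the custom mode, the timer and connection sources are ignored. They get their turn only when the loop goes back to running in the default mode.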
It gets all of the run loop sources -- event sources -- that are associated with tracking touches, such as the input event sources and the compositing sources required to composite out to the screen so you can see things and it adds those to a custom run loop mode, which is UITrackingRunLoopMode. And then it runs the run loop in that mode.
And so all of the event sources required for tracking run, and other event sources, such as maybe ones for open-URL events or ones related to push notifications, don't run. This is a hugely important technique. Now, it's a "gotcha" for you guys because if you take an object and you schedule it in the default mode, then the run loop isn't running in that mode at this point. So you might have created an NSURLConnection and it's receiving data, the user puts their finger down on the scroll view, and it stops receiving data because its event source isn't being monitored.
And you might work around this by scheduling the event source, not only in the default mode but also in the UITrackingRunLoopMode. But there's actually a better solution to that, and that is to schedule in this common modes concept. The common modes are a meta-mode. You can't run the run loop in the common modes but you can schedule event sources in the common modes.
And when you do so, the run loop automatically schedules those event sources in all the likely places that you'll need to be run, which are these modes called common modes. Now on iPhone OS, the common modes consist of the default run loop mode and the UI tracking run loop mode.
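Here is a toy sketch of the common modes idea in Python, again with hypothetical names and not the real API. Scheduling in the COMMON meta-mode simply fans the source out to every mode in the common set; trying to run the loop in COMMON is rejected. The set of common modes is passed in as an assumption, mirroring the default and tracking modes mentioned above.

```python
# Toy illustration only: hypothetical names, not the real run loop API.

DEFAULT, TRACKING = "default", "tracking"
COMMON = "common"   # meta-mode: you can schedule in it, but never run in it


class ToyRunLoop:
    def __init__(self, common_modes):
        self.common_modes = set(common_modes)
        self.modes = {}

    def schedule(self, source, mode):
        # Scheduling in COMMON schedules the source in every common mode.
        targets = self.common_modes if mode == COMMON else {mode}
        for target in targets:
            self.modes.setdefault(target, []).append(source)

    def run_once(self, mode):
        if mode == COMMON:
            raise ValueError("cannot run the loop in the common modes meta-mode")
        for source in self.modes.get(mode, []):
            source()


fired = []
loop = ToyRunLoop(common_modes=[DEFAULT, TRACKING])
loop.schedule(lambda: fired.append("default-only connection"), DEFAULT)
loop.schedule(lambda: fired.append("common-modes connection"), COMMON)

# The user has a finger down on a scroll view, so the loop runs in tracking mode.
loop.run_once(TRACKING)
```

Only the connection scheduled in the common modes keeps getting serviced during tracking; the default-only one stalls until the loop returns to the default mode.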
But that could be extended. So for example in Mac OS X, there's a run loop mode for tracking across the menu bar when the user puts the mouse down in the menu bar. And so the key thing about using the common modes is that you run in all of these modes where you're likely to need to run. And it's a good abstraction layer for getting your code running, even though the user's interacting with the user interface.
The "gotcha" with using the common modes is that if the run loop is running in the default mode, then you can do all sorts of things -- you can do pretty much anything. If you get a network error and you put up an alert, that's fine. But if the user is tracking their finger across the scroll view and you're running in the UITrackingRunLoopMode, then if you get a network error -- because you're running now because you scheduled in the common modes -- if you get a network error and you put up an error alert, that's going to do bad things. The user's going to be hopelessly confused. It may -- it probably won't crash the frameworks but it's not going to look good.
So if you use this common mode concept, make sure you understand the context you're running in. And for example, you can use other mechanisms like a short timer that's scheduled only in the default mode to defer these sorts of user interface operations. As you're using run loops, keep in mind the following: There's never any need to create or destroy run loops. Run loops are created on demand per thread and they're destroyed when the thread's destroyed, so you don't need to mess with the run loop itself. In contrast, run loop sources -- it's vitally important that you invalidate them.
If you think about those previous diagrams, they are a mass of pointers with one object pointing to the next object, which is pointing back to the other object, and so on. And so it produces a massive amount of retain loops between all these objects. And if you fail to invalidate your run loop sources, then what happens is those retain loops are never broken and you just leak memory. Whenever you schedule an event source in a run loop, you typically sort of have an owning object, which sort of owns that scheduling.
And before that owning object releases its last reference to the run loop source, it's vitally important that you invalidate the source, otherwise you'll just leak. And in some cases -- I had a developer today who was leaking sockets because he was failing to invalidate his socket run loop sources.
Try to avoid cross-thread scheduling, where you're running on thread A and you're trying to schedule an event source on thread B's run loop. In general, it's meant to work. It works at least 99% of the time, but sometimes it just blows up. But worse than that, you know, we can fix those bugs; we know about them, we are fixing them.
But the real issue here is that inside your own code you can get into these race conditions where the run loop sources are or aren't scheduled and it just gets very confusing. So always try and schedule on the current thread's run loop. And if you need to, use performSelector:onThread:withObject:waitUntilDone: to get over to the thread that you want to be on and then schedule on the current run loop from there.
Don't run the run loop recursively in the default mode on the main thread. The UI frameworks have run loop sources that are only meant to run in the default mode at the top level of the framework where you're nearest to main. If you run the main thread's run loop recursively in the default mode, those sources will fire in the wrong context and bad things will happen on both iPhone OS and Mac OS X.
Run loops are a serialization mechanism. This is vitally useful in most cases. If you think about a run loop, it monitors event sources and then calls the callback. And when the callback returns, it returns to monitoring the next event source. So these callbacks are inherently serialized, which makes your network programming very much easier -- it radically reduces the amount of race conditions you have to deal with. But the issue is, of course, that this serialization can give you latency.
If your main thread is off doing user interface compositing somewhere or calculating Mandelbrot sets or whatever it's doing, then while it's doing that, your network event sources aren't firing because it's in a run loop callback. And so you really want to either keep the main thread doing very nonsynchronous operations, i.e., always returning to the run loop quickly or in some cases it's a good idea to create a single, secondary thread and put all of your network event sources on that thread. And so they will never be held off due to latency on the main thread.
And finally, there's this problem with hidden threads. I'm going to go into that in a little more detail. Here you see me doing performSelectorInBackground to call the doStuff method. Now when the doStuff method runs, it's running on a secondary thread -- that's the whole point. It does its stuff and then when it's finished, it wants to schedule a timer to continue doing more stuff in about a second from now.
Now the thing here is that that doMoreStuff method that it's trying to call can never possibly execute. And the reason is the scheduledTimerWithTimeInterval method always targets the current run loop, which is the run loop associated with the current thread, which is a secondary thread because we ran doStuff using performSelectorInBackground. Now when that secondary thread is created by performSelectorInBackground, it's created, it calls doStuff, and when doStuff returns, it's destroyed. And so any event source that you schedule on it will never run because the secondary thread never runs the run loop.
It's a real "gotcha" that confuses a lot of people, so watch out for this one. And this is why hidden threads are a danger if you're mixing and matching threads and run loops. And that wraps it up for run loops. It's been a long haul. Glass is almost full to the fifth stage, which will be good. The last thing I wanted to deal with is state management.
And this is the idea of how you connect the states of the front end of your application -- the user visible states -- to the states of the back end of your application, the states associated with the networking. Doing operations on the networking typically requires lots of states as you get this thing, parse it, and then deal with the results and so on.
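The hidden-thread trap described here can be modeled in a few lines. This is a minimal sketch in Python rather than Cocoa: the `RunLoop` class is a toy stand-in invented for illustration (it is not an Apple API), and it only captures the one property that matters, namely that each thread has its own loop and a source fires only if that thread actually runs its loop.

```python
import threading


class RunLoop:
    """Toy per-thread run loop (hypothetical, for illustration only)."""
    _local = threading.local()

    def __init__(self):
        self.timers = []

    @classmethod
    def current(cls):
        # Each thread lazily gets its own loop, like "the current run loop".
        if not hasattr(cls._local, "loop"):
            cls._local.loop = RunLoop()
        return cls._local.loop

    def schedule(self, callback):
        self.timers.append(callback)

    def run_once(self):
        # Fire everything that is pending on *this* loop.
        pending, self.timers = self.timers, []
        for callback in pending:
            callback()


fired = []


def do_more_stuff():
    fired.append("more stuff")


def do_stuff():
    # Schedules on the secondary thread's loop, then returns.
    # Nobody ever runs that loop, and the thread exits right after.
    RunLoop.current().schedule(do_more_stuff)


worker = threading.Thread(target=do_stuff)
worker.start()
worker.join()
fired_from_hidden_thread = list(fired)  # still empty: the timer was lost

# Scheduling on a thread that does run its loop behaves as expected.
RunLoop.current().schedule(do_more_stuff)
RunLoop.current().run_once()
```

The timer scheduled from the short-lived thread is silently dropped, while the same call made on a thread that services its loop fires normally.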
But the user interface hopefully has very simple states because you don't need to display a lot of state information to the user. Now, a really easy example of this is this placeholder mechanism. The only piece of state shared between the user interface code and the model objects is whether the placeholder has been got or not.
That's one piece of state: Are we busy or what's the image? In fact, in many cases, if you follow the advice from my previous talk, you don't really need to share state here at all; all you need to do is get the object -- which is the image -- and listen for notifications for changes to that object.
In contrast, for solicited operations, ones that the user specifically requested and wants progress on, there's a little more state. Obviously you need to know whether the back end is busy -- here's an example in Safari where if the back end is busy, we get a Stop button rather than a Refresh button. And similarly, if the back end is busy, you get a progress bar, so that's two pieces of state. And if the back end fails, if the model objects can't fetch the data, then you get another piece of state, which is the error.
And in addition to that, there's one piece of control flow that goes down, which is the cancel. But the reality is the state sharing is really small here. Safari is doing a lot of weird things to get data off the network. It's going through probably hundreds of network states to get the primary URL, parse it, get all these images, and so on.
But the front end is seeing one state -- you know, effectively one state -- which is: Are we busy or not? And so it's relatively easy to map your back end states to your front end states in a way that the user can understand and that's reasonably easy to wrangle in terms of your user interface. Typically it involves: Are we busy? What's the progress? And was there an error? So the take-home message here is: Asynchronous programming means you will have to do state management in your user interface; it's really that simple.
But it's not as hard as you might think. It doesn't require a huge reworking of your user interface. In most cases, it requires a very simple rework. And the real trick is to hide all of the irrelevant states down in the model so that the model goes through a bunch of complicated states and the front end only says "busy" or "not busy," "error" or "not error." And then, of course, for the front end to know about changes to the state, you have to have some sort of model notification.
And that is something that I talked about in Part 1 of this talk and something that's too complicated to recap here. So if you missed Part 1, you might want to go back and look at it on the video. So that wraps up state management and it fills the first part of the glass -- we're happy about that I think. Well done.
[ Applause ]
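The state mapping from the state management section can be sketched as follows. This is an illustrative Python model, not code from the talk: the `NetState` names, the `FetchModel` class, and the observer callback are all assumptions made for the example. The point it shows is that the model may pass through many internal states while the front end only ever reads three things (busy, progress, error) and sends one thing down (cancel).

```python
from enum import Enum, auto


class NetState(Enum):
    """Internal networking states (hypothetical); the UI never sees these."""
    IDLE = auto()
    RESOLVING = auto()
    CONNECTING = auto()
    RECEIVING = auto()
    PARSING = auto()
    DONE = auto()
    FAILED = auto()


class FetchModel:
    def __init__(self):
        self._state = NetState.IDLE
        self._received = 0
        self._expected = 0
        self.error = None
        self._observers = []

    def add_observer(self, callback):
        # Stand-in for whatever model notification mechanism you use.
        self._observers.append(callback)

    @property
    def busy(self):
        return self._state not in (NetState.IDLE, NetState.DONE, NetState.FAILED)

    @property
    def progress(self):
        if self._state is NetState.DONE:
            return 1.0
        if self._expected <= 0:
            return 0.0
        return min(1.0, self._received / self._expected)

    def advance(self, state, received=0, expected=0, error=None):
        """Called by the networking code; tells observers something changed."""
        self._state = state
        self._received, self._expected = received, expected
        self.error = error
        for callback in self._observers:
            callback(self)

    def cancel(self):
        """The one piece of control flow that goes down from the UI."""
        self.advance(NetState.IDLE)


seen = []
model = FetchModel()
model.add_observer(lambda m: seen.append((m.busy, m.progress, m.error)))
model.advance(NetState.RESOLVING)
model.advance(NetState.CONNECTING)
model.advance(NetState.RECEIVING, received=50, expected=200)
model.advance(NetState.PARSING, received=200, expected=200)
model.advance(NetState.DONE)
```

A view controller built on this would show a Stop button while `busy` is true, drive a progress bar from `progress`, and present `error` when it is set, without knowing which internal state produced them.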
And I'm running really fast, which is probably a good thing. Okay. Next point: debugging. Network debugging is traditionally hard. This is the guy with the network debugging problem. Now why is it hard? It's hard because the network is asynchronous and it's hard because network behavior is dependent on environmental factors that you can't control, such as how many MiFi base stations there are out there. And so if you get these problems coming in from the field, it's very hard to debug them. And similarly, you may encounter this problem that only happens once in every 10,000 executions and yet it still crashes on a number of users and annoys them.
[ Laughter ]
Now, you're laughing because you think it's a joke but it's not really a joke. There are lots of things you can do to minimize the bug count in your program. The number one most important thing is design. If I'm writing a user interface application, I'll often sit down and write a line of code, put up a table view, run it, change the code, run it again, see whether it works, get a bug, fix the bug, and so on -- sort of this incremental implementation approach. That works fine for user interface code where everything tends to run in a deterministic fashion; it's a real disaster for networking code.
The networking code you have to plan in advance, you have to understand the states that the network can be in, and how those states change. And you have to understand how those states affect your model and how those states affect the front end of your application. Plan that out in advance so that you're not building it from scratch and keep changing it because if you change it, that will introduce bugs. So try and design it in advance and stick with it. And if you have to change it, think very carefully about how you change it.
My other tips are a little bit more prosaic. You know, I get projects from developers and they're like, "This crashes." And you go, "Okay." And you build it and it's got compiler warnings and it's like well, that's step one. And then you run a static analyzer and it's got static analyzer warnings. It's like life is too short to debug network problems and these other silly problems. You get the silly problems out of the way and that means compiler warnings, that means the static analyzer, that means adding asserts to your code.
Asserts are really useful for tracking down bizarre networking problems because if things happen in the wrong order and the program's not executing the way you expect, your assert fires and instead of corrupting some vitally important state, you end up straight in the debugger. So that's a huge win for network programmers. And similarly, memory management warnings. Memory management errors are very indeterministic as well, just like network errors. So don't try and debug your memory management problems and your network problems at the same time; use zombies to flush out the memory management problems first.
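As a small illustration of asserting on event order, here is a sketch in Python. The `Download` class and its callback names are hypothetical; the idea is simply that each callback checks that the events arrived in the order the design expects, so an out-of-order event stops the program immediately instead of quietly corrupting state.

```python
class Download:
    """Tracks one transfer and asserts that callbacks arrive in a sane order."""

    def __init__(self):
        self.opened = False
        self.finished = False
        self.chunks = []

    def did_open(self):
        assert not self.opened, "stream opened twice"
        self.opened = True

    def did_receive(self, chunk):
        assert self.opened, "data arrived before the stream opened"
        assert not self.finished, "data arrived after the stream finished"
        self.chunks.append(chunk)

    def did_finish(self):
        assert self.opened and not self.finished, "finish arrived out of order"
        self.finished = True


download = Download()
download.did_open()
download.did_receive(b"hello")
download.did_finish()
```

In Python a failed assert raises `AssertionError` rather than dropping into a debugger, but the effect is the same: the failure surfaces at the point where the assumption broke.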
In terms of real network debugging, the first point I want to talk about is logging. Now logging has a bad reputation in debugging circles. People disparagingly refer to it as "printf debugging." And it's considered sort of a 1970s technology. I love logging. Logging is the first thing I add to my network programs and the reason is because in my opinion, logging is like a TARDIS. Okay, if you're not -- okay, we have some Dr. Who fans out here.
[ Applause ]
This TARDIS is about five minutes' walk away from my flat in Glasgow, it's pretty cool. So if you're not a Dr. Who fan, a TARDIS is a vehicle that can travel anywhere in time and space and that's what logging does for you. These network bugs are indeterministic and sometimes they happen in real time. If you stop in the debugger, then the service stops sending you data and it eventually times out and gives up on the connection. And so you can't just stop in the debugger.
But if you've got good logging in there, you can play the thing out in real time or wait for the error to happen. And then when it does, go back through the logger and replay time at a speed that you can understand. And that's a critical piece of network debugging technology.
And as I say, it gets a bad rap but it's good stuff. In addition, logging lets you travel in space. You've got some user in Uzbekistan whose network always reproduces this problem but you can't reproduce it ever. You've got an app reviewer that always reproduces this problem and you can't reproduce it ever.
What do you do? You can't go to Uzbekistan or indeed, the app review offices to debug the problem. So what do you do? And the answer is you have that user turn on the logging and send you the log. So don't skimp on the logging when you're writing a network program; it's a critical part of making a network program that can be debugged in the real world.
So when you do it -- many people disable the logging when they're done. You know, in the release build they remove the logging so it can't be enabled because they think it will slow down their program or something. My experience is you want to leave it in there -- leave it in there and leave it disabled and provide a way for the user to turn it on so that they can get you in-the-field reports. Try to make the logging persistent. If the application crashes, having all the logging information in memory is not going to help you; it's gone.
And similarly, if the user has a problem and then quits your application, launches Mail, and says, "I've got a problem." And you say, "Well, what did the log say?" That's not helping really, is it? Because it's gone. Also make it easy to retrieve. I really like the in-app email feature of iPhone OS 3 because you can automatically create an email and attach the log to it. It makes it very easy for the user to get the log to you.
Packet traces are a way to log what's going on on the wire; it's another form of logging but it's a very specialist form of logging -- it allows you to see all the packets traveling over the network. Packet traces are another critical network debugging technology. They allow you to do divide and conquer.
You have a problem with a server and a client and you can never really tell, is it the client sending the wrong request or the server responding incorrectly? Well, what do you do? You run a packet trace, see what went over the wire, and then you know for sure.
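The logging advice above (ship it, leave it off, let the user turn it on, keep it on disk, make it easy to retrieve) might look something like this. It is a sketch using Python's standard `logging` module; the `FieldLog` class, its method names, and the file name are made up for the example and are not from the talk.

```python
import logging
import os
import tempfile


class FieldLog:
    """Network log that ships in release builds but stays off until enabled."""

    def __init__(self, path, name="app.network"):
        self.path = path
        self._logger = logging.getLogger(name)
        self._logger.propagate = False
        # Log to a file so the records survive a crash or a quit;
        # the handler flushes after every record.
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        self._logger.addHandler(handler)
        self.set_enabled(False)

    def set_enabled(self, enabled):
        # A level above CRITICAL suppresses everything.
        self._logger.setLevel(logging.DEBUG if enabled else logging.CRITICAL + 1)

    def log(self, message, *args):
        self._logger.debug(message, *args)

    def contents(self):
        # What you would attach to an email for the user to send back.
        with open(self.path) as log_file:
            return log_file.read()


log_dir = tempfile.mkdtemp()
log = FieldLog(os.path.join(log_dir, "network.log"))
log.log("connecting to %s", "example.com")      # dropped: logging is off
log.set_enabled(True)                            # e.g. driven by a user setting
log.log("connect to %s failed: %s", "example.com", "timed out")
report = log.contents()
```

Because the records go straight to a file, `report` is still available after the process that wrote it has gone away.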
Similarly, you can use packet traces for comparison. If client A works and client B doesn't, you packet trace both of them and see what's the difference between the requests that were sent and then fix that. It's also great for verification. Every now and again you run a packet trace over your program and see what it's sending on the wire. I like to think of it as leaks for the network.
Every now and again, you're sending things you weren't expecting. You might have typed "http" instead of "https" by accident and now you're sending all the user's confidential information in plain text. And the only people who are going to find out -- if you're lucky -- are app review or the black hats.
Similarly, you might have a game update timer. And so you're sending your game state every tenth of a second and you forget to invalidate it and now you're sending your game state twice as often, every 20th of a second, and that's bad. So look at the packets on the wire with the packet analyzer and see what's going on, just every now and again.
Also, if you do this, make sure you have some feature in your debug build to turn off your on the wire privacy features, typically TLS, because the packet trace works a lot better if you can actually see inside the packets. Now there's a Q&A -- packet traces aren't available in iPhone OS but if you've got a Mac handy, you can take a packet trace with the tools referenced in this Q&A.
And finally, there's the simulator. The simulator is a great tool, especially since iPhone OS 3 for doing network debugging -- surprisingly so. The simulator and the iPhone behave quite similarly if you're using typical application debugging technologies. So it's fine to do a lot of network debugging in the simulator and if you do that, you get access to Mac OS X tools. And my favorite tool of all time is DTrace, it's like super-logging that you don't have to write.
Of course, if you're running on the simulator, you're running on the Mac OS X kernel so nothing really behaves exactly the same as an iPhone. So you have to do your real testing on an iPhone. And that's my talk on debugging and network debugging. My next step is the common mistakes, notably putting foam at the top of an empty beer glass apparently is a common mistake, too -- I have to fix that one.
Number one common mistake: main thread synchronous. We see it all the time. Some significant proportion of app review rejections have occurred because people turn on airplane mode on the iPhone, they launch the app, it crashes. And we see millions of these -- probably not millions -- but probably hundreds of thousands of these come in every day through the crash reporting mechanism. Just don't do it.
Number two: threads. As I like to say, "Networking is hard enough without involving threads as well." I see applications like this all the time. They think, "Let's do something asynchronously." Of course, asynchronously means "threads" and they have these threads running through all of their model objects, all through their view controllers, all through their views. And at some point, somewhere in that code, they call UIKit on one of those threads and it doesn't blow up most of the time but every now and again it just goes boom.
And they go, "How do I fix that?" And the answer is: get in your TARDIS and go back 6 months and design your applications properly and then it won't happen.
[ Laughter ]
So be very careful with the threads. It's not the threads that are evil -- well, they are kind of evil -- it's not that threads don't have their place, it's just their place is deep in your model and very well-contained.
The interface life cycle -- this can get very confusing so I've put up a chart. There are a variety of interfaces on iPhone OS: Bluetooth, cellular, Wi-Fi. They come and go on different life cycles. Bluetooth is easy -- when you resolve a Bonjour service, the Bluetooth interface comes up. We talked about that earlier today. And so if you're confused by it, take a look at the video for the Bonjour session.
Bluetooth interfaces go down on idle. They don't go down when you disconnect; they go down when the interface hasn't had many packets over it for a few minutes. So a few minutes after you stop talking on the interface, that's when it goes down. The cellular interface often will be pinned up by various services, for example, push notification will typically pin the cellular interface up all the time.
If the cellular interface isn't being held up, then its life cycle is controlled by the APIs you call. It comes up when you connect to a network service and that assumes you're using CFSocketStream or higher, which means CFNetwork or Foundation. If you use BSD sockets, the cellular interface won't automatically come up based on connect.
One day we'll fix that but today is not that day. The cellular interface goes down based on idleness. So again, if it's going to go down, it will go down about two minutes after you stop using it -- I think it's two minutes, some short number of minutes after you stop using it.
The Wi-Fi is substantially more complex, so another slide. The Wi-Fi comes up based on a number of criteria. The first one is if it sees a network that it's seen before -- it's one of your known networks -- then the Wi-Fi will come up automatically. In addition to that, there are these two controls: there's a user-level control, which is Ask to Join Networks in Settings. And if that's on and there's a Wi-Fi app at the foreground, then the Wi-Fi will put up the Wi-Fi chooser dialogue and let the user choose a Wi-Fi network and login to it. And a Wi-Fi app is one that has this UIRequiresPersistentWiFi in its plist.
Wi-Fi going down is also somewhat complex. Wi-Fi goes down 30 minutes after the last Wi-Fi app has left the foreground. Now the definition of "foreground" is quite complicated there. If you ScreenLock your device, then that pushes your Wi-Fi app to the background -- not the background of the multitasking sense, but it makes it inactive. And that means that the ScreenLock is the front-most application.
If you think like a Mac, it actually makes more sense here. So the ScreenLock becomes the front-most application, which means -- and the ScreenLock doesn't have UIRequiresPersistentWiFi set. So if you ScreenLock, then your UIRequiresPersistentWiFi setting is no longer relevant and the Wi-Fi will go down in about 30 minutes after that.
But it's worse than that because if you're running on batteries, then when you ScreenLock the device, unless something else is keeping the CPU awake, the CPU will shut down. So it will go to sleep like a Mac would go to sleep. And when that happens, then the Wi-Fi goes down as well. The whole story is also complicated by this notion of captive networks.
Now both -- I'm not quite sure if I talked about captive networks, I know Brett did, I think Stritt did as well. Captive networks is a complicated topic and you'll have to catch the information on those in the previous talks and also last year's iPhone networking talk as well talks a lot about captive networks.
Reachability. Now, you never thought that I would complain about this but it's an API that people use too much. Don't use Reachability pre-flight -- we've had that in all three sessions this week and it's true. Reachability is about user interface, it's about telling the user what's gone wrong; it's not about telling you whether you can connect or not. So if you want to connect, connect -- it's just that simple.
If it fails, you can then use Reachability to get some idea of why it failed and maybe guide the user as to how to fix it. Or you could start a Reachability operation in parallel with your connection operation so that you minimize the amount of time that you displayed imprecise state to the user. But don't use Reachability before you connect, just try and connect.
Similarly, don't use Reachability to determine the interface type for the sake of speed. We're going to talk about that in the next slides. What you want to use Reachability for is to guide your user interface, provide useful feedback to the user to say, "No, this is never going to work; you need to fix your networking." It's also a good idea to trigger retries.
If you've got a whole bunch of queued operations and they've all failed and you've retried them once and they don't seem to be working, then you can queue them all up behind a Reachability change. And when the Reachability changes, then retry some of them and see whether they work. It's no point retrying them until Reachability changes because chances are the reason for the failure was a problem with the local network state.
Another example of Reachability where it's useful is interface-specific connectivity. I see developers who have licensed content that can only be displayed if the application is running on cellular, you know, movie content and so on. And that's a good reason for using Reachability to determine the interface type. A bad reason is to estimate the speed.
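The "just connect, and park failures behind a Reachability change" pattern could be sketched like this. It is a simplified Python model: `RetryQueue` is a hypothetical helper, `reachability_changed` stands in for whatever asynchronous Reachability callback your platform delivers, and the sketch parks an operation on its first failure rather than after a retry.

```python
class RetryQueue:
    """Tries operations immediately; parks failures until reachability changes."""

    def __init__(self):
        self.parked = []

    def attempt(self, operation):
        # No Reachability pre-flight: just try the operation.
        try:
            return operation()
        except OSError:
            # Probably a local network problem; retrying now is pointless.
            self.parked.append(operation)
            return None

    def reachability_changed(self):
        # Call this from the Reachability notification. Operations that
        # fail again are parked again by attempt().
        parked, self.parked = self.parked, []
        results = [self.attempt(operation) for operation in parked]
        return [result for result in results if result is not None]


network = {"up": False}


def fetch_feed():
    if not network["up"]:
        raise OSError("network is down")
    return "feed data"


queue = RetryQueue()
first_result = queue.attempt(fetch_feed)     # fails and is parked
network["up"] = True
retried = queue.reachability_changed()       # succeeds on the retry
```

Nothing here consults Reachability before connecting; it is only used as the trigger for trying again.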
In terms of using Reachability, use it asynchronously. It's like any other API: if you use it synchronously, you will be killed -- on the main thread at least -- you'll be killed by the watchdog. There's a Reachability sample. Make sure you get version 2. If you got a previous version of the sample, go and make sure you update to the version two version because the previous version -- how best to put that? "Severely suboptimal," shall we say.
Interface type. And this is the interface type question: What type of interface am I on? And also, what type of cellular am I on? Which really means: What speed is this network? This is a fundamentally broken question. The network speed is independent of the link layer speed -- it's really that simple. If you're talking across a MiFi, which is going to be my example for this talk, your backhaul is 3G. So if you're talking to anything on the wider Internet, knowing that you're connected by Wi-Fi isn't going to help.
If you're talking on 3G, on cellular on 3G in this room, you're probably going slower than Edge because every other device in this room is talking on 3G. Whereas very few people still have first generation iPhones in this room, so very few people will be using Edge. So if you need to know the speed of the network, measure the speed of the network -- get a small file, download it, see how fast it goes. And also adapt to changes because the speed can change over time. So just be careful with this one. Don't ask me what sort of interface we're on purely so you can get an estimate of the speed.
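Measuring the speed rather than guessing it from the interface type might look like the following sketch. The names `measure_throughput` and `ThroughputEstimator`, the injected `fetch` and `clock` callables, and the smoothing factor are all assumptions for illustration; in a real app `fetch` would download a small known file.

```python
import time


def measure_throughput(fetch, clock=time.monotonic):
    """Time one small download and return bytes per second."""
    start = clock()
    data = fetch()
    elapsed = clock() - start
    if elapsed <= 0:
        return float("inf")
    return len(data) / elapsed


class ThroughputEstimator:
    """Smooths repeated measurements so the estimate adapts as the speed changes."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha          # weight given to the newest sample
        self.estimate = None

    def update(self, sample):
        if self.estimate is None:
            self.estimate = sample
        else:
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate
        return self.estimate


# Deterministic demo: a fake clock says the 1000-byte fetch took 2 seconds.
ticks = iter([0.0, 2.0])
speed = measure_throughput(lambda: b"x" * 1000, clock=lambda: next(ticks))

estimator = ThroughputEstimator()
estimator.update(speed)
smoothed = estimator.update(100.0)   # a later, slower measurement
```

Feeding each new measurement through the estimator lets the app react when the network gets slower or faster, which a one-time interface check cannot do.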
Timeouts -- another thing that we see with common questions and it's generally one of those flawed questions. The network timeouts are set to the default values for a reason, specifically on cellular it can often take tens of seconds for the cellular network to come up, which means that if you lower the network timeout to a few seconds, you may have connected to the network, caused the cellular to start coming up, and then timed out before it's finished coming up, which means you're wasting power and not getting connected. So make sure -- well, as a rule you want to leave the timeouts set to their default values.
Now, if you're worried about what impact that has on your user interface, then solve that problem at the user interface level. If it's a solicited operation -- something that the user specifically requested -- then put up a progress dialogue, put up a cancellation button, keep retrying until it works. And if the user can then walk in range of their base station, it will all work.
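For a solicited operation, "keep retrying until it works or the user cancels" can be sketched as below. This is an illustrative Python loop, not an iPhone OS API: `connect_until_cancelled`, `try_connect`, and the `on_attempt` hook are hypothetical, and each individual attempt is assumed to use the default network timeouts. The loop blocks, so it belongs off the main thread, with the Cancel button setting the event.

```python
import threading


def connect_until_cancelled(try_connect, cancelled, retry_delay=1.0, on_attempt=None):
    """Retry try_connect() until it succeeds or the user cancels.

    Returns the connection, or None if cancelled first.
    """
    attempt = 0
    while not cancelled.is_set():
        attempt += 1
        if on_attempt is not None:
            on_attempt(attempt)          # e.g. update the progress UI
        try:
            return try_connect()
        except OSError:
            # Wait before retrying, but wake immediately on cancel.
            cancelled.wait(retry_delay)
    return None


# Demo: the first two attempts fail, the third succeeds.
outcomes = iter([OSError("no route"), OSError("no route"), "connected"])


def flaky_connect():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


attempts = []
cancel_event = threading.Event()
result = connect_until_cancelled(flaky_connect, cancel_event,
                                 retry_delay=0, on_attempt=attempts.append)
```

The user never sees a "connection failed" alert for a transient problem; they see progress and a way out.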
The last thing you really want -- I see this all the time on my Mac -- where I go open the lid and then try and connect with my VPN. And my VPN software says "connection failed," because my Wi-Fi hasn't yet -- it took a second or two for my Wi-Fi to bind to the network. That's really annoying. If the VPN software had just kept retrying, then when the Wi-Fi bound, it would have worked.
Or if the Wi-Fi wasn't working, I would have walked in range of my base station and then it would work. And then it would automatically connect. So don't just time out for user interface operations; let the user cancel and just keep trying. For unsolicited operations where there's no real user interface, then yes, you will need timeouts but just use the default timeouts -- that's fine, the user isn't waiting for them. That wraps up my common mistakes, which wraps up my talk as a whole: two hours of networking. Sorry, guys. I need a beer, I don't know about you. But first, a summary.
Networking is hard. We can't fix this at the API level. It's a fundamental issue of networking and the best way to make it easy is to design your project properly. Good architecture is how to make it easy. In addition to that, what I talked about in this talk is asynchronous programming, how using run loops is really your source of solving the problem of doing the networking off the main thread, and how you can use NSOperation to structure your high-level application into asynchronous operations so that the network details don't leak out into your application.
Plan for debugging -- adding logging once you've got a bug is painful. You really want to add the logging in advance, then deploy to the users, then get the logs, not the other way around. And try to avoid the common mistakes. I have lots more common mistakes, I could have kept going all day but those were the real tough ones. So for the moment that's it.
I'm Quinn "The Eskimo!" That's my email address. Paul Danbold is doing our network evangelism at the moment. There's tons of documentation on our website. Apple developer forums, the Core OS section is where I hang out. So feel free to come and ask us a question. You're more likely to get an answer on dev forums than you are if you send me a personal email because if I answer you on dev forums, everyone sees the answer.
Sample code -- use the iPhone sample code but don't be afraid to use the Mac OS X sample code. Mac OS X and iPhone -- the networking is architecturally very similar and most samples work on both. In addition, for WWDC attendees, as a special one-time offer, you can get the Linked Image Fetcher sample which shows how to use NSOperation to do asynchronous operations and use the networking.
And last but not least -- well, actually not quite last -- related sessions. Obviously my first session has already passed, so you'll have to catch that on video if you missed it. All the other sessions have passed except for Understanding Crash Reports. It's a really good session, I highly recommend it.