iOS App Performance: Responsiveness - WWDC 2012

Essentials • iOS • 56:43

Creating an app that performs well is essential to making your users happy. Learn the techniques that will make your app launch faster, display graphics smoothly, and respond to the user immediately. A must attend session for all iOS developers.

Speakers: Tim Lee, Ben Nham

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

I'm Ben Nham. I'm an engineer on the iOS performance team. I'll be joined in a bit by my colleague, Tim Lee, who is also an engineer on our team. And we've been working on performance for the last three or four years. And we're going to talk about improving your app's responsiveness today.

So we're going to talk about two topics. One is responsiveness, and that's how quickly your app responds to user events. So you really want your application to respond to things like touch events, accelerometer events, and so forth quickly so that they feel like they're interacting with a real object. And we're also going to be talking about performance. That's how to make your app do work efficiently.

We're going to look at that in a few different contexts. One, the real key to this talk is to measure and profile your app for performance. So we'll be going over a couple of profiling tools and showing you how to use them to make your app faster. We'll also look at a couple of key scenarios. One is fast app launch.

That's the first user event in your app. We'll also be looking at how to make your app respond to events faster once it's already launched. And we'll also give you a few key performance strategies that you can apply to your app, whatever performance problem you might have.

So let's talk about a workflow for fixing performance problems. It's really like fixing any other problem. The first step is to reproduce the problem, just like you have with any other bug. And once you've reproduced the problem, hopefully you can time it, get some sort of number, like a wall clock time, frames per second, network throughput, whatever it may be. Hopefully you get some sort of one number that you can keep going back to to make sure you're making forward progress.

The next step is to profile your app with some of our tools, say, Instruments. And once you take a look at the time profile or the system trace or whatever profiling tool you use, hopefully you come up with some hypothesis of what's wrong. And sometimes you'll only get some sort of general hypothesis, like, "Well, this doesn't seem like a CPU problem. Maybe it's really a GPU problem. It's a graphics problem. I should use a different instrument, like the Core Animation instrument." So you might go around in a cycle with sort of a general hypothesis and then a more narrow hypothesis. And hopefully, eventually, you'll figure out something to do to fix your application.

And once you make the fix, you come back to reproducing the problem. If it doesn't reproduce and you measure it and it's gotten better, you're done. Otherwise, you go around in this loop again until your performance problem is fixed. So we'll be going over this workflow in all our performance talks today.

Let's take a look at how that applies to a key performance scenario like application launch. So why is application launch so important? Well, the first reason is it's the first thing that the user does. They'll tap your icon, and they'll expect a responsive user interface really quickly. And really what you're shooting for is a really fast app launch, like, say, 400 or 500 milliseconds, because that's how long the application zoom animation takes.

And we launch your app concurrently with the zoom animation. So if you can actually launch that quickly, to the user, it seems like your app has launched instantly. Of course, your app might do nontrivial work, so it might be hard to get to 400, 500 milliseconds, but that's really the goal at the end of the day.

So that's sort of the carrot. Users love fast app launch times. What's the stick? Well, the system really dislikes slow app launch times. And so if you launch in 20 seconds, we'll actually terminate your app. That's the system watchdog at work. And you really want to avoid that because that's a really bad user experience where the user taps your app and it just shows a loading screen and then just terminates. And of course, users often give up before 20 seconds.

So you really want to be on the low end of this scale. Just one thing to note is that if you're running in the debugger from Xcode, the watchdog will be disabled so we can attach with the debugger. So you have to launch the app like a real user would to get this timeout behavior.

[Transcript missing]

So one way of measuring the launch time using logging, as I said, is to declare a global, say, a start time, and store the current time into that global on the first line of main. And then there's the application did finish launching callback. When you get that callback, it doesn't mean your application has actually finished launching. It means that we're giving control to your app to do all the work that it needs to do to finish launching. So you might think that it'd be valid to just stop the timer at the end of application did finish launching.

But actually, because of the way the system works, all the initial layout and drawing doesn't happen until after you've returned from application did finish launching. So that's why, in the first line of this callback, I've used dispatch_async to stop the timer only after that work, the work that happens once application did finish launching has returned. That's a little tricky, so I wanted to point it out here.
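A minimal sketch of why the timer stop is deferred, written in C++ as a toy model rather than real UIKit or Grand Central Dispatch code. MainQueue, LaunchTrace, and simulateLaunch are hypothetical names, and the model simply assumes that the first layout and draw runs after the callback returns and before the queued block is serviced:

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the main queue: blocks run in FIFO order
// when the queue is drained, after the current callback has returned.
struct MainQueue {
    std::deque<std::function<void()>> pending;

    void async(std::function<void()> block) {
        assert(block && "block must be callable");
        pending.push_back(std::move(block));
    }

    void drain() {
        while (!pending.empty()) {
            std::function<void()> block = std::move(pending.front());
            pending.pop_front();
            block();
        }
    }
};

struct LaunchTrace {
    Clock::time_point start = Clock::now();  // stands in for the global set in main
    std::vector<std::string> steps;          // order in which launch steps ran
    double launchSeconds = -1.0;             // filled in when the timer stops
};

void simulateLaunch(MainQueue& queue, LaunchTrace& trace) {
    // Inside the delegate callback: enqueue the timer stop, do not stop it here.
    queue.async([&trace] {
        trace.steps.push_back("stop timer");
        trace.launchSeconds =
            std::chrono::duration<double>(Clock::now() - trace.start).count();
    });
    trace.steps.push_back("callback returns");

    // Assumed system behavior: first layout and draw happen after the callback.
    trace.steps.push_back("first layout and draw");

    // The queued block only runs once the main queue is serviced.
    queue.drain();
}
```

In this model the recorded order is "callback returns", "first layout and draw", "stop timer", so the measured time includes the layout and drawing work. Stopping the timer directly inside the callback would leave that work out.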

Another way to measure launch time is to use Time Profiler. A lot of people don't know about the CPU Strategy View, so I'll be demonstrating that today. And what that will show you is every single call stack that Time Profiler has taken over time. And you can actually search through those call stacks.

So the idea here would be to search for this ReportAppLaunchFinished method and look for the very last sample with that call stack in it. And that's where you can sort of endpoint the launch time of your application, at least if that's what you care about. If you care about something else, like when the shutter button is enabled, you'll have to search for that instead.

So this is a little easier to demonstrate, so I'm going to head over here to show you how to take a time profile. So just to make this a little more interesting and show you a real application, we're going to actually be looking at the WWDC app. And the first thing I want to show you is how to report the launch time via logging. So as you can see, I've declared a global up here for start time, and in the first line in main, I've stored the current time to it.

In my app delegate's application did finish launching, I've declared that the start time global exists. This declaration probably belongs in a header; I've just put it here so it's all in one place. And then, as I said, I dispatch_async back to the main queue. And that's so that, when I'm measuring the launch time, I'm taking into account the layout and drawing that happens after you return from application did finish launching. And all this is going to do is report the launch time to the console. So let's build and run this.

And you can see that the current application launch time is reported down here. It says it launched in 1.15 seconds, so that's actually pretty good because, as I said, the animation is 500 milliseconds on an iPad. And this should be actually pretty reproducible. Now, note that the output down here is from running with a debugger. You probably want to run this, you know, not in the debugger, and then you'll see the output in the Xcode Organizer console.

And just to convince you, I'm actually looking at the Xcode Organizer console. So all I did was launch the WWDC app, and a launch time appeared in the console. So that should be relatively repeatable. That's one number you can keep going back to as you work on launch time.

As you keep making improvements, hopefully that launch time keeps going down and down. Now let's talk about how to take a time profile. So I've already built the application and uploaded it to my iPad, so I'm just going to start Instruments as a separate program. And I've selected Time Profiler.

There are a few options I like to use. So one of them is in File, Record Options. So there's this option called Deferred Mode. That basically minimizes the overhead of the trace because by default you might have noticed that Instruments sort of interactively displays the profile as you're taking it. This will just put up a block out screen that you'll see while taking the trace so that overhead is minimized.

Also by default, a lot of people attach to one process. I'm just going to profile all processes because it's also relatively efficient. And sometimes your application has some interesting interactions with other system processes. So for instance, if you wrote an app that gets events from the user's events database, that'll talk to a calendar access daemon and so forth. So sometimes it's just good to see all the processes that are running. So I'm just going to hit record now. Launch the WWDC app. Okay, something is going a little awry here, so I'll bring up one of my backup traces.

So this is what you would see after taking the time profile. You would see a screen that looks something like this. So this purple graph at the top shows CPU usage over time. And what this is saying, it might be hard to see, is that in the WWDC app, we spent two seconds of CPU time.

Now, as I was saying, one of the features of Time Profiler that a lot of people don't know about is this CPU strategy view up here. That's this button at the left. And that'll show you every single call stack sampled over time. So by default, we'll sample the call stack every millisecond.

And if you switch over to the sample list here, you can actually see every single call stack over time. Now, another feature that's kind of handy, let's say we want to figure out the launch time from the time profile. We can restrict the trace to just WWDC's main thread by clicking on this thread picker. And I'm going to click on WWDC's main thread.

And as you can see, we've highlighted only the stacks that have to do with the WWDC main thread. So there's a lot of other stuff going on in this system other than the main thread of WWDC. So the beginning of app launch, let's say it happens around here.

You can see the first highlighted call stack is at about 2.265 seconds. And let's say we want to find the end of app launch, the end of the first layout and draw. So as I said, we can search all the call stacks here. We can search for ReportAppLaunchFinished.

And that'll actually show that there were, in this case, 59 samples with ReportAppLaunchFinished in them. And for the last one, you can actually see that Xcode has moved the inspection head to about 2.995 seconds. So 2.265 to 2.995, that's about, say, 750 milliseconds spent launching the app.

So you can, at this point, you can restrict your trace, let's say, to just this time period. If you only care about what happens until the first layout and draw. So that's how to use the CPU strategy view to sort of restrict your trace to what you're interested in. And we'll go back to this trace in just a moment.

As you saw, there's a lot of information that comes out of that time profile. It can be really overwhelming. So I'm just going to go over some of the key phases of application launch so you know when you take a look at that time profile, you'll have an idea what all those functions mean.

So the first thing that happens in application launch is we have to do some linking and loading. So if your application links against a bunch of frameworks, and maybe you refer to some constants in that framework, we might have to bind to those symbols. And this all happens before we even hit main in your application.

So some of the things that you can do to minimize the time spent linking and loading. One is to take a look at which frameworks you've linked against. This is in the build phases section of your Xcode project. And take a look here because there's a tendency for frameworks to sort of accumulate in this list. Now, in this case, there's only UIKit, Foundation, and Core Graphics. That's sort of the bare minimum that you need. But there's a tendency for people to experiment with new features and add a bunch of frameworks and never remove them.

And you really want to minimize the number of frameworks here because every single Objective-C framework you bring in does a little bit of extra work at link and load time. So, for instance, there is a hash table of all classes, and that has to be populated at load time based on each framework that you link against.

And that's how things like NSClassFromString work. That's how we can look up a class based on a string and give that class to you. So take a look at your linked frameworks. Make sure there's nothing there that you don't need.

Another thing that sometimes developers get confused with is how to use the optional frameworks flag. So we'll say, well, minimize your linked frameworks, and then we'll see that someone has marked UIKit as optional. And you can't really trick the linker. You really need UIKit. So we're going to detect that. And you really shouldn't use optional for frameworks that you actually need, because it has the potential to just add work to linking and load time. So that's not the correct use of optional.

Where you would use optional is, for instance, if you're deploying to iOS 5.0, and you want to link against a framework like, say, PassKit.framework that was introduced in iOS 6. So that's where you would use the optional flag so that users on iOS 5 would not error out looking for the PassKit framework. So that's the proper use of optional frameworks.

The last thing I want to bring up is static initializers. So this is easy to do with C++ especially. So a lot of games are written in C++ and you'll tend to accumulate a lot of global objects with nontrivial initializers. And so for instance, in this case, I have a C++11 statement that initializes a map, a std::map. And what that's going to do is run the code for the constructor of std::map at load time, before you even hit main. And it's going to go through every single one of these in all of your object files.

So that's probably not what you want. That's bringing in a lot of code before you even do any user work. So you're forcing the user to pay for work that they might not even need. There are ways to do this in Objective-C and C too, using the +load method, for instance. So try to avoid those methods. Avoid static initializers.

You should instead make objects when you need them. So Objective-C has a lot of static initializers. And there's a class method called +initialize that's great for this, because +initialize is only called the first time you use the class. There are a couple of gotchas with using it. So refer to the Objective-C programming guide to learn how to use +initialize correctly. But if you need to set up globals and you have a class that uses those globals, use +initialize to initialize them so that it only happens when you first use the class, not at load time.
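On the C++ side, one common way to get the same "build it on first use" behavior is a function-local static. The following is only a sketch under that assumption: roomNumbers, tableBuildCount, and the table contents are hypothetical, and the counter exists purely to make the laziness visible.

```cpp
#include <cassert>
#include <map>
#include <string>

// Counts how often the table is built; only here to make the laziness visible.
int& tableBuildCount() {
    static int count = 0;
    return count;
}

// Eager form to avoid: a namespace-scope object such as
//   static const std::map<std::string, int> kRooms = {{"Presidio", 1}};
// runs its constructor at load time, before main.

// Lazy form: the function-local static is constructed on the first call only.
const std::map<std::string, int>& roomNumbers() {
    static const std::map<std::string, int> table =
        []() -> std::map<std::string, int> {
            ++tableBuildCount();
            return {{"Presidio", 1}, {"Mission", 2}};
        }();
    assert(!table.empty());
    return table;
}
```

Nothing is constructed until roomNumbers() is first called, and later calls reuse the same map. Since C++11, initialization of a function-local static is also guaranteed to happen exactly once even with concurrent callers.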

Okay, so after linking and loading, we have to do some UIKit initialization for every single app. That includes things like creating fonts, creating the first status bar, reading your user defaults, and deserializing your main nib. And that will show up in functions like UIApplicationInitialize, UIApplicationInstantiateSingleton, create status bar, and load main nib file. So that's where it will show up in Time Profiler.

So what can you do to reduce the amount of time spent in this part of application launch? Well, the first is to minimize the size of your main nib. And so in this case, I've got nine top level views, each with sub views in my nib. So that's definitely not the right way to use nibs.

And iOS makes it really easy to use nibs in the right way because it really enforces the idea of one nib per view controller. So have one nib per view controller. You can have your view hierarchy for that view in the view controller. But definitely don't have something like this where you have nine top level views, each with a very complex view hierarchy. Because we're going to have to deserialize that entire nib before we even respond to any user events.

So minimize the size of your main nib.

Also, one of the things we'll do is we'll look in your app for certain key preferences. And underneath the hood, preferences are implemented as property lists. And property lists are great, but they're not designed for large amounts of data. And the reason for that is that, say you ask preferences for an object for a key, because of the way property lists work, we have to deserialize everything in your user preferences before we can hand you back that object for that key.

So if you store large pieces of data, like in this case a PNG image, inside your user preferences, we have to go deserialize and create all those objects in memory, even before we do any real work in your app. So again, use preferences for what they're meant for, you know, booleans, integers, and so forth. Don't store giant objects inside your preferences.

Next, we'll call back to your application. So this is really where we're giving control to your application to do work. So first, we'll call application will finish launching with options in iOS 6. Then we'll restore your application state if you're using that API. And then we'll call application did finish launching with options. So in Time Profiler, when you see this, that's where you really have to concentrate your efforts for figuring out where hotspots are, because this is really where your app is in control.

And finally, we're gonna do the first Core Animation transaction. So if you don't know what that means, that's okay. This is where we batch up all the work related to laying out and drawing your views. This happens after you return from application did finish launching. And as I said, we force this to happen at least at launch time in ReportAppLaunchFinished. It also usually happens implicitly at the end of the event loop.

So for instance, if you call setNeedsDisplay 10 times on a view, we're not going to actually call drawRect on that view 10 times. Instead, we coalesce all those setNeedsDisplay calls until the end of the event loop when we commit the transaction and we just draw the view once. So a lot of people are sort of mystified by what is occurring during commit.
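The coalescing described here can be modeled with a dirty flag. This is a toy C++ sketch, not how UIKit or Core Animation is actually implemented; the View class and commitIfNeeded are hypothetical names standing in for the view and the end-of-event-loop commit:

```cpp
#include <cassert>

// Hypothetical view: setNeedsDisplay only marks the view dirty.
class View {
public:
    void setNeedsDisplay() { dirty_ = true; }

    // Stands in for the commit at the end of one pass of the event loop.
    void commitIfNeeded() {
        if (!dirty_) return;
        dirty_ = false;
        drawRect();
    }

    int drawCount() const { return drawCount_; }

private:
    void drawRect() {
        assert(!dirty_ && "flag is cleared before drawing");
        ++drawCount_;
    }

    bool dirty_ = false;
    int drawCount_ = 0;
};
```

Calling setNeedsDisplay ten times before a commit results in a single draw, and a commit with nothing marked dirty draws nothing.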

And there's sort of three main things that you'll see taking time. One is preparation. So you'll see, as a sub call inside a CATransaction commit, sort of prepare transaction. And that's where you'll see things like decompressing PNGs and JPEGs if you set an image on an image view.

The next is layout. So this is where we call layoutSubviews. And a lot of views have nontrivial layoutSubviews. So for instance, UITableView, in its layoutSubviews, that's where it loads new cells. So that's where you'll see work related to creating new cells.

So there's a lot of time that we're going to spend in creating table view cells. So generally a good amount of time is spent in this layout portion of the commit. And then finally, after we've laid out your views, we know where they are, we know what size they are, we can finally call drawRect, if your view implements drawRect, because we know the size of the view.

So again, let's take a look at how this manifests itself in the time profile for the WWDC app. So this is the time profile we just took of the WWDC app. And I'm going to switch back from the sample list view to the call tree view. And the sample list view, as I said, shows what happened over time. The call tree view shows in aggregate where time was spent. I have to clear this filter at the top right here.

So as you can see, we spent 700 milliseconds in the main thread of WWDC during the application launch. And I'm just going to walk you through some of those things I just mentioned. So in this particular trace, we spent 26 milliseconds in dyld. That's the linking and loading phase. That's not a lot of time. As I said, if you see a lot of time there, check for static initializers, check for linking against a lot of frameworks, and so forth. After we've linked and loaded, we actually hit main.

And we do some UIKit initialization. So UIApplicationInitialize, that's where we'll do things like reading your user defaults. UIApplicationInstantiateSingleton, that's where we'll make the UIApplication singleton object and your app delegate. And to continue with UIKit initialization, you'll have to expand this call tree a bit.

And you'll see that we spend some time creating the status bar and also loading the main nib file. So those are call stacks that are associated with sort of UIKit initialization for every app. Now where we start calling you back is in this callInitializationDelegatesForURL. So as you can see, we called did finish launching with options here.

This is where we hand control to your application. So you can see that we spent 400 milliseconds here in this particular app launch. And this is where you really need to focus your investigations. The other call stack I want to point out is in, as I said, ReportAppLaunchFinished. If you expand this out, this is where you'll see the CATransaction commit.

And as I said, there are a few key phases to a CATransaction commit. One is preparing the commit. That's where you'll see deserialization of images. Then there's laying out and displaying. In layout, you'll see things like calling layoutSubviews, loading table view cells, and so forth. And display, that's where you'll see the callbacks to your app for drawRect. So I'm not going to go over looking at this app launch right now because Tim is going to go over how to look at this time profile and optimize some key aspects of WWDC in just a bit.

But hopefully that gives you an idea of where certain key operations in application launch occur. Okay, so to summarize, you want to make sure your application launches fast because that's the first user interaction, and try to get it to be as instantaneous as possible. And the way to do that is by measuring and profiling your application launch with Time Profiler.

So next we want to give you a few sort of key performance principles that you can apply to any performance problem. And the first one is you have to profile your app. Don't guess as to what's slow. I mean, really, if there's nothing else you take back from this talk, remember to profile your application.

So after you profile your application and you figure out what's slow, how do you make it faster? Well, there's just a bunch of general performance dos and don'ts. Don't do it, don't do it again, do it faster, do it beforehand, do it afterwards, and do it at scale.

So really the most common performance optimization is just not doing whatever takes a long time. And year in and year out in the labs, we always ask people to take time profiles, system traces, and other profiles, and we always see some sort of useless work. And that can be something graphical like an unnecessary mask layer, an unnecessary shadow, or it could be something database related like multiple queries for the exact same data over and over again, or maybe hundreds of milliseconds logging or sorting at launch time. So really take a look at your time profile, and more often than not, if it's the first profile you've ever taken, you'll probably see something that you can really quickly remove and improve your launch time or your responsiveness.

The next is to not do it again. So there are certain classes that take a good amount of time to initialize. And the most common example is table view cells, where reuse is baked right into the API. We really encourage you to reuse table view cells because there's a method on the table view that actually lets you get a reused table view cell. But there are other classes that fall into this category. Things like date and number formatters, regular expressions, SQLite statements. And in this case, once you've made that expensive object, you should reuse that object instead of recreating it over and over again.

So for instance, with date formatters, a common operation is to format a date. In this case, we want to format February in a table view cell. And if you create that date formatter in cellForRowAtIndexPath and then just immediately release it, you're going to be paying the overhead of initializing the date formatter over and over again. So really what you want to do is you want to sort of cache one date formatter for each date format. If you do that, you'll have to invalidate that cached date formatter when the locale changes.

And note that you can't really trick the system. So if you create one date formatter and just call setDateFormat on it over and over again, that's just as slow as creating date formatters over and over again. So you really should, if you're going to use a date formatter over and over, keep it around and keep one date formatter for each date format in your program.
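The "one cached formatter per format, invalidated on locale change" idea can be sketched as a small cache keyed by format string. This is C++ with a hypothetical Formatter type standing in for an expensive object like a date formatter; it does no real date formatting, and the construction counter is only there to show how often the expensive setup runs:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical expensive-to-create object standing in for a date formatter.
struct Formatter {
    explicit Formatter(std::string fmt) : format(std::move(fmt)) {
        ++constructions();  // each call models one expensive setup
    }
    static int& constructions() {
        static int n = 0;
        return n;
    }
    std::string format;
};

class FormatterCache {
public:
    // Returns the cached formatter for this format, creating it on first use.
    const Formatter& formatterFor(const std::string& format) {
        assert(!format.empty());
        auto it = cache_.find(format);
        if (it == cache_.end()) {
            it = cache_.emplace(format, Formatter(format)).first;
        }
        return it->second;
    }

    // Call when the locale changes so stale formatters are rebuilt on demand.
    void invalidate() { cache_.clear(); }

private:
    std::map<std::string, Formatter> cache_;
};
```

Asking for the same format twice pays the setup cost once. After invalidate(), the next request for a format rebuilds its formatter.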

Another example is with calendars, NSCalendar. So this can happen behind your back. So for instance, NSLog creates an NSCalendar to format the date in the log line every single time you call it. So it's not something you want to call thousands of times on app launch or when you're responding to a user event. So don't log every single method you hit in your application. I mean, it's fine if you log a few times, hopefully you don't log at all on a release build because your users can't see it anyway. But for this reason, avoid calling NSLog excessively.

One real gotcha is that NSCalendar's currentCalendar actually creates a different calendar instance every single time because NSCalendar is mutable. So this can really get you if you're, say, in a loop, getting dates from a calendar for each event, let's say. Because it looks like a singleton, but it's not. So this is just a gotcha in the framework.

If you're using currentCalendar, save it instead of calling it over and over again. And finally, SQLite statements. They're really actually little compiled programs. So you should make sure to call sqlite3_prepare, but you shouldn't call it over and over with a format string. Instead, use bind parameters and reuse that prepared statement over and over again.

So what if you just have to do whatever is slow? Well, hopefully you can do it faster. And this really is the domain of correct data structures and faster algorithms. This is where your creativity sort of has to come into play. So I can't really give you too many general examples of this, but one thing you can do, as I said before, is choose the right data format.

If you're using property lists, make sure you're using the binary property list format and you're not storing a ton of data in the property list. Because as I said before, it's not a lazy format. If you want one key from the property list, you have to create all the objects in the property list in memory.

If you have really -- if you have tens of thousands of objects, you should really be looking at Core Data or SQLite, because those are incremental formats. You can -- if you need one object from the database, the database will try to just read in the set of pages that contain that data instead of reading in the entire database into memory.

If you are using SQLite directly, I just want to point out there are a couple of callback functions you can use that are public. sqlite3_trace and sqlite3_profile will call you back with every single query you make, as well as an estimate of how long it took to perform that query. And once you figure out which query took a long time, you can use EXPLAIN QUERY PLAN to figure out what SQLite is doing to satisfy the query.

Another performance strategy: do it beforehand. Precompute. So if you have some sort of expensive calculation, you might be able to precompute it and save it off to disk or save it in memory somewhere and reuse it over and over again. So for instance, Calendar supports recurring events. And those recurrences can be pretty much arbitrarily complex. You can schedule a meeting every Monday, Wednesday, and Friday of every month except in February, or on the last week of every Monday, Wednesday, Friday of every month.

And that can take quite a long time when you take into account leap years, time zones, and so forth. So the solution we have here is to pre-expand the recurrences into objects we call occurrences. And we'll store those occurrences in the database so that at launch time, we're not expanding any recurrences. We're just fetching those pre-expanded occurrences.
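To make the pre-expansion idea concrete, here is a deliberately simplified C++ sketch. It is not Calendar's actual scheme: WeeklyRecurrence and expandOccurrences are hypothetical, days are plain offsets with day 0 assumed to be a Monday, and leap years and time zones are ignored entirely.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical rule: repeats on the given weekdays (0 = Monday ... 6 = Sunday).
struct WeeklyRecurrence {
    std::set<int> weekdays;
};

// Expands the rule into concrete occurrences, as day offsets in [0, dayCount),
// where day 0 is assumed to be a Monday.
std::vector<int> expandOccurrences(const WeeklyRecurrence& rule, int dayCount) {
    assert(dayCount >= 0);
    std::vector<int> occurrences;
    for (int day = 0; day < dayCount; ++day) {
        if (rule.weekdays.count(day % 7) != 0) {
            occurrences.push_back(day);
        }
    }
    return occurrences;
}
```

The expansion would run when the event is saved, and the resulting list would be stored. At launch the app then only reads the stored occurrences and never evaluates the rule.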

Now when you do this, beware of memory growth. Because where this really can hurt you is if you cache a really large image, like a screen-sized image. That's like a 640 by 960 image at 4 bytes per pixel. If you store that in a static UIImage and it's been decompressed, you've just blown away 2.4 megabytes of memory. So that's a really bad pattern. Really, really, really think hard if you're saving large images in globals. You probably should not be doing it. And we'll have a session later this afternoon about memory to show you how to track down these issues.
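The 2.4 megabyte figure is just width times height times bytes per pixel. A small C++ helper makes the arithmetic explicit (decodedImageBytes is a hypothetical name, and the calculation assumes an uncompressed bitmap with no row padding):

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of a decoded bitmap, assuming tightly packed rows.
std::size_t decodedImageBytes(std::size_t width, std::size_t height,
                              std::size_t bytesPerPixel) {
    assert(bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}
```

For the 640 by 960 image at 4 bytes per pixel this gives 640 * 960 * 4 = 2,457,600 bytes, which is where the roughly 2.4 megabytes comes from.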

Another performance strategy, instead of doing it beforehand, do it afterwards. So if you have a lot of data to load, ideally you can load it synchronously, but if you have thousands of events that you just have to load at launch time, maybe you can use Grand Central Dispatch to push that work later. So for instance, in Calendar, we load all the events asynchronously. So if you look closely, we'll put up a responsive user interface. You can scroll immediately, and then we'll load that user interface with events sometime later in the background.

And finally, do it at scale. So test your application with lots of data. So for instance, the contacts app that ships with the system, it scales pretty well with the amount of contacts. So for instance, if you have 3,000 contacts versus 300 contacts, it launches in pretty much the same amount of time instead of ten times as much time. So make sure to test your app with large data sets.

And the way to do that is to think about the critical methods in your application. So in terms of contacts, it's a table view app. So you have to make sure sections are loaded quickly and the index bar is loaded quickly, and then whatever visible cells are loaded quickly.

And one way to do that is to make sure you store the section counts and titles into some side table in your database or somewhere else so that you don't have to load the entire data set just to group by sections. And Core Data users actually get this for free in NSFetchedResultsController.
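A side table of section counts can be as simple as a map that is updated whenever a record is inserted. This C++ sketch is an in-memory illustration only: ContactIndex is a hypothetical class, and sections are assumed to be the uppercased first letter of each name.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical contact store that keeps per-section counts up to date on insert.
class ContactIndex {
public:
    void add(const std::string& name) {
        assert(!name.empty());
        const char section = static_cast<char>(
            std::toupper(static_cast<unsigned char>(name[0])));
        ++sectionCounts_[section];
        names_.push_back(name);
    }

    // Section titles and counts come from the side table; names_ is not scanned.
    const std::map<char, int>& sectionCounts() const { return sectionCounts_; }

private:
    std::vector<std::string> names_;
    std::map<char, int> sectionCounts_;
};
```

Building the section headers and the index bar then costs time proportional to the number of sections, not the number of contacts.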

So make sure to check that out and make sure you're doing the most efficient thing with large data sets. So in conclusion, to make your app faster, you have to profile your app. And once you figure out what the hot spots are, do it faster, do it later, don't do it. Follow those general performance strategies to figure out a way to make your application faster. And remember to test your application not just as a bare application but with large data sets. And I'll hand it over to Tim to talk about event handling.

Thanks, Ben. All right. So besides app launch, the rest of the time your users are using your app, the thing that matters most is event handling. When somebody touches a button or does a swipe or gestures, handling those events quickly is the key to maintaining responsiveness. And the thing that you want to remember is that handling pretty much all of these events is all about keeping the main run loop free. All these events end up happening on the main run loop, except for a few cases where the API explicitly lets you handle them on another queue. So keep in mind that the main run loop is the one that matters.

So here's an illustration of what that means. So when an event comes in, it gets processed by your run loop. You'll do some layout, you'll do some drawing or whatever. And one at a time, events get processed, and things will happen normally. The problem happens when events are handled slowly.

What can happen then is you'll get your event in, and this is moving around a little bit slower. And in the meantime, another event's come in, and another one. And by then, the user has done some action, and your main run loop has not gone around quickly enough to process it. So now the user's waiting. So we want to avoid this, and there's a few ways to handle this, a few strategies.

So the simplest is just...

[Transcript missing]

So in this case, I happen to know my app just like you'll know yours. And I know that "Did control did change?" is really the call that gets called when I switch tabs. So just like at the beginning of main, except that instead of main it will be "Did control did change?", we record the start time. And down here, we print again: the difference between the end time and the start time.

And you may notice here that there's this dispatch async, which Ben explained earlier, but there's actually a second one, and this one can come up and is a little bit tricky. So what's going on here is that the first time through, it could be the case that this block, which is dispatched asynchronously, may happen before Core Animation gets to doing its layout and drawing for this particular turn of the run loop.

Now, the thing that'll tip you off here is that if you log on the outside and then log within the first dispatch block, you'll have pretty much the same time. And it'll be unusually fast. So in that case, you can try this trick of dispatching it again to make sure that you've captured the time that Core Animation is using. So I'm going to build and run it here.
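The demo's source is not part of this transcript, so the following is only a sketch of the double-dispatch timing trick, written in Swift. The helper name `measureAfterTwoTurns` and its parameters are assumptions. The queue is a parameter so the helper can be exercised outside an app; in an app it would be the main queue.

```swift
import Dispatch

/// Hypothetical helper: records a start time immediately, hops through
/// `queue` twice, then reports the elapsed time in milliseconds.
/// On a serial queue the second hop is enqueued only when the first one
/// runs, so work already queued behind the first hop finishes before the
/// end time is taken.
func measureAfterTwoTurns(on queue: DispatchQueue = DispatchQueue.main,
                          report: @escaping (Double) -> Void) {
    let start = DispatchTime.now()
    queue.async {
        // First hop: per the talk, this may still run before Core Animation
        // has done its layout and drawing for this turn of the run loop.
        queue.async {
            // Second hop: take the end time here instead.
            let end = DispatchTime.now()
            let nanoseconds = end.uptimeNanoseconds - start.uptimeNanoseconds
            report(Double(nanoseconds) / 1_000_000.0)
        }
    }
}
```

In an event handler this would be called at the top, for example `measureAfterTwoTurns { ms in print("tab switch took \(ms) ms") }`. If the reported time matches a log taken inside the first block and looks unusually fast, that is the symptom described above.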

Killing off the app. And here we go. So now we see the launch time from before. And we see that it took about 400 milliseconds to load one of the tabs. And if we're switching here, it's taken about 450-ish milliseconds between switching tabs. That's, again, okay. It's not great. We'd like to keep our interactions down within a couple hundred milliseconds. I mean, ideally, it's instantaneous, but, you know, you do the best that you can. So the next step, as usual, is to take a time profile. So I'm gonna fire up Instruments again.

Do the same trick that Ben did with deferred mode. Hit record and switch tabs. So I'm switching, and I'm switching again. All right. Now we can see very clearly here what the times are when we're switching tabs. So just like before, we can use the inspection range to zoom in. What I'm going to do here is actually use a keyboard shortcut. I'm going to hold Option and drag, and that will also create the inspection range. So you can see here there's 500-ish milliseconds. And I'm going to use Shift to zoom in.

How do we take a look at this trace? So the easiest thing to do is start with this extended detail, which will give you the hottest backtrace that you've got. You know, you could go down here and expand this tree and manually look at the times on the side and look for things that take an unusually long time. But this thing on the side actually gives you a shortcut into doing that. So for example, one of the hot traces here we see is UILabel's drawRect:. And looking at the app, we can see that there are actually a lot of labels going on here.

For every session, there's the session title and the name, and there's quite a few of these. So this makes sense, which is always a good thing. The question is, what can we do about this? Well, first, before you even try to fix label drawing, you want to confirm that your hypothesis is actually a problem. So I'm going to go back to the app.

And I'm going to confirm this. You know, it's label drawing, and it's actually largely here in NSString's drawInRect. So it's actually doing the string drawing. An easy way to confirm that this is a problem is to just not draw the strings. We'll keep the views to make sure that it's not view processing and whatnot.

So I can go to -- let me see. And what I'm going to do is just remove this text. So we'll keep the views around. And see what that does. This is the process of confirming the hypothesis from the workflow that we saw in the beginning. So we'll run the app again.

So the idea here is that if you don't see enough improvement, it's really not worth spending a lot of time optimizing this case. I mean, you can see here that this is not a shippable app. But in the meantime, it's gotten really fast. At least 100, almost 150 milliseconds faster.

Okay, so now we have an avenue of optimization that we can pursue. What can we do to maintain a shippable app while keeping the speed? So going back to the principles and strategies that Ben talked about before, one thing that we can do is precompute or don't do it again. So the labels are actually something that you're drawing all the time. A lot of the room titles are the same.

So even the lab sessions have common names. So there's actually a lot of redundant work there. And the thing about redundant work is that you can avoid doing it multiple times if you just cache it. So what I'm going to do is cache the labels. I've already done this, because it's not exactly a trivial thing to do.
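The demo's actual label cache is not captured in this transcript. As a rough sketch of the general idea in Swift, the hypothetical `RenderCache` below renders each distinct string once and reuses the result. The `render` closure stands in for whatever expensive drawing the app does. It is not thread safe and assumes it is only used from one thread.

```swift
/// Hypothetical cache: renders each distinct string once and reuses the result.
/// Not thread safe; assumed to be used from a single thread.
final class RenderCache<Rendered> {
    private var storage: [String: Rendered] = [:]
    private let render: (String) -> Rendered

    /// Number of times the expensive `render` closure actually ran.
    private(set) var renderCount = 0

    init(render: @escaping (String) -> Rendered) {
        self.render = render
    }

    /// Returns the cached rendering for `text`, creating it on first use.
    func rendered(for text: String) -> Rendered {
        if let hit = storage[text] { return hit }
        renderCount += 1
        let value = render(text)
        storage[text] = value
        return value
    }
}
```

Because many sessions share room titles, repeated strings hit the cache and skip the drawing. The trade-off is memory: every cached rendering stays alive, so the earlier warning about caching large images applies here too.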

So I'm caching the labels, and basically what this does is it replaces the UI labels in the app with

[Transcript missing]

We're down to the 300 millisecond range again. You'll notice that it's quicker later. You know, we're down in the 200s, because, again, this is a cache, and ideally what I would do is save it on disk.

But, you know, this is incremental, so... Again, you can see in the app. We have text, and it's quicker. So this is kind of the cycle that you go through. From here, I would go on to time profile, look for something else, and optimize again. But this is the general workflow through the circle that we saw earlier.

So besides doing this process and minimizing the time, making your algorithm faster and doing trade-offs and things of that sort, you can just get work off of the main thread. That will leave your main thread time to process events and be responsive. And there's two main ways that this happens.

Implicit and explicit. So implicitly, there are lots of frameworks that will take care of things for you behind your back. Because we know that things take a long time. The view and layer animations. Not the drawing, but the animations when things move across the screen. That happens in a separate process and your app is not responsible for it after it's set it up.

The next is layer compositing. After you've drawn your layers and created your view hierarchy, the actual processing of that hierarchy happens, again, off in another process. And another very, very interesting one is PNG decoding. Oftentimes, you might look at a trace and wonder, how is this so fast? And notice a lot of PNG decoding in the background.

So what generally happens is PNG decoding will dispatch off into the background and use multiple cores even and then come back to the main thread. Now, one very important thing to remember is that scrolling is not an animation. It sets up a timer that happens on the main run loop and processes events on the main run loop. So keep that part snappy.

Explicit concurrency is where you guys get control. And there's a few ways that you can exploit having multiple threads. The easiest, the quickest, is Grand Central Dispatch. There's lots of information about it. I hope you all love it like I do. If you need a little bit more control, there's NSOperationQueue. It lets you have some control over the widths, a little bit more control over priority and tasks, a little bit more setup, but sometimes it's useful.

And finally, NSThread gives you full control over what thread everything runs on.

Now, here's an example of where you might want to use Grand Central Dispatch. And this is a case of doing it later, I suppose. So let's say you're in the middle of an animation and you get an event that says you should update something by reading something off disk. And by doing so, you've now dropped a frame because reading something off disk is not quick. So the fix is pretty simple. You dispatch it off onto a background queue, and then things proceed normally as if you did no work.

And writing the code is almost as simple as visualizing it. You have your original code that's synchronously grabbing something off of disk and setting a text field. You wrap the whole thing in a dispatch async to do it in the background, and then, and this is key, you have to dispatch back to the main thread if you're doing something with UIKit. We'll talk about thread safety in a second.
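The slide code is not in the transcript, so here is a sketch of the pattern in Swift. The function `loadText` and its parameter names are made up for illustration. The UI queue is a parameter that defaults to the main queue, which is where any UIKit update belongs.

```swift
import Foundation

/// Hypothetical helper: reads a text file on a background queue, then hands
/// the contents (or nil if the read failed) to `update` on `uiQueue`.
func loadText(from url: URL,
              uiQueue: DispatchQueue = DispatchQueue.main,
              update: @escaping (String?) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // The slow part: disk I/O happens off the main thread.
        let text = try? String(contentsOf: url, encoding: .utf8)

        // The key part: hop back before touching anything UI-related.
        uiQueue.async {
            update(text)   // in an app, something like: textField.text = text
        }
    }
}
```

The caller returns immediately, so an animation in progress keeps its frame rate, and the text appears whenever the read completes.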

So a few gotchas with GCD. It is possible to have GCD spawn too many threads. And this is actually a problem. There's significant overhead, well, depending on how many threads you have. There is overhead with having too many threads. So this can happen. And there's also a physical hard limit on the number of threads that can get spawned. And you don't want that to happen. Bad things happen.

So how can this happen and how can you avoid it? So it's pretty simple. You start with one of the global concurrent queues. You dispatch some blocks onto it. And they're running and they're doing their thing. This is fine. And the problem comes in when these blocks do something that takes a long time and is waiting on something.

If you're waiting for the network, you're waiting for disk, lock, something like that, then these guys are blocked. The system detects that and figures, oh, we can do some more work. So let's find some more threads. And this is okay as long as this doesn't go on in a loop. Because then you'll just get an explosion of threads. And it's actually pretty innocuous. The code that can spawn this looks pretty innocuous. So be careful for things that -- where you're synchronously blocking in a loop in a dispatch block. Some solutions to this.

Serial queue. You know, just don't do things concurrently. Sometimes that's okay. If things are going quickly, or you don't need the results immediately, this works. Dispatch sources are one option, which will control the dispatching of blocks onto those queues, so it's not happening behind your back. This is also a case for NSOperationQueue, which has a concurrency width option that will limit the number of threads that get spawned. And finally, if you're just doing NSURLConnections, just use the async methods. They'll take care of it. So, pretty easy solutions.

Thread safety. Like I mentioned before, there are some cases where you need to be careful about what threads things run on.
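Going back to the thread-explosion fixes for a moment, the operation queue approach can be sketched in Swift. `maxConcurrentOperationCount` is the real `OperationQueue` property that sets the width. The function `runBlockingJobs` and the `ConcurrencyMeter` class are hypothetical and exist only to show that the cap holds.

```swift
import Foundation

/// Hypothetical instrumentation: tracks how many jobs run at once.
final class ConcurrencyMeter {
    private let lock = NSLock()
    private var inFlight = 0
    private(set) var peak = 0

    func enter() { lock.lock(); inFlight += 1; peak = max(peak, inFlight); lock.unlock() }
    func leave() { lock.lock(); inFlight -= 1; lock.unlock() }
}

/// Runs `count` blocking jobs with at most `width` in flight at a time and
/// returns the peak concurrency observed. It waits for completion, so this
/// demo function must not be called from an app's main thread.
func runBlockingJobs(count: Int, width: Int, job: @escaping (Int) -> Void) -> Int {
    let queue = OperationQueue()
    queue.maxConcurrentOperationCount = width   // the concurrency width

    let meter = ConcurrencyMeter()
    for index in 0..<count {
        queue.addOperation {
            meter.enter()
            job(index)          // e.g. a synchronous disk or network call
            meter.leave()
        }
    }
    queue.waitUntilAllOperationsAreFinished()
    return meter.peak
}
```

However many jobs are submitted, the queue starts a new one only when a slot frees up, so jobs that block do not cause extra threads to pile up.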

So the big gotcha is UIKit. In many cases, in most cases, UIKit can only be called on the main thread. If you call it off the main thread, bad things happen. Or they could, and that gets harder to debug. There are some exceptions. There was actually a talk about this yesterday. You can refer back to the videos and notes. UIImage and UIGraphics, those are actually okay to use in the background and can be used very effectively.

A couple other frameworks. Most of the rest of Cocoa and Cocoa Touch is thread safe insofar as you can use different objects from any thread, but you can only use each object from one thread at a time. So you need to either dispatch onto the same thread or the same queue to use them, or lock around access to them.
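One way to get that "same queue" discipline is a small wrapper, sketched here in Swift. `Confined` is a hypothetical type, not a framework class. It assumes callers never re-enter it from inside their own closure, because `sync` on the same serial queue would deadlock.

```swift
import Dispatch

/// Hypothetical wrapper: every access to the wrapped value goes through one
/// serial queue, so only one thread touches it at a time.
final class Confined<Value> {
    private var value: Value
    private let queue = DispatchQueue(label: "example.confined")

    init(_ value: Value) { self.value = value }

    /// Runs `body` with exclusive, mutable access to the value.
    func mutate(_ body: (inout Value) -> Void) {
        queue.sync { body(&value) }
    }

    /// Runs `body` with exclusive, read-only access and returns its result.
    func read<T>(_ body: (Value) -> T) -> T {
        return queue.sync { body(value) }
    }
}
```

For example, `let names = Confined([String]())` can be appended to from several background queues with `names.mutate { $0.append("x") }` without any explicit locking at the call sites.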

And finally, there are libraries that are completely thread safe. You don't have to worry about it at all. They'll do their own locking. The one that you might be familiar with is Objective-C introspection. So you're calling in to getting information about your Objective-C objects. You don't need a lock.

You can do this from multiple threads. And Objective-C will handle it for you. Don't do it too much, because you might get into a contention with that one massive lock. And then in that case, you will have to use system trace to figure out what's going on. And we'll have a demo of that in a bit. There are also possibly third party frameworks that could lead you there.

Finally, background queues. We introduced this a few releases ago, but it bears repeating. The background queue, DISPATCH_QUEUE_PRIORITY_BACKGROUND, is extremely low priority. I/O is throttled, your CPU is throttled. Basically, you're telling the system, "I really don't care when this finishes. Just do it whenever you've got time." This can take seconds, tens of seconds, minutes. Only dispatch things onto this queue, at this priority, when you really don't care when it finishes. If you actually care, then use DISPATCH_QUEUE_PRIORITY_LOW.
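In Swift's Dispatch API the same choice is expressed with quality-of-service classes: `.background` corresponds to the background priority and `.utility` to the low priority. The sketch below makes the decision explicit. The `Urgency` enum and `globalQueue(for:)` function are hypothetical names.

```swift
import Dispatch

/// Hypothetical classification of deferred work.
enum Urgency {
    case wheneverIdle       // truly don't care when it finishes
    case soonButNotUrgent   // should finish in reasonable time
}

/// Picks a global queue to match how much the caller cares about completion time.
func globalQueue(for urgency: Urgency) -> DispatchQueue {
    switch urgency {
    case .wheneverIdle:
        // Throttled CPU and I/O; may take seconds or minutes to run.
        return DispatchQueue.global(qos: .background)
    case .soonButNotUrgent:
        // Lower priority than user-facing work, but not throttled the same way.
        return DispatchQueue.global(qos: .utility)
    }
}
```

Anything whose result the user is waiting on, even indirectly, belongs in the second category or higher.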

And finally, getting work off the main thread is good. But there are cases when you'll notice that you're not spending a whole lot of time CPU-wise on the main thread, but your thread's blocked. And this is also a problem, because you're not processing events. So you'll look into Time Profiler and it's like, there's nothing going on here. What's the deal? And there are various causes for this. Disk, network, locking, etc.

So how can you figure this out? Like I said, time profiler, it's kind of hard to find. You pretty much have to go to the CPU strategy view, look at individual call traces at the beginning and end of this long blank space and try to deduce what's going on. You can use record waiting threads. That will actually give you the back traces of the threads while they're not running on the CPU. And that can be effective. But really system trace is the way to go here.

So what I've got here is a pretty simple app. It just has a button, and it's a button that you can press. And in response to the button press, it sends a synchronous URL request. I know it's kind of a tired example, but it's one that works every single time to lock up your main thread to show you how System Trace works. So let me show you the app.

So the idea is you click "Load URL," and this might take a while. It took two seconds because the Wi-Fi in this room is pretty bad right now. So how do we detect this? Now, if you looked at this in Time Profiler, you would see almost no CPU time on the main thread. But if you take a look at System Trace, which I'll switch to, you can use it to find blocking system calls on the main thread. So I'm just going to click record.

And then I'm going to click Load URL. And it says it took 1.5 seconds to load that URL. And as I said, if you looked at this in Time Profiler, you would see almost no work on the main thread. Now, if you're an advanced user, you'll love System Trace because it logs all sorts of system operations, such as scheduling events, VM events, and system calls. In this case, we are just going to look at system calls for this demonstration.

And my application was called Synchronous URL Tester, and I'm going to focus on that application, and I'm going to click on the arrow here to focus on all the system calls on its main thread. And as I said, the basic idea is to look at the wait time of system calls on the main thread.

So I'm going to sort by wait time descending, and you can actually click on each of these system calls and get a back trace and see how long it took that system call to complete. So some of these are actually pretty innocent. So for instance, here's a 73 millisecond wait time, and it's in a call stack that's actually pretty bare. It's just your event loop waiting for events. So that's good. That means your event loop is waiting for events, and it's actually responsive. Now, the problem is, in this case, there's a 1.5 second block.

And you have a call stack that, in this case, is the view controller's did-press-button method calling into sending a synchronous URL request. So that's what you're looking for in System Trace. You'll focus on the main thread, sort by wait time descending, and see if there are any back traces in event processing. And so that's sort of a quick way to find blocking system calls on your main thread.

Okay. So just to wrap up, if you take notes on this, if you take nothing else away from this talk, remember to profile your application. That's the key to improving your application's launch time. Understand app launch to figure out what's going on in those time profiles. And avoid blocking the main thread using dispatch and other techniques. For more information, contact our developer evangelist, Michael Jurewitz. The iOS App Programming Guide has a nice flow chart of the different phases of application launch. And you can ask us questions on the dev forums. That's all.