WWDC06 • Session 126

Optimize Performance through Testing and Tracking

Application Technologies • 39:39

Performance monitoring and analysis for your application should be an important part of every stage of development, not only when it's time to ship. Learn the principles and practices of test design, automation, data tracking, and regression handling that will enable your application to perform exceptionally on Mac OS X. Hear case studies and real-world results from the engineers who've created some of the fastest Mac OS X applications.

Speakers: Vicki Murley, Ken Kocienda

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.

Hello there. This is session 126, Optimizing Performance Through Testing and Tracking. My name is Vicki Murley, and I work in the platform experience group at Apple on performance. So let's get right into it here. When most people talk about optimizing performance, what they're really kind of talking about is the very conventional, very sort of typical, retroactive approach.

And what I mean when I say that is, you know, we sort of all know how it works. We're all developers here. The software is almost finished, but it takes 10 seconds to launch. Or maybe it, you know, consumes lots of memory. Or I'm using it, something just doesn't quite feel right. So what happens is, you notice these really terrible problems, and you turn to the wonderful tools that we ship in the developer folder.

Such as, you know, Shark, Sampler, MallocDebug, etc., etc. And you try to fix, you know, the very, very worst problems that you may have noticed. So this is kind of how it typically goes, at least in my experience. This isn't really the best idea, for one reason: you know, a lot of the time, the time that you need to fix these pretty bad performance problems isn't necessarily built into the schedule. As I mentioned before, we all know how it goes. You're working on a small team. You're trying to get all your features in.

You're trying to get all your bugs fixed. Even if you have built some time in at the end of the schedule to specifically think about performance, often that time can be sort of overridden by your bug fixing time or that last feature that you're trying to get in. So what ends up happening is, you fix only the worst of the worst problems, if even that. You may end up fixing nothing at all.

So instead of this sort of old conventional, retroactive approach that lots of people are used to, there is actually a different kind of approach, and that is more of a sort of proactive approach. With this approach, you get a basic version of your software working very, very early in the development process. And at that point, you need to decide what to measure, find a super easy way to measure it, test, and monitor the results.

So the tools that you use here are a lot different from the set of profiling tools that we mentioned earlier. In most cases, you're going to be deciding what to test, so the tools are going to be defined by you. That's not to say that you can't use any of the system tools that we ship on Mac OS X, but, you know, for the most part, you're thinking about things before the fact as opposed to afterwards, so tools are defined by you.

[Transcript missing]

Why should you even care about performance? You know, you're developing software, working hard. Well, first of all, one reason you should care is because your competition probably cares. If you're developing a piece of software or an application that is pretty much on par in terms of features and functionality with your competitor's, performance can give you that leg up to make your app sort of stand out and be better than the rest. Performance also helps you provide a great user experience on Mac OS X, and customer satisfaction: you know, people use your product, find it enjoyable, have confidence in your brand, and want to buy more stuff from you down the road.

[Transcript missing]

All right, getting started. You're starting out. You decide, OK, I'm going to try this out. Where do I even begin? First thing that you want to do is sort of decide what to test, find an easy way to measure, and set some goals. So let's look at deciding what to test in a little more depth. First of all, this may seem obvious, but I just wanted to point it out anyway. You should always, always, always consider launch time.

Clicking that icon in the Dock or in the Finder is really your user's first interaction with your application. If you're taking more than three bounces to launch in the Dock, that is not going to be a good scenario. I know this happens all over the place. Apple applications aren't excluded. But when an application like this laggard here takes about 10 seconds to launch, what's going on there? It's almost personally offensive to me to see that. One reason it's so offensive is because launch time is probably one of the easiest things you can measure. You can just mock up a little script, spit out a log, and you're good to go. You're measuring launch time. Boom. Doesn't take long at all.
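
As a rough illustration of how little such a script needs, here is a minimal sketch in Python; it assumes, hypothetically, that your app prints a LAUNCH_DONE marker to stdout once its UI is up, and the app path is just a placeholder.

```python
#!/usr/bin/env python3
# Minimal launch-time harness (a sketch, not Apple tooling). Assumes your app
# prints a hypothetical "LAUNCH_DONE" marker to stdout once its UI is ready.
import subprocess
import time

APP = "./MyApp.app/Contents/MacOS/MyApp"  # placeholder path to your binary
RUNS = 5

def time_one_launch() -> float:
    start = time.perf_counter()
    proc = subprocess.Popen([APP], stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:          # block until the marker appears
            if "LAUNCH_DONE" in line:
                return time.perf_counter() - start
        raise RuntimeError("app exited before signalling launch completion")
    finally:
        proc.terminate()

if __name__ == "__main__":
    times = sorted(time_one_launch() for _ in range(RUNS))
    print(f"launch time: min {times[0]:.3f}s, median {times[RUNS // 2]:.3f}s")
```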

Secondly, you're going to be looking around, deciding what to test, and you're going to see that, hey, there are some benchmarks out there, some benchmarks available. Be careful. Improving a floating-point score from SPEC is not necessarily going to make Safari load web pages any faster. It kind of will make you seem like you're making progress, but really, when you're looking at these sort of existing benchmarks out there, you have to evaluate their relevance and investigate them thoroughly to see if they apply to what you're actually trying to do.

So instead of sort of using these existing benchmarks, one really great idea is to sort of focus on the most frequent and important user actions for your application. You might even put this in terms of sort of your flagship feature. For Safari, we're loading web pages. Not that complicated. If you're an RSS reader, you want to, you know, read and render RSS feeds. If you're a photo browser, you want to load and display photos. What is the purpose of your application? That is the thing that you want to focus on here.

Before I get into talking about how to create your test and how to define your test, I wanted to point out that you're going to be testing often, very often, probably more than you're used to. And for this reason, the test that you create to test performance should be incredibly, incredibly, incredibly easy to use. First of all, it should be accessible to every engineer who is working on the project.

Secondly, you do not want to have a long, complicated setup process. It should be one time. You never have to do it again. It shouldn't take that long. Easy is the key here. To execute things, it should just be one click or one command. And lastly, the test should not take that long to run.

If I am trying to make a code change and test performance in the process of making that code change, I don't want to say, all right, I've got my change ready. Now I'm going to test performance. Three hours later, I'll have the result. It's absolutely unacceptable. No one's going to run a tool like that. No one changing the code wants to do that. So make sure the total time that it takes to run the test is low.

So time is a really great thing for us to focus on because we found that by focusing on time, as opposed to focusing on time and memory or time and whatever, if time got worse, then that could indicate a larger problem. Maybe we slowed down today and that's because we have a horrible memory leak or something like that. So if you're on a small team looking to get up and running quickly, time is a great thing to focus on first.

Hi, thanks. The Safari page load test is really a great example of the things Vicki has been talking about. It's simple, easy to run, built right into the application. It gives you test results back very, very quickly. So I'd like to give you a look.

Now, when I say that the page load test is built into every version of Safari, I really do mean every version of Safari, including the one on your laptop if you've got one out there. You've got the page load test. It actually ships out on the DVD. Now, the one thing you need to do to enable it is turn on the Safari debug menu. Now, if you're not familiar with this, you can just turn it on with a simple defaults command right there. If you can't read that, you can just search Google for "Safari debug menu" and you'll find out how to do this.
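
For reference, the defaults key that was commonly documented for this at the time was IncludeDebugMenu; a one-line sketch (verify the key against your Safari version before relying on it):

```python
# Sketch: enable the Safari debug menu from a script. IncludeDebugMenu is the
# defaults key that was commonly documented in that era; verify it for your
# Safari version before relying on it.
import subprocess

subprocess.run(["defaults", "write", "com.apple.Safari",
                "IncludeDebugMenu", "1"], check=True)
```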

And so now that I've turned this on, I launch Safari, and you'll notice now at the end of the menu bar, I've got this new menu, which has a number of interesting features in it that we used over time to develop Safari, and some interesting things that let you look at DOM trees, render trees, and things like that. Well, one of the things over here is the Safari page load test window. So I bring this up, and it brings up this very, very simple window.

It has a list of URLs in it, and all I need to do to get it going is click Start, and it'll go through and load all of the web pages in the list. Now, you know, one of the things about the Safari page load test is, you know, when I say it's simple to use, well, it really must be simple, because here I am trying to think of what to say. I don't want to mess up or whatever. I can still run the page load test. So it really is just that simple to get going and use.

Now that it's actually finished running, I want to point out that just down here at the bottom, I know it's a little bit difficult to see, but there's just one number here, just one line of information down here in the lower right. And this is the one number that we really use, you know, as our takeaway from running the page load test.

It's one number, just a little statistical work, that says how long it took to load each one of those web pages. So now if I go again, it's actually... 317 milliseconds. If we kind of go and take a look, we can see, yeah, it's, you know, sort of about that fast.
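
The same shape of test is easy to sketch for your own app. This toy version just fetches each URL and averages wall-clock time; note that fetching is not rendering (the real test drove the engine itself), and the URL list is a hypothetical local mirror so the network doesn't add variance.

```python
#!/usr/bin/env python3
# Toy page-load test in the same spirit: load a fixed URL list, emit one
# summary number. Fetching is not rendering, so treat this purely as a sketch.
import time
import urllib.request

URLS = [  # hypothetical local mirror of your test pages
    "http://localhost:8000/page1.html",
    "http://localhost:8000/page2.html",
    "http://localhost:8000/page3.html",
]

def load_ms(url: str) -> float:
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()                        # pull the whole body
    return (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    times = [load_ms(u) for u in URLS]
    print(f"mean page load: {sum(times) / len(times):.0f} ms")
```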

So now, you know, a bit about the page load test and, you know, how it becomes useful to us. Well, one of the big characteristics about it is that it's done. It exists. When I, you know, set about writing this tool, I think the kind of engineer tendency might be, well, I'm going to go and make this whole engine. And I'll have this infrastructure, and I'll have these pieces. I'll have this database and this test suite and everything like that.

Well, you know, something like that might never get done because of the press of the other work that you need to do. And so I just coded this up. It took, you know, a day or two to get this going. The fact that it's built in, quite honestly, we've discovered over time to be a real virtue. But honestly, I've got to tell you, I was just too lazy to kind of make a framework and expose APIs and make a separate test program and things like that.

So I just built it into the Safari application, because it was the easiest way to get at the guts of Safari and get it to go to another URL once it was done loading one. But it actually turns out to be this great, great benefit, because now if you've got the source code, you've got the test. You don't have to worry about gathering up a different piece.

And as you can see, as I've already pointed out, it's easy to run, and it produces this one key result when you're done. That allows you, very, very quickly, to check a code change; this is really one of the ways we use it. As we go and make a code change and we're ready to check it in, we run the page load test. You saw it's simple, quick, and easy to do. So now that I've made that code change, I can confirm that it doesn't hurt performance.

Or maybe it did hurt performance, and boy, I'd better not check that in, or else... you know, Vicki will tell you a little bit about how we handle regressions. But, you know, it just saves you a lot of trouble over going back later and, you know, trying to figure out where the regression may have crept into the tree if you didn't run the test.

And so one of the other things that it does is it lets you experiment. If you've got a code change, and maybe you're focusing in on performance, you say, well, I'm going to try to speed up page rendering, I'm going to try to speed up HTML performance. And so you can go and make that change and then very, very quickly turn it around and see whether or not that code change actually did what you intended it to do.

And so just kind of in closing up, I mean, the page load test has been, you know, really, really the essential tool that we've used over the years to improve page load performance. Just simple, easy to run, gives us that one result that we can compare to results that have come before, and really lets us track and keep our eyes on making the browser faster. So that's it. Thank you.

Okay, thanks, Ken. The two things that I want to stress about the page load test: A, it took a day and a half to code up and we're using it five years later, and B, it's extremely easy to use. If it hadn't been easy to use, no one would be using it today. You know, it was made back in 2001, and we're still using it today. Super useful. Okay.

So once you've got your tool together, you can think about setting some goals. And one way to think about this is to compare directly with your competitors. For us, this meant comparing against Mozilla-based browsers, like at the time it was Camino, now it's Firefox, and Mac IE, which had the largest browser share on the Mac at the time. Hard to believe, but true.

We were lucky in that we found a sort of networked version of the page load test where we could compare directly in other browsers with the same test, but a lot of the time, that's not going to be possible. So hand timings are sometimes necessary. It sounds like a pain, but really, it's something that you do once, you have the baseline, you're not trying to get a super exact number, you're just trying to see if you're in the ballpark.

Also, if you have a tool that's built directly into the software, you can't build that tool right into your competitor's software. So, as I said, hand timings are sometimes necessary, but luckily, you only have to do them once. So, yeah.

[Transcript missing]

So, just to go over what we just talked about, you want to decide what to test, find an easy way to measure, and set some goals. That's kind of how you're going to get started with this whole thing.

Once you have all those things done, it's time to test. As I mentioned before, you're going to be testing pretty often, but if you've kind of followed the principles and practices that we've set out so far, you're going to have a tool that's really easy to use. So it's not going to be as bad as you think. In this section, we're going to talk about when to test, checkpoint testing, and the test environment that you may want to set up. So first of all, when should I test? I have this tool.

Well, you should really test before committing any code changes to the repository. Like I said, this may sound like a pain, but if you've got a tool that's super easy to run, there's really no reason not to. Not just code changes that may affect performance either. Any code change, test them all.

Of course, this includes during new feature development. Feature development is the worst time for bad performance regressions sneaking into your code, so you definitely want to test then. And lastly, for those rare occasions when each individual engineer may forget to test before checking in their code change, it's usually a good idea to set up a dedicated test machine where you can build the source on a daily basis, test things out, and see what the results are.

So let's talk about that daily testing for a second. Everyone is going to be running the performance test before they're checking in code, but sometimes it's useful to have this sort of safety net as well. So you have to think about what kind of environment do I want to use? For us, it was really useful to choose a slow machine.

At the time when we started this, the 800 megahertz iMac was sort of the standard consumer-level machine. That's a G4 iMac, 800 megahertz, very slow. And what we learned was, as time went on, different engineers on the team would get new hardware, and Dave would say, "Hey, I totally tested performance before I checked in this code change. I have a dual 2 gigahertz G5," or whatever.

And I'd be able to say, "Well, you know what? I do see a performance regression on the old slow iMac." And then you have to think about what you're going to do with that. That was very helpful for us. It helped us identify those super small regressions that we didn't want to let slip through the cracks.
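
A daily safety-net run doesn't need much machinery either. Something like this sketch, kicked off by cron on the dedicated machine, would do; the source control, build, and test commands are placeholders for whatever your project uses.

```python
#!/usr/bin/env python3
# Sketch of the daily safety-net run on the dedicated test machine: sync,
# build, test, append one line per day to a log. Command names are
# placeholders; substitute your own source control, build, and test steps.
import datetime
import subprocess

def daily_run(logfile: str = "perf.log") -> None:
    subprocess.run(["svn", "update"], check=True)                # sync the tree
    subprocess.run(["xcodebuild", "-configuration", "Release"], check=True)
    result = subprocess.run(["./run_perf_test"], check=True,     # placeholder
                            capture_output=True, text=True)
    with open(logfile, "a") as log:
        log.write(f"{datetime.date.today()} {result.stdout.strip()}\n")

if __name__ == "__main__":
    daily_run()
```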

In addition, you also want to make sure that the environment you set up is really a controlled environment. By that, I mean that you want to eliminate any variance that may creep in during testing, in order to understand the results that you're seeing. What I mean by that is, I test on Monday. I test on Tuesday.

Tuesday looks 5% slower; the number is 5% higher. Am I seeing a genuine performance regression, or am I seeing, oh, Spotlight was running, or something like that? Really, to make this work, you're going to want to aim for a variance of about 1%. If you don't achieve 1%, you're going to miss those small 2% and 3% gains that you might get, which could really add up over time to a big performance improvement.

So in terms of setting up this controlled environment, there are a few big ones and a few small ones. First of all, Spotlight, it's a huge one. If your app doesn't depend on it, turn it off. Daemons on the system, again, if your software doesn't depend on them, turn those off as well.

Network activity, if your app doesn't depend on it, it can go. This is an interesting one because it also sort of illustrates the point that you can also set up a closed network. It's not to say that you have to eliminate all network activity entirely, but you do need to create a sort of consistent, controlled environment.

We also found that sometimes inconsistent user settings could cause variance. Bookmark files of different sizes, history files of different sizes could have varying impacts on performance. And we wanted to know exactly what we were looking at every single day. Like I said, is this a genuine performance regression, or did the history file get bigger? So wipe those out every time.
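
Here is a rough sketch of that kind of pre-test reset: quiet Spotlight and wipe the state files that grow over time. The paths are illustrative placeholders, and mdutil needs to run as root.

```python
#!/usr/bin/env python3
# Sketch of a pre-test environment reset. Run before each daily test so you
# know exactly what you're measuring. Paths are illustrative placeholders.
import shutil
import subprocess
from pathlib import Path

def reset_environment() -> None:
    # Quiet Spotlight so indexing can't steal cycles mid-test (needs root).
    subprocess.run(["mdutil", "-a", "-i", "off"], check=True)
    # Wipe per-user state that grows over time: history, caches, etc.
    for p in [Path.home() / "Library/Safari/History.plist",
              Path.home() / "Library/Caches/com.example.MyApp"]:
        if p.is_dir():
            shutil.rmtree(p)
        elif p.exists():
            p.unlink()

if __name__ == "__main__":
    reset_environment()
```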

And lastly, you know, beyond the sort of big three and, you know, the user settings, it's really a sort of iterative and investigative process. You're going to be running this test, checking out the results. You may see a little bit of variance here and there, and it'll take a little bit of investigation. Maybe there's some variance in the code that's running in your app, et cetera, et cetera, but you can narrow those things down over time.

So, once you have a controlled environment set up, you can start tracking your results. At this point, you may be thinking, okay, now's the time when we're going to start creating charts or, you know, getting a database together or whatever, but that's not what we're going to talk about. Really, there are only three numbers that you care about: yesterday's number, today's number, and your goal.

I tested yesterday, or I tested before my code change. I tested today, or I tested after my code change. Am I faster, or am I slower? You can make this work on a change-by-change basis. The test is super easy to run. We saw the page load test. And, you know, of course you want to keep your goal in mind because you want to know how close you are.

But, you know, it's not to say that charting your results or, you know, graphs or anything like that are not useful because they most certainly are. Especially to the people who are managing your project. But for you as a developer, when you're thinking about performance, you know, this is a pretty easy way to do things.

When you're looking at yesterday's number versus today's number, or sort of pre-change versus post-change number, it's an interesting thing to actually look at the percentage delta as opposed to the raw number. You may say, OK, well, I only slowed down by 10 milliseconds. Whatever, no problem. But 10 milliseconds, if your test only takes 100 milliseconds to run, that's a 10% regression. So that's huge.
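
In code, the whole tracking "system" really can be this small; a sketch that compares by percentage delta and refuses the checkin past a noise threshold (the 1% figure is the variance target from earlier):

```python
# Sketch: tracking really can be three numbers. Compare by percent delta, not
# raw milliseconds, and treat anything past the noise floor as a regression.
NOISE_FLOOR_PCT = 1.0   # roughly the variance you tuned the test rig down to

def compare(yesterday_ms: float, today_ms: float, goal_ms: float) -> None:
    delta = (today_ms - yesterday_ms) / yesterday_ms * 100.0
    print(f"{yesterday_ms:.0f} ms -> {today_ms:.0f} ms ({delta:+.1f}%), "
          f"goal {goal_ms:.0f} ms")
    if delta > NOISE_FLOOR_PCT:
        raise SystemExit("possible regression: do not check in, find the cause")

compare(yesterday_ms=320.0, today_ms=317.0, goal_ms=300.0)
```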

Lastly, it's often useful to compare against a baseline. Let's say your variance in your test is around like 2 or 3 percent, and so you're testing on a daily basis, and you see maybe a 2 percent change here and there, and you think, oh, it must be just variance, whatever.

If it's not variance, and it turns out to be a genuine regression, these things can stack up over time. So it's a good idea to know your starting point and sort of make sure that you don't go above a certain baseline over time. And of course, that baseline has to be adjusted here and there, but it's a great idea to sort of guard against those, you know, small little one-off slowdowns that may creep in if you're not watching closely. So for testing: test before committing any code change, set up a safety net, and lastly, stay vigilant.

Now, we have our tool together. We've been testing our faces off: all the time, when we check in a code change, and daily. It's not that hard, because it's built right into the software. But what happens when we have a performance regression? Well, in this section, we're going to talk about some tips for handling regressions. And we're also going to talk about the lessons that we learned over time in dealing with these sorts of things. So when it comes to handling performance regressions, there's really only one tip that you need to know.

And that is that you should have zero tolerance for performance regressions. So what do I mean exactly when I say zero tolerance? Well, for us on the Safari team, it meant that if we came in in the morning, tried out our safety net, and saw that we were slower than the day before, the source tree was closed until that problem was resolved.

So then what we would do is start looking at all the code changes that occurred in the last 24 hours. Sometimes you can look through that list and say, oh, I think it might be this one that has to do with text layout, or this one that has to do with image rendering; seems like that could impact performance. But a lot of times, this just turned out to be a plain old binary search. Build the tree at one point in time. Test. Check the result. Build at another point in time. Test again. And that was the way that we were able to find the specific code change that caused the regression we were looking for.
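
That binary search is mechanical enough to script. Here's a sketch, assuming a hypothetical build_and_test(revision) helper that builds the tree at a given revision, runs the performance test, and returns the time:

```python
# Sketch of bisecting a day's checkins for a regression. build_and_test is a
# hypothetical helper: build the tree at that revision, run the perf test,
# and return the time in milliseconds.
def bisect_regression(revisions, build_and_test, threshold_ms):
    """revisions: chronological checkins from the last known-good build
    (index 0) to the first known-bad one (last index). Returns the first
    revision that crosses the threshold."""
    lo, hi = 0, len(revisions) - 1      # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if build_and_test(revisions[mid]) > threshold_ms:
            hi = mid                    # slow here: regression at or before mid
        else:
            lo = mid                    # still fast here: look later
    return revisions[hi]
```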

So what would typically happen is, as I said, you come in in the morning, run the test, see a regression. And then maybe I'd send out a little email that says, hey, everybody, there's been a performance regression. Just wanted to let you know. I'm working to track it down, but the tree's closed for now. Cheers, Vicki.

Nine minutes later, I would get this sort of thing. Hey, is the tree open yet? Nineteen minutes later, hey, when's the tree going to be open? Hey, I have these bug fixes in my tree. I totally would like to check them in sometime soon. What's up? Is the tree open? An hour later, people start to get a little ticked off. I hate you, you know, was pretty typical. I didn't want to put that up there, but, you know, people start to get crazy.

So, you know, for me, sending out this email, this was kind of, you know, hard to deal with, but luckily, I had the backing of the boss plus 15. So you can see how that really helped. What happens is, you know, we have sort of this zero tolerance policy. The source tree is closed. People are upset. Let's just say upset. They want the tree open again. So what ends up happening is a performance regression becomes a sort of all-hands-on-deck scenario.

And it's amazing when you have, you know, three or five or seven really smart people looking at a problem. It can get solved very, very quickly. A matter of, you know, minutes at times. So that worked out really well for us. Secondly, no exceptions. Oh, I moved us over to using this new API. Oh, I added this feature or whatever. No, no, no exceptions.

So in dealing with these performance regressions over years and years and years, there were a lot of things that we learned. First of all, we learned that fixing a performance problem is easier and more effective when you know the exact cause. We have all these code changes. This is the one. But anyway, what I mean is, you saw before that we started doing the binary search to find out which code change caused the problem.

Once that's identified, I know exactly what to do to speed it up. Or maybe I know exactly what I did wrong to make it slow in the first place. So that's one huge benefit. The second great benefit is, if you're not looking at the exact code change that caused the problem, what you end up doing is you sort of say, okay, we're slower, let me run Shark or whatever during the test, and I'll just knock off the top thing. Well, that sort of knocking off the top thing should really be saved.

I mean, not saved, but it's really useful to do that at a time when you're really focusing on performance. If you do that, sort of knock off the top thing instead of knocking off your change that caused the regression, the change that caused the regression is still in the tree, making things slower. And that's bad.

Secondly, we learned that the complexity of the problem increases as time passes. And when I say that, I mean that, you know, I check in code that slows everything down and nobody notices. So then Ken checks in code that relies on my code. And then John checks in code that relies on Ken's code, which relies on my code. And as time wears on, by time I don't mean weeks, I mean a day. You know, it becomes harder and harder to sort of extricate that problem from your source code. Solving those problems quickly was huge for us.

Thirdly, I mentioned before, you are going to spend, if you're lucky, some time actually optimizing performance. And you want those intentional optimizations to actually be meaningful. By that, I mean, okay, let's say I looked at the page load test and I made it almost 30% faster. And then we're watching performance on a daily basis, so it stays low.

If I'm not watching performance, you could end up with something like this. Your 30% optimization, let's say each one of those dots corresponds to a week. The optimization that you may have spent four weeks doing is gone in about a month and a half. And those dots that are going upward there aren't necessarily huge regressions. They all denote a 5% regression. So it can really creep up on you. That's why you want to be watching.
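
The arithmetic behind that slide is worth running once. A 30% win means times drop to 0.7x of what they were; compound one unnoticed 5% regression per week on top of that, and the win is gone in about seven weeks:

```python
# Sketch: how small regressions compound away a big optimization.
t = 0.70                      # relative time right after a 30% optimization
for week in range(1, 8):
    t *= 1.05                 # one unnoticed 5% regression per week
    print(f"week {week}: {t:.2f}x of the pre-optimization time")
# By week 7 you're back at ~0.98x; essentially the whole 30% win is gone.
```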

Next, this one's kind of interesting too. We also learned that by not slowing down, we actually got faster without doing any work at all. You may fix a bug and actually be fixing a performance problem, or you may fix a bug that has nothing at all to do with performance, and your code change just happens to make things 2 or 3% faster.

So, if you're not sort of watching things, this is sort of the data or the graph that you would never see. Like, one day we got a little bit faster, but then the next two days we got a little bit slower, and then we got faster again, and then we got slower. Well, if we're watching every day, and we're guarding against these sort of 2 or 3% regressions that can creep in, our graph is going to look more like this.

This is that same graph with all of those little upticks eliminated. And the result is a 10% performance gain without really doing anything at all. And that was really helpful to us.

Another thing, I mentioned this before: new features do not have to impact performance. It's really easy to say, "Well, I added more code. I added this functionality," etc., etc., but those new features, there's no reason they should impact performance. If you've really, really, really investigated and done everything you can to make sure that your changes are having as small an impact as possible on performance, then one thing that you might want to consider is this: let's say my change impacted performance by 20%. I went and looked at it.

I fixed a bunch of things up. Now it's like 5%. I still have that 5% regression in the tree, so what I might want to do is go and try to speed something else up by 5%. You want to keep that baseline steady, even when you're adding features.

Lastly, this was also very important. No sacred cows. Everyone sort of has that piece of code on their project. Maybe you bought the code from another vendor, or it's that code that that super smart guy wrote like two years ago, and no one has touched it since then, and everyone's kind of, you know, scared of it. You should dive right in there. That part of the code is not immune to being slow, basically. So don't be afraid of it.

So, just to sort of recap what we went over with performance regressions: do your best to adhere to a zero tolerance policy, and remember all the things that we talked about as lessons learned. So, you know, we went through all this stuff. What actually happened? Did it actually work? Well, absolutely it did. You know, if we look at JavaScript performance during Safari 1.0 development, we made it 16 times faster. That's a lot faster.

For HTML load speed, we made it 10 times faster. Incredibly fast. When we went from Safari 1.0 on Jaguar to Safari 1.1 on Panther, we got 30% faster. And that was sort of without even trying. This was one of those scenarios where we were just sort of checking in little 2 and 3% speedups inadvertently. And over time, you know, each release is a year, those really stacked up without us doing a lot of work.

And we were doing almost no work at all. I actually remember, you know, Panther coming out and sort of reading the feedback online, and people said, wow, you know, Safari feels a lot faster. And that was great. So, in closing, I hope I've gotten it across to you today that you really can develop high-performance software by following a few simple principles and practices. You do a little bit of work up front and get some vigilant testing in there. And, you know, all the things that we've talked about today are simple enough that you can sort of leave this talk and actually start today. So, that's it.