Development Tools • 28:03
Xcode delivers a number of features designed to help you build complex projects quickly. Distributed builds are essential for accelerating your project compile times, and this session will help you gain an understanding of the latest techniques to configure and optimize your own distributed build systems. Topics will also include strategies for maximizing the value of configuration files, integrating Xcode projects into automated build and testing systems, and using Subversion to automatically trigger building and testing of a complex project.
Speaker: Bill Bumgarner
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
Good morning. So, as Heather said, I'm the middleware manager of Xcode. I manage things like the project model, the build system, the distributed builds environment, what we like to think of as the Chewy Center of Xcode. And the agenda today. You've seen a lot of demos this week. You've seen a lot of shiny stuff. This isn't about shiny stuff. This is about working with large projects, working with complex projects, managing them, dealing with them, and making them, well, making it so you can ship your products.
I'd like to get through the content relatively quickly because I can imagine there will be some questions that will fall out from this. So what really is a complex project? It's any project where you've got multiple targets or you may have multiple projects with dependencies. You may have automated test suites. Maybe you've got some external stuff that's part of your build that Xcode isn't designed to manage. You might have packaging and distribution infrastructure and you're probably going to be using revision control and it's probably just going to be managing more than the project itself.
And, you know, if your project has a framework target or it's used by an app target or an app target and some automator actions or spotlight actions or things like that, you really have a complex project. Even if you build them separately, it's still complex. So let's have a look at a typical complex project. I grabbed the Adobe open-source libraries, which was very kind of them to make available to us.
And this is a project that has-- well, it's got a bunch of project files, as you can see here. It's got a lot of configuration files. If we look at the project itself, you'll see that it's actually quite the deep hierarchy of projects in here. If I dive into here, you'll see this opens up. And I see more projects in here.
I dive into there, and it opens up some more projects. I mean, you can see there's a lot of stuff here. And it's all using dependencies and everything else to build this one from above. You'll see that there's a lot of configuration files in here at different levels. And some of these configuration files are actually referring to a common directory of configurations. So there's a lot of stuff going on. And this is not that atypical as a product evolves. So if I can have the slides, please.
So the first step, let's talk about configuring and controlling the project. Glad I only have one demo because that's a lot of exercise. So configuring the project-- you can use the configuration files. They're just key value pairs of build settings. And it's really a great place to gather the policy and configuration all in one place. And you can use it to, say, set organizational policies. So if your organization wants warnings at a certain level or wants a certain optimization style, stick it in the config file.
They can be applied to multiple projects and targets. You can use what we call up and over references. All that means is that a reference to a file outside of your project directory. So you can share these across multiple projects. They can include each other. So you can have a baseline configuration and then, say, have a debug and a release and a profiling and whatever other configurations you want to lay on top.
Some people are using configurations where they have a demo and a production configuration where the demo version just if-defs out a whole bunch of code. And some examples. This is actually from Adobe's open source libraries. This is their debug configuration. So you can see it's setting the symbol levels, optimization levels, setting the target variant, et cetera.
So, one thing we added in Xcode 3.0 is architecture-specific build settings. And what these allow is for you to have a build setting that's specific to a particularly targeted architecture, be it, say, PPC64, I386. And you use these, well, first, you have to make sure your project is set up for 3.0 compatibility.
There's now a compatibility pane in the project settings window, and in there, you can select what compatibility level you want Xcode to run at. If you select 2.4, then Xcode 3.0 will make sure your project's compatible with 2.4. If you turn it on to 3.0, then you can use this feature.
So what you do is you use the little action menu down at the bottom there to basically add an architecture-specific setting for your particular setting, in this case the auto vectorization. And you can see once that happens, it actually adds a little sub-item in there which has the architecture-specific setting. Very handy.
So moving on to revision control systems. So use them. Even when you're doing loan development, use them. It's a documented timeline of your project's evolution. They're typically-- you can think of them as a specialized file system. For teams, it's a brilliant way to figure out who to blame when something explodes and to watch the history of your project over time.
And there's some terms that come along with this. There's importing, which is just throwing a bunch of stuff into the revision control system for control by it. There's checkout. There's cutting branches, which we'll get into in just a second. Committing, committing your changes back to the revision control system, setting up that history, making sure the changes are tracked. And updating.
Now, with a revision control system, first, use one. I can't reiterate this enough. Even on your machine for your own throwaway projects as you're experimenting with Leopard, use revision control. It's incredibly easy to-- you can set up a subversion repository on your own machine. It's just SVN admin create some path, and you're done. These are critical to effective team interaction and to the management of very large projects. Can't say that enough.
And it is, again-- can't say it enough-- use it for solo development. And it really simplifies the maintenance versus the release versus the pie-in-the-sky future development stuff. Because you can use these branches. So you can have your maintenance release ongoing while your production release is ongoing. And then you can use the features of the revision control system to merge between them.
I'm not going to go into a huge amount of depth on how to do that, because all of the revision control systems out there have been documented to death on the internet. And about 40 seconds in Google will reveal everything you need to know. For Subversion and CVS, there are excellent open source public domain books available-- not public domain, but open source-- that are evolving constantly.
So the other fun part of Revision Control is that it can protect you against, shall we say, creative coding catastrophes, things that might happen at 4:00 in the morning. So you can really, you can like cut a branch and just go wild on that. Do a lot of experimentation. If it doesn't work out, throw it away. No harm done.
And it's important to note that revision control is a catalyst for process, but they don't dictate process. So even though we call it an SCM system in Xcode, which implies process, it's not. It's revision control. It's just a history. So they work well with whatever process you might want to use, except for, of course, the chaotic development process, which I wouldn't recommend.
So, out of the box, Xcode supports three revision control systems. We have CVS, which is really just broken, old, antiquated, and don't use it unless you have to. It's the COBOL of RCS. It needs to go away. There's Perforce, and, well, touching on that, the reason why CVS needs to go away is because it doesn't do atomic transactions.
And what does that mean? That means when you're sitting on the beach somewhere, and you've done your emergency maintenance, and you've just patched five files, and you're doing the commit across a, I don't know, a 14.4 modem, effectively, three of the files go into the repository, and you get hit by a wave, and your laptop's now dead. You've just committed three of five changes with, well, yeah, you've only committed part of the changes. And CVS lets you get away with that.
No real revision control system does. So, yeah, don't use it. Perforce, commercial product, excellent support. There tends to be more of a kind of a dictating of process that comes out of the Perforce community than I've seen with CVS and Subversion, but I know a lot of people that swear by it and use it a lot.
So, you know, great alternative. And finally, there's Subversion, which was designed from the ground up to replace CVS. It was, as they put it, quite succinctly, it's like CVS without the suck. And it defaults to not corrupting your files, unlike CVS. And it's quite well designed, very popular, and has excellent opportunities for extension. And it's Apple's RCS system of choice these days.
So, well, eat your own dog food tends to be the best tasting dog food as opposed to stuff made for other people. So, as you can imagine, we're doing a lot of work with Subversion, and you will see it's supported very well. So why Subversion? It's the atomic transactions. That's really a big part of it.
I can't, you know, again, this is another, I can't reiterate enough. When you go to commit a change across five files, that should be an atomic one single change in terms of how you conceptualize it. And Subversion does do that. It also has zero cost branching, which is just brilliant. What that means is that when you cut a branch in the server, it comes back almost immediately. It doesn't take any additional disk space to speak of. It's only when you start making changes on that branch that it actually starts making copies of the files.
And this allows for all kinds of development processes that are critical on large projects, like, for example, the ability to do what are called task branches, which is when you as an engineer cut a branch specifically to perform a task, get to the end of it, checking in, checking out, doing whatever you need to do along the way, and then you reintegrate that change back into the mainline branch. Vastly reduces risk overall, because you don't have a lot of your complex changes being interleaved with everyone else's. And actually, it makes the history of the mainline trunk easier to read.
And again, excellent automation opportunities, which we'll get to. And, you know, it doesn't corrupt things. It's cool. So in Xcode 3.0, we are improving the SCM, or we call it SCM. It's really a revision control experience. In particular, we've added a manager for different repository connections. As you can see by the little green dot, it actually validates the connection, and you can go to it and see if you can talk to that repository from whatever network connection you happen to be on. Once you have connections configured in here, you can actually then go to the repository browser. Oh, and we also integrate SSH agents, so you don't have to deal with that anymore. Yeah. Thank you.
Um, the repository browser lets you actually browse the live repository on the server. Amazingly enough, Tom, who wrote this, made it work with CVS as well. So you can actually browse the repository, find exactly what you want to check out and then check it out to your local area. At that point in time, it will also open or offer to open any Xcode projects that were checked out.
So it also sets what we call the project source root path. Now what is that? This is a concept that we're adding in 3.0 that basically sets a directory as the root of your project. So everything in that directory and below is considered to be a part of the project regardless of whether or not it's referenced by any of the Xcode projects inside.
And we're moving to basing the revision control operations off that root path so you don't have to worry about anymore making sure every single file is referenced in Xcode even if it's something that should never be built as a part of your product. . And we will be continuing to refine this from now until GM.
So, one of the great features of revision control systems is that you can take action as changes happen. And really, what this allows is it allows the revision control system to be the foundation for software configuration management or source configuration management, which means you can have it do things and take action and validate your changes as they're happening. The hooks for this kind of stuff is configured on the server side.
So this is not something that Xcode-- you do anything about in Xcode. You configure the hooks on the server side in the repository configuration. And you can have hooks that will take action before or after a change is committed. In Subversion, there's also hooks for all kinds of things, like changing properties. Because in Subversion, you can have all kinds of properties associated with files.
Or locking and unlocking, things like that. But what happens is that when, say, in this case, I had a hook configured to detect any line of code that was an NS log and disallow the commit, because I don't want NS logs in my production code, Xcode will actually show the error panel and give you information as to why that hook was fired and what happened. There's also post-commit hooks, which can do things like sending an email after every action or every commit.
This is an example of making an RSS feed of every commit to your Subversion repository. The actual files for this, the SVN to RSS and PI RSS, the URLs are at the end of the presentation, and these are all publicly available. And it's just a matter of modifying the post-commit script that's included with Subversion to add that block of code at the bottom to invoke the SVN to RSS module. This is pretty cool, because this gives you an actual subscribable in mail or Safari feed of all the changes in your project.
So the other thing you can do, though, is you can have the commit hooks trigger build bots. Now, build bots are really cool, because what they do is they'll go out and they'll check out all your source code. And then they'll go and they'll build all the project or projects, and then you can configure them to run all your unit tests, run performance tests, things like that.
And there's a couple of build bots out there. There's build bot itself, buildbot.sourceforge.net, I think. No. I don't remember. Anyway, search for build bot and go get it. There's also cruise control. There's the URL at the end of the presentation as well. There's cruise control, which is another one for continuous integration, which tends to be used by Java developers, but it's not really specific to it.
Continuous integration is the notion that after every commit, you build and test. And what this allows you to do, though, is you can automate the testing across multiple systems and architectures. And as you can imagine, once you have a complex project like this, like the Adobe Open Source Libraries or whatever, you can do a lot of things.
And you can do a lot of things with your applications. Being able to automatically trigger testing across a panther, leopard, and tiger on potentially multiple architectures is incredibly valuable. It's a great time saver. In particular, it means that you can very quickly pinpoint the exact change or window of changes that happened in the repository that broke something or caused a performance regression.
And they're often triggered after every commit, though, of course, if you're in active development with a large project that takes a long time to build, you'll probably want to put a mechanism in so you can coalesce the changes and only have it fire once every, say, three hours or something like that.
So this is an example of a build bot. This is the WebKit's build bot, and this was not a very good day to take a screenshot, unfortunately. So anyway, it gives a one-stop console for multiple build trains. Normally the changes column would have people's names in them. It would be the last commit made. It doesn't here because this was taken during WWDC and they're all here.
So you can see things like post-commit PowerPC Mac OS X, post-commit leaks PowerPC Mac OS X. So they have multiple different kinds of tests they run after every commit. You can see how long it takes to build any of these things, so you can actually see, "Oh, wow, I made a change to the project and now my build time doubled." You can quickly click through to see the actual failures and click through to find the actual build numbers, things like that.
As you can imagine, this saves them a lot of time. They're so oriented to performance. It makes it very easy for them to quickly pinpoint what changes caused a particular performance regression. Now, moving on. Like I said, I'm going to get through this quick so we have lots of Q&A.
So you got your build bots, you got your revision control systems, you got your other things. What about making the builds themselves go faster? Xcode includes distributed builds. It includes two kinds of distributed builds. There's disk CC for casual ad hoc, no administrative overhead builds, but very limited scalability.
As you can see, it flattens out. It's the blue line. Then there's DNB, or dedicated network builds. Requires some administration up front, but with extreme scalability. We've scaled it through 30 machines, and we will continue throwing more machines at it until it levels off, figure out why it's leveling off, and then throw more machines at it once we get it to scale some more.
So this graph here sees that at two volunteers, meaning two machines contributing to the build, they're about the same. Disk CC scales a little better in the beginning, up to about eight or 10 machines. These are Intel machines. On PowerPC, it's about six. And then after that, DNB keeps scaling. Disk CC kind of flattens out. It even starts slowing down eventually.
So, some details on this, because choosing between the two is important, and you probably want to make the choice once and stick with it. DiskCC, really easy to use, but has some limitations. In particular, this CC works by pre-compiling all the source files on your local machine. And then sending that gigantic resulting source file over the network to some random remote volunteer, which will then do the compilation.
That means it's very sensitive to network speed, and as you can imagine, if you're trying to precompile, say, 12 files because you've got six builders that each want to build two things at once, your local machine very quickly becomes a bottleneck. And your local machine needs to be effectively the most powerful computer in the network.
But it's very easy to set up and use, which may be, you know, tempting. And there are some extreme compatibility requirements with it. Each machine must be running the same version of the OS, not just like 10.4.7, but the little build number that you'll see in the panel.
They need to be running the exact same compiler, same build number of the compiler, and that's because of the precompiled header compatibility issues. And all the machines have to be the same architecture, again, because of precompiled headers. Now, dedicated network builds, on the other hand, harder to set up, but great scalability.
Each builder needs to have a static IP address or a resolvable name. We don't support Bonjour with it, and that's because of the way the cache works. The builders need to be stable, meaning you don't want to have a pool of builders where some of them come and go a lot, because it'll cause a lot of cache flashing if you do. However, It's designed to minimize the network traffic required between your machine, the one doing the building, and the building volunteers.
And it's really designed so that you can be sitting on that beach, and now with iChat's little background thing, still look like you're in the office, and be sitting on something like an ADSL or an airport connection and do those builds effectively on dedicated network builds. However, it uses more disk space on the builders, so that's something to keep in mind.
And the local machine really is nothing more than a glorified file server at that point in time, so you can actually effectively use the machine for other things. We get a lot of reports with disk CC that when they're doing builds with disk CC, the local machine is really kind of doggy and chugging, and that just doesn't happen with D&B.
Now, we'll get into the cache in a minute. The compatibility requirements are that all machines must be the same architecture, same CPU architecture, and all machines need to have a similar version of OS X, meaning that it'll generally work with 10.4.6 and 10.4.7, but not 10.4.7 and 10.5, Or with 10.5 and say 10.5.1, whenever that would happen.
So, we should talk about the caching in DNB for a minute. What it's actually doing is when it runs the compiler on the remote machine, it does it in a true-rooted file system that has a special file system underneath that mirrors all the files from your local machine out to that machine. So it's exactly like you were running the compiler on your local machine, it just happens to be somewhere else.
But it goes beyond that. The way the caching actually works is that every machine in the build farm is sort of responsible for a subset of the overall files. And they'll talk with each other to resolve the cache hits or misses amongst each other. So what this means is that actually every file in the build only has to be copied from your local machine once, if at all. And the "if at all" part is that, say, you have the same version of lib system on your local machine as the remote machine. The remote machine is smart enough to get it from the local machine. So it's a simple file system then.
And the builders, like I said, they mirror the files between each other. And the cache is organized by file content, not path. So if I'm building something by a DNB and my path is, you know, volumes, data, sources, blah, blah, blah, blah, blah, and Andy's building something and it's, you know, users, Andy, something else, the cache is still going to be warmed up by both of us and we're still going to take advantage of each other warming that cache. So, that's the way it works. So, that's the way it works. Cache warm up can take a little bit of time.
We see continued speed ups after 10 builds. And we're reengineering this for Leopard. We did this in 2.3, but we were limited by some of the limitations in the file systems on Tiger. And they've gotten a lot better in Leopard. And DNB is improving accordingly. It also uses the FSEvents stuff to monitor changes in the local file system. So if you added a source file and hit save, that will automatically invalidate the cache as needed.
So it's pretty neat stuff. And this is really to illustrate, DistCC, you need a supercomputer on your desktop, and the builders, it's less critical what they are. It's great for an ad hoc network or like office network where, you know, you could have finances computers be contributing to compilations, things like that. Whereas D&B, you want that rack of XSERVs, or we're using racks of Mac Minis, actually, the Core Duo Mac Minis. You want that, you want it with a fast network between those machines, but you don't care so much about the connection back to your machine.
So, with that said, how do you architect or construct or organize a project to take maximum advantage of distributed builds? And one of the issues with the build system, and like I said on the slide, we don't really like it, but it's the way it is today and we're looking to fix it sometime in the future, is that the only thing that gets parallelized in a distributed build is the compilation of source files, not targets and not projects at this time. So the more source files you have per target, the more parallelization you will see.
And DistCC, it's going to top out at about 10 simultaneous compile jobs. So if you have 20 or 30 source files per target, DistCC will do pretty good with that. D&B continues scaling at least 30, so it's going to take more advantage of that and get more source files per target the better off you are.
So, bringing it all together. Like I said, I was going to whip through this. So, complex projects. They're not bad, they're just inevitable. You're going to be dealing with them. Use those configurations to optimize the type of the product build. Per architecture build settings, they're great when you need them. In general, I would recommend just avoiding them just because it makes testing that much more interesting.
You use the revision control systems and use them to track your changes over time. Use some of these additional automated tools to help you to validate and guarantee project quality over time. Distributed builds: You can use the distributed builds to make things go faster. Every architecture you build for adds one compile process or one compiler invocation per file.
If you have 20 source files per target and you turn on Build 4 Away Universal, that's going to be 80. App data, though, don't worry about the size of the binaries because the app data is still going to outweigh them. It's just not going to be that big of a deal. And with that, here's a bunch more information. You can always contact, well, I don't think that email address is going to work, Matt.
Yeah, so Matt will come up and say his email address or something. The Subversion site and repositories, they're excellent. There's lots of scripts in the Hook scripts directories in the Subversion repository. All kinds of different things you can use to automate your stuff with. Xcode itself has the Xcode build command line tool for building products and projects. It can set the target and the target configuration, things like that. So it's very easy to plug that into stuff.
There's the Build Bot. The other Build Bot that I mentioned was Cruise Control. So look up Cruise Control. Actually, better yet, look up Cruise Control and Martin Fowler because he did a wonderful write-up on continuous integration and using Build Bots to sort of keep the complexity of your project under control. And documentation, sample code, other resources that the developer about Apple.com has. So there's a lot of stuff that you can do with it. I think it's a great way to get your