LLVM Compilers In Depth - WWDC 2008

Tools • 57:41

Xcode 3.1 introduces the new llvm-gcc 4.2 compiler based on the open source LLVM.org project. Learn how to use llvm-gcc 4.2 within Xcode, make the most of its new features, and discover how your code can benefit from it. Finally, get a sneak peek at important future directions, including the LLVM/Clang project.

Speakers: Chris Lattner, Devang Patel, Steve Naroff, Ted Kremenek

Unlisted on Apple Developer site

Downloads from Apple

SD Video (359.4 MB)

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Hi, everybody. Are we all happy? All right. I'm here to talk about LLVM. I hope you're in the right place. So my name is Chris Lattner. I'm one of the people who are hacking away on LLVM, slaving away to make it a great compiler for you. Today's talk is actually a great talk because we get to have several other people that have been slaving away, hacking away, making it awesome, and they'll be talking more about specific features.

So if you're here, I assume that you probably have some idea what LLVM is. And a lot of you probably went to the compiler state of the union, so I'm going to skip over some of the details of what it is. To put it briefly, it's a compiler.

[Transcript missing]

So, one thing, again, I want to emphasize is that LLVM is open source. And for our integration with Xcode, one nice thing about LLVM is that it uses a BSD license, which means that you can use it in a commercial project, and you can use it in many different ways that you can't with a GPL-like license.

And so, as we push forward, we think this is really important. This lets us do a lot of things with LLVM that we wouldn't be able to do if the licenses were different. This is also great because a lot of other companies have taken LLVM and have done great things with it. And if you go to the LLVM web page, which is LLVM.org, you can read about some of that. Several companies that are probably here are probably using it and they may not know about it.

So, today we will be talking about LLVM-GCC, we'll be talking a lot more about Clang, and we'll be talking about applications of Clang. So, this is old hat. I think it's a little bit of a different thing. So, a compiler is broken into three pieces: the parser, the optimizer, and the back end. When you're using LLVM-GCC, the back end and the optimizer come from LLVM and the front end comes from GCC.

The great feature of this is that GCC does all the parsing, which means if your code parses with GCC 4.2, it will parse with LLVM-GCC 4.2. By changing the optimizer and the backend, we now get better performance, lower compile times when the optimizer is enabled and features like that.

So, later we'll be talking significantly more about Clang, talking about its architecture, the implications for Xcode, giving detailed timing information and things like that. It's a great project. There's a huge amount of work going on both by us and in the community, and I'm really excited that we'll be able to talk about that.

So, the overall roadmap for this talk is we'll talk about LLVM-GCC 4.2. This is on your Xcode 3.1 DVDs and available on Snow Leopard. And so, we'll talk about performance. We'll talk about link time optimization, how to get the most out of link time optimization, what you can do to your code. After that, we'll talk about Clang.

Clang is this new frontend. There's a whole broad range of things we can do. We think this is going to have a huge impact on your build times. It will let us do a lot of exciting things with Xcode, and we'll talk about that. Finally, we'll talk about applications of Clang. And this is great stuff that's not shipping yet, and Clang itself is also not shipping yet. But there's a huge amount of things that you can do when you have a compiler's view of the source code.

And with a compiler's view of the source code, you can do lots of fun things and help find bugs in your code, for example, which is one we'll talk about in depth. So, this is enough of me up here blabbing. I'd like to have Devang Patel come up. Devang is, among other things, the guy who implemented link time optimization in LLVM-GCC. So, Devang, please take it away.

Thanks, Chris. Good afternoon. I'm Devang, a member of the LLVM team at Apple. Today, I'm going to describe how you can use LLVM-GCC to help improve the performance of your application. So, let's start. LLVM-GCC is a drop-in replacement for the GCC 4.2 compiler. Both of the compilers are binary compatible, which means you can mix and match object files generated by these two compilers. LLVM-GCC supports the C, C++, Objective-C, and Objective-C++ languages. It supports x86 32-bit as well as 64-bit. LLVM-GCC also supports 32-bit PowerPC, but it doesn't support 64-bit PowerPC. LLVM-GCC is now available in Xcode 3.1.

LLVM-GCC supports all the standard features supported by the GCC 4.2 front end. In fact, it takes advantage of some of the features more than GCC itself. How? We'll see today. LLVM-GCC also supports link time optimization. We'll discuss this in detail today. This is our very first release. In this release, OpenMP is not supported. Debugging of optimized code is also not supported. Please read the release notes for more information.

So now you know that this new compiler is available. How do you use it? It's very simple. You can select this compiler in Xcode's compiler drop-down selection list in the build panel. On the command line, this compiler is installed inside your Xcode installation folder. If you have installed Xcode in /Developer, then this compiler is available at /Developer/usr/bin/llvm-gcc.

This compiler accepts all GCC standard command line options. So now you know you have a new compiler, you can use it. Why do you want to use it? Well, one of the new features in this new compiler is link time optimization. You may ask, what is link time optimization?

Well, traditionally, when you compile your source file, the compiler will generate optimized object files. The linker will take these object files and link them together to create your final application. Now when the optimizer is optimizing your source file, it cannot see across source files. So the optimizer has to make conservative assumptions. Now what if you can do the optimization at link time, where you can see everything?

This is known as link time optimization. LLVM supports completely transparent link time optimization, which works across programming languages, which means you can take a C function from one source file and you can inline it into a C++ function in a second source file. How does it work? Well, it's very simple. You compile your source file using -O4 with LLVM-GCC. The compiler will generate a .o file, and the linker will take that .o file and link it together with other input files to create your final binary.

However, one thing new in this situation is that the .o file generated by the compiler is not a native Mac OS X object file. In fact, it contains LLVM bitcode. But don't worry. The Xcode 3.1 linker knows how to handle LLVM bitcode files. The linker will use the LLVM optimizer to optimize all the available LLVM bitcode files on the fly and generate native Mac OS X object files, which will be linked with the other object files on the input to create the final binary. If this sounds complicated, then let's take a look at an example.

Here we have one simple source file which defines two functions: get_number, which returns one integer, and main, which invokes the external function do_something. If you optimize this source file using the highest compile-time optimization, the optimizer will not be able to do anything. The code is already simplified.

Now let's take a look at the second source file. This source file has one static variable, flag, which is modified by a function called foobar. And finally, this source file defines do_something, which is used by the first source file. Once again, if you optimize this source file using the highest optimization available to you, the optimizer doesn't have enough room to do anything at compile time. The code is already in very simple form. But what if we compile only one source file using link time optimization?

The compiler will generate the LLVM bitcode for this source file. The linker will accept this LLVM bitcode file, and it will ask the optimizer to optimize it. But the linker will provide some additional information. The linker will inform the optimizer that this function foobar is not used anywhere else.

Second, the linker will inform the optimizer that this function is hidden, because we are building a static application where only one function, main, is externally visible. Now this is very important. If we are building a dynamic library, then the linker will not make this symbol hidden. You will have to do it yourself by using various visibility controls.

We'll see it in detail today. So now, based on this information, that the function is not used anywhere else and it is hidden, the optimizer will simply remove it. Once this function is gone, the optimizer will immediately recognize that this condition is always false. So it is safe to remove this code as well.

Once this condition is gone, the optimizer will recognize that we have a static variable that is initialized but not used anywhere else. So we can remove that also. Now the linker will join the fun. The linker will realize that this function is not used anywhere else, so it can remove that one also. The Xcode linker supports dead code stripping.

Now you can see that the additional information provided by the linker helped the optimizer to significantly optimize this code, which was not possible earlier. But we are not done yet. If we compile the first source file using LLVM link time optimization as well, then the linker will provide both LLVM bitcode files to the LLVM optimizer. Now the optimizer will be able to simply inline do_something into main, and you will get this. This is the power of link time optimization.
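The slide code for this example is not captured in the transcript. A minimal reconstruction, assuming only the shapes described in the walk-through, might look like the following. The names get_number, do_something, foobar, and flag come from the talk; the function bodies, the constants 10 and 42, and the name example_main (standing in for the talk's main so the sketch can be called from a test harness) are assumptions. Both "files" are shown in one listing for brevity.

```c
/* Hypothetical reconstruction of the two source files in the talk.
 * Only the names get_number, do_something, foobar, and flag are from
 * the talk; every body and constant here is an assumption. */

/* ---- first source file (sketch) ---- */
int do_something(void);           /* defined in the second file */

/* Returns one integer; the value 10 is made up. */
int get_number(void) { return 10; }

/* Stand-in for the talk's main(), which just calls do_something(). */
int example_main(void) { return do_something(); }

/* ---- second source file (sketch) ---- */
int get_number(void);             /* defined in the first file */

static int flag = 0;              /* file-local, initialized to zero */

/* The only writer of flag. Nothing in the program calls it, which is
 * the fact the linker passes on to the optimizer. */
void foobar(void) { flag = -1; }

int do_something(void) {
    int data = 0;
    if (flag < 0)                 /* false unless foobar() has run */
        data = get_number();
    return data + 42;
}
```

In this sketch, once foobar is removed the test on flag can never be true, so flag and the call to get_number go away, get_number becomes dead code for the linker to strip, and do_something reduces to returning 42, which can then be inlined into the caller.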

Before I wrap up this example, I would like to point out two things. First, sometimes you cannot compile everything in your project. You may link libraries from third parties, but try to compile as much as possible using -O4. More is better for the optimizer. Second, as we saw with the foobar function in the example, use visibility control as much as possible. Hide as many symbols as possible. It has one additional advantage: it will help you reduce your application's launch time. There are various ways you can control symbol visibility. You can use static variables, you can use anonymous namespaces in C++, you can use GCC's visibility attributes, or you can use an exported symbols list at link time. We have one excellent tech note that describes everything. It's very useful.

So now you want to know how to use this new technology. It is as simple as you would expect. It's just a checkbox in Xcode's build panel. On the command line, you just use optimization level -O4.
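Two of the visibility controls mentioned in the talk, static and GCC's visibility attribute, can be sketched in C as follows. This is only an illustration: the function and variable names are hypothetical and do not come from the talk or from any Apple sample; the attribute syntax is the one GCC documents.

```c
/* Hypothetical names throughout; only the mechanisms are real. */

/* static: internal linkage, so these symbols never leave this
 * object file and the optimizer can see every use of them. */
static int helper_count = 0;

static int bump(void) { return ++helper_count; }

/* hidden: callable from other source files linked into the same
 * image, but not exported from a dynamic library. */
__attribute__((visibility("hidden")))
int internal_step(void) { return bump(); }

/* default: exported, part of the library's public interface. */
__attribute__((visibility("default")))
int public_entry(void) { return internal_step() + 100; }
```

Each call to public_entry() in this sketch bumps the file-local counter and adds 100. Only public_entry would need to stay visible outside a dynamic library built from this file.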

So, now let's look at the performance we are gaining using this new technology. On Tuesday, Chris talked about the performance improvements we are seeing in an H.264 decoder. Today, we will look at the performance for the SPEC 2000 benchmark. This benchmark is a collection of C, C++, and Fortran programs. We are just focusing on the C and C++ programs. We are comparing LLVM-GCC's performance with respect to GCC 4.2.

Before we look at the numbers, I would like to clarify two things. Number one, we are measuring the relative performance of two compilers. We are not focusing on getting the absolute base SPEC numbers for the SPEC benchmark. Second, your mileage may vary. Your application may not see the performance improvements we are seeing in these benchmarks. You should measure and analyze the performance of your application.

So let's look at the performance numbers. Here we are comparing the average SPEC numbers for these two compilers, GCC 4.2 and LLVM-GCC 4.2, at various optimization levels. The X axis represents the optimization levels. On the Y axis, we are representing the relative performance. If the average SPEC number is as good as the average SPEC number for GCC at optimization level O2, then it's 100%. If it is more than 100%, the compiler is producing faster code, which is good.

If it is less than 100%, then it is not good. So let's look at what the compiler can do at optimization level O2. At optimization level O2, LLVM-GCC-produced code runs 5% faster. We ask both the optimizers to do additional work. Let them aggressively inline your code, which will enable further optimization opportunities.

Well, both of them performed well. However, once again, LLVM-GCC-produced code outperforms the code produced by GCC. You may ask, what about link time optimization? We looked at one carefully created example and saw what it can do, but how about real-life benchmark numbers? We expect it to do better, but how much? Well, we are getting a 25% performance improvement by turning on link time optimization.

We do not have numbers for GCC at O4 because GCC 4.2 doesn't support link time optimization right now. Now you may ask, okay, a 25% performance improvement is really good. It's excellent. But what's the cost? Well, let's take a look at the compile time numbers for this benchmark. Here we are measuring how long it takes to build this benchmark at various optimization levels. Once again, we are comparing it with GCC 4.2 at optimization level O2. If the compiler is building this benchmark faster, the number will be less than 100%, which is good. If the compiler is slower, the number will be more than 100%, which is not good.
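Both charts normalize against GCC 4.2 at O2, which is fixed at 100%. A minimal sketch of that normalization, assuming it is a plain ratio, is shown below. The function name and the sample values in the usage note are made up for illustration and are not measurements from the talk.

```c
/* Expresses a measurement as a percentage of a baseline measurement.
 * On the performance chart a result above 100 is better (higher
 * score); on the compile-time chart a result below 100 is better
 * (less time). The function name is hypothetical. */
double relative_percent(double measured, double baseline) {
    return 100.0 * measured / baseline;
}
```

For example, with a hypothetical baseline build of 200 seconds, a 150-second build gives relative_percent(150.0, 200.0), which is 75, so it would plot below the 100% line on the compile-time chart.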

At optimization level O2, LLVM-GCC is 30% faster than GCC. And if you remember the previous slide, it produced 5% better code. Now, when we ask both the optimizers to do more work, they take longer. But once again, LLVM-GCC is very, very fast compared to GCC. How about link time optimization?

As you know, with link time optimization, we do all the work that we do in compile-time optimization, plus at link time, the optimizer works on all the object files in your entire project. It's an extremely large dataset, so I expect it to take longer. But how much? Well, it is almost as fast as GCC at optimization level O2, which is significant. In other words, you are not spending any extra time and you are getting a 25% performance improvement.

So let's summarize the performance numbers. At optimization level O2, LLVM-GCC compiles 30% faster and produces 5% better code. At optimization level O4, LLVM-GCC is as fast as GCC at optimization level O2, but generates 25% better code. But I'm not satisfied. I want 25% better code and a 30% faster compiler at the same time. Why not?

So what can we do? Let's take a look at the compile time numbers once again. We saw this slide. When we collected these measurements, there is one component that is common to all measurements. That is the GCC 4.2 front end. LLVM-GCC uses the GCC 4.2 front end. So how much time is spent in this front end?

Now you can see that the LLVM optimizer and code generator are amazingly fast compared to GCC's optimizer and code generator. In fact, at optimization level O2, they are almost twice as fast as GCC's optimizer and code generator. So the obvious question I have is, is it possible to use a faster front end? I'm inviting Steve Naroff to answer the question. Thank you.

Thanks, Devang. Hi, folks. So I'm Steve Naroff, and I work on Clang in the LLVM team. I'm going to expand a little bit on "tools old-timer" to give you some context. It might help you understand my perspective in the talk. In the mid-'80s, I was the tech lead on Objective-C, and after that I went to work for NeXT and did the GNU native implementation of Objective-C, and after that went on to manage the tools effort at NeXT. And while at Apple, I've worked on several things like Java and the PB to Xcode transition, many different things. So anyway, that's the old-timer part, mid-'80s. And I'm here to talk about Clang.

So this talk is divided into our motivation for Clang. We'll talk about some goals. We'll talk about some specific numbers. And then we'll get a little bit into our future plans for integrating with Xcode. The number one motivation we have is fast compilation. Now, as Devang just mentioned, the back end is actually quite fast, even when generating really good code.

Realize that this is a huge innovation. Most C compilers, static C compilers like GCC, when they were developed, the people working on the back end really didn't care about compile time at all. Their main goal was generating the best code, and they didn't really care how long it would take.

So it's really important that today, when we want to use the C compiler in a just-in-time compilation mode or at runtime, that the back end is quite fast. And that has obviously a great side effect on your development time. So we're inspired in the Clang front end to basically marry a great fast front end with the LLVM back end to give you fast compilation, because compilation obviously involves the front end and back end.

Another important motivation is that many C compilers don't really give you great diagnostic messages. The messages that a compiler spits out when you do something bad are its user interface, right? That's the main UI the compiler has to you: to tell you what you've done wrong. It doesn't say much if you've done right, unfortunately. It doesn't pat you on the back. But it's really important that we give you precise location information. I'll show you a little bit more. We'll talk more about that later.

And certainly we want to enable a whole new class of programming tools. Last year we introduced refactoring in Xcode, and it was surprisingly difficult to do. We did it because it was important, it's a great feature, but we want Clang to make doing refactoring and other great programming tools really simple.

So one of the key goals here is to make a drop-in replacement for GCC. We would love to not implement all the funky extensions in GCC, but that's not pragmatic. It's not pragmatic because Apple uses GCC internally extensively, as you know. We compile our entire operating system with it, and we use a lot of these extensions. And even -- there's some code that doesn't even know it's using extensions that is using extensions, and we find that out later. So it's interesting.

We really need to be drop-in compatible with GCC, and we're working really hard at that and have implemented many of the GCC features already. Also we want language conformance. Right now we're doing a great job with C and Objective-C. C++ is to come, and when we do C++, it has to conform to the spec. We're not going to be deviating at all. One second, let me get some water.

So, as I said before, we want to spur innovation for the next decade. I want to talk a little bit about the open source community that LLVM has and Clang is modeled after. It's really progressive. I use that word very specifically. For instance, when people talk about open source, there's this feeling that, oh, open source is good, right? That if it's open source, it's good.

Well, just like with companies, corporate cultures, teams, there are good teams, bad teams, good cultures, bad cultures. And I think the community that LLVM has is extremely friendly. It's very open to change, and it's very open to great engineering. Okay? So the leadership that's driving LLVM and that we hope to drive Clang is really concerned about making great software because we feel that great software can be optimized better, can be extended, and more importantly, can be extended by people who aren't necessarily compiler experts.

So a lot of you use the language, and if you wanted to develop an extension to a tool we're providing, since you know the language, if we're giving you APIs to the function objects and the statement objects, you should be able to, in theory, write your own refactorings that are custom.

Okay? A lot of the refactorings we do today are canned, so to speak. We would like to offer a much more extensible API. If you wanted to do your own searching browser that's custom to your tool, you could do that. There's many different things that are exciting when you have an API that's approachable by software engineers, not necessarily compiler experts. And we've been told by many people who have downloaded Clang that the API is really pleasant to work with.

So I guess I've already covered it. Modularity is critical here. We firmly believe that the only way to really have great performance is through great design. And so we're sticking to that. And, you know, as far as I'm concerned, the C-based tools have stagnated, okay? Xcode is wonderful considering what it has to live with, and I'm going to talk about that later.

It's just a shame that because the C compilers that are in use today were developed 25, 30 years ago, they've basically stunted the growth of great development environments. And we believe Clang can basically make a huge step forward in enabling great IDEs. Obviously, we're most concerned with Xcode, but Clang can be used in various other contexts.

A good example is Eclipse. Many of you probably have heard of Eclipse in the Java world. Well, it's an IDE that's written in Java and that supports many different languages. Well, it turns out that for Java, Eclipse is a wonderful tool. For C, it's less than wonderful. And part of the reason is that it uses GCC as its plugin. And again, GCC was not developed with modularity, APIs, extensibility, all that kind of stuff. So anyway, we're excited about this.

I'm going to shift gears now and talk a little bit about diagnostics. Here's some code that is meaningless. It's just for the example. It has two statements there. Now, when you compile this with GCC, you can see the first statement, it says that there's an invalid type argument to unary star, and the second error is it says invalid operands to binary plus.

Well, the problem is the second statement has many pluses. I think there's three there. And you don't know which one it applies to. So with Clang, what's wonderful is the first diagnostic is actually telling you what the problem is. It's saying indirection requires a pointer operand, and int is invalid. So it's giving you the type that it's confused by, which GCC wasn't giving you. Okay.

It is telling you exactly what you need to know. The second error pinpoints the exact plus sign, as you can see there. And having the range data on all the expressions enables us to do some really great things with error diagnostics. It also enables us to do a much better job with rewriting since the Clang ASTs have precise information on the source code. And I'm going to talk a little bit more about that later. So we're proud of this.
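The slide code is not in the transcript. The sketch below is a hypothetical stand-in that should trigger the same two kinds of error the talk describes, a unary * on a non-pointer and a binary + with an invalid operand among several pluses. The struct, the function name, and the SHOW_ERRORS macro are all assumptions; the erroneous statements are guarded so the file compiles by default and only fails when SHOW_ERRORS is defined.

```c
/* Hypothetical example; not the code from the slide. */
struct point { int x; int y; };

int sum_fields(struct point p, int n) {
#ifdef SHOW_ERRORS
    /* Unary * applied to an int rather than a pointer. */
    int bad1 = *n;
    /* Three + signs; only the middle one has a struct operand, which
     * is the case where a column-accurate diagnostic helps. */
    int bad2 = n + p.x + p + n;
    return bad1 + bad2;
#else
    /* Well-typed version of the same expression. */
    return n + p.x + p.y + n;
#endif
}
```

Compiling once normally and once with SHOW_ERRORS defined is one way to compare how two compilers report the same mistakes.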

So now let's talk a little bit about performance. This project is an open source project, PostgreSQL, and front end times mean we're taking the project and compiling it with Clang. Let me be more precise. I said compiling. That's not exactly -- well, let me be more precise.

So what we're doing here is we're compiling -- if some of you are familiar with -fsyntax-only, it basically says do lexical analysis, do preprocessing, do parsing, do type analysis, and build your internal data structures, but don't generate code. Right? So it does all those activities. And GCC 4.2 will parse, preprocess, and lexically analyze the PostgreSQL project, which is about 665K lines of code, in 49 seconds.

Okay? So with Clang, we do it in 21 seconds. So that's about 2.3 times faster. I want to emphasize that we are doing the same exact thing GCC is doing. We're trying to compete with GCC on its turf, so to speak. No fancy games like I'll be talking about later.

So here's the Xcode front end times. Now, Xcode's a very big project. It consists of hundreds, thousands of modules. And GCC 4.2 compiles it in roughly -- that translates into 16 minutes. It's 972 seconds. With Clang, we do it in 420 seconds or, let's see, six minutes, right? Seven minutes. So, trying to get a little improvement there.
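The timings quoted for the two projects (49 and 21 seconds for PostgreSQL, 972 and 420 seconds for Xcode) can be checked with a little arithmetic. The helper names below are hypothetical; only the four timings come from the talk.

```c
/* How many times faster the candidate is than the baseline,
 * given the wall-clock seconds each one took. */
double speedup(double baseline_seconds, double candidate_seconds) {
    return baseline_seconds / candidate_seconds;
}

/* Converts seconds to minutes. */
double seconds_to_minutes(double seconds) {
    return seconds / 60.0;
}
```

With the talk's figures, speedup(49, 21) is about 2.33 and speedup(972, 420) is about 2.31, so the ratio is nearly the same on both projects. In minutes, 972 seconds is 16.2 and 420 seconds is exactly 7, a saving of about 9.2 minutes.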

So, seven minutes, that saves nine minutes. Okay? Now, again, it's consistent. The reason we're showing you a C project versus an Objective-C project is that in Objective-C, like many object-oriented languages, the code usually contains more type information in headers, right? The compilation of your program, or the parsing and preprocessing of your program, is heavily dependent on the number of headers and the contents of the headers. For example, in Xcode, believe it or not, we push, like, 16 gigabytes through the whole preprocessor and parser when we're doing our job. So, it's a tremendous volume of data. And C projects, in fact, don't have the same girth of header information.

So, the fact that we're scaling is quite good now. One last point I want to make on this. So, how do we beat GCC, which has been around for so long, so many people are hacking on it? What could they be doing wrong, right? All we're really doing is using smart algorithms and data structures. It sounds a little bit absurd, but that's all we're doing.

Well, one other thing. This is like the fourth parser I've written in my career, so that does count for something, right? We have experience with this. We've seen mistakes. Chris and I, the team, we've seen different compilers, and we've learned from them. So this is -- it's something that -- I mean, I can trivialize by saying it's smart algorithms and data structures, but there is a lot of experience we've gained through the years, and we're benefiting from it.

So now I'm going to move from time and talk about architecture. What I have here is a diagram of Xcode's architecture today. And it's very high level. But the idea here is the Xcode IDE has a built-in preprocessor and parser for C, Objective-C, and C++. And that built-in machinery enables the IDE to do things like Code Sense, indexing, refactoring.

That infrastructure does not know how to do code generation. So when you ask Xcode to build, it's delegating the code generation, so to speak, to GCC. GCC is a Unix command line tool. When you run it with a source file, if you run it on three files, it forgets what it did the previous time, right?

It's a batch compiler where it will run and forget what it did, run and forget what it did, and so on. On the other hand, the built-in preprocessor actually has some interesting optimizations for basically caching headers and doing other interesting optimizations because it's running in process and does not live and die, live and die as a batch compiler does. So what's wrong with the picture?

The problem is we're parsing twice, right? The built-in parser is parsing, then GCC is parsing. And it's error prone, right? Considering this architecture, again, Xcode does a remarkable job. And since I've been involved in it from its inception, I'm the last person that's going to stand up here and belittle our wonderful tool called Xcode. However, it could be doing a lot more, and I'm going to talk about that in the coming slide. It basically leads to compromises, okay?

For example, if you are trying to find a symbol in Xcode and you can't find a symbol, but your code compiles with GCC, right there is a huge disconnect. It's like, my code compiles, how come it can't find a symbol? Well, the answer is this architecture. Because, for example, getting two preprocessors to agree on exactly the same input is actually really hard.

It's extremely hard. And so the duplication of effort leads to bugs and it leads to considerable engineering effort. As you know, we just added blocks. Well, we prototyped blocks in Clang. We wanted to make sure we got them right before we wrestled with GCC. And it would be great to live in a world where we could just basically add our features to Clang.

Because as you've noticed, we've been doing a lot more with Objective-C over the past couple of years. We've added garbage collection, properties, now we're doing blocks. There's a lot of interesting work going on there. And we don't want to have to modify three or four or five compilers to do our job.

So the vision here is that we have an efficient implementation that uses Clang and will directly generate code from within the IDE. No need to parse twice. In process, we can do many things here. We can cache files. We can cache tokens. We can cache abstract syntax trees, which leads to a much more incremental development environment. So the excitement we have here is really significant, and we hope to deliver some of this next WWDC.

Again, the unified parser will give us fewer bugs, and it will greatly simplify our life in making language changes, which to some degree doesn't affect you guys. But if we're going to move quickly and implement the features all of you desire, it is important to basically make our lives easy and simplify our life. And we don't like wrestling with GCC.

So we're using this quite extensively for several internal projects. People are really happy with what they're seeing so far. We're actively fixing bugs. We have some wonderful people in the open source community that have been helping us. I'm just wildly impressed with the quality of people that have been attracted to the project. Some very talented people in universities have gathered around.

And we're thrilled with the progress. C99 and Objective-C are shaping up well. As I said, Clang basically parses all of Xcode. And Xcode is one of the more challenging Objective-C programs on the planet. There were several things that I had to fix in Clang that I didn't even know existed in the language. And since I've been involved from the beginning, that's pretty interesting.

And some of them are bad, and we're filing bugs against GCC in some cases. But again, we're going to be bug for bug compatible if it's going to be a drop in replacement. So anyway, C++ is under development. It's still very early. Again, we have some excellent people across the globe who are helping us. And we really sense that this is the time to have something great and replace GCC. And we hope you agree. So with that, I'm going to bring up an extremely talented person who works with us on this, Ted Kremenek.

Hi, everyone. So we're very excited about Clang, and I think Steve put it very eloquently. So in this last part of this session, I'm going to be talking about other applications that we have in mind for Clang, besides it just being the basis of a great C, Objective-C, C++, and if it wasn't clear, also Objective-C++ compiler.

So the things I'm going to be talking about really tie into our vision of innovations, features we want to put into Xcode one day. But in the general sense, this really speaks to our belief that Clang just provides a strong foundation for building great programming tools. And this is something we think that will continue to happen in the Clang space, and so we're very excited about it. The only caveat I want to say is a lot of these things that we're going to talk about, think of them more as speculative ideas, crazy ideas that we have on the table that we're actively thinking about, but they're not necessarily immediate feature promises.

So, on to the first crazy idea. As Steve said, faster compilation is the primary goal of Clang. And once you move Clang into Xcode using the same front-end technology used by the compiler, essentially the difference between the command line compiler and the IDE just vaporizes. And there's just a ton of optimizations you can do in this space, and Steve actually mentioned several of them.

Now, Xcode has this feature today called predictive compilation. It allows it to fire off to GCC in the background while you're still editing your code to try and compile files that have recently changed, so by the time you hit build and go, hopefully most of the work has already been done.

But if we move over to Clang, there's a lot more we can do in this space. Essentially, we can incrementally reparse your files, building up the same data structures that a command line compiler would build up. So at any point in time, we're ready to go and generate new code.

And so not only can you just speed up recompiling individual files that have changed, but we can potentially do things like only recompile the functions or the methods that have actually changed, since Xcode will have cached all the information from its previous builds. And this alone will lead to tremendous performance increases when we're compiling.

So the second thing that we're really thinking about is just making a better user experience within Xcode. And if you're familiar with it, there's this great feature called CodeSense, and we also do indexing. So it allows you to do things like auto-completion. You're typing a function that you want to call, and Xcode gives you a suggestion of what function to use. And also you can do things like jump to definitions of a function or a method. So it gives you this very quick navigation of your code.

The problem is that whether you're using Xcode or Visual Studio or whatever, we all know that even the best IDEs out there are easily confused by commonplace language features in C-based languages, like preprocessor macros. And even templates are not really handled very well by any IDE. And the idea is if we move Xcode over to using Clang, if the compiler can understand it, so can the IDE. And then all these features will just be able to understand the code. And so this will just fluidly work together.

And also all this rich source level information will be in the IDE so we can expound and do more interesting kind of behavioral experiences in the IDE. And our hope is with this we can make Xcode the best IDE on the planet for editing C, C++, and Objective-C.

Now the third thing I want to talk about is refactoring. Refactoring is a technology that's in Xcode today, and we're very happy about it. Now if you're not familiar with what refactoring is, it's essentially harnessing the IDE's knowledge of your entire code base to do high level structural edits.

So it could be things like renaming variables across a whole project, or moving an instance variable from a class to its parent class, splitting classes apart, extracting functions from a piece of code. So a lot of these kinds of evolutionary design changes that you would want to do, that could be error prone or tedious to do by hand. So it's great if the tools can do this for us.

Now the problem is that refactoring tools basically use the same front end technology as the IDE that they're housed in. And so if the IDE doesn't understand the code, neither will the refactoring tool. And the refactoring tool is going to edit your code. It's going to go make suggestions of how the code should be changed.

So if the IDE doesn't understand your code, the transformations aren't necessarily guaranteed to be accurate. And they're not necessarily guaranteed to be complete. And we provide a user interface with our refactoring tool to try and deal with the case that it's not necessarily going to be 100% accurate, so that you make the changes that you want to make.

But as an aside, if our refactoring technology can't understand the code perfectly, it also prohibits us from doing a lot of transformations that we could otherwise do. So along with moving Xcode over to using Clang, we'd like to move all of our refactoring technology over to using Clang as well.

And this is going to buy us two things. More accuracy with our transformations, and more refactoring opportunities as well. And we want it to be very scalable. Clang is designed to be very fast and memory efficient. And so we'd like to do transformations that operate at the speed of your thought, on code bases that have thousands or even millions of lines of code, in seconds or milliseconds.

Now the last thing that I want to talk about is a feature that is not in Xcode today. We have refactoring, we have indexing, CodeSense, and we do a fair amount of predictive compilation. The last feature is something that we are actively investing in, and that's automatic bug finding using static analysis.

Now, if you're not familiar with what static analysis is, I'm going to take a broader picture first. And that is, wouldn't it be great if the IDE was more than just a great editing environment? What if it could help you be very proactive about trying to make your code better before you even run it?

Now we're already familiar with compilers providing us some kind of feedback in the form of compile time warnings. And so these go beyond compile time errors where your code just doesn't compile at all. These are suggestions of, I see something in your code that looks kind of dubious, and if you were to run it, I think something wrong would happen. So here's an actual diagnostic from the Clang command line driver indicating a format string specifier in this call to printf that's just wrong. It's bogus.

It will result in unpleasant behavior at runtime. And GCC also emits a similar kind of diagnostic. And the real power of these kinds of warnings is that they're cheap, they can happen as you're compiling your code, or we can proactively check for these things once Clang is integrated into Xcode. And they're really effective at reducing the number of bugs in your program.
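The slide itself is not in the transcript, so as a rough illustration of the kind of format-string mistake being described, here is a minimal C sketch. The helper name `format_count` and the message text are made up for this example. The mismatched call is shown only in a comment so the example stays well-defined.

```c
#include <stdio.h>

/* Hypothetical helper: formats a count into buf and returns the number
   of characters written (excluding the terminating NUL). */
int format_count(char *buf, size_t size, int count) {
    /* A mismatched call such as
           snprintf(buf, size, "count = %s", count);
       passes an int where %s expects a char *. Compilers with format
       checking enabled (for example -Wformat in GCC and Clang) typically
       warn about it, and executing it is undefined behavior. */
    return snprintf(buf, size, "count = %d", count);  /* %d matches int */
}
```

The point of the sketch is that the mistake is visible on a single statement, which is why an ordinary compile-time warning is enough to catch it.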

But we can go a lot further than this. There has been a lot of work in this area called static analysis where essentially instead of finding a bug that occurs on a particular statement, like in the case of this example, we can find bugs that require a very specific set of events to happen.

So a certain set of branches has to be taken. A specific path through your code has to be taken in order for a bug to occur. And the real awesome feature of this is essentially you're trading CPU time for checking out your program. You know, so this is maybe idle time while you're editing your code. So these are free cycles that are available for use. And it really complements the traditional ways in which we try to do quality assurance on our code, such as testing.
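To give a feel for a bug that only shows up on one specific path, here is a small C sketch. The function `first_byte` and its parameters are hypothetical, invented for this illustration rather than taken from the talk. The null dereference is reachable only when one particular combination of branches is taken.

```c
#include <stddef.h>

/* Hypothetical function: returns the first byte of a buffer, or -1. */
int first_byte(const unsigned char *data, int use_default, int strict) {
    static const unsigned char fallback[] = { 0x7f };
    const unsigned char *p = data;

    if (use_default)
        p = fallback;       /* path A: p is never NULL after this */

    if (!strict && p == NULL)
        return -1;          /* path B: NULL is handled when not strict */

    /* Remaining path: use_default == 0, strict != 0, data == NULL.
       Only this combination of branches reaches the dereference with
       p == NULL, which is the kind of path a path-sensitive analysis
       would be expected to report. */
    return p[0];
}
```

A test suite could easily exercise paths A and B and never hit the third combination, which is where exhaustive reasoning about paths complements testing.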

Because the compiler basically has an inexhaustible attention to detail. It's fine to just crank away at your code, reason about all the different ways that it could be exercised in order to find those corner cases in which bugs could occur. So I'm going to give you a flavor of a few bugs that we can find with static analysis. This is really just the tip of the iceberg. These are very simple examples, obviously, for the purpose of presentation. There's a lot more we can do.

So this is a mockup of a real bug found in Xcode using static analysis technology built on Clang. So it involves this switch statement where there's several cases where on each case the value of str is assigned a separate string literal. So this is like a dispatch on some enumeration value.

And the diagnostic emitted by the static analysis tool is, hey, I see that this value that you stored to str is just, it's never read later. And what happens is that there's a missing break statement between case 1 and case 2. And what happens is str is just overwritten, so case 1 always looks like case 2.

Having a missing break statement, not having a break statement, is not a crime in itself. There's real reasons why you'd want to structure your control flow that way. But logically, semantically, this is not what the programmer wanted. They have dead code here that's just meaningless. And so by reasoning about the logical inconsistency of the program, we can pinpoint some very interesting bugs in this way. And with this particular check for dead stores, we found a lot of interesting bugs in a lot of our software.
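The slide code is not part of the transcript, but the missing-break pattern described above can be sketched in C roughly as follows. The function name `kind_name` and the string values are assumptions made for illustration.

```c
/* Hypothetical enumeration-to-name dispatch with a missing break. */
const char *kind_name(int kind) {
    const char *str;
    switch (kind) {
    case 1:
        str = "one";    /* dead store: overwritten before it is ever read */
        /* missing break: control falls through into case 2 */
    case 2:
        str = "two";
        break;
    default:
        str = "other";
        break;
    }
    return str;
}
```

Calling `kind_name(1)` yields "two" rather than "one". The code compiles and runs without error, so the only hint that something is wrong is that the value stored in case 1 is never used.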

So the other class of bugs which I think really hit home to almost everybody is memory leaks. So whether or not you're writing an app that uses garbage collection or doesn't use garbage collection, any real program has to do some amount of manual resource management. So it could be dynamically allocated memory, it could be files that you open, sockets, you know, whatever. There are always things that we have to manually manage despite having maybe a magic garbage collector to come around and clean up some of our unused memory.

And I think static analysis can go a long way to catching misuses of resources, leaks, using a resource after it's no longer available, because we tend to use these resources in very stylistic, idiomatic ways. So this particular example involves some Objective-C code that I mocked up. It involves allocating an object at the top, an NSMutableDictionary object, then we call some method, and then on some error condition we go, oh, well, I don't know what to do, so I'm going to return. And therefore there's actually a leak here because the reference count associated with the dict object isn't actually decremented.

So if you're not familiar with this idiom in Objective-C, these are reference counted objects. And so you have to correctly use retain and release in order to manage the reference counts. But because we use retain and release very stylistically, this kind of error is very easy for static analysis to catch.

And you can imagine the same kind of checking could apply to things like new and delete, malloc and free and so forth. So I think there's a lot of kind of memory leaks and other related bugs we can find with static analysis. And that's just the start. I think there's a lot of things that we can check. Security bugs, API rules, crashes, assertion failures, data races. There's a lot of opportunities for static analysis to greatly improve the quality of our code.
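As a sketch of the malloc and free variant of the early-return leak, here is a small C example. All names are hypothetical, and the `tracked_alloc` and `tracked_free` wrappers exist only so the leak can be observed at run time; a static analyzer would need nothing like them.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping so the leak is observable in a test. */
int live_allocations = 0;

void *tracked_alloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL)
        live_allocations++;
    return p;
}

void tracked_free(void *p) {
    if (p != NULL)
        live_allocations--;
    free(p);
}

/* Leaky version: the early return on the error path skips the free. */
int process_leaky(int fail) {
    char *buf = tracked_alloc(64);
    if (buf == NULL)
        return -1;
    if (fail)
        return -1;          /* leak: buf is never released on this path */
    tracked_free(buf);
    return 0;
}

/* Fixed version: every exit path after a successful allocation frees buf. */
int process_fixed(int fail) {
    char *buf = tracked_alloc(64);
    if (buf == NULL)
        return -1;
    int status = fail ? -1 : 0;
    tracked_free(buf);
    return status;
}
```

The allocate, use, release shape is the same as in the retain and release example, which is what makes this family of bugs a natural target for automated checking.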

So there are a few design goals that we have in place for building a tool that would be useful for you. The first one is usability. The problem is that bugs in the real world can actually be fairly complex. They can require a convoluted path through your code to actually trigger.

The one thing about Clang is we have very rich source level information. And this is going to enable us to give very precise diagnostics: in this file, at this line, maybe even within this macro expansion, something bad happened, and point you directly to where the problem occurs. Now the other problem is that some bugs found by static analysis aren't real. I think the notorious term here is false positives.

The problem is that static analysis is reasoning about complex runtime behavior without running your program. And so it's just not perfect. Our design goal is to address this by trying to have as few false positives as possible, because otherwise the tool is just useless. If you just get too much noise, you don't want to invest any time in using the tool at all.

Now the second thing is Objective-C support. Clang supports Objective-C, C, and eventually C++. So anything that Clang supports, the static analysis technology I'm talking about should support as well. On the Mac and the iPhone, Objective-C is the language for building applications. And so we're really interested in having strong support for Objective-C. And there's no other static analysis tool out there that has any support for Objective-C. Now, Objective-C is a highly dynamic language. This is not really going to be a problem for static analysis.

The thing that we're mainly concerned with is that we have these huge APIs. You're using them to develop apps. We think we can systematically enforce a lot of the rules involved with those APIs by using static analysis. So as you are typing your code, you can be less concerned that you're just doing the wrong thing because you don't know some hidden rule.

The static analysis can be there to try and enforce those hidden rules that you don't know about. And these can be anything from very complicated rules, that you must call one method after another, to even very simple rules that you can't really know about, such as that you should never pass nil to a particular method.

The other thing we're very concerned about is memory management errors. The phone is all reference counted. There's no garbage collection there. And even if you use garbage collection on the Mac, there's some special rules you must obey in order for the garbage collector to work correctly with your code. And so we think we can enforce a lot of these rules using static analysis as well.

So where are we now today with this? This isn't vapor. We've made a strong initial investment in static analysis based on Clang. It's probably one of the biggest applications we've developed so far. We're actively researching and developing it. We actually have an alpha, maybe beta quality memory leak checker for Foundation objects. So that retain/release bug, that's an actual diagnostic that would be emitted by the tool. And we've used it and found and fixed bugs in the phone, in Xcode itself, in the Mac OS X kernel, and in various frameworks that we have available at Apple.

So you already are benefiting from the effects of this technology. Now, while we're not shipping it today in Xcode and it's not available at developer.apple.com, it is, like the rest of Clang, 100% open source. And you can download and try it out today if you wish by going to clang.llvm.org.

The only caveat is this is a very early stage tool. So that's it. So we talked about a lot of things today. We talked about what LLVM is. And we talked about what we're delivering to you, what's built on LLVM. Devang talked about llvm-gcc and how it's a truly awesome standalone compiler.

It can give you both faster compiles on much of your code, and we think it's going to generate much faster executables for many real workloads. You need to try it out to really measure its benefits to you, but you can easily try it out within Xcode from the drop-down option, just like the compiler you're using, or you can use it from the command line from /Developer/usr/bin/llvm-gcc.

Steve talked about Clang, this vision for this new C front end that we're building. It's more than vision. We have a really active code base, and we can parse almost all of C and Objective-C. We're actively developing it still. There's a rich developer community around it, and we're very excited about the new kind of tools it's going to enable us to build.

Better refactoring, static analysis, and most importantly, a better Xcode. And the great thing is all of this is open source. The LLVM project you can visit by going to LLVM.org, and Clang has a specific project page with more information at clang.llvm.org. And if you have other specific questions, please contact Michael Jurewitz, our developer tools evangelist. He's also here to answer other specific questions if you want about this session and related material.