Tools • 1:01:37
Xcode 3.1 introduces two new compilers for Mac OS X: GCC 4.2 and LLVM-GCC. Learn how the new security and performance improvements in GCC 4.2 can help you produce better applications. Understand the innovations in LLVM-GCC, and find out how you can use it in your own testing and development. Finally, get a preview of future compiler developments.
Speakers: Francois Jouaux, Eric Christopher, Chris Lattner
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
Hello, everyone. Good afternoon. And welcome to the New Compiler Technologies and Future Directions session. This session is for everyone that cares about their compiler. And everyone should care about their compiler. And it looks like you got the message, because you came in very large numbers to this session.
We believe that every developer on Mac OS X should care about their compiler. Why should they care? Well, compilers do their job in a more or less great way, depending on the configuration they are passed and the characteristics of the system. We believe that there is no easier way to squeeze out the last 10% of the performance of your killer applications than by tweaking your compiler.
This session is going to... Oh, okay, I already clicked on that. Perhaps you're after improving your performance. As Andreas talked about in the State of the Union, perhaps you just care about the pure speed of your executable code. Or is it more about minimizing the memory footprint of your application on a cool new iPhone? Or compilation time on your build farm?
Are you looking for the latest security features that the compiler could provide? Or do you care about load balancing the CPU load on your multi-core architecture? With the right configuration, you can get the most out of your compiler on our platforms. It should be quite easy to achieve this. We have a few guidelines we can help you through. And this is what this session is about.
So our agenda for the day is quite simple. First, we are going to take an unusual look at the iPhone SDK from the compiler angle. Then we will move on to Mac OS X and the great compiler choice you have in Xcode 3.1. We will go into more detail on GCC 4.2 and on the brand new LLVM compiler. For once, you'll get forward-looking statements about our compiler strategy moving forward beyond Xcode 3.1 and Snow Leopard.
This session is shaping up as one of the best compiler sessions we've had in a long time. We have tons to talk about, and I'm really proud to have two excellent presenters, who also double as experts in their fields: Eric Christopher and Chris Lattner. But you can judge for yourself, because it's time for Eric Christopher to come up on stage now and talk about the iPhone SDK compiler.
Thanks, Francois. I'm not really sure I can quite live up to that glowing recommendation there, but I would like to start off by talking a little bit about iPhone OS development and really what it means. Now, whether or not you're new to the Mac platform or whether you're a seasoned developer, a lot of you are here trying to figure out how the compiler and the hardware inside the iPhone is going to affect what you do.
So I'd like to really start off a little bit by talking about the iPhone internals, the ARM processor, and how it is going to affect the code that comes out of the compiler and what you do about that. So the processor itself is pretty simple. It's an ILP32 processor.
What that means is that integers, longs, and pointers are all 32-bit. It's little-endian, just like x86. It's a low-power RISC processor that's heavily used in mobile devices.
And it supports two computation modes, ARM mode and Thumb mode. So, all right, well, what are those? The ARM processor mode is the larger of the two instruction sets. It's a 32-bit instruction set. It has access to all 16 registers on the chip, and it has direct access to the full instruction set. This means that you have no penalties when trying to do anything within ARM mode.
Then there's the Thumb processor mode, which has 16-bit instructions, and access to only eight registers. Well, still more than x86. And it only has indirect access to some of the instructions, so there's a little bit of an overhead involved in doing floating-point access. One of the really nice things about Thumb mode is that Thumb code is generally 25% to 30% smaller than ARM code. This is huge when you're talking about running some code on an embedded device.
Thank you. Now, this smaller code is the big reason why we decided to make Thumb the default mode for the iPhone SDK. Smaller is faster in an embedded system, and smaller code uses less system memory. System memory in the iPhone is the most important system resource. There's no virtual memory backing store, which means there's no swapping.
So you want to try to keep your application as small as possible. All right, when do I want to use ARM? You're going to want to use ARM when you have either a lot of floating-point calculations or routines that need lots of registers. You probably don't run into the second very much, but you will run into the first.
So, let's talk a little bit about the specifics of the SDK here, so we can decide how everything shapes up. Well, first of all, it's GCC 4.0. It's GCC 4.0 for the ARM processor. We do support both modes I've just talked about, and it's going to be compatible in just about every way with the Leopard GCC 4.0. One of the big things, and you'll hear about this later in a lot of the iPhone sessions, is that there is no Objective-C 2.0 garbage collection on the iPhone. Therefore, at the compiler level, we don't support this either.
That said, all of your portable code should just recompile and work, with a few exceptions, you know, inline assembly, endianness assumptions, all the things that, if you were here for the PowerPC to Intel transition, you've already had to deal with and suffer through, and so your code already handles that.
There are a few tweaks that you can do to your code from just the toolchain side to make sure that you can get the best performance out. One of which is, as I said, there are gonna be times that you're gonna wanna use ARM mode. We only handle ARM on a file level basis.
What this means is that to use ARM mode, you're gonna have to take all the code out of your application that you want to use ARM mode on, and you're gonna want to stick that into a separate file. And then within Xcode, you're gonna want to use the UI: you're gonna want to uncheck "Compile for Thumb". And then all of that will be compiled in ARM mode. And then at runtime, it will swap between Thumb and ARM mode for that, invisibly to you.
So back to that precious system resource of memory. So if you're programming in C++, there are a couple of very, very easy things you can do to help optimize for memory. One of which, you can avoid C++ exceptions. And you can avoid runtime type information. Now, these two things are very, very helpful in reducing the memory used by your application, because they have some non-trivial overhead just sitting in memory. They're not a significant overhead at runtime, but they are a significant memory overhead. So if you can, you'll want to uncheck those two boxes so you're not using either of those language features.
So that's a little bit about the iPhone SDK. There are tons of sessions here during the week that'll tell you about the more specific things for the iPhone. You know, the APIs that you'll be developing too, the various different things, different in OpenGL and all that kind of stuff.
But from the compiler side, everything should just continue to work just fine, which is one of the nice things about using the same compiler as we transition between the Mac development side that you're used to and the iPhone SDK on the embedded side. Now that said, I would like to switch gears a little bit and talk about Mac OS X development.
We have a huge array of compilers in Mac OS X now. To go over just some of the evolution, we've had GCC 3.3, which was quite some time ago. We GMed GCC 4.0 in the Tiger timeframe. And then, and now at the conference here, last year, we gave a beta of GCC 4.2. This year, with Xcode 3.1, it's going to be a GM compiler. And it's going to be the default compiler that we're using for all of Snow Leopard.
and it has a lot of great features. However, we haven't been sitting on our hands there either, because we also have the LLVM compiler development side of things, where, also in Xcode 3.1, we're shipping LLVM-GCC 4.2. Now, this has a GCC 4.2 frontend and an LLVM backend. And one of the other things that we've been working on is something called LLVM Clang.
Now, this has an LLVM frontend and an LLVM backend. Now, I'm not going to talk much more about those today. However, Chris Lattner, when he comes up here, is going to talk in much more detail about them. Just wanted to whet the appetite a little bit and talk about where we're going. So with all these compilers, probably your biggest question is, well, what compiler do I want to use? It comes down to a couple of questions. One, you want to target Tiger and later. Well, there's an easy choice there. You want to use GCC 4.0.
GCC 4.0 supports everything you need to target Tiger and later. However, if you want to target Leopard, you want to use some of the cool new features that we've been talking about here. You want to use blocks, you want to use all these different features I'll be talking about later. Then you want to use one of the GCC 4.2-based compilers, either GCC 4.2 or LLVM-GCC 4.2. It has everything you need, and it's quite a bit more advanced.
I've given you a big landscape of development, but a picture is really worth more than a thousand words, hopefully. So, let's talk about the compiler progression that we've had, just to emphasize where things have been, where things are now, and where things are going. So, back in the olden days, we had GCC 3.3. It GMed in about the Panther timeframe, and we've moved it through all the way through Xcode 3.1, where we're going to leave it.
Then we have GCC 4.0. It GMed in the Tiger timeframe and was our default compiler for Leopard. It's going to be supported for the coming future. Today at the conference, we're shipping both GCC 4.2 and LLVM-GCC 4.2 when we release Xcode 3.1. Now, we've had some default compilers over time, and this kind of gives you an idea of where we've been going as we continue our way down to the bottom right. And as I said, GCC 3.3 is no longer supported. You really, really do need to move. But hopefully I'll provide a couple of compelling reasons for you to want to do that here now, as I talk about GCC 4.0. Now, GCC 4.0 was our Leopard default.
It is still supported now. It's a great upgrade path from 3.3. There's a porting guide that some very, very talented people have on the ADC website that gives you a lot of information on what you're going to want to do to upgrade from 3.3 to 4.0 or 4.2. It provides lots of information. However, there are going to be no new features added to GCC 4.0, nothing else. And we're only going to be making the most critical of bug fixes that are necessary. Very, very critical bug fixes only.
However, as I said, we did add some features. In particular, we added some security features to GCC 4.0 that are around now for GCC 4.0 and later, especially in GCC 4.2. So we added these security features around the Leopard timeframe. And since security is our continuing emphasis here at Apple, I'd like to talk to you a little bit about them. Now, with security, there are lots of different approaches you can make, but there are no silver bullets in writing secure applications.
There's no process you can say by doing this, by doing this, by doing this, my app is magically secure. However, the compiler can help you out in a few different ways. For example, we actually have stack canaries in GCC 4.0. Well, okay, how do those help? Well, what we've got here is your standard buffer overflow problem, where we've got a buffer and we're copying data that our conceivably malicious attacker has decided to put on the system.
All right, what does this look like? OK, here's our stack frame. So here's our little small buffer there that's fixed length. We can put up to 32 characters in it. Wow. And then we read from the outside, where our malicious attacker has decided to do something bad. What he's going to do is he's going to overwrite the return address and use that to execute arbitrary code on the system.
So, we have a huge, huge problem. He can execute as you, or if you happen to have one of those increased-privilege applications that you're writing, he's executing as root on the system. You really don't want to allow this in your application, so you want to try to find some way of fixing it. One of those ways is the stack canary, which is also known as a sentinel, if you've heard that term before. So in our execution, we again have our buffer here. And then our malicious attacker comes in. And yeah, our canary isn't very happy. He's run out of fresh air.
So what happens with this, with the canary, is that it's a random number generated at compile time, or run time, I'm sorry, on the system. And it's checked before we return from the function. So what happens is that instead of using the return address that the attacker overwrote, when we find that there's been a problem and you're going to end up having a security violation, you really want to abort the program to minimize the risk to your user. So we're going to go ahead and abort your application there.
It's much better than having malicious code and viruses and everything else running on the system. So, stack canaries are particularly easy to use. It's a simple command line option, a checkbox hopefully soon, that you'll add via the Xcode UI. You just use -fstack-protector, and it'll go ahead and enable it. And for those of you worried, it is a very, very small performance impact to do this. It's mostly not even worth talking about. However, there are a few things that we're not going to catch using this. We're not going to catch if, for example, your data is not on the stack.
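The canary check described above can be sketched in plain C. This is a hand-rolled illustration of the idea, not the compiler's actual code generation: the struct layout, names, and guard value are made up for the demo, while the real -fstack-protector guard is randomized at process startup and checked before the function returns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hand-rolled sketch of the check -fstack-protector inserts. On a real
   stack frame, the guard word sits between the fixed-length buffer and
   the saved return address. */
struct frame {
    char buffer[8];   /* the fixed-length buffer                 */
    uint32_t canary;  /* guard word sitting "above" the buffer   */
};

/* The comparison the compiler emits just before the function returns;
   on a mismatch, the real generated code calls abort(). */
int canary_intact(const struct frame *f, uint32_t guard) {
    return f->canary == guard;
}
```

With the real flag, you simply compile with `gcc -fstack-protector`; a write past the end of the buffer clobbers the guard word, the check fails, and the process aborts instead of returning into attacker-controlled code.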
Therefore, we've got another tool for the arsenal to handle these kinds of situations. It's called object-size checking. It'll help protect against non-stack overflows, where you're just going to go ahead and write across memory, right across the rest of your data. So if you happen to have your fixed-length buffer again there in your more global memory, what's going to happen is we're going to rewrite that string copy routine into a checking version, where, as you can see, we're passing an extra argument with the size of the buffer.
What happens is that either at compile time if we can, or at runtime, we're going to see whether or not we can determine that you're writing past the end of your buffer. And if you're writing past the end of your buffer, well, we're going to do the same thing we did previously and abort your application, because it's not safe. If we can determine it at compile time, though, it's an even nicer thing.
It will give you a wonderful warning so you can say, "Oh, I've got a fixed-length buffer here. There's a problem." Now, there are some things that, well, we're not going to catch. Like, if you decide, I hate those standard C library routines, I can write a better one myself.
I would like to mention as an aside, it took me two times to write this little for loop. So I'd really suggest you use the standard C library routines. There are a few different reasons you're going to want to do that, not the least of which it minimizes the amount of times that you'll make some mistakes.
One, we can check these calls if you pass in the correct options, which I'll give you in a second. And these are optimized by Apple for every CPU we ship. That means if you're using the standard library routines and we ship some new whippity-doo machine next week, no forward-looking statements here, I have no idea, the string copy on that machine will be optimized for that CPU already.
So it's a much better idea that you want to use all the standard library calls so that we can have all this checking so we can have the performance. Now, object-size checking is also very easy to use. It's a simple preprocessor macro that you'll throw in. This communicates into libc that you want to turn on object-size checking, which communicates to the compiler that you want to use the checking versions of lots of things. It's very handy. And as I said, it is a couple of tools in your arsenal that you can use to help make your applications a little more secure.
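A rough sketch of the machinery, in C: GCC's `__builtin_object_size` builtin is what feeds the checking variants their extra size argument. The buffer and function names below are invented for illustration; the macro and builtin are real, and the rewrite to `__strcpy_chk` happens when you build with `-D_FORTIFY_SOURCE=2` and optimization enabled.

```c
#include <assert.h>
#include <string.h>

/* Sketch of object-size checking: with -D_FORTIFY_SOURCE=2 (and -O1 or
   higher), a call like strcpy(fixed_buffer, src) is rewritten into
   __strcpy_chk(fixed_buffer, src, 32), where the 32 comes from the
   builtin used below. */
char fixed_buffer[32];

size_t size_the_compiler_knows(void) {
    /* Returns 32 here; returns (size_t)-1 when the size is unknowable. */
    return __builtin_object_size(fixed_buffer, 0);
}
```

When the checking version sees a copy larger than that known size, it aborts at runtime; when the overflow is provable at compile time, you get the warning instead.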
So we made sure that these were in 4.0. They came from a little later in the compiler, and they're also in GCC 4.2 as well. So I'd like to spend about the rest of my time up here talking a little bit more about GCC 4.2 and why you should move there as you're migrating off of 3.3 to at least 4.0 and hopefully into 4.2.
So 4.2, as I said, is our Snow Leopard default compiler. I mean, Snow Leopard is built with GCC 4.2. It has numerous front-end improvements. We've got new parsers, better C++ language conformance, and all the cool new features that you're going to hear about here at the conference, like blocks.
We also have a huge number of performance enhancements going on behind the scenes. Now, they are behind the scenes, so here's a huge laundry list of them. Now, there are tons more of them than I've listed here, which, you know, dead code elimination, inter-procedural optimizations, auto vectorization improvements, visibility improvements.
As I said, lots more. If you want the full list, what you'll really want to do is take a look at the release notes that we're shipping if you want to get more information. Now, I did talk a little bit here about visibility improvements, and visibility is usually an obscure enough topic that it might be worth talking about it in a brief introduction here.
What visibility is, essentially, is: how do I control access to my symbols within my source file, my library, or my executable? So what this means is that you don't want all of your symbols to be visible everywhere. The minute you export a function outside of your library, outside of your executable, someone's going to use it. And therefore, especially if you're like Apple, you're going to be supporting it until probably the day you die.
So, on the Mac, we have three available visibility levels. Up here at the top, we have static visibility. This means that it's only visible within a file, and that's all you have to worry about. Now, down at the bottom, we have whole program, or more to the point, external visibility. And this is where you want to take careful care, because you don't really want to have all of your functions visible from the outside.
One of the things that GCC allows is what's called hidden visibility, which basically means that it's only visible within a library, which is very handy when you don't want to write huge export lists for your C++ code that has all sorts of name-mangled functions that you get to enumerate based on guessing, mostly, whether or not you want to actually export those.
So the visibility improvements that we've done in GCC 4.2 are very, very handy for these sorts of occasions. Basically, a lot of the functions you're not going to want to export end up being things that take internal types to your application, private types. So if you mark those as visibility hidden, the compiler is just going to go ahead and automatically hide all the functions that take that as a parameter.
It's very handy. It helps minimize the amount of time you're going to want to spend looking at export lists and doing all sorts of other things. So it's kind of a new feature. It's also a new optimization because it helps us generate code a little differently. So it's very handy to use.
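In C, the attribute looks like the sketch below. The type and function names are invented for the demo; the `visibility("hidden")` attribute itself is real GCC syntax, and the automatic hiding of functions that take a hidden type is the GCC 4.2 improvement described above.

```c
#include <assert.h>

/* Sketch: an internal type marked hidden, so it is never exported from
   the dylib. With GCC 4.2's visibility improvements, functions whose
   parameters involve a hidden type are hidden automatically. */
struct __attribute__((visibility("hidden"))) cache_entry {
    int refcount;
};

/* Internal helper: explicitly hidden here for clarity, though the hidden
   parameter type would hide it anyway under the new behavior. */
__attribute__((visibility("hidden")))
int cache_entry_retain(struct cache_entry *e) {
    return ++e->refcount;
}
```

You can also set a default for a whole library with `-fvisibility=hidden` and then explicitly export only your public API, rather than maintaining hand-written export lists of mangled C++ names.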
We get an awful lot of feature requests for GCC. We get more feature requests than anyone would have time to look at and deal with. However, probably one of the most popular feature requests we get are, "I have to turn on warnings everywhere. I've got all of these warnings and I want to turn on errors, but I only want to turn on errors for this warning or that warning or this other warning over here." So, let's say you've got this example here where you want format security and format warnings. Now, format warnings are nice, they're handy, they're useful, good coding practice. Format security is a different matter entirely. It's a security warning. You want to know any time something like that happens immediately.
But you don't want to turn all of your warnings into errors using -Werror. One of the nice things that GCC 4.2 allows is that you can use -Werror= followed by the warning name. So, that will go ahead and turn only those warnings into errors. It's very helpful for when you want to turn a specific class of warnings into errors but not have to worry about it for all of your warnings.
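A minimal sketch of what that catches, and the fix. The function names below are made up for the demo; the flags themselves (`-Wformat`, `-Werror=format-security`) are real GCC options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build sketch:
 *     gcc -Wformat -Werror=format-security log.c
 * Only the format-security finding becomes an error; ordinary -Wformat
 * findings stay warnings. The dangerous pattern it errors on is:
 *     printf(user_input);   // user-controlled data as the format string
 * The safe version pins the format string and passes the data as an
 * argument: */
int log_safely(char *dst, size_t n, const char *user_input) {
    return snprintf(dst, n, "log: %s", user_input);
}
```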
Lastly, I'd like to talk to you a little bit about multi-core computing. So, Apple is shipping four- and eight-way machines today. I don't know what we're shipping next year. It could be huge numbers of cores on a chip. Intel has announced 80-core chips running around. The problem is that direct Pthread programming is hard. You run into synchronization issues. You run into deadlocks.
There are all sorts of options on OS X that enable you to do better parallel programming without having to worry about directly using Pthreads. So let's take a look at a few of them. Well, there's OpenCL, there's NSOperation, there's Grand Central Dispatch. The real problem is that these all require you to restructure your code. They are wonderful if you're writing new applications. They're wonderful if you have the time to restructure your code.
The problem is that you may not want to do that, or you may not have time. So, can the compiler help you out here a little bit? Can you throw me something here? I'd like to talk to you about OpenMP, which is compiler-directed parallelism. It's easy to use. It's an open standard, which means it's portable to other compilers that implement it. There are tons of compilers around the world that implement it. You can migrate your code that uses OpenMP to GCC 4.2 easily as well.
As the graph shows, it's also amazingly scalable. So I did a simple Mandelbrot calculation. I parallelized it with OpenMP. And as you can see, I'm getting almost linear scaling per core. This is amazing. It's very handy, very easy. Well, okay, is it easy? Example, example. Well, this is how easy it is.
So you've got your simple for loop here that's doing some sort of pixely manipulation. It's just straight-line code, normally. And you decide, well, let's parallelize it. All right, what does this little thing mean? So #pragma omp is just the standard prefix to every OpenMP directive. "parallel for" means: I've got a for loop, parallelize it for me. That's all there is to it. You've just parallelized this for loop. It will go ahead and spawn off additional threads as we go along. Well, OK.
How does it work? It really does work just like that. You're executing your straight line code here, you fork off a bunch of threads, you do all of your computation, you join them all in, and you start executing your straight line code again. And what's great is you can do it over and over again.
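The fork-and-join loop just described might look like this in C. The function name is made up for the demo; `#pragma omp parallel for` is the real directive.

```c
#include <assert.h>
#include <stddef.h>

/* One pragma parallelizes the loop. Build with gcc -fopenmp to actually
   spawn threads; without the flag, the pragma is ignored and the loop
   runs serially with identical results, which is safe here because
   every iteration is independent. */
void scale_pixels(float *pixels, size_t n, float gain) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        pixels[i] *= gain;
}
```

The threads fork at the top of the loop, split the iteration space, and join at the bottom, after which execution continues on the single main thread exactly as the slide describes.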
So you can parallelize every for loop just by a simple pragma and go through. Well, okay.
Now, with OpenMP, we're talking about profitability and safety in parallelizing loops. Profitable means you want to have a long enough running loop that it's worthwhile to parallelize. Safety means that no iteration depends on another. As an example of something where, well, we have a loop iteration that does depend on something else, we've decided to parallelize the outer loop where we have these two nested loops. So J here is shared among all these loops. Well, there's a problem here. We violated our safety property.
Now what this means is that since all of our loop iterations modify J, what we've now done is we've decided to take this nice single running loop and introduce synchronization issues. So we're not just writing in a straight line. We're writing at almost random times into this array where we have all of our loop iterations modifying all over the place, and it's corrupt data. You're going to wonder what the heck you did wrong.
Well, there's a very, very simple solution to this. J is private. All you do is say that J is private to every thread, and what this does is it makes sure that each loop gets its own copy of J. It's a very simple way of making sure that your nested loops can also be parallelized simply, easily, and safely.
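The corrected nested loop, sketched in C with invented names. The `private(j)` clause is the real OpenMP syntax for the fix described above.

```c
#include <assert.h>

/* private(j) gives each thread its own copy of j, restoring the safety
   property; the parallelized loop variable i is made private
   automatically. Without -fopenmp, the pragma is ignored and the code
   still runs correctly, just serially. */
void fill_grid(int *grid, int rows, int cols) {
    int i, j;
    #pragma omp parallel for private(j)
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            grid[i * cols + j] = i + j;
}
```

In C99 and later, declaring `j` inside the loop body achieves the same effect, since variables declared inside a parallel region are private by default.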
Now this is just a brief overview of what OpenMP really is. OpenMP basically is a set of compiler directives that you can use to parallelize your program much easier. They involve things like how do you want to parallelize, how do you share your data, how do you synchronize your work, and there's a lot more to it. It is an open standard. It's published on the web at OpenMP.org, and it provides tons of information on how you can parallelize your programs easily with just a few simple lines of code.
So, GCC 4.2 is a new major version that we're going to be using in Snow Leopard. It is our default compiler. It also means it is binary compatible with previous OS releases. This is very, very handy. All of our compilers are binary compatible. You can mix and match and switch easily. We also have these symbol visibility improvements that I was talking about that help you reduce the number of symbols you are exporting from all your programs, which also has a side effect of helping your performance by reducing load times.
We gave you a new feature, warnings that can be upgraded to errors easily, a laundry list of optimizations, and OpenMP. Now, there's a lot more to GCC 4.2 than just this. I really encourage all of you to see the release notes for more information on exactly what we've been doing. Now, in addition to all of this, we haven't really been sitting on our hands here at Apple, so I'd like to invite Chris Lattner here up on stage to talk about LLVM and all of the exciting work we're doing there. It's really great.
Thanks a lot, Eric. So I'm here to talk about LLVM. My name is Chris Lattner. I am the architect of LLVM. I also run the LLVM group here at Apple. So first I thought I would talk about what LLVM is, because it's something that's often misunderstood. The two letters VM has some very interesting connotations that sometimes confuse people. So if you boil LLVM down, it's really a framework-based approach to building compilers.
Okay? And Apple does a lot with frameworks, so this should be something that's somewhat familiar. But it basically means we have components for building compiler stuff out of. And so, for example, we have components for building optimizers and code generators. We also have components that plug into GCC, this is our LLVM GCC front end.
We have components for doing native LLVM front ends, and so Clang, which I'll talk about later, is built on that. LLVM has a lot of other pieces, like a JIT compiler and other things like that, that are used elsewhere in the system. But I'm going to be talking mostly about static compilation, just drop-in batch compilation like you do with GCC.
So LLVM, of course, is also an open source compiler. Open source is very important to us. If you go to LLVM.org, you can see a lot of information. I apologize a little bit for the web page, it's not the best organized, but it gets some of the important points across.
LLVM is widely developed, and it's widely in use by a lot of different people. We have industry people like Apple, we have research groups in academia doing interesting things. We also have a lot of individuals who contribute in their spare time.
And so if you go to the webpage, you can find out a lot of this information. So I thought I'd give a brief introduction to where LLVM came from and then we'll talk about where it's going of course. So LLVM actually started in the year 2000. When it first came out, when we first started, it was basically a research project. And it was a research project in the most traditional sense of it's broken and useless and has a lot of potential, but it's not really something that you wanted to use.
In 2003, LLVM finally became usable to a certain class of people and so we had an open source release and this was the 1.0 release of LLVM. This again wasn't super usable, right? It was basically at the point where other researchers and other people in related research groups could use it to hack on it and improve it and do things in specific areas.
However, LLVM has progressed at an incredible pace. The project has a release every three to four months typically and these releases keep improving the compiler in a lot of different ways, pushing performance and compile time and compatibility, all kinds of different things. And so it's really been making a lot of progress very quickly. In 2005, Apple started hiring people to work full time on LLVM and has made a huge commitment and investment in LLVM.
One example is that in 2007, the Clang project, which was actually started by engineers at Apple, was open-sourced and released back to the project, and it continues its development and maturing in the open source community. Finally, in Xcode 3.1, we're of course shipping LLVM-GCC, and I'd like to talk about that today.
One interesting point is that Apple is actually the single biggest contributor to the LLVM project. Apple does a lot with open source. Open source compilers are very important, and we're driving the LLVM development. We think this is a really important thing.
So today, to start off with, I want to talk about what you can use today, and what you can play with, and how LLVM is useful. So LLVM-GCC is a compiler. It's a compiler just like GCC. In fact, in Xcode, there's a drop-down compiler selection box where you can actually pick the compiler you want to use. You go to Build Settings for your application, and you can pick between GCC 3.3, GCC 4.0, GCC 4.2, and LLVM-GCC 4.2.
If you're on the command line, LLVM-GCC is a little bit well-hidden, I'm told. It's in the developer user bin directory. And developer is wherever you choose to install LLVM-GCC. But you can run it directly from the command line, just like GCC. If you're using Makefile, of course, you can replace your CC commands and things like that. And it works just like a GCC-compatible compiler.
[Transcript missing]
So one question now is, you get the basic idea that LLVM is a compiler. And Eric, of course, talked about how GCC is a compiler. But what does it really mean to be both things, right? So I thought I'd talk a little bit about how compilers work and how these pieces fit together. So if you look at a well-designed compiler, and, you know, every compiler fudges a little bit here and there, but a well-designed compiler basically has three pieces. It has the parser, the front end.
The front end of a compiler is the piece that is really in charge of looking at your source code. It parses the code. It analyzes it for correctness. It determines what it means. It builds a syntax tree, right? And this is the part that emits warning messages and errors.
[Transcript missing]
So if we get back to LLVM and GCC, GCC is designed very much like this picture. It has a front end, which uses the GCC parsers, has an optimizer, and has a code generator. So LLVM-GCC is basically a modification of GCC, where we take the GCC optimizer and the code generator and replace them with pieces from the LLVM project. Now LLVM is designed to be very modular, and so this is a pretty straightforward, well, relatively speaking, fairly straightforward thing to do.
But basically this means that now we have the same front end, the same parser as GCC 4.2. This means that all the logic for instantiating templates and all the name resolution and Objective-C and all that stuff is the exact same between GCC 4.2 and LLVM-GCC 4.2. We also reuse the same runtime libraries: libstdc++, libgcc. All that stuff is exactly the same.
So furthermore, we've worked really hard to make the back end and the optimizer as compatible with GCC as possible. What this means is that you can actually take two different files and on a file basis mix and match your code with GCC, and all the different versions of GCC are compatible, and so you can mix and match LLVM-GCC compiled code with GCC compiled code on the system.
Now this is actually really important because even if you're not mixing and matching on object file boundaries, you're very likely to be calling into code compiled by some other compiler. And so if you, you know, build your application with LLVM-GCC, you really want to be able to call into Foundation, which may be compiled with a different compiler.
And so the output of the back ends are very, very compatible between these two compilers, and this is true across all three supported architectures. So the output of these compilers are these object files, represented with the beautiful dots on our slide, and then they go into the linker, and that produces your executable. I guess we're building Xcode in this picture. But so the question then may be, okay, we've done all this work. We've replaced this optimizer. We have a new code generator.
Well, what is the benefit of doing this? Well, so basically, if you're talking about the optimizer, the optimizer is really in charge of, or is capable of impacting three things, right? One is, how good is the code coming out of the compiler? How optimized is it? How fast does it run? The second is, how fast does the compiler itself run, right? How fast do we get code out of the compiler? The third is, well, is there room for new features? Right, and everybody loves features, and, well, to some extent, I guess.
So let's talk about features to begin with, and the one biggest feature that we have in LLVM-GCC is called link time optimization. Link time optimization is something that a lot of people have been asking for for a long time, and it's very straightforward with the architecture of the LLVM compiler.
[Transcript missing]
So, we've been talking in very vague terms about how compilers are good and how performance is good. I want to talk about a couple of specific examples, okay? H.264 is an incredibly important video codec at Apple. It's highly optimized. And so what we're going to do is we're going to look at the H.264 decoder on the Mac compiled by both GCC and LLVM-GCC at different optimization levels. And across the bottom here, we'll have bars for O2, O3, and O4 and compare the performance across here. So, of course, with video decoding, higher is better.
And so if we look at O2, for example, what we've basically done is we've taken H.264, compiled with LLVM-GCC 4.2, also compiled it with GCC 4.2 and compared the performance. And so in this case, you can see that LLVM-GCC is actually producing code at O2, which they're both compiled at O2, that runs almost 7% faster.
So this is actually a pretty impressive result when you consider the fact that H.264 has been highly optimized by hand and tuned for GCC. It uses hand-vectorized code using the SSE intrinsics in xmmintrin.h, for example. And people have paid very close attention to the performance of the code coming out. So 7% is pretty good.
If we look at O3, actually, it turns out that the gap widens. So O3 optimizes harder. It enables new features like inlining. It enables loop unrolling. It enables some loop transformations, things like that, right? The tradeoff is you say, "I'm okay with bigger code and slower compiles as long as I get faster code." That's roughly what it means.
And in this case, it's very true. H.264 is a large C++ source base, or the H.264 decoder, and so inlining is very important. And all these kinds of things are very important for it. And so here you see that both -- the performance of both compiled versions of the program go faster, which is good.
But actually, LLVM-GCC increases its lead a little bit, and now it's up to 9% faster. Another interesting detail about this is that in this case, O3 with LLVM-GCC actually runs -- or O2 compiled code with LLVM-GCC actually runs faster than O3 compiled code with GCC, which is also a nice result.
So let's talk about O4. O4 turns on optimization across files. So in this case, we find that O4 is a small incremental improvement over O3 on this code base. Again, this is highly optimized code, and so we think this is actually a really good result. They've already done most of the things that link time optimization gets by hand. So they've moved inline functions into headers, and they've done all kinds of stuff to really tune their code.
And so getting another 3% improvement, we think, is actually really good. And again, going from 65 to 66.8 frames a second is a big deal in the video decoding world, because you're potentially decoding movies to do editing, and you're spending a lot of time sitting with a lot of content, and the faster you can decode video, the faster you can get your jobs done, particularly if you're editing.
So the question, of course, is, okay, this is one axis of goodness for compilers, right, which is how fast the code comes out. Another axis is compile time, and if it's taking us, you know, ten times as long to get performance, it's not necessarily as interesting. So let's look at the same code base, H.264. Again, this is a large C++ code base. This is real-world code. It's been hand-tuned. It's had a lot of history behind it. It's real-world.
So instead of looking at performance now, I'm going to look at compile times across optimization levels, and we'll see how, you know, O3 takes longer than O2, for example. And so one of the things that we consistently see with LLVM-GCC is that if you turn on optimizations, LLVM-GCC compiles code approximately 30% faster than GCC 4.2. And this is a big deal.
So this is not a single case either. We see this consistently across a vast number of code bases. And again, when you're optimizing, you spend a lot of time in the optimizer. So the time you spend in the optimizer is a very big impact on this. At O0, you're not spending much time in the optimizer, so it doesn't really matter how fast the optimizer goes.
So let's look at O3 now. O3, of course, takes longer than O2, because we turn on inlining and we optimize harder. So at O3, both compilers slow down. LLVM-GCC does slow down too; it takes 20 seconds longer to compile this code than at O2. But it's still about 30% faster than GCC; in fact, it's 36% faster in this case. And so it's also very good.
So the big question now is, OK, we have O4. O4 involves a huge amount of work for the compiler. So at compile time, we're doing basically as much optimization as we're doing at O3. Because it's O3 plus work in the linker. We don't do code generation at O4 during compile time. We do it in the linker.
But we have to read in the entire application, which, again, is a large C++ source base, into memory, optimize across all these files, run code generation, and produce output. So the question is, how long does it take to do all this extra work? Well, when we set out to find these numbers, we were actually really surprised.
So, this is not something you get for free, unfortunately. In general, we see that O4 is about 20% slower than O3 on most source bases. And so, across a lot of source bases, we find that O4 with LLVM-GCC actually takes about the same amount of time as O3 with GCC.
[Transcript missing]
So, in summary, I want to wrap up what LLVM-GCC is. The takeaway points are it's very compatible with GCC. We've stressed this really hard. It's very important for us to be able to take an existing code base and drop in LLVM-GCC and just build it. It's important because we don't want to have to tweak the source code to change, you know, picky details in the parser.
We don't have to change our make files to work around differences in options and all that kind of stuff. Compatibility is key. Of course, LLVM-GCC supports the languages that GCC supports, which is important. And we support code generation for the same set of processors. The one missing one is PowerPC 64.
So across a large number of source bases, we find that LLVM-GCC compiles code significantly faster than GCC does when optimizations are enabled. And we typically find about 5% to 10% better code generation at one specific O level, so if you compare O2 to O2 or O3 to O3. So again, LLVM-GCC has a major new feature: link time optimization. And with link time optimization, you can get significantly better performance than with O3, for example.
And a lot of the benefit, the runtime benefit, the code generation benefit, depends on your source base. If it's heavily optimized or maybe only consists of one file, link time optimization won't help you as much. If it's a code base that's seen a lot of evolution, it's been hacked on by lots of different people, then link time optimization can sometimes be a 20%, 30% speedup. And we've seen that in real world code. So LLVM-GCC, the big takeaway picture is that we're focusing on performance. Performance, both of the generated code and of the compiler, and we really want to keep pushing this forward, and we'll continue to do that.
So LLVM-GCC is something that exists today. You have it in Xcode 3.1. It exists on Snow Leopard. I want to talk about something that doesn't exist today. Well, it exists, but you don't have it, unless you go to the website. So I'll give you a little teaser on what Clang is. So the problem that we've found with LLVM-GCC is we can do a lot of work to speed up the optimizer.
And we really care about compile times at Apple because we have huge code bases. And we have huge code bases that we try to build very frequently because lots of things are changing, right? And when you're building huge code bases really frequently, you care about compile time. Furthermore, we really care about O0.
At O0, which is usually what you build your code at to debug, if I'm hacking on my code, I want to change something, compile it, run it, debug it, fix something, compile it, run it, debug it, over and over and over again, the speed of the compiler really matters. And at O0, no time is being spent in the optimizer, and so there's not really much you can do by changing the optimizer.
One of the goals that we have is we want to build a new front end. And it's aiming at solving a lot of problems that we've seen. One, the basic idea we want to do, the basic idea of the project is we want to take and build a new C++ front end for LLVM.
So we're going to be as compatible as we can with GCC. Now here, this will be a new front end, so it will be a completely new parser. So there may be bugs here and there. But we'll try really hard. And if you file bugs, we'll fix them. So the big architectural difference between a Clang-based compiler and a GCC front end-based compiler is that with Clang, we can embed it into Xcode.
Well, what does that really mean? Well, embedding into Xcode has a lot of benefits. It means that we can improve Xcode. It means that Xcode can use Clang to do indexing or refactoring or other operations like that. It also means that embedding into Xcode, you can do a lot of interesting optimizations for compile time. So you can do caching across compiles. When Xcode invokes GCC, every time GCC starts up, or LLVM-GCC, it has to relearn everything about the program. It needs to learn where all the header files are, it needs to search the hard drive to find all those pound includes.
And as soon as it's done with one .c file or .m file, it goes on to the next one, and it has to relearn all that stuff, right? Well, there's no reason to do this when you're compiling a large code base, right? The IDE has all that information. We can cache that. We can do a lot more with that.
A side note is that even though we're going to embed into Xcode, we still really care about command line use, and so we'll continue to support that. So Clang, as I mentioned before, is an open source project. It has a website, clang.llvm.org. It has a lot of information about our motivation, what we're trying to do. You can get the code there, you can play with it.
I really want to talk about four major features of Clang today. So major feature number one is that when you compare Clang head-to-head against GCC, against time spent in the front end, which is -fsyntax-only, effectively, right now we see that Clang is about two to three times faster than GCC.
So we see two to three times speedups across a broad number of C and Objective-C based applications. So this is real world stuff parsing huge source bases, for example, Xcode. So the sub-bullet there is actually really important, though, because this is, again, head-to-head comparison doing exactly what GCC does. So this is straight out engineering improvement in the front end.
Of course, we don't want to run exactly the way GCC does, because that's very limiting. We actually want to embed this thing into Xcode. And when you start taking advantage of some of the information you have, we think we can push this significantly farther and get several multipliers of speedup in there. And we'll have to see. But this is front-end time. This is a project in development. We think that there's a lot of headroom left.
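The head-to-head front-end measurement described above can be reproduced in spirit with -fsyntax-only, which stops after parsing and semantic analysis, with no optimizer and no code generation. This is a sketch using the system compiler and a made-up source file:

```shell
# Sketch: isolating front-end-only work with -fsyntax-only.
# The flag stops after parsing and semantic analysis, which is what
# the Clang-vs-GCC front-end comparison in the talk measures.
cat > big.c <<'EOF'
#include <stdio.h>
int square(int x) { return x * x; }
EOF

# Wrap this in `time` (e.g. `time cc -fsyntax-only big.c`) to compare
# front-end speed of two compilers on the same file.
cc -fsyntax-only big.c

# No object file is produced; only diagnostics are emitted.
ls big.o 2>/dev/null || echo "no object file produced"
```

On a real comparison you would run the same command once per compiler over a large source base, which is what the two-to-three-times figure refers to.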
Second, we really care about the user interface of the compiler. So how often have you gone to compile something, and the compiler basically spits out a big mass of error messages, and you have no idea what it's talking about, right? Well, the error messages and the warnings coming out of the compiler are basically its user interface, right? The user interface is kind of a nasty one, because it's usually through the command line.
But if you compile code with Clang today, if you check it out from the open source repository, you'll see that it gives really good error messages. And the error messages not only are worded well so that they tell you what's going on, in this case, you know, that you have some invalid operations, it actually highlights the line of source code where the error occurs, and then pinpoints the location of the error and the sub-expressions involved. And what this means is that in this case, there's four pluses, for example, on this line. It identifies the exact plus that is the problem, and it identifies the sub-expressions.
This is really important and very useful when you have precedence problems and other things like that. In Xcode, you can imagine that there are bubbles around the sub-expressions and other nice, pretty things, but we'll get there someday.
So, which is the third bullet, Xcode integration, right? So we really care about Xcode. And furthermore, what we really want is we really want to build good programmer tools for C++, Objective-C, Objective-C++ programmers. And this is really what's motivating all this work. We really want to build the first class UI for developing code, and that is Xcode.
Fourth, and this is something that we'll talk about soon, is a brand new tool that's in development for automatically finding bugs in your code. And so this is just a teaser on Clang. We'll have a full session on LLVM, actually; the session is Thursday at 5 o'clock. That session will go into a lot more detail about LLVM-GCC, including how link time optimization works, how to use it, and examples of optimization. It also includes details on Clang: more performance information, more detailed compile numbers, things like that, and more of the architecture and design.
We also talk about a new bug-finding tool, which is in development. This bug-finding tool is really designed to take advantage of the information you get from a compiler as it parses code to analyze your code much more deeply and point out problems that you have. For example, memory leaks, things like that.
So with all of that, it's a very brief introduction. I want to wrap up the session with saying where we are, where we're going. So there's a lot of compilers here. We have GCC, GCC 4.0 on the iPhone. If you're building for the iPhone, your choice is simple. You have one compiler, GCC 4.0. It's a great compiler. This compiler has been used to build all the source code for the iPhone.
On the Mac, you have more choice. 3.3 you really want to get off of, so I'm not going to talk about that. 4.0 is a very stable compiler. 4.0 is most important if you're building applications that have to run on Tiger machines. If those apps have to run on Tiger machines, unfortunately, 4.0 is your only choice there. However, if you're willing to develop your app for Leopard and later, you have other good choices. So LLVM-GCC 4.2, GCC 4.2 are two really good compilers. They're available. They have many new features over 4.0, and I'd love for you to try them out.
[Transcript missing]