Developer Tools • 54:20
Explore improvements in Mac OS X Leopard compiler technologies, and get new insight into the compiler roadmap. Learn how new features in the GCC compiler can make your applications faster and more secure. Gain an understanding of how to take advantage of these features in your own projects.
Speakers: Geoff Keating, Chris Lattner
Unlisted on Apple Developer site
Transcript
This transcript has potential transcription errors. We are working on an improved version.
Good afternoon and welcome to session three seventeen, Taking Advantage of Compiler Advances. This is the compiler talk this year. My name is Geoffrey Keating. I am the GCC team lead and I'll be doing the first part of this talk. So in this talk there are three things that we'll be telling you about.
The GCC 4.0 in your Leopard seed today has some new features over the GCC 4.0 in Tiger, and we'll be telling you about those. Then we'll be telling you about a new compiler technology that we're developing, which has a new version number, GCC 4.2, and finally we'll be telling you about an even newer compiler technology called LLVM. We'll be telling you what it is and what you can do with it.
( applause )
We have a fan, excellent. So first of all, for those who are relatively new to developing on the Mac, we thought we'd just introduce how the developer tools are structured. I talked earlier about three different things, GCC 4.0, 4.2 and LLVM, and I thought I'd tell you how they all integrate into Xcode. So the place to start for development on the Mac platform is Xcode. You'll find it in the Developer folder on your developer seed with this icon.
Internally, Xcode uses a number of different possible compilers to take the source code that you enter and convert it into a working application. For C, C++, Objective-C and Objective-C++ applications, the default compiler that's used in Leopard, and that was also used in Tiger, is named GCC 4.0. But Xcode is not tied to a single compiler. For example, in Leopard you can also use GCC 3.3, an older version of the compiler. Xcode in Leopard also permits you to add third party compilers like the Intel compilers.
In this talk we'll be introducing two new possibilities that aren't available yet on your Leopard seed, but that we're developing. In the future we may produce GCC 4.2, which will be a new version. It will be a separate compiler that will be able to be used in parallel with GCC 4.0. You can actually set up Xcode to compile one file with 4.0 and then the next file with 4.2. We'll also be talking about LLVM, which, like 4.2, is an additional compiler on the system.
You can also use all of these compilers with other build systems. For example, if you have a build process that uses make and possibly Emacs or vi, you can still use all of the compilers from the command line. So first of all I'll talk about the improvements we've made to GCC 4.0. GCC 4.0 was, as I said, the default compiler even in Tiger, but the GCC 4.0 in Leopard and in Xcode 3.0 has some significant improvements made to it.
So one change that we made is the default debugging format in Xcode 3.0. We've had a new debugging format called DWARF available since Xcode 2.3, but in Xcode 3.0 we're switching it on by default. So we're kind of hoping most of you won't notice, but there are some significant improvements that this brings.
One thing that it improves is the size of final executables. The previous debugging format, called STABS, used to include all the debugging information, through the link process, in the final executable. So the debugging information generated by the compiler in the object files would be copied through the link phase into the final executable, and then the debugger would read it from there.
The problem with this is that it really made the executables much larger than they needed to be. STABS was typically larger than the executable code itself, so the executables would become truly huge in some cases. We had a number of technologies designed to avoid this, but DWARF makes the whole process redundant. Because, with DWARF, the debugging information, although it's still placed in the object files, is no longer copied to the final executable.
Instead, just a small amount of indexing information is placed in the final executable, and the debugging information is read from the .o files when you debug your executable. The benefit of this? Well, the executables become smaller when compiled with debug information, but more importantly, link time is dramatically reduced because this extra information doesn't need to be copied around. So you should notice significantly faster turnaround times.
If you don't wanna keep the object files around, we have a tool called dsymutil that lets you take the DWARF information from all of the object files and create a separate file, called a .dSYM file, which you can then use even with a stripped executable for debugging purposes.
DWARF also has a number of other benefits. For example, you get better C++ debugging; in particular, DWARF can properly represent namespaces, which STABS couldn't. You'll find the .o files themselves are also smaller. You'll find that inlined subroutines now have almost transparent debugging: you can actually debug an inlined subroutine as if it wasn't inlined. That last feature is new with Xcode 3.0. Again, DWARF has been available since Xcode 2.3, but now we've made it the default. STABS is still available as an option for the moment.
Another collection of features that we've added relates to security. There's a continuing emphasis on security in Leopard. Where Tiger was pretty secure, Leopard will be even more secure. We plan for this to continue happening until we either reach a perfectly secure operating system or we run out of time.
( laughter )
So there are some things that the compiler will now do to help you write more secure code, to basically catch security problems. But where should you turn them on? Well, the places where you need security are where your application listens to the network, or reads files that might be sent through email or through a web browser.
We know we tell people that, you know, you really shouldn't just click on a file that someone's just sent you in email, and most of the time that works, and sometimes they click on it anyway. You also need to take extra care to be secure where your application crosses privilege boundaries. An example of that is any place where your application runs as root or where it installs software.
So as I said, GCC 4.0 in Leopard provides new ways to avoid some common security bugs that you might have accidentally placed in your application. You should make sure to switch these on in places where you wanna take extra care to be secure, and possibly switch them on all the time just in case, because you never know when you might need it.
So here's an example of a security bug. Possibly the smallest program that can have a really serious security bug. We have a simple main program that takes some input from the user through its arguments and attempts to copy that information into a fixed size buffer, and the problem is that if the information from the user is too large, well, the buffer isn't that large, so it won't fit.
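As a rough sketch, the kind of program being described looks something like this (the 32-byte buffer size is illustrative):

    #include <string.h>

    int main(int argc, char *argv[]) {
        char buffer[32];             /* fixed-size buffer on the stack */
        if (argc > 1)
            strcpy(buffer, argv[1]); /* no length check: argv[1] may not fit */
        return 0;
    }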
So how does this become a security problem? Well, let's see what happens when we trigger the bug. When this is compiled and run, it sets up a frame on the stack that contains a number of things. It contains local variables, including the fixed size buffer, and there's other information on the stack like saved registers and, importantly, the return address. Since this is the main program, the return address will normally point back to code that calls the exit routine to finish the application.
So when this program is run normally, it'll copy some information into the buffer. If that information doesn't fit, it'll overflow. It'll continue copying up the stack, overwriting other local variables, the saved registers, the return address and the function arguments. Someone malicious could arrange that the information copied into the buffer changes the return address to point to a location where code that they want to execute is available, and that code could do something that you weren't expecting as the author of this program. So GCC 4.0 in Leopard now has a technology which can at least prevent this particular case. What we do is we insert a canary between the local variables and the saved registers.
( laughter )
So the way this works is this. Under normal use, nothing changes. You write into the buffer and everything fits. If the buffer overflows, it will write up past the canary, over the saved registers and onto whatever the attacker was hoping for.
The compiler generates code, as the function returns, to notice whether the canary is still intact or whether it has suffered this horrible fate. If the canary has been overwritten, the code at the end of the function will then terminate execution.
It'll make sure that the saved registers or the return address, which might have been compromised, are not used. You can switch this functionality on with the -fstack-protector flag. At the moment we don't have any UI for it, so you have to add it to the Other C Flags setting in Xcode. However, this catches that particular case.
There are cases that this won't catch. For example, it relies on the buffer being a local variable. If the buffer is in the fixed data section instead, so if it's static like in this example here, the stack canary won't be overwritten. So stack canaries won't catch this particular problem.
However we have another technology that we've added in Leopard, another security technology that will catch this particular case. The problem here is that when we call the operating system string copy routine, there's no way for the operating system to know where the buffer ends and where other information like the remainder of the stack frame or the canary begins.
So in GCC 4.0, we've added a flag that lets the compiler change the string copy routine into a new function, string copy check (__strcpy_chk), and that lets the compiler pass on information about what the object size is. So here, for example, it will pass on the size of the buffer: it will tell the string copy check that it should not copy more than 32 bytes.
If it does appear to be copying more than 32 bytes, it'll stop and, again, it'll terminate your program. So you'll notice a trend with these: none of them really hide the bug, and the usual response to a detected security problem is that the application gets terminated. So you should only really use them where terminating the application is better than what might otherwise happen.
But again, this has its limitations. Object size checking is limited to checking only system calls or calls into the system libraries. So if you write the string copy yourself, object size checking can't catch that. Of course, one thing that would have helped here is to not write it yourself; the system string copy routine is likely to be much faster than this code that I've written here. So to switch on object size checking, you define a preprocessor macro, _FORTIFY_SOURCE. There are two levels, one and two; just use two.
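As a sketch of how this looks in practice (the checked function name reflects how GCC implements the feature; buffer and sizes are illustrative, and the rewrite happens when optimization is enabled):

    /* Build with -D_FORTIFY_SOURCE=2 */
    #include <string.h>

    static char buffer[32];  /* fixed-data buffer: stack canaries can't help here */

    void copy_input(const char *input) {
        /* The compiler knows the object size of buffer and effectively
           rewrites this call into __strcpy_chk(buffer, input, 32), which
           terminates the program rather than copy more than 32 bytes. */
        strcpy(buffer, input);
    }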
Another security feature that we've added in Leopard is address space layout randomization. You'll remember that the earlier example I showed you relied on overwriting the return address to point it at code that would do whatever the attacker wanted. To make this harder, we've now randomized the addresses of code so that the attacker can't so easily guess where the code they wanted to run might be.
So it works by loading executables and libraries at random locations in memory. The randomization for system libraries only varies between machines, so for a given machine the system libraries will always be in the same place until, I guess, you reinstall, and a few other things like that. But applications and user libraries differ every time you run the program.
So one detail about this is that debugging will all still work. All your variables will be found by name, etcetera. But if you're in the habit of, for example, writing down an address from the screen on a Post-it note, well, the computer can't update your Post-it note when you quit your application and restart it, so you might find it desirable to switch this off during debugging. But leave it on in your released application.
Oh, and I guess I should say how you switch it on. You have to add the -fpie (position independent executable) flag, and also pass that flag to the linker. You also need to make sure that you are not generating position dependent code. The default is not to do that, but it might be switched on depending on what template you used.
Another feature, another security feature that we've added in Leopard, so this is 4.0 now, is code and data separation. It's no longer the case that you can just copy code into memory that you've allocated with malloc and then run it. You'll get a protection fault if you do that.
This is to prevent a situation where the attacker provides his own code in memory in your application and then tries to return to that. If your application does in fact generate code at run time, you should use the mprotect system call to make just that place, just the part of memory where you generated code, executable. This doesn't need to be switched on; it applies to all 64-bit programs in Leopard.
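A minimal sketch of that mprotect dance for runtime-generated code (error handling omitted; len should cover whole pages):

    #include <sys/mman.h>
    #include <string.h>

    void *make_code_executable(const void *code, size_t len) {
        /* Allocate writable, non-executable memory and copy the code in. */
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_ANON | MAP_PRIVATE, -1, 0);
        memcpy(buf, code, len);
        /* Flip just this region to executable before jumping to it. */
        mprotect(buf, len, PROT_READ | PROT_EXEC);
        return buf;
    }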
So in summary, four new security features: stack canaries, object size checking, address space randomization, and code and data separation. They work together to make it harder, or in some cases impossible, to exploit security bugs. But none of them is a perfect solution. There are certainly security bugs that you can have that nothing here will catch, that even the compiler can't catch in principle. But they will help you to catch many of the more common ones. Unfortunately, they're also all not the default. You do need to switch them on at the moment, and they have to be switched on individually, except for code and data separation.
Okay, so that was security. Another change that we made in Leopard has to do with the way deployment targets work. So the idea here is that you're on the left, you're writing your wonderful application with Xcode, and you have some users running various versions of the operating system that you wanna be able to give your application to. So in Leopard, the default is that if you write an application in Leopard, by default it runs only on Leopard.
So okay, well, as a commercial developer you might not be terribly happy about this, because at the moment there are zero people running Leopard. Well, not counting the people in this building. And you might feel that even when Leopard does get released and nearly everybody moves to it, the remaining people are still people you would like to be able to sell your application to.
On the other hand, you might feel that the new features in Leopard, like Core Animation, like the security features I just talked about, are sufficiently interesting that this is what you want: that you can ask your customers to upgrade to Leopard to get all of these cool new features. However, should you decide that you would really like the people running on Tiger to be able to run your application, what you need to do now is change a setting in Xcode.
If you set the Mac OS X Deployment Target setting to 10.4, then this will mean that people running 10.4 and higher should be able to run your application. Obviously you can't use any 10.5 features in your application when you're running on 10.4, although you can use weak symbols to use new functionality by detecting that you're running on a 10.5 system.
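A sketch of that weak-symbol idiom (the function name here is hypothetical):

    #include <stddef.h>

    /* A hypothetical 10.5-only API, weakly imported so the program
       still loads on 10.4, where the symbol resolves to NULL. */
    extern void SomeLeopardOnlyFunction(void) __attribute__((weak_import));

    void do_new_thing_if_available(void) {
        if (SomeLeopardOnlyFunction != NULL)
            SomeLeopardOnlyFunction();  /* running on 10.5 or later */
        /* else: fall back to the 10.4 code path */
    }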
If you wanna go all the way back to 10.3.9, GCC 4.0 on Leopard continues to support that. You simply need to set your deployment target to be 10.3. This catches the vast majority of the installed base of Mac OS X. For still earlier versions, life becomes much more complicated than a simple setting, and we can talk about that afterwards; there's not quite time in this talk to discuss the permutations. However, these three, 10.3.9, 10.4 and 10.5, cover the vast majority of users.
Okay, and one more thing: we don't have time in this talk to explain all the wonderful Objective-C 2.0 features that we've added in the Leopard compiler. For those, you would need to go to the Objective-C 2.0 talks that were earlier in the week. But hopefully they will become available later on.
Or hopefully you already went. So in summary, the GCC 4.0 changes for Leopard. We added a collection of new security features, some of which you should probably switch on. We've changed the default setting for deployment targets. We've changed the default for DWARF and we added Objective-C 2.0 and 64-bit Objective-C. So those were the GCC 4.0 changes. We're also working on a new technology, GCC 4.2, that's not yet available in your Leopard seed.
It will be a new major version of the compiler, so it will install in parallel with GCC 4.0. So people who are using GCC 4.0 now will be able to continue to use it for some time in the future. In GCC 4.2, we're adding a bunch of new features. In particular, we're adding a new version of the standard C++ library.
We're also making a number of bug fixes, including some that might break existing code, and in combination with that, we're going to attempt to avoid adding such new features to GCC 4.0. So the plan, which is not final yet, is that GCC 4.0 will become the stable compiler and GCC 4.2 will be where the new development happens. All this of course is subject to change; we haven't shipped a 4.2 yet and there's still time.
One of the significant new features that we're adding in GCC 4.2 has to do with multiprocessing support. For those of you who were in the Intel compiler talk, you already know about OpenMP, but there's still new information in this part of the talk. So for those who weren't there, I should briefly introduce what OpenMP is. Mac OS X has a large number of libraries supporting multiprocessing.
At the very basic level you have the Mach kernel, which supports Mach tasks and ports and such. Layered above that we have, as one possibility, the UNIX interprocess communication infrastructure: for example, fork and UNIX pipes. And on top of that, what you might find are more friendly and more structured approaches to multiprocess communication. We have the Cocoa inter-application communications layer and we have Apple events.
If you want to do threads, so multiple threads of execution within the same process, we provide the POSIX thread layer, and we provide Carbon threads for people who are still using Carbon, but what we recommend is Cocoa threads: a nice, user-friendly way to set up multiple threads of execution in a single process, pass messages between them, set up locks and the like.
All of these, though, are fundamentally based on message passing. You set up a thread, the thread, you know, passes a message to another thread, acquires a lock from another thread and so on. If your code is already basically procedural and you want to add multiprocessing support to it, these are somewhat more complicated than the new alternative that we're suggesting: OpenMP. So let me talk about OpenMP, who might want to use it and what it's useful for.
Here is a simple example of some procedural code. It's a simple loop that adds two arrays together. You could parallelize this with POSIX threads or the like, but it would be complicated. You would have to start off a bunch of threads, and you'd have to set up locks to make sure the threads don't overwrite each other. With OpenMP, you can add parallelism to this simple loop like this.
Here we've added just two lines of code. One of them is the omp for pragma; this basically says, please take this for loop and parallelize it. The other one is infrastructure: it informs the compiler about what things are really parallel in this loop and what aren't. For example, here we've said that the arrays a, b and c are shared between all of the threads, but that the loop counter i is to be private. We've also passed information about how many threads to use (we've basically said we don't care) and how to organize the data (we've said to divide it up into chunks).
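As a sketch, the loop and its annotations might look roughly like this (array names are illustrative; build with -fopenmp):

    void add_arrays(float *a, const float *b, const float *c, int n) {
        int i;
        /* Run the iterations across a team of threads: a, b and c are
           shared by every thread, each thread gets its own private i,
           and schedule(static) divides the iterations into chunks. */
        #pragma omp parallel for shared(a, b, c) private(i) schedule(static)
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }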
So what this does is it will cause the compiler to automatically generate code which creates a bunch of threads, runs this computation in parallel and then waits for them all to finish. To do this with POSIX threads would require significantly more code. So the basic model of OpenMP is that it's a fork-join model. You start off with sequential code, the code splits off into a bunch of threads, runs some computation, and then when the computation is done, all the threads join together, sequential code resumes, and then maybe later on you do the same thing all over again.
OpenMP also supports more complicated forms of parallelism; all the threads don't necessarily have to be running the same computation. You can set up barriers and locks to synchronize, so that at some point in the middle of the parallelism, all your threads have a single point where they're executing.
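For example (a hedged sketch with placeholder functions):

    void produce_partial_results(void);   /* placeholders for real work */
    void merge_into_shared_state(void);

    void work_in_parallel(void) {
        #pragma omp parallel
        {
            produce_partial_results();    /* threads run independently here */

            #pragma omp barrier           /* every thread waits at this point */

            #pragma omp critical          /* a lock: one thread at a time inside */
            {
                merge_into_shared_state();
            }
        }
    }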
And I don't plan to go into the details of OpenMP here. But what you do need to know is which compilers support it and on what systems it works. So OpenMP will be supported in GCC 4.2. However, the version that's supported there is not compatible with the version that the Intel compilers support.
So you should really use a single compiler per shared library for your multithreaded code. You should probably avoid exporting OpenMP control variables from shared libraries, because the ABIs aren't consistent. You can use OpenMP only if you're using the Leopard SDK, but you can still use it to target 10.3.9 and higher through the deployment target mechanism I referred to earlier. And there is one feature that's commonly supported in OpenMP that GCC 4.2 does not have yet.
You remember earlier that I said that the variable i was to be distributed per thread; that worked because i was a local variable. At the moment, GCC 4.2 doesn't do that for file scope variables. So for more information, if you wanna know how to program in OpenMP, for example, go to www.openmp.org, where there's a full description of all that OpenMP can do. There are also numerous tutorials on the web; just type OpenMP tutorial into Google and you'll get plenty.
Okay, so that was OpenMP. Now let's talk about the improvements that GCC 4.2 is going to make to visibility support. So for those who are new to the platform or haven't had to deal with shared libraries a lot: by default, a C++ program behaves the same way whether it's a single monolithic application or whether you split it into multiple shared libraries. This is different on Mac OS X compared to, say, Windows.
So to avoid collisions between multiple shared libraries that might happen to use the same name for things, you should use the C++ namespace feature. This is a good idea even if you're not using shared libraries at all, of course. However, there is another alternative. GCC 4.0 supports, and GCC 4.2 improves, an attribute: the hidden visibility attribute.
This sets up a new layer of what we call linkage. Just as you can use the keyword static to say that a variable is restricted to a single .o file, or the extern keyword to say that a variable should be visible throughout the entire application, there's an intermediate layer where you can say this is to be hidden, which means that the variable is visible only inside the shared library or application where the variable was defined. And in C++, the hidden attribute also works for types. So if a type is supposed to be visible throughout the entire application, don't say it's hidden. But if you do want it to be limited to just a single dylib, then you use the attribute on that.
The benefit of doing this, aside from the normal code cleanliness, is that it improves performance, just as static does, because the loader in Mac OS X doesn't need to deal with all of these symbols when your application starts up. It also prevents accidental use of internal symbols.
So we're making improvements to the visibility support in GCC 4.2. The basic concept is: if you can't use a symbol, that is, the symbol relies on something that isn't visible outside this translation unit, then the compiler will hide it, so that you get the performance benefits without having to worry about putting explicit attributes on anything.
So for example here, we have a structure that's marked as hidden. We have a function that takes that structure as a parameter. Since the structure is restricted to just this shared library, then the function can also be automatically restricted because nothing from outside should be able to use it.
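In source form, that example might look like this (names are illustrative):

    /* This type is visible only within the shared library defining it. */
    struct __attribute__((visibility("hidden"))) Widget {
        int id;
    };

    /* GCC 4.2 sees that Widget is hidden, so nothing outside the library
       could ever call this function; it is automatically given hidden
       visibility too, with no attribute needed here. */
    void process_widget(Widget *w);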
This is particularly useful in combination with templates and the standard C++ library. So for example, if you create a standard vector of a structure that is only used in your shared library, the 4.2 compiler will automatically make sure that all of the components of that instantiation, the instantiated functions and variables, are themselves automatically hidden: since the structure isn't visible outside your shared library, the instantiation doesn't need to be either. But if you instantiate a template on some type that is not hidden, some type that is visible everywhere, then the compiler will ensure that it in fact continues to be visible everywhere, unless you tell it otherwise.
Another improvement that we're making in GCC 4.2 is that the hidden attribute can now be applied to namespaces, and when you do that, the obvious thing happens: everything inside the namespace is hidden. This is particularly beneficial if you have a namespace containing the internal implementation details of your shared library. You can just mark that namespace as hidden and then everything that relates to it will also automatically be hidden.
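A sketch of the namespace form (GCC's attribute placement on a namespace; names are illustrative):

    /* Everything declared inside this namespace gets hidden visibility. */
    namespace detail __attribute__((visibility("hidden"))) {
        extern int helper_count;
        void helper();
    }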
Another collection of improvements that we're making doesn't involve the hidden keyword at all. Anonymous namespaces don't have a name, that's why they're anonymous, and therefore you can't use them from outside the translation unit where they're defined. What GCC 4.2 will do is notice that types, variables, functions and the like declared in an anonymous namespace, first of all, will all be made static, but also anything that relies on them will also be made static.
So if, for example, you declare a structure in an anonymous namespace and then you instantiate, for example, a standard vector of those structures, nothing in that vector instantiation needs to escape the object file that it is compiled into. So all of that will be automatically made static. That becomes a significant performance improvement, both in terms of application load time and the speed of linking.
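For example (a sketch; the type is illustrative):

    #include <vector>

    namespace {                   /* anonymous: unusable outside this .o file */
        struct Node { int value; };
    }

    /* Because Node can never escape this translation unit, GCC 4.2 can
       keep the entire std::vector<Node> instantiation local to this
       object file as well. */
    static std::vector<Node> pending_nodes;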
Another new feature we're adding in GCC 4.2 is the ability to turn warnings into errors. Now, you could always turn all of your warnings into errors by using the -Werror command line flag; in GCC 4.2 we let you turn specific warnings into errors. So for example, if you decided that the format security warning is very important to you and you absolutely don't wanna build anything that produces those warnings, but you have other warnings that you're not so worried about and that you still don't wanna suppress, GCC 4.2 lets you say -Werror=format-security. This will ensure that all those associated warnings are hard errors, so that if they appear, your build will fail and you'll be sure that none of those ever slip into a shipping product.
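For instance, a sketch of the specific warning mentioned:

    #include <stdio.h>

    void greet(const char *user_supplied) {
        /* With -Wformat -Wformat-security this draws a warning, because a
           non-literal format string can contain conversions the caller never
           intended. Under -Werror=format-security, this one warning becomes
           a hard error while every other warning stays a warning. */
        printf(user_supplied);
    }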
We've also added a collection of optimizations. One of the more significant ones is that dead code elimination will now remove static variables that are used in your code, but only in code that turned out to be dead. We've made improvements to profile-guided inlining and improvements to profiling itself.
We've also added an optimization called interprocedural constant propagation that works like this. Suppose you have a function and it takes a parameter, and your code has been well structured so that many things are parameters of functions. It just so happens that this particular parameter is always the same. This is fairly easy to do if you've got, say, a control structure that is passed to most of your functions, but that you only ever actually instantiate one of.
If this gets inlined, of course, this will go away. If it turns out that the function is not inlined, the compiler will still notice that this function was only ever called with this single value of the parameter, and so the compiler will simply eliminate it: it will remove the function parameter and replace the parameter in the actual function with the constant value that it always has. So you can use this kind of code without worrying about any penalty for passing extra parameters around.
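As a sketch of the situation being described (names are illustrative):

    struct Config { int verbosity; };

    /* Well-structured code: the scale factor is a parameter... */
    static int scale(int x, int factor) {
        return x * factor;
    }

    int run(const struct Config *cfg, int x) {
        /* ...but every call site happens to pass 4. Even if scale() is not
           inlined, GCC 4.2 can delete the factor parameter and substitute
           the constant 4 into the body. */
        return scale(x, 4) + scale(cfg->verbosity, 4);
    }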
So in summary, we're working on a new technology, GCC 4.2, it's not available yet but it will be we hope. Among some of the significant new features are OpenMP, symbol visibility improvements, some new optimizations and of course the ability to upgrade individual warnings to errors. So now to talk about an even newer compiler technology will be my good friend, Chris Lattner.
( applause )
Thank you.
Hello everyone. My name is Chris Lattner. I am the manager of the LLVM group at Apple. I'm also the founder and architect of the LLVM open source project. So, okay so, LLVM is a little bit different than GCC in a lot of ways. The biggest of which is that it's not a compiler.
Well what does that mean? It's actually a collection of technologies that you can use to build things like compilers and other tools. LLVM specifically consists of many libraries and so if you're familiar with framework based development, it's exactly what you're familiar with, we just have lots of different components and tools can choose the components that they're interested in and use them. Today I'll talk about three components.
The JIT, the GCC integration layer and the link time optimization components. As I mentioned before, LLVM is an open source project, so you can go to the llvm.org website and read all about it. I'll talk about a couple of applications we're using LLVM for today.
So today, specifically, I'll be talking about two things. The first is Leopard and the second is future plug-ins with GCC 4.2 that we're planning on building. And so I'll start out by talking about LLVM and Leopard, and to do that I wanna talk about OpenGL. The funny thing about OpenGL is that you're in a compiler talk, right, you're not in a graphics talk. So what do compilers and graphics have to do with each other, right? I mean, if you wanted to talk graphics, you'd go somewhere else.
Well, as graphics have been developing and as graphics technology has been improving, the GPUs have been getting more and more general, and the things that the OpenGL stack has been expected to do have been getting more and more advanced and much more aggressive. So what we've been finding is that graphics, compilers and other technologies are starting to converge, and there are a lot of things that OpenGL has to do that start to look and smell and feel a little bit like a compiler problem.
So for example, OpenGL has this concept of a vertex program. A vertex program is a small program that an application ships with; at run time, it is uploaded onto the GPU, and every vertex that's sent down the pipeline, for example if you're drawing triangles or polygons or whatever, has to be transformed by this program.
And so in OpenGL, if you have vertex programs, the typical way this works is the application developer writes, in GLSL for example, a vertex program. These are text files. These ship with the application; they're included in the application bundle or whatever. At run time, the application says, okay, I'm gonna upload these programs to the GPU.
Well what that means is that when your levels load in or when Keynote's building its transitions or whatever, whenever's a convenient time, it sends this to the driver and it says okay driver, I'm going to have to send lots of polygons through you with this program, think about it for a while.
Well, there are a couple of different ways that OpenGL can do this, right? I mean, ideally you want the graphics card to support your program, and so the driver gets this thing, it processes it, figures out how to configure the GPU to support it, and then when you send all the polygons down the pipe, it transforms them and draws them.
Unfortunately, life's not always that simple. Specifically, if your graphics card isn't as capable as you'd hoped, for example you might have embedded graphics, you might have one of the early Mac minis, it might not be able to support the program. GLSL is actually a very interesting language in that it's very generic and very high level, and a lot of the capabilities that GLSL supports actually aren't supported by any graphics card that you can buy today, or probably in the near future.
It supports many, many things: for example, very large programs, branching, looping, memory accesses. A lot of different things that a lot of GPUs can't support. So because OpenGL has to support the full spectrum of GLSL programs, if the graphics card can't support one, it has to fall back, and one option is to fall back to an interpreter.
The problem with falling back to an interpreter for this is that interpreters, as you might expect, are very slow, right, and falling back to an interpreter when you're about to send a few billion vertices down the graphics pipeline is very bad. So the solution then is to say, well, we compiler people know something about JIT compilers, right?
JIT compilers are a great way to implement a language, in this case a graphics language. You spend a little bit of time up front making the code better, so that when you send lots of vertices down the pipeline, they will be executed more quickly.
So specifically, the way LLVM gets involved with this is that you have several stages of this pipeline. In Leopard, for example, your application starts up, it uploads its program, and OpenGL gets it. If OpenGL decides it can't hit the hardware, the hardware doesn't support the program for one reason or another, then it actually has a translator where it takes the GLSL and converts it into LLVM code. LLVM has this internal code representation which the optimizers use and the code generators understand, and so it's this mid-level representation that the set of frameworks we have understands and can chew on.
Once it's in that form, LLVM, among other things, has an optimizer: a full suite of scalar optimizations, loop transformations, and the standard kinds of optimizations you'd expect to be in a compiler. Well, it has these available in a library, and so because these are easy to reuse, the OpenGL code that's been transformed to LLVM can now make use of the same optimizations.
Once the code has been optimized, we send it through the standard LLVM code generator and the LLVM JIT, and this allows you to get out at the end sequences of highly optimized SSE or AltiVec code that can process your vertices or your pixels very, very quickly, and it's much better than interpreting, obviously. So once you have that code, of course, OpenGL then runs it on all the pixels and all the vertices coming down the pipe. So that's kind of a high level example of how, in Leopard right now, OpenGL is using LLVM.
So the problem with that example is that it's a high impact example that a lot of people don't understand, so I'll give you a much more concrete, easier to understand example here. So imagine you are a graphics card. If you're a graphics card, you have a very simple mind. You understand a very small set of, for example, color formats. So you might understand eight bit per pixel images, and you might want RGBA, a very standard format.
The problem is that OpenGL supports a gigantic number of color formats; it actually supports hundreds. And so these things might, you know, be packed so you have five bits per pixel, or they might have two bits of alpha and four bits of this, you know. There are all these different ways of encoding colors and images, and when you say, upload this texture to the graphics card, there's this translation that has to happen that maps from what the application knows, and what the image format is, into what the graphics card understands. And this has to happen before the image actually even gets to the GPU, so this always has to happen on the CPU.
So the way this works is that you have code in the OpenGL stack that looks like this. Basically it says: if I have an image, I'm going to loop over all the pixels in the image, and for every pixel, I'm going to map it out of the encoding format. So I have this gigantic switch statement, and the switch statement has hundreds of cases in it, and I will decode all of the different fields of the pixel, you know, pull out the red, the green, the alpha, the blue, and decode them into the common format. Then I'll do another switch statement, which again has the same hundreds of cases, and it will recompress these back into the destination format. All right.
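In rough outline, the shape of that code (formats, field layouts and helper names here are illustrative, not OpenGL's actual source):

    #include <stdint.h>

    typedef uint32_t Pixel;
    typedef struct { uint8_t r, g, b, a; } Color;
    enum Format { RGBA_8888, RGBA_4444 /* ...hundreds more in reality... */ };

    Color decode_rgba8888(Pixel p);   /* illustrative per-format helpers */
    Color decode_rgba4444(Pixel p);
    Pixel encode_rgba8888(Color c);
    Pixel encode_rgba4444(Color c);

    void convert_image(const Pixel *src, Pixel *dst, int n,
                       enum Format src_fmt, enum Format dst_fmt) {
        for (int i = 0; i < n; i++) {
            Color c = {0, 0, 0, 0};
            switch (src_fmt) {        /* giant switch #1: decode a pixel */
            case RGBA_8888: c = decode_rgba8888(src[i]); break;
            case RGBA_4444: c = decode_rgba4444(src[i]); break;
            }
            switch (dst_fmt) {        /* giant switch #2: re-encode it */
            case RGBA_8888: dst[i] = encode_rgba8888(c); break;
            case RGBA_4444: dst[i] = encode_rgba4444(c); break;
            }
        }
    }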
So the reason that the code has to be written this way is that you have hundreds of formats to deal with, and if you were to make a specialized version of this function, it would have to be specialized for n squared combinations, which is thousands and thousands of different combinations, right? There's no way that we can ship OpenGL, which has to be statically compiled, with thousands and thousands of different permutations of this; it just wouldn't scale, right?
So fortunately, at run time we actually do know which formats the application's using, and so we can do better than that. Specifically, if we see your application's uploading a lot of, you know, five bit per pixel images, well, we can actually realize that we should make a specialized version of this code.
So we can actually take the for loop, we can take the code just for the case that you care about that decodes the pixels, and then add the code that recompresses the pixels, and you get a nice simple for loop which just has, you know, say a dozen operations. It has no switching, it has no indirect branching; it's all very simple bit manipulation code. Furthermore, again, we have the LLVM optimizer around, and the optimizer's very good at stamping out, you know, extra bit shifting and masking and things like that.
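Roughly, the specialized loop the JIT can produce for one known pair of formats (again illustrative):

    #include <stdint.h>

    /* Specialized at run time for one case: 4:4:4:4 source, 8:8:8:8
       destination. No switches, no indirect branches, just bit work. */
    void convert_4444_to_8888(const uint16_t *src, uint32_t *dst, int n) {
        for (int i = 0; i < n; i++) {
            uint16_t p = src[i];
            uint32_t r = (p >> 12) & 0xF, g = (p >> 8) & 0xF;
            uint32_t b = (p >> 4)  & 0xF, a =  p       & 0xF;
            /* Widen each 4-bit channel to 8 bits (x * 17 maps 0xF to 0xFF). */
            dst[i] = (r * 17 << 24) | (g * 17 << 16) | (b * 17 << 8) | (a * 17);
        }
    }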
The end result of this is that when you JIT compile this, suddenly you have a very small loop. It can be unrolled and optimized, and suddenly, instead of spending all your time doing indirect jumps and switches and things like that, you're spending all the time streaming through memory and doing actual useful work. All right, and so this is a fundamentally run time, you know, dynamic kind of application: you can't do this without a runtime version of a compiler, you just don't have enough information, you know, in the factory before you ship.
So to give you an example of the impact of this, I just ran some benchmarks where I timed, you know, how fast it is to convert images from one format to another on Tiger and on Leopard. Again, the problem with this is that, you know, there are hundreds of different combinations and you get all these different numbers, and I'm not gonna walk through all of this, right.
So what I ended up doing is a summary, and the impact of just the color format conversion, if you look at the average across all the different formats, shows a 5.4x speedup going from Tiger to Leopard, and this is just by using the dynamic techniques and being able to actually optimize and produce specialized code that eliminates all the branching.
All right, well, if that's the average speedup, one of the interesting things is that you get much higher peak speedups in certain cases. And so, you know, in one case we got a 19.3x speedup, which is pretty ridiculous. That goes from 13 megabytes per second that you can translate, to 257 megabytes per second.
Well, the reason for this is that, you know, in this particular color conversion case, where you're converting, you know, RGBA 4:4:4:4 to the same thing but byte reversed, the optimizer's able to see that you have this very simple transformation happening, and it was actually able to simplify the code a lot, all right. But what this means is that certain cases which should be very simple and very cheap to do, now are.
So that's an example of how we're using LLVM in Leopard. We're also planning on doing many other things with it, and one of the things I wanna talk about is GCC 4.2. So Geoff talked about GCC 4.2 earlier in the talk. So integrating LLVM with GCC, what does that really mean? Basically, the idea is, we all know and love GCC, right? We use the driver and we use the command line, either directly with our make files or indirectly through Xcode, and when we get errors and warnings and messages, we get them from the GCC front end. And so staying compatible with all your code out there is really important to us, and so we decided, well, we should marry the GCC front end with some new technology for the optimizer and the back end.
And so the idea here is that we'll take the GCC front end, which is just the stock 4.2 front end. This preserves compatibility with all the command line options, all the warnings; it sets, you know, the pickiness of the language that it accepts.
It's all the same as GCC 4.2. The difference, though, is that we're taking the LLVM optimizer and the LLVM code generator, again, these are the exact same libraries used by OpenGL and available on the open source website, and we're replacing the GCC optimizer and the GCC code generator with the LLVM versions.
Well, what does this do? There are a couple of things. One is you get significantly better code in some cases; as usual, not always, right? Compilers can only do so much; they can't eliminate the n squared algorithms from your code for you. But it also gives you new capabilities, and one of the things it lets you do is optimization across files in your application. So specifically, right now, today, if you take your application, it might contain a few hundred source files, right?
Today, GCC will only optimize within each file, which means that if you have trivial, little, small functions in one file, you can't inline those into other files, you can't propagate constants across files, you can't delete dead code that's, you know, only visibly dead once you link the image. The linker has the ability to do this, but the linker only sees the machine code, so it can't do high level transformations. To be clear, the linker can do dead code elimination, but the linker can't do high level optimizations.
Well, so the idea with LLVM-GCC is that suddenly you can say, okay, I'm going to use the exact same command line options, I'm gonna use exactly everything that I normally do with GCC, but I'm just going to specify -O4, all right? This is our very simple, simplified user interface that even compiler people like me can understand. And from there it's very simple. You compile your application; at compile time you're building .o files; at link time we're invoking the linker.
Now, these .o files are a little bit different from standard Mach-O .o files. They contain extra information, and so right now we haven't enhanced nm or otool, for example, so they don't understand them. But the important thing is that when you invoke the linker, the Leopard linker starts reading these .o files, it starts reading the native .o files, and it sees, ah ha, these are LLVM-enhanced .o files, and it can actually invoke the LLVM optimizer at link time.
Well, at link time, your application has a lot more context, a lot more information. It can see, for example, if you're using anonymous namespaces or visibility hidden; it knows which symbols can be imported and exported out of, you know, the different libraries and dylibs you're linking; and if you're linking your whole application together at once, it has a lot of information.
A lot of things only become visible at the linker, and based on that information, the export lists and, you know, whether you're compiling things as a whole program, things like that, the optimizer has a lot more freedom. LLVM in general will do cross-function optimizations at compile time, but that's only within a single file. By doing it in the linker, that means you can optimize across files, and that includes things like dead code elimination, interprocedural constant propagation, inlining, and a whole host of other interesting interprocedural optimizations which I won't get into right now.
So another interesting aspect of this is that it's all completely language independent, and so not only does it work with Objective-C++ or Objective-C, it also works with C++ and C. And so anything that you can send through LLVM-GCC, which supports the same languages that GCC does, can all be compiled as usual, just with -O4, linked together, and optimized across the language boundaries, meaning you can inline a C++ method into an Objective-C message, or, you know, across these different language boundaries. This is actually surprisingly powerful and important in some cases because, due to wanting to build modular libraries and build our applications with abstractions, we wanna preserve the abstractions for the programmer, but we want the optimizer to eliminate them where it can.
So another interesting thing is that this actually works really well with native .o files as well. You don't have to rebuild your entire application with LLVM to get this. This means that if you have old archives with crusty .o files built ages ago, you can still use those. If you wanna use ICC or GCC for certain pieces of your application, that's totally fine.
The only disadvantage of this is that LLVM will see relatively less of your application or dylib, so it won't be able to know the full effects of that code, and so we'll have to be more conservative. But it will work correctly just fine; it just reduces the scope of optimization.
Okay, so this was a whirlwind summary of some of the things we're doing with LLVM. The big take-away picture of this is that LLVM is a framework for building compilers. We're not planning on exposing that framework to end developers. If you're interested, you can get it off the web page; it's an open source project and we're contributing a lot of code to it. We're working very closely with the community, so we're generally working with mainline.
The two things that we're doing in the short term: we are actively using LLVM in the OpenGL stack in Leopard today, and last WWDC, for that matter, used LLVM internally. In the future we hope to have an LLVM-GCC that we ship to customers. This LLVM-GCC is aimed at supporting the GCC 4.2 front end, and when and if we actually ship it, it will be a drop-in replacement for GCC. It'll work, feel and smell just like GCC, so it's as easy as selecting a new compiler in the Xcode compiler list.
So that is the end of the talk. In summary today, Geoff described some of the new features that Leopard brings to GCC 4.0. GCC 4.0 has a number of new things; in particular, the security features are really cool and you should try them out. Going forward, GCC 4.0 will become the stable, critical-bug-fix-only compiler, and so the 4.0 in Leopard will be basically frozen.
Going forward, 4.2 is the new development vehicle. This will bring most of the new improvements and new enhancements in GCC. Immediately, the first obvious things are OpenMP, the new optimization improvements, the visibility improvements; there's a lot coming there. LLVM is a new technology that we're developing. It's very technology oriented, and we're trying to use it for a number of different things.