GCC in Depth - WWDC 2003

Apple Developer Tools • 44:03

Learn about Apple's implementation of the GNU Compiler (GCC) and how it has been enhanced with faster compile time, improved code generation, and better IDE integration. You'll also hear about the latest Mac OS X linker features and development roadmap.

Speakers: Ron Price, David Edelsohn

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

As you probably all know, we GM'd a new GCC 3.3 compiler this week at the show, and today what I would like to do is introduce you to that compiler, answer any questions you may have that are preventing you from moving forward, and encourage you to move to 3.3. Also, I want to introduce you to one of our partners from IBM. We worked very closely with them in delivering the 3.3 compiler, particularly focused on the G5 and the new G5 system. I also want to leave you with some idea of future directions for us.

So we had a challenge in putting together the 3.3 compiler. We wanted to deliver a very robust compiler because we know that's what you expect. We wanted to address compile speed. We've heard very much from you in terms of compile speed and comparing us against the CodeWarrior compiler. And we also wanted to address code quality, because we were introducing a brand new system this week, a G5 system, and we needed to have terrific optimizations for that system coming out of the chute.

Well, that's a very hard problem. I've worked in the compiler area for quite a number of years, and usually you get to pick two of those three: you can deliver either a robust compiler with great code quality, or a robust compiler that's really fast. But to try and deliver a compiler that meets all three of those goals is a real challenge.

So I want to talk a little bit about why we were able to accomplish that challenge, and I think you'll see that when you install and use GCC 3.3. First of all, as you know, it's based on GCC, the Free Software Foundation technology, and today that technology is very mature and stable and continuing to evolve, primarily since the advent of Linux systems and Linux system vendors putting together production-release compilers.

Since that time, the GCC compiler has really matured significantly. There are billions of lines of code that have been put through it. We build our own Mac OS X operating system with the GCC compiler. We have a very extensive test suite, something on the order of 30,000 tests a day and growing all the time.

So it's just a very robust experience, and I want to assure you that when you install the 3.3 compiler, you will see the same thing. One example: one of our early seeds for 3.3 went to one of you folks with an application of about 26 million lines of C++ code. That entire application was migrated to 3.3 with essentially no problems.

So what does Apple contribute? Because we also have been putting a big effort in. And I want to tell you that we're a very active member of the free software community. We have experienced engineers on staff that have worked in that community for quite a while. We are an active maintainer. The Free Software Foundation has actually recognized some of our efforts and they have made Darwin a reference platform for GCC releases. Something that is really fantastic.

New optimizations, as I mentioned, G5 was really important to us for this release. This system, as you've probably seen in some other presentations, is an order of magnitude more complex to deal with than the G4. And our mission, part of our mission, is to make that happen for you so that you don't have to do all of the work of extracting the power from G5. So we've done quite a bit of optimization. You'll hear some about that today.

You'll hear some from our partner IBM who worked with us on that. As I mentioned, compilation speed is important. And so we made that a high priority within our organization. And I will show you some of the performance improvements in terms of compilation speed. And finally, just general maintenance. Our customers report problems. We fix them. We put them into the community. And so we're very active and correct members, if you will, of the community. So let me get into talking about the three things that we focused on for this release besides robustness.

In the language feature area, as you know, GCC is already very compliant with the ISO and ANSI standards. And so it's mature in that way, but it's not 100% compliant. And so it continues to grow. And in this release, you'll see some of that, particularly with respect to now being able to identify incorrect code that may have slipped by in earlier releases of the compiler.

We've also added a feature in Panther that was missing: wide character support. Now, as you know, within our own frameworks and within Apple, we encourage you to use Unicode, which is a much more powerful means of dealing with wide characters. But we have heard your desire to have applications more easily ported into our environment. And in some cases, that requires a lot of work. So that will be there with the Panther release. You will not have it with Jaguar.

Objective-C and Objective-C++, of course, we maintain and we enhance. One of the problems that you have told us about is that we did not have an exception model, particularly for Objective-C. We've added that. That's also available in conjunction with the Panther OS release.

Finally, and I know you've heard some of these things in other forums, we've added another migration tool for you, which is the ability to use inline assembly, or port inline assembly, from CodeWarrior. So it's completely compatible with CodeWarrior, other than bugs you may find since it's a brand new feature. But I think it's pretty robust. We've ported quite a bit of code with it.

So, compatibility from release to release is always a concern and an issue. The main thing I'd like to point out to you is that what you see here is a very short list of issues to deal with. So, we're compatible in many, many ways. One of the things that you will want to do, though, is rebuild your C++ apps. As a necessity in making some enhancements to C++, we've had an ABI change. And so, you simply need to recompile all of your C++ apps.

[Transcript missing]

The other thing that we have done within GCC 3.3 is that in looking at the migration of some code, people were getting tripped up a little bit with the flags. We had certain flags that you use for cpp-precomp. We have some others for PFE, and now we have PCH flags.

And so if you were in your development environment, getting those flags corrected, or even recognizing that you needed to correct them, was a problem. And so we've made the use of invalid flags for the particular compiler you're using a warning. So you'll get a heads up. If you see warnings from compiler flags now that you hadn't seen before, you should deal with those. They're really there to try and help you out.

One of the things that we get as a result of deprecating cpp-precomp is that cpp-precomp was built on a preprocessor model which was K&R based. And so you had to flip some switches within the compiler if you wanted to actually use the ANSI standard C preprocessor. Now that we no longer have that encumbrance, we have made the default preprocessor the ANSI standard preprocessor. It aligns us with all other GCC compilers that are being delivered today.
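For illustration, two features that exist in the ANSI C preprocessor but not in a K&R-style one are the `#` (stringify) and `##` (token paste) operators. The sketch below is a minimal example of code that depends on them. All of its names (`STRINGIFY`, `CONCAT`, `build_count`, `ansi_mode`) are hypothetical and do not come from the talk.

```c
#include <assert.h>
#include <string.h>

/* '#' turns a macro argument into a string literal (ANSI C operator). */
#define STRINGIFY(x) #x

/* '##' pastes two tokens into a single identifier (ANSI C operator). */
#define CONCAT(a, b) a##b

/* Hypothetical variable whose name is built by pasting build_ and count. */
static int CONCAT(build_, count) = 3;

/* Reads the variable by its pasted name, showing CONCAT produced build_count. */
int read_build_count(void)
{
    return build_count;
}

/* Returns the text produced by stringifying the token ansi_mode. */
const char *token_text(void)
{
    return STRINGIFY(ansi_mode);
}

/* Self-check: both operators behaved as the ANSI standard describes. */
void check_macros(void)
{
    assert(strcmp(token_text(), "ansi_mode") == 0);
    assert(read_build_count() == 3);
}
```

Code that leans on these operators is the kind that previously needed the extra switches mentioned above.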

And finally, one other flag that I would just warn you about, because we've seen some resulting difficulty with this as well, is the -nostdinc flag, which was originally intended to not look in any system directories. We found that we actually had a problem with the implementation in the 3.1 compiler and it, in fact, was looking and finding header files in a system directory. Well, that's corrected now. And so if, in fact, you were taking advantage of the fact that certain system directories were being searched, then you'll need to deal with that as well.

[Transcript missing]

This makes the assumption that header files are a significant portion of your actual compilation process. And that's certainly true if you're using large frameworks like Carbon or Cocoa or AppKit or others like that. So your code is typically a very small portion of what is actually compiled when you're building files like that. To make the best use of pre-compiled headers, it works best if you can actually identify a common set of headers that are used by all of the files that you're building in a project.

What you can do with that is create a prefix header, as we call it, which is a pre-compiled header that contains those common header files. Then you simply include that prefix header in the file that you're building, and the compiler will restore a significant amount of state, bypass all of that compilation process, and move straight into compiling the rest of the code.
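As a rough sketch of the layout described above, the block below shows a prefix header and a source file in one listing so it can be read in one piece. The file names `prefix.h` and `label.c`, the guard macro, and the `label_length` function are all hypothetical. How the prefix header gets precompiled depends on the tool chain and is not shown here.

```c
/* ---- prefix.h (hypothetical name): headers every file in the project uses ---- */
#ifndef PROJECT_PREFIX_H
#define PROJECT_PREFIX_H
#include <assert.h>
#include <stddef.h>
#include <string.h>
#endif /* PROJECT_PREFIX_H */

/* ---- label.c (hypothetical name): in a real project this file would start
 *      with #include "prefix.h" as its first line, before any other code ---- */

/* Length of a label, using strlen() made visible by the prefix header. */
size_t label_length(const char *label)
{
    assert(label != NULL);
    return strlen(label);
}
```

The point of the layout is that everything above the second banner is identical for every source file, so it only needs to be compiled once.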

The next feature is predictive compilation. And I know that this has been talked about a little bit in the Xcode discussions. I have a picture maybe that will help you understand. In predictive compilation, we once again are making the assumption that the header files are a big portion of your compilation process. And furthermore, if you're going to modify a header, you do that in the context of modifying that header file, not the source file you're editing.

And so what happens with predictive compilation is that when you begin editing a file, the compiler in the background begins the compilation process on the header files. And it will proceed up until the point where your code begins, the code that you're working on. Then once you save that code, the compiler proceeds on and predictively compiles that file for you.

You can see in the timeline that I provided here a representation of that.

[Transcript missing]

The middle point, which we offered in the December tool set, primarily was the addition of being able to use the dual processor on your tower, so that using both processors we were able to significantly reduce the amount of compile time. It still had PFE, though. With the introduction this week of GCC 3.3 and Xcode, PCH is the new capability, and you'll see on the chart there the performance over and above PFE that we're able to obtain with PCH.

I wanted to draw your attention, though, to one other point on there, and that's the green one down at the bottom that says distributed build. This is the result of building this application with six Xcode components. So we have two Xserves, dual processor Xserves. Six because we had them available. Actually, I don't know that this app required the use of all six. But essentially you have a means now of adding horsepower to get faster performance, even above dual processor.

So code quality, and I mentioned that we had a big focus on code quality, particularly with respect to the G5 processor. When you're in the process of trying to focus on code quality within compilers, probably the most important tool you can have is a real benchmark. And that's a benchmark that is in a harness so it's easy to run. It provides reproducible results. It represents applications similar to what you deal with. And prior to G5, we had used an internal benchmark that was called Skidmarks. And Skidmarks was composed of kernels of code from various applications that we felt were important in Apple.

When we reached G5, though, we had a problem. The problem was that not only were those files small enough to fit into cache so that they didn't really represent true applications on the system, but also they weren't self-checking such that if we broke an optimization, the code might still run and produce incorrect results. We didn't have an idea of that. We just knew how fast it ran.

So we looked around in the industry and we chose the SPEC benchmark. There are pluses and minuses with the SPEC benchmark. You can argue as to whether they're representative of Macintosh applications or not, but we determined that they were close enough that we could make some good progress.

And, oh, by the way, we also needed to produce SPEC numbers for this new processor. So SPEC represents 12 integer benchmarks and 14 floating point benchmarks. They're real applications that have been encapsulated into the test bench, and they run in a harness, they're predictable, and they validate their results.

So let me just show you the results of G5 performance with the SPEC benchmarks. And what I'm comparing here is our 3.1 compiler versus our 3.3 compiler today. So the compiler you're using today versus the one that hopefully you'll be using later on this afternoon. You can see from SPECint, and this is an aggregate across the 12 benchmarks in SPECint, that we actually have a 17% performance increase over what we're able to extract from the 3.1 compiler. Correspondingly for floating point, and this is a floating point machine, as has been noted in almost every presentation, we're able to get 30% better performance than we can with the 3.1 compiler on a G5, and this is a single processor 2 GHz G5.

So what does this mean for you in terms of how can you take advantage and build your code for the G5 and get performance? Well, there are several situations that you may be dealing with. First of all, you may have an app that you want to run on more systems than just the G5. You may want to have that app which has some computationally intensive code, but by and large, the rest of the app is not dependent upon the performance of a G5 system.

And so you might break that app up such that you put the computationally intensive portion into a runtime library. You tune that very heavily for G5 and leave the rest of your code as G4 or G3, and you can produce runtime libraries for each category of system. You can do that by turning on the -mcpu flag to get G5 instructions and -mtune to get G5 scheduling.
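One way to picture the split described above is a small dispatcher that picks a kernel per category of system. This is only a sketch under stated assumptions: `cpu_kind`, `select_dot`, `dot_for_cpu`, and both kernels are hypothetical names, and how the running CPU is detected is deliberately left out. In a real project each kernel would live in its own library, with the G5 one built using the `-mcpu` and `-mtune` options named in the talk. Here both sit in one file for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU categories; detecting the running CPU is not shown. */
enum cpu_kind { CPU_G3, CPU_G4, CPU_G5 };

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Baseline kernel: stands in for the library built for G3/G4. */
static double dot_generic(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for the separately built G5 library. The two accumulators are an
 * illustrative variation, not something prescribed by the talk. */
static double dot_g5(const double *a, const double *b, size_t n)
{
    double even = 0.0, odd = 0.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        even += a[i] * b[i];
        odd += a[i + 1] * b[i + 1];
    }
    if (i < n)
        even += a[i] * b[i];   /* leftover element when n is odd */
    return even + odd;
}

/* Chooses the kernel for a given category of system. */
dot_fn select_dot(enum cpu_kind kind)
{
    return kind == CPU_G5 ? dot_g5 : dot_generic;
}

/* Convenience wrapper: select, then run. */
double dot_for_cpu(enum cpu_kind kind, const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    return select_dot(kind)(a, b, n);
}
```

The rest of the application calls `dot_for_cpu` and never needs to know which build of the kernel it got.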

Let's suppose, though, that your app also uses long longs or double data types. Then you may want to take advantage of the 64-bit arithmetic power of the G5. And what we have provided, if you use the -mpowerpc64 switch, is that you actually enable 64-bit arithmetic, and it does amazing things then with your long long arithmetic or your doubles: loading and storing 64 bits and doing arithmetic on that.
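To make the long long case concrete, here is a minimal sketch of the kind of code that stands to benefit. On a 32-bit target each `long long` multiply and add is split into several 32-bit instructions, whereas with 64-bit arithmetic enabled (GCC's `-mpowerpc64` option on PowerPC) each can be handled in 64-bit registers. The function name `dot64` is hypothetical, and the source is the same either way; only the compiler option changes.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of products accumulated entirely in 64-bit integers. */
long long dot64(const long long *a, const long long *b, size_t n)
{
    long long sum = 0;
    size_t i;
    assert(n == 0 || (a != NULL && b != NULL));
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];   /* 64-bit multiply and 64-bit add */
    return sum;
}
```

For example, with inputs `{3000000000, 2}` and `{3, 5}` the result is 9000000010, which does not fit in 32 bits.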

Believe me, you'll applaud once you've had a chance to taste it, if you're using that.

[Transcript missing]

So let me call David Edelsohn up to the stage. David and others within IBM have worked very heavily with us on optimizing for the G5 and the 3.3 release. David?

Thanks very much, Ron. So my name is David Edelsohn, and I work at IBM Research. My main focus there is on open source issues and technology, particularly GCC. I'm glad to see so many people here at the conference interested in GCC. So first let me give you a little idea of what I'm going to talk about in this presentation. I'm going to talk about the PowerPC 970 processor and issues in the processor that are important for a compiler to target to get the highest performance.

We're also going to talk about how we have optimized GCC to extract that best performance on this processor that Apple is now using. And we're going to mention a little bit about the future optimizations that are going to be coming down the pike in later releases of GCC and about how IBM and Apple are both working together with the open source community to engage them and produce the best compiler for the PowerPC processors. So let's start with a little bit of information about this processor. This is a big step in processor performance. We now have a 64-bit PowerPC processor on the desktop.

The PowerPC architecture was designed from day one as a 64-bit architecture. The previous implementations have been the 32-bit subset, and now we have an implementation in Apple's computers that use the full 64-bit architecture. And this architecture has instructions that operate on both 32-bit size data and 64-bit size data.

And in 64-bit mode, it's operating on twice the amount of data, and it's the same performance whether you're in 32-bit mode or 64-bit mode: the same latency, the same speed of the instructions, and you're operating on twice the amount of data. This processor has a lot of resources at its disposal to attack whatever application you have to throw at it.

It has two complete symmetric floating point units. It has two symmetric load store units. It has two almost symmetric integer units. And these are a lot of capabilities now to bring to bear on your problems. And this increase in resources is an increase in performance for your application.

So this is a pictorial depiction of the processor core in the 970. I just want you to have a visual representation for this processor in the upcoming slides. So let's go through and talk about how an instruction flows through this processor. First, instructions are fetched from the L1 cache into the instruction queue.

And in this instruction queue, the instructions are then decoded and placed into the dispatch groups. And up to five instructions can be dispatched at a time to the various function unit issue queues. From the issue queues, you can dispatch to any of these 12 function units in the processor. So this is a lot of resources, a very powerful chip, a lot of complexity that the compiler needs to harness to give you the best performance.

[Transcript missing]

So I want to make sure that you understand that while this processor has a lot of capabilities that the compiler can leverage, this processor works very well with any sort of code that you're going to throw at it. If you're working on G3 or G4 code, there's a lot of dynamic capability in the processor to extract the most performance out of that.

You can work with the compiler to achieve the maximum performance and get that little extra percent, but if you're working on an application that needs to be targeted at G3 and G4 and the new G5, that application will scream on all of these processors. And it will scream on the G5 as well. So this processor will not just fall down if presented with G4 code.

So how can you help the compiler to produce better code, now that we've discussed what we've done in the compiler groups to try to help tune it for this processor? First of all, one thing that you need to do is to make sure that the compiler can be as aggressive as possible in its transformations of the code. And one of these areas is to try to make sure that the compiler doesn't need to be more conservative than necessary. An area where this is important is in aliasing.

That's where the compiler cannot tell whether two different accesses to memory, two variables, could potentially point to the same place in memory. And so if the compiler cannot distinguish this, the compiler needs to be more conservative to ensure that it's going to perform the calculations in the appropriate order.

So what you can do to allow the compiler to see and understand more of the procedure that you're working on is to use local variables and to avoid taking the addresses of variables. And this also includes areas where the calling convention may implicitly pass something by address, because then the compiler needs to sort of throw up its hands if it can't fully understand what may be happening behind its back. So that's one area where you can help the compiler produce more effective code. Another area is in simplifying loops, where if you can make the loop as simple and as direct as possible, the compiler has much more opportunity to make it more effective.
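The aliasing advice above (prefer locals, avoid values reached through pointers inside hot code) can be sketched as follows. The function names are hypothetical. In the first version the compiler has to allow for `dst` overlapping `*factor`, so it may reload `*factor` on every iteration. In the second, the value is copied once into a local whose address is never taken. Note that the two functions only behave identically when `dst` does not in fact overlap `factor`, which is the programmer's assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Each store to dst[i] might change *factor as far as the compiler can
 * tell, so it has to be conservative about keeping *factor in a register. */
void scale_through_pointer(float *dst, const float *src,
                           const float *factor, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * *factor;
}

/* The local copy cannot be modified by stores through dst, so the loop
 * body is free of that aliasing question. */
void scale_with_local(float *dst, const float *src,
                      const float *factor, size_t n)
{
    float f;
    size_t i;
    assert(factor != NULL);
    f = *factor;               /* read once, before the loop */
    for (i = 0; i < n; i++)
        dst[i] = src[i] * f;
}
```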

The compiler needs to be able to make transformations that will improve performance. One example of this is to not have changes in flow control inside the loop.

So, for instance, if you have a loop that has a branch inside the loop, if it's possible, it would be better to move that branch, that conditional, outside of the loop and actually duplicate the loop in both branches of the conditional. And that allows the compiler to then optimize each of those loops much more effectively when it can better understand exactly what's going to happen.
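The transformation just described, moving a loop-invariant conditional out of the loop and duplicating the loop in each branch, looks roughly like this. The function names and the `add` flag are hypothetical, and the two functions are intended to compute the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the branch on `add` is evaluated on every iteration. */
void adjust_branch_inside(int *values, size_t n, int add, int amount)
{
    size_t i;
    assert(n == 0 || values != NULL);
    for (i = 0; i < n; i++) {
        if (add)
            values[i] += amount;
        else
            values[i] -= amount;
    }
}

/* After: the branch is taken once, and each loop body is straight-line code. */
void adjust_branch_hoisted(int *values, size_t n, int add, int amount)
{
    size_t i;
    assert(n == 0 || values != NULL);
    if (add) {
        for (i = 0; i < n; i++)
            values[i] += amount;
    } else {
        for (i = 0; i < n; i++)
            values[i] -= amount;
    }
}
```

The cost is duplicated source, so this is usually worth doing only for loops that matter for performance.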

Another area is indexes into arrays. Again, try to have as simple an index as possible so that the compiler can understand what's going on and make transformations with prefetching and other sorts of optimizations that would be much more effective in the throughput of this processor. And finally, there is obeying the type rules.

The type rules are something that is essentially an agreement between the programmer and the language about how you're going to use the various types. And if you use the types appropriately and don't start changing accesses behind the compiler's back in ways that it doesn't expect, you can use the most aggressive optimizations, as Ron mentioned, and get the best performance from your program.
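A common way to break the type rules in C is to read an object of one type through a pointer to an unrelated type, for example `*(unsigned int *)&some_float`. GCC's type-based alias analysis (its `-fstrict-aliasing` option) assumes code does not do this. The sketch below shows one way to get the same bits while staying within the rules, by copying bytes with `memcpy`. The function name `float_bits` is hypothetical, and the example assumes `float` and `unsigned int` are both 32 bits with IEEE 754 floats.

```c
#include <assert.h>
#include <string.h>

/* Returns the bit pattern of a float without reading the float object
 * through an unsigned int lvalue. */
unsigned int float_bits(float value)
{
    unsigned int bits;
    /* Assumption: both types have the same size (32 bits on common targets). */
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under the stated IEEE 754 assumption, `float_bits(1.0f)` is `0x3F800000`.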

So let me talk a little bit about where this technology came from. A lot of this work that we've been doing comes from IBM's vast experience in developing compilers over the past decades, including some of the earliest compilers for computers. So we have a worldwide research and development organization that has been brought to bear in partnership with Apple in improving this compiler for the G5 processor.

Some of this involves taking the technology that IBM has developed for its POWER4 processor, from which the 970 was derived, and used in developing IBM's own compiler, taking that knowledge, taking that experience, and driving that technology and that understanding into GCC, so leveraging all of that history of development. And we're using all of that to exploit this processor and give Apple and its customers the best performance possible.

Now let me mention a little bit about where GCC is going in the future. There are a lot of very exciting optimizations that are going to be included in the compiler in the future, which will yet further enhance the performance of this processor. We have a new register allocator that is coming.

There are improvements in the instruction scheduler, which will allow even better ability to model the complicated dispatch groups that I mentioned. There is work on software pipelining, which will help improve the floating point performance. There is work on an SSA optimization infrastructure, which allows one to describe the program in a way that allows much more aggressive optimizations. This will allow easier implementation of loop optimizations and auto vectorization in the future.

Further work on inter-procedural optimizations to be able to, again, have the compiler understand more of the program, understand what's going to happen with aliasing, and work past limitations that the programmer may not have intended. Profile directed feedback, so that the compiler can reorder the instructions to take advantage of how the application is actually run with the data sets that you have. And of course, continued improvement in the compiler speed to more rapidly generate the code.

So all of this stems from this partnership that IBM and Apple have in developing and further improving GCC. We are both very committed to GCC and to the open source community. There are a large number of developers for GCC inside Apple and inside IBM. I am a member of the GCC steering committee and a member of Ron's team is on the GCC steering committee.

I'm one of the maintainers of the PowerPC port of the compiler and a member of Ron's team is my co-maintainer on the compiler. And we are fully engaged with this community to help design the future of this compiler and taking a leadership position to make sure that this is the best compiler for Apple systems and for the PowerPC.

[Transcript missing]

Actually, it's a pretty exciting time for us. In the compiler world, when you're presented with a challenge like this, it just makes your day or makes your year.

Future directions. Well, as you can see, we've just embarked on this journey with the new processor and extracting the performance from that. And we really consider our role to be one of trying to make this processor work for you without you having to jump through hoops or go through a lot of gyrations. Right now, we're telling you, please do a few gyrations if you want to get the performance out. But we're working, as David showed you, on any number of areas within the GCC community to take the optimization capabilities of the GNU Compiler to the next level.

We also don't consider our work done in terms of compile speed. I think you will agree we've made some really dramatic progress in the past year. And if I were you, I would expect from us that we'll make as much progress in the coming year, because we're going to nail performance.

That's really important. And we want to remove this compiler from being an obstruction to you getting your work done. In terms of language features, we'll move along at a slower pace because the language is moving at a slower pace today. But you can count on us continuing to move towards full compliance with the ANSI and ISO standards.

So, this, even though it's being GM'd today, is being offered to you as a part of the Xcode tools package. There's documentation in that package that you can reach. There's the release notes. One piece of documentation that you'll probably want to look for is in terms of performance tuning your code. And I would encourage you to attend the session, which occurs the next day or the next week.

Okay. Anyway, there's a performance session during the next period of time, and I would encourage you to attend that. I was at the CHUD session that preceded this, and I was happy to see an overflow room there. We've got some dramatic capabilities with our tools to help you improve the performance of your code.

If you want to make contact, here is Godfrey DiGiorgi, who's our technology manager, and you'll meet him in just a moment, along with his email address. There's a feedback address for the development tools, and of course, we're always looking for bug reports. Well, I guess we're not happily looking for bug reports, but we appreciate the feedback.

Here's session 305 in Presidio, the performance tools session that I was talking about. Some of the other sessions that you might be interested in are AppleScript Studio, Carbon Development, and so forth. We also encourage you to come tomorrow: there will be a feedback forum where you can let us know what feedback you have for us and give us your impression of the tools.