
WWDC04 • Session 319

Programming for the Mac OS X 64-bit API

Development • 1:06:09

Mac OS X "Tiger" adds support for 64-bit applications and will allow you to build solutions that can address massive amounts of memory. This session will specifically cover the 64-bit calling conventions, describe upcoming 64-bit capabilities and developer tools, and explain best practices for 64-bit programming and extending 32-bit application sources in Mac OS X.

Speakers: Wiley Hodges, Nick Kledzik, Jeff Glasson, Stan Shebs, Matt Formica

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.

Good morning. So I am here-- I'm, of course, here to talk about programming for the Mac OS X 64-bit API, or more correctly, to talk a little bit about Apple's direction with respect to 64-bit computing, and then hand things over to our esteemed engineering team, who is going to give us a lot more detail about this.

First things first, why 64-bit? I think the key thing that we see is access to more memory than you can imagine for now. That's always an embarrassing remark, of course. 640K used to be more memory than some people could imagine, and 16 exabytes may seem like that someday as well. I hope I'm not introducing the 128-bit API session.

But we do see a number of reasons for this. In some cases, you're going to see increased performance for memory-intensive tasks. Now, I will say that all other things equal, your programs are likely going to run slow in 64-bit, but all other things are not always equal here.

One of the great things is that you have access to a lot more physical memory, a lot more virtual memory, and we think there are some benefits of that. And obviously, one of those is that physical memory is considerably faster than disk. I think the other one is that you can be lazier programmers in some instances, because it's going to be a lot easier to access this large amount of memory on the system.

Our 64-bit goals are quite simple. We want to provide the key benefits of 64-bit computing for the people who need it the most, while making sure that we actually don't break any of these 12,000 native applications that we have today. And we talk a lot about this internally as a goal. We want to make sure that people in scientific computing and high performance computing and media have access to these benefits of 64-bit.

But we don't want educational software vendors with very simple titles for children to have to actually redo everything because we've decided to move into the 64-bit world. So our objective is one that really stresses compatibility and performance for the native applications on the system. And 64-bit is an addition. And you'll hear more today about why we're able to do that in our 64-bit architecture, but I think it's a unique advantage that Apple has with Mac OS X.

We started the 64-bit journey already with Panther. The system actually had access to greater than 4 gigabytes of physical memory in a machine. But all of the applications that you could write were 32-bit. All of the user tasks were 32-bit. And that's the address space you had as developers.

All applications, however, could in Panther actually use 64-bit hardware math functions. There were 64-bit registers. You did have to recompile your binaries to actually take specific advantage of the PowerPC G5 architecture, but those were features you could get. And once again, this is something that distinguishes Apple's 64-bit architecture from other platforms. So we're going to talk about that in a little bit.

But unlike the 64-bit story on some of the other platforms, many of the performance benefits of 64-bit computing are actually available to 32-bit applications today on Mac OS X, if you're willing to actually do the recompile and optimization for the PowerPC G5. But with Tiger, we actually want to take it to another level, and we're actually adding 64-bit addressing for user tasks.

We're going to focus our 64-bit support initially on the applications that we think are most likely and most urgently need to benefit from 64-bit addressing. And so we think these are going to be scientific applications, rendering and computational engines, and server applications. And the infrastructure we're providing to do that is actually based on a 64-bit lib system and 64-bit capabilities in the compiler. So that actually concludes, briefly and mercifully, the marketing portion of this morning's presentation. And I'm actually very pleased to hand this over to Nick Kledzik, who's going to talk more about 64-bit architecture. Thanks, Nick.

Thank you, Wiley. One of the things Wiley mentioned was that 64-bit computation was already available to 32-bit applications on Panther today. And I want to start off by going into detail on what that means. When you think about 32-bit, 64-bit computing, there are actually four aspects of that.

So the first is you need a processor with 64-bit registers. The second, you need to be able to actually load and store the 64-bit registers to main memory in single instructions. Next, you need a 64-bit ALU, which is the computation unit of the CPU, so that you can do 64-bit multiplies and shifts and so forth. And lastly, if you happen to use one of those 64-bit registers to access memory, you need the kernel to have set up a 64-bit address space for you. Now, the first three of those four points are already available in Panther today.

You can take your 32-bit app and take advantage of the full 64-bit registers. You can do 64-bit loads and stores in the app to make them a little bit faster. And you can take advantage of the 64-bit ALU. If your application happens to be 64-bit integer computation intensive, you can do all those things but not have 64-bit address space and get a performance boost. Another way of looking at this is all we're introducing in Tiger is 64-bit address spaces. Thank you.

So next I'd like to talk a bit about how 64-bit was implemented in PowerPC because it is a lot different than how some other microprocessors have added 64-bitness to their line. PowerPC is kind of unique in that PowerPC started out as a 64-bit architecture way over 15 years ago, whenever it started. Only now we've finally seen the fruition of the full 64-bitness.

Another interesting thing about it is that there is no mode. On some other processors, you have a 32-bit mode and a 64-bit mode. And when you're in those different modes, you have different sets of registers and different sets of instructions. It's almost as if it's two processors merged into one, and there's some big switch as to which one you're using.

One of the downfalls of that is that in the implementation of a processor with that model, the implementers have to decide how to microcode each of those instructions. And they tend to make the 64-bit one faster at the cost of the 32-bit one.

PowerPC does not have that problem. There is no mode on PowerPC between 64-bit and 32-bit. It is completely a software convention, whether you're doing 64-bit or 32-bit. And I want to go into some more details on this because once you get this, the rest of the 64-bit talk and how Apple is rolling out 64-bits will make a lot more sense.

If you've played around with the PowerPC instruction set, you'll know that, this being a RISC processor, there are basically two categories of instructions. There are load and store instructions, and everything else. Now, load and store instructions are simply the only instructions that can access main memory. All they do is move between memory and registers.

All other instructions operate with registers. For instance, add R3 to R4 and store it back in R3. If you look at those instructions and what they do with registers, you notice there's no size designation on those instructions. There's no add byte or add word. There's simply the add instruction, which works on the entire register.

So how that works is, say you want to do 32-bit math, or you are a 32-bit program. You only ever load the low 32 bits of the register. You do your addition or whatever, and you only ever look at the low 32 bits of the result. What that allows is that a program written to 32-bit conventions has no concept of whether there are actually higher bits.

What this means is when the PowerPC came out, even though it was a 64-bit architecture, the original silicon only supported the low 32 bits of registers. All the software written only looked at the low 32 bits of the registers, because that's all there was. But it followed that convention. When the G5 came out, there just happened to be more bits there.

The programs only load the low 32 bits. It doesn't really matter what's in the high bits. They can be completely garbage. They still do the same add instruction, and the result, they only look at the low 32 bits. So what this means is there is no mode with PowerPC. It's only conventions that distinguish 64-bit from 32-bit processes. Another way of looking at this is that there's no performance penalty for running as a 32-bit process. In fact, as we'll learn later, there's a slight performance gain for running as a 32-bit.

Now, those of you who have done assembly programming also know that there's something called condition codes. So you've got this mental model of how you can run within 32-bit conventions on a 64-bit processor. But you may ask, "Well, how are the condition codes set?" Now, condition codes are things like the carry bit and the zero bit.

They are set as a side effect of some instructions. So you do the add. If there was a carry out from the add, the carry bit will be set. Or if you do an add and the result was zero, the z-bit is set. There is one tiny bit of modiness in the PowerPC.

And that is, there's a mode for how the condition codes are set as a result of an instruction: whether the processor looks at the full 64 bits or the low 32 bits. So when the kernel starts a process, it looks at the code in the process, and basically a flag in the header, and decides whether this piece of PowerPC code is using 32-bit conventions or 64-bit conventions. If it's using 32-bit conventions, it sets a little flag on the PowerPC that says, when you set condition codes for this thing, because it's using 32-bit conventions, it's only looking at the low 32 bits, so the condition codes should only look at the low 32 bits as well.

The one other thing the kernel does different when launching a 32-bit versus 64-bit process is for a 32-bit process, it also tells the MMU, the memory management unit, to ignore the high 32-bit addresses because they're potentially garbage and to only look at the low 32-bit. That gives a 32-bit process a 4-gig address space and a 64-bit process the full 16 exabytes.

So what are some of the trade-offs of compiling for 64-bits? Because now you have the choice of whether to compile your same PowerPC instructions using 32-bit conventions or 64-bit conventions. Well, since we all know 32-bit conventions and what they mean today, let's talk about the trade-offs of changing to 64-bit conventions. Well, there's the obvious advantage that you've now got a huge address space. And if you have lots of data and you need that address space, then this is the advantage you want to go for.

Another advantage is if you are doing 64-bit integer computations but you don't need the full address space. One of the limitations of using the 32-bit calling conventions is that none of the functions know about the high 32 bits of registers. What that means is, when you compile today and you tell our GCC compiler that you're building for the G5 and you want to optimize for the G5, then in any leaf function the compiler will basically make use of the full 64 bits within that function.

The compiler cannot do that when it crosses function boundaries. That is, if a function calls another function, the compiler has to worry that that function may trash some upper bits of registers, so therefore it doesn't use 64-bit conventions across that call. Now, once you switch to 64-bit conventions, you know that it's safe to basically use all bits of the registers. So there is a small category of applications that don't need a lot of data, don't need a large address space, but because they use a lot of 64-bit arithmetic, can take advantage of compiling for 64-bit mode.

So what's the disadvantage of compiling for 64-bit mode? As I said before, the instructions are exactly the same on PowerPC, no matter which way you compile. But the difference is pointers are bigger in 64 bits. They're 64 bits. But what that means is every data structure you have that has a pointer in it is now bigger. Overall, that means the data in the application is bigger.

When the data in the application is bigger, that means you need to take up more address space, which means you need more pages and RAM to run your process. On one hand, you can put more RAM on the machine, that solves that problem. But there's also the L1 and L2 caches in the processor.

And they try to cache the most recently used data from the entire address space. Well, if the data set for your app is larger because your pointers are larger, the chances of your data being in the cache are slightly less. So there will be a small decrease in performance for compiling for 64 bits.

So therefore, it only makes sense to compile for 64 bits if you actually find you're running into the limits of a 4-gig address space. Thanks. Another disadvantage, since this is all conventions, is that any libraries you depend on also have to be available with the 64-bit conventions, otherwise you can't call them. So you have to wait until everything below you has been converted to 64 bits before you can convert.

Now that I've explained a bit how the PowerPC works and how there's no 64-bit mode, now I want to explain how we're going to actually roll out 64 bits. Let me explain what we're not going to do. There is not going to be a 64-bit Mac OS X and a 32-bit Mac OS X. There's only going to be Mac OS X. When you happen to run on a G5, the kernel will recognize it and allow programs that are marked as using 64-bit conventions to be launched, and they'll be set up with 64-bit address spaces.

For "Tiger," all Apple is committing to at this point is that libSystem will be available to 64-bit programs. libSystem is the standard C library and most of the POSIX functionality, which means command-line applications and applications with no UI will be able to convert to 64 bits if they so choose. Over time, Apple will be rolling out more libraries. And one of the things we want to hear back from you is, which libraries should we do first? Which are most important to you? You being the people first converting to 64 bits.

Now that you've got the big picture of how 64-bit processing works with PowerPC, I want to go into a little more detail on how we're actually going to roll this out. The most interesting thing is the last point here. We're going to have a single kernel. The single kernel is going to be a 32-bit kernel. We can do this because of the PowerPC architecture. It's just conventions.

A 32-bit kernel can launch a 64-bit process. The single kernel has a number of advantages. First of all, it means we can produce one disk that can boot on any machine. Second, it means all the existing kernel extensions and device drivers that are all written to 32-bit conventions will still run within it.

Some of you may have heard the term LP64 for data models. That is the convention that we have chosen to adopt for 64-bit calling conventions on Mac OS X. Now, I want to go through a little bit of history of where these acronyms came from. Let's go back in time to the early '90s with the pioneers of 64-bit computing, or Cray and Alpha.

Now once they got the hardware done, they started looking at the C language and were like, "Well, how big should an int and a long be?" Well, they were gung-ho for 64-bit, so they said, "Well, we're going to make an int and a long, all 64 bits, as well as pointers."

As that actually rolled out to more and more programmers, more and more of them, as they used it, said, "Dang, this is hard to use, because I've got this file format that has 32-bit values in it, or I just got this network packet with 32-bit values in it.

It's really hard to get at because there's no 32-bit type in the C language." So when the next generation of 64-bit processors came out, for Solaris and SGI and eventually Linux and so on, they all looked back at the early pioneers and said, "We're not going to do that. We're going to use LP64." And in LP64, longs and pointers are 64 bits, but integers remain 32 bits.

Therefore, the base types in C give you 8-bit, 16-bit, 32-bit, and 64-bit types: char, short, int, and long. So that was a much easier programming model. And basically, if you go searching for any 64-bit-clean software out there today, you're going to see all of it is written to the LP64 model.

Now, the next thing that happened is once all these acronyms came out, LP64, ILP64 and so on, people said, well, what do we call the old 32-bit model? So the old 32-bit model got named ILP32, which is integers, longs, and pointers are all 32 bits. Well, then some of the 32-bit processors got jealous of the 64-bit computation available and said, well, we want a 64-bit type in the C language.

So a bunch of different compiler vendors added extensions, and eventually it got standardized and ratified in C99 as long long. Then they had the problem of, well, what does long long mean in LP64 and ILP64? Well, just to make it easy, they said, well, it's just the same as long. So long long is 64 bits across all compilers.

You also may have heard recently that Windows has announced Windows 64. Now, they decided to take a different path. After looking through their source code, they decided that they had too many cases where they had hard-coded long to be 32 bits. So rather than adopting the industry-standard LP64, they came up with a new model they call P64, in which only pointers change in size to be 64 bits. All the other integer types remain the same.

So those of you who are investigating 64 bits and have code you need to compile both for Mac OS X or the UNIX world and the Windows world, you have a bit of a conundrum. Turns out it's not that bad. If all you do is avoid using the raw long type, you're fine, because ints are the same between LP64 and P64, and long longs are the same. As you'll learn later in the talk, using the raw types is kind of problematic to begin with.

Now the ABI. We had a chance to re-examine the PowerPC ABI for 64-bit because we needed to come up with a new 64-bit convention, and there's no reason why we had to be compatible with the old 32-bit conventions anyway. So we did as much analysis as we could on how the old calling conventions worked, how parameters were passed in registers and so forth. And we wanted to come up with the optimal convention for 64-bits. What we decided was the original 32-bit convention was pretty darn good and hard to improve on. We found a few edge cases where we could improve, so we did that for 64-bits.

The first is when you pass structs that contain floats. The old convention was not very efficient. We've improved that for 64 bits. Second, we made returns of structs a little bit more efficient. And the last thing is we decided to dedicate R13, which was previously a non-volatile register, to be owned by the OS, and in fact owned by the threading package. So R13 will be unique per thread. This allows faster pthread access. In particular, thread-local storage will be faster on 64 bits.

Next, we need to update the file format we use. For those of you who've actually looked at the details of the Mach-O file format, you'll see that it uses 32-bit offsets everywhere in the file. We decided that in the long term, people writing 64-bit code do so because they're going to have large amounts of data and large amounts of code, and a 32-bit offset or 4-gig limitation on the file size might be an issue. So we're going to be enhancing the Mach-O file format to allow larger-than-4-gig files. The key point to all this is that it's going to be completely transparent to you.

Next thing is, some of you may be thinking, "Well, if there's no mode on the PowerPC, I could be clever and write some function that happens to work both for 64-bit callers and 32-bit callers." And yes, you can come up with some trivial examples where that works, but as soon as you do anything interesting, that breaks down. So we're recommending against it, and we're adding no support for mixing 32-bit and 64-bit code in the same process. And the way we do that is our tools will mark all code, even though it's just PowerPC instructions, with whether it's using 32-bit or 64-bit calling conventions.

How we're going to do this is something we call fat libraries or fat binaries. Some of you may have seen different incarnations of this concept of fatness before. I want to contrast this with what some other OSes did when they introduced 64-bit. Say, for instance, you put all your libraries in /usr/lib, as some OSes do.

When 64-bit came out, they had two libraries with the same names and they couldn't put them in the same directory, so they came up with a new directory, /usr/lib/64. And basically they kept all the 64-bit and all the 32-bit binaries in separate directories, and that's how they kept track of them.

We're doing something different. We're leveraging the fat technology, and this is how it works. On the right-hand side there, you see a normal Mach-O file, which starts with a small header that marks, in this case, the calling conventions, 32-bit PowerPC being PPC, and then has the text and data needed for that file.

We also allow you to create fat files, and our tools will do this for you automatically. Or you can use the lipo tool to pack these things together. All it is is, at the beginning it has a table of contents that says here are all the sub-files, or sub-images, in this file, and they're appended one after another. This allows you to ship one file that has both a 32-bit and a 64-bit implementation in it, either a library or a main executable.

If a user takes that and runs it on a 32-bit system, well, the 32-bit version of the app will be run and they'll be limited to the 4G address space. If they take that and run that on a "Tiger" or later machine, on a G5 or greater machine, the OS will automatically pick the 64-bit version of the file and run that. I just want to have a little fun here to talk about what does the 64-bit address space really mean? It's pretty easy to say, but how big is it?

So imagine, if you will, you took a tip of your pen and made a little dot. Let's call that one bit. So let's say right next to it you tried to pack around seven other dots to make a byte. It's a few millimeters on the side. Now, if you extended that and tried to draw actually four billion of these little bytes, how big of a surface area would that be?

Well, it turns out to be roughly the surface of the roadway and sidewalks of the Golden Gate Bridge, including the approaches. So we've spent our professional careers in the 32-bit world basically playing around in an area the size of the Golden Gate Bridge. So what does 64-bit mean at that same scale? Well, 64 bits is actually the surface of the Earth. It's not quite, that would be 65 bits, but it's actually twice the surface area of all the land masses on planet Earth. So basically, you can get lost in 64 bits. It's big.

Now, how are we going to divvy up the 64-bit address space? Well, the first thing to remember is the kernel is going to set up, depending on which calling conventions your code uses, 32-bit or 64-bit, either a 32-bit or a 64-bit address space. All the existing binaries will load in the 32-bit address space and still have the same restrictions they've always had.

With 64-bit, you can load anywhere in that 64-bit address space and have access to the entire 64 bits. Now, one thing we're contemplating: some of you may know that we currently have this thing called the zero page, where the first 4K of a 32-bit process is mapped to be neither readable nor writable. And that catches a lot of null pointer dereferences, or simple offsets off of null pointers.

So we're considering doing the same thing for the 64-bit address space, but instead we'll actually

[Transcript missing]

So let's get down into how you actually compile for 64-bits. So in Xcode, in the inspector, there's now a new attribute here. I'm not sure if it's big enough to read, but it says "Architectures." That's new for the preview you have of Xcode. In the architecture fields, you can type "PPC64." That is our token to denote that you're using the 64-bit calling conventions for PowerPC.

If you're using GCC, you can say -arch ppc64 from the command line. Now I want to talk a bit about what is actually in the 64-bit support on the preview you guys received this week, and what we're going to have by the time Tiger is done for 64-bits. So first of all, the kernel currently does not support the full 64-bit address space.

It only supports basically 2 million times the 4-gig address space. That's still plenty; you'll have a hard time filling that up. The second is the only compiler we have for 64-bit is the C compiler. By the time Tiger is final, we'll also have the C++ compiler available. Some of you may ask, "Well, what about Objective-C?"

That's actually easy to do in the compiler. The problem is, as I said earlier, we're only committing to libSystem being available. And our Objective-C runtime relies heavily on the Foundation framework, and we haven't committed yet to when the Foundation framework will be available using 64-bit conventions. So we're not going to release that compiler until that's done.

Next thing is, GDB can actually debug 64-bit programs already with this preview. The assembler and the static linker can create them. But what they create are static versions of executables. These executables cannot load any libraries. Just one image is loaded, and that's it. For "Tiger" final, we're no longer going to support these static executables and are only going to support dynamic executables.

For this release, the file format we're using is the standard Mach-O file format, which means it's limited to 32 bits, and which means that your 64-bit processes will load in the low 4 gig. We haven't done that trick yet of mapping it out. By the time "Tiger" goes final, we'll have an updated file format with which you can load your executable anywhere in the 64-bit address space.

Lastly, because of the difference between static and dynamic, Xcode is for now only going to support building standalone static PPC64 executables, not fat ones. In the final release, you'll be able to build fat binaries. And of course, the whole point of this preview is for you to evaluate 64-bit, to start playing with it and give us feedback.

Once "Tiger" is final, the programs you make on the final "Tiger" you can ship, and Apple will support them. Let me summarize here. The G5 is a unique 64-bit processor. There is no mode. All the instructions are exactly the same. The only difference between 32-bit and 64-bit executables is the conventions they use.

Again, because once you switch to 64-bit your data structures are bigger, you're going to have a slight performance decrease. For that reason, the only reason you should convert to using 64-bit calling conventions is if you actually need more than 4 gig of address space. We're using the architecture part of our fat builds to enable a mixture of both 32-bit and 64-bit calling conventions of PowerPC code, and we're calling the new thing PPC64.

And lastly, we're only committing to shipping libSystem as available for 64-bit programs, so you need to work around that. And again, the programs you build on this preview will not run on Tiger final. It's purely for evaluation. So next, I'd like to bring up Jeff Glasson, who's going to give you a short demo of 64-bit. Thanks, Nick.

What I wanted to show is I wanted to go a little bit into how to build a 64-bit service application using Xcode. And for those of you that were at Ted's tools overview session on Monday afternoon, you saw the Celestia app. What was going on behind the scenes there was we had this 32-bit GUI application with a 64-bit service application in the background.

And that application actually was mapping 6.5 gigabytes of terrain data, which I think we left out that little statistic. So it really was making use of the 64-bit address space. So I'm going to show that actual service application, how you would build that in Xcode, and then play some games and step through the debugger. So let me launch Xcode.

And I'm going to do some cut and paste to speed this up a little bit, but you'd be doing the same thing but typing. So I'm going to create a new project. And just a standard C tool. Just call it TerraMapper, because it's a good similar name. Give it a second.

I think my disk spun down while the machine was resting. So I'm going to do this quicker. Instead of me actually typing in the text, I'm going to actually add the source file to the project quickly. And copy it in and delete that little template main that came with the temp project.

Okay, so the first thing you need to do when you're building a 64-bit app is open up the project inspector. As Nick mentioned, there's an architecture setting, and by default we build for 32-bit conventions, so I'm just going to change this to PPC64.

And then close that. And so now what I want to do, I'm going to open up the source file, and I'm going to set a breakpoint at the start of-- actually, before I do that, I'm going to do something a little tricky to speed this demo up. I'm going to actually pre-allocate three gigabytes of my address space.

Let me count the right number of zeros here. This is actually going to speed things up some so we don't actually have to read four gigabytes of data off the disk for this demo. Let me set a break and then we'll build it. Actually, let's go into the debugger. So as Nick mentioned, GDB and the Xcode UI are all 64-bit aware. Right now, this actually is not going to be very interesting, so I'm going to step a couple steps.

Okay, so now let's do something a little more clever. So I've just pre-allocated three gigabytes of address space. What I want to do is where we're going to start mapping the data, I want to set a breakpoint there. And now I actually have to drop to the Xcode console, or the GDB console here, because I want to actually set a condition on that breakpoint.

That is actually breakpoint two. So what I'm going to do is set a condition so that breakpoint's only going to stop when my pointer value gets above four gigabytes. Again, I have to make sure I type enough zeros because it's more than I'm used to typing. and now I can continue running. So, the demo gods have not blessed me. Let me try that one more time. Sorry.

One second. I'm going to quit and relaunch it in case there was some stale data there. Remember, this is preview software. Ah, I remember what's wrong. I forgot to give it some command line arguments. I actually need to go here and actually tell it where the data is. It's my fault, not the software's fault. So I'm going to actually cut and paste a whole bunch of stuff here. Okay.

I'm just being paranoid with ULL to make sure GDB knows that I'm typing a 64-bit constant. So, okay, so now we're actually reading, reading, reading. So it's loading about a gigabyte since I pre-allocated three gigabytes. Let's see. We actually have a pretty big value for P right now. That's way up there in the address space. And actually, you can use all the features of Xcode. You can actually look in, dereference it. You can bring up the memory viewer, which is under here.

Uh-oh. I can't type. So we actually can look at that memory. It's not very interesting because this is actually mapped but hasn't been faulted in yet, but the tools are all ready for 64 bits, and as we move more frameworks along the line and as we make things dynamic, I think you'll be able to explore and give us feedback right now. So that's all I actually had to show. It's not very interesting since the GUI is not 64 bits, and you already saw that, but with that, I'd like to introduce Stan Shebs, who's gonna talk a little bit about some pitfalls with 64 bits.

Hello everybody. So we're going to zoom out and go into a little more detail on the pitfalls and what actually happens when you try to do 64-bit programming. Jeff very narrowly skirted several errors, and he's actually quite lucky the demo gods ultimately smiled on him, because he actually got 64-bit numbers back when he put 64-bit numbers in.

So the kinds of things that can happen is that the source code will need changes because integers remain 32 bits and there's a number of practices, long-standing practices, that no longer work. For instance, integers cannot hold pointers. That seems fairly obvious, but in fact a lot of code will casually assign pointers to integers expecting to get the pointer back later on somehow.

That kind of practice won't work. Even something as innocuous as using a %d in a printf will not actually show you the entire number. And that can be very confusing if you don't use GDB and you try using printf for debugging. The other things we have to do is that casting doesn't actually solve the problem. There's ways to get tricked by sign extensions. There's ways to get tricked by function calls.

We'll start out by recalling Diogenes. And Diogenes, if you remember, was a philosopher of ancient Greece, and one of his shticks was to wander around with a lantern looking for honest people and never finding any. Since we're in the modern age, we have a flashlight, an LED flashlight, and we're going to be looking for honest programmers. So, our first question: how many programmers here have assigned a pointer to an integer? Wow, we have a lot of honest programmers. That's very encouraging. But not everybody raised their hand. Perhaps we have some Java people in here?

So the key thing to know about assigning pointers to integers is that it will lose data. It will simply drop the top half of the pointer, and it'll just be gone. This happens at the instruction set level; there's no way to recover from it. Now, you can assign to a long or a long long. Both of those are perfectly okay.

So with the code example we have here, we do the malloc. We'll assume the malloc came back with a big pointer variable. We assign it. And GCC is helpful and it does warn you that the assignment is making an integer from a pointer, but it doesn't tell you that you're losing data. It just warns you that you're doing this without a cast. We'll come back to casts in a moment.

So if you look at the value of the integer variable at that point, it's just the lower half of the pointer. Now if you do the same thing to a long variable, you get the entire value. You can assign that back to a pointer later on. That works.
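To make that concrete, here's a small sketch (the values and helper names are made up, and an LP64 64-bit build is assumed, so longs and pointers are 64 bits):

```c
#include <stdint.h>

/* Squeezing a pointer-sized value above 4 GB through a 32-bit int drops
 * the top half; an LP64 long keeps all 64 bits. The casts are only there
 * to keep the compiler quiet; they don't change what happens to the bits. */
unsigned int  through_int(uintptr_t p)  { return (unsigned int)p; }
unsigned long through_long(uintptr_t p) { return (unsigned long)p; }
```

So `through_int` of a value like 0x100001234 hands back just 0x1234, while `through_long` round-trips the whole thing.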

Now, printf. How many people use the correct kind of printf directive for their longs and long longs? Most people will just habitually use %d, and %d has the fatal flaw that it will only show you an integer-sized thing. Printf is not a magic function. You hand it all the arguments, but it decides what to pull off the stack based on what the printf directives tell it. If the directive says pull four bytes, it pulls four bytes and leaves the next four bytes for the next thing it's asked to print. So you can get some interesting behaviors.

And it can be very confusing because if you use printf as your window into what's happening inside the program, and printf is not telling you what's really happening, you can have a situation where the program is more or less working correctly, but printf says it has to be failing.

So you've got to watch out for that. And so the directives to use, they've always been there in C. They've been around. You can use %ld for a decimal printout, or you can use %lx for the hexadecimal print for longs. And for long-longs, it's always been available to do %lld and %llx.

And then we also have %p for pointers. The standard actually does not define what %p prints. In our case, it puts a 0x on the front and prints the value in hexadecimal, but that's not a cross-platform expectation; %p may do something different on a Linux or a Solaris or what have you.
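Here's a quick sketch of those directives in action (a made-up check function, using snprintf so the result can be inspected rather than printed):

```c
#include <stdio.h>
#include <string.h>

/* Format 2^32 + 2, a value that doesn't fit in 32 bits, with the
 * directives that match each type's size. %d would show only the low
 * word; %ld and %llx show the whole value. */
int directives_ok(void) {
    long      l  = 4294967298L;   /* 2^32 + 2 */
    long long ll = 4294967298LL;
    char buf[64];

    snprintf(buf, sizeof buf, "%ld", l);      /* decimal, long */
    if (strcmp(buf, "4294967298") != 0) return 0;

    snprintf(buf, sizeof buf, "%llx", ll);    /* hex, long long */
    return strcmp(buf, "100000002") == 0;
}
```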

Now, how many people use casting? Good, that's good. So, casting unfortunately is not a magical process that somehow makes the conversion work. All it does is tell the compiler that you actually intended to assign one to the other. So, as our example here shows, we can assign a pointer variable to an integer variable and voila, it whacks the top off again.

But except that this time, the compiler hasn't actually said anything. It says, "Hey, hey, you put in an int cast in there, the programmer must know what they're doing." So it doesn't say anything. And again, you can do the same thing with a long cast. The long cast will do the right thing. So the basic bottom line is that all those casts you thought were going to fix the 64-bit problem actually aren't doing you a bit of good.
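A sketch of the same trap with the cast written out; the names are made up, and the truncating behavior of the int cast, while technically implementation-defined, is what mainstream compilers do:

```c
/* Both casts compile without a peep from the compiler, but only the
 * long-sized one preserves the value; the (int) cast still throws
 * away the top 32 bits. */
long cast_through_int(long v)  { return (long)(int)v; }  /* top half lost */
long cast_through_long(long v) { return (long)v;       } /* value intact  */
```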

Now, sign extension is a little bit complicated here. The problem is that an unsigned 64-bit number may actually look like a signed 32-bit number. And it's a little bit messy to set up, but I did check this out in code, so if you run back to the lab, I'm reasonably certain that if you type all this in, you'll get more or less the same result. However, we're destroying the slides after this talk, so you'll have to work from memory.

So, to take an example, let's assign nine bajillion to an unsigned integer variable. This is just above two gigabytes, so if it were a signed value, it would actually come out as a negative number. But we're keeping our code clean; we've made it an unsigned integer. We can take that and assign it to an unsigned long, and the right thing happens. But then if we take that and assign it to a signed integer, we end up with a value that's less than zero.

And the other thing that's interesting: if you take that less-than-zero value and then assign it to a signed long, it's still less than zero. So now we've taken our nice large positive number and turned it into a large negative number. Now the juicy part is when you go and do another assignment. You assign it back to an unsigned long, and it now comes out as an extremely large unsigned number.

And so if you say we're expecting this to be, say, a number of iterations, you know, two billion iterations, that's sort of plausible. However large that number is, that's an awful lot of iterations. Your machine will spend quite a bit of time getting to the end of that.

So that's kind of a mysterious-looking number, but what it really is, is the nine bajillion you had originally with FFFs glued on the front. Essentially it's just been sign-extended in two's complement. That's where that value really comes from. But after a set of transformations like this, it's not obvious that that's where the number comes from.
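The whole chain can be written out in a few lines (made-up function name, LP64 build assumed; converting an out-of-range unsigned value to a signed int is implementation-defined, but behaves this way on the usual compilers):

```c
/* Start with a value just above 2^31 in an unsigned int, push it through
 * a signed int and a signed long, and come back out unsigned: the result
 * is the original value with FFFFFFFF glued on the front. */
unsigned long sign_extension_chain(unsigned int u) {
    int           si = (int)u;             /* above 2^31: reads as negative */
    long          sl = si;                 /* sign-extends: still negative  */
    unsigned long ul = (unsigned long)sl;  /* now an enormous positive      */
    return ul;
}
```

Looking at the result in hexadecimal, as suggested below, makes the pattern obvious immediately.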

And what you want to do, when you're sitting in the debugger and the numbers aren't making any sense, is look at them in hexadecimal. In a lot of cases you'll see that, in fact, there's been a sign extension, and the number's been turned into a large unsigned value.

Now, another way to have bad stuff happen is through function calls. Now, most of you have probably done prototypes. Is that true? Has everybody done prototypes for all their functions? How's this going? Okay, there we go. Everybody does prototypes for all their functions? They add prototypes when the system doesn't provide the prototypes? Fewer hands go up there.

So, this time around for 64-bit, the prototypes really, really matter. Because the rules of C are that if you don't have a prototype, the compiler falls back to the default promotions, which is to pass doubles for floats, which is usually okay, but to pass ints for the integer arguments. And in particular, that's not longs.

And the compiler may or may not say something about this. So, we have here an example of a function called fun, where it's declared as a function but doesn't have a proper prototype. We can take a long value and assign it to a long variable, and that works. And we can pass that into the function.

So, the function calling process truncates the integer again in a fashion that should be familiar by now. And if you look at the value of the variable inside the function, it's chopped off again. Now, if you pass the whole constant, it'll do the same thing. It'll still chop it off. In that case, the compiler will give you a warning that it's chopping it off, which is a small consolation.
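The cure is simply a real prototype. Reproducing the truncation itself portably is tricky, so this sketch (with a made-up function) only shows the safe version:

```c
/* With the prototype in scope, the compiler knows the parameter is a
 * long and passes all 64 bits; without it, C's default promotions
 * would hand the callee only an int-sized argument. */
long fun(long x);               /* the prototype that saves you */

long fun(long x) { return x; }  /* echoes its argument back     */
```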

Okay, so we've got all these ways of losing, by getting the values cut up in ways that you don't want. So, how do you do it right? You often have a very real situation: for instance, you have to send something out over the wire, or send something to a 32-bit process, and you need to preserve both halves of your big pointer.

There's a couple of different ways to do it, and I won't try to recommend a single way because it really depends on your situation and your application. One way that works reasonably well is to use a union. In this case we have a union that exists only for the purpose of splitting up a big value: a two-field union, with a long and an array of two ints.

We can take a long variable and assign it to the union's long field, and then if you look at the two int fields of the union, they come out as the two halves of the long. Now, a downside, a hazard of this: it's endianness-dependent.

So if you're just transmitting from one processor to the same kind of processor, this will work, or if you're doing it within a single program. But if you're splitting the value in half and sending the two halves over the wire to, say, an x86 machine, chances are the two halves will go out in the wrong order. So you need to be aware of which half is which.

Because again, in traditional fashion, you send it out over the wire and it comes out wrong. You send a two over the wire and the x86 receives it as two times four gigabytes, or eight gigabytes. If that's, for instance, a length in the header of a packet, the x86 machine is now expecting eight gigabytes of data when you only sent two, and it's going to be waiting for a very long time. On the plus side, you can then say that's a Windows bug and everybody will believe you.
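Here's a minimal sketch of the union trick, with a round trip to show it's self-consistent on any one machine; the names are made up, and the hazard is exactly that half[0] and half[1] swap roles between big-endian PowerPC and little-endian x86:

```c
#include <stdint.h>

/* A union that exists only to split a 64-bit value into two 32-bit
 * halves. Which half lands in half[0] depends on byte order, which is
 * why the halves must be labeled before they go over the wire. */
typedef union {
    uint64_t whole;
    uint32_t half[2];
} splitter;

uint64_t split_and_rejoin(uint64_t v) {
    splitter out, in;
    out.whole = v;
    in.half[0] = out.half[0];  /* "transmit" the halves in memory order */
    in.half[1] = out.half[1];
    return in.whole;           /* matches v on a same-endian receiver   */
}
```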

So the other way to do that that's more reliable if you have to pay attention to any of this is to write out manually the cutting up operation. And I've written it out here and I've tested this one too. So again, you know, do it from memory on a lab machine, see if it really works and come tell me if I got it wrong.

The game here is that we want to mask off the low and high halves of the long number. To mask off the low half, you can AND it with lots of Fs. The trick is that you need to AND it with lots of Fs with lots of leading zeros. If I just wrote 0x and then eight Fs, that would get sign-extended to 0x and sixteen Fs, and the AND would then yield a big number, which would then be cut off to get the wrong half.

Sorry, no, it would actually get the right half, but it would get it for the wrong reasons. So I recommend doing it this way. I don't have a slide for this, but you can actually get into trouble if you don't add the Ls onto the ends of your constants. So I recommend you do that everywhere now that you're working with 64-bit programming.

To get the high half of the number, since we're working in a signed regime, the correct thing to do is a shift. You can shift down by 32, and in this case we don't have to put an L on the 32 because it's just a shift count, and the shift will sign-extend correctly if you have a negative number that you're cutting up. And if we printf it, we see that it's been cut up into the two halves correctly.
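Written out (with made-up names, LP64 build assumed), the mask-and-shift version looks like this:

```c
#include <stdint.h>

/* Split a long into halves explicitly. The mask must be an unsigned
 * long constant (hence the UL suffix) so it isn't sign-extended into
 * all Fs; the high half comes from a right shift, which sign-extends
 * correctly for negative values on the usual compilers. */
uint32_t low_half(long v)  { return (uint32_t)(v & 0x00000000FFFFFFFFUL); }
int32_t  high_half(long v) { return (int32_t)(v >> 32); }
```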

So what kind of assistance do we actually give you to write compatible code? We have all these problems, all these different ways to lose; what do you do? At the language level, we give you some standard types and macros. Some of these have been around for a long time. We have size_t; we have intptr_t, which is an integer type that is large enough to hold a pointer, and that'll have the correct size whether you're doing 32- or 64-bit programming. We have uintptr_t, the corresponding unsigned type.

There are also int32_t and int64_t types that you can use. For the Unix side of the world, there are additional types that are specific to Unix; they're not standard C types. We have things like caddr_t, which is the type of a core address, which is just a euphemism for a pointer into main memory.

So that's going to be a 64-bit type in the 64-bit world. We have pid_t for process IDs, off_t for file offsets, and so forth. If you look in /usr/include, you'll see there are quite a few of these headers. In fact, if you compare the Panther headers and the Tiger headers, you'll see a whole bunch of changes.
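A tiny sketch of why intptr_t is the right home for a pointer on either architecture (the helper name is made up):

```c
#include <stdint.h>

/* intptr_t is defined to be wide enough to hold a pointer on 32-bit
 * and 64-bit builds alike, so this round trip is safe in both worlds,
 * unlike stuffing the pointer into a plain int. */
int pointer_roundtrip_ok(void) {
    int x = 42;
    intptr_t ip = (intptr_t)&x;  /* guaranteed big enough */
    int *back = (int *)ip;       /* and back again        */
    return *back == 42;
}
```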

In places where Panther only had 32-bit types, the Tiger headers are different. So the obvious implication is: beware of compiling this stuff on Panther. Even if Panther didn't give you any complaints, you may see things on Tiger that you didn't get out of Panther.

Another thing that's actually very common: programs that have already been ported around, say on Unix systems, will have their own local definitions for types. And there's any number of different conventions that people use. They'll have macros, all uppercase, with funny names. You'll see all kinds of different things out there.

One source I'll recommend is GNU software itself. GNU has been ported everywhere in the universe, has been ported as native tools, cross-compiler tools, all that kind of stuff. So every combination of 64 and 32-bit you can think of has actually had to have been handled, and there are a set of definitions in there that have been proven over time to work well for this kind of thing.

One of the special things that's come up, and several people have had to solve this, is what to do about these printf directives. It turns out there's not standard macros for these, although it seems like a really good idea. So you'll see some programs will actually define macros for the printf directives.

So I have an example here. If you want %lld to do the right thing, you don't want to pass an integer to %lld, because that will grab the four bytes of your integer and the four bytes of the next data item passed to printf. So you want to encapsulate this somehow. And you can do something that amounts to using string concatenation, which is a capability of C.

And you have part of your string, you have the macro with the directive in it, and then you have the rest of the string. And this way you can get something so that the compiler won't give you warnings about the printf directives not matching up with the data types.
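A sketch of that convention; the macro name here is made up, and C99's &lt;inttypes.h&gt; later standardized the same idea as the PRId64 family:

```c
#include <stdio.h>
#include <string.h>

/* FMT_I64 holds just the directive; C's adjacent-string concatenation
 * splices it into the larger format string at compile time, so the
 * compiler can still check the argument types. */
#define FMT_I64 "%lld"

int format_macro_ok(void) {
    long long n = 4294967298LL;  /* 2^32 + 2 */
    char buf[64];
    snprintf(buf, sizeof buf, "value=" FMT_I64 "!", n);
    return strcmp(buf, "value=4294967298!") == 0;
}
```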

So, further API changes. We give you an __LP64__ macro, with double underscores on the front and back because it's something predefined by the compiler. The value of __LP64__ is 1 for 64-bit compilation and 0 for 32-bit compilation. We also give you a __ppc64__ macro that's defined when you're compiling for 64-bit PowerPC and is not defined for 32-bit PowerPC. And the existing macro, __ppc__, is not defined when you're doing 64-bit PowerPC compilation.

And we had a little debate on that, and we looked at the uses in practice, and they were pretty much either/or: either you're doing 32-bit or you're doing 64-bit, and if you turned on __ppc__ and __ppc64__ at the same time it would confuse a lot of headers. So we decided to make them mutually exclusive.

In practice, you should almost never use __ppc64__ directly, because that wires in an architecture dependency, unless you're literally writing PowerPC assembly slipped into the middle of C code or something like that. You probably want to use __LP64__ instead, or else, if at all possible, write the code to be 32/64-independent.
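As a sketch (the type name is made up), conditionalizing on the data model rather than the CPU keeps code like this architecture-independent:

```c
#include <stddef.h>

/* __LP64__ is 1 on 64-bit compiles; an undefined macro evaluates to 0
 * in #if, so this also does the right thing on compilers that leave it
 * undefined for 32-bit builds. */
#if __LP64__
typedef unsigned long word_t;  /* 64-bit build: long is 8 bytes */
#else
typedef unsigned int  word_t;  /* 32-bit build: int is 4 bytes  */
#endif

/* In either world, word_t comes out pointer-sized. */
int word_is_pointer_sized(void) {
    return sizeof(word_t) == sizeof(void *);
}
```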

One of the things you'll see is, again, if you look at the "Tiger" headers, you'll see we've made a bunch of API changes where we had to choose whether an argument was a long or an integer. And so I just thumbed through and found this little bit out of a header file whose name I forgot to write down, so I don't remember which one it is.

The functions are getattrlist and friends, and in the Panther headers the argument for them is declared unsigned long. That would have an undesirable consequence in the 64-bit world, so the header has been changed to say unsigned int. However, programs that have their own declarations of the same system function, which happens, would be inconsistent if they continued to say unsigned long while the system header said unsigned int, so those declarations are mostly conditionalized on __LP64__. And this way we get backwards compatibility with Panther code that may refer to the same prototype.

We have one little API change for assembly language, which is a new directive to allocate an 8-byte object, called ".quad". It's not the greatest name in the world, but it's consistent with other assemblers that do this, so that's why we chose it. In this example I'm just feeding .quad a large constant, but it also works to feed it a relocation. That's not very interesting right now, but when the full 64-bit Mach-O file format is available, you may end up wanting to use this in assembly code.

So, I've alluded to warnings a number of times. Now this time, you really have to pay attention to the warnings. If you're getting a warning and it's telling you about loss of precision or casting integers to pointers, this time around it's not just for show. You really are going to lose data and bad things will happen.

One of the things you can do is add additional compiler options just to be on the safe side. In Xcode, you can ask for other warning flags; the equivalent for GCC users is -Wall, and that will turn on lots of additional warnings. And people are often annoyed.

They say, well, -Wall turns on too many warnings. But actually, for 64-bit programming, -Wall is not all; it doesn't even list all the bad things that can happen to your code. So we have an additional option, -Wconversion, which gives additional warnings about conversions that might lose data or precision. And in 64-bit, if it says you might lose data, you probably are losing data. So we recommend using -Wconversion.

You can also ask for -Wrequire-prototypes, and there's not an Xcode flag for it that I could see, so send in a Radar for that. What it does is insist that all your functions have prototypes. So if you have a forgotten piece of code that was always quietly taking integers and assuming everything was okay, -Wrequire-prototypes will flag it for you and say, hey, you need prototypes there.

Let me take a moment to talk about tools and utilities. We have an extended set of APIs and a few new APIs to handle tools that want to manipulate 64-bit processes but don't necessarily want to be 64-bit themselves. An obvious example is GDB. When you run GDB in Xcode, it's actually still a 32-bit program, but it is manipulating a 64-bit process, which is the program you're debugging.

So the way we do that is we have types like vm_address_t in the system headers, and it's set to a 64-bit type for 64-bit targets. We also have extended APIs such as vm_read, which will read data out of a 64-bit process at 64-bit addresses. The OS guys have been slaving away on this over the past few weeks to get all that to work.

Now, device drivers are running in a 32-bit environment. We have a 32-bit kernel, and this is partly for efficiency and partly to have a single kernel that runs on all types of systems. And we can do this because, as you saw from the previous slide, the representation of a task's memory is just data structures, so a 32-bit kernel can actually manipulate the memory belonging to a 64-bit task.

So, specific names of the functions: we have a prepare method that establishes I/O mappings, and then we have specific routines both for use with DMA and for programmed I/O situations. Actually, the slide says parallel I/O, but that's probably not right. What's that? Programmed I/O, thank you, got it, yes. I looked at it and thought, parallel I/O, that's like 8-bit microcomputers. That's probably not what they meant in I/O Kit land.

Okay, so for programmed I/O, we have readBytes and writeBytes methods. So that's it, at least for now; there is a possibility we may have to do something with 64-bit device drivers, and you have OS guys over on that side of the room that you can buttonhole in the Q&A period to ask about that.

Now if you're actually doing 64-bit I/O from a 64-bit process, the same POSIX APIs work as always. They've been essentially recompiled to take 64-bit addresses, and all that stuff has been done. If people have questions about IOKitLib and I/O user-client plug-ins, which are not available, let's bring it up in the Q&A session.

So I'd like to pop back up a little bit and talk about the design issues that you might want to think about. One of the classic uses of 64-bit applications that have become prevalent in recent years is to use them for servers. And servers are actually a very interesting use because the classic model now, what we've seen with Internet-type servers as the Internet's become popular, is that they need to handle large numbers of clients, in some cases maybe thousands of clients simultaneously.

And it's very convenient to actually be able to have a very large address space because then what you can do is have, say, one thread per client, and have access to a single large shared data space. So, for instance, you're serving out images, you load every last one of your images into memory so that it's readily available, and you can serve it out to clients as they ask for them.

This can actually be a very effective approach for things like databases where you can lock on individual data elements and you can have a single server managing all of those rather than trying to do something with multiple server processes managing shared files. So the internet server is a really interesting area to do 64-bit programming.

We can generalize that a little bit and talk about compute engines in general. And the TerraVision demo that you saw yesterday that Steve Peters did is actually a classic example of that. We have a 32-bit GUI front end. We use inter-process communication in one form or another going back to the compute engine, which is handling the very large address space.

What this does is essentially it can shift the burden from your application code to the system. A lot of programs actually already have mechanisms to handle large amounts of data. What they'll do is they'll manually page in data as needed and then page it out and then page in different data.

And with the 64-bit capability, you actually wouldn't have to do that anymore. You could just allocate large amounts of memory and suck it in and use it and never have to worry about running out. And if you have a piece of data that's not being used at the moment, you can essentially rely on the virtual memory system to page it out for you.

So in a sense, what it's doing is it's replacing your code with a system code. And that can be a great advantage because now it's not you having to write all this stuff and play computer scientist and read about VM memory items. You can let the friendly experts at Apple take care of that for you.

Now it may be that you know something about your application's memory usage that will be actually more efficient than the generic OS can do. And you basically have to reevaluate that for your own application. Do you have a usage pattern? Do you always have something that's always first in, first out, and that doesn't necessarily play nicely with the last in, first out in the VM system?

And that's just going to depend on you with your application. If you already have a memory management algorithm that you know is more efficient than anybody else in the world can do, then you probably want to stick with it and maybe even stick with using a 32-bit program and continue to exploit your algorithm. But it's the kind of thing you want to actually stop a moment, take a look at and say, you know, is this the best algorithm or can I leverage Apple's VM system and get something better?

So, what 64-bit does for us is that it really opens up a lot of new vistas. And I've talked about a couple of very specific things, but they're things we already know about. There's actually all kinds of opportunities that we really don't know about. I was actually talking with one of our guys last night and thinking about that surface of the earth analogy. And in fact, if you think about it, every point on the surface of the earth is identifiable with a 64-bit number.

So you could write an application, for instance, in which instead of maintaining, say, a linked list of locations stored as 64-bit values, you record data about a point on the earth at that memory address. So essentially, you're storing data on the earth. You use the entire 64-bit address space. The point under my feet, you know, is 4589, and you record the color of the carpet at that location, directly at that memory location. And as you read and write that data, the VM system will handle the paging for you.

So, that's kind of exotic, right? You think, well, that's kind of silly, who would do that? But the thing is, that's actually something you can do now, and something you couldn't do previously. So maybe it's a good idea, maybe it's not. The real bottom line in all this is that these kinds of applications are really only limited by your imagination. In the final Tiger, we're going to give you the full 64-bit address space.

And most other systems don't give you that much; they give you somewhat less. We give you the whole thing, and we're actually very excited to see what you'll think of to do with it in the future. So that's my part, and I'd like to bring up Matt Formica to do the wrap-up and the Q&A session. Thank you very much.

So you've seen a lot of information about 64-bit today. There is further documentation information available for you. The main thing right now is the 64-bit transition guide that's been written. It provides kind of the state of the world right now in Tiger with the preview DVD that you have. So that's the best place to get information about what we're doing.

We're now going to bring up our Q&A panel. We have a bunch of engineers who are going to come up. You can certainly talk with us, ask questions this week. Beyond this week, feel free to send me an email. I'm the developer tools evangelist here at Apple. My email is [email protected]. I'd love to communicate with you via email about the 64-bit tool set on Mac OS X.