Building 64-bit Solutions for Tiger - WWDC 2005

OS Foundations • 1:12:52

Are you developing an application that could benefit from more than 32 bits of address space? Mac OS X Tiger offers support for 64-bit command-line processes that can address vast amounts of memory. Learn the specifics of Tiger's 64-bit support and how to factor a Carbon or Cocoa application to run in conjunction with 64-bit backend processes.

Speaker: Stan Shebs

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Hello everybody and welcome to session 507, Building 64-bit Solutions for Tiger, or you could say, Living in a Double Wide. My name is Stan Shebs, and they tell me I worked on this. A couple of months ago they said, you know, we want you to give a talk on 64-bit Macs. And I said, that's really cool. That's like an $8 Mac. That's really low cost. I didn't think we'd gotten quite that low.

And they said, no, no, no, that's not what 64-bit means. So then after several weeks of intensive psychotherapy and a small amount of electroshock, it came back to me that I actually had participated in it somewhat. I'm still not entirely convinced. I was looking at last year's video and I thought I saw editing artifacts around the image of me talking about this stuff. But I'm going to pretend. So to start off, we'll talk about what exactly it is we mean by 64-bit.

We're going to talk about what Tiger is actually offering for 64-bit applications, and about when you should actually go 64-bit. It's not something you automatically want to do for everything in sight. We'll cover the kinds of considerations involved in porting your program to 64-bit or bringing it over from a different platform, and what to do about providing a GUI.

And then finally, well, to answer the question about the elephant in the room: we have nothing to say about Intel and 64-bit. So, yes, I can see everybody getting up to leave now. Actually, there is something, though: as I go through the talk, look at each slide and ask yourself, what on this slide is actually specific to Intel versus PowerPC, and what is generically applicable? I think you'll find that most of what I'm going to talk about today is completely the same for PowerPC or Intel or whatever architecture we might happen to use in the future, for that fourth and fifth transition.

Oops! Sorry, I gave that away. You didn't hear that. Confidential session, right? So what is it we mean? We hear 64-bit a lot, and we've been hearing about 64-bit this and that in various operating systems for quite some time. Some days you hear it's the savior of everything, and other days that it doesn't really matter that much. For us, what it means is the ability to do 64-bit integer math natively, so the processor is able to do 64-bit arithmetic with single instructions rather than with, say, subroutine calls. It also means support for more than 4 gigabytes of physical memory.

And then finally, the thing that's most exciting is the support for virtual address spaces larger than 4 gigabytes. And it wasn't that many years ago that 4 gigabytes sounded like infinite space, but with the growth of applications and so forth, it's actually turned out that it doesn't seem so big anymore. So I'll quickly summarize what our 64-bit architecture looks like for OS X. First of all, 32-bit programs are unaffected. We don't do anything about them. They work in the same way they always have.

We don't allow mixing of 32- and 64-bit code within a task. So if you have a 64-bit program, you have to have a 64-bit version of each and every library it might make use of. The alternative would have been some kind of ugly thunking scheme, and we thought about exactly what it would involve, stuffing a 64-bit pointer into a 32-bit box, and said, there's no good solution, so we're just not going to go that way.

And that universal binary thing we've heard so much about recently turns out to be very handy for packaging 32- and 64-bit programs and libraries too. And now that we can talk about the Intel code, we actually have three-way binaries, sort of a ménage à trois of file formats. If you look closely around your system with the Intel stuff installed, you can probably find a few of those. Thank you.

And then finally, we use a single 32-bit kernel. This has certain advantages. For instance, there's no rebooting into a 64-bit system; you keep using the kernel you already have. All your existing drivers continue to work. Drivers that need to work with 64-bit address spaces already use IOMemoryDescriptors, essentially an abstraction of the memory space, so the driver doesn't have to change. It can continue to work with the 64-bit applications.

Now technically, somewhere around 10.2.7, when G5s were introduced, 64-bit showed up on the Mac. This first version consisted of 64-bit instructions and data, and it was available to all programs. If you wanted to write a load doubleword instruction into your code using an asm block or something like that, you could do that.

Now, in practice, it had a fair number of restrictions on it. For instance, the ABI wasn't structured so that you could pass the full contents of the 8-byte register. So you could do a 64-bit add or whatever, but then you couldn't get the result back from the function.

It also introduced support for more than 4 gigabytes of physical memory. So this is good for the hardware guys. The software guys were sort of, yeah, whatever, a lot of memory. More than we personally can afford, so we didn't ever get to see it. But we were told that it worked. And our rich application developers got to use lots of memory. Anyway, that did allow the caching of large application data sets in memory. But without any kind of software support, you had to do something special in your program to actually make use of it.

And of course all this is G5 only. Basically I actually did experiment for a couple days with trying to get some kind of 64-bit thing working on G4, but then somebody pointed out to me that in fact it wasn't going to be able to address any additional memory, so it was kind of pointless. And I said, yeah, that's a good point, it is pointless.

So for large datasets, just a quick example. If you had a 4-gigabyte application address space and a 7-gigabyte data file, you could actually map that file to physical RAM: if you had 8 gigabytes of physical RAM, VM would take care of all of it. But from your application, you would only be able to manipulate a small block of it at a time. So the idea is that you'd essentially move a window of data around. Now, I personally don't know of any application that actually went to that much trouble, even though it was possible in the Panther days.
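A minimal sketch of that sliding-window technique in C, assuming a POSIX system with mmap. The names struct window, window_move, and window_byte are hypothetical, and the window length is assumed to be a multiple of the page size; the code only demonstrates keeping a small mapped view of a large file and moving it when an access falls outside it.

```c
#include <fcntl.h>      /* open(): how callers obtain the file descriptor */
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

/* A view onto part of a file that is too big to map all at once. */
struct window {
    unsigned char *base;   /* start of the current mapping, or NULL */
    off_t          offset; /* file offset where the mapping starts */
    size_t         length; /* bytes mapped; assumed a multiple of the page size */
};

/* Map `length` bytes of `fd` starting at the page that contains `want`.
   Returns 0 on success, -1 on failure. */
static int window_move(struct window *w, int fd, off_t want, size_t length)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t start = want - (want % page);   /* mmap offsets must be page-aligned */

    if (w->base != NULL) {
        munmap(w->base, w->length);       /* drop the old window first */
        w->base = NULL;
    }
    void *p = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, start);
    if (p == MAP_FAILED)
        return -1;
    w->base = p;
    w->offset = start;
    w->length = length;
    return 0;
}

/* Read the byte at absolute file offset `pos`, sliding the window if the
   offset is outside the current mapping. Returns -1 if remapping fails. */
static int window_byte(struct window *w, int fd, off_t pos, size_t span)
{
    if (w->base == NULL || pos < w->offset ||
        pos >= w->offset + (off_t)w->length) {
        if (window_move(w, fd, pos, span) != 0)
            return -1;
    }
    return w->base[pos - w->offset];
}
```

A 32-bit process would call window_byte with whatever offsets it needs and let it remap; a 64-bit process could instead map the whole file in one piece.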

Now in Tiger, we jumped all this up. We defined a new PowerPC ABI, and the most important thing about it is that the full width of the 64-bit registers is actually used when passing data in and out of functions. We defined a 64-bit version of the Mach-O file format. This is basically the same Mach-O file format that's been in use all along, extended so that some of the fields are wider, the relocations are bigger, and so forth.

And then most interestingly, we have the big address space. This is a big flat address space, so you can actually allocate a single block of 2 to the 49th bytes, that's 512 terabytes. You can actually call malloc with that and it works. Now, if you try to touch every byte in that block, that will not go so well. You'll go from the textbook discussion of VM thrashing to personal observation.

Still G5 only. So, how big is the 64-bit address space? Well, the technical term is 16 exabytes. Nobody knows what exabytes means, so we say 16 billion billion bytes. Or if you're European, we say 16 milliard milliard bytes, or something like that. How big is that? Well, it's 16,000 Internet Archives. It turns out we don't have enough pixels on the screen to quite get it to scale.

It's the equivalent of a million Libraries of Congress. And you thought the Library of Congress was big. Well, not really. And it's equivalent to 3 billion DVDs. So that's a lot of space to work with. What that means is that you can still take that 7-gigabyte data file we showed before, and it conveniently fits in a corner of your big address space, and the VM still manages it with the physical RAM just as it did before.
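A quick check of the arithmetic behind those sizes, as a small C sketch. One assumption to flag: the talk's "16 exabytes" is in binary units (2^60 bytes each), so in decimal 2^64 bytes is about 18.4 billion billion bytes. The TIB and EIB macro names are illustrative.

```c
#include <stdint.h>

#define TIB (UINT64_C(1) << 40)   /* tebibyte: 2^40 bytes */
#define EIB (UINT64_C(1) << 60)   /* exbibyte: 2^60 bytes */

/* The single 2^49-byte block, expressed in TiB: 2^49 / 2^40 = 2^9 = 512. */
static uint64_t block_49_in_tib(void)
{
    return (UINT64_C(1) << 49) / TIB;
}

/* The whole 64-bit space in EiB. 2^64 itself does not fit in a uint64_t,
   so divide the largest value, (2^64 - 1) / 2^60 = 15, and add one. */
static uint64_t space_64_in_eib(void)
{
    return UINT64_MAX / EIB + 1;
}
```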

So, the future of 64-bit is now thrown into some confusion, perhaps, with the whole Intel thing. Oh, well. There's nothing like getting upstaged by your CEO two days before a presentation you've been working on for a month. Oh, well. So, the way we've done this is we've rolled out a... Back up a moment. Okay.

So, we've introduced a limited set of 64-bit libraries this time in Tiger, and in future versions we'll be making more of the software stack available as 64-bit. Part of that depends on what you guys want. So if you say, well, Objective-C is most important, we want that first, that will certainly bump up the priority of 64-bit Objective-C.

Similarly if you're most interested in Carbon, or OpenGL, or what have you. Now, at the same time, the open source world has already been doing some of the porting to 64-bit. You can get X11 client libraries out there now, and they're actually porting quite a lot of things.

Okay, so the trade-offs for 64-bit. 64-bit is not always a win. Pointers and longs double in size, so you have twice as much of that data to push around, and caches fill up faster. So sometimes it's good and sometimes it's not so good. Now if you have something that's compute-intensive, something involving long long integers, 64-bit integers, that's pretty much a definite win. We have one example, I'm told: one of the chess programs, called Crafty, runs quite a bit faster in 64-bit than in 32.
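To illustrate the kind of code that benefits, here is a bit-counting routine over a 64-bit "bitboard", the one-bit-per-square board representation that chess engines such as Crafty are built around. This is a hypothetical sketch, not Crafty's code. Every operation works on a full 64-bit value, which a ppc64 build can do in single instructions while a 32-bit build has to split each one across two registers.

```c
#include <stdint.h>

/* Count the set bits in a 64-bit word by repeatedly clearing the lowest
   one (Kernighan's method). */
static int popcount64(uint64_t board)
{
    int count = 0;
    while (board != 0) {
        board &= board - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hypothetical layout: bit 0 is square a1, bit 63 is h8, so the a-file
   is every 8th bit starting from bit 0. */
#define FILE_A UINT64_C(0x0101010101010101)

/* Number of occupied squares on the a-file. */
static int pieces_on_file_a(uint64_t occupied)
{
    return popcount64(occupied & FILE_A);
}
```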

Now, conversely, if you have a program that is, say, pointer-intensive, manipulating a lot of blocks of memory but a relatively small amount of data overall, it's just not going to win as much. We wouldn't like to say lose in this context, but you're going to win less. Basically you're pushing around twice as much data, but you're not getting anything in return for it. So in that case, you probably wouldn't want to build it as a 64-bit program.
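A sketch of why pointer-heavy programs gain little. In this hypothetical doubly linked list node, only the 4-byte int is payload, yet the node grows from 12 bytes under ILP32 to 24 bytes under LP64 (two 8-byte pointers, the int, and 4 bytes of tail padding), assuming the usual natural alignment of those ABIs.

```c
#include <stddef.h>

/* A typical node in a pointer-heavy structure: mostly pointers. */
struct node {
    struct node *next;
    struct node *prev;
    int          value;   /* the only real data, 4 bytes either way */
};

/* Memory cost per node: 4 + 4 + 4 = 12 under ILP32;
   8 + 8 + 4, padded to a multiple of 8, = 24 under LP64. */
static size_t node_bytes(void)
{
    return sizeof(struct node);
}
```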

Now if you start shifting up to very large amounts of data in memory, most likely you're going to win, because now the VM system is going to start kicking in and managing things for you. It has better access to the disk I/O subsystem, it can reserve space in the swap space, and so forth.

It can actually do a certain amount of paging and management that would be very hard for you to do yourself. So at that point you're going to start to see your 64-bit program do better than a 32-bit counterpart. And of course if you're using 64-bit Windows we just can't help you. Hey, it went through all these stages of review and nobody told me to take it out, so there it is.

Okay, so let's get down a little bit more to the nuts and bolts. So, to get 64-bit code, it's pretty easy in Xcode. You go to the build configurations, is that what they call it this week, build configurations? And, bad attitude. My manager's not laughing. That's not a good sign. Yeah, he's scowling over there. Now he's smiling. He's thinking about what he's going to do to me.

Okay. Anyway, it's fairly simple to get the 64-bit stuff. The keyword to remember is ppc64. ppc is implicitly 32-bit; it's the term you've been using all along. Then you can add ppc64. And if you want to do that three-way universal binary thing, you can add Intel to this also, so it'll read ppc ppc64 i386.

So, specific considerations. The first thing to understand is our data model. Think about the C family of languages, which is really what we're talking about here. C has always been very flexible in that you can take pointers and treat them as integers, and take integers and treat them as pointers. You can add two integers and say, well, actually, this is really a pointer, go and find a piece of memory with it.

Well, it gets to be a little bit of a problem in the 64-bit world, because if you're going to have a 64-bit pointer, what integer does that correspond to? Different people have chosen different things. So, to start off with, the old-style, conventional model is called ILP32.

That's ints, longs, and pointers, all 32 bits. So the model is that integers are 32 bits and longs are 32 bits. Long longs are 64; most people have set up long long to mean a 64-bit integer value. And pointers are 32 bits. That's pretty much what everybody does.

Now in the early days of 64-bit, a couple of platforms, namely the Alphas and Crays... How many people have done Alphas or Crays? A few, okay. So you've probably personally experienced what it means to have an all-64-bit, all-the-time system. They call that ILP64, and it sounds really great until you actually try to get something working. What happens is that now an integer is twice as large as you thought it was.

Every integer is. And so take any bit of code that says, well, I can stick four chars in an integer and pass that somewhere. Okay, well, the integer is only half full. Or you stuck eight characters in there and some of the code only thought there were four. So in practice it turned out not to be a really great idea, because it's just too different from the normal mental model of C programmers. So most Unix systems since then have settled on a compromise, you might say, called LP64.

So longs and pointers are 64 bits and integers remain 32. That's pretty much typical for the Unix world, and that's what Tiger does. And then finally, just for completeness, there's a completely different way of doing things called P64, or sometimes you might see it as LLP64. It's a side effect of Windows's long legacy, where once upon a time ints were 16 bits and longs were 32. Then when it went to a 32-bit system, longs were kind of still stuck at 32.

And so Microsoft said, well, we'll just say pointers are 64-bit, and we can't do anything about longs; we have to leave them alone. So you have this interesting programming model where you have to use long longs to hold pointers or you just lose. As far as I know, that's unique to Windows. Nobody else goes that way.
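The four data models can be told apart from sizeof alone. This hypothetical helper, a sketch assuming the model is one of the four described, names the model the current build uses: a 32-bit Mac OS X build would report ILP32 and a ppc64 Tiger build LP64.

```c
#include <limits.h>
#include <stddef.h>

/* Name the data model of the current target from the widths, in bits,
   of int, long, and pointers. */
static const char *data_model(void)
{
    size_t i = sizeof(int) * CHAR_BIT;
    size_t l = sizeof(long) * CHAR_BIT;
    size_t p = sizeof(void *) * CHAR_BIT;

    if (i == 32 && l == 32 && p == 32) return "ILP32";  /* conventional 32-bit */
    if (i == 32 && l == 64 && p == 64) return "LP64";   /* Unix, 64-bit Tiger */
    if (i == 32 && l == 32 && p == 64) return "LLP64";  /* 64-bit Windows */
    if (i == 64 && l == 64 && p == 64) return "ILP64";  /* everything 64-bit */
    return "other";
}
```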

So the thing to keep in mind is certain invariants that will help with portability, if that's something you have to worry about. First of all, if you have something that absolutely has to be 64 bits no matter what, use long long. That's 64-bit all across the board; every platform does it the same way.

Similarly, if you want something that absolutely has to be 32-bit for whatever reason, use ints, unless you're on Alphas. But, you know, we don't want to think about Alphas anymore. Was that a clap there? So you can count on integers being 32 bits in size. And then longs and pointers are the two things that will vary.
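One way to turn those invariants into build-time checks, sketched with C11's _Static_assert (much newer than the compilers discussed in the talk) and the exact-width types from <stdint.h>. The typedef names are hypothetical placeholders.

```c
#include <stdint.h>

/* Fail the build, rather than misbehave at run time, if a target breaks
   the invariants: long long is 64 bits and int is 32 bits on ILP32,
   LP64, and LLP64 alike. */
_Static_assert(sizeof(long long) == 8, "long long must be 64 bits");
_Static_assert(sizeof(int) == 4, "int must be 32 bits");

/* long and pointers are the types that vary, so code that needs an exact
   width can say so directly. Hypothetical names: */
typedef int64_t file_offset64;   /* always 64 bits */
typedef int32_t record_id32;     /* always 32 bits */
```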

So, when you're actually bringing a program up on 64-bit, you may have to make some source changes. There are certain practices you have to worry about now that you didn't have to before. First of all, ints cannot hold pointers. I just said that. Well, that is the high-order bit of programming in the 64-bit world, because ints in the C family just sneak in all over the place.

And they sneak in in unions, they sneak in in structs, they sneak in in function arguments, all kinds of different places. So there are just a lot of things to watch out for.

You also get to learn about the integral constant suffixes. And this is something that's actually been in C for a while and most people haven't had to worry about it. But if I just say 101, that's an integer. What if I want a long 101? I actually have to write that as 101L.

And you can use lowercase l or uppercase L, I believe, but please use uppercase L. We only get like one pixel of difference in our fonts between l and 1, so let's use something that's a little more legible. To go along with that, there's UL, which makes an unsigned long int, and then also LL and ULL. We like UL the best because it's the most pronounceable, right? You can say 101-ul.
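A short sketch of why the suffixes matter: without one, 1 is an int, and shifting it left by 40 bits is undefined behavior, while 1ULL << 40 is computed in a 64-bit unsigned long long. The helper names are illustrative; the UL result differs between ILP32 and LP64 because unsigned long itself changes width.

```c
#include <limits.h>

/* 2^40 computed in unsigned long long, which is 64 bits everywhere. */
static unsigned long long one_tebibyte(void)
{
    return 1ULL << 40;
}

/* The top bit of an unsigned long: bit 31 under ILP32, bit 63 under LP64.
   The UL suffix makes the shift happen in unsigned long, not int. */
static unsigned long top_bit_of_unsigned_long(void)
{
    return 1UL << (sizeof(unsigned long) * CHAR_BIT - 1);
}
```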

Also, we have printf directives that you all may or may not have seen before. %d is no good, because of the way printf works: the directives in the format string tell it how many bytes to peel off the argument list. If you say %d, it says, oh, well, he just wants an integer, so it peels off four bytes.

And if you pass an 8-byte object, as in, say, a long or a pointer, okay, well, it peels off one half of it. And the next thing that printf prints peels off the other half of your 8-byte thingy. So that gets very interesting results. And if you're a bad programmer and are using printf to debug instead of GDB... Nobody does that, right? No, no. Nobody's using it. This is good. Will anybody admit to using printf instead of GDB for debugging? Bad programmers. Take their names. Take names.

Yeah, so if in fact you can't help yourself and are using printf for debugging, be aware that if you're not using the right printf directives, printf will be lying to you. What you think is an accurate representation of your data is in fact something completely different. I should remind you that GDB does all this correctly, well, most of the time. GDB knows about displaying 64-bit values and will do that correctly.
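A sketch of matching directives to types, assuming a C99 printf (for %zu). The function name format_values is hypothetical. %ld for long, %zu for size_t, and %p for pointers each tell printf the right size to consume, whereas %d would be wrong for all three under LP64.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a long, a size_t, and a pointer, each with its own directive. */
static int format_values(char *buf, size_t n, long count, size_t bytes, void *where)
{
    return snprintf(buf, n, "count=%ld bytes=%zu where=%p", count, bytes, where);
}
```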

Now, to go along with this, we have some things to watch out for. There are a lot of ways to lose. For instance, casting can destroy your data, and it will quietly destroy your data. Sign extension will confuse your data. Function calls will just lose your data. 64-bit to 32-bit conversion can scramble your data, and we'll see a couple of examples of how to do that right. And finally, alignment rules can put holes in your data.

So, once again, assigning pointers to ints just throws away half of your value. If you're lucky, it'll throw away the lower half of the value, so where you expected a non-zero value you get a zero, your program crashes right away, and you can then fix it.

If you're unlucky, you'll lose the upper half of the data, and that will work until your program gets out into the customer's hands. Then they allocate the 8 gigabytes of data that you forgot to test for back in the lab, and the program crashes for them and doesn't crash for you, and you get to make a lot of excuses. So in this case, we call malloc and get back a big pointer. Now the compiler will warn us. If we ask for our usual set of warnings, it will say "assignment makes integer from pointer without a cast." Has everybody seen this message before?

Nobody's seen this message before? Okay, well, a few people. It happens a lot. And up to now, you've been able to say, yeah, yeah, whatever: integers, pointers, same size, no big deal. Well, now the compiler is telling you that you just threw away half your pointer. Print out the integer value and you get just one half of it. Assign it to a long instead and everything works great.

Now, casting is something people often do, and there's sometimes a misconception about it. Casting doesn't fix anything; all it does is make the compiler shut up. You're telling the compiler, yeah, yeah, I know these types are not really compatible, but I know what I'm doing, so just don't say anything. Trust me. So the same thing happens as in the previous example; it's just that now the compiler does it silently and drops off half of your data.
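The portable way to keep a pointer in an integer is uintptr_t from <stdint.h>, which is wide enough in both ILP32 and LP64 builds. This sketch, with hypothetical helper names, contrasts a lossless round trip with the truncation an int assignment or cast performs on a 64-bit target; it uses uint32_t so the loss of the upper half is explicit rather than implementation-defined.

```c
#include <stdint.h>

/* Lossless: uintptr_t can always hold a pointer's value. */
static uintptr_t pointer_bits(const void *p)
{
    return (uintptr_t)p;
}

/* What `int i = (int)p;` effectively keeps on an LP64 target:
   only the low 32 bits. */
static uint32_t low_pointer_bits(const void *p)
{
    return (uint32_t)(uintptr_t)p;
}
```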

Now, when you start playing around with signed and unsigned values, you can get some interesting things to happen, because a negative signed integer can end up looking an awful lot like a very large unsigned number. The example here is a little bit convoluted, but the key thing at the top is that we have an unsigned integer value of 9 bajillion, and the key thing about that is that the topmost bit is set. You say, well, so what?

Well, if we take that and assign it to an unsigned long variable, it'll extend with zeros, and that's great. But now if we take that unsigned long, assign it to an integer, look at the value, and then assign that to an unsigned long, it's a little bit twisted, but it's not that unusual to have happen in code.

What's happened is that the integer variable now holds a negative signed number, and we're assigning it to an unsigned value. So the compiler does what the rules say: it sign-extends first and then assigns the value. If you look at the unsigned long variable then, you have a really large number.

And you say, well, that's kind of a weird number; where did that come from? It came from the compiler sign-extending the 9 bajillion to have a lot of Fs on the front. So this is something you don't want to end up using as, for instance, the bound of an iteration loop. It might have run a reasonable amount of time in 32 bits, and then you run it in 64 bits and it seems very slow for some reason. It's not the compiler's fault in this case.
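A sketch of the round trip described above, using the hypothetical value 0x90000000 as the "9 bajillion" with its top bit set. Converting an out-of-range unsigned value to int is implementation-defined in C; GCC and Clang wrap it to a negative number, which is what the comments assume.

```c
#include <stdint.h>

/* Unsigned to a wider unsigned: the new upper bits are zeros. */
static unsigned long zero_extended(uint32_t u)
{
    return u;
}

/* Through a signed int first: the negative int is sign-extended, so under
   LP64 the result gains 0xFFFFFFFF in its upper 32 bits. */
static unsigned long sign_extended(uint32_t u)
{
    int i = (int)u;            /* top bit set: negative (wraps on GCC/Clang) */
    return (unsigned long)i;   /* sign-extends, then reinterprets as unsigned */
}
```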

Function calls are another variation on the losing your data routine. Again, quietly drops it off in the process of passing to the function. This is mainly a problem for unprototyped functions. The rules of C are that if a function doesn't have a prototype, it's just going to assume that you're passing in integers and you're returning integers.

And so this is what you get. You look inside, and the compiler had no reason to say anything, because you told it what to do and it just followed along. So prototypes are really important, and you should ask the compiler to warn you about missing prototypes in case there are some you thought were there and really weren't.

So I mentioned 64- to 32-bit extraction. There are a lot of ways to do this, right ways and wrong ways. There's a whole long list of wrong ways, but in the interest of space and time I didn't include all of those, so I just have a couple of examples of how to do it right. One way to do it right is to create a union that consists of a long on one side and a pair of ints on the other side.

Then you can take a long variable, assign it to the union, and look at the separate integers. Now, this has a certain disadvantage in that it's endianness-sensitive. And prior to a couple of days ago, I didn't know we were going to have to talk about endianness that much. But anyway, this is one of the places where you are vulnerable. If this data is coming to and from a file and you're on a little-endian machine that's 64-bit, this code will behave differently.

Now another way to do it, which finesses the problem, is to use arithmetic. In the second example down below, we use an ampersand to mask out the low half of the long value, and a shift to extract the high half.
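Both extraction methods, sketched in C. The code uses uint64_t rather than long so the same source also compiles in a 32-bit build; under LP64 an unsigned long would behave the same way. The union version's result depends on byte order; the mask-and-shift version's does not.

```c
#include <stdint.h>

/* Method 1: overlay one 64-bit value with two 32-bit halves. The high half
   is element 0 on a big-endian machine (PowerPC) and element 1 on a
   little-endian one (x86). */
union split64 {
    uint64_t whole;
    uint32_t half[2];
};

static uint32_t union_half(uint64_t v, int index)
{
    union split64 s;
    s.whole = v;
    return s.half[index];
}

/* Method 2: arithmetic. Same answer on every byte order, so prefer it for
   data that travels to and from files. */
static uint32_t low_half(uint64_t v)  { return (uint32_t)(v & 0xFFFFFFFFu); }
static uint32_t high_half(uint64_t v) { return (uint32_t)(v >> 32); }
```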

[Transcript missing]

Okay, so writing compatible code. Unfortunately, the slides seem to, oh, no, sorry. We get, yeah, okay, a little bit backwards here. So I mentioned alignments and holes. This was actually an interesting little example. It's one slide that wasn't in last year's talk, if you were at last year's, because it came up when I was testing GCC by compiling it as a 64-bit program.

When you run the compiler on your Tiger systems right now, it's running as a 32-bit program even if you've asked for 64-bit code. When you ask for 64, it's effectively acting as a cross-compiler. But I said, well, it would be a good test to configure GCC itself as a 64-bit program and then make it bootstrap.

If a compiler bootstraps correctly, it compiles itself, the result compiles itself again, and the two generated compilers come out exactly the same, byte for byte. That's a standard test of compiler correctness. So I did all this, set it up to run as a 64-bit program, and, well, it didn't work.

And not only didn't it work, but it didn't work in a rather amusing way. You look at the preprocessor output, and the same token kept showing up over and over again. I had like, you know, 55 functions all named __divdi3 coming out of the preprocessor, so that was kind of an interesting bug.

And so after some poking around, I discovered that the bug only kicked in when the code was compiled optimized. If I compiled GCC itself unoptimized, everything was fine. So it took a little more poking around to figure out what had happened, and it came down to how the preprocessor represented its tokens.

The tokens in GCC's preprocessor are represented as a union. You can see on the 32-bit side that there are two halves of the union. One half is just a pointer, for a token that's a string. The other is a struct consisting of an integer, which is like a type tag, and then a char *, which is a pointer to some blob of data.

So, anyway, when this got compiled 64-bit, the 64-bit alignment rules required that the pointer be 8-byte aligned. That very quietly opened up a little hole between the 4-byte integer and the char *, which was 8 bytes. Now, the reason this mattered was that the GCC hacker, who shall remain nameless, who initialized the union assigned to the integer and to the char * on the struct side of the union: two assignment statements, assigning zeros to both of them, and said, well, that's good enough, that clears the token for me.

So what happened is that when compiling unoptimized, the compiler generated two 8-byte stores, one clearing the integer and the hole on the 64-bit side, and one clearing the char *. And then when you turn on optimization, the optimizer says, well, hey, I'm clearing a hole.

I don't need to do that. So it turned the 8-byte write of the first half into a 4-byte write. That meant that if any part of the code looked at the pointer-only side of the union, it saw whatever was left there from the previous token. And hilarity ensued.
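A reconstruction of the shape of that bug, assuming a union laid out as described; the type and field names are hypothetical, not GCC's actual preprocessor source. Under LP64, offsetof shows the 4-byte hole before data. clear_fieldwise writes both fields but never the padding, so bytes that the pointer arm overlaps may keep stale contents; the C standard leaves padding values unspecified, so whether that happens depends on the compiler and optimization level. Clearing the whole union with memset avoids depending on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag-plus-pointer arm. Under LP64, 4 bytes of padding
   follow `kind` so that `data` is 8-byte aligned. */
struct tagged {
    int   kind;   /* type tag */
    char *data;   /* pointer to some blob of data */
};

union token {
    const char   *str;   /* the "just a pointer" arm: overlaps kind + hole */
    struct tagged blob;
};

/* Offset of `data`: 4 under ILP32, 8 under LP64. */
static size_t data_offset(void)
{
    return offsetof(struct tagged, data);
}

/* Risky: clears each field but never writes the padding after `kind`. */
static void clear_fieldwise(union token *t)
{
    t->blob.kind = 0;
    t->blob.data = NULL;
}

/* Safe: zero every byte of the union, padding included. */
static void clear_all(union token *t)
{
    memset(t, 0, sizeof *t);
}

/* The raw bytes the `str` arm occupies, copied out as an integer so
   nothing is dereferenced. */
static uintptr_t str_bits(const union token *t)
{
    uintptr_t bits;
    memcpy(&bits, t, sizeof bits);
    return bits;
}
```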

So, in a classic illustration of the open source world, it took me about a day or so to figure out what had happened. Then I wasn't quite sure I liked the fix for it, so I sat on it for a day and did other things. Then I came back and updated my GCC sources, and somebody else had run into the same bug and fixed it. So that saved me a little time and trouble.

Now, back up a moment here, a little slide rearrangement. Okay, so in order to... Sorry, this is more than a little scrambling. Okay, so we have some API changes. The most important thing to keep in mind, should you need to distinguish between 32- and 64-bit code, is a macro called __LP64__.

This is defined when you're compiling 64-bit and not defined when you're compiling 32-bit, and that's true for all architectures. A few days ago I planned to do a wink-wink, nudge-nudge about other architectures, but now I can say that if we do a 64-bit Intel, I will commit us to making __LP64__ defined correctly for a 64-bit Intel architecture as well.

Now if you want to be PowerPC-specific, we have __ppc64__. It's defined when compiling for 64-bit PowerPC, and in that case the traditional macro we've had all along, __ppc__, is not defined. So the two of them are either/or. We had a lot of arguing back and forth, and the either/or people won out over the define-both people.

So what that means is that if you want a piece of code that's the same for both 32-bit and 64-bit PPC, you need to say something like #if defined(__ppc64__) || defined(__ppc__). We've had a few unfortunate cases where code didn't check for both, and so a 64-bit build concluded that there was no PPC support at all.
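A sketch of testing the architecture macros so that shared PowerPC code covers both flavors. The function names are hypothetical; __i386__ and __x86_64__ are included only so the example also builds on Intel, where it reports that it is not PowerPC.

```c
/* Name the architecture this file was compiled for. __ppc__ and
   __ppc64__ are mutually exclusive, so check the 64-bit one first. */
static const char *build_arch(void)
{
#if defined(__ppc64__)
    return "ppc64";
#elif defined(__ppc__)
    return "ppc";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "unknown";
#endif
}

/* Code shared by 32- and 64-bit PowerPC must test for both macros. */
static int is_powerpc(void)
{
#if defined(__ppc64__) || defined(__ppc__)
    return 1;
#else
    return 0;
#endif
}
```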

Now I have an example here of how you might use it. Of course, there are more and less clever ways. This is just a simple #ifdef, and the whole point is to use an int in the LP64 case as the last argument to getattrlist, which is actually out of our system headers.

But if you're not doing LP64, you want to keep that as a long, because that keeps compatibility with the 32-bit API. And finally, for you assembly language programmers (any assembly language programmers? Not very many. Oh, there's one), we have a new directive called .quad, which allocates an 8-byte block and fills it in just like .long does for 4 bytes.
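A sketch of that #ifdef pattern under stated assumptions: the talk's example is the last argument of getattrlist in the system headers, and demo_options_t here is a hypothetical stand-in, not that declaration. The type is unsigned int under LP64 and unsigned long under ILP32, so it stays 4 bytes in both builds and the 32-bit API does not change.

```c
/* Hypothetical parameter type that stays 32 bits wide in both builds. */
#ifdef __LP64__
typedef unsigned int  demo_options_t;   /* long would be 8 bytes here */
#else
typedef unsigned long demo_options_t;   /* unchanged 32-bit API */
#endif

static unsigned demo_options_width(void)
{
    return (unsigned)sizeof(demo_options_t);
}
```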

Some additional features and facilities we give you: standard types and macros like size_t and intptr_t, all those you probably looked at and said, yeah, whatever, who cares. Well, now they really matter. intptr_t, for instance, is guaranteed to be an integer type big enough to hold a pointer, whether you're doing 64-bit or 32-bit.

Similarly, Unix has defined some of its own types in the same general vein. And you should also look at, especially if you're working with open source code that's already been ported around a bunch, quite likely it already has some macros to support 32 versus 64. A popular example, for instance, is a way to macrify the printf directives.

Because you'll get a situation where the compiler complains if you use %lld on something that's not actually long long: oh, you're not using the right printf directive. So people make a macro and then paste the macro into the middle of the format string. Okay, so that's just a sampling of the kinds of things. We'll have a lab right after this, so I'll be happy to answer more questions about recommended ways to make 64-bit code work.
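The standard form of that macro trick lives in <inttypes.h>: PRId64, PRIx64, and friends expand to the right length modifier as a string literal, and C concatenates adjacent literals into one format string. A sketch with a hypothetical format_offset helper:

```c
#include <inttypes.h>
#include <stdio.h>

/* Print a 64-bit offset in decimal and hex. PRId64 and PRIx64 expand to
   "lld"/"llx" or "ld"/"lx", whichever matches int64_t on this target. */
static int format_offset(char *buf, size_t n, int64_t offset)
{
    return snprintf(buf, n, "offset=%" PRId64 " (0x%" PRIx64 ")",
                    offset, (uint64_t)offset);
}
```

The matching SCNd64 and SCNx64 macros do the same job for scanf.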

So, as I alluded to before, we don't have the full stack 64-bit right now. What we did was we focused on those things that are needed by 64-bit programs, programs that really need the 64-bit capabilities, and those basically consist of servers, databases, compute engines. These will generally be written in C or C++. They'll be command-line, and they may be launched by plug-ins.

Okay, we've made it possible for them to be launched by plug-ins. But they're generally things that run in the background. And all they really need is libSystem, the system facilities, and if they do math, the Accelerate framework, aka vecLib, which is also 64-bit.

So the basic deal here is that we have this ability to map a huge amount of memory, but you still have to think about exactly how you want to structure that. Our best behavior is with a relatively small, but still greater than 4-gigabyte, space. I laughed about it before, that yes, you can allocate 500 terabytes, but that's probably not going to be a big winner.

What we're really looking at is something that supports thousands of mappings per address space. And again, a lot of this is all about giving convenient access to large amounts of physical memory. If I have something where I need a 500 terabyte swap space, I'm keeping the disk real busy and I'm not really going to get any win at that point.

So, let's see. As we said, it's not the whole stack. And we actually lied a little bit about the 64-bit part. The OS guys, being the way they are, kept some of the address space for themselves, so in Tiger we really only get 2 to the 51st at this point. Things like pthread stacks will also consume some of the higher space. But I'm assuming everybody can think of a lot of things to do with 2 to the 51st.
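For scale, here is the arithmetic, assuming "2 to the 51st" means bytes of usable virtual address space and using binary units (1 TiB = 2^40 bytes):

```latex
2^{51}\ \text{bytes} = 2^{11} \times 2^{40}\ \text{bytes} = 2048\ \text{TiB} \approx 2.25 \times 10^{15}\ \text{bytes}
\qquad
\frac{2^{51}}{2^{32}} = 2^{19} = 524{,}288
```

So even the 500-terabyte example fits, and the usable space is still about half a million times the 4 GB ceiling of a 32-bit process.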

And finally, no GUI frameworks. So what's that all about? Well, there's the porting part, right? We have a lot of code, and we have all the same 64-bit issues in our code that everybody else has in theirs. So the strategy for people who need 64-bit right now and also need a GUI is to communicate with a 32-bit GUI front end using any of several IPC mechanisms. And to talk about that, we have Jim McGee.

[Transcript missing]

What could be so hard, right? Just a little black line connecting two applications. What was even worse about this quote was how to implement that black line, not necessarily how to use it. So what we've done for IPC in the 64-bit space is try to address the key requirements of the set of applications we just talked about.

Those are the ones we expect to be the first arrivals in the 64-bit space. So what do we need to provide for IPC services in 64-bit space? Well, we needed to be able to solve that picture you just saw. We need to be able to take a 64-bit backend, such as a compute engine or a database engine, and put a 32-bit GUI on the front of it. So we needed 32-to-64-bit IPC: the facilities those backends need and have available to them, with the ability to use them to communicate with 32-bit front ends.

That covers your particular applications. But we also needed more. If you've ever looked closely at a Mac OS X system, it's not just a kernel and a bunch of apps. There are dozens, sometimes hundreds, of servers and little helper applications running in the background, and 64-bit apps are clients of those. They have to do network lookups, so they have to communicate with lookupd and other services throughout the system. So we needed to provide not only the ability for you to write a custom 32-bit front end that works with your 64-bit application, but also the ability for that application to talk to system services throughout the system.

And we've done that heavy lifting for you. It's largely hidden inside the current libSystem, which communicates with those facilities, and that stuff just works. We also needed 64-bit processes to be able to talk to each other: if you create, let's say, a server that needs to fork itself, or multiple 64-bit servers, you obviously need communication between them.

If you look at what we had as far as services available, this is a small sampling of the IPC services in Mac OS X. There are many layers above this. The layers above the gray line are typically provided by user-space services; the ones below are IPC services the kernel provides. There are many additional things that could sit on top of those, such as Distributed Objects, but these are the core ones people typically focus on.

And so in Mac OS X you have all of these services available to you. Which ones do you choose? Well, if we give you a little map of what the 64-bit space looks like currently in Tiger, we're helping you solve that real tough decision by removing a bunch of them.

So there's a set of services that are just not available in 64-bit space. Why? Mostly because they're implemented by frameworks that we haven't ported to 64-bit yet. So they're not available to you. You can't use Apple Events directly from a 64-bit application. You know, we don't have a lot of those GUI kind of facilities.

User notifications such as CFUserNotification, CFMessagePort, that kind of thing. They're not there yet, but as you can see, there's still quite a bit left. There's another set of services that we do provide to 64-bit applications, but with some small restrictions on what they can do.

And what are those? Well, the POSIX thread synchronizers have never been able to cross task boundaries, and that hasn't changed in the 64-bit space. So you can't use a shared pthread mutex or something like that to implement cross-address-space IPC. That was true in 32-bit and it's still true in 64-bit, so those aren't really available for that kind of communication. They work great within a process.

System V shared memory: as many of you have probably found when you tried to use it, the System V IPC services were really provided as a last resort.

They're very limited in how you can use them: the number of them, the size of them. When you try to scale that up and use it in a 64-bit address space, they become unmanageable very quickly, so they tend not to be very useful.

And Mach IPC is actually something we use a lot, but there are some restrictions in that space. One is that an individual piece of data you send back and forth is limited to 4 gigabytes. So if you have two 64-bit address spaces communicating, you can't send a terabyte as a single piece of data in a Mach message between them. You can break it up, or you can create handles and send those across, but you can't do it in one piece. It's hard to call this a limitation for 32-to-64-bit communication, though: you could never do that anyway.
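A minimal sketch of the "break it up" approach, swapped onto a plain file descriptor (a pipe or socket) rather than a Mach message, so it illustrates only the chunking idea, not the Mach API. `send_in_chunks` is a hypothetical helper; the caller picks `max_chunk` to stay well under whatever per-message limit applies.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends len bytes over fd in pieces of at most max_chunk bytes.
   Returns 0 on success, -1 on error with errno set. */
int send_in_chunks(int fd, const void *data, size_t len, size_t max_chunk)
{
    const unsigned char *p = data;

    if (max_chunk == 0) {
        errno = EINVAL;
        return -1;
    }
    while (len > 0) {
        size_t want = len < max_chunk ? len : max_chunk;
        ssize_t n = write(fd, p, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* write() may accept fewer bytes */
        len -= (size_t)n;
    }
    return 0;
}
```

The receiver reassembles the pieces in order; for truly huge data, the handle approach the speaker mentions avoids copying it at all.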

What you're left with is still a fairly rich set of services. They fall roughly into the BSD or kernel category, because those are essentially the facilities available to 64-bit applications anyway. Almost all of them are provided by the kernel. One is provided by a user-space service: the notify daemon, notifyd. It's used for a lot of asynchronous notifications; one of the common ones in the system is for time zone changes.

It used to be that every time an app went to look at the time, it had to read a file to figure out the current time zone and how to display it. That was very expensive for things that display the time frequently. So notifyd has a notification that lets you know whether the time zone has changed.

It almost never fires, so you can run with a cached version of the time zone and everything works. You can use notifyd for your own notifications as well; it's a wonderful little service. But the rest of these services are essentially the kernel's IPC services.

So what do we recommend as far as putting a GUI front end on your 64-bit app and doing IPC between them? Well, there have been many demonstrations and examples of how to embed a tool into a GUI application: something that goes off, maybe processes some data, and returns a result to the GUI. Build your app the same way, except that your tool is now a 64-bit tool.

That tool is again largely limited to the POSIX or BSD layer of services in the system, so those are the ones you're going to want to take advantage of. You also want to use the simplest IPC mechanism available, the most cooked version of IPC you can get away with.

Obviously it has to fit your needs. Some mechanisms are really good at sending small amounts of data back and forth efficiently; others, like shared memory, are meant for huge amounts of data. But you have to choose one from the set that's available to both 32-bit and 64-bit code.

One interesting possibility, which we're going to use in an example in just a little bit, is that you can combine higher-level services with the lower-level primitives they're built on. You can take a CF-based application, use a higher-level service like CFSocket, and embed that in your GUI application.

That gives you lots of advantages. It can integrate with the run loop instead of being a dedicated facility you use POSIX-style from a single thread. Meanwhile, the 64-bit application can use the raw facility underneath, like the raw socket. So the 64-bit application uses what's available to it, and the 32-bit side can still use the higher-level facilities.

So what else can you do to make this simpler? Well, one of the simplest things you can do is to take advantage of a parent-child relationship. The little tool, or the 64-bit tool that's going to be back-ending your GUI application, if you make that the child of the GUI application, lots of things come in for free as a result of that.

One of the biggest is inheritance. When you create a child process, lots of things are inherited from the parent, your 32-bit GUI application, by the child, the 64-bit backend tool. File descriptors, for example: they can refer to regular data, or to shared memory regions that can then be mapped later to get shared memory, which is a way to move a lot of data. They can be semaphores.

They can be pipes. That's actually one of the best ones. Standard in, standard out, and standard error are one of the simplest things you can use: set them up in the parent, the GUI application, so that when you fork off your child and it execs its 64-bit binary, it has a standard in and standard out it can use to communicate right back to the parent. That works great if you have long computations with fairly small results going back the other way; just stream them back.
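A minimal sketch of the parent side, assuming a POSIX system: create two pipes, fork, and in the child `dup2` the pipe ends onto standard in and standard out before exec'ing the tool. The function name `spawn_with_pipes` is hypothetical, and in the usage below `/bin/cat` stands in for the 64-bit tool.

```c
#include <sys/types.h>
#include <unistd.h>

/* Launches path with argv, wiring the child's stdin and stdout to pipes.
   Returns the child's pid and stores the parent's ends in *to_child
   (write end) and *from_child (read end), or returns -1 on failure. */
pid_t spawn_with_pipes(const char *path, char *const argv[],
                       int *to_child, int *from_child)
{
    int in_pipe[2], out_pipe[2];   /* [0] = read end, [1] = write end */
    pid_t pid;

    if (pipe(in_pipe) < 0)
        return -1;
    if (pipe(out_pipe) < 0) {
        close(in_pipe[0]); close(in_pipe[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the pipes become stdin/stdout, then exec the tool. */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        execv(path, argv);
        _exit(127);                /* reached only if execv failed */
    }

    /* Parent: keep the ends it uses, close the ones the child uses. */
    close(in_pipe[0]);
    close(out_pipe[1]);
    *to_child = in_pipe[1];
    *from_child = out_pipe[0];
    return pid;
}
```

In a GUI front end, `from_child` is the descriptor you would hand to the run loop; closing `to_child` signals end of input to a tool that reads until EOF.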

But you can also use inherited memory. So you have your GUI application. It allocates some chunk of memory. You can actually use the POSIX inherit APIs to set the inheritance attribute on that chunk of memory. Then you go ahead and exec your child, your 64-bit tool. That memory shows up inside of the 64-bit tool sitting there shared. And you can use that memory for communications. Especially if you combine it with something like a semaphore, then you can go ahead and do that pretty nicely.
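A minimal sketch of memory shared with a child, assuming a POSIX system. It swaps in `mmap` with `MAP_SHARED | MAP_ANON` instead of setting an inheritance attribute, and uses `waitpid` where the talk suggests a semaphore. One caveat applies to any inheritance-based sharing: it carries across `fork()`, but `exec()` replaces the child's address space, so a separately built 64-bit tool would instead need the region passed as a file descriptor (for example one from `shm_open`) and mapped again after exec. The names `alloc_shared_region` and `compute_in_child` are hypothetical.

```c
#define _DEFAULT_SOURCE              /* exposes MAP_ANON on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps len bytes of anonymous memory that stays shared, not copied,
   when the process forks.  Returns NULL on failure. */
void *alloc_shared_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Forks a child that writes a result into the shared region; the parent
   reads it after waitpid() returns.  Returns the result, or -1. */
long compute_in_child(long *region, long input)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        region[0] = input * input;   /* visible to the parent */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return region[0];
}
```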

One of the disadvantages of this split is that now you have to manage your child. The tool is now a second process that's part of your application, so you carry the burden of managing it. The nice thing is that a lot of these facilities help: CFSocket, for instance, with its run loop integration, has an invalidation callback that automatically fires if the child goes away, so you can clean up nicely that way. You also have SIGCHLD; it's a little trickier to deal with SIGCHLD in a GUI application, but not too bad.

Right, so what do we recommend? Here's a really simple example: a GUI front end that launches a 64-bit tool in the background and uses standard in and standard out to pass data back and forth.

Inside your 64-bit tool, things are very simple. As far as the tool knows, it's just an ordinary command-line program: it uses scanf to read data from standard in and printf to send data out to standard out.
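A minimal sketch of such a tool loop, loosely modeled on the demo that follows: print a greeting and the pointer size, prompt, read a size in hex, allocate it, report the address, and repeat until the size is 0. `serve_sizes` is a hypothetical name, and it takes `FILE *` streams so it can be tested; the real tool would pass `stdin` and `stdout`. The `fflush` after each prompt matters because stdout attached to a pipe is fully buffered, so without it the GUI might never see the prompt.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns the number of successful allocations. */
int serve_sizes(FILE *in, FILE *out)
{
    unsigned long long size;
    int served = 0;

    fprintf(out, "Hello world\n");
    fprintf(out, "sizeof(void *) is %u\n", (unsigned)sizeof(void *));
    for (;;) {
        fprintf(out, "Please enter a size\n");
        fflush(out);                          /* push the prompt down the pipe */
        if (fscanf(in, "%llx", &size) != 1 || size == 0)
            break;                            /* 0 or bad input ends the loop */
        if ((unsigned long long)(size_t)size != size) {
            fprintf(out, "size too large\n"); /* does not fit in size_t */
            continue;
        }
        void *p = malloc((size_t)size);
        if (p == NULL) {
            fprintf(out, "malloc failed\n");
        } else {
            fprintf(out, "%p\n", p);
            served++;
            free(p);                          /* freed to keep the sketch leak-free */
        }
    }
    fflush(out);
    return served;
}
```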

It's a very simple example, but if something is hooked up on the other end, like your GUI application, it can catch that output and display the results. So here's code that shows how to get a tool launched, in a very simple, rudimentary way, from inside a 32-bit GUI application. You use the CFBundle APIs to find the tool. The tool will normally be embedded inside your bundle, down in the executable path, so you can use CFBundleCopyAuxiliaryExecutableURL.

[Transcript missing]

You can get the regular Unix file system representation from that URL. The BSD calls take a file system path, not a full URL, so you have to extract it. Then you call the standard POSIX popen call. popen launches the executable identified by the path variable and creates a pipe between you and it.

That executable happens to be your 64-bit tool, and popen returns the pipe to you as a FILE pointer, stored here in gPipe. So now you've got a pipe, or a socket, between you and your child. Next you need to hook that up to the run loop, because you're in a 32-bit GUI application.

So here's really simple code to hook that up to a run loop. You use CFSocket: you pull the file descriptor out of the pipe first and call CFSocketCreateWithNative. That creates a CFSocket you can make a run loop source from, and then you add that run loop source to your run loop. So now the run loop of your 32-bit application has a pipe in it.

That pipe is connected to the 64-bit backend tool, so the application can receive data asynchronously: you keep receiving mouse-down events and everything else in your 32-bit app, and you also receive events and results from your 64-bit backend. To demonstrate that, George Werner from Developer Technical Support is going to show you an example.

You guys will have to bear with me; I was a last-minute replacement. Apparently they had Vanna White to do this, but she had a better agent than I did. The first thing I want to do here is look at the target settings, just to show you that this is indeed 64-bit. There's our architecture right there: ppc64. We're all on the same page. Now let's open the source to this. I think I got set up for this, because they advised me to use printfs for debugging.

[Transcript missing]

This is the first one. This is our pointer here.

Step over. There's a nice 64-bit number. We'll view it as hexadecimal, and as decimal; you can see it's about 4 gig. Step over again. Now we're blocked on the scanf. Let's go to our standard I/O log. It's a terrible font and you won't be able to read it, so hopefully you can trust me; I'll say what I type in.

It's a nine-digit hexadecimal number, so it's a little bit bigger than a 32-bit number; it won't fit in 32 bits. And there's our count, which got returned. We'll switch to hexadecimal to see: there are 1, 2, 3, 4, 5, 6, 7, 8, 9 digits. Okay, step over the printf, and there's our output. We'll switch over to our output window; you probably can't read that. We're at the malloc again. Here's the second one. The first one was close to 4 gig; this one is up around 8 gig. I'll show it in decimal just to make sure.

It's around 8 gig, so it's a 64-bit pointer. Step over, it flushes the output, and it loops back around. This time we'll enter zero, and the program terminates. And that's pretty much our backend tool, for demo purposes. Vanna's code was a lot prettier than mine, too. Okay.

Okay, we're going to switch our target over to our GUI. This is the source code. I pretty much wrote the minimum Carbon app I could get away with. It's nib-based: we load the nib, set the menu bar from the nib, and create a window from the nib. Standard Carbon stuff.

We install an event handler. There's a little OK button in our window, and its command is processed by our event handler; I'll show you that code when I debug it. This is the same code you saw earlier: it gets the auxiliary executable. Our tool is Tool 64.

Okay. There's the file system representation, and there's the popen. When I first demoed this, I had three Unix guys reviewing my presentation, and they fainted. Apparently popen isn't the API of choice here; there are security holes involved with using it. I didn't have time to rewrite it, but they told me the correct answer is fork, exec, and dup2 on the pipes. If you know what that means, you're ahead of me. Okay.
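A sketch of one shell-free alternative: `posix_spawn` with file actions performs the dup2 in the child without going through `/bin/sh`, so nothing in the path or arguments is interpreted by a shell. This is a swapped-in technique, not what the demo uses, and `posix_spawn` arrived in Mac OS X 10.5, after Tiger; on Tiger itself, the fork, exec, and dup2 sequence is the route. `spawn_reading_stdout` is a hypothetical name, and `/bin/echo` stands in for the tool in the usage below.

```c
#include <spawn.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* Starts path with argv and returns (via *from_child) a pipe carrying its
   standard output.  Returns the child's pid, or -1 on failure. */
pid_t spawn_reading_stdout(const char *path, char *const argv[], int *from_child)
{
    int out_pipe[2];
    pid_t pid;
    posix_spawn_file_actions_t actions;

    if (pipe(out_pipe) < 0)
        return -1;
    posix_spawn_file_actions_init(&actions);
    /* In the child: stdout becomes the write end, then both originals close. */
    posix_spawn_file_actions_adddup2(&actions, out_pipe[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&actions, out_pipe[0]);
    posix_spawn_file_actions_addclose(&actions, out_pipe[1]);

    int err = posix_spawn(&pid, path, &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(out_pipe[1]);                /* parent keeps only the read end */
    if (err != 0) {
        close(out_pipe[0]);
        return -1;
    }
    *from_child = out_pipe[0];
    return pid;
}
```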

He kind of brushed over the install code. I'll show the socket call when we actually get to running it. And then RunApplicationEventLoop. So it's a minimal Carbon app. This is the command handler for the button in our window; I'm going to set a breakpoint on it.

This is the callback code; I think he stepped through it, and his was cleaned up a little nicer than mine. Pretty straightforward stuff. And this is the callback itself: this is what gets called when I receive a communication from the backend. We'll set a breakpoint here. Okay.

The other mistake I made in my demo originally was that I demoed it in GDB, and the very first question I got was, "What, you can't debug 64-bit in Xcode?" So I had to change my presentation to appease that crowd. But here's what I'm going to do now.

This is my build directory where the project is. There's the GUI app and there's the 64-bit tool. I'm going to go down into the tool, and there are the actual executables. I can run GDB, but I don't need to go there yet. I'll do a ps just to make sure the tool isn't running yet, and switch back over to Xcode.

There's our popen; I want to set a breakpoint there. Now let's debug. Okay, we're stopped at the popen. Let's switch back over to the terminal and grep for what's running. You can see the application running, this one here, and nothing else. We step over and do it again: now we have both the GUI application and the 64-bit tool running. I'll take that PID and attach GDB to it. There's our source code, the same thing we've seen before. I'm going to set a breakpoint here on the malloc.

[Transcript missing]

This is our first callback. The tool printed that printf at the beginning of its source, and this is catching its output coming back. All this code does is take the data that's returned, get its length, malloc a little buffer for it, copy the bytes out, and null-terminate it. It's basically the printf string; you can actually see part of it right there.

Then we get our HIView, the text view in our Carbon window, pull out its existing text, append the new text to it, and set it back into the view. We release our string, free our memory buffer, and return. Before I return, I'll clear that breakpoint.
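A plain-C model of what that callback does with the received bytes: copy them, null-terminate them (data from a pipe carries no terminator), and append them to the text already shown. The real callback works with CF and HIView types; this sketch keeps only the buffer logic, and `append_received` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Appends len raw bytes to a heap string and returns the new string.
   existing may be NULL.  On allocation failure returns NULL and leaves
   existing untouched (still owned by the caller). */
char *append_received(char *existing, const void *bytes, size_t len)
{
    size_t old_len = existing ? strlen(existing) : 0;
    char *joined = realloc(existing, old_len + len + 1);  /* +1 for '\0' */
    if (joined == NULL)
        return NULL;
    memcpy(joined + old_len, bytes, len);
    joined[old_len + len] = '\0';
    return joined;
}
```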

Let's switch over to our app. Here's my little app: just a window created in a nib, with a text entry box where you can type a number, an OK button, and a view that captured the output coming back from the tool: "Hello world", "sizeof(void *) is 8", and "Please enter a size". That was our prompt. We hit our OK button, and this is the handler for it. What this code does is find our HIView.

It creates an empty string and sets it into the HIView (I stepped a little fast there), which clears the text in the view. Now we find the edit text control that we type the number into, get its text out, and write it to the pipe to the child. I'm going to squeeze this up a little so we can see it.

There we do the write, and as soon as I do the flush, boom, GDB breaks over here: the tool received it from the front end. We can do a backtrace and look at the count. So we got the message from the front end, parsed it, and put it into the count.

When we step over... oops, sorry. This is not MacsBug. Flashback. Okay. Now we step over and look at ptr1, and there's a nice 1, 2, 3, 4, 5, 6, 7, 8, 9 digit number. I'm pretty confident that won't fit in 32 bits. So there that is. Oops, did it again.

And back around to our prompt. I'm going to delete our breakpoints. And just while we're here, let's dump out a bunch of registers. You can see that's definitely a 64-bit number there, and a couple more up there. There's the stack pointer. Backtrace. Interesting, argv is on the stack; you can tell that. So let's continue.

I'm going to clear my breakpoint on that one. And we'll continue and switch over to the application. And so, now you can see it printing out the value here, and every time I hit it, it jumps up about 4 gig.

[Transcript missing]

Okay, and that's pretty much the demo. So our next speaker is Rob Regge-Scofield, from Mathematica. I hope I got his name right.

Hi. Starting later this month, we are going to ship Mathematica 5.2, which has been 64-bit optimized to run on G5 systems running Tiger. And the way we did this, surprise, surprise, is using the exact techniques that have been described so far in this session. We have a 32-bit GUI front-end process that talks to a 64-bit computation engine using some of the IPC mechanisms mentioned. It uses shared memory, pipes, TCP, whatever is most appropriate for the situation. And our GUI front-end even uses CF socket to integrate the communication into the run loop, so we can do that efficiently.

So there are two big benefits of a 64-bit version of Mathematica for our users. One is that certain mathematical operations are faster; arbitrary-precision math is one example. And I don't know, was there supposed to be a slide of that, or... This is the slide from the science talk yesterday about some of the Mac OS X technologies, and this is what I was looking for.

This is an example calculating the first million digits of pi. Running on a 64-bit version of Mathematica, the computation takes about five seconds to complete, four and a half maybe. Doing the same thing in a 32-bit version takes about seven and a half to eight seconds.

So it's almost twice as fast for that particular type of operation. The other big benefit is addressing large amounts of memory. So can we cut to demo machine two? I have a pre-rendered simulation here, which unfortunately takes a few hours to compute, so I can't do it in real time. It highlights the differences between the 32-bit process and the 64-bit process. This one was done on a 32-bit system; it's a simulation of a tsunami.

The wave that's starting in the middle of the sea here is going to be affected by the mountains on the sea floor. That's going to cause disturbances all the way back up to the surface of the sea. If we just run this, we see how the simulation plays out.

On this 32-bit system, we had to decrease the resolution of the simulation so that the memory required would stay under the 4 GB maximum. As you can see, because of the lower resolution, artifacts appear that really aren't there.

Certain variations in the results are exaggerated just because of the lower resolution. If we run the exact same simulation on a 64-bit system, we can increase the resolution; this one used about 6 GB of memory. The simulation looks about the same, except we don't see the artifacts we got in the 32-bit case. 64-bit support for computations involving large amounts of memory is a big win for customers using 64-bit systems. That's all I have.

Thank you very much. My name is Matthew Formica; I'm the 64-bit software evangelist in Developer Relations. For more information, go to the WWDC portal. There's a lot of great documentation associated with this session, and sample code is available or coming soon. There are a few upcoming sessions, but the most important thing is the lab coming up right after this session in the Performance and Tools lab on the second floor.

We don't have time for Q&A here, but you can get all your questions answered by the team in that lab; they'll walk over right after this session. You can also contact me by email. So thank you very much.