64-bit In-Depth - WWDC 2006

Application Technologies • 58:53

Transitioning your application from 32-bit to 64-bit requires you to modify your code to play well in a 64-bit world. Learn in this session about framework API changes for 64-bit, 64-bit ABI changes, 64-bit binary debugging tips/tricks, and how to do 64-bit performance tuning, among other topics.

Speakers: Matt Formica, Eric Albert

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good morning, everyone. I want to welcome you this morning to 64-bit In-Depth, where we're going to try and answer some of the questions we deferred from yesterday's presentation. There is a variety of 64-bit content in other sessions through the week, so I'll try to point those out as well. As the voice of God said, I'm Matthew Formica. I'm the 64-bit software evangelist at Apple. So feel free to email me through the week and post this week with any 64-bit questions you may have, and I'll try and get those answered for you.

So, why are we here? We're here because we completed our transition to Intel hardware this week with the announcement of the Mac Pro 64-bit Intel hardware. And we are transitioning in Leopard to provide a full 64-bit API stack to you in the operating system. So we want to show you how to use all of that. And as you saw yesterday in the 64-bit overview, 64-bit as an architecture is really just another part of going universal. It's another architecture that gets built into your binary, similar to going to Intel.

And in Leopard, we have, unlike Tiger, we have just about the whole API stack available to you for 64-bit. And we'll talk about today some of the exceptions to that. And what does this mean to you? Well, you should start thinking about migrating to 64-bit. For some of you who have very performance-critical applications or memory space needs, now will be the right time to move to 64-bit. For others, it might be a little longer transition time.

So, what are you going to learn today? I'm going to walk you through the various API changes for 64-bit in Leopard to help you understand what's there, what's not there, and what are good replacement APIs for things. And then I'll hand off to Eric Albert to talk about some of the other things that you might need to be aware of when moving to 64-bit, including how to handle Tiger and Leopard differences and understanding the actual 64-bit Intel architecture. Thank you.

So, speaking of APIs changing, why do our APIs need changing? Well, the first reason is just a fallout of being 64-bit. We are an LP64 64-bit model, which means longs, pointers, and size_t change size. So, some of the fundamental data types are changing size, and thus our APIs need to adapt to that. 64-bit also means applications can access more data. So we need to change some of our APIs to make sure that they can actually reference those larger amounts of data.
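To make the LP64 point concrete, here is a minimal C sketch that reports the sizes in question. The function names `is_lp64` and `print_type_sizes` are made up for illustration and are not part of any system header.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 when this build uses the LP64 model described above:
   int stays 32 bits while long, pointers, and size_t are 64 bits. */
int is_lp64(void) {
    return sizeof(int) == 4 && sizeof(long) == 8 &&
           sizeof(void *) == 8 && sizeof(size_t) == 8;
}

/* Prints the sizes that matter when moving between 32-bit and 64-bit. */
void print_type_sizes(void) {
    printf("int:    %zu bytes\n", sizeof(int));
    printf("long:   %zu bytes\n", sizeof(long));
    printf("void *: %zu bytes\n", sizeof(void *));
    printf("size_t: %zu bytes\n", sizeof(size_t));
}
```

Building the same file once per architecture and comparing the output is a quick way to see which of your own types silently change size.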

And we want to do this in a way that doesn't break compatibility for existing 32-bit applications. And so there are certain data types, certain file formats on disk that by statement or by convention are fixed in size, and we need to make sure that those stay the same size for 64-bit.

And finally, we need to make some changes through our APIs because there's no mixed mode, which means that your entire process must be 32-bit or 64-bit. And if you want to build a 64-bit application, everything you link against must also be 64-bit. And so there is no piecemeal solution. The entire stack is going to go 64-bit.

We have a few principles that we applied as we were changing the APIs. One of them was consistency: where we had to change things in one framework, we looked at making a similar change in another framework that might do something similar. Secondly, we wanted good impedance match, which is our way of saying we want you to be able to take the results of one routine and easily pass them off to another without having to do a lot of conversions in between. And finally, going to 64-bit is a good chance for us to clean up our APIs and help you modernize your application. We don't have the same binary compatibility constraints, and so we can clear away some of the old cruft that's in the Mac OS X API set.

This is something I want to emphasize right here as we get started. I'm going to be spending a lot of time talking about the things that don't work or work slightly differently, but they're really just a small subset of the thousands of APIs that make up Carbon and Cocoa and the other frameworks on the system. Most APIs just work. The transition is mostly about those edge cases in the APIs and, of course, 64-bit goof-ups like truncating your pointers.

So we're going to walk through the different frameworks on the system. Let's start with API changes for Cocoa. And if we start actually at the very bottom of Cocoa, which is the Objective-C language, if you were at the Objective-C 2 sessions yesterday, you heard that for 64-bit, the runtime has been rewritten. Certain low-level data structures are now opaque, and we have accessors to get at what you previously just got to by accessing structures. And we have some new features that are going to be coming to 64-bit Objective-C applications only.

Moving up to the basic data types in Cocoa, nearly all ints have been replaced with NSInteger or NSUInteger. And these are data types, typedefs that can now change to be long for 64-bit. That way we aren't artificially hampered. You can see the basic typedef that we use in the system headers to handle this.
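The slide with that typedef is not part of the transcript. As a rough sketch of the pattern, the following C fragment switches an integer typedef on `__LP64__`, a macro that GCC and Clang predefine for LP64 targets. The names `MyInteger`, `MyUInteger`, and `my_integer_bits` are hypothetical stand-ins, not the system's own declarations.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for NSInteger / NSUInteger: long-sized when the
   target is LP64, int-sized otherwise. */
#if defined(__LP64__) && __LP64__
typedef long MyInteger;
typedef unsigned long MyUInteger;
#else
typedef int MyInteger;
typedef unsigned int MyUInteger;
#endif

/* Number of bits MyInteger can hold in the current build. */
size_t my_integer_bits(void) {
    return sizeof(MyInteger) * CHAR_BIT;
}
```

Code written against such a typedef compiles unchanged for both architectures, which is the point of replacing bare ints.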

Secondly, enumerations are not predictably unsigned int. The compiler actually looks at the values in the enumeration and makes a call as to what size things should be. And that wasn't good enough for us, and so we have now declared some of the enumerations as NSInteger, NSUInteger to make sure that the base type is 64-bit capable.

And thirdly, a change that was mentioned yesterday in the 64-bit overview, all graphics-related floating point quantities are now doubles. CGFloat has grown in size, and this was motivated by the opportunity that moving to 64-bit provided to make some changes that will prepare us for the future when greater precision is going to be needed. I should mention that this change permeates through Carbon graphics calls as well.

There's a few classes that did not make the cut to 64-bit. The first couple are NSMovieView and NSMovie. And this should be no surprise. QTKit is a great Cocoa API for accessing QuickTime functionality. And so the NSMovieView and NSMovie classes have been on their way out, actually, for a while.

Secondly, NSQuickDrawView is not available to 64-bit applications, and that's because QuickDraw itself is not available, and I'm going to go into that in a lot more detail in a few minutes. Instead, you'll want to use the Cocoa drawing system or go right to the Quartz APIs directly.

NSMenuView is probably a class that not a lot of you used, which is good because it's not there for 64-bit. Instead, if you do need to do custom view or custom drawing support of menus or in your menus, you can use the NSMenuItem class. And I want to emphasize a point that Ali made yesterday, which is don't use unkeyed archiving. You really want to use the keyed archiving system. Hopefully you've already ripped out that old code from your application.

There's a script to help your conversion if you are a Cocoa application wanting to move to 64-bit. It's in /Developer/Extras. It's a tops script. It automates some of the work. It puts little comments and flags things in your code to help you find the things that need to change.

If you look at our documentation, it gives you all the details for how to actually run the script against your code. And then after you run it, you run FileMerge to see what it actually did, to make sure that the changes it made are what you actually want to happen. The script will warn of a variety of conversion problems. Some of them are strictly required by 64-bit. Others are changes that are suggested to help your APIs and your data structures be able to handle larger amounts of data.

And that's pretty much the summary of changes for Cocoa. It's pretty straightforward. There's new runtime changes to Objective-C, a few key types change. There's this great script to help you convert. And the script is going to help you see that you really want to be end-to-end 64-bit compatible in your application. Just because your application builds using ppc64 or x86_64 as your architecture, that does not mean that you're actually done. You want to make sure that you can handle 64 bits' worth of data and do it efficiently.

Changes for Carbon are a little more extensive. Let's go through those now. The first change we made was to make SInt32 and UInt32 actually be fixed to stay 32-bit in 64-bit. They were actually defined such that they would have become 64-bit values. Core Foundation has also had several types redefined so that they grow in size for 64-bit, the basic CFIndex type as well as a few others.

There are now new standard types for consistent representation of user-specified data. And these keep their old definition for 32-bit applications, so your code doesn't have to change if you're 32-bit, but they become void * for 64-bit. And types that represent offsets in memory, like ByteCount and ByteOffset, are now longs for 64-bit applications so that they can actually reference all areas of memory.

Let's talk about a few pitfalls with some of these common data types. You really want to try and adopt these standard types throughout your code base, not just when interfacing with the Carbon APIs. One common problem is casting a pointer to an SInt32. That worked in 32-bit. Of course, that's going to truncate things in 64-bit. Secondly, don't assume that a CFIndex is the same size as an SInt32. Here's an example of a call that would not be correct for 64-bit.
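The example from the slide is not in the transcript. As an illustration of the first pitfall only, this C sketch uses the standard `uint32_t` and `uintptr_t` types in place of the Carbon ones; every function name here is hypothetical.

```c
#include <stdint.h>

/* Wrong on 64-bit: squeezes an address-sized value into 32 bits, the way a
   cast to a fixed 32-bit integer type would. The upper half is discarded. */
uint32_t stash_in_32_bits(uintptr_t address) {
    return (uint32_t)address;
}

/* Portable: uintptr_t is wide enough to hold a pointer in both models. */
uintptr_t stash_pointer(const void *p) {
    return (uintptr_t)p;
}

const void *recover_pointer(uintptr_t stashed) {
    return (const void *)stashed;
}

/* Returns 1 if the address comes back unchanged after a 32-bit round trip. */
int survives_32_bit_stash(uintptr_t address) {
    return (uintptr_t)stash_in_32_bits(address) == address;
}
```

Any address at or above 4 GB fails the round trip in a 64-bit build, which is exactly the kind of bug that only shows up once the process actually uses the larger address space.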

And thirdly, on the flip side, don't assume that all parameters scale to 64 bits in 64-bit applications. In the Collection Manager, for example, some parameters like item size don't actually grow to be 64 bits in size. And if you assume they do, you'll get a different set of crashes.

Pascal strings, they've been around for a long time since the very beginning, but we've made a concerted effort for our 64-bit applications to actually remove them from the headers. So wherever there were routines that took a Pascal string, there should now be replacements that take a C string. This doesn't mean that the compiler doesn't support Pascal strings. It still will, so you don't have to instantly change all your own code. But this is just a steady migration path that we've been on for some time.

Moving up the Carbon stack to the File Manager, the FSSpec data type is finally going away. And this type has had drawbacks that we all know well. It can only handle 31-character file names. And so FSRefs are what you should be using for all of your 64-bit File Manager needs.

There are a few places in the File Manager APIs where an FSSpec is still needed just to be kind of passed in. You can just pass NULL there now. It doesn't really use it for anything. But in general, wherever you need a real object there, you should be using FSRef.

A few other low-level managers that we should talk about some changes for. The Memory Manager has had a few changes. There are routines that are not available: BlockMove, BlockMoveData, and BlockZero. Instead, you should use the standard memmove or bzero calls. And believe it or not, the Resource Manager is moving on to 64-bit.
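A small C sketch of that substitution follows. The wrapper names `copy_bytes` and `zero_bytes` are hypothetical; the one thing worth noticing is the argument order, since the Memory Manager copy routines took the source first while `memmove` takes the destination first. The sketch uses `memset`, the standard C equivalent of `bzero`.

```c
#include <stddef.h>
#include <string.h>

/* Source-first wrapper in the style of the old copy routines.
   memmove handles overlapping ranges correctly. */
void copy_bytes(const void *src, void *dst, size_t n) {
    memmove(dst, src, n);   /* note: destination comes first here */
}

/* Zero-fill wrapper; memset(dst, 0, n) does what bzero(dst, n) does. */
void zero_bytes(void *dst, size_t n) {
    memset(dst, 0, n);
}
```

In real code you would likely call `memmove` and `memset` directly rather than keep wrappers around; they are shown here only to make the parameter-order difference visible.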

This is despite the fact that ResEdit hasn't been revved since 1994. There are a few new types and some things going away. The standard icon formats are not available. So no more color icons, no more icon suites. Instead, you'll want to use IconRefs or go right to Core Graphics and use CGImageRefs.

64-bit QuickDraw is not available. There is no 64-bit QuickDraw. And QuickDraw has made a lot of migrations through the years. This is what it looked like in the early days. And we've revved it over time to support color, amongst other things. But there's really a limit to how far we could take the QuickDraw APIs and underlying implementation. And so on Mac OS X today, we have a situation where displays have gotten bigger, transparency is used more, and the technologies that are available on Mac OS X, like Core Image, Quartz Composer, and the new Core Animation framework, really don't speak the language of QuickDraw.

They speak the language of Quartz. And so now is a good time for you to remove the last vestiges of QuickDraw from your code and move over to Quartz. Just for my own interest, if I could see a show of hands, how many of you think you still have QuickDraw calls in your code?

Okay, that's a fair number of you. So hopefully we can help make the transition pretty smooth. We've been talking about this for a couple of years. In Tiger, of course, the QuickDraw APIs were deprecated. For 32-bit applications in Leopard, QuickDraw's still there, but to move to 64-bit, you'll need to make this change.

Let's talk about a few more details of this. There's some good reasons why we're getting rid of QuickDraw. It's not thread-safe. It's got a fixed resolution, so it's not ready for resolution independence. And internal data structures of QuickDraw are limited to 16-bit integer coordinates. So you've probably noticed that when you get larger images, things start going wrong above about 4K resolution.

So instead you'll need to transition, and here's a short list of steps that you'll need to take a look at, one approach to moving to Quartz. You'll want to start separating out your UI code from your basic business logic. A lot of old Carbon code intermingles calls to the drawing routines with calls for event handling and doing other things in their application.

And because Quartz is a whole new paradigm for how graphics are done, you're going to want to try and separate out your logic to make the rewriting easier. Secondly, if you haven't already done so, you should move to HIView as a Carbon application. And this will give you the right foundation to start building new graphics code on top of.

You'll want to start rethinking how you approach drawing things with Quartz. It's not a simple one-to-one mapping between QuickDraw calls and equivalent Quartz calls. It's a different way of working. Instead of working with shapes and pens and transfer modes and PICTs, you're going to be working with PDFs and transparency, paths, and gradients. Hopefully, step four is profit.

If we dive down into this a little bit further, you'll see that QuickDraw was built around a basic set of primitives. And all those primitives were pixel-based, so right in the calls you were specifying exactly how many pixels something should be. Quartz 2D instead is based around paths. So you describe a path which builds up a set of geometry, and that can easily scale to whatever the resolution of the final display actually is, whether that be a monitor or a printer.

Quartz, unlike QuickDraw, doesn't have any built-in automatic redraw machinery. It doesn't know anything about visible regions and so on. Instead, you should set up any clipping manually for your CG context. And there are some routines that do some similar things in terms of clipping to what you're familiar with in QuickDraw, but the basic model is different.

When it comes to moving pixels around, this is a big one. There's no exact CopyBits functionality replacement in Quartz 2D. Quartz doesn't know anything about bits or pixels, so ultimately it can't allow you to do a CopyBits sort of routine. And it uses the alpha channel, actually, instead of needing a specific mask.

So if you want to do the sort of thing you've been doing with CopyBits, you should look at using CGContextDrawImage. We do have a bitmap context, a bitmap backing store that is available for Core Graphics, and then you can use CGContextDrawImage and friends. There are a variety of other similar APIs that can be used to get similar functionality in Quartz.

You should come by our labs if you actually want some help converting from QuickDraw to Quartz, and we can help you do that. Because QuickDraw is going away, this has a lot of impact on other APIs that are part of Carbon, because QuickDraw spreads its fingers wide. The Appearance Manager is going away. It was mostly QuickDraw-based. Instead, you should use the HITheme API that's been available for a while. Custom menu, window, and control definitions are no longer available. You need to use custom HIViews instead.

Most of the routines in the Font Manager are not available because they're mostly QuickDraw-based. So instead, you should use the Core Text font API that we have. There are a few new routines we've introduced in Leopard to help you convert from Font Manager data structures over to Core Text. And as I mentioned earlier, most of Icon Services and Icon Utilities is no longer available.

Following in the same vein, the Drag Manager has certain routines that are not available, the ones that take QuickDraw structures. And when it comes to the Event Manager, probably one of the biggest ones that goes away is GetMouse. And I still remember some of my early Macintosh programming being so excited to call GetMouse and do basic drawing on the screen just following the mouse. But GetMouse returns coordinates in the current graphics port. And for 64-bit applications, there is no current graphics port. So instead, we've introduced a new routine in Leopard, HIGetMousePosition, that will let you do equivalent functionality.

Let's talk about the Window Manager. It's another big one that has some changes that affect other APIs. Carbon Windows must use compositing mode to draw their content in 64-bit. And compositing mode is something that's been around for a while, for a couple versions now in Mac OS X. Hopefully most of you have converted to that already. How many of you are using composited drawing mode for your Carbon apps? How many of you know you're not using composited drawing mode?

Okay, for those of you who are not using it, now will be the time to convert over. You can do this in 32-bit as well, so there's nothing special about it. And as I say, it's been around for a while, so there's nothing new to Leopard with that. So you must pass kWindowCompositingAttribute to the CreateNewWindow call in 64-bit, or you will get an error back from CreateNewWindow.

So this brings up another interesting issue. You can no longer use the routines that get a CG context from a port, or their ilk, in 64-bit. If you need a CG context for a window, you'll need to pay attention to the kEventControlDraw Carbon event and get the CG context through that.

The Dialog Manager just transparently switches over to compositing mode without any changes that you need to make to your code in 64-bit. The Carbon Event Manager has some changes. Non-compositing window events are not available. Instead, you'll want to use events that operate on HIObjects, HIViews, and CG contexts. And there are a few new standard types that are introduced as well.

We've got some other grab bag Carbon API changes. Basically, any APIs that were deprecated prior to 10.5 are not available to 64-bit applications on Leopard. And there are a few others that are not available. The Code Fragment Manager is not available. The Device Manager is not available. The Language Analysis Manager has been replaced by the Natural Language Processing API. And the Desktop Manager functionality can be obtained through a combination of icon services and launch services.

The Sound Manager. Simple-to-use API, not very powerful compared to today's Core Audio APIs. The Sound Manager has been on its way out for a while as well. So it's not available to 64-bit applications. You'll want to take a look at Core Audio. The List Manager is also biting the dust for 64-bit applications. That should be no surprise to you. Use the Data Browser instead.

And some of the really old-school text processing APIs are also not available: TextEdit and the Text Services Manager. Instead, for 64-bit applications, you'll want to use an HITextView or, new in Leopard, we've got an HICocoaView that allows you to embed a Cocoa view inside a Carbon window. So you could put an NSTextView inside an HICocoaView and use that in your Carbon application. Hopefully you're getting the picture here that most of the changes we're making are pretty obvious, what you would have expected, and we're getting rid of APIs that we've been encouraging you to move off of for some time.

The Translation Manager has been replaced by Translation Services in the HIServices framework. The Scrap Manager has been replaced by the Pasteboard Manager. And if you need to enumerate displays, you'll no longer be able to use the Display Manager, including GDHandles, but you'll want to use CGDirectDisplay and the Quartz Display Services APIs to walk through the list of what displays are actually connected to a machine.

So in summary for Carbon, you'll want to adopt the new standard Carbon types in your code, which means you'll need to go through all of your old code to make sure that not only are you interfacing correctly with our routines, but you are actually able to handle large amounts of data in your code. You're not truncating things along the way.

You'll need to transition from QuickDraw to Quartz. And thirdly, you'll need to use compositing window mode. Hopefully most of you are already doing that. And good replacement APIs are generally available. We're getting rid of APIs that we've deprecated for a while now, and we've got pretty good replacements there.

Carbon and Cocoa are two big buckets, but there's kind of an "other" bucket that I also wanted to talk about, which is a variety of other changes that you should be aware of throughout the frameworks on Mac OS X. And the first one is a basic new way in CFBundle to allow you to check the architectures that are available for a given bundle.

This allows you to replace the sample code that we've had out for a while where you've got to actually kind of munch through the Mach-O header for a given binary to figure out what architectures are in there. So, nice new clean API to do that. Second of all, the Message framework, a little-known framework that allowed you to send emails through Mail, is not available to 64-bit applications. Its functionality can be easily duplicated through basic Apple events sent to Mail.
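For a sense of what the older "munch through the header" approach involves, here is a hedged C sketch that counts the 64-bit slices in a universal (fat) binary that has already been read into memory. It assumes the fat header layout from `<mach-o/fat.h>`: a big-endian magic of 0xcafebabe, a big-endian slice count, then 20-byte slice records that begin with the CPU type, where the 0x01000000 bit marks a 64-bit ABI. The constant and function names are local to this sketch, and thin single-architecture files are deliberately not handled. The CFBundle call the talk refers to shipped, as far as I know, as `CFBundleCopyExecutableArchitectures`, which makes this parsing unnecessary for bundles.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FAT_MAGIC      0xcafebabeu  /* fat header magic, big-endian on disk */
#define SKETCH_CPU_ARCH_ABI64 0x01000000u  /* set in cputype for 64-bit slices */
#define SKETCH_FAT_ARCH_SIZE  20u          /* five 32-bit fields per slice record */

/* Reads a big-endian 32-bit value regardless of host byte order. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the number of 64-bit slices in a fat binary image, or -1 if the
   buffer is not a fat binary or is too short for the slice table. */
int count_64bit_slices(const unsigned char *buf, size_t len) {
    if (len < 8 || read_be32(buf) != SKETCH_FAT_MAGIC)
        return -1;
    uint32_t nslices = read_be32(buf + 4);
    if (nslices > (len - 8) / SKETCH_FAT_ARCH_SIZE)
        return -1;
    int count = 0;
    for (uint32_t i = 0; i < nslices; i++) {
        uint32_t cputype = read_be32(buf + 8 + (size_t)i * SKETCH_FAT_ARCH_SIZE);
        if (cputype & SKETCH_CPU_ARCH_ABI64)
            count++;
    }
    return count;
}
```

A four-way universal binary (ppc, i386, ppc64, x86_64) would yield 2 here; a result of 0 means the file still needs its 64-bit slices built.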

Open Transport is not available to 64-bit apps. How many of you are relying on Open Transport? Hopefully not many. Just a couple of you. Good. CFNetwork is a good, newer replacement API, and BSD sockets are a great, really old replacement API. AppleTalk is not available to 64-bit applications. There are other good ways of doing what it did that have been around for a while.

When it comes to printing, there's just a few notes on that. PDEs that are 32-bit PDEs, the ones that are CFPlugIn-based, will not load in 64-bit applications. They will not show up in the print dialog. Instead, there are new Cocoa-based PDE APIs that you should take a look at. You can check out the PDEPluginInterface.h header to get more information on using those. And secondly, the other bit of information you should know about is that if you want to get information from the PPD in a 64-bit app, you'll need to use CUPS instead of PPDLib.

Let's talk about QuickTime. And this is going to be covered in even more depth in one of the QuickTime sessions this afternoon by the QuickTime engineering team, but I just wanted to give you a little bit here. The QuickTime C APIs are not available to 64-bit applications on Leopard. The code base is very old and crufty. It's a little bit like QuickDraw in some ways. However, there are Carbon and Cocoa ways of continuing to access QuickTime content and QuickTime functionality for 64-bit apps.

The biggest is QTKit, which is a new, well, even not so new now, QuickTime Objective-C API that's modern, powerful, and functional, and it's available to 64-bit applications in Leopard. A couple implications of this. Since there's no QuickTime C APIs, you can't get native QuickTime identifiers out of QTKit if you're a 64-bit app. So you can't ask a QTMovie for its QuickTime Movie, which is the biggest example.

A bug in the Leopard preview that you have right now is you can't actually use a QTMovieView from a nib. That's actually just broken right now. But you can create one in code, add it to your view, and that will work fine. This is something we'll fix pretty soon. And the new capture classes that we've just introduced in Leopard are not yet available to 64-bit applications.

One other bit of detail is that NavCreatePreview depends on QuickTime, so it's also not available to 64-bit applications. There are a variety of new Quick Look preview facilities that we've rolled out that are going to be a part of Leopard. Java on Mac OS X Leopard is available natively for 64-bit Intel only. There is no PowerPC 64 Java. And that's simply because we license the JVM from Sun, and they don't have a 64-bit PowerPC implementation. Speaking of Java, Cocoa Java is not available to 64-bit applications either.

If we dive real low level in the system to the driver level, I have up here on the slide what they told me to say about it, which is basically that if you are a driver that wants to access more than 4 gigs of memory, there are some changes you'll need to make.

The kernel is staying 32-bit for Leopard. That should be of help to you. There's a session right after this one that's going to be all about I/O Kit related changes for 64-bit computing, and there will also be a discussion of some Intel EFI related stuff in that session.

Let me talk for just a minute or two about the state of 64-bit in the Leopard preview that you have, because we've talked to some degree about what we plan to do for Leopard for 64-bit, but not everything's working in the preview that you have available to you. In the preview that you have, nearly all the frameworks are available, as I've described. Nearly all the developer tools are also available. Xcode, Shark, Crash Reporter, all of those have been 64-bit enabled.

Basic language interpreters, however, are still 32-bit on the system. That's something that we're going to correct soon. And there's still some performance work, some basic structural work that we do for 32-bit that we haven't yet done for 64-bit. Prebinding at the system framework level is not something that we've done yet. Of course, you don't need to pay attention to that for applications. There's no system shared region in 64-bit yet either.

There are some frameworks and APIs that are not yet available to 64-bit that we plan to make available by the time Leopard GMs. Let me just go over some of those here. Of course, scripting languages I've mentioned. Parts of DO are not working in this preview, which means that spell checking and sync services are also not working.

Objective-C 2.0 garbage collection in this preview is not enabled for 64-bit applications. That's something we're going to change by GM. DVD playback is available only to PowerPC 64-bit applications, not on Intel 64. And X11, if I have my information correct, is available to PowerPC 64 applications, but not Intel.

There's a bug as well. If you are a sophisticated C++ application, you use templates, and you use the -fvisibility-inlines-hidden flag to help reduce the number of symbols that are in your application and thus improve launch times, you're going to get a linker error when trying to build for 64-bit.

The error is "codegen problem, can't use rel32 to external symbol," or a message similar to that. If you get that, that's just a bug that we're going to fix. The easy workaround is don't use that flag. And with that, I'd like to turn things over to Eric Albert to talk about the 64-bit architecture at the lower levels.

Hey, there. So to go back to the slide that Matt showed at the beginning, so he's covered now our API changes for 64-bit. And I'm going to talk about how to find your application's dependencies to make sure that everything that you need is 64-bit before you actually move your own application.

Because well, if the frameworks that you need aren't there, then building your application in 64-bit won't work all that well, because you won't be able to run it. Then 64-bit in Leopard and Tiger, which can sometimes be a little bit complicated to think about, how the two operating systems interact, and I'll explain why. And finally, what's new in our latest and greatest architecture, 64-bit on Intel.

So first, finding 64-bit dependencies. The reason why this is important is that even though almost all of our system frameworks, and nearly all the ones that we're going to support for 64-bit are built 64-bit today, there are obviously a lot of applications out there that are dependent on third party libraries and frameworks. It's good to figure out which ones of those you'll need before you do that transition, much as you had to do for Intel.

So the way to do that is to run otool -L and point it at the actual application binary that you have. So in this case, I picked Transmit, which is a nifty FTP client from the folks at Panic. And you can see here that actually three of the frameworks that Transmit pulls in, FTPKit, Growl, and Neon, are not Apple frameworks.

So this means that in the Leopard preview today, obviously these three frameworks, well, they're not part of the operating system. So they aren't 64-bit today, unless the Panic folks have worked really, really hard over the past two days to port them. So those three would then have to move to 64-bit before Transmit itself could actually link as a 64-bit application.

Now, how do you actually check those to see if they have 64-bit slices? You run the file command from the command line. So here I've run it on Core Foundation on the Leopard preview, and you see that it has all four architectures. Of course, if it only reports ppc and i386, then that framework isn't 64-bit yet.

How about 32-bit versus 64-bit in Leopard and Tiger? And basically, how to go about running the right application at the right time. This is a little bit tricky, but I'll try to explain it. Mac OS X will choose to run the 64-bit slice of a binary when you're running on a 64-bit Mac.

And so what we've always tried to do in Mac OS X is to run the right application when the user goes ahead and double-clicks something. This is why, for example, on Intel, when we introduced the Intel systems, then when you built your application universal, the user double-clicks it and the Intel side launches because, of course, that's what you'd want. The idea here is if you've built your application for 64-bit, that's because you probably actually want folks to use the 64-bit side. And so we'll run that by default.

But sometimes 32-bit is the right choice, even for an application that has both 32- and 64-bit slices. So some examples of this are 32-bit plug-ins in browsers. So for example, if Safari was 64-bit today, but say the Windows Media plug-in was still 32-bit and you needed to view something in Windows Media, then you'd want to be able to run Safari, perhaps, as a 32-bit application.

This applies to any applications that have plug-ins, as a variety of pro applications do. And also to 32-bit native libraries for Java, Perl, and other command-line tools. I don't want to call them interpreters because Java isn't, but for things like this you can't determine at launch time whether the libraries that you're going to try to load later are 32-bit or 64-bit. If you have a Java application with 32-bit JNI libraries, then when you run /usr/bin/java, we don't know that that's what you're going to try to load. So we'll have to provide some way for you to run a 32-bit Java in that case.

The way to do that is to use the new posix_spawn API, which was introduced in Leopard. Unfortunately, the flags that you need to pass to posix_spawn to get it to choose between 32-bit and 64-bit are not currently in your Leopard preview. That will, of course, be added before Leopard ships.

So this will be the interface, but you can't quite use it today. The way to do this today, then, the workaround for you while you're doing development, is to use lipo -thin to remove the 64-bit slice of your binary. And yes, it's kind of inconvenient, but we will make this better.

[Transcript missing]

So the answer to that is to build the 64-bit side of your binary, so to build your ppc64 and x86_64 architectures, with a deployment target of Mac OS X 10.5. Now, unfortunately, today, this doesn't actually have a material impact on what launches, but this will be fixed before Leopard ships.

And when you set this deployment target, then the right thing will happen. By that, I mean that if you set a deployment target of 10.5 for the 64-bit slice of your binary, then when you launch that binary on a Tiger system, the 32-bit side will run, and when you launch that binary on a Leopard system, the 64-bit side will run.

[Transcript missing]

So what do these changes actually mean for running code and for the performance of code? Well, more registers, as I mentioned, means that more things are possible, that we can do interesting things with the calling convention and with code generation that just simply couldn't be done for 32-bit.

Again, the most noticeable one is the better calling convention, the fact that we can actually pass arguments in registers and do the right thing with passing floating-point arguments to functions. So I'll talk a little bit more about that in a minute or two. Faster compute-intensive code. When you have more registers available, then you don't have to spill things onto the stack. You don't have to go out to memory.

You can keep more data in registers and manipulate more values faster. Faster access to external functions and global variables. This comes about because we have PC-relative addressing, so we can actually reference these things much more directly than we otherwise could. We don't have to go through little thunks in the code.

And faster floating-point code because, first of all, the better calling convention helps out floating point, and secondly, again, doubling the number of registers helps that out as well. So overall, you look at this and you say, hey, that sounds pretty good. There's one caveat. Of course, this is a shift from 32 bits to 64 bits, so larger longs and pointers means that fewer items fit in the cache.

It's not like your Mac Pro, say, has an 8-megabyte L2 cache when you're dealing with 64-bit code and a 32-megabyte one when dealing with 32-bit code. You get the same cache size no matter what you're running, so you can fit more longs and pointers in the cache when you're running 32-bit code.

But the change here that gives us more registers and that changes the calling convention and so on and so forth means that many applications will actually end up running slightly faster on 64-bit Intel, regardless of whether they need the larger address space. Now, this isn't -- I have to be very clear here -- this is not a reason in and of itself to go out and move all of your code to 64-bit right away. So as we mentioned at the 64-bit overview yesterday, when you run the first 64-bit application in the system, that brings in a 64-bit framework stack, so that has additional memory requirements.

But the reason why I mention this is that if you happen to have a compute-intensive application, in particular folks in scientific computing or anyone who's just really trying to eke every last cycle out of the CPU, then 64-bit on Intel in particular is something that you may want to look at.

Now, some applications actually end up running a little bit slower with this because of that trade-off: if you have a lot of longs and pointers in your code, then that may more than offset the performance boost that you get from the register changes. But if you do have a compute-intensive app, then I encourage you to take a look at this and try it. And if it's an application where that additional, say, 10% or so may make a significant difference, then great, this may turn out really well for you.
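To make the cache trade-off concrete, here is a small C sketch. The node layout and helper names are hypothetical, chosen only for illustration. It assumes a structure made entirely of pointers and longs, so each field is 4 bytes in a 32-bit process and 8 bytes in a 64-bit one.

```c
#include <stddef.h>

/* Hypothetical doubly linked list node: every field is a pointer or a long,
 * so every field doubles in size when moving from 32-bit to 64-bit. */
struct node {
    struct node *next;
    struct node *prev;
    long         value;
    long         count;
};

/* Size the node above would have if pointers and longs were `word` bytes wide. */
static size_t node_size_for_word(size_t word)
{
    return 4 * word;  /* four word-sized fields, no padding needed */
}

/* How many whole items of a given size fit in a cache of a given size. */
static size_t items_that_fit(size_t cache_bytes, size_t item_bytes)
{
    return item_bytes ? cache_bytes / item_bytes : 0;
}
```

With an illustrative 8-megabyte cache, items_that_fit(8 * 1024 * 1024, node_size_for_word(4)) gives 524288 nodes for 32-bit code, and the same call with node_size_for_word(8) gives 262144 for 64-bit code. Structures dominated by ints, floats, or chars shrink this gap, which is why the effect depends so much on the application.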

[Transcript missing]

How does that calling convention actually work? Well, we decided that compatibility was good. And so our calling convention is as much like Linux and other Unix-like platforms as possible. So this means that if you have assembly code or developer tools or whatnot that were built for Linux or FreeBSD for x86-64, then bringing them over to 64-bit Intel and Mac OS X should actually be reasonably straightforward. So we pass six arguments, six integer arguments, in registers. We start with RDI and then RSI, RDX, RCX, R8, and R9. After that, we spill onto the stack.

Floating-point arguments are passed in XMM0 through XMM7. The return value comes back in RAX. That's a lot like 32-bit, at least. And what all of this adds up to is that, at least conceptually, this is pretty similar to PowerPC, where we're passing arguments in registers up to a certain number and then spilling onto the stack and returning values in registers. It's a lot more like PowerPC, certainly, than 32-bit Intel was.

Documentation for this calling convention is up at x86-64.org. This is one of the great things about being as much like Linux and other Unix as possible is that we can actually say, you know what? The documentation that's out there, that's been out there for a while now, works great for us.
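The integer-argument rule can be written down as a small lookup. This is only a sketch of the register order given above for plain integer and pointer arguments. It ignores floating-point arguments and structures passed by value, and the names are invented for the example.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in order, for the x86-64 calling
 * convention shared by Mac OS X, Linux, and FreeBSD. */
static const char *const kIntArgRegs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Where the Nth (0-based) integer or pointer argument is passed. */
static const char *int_arg_location(size_t index)
{
    size_t n = sizeof kIntArgRegs / sizeof kIntArgRegs[0];
    return index < n ? kIntArgRegs[index] : "stack";
}
```

So int_arg_location(0) is "rdi", int_arg_location(5) is "r9", and anything from index 6 onward is "stack".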

So I'll give you a quick example of how this calling convention actually works and how you would go about using it in debugging. So let's say that you wanted to print out all of the file names that an application was opening as it went about opening them. So you set a breakpoint on open, and then you can set a breakpoint command that will just print out the first argument and then continue and end, which means that when you run your application, you'll just get this continuous stream of all of the file names that get opened. So you can see, say, where it's looking for preference files and things like that.

Here's how you would have done that for 32-bit Intel: *(char **)($esp + 4). And you'd have to know that it's ESP because open is a frameless function, and to know that you'd have to disassemble open. That's really not very much fun at all. Here's how you do it for 64-bit Intel. The first argument's in RDI. Print RDI. Nice and simple. And when you run that, you get all of the file names here from a 64-bit version of TextEdit.

How about Objective-C? Now, in Objective-C, of course, the first argument is self, and the second argument is the selector. So the first argument to a method is actually the third argument that really gets passed in the calling convention. So here, if we wanted to print out the title of every window as that was changed, then we set a breakpoint on -[NSWindow setTitle:].

And then what are we going to hand over to print-object? Well, for 32-bit on Intel, again, this is really messy. You have to know that -[NSWindow setTitle:] actually does have a frame, and therefore you reference your arguments off of EBP instead of ESP. And you have to dereference things. It's just kind of nasty. So for 64-bit, it's a lot easier. It's the third argument. That's in RDX. Pass that straight to print-object, and so you can get all of your window titles.

How does this compare to Windows? If you already have 64-bit code for Windows, then you'll find that a few things are different here. And again, here, Mac OS X is a lot like Linux and other Unixes, but Windows is just the odd one out. Microsoft went with the LLP64 data model. That means that long longs and pointers are 64-bit.

For us, longs and pointers are 64-bit. What that means is that if you have C code that you want to be cross-platform for 64-bit between Windows and Mac OS X, then you may want to avoid the long type altogether and use ints for your 32-bit values and long longs for your 64-bit values. And then that'll be compatible back and forth. Or, of course, use explicitly sized types like uint32_t and int64_t.
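A short C sketch of that advice follows. The record structure and the data_model helper are hypothetical names made up for illustration. The point is just that explicitly sized types from stdint.h keep the same width whether long is 32 or 64 bits.

```c
#include <stdint.h>

/* Hypothetical record that needs the same layout under LP64 (Mac OS X,
 * Linux) and LLP64 (64-bit Windows), so it avoids `long` entirely. */
struct record {
    int32_t  id;      /* always 32 bits */
    uint32_t flags;   /* always 32 bits */
    int64_t  offset;  /* always 64 bits */
};

/* Name the data model of the process this code was compiled for. */
static const char *data_model(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 8) return "LP64";
    if (sizeof(void *) == 8 && sizeof(long) == 4) return "LLP64";
    if (sizeof(void *) == 4 && sizeof(long) == 4) return "ILP32";
    return "other";
}
```

On 64-bit Mac OS X, data_model() should report "LP64", while a 64-bit Windows build of the same source would report "LLP64". In both cases sizeof(struct record) stays at 16 bytes.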

How about passing arguments in registers for the calling convention? We pass the first six arguments in registers. They pass the first four arguments in registers. And just to make that much more interesting, their first four are our last four. So that makes porting assembly code kind of difficult.

Volatile registers, all of the argument registers plus R10 and R11 are volatile for both platforms. But again, since the argument registers are different, that means that you have different volatile registers. And volatile XMM registers, all of them are volatile on Mac OS X. On Windows, only XMM0 through XMM5 are volatile and the rest are preserved.

So if you're bringing code over that assumes that those will be preserved, then you'll have to update that for Mac OS X. What all of this adds up to is that, for the most part, 64-bit Intel assembly code is going to be different between Mac OS X and Windows.

And that's unfortunate, but in this case, Microsoft went one direction, the rest of the world went the other. So again, if you have Linux or FreeBSD assembly code, then that should come over reasonably easily. But if you have code from Windows, you'll have to change it a bit.

How about our memory layout for 64-bit systems? Since we have gobs and gobs and gobs of memory, what do we do with all of it? Well, 0 to 4 gigabytes is used for the kernel and the 32-bit address space. Our kernel is still 32-bit in Leopard, and so it has to sit down there. From 4 gigs to 128 terabytes are 64-bit applications and libraries.

It's pretty unlikely that you'll write an application that actually uses all of that right now. I mean, the Mac Pro can only fit up to 16 gigabytes of RAM. So if you actually tried to use 128 terabytes, you'd be paging for a long time. But you can try, if you'd like.

Now, from 128 terabytes to some very large number that I couldn't do the math for, this is not addressable. This is a hole in the address space. This isn't some special feature of Mac OS X where we just decide to stamp out the middle of the address space. Instead, this is due to the architecture of Intel 64-bit chips right now.

But you're probably not missing this, because again, you really can't touch 128 terabytes of pages today. So that hole itself wouldn't help you all that much. And from that address and up, what we actually refer to as the negative address space, because it's easier to think about if you go backwards from 0, that's reserved for future kernel usage. So you can't allocate memory in that space today.
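The hole can be described with a little arithmetic. This sketch assumes the chip implements 48 bits of virtual address, which is what the 128-terabyte figure implies (2^47 bytes is 128 terabytes). Under that assumption an address is usable only if bits 47 through 63 are all zeros or all ones. The macro and function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual-address bits. */
#define VA_BITS 48

/* True if bits 63..47 of addr are all equal, i.e. the address lies
 * outside the non-addressable hole. */
static bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (VA_BITS - 1);  /* keep bits 63..47 */
    return upper == 0 || upper == (UINT64_MAX >> (VA_BITS - 1));
}
```

Under this assumption the low range ends at 0x00007FFFFFFFFFFF, the hole runs from 0x0000800000000000 through 0xFFFF7FFFFFFFFFFF, and the "negative" range starts at 0xFFFF800000000000.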

I'd like to talk a bit about that 0 to 4-gigabyte range. This is particularly interesting as you migrate code from 32-bits to 64-bits, because as Matt mentioned and as Ali mentioned yesterday, we see a lot of bugs in code that migrates to 64-bits, where people are truncating pointers.

And of course, if your code is still running within the low 4 gigs, then a truncated pointer will continue to work just fine. So to reach the 64-bit apprentice level that Ali described yesterday, you want to make sure that your application runs above 4 gigs, and so all of your pointers are up high.
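Here is a minimal C sketch of why a truncated pointer appears to work in the low 4 gigs and breaks above it. Addresses are modeled as plain 64-bit integers so the example behaves the same on any machine, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulate storing a 64-bit address in a 32-bit integer and reading it back. */
static uint64_t round_trip_through_32_bits(uint64_t addr)
{
    uint32_t truncated = (uint32_t)addr;  /* the bug: upper 32 bits are lost */
    return (uint64_t)truncated;
}

/* True if the address comes back unchanged after the round trip. */
static bool survives_truncation(uint64_t addr)
{
    return round_trip_through_32_bits(addr) == addr;
}
```

An address such as 0x1000 survives, which is how the bug hides. An address such as 0x100001000, just above 4 gigabytes, comes back as 0x1000 and points somewhere else entirely. In real code, the usual remedy is to hold pointer values in uintptr_t or a pointer type rather than int or unsigned int.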

Now, the linker defaults to a 4-gigabyte page zero size when building for 10.5 or later. This means that it'll simply carve out the low 4 gigabytes of your address space that isn't, say, malloc-allocated memory. It's just mapped out of your space, and none of your code will end up running in there.

But if you're running 64-bit PowerPC code, there are a couple of ways to get your ppc64 application to load in the low 4 gigs. And one, which is actually on by default, is to build with -mdynamic-no-pic. This is good for performance reasons, so it'll make your ppc64 code run a little bit faster.

But when you turn this on, your application will actually load in the low 2 gigs, and the rest of that space up to 4 gigs will be blocked out, so no shared libraries will load there. But your application, if it's truncating pointers, will still be able to do that. So if you want to make sure that you can get to that 64-bit apprentice level and that you are not truncating your pointers, then you want to turn off -mdynamic-no-pic, at least for testing.

And once you're sure that you're 64-bit clean, then you can turn that back on and get the performance boost back for ppc64. Another way to do this is to use a Mac OS X deployment target of 10.4 or earlier. And the third way is to explicitly set the page zero size, to say, "Really, trust me, I don't want the 4-gig page zero. I just want, say, one page."

This is somewhat different on Intel. The only way to get an Intel application to load in the low 4 gigs is to explicitly set the page zero size. So regardless of your deployment target, regardless of the value of -mdynamic-no-pic, your Intel application will load above 4 gigs. This means that if you move to ppc64 first and then move to 64-bit Intel, then you may not notice your pointer truncation errors until you move to the Intel side. Now, why did we do it this way?

First, for 64-bitness, to really ensure that you don't have pointer truncation problems. But secondly, actually, for performance. Due to the architecture of the virtual memory system on the Intel systems, if your application doesn't overlap with the kernel (and again, the kernel is in the low 4 gigs), then transitions between the kernel and user space are that much faster. We can actually special-case applications that have a 4-gig page zero and make all of your system calls that much faster. So that's another thing that's nice about 64-bit on Intel.

So, a summary of everything about 64-bit In-Depth. API migration: Matt talked an awful lot about this, and there's lots and lots to know. Basically, what this comes down to is that we've modernized our APIs. The things that we've been telling you for years are the future direction of the platform are the things that are available for 64-bit, and the things that we've been deprecating or trying to move away from are generally not available. Again, the most important one there is to migrate from QuickDraw over to Quartz.

For 64-bit on Intel, it's also a modernization, but in this case, a modernization of the architecture rather than of the APIs. The quick summary is that it's simpler, faster, and better. We love this architecture. It's going to be great for us in the future, and we hope that your applications can take advantage of it.

We have three 64-bit labs throughout the show. The first one starts at 2 o'clock today. They're all in Mac OS X Lab B. And I encourage you to bring your code by and start the process of moving your application over to 64-bit. I think we have a bunch of Mac Pros there. And you can give it a shot.