I/O Technologies Overview: Best Practices for Driver Development - WWDC 2006

OS Foundations • 52:55

I/O Kit, the driver development framework in Mac OS X, provides a number of families that make it easy to develop state-of-the-art drivers for FireWire, USB, PCI Express, and ExpressCard devices. Come learn how Apple's own driver writers use I/O Kit, Xcode, and related Mac OS X technologies to handle issues impacting driver development.

Speakers: Rhoads Hollowell, Rob Yepez, Ethan Bold, Eric Anderson

Unlisted on Apple Developer site


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Hello and welcome to session 409, I/O Technologies: Best Practices for Driver Development. We have four speakers today covering four different areas of driver development, so let's get started. Here are the four speakers and the areas we will be discussing. My name is Rhoads Hollowell. I'm with the USB technology team, and I'll be talking about what's new with USB.

So the main thing that's new with USB these days, of course, is that we've now switched to Intel I/O chipsets in all of our products, which means we have two USB controllers that differ from what we've had historically. We have Intel's high-speed controller, the EHCI controller, and we have Intel's full-speed controller, the UHCI controller. Also new with the Mac Pro, we have 64-bit systems that are slightly different from historical 64-bit systems, so I'll talk a little bit about that.

First of all, I want to talk about the Intel EHCI controller, their new high-speed controller. The use of this controller should be completely transparent to both driver writers and device manufacturers. The existing AppleUSBEHCI controller driver that we've had for quite a number of years now works great with this new Intel architecture.

One thing we've seen that might affect driver writers slightly is in isochronous I/O: you may not be able to send out isochronous data packets in frames that are too close to the current frame. That lead time has always been an issue for isochronous driver writers. It may be two milliseconds into the future, it may be four milliseconds into the future, so your driver has to be aware that when it begins an isochronous data stream, the first few frames may or may not go out on the bus, and it should be able to deal with that. Other than that, we've seen very few issues with drivers: some slight timing differences in how much data can go out in a millisecond, and that kind of thing.

The Intel Full Speed Controller, the UHCI controller, is also new to us. And although this particular controller has been around at Intel for quite a long time, the Apple Mac OS X driver for the UHCI controller is fairly new. We started shipping this driver in 2005 with pre-Intel systems to support UHCI PCI cards, and now with the Intel systems, we of course have to support it in all of the new hardware.

One of the issues that driver writers might see is slightly different error codes than in the old OHCI days. This comes from the fact that the UHCI controller gives the system software different information than an OHCI controller does. We do our best to map that information to the same error codes we returned with OHCI, but this is not always possible because we don't always have the same amount of information.

Now another issue that affects people with the UHCI controller is that the UHCI controller, unlike an OHCI controller or an EHCI controller, is not allowed to have a data buffer for any packet across a page boundary. So this means that, for example, if the max packet size of a device is 64 bytes, which is typical, and a particular 64 byte data packet would cross a page boundary, then the system software has to actually copy that 64 bytes before it DMAs it to the controller, or before it gives the DMA address to the controller for sending or receiving the data buffer.
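The page-boundary condition described above can be sketched in portable C. This is only an illustration of the arithmetic, not code from the USB family: the helper name `crosses_page_boundary` is hypothetical, and the 4096-byte page size used in the usage note is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the buffer [addr, addr + len)
 * touches more than one page, 0 otherwise. */
static int crosses_page_boundary(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0) {
        return 0;                       /* an empty buffer touches no page */
    }
    uintptr_t first_page = addr / page_size;
    uintptr_t last_page  = (addr + len - 1) / page_size;
    return first_page != last_page;
}
```

With 4096-byte pages, a 64-byte packet starting at offset 4064 straddles two pages and would need the copy, while the same packet starting at offset 4032 ends exactly at the page edge and would not.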

Now the host controller driver handles this type of thing transparently for your driver. You should not have to deal with it at all. You shouldn't even know it's happening. What this does mean is that you may not change the data in your data buffer after making the I/O call and before that call returns.

So, for example, if you have an outgoing I/O packet or an outgoing I/O buffer and you were sort of cheating, if you will, and filling in the data closer to the time you expected it to go out, that's no longer acceptable. If you are sort of cheating and looking at a data buffer, expecting data to come in, it may or may not all be there when you expect it to, because we may have to be copying the 64 byte data packet behind your back. So don't change the data in the buffer after you make the I/O call.

Now, for low-latency isochronous IN, where we do have people that expect to be able to see their data in a real-time thread before they get the callback, we do copy the data before we update the status in the isochronous frame structure. So that is still the case: if you wait until the status in that isochronous frame structure is updated before you look at the data, those data bytes will, in fact, be there when you look at them.

64-bit system support. Now, even though the kernel is a 32-bit task and your kernel extension, if you have a USB driver that's a kernel extension, is also a 32-bit task, it may be running on a 64-bit system where the memory buffer itself is mapped into a physical memory space that is greater than a 32-bit address can handle.

The EHCI controller is able to handle these 64-bit physical memory addresses transparently. It is completely capable of doing 64-bit DMA, so that's not a problem. However, neither the UHCI nor the OHCI controller can handle more than 32 bits of physical address space. So in this situation, the host controller driver, the IOUSBFamily software, will see that that's the case and will end up copying the data buffer.

The host controller driver copies the data before or after the I/O, as appropriate. So again, what this means is your driver should not change the data in the data buffer while the I/O is in flight. This is a case where low-latency isochronous does need to pay attention, because our low-latency API gives you the ability to look at your data once we've updated the frame status. And of course, that's going to be very difficult if the data is in 64-bit memory, that is, the physical address is 64 bits, and we have to do the copying.

So what we've done is we've added a new API for those of you who are doing low-latency isochronous IN transactions. It's called GetLowLatencyOptionsAndPhysicalMask. This allows you to retrieve a physical mask for use with I/O Kit's new inTaskWithPhysicalMask call, which says, "Hey, I want to make sure that my data buffer is allocated in physical memory whose address fits in 32 bits."
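To make the idea of a physical mask concrete, here is a minimal sketch in portable C of the check such a mask implies. It does not call GetLowLatencyOptionsAndPhysicalMask or any I/O Kit allocator; `buffer_fits_mask` is a hypothetical helper, and it assumes the mask is a run of contiguous low bits (such as 0xFFFFFFFF), so that testing the first and last byte of the buffer is sufficient.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if every byte of [phys, phys + len)
 * has a physical address representable under `mask`.
 * Assumes `mask` is a contiguous run of low-order one bits. */
static int buffer_fits_mask(uint64_t phys, uint64_t len, uint64_t mask)
{
    if (len == 0) {
        return 1;
    }
    uint64_t last = phys + len - 1;
    if (last < phys) {
        return 0;                       /* address range wrapped around */
    }
    return (phys & ~mask) == 0 && (last & ~mask) == 0;
}
```

With a 32-bit mask, a buffer that ends at 0xFFFFFFFF passes, while one that starts just below 4 GB and runs past it fails and would have to be copied.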

It also allows you to retrieve option bits. The only one that is actually relevant in this case is physically contiguous, because on some controllers, for example the UHCI controller, you also want to allocate your memory as physically contiguous. We had a property-based mechanism for doing that in the past, and we've now rolled that into the same API, so that with this one call you can retrieve from the controller the option bits and physical mask that are necessary for that controller to function properly with low-latency isochronous input buffers. And I will now turn it over to Rob, who will talk about HID Manager.

Thanks, Rhoads. I'm Rob Yepez with the I/O Kit team, and I'm primarily responsible for the HID Manager in Mac OS X. Today I'm going to talk about what's new in HID, cover the new HID Manager API, and talk a little bit about the 64-bit support that we've added in Leopard.

So first off, I'll start by discussing the past issues that we've had with the HID Manager. The old API used the CFPlugin architecture, which was a little bit confusing for some of our developers. It also required a working knowledge of the I/O Kit APIs, because you needed to use those APIs for device discovery, removal, and whatnot.

It also required the developers to manage all aspects of the HID objects. So you had to basically take care of storing them and keeping track of everything associated with the HID objects, such as the transactions, the queues, the elements, whatever. And the other past issue we've had is the inconsistencies with acquiring the device properties.

With some properties, you'd get them from the plugin, and other properties you'd get from the I/O Registry. So we're trying to centralize that in the newer API. Also, the way we passed the HID events was a little confusing. We used a really big, massive structure to pass the events, which sometimes had either a 32-bit value or a data value in it, and it was hard to decipher which value was actually being passed in a particular event.

So here's the new HID Manager API. What we've done here is we've pretty much based the API on CF-type objects. What we gain for free is reference counting and the ability to store these in CF collection objects. The new objects that we've created are IOHIDManager, IOHIDElement, and IOHIDValue.

We also have newer versions of the old APIs we had in the past: IOHIDDevice, IOHIDQueue, and IOHIDTransaction. The advantage of doing all this work is that we're able to clean up the API a little bit and shield the developer from using any unnecessary APIs.

So this gives you a brief layout of the new HID Manager API. What you see here is that the first thing you'd really want to talk to is the IOHIDManager. Then, if you really need to, you can talk to the IOHIDDevice and IOHIDElement, but they're not really required. And of course, the optional objects here are the IOHIDQueue and the IOHIDTransaction.

So let's talk about the IOHIDManager. This is an entirely new object in Leopard. It handles most aspects of your device and queue management, so it will do nearly everything for you. It gives you a nice global interaction with your HID devices: it's your single endpoint for communicating with all of them. It sets up your device discovery and the receiving of input events, so you can begin receiving input events with only five API calls.

This is a brief example of how to use the IOHIDManager. We first create the IOHIDManager. We don't have any options yet, but we will in the future. After you've created the HID Manager, we go ahead and open it for communication. You can also pass the kIOHIDOptionsTypeSeizeDevice option bit here, and this will give you exclusive access to the devices that you're communicating with, for instance if you wanted to seize the mouse or the keyboard from the system.

After you've opened the HID Manager, let's go ahead and set up a matching dictionary for the devices that we're actually interested in talking to. For this example, we're going to register interest in joysticks. So what we do is create the matching dictionary using CFDictionaryCreateMutable, then set up the CFNumbers for the usage page and usage. In this case, the usage page is generic desktop and the usage is joystick.

Once we've done that, we go ahead and set the values into the dictionary. After we've done that, we make a call to IOHIDManagerSetDeviceMatching. Now, if we're interested in multiple devices, we can use IOHIDManagerSetDeviceMatchingMultiple. That takes in a CFArray of CFDictionaries, so you can use it to set up interest in multiple device types.
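The matching semantics just described can be modeled in portable C. This sketch does not use the HID Manager or Core Foundation at all: the struct and function names are hypothetical stand-ins, with a single criteria struct playing the role of one matching dictionary and an array of them playing the role of the CFArray passed to the "multiple" variant. The numeric values are the standard HID usage-table codes (generic desktop page 0x01, joystick 0x04, game pad 0x05).

```c
#include <stddef.h>
#include <stdint.h>

enum {
    USAGE_PAGE_GENERIC_DESKTOP = 0x01,  /* HID usage page: Generic Desktop */
    USAGE_JOYSTICK             = 0x04,  /* HID usage: Joystick */
    USAGE_GAMEPAD              = 0x05   /* HID usage: Game Pad */
};

/* Stand-in for one matching dictionary (usage page + usage). */
typedef struct {
    uint32_t usage_page;
    uint32_t usage;
} match_criteria;

/* Stand-in for the primary usage a device reports. */
typedef struct {
    uint32_t usage_page;
    uint32_t usage;
} device_info;

/* Single-dictionary matching: both keys must agree. */
static int matches_one(const match_criteria *c, const device_info *d)
{
    return c->usage_page == d->usage_page && c->usage == d->usage;
}

/* Multiple-dictionary matching: the device matches if any entry matches. */
static int matches_any(const match_criteria *list, size_t count,
                       const device_info *d)
{
    for (size_t i = 0; i < count; i++) {
        if (matches_one(&list[i], d)) {
            return 1;
        }
    }
    return 0;
}
```

A joystick-only criteria entry accepts a joystick and rejects a game pad; a two-entry list covering both usages accepts either device.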

So after we've done that, let's go ahead and register a callback for device matching. And so this is a callback that's used to notify us any time a device matching our profile enters the system. We also register the input value callback, and this is for the events that are being received from the device. And let's go ahead and schedule this with the run loop. And in this example, we're just using the current run loop with the default mode.

And if you look at the end, we're calling CFRunLoopRun. This may not be necessary in Cocoa or Carbon applications, but if you're writing a regular task, it might be useful for you. For the input value callback, we just receive the event here. We check that it's not null, and we go ahead and call processValue. This is a method we'll discuss a little bit later in the presentation.

The next object is IOHIDDevice. This is basically a newer version of the old API we had in the past, and it's not really required. It provides similar functionality to the older API, so you can use it to set and get element values, obtain input reports, and set and get reports. We've also added additional functionality. We allow you to store your own application-specific properties, which is useful for maintaining state. Instead of having to create objects that wrap around this one, you can just use this object directly.

We also allow you to set up and register for simple input events, which eliminates the need for you to use a queue in most cases. Now, we do limit the values that we enqueue here to those that are no larger than a CFIndex. And as you heard in previous presentations, CFIndex is either 32 bits or 64 bits depending on what architecture you're compiling for.

Now, there are some issues that you need to watch out for with IOHIDDevice. The first is the difference between the input report callback and get report. You'd use the input report callback to get interrupt-driven input reports from the device. IOHIDDeviceGetReport can be used as well, but it issues a control request for a particular report.

The thing you've got to watch out for is that this should be supported by a device, but what we've seen in the past is that repeated calls to the device for an input report via get report sometimes result in a stalled pipe. So you might want to use the callback mechanism instead to avoid that.

Another thing is that you should be making use of IOHIDDeviceCopyMatchingElements. This basically gives you all the elements of interest on a particular device. Previously, what you had to do was parse through the elements by grabbing them from the registry.

Depending on the number of elements available on the device, that could be expensive, because the elements had to be serialized on the kernel side and then unserialized on the user side, and this pretty much locked up your application for a little bit. Also, beginning in 10.4, not all the elements were accessible from the registry. We did that on purpose, basically to minimize the amount of time it took to grab the elements from the registry.

So a new object that we've added is IOHIDElement. It replaces the old CFDictionary representation we had in the past, and it provides convenient accessors to get at all the element properties. Like IOHIDDevice, it allows you to store and obtain application-specific properties. So if you wanted to save any kind of state, or maybe have action IDs associated with each element, you can do so here.

We've also added the ability to have calibration settings. We support calibration bounds, the granularity, the dead zone, and the saturation, and we'll show you a little example of how we do that here. This gives you a layout of how the calibration could be set up: you can see the saturation points and the dead zone.

So, just to give you kind of a brief example of how to actually set up the calibration. So, what we do first here is we try to look for the x-axis of the joystick. So, we go ahead and set up our matching dictionary. And we've already prepared the usage page and usage CFNumbers here. So, we're looking at the generic desktop and usage for the x-axis.

Now that we've done that, we go ahead and call IOHIDDeviceCopyMatchingElements, which returns a CFArray. We check that it's not null and that it has at least one object inside. Then we just go ahead and grab the first element. Now that we've done that, we go ahead and set the properties on it.

What we're doing is setting the properties for the calibration min and max, in this example just negative one and one. For the granularity, we just set that to one. Then we set the saturation points and dead zone for the joystick's X axis.

So now another object that we have here is IOHIDValue. This is a new object, of course, in Leopard. It provides accessors to obtain the integer or data representation of a value, so you make opaque calls as opposed to looking at a structure like we did in the past. You can get the scaled representation of an element value as well, which is useful for getting either the physical value of the element or the calibrated value. And like I said earlier, it replaces the confusing IOHIDEventStruct that we had in the past.

Just to give you an idea of what the previous structure looked like, it's basically an event struct that had about six fields. The confusing thing here is that you had a value that was 32 bits, but also the ability to pass data values with the long value and the long value size. Sometimes it was a little annoying to decipher which value was actually in this structure. So we've shielded you from that with IOHIDValue.

So here's an example of how to use IOHIDValue. This is the method we mentioned earlier, processValue. Let's go ahead and deal with the value that's being passed to us. The first thing we do is get the element that's associated with that value. After we've done that, we obtain the usage page and usage for that element.

In this example, we're only interested in the X axis, so we go ahead and look for that. Once we've found that element, let's go ahead and get the scaled value. And in this one, we're only interested in the type that's calibrated. And just simple, greater than zero, move right, less than zero, move left. Pretty straightforward.
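As a rough illustration of what calibration with saturation points and a dead zone does to a raw axis value, here is a self-contained C sketch. It is not the HID Manager's implementation: the `calibration` struct, `calibrate`, and `direction` are hypothetical names, the piecewise-linear mapping is an assumption about how such settings are commonly applied, and granularity is ignored.

```c
/* Hypothetical calibration settings for one axis. */
typedef struct {
    double sat_min, sat_max;    /* raw values at or beyond these clamp to the ends */
    double dead_min, dead_max;  /* raw values inside this band report the center */
    double cal_min, cal_max;    /* output range, e.g. -1 and 1 */
} calibration;

/* Map a raw value to the calibrated range (piecewise linear, assumed). */
static double calibrate(const calibration *c, double raw)
{
    double center = (c->cal_min + c->cal_max) / 2.0;

    if (raw <= c->sat_min) return c->cal_min;
    if (raw >= c->sat_max) return c->cal_max;
    if (raw >= c->dead_min && raw <= c->dead_max) return center;

    if (raw < c->dead_min) {
        /* between the low saturation point and the dead zone */
        return c->cal_min + (raw - c->sat_min) / (c->dead_min - c->sat_min)
                              * (center - c->cal_min);
    }
    /* between the dead zone and the high saturation point */
    return center + (raw - c->dead_max) / (c->sat_max - c->dead_max)
                      * (c->cal_max - center);
}

/* The talk's decision rule: > 0 move right (1), < 0 move left (-1), else 0. */
static int direction(double calibrated)
{
    return (calibrated > 0.0) - (calibrated < 0.0);
}
```

For an 8-bit axis calibrated to the range -1 to 1 with a dead zone of 120 to 136, a raw reading of 128 sits in the dead zone and produces no movement, while 200 moves right and 10 moves left.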

Another object in the HID Manager is IOHIDQueue. This is strictly optional. You don't really need to make use of it, but it's useful for queuing input-type element values. And, like the older API, it gives you the ability to specify the queue depth and manually dequeue element values.

This is useful for queuing complex elements, such as those that are larger than a CFIndex, and also for handling duplicate elements, which are elements that have one usage tied to multiple report elements. Unless you really need to make use of this API, I would use IOHIDManagerRegisterInputValueCallback or IOHIDDeviceRegisterInputValueCallback instead, as that will handle this properly for you.

Now, issues with using the IOHIDQueue. There's a difference between absolute and relative elements in the way that they're handled on the queue. Absolute element values that do not differ will not be enqueued. So if you're looking for that kind of behavior, you might want to use the input report callback instead. Relative element values, just because of their nature, will always be enqueued.

Also, when you're using IOHIDQueueRegisterValueAvailableCallback, you need to take note that the callback will only be issued when the queue transitions to being non-empty. What this means is that when the callback is issued for your application, you need to make sure to drain the queue completely. Otherwise, you will not continue to receive any callbacks.

This is an example of how to do such a thing. So here we've got the queue callback. What we're doing here is looping until we run out of values returned by IOHIDQueueCopyNextValue, which dequeues the values from the queue. Because it is a copy, you do need to make sure that you CFRelease the value ref after you're done with it.
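Why the queue must be drained completely is easier to see in a small model. The following C sketch is not IOHIDQueue: `value_queue` and its functions are hypothetical, and the `notifications` counter simply records how many times a "value available" callback would have fired under the rule described above (only on the empty to non-empty transition).

```c
#include <stddef.h>

#define QUEUE_DEPTH 8

/* Hypothetical fixed-depth queue with edge-triggered notification. */
typedef struct {
    int    items[QUEUE_DEPTH];
    size_t head;
    size_t count;
    int    notifications;   /* times the callback would have fired */
} value_queue;

/* Enqueue a value; "fires" the callback only when the queue was empty. */
static int queue_push(value_queue *q, int value)
{
    if (q->count == QUEUE_DEPTH) {
        return 0;                               /* full, value dropped */
    }
    q->items[(q->head + q->count) % QUEUE_DEPTH] = value;
    if (q->count == 0) {
        q->notifications++;                     /* empty -> non-empty */
    }
    q->count++;
    return 1;
}

/* Dequeue one value; returns 0 when the queue is empty. */
static int queue_pop(value_queue *q, int *out)
{
    if (q->count == 0) {
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return 1;
}

/* What a callback should do: loop until nothing is left. */
static size_t queue_drain(value_queue *q)
{
    size_t drained = 0;
    int value;
    while (queue_pop(q, &value)) {
        drained++;
    }
    return drained;
}
```

If a callback pops only one value and returns, later pushes land on a non-empty queue and never trigger another notification; only after a full drain does the next push fire the callback again.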

The next object here is IOHIDTransaction. This is useful for manipulating multiple feature- or output-type elements. Basically, it limits the amount of communication with the device: instead of individually calling set element value or get element value, you can group them together and then use one transaction. Another thing you want to do is use report IDs to group these together and make sure only one report is issued to the device. New in Leopard is bidirectional support.

In the past, we only supported output transactions, so now we can set up input as well, so this is useful for feature type elements to, you know, obtain the state of a particular element. As with the older API, we allow the ability to set default values, and that's pretty useful for the output transaction.

And I'll briefly cover the 64-bit support that we've added to Leopard. Both the current and the new HID Manager APIs are 64-bit compatible. One thing we've changed here is IOHIDElementCookie. The type changes from void * to uint32_t when you're building LP64. This is done just for compatibility reasons, and it's not anything you'd really notice on your end. And as we covered before, the new API makes use of CFIndex, which, like I said, is 32 bits or 64 bits depending on what native architecture you're compiling for. And with that, I'll go ahead and pass this on to Ethan Bold.

I'm Ethan Bold, and like Rob, I work on the I/O Kit team, and today we're going to talk a little bit about some best practices for driver power management. So why do you need to power manage your driver? Because most of the computers that Apple sells today are laptops. And if your driver is going to be for a PC card or any kind of PCI device, you need to have a power management-aware driver.

And it's very easy. Your driver probably only needs to be able to power itself on and off when the machine is going to sleep and waking up. So today I want to familiarize you with OS X's power management support and discuss how you can add that to your in-kernel IOService-based driver.

So who needs the APIs I'm going to be talking about today? You don't need them if you have a user space driver of any kind. User space, USB, or FireWire doesn't need to implement these APIs. If you have an in-kernel driver whose family already supports power management, then you don't need to implement these APIs either.

For example, the IONetworkingFamily has good built-in support for power management. So if you're subclassing IOEthernetInterface, you don't need to use any of these APIs. The calls I'm about to talk about are for in-kernel drivers that directly subclass IOService or subclass a family that doesn't offer that power management.

So your driver's role in power managing your device is to, first and foremost, power the device off while the machine goes to sleep and power it back on during wake from sleep. And beyond that, you need to save the state of your hardware to memory.

[Transcript missing]

One of the big gotchas with sleep is that user space threads and processes are still running while we're putting the machine to sleep and while we're turning your device to power off. So your device could be deluged with dozens of hardware accesses after it's already turned off. So you need a strategy for preventing those from actually trying to touch the hardware and panic the system.

A good approach is the IOWorkLoop-based approach, where you block all incoming threads with a closeGate call, and then those threads can safely resume on wake from sleep and complete their I/O accesses. And to your driver, all kinds of sleep will appear the same. You'll get the same types of notifications, be it idle sleep, lid-closed sleep, or even safe sleep or hibernation.

So there are three pretty simple calls to sign up your device for power management, and there's one cleanup call you need to make. We're also going to talk about one IOService method you need to override to get those power management notifications. The first call you need to make to register your driver for power management is PMinit, which just allocates some internal power management data structures.

The second call that you need to make is joinPMtree. This tells kernel power management where your device belongs in the IOPower plane. The IOPower plane is how we order sleep/wake across all of the devices in the system. It looks like a tree: the leaf nodes are always the first to be slept and the last to be woken, and only after all of a node's children have been put to sleep will that parent node be put to sleep.

So here's an example of that. This is an example of the IOPower plane that I took off of a MacBook Pro, and you can pull this off of any of your own laptops just by running ioreg -p IOPower. If you're familiar with the I/O Registry, this is just another facet of the I/O Registry that tracks these power dependencies.

And in this example, you can see AirPort at the very bottom of the hierarchy. Its direct parent is an IOPowerConnection, which we can ignore. Above that is the PCI nub that it is attached to, and above that is the PCI bus that it's connected to. So when the machine is going to sleep, AirPort will always be told to power off before the PCI bus that it's attached to.
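The ordering rule (children sleep before their parent, and wake happens in the reverse order) amounts to a post-order walk of the tree. The C sketch below models only that rule: `power_node`, `sleep_order`, and the node layout are hypothetical, the tree is assumed to be acyclic, and the intermediate IOPowerConnection objects are left out.

```c
#include <stddef.h>

/* Hypothetical power-plane node; parent is an index, -1 marks a root. */
typedef struct {
    const char *name;
    int         parent;
} power_node;

/* Post-order visit: every child is recorded before its parent. */
static size_t visit(const power_node *nodes, size_t n, int node,
                    int *order, size_t pos)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].parent == node) {
            pos = visit(nodes, n, (int)i, order, pos);
        }
    }
    order[pos++] = node;
    return pos;
}

/* Fills `order` with node indices in sleep order and returns the count.
 * Wake order is the same array read backwards. */
static size_t sleep_order(const power_node *nodes, size_t n, int *order)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].parent == -1) {
            pos = visit(nodes, n, (int)i, order, pos);
        }
    }
    return pos;
}
```

For a chain of root, PCI bus, PCI nub, and AirPort, the walk puts AirPort first and the root last, matching the behavior described for the MacBook Pro example.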

The last step in signing up for power management is to define a couple of power states in an array and pass those into kernel power management. So we're starting here by defining an array of two power states. Each power state is defined by the struct IOPMPowerState, and you can find the definition for that in IOPMpowerState.h in the Kernel framework.

But you can see here that at index 0 of the array, we're defining our off state, and at index 1, we're defining our on state. So when the system goes to sleep, we'll be told to go into this off state, and when the system wakes, we'll be told to go into this on state. And we're only setting three important fields here: capabilityFlags, outputPowerCharacter, and inputPowerRequirement.

And for our off state, we're setting them all to 0. And for the on state, we're going to set them all to 1. And here's some shorthand for setting that struct up, that array of structs. And this is how you're more likely to see these power states defined in any Apple driver code.
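The two-entry array can be pictured with a stand-in struct. The C sketch below is deliberately not the real IOPMPowerState, which has more fields than the three named in the talk: `power_state` and `my_power_states` are hypothetical, and the literal 0 and 1 values follow the description above, whereas real driver code would be expected to use the named constants from the power management headers.

```c
/* Stand-in with only the three fields the talk calls out.
 * This is NOT the real IOPMPowerState layout. */
typedef struct {
    unsigned long capability_flags;
    unsigned long output_power_character;
    unsigned long input_power_requirement;
} power_state;

enum {
    POWER_STATE_OFF  = 0,   /* index used when the system sleeps */
    POWER_STATE_ON   = 1,   /* index used when the system wakes */
    NUM_POWER_STATES = 2
};

/* Shorthand initialization: one row per power state. */
static const power_state my_power_states[NUM_POWER_STATES] = {
    { 0, 0, 0 },    /* off: no capability, supplies and requires no power */
    { 1, 1, 1 }     /* on:  usable, supplies and requires power */
};
```

The index into this array is what the driver is later handed when it is asked to change state, which is why the off state sits at index 0 and the on state at index 1.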

And next, all we have to do is call the IOService method registerPowerDriver with three arguments: a pointer to ourselves, the this pointer, because we are our own power-controlling driver; a pointer to the array of two power states we just defined; and the number two, because we have two power states.

So, remember we had to call PMinit, then joinPMtree, and then registerPowerDriver, in that order, and you typically do that in your driver's start routine. Now, when your driver unloads or when your device disappears for any reason, you just need to call PMstop to clean up for all three of those PM initialization calls.

Okay, and the last step is to override the IOService virtual method setPowerState. setPowerState takes an unsigned long argument, the power state ordinal, and that's the important argument to watch here. That number tells you which index into your power state array you need to transition into.

So here's some example code. You can see here that if the unsigned long whichState argument is zero, then we're going to sleep, and we need to take action to save the state of our hardware, prevent any incoming hardware accesses, and turn our device off. Otherwise, we are waking up, and we need to power our hardware back on and allow any blocked hardware accesses to proceed.

And there are a couple of interesting lines down at the bottom. There are two return codes you can use. You can return IOPMAckImplied, which means that you have completely powered your hardware off or on in this method, and you're done. Or you can return IOPMWillAckLater, which means that you are finishing your hardware work asynchronously and will acknowledge later with a call to acknowledgeSetPowerState.
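The sleep and wake handling described above can be sketched as a small single-threaded model in C. Nothing here is I/O Kit: `model_driver`, `hw_write`, and `set_power_state` are hypothetical, `ACK_IMPLIED` is a stand-in for the synchronous return code, and deferred writes stand in for threads that a real driver would block at a closed gate. The saved-state storage lives inside the struct so that nothing is allocated on the sleep path.

```c
#include <stddef.h>

enum { STATE_OFF = 0, STATE_ON = 1 };
enum { ACK_IMPLIED = 0 };       /* stand-in: transition finished synchronously */

#define MAX_PENDING 8

/* Hypothetical driver model with one "hardware register". */
typedef struct {
    int    hw_powered;
    int    hw_register;             /* live hardware state, lost when powered off */
    int    saved_register;          /* preallocated copy kept in memory over sleep */
    int    gate_closed;
    int    pending[MAX_PENDING];    /* writes that arrived while the gate was closed */
    size_t pending_count;
} model_driver;

/* A hardware access from a client. Never touches hardware while gated. */
static int hw_write(model_driver *d, int value)
{
    if (d->gate_closed) {
        if (d->pending_count == MAX_PENDING) {
            return 0;                       /* cannot defer any more writes */
        }
        d->pending[d->pending_count++] = value;
        return 1;                           /* accepted, will complete on wake */
    }
    d->hw_register = value;
    return 1;
}

/* Model of the power state callback: index 0 is off, index 1 is on. */
static int set_power_state(model_driver *d, unsigned long which_state)
{
    if (which_state == STATE_OFF) {
        d->gate_closed    = 1;              /* block incoming accesses first */
        d->saved_register = d->hw_register; /* save hardware state to memory */
        d->hw_register    = 0;              /* hardware contents are lost */
        d->hw_powered     = 0;
    } else {
        d->hw_powered  = 1;
        d->hw_register = d->saved_register; /* restore saved state */
        d->gate_closed = 0;
        for (size_t i = 0; i < d->pending_count; i++) {
            d->hw_register = d->pending[i]; /* let deferred accesses proceed */
        }
        d->pending_count = 0;
    }
    return ACK_IMPLIED;
}
```

A write that arrives between the off and on transitions leaves the powered-down register untouched and is applied only after the saved state has been restored.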

So, it can be very painful to debug power management problems and sleep-wake issues because a lot of the debugging facilities that we rely on aren't there when the machine is going to sleep or waking up. So, the display can be powered off, Ethernet can be powered off, and hard disks can be spun down. So, all those facilities that we rely on aren't available.

But luckily, unless you're working on a FireWire driver of some sort, you can load up a FireWire logging kext that lets you do printf-style logging from your machine to a second computer. You can get logs all the way down to the point that the CPUs are turned off, so it's a very, very useful tool. The FireWire kprintf debugging is available in the FireWire SDKs today.

So there's also a tool called SleepX that I'll point you to. SleepX just lets you stress sleep/wake in your driver by sleeping and waking the entire system dozens or hundreds or thousands of times, and you can catch leaks and races and all sorts of stuff. That is probably not available today, but it should be available from DTS soon.

So that's about it for me. There are a few gotchas with Sleep/Wake that I'd like to mention. One is you can't allocate memory on the sleep path because a memory allocation could require VM to page out some user space memory, but if the disks are already spun down, you'll create a deadlock and the system will hang.

I should mention that these APIs won't give you a notification at system shutdown. And I should also note that this was a pretty simple run-through of a simple way to do device power management, but it gets considerably more complicated to do any higher levels of power management. So, thank you. That's it for me. And Eric Anderson's up.

Okay, thanks Ethan. Hi everybody. To round out this morning's presentation, we're going to talk about best practices in FireWire. A lot of the same general concepts as you heard for the other I/Os from a FireWire point of view. First, what's new? We spent the past year pretty much changing everything, hopefully in order to change nothing. We believe on the Intel platforms we have total feature parity with the PowerPC platforms before them. Everything should work.

We have our SDK 22 that Ethan just mentioned, which has the FireWire kprintf service in it. That's available at the URL you can see there. That's already up. That's been up for a couple months actually. Everything in there is 100% universal source code, sample code, and the tools that are in there are just about 100% universal.

I should note, unlike the transition a few years ago from Mac OS 9 to Mac OS X, where we sort of dropped FireWire out of the classic environment, FireWire is fully supported in Rosetta. You can use the PowerPC applications from SDK 22 or 21 even, and they will all work fine on the Intel systems. Of course, we encourage you to go native. We've done everything we can to support that with full parity and full source code.

But if you do need to run in Rosetta because, say, you're a plug-in to an app that itself is not native, you can do that. FireWire services are completely available, and they should work fine. Two other new areas that I'm going to discuss this morning are 64-bit support for the new Mac Pro and some new sample code that we have for audio/video type devices.

So like you heard a little earlier, the Mac Pro has a 64-bit architecture. It can have more than 4 gigabytes of system memory. The OHCI FireWire controller that we use today is limited to working within the first 4GB of memory by its architecture. So if you are working from user space through FireWire, through our user clients for example, or through higher level APIs, you're fine.

Your buffers in user space can be remapped so that the physical page underneath them moves into the low 4GB where FireWire can see that page with its DMA. So you don't need to worry about that. On the other hand, if you're developing a kernel driver for FireWire, because we're a high performance I/O, we don't want to be copying your data all over the place just to get it in and out. So we require that you allocate these buffers below 4GB if necessary on these systems so that the DMA can touch them directly.

Now there are two kinds of buffers in FireWire that you may need to be concerned with. The first kind is ordinary buffers in memory that you're simply going to use in an isochronous program, to transmit or receive, or with the basic FireWire read and write calls. These are handled by the first block of code here. There are APIs to simply find out the mask or the number of bits supported by the controller that's backing your object. Today these will all return either 32 or that constant that you see with 32 ones in it.

Conceivably, in the future, you may run into the need for a different API. FireWire allows external devices to access the memory in the computer directly through the FireWire hardware with no software intervention at all. The FireWire specification carves out a range of memory addresses in which this can be done.

That range is actually 48 bits large. So, conceivably, physical memory can be accessed above 4 gigabytes through a controller that supports it. We don't have one yet that does it, but if you want to be forward-thinking and take full advantage of a future controller that did support that, there's an API for you.

If your buffer is going to be visible from outside the box through FireWire, then use this second API, which in the future could return a number up to 48. It can't go above 48. So if we someday make a machine with 256 terabytes of memory, this will keep you on the right side of that line, where your stuff will work properly.
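To make the relationship between a bit count and a mask concrete, here is a minimal sketch of the arithmetic. It is written in Python rather than kernel C++, and the helper names are hypothetical; none of them are I/O Kit calls. It only illustrates why 32 bits corresponds to a 4GB limit and why 48 bits corresponds to the 256 terabyte figure.

```python
def mask_from_bits(bits: int) -> int:
    """Return a mask with the low `bits` bits set, e.g. 32 -> 0xFFFFFFFF."""
    return (1 << bits) - 1


def buffer_is_reachable(phys_addr: int, length: int, bits: int) -> bool:
    """True if every byte of [phys_addr, phys_addr + length) fits in `bits` address bits."""
    if phys_addr < 0 or length <= 0:
        raise ValueError("address must be non-negative and length positive")
    last_byte = phys_addr + length - 1
    # Any bit set above the mask means the controller cannot address that byte.
    return (last_byte & ~mask_from_bits(bits)) == 0


def addressable_tib(bits: int) -> float:
    """Size of the range covered by `bits` address bits, in units of 2**40 bytes."""
    return (1 << bits) / (1 << 40)
```

A buffer that ends exactly at the 4GB boundary passes the 32-bit check, while one that extends a single byte past it does not. That is the situation a kernel driver has to avoid on a machine with more than 4GB of memory.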

So here's a simple example, just like you saw in the slides before. Get the FireWire physical buffer mask, which today will give you 32 one bits. Use inTaskWithPhysicalMask and just pass that mask in. Then it's common in FireWire, if you're doing isochronous real-time transfers in or out, to get one large buffer and carve it up yourself packet by packet. So you can simply call getBytesNoCopy to find out where things are and start doing pointer arithmetic to build up your program just like before.
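The carving step can be sketched outside the kernel as well. The following is an illustration in Python, not the slide code: `carve_packets` is a hypothetical helper, and `memoryview` slices stand in for the pointer arithmetic you would do on the address returned by getBytesNoCopy.

```python
from __future__ import annotations


def carve_packets(buffer: bytearray, packet_size: int) -> list[memoryview]:
    """Split one large buffer into fixed-size packet views without copying."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    view = memoryview(buffer)
    # Ignore a partial packet at the end, if the buffer is not an exact multiple.
    usable = len(buffer) - (len(buffer) % packet_size)
    return [view[offset:offset + packet_size]
            for offset in range(0, usable, packet_size)]
```

Because each view aliases the original allocation, writing through packet 1 changes the byte at offset `packet_size` in the big buffer. That is the property you want when a DMA program is going to fill or drain those same bytes.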

Okay, one other area that is new in FireWire is a substantial increase in the amount of sample code for our audio/video services framework. There is an MPEG transmitter now that's constructed using the new DCL service. New DCL is sort of an abstract programming language by which you describe a real-time, isochronous transfer into or out of the system on FireWire.

The MPEG transmitter demonstrates how to use variable length packets. The previous system only had fixed length packets. And even if you're not sending MPEG, this is probably a good starting point for any kind of transmit code. You can just change the packet sizes, change the loop sizes, the callbacks, and so on to meet your needs. Also available is a new universal receiver. This is actually a legacy-style DCL receiver. We also will have a new DCL version of this.

This can receive any isochronous channel. It could be DV, it could be MPEG, it could be iSight, it could be audio, it could be something that you've invented for some unique device. Whatever it is, this receiver can receive it. So you can use this as a starting point to develop your own code to optimize it to meet your needs, or you may just want to run it as is. Many kinds of data that we receive from FireWire stream according to the IEC 61883 standard. So if your data fits in that category, there's a parser built in that will help pick it apart and find the appropriate packets and payload buffers for you.
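As a rough idea of what such a parser looks at, here is a sketch that decodes the two-quadlet common isochronous packet (CIP) header used by 61883 streams. It is not the SDK's parser. The field layout and the three format codes reflect my understanding of 61883-1 and should be checked against the standard, and the names `CIPHeader` and `parse_cip_header` are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed FMT codes; other values exist and are reported as "unknown" here.
FMT_NAMES = {0x00: "DV", 0x10: "audio/music", 0x20: "MPEG2-TS"}


@dataclass(frozen=True)
class CIPHeader:
    sid: int  # source node ID
    dbs: int  # data block size, in quadlets
    dbc: int  # data block continuity counter
    fmt: int  # format code
    fdf: int  # format dependent field (first 8 bits only)
    syt: int  # low 16 bits; a timestamp for some formats, format specific for others

    @property
    def format_name(self) -> str:
        return FMT_NAMES.get(self.fmt, "unknown")


def parse_cip_header(payload: bytes) -> CIPHeader | None:
    """Decode the first 8 payload bytes, or return None if they are not a CIP header."""
    if len(payload) < 8:
        return None
    q0 = int.from_bytes(payload[0:4], "big")
    q1 = int.from_bytes(payload[4:8], "big")
    # The first quadlet starts with bits 00, the second with bits 10.
    if (q0 >> 30) != 0b00 or (q1 >> 30) != 0b10:
        return None
    return CIPHeader(sid=(q0 >> 24) & 0x3F, dbs=(q0 >> 16) & 0xFF, dbc=q0 & 0xFF,
                     fmt=(q1 >> 24) & 0x3F, fdf=(q1 >> 16) & 0xFF, syt=q1 & 0xFFFF)
```

A receiver that checks `format_name` on every packet can follow a camcorder from one format to another without rebuilding its receive program, which is the point of keeping the receiver universal.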

So why a universal receiver, aside from just that it's good sample code to start with? Well, there's two places you might want to keep the full universal support. One is there are devices like camcorders that actually support multiple formats. They may stream DV at one moment, DV25. They may turn around and stream MPEG or HDV the next moment because the user has moved a switch on the camera or they've put a different tape in the camera.

Or the tape may even have different formats recorded on it and it's just playing back and switching on the fly. So with the universal receiver, you don't need to tear down and build up a receive program every time you detect a format change. You can just roll on through and receive everything.

The other case where you might want this, even if you don't have that kind of device, maybe you have a multilingual application. Maybe you've developed some sort of FireWire video viewer that just wants to display video from anything. Here, too, rather than maintaining four different receive DCLs for four different packet formats coming in, just run the universal receiver.

You can pick the bytes out and decode them yourself, but now you don't need to tear it down and maintain it for each different device that may come along. All of this will be available in FireWire SDK 23. It's at the same URL you saw before. It's not available yet, but we do plan to post it within a few weeks.

One other thing that will be in there is a tool, a piece of sample code called AVC Browser. This replaces an older version by the same name. AVC Browser is sort of like Apple System Profiler, but for a FireWire device that uses the AV/C protocol, which would be a camcorder, a television, a D-VHS deck, whatever.

So you can see the panel on the left is a list of devices found on the FireWire bus. The panel on the right has opened up one device and is offering controls. This allows you to poke at the device hands-on. It's somewhat like Reggie in a sense, which is a developer tool for reading and writing memory and other things.

AVC Browser can send primitive commands to and from your device, so you can explore to see what commands the device actually supports, how it reacts to them, before you sit down and write code for doing each of those things and then trying to debug it on the fly. So it's a good way to get started with an unknown device or to become familiar with the protocol that your device speaks.

Okay, to wrap up, we have some advice and recommendations for how to be successful and effective developing FireWire drivers. Always start with the latest FireWire SDK. There's the URL again, same one as before. It is full of sample code, source code, documentation, tools. And a lot of the tools are designed to help you get started on understanding how things work. Like AVC Browser specifically lets you poke around at an AVC device.

We have tools for viewing the configuration ROM in a FireWire device, tools for viewing the registers in a PHY, tools for viewing the bus topology. So rather than cracking open a 400-page spec and thousands of lines of API, you can try these things hands-on, one by one, whatever is appropriate to your device, and get comfortable before you start writing code. So then you really feel you know what you're doing.

If at all possible, please write in user space. As I said, we have 100% parity with our PowerPC services. We believe 100% of FireWire is available in user space. If you find something that's missing, we will add it. It's much easier to debug in user space. You can run in Xcode, you can step through your code, you can see what's happening, you don't have to keep rebooting every time something goes wrong. It's very preferable.

In the SDK, most of the sample code that we provide is for user space projects. There are a few exceptions. If we're going to boot from your FireWire device, your driver will have to be in the kernel. If you've invented some new kind of RAID we've never seen before, you may need to write your own driver for that. That would have to be in the kernel.

Similarly, a network driver, if you've got a better internet protocol or some custom application-specific thing, that probably has to go in the kernel as well so that it can plug into the networking family. Finally, certain audio devices, if you need extremely low latency, such as for real-time effects processing, that may need to be in the kernel as well just to take advantage of real-time services. Other than those, we think you should be able to work in user space.

This really applies to USB as well as FireWire. Do not assume that there's exactly one controller. The Macs that we sell today have one FireWire controller built in, but some of them have slots where the customer can add another controller. Some of you are developers for the cards that would go in those slots, but it's unfortunate if your customer buys the card, adds it to the Mac, and their plug-in or their driver won't work on it because someone assumed there's one and only one FireWire port. So please test out your drivers on systems that have an added controller to make sure you can cope with it.

The same goes for your device. Of course we want the customer to buy one of your devices, but it's even better if the customer buys two of your devices. So please make sure that that's going to work for them by testing on the system with two or more devices at once.

Usually all it requires is letting the user choose. Give them a menu or some kind of control where they can pick which device they want to talk to. A lot of the tools in the SDK show how to do this. They have a menu in the upper left corner that lets you select which FireWire interface and if applicable which device or which node they're going to talk to.

Device matching is usually the first thing someone tackles when they sit down with a new device and try to write code for it. It's a mystery. You know, you're just getting started on the project. As soon as you match to the device, usually you declare victory and move on to actually making it do something.

Well, please take care to check that you haven't matched too loosely on your device. Your driver and your device may work perfectly together, but if your driver also matches on someone else's device, which may be a very different kind of thing, it may thrash trying to talk to that device and getting confused. So, even though this is usually done first and then forgotten, please come back and make sure you really are matching as tightly as possible to your device so that you won't conflict with other things.

FireWire provides a structured configuration ROM in the device, which our software discovers. In there are places for your vendor ID, your model ID, and other values that help set your device apart from others. So, one of the tools in the SDK is called Firecracker. It is a browser for the configuration ROM in your device.

Plug your device in. Run Firecracker. See what sense Firecracker can make out of your device. Make sure your unique values, vendor ID, model ID, and so on are all correct. If you need some examples for comparison, Firecracker comes with about eight built-in devices that you can pull up from the file menu just to see what other devices look like.
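The difference between a loose match and a tight match can be shown with a small model. This is only a sketch: the `matches` helper is hypothetical and is not how I/O Kit performs matching, the key names are chosen to resemble configuration ROM fields, and every ID value is a placeholder assumption.

```python
def matches(device_props: dict, match_dict: dict) -> bool:
    """A device matches only if every key in match_dict is present with an equal value."""
    return all(key in device_props and device_props[key] == value
               for key, value in match_dict.items())


# Two hypothetical devices that share a protocol but come from different vendors.
my_camcorder = {"Unit_Spec_ID": 0x00A02D, "Unit_SW_Version": 0x010001,
                "Vendor_ID": 0x123456, "Model_ID": 0x000042}
other_device = {"Unit_Spec_ID": 0x00A02D, "Unit_SW_Version": 0x010001,
                "Vendor_ID": 0x654321, "Model_ID": 0x000007}

# Loose: protocol only. Tight: protocol plus the vendor and model from the config ROM.
loose_match = {"Unit_Spec_ID": 0x00A02D, "Unit_SW_Version": 0x010001}
tight_match = dict(loose_match, Vendor_ID=0x123456, Model_ID=0x000042)
```

With these values, `loose_match` claims both devices, while `tight_match` claims only the camcorder it was written for. That second behavior is what keeps a driver from thrashing on someone else's hardware.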

Finally, this is not a complaint. We know you all test your products carefully. Here's just some tips on how to test them the most effectively. As I just described, use Firecracker to validate your config ROM. If Firecracker has error messages or complaints about bad checksums, there's probably something wrong in your ROM that you need to clean up.
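For a sense of what a checksum complaint means, here is a sketch of a config ROM CRC check. It assumes the ROM CRC is the 16-bit CRC with polynomial 0x1021 and a zero initial value, computed over big-endian quadlets, and that the first quadlet of the bus info block holds an 8-bit info length, an 8-bit CRC length, and the 16-bit CRC. Those details are my understanding of the IEEE 1212 layout rather than something stated in this session, and the function names are made up.

```python
from __future__ import annotations


def crc16_itu_t(data: bytes) -> int:
    """Bitwise CRC-16, polynomial 0x1021, initial value 0, most significant bit first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def bus_info_crc_ok(quadlets: list[int]) -> bool:
    """Check the stored CRC in quadlet 0 against the quadlets it is said to cover."""
    if not quadlets:
        return False
    header = quadlets[0]
    crc_length = (header >> 16) & 0xFF  # number of quadlets covered by the CRC
    stored_crc = header & 0xFFFF
    body = quadlets[1:1 + crc_length]
    if len(body) < crc_length:
        return False  # ROM image is shorter than its header claims
    raw = b"".join(q.to_bytes(4, "big") for q in body)
    return crc16_itu_t(raw) == stored_crc
```

A single flipped bit anywhere in the covered quadlets changes the computed value, so a mismatch is a reliable sign that the ROM image and its header disagree.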

Another tool in the SDK is called FireStarter. It shows the bus topology and some very low-level information about the bus, and you can use this to check that all the bits are correct in the self-ID packet that your device sends. There's going to be more detail on this in this afternoon's session, session 410.
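If you want to see which bits are involved, the sketch below decodes the first self-ID quadlet from a node. The bit positions are my recollection of the IEEE 1394 self-ID layout and should be verified against the specification. The function name and the returned keys are invented for illustration, and extended self-ID packets are deliberately not handled.

```python
from __future__ import annotations

SPEED_NAMES = {0: "S100", 1: "S200", 2: "S400", 3: "extended"}


def decode_self_id(quadlet: int, inverse: int) -> dict | None:
    """Decode self-ID packet zero; return None if the packet fails basic checks."""
    # The second quadlet of a self-ID packet is the bitwise inverse of the first.
    if (quadlet ^ inverse) & 0xFFFFFFFF != 0xFFFFFFFF:
        return None
    if (quadlet >> 30) != 0b10:   # self-ID packets start with bits 10
        return None
    if (quadlet >> 23) & 0x1:     # extended packet: different layout, not decoded here
        return None
    return {
        "phy_id": (quadlet >> 24) & 0x3F,
        "link_active": bool((quadlet >> 22) & 0x1),
        "gap_count": (quadlet >> 16) & 0x3F,
        "speed": SPEED_NAMES[(quadlet >> 14) & 0x3],
        "contender": bool((quadlet >> 11) & 0x1),
        "initiated_reset": bool((quadlet >> 1) & 0x1),
        "more_packets": bool(quadlet & 0x1),
    }
```

Comparing the decoded fields with what your device is supposed to report, such as its speed or whether its link is active, is a quick way to spot a wrong bit.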

Another tool built into the SDK, FWPlug-O-Matic, electrically connects and disconnects the FireWire port from your device, so it can simulate the hot plugging and hot unplugging of your device automatically. This saves wear and tear on the connector, saves wear and tear on you, and helps you check for rare loading or unloading problems, memory leaks, and so on.

As I mentioned before, try two of your devices or more. Put them on the same bus, get them active at the same time, make sure they're really solid when they're both active. Like Ethan mentioned, test sleep/wake. Often we hear people say, oh, sleep/wake's not supported for my product.

There's no explaining that to the customer. The product sleeps and wakes. It's configured out of the box to fall asleep if you leave it alone. Do what Ethan said, support sleep/wake. And test it. Test it with your device plugged in. Test it with your device active. People do things like close the lid in the middle of copying files or printing, and your driver can survive that if you use the right APIs and test it carefully.

On FireWire, when a device is plugged in or unplugged or certain changes happen in software, there's an event called a bus reset. This is not an error or reason to stop. It's just an ordinary event. It happens from time to time. Nothing bad should happen when there's a bus reset.
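One way to picture treating a bus reset as an ordinary event is a retry loop keyed on a bus generation counter. The toy model below assumes each reset starts a new generation and may renumber nodes, so a request prepared before the reset is rejected and simply re-issued with fresh values. `FakeBus`, `StaleGeneration`, and `read_with_retry` are all hypothetical names; this is not the IOFireWireFamily API.

```python
class StaleGeneration(Exception):
    """Raised when a request carries a generation older than the bus's current one."""


class FakeBus:
    """Toy stand-in for a FireWire bus, for illustration only."""

    def __init__(self, node_ids):
        self.generation = 0
        self.node_ids = dict(node_ids)  # device GUID -> current node ID

    def bus_reset(self, node_ids):
        """Model a bus reset: a new generation and possibly new node IDs."""
        self.generation += 1
        self.node_ids = dict(node_ids)

    def read(self, generation, node_id, address):
        if generation != self.generation:
            raise StaleGeneration(generation)
        return (node_id, address)


def read_with_retry(bus, guid, address, max_attempts=3, before_send=None):
    """Re-resolve the node ID and retry when a bus reset invalidates the request."""
    for attempt in range(max_attempts):
        generation = bus.generation
        node_id = bus.node_ids[guid]
        if before_send is not None:
            before_send(attempt)  # hook so a test can inject a reset at the worst moment
        try:
            return bus.read(generation, node_id, address)
        except StaleGeneration:
            continue  # not an error: pick up the new generation and try again
    raise RuntimeError("bus kept resetting; gave up after %d attempts" % max_attempts)
```

The design choice worth noting is that the device is tracked by a stable identifier rather than by node ID, since the node ID is the part that can change across a reset.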

We have two tools in the SDK that can generate these so that you can easily test your devices to make sure they keep working. FW Busy Bus is really good at creating bus resets. It can generate thousands per second, or fewer. It's very adjustable. FW Reset Storm is a little bit gentler, but may be a good place to start for causing bus resets.

Busy Bus has some additional capabilities, finally. You can use this to rapidly test your device against traffic that comes in over FireWire. A simple test would be to just throw configuration ROM reads at your device as fast as the tool can do it. That's much faster than plugging and unplugging your device over and over again.

This tool can literally do thousands of packets per second. If you have a rare problem responding to these, it should pop out pretty quick. If you can pass that, try a harder test. Configure the tool to send junk packets to your device. This wouldn't happen in practice, but if you can survive that, you can probably handle anything the user is going to throw against you. It's a powerful tool. It's in the SDK. Give it a try.

Finally, please participate. We have a variety of events and services to work with developers. Tonight at the Apple campus, there's a Plugfest. It's combined FireWire, USB, and Bonjour. We'll have a lot of the engineers there, a lot of the devices, all the tools I've talked about will be available. You can try them out. There's also something about a band and free beer, but come to the Plugfest.

Okay, we have a public mailing list where we answer your questions about FireWire, and even better, you answer each other's questions about FireWire. The details are in the SDK, so you don't have to copy them down here. And the 1394 Trade Association is a worldwide organization of 130 companies that all make some kind of FireWire device.

It's a great resource for meeting engineers and designers, people responsible for the standard, a lot of creative ideas. They run their own plugfests, they have quarterly meetings, technical working groups, mailing lists. If you're serious about making devices, you should join the Trade Association and tap the many services that they offer.

Okay, if you need more information about any of the topics that we discussed today, you can contact Craig Keithley. He is the technology evangelist just for I/O in general. You can also find the documentation, the sample code, the source code, and other resources from these sessions all at the WWDC 2006 site. Okay, with that I'm going to, oh I'm sorry, there's one more. There is a lab tomorrow morning from 9 to 11, and there is the beer bash, I mean plugfest, that I already mentioned, and that's tonight back at Apple.