Configure player

Close

WWDC Index does not host video files

If you have access to video files, you can configure a URL pattern to be used in a video player.

URL pattern

preview

Use any of these variables in your URL pattern, the pattern is stored in your browsers' local storage.

$id
ID of session: wwdc2008-552
$eventId
ID of event: wwdc2008
$eventContentId
ID of session without event part: 552
$eventShortId
Shortened ID of event: wwdc08
$year
Year of session: 2008
$extension
Extension of original filename: m4v
$filenameAlmostEvery
Filename from "(Almost) Every..." gist: [2008] [Session 552] Unleashing ...

WWDC08 • Session 552

Unleashing Leopard Server Performance

Integration • 50:00

Leopard Server contains substantial performance enhancements designed to ensure that critical services remain responsive even under high utilization conditions. Get the details on how to maximize performance and enhance scalability of Leopard Server installations across a variety of workloads and configurations in your environment.

Speaker: Steve Parker

Unlisted on Apple Developer site

Downloads from Apple

SD Video (582.5 MB)

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

My name is Steve Parker. I am the lead of a fairly new team, the server performance team inside 10 Server. And I am here to share with you some of the great stuff that's new in Leopard and Snow Leopard. I'm not sure I'm going to be able to compete with the entertainment level at the bash last night. I assume some of you enjoyed that.

Thank you. Yeah? All right. So my entertainment is going to be a much geekier sort. It's going to be about systems and making them fast and scalable. So here we go. Can I get the – there we go. Basically, the talk today, I'm going to begin by talking about software-specific performance gains.

I'm going to talk a little bit about how the hardware has improved over even just the last year. And then I'm going to talk about how to approach setting up a Leopard server system and getting the most out of it. And then finally, I'm going to go through – and this is the bulk of the talk – is actually going to be about diagnosing – performance problems and what you can do on OS X to work out those things.

So first of all, I'm going to begin by running through a couple of the key performance gains that stand out in Leopard. The first one I wanted to talk about, this is 10 gigabit ethernet. This is based on 1,500 byte MTU packets, so actually challenging the OS with lots of little packets. This is over 1.4 times better in Leopard.

AFP file sharing is over 2x. This is almost entirely SMP scalability improvements, so it's putting all of those cores in new systems to work. And this is a model that we have that simulates a corporate support site, sort of like support Applecom. So this is based on using Apache on our system, simulating a site with CGI that actually consults a back-end database that presents images, downloads, all that kind of stuff over 3X. So how's that? So you might be asking, how do you do that? That's an awfully big game. So one of the things that Tiger had was a thing called the funnel lock.

The funnel lock, unfortunately, was across about 80 different system calls. And this is an example of a timeline of a hypothetical mail server, web server, and pop client. And you notice what there are is a lot of gap in there. And that gap was because this was a single global lock, and you couldn't be in any one of those system calls on more than one core.

And this was probably the biggest single difficulty for server performance. Select was one of those system calls, and as many of you know, that's key in many server applications. Leopard, gone. So all of these system calls now proceed in parallel. And the result is a significant part of those gains that you see in, for example, web performance.

Something else that wasn't very good in Tiger was the way the networking stack worked for multi-core, and in particular, the mBuffs, which are used for network buffering. So those are a global pool, and that global pool meant that as you were processing, any time you needed to get or return an mBuff from the pool, you were stuck behind a global lock.

And so only one of those CPUs could actually be getting work done. And if you failed to get that lock, you'd get rescheduled. So that meant your cache gets dumped, something else tries to start running, and if you're unlucky, that something else also tries to go after that lock. That's bad.

In Leopard, a new architecture that we call the MCache has been implemented and although it still has a global lock across the original mBuff pool at the bottom edge, it has a set of depots which is broken up by different sizes of mBuffs. So when you go to allocate, several allocations can in fact go on at once.

That's great, but there's yet one more change, which is allocations from those depots can be in groups. So, for example, if the TCP stack has a megabyte of data thrown at it, it can reach down to the depot and grab all of the mBufs that it's going to need for that in a single call.

And then once this group of mBuffs has moved up and is in use by a particular core, there is a per CPU magazine and it will use it, return it to the magazine, and if shortly thereafter there's need for mBuffs again, it will simply grab it out of the magazine. And that doesn't entail a global lock that proceeds in parallel across all of the CPUs. So this is the next biggest element of those really substantial performance gains. Finally, something else in TIGER, TCP input processing was restricted to a single CPU.

So that meant that no matter how many networks you stacked up, it was all going to get shuttled across to one core. That core would do the processing, and it would have to migrate back out to the individual applications. No more in Leopard. Network interface is assigned a separate CPU to do the TCP input processing so that all of those networks proceed in parallel. And those web performance results you see are based on using many NICs, not just one.

So, all right, well that's Leopard, but Leopard's kind of yesterday. So Snow Leopard is the future now. And why don't we take a look at what the web performance in Snow Leopard is. Well, we're up to 4.34 times the performance of Tiger and a huge jump against Leopard.

So how did you do that? Right? First it's three times faster, now it's 4.34. Well, first of all, still in Leopard was a global lock for IP routes. So that meant that at the IP layer, everything was having to contend across a single lock. They weren't necessarily using the same information, but the lock had too wide a scope. So this is now broken up in Leopard.

That is in addition to adding TCP segment offloading support. So NIC drivers are now able, if the hardware is capable, of volunteering and saying, you can send me megabytes of data and I'll just break it up and compute the TCP headers, checksums, all that kind of stuff taken care of for you. That gains back a bunch of additional CPU that you can do web processing with.

[Transcript missing]

So there are some other gains in Snow Leopard that I think are really outstanding. One of them is that although late in Tiger we introduced 64-bit user processes and you could have far more than 4 gigabytes of memory in a map, each individual map was limited to 4 gigabyte in size. That limitation has been removed.

In addition, although M-Advise had always been one of the system calls, the virtual memory system didn't actually use it. So M-Advise and Snow Leopard now allows you a lot of creative and interesting ways to work with the VM system to get better performance. An example of that is that you can M-Map a private memory region to create a large chunk of memory.

You can reach a point in your program where you decide, I'm done with it, but I might want to use it later. And this was a 4-gigabyte segment. I spent a lot of time getting that set up, but I can tell the OS, those pages have nothing valid in them, but I don't want to tear down the map. You get to keep all that expensive virtual memory mapping work that's been done, but if you go touch those pages, they'll be filled in with zero-based pages if the VM system ended up taking those pages from your process later on to help another process.

In addition, as disks grow in size, you want to do I/Os in fairly big chunks. The cluster I/O size within the kernel has been increased from 1 megabyte to 16 megabyte, provides some substantial increases in performance where you're streaming data. Besides the VM page queue lock, there are a bunch of other changes that I can't go into. There are too many and too complicated to explain in the amount of time I have.

I would be happy to discuss that more in our lab this afternoon, however. But they remove a number of additional global locks, reduce the amount of time under locks, and generally improve the parallel execution on Snow Leopard. And finally, there are two other interrelated things that you won't enjoy the benefits from unless you go out of your way to do so in the Snow Leopard seed build, but will be in the general release, which are a range of higher resource limits, including things like the maximum number of... the number of files you can have open, as well as the maximum number of processes.

and the 64-bit kernel. How many of you went to the K64 talk? Not too many. Okay. Well, so in the server world, one of the key bits of news is that the kernel now has its own 64-bit address space. Server applications benefit a lot because the file system metadata cache and other internal structures can grow to appropriate sizes for really substantial workloads.

Okay, so is Snow Leopard better? Well, first I guess I want to talk a little bit, one of the things that Apple has been involved with is the updating of specSFS. Spec.org is one of the outstanding industry organizations for auditable, verifiable benchmarks, and we are particularly proud to have been involved with this update of what had originally been an NFS-only benchmark, and it now supports both NFS and Samba.

And it is extremely rigorous. In particular, it's careful to make sure that it watches its own behavior. So it doesn't just issue an op. It looks at the clock and says, okay, I issued it now, and it's done now, and I'm doing ops at this rate. The latency is this much on average. The standard deviation is this much. And it reports all that information back. Makes it an extremely hard benchmark to do anything that isn't extremely clear and obvious what you've done. So there's no tricking this benchmark.

So we released the first commercial posting of spec SFS with the release of the benchmark against Leopard, 8,000 ops per second. One thing to notice is that the overall response time on that is 1.4 milliseconds. The disk drives on it have a far longer latency. One of the things that means is that you're not bottlenecked on the disks.

Yesterday we posted this new Snow Leopard result. This is 18,500 ops per second and a substantially higher latency. It's at least a 50% increase in latency because we are finally reaching the point where we have utilized all of that XServe and our bottlenecked on the disk drives. SpecSFS is a benchmark that normally you would expect to be limited by the disk throughput.

So I wanted to dive down a little bit into these results and talk about the CPU utilization. So notice that on Leopard, we were basically not able to put all eight of those cores to work. On Snow Leopard, we are able to get extremely close to putting all eight cores 100% on the job to service that NFS. But notice something else. Notice that...

[Transcript missing]

That just underscores the gain here. So what used to be adequate 48, 15 case hash drives is now not adequate in Snow Leopard.

And then, just as a sort of a geeky footnote, to put this in perspective, one of the really demanding moments of the benchmark is actually where it creates the file set. So the file set is 0.6 terabytes to run this test. It creates it at 540 megabytes a second sustained.

And this is not IO zone against a raw disk. This is file system activity creating a directory tree with real file names just like it would be used in practice. So we are extremely happy with these results, and we look forward to even more gains in the future.

So, so far I've talked strictly about the software. So that's not the only story here. Since the transition to Intel, we have two XSERVs that have shipped, both a four core Woodcrest based system and an eight core Penryn based system. The spec CPU numbers, so the basic integer SMP score is 2.17x. with SpecJBB. Now, SpecJBB is a user-bound process that uses Java to simulate an order entry system, order entry and inventory system, 2.2x.

And finally, just memory bandwidth using the stream performance benchmark 1.6x. So add those both together and you get a stunningly faster system. So now I want to move to talking about what it is that you can do to ensure that you get the most out of that server system.

So... Keep in mind, if nothing else from this talk, this is an era where the server is a big brain that's connected by a lot of little pipes. So to put this in perspective, that NFS result is based on six gigabit subnets, 4.6 gigabits of aggregate traffic, four gigabit fiber channel links to that big array of disk drives.

So one of the key parts of the problem of getting the most out of server performance is balancing those different pipes, making sure that you actually put them to use, because if you don't, the power of that server in the middle may not be a benefit to you.

So now I want to start talking about some very particular bits of planning. First of all, remember I mentioned that each NIC gets a specific CPU for its input processing. So this diagram is a reminder that if you choose to bond several of your NICs into the switch with 802.3ad, you will in fact be ensuring that the input side processing is bound to a single CPU.

So if you have a choice between bonding or multiple subnets with multiple NICs, I would recommend that you go with multiple subnets. If you don't, the next question is whether or not you will have a heavy degree of input traffic. If it does have a heavy degree of input traffic, this could become a bottleneck.

So let's If we talk about other aspects of network planning, if you have a substantial number of clients that are across a significant network latency, so far across the country, the world, so if you're talking to iPhones as they wander around the world, one of the things you may find is that the default network buffer pools are not necessarily large enough to service that load in Leopard.

And in particular, you can use this series of tunings, first of all running NVRAM boot args to determine what the current system boot arguments are. Then adding an NCL equal value and then rebooting allows you to modify that pool. And I'll talk later on about how to decide when that pool needs to be increased in size and how to choose that size. So those were a couple of recommendations I wanted to make about networking.

Now let's talk about storage. So this is just the supported list for the Intel XServe. So the key takeaway here is there are a lot of different choices in your storage. And I've put details in the slides about those tradeoffs. I'm not going to go over them here.

But you should think carefully about what use you will put your server to and make sure that you've chosen a storage pool that's appropriate for your storage. So for example, with the NFS benchmark, it was very clear that if I was going to get the maximum, I needed a very fast storage pool. And unless I mention otherwise, that's probably the storage subsystem I'm using in this talk.

So once you've chosen what you're going to actually store it on, you need to decide how it's going to be organized. So we have several different choices here. One is XAN. This is especially good choice when you need scalable IO bandwidth, when you want to be able to incrementally increase the amount of traffic to a file system, and especially if it's a large data set. And so you are not going to want to reformat it or move it in any way. So XAN is an excellent choice for that.

If the utilization of the files is strictly local to each system or if you're intending to reshare it via NFS, AFP, or SMB, you should choose HFS Plus today. That does have one drawback, which is HFS+ has a single B-tree, single catalog file. That's a serializing point when there is a substantial rate of change on the file system metadata.

So adding new files, removing files, appending to files. Those operations have a limited scalability on systems today. So in that case, you may wish to consider how many file systems that you need to break that into. And finally, as we announced at the 10 Server State of the Union, ZFS is going to be supported.

So we are very excited about that, and that does remove many of those scalability limits that HFS+ has. We look forward to that as something moving forward. Finally, here's a collection of a bunch of the services that come with 10 Server. So in particular, those four at the top, When you are planning a multi-machine deployment, one of the things that you should not do is divide it up in that kind of way. All of those services at the top are what I call multi-process services. So they are starting multiple processes to do their work.

So for example, for SMB, there is a daemon present for every SMB connection on that server. There is a 2500 process link to the server. There is a limit in Leopard, and that limit can very quickly become a constraining factor. So instead, a far better approach is to begin by thinking about those multi-process services and choosing to divide those up across different systems, and then, based on what capacity is left over, choose to assign different services to the various servers that you have. have.

So those are the choices to keep in mind as you plan 10 server installations for performance. There's one other thing that I wanted to just mention more so you would be aware and for troubleshooting purposes, which is somewhere late in Tiger, a file called etsyrc.server slipped into the release. That file is our sort of tuning master pilot.

And what it does is it makes a variety of adjustments to the behavior of the system. So in particular, if you're in a situation where you have an OS X system, desktop system, and a server, and one of them acts differently, if it's a performance-related tuning, this is probably where it is. This is probably where the difference comes from. In particular, we set TCP behavior to be more friendly to Windows TCP implementation.

And most of the other ones are all memory-dependent. They are things like the file system metadata cache and the maximum number of processes. Normally, there's not a reason to change it, but I wanted you to be aware of it. Accompanying this talk is a KBase article, which will be available shortly. That KBase article will go into more detail on what each of those parameters tunes and when it's used. And it would be appropriate for you to make adjustments if you have a special case. We hope you don't have special cases. We do our best to eliminate that.

All right. So you've done those things to plan your server. You've built out the servers, installed everything. Now it's running. How good is it? How do you tell? Also associated with that KBase article, we are releasing a script which we use a lot called Top Guide. Now, don't get out your binoculars. We're not going to ask you to read the slide at that level.

But Top I'm going to take the output that it produces and break it down. This is basically harvesting top minus D output and summarizing it over time. It can save it into a log file and some other cute little things. But this is what we use as our top level as we investigate how a system is performing. So for example, the CPU section breaks down the percentage CPU utilization across user system idle as well as keeping a running average.

Now, obviously, if that had said, you know, I've got 100% utilization, well, that's an easy performance problem, right? We know that that means an application is soaking up more CPU perhaps than we would like. I'm not actually going to talk about that case. I think there's lots of other developer resources. What I'd actually like to talk about, though, and what I think is often a problem in the server space is identifying the when you have an underutilized CPU.

So you push load at your server more and more, but there's some point at which it just stops getting more work done. There's lots of CPU left. And if there is not a network or disk storage bandwidth problem, one of the key places to look would be Pthread mutexes.

So you can trace this sum with SHARP, but it's also very easy to use the new D-trace-based tool, PLockstat. And in particular, I definitely recommend that you use the minus C option there. And this gives you a very quick way to bore in on occasions where the PSTAT locking has caused you performance grief, in particular where it has put threads to sleep perhaps needlessly.

They break down into mutex block, reader-writer block for reader, and reader-writer block for writer events. And in particular, you can hone in on how often these things are happening and how long the total amount of sleep time, so the total amount of time threads weren't executing, is very easily. And in fact, even in some cases I find without source code, it's very easy to look back through clues from the symbols and get some idea of what might be going on.

So for example, in this case, I would in fact be quite concerned. That one particular lock that's shown in mutex blocks is happening thousands of times. If this is, say, a one-second sample, that's 3,600 reschedules per second. This does not sound like efficiency. So this would be an indication of underutilized CPU, and it would be time to go investigate that application.

All right. So sometimes, however, you get an underutilized CPU where it's not necessarily at the application layer. One of the possibilities is that you have an application which is pushing on the kernel in ways that are aggravating a lock on resources inside the OS. So you can get a handle on that from top-minus-D, which tells you about the context switch rate.

In general, if it's under 100, a smallish number like that, it's almost certainly not a problem. A certain amount of context switching is normal. I tend to use a sort of a rule of thumb on current server systems, so a current Mac Pro or Intel XServe of 250 a second or so might get my attention. Certainly thousands per second get my attention. Now, it's not guaranteed that that's actually a problem. Some workloads have a need to very quickly switch between many different threads, but in general, that would be a case where I would be interested in investigating further.

Finally, I wanted to show you a couple of key bits. On D-Trace, there is a facility called the lock stat provider. The lock stat provider allows you to look into kernel locking primitives and their behavior. Now, unfortunately, there's a lot of that activity. In fact, a normal system has hundreds of thousands or millions of lock operations every second. That's not uncommon.

So very key is to restrict lock stat provider operations to using adaptive block if you want to follow the P-thread mutex, sorry, the mutex, kernel mutex blocking activity. And the same applies for reader-writer locks. So trace strictly those block activities, and that ensures that you don't insert D-Trace overhead anywhere except in an instance where you're getting actual data about a performance problem. So those are very helpful. I'll show a sample output of one of those scripts a little later on.

So let's roll back up. So we've investigated the CPU side of things. Now let's go and look at disk I/O. So top guide brings together two pairs of columns. On the left, the read operations. On the right, write operations. And then within each, the number of unique I/O operations and the data write in k bytes per second.

So this gives me a way to look at aggregate level on my server, how much am I pushing through the disk subsystem. And note that this is in particular what's passing into and out of the disk drive. So there may be more file system activity than this, but this is what I'm actually presenting as load to the disk.

And for example, one of the things that I might take away from this sample of data, that through the disk, I'm actually pushing through the disk. So this gives me a way to look at aggregate level on my server, how much am I pushing through the disk subsystem. And note that this is in particular what's passing into and out of the disk drive. So there may be more file system activity than this, but this is what I'm actually presenting as load to the disk.

And for example, one of the things that I might take away from this sample of data, that through the disk, I'm actually pushing through the disk. That third line, if I look at both read and write together, that's over 230 unique IOs a second. If I had an XSERV with a single SATA drive, I would be concerned. That's actually pretty high for a single SATA drive in terms of the number of IOs per second. So that actually might be an indication that it is not always up to the task.

So I mentioned the HFS+ journal. So it turns out there's a cute little trick that I wanted to share with that mutex lock script. to actually assess journal impact. So this is the output of that script and it shows you exactly how many times per second that something went to update into a particular HFS plus file systems journal and blocked. So that particular stack trace is unique to that event.

Up at the top it's printing out the kernel address for that particular file systems lock and that bottom number giving you the blocks per second. So in that case that is probably indicating that the journal performance is possibly limiting here. 179 times a second to stop because you can't get into the journal to update the file system is pretty high for most disk subsystems and it would be huge for a single drive.

Another way that you can go at assessing whether or not HFS+ journal performance may be at issue is to use FS Usage. And I'm also not going to dive into that. That's a well-established tool with ADC articles available on it. But one quick thing I thought I would point out, that little W as well as the number beside it, that's telling you both whether there was an IO wait, as in the thread was put to sleep, and how long that lasted. So that select took 1.3 milliseconds and actually put that process to sleep.

So in addition, there is another DTrace script which gives you a sort of load average for disks called I/O pending. I also wanted to mention that Io pending takes a slightly different argument form than it does on Solaris. So the minus M option, which allows you to focus on a volume, is taking the volume name.

But if you provide that, you get a breakdown of a particular volume's IO activity. And in particular, in this case, we see that the majority of the distribution is at zero. So in fact, the disk is not very overloaded. It's got a couple of things. It's pending once in a while, but not too bad. So that's a good tool.

So I wanted to go back up and dive down into networking. Networking is certainly an area that I could spend hours talking about diagnosing network performance problems, and there are many, many aspects to it. But what I do find useful out of TopGuide is this ability to get that high-level idea of in aggregate what the system is doing, and especially in an implied deployment kind of situation.

As you get to know the workload of your server, you'll have a fairly good idea of exactly what to expect. But in particular, as a result of knowing the capacity of my setup and my server, I'm able to look quickly at this and work out whether or not I'm getting close to the network limit.

Now, I also mentioned that NCLEqual parameter. There are occasions where if you run out of network buffers, the behavior of the system is that it appears to simply freeze. And this is because its buffers are all committed. You send it new traffic. It has to throw it away. It doesn't even have a buffer to respond to a ping.

So, eventually, if its TCP connections time out, it may recover, but that is kind of ugly. What you can do to anticipate and tune for that is to look at net stat minus M on your system and follow those sets of highlighted numbers. So, if the left and right hand of either pair get close, as in the top line, so those are within a couple hundred of each other, less than 10%, that would make me concerned that this system is at risk for having network freezes and that you should consider increasing the cluster size.

So, in this case, looking at that, for example, I might set an NCLEqual to perhaps 16K, push up that substantially so that there is enough buffer pool to keep the network moving. So that's a key one. The KBASE article will have some additional details about the exact numbers to choose and so forth.

Finally, Spotlight is enabled by default on all of our systems. and it indexes your file systems. It provides delightful value when you're searching for a file, but it does come at a cost. Under particularly heavy write loads, it has to evaluate and index all of those files. So there are several techniques you can use. One is simply to look for MDS and MDS worker processes, which will indicate the indexing activity.

You can also choose to selectively disable indexing on individual files by creating a zero length dot no index file beside either a file or directory. In addition, if you have particular file sets that are not of interest to index, an example might be a MySQL database. That's probably not very interesting to have a MyISAM file pop up in a Spotlight search, and it probably changes a lot. So you can organize all of those files onto a separate file system. In addition, if you would like to keep Spotlight's indexing available at all times, but reverse the... put a lower priority on the I/O that supports spotlight indexing.

Finally, if you want, you can also just throw the big red switch. And that is also, I find, particularly useful, because it's a very quick and easy way to just say, do I care about spotlight indexing overhead? You turn it off, and you watch. And a few seconds later, if you can see something on top guide, then you know you have a performance issue that big with the spotlight indexing.

All right. So that was kind of the survey of the tools and techniques that we use as our primary triage for these particular performance problems for server performance. I now wanted to talk about a real example of this. So I had a customer that had a one terabyte production database. It was extremely critical resource and it was just not doing much. It was a tiny amount of CPU. It was and it didn't matter what they did. It just didn't seem to go fast. So I was working with them to help examine this problem.

So beginning with top guide, I ended up being able to very quickly identify their entire database was striped with software RAID across two 750 gigabyte SATA drives. And the number of IOs per second that they were throwing at it was absolutely as much as you could get out of that class of drive. So they were IO bound. By moving them onto SAS drives for that database was an immediate 3x gain in performance.

But that 3x gain was a hard cap also. In particular, we still found that although we had lots more I/O capacity now than was actually being done and the network traffic to the MySQL database was tiny, still it wouldn't go any faster. You could push more load at it, no faster. So we ran PLockstat-C.

So with PLockstep-C, we were able to see that indeed there was a ton of Pthread lock thrashing inside MySQL. So investigation led us to discover that with MyISAM indexed tables on MySQL, there is a global key cache by default. So every single table was sharing a process-wide, MySQL server-wide global lock and everybody wanted in. So this was seeing in fact on the order of 4,000 reschedules a second and it was unable to get above about 10% CPU utilization because of that.

So, what now? Well, we moved a few of the key tables out to a separate key cache. There's some MySQL configuration for that. But it didn't actually go much faster. So, all right, the next thing was to dig in deeper with Dtrace. And what we did with Dtrace was to couple gathering information from function calls within MySQL to work out what table was actually involved. And having identified the table, it turns out there was a merge table, so a table that's a view across four other tables.

And the light went on. As soon as I mentioned that to the database administrator, he was like, ooh. So that was a query that was being done as a, oh, you know, this might be an easy check and this might save us work later on. Let's just do this check.

Well, this check was, in fact, completely pounding on the key cache lock for that table. There was pretty much no way out of that. However, it turns out that that check was speculative. The application didn't need it. We removed it. And at this point, the customer can only tell me that he doesn't know how much faster it is. He hasn't been able to get the load cranked up high enough to find it. find the limit.

So I wanted to give you that story as kind of a way to tie in. I know I've gone across the top of a lot of different techniques, but I wanted to give you that sort of idea of how in real life these tools apply and can become a way to get at and solve performance problems.

All right. So in summary, we're in an era with big brains, small pipes, and you've got to spend time to get those things balanced. I hope I've given you some tips and techniques that will help you in your setup and diagnosis. And in particular, pointed out some of the newer detrace facilities in Leopard that will help you to do that.

So, with that, I guess the other thing I wanted to mention is we have a lab coming up this afternoon and myself and one of my other, actually a couple of my other team members will be there. I hope you will come by and we would love to take your questions. We will demonstrate some spec SFS and spec web performance runs for you. So, thank you.