Tracing Software Behavior with DTrace - WWDC 2007

Developer Tools • 1:00:04

Get an introduction to DTrace, the popular open source project that Apple has optimized for and integrated into Mac OS X. DTrace is embedded in the core of Leopard, underpinning many analysis instruments in the new Xray application. Learn how to directly interact with DTrace from the command line to observe the runtime behavior of an application, the kernel, or the entire system to gain valuable insight.

Speakers: Steve Peters, James McIlree

Unlisted on Apple Developer site

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript has potential transcription errors. We are working on an improved version.

My name is Steve Peters. I work in the Mac OS X Performance Group. My colleague James McIlree is in front and he'll join me later to answer questions. We'd like to tell you a bit about DTrace today. And this will be a very practical hands on presentation. Are they any DTrace users in the audience today? Okay, there's a smattering. Yeah. Right? I think you'll find a lot that's very, very familiar, identical in fact. And for the rest of you, we hope you'll be DTrace users by the end of this hour.

So, DTrace is a system observability facility that's new to Mac OS X in Leopard. It joins the rich collection of Mac OS X development tools already present in Leopard, in some cases extending the reach of what you can do. And in some case consolidating which helped us in our work on Leopard.

But more important, you can use DTrace to gain insight into the behavior of your applications. And we're very pleased to acknowledge right up front that DTrace was developed at Sun Microsystems. It's open soft, open source software. Appears in Open Solaris and we did the port to Mac OS X over the last 18 months or so.

So what do I think you're going to learn today? First, what DTrace is and a little bit about what it's not. How to use DTrace. You should come away and you may even want to try doing this as you sit here today, typing small DTrace scripts and gaining some interesting insights into the system you're working on.

We'll talk in some detail about the D language, including, you'll see some patterns and maybe some pitfalls along the way. We'll talk about the DTrace ToolKit. And you don't want to leave home without that if you're a sys admin or a, a developer whose out in the field correcting problems. And we'll talk a bit about the roster of DTrace providers that we're going to ship with Leopard.

So if anybody has to write a trip report and report on what this session is about, here's your slide. DTrace is a comprehensive, dynamic tracing framework for Mac OS X. DTrace provides a powerfool infrastructure to administrators, developers and service personnel to concisely answer arbitrary questions about the behavior of the operating system and user programs. It's written by Sun, probably by marketing folks and it's a nice, high level pitch about DTrace. I think the important piece here is arbitrary questions and you'll see how that emerges as a theme throughout the talk today.

So what are the advantages of DTrace? Why did we take this on? As we said it's new and unique tracing technology in Leopard, what's unique about it? First of all zero disabled cost. We think this is an important feature. There, you'll see thousands and thousands of probe points, points where you could collect data in Leopard, yet they cost you nothing if you don't use them or very nearly nothing. DTrace is dynamic. You don't need to start an application, stop it, decompile it, set flags, reboot, you just use DTrace, point it at the facility you want to investigate and let it go.

It's system-wide. You can see the tiniest bit down in the guts of the kernel all the way up through Cocoa Applications and messages being passed among objects there. DTrace is always available. We're going to ship it as part of the user install. You can use it diagnose problems in the field. Yea. And it's comprehensive. Essentially any place a threat of execution can land, DTrace can shed some light.

And the piece that's most important to me or most impressed me, is it's programmable, programmable, programmable. All right? There's a language, an interpreter, a model of execution and the freedom to turn the ship any way you want that's unlike most of the tracing tools that you perhaps have encountered along your careers. And the, the exhortation here is pose and answer arbitrary questions, follow your nose. Do an experiment, look at the DTrace output, scratch head, think about what might be the next interesting question to ask, immediately code that and go. It has that style of interaction.

And it turns out DTrace is an important enabling technology for the Xray tool and you learned about that yesterday in session 309. There's a DTrace instrument template where you can do some of the same kind of improvisational programming to build new instruments in Xray with DTrace. So just as important as what DTrace is, is what DTrace isn't.

It doesn't replace all your tried and true tools. You're still going to be using the right tool for the job at hand. So for example, if you need instruction level timings and you're tightest loops, look to Shark. I mean that's been really worked out very carefoolly and beautifoolly over the last couple of years. For memory leaks, go straight to Xray's ObjectAlloc template or the Guard Malloc libraries.

For figuring out where something might be hung up or spinning under the spad, go to the sampler instrument. And there's no magic here with DTrace. You still have to understand the problem. It's not a one push button solution to, to all performance problems or bug investigations. You need to be able to generate hypothesis, write a little script, interpret the results, and iterate.

All right, so for the visual learners like myself, here's a picture. There, there's quite a lot of software in the DTrace stack, but it sort of naturally bisects into what sits in user land above the dash line and what sits inside the kernel. So in user land, sort of starting from the top, you're going to express your analysis and investigation steps in scripts. Often you'll write these at the command line, sometimes they'll be large enough that you'll want to put them in a text file, edit them and send them through.

And what you send them through is a command line interpreter called /usr/sbin/dtrace, DTrace at the command line. It talks to a, it's a sort of a thin veneer on lib.dtrace.dylib, dtrace.dylib is a compiler for the D Language. It all, it knows how to communicate with the kernel layer in an organized and safe fashion. And it turns out that that's the place that Xray hooks into too, is libdtrace. So it's got a couple of clients.

Below the line in the kernel sort of the big piece of DTrace, the DTrace virtual machine. Interprets the byte codes, sets up the trace points, collects the data, sends it back up into user land for you to look at it. The virtual machine in turn relies on a set of semantically knowledgeable modules that are called providers.

So if you're interesting for example in important events in the life of a process, you'll turn to the proc provider and see things like exec starts, exits, forks, lightweight thread starts and so forth. If you're interested in when io gets started and finishes up, you'll appeal to the io provider.

Similarly if you want to run profiling experience, experiments, catch events on millisecond boundaries for example, you'll turn to the profile provider. We'll be spending a lot of time with the system call provider. Remember system calls are the programming interfaces that sort of span the user land to kernel boundary. Things like read, write, select, get me some memory, exit.

And really interesting provider here that James has done a lot of work on is a fasttrap provider. It knows how to reach from the kernel, back into user land, place instrumentation, and collect data from the instruments it places back in user land. So we can go right into say malloc in libSystem and notice when malloc is called, notice what the arguments to malloc are and so forth. Similarly fasttrap can instrument your application. Any function entry point, any return and any instruction in between. In fact, fasttrap is capable of implementing just about any instruction in the entire memory image of Mac OS X.

So enough talking, let's take a quick, quick demo. So great. So say DTrace here, I've got the right machine, good. And DTrace announces what its arguments are, what it's command line flags are. There are a ton of them, but we'll get away in most practice using just about five as you'll see. And probably the first and most interesting one is to ask DTrace what are the probe points that you know about? Where can I collect data in my system right now? So let's clear that out.

Grab this guy. And here I've said DTrace -l and pushed it through more. And we begin to see the sort of long list of DTrace probe points. The left most column, increasing series of small integers are numeric ID, then we have the name of the provider, a module, which helps us disambiguate names, a function name and a particular point within that function where we're going to collect data, where the, the probe could be attached so to speak.

So let's see, let's go down and here's, so 19,000 probes later, we hit vminfo, which is going to, if we were interested in when paging happens or page faults occur, we'd take a look at what vminfo can provide. There's the proc providers we talked about before, forks, exits, io provider, when we start and stop io. And I think we're pretty close to the end here, so on this system 19,448 probes in place right now for DTrace.

And we haven't mentioned the probes that you could provide, put into any running application, Mail, Safari, your app. Which can make this one of probes grow very, very fast. It's not unusual to, to see 100,000 probes appear when you instrument a big Cocoa application. So let's, let's latch onto one of these probes and see what's going on on this system.

( Period of silence )

So here's the sort of canonical little DTrace command. DTrace -n says I'm going to give the D Language program right here on the command line immediately following and here it is. It says, let's look at what the sys call provider can tell us about everything it knows. We say no more, DTrace fills in and says, well we'll tell you any time a system call is made on this system. And we'll pipe that through more because as you'll see these things go on for quite a while.

So I don't know if you notice, there are 854 sys call probes matched and now we begin to see the entry and return from a whole, whole long list of these things. All right, so absent any specific direction, DTrace does the natural thing. It says, I hit this probe, I'm going to tell you about it. Puts it in a buffer and we eventually see it in more.

But this is sort of an overload of data, there's an overwhelming amount of stuff to look at. Is there some way we can organize this? And DTrace has very power, powerfool facilities for digging into the data, massaging it in ways that help us understand sort of immediately with immediate impact what's going on on the system. Let's look at one way to do that. So again, we're thinking about system calls that are going on on this you know what other, otherwise is an idle system.

And again here's that command line, DTrace -n, I'm going to give you an immediate script here. You're again asking the sys call provider to offer up its probes. We'll take all the entry probes. And now here in the brackets, we have a little bit of computation we'd like to do. We're going to fill in a table called ag and the rows of the table will be keyed by the name of the probe, read, write, select, ioctal, the particular system call that we dove into. And we'll just increment a count for that guy.

We'll talk a little bit more about what this at sign syntax means and why we're using the function on the right hand side in a couple of minutes. But let's just see what happened while I was talking over that, that data collection was occurring. We'll stop the data collection and observe that a little table was printed out.

Keyed on the left, my left, by the name of the system call that we dove into. And the value on the right is the number of times we encountered it, all right So sort of simple counting exercise. Interesting that these two guys at the top sig all stack and the sig proc mass, remember those.

Those are you know, I probably talked for 15 seconds or net 1300 of those, that's a fair amount of system call activity. Well, you know sort of the next question following our nose is well which process is making these system calls? You know otherwise, this was an otherwise idle system. Can we look into that? Well, sure. So let's clear this.

( Period of silence )

Okay, so same idea. going to look at entry probes from sys call. We're going to collect them in a table, but now the index to the table is the process name. So as a thread comes through and hits a system call, say makes a read system call, that traps into the DTrace virtual machine and the first thing the DTrace virtual machine says, hm, whose the process?

What's the name of that process associated with the thread that did this? I'll need to remember that because someone may make a table like this. Well, it doesn't quite work that way, but it maintains a collection of special named variables that you can use for purposes like this. All right, so again after talking for 15 seconds or so, hmm.

Here's the list of processes now that made system calls. And woo, window server at the top with 674 system calls made in that interval. I don't know, was I shaking the mouse? I'm not so sure. So now maybe we can dive in foorther and ask well, window server, what, what, what system calls was window server making? And here's a script that'll do just that.

( Period of silence )

So now we introduce another element of the D scripting language, a predicate or filter. Enclosed by slashes and it says, whenever we dive into one of these probe points, detect if the name of the process is Windows server. If so, go on and do this action, otherwise, you're done. Nothing interesting here.

So when we do see a system call made by Windows server, we're going to collect it in the style of the original table example, namely we'll index by the name of the system call and count those up. So what did we get? Oh my goodness, it's the Windows server who's making that sort of high level of sig alt stack and sig part mass calls. So that's sort of there's the paradigm right? A series of questions, a series of scripts and maybe some insight at the end. That's our next stop for me is the Windows server folks and asking what's going on here?

All right, so back to the slides.

( Period of silence )

So what have we been doing? We've been writing little DTrace programs at the command line. Here's a summary of what they were. Sort of a little growing pattern that we've been using. And at the heart of all this was this D Language.

This is the DTrace D Language not the Mars D Language. It's very much like C, you should be, if you know a little bit of C, or you know a little Perl, you should very much at home with the D Language. So I mentioned before, it's compiled to bytecodes and those are passed down into the DTrace virtual machine and the kernel, where they're interpreted, sort of held onto until some probe fires and then off we go and figure out if we're going to do some table counting and just where we're going to put the entries in the table.

libdtrace.dylib shipping on every user system for Leopard does most of the heavy work, heavy lifting here. The language though is lightweight. It's really small. It's quite simple. There's no explicit control flow. There's a sort of technical safety reason for that. You don't want to get stuck in a loop by mistake. Say if you're tracing I/O interrupts, that would not be good for the life of your system. There are no user defined functions. Variable declarations are optional. You know you just say what you mean and variables spring to life, very Perl like.

The D Language is built of common elements. The first are these, the probe clause, the foondamental element of D program. We sort of stretch these out onto one line. You can of course, text file, make it look even more C-like or Perl-like. The probe calls, it's lazy. You know if you're missing some element for example in our first script we omitted the predicate and the action statements entirely. The interpreter fills in something sensible to do. Namely announce that you visited this probe.

All right? The first piece in the probe clause is the probe descriptor. That's that tupole of four components that uniquely identify a point in the system where we're collecting data. First component is the provider. We've talked about that. You're familiar with several providers now. Second is the module and you'll be more familiar with that as we go along but it's basically a way to set separate name spaces up. Then a function, read, write, select, malloc, and a particular point within that function that we might want to probe. The entry, the return, some hex offset within some function.

( Period of silence )

Supports wild cards, you saw that in our very first example. We provided none of module function or name. The interpreter did the right thing. It said, oh he wants to look at all 854 of the sys call probes. The probe descriptor naturally is really the only required part of a probe clause. You got to say what you want to hook the sys call to.

Can list probes via DTrace -l, we've done that already. If you want to sharpen that, mine down into what particular provider might be able to offer, you can essentially just set up a probe descriptor when prefixed with this -l flag, nothing gets enabled, no probes fire, but it will list all the probe matching sys call provider name, sorry function being one of the write system calls on the system. There's write, p write, write no cancel, so forth on Mac OS X.

Some examples, our very first one, sys call all the arguments wildcarded. Here's a provider that I've referred to but we haven't seen explicitly before. Suppose we want it to look at our process number 1017 and we may have gleaned that from one of the other tools on Mac OS X PS or top. So we say pid provider, process number 1017, in the module libSystem, I want to look every place printf returns.

And finally because Objective-C plays such a large role on Mac OS X, James went ahead an implemented Objective-C provider that derives from the pid provider. So imagine that process 1017 was a Cocoa app. We could look into its Objective-C name space. Any class or category, we've omitted the sort of second module piece. Any instance method that begins with the letters draw, wildcard the rest at the entry. We'd like to probe those points and fire through to the rest of the stuff when that's hit.

Next piece in the probe clause is the predicate. That's a condition, a Boolean, a filter. We evaluate it every time the probe fires. We use it to focus in on just the pieces of data of interest. We may be waiting for an obscure condition that triggers a bug. This is a way to sort of keep our probes out of the way until that event happens. And then start collecting lots and lots of interesting data to help us diagnose a particular situation.

Some quick examples we've already seen, the comparison of the special variable exec name that the interpreter maintains with a fixed string. We might be interested in when Perl for example makes and open sys call, right? So combining the probe descriptor that tells us what we're looking at, the filter to say when we want to see it. We might go on to do some interesting data logging activity.

So here we refer to sort of an arithmetic Boolean. If my call count exceeds five, then we're going to trap into the action statements for this probe, post incrementing the my count call along the way. If my call count is less than five, we don't bother. Perhaps the first five opens are uninteresting to us, but on the sixth one our app falls apart and we'd like to know why.

And finally we can actually dig in to the state of the probe that fired itself a little bit. Here, if we ask for the tripping on entry point to open sys call, we can look at its arguments list. Right? Argument zero to the open system call happens to be the string name of the file we're about to open. We can get a hold of that through copyinstr, talk about that in a moment, and compare it to a fixed string. So we might be interested in looking for all the times in the life of the system when terminal opens its plist file.

Action statements, this is the computation, this is sort of the heart of the DTrace probe. These are expressions in the D Language. Arithmetic, relational, logical, bit operation, semantics just like C, syntax just like C, it's going to be very familiar. Other action statements assigned to program variables, you can remember some state, you can remember that state either for the life of the, the duration of the firing of the probe. You can remember it globally for the duration of the experiment you're running. And you can segregate that data perhaps by the thread on which an event occurs. We'll say more about that in a little bit too.

There are built in functions. printf does an actual thing. Same syntax as C. Stack, we can ask for a stack back trace when a probe fires. If you're familiar with that from debuggers. There are string manipulation functions, strlen, strtoke, substr. And an interesting class of functions that I've already referred to, copyin, copyinster, copyout.

We've already mentioned special variables whose value is maintained by the interpreter, execname being the first one. probefoonc, the name of the function that was triggered. The interpreter also maintains a timestamp. It has actually three notions of, of time progressing. Wall time, virtual time for the process and one other version.

It maintains the, a notion of what process ID was associated with the thread that ran into the probe, the thread ID and so forth. There's several more. You'll find them in the documentation. And in action statements assignments as we mentioned can just create variables out of thin air so to speak.

So some simple examples call to printf, where the argument, the string argument is actually extracted from the system call that was made. Here we assign that string perhaps we want to hang on to it to see when it closes, there's something interesting about when that, that file closes. Who knows? At this point we're interested in when Perl does an open, where we in the Perl executable? So we'll ask for the user stack trace, stack trace back in user land. So let's look at a few more demonstrations.

( Period of silence )

So here's that use of printf. The special built in variable execname. That copyin function that's going to pull the first argument to an open call out of the probe firing. And we're going to tag a, put a little extra argument on the end of the DTrace command that says be quiet, print none of the identifying information about the probe.

We've handled it in our printf. So let's see what happens when this guy runs. So he's off to the races.

( Period of silence )

And what happens if we have a look at the about box. Right, great. All sorts of open calls. Most of them made by log in window, who's managing the, that dialog box as you might expect. A bunch of calls to open made by ATSServer, which is, manages the fonts, the collection of fonts on the system.

( Period of silence )

So I think I'd like to sort of broaden our horizons now away from the, the sort of narrow system call interface and look a bit more about how DTrace can tell us things about processes we construct. So here's a, oh hang on, let me clear the screen. Here's Hello World.

Maybe a little more complicated that versions that are in K and R. So let's run this guy. It does the natural thing. Right? Hello WWDC. And now let's, let's take a look into the execution of this very simple user binary and ask does this thing ever call malloc? You know it might be an interesting thing to answer, certainly in a much larger program you'd like to know that. So here's DTrace -n and the new construct here. pid provider, kind of a macro argument and we'll get to that in a second.

So pid provider is going to instrument a process in user land. We'd like to give it the process number, but we don't quite know that yet. And we'll look for entries into malloc. And when we see one, we'd like to see the stack, stack back trace up in user land.

Well, how do we fill in the process number? Well, we ask DTrace to start the program itself, keep track of the pid that gets started and feed it into that script so it knows just who to instrument. And away this guy goes. And sure enough, here's the execution. And when that thing finished up, DTrace collected its buffers and said, yep, sure enough, you hit malloc.

And it was under main, under printf, under the implementation of printf and right down somewhere in make buff. So interesting, might be useful for example as the next thing to know how big was the malloc buffer. We leave that as an exercise for the reader. And now we can go back to the slides.

( Period of silence )

Okay. So a little bit more about action statements. It's really quite a rich computational environment. It allows complete structure and array access. If you have a variable that you know in your process, in the kernel, that's a structure, you can dig into it by a construct like that. If you hold a pointer, if you can get your hands on a pointer in a D Language script like p, which we know is a struct proc pointer, we can dereference it and look at pieces of the struct.

We can index into arrays, all right? So curproc is a special variable maintained by the interpreter. We can dereference it. It, we know it's a structs, struct proc star, into the p_ucred structure and then foorther down into the cr_groups array under there. And that turns out to be the affective group ID for the current process.

( Period of silence )

Sort of as a side note, it's interesting to know that there are, there's some opportunity to make decisions in a D Language script with a conditional expression of the familiar C sort. We have a lot, essentially a selection.

( Period of silence )

Well there's something special going on when we can chase through named pointers and named objects in a sort of C like fashion on a, a big object like the kernel. Hmm, we're not recompiling the kernel with D, but we're able to peer into it. And that's because DTrace is symbol savvy and type aware. Has deep knowledge of the UNIX kernel, mock kernel. It knows every type definition, every struct in union and every global variable and external symbol in the kernel.

Similarly the, the pid provider can deal with unstripped binaries, any symbol that you can see with nm -g on your binary, the pid provider will be able to offer up to you and DTrace can again chase pointers, examine structures and unions. And the way you get it, get to that functionality is to use the minus upper case C argument to DTrace to pull in your own type definitions. Alright so it, it has a clear understanding of what C structures look like and can press on that way.

( Period of silence )

So let's take a carefool look at that. Our previous Hello World example was built on the printf stdio functions. These are the file versions of those. So we include stdio.h, and in there there's a declaration for stderr that says it's a pointer to a file structure. If you've got your machines now, you can look at stdio.h and see what a upper, FILE upper case is. It's a structure that contains buffers, lengths and one interesting field which is the small integer file descriptor that UNIX uses to communicate the identities of files, okay?

So we'd like to kind of hunt, hunt through the file pointer to see that number, you know what is that number? That small integer number that we communicate to UNIX when we do an fprintf to standard error of the Hello WWDC string. Well, here's a script that lets us get after that. It too wants to include stdio since it wants to manipulate this structure. We use the pid provider to place a trap, a break point, a probe point at fprintf, the user of the stderr symbol.

And when that probe fires, we grab the first argument to fprintf, zero base, so we call it arg0. Pull in the structure that it points to, that turns out to be 88 bytes worth of stuff. Bring it into the DTrace virtual machine and then manipulate it, namely print out it's under bar file component. So let's see how that works.

( Period of silence )

And we'll go to the demo machine.

( Period of silence )

So here's that program, uses fprintf, passes this file pointer around. Here's that D script. Catches fprintf, grabs that file pointer, hauls it in, puts it into the DTrace virtual machine and then digs into it. All right, so let's run that guy. Ah cool! So there's the execution, and here's DTrace telling us, yeah, I caught a call to fprintf. And the thing you asked me to trace was the small integer 2. And if you remember your UNIX 101 stdin, stdout, stderr is zero, one and two. So makes sense. Thank goodness.

( Background noises )

So let's go back to the slides once more. And here's what we just saw. Sort of glossed over this business of copyin and its actually a good point now to dig in a little bit deeper here and expose the sort of foondamental execution model of DTrace. It's I think important to have in mind.

DTrace makes very clear distinctions between what goes on in user land and what goes on in its kernel, virtual machine, data store let's call it. And keeping those things straight in your mind will make it really, really straight forward to get scripts running right the first time. So here's the big picture. We had a script, that's up on the white, big white box there. Sent it into the DTrace command, that got compiled into byte codes. The byte codes got pushed down into the DTrace virtual machine, its interpreter.

DTrace actually started the, the -c option, our little hello_fp program. At that point, libSystem was instrumented. A trap was placed on the entry to fprintf. hello_fp began executing. It called into fprintf. Passing that pointer to this file structure. The trap hits. The interpreter, the interpreter in the kernel gets invoked.

We begin to execute this code. In particular we execute a copyin function. That's a special function that knows how to reach into the virtual address space of the user land process, pulls 88 bytes located at arg0, the pointer stderr back up there. Drops that into the DTrace virtual machine data store where we can now manipulate it.

By referencing through the pfile variable, pushing the result back up for our viewing pleasure at the end of the execution of this script. All right. So if there's copyin, well there must be copyout. You can copyout something from the DTrace virtual machine's data store into user land. But I've been advised here by my session director to stand very still and soberly announce don't do this at home kids. You can mess up the stack, data and all sorts of things in your world. Can you consider yourself forewarned and forearmed let's just say that. All right.

So the copyout action. Here's that program again. I'm going to allocate a string this time to hold the Hello WWDC message. I don't want to mess with the protected data pages. I just want a raw memory that I can fooss with here. So we malloc some space, copy the constant string hello wwdc, and then we're going to print it right?

Except we're going to interpose a little DTrace action that will when fprintf is entered, it's going to copyout its own constant string. It's going to copy that from DTrace kernel virtual data store up into user land and then let things go and let's see what happens. Notice I've done some carefool counting of the number of characters in the string.

( Laughter )

Okay, forewarned. So back to the demo machine.

( Laughter )

Let's see. Here's that program, I can cat it what the heck. Right, oh, whoa you don't want to cat the, let's, let's cat the C code, right? Good. Yeah. All right, and then you run it. Does the right thing, all right. Here we'll apply that little piece of business.

There's that D script that did the copyout. I've got to stand up straight again and say, in order to do this thing, this is called a destructive action in the DTrace world, you've got to add a special argument to the DTrace line, right? You're going to write into some place where you really shouldn't be. Take my word. And away we go, what the heck. Ah! So there we executed.

Different message this time. We've done a destructive action and poked into the address space of this process. The point of this is if you can understand this interaction you've, then you've got a clear model of the execution model for DTrace in your head and you're home free. All right? You're good to go. Let's go back to the slides.

( Period of silence )

So let's talk, I'm going to give you a quick survey of the D Languages variables. I promised this at the outset. They're global variables. They're just simply utter their names, foo and bar. You can declare them if you want, initialize them and use them any place you care to in a D script. Here we might want to record the very first time we see Perl do an open.

( Period of silence )

I said before that it's possible to collect information segregated by threads. That you know threads come flying through system calls and all sorts of other probe points. You may want to collect you know particular counts based on which thread is coming through. There's a syntax for that.

Arbitrary, just simply prefix the name of the variable with self pointer, foo self pointer bar, this segregates the variable, attaches it to the particular thread that's flying through this probe point. All right so we can count the number of thread opens for this particular invocation of Perl on a thread. Perhaps we've got you know multiple versions of these things running in some sort of web server.

And then we have essentially what amount to equivalents of stack variables in the C language. Probe quo as locals, they appear and live for the life of the execution between the braces and then disappear. Usually used for you know temporary variables. Right? We want to compute a difference, we may use that several times. For efficiency we capture it once and then let the interpreter use that later on.

Okay, so orthogonal to the sort of scope issues are kind of dimensionality issues. So variables can be scalers. foo, self bar, this baz. They can appear in arrays and DTrace arrays are associative arrays. You can have arbitrary keys and actually tupoles of keys appearing between the square braces that index the associative array. And you can assign arbitrary values on the right hand side. It's very handy again for sort of teasing out data based on like for example a particular file descriptor, a particular socket number, a particular range of addresses.

And then there's this really powerful notion of aggregations. We actually started off in our second example just making use of an aggregation. That's a, a table, a place where we can count up stuff. But has a very special property. It works really well on multi processor systems, all right. As we scale up to more and more processors, if you think about assigning into an array, like @trapsTaken, you don't want to have to take a global lock to get an atomic update.

So what DTrace has introduced is this notion of an aggregation. It's quite a bit like a reduction operator if you've taken I don't know CS203 or something. Across a multi processor machine, we maintain these results in individual copies of the aggregation called @trapsTaken. And later when we want to the net effect of the aggregation, we combine them all together at once. That allows us to proceed synchronization free on all the processors.

Here's another example. We can use a tupole to index into this aggregation. Aggregations are introduced by the @ sign that's part of the syntax. But notice there's something special on the right hand side. I've not done, you know ++ or + another variable. I've called another function and that's very special.

Because there are just these seven, eight aggregating functions that DTrace allows. These functions have all the property that if you perform them on each of the multi processors, you can in the end reconstruct the total by just bringing the, the numbers together and reapplying, essentially reapplying some versions of function. So if we count, we just maintain counts on all these machines.

Time comes to collect the data, pull them all together, well just add them and we've got the net count. You need to sum, say byte counts, how many bytes are being allocated by malloc on this process or by malloc on that processor, maintain them independently. When time comes to print the data, bring them together, just sum them up.

Similarly for maximum, minimum and then these very interesting quantize functions that are essentially capturing distributions of values. You can do the same thing, histograms, and then combine them at the end into one sort of grand histogram, one global distribution. Okay, so quick demonstration of aggregations. We've seen some of these before. Here's a, another piece on this.

Good. So we're going to call, we're going to look at system calls again. This time the memory map system call. And it turns out that the second argument to memory map is the number of bytes you'd like mapped. Number of bytes you'd like to attain for their chunk of memory. Of course since that's zero based, we call that arg1.

We form an aggregation introduced by the @ sign, call it mmaptotal. We're going to tease these out by the name of the process that's doing the mmap. And just a keep a you know a sort of running total of all the mmap activity across the system that way. So off we go.

And what the heck, let's launch Safari and see what happens during that little piece of work. And we'll quit. And absent to anything else, when the script ends and there's been an aggregation accumulating, DTrace does the sensible thing. It says you've been collecting this table of numbers. I'm going to pull it all together for you and display it smallest to largest.

And sure enough, we launch Safari here and it mmapped what 78 megabytes of stuff, probably it's code segments or the leading piece there. PubSubAgent, that's interesting. I hadn't seen that before. And that would bear some looking into. Maybe that's a recent feature. ( Laughter ) Okay.

( Period of silence )

So but maybe we're running a little bit more, we've got this sort of overall global sum. You know how were these mmaps taken? Were they for a large chunks of memory, small chunks of memory, what's the distribution? So here's where these quantize aggregations come into play.

( Period of silence )

So this time rather than sum the sizes, we're going to quantize these. Essentially collect a histogram. Let's send that off to run and we'll warm launch Safari this time, it may turn up different. But it's, it's sort of the same idea. Quit from here. Stop the script. And what we see are the distributions of mmap sizes organized in a sit on your side histogram. And split out by executable.

So we see Safari a whole bunch of 16K segments. Hmm, I'm not so sure they're code segments anymore. I don't know. And DTrace itself allocated two very big mmapped buffers. I know that from how DTrace works. Gee the dock got interest, did some interesting stuff in there. And what was launch D up to? So.

( Laughter )

Yeah, right. There's a talk about that, or there was a talk about that, a great talk.

So often it's the, sort of immediately gestalt impression of a histogram that leads you to the next question. You know why was there this one really big thing, chunked way out there and then you can filter on that. So again the paradigm of follow your nose through these things.

And finally suppose we wanted to know the site. You know where was it that Safari was mmapping, mmapping. Where was Safari mmapping? That's an interesting question. Perhaps to isolate down to a particular framework where big chunks of resources were being consumed. So, here's sort of this interesting idea.

We'll look for mmap entries, filter it down to Safari, the exec name Safari, collect an aggregation, but now for the key to the table we're constructing, we'll collect a user stack. All right? And the value stored there will be sort of the net amount of mmapping done at that site. So let's how that works. Off we go.

Safari up and away, faster and faster each time, that's good. Let's pull down a menu for the heck of it. Quit, stop here, ah interesting. Oh, I should have left it running. One more time. Run it again. Safari.

Ran the wrong one.
I ran the wrong one. Did I? Yeah I did. Okay, nice and easy. That's the one I wanted. Safari, let's do something, go visit somewhere else. All right.

All right, now great. Ah, much better. We see the wonderful work that James has done with sybolication. So, so you know here's a, here's a big collection of mmaps, mmap calls, netting out to a total of what, 3.3 megabytes made from this stack location. In libGIF. That's interesting. In ImageIO, another 3 megabytes of mMap stuff. Core foundation similarly.

So it's you know yet another slice and dice, right? And just by sort of almost elementary recombination of the elements we've talked about, lead you to you know sort of very interesting displays of data. And seeing them smallest to largest is often very quickly focuses you on you know where the action is. Okay, back to the slides.

Good. Good, good, good. So you're now all DTrace experts, ready to go out and right tons of scripts. Well, maybe not, you know you may have a restricted set of problems you always like to look at. You maybe want to have some handy tools you know sort of prestructed. And that's what the DTrace ToolKit is about.

Freeware again done, first at Sun Microsystems for Open Solaris. And imported almost trivially to our implementation of DTrace. There's very few changes once you get the basic DTrace stuff. The scripts just seem to come right across. They're a set a handy analysis tools for common investigations done by a very talented Sun field engineer. They're, and all DTrace scripts or D Language programs wrapped in a little bit of shell wrapper to do argument parsing.

They're readily customized if you have a, you know a sort of a different need, different take on what you want to see, you know hit it with your favorite editor. Browse them, learn about the D Language, some of the more esoteric patterns appear in some of these scripts. Sort of useful for a second course in DTrace.

We've got a little more than 40 of 104 scripts ported to Leopard. The ones that are missing largely rely on specific, you know Solaris implementation details. We'll be moving those over in, or many of them over in the next, in the coming weeks. But today on your Leopard CD you'll see around 40 scripts read@y to rock and roll. They're all on usr/bin. man pages in a place where just saying man's such and such will get them.

And there are some interesting annotated examples of how these things can be used in user share examples, DTTk. So we spent a lot of time this afternoon looking at sys calls and how to dive into them and collect various information. Well that's all been sort of pulled together in a tool called dtruss. It's very popular in the Solaris world. I find it very useful now on Mac OS X.

Here's its help message. Gives the list of arguments, the things that it knows how to do. The argument sort of partition into what thing do we, do you want me to look at. Is it already running? Or do I want to start it up by using the last option command, just utter the name of the thing on the command line. Do we want to filter down to just a particular set of system calls or see everything in sight? Should we print summary counts? Do we want to look at times? Back traces and a demo. Wow.

( Laughter )

( Period of silence )

Click to type. So let's run dtruss now. We're going to collect counts and the thing we want to look at is the dock which is already running. So it's, oh, we go right over the dock and it begins to react to our mouse motions. Interesting. Pop up a menu.

Cause something to show. And at the end I just stopped it, we see a summary of all the system calls it passed by. And just to look up a little bit in the scroll, process number 96 an isolary process number for the dock, a thread identifier, the kernel thread that sits behind the particular thread that dove into the system call. The name of the system call. Some of its arguments.

The immediate return value of the system call and any error node that was set. And so here we see for some reason we weren't allowed to make dir probably because it already exists on user Apple library preferences. Return minus 1, that's error node 17 and you can look that up in syserrnode.h. All right, back to the slides one more time.

( Period of silence )

So does that look green? It should. The notion is here's our roster of providers in Mac OS X at WWDC. fbt function boundary tracing we didn't talk much about that. But it's in that initial list of 20,000 probe points. You can see the entry and return from any function in the UNIX kernel.

lockstat, particular semantic knowledge about kernel locking. vminfo, we took a quick look at, see where page faults occur. proc, life of processes in the system. io, where I/O occurs. Profile, timed events, system calls we done, did a ton with, mach traps. Sort of the mach world compliment to a system call. plockstats, those are locks up in user land. p thread locking.

And you can investigate the behavior of those things using the plockstat provider. pid and Objective-C we talked quite a bit about. usdt, which are static probes up in user land. An advanced DTrace topic and we believe Dave will be talking about those in the advanced DTrace and Xray session tomorrow.

And finally sort of like a gift, when you've got DTrace and user, user land probes working right, you begin to pick up what's going on in the other communities around Perl, Python, Ruby, and now the Java 6.0 preview have, all have DTrace instrumentation points that are ready to rock and roll on Mac OS X.

( Period of silence )

So the summary, the take home message. The DTrace approach unlike the monolithic tools we've previously had, ktrace, trace, is incremental, exploratory and improvisional. So follow your nose. Aggregation turns out to be the really powerful, powerful piece of work. And finally you're going to find all these facilities in the Seed as it was delivered to you earlier this week. More information there is manual pages, the examples for the DTrace ToolKit scripts that I mentioned. A vital and vibrant community of DTrace users at opensolaris.org. And a discussion mailing list that's again lively and fun to keep an eye on.