OS Foundation • 41:32
This session discuss a new subsystem for managing log data which consolidates logging information provides it in a consistent format, and makes it much easier for an administrator to analyze system behavior. Usage of the new system and APIs for generating and monitoring log events are discussed.
Speakers: Nicolai Krakowiak, Jordan Hubbard
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
Good morning. It's a pleasure to start the OS Foundation track this year with a talk entitled Beyond Syslog: The New Log Subsystem. We have a couple of interesting things planned for logging, although they may look rather small. They're things that we'd like to be able to leverage in the future.
Unfortunately, I have a confession to make. If you wanted to go ahead and use the technology I'm presenting today to make a new dashboard widget to win the competition, the APIs that we have to show you today aren't actually on the CD, but will be coming sometime before Tiger actually ships.
So my name is Nicolai Krakoviak, as I've been introduced, BSD Technology Group. And first I'd like to start with a background on syslog, what it is, what it does, what it does well, and what it does not so well. And then also do a little bit on the motivations on why we're actually planning to do a new logging infrastructure, again from some of syslog's sort of deficiencies.
There's a little bit on system administration beyond just logging, plans that Apple has for the future, for different things that Unix administrators have been asking for. And by the end of the talk, you'll be able to have a little familiarity with the API that we're planning, how to write log messages, how to actually search through the log messages, and best practices for when and what to log.
So what is syslog? Well, in a world without syslog, you just open up a file and start printing records to it. But what the syslog provides you is a system-based facility for being able to do all of your logging through a central point. This allows a lot of configurability and allows for messages not to be scattered all over your system, so you don't have each application writing individual log files to various separate directories.
So by having the centralized facility, we can then gain other functionality in addition to printf. Things like being able to tag priority messages. So a message like "my printer is on fire" is very different from "I just opened up a TCP connection." And to be able to differentiate between those is very useful. You can also do simple log filtering.
Based on these priorities, you can specify separate log files that get the messages. Things like system.log and varlog gets the majority of the data, but you can also do things like authorization information with a slightly more secure file permissions, or things like logging mail messages or FTP connections also go to their own separate files. For extreme cases, you can also write things directly to dev console. Again, like "my printer is on fire." That might be a very important thing to keep in mind.
So administrator can configure it so it goes directly to a console as opposed to just a file so users are more likely to see it immediately. Syslog also provides a basic capability for remote logging. This unfortunately was designed back in the heyday before denial of service attacks and our fancy encryption. So it's not necessarily the most secure. So it's not enabled by default, but we're looking into possibly fixing that. So there are a number of things that Syslog got right.
A centralized logging facility to be able to have a centralized mechanism for dealing with log messages. Being able to specify log levels for messages, everything from debug through emergency. It provides a lightweight API, so it's a very low barrier for entry for adoption. And you don't want logging to be necessarily a difficult thing for your users to have to do.
It's also very configurable. Being able to filter on its priority and level. And also the ability to log to those remote machines. But there are a few things that we'd like to address with Syslog. It has no internationalization support. You can only log messages with a subset of ASCII. This isn't very good for -- because a number of languages are very difficult to represent in just ASCII. The messages are also a little too simple.
By only providing the timestamp and priority and then a message, there are a lot more things about a log message that you can insert that might be useful for a person who wants to look at those log messages. While you can't add them just for the log message string itself, then it exhibits another problem with Syslog. It's got an unstructured format, so it's difficult to parse, and it's also difficult to search. You just have to do things like using grep and fancy sort of expressions to be able to get what you want.
So our goals include adding a structured log format to make searching much faster and easier, and in particular for a programmatic perspective, to be able to have rich data to be able to attach to these log messages as a separate field so you can log things that won't necessarily just appear through the log interface. Again, a lightweight API. We don't want to make it too complicated for you to actually log your messages. And of course, backwards compatibility. If your program is written today to use syslog, it should continue to work into the future.
Another thing, integrated log file maintenance. Currently, log files are rotated by some external application usually done through a cron-like mechanism. But cron really doesn't know a whole lot about log messages. So it'd be really nice to be able to move some of that functionality inward. And then extensible input and output facilities. Be able to stay control whether or not you get log messages from different available resources or being able to specify some outputs.
And I'll give some examples of that later. For log file maintenance, I think it would be really nice to be able to have configurable log file expiration. So a system administrator can say, keep all messages higher than warning. Those are going to be messages that I want to archive well into the future.
But I might only want to keep the last 200 info messages. And those can be just as a rotating sort of buffer. You might want to say, don't keep debug messages from an app that continually spews out into the system. log because they're just cluttering up your logs and you don't find that information valuable.
We're also thinking of things like, you know, don't spin up the disk so you can save battery life as opposed to having to write these log file messages. And then also I'd like to add like filtering on compound expressions. So all of these sort of rules can be have, you know, complicated things like, you know, John that has a level greater than notice you might want to do something special with or all of my log levels yesterday that were of a warning level.
Back to that input/output thing I mentioned. The next generation syslog, ASL, will have a syslogd module so it can handle input from all of the standard interfaces that syslog expects them from, like devlog and such. Also, from UDP if you so wish to enable that because you're dependent on that functionality. And then for writing it'll have full syslog comp support so you can have messages still continue to go to their old locations, although it's not the preferred interface for some of the things like searching.
The local store, as you heard yesterday, we have SQLite in Tiger, and it'll give you the facility to be able to do a little bit richer searching on your log messages, and also go through a programmatic interface. So your individual app can display a short history of your possible debug output without having to send the user to a separate location to find that data.
We also have a couple of ideas for possible future extensions. As we mentioned, the remote logging is not secure, so maybe come up with a way of being able to remedy that sort of problem. Be able to do logging from SNMP information, to be able to integrate that into the logging infrastructure. Maybe you want to advertise some of your syslog information on your blog. You can do things like with RSS.
Once the information is in a separate modular sort of interface, you can do all sorts of things. Maybe message in the bedroom can message the machine in the den, "My disk is full. I need help." The individual log records themselves have a couple of components that are automatically filled in for you. We're going to be trying to capture the user information both as a username string and as a UID if you use separate record interfaces, because those might not necessarily match all the time.
Similarly, group information. You might want to know our query across a group of which applications are doing logging. Timestamp for when of course the log message is received. Machine name, mostly for now for future extensibility for being able to support remote logging. And then the application name itself, which is derived simply from RGB0, so if you wish to overload that you can change it.
And then of course things that you have to specifically call from the API. The log level itself determines a little bit how it gets filtered. The message itself. And then we're also providing an ancillary data field so you can go ahead and attach a string that would not show up necessarily in the central log message so it's searchable as a separate field.
When is it a good idea to log information? Well, really, all the time. Anytime you do anything sort of useful, like a system service, like if you start listening on a socket or start advertising a service, if you disable a service, parsing configuration files, creating new files. And some of the reasons this is useful is if things start to break down, then you can provide the error messages saying, you know, this file was corrupted and I can't deal with it. Some people have had applications just go ahead and do a couple bounces in the dock and then not do anything. And that's really not a very good interface.
And to be able to provide some sort of mechanism for the user to be able to figure out what exactly went wrong and maybe possibly correct it would be very useful. So you can also use it for diagnostic information for various things, not just from application, like hard application errors, but things where the system is in a consistent situation. state where your application can deal with a particular problem, but it might not be ideal, and the system can provide log messages to be able to give information for the user to be able to tune their system.
Again, providing a simple audit trail to be able to figure out what the application is actually doing. One of the nice things about Mac OS X is you get to hide a lot of that complexity from the lower layers up through a nice user interface, but it's also nice to be able to keep track of that information so if a person does need to use it, it is available.
For ASO, we're planning on a log level policy very similar to what syslog currently uses. It's extensible in the manner for these are going to be arbitrary strings that you can use to set these log messages. But for right now, we want to be able to, as we migrate from syslog, be able to provide direct syslog compatibility.
At the top level, it's an emergency log level, typically not for most application developers. It's more for things like the kernel to say, "The system is actually completely stopped. I can't do anything more." It's a wonder this log message will actually get written at all. It's to the point where you're archiving the state of the world for future generations to come back and determine what happened. Alerts are for things that the user can handle and should immediately. Things like disk full or maybe my network disconnected and I'm absolutely dependent on that network. And I'm not sure if I'm going to be able to do that.
I'm not sure if I'm going to be able to do that. I'm not sure if I'm going to be able to do that. I'm not sure if I'm going maybe even but not quite the same as say like a hard device error, like I had a problem reading from this hard drive. Again, like these are some of the kernel level facilities and things like that that are higher level than some applications.
Application things start right around error where it's errors, you know, a hard fatal error where you were expecting something and that's not true anymore and you can't do anything but exit. This will provide at least a mechanism for you to provide the user with information of why you have to quit.
Warnings, again, non-fatal errors, things that the system or that your application may not necessarily have a problem dealing with, but that shouldn't be that way. And this allows the users, like if a configuration file is consistently missing, it has to be regenerated or something. Notices are for things that aren't errors, but you may still want to inform the user, like I'm starting a particular service, I've turned SSH on, I'm allowing remote printing.
Info allows you to log informational messages such as, "This user logged in from a remote connection," or "There's mail waiting for you," or any number of different things that could be useful for the user to know about. And then debug provides, again, a lower level so you can do things like lightweight tracing.
You know, syslog isn't a profiling utility, so it shouldn't be used as such. But to be able to provide the user like a set of clear things of how it got to where it did, and then as you as application developers, maybe you can do things like log analysis to figure out what necessarily went wrong.
There's a command line interface. Currently it's the same as what we currently have for syslog compatibility, a command called logger. So logger by itself with a string will log just that string at some default level priority. You can also do things slightly more sophisticated like specify the actual app name that gets logged and then the priority from the command line. So this would be just starting back up at log level notice with the application name backup.sh.
We're also planning a query interface from the command line that'll sort of obsolete using grep and that'll provide a little bit more functionality. Currently we haven't decided if this command is going to live within logger or as a separate utility. But the interface will be something like you can specify logger-q for a query.
And then you can do globs on the particular thing. So by default it would glob the message itself. So this would get both starting back up and started testing. Or you could say ask for things like give me all log messages from backup.sh with a priority level of notice.
The API, again, much like syslog, is rather straightforward. All the headers that you need are just going to be contained within ASL.h. There's very few basic data structures that you need. One of them is opaque struct, ASL_REF. It's returned by ASL_OPEN. If it's null, you have an error, and you really can't log the fact that you have an error.
ASL_LOG then calls with a ref with a tag that specifies the actual priority, and then takes a printf-style string that specifies the log message itself. And then it's always nice to be able to close your references if you don't need them anymore. But I also mentioned that you can specify ancillary data.
So using the ASL set command, you can do things like specifying the key data. You can add, say, an additional arbitrary string like, say, hostname to the log message. And then as you call ASL log, then that extra information is going to be logged with the next log message, and then that field is cleared. So that if I called ASL log again, you would not have the hostname be set.
How do we search? In addition to the ASL ref, we also need another opaque struct called ASL response. Again, opening the log reference is the same. And then you call ASL set, and then a particular key that you wish to search for, and then what that key should match as. So ASL level debug for the ASL key level will return all messages that match the level debug throughout your log archive. Then you call ASL search on that reference, and that returns an ASL response.
Now with an ASL response, you can iterate over calling ASL response next, and then to get the individual fields from the log message, you can say ASL response get on that response. So in this case, you'll be retrieving the sender, the PID, and the message, and then printing off them in a format similar to an old style syslog message. And then ASL response free cleanup references along with ASL close.
So, in summary, we have a consolidated programming interface for accessing log information, something that really quite hasn't been done before. Unified log information store, again, being able to query the single SQLite database as the backend, but also being able to specify other locations that the messages could possibly go.
The modular interface for receiving and storing messages. The API isn't planned for Tiger, but we're looking at things in the future for opening up things so you can write your own modules for being able to deal with these messages, both input and output. And it's just in general a good debugging tool for users to be able to find out what went wrong with their system and if they could do anything about it. This time I'd like to bring up Jordan Hubbard for Beyond Syslog. Things that in system administration that go beyond the scope of just looking at logs.
Thanks. Am I on? Okay. So my portion of the talk might be titled Beyond, Beyond Syslog, because we realized in this session that we were going to have a lot of system administrators in the audience, and we figured this would be a good opportunity to talk about some stuff that we may not actually be able to have in Tiger, but are all works in progress. We're certainly going to try and get some or all of them done for Tiger, but no promises. But this is sort of the direction we're going in and the types of problems we're trying to solve.
So, one thing we've noticed that's interesting about Apple's evolving user base, if I can put it that way, is we're getting a lot more Unix people. And they tend to be system administrators, they tend to deploy in IT environments, they're deploying XSERs, which is obviously a new product for Apple, and somewhat out of the boundaries of the traditional desktop market. So, they're bringing to us a lot of interesting new problems, shall I say. that I'll cover here.
One thing, actually I'll say the number one thing they asked for is better package management. Yeah, so... Unfortunately, package management is actually hard. It's one of those really, it's not very deep, but it's a very wide problem space. And dealing with dependency tracking, dealing with undo, dealing with arbitrary scripts that might run, sort of post-installation scripts where you have some really strange piece of software that has to go off and perturb the system, maybe add users. Perhaps go configure something in the I/O registry. You never really know what an arbitrary package is going to try to do. So, to get it to do that in a way which is flexible and gives you some prayer of security is not an easy problem to solve.
So, as I said, one of the, well, these are some of the issues that we have with today's package management. And one is, by which I mean the .pkg, .mpkg files that the installer eats. And number one gripe, I can't install what I've installed. Or I can't back out. So if I've installed a software update, which for some unforeseen reason doesn't leave my system in the best condition afterwards, I'd like to be able to go back in time.
There's no dependency tracking across packages. So if you have several packages sitting in a directory together, and one depends on the other, there's really no notion for that at all. The only thing that Apple provides today is something called the M package, which is basically just an agglomeration of a package and all of its dependencies. And if you have two M packages which contain the same subcomponents, they'll both stupidly install the same subcomponents over and over again. Plus, of course, you have the overhead of downloading those same bits several times.
We have library receipts and the BOM or build of materials files as pretty much the only installed software database that you get. And obviously this is insufficient. It doesn't let you do queries, it doesn't let you at a glance find out what's installed on your system, when it was installed, and other useful things.
There's no file or package conflict checking, so if two packages claim the same file, that's just fine, which of course it's not. And upgrade support is really pretty weak with the current system. It basically just checks that you're, I think you're monotomically increasing inversion numbers and then splats the new files into place. There is no command line interface that I really know of for installing packages.
[Transcript missing]
So, for the future, we're not sure what the future holds here. We're continuing to back Darwin Ports just as a pretty good solution, but we're actually looking at cooperating with some of the Linux distributions which have also started to run into the same problem.
Some of whom are saying SRPMs are not really the right way to build software, it was a good solution for its time, but we want something similar to the FreeBSD Ports collection. In the sense that we want a massive tree of building recipes. It's important to note, end users may consume packages, but those packages have to come from somewhere. And so, creating the infrastructure and environment for building these packages and maintaining that is at least as important as coming up with a good package management system.
And so, we're looking at some interesting ways of doing maybe an end-to-end solution where you start with the end user, you start with an XML recipe which describes how to build it, and then as it goes along the food chain and gets built into some sort of destination route, and then it's finally packaged up, that same XML file simply has more properties added to it which describe the package metadata, and then that's bundled in as part of the package itself. So, you never actually lose the original build recipe.
So, some other issues. System configuration is still difficult. It's something that involves lots and lots of different configuration files scattered around the system. You're never even sure at any given time what daemons are configured or will launch in response to some stimulus. And then of course once you know which daemons are out there, you have to learn their configuration formats. And it seems like every single daemon in the system has invented its own. Apache's got httpd.conf, Samba has smb.conf. You name a daemon, I'll name you a unique configuration format which applies to that daemon and that daemon only.
Different services also have different semantics. So in some cases you can edit the configuration file and then you need to go tell the daemon that its configuration file changed by sending it a special signal. In other cases there's a management command like Apache CTL which you need to run to restart it in some sort of civilized fashion. In most cases there's nothing. You go and you find the daemon in question, you shoot it down and you restart it and you hope it restarts and didn't leave any state files lying around. Which annoy it the second time it comes up.
Basically a reboot is the only assured method for restarting it properly. Which is by the way one of the reasons why some of the software updates reboot you. It isn't just because the kernel has changed, it's because it's altered some service and the only way to get the new service running is to reboot your system. That blows.
So, there's also no notion of a configuration parameter space that they can share or rendezvous in. So, in some cases, a number of daemons would like to share a common knob. It's a knob that controls some global resource, and they should all respect it. Or a global knob that says, "No daemons should run at all right now. Please, I'm doing a backup." Again, so there's no notion of that.
The closest thing we have to a shared configuration space is the sysctl table in the kernel. And obviously, that's pretty low level and only really--. It allows you to configure kernel knobs. It doesn't allow you to configure things up in user space, nor should it. And the whole remote management issue is basically nonexistent other than SSH-ing in and doing, if it has a command line interface, doing it that way.
So we've been looking at this problem very hard, and one notion that we came up with, which we've been prototyping and playing with, is the notion of a configuration API. And that's good for a lot of different reasons. And the chief reason I really like it is it finally gives you a path to go down if you want to create a new service and say, "I don't want to invent my own configuration file format.
There are quite enough, thank you." So, but I don't have any--there's nothing in libc or any of the standard Unix APIs that allows you to do that. And, okay, if you use core foundation, you can use CFPreferences, which is--which is a good start for this. It definitely eliminated a lot of dot files in your home directory.
But if you're not using CF, if you're sort of a Unix demon, then that doesn't really do you any good unless you want to drag all of CF in with--with the--with the--your set of libraries. So, we're looking at a fairly low level service--sorry, a fairly low level service which lets you set and get properties, query, walk the tree, something similar to an SNMP MIB, but a little bit with more structured data.
And so you have a configuration in a space, you have APIs for getting and setting this stuff, and you have a persistent data store, probably an SQLite database is our first choice right now. So, you have tables and rows and you can also bring up the SQLite tool to browse it if you want to do it that way.
But we're not married to that right now, it's just one potential data store. And the most important aspect of this is its application agnostic. Now you can write a Cocoa browser which lets you browse these things and tweak stuff. You can have services just programmatically go in and setting and get values.
You'll have a command line tool which for any demon which conforms to this--this new API will have its knobs registered. And then you can have a command line tool which for any demon which conforms to this--this new API will have its knobs registered with so you don't have to think of which specific command goes with a specific demon. And obviously if you also want to have a web front end to it for remote management or whatever, you could do that. Or you could write a proxy demon which authenticates to a remote client, maybe a Cocoa application again or whatever, and lets you do this remotely.
So, some of the issues that are going to make this a difficult problem is we have a number of existing configuration data stores. There's the SC Dynamics store stuff that ConfigD uses. There's surprisingly NetInfo, which still stores certain system parameters, so we need to merge those. And you need backwards compatibility.
One of the cardinal sins in the Unix market is to come up there with a bold new mechanism that completely changes all the rules and say, "But it's really better. You really want it because it's better." And they say, "I don't care. I can write Apache configuration files in my sleep, and I want to keep doing that.
Thank you very much." Or, "I artist a set of common configuration files across my Linux machines, my Mac OS X boxes." "My FreeBSD boxes, and I want that scheme to continue to work." So, we are looking at the notion of configuration file parsing plug-ins that map from one space to the other, which is not easy to do because there's an imperfect match for a lot of these configuration files.
And you, of course, have the issue of what happens when you update one or not the other. Do you try and back propagate the change if you make it to the centralized data store, but don't make it to the flat ASCII file? That it came from? Or do you just do it one way? Do you essentially do it as a migration tool? So, there's a lot of interesting trade-offs there, and we're looking at that very closely.
So, and you need to come up with a fairly high level API for this so that you can do something similar to PLIS where you have dictionaries and arrays and sort of arbitrarily complex data types so that people don't just use it as a binary blob data store and still have to pick the data apart themselves. But obviously you don't know the full range of configuration data that people are going to want to store so you need to pick a very flexible format. The P-list format actually isn't a bad one because it is quite flexible that way.
And of course you need things like transaction rollback. If I make a whole bunch of configuration changes and then it blew the system up, it would be, or somebody did it at 3 o'clock in the morning because you have five or six administrators all sharing the same duties, then I'd like to be able to come in and say, you know, at 6 a.m. and say, "Burt messed up the system." You probably wouldn't use that word, but anyway. And you want to be able to roll it back. So that's, and it's important. Or you can see the transaction history to see who it was whose fingerprints are all over this mess.
[Transcript missing]
So, some other design goals, and this doesn't just apply to the problem space I mentioned, but overall, anything doable via the UI should be doable on the command line. That's one of the number one bricks to the head we've taken over the last year or so.
Now, don't clap yet, because I didn't promise this. I said this is a design goal, and this is something we are getting at this point. Every time I go out to Hollywood or somebody who uses a lot of machines, this is like one of the number one things they hit me with. So we do understand this is a really important goal.
Some of the stuff that Apple has done is very UI-centric, like software update. And it took a little while to make it dual mode. So in some cases you have to go back and sort of do retroactive design. But going forward, we're really clear on the fact that you need to design that, that is a design goal from the beginning.
Another complaint we get is the UI did something and I have no idea what. You know, I clicked on something and some file was modified or some data store had an update made to it. Can you tell me what that was? I might also want to be able to reproduce that action on the command line elsewhere where I don't have that UI. So the UI can be a great way of showing you how to do something if it will show you how it did it. So that's another important design goal for everything that we're doing in the future. Documentation for command line tools turns out to be important.
So we're actually now doing this as part of our builds. We actually have a verification phase which goes and says, "Hey, Mr. Maintainer, your command line tool, you installed something in user bin and it does not have a man page." And it slaps you upside the head repeatedly until you do something about it. So.
So, something that we're also trying to be really cognizant of is that the road is littered with the decaying skeletons of those Unix folks who have gone before us and tried to make this work. It is a hard problem space to find the right balance for. If you get too hog wild in front-ending everything, the admins scream that you're dumbing things down, or that you didn't give them a button for exactly what they wanted to do, how dare you. And if you make it too simplistic, they won't use it.
They'll say, "What's the point? I can do this all on the command line. My fingers are hardwired for that command. What good is your UI tool?" And if you look at a lot of things that people have done in the industry, you'll see things on both sides of the number line.
So this is something that-- one of the reasons I wanted to cover these points here is these are all works in progress. These are all things that either are under investigation or are being prototyped, and this is an area where administrative feedback, people who actually at the rock face doing this stuff day in and day out, can really, really help us with. Because again, we're engineers so we'll go out and build an elephant where a mouse is called for or vice versa if you just leave us alone with the specs and no adult supervision.
But if we have real world scenarios and definite strong input, must do this, it must do that, it must not do this other thing, that is incredibly important to us. So I encourage you to send me email, contact us through channels, file enhancement reports. You don't just have to file bug reports when you go to Bug Reporter. You can also say, I'd like this. enhanced, like a new feature place, so I encourage you to do that.