Extensible Kernel Networking Services - WWDC 2001

Networking and Security • 1:02:25

The Mac OS X kernel has a powerful networking architecture that offers numerous ways to extend kernel capabilities. Learn how to exploit this architecture to develop advanced networking products such as firewalls, VPNs, and content filters.

Speaker: Laurent Dumont

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good morning. My name is Tom Weier. I'm the Network and Communications Technology Manager in Developer Relations. I'm going to introduce session 304, which is Extensible Kernel Networking Services. Hopefully many of you have made it over from the previous session in room A1. With that, I'd like to introduce Laurent Dumont, who is one of the CoreOS networking engineers.

Good morning. I'm gonna talk about the extensible kernel networking services. And in our case, we'll see that mainly what we're talking about here are kernel extensions, and how to create a kernel extension to add some networking functionality to the kernel. So as an introduction, the Mac OS X networking architecture is extensible.

So we get a mechanism where you can add firewalls or VPNs or content filters or network drivers. So the goal here is a little bit like in Mac OS 9, where you get a mechanism for extensions. Here you can add your networking extension without having to recompile the whole kernel like you would do in a regular FreeBSD type of environment.

What you'll learn in this session: we'll talk in some detail about the network kernel architecture, so what we have for Mac OS X and Darwin. We're gonna use both terms because everything here is in Darwin, so it's all open source. So you're really welcome to look at all this at the source yourself, from the Darwin kernel. And we'll see how to filter and intercept packets from different points in the kernel, from the socket layer and from the lower levels. And also how to add network interfaces and drivers in the kernel.

And one of the interesting things that you may learn here is some tips about the specificities of the Mac OS X kernel, and some of the changes coming from Mac OS 9 or even from FreeBSD, I mean, things that are different in Darwin from FreeBSD. So there are different behaviors and some tips here that you may be interested in.

[Transcript missing]

So networking on Darwin, to summarize this, it's currently based on FreeBSD 3.2, so we get the TCP/IP stack, socket, and a bunch of other services for networking from the FreeBSD 3.2 implementation. So if you're a little bit aware of this, it's not the latest version of FreeBSD, but it's a very solid base for what we're doing here.

So it's a very robust and proven implementation. FreeBSD is used in many places, in servers and things like that. So what that gives us is a stack which has a lot of life behind it, and a lot of problems have been flushed out already. So we inherit all those improvements and all those things in the Darwin kernel.

However, we got some Apple enhancements there. For plenty of reasons that we'll detail, the FreeBSD stack by itself was not completely meeting our needs. So we're trying to have a more dynamic approach where people can load and unload without rebooting and without recompiling. We also have support for MP.

The FreeBSD stack right now is not really MP-savvy, so we got some mechanisms here for multithreading and MP efficiency in the networking subsystem in Mac OS X. We also, as Vincent talked about a little bit before, tuned some buffer allocation, both for clients and for high performance, for gigabit and things like that.

So that's also some of the modifications we've done to the stack. And we get this famous Data Link Interface Layer (DLIL), which is a mechanism for extensibility at the bottom of the stack for adding protocols and adding network families. One of the biggest problems with FreeBSD is that when you get a new driver or a new class of driver, a new family of drivers, you more or less have to go pretty deep into the kernel and recompile the kernel to get that integrated in your stack.

With Mac OS X and DLIL, we have some mechanism to do this on the fly, basically. You can load and unload your drivers, and add a different type of family, without having to restart the machine. And as we said, it's extensible because of those plug-in architectures and the network kernel extensions (NKEs) we're going to talk about.

So the network kernel extensions, what are they? They add extensibility to the kernel networking. They are basically pieces of code that link against the kernel dynamically and become part of the kernel. So this is a big responsibility. You as a developer are going to make, I don't know, a filter, a new protocol or something. And using the network kernel extension, you'll link your code to the kernel code. As you know in Mac OS X, and it's been said in all the sessions, there is a pretty hard boundary between user mode and kernel mode.

In user mode, basically whatever you do, if you crash your application, you're not going to bring down the machine. When you run in the kernel, you're in protected mode and you have access to all the goodies and all the resources. It means that if your code does something wrong, it's going to panic and it's going to be a very bad experience. So kernel extensions in that sense are something that should be used only if what you're trying to do cannot be done in user mode.

If what you're trying to do as an application using networking can be done in user mode, without running in the kernel, it's always best to do so. Why? Because if your application crashes for some reason, you'll only bring down your application. The user may have to restart your application, but life will be fine.

If something goes wrong in the kernel, it's a reboot, and that's really something we're trying to avoid at all costs. So this is a first word of caution. We're going to have several of them during this talk. Kernel extensions have the potential to crash your machines. You have to be really careful about what is done there.

So what can you do in a network kernel extension? Basically you can do filter NKEs. Filter NKEs can see, modify, inject, or drop packets. So at those different layers we've seen before, and we're going to go through the different types of filter NKEs you can make, those filters will intercept packets at some point.

And you'll be able to do whatever you want with a packet. You can decide to let it go through. You can decide to modify it. You can decide to swallow it, duplicate it, whatever you want to do, you'll be able to do that through filter NKEs. So this is a very powerful mechanism.

However, as I said, if you're sending back a packet which is bad, it's going to crash somewhere up the stack. It's a bad thing. So you've got to be really careful about what you're doing. Network kernel extensions use the I/O Kit functionality. So through the I/O Kit mechanism, you can dynamically load and unload the NKEs.

And there is no need to reboot. This is something that you can do by yourself, or you can have your NKE be loaded through a set of dependencies. We'll see in the next slide. So again, you're running in the kernel. So big liability. Be careful about your pointers. Be careful about what you're doing. When you're unloading, you have some special things to do to make sure that you're leaving a clean state after your module unloads.

So, proceed with caution. Another important point is that there is no real API. The networking subsystem is wide open. You're linking against all the symbols in the kernel. So there is no guarantee of binary compatibility in the future. Just a simple example here. If we change a structure or, even worse, if we change a macro and you have linked with an old version of this macro, somehow this is going to work. You won't have any symbol resolution problem because this is a macro.

But the underlying implementation of the macro is going to be different from what is in the kernel. And your NKE may work for a while, but at some point when it's going to hit this macro, it's going to do the wrong stuff and the consequences might be harmful. You can potentially panic or even worse, panic two hours later because you freed the wrong stuff. So this is a problem here, but you get to be aware of this.
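
The macro hazard described above can be modeled in ordinary user-space C. This is only a sketch under stated assumptions: the two structures and both function names are hypothetical and do not come from the Darwin sources. The first function plays the role of a macro that was expanded inside an extension built against the old layout, so the field offset is frozen into the extension; the second is what the newer kernel computes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header layout at the time the extension was compiled. */
struct ex_pkt_hdr_v1 { int len; int flags; };

/* Hypothetical layout after a later kernel revision added a leading field. */
struct ex_pkt_hdr_v2 { int refcount; int len; int flags; };

/* Models an old PKT_LEN(p)-style macro expanded inside the extension:
 * the offset of `len` in the v1 layout is baked into the binary. */
int ex_old_extension_pkt_len(const void *hdr)
{
    int value;
    assert(hdr != NULL);
    memcpy(&value, (const char *)hdr + offsetof(struct ex_pkt_hdr_v1, len),
           sizeof value);
    return value;
}

/* What the newer kernel itself reads from the same header. */
int ex_new_kernel_pkt_len(const struct ex_pkt_hdr_v2 *hdr)
{
    assert(hdr != NULL);
    return hdr->len;
}
```

Given a v2 header, the two functions disagree: the old expansion silently returns the `refcount` field. Nothing fails at link time, which is why the talk recommends avoiding macros where a function exists.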

The binary compatibility is not guaranteed in the future because we may have to fix the stack. We may have to fix some things. So there are some mechanisms in I/O Kit that let you declare dependencies and that will make sure that your NKE won't load if the kernel version has changed, and things like that. So those are things that you need to look into when you're doing an NKE. Darwin is not BSD.

Darwin is based on BSD. So some of the rules apply. Most of the rules, if you're used to BSD, apply, but not all of them. And we'll go into some of them. We'll talk about things like funnels, or things like where you have to be a little bit careful with MP.

So don't expect your code that you just got from open source for the kernel part to just compile and run in Darwin. It probably will, but you need to be careful and look at some of the aspects we'll talk about, and get a pretty good eye on it and see what areas may give you problems in Darwin. So don't expect that if you just compile it and link it into Darwin, it will work. You may have to do a little bit more.

OK, so NKE dynamic loading. As we said before, this is provided by I/O Kit. So basically a kernel module, whether it's a filter, your protocol, whatever you do as an NKE, will have two required entry points: a start routine, which is called when your module is loaded, and a stop routine, which is called when your module is unloaded.

This is basically it in terms of the I/O Kit side of the NKE. All the rest is going to be the filter or whatever you're going to use as a mechanism to plug yourself inside the kernel. So the NKEs are not automatically loaded. You have to load them or invoke their loading through a startup script or through a dependency.

You can have a chain of dependencies that will require your NKE to be loaded. This is pretty much what's going on for some of the NKEs we've got in System/Library/Extensions as well, where the Apple NKEs live.

Some of them are loaded through scripts and some of them are loaded through dependencies. So there are three command line tools that you should be aware of, especially if you're debugging your NKEs. kextload is going to load your NKE. kmodstat will give you a list of all the NKEs.

kmodunload will unload your NKE. Your NKE won't load if there is some symbol collision with what's in the kernel. This is, I think, in the next slide. No. One of the things I wanted to add here is that when you're going to do an NKE, you should use some prefixing for your symbols, because you're gonna be in the same address space, in the same symbol space as the kernel. So if you declare a function which is, I don't know, TCP connect, right? You're gonna have a problem because this function is already inside the stack in the kernel, so your module won't load.

And also we would like people to somehow prefix their symbols with their own kind of prefix, I don't know, maybe their vendor code, the Apple OS type, to be sure that they're not gonna conflict with anything else or any other vendor NKEs that might load. So that's something to think about.

So different, we alluded to this a little bit, but different types of NKE. So four main types. Socket filters. Socket filters are at the socket layer, we'll go into details on this. So basically they let you intercept socket calls. Network protocols. Network protocols can be implemented as NKE also. And there is mechanism to add those network protocol to the Darwin stack dynamically.

Data link interface layer filters. So there's different types of filter here for more of a lower level type of filtering. And also network interfaces. If you add some different, a new family, a new type of interface, you might have to have some NKE to describe that and let the protocol and DLIL know about your new family. So we'll go into those.

So first of all, the socket layer. Just for people who are aware of the Unix or the FreeBSD way of things, the socket layer is right under the user to kernel boundary. And it's got a pretty interesting role here: the socket layer basically does all the copying of buffers in and out of kernel space. So when your application under Mac OS X is trying to do, let's say, a send call, it's gonna try to send something on the network.

The buffers live in your user thread in your application, and they need to be put into kernel memory. And this is done at the socket layer. The socket layer here is gonna take your buffers and put them into the socket buffer queues. We'll go into a little more detail about this.

And the copy to the kernel space is done at the socket layer. One thing to note, for people who are more aware of FreeBSD, is that here, the execution of your thread is gonna continue in the kernel. So your user thread will do the send. So we'll do that from that thread.

So on the other side, the socket layer is where the protocol stacks are gonna send the buffers. We're gonna come in on another thread, the input thread, and by some mechanism, we'll get the application thread awakened, and the buffer will be copied from the socket layer back to user space. If you remember what Vincent was talking about this morning, about those XTI send buffer sizes, well, this is where they are, basically. We got two sets of queues here, one for sending and one for receiving. And they have a configurable size.
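
From user space, the size of those two per-socket queues is usually adjusted through the standard BSD socket options `SO_SNDBUF` and `SO_RCVBUF`. The helper below is a minimal sketch of that; the function name is made up for illustration, and it assumes a POSIX sockets environment rather than anything specific to this talk.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Ask the kernel to resize one of a socket's two queues and report the size
 * it actually settled on. `which` must be SO_SNDBUF or SO_RCVBUF.
 * Returns the effective size in bytes, or -1 on error. */
int ex_resize_socket_queue(int fd, int which, int requested_bytes)
{
    int effective = 0;
    socklen_t len = sizeof effective;

    assert(which == SO_SNDBUF || which == SO_RCVBUF);
    if (setsockopt(fd, SOL_SOCKET, which,
                   &requested_bytes, sizeof requested_bytes) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, which, &effective, &len) != 0)
        return -1;
    return effective;   /* the kernel may round or clamp the request */
}
```

The value read back is not guaranteed to equal the request, since each system applies its own limits and rounding, so callers should treat the returned size as the truth.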

So the socket filters actually live in the socket layer. So they're a way to intercept packets, either coming down from the application or going up to the application. So for the different sets of calls, if you are aware of the socket calls, most of them have some kind of a mechanism here for filters.

So when you create an NKE socket filter, we'll have a mechanism to know that your filter, your code, needs to be called. And depending on which call you added your filter to, you will be called and you'll receive the packets and decide either to modify, drop, or, most of the time, do nothing.

But you have the flexibility to do something with packets coming in and out of the socket layer. So this is one way of putting in, let's say, a content filter. I'm talking about all this in this slide. So sockets, again, is the API for networking.

This is the native API. So everything is going through the socket layer. Socket is a glue between applications and network protocols; we talked about this. This is where you cross the kernel to user boundary. It sits above the network protocols. So the network protocols are not in the socket layer. The socket layer is gonna decide which protocol needs to be called.

A socket is kind of a file structure. It's following the Unix way, where everything is a file, more or less. So it's a subset of that. It's got some specific calls, but it's following some of the same system calls. You can do a read or you can do a write on your socket.
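
The point that a socket accepts the ordinary file calls can be shown with a connected pair of local sockets. This is a small sketch using standard POSIX calls (`socketpair`, `write`, `read`, `close`); only the helper's name is invented here.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `msg` into one end of a connected socket pair with write() and
 * collect it from the other end with read(), the same calls used on files.
 * Returns the number of bytes read into `out`, or -1 on error. */
ssize_t ex_echo_through_socketpair(const char *msg, char *out, size_t out_size)
{
    int fds[2];
    ssize_t n = -1;
    size_t len;

    assert(msg != NULL && out != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    len = strlen(msg);
    if (write(fds[0], msg, len) == (ssize_t)len)
        n = read(fds[1], out, out_size);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Between the `write` and the `read`, the bytes sit in the receiving socket's queue inside the kernel, which is the copy in and copy out that the socket layer performs.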

So we talked about this. A socket has a pair of packet queues for incoming and outgoing packets. So your packets are going to sit in those queues, especially on the way up, until basically your application is awakened and it's going to grab those packets. So as we said, Darwin sockets have plug-ins. Each of those calls, let's say SB receive, for example, is going to run through all the filters that are associated to it before it actually does the call. So you have the opportunity to just discard the packet if you want, or change it.

It's one way. Okay, so the socket filter NKE, yeah, we just talked about this. Oh yeah, the other thing is you have two types of socket filters here. You can have, which probably is the simplest one, a global socket filter. The filter that you create is gonna be invoked for each socket.

So every socket is gonna run through your filter. Even if you don't care, you'll say, "Okay, well I don't care." And you'll return and the packet won't be touched, and the processing will continue. There is a little bit more complex way of doing things, where you can do programmatic socket filters that will only apply to a certain type of socket. You know, let's say you just want something for web traffic or something. You have some way to decide that your socket filter will apply only to that.
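
The idea of running each socket call through a list of filters, some global and some selective, can be sketched in plain C. Everything below is a user-space model with hypothetical names and types; it is not the Darwin socket filter interface, and selecting by remote port is just one assumed way to express "only web traffic".

```c
#include <assert.h>
#include <stddef.h>

enum ex_verdict { EX_PASS, EX_DROP };

/* Hypothetical stand-ins for a packet and the socket it belongs to. */
struct ex_packet { unsigned char data[64]; size_t len; };
struct ex_sock { int remote_port; };

/* A filter may edit the packet in place and returns a verdict. */
typedef enum ex_verdict (*ex_filter_fn)(const struct ex_sock *, struct ex_packet *);

struct ex_filter {
    ex_filter_fn fn;
    int port;           /* 0 = global filter; otherwise only this remote port */
};

/* Run every applicable filter, in order, before the real call proceeds.
 * The first EX_DROP wins; otherwise the possibly modified packet continues. */
enum ex_verdict ex_run_filters(const struct ex_filter *chain, size_t n,
                               const struct ex_sock *so, struct ex_packet *p)
{
    assert(so != NULL && p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (chain[i].port != 0 && chain[i].port != so->remote_port)
            continue;                    /* selective filter, other traffic */
        if (chain[i].fn(so, p) == EX_DROP)
            return EX_DROP;
    }
    return EX_PASS;
}

/* Global filter that does not care: the packet is left untouched. */
enum ex_verdict ex_ignore_filter(const struct ex_sock *so, struct ex_packet *p)
{
    (void)so; (void)p;
    return EX_PASS;
}

/* Selective filter: swallow empty packets, upper-case the first byte. */
enum ex_verdict ex_web_filter(const struct ex_sock *so, struct ex_packet *p)
{
    (void)so;
    if (p->len == 0)
        return EX_DROP;
    if (p->data[0] >= 'a' && p->data[0] <= 'z')
        p->data[0] = (unsigned char)(p->data[0] - 'a' + 'A');
    return EX_PASS;
}
```

With a chain holding `ex_ignore_filter` as a global entry and `ex_web_filter` bound to port 80, a packet on a port 25 socket passes through unchanged, while a packet on a port 80 socket is modified or dropped.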

You have to register your NKE handle with Apple. For people familiar with Mac OS 9, those are the OS types, the vendor types. So that lets us identify the socket filters, and there is no collision when you insert your socket filters. Your socket filter can be run after some other vendor X socket filter, and you don't know that. So we need to have some way to identify all socket filters. So you need to use the NKE handle. So see DTS to get your handle there.

And as we said, an example of this is a content filter where you're going to decide, based on your own criteria, what you want to let go through in those packets that you receive each time you get a call. So either you change them, you swallow them, you duplicate them, whatever your content filter or your socket filter is gonna do.

So important points for the socket NKEs: there is no built-in reference tracking. What we mean by this is that in your NKE, you need to keep track of where your socket filter is inserted. If you're inserted in 15 different sockets, you need to know that, and basically what it means is, if somebody asks you to unload, you've got to keep track of those insertions.

And I'm gonna explain briefly what's going on, that when we run the socket filter code, basically there's a pointer to your socket filter handler where you're gonna receive packets. And if you don't correctly remove all this when you unload, we're gonna call into some code that doesn't exist, and guess what?

It's gonna panic. So it's all your responsibility to take care of this, and to refuse to unload if you have to. It's pretty reasonable, if your socket filter is in a state where it doesn't know or for some reason cannot really unload, to refuse to unload. People won't be able to unload your socket filter, but it's much better than panicking five minutes later.

So that's something you need to be aware of, and as a warning, you're in the kernel, it's pretty much wide open. You have wide open access to all the structures, all the things, but you're part of the kernel. You're just a function. If you're not there anymore, you've got to do the right stuff to plug the hole, basically, and don't leave any pointers and things like that hanging around. Because that's gonna be a pretty bad user experience. So that's another word of caution about using kernel extensions: you've got to be really careful.
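
The bookkeeping the speaker asks for (remember every socket you are inserted in, and refuse to unload while any insertion remains) can be modeled in a few lines of C. This is a user-space sketch under assumptions: the `exvnd_` prefix stands for a made-up vendor prefix, the fixed-size table is illustrative, and plain `0`/`-1` results stand in for whatever status codes a real stop routine returns.

```c
#include <assert.h>
#include <stddef.h>

#define EXVND_MAX_ATTACH 16

/* Hypothetical bookkeeping: sockets that currently point at our handler. */
static const void *exvnd_attached[EXVND_MAX_ATTACH];
static size_t exvnd_attach_count;

/* Record an insertion. Returns 0 on success, -1 if the table is full. */
int exvnd_filter_attach(const void *socket_id)
{
    assert(socket_id != NULL);
    if (exvnd_attach_count == EXVND_MAX_ATTACH)
        return -1;
    exvnd_attached[exvnd_attach_count++] = socket_id;
    return 0;
}

/* Forget an insertion once the handler is unhooked from that socket.
 * Returns -1 if we were never attached to it. */
int exvnd_filter_detach(const void *socket_id)
{
    for (size_t i = 0; i < exvnd_attach_count; i++) {
        if (exvnd_attached[i] == socket_id) {
            /* Fill the hole with the last entry to keep the table dense. */
            exvnd_attached[i] = exvnd_attached[--exvnd_attach_count];
            return 0;
        }
    }
    return -1;
}

/* Stop routine of the module: refuse while any socket still references us,
 * so the kernel never calls into code that has been unloaded. */
int exvnd_module_stop(void)
{
    return exvnd_attach_count == 0 ? 0 : -1;
}
```

Refusing in `exvnd_module_stop` is the conservative choice the talk recommends: an extension that cannot be unloaded is an inconvenience, while a dangling handler pointer is a panic.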

Okay, so now the second layer, when we looked at what basically the networking in Darwin is: the network protocols. Two examples that come to mind here are TCP/IP and AppleTalk. So that's something which is pretty close to what you'll find in FreeBSD; the way to register protocols and add protocols is pretty much the same. We get some mechanism to let you do that dynamically, add and remove dynamically your protocols and your domain. So this is the second layer we're gonna talk about.

So what's important to know here is that a domain defines a protocol family. One of the big examples here is PF_INET. This is where all the TCP, UDP, and raw IP live. Another one that you can think about that we have in Darwin is PF_INET6, which is for the family of IPv6.

And so this is something that is pretty much covered in a bunch of BSD books, the protocol family, and how the protocol handler works. So this is what we have in Darwin here. From the socket layer, we're going to decide which protocol handler to call, depending on the type of socket and the address family you put in your socket.

And if you have TCP and you do a connect, we're going to call TCP connect. And this is done through a structure where you can add your own protocol. If you come up with the next killer protocol family that's going to replace IP, this is where you're going to add it.

So if you do that, please do it and put it in Darwin. We'll be happy to take it. So this is extensible. Same thing. You can add your own protocol. And there is a mechanism to declare this from your NKE. Again, you can remove that. When you remove it, be careful. Clean up after yourself. Otherwise, a very bad thing will happen and the kernel will panic.
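
The dispatch described here, where the socket layer picks a handler from the address family and socket type and where protocols can be added and removed at run time, can be sketched as a small table. This is a user-space model only: the structure, the function names, and the numeric family and type values used with it are hypothetical, and the real BSD domain and protocol switch structures carry many more fields.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PROTO_SLOTS 8

typedef int (*ex_connect_fn)(void);

/* Hypothetical, much-reduced protocol switch entry. */
struct ex_proto_entry {
    int family;              /* a PF_INET-like constant */
    int type;                /* a SOCK_STREAM-like constant */
    ex_connect_fn connect;   /* NULL marks a free slot */
};

static struct ex_proto_entry ex_proto_table[EX_PROTO_SLOTS];

static struct ex_proto_entry *ex_proto_find(int family, int type)
{
    for (size_t i = 0; i < EX_PROTO_SLOTS; i++) {
        struct ex_proto_entry *e = &ex_proto_table[i];
        if (e->connect != NULL && e->family == family && e->type == type)
            return e;
    }
    return NULL;
}

/* Add a protocol at run time. Returns -1 if it exists or the table is full. */
int ex_proto_register(int family, int type, ex_connect_fn fn)
{
    assert(fn != NULL);
    if (ex_proto_find(family, type) != NULL)
        return -1;
    for (size_t i = 0; i < EX_PROTO_SLOTS; i++) {
        if (ex_proto_table[i].connect == NULL) {
            ex_proto_table[i].family = family;
            ex_proto_table[i].type = type;
            ex_proto_table[i].connect = fn;
            return 0;
        }
    }
    return -1;
}

/* Remove a protocol and leave no stale handler pointer behind. */
int ex_proto_unregister(int family, int type)
{
    struct ex_proto_entry *e = ex_proto_find(family, type);
    if (e == NULL)
        return -1;
    e->connect = NULL;
    return 0;
}

/* Socket-layer side: call the handler for this family/type, or fail. */
int ex_socket_connect(int family, int type)
{
    struct ex_proto_entry *e = ex_proto_find(family, type);
    return e != NULL ? e->connect() : -1;
}

/* Two toy handlers whose return values only identify which one ran. */
int ex_stream_connect(void) { return 100; }
int ex_newproto_connect(void) { return 200; }
```

Clearing the slot in `ex_proto_unregister` is the model's version of cleaning up after yourself: after removal, a connect on that family fails cleanly instead of jumping through a stale pointer.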

[Transcript missing]

some kind of a wire or wireless, but you know, they're moving some physical bits, if we can say that. Well, you also get some pseudo drivers, like tunneling devices and things like that, that just add stuff to or remove things from frames that have been generated somewhere else. So they can generate their own frames, but they don't go to a media. Somehow all those go to DLIL and register themselves and give some information about the type of framing they do and the type of specificities they have. So DLIL is here as a central point to handle those.

And added to this, if I go back to the previous scheme here, in between those we got two types of filters again. In between DLIL and the network protocols we got the protocol filters, which register with DLIL and say, "I want the IP packets for interface en0," you know, your built-in Ethernet. And you also have interface filters that register with DLIL and are lower level. Those guys will see all packets for one interface. Say you want to see everything coming from en0, or AirPort en1, or PPP, whatever, and you're going to put an interface filter here that will not be protocol dependent. You will get all your AppleTalk plus IP plus whatever packets here. While as a protocol filter, you'll specify that you're interested only in IP packets or AppleTalk packets, whatever you want.

So this mechanism here is different from BSD, and this is something you need to be aware of if you take something like a driver or something like that.

It's got to, or a pseudo driver, let's say, a tunneling driver, it's got to follow some different rules than it would in FreeBSD 3 or 4. It will have to declare some DLIL modules there to register with DLIL and do things differently. There are some examples in the NKEs and in the Darwin code about how to do that. AppleTalk is an example. IP does that. IPv6 does that.

So the interface layer is for, as I said, I/O Kit type drivers or pseudo drivers. So they attach to DLIL and basically tell the networking stack that they're available. So dynamically, your AirPort is turned on or something, and en1 is going to basically tell DLIL, hey, I'm an Ethernet type of driver, and this is where I am. So it's more of a flexible mechanism for attachment and detachment of interfaces on the fly.

In the same way, you load your new protocol, and dynamically it's going to tell DLIL, hey, I'm a protocol of this type and I'll take whatever SNAP ID for my packets. Now I want all those packets on this interface and that interface. So you'll see around the DLIL tag, which is the cookie or the handle that you'll get for filters and for protocol attachment to an interface. And this is what basically identifies your unique connection to DLIL.

[Transcript missing]

The DLIL filters. So as we said before, there are protocol filters on top of DLIL and under the protocols.

What they do is that they see all protocol datagrams for an interface. So the little trick here is that you register your filter per interface. So if you register for IP on en0, you'll see valid IP frames for en0. You won't see any AppleTalk packets there, and you won't see IP packets for en1, for your AirPort, let's say. So you need to register on both.

So they're kind of low level. However, you got the interface filters that give you even more flexibility, because basically at this point you'll see the whole frame. So the demultiplexing, looking at whether it's an IP packet or an AppleTalk packet, hasn't been done yet at the interface filter. So this is between the interface and the family here. You'll get access to the packets as they come from the driver.

You'll get access to the valid packets coming from the driver. You'll see full frames. And if you're trying to do like a VPN solution or a firewall, this is one place where you could put your NKE.

You could be doing that as a protocol filter or as an interface filter. It really depends what you're doing. But those are good points for getting all the traffic, while things at the socket layer are more for getting things specific to an application. Here you'll get all the traffic for all the sockets, basically, even if the packets don't go to any socket. If they're just dropped in the stack, you'll still see them at this point. It's before any processing by the stacks. So those are good solutions for VPNs and firewalls.
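
The difference between the two DLIL filter kinds comes down to what each registration matches on. The sketch below models that in user-space C with hypothetical names and constants: a registration with `EX_PROTO_ANY` behaves like an interface filter and sees every frame on its interface, while a registration with a specific protocol behaves like a protocol filter on that one interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { EX_PROTO_ANY = 0, EX_PROTO_IP = 1, EX_PROTO_APPLETALK = 2 };

/* Hypothetical inbound frame: where it arrived and what it carries. */
struct ex_frame { const char *ifname; int proto; };

/* One registration. proto == EX_PROTO_ANY models an interface filter;
 * a specific proto models a protocol filter on that interface. */
struct ex_dlil_filter {
    const char *ifname;
    int proto;
    int seen;                /* frames delivered to this filter so far */
};

/* Deliver one inbound frame to every filter registered for it.
 * Returns how many filters saw the frame. */
int ex_dlil_input(struct ex_dlil_filter *filters, size_t n,
                  const struct ex_frame *f)
{
    int delivered = 0;

    assert(f != NULL && f->ifname != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(filters[i].ifname, f->ifname) != 0)
            continue;            /* registered on another interface */
        if (filters[i].proto != EX_PROTO_ANY && filters[i].proto != f->proto)
            continue;            /* protocol filter, different protocol */
        filters[i].seen++;
        delivered++;
    }
    return delivered;
}
```

A filter registered for IP on `en0` sees nothing that arrives on `en1`, which is why the talk says a protocol filter that wants both interfaces has to register on both.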

Network interfaces: there is still the BSD ifnet structure. You have one per interface. It's I/O Kit based. You have a lot of things for Ethernet; there are a lot of samples, and the family shows how to do that for I/O Kit. So if you're doing an Ethernet driver, or some driver that really talks to a media, you really need to look into I/O Kit, because those drivers don't live in those four layers.

They live in I/O Kit. However, if you're doing a pseudo device of some sort, or something which is a little bit in between, like PPP, you may have to do some work in I/O Kit for your driver, dealing with the media side, and also at the network interface layer, where you will have some DLIL work to get your stuff registered with DLIL and known by the network stack. So the networking subsystem is not part of I/O Kit.

But I/O Kit has some way to basically give the packets and call some DLIL functions to make the interface well known. In one case, if you're doing a pseudo driver like a tunneling device, you may do that only in the networking subsystem. You don't have to go to I/O Kit. If all you do is take packets, add a header and do some encapsulation of some sort, and send them back to an Ethernet or another interface, you won't have to go through I/O Kit.

That's a thing to know. There is another thing we just mention here because there is some confusion: those filters we're talking about are at a different level from something that people who are coming from Unix may know, especially on the BSD side, which is BPF. BPF is really an I/O Kit kind of tap.

It's a standard way, if you're not aware, coming from FreeBSD, to do things like sniffer type applications, network traffic analyzers, and those kinds of things. So those are taps from the driver, which will copy all the frames back to BPF and get them to your application, which is asking for BPF traffic.

The big difference with the DLIL filter is that in DLIL, when you put, let's say, an interface filter, you will get packets, and you'll decide to let those packets go through, or make a copy and do your own kind of tap functionality. But by default, there's one instance of the packet. With BPF, you get a copy of the packet which is made.

And also, and that's pretty true for Ethernet, opening the BPF device will put the driver in promiscuous mode, which means you will get all the packets seen by your interface, not just the ones for your MAC address and the various multicasts or broadcasts that you can get. You will get everything that is physically seen on your segment. So it's something you won't see from a DLIL interface filter. A DLIL interface filter will get only the packets that are valid for your address.

It's not gonna be in promiscuous mode. And again, there are some standard hooks in the I/O Kit network interface for this. So that's a good model to follow if you're building your own drivers. It's a neat utility to be able to use tcpdump. There's a bunch of services on top of that. And it's pretty little work to get the BPF support in your new driver. So we just mention it here because there's some confusion between those and the DLIL interface filters. They are not exactly at the same level.

So non-I/O Kit interfaces need to do the BPF support work themselves. That's it for BPF. Okay, another important thing here that we wanna talk about in the networking subsystem is the mbuf. So mbufs are the memory buffers that we use all over the networking subsystem to hold all network data. If you're coming from 9, you're pretty much aware of the mblks which are used in OT or the STREAMS modules. It's more or less the same thing in the BSD world.

What we do with mbufs is that we're gonna hold either the packets coming from the socket layer or things that are coming from drivers. So I/O Kit is gonna create some mbufs with the packets received by, let's say, an Ethernet driver and send them back up to DLIL, of course, and DLIL will route those packets to TCP/IP or AppleTalk, whatever. But those are all mbufs that are manipulated. So mbufs are interesting because, like mblks, you manipulate pointers to data. So once you're in the kernel, there is no more copying of data. Everything is done through mbufs.

So the drivers copy their data from their ring buffer to the mbuf (actually, it's an mbuf already), and they're passed up until they get to the socket layer. And then, when the application is awakened and gets its data, they will be copied back from kernel space to the user application memory. And they'll be released at this point.

So what's interesting in the mbuf is that, if we take the example of a packet going down, the example of a send, you're sending some TCP traffic, what's gonna happen at the socket layer is that your packets from user space will be copied in the kernel into some mbuf clusters, into some mbufs, and those mbufs will be in the socket queue, remember the socket queue we were talking about earlier.

And what we're gonna do to send the TCP/IP packets is that we'll add an mbuf for the header in front of a part of the data that you sent, and this will logically point to the data in your socket buffer. And this is gonna go down to the driver, and the driver will send this.

The driver will say it's done, but the data in the socket buffer won't be released until the data has been acknowledged by the other side of the TCP connection. So if we need to do a retransmit of this packet that we took from your data, we'll do that by prepending a new header and pointing to your data, but your data will be the same in there. So there is no copy here. It's only once we know that all the data has been acknowledged and we don't need it at the socket layer that it's gonna be released.

So that's something to know about mbufs. For you, as somebody who's gonna write a network kernel extension, you have a bunch of functions to access mbufs, to allocate mbufs, to manipulate mbufs. There is pretty much everything you can think of doing with mbufs, and mbuf.h in the sys directory is a good place to start looking.

However, there are a bunch of macros dealing with mbufs. Try to avoid using them if possible, because of the problem we talked about before: if we change the implementation underneath the macro, that may give you some kind of panic. So try to avoid the macros if possible.

The other thing to know, which is a little bit different from BSD, is that we have a different VM subsystem underneath and the way we allocate memory is kind of different. In FreeBSD, you're pretty much guaranteed that you'll get memory when you allocate an mbuf. It's not the case in Darwin. So be aware that your allocation of an mbuf can and will fail.

There are two modes for allocating an mbuf. You can ask for an mbuf with don't-wait, which means: give me an mbuf to handle my packet, my data, if you have one; if you don't, just return. This is what you use on the fast path, let's say on your transmit and receive paths. In that case, be warned that you may get, you will get, a NULL back. And probably the best thing to do in that case is to drop your packet and do whatever is appropriate there.

You can also ask for wait mode, but don't do that on the fast path, because you're gonna block the thread while we're trying to allocate memory, and that's not very good. So do that for things that are low bandwidth: when you start your protocol and need some mbufs up front, or something like that. But don't do that on the fast path. The rule is: do not block.

So mbufs, you've got to release them. There are some rules, but it depends on socket buffers and everything: you'll see that either you have to release them or somebody's gonna release them for you. There are no real preset rules.

Yeah, maybe some other thing about mbufs. No, I think that's it. Just the warning: be careful with mbufs, and be careful about the use of macros. And there is a command, netstat -m, that's going to tell you how many mbufs you're using in your system. This is a really good way to see if you have a leak in your NKE. If you see the number of mbufs in use going up, you probably forgot to release one somewhere. So you might want to look at that.

You might want to check that when you're debugging your NKEs. From inside the kernel, from GDB, you can look at mbstat. It's gonna give you some stats about the allocations, the number of drops, and things like that. And again, it's gonna give you some information if you forgot to release some mbufs in your NKE.

Another thing that we have in terms of kernel extensibility is kernel events. What we have is basically a new domain, PF_SYSTEM, which gives you a way, by listening on a socket, to see some kernel events. So it's going to report events from the kernel to user space.

And those are pretty low-bandwidth events. Usually the kind of events we have are things like: on your interface, the link is up, the link is down, things like that, or the IP address changed. You will receive an event if you connect to this socket and listen on it. This is used mainly by the system configuration. The configuration is pretty simple. You can also add your own events to this mechanism.

If you have, like, some specific driver, some specific driver family that you added, and you want to add your events and have an application listening to those events, you can do that. It's not meant for high traffic. It's just low-bandwidth stuff. So that's PF_SYSTEM.

There is another thing that we added in Darwin, which is the network driver domain, PF_NDRV. And I want to mention, for people coming from 9, that it has nothing to do with the 'ndrv' you know, the native driver from Mac OS 9. Here it's really the network driver.

What it gives you, from the socket level, is access to all the packets, to the raw packets. As we said before, if you can, try to avoid doing your protocol, or the things you're trying to do, in the kernel, for all the reasons we stated before. If you want to do your own protocol in user land, you could use PF_NDRV to get the packets there.

As an example, in Darwin, you can look at SharedIP, which is the name of the kernel extension we're using for doing the port sharing with Classic networking. It's an example where, basically, Classic listens on PF_NDRV sockets to get its packets and emulates the OT DLPI driver that way. So the DLPI driver is really talking through a socket to PF_NDRV. That's an example for you if you're trying to do those kinds of things.

Right now it works on Ethernet. Okay, funnels. Funnels are a mechanism that we introduced in Darwin. This is not something you'll find in FreeBSD. Why do we have this? Basically, as you know, we're an MP system, so we run on multiprocessors.

And what we want is to get good performance on MP, and the networking stack in Darwin, which comes from FreeBSD, is not completely MP safe, let's say. So funnels are a mechanism to give some mutual exclusion, to make sure that when we run in the networking code, coming from I/O Kit up to the socket layer or from the system call down to the packet level, nobody else is gonna be running in that code at the same time.

So we've got a mutex that we take from the socket layer or from the packet level, and that's gonna make sure that nobody else can be running in, let's say, the TCP code and doing something to the socket. One problem to think about: you're sending some data on a TCP socket, and at the same time we're getting a disconnect.

If we didn't have a system like funnels, in a multiprocessor environment you could be doing your TCP send while the state of your TCP connection is being modified by the incoming packet, and we don't wanna do that. We are not prepared for that. So the way to deal with this is to have funnels.

So basically, in the Darwin kernel there are two funnels: the network funnel, which is used by the network stack, and the kernel funnel, which is used by the rest of the BSD subsystem. If you remember the diagram from before, inside the Mac OS X kernel we've got the BSD subsystem, which is more or less networking plus file system. So the kernel funnel is used by the file system.

What it means is that, within the file system or within networking, you cannot have two processors running at the same time. However, with this mechanism, you can have one processor dealing with packets while the other processor is doing a file system sync. That gives us some good performance in servers and MP environments, where your Apache or AppleShare server can have one processor do some networking activity while the other is flushing stuff to the disk or doing some file access.

So the problem with funnels is that you need to be aware of them. And the rule I stated, that basically we have one lock at the top of the networking stack, at the socket layer (I mean, the system call layer), and one at the bottom, is not completely true. So we're gonna go into some of the details of what you need to do in your NKE to deal with funnels.

But that's a difference from FreeBSD, so that's something you need to be aware of. And the thing here is that you need to deal with funnels. You cannot just say, I don't care about funnels. Your system is gonna be having problems if you don't deal with them right.

Okay, so when to use them. You need to set the network funnel, to specify that you wanna work under the network funnel, in your module start and stop. Why? Because you're called by I/O Kit, basically, when your module starts. And I/O Kit is not running under a funnel.

I/O Kit and the Mach part of the kernel don't need funnels. They're already completely MP safe. So when you're called, you need to basically say, I'm gonna use the network funnel, and there are calls to do that. Same thing for the stop.

Timeouts are another place where you need to be switching funnels, or taking the network funnel. Why? Because the timeouts are called as a direct mapping from the Mach subsystem, and you're not under any funnel. So to run your code, you need to explicitly say, hey, I'm gonna run networking code, so I need to grab the network funnel first. And the same goes when you're doing things with threading: if you create a new thread, you need to explicitly say that this thread has to run under the network funnel.

Yeah, and it's a preemption point. Each time you try to grab the funnel or you leave the funnel, you can be preempted. Another thread in the kernel can run. So your state might have changed when you come back from getting a funnel or leaving a funnel. Just be aware of that.

And if you're familiar with FreeBSD and splnet and splx: in Darwin, they are more or less no-ops. I'm saying more or less because all they do right now is make sure that you're under either the network or the kernel funnel. So they don't do any active nesting or anything like that.

I mean, in FreeBSD, you would protect yourself for a critical section by raising splnet and saying nobody else can enter at this point. On Mac OS X, they are more or less no-ops. But if you're under the network funnel, nobody else is gonna get in there.

So that's it for funnels. That's something you need to look at, and look at our kernel extensions in Darwin to see how to use them appropriately. NKE control. There is a way, PF_NKE, to control an NKE from a process, from user mode. So let's say you inserted a socket filter or a data link interface layer filter, and you want to have some control over it: you can have a special mechanism, like a conduit, to control your NKE through this NKE control, PF_NKE.

The NKE manager is not loaded by default, so it's something you need to do. And we're kind of in the middle of changing that, so right now it's a work in progress. It's a character device. And there are some other ways that we encourage for talking to your NKEs. You can go through ioctls: you intercept ioctls and use that as a control mechanism for your NKE. Or also socket options. This is what we do: if you look at SharedIP, we use socket options as a control mechanism for the NKE.

So VPN, yeah, we talked about ways to implement a VPN. You could do that as a pseudo device, depending on the type of VPN you're doing. Or you could do that as a DLIL filter. It really depends on how your code is organized and which level you think is appropriate for you to plug in. Be aware that KAME IPsec is coming to Mac OS X. So if your VPN solution is using IPsec, right now KAME IPsec is in Darwin.

You can take a peek at that, and sometime in the future this will be part of the Mac OS X kernel. So you can build your own Darwin kernel with IPsec right now and use that as a base for your VPN solution if you're using IPsec. And talk to us if you're interested. We're really interested in knowing how we can help you with that.

Summary. I just want to go over this again, because the message here is about the rules for NKEs. You have to be really careful about your dependencies. You have an I/O Kit mechanism to say, "Hey, I'm linking against that kernel. I don't want to be loaded if it's version 15 of Mac OS X." You've got to keep track of your resource usage. Nobody's going to clean up after you.

You have to do it. Do not block on the fast path, or you're going to block the whole networking stack. You're part of the networking stack, so you just have to behave and follow those rules. You have to know your split funnels. Be really careful about that. Remember that timeouts, and anything that is coming out of the kernel funnel or the rest of the kernel, are probably not under the network funnel. So just check that, and be aware of binary compatibility.

In the future, as we said, IPsec and IPv6 are coming. They're part of Darwin, so you can look at them in the Darwin kernel. We're gonna be based on that. PPP extensibility: there will be a way to have plugins for PPP, say if you wanna have PPP over ATM or PPP over some other new cool media you just invented.

The NKE control via the socket API is in flux, so it's gonna change a little bit. And also in the future, we're planning to have, instead of funnels, some finer-grained locking of sockets and things, so we don't have to use the funnel mechanism. So that's pretty much it. I hope you got some information there. You have additional resources here. Mac OS X, of course, you've seen those. I'm gonna go fast through those. Darwin org and FreeBSD are also good points for information.

Resources: the TCP, the Stevens books that we talked about. The Design and Implementation of the 4.4BSD Operating System is also pretty interesting. However, be aware of those differences we talked about, like funnels, the DLIL, and things like that. It gives you a pretty good idea about what's going on, but it's not exactly what we have in Darwin. And there's the network kernel extension PDF file, which has very, very complete coverage of all the types of DLIL filters and socket filters you can do. This is where you're gonna dig into all the details about how to do your NKE.

And also the kernel extension tutorial from I/O Kit, which is gonna tell you how to build NKEs with Project Builder and what the rules are for dependencies and things like that. And tell us what you need from us. We're trying to make this extensible. We've got some mechanisms that we think cover some ground. So tell us if you need more or what you think we should change here. We're really interested in getting your feedback on this.

And the roadmap: this morning was the networking overview session. It's done already. And this afternoon, pretty interesting, is 303, network configuration and mobility. We'll see how those guys are using the events to get some state information from the stack. And Thursday morning, we'll all be there for the network feedback forum, which is in room J1, just next door. It's always fun. So with that, you know who to contact. Contact my boss, Vincent.