Core OS State of the Union - WWDC 2009

Mac • 39:59

Speaker: Simon Patience

Unlisted on Apple Developer site

Downloads from Apple

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Good afternoon and welcome to WWDC. This conference is about you, the developer. And if you remember in Bertrand's last slide he said "the ball is now in your court." Now some people think that soccer is played on the field and others down on pitch but Bertrand seems to think it something that is played on a court.

[ Laughter ]

[ Applause ]

But I'm here to talk to you about Core OS, Core OS is the foundation technology for the Mac and for the phone. And it's a foundation that's built on the stable core of UNIX, a system that is renowned for its reliability and its robustness. And our UNIX system forms the engine of Mac OS X and the iPhone OS.

Now UNIX has a time proven architecture. It's a modular system with well defined interfaces between subcomponents inside the system that allows for the evolution of these subsystems to face new technology challenges. For example, the Scheduler is evolving to deal with the ever increasing number of calls and at the same time adding features like support for symmetrical multithreading or hyperthreads.

UNIX is also an extensible system, with things like switches internally for things like VFS where you can add more file systems of different types and then eventually add new file systems when the need arises or launchd which is replaced all the init style daemons but kept all the interfaces with the rest of the system the same, but adding new functionality to meet the challenges of an ever more dynamic world.

And it's scalable. UNIX has always been a strong SMP scalable operating system and this has stood us in good stead to meet the challenges of the multicore world. Now in the Core OS we are a very strong user of open source. And in fact there are a very large number of open source projects that we use, in fact over 400 projects in Core OS alone. Some of these projects we just take from the community, we compile them and use them. Others, we actively contribute to the technology pushing the technology forward and yet others still where we create the projects, host them on Mac OS Forge and drive the technology forward.

An example of that would be Bonjour. UNIX is also a portable system and that stood us in great stead when we moved from PowerPC to Intel. That transition was relatively straightforward for us because of the porting layer of the base of the system. But more so, when we started to work on the phone we could leverage the open source to provide a lot of the arms support that we could easily integrate in making that transition from the Mac to the phone at the Core OS level also relatively straightforward.

But it's not only the system that's portable, applications are also portable. The POSIX standard has been around for a long time and is very stable and allows your application to move around between systems but actually even if you go out beyond the Core OS and look at the system as a whole. There are many shared APIs between the iPhone and the Mac, and in fact with Leopard and iPhone OS 2.2, 80% of the APIs available to you on the phone are also shared with Mac OS X. That's a huge number.

And as we move forward to Snow Leopard and to iPhone OS 3.0, even more of this APIs are shared with the movement of things like Core Data coming from the Mac onto the phone and Core Location from the phone onto the Mac. And in fact we've added over 1,400 APIs into this common set between the two platforms. But it's not only APIs that are common. In Core OS, 85% of the code that runs on the phone also is running on the Mac, 85% and that number is only going to increase.

So we have a stable foundation at the core of our system built for innovation, that's UNIX and that forms the center of the Core OS, the foundation technology, from Mac OS X on iPhone OS. So I've told you a little bit about the foundation, so let's talk a little bit about technology. The first technology I want to talk about is Time Machine or more specifically Time Capsule. Now when you set up a Time Capsule and you do your first backup, typically the first backup that you do is around about 100 gigabytes.

That's a fair amount of data. Now we have improved the performance of that initial backup and made it 55% faster. That means that you could do your first backup over Wi-Fi in about 5 1/2 hours, that's easily done overnight. Now the interesting thing is that we didn't optimized Time Machine to do that. We actually optimized a whole host of technologies.

We took a slice across the whole system and made sure that each part of it was working optimally. In Time Machine itself we added support for doing overlapping IO, in Spotlight we increased the speed of indexing, in the networking code we used hardware checksums whenever possible and when not possible we've optimized the software checksums so that they're as fast as they can possibly be.

We've increased the speed of journaling in HFS, we're doing-- adding concurrency for disc images and for AFP we've reduced the amount of traffic that goes over the network. Now the great thing about this approach where you take a slice of technology to solve a specific problem is that the whole system benefits, all the applications will benefit from these optimizations. So that's Time Capsule. So the next technology I want to talk about keeping in the networking vein and the file system vein, are network file systems.

So let's start off and look at server performance. We've done a lot of work to optimize the performance of our various different network file systems and so for AFP, our standard file sharing protocol, we have increased server performance on the AFP bench, benchmark by 30% which is pr0etty good. But for NFS using the SPEC SFS 2008 benchmark we've doubled the speed of the server performance for NFS.

And for CIFS which is the Samba implementation, that performance again for the SPEC SFS 2008 implementation is 130% faster in Snow Leopard than it is in Leopard. But it's not only the server, we are all sitting on clients and their performance is important too. So if you look at file copy, taking a file and copying it across in a network volume, in AFP we've increased that performance by 30% too.

With NFS that's gone up by 50% and SMB, the Windows file sharing protocol, is 60% faster for doing a file copy in a share. But if you look at opening a finder window on a network file system, AFP has always done pretty well because it's got some great caching technology and we've taken that caching technology and put it into NFS and to SMB which has yielded a massive 5.6 times faster for NFS and 5 times faster for SMB when you open up your finder window on a network share, incredible performance improvements. And so that's on network file systems. The next technology I'd like to talk about, also in the networking vein is VPN.

But this time let's talk about the phone. Now, you have your phone and on your right there's your enterprise providing services like mail and web and so forth. But your phone is outside the firewall so you try and send the piece of mail and it won't go. So what do you do? So fortunately, your IS department has or IT department has configured your phone to use VPN so you go into Settings and then General and Network and VPN, and you find your VPN configuration and you select then you have to then put in your password.

Finally when you got all that, you manage to get a tunnel into your enterprise and then your mail can go out. So that's a bit of a challenge. I mean that's a lot of work to do in order to be able to connect your enterprise. So what we have done in the Core OS is to take a bunch of technologies that we had for various different things in different places and we've made a feature called VPN on Demand and how does this work.

Well you remember your IT department gave you-- the VPN configuration is called a configuration profile and describes which servers you access and so on and so forth. Well what they can do now, if they support Cisco IPsec is that they can give you, in your configuration profile, a certificate.

Now when you want to use your phone and you look up any address in the enterprise like mail.myenterprise.com which is your mail server, what we do is we see that that's behind the firewall and we send the certificate over. We initiate the VPN connection, open up the firewall, so your mail just goes straight through. There's no user interaction required.

And the great thing about this is because we're triggering this off the DNS lookup of the application this works regardless of whether you're sending email, whether you're browsing the enterprise's internal websites or whether there's applications that your IS or IT department has given you and put on your phone. Any network request that looks up addresses behind the firewall will trigger this behavior so they all just work.

So that's VPN. The next technology I want to do, again in the networking vein and again with the phone, is Hotspots. Now I'm sure we've all taken our phone into a cafe or an airport or something like that. Got out and tried to look at mail and what do we see?

We see a spinning cursor but we don't see any new mail. Now, if we were smart we'd gone off and looked at Safari then we'd actually see a login page that this is what they were asking you to do. So you type in your username or was it the phone number or maybe it was a home phone number or perhaps it was that account number. But anyway, whichever one what it was you type that in and the password if you can remember it and then you can get access to the Hotspot.

We don't think that's a particularly good experience either. Now these hotspots are what we call captive networks. These are the networks that you find in say airports, in hotels, in coffee shops, McDonald's, that actually allow you to associate with the network but then don't give you access to the internet as a whole because they want you to authenticate to allow you to say that you agree to their terms or have paid or whatever it is that they do.

So we've introduced some technologies to be able to handle captive networks. So what does that do? So when you associate with the network, we send a little test URL with the expectation that it comes back and says you're connected. But on these hotspots you're not and so it sent you back a login page and we spot that and so we mark your phone as being captive which is our little fence here. And as soon as we notice that you're captive what we do is bring on a web sheet. This is not the Safari as a whole. This is just a single page that provides you with a login page so that you can type in your credentials.

And once you've done that as in any Safari page we send it back to the Hotspot but inside there's a couple of strings, one containing your username and one containing your password and we spot that as it goes past. And we save it away. And once we've done that, we take away the web sheet. You're no longer captive so you can access the internet but you-- the key thing to notice is that you're back at the application that you were using at the time that you turned the phone on in the Hotspot which is a much, much better experience.

[ Applause ]

But it gets better. You want a second cup of coffee. So what happens now is that you go back into the hotspot, where, Starbucks obviously, test URL goes back, we mark it as captive, you have the fence around your phone. But many of these hotspots implement a protocol called Whisper and what Whisper allows you to do is to silently authenticate to the hotspot and so what we do is that we remember your credentials, we get them out of the keychain and behind your back we pass them over to the hotspot and then it authenticates and you're no longer captive. Now the thing to notice here is that there was no UI. You turned on your phone, it authenticated behind the scenes and you're good to go, no UI whatsoever.

And so that is our hotspot support.

[ Applause ]

I can tell you I like that one. The next technology again in the networking space we'll talk about Bonjour. Now I have a sleeping problem. I have a printing sleeping problem so there's my computer upstairs, it's the family computer, it's in the den and that is the printer connected to it, it's the family's computer.

Downstairs though in the study I have another computer which you know I work on and I'm sitting there doing a document or whatever it is, and I decide I need to print something. So there's my document going up. Unfortunately, this machine upstairs is asleep so it's like I don't have a computer, a printer, sorry.

So I can't print, that's a problem. I have a streaming problem. There is the computer upstairs again running iTunes. iTunes has this great feature where you can stream content to another machine and downstairs in the living room we're all rocking out on our Apple TV streaming the music from our family music collection up on the iMac. Unfortunately, when that's asleep you don't have any content.

And finally, I have a back to my Mac problem. So there's my machine in the study. I was working on this presentation last night and busy changing last minute things and the next morning a bit sleepy, I go to work and I forget it. There's my office, well actually, there's my office.

[Laughter] And fortunately, I have MobileMe and back to my Mac so it's no problem, connect to my machine at home, get the presentation, I'm all set for today. Except that the machine is asleep so I'm stuck Now there is a solution to this problem, it's a very simple solution.

It's leave your computer on but that's a really powerful solution and then not a good solution and we need a more energy efficient solution because basically it's good to go to sleep. We want you to leave your machines so that they can go and-- they can sleep and be energy efficient. So what's the solution?

The solution is the Bonjour Sleep Proxy. Now how does that work? Well in your house, you have a device that is on all the time. It's called your AirPort Extreme and it consumes very little power so this is a perfect machine to use in order to keep your services available.

So how this works is that when your machine goes to sleep, your computer goes to sleep, it sends all the Bonjour service records down to the AirPort Extreme. Say I have a printer. The AirPort Extreme immediately steals that IP address of the machine and as far as the rest of the network is concerned it is your computer with the printer.

So now in the den I'm trying to print my document. And so I connect to the machine that I thought was the computer with the printer attached to it. The AirPort Extreme will wake up my computer upstairs in the den. It immediately steals back the IP address from the AirPort Extreme. And so now my print goes through with no user interaction from me in the den and no one running upstairs to wake up the machine. That's the Bonjour Sleep Proxy.

And it solves all those problems.

[ Applause ]

So what do you need to do to take advantage of this? Well if you are using the Bonjour service and connection APIs you don't need to do anything at all, all this just happens by magic. If you're not, you need to start looking at these APIs and adopting them in your code as opposed to directly connecting services. But if you think about what we've done, we've taken Bonjour which is a local subdomain service discovery and connection protocol.

We've added wide area access so you can now get to the computer from anywhere and the Bonjour Sleep Proxy which means you can now get to any machine regardless of whether it's asleep or awake. So you can get access to any machine from anywhere in the world regardless of its state. So what you need to do is to think about how you can take advantage of that and innovate on top of our platform. And that's the Bonjour Sleep Proxy.

[ Applause ]

Now the next technology I want to talk about we've heard a little bit about this morning but I should go into a little bit more detail. GCD is a technology that is designed to address the problems of multicore. The big problem with multicore as we've heard earlier on is that our solutions are really SMP based i.e. program in threads.

But threads are hard. First of all you don't know how many to create. And the reason that you don't know how many to create is because you don't really know how many resources are available to you. And not only that, but even the resources that you have, the threads that you created, you don't really know whether they're blocked or not so it's really hard for you outside of the system to know how many threads you need. And if that wasn't bad enough we have these guys to deal with.

You have to work out how to lock and synchronize your application. How many locks do you need, what's the granularity, you don't want too many because that puts too much of an overhead on your application, you don't want too few because then you won't get the concurrency and you'll end up having contention. So having all these locks, now you need to deal with lock hierarchies otherwise you end up with this guy, deadlock. So our solution is Grand Central Dispatch. Now the goals for Grand Central Dispatch are actually relatively straightforward.

It's a programming model for multicores. What you have to do as an application is to break your application into blocks of work and we help you do that. You then hand those blocks of work and the OS gets the work done. It takes care of all those details that you didn't have enough information to be able to deal.

So we've designed a whole new multicore engine. It consists of an ultra fast user space scheduler to be able to switch between your blocks of work. It has optimal thread management, always the right number of threads to maximize your concurrency and never get the spinning pizza. We've simplified event handling dramatically and also we have a simple synchronization method based on queues and all this is a new powerful multicore engine for GCD. So what are blocks of work look like? Well, it's relatively straightforward as I say.

You just cut your application into these blocks, queue them up with GCD, GCD then will find out how many resources are available to it and schedules these little trains down the tracks and because it's part of the system, when more CPUs become available, GCD creates more threads, schedules more work and your application gets better performance without you having to do anything. Now we did this in our own applications. So if you look at Preview for example, it starts off with-- in Leopard which is on the left.

It starts off with a few threads, creates lot of threads when this particular benchmark, which is basically a repeated page load of complex pages, creates lots of threads to be able to deal with the load. When it's finished it tries its best to clean up but the reality is that there's a lot of threads left over, normally for things like event handling and so forth.

In Snow Leopard, we made Preview use GCD and as you can see it starts off with a small number of threads, it scales much better, and then when the work is done it drops back to a sensible number of threads, doesn't use up the resources of the system.

And we did this not only with Preview but also with Mail and you could see a significant difference there. Interestingly with Mail, with GCD we create more threads than we did with Leopard and the reason is that, as I say, it's hard to judge exactly how many you need, and we suspect that in Leopard we didn't create enough.

Similarly with iChat, the same story, better size to under load, saving resources when idle. Now, if you can imagine how this works when you apply this to every single application, most application is idle for a lot of their lifetimes. That means that the number of system resources with the threads, with their kernel representations has dropped dramatically.

These are resources that your application can now use. So GCD has a highly tuned engine. It's incredibly low overhead. The overhead to switch between a completed piece of work and the next piece of work is the order of a function call. That's amazingly fast. GCD is implemented with a completely lockless scheduler.

It uses wait-free algorithms and what that means to you is that you can now implement your application in a lockless way too. GCD scales up not only when you move it from a 2-core MacBook to an 8-core MacPro but actually running it on a machine with lots of cores that has load and then that load lessens. GCD will automatically expand up as the system becomes free for you.

It scales down to which means when the load goes back up on that system it scales back the number of resources that it uses to stop that bit from-- the system from contending and thrashing on resources. And in fact it scales down so far that even if you have only a single CPU available to your application, the overhead on the system is so low that it's still highly efficient. So we have a comprehensive solution. Not only do we have this multicore engine but we have a language extension to make it simple for you to create these blocks of work. We have an object-oriented framework. We have new system-wide APIs and some tools.

So let me talk through them. The language extension is very, very simple in order for you to create lots of work. Basically it's like a function pointer but instead of a star it's a little caret. And you place it in front of the curly braces of your section of code and that now defines a block which you have appointed to, that you can hand to GCD.

It's as simple as that. Now in Leopard we have an object-oriented framework which we introduced called NSOperation and NSOperation is a very high level framework that actually allows you to basically queue work up in a very similar way to GCD but it was a very high level framework. This was so successful that we've extended the functionality to allow you to solve more complex problems and we built in Snow Leopard the dispatch framework. And in fact, it's so efficient that we've implemented NSOperation in terms of dispatch so they're tightly integrated.

We have many new systemwide APIs. Now all these frameworks tend to generate events so whether it's fast system events or spotlight searches or network connectivity changes or reachability changes, all these things are generating events. All these frameworks have got additional APIs which allow you to define a block of code that you want to be executed when that event occurs and then the GCD queue that you want it to be executed on when that event triggers. No more event loops, much, much simpler event source handling.

And this is the way it works, right. There's your work coming in on your blocks and suddenly a file system event happens without you doing anything. It's just queued onto the GCD queue and then managed in exactly the same way as all your work. The system will scale up and scale down and then when the work is complete it tidies up.

Very simple event management story. We've also extended Instruments to be able to understand blocks and introspect what's going on. And there are three main areas, so this is the blocks view that allows you to look at a list of all your blocks. There is one that allows you to look all the queues and all the blocks that are attached to it. And there's another one, a core tree which allows you to see what an individual block is doing in any given instant. It's a very power tools set and helps you understand and introspect exactly what your application is doing.

Now a little earlier on we saw Richard Wright come and talk to us about Seeker and how he took a single threaded app and in a relatively straightforward way he managed to get multicore performance by using as little 12 lines of GCD, of code calling GCD. That was pretty astonishing but what happens if you already have a multithreaded app, what does that look like? Well I'd like to invite up on the stage Paul Carnine who's a senior software engineer at Telestream to tell us about their experience of taking a multithreaded app and using GCD.

Welcome Paul.

[ Applause ]

I'm one of the developers working on ScreenFlow and we've taken a quick look at optimizing ScreenFlow for Snow Leopard. Now some of you may know ScreenFlow is a screen casting studio application that offers a streamline workflow for being able to create a video of your desktop. It also has an editor which you can see up here and this editor allows you to create a very, very complex presentation using various sources such as your iSight, your desktop of course, images, and all of these-- any media you've got floating around in your disc.

Now you can export that to disc and the process to export inside of ScreenFlow is quite simple from a user point of view but from our point of view inside the code, it's very complicated. If you think about the pipeline that needs to happen from decoding all your sources, to going all the way through, to compositing them, and then sending them off to the H.264 code that can go into disc, it's very complicated. And we spent a great deal of effort under the current Leopard Operating System optimizing that.

We have some very low level optimizations, SSE all of this fun things and we have some very high level optimizations. We use a lot of pthreads inside of ScreenFlow during the export process to optimize the export and to do that, we have well over 2,000 lines of simply glue code.

So if we're talking about complications, it's quite complicated to write this kind of stuff. Now the great thing is that Grand Central Dispatch can do away with all of that code and in some instances you can just have one line of code that will actually allow you to take advantage of multicore CPUs. So we said, OK, let's take a look at this and let's try and improve even better our export inside of ScreenFlow.

So we did what anyone of you would do, we'd fire up Shark, we look to the Shark clocks, we'd fire up Instruments, we look at the traces, and we'd look-- we'd find our hotspots and then all we do is trace up the stack and try and find the best place to fix this. Now the best thing is that GCD is so simple that we're able to experiment.

With pthreads it's a little more difficult to find the best place to optimize. So with GCD we just throw in a change here and we might get a 3% boost. Oh, that's great. We'd throw in another change and we'd get 5 or 10% boost and remember this is with an application that's already heavily optimized and uses pthreads all throughout the export process. Now in the end ScreenFlow with the future release of ScreenFlow running under Snow Leopard we'll see up to two times speed improvement during export thanks GCD and the best part is that it uses a heck of a lot less code.

So I suggest that you guys go and check it out.

[ Applause ]

Thank you, Paul. So whether you're starting from a single threaded application and wanting to get some parallelism on the multicore machines or whether you already have a threaded application and you just need that turbo boost, then GCD is your answer. The next technology I want to talk to you about is 64-bit.

So we started our 64-bit journey in Tiger with the support for 64-bit UNIX appdlications. Now these applications were really your scientific or high performance computing UNIX apps, the non-GUI apps. In Leopard, we went all the way up the framework stack to support full GUI 64-bit Cocoa applications. And in Snow Leopard we have 64-bit applications, a 32-bit application is also on the same system but more interestingly, 64-bit system applications, all the major system applications are now 64-bit.

So we're done, right, that's it. Or not quite, the kernel is still a 32-bit program and what we've done in Snow Leopard is we built a 64-bit kernel. So why would we want to do that? Well, the reason is that physical RAM sizes are rapidly increasing. You can see we went from 1 or 2 GB back in 2005, 2006; 16 GB in 2006 and 2007, up to 32 GB in 2008 and today w e're up to 48 GB and very soon, as soon as they get enough dense actually, you should be able to put 96 GB of physical memory into-- in some of our configurations of Xserve and Mac Pro. So what does that mean?

Well in the kernel it takes 64 bytes, not very much, to describe each 4K page of physical memory. Unfortunately, with 96 GB of RAM that's 1.5 GB of kernel virtual address base, and so almost a third of the kernel's virtual address space just to describe physical memory that doesn't include any of the VM data structures, any of the processed data structures, any of your VNotes, any of the file data that you've got cached, any of the graphics memory that you have mapped, nothing. So basically, the kernel has run out of address space.

So that's why we did a 64-bit kernel. But what does it mean for your apps? Well the 64-bit kernel has the exact same user kernel interface which is not the read and write system call. That's actually the system call trap in getting into the kernel mode, as the 32-bit kernel which we planned awhile ago. So your apps are 100% compatible with a 32-bit and a 64-bit kernel which is good news but there's better news.

The system call entry point for 64-bit kernel is actually is 250% faster than Pit is on a 32-bit kernel. Now you've probably never see that actually if you try and benchmark the system because it doesn't happen that often and unless you use micro benchmarks. But the one that you will see is that there is a 70% improvemepnt in the performance of the user kernel memory copy and you see that because of things like read and write.

So the 64-bit kernel actually brings you performance improvements too. But there are things that you do in the kernel as well, there are KEXTs. Now as I said the 64-bit kernel runs on some Xserve and Mac Pro configurations but we expect that overtime to go across a broader set of machines and so consequently at Apple, all of our KEXTs are both 32-bit and 64-bit. But we need all KEXTs to be 64-bit, that's all of your KEXTs. Now we have removed a few KPIs but these were deprecated anyway so that shouldn't be a problem for you.

And then there are a couple of extra ones that we had to remove because they were really very 32-bit specific2 but there are new KPIs that replace them that are not dependent on the word size. The remaining KPIs are fundamentally unchanged and what we found in our experience of bringing KEXTs over to the 64-bit world is that most of them just recompile.

And so that's what we would like you to do to your KEXTs. So that's K64 the 64-bit kernel which completes our 64-bit story. Now one thing I'd like to do is to remind you of the calls to action that I've done today. For Bonjour, you need to adopt the Bonjour service discovery and connection APIs and to look at what we've done with Bonjour to allow you to get to any machine from anywhere in any state and see what you can do with that. GCD, give your application a turbo boost. Deal with the multicore situation, it's very straightforward and very powerful. And 64-bit, those 64-bit KEXTs, we need them to be converted.

And so that's the end of my wander through the technology. We have two great products, Mac OS X Snow Leopard and iPhone OS 3.0. Two great products, at the heart of which we have one great technology in the Core OS and that's it. Thank you very much.

[ Applause ]