Graphics and Imaging • 59:15
Core Image performs image processing operations at blistering speeds to create spectacular visual effects and transitions. Discover how to use powerful Core Image filters in your application to enhance still images, process RAW photos, create video effects, and visualize scientific processes. Learn how you can create filters that harness the power of the GPU for your own custom algorithms. See how Core Image and Core Animation can be integrated to add stunning effects to your application's user interface. This is a must-experience session for developers of image enhancement software, video effects systems, and scientific analysis packages.
Speakers: Allan Schaffer, Frank Doepke, Ralph Brunner, David Hayward
Unlisted on Apple Developer site
Transcript
I'm here basically because we know that there are a lot of attendees here at WWDC for whom this is their first time. And so I'm just going to spend maybe about five minutes giving those of you who are new to Core Image a quick overview, very, very quick. And then we're going to bring the engineering guys up to dive very deep on it.
So the basics of Core Image. This is an image processing framework that we first shipped in Tiger, and now it's updated in Leopard. And the whole idea is to be able to do image processing very, very fast. It does this by leveraging the GPU that ships in all the machines that we support in Leopard. Under the hood it's actually using OpenGL to load fragment programs that encapsulate the image filters and the image processing effects that we want to do.
And so you know, the idea there is that there are some of you in the audience who have done image processing, you've written it yourself in the past, and typically done that work on the CPU. And perhaps you're doing something where you're just, you know, in a for loop running a convolution filter over an image.
But by moving that onto the GPU, you can essentially push all of the pixels through in a very parallel fashion, and they run on the back end with this fragment program. So it basically lets us do some very heavy lifting for you, and use the GPU to do the hard work.
Of course the whole pipeline is floating point, it's color managed, and there are a lot of image processing filters included in the bundle that you get in Leopard. And now you can also extend the base set. If you have a custom filter that you would like to write, you can do that, and then you bundle it up in a plug-in called an image unit.
So the basic idea here, just the very, very basic concept of course, is that you're doing per-pixel operations on an image. And so here on the left we have a picture of Copenhagen. We're running all of the pixels through a hue adjustment filter, which is basically, if you think of the color wheel, just sort of rotating the whole color wheel around. And so you know, blues kind of change in this case to oranges, oranges sort of rotate around to blues.
But you know, like if you look at the boats for example, there's a white boat and a black boat, and those are staying white and black in the final image, because you know, what are you doing there? You're kind of rotating a gray around different grays, right? So that's what a hue adjustment filter is doing.
But so the output of this of course is the image on the right. But it goes a lot further than that. The real idea is that the filters (there are over a hundred different filters, and you can repeat filters as well) can be chained together to kind of stack different operations on top of each other.
And so we start again the same way, original image of Copenhagen, we're running that through a hue adjustment filter. And then as a second step we take that image and run it through a blending filter, and that's how we're adding this sort of starburst effect onto the final image on the right.
But it actually works a lot smarter than that. It's not doing these step-by-step processes, sort of completing a whole image and then going through to the next part, and completing another image, and so on. These filters are kind of like recipes, and what it'll do is chain the filters together, compile them essentially into a single unit, so that each pixel only has to run through, you know, one combined recipe.
So by chaining these filters together, we still get the same final image, but it's a much more efficient approach. Now the reason why this is so important is because of the optimization possibilities in this case. So here we have just a lot of filters kind of going from a start image to an end image.
You know, same thing, hue adjustment, we're adding that blending filter, it looks like we have a pointillize filter being applied. But the thing is, let's just imagine that the fourth dot on the right was a crop filter. Then by optimizing this all together into sort of one big recipe, we're able to eliminate the work for the pixels that are going to be cropped out. There's no reason to run those pixels through the hue adjustment, through the blend, if they're just going to get discarded at the end of the chain.
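To make that concrete, here is a minimal sketch of such a chain in Objective-C. The filter names (CIHueAdjust, CICrop) and parameter keys are the shipping ones; the image path, angle, and crop rectangle are just made up for illustration.

    // Source image -- nothing is decoded or rendered yet.
    CIImage *image = [CIImage imageWithContentsOfURL:
                         [NSURL fileURLWithPath:@"/tmp/copenhagen.jpg"]];   // hypothetical path

    // Rotate the color wheel.
    CIFilter *hue = [CIFilter filterWithName:@"CIHueAdjust"];
    [hue setValue:image forKey:@"inputImage"];
    [hue setValue:[NSNumber numberWithFloat:3.14f] forKey:@"inputAngle"];

    // Crop the result; because evaluation is deferred, pixels outside this
    // rectangle never run through the hue adjustment at all.
    CIFilter *crop = [CIFilter filterWithName:@"CICrop"];
    [crop setValue:[hue valueForKey:@"outputImage"] forKey:@"inputImage"];
    [crop setValue:[CIVector vectorWithX:0 Y:0 Z:640 W:480] forKey:@"inputRectangle"];

    // Still just a recipe -- rendering happens only when this image is drawn.
    CIImage *result = [crop valueForKey:@"outputImage"];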
Okay? So that's just, that's the very basic idea of how Core Image is working, and now to show a demo of that and go much deeper in the session, I'd like to bring up Frank Doepke, who is going to take us the rest of the way.
( applause )
Okay, good afternoon. My name is Frank Doepke, working on the Core Image team. I hope you all had a nice lunch. Let's take bets on whether Paris Hilton's lunch was better. So, Allan already showed nicely how to use Core Image to demonstrate the effects of global warming on Copenhagen, although I heard it's raining there. I would like to dive a little bit deeper.
He gave us a nice overview, but let's look a little bit more at what we can really do with Core Image, and how all this stuff works. And I'm actually picking one of the demos that we have on your Leopard disk, it's the Image Unit UI demo, IUUI. So let's have a look at what we can do with this. And I would like to go back to the demo machine.
So this is a really, really small application. If you look at the code base, if you wanted to take that code and read it to your child to put it to sleep at night, you would need some extra reading material, it's really small. Plus it's riveting, so they would not fall asleep over it.
( laughter )
We are on the wrong machines, thank you. Ha, now you see the same thing as I see, good. Okay. So what this is, we have a little image here, and I already brought up now, oh let me do it again so that you see where it comes from.
We have a little filter browser, and now you can see I have a whole set of filters. We have all these different adjustments, we have blur filters, I can pick one, I can see a little preview of it here, Gaussian blur. As Allan said, there's a set of over a hundred. Unfortunately he's not up to date, it's over 125 filters that we have already.
And this browser is actually part of the new UI that we already provide for you. So I can see what the filter does, and now let's make some image adjustments. I'm not a graphic artist, so bear with me, I'm not the best one. And I want to do something that looks like an old photo, so let me start with something like sepia, always a favorite.
So I added it, and now I have a sepia filter I can adjust here on the image. Next I want to give it a little bit of a gloomy effect, and there it is. You see, just by typing, I find my gloom filter. Now it looks almost like a faded image already, and although there are multiple filters, I can adjust each of them, and you see this actually happening in real time, going through all the pixels. Now let me actually add, in between those two filters, a gamma adjust filter. Oops, if I can type here, there we go.
So now I have a nice effect, and it looks like an old photo that you'd probably find in your attic, from your grandmother or something like that. Now what I can do is actually export this filter, and let me save it, I can call this old photo.
And I can give it some description, I'll skip this for the moment, yeah. And on the sepia tone filter I want to export the input intensity so that you can use it later on. And I do the same on the gloom, and actually leave both with the same name. What this means is I'll take both of them and adjust them later on as one control.
And from the Gamma Adjust I also want to have the input power. So I'll export this now, and the nice part is I can just bring this into our library; we go into the Graphics folder, into Image Units, and save it right there.
So let's quit the Image Unit UI demo application for now. Those who've already worked with Core Image on Tiger probably know the Core Image Fun House application, this is our old-school way of doing things. And now I can open Copenhagen, which must be our favorite city today. This is the old way we did the browser, it was all done by hand; if you want to do it by hand, this is the sample to look at.
And now I can go into my stylize filters and I see old photo. I can apply this one. It looks like one filter that I just put together. And now you can see the intensity adjusts the sepia together with the gloom.
( applause )
Thank you. Okay. Instead I would like to go back to the slides please.
So how did we do all this stuff? So let's get to know Core Image a little bit more. Core Image is all about doing this stuff on the GPU. So we do it very fast on the graphics card. We have, as Allan already said, a full floating-point, color-managed pipeline, and I will go into some details on that later.
As I said, we have over 125 filters, so there's plenty for you already to play around with. And if you have filter ideas and you want to write your own effects, go ahead, write an image unit. And as you can see, you can make it available for everybody.
The key point about these filters is the filter kernel. And this filter kernel is actually architecture independent. That allows us, for instance, to do what we do here: if the GPU supports fragment programs, we run on the GPU. If this is not the case, we fall back to the CPU.
And there we can actually use SSE, we can use the Velocity Engine for the PowerPC chips, and we can also do, as you saw already in the keynote, the whole thing in 64-bit. So those are the key points about what Core Image is about. Now what do you need to work with Core Image?
First of all, you need an image. And that is really the object that represents what we have to render. Now, we often think of an image as a pixel bucket. That is not the case here; it's more of a rendering recipe. It really just captures all the steps that have to happen to render it later on.
And we have the filters. The filters actually have the processing kernels, and those are the ones that apply the effects. They typically have a set of parameters. Some of them actually don't, but most of them have an input image, and a radius, as we saw already in the demo.
And they produce one output image, and that's what you then want to draw later on. To draw it, you need a context. And that's a CIContext. This is where all the rendering of CI later on goes. It can be based on an OpenGL context, or a CGContext. There is something special about this Core Graphics context; I will go into some of these details later, especially for those who have already worked with it on Tiger.
Okay. Let's do it once more and step by step. We start with a CIContext. Can somebody behind the stage shut up? I hear some voices, and somebody's talking on the phone, that's not too good. Okay, so we start with a context, then we have an image, from a file or from data that you have in memory. Next we create a filter object, and if you need more than one filter you just repeat that.
And the next step is we need to set the parameters on the filter. So now you set your radius and your intensity, and stuff like this. And last step but not least, I have to click it once, there we go. We draw the whole output into the context. That's it. That's all you need to do to draw with Core Image.
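As a rough sketch of those five steps in Objective-C (assuming you already have a CGContextRef called cgContext and an NSURL called imageURL):

    // 1. A context to render into, here built on an existing CGContext.
    CIContext *context = [CIContext contextWithCGContext:cgContext options:nil];

    // 2. An image from a file.
    CIImage *image = [CIImage imageWithContentsOfURL:imageURL];

    // 3. A filter object...
    CIFilter *blur = [CIFilter filterWithName:@"CIGaussianBlur"];

    // 4. ...with its parameters set.
    [blur setValue:image forKey:@"inputImage"];
    [blur setValue:[NSNumber numberWithFloat:5.0f] forKey:@"inputRadius"];

    // 5. Draw the output image into the context.
    CIImage *output = [blur valueForKey:@"outputImage"];
    [context drawImage:output atPoint:CGPointZero fromRect:[output extent]];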
How does it work? One thing that's important to know is that Core Image is lazy. We all are, but Core Image is specifically lazy. The reason for this is we have lazy evaluation. This actually allows us to defer all the rendering until draw time. So when you apply your filter, we are not immediately rendering anything. Nothing is happening, so this is pretty much free, I would say.
With this deferred rendering, what we can do is we can concatenate this further. So we will look at all these filters, and then really see okay, we really only need to touch these and these pixels, and we can also combine some of these filters together to get a combined effect. So this is happening at run time with a just-in-time compiler.
And the benefits of course are, first of all, that we get way better performance, because we can do multiple things in one pipeline rather than doing it step by step by step. And it also allows way better precision, because we do not (inaudible) between any of the intermediate steps.
As we said, we do color management. So what does this mean? The whole pipeline is color managed. We go from an input image and bring it into our working space, in which we do all our rendering. This working space is Generic Linear RGB. It's light linear, and it has an infinite gamut. This allows us to do all the further processing without creating any banding because we run out of gamut. And then we go to the output space. In the diagram this looks like this.
You see a bunch of images basically coming in, all the filters doing the work. And in the end it gets put into the target output context. So you might set up that output context with whatever color space you want. And the images may have a color space attached to them, or if you have something that you rendered on your own, you might have to set that yourself.
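In code, that setup might look like the sketch below. The kCIContextOutputColorSpace option key comes from CIContext.h; the cgContext, bitmapData, rowBytes, width, and height variables are assumed to exist already.

    CGColorSpaceRef srgb = CGColorSpaceCreateWithName(kCGColorSpaceSRGB);

    // Render into sRGB on output; the working space stays light linear.
    NSDictionary *options = [NSDictionary dictionaryWithObject:(id)srgb
                                                        forKey:kCIContextOutputColorSpace];
    CIContext *context = [CIContext contextWithCGContext:cgContext options:options];

    // An image built from raw bytes carries no profile, so tag it explicitly.
    CIImage *tagged = [CIImage imageWithBitmapData:bitmapData
                                       bytesPerRow:rowBytes
                                              size:CGSizeMake(width, height)
                                            format:kCIFormatARGB8
                                        colorSpace:srgb];

    CGColorSpaceRelease(srgb);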
So we say we can do this really, really fast. Now you saw in the demo that was already pretty fast. But let's talk a little bit more about the performance. First of all, we have a little tool that we use internally to measure if we are really doing everything correctly. It's called CIBenchmark. And what we show here is, we did a little trick.
Normally Core Image will use, depending on how many cores you have available, the right number of threads and scale nicely across them. So we did a little trick where we can actually turn off threads and see how well we scale. And as you can see there, with the first thread you get about 8 megapixels per second, going up to about 33 megapixels per second when we really use all eight threads on an eight-core machine. Now how does the graphics card stack up to this? This was actually a run-of-the-mill graphics card, it's a 7300 from NVIDIA, nothing really too fancy, and it still beats the eight-core machine, and twice on Sunday.
So let's look at another filter here, we'll just pick something simple here. It's the ExposureAdjust, it's just a really simple filter. See, again we scale nicely over the eight cores; let's look at the graphics card, boom. As Steve would say.
( laughter )
Now CIConics, this is a really computation-intensive filter, it's (inaudible) bound. We compare that with the graphics card; it's still faster, but not by as much. So your mileage varies a little bit depending on which filters you use and how you combine them.
Now one part that we are proud of is that we are faster than ever, faster than we've been in Tiger. So let's compare the CIGaussianBlur specifically. It's now actually fast enough, even on our low-end platform like the MacBook, that you can run effects with the Gaussian blur in real time on the GPU, because we are over nine times faster than we were in Tiger.
( applause )
Thank you. So those were charts. Let's show it off. And now comes the tricky part. We had some technical problems with this machine, so let's pray that this will work. Okay, where is my mouse, okay there. And I'm missing something here. Yeah, there should have been actually something that's on this machine. (inaudible), sorry.
( laughter )
All right. So we have a little image here, let's apply some effects to this. And now what we are doing here is, you see this is now with one thread, basically on this eight core machine.
And we will now go up and make this a little bit faster by using two, then three, then four cores, and you see how this effect slowly moves around? It gets faster and faster the more cores we're actually using with our threads.
So just (inaudible), it's nothing that you have to do in your code; this is just something that we can show by turning off threads in Core Image. And you see it scales nicely, going all the way up pretty fast. Let's see what the GPU can do. There we go.
( laughter )
( applause )
Thank you.
- At one point, that was a better GPU.
- Yeah, in all fairness this is a high-end GPU, it costs almost as much as a small car. I don't know if that's quite true, but it is the Quadro card from NVIDIA, so this is really a high-end one. But even the 7300 was still faster than the eight cores.
So the one part that I showed you in the Image Unit UI demo application is that I was able to take a bunch of filters, chain them together, and export them as one new filter that we then used in Fun House, my old photo filter. This is what we call the CIFilterGenerator. This is new in Leopard.
It allows you to concatenate multiple filters and wrap them into one filter. And it can be stored so that you can reuse it as a macro. Now how does it work? It's all about connecting the filters. When you want to do this, you need to connect basically a source to a target filter. The source can actually already be an image; this is something that I didn't show in the demo. You can just provide a path, and then you have a default image. That's important when you use a filter that needs an environment map, for instance.
Now when I set that little checkbox, it means, okay, this input of this filter I want to later on export for the user to use. And this way I can also set my own parameters, and I can combine, as we showed already, the same input key on multiple filters and actually channel them into one, so I can control multiple aspects through one control.
You can set your own attributes. This is important when, for instance, you have a filter but its maximum value is way too high for your application. So you can actually set a new default where the maximum value is smaller.
And one thing that's important is you always need to declare an output image. As I said, every filter has at least an output image. And then you write the filter generator out; it's actually a plist document. And every application that supports image units will automatically support this new filter.
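A sketch of what that looks like with CIFilterGenerator; the methods are the Leopard CIFilterGenerator API, but the exported key names and the destination file name are just illustrative.

    CIFilter *sepia = [CIFilter filterWithName:@"CISepiaTone"];
    CIFilter *gloom = [CIFilter filterWithName:@"CIGloom"];

    CIFilterGenerator *generator = [CIFilterGenerator filterGenerator];

    // Chain: the sepia output feeds the gloom input.
    [generator connectObject:sepia withKey:@"outputImage"
                    toObject:gloom withKey:@"inputImage"];

    // Export the inputs of the combined filter. Exporting two keys under the
    // same name merges them into a single control, as in the demo.
    [generator exportKey:@"inputImage"     fromObject:sepia withName:@"inputImage"];
    [generator exportKey:@"inputIntensity" fromObject:sepia withName:@"inputIntensity"];
    [generator exportKey:@"inputIntensity" fromObject:gloom withName:@"inputIntensity"];

    // Every generator needs an exported output image.
    [generator exportKey:@"outputImage" fromObject:gloom withName:@"outputImage"];

    // Written to disk it is a plist; dropping it into ~/Library/Graphics/Image Units
    // makes it show up like any other image unit (file name is hypothetical).
    NSString *path = [@"~/Library/Graphics/Image Units/OldPhoto.plist"
                         stringByExpandingTildeInPath];
    [generator writeToURL:[NSURL fileURLWithPath:path] atomically:YES];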
Now, you saw that we have UI. As I showed already, the filter browser and all the stuff that happened in the filter panel in the Image Unit UI demo application, hence the name, there is now a way of actually using an automatically created UI for the filter. This was a big request that we got from those of you who worked with it in Tiger; if you looked at the old Fun House application, it had lots of code in there to create all these panels, and now this is all there for you. You actually get a view based on the filter with all its controls. You can configure the size, if you want small, medium, or regular size controls.
And you can also exclude certain keys. If you look in the (inaudible), all of these filters do have an input image, but it didn't show up in the panel because the input image already comes from the document. And if you write image units and you say, well, I really want to have my own look to it, you can provide your own branding as well.
And then we have the filter browser. The filter browser allows you to pick the filters nicely; you can get it as a view so you can embed it into your windows, or you can get it as a sheet or as a panel, just as you like. You see all the filters, you get the description. And one part that I didn't show: it works very much like the Font panel.
You can collect your favorites. If you have a favorite set of filters, you can put them into the favorites and find them very easily. And all of this is actually in the ImageKit framework. Since this is a little bit higher-level abstraction which depends a lot on AppKit, you have to go into ImageKit to find the headers for it.
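A hedged sketch of both pieces; viewForUIConfiguration:excludedKeys: and IKFilterBrowserPanel are the ImageKit API, but the configuration constants shown here (IKUISizeFlavor, kCIUIParameterSet, and friends) are from memory and worth double-checking against the headers.

    #import <Quartz/Quartz.h>   // ImageKit is part of the Quartz umbrella framework

    // A parameter view generated straight from the filter's own description.
    CIFilter *gloom = [CIFilter filterWithName:@"CIGloom"];
    NSDictionary *config = [NSDictionary dictionaryWithObjectsAndKeys:
                                IKUISizeSmall, IKUISizeFlavor,
                                kCIUISetBasic, kCIUIParameterSet, nil];
    IKFilterUIView *controls =
        [gloom viewForUIConfiguration:config
                         excludedKeys:[NSArray arrayWithObject:@"inputImage"]];
    // ...add 'controls' as a subview of your window's content view.

    // The filter browser, shown here as a panel.
    IKFilterBrowserPanel *browser = [IKFilterBrowserPanel filterBrowserPanelWithStyleMask:0];
    [browser makeKeyAndOrderFront:nil];
    NSString *selected = [browser filterName];   // name of the currently selected filter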
Now in general we have some new API. Also, for those who already worked with it, let's have a quick look over that. We did some refinements all over the place based on your feedback again, and we have now convenience functions for some of the common tasks, for instance to create an empty image, this was one of the things that was not so easy before.
Now we also have constants for all the common keys in the filters, so this allows you to just type kCIInputImageKey, and Xcode should do the code completion for you. That makes life a little bit easier there. And you can also now provide documentation for your filters. If you write your own filter, and you of course want to explain to the user what this filter does, you can actually provide a URL, and the user can actually see the documentation for the filter.
Now, clip in your bindings, we are not going skiing, but what this means is that Core Image is using key-value coding. And one thing that didn't work in Tiger, but works in Leopard now, is that you can observe the output image. What that means is, for instance, I have filter A, and I take its output image, put it into filter B, and then that output image into filter C. In Tiger, when I changed something on filter A, I would have to again take the output image, put it into filter B, take that output image, put it into filter C. You don't have to do that any more. In Leopard it will automatically observe that something changed on filter A, it propagates all the way through, and filter C is the only one where I have to observe the output image and draw. Makes life much, much easier that way.
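For example, a chain A to B to C can be wired up once with Cocoa bindings; a sketch, assuming filterA, filterB, and filterC are already configured CIFilter objects:

    // Wire the chain once; in Leopard, changes on filterA propagate automatically.
    [filterB bind:@"inputImage" toObject:filterA withKeyPath:@"outputImage" options:nil];
    [filterC bind:@"inputImage" toObject:filterB withKeyPath:@"outputImage" options:nil];

    // Observe only the end of the chain and redraw when it changes.
    [filterC addObserver:self
              forKeyPath:@"outputImage"
                 options:NSKeyValueObservingOptionNew
                 context:NULL];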
So this allows you to always automatically update your filter chain. And as I said, we have new filters; let me just showcase a little bit of what we have here. This is just a little overview, and there might even be some more coming. So try those out, play around with them, and I'll give you a tip on how to try them out in a moment. Now, I already mentioned that we can either create an OpenGL context or a CGContext as the base for CI.
With Quartz GL, which is new in Leopard, we have actually something special on the CGContext. In the past when you used the CG-based context, what happened was CI did all its rendering up on the graphics card really spiffy and fast, but since CG was of course bitmap based, it had to read all the stuff back, and then composite in the CG space.
Now this reading back from the graphics card does not really come for free, it's actually relatively costly. When you use Leopard and your application opts in to use Quartz GL, you can create a CGContext, you don't have to write any OpenGL code, and everything stays on the GPU.
And we can show this already: if you look at the HazeFilter sample, the TransitionSelector, and even the Exposure sample, those three have already been converted by actually just using, in this case, the QuartzGLEnable key in the Info.plist; there's also a way to do that on a window-by-window basis. Now let's see what the performance difference between those two actually is.
So on the left side you see, and I just did this as a quick test with the CIHazeFilter, it took about 150 milliseconds to render the old way, when I had a CGContext and used a CIFilter. If I turn on Quartz GL and opt into this, it takes four milliseconds. A little bit of a difference there.
So this is it. There are also some tools that make life a little bit easier for you when you develop with Core Image. First of all, debugging. We all have to do that at one point. And for that we can now use Quartz Debug. It knows which filters get executed, and you can actually see which filters are in there. Sometimes you might be surprised, there might be a filter in your pipeline you didn't think you had in there.
And you can see how long it takes; especially for those who do video applications, you need to find out if everything happens in real time. You can figure out, okay, what is the time spent on those filters. So look at Quartz Debug for debugging. Next we have a widget. This is actually coming with Leopard. And it's in the developer tools, I think right now it's installed under Extras, Core Image. It should later on actually even be right in the Dock when you have Xcode installed.
You can browse the filters, like the (inaudible), you can find filters. But this is more tuned for developers, because you actually get to know the filters. You see all the keys for the filters, and all the parameters for those. And you can actually check, as you can see, well, it's probably a little bit hard to read, but there's a little line that tells you if a filter is available in 10.5 or 10.4.
You can try them out, because you see this preview down there? You can drag in your own images and actually see how that looks. That's the way to test out these new filters that we have. And one neat feature is that you can copy the code.
So if you found the filter that you like in the widget, hit Command-C, go back into Xcode, and just paste it in, and it gives you the code already, something like filter = [CIFilter filterWithName:...], with all the parameters in there. That saves you a lot of typing, which is actually very convenient.
Now as Allan already pointed out, you can be part of Core Image: write image units. We have lots of documentation for this; there's a whole set on how to do it, and it guides you through all the little steps. And we have an Image Unit logo program, so you can actually put this little icon on your box saying I actually support image units, or I provide image units. And now, this was all a little bit dry, so let's bring up mister nine espressos for the day, Ralph Brunner.
Thank you.
( applause )
I do know for a fact there are people from Copenhagen in the audience. So let me start by pointing out a bunch of places in the UI where we use Core Image essentially for visual effects. One you know very well from Tiger is the Dashboard ripple, a pretty obvious effect, so I'm not going to spend a lot of time on that. New in Leopard is the menu bar. It was mentioned in the Keynote, but what wasn't mentioned was why this is actually interesting. So let me try to explain that a little.
So what the new menu bar does, it has a stack of filters that examines what is underneath the menu bar. It computes an average color, and then based on that average color, starts to adjust values, like how transparent the bar is, how much light gets pushed in from the back, the inside glow, and all these kind of things.
Well, the purpose of this is to make sure it looks good on as many backgrounds as people can possibly have. You might think that is a bit excessive for something that's 22 pixels high, but the point I'm trying to make here is that it's kind of a first step to go beyond standard alpha compositing.
So by actually putting a bit of logic into the filter that measures what's underneath and adapts appropriately, a little bit of the knowledge that the designer would have applied before is now, you know, written in code. And I expect we are going to do more of that, and that is something to think about: how can you use filters to gather statistics and then react, to really make your visuals match.
A second example I'd like to make is that in Leopard, underneath the sheet dialog, there is a small and pretty subtle blur. I'm not sure you can see it in here; it kind of makes the sheet float away from the affected surface. And again, this is a pretty subtle effect, and there's a lesson there: most filters that you would use, most special effects you do, are just a small touch to kind of enhance the experience.
Sometimes, if you build an application that does something that's really in your face, think OmniDazzle, then the effect really has to be at the forefront. But for most applications you really want subtle touches that just improve things overall.
So with that I'm going to switch gears and talk a bit about more advanced use of Core Image. The example I'm going to explain now is essentially how you do color tracking on the GPU. What we're doing is we're going to build a mask based on a specific color in the image, we're looking for a specific color, then compute the center of gravity of that object, and then apply an effect based on the parameters we found at that location. And that makes much more sense if I actually show a demo, so let me do that first.
( laughter )
So what you see here is me waving a pink ball around, which is kind of silly, but you know, bear with me, you will see how silly this really gets. On the right side you see the mask that was found.
Essentially we're looking for pixels that look somewhat like the pink on the right side in the color well, and then we display that mask. I can actually overlay that mask to make this a bit more visible. So just the white area in the video is what the filter thinks the mask is. And based on that mask, we are computing the center of gravity of that object. And I will explain in a minute how that works.
And after that we are taking that position and feeding it into a compositing filter, which does the following. It takes a duck and puts it in the same place.
( laughter )
And you will notice that there is actually a depth, a distance estimate going on, so when the ball comes closer the duck gets bigger, and when it moves away it gets smaller. That is done by not only measuring the center of gravity of the object, but also measuring the visible area. And based on the visible area you can compute a distance estimate, which is essentially one over the square root of the area.
( applause )
So to summarize what's going on here. First, we mask the image by color, and that's really just essentially a Euclidean distance. You take the current pixel color minus the pink color, and measure the distance in, you know, three-dimensional color space.
And if it's further away than a certain threshold then we consider it outside the ball, otherwise it's inside the ball. One subtle detail there: before that distance computation, we take the pixel's color and divide it by the luminance. And the reason is the ball has shading on it because the illumination isn't completely uniform. So by dividing by the luminance you get a solid pink color, and that makes for a better match.
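Such a masking step could be written as a custom kernel in the Core Image kernel language and loaded with +[CIKernel kernelsWithString:]. This is only a sketch of the idea Ralph describes, not Apple's actual kernel; the function name, luminance weights, and the small epsilon are invented.

    static CIKernel *maskKernel = nil;
    if (maskKernel == nil) {
        NSString *code =
            @"kernel vec4 maskByColor(sampler image, __color target, float threshold)\n"
            @"{\n"
            @"    vec4 p = sample(image, samplerCoord(image));\n"
            @"    float luma = dot(p.rgb, vec3(0.299, 0.587, 0.114));\n"
            @"    vec3 c = p.rgb / max(luma, 0.001);\n"   // divide out the shading
            @"    float m = 1.0 - step(threshold, distance(c, target.rgb));\n"
            @"    return vec4(m, m, m, 1.0);\n"
            @"}\n";
        maskKernel = [[[CIKernel kernelsWithString:code] objectAtIndex:0] retain];
    }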
The next step is, for every pixel, we take its XY coordinate and multiply that by the mask, and store that, and this is kind of the funky part, we store the XY coordinate in the pixel as the red and green components. So this is where, you know, having floating-point images is really useful.
And so at this point you have an image which is zero everywhere except for the ball which has you know, little tuples of XY all over it. And then we use a new filter in Leopard, the CIAreaAverage, which computes the average color essentially of an image. In this case it computes the average XY coordinate over all the coordinates in that ball, which gives you the center of gravity.
Now another interesting detail there is that the CIAreaAverage filter, the output is a single pixel image, it's an image that's one pixel wide and one pixel high. And the reason for that is on GPUs it's really hard to get data from one stage to the next if it's not an image. So well, let's make it an image. Floating points should just work. So the output of that filter is a single pixel, and that pixel has in the first two components a coordinate, and in the alpha component the visible area, which we use for the distance estimate.
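Reading that one-pixel result back on the CPU might look like this sketch (coordinateImage, frameSize, and context are assumed from earlier; passing a NULL color space keeps the coordinates from being color matched, which matters for the point Ralph makes later):

    // Average the coordinate image over the whole frame -> a 1x1 CIImage.
    CIFilter *average = [CIFilter filterWithName:@"CIAreaAverage"];
    [average setValue:coordinateImage forKey:@"inputImage"];
    [average setValue:[CIVector vectorWithX:0 Y:0
                                          Z:frameSize.width W:frameSize.height]
               forKey:@"inputExtent"];
    CIImage *onePixel = [average valueForKey:@"outputImage"];

    // Pull the single float pixel back; NULL color space = no color matching.
    float result[4];
    [context render:onePixel
           toBitmap:result
           rowBytes:sizeof(result)
             bounds:CGRectMake(0, 0, 1, 1)
             format:kCIFormatRGBAf
         colorSpace:NULL];

    CGPoint center = CGPointMake(result[0], result[1]);   // center of gravity
    float   area   = result[3];                           // coverage, for the distance estimate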
And the last step is, well, the duck. So the last step has a filter which takes the video frame, and takes a secondary image, which is that 1x1 pixel image with the found coordinate. And then it scales and composites the duck appropriately. Oops. So if you would like to learn a bit more about that, there is actually a chapter in the upcoming GPU Gems 3 book that describes how this works. In particular, it also describes how the CIAreaAverage filter is implemented.
So if you want to do other statistics gathering on the GPU, beyond the min, max, and average filters which we added in Leopard, you can find out how to do that there. So if you're interested in these kinds of things, it's available in August; look in your favorite bookstore for the cover with the friendly face.
( laughter )
Okay. So a second example I would like to show is kind of similar to the first one, but now instead of tracking something by color, we are tracking it by geometry.
And that just came up as an internal usage for us, because we have to support a large number of cameras for (inaudible) support, which you'll actually hear about a bit later. And that involves calibrating these cameras. One part, and that's the only part I'm going to talk about here, is: once you have an image with one of these color charts, which have known color values that you can build your calibration information off of, the question is, you have an image, and where in that image is the chart?
( laughter )
So the goal is to find the chart. We need the location, the XY coordinate, we need the size, width and height, and we also would like to have the orientation, the rotation angle, simply because it's sometimes really hard to hold it perfectly straight, so it would be nice if the filter would handle this. And we want to do this by looking for geometry and not for color, because, well, we're using it for color calibration, so at the beginning color isn't particularly reliable. That's why we are tracking geometry. So with that I have a demo.
( laughter )
So let me stand in front of this camera, okay. That's by the way the only reason why I'm wearing this shirt.
( laughter )
So what I have here is this color chart, and I'm moving this color chart into the frame. Here is the chart, yeah, it tracks it; the size should work reasonably well, the angle works most of the time.
Yeah, one interesting thing here is there's not that much light here, so if I move the chart fast I get motion blur, and then the detection completely fails. Interestingly, even though there is actually no code in there for it, tilting it away from the projection plane still works reasonably well. But after about 20 degrees or so tracking will fail.
Okay. So one thing I would like to draw your attention to is that little quarter-circle histogram on the right side. This is essentially a histogram over all angles in the image. And because all of these squares have edges at angles that are essentially ninety degrees apart, as I move the chart you see this spike moving to different places.
And that gives us the rotation estimate. How that is done: essentially we're running a gradient filter, then a histogram over the gradients, and then we look for the spike in that histogram to figure out what the orientation of the chart is. Okay, with that I will go back to the slides.
( applause )
Okay. So first step. As I said, color is not reliable in this application, so the first thing we're going to do is convert to grayscale. And there's kind of the straightforward way, which is to convert to grayscale based on luminance, and that's what you're seeing here. On the left side you see the original; the middle image is based on luminance.
There's also a filter, which is probably the most simple filter we ship in Leopard, CIMaximumComponent, which converts to grayscale based on the maximum of red, green, or blue. And for this particular application, this is perfect. Because we really want to find these squares, and we want to differentiate between the squares and the gaps, and the gaps happen to be black.
And some of these colors on the chart are just saturated red, saturated green, saturated blue, so by using the maximum component I get an image which has more white or light gray patches, as you can see here, which helps the detection. So that's kind of the lesson here if you want to do these kinds of things: any domain-specific knowledge you have, exploit it, because it makes these things much easier.
Okay. So the next step is called a row projection. What's happening is we are computing the average over each scan line. There is a CIRowAverage filter in Leopard, so you can use that one. So what's happening is, we take this image and create an image which is one pixel wide and the full height of the original, and each pixel contains the average of one scan line.
Because showing a one-pixel-wide image is really hard on the projector, what I'm showing here instead is a luminance plot of that, you know, really narrow image. And what you're seeing in that luminance plot is there are 10 pulses in there. Essentially for each row of tiles you get a high value, and then you have a black gap in between them, so you get a low value, and so on, ten times.
So what we are going to do now is match a pulse signal. We know we have to look for 10 pulses in this 1D signal. It's essentially a kind of autocorrelation that we're doing here. We try out every start position and every pulse width. But because it's a 1D image, it's actually really fast to do this. And the best fit of those 10 pulses to the signal tells us where the chart is in the vertical dimension. So we know where it starts and how tall it is.
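A rough CPU-side sketch of that fit, in plain C, scoring one candidate placement of the pulses against the one-pixel-wide projection. The function and the equal pulse/gap simplification are mine, not the shipping implementation.

    #include <math.h>

    // Score how well 'pulses' bright pulses of width 'width', starting at
    // 'start', fit a 1-D luminance signal. Higher is better.
    static float pulseScore(const float *signal, int length,
                            int start, float width, int pulses)
    {
        float inside = 0.0f, outside = 0.0f;
        int   nIn = 0, nOut = 0;
        float period = 2.0f * width;            // assume gap about as wide as a pulse

        int end = start + (int)(pulses * period);
        for (int i = start; i < length && i < end; i++) {
            float phase = fmodf((float)(i - start), period);
            if (phase < width) { inside  += signal[i]; nIn++;  }
            else               { outside += signal[i]; nOut++; }
        }
        if (nIn == 0 || nOut == 0) return -1.0e9f;
        return inside / nIn - outside / nOut;   // bright squares, dark gaps
    }

    // The search then tries every start position and a range of widths,
    // keeping the (start, width) pair with the highest score.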
So the next step is called a column projection. But before we apply that, because we just gathered some information, we know the vertical extent of the chart, so we can actually crop the image and remove everything outside, which we know is not chart, so we're not really interested in it. Then we do the column projection, which is pretty much equivalent to what we did before; it's just an average over each column this time.
You will notice that in the luminance plot down there, the pulses and the gaps are now much more pronounced. And that's the result of the cropping: because we're no longer diluting the average with stuff that doesn't belong to the chart, we're getting a really clean signal. And we do the same thing again as before, we fit 14 pulses on this, and we find the start position.
And we also reduce the search space now, because from the previous pass we know how tall one of these squares was. And, well, they're supposed to be square, so the width has to be pretty much in the same range. So we only have to search a very small number of widths to match the pulse signal to this plot. Now, the autocorrelation also gives you kind of a goodness indicator of how well the pulses fit this chart.
And we can just use a threshold, determined empirically, to figure out that, well, chances are there isn't a chart here at all, it's just a local maximum that doesn't have any kind of pulse nature. Okay. And with that we're essentially done. We found the angle in the beginning and rotated the image to make sure that the row and column projections work. And we found the vertical extent, the horizontal extent, and the start positions. So now we have our chart.
Okay. So with that I would like to give some conclusions out of that experiment. First of all, why implement this as a Core Image filter? Because clearly you could write some C code that just goes over arrays and does the same thing. Well, some of the advantages you get out of it are that the Core Image runtime will go and build SSE code, Velocity Engine code, or fragment programs for the GPU for you, essentially off the same code base. So that's really convenient. Second, as Frank was already mentioning, it will scale to eight-core machines, and whatever gets released in the future is probably going to work too.
And another aspect is that you don't have to worry about image formats. I did this demo with video, which comes in as 8-bit-per-component YUV, but we actually use this with the data that comes from still image cameras, and that is 16 bits per component RGB, and the code just works, because Core Image will convert everything into the space that the filter expects.
The video playback via Quartz GL point here kind of seems innocent, but to me that's actually a really profound part. What you just saw was essentially AppKit, in the drawRect: function, grabbing a video frame and drawing it into a CGContext. And that is possible because of Quartz GL. As the video frame comes in, it gets uploaded to the GPU, and all the processing and all the rendering get done on the GPU.
And this is new in Leopard, we could never do that before. You can now actually draw text on top of video without having to deal with child windows and all those kinds of things that are really ugly. So, I mean, the drawRect: for this demo was literally 20 lines of code, and most of it was to draw the little bar charts on the side.
And the last point I would like to make is, yeah, mind the color matching. It turns out when I was debugging this, I had some issues, you know, trying to find bugs in my filters. And what I did, I used the best debugging tool ever invented, printf.
( laughter )
And I just took the value out of the orientation angle and printed that in the console, and well it was wrong.
And you know, when you moved the chart it went up and down properly, it was monotonic, so I'm thinking something was right. But clearly the scale was wrong, and I was wondering, did I forget to multiply by pi or whatever.
( laughter )
And it turns out it's really important to understand the color matching part of Core Image, because it tries to do the right thing in terms of processing images. But once you go and put angles or XY coordinates into pixels, you probably don't want to color match those.
( laughter )
( applause )
Thank you Ralph. Ralph did a great job of explaining how you can tackle some very complex image processing problems using Core Image. I want to talk about another one today, which is sort of an interesting problem. So imagine you have an RGB image, and it's missing 50 percent of its green pixels, 75 percent of its red pixels, and 75 percent of its blue pixels, and your job is to recover all the missing pixels.
Sounds impossible. But in fact this is a fundamental problem that needs to be solved to produce images from most of the image sensors we see today. Everything from disposable cameras to high-quality video cameras needs to solve this fundamental problem of how to recover all this missing data. As a bonus challenge, you need to give those images to the most discriminating eyes that are available, and they're going to try to see if there's anything missing.
A visual way of looking at this same problem is that while your eyes can see an image like this, what most Bayer sensors see is an image like this. And the idea is to be able to recover the full image in a way that's pleasing and undetectable to the user.
So this problem is generally known as de-Bayering, and Apple has been working on this problem since 2004. Our algorithms fully leverage SMP and GPU programming wherever possible. And we have some new developments coming in Leopard that give us some of the best results we've seen. So we're very, very happy with the direction we're going on this. And as always, we're very concerned about getting the best performance and the best quality to meet the demands of the most demanding users.
So let me talk a little bit about what's involved in RAW image processing. First of all, we need to be able to decode the file to get the actual Bayer sensor data. And then we need to be able to extract metadata from that file in order to handle it correctly. Then we do spatial reconstruction, which involves de-Bayering, interpolating the pixels that are missing, compensating for noise, and chroma blur and luma sharpening to make the image more pleasing.
Then there's a whole litany of color processing that's also involved: highlight recovery, adjusting exposure and temperature/tint, converting from the scene-referred nature of the sensor values to the output-referred color space that is desired for rendering. And lastly, part of image processing is making the image not just accurate, but pleasing. So there's an additional set of steps that it is often very desirable to apply to make the image look pleasing to the eye, increasing contrast for example.
So this is a very complex process, and it requires dozens of parameters in order to handle it correctly. And so for each camera make and model that we support, we have default parameters. We are also constantly improving our RAW processing methods, and so the system maintains the set of method versions that have been supported in the past, so that applications that need to reproduce old results can continue to do so. And lastly, as I mentioned before, Leopard has a new and improved method that greatly improves handling of moiré and high-ISO noise, and also adds broader support for the DNG format.
The last thing I want to say, though, is that even though we've done all this work to get good default values for all the cameras, the default parameters are never going to be perfect for all photographs or all photographers. There's a high amount of taste and personal preference involved in de-Bayering images.
And so ideally you want to give your users control over the rendering of an image to get the best results for them. This is the main benefit of shooting RAW: you can give your users this control, because the image has not already been processed into an in-camera JPEG.
In order to provide this as a feature that you can take advantage of in your application, we have a new filter in Leopard called CIRAWFilter, which is designed to give your applications the ability to give your users control over the RAW processing pipeline. It's also designed to give fast, interactive performance.
So here's roughly how this works in the process. You start out with a RAW image, either a URL or data, and today you can do this in either Tiger or Leopard using the Core Graphics APIs. If you use the Core Graphics APIs, we'll use the default parameters, for example for exposure and temperature adjustments. From that URL or data you create a CGImageSource. From that CGImageSource you can get a CGImageRef. And then that CGImageRef you can draw to the display.
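That default path is just ImageIO; a sketch, assuming rawURL and cgContext already exist:

    // Default RAW rendering through ImageIO -- no adjustable parameters.
    CGImageSourceRef source = CGImageSourceCreateWithURL((CFURLRef)rawURL, NULL);
    CGImageRef cgImage = CGImageSourceCreateImageAtIndex(source, 0, NULL);

    // Draw it into a CGContext, for example inside drawRect:.
    CGContextDrawImage(cgContext,
                       CGRectMake(0, 0, CGImageGetWidth(cgImage), CGImageGetHeight(cgImage)),
                       cgImage);

    CGImageRelease(cgImage);
    CFRelease(source);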
And this works great and gives you a good default image. However, if you want to give your users control over the rendering of this RAW image, then we have a new ability, which is to use the new CIRAWFilter. Again, you start with a RAW image, URL or data, and we're going to create a RAW filter where we add into it the desired user adjustments as input parameters. This can also be done with key-value bindings, which is really convenient.
From that RAW filter we can extract a CIImage, and that CIImage we can display on the screen. The great advantage of this way of rendering RAW is that now you can give the users the ability to control the sliders, which then feed back into the pipeline, adjusting the input parameters, and therefore altering the output CIImage. And this can be done very quickly.
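A sketch of that interactive path, assuming rawURL and a context from earlier; the exposure key written here as inputEV is my reading of the CIRAWFilter keys and worth checking against the header:

    // Adjustable RAW pipeline with the new CIRAWFilter.
    CIFilter *rawFilter = [CIFilter filterWithImageURL:rawURL options:nil];

    // A user slider simply updates an input parameter...
    [rawFilter setValue:[NSNumber numberWithFloat:-1.0f]
                 forKey:@"inputEV"];   // exposure; key name assumed

    // ...and the next draw picks up the new output image.
    CIImage *developed = [rawFilter valueForKey:@"outputImage"];
    [context drawImage:developed atPoint:CGPointZero fromRect:[developed extent]];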
Of course you don't always want to go to the display, sometimes you need to go to an output file, to be able to convert, for example, a RAW file to a TIFF or a JPEG. This can also be done using Core Image. From the CIImage you can create a CGImageRef, and from the CGImageRef you can use CGImageDestination to write a TIFF, or whatever format you wish.
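A sketch of that export path, reusing the context and developed image from above and a hypothetical outputURL:

    #import <ApplicationServices/ApplicationServices.h>   // ImageIO
    #import <CoreServices/CoreServices.h>                 // kUTTypeTIFF

    // CIImage -> CGImage...
    CGImageRef cgImage = [context createCGImage:developed fromRect:[developed extent]];

    // ...and CGImage -> TIFF on disk via CGImageDestination.
    CGImageDestinationRef dest =
        CGImageDestinationCreateWithURL((CFURLRef)outputURL, kUTTypeTIFF, 1, NULL);
    CGImageDestinationAddImage(dest, cgImage, NULL);
    CGImageDestinationFinalize(dest);

    CGImageRelease(cgImage);
    CFRelease(dest);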
This is how simple the code is to use this new filter. It's basically three things that we're showing how to do in this little code snippet. We're creating an instance of the CIFilter by calling filterWithImageURL, we're then setting, in this example, just one option, which is the exposure, and we're setting the value to minus one to darken the image a bit. And lastly, once we've set the input parameters, we get the output CIImage by asking for the output image key. So now let me give a demo of this, and show you how this works in practice.
So. One of the key advantages of RAW is that because you get to do the development yourself, you can get a lot more of the original sensor data out than you would if you were using an in-camera generated JPEG. So I want to show a couple of examples of images where you can really see the benefit of that.
So I'm going to go to Open here, and I've got pairs of images. I have an in-camera generated JPEG and an in-camera generated RAW file. And we can open these both up. I should mention real quickly that this application we have right here is a newer version of what's available in the SDK, but the principle is the same. All these sliders and controls that I'll be showing are also available in the sample code that's on the disk, and there'll be updates to it soon.
So what we have here is a picture, a really great picture of the space shuttle as it's coming out of the vehicle assembly building. And you can see parts of the image are very bright, and parts of the images are very dark and in the shadows. And this is a typical image where you can really benefit from having controls over how the image is developed.
If we look at this image here, we can see, kind of in this area, that there's some strange, I hope you can see it, strange cyan; I don't know what kind of clouds those are, I don't even really know what's there. Is that outside or inside? And if we bring the exposure down to see if it's blown out, we see that there's nothing; we don't really learn anything as we bring the exposure down.
If I go and zoom in, you can see it's still kind of hard to tell what's going on here. And that's because when this image was converted to JPEG by the camera, all the values were capped at 255. So as we bring the exposure down, there's no content there to reveal.
That's entirely different with RAW files. With RAW files, as you bring the exposure down, you can actually see the content that was really in the scene, which is this cyan tarp that was present. And if I zoom in here, not that far, you can really see the detail in that tarp. So this is a textbook example of the benefits you get by manipulating a RAW image. And again, as you see, we're getting very interactive performance on these things. So let me set this exposure to minus one exactly.
And now as we look at the image, let me go back to zoom to fit. Because we brought the exposure down, a lot of the dark areas got darker. And oftentimes that's desirable, it can make an image look more contrasty. But other times you want to show detail in those areas. And this is one of the things we've spent a lot of time on for the new version of RAW in Leopard, is to give new and better controls to adjust the shadow areas of an image.
These are parts of the image that a lot of discriminating photographers really want to be able to see, to gather detail out of the darkest areas of the image. So as we zoom in here, and let me make sure I'm at the right level, I'm hoping you can see this on the display.
But I've got two new sliders here, one is one that's sort of a bias on the exposure that we can bring down. And as we go down it will make the darks lighter. And then also there is another adjustment here which also affects the shadows, and affects how much they're toned.
So if we were to try to do the same thing on the JPEG, (a) we don't have some of these controls available, and oftentimes, because it's only an 8-bit image, making the shadows darker will just leave you with a very posterized-looking image. Let me see if there's one more image to look at.
Here's another fun set. Again, with the JPEG we don't have enough control: if we bring the exposure down we don't get any extra content in the clipped clouds, and we lose all of our detail in the shadows. When we go to a RAW file, we can bring the exposure down and continue to see the cloud detail.
And as we bring the shadows up we can see all the detail in the scaffolding. If I zoom in, you can see all the corrugated metal on the sides. Hopefully you can see that up there. No, it's not quite as visible for you. So that's what I'd like to show about RAW. Again, this is the CIRAWFilter sample. It's a very simple CIFilter to use, and it allows you to give your users all these controls. So back to the slides.
( applause )
So that's our discussion for today. I just want to leave you with a few final thoughts. As we learned today, you can use Core Image to process more than just image data. As Ralph was talking about earlier, you can put information like XY coordinates and angles in a pixel buffer, and give it to Core Image to process your data.
Another good bit of advice is that you can use Quartz Composer as a prototyping tool for developing new CIKernels. Once you've developed a kernel and debugged it in its interactive debugger, you can then take that kernel out of the patch and package it as an image unit. Next, you can use CIFilters together with Core Animation, which work together very nicely, to spice up your interface, to add transitions, roll-over effects, or subtle transparencies.
Also keep in mind that many of our Macs these days ship with built-in iSights, which are a great input device to add interesting effects to. And most important, amaze your users, and give them great dynamic control over images that will wow them and amaze all of us. So that's the end of our session. We have a lab that follows this, which will be a great time to talk to all of us and ask questions. And for more information, contact Allan, who's our Graphics and Imaging Evangelist, or check out our documentation and other resources on the web.