Graphics • 1:08:13
Core Image is Mac OS X's new image processing architecture that takes full advantage of the processing power available in the latest GPUs. Core Image provides a wide spectrum of built-in filters that provide high performance image processing operations such as blurs, distortions, transition effects, and color adjustments. Core Image is developer extensible via "Image Units", a plug-in architecture for host applications. View this session and learn about the latest revolution in image processing!
Speakers: Ralph Brunner, Mark Zimmer, Frank Doepke
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.
Hello and welcome to session 201 on Core Image. That's me, Ralph Brunner. So here's the agenda of this session. First, I'm going to explain to you what the Core Image framework provides, well, essentially what's there. Then how to use the API, and I will not go into too much detail there; there's a nice set of documentation on this. But essentially what the spirit of the API is and what the key methods are that you have to know. Then Mark Zimmer will come up and explain to you how to write your own image processing kernel.
After that, Frank Doepke will show you how to take that image processing kernel and create an image unit out of it. And by then you'll hopefully fully know what an image unit is. And then I will bore you a little more at the end with giving you essentially ideas how this could be used in various applications. So if you're not interested in any of these things and rather hear about databases, now would be a good time to leave.
Okay, so what's Core Image? Core Image is an image processing framework now in Mac OS X Tiger, and the goal is really to put image processing onto the GPU. It has a full floating point pipeline, so there is no clamping to 0 to 1 through the pipeline, and you get really nice deep pixels.
There's a base set of about 60 filters, which contains various things, from fairly pedestrian stuff like contrast and brightness to some more funky stuff like bump distortion, and things like that. There is the concept of Image Units, which is essentially a plug-in architecture: you can make a filter, put it on the system, and have it show up in other applications that are hosts for Image Units.
And while Core Image is really focused on working on the GPU, there is a CPU fallback available. So if your GPU is not adequately equipped with fragment programs and the other nice things that we utilize, then it will run on the CPU. And the CPU path, it's a fallback, but it works very well; it's very well optimized code.
So, why the GPU? Well, there are essentially two reasons. If you can put something on the GPU, well, the CPU is free, and CPU cycles are kind of a scarce resource in whatever you're doing. So that helps. And the second reason is that GPUs are actually outpacing CPUs in a number of interesting benchmarks, like memory bandwidth and floating point processing power. So this graph down here tries to illustrate that. We essentially took a bunch of image processing filters, about five, stacked them behind each other, and then ran that on different hardware. The first four are different hardware configurations, from the iMac G4 to the dual 2.5 GHz G5.
And that's interesting here. By the way, the numbers are in megapixels per second for that particular operation. So the interesting thing is that the dual G5 is actually about four times faster than the iMac, which makes sense: it's twice the gigahertz and twice the number of processors.
What about the right side of the graph? If you look at the graphics cards: if you take a low-end graphics card, the GeForce FX 5200, and compare that to the high-end graphics cards, then you actually get a factor of, you know, almost 9. And that's kind of a challenge, because a factor of 4 is something that your software can handle. It means, well, if you had 20 frames per second before, now you have 5 frames per second. So it's not exactly good, but you can live with that.
If you have a factor of 10, well, maybe you want to consider adding special code to your application so that if it detects the gap is that big, it does something else. For example, if you have something like a keynote presentation application and you can't do the transition smoothly, well, maybe just fall back to a dissolve or something like that. Okay. Well, on Mac OS X, the way to access the GPU is OpenGL. So let me tell you a bit more about that.
There are essentially two different ways, well, the way I see things, two different ways you can use OpenGL. One is to use it as a low-level 3D renderer. The characteristics there: you have a high polygon count, a moderate amount of textures, and you're doing a lot of work optimizing your scene and figuring out which pieces you need to draw. And the modelview matrix is set up as a camera looking into a 3D scene.
And pretty much all the 3D games you've ever played are in that category. The second category is using OpenGL more as a pixel calculation engine. In that case, you usually have a lot of large textures and the geometry is pretty much negligible. And the modelview matrix is really set up to allow you to address individual pixels.
So, I call that Quartz versus Quake. Since Jaguar, the window server compositor has been layered on top of OpenGL. We call that Quartz Extreme. As Peter put it, I think, two years ago, it removes the transparency tax: the cost of doing all that compositing is now on the GPU, and from the CPU's perspective, it's free. This year, Quartz Extreme also encompasses the Quartz 2D drawing API. And Core Image uses OpenGL in that way as well.
So why are we doing this? Well, for an application developer, using OpenGL efficiently is hard. You have a lot of stuff you need to learn about P buffers and the right way to switch context so that everything is still streamed to the graphics card correctly. And the goal of Core Image is essentially to abstract you, the developer, from that burden. You should be able to concentrate on drawing big quads with video on it and that's it. And not know about the 200 lines of code underneath to manage all the buffers.
So after talking that much about hardware, well, here are actually the hardware requirements. The key thing that Core Image needs is an ARB fragment programmable graphics card. And it works pretty much on any ARB fragment programmable graphics card, but more memory tends to be better, especially with all these other system services going to the graphics card as well.
Well, more is better, but it does work well on my laptop with the ATI card, which has 64 MB of memory. So, if the GPU isn't fragment programmable, then we have a fallback which uses the CPU. And here the G4 and the G5 are strongly recommended. It does work on a G3; however, because all the computation is done in floating point, having that Velocity Engine vector unit there is a great gain. And it supports multiple processors, so if you have one of these nice dual desktops, then you will utilize your two processors pretty much optimally.
So with that, I would like to go to Demo Machine 2 and give you a bit of an impression what kind of filters are available.
[Transcript missing]
Here I have a sepia tone filter and more of the more pedestrian ones. There is saturation. That looks ugly. Brightness and contrast.
You can make a monochrome image. This is not terribly exciting, but it turns out that these are actually really useful in your workflows. They have to be there. So let me try to get some more interesting stuff going. Here is an edge detection filter. Picture here, can do a bump distortion, and if that's too scary, let's make that a bit smaller. Okay. Now I have a glass distortion filter and I can move the glass around like this. Or modulate how thick the glass actually is.
Yeah, more funky stuff like a perspective transform... oops, that was a bit too much. Or put a spotlight on an image. Things like that. So there are about 60 of those, and some are fun, some are useful, some are both. You can also do effects on text, which turns out to be fairly interesting if you, for example, get your drawing out of Core Graphics and then put effects on top of it. So you can do things like that. Or my favorite, the zoom blur. Actually, the zoom blur works even better if I go and apply the edge detection filter first, like this, and then put the zoom blur on top of it. I like that better.
So let me try to do something a bit more useful and we had a bunch of artists come up with scenarios that I could demo, so I will just repeat one of them. So here we have a guitar and first thing I'm going to do... I'm using the monochrome filter to produce more of a blue-like look here. And then... Exposure to make it a bit darker. Actually, I want to focus on the hand here, so I will add a spotlight.
Yeah, that's good. And then let me add a layer on top of this, like the blues album text line art. And at the very end, crop to frame. And now I have a little album cover for my blues band. So the key thing here is that I stacked a bunch of filters on top of each other and all of these filters are still live. None of them gets its result written back into main memory. So I can still go and change the blue tint in the middle, or if I don't like the spotlight I can move it around underneath, and so on.
So the next thing I would like to show is that we can actually handle fairly decent sized images. What I showed before were images pretty much screen sized or projector sized. This one here is different. This one is an 11-megapixel image and it's 16 bits per component, so it's a 90-megabyte image total. And, well, it's a 90-megabyte image, but it goes to a 1-megapixel projector, so you can't really see that much of it.
[Transcript missing]
There's a statue of a bull, and so on. So let's do something on this image and go back and use the glass distortion I used before. So let me show you, you can actually see all the detail in the glass distortion applied to this 90 megabyte image.
So naturally, 90 megabytes, this takes a bit longer. It's not totally real time, so if I drag this around you see it's something like 3 to 4 frames per second. But it still works and the key message here is this is actually well beyond the capabilities of the hardware. The hardware has a texture limit, a size limitation, and this image is big, too big.
And Core Image goes and cuts it into little tiles, sends them up to the card, does all the magic that for every filter there's enough information there to do its job, and then stitches everything together in the end so that you can enjoy this on your screen. So the last thing I would like to show is, well, if it's fast, you can also apply it to movies.
So here I have my little bump distortion on this movie trailer. I just have hours of fun with this. Okay. Or, yeah, do the sepia tone. And even then, you can go and stack filters. So let's do an edge detection filter on this movie. So, okay, I think you get the idea. With that, let's go back to the slides.
So now, how does this all work? You essentially have to know three core classes in Core Image to use filters. The first one is the CIImage, and to everybody's surprise, that is an image. An image is typically something you bring in from the outside world, like a JPEG file, or it's the output of a filter; more on that a bit later. Then there is a context, which is the other end of the workflow, the destination abstraction. You build a context, say, on top of a CG context or an OpenGL context, and the context is what you draw into.
And the third piece you need is the CIFilter. The filter is what actually does the work. On one side the image comes in, the filter modifies it somehow, and then the output is an image. You can either take that image and pass it to the next filter, or draw it into the context.
So a bit more detail about the image. An image can get created from, for example, a CGImageRef. There's also the possibility to pass in NSData with row bytes and an encoding, or an OpenGL texture. And this is the API call you do, in this case for a CG image; they all look very similar.
And typically the output of a filter is a CIImage, and this is the API call you do to get that out. An important thing here is that a CIImage can have infinite extent. If you imagine an infinite plane with a checkerboard pattern on it, that's a valid CIImage. I hope you will never try to draw something with an infinite extent, so better specify the sub-rectangle you're really interested in.
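As a rough sketch of what those calls look like in Objective-C (Tiger-era API; the 0.8 intensity value is just an arbitrary example), wrapping a CGImageRef and reading back a filter's output, which also previews the key-value coding described next, might look like this:

    #import <QuartzCore/QuartzCore.h>

    // Hypothetical helper: wrap a CGImageRef, run it through the sepia tone
    // filter, and return the (still unrendered) output image.
    static CIImage *SepiaFromCGImage(CGImageRef cgImage)
    {
        CIImage  *input = [CIImage imageWithCGImage:cgImage];

        CIFilter *sepia = [CIFilter filterWithName:@"CISepiaTone"];
        [sepia setValue:input forKey:@"inputImage"];
        [sepia setValue:[NSNumber numberWithFloat:0.8f] forKey:@"inputIntensity"];

        // The output is itself a CIImage: a recipe, not rendered pixels.
        return [sepia valueForKey:@"outputImage"];
    }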
So filters. Filters are created by name. So in this case here we create the sepia tone filter by name, CISepiaTone. And all the parameters that go into the filter are set using key-value coding. So in this case down here, I create an NSNumber and set it for the key inputIntensity. And key-value coding is also how you get your output. You can ask the filter: give me the value for the key outputImage. That's the output image. So the reason why that is done is, well, the first reason is we're lazy.
We have 60 filters and we don't want getters and setters for each one of them. And the second one is this allows you to introspect the capabilities of a filter so you can build automatic UI and support plug-ins and things like that. So here's the part about introspection.
If you want to know what filters are available, well, solution one, you go and read the documentation. Or solution two, you ask the CIFilter class: give me all filters in a specific category. You can ask for: give me all filters, period. Or: give me all filters that are transitions and suitable for video, and then you get a reduced set. There's a bunch of categories for color adjustment, geometric adjustment, things like that. And there are attributes like suitable for video, suitable for interlaced data, suitable for still images, and things like that.
So once you have a filter, you can also ask the filter: well, what are your parameters? What are the input keys and what are the output keys? And there's an attributes dictionary you can get which contains all the metadata about an input. So it has things like type.
This is a number, this is a vector, this is an image. And it has semantic information. So for a number you could, for example, have this is an angle, this is a distance, and if you want to put up a slider, it's from this value to this value.
So the attributes dictionary is fairly rich and allows you to build automatic UI. In fact, for the demo I showed you, I didn't write any UI code specific to those two filters. They just introspect the filter attributes and build the UI: for each scalar value, add a slider; for each position, a point that you can move around; and things like that.
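A minimal sketch of that introspection, using only the documented category constants and attribute keys (the logging is just for illustration):

    #import <QuartzCore/QuartzCore.h>

    // List every distortion filter that is suitable for video and dump the
    // metadata an automatic UI would use for each input parameter.
    static void LogDistortionFilterAttributes(void)
    {
        NSArray *names = [CIFilter filterNamesInCategories:
            [NSArray arrayWithObjects:kCICategoryDistortionEffect, kCICategoryVideo, nil]];

        NSEnumerator *nameEnum = [names objectEnumerator];
        NSString *name;
        while ((name = [nameEnum nextObject]) != nil)
        {
            CIFilter     *filter = [CIFilter filterWithName:name];
            NSDictionary *attrs  = [filter attributes];

            NSEnumerator *keyEnum = [[filter inputKeys] objectEnumerator];
            NSString *key;
            while ((key = [keyEnum nextObject]) != nil)
            {
                NSDictionary *info = [attrs objectForKey:key];
                NSLog(@"%@.%@: type=%@ slider=[%@, %@] default=%@",
                      name, key,
                      [info objectForKey:kCIAttributeType],
                      [info objectForKey:kCIAttributeSliderMin],
                      [info objectForKey:kCIAttributeSliderMax],
                      [info objectForKey:kCIAttributeDefault]);
            }
        }
    }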
So there is an interesting detail behind that API. It's fairly straightforward in the sense it looks like immediate mode. You take an image, you pass it to the filter, you get a new image out. But what's really happening behind the scenes is that image hasn't been rendered. The image you got back is kind of an image recipe. It's an image that contains references to original data plus a program that says what to do if somebody really wants to draw this.
And it also contains a snapshot of the parameters that were in the filter at the time when you asked for the output image. And the image is really only rendered if you go to the context and say, "Okay, draw this image." And that has a bunch of interesting implications, and pretty much all of them related to performance.
So first of all, if you take an image from one filter and pass it to the next filter, the next filter just says, "Okay, I have this image here. I'll append this program that I need to execute," and just passes the data further on. So if you never draw, it's really, really cheap.
But probably you want to draw it at some point. And at the time you draw, there is a compiler concatenating these programs, actually like a just-in-time inliner, that tries to produce an optimal program for the target that you're drawing to, one that runs on those pixels. So that's concatenation of filter operations. And the second thing it allows: if you actually draw only a sub-rectangle out of your image, then at that point it knows which pieces it actually needs to process. So let me go into that point a bit more.
So what this does is you can have several components in your app or the operating system that conspire on an image computation. So as an example down here, I have an image that comes from the Image I/O subsystem and it does a color adjust and returns a new image. That gets passed into a different subsystem, in this case a thumbnail renderer, which scales the image down and then draws it on, I don't know, a sheet with a bunch of thumbnails, I guess.
Because all the operations are deferred, at the very end, when the context sees, well, I have to draw a small image, and it propagates it all back and figures out, well, I could move that color adjust to the right level and process far less pixels than doing the color adjust on the original size image. So this is essentially an optimization technique and it happens completely transparently to the user of the API.
So let me give you a really simple example how code with using Core Image actually looks like. In fact, it's so embarrassingly simple you might easily ask why would you use Core Image at all for this, but I'm going to build on it, so stay with me. So the first thing I'm doing is creating a CI image. In this case, I'm using a CG image ref.
The next step, I'm creating a context. And again, I like CG, so I'm using a CG context ref on top of that. And then I go and draw the image here. And I specify two things. One, I specify the point where I'm going to draw it, and I specify the sub-rect out of the image I'm going to draw. Remember that there are infinite sized images, so it's a really good idea to specify a sub-rectangle in that case.
So let's add something interesting in between; so far we just draw an image, which is hardly worth the time spent on it. So let's create a filter in the middle. I create a CIColorInvert filter. It's the simplest filter we have: it inverts the colors and has no parameters at all except one image.
And keys and values is a convenience method to set all the keys and values instead of having to call individually for each one of them. So it sets the input image to the image we created. And the draw image call, instead of taking the original image, it just asks the filter for the output image.
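Putting those pieces together, a sketch of the whole thing as one function (the caller is assumed to supply the CGImageRef, the CGContextRef, the draw point, and the source rectangle):

    #import <QuartzCore/QuartzCore.h>

    // Draw a color-inverted version of cgImage into cgContext.
    static void DrawInverted(CGImageRef cgImage, CGContextRef cgContext,
                             CGPoint where, CGRect fromRect)
    {
        CIImage   *image   = [CIImage imageWithCGImage:cgImage];
        CIContext *context = [CIContext contextWithCGContext:cgContext options:nil];

        CIFilter *invert = [CIFilter filterWithName:@"CIColorInvert"
                                      keysAndValues:@"inputImage", image, nil];

        // Nothing has been rendered up to this point; the draw call below is
        // what actually triggers the work, and only for fromRect.
        [context drawImage:[invert valueForKey:@"outputImage"]
                   atPoint:where
                  fromRect:fromRect];
    }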
And so this is our four lines of code example how to use a filter. So the bigger picture of things-- Everything that happens within the space of Core Image happens in a defined working space. And that thing has two aspects to it. There is a color working space and a working coordinate space. So imagine the diagram down here. You have a bunch of images coming in. It flows through a graph of filters and then it exits to the context, essentially.
So let me talk about the... "Coordinate Space" first. Coordinate Space is essentially an infinite sized plane with a defined origin where you can place your images. And there are a bunch of filters that do things like affine transforms so you can move them around and scale and things like that.
So that's pretty simple. Again, this is the reason why you have to specify a sub-rectangle at the very end when you actually draw out of that working space. The color space is a bit more interesting. So color matching happens on input and output. If an image comes in, is tagged with a color space, it gets converted into a working color space. All the processing is done in that working color space, and on output there is a conversion to the context color space. Which, for example, if you created off a CG context, you don't have to do anything, it figures it out directly.
The default working space is CChemo2. CChemo2 has a bunch of interesting properties for image processing. First one is it's light linear. So if you imagine you have two flashlights and they point at the same spot on, I don't know, this stage, then the amount of light that is reflected is somehow proportional to the light from one flashlight plus the light of the other flashlight. Light linear essentially means that the math inside your filter program has that proportion. And we'll have an example for that a bit later. And the second one is it has infinite gamut. So CChemo2 doesn't clamp colors to 0/1 range.
So if you, for example, have a YCbCr image coming in from video and you want to process it, if you convert it to RGB, if you clamp to 0/1, you will lose some color. But because it has an infinite gamut, in this case it doesn't. So conversion from YCbCr to RGB and back is lossless.
There's a bit of secret sauce on the color space issue when it comes to CI images: the color space of a CIImage can be nil. That means you're telling the system, I'll give you an image, but it actually doesn't contain any color information. So for example, if you have elevation maps, normal vectors, or function tables you want to pass into your filter kernel, then clearly you don't want to have color matching on your sampled data. So nil is the color space used in that case.
So note that there is a subtle difference between nil, which means "these aren't colors, don't match them", and "this is already in device color space". If you want to send in an image that is already in the device color space, just tag it with the working space. Then Core Image will say, OK, image and working space are the same, so there's no work to be done here.
Let me try to give you an example why this matters. What I did: I took a tripod, put my camera on it, and took a picture with bracketing on. So I have three different exposures of the same scene, and they're two f-stops apart from left to right. So this is physics. This is letting more light through the shutter by keeping it open longer. So what I'm doing now is take that leftmost image, the darkest one, and try to simulate the same thing inside Core Image. So I take that image.
And multiply all the numbers by two, which gives me the middle one. Multiply the numbers by two again, and that gives me the last one. This is why that CCAM two color space matters: it actually matches what physics does. For comparison, if you actually do that in device color space, skipping all the color matching, the results are rather different. So for your application that means, for example, if you're dealing with photographs, you can have an exposure adjust UI and it actually produces the same results that the user is used to from their camera.
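In code, the same one-stop adjustment is just the exposure filter with an EV of 1; in the light-linear working space that is exactly a multiply by two (a sketch, with the EV value chosen to match the bracketing example):

    #import <QuartzCore/QuartzCore.h>

    // Brighten an image by one f-stop, the way the camera would have.
    static CIImage *OneStopBrighter(CIImage *darkImage)
    {
        CIFilter *exposure = [CIFilter filterWithName:@"CIExposureAdjust"
            keysAndValues:@"inputImage", darkImage,
                          @"inputEV",    [NSNumber numberWithFloat:1.0f],
                          nil];
        return [exposure valueForKey:@"outputImage"];
    }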
Okay, let me switch gears here a little and talk about the overall model behind the architecture. It's based on the SIGGRAPH paper "A Model for Efficient and Flexible Image Computing" by Michael Shantzis, published at SIGGRAPH '94. It essentially explains how to build a pipeline of image processing operations and how to propagate through that pipeline all the information that is needed to do the operation in the end. And actually Shake is based on the same model.
So I will use the terms out of that paper because, well, why invent something new if it's already been done? So there are two key concepts here. The first one is the domain of definition. Domain of definition is essentially the area of an image that is, in the broadest sense, interesting, meaning that everything outside that domain of definition is transparent.
As I mentioned before, the domain of definition could be infinite, but for pretty much everything else, say an image you load from disk, it's probably not. The second term is the region of interest, or ROI. When you actually draw at the very end of your filter chain, you specify the area that you want to pick out of the working space. That is called the region of interest.
So here's an example of how this works. The leftmost image is the original image, and then we do the zoom blur filter on it, which you saw before. The domain of definition of the original image is, well, the extent of the image. The domain of definition of the filtered image is larger, because the filter bleeds out into neighboring pixels.
So when I actually go, and now I'm at the bottom right corner of this diagram, and draw a sub-rectangle, in this case the duck's eye, out of that scene, that region of interest is propagated back into the original image. So the filter has to have all of the yellow area in the bottom left corner available to do its operation. And what it does, it intersects the region of interest and the domain of definition of the source, and that's the real data that needs to be fetched. So why is that important? If you write your own filters, you actually have to provide functions that do this mapping.
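For a custom filter, that mapping is provided by registering an ROI method on the kernel with setROISelector:. A sketch of what such a callback might look like for a blur-style filter; the method name and the radius-passed-as-userInfo convention are made up for illustration:

    // Registered in the filter's init, for example:
    //     [blurKernel setROISelector:@selector(blurROI:destRect:userInfo:)];
    //
    // Core Image calls this to learn which source pixels it must fetch from
    // sampler `samplerIndex` in order to render the destination rect `r`.
    - (CGRect)blurROI:(int)samplerIndex destRect:(CGRect)r userInfo:(NSNumber *)radius
    {
        // A blur reads beyond the destination rectangle, so pad the ROI.
        float rad = [radius floatValue];
        return CGRectInset(r, -rad, -rad);
    }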
With that we come to the topic of how to write your own filters. There are essentially two more classes that you need to know, on top of the three I mentioned before. There's the CIKernel, which represents the per-pixel operation. It's essentially a little program that produces a pixel at the output, and it can have things in it like looking up pixels in the source image at various places, or just math, or scaling, and things like that.
And there is the CISampler, which is kind of the mediator between the kernel and the original image. A sampler is an accessor function. So when you go and say "give me the pixel at coordinate x, y", the sampler will kick in and do some magic things for you. A sampler has two key elements: it has an interpolation style, whether you want to do linear interpolation or no interpolation, and it has wrap mode attributes, which say what should happen if you ask for pixels outside the DOD.
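A sketch of creating such a sampler explicitly, choosing linear interpolation and transparent black outside the DOD (the image variable is assumed to come from the caller):

    #import <QuartzCore/QuartzCore.h>

    // Wrap an image in a sampler with explicit interpolation and wrap modes.
    static CISampler *MakeLinearSampler(CIImage *image)
    {
        return [CISampler samplerWithImage:image
                             keysAndValues:kCISamplerFilterMode, kCISamplerFilterLinear,
                                           kCISamplerWrapMode,   kCISamplerWrapBlack,
                                           nil];
    }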
So the kernel here, um... Yeah, sorry about that. The kernel represents the per pixel operation. And I have a little sample here. This is the magnifier code I showed in the previous demo, the little circle that showed you the Eiffel Tower. That's pretty much all there is to it. There is a distance computation to make the little gradient on the side and pick out the right scaled pixels. With that, I'll go back to demo machine number two.
Some point? Okay. So what I'm going to show you now is essentially more of the same, but now that you have the overall concept, I can explain a bit more what's going on. So the first thing I'm doing, this is a little test app, by the way, that we wrote to figure out how the framework, how to tune the framework, and it has a nice aspect that it visualizes nicely what's going on. So I'll take an image, hibiscus here, then I pipe that through, let's say the bump distortion filter you saw before, and then display it. So this is the output image.
And in the top left corner you see that little flow, how the data flows through. So this is the bump distortion you saw before, I can move that around. And it has a bunch of parameters like the radius and things like that. So the first thing I mentioned is that there is a CPU fallback.
So what this case here is doing is running the bump distortion on the GPU. And it's probably really hard to read in the audience, but there is a little frame rate indicator in the bottom left corner, and it says this runs at 93 frames per second. So let me go and switch this to use the CPU renderer instead. And you see it's a bit more chunky, but the CPU renderer runs at, yeah, about 20 to 25 frames per second for this operation.
And it produces the same result, which is kind of surprising at times. So let me try to build a bit more complicated example that shows the power behind the framework, especially when the just-in-time compiler kicks in and does some more or less smart things at times. So let me start with the edge detection filter you saw before. So it looks like this. Let's make these edges really visible. And I'm going to change the image as well. So let's open an image, the Core Image line art. Make a little wire here, I'm going to need that a bit later.
So this is the edge detection filter on the Core Image line art. So far, not that interesting. I'm going to add a new filter, an exposure adjustment. And then I'm adding the zoom blur I showed you before. So this is what the zoom blur looks like. Let me make that a bit... So this is the zoom blur effect on line art. At the very end I'm going to add a compositing mode, in this case addition, and add the original image back onto the zoom blur image. So this looks like this.
So it looks like that. And with the mouse I'm just moving the origin of the zoom blur around. So you can easily imagine that if you want to do some visual effects on line art, building some interesting visuals with this is pretty easy. And that's the key where the compiler actually comes in.
Most of these operations actually get collapsed into a single pass. It's only the zoom blur, which needs multiple passes to create this effect, that still has multiple passes in the very end; everything else is collapsed. With that, I would like to ask Mark Zimmer up on the stage to explain to you how to actually write one of those things.
Thanks, Ralph. Okay, the fun part about this is writing your own image processing kernel. The thing about core image is that it does come with 60-plus filters, but if they're not really the filters that you want, or you can't put them together like you just saw into a graph to do what you want, or that graph that you did put together is not running fast enough, then you may want to write your own kernel. In fact, that turns out to be one of the things in Tiger that really runs much faster than anything we've ever produced in the past. Think about it.
So you can basically make your Tiger roar by building your own filter. Creating your own plug-in filter in Core Image is something that you can do, and when you do that you end up creating a pixel shader to program the GPU. That sounds pretty complicated, but as it turns out it's really very simple.
The neat thing is that these filters, once you've created them, are first-class citizens. So that means your filter can run as fast as our filters. You basically are going to concentrate on the kernel implementation. Most of the stuff around there represents a small fraction of what you do. A lot of your time will be spent working on the actual shader.
So, filters are based on Objective-C at the top level. They have a few methods, which you'll see, and it's pretty simple to create one. A filter has an image processing kernel, which is the fun part, and you'll see what I'm talking about in a minute. All of the filters that you've seen so far have kernels and are built on top of the GL shading language; we have a special variant of that called the CI kernel language. The filter is embedded in the image unit bundle structure, which Frank Doepke, the next speaker, will talk about.
Cool thing about this, I've been writing image processing effects essentially for 18 years. I found that this was the easiest way to write them and it produced the fastest results for any filter that I've worked on. So for instance, the Zoom Blur filter that I produced ended up being about a hundred times faster than the same filter in Photoshop.
Alright, that's running on a G4 1.4 dual anyway. So, let's talk about the Objective-C portion. What you do is you're going to build a class definition that's subclassing CI filter. In the class definition you will define your input keys. The init method of that basically helps you to locate your kernel which is going to be in a text file from the bundle. It's actually pretty simple although it's several steps.
The custom attributes method helps you to specify defaults and ranges for your input keys. That's particularly useful if you load your filter into a program that automatically produces UI, such as Quartz Composer or any of the demo apps that you saw here. The output image method, finally, is something that helps you to... okay, let's take a step back. The kernel is executed once per pixel. So what you want to do is take everything inside of it that doesn't need to be there and hoist it into the Objective-C output image method. That's what I mean by performing pixel-loop-invariant calculations.
Anything you can hoist or constant fold is pulled out. It will organize your input keys as you'll see. It calls the kernel using an apply method which is a filter method. And then it also helps you to specify the domain of definition of the operation. That's part of the apply as you'll see.
Okay, so the Funhouse filter, which is demonstrated up here in the corner and I think has been shown here before, is what we're now showing a class definition for. You'll see inputImage, which is basically your one image that comes into it. You'll see three parameters: the center, the width, and the amount. So we're creating an interface for that. Notice the header files at the top: QuartzCore, CoreImage.h, I hope I got that right.
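Roughly what that class definition amounts to (a sketch; the class name and exact instance variables follow the naming convention used in the talk rather than any shipping header):

    #import <QuartzCore/QuartzCore.h>

    @interface FunHouseMirrorFilter : CIFilter
    {
        CIImage  *inputImage;    // the one image coming in
        CIVector *inputCenter;   // center of the distortion
        NSNumber *inputWidth;    // width of the mirrored band
        NSNumber *inputAmount;   // strength of the effect
    }
    @end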
All right, okay. Now, the init method is where things get a little bit more complicated. What we're trying to do here is locate our kernel in the bundle and process it. So the first thing you do is find your bundle, so it's bundleForClass:.
The second thing you do is load up the code, basically stringWithContentsOfFile:. So you've located your bundle, and you bring the kernel in; it's like opening a text file and putting it into a string. The third thing you do is... the clicker's got to work.
I probably have my hand in front of the thing. Okay. The third thing is to basically use the CIKernel kernelsWithString: method to extract all of the kernels from the contents of that file. There can be multiple kernels per file if you want, but usually a filter will only have one. The reason you might want multiple is so you can do multiple-pass operations internally to a filter. And that's useful for various things; if you wanted to build a blur, for instance, you really would have to use multiple passes.
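Those three steps, sketched as an init method (the kernel file name is hypothetical; caching the compiled kernel in a static is just a common convenience):

    static CIKernel *funHouseKernel = nil;

    - (id)init
    {
        if (funHouseKernel == nil)
        {
            // 1. find the bundle, 2. read the .cikernel file into a string,
            // 3. let CIKernel compile the kernels it contains.
            NSBundle *bundle = [NSBundle bundleForClass:[self class]];
            NSString *path   = [bundle pathForResource:@"funHouseMirror" ofType:@"cikernel"];
            NSString *code   = [NSString stringWithContentsOfFile:path];

            funHouseKernel = [[[CIKernel kernelsWithString:code] objectAtIndex:0] retain];
        }
        return [super init];
    }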
Okay. There we go. The custom attributes method, this is something I'll pretty much skim over. It's not massively important. For this, you see input width is defined with a range and a given default. Also, its type is defined, type distance. There's type scalar, there's angles, there's other possible types. Check the header out for the various types. You'll also have to set the ranges and defaults for input amount and input center, which are the two other keys, non-image keys. They'll be provided here as well, I just didn't show them.
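A sketch of such a customAttributes method for the two scalar inputs (the particular minimums, maximums, and defaults are invented for illustration):

    - (NSDictionary *)customAttributes
    {
        return [NSDictionary dictionaryWithObjectsAndKeys:

            [NSDictionary dictionaryWithObjectsAndKeys:
                [NSNumber numberWithFloat:    1.0], kCIAttributeMin,
                [NSNumber numberWithFloat:  400.0], kCIAttributeDefault,
                [NSNumber numberWithFloat: 1000.0], kCIAttributeSliderMax,
                kCIAttributeTypeDistance,           kCIAttributeType,
                nil],                               @"inputWidth",

            [NSDictionary dictionaryWithObjectsAndKeys:
                [NSNumber numberWithFloat:    0.0], kCIAttributeMin,
                [NSNumber numberWithFloat:    1.0], kCIAttributeDefault,
                [NSNumber numberWithFloat:    2.0], kCIAttributeSliderMax,
                kCIAttributeTypeScalar,             kCIAttributeType,
                nil],                               @"inputAmount",

            nil];
    }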
The output image method is probably the most important part of the Objective-C level. And you'll see the apply method being called there. And it gets passed the kernel and also the other parameters of the filter. It's important to note that the parameters in the filter have to mirror exactly what's being passed into the kernel in the kernel program itself.
So what we're doing here, we have 1 over radius, 1 over 10 to the float value of input amount, etc. These are things that we sort of constant folded and put into here so we won't have to do them in the kernel program. And as always, parameters are passed as objects. So we're using NSNumber. If you were passing a coordinate, you'd pass a CI vector. If you were passing a color, you'd pass CI color.
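Continuing the same sketch, an outputImage method that hoists the invariant math, wraps the image in a sampler, and calls apply: with arguments that mirror the kernel's parameter list (the exact folding of inputAmount is simplified here; the real filter does a bit more):

    - (CIImage *)outputImage
    {
        float      radius = [inputWidth floatValue] * 0.5f;   // hoisted: computed once, not per pixel
        CISampler *src    = [CISampler samplerWithImage:inputImage];

        // Arguments must match the kernel signature:
        //   funHouseMirror(sampler src, float centerX, float inverseRadius,
        //                  float radius, float amount)
        return [self apply:funHouseKernel, src,
            [NSNumber numberWithFloat:[inputCenter X]],
            [NSNumber numberWithFloat:1.0f / radius],
            [NSNumber numberWithFloat:radius],
            [NSNumber numberWithFloat:[inputAmount floatValue]],
            kCIApplyOptionDefinition, [src definition],        // domain of definition of the result
            nil];
    }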
And of course, the images are passed as samplers. So you'll see how we load the sampler and then pass it in there. So that's really all you have to do inside of your filter. Except now we're getting to the most interesting part. You know, if filters were like a piece of candy, then I suppose the shader part, which processes the pixels, you'd probably consider that the soft, chewy center, the best part. This is the part you want to spend your time thinking about what to do.
Okay, we're using a subset of the GL, the OpenGL shading language here. You can specify multiple kernels. Other functions can be included for modularity's sake. If you have something that's used multiple times and you want to extract it, you can basically treat it like you would normal C. This is the place where you're going to want to go to find out about the OpenGL shading language. So, opengl.org/documentations/oglsl.html.
Okay, each kernel procedure gets called once per pixel. I mentioned this before, and it becomes important when you decide how to build your shader. When you are building your shader, another thing you should know is that there is no knowledge passed or accumulated from pixel to pixel. So it's kind of like a ray tracer. You don't know anything else but what you're doing for this pixel.
Think of it that way. And like I said before, hoist as much invariant calculation as you can out. Anything you can do is going to save time because it gets executed once per pixel. When you pass in color, remember the colors are pre-multiplied alpha, and in general, so are the images.
Okay, so here are some effects you can look at. There's the original image; there's the Funhouse effect, which provides sort of a distortion in X; the Edges effect, which everybody seems to like to show and got shown in the keynote; and the Spotlight effect. These effects are all going to be shown here.
So let's start with a little fun in the shader here. We've got the displacement effect. One thing you should notice for displacement effects is if you want to do a displacement effect, you want to sort of operate using the inverse transform. In other words, for the destination point, what the source point is.
Anybody who's done an image processing transform or a displacement transform on a texture will know this is how it works. It's kind of the opposite of "where does my pixel go?" So in this case here, we start by loading destCoord. This is a built-in function in our kernel language that gives you the location, in working coordinates, of the current pixel.
Then you want to apply the transform. So I'm subtracting a center X, multiply radius, whatever, blah blah blah. Basically what it does is it's going to distort T1.X which is the destination coordinate. Finally, we fetch the displaced sample. And what we're doing here is we're using the sample function, again, from the sampler, and the coordinates being passed in is T1.
But what we have to do is run it through samplerTransform. That's in case the sampler actually is referenced under a transform; samplerTransform will take care of that for you. If the sampler isn't referenced under a transform, then I guess you can just use samplerCoord instead of samplerTransform. I'll talk about that at the end. And... is this battery running out? Is that what's going on? Okay.
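Put together, a funhouse-style displacement kernel might look like the sketch below, written in the CI kernel language; the warp formula itself is invented, but the shape (inverse-transform the destination coordinate, then fetch one sample through samplerTransform) is the point:

    kernel vec4 funHouseMirror(sampler src, float centerX, float inverseRadius,
                               float radius, float amount)
    {
        vec2 t1 = destCoord();                       // working-space coordinate of the output pixel
        float d = clamp((t1.x - centerX) * inverseRadius, -1.0, 1.0);

        // Inverse transform: remap x as a function of the normalized distance
        // from the mirror axis (hypothetical warp function).
        t1.x = centerX + d * radius * (1.0 - amount * (1.0 - d * d));

        return sample(src, samplerTransform(src, t1));
    }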
All right, the edges effect, which seems to be shown quite a bit, is really quite simple. What it is is a cross-gradient function. And one thing you want to remember is when you do calculations in here, particularly on things like vec4s, which is R1, R3, R2, R0, what's happening is you're doing all the operations component-wise. So you're using the power of the vector instructions to get it done several times faster, even inside the shader itself.
Remember that the graphics cards are actually multi-pipelines. So what that means is it's doing, you know, when I say R1 minus R3 there, it actually does four subtractions simultaneously. But there's also multiple pipes, so it's times the number of pipes, which is the number of calculations happening at the same time. So you can actually get, you know, 20 gigaflops on these cards quite easily.
Okay, so the first thing we do is load a neighborhood of samplers, of samples, sorry. And that's just a square neighborhood, very simple. Okay, then the second thing we do is to compute the cross-gradient. So we're subtracting a cross-gradient like so. And then we're doing the least squares computation and multiplying by the scale, so we can scale up or down the edges. And then I just throw in an alpha of 1 at the end just so it will be visible. It doesn't really have a reasonable value that you might want to put there.
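As a sketch in the CI kernel language (the exact sample offsets and gradient pairing are approximations, not the shipping CIEdges code):

    kernel vec4 edges(sampler image, float intensity)
    {
        vec2 d  = destCoord();

        // a small square neighborhood of samples
        vec4 r0 = sample(image, samplerTransform(image, d + vec2(-0.5, -0.5)));
        vec4 r1 = sample(image, samplerTransform(image, d + vec2( 0.5, -0.5)));
        vec4 r2 = sample(image, samplerTransform(image, d + vec2(-0.5,  0.5)));
        vec4 r3 = sample(image, samplerTransform(image, d + vec2( 0.5,  0.5)));

        // cross-gradients, computed on all four channels at once
        vec4 gx = r1 - r2;
        vec4 gy = r3 - r0;

        // least-squares magnitude, scaled, with alpha forced to 1 so it shows up
        vec4 e = sqrt(gx * gx + gy * gy) * intensity;
        e.a = 1.0;
        return e;
    }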
Okay, the final example is the spotlight. And often when you're doing a 2D rendering, you'll want to use some kind of a 3D effect on it to make it look cool. In this case here, we're doing a spotlight. The spotlight is done in the shader. This is probably the most complicated shader. The first thing I do is to get the pixel color. So that's the color that we're shading.
Okay, the second thing I do is calculate the vector to the light source and then normalize it. Because I'm assuming the picture is in the XY plane, the normal to the picture is (0, 0, 1). So N dot L, where N is (0, 0, 1), means that N dot L is just going to be r0.z here.
Okay, and then I calculate the light solid angle cosine. This tells me how much light is being put into that spot. So it's really just a cosine; a dot product is what calculates the cosine. Then I raise it to a power to concentrate it, so you can have a tightly concentrated beam or a wide spread or a thin spread.
Then finally, I calculate the pixel color by multiplying N dot L by the light color by r1, which is the color of the pixel, times the beam concentration. That gives me the final result for the spotlight. There's not really a lot to it. Now let me talk about the OpenGL Shading Language and just sort of go over it in general.
The first thing is it uses standard C precedence, and it looks pretty much like C. The second thing is that there is no coercion in this language, which means if you want to add an int to a float, it's going to give you an error when it compiles. So constants should be specified as floating point, like 1.0 or 0.0. The third thing is, as you noticed in those examples, we have float variables, which are scalar, or we have vector variables, vec2, vec3, vec4, which are vectors of floats.
There are also Boolean vectors and such things as that. But basically, the vector variables are how you get a lot of the computation done. The built-in functions I should talk a little bit about. The first row of built-in functions, sine, cos, pow, et cetera, those are kind of the more expensive functions, with the ones at the beginning being more expensive than the ones at the end. Square root or inverse square root is actually pretty cheap. But if you have, say, sine calls in a fragment program, it may be a very slow fragment program.
The second line, these are practically zero-cost functions, or they cost one instruction. Things like absolute value, sign, floor, etc. They're all available for you. And then there are things like distance, normalize, etc., which are extremely useful in doing radial or concentric kinds of calculations. Also, it uses a swizzle syntax for load and masked store. So you can reference something like r0.r for the red component, or r0.rgb for the red, green, and blue components. In particular, when you're doing a store, it really just says which of the components you're storing into.
The language for CI kernels is the OpenGL Shading Language; we've added a few things to it. In particular, we've added the kernel specifier, which you've seen in each example. The sampler type is used to declare image parameters. destCoord is added to give you the working-coordinate location of the current pixel. There's a sample function which allows you to do a texture map lookup. So it's very simple, as you saw in all the examples.
samplerCoord gives you the location in that sampler, and that's actually how you access a texture unit, if you're familiar with OpenGL. So each sampler may get its own texture unit. samplerTransform is for when a sampler has an affine transform applied to it, and that's how you make sure that transform is preserved; it literally generates instructions. We use the __color type to define vec4s that are color matched inside of your shader.
And finally, because colors are premultiplied, if you're doing something like a color transform operation (invert is a good example, color invert, or any kind of color control, brightness, contrast), the first thing you want to do is unpremultiply that color. Then do your light-linear transform, and make sure you preserve alpha in your transform if that's what you want to do. There are some operations, for instance if you just want to scale, to do an opacity calculation, that you can do on the entire color without premultiplying or unpremultiplying.
You could multiply your vec4 color by the opacity fraction and get a premultiplied color that is correct. Okay, and basically that's how to write the internals of a filter. Now I'd like to introduce...
[Transcript missing]
Thank you, Mark. Good afternoon. As you can see, I'm the packaging expert on the... how we put the eye candy now into the box.
So what are Image Units? Image Units are our architecture for providing a plug-in that can be used in any application that uses the Core Image architecture. For that we chose NSBundle as the delivery mechanism, and that actually makes it very easy to write an Image Unit.
The key point here is that this is actually your business opportunity. We know this is something that we only introduced in Tiger, but applications will pick up this technology; we already talked to our applications division about using it in the future on our stuff. So there's a good opportunity that you can build just a filter and have a good audience of people who want to use these filters later. One concept that is interesting to know here is the non-executable filter. That means you just have a CI kernel, that's all you would provide in your plug-in, and you don't have any CPU-executable code.
It's important when you talk about security-sensitive applications, like some system servers, or something like a screen saver you want to pass around; there you definitely don't want any kind of Trojan horses or viruses, since they would be executed without you even knowing it.
So now let me talk a little bit about where we actually store these image units. Location is the key point here, and we have two spots where we would normally install them: the Graphics plug-in folders inside the system Library folder or the user Library folder.
That's where the standard load API basically will look and find your filters. If your application has additional filters that it just wants to keep in its own bundle, we have an API that will load those units one by one, so you have to call it yourself. And that brings me over a little bit to the structure of our image units.
And you can see I put up a little screenshot of how it looks in Xcode. In the very top part, I have a little loader part; that's all my Objective-C code. And then I stole Mark's filter, the Funhouse filter, which has some Objective-C stuff in it. And then we see the Resources part; that's where the real content of the image unit lives right now.
We have the CI kernels, which, as Mark explained, are really the core of the filter. And we have a Description plist, and that is the part which tells us what is really inside this image unit and what you get out of it. That's especially important for the non-executable ones, because we have to communicate somehow which parameters can be passed in and out. And we also provide a way for you to put in your localization, so you can provide your filter in multiple languages.
From there let me go to the API, which you can see is really extensive. We have three calls that are important for the loading. The first one will just load all filters. The second one will only load the non-executable ones. So if you write an application where you want to make sure that I cannot load any security sensitive plug-ins, that will be the call to use.
And the third one is the one that you would use if you have your image units in a specific, your own application bundle and you want to load those. So you would just go through your folder and each of the bundles that you find there, you would load them with this API call.
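Sketched in Objective-C, the host side boils down to this (the "MyKernelFilter.plugin" name is hypothetical):

    #import <QuartzCore/QuartzCore.h>

    static void LoadImageUnits(void)
    {
        // Everything installed in the system and user Image Units folders:
        [CIPlugIn loadAllPlugIns];
        // ...or only the kernel-only, non-executable ones:
        // [CIPlugIn loadNonExecutablePlugIns];

        // An image unit shipped inside this application's own bundle:
        NSString *path = [[NSBundle mainBundle] pathForResource:@"MyKernelFilter"
                                                         ofType:@"plugin"];
        if (path != nil)
            [CIPlugIn loadPlugIn:[NSURL fileURLWithPath:path] allowNonExecutable:NO];
    }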
On the other hand, when we look at the plug-in side, we have a very simple call that we invoke on the principal class of the NSBundle, and that is the load call. You can see it actually returns a Boolean. So this is a place where you can, for instance, do your registration, or check your hardware requirements. You can say, "Oh, well, that's a serial number that ends in a 3, I'm not running on that machine," return false, and the filter will not be loaded. So from there, I would like to go to demo machine 2.
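On the plug-in side, a sketch of such a principal class (the class name is made up; the protocol and its load: method are the documented ones):

    #import <QuartzCore/QuartzCore.h>

    @interface MyKernelFilterLoader : NSObject <CIPlugInRegistration>
    @end

    @implementation MyKernelFilterLoader

    // Called once when the image unit is loaded; returning NO keeps the
    // filters in this bundle from being registered (e.g. a failed hardware
    // or licensing check).
    - (BOOL)load:(void *)host
    {
        return YES;
    }

    @end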
Okay, what we see here is now again the Xcode project and I just want to get quickly over the stuff. Since we've seen how the Funhouse filter works, I won't go too much into details there. But I show you now an example of a non-executable filter and this is my filter function that I have here. This is something like a pixelate filter, just does a slight twist to it.
And then at the end, I have the Description plist. The only part that's kind of important to know about my little kernel is that I have two parameters that I need to pass in: it takes a sampler, which is my image, and a scale value.
So I open the Description plist in our plist editor. It might be a little bit hard for you to read, but I'll try to go through it so that you understand what I'm doing here. In the Description plist you see that I have two filters: in the first part the Funhouse Mirror filter, and then just my simple kernel filter. The important part is then the attributes. And as Ralph mentioned in the beginning, we have different categories. So I can see this is a distortion effect, that's how I can detect it; it's suitable for video and also for still images.
This is my display name. Pay attention that here it still says MyKernelFilter. I'm actually using our localization technique so that later on, in the UI, it will show up with a correct and a little bit nicer name. And then the important part is actually the inputs. I have two of them, as I already mentioned in the beginning.
One is an image, so I just give it a class and say, well, this is my input image. And the second one is an NSNumber, which is my floating-point scale, so I can set some scale value up here. And that's pretty much all I need.
So all I do now is basically build this bundle once and put it in the right location, and now I can use it in my applications. For this, I go into Quartz Composer. Right now what I've set up is simply an image that will be shown on the screen. So this looks like this. And we have a surfer.
And now I can simply go in here and look into my distortion effects. Look there, it's my kernel filter, and you see it has a little bit nicer name. And all that I will do now is simply reroute the image through that filter. And run this thing again. And you see, it looks a little bit pixelated. And I can do the same thing with the Funhouse filter, which we also have in here. And let me reroute that through here.
And as you can clearly see on the side of the image, there's a distortion. That is basically how easy it is to create one and how to use it as well. And as no pixels were hurt in this demonstration, I would like to pass back to Ralph to finish up our demonstration. Thank you.
I said that no pixels were hurt in this demo. That's not totally true. There were two. But they had it coming. Okay, so the key message—let me get back to the first one. So what I'm going to do now is give you some ideas what we could do with these things.
I assume if you're in the business of writing an application that deals with photographs or video, you should have a good idea by now what to do with it. But there are actually applications where the use of Core Image isn't totally obvious, but it could make a nice difference. So the first thing I would like to say here, the key message, is that in this millennium, image processing is something that happens in the display pipeline.
There is no separate render stage. Core Image essentially does most of these things in real time today, and on next year's hardware it will most likely do a very significant portion of what we would ever want to do on an image in real time. So if you're building an application that has a bunch of settings, and then you press apply and it gets rendered into a bitmap, that's probably the wrong UI to pursue. Instead, try to make the application respond in real time: you have a slider, and as the user moves the slider around, the image data is processed right away.
And this has a bunch of interesting implications, for example, if you have an undo buffer. In the last millennium, undo buffers on images were really quite a science: you had to keep megabytes of data around for each stage. In this case, you no longer do that. You just have your filter and a handful of parameters, and the only thing that the undo really affects is that handful of parameters. So things like infinite undo on an image are almost trivial.
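A sketch of what that parameter-only undo can look like in practice, assuming a view subclass that owns a filter and an NSUndoManager (both instance variable names are placeholders):

    // Route every parameter change through one method so undo captures only
    // the old value, never any pixels.
    - (void)setFilterValue:(id)newValue forKey:(NSString *)key
    {
        id oldValue = [filter valueForKey:key];
        [[undoManager prepareWithInvocationTarget:self]
            setFilterValue:oldValue forKey:key];

        [filter setValue:newValue forKey:key];
        [self setNeedsDisplay:YES];   // re-render live; nothing is baked into a bitmap
    }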
So here's a bunch of ideas what to do with it while you're processing photographs and video, using it for transition effects, or work on creating a richer user interface. We're running kind of late on time here, so I'm going to switch to the demo right away. Demo machine two, please.
Okay, so the first thing I would like to demo... Which key is it? Here. Well, you saw Dashboard in the keynote. And when I take something and drag it here, you see this little ripple effect. And that's Core Image at work. So this is an example how you can put Core Image somewhere in a little piece of your application and do something nice with it. So let's do that again, just because it was so much work to make that ripple working.
[Transcript missing]
First one is a little toy. When I'm really bored in airplanes, and that happens a lot, I take pictures out of the window, and unfortunately they look like this. The key problem is there's just so much air between you and your subject; it produces a haze effect and everything looks completely washed out. So I tried to build a filter that corrects some of this.
So, this slider here tries to simulate the distance to the ground at the bottom of the image. So I'll kind of move that around and let's see, this is about the brown that you would expect from the ground. And then this slider here, the second one, is the distance at the top. So it assumes there is a slanted plane you're looking at. So you can figure that around like this. And by the way, this is Crater Lake in Oregon. So just to, where did I put it, here.
So this is the before/after shot, essentially. And it's not exactly a problem that you encounter all the time; it's a very specific solution. I would probably never have attempted to do it if it had taken me half a day of work. But with Core Image it was literally 40 minutes of work, so I could, well, experiment a bit and try to do this.
"Clearly, the white point isn't right, so that mountain needs some adjustment." So, and by the way, the source code for that and as well the source code of how to build an image unit is available on the Apple website. Sursko is also available for this example here. This is essentially doing nine different transitions all at the same time.
And why do I show that? This is an example of what new types of UI this could enable. Imagine an application like Keynote, which has a widget to select the transition between one slide to the next. Well, today that widget is a pop-up menu and a little preview at the bottom. But with the power of Core Image at your hands, you could actually go and say, "Well, skip the pop-up, just show all transitions at the same time." The user just clicks on one.
This makes a more compelling UI because, well, it's clearly more discoverable what kind of functionality is available. And I have to admit I was cheating for this demo: this morning, I found a bug in our GPU implementation with this particular case, so this is actually running on the software fallback. OK, with that, we go back to the slides.
So, where to go from now? Well, tomorrow morning there's the Graphics and Media Lab. My colleagues and I will be there; if you have questions, that's the right place to go. If you're interested in how to use Core Image together with video, then the New Directions for QuickTime Performance session is the one to go to. You will learn about Core Video and how to pipe video frames through Core Image.
And on Friday, there is the Discovering Quartz Composer session. You saw it in Frank's demo. Quartz Composer is a really great tool to just whip up a prototype, string a bunch of filters together, and figure out how things look; and if you build your own filters, it will load them. So that's a session to check out.
For more information, well, there's documentation available. On the Tiger DVD there is the reference on Core Image, and on the Apple website, connect.apple.com, there is the actual architecture documentation and a fairly rich set of filter descriptions; all the filters are in there with before and after images that explain what they're doing. That's a great way to start.