
iOS 5 Tech Talk World Tour #16

Video and Photo Effects with Core Image

2011 • 31:20

Core Image enables you to perform sophisticated image processing operations and create stunning visual effects. Learn about the powerful capabilities of Core Image in iOS 5 for adjusting still images and enhancing live video. Get details about the built-in filters, and see how to take advantage of new APIs for facial feature detection.

Speaker: Allan Schaffer

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and contains known transcription errors. We are working on an improved version.

Hello, I'm Allan Schaffer, the Game Technologies Evangelist at Apple. Core Image is a new framework in iOS 5 that provides an easy way for your apps to apply effects to photos and videos. It does this using a number of built-in filters, such as color transformations and compositing operations, and it also includes some advanced features, such as face detection, red-eye reduction, and auto-enhance. So, in this video, I'll introduce the key concepts of Core Image, then dive deeper into its three main classes on iOS: the CI Image, CI Filter, and CI Context. And then I'll wrap up with those advanced features. So, let's get started.

So let me begin with an explanation of the basic concept of Core Image. The idea is that you start with an image or a photo, like the one on the left here, with the picture of the sailboat in a harbor. And you take that image, and various image processing operations can be applied to each pixel. So in this case, we're applying a sepia filter, which transforms the colors to various shades of this brown color, and we get this photo on the right.

But what's interesting is that filters can be chained together to create more complex effects. So here we're starting with that same original image and applying the same sepia filter, which results in this brown-colored monochromatic image. But then we're running that through a hue adjustment filter, which conceptually will rotate the colors around on a color wheel. Then that result is being run through a contrast filter, which boosts the contrast of the image, and we get this result over on the right. And we have a lot of different filters that you can choose from.

But we don't actually create all those intermediate images. Really, the chain of filters is treated more like a recipe of operations that will be applied to the pixels in the original image. And then the entire chain of filters is concatenated together. This eliminates any need for those intermediate buffers.

And further, when that recipe is created, there's a compiler that's used to further optimize the work that'll be done. In this case, for example, I've replaced two of the filters with Color Matrix filters, which can be combined together by the compiler, and this further improves performance. So, that's the basic idea of what Core Image is doing when you're applying filters to images.

Now, Core Image is very flexible. It can take photos and videos from the user's photo library. It can take live video being captured from the camera. You can process images in memory or images from files in a variety of different formats. The outputs are really flexible as well.

Once you've run an image through this so-called filter chain, it can be output as a Core Graphics CG image ref, which itself is really versatile, and can then be used as a UI image in UIKit or written back out to the photo library with the Assets Library framework.

Or, you can render images into a CAEAGLLayer with OpenGL ES. Or, third, you can render to a CV pixel buffer ref for live video, which can then be used by AV Foundation. And last, you can also have the output go into memory as just raw pixel data.

So, those are the inputs and the outputs, and they're really flexible. But what about this part in the middle, the filters? Well, so here's a list of all the filters supported in iOS 5. It's a big list, 48 filters. And it's greatly expanded since the original introduction of Core Image at WWDC 2011.

So, what I want to do is to go quickly through some of these filters by category, just to help you understand what's being provided. The first category is the color adjustment filters. These are primarily various adjustments that you can make on photos, similar to what you see in the color adjustments panel in the iPhoto app. So, it's things like adjusting the exposure and gamma. Color Controls lets you adjust saturation, brightness, and contrast. And over on the right, you see white point, color temperature, and so on.

This next group are the color effects. You've already seen sepia tone, but the one that's really interesting here is Color Cube. So this lets you define a three-dimensional color table where you're using the red, green, and blue values of pixels in the source image to look up positions in the 3D space and get a replacement color. This is really powerful and can be used for a lot of different effects.

Next are the compositing operations. This is almost half of all the filters, and it's a big group that's been added since WWDC. These divide into two categories. The first category are various blending operations, which blend the colors from two source images, depending on which filter you're using. And the second category are various compositing operations, which choose to take each pixel from either of the two source images based on its color values and the compositing technique you've chosen. So, as an example, you might use these if you wanted to overlay a custom frame around the edges of a photo.

There are a few generators. These are primarily for testing or just to provide a simple input to some other filter. They can create a checkerboard, stripes, or a monochromatic image. Next are the geometry adjustments. Conceptually, these are very simple, such as filters to crop an image, apply an affine transformation, or rotate or straighten an image.

And last are the stylized filters and gradients. Something you can do, for example, is to run an image or a constant color into a gradient, and then blend that with another image to get some really interesting fall-off or vignette effects. Okay, so that's a big list. You can find more information about each of these filters in the Core Image Filter Reference document, which is available on our developer website.

So now, with that introduction, let's dive deeper and look at how to use the Core Image API. There are three major classes that you'll use all the time in Core Image. The first is the CI Filter, which represents a specific image processing operation. You just saw a list of 48 different filters. Now, almost all the filters take an image as an input, and all of them produce an image as output. And many have various settings as input parameters for the effect being applied.

Next is the CI Image. This represents the image that you'll be processing, but really it represents the recipe that's going to be applied to that image. And it can come from those four sources I discussed earlier, or it can be the output of a filter. And the third class is the CI Context. This is the class that actually does the work. It's what you'll use to run the image through the filters that you've defined and render it. And that work can be done either on the CPU or the GPU.

So this is a good time to mention some of the platform-specific information about Core Image. Core Image got its start on the Mac with Mac OS X Tiger, and there are some differences between the implementation on the Mac and the one on iOS. The first difference is the filters. We currently have 48 filters on iOS, the list I just showed you, whereas on the Mac there are many more, about 130 built-in filters. Plus, on the Mac, it's possible for third-party developers to write their own filters, whereas that capability is not provided on iOS.

Next is the API. iOS uses the three classes I just mentioned, the Filter, the Image, and the Context, whereas on the Mac there are others, primarily to support developing custom filters. You would subclass CI Filter and implement your own kernel, for example. Now, on both platforms, they share the same underlying implementation that concatenates and optimizes the filter graph that you set up to get the best possible performance. And instructions are then generated that either take advantage of the CPU or the GPU using the native libraries for each platform.

Now, let me show you a quick example of using the Core Image API to apply sepia tone to an image and render the result. So first, we'll start by creating a CI image instance. In this case, we're loading from a URL in our application bundle or from some other location. So that gives us an image. Next, we're going to create a filter and configure its inputs. I'm creating a sepia tone filter and setting its input to be the image we just loaded, and setting its intensity to 0.8.

Then I create the context, which is going to render the filtered image. Now, here I render it. I'm asking the filter for its output CI image, which at this point just represents the recipe that will be applied to the image when it's rendered. And in this final line, I ask the context to render that CI image into a Core Graphics CG image ref. Those four steps, creating the three objects and then rendering, represent most of the work you'll do with Core Image; a sketch of the whole sequence follows below. But next I want to go into a little more detail about each step. So first, let's look at how you can create a CI image.
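The on-screen code isn't captured in the transcript, but a minimal sketch of those four steps might look like this (the Image.png resource in the app bundle is an assumption made for illustration):

```objc
#import <CoreImage/CoreImage.h>

// 1. Create a CIImage (here, from a hypothetical Image.png in the app bundle).
NSURL *url = [[NSBundle mainBundle] URLForResource:@"Image" withExtension:@"png"];
CIImage *image = [CIImage imageWithContentsOfURL:url];

// 2. Create a filter and configure its inputs.
CIFilter *filter = [CIFilter filterWithName:@"CISepiaTone"];
[filter setValue:image forKey:kCIInputImageKey];
[filter setValue:[NSNumber numberWithFloat:0.8f] forKey:kCIInputIntensityKey];

// 3. Create the context that will do the rendering.
CIContext *context = [CIContext contextWithOptions:nil];

// 4. Ask the filter for its output recipe and render it to a CGImageRef.
CIImage *result = [filter outputImage];
CGImageRef cgImage = [context createCGImage:result fromRect:[result extent]];
```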

As I said earlier, the CI image is really flexible. You can initialize it with the data from a number of different sources, such as a URL to a photo or an image file using image with contents of URL, or an NSData containing an image file in any of a number of common formats.

Next is the Core Graphics CG image ref, which is very commonly used on iOS and the Mac, and is itself very versatile. In fact, what I'm showing here is how to start from a UI image. We're loading foo.png and asking the UI image for its CG image ref, and that's what we use to initialize our CI image.

A third source could be from a Core Video Pixel Buffer, which is a buffer that enables you to take, for example, live video from an AV Foundation capture session and use those frames as images in Core Image. And then finally, you can also initialize a CI image from raw pixel data in memory. So here, I'm creating an image using an NSData containing the bitmap, and then specifying the bytes per row, the size, the format, and an optional color space.
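The slide code isn't in the transcript, but sketches of those four initialization paths might look like the following; the variables fileURL, imageFileData, pixelBuffer, bitmapData, rowBytes, width, height, and colorSpace are placeholders assumed to exist already:

```objc
// From a file URL or an NSData containing an image file.
CIImage *fromURL  = [CIImage imageWithContentsOfURL:fileURL];
CIImage *fromData = [CIImage imageWithData:imageFileData];

// From a CGImageRef, here obtained via a UIImage loaded from foo.png.
UIImage *uiImage = [UIImage imageNamed:@"foo.png"];
CIImage *fromCG  = [CIImage imageWithCGImage:uiImage.CGImage];

// From a Core Video pixel buffer, for example a frame from an AV Foundation capture session.
CIImage *fromCV = [CIImage imageWithCVPixelBuffer:pixelBuffer];

// From raw pixel data in memory.
CIImage *fromBitmap = [CIImage imageWithBitmapData:bitmapData
                                       bytesPerRow:rowBytes
                                              size:CGSizeMake(width, height)
                                            format:kCIFormatRGBA8
                                        colorSpace:colorSpace];
```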

Now, while I'm on the subject of color space, let me talk a little bit about color management in Core Image. On Mac OS X, a CI image can be tagged with any color space. This is provided using the Core Graphics data type CGColorSpace. If a CI image is tagged, the pixels are converted to a linear working space before filtering is applied. Now, on iOS, it's a little bit different. A CI image can be tagged with the device RGB color space, and if it is tagged, the pixels are gamma corrected to a linear working space before filtering.

So, you can use the kCIImageColorSpace key to override the default color space on the CI image you're creating. And one value you might set is null to leave the image unmanaged. While in most cases you do want to have the gamma correction turned on, there are cases where you might want to run your entire filtering pipeline with color management turned off, such as to improve performance or to get a particular effect.
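As a sketch, passing NSNull for that key when creating the image is one way to leave it unmanaged (the cgImage variable is assumed to exist already):

```objc
// Leave the image unmanaged: no gamma correction to a linear working space.
NSDictionary *options = [NSDictionary dictionaryWithObject:[NSNull null]
                                                    forKey:kCIImageColorSpace];
CIImage *unmanaged = [CIImage imageWithCGImage:cgImage options:options];
```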

And the last thing to mention about images is metadata. Modern cameras can associate a lot of metadata with the photos they take, the make and model of the camera, the date and time the photo was taken, the orientation of the image, the exposure, shutter speed, and so on.

And we provide a way for you to get at that information through a dictionary called Properties on the CI image. There's a wide variety of data that could be included, and it's image format dependent with different metadata for GIF, TIFF, JPEG, etc. But one piece of metadata that is typically very important is the image orientation.

You'll typically want to use this so you can maintain the orientation any time you display that image in your UI. Now, the metadata gets automatically defined for you if you're loading an image from a file or from an NSData containing a file. But if you're creating CI images from data in memory, for example, or from Core Video, and want to specify the metadata yourself, then there's a way to do that as well, using the image properties key.
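For example, a sketch of reading the orientation out of that dictionary might look like this; kCGImagePropertyOrientation comes from ImageIO, and whether a given image actually carries it depends on its source:

```objc
#import <ImageIO/ImageIO.h>

// Read the orientation from the image's properties dictionary, if present.
NSDictionary *properties = [image properties];
NSNumber *orientation = [properties objectForKey:(NSString *)kCGImagePropertyOrientation];
if (orientation != nil) {
    NSLog(@"image orientation: %@", orientation);
}
```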

Okay, so that's CI Image. Next, let's go on to how we define and apply filters. Let's talk about the inputs and outputs of filters first. Here's a sepia tone filter. As I said, almost all filters take an image as input, perform some operation, and then provide an image as output. So here you see an input and output image for this filter. And for sepia tone, we also need an intensity value for how much sepia to apply.

Here's another filter, Color Controls. This also has an input image and output image, of course, but this one has multiple other parameters to control the saturation, brightness, and contrast of the resulting image. This next one is a little different. This is Hue Blend Mode. So this is going to blend two images together. So it takes two images as input and provides one image as the output.

And the point of all of this is to remember that we'll typically be chaining filters together, with the output of one filter feeding into the input of the next filter, and so on. So here I have two CI images that will be blended. Then that result will go into the sepia tone filter. Then that result will go into more filters or be drawn by the context. And everything is wired up here except for the sepia intensity, which perhaps you would allow the user to adjust with a slider, for example.

So, let's look at how we accomplish this. Filters are instantiated by name. For example, here I'm asking for a sepia tone filter. This is an NSString, and you can either use the API to query the list of filters supported by the device, or you can just look up their names in our documentation.

The inputs to the filter are set using key value syntax. So here, for example, I'm telling the filter to set this image for its key input image, and I'm telling it to set the value 0.8 for its key input intensity. And just like the names, you can either use the API to query the names of these keys or look them up. So those are the inputs.

Now, all filters have only one output, the output image, and there are three different ways you can get it. You can use the valueForKey method on the filter, or you can use the outputImage method, or just use dot syntax for the outputImage property. The only difference here is the syntax. You get the same result from all three techniques.
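A quick sketch of those three equivalent ways of reading the output, assuming filter is an already-configured CIFilter:

```objc
// All three return the same output recipe object.
CIImage *viaKVC      = [filter valueForKey:kCIOutputImageKey];
CIImage *viaMethod   = [filter outputImage];
CIImage *viaProperty = filter.outputImage;
```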

So let's say we have two filters and we want to chain them together and get the result. So it's the exact same code to start with. We're creating a sepia tone filter, setting its intensity to 0.8, and setting its input image to some image that we've previously loaded.

But now let's create another filter, in this case a hue adjustment, which rotates the colors of the image around the color wheel. So we set its input angle to 0.5 radians, and the input image for this filter is the output image from the sepia filter. And then we get the output of the Hue Adjustment Filter, which is what we'll use to render with the context. And that takes us to rendering.
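The slide code isn't in the transcript, but the chain just described would look something like this, assuming the starting image was created earlier:

```objc
// First filter: sepia tone at 0.8 intensity.
CIFilter *sepia = [CIFilter filterWithName:@"CISepiaTone"];
[sepia setValue:image forKey:kCIInputImageKey];
[sepia setValue:[NSNumber numberWithFloat:0.8f] forKey:kCIInputIntensityKey];

// Second filter: hue adjustment by 0.5 radians, fed by the sepia output.
CIFilter *hue = [CIFilter filterWithName:@"CIHueAdjust"];
[hue setValue:[sepia outputImage] forKey:kCIInputImageKey];
[hue setValue:[NSNumber numberWithFloat:0.5f] forKey:kCIInputAngleKey];

// The recipe we'll hand to the context for rendering.
CIImage *result = [hue outputImage];
```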

The CI context is the object we use to render a CI image into its destination: into Core Graphics, into OpenGL, into Core Video and AV Foundation, or into an array of bytes in memory. So I want to describe how this works by showing you four common use cases: processing the image and displaying it in a view, saving the processed image into the photo library, rendering it into an OpenGL surface, or rendering it into a Core Video pixel buffer.

So, here's the first case. Assuming we've already created a CI image and set up our filters, here's the steps we'll take to actually render that into a view. First, we create our CI context with whatever options need to be set. Then, we'll get the output image from the last filter in our filter chain. And at this point, the output image is just the recipe that the context will use when it goes to render the image.

So then we tell the context to render that output image into a Core Graphics CG image ref. At this point we really have an image with rendered pixels, and we want to put that up in a view, so we'll wrap it in a UI image and put the UI image in a UI image view.

So let's take a look at that code. First, we create the context. Now, a quick thing to mention, in all the examples I'm about to show, I include a line like this that creates a context, and then we go on to the other steps. But actually, if you're going to be doing this repeatedly, you should only create the context once, and then just keep reusing that same context every time. You don't want to create a new context every time you render.

But OK, coming back. So we have our context now. Next, we're going to get the output image of the last filter in our filter chain. And now we render that into a CG image ref. We're telling the context, create a CG image from the output of our filter, and we specify its dimensions.

But now we have real pixels, and we need to put them into a view. So we'll create a UI image, passing the CG image we just created, specifying its scale and orientation. And finally, we put that UI image into a UI image view. So it's a lot of steps, but it illustrates each step of the process. However, I only did it this way for completeness. There's actually a shortcut.

And it's that we've added a method to UI Image to take a CI image directly. So all we need to do is create the UI Image and pass the output of the last filter in our filter chain. Then we just put that UI Image into a UI Image view, and that's it.
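Here's a sketch of both versions, the step-by-step path and the shortcut, assuming context, lastFilter, and a UIImageView named imageView already exist and the image is in the default orientation:

```objc
// The long way: render to a CGImageRef, wrap it in a UIImage, and display it.
CIImage *output = [lastFilter outputImage];
CGImageRef cgImage = [context createCGImage:output fromRect:[output extent]];
UIImage *uiImage = [UIImage imageWithCGImage:cgImage
                                       scale:1.0
                                 orientation:UIImageOrientationUp];
imageView.image = uiImage;
CGImageRelease(cgImage);

// The shortcut: let UIImage take the CIImage recipe directly.
imageView.image = [UIImage imageWithCIImage:[lastFilter outputImage]];
```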

Next is rendering the output and saving it to the photo library. So this is all the same as rendering into a view, but what I want to illustrate here is a case where you might want to have the context do its rendering on the CPU instead of the GPU.

And there's two situations where you'd want this. The first is if the images you're dealing with are larger than the GPU can handle. And this is frequently the case with large photos in the user's photo library. On the iPhone 4S and iPad 2, the maximum size for images on the GPU is 4K by 4K.

And on iPhone 3GS, iPhone 4, iPad 1, and the third and fourth generation iPod touch, the maximum size is 2K by 2K. The second case is if you're doing a particularly long processing operation and want to allow it to continue in the background; only a CPU context allows that. So, we'll create our context on the CPU, and then the rest is just like you saw before. We'll get the output of the last filter in our filter chain, render that into a CG image ref, and then put that CG image ref into the photo library.

So let's look at the code. First, we're creating a CPU context, and I do that by specifying an options dictionary, passing a Boolean YES for the key kCIContextUseSoftwareRenderer. And then it's the same as what we saw before. I get the output image from the last filter in my filter chain, and I ask the context to render it to a Core Graphics CG image ref. Now I have a real image, and I can use the Assets Library API to add it to the photo library.

I get a handle to the library, then I call writeImageToSavedPhotosAlbum, passing the CG image ref I just created and the metadata for the original image. This API takes a block as a completion handler, and in that block I release the CG image. So that's use case number two: rendering to the photo library and illustrating the use of a CPU-based context.
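A sketch of that whole path, assuming lastFilter and the original image exist; reusing the original image's properties dictionary as the metadata is an assumption about how that metadata was obtained:

```objc
#import <AssetsLibrary/AssetsLibrary.h>

// Create a CPU-based context so large images work and processing can continue in the background.
NSDictionary *options = [NSDictionary dictionaryWithObject:[NSNumber numberWithBool:YES]
                                                    forKey:kCIContextUseSoftwareRenderer];
CIContext *cpuContext = [CIContext contextWithOptions:options];

// Render the filter chain's output to a CGImageRef.
CIImage *output = [lastFilter outputImage];
CGImageRef cgImage = [cpuContext createCGImage:output fromRect:[output extent]];

// Write it to the photo library, carrying over the original image's metadata.
ALAssetsLibrary *library = [[ALAssetsLibrary alloc] init];
[library writeImageToSavedPhotosAlbum:cgImage
                             metadata:[image properties]
                      completionBlock:^(NSURL *assetURL, NSError *error) {
                          CGImageRelease(cgImage);
                      }];
```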

So here's the third case, rendering into OpenGL ES. For those of you who are OpenGL programmers, this is really simple. We're going to initialize the Core Image context using the same EAGL context that you're drawing into with OpenGL ES. Then each frame, we do the same things as in the previous examples: get the output image from the last filter in our filter chain, and tell the context to render it.

So here's that example. At initialization time, we create our OpenGL ES context, telling it to use the OpenGL ES 2.0 API. Then we use that OpenGL ES context to create the Core Image context. Core Image is really using OpenGL ES internally in this case, so it's able to use your context very efficiently.

Then each frame, at render time, it's similar to before. We get the output image from the last filter in the filter chain, and then tell the context to render it. And the call with OpenGL is just drawImage atPoint fromRect. And now we have our processed image rendered into the OpenGL ES render buffer.

Then we can go on to make the rest of our OpenGL ES calls for that frame. And when the frame is finished, we bind the render buffer and present it. This is with plain OpenGL ES. If you were using GLKit instead, you would just put the Core Image calls into your glkView:drawInRect: delegate method. So that's rendering to OpenGL ES, use case number three, and there's one more to go.
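A sketch of that GPU-backed path, assuming lastFilter exists and that colorRenderbuffer is the name of an already-created GL renderbuffer:

```objc
#import <CoreImage/CoreImage.h>
#import <OpenGLES/ES2/gl.h>

// At initialization: create an OpenGL ES 2.0 context and hand it to Core Image.
EAGLContext *eaglContext = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
CIContext *ciContext = [CIContext contextWithEAGLContext:eaglContext];

// Each frame: render the filter chain's output into the current render buffer.
CIImage *output = [lastFilter outputImage];
[ciContext drawImage:output atPoint:CGPointZero fromRect:[output extent]];

// ...issue any other OpenGL ES drawing for this frame, then present:
glBindRenderbuffer(GL_RENDERBUFFER, colorRenderbuffer);
[eaglContext presentRenderbuffer:GL_RENDERBUFFER];
```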

And that last case is rendering into Core Video. So this is going to let you source frames of live video from a capture session, for example, process the frames through Core Image, and render them back out to Core Video and AV Foundation, all in real time. So the steps here are identical to what we've seen before.

We create a context when we start this process, and then for each frame, get the output image from the last filter in our filter chain, and then tell the context to render it into a CV pixel buffer ref. We call the context's render-to-CV-pixel-buffer method, and specify the bounds and the color space. And that's it. We're processing frames of live video and writing them back out to the buffer.
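A sketch of that per-frame render, assuming context, lastFilter, the destination pixelBuffer, and a colorSpace already exist:

```objc
// Render the filter chain's output back into a Core Video pixel buffer.
CIImage *output = [lastFilter outputImage];
[context render:output
  toCVPixelBuffer:pixelBuffer
           bounds:[output extent]
       colorSpace:colorSpace];
```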

So now we've covered the three main classes of Core Image, the CI Image, CI Filter, and CI Context, and looked at how we can get images from various sources, apply filters, and write the output to various destinations. Next, I want to change topics and talk about two advanced features provided for image analysis, and the first of these is face detection.

The idea of face detection, quite simply, is to analyze a photo or live video to detect faces and report back their position in the image. So here's a family vacation photo, and we're able to analyze this image and get the position of the faces and the location of the features of each person, like their eyes and mouth.

Now, here are the classes involved. The CI Detector is the main class. You'll provide it with an image to analyze, and if it finds any faces, it'll return an array of CI Features. A CI Feature has a type and bounds, and there's a subclass called CI Face Feature, which contains the positions of the eyes and the mouth.

So let's look at how we set it up. We create a CI Detector object by specifying its type, a context, and some options. The only type currently supported is face detection. If we set the context to nil, the detector will create one of its own. And then there are the options. These are used to specify whether the detector should trade off accuracy for speed. Low accuracy is generally appropriate for detecting faces in live video, and high accuracy is more appropriate if you're processing a single photo and need the most accurate result.

Now we have our detector set up, and we can ask it to find any faces in the image. So we're calling the detector's featuresInImage method, passing the image and another set of options. This returns an NSArray of CI Face Feature objects, one for each face in the image. The options are to tell the detector which direction is up in the image you're providing. The detector needs that in order to calculate whether or not something in the image is a face. So that's what this code is doing. We're getting the orientation key from the image and using it to set the orientation key for the detector.

Great, so now we have an array of faces, and we can process the results. So here I'm just looping through the array and printing information to the console, starting with the bounds of the face in the image, and then, if we have data for the position of the left eye, right eye, and mouth, we'll output that as well.
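A sketch of that detection and logging code, assuming image is the CIImage being analyzed (kCGImagePropertyOrientation comes from ImageIO):

```objc
#import <ImageIO/ImageIO.h>

// Create a face detector tuned for accuracy over speed.
CIDetector *detector =
    [CIDetector detectorOfType:CIDetectorTypeFace
                       context:nil
                       options:[NSDictionary dictionaryWithObject:CIDetectorAccuracyHigh
                                                            forKey:CIDetectorAccuracy]];

// Pass the image's orientation so the detector knows which way is up.
NSNumber *orientation = [[image properties] objectForKey:(NSString *)kCGImagePropertyOrientation];
NSDictionary *options = orientation
    ? [NSDictionary dictionaryWithObject:orientation forKey:CIDetectorImageOrientation]
    : nil;

// Run detection and print what we found.
NSArray *features = [detector featuresInImage:image options:options];
for (CIFaceFeature *face in features) {
    NSLog(@"face bounds: %@", NSStringFromCGRect(face.bounds));
    if (face.hasLeftEyePosition)  NSLog(@"left eye:  %@", NSStringFromCGPoint(face.leftEyePosition));
    if (face.hasRightEyePosition) NSLog(@"right eye: %@", NSStringFromCGPoint(face.rightEyePosition));
    if (face.hasMouthPosition)    NSLog(@"mouth:     %@", NSStringFromCGPoint(face.mouthPosition));
}
```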

So, that's face detection. The other advanced feature I wanted to tell you about is auto-enhance. This is very similar to what you see in the Photos app, and now we've made it available for developers to include in their apps as well. The idea is that we analyze the image: we look at its histogram, we look for faces, and we look at other metadata associated with the image. We run that analysis and provide your app with an array of filters whose parameters are set up to enhance that particular image.

Now, here are the filters we use. The first is Red Eye Correction. If we find any faces in the image, we can look for the eyes and repair the red-eye effect that happens in some photos due to the camera flash. The next is Face Balance. Again, we look for faces and can make color adjustments to give better skin tones.

Vibrance makes the image pop. It increases the color saturation while maintaining proper skin tones. The next is Tone Curve, which can improve the contrast. And last is Highlight Shadow Adjust, which will pull down the highlights and pull up the shadows in an image while maintaining the contrast and color.

The API for this is really simple. On the CI image, there's a method, autoAdjustmentFiltersWithOptions. It returns an array of filters specifically tuned for that particular image. And it takes some options. The primary option is the orientation: since, again, it uses the face detector to find faces, improve skin tones, and remove red eye, the detector needs to know which direction is up. And we use the same options as I showed you in the face detection section. Then, you can also disable certain adjustments. For example, if you only want red-eye correction, or you'd like everything except red-eye correction, there are keys that you can set for that.

Now here's the code in context. We start with an image and call autoAdjustmentFiltersWithOptions on that image. This gives you back an array of filters, and it's now your responsibility to chain them together, hooking the output of the first filter to the input of the next filter, and so on, as sketched below.
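A sketch of that chaining loop, assuming image is the original CIImage and reusing the orientation options dictionary from the face detection example:

```objc
// Ask Core Image for the auto-enhance filters tuned to this image.
NSArray *adjustments = [image autoAdjustmentFiltersWithOptions:options];

// Chain them: each filter's input is the previous filter's output.
CIImage *enhanced = image;
for (CIFilter *filter in adjustments) {
    [filter setValue:enhanced forKey:kCIInputImageKey];
    enhanced = [filter outputImage];
}
// `enhanced` is now the recipe to render with the CIContext.
```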

So that's what this loop is doing. We're running through the array of filters and setting the input image of each filter to be the output image of the previous filter. And then once we're done, we're ready to render that image. So let's look at some examples. On the left, we have a photo of this little girl. It has nice composition, but her skin tones are a little green. With Auto-enhance, we're able to fix the skin tone and make the other colors more vibrant.

Next are these girls standing on a pier by the ocean. Their faces are in shadow, and we're able to improve the brightness. And finally, this boy here on the right, if you look closely at his eyes, there's some red eye in this picture, which we can detect and correct automatically.

Great, so that's Core Image. I've covered the basics, the CI image, CI filter, and CI context, and a number of common use cases for apps working with photos, live video, and OpenGL ES. Then I went into the advanced features for face detection and auto-enhance. If you have any questions about Core Image, you're welcome to contact me at aschaffe at apple.com. We also have some great documentation available in both the iOS and Mac dev centers. And you can find me in the developer forums as well. Thanks for watching.