Media • iOS, macOS, tvOS • 49:16
Get all the details on how to access the latest capabilities of Core Image. Learn about new ways to efficiently render images and create custom CIKernels in the Metal Shading Language. Find out about all of the new CIFilters that include support for applying image processing to depth data and handling barcodes. See how the Vision framework can be leveraged within Core Image to do amazing things.
Speaker: David Hayward
Transcript
All right. Thank you very much, and welcome. My name is David Hayward, and I'm really excited to talk to you today about all the great new features and functionality we've added to Core Image on iOS, macOS, and tvOS. We have a great agenda today. I'll be giving a brief overview of Core Image and a summary of the highlights of what we've added this year.
Then after that we'll spend the rest of the hour going into deep detail on all our new APIs and new features. So, with that in mind, let's dive in. First, a brief overview of Core Image. In a nutshell, Core Image provides a simple, high-performance API for applying filters to images. These filters can be used to adjust the color, adjust the geometry, or perform complex reductions or convolutions.
Core Image uses several tricks to get the best performance out of this, because these filters can be combined either in simple chains, like this example here, or in complex graphs. One of the tricks Core Image uses to get great performance is automatic tiling: if an image is too big, or too much memory would be required to render a graph, we can reduce our memory footprint by tiling.
Another benefit of this tiling is that if we're only rendering part of an image, we can be very efficient and only load the portion of the input image that's needed. So these are great performance benefits that you get for free by using Core Image. Also, when you're writing your own kernels in Core Image, our language extensions give you this functionality for free with very little difficulty.
Another thing to keep in mind is that all the filters in Core Image may be based on one or more kernels, either our built-in kernels or custom kernels. And another trick Core Image uses to get great performance is to concatenate these kernels into programs. This allows Core Image to reduce the number of intermediate buffers, which allows us to reduce our memory requirements and also improves image quality.
So with that introduction underway, I'm going to give a brief description of what we've added to Core Image this year. It falls into three main categories. First, of course, is performance. As always, this is something that's very important to Core Image, and we've added some enhancements in this area to give your application the best performance.
The next thing I'd like to mention is that we've also spent a lot of time this year enhancing Core Image to give you, the developer, better information about how Core Image works internally. All of those internal optimizations I alluded to on the previous slides, you can now see what we're doing to achieve them.
Thirdly, we've added a lot of functionality, and this is to allow your applications to get the best access to all the new features on our platform. So a little bit more detail on these. In the area of performance, Core Image now allows you to write CI kernels directly in Metal. And we also have a new API to allow you to better render to destinations. We'll be talking about this in much more detail later on in the presentation.
In the area of information, we have a new API that allows you to get information about what Core Image did for a given render. And also, we have some great new Xcode Quick Look support, which we'll show you. And in the area of functionality, we have some great new stuff as well.
We have a new collection of filters, new barcode support, and also support for editing depth. I want to call out the session that occurred earlier today on image editing with depth. If you didn't see it, you should definitely go back and watch it. It goes into great detail about how to use Core Image to edit depth.
So now let me talk in more detail about the new filters that we've added this release. We now have 196 built-in filters, and the ones we've added are great new additions. For example, some of them are very useful when you're working with depth data. We have convenience filters for converting between depth and disparity. We also have morphological operations, which allow you to erode and dilate an image, which is useful for manipulating depth masks. And we have convenience filters that allow you to combine an image with two different color cubes based on the depth of the image.
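For instance, a minimal sketch of using a couple of these filters (depthImage and depthMask are hypothetical CIImages from your app, and the radius is illustrative):

```swift
import CoreImage

// A hedged sketch, not the session's exact code: convert depth to disparity,
// and dilate a depth mask with a morphological maximum.
func prepareDepth(depthImage: CIImage, depthMask: CIImage) -> (disparity: CIImage, dilatedMask: CIImage) {
    let disparity = depthImage.applyingFilter("CIDepthToDisparity", parameters: [:])
    let dilated = depthMask.applyingFilter("CIMorphologyMaximum",
                                           parameters: ["inputRadius": 3.0])
    return (disparity, dilated)
}
```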
I also want to call out a great new filter, CIDepthBlurEffect, which we talked about in the image editing with depth session. It allows your application to get access to the great depth blur effect that we have in our Camera and Photos applications. Again, I highly recommend you watch the image editing with depth session that was recorded earlier today.
We also have several other new filters based on popular requests. We have a filter now that allows you to generate an image from text, which is great for allowing you to add watermarks to video or other textual overlays. We have a filter that allows you to compare two images in LabDeltaE space, which is great for seeing if your results are what you expect or to give a user information about how much an image might have changed. We also have a new bicubic upsample, or downsample filter, which is great for a variety of purposes. We also have a new way of generating barcodes, which we'll talk about in more detail later in the presentation.
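As one example from this list, generating a watermark image from text might look roughly like this (the font name and sizes are illustrative, not from the session):

```swift
import CoreImage

// A hedged sketch using the new text generator filter.
func makeWatermark(_ text: String) -> CIImage? {
    let filter = CIFilter(name: "CITextImageGenerator")
    filter?.setValue(text, forKey: "inputText")
    filter?.setValue("HelveticaNeue", forKey: "inputFontName")
    filter?.setValue(24.0, forKey: "inputFontSize")
    filter?.setValue(2.0, forKey: "inputScaleFactor")
    return filter?.outputImage
}
```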
Lastly, in the area of filters, some filters have been improved since our last release. Several of the blend mode filters now behave more in line with expectations. And we've also greatly improved the quality of the demosaic and noise reduction filters that are part of our RAW pipeline.
So as we add support for new cameras, you'll see those improvements as well. So that's new filters. I'd like to bring Tony up to the stage, who'll be talking in detail about how to write kernels directly in Metal, which is a great new feature.
[ Applause ]
All right. Thank you, David. Good afternoon, everyone. My name is Tony, and I'm really excited to tell you about this great new feature we've added to Core Image. So let's get right to it. First, let's put this in a little bit of context. If you refer back to the simple filter graph that you saw earlier, what we're talking about now are these kernels that you see here at the bottom, which allow you to implement your very own custom code that describes exactly how you want the pixels to be processed on the GPU.
So previously, these kernels were written in the CIKernel Language, which was a shading language based on GLSL, but with some extensions that allow Core Image to enable automatic tiling and subregion rendering. For example, we had a function called destCoord that lets you access the coordinate of the destination that you are about to render to, regardless of whether you're just rendering a subportion of the output or the output image is tiled.
We also have a couple of functions called samplerTransform and sample that let you sample from an input image regardless of whether the input image is tiled. So again, as David mentioned earlier, this provides a nice abstraction to tiling so that you don't have to worry about that when writing your kernels.
So once these kernels are written, they are translated, concatenated together as much as possible with other kernels, and then compiled at runtime to either Metal or GLSL. Now, for a rich language like Metal, the compilation phase can actually take quite a long time at runtime. And, in fact, in the worst case, if you're rendering a very small image or a relatively simple filter graph, most of that time could actually be spent compiling rather than rendering.
So to show you an example of that, here's a case where, on the very first render before any of the compilation has been cached, you can see there's a lot of time spent compiling versus rendering. So if we step through this stage by stage, the first step is to translate the CIKernels, shown in blue.
And the second stage is to concatenate the CIKernels. And then we go through a phase to compile the CIKernels to an intermediate representation, which is independent of the device. And then there's a final stage to actually compile that IR to GPU code to be executed on the GPU.
So the problem here is that concatenating CIKernels is something that has to be done dynamically at runtime. So what if we were to allow that stage to happen after the compilation? That lets us hoist the really expensive compilation up to build time, leaving behind only the work that needs to be done at runtime. So, as you can see, this is now a much more efficient use of the CPU, not to mention lower power consumption.
So I'm pleased to say that this is now possible in Core Image and that's by writing CIKernels directly in Metal. And to make this happen required some really close collaboration with both the Metal Framework Team and the Metal Compiler Team. And we think this is going to open up doors to some really exciting new opportunities. But first, let me just highlight some of the key benefits that you're already getting today.
So, as you saw earlier, now the CIKernels can be precompiled offline at build-time. And along with that, you can get some really nice error diagnostics. So if you had some typo or a mistake in your kernel, you can see that directly in Xcode, without having to wait for the runtime to detect them.
Second, you now have access to much more modern language features, since Metal is a relatively new language based on C++. And I want to stress that when writing CIKernels in Metal, you still get all the benefits such as concatenation and tiling, which have been the cornerstone of the Core Image framework for many years. So nothing is compromised by writing CIKernels in this new way.
And furthermore, these new CIKernels in Metal can also be mixed with traditional CIKernels. So a filter graph can contain either traditional kernels or kernels written in Metal. And that allows this feature to be maximally compatible with your existing application. And, as you would expect, this feature is supported on a wide variety of platforms, namely iOS for A8 or newer devices, as well as macOS and tvOS.
So now let's take a look at how we go about creating these Metal CIKernels. The first step is to write your CIKernel in a Metal shader file. Then once you have that CIKernel implemented, the second step is to compile and link the Metal shader file in order to generate a Metal library that can then be loaded at runtime. And then a final step is to just initialize the CIKernel with any function from that Metal library. So let's take a closer look at the first step, writing a CIKernel in Metal.
So to do that, I'd like to introduce you to our new CIKernel Metal library. It's basically a header file that contains our CIKernel extensions to the Metal Shading Language. Namely, we have some new data types, such as destination, sampler, and sample. Destination lets you access all the information that pertains to the output. Sampler lets you access all the information that pertains to the input image. And sample is a representation of a single color sample from an input image. Along with these types, we also have some convenience functions that are very useful for image processing.
For example, you can do premultiply and unpremultiply, as well as some color conversions between different color spaces. These new extensions are semantically the same as they used to be in the CIKernel language; there are just some slight syntax differences that pertain to the destination and sampler types, so let me show you that in a little more detail. Here's a snippet of what our CIKernel Metal library looks like. It is called CIKernelMetalLib.h, and all our extensions are declared inside a namespace called coreimage to avoid any conflicts with Metal.
So the first type that we have defined is called destination, and it has a method that lets you access the coordinate of the destination. Previously, if you were writing CIKernels in the CIKernel language, you would have done that via a global function called destCoord. But now, if you're writing kernels in Metal, you declare this type as an argument to your kernel in order to access that method. And the second type we have defined is the sampler. This has all the same methods that used to exist as global functions, but they are now implemented as member functions on the sampler type.
So to give you a nice summary of all that, here's a table that shows you the syntax that used to exist in CIKernel language versus the syntax that is now available in Metal. And, as you can see, in the CIKernel language, those are all implemented as global functions, but now with Metal, those are all member functions on their appropriate types.
So we think the new syntax will allow you to write code that is more concise and easier to read. But for the sake of portability, we did include the global sampler functions in our header, which are merely wrappers around the new syntax. That will help minimize the amount of code changes you need to make if you're porting existing kernels to Metal.
So now let's take a look at some examples of CIKernels in Metal. The first one we're going to look at is a warp kernel. As with all Metal shaders, the first thing you need to include is metal_stdlib, but for CIKernels you also need to include our Metal kernel library, and that can be done by just including the umbrella header CoreImage.h. Then the next step is to implement all your kernels inside an extern "C" enclosure, and that allows the kernel to be accessible at runtime by name.
So here we have a simple kernel called myWarp. All it takes is a single argument, a destination type. From that destination, you can access the coordinate you're about to render to, apply whatever geometric transformation you want to it, and then return the result. And for the sake of comparison, here's that same warp kernel implemented in the CIKernel language. You can see they're almost identical, minus some minor syntax differences. But semantically they are the same, and at the end of the day they compile to the exact same GPU code.
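A minimal sketch of what such a .metal file might look like (the particular geometric transform here is illustrative only):

```metal
#include <metal_stdlib>
using namespace metal;
#include <CoreImage/CoreImage.h>   // pulls in the CIKernel Metal library extensions

extern "C" {
    namespace coreimage {
        // A warp kernel returns the source coordinate to sample for each
        // destination coordinate; dest.coord() replaces the old destCoord().
        float2 myWarp(destination dest) {
            float2 d = dest.coord();
            return float2(d.x, 2.0 * d.y);   // illustrative 2x vertical squash
        }
    }
}
```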
The second example here is a color kernel. And for the most part it looks very similar, the only difference is now we have a kernel called myColor, and what it takes is a single sample as input. From that sample, you can apply various color transformations that you want on it and again return the result. Here again is that same color kernel implemented in the CIKernel language.
And the last example I want to show you is a general kernel, which you can use if you can't implement your kernel as either a warp or a color kernel. So here we have a kernel called myKernel. It takes a single input, which is a sampler type, and from that sampler you can sample anywhere in the input image and take as many samples as you need. Again, do something interesting with them and return the result. And one more time, here is that same kernel written in the old CIKernel language.
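Sketches of those two shapes, placed in the same extern "C" / namespace coreimage enclosure shown above (the channel swizzle and the one-pixel tap are illustrative only):

```metal
// A color kernel takes color samples (sample_t is a float4 color value).
float4 myColor(sample_t s) {
    return s.grba;   // trivial channel swap as a placeholder color transform
}

// A general kernel takes a sampler, and may also take a destination as its
// last argument; offsets are mapped with transform() so tiling stays correct.
float4 myKernel(sampler src, destination dest) {
    float2 d = dest.coord();
    float4 a = src.sample(src.transform(d));
    float4 b = src.sample(src.transform(d + float2(1.0, 0.0)));
    return 0.5 * (a + b);
}
```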
So now that you have a CIKernel implemented in a Metal shader file, the next step is to compile and link the Metal shader. For those who have experience writing Metal shaders, this build pipeline should look very familiar. It's basically a two-stage process: the first is compiling a .metal file to a .air file, and the second stage is to link the .air file and package it up in a .metallib file.
The only additional thing you need to do here for CIKernels is specify some new options. The first option you need to specify is for the compiler. It is called -fcikernel. And then the second option is for the linker and it's called -cikernel. Note that there's no f on that option. And you can do that directly in Xcode, and let me show you that with a short little video clip that illustrates how that can be done.
So for the compiler option, you can just look up the Metal compiler build options and specify -fcikernel directly in the Other Metal Compiler Flags. And because we don't have a UI for the linker options, to specify that you have to add a user-defined setting. Give that setting a key called MTLLINKER_FLAGS, and the value that you specify is -cikernel.
So you just need to set this up once for your project, and then all the Metal shaders that you have in there will be automatically compiled with these options. But if you prefer to do things on the command line or in a custom script, you can also invoke those two compiler and linker tools, like so.
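Roughly, that invocation looks like the following (file names are illustrative, and you may need to pass -sdk to xcrun as appropriate):

```sh
# Compile the shader to AIR with the CI kernel flag, then link it into a metallib.
xcrun metal    -c -fcikernel MyKernels.metal -o MyKernels.air
xcrun metallib -cikernel MyKernels.air -o MyKernels.metallib
```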
So now the last, and probably the easiest, step is to initialize a CIKernel with a given function from the Metal library. To do that, we have some new API on our CIKernel class that allows you to initialize the CIKernel with a given function by name, as well as a Metal library that you can load at runtime. There's also a variant of this API that lets you specify an output pixel format for your kernel. So if your kernel is just going to output single-channel data, you can specify a single-channel format for that kernel.
So here's an example of how to initialize the CIKernel. All it takes is those three simple lines. The first two are for loading the Metal library, which, by default, if it was built in Xcode, will be called default.metallib. And once you have that data loaded, you can initialize the CIKernel with a given function name from that library.
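A minimal sketch of those three lines ("myKernel" is the hypothetical general kernel from the earlier examples):

```swift
import CoreImage

func loadKernel() throws -> CIKernel {
    // default.metallib is the library Xcode builds for you by default.
    let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
    let data = try Data(contentsOf: url)
    return try CIKernel(functionName: "myKernel", fromMetalLibraryData: data)
}
```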
Similarly, warp and color kernels can be initialized with exactly the same API. So once you have that kernel initialized, you can apply it however you like to produce the filter graph that you desire. That's all there is to writing CIKernels in Metal. We think this is going to be a great new workflow for developers, and we look forward to seeing the amazing things you can do with this new capability.
All right. So now the next topic I'd like to talk about is a new API that we have for rendering to destinations. This is a new, consistent API across all the different destination types that we support: namely IOSurfaces, which, by the way, are now public API on iOS; CVPixelBuffers; Metal and OpenGL textures; or even just some raw bitmap data that you have in memory.
And one of the first things you'll notice with this new API is that it will return immediately if it detects a render failure and give you back an error indicating why it failed. So now you can actually detect that programmatically in your application and fail gracefully if an error is detected.
With this API, you can also set some common properties on the destination object, such as an alpha mode, a clamping behavior, or the colorspace that you want to render the output to. Previously, with our existing API, the alpha mode and clamping mode were determined implicitly, based on the format of your destination. But now you can explicitly override that with the behavior you want.
In addition to these common properties, there are some new advanced properties you can set on the destination, such as dithering and blending. So, for example, if you have an 8-bit output buffer that you want to render to, you can simply enable dithering to get a greater perceived color depth and reduce banding artifacts that you might see in certain parts of the image.
And a nice thing about these properties is that they effectively reduce the need to create multiple CIContexts. That's because some of these properties used to be tied to the CIContext, so if you had multiple configurations of different destinations, you would have had to create a CIContext for every single one. Now that these properties are nicely decoupled, you can, for the most part, have just one CIContext that renders to various different destinations.
But along with all the functionality this API provides, there are some really great performance enhancements that can be realized with it. For example, our existing CIContext APIs for rendering to IOSurfaces or CVPixelBuffers used to return only after all the rendering on the GPU had completed. With this new API, the call returns as soon as the CPU has finished issuing all the work for the GPU, without having to wait for the GPU work to finish. So we think this new flexibility will allow you to pipeline your CPU and GPU work much more efficiently.
So let me show you an example of that use case. Here we have a simple render routine that is going to clear a destination surface and then render a foreground image over top of a background image. The first thing we do is initialize a CIRenderDestination object given an IOSurface. Then we get a CIContext and start a render task to clear the destination. But before waiting for that task to actually finish, we can start another task to render the background image to the destination.
And now, before we start the final task, we can set a blend kernel on this destination object, which can be any one of our 37 built-in blend kernels. In this case, we've chosen a sourceOver blend. But you can even create your own custom blend kernel by using our new CIBlendKernel API.
So once we have the blend kernel that we want, we then call CIContext to start the final render task to render the foreground image over top of whatever is already in that destination. And only then do you need to call waitUntilCompleted, if you need to access the contents on the CPU. So with this new setup, this will now minimize the latency of getting your results without having to do any unnecessary synchronization with the GPU.
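Putting that sequence together, a hedged sketch might look like this (surface, background, and foreground are hypothetical values owned by your app):

```swift
import CoreImage
import IOSurface

func render(foreground: CIImage, background: CIImage,
            to surface: IOSurface, with context: CIContext) throws {
    let destination = CIRenderDestination(ioSurface: surface)

    // Start the clear and the background render without waiting on either.
    _ = try context.startTask(toClear: destination)
    _ = try context.startTask(toRender: background, to: destination)

    // Blend the foreground over whatever is already in the destination.
    destination.blendKernel = CIBlendKernel.sourceOver
    let task = try context.startTask(toRender: foreground, to: destination)

    // Only wait if you need the results on the CPU right away.
    _ = try task.waitUntilCompleted()
}
```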
The next use case I'd like to illustrate is one that highlights a much more subtle performance benefit, but it can have a huge impact in your application, and that's rendering to Metal drawable textures. You can do that very simply by getting the currentDrawable from, let's say, a MetalKit view, and then initializing a CIRenderDestination with the texture from that drawable. This will work just fine, but if you do this in a per-frame render loop, there's a potential performance bottleneck here that may not be so obvious.
So let me try to explain that in a little bit more detail with a timeline view here. And please bear with me, because there are a lot of steps involved. Here we have a timeline with two tracks: the CPU at the top and the GPU at the bottom.
Technically, there's actually a third component in play here, which is the display. But for the sake of simplicity, we'll just treat that as part of the GPU. So in the very first frame, your app will try to get a drawable from the view. From that drawable you can get a texture and then start a task to render to that texture.
So once Core Image gets that call, we will start encoding the commands on the CPU for the work to be done on the GPU. In this particular case, we're illustrating a filter graph that actually has multiple render passes: namely, two intermediate passes and a final destination pass.
Once Core Image has finished encoding all the work, the call to startTask will return. From that point on, the GPU will happily schedule that work to be done at some appropriate time. But if the work on the GPU is going to take a long time, your app could get called to render another frame before the work is done. And at that point, if you try to get a drawable, that call to currentDrawable will stall until a drawable is ready to be vended back to your application.
Only then can you get the texture from it, start another task to render to it, and so on for all subsequent frames. So, as you can see, this is not a very efficient use of either the CPU or the GPU, because there's a lot of idle time on both processors. But if you look closely, the drawable texture that we're about to render to is actually not needed until the very last render pass. So let's look at how we can improve this scenario.
So, with our new CIRenderDestination API, you can now initialize it not with the texture per se, but rather with all the properties of the texture, such as its width, height, and pixel format. You then provide the texture via a callback, which will be called lazily at the latest possible time, when that texture is actually needed. And so now, with the destination object initialized immediately, you can start a task and render to it much sooner, and this effectively defers that potentially blocking call to currentDrawable to a much later point in time.
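A sketch of that lazy variant for an MTKView render loop (view, context, commandBuffer, and image are hypothetical values owned by your renderer):

```swift
import CoreImage
import MetalKit

func draw(image: CIImage, in view: MTKView,
          context: CIContext, commandBuffer: MTLCommandBuffer) throws {
    let size = view.drawableSize
    let destination = CIRenderDestination(width: Int(size.width),
                                          height: Int(size.height),
                                          pixelFormat: view.colorPixelFormat,
                                          commandBuffer: commandBuffer) {
        // Called lazily, only when the texture is actually needed.
        return view.currentDrawable!.texture
    }
    _ = try context.startTask(toRender: image, to: destination)
    // ... commit the command buffer and present the drawable as usual ...
}
```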
So now, if we look at the timeline for this example again, the work on the CPU and GPU can be pipelined much more efficiently. So if you're rendering to Metal drawable textures, we strongly encourage you to use this new API, because it could greatly improve the frame rate of your application.
In fact, we have seen cases where the frame rate literally doubled just by simply employing this technique. All right. So now I'd like to hand it back to David who will tell you about some really cool stuff that lets you look under the hood inside the Core Image framework. Thank you.
[ Applause ]
Thank you so much Tony. That's great stuff. As I mentioned in my introduction, Core Image has a lot of great tricks it uses to get the best performance. And one of our goals this year was to make it clearer to you, the developer, how those tricks are occurring so that you can get a better understanding of how to use Core Image efficiently.
And we've done that in a couple of interesting ways. First of all, in our new APIs, we have some new ways of returning information about the render to you. After you've issued a render task, when you wait for that task to complete, it will return a CIRenderInfo object.
This object has a few properties on it, including the number of passes that Core Image needed to perform that render, the total amount of time spent executing kernels on the device, and the total number of pixels processed. So that's a great piece of information you get back from a render.
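A minimal sketch of reading that back (context, image, and destination are hypothetical values from your app):

```swift
import CoreImage

func reportRender(context: CIContext, image: CIImage,
                  destination: CIRenderDestination) throws {
    let task = try context.startTask(toRender: image, to: destination)
    let info = try task.waitUntilCompleted()
    print("passes: \(info.passCount),",
          "kernel time: \(info.kernelExecutionTime) s,",
          "pixels processed: \(info.pixelsProcessed)")
}
```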
But perhaps what is even cooler are the awesome new additions we've made to Core Image to provide better Quick Look support in Xcode. Notably, we now have great Quick Look support for CIImages. In addition to just showing the pixels, we now show you the image graph that you constructed to produce that image. If you do a Quick Look on a CIRenderTask, it'll show you the optimized graph that Core Image converted your image graph into. And if you wait for the render info to be returned to you and do a Quick Look on that, it'll show you all the concatenation, timing, and caching information from Core Image internally.
So to give you an idea, we're going to show you this in a very visual way. Here's some code; let's pretend we're stepping through it in Xcode. Here's an example of an image graph that we're going to construct. In this case, we're creating a CIImage from a URL, and we're specifying a new option on this image, kCIImageApplyOrientationProperty, set to true. What this will do is automatically make the image upright for you, which is a nice convenience. The next thing we're going to do is add onto that image an additional affine transform, which scales it down by 0.5.
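As a sketch, the construction just described might look like this (the file URL is a hypothetical placeholder):

```swift
import CoreImage

let url = URL(fileURLWithPath: "/tmp/photo.jpg")   // hypothetical path
let image = CIImage(contentsOf: url,
                    options: [kCIImageApplyOrientationProperty: true])?
    .transformed(by: CGAffineTransform(scaleX: 0.5, y: 0.5))
```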
Now imagine we're in Xcode, and we hover over the image object. If you click on the little eye icon, it'll pop up an image like this. In addition to showing you what the image looks like, nice and upright, it also shows you the graph used to create that image below it.
If we zoom in, we can see all sorts of interesting information. We can see, at the input of the image graph, we have our IOSurface. I can tell by looking at it that it's a YCC image, which means it probably came from a JPEG, and you can see the size of that surface, as well as the fact that it's opaque.
You can then see that the next step above that in the graph is a color-matching operation. So we were able to determine automatically what the colorspace of the input image was, and we've inserted an operation into the render graph to convert from the Display P3 colorspace to Core Image's working space.
And lastly, you can see three affine matrices. The first one, counting from the bottom, is the matrix that converts from the image coordinate system to the Cartesian coordinate system that Core Image uses. Then we have the affine transform to make the image upright, and then the affine transform to scale it down by 0.5. So now you can really look at an image and see everything that's being asked of it.
Now let's do something slightly different. This time we're going to ask for an image, but we're going to ask for the auxiliary disparity image. This is a new option that we have as well, and, if your image has depth information in it, it will return that as a monochrome image.
After we ask for that image, we're going to apply a filter on it. In this case, we're going to apply a bicubic scale transform to adjust its size. Now, if we were to hover over this object while we're debugging in Xcode, we will now be able to get this image.
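Hypothetically, that construction might look like this (the URL and scale are placeholders):

```swift
import CoreImage

let url = URL(fileURLWithPath: "/tmp/photo.jpg")   // hypothetical path
let disparity = CIImage(contentsOf: url,
                        options: [kCIImageAuxiliaryDisparity: true])?
    .applyingFilter("CIBicubicScaleTransform", parameters: ["inputScale": 2.0])
```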
And now you can actually see the disparity image, where white is in the foreground and darker values are further in the background. But we can also see the graph that was used to generate this image. Here we see that the leaf, or input, image is an IOSurface in a luminance half-float format, and you can also see that the dimensions of the image are smaller than the original image.
You can also see, at the top of this graph, the bicubic upsampling filter that we've applied. There's actually some method to the colors we chose here. One thing you'll notice is that all of the inputs to our graphs are purple. Anything that affects the color of an image, i.e., a CIColorKernel, is red. Anything that affects the geometry, in other words a CIWarpKernel, is green. And the rest of the kernels are in blue.
So now let's get even more interesting. We're going to take the primary image and we're going to apply two different color cubes to it. We're going to take those two resulting images and then we're going to combine them with the CIBlendWithMask filter. And, if we look at this in Quick Looks, we now see the final image where it's been beautifully filtered with two different effects, based on foreground and background. But also, we see detailed information about the graph that was used to produce it.
You can see here, on the left-hand side, the portion of the subgraph where we took the input image, got the color cube data, which is a 32 by 24 image, and then apply that color cube to it. On the middle graph, we're doing the same thing for the background image. All of these, plus the mask image, are used to combine with the blendWithMask kernel.
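A hedged sketch of the structure of that graph, where foregroundLook, backgroundLook, and mask are hypothetical CIImages produced earlier in your pipeline:

```swift
import CoreImage

// Combine two pre-graded looks using a depth-derived mask.
func blendByDepth(foregroundLook: CIImage, backgroundLook: CIImage, mask: CIImage) -> CIImage {
    return foregroundLook.applyingFilter("CIBlendWithMask", parameters: [
        kCIInputBackgroundImageKey: backgroundLook,
        kCIInputMaskImageKey: mask
    ])
}
```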
So we're hoping that gives you great insight into how your application creates CIImages. But what happens when it comes time to render? This is where things get really interesting. Once you tell a CIContext to start a task, it will return a CIRenderTask object. This also supports Quick Look. And, if we look at this, we see an even more elaborate graph.
Again, the color coding is the same, and we can see some of the same operations. But now, what we saw before as a color-matching operation has been converted into the primitives needed to do the color management. So we can see that we needed to apply this gamma function and this color matrix to convert from P3 to our working space. Another thing you can see is that, while the original image had three affine transforms, Core Image has concatenated those all into one.
Another thing you can notice is that, at the end of the graph, we now know what the destination colorspace is. So those operations have been applied to the image as well: both the colorspace conversion and the coordinate system transform for the final destination. You can also see, associated with all these objects, the dimensions of all the images involved at each stage of the render graph. Now, if you wait for your task to be completed, there's a new object, and associated with it is detailed information about how Core Image was able to concatenate and what the performance of that render was.
You'll see now there are far fewer objects in the tree because of concatenation. If we look at one of the lower ones here, we can see that this particular program is the result of concatenating a few steps into one program, and we can also see, associated with it, the amount of time spent on that program in milliseconds.
One great feature of Core Image is, if you then render this image again later and a portion of the graph is the same, then Core Image may be able to obtain the previous results from a cache. If that happens, then you'll see the time go to zero in those cases. So this also provides a way for your application to know how efficiently Core Image is able to render and cache your results, given memory limits.
So we hope you find this really informative and instructional on how Core Image works internally. The next subject I'd like to talk a little bit about today is barcodes, specifically the CIBarcodeDescriptor API. We now have great broad support for barcodes on our platform. Barcodes of various different types from Aztec to Code128 to QRCode and PDF417.
We also have support for barcodes across a broad variety of frameworks, and those frameworks use barcodes in different ways for good reasons. For example, AVFoundation is the framework to use if you want to detect barcodes while you're capturing video from the camera. If you want to detect barcodes from still images or from video post-capture, the Vision framework is a great way to do it. And lastly, you can use Core Image to render barcodes into actual images. Given this broad support across frameworks, we wanted a new data type that would allow barcode information to be transported between these frameworks in a lossless way.
And that is the reason for the new CIBarcodeDescriptor API. There's one key property of this object, and that is the errorCorrectedPayload. This is not just the textual message of a barcode; it's actually the raw data, which allows you to use barcodes and get information out of them in ways that go beyond just textual information. So, with this raw errorCorrectedPayload and an understanding of each barcode's formatting, you can do interesting things and build interesting vertical applications based on barcodes.
There are also properties that are unique to each particular barcode type. For example, in the case of an Aztec barcode, you can know how many layers were involved in that code. Or, in the case of a QR code, what maskPattern was used. So let me give you an example of how this type can be used across these three frameworks. Firstly, in the area of AVFoundation, it is possible to register a metadata output objects delegate that will see barcodes as they appear in a video feed.
In your code, all you have to do is set up this object. And, in your callback for when an object is detected, you can ask for the object as an AVMetadataMachineReadableCodeObject. From that object, you can get the descriptor property, and that will return one of the CIBarcodeDescriptor objects.
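A minimal sketch of that delegate callback:

```swift
import AVFoundation
import CoreImage

class BarcodeListener: NSObject, AVCaptureMetadataOutputObjectsDelegate {
    func metadataOutput(_ output: AVCaptureMetadataOutput,
                        didOutput metadataObjects: [AVMetadataObject],
                        from connection: AVCaptureConnection) {
        for object in metadataObjects {
            if let code = object as? AVMetadataMachineReadableCodeObject,
               let descriptor = code.descriptor {
                // descriptor is a CIBarcodeDescriptor subclass (e.g. CIQRCodeDescriptor)
                print(descriptor)
            }
        }
    }
}
```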
Second, if you want to use the Vision framework to detect barcodes, the code is really simple as well. Basically, we're going to use Vision to create both a request handler and a request. Then we're going to issue that request to the handler to detect the barcode. Once we get the request results back, we can then get the barcodeDescriptor object from the result. Very simple.
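A minimal sketch of that flow (image is a hypothetical CIImage):

```swift
import CoreImage
import Vision

func detectBarcodeDescriptor(in image: CIImage) throws -> CIBarcodeDescriptor? {
    let request = VNDetectBarcodesRequest()
    let handler = VNImageRequestHandler(ciImage: image, options: [:])
    try handler.perform([request])
    let observation = request.results?.first as? VNBarcodeObservation
    return observation?.barcodeDescriptor
}
```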
And lastly, simplest of all, is using Core Image to generate a barcode image from a descriptor. In this case, it's very easy. We just create an instance of a CIFilter of type CIBarcodeGenerator. We then give that filter the descriptor object for the inputBarcodeDescriptor key. And then we ask for the output image.
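A sketch of that generation step:

```swift
import CoreImage

func barcodeImage(from descriptor: CIBarcodeDescriptor) -> CIImage? {
    let filter = CIFilter(name: "CIBarcodeGenerator")
    filter?.setValue(descriptor, forKey: "inputBarcodeDescriptor")
    return filter?.outputImage
}
```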
These combined allow us to do some interesting stuff with both detection and generation of barcodes. As a brief prerecorded demo of this, let me show you a sample app that we wrote. What it does is look at frames of video, pull out the barcode, and render the barcode back over the detected barcode as a kind of augmented image. And you can see that we were able to perfectly reproduce the detected barcode and re-render it on top of the original. If we watch that again real quickly, we can even see that it's being re-rendered, because it's overlapping my thumb in the image. So --
[ Applause ]
All right. So for our last section of our talk today, I'm really excited to talk about how to use Core Image and Vision together. These are two great frameworks. Core Image is a great easy to use framework for applying image processing to images. Vision is a great framework for finding information about images. And you can use those together in great and novel ways.
For example, you can use Core Image as a way of preparing images before passing them to Vision. You might want to crop the image to an area of interest, correctly make the image upright, or convert to grayscale before passing it to Vision. Also, once you've called Vision and have information about the image, you might use that as a way of guiding how you want to adjust the look of your image. So, for example, if a feature is detected in an image, you might choose different ways to apply image processing to it.
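On the preparation side, a minimal sketch might look like this (the orientation, region of interest, and grayscale filter choice are assumptions for illustration):

```swift
import CoreImage
import ImageIO
import Vision

func prepareForVision(_ image: CIImage,
                      orientation: CGImagePropertyOrientation,
                      regionOfInterest: CGRect) -> VNImageRequestHandler {
    let prepared = image
        .oriented(orientation)                                  // make it upright
        .cropped(to: regionOfInterest)                          // limit the work
        .applyingFilter("CIPhotoEffectMono", parameters: [:])   // grayscale
    return VNImageRequestHandler(ciImage: prepared, options: [:])
}
```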
And, of course, these two can be combined. To go into a little bit more detail, we have an interesting demo we'd like to talk about today, where we're going to try to generate a photo from several frames of a video, with the unwanted objects removed. This demo involves three frameworks and four steps. The first step is to use AVFoundation to get the frames out of the video.
And that's very simple to do. Then we're going to use Vision to determine what the homography matrices are needed to align each of these frames to a common reference. Inevitably there's some camera shake, so a little bit of correction goes a long way. So that will allow us to get these homography matrices represented here as these slightly moving arrows for each frame in the image.
The third step is to use Core Image to align all these frames to each other. And that's very easy to do, as well. And lastly, we're going to use a median technique to create a single photo from all the frames in the video in a way that produces an optimal image.
The technique here is to produce an output image where, at each location of the output, we look at the same location in the input frames and use the median value at that location. If you look at this example, in the first four images that little spot is over the concrete pavement, but in the fifth one it's over someone's legs. Now, if we take the median of these five values, we're going to get a value that looks like just the concrete.
If we do the same thing at another portion of the image, again, we're here underneath the tree. Three of the five frames are good. The other two are less good. So we're going to use the median of those, and if you do that for every pixel of an image, you get a great result. Where objects that are transitory in the video are magically removed.
Let me talk a little bit about the code, and I promise we'll get to the demo at the end. The first step here is we're going to use Vision to determine the homographic registration for each frame. Again, this is quite easy. We're going to use Vision to create a request and a request handler to produce that information.
We're going to tell Vision to perform that request. Then, once we get the result, we'll ask for it and make sure it is an object of type VNImageHomographicAlignmentObservation. Wow, that's a mouthful. And then we'll return that. That object is basically a 3 by 3 matrix.
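A hedged sketch of that step; which image is the target and which goes to the handler is an assumption here, so check it against your own results:

```swift
import CoreImage
import Vision
import simd

func homography(aligning frame: CIImage, to reference: CIImage) throws -> matrix_float3x3? {
    let request = VNHomographicImageRegistrationRequest(targetedCIImage: frame, options: [:])
    let handler = VNImageRequestHandler(ciImage: reference, options: [:])
    try handler.perform([request])
    let observation = request.results?.first as? VNImageHomographicAlignmentObservation
    return observation?.warpTransform   // the 3x3 homography matrix
}
```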
Once we've returned that, we can then use Core Image to align the images based on this 3 by 3 matrix. This is a little tricky, but it's actually very easy to do using a custom warp kernel written in Metal. You can see here that this kernel has a parameter which is a float3x3. This is something that's new to Core Image this year.
What we're going to do is get the destination coordinate and convert it to a homogeneous coordinate by adding a 1 to the end. We then multiply that vector by the matrix, and that gives us our homogeneous source coordinate. Then we do a perspective divide to get the source coordinate that we're going to sample from. And that's all there is to it.
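A sketch of that warp kernel in Metal (the function name is hypothetical):

```metal
#include <metal_stdlib>
using namespace metal;
#include <CoreImage/CoreImage.h>

extern "C" {
    namespace coreimage {
        float2 warpHomography(float3x3 homography, destination dest) {
            // Homogeneous destination coordinate: (x, y, 1).
            float3 homogenDest = float3(dest.coord(), 1.0);
            // Apply the matrix, then perspective-divide to get the source coordinate.
            float3 homogenSrc = homography * homogenDest;
            return homogenSrc.xy / homogenSrc.z;
        }
    }
}
```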
The last step is to apply the median filter in Core Image. In this example, I'm illustrating the code that we use for a 5-image median. In practice, sometimes you have many more, and we'll go into that during the demo. But, in this case, we're going to use a sorting network to determine the median value of the five pixel samples.
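A sketch of what such a 5-input median color kernel could look like, placed in the same kind of .metal file shown earlier; the particular 9-comparator network below is one standard choice, not necessarily the exact one used in the session:

```metal
extern "C" {
    namespace coreimage {
        // min/max are componentwise, so this computes a per-channel median.
        inline void swap(thread float4 &a, thread float4 &b) {
            float4 lo = min(a, b);
            float4 hi = max(a, b);
            a = lo;
            b = hi;
        }

        float4 median5(sample_t v0, sample_t v1, sample_t v2,
                       sample_t v3, sample_t v4) {
            // After these swaps, v2 holds the median of the five samples.
            swap(v0, v1); swap(v3, v4); swap(v2, v4);
            swap(v2, v3); swap(v0, v3); swap(v0, v2);
            swap(v1, v4); swap(v1, v3); swap(v1, v2);
            return v2;
        }
    }
}
```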
Again, this is a great example of where writing the kernel in Metal is actually very convenient and easy, because now we can pass the values into this swap function by reference rather than by value. So now for the fun part: I'm going to invite Sky up to the stage, and he'll show you how this filter works.
[ Applause ]
All right. Thank you, David. Hi, my name is Sky, and it's a pleasure to bring you this demo today. As we can see here at the top, this is our input video. And if I scrub around on the slider, we can see that there's no point in time during the whole duration of the video where there's a clear image we can take of the landmark.
So, for example, at the beginning, we're obstructed by the shadow here. And then we have people passing by. So there's really just no point during the entire video where we can get a clean shot. And actually, if we zoom in on one of the corners, we can see that it's constantly shifting across the entire duration.
So before we run the reduction kernel, we first need to align these frames, and the way we do that is using Vision, as David just mentioned. Now, Vision offers two registration APIs, as shown here in the slider. We have the homographic alignment, which David just mentioned, but Vision also offers a translational alignment, which in this case doesn't work extremely well. That's because our camera movement is not restricted to one plane that is parallel to the image plane. So the way we're doing the stabilization is by registering every frame of the video onto the middle frame.
And so you can expect a pretty dramatic camera movement between the first frame and middle frame, which is why the homographic registration is going to work better for us in this case. So let's just go with that. So with that turned on, if I zoom in on this corner here again and scrub through, you can see that the point is barely moving across the entire frame. And if we go back and scrub through, the video becomes extremely stabilized.
And so this gives you an idea: if you're writing an image stabilization application, you could easily do that with Core Image and Vision. So now let me jump into the reduction part. The first thing I'd like to point out is that doing a median reduction over what we have here, 148 frames, is not really practical, because we would need to hold all those frames in memory while sorting them. So what we do instead is take the median of medians as an approximation.
So the first thing we do is divide the entire set of 148 frames into several groups. For each one of those groups, we compute a local group median. And in a second pass, we run our reduction kernel again on the local medians to compute the final approximate result.
That's why we have this control here at the bottom, with these ticks that show you how the frames are grouped. If I change the group count here, we can see that the indicators change. So, if we have a group count of three, that means we're dividing the entire video range into three groups.
And for each one of those groups, I can change the group size, which indicates how many evenly distributed frames we're taking out of the group to use for our group median computation. So, with that in mind, let's try a group count of five and a group size of seven, say, and see what that gets us.
It's going to take a little bit of time to run because Vision needs to do all the registration and we need to warp the images. And so, as we see here in the output, none of the moving transient objects were actually in our final reduced image, and we have a very clean shot of our landmark, which is exactly what we wanted.
And if we switch back and forth between the input and the output, we can see that all the textual details are very well preserved, which gives you an idea of how well Vision is doing the alignment. So I hope this gives you a sense of the interesting applications you can build with this nice synergy between Core Image and Vision. And with that, I'd like to invite David back onstage to give you a recap.
[ Applause ]
All right. Well, thank you all so much. Let me give a quick summary of what we talked about today. Again, our primary goals for this release were to give your applications better performance, better information about how Core Image works, and great new functionality. And we really look forward to seeing how your applications can grow this year based on this new functionality. So we look forward to you coming to some of our related sessions.
If you need more information, please go to the developer.apple.com website. There are some related sessions that are definitely worth watching: there was a session earlier today on image editing with depth, as well as sessions on the Vision framework and on capturing depth data. Thank you so much for coming, and have a great rest of your show. Thanks.
[ Applause ]