Using Accelerate and simd - WWDC 2018

Frameworks • iOS, macOS, tvOS, watchOS • 36:48

Learn how to use sophisticated Signal and Image Processing techniques to bring higher performance to your apps while lowering battery consumption. See compelling use cases for the Accelerate framework with interactive demos. Explore using simd—a valuable addition that effortlessly brings vector programming to your apps.

Speakers: Matthew Badin, Luke Chang

Unlisted on Apple Developer site

Downloads from Apple

HD Video (1.14 GB)
SD Video (294.4 MB)
PDF Slides (1.1 MB)

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Good morning. How is everybody?
Good.

[ Applause ]

My name is Matthew Badin, and I'm an engineer at Apple. Welcome to the Using Accelerate and simd session. So my colleague Luke Chang and I are pretty excited today. We get to talk to you about all the great APIs that are available in Accelerate and its associated frameworks, and we're going to start by taking a high-level overview of Accelerate and some of the high-performance libraries that are contained inside it.

We're then going to take a deep dive into a few of libraries, and we're going to start by taking a look at vDSP. We have two examples for you. The first one, we're going to show you how to extract signal from noise, and then we're going to show you how you can remove certain types of artifacts from an image. We're then going to take a look at simd, and we're going to show you how you can use quaternions to represent rotations in three dimensions.

I'm then going to pass off the presentation to my colleague, Luke Cheng, who's going to show you some of the interesting things that you can do with vImage. So with that, let's get started. So you're probably asking yourself exactly this. What on Earth is Accelerate? So the primary purpose of Accelerate is to provide thousands of low-level math primitives, and we provide these primitives across all of Apple's platforms. So this includes not only iOS and macOS but watchOS and tvOS as well.

Most of these primitives are hand-tuned to the microarchitecture of the processor. So this means we get excellent performance. And this performance translates directly into energy savings. So if you're an app developer and you use the Accelerate framework, not only will your application run faster, but you'll also use less battery life. This means your users are going to have an overall better experience.

Now, because we provide so many primitives, we find it useful to organize or group them into domain-specific libraries. So, for instance, we group all the signal processing primitives under vDSP. So, for instance, this would be your FFTs or your DFTs and your DCTs, your fast Fourier transforms, your discrete cosine transforms. VImage contains the image processing primitives. If you're doing color space conversions, this is the library for you. VForce contains vector versions of the transcendental functions. So for instance sine and cosine.

We also have support for dense linear algebra as well as sparse linear algebra, and we have a specialized library for neural networks called BNNS. It stands for Basic Neural Network Subroutines. Not strictly part of the Accelerate framework but very closely related, and we find these libraries very useful, include libraries like simd, which is a vector programming aid for the CPU, and Compression, which contains several different lossless data compression algorithms. So let's take a look at our first library. We're going to take a look at vDSP. VDSP is a state-of-the-art signal processing library, and it contains a wide range of signal processing primitives.

These range from basic arithmetic operations on arrays such as add and subtract as well as more complicated operations like convolution and FFTs. If you're a successful app developer and maybe in the past you have avoided FFTs, I want to show you how we make that easy with Accelerate. With just a few lines of code, you can make that work. And I have an example. I'm going to show you how to extract signal from noise.

So what we have here is an audio signal. This is the base signal. We haven't added any noise to it yet, and you'll notice I have two sliders at the bottom. In the lower left I have a slider that lets me add noise, and you can see that.

And I also have a second slider that let's me remove this noise called Threshold. You'll also notice I have a toggle switch in the lower right. This lets me look at this under a different domain. So currently we're looking at this signal under the time domain. We're going to do some analysis and see this signal in the frequency domain.

You can see all the spikes on the left. Those are the frequency components of the signal. You'll also notice the blue bar. The blue bar is the threshold slider, so you can see I can move it. So what I'm going to do now is add some more noise to the signal. Actually, let's add quite a bit of noise.

So you can see the signal that I'm still interested in is represented by the spikes on the left or another way of looking at it is the spikes with the tallest height, and what we've added, the type of noise we're adding to this, is background noise. So it's kind of evenly distributed spikes everywhere, but they're low-level spikes.

So now I'm going to remove this noise, and the way I'm going to do this is I'm going to move this threshold slightly higher. As I do this, what's happening is, behind the scenes, is we're identifying any frequency component less than this threshold and removing it. We're saying effectively if there is any spike with a height less than this blue bar, we're going to pretend it's noise and remove it. So if I keep sliding this higher, eventually I've removed all the noise.

And if I go back to the time domain, you can see that I removed the noise. And in case you don't believe me, we can remove the threshold, and this is what this looks like with all the noise still added. So let me show you how we do that.

At a high level what we're going to do is first perform an analysis on a signal. That's what the switch let me do. We're then going to identify the frequency components that represent the noise and remove them. After we have done that, we're going to reconstruct the audio signal. So let me show you some code.

What we're going to use here is a discrete cosine transform or a DCT. You can see here the DCT CreateSetup. This context is going to describe the type of work we're going to do as well as give us space to perform this work. In this case, we're going to use a type 2 DCT. We're then going to pass this context to an execute function.

This will actually perform the work. Once we've performed the analysis, we want to remove the noise. This right here is where the magic happens. This routine is going to identify any frequency component less than that threshold and make it zero. It's going to zero it out. After we've done that, we want to reconstruct the audio signal.

Again, we're going to need to use CreateSetup to create context. In this case, we're going to use a type 3 DCT to reconstruct the signal. We're then going to pass this context DCT execute, and this will actually perform the work of reconstructing the audio signal. So what we've shown you is an example of how you can remove certain types of noise from an audio signal using vDSP. I want to show you an example of how we can remove certain types of noise from an image also using vDSP. In this case, we're going to restore an old newspaper photograph.

So what we've done is we've taken this image, and we've applied a two-tone screen to it. So this could represent an old newspaper photograph. And what we're going to try to do is remove this screen. We're going to try to remove the artifacts that you see. Currently, we're in off, so we're not doing anything. What we're going to do is actually take a sample of this screen, and we're going to create a mask from that sample and then apply that to the image to try to remove it. So let me show you the first attempt.

So what we're doing is we are identifying a frequency component at a certain threshold, and any frequency component higher than that threshold we're going to remove. And you see, if we set the threshold too low, not only do we remove the artifacts, but we also remove quite a bit of the image as well. If we set it too high, you can see that we didn't do much of anything. Medium seems about right. Medium seems too correctly identify the artifacts in the image without damaging too much of the image. So let me show you how we do that.

At a high level what we're going to do is we're going to perform an FFT on the image and the sample. We're then going to create a mask from that sample and apply it to the image. Once we've done that, we're going to reconstruct the original image. So let me walk you through some code. We're going to do an FFT. This means it needs to be a power of 2. That's why you see the log2 call. The 1024 x 1024 is the size of the image.

We're then going to pass it to fft2d zrop. This is quite a mouthful. The important part here is the op stands for out of place. So we're going to have to create some temporary space just for the result. We're going to store in this complex structure, and effectively this says we're going to store the complex number in two arrays with the real component in one array and the imaginary component in the second. We also need to specify a direction. In this case we're going to do a forward FFT.

Now the artifact removal is a little bit more advanced, so I'm only going to go through this at a very high level. I recommend you download the sample application that's available online right now, and each one of these routines is also documented under vDSP, and we have excellent online documentation.

A high level what we're going to do is we're going to identify the magnitude of the frequency components, in this case with the sample. We're then going to identify the components we want to remove. We're then going to create a mask from that. And once we've done that, we're going to apply that mask to the image. Effectively what we're doing is we're multiplying by 0 the components we want to remove and by 1 the components we want to keep.

We're going to use zrop again to reconstruct the image. Because this is an FFT, we get to reuse the context. In this case, one key detail is we're going to store the image in two arrays because this is a complex structure. So the even pixels are going to be in the real array, and the odd pixels are going to be in the imaginary.

And again we're going to specify a direction. In this case, we're going to use an inverse FFT. So now I want to shift gears for a moment. Previously we showed you two examples, and we had two working examples for vDSP, and then we worked backwards, and we showed you how we constructed these examples. We worked backwards and showed you the moving pieces.

For this next library, what I want to do is start at the basic components. I want to build up to a result. So we're going to take a look at simd, and we're going to start at the basic low-level primitives, and we're going to build up to rotations. In this case, rotations of 3D objects.

At a high level, simd is an abstraction for the vector processing unit. So what this lets you do is declare vector and matrix objects, and you get to perform operations on these objects, and this will map directly to the vector hardware of the processor. So let me show you a coding example. So what we're going to do here is we're going to take two arrays, and we're going to average the components. So we're going to iteratively go through each of the scaler components, add them together, and divide by 2.

This is going to be really slow. Instead, you can declare these arrays as simd float4 vector types. So then we can perform basic arithmetic operations on these objects. So not only can you express the computation more naturally, but this will also run about as fast as it can. And this will work across all of Apple's platforms.

Simd has a tremendous amount of functionality. In addition to vector and matrix objects and allowing you to perform arithmetic operations on these objects. It has extensions. So for instance, dot products and clamp. It also has support for the transcendental functions, so for instance sine and cosine as well as quaternions. Quaternions are very useful for representing rotations in three dimensions, and I actually want to talk a little bit more about those. So I'm going to walk you through a coding example.

There's quite a bit to unpack here. So we're going to start on the right. We have a unit sphere. That's what this gray sphere is. And you'll notice this red dot. And that's actually the tip of this vector. We've declared a simd float3 vector. We've set the x and y components to 0 and the z component to 1. So it's coming out at us. And that red dot is represented by the tip of that vector.

We're now going to perform a rotation on this vector using a quaternion. Technically, we're rotating the entire scene, but for purposes of illustration, we're going to say we're rotating this vector. When using a quaternion for rotations, you need to specify an axis and angle of rotation. Or to put it another way, what are you rotating about and by how much.

So we're going to rotate about the x axis, and we're going to rotate pi over 3 radians up. You apply the rotation by calling the simd act function. This applies the action of the quaternion on that vector and returns a rotated vector. So let's take a look at that now.

So normally you're not interested in rotating along a single axis. You usually want to rotate along multiple axes, and if you're already familiar with rotation matrices, this is going to seem very natural. Like rotation matrices, you can combine the rotations using multiplication, and also like rotation matrices, the multiplication is not commutative.

So this means if you change the order of the operands, you will change the order of the rotations. So effectively what we're going to do here is rotate pi over 3 radians up, and then pi over 3 radians to the right. And we're going to combine that into a single rotation.

Some of the more interesting things you can actually do with quaternions and simd is interpolation, and we support two types of interpolation. The first is Slerp. It stands for Spherical Linear Interpolation, and there are actually two variants of it. We have simd slerp, which will find the shortest arc between these two points, in this case between the blue and the green, and we have simd slerp longest, which will find the longest arc. So you'll actually see it go behind the unit sphere.

The second variant is Spline. Spline is really more useful if you have more than two rotations. So, for instance, here we're going to interpolate between an array of rotations, and there's quite a bit of boilerplate code here, so I want you to focus just on the Spline call.

Effectively what we're doing is just iterating through all the individual rotations and applying Spline. So the thing you have to specify with Spline is not only the two rotations that you wish to interpolate between but also the previous and the next as well. And this is what this looks like.

So if you're a game developer, you're probably not interested in rotating individual vectors. You're probably interested in rotating objects. So we have that for you. We have a cube. It's represented by multiple vectors, and it's going to go through a series of eight rotations. On the left, we're going to trace those rotations using Slerp, and on the right we're going to use Spline. So let me show you what Slerp looks like.

And you can see because it's a linear interpolation, every single time it changes direction you get these sharp corners. Whereas if we look at Spline -- Because it's aware of the previous and next rotation as well, you end up with these rounded corners. So let's see that again.

So now I went through all those topics pretty quickly, so as a quick recap, we started by taking a look at vDSP, and we showed you two examples. The first was how to extract a signal from noise, and the second was how to remove certain types of artifacts from an image. We then took a look at simd, and I showed you how you can represent rotations in three dimensions using quaternions. I'm now going to pass off the presentation to my colleague, Luke Chang, who's going to show you some of the interesting things you can do with vImage.

[ Applause ]

Thank you, Matthew. Hello everyone. My name is Luke Chang. I'm an engineer in Vector and Numerics group. Today, I'm going to talk about vImage, what vImage offers, and how easy it is to use vImage in your apps. With just a few lines of code, you can create engaging video effects in your app. Let's get right to it.

VImage is our image processing library. It has several components. The first component is the conversion function. Conversion function help you move image between different image formats. Different image formats have different advantages. For example, RGV format matches the pixels on your display, so it's best for the display. On the other hand, we have YCbCr image, which is similar to how human perceive the image. Human eyes recognize brightness, which is the luminance channel. Also, the color, which is the chrominance channel. Also, camera uses YCbCr format to capture images. So converging function helps you easily move images between these formats.

Next, we have geometry function. Geometry function changes the size or orientation of the image. We have vImage scale that can do enlarge or shrink the image. We use Lanczos algorithm, so we'll have high-quality output after the operation. We also have vImage rotate that can rotate image clockwise or counterclockwise.

Next, convolution function. The most notable effect of convolution function is the blur effect. You see blur effect all the time, in UI, in photography. If you want to phase something into the background, you can apply the blur function, blur effect. Next, transform function. Transform function is basically a matrix multiply. It lets you operate on the data channel of each pixel. Let's say you want to increase red or increase green, you can do that with transform functions.

Morphology. Morphology changes the size or shape of the objects in the image, not the image itself. We have vImage erode, vImage dilate to make the object smaller and bigger. If you're feeling adventurous, you can actually provide a custom shape of kernels to these functions, and vImage erode and dilate will make the object smaller or bigger according to the kernel you provided. Those are the five things in vImage. Now, I want to show you a demo app that we wrote based on vImage and show you what kind of effect you can get from vImage.

What I have here is a lab that captures images using the back camera and put the image onto the screen. And we're doing in real-time. This is a live stream, so you can see the drinking bird doing its motion next to the roses. All right, the first effect I want to show you is the color saturation effect. You can see this effect in a lot of photo editing software. So I want to bring out the color. What I can do is move this slider to the right to make the red really red and green really green.

And on the other side, I have white roses. I feel that the color of white roses is not that interesting to me. I would like to direct the focus of my audience to the composition and the contrast of this image. I can slide to the left to desaturate the image to the point that this becomes a black and white image. So now color is no more distraction in this image, and the viewer can focus on the composition and contrast. Okay. So, how do we do this?

There are several steps we need to take. First, of course, we have to get the image from the camera, and then we want to use vImage to process, to apply the effect, so we have to prepare the input and output buffer for vImage. Then we actually calling vImage functions to apply these effects, and we display the output to the screen. Let me jump ahead and talk about how do we apply effects using vImage functions.

The effect I show you is a color saturation effect, and here is the formula to do color saturation. Basically, we want to remove the bias from the pixel, and using multiplication to apply the saturation effects, and then we put the bias back to the pixel. VImage has exactly the function to do this operation, which is vImage matrix multiply.

VImage matrix multiply takes the preBias, in this case minus 128, to remove the bias, and because the saturation is [inaudible] point and the image is an integer, we want to convert this saturation value first into fixed point format. We chose Q12 as the fixed point format, hence a divisor of 0x100. And then we have the postBias 128 times the divisor, just to put the bias back to the pixel.

And the matrix itself is really, really simple. All we want to do is just doing a scaling of CbCr channel. So the matrix itself is just a scaler, want to multiply CbCr channel with this scaler. We have all the information, so let's call vImageMatrixMultiply, and with just one line of code, one function call, you can achieve the saturation effect.

Now let me come back to other steps that we need to take. We need to take the image from the camera. How do we do that? We need to write a delegate method, and what camera gives us is a CV image buffer. So we get the buffer. We have to make sure this buffer is accessible to CPU. That's where vImage lives. After we apply the effects, whatever effects that may be, and we have to unlock the base address of this pixel buffer so that the camera can reuse this piece of memory.

The second step, we have to prepare the vImage input and output buffer. We already have this image in CV image buffer. All we need to do is just get the information such as height and width, and then we can package this into a vImage buffer object so it can be consumed by vImage library. We do this for luminance and chrominance channel.

Now we need to prepare an output buffer. Remember, we don't have a piece of memory allocated to the output image yet, so we need to do that, and vImage has a convenience function, vImage buffer in it, to do just that. Given this height, width, and bits per pixel, vImage buffer in it will allocate a memory that's large enough to hold this image and then also create a vImage buffer object so it can be consumed by the vImage library. Last step is put this process image to the screen.

Like I said before, RGB is really the best format for display, so let's use the conversion function to convert YCbCr image into a RGB image. And then because the UI expect the cgImage object, we have to create one. There is a convenience function in vImage. vImageCreatesCGImage FromBuffer that helps you to create a cgImage based on the buffer you already have in vImage.

One thing to note is that we're not actually copying the large data buffer in the image from one place to another. We're simply creating a cgImage object that adds a container to this image buffer. So we are only filling in the information that cgImage needs, create a cgImage objects, instead of copying data around.

Once we have that, we can send the cgImage object to the image view, and it will be displayed on the screen. So it's that simple. Four steps and you can create your own effects, we show you the saturation effects. Now, there are other effects we can do with vImage.

We can do a rotation, like I said before, rotate the image clockwise, counterclockwise. We can do blur or phase something into the background. And you feel, if you feel like you want to add some retro feeling to your images, you can do dithering for black and white images, and color quantization for color images. Let me show you how they look like in the app.

So, again, I have the slider here to control the rotation. I can do rotate counterclockwise or rotate clockwise. Now I want to try the blurring effect. Let me click on the one here. And I can apply more blurring or slide to the left, bring the roses back to the foreground.

For black and white, I can use dithering. Now this black and white image, the gray scale is represented now by the density of the dots, that's the dithering effect. And we use the accents and dithering algorithm for this, and I will show you how to do it later, and for color quantization, we have lookup table.

And I can move this slider to increase the quantization level. As I move the slider to the right, fewer and fewer color is available in this image. That's sort of creative, that's how your computer screen looks like in the '90's or in the '80's. All right. So let me show you how we do that.

For rotation effect, you can call vImageRotate and given the angle of rotation, it will do counterclockwise or clockwise to better align your images. For the blur effect, we use TentConvolve. The blur effect is controlled by the size of the kernel. The larger the kernel, the more blur you'll get. Dither effect is basically converting an 8-bit image into a 1-bit image. At the same time, you can specify a dithering algorithm. In this case, we used the Atkinson dithering algorithm.

Color quantization, we used the quantization level to create a lookup table for the RGB channel, and we call vImageTableLookUp to apply this table lookup to RGB channels to limit the number of colors on the screen. Those are the four additional effects I wanted to show you, and I think now is a good time to move onto the next topic.

LINPACK Benchmark. We talked about the functionality of Accelerate. We talk about how easy it is to use in Accelerate in your apps. We haven't talked about how fast Accelerate really is. And LINPACK Benchmark is a perfect tool to do that. What is LINPACK Benchmark? Basically, it's trying to measure how fast you can solve a linear system on your machine.

There are actually three different LINPACK Benchmarks. The first one is solving a 100-by-100 linear system. The second one is solving a 1000-by-1000. The last one, which is the most interesting one, that's the one we're going to use today, is no holds barred. You can solve as large a system you want to fully utilize every last bit of the computational power on your machine.

Let's see the performance of iPhone X using Accelerate. The performance is measured in gigaflops, for double precisions catch up with iPhone 5S, iPhone 6, iPhone 6S, iPhone 7, iPhone X comes in around 28.7 gigaflops. That's double precision. Let's look at single precision. Again, we run out of space, we have to shrink to try to make it come closer.

IPhone X comes in at 68 gigaflops. Now you might be thinking, this is not that surprising. However, it improves over time, so the performance improves over time as well, but in fact, that's only half of the story. When there is a micro architecture change to have additional computational power into a machine, you need the matching [inaudible] to fully utilize this additional computational power. And that is where we come in. Remember, this is the same LINPACK Benchmark executable running on all five generations of iPhones. They all got the best performance using Accelerate without changing a bit.

The same is also true for your apps. If you use Accelerate in your apps, you'll get the best performance automatically on all the architectures we support. Moreover, Accelerate supports across platforms. Accelerate works on macOS, iOS, tvOS, watchOS. So let's say if tomorrow Apple comes out with a new architecture or new platform, you don't have to worry about it. All you need to do, the most you'll ever need to do is just to rebuild your apps, link against Accelerate. You will automatically get the best performance on the latest release platform or architecture.

Just as a summary, we talk about Accelerate supports a wide variety of functionalities. More than likely, you will find something you need in Accelerate. If you need something that's not available, please feel free to file feature requests. We constantly look at this feature request, evaluate them, and then put into builds. Actually, some of our best features come from feature requests.

Accelerate is easy to use. Most of the time it's just one function call and the job is done. It's fast and energy efficient, so your app is more responsive and the battery life is longer. Accelerate is portable across platforms and architectures. You get the best performance on all the platforms we support and all the architectures we support, and the best part is, you don't have to change your code.

For more information, you can go to our online documentation at developer.apple.com, and all our demo apps, simple code, and session material will be available online. We have a lab session tomorrow afternoon at 2. I look forward to seeing you guys there. If you have any questions or you want to learn more about Accelerate, I'd love to see you there. That's all for our presentation today. Thank you all for coming. Have a great day.

[ Applause ]