Frameworks • iOS, macOS, tvOS, watchOS • 34:37
The Core ML tools ecosystem gives you many options for building and optimizing models to meet your app requirements. Learn how to add flexibility to existing models, quantize them, and take advantage of Core ML's support for customization.
Speakers: Aseem Wadhwa, Sohaib Qureshi
Transcript
Hello!
[ Applause ]
So welcome to the second session of Core ML. My name is Aseem, and I'm an engineer in the Core ML team. As you all know, Core ML is Apple's machine learning framework for on-device inference. And the one thing I really like about Core ML is that it's optimized on all Apple hardware.
Over the last year, we have seen lots of amazing apps across all Apple platforms. So that's really exciting. And we are even more excited about the new features that we have this year. Now you can reduce the size of your app by a lot. You can make your app much faster by using the new batch prediction API. And you can really easily include cutting-edge research right in your app using customization. So that was a recap of the first session. And in case you missed it, I would highly encourage you to go back and check the slides.
In this session, we are going to see how to actually make use of these features. More specifically, we'll walk through a few examples and show you how, in a few simple steps using Core ML Tools, you can reduce the size of a model and include a custom feature in your model. Here's the agenda for the session. We'll start with a really quick update on the Core ML Tools ecosystem, and then we'll dive into demos of quantization and custom conversion. So let me start with the ecosystem.
So how do you get an ML model? Well, the easiest thing is, if you can find it online, you just download it, right? A very good place to download ML models is the Apple machine learning landing page. We have a few models there. Now let's say you want to train a model on your own data set. In that case, you can use Create ML.
This is a new framework that we have just launched this year, and you do not have to be a machine learning expert to use it. It's really easy to use. It's right there in Xcode. So go and give it a try. Now some of you are already familiar with the amazing machine learning tools that we have outside in the community. And for that, last year we had released Core ML Tools, a Python package. And along with that, we had released a few converters.
Now there has been a lot of activity in this area over the last year. And this is how the picture looks now. So as you can see, there are many more converters out there, and you really do have a lot of choice of training framework now. And all of these converters are built on top of Core ML Tools.
Now, I do want to highlight a couple of different converters here. Last year, we collaborated with Google and released the TensorFlow converter. So that was exciting. As you know, TensorFlow is quite popular with researchers who try out new layers, so we recently added support for custom layers to the converter. TensorFlow also recently released support for quantization during training, and as you know, Core ML 2 supports quantization; this feature will be added to the converter soon.
Another exciting partnership we had was with Facebook and Prisma. And this resulted in the ONNX converter. The nice thing about ONNX is that now you have access to a bunch of different training libraries that can all be converted to Core ML using the new ONNX converter. So that was a quick wrap-up of the Core ML Tools ecosystem. Now to talk about quantization, I would like to invite my friend Sohaib on stage.
[ Applause ]
Good morning, everyone. My name is Sohaib. I'm an engineer on the Core ML team. And today we're going to be taking a look at the new quantization utilities in Core ML Tools 2.0. Core ML Tools 2.0 has support for the latest Core ML model format specification. It also has utilities which make it really easy for you to add flexible shapes and quantize your neural network machine learning models. Using these great new features in Core ML, you can not only reduce the size of your models but also reduce the number of models in your app, reducing the footprint of your app. Now let's start off by taking a look at quantization.
Core ML Tools supports post-training quantization. We start off with a Core ML neural network model which has 32-bit float weight parameters. And we use Core ML Tools to quantize the weights for this model. The resulting model is smaller in size. Now size reduction of the model is directly dependent on the number of bits we quantize our model to. Now, many of us may be wondering what exactly is quantization? And how can it reduce the size of my models? Let's step back and take a peek under the hood.
Neural networks are composed of layers. And these layers can be thought of as mathematical functions. And these mathematical functions have parameters called weights. And these weights are usually stored as 32-bit floats. Now, in our previous session, we took a look at ResNet50, a popular machine learning model which is used for image classification, among other things. Now this particular model has over 25 million weight parameters. So you can imagine, if we could somehow represent these parameters using fewer bits, we could drastically reduce the size of this model.
In fact, this process is called quantization. In quantization, we take the weights of our layers, which range from some minimum to some maximum value, and we map them to unsigned integers. For 8-bit quantization, we map these values to the range 0 to 255. For 7-bit quantization, we map them to 0 to 127, and so on all the way down to 1 bit, where we map these weights to either zeros or ones. Since we're using fewer bits to represent the same information, we reduce the size of our model.
Great. Now many of you may have noticed that we're mapping floats to integers. And you may have come to the conclusion that maybe there's some accuracy loss in this mapping. That's true. The rule of thumb is: the lower the number of bits you quantize your model to, the more of a hit the model takes in terms of accuracy. And we'll get back to that in a bit.
So that's an overview of quantization. But the question remains. How do we obtain this mapping? Well, there are many popular algorithms and techniques out there which help you to do this. And Core ML supports two of the most popular ones: linear quantization and lookup table quantization. Let's have a brief overview.
Linear quantization is an algorithm in which we map the float parameters linearly. The quantization is parametrized by a scale and a bias value, and these values are calculated based on the parameters of the layers that we're quantizing. Now, a really intuitive way to see how this mapping works is to take a step back and see how we would go back from our quantized weights, at the bottom, to our original float weights. In linear quantization, we would simply multiply our quantized weights by the scale parameter and add the bias.
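To make that concrete, here is a minimal NumPy sketch of the idea. The weight values are made up for illustration; this is not the Core ML Tools implementation, just the scale-and-bias mapping described above.

```python
import numpy as np

def linear_quantize(weights, nbits=8):
    """Map float weights onto unsigned integers in [0, 2**nbits - 1]."""
    levels = 2 ** nbits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels           # step between quantization levels
    bias = w_min                               # offset back into the original range
    quantized = np.round((weights - bias) / scale).astype(np.uint8)
    return quantized, scale, bias

def linear_dequantize(quantized, scale, bias):
    """Go back: multiply the quantized weights by the scale and add the bias."""
    return quantized.astype(np.float32) * scale + bias

weights = np.random.randn(1000).astype(np.float32)    # stand-in for one layer's weights
q, scale, bias = linear_quantize(weights, nbits=8)
approx = linear_dequantize(q, scale, bias)
print(np.abs(weights - approx).max())                  # error is at most about scale / 2
```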
The second quantization technique that Core ML supports is lookup table quantization. And this technique is exactly what it sounds like. We construct a lookup table. Now again it's helpful if we imagine how we would go back from our quantized weights back to our original weights. And in this case, the quantized weights are simply indices back into our lookup table. Now, if you notice, unlike linear quantization, we have the ability to move our quantized weights around. They don't have to be spaced out in a linear fashion.
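Again, as a rough illustration (the table values and indices below are made up), going back from lookup-table quantized weights is just an index into the table:

```python
import numpy as np

# A made-up 2-bit lookup table: four float centroids chosen to match the
# distribution of the weights.
lut = np.array([-0.73, -0.12, 0.05, 0.68], dtype=np.float32)

# The quantized weights are simply indices into the table ...
quantized_indices = np.array([0, 3, 1, 1, 2, 0], dtype=np.uint8)

# ... so dequantization is a table lookup. Unlike linear quantization, the
# reconstructed values do not have to be evenly spaced.
dequantized = lut[quantized_indices]
print(dequantized)
```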
So to recap, Core ML Tools supports linear quantization and lookup table quantization, where we start off with a full-precision neural network model and quantize its weights using the utilities. Now you may be wondering: great, I can reduce the size of my model, but how do I figure out the parameters for my quantization? If I'm doing linear quantization, how do I figure out my scale and bias? If I'm doing lookup table quantization, how do I construct my lookup table? I'm here to tell you that you don't have to worry about any of that. All you do is decide on the number of bits you want to quantize your model to and the algorithm you want to use, and let Core ML Tools do the rest. In fact --
[ Applause ]
In fact, it's so simple to take a Core ML neural network model and quantize it that we can do it in a few lines of Python code. But why stand here and talk about it when we can show you a demo? So for the purposes of this demo, I'm going to need a neural network in the Core ML model format. Now, as my colleague Aseem mentioned, a great place to find these models is the Apple machine learning home page, and I've gone ahead and downloaded one of the models from that page. This model is called SqueezeNet, so let's go ahead and open it up.
As we can see, this model is 5 megabytes in size. It has an input, which is an image of 227 by 227 pixels, and it has two outputs. One of the outputs is the class label, which is a string; this is the most likely label for the input image.
And the second output is a mapping of strings to probabilities: if we pass in an image, we get a list of probabilities of what that image may be. Now let's start quantizing this model. So the first thing I want to do is get into a Python environment. A Jupyter Notebook is one such environment that I'm comfortable with, so I'm going to go ahead and open that up.
Let's open up a new notebook and zoom in on that. Alright. So let's start off by importing Core ML Tools. Let's run that. Now the second thing I want to do is import all the new quantization utilities that we have in Core ML Tools. And we do that by running this. And now we need to load up the model which we want to quantize. We just saw the SqueezeNet model a minute ago, so we're going to go ahead and get an instance of that model.
It's saved on my desktop. Great. Now to quantize this model, we just need to make one simple API call. Let's try quantizing this model using linear quantization. The API is simply called quantize_weights. The first parameter we pass in is the original model which we just loaded, then the number of bits we want to quantize the model to, in this case 8 bits, and the quantization algorithm we want to use. Let's try linear quantization.
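For reference, the call looks roughly like this in coremltools 2.x; the model file path here is an assumption, not the exact path from the demo.

```python
import coremltools
from coremltools.models.neural_network.quantization_utils import quantize_weights

# Load the full-precision SqueezeNet model (the path is an assumption).
model = coremltools.models.MLModel('SqueezeNet.mlmodel')

# Quantize the weights to 8 bits using linear quantization.
quantized_model = quantize_weights(model, nbits=8, quantization_mode='linear')
```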
Now what's happening is that the utility is iterating over all of the layers of the neural network and quantizing all the weights in those layers. And we're finished. Now, if you recall, a few moments ago I mentioned that quantizing our model has an associated loss in accuracy. So we want to know how our quantized model stacks up against the original model.
And the easiest way of doing this is taking some data, running inference on that data using our original model, doing the same inference on the same data using our quantized model, and comparing the predictions from the two models to see how well they agree. Core ML Tools has utilities which help you do that. We can do that by making a call to compare_models: we pass in our full-precision model, and we pass in the model which we just quantized.
And because this model is a simple image classifier with only one image input, we have a convenience utility where we can just pass in a folder containing sample images. Now on my desktop here, I have a folder with a set of images which are relevant for my application. So I'm going to go ahead and pass a path to this folder as my sample data parameter.
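As a sketch, that comparison call looks like this; 'model' and 'quantized_model' come from the cells above, and the folder path is an assumption.

```python
from coremltools.models.neural_network.quantization_utils import compare_models

# Run both models over a folder of sample images and report how often their
# top predictions agree.
compare_models(model, quantized_model, 'sample_images/')
```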
Great. So now we see we're analyzing all the images in that folder. We're running inference using our full-precision model, we're running inference on our quantized model, and we're comparing the two sets of predictions. We seem to have finished that, and you can see our Top 1 Agreement is 94.8%.
Not bad. Now what does this Top 1 Agreement mean? It means that when I passed an image, of a dog for example, into my original model and it predicted that the image was a dog, my quantized model did the same. And that happened for 94.8% of the data set.
So I could go ahead and use this model in my app, but I want to see if other quantization techniques work better on this model. As I mentioned, Core ML supports two types of quantization techniques: linear quantization and lookup table quantization. So let's go ahead and try to quantize this model using lookup table quantization. Again, we pass in our original model, the number of bits we want to quantize our model to, and our quantization technique. Oops, made a typo there.
Let's go ahead and run this. Now, k-means is a simple clustering algorithm which approximates the distribution of our weights, and using this distribution, we can construct the lookup table for our weights. What we're doing over here is iterating over all the layers in the neural network and figuring out the lookup table for each particular layer.
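As a sketch, the lookup-table variant only changes the quantization mode; treat the exact mode string 'kmeans' as an assumption and check the coremltools version you are using.

```python
from coremltools.models.neural_network.quantization_utils import quantize_weights

# Lookup-table quantization driven by k-means clustering of the weights.
# 'model' is the full-precision SqueezeNet model loaded earlier.
lut_quantized_model = quantize_weights(model, nbits=8, quantization_mode='kmeans')
```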
Now, if you're an expert and you know your model architecture, and you know that k-means is not the algorithm for you, you have the flexibility of passing in your own custom function instead of this algorithm, and the utility will use your custom function to construct the lookup table.
So we've finished quantizing this model again, this time using the lookup table approach. Now let's see how well this model compares with our original model. So once again we call our compare_models API: we pass in our original model, we pass in our lookup table model, and again we pass in our sample data folder.
Again, we run inference over all the images using both the original model and the quantized model. And we see this time we're getting a little bit better Top 1 Agreement. Now for this model, we see that the lookup table was the right way to go. But again, this is model-dependent, and for other models, linear may be the way to go. So now that we're happy with this, and we see that this is good enough for at least my application, let's go ahead and save this model out. We do that by calling save. I'm going to give it the creative name of Quantized SqueezeNet.
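Roughly, the comparison and save steps look like this; the folder path and output file name are assumptions, and 'model' and 'lut_quantized_model' come from the earlier cells.

```python
from coremltools.models.neural_network.quantization_utils import compare_models

# Compare the lookup-table model against the original on the same sample
# images, then write the quantized model out for use in the app.
compare_models(model, lut_quantized_model, 'sample_images/')
lut_quantized_model.save('QuantizedSqueezeNet.mlmodel')
```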
And there we go. We have a quantized model. So this was our original model, and we saw that it was 5 megabytes in size. Let's open up our quantized model. And the first thing we notice right off the bat is that this model is only 1.3 megabytes in size.
[ Applause ]
So if you notice, all the details about our quantized model are the same as the original model. It still takes an image input, and it still has two outputs. Now, if I had an app using this model, what I could do, as we saw in the previous demo, is just drag this quantized model into our app and start using it instead. And just like that, we reduce the size of our app.
So that was quantization using Core ML Tools. To recap, we saw how easy it was to use Core ML Tools to quantize our model. Using a simple API, we provided our original model, the number of bits we wanted to quantize our model to, and the quantization algorithm we wanted to use. We also saw that Core ML Tools has utilities which help us compare our quantized model against our original model to see how it performs.
Now as we saw in the demo, there is a loss of accuracy associated with quantizing our model. And this loss of accuracy is highly model and data dependent. Some models perform better than others after quantization. As a general rule of thumb again, the lower the number of bits we quantize our model to, the more of a precision hit we take.
Now in the demo we saw that we were able to use Core ML Tools to compare our quantized model and the original model using a Top 1 Agreement metric. But you have to figure out what the relevant metric is for your model and your use case, and validate that your quantized model is acceptable. Now, in the previous session, we took a look at a style transfer demo. That network took in an input image, and its output was a stylized image. Let's take a look at how this model performs at different levels of quantization.
So on the top left here, your left, we see that the original model is 32 bits and is 6.7 megabytes in size. And our 8-bit linearly quantized model is only 1.7 megabytes in size. And we see that, by visual inspection, the performance is good enough for my style transfer demo.
Now we can see that even down to 4 bits, we don't lose much in the way of performance. I would even argue that for my app at least, the 3-bit model would work fine as well. But at 2 bits, we start to see a lot of artifacts, and this may not be the right model for us. And that was quantization using Core ML Tools. Now I'm going to hand it back to Aseem, who's going to talk about custom conversion. Thank you.
[ Applause ]
Thank you, Sohaib. So I want to talk about a feature that is essential to keep pace with the machine learning research that's happening around us. As you all know, the field of machine learning is expanding very rapidly. So it's very critical for us at Core ML to provide you with the necessary software tools to help with that.
Now let's take an example. Let's say you are experimenting with a new model that is not supported by Core ML. Or let's say you have a neural network that runs on Core ML, but maybe there's a layer or two that Core ML does not have yet. In that case, you should still be able to use the power of Core ML, right? And the answer to that question is yes.
And the feature of customization will help you there. In the next few minutes, I want to really focus on the specific use case of having a new neural network layer. And show you how you would convert it to Core ML and then how you would implement it in your app.
So let's take a look at model conversion. If you have used one of our converters, or even if you have not, it's a really simple API: it's just a call to one function. This is how it looks for the Keras converter, and it's very similar for, say, the ONNX converter or the TensorFlow converter. Now when you call this function, mostly everything goes right. But sometimes you might get an error message like this.
It might say, "Hey, unsupported operation of such-and-such kind." Now if that happens to you, you only need to do a little bit more to get past this error. More specifically, such an error message is an indication that you should be using a custom layer. And before I show you what is the little bit of extra effort that you need to do to convert, let's look at a few examples where you would need to use a custom layer.
So let's say you have an image classifier. This is how it looks in Xcode: a high-level description of the model. If you look inside, it's very likely a neural network, and very likely a convolutional neural network. So it has a lot of layers: convolution, activation, and so on. Now it might happen that there's a new activation layer that comes up that Core ML does not support.
And at every machine learning conference, researchers are coming up with new layers all the time. So this is a very common scenario. Now if this happens, you only need to use a custom implementation of this new layer, and then you are good to go. So this is how the model will look. The only difference is the dependency section at the bottom, which says that this model contains a description of this custom layer. Let's take a look at another example. Let's say we have a very simple digit classifier.
Now I came across this research paper recently. It's called Spatial Transformer Networks. And what it does is this: it inserts a neural network that tries to localize the digit, and then it feeds the result through a grid sampler layer, which renders the digit again, but this time focused on the digit.
And then you pass it through your old classifier. Now we don't need to worry about the details here. The point to note is that the portion in green is what Core ML supports, and the portion in red, this new grid sampler layer, is the experimental layer that Core ML does not support.
So I want to take this particular model as an example and show you how you would convert it using Core ML Tools. So let's go to the demo. I hope it works on the first try. Back, oh yes. Okay. So let me close these windows and clear this. Okay, so I'm also going to use a Jupyter Notebook to show the demo.
So I just navigate to the folder where I have my pre-trained network. What you see here is that I have this spatial transformer dot [inaudible] file. This is a pre-trained Keras model. And if you are wondering whether I did something special to get this model: basically, I could easily find an open source implementation of the spatial transformer, I just executed that script in Keras, and I got this model.
And along with this model, I also got this grid sampler layer Python script. Now this grid sampler layer that I'm talking about is also not supported in Keras natively, so the implementation that I got online used a Keras custom layer to implement it. So as you can see, the concept of customization is not unique to Core ML; in fact, it's very common in most machine learning frameworks.
This is how people experiment with new layers. Okay, so far I just have a Keras model, and now I want to focus on how I can get a Core ML model. So let me launch a new Python notebook. I'll start by importing this Keras model into my Python environment.
Okay? So I import Keras, I import the custom layer that we have in Keras, and now I will load the model in Keras. Okay? This is how you load Keras models: you give the path to the model, and if there's a custom layer, you give a reference to that as well.
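A sketch of that load step in Keras; the file name, the module name, and the custom layer class name are assumptions based on the open source implementation mentioned above.

```python
from keras.models import load_model
# GridSampler is the custom Keras layer class that came with the open source
# implementation (module and class names are assumptions).
from grid_sampler import GridSampler

# Keras needs to be told about the custom layer class when loading the model.
keras_model = load_model('spatial_transformer.h5',
                         custom_objects={'GridSampler': GridSampler})
```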
Okay. So we have the model now. Now let's convert this to Core ML. So I'm going to import Core ML Tools and execute that. And as I showed you before, this is just a call to one function to convert it. So let me do that.
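That first, plain conversion call looks roughly like this, using the Keras model loaded above. With the unsupported layer in the model, it is expected to fail.

```python
import coremltools

# A plain conversion call on the Keras model loaded above. With the
# unsupported grid sampler layer in the graph, this is expected to raise
# an "operation not supported" error.
coreml_model = coremltools.converters.keras.convert(keras_model)
```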
That's my call. And I get an error, as expected. Python likes to throw these huge error messages, but really what we're focused on is this last line. So as we can see, the last line says that the grid sampler layer is not supported.
So now let's see what we need to do to get rid of that. Maybe I'll clear this all so that you can see. Okay. So now I change my converter call just a little bit, so I can get my Core ML model, and I'm going to pass one additional argument. It's called custom_conversion_functions.
And this will be a dictionary from the name of the layer to a function that I will define in a minute, which I'm calling convert_grid_sampler. So let me take a step back and explain what is happening here. As we know, the way the converter works is that it goes through each and every Keras layer.
It looks at the first layer and translates its parameters to Core ML, then goes to the second layer and translates its parameters, and so on. Now when it hits this custom layer, it doesn't know what to do. So this function that I'm passing here, convert_grid_sampler, is going to help the converter do that. And let me show you what this function looks like.
So this is the function. There are a few lines of code, but all it's doing is three things. First, it gives the name of a class. As you might have noticed, the implementation of the layer is not here; the implementation will come later, in the app, and it will be encapsulated in a class.
And this is the name of the class that we'll implement later. During conversion, we just need to specify this class name. That's it. Then there's the description, which you should provide so that if somebody is looking at your model, they know what it contains.
And the third thing is translating any parameters that the Keras layer had to Core ML. This particular layer has two parameters, the output height and the output width, and I'm just translating them to Core ML. If your custom layer does not have any parameters, then you do not need to do this.
If your layer has lots of parameters, they can all go here, and they will all be encapsulated inside the Core ML model. So as you might have noticed, all I did here was very similar to how you would define a class, right? You give a class name, maybe a description, maybe some parameters. So now let me execute this.
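Putting it together, here is a sketch of the conversion function and the converter call with the extra argument. The class name, parameter names, and the Keras layer's attribute names are assumptions, not the exact demo code.

```python
import coremltools
from coremltools.proto import NeuralNetwork_pb2

def convert_grid_sampler(keras_layer):
    """Describe the custom layer to the converter: class name, description, parameters."""
    params = NeuralNetwork_pb2.CustomLayerParams()

    # Name of the class that will implement this layer later, in the app.
    params.className = 'GridSampler'
    params.description = 'Custom grid sampler layer for the spatial transformer'

    # Translate the Keras layer's parameters into the Core ML model.
    # (The attribute name 'output_size' on the Keras layer is an assumption.)
    params.parameters['output_height'].intValue = keras_layer.output_size[0]
    params.parameters['output_width'].intValue = keras_layer.output_size[1]
    return params

# 'keras_model' is the model loaded earlier with its custom layer.
coreml_model = coremltools.converters.keras.convert(
    keras_model,
    add_custom_layers=True,
    custom_conversion_functions={'GridSampler': convert_grid_sampler})
```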
And now we see that the conversion went fine. So let me, this is behaving very weirdly for some reason, if you don't mind, I'm just going to delete this all. So let me visualize this model, and you can do that very simply using a function in Core ML Tools called visualize_spec.
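As a sketch, assuming the visualize_spec method is available on the MLModel object as in coremltools 2.x:

```python
# 'coreml_model' is the converted model from the previous step.
coreml_model.visualize_spec()
```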
And here you can see a visualization of the model. As we can see, we have the input and some layers there, and this is our custom layer. If I click on it, I see the parameters that it has: this is the name of the class that I gave, and these are the parameters that I set.
It's always a good idea to visualize your Core ML model before you drag and drop it, just to see if everything looks fine. Okay. This is the wrong notebook. Okay. And now I'll save out this model. Now let's take a look at this model. So let me close this. Okay.
Actually, let me navigate to the directory that I have. And here's my model. If I click on it and see it in Xcode, just to see how it looks, we can see that it has the custom description here. Okay. Let me go back to the slides.
[ Applause ]
So what we just saw was that with a few simple lines, we could write a conversion function and convert the model to Core ML. And the process is pretty much the same if you are using the TensorFlow converter or the ONNX converter. So we have our model here on the left-hand side.
The custom layer model with the parameters. Now when you drag and drop this model into Xcode, you will need to provide the implementation of the class, in a file, say, [inaudible]. And this is how it would look. You have your class, and you'll have the initializer function, which just initializes any parameters that we had in the model. And then the main function in this class is evaluate.
This is where the actual implementation of whatever mathematical function the layer is supposed to perform goes. And then there's one more function, called output shapes for input shapes. This just specifies the size of the output arrays that the layer produces. This helps Core ML allocate buffers at load time so that your app is more efficient at runtime.
So we just saw how you would tackle a new layer in a neural network. There's a very similar concept to a custom layer called a custom model. It has the same idea, but it's more generic: with a custom model, you can deal with any sort of model; it need not be a neural network. It basically gives you more flexibility overall.
So let me summarize the session. We saw how much richer the ecosystem around Core ML Tools has become, and that's great for you, because now you have a lot of choices for getting Core ML models. We saw how easy it was to quantize a Core ML model. And we saw that with a few lines of code, we could easily integrate a new custom layer into the model. You can find more information on our documentation page. And come to the labs and talk to us. Okay, thank you.
[ Applause ]