System Frameworks • iOS, macOS, tvOS, watchOS • 31:24
Machine learning opens up opportunities for creating new and engaging experiences. Core ML is a new framework which you can use to easily integrate machine learning models into your app. See how Xcode and Core ML can help you make your app more intelligent with just a few lines of code.
Speakers: Gaurav Kapoor, Michael Siracusa, Lizi Ottens
Transcript
Wow. It's very exciting to be here. I'm Gaurav. And today we are going to discuss machine learning. Machine learning is a very powerful technology. And together with the power of our devices, you can create some really amazing experiences. For example, you can do things like real-time image recognition and content creation. In the next 40 minutes, we are going to tell you all about those experiences and how Apple is making it easy for you to integrate machine learning in your app. In particular, we will be focusing on Core ML, our machine learning framework.
At Apple, we are using machine learning extensively. In our Photos app, we use it for people recognition and scene recognition. In the keyboard, we use it for next-word prediction and smart responses. We even use it on the watch for smart responses and handwriting recognition. And I'm sure you also want to create similar experiences in your app. For example, some of you might like to do real-time image recognition. This way, you can provide your users with more contextual information about what they are seeing, for example whether they are eating a hotdog or not.
[ Laughter ]
We want to enable all of these great capabilities in your app. And hence, we are introducing machine learning frameworks just for you, to enable all of these awesome things. You can clap.
[ Applause ]
But before we clap, we should understand why we even need machine learning in the first place, and when you need it. So let's just say I'm a florist, and all I want to show is the images of roses in the user's photo library. This seems like a simple task, so I can do some programming. Perhaps I'll start with the color: if the dominant color in the picture is red, maybe it's a rose. The problem is that there are many roses that are white or yellow in color.
So I will go further and start describing the shape. And soon I will realize that it's very difficult to write even such a simple program programmatically. Hence, we turn to machine learning for help. Rather than describing how a rose looks programmatically, we will describe a rose empirically; we will learn that representation of a rose empirically. Machine learning actually has two steps. The first is training.
So what you will do in this case, offline, is collect images of roses, lilies, and sunflowers, and label them. You will run them through a learning algorithm, and you will get what we call a model. This model is an empirical representation of what a rose looks like.
Almost all of this happens offline, and there is a great deal of material on the Internet about training. Training is a complex subject. Now once you have the model, you want to use it. So you take a picture of a rose, embed the model in your app, run the picture through the model, and you get back a label and a confidence. This step is known as inference. Training is very complex, but inference is also getting very, very challenging nowadays.
For example, if you want to describe a new deep neural network programmatically in your code, you have to write thousands of lines of code just to describe the network. It can be extremely tedious, and perhaps your biggest challenge will be proving the correctness of such a program. Once you make sure that the program is correct, you have to make sure that you're getting the best performance possible.
Finally, you also have to make sure that your implementation is energy efficient. These tasks can be very, very challenging and can bog you down completely. At Apple, we have been using on-device machine learning for several years now. We have faced these challenges and solved them.
And we really don't want you to have to solve them; we will solve them for you. We want you to focus only on the experience, and we will give you the best implementation possible. Hence, we are introducing machine learning frameworks in which we take care of all the challenges, and you take care of the experience in your app. In doing so, we are following a layered approach. At the top is your app.
We are introducing new domain-specific frameworks, in particular Vision. The Vision framework allows you to do all the tasks related to computer vision. The best part about the Vision framework is that you don't have to be a computer vision expert to use it.
We are also enhancing our NLP APIs to include more languages and more functionality. And we are introducing a brand-new framework called Core ML. This framework has exhaustive support for machine learning algorithms, both deep learning and standard machine learning. Finally, all of these frameworks are built on top of Accelerate and Metal Performance Shaders.
To give you a sneak peek, we have the Vision framework. Vision is our one-stop shop for all things related to computer vision and images. You can use it to do things like object tracking or more modern, deep learning-based face detection. There is a lot more to the Vision framework; we are going to discuss Vision in detail tomorrow afternoon.
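To give a rough sense of what that looks like in code, here is a minimal sketch of Vision's deep learning-based face detection; the function name and the UIImage input are illustrative choices, not something shown in the session.

```swift
import UIKit
import Vision

// A minimal sketch of face detection with Vision; `photo` is any UIImage.
func detectFaces(in photo: UIImage) {
    guard let cgImage = photo.cgImage else { return }

    // VNDetectFaceRectanglesRequest is one of Vision's built-in requests.
    let request = VNDetectFaceRectanglesRequest { request, error in
        let faces = request.results as? [VNFaceObservation] ?? []
        print("Found \(faces.count) face(s)")
    }

    // The request handler runs one or more requests against a single image.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```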
NLP is our one-stop shop for text processing. You can use it to do things like language identification, and we are also releasing Named Entity Recognition APIs. These APIs can tell you whether your text contains the name of a place, a person, or an organization. We believe this will be very powerful for many app developers, so please check out the NLP session tomorrow morning.
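As a rough illustration of the NLP APIs mentioned here, the following sketch uses NSLinguisticTagger for language identification and named entity recognition; the sample sentence is made up for the example.

```swift
import Foundation

let text = "Tim Cook introduced Core ML at WWDC in San Jose."
let tagger = NSLinguisticTagger(tagSchemes: [.language, .nameType], options: 0)
tagger.string = text

// Language identification
print(tagger.dominantLanguage ?? "unknown")   // "en"

// Named Entity Recognition: people, places, and organizations
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let range = NSRange(location: 0, length: text.utf16.count)
tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, _ in
    if let tag = tag, [.personalName, .placeName, .organizationName].contains(tag) {
        let name = (text as NSString).substring(with: tokenRange)
        print("\(name): \(tag.rawValue)")
    }
}
```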
Core ML. Until now, we were discussing domain-specific frameworks. Core ML is our machine learning framework and is domain agnostic. It supports multiple kinds of inputs, such as images, text, dictionaries, and raw numbers, so you can do things like music tagging. Another interesting part of Core ML is that it can deal with mixed input/output types, so you can take an image in and get text out.
The best part about all of these frameworks is that they can work together. So you take some text, pass it through NLP, take the output, pass it through Core ML, and do things like sentiment analysis. We are going to discuss exactly how you do that in our in-depth Core ML session on Thursday.
All of these frameworks are powered by Accelerate and MPS; these are our engines. You can use Accelerate and MPS whenever you need some kind of math functionality; it doesn't necessarily have to be related to machine learning. You can also use Accelerate and MPS when you are building custom ML models, so if you are building your own machine learning model, you might like to use them.
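For a small taste of the kind of math functionality Accelerate provides outside of any model, here is a minimal sketch computing a dot product with vDSP; the example vectors are arbitrary.

```swift
import Accelerate

// A minimal sketch: the dot product of two Float vectors using vDSP.
let a: [Float] = [1, 2, 3, 4]
let b: [Float] = [0.5, 0.5, 0.5, 0.5]

var dot: Float = 0
vDSP_dotpr(a, 1, b, 1, &dot, vDSP_Length(a.count))
print(dot) // 5.0
```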
All of these APIs run on the user's device, and they are highly performant. We work very hard to make sure you get the best performance possible. Now, because they run on the user's device, there are advantages. First, user privacy. Your users will be happy to know that you're not sending their personal messages, texts, and images to a server.
Your users don't have to pay extra data costs just to send this data to a server to get a prediction. And you don't have to pay huge amounts of money to server companies just to set up servers to serve predictions. Our devices are very, very powerful. You can do a lot here.
And finally, your app is always available. Let's just say your user goes to Yosemite, into the wilderness where there is no network connectivity. They can still use your app, and this is very powerful. However, perhaps one of the best things you can do is real-time machine learning. Our devices are very powerful; they can run modern deep neural networks such as ResNet-50. You can use our devices to do real-time image recognition. In these cases, you really don't have the latency budget for a round trip to a server.
[ Applause ]
So for anything latency sensitive, you might like to do real-time image recognition on device. So just to recap, we are releasing a series of machine learning frameworks. If your app is a vision-based app, please use the Vision framework. If your app is a text-based app, please use NLP. Now, if you think that NLP and the Vision framework are not providing the APIs you need, go down to Core ML.
Core ML provides exhaustive support for standard machine learning and deep learning algorithms. And if you are implementing your own machine learning algorithms, then you should use Accelerate and MPS. We are going to discuss all of these frameworks in detail over the next couple of days, but today let's just focus on Core ML. And to do that, I'll invite my friend and colleague, Michael.
Thanks, Gaurav. Hi everyone. I'm so excited to talk about Core ML and its role in helping you make a great app. We're going to start out with a high-level overview of Core ML's main goals. We'll then talk a little bit about models and how they're represented. And then we'll walk you through the typical development process using Core ML.
So the Core ML framework is available on macOS, iOS, watchOS, and tvOS. But Core ML is more than just a framework and a set of APIs. It's really a set of development tools all designed around making it as easy as possible to take a machine learned model and integrate it into your app. This will let you focus on the user experience you're trying to enable rather than the implementation details.
Core ML is simple to use and will give you the performance you need, while being compatible with a wide variety of machine learning tools out there. Its simplicity comes from having a unified inference API across all model types. Xcode integration will let you interact with machine-learned models using the same software development practices you're already experts in.
It uses the inference engines Apple is already shipping to millions of customers in its apps and system services, now being made available to you via Core ML. And as mentioned, these are built on top of Metal and Accelerate to make the best use of the hardware your apps are being deployed on.
Core ML is also designed to work in this rapidly evolving machine learning ecosystem. It defines a new public format for describing models as well as a set of tools that will allow you to convert the output of popular training libraries into this format. So that's Core ML in a nutshell. It's all about making it super easy to get a learned model integrated into your app. So let's talk a little bit more about these models.
To Core ML, a model is simply a function. Now, the logic for this function happens to be learned from data, but just like any other function, it takes in a set of inputs and produces a set of outputs. In the case shown on the slide, we have a single input, an image, and an output, perhaps telling you what type of flower is present in it.
Now, many of the use cases in which you may want to apply a machine learning model have some key function at their core. A common type of function performs some sort of classification: it takes a set of inputs and assigns a categorical label. Take, for example, sentiment analysis. Here, you may take in an English sentence and output whether the sentiment was positive or negative, here represented by an emoji. Now, in order to encode functions of this type, Core ML supports a wide variety of models.
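To make the "model as a function" idea concrete, here is a purely hypothetical Swift sketch of what such a classifier's interface might look like; SentimentModel and SentimentOutput are invented names for illustration, not Core ML types.

```swift
// Purely illustrative: a learned model behaves like an ordinary function.
// These types are hypothetical and are not part of Core ML.
struct SentimentOutput {
    let label: String                   // e.g. "positive" or "negative"
    let probabilities: [String: Double] // confidence per label
}

protocol SentimentModel {
    // Input: an English sentence. Output: a categorical label with confidences.
    func prediction(text: String) throws -> SentimentOutput
}
```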
Core ML has extensive support for neural networks, both feedforward and recurrent, with over 30 different layer types. It also supports tree ensembles, support vector machines, and generalized linear models. Now, each of these model types is actually a large family of models, and we could spend the rest of this session and the rest of this conference just talking about any one of them, but don't let that intimidate you.
You do not need to be a machine learning expert to make use of these models. You continue to focus on the use case you're trying to enable and let Core ML handle those low-level details. Core ML represents a model in a single document. That is, sharing a model is just like sharing a file.
This document has the high-level information you, as a developer, need to program against. That is, the functional description: its inputs, their types, and its outputs, as well as the details that Core ML needs to actually execute the function. For a neural network, this may be the structure of the network along with its trained parameters.
We encourage you to come to our Thursday session to learn more about this format and the set of models it can encode. But maybe now in the back of your head you're starting to think well, where do I get these models? Where do models come from? Well, a great place to start is to visit our machine learning landing page on developer.apple.com.
There you'll find a small collection of task-specific, ready-to-use models already in the Core ML format. We'll be expanding the set over time, and we encourage you to explore. Give it a try. This is a great way to get introduced to machine learning in general, to Core ML, and to how it can be used in your app.
But we may not have all the models you need. And for your application, you may need to make some customization to an existing model or make a whole new model from scratch. And for that, we encourage you and your colleagues to tap into the machine learning community. There is a thriving community out there.
There are tons of libraries and models already out there for you to start playing with. In addition, there are wonderful online courses, and new tutorials are popping up every week. You can do it. There are great resources to learn from, and I'm confident that all of you can get started. It's a wonderful time.
But you'll see that most of these libraries are focused around training. That is a very important step, and they spend a lot of time making sure that you can train a great model. What we want to do is let Core ML take you that last mile, from that machine-learned model to actually deploying it in your app.
So we're doing this by introducing the Core ML Tools Python package. This is a Python package that is centered completely around taking the output of these machine learning libraries and getting them into the Core ML format. We'll be expanding these tools over time, but they're also being made Open Source.
[ Applause ]
So what this means is that if you don't find the converter you need there, we're confident you can write your own. All of the primitives are there, ready for you to use. Now, I encourage you, again, to visit us on Thursday to learn more about this Python package and how you can easily convert models. OK. Now we have this model. Let's talk about how we use it in our apps. So you're going to start with the machine-learned model in this Core ML format.
You're going to simply add it to your Xcode project, and Xcode will automatically recognize this model as a machine-learned model and generate an interface for you. That gives you a programmatic interface to the model using data types and structures you're already familiar with programming against. You'll program using this interface and compile it into your app.
But in addition, the model itself will also get compiled and bundled into your app. That is, we'll take this format that's been optimized for openness and compatibility and turn it into one that is optimized for run time on the device. To show you this in action, I'm going to invite my friend and colleague, Lizi, to the stage.
[ Applause ]
Thanks, Michael. Hi, I'm Lizi, and I'm an engineer on the Core ML team. Today, let's take a look at how you can use Core ML to integrate machine learning into your applications. Now, first, some context. As much as I love machine learning, I also love to garden. And if you're anything like me, you may get really excited whenever you come across a new kind of flower and instantly want to identify it.
So we set out to build an app that would take images of different types of flowers and classify them by type. But first, in order to do that, we needed a model. So we went ahead and collected many different images of different types of flowers, and we labeled each one.
I then trained a neural network classifier using an open source machine learning tool and converted it into our Core ML format. So, with this trained model and the shell of this flower classifier app, let's take a look at how we can put the two together. Over in Xcode, you can see I'm currently running the flower classifier app and mirroring the display of my device. I can go into the photos library and choose an image, but right now it doesn't predict anything for the output; it just displays this string.
So if I move over to the ViewController, what we see is it has this caption method that currently just returns that blank string. It doesn't actually do anything. Over here, I have this flower classifier in the ML model format. And all I need to do is take it and drag it over into the project.
And we can see, as soon as I add it, Xcode automatically recognizes this file format. And it populates the name of the model as well as what type it is underneath. We can see that the size is 41 megabytes and also other information like the author, license, or description associated with it.
At the bottom, we can see the inputs and outputs this model takes. The first is flowerImage, an image with red, green, and blue channels and the given width and height. We can also see that it outputs a flower type as a string; so if we give it an image of a rose, I would expect flowerType to be rose. There's also a dictionary of probabilities for the different types of flowers.
Now, the first thing I want to do is make sure that I add this to my app. And you'll see that afterwards this middle pane shows that the generated Swift source has arrived in the application. What that means is we can actually use this generated code to load the model and predict against it. To show you, let's go to the ViewController.
The first thing I want to do is define this flower model. And to instantiate it, all I need to do is use the name of the model class itself. We'll see that it's highlighted because it exists in the project. Now, next, in the caption method, let's define a prediction.
And we'll actually use the flower model and look inside. We'll see using auto complete that this prediction method is available that takes a CVPixelBuffer as input. We'll use that and pass the image directly to it. Now, instead of returning this preset string, we can use the prediction object and instead return the flower type.
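Put together, the demo's changes to the ViewController amount to roughly the sketch below; it assumes Xcode generated a class named FlowerClassifier with a flowerImage input and a flowerType output, and that the caption method receives a CVPixelBuffer.

```swift
import UIKit
import CoreML

// A sketch of the three lines from the demo; FlowerClassifier is the class
// Xcode generates from the dragged-in model (names assumed from the demo).
class ViewController: UIViewController {

    // 1. Instantiate the generated model class by name.
    let flowerModel = FlowerClassifier()

    // 2. Run a prediction on the image and 3. return the predicted label.
    func caption(for image: CVPixelBuffer) -> String {
        guard let prediction = try? flowerModel.prediction(flowerImage: image) else {
            return "Could not classify"
        }
        return prediction.flowerType
    }
}
```

The call site stays the same if you later swap in a different generated model class, which is exactly what the FlowerSqueeze step later in the demo shows.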
Now if I save, let's build and run the app. And what's happening while I do this is the model itself is actually getting compiled and bundled with the application. So if we go back to the device, if I go to my photos library, I can choose a rose on the second row. And we can see it actually is able to display rose.
[ Applause ]
We give it another one, maybe the sunflower in the third row. It's also able to get this one. Or even something a little bit more difficult, like the passion flower at the bottom, and it can still classify that one. So to recap, we were just able to drag a trained ML model into Xcode and easily add it into our application with three lines of code.
And now we have a full neural network classifier running on the device. But I have some other models that we might want to try so let's go through this again. You may have noticed there's also this FlowerSqueeze. And the first thing I'll do is try to drag it in again.
And then I'll also add this one to the flower classifier's target, Hello Flowers, so that it starts generating the Swift source in the background. And we'll see, again, if we zoom in, that the name of this model is different. It's called FlowerSqueeze, but at the bottom the inputs and outputs are exactly the same.
What this means is that if I go back into the ViewController -- actually, let's first delete the flower classifier; we don't want that model anymore. And in the ViewController, instead of instantiating that one, we can use FlowerSqueeze. And the really neat thing about this model is its size: if we go back, this one is only 5.4 megabytes, which is a huge difference.
Back in the device, we can go to the photos library and choose another one like love in the mist and see that this app is also able to classify this as well. Try another passion flower or even a more difficult photo of a rose on the side and it's still able to predict correctly. So to recap, we just saw that we're able to run this neural network classifier on device using Core ML, all with only a couple lines of code.
[ Applause ]
Now let's go back and recap what we just saw. We saw that in Xcode, as soon as you drag an ML model that's been trained into your application, you get this view that's populated that gives you information such as the name of the model, the type that it is underneath, and also other information like the size or any other information the author put in.
You can also see in the bottom that you get information such as the input and output of the model which is very helpful if you're actually trying to use it. And once you add it to your app target, you get generated code that you can use to load and use the model to predict.
We also saw in our application how simple it was to actually use it. Again, even though the model was a neural network classifier, all we needed to do to instantiate it was call the class named after the file. This means that the model type is completely abstracted, so whether it's a support vector machine, a linear model, or a neural network, you load them all the exact same way. We also noticed that the prediction method took a CVPixelBuffer. It was strongly typed, so you'll be able to catch input errors at compile time rather than at runtime.
But now let's take a deeper look at the generated source that's being created by Xcode. Here, we see it defines three classes, first of which is the input that defines the flower image as a CVPixelBuffer. We then see the output with the same two types that were listed in the generated interface. And next, in the model class itself, we see the convenience initializer that we use to actually create the model and the predict method as well.
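As a rough sketch of what those three generated classes might look like (the class names, the probability output's name, and the exact signatures are inferred from the demo rather than copied from Xcode's actual output):

```swift
import Foundation
import CoreML
import CoreVideo

// Input class: wraps the flowerImage feature as an MLFeatureProvider.
class FlowerClassifierInput: MLFeatureProvider {
    let flowerImage: CVPixelBuffer
    init(flowerImage: CVPixelBuffer) { self.flowerImage = flowerImage }

    var featureNames: Set<String> { return ["flowerImage"] }
    func featureValue(for featureName: String) -> MLFeatureValue? {
        return featureName == "flowerImage" ? MLFeatureValue(pixelBuffer: flowerImage) : nil
    }
}

// Output class: the top label plus per-label probabilities
// ("flowerTypeProbs" is a hypothetical name for the dictionary output).
class FlowerClassifierOutput {
    let flowerType: String
    let flowerTypeProbs: [String: Double]
    init(flowerType: String, flowerTypeProbs: [String: Double]) {
        self.flowerType = flowerType
        self.flowerTypeProbs = flowerTypeProbs
    }
}

// Model class: a convenience initializer plus a strongly typed predict method.
class FlowerClassifier {
    let model: MLModel

    convenience init() {
        // Loads the compiled model (.mlmodelc) that Xcode bundled into the app.
        let url = Bundle.main.url(forResource: "FlowerClassifier", withExtension: "mlmodelc")!
        try! self.init(contentsOf: url)
    }

    init(contentsOf url: URL) throws {
        self.model = try MLModel(contentsOf: url)
    }

    func prediction(flowerImage: CVPixelBuffer) throws -> FlowerClassifierOutput {
        let features = try model.prediction(from: FlowerClassifierInput(flowerImage: flowerImage))
        let label = features.featureValue(for: "flowerType")?.stringValue ?? ""
        let probs = features.featureValue(for: "flowerTypeProbs")?.dictionaryValue as? [String: Double] ?? [:]
        return FlowerClassifierOutput(flowerType: label, flowerTypeProbs: probs)
    }
}
```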
But also, inside this generated source, you have access to the underlying model, so for more advanced use cases you can programmatically interact with it as well. If we take a look inside the MLModel class, we can see that there's a model description, which gives you access to everything you saw in the metadata in Xcode.
There is also an additional prediction method that takes a protocol-conforming input and returns a protocol-conforming output, giving you more flexibility in how the input is provided. So we walked through the top half of this development flow and saw what happens when you drag a trained model into Xcode. It generates Swift source that you can then use to load and use the model in your application. The model itself, as we saw, is also compiled and bundled into your app.
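Here is a small sketch of that lower-level path using the underlying MLModel directly; the feature names and the generated FlowerClassifier class are assumptions carried over from the demo.

```swift
import CoreML
import CoreVideo

// Grab the underlying MLModel from the generated class (see the sketch above).
let underlyingModel: MLModel = FlowerClassifier().model

// The model description exposes the same information shown in Xcode's model view.
let description = underlyingModel.modelDescription
print(description.inputDescriptionsByName)
print(description.outputDescriptionsByName)
print(description.metadata[.author] ?? "unknown author")

// The protocol-based prediction path: provide features via an MLFeatureProvider.
func classify(pixelBuffer: CVPixelBuffer) throws -> String? {
    let input = try MLDictionaryFeatureProvider(
        dictionary: ["flowerImage": MLFeatureValue(pixelBuffer: pixelBuffer)])
    let output = try underlyingModel.prediction(from: input)
    return output.featureValue(for: "flowerType")?.stringValue
}
```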
To elaborate on the compile step a bit more: we first take the model from its protobuf form, which is optimized for portability and ease of conversion, and compile it so that it loads quickly on your device when you initialize it. We also make sure that the model is matched to the underlying inference engines that ultimately evaluate it on your device, giving you optimized prediction when you need it.
And the last thing we saw in the demo was we swapped out models. Mostly just to reduce the size but there are other reasons why you might want to play around with different ones. For example, accuracy or just to test out different types to see how they perform.
In summary, today we saw an overview of some of the machine learning frameworks that we have available. We also introduced Core ML, a new framework for on-device inference. And also we showed you the development flow in action. For more information, make sure you check us out at developer.apple.com with our session number 703.
And also, come visit some of the related sessions, such as Vision, NLP, and the in-depth Core ML session on Thursday, if you want to see more use cases for Core ML or learn how to convert models into the Core ML format. Thank you, and enjoy the rest of the conference.
[ Applause ]