WWDC20 • Session 10657

Make apps smarter with Natural Language

Frameworks • iOS, macOS • 41:03

Explore how you can leverage the Natural Language framework to better analyze and understand text. Learn how to draw meaning from text using the framework's built-in word and sentence embeddings, and how to create your own custom embeddings for specific needs. We’ll show you how to use samples to train a custom text classifier or word tagger to extract important pieces of information out of text— all powered by the transfer learning algorithms in Natural Language. Find out how you can create apps that can answer user questions, recognize similarities in text, and find relevant documents, images, and more. To get the most out of this session, you should have a basic understanding of the Natural Language framework. For an overview, watch “Introducing Natural Language Framework” and “Advances in Natural Language Framework.” You can also brush up on model training using Create ML through “Introducing the Create ML App.”

Speakers: Vivek Kumar Rangarajan Sridhar, Doug Davidson

Transcript

Hello and welcome to WWDC. Hello, everyone. Welcome to our session on natural language processing. The goal of this session is to help you make your apps smarter by using the power of NLP in the Natural Language framework. I'm Vivek, and I'll be jointly presenting this session with my colleague Doug Davidson. So let's get started. Let's begin with the central notion of language. Language is a core system that helps us humans solve difficult problems through communication, and it also provides us with a very unique type of social interaction.

If you think of how we communicate using language, language is an intermediate representation that helps us translate concepts into symbols, which can then be expressed in the form of words, phrases or sentences with some grammatical structure. The medium of expression can be through speech, perhaps through writing on a keyboard or Apple Pencil. It can even be an image or a video that you capture using your camera.

Now, language also has this remarkable property that not only helps us translate concepts into symbols, but it also helps us assimilate content into concepts. In the last few years, as we have moved from human intelligence into machine intelligence, this central notion of language has been replaced by NLP.

NLP has now become the intermediate representation that helps machines translate concepts into symbols, and also assimilate content into concepts. Now, what does on-device NLP at Apple look like? Until 2017, the primary way NLP was exposed at Apple was through the NSLinguisticTagger class in Foundation. This provides fundamental text processing such as language identification, tokenization, and so on.

In 2018, we introduced the Natural Language framework. The Natural Language framework provides everything that NSLinguisticTagger can, and on top of it, we started focusing on state-of-the-art machine-learning and modern NLP techniques such as text embeddings and custom models. Not only that, we also started tightly integrating Natural Language framework with the rest of the machine-learning ecosystem at Apple through tight integration with Create ML and Core ML.

Now, before we jump into the rest of the session, we'd like to tell you that NSLinguisticTagger has been marked for deprecation. We strongly encourage you to move towards Natural Language for all your language-processing needs. Now, if you look at the kinds of functionalities provided in the Natural Language framework, they can be broadly broken down into three different categories. The first is in the area of fundamental text processing. The second is in the realm of text embeddings. And the third is in the area of custom models.

So, let's begin with fundamental text processing. The Natural Language framework provides several basic fundamental building blocks, such as language identification, tokenization, part-of-speech tagging, lemmatization, and named entity recognition. And we provide these APIs across a wide variety of languages. For more information about these APIs, you can refer to our 2018 and 2019 WWDC sessions. But at a high level, all of these APIs operate on a piece of text, and what they give as an output is a hypothesis or a prediction.

However, they did not give us any notion of confidence associated with that prediction. This year, we have a brand-new API called confidence scores. It builds on top of the existing functionality, and in addition to the hypothesis, or the predicted labels, you can also get the confidence scores from the APIs.

Let's see how we can use this. We start off by creating an instance of NLTagger and specify the tagScheme to be .nameType. This is something you're already familiar with. Now we have a brand-new API called tagHypotheses. So, when you use tagHypotheses, in addition to getting the predictions either at the sentence level or the token level, you also get a confidence score associated with that prediction. Let's look at how to use these confidence scores through the lens of a hypothetical app called Buzz. Buzz is a news reader app. As part of this application, you can browse articles, you can bookmark them, and you can organize them so that you can read them later.

And what we would like to do is add a new feature to this application wherein we extract recent entities from the articles that you've read. So, we want to populate these entities on the right-side pane, and when you click on an entity, you can be taken back to the article that you've already read.

So, how do we do this? We want to use our named entity recognition API to automatically analyze this text and extract named entities such as Cartagena, and so on and so forth. Now, if you take a close look at the entities on the right side, you'll see there is a spurious entry.

We have something called, "Do Not Disturb While Driving." So, while the named entity recognition API gives us person names, organization names, as well as location names, this seems like a false positive from this machine-learning API. So, how do we fix this? Suppose we had an input sentence such as, "He was driving with Do Not Disturb While Driving turned on." When we pass this sentence through the named entity recognition API, what it does is it analyzes the sequence of tokens and produces a span of tokens as an organization name.

Now, this hypothesis from the machine-learning model is incorrect. But armed with the power of confidence scores, you can also get the confidence scores for each of these labels. As you can see, the confidence score is pretty low. By setting a threshold of, for instance, 0.8 for organization names, you can easily filter out this false positive.
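In code, that flow might look roughly like this minimal sketch. The sentence, the 0.8 threshold, and the filtering logic are illustrative, not values from the session; calibrate real thresholds on representative data, as discussed next.

```swift
import NaturalLanguage

let text = "He was driving with Do Not Disturb While Driving turned on."

// Named entity recognition, as before.
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

// Illustrative threshold; calibrate on data representative of your app.
let organizationThreshold = 0.8

tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                     unit: .word,
                     scheme: .nameType,
                     options: [.omitWhitespace, .omitPunctuation, .joinNames]) { tag, range in
    guard tag == .organizationName else { return true }

    // New API: hypotheses plus a confidence score for each.
    let (hypotheses, _) = tagger.tagHypotheses(at: range.lowerBound,
                                               unit: .word,
                                               scheme: .nameType,
                                               maximumCount: 1)
    if let confidence = hypotheses[NLTag.organizationName.rawValue],
       confidence >= organizationThreshold {
        print("Organization: \(text[range]), confidence \(confidence)")
    }
    return true
}
```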

With this, if you now go back to the app and incorporate this in your application, you can easily filter out the false positive, and you have a much better and enhanced user experience. We do have a few recommendations in terms of best practices. First, we'd like to recommend that you avoid heuristic hard coding of these threshold values and calibrate it on representative data that is pertinent to your app and domain of operation.

We'd also recommend that you consider creating thresholds on a per-class basis. Rather than setting a global threshold for all the classes in a particular application, you can set it on a per-class basis so that you get finer control of false positives versus false negatives in your app. Now let's move on and shift our attention to text embeddings.

Text embeddings are really important. In fact, they have been the cornerstone of recent advances in modern NLP. To really understand text embeddings, let's begin with the notion of a text corpus. What is a text corpus? A text corpus is a collection of documents which are comprised of paragraphs, sentences, phrases, and words, and in conventional NLP, when we start with the text corpus, the first thing that we do is to tokenize this corpus.

When we tokenize a text corpus, what we get is an inventory of words in this corpus. And this inventory of words can be thought of as a bag-of-words representation where each word is independent. Now, if you were to look at this from a machine representation standpoint, it is also called one-hot encoding.

So, in this example, we have a bunch of words: food, burger, pizza, automobile, bus, and car. And we've gone over the corpus and extracted these words. Now, each word here is represented by a bit vector which has one bit on and the rest of the bits off. And the length of this vector is the same as the number of unique words in your corpus.

Now, as humans, we can see that food, burger, and pizza are related concepts, and similarly, automobile, car, and bus are also related concepts. However, if you just look at this bit vector representation, it doesn't provide any information about the similarity or dissimilarity of words. So, wouldn't it be great if we had a representation that also incorporated the information about similarities of words? And this is really where word embeddings come in.

When you use word embeddings, again, you start with a text corpus, and what you get as an output is a vector representation of words, wherein words that are similar are clustered together, and words that are dissimilar are clustered away. So, in this example, you can see that burger, pizza, and food are clustered together, and away from the concepts of automobile, car, and bus.

Now, to obtain these word embeddings, you can use different sorts of machine-learning algorithms which could be linear models or nonlinear models, but at a high level, they capture this vector representation by analyzing global co-occurrences of words in the text corpus. If you consider this and look at it from a machine representation standpoint, now the representation is different from one-hot encoding.

Each word gets a real-valued vector of D dimensions, or you can think of it as D columns. And now, if you look at the vector for food, burger, and pizza, they're close to each other in the vector space. Similarly, the vectors for automobile, car and bus are also close to each other, but far away from the food concepts.

Now that we've understood word embeddings, let's look at the different types of word embeddings. The first is called static word embeddings. Let's understand this concept. Suppose we had an input sentence, "I want a burger from a fast food joint." And we want to extract the word embedding for the word "food".

For the case of static embeddings, what we do is, for all the words in the vocabulary, we pre-compute the embeddings and store it as a lookup table. Now, this lookup table is pre-computed and stored on-device in an efficient manner. So, when we need to look up the word embedding for a particular word such as "food", we simply go into this lookup table, pick the corresponding vector, and give it as the output.

Now, static word embeddings are really useful. They are very useful to get the nearest neighbors of words in the vector space, and they're also very useful as inputs to neural network algorithms. But they do have some shortcomings. Suppose we had an input sentence such as, "It is food for thought," where the word "food" carries a different connotation based on the context.

What happens with static word embeddings is that you still go through the lookup table and extract the same vector for the word "food", even though the context is different. So, even though the semantic connotation of the word "food" is different because of the context in which it's used, we still get the same vector representation.

So, can we do better? And this is where dynamic word embeddings come into the picture. So, in dynamic word embeddings, what we do is we pass every sentence through a neural network, and what we get is a dynamic embedding for every word in that sequence which is completely contextual.

So, if we pass these two sentences through dynamic word embeddings, which can be a neural network such as a transformer network or an ELMo-style model, what we get as an output is one vector for each word that is different based on the context. So, the word "food" now gets completely different vector representations because the context of food in these two sentences is different. Now, in the OS, we support static embeddings in a variety of languages and across different Apple platforms. For more information about static word embeddings, you can refer to our 2019 WWDC session.

In addition to static word embeddings, we also support what we call custom word embeddings, wherein you can train your own embeddings using a third-party toolkit such as fastText, word2vec, GloVe, or perhaps even a custom neural network in TensorFlow or PyTorch. Once you do this, you can bring these embeddings onto Apple platforms, compress them, store them, and use them in an efficient way.

Once you convert them to a representation on-device, you can use them just the same way as static word embeddings. Now, let's look at how to use word embeddings. You create an instance of NLEmbedding.wordEmbedding, specify the language, and once you have this, you can perform three different operations.

The first is, given a word, you can get the vector representation of the word. The second is, given two words, you can get the distance between these two words in the vector space. And the third is, given a word, you can find the nearest neighbors of the word in the vector space.
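A rough sketch of those three operations (the words are chosen for illustration):

```swift
import NaturalLanguage

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    // 1. Vector representation of a word.
    let vector = embedding.vector(for: "food")

    // 2. Distance between two words in the vector space.
    let distance = embedding.distance(between: "food", and: "pizza", distanceType: .cosine)

    // 3. Nearest neighbors of a word in the vector space.
    let neighbors = embedding.neighbors(for: "food", maximumCount: 5, distanceType: .cosine)

    print(vector?.count ?? 0, distance, neighbors)
}
```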

Let's look at the use of word embeddings through a hypothetical app called Nosh. Nosh is a food delivery app, and as part of this application, we have an FAQ section. Now, the user experience in this app, especially in the FAQ section, is not great. So, if I were to find some information, I have to scroll through all these questions and look for the question that I'm interested in, and then for the corresponding answer. So we want to improve this user experience in the Nosh app by adding an automatic search feature, so that you can type, or you can speak the query, and we can pick the corresponding question and show you the relevant answer.

How do we build this using word embeddings? So, one way to build this is using static word embeddings. Let's say you have an input query, "Do you deliver to Cupertino?" When you pass it through the word embeddings API, you can enumerate every word and get one vector representation for each word in the sequence. Once you do that, a heuristic way of getting a sentence representation is to simply take the average of the vectors of every word. And what you'd get as an output is one vector of the same dimension.

Now, you can also pre-compute the word embeddings for every single FAQ question in your database. So you would take every question, run it through word embeddings, get the vectors, average them, and pre-compute the embeddings. So at runtime, given a query, you find the question that is closest to the input query vector, and you pick the question, and show the corresponding answer in the UI.
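A naive sketch of that averaging heuristic, purely for illustration (this is the approach whose shortcomings are discussed next):

```swift
import NaturalLanguage

// Average the static word-embedding vectors of a sentence's words.
// This is only the heuristic described above, not a recommended approach.
func averageVector(for sentence: String, using embedding: NLEmbedding) -> [Double]? {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = sentence

    var sum = [Double](repeating: 0, count: embedding.dimension)
    var count = 0
    tokenizer.enumerateTokens(in: sentence.startIndex..<sentence.endIndex) { range, _ in
        // Words missing from the finite vocabulary are simply skipped.
        if let vector = embedding.vector(for: String(sentence[range]).lowercased()) {
            for i in 0..<sum.count { sum[i] += vector[i] }
            count += 1
        }
        return true
    }
    guard count > 0 else { return nil }
    return sum.map { $0 / Double(count) }
}
```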

Now, this seems like a reasonable way of solving this problem, but it does have several shortcomings. The first is the issue with word coverage. Since static word embeddings work with a finite vocabulary, if you have an input query that does not have a word in the lookup table, you will lose information.

The second is that this averaging process is very noisy. It's akin to a bag-of-words representation that loses compositional knowledge. For instance, if we had a query such as, "Do you deliver from Cupertino to San Jose?" by simply taking the average, we are jumbling up the words, and we lose the compositional information contained in words such as "from" and "to".

So, the big question is: can we do better? And yes, we certainly can. And we are delighted to tell you that we have a brand-new technology called sentence embedding that solves this problem. Now, by using the sentence embedding API, when you pass an input query, or a sentence such as, "Do you deliver to Cupertino?" it analyzes this entire sentence and encodes this information into a finite-dimensional vector. In the current API, this vector has 512 dimensions.

So, how does this work? Intuitively, you can think of it as starting from a text corpus, and in the text corpus, if you were to tokenize the text at the sentence level, when you pass it through the sentence embedding, instead of working with words, now you have sentence representations. Each of these sentences is represented in this vector space in such a way that sentences that are conceptually similar are clustered together, and sentences that are dissimilar are clustered away from each other.

Now, the technology under this is fairly complex and utilizes several machine-learning techniques, one of which is pre-trained models in conjunction with custom layers such as bidirectional LSTM, as well as fully-connected layers. And we train this network in a multitask training setup on different tasks such as natural language inference, binary text similarity, as well as next sentence prediction.

But to use it, you don't have to worry about these details, you simply have to ask for it. So, you start by importing NaturalLanguage, and you create an instance of NLEmbedding.sentenceEmbedding and specify the language as English. Once you have this, you can ask for the vector of an input sentence. When you do this, the sentence is run through the neural network, and what you get as an output is a finite 512-dimensional vector that encodes the meaning of this sentence.

Given two sentences, you can also find the distance between these two sentences in the vector space. You simply run these two sentences underneath this API through the neural network, get the vector representation, and then compute the distance. Now, there are a wide variety of other potential applications for this technology.
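Both operations in a minimal sketch (the sentences are illustrative):

```swift
import NaturalLanguage

if let sentenceEmbedding = NLEmbedding.sentenceEmbedding(for: .english) {
    let query = "Do you deliver to Cupertino?"

    // The whole sentence is encoded into a single 512-dimensional vector.
    if let vector = sentenceEmbedding.vector(for: query) {
        print(vector.count)   // 512
    }

    // Distance between two sentences in the embedding space.
    let distance = sentenceEmbedding.distance(between: query,
                                              and: "Which cities do you deliver to?",
                                              distanceType: .cosine)
    print(distance)
}
```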

But since there is no finite list of sentences, and you cannot pre-compute the embeddings for all possible sentences a priori, there is no nearest-neighbors API available for this technology. But later in the session, Doug will tell you how you can use sentence embeddings and do nearest neighbors by leveraging custom embedding technology.

Now, if we were to go back to the Nosh application, when you have a query such as, "Do you deliver to Cupertino?" you simply pass it through the sentence embedding API, and you get one vector that encodes all of the meaning. Similarly, for all of the FAQ questions in your index, you can pre-compute the sentence embeddings. And at runtime, given an input, you simply find the closest question. And once you do this, you show the relevant answer in the application at the UI level.

To see the sentence embeddings in action, I'm going to hand it over to Doug, who's going to show us a demo of this working in the Nosh application. Over to you, Doug. Thanks, Vivek. So, let's see some of this in action. In our Nosh application, what we're going to do is to let the user type in a query string, and then we're going to return an appropriate answer from our frequently asked questions using sentence embeddings. So, let's look at some code.

So, the first thing we're going to do in this method is to just get a sentenceEmbedding, in this case, for English. Very simple. And then we'll ask that embedding for the vector for the user's query string. When we first constructed this application, we took each of our answers and constructed for it two or three example queries, and pre-calculated the sentence embedding vectors for each one and put them in a table. So what we're going to do is just iterate through that table.

We'll go through... For each key, we have two or three vectors representing these example queries. We'll find the distance between each of those and our query vector, and the smallest distance is our nearest neighbor, which represents the answer that we're going to show, and we return that answer. So, let's try it out.
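Doug's method might look roughly like the sketch below. The precomputed table, the key names, and the cosineDistance helper are hypothetical stand-ins for the demo's actual code, not verbatim sample code.

```swift
import NaturalLanguage

// Precomputed at build time: for each FAQ answer key, the sentence-embedding
// vectors of two or three example queries. This table is a hypothetical stand-in.
let faqExampleVectors: [String: [[Double]]] = [:]

// Hypothetical helper: cosine distance between two equal-length raw vectors.
func cosineDistance(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map(*).reduce(0, +)
    let magA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let magB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    guard magA > 0, magB > 0 else { return 1 }
    return 1 - dot / (magA * magB)
}

func answerKey(for query: String) -> String? {
    // Get the sentence embedding for English and the vector for the user's query.
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english),
          let queryVector = embedding.vector(for: query) else { return nil }

    // Simple linear search: the smallest distance to any example vector wins.
    var bestKey: String?
    var bestDistance = Double.greatestFiniteMagnitude
    for (key, exampleVectors) in faqExampleVectors {
        for example in exampleVectors {
            let distance = cosineDistance(queryVector, example)
            if distance < bestDistance {
                bestDistance = distance
                bestKey = key
            }
        }
    }
    return bestKey
}
```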

So, if the user types, for example, "How do I use it?" then we can search, and go through and find a nearest neighbor and point them to the "How does it work?" section of our frequently asked questions. Or maybe they ask, "Where do you deliver?" And then we'll search and find a nearest neighbor, and point them to the "Delivery area" section of our frequently asked questions. Or maybe the user wants to know, "Where is my order?" In which case, we search, and we can point them directly to the "Order status" section of our frequently asked questions.

Now, there are many other possible uses for this. Let's consider another hypothetical sample application called Verse, and Verse is an application for showing poetry. So, Verse has many, many different poems in it. And one obvious UI for this is that we could just have a long list of the poems where the user picks one, and then the user sees that poem, and that's fine.

But wouldn't it be nice to have some additional ways of looking for these poems? For example, suppose that I type in, "You're beautiful." Well, then we can find out that Shakespeare said it better, and we can do this using sentence embeddings. So, what we can do is take each line of each poem and calculate the sentence embedding vector for that line and then put them in a table, and then iterate through them just as we did in the Nosh application. But there's one twist here, and that is that we have hundreds of poems and thousands of lines. So it may be that the simple linear search through a table that we used in the Nosh app isn't efficient enough, and we have a solution to that.

And the solution is to make use of custom embeddings. What do we need in order to create a custom embedding? We need a dictionary. The keys in the dictionary are arbitrary, so I've chosen them here to be strings like poem_1_line_1 and poem_1_line_2, from which we can readily determine which poem we were looking at, and which line. And then the values are just these vectors that we got for each line.

And from that, we can produce a custom embedding, and the custom embedding has two important properties. First, it gives a very space-efficient representation of that dictionary, and second, it has geometric information that we can use to do efficient nearest-neighbor search without having to go through the entire thing.

And now to create one of these custom embeddings, it's very simple. You can do this in Create ML, and then all you do is to take that dictionary and pass it into Create ML. And what comes out is a Core ML model that represents that custom embedding. So, let's take a look at this in action. Let's take a look at some code in our Verse application. And here is the corresponding method in Verse that takes the user's query string and returns the answer key.

So, just as before, we get the sentence embedding for English, and we get the query vector from that embedding. But now the rest is even simpler. We just take our custom embedding and pass in that query vector, and it will directly return to us the nearest neighbor: the key that we put into the dictionary from which we created the custom embedding. And as I mentioned, we can easily determine which is the right poem to return from that key. So, let's try it out.
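Sketched in code, the two halves might look like this. The Create ML step runs at development time on a Mac; the keys, vectors, file paths, and the exact nearest-neighbor call are assumptions for illustration, not the demo's actual source.

```swift
import CreateML
import NaturalLanguage
import Foundation

// Development time (on a Mac): build a custom embedding from a dictionary of
// keys to vectors. Keys, vectors, and paths are hypothetical placeholders.
func buildCustomEmbedding(from lineVectors: [String: [Double]]) throws {
    let customEmbedding = try MLWordEmbedding(dictionary: lineVectors)
    try customEmbedding.write(to: URL(fileURLWithPath: "PoemLines.mlmodel"))
}

// Runtime: load the compiled Core ML model as an NLEmbedding and ask it for the
// nearest stored key to the user's query vector.
func nearestLineKey(to query: String) throws -> String? {
    let poemEmbedding = try NLEmbedding(contentsOf: URL(fileURLWithPath: "PoemLines.mlmodelc"))
    guard let sentenceEmbedding = NLEmbedding.sentenceEmbedding(for: .english),
          let queryVector = sentenceEmbedding.vector(for: query) else { return nil }

    // Efficient nearest-neighbor lookup over all stored line vectors.
    let nearest = poemEmbedding.neighbors(for: queryVector, maximumCount: 1, distanceType: .cosine)
    return nearest.first?.0   // e.g. "poem_12_line_3"
}
```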

If the user types in something like, let's say, "I love you," we can get a poetic expression for that and find a poem that represents that sentiment. Or maybe they type in something like... "Don't... forget me," and we can find a poem that expresses that sentiment. Just about anything we want, we can find a suitable expression. Maybe it's... "Love... isn't... everything." And here's a poem for that as well.

Now, I don't want to give the impression that the only thing you can do with sentence embeddings is this sort of text retrieval, because sentence embeddings are useful for all sorts of different applications. For example, consider a hypothetical app called FindMyShot, which stores images and happens to have captions for each of those images. Now, since each image is associated with a caption, I can use sentence embeddings to find an image based on the similarity between the user's query text and the caption.

And there are many other possible usages for these. You can use them for detecting paraphrases, you can use them for input for training more complicated models, and you can use them for clustering. So let me spend a moment to talk about clustering. If you don't have any prearranged text, if all the text comes in from the user, then you can still make use of sentence embeddings.

For example, if you had messages, or maybe reviews, or maybe problem reports from your users, you can take sentence embeddings and calculate a vector for each one of these. And then you can use standard clustering algorithms to group these into as many groups as you want. And because we're using sentence embeddings, these groups are going to be sentences that are close together in meaning.
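As a very naive illustration of that idea, here is a greedy, threshold-based grouping that stands in for a real clustering algorithm such as k-means; the threshold and overall approach are made up for the sketch.

```swift
import NaturalLanguage

// Greedy grouping of user messages by sentence-embedding distance.
// A simplistic stand-in for a proper clustering algorithm; the threshold is illustrative.
func groupMessages(_ messages: [String], threshold: Double = 0.6) -> [[String]] {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else { return [] }

    // Cosine distance between two equal-length vectors.
    func cosineDistance(_ a: [Double], _ b: [Double]) -> Double {
        let dot = zip(a, b).map(*).reduce(0, +)
        let magA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
        let magB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
        guard magA > 0, magB > 0 else { return 1 }
        return 1 - dot / (magA * magB)
    }

    var groups: [(representative: [Double], members: [String])] = []
    for message in messages {
        guard let vector = embedding.vector(for: message) else { continue }
        // Join the first group whose representative is close enough; otherwise start a new one.
        if let index = groups.firstIndex(where: { cosineDistance($0.representative, vector) < threshold }) {
            groups[index].members.append(message)
        } else {
            groups.append((vector, [message]))
        }
    }
    return groups.map { $0.members }
}
```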

Sentence embeddings are available for a number of different languages: English, Spanish, French, German, Italian, Portuguese, and Simplified Chinese, on macOS, iOS, and iPadOS. Now, these sentence embeddings are intended for use on natural language text, especially text that's come in from the user. You don't have to do a lot of preprocessing on this text. You don't have to remove stop words, for example, because the sentence embeddings have seen all this in their training.

And they're intended to be applied to text that is similar in length to a single sentence, maybe a couple of sentences or a short paragraph. If you have text that's longer than that, then you can divide it up into sentences and apply the sentence embeddings to each one, just as we did with our poems. Also, you can make use of custom embeddings in case you have large numbers of these that you want to store and look through. So, next, I'd like to turn to the topic of custom models.

The idea in custom models is that you bring in your custom training data, and we train a model for you for some particular NLP task. Now, there are two broad kinds of NLP tasks that we support that cover a wide range of functionality. The first is a text classifier, where the objective is to take a piece of text and supply a label for it. And the other is a word tagger, where the objective is to take the sequence of words in a sentence and supply a label for each one.

The custom model training is exposed through Create ML. You pass in your training data, Create ML passes it to Natural Language. Natural Language produces a model, and what you get out is a Core ML model, either a tagger or a text classifier. And our focus for the last couple of years has been on applying the power of transfer learning to these models. With transfer learning, the idea is that you can incorporate pre-existing knowledge of the language so that you don't have to supply quite so much training data in order to produce a good model.

And this pre-existing knowledge comes in by means of word embeddings, because the word embeddings have been trained on large amounts of natural language text. Now, we introduced this last year for text classifiers, and that provides a very powerful solution for many apps. For example, we can consider a hypothetical app called Merch, which is intended for transactions between buyers and sellers, and they communicate with each other about these transactions. But one complaint the users have, perhaps, is that they sometimes get spam messages and don't want to have to look at all of these.

Well, one possible solution is to bring in large numbers of example sentences labeled as spam or not spam, and then train a text classifier. And a transfer-learning model is actually very effective for this sort of task. And then the model in your app will tell you whether a particular message is likely to be spam, and you can decide whether or not to show it to the user.
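A sketch of training such a classifier with Create ML's transfer learning option; the file name, column names, and labels are assumptions, and the exact parameter spelling should be checked against the Create ML documentation.

```swift
import CreateML
import Foundation

// Run in a macOS playground or command-line target.
// The file name, column names, and labels ("spam" / "not_spam") are hypothetical.
let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "messages.json"))
let (trainingData, testingData) = data.randomSplit(by: 0.8, seed: 42)

// Transfer learning: start from a pre-trained embedding instead of learning from scratch.
let classifier = try MLTextClassifier(
    trainingData: trainingData,
    textColumn: "text",
    labelColumn: "label",
    parameters: MLTextClassifier.ModelParameters(
        algorithm: .transferLearning(.staticEmbedding, revision: 1),
        language: .english))

// Export as a Core ML model for use with NLModel in the app.
try classifier.write(to: URL(fileURLWithPath: "SpamClassifier.mlmodel"))
```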

But what I really want to talk about today is the application of transfer learning to word tagging, which is new this year. Now let's go back and talk about the task of word tagging. As I said, the object is to take a sequence of words in a sentence and supply a label for each one. And probably the prototypical task for this is part of speech tagging, but it can be used for many other things.

For example, you can potentially use word tagging to divide a sentence up into phrases, or, and this is what we're going to be talking about here, you can take a sentence and extract important pieces of information from it, even though it's unstructured text. For example, in a travel application, I might want to know where the user's coming from and where they're going to.

Can we make use of this in our Nosh application? Well, let's take a look. So, we saw that with the sentence embedding vectors, we could return general answers to the users' queries. But there are other things that I might want to look at in a user sentence. For example, I might want to know what kind of food they're looking for, or where they want to get it from. And I could potentially label these parts of the sentence as food, or a city where the food is coming from.

Now, the most obvious and simple way to handle this sort of problem would be to just list all the potential foods and potential cities, and then just search through the text for each of those. And, of course, we support that sort of thing. We have our NLGazetteer class, which provides an efficient representation for any number of tables of items that you might want to look for in text.
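For reference, a small sketch of that lookup-table approach with NLGazetteer; the labels and entries are made up for illustration.

```swift
import NaturalLanguage

// A gazetteer maps label names to lists of terms to look for in text.
// The labels and entries here are purely illustrative.
if let gazetteer = try? NLGazetteer(dictionary: [
    "FOOD": ["pizza", "burger", "sushi"],
    "CITY": ["Cupertino", "San Jose", "Cartagena"]
], language: .english) {
    // Attach it to a tagger so matched terms come back with these labels.
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.setGazetteers([gazetteer], for: .nameType)
}
```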

But the problem with this approach is that, in general, you're not going to be able to list all the potential values that you might want to look for. So, as soon as you encounter some piece of text that you hadn't thought of before, then this simple search is not going to help you.

And the other problem with this approach is that it doesn't take into account anything about the meaning of words in context, and a word tagger can solve both of these problems. In addition, it's possible to combine a word tagger and an NLGazetteer for even greater accuracy. So, suppose I've decided that I actually want to use a word tagger for my Nosh application. Where do I start? The first thing to do is to decide what pieces of information I want to get out and assign labels to those. Then I collect sufficiently many example sentences that the user might enter, and I decide how I'm going to label them.

And then I actually label those sentences and continue to repeat this process until I have enough data to train a good model. And I might have to continue repeating it if my model ever runs across a situation that it doesn't handle adequately. Usually, the solution is to add some more training data and retrain the model.

So, what does our training data look like? In our Nosh application, we're going to add some labels to sentences like this. We'll use a neutral label, "O" here, in this case for "OTHER," for the pieces of text that we're not particularly interested in, and we'll use labels like FOOD, FROM_CITY, and RESTAURANT for the pieces of text that we are specifically interested in.

Now, why did I say "FROM_CITY" rather than just "CITY"? Because I noticed that in these example sentences, there are two kinds of ways where a city can come in. The first is where it's the city where the restaurant is located, where the food is supposed to be coming from, and the second is where it's the city the user's located, where the food is being delivered to.

So I'm going to label those differently as "FROM_CITY" and "TO_CITY." And because the word tagger can take advantage of the meaning of words in context, it can distinguish between these two, provided I give it sufficient training data. And here is what the training data looks like in JSON format, which is very convenient for use with Create ML. So when I want to go and train a model on Create ML, it's very simple. If I'm doing it in code, I just import Create ML.

And then I ask Create ML to provide me a model. Now, we've supported this for a couple of years using an algorithm known as CRF, conditional random fields, and it works well. But what's new this year is that we are applying the power of transfer learning to word tagging.

And as I said before, what transfer learning does is to allow us to apply pre-existing knowledge of the language so that you don't have to supply quite so much training data in order to train a good model. And the way in which this knowledge comes in is via dynamic word embeddings.

As we said before, the dynamic word embeddings understand something about the meaning of words in context, which is just what the word tagger wants. So we use the dynamic word embedding as an input layer, and on top of it, we take the data that you provide and we train a multi-layer neural network, and that is the thing that actually produces the output labels.

Now, this sounds complex, but if you want it, all you have to do is ask for it. So, instead of asking for the CRF algorithm, you just ask for a transfer learning algorithm with dynamic word embeddings, and then it will train a model for you. So let's take a look at that in action.
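In code, the training step might look roughly like this. The file name, column names, and exact parameter spelling are assumptions based on what's described here; the Create ML app, shown next, does the same thing without any code.

```swift
import CreateML
import Foundation

// Run in a macOS playground or command-line target.
// Each training example is a pair of parallel arrays, roughly:
//   {"tokens": ["Do", "you", "deliver", "to", "Cupertino"],
//    "labels": ["O",  "O",   "O",       "O",  "TO_CITY"]}
// The file name and column names are hypothetical.
let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "nosh_tagging.json"))

let wordTagger = try MLWordTagger(
    trainingData: data,
    tokenColumn: "tokens",
    labelColumn: "labels",
    parameters: MLWordTagger.ModelParameters(
        algorithm: .transferLearning(.dynamicEmbedding, revision: 1),
        language: .english))

// Export as a Core ML model for use with NLTagger in the app.
try wordTagger.write(to: URL(fileURLWithPath: "NoshWordTagger.mlmodel"))
```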

So here's the Nosh application, and here is some of the training data that I have added for it. And I produced somewhat over 1,000 sentences of this format, and you'll notice these are in JSON format. So each example is a parallel sequence of tokens and labels, one label for each token.

And you'll notice that cities are labeled "FROM_CITY" or "TO_CITY," and you'll notice that foods are labeled and restaurants are labeled. And this is the data that I'm going to use to train my model. And so it's possible to train it in code, but I'm going to train this model using the Create ML application which makes it very simple.

So here's the Create ML application. All I have to do is point it to my training data, tell it which are the labels and tokens, and tell it which algorithm I want to use. In this case, we're going to use the transfer learning algorithm, and this is going to be for English. And that's really about all there is to it. I just start it off and set it to train.

And the first thing that it does is to load all the data, extract features from it, and then it's going to start training using transfer learning, and it will train a neural network. So, this takes a number of iterations. With each iteration, it gets more and more accurate. Now, this particular training process takes two or three minutes, so I'm not going to make you sit through all of it. I actually have a pre-trained model, so let's go back... to... the application and take a look at some code.

So here is an example method in the Nosh application that's going to make use of our trained model. So we're going to be passed in the user's string that they've typed... First thing we'll do is load our model, our word tagger model, as an NLModel, and then what we're going to do here is use it with an NLTagger. And that's convenient because the NLTagger will take care of all of the tokenization and model application, and just give us the results. So we've created a custom tagScheme. That's just a string constant that refers to this set of tags.

And we'll tell our tagger that that's what we want to use, and then we tell our tagger to use our custom model for this custom tag scheme. We attach the user's string to the tagger, and that's really all there is to it. We can then use the tagger to go through the words, and it will tell us, for each one, what it thinks the label should be, whether it's RESTAURANT, or FOOD, or FROM_CITY, or TO_CITY, or nothing.
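That method might look something like this sketch; the model name, tag scheme string, and neutral label follow the description here but are hypothetical, not verbatim sample code.

```swift
import NaturalLanguage
import Foundation

// Returns (token, label) pairs for the pieces of the user's text our model recognizes.
func customTags(in userText: String) throws -> [(token: String, label: String)] {
    guard let modelURL = Bundle.main.url(forResource: "NoshWordTagger", withExtension: "mlmodelc") else {
        return []
    }
    // Load the trained word tagger as an NLModel.
    let model = try NLModel(contentsOf: modelURL)

    // A custom tag scheme is just a string constant naming this set of tags.
    let scheme = NLTagScheme("NoshQueryTags")
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.setModels([model], forTagScheme: scheme)
    tagger.string = userText

    var results: [(token: String, label: String)] = []
    tagger.enumerateTags(in: userText.startIndex..<userText.endIndex,
                         unit: .word,
                         scheme: scheme,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, range in
        // Tags come back as RESTAURANT, FOOD, FROM_CITY, TO_CITY, or the neutral "O".
        if let tag = tag, tag.rawValue != "O" {
            results.append((String(userText[range]), tag.rawValue))
        }
        return true
    }
    return results
}
```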

And then we take note of those particular pieces of what the user has entered. And then we can use that according to the needs of the application to generate any sort of custom response that we might want to provide. So I've taken the liberty of adding a few custom responses to the Nosh application. Let's try it out.

So, the user might type something like, "Do you... deliver to... Cupertino?" So what is our model going to tell us? It's going to look at all these words, and it will notice that Cupertino is a city, and it's a city they want delivery to, so we can generate a custom response that is specific to Cupertino.

Or they might ask, "Do you deliver... pizza?" And then our model will notice that pizza is a food name, so we can generate a custom response based on pizza. Or maybe they ask if we deliver from a specific restaurant, say, Pizza City, and the one in Cupertino. And in that case, the model will tell us that Pizza City is a restaurant name and that Cupertino is a city where the food is coming from, and we can use either or both of those to generate a custom response that mentions those. So that shows the power of word tagging to extract pieces of information from unstructured text.

So let's go back to the slides and let me turn it back over to Vivek. Thank you, Doug, for showing us a demo of transfer learning with word tagging. Now, transfer learning for word tagging is supported for the same languages as static embeddings and sentence embeddings across Apple platforms.

To get the best use out of transfer learning for word tagging technology, we have a few recommendations. We recommend that you first start off with the conditional random field, especially for languages such as English. The conditional random field, or CRF, pays particular attention to syntactic features, which are quite useful in many applications. However, if you do not know the kind of distribution that you will be encountering at runtime, it is better to use transfer learning because it provides better generalization.

We also recommend that you use more data for transfer learning for word tagging. Since the prediction is at a per-token level, in contrast with text classification, it requires an order of magnitude more data. As I mentioned, NSLinguisticTagger has now been marked for deprecation. We strongly encourage you to move towards the Natural Language framework. We also told you how to use confidence scores along with existing APIs. And this can be used to prune out false positives in your application.

And then we provided an overview of sentence embedding technology and demonstrated how you can use this in several hypothetical apps. And we concluded with a new technology for transfer learning for word tagging. With that, we'd like to conclude by saying, make your apps smarter by using the Natural Language framework. Thank you for your attention.