WWDC03 • Session 428

Understanding Ink APIs

Application Frameworks • 56:37

Inkwell provides basic support for entering handwritten data into your application without requiring any modifications. Apple is introducing a new API to give you access to more advanced handwriting features. View this session to learn how to leverage ink in device-specific input solutions, how to use gestures to directly manipulate text in your application, and how to use recognition alternates to implement a correction mechanism. In addition, we discuss using the API to implement searchable ink and deferred recognition.

Speakers: Giulia Pagallo, Larry Yaeger, Brad Reigel

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.

Good afternoon and welcome to session 428, Understanding the New Ink API on Mac OS X. We are thrilled today to introduce a new set of APIs on Mac OS X that will allow you to provide additional support for handwriting recognition in your application. As you know, with Jaguar, Apple introduced Inkwell, the end customer solution for handwriting recognition. Inkwell provides system-wide support for Cocoa and Carbon applications without requiring any modification to the apps. And what I'd like to do first today is give you a brief overview of the functionality that Inkwell supports in the applications.

Here is a Wacom tablet, my Wacom tablet. And I have downloaded the Wacom driver from the web. And the first thing I'm going to do is go to System Preferences and note that there's an Ink panel there. So I'm going to launch that panel. And there is a control that is going to allow me to turn recognition on and off. Recognition is already on, but if it were off, I'd turn it on. And that's basically all I need to do to have system-wide support for handwriting recognition in Jaguar.

There's more stuff that you can do in the Ink Preference Panel. Basically, tweak the settings for recognition. Look at the gestures that we support. Add words to your user dictionary. But I'm not going to go into those right now. So let me quit the Ink Preference Panel. And now I'm ready to start inputting text in my Cocoa application.

I have TextEdit up here. And I'm going to start writing anywhere I like on the screen. And you'll see that the strokes and the writing guidelines will show up. The strokes get shipped as a unit to the recognizer, recognized as text, and sent to Cocoa as a Carbon event. Cocoa will extract the information and enter it into the document. I can continue to write.

I write on this tablet, the strokes get collected, recognized, and boom, they go to my application. As easy as that. TextEdit has not been modified to support recognition. However, Inkwell allows you to do much more than entering your text. It also allows you to do editing. So, for instance, if I hold down the stylus, the pen will go into mouse mode. And now I have a selection. I can act upon the selection, and I can erase it by just doing a simple erase gesture on the screen. And boom, that word is gone. But I really didn't want to erase that word.

I want it back in my document, so I'm going to do an undo gesture, a shaky new gesture, and there it is, I got my word back in the document. So now let me switch to a Carbon application, and I'm going to show you that exactly the same paradigm works there. I have iTunes, and say that I bought a new CD and I want to add it to my collection and give it a name. I'm going to pick up my stylus and write the name for my new CD. There you go, I'm done. So without any modification I can enter text, edit it, and manipulate the UI. That's pretty powerful, but there is much more that applications could really do if they had access to the information that the recognition system provides. And to give you just a glimpse of that, I'm going to show you what the Ink pad does. The Ink pad is this little input window that we ship on Jaguar that is accessible from the ink window, and we like to call it the first Ink-aware application. So as you see, I can go to the Ink pad, and now, right there on the window, again the ink is recognized and the text will show up in the document.

However, I can go now to that same text, and because this is an Ink-aware application, I can double click and hold, and I can look at all of the alternatives that the recognition system has given us. On top of that, I can choose one of them if I like, and I can look at the strokes that I've written so that I get feedback on what that word was meant to be. So I can select that. And now the case for the word has changed.

Okay. Let me use the undo. It works right there. Okay, so I can continue to write on the ink pad exactly the same way that I wrote on TextEdit. Say the phone rang and I stopped in the middle of the word; I can pick it up right where I left it.

I write the other part of the word, it goes through recognition, and then I do a gesture to merge those two pieces, those two words that I wrote. So with a simple gesture that looks very much like a V, I can merge those two strings. And now, notice that because this is an Ink-aware application, I can erase a word just by scribbling on top of it, without having to go through the process of selecting and then erasing. So there is much more power that you can add to your application by using the stylus directly on the objects.

And developers have recognized that. And since we introduced Inkwell, we have had a series of requests from developers to tap into the Ink services that we provide. And the requests have ranged from questions like, how can I add support for devices other than the Wacom tablets, to requests where developers are asking about separating the process of gathering the digital ink from the process of recognizing the ink -- basically doing deferred recognition, if you are familiar with the term -- to requests from developers who want to have more control over where and when the ink is drawn.

So for Panther, we have taken those requests and combined them with our vision of how handwriting support should work on Mac OS X, to provide you with a set of very simple yet powerful APIs that will allow you to provide additional support for handwriting. They will allow you to ship to your customers the solutions that they are looking for.

So today, what I want to do for the rest of the presentation is introduce you to some of the concepts that we are going to use throughout the presentation, some very basic concepts that we use in the APIs. And then we're going to talk briefly about the technologies that we use behind the scenes to provide the recognition support. And finally, we're going to go into the meat of the presentation and give you a brief overview of the APIs we support and how you can use them to develop solutions for your customers.

One question that we often get is, what kind of device do I need in order to get Inkwell or Ink services to work? And for Jaguar, the answer was, you need a Wacom tablet and a Wacom driver. For Panther, we're going to continue, of course, to support the Wacom devices, but we are going to add support for devices that provide XY and pressure information, either in a real-time fashion, like tablets do, or in an offline way, like the readily available USB and Bluetooth pens that you have these days on the market.

Now, let's go through the concepts. There are basically five concepts, very simple concepts, that we're going to use throughout the presentation and in the documentation. The first one is an ink phrase, and you are already familiar with that. Think of an ink phrase as a group of strokes that stays on the screen before being shipped to the recognizer. There are a few nuances to that concept that you can read about in the documentation, but that basically is what you need to think about for this presentation.

Now, as I was writing on the tablet and I lifted the pen, the ink phrase was terminated, shipped to the recognizer, and then to the application. There are other ways that I can terminate a phrase. Basically, if I hold the pen for a while and don't do anything else, a timeout is going to happen and the ink phrase is going to terminate.

The other way that ink phrases terminate is spatial: if I write my strokes far apart in the X direction or in the Y direction, that also terminates the ink phrase. And the reason why I'm telling you this, why it is an important concept for you, is because later on, when we go through the developer scenarios and the APIs, we're going to talk about ways for you to take control over phrase termination if your application needs to do so.

The next concept is the ink text ref. If the strokes you wrote are recognized by the Ink services as text, what your application is going to get is an ink text ref. An ink text ref is an opaque object that contains information about the digital ink that you wrote, the location of that ink, and the information from the recognizer: the set of strings that the recognizer has provided as the alternates for the ink. And the APIs provide a way for you to access most of the information that is available in the ink text ref.

The Alternates Menu. If you want to present to your user the results of the recognition so that they can browse the information, we're going to provide a simple API for you to construct that menu and show it to the user. And the menu will contain, again, the alternates plus the ink. And the final concept that I want to go over is gestures.

You saw me use all of these gestures in the demo. And so we basically have three types of gestures. The first kind, let me position myself, on the right-hand side of the screen, is the undo gesture. We call those kinds of gestures non-targeted gestures, and what that really means is that no matter where you draw them on the screen, they're going to perform the same action.

The second type of gestures, which I have here on the top left of the screen, are what we call optionally targeted gestures. And an example of that is the erase gesture. And you saw me use it in two different contexts. In TextEdit, which at this point is not an Ink-aware app, I had to make a selection in order to erase. In the Ink pad, which is an Ink-aware application, you can just erase on top of the text. So that is the difference for those gestures. And we have additional gestures that are optionally targeted.

And finally, the third class of gestures is the join gesture. Those are gestures that are only supported in Ink-aware apps, because we need to know where on the screen that gesture occurs. There is no way for the Ink services to differentiate between a V and a join gesture unless the application gives us the information about where, in the context of the application, the gesture has occurred.

So again, there are three classes of gestures, and the reason why we're telling you this is because you again can choose to support those gestures in your application to allow the user to do direct manipulation, to interact with the stylus directly with the objects without requiring them to go through the metaphor that is needed for the keyboard where you have to select and then act upon the selection. You can just go and use the stylus directly in the object.

And so the non-targeted gesture, basically you don't need to add the support. The system can handle it. For the optionally targeted gesture, you may want to add the support if you want that interaction to happen because you need to add the information of the location. And for the always targeted gesture, the support has to come from the application.

And that concludes the basic concepts that you need to know today. Now let me move on to the ink components. And we have basically four components for the ink services. On the right-hand side of the screen, that blue rectangle shows the application context. And in the application context, two of the components live there. The ink input method is the part of the ink services that provides data collection and drawing of the ink. So as you write, the ink input method is going to gather the data and draw it.

The other component in the context of the application is the ink framework. That's where the implementation of the API lives. On the left-hand corner of the screen, the box shows you the user preferences, which I showed you in the demo. That contains the pref pane that allows users to set their recognition mode, look at the gestures, turn gestures on and off, and add words to their dictionary.

And finally, on the top corner of the screen, we have the ink server. That's where the core of the recognition lives: that's where the segmentation and the recognition process happen and where the language model is instantiated. It also hosts the ink window. But the important thing for you today is the fact that that's where the core of the recognition lives. And with that, it is my pleasure to introduce Larry Yaeger. Larry.

Larry is a distinguished engineer at Apple. He's worked for us for several years in handwriting recognition, and I'm very pleased to have him on stage today. Okay, I'm going to give you a quick run through some of the technologies underlying what Giulia has just demonstrated for you and even dive down a little bit into the recognition technology, though that doesn't directly affect your apps. It's sort of what provides all these services.

The recognizer can be boiled down to something very simple. Basically, these three boxes pretty much define the last ten years of my life. It starts off with X, Y points and pen lifts, just exactly what you would imagine are coming from the graphics tablet. And we take those in strokes, what you normally think of as strokes with pen and paper, and we look at combinations of those strokes to see what might possibly be characters, and that's what we call character segmentation. So out of this tentative segmentation block comes these character segmentation hypotheses, and some of them are going to be right and some of them are going to be wrong, and we can't possibly predict up front which.

The next block is a neural network classifier. It takes these combinations of strokes that might be characters, looks at them in a certain way, and tries to decide whether they are or not, and which characters they are. So what it emits are these character class hypotheses. It's trying to say if it's A, B, C, D, E, F, or whatever.

That sequence of probability vectors is fed into a search engine. And the search engine sits there with a language model, that's the context, and tries to find the best combination of all those possible character combinations and vectors of probabilities of characters, and look for the maximum likelihood path through this graph and end up with, finally, word probabilities, what the person wrote.

So, sort of to see what that looks like, suppose a person wrote this on the computer. You know, hopefully maybe 50% of the people in the audience thought it was clog and 50% thought it was dog, although dog's such a common word, maybe a little skewed that way.

Well, what we do is take those strokes and take the first stroke that kind of looks like a C and feed it into the neural network classifier. It emits this vector of probabilities, and let's say it's pretty good. You know, 0.9, 90% certain that it's a C. Okay, that's great.

Next, we'll probably try the first two strokes together. And let's say that, well, you know, they're written kind of far apart. That's an awfully big gap in there. Let's say the neural net kind of picks up on that and says, well, 70% probability that's a D. And, you know, a few other things scattered around.

When that second stroke is then tried by itself, the only legal transition is from the C to the L. You can't have the C and the D because they use the same strokes. So let's say that the neural net's pretty good and it's got about an 80% probability that that's an L.

Well, I've deliberately jury-rigged the numbers so that that looks like 0.9 times 0.8 is 0.72. So 72% chance that the theory so far is CL. 72% chance that that's right. Well, and I assigned 70% to the D. So, so far, our little engine thinks that it's more likely to be CL than it is D.

Okay, well, when the O stroke comes along, let's say it's pretty good. There's an interesting thing going on here that it almost doesn't matter what that probability is at this point. The CLO is going to win over the DO because the CL won over the D, and that's going to be true when we bring up the G.

You multiply those probabilities together, and indeed, ever so slightly the way I've jury-rigged the numbers, it would probably pick clog. But in fact, as I mentioned earlier, it's guided by context. We have a language model that's fairly complex that's guiding all this, and you can write outside of the dictionaries. You don't have to write in the dictionaries.
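
To restate the arithmetic of this example, the probabilities along a segmentation hypothesis simply multiply:

    P(CL) = P(C) x P(L) = 0.9 x 0.8 = 0.72  >  0.70 = P(D)

Multiplying both sides by the same P(O) and P(G) for the remaining strokes preserves that ordering, which is why, before the language model weighs in, CLOG edges out DOG.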

It's a loosely applied language model, but it does guide the search. And so if you were searching within the pet philosophy newsgroup, where the only words were cat, dog, and dogma, that would guide the search, and so the only viable path through that graph of the language model would be dog, and so really dog would win.

And there's tons more information actually published available on the web that we'll give you a URL for later. Okay, so given that the recognition engine is sitting under there ready to provide these answers, let's take a look at how the data flows through the system before it gets to your application. And again, you don't have to worry about every little detail in here, but hopefully this will give you an idea enough about how things are working that it'll make sense when you do have to make calls to make things happen just the way you want.

You'll understand why those are the right things to do. So initially, data comes in through the graphics tablet. That tablet data is fed into, really this is kind of two layers in this next block, but I just lumped them together to fit it on the slide gracefully. There's a tablet driver living at the IOHID layer, and what it's really emitting is, mouse events with tablet data in them. So they really are true, honest to God, first rate citizen mouse events, but they also have this block of data that has to do with pressure and orientation of the pen and all the special things that pens can do. Okay.

Those mouse events plus tablet data are fed through a Core Graphics layer, of course. And the Core Graphics layer does something very special for us. You may have noticed while Giulia was demoing TextEdit and some of the other things, she did not have to write in any of TextEdit's windows. She could write off, apparently, on the desktop.

We call that our write-anywhere model. And it's something the user can turn on or off, but we turn it on by default if you turn ink on at all, because we think it's an easy way to interact with the system when you're using a pen.

Basically, you write where you want to, the recognition results flow to the insertion point. So core graphics is sitting there, and if this mouse event coming through the pipeline has tablet data in it and recognition is enabled, and we've told it that in the background, then it will route those events to the frontmost app.

That's going to be important later because you might very well, in the process of handling ink-related events, get points that lie out there. So you can write outside your windows, and you kind of need to know that. Okay, so there's one other thing that core graphics does for us.

If we're going to allow you to write outside of windows and write all over the screen, well, there are some places where you're going to put the pen, the user is going to put the pen down that they really want it to just behave like a mouse. When you put it down in the drag region of a window, you don't want to have it start writing while you're trying to drag the window. If you put it down in the scroll bar, you don't want it to start writing while you're trying to scroll the window.

And the Dock is the same way. So we've invented this idea of instant mousing regions, and so there are certain special places that, if the pen lands in one of these regions, it will instantly behave as a mouse instead of inking. And so Core Graphics has its own idea of a few very special places, like the menu bar and the Dock, that are known, and the draggable regions of windows that are server-side dragged, that it knows to go ahead and treat instantly as a mouse.

So it finally still emits just a mouse event with tablet data in it. And this is where we're getting into your application context area. Our input method, the Text Services Manager (TSM) input method, gets a chance at the data first. We have our own idea about instant mousing.

And we take care of some things for you, like most controls in the user interface probably want to be instant mousers. Well, we know about that. So all your standard Carbon and Cocoa controls are automatically instant mousers, except for the ones that--where we think you ought to be able to write. There's going to be a way for you to manage your own custom controls in a minute.

As Giulia said, we gather that ink data up into strokes. We add those strokes to the idea of this current phrase and eventually get around to terminating the phrase, as Giulia said. We send that off to the recognizer. We get the results back. And in Jaguar, first we post a Unicode-for-key event so that apps that are already TSM-savvy and looking for that can get the whole block of text at once. But if you don't handle that, then by gosh, we turn it into raw key downs. So finally, your application will get the text data corresponding to what the user wrote.

For Panther... We still have Unicode for key and key down at the end of the chain, but we've just introduced four events that let you manage what happens with ink in your application. There's an instant mouser event that lets you decide if some special region on your document layout needs to be an instant mouser. Hopefully there won't be any of those, but this lets you manage it just in case.

We send an ink point event for each of the points that comes in that we think should be inking. Now, I've talked about instant mousing and write anywhere and all that. There's always this decision going on when a person is using a pen whether it is supposed to be mousing or inking. And there's a fair bit of logic. We queue events up and wait until they've moved far enough, soon enough, and to make a decision that, yeah, they really think they're inking and all these things, and they're not in the right place on the screen and so on.

So we do have all this logic in place to take care of that for you. If you install an ink point event handler, you will only get these events when we've already done all that and decided that, yeah, really, the person thinks they're writing at this point, or at least that's our system view of when they ought to be writing.

So if you want to, for example, draw your own ink, you have a calligraphy application where you want to pay attention to pen orientation and all these things and produce really beautiful ink. Basically, you can tell us to stop drawing. Handle the ink point event and do what you want with it. And yet, you'll get the same user interface for your application as every other ink aware or non-ink aware application in the system.

Okay, the Ink Gesture event allows you to intercept those gestures that Giulia was showing you so that when you want to provide targeted gesture support, you just put in an Ink Gesture event handler, and now we'll send you things when the user has written one of these special gestures. Finally, there's the Ink Text event, and that's what provides that ink text ref that Giulia described that lets you provide the correction model with the alternate word list and so on.
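
As a rough sketch of how an application might sign up for these events: the class and kind constants below (kEventClassInk, kEventInkPoint, kEventInkGesture, kEventInkText) are recalled from the Panther-era CarbonEvents.h and Ink.h headers and should be verified there, so treat the exact names as assumptions.

    #include <Carbon/Carbon.h>

    static OSStatus MyInkEventHandler(EventHandlerCallRef inCaller, EventRef inEvent, void *inUserData);

    static void InstallMyInkHandlers(void)
    {
        /* One handler for the three ink events discussed in this session; an
           instant-mouser event could be added to the list in the same way. */
        static const EventTypeSpec kInkEvents[] = {
            { kEventClassInk, kEventInkPoint   },
            { kEventClassInk, kEventInkGesture },
            { kEventClassInk, kEventInkText    }
        };

        InstallApplicationEventHandler(NewEventHandlerUPP(MyInkEventHandler),
                                       GetEventTypeCount(kInkEvents),
                                       kInkEvents,
                                       NULL  /* user data */,
                                       NULL  /* handler ref out */);
    }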

Okay, now I'm going to do a quick run through the APIs. This is not the in-depth portion. You're going to get a really good walkthrough by Brad Reigel in a minute, who's going to show you how to actually, you know, sample code, show you how to actually use them in some very specific examples.

This is an attempt to give you a broad overview so you know what to draw from and what's possible in addition to the specifics that you'll see. Okay, I mentioned those four events. There they are. Don't need to go through them again because we've just talked through them. There's the instant mouser, there's the ink point, there's the ink gesture, and the ink text.

There's only one object in the ink world, and that is the ink text ref. And there is a very limited, but hopefully fully flexible, set of ink text APIs. You can find out how many of those recognition alternates there are. There will never be more than five. You can get the CFString for the top choice, or the second choice, or third, or whatever.

You can get, you can hand us a menu ref, and we will populate it for you with all those alternates. So it's very easy to create this. You can just, in fact, that'll be an example, so you'll see that. You can find out what modifiers, keyboard modifiers, were held down while the user was writing. There may be occasions where that's useful. You can grab a copy of a text ref.

You can find out its bounds. You can draw it into a rect that you specify. You can flatten it. That just means you collapse the data structures into a contiguous block of memory into a CFDataRef that you can write out to disk, should you want to save any of these things. And the counterpart to that is that you can do a create from CFDataRef.

There are a few Ink state APIs that let you query the state of the system and set the state of the system in certain cases. For example, you can find out that-- I can't even see it, but oh, there we go. Ink user writing mode. That lets you find out if the user has basically turned on that write anywhere option or not.

And the alternative to write anywhere is write only in ink-aware applications, which is what you'll be writing. The ink set application writing mode, as opposed to the user writing mode, allows you to basically tell the system that you are one of those ink-aware applications that ink should work in when the user has turned off write anywhere.

Ink set application recognition mode lets you control text recognition versus gesture recognition and appropriate combinations of those. Set phrase termination mode allows you to take control over when and how phrases are terminated, including if you want to eliminate our handling of it altogether and have it only terminate a phrase when the user presses an "I'm done writing" button.

And you'll get an example of that. And ink is-phrase-in-progress. Well, that's important for an API I'm about to show you: if you're going to take control of terminating the phrase, it's probably best to make sure there's actually one in progress before you go terminating it.

And the only public structure from Ink is this ink point. It contains just three things: the HIPoint, the location on the screen where it happened; the tablet point record, which is defined in CarbonEvents.h and is the standard data structure that's passed up all the way from the tablet driver on; and the keyboard modifiers that were held down. Basically, it's everything that we know about the state under which this ink was written.
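
Sketched as a declaration, and assuming the field names from the Panther-era Ink.h (verify against the actual header), the structure looks roughly like this:

    struct InkPoint {
        HIPoint         point;            /* location in global (screen) coordinates */
        TabletPointRec  tabletPointData;  /* pressure, tilt, etc., per CarbonEvents.h */
        UInt32          keyModifiers;     /* keyboard modifiers held down while writing */
    };
    typedef struct InkPoint InkPoint;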

There's ink add stroke to current phrase. Now, especially if you're a device-specific solution where you have a bunch of data that you want to pass all at once to have recognition performed on, and it doesn't involve the user writing in real time, you'll probably be adding a stroke at a time to the current phrase until your idea of a phrase is done, at which point you'll call ink terminate current phrase. And add stroke to current phrase just passes an array of ink points. Okay, with that, I think I'll hand it over to Brad. Brad will take you through some specific examples of applying these things.

Thank you, Larry. Good afternoon, everyone. We're going to take a look at some developer scenarios. Basically, we're going to walk through the APIs that Larry just briefly introduced and see how we can use some of them in our applications to go beyond the basic services that you already get in Jaguar, allow you to provide more functionality for your users to take advantage of their tablets.

We're going to take a look at three basic applications or three basic scenarios. First one's going to be text engines, ways that text engines can take advantage of the functionality we provide and make better use of ink services. Second one is going to be input devices. These are going to be, you've got your own whiteboard or pen or whatever, and you want to be able to provide a custom solution for your users that takes advantage of handwriting recognition through ink services. All right.

And the third one is controlling phrase termination. It's not really an application, obviously. It's applications that wanted to control when phrases are terminated. They don't want to use the standard termination that we have in place. They may want to do something custom because of the way that users would input text into their application, things along those lines.

I'm not going to go through in-depth the APIs. We've actually got full documentation, believe it or not. We have two documents. One's a reference document that actually goes through all the APIs and describes all of them in depth. And we have an overview document that talks about a lot of what we're talking about here today.

So first we're going to take a look at text engines. In this case, we're going to look first at how your text engine can interact directly with gestures. So as Giulia showed earlier, in Jaguar right now you already get support for gestures, but only untargeted. Basically what this means is, if the user were to write a cut gesture, clear, space, things like that, it only applies to the current selection. So you can use whatever the selection is or the insertion point. So in this case, you'll notice up at the top of the screen we have an example, and this actually was done in the Ink pad, an Ink-aware application.

You have the cut gesture. In this case, you'll notice that the insertion point, there's no selection. The insertion point is sitting down next to the word menu, but we're actually using the cut gesture to target cutting the word contextual. It doesn't matter where the selection is, where the insertion point is. Your users can use those gestures to directly interact with text, objects, whatever makes sense for your application.

The bottom of the screen, you'll see another example, which is the clear gesture. In this case, Giulia showed you earlier, you can write the clear gesture today, and it simply clears, deletes whatever the current selection is. In this case, you'll see that the insertion point is at the end of the phrase there, but we're actually going to clear, delete the word recognition right out of the middle of the phrase using gestures. So how do you do this? You do this using the Ink Gesture Event that Larry referred to.

Ink Gesture Event is pretty straightforward. It's only got three parameters. The first parameter is just what type of gesture, what kind. It could be clear, it could be undo, cut, copy, paste. If you look in the Ink Pref pane, it actually has one of the panels in there that lists out all the gestures that we support. The other two parameters are the bounds of the gesture. This is just the global coordinates on the screen where the user wrote that gesture. And the last one is the hotspot. And we'll take a look at those two and see how you take advantage of those.

So hotspots and bounds. You need to pay attention to this in order to support targeted gestures. On the upper corner up here, you've got the cut gesture. And you'll notice that there's a red circle around the very beginning point. Basically, you write the cut gesture starting from that bottom point going up. The hotspot is treated as the starting point for that gesture.

So when you're interacting with gestures, you're going to look at that point to determine what piece of text is the user trying to interact with, what object is the user trying to interact with. The other corner, we've got the space gesture. In this case, the hotspot is actually the uppermost point of that upside-down V, however you want to describe it. And wherever that point lands is where you're going to insert a space. So once again, as we saw in the screenshot a minute ago, it doesn't matter where the insertion point is, what the selection is.

You can take a look at the gesture and the hotspot of it and do the right thing. At the bottom of the screen, we've got the join gesture. This is one, as Giulia mentioned earlier, where how it's handled depends upon where it's written on the screen. And in this case, you want to take a look at the bounds of the gesture. There's not a single hotspot; the hotspots, really, that you're looking at are the upper two points of the bounds. And you're going to take a look at those two points and say, okay, did those land on text? If so, I want to treat it as a join and join those two pieces of text together. If they didn't land on text, you're actually going to end up returning event not handled for that gesture. We'll then turn around and recognize that gesture as the letter V.

So real quickly, it's just a standard event handler. You'll go through and grab the kind, grab the hotspot, and grab the bounds, and save those off. And depending upon which gesture you've got, you're going to do things differently. The undo gesture, if you're getting that one, it's not targeted toward anything. It's just, if you get an undo gesture, it's just your standard whatever undoes the last user action.

The way we make this work in Panther is, if you don't handle these events, what we actually end up doing is turning around and, for undo, issuing a Command-Z, or whatever makes sense, the standard command-key shortcuts for those actions. So you actually don't have to handle the undo. If you don't, we'll just end up issuing a Command-Z, assuming that your application supports that.

So you're walking through, handling the events. First one, the undo gesture. Just call your basic undo routine. Second one is the cut gesture. As we just saw, cut depends upon a hotspot. So you're going to have some routine that handles that gesture, and you're just going to hand it the hotspot. And if the hotspot lands on text, then you want to cut whatever object or text that gesture lands on top of, the hotspot lands on top of.

If you determine that the hotspot lands on text, then you want to cut whatever object or text the hotspot lands on top of. If the hotspot landed somewhere outside of text, or not on an object or whatever, you'll probably want to handle that as just cutting whatever the current selection is, whatever makes sense for your application.

The join gesture, as we just looked at, this one you're going to look at the bounds. You're going to have a routine that looks at the upper two points of the bounds, determines if they land on text, and if they do, do the right thing for your application. And if all that comes back true, you did handle it and everything.

You just return no error, and the event's done with. If you determine that that join gesture didn't land on top of text, then you're going to go ahead and return event not handled error, like I said before, and we go ahead, reprocess that, and treat it as letter V. Thank you.
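
A minimal sketch of such a gesture handler follows. The event parameter names (kEventParamInkGestureKind, kEventParamInkGestureHotspot, kEventParamInkGestureBounds) and the gesture-kind constants are recalled from the Panther-era headers and should be checked against CarbonEvents.h and Ink.h; DoUndo, CutAtPoint, and TryJoinAtBounds stand in for hypothetical application routines.

    #include <Carbon/Carbon.h>

    /* Hypothetical application routines, not part of the Ink API. */
    extern void    DoUndo(void);
    extern void    CutAtPoint(HIPoint where);
    extern Boolean TryJoinAtBounds(HIRect bounds);

    static OSStatus MyInkGestureHandler(EventHandlerCallRef inCaller, EventRef inEvent, void *inUserData)
    {
        UInt32  kind = 0;
        HIPoint hotspot;
        HIRect  bounds;

        /* Grab the three gesture parameters: kind, hotspot, and bounds. */
        GetEventParameter(inEvent, kEventParamInkGestureKind,    typeUInt32,  NULL, sizeof(kind),    NULL, &kind);
        GetEventParameter(inEvent, kEventParamInkGestureHotspot, typeHIPoint, NULL, sizeof(hotspot), NULL, &hotspot);
        GetEventParameter(inEvent, kEventParamInkGestureBounds,  typeHIRect,  NULL, sizeof(bounds),  NULL, &bounds);

        switch (kind) {
            case kInkGestureUndo:          /* non-targeted: just undo the last user action */
                DoUndo();
                return noErr;

            case kInkGestureCut:           /* optionally targeted: cut what the hotspot landed on */
                CutAtPoint(hotspot);       /* fall back to the current selection if nothing is hit */
                return noErr;

            case kInkGestureJoin:          /* always targeted: check the upper two corners of the bounds */
                if (TryJoinAtBounds(bounds))
                    return noErr;
                return eventNotHandledErr; /* not on text: Ink re-recognizes the strokes as the letter V */

            default:
                return eventNotHandledErr; /* let Ink fall back to the command-key equivalents */
        }
    }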

So the second thing we're going to take a look at is how to implement an alternates menu for a text engine. As you saw earlier, this is already supported in the Ink pad, and it's pretty straightforward and easy to implement. Actually, the Ink pad in Panther uses our public API as well and doesn't do anything special with it.

So in this case, you've got your text editor, or actually, this case has to be a text editor, and you want to support bringing up a list of alternates for the word that's been selected. Usually, it would be just a contextual menu click or whatever makes sense on that word. So how do you do that? First thing you need to do is handle the ink text event. And as we saw earlier, the ink text event is sent once we've finished recognition.

The user has written everything they're going to on the screen for a phrase, and we issue one ink text event per word that we recognize. So if the user wrote hello world or something like that, you're going to end up receiving two ink text events. And each text event contains an ink text ref, which, as we saw earlier, one of the pieces of information it contains is the list of alternates, basically recognition choices for that word.

And you'll end up handling that ink text event. The first thing for the ink text event is -- and actually, it's missing off that last slide -- there's actually one other parameter, which is: is it a keyboard shortcut? Because we handle single letters, things like that, and you could be holding down a modifier key at the same time, the user can actually write a command-key shortcut purely using the tablet, without actually using the keyboard.

In most cases for your application, you're probably not going to want to treat that as text. You're going to want to allow that to filter through as a standard command-key shortcut. So the first parameter is just a Boolean that says, you know, do we believe this is a command-key shortcut, command or control, along with a single character? Assuming it's not, then you're going to go through and extract the ink text ref from the event.

You're going to call ink text create CFString with that text ref. The zero parameter there just means it's the topmost choice. This is the piece of text that we think the user just wrote. You're then going to take that CFString. If your application doesn't want a CFString, you can get the data out of the CFStringRef.

But you're then going to insert that in your document. And one of the things you need to do is when you insert that into your document, you also need to take a look at the ink text ref and keep that around. And you need to associate that ink text ref with the word that you just inserted. The reason why is, as we'll see here in a moment, we end up using that ink text ref to create the alternates menu.
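
A sketch of that ink text handler, with the caveat that the parameter names (kEventParamInkKeyboardShortcut, kEventParamInkTextRef) and the typePtr carrier type are recalled from the Panther-era headers; InkTextCreateCFString and InkTextCopy are the create-CFString and copy calls Larry listed, and MyInsertWordWithInkRef is a hypothetical application routine that inserts the string and remembers the associated ink text ref.

    #include <Carbon/Carbon.h>

    /* Hypothetical application routine: inserts the word and keeps the InkTextRef with it. */
    extern void MyInsertWordWithInkRef(CFStringRef word, InkTextRef inkRef);

    static OSStatus MyInkTextHandler(EventHandlerCallRef inCaller, EventRef inEvent, void *inUserData)
    {
        Boolean    isShortcut = false;
        InkTextRef textRef    = NULL;

        /* If Ink believes the user wrote a command-key shortcut with the pen,
           let it flow through as a keyboard event instead of inserting text. */
        GetEventParameter(inEvent, kEventParamInkKeyboardShortcut, typeBoolean, NULL,
                          sizeof(isShortcut), NULL, &isShortcut);
        if (isShortcut)
            return eventNotHandledErr;

        GetEventParameter(inEvent, kEventParamInkTextRef, typePtr, NULL,
                          sizeof(textRef), NULL, &textRef);

        /* Index 0 is the top recognition choice, the text Ink thinks was written. */
        CFStringRef topChoice = InkTextCreateCFString(textRef, 0);

        /* Keep a copy of the ink text ref with the inserted word so we can build
           the alternates menu for this word later. */
        MyInsertWordWithInkRef(topChoice, InkTextCopy(textRef));

        CFRelease(topChoice);
        return noErr;
    }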

So you want to implement an alternates menu. A single API does it for you. Basically, you're going to create a menu ref. This could be one that you've already got lying around that you use; it's completely up to you how you want to handle it. You call ink insert alternates in menu and pass it the ink text ref -- this is why we needed to keep it around in that last step; it's the one that corresponds to the word that the user is clicking on -- and your menu ref. We then insert, just like you saw in that screenshot and like you saw in the Ink pad demo earlier, the five alternates, and we actually insert an item that draws the original ink, so that if the user looks at the choices and says, that's not what I wanted, they can then look at what they wrote -- oh, that was what I wrote. They can see it all on the fly there.

So we insert in the menu. It's up to you then. You just call pop-up menu select, contextual menu select, whatever makes sense for your application. And when you're done, you'll just get the selection that was made from that menu. And that's the piece of text that you should replace in your document. So say they wrote hello. It got recognized with a capital H. They really wanted a lowercase h at the beginning. The alternate menu would contain that item.

If the user selects that, you go grab that menu item text, replace the capitalized version with the opposite, you know, whatever the user chooses. And that gives you an alternate menu, an alternates menu. A single API along with a couple of the menu manager routines will do it for you.
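
Putting that together, a sketch of the alternates-menu step might look like this; the insert-alternates call is assumed here to be InkTextInsertAlternatesInMenu (check Ink.h for the exact name and signature), and MyReplaceSelectedWord is a hypothetical application routine.

    #include <Carbon/Carbon.h>

    extern void MyReplaceSelectedWord(CFStringRef newText);  /* hypothetical application routine */

    static void MyShowAlternatesMenu(InkTextRef inTextRef, Point inGlobalWhere)
    {
        MenuRef menu = NULL;
        CreateNewMenu(0, 0, &menu);

        /* Ink fills the menu with up to five alternates plus an item drawing the original ink. */
        InkTextInsertAlternatesInMenu(inTextRef, menu, 0);

        /* Show it as a pop-up; ContextualMenuSelect would work just as well. */
        SInt32        result = PopUpMenuSelect(menu, inGlobalWhere.v, inGlobalWhere.h, 1);
        MenuItemIndex chosen = LoWord(result);

        if (HiWord(result) != 0 && chosen != 0) {
            CFStringRef itemText = NULL;
            CopyMenuItemTextAsCFString(menu, chosen, &itemText);
            MyReplaceSelectedWord(itemText);   /* swap the word in the document */
            CFRelease(itemText);
        }

        DisposeMenu(menu);
    }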

Third scenario. We're going to look at input devices and how you can create a custom solution for your users. And the demo, I have a little demo here. It's actually pretty, it's really basic what our demo does, just for the sake of simplicity. It could vary a lot for whether or not you've got a pen or you could be gathering data just in a single application. It varies a lot. But essentially what you're going to do is collect the ink data in your application. And then you're going to tell us to perform recognition on it. And let me show you a demo real quick here of what I'm talking about.

So we've got a simple application here, and all it does is it's got an area in the window here where it's going to collect data. And in this case, I'm using a tablet, but that's just because I'm used to inputting it that way. This application, though, turns off ink recognition. We don't get in the way at all. The application itself ends up collecting the mouse events.

and draws them on the screen. So you'll see, you know, this is not the standard black, all that kind of stuff. The application is actually drawing all this on their own. They're collecting the data and then they end up sending us that data. We perform recognition on it. And just like the standard behavior is, we end up sending a series of ink text events and then you get the text out of that. So let me show you how we do that.

So first thing you're going to do, set the recognition state with ink set application writing mode. And you're going to tell us, don't do any recognition. Don't allow writing with ink services in your application. I'm going to handle it all on my own. For our demo app here, like I said, for sake of simplicity, we're just looking at mouse events. Depending upon your device and such, you could gather the data in whatever manner you wanted to.

And as we looked at earlier, we gather data just like we do internally. You're going to want to gather data into strokes. And strokes are simple: as we said earlier, basically a mouse-down, mouse-drag, mouse-up sequence defines one stroke. So you're going to gather data, and once you've gathered an entire stroke, you're just going to call ink add stroke to current phrase and send us the stroke along with how many points are in there, and we start building up a phrase for you. You continue doing this, iterating through for each stroke that you've gathered data for, and when you're finished, you call ink terminate current phrase, and we go ahead and perform recognition at that point.

So the code to do it, initial setup. In this case, we just have a mouse event handler. We turn off automatic recognition for our application. Don't want Inkwell getting in the way. And then we just have a simple mouse event handler. And as we go through, we're going to look for first a mouse down event. When we see that first one, we'll go ahead and start a new array of ink points, start building that up.

So you get a mouse down, then a series of mouse-drag events. For each mouse-drag event, you're just going to continue building on your array of ink points. And when you get a mouse-up event in our case, or whatever defines a stroke for your case, you'll call ink add stroke to current phrase, send us that array and the number of points in that array, and continue on in that loop.

And when you finish gathering the data, you just call ink terminate current phrase, no parameters or anything to it. We terminate the phrase that you were sending us the data for. We perform the recognition, send a series of ink text events for each word that we recognized, and business as usual.
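
Pieced together, the device scenario might look roughly like the sketch below. The InkPoint field names, the kInkWriteNowhereInApp constant, the parameter order of ink add stroke to current phrase, and the no-argument ink terminate current phrase are all recalled from the session and the Panther-era Ink.h, so verify them against the header before relying on them.

    #include <Carbon/Carbon.h>

    #define kMaxStrokePoints 2048

    static InkPoint gStroke[kMaxStrokePoints];
    static UInt32   gStrokeCount = 0;

    /* Call once at startup: keep Ink from collecting or drawing ink itself,
       since this application gathers and draws the strokes on its own. */
    static void MySetupInkCollection(void)
    {
        InkSetApplicationWritingMode(kInkWriteNowhereInApp);
    }

    /* Called for each point while the pen or mouse is down (mouse-down and mouse-dragged). */
    static void MyAddPoint(HIPoint where, const TabletPointRec *tabletData, UInt32 modifiers)
    {
        if (gStrokeCount < kMaxStrokePoints) {
            gStroke[gStrokeCount].point           = where;
            gStroke[gStrokeCount].tabletPointData = *tabletData;
            gStroke[gStrokeCount].keyModifiers    = modifiers;
            gStrokeCount++;
        }
    }

    /* Called on mouse-up: one down/drag/up sequence is one stroke. */
    static void MyEndStroke(void)
    {
        InkAddStrokeToCurrentPhrase(gStrokeCount, gStroke);
        gStrokeCount = 0;
    }

    /* Called when the user is done writing: recognition runs and one
       ink text event is delivered for each recognized word. */
    static void MyRecognize(void)
    {
        InkTerminateCurrentPhrase();
    }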

So the last scenario we're going to look at is controlling phrase termination. This is for applications where your user may be interacting with your application in, I don't want to say a nonstandard way, but you want to control when phrases are terminated. You want to control when you're getting the data. You want the user to be able to continue writing until the user says, I'm done with this. We're going to do this by looking at the ink point events that we talked about earlier and by doing your own custom phrase termination.

So let me show you a quick demo of that as well. That will show you. So in this case, it's just a window. And we don't do anything special here except for the fact that we have an API that says turn off automatic phrase termination. So in this case, the application is in complete control.

And as Giulia was saying earlier, say the phone rings. Oh, I want to go answer the phone or whatever. Whatever makes sense for your application. Notice I've taken the pen away. I'm waiting a really long time. No termination is happening. You're in complete control of it. The thing you need to keep in mind here is you're in complete control of it. The phrase will not get terminated until your application says to terminate it. So this ink will stay here until you finish.

So we have a button down here at the bottom, and if I actually clicked on it, there we go. And we've terminated the phrase. Recognition happens. We send ink text events. We get the data out of it. Go along from there, business as usual. How we do that... So one step back, as we said, automatic termination can be turned off using ink set phrase termination.

Giulia went through earlier when termination happens normally. You actually also have a finer grain of control: you can turn off just the termination that happens when the pen leaves proximity, or just the termination that happens on a timeout, when it takes too long. And if you turn off termination altogether, then you control termination through ink terminate current phrase.

So the way we do this, one of the things you need to keep in mind is we continue handling all those mouse events that are coming through. So normally, notice we had a button in there that controlled phrase termination. Normally, we're going to continue writing. Once we've started a phrase, we don't send anything through to controls. We don't do anything. We can keep sucking all that data in as further data for writing.

One of the things we talked about earlier was for each mouse event that we get through that we've decided, okay, the user's now inking. For each mouse event that comes through, we send an ink point event. And the ink point event contains a single parameter, which is actually the original mouse event. And things you can use this for is controlling whether or not to even allow inking in a certain region. Say a phrase is started, but you don't want to allow inking outside of a certain region in your window.

Or in our case, what we're doing with this application is you want to, if a mouse down event happens in a certain area, such as on a button or things like that, you want to do something special with that. So if you return event not handled there for the ink point, you can just look at it, see if you want to do anything with it.

If you return event not handled, we continue on with recognition, continue sucking all that data in. If you return no error, then we basically drop that point and repost the original mouse event. So say you've looked at an ink point event and you've determined, okay, it landed in an area where I don't want to treat it as ink; I want to terminate, or I just want it reposted as a mouse event so it gets to my control. You just return no error for that ink point event.

So the steps to do it, ink set phrase termination or turn off automatic termination. You've got an ink point event handler that checks the location of mouse down events in our case. If the mouse down lands on basically the region where that terminate phrase button is, and if there's currently a phrase in session, then you just call ink terminate current phrase, and we finish. We do recognition, and you get the ink text events. Otherwise, just return event not handled error, and we continue building up the ink data.

The code to do it is simply you handle an ink point event. It's not the k-event ink text event that shows up in there. It's actually an ink point event. You install a handler on it, and you call ink set phrase termination and say termination none. Don't do any automatic termination. The application is in complete control.

In the ink point event handler, you're going to grab the mouse event out of there, the single parameter. You're going to check to see, for our application's purposes, was that a mouse-down event? If so, check to see, you know, hit test, where did that event land? If it landed on the area where your button is or, you know, wherever you're trying to terminate, then check to see if there's currently a phrase happening, you know, are they actually inking right now? And if so, terminate the phrase and return no error; everything finishes, and we perform recognition. Otherwise, like I said, just return event not handled error, and we continue building up the data.
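
A sketch of that handler, under the same assumptions about names: kInkTerminationNone for the mode, kEventParamEventRef as the parameter carrying the original mouse event, and MyPointHitsTerminateButton as a hypothetical hit-test routine; the real InkSetPhraseTerminationMode may take additional parameters, so check Ink.h.

    #include <Carbon/Carbon.h>

    extern Boolean MyPointHitsTerminateButton(HIPoint where);  /* hypothetical hit test */

    /* At startup: take over phrase termination entirely. */
    static void MyDisableAutoTermination(void)
    {
        InkSetPhraseTerminationMode(kInkTerminationNone);
    }

    static OSStatus MyInkPointHandler(EventHandlerCallRef inCaller, EventRef inEvent, void *inUserData)
    {
        EventRef mouseEvent = NULL;
        GetEventParameter(inEvent, kEventParamEventRef, typeEventRef, NULL,
                          sizeof(mouseEvent), NULL, &mouseEvent);

        if (GetEventKind(mouseEvent) == kEventMouseDown) {
            HIPoint where;
            GetEventParameter(mouseEvent, kEventParamMouseLocation, typeHIPoint, NULL,
                              sizeof(where), NULL, &where);

            /* If the pen went down on our "done writing" button while a phrase is
               being built, terminate it now; recognition runs and ink text events follow. */
            if (MyPointHitsTerminateButton(where) && InkIsPhraseInProgress()) {
                InkTerminateCurrentPhrase();
                return noErr;          /* Ink drops the point and reposts the mouse event */
            }
        }

        return eventNotHandledErr;     /* keep collecting this point as ink */
    }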

And that's all you need to do for that. So summary things to keep in mind. If you're keeping count, we have actually less than 20 APIs total. They're really straightforward. We put a lot of effort in making them easy for you to use without a lot of complexity.

And the Carbon events that we have basically take advantage of the current event model. They're really easy to just flow into your current event handlers. And basically, we just want to encourage you to add more support to ink, allow your users to take advantage of the services that are provided to you, and ink different.

So, as I mentioned earlier, we have two documents that are available on the ADC website, and they're already up there. One is the Ink Services Reference, which goes through and actually documents all of these APIs, and goes into them pretty in-depth. We've got some sample code in there as well that shows you how to use them. And we also have an overview document that shows you how to integrate Ink services into your application and goes through some of the samples we've got here. Review all that and see how you can take advantage of it.

The application that I was just using here is called InkSample. We've also got the code for that posted, so you can go take a look at how we do it. And if you're interested in more information on how the recognition technology works, Larry's got a website. It's pobox.com slash tilde Larry Y. And he's got a lot of good information up there as far as how the recognition system works. And if you have further questions, you can contact Xavier and send him questions, feedback.