Application Technologies • 1:01:28
Make your application ready for new markets by designing with the world in mind. Learn how to take advantage of Mac OS X's robust suite of international technologies to support languages and regions around the world. We'll focus on Unicode text support, international date and number formatting, and language localization issues.
Speakers: Lee Collins, John Jenkins, Deborah Goldsmith
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
I'm going to be talking today to you about how to take your application global. That means providing the same quality of user experience that you provide to your users here in the US and other English speaking markets to users around the world. Let's think first of all about why you care and why Apple cares about this. Take a look at this pie chart. It shows a recent breakdown of our revenues. As you can see, a big chunk of our revenues are now coming from outside of the US.
So we want to sell more systems. We need your applications to sell those systems. And a lot of those systems are going to be sold and are sold now outside of the US. You want to sell more applications. You want a lot more Mac systems out there to sell your applications on. So you need to think about your customers outside of the US too.
Let me first talk a little bit about what it is we're talking about. Let's define some terms. There are two related disciplines involved in taking an application global. One is localization. Localization is essentially preparing your application to be used in a particular market, making the user experience good enough and natural enough so users will want to buy your application. The other is internationalization. Internationalization is a design methodology, and it's a set of services that you provide, or that we provide through the system, that enable you to take your application to a particular market.
So for example, say you want to take your application to the Middle East. Well, if you haven't built your application so that it can handle complex bidirectional text, even if you send it out to be localized, it's not just a matter of localizing the strings. Your text has to be able to display those strings in Arabic.
So it's more than just a translation of the user interface. It's also a cultural awareness and sensitivity to users in your markets. Apple learned this a long time ago when we first shipped HyperCard. We shipped a stack that contained a number of things that people thought represented fun. One of those things was a woman in a bikini on a beach playing with a beach ball. Well, that may be fun to many people here in the United States, but in other countries that can be a source of outrage. So you have to attune the content of your application to users in a particular market. Avoid cultural faux pas.
So let's look at a typical case. You start out, you've written your application. You have a window that displays a list of names. You've done everything, probably done everything right. Maybe using Cocoa, it's very easy to display the text, bring up the window. This is the way your application should look to a Japanese user.
Now, in many cases, this will just happen for you. But say there's a point where you're getting data from the internet somewhere, and you didn't know how to translate that data correctly into Unicode. Your application could end up looking like this. So these are some of the points we're going to cover today, and we'll show you how to avoid this kind of situation.
So what is the ideal global application? Basically, it's really the same application that you've developed in the US, but it's an application that feels natural and provides a pleasing experience to users in any country. So you've considered the different content of the language. You've set it up so it can be localized without having to change the software. If you have to completely rebuild your application to take it to a new market, that's really a very expensive process and it's not a good design.
If you want to take it to any market around the world, especially growing markets in the Middle East, South Asia, then you have to be able to handle complex and bidirectional scripts such as Arabic, Devanagari used for Hindi, and other languages. And finally, you have to be able to present data in formats that people understand intuitively. Date/time formats, number formats, they vary around the world. So the ideal we're presenting is very high, but fortunately we provide you with a great set of comprehensive and powerful APIs to do this.
So these are the things that we're going to cover today. We're going to cover basically everything you need to know to create that ideal application. There are a lot of things we have to cover. We're going to cover the high points. We'll point out the important things. You'll know everything that's there.
You won't know the details, but you'll know where to go to the Apple Developer Connection to find out the details when you actually sit down and try to solve some of these problems in your application. So it's going to be a whirlwind overview, but it will be complete. Before going on to our overview, let's talk about what we've introduced in Tiger.
So one of the big features is OpenType layout. Now, the layout tables are the tables in the font that allow you to do high-end typographical things like compose ligatures, select glyph variants such as swashes and positional forms, and choose number styles, old-style numerals, for example.
Now we supported the OpenType font format itself, the outline format, almost from day one in OS X. We've heard from you. We know that OpenType layout is important. We've begun to add it in Tiger. We're starting with Roman and Japanese. We will eventually work our way through the full set of Unicode.
We'll provide ways for you to implement that if we don't. So now if you bring your nice Adobe font with all of the high-end typographical features onto OS X, it'll just work. And you don't have to do anything extra, because our APIs handle it, as long as you're using system APIs to do line layout and rendering.
We've also provided some other features in Tiger, for example, string translation. You can now take a string of Chinese text in Han characters and convert them into the corresponding pinyin, for example. Unicode type selection. This has been available in the Finder already as an SPI since Panther. Now we're making it available to application developers who want to do their own navigation through lists, etc.
We've greatly extended the set of Unicode characters covered in our basic fonts. Times and Helvetica now contain fairly complete repertoires covering extended Latin, Greek, Cyrillic, and other symbols and things. We're continuing our language support extensions. We added Tamil in Tiger. No jokes about the Tamil Tigers.
[Transcript missing]
et cetera. And this is due to our participation in a project called the Common Locale Data Repository, which is hosted by the Unicode Consortium. We're active members in this. We get data from them.
We provide data back. We've considerably beefed up the support for manipulating some of these formats, including numbers, dates and times, et cetera. You can now spell out a number the way you say it in English, for example. There are new classes to manipulate calendars, NSCalendar and CFCalendar, in the Cocoa and Carbon worlds. So for example, you can pick a Japanese emperor's reign and iterate back to the previous emperor, if you want to find out who that was. There's also an NSDatePicker that provides a user interface for selecting calendar dates.
Another big feature is we now support calendars other than the standard Gregorian format that most of us are familiar with. Here's an example of those calendars: Japanese, Thai Buddhist, Islamic and Hebrew, a date from each, shown as localized to the U.S. English locale. Note that the calendar is orthogonal to how you present it for a particular locale. Now I'm going to turn over the session to John Jenkins, who's going to give you a demo of some of these new features.
[Transcript missing]
You'll see that the ligatures are forming automatically just as you would expect. And in point of fact, if I bring down the set of ligature features that are available in the font, you'll see that the common ligatures are on by default. So I can select the text, I can turn that off, turn that back on.
I can...
[Transcript missing]
The first example here is if I turn on historical forms and ligatures, I'm getting long S's. And if I turn on rare ligatures, I get an actual long S, long SI ligature, which is present in Minion Pro. And for all of this, we also have letter casings, so I can turn on small caps. I can turn on small caps from capitals. These are all features which are built into the font, and I'm getting them without any effort.
One last thing I'd like to show in terms of the OpenType support here is the number case and number spacing. So typically, numbers are monospaced for lining purposes. I can make them proportional if I want. I can switch to old-style numerals if I want, which looks better. Interestingly enough, Minion is set up so that small caps turns on the old-style numerals.
And I can override that and force them back to be lining if I want. And again, I can make them monospaced or I can make them proportional just as I desire. So all of these features are defined in the font. These are percolating up to the user. And if you're using the Unicode Drawing APIs, all of this will happen automatically.
For the internationalization, we have here a simple application. This is a Cocoa application to do diary stuff. And it's very simple, straightforward. I can put down, what is the weather today? It's a little cloudy. And I can talk about what I did and navigate back and forth just as you would expect. Now this, as I say, is a Cocoa application. The focus of this session is on Core Foundation.
But in some sense, that's irrelevant. Core Foundation and Cocoa are very heavily toll-free bridged, which means that you can freely intermingle the two technologies. And in point of fact, even though this application was written for Cocoa, it makes extensive use of Core Foundation. And the main difference between this application and a real live application is the fact that I have a locale menu and a calendar menu.
And I've done this simply for demo purposes. Ordinarily an application would use whatever the user has set in the International preferences. Let's look at the locale menu first. And the first thing you'll notice is that this is a big menu. In point of fact, it's so big that I'm organizing it by language and it's still a very big menu.
Now there are a couple of differences between this and the locale menu that you see in international preferences. The first thing is that you have the umbrella language locale. This is available so that you don't have to do a localization over and over again. For a language like French, if I wanted to localize, I can do a generic French locale and then I can localize and override that as necessary for Belgium, Canada, Switzerland, France, and Luxembourg. So that's different.
And the other thing that I'm doing differently is just to be fancy, I'm getting the name of the locale as displayed in the locale itself. International preferences uses the system locale. So I can do this fancy text and stuff. So if I switch to, let's go to Gaelic.
[Transcript missing]
Now that we have support for additional calendars, if I switch to another calendar, okay, now this is my own date picker, because NSDatePicker is a little too small for our purposes here. But you'll notice that, first of all, it switched the number of days in the month. It knows what day of the week the month starts on. It knows how many days there are in the month.
So I've been able to query the system and get all of this information. And in doing date arithmetic, if I want to go back a month, it leaves me on the 26th of the month, just as you would expect it to do, because I'm using the system calls to do the date calculations. Same thing for the year.
It leaves me on Iyar 26 as I go back and forth through the years. So all of this is happening automatically. I'm using the system calls. I'm getting all of this functionality. And now that you've seen some of the things that you can do, I'm going to turn it back to Lee, and he will show you how to do them. Thank you.
Thank you, John. Now all those ancient dates I remember from Roman history will be of use. So, taking your application global. The place you begin is at the lowest level: text storage and access. In OS X, text means Unicode text. At the high levels, that's UTF-16. At the lower level, BSD, that can be UTF-8. We use Unicode because it's comprehensive, it's a standard, both international and industry, and of all the solutions out there, it does a great job given the complexity of the problem.
Now the one thing to remember about Unicode is that it's not simply a 16-bit ASCII. The problem is too complex. In fact, it's not even 16 bits anymore. The current version of Unicode contains over 96,000 characters. And it breaks that 64K wall to get into the larger character space by using something called surrogate pairs. These are well-defined ranges of 16-bit units that can be paired together to represent a single character. For example, we see such a surrogate pair for a Chinese character here.
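The surrogate pair arithmetic can be sketched in a few lines. This is only an illustration of the UTF-16 rule, written in Python rather than against the system APIs; the function name `to_surrogate_pair` is made up for the example, and U+20000 is just a convenient Han character outside the 16-bit range.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point fits in one 16-bit unit or is out of range")
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits go in the low surrogate
    return high, low


high, low = to_surrogate_pair(0x20000)
print(f"U+20000 -> {high:04X} {low:04X}")
```

The same pair comes out of any UTF-16 encoder, which is a handy cross-check when you're debugging text that looks like it has "extra" characters in it.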
So the main thing to remember here is that a Unicode code point is not simply a 16-bit unit. Another interesting feature is that the things a user sees as characters on the screen and in a document are not necessarily the things that are stored in the backing store of your text.
So for example, a base letter with an acute accent, as in résumé or the Vietnamese word Việt, could actually be multiple Unicode characters in the backing store. So even if you're looking at Unicode code points, you still have to consider that what the user sees as a character may be different from what you're seeing in memory. We provide ways for you to manage this.
Finally, for primarily historical reasons, Unicode has allowed multiple encodings of some of these combinations of base letters and accents. So for example, all of the commonly used accented letters, E with an accent, A with an umlaut, et cetera, have two representations within Unicode: as a single precomposed Unicode code point, and as multiple code points representing the base letter and the accent. And in the future, all extensions to these kinds of pairs or triples of letters will come through composition, not through precomposition. A side effect of this, though, is that you can encounter multiple spellings in a document.
has a one-to-one correspondence with what's in the backing store, as we see with the word résumé. It may be fully composed Unicode. It may be fully decomposed, or it may be a mixture. You have to be prepared to handle this case. So that's Unicode in 60 seconds. If you keep these points in mind when you are working with Unicode directly as text, not doing line layout or higher level functions, you'll probably avoid a lot of common errors.
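To make the "multiple spellings" point concrete, here is a small sketch using Python's standard `unicodedata` module. It is not the system API discussed in this session; it only shows that the same visible word can have different backing stores, and that normalizing both sides before comparing is one way to cope. The helper name `same_text` is an assumption for the example.

```python
import unicodedata

composed = "r\u00e9sum\u00e9"        # each é is one precomposed code point
decomposed = "re\u0301sume\u0301"    # each é is e + U+0301 COMBINING ACUTE ACCENT
mixed = "r\u00e9sume\u0301"          # one of each


def same_text(a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(len(composed), len(decomposed), len(mixed))   # three different lengths
print(composed == decomposed)                       # raw comparison fails
print(same_text(composed, decomposed), same_text(composed, mixed))
```

All three strings draw as the same word, so a search or comparison that works on raw code points will miss matches unless it normalizes first.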
So this is Unicode as you look at the data types that support it in the system. At the lowest level you have a UTF-16 Unicode unit, UniChar, with mixed caps, or unichar, all lowercase, if you're working in Cocoa. Now, arrays of UniChars are fine for storing text and for passing text around the system, especially large chunks of text. But when you need to actually process a single Unicode code point, which could be one of these surrogate pairs or even a 32-bit value, you need to look at it differently.
That's where you need to actually extract a Unicode code point from your UniChar array. This single Unicode code point has a type, UnicodeScalarValue, or UTF32Char. So for example, if you want to find out the property of a character, you want to know if a character is a number, if it's a punctuation mark, etc., you first need to get the Unicode scalar value from your UniChar array, and then do the test. And we provide ways in the system for you to do this easily.
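The "get the scalar value first, then test it" order looks roughly like this. This is a minimal Python sketch, not the system routines: the list of integers stands in for a UniChar array, `scalar_values` is a hypothetical helper, and `unicodedata.category` stands in for whatever property test you need.

```python
import unicodedata


def scalar_values(units):
    """Yield Unicode scalar values from a sequence of UTF-16 code units."""
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine a high and low surrogate into one scalar value.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # A plain 16-bit unit (an unpaired surrogate is passed through as-is).
            yield unit
            i += 1


units = [0x0041, 0x0037, 0xD840, 0xDC00, 0x002C]   # "A", "7", U+20000, ","
for value in scalar_values(units):
    print(f"U+{value:04X}", unicodedata.category(chr(value)))
```

Testing the two halves of the surrogate pair separately would report them as surrogates, not as the letter the user actually sees, which is why the extraction step comes first.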
At the higher level are string classes. John mentioned we have corresponding string classes for both Core Foundation and for the Cocoa world. And they're toll-free bridged, as John mentioned, which means that you can mix them together. These are CFStringRef, CFMutableStringRef, and, new in Tiger, CFAttributedStringRef, which allows you to associate arbitrary attributes with any string or any ranges within a string.
Now, given that we all now know that we're going into this beautiful world of having to deal with very different processor architectures, when you're dealing with Unicode, you have to keep a few things in mind about the ordering of the bytes. All Unicode strings in memory are assumed to be native-endian. So on Intel, it's going to be little-endian. On PowerPC, that'll be big-endian. The external representation of Unicode is always going to be either marked with a Unicode byte order mark, that's the BOM there, or it's going to be assumed to be big-endian. And this is important, whether you're getting the data from a file or you're getting it off the pasteboard. It could be in a byte order different from the one you expect based on the architecture your application is running on.
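The rule for external data, honor the BOM if there is one and otherwise assume big-endian, can be sketched as follows. This is an illustration in Python, not the system's own conversion code, and `decode_external_utf16` is a name invented for the example.

```python
import codecs


def decode_external_utf16(data):
    """Decode UTF-16 bytes from a file or pasteboard, whatever the host CPU is."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No byte order mark: external Unicode is assumed to be big-endian.
    return data.decode("utf-16-be")


text = "Tiger"
with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
without_bom = text.encode("utf-16-be")
print(decode_external_utf16(with_bom), decode_external_utf16(without_bom))
```

The point of the sketch is that the decision is made from the data, never from the byte order of the machine the code happens to be running on.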
So how do you store your text? Well, the most convenient way to store text is CFString, CFMutableString, or if you want attributes on it, the new CFAttributedString. Now it's pretty easy to create a CFString from a UniChar array. Just call CFStringCreateWithCharacters. So if you have Unicode text, that's really easy. An associated problem is how you figure out what the encoding of your text is if it's not Unicode text. I'll talk about that in a few minutes.
Now one thing to think about when you're using a CFString is that there can be some inefficiencies if you try to access the characters one by one. So a useful function inside of CFString is a function that allows you to get the characters out from what's called an inline buffer.
This is the fastest way to iterate through the contents of a CFString. And then finally, as I mentioned, we have ways for constructing and for querying the attributes of the new CFAttributedString. These attributes can be any arbitrary attribute you want to associate with the string, including font name, font size, language attributes, et cetera.
So given that the thing that the user sees, the character cluster or the grapheme, as Unicode calls it, may be different from what's in your backing store, you need to be sensitive to this question when you're allowing a user to select or to cursor through a string. Because you want the user to be able to select a character in the same unit that they see. So if they see a single accented letter, they typically expect to cursor over that as a single unit.
So we provide ways for you to do this. Now I want to first note that this is not for line breaking. It's only for processing or cursor movement. And the way you do this is you call CFStringGetRangeOfComposedCharactersAtIndex. So for example, if you wanted to break after a cluster, what you would do is start at a known cluster boundary and call this function. You get a CFRange back that tells you the origin of the cluster that you're at and its length. Sum those two together, and then you'll be on the other side of the cluster. It's pretty easy.
So a big issue actually is not dealing with Unicode proper, but dealing with legacy and external text that's not in a Unicode encoding. This is very typical when you're dealing with older non-Unicode documents or text that comes off the internet. So the basic problem is to figure out what encoding you can pass to one of the functions that creates a CFString or converts the text directly to Unicode as a UniChar array. And there are four basic scenarios that we commonly encounter where you need to be careful in choosing your text encoding.
The first scenario is you've got a string that you use internally to your program, say to identify a resource within your bundle, say a file. This string is never going to be seen by an end user. It does not need to be localized. So the most appropriate encoding for that is ASCII, or if you want to be more general, UTF-8. Don't try to use the system encoding, because you can actually create problems, such as not finding that resource.
In this case, you already know what the string is, so you don't have to call an API to determine the encoding. The next case is when you're dealing with a plain text document. Now, there are very complex algorithms that will try to test what the encoding of a plain text document is, but there's no really foolproof way. Currently, our recommended way of doing this is to query the system for the current system encoding. The system encoding corresponds to the preference that the user has picked in the International preferences panel for his or her language.
So when you encounter that plain text document, call CFStringGetSystemEncoding, and that will return a text encoding that you can then pass to one of these functions to create a CFString. That will work most of the time. Now if you have styled text, you have somewhat better information, especially if it comes from an older Macintosh system.
Because in the OS 9 encoding system, every font had a corresponding encoding. So you can map that font information to generate a text encoding. And for that we provide ATSFontFamilyGetEncoding. So for each style run within that styled text document, you just grab the font, map it to a text encoding, and then convert that run, creating a CFString or translating it to UniChars using the returned encoding.
Finally, if you're getting data from the internet, we assume that you know how to get the MIME charset information from that data, whether it's mail or HTML. And that will come as a string. You pass that string to one of our functions, CFStringConvertIANACharSetNameToEncoding. And that will return an encoding appropriate for the internet data that you're working with.
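The internet case, going from a charset name to an encoding and then to Unicode, can be sketched with Python's `codecs` registry standing in for the system's charset-name lookup. The function name `decode_with_charset` and the sample text are assumptions for the example; the last line shows the kind of garbage you get when the bytes are decoded with a wrong guess instead.

```python
import codecs


def decode_with_charset(data, charset_name):
    """Map a MIME/IANA charset name to a codec and decode the bytes with it."""
    codec = codecs.lookup(charset_name)    # raises LookupError for unknown names
    return data.decode(codec.name)


# Pretend these bytes arrived in a message labeled charset=Shift_JIS.
raw = "日本語".encode("shift_jis")

print(decode_with_charset(raw, "Shift_JIS"))   # decoded with the declared charset
print(raw.decode("mac_roman"))                 # wrong guess: unreadable text
```

The lookup is case-insensitive here, which matters because charset labels on real mail and web pages are not consistently capitalized.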
Now let me talk about how you set up your application for localization. So the way localization works on OS X is probably in a way you're very familiar with. It's the same mechanism used to package your application. It's part of the CFBundle mechanism. You already know that you can put resource files associated with your application into your bundle and CFBundle will find them for you.
Well, a further step to that is to add .lproj directories. Each .lproj corresponds to a localization that you want to implement. When the user has selected a particular language, CFBundle will find the resources that correspond to that .lproj. Here we have examples of English, Japanese, and I'm sure some of you in the audience will be able to identify the third .lproj directory.
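As a rough mental model of that lookup, and only a mental model, you can think of it as walking the user's language preferences in order and taking the first one the bundle has a directory for. The Python sketch below is not CFBundle's actual matching logic; `pick_localization`, the directory names, and the fallback to a development language are all assumptions for illustration.

```python
def pick_localization(available, preferred, development_language="English"):
    """Return the .lproj name for the first preferred language the bundle has."""
    for language in preferred:
        if language in available:
            return language + ".lproj"
    # Nothing matched: fall back to the language the app was developed in.
    return development_language + ".lproj"


available = ["English", "Japanese", "German"]
print(pick_localization(available, ["French", "Japanese", "English"]))
print(pick_localization(available, ["French", "Italian"]))
```

The useful consequence is that adding a market means adding a directory of resources, not changing the lookup code.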
Now, a very convenient way to package up your UI is to use what are called nib files, which I believe stands for NeXT Interface Builder, because nib files can be built and edited using Interface Builder. Nib files contain whole UI elements such as menus. You can use them either in Carbon or in Cocoa. Here are some functions you can call to load nib files in Carbon.
And the beauty of the nib file is it can be localized. It's a complete localization for an element. And it can handle issues like the different string lengths that you encounter when you move between languages. So you have one nib you would localize for English, another for German. And the much longer German strings would be handled by the nib. And also you can farm out your nib work to third parties if you want, because you do not have to be a programmer to build and edit a nib.
Now there will always be times when you need to use localized strings in your application. Nibs may not always be sufficient. So the way you do that is you create a Localizable.strings file. Now, a Localizable.strings file is simply a set of key-value pairs. The key is a key that you determine when you call a function within your program to get that string. The value is the localized string. Here are examples of strings files for US English and for Japanese.
Now inside of your code you'd call a function like CFCopyLocalizedString. You call it with the key and then with what should be a useful comment about how the string is being used. The reason for the comment is both for you as a programmer, when you're looking at your source code, to remember what the string is for, but also there's a tool called genstrings that you can run over your source code that will output all of the localized strings as a Localizable.strings file.
And the output will contain the key and the comment. You can then pass that to a localizer who can read the comment and know how to translate it into the target locale. And now I'll turn things over to Deborah Goldsmith, who will tell you about a lot more things like drawing, etc.
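The key-value idea behind a strings file can be shown with a toy parser. This Python sketch is not how the system reads strings files: it handles only the simple `"key" = "value";` form, does not process escape sequences, and the sample keys and translations are invented. Returning the key itself when no translation exists is written here as an assumption of the sketch.

```python
import re

# Matches lines of the form: "key" = "value";
PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;')


def parse_strings(source):
    """Build a key -> localized value table from strings-file text."""
    return {key: value for key, value in PAIR.findall(source)}


def localized(table, key):
    """Look up a key; fall back to the key itself if it has no translation."""
    return table.get(key, key)


japanese = '''
/* Title of the button that saves the diary entry */
"Save" = "保存";
/* Title of the button that discards changes */
"Cancel" = "キャンセル";
'''

table = parse_strings(japanese)
print(localized(table, "Save"), localized(table, "Quit"))
```

Note how the comments travel with each pair. They are ignored by the lookup but are exactly what the translator reads.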
Thank you, Lee. I'm going to take over as your tour guide for the second part of our whirlwind tour of international APIs. So let's keep on going. Okay, there we go. The next topic we're going to cover is drawing and editing the text in your application. Now, the easiest way to take care of text in your application is to use one of the text editing engines that is built into Mac OS X instead of doing all the drawing and editing yourself. And for Carbon applications, they're all based on MLTE, the Multilingual Text Engine. Now, MLTE is a very rich and powerful API, and we're not going to have time to go into it in detail today.
There's great documentation in the ADC reference library, which is installed on your system and also available at developer.apple.com. You should start off with handling Unicode text editing with MLTE. A new feature for Tiger that I'd like to point out is that now if you're importing or exporting plain text files, you can control the encoding that gets used instead of having it always use the system encoding.
MLTE is great to use, but an easier way of using it is some of the prepackaged versions of MLTE that we have. One is HITextView that wraps MLTE up in an HIView object, which can be part of an HIView hierarchy. If you're using controls, you want to use the EditUnicodeTextControl, which is also based on MLTE. You want to use that instead of the older EditTextControl, which is not capable of handling Unicode. And there are APIs available for creating all of those versions.
One thing that we recommend that you not do is try to render Unicode text yourself. Unicode is not 16-bit ASCII, and converting Unicode into glyphs to be displayed on the screen is a very complex process. Lee already mentioned the normalization issues, whether a given string consists of precomposed characters, or base letters and combining marks, or a mixture of the two. So any rendering engine has to handle that.
In addition, some scripts are so-called complex scripts, and the way that individual characters get mapped into the glyphs that display on the screen can be very complicated. Here we've got an example from Hindi, and you can see that the glyphs, the characters don't wind up in the same order once they get translated for display on the screen or the printed page. In addition, languages like Arabic and Hebrew are bidirectional, that is, there are some parts of the text that go left to right, and other parts that go right to left, and that's very complicated to figure out.
We'll say a little bit more about that in a moment. Even for so-called simple languages like English, you can have ligatures and typographic features in high-end fonts, as we saw earlier in the demonstration. Here's an example of a ligature in our Zapfino font, where the word "Zapfino" becomes a very elaborate ligature. So all in all, this is something that we recommend you stay away from.
Let's talk a little bit more about handling bidirectional text. Unicode uses something called the bidirectional algorithm in order to figure out which parts of the text go left to right and which parts go right to left. Now, Arabic and Hebrew are normally written right to left, but they're usually written together with other languages and also with numbers that go left to right. So you're always going to have two directions of text when you're dealing with those languages.
The process of mapping those pieces of text to where they wind up in the final display is rather complicated. You can have text at the end of a paragraph that can affect how the pieces at the beginning of the paragraph get laid out and what direction they go in.
So in order to figure this out, you need to look at an entire paragraph of text. Now, the system will take care of this for you, but in order for it to do its job correctly, you have to give it an entire paragraph of text. In older systems, using QuickDraw, you might have used an approach where you took a run of text that was all in the same font, drew it with one call, then moved to the end of that and drew the next run of text. That approach won't work for handling bidirectional text. You have to let the system look at the entire paragraph in order to figure out where everything goes. Once it's done that, you can then break that up into pieces and draw them yourself, but it needs the whole paragraph to figure out where things go.
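You can get a feel for why a run-by-run approach fails by looking at the directional classes the bidirectional algorithm starts from. The Python sketch below only prints those per-character classes from the standard `unicodedata` module; it does not implement the algorithm itself, and `bidi_classes` is a name made up for the example.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letters, a space, three Hebrew letters, a space, and digits.
paragraph = "abc \u05d0\u05d1\u05d2 123"
for ch, cls in zip(paragraph, bidi_classes(paragraph)):
    print(f"U+{ord(ch):04X} {cls}")
```

Spaces and digits come out as neutral or weak classes, so their final direction depends on what surrounds them, and that is information a single font run does not have.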
Now fortunately, you don't have to worry about this most of the time because the system APIs will do it for you. If you're drawing short strings, for example, if you've got a drawing program like Illustrator and you want to draw short snippets of text, then there are several APIs available for doing that.
If you're okay with using the theme fonts, for example, if you're drawing a user interface element, you can use HIThemeDrawTextBox. If you want a little bit more control over the fonts and rotation, CGContext stuff, then you can use one of the two text box APIs shown here. One works with text in a CFString, and the other works with text stored as arrays of UniChars.
If you're drawing a lot of text, for example, if you're implementing a word processor or a web browser or something that has to do a lot of complex text operations, then if you're working in Carbon, the API that you want to use is ATSUI, the Apple Type Services for Unicode Imaging. And again, ATSUI is a very powerful and very rich set of APIs, and we're not going to be going into detail about it here today.
There's a document, Rendering Unicode Text with ATSUI, which is in the ADC Reference Library, and I highly recommend that as a starting point for learning about what ATSUI can do. I will give a very brief overview of the architecture of ATSUI. The two most important objects when dealing with the ATSUI APIs are the text layout object, which represents one or more paragraphs, and the style object, which lets ATSUI know where your style run boundaries are and what typographic attributes you have on a particular run of text. Not just the font, size, and style that you may be familiar with from QuickDraw, but also things like typographic features, such as whether to turn ligatures on or not. ATSUI is very powerful and very customizable.
When I say that you give it an entire paragraph of text, that doesn't mean you give it the paragraph and it takes care of making all the decisions. You can do a lot of customization and figure out where all the pieces of that paragraph wind up. Rather than drawing, you can even get a glyph array out as the end result of the process, and then you can use that glyph array with Core Graphics and do all sorts of effects, or you can even get curves and use those with OpenGL or other sets of APIs.
Well, the flip side of drawing text is inputting text. If you're not using a prepackaged text editing engine or a text control, you'll need to handle text input coming from the user as the user types. Now, if you're already supporting languages like Chinese or Japanese in your application, then you probably know about TSM, the Text Services Manager.
TSM is the set of APIs that an application uses in order to support inline input, the ability to type a language like Japanese right in your document instead of having to do it in a separate window. If you're already supporting TSM in your application, then it's very easy to support Unicode as well. When you create your TSM document, you just indicate that it's a Unicode document instead of an older variety.
You also want to use the text input class of Carbon events to receive your input and not the lower level keyboard events, because if you work at the level of individual keystrokes, the input method doesn't have a chance to do its job, and input won't work correctly for users that use input methods.
If you're using that class of Carbon events, really the only one you have to worry about is the Unicode for Key event, which has a string of Unicode text coming from either a keyboard or an input method. In order to support inline input from an input method, you just use the same Carbon events that you would use in a non-Unicode application, and I won't go into detail on that. You can learn about that in the TSM documentation.
Okay, let's move on to analyzing and modifying text, or maybe poking and prodding would be a better way of putting it.
[Transcript missing]
So if you're sorting a large amount of text, that can make it go much faster because you're only doing the prep work once for each string.
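The API being described just before this point is lost in the transcript gap, so the following is only a sketch of the general idea: do the expensive per-string preparation once, then sort by comparing the prepared keys. It uses the standard C function `strxfrm` as a stand-in, not the Mac OS X API from the talk, and `KeyedString`, `compare_keys`, and `sort_with_keys` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string paired with its precomputed collation key. */
typedef struct {
    const char *text; /* original string, not owned */
    char *key;        /* key built once by strxfrm, owned */
} KeyedString;

/* Keys from strxfrm compare correctly with plain strcmp. */
static int compare_keys(const void *a, const void *b)
{
    const KeyedString *x = a;
    const KeyedString *y = b;
    return strcmp(x->key, y->key);
}

/* Sorts `strings` in place using the current C locale's collation order.
   Each key is built exactly once, no matter how many comparisons qsort makes.
   Returns 0 on success, -1 if memory runs out (order is then unchanged). */
int sort_with_keys(const char **strings, size_t count)
{
    assert(strings != NULL || count == 0);
    if (count == 0)
        return 0;

    KeyedString *items = calloc(count, sizeof *items);
    if (items == NULL)
        return -1;

    int status = 0;
    for (size_t i = 0; i < count; i++) {
        /* With a zero-length buffer, strxfrm reports the key length needed. */
        size_t key_size = strxfrm(NULL, strings[i], 0) + 1;
        items[i].text = strings[i];
        items[i].key = malloc(key_size);
        if (items[i].key == NULL) {
            status = -1;
            break;
        }
        strxfrm(items[i].key, strings[i], key_size);
    }

    if (status == 0) {
        qsort(items, count, sizeof *items, compare_keys);
        for (size_t i = 0; i < count; i++)
            strings[i] = items[i].text;
    }

    for (size_t i = 0; i < count; i++)
        free(items[i].key);
    free(items);
    return status;
}
```

Calling `sort_with_keys(words, n)` on an array of `n` C strings reorders the pointers in place. The saving comes from a sort making many more comparisons than there are strings, so paying the preparation cost once per string rather than once per comparison adds up.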
If you need to look for a substring inside of another string, you can use CFStringFind, and there's also a version of that that looks for characters in a given character set. And CFString can do much, much more, but we don't have time to go into that today, so I'd encourage you to look at the documentation.
One thing that CFString can do that's new in Tiger is transliterate text and do other transformations on it. This is something that we had available in the Script Manager in WorldScript, and now we've got a Unicode version. It can do things like strip diacritics or translate text that's in a mixture of different writing systems into the Latin script or go back and forth between various writing systems. It can even replace non-ASCII characters by the numeric entities that you use in HTML or XML. And it has other capabilities besides, and again, you can find out more by reading the documentation.
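To make the last of those transformations concrete, here is a minimal sketch in plain C of replacing non-ASCII characters with hexadecimal numeric character references. It is an illustration only, not the CFString implementation; `escape_non_ascii` is a hypothetical name, and the sketch does not escape markup characters such as `&` or `<`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Copies UTF-16 `text` into `out` as ASCII, writing each non-ASCII character
   as a reference of the form &#xHHHH;. Surrogate pairs are combined first so
   that one reference is written per character. A lone surrogate is written
   as-is; production code should reject it instead.
   Returns the number of bytes written (not counting the terminating NUL),
   or -1 if `out` is too small. */
int escape_non_ascii(const uint16_t *text, size_t length,
                     char *out, size_t out_size)
{
    size_t used = 0;

    assert(text != NULL && out != NULL);
    if (out_size > 0)
        out[0] = '\0';

    for (size_t i = 0; i < length; i++) {
        uint32_t scalar = text[i];

        /* High surrogate followed by low surrogate: one 32-bit character. */
        if (scalar >= 0xD800 && scalar <= 0xDBFF && i + 1 < length &&
            text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            scalar = 0x10000 + ((scalar - 0xD800) << 10)
                             + ((uint32_t)text[i + 1] - 0xDC00);
            i++;
        }

        int written;
        if (scalar < 0x80)
            written = snprintf(out + used, out_size - used, "%c", (int)scalar);
        else
            written = snprintf(out + used, out_size - used, "&#x%lX;",
                               (unsigned long)scalar);

        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
    }
    return (int)used;
}
```

For the four UTF-16 units of "café", the function writes `caf&#xE9;` into the buffer.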
If you put together lists or tables, you might be in a situation where you have text that's too wide to fit in a given space. Again, figuring out where to truncate the text is something that needs to be done in a language sensitive way, and there are APIs in the system that can help you do that.
If you're using HIThemeDrawTextBox, that has an option to truncate the text to a given width. But it doesn't modify the text that's actually stored in memory. It only truncates it as it's drawing it, so the original text is unchanged.
Okay, Lee talked earlier about finding a place where you can break without breaking up a user character, but it's also important to find things like line and word boundaries. For example, if you're doing word wrap or double click. And in fact, line break and word break are two different things. Line break is a place where you can do word wrap.
Word break is used for things like double click or for doing whole word search in an application, for example if you've got an editor with a "find entire words" option. And because of that, they give slightly different results. For example, if you look at the French example at the bottom, you can see it's okay to have a line break after the hyphen, but if you're double clicking or if you're doing whole word search, you want to treat that hyphenated form as an entire word.
So the APIs for finding word or line breaks operate at the array of UniChar level. And the way you do that is you create a text break locator object and specify the kinds of breaks that you want to do, whether you're looking for word breaks or line breaks or what have you.
You can then call UCFindTextBreak, give it an index to start from, and search backwards or forwards for the next word or line break. If you're looking for cluster breaks in particular, CFString can handle that, but at the moment there isn't a CFString API for doing word or line breaking.
Lee also talked earlier about finding the properties of characters and how when you do that, it's important to make sure you get both parts of a surrogate pair. So you may have a 16-bit character or you may have a 32-bit character, and to get the properties correctly, you need to get both halves of a 32-bit character. Examples of properties are things like, is this character a letter? Is it a number? Is it a kind of dash? And so on.
So there are two APIs available for doing that. One is CFCharacterSet, which will take a given character and test it for membership in a set. Another is UCGetCharProperty, which can return different kinds of properties. For example, if you have a character that is known to be a decimal digit, UCGetCharProperty can tell you what the value of that digit is.
In order to put surrogate pairs back together, there are three very useful macros you can use. There are two that are used for testing whether a character is a high or a low member of a surrogate pair. If you determine that you've got a high surrogate character followed by a low surrogate character, you can then call UCGetUnicodeScalarValueForSurrogatePair, take a deep breath when you say that, and put those two halves back together and get the Unicode scalar value.
Once you've got that, you can call something like CFCharacterSetIsLongCharacterMember in order to find out whether a character has a particular property. So in this case, we're getting a predefined character set from CFCharacterSet, the character set of decimal digits. And we then test our character, our 32-bit character, to see if it's a member of that set.
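As a rough illustration of what a "digit value" property means, here is a minimal sketch in plain C. It is not the system API: `decimal_digit_value` is a hypothetical helper, and its table covers only three digit blocks, whereas the system's character property data covers all of Unicode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 through 9 if `scalar` is a decimal digit in one of the three
   blocks this sketch knows about, or -1 otherwise. */
int decimal_digit_value(uint32_t scalar)
{
    /* Code point of the digit zero in each supported block. */
    static const uint32_t zero_code_points[] = {
        0x0030, /* ASCII digits 0-9 */
        0x0660, /* Arabic-Indic digits */
        0x0966, /* Devanagari digits */
    };
    const size_t block_count =
        sizeof zero_code_points / sizeof zero_code_points[0];

    assert(scalar <= 0x10FFFF); /* must be a valid Unicode scalar value */

    for (size_t i = 0; i < block_count; i++) {
        uint32_t zero = zero_code_points[i];
        if (scalar >= zero && scalar <= zero + 9)
            return (int)(scalar - zero);
    }
    return -1;
}
```

With this table, the Devanagari characters U+096A and U+0968 yield 4 and 2, the same values as the ASCII characters '4' and '2'.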
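The arithmetic behind those three macros can be sketched in plain C as follows. The function names here are hypothetical stand-ins rather than the system macros, and `next_scalar` is an added convenience showing how the tests and the combining step fit together when walking an array of 16-bit units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High (leading) surrogates occupy U+D800 through U+DBFF. */
bool is_high_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/* Low (trailing) surrogates occupy U+DC00 through U+DFFF. */
bool is_low_surrogate(uint16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

/* Puts the two halves back together: each surrogate carries 10 bits, and the
   result is offset by 0x10000 because pairs encode only characters above
   U+FFFF. */
uint32_t scalar_from_surrogate_pair(uint16_t high, uint16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000u + (((uint32_t)high - 0xD800u) << 10)
                    + ((uint32_t)low - 0xDC00u);
}

/* Reads the character starting at text[*index] and advances *index past it,
   consuming two units when they form a surrogate pair. An unpaired surrogate
   is returned unchanged. */
uint32_t next_scalar(const uint16_t *text, size_t length, size_t *index)
{
    assert(text != NULL && index != NULL && *index < length);

    uint16_t first = text[(*index)++];
    if (is_high_surrogate(first) && *index < length &&
        is_low_surrogate(text[*index])) {
        return scalar_from_surrogate_pair(first, text[(*index)++]);
    }
    return first;
}
```

For example, the pair 0xD835, 0xDC00 combines to U+1D400. The resulting 32-bit value is what you would then hand to a property lookup.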
Okay, moving right along. Everybody take a deep breath, do some stretching exercises. It's time to move on to dates, times, numbers, and calendars. So Lee talked earlier about locales. John talked about them also. A locale is basically a package of data that represents how they do things in a certain country and in a certain language. Dates, times, numbers, calendars, and so on. What currency they use, you name it.
This is different from the UI language. You may remember that in John's demonstration, at no time was the user interface language changed. He was always running in English. The menus were always in English. However, he was able to change his dates, times, and so on to different locales. So the locale is not the same thing as the UI language. It's also not the same thing as the collation order because the user can set that separately too, or the text break behavior.
So the most important API to remember for locales is CFLocaleCopyCurrent. That will give you a CFLocale object that represents the user's current locale preferences, including any customizations they may have done. And there's lots of different things you can do on a locale object. I'm not going to go into detail on them. But two of the more important ones I'll mention are to get data associated with the locale.
For example, you can say, what is the currency string for this locale, or what is the decimal point character or decimal point string? Another thing that you can do with a locale that's new in Tiger is get localized names for things. So for example, the ISO language code JA, the localized display name for that in English is Japanese. The localized display name for the country code JP is Japan. And the localized display name for the currency code USD is the dollar sign. And you can find out more about what you can do with CFLocale by looking at the documentation.
We're going to go into a little bit more detail on how to deal with dates, times, and numbers. For dates, there's a core foundation object called the CFDateFormatter, and that will convert a string that represents an end user date to an internal number or go in the other direction.
In addition to that, you can set properties on a date/time formatter that will control how it does that process. So here's an example. We'll create a CFDateFormatter. And you see we're calling CFLocaleCopyCurrent to get the current locale that represents the user's preferences. And we're specifying that we want the long style.
Next, we set the property on this date formatter to indicate that we want lenient parsing. So if we're converting a date string to a number, we'll be more lenient about what we'll accept. It won't complain if it doesn't exactly match what the format says. And finally, we call CFDateFormatter to convert a string into an absolute time.
Now here's two examples of what a long style date string might look like. They're both for June 6. The first is in the US English locale, and the second one is in an Arabic locale. And you can see that the Arabic locale does not use the same ASCII digits, for example. If you use CFDateFormatter, you don't need to worry about any of this. The system takes care of it for you.
As was mentioned earlier, new in Tiger is calendar support. We had Gregorian calendar support before this, but now we support different kinds of calendars. And you can do the kinds of operations that you saw John doing in his demo. You can convert between calendar components, that is a year number, a month number, and so on, and an absolute time, or go in the other direction. You can say add one month, add one year, or subtract one day.
You can find out the first day of the week. In some countries, the first day of the week is Monday. In other countries, it's Sunday. And you can also find out the shape of the calendar. That is, for a given month, how many days does it have and what is the range? For example, you can ask CFCalendar, what is the range of days in February 2000? And the answer is 1 through 29, because that was a leap year. CFCalendar can do much more and you can find out about it again in the developer reference library.
Numbers are handled in a way that's very analogous to dates and times using an analogous object called CFNumberFormatter. Again, it converts a string to a number or a number to a string. It goes in both directions. And again, you create it using the current locale. CFNumberFormatter also lets you set a format string if you want to format numbers in a particular way, as opposed to using the user's preference for number formats. So here's an example. We'll create a number formatter object. Again, that's done using the current locale.
And this time we say we want a currency style. We have a double which has the value 42, which represents an amount of currency, and we call CFNumberFormatter to convert that value into a string. And here's two examples that you might get. The first is for the German locale, I believe, and we get 42 euros.
The second is for the Hindi locale in India, and in this case we get 42 rupees, again using non-ASCII digits because that's the way things are done in India. If you use Core Foundation's CFNumberFormatter, you don't need to worry about this. It's all taken care of for you.
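To see why the answer for February 2000 is 1 through 29, here is a worked sketch of the Gregorian leap year rule in plain C. It is not CFCalendar, which also handles non-Gregorian calendars; `is_gregorian_leap_year` and `days_in_gregorian_month` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Gregorian rule: every fourth year is a leap year, except century years,
   unless the century year is divisible by 400. */
bool is_gregorian_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Returns the last day number of `month` (1 = January), so the range of days
   in that month is 1 through the returned value. */
int days_in_gregorian_month(int year, int month)
{
    static const int days[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };

    assert(month >= 1 && month <= 12);

    if (month == 2 && is_gregorian_leap_year(year))
        return 29;
    return days[month - 1];
}
```

The year 2000 is divisible by 400, so it is a leap year even though it is a century year, while 1900 is not. Hard-coding a rule like this ties an application to one calendar, which is the reason to ask the system instead.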
Okay, we're at the end of our whirlwind tour, so again, everybody take a deep breath. Before we get to the Q&A, there's a couple of other things I'd like to cover. If you have more questions that we didn't answer here, Xavier's name is last, but he's the person you should try first. He's the contact in Worldwide Developer Relations, Xavier Legault. Or you can contact myself or Lee Collins if you have questions that the documentation doesn't answer or that you can't get answered on a mailing list.
For more information, you can follow the links on the More Information page for WWDC for documentation, sample code, and other things. So just go to the main WWDC page, then go to the Extra Information page for this session, and you'll see all sorts of links you can follow to find out more about what we've talked about today.