Internationalizing Your Software - WWDC 2008

Essentials • 1:02:42

By creating international versions of your application, you can reach more users and expand your sales. It can even be easy to do, once you know a few rules and learn the tools that do most of the work for you. As a developer, you have to use the right APIs and follow certain rules to be localization-friendly. As a localizer, you need to use the right tools. Learn what to do and what to use from the people who write the APIs and the tools.

Speaker: John Jenkins

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Well, good afternoon, everybody. Bonjour, konnichiwa, nimakauma. Welcome, bienvenue, daigafu, nyingra. This is session 374, Internationalizing Your Software. And I am John Jenkins. I'm a senior software engineer at Apple. I will be your guide to this wonderful world of localization and internationalization. In the spirit of the session, I am also Zheng Zhaohang, or in the barbarian northern dialect, Zheng Zhaohang, or if you prefer, Shona Prichardt, or even Juani Tinikini. I will answer to any of those, and I may give a prize to the first person who can identify all four localizations of my name. So, what are we talking about?

Internationalization, which is often referred to with the abbreviation I18N, an I followed by 18 letters followed by an N. This is the process of preparing your software to be used in different places and with different languages. Localization, which has a similar abbreviation, L10N. This is the process of actually adding data for specific languages and locales.

You need to do both of them if you're going to be properly internationalized. Both processes must be done. However, of the two, the most important is the internationalization, the process of getting things ready, getting things set up. This has to be done by the programmers. This has to be done in-house. This has to be done first. Once it is done, the localization, particularly on Mac OS, the localization is an almost entirely separate process. It comes later.

It can actually be outsourced and handed off to third parties. The structure that we have, and which we'll be explaining, makes it easy to add localizations once that work is finished. So we will be focusing on internationalization. We will also be focusing on Mac OS X.

Pretty much everything here does, in fact, apply to the iPhone as well. So if you learn how to internationalize on the Mac, you know how to internationalize on the iPhone. Learn it once, and you can do it on both. The key idea that we want everyone to come away from this session understanding is that internationalization is enormously difficult. It is a very, very hard thing to do. Fortunately, we have done it for you.

If you use the APIs built into Mac OS, virtually all of the work will be done for you. In fact, for most people, all of the work will be done for you. There will be a few people, a few situations where you need something special that you'll have to do on your own, but almost nobody will have to deal with them. So write your application properly, use the APIs that we provide, and everything comes automatically.

Which means we're through, basically. There are three main steps to internationalization. The first is to use Unicode. The second is to take advantage of bundles. And then the third is to use locales. We will be discussing all three of these. And they all need to be done. Now, as I say, we do most of the work for you, so you may not actually have to use these directly, but you should at least understand what's going on under the hood.

So, let's start with Unicode. Now, Unicode, you know, I'm old enough that I remember introducing Unicode as a new character set. Unicode is, in fact, not a new character set. It's almost 20 years old. And yet, there's a surprising amount of misunderstandings that you'll run into regarding it.

So, we're going to take a few minutes and discuss it in some detail. Unicode, if you do not know, is a character set, and its goal is to make it possible to represent on computers simultaneously all written languages, living or dead. It's not quite there yet. There are some dead languages it doesn't cover. There are some minority scripts it doesn't cover. But virtually anything that your users will run into is in Unicode now, or will be shortly. And by simultaneous, I mean exactly that. A properly written Unicode application can handle any text in any writing system with equal ease.

You could even theoretically have a document which contains in one document samples from every language and every writing system, and it will work. That's what Unicode is all about. The current version of the standard is 5.1.0. That came out just a couple months ago. And there are currently over 100,000 characters in Unicode. Now, the exact number depends on how you count them. There are a number of control characters, things like that. But there are over 100,000 graphical characters that cover 77 different writing systems.

So, as I say, pretty much everything you will need is already there. Moreover, Unicode is the way to represent internationalized text in almost every setting now, but in particular on the Internet. Just to give an example, if you do HTML and you use numeric entities at all, those are Unicode code points.

So it is everywhere that you're doing text. And just so that you're familiar with the convention, the way to represent Unicode code points is with a capital U, a plus sign, and then four to six hexadecimal digits following that. And this is the convention we'll be following here.
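As a small illustration of the U+ convention and of HTML numeric entities being Unicode code points, here is a minimal Python sketch. It is not part of the talk; the helper name `u_notation` is made up for this example.

```python
import html

def u_notation(ch: str) -> str:
    """Format one character as U+ followed by at least four hex digits."""
    return f"U+{ord(ch):04X}"

# Code points below U+10000 get four digits; higher ones get five or six.
assert u_notation("A") == "U+0041"
assert u_notation("\u6f22") == "U+6F22"
assert u_notation("\U00020000") == "U+20000"

# An HTML numeric entity, hexadecimal or decimal, names the same code point.
assert html.unescape("&#x6F22;") == "\u6f22"
assert html.unescape("&#28450;") == "\u6f22"
```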

Main thing to remember and to understand about Unicode is that it is not 16-bit ASCII. ASCII is obscenely easy to program for. There's almost nothing you need to do. Unicode is almost obscenely difficult to program for. It is very hard to do. And as I say, we've done the work for you, and you should be glad of that. There are some things that you may run into or that you may need to be aware of. And we'll mention those briefly.

We're not going to go into all of Unicode. We could devote an entire conference to it. In fact, there are entire conferences devoted to Unicode. But we'll be mentioning here some really important points. The first is the existence of the different encoding forms of Unicode. The second is the byte order mark.

Then there are the four or five normalization forms and, most importantly, grapheme clusters. And again, don't try to do this on your own. You may need to know what's going on, but so long as you use the APIs that we've provided, you should be just fine. So, encoding forms. There are three basic ways that Unicode is actually represented in terms of bytes.

The first of these is UTF-8. This is a byte-oriented protocol. You read data one byte at a time. This means a couple of really useful things. The first is that it is upwardly compatible with ASCII. A valid ASCII file is a valid UTF-8 file. The second is that UTF-8 works seamlessly with older protocols and older libraries which expect ASCII-like character sets.

UTF-8 is now the most common text encoding used on the Internet. Google announced just a few weeks ago that of all the pages on the Internet, more are encoded in UTF-8 than any other text encoding. Now, there's also UTF-16. UTF-16 is the oldest way of representing Unicode on computers, and it's the way that you will typically see in libraries. In particular, it's the form which we use inside NSString.

UTF-16, as the name implies, you're reading data 16 bits at a time, two bytes. And if you're dealing with units of more than one byte, you have to care about byte swapping. Now, there's a built-in mechanism that I'll mention in a minute to deal with byte swapping. There are also two variations of UTF-16, UTF-16BE and UTF-16LE, which specify in advance what the byte order is, so that you can deal with that.

And finally, there's UTF-32, and as, again, the name implies, you're dealing with data 32 bits at a time, four bytes, so there are also two variations that specify the byte order in advance. UTF-32 is not terribly common, and it is not likely to be terribly common, and this is why.

Here we have four characters from Unicode: a common Latin letter, a not-quite-so-common Latin letter, a relatively common Chinese character, and a decidedly rare Chinese character, and how you represent them in each of these three encoding forms. In UTF-8, capital letter A is one byte, 0x41, just as you would expect.

The others require two, three, and four bytes, respectively. In UTF-16, our common characters each require two bytes. The rare character requires four. In UTF-32, they each require four bytes, and that tells you instantly what's wrong with UTF-32. Most of those bits are zero. Most of the bytes are zero. So it's not a terribly efficient way of storing text in terms of memory. Hence, UTF-16 is probably going to be the most common for the indefinite future. Now, I mentioned that Unicode has a built-in mechanism to deal with byte order. This is called the byte order mark, U+FEFF.
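The byte counts quoted for the three encoding forms can be checked with a short Python sketch. The four sample characters here are my own picks to match the description (the slide's actual characters are not in the transcript), and the `-be` codec names are used only so that Python does not prepend a byte order mark.

```python
# Four stand-in characters: common Latin, less common Latin,
# common Chinese, rare Chinese. These are assumptions, not the slide's.
SAMPLES = ["\u0041", "\u00e9", "\u6f22", "\U00020000"]

def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding form."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-be", "utf-32-be")}

sizes = [encoded_sizes(ch) for ch in SAMPLES]

# UTF-8 grows from one to four bytes.
assert [s["utf-8"] for s in sizes] == [1, 2, 3, 4]
# UTF-16 needs two bytes, except for the rare character, which needs four.
assert [s["utf-16-be"] for s in sizes] == [2, 2, 2, 4]
# UTF-32 always needs four bytes.
assert [s["utf-32-be"] for s in sizes] == [4, 4, 4, 4]
```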

It is used to distinguish UTF-16 and UTF-32 byte order variations. You don't need it if you already know what the byte order is going to be, if that's specified some other way. In particular, you don't need it for UTF-8. UTF-8 reads data byte by byte. There's no byte order. However, you will sometimes see it at the beginning of UTF-8 text files. People use this as a flag so that they know that the text file is UTF-8. This can create problems, but you do see it. It's generated by several different programs. This is how it works.

You put the byte order mark, FEFF, at the beginning of your text. You count on the fact that its byte-swapped counterpart, FFFE, is defined to be undefined. So when the data is read in, you read the first two bytes, and you will see either FEFF or FFFE. If you see FEFF, that's the byte order mark.

That tells you that you are interpreting byte order the same way the text file is, and you don't need to do anything. If you see F-F-F-E, you know that you do need to byte swap. You are interpreting byte order, or you have a different byte order from the text file, and you know how to proceed.

So here we have an example. The first word in our document is 6F22. There are two different Unicode characters that this could possibly represent, although admittedly, one of them is not likely to be the first character in a text file, but they're still both valid Unicode. Without the byte order mark, you don't know which one to use. With the byte order mark, you do. So you will see this at the beginning of a lot of Unicode text files.
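The byte order mark check described above can be sketched in Python as follows. The function name `decode_utf16` is hypothetical, and this is only an illustration of the idea, not how the Mac OS APIs implement it.

```python
def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 data, using a leading byte order mark to pick the order."""
    bom = data[:2]
    if bom == b"\xfe\xff":          # bytes FE FF: big-endian data
        return data[2:].decode("utf-16-be")
    if bom == b"\xff\xfe":          # bytes FF FE: little-endian data
        return data[2:].decode("utf-16-le")
    raise ValueError("no byte order mark; byte order must be known some other way")

# Without a byte order mark, the bytes 6F 22 are ambiguous.
raw = b"\x6f\x22"
assert raw.decode("utf-16-be") == "\u6f22"   # a Chinese character
assert raw.decode("utf-16-le") == "\u226f"   # a mathematical operator

# With a byte order mark in front, there is only one reading.
assert decode_utf16(b"\xfe\xff" + raw) == "\u6f22"
assert decode_utf16(b"\xff\xfe" + raw) == "\u226f"
```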

Next are the normalization forms. One of the things that makes Unicode really difficult to deal with is the fact that the units you see on the screen, what we call graphemes, can be represented with more than one character, and sometimes the same grapheme can be represented in more than one way. The example here is an accented e. You can do it either as a single character or as a pair of characters, the lowercase e and the acute accent.

To simplify Unicode processing, the Unicode Consortium has defined four different normalization forms. The idea is that you will take text, you will normalize it to one of these four forms, you will then know an awful lot about the characteristics it has, and that will make it easier for you to process. The first form is normalization form C (NFC). You use composed forms wherever possible. Normalization form D (NFD) is the opposite. You use decomposed forms wherever possible.

And then there are two variations of these, NFKC and NFKD. The K stands for compatibility. The idea here is that Unicode has what are called compatibility characters. These two normalization forms will get rid of the compatibility variants. Before I show examples, I just need to mention there is, in fact, a fifth form that you may theoretically run into on Mac OS, and that is the form used by HFS+. HFS+ is the file system which is used by most Macintoshes out there.

For file names, it uses UTF-16, in a decomposed form of Unicode which is almost exactly the same as NFD, but not quite. There are some differences. Again, this is not something you'll probably have to deal with, but you should be aware that it exists. So here are some examples.

I want to compare NFC and NFD. I've already showed one instance of this. E with an acute accent. Okay, you can either do it as a single character or as two. The common Korean surname Kim, as in Kim Jong Il, you can either do as a single character or you can do it, given the nature of Korean, as three, the "keu," the "i," and the "m" separately.

To compare NFC and NFKC: one of Unicode's goals was to be compatible with character sets which existed when it was created. This makes it really useful for upgrading old data or for doing conversions between character sets. But it also means that Unicode inherited characters that it otherwise would not have included. One of these is the fi ligature, which is found at U+FB01.

An older character set has an fi ligature, so Unicode had to have one. Now, Unicode prefers that you do ligatures by other mechanisms, so that's considered a compatibility character. It's there only for compatibility with a different character set. So if you use NFC, it will be there. If you use NFKC, the compatibility character is taken away, and it's split into two characters, the two pieces. An even better example of a compatibility character is U+F900 compared with U+8C48.

Now, if you look at these, your first impression may be that they look exactly the same. But if you look at them really, really, really closely, you'll notice that they do, in fact, look exactly the same. The difference between the two is that they are pronounced differently in Korean.

Unicode does not distinguish characters based on how they are pronounced in Korean, but an older Korean character set did. So Unicode has F900 as a compatibility character. And if you do one of the compatibility mappings, that distinction, which is not relevant to Unicode, will be removed. And finally, we have an example, and you can just look at it, of the difference between NFD and HFS+. This particular character is one character in HFS+ on your file systems, and it's two in NFD. And again, this is not a character you are, in fact, likely to have in very many file names.
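The NFC, NFD, and NFKC examples from this part of the talk can be reproduced with Python's standard `unicodedata` module. This is only a sketch for checking the behavior; on Mac OS you would use the NSString normalization methods instead, and the HFS+ variant is not covered here.

```python
import unicodedata

def nf(form: str, text: str) -> str:
    """Normalize text to the given Unicode normalization form."""
    return unicodedata.normalize(form, text)

# e with acute accent: one character in NFC, two in NFD.
assert nf("NFD", "\u00e9") == "e\u0301"
assert nf("NFC", "e\u0301") == "\u00e9"

# The Korean syllable Kim: one character in NFC, three jamo in NFD.
assert nf("NFD", "\uae40") == "\u1100\u1175\u11b7"
assert nf("NFC", "\u1100\u1175\u11b7") == "\uae40"

# The fi ligature survives NFC but is split into f and i by NFKC.
assert nf("NFC", "\ufb01") == "\ufb01"
assert nf("NFKC", "\ufb01") == "fi"

# After NFKC, U+F900 is no longer distinct from U+8C48.
assert nf("NFKC", "\uf900") == "\u8c48"
```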

Finally, most importantly, we want to talk about grapheme clusters. I already mentioned Unicode can have characters represented with pieces. Now, what you see on the screen, what you see on the printed page, this is called a grapheme. This is the basic unit of writing. And it can be built up out of pieces. And this is very useful in Unicode.

Okay, here we have a rather rare example. This is a Greek alpha with a macron and an acute accent. You may actually see this in books on ancient Greek poetry. It's a legitimate thing to want to represent. However, the number of possible combinations of accents that do exist for Latin, Greek, and Cyrillic is so astronomically high that to find and encode them all separately would require prohibitive expense. So Unicode takes the easy route, and it just lets you encode it by hand.

Now, in practice, what this means is that you cannot just jump into an arbitrary point in text and assume that you are on the boundaries between things that the user wants to distinguish. You might be in the middle of a grapheme cluster. So if you are doing searching, if you are doing substringing, this you need to be very much aware of. Because otherwise, you might be doing things to the text that don't reflect what the user is actually looking for.
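To make the danger concrete, here is a deliberately simplified Python sketch that groups a base character with the combining marks that follow it. It is an assumption-laden stand-in: real grapheme cluster rules cover much more (Hangul jamo, for example), and the function name `simple_clusters` is invented for this illustration.

```python
import unicodedata

def simple_clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it.

    Simplification: only characters with a nonzero combining class are
    attached to the previous character.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the current cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# Greek alpha + combining macron + combining acute, followed by "b".
alpha = "\u03b1\u0304\u0301"
word = alpha + "b"

assert len(word) == 4                          # four code points...
assert simple_clusters(word) == [alpha, "b"]   # ...but two things the user sees
assert word[:2] == "\u03b1\u0304"              # a naive slice drops the acute accent
```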

So that's Unicode. Brief introduction. How do you do Unicode? How do you do text on Mac OS? On the Mac and on the iPhone, you use NSString. This is the class for representing text. It is toll-free bridged with its Core Foundation counterpart, CFStringRef. That means that you can use them interchangeably. If you have a CFStringRef and you need a pointer to an NSString, you can just do a cast, and that's all the conversion that's needed.

NSString keeps you, the application programmer, from having to deal with the complexities of text on your own. Now, in this case, there are some related classes that you may find useful. There's NSMutableString, in case you are modifying your string after you make it. Perhaps you're creating it one piece at a time.

There's NSAttributedString. This is enormously useful. If you want to have a button or a table or some other UI element have text that you get to control the font for, that you get to specify the point size for, on the fly, you use NSAttributedString, and it's drawn correctly. Finally, there's NSMutableAttributedString, which, as the name implies, does both of these. We'll be looking primarily at NSString.

How do you create them? There's an easy way, which, of course, you're not supposed to do: put them in your source code. The way you do that is you take your string constant and put an at sign in front of it. That tells the compiler that this is an NSString and not a char *.

Or the most flexible way is using stringWithFormat:. We'll discuss that in a little more detail. Now, if you have an array of 16-bit characters, unichars, and you know how long the array is, you can just hand that into NSString and create a string with that, using stringWithCharacters:length:.

Note: do not use wchar_t. wchar_t on the Mac is not 16 bits in size. It is on other platforms. It is not on the Mac. You need to use something which is 16 bits in size, like a unichar or a uint16_t or something like that. Not wchar_t.

Finally, if you know the encoding of an old piece of text, it is a C string that you got handed somehow, it is an old file that you have, it is something that is being pulled off the web, or it is even in an NSData object, you know what the encoding is, there are functions in NSString that let you create a string and convert it to Unicode. So conversion to Unicode is simple. We won't show examples, but conversion from Unicode is equally simple.

Now let's look at stringWithFormat: in a little more detail. This is, as I say, the way you will probably be doing most of your string creation. It's very simple, because the syntax is almost exactly the same as sprintf. There are a couple of differences that are very handy.

The first is %@. This lets you include an Objective-C object in your formatted string. Usually you use this for an NSString object. You can do it with other Objective-C objects, but typically you will want to use NSString. We'll show why later on. You can even specify the order of the parameters in the formatted string. You use a %, then a number indicating the index of the parameter, then the $ and @, as in %1$@. These are one-based indices, so you start with one, and this lets you specify the order in a particular format.

This is useful. Different languages order the same elements of a sentence in a different way. If you know a Romance language, something like Spanish or French, they will usually put adjectives, for example, after the nouns they modify. English prefers to put them first. So you need to be able to control the order of words in a sentence so that you can do localization properly.

Now, let's do a couple of examples here quickly. Let's start out with a very easy one. We have an NSString that we want to create, and we hand in a template to stringWithFormat:. It takes one Objective-C object and follows it with "world." We hand in an NSString, "greetings," as a parameter, and we get "greetings world."

Now we try it again with two parameters. This time we want to control the order. So in our first instance, the first parameter comes first in the sentence and the second parameter comes second. We hand in "hello" and "world" as our parameters, and we get out "hello world."

Do the same thing with the order switched. In the last example, we have the second parameter coming first and the first parameter coming second. Again, we hand in "hello" and "world," and instead of getting "hello world," we get "world hello." Now, you may have noticed something about this third example. I cheated with regards to the capitalization. I knew in advance which one was going to come first, and so I capitalized appropriately. That's not going to happen most of the time.
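The same reordering idea can be tried in Python, where `{0}` and `{1}` play roughly the role that `%1$@` and `%2$@` play in an NSString format. Note that Python's indices are zero-based while the NSString ones are one-based. The `render` helper and the two templates are invented for this sketch; in a real application each template would come from a localized .strings file.

```python
def render(template: str, *params: str) -> str:
    """Fill a template whose placeholders name parameters by position."""
    return template.format(*params)

# Two hypothetical localized templates for the same pair of parameters.
first_then_second = "{0} {1}"
second_then_first = "{1} {0}"

params = ("hello", "world")
assert render(first_then_second, *params) == "hello world"
assert render(second_then_first, *params) == "world hello"
```

The calling code hands in the parameters in one fixed order; only the template, which the localizer controls, decides where each one lands.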

To deal with that, and similar situations, there is a rich array of methods available for NSString that you can use to manipulate a string once you've created it. There are, for example, a number of casing messages you can send an NSString object: uppercaseString, lowercaseString, capitalizedString. Typically, you would use capitalizedString in a case like the one we just saw. There is precomposedStringWithCompatibilityMapping. This is one of a number of methods that let you convert to a particular Unicode normalization form, in this case, NFKC.

isEqualToString:. You can compare strings. This is the simplest way to do it. There are variations on this that give you more control. You can also find out the relative order of two strings. If you need a substring, use substringWithRange: and hand in a range. Now, if you're dealing with substrings, remember, you have to worry about grapheme cluster boundaries.

And so to do that, we have some functions. We have rangeOfComposedCharacterSequenceAtIndex:. Hand in the index, and you get back the range of the grapheme cluster that contains it. And similarly, with a range, you can extend its boundaries so that they match grapheme cluster boundaries. So these are the functions you use to deal with grapheme clusters.

Now, all of this is no good if you can't localize what you're going to display to the user. The way we do this on Mac OS is we have what are called .strings files. And you can use a number of different encodings for these. We strongly recommend UTF-16. In fact, forget that you can do other encodings.

Just use UTF-16. You can have problems otherwise. And these are simply plain text files. On each line, you have a key and a value. You then use different functions to access the data in them. The most common will be NSLocalizedString, and the second most common will be NSLocalizedStringFromTable.

Or maybe it's the other way around. I'm not sure. I haven't checked. But you'll probably use one or the other. In both cases, you hand in a key, the key that you want the corresponding value for. And when it's called, the system will look through the localizations available, find the best match, and return the value that corresponds to that. Now, both of these also take a localization hint as a parameter. This is useful because of a handy little tool we include on the system called genstrings.

genstrings is a tool that you run over your source files. It looks inside them for calls to NSLocalizedString and NSLocalizedStringFromTable and their variants, and it creates the .strings files for you. Now, typically, you don't need to hand in any command line arguments because it will usually do exactly what you want it to do.

And those localization hints that you included in your code, well, they show up as comments that are read by the localizer. Let's look at an example here. So in my source code, mountains.m, I'm creating an NSString object. I'm using NSLocalizedStringFromTable. The key I'm going to look for is something called tallmountain.

The table I'll be looking inside is called mountains.strings. And the localization hint is "name of the tallest mountain in the world." OK, I look in my directory. There's no .strings file. I run genstrings over my source code, and there it is, mountains.strings. It's been created for me.

I open it up with my favorite text editor, and what I see is my localization hint as a comment, my key, and then a dummy value for the value. OK, now this I give to my localizers. My English localizer, who probably has the cubicle next to mine, I hand it to him. He opens it and reads it: name of the tallest mountain in the world. Okay, the value for that key has to be Mount Everest.

I send it off to New Delhi to my Hindi localizer. She opens it up in her text editor, sees name of the tallest mountain in the world, and for the value, she puts in however you say Mount Everest in Hindi, which I cannot pronounce, I'm sorry to say. I don't need to.

The localization is there for me. This is how you localize text. You use .strings files. Put every string that you're going to show to the user in one of these, have a key for it, and then use the functions that access the value for the key. That's strings. That was the hard bit. Now we're going to go on to bundles.
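To show the shape of the data, here is a rough Python sketch of a key/value lookup over text in the style of a .strings file: a comment carrying the localization hint, then a quoted key, an equals sign, a quoted value, and a semicolon. The sample text follows the talk's mountain example; the parser is a simplification written for this illustration (it does not process escape sequences, and it would also strip comment markers that happen to appear inside a value), and falling back to the key itself is an assumption of the sketch.

```python
import re

# What the English localizer's file might look like after editing.
SAMPLE = '''/* Name of the tallest mountain in the world */
"tallmountain" = "Mount Everest";
'''

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
ENTRY = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;')

def parse_strings(text: str) -> dict:
    """Parse simplified .strings-style text into a key-to-value dictionary."""
    without_comments = COMMENT.sub("", text)
    return dict(ENTRY.findall(without_comments))

def localized(table: dict, key: str) -> str:
    """Look up a key, falling back to the key itself (sketch assumption)."""
    return table.get(key, key)

table = parse_strings(SAMPLE)
assert localized(table, "tallmountain") == "Mount Everest"
assert localized(table, "missing") == "missing"
```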

Bundles are easy. Bundles are special folders, and that's really all they are. They have a certain structure that is used. They have identifiers, which can be handy. Most of the time, the user will see them as a single object. And in particular, for internationalization, inside the bundles that we'll be dealing with is a subfolder called Resources. And inside that are subfolders whose names end in .lproj, for language project. That's where the localizations live. What are bundles?

Applications are bundles. They're folders with a certain structure. Frameworks are bundles. They're folders with a certain structure. Both of these can contain localizations. Another thing which you will usually see as a bundle is a nib. Nib files are where your UI elements live. They contain your windows, they contain your menus, things like that. You create them and edit them with Interface Builder. Now, in Leopard, we've introduced the XIB format, which is an XML format that can contain the contents of a nib. These play nicer with most source control systems. But in the past, nibs have always been bundles.

So how do we represent them on Mac OS? Well, we use the NSBundle class. This is the basic class. Be aware, NSBundle is not toll-free bridged with its Core Foundation counterpart. This is the one place where you have to be aware where you're getting the bundle from and what functions you need to use to manipulate it.

Again, here we're focusing on Cocoa, so we'll be looking at NSBundle. The same functionality is available pretty much for CFBundleRef. You just access it slightly differently. How do you get them? There are three main ways you will probably get a bundle when you need it. The easiest is to call mainBundle.

Hand this off to the NSBundle class, and you get back the bundle that corresponds to the application which is currently running, which is probably going to be you. Now, if you're a framework, that doesn't do you any good. If you want to get some data inside yourself as a framework, you can't use mainBundle. That will give you the bundle for the application that's using you, not you. So you call bundleWithIdentifier. You know what your identifier is.

I hope you know what your identifier is. And that way, you get inside yourself. Or, if you're an application and you want to get inside a framework's bundle, which will not happen very often, but when it does, you will typically know a class which is defined by that framework, and you simply call bundleForClass:, handing in the class. And you get the correct bundle back.

You're not going to use bundles a lot directly. Most of this is taken care of for you by the system. If you do need to use bundles, it will probably be because you need to get some sort of a resource file of a type which isn't defined by the system. It isn't a nib. It isn't an image. It isn't a string.

For that kind of data, you will use the method pathForResource:ofType:. You hand in the name of the file, you hand in its extension, and you get back a path to the properly localized version of that file. There are different variants of this, but this is the one that you will most often be using. Okay, that was bundles. They are also easy. Let's look at locales.

What is a locale? Well, a locale is an object that corresponds to a particular place. Places can be big. They can be entire countries: Russia, China, Canada. They can be small: Wales, Liechtenstein, my bedroom. Most often, they are not just a place, they are a combination of a place and a language. This makes it possible to have one locale for Welsh-speaking Wales and another for English-speaking Wales, or French-speaking Canada and English-speaking Canada. A locale defines a default set of values and ways of doing things: date and time formats, what the local currency is, what units of measurement are being used, what calendar is being used.

This is all found inside a locale. The values that we supply on Mac OS come from what is called the Common Locale Data Repository, which has data on literally hundreds of locales. It's a massive open source project in which Apple participates. Of course, you don't have to use these. You can override them yourself if you want to live dangerously, or you can let the user override them if you want to live rather less dangerously. Okay.

Locales are not user preferences. They kind of are. They tell you how the user wants to do things. But the main thing you need to be aware of is a locale is kind of a preference that is common to an entire place and which is accessed through not the preference mechanism but through the locale mechanism.

Similarly, locales are not localizations. Now, they contain localized data. They contain data that changes from place to place. But what you need to think of as a localization is what is unique to your program that varies from place to place. Your menus, your windows, your panels, things like that. That's where the localization is. Locales do contain localized data, but it's not the same thing as your localization. Here's how you set them. On the Mac, you set place and language separately. To set the place, you go to your International preferences panel.

You go to the formats tab. I've set it to the United Kingdom. I'm formatting dates the British way. I'm using the British pound sterling. I'm on the metric system. I'm using the Gregorian calendar. All of this is set for me all at once. If I want to set the language, I again go to international preferences. This time I look at the languages pane, and here I have a list of languages. It's important to be aware of this.

This is the order in which the user would like to see localizations. Now, here they're saying I want to see the English localization first. If you have one, show it to me. That's not going to be a terribly useful list in this case because almost everything is going to be localized to English. Here's another example.

This is what you might see on a system of somebody who's native Chinese. They prefer traditional Chinese, but they can read simplified Chinese. They can manage Japanese and even a little French. So if you have a traditional Chinese localization, that's what they want. If you don't have one, then they'd like to see simplified Chinese. If you don't have either of those, they'll live with Japanese. And if you don't have any of those, French is OK.

And if you don't have French, OK, I'll live with English. And that's how it works. This is how the system decides which localization to show the user, if not all of them are available. So a localization is an .lproj folder that lives inside your application bundle or your framework's bundle.

That's where your localized data lives: your localized nibs, your localized .strings files. The functions look inside the various .lproj folders which are available, in a certain order. That order is defined by the user. And when they find something acceptable, that's what they show to the user.
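The selection logic just described, walking the user's ordered language list and taking the first localization the bundle actually has, can be sketched in Python like this. The function name, the `development` fallback parameter, and the exact-match comparison are assumptions of the sketch; the real system's matching is richer than a simple membership test.

```python
def pick_localization(preferred, available, development="en"):
    """Return the first preferred language that has a localization.

    preferred:   the user's languages, most wanted first
    available:   the languages the bundle has .lproj folders for
    development: hypothetical fallback if nothing in the list matches
    """
    for language in preferred:
        if language in available:
            return language
    return development

# The user from the talk: traditional Chinese, then simplified Chinese,
# then Japanese, then French, then English.
preferred = ["zh-Hant", "zh-Hans", "ja", "fr", "en"]

# A bundle with no Chinese localization falls through to Japanese.
assert pick_localization(preferred, {"en", "fr", "ja", "ru"}) == "ja"
# A bundle that has traditional Chinese gives the user their first choice.
assert pick_localization(preferred, {"en", "zh-Hant"}) == "zh-Hant"
```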

The .lproj folders are named for the language or the dialect that they cover. For this, we use a standard, BCP 47 (Best Current Practice 47), which is a standard way of defining locale identifiers and language identifiers. Here's an example. We're looking inside an application bundle. Here's our Contents. We have our Resources subfolder, and we have a number of different localizations here.

We have en-Dsrt, that's English with the Deseret alphabet, and then regular English, and then French and Japanese and Russian and traditional Chinese. You'll notice my icon isn't localized, so it's not inside an .lproj folder, but everything else is. Now, I include the Deseret alphabet not just to be cute.

There's a really important point you need to be aware of here. You, as an application developer, and your users, are not limited to a predefined set of localizations by Apple. If for some reason your users do in fact want a Deseret Alphabet localization, you can provide it and they can use it, and nothing has to be done to the system. Now, Deseret Alphabet is not terribly likely. There are some other languages that are likely. Thai, for example. We may not localize everything to Thai, but a Thai locale can be provided, a Thai localization. So you are not limited by us to a certain set of locales.

Localization tools. There are a couple of things that you can do to create localizations. You can hand edit them. You can hand edit the nibs and strings yourself if you really want to. Big problem here is you may end up with inconsistent translations. Two different translators will translate the same thing two different ways.

There are tools that help simplify this process. Apple provides one called AppleGlot. There's also a third-party tool, iLocalize. Both of these use glossaries provided by Apple, which are basically a set of the translations that we use. And you can take advantage of these to get consistent translations.

How do we represent locales? Well, we use the NSLocale class. This is the core class to represent a locale object. It's toll-free bridged, again, with its Core Foundation counterpart, CFLocaleRef. Which is, again, very handy, but we'll be focusing on NSLocale. How do I create one? Well, the easiest way is to use currentLocale. That gives me the current locale, kind of. It gives me whatever locale was set when I was launched.

Now, you may want to be a little more responsive than that. If the user changes the locale while you're running, you may want to reflect that back to them. And so in Leopard, we've added a new method, autoupdatingCurrentLocale. This gives you the actual, right now, up-to-date current locale.

If you're going to be responding to when the locale changes, you need to be able to find out when the locale changes. And so we have a new notification you can take advantage of, NSCurrentLocaleDidChangeNotification. If you react to this, then you will find out when the locale has changed and you can update things appropriately. And finally, there's NSLocale's initWithLocaleIdentifier:. I mentioned there's a standard way of giving locales identifiers. This lets you put in an arbitrary locale and hopefully get something useful back.
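The three ways of getting a locale, plus the change notification, might be sketched like this. The `LocaleWatcher` class and the `en_GB` identifier are illustrative assumptions, and the sketch assumes ARC; the NSLocale and NSNotificationCenter calls are the existing Foundation ones.

```objc
#import <Foundation/Foundation.h>

// Hypothetical observer class, for illustration only.
@interface LocaleWatcher : NSObject
- (void)localeDidChange:(NSNotification *)note;
@end

@implementation LocaleWatcher
- (void)localeDidChange:(NSNotification *)note {
    // Re-read any locale-dependent values your UI is showing.
    NSLog(@"Locale is now %@",
          [[NSLocale autoupdatingCurrentLocale] localeIdentifier]);
}
@end

int main(void) {
    @autoreleasepool {
        // Locale as it was when the object was created.
        NSLocale *snapshot = [NSLocale currentLocale];
        // Locale that tracks changes made in System Preferences.
        NSLocale *live = [NSLocale autoupdatingCurrentLocale];
        // Locale built from an explicit identifier.
        NSLocale *british = [[NSLocale alloc] initWithLocaleIdentifier:@"en_GB"];

        LocaleWatcher *watcher = [[LocaleWatcher alloc] init];
        [[NSNotificationCenter defaultCenter]
            addObserver:watcher
               selector:@selector(localeDidChange:)
                   name:NSCurrentLocaleDidChangeNotification
                 object:nil];

        NSLog(@"%@ / %@ / %@", [snapshot localeIdentifier],
              [live localeIdentifier], [british localeIdentifier]);

        [[NSNotificationCenter defaultCenter] removeObserver:watcher];
    }
    return 0;
}
```

In a real application the observer would stay registered for the life of the window or controller, and the notification would arrive on the main run loop; a command-line tool like this one exits before any change could be delivered.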

You can get a list of the locales that come with the system, or you can create your own on the fly if you really want to. So here I have an example. en, English language, hyphen Dsrt, written with the Deseret alphabet, underscore US, reflecting American practices, at calendar equals japanese, using the Japanese calendar instead of whatever the default was. I really doubt you will ever in your life encounter this locale. For the record, I have. It exists. You can define it. Your user can define it. If this is what they want, this can be provided to them.
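As a rough sketch, you could list the system's locales and take an identifier like that one apart as follows. The identifier string is written out here as it was read aloud, which is an assumption about its exact spelling; no particular output is claimed, since the components returned depend on how the system parses it.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // All locale identifiers the system has data for.
        NSArray *all = [NSLocale availableLocaleIdentifiers];
        NSLog(@"%lu locales available", (unsigned long)[all count]);

        // Identifier as described in the talk (spelling assumed).
        NSString *identifier = @"en-Dsrt_US@calendar=japanese";

        // Split the identifier into language, script, country, calendar, etc.
        NSDictionary *parts = [NSLocale componentsFromLocaleIdentifier:identifier];
        NSLog(@"components: %@", parts);

        // Ask for the canonical spelling of the same identifier.
        NSLog(@"canonical: %@",
              [NSLocale canonicalLocaleIdentifierFromString:identifier]);
    }
    return 0;
}
```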

What can you get from a locale? All kinds of stuff. What currency is set? What measurement system is being used? Where are you? So here we have an example. We get our locale, autoupdatingCurrentLocale. Find out the currency, or in this case the currency code, NSLocaleCurrencyCode. Hand that off to objectForKey:. We find out what the currency code is. This is a string. We want to know if we're using the metric system or not. NSLocaleUsesMetricSystem. This gives us back an NSNumber, which we interpret as a Boolean.
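In code, those two queries might look like the following sketch. The keys and methods are the ones named above; the values logged depend on the user's settings.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSLocale *locale = [NSLocale autoupdatingCurrentLocale];

        // ISO currency code as an NSString, for example "USD" or "GBP".
        NSString *currencyCode = [locale objectForKey:NSLocaleCurrencyCode];

        // Comes back as an NSNumber; read it as a Boolean.
        BOOL usesMetric =
            [[locale objectForKey:NSLocaleUsesMetricSystem] boolValue];

        NSLog(@"currency: %@, metric: %@", currencyCode,
              usesMetric ? @"yes" : @"no");
    }
    return 0;
}
```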

Right off we know. Now, suppose I want to show the user the name of the country where they are. Suppose somehow they have forgotten this fact. Easy to do. First, I get the identifier, the ID for the country, the standard country code. I have my locale, say objectForKey:, NSLocaleCountryCode, and I get back something like US.

Now, this is an ID. This is not a user-visible string. NSLocale can hand you the proper string for the user to see. Here we use a method, displayNameForKey:value:. We have our locale. It's the current locale. We say, okay, I want a country code converted to a user-visible string. And the country code in particular is US, and it hands me back the string United States. So this lets you see the proper localization for a particular piece of data.
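A short sketch of the two steps, ID first and display name second. The nil check is an addition for safety, since a locale is not guaranteed to carry a country.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSLocale *locale = [NSLocale currentLocale];

        // Step 1: the standard country code. An ID, not for display.
        NSString *countryCode = [locale objectForKey:NSLocaleCountryCode];

        if (countryCode != nil) {
            // Step 2: the name of that country, localized for `locale`.
            NSString *countryName =
                [locale displayNameForKey:NSLocaleCountryCode
                                    value:countryCode];
            NSLog(@"%@ -> %@", countryCode, countryName);
        } else {
            NSLog(@"The current locale has no country component.");
        }
    }
    return 0;
}
```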

NSLocale objects do not, however, let you know what the user's list of preferred languages is. Fortunately, that is also easy to get. This is new in Leopard. You can give the NSLocale class, not objects, but the class, the message preferredLanguages. This gives you an array back. The array consists of NSStrings. These NSStrings are the proper IDs for the languages in order. So we have here zh-Hant, traditional Chinese, followed by zh-Hans, simplified Chinese, Japanese, French, English, Russian, and so on down the line.

This information has been available for a long time. In the past, you got it through the preferences mechanism, and it was clumsy and difficult, and now it's much easier in case you need to have it. Usually, of course, what you want to know is what the user has set as the current language, so you get this array and just look at the first element.
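For illustration, reading the list and its first element might look like this. The contents of the array depend on the user's Languages pane.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Class method: ordered language IDs, most preferred first.
        NSArray *languages = [NSLocale preferredLanguages];
        NSLog(@"preferred languages: %@", languages);

        // The user's current language is simply the first entry.
        if ([languages count] > 0) {
            NSString *current = [languages objectAtIndex:0];
            NSLog(@"current language: %@", current);
        }
    }
    return 0;
}
```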

When do you use locales? Well, more likely than not, you will never need to use an NSLocale. Because the resource accessing functions do the right thing for you. They are aware of what the current locale is and they will give you back the correct localization based on that.

Now, if you allow the user to override whatever is set in System Preferences, say you're a calendar application and you let the user set the calendar inside your application, or something like that, okay, then you can create your own locale and use it. You need to explicitly use the locale. Or if you want to show data from multiple locales, multiple localizations, again, this is a case where you would actually create and use an NSLocale object.

All right. Those are the basics of how to localize, how to internationalize on Mac OS. Let's see some examples that show how it actually works. So, first we want to get a localized nib, and we want to get a localized data file, and finally we want to get a localized string.

Okay, remember, nibs are where UI elements live. This provides you your menus, things like that. All right, I want my properly localized one. How do I do it? You probably have to do nothing. A lot of applications, if basically all you're dealing with are nibs for your menus and maybe your documents, well, then everything is taken care of for you.

When you're launched, the correct localization is found, the correct nib is dealt with in the system, and you just have to find out when the nib is set up. You don't have to do any work. If you, however, do something like have your own about box or you have a preferences pane or something like that, well, okay, you need a nib for those. How do I do it? There's a method on NSBundle, and it's loadNibNamed:owner:.

You hand in the name of the nib that you want. You hand in the object that's going to own it. You're given back a Boolean that lets you know whether or not it was found and loaded. And that's all the work you have to do. So finding the correct localization is handled automatically for you.
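A minimal sketch of that call, using a hypothetical `AboutController` class and a hypothetical nib named "AboutBox". `+[NSBundle loadNibNamed:owner:]` is the AppKit method described in the session; on much later systems it is deprecated in favor of the instance method `loadNibNamed:owner:topLevelObjects:`, so expect a deprecation warning with a modern SDK.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical controller that owns a separately loaded nib.
@interface AboutController : NSObject
- (void)showAboutBox;
@end

@implementation AboutController
- (void)showAboutBox {
    // "AboutBox" is an assumed nib name. The system picks the copy from the
    // best-matching .lproj folder; no locale handling is needed here.
    BOOL loaded = [NSBundle loadNibNamed:@"AboutBox" owner:self];
    if (!loaded) {
        NSLog(@"Could not find or load the AboutBox nib.");
    }
}
@end
```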

Now, what about a data file? Okay, I have a data file inside my application. Oh, and it's localized. It varies from place to place. I want to make sure I get the right one. How do I do that? Well, again, you use pathForResource:ofType:. So, here's an example. I want my bundle. I'm an application, so I get my bundle.

I'm looking for a file. The file name is Mountains. The file type is plist. Give me a path to it. Hand that off to the bundle. And you get the correct path for the correct localization of that file. Maybe there is no localization at all. Maybe it's something that isn't localized. That's okay.

If there is a localization, you're handed that. If there isn't one, then you're handed the unlocalized file. Everything is taken care of for you. All right. Now, this is a plist. Plists are handy because you can put Objective-C objects of certain types in them, strings and numbers and so on.

So here I have an array inside my plist. This is a file I created, so I know what it is. So I just hand it off to NSArray, arrayWithContentsOfFile:, hand off the path I got, and the correct localization for this data file is turned into an array of objects. I can now step through the array and do whatever it is I want to do.
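Put together, the lookup and the load might be sketched as follows. The file name Mountains and type plist come from the example above; the file is assumed to hold an array at its top level.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSBundle *bundle = [NSBundle mainBundle];

        // Returns the path inside the best-matching .lproj folder, or the
        // unlocalized copy if the file is not localized, or nil if absent.
        NSString *path = [bundle pathForResource:@"Mountains" ofType:@"plist"];

        if (path != nil) {
            // The plist is assumed to contain an array at the top level.
            NSArray *mountains = [NSArray arrayWithContentsOfFile:path];
            for (id mountain in mountains) {
                NSLog(@"%@", mountain);
            }
        } else {
            NSLog(@"Mountains.plist was not found in the bundle.");
        }
    }
    return 0;
}
```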

Last example, we want to get a localized string. We're going to go through this a bit more slowly because there are a couple of gotchas here. I'm going to show a string to the user. This string is going to show them a mountain's name, how high it is, and when it was first climbed. So there are three values that I'm going to deal with. I presumably have a mountain object that contains these. Now, user-visible string, that means I want it localized. So I start off by calling NSLocalizedStringFromTable. I'm going to look for a key, sentence format.

I'm going to look inside mountains.strings to get the string. And I'm telling the localizer, by the way, in case you need to know, this is a sentence. It has the mountain's name as the first parameter, the height as the second parameter, and the climb date as the third parameter. Notice that this is handy not just for the localizer, but also for the programmer, who probably forgot to comment their code here anyway. So there's your comment.

Okay. I have my format. Now I want my localized name. Now, this is assuming I'm using just one table of mountains, and so I do need a localized name. So, again, I call NSLocalizedStringFromTable. I hand in the name of the mountain. I look inside names.strings. This time, it's not possible to have a really useful localization hint. You could put one in. There's certainly no harm to it. In this case, we didn't. And you get back the correctly localized name for the mountain.
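The two lookups might be sketched like this. The key `SentenceFormat`, the table names, the sample .strings entry, and the placeholder height and date strings are all assumptions made for illustration; only the NSLocalizedStringFromTable macro and stringWithFormat: are taken from the session. Positional specifiers such as `%1$@` let a localizer reorder the three values.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Assumed entry in Mountains.strings:
        //   "SentenceFormat" = "%1$@ is %2$@ high and was first climbed on %3$@.";
        NSString *format = NSLocalizedStringFromTable(
            @"SentenceFormat", @"Mountains",
            @"Sentence: %1$@ = mountain name, %2$@ = height, %3$@ = first climb date");

        // The mountain's name doubles as the key into Names.strings.
        // No localizer hint is supplied here, as in the talk.
        NSString *name = NSLocalizedStringFromTable(@"Mount Whitney", @"Names", nil);

        // Placeholders: real code would produce these with NSNumberFormatter
        // and NSDateFormatter rather than hard-coding them.
        NSString *height = @"<localized height>";
        NSString *climbed = @"<localized date>";

        NSLog(@"%@", [NSString stringWithFormat:format, name, height, climbed]);
    }
    return 0;
}
```

If no matching .strings file exists, the macro hands back the key itself, which is a quick way to spot a missing localization during testing.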

Equally easy. Here's where the gotchas come in. You have an NSNumber. You want to include it in a formatted string that the user sees. stringWithFormat: handles NSNumbers. You can hand it an NSNumber and it will format it for you. The temptation is going to be to do that. Don't do that. Numbers are formatted in different ways in different parts of the world.

French and English, yeah, a Frenchman can live with the way the English format numbers and vice versa. In fact, stringWithFormat: doesn't even put the commas in. There's no problem there, surely. But Arabic countries do not use the same set of numerals as the West. They will want to see the Arabic numerals used instead of the Hindu-Arabic numerals that we use. So you want a localized number. How do I do that? Well, I need a number formatter.

Create it, initialize it. I tell it, okay, this is going to be a decimal number. That means you use a decimal point and thousands separators, possibly. And then I just give it the number, hand that off to stringFromNumber:, and I get back a correctly localized string.
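A minimal sketch of those steps, assuming ARC. The number 14505 is just a sample value; the digits and separators in the result depend on the user's locale, so no particular output is shown.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSNumber *height = [NSNumber numberWithInt:14505];  // sample value

        // Create and configure the formatter. It uses the current locale
        // unless told otherwise with setLocale:.
        NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
        [formatter setNumberStyle:NSNumberFormatterDecimalStyle];

        // Locale-appropriate digits, grouping, and decimal separator.
        NSString *localized = [formatter stringFromNumber:height];
        NSLog(@"%@", localized);
    }
    return 0;
}
```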

[Transcript missing]

Okay. Here we go. Now, let's start off by looking a little bit at our localization stuff. So let's go to System Preferences. International Preferences. We're set to the United States. We can live with this for the moment if we really have to. Look at the number of regions available. That's not a terribly long list.

Well, it happens that this is just a list of the countries that have English locales defined. I must admit this is a really weird list. I would not have expected Botswana and Belgium to have English locales defined for them, but they do. So that data is there. If I want to see all the locales, show all regions. This shows me all of the places that we have data for. This is a much longer list.

It's grouped by language, so it's easy to find what I'm looking for. Some languages, like Afrikaans, are only used in one place, South Africa. Others, like Arabic, are used in lots of different countries. So in each of these cases, you can have lots of different places that use a particular language. So there are lots and lots of different places for which localized data exists.

All right. Let's switch back to languages for the moment. All right. Now, we're going to show off localization. So showing English first, well, that's dull. Let's do French first, which is far more interesting. And, okay, let's leave things there for the moment. All right, here's our application. Launch it. And something has gone wrong. This isn't French. I tell you that because you may not notice. What happened? Well, okay, let's look inside here. This is a bundle. It's a folder. Hold down Control. Okay, Show Package Contents.

Okay, Contents, Resources. All right, localizations that this application has: Deseret alphabet, English, Japanese, Russian, traditional Chinese, no French. That's why we saw no French. There is no French localization. All right, we'll deal with that problem in a minute. Now, let's look here. Actually, we do see some French.

We've gotten as much French as the system can provide in the absence of an actual French localization. The dates are French. That's been taken care of for me. Here we have a list of mountains. This is a fascinating list of mountains. We have some very tall mountains here. Mount Everest, K2, Annapurna.

We have some very important American mountains, Mount Whitney, Mount Shasta, for the really patriotic Mount Washington. I have a couple of mountains that I can see from my front porch, so they're there, too. That's fine. We have the date they were first climbed. I don't know when the Mount Olympus in Salt Lake was first climbed, so I leave that blank. That's all right. I can sort everything. This is an NSTableView, so I can do that. I can do this very easily.

Okay. Up at the top, I have a sentence about whatever mountain is selected. I change the selection. That updates automatically. Okay. Now, I have a date and time picker. I did this just to be cute. I'll be honest. Simply because it's doing the right thing. I had to do no work here. I simply put it in my application, and it knows that the user wants to use French, and so it gives them a French calendar.

[Transcript missing]

My application is aware of when the locale changes. Here we go. Heights are now formatted the way it's done in Luxembourg. To be honest, this is not what I was expecting, because the French would have left a space. People in Luxembourg prefer a period. I'm using the metric system now, because I'm in Europe. The heights have switched to meters. The system won't change between metric and English for you. I'm sorry. You do have to do that yourself.

But at least you can find out which you're supposed to use. Now, finally, down here, in case you didn't notice, here I have a list of what locales are available for this application, what localizations are available. English, this is the English-English flag, as opposed to the British flag. Japanese, Russian, and traditional Chinese. So here I'm getting data from different locales. I'm running through the list of available locales. I'm seeing which ones are defined for this application.

And I'm getting the flag icon from them. All right. So everything seems to be working, just so you notice. If I change the order of the languages, it does update here. So that's all working the way I want. Let's go back to French first. All right. Now, we're missing the French localization. Very bad. If only there were a French localization available.
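One way such a list could be built is sketched below. The resource name "Flag" and type "png" are purely hypothetical, since the session does not say how the demo stores its flag icons; the NSBundle and NSLocale methods themselves are existing API.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSBundle *bundle = [NSBundle mainBundle];
        NSLocale *current = [NSLocale currentLocale];

        // Walk every localization the application ships.
        for (NSString *localization in [bundle localizations]) {
            // Human-readable name of the localization, shown in the
            // user's own language. May be nil for unrecognized names.
            NSString *name = [current displayNameForKey:NSLocaleIdentifier
                                                  value:localization];

            // Hypothetical per-localization resource: a flag image stored
            // inside each .lproj folder.
            NSString *flagPath = [bundle pathForResource:@"Flag"
                                                  ofType:@"png"
                                             inDirectory:nil
                                         forLocalization:localization];

            NSLog(@"%@ (%@): %@", localization, name, flagPath);
        }
    }
    return 0;
}
```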

Fortunately, somebody, some kind soul, has left a folder called fr.lproj on my desktop. Let us see what happens when I put it inside my application. Launch, and we're in French. Now, localization can be done in an entirely drop-in way on Mac OS. To add a localization, you just add the .lproj folder and that's it. It is a very simple process. This means that localizations can be added after the fact. After you ship, new localizations can be added to your application. So, all right, everything is in French now. Well, we've done a couple of other changes, too.

We've rearranged things. Our date and time picker is now on the right instead of the left. You do see this. Not so much for French, not so much for Western European languages, but definitely for Middle Eastern languages. Languages which are written right to left tend to prefer to have the UI elements reordered appropriately. So you should not assume that you know how these windows are laid out. The localizer may need to change it to fit local practices.

And of course, we have now our list of localizations showing up correctly. Now notice something else. Annapurna, K2, Mount McKinley, Mount Everest: the list of mountains has changed. Now here, I have a localized set of mountains and data for the mountains. So the people in France are evidently not terribly interested in the mountains that are visible from my front porch. They would rather find out about Le Cervin, the Matterhorn. So the French localizer has changed the list of mountains to reflect what the user wants to see. You'll notice that I've done another bit of cheating here. Well, not real cheating. Another little trick.

In the middle of the sentence, le Cervin is not capitalized. Shouldn't be. It's in the middle of a sentence. When it's standing by itself, it is. So in this case, I have my NSString that I got back, but before I hand it off to the table view, I capitalize it because it's standing by itself. So again, the system is doing the right thing. Now, capitalizing French is really not terribly difficult.

Writing a capitalization routine that can handle hundreds of different languages and 77 different scripts, most of which you've never even heard of, that's a bit harder. Fortunately, NSString is doing the work for us. I just say I want the capitalizedString, and I get it back. All right, that's the demo. As I say, a version of this is currently available on the website, and an updated version will be available soon. If we can go back to the slides.
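The capitalization step is a single call, sketched here with the lowercase form of the name as it would appear mid-sentence. Note that `capitalizedString` capitalizes the first letter of every word and lowercases the rest, so it suits short names better than whole sentences.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Form used in the middle of a sentence.
        NSString *inSentence = @"le cervin";

        // Form used when the name stands by itself, e.g. in a table cell.
        NSString *standalone = [inSentence capitalizedString];

        NSLog(@"%@", standalone);  // logs "Le Cervin"
    }
    return 0;
}
```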

All right. More information. The person to contact is Derek Horn, the application technologies evangelist. [email protected]. He will give you the help that you need or possibly direct you to the place you need to go. Documentation, if you want to find out how to internationalize on Mac OS, excellent documentation is available at the developer website, developer.apple.com, let's be localized about this, stroke internationalization.

That's how the BBC seems to say it all the time. If you want localization tools, Apple's tools, these are a free download. This is where you can get AppleGlot and other tools and the glossaries that we use. developer.apple.com/internationalization/localization/tools. And if you want to know more about Unicode, go to www.unicode.org. Excellent place to start.

There are some sessions with content related to this session. A couple of these have already been presented, but you might want to go back and see what was said there if you didn't attend. There is Designing Applications with Interface Builder. Interface Builder is your basic tool for laying out your UI. That was yesterday at 5:00. This morning we had Font Management and Core Text.

There is a lot of overlap in terms of engineering between people who display text and people who set up interface. There is a lot of overlap in content. So this is a session that you might want to look into if you didn't attend. Coming up this evening, we have text input on iPhone. This evening at 5:00. And then tomorrow we have polishing your Cocoa application. This will be an excellent opportunity to see how this all fits into the bigger picture of Cocoa. A couple of labs.

We have an input method and internationalization lab coming up this afternoon right after this session at 3:30. iPhone text input lab tomorrow at 2:00. And finally, Friday afternoon, you can stick around long enough if you really want for the Core Text lab, 2:00, foundations lab A. And again, there is going to be a lot of expertise there, not just on Core Text but also on internationalization if you have any questions. So in summary.

Unicode is the way to do internationalized text on the Mac, and not just on the Mac, pretty much everywhere. Do not assume that you're dealing with Unicode. Don't assume that you can deal with text on a character-by-character basis. Remember, you have to deal with chunks of characters. Above all, use the APIs built into the system. This means that your work is taken care of. Do not try to write your own international support. Unless you are an expert and have a lot of time, something will go wrong along the way.

Okay. CS101 should have taught you this, but you know what? Programmers are lazy and we sometimes forget it. Never show hard-coded strings to the user. Never, never, never. Hard-coded strings are useful in your application as keys, but never as something that you show to the user. Also, this is a gotcha, never assume anything about the relative layout of objects in your UI. They may have to change from localization to localization.

Don't assume that they're always going to be laid out in the same way. Oh, and of course, never show unlocalized numbers to the user. This is not a CS101 thing. This is something that is easy to forget. Don't do it. Always use a formatted number. Use NSNumberFormatter. Get the correct localization.