Use Java to internationalize your HTML
For one reason or another, you've decided to internationalize your Web site. English and maybe a little Spanish and German just won't do anymore; now you need to support Japanese, Russian, and maybe even Arabic. After all, they don't call the Web "World Wide" for nothing.
Internationalization (abbreviated i18n, because there are 18 letters between 'i' and 'n') involves much more than replacing words in English with words in other languages. You must translate entire sentences or even paragraphs in order to retain the proper context for your text. You must also make sure date, time, currency, and other numerical formats appear properly. And what if you sort words alphabetically? Remember, Chinese doesn't even use an alphabet.
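To make the sorting and number-format concerns concrete, here is a minimal sketch using the standard `java.text.Collator` and `java.text.NumberFormat` classes. The class name `LocaleFormattingSketch` and the sample words are made up for illustration, and the German number output depends on the locale data shipped with your Java runtime.

```java
import java.text.Collator;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.Locale;

final class LocaleFormattingSketch {
    /** Sorts words by the collation rules of the given locale instead of raw code-point order. */
    static String[] sortFor(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    /** Formats a number with the grouping and decimal separators of the given locale. */
    static String formatNumber(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        // \u00c4 is a capital A with an umlaut; the escape keeps this source file pure ASCII.
        String[] words = {"Zebra", "\u00c4pfel", "Banane"};

        String[] naive = words.clone();
        Arrays.sort(naive); // plain String ordering puts the umlaut word after "Zebra"
        System.out.println(Arrays.toString(naive));
        System.out.println(Arrays.toString(sortFor(Locale.GERMAN, words)));

        System.out.println(formatNumber(Locale.US, 1234.5));
        System.out.println(formatNumber(Locale.GERMANY, 1234.5));
    }
}
```

The point of the comparison is that plain `String` ordering works on character codes, which is rarely what a reader of the language expects.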
An obvious place to begin your i18n efforts is with Java. After all, Java 1.1 contains a robust i18n library, and it supports Unicode, the international standard character set that contains almost all of the world's written symbols. So you dig out your favorite Java reference and read all about locales, resource files, and the like. Fine. Now you know how to i18n-ize your applets, assuming, of course, that your users have a Unicode font installed on their systems. (This article assumes you have at least a passing familiarity with i18n in Java, which you can get from any good Java 1.1 or 1.2 book.)
The problem with fonts
In this article I will talk about the font problem. There are Unicode fonts out there that have the thousands of Unicode symbols in them, but there's no reason to believe your users will have one. You can be sure, though, that your Japanese users have a Japanese font, your Russian users have a Cyrillic font, etc. You can also be reasonably sure that almost all of your users are using local versions of Netscape and Microsoft browsers. How does a Japanese version of Communicator, say, know to display a page of Japanese HTML in kanji?
The browser will read the character encoding specified by the Web page. Web pages that contain non-ASCII characters typically contain a META tag that specifies the character encoding of the document. For example, you might see the following META tag near the top of the page of Japanese HTML:
(When I say "Japanese HTML," I mean that the text the user sees is in Japanese; the HTML tags themselves are still in the English-based ASCII we all know and love.) What the browser actually receives is a stream of bytes. It interprets those bytes according to the Shift-JIS character encoding specified in the META tag and, if it has an appropriate Japanese font installed, displays the resulting characters.
The Japanese characters represented by the Shift-JIS encoding are a subset of Unicode, but the Shift-JIS character encoding is not compatible with the Unicode character encodings: if you replaced the charset in the META tag with a Unicode charset such as UTF-8, the browser would decode the bytes incorrectly, and you would see a page of gibberish.
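The mismatch can be reproduced in a few lines of Java. The following is only a sketch: the class name `CharsetMismatchDemo` and the sample text are invented, and because `Shift_JIS` is not among the charsets every Java runtime must provide, the demo falls back to UTF-16BE when it is missing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {
    /**
     * Encodes text with the charset the page was really written in, then decodes
     * the bytes with the charset the page declares, as a browser would.
     */
    static String transcode(String text, Charset actual, Charset declared) {
        byte[] wire = text.getBytes(actual);
        return new String(wire, declared);
    }

    public static void main(String[] args) {
        String nihongo = "\u65e5\u672c\u8a9e"; // the word "Japanese" written in kanji

        // Assumption: use Shift_JIS when the runtime has it, otherwise another non-UTF-8 encoding.
        Charset actual = Charset.isSupported("Shift_JIS")
                ? Charset.forName("Shift_JIS")
                : StandardCharsets.UTF_16BE;

        // Declared charset matches the real one: the text survives.
        System.out.println(nihongo.equals(transcode(nihongo, actual, actual)));                 // true
        // Declared charset is wrong: the same bytes decode to gibberish.
        System.out.println(nihongo.equals(transcode(nihongo, actual, StandardCharsets.UTF_8))); // false
    }
}
```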
As a result, you must support a different character encoding for each language that uses different characters. Supporting multiple character encodings, though, is really not that bad; in fact, it will only take a moment of your translators' time.
That brings up the next point -- supporting your human translators. Make no mistake, for every language you want to support, you will need to hire someone fluent in that language to translate your Web site from English. This process is the most expensive part of i18n, so the more you can do to save your translators' time, the better.
Localizing your Web site
There are actually two parts of making your Web site truly international -- i18n and localization. I18n means making your site language-neutral, while localization means taking your site and putting it into a particular language. (To be precise, localization means putting your site in a particular locale. A locale includes rules about numbers, time, currency, etc., in addition to the language itself. In this article, however, I am only talking about the language.)
Java handles text localization with resource files. A resource file consists of i18n-ized words and phrases paired with expressions of those phrases in the desired language. For example, a German resource file might contain:
On the left are i18n-ized phrases; think of them as variables that stand for a phrase. (I made mine look like English to make the resource file more readable.) The translator takes the English resource file, which might contain:
and replaces the English on the right with text in his or her language. "Sure," you're saying, "German is easy; it's pretty much just ASCII. What about a language like Japanese?" The important fact to remember is that the goal is to send the Japanese browser a stream of symbols that it will convert into Japanese properly. So you ask your translators to use a tool that will produce valid Web pages in their language. For example, our Japanese translator uses the Japanese version of Microsoft FrontPage, not to create HTML (which it will do, of course), but just to enter the text. Because FrontPage is designed to be compatible with browsers, the files it creates will have the proper character encodings.
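The pairing of keys and phrases described above can be sketched in Java as follows. The keys `welcome` and `search_button`, the translations, and the class name `ResourceFileDemo` are all hypothetical; the resource files are held in strings here only so the example runs without any files on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class ResourceFileDemo {
    // Hypothetical resource files: identical keys on the left, language-specific phrases on the right.
    static final String ENGLISH = "welcome=Welcome\nsearch_button=Search\n";
    static final String GERMAN = "welcome=Willkommen\nsearch_button=Suchen\n";

    /** Parses resource-file text into a bundle that can be queried by key. */
    static ResourceBundle parse(String fileText) {
        try {
            return new PropertyResourceBundle(new StringReader(fileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same key yields a different phrase depending on which file was loaded.
        System.out.println(parse(ENGLISH).getString("welcome")); // Welcome
        System.out.println(parse(GERMAN).getString("welcome"));  // Willkommen
    }
}
```

Because the translator only ever touches the right-hand side, the calling code never changes when a new language is added.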
The resource file I give the translator must therefore be HTML, not because the Java i18n library requires it (it doesn't), but because if their translation looks right when the resource file itself is displayed in a browser, then I know that their translated phrases will also look right when my Web pages are displayed in a browser. So the file I give the Japanese translator looks like this:
The # symbol indicates a comment in the resource file, so Java will ignore those lines when it reads the file. However, # is just another character to a browser, so the above page is a complete page of HTML. Once the translator is done, check the file in a Japanese browser. If the translated strings look right, the translator has done their job correctly.
Notice that the character encoding appears twice: once in the META tag (so that the resource file itself will display properly in a browser), and again as the translation of the variable "char_encoding," so that your code can read it.
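A small sketch shows why the same file works for both audiences. The file layout, the `welcome` key, the charset label `Shift_JIS`, and the class name `HtmlWrappedResourceFile` are assumptions made for illustration. The Japanese text is written with Unicode escapes, and a `StringReader` stands in for a reader that decodes the real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class HtmlWrappedResourceFile {
    // Hypothetical file contents: every line of HTML starts with '#', so a browser
    // renders a complete page while the Java loader treats those lines as comments.
    static final String FILE =
          "#<HTML><HEAD>\n"
        + "#<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=Shift_JIS\">\n"
        + "#</HEAD><BODY><PRE>\n"
        + "char_encoding=Shift_JIS\n"
        + "welcome=\u3088\u3046\u3053\u305d\n"
        + "#</PRE></BODY></HTML>\n";

    /** Loads the key/phrase pairs, skipping every line that begins with '#'. */
    static Properties load(String fileText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(FILE);
        System.out.println(props.size());                        // 2: only the real entries survive
        System.out.println(props.getProperty("char_encoding"));  // Shift_JIS
    }
}
```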
Using the Java I18n library
Pulling the appropriate strings into your HTML is a simple matter of creating a Java class that contains public methods that return the appropriate strings given the proper input. I use the
If you know the Java i18n library, all the Java in class Query should look familiar.
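For readers who want something concrete, here is one way such a lookup class might be shaped. It is a hypothetical sketch, not the article's Query class: the name `QuerySketch`, its methods, and the fall-back-to-default behavior are all assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical stand-in for a string lookup class. */
final class QuerySketch {
    private final Map<String, ResourceBundle> bundles = new HashMap<>();
    private final String defaultLanguage;

    QuerySketch(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    /** Registers the resource-file text for one two-letter language code. */
    void register(String language, String fileText) {
        try {
            bundles.put(language, new PropertyResourceBundle(new StringReader(fileText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the phrase for the key in the given language, falling back to the default language, then to the key. */
    String getString(String language, String key) {
        for (String candidate : new String[] {language, defaultLanguage}) {
            ResourceBundle bundle = bundles.get(candidate);
            if (bundle == null) {
                continue;
            }
            try {
                return bundle.getString(key);
            } catch (MissingResourceException e) {
                // Key not translated in this language; try the next candidate.
            }
        }
        return key;
    }
}
```

In a deployed application you would more likely let `ResourceBundle.getBundle(baseName, locale)` find the files on the classpath; the in-memory registration here just keeps the sketch self-contained.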
Next, you write some SSJS functions that load the class Query into the project object (making it available to the entire SSJS application) and then access the various methods of Query, like so:
Note that the variable "client.language" contains the two-letter string that represents the locale of the user. You can set this on an earlier page when the user selects their preferred language, or you can have your Web server set this automatically based on the locale information supplied by the user's browser.
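If you go the automatic route, the browser's locale information normally arrives in the HTTP Accept-Language request header. The sketch below assumes you have that header value as a string; the class name `LanguageNegotiator` and the list of supported languages are hypothetical. It relies on `Locale.LanguageRange` and `Locale.lookupTag`, which were added in Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LanguageNegotiator {
    // Hypothetical set of languages the site has resource files for.
    static final List<String> SUPPORTED = Arrays.asList("en", "de", "ja");

    /** Picks the best supported two-letter language for an Accept-Language value. */
    static String pick(String acceptLanguage, String fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        try {
            // parse() orders the ranges by their q-weights, highest first.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            String tag = Locale.lookupTag(ranges, SUPPORTED);
            return tag != null ? tag : fallback;
        } catch (IllegalArgumentException e) {
            // Malformed header: behave as if the browser sent nothing.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(pick("ja,en-US;q=0.8", "en"));           // ja
        System.out.println(pick("fr-CA,fr;q=0.9,de;q=0.5", "en"));  // de
        System.out.println(pick(null, "en"));                       // en
    }
}
```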
Building your international Web page
We're finally ready to create a Web page. The following is an example of building a Web page with SSJS; if you know Active Server Pages, JHTML, or Server-Side Includes, it should look pretty familiar:
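As a rough Java analogue of this step (a sketch, not the SSJS page itself), the code below assembles a page from localized strings and encodes it with the charset named by "char_encoding." The keys `page_title` and `welcome` and the class name `PageBuilderSketch` are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Objects;

final class PageBuilderSketch {
    /** Escapes the characters a browser would otherwise read as markup. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns the localized phrase, or the key itself if no translation exists. */
    static String phrase(Map<String, String> strings, String key) {
        String value = strings.get(key);
        return value != null ? value : key;
    }

    /** Builds the page; the META tag announces the charset stored under "char_encoding". */
    static String render(Map<String, String> strings) {
        String charset = Objects.requireNonNull(strings.get("char_encoding"), "char_encoding");
        return "<HTML><HEAD>\n"
             + "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=" + charset + "\">\n"
             + "<TITLE>" + escape(phrase(strings, "page_title")) + "</TITLE>\n"
             + "</HEAD><BODY>\n"
             + "<H1>" + escape(phrase(strings, "welcome")) + "</H1>\n"
             + "</BODY></HTML>\n";
    }

    /** Encodes the page with the same charset the META tag announces, so the two cannot disagree. */
    static byte[] toBytes(Map<String, String> strings) {
        return render(strings).getBytes(Charset.forName(strings.get("char_encoding")));
    }
}
```

Deriving both the META tag and the byte encoding from the one "char_encoding" entry is the design choice worth copying, whatever server-side technology you use.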
Conclusion
You know that i18n-ized pages will look right in local browsers, because you know the resource file looked right in the same browser. The resource file is likely to be unreadable when opened in a text editor, but that doesn't matter. All that matters is that the Web pages look the way they're supposed to.
The translators use the character encoding appropriate to their language, and they perform their translations with a Web page tool that is also localized to their language. In addition, all the text they must translate is located in a single resource file, making their job easier, which saves you money and time. Best of all, because your i18n is on the server side, all you send to the browser is localized HTML, which gives you excellent compatibility with the browsers you will find around the world.