April 19, 2014

Use Java to internationalize your HTML

  • November 19, 1998
  • By Jason Bloomberg


For one reason or another, you've decided to internationalize your Web site. English and maybe a little Spanish and German just won't do anymore; now you need to support Japanese, Russian, and maybe even Arabic. After all, they don't call the Web "World Wide" for nothing.

Internationalization (abbreviated i18n, because there are 18 letters between 'i' and 'n') involves much more than replacing words in English with words in other languages. You must translate entire sentences or even paragraphs in order to retain the proper context for your text. You must also make sure date, time, currency, and other numerical formats appear properly. And what if you sort words alphabetically? Remember, Chinese doesn't even use an alphabet.
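To make the sorting problem concrete, here is a minimal sketch that compares Java's default String ordering, which compares raw UTF-16 code units, with a locale-sensitive java.text.Collator. The class name SortDemo and the sample German words are illustrative only.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class SortDemo {
    // Sorts by raw UTF-16 code unit values, as String.compareTo does.
    static List<String> naiveSort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Sorts using the collation rules of the given locale.
    static List<String> localeSort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "\u00C4pfel" is "Äpfel" (apples), written with a Unicode escape
        List<String> words = Arrays.asList("Zucker", "\u00C4pfel", "Apfel");
        System.out.println(naiveSort(words));                 // Ä (U+00C4) lands after Z
        System.out.println(localeSort(words, Locale.GERMAN)); // Ä sorts next to A
    }
}
```

The naive sort puts "Äpfel" after "Zucker" because U+00C4 is numerically larger than "Z"; the German collator places it right after "Apfel", as a German reader expects.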

An obvious place to begin your i18n efforts is with Java. After all, Java 1.1 contains a robust i18n library, and it supports Unicode, the international standard character set that contains almost all of the world's written symbols. So you dig out your favorite Java reference and read all about locales, resource files, and the like. Fine. Now you know how to i18n-ize your applets, assuming, of course, your users have a Unicode font installed in their systems. (This article assumes you have at least a passing familiarity with i18n in Java, which you can get from any good Java 1.1 or 1.2 book.)

Right away you see a big problem. "Wait a minute," you say, "my Web site may have a few applets on it, but the bulk of what my users see is just HTML! And in any case, there's no reason to believe my users have a Unicode font installed on their computers! And if that weren't bad enough, Java i18n is supported by only the most recent browser versions!" Are you back to square one? Can you use Java to i18n-ize your site? Sure you can. The trick is to use server-side Java, which you access from the browser using some sort of server-side scripting technology. I will use Server-Side JavaScript (SSJS) and server-side LiveConnect, which are supported by the Netscape Enterprise Server, but you could use Java Servlets or some other technology.

The problem with fonts

In this article I will talk about the font problem. There are Unicode fonts out there that contain thousands of Unicode characters, but there's no reason to believe your users will have one. You can be sure, though, that your Japanese users have a Japanese font, your Russian users have a Cyrillic font, and so on. You can also be reasonably sure that almost all of your users are running localized versions of the Netscape and Microsoft browsers. How does a Japanese version of Communicator, say, know to display a page of Japanese HTML in kanji?

The browser will read the character encoding specified by the Web page. Web pages that contain non-ASCII characters typically contain a META tag that specifies the character encoding of the document. For example, you might see the following META tag near the top of the page of Japanese HTML:

<META http-equiv="Content-Type"
content="text/html; charset=x-sjis">

(When I say "Japanese HTML," I mean that the text the user sees is in Japanese; the HTML tags themselves are still in the English-based ASCII we all know and love.) A browser receives a stream of bytes, not characters. It interprets those bytes according to the character encoding named in the META tag (Shift-JIS, in this case) and, if an appropriate Japanese font is installed, displays the resulting characters in Japanese.

The Japanese characters that Shift-JIS can represent are a subset of Unicode, but the Shift-JIS encoding is not byte-compatible with Unicode encodings such as UTF-8: if you replaced the charset in the META tag with a Unicode charset, the browser would decode the bytes incorrectly, and you would see a page of gibberish.

As a result, you must support a different character encoding for each language that uses different characters. Supporting multiple character encodings, though, is really not that bad; in fact, it will only take a moment of your translators' time.
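A quick way to see why one encoding won't serve every language is to encode the same Japanese word both ways. This minimal sketch uses the Java charset names Shift_JIS and UTF-8 (Shift_JIS is the standard name for the encoding the META tag above calls x-sjis) and assumes a JDK that ships the Shift_JIS charset, as full JDKs do. The class name EncodingDemo is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // "konnichiwa" in hiragana, written with Unicode escapes
    static final String HELLO = "\u3053\u3093\u306B\u3061\u306F";
    static final Charset SJIS = Charset.forName("Shift_JIS");

    public static void main(String[] args) {
        byte[] sjis = HELLO.getBytes(SJIS);
        byte[] utf8 = HELLO.getBytes(StandardCharsets.UTF_8);
        // Same five characters, different bytes:
        // 2 bytes per character in Shift-JIS, 3 in UTF-8
        System.out.println(sjis.length + " bytes vs " + utf8.length + " bytes");

        // Decoding Shift-JIS bytes with the wrong charset does not give
        // back the original text: this is the "page of gibberish"
        System.out.println(new String(sjis, StandardCharsets.UTF_8).equals(HELLO)); // false
        System.out.println(new String(sjis, SJIS).equals(HELLO));                   // true
    }
}
```

Running it prints "10 bytes vs 15 bytes", then false, then true: the bytes only turn back into the right characters when they are decoded with the encoding that produced them.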

That brings up the next point -- supporting your human translators. Make no mistake, for every language you want to support, you will need to hire someone fluent in that language to translate your Web site from English. This process is the most expensive part of i18n, so the more you can do to save your translators' time, the better.

Localizing your Web site

There are actually two parts to making your Web site truly international -- i18n and localization. I18n means making your site language-neutral, while localization means taking your site and putting it into a particular language. (To be precise, localization means putting your site in a particular locale. A locale includes rules about numbers, time, currency, etc., in addition to the language itself. In this article, however, I am only talking about the language.)
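Although this article sticks to language, the other locale rules are already handled by the java.text package. As a small illustration (the class name LocaleFormatDemo is hypothetical), the same number comes out with different grouping and decimal separators for U.S. English and German:

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleFormatDemo {
    // Formats a number with the digit-grouping and decimal-separator
    // conventions of the given locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234.5, Locale.US));      // 1,234.5
        System.out.println(format(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```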

Java handles text localization with resource files. A resource file consists of i18n-ized words and phrases paired with expressions of those phrases in the desired language. For example, a German resource file might contain:

good_morning = Guten Morgen
how_are_you = Wie geht's?

On the left are i18n-ized phrases; think of them as variables that stand for a phrase. (I made mine look like English to make the resource file more readable.) The translator takes the English resource file, which might contain:

good_morning = good morning
how_are_you = How are you?

and replaces the English on the right with text in his or her language. "Sure," you're saying, "German is easy; it's pretty much just ASCII. What about a language like Japanese?" The important fact to remember is that the goal is to send the Japanese browser a stream of symbols that it will convert into Japanese properly. So you ask your translators to use a tool that will produce valid Web pages in their language. For example, our Japanese translator uses the Japanese version of Microsoft FrontPage, not to create HTML (which it will do, of course), but just to enter the text. Because FrontPage is designed to be compatible with browsers, the files it creates will have the proper character encodings.

The resource file I give the translator must therefore be HTML, not because the Java i18n library requires it (it doesn't), but because if their translation looks right when the resource file itself is displayed in a browser, then I know that their translated phrases will also look right when my Web pages are displayed in a browser. So the file I give the Japanese translator looks like this:

#<html><head><meta http-equiv="Content-Type" content="text/html; charset=x-sjis"></head><body><pre>
char_encoding = x-sjis

#Please translate the phrases on the right of the equals signs below!
good_morning = good morning
how_are_you = How are you?

#</PRE></BODY></HTML>

The # symbol indicates a comment in the resource file, so Java will ignore those lines when it reads the file. However, # is just another character to a browser, so the above page is a complete page of HTML. Once the translator is done, check the file in a Japanese browser. If the translated strings look right, the translator has done their job correctly.

Notice that the character encoding appears twice: once in the META tag (so that the resource file will display properly in a browser), and again as the value of the key char_encoding, which will be used to localize the META tag on the page that you serve to the browser.
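Why do the translator's Shift-JIS bytes survive the trip through Java at all? A likely explanation, offered as a hedged sketch rather than a description of the author's server: in the Java versions of this article's era (and through Java 8), .properties resource files are read as ISO-8859-1, which maps every byte to exactly one char. If those chars are later written out one byte each, the browser receives the translator's original bytes unchanged. (Java 9 changed the default for resource bundles to UTF-8.) The sketch simulates this with java.util.Properties.load(InputStream), which always reads ISO-8859-1; the class and method names are hypothetical. One caveat: some Shift-JIS characters (ソ, for example) have 0x5C, the backslash, as their second byte, and the properties parser treats a backslash as an escape, so such characters can be corrupted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Latin1RoundTrip {
    // Builds the bytes of a one-line resource file whose value is the
    // given text encoded in the translator's charset (e.g., Shift-JIS).
    static byte[] resourceFileBytes(String key, String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prefix = (key + " = ").getBytes(StandardCharsets.US_ASCII);
        byte[] value = text.getBytes(charset);
        out.write(prefix, 0, prefix.length);
        out.write(value, 0, value.length);
        out.write('\n');
        return out.toByteArray();
    }

    // Loads the file as Properties.load(InputStream) does (as ISO-8859-1)
    // and turns the value back into the bytes sent to the browser.
    static byte[] bytesForBrowser(byte[] file, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Each char holds one original byte, so ISO-8859-1 restores them
        return props.getProperty(key).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        String ohayou = "\u304A\u306F\u3088\u3046"; // "ohayou" in hiragana
        byte[] file = resourceFileBytes("good_morning", ohayou, sjis);
        byte[] sent = bytesForBrowser(file, "good_morning");
        // A browser told charset=Shift_JIS decodes the original text
        System.out.println(new String(sent, sjis).equals(ohayou)); // true
    }
}
```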

Using the Java I18n library

Pulling the appropriate strings into your HTML is a simple matter of creating a Java class with public methods that return the appropriate strings given the proper input. I use the getPhrase() method to pull the strings out of the resource file:

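import java.util.*;
import java.text.*;	// DateFormat, NumberFormat, etc., for other localization methods

public class Query
	{

	// Returns the phrase stored under the key "input" in the resource
	// bundle for the two-letter language code "loc". For example, "ja"
	// looks for myResourceFile_ja.properties, falling back to other
	// candidates such as the base myResourceFile.properties. getString
	// throws MissingResourceException if no bundle contains the key.
	public synchronized String getPhrase (String input, String loc)
		{
		Locale currentLocale = new Locale (loc, "");
		ResourceBundle bundle =
			ResourceBundle.getBundle ("myResourceFile", currentLocale);
		return bundle.getString (input);
		}

	/*	other localization methods go here, for example,
		ones that return dates, times, and currency amounts	*/

	}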

All the Java in class Query should look familiar if you have worked with the Java i18n library.
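One thing worth guarding against: ResourceBundle.getString throws a MissingResourceException when a key has not been translated yet, which would break the whole page. The following is a hedged sketch of a more forgiving lookup. The class and method names (SafeQuery, phraseOrKey, bundleFrom) are hypothetical, and the bundle is built from an in-memory string with PropertyResourceBundle(Reader), available since Java 6, only so the example runs without resource files on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class SafeQuery {
    // Returns the translated phrase, or the key itself if the bundle has
    // no entry for it, so an untranslated key shows up on the page
    // instead of aborting it.
    static String phraseOrKey(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return key;
        }
    }

    // Builds a bundle from properties-format text (for the demo only).
    static ResourceBundle bundleFrom(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle de = bundleFrom(
            "good_morning = Guten Morgen\nhow_are_you = Wie geht's?\n");
        System.out.println(phraseOrKey(de, "good_morning")); // Guten Morgen
        System.out.println(phraseOrKey(de, "good_night"));   // good_night
    }
}
```

Returning the key keeps the page usable and makes missing translations easy to spot when you review it in a browser.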

Next, you write some SSJS functions that store an instance of Query in the project object (making it available to the entire SSJS application) and then call the methods of Query, like so:

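// Create one shared Query object and keep it in the project object,
// which every client of this SSJS application can see.
function loadquery ()
	{
	project.lock ();	// keep simultaneous requests from racing
	if (project.query == null)
		project.query = new Packages.Query ();
	project.unlock ();
	}

// Return the phrase for key "str" in the user's language. Prepending ""
// turns the returned Java String object into a JavaScript string.
function phrase (str)
	{
	return "" + project.query.getPhrase (str, client.language);
	}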

Note that the variable "client.language" contains the two-letter code (such as ja or de) that identifies the user's language. You can set it on an earlier page where the user selects a preferred language, or you can set it automatically from the Accept-Language header that the user's browser sends with each request.
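If you let the server choose, the Accept-Language header looks something like ja,en-US;q=0.8,en;q=0.6. The sketch below is one hedged way to reduce such a header to the two-letter code that client.language expects. It relies on Locale.LanguageRange.parse, which was added in Java 8 (long after this article), and the class name AcceptLanguage, the helper pickLanguage, and its fallback parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AcceptLanguage {
    // Returns the two-letter code of the highest-weighted language in the
    // Accept-Language header that the site supports, or the fallback.
    static String pickLanguage(String header, List<String> supported, String fallback) {
        if (header == null || header.trim().isEmpty()) {
            return fallback;
        }
        List<Locale.LanguageRange> ranges;
        try {
            ranges = new ArrayList<>(Locale.LanguageRange.parse(header));
        } catch (IllegalArgumentException e) {
            return fallback; // malformed header
        }
        // Highest q-value first; List.sort is stable, so ties keep header order
        ranges.sort((a, b) -> Double.compare(b.getWeight(), a.getWeight()));
        for (Locale.LanguageRange range : ranges) {
            if (range.getWeight() == 0.0) {
                continue; // q=0 means "not acceptable"
            }
            // "en-US" -> "en"
            String primary = range.getRange().split("-")[0].toLowerCase(Locale.ROOT);
            if (supported.contains(primary)) {
                return primary;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<String> supported = Arrays.asList("en", "de", "ja");
        System.out.println(pickLanguage("ja,en-US;q=0.8,en;q=0.6", supported, "en")); // ja
        System.out.println(pickLanguage("fr,de;q=0.7", supported, "en"));             // de
    }
}
```

A fallback language matters because many browsers send languages your site does not support; here French is skipped and German, the next acceptable choice, is used.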

Building your international Web page

We're finally ready to create a Web page. The following is an example of building a Web page with SSJS; if you know Active Server Pages, JHTML, or Server-Side Includes, it should look pretty familiar:
<HTML><HEAD>
<server>
loadquery ();
// The META tag must name the same encoding the translator used
write ('<meta http-equiv="Content-Type" content="text/html; charset='
	+ phrase ('char_encoding') + '">\n');
write ('<title>' + phrase ('good_morning') + '</title>\n');
write ('</head>\n<body>\n');
write ('<h1>' + phrase ('how_are_you') + '</h1>\n');
write ('</body>\n');
</server>
</HTML>

Passing char_encoding to the function phrase() returns the name of the character encoding for the language the user chose (x-sjis for Japanese, in our example). Once the browser sees this name in the META tag, it knows how to display the page properly.
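For comparison, if you took the Java Servlets route mentioned earlier, the same idea applies: declare a charset in the META tag and encode the outgoing page with that same charset. The following is a hedged, servlet-free sketch of that step, so that it runs on its own. The class PageRenderer and its renderPage method are hypothetical, and unlike the SSJS approach above, it assumes the phrases are held as ordinary Unicode Java strings rather than as the translator's raw bytes.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class PageRenderer {
    // Builds the page from localized phrases and encodes it with the same
    // charset the META tag declares, so the two cannot disagree.
    static byte[] renderPage(Map<String, String> phrases, Charset charset) {
        String html = "<html><head>\n"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
            + charset.name() + "\">\n"
            + "<title>" + phrases.get("good_morning") + "</title>\n"
            + "</head>\n<body>\n"
            + "<h1>" + phrases.get("how_are_you") + "</h1>\n"
            + "</body></html>\n";
        return html.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        Map<String, String> ja = new HashMap<>();
        ja.put("good_morning", "\u304A\u306F\u3088\u3046");              // ohayou
        ja.put("how_are_you", "\u304A\u5143\u6C17\u3067\u3059\u304B");   // ogenki desu ka
        byte[] page = renderPage(ja, sjis);
        // Decoding with the declared charset recovers the page text
        System.out.println(new String(page, sjis));
    }
}
```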

Conclusion

You know that i18n-ized pages will look right in local browsers, because you know the resource file looked right in the same browser. The resource file is likely to be unreadable when opened in a text editor, but that doesn't matter. All that matters is that the Web pages look the way they're supposed to.

The translators use the character encoding appropriate to their language, and they perform their translations with a Web page tool that is also localized to their language. In addition, all the text they must translate is located in a single resource file, making their job easier, which saves you money and time. Best of all, because your i18n is on the server side, all you send to the browser is localized HTML, which gives you excellent compatibility with the browsers you will find around the world.

Taking advantage of Java's i18n library on the server side is just one example of the power of LiveConnect -- Netscape's technology for connecting JavaScript and Java. What if you wanted to i18n-ize a high-volume Web solution that included multiple Web servers? Can you locate your resource files on an application server, and have each Web server access the same resource files? Yes, you can, using server-side LiveConnect and Java RMI. However, you will have to wait for my next article for the details.


About the author

Jason Bloomberg has been coding, scripting and programming Web sites since early 1995. He is now Director of Web Technology at TransNexus LLC, an Internet Telephony company, but is best known for his JavaScript games at The Rhodes Arcade. His book, Web Page Scripting Techniques, was published by Hayden Books in 1996. He has two children and lives in Atlanta, GA.






