This series of articles describes how to provide Web content to mobile devices through WML (Wireless Markup Language). This article covers techniques to use when delivering standard HTML to WML-compatible devices.
Note: These articles cover WML and WMLScript version 1.1, which are supported by the majority of mobile devices in use today. The articles assume a working knowledge of HTML and general Web technologies, and further assume that you have read the previous article(s) in this series.
Delivering Converted HTML
There are several reasons why you might need to deliver standard HTML markup text to a WML-compatible device. You may have data stored in a database that is typically displayed in a standard browser; legacy data or pages that resist conversion; or cross-platform text that needs to be primarily available for a standard HTML browser, but would be useful if delivered to WML clients.
For example, I’m the administrator for a movie news Web site. The articles for the site are marked up using standard HTML, stored in a SQL database, and delivered to the clients using PHP pages. The bite-sized articles also make for great wireless content (something that can be browsed while in an airport or during other downtime), so I decided to make this content available in WML. Unfortunately, I quickly found out how incompatible even minor HTML tags are with WML, creating the need for some simple conversion procedures. Although not perfect, those procedures are used as the basis for this article.
Tip: The quick-and-dirty methods described in this article are handy as a temporary or short-term measure. If you intend to support a particular platform long-term, I recommend creating custom code for that platform.
Standard HTML vs. WML
Standard HTML documents don’t work well on WML-capable devices. Even if the wireless services offer translation services through their gateways, standard HTML seldom displays as the developer or user would like.
Limited Tags
WML supports a very limited subset of HTML tags. Among those supported are the following tags:
- Character formatting
  - <B> – Bold
  - <I> – Italic
  - <U> – Underline
  - <BIG> – Big text
  - <SMALL> – Small text
  - <STRONG> – Strong (visually emphasized) text
  - <P> – Paragraph
- Table tags
  - <TABLE>
  - <TR> – Table row
  - <TD> – Table column/cell
Several tags are nearly alike in the two languages, but their formatting and/or parameters are different enough to cause problems. For example, the line break tag is simply <br> in HTML, but <br /> in WML. Also, tags such as the table tags support many more options and parameters in HTML than in WML, so HTML tables rarely display properly in WML.
Device Display Limitations
Standard HTML documents are generally designed for large displays, such as 800 x 600 resolution CRTs connected to a PC, not a 240 x 320 LCD on a PDA (or smaller, if a cell phone). Even devices that run HTML-compliant browsers (such as IE in Windows CE devices) have problems with the majority of today’s Web sites.
Figure: The almost unrecognizable internet.com home page, displayed in IE on a Pocket PC (Windows CE).
Tip: To gauge roughly how a page will look on a smaller device, shrink your standard PC browser window down to that size.
Most mobile devices don’t support the vast array of text formatting available to PC browsers. For example, earlier versions of certain mobile browsers don’t support underlining; others don’t support italic or bold text. Tables are especially problematic due to their width.
Device memory is also a problem. Most mobile browsers only support pages (decks) a few kilobytes in size, requiring the content to be broken down into bite-sized chunks and displayed across several cards, if not several decks.
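As a rough illustration of breaking content into bite-sized chunks, the following sketch splits plain text at word boundaries so that each chunk stays under a byte limit. The function name split_into_chunks and the limit you pass in are assumptions made for this example; the real limit depends on the target device and gateway. The sketch works on plain text only and makes no attempt to keep paired tags together.

```php
<?php
// Hypothetical helper (not from the original article): splits plain text
// into chunks of at most $maxBytes bytes, breaking only between words.
// A single word longer than $maxBytes is kept whole in its own chunk.
function split_into_chunks($text, $maxBytes)
{
    $chunks = [];
    $current = '';
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $candidate = ($current === '') ? $word : $current . ' ' . $word;
        if (strlen($candidate) > $maxBytes && $current !== '') {
            // Adding this word would overflow the chunk: start a new one.
            $chunks[] = $current;
            $current = $word;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

Each returned chunk could then be placed on its own card, or served as its own deck with a link to the next one.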
Finally, most modern PC-based browsers (IE, Mozilla, Netscape, and so on) have built-in logic to handle incomplete or misused tags. For example, most PC browsers are forgiving of HTML documents that fail to close a major element such as a table or the body of the document. Most mobile browsers are far less forgiving, requiring very strict use of tags.
Device Input Limitations
Interactive Web pages present even more challenges to the mobile user. Anyone who has needed to tap/write out even a short note on a PDA can appreciate the need to keep interfaces simple. Those who have tried to compose more than just a few characters on a standard cell phone keypad can appreciate this even more strongly.
The simplest Web interface is the form, whose structure is considerably different in WML. Simply converting the structure and tags isn’t sufficient; you also have to consider how it will affect the end user on his or her individual platform. For example, choosing the correct state code from a drop-down list is easy on a standard browser. However, drop-down lists translate to select lists in WML, necessitating a list of 50 entries that the user must scroll through (usually 9 items per page) to select the proper code.
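To make the select-list form concrete, here is a minimal sketch of how a PHP page might emit one. The helper name wml_select and the sample data are assumptions for illustration only. In a real deck the list would sit inside a paragraph on a card, and any dollar signs in the labels would still need the doubling described under Miscellaneous Cleanup.

```php
<?php
// Hypothetical helper (not from the original article): builds a WML
// <select> list from an array of value => label pairs.
function wml_select($name, array $options)
{
    $wml = '<select name="' . htmlspecialchars($name) . '">' . "\n";
    foreach ($options as $value => $label) {
        $wml .= '  <option value="' . htmlspecialchars((string) $value) . '">'
              . htmlspecialchars((string) $label) . "</option>\n";
    }
    return $wml . "</select>\n";
}

// Example: a (shortened) list of state codes.
echo wml_select('state', ['IL' => 'Illinois', 'IN' => 'Indiana']);
```

With all 50 states in the array, the markup is trivial to generate, but the user still faces several screens of scrolling, which is the usability cost the paragraph above describes.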
When Is Converting Worth the Effort?
Given the discussion above, there are a few HTML-to-WML conversions that are more problematic than they are worth:
- Tables
Unless you know that every table in the document is extremely narrow and contains no fancy formatting/parameters, you should simply remove the table tags.
- Graphics
Some gateways will convert graphic files into the required WBMP format. However, most will simply refuse to display the standard JPG/GIF/PNG Web formats. Unless you have the appropriate graphics available in the WBMP format, remove the graphics.
- Code
HTML pages that rely on Java, JavaScript, or some other scripting language generally will not be compatible with mobile devices, especially those compatible only with WML. Devices using IE (such as CE-equipped PDAs) will fare much better, but you can’t rely on that.
In short, only textual pages are worth the time to convert. More complex pages should be redesigned for each individual platform you want to support. Keep in mind that straight WML does not have the facility to convert HTML; you must use a CGI or PHP script to deliver the content instead.
Note: See the two previous articles on how to integrate PHP into your WML delivery.
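As a reminder of what such a script has to produce, the following minimal sketch wraps text that has already been converted in a one-card WML 1.1 deck. The function name build_wml_deck, the card id, and the sample title are assumptions for this example, and the body is assumed to be valid WML content already.

```php
<?php
// Hypothetical helper (not from the original article): wraps text that has
// already been converted to WML-safe markup in a single-card WML 1.1 deck.
function build_wml_deck($title, $wmlBody)
{
    return '<?xml version="1.0"?>' . "\n"
         . '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
         . '"http://www.wapforum.org/DTD/wml_1.1.xml">' . "\n"
         . "<wml>\n"
         . '  <card id="main" title="' . htmlspecialchars($title) . '">' . "\n"
         . '    <p>' . $wmlBody . "</p>\n"
         . "  </card>\n"
         . "</wml>\n";
}

// Usage in the page script, before any other output is sent:
// header('Content-type: text/vnd.wap.wml');
// echo build_wml_deck('Movie News', $wml);
```

The header call sets the WML MIME type, which the gateway and device need to see before they will treat the response as a deck.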
Conversion Procedures
Converting standard HTML-formatted text is a two-step process. First, remove any tags that are not supported by the target platform. Second, tailor supported tags to the target platform. For example, the line break tag is supported by WML, but needs to have the slash added (“<br/>”).
Removing Unsupported Tags
To ensure a smooth conversion, remove all but the following tags from the HTML code:
- <p>
- </p>
- <br>
You can also retain text-formatting tags that your target browser supports, such as <i>, <b>, etc.
If you are using PHP, the code to strip the offending tags is very simple:
$wml = strip_tags($html, '<p><br><i><b><u>');
Starting from the HTML stored in $html, the above code removes all tags except those listed in the second argument to strip_tags, and stores the result in the variable $wml.
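A short worked example may help show what survives the call. The sample markup below is invented for illustration. Note that strip_tags removes only the tags themselves, not the text between them, and that it leaves any attributes on the allowed tags untouched.

```php
<?php
// Sample input invented for this example.
$html = '<div class="story"><h2>Review</h2><p>A <b>great</b> film.<br>'
      . '<a href="more.php">More</a></p></div>';

// Keep only the tags the target browser is assumed to support.
$wml = strip_tags($html, '<p><br><i><b><u>');

// The <div>, <h2>, and <a> tags are gone; their text content remains.
echo $wml;

// Attributes on allowed tags are kept as-is, so this paragraph tag
// still carries its align attribute after the call.
$kept = strip_tags('<p align="center">Hi</p>', '<p>');
```

Because attributes pass through, a tag such as <p align="center"> will not match a later plain-text search for "<p>", which is worth keeping in mind when converting the remaining tags.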
If you are feeling adventurous and know the format of tables in the code, you can parse the table tags down to the bare minimum parameters (as supported by your target browser). However, only the smallest tables will display conveniently on mobile devices.
Converting Supported Tags
Although paragraph (<p>) and line break (<br />) tags are supported in WML, their usage varies from that in HTML. For example, blocks of text must be enclosed in paragraph tags; you cannot use a stray tag to separate paragraphs, like this:
Paragraph . . .
<p>
Paragraph . . .
Although such usage is sloppy anywhere, it has become prevalent in HTML pages. Instead of creating a sophisticated parsing scheme to ensure matching pairs of tags, it’s much easier to convert all opening and closing paragraph tags to double line break tags. This causes the current line to break where the paragraph tag was used, and inserts the extra space between the paragraphs.
Again, if you are using PHP, the code is straightforward:
$wml = str_replace('<p>', '<br/><br/>', $wml);
$wml = str_replace('</p>', '<br/><br/>', $wml);
The above code will replace every “<p>” and “</p>” with “<br/><br/>”.
Each line break tag in WML must end in a slash. A similar PHP str_replace statement takes care of this requirement:
$wml = str_replace('<br>', '<br/>', $wml);
Note: PHP functions that support regular expressions can be more versatile and can do more work per statement if constructed correctly. I prefer to use individual statements for later flexibility and clearer code.
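For readers who do want the regular-expression route, the following is one possible sketch using preg_replace. The helper name convert_breaks and both patterns are my own assumptions, not the code used on the site. Unlike a plain string replacement, they also catch uppercase tags and tags that carry attributes.

```php
<?php
// Hypothetical helper (not from the original article): regex-based
// alternative to the individual str_replace statements.
function convert_breaks($wml)
{
    // Any opening or closing paragraph tag, in any case, with or without
    // attributes, becomes a double line break. \b keeps <pre> from matching.
    $wml = preg_replace('#</?p\b[^>]*>#i', '<br/><br/>', $wml);

    // Any line break variant (<br>, <BR>, <br />, <br clear="all">)
    // becomes <br/>. Breaks that are already <br/> are left unchanged.
    $wml = preg_replace('#<br\b[^>]*>#i', '<br/>', $wml);

    return $wml;
}

echo convert_breaks('<P align="left">One</p>Two<BR>Three<br />');
```

The trade-off is the one noted above: two patterns do the work of several statements, at some cost in readability.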
Miscellaneous Cleanup
Two more items need to be cleaned up to display correctly in WML: ampersands (“&”) and dollar signs (“$”). An ampersand must be converted to an entity (“&amp;”), and a dollar sign must be doubled (“$$”).
Again, in PHP you can use the str_replace function:
$wml = str_replace('&', '&amp;', $wml);
$wml = str_replace('$', '$$', $wml);
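One caveat with a blanket ampersand replacement is that text which already contains entities (for example “&amp;” stored in the database) ends up double-encoded. The sketch below is one way around that, under the assumption that anything shaped like an entity should be left alone. The helper name escape_wml_specials and the pattern are mine, not part of the original procedure.

```php
<?php
// Hypothetical helper (not from the original article): escapes the two
// characters WML treats specially, without double-encoding entities.
function escape_wml_specials($wml)
{
    // Escape only ampersands that do not already begin a named or
    // numeric entity such as &amp;, &#169;, or &#xA9;.
    $wml = preg_replace(
        '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
        '&amp;',
        $wml
    );

    // WML uses "$" to introduce variable references, so a literal
    // dollar sign has to be written as "$$".
    return str_replace('$', '$$', $wml);
}

echo escape_wml_specials('AT&T: $5 &amp; up');
```

Named entities that pass through untouched may still need attention if the target browser does not recognize them.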
Note: An abundance of special characters can find their way into otherwise mundane HTML code. For example, when text is cut-and-pasted from a word processing document into HTML documents, single and double quotes usually appear as extended ASCII characters, and must be converted to the appropriate plain text characters or HTML entities. Only direct experience and experimentation with your specific documents can determine what problems you may have and need to work around.
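As one example of such a workaround, the sketch below maps the four common “smart” quote characters to plain quotes. It assumes the text is stored as UTF-8; if your data is in Windows-1252 instead, the same characters are the single bytes 0x91 through 0x94 and the table would need to change. The helper name plain_quotes is hypothetical.

```php
<?php
// Hypothetical helper (not from the original article): replaces curly
// quotes with plain ASCII quotes. Assumes UTF-8 encoded input.
function plain_quotes($text)
{
    return strtr($text, [
        "\xE2\x80\x98" => "'",  // U+2018 left single quote
        "\xE2\x80\x99" => "'",  // U+2019 right single quote
        "\xE2\x80\x9C" => '"',  // U+201C left double quote
        "\xE2\x80\x9D" => '"',  // U+201D right double quote
    ]);
}

echo plain_quotes("\xE2\x80\x9CHi\xE2\x80\x9D it\xE2\x80\x99s");
```

Other characters from word processors (dashes, ellipses, bullets) can be added to the same table as you encounter them.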
About the Author
Steve Schafer is president and CEO of Progeny Linux Systems, a Linux-based consulting company in Indianapolis, Indiana. He has written several technical books and articles and can be reached at sschafer@synergy-tech.com.