Delivering HTML To a WML Device
When Is Converting Worth the Effort?
Given the discussion above, there are a few HTML-to-WML conversions that are more problematic than they are worth:
- Tables
Unless you know that every table in the document is extremely narrow and contains no fancy formatting or parameters, you should simply remove the table tags.
- Graphics
Some gateways will convert graphic files into the required WBMP format. However, most will simply refuse to display the standard JPG/GIF/PNG Web formats. Unless you have the appropriate graphics available in WBMP format, remove the graphics.
- Code
HTML pages that rely on Java, JavaScript, or some other scripting language generally will not be compatible with mobile devices, especially those that support only WML. Devices using IE (such as CE-equipped PDAs) will fare much better, but you can't rely on that.
In short, only textual pages are worth the time to convert. More complex pages should be redesigned for each individual platform you want to support. Keep in mind that straight WML does not have the facility to convert HTML--you must use a CGI or PHP script to deliver the content instead.
Note: See the two previous articles on how to integrate PHP into your WML delivery.
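As a rough sketch of the kind of delivery script mentioned above (not the code from the earlier articles), converted text might be wrapped in a one-card WML deck like this. The function name wml_deck, the card id, and the sample title are made up for illustration, and the title is assumed to contain no characters that need escaping.

```php
<?php
// Hypothetical helper: wraps already-converted text in a one-card WML deck.
// Assumes $title and $body are already safe to place inside WML markup.
function wml_deck($title, $body)
{
    return '<?xml version="1.0"?>' . "\n"
         . '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
         . '"http://www.wapforum.org/DTD/wml_1.1.xml">' . "\n"
         . '<wml><card id="main" title="' . $title . '">'
         . '<p>' . $body . '</p>'
         . "</card></wml>\n";
}

// Send the WML MIME type before any output, then the deck itself.
header('Content-type: text/vnd.wap.wml');
echo wml_deck('News', 'Hello<br />world');
```

Wrapping the whole body in a single pair of paragraph tags keeps the card valid even when the converted text contains nothing but line breaks.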
Conversion Procedures
Converting standard HTML-formatted text is a two-step process. First, remove any tags that are not supported by the target platform. Second, tailor supported tags to the target platform. For example, the line break tag is supported by WML, but needs to have the slash added ("<br />").
Removing Unsupported Tags
To ensure a smooth conversion, remove all but the following tags from the HTML code:
- <p>
- </p>
- <br>
You can also retain text-formatting tags that your target browser supports, such as <i>, <b>, etc.
If you are using PHP, the code to strip the offending tags is very simple:
$wml = strip_tags($html,'<p><br><i><b><u>');
Given the HTML stored in $html, the above code removes every tag except those listed in the second argument to strip_tags, and stores the result in the variable $wml.
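One thing to watch for: strip_tags removes only the tags themselves and keeps the text between them, so the body of a script or style block would end up in the output as plain text. A minimal sketch of one way around that, assuming the blocks are well formed, is to delete them first with a regular expression. The helper name strip_to_basic_tags is hypothetical.

```php
<?php
// Hypothetical helper: drop <script> and <style> blocks, then strip other tags.
function strip_to_basic_tags($html)
{
    // Remove each script or style element together with its contents.
    // Assumes the opening and closing tags are properly paired.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html);

    // Keep only the tags the target browser is expected to support.
    return strip_tags($html, '<p><br><i><b><u>');
}

$html = '<h1>News</h1><script>alert("hi");</script>'
      . '<p>Hello <a href="x.html">world</a></p>';
echo strip_to_basic_tags($html); // prints: News<p>Hello world</p>
```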
If you are feeling adventurous and know the format of tables in the code, you can parse the table tags down to the bare minimum parameters (as supported by your target browser). However, only the smallest tables will display conveniently on mobile devices.
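For the adventurous, here is one possible sketch of paring a table down. It assumes a single, simple, non-nested table with the same number of cells in every row, and it assumes the target browser wants a columns attribute on the table tag, as WML tables do. The helper name simplify_table is hypothetical, and the table tags would also have to be added to the strip_tags list to survive the earlier step.

```php
<?php
// Hypothetical helper; assumes one simple, non-nested table whose rows
// all contain the same number of cells.
function simplify_table($table)
{
    // Lowercase the table tags, drop their attributes, and turn <th> into <td>.
    $table = preg_replace_callback(
        '#<(/?)(table|tr|td|th)\b[^>]*>#i',
        function ($m) {
            $name = strtolower($m[2]);
            return '<' . $m[1] . ($name === 'th' ? 'td' : $name) . '>';
        },
        $table
    );

    // Count the cells in the first row to fill in the columns attribute.
    $firstRowEnd = strpos($table, '</tr>');
    $firstRow = ($firstRowEnd === false) ? $table : substr($table, 0, $firstRowEnd);
    $columns = substr_count($firstRow, '<td>');

    return str_replace('<table>', '<table columns="' . $columns . '">', $table);
}
```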
Converting Supported Tags
Although paragraph (<p>) and line break (<br />) tags are supported in WML, their usage varies from that in HTML. For example, blocks of text must be enclosed in paragraph tags; you cannot use a stray tag to separate paragraphs, like this:
Paragraph . . .
<p>
Paragraph . . .
Although such use is sloppy anywhere, it has become prevalent in HTML pages. Instead of creating a sophisticated parsing scheme to ensure matching pairs of tags, it's much easier to convert all opening and closing paragraph tags to double line break tags and enclose the entire converted text in a single pair of paragraph tags. This breaks the line where each paragraph tag appeared and inserts the extra space between the paragraphs.
Again, if you are using PHP, the code is straightforward:
$wml = str_replace("<p>","<br /><br />",$wml);
$wml = str_replace("</p>","<br /><br />",$wml);
The above code will replace every "<p>" and "</p>" with "<br /><br />".
Each line break tag in WML must end in a slash. A similar PHP str_replace statement takes care of this requirement:
$wml = str_replace("<br>","<br />",$wml);
Note: PHP functions that support regular expressions can be more versatile and can do more work per statement if constructed correctly. I prefer to use individual statements for later flexibility and clearer code.
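For comparison, here is a minimal sketch of the regular-expression approach, assuming the text has already been through strip_tags. The helper name normalize_breaks is hypothetical. Unlike the plain str_replace calls, these patterns also catch uppercase tags and tags that carry attributes, such as <P align="center">.

```php
<?php
// Hypothetical helper; assumes $wml has already been through strip_tags().
function normalize_breaks($wml)
{
    // Any opening or closing paragraph tag, in any case, with or without
    // attributes, becomes a double line break.
    $wml = preg_replace('#</?p\b[^>]*>#i', '<br /><br />', $wml);

    // Any line break tag (<br>, <BR>, <br/>, <br clear="all">) becomes <br />.
    $wml = preg_replace('#<br\b[^>]*>#i', '<br />', $wml);

    return $wml;
}

echo normalize_breaks('<P align="center">One</P><br>Two');
// prints: <br /><br />One<br /><br /><br />Two
```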
Miscellaneous Cleanup
Two more items need to be cleaned up to display correctly in WML: ampersands ("&") and dollar signs ("$"). An ampersand must be converted to an entity ("&amp;"), and a dollar sign must be doubled ("$$") because WML uses a single dollar sign to introduce a variable reference.
Again, in PHP you can use the str_replace function:
$wml = str_replace("&","&amp;",$wml);
$wml = str_replace('$','$$',$wml);
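One caveat: if the HTML already contains entities such as &amp; or &quot;, replacing every ampersand turns them into &amp;amp; and &amp;quot;. A hedged sketch of one workaround is to escape only the ampersands that do not already begin an entity. The helper name escape_wml_text is hypothetical, and whether a given named entity is understood by your target browser is something you would still need to check.

```php
<?php
// Hypothetical helper name.
function escape_wml_text($wml)
{
    // Escape only ampersands that do not already start an entity
    // such as &amp;, &#169;, or &#xA9;.
    $wml = preg_replace(
        '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
        '&amp;',
        $wml
    );

    // Single quotes keep PHP from treating the dollar sign as a variable.
    return str_replace('$', '$$', $wml);
}

echo escape_wml_text('Fish & chips &amp; peas cost $5');
// prints: Fish &amp; chips &amp; peas cost $$5
```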
Note: An abundance of special characters can find their way into otherwise mundane HTML code. For example, when text is cut-and-pasted from a word processing document into HTML documents, single and double quotes usually appear as extended ASCII characters, and must be converted to the appropriate plain text characters or HTML entities. Only direct experience and experimentation with your specific documents can determine what problems you may have and need to work around.
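As an illustration of that kind of cleanup, the sketch below maps the most common word-processor characters to plain text. It assumes the text is in the single-byte Windows-1252 encoding; running it on UTF-8 text would corrupt multibyte characters. The helper name plain_quotes and the choice of replacements are assumptions, not a complete list.

```php
<?php
// Hypothetical helper; assumes $text is single-byte Windows-1252, not UTF-8.
function plain_quotes($text)
{
    return strtr($text, array(
        "\x91" => "'",   // left single quote
        "\x92" => "'",   // right single quote or apostrophe
        "\x93" => '"',   // left double quote
        "\x94" => '"',   // right double quote
        "\x96" => '-',   // en dash
        "\x97" => '--',  // em dash
        "\x85" => '...', // ellipsis
    ));
}
```

Run it before the ampersand and dollar sign replacements so that anything it produces goes through the same escaping as the rest of the text.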
About the Author
Steve Schafer is president and CEO of Progeny Linux Systems, a Linux-based consulting company in Indianapolis, Indiana. He has written several technical books and articles and can be reached at sschafer@synergy-tech.com.
