Delivering HTML To a WML Device, Page 2
When Is Converting Worth the Effort?
Given the discussion above, there are a few HTML-to-WML conversions that are more problematic than they are worth:
Unless you know that every table in the document is extremely narrow and contains no fancy formatting or parameters, you should simply remove the table tags.
Some gateways will convert graphic files into the required WBMP format. However, most will simply refuse to display the standard JPG/GIF/PNG Web formats. Unless you have the appropriate graphics available in the WBMP format, remove the graphics.
In short, only textual pages are worth the time to convert. More complex pages should be redesigned for each individual platform you want to support. Keep in mind that straight WML does not have the facility to convert HTML--you must use a CGI or PHP script to deliver the content instead.
Converting standard HTML-formatted text is a two-step process. First, remove any tags that are not supported by the target platform. Second, tailor supported tags to the target platform. For example, the line break tag is supported by WML, but needs to have the slash added ("<br/>").
Removing Unsupported Tags
To ensure a smooth conversion, remove all but the following tags from the HTML code:
You can also retain text-formatting tags that your target browser supports, such as <i>, <b>, etc.
If you are using PHP, the code to strip the offending tags is very simple:
Given the HTML stored in $html, a call to PHP's strip_tags function removes all tags except those listed in its second (allowed tags) argument, and the result is stored in the variable $wml.
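A minimal sketch of this step follows. The allowed-tag list used here (<p>, <br>, <a>, <b>, <i>) and the sample input are assumptions for illustration only; substitute the tags your target browser actually supports.

```php
<?php
// Sample input (hypothetical); in practice $html holds the page to convert.
$html = '<div class="main"><h1>News</h1><p>Hello <b>world</b><br>'
      . '<img src="logo.gif" alt="logo"><a href="next.wml">Next</a></p></div>';

// Assumed allow-list; adjust it to what the target browser supports.
$allowed = '<p><br><a><b><i>';

// strip_tags() drops every tag not named in the second argument
// and keeps the text that sat between the removed tags.
$wml = strip_tags($html, $allowed);

echo $wml, "\n";
```

Note that strip_tags leaves the attributes of retained tags untouched, so any attributes the target browser does not understand still need separate handling.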
If you are feeling adventurous and know the format of tables in the code, you can parse the table tags down to the bare minimum parameters (as supported by your target browser). However, only the smallest tables will display conveniently on mobile devices.
Converting Supported Tags
Although paragraph (<p>) and line break (<br />) tags are supported in WML, their usage varies from that in HTML. For example, blocks of text must be enclosed in paragraph tags; you cannot use a stray tag to separate paragraphs, like this:
Paragraph . . .
<p>
Paragraph . . .
Although such use is sloppy when used anywhere, it has become prevalent in HTML pages. Instead of creating a sophisticated parsing scheme to ensure the matching pairs of tags, it's much easier to convert all open and closing paragraph tags to double line break tags. This causes the current line to break where the paragraph tag was used, and inserts the extra space between the paragraphs.
Again, if you are using PHP, the code is straightforward:
$wml = str_replace("<p>","<br/><br/>",$wml);
$wml = str_replace("</p>","<br/><br/>",$wml);
The above code will replace every "<p>" and "</p>" with "<br/><br/>".
Each line break tag in WML must end in a slash. A similar PHP str_replace statement takes care of this requirement:
$wml = str_replace("<br>","<br/>",$wml);
Note: PHP functions that support regular expressions can be more versatile and can do more work per statement if constructed correctly. I prefer to use individual statements for later flexibility and clearer code.
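For comparison, here is a sketch of the regular-expression approach mentioned in the note. It is an alternative to the individual str_replace statements, not the method used in this article, and the function name normalize_breaks is hypothetical. Unlike a literal string match, it also catches uppercase tags and tags that carry attributes.

```php
<?php
// Hypothetical helper: normalizes paragraph and line-break tags
// with one regular expression per tag type.
function normalize_breaks(string $wml): string
{
    // Match <p>, </p>, <P align="center">, and so on, ignoring case.
    // The \b keeps the pattern from matching tags such as <pre>.
    $wml = preg_replace('#</?p\b[^>]*>#i', '<br/><br/>', $wml);

    // Match <br>, <BR>, <br /> and rewrite each one as <br/>.
    return preg_replace('#<br\b[^>]*>#i', '<br/>', $wml);
}

echo normalize_breaks('<P align="center">One</p>Two<BR>Three<br />'), "\n";
```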
Two more items need to be cleaned up to display correctly in WML: ampersands ("&") and dollar signs ("$"). An ampersand must be converted to an entity ("&amp;"), and a dollar sign must be doubled ("$$").
Again, in PHP you can use the str_replace function:
$wml = str_replace('$','$$',$wml);
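The ampersand needs a little more care than the dollar sign: a plain replacement of "&" with "&amp;" would also rewrite ampersands that already begin an entity, turning "&amp;" into "&amp;amp;". The sketch below is one way to handle both characters, assuming the input may already contain named or numeric entities. The function name escape_wml_specials is hypothetical.

```php
<?php
// Hypothetical helper: escapes the two characters WML treats specially.
function escape_wml_specials(string $wml): string
{
    // Escape "&" unless it already starts an entity such as &amp; or &#169;.
    $wml = preg_replace(
        '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
        '&amp;',
        $wml
    );

    // Double each "$" so WML does not read it as a variable reference.
    // Single quotes keep PHP from attempting variable interpolation.
    return str_replace('$', '$$', $wml);
}

echo escape_wml_specials('AT&T &amp; Co. charge $5 &#169;'), "\n";
```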
Note: An abundance of special characters can find their way into otherwise mundane HTML code. For example, when text is cut-and-pasted from a word processing document into HTML documents, single and double quotes usually appear as extended ASCII characters, and must be converted to the appropriate plain text characters or HTML entities. Only direct experience and experimentation with your specific documents can determine what problems you may have and need to work around.
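As one illustration of such a workaround, the sketch below maps common word-processor punctuation to plain ASCII. It assumes the text is UTF-8 encoded; text in another encoding would need converting first. The character list and the function name plain_punctuation are hypothetical, so extend the map to whatever turns up in your own documents.

```php
<?php
// Hypothetical helper: replaces common "smart" punctuation with plain ASCII.
// Assumes $text is UTF-8 encoded.
function plain_punctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // ellipsis
    ];

    // strtr() with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}

echo plain_punctuation("\u{201C}It\u{2019}s here\u{201D}"), "\n";
```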
About the Author
Steve Schafer is president and CEO of Progeny Linux Systems, a Linux-based consulting company in Indianapolis, Indiana. He has written several technical books and articles and can be reached at email@example.com.