Term of the Week: XHTML
HTML (HyperText Markup Language) exploded into the savvy computer user and developer's consciousness around 1994. Although it had been around before that, this was roughly the tipping point when HTML went from obscure scientific curiosity to the point where you could pick up a mainstream computer magazine or go to a bookstore and be assaulted by an overwhelming number of articles and books about HTML. Over the next several years, the W3C (World Wide Web Consortium) codified several new versions of HTML, driven by the leaps in technology from the Mosaic, then Netscape and Internet Explorer web browsers.
All of these early HTML versions, up through the current and final HTML 4.01 standard, were rooted in SGML (Standard Generalized Markup Language). HTML was a dramatic simplification of the massively complex SGML standard. But HTML lacked extensibility: every time there was a need for a new tag or feature, the standard itself had to be rewritten. HTML also allowed for sloppiness. There were rules, sure, but browsers didn't enforce them. And HTML wasn't easy to display on the widening range of devices people use to access the web: it's good for computer monitors, but not great for mobile phones or PDAs.
To address these deficiencies, and to allow for greater flexibility in the future, the W3C set XHTML (eXtensible HyperText Markup Language) as the new HTML standard. XHTML adds extensibility and strictness to HTML by relying on XML. XML allows new modules to be incorporated easily into XHTML documents, and XML has built-in mechanisms for enforcing strict markup practices. Luckily, you don't actually need to learn any XML to use XHTML. And, importantly, XHTML is mostly compatible with current HTML 4 browsers if you follow a few simple rules you'll see later.
Choose Your XHTML Flavor
The current XHTML 1.0 specification defines 3 versions. These are:
- Strict - this makes use of CSS (Cascading Style Sheets) for formatting. While this requires a little more learning, if you code a lot of web pages, and especially if this is part of your job, you should make moving to Strict XHTML and CSS a near-term goal.
- Transitional - for pages that must work in browsers that don't understand CSS, or for fairly simple pages where you won't be doing enough formatting to bother with CSS.
- Frameset - for use with documents that use HTML frames.
Accordingly, each XHTML document must begin with one of these three DOCTYPE declarations corresponding to the 3 XHTML flavors. These are:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Other Required Elements
In addition to the DOCTYPE declaration, you must have a root element defining the document as html, as in the following example.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
And, your documents must include <head>, <title>, and <body> elements.
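Putting the required pieces together, a minimal XHTML 1.0 Strict page might look like the sketch below. The title and paragraph text are placeholders invented for illustration; only the DOCTYPE and the html root element come from the examples above.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- A title element is required inside head -->
    <title>Placeholder page title</title>
  </head>
  <body>
    <!-- In Strict, body text sits inside a block element such as p -->
    <p>Placeholder paragraph.</p>
  </body>
</html>
```

For a Transitional or Frameset page, the same skeleton should work with the corresponding DOCTYPE swapped in (a Frameset page uses a frameset element in place of body).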
Important Changes from HTML 4
First, well-formedness is required, as in XML. Most importantly, this means proper nesting of elements. In HTML 4 you could get away with coding like this:
<b><i>This is bold italic text</b></i>
In this example, the bold and italic tags aren't properly nested because the <i> needs the closing </i> before we close the </b>. So, in XHTML, this needs to be marked up as:
<b><i>This is bold italic text</i></b>
Second, XML is case sensitive, so tag and attribute names in XHTML must be lower case. Note that the DOCTYPE declaration precedes the beginning of the HTML, though, so that declaration is properly capitalized as shown earlier.
Next, and this is a point where HTML versions have really let us get away with sloppy markup practices, XHTML requires end tags for all elements. No more leaving off the closing </p>. And list items must be closed with an </li> as well. A closing tag is even required for empty elements such as <br>, although to close one of these tags, you need to do it a little differently to maintain backward compatibility with HTML 4.01 browsers. To do this, write it as:
<br />
where the space before the ending / is important.
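As a rough before-and-after sketch of the end-tag rules (the list and paragraph content is invented for illustration), HTML 4 would tolerate the first fragment, while XHTML expects something like the second:

```html
<!-- Tolerated in HTML 4: end tags omitted, empty element left open -->
<ul>
  <li>First item
  <li>Second item
</ul>
<p>Line one<br>
Line two

<!-- XHTML: every element closed; the empty element ends with " />" -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>Line one<br />
Line two</p>
```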
In XHTML, attribute values must always be surrounded by quotes. For example, the following line is acceptable in HTML, but not in XHTML:
<a href="http://www.w3.org/TR/xhtml1/" target=new> W3C XHTML Documentation</a>
In XHTML, add quotes around the new attribute value, as such:
<a href="http://www.w3.org/TR/xhtml1/" target="new"> W3C XHTML Documentation</a>
XHTML also enforces prohibitions on deprecated elements. While HTML 4.01 would let you get away with using these even though they were officially discouraged, none of the following tags are part of Strict XHTML:
<applet> </applet> <basefont /> <center> </center> <dir> </dir> <font> </font> <isindex /> <menu> </menu> <s> </s> <strike> </strike> <u> </u>
Transitional XHTML won't kick these deprecated elements out as unsupported, but Strict will.
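To give a feel for what moving off the deprecated elements involves, here is a small sketch. The text and the choice of an inline style attribute are assumptions for illustration; in practice the CSS would more typically live in a separate style sheet.

```html
<!-- Transitional-style markup using the deprecated center and font elements -->
<center><font color="red">Sale ends Friday</font></center>

<!-- One possible Strict-friendly equivalent, with CSS doing the formatting -->
<p style="text-align: center; color: red;">Sale ends Friday</p>
```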
Get the Details and Get Validated
This has just been an overview of XHTML and the key differences you need to know. To read the full spec with links to all of the compatibility guidelines, you'll want to go to the W3C XHTML Documentation (http://www.w3.org/TR/xhtml1/).
The W3C also has a useful Validator service. With this you can point the validator to a URL or upload a file from your computer, and it will check whether your XHTML is valid.
In order to make good use of the validator, there are a couple of things you need to know about the results. First, for this line of code:
<TITLE>Term of the Week: XHTML</TITLE>
it will point you to what it sees as the error like this:
Line 8, column 6: element "TITLE" undefined
It's up to you to figure out that in this case, the problem is TITLE in all caps instead of lower case.
Second, it's "strongly recommended" by the W3C that all XHTML documents begin with an XML declaration. In fact, the Validator won't work if you don't include one. So, begin your XHTML documents with an XML declaration before the DOCTYPE declaration, like this:
<?xml version="1.0" encoding="UTF-8"?>
If you are using a document encoding other than UTF-8, change the value of the encoding attribute to match.
# # #
Jim Minatel is a freelance writer for Developer.com in addition to working with Wiley and WROX publishing.