Unlike HTML, which a browser can check because it already knows what legal HTML looks like, XML lets you create your own markup, which means that an XML processor can’t check your markup unless you tell it how. In XML, you define what’s legal and what’s not by specifying the syntax you’re going to allow for an XML document. There are two ways to validate XML documents: with document type definitions (DTDs) and with XML schemas. This article focuses on DTDs. For more information on schemas, please see my book Sams Teach Yourself XML in 21 Days, Third Edition.
DTDs provided the original way to validate XML documents, and the syntax for DTDs is built right into the XML 1.0 specification. Many XML processors support DTDs, and DTDs are the first step in any discussion of validation.
All About DTDs
While an XML document needs to be well-formed to be considered a true XML document, that’s only part of the story. In real life, we also need to give an XML processor some way of checking the syntax (also called the grammar) of an XML document to make sure the data remains intact. For example, take a look at the following XML document, which contains data about employees:
<?xml version = "1.0" standalone="yes"?>
<document>
    <employee>
        <name>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
            <project>
                <product>Laptop</product>
                <id>222</id>
                <price>$989.00</price>
            </project>
        </projects>
    </employee>
    .
    .
    .
</document>
Say we’ve expanded to 5,000 employees, and that we have a team of typists typing in all that employee data. The likelihood is high that there are going to be errors in all that data entry. But how will an XML processor know that a <project> element must contain at least one <product> element unless we tell it so? How do we tell an XML processor that each <employee> element must contain one <name> element? To do this and more, we can use a DTD. DTDs are all about specifying the structure of an XML document, not the data in that document. The formal rules for DTDs are available in the XML 1.0 recommendation, http://www.w3.org/TR/REC-xml. (Note that the XML 1.1 candidate recommendation has nothing to add about DTDs as of this writing.)
We define the syntax of an XML document by using a DTD, and we attach that definition to a document by using a document type declaration, written as <!DOCTYPE>. (Strictly speaking, <!DOCTYPE> is a declaration, not an element.) The DTD either appears inside the declaration or is referenced from it. The declaration can take several forms, including the following, where rootname is the name of the root element, URI is the quoted URI of a DTD outside the current XML document, and identifier is a quoted public identifier:
- <!DOCTYPE rootname [DTD]>
- <!DOCTYPE rootname SYSTEM URI>
- <!DOCTYPE rootname SYSTEM URI [DTD]>
- <!DOCTYPE rootname PUBLIC identifier URI>
- <!DOCTYPE rootname PUBLIC identifier URI [DTD]>
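As a rough illustration of the external forms, the sketch below keeps the declarations in a separate file and points to it with SYSTEM. The file name employees.dtd is an assumption made up for this example; it is not something this article defines. The standalone attribute is set to "no" here because the declarations now live outside the document.

```xml
<?xml version="1.0" standalone="no"?>
<!-- employees.dtd is a hypothetical file name, resolved relative to this document -->
<!DOCTYPE document SYSTEM "employees.dtd">
<document>
    <employee>
        <name>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
        </projects>
    </employee>
</document>
```

The external file would then hold only the declarations themselves:

```xml
<!-- employees.dtd (hypothetical): declarations only, with no DOCTYPE wrapper -->
<!ELEMENT document (employee)*>
<!ELEMENT employee (name, hiredate, projects)>
<!ELEMENT name (lastname, firstname)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
<!ELEMENT projects (project)*>
<!ELEMENT project (product,id,price)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT price (#PCDATA)>
```

The PUBLIC form adds a public identifier ahead of the URI. A familiar real-world case is the XHTML 1.0 Strict declaration, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">.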
To validate against a DTD, the document needs a document type declaration, which is part of the document’s prolog. For example, here’s how we would add a <!DOCTYPE> declaration to the employees example:
<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [
    .
    .
    <!-- DTD goes here! -->
    .
    .
]>
<document>
    <employee>
        <name>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
            <project>
                <product>Laptop</product>
                <id>222</id>
                <price>$989.00</price>
            </project>
        </projects>
    </employee>
    .
    .
    .
</document>
So what does a DTD look like? The actual XML syntax for DTDs is pretty terse, so this article’s discussion is dedicated to unraveling that terseness. To get started, Listing 1 shows a full <!DOCTYPE> declaration that contains a DTD for the employee document. We’re going to dissect that DTD here.
Listing 1: A Sample XML Document with a DTD
<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee (name, hiredate, projects)>
<!ELEMENT name (lastname, firstname)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
<!ELEMENT projects (project)*>
<!ELEMENT project (product,id,price)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<document>
    <employee>
        <name>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
            <project>
                <product>Laptop</product>
                <id>222</id>
                <price>$989.00</price>
            </project>
        </projects>
    </employee>
    <employee>
        <name>
            <lastname>Grant</lastname>
            <firstname>Cary</firstname>
        </name>
        <hiredate>October 20, 2005</hiredate>
        <projects>
            <project>
                <product>Desktop</product>
                <id>333</id>
                <price>$2995.00</price>
            </project>
            <project>
                <product>Scanner</product>
                <id>444</id>
                <price>$200.00</price>
            </project>
        </projects>
    </employee>
    <employee>
        <name>
            <lastname>Gable</lastname>
            <firstname>Clark</firstname>
        </name>
        <hiredate>October 25, 2005</hiredate>
        <projects>
            <project>
                <product>Keyboard</product>
                <id>555</id>
                <price>$129.00</price>
            </project>
            <project>
                <product>Mouse</product>
                <id>666</id>
                <price>$25.00</price>
            </project>
        </projects>
    </employee>
</document>
Validating a Document by Using a DTD
Before we create DTDs of the kind shown in Listing 1, let’s take a look at how to use DTDs to check an XML document’s validity by using an XML validator. One of the easiest to use is the Scholarly Technology Group’s XML validator at Brown University, http://www.stg.brown.edu/service/xmlvalid; although it’s online, it lets you browse to XML documents on your hard drive to check them. Figure 1 shows the results of validating the document in Listing 1; as we can see, the document validates correctly. (Edit: The above link to brown.edu’s page is no longer available as of the end of 2015.)
Figure 1: Validating an XML document by using a DTD.
On the other hand, say that our data-entry team made a mistake and someone typed <nane> instead of <name> in a start tag:
<document>
    <employee>
        <nane>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
            <project>
                <product>Laptop</product>
                <id>222</id>
                <price>$989.00</price>
            </project>
        </projects>
    </employee>
    .
    .
    .
This error would not be easy to catch if you were trying to check all 5,000 employee records by eye, but it’s no problem at all for an XML validator. Figure 2 shows how the Scholarly Technology Group’s XML validator catches this error and others.
Figure 2: Catching an error in an XML document by using a DTD.
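It is worth noting that the typo as shown also breaks well-formedness, because the <nane> start tag is closed by </name>, so even a non-validating parser would reject it. A DTD matters most when the mistake is still well-formed. As a minimal sketch, the hypothetical variant below (not from the original example) repeats the typo in both tags. It should parse cleanly, yet a validating parser checking it against the DTD in Listing 1 should reject it, because nane is never declared and employee is declared to contain (name, hiredate, projects).

```xml
<employee>
    <!-- Hypothetical variant: the misspelled tags match each other,
         so this is well-formed, but nane is not declared in the DTD -->
    <nane>
        <lastname>Kelly</lastname>
        <firstname>Grace</firstname>
    </nane>
    <hiredate>October 15, 2005</hiredate>
    <projects>
        <project>
            <product>Printer</product>
            <id>111</id>
            <price>$111.00</price>
        </project>
    </projects>
</employee>
```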