July 29, 2014
Hot Topics:
RSS RSS feed Download our iPhone app

Creating Valid XML Documents: DTDs

  • January 16, 2004
  • By Steven Holzner
  • Send Email »
  • More Articles »

Unlike with HTML, where a browser can check HTML because it knows all about legal HTML, you create your own markup in XML, which means that an XML processor can't check your markup unless you let it know how to. In XML, you define what's legal and what's not by specifying the syntax you're going to allow for an XML document. There are two ways to validate XML documents—with document type definitions (DTDs) and with XML schemas. This article will focus on DTDs. For more information on schemas please see my book Sams Teach Yourself XML in 21 Days, Third Edition.

DTDs provided the original way to validate XML documents, and the syntax for DTDs is built right in to the XML 1.0 specification. Tons of XML processors out there use DTDs in XML documents, and DTDs are the first step in any discussion on validation.

All About DTDs

While an XML document needs to be well-formed to be considered a true XML document, that's only part of the story. In real life, we also need to give an XML processor some way of checking the syntax (also called the grammar) of an XML document to make sure the data remains intact. For example, take a look at the XML document that contains data about employees:

<?xml version = "1.0" standalone="yes"?>
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
</document>

Say we've expanded to 5,000 employees, and that we have a team of typists typing in all that employee data. The likelihood is high that there are going to be errors in all that data entry. But how will an XML processor know that a <project> element must contain at least one <product> element unless we tell it so? How do we tell an XML processor that each <employee> element must contain one <name> element? To do this and more, we can use a DTD. DTDs are all about specifying the structure of an XML document, not the data in that document. The formal rules for DTDs are available in the XML 1.0 recommendation, http://www.w3.org/TR/REC-xml. (Note that the XML 1.1 candidate recommendation has nothing to add about DTDs as of this writing.)

We define the syntax of an XML document by using a DTD, and we declare that definition in a document by using a document type declaration. We can use a <!DOCTYPE> element to create a DTD, and the DTD appears in that element. The element can take many different forms, including the following (where URI is the URI of a DTD outside the current XML document and rootname is the name of the root element) :

  • <!DOCTYPE rootname [DTD]>

  • <!DOCTYPE rootname SYSTEM URI>

  • <!DOCTYPE rootname SYSTEM URI [DTD]>

  • <!DOCTYPE rootname PUBLIC identifier URI>

  • <!DOCTYPE rootname PUBLIC identifier URI [DTD]>

To use a DTD, we need a DTD, which means we need a <!DOCTYPE> element. The <!DOCTYPE> element is part of a document's prolog. For example, here's how we would add a <!DOCTYPE> element to the employees example:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
    .
    .
  <!-- DTD goes here! -->
    .
    .
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
</document>

So what does a DTD look like? The actual XML syntax for DTDs is pretty terse, so this article's discussion is dedicated to unraveling that terseness. To get started, Listing 1 shows a full <!DOCTYPE> element that contains a DTD for the employee document. We're going to dissect that DTD here.

Listing 1: A Sample XML Document with a DTD

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
] > 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Grant</lastname>
      <firstname>Cary</firstname>
    </name>
    <hiredate>October 20, 2005</hiredate>
    <projects>
      <project>
        <product>Desktop</product>
        <id>333</id>
        <price>$2995.00</price>
      </project>
      <project>
        <product>Scanner</product>
        <id>444</id>
        <price>$200.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>Keyboard</product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <price>$25.00</price>
      </project>
    </projects>
  </employee>
</document>




Page 1 of 4



Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Sitemap | Contact Us

Rocket Fuel