December 19, 2014

Creating Valid XML Documents: DTDs

  • January 16, 2004
  • By Steven Holzner

Validating a Document by Using a DTD

Before you create DTDs of the kind shown in Listing 1, let's look at how an XML validator uses a DTD to check an XML document's validity. One of the easiest validators to use is the Scholarly Technology Group's XML validator at Brown University, http://www.stg.brown.edu/service/xmlvalid; although it's online, it lets you browse to XML documents on your hard drive to check them. Figure 1 shows the results of validating the first DTD example in Listing 1; as you can see, the document validates correctly.

Figure 1
Validating an XML document by using a DTD.
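If you'd like something small to paste into a validator, here is a minimal sketch. It is a cut-down, hypothetical version of the employee data rather than Listing 1 itself, and every element it uses is declared, so a validating parser should accept it. It relies on #PCDATA, the DTD keyword for elements that contain only text.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!-- A <document> holds zero or more <employee> elements -->
<!ELEMENT document (employee)*>
<!-- Each <employee> holds a <name> followed by a <hiredate> -->
<!ELEMENT employee (name, hiredate)>
<!-- #PCDATA declares text-only content -->
<!ELEMENT name (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
]>
<document>
  <employee>
    <name>Grace Kelly</name>
    <hiredate>October 15, 2005</hiredate>
  </employee>
</document>
```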

Now suppose that our data-entry team made a mistake and someone typed <nane> instead of <name> for one of the elements:

<document>
  <employee>
    <nane>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </nane>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .

This error would not be easy to catch if you were trying to check all 5,000 employee records by eye, but it's no problem at all for an XML validator. Figure 2 shows how the Scholarly Technology Group's XML validator catches this error and others.

Figure 2
Catching an error in an XML document by using a DTD.
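To reproduce this kind of error in a file small enough to inspect, you could use a sketch like the following (again a simplified, hypothetical DTD rather than Listing 1). Its start and end tags match, so the document is well-formed and a non-validating parser accepts it; only a validating parser, checking the document against its DTD, rejects the misspelled <nane> element.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee (name, hiredate)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
]>
<document>
  <employee>
    <!-- Invalid: no nane element type is declared, and the
         content model of employee requires name at this point -->
    <nane>Grace Kelly</nane>
    <hiredate>October 15, 2005</hiredate>
  </employee>
</document>
```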


Tip - Can a browser such as Internet Explorer use DTDs to validate XML documents? Yes, but not by default. By default, Internet Explorer can use XML schemas and displays the results when it loads a document. If you want to validate by using DTDs in Internet Explorer, however, the only way to check whether validation succeeded is with a scripting language such as JavaScript.


Let's start creating DTDs like the one shown in Listing 1. You've seen that a DTD goes in a <!DOCTYPE> declaration, but what does the DTD itself look like? The first step in creating it is to declare the elements that appear in the XML document, as described in the following section.

Creating Element Content Models

To declare an element in a DTD, you use an <!ELEMENT> declaration, whose syntax is <!ELEMENT name content_model>. Here, name is the name of the element you're declaring, and content_model is its content model. A content model specifies what the element is allowed to contain: you can allow child elements or text data, make the element empty by using the EMPTY keyword, or allow any content by using the ANY keyword, as you'll soon see. Here's how to declare the <document> element in Listing 1:

<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
    .
    .
    .
]> 

This <!ELEMENT> declaration not only declares the <document> element but also says that a <document> element may contain <employee> elements. Declaring an element this way also specifies what content that element can legally contain, and the syntax for doing that is a little involved. The following sections dissect that syntax, starting with the least restrictive content model of all: ANY, which allows any content at all.
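To see the kinds of content model mentioned so far side by side, here is a fragment of declarations as they might appear between the [ and ] of a <!DOCTYPE> declaration. The <separator> and <notes> elements are hypothetical, added only to illustrate EMPTY and ANY; the other two declarations come from the employee example.

```xml
<!-- Child elements: a <document> may contain any number of <employee> elements -->
<!ELEMENT document (employee)*>
<!-- Sequence: <name>, then <hiredate>, then <projects>, in that order -->
<!ELEMENT employee (name, hiredate, projects)>
<!-- EMPTY: a hypothetical <separator> element that must have no content -->
<!ELEMENT separator EMPTY>
<!-- ANY: a hypothetical <notes> element whose content model is not checked -->
<!ELEMENT notes ANY>
```

An element declared EMPTY must have no content at all, so <separator/> and <separator></separator> both satisfy it, while <separator> </separator>, which contains a space, does not.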

Handling Any Content

If you give an element the content model ANY, that element can contain any mixture of character data and child elements, provided that each child element type is itself declared in the DTD. In effect, you're turning off content checking for that element: the validator doesn't check which children appear or in what order. Here's how to specify the content model ANY for an element named <document>:

<!DOCTYPE document [ 
<!ELEMENT document ANY> 
    .
    .
    .
]> 

As far as the XML validator is concerned, this turns off content checking for the <document> element. Turning off validation is usually not a good idea, but you might want to do it for specific elements, for example, while you debug a DTD that's not working. It's usually far better to list the contents you actually want to allow in an element, such as the child elements it can contain.
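Here is a minimal sketch of ANY in a complete document, assuming a simplified, text-only <employee> element purely for illustration. The <document> element mixes free text and child elements without any validity error, but its children are still checked: adding an undeclared element, such as a hypothetical <manager>, inside <document> would make the document invalid.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!-- ANY: <document> may mix text and declared child elements in any order -->
<!ELEMENT document ANY>
<!-- Children of an ANY element must still be declared in the DTD -->
<!ELEMENT employee (#PCDATA)>
]>
<document>
  Text is allowed here.
  <employee>Grace Kelly</employee>
  And more text after the child element.
</document>
```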

Specifying Child Elements

You can specify what child elements an element can contain in that element's content model. For example, you can specify that an element can contain another element by explicitly listing the name of the contained element in parentheses, like this:

<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
    .
    .
    .
]> 

This specifies that a <document> element can contain <employee> elements. The * here means that a <document> element can contain any number (including zero) of <employee> elements. (We'll look at the other possibilities besides * a little later.) With this line in a DTD, you can start placing one or more <employee> elements inside a <document> element, like this:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
]> 
<document>
  <employee>
    .
    .
    .
  </employee>
</document>

Note, however, that this is not yet a valid XML document, because you haven't declared the <employee> element itself. Because <employee> elements contain <name>, <hiredate>, and <projects> elements, in that order, you can specify a content model for <employee> elements like this:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
    .
    .
    .
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
</document>

Listing multiple elements in a content model this way is called creating a sequence. You separate the elements with commas, and they must then appear in exactly that order in the XML document. For example, if you declare this sequence in the DTD:

<!ELEMENT employee (name, hiredate, projects)> 

then inside an <employee> element, the <name> element must come first, followed by the <hiredate> element, followed by the <projects> element, like this:

<employee>
  <name>
    <lastname>Kelly</lastname>
    <firstname>Grace</firstname>
  </name>
  <hiredate>October 15, 2005</hiredate>
  <projects>
    <project>
      <product>Printer</product>
      <id>111</id>
      <price>$111.00</price>
    </project>
    <project>
      <product>Laptop</product>
      <id>222</id>
      <price>$989.00</price>
    </project>
  </projects>
</employee>

This example introduces a whole new set of elements, such as <hiredate>, <lastname>, <firstname>, and <product>, that don't contain other elements at all; they contain text. So how can you specify that an element contains text? Read on.
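Before moving on, here is a small sketch of the sequence rule in action, using a simplified, hypothetical DTD. Because the content model for <employee> is the sequence (name, hiredate), a validating parser rejects this document, in which <hiredate> comes first.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee (name, hiredate)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
]>
<document>
  <!-- Invalid: the sequence (name, hiredate) requires name to come first -->
  <employee>
    <hiredate>October 15, 2005</hiredate>
    <name>Grace Kelly</name>
  </employee>
</document>
```

Swapping the two child elements makes the document valid. And because of the * in (employee)*, a <document></document> element with no <employee> children at all would be valid too.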




