http://www.developer.com/

Creating Valid XML Documents: DTDs


January 16, 2004

A browser can check HTML because it already knows which HTML is legal. In XML you create your own markup, so an XML processor can't check that markup unless you tell it how. You define what's legal and what's not by specifying the syntax you're going to allow for an XML document. There are two main ways to validate XML documents: with document type definitions (DTDs) and with XML schemas. This article focuses on DTDs. For more information on schemas, please see my book Sams Teach Yourself XML in 21 Days, Third Edition.

DTDs provided the original way to validate XML documents, and the syntax for DTDs is built right into the XML 1.0 specification. Many XML processors support DTDs, and DTDs are the first step in any discussion of validation.

All About DTDs

While an XML document needs to be well-formed to be considered a true XML document, that's only part of the story. In real life, we also need to give an XML processor some way of checking the syntax (also called the grammar) of an XML document to make sure the data remains intact. For example, take a look at this XML document, which contains data about employees:

<?xml version = "1.0" standalone="yes"?>
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
</document>

Say we've expanded to 5,000 employees, and that we have a team of typists typing in all that employee data. The likelihood is high that there are going to be errors in all that data entry. But how will an XML processor know that a <project> element must contain at least one <product> element unless we tell it so? How do we tell an XML processor that each <employee> element must contain one <name> element? To do this and more, we can use a DTD. DTDs are all about specifying the structure of an XML document, not the data in that document. The formal rules for DTDs are available in the XML 1.0 recommendation, http://www.w3.org/TR/REC-xml. (Note that the XML 1.1 candidate recommendation has nothing to add about DTDs as of this writing.)

We define the syntax of an XML document by using a DTD, and we attach that definition to a document by using a document type declaration, written <!DOCTYPE>. (Strictly speaking, <!DOCTYPE> is a declaration rather than an element, although this article loosely calls it the <!DOCTYPE> element.) The DTD appears inside that declaration, is referenced from it, or both. The declaration can take several forms, including the following (where URI is the quoted URI of a DTD outside the current XML document, identifier is a quoted public identifier, and rootname is the name of the root element):

  • <!DOCTYPE rootname [DTD]>

  • <!DOCTYPE rootname SYSTEM URI>

  • <!DOCTYPE rootname SYSTEM URI [DTD]>

  • <!DOCTYPE rootname PUBLIC identifier URI>

  • <!DOCTYPE rootname PUBLIC identifier URI [DTD]>
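As a rough illustration of the external forms in the list above, here is a minimal sketch of a document that points at a DTD stored in a separate file. The file name employees.dtd, the public identifier, and the example.com URL are all made up for this sketch. Setting standalone="no" is the cautious choice when declarations live outside the document.

```xml
<?xml version="1.0" standalone="no"?>
<!-- Assumption: a file named employees.dtd sits next to this document
     and holds the element declarations for it. -->
<!DOCTYPE document SYSTEM "employees.dtd">
<!-- The PUBLIC form adds a public identifier before the URI, for example:
     <!DOCTYPE document PUBLIC "-//Example//DTD Employees 1.0//EN"
                               "http://www.example.com/dtds/employees.dtd">
     Both the identifier and the URL are hypothetical. -->
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects/>
  </employee>
</document>
```

Keeping the DTD in its own file lets many documents share one set of declarations, at the cost of an extra resource the processor has to fetch.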

To use a DTD, then, a document needs a <!DOCTYPE> element. The <!DOCTYPE> element is part of a document's prolog, so it comes after the XML declaration and before the root element. For example, here's how we would add a <!DOCTYPE> element to the employees example:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
    .
    .
  <!-- DTD goes here! -->
    .
    .
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
</document>

So what does a DTD look like? The actual XML syntax for DTDs is pretty terse, so this article's discussion is dedicated to unraveling that terseness. To get started, Listing 1 shows a full <!DOCTYPE> element that contains a DTD for the employee document. We're going to dissect that DTD here.

Listing 1: A Sample XML Document with a DTD

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Grant</lastname>
      <firstname>Cary</firstname>
    </name>
    <hiredate>October 20, 2005</hiredate>
    <projects>
      <project>
        <product>Desktop</product>
        <id>333</id>
        <price>$2995.00</price>
      </project>
      <project>
        <product>Scanner</product>
        <id>444</id>
        <price>$200.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>Keyboard</product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <price>$25.00</price>
      </project>
    </projects>
  </employee>
</document>

Validating a Document by Using a DTD

Before you create DTDs of the kind shown in Listing 1, let's take a look at how to use DTDs to check an XML document's validity by using an XML validator. One of the easiest to use is the Scholarly Technology Group's XML validator at Brown University, http://www.stg.brown.edu/service/xmlvalid; although it's online, it lets you browse to XML documents on your hard drive to check them. Figure 1 shows the results of validating the first DTD example in Listing 1; as we can see, the document validates correctly.

Figure 1
Validating an XML document by using a DTD.

On the other hand, say that our data-entry team made a mistake and someone typed <nane> instead of <name> in an element:

<document>
  <employee>
    <nane>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .

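This error would not be easy to catch if you were trying to check all 5,000 employee records by eye, but it's no problem at all for an XML validator. (Because the <nane> start tag no longer matches the </name> end tag, the document is not even well-formed, so any XML parser would reject it.) Figure 2 shows how the Scholarly Technology Group's XML validator catches this error and others.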

Figure 2
Catching an error in an XML document by using a DTD.
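For contrast, here is a sketch of a typo that only validation can catch: the tag is misspelled consistently, so the document stays well-formed but no longer matches the DTD. The DTD is trimmed to a few declarations to keep the example short; that trimming is an assumption of this sketch, not part of Listing 1.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee (name, hiredate)>
<!ELEMENT name (lastname, firstname)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
]>
<document>
  <employee>
    <!-- Well-formed, because the start and end tags match, but not valid:
         the DTD declares no nane element and requires name here. -->
    <nane>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </nane>
    <hiredate>October 15, 2005</hiredate>
  </employee>
</document>
```

A non-validating parser would typically load this document without complaint, which is why a validator is needed for this class of mistake.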


Tip - Can a browser such as Internet Explorer use DTDs to validate XML documents? Yes, but not by default. By default, Internet Explorer can use XML schemas and displays the results when loading a document. But if we want to validate by using DTDs in Internet Explorer, we can only check whether the validation went well by using a scripting language such as JavaScript.


Let's start creating DTDs like the one shown in Listing 1. You've seen that a DTD goes in a <!DOCTYPE> element, but what does the actual DTD itself look like? The first step in creating that DTD is to declare the elements that appear in the XML document, as described in the following section.

Creating Element Content Models

To declare the syntax of an element in a DTD, we use an element type declaration, <!ELEMENT>, like this: <!ELEMENT name content_model>. In this syntax, name is the name of the element we're declaring and content_model is the content model of the element. A content model indicates what content the element is allowed to have. For example, you can allow child elements or text data, you can make the element empty by using the EMPTY keyword, or you can allow any content by using the ANY keyword, as you'll soon see. Here's how to declare the <document> element in Listing 1:

<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
    .
    .
    .
]> 

This <!ELEMENT> declaration not only declares the <document> element, but it also says that the <document> element may contain <employee> elements. When you declare an element in this way, you also specify what contents that element can legally contain; the syntax for doing that is a little involved. The following sections dissect that syntax, taking a look at how to specify the content model of elements, starting with the least restrictive content model of all: ANY, which allows any content at all.

Handling Any Content

If you give an element the content model ANY, that element can contain any content, which means any declared elements and/or any character data, in any order. In effect, you're turning off content checking for this element: a validator does not check which children appear or in what order, although every child element must still be declared somewhere in the DTD. Here's how to specify the content model ANY for an element named <document>:

<!DOCTYPE document [ 
<!ELEMENT document ANY> 
    .
    .
    .
]> 

As far as the XML validator is concerned, this just turns off content checking for the <document> element itself; its child elements are still validated against their own declarations. It's usually not a good idea to turn off validation, but you might want to do so for specific elements, for example, when debugging a DTD that's not working. It's usually far preferable to actually list the contents you want to allow in an element, such as any possible child elements the element can contain.
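A minimal sketch of what ANY permits is shown here. The <note> element is hypothetical and exists only to give the document a declared child to use.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document ANY>
<!-- Hypothetical child element, declared so that it may appear inside ANY -->
<!ELEMENT note (#PCDATA)>
]>
<document>
  Free text is allowed here.
  <note>So is any declared element, in any order.</note>
  <document>Even a nested document element is acceptable.</document>
</document>
```

An undeclared child such as <memo> would still be reported by a validator, because ANY only relaxes the order and number of children, not the requirement that each element type be declared.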

Specifying Child Elements

You can specify what child elements an element can contain in that element's content model. For example, you can specify that an element can contain another element by explicitly listing the name of the contained element in parentheses, like this:

<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
    .
    .
    .
]> 

This specifies that a <document> element can contain <employee> elements. The * here means that a <document> element can contain any number (including zero) of <employee> elements. (We'll talk about what other possibilities besides * are available later in this article.) With this line in a DTD, you can now start placing an <employee> element or elements inside a <document> element, this way:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
]> 
<document>
  <employee>
    .
    .
    .
  </employee>
</document>

Note, however, that this is not yet a valid XML document because you haven't specified the syntax for individual <employee> elements. Because <employee> elements can contain <name>, <hiredate>, and <projects> elements, in that order, you can specify a content model for <employee> elements (and for the <name> element they contain) this way:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
</document>

Listing multiple elements in a content model this way is called creating a sequence. You use commas to separate the elements you want to have appear, and then the elements have to appear in that sequence in our XML document. For example, if you declare this sequence in the DTD:

<!ELEMENT employee (name, hiredate, projects)> 

then inside an <employee> element, the <name> element must come first, followed by the <hiredate> element, followed by the <projects> element, like this:

<employee>
  <name>
    <lastname>Kelly</lastname>
    <firstname>Grace</firstname>
  </name>
  <hiredate>October 15, 2005</hiredate>
  <projects>
    <project>
      <product>Printer</product>
      <id>111</id>
      <price>$111.00</price>
    </project>
    <project>
      <product>Laptop</product>
      <id>222</id>
      <price>$989.00</price>
    </project>
  </projects>
</employee>
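To make the ordering rule concrete, here is a sketch of an <employee> element that a validator should reject under the sequence above, because the same children appear in a different order:

```xml
<!-- Not valid against <!ELEMENT employee (name, hiredate, projects)>:
     hiredate appears before name. -->
<employee>
  <hiredate>October 15, 2005</hiredate>
  <name>
    <lastname>Kelly</lastname>
    <firstname>Grace</firstname>
  </name>
  <projects>
    <project>
      <product>Printer</product>
      <id>111</id>
      <price>$111.00</price>
    </project>
  </projects>
</employee>
```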

This example introduces a whole new set of elements, such as <lastname>, <firstname>, and <hiredate>, that don't contain other elements at all; they contain text. So how can you specify that an element contains text? Read on.

Handling Text Content

In the preceding section's example, the <lastname>, <firstname>, and <hiredate> elements contain text data. In DTDs, non-markup text is considered parsed character data: text that the XML processor parses, replacing any entity references it finds and treating any tags as markup rather than as literal text. In a DTD, we refer to parsed character data as #PCDATA. Note that this is the only way to refer to text data in an element's content model; you can't say anything about the actual format of the text, although that might be important if you're dealing with numbers. In fact, this lack of precision is one of the reasons that XML schemas were introduced.

Here's how to give the text-containing elements in the example the #PCDATA content model:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
</document>
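The following sketch illustrates two points about #PCDATA made above: the text is parsed, so entity references in it are replaced, and a DTD places no constraint on what the text says. It uses a cut-down document whose root is <project>; that simplification is an assumption made to keep the example short.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE project [
<!ELEMENT project (product, id, price)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<project>
  <!-- &amp; is a predefined entity reference; the parser replaces it with & -->
  <product>Keyboard &amp; Mouse</product>
  <!-- Valid even though it is not a number: #PCDATA accepts any text -->
  <id>one hundred eleven</id>
  <!-- An element with no text at all also matches #PCDATA -->
  <price></price>
</project>
```

If the <id> or <price> values need to be numeric, that check has to happen elsewhere, for example in an XML schema or in application code.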

Note - Can you mix elements and PCDATA in the same content model? Yes, you can. This is called a mixed content model, and you'll see how to work with such models later in this article.


You're almost done with the sample DTD—except for the * symbol. The following section takes a look at * and the other possible symbols to use.

Specifying Multiple Child Elements

There are a number of options for declaring an element that can contain child elements. You can declare the element to contain a single child element:

<!ELEMENT document (employee)> 

You can declare the element to contain a list of child elements, in order:

<!ELEMENT document (employee, contractor, partner)> 

You can also use symbols with special meanings in DTDs, such as *, which means "zero or more of," as in this example, where you're allowing zero or more <employee> elements in a <document> element:

<!ELEMENT document (employee)*> 

There are a number of other ways of specifying multiple children by using symbols. (This syntax resembles the regular expression syntax used in languages such as Perl, so if you know regular expressions, you have a leg up here.) Here are the possibilities:

  • x+—Means x can appear one or more times.

  • x*—Means x can appear zero or more times.

  • x?—Means x can appear once or not at all.

  • x, y—Means x followed by y.

  • x | y—Means x or y—but not both.

The following sections take a look at these options.

Allowing One or More Children

You might want to specify that a <document> element can contain between 200 and 250 <employee> elements, and if you do, you're out of luck with DTDs because DTD syntax doesn't give us that kind of precision. On the other hand, you still do have some control here; for example, you can specify that a <document> element must contain one or more <employee> elements if you use a + symbol, like this:

<!ELEMENT document (employee)+> 

Here, the XML processor is being told that a <document> element has to contain at least one <employee> element.
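A minimal sketch of the + symbol in action follows. To keep it short, <employee> is declared here as plain text, which is a simplification and not how the article's listings declare it.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)+>
<!-- Simplified for this sketch: employee holds text only -->
<!ELEMENT employee (#PCDATA)>
]>
<!-- Valid: one employee satisfies (employee)+.
     An empty <document/> would be rejected here, although it would pass
     under (employee)*. -->
<document>
  <employee>Grace Kelly</employee>
</document>
```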

Allowing Zero or More Children

By using a DTD, you can use the * symbol to specify that you want an element to contain any number of child elements—that is, zero or more child elements. You saw this in action earlier, when you specified that the <document> element may contain <employee> elements in the Listing 1 example:

<!ELEMENT document (employee)*> 

Allowing Zero or One Child

When using a DTD, you can use ? to specify zero or one child elements. Using ? indicates that a particular child element may be present once in the element you're declaring, but it need not be. For example, here's how to indicate that a <document> element may contain zero or one <employee> elements:

<!ELEMENT document (employee)?> 

Using +, *, and ? in Sequences

You can use the +, *, and ? symbols in content model sequences. For example, here's how you might specify that there can be one or more <name> elements for an employee, an optional <hiredate> element, and any number of <projects> elements:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name+, hiredate?, projects*)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
</document>

Using +, *, and ? inside sequences provides a lot of flexibility because it means you can specify how many times an element can appear in a sequence—and even whether the element can be absent altogether.
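As a sketch of that flexibility, the following <employee> element should also validate against the (name+, hiredate?, projects*) content model, even though it looks quite different from the one in the listing. The second name is invented purely for illustration.

```xml
<!-- Valid against <!ELEMENT employee (name+, hiredate?, projects*)> -->
<employee>
  <!-- name+ allows more than one name -->
  <name>
    <lastname>Kelly</lastname>
    <firstname>Grace</firstname>
  </name>
  <name>
    <lastname>Kelly</lastname>
    <firstname>Gracie</firstname>
  </name>
  <!-- no hiredate: allowed by hiredate? -->
  <!-- no projects: allowed by projects* -->
</employee>
```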

In fact, you can get even more powerful results by using the +, *, and ? operators inside sequences. By using parentheses, we can create subsequences—that is, sequences inside sequences. For example, say that we wanted to allow each employee to list multiple names (including nicknames and so on), possibly list his or her age, and give multiple phone numbers. You can do that by using the subsequence shown in Listing 2.

Listing 2: A Sample XML Document That Uses Subsequences in a DTD

<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee ((name, age?, phone*)+, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT age (#PCDATA)> 
<!ELEMENT phone (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <phone>
      555.2345
    </phone>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Grant</lastname>
      <firstname>Cary</firstname>
    </name>
    <age>
      32
    </age>
    <phone>
      555.2346
    </phone>
    <hiredate>October 20, 2005</hiredate>
    <projects>
      <project>
        <product>Desktop</product>
        <id>333</id>
        <price>$2995.00</price>
      </project>
      <project>
        <product>Scanner</product>
        <id>444</id>
        <price>$200.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <age>
      46
    </age>
    <phone>
      555.2347
    </phone>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>Keyboard</product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <price>$25.00</price>
      </project>
    </projects>
  </employee>
</document>

Getting creative when defining subsequences and using the +, *, and ? operators allows us to be extremely flexible in DTDs.
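Listing 2 uses each group only once per employee. Here is a sketch of an <employee> element that repeats the (name, age?, phone*) group, which the + after the parentheses permits. The second name and the extra phone numbers are made up for illustration.

```xml
<!-- Valid against
     <!ELEMENT employee ((name, age?, phone*)+, hiredate, projects)> -->
<employee>
  <!-- First pass through the group: name, age, and two phones -->
  <name>
    <lastname>Grant</lastname>
    <firstname>Cary</firstname>
  </name>
  <age>32</age>
  <phone>555.2346</phone>
  <phone>555.9999</phone>
  <!-- Second pass through the group: name and one phone, no age -->
  <name>
    <lastname>Grant</lastname>
    <firstname>C.</firstname>
  </name>
  <phone>555.8888</phone>
  <hiredate>October 20, 2005</hiredate>
  <!-- An empty projects element matches (project)* -->
  <projects/>
</employee>
```

Each new <name> starts another pass through the group, so an <age> that appears after a <phone> without a new <name> in between would not validate.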

Allowing Choices

DTDs can support choices. By using a choice, we can specify one of a group of items. For example, if you want to specify that one (and only one) of either <x>, <y>, or <z> will appear, use a choice like this:

(x | y | z)

Listing 3 shows an example of using choices in the document. In that example, each project is allowed to contain either a <price> element or a <discountprice> element. To indicate that that's what you want, you only need to make this change to the DTD (as well as declare the new <discountprice> element):

<!ELEMENT project (product, id, (price | discountprice))> 

Listing 3: A Sample XML Document That Uses Choices in a DTD

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product, id, (price | discountprice))> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
<!ELEMENT discountprice (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <discountprice>$111.00</discountprice>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
  <employee>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>Keyboard</product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <discountprice>$25.00</discountprice>
      </project>
    </projects>
  </employee>
</document>
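To see what the choice in Listing 3 rules out, here is a sketch of three <project> elements checked against the same declaration. Only the first should validate.

```xml
<!-- Checked against
     <!ELEMENT project (product, id, (price | discountprice))> -->
<projects>
  <!-- Valid: exactly one of the two alternatives -->
  <project>
    <product>Printer</product>
    <id>111</id>
    <discountprice>$111.00</discountprice>
  </project>
  <!-- Not valid: both alternatives appear -->
  <project>
    <product>Laptop</product>
    <id>222</id>
    <price>$989.00</price>
    <discountprice>$889.00</discountprice>
  </project>
  <!-- Not valid: neither alternative appears -->
  <project>
    <product>Mouse</product>
    <id>666</id>
  </project>
</projects>
```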

You can also use the +, *, and ? operators with choices. For example, to allow either a single price or one or more discount prices, while still insisting that at least one element from the choice appear in the XML document, you can do something like this:

<!ELEMENT project (product, id, (price | discountprice+))> 

As you can see, there are plenty of options available when it comes to specifying elements or text content in DTDs (although XML schemas allow us to be even more precise, specifying numeric formats for numbers and so on). But what if we want a content model to let an element contain both elements and text? That's coming up next.

Allowing Mixed Content

When using a DTD, you can allow an element to contain both text and child elements by giving it a mixed content model. A mixed content model is always written as a choice that lists #PCDATA first, followed by the permitted element names, with the whole group marked *. The text and the listed child elements can then appear in any order and any number of times; what a DTD can't do is require a particular arrangement, such as text first and then exactly one child element. For example, the mixed content model used in Listing 4 permits the following content, but there is no way to require it:

<product>
  Keyboard
  <stocknumber>1113</stocknumber>
</product>

To set up a mixed content model, we treat #PCDATA as we would any element name in a DTD choice. Listing 4 shows an example of this; in this example, the <product> element is declared so that it can have text content, contain <stocknumber> elements, or both.

Listing 4: A Sample XML Document That Uses a Mixed Content Model

<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product, id, price)> 
<!ELEMENT product (#PCDATA | stocknumber)*> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
<!ELEMENT stocknumber (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>
          <stocknumber>1111</stocknumber>
        </product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>
          Laptop
        </product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
  <employee>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>
          <stocknumber>1113</stocknumber>
        </product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <price>$25.00</price>
      </project>
    </projects>
  </employee>
</document>

There are plenty of restrictions when we use a mixed content model like this in a DTD. #PCDATA must come first in the choice, and the group as a whole must end with *. We cannot specify the order or number of the child elements, and we cannot apply the +, *, or ? operators to the individual names inside the group. In fact, there's usually very little reason to use mixed content models at all in data-oriented XML such as this employee document. We're almost always better off being consistent and declaring a new element that can contain our text data than using a mixed content model.
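The restrictions on how a mixed content model may be written can be sketched as follows. The declarations that are not legal appear only inside a comment so that the fragment itself remains usable; the root element is <product> purely to keep the example small.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE product [
<!-- Legal: #PCDATA first, alternatives separated by |, group ends with * -->
<!ELEMENT product (#PCDATA | stocknumber)*>
<!-- Legal: text only, so no * is needed -->
<!ELEMENT stocknumber (#PCDATA)>
<!-- Not legal, so shown only inside this comment:
     <!ELEMENT product (stocknumber | #PCDATA)*>   #PCDATA is not first
     <!ELEMENT product (#PCDATA | stocknumber)+>   group does not end with *
     <!ELEMENT product (#PCDATA, stocknumber)>     #PCDATA used in a sequence
-->
]>
<product>
  <stocknumber>1113</stocknumber>
</product>
```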

Allowing Empty Elements

Elements don't need to have any content at all, of course; they can be empty. As you would expect, you can support empty elements by using DTDs. In particular, you can create an empty content model with the keyword EMPTY, like this:

<!ELEMENT intern EMPTY> 

This declares an empty element named <intern/> that you can use to indicate that an employee is an intern. Listing 5 shows this new empty element at work. As you can see, this example allows each <employee> element to contain an <intern/> element—and makes that element optional.

Listing 5: A Sample XML Document That Uses an Empty Element

<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (intern?, name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product, id, price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
<!ELEMENT intern EMPTY> 
]> 
<document>
  <employee>
    <intern/>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
    .
    .
    .
  <employee>
    <intern/>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>Keyboard</product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <price>$25.00</price>
      </project>
    </projects>
  </employee>
</document>

Empty elements can't contain any content, but they can contain attributes.
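Attribute declarations are beyond the scope of this article, but as a rough sketch, an attribute on an empty element might be declared with <!ATTLIST> as shown here. The school attribute and its value are hypothetical, and the <employee> content model is cut down to keep the example short.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE employee [
<!ELEMENT employee (intern?, name)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT intern EMPTY>
<!-- Hypothetical attribute: an optional school name stored as plain text -->
<!ATTLIST intern school CDATA #IMPLIED>
]>
<employee>
  <intern school="Example University"/>
  <name>Grace Kelly</name>
</employee>
```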

Summary

In this article you have practiced validating XML documents with DTDs and specified the syntax of XML documents for XML processors to check. In a perfect world, there would be no data-entry errors in XML documents, but real life is a different story. If you specify the syntax of an XML document, you can let an XML processor check that document automatically.

About the Author

Steven Holzner is an award-winning author who has written 80 computing books. Material in this article was taken from Sams Teach Yourself XML in 21 Days, Third Edition. (Copyright Sams Publishing) He has been writing about XML since it first appeared and is one of the foremost XML experts in the United States, having written several XML bestsellers and being a much-requested speaker on the topic. He's also been a contributing editor at PC Magazine, has been on the faculty of Cornell University and MIT, and teaches corporate programming classes around the United States.
