September 19, 2014

Creating Valid XML Documents: DTDs

  • January 16, 2004
  • By Steven Holzner

Handling Text Content

In the preceding section's example, the <lastname>, <firstname>, and <hiredate> elements contain text data. In DTDs, non-markup text is considered parsed character data. This is text that the XML processor does parse (it expands entity references such as &amp;, for example), but that contains no child elements. In a DTD, we refer to parsed character data as #PCDATA. Note that this is the only way to refer to text data in an element declaration. You can't say anything about the actual format of the text, although that might be important if you're dealing with numbers. In fact, this lack of precision is one of the reasons that XML schemas were introduced.

Here's how the example gives its text-containing elements the #PCDATA content model:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
</document>
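
Because #PCDATA really is parsed, the processor still expands entity references inside it. At the same time, the DTD has no way to check what the text looks like. The following minimal sketch shows both points. Its two-child <project> root is an illustrative assumption, not one of the article's listings:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE project [
<!ELEMENT project (product, price)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<project>
  <!-- &amp; is expanded, so the application receives the text "Printer & Scanner" -->
  <product>Printer &amp; Scanner</product>
  <!-- Valid, even though it is not a number: #PCDATA says nothing about format -->
  <price>about a hundred dollars</price>
</project>
```

What (#PCDATA) does rule out is child elements. Putting markup such as <product><name>Printer</name></product> inside <product> would make this document invalid.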

Note - Can you mix elements and PCDATA in the same content model? Yes, you can. This is called a mixed content model, and you'll see how to work with such models in a few pages.


You're almost done with the sample DTD—except for the * symbol. The following section takes a look at * and the other possible symbols to use.

Specifying Multiple Child Elements

There are a number of options for declaring an element that can contain child elements. You can declare the element to contain a single child element:

<!ELEMENT document (employee)> 

You can declare the element to contain a list of child elements, in order:

<!ELEMENT document (employee, contractor, partner)> 
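
A comma-separated list is an ordered sequence. Each child must appear exactly once, in the declared order. The following sketch assumes, for illustration only, that the three children hold plain text. Under that assumption, this document is valid:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee, contractor, partner)>
<!ELEMENT employee (#PCDATA)>
<!ELEMENT contractor (#PCDATA)>
<!ELEMENT partner (#PCDATA)>
]>
<document>
  <employee>Grace Kelly</employee>
  <contractor>Cary Grant</contractor>
  <partner>Clark Gable</partner>
</document>
```

Swapping <employee> and <contractor>, or leaving out <partner>, would make the document invalid.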

You can also use symbols with special meanings in DTDs, such as *, which means "zero or more of," as in this example, where you're allowing zero or more <employee> elements in a <document> element:

<!ELEMENT document (employee)*> 

There are a number of other ways of specifying multiple children by using symbols. (This syntax closely resembles regular expressions, like the ones used in the Perl language, so if you know regular expressions, you have a leg up here.) Here are the possibilities:

  • x+—Means x can appear one or more times.

  • x*—Means x can appear zero or more times.

  • x?—Means x can appear once or not at all.

  • x, y—Means x followed by y.

  • x | y—Means x or y—but not both.
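
Here's a hypothetical declaration that uses every one of these symbols at once. The <contact>, <email>, <fax>, and <note> elements are made up for illustration:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE contact [
<!-- A name, then one or more phones or emails, then an optional fax, then any number of notes -->
<!ELEMENT contact (name, (phone | email)+, fax?, note*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT note (#PCDATA)>
]>
<contact>
  <name>Grace Kelly</name>
  <phone>555.2345</phone>
  <email>grace@example.com</email>
  <note>Prefers email.</note>
</contact>
```

Each pass through (phone | email) picks exactly one alternative. The + repeats that choice, however, so a contact can still list both a phone number and an email address, one per repetition.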

The following sections take a look at these options.

Allowing One or More Children

You might want to specify that a <document> element can contain between 200 and 250 <employee> elements. If so, you're out of luck, because DTD syntax doesn't give us that kind of precision. You still have some control, though. For example, you can specify that a <document> element must contain one or more <employee> elements by using a + symbol, like this:

<!ELEMENT document (employee)+> 

Here, the XML processor is being told that a <document> element has to contain at least one <employee> element.
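
To see what + rules out, consider an empty <document> under that declaration. A validating parser should reject it. One way to check is libxml2's xmllint run with --noout --valid, assuming you have it installed. In this sketch, <employee> is simplified to plain text for brevity:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)+>
<!ELEMENT employee (#PCDATA)>
]>
<!-- Invalid: (employee)+ requires at least one employee -->
<document></document>
```

Adding a single child such as <employee>Grace Kelly</employee> makes the document valid.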

Allowing Zero or More Children

In a DTD, you can use the * symbol to specify that an element may contain any number of a particular child element, that is, zero or more. You saw this in action earlier, in the Listing 1 example, where the <document> element may contain any number of <employee> elements:

<!ELEMENT document (employee)*> 
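
With *, even an empty <document> is valid, because zero occurrences are allowed. In this minimal sketch, <employee> is simplified to plain text. That is an assumption for brevity, since the article's own <employee> has child elements:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee (#PCDATA)>
]>
<!-- Valid: (employee)* allows zero employees, and any larger number would also be valid -->
<document></document>
```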

Allowing Zero or One Child

When using a DTD, you can use ? to specify zero or one child elements. Using ? indicates that a particular child element may be present once in the element you're declaring, but it need not be. For example, here's how to indicate that a <document> element may contain zero or one <employee> elements:

<!ELEMENT document (employee)?> 
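
Here the limit runs the other way. A second <employee> is too many. This sketch again simplifies <employee> to plain text:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)?>
<!ELEMENT employee (#PCDATA)>
]>
<!-- Invalid: (employee)? allows at most one employee -->
<document>
  <employee>Grace Kelly</employee>
  <employee>Cary Grant</employee>
</document>
```

Under this declaration, both an empty <document> and a <document> holding a single <employee> are valid.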

Using +, *, and ? in Sequences

You can use the +, *, and ? symbols in content model sequences. For example, here's how you might specify that an employee can have one or more <name> elements, an optional <hiredate> element, and any number of <projects> elements:

<?xml version = "1.0" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee (name+, hiredate?, projects*)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
</document>

Using +, *, and ? inside sequences provides a lot of flexibility because it means you can specify how many times an element can appear in a sequence—and even whether the element can be absent altogether.
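
For instance, the same DTD also accepts a much sparser employee. The one below has two names and no <hiredate> or <projects> at all. The second name is made-up illustrative data:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee (name+, hiredate?, projects*)>
<!ELEMENT name (lastname, firstname)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
<!ELEMENT projects (project)*>
<!ELEMENT project (product,id,price)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<document>
  <!-- Two names, no hiredate, no projects: still valid -->
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Gracie</firstname>
    </name>
  </employee>
</document>
```

The sequence still fixes the order. Placing a <hiredate> before the first <name> would make the document invalid.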

You can go further by applying the +, *, and ? operators to subsequences, which you create with parentheses. A subsequence is a sequence inside a sequence. For example, say that we wanted to allow each employee to list multiple names (including nicknames and so on), possibly list his or her age, and give multiple phone numbers. You can do that by using the subsequence shown in Listing 2.

Listing 2 A Sample XML Document That Uses Subsequences in a DTD

<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document [ 
<!ELEMENT document (employee)*> 
<!ELEMENT employee ((name, age?, phone*)+, hiredate, projects)> 
<!ELEMENT name (lastname, firstname)> 
<!ELEMENT lastname (#PCDATA)> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT hiredate (#PCDATA)> 
<!ELEMENT projects (project)*> 
<!ELEMENT project (product,id,price)> 
<!ELEMENT product (#PCDATA)> 
<!ELEMENT age (#PCDATA)> 
<!ELEMENT phone (#PCDATA)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT price (#PCDATA)> 
]> 
<document>
  <employee>
    <name>
      <lastname>Kelly</lastname>
      <firstname>Grace</firstname>
    </name>
    <phone>
      555.2345
    </phone>
    <hiredate>October 15, 2005</hiredate>
    <projects>
      <project>
        <product>Printer</product>
        <id>111</id>
        <price>$111.00</price>
      </project>
      <project>
        <product>Laptop</product>
        <id>222</id>
        <price>$989.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Grant</lastname>
      <firstname>Cary</firstname>
    </name>
    <age>
      32
    </age>
    <phone>
      555.2346
    </phone>
    <hiredate>October 20, 2005</hiredate>
    <projects>
      <project>
        <product>Desktop</product>
        <id>333</id>
        <price>$2995.00</price>
      </project>
      <project>
        <product>Scanner</product>
        <id>444</id>
        <price>$200.00</price>
      </project>
    </projects>
  </employee>
  <employee>
    <name>
      <lastname>Gable</lastname>
      <firstname>Clark</firstname>
    </name>
    <age>
      46
    </age>
    <phone>
      555.2347
    </phone>
    <hiredate>October 25, 2005</hiredate>
    <projects>
      <project>
        <product>Keyboard</product>
        <id>555</id>
        <price>$129.00</price>
      </project>
      <project>
        <product>Mouse</product>
        <id>666</id>
        <price>$25.00</price>
      </project>
    </projects>
  </employee>
</document>

Getting creative when defining subsequences and using the +, *, and ? operators allows us to be extremely flexible in DTDs.
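
Listing 2 never actually repeats the (name, age?, phone*) group, so here's a sketch of what the + on that subsequence permits. It uses the DTD from Listing 2. The second name and the extra phone number are illustrative data:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document [
<!ELEMENT document (employee)*>
<!ELEMENT employee ((name, age?, phone*)+, hiredate, projects)>
<!ELEMENT name (lastname, firstname)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT hiredate (#PCDATA)>
<!ELEMENT projects (project)*>
<!ELEMENT project (product,id,price)>
<!ELEMENT product (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<document>
  <employee>
    <!-- First (name, age?, phone*) group: a name and two phone numbers -->
    <name><lastname>Grant</lastname><firstname>Cary</firstname></name>
    <phone>555.2346</phone>
    <phone>555.2348</phone>
    <!-- Second group: another name, with no age and no phone -->
    <name><lastname>Grant</lastname><firstname>Archie</firstname></name>
    <hiredate>October 20, 2005</hiredate>
    <!-- projects is required here, but (project)* lets it be empty -->
    <projects></projects>
  </employee>
</document>
```

By contrast, an <age> placed after <hiredate> would be invalid, because <age> may appear only inside the repeated group.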




