XML is a great way of representing information. It’s simple for both humans and computers to read and can be readily transformed into other formats. This alone has made XML a widely preferred technology for describing data. What is missing is a good way of describing the structure of a valid XML document. The Document Type Definition (DTD) is one method, but it’s far from perfect. The obvious problem is that a DTD is not itself written in XML syntax. It’s like an English grammar book written in French. Further, DTD is limited in scope: it can only describe character data and knows no other data types.
DTD is a form of metadata. It’s information about information. Metadata comes in many forms, for example database schemas, C structures, or Java/C++ classes. Metadata may sound like a curious computer science concept, but it has powerful practical applications. A database schema can be used to automatically create data entry and presentation forms with little or no custom programming. A Java class can be introspected to expose properties and methods for use within a builder tool. Metadata can also be used to automatically transform data from one representation to another. This can lead to greater productivity and to software that is more robust from the outset.
The XML Schema specification is available from the W3C, under its Architecture Domain. Schema is very powerful and complex! In fact, it is probably the most complex of all the XML specifications. Fortunately, it’s not necessary to know every detail of XML Schema. As long as you understand its basic concepts, there are tools that can help you solve most problems.
Let’s look at an example: a schema describing a purchase order. An order might look something like this:
<PurchaseOrder>
  <item>
    <description>Ethernet hub</description>
    <sku>10123</sku>
    <quantity>1</quantity>
    <price>49.95</price>
  </item>
  <item>
    <description>10BaseT connectors</description>
    <sku>11098</sku>
    <quantity>10</quantity>
    <price>0.95</price>
  </item>
</PurchaseOrder>
We will use XML Spy to generate and work with schemas. XML Spy is an excellent tool for almost anything related to XML and a good way to get your feet wet with XML Schema. After starting the XML Spy IDE, we create a new W3C Schema (.xsd) document. XML Spy lets us build the schema graphically, which simplifies our task considerably. The graphical view appears in Figure 1.
The schema as XML itself looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="PurchaseOrder">
    <xs:annotation>
      <xs:documentation>Example XML schema</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:element name="item">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="description"/>
              <xs:element name="sku"/>
              <xs:element name="quantity"/>
              <xs:element name="price"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Looking at this example gives us some insight into the basics of XML Schema. The PurchaseOrder element is defined as a complex type containing one or more item elements; maxOccurs="unbounded" on the sequence places no upper limit on their number. The item element is also a complex type, and it contains exactly four elements in a fixed order. From this example, we can gather that an element can contain a collection of other elements along with multiplicity constraints. A DTD already offers essentially this level of functionality. Schema permits substantially greater flexibility, as we will see.
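For comparison, the structure described so far can be sketched as a DTD, shown here as an internal subset at the top of a shortened copy of the sample order. This is an illustrative sketch, not output from XML Spy. Note that the declarations use their own <!ELEMENT> syntax rather than XML elements, and that every leaf element can only be declared as #PCDATA (character data).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PurchaseOrder [
  <!-- One or more item elements, like the unbounded sequence in the schema -->
  <!ELEMENT PurchaseOrder (item+)>
  <!-- Exactly four children, in this order -->
  <!ELEMENT item (description, sku, quantity, price)>
  <!-- #PCDATA is plain character data: a DTD cannot say "integer" or "decimal" -->
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT sku (#PCDATA)>
  <!ELEMENT quantity (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
<PurchaseOrder>
  <item>
    <description>Ethernet hub</description>
    <sku>10123</sku>
    <quantity>1</quantity>
    <price>49.95</price>
  </item>
</PurchaseOrder>
```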
Elements are declared using the <element> element. An element can be defined in one of two ways: through a reference to a named type, or through an anonymous type definition nested inside the declaration. The example above uses anonymous definitions. A type reference for the PurchaseOrder element might look like this:
<xs:element name="PurchaseOrder" type="PurchaseOrderType" />
The structure of the PurchaseOrder element must then be described by a named type defined elsewhere, either in this document or in another schema document. Because a named type can be referenced from many places, this permits reuse of schema definitions. The definition of PurchaseOrderType might look like this:
<xs:complexType name="PurchaseOrderType">
  <xs:sequence maxOccurs="unbounded">
    <xs:element name="item">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="description"/>
          <xs:element name="sku"/>
          <xs:element name="quantity"/>
          <xs:element name="price"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
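Taking the reuse idea one step further, a minimal sketch of the whole schema built only from named types might look like the following. The ItemType name is a hypothetical choice for illustration. This version also moves maxOccurs from the sequence onto the item declaration; with a single child element, both forms accept the same documents (one or more item elements).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">

  <!-- Global element declared by reference to a named type -->
  <xs:element name="PurchaseOrder" type="PurchaseOrderType"/>

  <!-- One or more item elements; the multiplicity sits on the element itself -->
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <xs:element name="item" type="ItemType" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Hypothetical named type for one item, reusable anywhere in the schema -->
  <xs:complexType name="ItemType">
    <xs:sequence>
      <xs:element name="description"/>
      <xs:element name="sku"/>
      <xs:element name="quantity"/>
      <xs:element name="price"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```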
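Other schema documents can be referenced through the <include> and <import> elements: <include> brings in definitions from a document with the same target namespace (or none), while <import> brings in definitions from a different namespace. In this way, very complex schemas can be built from multiple documents.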
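As a sketch of the syntax, the top of a schema that pulls in other documents might look like this. The file names, the http://example.com/address namespace, the addr prefix, and the shipTo and AddressType names are all hypothetical; address.xsd would need to declare that namespace as its targetNamespace and define AddressType. Both <include> and <import> must come before the schema’s own declarations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- All file names, namespaces, and type names here are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:addr="http://example.com/address"
           elementFormDefault="qualified">

  <!-- include: same (here, absent) target namespace; assumed to define PurchaseOrderType -->
  <xs:include schemaLocation="purchase-order-types.xsd"/>

  <!-- import: components from a different namespace, referenced via the addr prefix -->
  <xs:import namespace="http://example.com/address"
             schemaLocation="address.xsd"/>

  <xs:element name="PurchaseOrder" type="PurchaseOrderType"/>
  <xs:element name="shipTo" type="addr:AddressType"/>
</xs:schema>
```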
Let’s first consider simple types, that is, elements that contain only text (no child elements or attributes). The description element is an example. As it stands, our example schema places no restriction on these elements: with no type given, they default to xs:anyType and accept any content. We can refine the contents of a simple type using data types and facets. Data types include familiar examples such as string, date, float, and so on. Facets constrain values through numerical limits, enumerations over a set of values, the length of text, or even regular expressions. For example, an element restricted to an enumeration of cities, defined with an anonymous simple type, might look something like this:
<xs:element name="city">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="New York"/>
      <xs:enumeration value="London"/>
      <xs:enumeration value="Paris"/>
      <xs:enumeration value="Sydney"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
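Applying data types and facets to the purchase order, the four untyped children of item could be tightened roughly as follows; these declarations would replace the existing ones inside item’s <xs:sequence>. The limits are assumptions chosen for illustration: a description of at most 100 characters, a five-digit SKU (matching the sample values 10123 and 11098), a positive whole-number quantity, and a non-negative price with at most two decimal places.

```xml
<!-- Replaces the untyped children of item; all limits are illustrative assumptions -->
<xs:element name="description">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- length facet: no more than 100 characters -->
      <xs:maxLength value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
<xs:element name="sku">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- pattern facet: exactly five digits; schema patterns match the whole value -->
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
<!-- built-in data type: whole numbers 1, 2, 3, ... -->
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <!-- numeric limit plus at most two digits after the decimal point -->
      <xs:minInclusive value="0"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```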
A tool or other software could read our schema and automatically generate a data entry form. On reading the definition of the city element, the software could present a drop-down list populated with the defined cities. The schema could also be used to validate data strictly before further processing.
Complex elements can contain child elements. In almost all cases, the root element of a document will be complex, containing a series of child elements. The order, number, and types of child elements can be strictly defined using XML Schema. Allowed child elements can be declared as a sequence or a choice of elements, as a reference to a named group, or as <any> content (a wildcard). In our PurchaseOrder example, the item element must contain a sequence of elements in a specific order. Numerous other options, such as the number of occurrences, can also be specified. You can experiment with all of these options easily using XML Spy.
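The following sketch combines a named group, a choice, and a wildcard in one named type. ItemDetails, ItemType, giftWrap, and note are hypothetical names that are not part of the original example; the fragment consists of top-level components and belongs directly inside <xs:schema>.

```xml
<!-- Named model group: a reusable sequence that other definitions can reference -->
<xs:group name="ItemDetails">
  <xs:sequence>
    <xs:element name="description" type="xs:string"/>
    <xs:element name="sku" type="xs:string"/>
  </xs:sequence>
</xs:group>

<xs:complexType name="ItemType">
  <xs:sequence>
    <xs:group ref="ItemDetails"/>
    <xs:element name="quantity" type="xs:positiveInteger"/>
    <xs:element name="price" type="xs:decimal"/>
    <!-- choice: at most one of these alternatives (minOccurs="0" makes it optional) -->
    <xs:choice minOccurs="0">
      <xs:element name="giftWrap" type="xs:boolean"/>
      <xs:element name="note" type="xs:string"/>
    </xs:choice>
    <!-- wildcard: any number of namespace-qualified elements from other namespaces;
         "lax" validates them only when a declaration for them is available -->
    <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```

Because this schema has no target namespace, ##other matches only namespace-qualified elements, so the wildcard cannot be confused with the unqualified giftWrap or note elements that precede it.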
Once we have a schema, we can create a document based on it and validate it. Again, XML Spy can help. When we create a new XML document, it can be based on a DTD or a schema. If we select the schema we created earlier, all type restrictions will be applied as we fill in the contents of the document. We can then check the document for well-formedness and for validity against our schema.
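A document can also point at its schema directly, so that other validators can find it. For a schema with no target namespace, like ours, the standard mechanism is the xsi:noNamespaceSchemaLocation attribute, which validators treat as a hint about where the schema lives. The file name PurchaseOrder.xsd is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- PurchaseOrder.xsd is a hypothetical file name, resolved relative to this document -->
<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="PurchaseOrder.xsd">
  <item>
    <description>Ethernet hub</description>
    <sku>10123</sku>
    <quantity>1</quantity>
    <price>49.95</price>
  </item>
</PurchaseOrder>
```

This also shows the difference between the two checks. Changing <quantity>1</quantity> to <quantity>one</quantity> leaves the document well-formed, and the untyped schema above would still accept it, but it would fail validation against a schema that declares quantity as xs:positiveInteger.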
This gives a small taste of what XML schema is all about and some of the possibilities it presents. Of course, schema is a large, complex topic that is difficult to fully internalize at one sitting. Fortunately, there are excellent tools that make the job of creating schemas straightforward and painless.
About the Author
Madhu Siddalingaiah is CTO of Aquarius Solutions, a Java/XML consulting firm. Madhu is co-author of XML and Web Services Unleashed (Sams Publishing) and a veteran Java architect and developer.