Web Services Tutorial: Understanding XML and XML Schema-Part 1
Advantages of XML Schema
XML Schema offers many advantages over DTD:
- Enhanced data types: Schema supports over 44 datatypes versus 10 supported by DTD. You can create your own data type in XML Schema. For example: "This is a new type based on the string type and elements of this type must follow this pattern: ddd-dddd-ddd, where 'd' represents a digit".
- Written in the same syntax as instance documents: DTD requires you to remember another syntax than the one used to write XML documents. Remembering an extra set of syntax is an overhead and is error-prone.
- Object-oriented: Schemas can extend or restrict a type (derive new type definitions on the basis of old ones).
- Schemas can specify element content as being unique (keys on content) and uniqueness within a region. They can also define multiple elements with the same name but different content—by using namespaces. Lack of namespaces was a major drawback in DTD. You can think of namespaces in XML like namespaces in C++. A simple analogy of DTD vs. Schema namespace usage is the use of global and local variable in programming languages. A local variable's name is unique with in its scope, whereas a global variable has to be unique across functions. Similarly, with a XML Schema namespace you have freedom to define datatypes without worrying about name collisions.
Well formedness and Validity
Well formedness of an XML document (also known as instance document) refers to the characteristic of a document adhering to the XML rule of well formedness. As you would recall, XML has a stringent set of rules unlike HTML, such as closing all the tags, no nested tags, and so forth.
Validity of an instance document implies that the document conforms to the specified Schema or DTD file mentioned in the XML document.
Example of XML Schema
The following diagram shows the vocabulary for defining XML Schema document.
Each of the key words is easy to understand. An element is represented by "element" tag. A complexType provides us with a mechanism to define complex user types. Let us now look at an example of Schema for BookStore that can contain multiple books.
<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.books.org" xmlns=http://www.books.org> A <xsd:element name="BookStore"> B <xsd:complexType> <xsd:sequence> <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Book"> C <xsd:complexType> <xsd:sequence> <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/> <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/> <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="Title" type="xsd:string"/> D <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:string"/> <xsd:element name="ISBN" type="xsd:string"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:schema>
For easy referencing of the XML document, I have marked the document in various sections. In section A of the document, the first line contains the value of xsd as a namespace. Any namespace reference is depicted by using this value as a prefix and the syntax to refer to an element in namespace is namespace:elementname (for example, xsd:element). The targetNamespace attribute specifies the namespace in which the newly defined types reside. In the above example, the defined types of BookStore, Book, Title, Author, ISBN, and Publisher reside in the namespace http://www.books.org.
The namespace http://www.books.org
The xmlns attribute defines the default namespace; in other words, from the location at which the elements would be looked for in absence of any namespace prefix for an element.
In the section B a new type, BookStore, is defined as a complexType, which consists of a sequence of the Book type. A complexType is used to define a user type, which can contain multiple elements and attributes. Section C details the Book type, which itself a complexType consisting of sequence of elements Title, Author, Date, ISBN, and Publisher. The attributes minOccur and maxOccur define the constraint on the occurrence of the elements. The order of the elements in the Book type should follow the order in declaration. In other words, the Book type will contain a Title element, followed by Author, and so on.
Section D defines the types of individual elements for Book. Each of the elements are of built in a simpleType string.
You can define your own simpleType, but a simpleType does not consist of multiple elements or any attribute. The simpleType is usually defined to represent a restriction on the datatype. For example, here is the definition for a simple type elevation that can have valid values between 50 and 12000.
<xsd:simpleType name="elevation"> <xsd:restriction base="xsd:integer"> <xsd:minInclusive value="50"/> <xsd:maxInclusive value="12000"/> </xsd:restriction> </xsd:simpleType>
Having looked at the Schema document, let's look at an instance document using the above schema.
<?xml version="1.0"?> <BookStore xmlns ="http://www.books.org" 1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 2 xsi:schemaLocation="http://www.books.org/ BookStore.xsd"> 3 <Book> 4 <Title>Web Services Security</Title> <Author>Ravi Trivedi</Author> <Date>Dec, 2002</Date> <ISBN>1861007655</ISBN> <Publisher>Wrox Publishing</Publisher> </Book> </BookStore>
The first definition of xmlns uses a default namespace declaration, and tells the schema-validator that all of the elements used in this instance document come from the Book namespace. The second definition tells the schema-validator that the schemaLocation attribute we are using is the one in the XMLSchema-instance namespace. The third definition, with schemaLocation, tells the schema-validator that the http://www.books.org namespace is defined by BookStore.xsd (in other words, schemaLocation contains a pair of values). This file would be looked up by the schem validator to validate the references the current document.
Lastly, in the fourth declaration, a Book element is declared which contains the details of a book. Note how the order of the elements is retained in the complexType Book.
One of the major advantages of the XML Schema specification is that it uses XML as its underlying syntax. By using XML, existing XML parsers can be used in conjunction with Schema validators to provide well-formedness and validation facilities.
The XML Schema specification plays an important role in the design and implementation of Web Services. WSDL files are also built using XML Schema as the underlying syntax.
The XML Schema offers an automated mechanism for validating the XML documents. It has been observed that in a typical program, up to 60% of the code is spent checking the data. If your data is structured as XML, and there is a schema, you can hand the data-checking task off to a schema validator. Thus, your code is reduced by up to 60%. Also, next time if your constraints change or you add new elements, you need not write new code to check for new values of elements.
In this article, we briefly looked at the Web Services vision and the standards in the Web Services stack. The XML standard forms the underlying layer for the Web Services technology. XML Schema is used to describe the grammar and constraints on the elements of an XML instance document.
The XML Schema specification allows for the use of both built-in and custom defined datatypes making it possible to more accurately express and constrain data found in compliant XML documents. Reuse thru inheritance and groupings, similar to functionality found in OO programming languages, is also provided. This provides a flexible and useful mechanism for automatic data validation.
The next article in this series will cover the creation of XML schemas and the parsing of XML documents.
About the Author
Ravi Trivedi holds a Masters degree in Computer Science from the Indian Institute of Science, Bangalore. He is a Technical Lead at Hewlett-Packard, Bangalore, who represents HP in JAXR and P3P standards. He is also a committer for UDDI4j (www.uddi4j.org). He is involved in developing Web Services infrastructure and solutions. Ravi co-authored the book Web Services Security available at amazon.com. He can be reached at email@example.com.
Page 2 of 2