http://www.developer.com/

Web Services Tutorial: Understanding XML and XML Schema-Part 1


April 23, 2003

Abstract

This Web Services article series covers the important standards in the Web Services stack and ties them together with real-world examples. The first article in this series discusses XML (Extensible Markup Language). XML provides a significant advance in how data is described and exchanged by Web-based applications using a simple, flexible, standards-based format. The article focuses on XML Schema, an important component in creating XML documents.

Introduction

By now, you have probably heard about Web Services, a technology that can change the future of computing and e-commerce. Web Services is a distributed computing technology that offers interaction and collaboration among vendors and customers, with the vision of providing ubiquitous computing.

When you plug an appliance into an electrical socket, you don't worry about how the electricity is generated and distributed. All you care about is uninterrupted power and, of course, the utility bill that arrives at the end of the month! Similarly, Web Services aim to make computing resources, both hardware and software, accessible to you through the Internet, just as electricity is made available to you. Web Services promise to do for computing what the Internet did for data. They encourage a pay-per-usage model and make dynamic collaborations possible. One key definition of Web Services is: "Web Services are loosely coupled software components delivered over Internet-standard technologies."

Some of the early products in Web Services started appearing in 1997 when Sun announced its Jini platform and Hewlett-Packard its e-speak. After that, many big players such as IBM and Microsoft joined the race. The Web Services arena picked up steam once the big players were on board, and several small players also joined in on what was perceived as the next Internet wave. Several standards bodies and consortiums were formed, which developed numerous standards on different aspects of Web Services. Some of the key standards bodies are W3C, OASIS, JCP, and OMG, alongside several individual efforts by groups of companies.

Two key problems that Web Services solve, compared with earlier distributed systems such as CORBA, DCOM, and RPC, are:

  • Interoperability: Earlier distributed systems suffered from interoperability issues because each vendor implemented its own on-wire format for distributed object messaging. With XML as the on-wire standard, the two camps of Java/J2EE and .NET/C# can now speak to each other.
  • Firewall traversal: Collaboration across corporations was an issue because distributed systems such as CORBA and DCOM used non-standard ports. As a result, collaboration meant punching a hole in your firewall, which was often unacceptable to IT. Setting up each partner therefore required a manual process, which ruled out dynamic collaboration. Web Services use HTTP as a transport protocol, and most firewalls allow access through port 80 (for HTTP), leading to easier and more dynamic collaboration. The dynamic nature of Web Services interaction makes several exciting services possible for users.

What are the key technologies that made Web Services possible? Let us now examine the key interactions and the key standards involved in the Web Services stack.

Web Services Stack

To understand what technologies are required for Web Services, we need to understand a typical Web Service interaction.

The Web Services model follows the publish, find, and bind paradigm. In the first step, a service provider publishes a Web Service in a Web Service registry. In the second step, a client looking for a service that meets its requirements searches the registry. If the search returns multiple matches, the client chooses a service based on its preferences. The client then downloads the service description and binds to it to invoke and use the service.

One of the primary concerns of Web-based programmers was how to transmit data in an interoperable manner. At the bottom-most layer is the XML standard that addresses this. SOAP (Simple Object Access Protocol) is an XML-based mechanism for messaging and RPC (Remote Procedure Calls). It addresses the fundamental problem of firewall traversal in RPC systems by using HTTP as the transport. SOAP is the protocol used for invoking the service.



WSDL (Web Services Description Language) provides an XML-based way of describing a Web Service, giving details of how to use it. WSDL is an XML equivalent of IDL (Interface Definition Language), used in the RPC days. UDDI (Universal Description, Discovery, and Integration) provides a "Yellow Pages" directory of Web Services, making it easier for clients to discover the services of their choice. The service provider publishes the service description (WSDL) and other searchable details in the UDDI registry. A client uses UDDI to perform the find step for a service.

In this tutorial series, we will cover each and every standard in the Web Service stack moving from the bottom up, beginning with XML.

XML

Extensible Markup Language (XML) is an extensible, portable, and structured text format. XML is playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. XML was derived from SGML (Standard Generalized Markup Language), a complex language for defining other markup languages.

The XML initiative consists of a set of related standards. Apart from the core XML standard, it includes XSL (Extensible Stylesheet Language), which is used to transform XML data into a customizable presentation. XLink describes links between XML resources, and XQuery provides flexible query facilities to extract data from real and virtual XML documents on the Web. XPath and XPointer are languages for addressing parts of an XML document.

A previous article, "Understanding XML," introduced you to the fundamentals of XML. XML Schema is one of the key components of XML. Therefore, in this article we will look closely at working with XML Schema.

Working with XML

When working with XML, we think of creating XML documents and consuming XML documents. The creation process involves using editors and tools to create XML documents. On the other hand, consuming XML documents involves parsing the XML documents and extracting the useful data.

Creating XML documents

Creating XML documents is a two-step process, which involves:

  • Defining the grammar and the restrictions on the data for the XML document.
  • Creating the XML document itself. This document can be validated against the grammar.

A DTD or a Schema is used to describe the grammar and the restrictions on the data in the XML document.

DTD and Schema

DTD and Schema are used to specify the structure of instance documents and the datatype of each element and attribute. The DTDs used in XML today originated in the parent SGML specification. Because SGML was designed for a more document-centric model, it did not require complex datatype definitions. The XML Schema specification improves greatly upon the DTD content model by providing rich datatyping capabilities for elements and attributes, as well as support for object-oriented (OO) design principles.

XML Schema was approved as a W3C Recommendation in May, 2001 and is now being widely used for structuring XML documents for e-commerce and Web Services applications.

The two major goals that the W3C XML Schema working group focused on during the design of the XML Schema standard were:

  • Bringing the object-oriented design principles found in common OO programming languages into the specification.
  • Providing rich datatyping support similar to the datatyping functionality available in most relational database systems.

XML Schema provides a means of creating a set of rules that govern the validity of the XML documents you create. Schemas provide a means of defining the structure, content, and semantics of XML documents that can be shared between different types of computers and documents.

Advantages of XML Schema

XML Schema offers many advantages over DTD:

  • Enhanced data types: XML Schema provides 44 built-in datatypes versus the 10 supported by DTD. You can also create your own datatype. For example: "This is a new type based on the string type, and elements of this type must follow the pattern ddd-dddd-ddd, where 'd' represents a digit."
  • Written in the same syntax as instance documents: DTD requires you to learn a syntax different from the one used to write XML documents. Remembering an extra syntax is an overhead and is error-prone.
  • Object-oriented: Schemas can extend or restrict a type (derive new type definitions on the basis of old ones).
  • Uniqueness and namespaces: Schemas can specify that element content is unique (keys on content), either across the document or within a region. By using namespaces, they can also define multiple elements with the same name but different content. Lack of namespace support was a major drawback of DTD. You can think of namespaces in XML like namespaces in C++. A simple analogy for DTD versus Schema naming is the use of global and local variables in programming languages. A local variable's name needs to be unique only within its scope, whereas a global variable's name has to be unique across the whole program. Similarly, with an XML Schema namespace you are free to define datatypes without worrying about name collisions.
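
The custom datatype and the type derivation described in the list above can be sketched in schema syntax as follows. This is only an illustration: the names PartNumber, Publication, and BookPublication are hypothetical and do not come from the BookStore example, and the pattern facet is one way to encode the ddd-dddd-ddd format.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org">

  <!-- Hypothetical custom type: a string restricted to ddd-dddd-ddd -->
  <xsd:simpleType name="PartNumber">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-\d{4}-\d{3}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical base type -->
  <xsd:complexType name="Publication">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Date" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Derived by extension: inherits Title and Date, then adds ISBN -->
  <xsd:complexType name="BookPublication">
    <xsd:complexContent>
      <xsd:extension base="Publication">
        <xsd:sequence>
          <xsd:element name="ISBN" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

</xsd:schema>
```

With this sketch, a value such as 123-4567-890 matches the PartNumber pattern, whereas 1234-567-890 does not.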

Well formedness and Validity

Well formedness of an XML document (also known as an instance document) means that the document adheres to the XML rules of well formedness. As you will recall, XML, unlike HTML, has a stringent set of rules, such as closing every tag, nesting tags properly (no overlapping tags), and so forth.

One quick way to check the well formedness of an XML document is to open it in a browser window. Both Internet Explorer and Netscape check well formedness automatically and display any errors they find.
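
As a small illustration of these rules, here is a well-formed fragment, followed by two counter-examples kept inside a comment so that the listing itself still parses. The id attribute is hypothetical and is not part of the BookStore example used later.

```xml
<?xml version="1.0"?>
<!-- Well formed: one root element, every tag closed, tags properly nested,
     attribute values quoted -->
<Book id="b1">
  <Title>Web Services Security</Title>
  <Author>Ravi Trivedi</Author>
</Book>
<!-- Not well formed (shown in a comment so this file still parses):
     <Book><Title>Web Services Security</Book></Title>   overlapping tags
     <Book id=b1><Title>Web Services Security</Book>     unquoted value, unclosed Title
-->
```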

Validity of an instance document means that the document conforms to the Schema or DTD that the XML document references.

Example of XML Schema

The following diagram shows the vocabulary for defining an XML Schema document.

Each of the keywords is easy to understand. An element is represented by the element tag. A complexType provides a mechanism for defining complex user types. Let us now look at an example of a Schema for a BookStore that can contain multiple books.

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                      targetNamespace="http://www.books.org"
                      xmlns="http://www.books.org">  <!-- A -->
<xsd:element name="BookStore">  <!-- B -->
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="Book">  <!-- C -->
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="Title" type="xsd:string"/>  <!-- D -->
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Date" type="xsd:string"/>
<xsd:element name="ISBN" type="xsd:string"/>
<xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>

For easy reference, I have marked sections of the schema with the letters A through D. In section A, the xmlns:xsd attribute binds the prefix xsd to the XML Schema namespace, http://www.w3.org/2001/XMLSchema. A name from a namespace is written as prefix:name (for example, xsd:element). The targetNamespace attribute specifies the namespace in which the newly defined names reside. In the above example, BookStore, Book, Title, Author, Date, ISBN, and Publisher reside in the namespace http://www.books.org.

The xmlns attribute defines the default namespace; in other words, the namespace in which a name is looked up when it carries no prefix. In this schema, that is how an unprefixed reference such as ref="Book" resolves to the Book element in http://www.books.org.
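
To see what the default namespace saves you, here is a sketch of the same idea written without one. The prefix bk is an arbitrary choice made for this illustration, and Book is reduced to a plain string only to keep the listing short.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns:bk="http://www.books.org">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <!-- With no default namespace, the reference needs the bk prefix -->
        <xsd:element ref="bk:Book" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Simplified for this sketch; the article's Book is a complexType -->
  <xsd:element name="Book" type="xsd:string"/>
</xsd:schema>
```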

In section B, the BookStore element is declared with a complexType that consists of a sequence of Book elements. A complexType is used to define a user type, which can contain multiple elements and attributes. Section C details the Book element, which is itself a complexType consisting of a sequence of the elements Title, Author, Date, ISBN, and Publisher. The attributes minOccurs and maxOccurs constrain how many times each element may occur. The elements inside a Book must follow the order in which they are declared. In other words, a Book will contain a Title element, followed by Author, and so on.

Section D defines the types of the individual elements of Book. Each of these elements is of the built-in simple type string.

You can define your own simpleType, but a simpleType cannot contain child elements or attributes. A simpleType is usually defined to represent a restriction on a datatype. For example, here is the definition of a simple type, elevation, whose valid values lie between 50 and 12000, inclusive.

<xsd:simpleType name="elevation">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="50"/>
    <xsd:maxInclusive value="12000"/>
  </xsd:restriction>
</xsd:simpleType>
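
To show how such a type might be put to use, the following sketch places the elevation definition in a small schema and declares an element of that type. The element name Summit is hypothetical, and the schema is kept without a target namespace purely for brevity.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:simpleType name="elevation">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="50"/>
      <xsd:maxInclusive value="12000"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical element that uses the type -->
  <xsd:element name="Summit" type="elevation"/>

</xsd:schema>
```

Against this sketch, a Summit element containing 8848 is valid, whereas one containing 49 or 12500 falls outside the allowed range and would be rejected by a schema validator.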

Having looked at the Schema document, let's look at an instance document using the above schema.

<?xml version="1.0"?>
<!-- Callouts: 1 = xmlns, 2 = xmlns:xsi, 3 = xsi:schemaLocation -->
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org
                               BookStore.xsd">
    <Book>  <!-- 4 -->
        <Title>Web Services Security</Title>
        <Author>Ravi Trivedi</Author>
        <Date>Dec, 2002</Date>
        <ISBN>1861007655</ISBN>
        <Publisher>Wrox Publishing</Publisher>
    </Book>
</BookStore>

The first declaration, xmlns, is a default namespace declaration; it tells the schema validator that all of the elements used in this instance document come from the http://www.books.org namespace. The second declaration, xmlns:xsi, tells the schema validator that the schemaLocation attribute we are using is the one in the XMLSchema-instance namespace. The third, xsi:schemaLocation, tells the schema validator that the http://www.books.org namespace is defined by BookStore.xsd (in other words, schemaLocation contains a pair of values: a namespace and the location of its schema). The schema validator looks up this file to validate the current document.

Lastly, in the fourth marked item, a Book element appears, which contains the details of a book. Note how its child elements follow the order declared in the complexType of Book.
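
For contrast, here is a sketch of an instance document that is well formed but would not be valid against the BookStore schema shown earlier. The xsi declarations are left out only to keep the listing short.

```xml
<?xml version="1.0"?>
<!-- Well formed, but NOT valid against BookStore.xsd -->
<BookStore xmlns="http://www.books.org">
  <Book>
    <!-- Author appears before Title, breaking the declared sequence -->
    <Author>Ravi Trivedi</Author>
    <Title>Web Services Security</Title>
    <Date>Dec, 2002</Date>
    <!-- ISBN is missing although minOccurs="1" requires it -->
    <Publisher>Wrox Publishing</Publisher>
  </Book>
</BookStore>
```

A browser's well-formedness check would accept this document; only a schema validator would report the ordering problem and the missing element.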

One of the major advantages of the XML Schema specification is that it uses XML as its underlying syntax. By using XML, existing XML parsers can be used in conjunction with Schema validators to provide well-formedness and validation facilities.

The XML Schema specification plays an important role in the design and implementation of Web Services. WSDL files also rely on XML Schema to describe the data types they use.

XML Schema offers an automated mechanism for validating XML documents. It has been observed that in a typical program, up to 60% of the code is spent checking the data. If your data is structured as XML and there is a schema, you can hand much of the data-checking task off to a schema validator, and the checking code in your program shrinks accordingly. Also, the next time your constraints change or you add new elements, you update the schema rather than writing new code to check the new values.

Summary

In this article, we briefly looked at the Web Services vision and the standards in the Web Services stack. The XML standard forms the underlying layer for the Web Services technology. XML Schema is used to describe the grammar and constraints on the elements of an XML instance document.

The XML Schema specification allows for the use of both built-in and custom-defined datatypes, making it possible to express and constrain the data found in compliant XML documents more accurately. Reuse through inheritance and groupings, similar to functionality found in OO programming languages, is also provided. This provides a flexible and useful mechanism for automatic data validation.

The next article in this series will cover the creation of XML schemas and the parsing of XML documents.

About the Author

Ravi Trivedi holds a Master's degree in Computer Science from the Indian Institute of Science, Bangalore. He is a Technical Lead at Hewlett-Packard, Bangalore, where he represents HP in the JAXR and P3P standards. He is also a committer for UDDI4j (www.uddi4j.org). He is involved in developing Web Services infrastructure and solutions. Ravi co-authored the book Web Services Security, available at amazon.com. He can be reached at ravi_trivedi@yahoo.com.
