This article in the series may seem like a diversion from our focus on Web Services. However, as we mentioned in Part 1, Web Services use XML as their base language. Because Web Services rely so heavily on XML, it is worth looking at some basic and advanced XML concepts. In this article, we aim to lay the foundation for a better understanding of what XML is all about.
About XML—a Primer
XML (eXtensible Markup Language) is a widely adopted markup meta-language, standardized by the W3C, that is used primarily for exchanging information. A familiar example of a markup language is the HyperText Markup Language (HTML). The beauty of XML lies in the fact that it is extensible: rather than fixing a set of tags, it lets you define your own. Simply put, XML is a set of predefined rules (a syntactic framework) that you follow when structuring your data. For a long time, programmers and application vendors built enterprise applications and systems that processed data in a proprietary format, one that only the systems within that enterprise could interpret. But as information exchange between applications and systems across enterprises became common, exchanging data became very difficult, because the systems were never designed to accept data from external, unknown systems. XML provides a standard, common data structure for sharing data between disparate systems. Additionally, XML supports validation: a document can be checked against a DTD or an XML Schema (both covered below), so the receiver can verify that the incoming data has the structure it expects.
Let us take a look at how data is represented using XML:
<employee>
  <shift id="counter" time="8-12">
    <phone id="1">
      All phone information
      <number>3444333</number>
    </phone>
  </shift>
  <shift id="help_desk" time="1-5">
    <phone id="2">
      All phone information
      <number>332333</number>
    </phone>
  </shift>
  ...
  <home-address>
    <street>3434 Norwalk street</street>
    <city>New York</city>
    <state>NY</state>
  </home-address>
</employee>
This example shows an employee who works more than one shift (for example, mornings at the counter and afternoons at the help desk).
In the preceding example, we represent the personal information (the home address) and the shift data for an employee in an organization. Notice how XML uses distinctive start and end tags (<tag> and </tag>) similar to the tags used in HTML? This is because XML is a markup language, much like HTML. The two primary building blocks of XML used in the preceding example are elements and attributes.
Elements are marked up with tags, just like the ones used in HTML, and can hold values. Elements are structured as a tree: there is a single root (parent) element with child elements, and child elements can in turn have their own child elements, and so on. In the preceding example, <employee> is the root element and has <shift> and <home-address> as its child elements; further down, <phone> is a child of <shift>.
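To make the tree structure concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It parses a trimmed copy of the employee example (the "..." placeholder is left out, an assumption for brevity) and prints each element indented by its depth, which mirrors the parent/child hierarchy described above. The helper name tree_lines is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the employee example; the "..." placeholder is omitted.
EMPLOYEE_XML = """
<employee>
  <shift id="counter" time="8-12">
    <phone id="1">All phone information<number>3444333</number></phone>
  </shift>
  <shift id="help_desk" time="1-5">
    <phone id="2">All phone information<number>332333</number></phone>
  </shift>
  <home-address>
    <street>3434 Norwalk street</street>
    <city>New York</city>
    <state>NY</state>
  </home-address>
</employee>
"""

def tree_lines(element, depth=0):
    """Return one indented line per element, walking the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct child elements
        lines.extend(tree_lines(child, depth + 1))
    return lines

root = ET.fromstring(EMPLOYEE_XML)
print("\n".join(tree_lines(root)))
```

The output starts with employee at the left margin, with each shift and home-address indented one level beneath it, and their own children indented further.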
Elements have certain characteristics, including the following:
- Elements can contain data, such as the <number> element in the example.
- Elements can carry attributes without holding any text data of their own, such as the <shift> element, whose only content is its child <phone> element.
- Elements can have attributes and data at the same time, and can also contain child elements, like the <phone> element in the example.
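The three cases in the list can be observed directly with ElementTree: an element's leading text is in .text, its attributes are in the .attrib dictionary, and its children are reached by iterating over it. The following sketch, which assumes a single-shift excerpt of the example, reports what each element holds; describe is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SHIFT_XML = (
    '<shift id="counter" time="8-12">'
    '<phone id="1">All phone information<number>3444333</number></phone>'
    "</shift>"
)

def describe(element):
    """Summarize whether an element has text, attributes, and/or child elements."""
    text = (element.text or "").strip()  # .text is None when there is no leading text
    return {
        "tag": element.tag,
        "text": text or None,
        "attributes": dict(element.attrib),
        "children": [child.tag for child in element],
    }

shift = ET.fromstring(SHIFT_XML)
for el in shift.iter():  # iter() visits the element itself, then all its descendants
    print(describe(el))
```

In the printed summaries, <shift> has attributes and a child but no text, <phone> has all three, and <number> has only text.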
There are many more features and rules associated with elements, such as which names are valid for an element, the requirement that elements be properly nested (a child element must close before its parent does), and so on.
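A conforming XML parser rejects a document that breaks the nesting rule. As a sketch, Python's ElementTree raises ParseError for improperly nested tags. The helper is_well_formed is hypothetical, and it checks only well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested: <number> closes before <phone> does.
print(is_well_formed("<phone><number>3444333</number></phone>"))  # True
# Improperly nested: <phone> closes while <number> is still open.
print(is_well_formed("<phone><number>3444333</phone></number>"))  # False
```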
Attributes give more meaning to an element and describe it more precisely. In the preceding example, the <shift> element has an attribute "id" with the values "counter" and "help_desk", plus a "time" attribute giving the hours. With such attributes, you can easily tell whether a given shift is at the counter or at the help desk, which makes the data in the XML document self-describing. As a general guideline, attributes are meant to provide information about the element, not to hold the data itself. Just as with elements, attributes have many rules associated with them.
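Because attributes describe the element they belong to, an application can read them to interpret the data. The sketch below assumes a trimmed two-shift excerpt of the example (the "All phone information" text is dropped) and uses ElementTree's get() and findtext() methods to list where and when the employee works.

```python
import xml.etree.ElementTree as ET

EMPLOYEE_XML = (
    "<employee>"
    '<shift id="counter" time="8-12"><phone id="1"><number>3444333</number></phone></shift>'
    '<shift id="help_desk" time="1-5"><phone id="2"><number>332333</number></phone></shift>'
    "</employee>"
)

root = ET.fromstring(EMPLOYEE_XML)
schedule = []
for shift in root.findall("shift"):  # direct <shift> children of <employee>
    # The attributes describe the shift; the child elements hold the data.
    location = shift.get("id")
    hours = shift.get("time")
    number = shift.findtext("phone/number")  # text of the <number> inside this shift
    schedule.append((location, hours, number))

for location, hours, number in schedule:
    print(f"{location}: {hours}, phone {number}")
```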
Document Type Definition (DTD)
Just as you need to know a programming language's specification when you start coding in it, a DTD is a specification that must be followed when creating an XML document. And just as one of a compiler's tasks is to check that a program follows the language specification, validating parsers check an XML document against its DTD to determine whether the document is valid.
A DTD defines the structure of your XML document. It provides a strict framework of rules to follow when creating XML documents, and it can be used to check whether an XML document conforms to that structure. A few salient features of DTDs are listed below:
- A DTD specifies the elements and attributes that are valid in the XML document.
- With a DTD, you can define a tree hierarchy of the elements.
- A DTD can also define the order in which a collection of child elements must appear, and how many times each may occur.
A DTD can be embedded directly in the XML document (an internal DTD) or kept in a separate file that the XML document references through its DOCTYPE declaration (an external DTD).
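The following sketch shows an internal DTD in practice, along with an important caveat: not every parser validates. Python's built-in ElementTree parser reads the DOCTYPE declaration but does not check the document against it, so a document that violates its DTD still parses as long as it is well-formed. Actual DTD validation needs a validating parser (for example, a third-party library such as lxml), which is outside this sketch.

```python
import xml.etree.ElementTree as ET

# An internal DTD: the declarations live inside the document's DOCTYPE.
# The DTD requires at least one <shift>, but this document has none.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE employee [
  <!ELEMENT employee (shift+)>
  <!ELEMENT shift (#PCDATA)>
  <!ATTLIST shift id CDATA #REQUIRED>
]>
<employee></employee>
"""

# ElementTree is a non-validating parser: it accepts any well-formed
# document and does not enforce the DTD's structural rules.
root = ET.fromstring(INVALID_BUT_WELL_FORMED)
print(root.tag, len(root))  # employee 0
```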
A DTD consists of the following kinds of declarations:
| DTD component | Purpose |
|---|---|
| DTD elements (<!ELEMENT>) | Metadata about an element: what kind of content it can have, how many times each child element can occur, the relationships between elements, and so on. |
| DTD attributes (<!ATTLIST>) | Specify which attributes an element can have, their types, and whether each one is required, optional, or has a default value. |
| DTD entities (<!ENTITY>) | Reference an external file or provide shortcuts for commonly used text. |
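Entity shortcuts are easy to see in action. In this sketch, an internal DTD declares a hypothetical entity named nyc for the text "New York"; when ElementTree parses the document, each &nyc; reference is replaced with the declared text.

```python
import xml.etree.ElementTree as ET

# "nyc" is a hypothetical entity declared in an internal DTD as a text shortcut.
DOC = """<!DOCTYPE home-address [
  <!ENTITY nyc "New York">
]>
<home-address><city>&nyc;</city><state>NY</state></home-address>
"""

root = ET.fromstring(DOC)
print(root.findtext("city"))  # New York
```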
Example: <!ELEMENT employee (shift+, home-address, hobbies*)>
An employee must have one or more shift elements, followed by exactly one home-address element and then zero or more hobbies elements, in that order.
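A DTD content model such as (shift+, home-address, hobbies*) is essentially a regular expression over child element names. The sketch below is not a DTD validator. It hand-translates this one rule into a Python regular expression to illustrate what a validating parser checks. The helper matches_employee_model and the space-separated encoding of child names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of <!ELEMENT employee (shift+, home-address, hobbies*)>:
# one or more "shift", then exactly one "home-address", then zero or more "hobbies".
EMPLOYEE_MODEL = re.compile(r"(shift )+(home-address )(hobbies )*")

def matches_employee_model(xml_text):
    """Check the order and count of <employee>'s child elements against the rule."""
    root = ET.fromstring(xml_text)
    # Encode the child sequence as "name name ... " so the regex can match it.
    sequence = "".join(child.tag + " " for child in root)
    return EMPLOYEE_MODEL.fullmatch(sequence) is not None

print(matches_employee_model(
    "<employee><shift/><shift/><home-address/><hobbies/></employee>"))  # True
print(matches_employee_model(
    "<employee><home-address/><shift/></employee>"))  # False: wrong order
print(matches_employee_model(
    "<employee><shift/></employee>"))  # False: no home-address
```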
Example: <!ATTLIST shift id CDATA #REQUIRED>
Every shift element must have an id attribute (#REQUIRED) whose value is character data (CDATA).
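As an illustrative sketch, not a DTD validator, the hypothetical helper below performs the check that this single declaration implies: it reports every <shift> element that lacks an id attribute.

```python
import xml.etree.ElementTree as ET

def shifts_missing_id(xml_text):
    """Return the 0-based positions of <shift> elements that have no id attribute."""
    root = ET.fromstring(xml_text)
    return [i for i, shift in enumerate(root.iter("shift")) if "id" not in shift.attrib]

doc = '<employee><shift id="counter"/><shift time="1-5"/></employee>'
print(shifts_missing_id(doc))  # [1]
```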
In short, a DTD defines a document's structure by declaring every element (and attribute) that may be used, so it can be used to check whether an XML document that is supposed to follow the DTD's rules is valid.
XML Schema is a more powerful successor to DTD. DTD has several disadvantages compared with XML Schema: it does not support strong data typing, its syntax is not XML, and it is not extensible. XML Schema was introduced to overcome those drawbacks. The most notable features of XML Schema are:
- A schema is itself written in XML, so you can edit your schema with any XML editor.
- You can use built-in data types such as string, integer, long, float, and so forth. For example, the following declares an element called name whose content must be a string:
<xs:element name="name" type="xs:string"/>
You can also define your own custom data types, as either simple types or complex types. Complex types may contain other elements and/or attributes, whereas simple types cannot contain elements or attributes; they contain only text data.
- XML Schema provides content-based validation (for example, checking the order in which child elements appear) as well as data type validation. Both simple and complex types offer many validation checks.
For example, you can define a simple type named year that accepts only integers from 2000 to 2100, inclusive:
<xsd:simpleType name="year">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="2000"/>
    <xsd:maxInclusive value="2100"/>
  </xsd:restriction>
</xsd:simpleType>
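The year restriction amounts to two checks: the value must be an integer, and it must fall between minInclusive and maxInclusive, with both bounds included. The following Python sketch mimics that rule for illustration only; a real schema validator applies the facets for you. The function is_valid_year is hypothetical, and it approximates the xsd:integer lexical form with a simple regular expression (optional sign, then digits, after trimming surrounding whitespace), which is an assumption.

```python
import re

MIN_INCLUSIVE = 2000  # <xsd:minInclusive value="2000"/>
MAX_INCLUSIVE = 2100  # <xsd:maxInclusive value="2100"/>

# Approximation of the xsd:integer lexical form: optional sign, then digits.
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")

def is_valid_year(text):
    """Return True if text would satisfy the 'year' simple type shown above."""
    value = text.strip()  # xsd:integer collapses surrounding whitespace
    if not INTEGER_PATTERN.fullmatch(value):
        return False  # fails the base type check: not an integer
    return MIN_INCLUSIVE <= int(value) <= MAX_INCLUSIVE  # both bounds are inclusive

for candidate in ["2000", "2100", "1999", "2101", "20.5", "abc"]:
    print(candidate, is_valid_year(candidate))
```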
Similarly, a complex type can restrict the order in which its child elements appear. In the following example, xsd:sequence requires Name, then Address, then Phone:
<xsd:complexType name="Employee">
  <xsd:sequence>
    <xsd:element name="Name" type="xsd:string"/>
    <xsd:element name="Address" type="xsd:string"/>
    <xsd:element name="Phone" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
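xsd:sequence means the child elements must appear in exactly the declared order, and by default each must occur exactly once (minOccurs and maxOccurs both default to 1). The sketch below imitates that rule, assuming an element named Employee that is declared with this type. The function follows_employee_sequence is hypothetical and is not how a schema processor is implemented.

```python
import xml.etree.ElementTree as ET

# Order required by <xsd:sequence> in the Employee complex type.
EMPLOYEE_SEQUENCE = ["Name", "Address", "Phone"]

def follows_employee_sequence(xml_text):
    """True if the root's children are exactly Name, Address, Phone, in that order."""
    root = ET.fromstring(xml_text)
    # minOccurs/maxOccurs default to 1, so each child must appear exactly once.
    return [child.tag for child in root] == EMPLOYEE_SEQUENCE

ok = "<Employee><Name>A</Name><Address>B</Address><Phone>C</Phone></Employee>"
swapped = "<Employee><Address>B</Address><Name>A</Name><Phone>C</Phone></Employee>"
print(follows_employee_sequence(ok))       # True
print(follows_employee_sequence(swapped))  # False
```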
- XML Schema lets you derive new types from existing ones by extension or restriction, which is analogous to inheritance in object-oriented terms. This means you can reuse and refine existing schema definitions.
- XML Schema also supports namespaces, which are identified by URIs. A namespace gives each element a unique, qualified name, which avoids the name conflicts that can arise, say, when two documents are merged and both have a “name” field with different meanings. For example, “name” can be a person’s name in one document but a spouse’s name in another. In short, namespaces help distinguish elements and attributes that share the same name.
In the following example, ‘id’ is the prefix and the namespace is ‘http://somesite.com/schema’:
<myElement xmlns:id='http://somesite.com/schema'>
  <id:name>myName</id:name>
</myElement>
After you declare a namespace, you can use its prefix on elements to identify them uniquely.
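Namespace-aware parsers track the URI behind each prefix, not the prefix itself. In Python's ElementTree, for example, the id:name element is stored under the expanded name {http://somesite.com/schema}name, and you can search for it with any prefix mapped to the same URI. The prefix s in this sketch is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

DOC = """<myElement xmlns:id='http://somesite.com/schema'>
  <id:name>myName</id:name>
</myElement>"""

root = ET.fromstring(DOC)
child = root[0]
print(child.tag)  # {http://somesite.com/schema}name

# Any prefix works in the lookup, as long as it maps to the same namespace URI.
namespaces = {"s": "http://somesite.com/schema"}
print(root.findtext("s:name", namespaces=namespaces))  # myName

# An unqualified lookup does not match, because the element is in a namespace.
print(root.find("name"))  # None
```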
- Last but not least, XML Schema is designed to be extensible, so it can incorporate more features in the future.
Finally, let us take a quick look at a comparison between DTD and XML Schema.
DTD and XML Schema: a Comparison
- XML Schema is a more expressive successor to DTD.
- XML Schema supports namespaces; DTD does not.
- XML Schema uses XML syntax, which is easy to understand; DTD uses a specialized syntax.
- XML Schema supports built-in data types as well as user-defined ones; DTD supports only textual types.
- XML Schema supports type inheritance; DTD does not provide any object-oriented features.
In this article, we covered the basics of XML to get you up to speed with XML concepts. This background of XML will help you when we study each of the technologies of Web Services in the coming weeks. Next, we will cover the advanced XML technologies such as ebXML and take a detailed look at how Web Services and XML go hand in hand.