Introduction to Web Services Part 3: Understanding XML
This article in the series may seem like a diversion from our focus area of Web services. Nonetheless, as we mentioned in Part 1, Web services use XML as their base language. Given that large-scale use of XML in Web services, it is useful to take a look at some basic and advanced concepts of XML. With this article, we aim to lay the foundation for a better understanding of what XML is all about.
About XML—a Primer
XML (eXtensible Markup Language) is a universally agreed markup meta-language primarily used for information exchange. A good example of a markup language is the Hyper Text Markup Language (HTML). The beauty of XML lies in the fact that it is extensible. Simply put, XML is a set of predefined rules (a syntactical framework) that you need to follow when structuring your data. For a long time, programmers and application vendors built applications and systems that processed data only the enterprise's own systems could interpret; essentially, data structured in a proprietary fashion. But as information exchange between applications and systems across enterprises became prevalent, it became very difficult to exchange data because the systems were never designed to accept data from external, unknown systems. XML provides a standard and common data structure for sharing data between disparate systems. Additionally, XML supports data validation: a document can be checked against a declared structure, such as a DTD, so the receiver can confirm that the structure of the data it receives is valid.
Let us take a look at how data is represented using XML:
<employee>
  <shift id="counter" time="8-12">
    <phone id="1">
      All phone information
      <number>3444333</number>
    </phone>
  </shift>
  <shift id="help_desk" time="1-5">
    <phone id="2">
      All phone information
      <number>332333</number>
    </phone>
  </shift>
  ...
  <home-address>
    <street>3434 Norwalk street</street>
    <city>New York</city>
    <state>NY</state>
  </home-address>
</employee>
This illustrates an employee who has more than one shift (for example, in the mornings he works at the counter and in the evenings at the help desk).
In the preceding example, we represent the personal information and shift data for an employee in an organization. Notice how XML uses the distinctive "<> </>" tags similar to the tags used in HTML? This is because XML is a markup language much like HTML. The two primary building blocks of XML used in the preceding example are elements and attributes.
Elements are tags, just like the ones used in HTML, and have values. Further, elements are structured as a tree: they are organized hierarchically, with a parent element and its child elements, and child elements can in turn have child elements of their own, and so on. In the preceding example, <employee> is the root element and has <shift> as its child element; further down, <phone> is the child of <shift>.
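To make the tree structure concrete, here is a minimal sketch that parses a trimmed copy of the employee document with Python's standard xml.etree.ElementTree module and prints one indented line per element. The helper name outline and the variable names are our own, chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the employee document from the article (the "..." is left out).
EMPLOYEE_XML = """
<employee>
  <shift id="counter" time="8-12">
    <phone id="1">All phone information<number>3444333</number></phone>
  </shift>
  <shift id="help_desk" time="1-5">
    <phone id="2">All phone information<number>332333</number></phone>
  </shift>
  <home-address>
    <street>3434 Norwalk street</street>
    <city>New York</city>
    <state>NY</state>
  </home-address>
</employee>
"""


def outline(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(EMPLOYEE_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The deeper the indentation in the printed outline, the further the element sits from the <employee> root.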
Elements have certain characteristics. Some of these characteristics are:
- Elements can contain data, such as the <number> element in the example.
- On the flip side, elements may carry no text data of their own, just attributes and child elements, such as the <shift> element.
- Alternatively, elements may have both attributes as well as data, and may also contain child elements, like the <phone> element in the example.
There are many more features and rules associated with elements, such as which names are valid for an element tag, the requirement that elements be properly nested, and so on.
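A parser enforces these rules for you. The sketch below, which assumes nothing beyond Python's standard library, feeds three small fragments of our own to the parser: one properly nested, one with crossed closing tags, and one whose tag name starts with a digit. The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


properly_nested = "<shift><phone><number>3444333</number></phone></shift>"
# <number> is closed after <phone>, so the tags overlap instead of nesting.
badly_nested = "<shift><phone><number>3444333</phone></number></shift>"
# Element names may not start with a digit.
invalid_name = "<1shift>counter</1shift>"

print(is_well_formed(properly_nested))
print(is_well_formed(badly_nested))
print(is_well_formed(invalid_name))
```

Only the first fragment is accepted; the other two are rejected before any application code sees the data.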
Attributes help you give more meaning to an element and describe it more clearly. In the preceding example, the <shift> element has an attribute "id" with the values "counter" and "help_desk". With such attributes, you can easily tell that an employee works at a counter or at a help desk. This helps make the data in the XML document self-describing. Always remember that the core purpose of attributes is to provide more information about the element, and that they should not be used to contain the data itself. Just as with elements, attributes have many rules associated with them.
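The split between attributes and element data shows up clearly when the document is read back. The following sketch, again using Python's standard xml.etree.ElementTree module on a trimmed copy of the example, reads the "id" and "time" attributes of each shift and pairs them with the phone number stored as element data. The schedule dictionary is our own illustrative structure.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the shift data from the article's example.
SHIFTS_XML = """
<employee>
  <shift id="counter" time="8-12">
    <phone id="1">All phone information<number>3444333</number></phone>
  </shift>
  <shift id="help_desk" time="1-5">
    <phone id="2">All phone information<number>332333</number></phone>
  </shift>
</employee>
"""

root = ET.fromstring(SHIFTS_XML)

# Attributes describe the element; the element content carries the data.
schedule = {}
for shift in root.findall("shift"):
    number = shift.findtext("phone/number")  # data held by the <number> element
    schedule[shift.get("id")] = (shift.get("time"), number)

print(schedule)
```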
Document Type Definition (DTD)
When you start coding in any programming language, you need to know the language specification. In the same way, a DTD is a specification that has to be followed when creating an XML document. And just as one of the tasks of a compiler is to check that the language specification was followed, there are parsers that check an XML document against its DTD to determine the document's validity.
A DTD helps you to define the structure of your XML document. It provides a strict framework and rules to be followed when creating XML documents. In addition, DTD can be used to check the validity and integrity of the data contained in an XML document. A few salient features of DTD are listed below:
- DTD is used to specify valid elements and attributes that can be used in the XML document.
- With a DTD, you can define a tree hierarchy of the elements.
- A DTD can also define the order in which a collection of child elements must appear in an XML document.
A DTD can be placed directly inside the XML source, or it can exist outside the XML document, with the document containing a link to that DTD.
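As a rough illustration of the internal form, the sketch below embeds a small DTD in the DOCTYPE of a simplified employee document and asks Python's standard xml.parsers.expat module to report the declarations it reads. Several things here are assumptions of ours: the simplified content of <shift>, the handler names, and the file name in the comment. Note also that expat is a non-validating parser, so it reports the declarations but does not enforce them.

```python
import xml.parsers.expat

# Internal DTD: the declarations sit inside the DOCTYPE of the document itself.
# An external DTD would instead be linked with a line such as
#   <!DOCTYPE employee SYSTEM "employee.dtd">   (hypothetical file name)
DOCUMENT = """<!DOCTYPE employee [
  <!ELEMENT employee (shift+, home-address, hobbies*)>
  <!ELEMENT shift (#PCDATA)>
  <!ELEMENT home-address (#PCDATA)>
  <!ELEMENT hobbies (#PCDATA)>
  <!ATTLIST shift id CDATA #REQUIRED>
]>
<employee>
  <shift id="counter">8-12</shift>
  <home-address>New York</home-address>
</employee>
"""

declared_elements = []
declared_attributes = []


def on_element_decl(name, model):
    # Called once per <!ELEMENT ...> declaration.
    declared_elements.append(name)


def on_attlist_decl(element, attribute, attr_type, default, required):
    # Called once per attribute declared in an <!ATTLIST ...>.
    declared_attributes.append((element, attribute, attr_type, bool(required)))


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOCUMENT, True)

print(declared_elements)
print(declared_attributes)
```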
Basically, DTD consists of these items:
|DTD Element|Metadata about an element. It specifies what kind of data the element will have, the number of occurrences of each element, the relationships between elements, and so on.|
|DTD Attributes|Specify which attributes an element can carry, along with their types and whether they are required.|
|DTD Entities|Used to reference an external file or to provide shortcuts to common text.|
Example: <!ELEMENT employee (shift+, home-address, hobbies*)>
An employee element must contain one or more shift elements, followed by exactly one home-address element, followed by zero or more hobbies elements, in that order.
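The content model reads much like a regular expression, and that analogy can be sketched in a few lines. The code below is not DTD validation (Python's standard library does not include a validating parser); it is a hand-written check, under our own assumptions, that rewrites (shift+, home-address, hobbies*) as a regular expression over the child tag names. The names CONTENT_MODEL and matches_employee_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (shift+, home-address, hobbies*) rewritten as a regular expression over
# a list of child tag names, each followed by a single space.
CONTENT_MODEL = re.compile(r"(shift )+(home-address )(hobbies )*")


def matches_employee_model(xml_text):
    """Check the order and count of the direct children of <employee>."""
    root = ET.fromstring(xml_text)
    if root.tag != "employee":
        return False
    child_tags = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(child_tags) is not None


valid = "<employee><shift/><shift/><home-address/></employee>"
with_hobbies = "<employee><shift/><home-address/><hobbies/><hobbies/></employee>"
no_shift = "<employee><home-address/><hobbies/></employee>"
wrong_order = "<employee><home-address/><shift/></employee>"

for sample in (valid, with_hobbies, no_shift, wrong_order):
    print(matches_employee_model(sample))
```

A real validating parser performs this kind of check for every declared element, not just the root.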
Example: <!ATTLIST shift id CDATA #REQUIRED>
Every shift element must have an id attribute.
In short, a DTD defines a document structure by specifying the details of all the elements that are to be used, and hence it can be used to check the validity of any XML document that is supposed to follow the rules laid down by that DTD.