Data structures are a fundamental part of programming. A language without the ability to manipulate various data structures is useless. The Extensible Markup Language (XML) promises to change the way we (and computers) “view” data.
Through markup, XML brings contextual meaning and structure to data. With many vendors supporting the XML specification, we will soon find ourselves in a world filled with XML-based data. XML has the potential to become a universal standard for expressing data, and therefore it can also be viewed as an effective mechanism for exchanging data.
Today, if a language does not support ODBC or JDBC (for database connectivity), it loses its effectiveness for enterprise applications. Similarly, in an XML world, it becomes imperative for programming languages to manipulate and interact with XML documents. This article provides an overview of how such interaction would happen in the case of the Java programming language.
Why Java?
From the beginning, Java distinguished itself among programming languages by being platform-independent. Objects created in Java would run on any platform that supports Java. In a way, XML is platform-independent data. If you accept the argument that programs and data go hand-in-hand, then you can see why Java and XML are such a perfect fit. As you’ll see later, the fact that Java is object-oriented also helps in its interaction with XML documents, but Java is not unique in that sense.
Basics of XML/program integration
XML documents are text-based. Typically a document is composed of two parts. The first is the Document Type Definition (DTD), which specifies the rules and grammar for the XML document. The DTD allows you to specify what is and what is not a valid entry in the XML document. With that specification in hand, the parser can then validate the document. The second part of the XML document is the actual data. Tags are used to delimit data throughout a document. For each opening tag, there is a corresponding closing tag. Nesting of tags is also allowed. You can read the complete XML specification online at the World Wide Web Consortium.
XML transforms a given set of data in two distinct ways. First, it marks up the data with tags. Each piece of marked-up data is referred to as a node, so XML transforms a set of data into a set of nodes. Second, XML provides a hierarchical organization to the document and its nodes. This hierarchy is the structure that becomes the core of program interaction.
For a programming language to interact effectively with XML documents, both of the transformations mentioned above must be captured. That is, the program must have a way to interact with the various nodes (elements, attributes, text, etc.), and it must have a way to navigate the hierarchical data structure of an XML document.
Parser
A parser is a program that reads and interprets an XML document. The parser does the low-level interaction with a physical XML file or a stream of data representing an XML document. The parser exposes the elements within an XML document and the hierarchical structure of the document.
There are two types of parsers: validating and non-validating. A validating parser not only parses the XML document but also parses the DTD that belongs to it, and then “validates” the document against that DTD. A non-validating parser simply checks the XML document for “well-formedness.” A document that conforms to the XML syntax guidelines (which are not very complex) is considered well-formed.
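The difference between the two parser types can be sketched in a few lines. This example assumes the JAXP API (javax.xml.parsers) that ships with current JDKs; it is not the API of any particular parser listed in this article, and the sample documents, element names, and the ValidationSketch class are hypothetical. The same well-formed document is parsed twice, once with validation switched on and once with it off.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidationSketch {
    // Hypothetical document: the internal DTD says <book> holds exactly one <title>.
    static final String VALID =
        "<?xml version=\"1.0\"?>"
      + "<!DOCTYPE book [<!ELEMENT book (title)><!ELEMENT title (#PCDATA)>]>"
      + "<book><title>XML and Java</title></book>";

    // Well-formed (tags are balanced), but <author> is not permitted by the DTD.
    static final String INVALID =
        "<?xml version=\"1.0\"?>"
      + "<!DOCTYPE book [<!ELEMENT book (title)><!ELEMENT title (#PCDATA)>]>"
      + "<book><author>Nobody</author></book>";

    // Parses the text and returns the validity errors the parser reported.
    // A document that is not even well-formed makes parse() throw instead.
    static List<String> parse(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // true = check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid doc, validating:       " + parse(VALID, true).size() + " error(s)");
        System.out.println("invalid doc, validating:     " + parse(INVALID, true).size() + " error(s)");
        System.out.println("invalid doc, non-validating: " + parse(INVALID, false).size() + " error(s)");
    }
}
```

Only the validating run on the second document should report problems; the non-validating run accepts it because the text is still well-formed.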
Before your Java program can do anything with an XML document, you must first choose a parser. There are a number of parsers available. Here is a partial listing:
- XML Parser from IBM — This is a validating parser. It is one of the most complete parsers available.
- Project X from Sun Microsystems — This validating parser is in early release. It requires JDK 1.1.6 or later.
- XP by James Clark — This is a simple parser developed by James Clark.
- Lark and Larval from Textuality — XML spec co-editor Tim Bray’s non-validating and validating parsers are implemented in Java.
I have a parser. Now what?
Once you’ve picked a parser, you are ready to begin writing Java programs that interact with XML. There are two main approaches for interacting with an XML document: the Document Object Model (DOM) and the Simple API for XML (SAX). You should refer to your parser documentation for details of how your parser supports DOM and SAX.
DOM
Most validating parsers support DOM. The basic philosophy behind DOM is to take advantage of the hierarchical nature of an XML document. DOM uses a tree structure to represent the document hierarchy and provides a set of APIs for accessing and manipulating the content of the tree. Of course, the content of the tree is nothing but the XML nodes “parsed” from the XML document. These nodes could be elements (tags), attributes, text, comments, and so on. The root node represents the document, and everything else falls under that.
The obvious drawback of DOM is that the entire document must be parsed first, before your program can begin manipulating it. This may not be a problem for smaller documents, but with larger documents you run into memory management and efficiency issues.
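As a rough illustration of the tree model, the sketch below parses a small document into a DOM tree and then walks it recursively. It assumes the JAXP and org.w3c.dom classes bundled with current JDKs rather than any specific parser named earlier; the catalog document and the DomSketch class are made up for the example. Note that the whole tree exists in memory before the walk begins, which is exactly the cost described above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomSketch {
    // Hypothetical sample data.
    static final String XML =
        "<catalog><book id=\"b1\"><title>XML and Java</title></book>"
      + "<book id=\"b2\"><title>Parsers</title></book></catalog>";

    // Parses the whole document into a tree, then returns an indented outline of it.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        describe(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    // Writes one line for an element (with its attributes) or a non-blank text node,
    // then recurses into the node's children one level deeper.
    static void describe(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            indent(depth, out);
            out.append("element ").append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(" @").append(attr.getNodeName())
                   .append('=').append(attr.getNodeValue());
            }
            out.append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                indent(depth, out);
                out.append("text ").append(text).append('\n');
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            describe(children.item(i), depth + 1, out);
        }
    }

    static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(XML));
    }
}
```

The recursion mirrors the document hierarchy: each element line is followed by its children, indented one step further.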
SAX
Another approach to interacting with XML documents is based on an event model. As the parser goes through an XML document, it generates events, and the program can then handle those events via callbacks. For example, as each tag begins, an event could be generated. The program needs to respond only to events for the tags that it is interested in.
Although SAX is arguably at a lower level than DOM, it does provide a very effective mechanism to manipulate XML documents. Its event-driven approach will be familiar to programmers who have experience with callbacks and GUI development. Further, SAX does not suffer from the memory management issues that DOM faces when dealing with larger documents.
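The callback style described above might look like the following minimal sketch. It assumes the SAX classes (org.xml.sax) and the JAXP SAXParserFactory bundled with current JDKs; the catalog document, the TitleHandler class, and the choice to collect only title text are hypothetical. The handler reacts to events for one tag and lets every other event pass by, and no tree is ever built.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Hypothetical sample data.
    static final String XML =
        "<catalog><book id=\"b1\"><title>XML and Java</title></book>"
      + "<book id=\"b2\"><title>Parsers</title></book></catalog>";

    // Collects the text of every <title> element and ignores all other tags.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    // Streams through the document once, firing the handler's callbacks as it goes.
    static List<String> titles(String xml) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(XML)); // prints [XML and Java, Parsers]
    }
}
```

Because the handler keeps only the strings it cares about, memory use depends on what the program collects, not on the size of the document.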
Conclusion
XML is quickly becoming a popular standard for expressing data. In this article, we have discussed the basic themes for integrating XML data with a programming language such as Java. With a parser in hand, you can use either the DOM or SAX API to manipulate an XML document within your Java program. Many efforts are under way to expand the integration of XML and Java. Serialization of Java objects in terms of XML and using XML-RPC for exchanging objects/data are two examples. While it is certain that many more standards and APIs will be introduced, with a good understanding of DOM and SAX you should be able to incorporate the newer technologies into your enterprise applications.
Piroz Mohseni is a senior managing consultant with Automated Concepts Inc. (ACI) in New York City. He specializes in enterprise Java, XML and e-commerce applications.