Introduction
Whenever you get a new tool in your toolbox, you are eager to use it. As you followed along with the examples in my previous articles, “Does StAX Stack Up?” and “Delving Deeper Into StAX,” you may have noticed that StAX (Streaming API for XML) is a very developer-friendly API. You also might be a little unsure when to use StAX rather than the other XML processing tools in your toolbox, such as SAX (Simple API for XML), DOM (Document Object Model), or TrAX (Transformation API for XML).
Each tool, or API, in the JAXP (Java API for XML Processing) toolbox has its strengths and weaknesses for a given job. While there are no hard and fast rules about when to use a given XML processing API, I’ll describe the characteristics of each JAXP API, a family that now includes StAX, and give guidelines for their use.
JAXP API Overview
The JAXP family of interrelated APIs includes SAX, StAX, DOM, and TrAX, as shown in Figure 1 below. Let’s briefly review the characteristics of each API.
SAX
SAX is a “push” type of API that provides an event callback interface. It has only a reading side. In normal use, SAX reads an entire XML document from start to finish; a handler can stop early only by throwing an exception. However, SAX doesn’t require the entire document to be held in memory at any point in time. SAX is a very low-level, efficient API for parsing XML documents.
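To make the “push” style concrete, here is a minimal sketch of a SAX handler. The class name `SaxTitleCollector`, the `title` element, and the sample document are assumptions made for illustration; only the standard JAXP classes are real. The parser drives the loop and calls the handler, so the handler has to keep its own state between callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the text of every <title> element.
class SaxTitleCollector {

    static List<String> titles(String xml) throws Exception {
        final List<String> result = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a <title> element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    result.add(text.toString());
                    text = null;
                }
            }
        };

        // The parser pushes events into the handler until the document ends.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(titles(xml));
    }
}
```

Note how the `text` field exists only to remember where the parser is; that bookkeeping is the price of the callback style.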
StAX
StAX is a “pull” type of API. As we discussed last time, it offers both a Cursor API and an Event Iterator API, and each has a reading side and a writing side. It is more developer friendly than SAX. Like SAX, StAX doesn’t require an entire document to be held in memory. However, unlike SAX, an entire document need not be read; the application can skip portions or simply stop asking for events. This may yield even better performance than SAX.
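The same kind of task looks different in the “pull” style. The following sketch uses the Cursor API; the class name `StaxFirstTitle` and the `title` element are assumptions for illustration. Because the application owns the loop, it can return as soon as it has what it needs, and the remainder of the document is never visited.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: returns the text of the first <title> element.
class StaxFirstTitle {

    static String firstTitle(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // The application pulls events one at a time.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Stop here; later events are never requested.
                    return reader.getElementText();
                }
            }
            return null; // no <title> element in the document
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(firstTitle(xml));
    }
}
```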
DOM
DOM creates an in-memory tree representation of an XML document. It provides a very flexible API for creating, reading, updating, and deleting nodes within the tree. Implementations of DOM allow XPath expressions to be used to query the tree, providing declarative access. Although there are optimized implementations of DOM, in general, an in-memory representation of the whole document is required; this can be costly in terms of memory and performance, especially for large documents.
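As a rough illustration of this flexibility, the sketch below parses a small document, selects nodes with XPath, and then updates, deletes, and creates nodes in place. The class name `DomCrudSketch`, the `books`/`book`/`price` vocabulary, and the `sale` attribute are all assumptions invented for the example; it relies on the `javax.xml.xpath` package that ships with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class showing read, update, delete, and create on a DOM tree.
class DomCrudSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Read: declarative node selection with XPath.
    static NodeList select(Document doc, String expression) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
    }

    static Document edit(String xml) throws Exception {
        Document doc = parse(xml);

        // Update: flag every book cheaper than 20.
        NodeList cheap = select(doc, "/books/book[price < 20]");
        for (int i = 0; i < cheap.getLength(); i++) {
            ((Element) cheap.item(i)).setAttribute("sale", "true");
        }

        // Delete: remove every book priced at 20 or more.
        NodeList pricey = select(doc, "/books/book[price >= 20]");
        for (int i = 0; i < pricey.getLength(); i++) {
            Node book = pricey.item(i);
            book.getParentNode().removeChild(book);
        }

        // Create: append a new book with only a title.
        Element book = doc.createElement("book");
        Element title = doc.createElement("title");
        title.setTextContent("C");
        book.appendChild(title);
        doc.getDocumentElement().appendChild(book);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = edit("<books>"
                + "<book><title>A</title><price>10</price></book>"
                + "<book><title>B</title><price>30</price></book>"
                + "</books>");
        System.out.println(select(doc, "/books/book").getLength() + " books remain");
    }
}
```

None of these operations is possible with a forward-only stream, which is why DOM keeps the whole tree in memory.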
TrAX
TrAX is an API for transforming source documents into result documents using XSLT (Extensible Stylesheet Language Transformations), a declarative, rule-based language. XPath is a foundational part of XSLT and is used to select nodes from the source document so that templates can be applied to them to create a result document. A TrAX source document may be created via SAX or DOM. TrAX requires both Java and XSLT skills. Optimizing TrAX takes a little bit of work; one approach can be found in my previous article, “Optimizing XSLT with TrAX.”
Figure 1: The JAXP Family tree.
JAXP API Summary
The following table summarizes the characteristics of the JAXP API:
JAXP API Property | StAX | SAX | DOM | TrAX |
API Style | Pull events; streaming | Push events; streaming | In memory tree based | XSLT Rule based templates |
Ease of Use | High | Medium | High | Medium |
XPath Capability | No | No | Yes | Yes |
CPU and Memory Utilization | Good | Good | Depends | Depends |
Forward Only | Yes | Yes | No | No |
Reading | Yes | Yes | Yes | Yes |
Writing | Yes | No | Yes | Yes |
Create, Read, Update, Delete (CRUD) | No | No | Yes | No |
In reviewing the characteristics of the APIs, you’ll notice that StAX and SAX are both streaming, forward-only APIs. StAX scores higher on ease of use because it uses a more familiar iterator style rather than an event callback style. It also has a writing API that SAX doesn’t have. It has the potential to be more efficient than SAX because it can easily read partial documents. Will StAX supplant SAX? It certainly has the potential to. However, SAX is a proven standby, while StAX is the new kid on the block. We’ll have to wait and see.
Note that DOM and TrAX have very different characteristics from their sister APIs. They serve different purposes than StAX and SAX and complement them well.
XML Processing Use Cases
Let’s look at some typical use cases for processing XML and see which tools in the JAXP toolbox work best for a given use case.
XML to XML Transformation
Often, it is necessary to transform a document from one XML schema to another. This may be to adapt a request or response format, or perhaps to generate presentation markup such as XHTML or WML from XML.
While this use case could be implemented with several of the JAXP APIs, this is what TrAX (Transformation API for XML) was designed for, and it should be your first choice. Being a declarative, rules-based language, XSLT is a high-level language optimized for this use case. The use of XPath makes navigating the source document a breeze.
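A minimal sketch of the TrAX approach follows. The stylesheet, the `books` vocabulary, and the class name `TraxSketch` are assumptions made up for illustration; a real stylesheet would normally live in its own file. The Java side is only a few lines, and the transformation rules themselves are expressed declaratively in XSLT.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: turns a <books> document into an XHTML-style list.
class TraxSketch {

    // Inline stylesheet for the sketch; each template is one transformation rule.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/books'><ul><xsl:apply-templates select='book'/></ul></xsl:template>"
        + "<xsl:template match='book'><li><xsl:value-of select='title'/></li></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        // Compile the stylesheet, then apply it to the source document.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(transform(xml));
    }
}
```

If the same stylesheet is applied many times, compiling it once into a `javax.xml.transform.Templates` object and reusing it is the usual way to avoid repeated compilation.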
Because there are both reading and writing sides of StAX, it might be considered for this job as well. It could be more efficient than XSLT. However, it is a lower-level and programmatic API and could result in a lot of code. For simple structures, it could work well.
Of course, DOM could do the job as well. As with StAX, it would be programmatic, but not as efficient. However, implementations of DOM can use XPath; this simplifies processing the input source document to the transformation.
Transform Arbitrary Data Structures to XML
Sometimes, it is useful to view arbitrary data structures as XML. In two of my previous articles, “Transforming Flat Files To XML with SAX and XSLT” and “Converting JDBC Result Sets to XML,” I provided detailed examples of this use case.
Both SAX and StAX have the ability to view arbitrary structures as a stream of XML events. With SAX, a class that implements the XMLReader interface is created. This class parses the non-XML structure and publishes events representing elements, attributes, and data to handler classes. With StAX, a class is written to parse the non-XML structure and manufacture events using the XMLEventFactory. Of course, DOM could also be used for this task, although it might not be as efficient.
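The StAX variant might look like the sketch below. The comma-separated input format, the `people`/`person`/`name`/`age` element names, and the class name `CsvToXmlEvents` are assumptions chosen for illustration. Here the manufactured events are sent to an XMLEventWriter, but they could equally be handed to any other consumer of XMLEvent objects.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical example class: presents "name,age" lines as a stream of XML events.
class CsvToXmlEvents {

    static String toXml(String csv) throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        writer.add(events.createStartDocument());
        writer.add(events.createStartElement("", "", "people"));
        for (String line : csv.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            writer.add(events.createStartElement("", "", "person"));
            addElement(writer, events, "name", fields[0].trim());
            addElement(writer, events, "age", fields[1].trim());
            writer.add(events.createEndElement("", "", "person"));
        }
        writer.add(events.createEndElement("", "", "people"));
        writer.add(events.createEndDocument());
        writer.flush();
        writer.close();
        return out.toString();
    }

    // Manufactures the three events that make up a simple text element.
    private static void addElement(XMLEventWriter writer, XMLEventFactory events,
                                   String name, String value) throws XMLStreamException {
        writer.add(events.createStartElement("", "", name));
        writer.add(events.createCharacters(value));
        writer.add(events.createEndElement("", "", name));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("Alice,30\nBob,25"));
    }
}
```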
Data Binding
Data binding is another type of transformation, where XML is transformed to and from objects. SAX can be used to unmarshal an XML document to objects. However, because there isn’t a writing side of the API, another approach is needed to marshal XML documents from objects. Because StAX has both a reading and a writing side, it is well suited to this purpose. In addition, a “pull” architecture for data binding based upon StAX can simplify code, reduce memory overhead, and improve performance.
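A hand-written binding built on the Cursor API could be sketched as follows. The `Person` class, its fields, and the `person`/`name`/`age` element names are assumptions invented for the example, and a real binding framework would generate or generalize this code. The point is that one API covers both directions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: binds a <person> document to and from a Person object.
class PersonBinding {

    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Unmarshal: pull events and copy element text into fields.
    static Person unmarshal(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        String name = null;
        int age = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                switch (reader.getLocalName()) {
                    case "name":
                        name = reader.getElementText();
                        break;
                    case "age":
                        age = Integer.parseInt(reader.getElementText().trim());
                        break;
                    default:
                        break; // ignore elements the binding does not know
                }
            }
        }
        reader.close();
        return new Person(name, age);
    }

    // Marshal: write the object's fields back out as elements.
    static String marshal(Person person) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("person");
        writer.writeStartElement("name");
        writer.writeCharacters(person.name);
        writer.writeEndElement();
        writer.writeStartElement("age");
        writer.writeCharacters(Integer.toString(person.age));
        writer.writeEndElement();
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Person person = unmarshal("<person><name>Ann</name><age>41</age></person>");
        System.out.println(marshal(person));
    }
}
```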
Domain Model
Rather than transform XML into objects, sometimes it is useful to retain the incoming XML document and use it as your domain model. For example, if you acquired the data from an XML-based service, and are going to transform it into XHTML, what is the sense in binding it to objects? For this use case, DOM is the only choice. It is the only JAXP API providing true CRUD (Create, Read, Update, Delete) and random access ability to an XML document.
XML Pipelines
Some complex XML processing problems can be decomposed into a series of XML to XML transformations. It would be inefficient for each step in the pipeline to re-parse the XML. Pipelines can be designed with an API to communicate between steps. StAX is the only JAXP API designed specifically to optimize this use case. SAX is often used in pipelines; however, it does not provide a writing side of the API.
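One way to picture such a pipeline is to chain XMLEventReader stages, as in the sketch below. The two stages (dropping comments, upper-casing text) and the class names `EventPipelineSketch` and `UpperCaseText` are assumptions chosen only to keep the example small. The document is parsed once; every later stage consumes the events of the stage before it, and only the final stage serializes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

// Hypothetical example class: a three-stage pipeline over a single parse.
class EventPipelineSketch {

    // Stage that rewrites character events and passes everything else through.
    static class UpperCaseText extends EventReaderDelegate {
        private final XMLEventFactory events = XMLEventFactory.newInstance();

        UpperCaseText(XMLEventReader upstream) {
            super(upstream);
        }

        @Override
        public XMLEvent nextEvent() throws XMLStreamException {
            XMLEvent event = super.nextEvent();
            if (event.isCharacters()) {
                return events.createCharacters(event.asCharacters().getData().toUpperCase());
            }
            return event;
        }
    }

    static String run(String xml) throws XMLStreamException {
        XMLInputFactory inputs = XMLInputFactory.newInstance();

        // Stage 1: the only step that actually parses text.
        XMLEventReader parsed = inputs.createXMLEventReader(new StringReader(xml));

        // Stage 2: drop comment events.
        EventFilter noComments = event -> event.getEventType() != XMLStreamConstants.COMMENT;
        XMLEventReader filtered = inputs.createFilteredReader(parsed, noComments);

        // Stage 3: transform text content.
        XMLEventReader shouted = new UpperCaseText(filtered);

        // Final step: serialize whatever events reach the end of the pipeline.
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        while (shouted.hasNext()) {
            writer.add(shouted.nextEvent());
        }
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><!-- note --><item>abc</item></doc>"));
    }
}
```

Because each stage is itself an XMLEventReader, stages can be added, removed, or reordered without touching the parsing or serialization code.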
Use Case Summary
The following table summarizes which JAXP API is suited to the use cases discussed.
USE CASE | StAX | SAX | DOM | TrAX |
XML to XML Transformation | Decent choice for relatively simple structures | N/A | Decent choice for moderately complex structures | Best overall choice |
Transform arbitrary data structures to XML | Good all-purpose choice | Good all-purpose choice | N/A | N/A |
Data Binding | Good all-purpose choice | Works only for unmarshaling XML | May be used in conjunction with other APIs for buffering extremely complex structures | N/A |
Domain Model | N/A | N/A | Best overall choice | N/A |
XML Pipelines | Best overall choice | Works for reading, but not writing steps in pipeline | N/A | N/A |
Caveat
StAX is the newest API in the JAXP family. We’ve studied it over the last three articles and established some initial guidelines for its use. But please keep in mind that experience will be the best teacher. As the API is used on more and more real-life projects, there will be additional lessons to share.
Summary
We studied the place of StAX alongside its sister APIs in the JAXP family. It is well suited to many common XML processing use cases. It just might become the favorite tool in your XML processing toolbox.
In the last three articles, we’ve laid the groundwork for you to begin using StAX effectively. Are you looking to leverage StAX for your XML processing needs? The rest is up to you!
About the Author
Jeff Ryan is an enterprise architect for Hartford Financial Services Group Inc. He has twenty years of experience designing, developing, and delivering automated solutions to business problems. His current focus is on Java, XML, and Service Oriented Architecture.
References
- Java Community Process. http://jcp.org, Streaming API for XML JSR-173. Specification Version 1.0, October 2003, p. 18
- Java Community Process. http://jcp.org, Streaming API for XML JSR-173. Specification Version 1.0, October 2003, p. 22