Does StAX Belong in Your XML Toolbox?
Whenever you get a new tool in your toolbox, you are anxious to use it. As you followed along with the examples in my previous articles, "Does StAX Stack Up?" and "Delving Deeper Into StAX," you may have noticed that StAX (Streaming API for XML) is a very developer-friendly API. You might also be a little unsure about when to use StAX rather than the other XML processing tools in your toolbox, such as SAX (Simple API for XML), DOM (Document Object Model), or TrAX (Transformation API for XML).
Each tool, or API, in the JAXP (Java API for XML Processing) toolbox has its strengths and weaknesses for a given job. While there are no hard and fast rules about when to use a given XML processing API, I'll describe the characteristics of each JAXP API, a family that now includes StAX, and give guidelines for their use.
JAXP API Overview
The JAXP family of interrelated APIs includes SAX, StAX, DOM, and TrAX, as shown in Figure 1 below. Let's briefly review the characteristics of each API.
SAX is a "push" type of API that provides an event callback interface. There is only a reading API. SAX requires that an entire XML document be read. However, it doesn't require the entire document to be held in memory at any point in time. SAX is a very low-level, efficient API for parsing XML documents.
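To make the "push" style concrete, here is a minimal sketch of a SAX handler. The class name `SaxTitleLister` and the `catalog`/`book`/`title` vocabulary are assumptions made up for illustration; only the standard JAXP classes are real. The parser drives the processing and calls back into the handler for every event in the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the element names are illustrative only.
class SaxTitleLister {

    // Collects the text of every <title> element. The parser pushes events to the handler.
    static List<String> titles(String xml) {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>";
        System.out.println(titles(xml));
    }
}
```

Notice that the handler has to keep its own state (the `text` field) between callbacks; that bookkeeping is a large part of why SAX feels lower level.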
StAX is a "pull" type of API. As we discussed last time, it offers both a Cursor API and an Event Iterator API, and it has both reading and writing sides. It is more developer friendly than SAX. Like SAX, StAX doesn't require an entire document to be held in memory. Unlike SAX, however, it doesn't require the entire document to be read: the application can skip portions or simply stop pulling. This may give StAX even better performance than SAX.
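By way of contrast, the following sketch uses the StAX Cursor API to return only the first `title` in a document. The class name `StaxFirstTitle` and the element name are assumptions for illustration. Because the application controls the loop, it can return as soon as it has what it needs and never ask for the remaining events.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; the element name "title" is illustrative only.
class StaxFirstTitle {

    // Returns the text of the first <title> element, or null if there is none.
    static String firstTitle(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application pulls the next event when it is ready for it.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Returning here means no further events are requested.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>";
        System.out.println(firstTitle(xml));
    }
}
```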
DOM creates an in-memory tree representation of an XML document. It provides a very flexible API for creating, reading, updating, and deleting nodes within the tree. Implementations of DOM allow XPath expressions to be used to query the DOM, providing declarative access. Although there are optimized implementations of DOM, in general, an in-memory representation of the document is required; this can be costly in terms of performance.
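As a rough illustration of DOM's read and update capabilities, the sketch below parses a document into a tree, selects nodes with an XPath expression, and modifies them in place. The class name `DomFlagger`, the `flagged` attribute, and the sample vocabulary are assumptions invented for this example; the XPath support shown is the standard `javax.xml.xpath` package.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the "flagged" attribute is illustrative only.
class DomFlagger {

    // Builds the complete in-memory tree for the document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Sets flagged="true" on every element selected by the XPath expression and
    // returns how many were updated. Assumes the expression selects elements only.
    static int flagMatches(Document doc, String expression) {
        try {
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < matches.getLength(); i++) {
                ((Element) matches.item(i)).setAttribute("flagged", "true");
            }
            return matches.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<catalog><book><price>10</price></book>"
                + "<book><price>30</price></book></catalog>");
        System.out.println(flagMatches(doc, "//book[price > 20]"));
    }
}
```

The random access that makes this so convenient is exactly what requires the whole tree to be in memory.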
TrAX is an API for transforming source documents into result documents using XSLT (Extensible Stylesheet Language Transformations), a declarative, rule-based language. XPath is a foundational part of XSLT and is used to query nodes from the source document and apply styling templates to them to create a result document. A TrAX source document may be created via SAX or DOM. TrAX requires both Java and XSLT skills. Optimizing TrAX takes a little bit of work; one approach can be found in my previous article, "Optimizing XSLT with TrAX."
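A minimal TrAX sketch might look like the following. The class name `TraxDemo`, the stylesheet, and the `catalog`/`book`/`title` vocabulary are assumptions for illustration. The Java side only wires a source, a stylesheet, and a result together; the transformation rules themselves live in the XSLT.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; the stylesheet and vocabulary are illustrative only.
class TraxDemo {

    // Declarative rules: each <book> in the catalog becomes an <li> holding its title.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/catalog'>"
            + "<ul><xsl:apply-templates select='book'/></ul>"
            + "</xsl:template>"
            + "<xsl:template match='book'>"
            + "<li><xsl:value-of select='title'/></li>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given document and returns the result as text.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```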
Figure 1: The JAXP Family tree.
JAXP API Summary
The following table summarizes the characteristics of the JAXP APIs:
|JAXP API Property|StAX|SAX|DOM|TrAX|
|---|---|---|---|---|
|API Style|Pull events; streaming|Push events; streaming|In-memory tree based|XSLT rule-based templates|
|Ease of Use|High|Medium|High|Medium|
|CPU and Memory Utilization|Good|Good|Depends|Depends|
|Create, Read, Update, Delete (CRUD)|No|No|Yes|No|
In reviewing the characteristics of the APIs, you'll notice that StAX and SAX are both streaming, forward-only APIs. StAX scores higher on the ease-of-use scale because it uses a more familiar iterator style rather than an event callback style. It also has a writing API, which SAX lacks. It has the potential to be more efficient than SAX because it can easily read partial documents. Will StAX supplant SAX? It certainly has the potential to. However, SAX is a proven standby, while StAX is the new kid on the block. We'll have to wait and see.
Note that DOM and TrAX have very different characteristics from their sister APIs. They serve different purposes than StAX and SAX and complement them well.
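To show what the iterator style looks like in practice, here is a small sketch using the StAX Event Iterator API to count elements by name. The class name `StaxEventCounter` is an assumption for illustration. The loop reads much like iterating over any Java collection, with no callbacks and no handler state.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, for illustration only.
class StaxEventCounter {

    // Counts how many times each element name occurs, in order of first appearance.
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try {
            XMLEventReader events = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            try {
                while (events.hasNext()) {
                    XMLEvent event = events.nextEvent();
                    if (event.isStartElement()) {
                        String name = event.asStartElement().getName().getLocalPart();
                        counts.merge(name, 1, Integer::sum);
                    }
                }
            } finally {
                events.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/><c/></a>"));
    }
}
```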
XML Processing Use Cases
Let's look at some typical use cases for processing XML and see which tools in the JAXP toolbox work best for a given use case.
XML to XML Transformation
Often, it is necessary to transform a document from one XML schema to another. This may be to adapt a request or response format, or perhaps to generate presentation markup such as XHTML or WML from XML.
While this use case could be implemented with several of the JAXP APIs, it is what TrAX was designed for, and TrAX should be your first choice. As a declarative, rule-based language, XSLT is high-level and well suited to this use case. The use of XPath makes navigating the source document a breeze.
Because there are both reading and writing sides of StAX, it might be considered for this job as well. It could be more efficient than XSLT. However, it is a lower-level and programmatic API and could result in a lot of code. For simple structures, it could work well.
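For a simple structure, a StAX-based transformation can be sketched as below. The class name `StaxListWriter` and the `catalog`-to-`ul` mapping are assumptions for illustration. The reading side pulls events from the source while the writing side emits the result, so neither document is ever held in memory as a tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; the input and output vocabularies are illustrative only.
class StaxListWriter {

    // Streams through the source and writes one <li> per <title> inside a <ul>.
    static String titlesToList(String xml) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            XMLStreamWriter writer = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out);
            writer.writeStartElement("ul");
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    writer.writeStartElement("li");
                    writer.writeCharacters(reader.getElementText());
                    writer.writeEndElement(); // closes <li>
                }
            }
            writer.writeEndElement(); // closes <ul>
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>";
        System.out.println(titlesToList(xml));
    }
}
```

Even in this tiny case, the mapping rules are buried in Java control flow; with a richer source structure the amount of code grows quickly.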
Of course, DOM could do the job as well. As with StAX, it would be programmatic, but not as efficient. However, implementations of DOM can use XPath, which simplifies querying the source document during the transformation.
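A DOM-based version of the same kind of transformation might be sketched as follows. The class name `DomListBuilder` and the vocabularies are assumptions for illustration. XPath selects the interesting nodes from the source tree, a second tree is built programmatically, and an identity `Transformer` (one created without a stylesheet) serializes it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the input and output vocabularies are illustrative only.
class DomListBuilder {

    // Builds a <ul> result tree with one <li> per /catalog/book/title in the source.
    static String titlesToList(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(xml)));

            // XPath gives declarative access to the source tree.
            NodeList titles = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/catalog/book/title", source, XPathConstants.NODESET);

            // The result tree is assembled node by node.
            Document result = builder.newDocument();
            Element ul = result.createElement("ul");
            result.appendChild(ul);
            for (int i = 0; i < titles.getLength(); i++) {
                Element li = result.createElement("li");
                li.setTextContent(titles.item(i).getTextContent());
                ul.appendChild(li);
            }

            // A Transformer created without a stylesheet copies the source to the result.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(result), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>";
        System.out.println(titlesToList(xml));
    }
}
```

Both the source and the result exist as complete trees here, which is where the extra memory cost comes from.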