Does StAX Belong in Your XML Toolbox?

Introduction

Whenever you get a new tool in your toolbox, you are eager to use it. As you followed along with the examples in my previous articles, “Does StAX Stack Up?” and “Delving Deeper Into StAX,” you may have noticed that StAX (Streaming API for XML) is a very developer-friendly API. You also might be a little confused about when to use StAX versus the other XML processing tools in your toolbox, such as SAX (Simple API for XML), DOM (Document Object Model), or TrAX (Transformation API for XML).

Each tool, or API, in the JAXP (Java API for XML Processing) toolbox has its strengths and weaknesses for a given job. While there are no hard and fast rules about when to use a given XML processing API, I’ll talk about the characteristics of all the JAXP APIs, which now include StAX, and give guidelines for their use.

JAXP API Overview

The JAXP family of interrelated APIs includes SAX, StAX, DOM, and TrAX, as shown in Figure 1 below. Let’s briefly review the characteristics of each API.

SAX

SAX is a “push” type of API that provides an event callback interface. There is only a reading API. SAX requires that an entire XML document be read. However, it doesn’t require the entire document to be held in memory at any point in time. SAX is a very low-level, efficient API for parsing XML documents.
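To make the push model concrete, here is a minimal sketch using the standard SAX classes. The class name SaxPushExample and the sample document with title elements are assumptions made up for illustration. Notice that the parser drives the loop and calls the handler back for each event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the element name "title" is an assumption.
final class SaxPushExample {

    /** Returns the text of every <title> element, gathered from parser callbacks. */
    static List<String> titles(String xml) throws Exception {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // may be called more than once per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        // The parser pushes events to the handler until the whole document is read.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(
                "<books><book><title>A</title></book><book><title>B</title></book></books>"));
    }
}
```

The handler has to keep its own state (the text buffer) between callbacks, which is part of why SAX code tends to feel lower level.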

StAX

StAX is a “pull” type of API. As we discussed last time, it offers both a Cursor API and an Event Iterator API. There are both reading and writing sides of the API. It is more developer friendly than SAX. Like SAX, StAX doesn’t require an entire document to be held in memory. However, unlike SAX, an entire document need not be read. Portions can be skipped. This may result in even better performance than SAX.
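For comparison, here is a rough sketch of the pull model with the Cursor API. The class name StaxPullExample and the title element are assumptions for illustration. The application asks for the next event itself, so it can simply stop once it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; the element name "title" is an assumption.
final class StaxPullExample {

    /** Returns the text of the first <title> element, or null if there is none. */
    static String firstTitle(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application pulls each event; nothing is pushed at it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Returning here leaves the rest of the document unread.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstTitle(
                "<books><book><title>A</title></book><book><title>B</title></book></books>"));
    }
}
```

Because the loop belongs to the caller, no handler state is needed, and the second book in the sample is never examined.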

DOM

DOM creates an in-memory tree representation of an XML document. It provides a very flexible API for creating, reading, updating, and deleting nodes within the tree. Implementations of DOM allow XPath expressions to be used to query the DOM, providing declarative access. Although there are optimized implementations of DOM, in general, an in-memory representation of the document is required; this can be costly in terms of performance.
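As a small sketch of declarative access, the following parses a document into a DOM tree and queries it with the XPath API that ships with JAXP. The class name DomXPathExample and the sample catalog are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration.
final class DomXPathExample {

    /** Builds the whole tree in memory, then evaluates an XPath expression against it. */
    static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Returns the string value of the first node the expression selects.
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id='1'><title>A</title></book>"
                + "<book id='2'><title>B</title></book></books>";
        System.out.println(evaluate(xml, "/books/book[@id='2']/title"));
    }
}
```

The query can jump straight to the second book because the entire tree is available, which is exactly the memory trade-off described above.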

TrAX

TrAX is an API for transforming source documents into result documents using XSLT (Extensible Stylesheet Language Transformations), a declarative, rule-based language. XPath is a foundational part of XSLT and is used to query nodes from the source document and apply styling templates to them to create a result document. A TrAX source document may be created via SAX or DOM. TrAX requires both Java and XSLT skills. Optimizing TrAX takes a little bit of work; one approach can be found in my previous article, “Optimizing XSLT with TrAX.”
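A minimal TrAX sketch might look like the following. The class name TraxExample, the sample stylesheet, and the book catalog it transforms are assumptions for illustration; only the javax.xml.transform calls are standard.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; the stylesheet below is an illustrative assumption.
final class TraxExample {

    /** Turns each book title into a list item using two template rules. */
    static final String TITLES_TO_LIST =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='/books'>"
          + "<ul><xsl:apply-templates select='book/title'/></ul>"
          + "</xsl:template>"
          + "<xsl:template match='title'>"
          + "<li><xsl:value-of select='.'/></li>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given source document. */
    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
                "<books><book><title>A</title></book><book><title>B</title></book></books>",
                TITLES_TO_LIST));
    }
}
```

The Java side stays tiny; the real logic lives in the XSLT rules, which is why both skill sets are needed.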


Figure 1: The JAXP Family tree.

JAXP API Summary

The following table summarizes the characteristics of the JAXP API:

| JAXP API Property | StAX | SAX | DOM | TrAX |
|---|---|---|---|---|
| API Style | Pull events; streaming | Push events; streaming | In-memory tree based | XSLT rule-based templates |
| Ease of Use | High | Medium | High | Medium |
| XPath Capability | No | No | Yes | Yes |
| CPU and Memory Utilization | Good | Good | Depends | Depends |
| Forward Only | Yes | Yes | No | No |
| Reading | Yes | Yes | Yes | Yes |
| Writing | Yes | No | Yes | Yes |
| Create, Read, Update, Delete (CRUD) | No | No | Yes | No |

In reviewing the characteristics of the APIs, you’ll notice that StAX and SAX are both streaming, forward-only APIs. StAX scores higher on ease of use because it uses a more familiar iterator style rather than event callbacks. It also has a writing API, which SAX lacks. It has the potential to be more efficient than SAX because it can easily read only part of a document. Will StAX supplant SAX? It certainly has the potential to. However, SAX is a proven standby, while StAX is the new kid on the block. We’ll have to wait and see.

Note that DOM and TrAX have very different characteristics from their sister APIs. They serve different purposes than StAX and SAX and complement them well.

XML Processing Use Cases

Let’s look at some typical use cases for processing XML and see which tools in the JAXP toolbox work best for a given use case.

XML to XML Transformation

Often, it is necessary to transform XML from one schema to another. This may be to adapt a request or response format, or perhaps to generate presentation markup such as XHTML or WML from XML.

While this use case could be implemented with several of the JAXP APIs, this is what TrAX (Transformation API for XML) was designed for, and it should be your first choice. Being a declarative, rules-based language, XSLT is a high-level language, optimized for this use case. The use of XPath makes navigating the source document a breeze.

Because there are both reading and writing sides of StAX, it might be considered for this job as well. It could be more efficient than XSLT. However, it is a lower-level and programmatic API and could result in a lot of code. For simple structures, it could work well.

Of course, DOM could do the job as well. As with StAX, it would be programmatic, but not as efficient. However, implementations of DOM can use XPath, which simplifies navigating the source document during the transformation.

Transform Arbitrary Data Structures to XML

Sometimes, it is useful to view arbitrary data structures as XML. In two of my previous articles, “Transforming Flat Files To XML with SAX and XSLT” and “Converting JDBC Result Sets to XML,” I provided detailed examples of this use case.

Both SAX and StAX have the ability to view arbitrary structures as a stream of XML events. With SAX, a class that implements the XMLReader interface is created. This class parses the non-XML structure and publishes events representing elements, attributes, and data to handler classes. With StAX, a class is written to parse the non-XML structure and manufacture events using the XMLEventFactory. Of course, DOM could also be used for this task, although it might not be as efficient.
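Here is a rough sketch of the StAX approach, assuming a hypothetical flat file in which each line holds an id and a name separated by a comma. The class name FlatFileToXmlExample and the people, person, id, and name element names are made up for illustration. Events are manufactured with XMLEventFactory and handed to an XMLEventWriter.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical example class; the "id,name" line format is an assumption.
final class FlatFileToXmlExample {

    /** Converts lines of the form "id,name" into a <people> document. */
    static String toXml(String flatFile) throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        writer.add(events.createStartElement("", "", "people"));
        for (String line : flatFile.split("\n")) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            writer.add(events.createStartElement("", "", "person"));
            addElement(writer, events, "id", fields[0].trim());
            addElement(writer, events, "name", fields[1].trim());
            writer.add(events.createEndElement("", "", "person"));
        }
        writer.add(events.createEndElement("", "", "people"));
        writer.flush();
        writer.close();
        return out.toString();
    }

    /** Emits the start, characters, and end events for one simple element. */
    private static void addElement(XMLEventWriter writer, XMLEventFactory events,
                                   String name, String text) throws XMLStreamException {
        writer.add(events.createStartElement("", "", name));
        writer.add(events.createCharacters(text));
        writer.add(events.createEndElement("", "", name));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("1,Ada\n2,Grace"));
    }
}
```

The same events could just as easily be fed to another consumer instead of a writer, since nothing in the loop depends on where they end up.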

Data Binding

Data binding is another type of transformation where the XML is transformed to and from objects. SAX can be used to unmarshal an XML document to objects. However, because there isn’t a writing side of the API, another approach is needed to marshal XML documents from objects. Because StAX has both a reading and writing side of the API, it is well suited to this purpose. In addition, a “pull” architecture for data binding based upon StAX can simplify code, reduce memory overhead, and improve performance.
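A hand-rolled sketch of both directions might look like this. The Book class and the books/book document shape are assumptions for illustration; a real binding framework would generate or generalize this kind of code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; assumes documents like <books><book id="1">Title</book></books>.
final class StaxBindingExample {

    /** Simple domain object bound to a <book> element. */
    static final class Book {
        final String id;
        final String title;

        Book(String id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    /** Unmarshals: pulls events and builds one Book per <book> element. */
    static List<Book> unmarshal(String xml) throws Exception {
        List<Book> books = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "book".equals(reader.getLocalName())) {
                String id = reader.getAttributeValue(null, "id"); // read before moving on
                books.add(new Book(id, reader.getElementText()));
            }
        }
        reader.close();
        return books;
    }

    /** Marshals: writes the objects back out with the writing side of the API. */
    static String marshal(List<Book> books) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("books");
        for (Book book : books) {
            writer.writeStartElement("book");
            writer.writeAttribute("id", book.id);
            writer.writeCharacters(book.title);
            writer.writeEndElement();
        }
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<Book> books = unmarshal("<books><book id='1'>A</book><book id='2'>B</book></books>");
        System.out.println(marshal(books));
    }
}
```

Only one Book is being assembled at any moment, so memory use depends on what the application keeps, not on the size of the document.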

Domain Model

Rather than transform XML into objects, sometimes it is useful to retain the incoming XML document and use it as your domain model. For example, if you acquired the data from an XML-based service, and are going to transform it into XHTML, what is the sense in binding it to objects? For this use case, DOM is the only choice. It is the only JAXP API providing true CRUD (Create, Read, Update, Delete) and random access ability to an XML document.
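To illustrate what CRUD on the document itself looks like, here is a small sketch using the standard DOM interfaces. The class name DomCrudExample is hypothetical, and the code assumes the book elements are direct children of the root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; assumes <book> elements sit directly under the root.
final class DomCrudExample {

    /** Reads, updates, deletes, and creates nodes directly in the parsed document. */
    static Document edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        NodeList books = root.getElementsByTagName("book"); // live view of the tree

        // Read and update: revise the text of the first book.
        books.item(0).setTextContent(books.item(0).getTextContent() + " (revised)");

        // Delete: remove the last book currently in the tree.
        root.removeChild(books.item(books.getLength() - 1));

        // Create: append a new book.
        Element added = doc.createElement("book");
        added.setTextContent("New");
        root.appendChild(added);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        NodeList books = edit("<books><book>A</book><book>B</book></books>")
                .getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            System.out.println(books.item(i).getTextContent());
        }
    }
}
```

Nodes are visited and changed in any order here, which is the random access that the streaming APIs cannot offer.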

XML Pipelines

Some complex XML processing problems can be decomposed into a series of XML to XML transformations. It would be inefficient for each step in the pipeline to re-parse the XML. Pipelines can be designed with an API to communicate between steps. StAX is the only JAXP API designed specifically to optimize this use case. SAX is often used in pipelines; however, it does not provide a writing side of the API.
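One way to picture such a pipeline is a chain of event readers, each wrapping the one before it, with a writer at the end. The sketch below is only an illustration under assumed requirements: the class name StaxPipelineExample and the two steps (drop comments, then upper-case character data) are made up. The document is parsed once, and each step sees events rather than text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

// Hypothetical example class; the two pipeline steps are illustrative assumptions.
final class StaxPipelineExample {

    static String run(String xml) throws XMLStreamException {
        XMLInputFactory inputs = XMLInputFactory.newInstance();
        final XMLEventFactory events = XMLEventFactory.newInstance();

        // Source: the only place where text is actually parsed.
        XMLEventReader parsed = inputs.createXMLEventReader(new StringReader(xml));

        // Step 1: filter out comment events.
        XMLEventReader withoutComments = inputs.createFilteredReader(
                parsed, event -> event.getEventType() != XMLStreamConstants.COMMENT);

        // Step 2: replace character data with an upper-cased copy.
        XMLEventReader upperCased = new EventReaderDelegate(withoutComments) {
            @Override
            public XMLEvent nextEvent() throws XMLStreamException {
                XMLEvent event = super.nextEvent();
                return event.isCharacters()
                        ? events.createCharacters(event.asCharacters().getData().toUpperCase())
                        : event;
            }
        };

        // Sink: the writer pulls every event through the chain.
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        writer.add(upperCased);
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<list><!-- draft --><item>a</item></list>"));
    }
}
```

Steps can be added or reordered by changing how the readers are wrapped, without touching the parsing or writing code.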

Use Case Summary

The following table summarizes which JAXP API is suited to the use cases discussed.

| Use Case | StAX | SAX | DOM | TrAX |
|---|---|---|---|---|
| XML to XML Transformation | Decent choice for relatively simple structures | N/A | Decent choice for moderately complex structures | Best overall choice |
| Transform arbitrary data structures to XML | Good all-purpose choice | Good all-purpose choice | N/A | N/A |
| Data Binding | Good all-purpose choice | Works only for unmarshaling XML | May be used in conjunction with other APIs for buffering extremely complex structures | N/A |
| Domain Model | N/A | N/A | Best overall choice | N/A |
| XML Pipelines | Best overall choice | Works for reading, but not writing steps in pipeline | N/A | N/A |

Caveat

StAX is the newest API in the JAXP family. We’ve studied it over the last three articles and established some initial guidelines for its use. But, please keep in mind that experience will be the best teacher. As the API is used on more and more real-life projects, there will be additional learning to share.

Summary

We studied the place of StAX alongside its sister APIs in the JAXP family. It is well suited to many common XML processing use cases. It just might become the favorite tool in your XML processing toolbox.

In the last three articles, we’ve laid the groundwork for you to begin using StAX effectively. Are you looking to leverage StAX for your XML processing needs? The rest is up to you!

About the Author

Jeff Ryan is an enterprise architect for Hartford Financial Services Group Inc. He has twenty years of experience designing, developing, and delivering automated solutions to business problems. His current focus is on Java, XML, and Service Oriented Architecture.
