Introduction
In my previous article, Does StAX Stack Up?, I introduced StAX (Streaming API for XML), the new parsing API in the JAXP (Java API for XML Processing) family. We talked about how StAX fits in with its sister APIs, downloaded the reference implementation, and developed some simple examples of using the StAX cursor-based API for reading and writing documents. In this article, we'll dive deeper into StAX and introduce the more advanced event iterator API.
StAX API Overview
StAX is one option to weigh against DOM (Document Object Model), SAX (Simple API for XML), and TrAX (Transformation API for XML), but StAX itself also offers a choice: it has both a cursor API and an event iterator API. Each of these APIs has a reading side and a writing side. This is depicted in the following diagram:
Cursor API Recap
In the previous article, we built the equivalent of “Hello world!” examples using the read and write sides of the cursor API. Conceptually, the cursor-based API moves a virtual cursor over the XML document.
On the reading side, an instance of an XMLStreamReader is obtained from the XMLInputFactory. The reader exposes hasNext() and next() methods that are used to move through the document in a forward-only manner. Accessor methods such as getLocalName() and getText() read information about the current event, which may be a start or end element, character data, or another construct. Interrogator methods such as isStartElement() tell you which kind of event the cursor is positioned on. Here is a sample code snippet for reference:
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(new FileReader("test.xml"));
while (reader.hasNext()) {
    if (reader.isStartElement()) {
        // getText() is only valid on text events; element names come from getLocalName()
        System.out.println(reader.getLocalName());
    }
    reader.next();
}
reader.close();
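To experiment without a test.xml file on disk, the same cursor loop can be run against an in-memory string. The sketch below is a minimal example assuming Java 7 or later; the class CursorReaderSketch and its describe() method are hypothetical names chosen for illustration. It branches on the int returned by next() instead of calling isStartElement(), and it shows getLocalName() used on START_ELEMENT events while getText() is reserved for text events such as CHARACTERS.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CursorReaderSketch {

    // Walks the document with the cursor API and returns the local name of every
    // start element plus any non-whitespace text, in document order.
    public static List<String> describe(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int type = reader.next(); // advance the cursor; returns the new event type
                    if (type == XMLStreamConstants.START_ELEMENT) {
                        // element names come from getLocalName(), not getText()
                        out.add("element:" + reader.getLocalName());
                    } else if (type == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                        out.add("text:" + reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // hypothetical choice: surface parse errors as an unchecked exception
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("<Person><Name>Jeff</Name></Person>"));
    }
}
```

Because the cursor starts on the START_DOCUMENT event, calling next() at the top of the loop moves straight to the first element.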
On the writing side of the API, an interface is exposed to write elements, attributes, and data. An instance of an XMLStreamWriter is obtained from the XMLOutputFactory. Once again, a simple example is provided for reference:
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(new FileWriter("test.xml"));
writer.writeStartDocument();
writer.writeStartElement("Name");
writer.writeCharacters("Jeff");
writer.writeEndElement(); // takes no arguments; closes the most recently opened element
writer.writeEndDocument();
writer.flush();
writer.close();
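The next sketch writes to a StringWriter so the result can be inspected directly. It assumes the StAX implementation bundled with the JDK; CursorWriterSketch and writePerson() are hypothetical names. It adds an attribute, which must be written immediately after writeStartElement() while the start tag is still open, and it relies on the writer tracking open elements so that writeEndElement() needs no arguments.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class CursorWriterSketch {

    // Writes <Person id="..."><Name>...</Name></Person> with the cursor API
    // and returns the serialized document as a String.
    public static String writePerson(String id, String name) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("Person");
            writer.writeAttribute("id", id);  // must follow writeStartElement while the tag is open
            writer.writeStartElement("Name");
            writer.writeCharacters(name);     // the writer escapes markup characters such as &
            writer.writeEndElement();         // closes <Name>
            writer.writeEndElement();         // closes <Person>
            writer.writeEndDocument();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writePerson("42", "Jeff"));
    }
}
```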
Event Iterator API Overview
The event iterator API also has reading and writing sides. As with the cursor API, readers and writers are obtained from the input and output factories. However, there is an additional factory, XMLEventFactory, used to manufacture events.
On the reading side, an XMLEventReader exposes a hasNext() method for moving through the document, and its nextEvent() method returns the next event as an XMLEvent object. XMLEvent has various subinterfaces, such as StartElement, Characters, and EndElement, each with its own accessor and interrogator methods.
Here is a sample code snippet for reference:
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader reader = factory.createXMLEventReader(new FileReader("test.xml"));
while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    switch (event.getEventType()) {
    case XMLEvent.START_ELEMENT:
        StartElement se = event.asStartElement();
        System.out.println(se.getName());
        break;
    }
}
reader.close();
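Here is a self-contained variation of the event reader loop that reads from a string rather than a file. It is a minimal sketch in which EventReaderSketch and trace() are hypothetical names; it uses interrogator methods such as isStartElement() together with the asXxx() converters instead of a switch on getEventType(), and either style works with the event API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EventReaderSketch {

    // Returns "start:X", "text:Y" and "end:X" entries for the events of interest,
    // ignoring whitespace-only text and document-level events.
    public static List<String> trace(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()) {
                        // asStartElement() returns the typed view without an explicit cast
                        out.add("start:" + event.asStartElement().getName().getLocalPart());
                    } else if (event.isCharacters() && !event.asCharacters().isWhiteSpace()) {
                        out.add("text:" + event.asCharacters().getData());
                    } else if (event.isEndElement()) {
                        out.add("end:" + event.asEndElement().getName().getLocalPart());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace("<Name>Jeff</Name>"));
    }
}
```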
On the writing side of the API, the XMLEventFactory is used to manufacture events that can be added to the output stream by the XMLEventWriter. Once again, a code snippet is provided for reference:
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLEventWriter writer = outputFactory.createXMLEventWriter(new FileWriter("test.xml"));
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
StartDocument sd = eventFactory.createStartDocument();
// arguments are prefix, namespace URI, and local name
StartElement se = eventFactory.createStartElement("", "", "Name");
Characters c = eventFactory.createCharacters("Jeff");
EndElement ee = eventFactory.createEndElement("", "", "Name");
EndDocument ed = eventFactory.createEndDocument();
writer.add(sd);
writer.add(se);
writer.add(c);
writer.add(ee);
writer.add(ed);
writer.flush();
writer.close();
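Because events are ordinary objects, an event can be created once and added to the stream repeatedly. The sketch below assumes Java 7 or later and uses the hypothetical class EventWriterSketch; it reuses a single StartElement and EndElement pair for every Name element and writes to a StringWriter, calling flush() before reading the output so that any buffered data has reached the StringWriter.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;

public class EventWriterSketch {

    // Writes <Names><Name>...</Name>...</Names>, one Name element per entry.
    public static String writeNames(List<String> names) {
        XMLEventFactory events = XMLEventFactory.newInstance();
        // Created once and added once per name: the writer only reads from the event.
        StartElement nameStart = events.createStartElement("", "", "Name");
        EndElement nameEnd = events.createEndElement("", "", "Name");
        StringWriter out = new StringWriter();
        try {
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            writer.add(events.createStartDocument());
            writer.add(events.createStartElement("", "", "Names"));
            for (String name : names) {
                writer.add(nameStart);
                writer.add(events.createCharacters(name));
                writer.add(nameEnd);
            }
            writer.add(events.createEndElement("", "", "Names"));
            writer.add(events.createEndDocument());
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeNames(Arrays.asList("Jeff", "Ann")));
    }
}
```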
SimpleXmlEventReader
Let’s build a full example to showcase some of the capabilities of the event iterator API. We’ll begin with imports from the new stream packages:
package com.developer.stax;
import java.io.*;
import java.util.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
public class SimpleXmlEventReader {
We'll create an instance of the XMLInputFactory and ask it for an XMLEventReader. So far, this isn't much different from the cursor API, other than the type of reader created.
public static void main(String[] args) throws FileNotFoundException, XMLStreamException {
String filename = args[0];
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader reader = factory.createXMLEventReader(new FileReader(filename));
Now we start to see some of the differences in the event iterator API. The cursor API would use next() to move the cursor to the next parsing event, such as a start element or character data. With the event iterator API, we use nextEvent() to get a handle to the next event object. A very handy feature is the peek() method, which returns the next event without consuming it, so we can look ahead before deciding what to do with the current event.
while (reader.hasNext())
{
XMLEvent event = reader.nextEvent();
XMLEvent nextEvent = reader.peek();
We can determine the type of an event either with a switch over the event type or with interrogator methods such as isStartElement(); this is true of both the cursor and the event APIs. However, the event API also has asXxx() methods, such as asStartElement(), that return the event as the appropriate subtype without an explicit cast. In the following code, StartElement, Characters, and EndElement events are handled. Notice how the nextEvent variable obtained via peek() is used to determine whether the StartElement is followed by character data.
switch (event.getEventType()) {
case XMLEvent.START_ELEMENT:
    StartElement se = event.asStartElement();
    System.out.print("<" + se.getName());
    Iterator<?> attributes = se.getAttributes();
    while (attributes.hasNext()) {
        Attribute attr = (Attribute) attributes.next();
        System.out.print(" " + attr.getName() + "=\"" + attr.getValue() + "\"");
    }
    System.out.print(">");
    // peek() showed character data next: consume it, printing it unless it is only whitespace
    if (nextEvent.isCharacters()) {
        Characters c = reader.nextEvent().asCharacters();
        if (!c.isWhiteSpace())
            System.out.print(c.getData());
    }
    break; // prevents falling through to the END_ELEMENT case
case XMLEvent.END_ELEMENT:
    EndElement ee = event.asEndElement();
    System.out.print("</" + ee.getName() + ">");
    break;
}
}
reader.close();
}
}
As you may have guessed from the code, SimpleXmlEventReader echoes the input document to System.out: its elements, its attributes, and any non-whitespace text that immediately follows a start tag.
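The look-ahead that peek() provides can be isolated in a smaller example. In this sketch, where PeekSketch and elementTexts() are hypothetical names, each StartElement is paired with its text only if peek() shows that character data comes next; otherwise the upcoming event, such as a child element, stays in the stream for the next loop iteration. The IS_COALESCING property is set so that a run of text arrives as a single Characters event, an assumption the simple one-event check relies on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class PeekSketch {

    // Returns "name=text" for every element; text is empty when the element
    // is not immediately followed by character data.
    public static List<String> elementTexts(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent character data so each text node arrives as one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (!event.isStartElement()) {
                        continue;
                    }
                    String name = event.asStartElement().getName().getLocalPart();
                    // peek() does not consume: a child StartElement is still there next time round.
                    XMLEvent upcoming = reader.peek();
                    String text = "";
                    if (upcoming != null && upcoming.isCharacters()) {
                        text = reader.nextEvent().asCharacters().getData().trim();
                    }
                    out.add(name + "=" + text);
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(elementTexts("<Person><Name>Jeff</Name><Nick/></Person>"));
    }
}
```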
SimpleXmlEventWriter
Now, let’s build a full example to showcase the event writing API. We begin with imports from the new streaming packages.
package com.developer.stax;
import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
public class SimpleXmlEventWriter {
Notice that the main method below expects two command-line arguments. The first, args[0], is the input file to be merged into, or wrapped by, the output file specified in args[1].
public static void main(String[] args) throws FileNotFoundException, XMLStreamException, IOException {
String inFile = args[0];
String outFile = args[1];
Here we get an instance of an XMLEventWriter from the XMLOutputFactory. We also get an XMLEventFactory for manufacturing events.
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLEventWriter writer =
outputFactory.createXMLEventWriter(new FileWriter(outFile));
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
Before writing to the output stream, we create the events we need. Because we are going to wrap the input document, we create the start-document event and the wrapper element's start event, and add both to the output stream.
StartDocument sd = eventFactory.createStartDocument();
StartElement se = eventFactory.createStartElement("", "", "Wrapper");
writer.add(sd);
writer.add(se);
The next section of code bears special interest. Here, we get an event reader for the input file, step past its StartDocument event, and then pass the reader itself to writer.add(), which copies its remaining events into the output stream. Merging XML documents has never been easier than this.
// merge file
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader reader = factory.createXMLEventReader(new FileReader(inFile));
reader.nextEvent(); // skip the input's StartDocument event (the XML declaration)
writer.add(reader);
Finally, we manufacture the EndElement and EndDocument events, add them to the output stream, and close the writer. The call to SimpleXmlEventReader.main() then echoes our wrapped document to System.out.
EndElement ee = eventFactory.createEndElement("", "", "Wrapper");
EndDocument ed = eventFactory.createEndDocument();
writer.add(ee);
writer.add(ed);
writer.flush();
writer.close();
reader.close();
// echo the result using the reader class from the previous section
String[] argv = new String[1];
argv[0] = outFile;
SimpleXmlEventReader.main(argv);
}
}
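One caveat with passing the whole reader to writer.add() is that it also copies the input's EndDocument event into the middle of the output. The XMLStreamWriter documentation describes writeEndDocument() as closing any start tags that are still open, so depending on the implementation, that copied event may close Wrapper before our own EndElement arrives. The sketch below is one way to avoid this, using the event API's filtering support: XMLInputFactory.createFilteredReader() with an EventFilter that drops StartDocument and EndDocument events. It works on strings instead of files and assumes Java 8 or later for the lambda; WrapSketch and wrap() are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class WrapSketch {

    // Wraps the content of 'inner' in a new element called wrapperName.
    public static String wrap(String inner, String wrapperName) {
        StringWriter out = new StringWriter();
        XMLEventFactory events = XMLEventFactory.newInstance();
        try {
            XMLInputFactory inFactory = XMLInputFactory.newInstance();
            XMLEventReader raw = inFactory.createXMLEventReader(new StringReader(inner));
            // Keep everything except the inner document's start and end events.
            EventFilter contentOnly = e -> !e.isStartDocument() && !e.isEndDocument();
            XMLEventReader content = inFactory.createFilteredReader(raw, contentOnly);

            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            writer.add(events.createStartDocument());
            writer.add(events.createStartElement("", "", wrapperName));
            writer.add(content); // copies every event the filter accepts
            writer.add(events.createEndElement("", "", wrapperName));
            writer.add(events.createEndDocument());
            writer.flush();
            writer.close();
            content.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("<Name>Jeff</Name>", "Wrapper"));
    }
}
```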
When to Use a Cursor or Event Iterator API
The cursor API is less verbose and less powerful than the event API. Presumably, it is also more efficient at what it does, because it creates fewer temporary objects. Both the cursor and event iterator APIs are forward-only. However, the event iterator API provides a peek() method to look at the next event without consuming it, as was demonstrated in SimpleXmlEventReader. The event iterator API has many other capabilities that we didn't cover here, such as the ability to filter, buffer, persist, and compare events[i].
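Buffering is possible because, in the event API, each event is an object that may be cached and referenced after the parse has moved on, whereas the cursor API exposes only the reader's current state. The sketch below parses a document once into a List of XMLEvent objects and then writes the same events out twice without re-parsing; BufferSketch, buffer(), and replay() are hypothetical names, and this is a minimal in-memory illustration rather than a full persistence scheme. The sample input includes an XML declaration so that the replayed StartDocument event carries an explicit version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class BufferSketch {

    // Parses the document once and keeps every event in a list.
    public static List<XMLEvent> buffer(String xml) {
        List<XMLEvent> events = new ArrayList<>();
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                events.add(reader.nextEvent()); // stays valid after the reader moves on
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return events;
    }

    // Writes the buffered events to a new document without re-parsing the input.
    public static String replay(List<XMLEvent> events) {
        StringWriter out = new StringWriter();
        try {
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            for (XMLEvent event : events) {
                writer.add(event);
            }
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<XMLEvent> events = buffer("<?xml version=\"1.0\"?><Name>Jeff</Name>");
        // The same buffered events can be replayed as many times as needed.
        System.out.println(replay(events));
        System.out.println(replay(events));
    }
}
```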
Sample Code
The sample code can be downloaded here.
Summary
StAX has cursor-based and event-based iterator APIs, each with a reading side and a writing side. We reviewed code snippets for all four, and built classes for reading and writing documents using the event API. We discussed why there are two types of StAX API, and why you might use each. Next time, we'll discuss when it is appropriate to use SAX, DOM, TrAX, and StAX, the various APIs in the JAXP family. Until then, the rest is up to you!
About the Author
Jeff Ryan is an enterprise architect for Hartford Financial Services. He has twenty years of experience designing, developing, and delivering automated solutions to business problems. His current focus is on Java, XML, and Service Oriented Architecture. He may be reached at jeffreyjryan@aol.com.
[i] Java Community Process http://jcp.org, Streaming API for XML JSR-173 Specification Version 1.0, October 2003