October 23, 2016

Working with XML and Java

  • January 2, 2008
  • By Rob Lybarger

DOM versus SAX

The preceding section talked about the classes and interfaces that hold the data representing an XML document once it has been read from a file. However, the very process of reading in a file raises a whole other discussion point: DOM versus SAX. In the Document Object Model (DOM) approach, the entire file is loaded and held in memory all at once: the root element node, all of its descendants, text nodes, attributes, comments, and processing instructions are all present in one big tree underneath the top-level Document object. You are given the entire tree when parsing finishes, assuming no exceptions were thrown along the way, and you can then inspect this tree of nodes in any way you want, at any time you want: adding or removing nodes, changing values, or whatever suits your fancy. This approach is the easiest one to deal with and is the one used for the examples in this article.
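
To make the "one big tree in memory" idea concrete, here is a minimal sketch. It assumes a small sample document inlined as a string, and the class and method names (DomTreeSketch, parse, editTree) are made up for illustration; only the javax.xml.parsers and org.w3c.dom calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomTreeSketch {
    // Hypothetical sample document, inlined so the sketch needs no file.
    static final String XML =
        "<root><child name=\"child_one\">Hello</child>"
      + "<child name=\"child_two\">World</child></root>";

    /** Parses the whole string into an in-memory DOM tree. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Appends one child, removes the first one, and returns the new child count. */
    static int editTree(Document doc) {
        Element root = doc.getDocumentElement();
        Element extra = doc.createElement("child");
        extra.setAttribute("name", "child_three");
        extra.setTextContent("Again");
        root.appendChild(extra);
        root.removeChild(root.getElementsByTagName("child").item(0));
        return root.getElementsByTagName("child").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);

        // The tree is complete, so nodes can be visited in any order.
        NodeList children = doc.getElementsByTagName("child");
        Element second = (Element) children.item(1);
        System.out.println(second.getAttribute("name") + ": " + second.getTextContent());

        // The same tree can also be changed after parsing has finished.
        System.out.println("children after edit: " + editTree(doc));
    }
}
```

Because everything is already loaded, the sketch can jump straight to the second child and then add and remove nodes without re-reading the input.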

The alternative, the Simple API for XML (SAX), works quite differently. Instead of building a complete node tree and handing you the result when finished, a SAX parser follows an event-callback model: it reads just enough of the file at a time to know whether it has just read the start tag of an element, characters in a text node, a comment node, and so forth. For each such event, it calls a method to relay the information as it happens. If you have registered no callback for a particular SAX event (more specifically, you have not overridden the corresponding "do nothing" callback method in a convenience class), you never learn that the parser encountered that node. The reasons for using a SAX parser are explained more completely in other sources, but one of the big ones is that some XML documents are large. Really, really large. Larger than you want to commit memory to holding all at once. In such cases, a SAX-based approach lets you listen in on the parser and respond only to the little pieces of the file you are interested in. (In actual practice, writing code for a SAX parser can turn pretty ugly, because you are now responsible for tracking state, such as how many levels deep in the tree you currently are, so that you know when to change your own processing rules.) Due to the number of details involved, I will not discuss a SAX-based approach further.
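
Purely to illustrate the callback style described above, a minimal sketch might look like the following. It assumes the JDK's org.xml.sax.helpers.DefaultHandler as the "do nothing" convenience class; the names SaxDepthSketch, DepthHandler, and elementsWithDepth are hypothetical, and the input is an inlined sample string rather than a large file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxDepthSketch {
    /** Records each element's depth and name; every other event is ignored. */
    static class DepthHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();
        private int depth = 0;   // state the handler must track for itself

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            seen.add(depth + ":" + qName);
            depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    /** Runs a SAX parse over the string and returns "depth:name" entries. */
    static List<String> elementsWithDepth(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DepthHandler handler = new DepthHandler();
        parser.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><child name=\"child_one\">Hello</child>"
                   + "<child name=\"child_two\">World</child></root>";
        for (String entry : elementsWithDepth(xml)) {
            System.out.println(entry);
        }
    }
}
```

Only startElement and endElement are overridden, so text and other events pass by unnoticed, and the depth counter is exactly the kind of bookkeeping that the parser leaves to you.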

By Example

The XML Data File

First, create a rather arbitrary XML document. I will be working with a file named "demo.xml" that looks like this (the name of the root element is arbitrary; "root" is used here):

<?xml version="1.0"?>
<root>
   <child name="child_one">Hello</child>
   <child name="child_two">World</child>
</root>

Create a similar five-line file in a text editor and save it. You will next start working on code to load the file into a Document and then explore what you do to navigate around and read information from it.

The Java Code

To keep things simple at this point, you can start by importing these packages:

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

Note: Full-package imports are a bad, lazy habit. If your editor or IDE can automatically insert an import statement for each class you reference, let it!

Next, instantiate a standard File object for your XML file. Something like this:

File file = new File("demo.xml");

Although the parse method coming up next has variations that work directly with a String (treated as a URI rather than a plain filename), thereby skipping the need to instantiate a File object, doing things this way lets you make a quick sanity check via file.exists() and possibly file.canRead(). This lets you print a polite warning message and exit before getting weighed down further in the code. (When errors like this occur, do be so polite as to exit with a status other than zero, because a non-zero exit code is how another program can tell that yours had a problem.) Consider:

String FILE_NAME = "demo.xml";
File file = new File(FILE_NAME);
if ( ! (file.exists() && file.canRead()) ) {
   System.err.println("Error: cannot read " + FILE_NAME + ". Exiting now.");
   System.exit(1);   // non-zero status signals the failure to the caller
}

To actually load the XML file into a Document object, follow this recipe:

DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbFactory.newDocumentBuilder();
Document doc = builder.parse(file);

Important: These three lines may throw a few different types of exceptions while doing their work. The code for this demo will be given below and will include a fairly lazy try/catch around everything. Again, this is a bad habit, but it keeps the article and the example simple.
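
For comparison with the lazy catch-all, here is a hedged sketch of the same three-line recipe with each checked exception handled on its own. The class name LoadDocumentSketch, the load helper, and the wording of the messages are assumptions made for illustration; the exception types are the ones declared by the JDK methods used.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class LoadDocumentSketch {
    /** Returns the parsed Document, or null after printing why loading failed. */
    static Document load(File file) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbFactory.newDocumentBuilder();
            return builder.parse(file);
        } catch (ParserConfigurationException e) {
            // Thrown by newDocumentBuilder() if no suitable parser can be set up.
            System.err.println("Error: cannot create a parser: " + e.getMessage());
        } catch (SAXException e) {
            // Thrown by parse() when the content cannot be parsed as XML.
            System.err.println("Error: cannot parse " + file + ": " + e.getMessage());
        } catch (IOException e) {
            // Thrown by parse() when the file cannot be opened or read.
            System.err.println("Error: cannot read " + file + ": " + e.getMessage());
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = load(new File("demo.xml"));
        if (doc == null) {
            System.exit(1);   // non-zero status signals the failure
        }
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
    }
}
```

Returning null keeps the sketch short; real code might prefer to rethrow or wrap the exception so the caller can decide what to do.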
