Java plus XML is a combination of skills that is currently much in demand.
For Java programmers who want to jump into the XML fray, this article shows the
basics of using the Java API for XML Processing (JAXP).
API Overview
The Java API for XML Processing (JAXP) gives Java programmers a standardized API for working with XML documents, independent of the actual XML parser that is used. The classes and interfaces that comprise JAXP can be divided into three general categories:
- Document Object Model (DOM) – These classes are used to process XML documents as DOM documents.
- Simple API for XML (SAX) – These classes are used to parse XML documents in an event-driven manner.
- Extensible Stylesheet Language Transformations (XSLT) – These classes are used to transform XML documents using XSL.
Tools
To try the examples in this article, you will need the following tools:
- Java Development Kit (JDK) 1.3 or higher
- Java XML Pack 1.2
- Your favorite text editor
Using DOM
The Document Object Model (DOM) is a standardized API that is used to represent,
navigate, and manipulate the structure and content of structured documents, such
as valid HTML and well-formed XML. Documents are represented in DOM as trees:
each document contains one root node, which has zero or more child nodes, each of
which can in turn be the root of its own subtree.
To create a DOM document from an XML file, you use classes in the
javax.xml.parsers package. DocumentBuilderFactory is used to create instances of
DocumentBuilder, which is used to create DOM documents from XML sources. To
navigate and manipulate the document, you use the classes in the org.w3c.dom
package, such as Document and Node.
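As a minimal sketch of how these classes fit together, the following parses XML into a DOM Document and inspects the root element. It reads from an in-memory string so that it is self-contained, which is an assumption made here; the class name DomParseSketch, its methods, and the sample XML are hypothetical and not taken from the article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the article's DomPrint program.
class DomParseSketch {

    // Parses an XML string into a DOM Document using the JAXP factory classes.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // optional configuration, chosen for illustration
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Navigates the document with org.w3c.dom types: root name plus count of child elements.
    static String describeRoot(String xml) throws Exception {
        Element root = parse(xml).getDocumentElement();
        int elementChildren = 0;
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                elementChildren++;
            }
        }
        return root.getNodeName() + " has " + elementChildren + " child element(s)";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeRoot("<project name=\"demo\"><target/><target/></project>"));
    }
}
```

To parse a file instead, DocumentBuilder also accepts a java.io.File in its parse method.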
DomPrint is an example of using the JAXP
DOM classes. It creates a DOM document from a given XML file, then prints the
content as plain text, with indentation to indicate nested elements. Even though
it is recursive, the algorithm for DomPrint is straightforward:
- Check command-line arguments. If not enough arguments, print usage message, then exit.
- Create a File object from the first command-line argument.
- Get an instance of DocumentBuilderFactory and configure it.
- Get a DocumentBuilder from the DocumentBuilderFactory.
- Tell the DocumentBuilder to parse the given file and return a DOM Document.
- Print the tree, starting from the root node:
  - Print indentation for the given nesting level (0 = no indentation).
  - Print the node name.
  - If the node has attributes, print them, one per line, indented under the node name.
  - Print the node value on the next line after the node name.
  - If the node has children:
    - Increment indentation level.
    - For each child: print the tree, starting from the child.
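The steps above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the article's actual DomPrint source: the class name DomPrintSketch, the two-space indentation, the "@name=value" attribute format, the setIgnoringComments configuration, and the choice to skip whitespace-only node values are all made up for illustration.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical reconstruction of the listed steps; the original DomPrint may differ.
class DomPrintSketch {

    public static void main(String[] args) throws Exception {
        // Check command-line arguments.
        if (args.length < 1) {
            System.err.println("Usage: java DomPrintSketch <xml-file>");
            System.exit(1);
        }
        File file = new File(args[0]);

        // Get and configure the factory, then get a builder and parse the file.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true); // assumed configuration
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(file);

        // Print the tree, starting from the root node at nesting level 0.
        StringBuilder out = new StringBuilder();
        printTree(document, 0, out);
        System.out.print(out);
    }

    // Builds an indentation string of two spaces per nesting level (assumed width).
    static String indent(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    // Recursively appends the node name, attributes, value, and children to out.
    static void printTree(Node node, int level, StringBuilder out) {
        String indent = indent(level);
        out.append(indent).append(node.getNodeName()).append('\n');

        NamedNodeMap attributes = node.getAttributes(); // null unless node is an element
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attr = attributes.item(i);
                out.append(indent).append("  @").append(attr.getNodeName())
                   .append('=').append(attr.getNodeValue()).append('\n');
            }
        }

        String value = node.getNodeValue(); // null for element and document nodes
        if (value != null && !value.trim().isEmpty()) {
            out.append(indent).append("  ").append(value.trim()).append('\n');
        }

        // Recurse into each child one nesting level deeper.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            printTree(children.item(i), level + 1, out);
        }
    }
}
```

Passing the level as a parameter, rather than incrementing a shared counter, keeps the recursion free of state that would have to be restored after each child.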
Running DomPrint on an Ant project file produces
this output. Running it on a DocBook article produces this output.
Using SAX
The Simple API for XML (SAX) is an event-based API for processing XML documents.
As a document is parsed, events, such as document start or element start, are
reported to an application. In order to handle these events, the application
implements event handling interfaces.
To parse an XML document with SAX, you use the classes in the javax.xml.parsers
package. SAXParserFactory is used to create instances of SAXParser, which is
used to parse XML documents. To handle parsing events, you extend
org.xml.sax.helpers.DefaultHandler or implement org.xml.sax.ContentHandler.
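As a small illustration of the event-driven style, the sketch below extends DefaultHandler and overrides a single callback to count element start events. The class names SaxCountSketch and ElementCounter, and the use of an in-memory string as input, are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of the article's SaxPrint program.
class SaxCountSketch {

    // DefaultHandler supplies empty implementations, so only the needed callback is overridden.
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            count++; // called once for every element start tag
        }
    }

    // Parses the XML string and returns how many elements were reported.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>")); // prints 3
    }
}
```

Unlike the DOM approach, nothing is kept in memory here except the counter; the handler sees each event once as the parser reads the input.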
SaxPrint is an example of using SAX to parse
an XML document. It parses a given XML file and prints the content as
block-structured text. Here is the algorithm:
- Get command-line arguments. If not enough arguments, print usage message, then exit.
- Create File from first command-line argument.
- Get an instance of SAXParserFactory and configure it.
- Get a SAXParser from the SAXParserFactory.
- Tell the SAXParser to parse the given file.
- Handle events:
  - When startDocument: print “BEGIN DOCUMENT”.
  - When endDocument: print “END DOCUMENT”.
  - When startElement:
    - Print “BEGIN ” + element name.
    - If element has attributes, print them, indented under element name.
  - When endElement: print “END ” + element name.
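One way the steps above might look in code is sketched below. It is not the article's SaxPrint source: the class name SaxPrintSketch, the "name=value" attribute format, the two-space attribute indent, the setValidating configuration, and the decision to collect output in a StringBuilder before printing are assumptions for illustration.

```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reconstruction of the listed steps; the original SaxPrint may differ.
class SaxPrintSketch extends DefaultHandler {

    private final StringBuilder out;

    SaxPrintSketch(StringBuilder out) {
        this.out = out;
    }

    @Override
    public void startDocument() {
        out.append("BEGIN DOCUMENT\n");
    }

    @Override
    public void endDocument() {
        out.append("END DOCUMENT\n");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        out.append("BEGIN ").append(qName).append('\n');
        // Print each attribute indented under the element name.
        for (int i = 0; i < attributes.getLength(); i++) {
            out.append("  ").append(attributes.getQName(i))
               .append('=').append(attributes.getValue(i)).append('\n');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("END ").append(qName).append('\n');
    }

    public static void main(String[] args) throws Exception {
        // Check command-line arguments.
        if (args.length < 1) {
            System.err.println("Usage: java SaxPrintSketch <xml-file>");
            System.exit(1);
        }
        File file = new File(args[0]);

        // Get and configure the factory, get a parser, and parse the file.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // assumed configuration
        SAXParser parser = factory.newSAXParser();

        StringBuilder out = new StringBuilder();
        parser.parse(file, new SaxPrintSketch(out));
        System.out.print(out);
    }
}
```

The sketch relies on the factory's default of not being namespace-aware, in which case the qName argument carries the element name as written in the document.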
Running SaxPrint on an Ant project file produces
this output. Running it on a DocBook article produces this output.
Using Transformations
The Extensible Stylesheet Language Transformations (XSLT) classes are used to
transform XML documents into other forms, such as other XML structures, HTML, or
plain text. Transformation is accomplished by applying instructions (rules) in
an XSL stylesheet to an input source and creating an output result. Both the
input source and the output result can be a DOM document, SAX events, or an
XML stream.
To transform an XML document with XSLT, you use the classes in the
javax.xml.transform package. TransformerFactory is used to create instances of
Transformer, which is used to run transformations. Input sources and output
results are created with the classes in the package that corresponds to the type
of source or result. For example, stream sources are created with the classes in
the javax.xml.transform.stream package.
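To illustrate how source and result types come from different packages, the sketch below pairs a DOMSource (javax.xml.transform.dom) with a StreamResult (javax.xml.transform.stream). It uses a Transformer created without a stylesheet, which copies the input to the output unchanged, so it effectively serializes a DOM document to text. The class name DomToStreamSketch and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; it shows mixing a DOM source with a stream result.
class DomToStreamSketch {

    // Parses the XML into a DOM document, then writes it back out as text.
    static String serialize(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // newTransformer() with no stylesheet yields an identity transformation.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("<article><title>JAXP</title></article>"));
    }
}
```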
Transform is an example of transforming a given XML
file with a given XSL stylesheet. Both the input and the result are streams.
Here is the algorithm:
- Get command-line arguments. If not enough arguments, print usage message, then exit.
- Create stylesheet File from first command-line argument, input File from second command-line argument.
- Create stream sources for stylesheet and input file, stream result for System.out.
- Get instance of TransformerFactory.
- Get a Transformer from TransformerFactory that uses the given stylesheet.
- Tell the Transformer to transform the input stream and write the output to the result stream.
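The steps above might be sketched as follows. This is an approximation rather than the article's Transform source: the class name TransformSketch and the helper method transform are hypothetical, introduced so that the core of the algorithm can be reused with any Source and Result.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical reconstruction of the listed steps; the original Transform may differ.
class TransformSketch {

    // Applies the stylesheet to the input and writes the output to the result.
    static void transform(Source stylesheet, Source input, Result result)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(stylesheet);
        transformer.transform(input, result);
    }

    public static void main(String[] args) throws Exception {
        // Check command-line arguments.
        if (args.length < 2) {
            System.err.println("Usage: java TransformSketch <stylesheet> <xml-file>");
            System.exit(1);
        }
        File stylesheetFile = new File(args[0]);
        File inputFile = new File(args[1]);

        // Stream sources for the stylesheet and input, stream result for System.out.
        transform(new StreamSource(stylesheetFile),
                  new StreamSource(inputFile),
                  new StreamResult(System.out));
    }
}
```

With the article's file names, it would be invoked as: java TransformSketch article2html.xsl article.xml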
Running Transform on article.xml using article2html.xsl produces this output.
Resources
- Java Technology and XML
- Understanding XML
- Java XML Pack
- JAXP Tutorial
- SAX Project
- XSLT Specification
Copyright © 2002, Thornton Rose