Getting Started with Java JAXP and XSL Transformations (XSLT)
November 25, 2003
The Source interface
Source is an interface, not a class. Sun has this to say
about the Source interface:
"An object that implements this interface contains the
information needed to act as source input (XML source or transformation
instructions)."
(Note that the reference to transformation instructions
in the above quotation is a reference to the input parameter to the
second overloaded version of the newTransformer method
discussed earlier. Again, I will show you how to use this version
in a future lesson.)
In this program, I will create and use an object of the DOMSource
class as the source for the transformation. (The DOMSource
class implements the Source interface.)
The DOMSource class
Here is what Sun has to say about an object of the DOMSource
class:
"Acts as a holder for a transformation Source tree in
the
form of a Document Object Model (DOM) tree."
The Result interface
Sun has this to say about the Result interface:
"An object that implements this interface contains the
information needed to build a transformation result tree."
In this program, I will transform the DOMSource object
into two different Result objects:
A StreamResult object that points to the Standard Output
Device (typically the screen).
A StreamResult object that points to the output file.
The StreamResult class
Sun has this to say about the StreamResult class:
"Acts as an holder for a transformation result, which
may be
XML, plain Text, HTML, or some other form of markup."
Get a DOMSource object
Listing 4 shows the code that gets a DOMSource object, which
represents the Document object.
DOMSource source = new DOMSource(document);
Listing 4
The DOMSource class provides several different overloaded
constructors, one of which requires a single incoming parameter of type
Node.
Recall that the variable document contains a reference to an
object
that implements the Document interface, which is a subinterface
of
the Node interface. Thus, document satisfies the
parameter type requirement for the constructor shown in Listing 4.
The DOMSource object produced in Listing 4 will later be
transformed into two different Result objects.
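As a rough illustration of the point above, the following sketch wraps a Document in a DOMSource and then reads the root element name back through the holder. The class name DomSourceSketch, its helper methods, and the use of an in-memory XML string instead of an input file are assumptions made for this example; they are not part of the program discussed in this lesson.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Xslt01.java.
class DomSourceSketch {

  // Parse an XML string into a DOM Document. This stands in for the
  // lesson's parser.parse(new File(argv[0])) call.
  static Document parse(String xml) throws Exception {
    DocumentBuilder parser =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    return parser.parse(new InputSource(new StringReader(xml)));
  }

  // Wrap the Document in a DOMSource, then read the root element
  // name back through the node held by the DOMSource.
  static String rootNameViaSource(String xml) throws Exception {
    Document document = parse(xml);
    // Document is a subinterface of Node, so it is accepted by the
    // DOMSource(Node) constructor.
    DOMSource source = new DOMSource(document);
    Node held = source.getNode();
    return ((Document) held).getDocumentElement().getNodeName();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
        rootNameViaSource("<bookOfPoems><poem/></bookOfPoems>"));
  }
}
```

The DOMSource object does not copy the tree. It simply holds a reference to the node that was passed to the constructor.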
Get a StreamResult object
The statement in Listing 5 gets a StreamResult object that
points to the Standard Output Device.
StreamResult scrResult = new StreamResult(System.out);
Listing 5
The StreamResult class provides several overloaded
constructors, one of which requires an incoming parameter of type OutputStream.
System.out contains a reference to an object of type PrintStream,
which is a subclass of OutputStream. Therefore, System.out
satisfies the parameter type requirement for one of the overloaded
constructors
of StreamResult.
Transform the DOMSource to text on the screen
The statement in Listing 6 invokes the transform method of the
Transformer class to transform the DOMSource object to
text
on the screen.
transformer.transform(source, scrResult);
Listing 6
The two parameters to the transform method shown in Listing 6
satisfy the parameter type requirements (Source and Result)
shown earlier in Figure 1.
Because the DOMSource object represents the Document
object, the code in Listing 6 transforms the Document object to
the screen. Since the Document object represents the
original XML file, this effectively transforms the contents of the
original XML file to the screen.
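The steps discussed so far can be gathered into one small, self-contained sketch. It assumes an in-memory XML string and a StringWriter in place of the input file and System.out, so that the result can be inspected as a String. The class name IdentityToStringSketch and the method name identity are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Xslt01.java.
class IdentityToStringSketch {

  // Parse the XML text into a DOM tree, then copy the tree to text
  // using an identity Transformer.
  static String identity(String xml) throws Exception {
    Document document = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));

    // newTransformer() with no arguments returns an identity
    // Transformer that copies the source to the result.
    Transformer transformer =
        TransformerFactory.newInstance().newTransformer();

    StringWriter out = new StringWriter();
    transformer.transform(
        new DOMSource(document), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(identity(
        "<bookOfPoems><poem PoemNumber=\"1\">"
        + "<line>Roses are red,</line></poem></bookOfPoems>"));
  }
}
```

Replacing the StringWriter with System.out gives the same behavior as Listing 5 and Listing 6.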
The screen output
The statement shown in Listing 6 produced the screen output shown in
Figure 2.
<?xml version="1.0" encoding="UTF-8"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
Figure 2
If you compare Figure 2 with the input XML file shown in Listing 10
near the end of the lesson, you will see that it matches in all
respects but one.
The one line that doesn't match is the XML declaration in the first
line
of Figure 2 and Listing 10.
The XML declaration
The XML declaration is really not part of the XML data. Rather,
the XML declaration provides information to the processor being used to
process the XML data. The XML declaration does not become a node
in the DOM tree structure.
(Recall that in the previous lesson, I used a separate
statement to write the XML declaration into the output file before
beginning the process of writing data in the output file based on data
in the DOM tree.)
The encoding attribute in the XML declaration shown in Figure
2
is optional. I elected not to include it in the original XML
file.
The author of the transform method of the Transformer
class
elected to include it in the transformed output. That is why it
appears
in Figure 2 and does not appear in Listing 10.
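Because the declaration is written by the Transformer rather than copied from the DOM tree, it can be controlled through the Transformer's output properties. The following is only a sketch of that idea and goes slightly beyond the program in this lesson. It uses the standard OutputKeys.OMIT_XML_DECLARATION property. The class name XmlDeclarationSketch and the method name copy are hypothetical, and a StreamSource is used in place of a DOMSource just to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of Xslt01.java.
class XmlDeclarationSketch {

  // Copy the XML text with an identity Transformer, optionally
  // asking the Transformer to leave out the XML declaration.
  static String copy(String xml, boolean omitDeclaration)
      throws TransformerException {
    Transformer transformer =
        TransformerFactory.newInstance().newTransformer();
    if (omitDeclaration) {
      transformer.setOutputProperty(
          OutputKeys.OMIT_XML_DECLARATION, "yes");
    }
    StringWriter out = new StringWriter();
    transformer.transform(
        new StreamSource(new StringReader(xml)),
        new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml version=\"1.0\"?><bookOfPoems/>";
    System.out.println(copy(xml, false));
    System.out.println(copy(xml, true));
  }
}
```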
Write an output XML file
The three statements in Listing 7 perform the following three actions
in order:
Get an output stream for the output XML file.
Get a StreamResult object that points to the output file.
Transform the DOMSource object to text in the output file.
PrintWriter outStream = new PrintWriter(
    new FileOutputStream(argv[1]));
StreamResult fileResult = new StreamResult(outStream);
transformer.transform(source, fileResult);
}//end try block
Listing 7
Figure 3 shows the contents of the output file produced by Listing 7.
<?xml version="1.0" encoding="UTF-8"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
Figure 3
As you might have surmised, the contents of the output file shown in
Figure 3 match the screen output shown in Figure 2. Also, with
the exception of the optional encoding attribute in the XML
declaration, the contents of the output file match the contents of the
original XML file shown in Listing 10.
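For readers who want to try the file-writing step in isolation, here is a minimal sketch. It differs from the program in this lesson in several ways, all of which are assumptions made for the example: it writes to a temporary file instead of argv[1], it passes a FileOutputStream directly to StreamResult instead of wrapping it in a PrintWriter, and it closes the stream with a try-with-resources block. The class name FileOutputSketch and the method name writeAndReadBack are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Xslt01.java.
class FileOutputSketch {

  // Transform a DOM tree into a temporary file, then read the file
  // back so the caller can inspect what was written.
  static String writeAndReadBack(String xml) throws Exception {
    Document document = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    Transformer transformer =
        TransformerFactory.newInstance().newTransformer();

    Path outFile = Files.createTempFile("xslt01-", ".xml");
    // try-with-resources closes (and flushes) the stream even if
    // the transformation throws an exception.
    try (OutputStream outStream =
             new FileOutputStream(outFile.toFile())) {
      transformer.transform(
          new DOMSource(document), new StreamResult(outStream));
    }

    String written = new String(
        Files.readAllBytes(outFile), StandardCharsets.UTF_8);
    Files.delete(outFile);
    return written;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(writeAndReadBack(
        "<bookOfPoems><poem><line>Sugar is sweet,</line>"
        + "</poem></bookOfPoems>"));
  }
}
```

One reason to hand the Transformer a byte stream is that the Transformer then performs the character encoding itself. A PrintWriter constructed from an OutputStream encodes characters using the platform default character set, which may not match the encoding named in the XML declaration.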
End of the try block
Listing 7 also signals the end of the try block and the end of
the code required to apply an identity XSL Transformation to a Document
object.
Now you know how to use an identity transform to either display the XML
data encapsulated in a Document object, or to cause that XML
data
to be written into a new XML file.
The remainder of this lesson deals with errors and exceptions, with
particular emphasis on providing meaningful output in the event of a
parser error.
Potential errors and exceptions
If we scan back through the code, we can identify the following
expressions related to XML processing that have the potential of
throwing errors and exceptions (I will omit I/O exceptions from
this discussion).
A review of the Sun documentation reveals that these expressions can
throw the errors and exceptions shown.
parser.parse(new File(argv[0])) throws SAXException if any
parse errors occur.
docBuildFactory.newDocumentBuilder() throws ParserConfigurationException
if a DocumentBuilder cannot be created which satisfies the
configuration requested.
xformFactory.newTransformer() throws TransformerConfigurationException.
According to Sun, this may be thrown during the parse when
construction of the Templates object fails.
transformer.transform(source, scrResult) throws TransformerException
if an unrecoverable error occurs during the course of the
transformation.
transformer.transform(source, fileResult) throws TransformerException
if an unrecoverable error occurs during the course of the
transformation.
TransformerFactory.newInstance() throws TransformerFactoryConfigurationError
if the implementation is not available or cannot be instantiated.
DocumentBuilderFactory.newInstance() throws FactoryConfigurationError
if the implementation is not available or cannot be instantiated.
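The list above can be turned into a skeleton of catch blocks. The sketch below is not the error handling used in this lesson's program. It is an assumed arrangement that simply reports which kind of error or exception occurred. The class name ErrorCatalogSketch and the method name run are hypothetical, and the input is an in-memory string rather than a file. Note that a subclass such as TransformerConfigurationException must be caught before its superclass TransformerException.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of Xslt01.java.
class ErrorCatalogSketch {

  // Run the parse and the identity transform, returning "ok" or the
  // name of the error or exception category that was caught.
  static String run(String xml) {
    try {
      DocumentBuilderFactory docBuildFactory =
          DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = docBuildFactory.newDocumentBuilder();
      Document document =
          parser.parse(new InputSource(new StringReader(xml)));
      Transformer transformer =
          TransformerFactory.newInstance().newTransformer();
      transformer.transform(new DOMSource(document),
          new StreamResult(new StringWriter()));
      return "ok";
    } catch (FactoryConfigurationError e) {
      return "FactoryConfigurationError";
    } catch (TransformerFactoryConfigurationError e) {
      return "TransformerFactoryConfigurationError";
    } catch (ParserConfigurationException e) {
      return "ParserConfigurationException";
    } catch (SAXException e) {
      // Also catches the subclass SAXParseException.
      return "SAXException";
    } catch (TransformerConfigurationException e) {
      return "TransformerConfigurationException";
    } catch (TransformerException e) {
      return "TransformerException";
    } catch (IOException e) {
      return "IOException";
    }
  }

  public static void main(String[] args) {
    System.out.println(run("<poem><line>ok</line></poem>"));
    System.out.println(run("<poem><line>not closed</poem>"));
  }
}
```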
Handling errors and exceptions
The remaining code in the program provides specific catch blocks for
some, but not all of the exceptions and errors listed above.
(A general Exception catch block is provided to
handle those errors and exceptions for which specific catch blocks are
not provided.)
The SAXException class
The classes of primary interest in this lesson are the SAXException
class and a subclass of that class named SAXParseException.
Here is part of what Sun has to say about the SAXException
class (boldface added by this author for emphasis):
"Encapsulate a general SAX error or warning. ... This
class can contain basic error or warning information from either the
XML parser or the application: a parser writer or application writer
can subclass it to provide additional functionality. SAX handlers may
throw this exception or any exception subclassed from it.
If the application needs to pass through other types of
exceptions, it must wrap those exceptions in a SAXException or an
exception derived from a SAXException.
If the parser or application needs to include information about
a specific location in an XML document, it should use the
SAXParseException subclass."
The SAXParseException class
The SAXParseException class is a subclass of SAXException.
An object of SAXParseException can
"Encapsulate an XML parse error or warning. ... This
exception will include information for locating the error in the
original XML document."
The list that I showed you earlier indicated that the parse
method of the DocumentBuilder class throws SAXException.
That means that it can also throw any exception that is a subclass of SAXException.
As it turns out, the parse method actually throws a SAXParseException,
for at least some of the possible parsing error types.
The SAXParseException handler
Listing 8 shows the entire catch block for handling an exception of
type SAXParseException.
  Exception ex = saxEx;
  if(saxEx.getException() != null){
    ex = saxEx.getException();
    System.err.println(ex.getMessage());}
}//end catch
Listing 8
Of particular interest is the invocation of the five get
methods on the exception object for the purpose of getting and
displaying information about the exception.
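Because only the tail of the catch block appears in Listing 8 as reproduced here, the following sketch shows one way the five get methods of SAXParseException (getPublicId, getSystemId, getLineNumber, getColumnNumber, and getMessage) might be used. It is an assumed reconstruction for illustration and not the author's exact code. The class name ParseErrorReportSketch, the method name describe, and the shortened in-memory test document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of Xslt01.java.
class ParseErrorReportSketch {

  // A shortened document with the same defect as Xslt01bad.xml:
  // the end tag on the sixth line lacks its right angle bracket.
  static final String BAD_XML =
      "<?xml version=\"1.0\"?>\n"
      + "<bookOfPoems>\n"
      + "<poem PoemNumber=\"1\" DumAtr=\"dum val\">\n"
      + "<line>Roses are red,</line>\n"
      + "<!--Following line missing > -->\n"
      + "<line>Violets are blue.</line\n"
      + "<line>Sugar is sweet,</line>\n"
      + "</poem>\n"
      + "</bookOfPoems>\n";

  // Parse the text and return either "no parse error" or a report
  // built from the location information in the exception.
  static String describe(String xml)
      throws ParserConfigurationException, IOException {
    try {
      DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return "no parse error";
    } catch (SAXParseException saxEx) {
      // The more specific subclass must be caught first.
      return "SAXParseException\n"
          + "Public ID: " + saxEx.getPublicId() + "\n"
          + "System ID: " + saxEx.getSystemId() + "\n"
          + "Line: " + saxEx.getLineNumber() + "\n"
          + "Column: " + saxEx.getColumnNumber() + "\n"
          + saxEx.getMessage();
    } catch (SAXException saxEx) {
      return "SAXException\n" + saxEx.getMessage();
    }
  }

  public static void main(String[] args) throws Exception {
    System.err.println(describe(BAD_XML));
  }
}
```

The line number, column number, and message text reported by this sketch depend on the parser implementation, so they may differ from the values shown elsewhere in this lesson.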
Listing 11 contains an XML file named Xslt01bad.xml from which a
right angle bracket was purposely omitted from the end tag on the sixth
line of text. As a result, the XML document is not well formed
because
the line element on the sixth line is malformed.
The screen output
When this program was used to process the corrupt file named Xslt01bad.xml,
the code in Listing 8 produced the output shown in Figure 4. (Note
that I manually inserted a line break to force some of the output to
fit in this narrow publication format.)
SAXParseException
Public ID: null
System ID: file:C:/jnk/Xslt01bad.xml
Line: 7
Column:-1
Next character must be ">" terminating
element "line".
Figure 4
You should be able to correlate each line of output in Figure 4 with
the statements in Listing 8.
The -1 reported for the column number in Figure 4 indicates that the
column number was "not available" to the method named getColumnNumber.
The reported line number value of 7 is also one line beyond the
actual line where the error occurs in the XML document.
(My interpretation of this situation is that the parser
considered the error to be before the first character in line 7 instead
of at the end of line 6. The error became apparent to the parser
when it encountered the left angle bracket for a new start tag without
the previous end tag having been properly terminated with a right angle
bracket.)
Parsing with Internet Explorer
For comparison purposes, Figure 5 shows the result of attempting to
parse the same corrupt XML file using Internet Explorer.
Figure 5 Parsing error as per Internet Explorer
As you can see, the IE parser considered the error to be at the
beginning of line 7 instead of at the end of line 6. However, it
was able to provide a column number. (It also provides a nice
graphic display showing the location of the error.)
Wrapped exceptions
As indicated in the earlier quotations from Sun, objects of the classes
SAXException and SAXParseException can wrap other
exceptions. The mechanism for getting and displaying the wrapped
exception, if any, is shown by the invocation of the getException
method on the SAXParseException
object in Listing 8. According to Sun, the getException
method,
which is inherited from SAXException, "returns the embedded
exception,
if any." The embedded exception is returned as type Exception.
The screen output in Figure 4 indicates that there was no embedded
exception in this sample case.
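The wrapping mechanism can be seen in isolation with the small sketch below. It assumes a SAXException that was constructed around an IOException, which is an arbitrary choice made for the example. The class name WrappedExceptionSketch and its method names are hypothetical. Unlike the code in Listing 8, which prints a message only when an embedded exception exists, this sketch returns the message of the embedded exception if there is one and the message of the SAXException itself otherwise.

```java
import java.io.IOException;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of Xslt01.java.
class WrappedExceptionSketch {

  // Return the message of the embedded exception if there is one;
  // otherwise return the message of the SAXException itself.
  static String innermostMessage(SAXException saxEx) {
    Exception ex = saxEx;
    if (saxEx.getException() != null) {
      ex = saxEx.getException();
    }
    return ex.getMessage();
  }

  // A SAXException that wraps another exception.
  static String wrappedDemo() {
    return innermostMessage(new SAXException(
        "handler failed", new IOException("disk full")));
  }

  // A SAXException with no embedded exception.
  static String plainDemo() {
    return innermostMessage(new SAXException("plain SAX error"));
  }

  public static void main(String[] args) {
    System.out.println(wrappedDemo());
    System.out.println(plainDemo());
  }
}
```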
The remaining exception handlers
You can view the remaining exception handlers in Listing 9 near the end
of the lesson. There is nothing unusual about any of them.
Therefore, I won't discuss them in detail.
Run the Program
I encourage you to copy the code and XML data from Listings 9, 10,
and 11 into your text editor. Compile the program and execute
it. Experiment with it, making changes, and observing the results
of your
changes.
In this second lesson on Java JAXP, I began by providing a brief
review of XSL and XSL Transformations (XSLT).
Then I showed you how to create an identity Transformer
object, and how to use that object to:
Display a DOM tree structure on the screen in XML format.
Write the contents of a DOM tree structure into an output XML
file.
Following that, I showed you how to write exception handlers that
provide meaningful information in the event of errors and exceptions,
with particular emphasis on parser errors and exceptions.
In the next lesson, I will show you how to write a program to
display a DOM tree on the screen in a format that is much easier to
interpret than raw XML code.
Complete Program Listings
Complete listings of the Java class and the XML documents discussed in
this lesson are shown in Listings 9, 10, and 11 below.
/*File Xslt01.java Copyright 2003 R.G.Baldwin
This is a modification of the program named Dom02.java that was discussed in an earlier lesson. The program was modified to use an identity XSLT transform to format the output XML file in place of a call to Dom02Writer. This results in a much simpler program.
The program was also modified to display the output XML on the Standard Output Device.
The program was also modified to provide meaningful output in the event of an error.
This program shows you how to:
1. Create a Document object using JAXP, DOM, and an input XML file.
2. Create an identity XSL Transformer object.
3. Use the identity XSL Transformer object to display the XML represented by the Document object on the Standard Output Device.
4. Use the identity XSL Transformer object to write the XML represented by the Document object into an output file.
The input XML file name is provided by the user as the first command-line argument. The output XML file name is provided by the user as the second command-line argument.
The program instantiates a DOM parser object based on JAXP. The parser is configured in the default non-validating configuration.
The program uses the parse() method of the parser object to parse an XML file specified on the command line. The parse method returns an object of type Document that represents the parsed XML file.
Then the program gets a TransformerFactory object and uses that object to get a default identity Transformer object capable of performing a copy of the source to the result.
Then the program uses the Document object to get a DOMSource object that acts as a holder for a transformation Source tree in the form of a Document Object Model (DOM) tree.
Then the program gets a StreamResult object that points to the standard output device. This object acts as a holder for a transformation result.
Then the program uses the Transformer object, the DOMSource object, and the StreamResult object to transform the DOM tree to text and display it on the standard output device.
Then the program gets another StreamResult object that points to an output file, transforms the DOM tree to text, and writes it into the output file.
The program catches a variety of different types of errors and exceptions and provides meaningful output in the event of an error or exception.
Tested using SDK 1.4.2 and WinXP with two different XML files. The XML file named Xslt01.xml is well formed, and reads as follows:
<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
The XML file named Xslt01bad.xml is not well formed and reads as follows:
<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <!--Following line missing > -->
    <line>Violets are blue.</line
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
This file is purposely missing a right angle bracket in the closing tag of a line element, and is used to test for parser errors.
When processing the well formed XML file, the program produces the following text on the Standard Output Device:
<?xml version="1.0" encoding="UTF-8"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
When processing the well formed XML file, the program produces an output XML file that reads as follows:
<?xml version="1.0" encoding="UTF-8"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
When processing the bad XML file, the program aborts with the following error message on the standard error device:
SAXParseException
Public ID: null
System ID: file:C:/jnk/Xslt01bad.xml
Line: 7
Column:-1
Next character must be ">" terminating
element "line".
Note that I manually inserted line breaks into the error message above to force it to fit into this narrow publication format.
public static void main(String argv[]){
  if (argv.length != 2){
    System.err.println(
      "usage: java Xslt01 fileIn fileOut");
    System.exit(0);
  }//end if

  try{
    //Get a factory object for DocumentBuilder
    // objects with default configuration.
    DocumentBuilderFactory docBuildFactory =
        DocumentBuilderFactory.newInstance();

    //Get a DocumentBuilder (parser) object
    DocumentBuilder parser =
        docBuildFactory.newDocumentBuilder();

    //Parse the XML input file to create a
    // Document object that represents the
    // input XML file.
    Document document = parser.parse(
        new File(argv[0]));

    //Get a TransformerFactory object
    TransformerFactory xformFactory =
        TransformerFactory.newInstance();
    //Get an XSL Transformer object
    Transformer transformer =
        xformFactory.newTransformer();
    //Get a DOMSource object that represents
    // the Document object
    DOMSource source = new DOMSource(document);

    //Get a StreamResult object that points to
    // the screen. Then transform the DOM
    // sending XML to the screen.
    StreamResult scrResult =
        new StreamResult(System.out);
    transformer.transform(source, scrResult);

    //Get an output stream for the output XML
    // file.
    PrintWriter outStream = new PrintWriter(
        new FileOutputStream(argv[1]));

    //Get a StreamResult object that points to
    // the output file. Then transform the DOM
    // sending XML to the file
    StreamResult fileResult =
        new StreamResult(outStream);
    transformer.transform(source, fileResult);
  }//end try block
A listing of the file named Xslt01.xml is provided in Listing
10 below.
<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <line>Violets are blue.</line>
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
Listing 10
A listing of the file named Xslt01bad.xml is provided in
Listing 11 below. Note the missing right angle bracket at the end
of line 6.
<?xml version="1.0"?>
<bookOfPoems>
  <poem PoemNumber="1" DumAtr="dum val">
    <line>Roses are red,</line>
    <!--Following line missing > -->
    <line>Violets are blue.</line
    <line>Sugar is sweet,</line>
    <line>and so are you.</line>
  </poem>
  <?processor ProcInstr="Dummy"?>
  <!--Comment-->
  <poem PoemNumber="2" DumAtr="dum val">
    <line>Roses are pink,</line>
    <line>Dandelions are yellow,</line>
    <line>If you like Java,</line>
    <line>You are a good fellow.</line>
  </poem>
</bookOfPoems>
Listing 11
Copyright 2003, Richard G. Baldwin. Reproduction in whole or
in
part in any form or medium without express written permission from
Richard
Baldwin is prohibited.
About the author
Richard Baldwin
is a college professor (at Austin Community College in Austin, TX) and
private consultant whose primary focus is a combination of Java, C#,
and XML. In addition to the many platform and/or language independent
benefits of Java and C# applications, he believes that a combination of
Java, C#, and XML will become the primary driving force in the delivery
of structured information on the Web.
Richard has participated in numerous consulting projects, and he
frequently provides onsite training at the high-tech companies located
in and around Austin, Texas. He is the author of Baldwin's
Programming Tutorials, which
has gained a worldwide following among experienced and aspiring
programmers. He has also published articles in JavaPro magazine.
Richard holds an MSEE degree from Southern Methodist University
and has many years of experience in the application of computer
technology to real-world problems.