October 24, 2014

Getting Started with Java JAXP and XSL Transformations (XSLT)

  • November 25, 2003
  • By Richard G. Baldwin

Java Programming Notes # 2202


Preface

What is JAXP?

As the name implies, the Java API for XML Processing (JAXP) is an API designed to help you write programs for processing XML documents.  JAXP is very important for many reasons, not the least of which is the fact that it is a critical part of the Java Web Services Developer Pack (Java WSDP).

This is the second lesson in a series designed to initially help you understand how to use JAXP, and to eventually help you understand how to use the Java WSDP.

The first lesson was entitled Java API for XML Processing (JAXP), Getting Started.

What is XML?

XML is an acronym for the eXtensible Markup Language.  I will not attempt to teach XML in this series of tutorial lessons.  Rather, I will assume that you already understand XML, and I will teach you how to use JAXP to write programs for creating and processing XML documents.

I have published numerous tutorial lessons on XML at Gamelan.com and www.DickBaldwin.com.  You may find it useful to refer to those lessons.  In addition, I provided a review of the salient aspects of XML in the first lesson in this series.  From time to time, I will also provide background information regarding XML in the lessons in this series.  For example, I will provide background information on XSL and XSL Transformations (XSLT) later in this document under General Background Information on XSLT.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there.  You will find a consolidated index at www.DickBaldwin.com.

General Background Information on XSLT

Rendering XML documents

As of this writing, to my knowledge, Microsoft IE is the only widely-used web browser that has the ability to do a good job of rendering XML documents.  IE can render XML documents using either Cascading Style Sheets (CSS) or XSL.  Thus, IE provides a good vehicle for testing XSLT files that you intend to use with JAXP.

What is the W3C?

For purposes of this lesson, the W3C is a governing body that has published many important documents on XSL and XSLT, two of which will be referenced later in this document.

What is XSL?

XSL is an acronym for Extensible Stylesheet Language.

According to the W3C, XSL is a language for expressing stylesheets. It consists of two parts:

  1. A language for transforming XML documents, and
  2. An XML vocabulary for specifying formatting semantics.

Again, according to the W3C,

"An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary."

Separating content from presentation

As you are probably aware by now, one of the primary virtues of XML is the ability to separate content from presentation.

In other words, an XML document contains structured information, but does not provide any hints as to how that information should be rendered for the benefit of a consumer.

What is XSLT?

XSLT is an acronym for XSL Transformations.

According to the W3C

"This specification defines the syntax and semantics of XSLT, which is a language for transforming XML documents into other XML documents.

XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary.

XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL."

Transforming XML to other formats

Because an HTML document can be represented as an XML document, XSLT can be used to transform XML documents into HTML documents.  This makes it possible to render the information contained in an XML document using a common HTML Web browser.  Thus, one useful way to view the contents of an XML document is to transform it into an HTML document and view it using a standard Web browser.
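
To make the idea concrete, here is a minimal sketch that uses JAXP to apply a tiny XSLT stylesheet to an XML document and produce HTML. The element names poem and line, the class name XmlToHtmlSketch, and the inline stylesheet are all hypothetical, chosen only for illustration. The sketch relies on the TransformerFactory.newTransformer(Source) overload, which compiles a stylesheet into a Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: renders <poem><line>...</line></poem> as an HTML list.
class XmlToHtmlSketch {
  static final String XSL =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='html'/>"
    + "<xsl:template match='/poem'>"
    + "<html><body><ul>"
    + "<xsl:for-each select='line'><li><xsl:value-of select='.'/></li></xsl:for-each>"
    + "</ul></body></html>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  static String toHtml(String xml) {
    try {
      // Compile the stylesheet into a Transformer.
      Transformer transformer = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(XSL)));
      StringWriter html = new StringWriter();
      // Apply it to the XML text and collect the HTML in memory.
      transformer.transform(new StreamSource(new StringReader(xml)),
                            new StreamResult(html));
      return html.toString();
    } catch (TransformerException e) {
      // Wrapped so callers of this sketch need not handle a checked exception.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(toHtml(
        "<poem><line>Roses are red</line><line>Violets are blue</line></poem>"));
  }
}
```

Because the stylesheet declares method='html', the serializer writes HTML rather than XML, so the exact line breaks in the output may vary from one JAXP implementation to another.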

Where does the transformation take place?

When transforming information from an XML document for rendering on an HTML browser, the transformation can take place anywhere between the XML document and the browser.

Transforming on the server

For example, an XSLT engine could be written in Java and run as a servlet, or it could be written as a JavaBeans component and accessed from a scriptlet in a JavaServer page (JSP).

Transforming at the browser

Or, the transformation could be performed at the browser.  For example, Microsoft IE can be used for this purpose.

Preview

A tree structure in memory

As you learned in the previous lesson, a DOM parser can be used to create a tree structure in memory that represents an XML document.  In Java, that tree structure is encapsulated in an object of the interface type Document.  Document and its superinterface Node declare numerous methods.  As is always the case, classes that implement Document must provide concrete definitions of those methods.

Many operations are possible

Thus, given an object of type Document, there are many methods that can be invoked on the object to perform a variety of operations.  For example, it is possible to move nodes from one location in the tree to another location in the tree, thus rearranging the structure of the XML document represented by the Document object.  It is also possible to delete nodes, and to insert new nodes.  As you saw in the sample program in the previous lesson, it is also possible to recursively traverse the tree, extracting information about the nodes along the way.
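
The following is a minimal sketch, assuming a tiny hypothetical document with elements a, b, and d, of the move, delete, insert, and traverse operations described above. It uses only standard org.w3c.dom methods (appendChild, removeChild, insertBefore, createElement); the class DomEditSketch and its helpers are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomEditSketch {
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      // Parser configuration, SAX, and I/O errors are all wrapped for brevity.
      throw new IllegalStateException(e);
    }
  }

  // Recursively visits every node, recording the names of element nodes.
  static void traverse(Node node, StringBuilder names) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      names.append(node.getNodeName()).append(' ');
    }
    for (Node child = node.getFirstChild(); child != null;
         child = child.getNextSibling()) {
      traverse(child, names);
    }
  }

  static String edit(String xml) {
    Document doc = parse(xml);
    Element root = doc.getDocumentElement();
    Node a = root.getElementsByTagName("a").item(0);
    Node b = root.getElementsByTagName("b").item(0);
    root.appendChild(a);   // move: a is detached, then appended at the end
    root.removeChild(b);   // delete
    root.insertBefore(doc.createElement("c"), root.getFirstChild()); // insert
    StringBuilder names = new StringBuilder();
    traverse(doc, names);
    return names.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(edit("<root><a/><b/><d/></root>")); // prints: root c d a
  }
}
```

Note that appendChild on a node that already has a parent moves it rather than copying it, which is how the sketch relocates a.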

I showed you ...

In the previous lesson, I showed you how to:

  • Use JAXP, DOM, and an input XML file to create a Document object that represents the XML file.
  • Recursively traverse the DOM tree, gathering information about each node in the tree along the way.
  • Use the information about the nodes to create a new XML file that represents the Document object.

In the previous lesson, the DOM tree was not modified, so the Document object still represented the original XML file.  The final XML file represented that unmodified Document object.  Therefore, the final XML file was functionally equivalent to the original XML file.

Something of an overkill

The things that you learned in the previous lesson about traversing the tree structure and gathering information about each node in the tree will serve you well in the future.  However, if all you need to do is to write an output XML file that represents a DOM tree, there is an easier way to do that using XSLT.  That is the primary topic of this lesson.

In the previous lesson, for simplicity, I elected not to show you how to write exception handlers that produce meaningful output in the event of parser errors.  I will also cover that topic in this lesson.

Nothing fancy intended

The sample program that I will explain in this lesson is not intended to do anything fancy.  It is intended simply to introduce you to the use of XSLT to transform DOM objects in Java programs.

Discussion and Sample Code

The sample program consists of a single class named Xslt01.  For purposes of illustration, the program operates on two XML files.  One of the XML files is named Xslt01.xml.  The other XML file is named Xslt01bad.xml.  The first XML file is well formed, and is used to illustrate the behavior of the program in the absence of parser errors.  The second XML file is not well formed, and is used to illustrate the behavior of the program in the face of parser errors.
(You could, of course, use the program to operate on other XML files of your own design.)
As is often the case, I will discuss the program code in fragments.  Complete listings of all three files are shown in Listings 9, 10, and 11 near the end of the lesson.

The XML file named Xslt01.xml

I will begin my discussion with the XML file named Xslt01.xml.   A complete listing of this file is shown in Listing 10 near the end of the lesson.  This is a relatively simple XML file.  Assuming that you understood the material in the previous lesson, there should be no surprises in the file named Xslt01.xml.  This file will be used to test the program for the case where there are no parser errors.

The XML file named Xslt01bad.xml

A complete listing of the file named Xslt01bad.xml is shown in Listing 11 near the end of the lesson.  This file is not well formed.  It is missing a right angle bracket at the end of line 6, resulting in a bad end tag for the element named line.  Again, assuming that you understood the material in the previous lesson, there should be no surprises in the file named Xslt01bad.xml.  This file will be used to test the program for the case where there are parser errors.

The class named Xslt01

The entire program is contained in a class named Xslt01.  A complete listing of the program is shown in Listing 9 near the end of the lesson.

Behavior of the program

This program is a modification of the program named Dom02 that was discussed in the previous lesson.  The program was modified to use an identity XSL Transformer object to format an output XML file in place of a call to Dom02Writer, as was the case in the previous program.  This modification resulted in a much simpler and probably more reliable program.

The program was also modified to display the output XML on the Standard Output Device (typically the screen) as well as to provide meaningful output in the event of a parsing error.

This program shows you how to:
  • Create a Document object using JAXP, DOM, and an input XML file.
  • Create an identity XSL Transformer object.
  • Use the identity Transformer object to display the XML represented by the Document object on the Standard Output Device.
  • Use the identity Transformer object to write the XML represented by the Document object into an output file.
  • Provide meaningful output in the case of a parser error.

Operation of the program

The program requires two command-line arguments.  The input XML file name is provided by the user as the first command-line argument.  The output XML file name is provided by the user as the second command-line argument.

Get a DOM parser object

The program begins by instantiating a DOM parser object of type DocumentBuilder based on JAXP.  The parser is configured as a non-validating parser.
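
A minimal sketch of how a non-validating DocumentBuilder can be requested explicitly follows. JAXP's DocumentBuilderFactory is non-validating by default, so the setValidating(false) call mainly documents intent. NonValidatingParserSketch and its method name are hypothetical.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

class NonValidatingParserSketch {
  static DocumentBuilder newNonValidatingParser() {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // false is already the default; setting it makes the choice explicit.
    factory.setValidating(false);
    try {
      return factory.newDocumentBuilder();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    DocumentBuilder parser = newNonValidatingParser();
    System.out.println("validating? " + parser.isValidating()); // prints: validating? false
  }
}
```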

Create a DOM tree as a Document object

The program uses the parse method of the parser object to parse an XML file specified on the command line.  The parse method returns an object of type Document that represents the parsed XML file.

Get an identity Transformer object

Then the program gets a TransformerFactory object and uses that object to get an identity Transformer object capable of performing a copy of a source to a result.

Get a Source object

Following this, the program uses the Document object to get a DOMSource object that implements the Source interface, and acts as a holder for a transformation source tree in the form of a DOM tree.

Get a Result object

Then the program gets a StreamResult object that implements the Result interface, and points to the standard output device.  This object acts as a holder for a transformation result.

Transform the DOM tree

Having gone through the preparation steps, the program uses the Transformer object, the DOMSource object, and the StreamResult object to transform the DOM tree to text and display it on the standard output device (the screen).

Having displayed the DOM tree on the screen, the program gets another StreamResult object that points to an output file.  Then it transforms the DOM tree to XML text again, and writes it into the output file.
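
The Source and Result steps above can be sketched as follows. This is a hedged approximation, not the lesson's Xslt01 listing: the class name IdentityCopySketch, the inline sample XML, and the output file name out.xml are assumptions, and for brevity the helper creates a new identity Transformer on each call instead of reusing one as the lesson's program does.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IdentityCopySketch {
  // Builds a DOM tree from XML text (a stand-in for parsing the input file).
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Copies the DOM tree, unchanged, to the given byte stream as XML text.
  static void writeTo(Document document, OutputStream out) {
    try {
      TransformerFactory.newInstance().newTransformer() // identity Transformer
          .transform(new DOMSource(document),          // Source: the DOM tree
                     new StreamResult(out));            // Result: the stream
    } catch (TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Document document = parse("<greeting>hi</greeting>");
    writeTo(document, System.out); // first Result: the screen
    System.out.println();
    // Second Result: a file; "out.xml" is a hypothetical name.
    try (OutputStream file = new FileOutputStream("out.xml")) {
      writeTo(document, file);
    }
  }
}
```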

Handle errors and exceptions

The program catches and handles a variety of different types of errors and exceptions and provides meaningful output in the event of parser errors.  An XML document that is not well formed is used to illustrate the ability to display meaningful information in the event of a parser error.
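
A minimal sketch of this kind of parser-error reporting follows, assuming the XML arrives as a string rather than a file. SAXParseException provides getLineNumber and getColumnNumber, which are what make the message meaningful. The class ParseErrorReportSketch, its check method, and the malformed sample (which mimics the missing right angle bracket in Xslt01bad.xml) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ParseErrorReportSketch {
  // Returns null if the XML is well formed, otherwise a readable error report.
  static String check(String xml) {
    try {
      DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return null;
    } catch (SAXParseException e) {
      // The exception records where in the input the parser gave up.
      return "Parse error at line " + e.getLineNumber()
          + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    } catch (Exception e) {
      // Configuration or I/O problems, reported without position information.
      return "Other error: " + e;
    }
  }

  public static void main(String[] args) {
    // Malformed on purpose: the end tag </line is missing its '>'.
    System.out.println(check("<poem><line>x</line</poem>"));
  }
}
```

With the JDK's built-in parser, a default error handler may also print its own "[Fatal Error]" line to the standard error stream before the exception reaches the catch block.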

Miscellaneous comments about the program

The program was tested using SDK 1.4.2 and WinXP with two different XML files.  The XML file named Xslt01.xml is well formed, and is shown in Listing 10 near the end of the lesson.

The XML file named Xslt01bad.xml is not well formed and is shown in Listing 11 near the end of the lesson.  This file was purposely corrupted, and is missing a right angle bracket in the closing tag of a line element.  This file is used to test for parser errors.  I will show you the output produced by this file later in the lesson under the discussion of the catch block for exceptions of type SAXParseException.

Let's see some code

The program named Xslt01 begins in Listing 1, which shows the beginning of the class definition and the beginning of the main method.

public class Xslt01{

  public static void main(String argv[]){
    if (argv.length != 2){
      System.err.println(
                  "usage: java Xslt01 fileIn fileOut");
      System.exit(0);
    }//end if

Listing 1

The code in Listing 1 simply checks to confirm that the user has entered the correct number of command-line arguments, and aborts if the user has failed to enter the correct number.

Steps for creating a Document object


As you will recall from the previous lesson, three steps are required to create a Document object:
  1. Create a DocumentBuilderFactory object
  2. Use the DocumentBuilderFactory object to create a DocumentBuilder object
  3. Use the parse method of the DocumentBuilder object to create a Document object
These three steps are illustrated by the three statements in Listing 2.  (We will probably see these three statements in many different programs in this series of lessons.)

    try{
      DocumentBuilderFactory docBuildFactory =
                      DocumentBuilderFactory.newInstance();

      DocumentBuilder parser =
                      docBuildFactory.newDocumentBuilder();

      Document document = parser.parse(
                                   new File(argv[0]));

Listing 2

The DocumentBuilderFactory Class

Reviewing some of what you learned in the previous lesson, the DocumentBuilderFactory class
"Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents."
The DocumentBuilderFactory class extends Object, and defines about fifteen methods, one of which is a static method named newInstance.  The newInstance method is used to create an object of the DocumentBuilderFactory class (as shown in Listing 2).

The class also defines the newDocumentBuilder instance method, which is used to create objects of the DocumentBuilder class (also shown in Listing 2).

The DocumentBuilder Class


The DocumentBuilder class
"Defines the API to obtain DOM Document instances from an XML document."
This class also extends Object, and defines about ten methods, which include several overloaded versions of the parse method.  When the parse method is invoked and passed an input source containing XML, the method returns a Document object (DOM tree) that represents the XML.
(In Listing 2, the parse method is passed a reference to a File object that represents the input XML file.)
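
Because parse is overloaded, the XML need not come from a file. This sketch, with the hypothetical class ParseOverloadsSketch, feeds the same XML text through parse(InputStream) and parse(InputSource) and confirms that both produce a DOM tree with the same root element. It also reuses one DocumentBuilder for successive parses.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseOverloadsSketch {
  // Parses the same XML text through two parse overloads and
  // returns the root element name produced by each.
  static String[] rootNames(String xml) {
    try {
      DocumentBuilder parser =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // parse(InputStream): the parser works out the encoding from the bytes.
      Document fromStream = parser.parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      // parse(InputSource): here wrapping a character stream.
      Document fromReader = parser.parse(new InputSource(new StringReader(xml)));
      return new String[] {
        fromStream.getDocumentElement().getTagName(),
        fromReader.getDocumentElement().getTagName()
      };
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String[] names = rootNames("<poem><line>x</line></poem>");
    System.out.println(names[0] + " " + names[1]); // prints: poem poem
  }
}
```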

The Document interface

Document is an interface in the org.w3c.dom package, which extends the Node interface belonging to the same package.  When we invoke the parse method, it returns a reference to an object instantiated from a class that implements the Document interface.
(The reference is returned as type Document, not as the type of the class from which the object was actually instantiated.   Because Document extends Node, that object could also be treated as type Node when appropriate.)
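
A minimal sketch of this relationship, with DocumentAsNodeSketch as a hypothetical class name: the object returned by parse is held in a Node variable, reports Node.DOCUMENT_NODE as its type, and its root element is available through getDocumentElement. The concrete class name printed by main depends on the JAXP implementation in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DocumentAsNodeSketch {
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  static String describe(Document document) {
    Node node = document; // legal because Document extends Node
    boolean isDocumentNode = node.getNodeType() == Node.DOCUMENT_NODE;
    Element root = document.getDocumentElement();
    // A Document node's name is always "#document"; with no prolog nodes,
    // its first child is the root element.
    return node.getNodeName() + " " + isDocumentNode + " "
        + root.getTagName() + " " + (node.getFirstChild() == root);
  }

  public static void main(String[] args) {
    Document document = parse("<poem><line/></poem>");
    System.out.println(describe(document)); // prints: #document true poem true
    // Implementation-specific class name; varies by JAXP provider.
    System.out.println(document.getClass().getName());
  }
}
```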
According to Sun:
"The Document interface represents the entire HTML or XML document. Conceptually, it is the root of the document tree, and provides the primary access to the document's data."

Steps for creating a Transformer object

This information is new to this lesson.  The following two steps are required to create an identity Transformer object.
  1. Create a TransformerFactory object by invoking the static newInstance method of the TransformerFactory class.
  2. Invoke the newTransformer method on the TransformerFactory object.
These two steps are illustrated by the code in Listing 3.

      //Get a TransformerFactory object
      TransformerFactory xformFactory =
                      TransformerFactory.newInstance();
      //Get an XSL Transformer object
      Transformer transformer =
                         xformFactory.newTransformer();

Listing 3

The TransformerFactory class

A TransformerFactory instance can be used to create Transformer and Templates objects.
(This lesson does not discuss Templates objects.  That is a topic for a future lesson.)
In a programming style that should by now be familiar, this class provides a static method named newInstance.  Invocation of the newInstance method returns a reference to a new instance of TransformerFactory.

The newTransformer method

A TransformerFactory object provides two overloaded versions of the newTransformer method.  Invocation of the version of newTransformer that takes no parameters (on an instance of TransformerFactory) returns a reference to a new Transformer object that performs a copy of a source to a result.  Some authors refer to this as the identity transform.

The code in Listing 3 produces such a Transformer object, and saves the object's reference in a variable named transformer.

The other overloaded version of the newTransformer method takes a parameter that represents an XSL stylesheet, and returns a Transformer object that implements the instructions in the stylesheet.  I will show you how to use that version in a future lesson.
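
As a preview, here is a hedged sketch of the stylesheet-taking overload. It passes the widely used XSLT 1.0 identity template to newTransformer(Source) and compares the result with the no-argument identity Transformer on the same input. The class name IdentityStylesheetSketch, its helper names, and the sample XML are assumptions; the XML declaration is omitted from both outputs so that they can be compared directly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityStylesheetSketch {
  // The XSLT 1.0 identity template: copy every attribute and node as-is.
  static final String IDENTITY_XSL =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:template match='@*|node()'>"
    + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  static String run(Transformer t, String xml) throws TransformerException {
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString().trim();
  }

  // Returns {output of no-argument Transformer, output of identity stylesheet}.
  static String[] both(String xml) {
    try {
      TransformerFactory factory = TransformerFactory.newInstance();
      Transformer builtIn = factory.newTransformer();
      Transformer fromXsl = factory.newTransformer(
          new StreamSource(new StringReader(IDENTITY_XSL)));
      return new String[] { run(builtIn, xml), run(fromXsl, xml) };
    } catch (TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String[] results = both("<poem n=\"1\"><line>Roses</line></poem>");
    System.out.println(results[0]);
    System.out.println(results[1]);
  }
}
```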

The Transformer class

Here is some of what Sun has to say about an object of the Transformer class:
"An instance of this abstract class can transform a source tree into a result tree.

An instance of this class can be obtained with the TransformerFactory.newTransformer method. This instance may then be used to process XML from a variety of sources and write the transformation output to a variety of sinks."
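
To illustrate processing XML from a variety of sources into a variety of sinks, this sketch (the class name SourcesAndSinksSketch is hypothetical) reuses one identity Transformer twice: first from a StreamSource of text into a DOMResult, then from a DOMSource back into a StreamResult. When a DOMResult is created without a node, the transformer creates a new Document to hold the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class SourcesAndSinksSketch {
  static String roundTrip(String xml) {
    try {
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      // 1) Text in, DOM tree out.
      DOMResult domResult = new DOMResult();
      transformer.transform(new StreamSource(new StringReader(xml)), domResult);
      Document document = (Document) domResult.getNode();
      // 2) DOM tree in, text out, reusing the same Transformer.
      StringWriter out = new StringWriter();
      transformer.transform(new DOMSource(document), new StreamResult(out));
      return out.toString();
    } catch (TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("<poem><line>Roses</line></poem>"));
  }
}
```

Per the JAXP documentation, a Transformer may be reused for multiple transformations, but a single instance must not be used by several threads concurrently.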

The transform method

The transform method of the Transformer class is partially described in Figure 1.
 
public abstract void transform(
Source xmlSource,
Result outputTarget)
throws TransformerException

Process the source tree to the output result.

Parameters:
xmlSource - The input for the source tree.
outputTarget - The output target.
Figure 1

As you can see, this method requires two parameters:
  1. A reference to an object of type Source
  2. A reference to an object of type Result
The method processes the Source to produce the Result.




