Java JAXP, Implementing Default XSLT Behavior in Java
Java Programming Notes # 2206
- Preface
- Preview
- Some Details Regarding XSLT
- Discussion and Sample Code
- Run the Program
- Summary
- What's Next?
- Complete Program Listings
Preface
In this lesson, I will explain default XSLT behavior, and will show you how to write Java code that mimics that behavior. The resulting Java code serves as a skeleton for more advanced transformation programs.
What is JAXP?
JAXP is an API designed to help you write programs for creating and processing XML documents. JAXP is very important for many reasons, not the least of which is the fact that it is a critical part of Sun's Java Web Services Developer Pack (JWSDP). As you are probably already aware, web services are expected by many to be a very important aspect of the Internet of the future.
This lesson is one in a series designed to help you understand how to use JAXP and how to use the JWSDP.
The first lesson in this series was
entitled Java
API for XML Processing (JAXP), Getting Started.
The
previous lesson was entitled Java
JAXP, Exposing a DOM Tree.
What is XML?
XML is an acronym for the eXtensible Markup Language. I will assume that you already understand XML, and will teach you how to use JAXP to write programs for creating and processing XML documents.
What are XSL and XSLT?
I provided quite a lot of background material on XSL and XSLT in a previous lesson in this series. A brief review of that material follows.
XSL is an acronym for Extensible Stylesheet Language. XSLT is an acronym for XSL Transformations. The W3C is a governing body that has published many important documents on XML, XSL, and XSLT.
- Transforming non-XML documents into XML documents.
- Transforming XML documents into other XML documents.
- Transforming XML documents into non-XML documents.
You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.
Supplementary material
I recommend that you also study the other lessons in my extensive collection of online Java and XML tutorials. You will find those lessons published at Gamelan.com. As of the date of this writing, Gamelan doesn't maintain a consolidated index of my tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com.
Preview
A tree structure in memory
A DOM parser can be used to
create a tree structure in memory that represents an XML
document. In Java, that tree structure is encapsulated in an
object of the interface type Document. Document
and its superinterface Node declare numerous methods that can
be used to navigate, extract information from, modify, and otherwise
manipulate the DOM tree. As
is always the case, classes that implement Document must
provide concrete definitions of those methods.
Many operations are possible
Given an object of type Document, there are many
methods that
can be invoked on the object to perform a variety of operations.
For example, it is possible to write Java code to move nodes from one
location in the tree
to another location in the tree, thus rearranging the structure of the
XML document represented by the Document object. It is
possible to delete nodes, and to insert new nodes. It is
also possible
to
recursively traverse the tree, extracting information about the nodes
along
the way.
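The operations described above can be sketched in a few lines of Java. This is a minimal illustration, not code from the lesson's program: the class name DomOpsSketch, the helper names, and the tiny inline XML document are all assumptions made for the example. It moves one node, deletes another, inserts a new one, and then traverses the tree recursively to list the element names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomOpsSketch {
  // Hypothetical helper: parse an XML string into a DOM tree.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  // Recursively traverse the tree, recording element names in document order.
  static void collectNames(Node node, StringBuilder out) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      out.append(node.getNodeName()).append(' ');
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      collectNames(c, out);
    }
  }

  static String names(Document doc) {
    StringBuilder sb = new StringBuilder();
    collectNames(doc, sb);
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    Document doc = parse("<a><b/><c/><d/></a>");
    Element root = doc.getDocumentElement();
    Node b = root.getFirstChild();
    Node d = root.getLastChild();
    root.appendChild(b);   // move: appending an existing node relocates it to the end
    root.removeChild(d);   // delete
    root.insertBefore(doc.createElement("e"), root.getFirstChild()); // insert
    System.out.println(names(doc)); // prints "a e c b"
  }
}
```

Note that appendChild moves a node that is already in the tree rather than copying it, which is why no explicit removal is needed for the move.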
Two ways to transform an XML document
There are at least two ways to transform the contents of an XML
document into another document:
- By writing Java code to manipulate the DOM and perform the transformation.
- By using XSLT to perform the transformation.
It should be possible to write Java code to perform any
transformation that can be performed using XSLT, but the reverse may
not be true.
General description of XSLT
Here is a partial quotation from XML In A Nutshell (which I highly recommend), by Elliotte Rusty Harold and W. Scott Means. This quotation provides a general description of XSLT:
"...
(XSLT) is a functional programming language used to specify how an
input XML document is converted into another text document -- possibly,
though not necessarily, another XML document. An XSLT processor
reads both an input XML document and an XSLT stylesheet (which is
itself an XML document because XSLT is an XML application) and produces
a result tree as output. ... Documents can be transformed using a
standalone program or as part of a larger program that communicates
with the XSLT processor through its API."
In this lesson, I will provide and explain a larger program that communicates
with the XSLT processor through its API. The program will also
execute Java code that mimics the transformation provided by XSLT.
Advantages and disadvantages
As is usually the case, there are advantages and disadvantages to
both approaches to
document transformation.
As an example of an advantage provided by XSLT, if it is possible to
perform the required
transformation using XSLT, that approach will probably require you to
write less code than would be required to perform the same
transformation by writing a Java program from scratch.
A large library of functions
With the XSLT transformation process, you write a stylesheet, which
is somewhat analogous to a driver program in a more conventional
programming environment. That driver program accesses and
uses functions from a large library of pre-written functions to perform
a series of well-defined operations on the DOM tree to produce
the desired transformation.
(XSLT
authors don't call them functions. Rather, they are called XSLT
elements. According to XML
In A Nutshell, there are 37 standard
XSLT
elements. Also according to XML In A Nutshell, most
XSLT
processors also provide various nonstandard extension elements and
allow you to write your own extension elements in languages such as
Java.)
Is there a similar library of Java methods?
I am not aware of a library of Java methods in the public domain
that emulates the 37 standard XSLT Elements. However, I freely
admit that such a library may exist and I may simply not know
about it.
Therefore, to write a Java program that emulates an XSLT
transformation, you need to either
- Create your own library of Java methods and use that library with your Java code to perform the transformation, or
- Start from scratch each time and write a custom program to perform the transformation.
A skeleton library of Java methods
This lesson, and several lessons to follow this one, will show you
how to write the skeleton of a Java library containing methods that
emulate the most common XSLT elements. Once you have the library,
writing Java code to transform XML documents consists simply of writing
a short driver program to access and use those methods. Thus,
given the proper library of methods, it is no more difficult to write a
driver Java program to perform the transformation than it is to write
an
XSLT stylesheet.
Library is not my primary purpose
However, my primary purpose in these lessons is not to provide such
a library, but rather is to help you understand how to use a DOM
tree to create, modify, and manipulate XML documents. By
comparing Java code that manipulates a DOM tree with similar XSLT
operations, you will have an opportunity to learn a little about XSLT
in the process of learning how to manipulate a DOM tree using Java code.
If you already know a lot about XSLT, you may learn a little
about Java by studying these lessons. If you already know a lot
about Java, you may learn a little about XSLT. If you don't
already know either
Java or XSLT, you may learn a little about both.
Debugging XSLT can be difficult
While writing a Java program to emulate an XSLT Transformation may
require you to write more code than writing a stylesheet, in my
opinion, it is much easier to debug a Java program that fails to
deliver the desired result than it is to debug an XSL stylesheet that
fails to deliver. This is an advantage of
using Java code over XSLT. I find XSLT to be extremely difficult
to debug (but I haven't attempted to
use a fancy XSLT debugger, several of which are freely available on the
Internet).
Java provides more detailed control
Another difference in using Java code relative to XSLT has to do with the detailed control of the transformation process. I believe (but cannot prove) that it is possible to write Java programs to provide transformations that are not possible using standard XSLT elements. If I am correct, this may be another advantage of writing Java code over using XSLT.
Some Details Regarding XSLT
The following is a partial quotation from XML In A Nutshell. (Note that I will be referring to
this excellent book several more times in this lesson. For
brevity, I will refer to it simply as Nutshell.)
"XSLT
is an XML application for specifying rules by which one XML document is
transformed into another XML document. An XSLT document -- that
is, an XSLT stylesheet -- contains template rules. Each template
rule has a pattern and a template. An XSLT processor compares the
elements and other nodes in an input XML document to the template-rule
patterns in a stylesheet. When one matches, it writes the
template from that rule into the output tree. ... XSLT uses the
XPath syntax to identify matching nodes."
My explanation
Let's see if I can explain this process in my own words.
Assume that an XML document has been parsed so as to produce a DOM tree
in memory that represents the XML document. (The creation of a DOM tree in this manner
was discussed in several previous lessons
in this series.)
An XSLT processor starts examining the DOM tree at its root
node. It
obtains instructions from the XSLT stylesheet telling it how to
navigate the
tree, and what to do with each node that it encounters along the way.
Finding matching template rules
As each node is encountered, the processor searches the stylesheet
looking for instructions on how to treat that node. (These instructions will be referred to
later as template rules.) If the processor finds
instructions that match the node type, it performs the operations
indicated by the
instructions. If it doesn't find matching instructions, it
executes built-in instructions appropriate to that node.
(An XML
document can contain seven different types of nodes. The
different types will be identified later. This lesson will
describe and explain the built-in
instructions for six of those seven node types. Java code will be
developed that emulates the built-in
instructions for each of the six types of nodes.)
Establishing the context node
An XPath expression can be
used to point to a specific node and to
establish that node as the context node. Once a context node is
established, there are at least two XSLT elements that can be used to
manage the traversal among children of that node:
- xsl:apply-templates
  - select, optional attribute
  - mode, optional attribute
  - xsl:sort, optional XSLT element
- xsl:for-each
  - select, required attribute
  - xsl:sort, optional XSLT element
The xsl:apply-templates XSLT element
The first of these, xsl:apply-templates,
examines and processes all child nodes of the context node that match
an optional select
attribute.
(When
combined with a default template rule to be discussed later, this often
results in a recursive examination and processing of all descendant
nodes of the context node.)
According to Nutshell,
"The
xsl:apply-templates instruction tells the processor to search for and
apply the highest-priority template in the stylesheet that matches each
node identified by the select attribute."
Applying template rules
As each node is examined, the processor searches the stylesheet to
determine if the XSLT programmer has provided a template rule that
matches the node and defines how that
node should be treated. If a matching template rule is found, the
node is treated in the manner prescribed by the template rule.
Literal text in the XSLT stylesheet elements
You can think of the XSLT process as operating on an input DOM tree
to produce an output DOM tree. If the template rule being applied
contains literal text, that literal text is used to
create a text node in the output tree.
(I will
explain how this feature is used to transform XML documents into XHTML
documents in a future lesson.)
If no match is found
If a matching template rule is not found, the processor executes a
built-in template rule appropriate to the type of node involved.
Built-in template rules are provided by the XSLT processor to handle
the seven different types of nodes in an XML document:
- root node
- element node
- attribute node
- text node
- comment node
- processing instruction node
- namespace node
This lesson will explain the built-in rules that handle the first
six types of nodes in the above list.
Recursion is common
As mentioned earlier, the combination of xsl:apply-templates and a built-in
template rule often produces recursion. Assuming that there is
nothing in a matching template rule that stops
the recursion operation, recursion continues until all descendant nodes
of the original context node have been examined and processed.
The mode attribute
The mode attribute of xsl:apply-templates makes it
possible to cause different template
rules to match nodes of the same type at different places in the DOM
tree.
Sorting
The optional xsl:sort
element makes it possible to modify the
order in which the nodes are examined.
Iterative operation
The second XSLT element in the above list, xsl:for-each, executes an iterative
examination and processing of all child nodes of the context node that
match the required select attribute.
According to Nutshell,
"The
xsl:for-each instruction iterates over the nodes identified by its
select attribute and applies templates to each one."
In other words, the processor visits each node selected by the select attribute, making that node the current node in turn. For each such node, it instantiates the template contained in the body of the xsl:for-each element. Strictly speaking, the processor does not search the stylesheet for a matching template rule at this point. Template rules, including the built-in template rules, come into play only if the body of the xsl:for-each element itself contains an xsl:apply-templates element.
As before, the optional xsl:sort
element makes it possible to modify the
order in which the nodes are examined. I will explain this in
detail in a future lesson.
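As a rough Java analog of iterating over selected child nodes in a sorted order, consider the following sketch. It is not taken from the lesson's program. The class name ForEachSketch, the method names, the element name book, and the attribute name id are all hypothetical. Sorting by plain string comparison is also an assumption, and it only approximates the text ordering that an XSLT processor would apply.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ForEachSketch {
  // Select the child elements of context with the given name,
  // then order them by the value of the attribute named key.
  static List<Element> selectSorted(Element context, String name, String key) {
    List<Element> selected = new ArrayList<>();
    for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
      if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(name)) {
        selected.add((Element) c);
      }
    }
    selected.sort(Comparator.comparing((Element e) -> e.getAttribute(key)));
    return selected;
  }

  // Iterate over the selected nodes and emit the text content of each one.
  static String textOfSelected(String xml, String name, String key) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder sb = new StringBuilder();
      for (Element e : selectSorted(doc.getDocumentElement(), name, key)) {
        sb.append(e.getTextContent());
      }
      return sb.toString();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<shop><book id='b'>Y</book><cd id='x'>Z</cd>"
               + "<book id='a'>X</book></shop>";
    System.out.println(textOfSelected(xml, "book", "id")); // prints "XY"
  }
}
```

The cd element is skipped because it does not match the selection, and the two book elements are visited in attribute order rather than document order.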
Combined operations
Frequently a stylesheet will combine recursive and iterative
operations to produce more complex operations.
Enough talk, let's see some code
I will begin by discussing the XML file named Dom11.xml (shown in Listing 29) along with
the XSL
stylesheet file named Dom11.xsl
(shown in Listing 30).
These two listings are provided near the end of the lesson.
After explaining the transformation produced by applying this
stylesheet to this XML document, I will explain the transformation
produced by applying the empty stylesheet
named Dom11a.xsl, (shown in Listing 33), to a nearly
identical XML document.
A Java program named Dom11
Following that, I will explain a Java program (shown in Listing 31) that emulates the behavior of the stylesheets shown in Listings 30 and 33 when applied to the XML file shown in Listing 29.
I will also explain why the Java program shown in Listing 31 emulates the behavior of the empty stylesheet shown in Listing 33.
Discussion and Sample Code
The XML file shown in Listing 29 is relatively straightforward. A tree view of that XML file is shown in Figure 1.
(The values of the text nodes in Figure 1 were manually highlighted in red to make it easier to refer to those values later in this lesson.)
Figure 1: tree view of the XML file in Listing 29, beginning at the #document DOCUMENT_NODE node.
A database of books
As you may already have figured out, this XML document represents a small database containing information about books. However, the structure and content of this XML file was not intended to have any purpose other than to illustrate the default behavior of the built-in XSLT template rules.
The XSL stylesheet file named Dom11.xsl
The stylesheet file shown in Listing 30 is very important relative to the purpose of this lesson, so I will discuss it in detail.
Recall that an XSL stylesheet is itself an XML file, and can therefore be represented as a tree. I will begin by showing you an abbreviated version of a tree view of the stylesheet, as shown in Figure 2.
Figure 2: abbreviated tree view of the stylesheet in Listing 30, beginning at the #document DOCUMENT_NODE node.
Why abbreviated?
I refer to this as an abbreviated version because I manually deleted comment nodes and extraneous text nodes in order to emphasize the important elements in the document.
The root element
The root node of all XML documents is the document node. However, in addition to the root node, there is also a root element.
As you can see from Figure 2, the root element in the XSL document is of type xsl:stylesheet. The root element has two attributes, each of which is standard for XSL stylesheets.
The first attribute points to the XSLT namespace URI, which you can read about in the W3C Recommendation. The second attribute provides the XSLT version. According to Nutshell, the version must be 1.0. Also, according to Nutshell,
Unable to verify this behavior
I have been unable to verify this behavior experimentally. When I delete a character from the XSL namespace URI and then load the XML file into IE 6.0, there is simply no output. The browser screen remains blank. When I modify the XSL namespace URI and attempt to use JAXP to apply the stylesheet to the XML file, the system throws several errors and the program aborts. Neither approach seems to "output the stylesheet itself" as indicated by Nutshell.
Children of the root element node
As you can see from Figure 2, the root element node has two child nodes, both of which are of type xsl:template. Here is what XSLT and XPath On The Edge by Jeni Tennison has to say about xsl:template:
As you can see from the attribute values in Figure 2, a match pattern is provided for both of the xsl:template nodes in Figure 2.
Back to basics
Getting back to XSLT basics, whenever the XSLT processor encounters a node while traversing the DOM tree, it will examine all of the template rules in the stylesheet searching for one whose match pattern matches the node. If it finds a matching template rule, it will execute the instructions contained as elements within the template rule. If it doesn't find a match, it will execute a built-in template rule that matches the node.
An explicit representation of a built-in template rule
Consider the first child node of the xsl:stylesheet root element in Figure 2. Listing 1 shows this template rule in XSL syntax, (extracted from Listing 30).
<xsl:template match="*|/"> |
Matching the root node and element nodes
Consider the match pattern for this template rule (the text value of the attribute named match). According to Nutshell,
"... The forward slash / is an XPath pattern that matches the root node.
This is the first node the processor selects for processing, and therefore this is the first template rule the processor executes (unless a nondefault template rule also matches the root node).
... the vertical bar combines these two expressions so that it matches both the root node and element nodes."
The <xsl:apply-templates/> element
Now consider the <xsl:apply-templates/> element that makes up the body of this template rule. This element causes the processor to process all child nodes of each matching node, examining nodes, searching for matching template rules, and executing the elements embedded in matching template rules along the way. Again, according to Nutshell, still speaking of the template rule in Listing 1,
An explicit representation of a built-in template rule
Once again, the template rule shown in Listing 1 is an explicit representation of one of the built-in template rules. If I were to remove this template rule from the stylesheet, and then apply the stylesheet to the XML document, this template rule would still be applied where appropriate by the XSLT processor, because it is built into the processor.
Handling text nodes by default
Listing 2 shows the template rule, in XSL syntax, that corresponds to the second child node of the root element node in Figure 2. Once again, this is a template rule with a match pattern. This template rule is also an explicit representation of one of the built-in rules, which copies the value of text and attribute nodes into the output document.
<xsl:template match="text()|@*"> |
The text() in the value of the attribute named match is an XPath pattern matching all text nodes. The @* is an XPath pattern matching all attribute nodes. The vertical bar combines the two patterns. Hence, the template rule matches all text and all attribute nodes.
The xsl:value-of element
Once a match is made, the behavior of the rule is governed by the single element that is embedded in the rule. The xsl:value-of element, with a select value of ".", returns the text value of the context or current node. (This is similar to the use of a single period to represent the current directory in some file management systems such as MS-DOS.)
Text value to the output
Therefore, whenever the XSLT processor applies this template rule to a text or attribute node, the text value of that node is sent to the output document (a text node is created in the output tree).
If the node is a text node, the value is simply the text in the node.
If the node is an attribute node, the value is the attribute value, but not the attribute name.
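The same distinction shows up in the DOM API, where getNodeValue returns the text of a text node and the value (not the name) of an attribute node. The following is a minimal sketch under assumed names: the class NodeValueSketch and the one-line book document are hypothetical and are not part of the lesson's program.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueSketch {
  // Returns "attribute name|attribute value|text value" for a tiny document.
  static String demo() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader("<book id=\"7\">Java</book>")));
      Element book = doc.getDocumentElement();
      Node attr = book.getAttributeNode("id"); // an attribute node
      Node text = book.getFirstChild();        // a text node
      return attr.getNodeName() + "|" + attr.getNodeValue() + "|" + text.getNodeValue();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "id|7|Java"
  }
}
```

Only the second and third fields correspond to what the template rule would send to the output. The attribute name is available separately through getNodeName.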
The output
Now it's time for the big question. What does the output look like when the stylesheet shown in Listing 30 is used to transform the XML document shown in Listing 29? The result of such a transformation is shown in Figure 3.
Figure 3: output of the transformation, beginning with the XML declaration <?xml version="1.0" encoding="UTF-8"?>.
The XML declaration
The first line in Figure 3 is an XML declaration that was placed there by the XSLT processor independent of the content of the XML file.
The text in the output
If you compare the text in Figure 3 with the material highlighted in red in Figure 1, you will see that the output produced by this stylesheet, which contains only explicit representations of default template rules, is the concatenation of the values of all the text nodes in the XML document.
Line breaks in the output
The two line breaks following the words Java and rules in Figure 3 correspond to the line breaks in the text portion of the title element shown in Listing 3. (This element was extracted from the original XML file in Listing 29.)
<title>Java |
The remaining line breaks in the XML file shown in Listing 29 occur between XML tags. Therefore, they are not considered to be a part of the text content of any element and they do not appear in Figure 3.
No attribute values in the output
You may have noticed that even though a couple of the elements in the XML file have attributes (see Figure 1), and one of the template rules matches attribute nodes, the attribute values do not appear in the output shown in Figure 3. Nutshell explains this in the following way:
Nutshell goes on to tell us,
Finally, Nutshell tells us,
Applying an empty stylesheet
Now consider the stylesheet shown in Listing 33, as shown in abbreviated tree format in Figure 4.
Figure 4: abbreviated tree view of the empty stylesheet in Listing 33, beginning at the #document DOCUMENT_NODE node.
Unlike Figure 2, the stylesheet represented by Figure 4 doesn't contain any template rules. In fact, except for the root (document) node and the xsl:stylesheet root element node, the stylesheet is completely empty.
Produces exactly the same output
However, the result of applying the empty stylesheet to the XML file discussed earlier produces exactly the same result as was produced by applying the stylesheet shown in Listing 30 and Figure 2 to that XML file.
This is because the two template rules shown in Listing 30 and Figure 2 replicate the behavior of two of the built-in template rules. Therefore, removing them from the stylesheet has no impact on the result produced by applying the stylesheet to the XML file. If they are needed, they are available as built-in rules of the XSLT processor.
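This equivalence can be checked with a short JAXP program. The sketch below is an assumption-laden illustration rather than the lesson's own code. The two stylesheets are reconstructed inline from the match patterns and rule bodies described above, instead of being read from Dom11.xsl and Dom11a.xsl. The class name EmptyStylesheetSketch and the small book document are hypothetical, and the XML declaration is suppressed so that only the text remains.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EmptyStylesheetSketch {
  static final String NS = "http://www.w3.org/1999/XSL/Transform";

  // Explicit versions of the two built-in rules discussed in the text.
  static final String EXPLICIT =
      "<xsl:stylesheet version='1.0' xmlns:xsl='" + NS + "'>"
    + "<xsl:template match='*|/'><xsl:apply-templates/></xsl:template>"
    + "<xsl:template match='text()|@*'><xsl:value-of select='.'/></xsl:template>"
    + "</xsl:stylesheet>";

  // A stylesheet with no template rules at all.
  static final String EMPTY =
      "<xsl:stylesheet version='1.0' xmlns:xsl='" + NS + "'/>";

  // Apply the stylesheet text to the XML text and return the result as a string.
  static String transform(String xsl, String xml) {
    try {
      Transformer t = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(xsl)));
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<book id='7'><title>Java</title><price>9.95</price></book>";
    String explicit = transform(EXPLICIT, xml);
    String empty = transform(EMPTY, xml);
    System.out.println(explicit);
    System.out.println(explicit.equals(empty));
  }
}
```

Both transformations should yield the concatenated text of the title and price elements, and the value of the id attribute should not appear, because the default xsl:apply-templates processes child nodes only and attributes are not children.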
Transformation behavior of an empty stylesheet
Because the two template rules in the previous stylesheet replicate the behavior of two of the built-in template rules, removing those template rules from the stylesheet to produce an empty stylesheet had absolutely no impact on the transformation result. The transformation result produced by the previous stylesheet was identical to the result produced by the empty stylesheet.
According to Nutshell, when you transform an XML document using an empty stylesheet,
Combined output
Whenever the XSLT processor encounters a node for which you haven't defined a matching template rule, the default template rule for that type of node will be applied. Therefore, the total output is often a combination of output produced by template rules that you provide and built-in template rules.
Therefore, if you are going to create a stylesheet containing template rules of your own design, it is very important for you to understand the default behavior provided by the built-in template rules. The total output produced by your stylesheet is very likely to be a combination of the output produced by your template rules and the output produced by the built-in template rules.
Other built-in template rules
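I have explained the behavior of the built-in template rules that cover the first four types of nodes in the following list. The remaining two types (comment nodes and processing instruction nodes) are covered later in this lesson: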
- root node
- element node
- attribute node
- text node
- comment node
- processing instruction node
A Java program that emulates the built-in template rules
Now let's change direction and concentrate on Java code rather than XSLT elements. The following paragraphs describe a Java program named Dom11.
The primary purposes of this lesson are to:
- Demonstrate Java code that replicates the behavior of the built-in template rules for six of the seven possible types of nodes.
- Provide a skeleton program that can be expanded later to provide more complex behavior.
As such, the program serves as the skeleton for the definition of custom template rules.
Behavior of the program
As written, this program extracts and concatenates all text values from a specified XML file, and writes that text into a result file, using two different approaches:
- An XSLT transformation operating under program control.
- Program code that emulates the behavior of the XSLT transformation.
As you saw in the earlier discussion, both XSL files produce the same result when processed against the XML files named Dom11.xml and Dom11a.xml, demonstrating the behavior of the built-in template rules. The execution of these built-in template rules causes the contents of every text node to be concatenated and written into the result file.
The program code in this program emulates those built-in template rules and produces the same results.
Usage instructions
The program requires three command line arguments in the following order:
- The name of the input XML file - must be Dom11.xml or Dom11a.xml.
- The name of the output file to be produced by the XSLT transformation.
- The name of the output file to be produced by the program code that emulates the XSLT transformation.
The program begins by executing code to transform the incoming XML file in a way that mimics the XSLT transformation. Along the way, it saves the processing instructions (one of which contains the name of the stylesheet file) for later use by the code that governs the XSLT transformation process. (Otherwise, the code that performs the XSLT transformation later would have to search the DOM tree for the XSL stylesheet file name.)
The name of the XSL stylesheet file is extracted from the processing instruction in the XML file. Then the program uses the XSL style sheet to transform the XML file into a result file.
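One way to pull a stylesheet name out of a processing instruction is sketched below. This is not the approach used in Listing 31, which saves the processing instructions while traversing the tree. Here the code simply scans the children of the document node. The class name StylesheetPiSketch, the helper names, and the regular expression for the href pseudo-attribute are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class StylesheetPiSketch {
  // Matches href="..." or href='...' inside the processing instruction data.
  static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']");

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  // Returns the href of the first xml-stylesheet instruction, or null if none.
  static String stylesheetHref(Document doc) {
    for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
        ProcessingInstruction pi = (ProcessingInstruction) n;
        if ("xml-stylesheet".equals(pi.getTarget())) {
          Matcher m = HREF.matcher(pi.getData());
          if (m.find()) {
            return m.group(1);
          }
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Document doc = parse("<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"Dom11.xsl\"?>"
        + "<bookstore/>");
    System.out.println(stylesheetHref(doc)); // prints "Dom11.xsl"
  }
}
```

The XML declaration on the first line is not reported as a processing instruction node, so only the xml-stylesheet instruction is examined.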
Errors, exceptions, and testing
No effort was made to provide meaningful information about errors and exceptions. If an error or exception occurs, the default behavior for that error or exception will occur.
The program was tested using SDK 1.4.2 under WinXP.
Will discuss in fragments
I will discuss this program in fragments. A complete listing of the program is shown in Listing 31 near the end of the lesson.
Listing 4 shows the beginning of the class named Dom11 and the beginning of the main method.
public class Dom11{
|
Then the code in Listing 4 provides usage instructions based on command-line arguments.
Parse the input XML file
The code in Listing 5 parses the input XML file, producing an object of type Document, which is a DOM tree in memory.
try{
|
There is nothing new in the code in Listing 5. I have discussed the code required to create a Document object in several previous lessons beginning with the lesson entitled Java API for XML Processing (JAXP), Getting Started.
As you saw in those earlier lessons, creating a Document object involves three steps:
- Create a DocumentBuilderFactory object
- Use the DocumentBuilderFactory object to create a DocumentBuilder object
- Use the DocumentBuilder object to create a Document object
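The three steps above can be sketched as follows. This is a minimal illustration rather than a copy of Listing 5. The class name ParseStepsSketch is hypothetical, and the input is an inline string (an assumption made to keep the example self-contained) instead of the file named on the command line.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseStepsSketch {
  static Document parse(String xml) {
    try {
      // Step 1: create a DocumentBuilderFactory object.
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Step 2: use the factory to create a DocumentBuilder object.
      DocumentBuilder builder = factory.newDocumentBuilder();
      // Step 3: use the builder to create a Document object.
      return builder.parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    Document document = parse("<bookstore><book/></bookstore>");
    System.out.println(document.getDocumentElement().getNodeName()); // prints "bookstore"
  }
}
```

To parse a file instead, the same DocumentBuilder also accepts a java.io.File in its parse method.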
Transformation through program code
The code in Listing 6 begins the process of transforming the DOM tree into an output file through the execution of program code (as opposed to an XSLT transformation).
The code begins by instantiating a new object of the Dom11 class.
Dom11 thisObj = new Dom11(); |
Then the program gets an output stream for the output produced by the program code. This stream points to an output file that was specified by the third command-line parameter.
Process the DOM tree
The code in Listing 7 invokes the processDocumentNode method to process the DOM tree. This method (and the methods that it calls) begins with the Document node, and processes all the nodes in the DOM tree to produce the required output.
thisObj.processDocumentNode(document); |
Set the main method aside
My explanation of this program will follow the execution thread through the program. At this point, I will set the discussion of the main method aside temporarily and come back to it later when the processDocumentNode method returns control to the main method.
The processDocumentNode method
The entire processDocumentNode method is shown in Listing 8.
void processDocumentNode(Node node){
|
Invoke the processNode method
Despite the name that I chose to give to the processDocumentNode method, it doesn't actually process the document node directly. Rather, after sending any required text to the output, it invokes the method named processNode to actually process the document node.
When the DOM tree has been processed ...
When the processNode method returns, (after the entire DOM tree has been processed), the processDocumentNode method flushes the output stream and returns control to the main method.
As you will see later, subsequent code in the main method invokes a method that will perform an XSLT transformation on the XML file and write the output into a different output file. I will discuss that method later in this lesson.
The processNode method
There are seven possible types of nodes in an XML document:
- root or document node
- element node
- attribute node
- text node
- comment node
- processing instruction node
- namespace node
Get and save the node type
The beginning of the processNode method is shown in Listing 9. Note that the method receives an incoming parameter, which is a reference to an object of type Node. This can include any of the seven node types that can occur in a DOM tree.
If the parameter doesn't point to an actual object, the method simply returns, as opposed to throwing a NullPointerException.
void processNode(Node node){ |
Process the node
Each time the processNode method is invoked, it receives a Node object's reference as an incoming parameter. The code in Listing 9 determines the type of the incoming node. Listing 10 shows the beginning of a switch statement that is used to initiate the processing of each incoming node based on its type.
switch (type){
|
The DOCUMENT_NODE case
The code in Listing 10 will be executed whenever the incoming method parameter points to a document node.
DOCUMENT_NODE is a constant (public static final variable) that is defined in the Node interface. (The interface provides similar constants for all node types other than namespace nodes.) These constants can be used to distinguish between different node types.
Will invoke default behavior in this case
Note that the code in the case in Listing 10 is an if/else construct. If the conditional clause in the if statement evaluates to true (which is not possible in this case), the code in the if statement will be executed. (This is where I will place the code for custom template rules in subsequent lessons.)
If the conditional clause in the if statement does not evaluate to true, the code in the else statement will be executed. (This is where I have placed the code that mimics the built-in template rules.)
Note that the code in the else statement in Listing 10 invokes a method named defElOrRtNodeTemp. When I discuss this method momentarily, you will see that its behavior mimics one of the built-in template rules that I discussed earlier in this lesson. Before getting to that, however, I want to give you a preview of how I will define custom template rules in future lessons.
Creating custom template rules
As you will see in subsequent lessons, the process for creating a custom template rule is as follows:
- Go to the method named processNode, which I am discussing right now.
- Identify the case for the node type in the switch statement.
- Change the conditional clause in the if statement for that case to implement a match for a particular node of that type.
- Write code in the body of the if statement to implement the custom template rule.
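The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from Dom11: the class name TemplateSwitchSketch, the StringBuilder used as the output, the defaultRule helper, and the custom rule that matches elements named title are all hypothetical. In the program discussed in this lesson, the conditional clause is never true.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class, not part of Dom11.
class TemplateSwitchSketch {
    final StringBuilder out = new StringBuilder(); // stands in for the output stream

    void processNode(Node node) {
        if (node == null) {
            return;
        }
        int type = node.getNodeType();
        switch (type) {
            case Node.DOCUMENT_NODE: {
                if (false) {
                    // A custom rule for the document node would go here.
                } else {
                    defaultRule(node); // mimic the built-in rule
                }
                break;
            }
            case Node.ELEMENT_NODE: {
                if (node.getNodeName().equals("title")) {
                    // Hypothetical custom rule: put the title text in brackets.
                    out.append('[').append(node.getTextContent()).append(']');
                } else {
                    defaultRule(node); // mimic the built-in rule
                }
                break;
            }
            case Node.TEXT_NODE: {
                out.append(node.getNodeValue());
                break;
            }
            default: {
                // Other node types send nothing to the output in this sketch.
            }
        }
    }

    // Simplified stand-in for the default behavior: process every child node.
    void defaultRule(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            processNode(children.item(i));
        }
    }

    static String transform(String xml) {
        TemplateSwitchSketch sketch = new TemplateSwitchSketch();
        sketch.processNode(parse(xml));
        return sketch.out.toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

For example, calling transform with the string <book><title>Java</title><author>Baldwin</author></book> returns [Java]Baldwin. The title element is handled by the custom rule, and everything else falls through to the default behavior.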
The ELEMENT_NODE case
Before getting to the discussion of the method named defElOrRtNodeTemp, I want to show you the ELEMENT_NODE case in Listing 11.
case Node.ELEMENT_NODE:{ |
As before, the code in the if statement is not reachable in this program.
The method named defElOrRtNodeTemp
Still following the execution thread, I will set my discussion of the switch statement aside temporarily and discuss the method named defElOrRtNodeTemp. As mentioned above, this method is invoked as the default behavior for document nodes and element nodes in Listings 10 and 11.
I will return to my discussion of the switch statement shortly.
The entire method named defElOrRtNodeTemp is shown in Listing 12.
void defElOrRtNodeTemp(Node node) |
This method mimics the behavior of the built-in XSLT template rule shown in Listing 1, and repeated in Figure 5 below for convenient viewing.
<xsl:template match="*|/"> Figure 5 |
As I indicated earlier, the match pattern for this template rule matches the document node and all element nodes.
Code is straightforward
The code in this method is relatively straightforward. First it tests to confirm that the incoming parameter points to a node of the correct type, and throws an exception if the incoming parameter is not of the correct type.
If the incoming parameter is of the correct type, the code in the method invokes a method named applyTemplates passing the node as a parameter to that method.
The method named applyTemplates
Continuing to follow the execution thread, I will now discuss the method named applyTemplates, shown in Listing 13.
void applyTemplates(Node node,String select){ |
The applyTemplates method partially emulates the XSLT apply-templates element discussed earlier in this lesson, and shown in Figure 6.
<xsl:apply-templates Figure 6 |
The apply-templates element has two attributes, select and mode.
As I explained earlier in this lesson,
Behavior of the method named applyTemplates
The applyTemplates method shown in Listing 13 receives two incoming parameters:
- The context node.
- The select parameter.
The code in Listing 13 invokes the getChildNodes method on the context node to get a list of all child nodes of the context node. If there are no child nodes, it quietly returns.
A recursive method call
If there are child nodes, the method uses a for loop to process all child nodes that match the select parameter as described above.
For each matching child node, the applyTemplates method makes a recursive call to the method named processNode, passing the child node's reference as a parameter to the processNode method.
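The interplay among processNode, defElOrRtNodeTemp, and applyTemplates can be sketched as follows. This is a minimal sketch, not the code from Listings 12 and 13. The class name ApplyTemplatesSketch and the StringBuilder output are hypothetical. The matching rule is also an assumption made for this sketch: a null select value selects every child node, and any other value selects only child nodes with that node name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class, not part of Dom11.
class ApplyTemplatesSketch {
    final StringBuilder out = new StringBuilder(); // stands in for the output stream

    // Mimics the built-in rule for the document node and element nodes.
    void defElOrRtNodeTemp(Node node) {
        short type = node.getNodeType();
        if (type != Node.DOCUMENT_NODE && type != Node.ELEMENT_NODE) {
            throw new IllegalArgumentException(
                    "Not a document or element node: " + node.getNodeName());
        }
        applyTemplates(node, null); // assumption: null means "all children"
    }

    void applyTemplates(Node node, String select) {
        NodeList children = node.getChildNodes();
        // With no child nodes the loop body never runs and the method quietly returns.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (select == null || select.equals(child.getNodeName())) {
                processNode(child); // recursion: may lead back to applyTemplates
            }
        }
    }

    void processNode(Node node) {
        if (node == null) {
            return;
        }
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.ELEMENT_NODE:
                defElOrRtNodeTemp(node);
                break;
            case Node.TEXT_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                break; // nothing is sent to the output for other node types
        }
    }

    static String run(String xml) {
        ApplyTemplatesSketch sketch = new ApplyTemplatesSketch();
        sketch.processNode(parse(xml));
        return sketch.out.toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The recursion ends naturally when a node has no child nodes, at which point each pending call returns to its caller.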
Return to defElOrRtNodeTemp method
Eventually, the recursive process will end, and control will return to the defElOrRtNodeTemp method shown in Listing 12. From there, control will return to either the DOCUMENT_NODE case or the ELEMENT_NODE case in the switch statement in Listing 10 or Listing 11 from which the defElOrRtNodeTemp method was called.
That, in turn, brings us back to a discussion of the other cases in the switch statement.
The TEXT_NODE and ATTRIBUTE_NODE cases
The next two cases from the switch statement that I will discuss are shown in Listing 14. (The switch statement began in Listing 10.)
Listing 14 shows the cases for text nodes and attribute nodes. I have grouped these two cases together because the default behavior of both cases is to invoke the method named defTextOrAttrTemp, and to send the String returned by that method to the output.
case Node.TEXT_NODE:{ |
Once again following the execution thread, I will now discuss the method named defTextOrAttrTemp, shown in Listing 15. This method is called whenever:
- The processNode method is called with a reference to either a text node or an attribute node, and
- The default behavior for the node type is executed.
String defTextOrAttrTemp(Node node) |
This method emulates the built-in XSLT template rule shown in Listing 2 and repeated in Figure 7 below for convenient viewing.
<xsl:template match="text()|@*"> Figure 7 |
As I told you earlier, this template rule matches all text nodes and all attribute nodes. Therefore, the defTextOrAttrTemp method is invoked by the default behavior of either the TEXT_NODE case or the ATTRIBUTE_NODE case in the switch statement in Listing 14.
Similar behavior
Once again, note the similarity between the method named defTextOrAttrTemp in Listing 15 and the template rule shown in Figure 7.
In Figure 7, the template rule executes the xsl:value-of XSLT element to send the value of the context node to the output.
The method shown in Listing 15 invokes a method named valueOf, passing "." as a parameter (note the period between the quotation marks). The value returned by that method is sent to the output by the code in the default behaviors of the two cases in Listing 14.
The method named valueOf
The method named valueOf, which begins in Listing 16, is fairly complex. I will discuss portions of this method in this lesson and will discuss the remainder of the method in subsequent lessons.
This method emulates an <xsl:value-of select="???"/> XSLT element.
Three forms of method call
The method requires two parameters. The first parameter is of type Node, and is the context node. The second parameter is of type String and is a select parameter.
The valueOf method recognizes three forms of call:
- valueOf(Node theNode,String "@attrName")
- valueOf(Node theNode,String ".")
- valueOf(Node theNode,String "nodeName")
In the second form, which is the only form actually used in this program, the value of the select parameter is a String containing a single period. In this form, the method returns the concatenated text values of the context node and all descendants of the context node (including text nodes that are children of the context node).
In the third form, the method returns the concatenated text values of all descendants of a specified child node of the context node. If the context node has more than one child node with the specified name, only the first one found is processed. The others are ignored.
Features not supported
The valueOf method does not support the following features, which are standard features of the xsl:value-of XSLT element:
- disable-output-escaping
- processing instruction nodes
- comment nodes
- namespace nodes
Since the second form of call listed above is the only form actually used in this program, I will discuss only those portions of the method that support that form. I will defer discussion of the other portions of the method until they are used in subsequent lessons.
Process the context node
The code in Listing 16 picks up at the point where it is determined that the incoming value for select is a String object's reference with a value of "." (note the period between the quotation marks). This is a request to return the value of the context node.
This method supports two possibilities for the context node:
- Element node - return the concatenated text values of all descendant nodes of the context node.
- Text node - return the text value of the text node.
When the context node is an element node ...
The code in Listing 16 shows the beginning of the code required to process the context node as an element node.
public String valueOf(Node node,String select){
|
In preparation for processing all descendant nodes of the context node, the code in Listing 17 gets a list of child nodes, along with the length of the list.
In addition, the code in Listing 17 initializes a String variable named nodeTextValue that will be used to collect the concatenated text values of the descendant nodes. Note that this variable is initialized to contain an empty string.
NodeList childNodes = |
Having gotten a list of child nodes of the context node, all that is required to accomplish the objective is to make a series of recursive calls to the valueOf method, passing each child node in turn to the valueOf method as shown in Listing 18.
for(int j = 0; j < listLen; j++){
|
Concatenation
The code in Listing 18 also deals with concatenation. The value returned from each call to the valueOf method is concatenated with the text value already stored in the variable named nodeTextValue.
Finally, after all child nodes have been processed, the code in Listing 18 returns the concatenated value stored in the variable named nodeTextValue.
When the context node is a text node ...
If you understood all of the above (including the recursion), you should find it easy to understand the code shown in Listing 19. Listing 19 shows the case where the context node is a text node.
}else if(nodeType == Node.TEXT_NODE){ |
One other possibility
There is one other possibility that is handled by the code in Listing 20. That possibility is that the context node is neither a text node nor an element node. In that case, the valueOf method returns an empty string.
}else{
|
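The logic described for Listings 16 through 20 can be sketched as a single method. This is a minimal sketch that handles only the form of call where select is ".". It is not the code from Dom11. The class name ValueOfSketch is hypothetical, and throwing an exception for the other two forms of call is an assumption made to keep the sketch short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class, not part of Dom11.
class ValueOfSketch {

    static String valueOf(Node node, String select) {
        if (!".".equals(select)) {
            // The other forms of call are outside the scope of this sketch.
            throw new UnsupportedOperationException("Only select=\".\" is sketched");
        }
        int nodeType = node.getNodeType();
        if (nodeType == Node.ELEMENT_NODE) {
            // Concatenate the text values of all descendant nodes.
            NodeList childNodes = node.getChildNodes();
            int listLen = childNodes.getLength();
            StringBuilder nodeTextValue = new StringBuilder();
            for (int j = 0; j < listLen; j++) {
                nodeTextValue.append(valueOf(childNodes.item(j), "."));
            }
            return nodeTextValue.toString();
        } else if (nodeType == Node.TEXT_NODE) {
            // The recursion ends here: a text node contributes its own text.
            return node.getNodeValue();
        } else {
            // Neither an element node nor a text node.
            return "";
        }
    }

    // Mimics the built-in rule for text and attribute nodes by delegating to valueOf.
    static String defTextOrAttrTemp(Node node) {
        return valueOf(node, ".");
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

For the document <a>one<b>two<c>three</c></b><!--x-->four</a>, calling valueOf on the root element with a select value of "." returns onetwothreefour. The comment contributes an empty string.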
Returning to the switch statement that began in Listing 10, we find two additional cases, each of which invokes the same method by default:
- COMMENT_NODE
- PROCESSING_INSTRUCTION_NODE
case Node.COMMENT_NODE:{ |
I will discuss the defComOrProcInstrTemp method shortly. First, however, I will explain the extra code that appears in the default portion of the processing instruction node case in Listing 21.
The purpose of a processing instruction in an XML file is to provide instructions to processing programs such as this one. The XML file shown in Listing 29 contains the three processing instructions shown in Listing 22.
<?dummy-target dummy-data="def"?> |
The first and third of the three processing instructions are dummy processing instructions put there to test the capabilities of this program. However, the processing instruction in the middle is a real processing instruction that specifies the name of the file containing a stylesheet. That stylesheet will be used later when this program causes an XSLT transformation to take place using the XML file in Listing 29, and the stylesheet file identified in Listing 22. (That stylesheet actually appears in Listing 30.)
In order to use that processing instruction to identify the stylesheet file, this program must capture the processing instruction and extract the file name from it. A statement in the second case in Listing 21 causes references to all processing instruction nodes to be saved in a static variable of the Dom11 class named procInstr.
That information will be used later to extract the name of the stylesheet file from the processing instruction.
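Here is a minimal sketch of capturing processing instruction nodes while walking a DOM tree. It is not the code from Listing 21. The class name ProcInstrCollectorSketch is hypothetical, the sketch returns a List instead of storing the references in a static Vector, and the file name example.xsl in the usage note is made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical class, not part of Dom11.
class ProcInstrCollectorSketch {

    // Returns every processing instruction found at or below the given node.
    static List<ProcessingInstruction> collect(Node node) {
        List<ProcessingInstruction> found = new ArrayList<>();
        walk(node, found);
        return found;
    }

    private static void walk(Node node, List<ProcessingInstruction> found) {
        if (node == null) {
            return;
        }
        if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
            found.add((ProcessingInstruction) node); // save the reference for later
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), found);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Given a document whose prolog contains <?dummy-target dummy-data="def"?> followed by <?xml-stylesheet type="text/xsl" href="example.xsl"?>, the returned list holds both nodes in document order. The XML declaration itself is not reported as a processing instruction node.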
The defComOrProcInstrTemp method
Both of the switch cases shown in Listing 21 invoke this method as their default behavior. A complete listing of the defComOrProcInstrTemp method is shown in Listing 23.
String defComOrProcInstrTemp(Node node) |
<xsl:template Figure 8 |
According to Nutshell, the built-in template rule for comments and processing instructions doesn't output anything into the output tree. Therefore, the defComOrProcInstrTemp method shown in Listing 23 simply returns an empty string.
The namespace node case
The default case for the switch statement begun in Listing 10 is shown in Listing 24.
default:{
|
Also, here is what Nutshell has to say about the built-in template rule for namespace nodes:
Therefore, the default case in Listing 24, which catches all namespace nodes, doesn't send anything to the output.
End of the processNode method
I have discussed everything of significance in the processNode method. Continuing to follow the execution thread, I will now turn my attention back to the main method.
Perform an XSLT transformation
After the code has been executed to process the document using program code (beginning with the invocation of the processDocumentNode method in Listing 7), the statement in Listing 25 invokes the doXslTransform method to cause the XML document to be transformed using the stylesheet identified in one of the processing instructions in the XML file.
thisObj.doXslTransform( |
The success of the method call in Listing 25 depends on the stylesheet processing instruction having been saved while the document was being processed. Otherwise, it would be necessary to add code in this method to search the DOM tree for the stylesheet processing instruction.
All processing instructions are saved in a Vector object by this program. The Vector object's reference is passed as the third parameter to this method. The first parameter is a reference to the Document or root node in the DOM tree. The second parameter is the name of the output file.
The doXslTransform method
The doXslTransform method begins in Listing 26. This method uses an XSLT stylesheet file to transform an incoming Document object into an output file. A large portion of the code in this method is dedicated to:
- Identifying the processing instruction containing the stylesheet information.
- Extracting the stylesheet information from the processing instruction.
The code in Listing 26 searches the Vector object seeking a processing instruction node that contains a stylesheet reference.
void doXslTransform(Document document, |
To see how this code works, first take a look at the processing instruction in the XML file that contains the stylesheet reference. This processing instruction was shown in Listing 22, and is repeated below in Figure 9 for convenient viewing.
<?xml-stylesheet Figure 9 |
The purpose of a processing instruction is to provide information to processing programs that will be used to process the XML file.
Format of a processing instruction
According to Nutshell,
Applying this knowledge to the stylesheet processing instruction in Figure 9, you can see that the target consists of the following text: xml-stylesheet.
Accessing the target and the data
The target of a processing instruction node can be accessed in Java by invoking the getTarget method on the processing instruction node's reference.
The remainder of the text in the processing instruction can be accessed by invoking the getData method on the same reference.
The code in Listing 26 examines each of the objects in the Vector, invoking getTarget and getData to search for a processing instruction whose target and data match what is expected of a stylesheet reference. When a match is found, the code breaks out of the for loop.
If no match is found, the code in Listing 26 throws an exception.
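The search described above can be sketched as follows. This is a minimal sketch, not the code from Listing 26. The class name StylesheetPiFinderSketch and the newPi helper are hypothetical, a List is used in place of the Vector, and the matching test (a target of xml-stylesheet plus data that contains href) is an assumption about what constitutes a match.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;

// Hypothetical class, not part of Dom11.
class StylesheetPiFinderSketch {

    // Returns the first processing instruction that looks like a stylesheet reference.
    static ProcessingInstruction findStylesheetPi(List<ProcessingInstruction> procInstr) {
        for (ProcessingInstruction pi : procInstr) {
            boolean targetMatches = "xml-stylesheet".equals(pi.getTarget());
            boolean dataMatches = pi.getData() != null && pi.getData().contains("href");
            if (targetMatches && dataMatches) {
                return pi; // stop searching at the first match
            }
        }
        throw new IllegalStateException("No stylesheet processing instruction found");
    }

    // Helper for experiments: creates a stand-alone processing instruction node.
    static ProcessingInstruction newPi(String target, String data) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            return doc.createProcessingInstruction(target, data);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }
}
```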
Extract the stylesheet file name
Having identified the processing instruction that contains the stylesheet reference, the code in Listing 27 uses the getData method of the ProcessingInstruction interface, along with some methods of the String class, to extract the name of the file containing the stylesheet.
String xslFile = pi.getData(). |
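One way to pull the file name out of the data string using only methods of the String class is sketched below. This is a minimal sketch and may differ from the approach taken in Listing 27. The class name StylesheetHrefSketch and the method name extractHref are hypothetical, and the sketch assumes that href= is written without spaces around the equal sign.

```java
// Hypothetical class, not part of Dom11.
class StylesheetHrefSketch {

    // Extracts the quoted value that follows href= in the data of a
    // stylesheet processing instruction.
    static String extractHref(String data) {
        String key = "href=";
        int keyPos = data.indexOf(key);
        if (keyPos < 0) {
            throw new IllegalArgumentException("No href in: " + data);
        }
        int quotePos = keyPos + key.length();
        if (quotePos >= data.length()) {
            throw new IllegalArgumentException("Missing href value in: " + data);
        }
        char quote = data.charAt(quotePos);
        if (quote != '"' && quote != '\'') {
            throw new IllegalArgumentException("href value is not quoted in: " + data);
        }
        // The value runs up to the next occurrence of the same quote character.
        int endPos = data.indexOf(quote, quotePos + 1);
        if (endPos < 0) {
            throw new IllegalArgumentException("Unterminated href value in: " + data);
        }
        return data.substring(quotePos + 1, endPos);
    }
}
```

For the data string type="text/xsl" href="example.xsl", where the file name is made up for illustration, extractHref returns example.xsl.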
Do the XSLT transformation
The remaining code in the doXslTransform method is shown in Listing 28.
//Get a TransformerFactory object |
The code in Listing 28 is not new to this series of lessons. This code was discussed in detail in the earlier lesson entitled Getting Started with Java JAXP and XSL Transformations (XSLT). Therefore, other than to point out one difference relative to the previous code, and to review the steps involved, I won't discuss the code in Listing 28 further in this lesson.
Steps for creating a Transformer object
The following two steps are required to create a Transformer object. Once a Transformer object is available, it can be used to transform one DOM tree into another DOM tree.
- Create a TransformerFactory object by invoking the static newInstance method of the TransformerFactory class.
- Invoke the newTransformer method on the TransformerFactory object.
There is one important difference between the code in Listing 28 and the code in the earlier lesson. The two programs invoke different overloaded versions of the newTransformer method of the TransformerFactory class.
The earlier lesson entitled Getting Started with Java JAXP and XSL Transformations (XSLT) invoked a version that took no parameters and returned a Transformer object that simply copies a source tree to a result tree.
The code in Listing 28 invokes a version of the newTransformer method that takes a Source object representing the stylesheet file as an input parameter and returns a Transformer object that uses the stylesheet to perform an XSLT transformation.
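The difference between the two overloaded versions of newTransformer can be sketched as follows. This is a minimal sketch, not the code from Listing 28. The class name TransformerSketch is hypothetical, and for brevity the sketch reads the XML and the stylesheet from strings rather than from a DOM tree and a file. The stylesheet in BUILT_IN_ONLY_XSL contains no template rules at all, so the output is produced entirely by the built-in template rules.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class, not part of Dom11.
class TransformerSketch {

    // A stylesheet with no template rules: only the built-in rules apply.
    static final String BUILT_IN_ONLY_XSL =
            "<xsl:stylesheet version=\"1.0\" "
            + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "</xsl:stylesheet>";

    // Transforms xml using xsl, or copies it unchanged when xsl is null.
    static String transform(String xml, String xsl) {
        try {
            // Step 1: create a TransformerFactory object.
            TransformerFactory factory = TransformerFactory.newInstance();
            // Step 2: get a Transformer object from the factory.
            Transformer transformer = (xsl == null)
                    ? factory.newTransformer() // copies source to result
                    : factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter result = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)),
                    new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Calling transform with <a>one<b>two</b></a> and BUILT_IN_ONLY_XSL returns onetwo, which is the same text that the default behavior described in this lesson sends to the output. Passing null for the stylesheet produces a copy of the input markup.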
That concludes the discussion of the program named Dom11.
Run the Program
I encourage you to copy the Java code, XML files, and XSL files from the listings near the end of this lesson. Compile and execute the programs. Experiment with them, making changes, and observing the results of your changes.
Summary
I explained default XSLT behavior and showed you how to write Java code that mimics that behavior. The resulting Java code serves as a skeleton for more advanced transformation programs.
What's Next?
In the next lesson, I will show you how to write a Java program that mimics an XSLT transformation for converting an XML file into a text file. I will also show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
Complete Program Listings
<?xml version="1.0"?> |
<?xml version='1.0'?> |
/*File Dom11.java |
<?xml version="1.0"?> |
<?xml version='1.0'?> |
Copyright 2004, Richard G. Baldwin. Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.
About the author
Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.
Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.
Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.
-end-
