Java JAXP, Writing Java Code to Emulate an XSLT Transformation
Java Programming Notes # 2208
- Preface
- Preview
- Some Details Regarding XSLT
- Discussion and Sample Code
- Run the Program
- Summary
- What's Next?
- Complete Program Listings
Preface
In the previous lesson entitled Java JAXP, Implementing Default XSLT Behavior in Java, I explained default XSLT behavior, and showed you how to write Java code that mimics default XSLT behavior.
The Java program named Dom11 that I developed in that lesson serves as
a skeleton for more
advanced
transformation programs.
This lesson updates Dom11 into a new program that tests and
exercises several methods that were not
tested by the samples used in the previous lesson.
I will show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
JAXP
JAXP is an
API designed
to help you write programs for creating and processing XML
documents. It is a critical part of Sun's Java Web Services Developer
Pack
(JWSDP).
This lesson is one in a series designed to help you understand how to use JAXP and how to use the JWSDP.
The first lesson in the series was entitled Java API for XML Processing (JAXP), Getting Started. The previous lesson was entitled Java JAXP, Implementing Default XSLT Behavior in Java.
XML
XML is an acronym for the eXtensible Markup Language. I will assume that you already understand XML, and will teach you how to use JAXP to write programs for creating and processing XML documents.
XSL and XSLT
XSL is an acronym for Extensible Stylesheet Language. XSLT is an acronym for XSL Transformations.
- Transforming non-XML documents into XML documents.
- Transforming XML documents into other XML documents.
- Transforming XML documents into non-XML documents.
Viewing tip
You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.
Supplementary material
I recommend that you also study the other lessons in my extensive collection of online Java and XML tutorials. You will find those lessons published at Gamelan.com. As of the date of this writing, Gamelan doesn't maintain a consolidated index of my tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com.
Preview
A tree structure in memory
A DOM parser can be used to
create a tree structure in memory that represents an XML
document. In Java, that tree structure is encapsulated in an
object of the interface type Document.
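As a minimal sketch of that first step, the following code parses a small XML string into a Document object using the standard JAXP DocumentBuilderFactory. The class name ParseDemo and the sample XML are illustrative assumptions and are not taken from the programs in this series.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; ParseDemo is not part of the lesson's code.
class ParseDemo {
    // Parse XML text into an in-memory DOM tree. Checked exceptions are
    // wrapped in an unchecked exception to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<top><theData><title>Java</title></theData></top>");
        // The Document object is itself a node of type DOCUMENT_NODE.
        System.out.println(doc.getNodeType() == Node.DOCUMENT_NODE);
        System.out.println(doc.getNodeName());
    }
}
```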
Many operations are possible
Given an object of type Document (often called a DOM tree), there
are many
methods that
can be invoked on the object to perform a variety of operations.
For example, it is possible to write Java code to:
- Move nodes from one location in the tree to another location in the tree
- Delete nodes
- Insert new nodes
- Recursively traverse the tree, extracting information about the nodes along the way
- Various combinations of the above
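The operations in the list above can be sketched with standard DOM methods. This is only an illustration under assumed names (TreeOpsDemo and the elements a, b, c, and d are hypothetical); it moves, deletes, and inserts nodes, and then traverses the tree recursively to report the element names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's code.
class TreeOpsDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Recursively traverse the tree, collecting element names in document order.
    static void collectNames(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(node.getNodeName()).append(' ');
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectNames(c, out);
        }
    }

    static String demo() {
        Document doc = parse("<top><a/><b/><c/></top>");
        Element top = doc.getDocumentElement();
        // Move: appending a node that is already in the tree relocates it.
        top.appendChild(top.getFirstChild());      // children are now b, c, a
        // Delete: remove the current first child (b).
        top.removeChild(top.getFirstChild());      // children are now c, a
        // Insert: create a new element and place it before the first child.
        top.insertBefore(doc.createElement("d"), top.getFirstChild());
        StringBuilder out = new StringBuilder();
        collectNames(doc, out);
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "top d c a"
    }
}
```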
Two ways to transform an XML document
There are at least two ways to transform the contents of an XML
document into another document:
- By writing Java code to manipulate the DOM tree and perform the transformation.
- By using XSLT to perform the transformation.
As is usually the case, there are advantages and disadvantages to
both approaches.
As an example of an advantage provided by XSLT, if it is possible to perform the required transformation using XSLT, that approach will probably require you to write less code than would be required to perform the same transformation by writing a Java program from scratch. However, I will show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
Debugging XSLT can be difficult
In my
opinion, it is much easier to debug a Java program than it is to debug
an XSL stylesheet that doesn't work properly. However, the use of
a good XSLT debugger may resolve that difference.
Java provides more detailed control
A skeleton library of Java methods
This is one of several lessons that show you
how to write the skeleton of a Java library containing methods that
emulate the most common XSLT elements. Once you have the library,
writing Java code to transform XML documents consists mainly of writing
a short driver program to access and use those methods. Thus,
given the proper library of methods, it is no more difficult to write a
Java program to perform the transformation than it is to write
an
XSLT stylesheet.
Library is not my primary purpose
However, my primary purpose in these lessons is not to provide such a library, but rather is to help you understand how to use a DOM tree to create, modify, and manipulate XML documents. By comparing Java code that manipulates a DOM tree with similar XSLT operations, you will have an opportunity to learn a little about XSLT in the process of learning how to manipulate a DOM tree using Java code.
Some Details Regarding XSLT
Assume that an XML document has been parsed to produce a DOM
tree
in memory that represents the XML document.
An XSLT processor starts examining the DOM tree at its root
node. It
obtains instructions from the XSLT stylesheet telling it how to
navigate the
tree, and how to treat each node that it encounters along the way.
Finding and applying matching template rules
As each node is encountered, the processor searches the stylesheet
looking for a template rule that governs how to treat nodes of that
type. If the
processor finds
a template rule that matches the node type, it performs the operations
indicated by the template rule. If it doesn't find a matching
template rule, it
executes a built-in template rule appropriate to that node. (I explained the behavior of the built-in
template rules in the previous lesson.)
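The search for a matching rule, with a fallback to built-in behavior, might be sketched in Java roughly as follows. This is an assumption-laden illustration and not the code from Dom11 or Dom12: the class RuleDispatchDemo, the rule table keyed by element name, and the sample rule for price are all hypothetical. The built-in behavior shown here processes the children of document and element nodes, copies text nodes to the output, and ignores comments and processing instructions.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's code.
class RuleDispatchDemo {
    // Hypothetical rule table: element name -> code that handles that element.
    private final Map<String, BiConsumer<Node, StringBuilder>> rules = new HashMap<>();

    void addRule(String elementName, BiConsumer<Node, StringBuilder> rule) {
        rules.put(elementName, rule);
    }

    // Use a matching rule if one exists; otherwise use the built-in behavior.
    void process(Node node, StringBuilder out) {
        BiConsumer<Node, StringBuilder> rule =
                node.getNodeType() == Node.ELEMENT_NODE ? rules.get(node.getNodeName()) : null;
        if (rule != null) {
            rule.accept(node, out);
        } else {
            builtIn(node, out);
        }
    }

    // Built-in behavior: recurse into children, copy text, ignore the rest.
    void builtIn(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.ELEMENT_NODE:
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    process(c, out);
                }
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                break; // comments and processing instructions produce no output
        }
    }

    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                    new InputSource(new StringReader(
                            "<top><title>Java</title><price>$9.95</price></top>")));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        RuleDispatchDemo dispatcher = new RuleDispatchDemo();
        dispatcher.addRule("price", (node, out) -> out.append("[price hidden]"));
        StringBuilder out = new StringBuilder();
        dispatcher.process(doc, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Java[price hidden]"
    }
}
```

Only price has a rule in this sketch, so title falls through to the built-in behavior and its text is copied to the output.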
Literal text in the XSLT stylesheet elements
You can think of the XSLT process as operating on an input DOM tree to produce an output DOM tree. If the template rule being applied contains literal text, that literal text is used to create text nodes in the output tree.
Traversing child nodes
An XPath expression can be
used to point to a specific node and to
establish that node as the context node. Once a context node is
established, there are at least two XSLT elements that can be used to
traverse the children of that node:
- xsl:apply-templates
- xsl:for-each
The xsl:apply-templates element
The first of these, xsl:apply-templates,
examines all child nodes of the context node that match
an optional select
attribute. If the optional select attribute is omitted, then
all child nodes of the context node are examined.
(When combined with a default template rule, this often results in a recursive examination and processing of all descendant nodes of the context node.)
Applying template rules
As each child node is examined, it is processed using a matching template rule or a built-in template rule.
Iterative operation
The second XSLT element in the above list, xsl:for-each, performs an iterative examination of all child nodes of the context node that match a required select attribute.
Note that unlike with the xsl:apply-templates element, the select attribute is not optional for this element.
As each selected node is examined, it becomes the context node, and the content of the xsl:for-each element is applied to it directly. (This differs from xsl:apply-templates, which searches for a matching template rule for each node.)
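A rough Java analogue of both elements is a loop over the child nodes of the context node with an optional name filter. In the sketch below, the class TraverseChildrenDemo and the method visitChildren are hypothetical names, a null select value stands for an omitted select attribute, and the select value is assumed to be a simple element name rather than a general XPath expression. For xsl:apply-templates the action would be a rule lookup; for xsl:for-each the action plays the role of the body of the element.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's code.
class TraverseChildrenDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Visit the children of the context node in document order.
    // A null select means "all child nodes"; otherwise only child
    // elements whose name equals select are visited.
    static void visitChildren(Node context, String select, Consumer<Node> action) {
        for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
            boolean matches = select == null
                    || (c.getNodeType() == Node.ELEMENT_NODE
                        && c.getNodeName().equals(select));
            if (matches) {
                action.accept(c);
            }
        }
    }

    static String demo(String select) {
        Document doc = parse("<top><theData/><note/><theData/></top>");
        StringBuilder out = new StringBuilder();
        visitChildren(doc.getDocumentElement(), select,
                n -> out.append(n.getNodeName()).append(';'));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo("theData")); // prints "theData;theData;"
        System.out.println(demo(null));      // prints "theData;note;theData;"
    }
}
```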
Let's see some code
I will begin by discussing the XML file named Dom12.xml (shown in Listing 25 near the end of the
lesson) along with
the XSL
stylesheet file named Dom12.xsl
(shown in Listing 26).
A Java program named Dom12
After explaining the transformation produced by applying this stylesheet to this XML document, I will explain the transformation produced by processing the XML file with a Java program named Dom12 (shown in Listing 24) that mimics the behavior of the XSLT transformation.
Discussion and Sample Code
The XML file shown in Listing 25 is relatively straightforward. A tree view of the XML file is shown in Figure 1. (This XML file is both well-formed and valid.) I used alternating colors of red and blue to identify successive nodes named theData. The reason for doing this will become apparent later.
#document DOCUMENT_NODE
  top DOCUMENT_TYPE_NODE
  #comment COMMENT_NODE
  #comment COMMENT_NODE
  dummy-target PROCESSING_INSTRUCTION_NODE
  xml-stylesheet PROCESSING_INSTRUCTION_NODE
  false-target PROCESSING_INSTRUCTION_NODE
  top ELEMENT_NODE
    theData ELEMENT_NODE
      Attribute: attr=Dummy Attr Value
      title ELEMENT_NODE
        #text Java
        subtitle ELEMENT_NODE
          Attribute: position=Low
          #text really
          part1 ELEMENT_NODE
            #text This is part 1
          part2 ELEMENT_NODE
            #text This is part 2
        #text rules
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $9.95
    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text Python
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $15.42
    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text XML
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $19.60
Figure 1
(This tree view of the XML file was produced using a program named DomTree02, which was discussed in an earlier lesson. Note that in order to make the tree view more meaningful, I manually removed extraneous line breaks and text nodes associated with those line breaks. The extraneous line breaks in Figure 1 were caused by extraneous line breaks in the XML file. The extraneous line breaks in the XML file were placed there for cosmetic reasons and to force it to fit into this narrow publication format.)
A database of books
As you may already have figured out,
this XML document represents a small database containing information
about fictitious books.
It is important to note, however, that the structure and content of
this XML file
was not intended to have any purpose other than to illustrate the
concepts being covered in this lesson. In other words, some of
the structure makes no sense with regard to a database containing
information about books.
The XSLT Transformation
The XSL stylesheet file named Dom12.xsl
Recall that an XSL stylesheet is itself an XML file, and can therefore be represented as a tree. Figure 2 presents an abbreviated tree view of the stylesheet shown in Listing 26. I colored each of the five template rules in this view with alternating colors of red and blue to make them easier to identify visually.
(As is often the case with XSL stylesheets, this stylesheet file is well-formed but it is not valid.)
#document DOCUMENT_NODE
  xsl:stylesheet ELEMENT_NODE
    Attribute: xmlns:xsl=http:
      //www.w3.org/1999/XSL/Transform
    Attribute: version=1.0
    xsl:template ELEMENT_NODE
      Attribute: match=/
      #text A Match Root
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=top
    xsl:template ELEMENT_NODE
      Attribute: match=top
      #text B Match top
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=theData
    xsl:template ELEMENT_NODE
      Attribute: match=theData
      #text C Match theData and show attribute
      xsl:value-of ELEMENT_NODE
        Attribute: select=@attr
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=title
    xsl:template ELEMENT_NODE
      Attribute: match=title
      #text D Match title and show value of title as context
      xsl:value-of ELEMENT_NODE
        Attribute: select=.
      #text E Show value of subtitle
      xsl:value-of ELEMENT_NODE
        Attribute: select=subtitle
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=subtitle
    xsl:template ELEMENT_NODE
      Attribute: match=subtitle
      #text F match subtitle and show value of attribute
      xsl:value-of ELEMENT_NODE
        Attribute: select=@position
      #text G Show value of subtitle as context node
      xsl:value-of ELEMENT_NODE
        Attribute: select=.
Figure 2
Why abbreviated?
The reason that I refer to this as an abbreviated tree view is because I manually deleted comment nodes and extraneous text nodes in order to emphasize the important elements in the stylesheet.
(Extraneous text nodes occur as a result of inserting line breaks in the original XSL document for cosmetic purposes. Note that I also manually entered a line break in the third line of Figure 2 to force the material to fit into this narrow publication format.)
The root element
The root node of all XML documents is the document node. In
addition to the root node, there is also a root element, and it is
important not to confuse the two.
As you can see from Figure 2, the root element in the XSL document is
of type xsl:stylesheet.
The root element has two attributes, each of which is standard for XSL
stylesheets.
The first attribute points to the XSLT namespace URI, which you can
read about in the W3C
Recommendation. The second attribute provides the XSLT
version.
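The distinction between the root node and the root element can be seen directly in the DOM API. The following sketch (the class name RootNodeDemo is hypothetical, and the one-line stylesheet is a stand-in for Dom12.xsl) parses a document that has a comment ahead of the root element, then reports the name of the document node, the name of its first child, the name of the root element, and the value of the version attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's code.
class RootNodeDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String describe() {
        Document doc = parse("<!-- a comment -->"
                + "<xsl:stylesheet"
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
                + " version=\"1.0\"/>");
        // The root node is the Document itself. The root element is the
        // single element child of the Document, which need not be the
        // first child because comments may precede it.
        Element rootElement = doc.getDocumentElement();
        return doc.getNodeName()
                + " | " + doc.getFirstChild().getNodeName()
                + " | " + rootElement.getNodeName()
                + " | " + rootElement.getAttribute("version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // prints "#document | #comment | xsl:stylesheet | 1.0"
    }
}
```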
Children of the root element node
The root element node in Figure 2 has five child nodes, each of which is a template rule. (I discussed template rules in detail in the previous lesson.)
Each of the five child nodes of the root element node has a match pattern. The five match patterns in the order that they appear in Figure 2 are as follows:
- match=/ (root node)
- match=top (matches element node named top)
- match=theData (matches element node named theData)
- match=title (matches element node named title)
- match=subtitle (matches element node named subtitle)
(Note that the Java program discussed later produces essentially the same output as the XSLT transformation.)
The result of performing an XSLT transformation by applying the XSL stylesheet shown in Listing 26 to the XML file shown in Listing 25 is shown in Figure 3.
I will explain the operations in the XSLT transformation that produced each line of text in Figure 3.
<?xml version="1.0" encoding="UTF-8"?>
A Match Root
B Match top
C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java really This is part 1 This is part 2 rules
E Show value of subtitle
really This is part 1 This is part 2
F match subtitle and show value of attribute
Low
G Show value of subtitle as context node
really This is part 1 This is part 2
C Match theData and show attribute
D Match title and show value of title as context
Python
E Show value of subtitle
C Match theData and show attribute
D Match title and show value of title as context
XML
E Show value of subtitle
Figure 3
(Note that I manually deleted a couple of extraneous line breaks from the output shown in Figure 3.)
The first line of text in the output shown in Figure 3 is an XML declaration that is produced automatically by the XSLT transformer available with JAXP.
(Note however, that the existence of this line of text doesn't cause the document to be an XML document. This document cannot be parsed as an XML document. An attempt to do so results in various parser errors.)
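For reference, an XSLT transformation of this kind can be run from Java with the standard javax.xml.transform classes. The sketch below is an illustration only: the class name TransformDemo, the two-rule stylesheet (modeled on the first two template rules of Dom12.xsl), and the tiny XML input are assumptions, and the exact whitespace of the result depends on the transformer in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the lesson's code.
class TransformDemo {
    // A reduced stylesheet with two template rules containing literal text.
    static final String XSL =
            "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " version=\"1.0\">"
            + "<xsl:template match=\"/\">A Match Root\n"
            + "<xsl:apply-templates select=\"top\"/></xsl:template>"
            + "<xsl:template match=\"top\">B Match top\n</xsl:template>"
            + "</xsl:stylesheet>";

    // Apply the stylesheet text to the XML text and return the result as a string.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The literal text of each matching rule appears in the result,
        // in the order in which the rules are applied.
        System.out.println(transform(XSL, "<top><theData/></top>"));
    }
}
```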
The first template rule (extracted from Figure 2) is shown in tree view in Figure 4. This template rule contains an XPath expression that matches the document root (note the forward slash).
xsl:template ELEMENT_NODE
  Attribute: match=/
  #text A Match Root
  xsl:apply-templates ELEMENT_NODE
    Attribute: select=top
Figure 4
Listing 1 shows the same template rule in XSL format (extracted from Listing 26).
<xsl:template match="/">
A Match Root
<xsl:apply-templates select="top" />
</xsl:template>
Listing 1
What is the effect of a literal text node?
This template rule contains a
literal text node, which is highlighted in red in Figure 4 and Listing
1.
When an XSL stylesheet is used to perform an XSLT transformation on an
XML file, any text nodes that exist in the XSL stylesheet are
reproduced in the output tree. As a result, the output contains
the text shown in Figure 5 (extracted
from the top of Figure 3 above). Note that the text in the
output matches the text node in the stylesheet.
A Match Root
Figure 5
<xsl:apply-templates select="top" />
Note that the context node at this point in the process is the document node. The literal text node in Listing 1 is followed by an xsl:apply-templates element with a select attribute value of top. This instructs the XSLT processor to search out all child nodes of the document node whose names are top, and to apply one of the following template rules to each of those nodes:
- A template rule that matches top, or
- A built-in template rule for the type of node if there is no matching template rule.
A template rule that matches top
The tree view fragment of the XSL file shown in Figure 6 shows that the stylesheet does contain a template rule that matches top.
xsl:template ELEMENT_NODE
  Attribute: match=top
  #text B Match top
  xsl:apply-templates ELEMENT_NODE
    Attribute: select=theData
Figure 6
(The template rule in Figure 6 was extracted from Figure 2. It is the first blue template rule in Figure 2.)
Listing 2 shows the XSL code fragment that
corresponds to the tree view of the template rule shown in Figure 6.
<xsl:template match="top">
B Match top
<xsl:apply-templates select="theData" />
</xsl:template>
Listing 2
Another literal text node
Once again, the template rule contains a literal text node (highlighted in red), which passes through to the output shown in Figure 3. You should be able to identify this literal text in the third line in the output shown in Figure 3 with no difficulty.
<xsl:apply-templates select="theData" />
At this point, the context node is the node named top. This template rule also contains an xsl:apply-templates element immediately following the literal text. In this case, the value of the select attribute is theData.
This element instructs the XSLT processor to search out all child nodes of top named theData and to apply one of the following template rules to each of those child nodes:
- A template rule that matches theData, or
- A built-in template rule for the type of node if there is no matching template rule.
Figure 1 shows that top has three child nodes named theData.
(I colored those three nodes in alternating colors of red and blue in Figure 1 to make them easier to identify.)
As you can see in Figure 1, the first node named theData is somewhat more complex
than the other two nodes with the same name. I purposely made it
more complex to illustrate several concepts that I will cover in this
lesson.
A template rule that matches theData
Referring back to the tree view in Figure 2, we see that the stylesheet does have a template rule that matches theData. That fragment of the stylesheet tree view is extracted from Figure 2 and reproduced in Figure 7 below.
xsl:template ELEMENT_NODE
  Attribute: match=theData
  #text C Match theData and show attribute
  xsl:value-of ELEMENT_NODE
    Attribute: select=@attr
  xsl:apply-templates ELEMENT_NODE
    Attribute: select=title
Figure 7
The corresponding stylesheet code fragment is shown in Listing 3. In both cases, a literal text node in the stylesheet is highlighted in red.
<xsl:template match="theData">
C Match theData and show attribute
<xsl:value-of select="@attr" />
<xsl:apply-templates select="title" />
</xsl:template>
Listing 3
Literal text in the output
As always, the text node in the template rule is reproduced in the output. You should be able to identify this text in the fourth line of output text in Figure 3.
A more complex template rule
This template rule is a little more complex than those discussed previously. In particular, this template rule has two XSLT elements following the literal text.
<xsl:value-of select="@attr" />
The first element following the literal text in Listing 3 is an element that instructs the XSLT processor to get the value of an XML attribute named attr (belonging to the context node) and to cause that value to become a text node in the output.
The item for which the value is to be obtained is specified by the value of the XSL attribute named select. The fact that the value of the XSL attribute begins with @ specifies that the target is an attribute in the XML file belonging to the context node.
Following the execution thread
I am currently following the execution thread in discussing the transformation. At this point in the process, the context node is the first XML node named theData.
Referring back to Figure 1, you can see that the first XML node named theData has an attribute named attr whose value is "Dummy Attr Value".
Figure 8 shows a recap of the output down to and including the value of the XML attribute named attr. Note that only the value of the XML attribute appears in the output. The name of the XML attribute does not appear in the output.
<?xml version="1.0" encoding="UTF-8"?>
A Match Root
B Match top
C Match theData and show attribute
Dummy Attr Value
...
Figure 8
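In DOM terms, getting the value of an attribute of the context node corresponds to a call to getAttribute on an Element. The following is a minimal sketch under assumed names (AttrValueDemo and valueOfAttr are hypothetical and are not taken from Dom12). Note that getAttribute returns an empty string when the attribute is absent, which is consistent with the fact that nothing appears after the literal text for the second and third theData nodes in Figure 3.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's code.
class AttrValueDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Rough analogue of <xsl:value-of select="@name"/> for the given context node.
    // Only the attribute value is returned, never the attribute name.
    static String valueOfAttr(Node context, String name) {
        if (context.getNodeType() != Node.ELEMENT_NODE) {
            return "";
        }
        return ((Element) context).getAttribute(name);
    }

    static String demo() {
        Document doc = parse(
                "<top><theData attr=\"Dummy Attr Value\"/><theData/></top>");
        NodeList list = doc.getElementsByTagName("theData");
        return "[" + valueOfAttr(list.item(0), "attr") + "]"
                + "[" + valueOfAttr(list.item(1), "attr") + "]";
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[Dummy Attr Value][]"
    }
}
```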
<xsl:apply-templates select="title" />
The second element inside the template rule shown in Listing 3 instructs the XSLT processor to search for all nodes named title that are children of the context node. As each such child node is encountered, the processor is to apply a template rule that matches title, or a built-in template rule if there is no matching template rule.
Referring back to the stylesheet tree view in Figure 2, we see that the stylesheet does have a template rule that matches title. That fragment of the tree view was extracted from Figure 2 and is reproduced in Figure 9 below.
xsl:template ELEMENT_NODE
  Attribute: match=title
  #text D Match title and show value of title as context
  xsl:value-of ELEMENT_NODE
    Attribute: select=.
  #text E Show value of subtitle
  xsl:value-of ELEMENT_NODE
    Attribute: select=subtitle
  xsl:apply-templates ELEMENT_NODE
    Attribute: select=subtitle
Figure 9
The corresponding stylesheet code fragment
The corresponding stylesheet code fragment is shown in Listing 4. Literal text nodes in the stylesheet are highlighted in red in both views. Note that in this case there are two separate text nodes in the template rule separated by an xsl:value-of element.
<xsl:template match="title">
D Match title and show value of title as context
<xsl:value-of select="." />
E Show value of subtitle
<xsl:value-of select="subtitle" />
<xsl:apply-templates select="subtitle" />
</xsl:template>
Listing 4
You should have no difficulty identifying the result of the first text node in the sixth line of text in the output in Figure 3.
The template rule shown in Listing 4 is considerably more complex than those shown previously.
<xsl:value-of select="." />
This is the first XSLT element following the first text node in Listing 4. A select value of "." specifies the context node, which in this case is an element named title. (Note that my discussion is still following the thread of execution.) As such, this will be the element named title belonging to the first XML element named theData in the XML document represented by the tree view in Figure 1.
I have extracted that tree view fragment of the XML document from Figure 1 and reproduced it in Figure 10 below with the XML text nodes highlighted in green.
title ELEMENT_NODE
  #text Java
  subtitle ELEMENT_NODE
    Attribute: position=Low
    #text really
    part1 ELEMENT_NODE
      #text This is part 1
    part2 ELEMENT_NODE
      #text This is part 2
  #text rules
Figure 10
Get concatenated text values
As you will see shortly, this XSLT element instructs the processor to get (and send to the output) the concatenated text values of the context node and all of its descendant nodes.
The descendant nodes of the node named title in Figure 10 are:
- subtitle
- part1
- part2
(The order of the text nodes and the descendant element nodes is important.)
Figure 11 shows a recap of the output up to this point in the execution thread, with the red output in Figure 11 matching the concatenated green text node values of title and all its descendants in Figure 10.
(Note that the order in which the text node values are concatenated matches the order in which the nodes occur in the XML document.)
<?xml version="1.0" encoding="UTF-8"?>
A Match Root
B Match top
C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java really This is part 1 This is part 2 rules
...
Figure 11
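The concatenation of text values in document order can be sketched as a short recursive method. The names TextValueDemo, appendText, and valueOf are hypothetical, and the XML string is a compact stand-in for the first title element of Dom12.xml with single spaces in place of the original line breaks.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's code.
class TextValueDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Depth-first walk that appends the value of every text node it meets.
    // Attributes are not child nodes, so attribute values are not included.
    static void appendText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, out);
        }
    }

    // Rough analogue of <xsl:value-of select="."/> for an element context node.
    static String valueOf(Node context) {
        StringBuilder out = new StringBuilder();
        appendText(context, out);
        return out.toString();
    }

    static String demo() {
        Document doc = parse("<title>Java <subtitle position=\"Low\">really "
                + "<part1>This is part 1</part1> <part2>This is part 2</part2>"
                + "</subtitle> rules</title>");
        return valueOf(doc.getDocumentElement());
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints "Java really This is part 1 This is part 2 rules"
    }
}
```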
Another XSL text node
The next thing in the template rule shown in Listing 4 is another XSL text node, which will be reproduced in the output. (This text node is also colored red in Listing 4.) You should have no difficulty identifying this text node in the output in Figure 3.
<xsl:value-of select="subtitle" />
The second text node in Listing 4 is followed by another xsl:value-of element, but this time with a different value for the select attribute. A select value of "subtitle" instructs the XSLT processor to get (and send to the output) the concatenated text values of a child node named subtitle and all of its descendants.
(The context node at this point is still the node named title, so the processor is looking for a node named subtitle as a child of title.
If there are two or more child nodes with that name, only the first one in document order is processed. The others are ignored. This is consistent with the XPath 1.0 rule that converting a node-set to a string yields the string-value of the first node in document order.)
subtitle ELEMENT_NODE
  Attribute: position=Low
  #text really
  part1 ELEMENT_NODE
    #text This is part 1
  part2 ELEMENT_NODE
    #text This is part 2
Figure 12
Recap the output
Figure 13 shows the output up to this point in the execution thread with the red output in Figure 13 corresponding to the concatenated green text node values in Figure 12.
<?xml version="1.0" encoding="UTF-8"?>
A Match Root
B Match top
C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java really This is part 1 This is part 2 rules
E Show value of subtitle
really This is part 1 This is part 2
...
Figure 13
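A Java analogue of getting the value of a named child might look like the sketch below. The class FirstChildValueDemo and the method valueOfChild are hypothetical names, and the XML strings are illustrative. The method uses only the first child element with the requested name and returns an empty string when there is no such child, which is consistent with the empty value after "E Show value of subtitle" for the Python and XML titles in Figure 3.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's code.
class FirstChildValueDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Rough analogue of <xsl:value-of select="name"/>: the text value of the
    // first child element with the given name, or "" if there is none.
    static String valueOfChild(Node context, String name) {
        for (Node c = context.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(name)) {
                return c.getTextContent(); // later matches are ignored
            }
        }
        return "";
    }

    static String demo(String xml) {
        return valueOfChild(parse(xml).getDocumentElement(), "subtitle");
    }

    public static void main(String[] args) {
        // Two children named subtitle: only the first contributes.
        System.out.println(demo(
                "<title>Java<subtitle>first</subtitle><subtitle>second</subtitle></title>"));
        // No child named subtitle: the value is the empty string.
        System.out.println(demo("<title>Python</title>").isEmpty());
    }
}
```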
<xsl:apply-templates select="subtitle" />
The last XSLT element in the template rule in Listing 4 is an xsl:apply-templates element with the value of the select attribute being subtitle.
At this point in the execution stream, the context node is a node named title. This element instructs the processor to search for all child nodes of title named subtitle. As usual, when a matching node is found, one of the following two template rules will be applied to that node:
- A template rule that matches subtitle, or
- A built-in template rule for the type of node if there is no matching template rule.
The final template rule from Figure 2 is reproduced below. This template rule matches subtitle.
xsl:template ELEMENT_NODE
  Attribute: match=subtitle
  #text F match subtitle and show value of attribute
  xsl:value-of ELEMENT_NODE
    Attribute: select=@position
  #text G Show value of subtitle as context node
  xsl:value-of ELEMENT_NODE
    Attribute: select=.
Figure 14
(Note that even though I arranged the template rules in the stylesheet in the order that I wanted to discuss them, the order of the template rules in the stylesheet is immaterial. I could completely rearrange them and the results would be the same.)
Listing 5 shows a fragment of the XSL stylesheet that corresponds to the tree view of the template rule in Figure 14. Once again, in both cases, text nodes in the stylesheet are highlighted in red.
<xsl:template match="subtitle">
F match subtitle and show value of attribute
<xsl:value-of select="@position" />
G Show value of subtitle as context node
<xsl:value-of select="." />
</xsl:template>
Listing 5
You should have no difficulty identifying the first text node in Listing 5 as it appears in Figure 3.
<xsl:value-of select="@position" />
The element following the first text node in Listing 5 is an xsl:value-of element that instructs the processor to get the value of an XML attribute named position belonging to the context node. (I discussed an element like this earlier.)
Figure 1 shows this attribute to have a value of Low in the subtitle node belonging to the title node, which in turn belongs to the first node named theData. The word Low appears at the appropriate location in the output shown in Figure 3.
Another XSL text node
The next item in the template rule in Listing 5 is another XSL text node. This text also appears at the appropriate location in the output in Figure 3.
<xsl:value-of select="." />
The last element in the template rule shown in Listing 5 instructs the processor to get the concatenated text value of the context node and all its descendants. (I also discussed an element like this earlier.)
Continuing with the execution thread, the context node at this point is still the subtitle node belonging to the title node, which in turn belongs to the first node named theData in Figure 1. A tree view fragment of that node, extracted from Figure 1, is shown in Figure 15. The text nodes belonging to subtitle, part1, and part2 are highlighted in green in Figure 15.
subtitle ELEMENT_NODE
  Attribute: position=Low
  #text really
  part1 ELEMENT_NODE
    #text This is part 1
  part2 ELEMENT_NODE
    #text This is part 2
Figure 15
Recap the output
Figure 16 shows the output up to this point in the execution thread. The concatenated text values highlighted in red in Figure 16 correspond to the text values highlighted in green in Figure 15.
<?xml version="1.0" encoding="UTF-8"?>
A Match Root
B Match top
C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java really This is part 1 This is part 2 rules
E Show value of subtitle
really This is part 1 This is part 2
F match subtitle and show value of attribute
Low
G Show value of subtitle as context node
really This is part 1 This is part 2
...
Figure 16
The same portion of the tree from different viewpoints
Figure 16 also shows some output text highlighted in blue that is identical to that highlighted in red. (The blue text is output text that was discussed earlier.)
The blue output in Figure 16 was produced by the following XSLT element that appears in Listing 4 where the context node was title:
<xsl:value-of select="subtitle" />
The red output text in Figure 16 was produced by the following XSLT element that appears in Listing 5 where the context node was subtitle:
<xsl:value-of select="." />
Both XSLT elements refer to the same portion of the tree, but from different viewpoints. The first XSLT element refers to the subtitle node from the viewpoint of its parent named title. The second XSLT element refers to the subtitle node from the viewpoint of the subtitle node itself.
End of the recursion
Note that the template rule shown in Listing 5 contains only text nodes and xsl:value-of elements. There are no xsl:apply-templates or xsl:for-each elements. Thus, there are no instructions for the XSLT processor to continue drilling down into the depths of the DOM tree. As a result, the recursive process works its way back toward the root of the tree.
The nodes named author and price
Referring back to Figure 1, we see that the first node named theData has two more child nodes that haven't been processed yet:
- author
- price
author ELEMENT_NODE
  #text R.Baldwin
price ELEMENT_NODE
  #text $9.95
Figure 17
What do they contribute to the output?
In order for these two nodes to contribute anything to the output, something in the XSL stylesheet must cause each of them to become the context node at some point in the process.
However, an examination of the five template rules in Figure 2 reveals that none of the template rules will cause either of these nodes to become the context node at any point in the process. Therefore, they cannot contribute to the output.
Summary of the five template rules
The first template rule shown in Figure 2, Figure 4, and Listing 1 matches the root (document) node and causes templates to be applied to nodes named top.
The second template rule shown in Figure 2, Figure 6, and Listing 2 matches nodes named top and causes templates to be applied to nodes named theData.
The third template rule shown in Figure 2, Figure 7, and Listing 3 matches nodes named theData and causes templates to be applied to nodes named title.
(This might be the most likely place to find something in the stylesheet that would cause the nodes named author and price to become context nodes, but that doesn't happen. The template rule that matches their parent, theData, simply ignores the child nodes named author and price.)
Finally, the fifth template rule shown in Figure 2, Figure 14, and Listing 5 matches subtitle and doesn't cause template rules to be applied to any other nodes. Thus, it signals the end of the traversal down one leg of the DOM tree.
Not necessary to contribute to the output
Therefore, this XSLT transformation completely ignores the nodes named author and price, and they do not contribute anything to the output.
The main point is that it is not necessary for everything in an XML document to contribute to the output of an XSLT transformation. The author of the stylesheet can pick and choose among the nodes in the DOM tree that will be used to produce nodes in the output tree.
Completes processing of first node named theData
That completes the processing of the first node in Figure 1 named theData. Figure 16 shows all of the output produced by processing that node.
Referring back to Figure 6, we see an xsl:apply-templates element instructing the XSLT processor to apply templates to all nodes named theData that are children of the node named top. So far, only one such node named theData has been processed. Referring to Figure 1, we see that there are two more nodes named theData waiting to be processed.
The second node named theData
The second node named theData was extracted from Figure 1 and reproduced in Figure 18.
theData ELEMENT_NODE
title ELEMENT_NODE
#text Python
author ELEMENT_NODE
#text R.Baldwin
price ELEMENT_NODE
#text $15.42
Figure 18 |
Comparing Figure 18 with the first node named theData in Figure 1 reveals that the second node named theData is much simpler than the first node named theData. In particular, the title node in Figure 18 doesn't have any children, whereas the title node in Figure 10 has one child (subtitle) and two grandchildren (part1 and part2).
Furthermore, we also know by now that the nodes named author and price in Figure 18 will be completely ignored by the XSLT processor.
Won't explain the processing in detail
Given all of that, it shouldn't be necessary for me to explain the processing in detail for this node. The processing proceeds as before, and produces the output shown in Figure 19.
C Match theData and show attribute |
A couple of things in Figure 19 are worthy of note.
No attribute named attr
To begin with, unlike the first node named theData, the second node named theData doesn't have an attribute named attr. Therefore, unlike the output shown in Figure 16, the value of that attribute is blank in Figure 19.
(See the template rule in Figure 7 that selects the value of the attribute named attr.)
Also, unlike the first node named theData, the second node named theData doesn't have descendants named subtitle, part1, or part2. Therefore, all the output contributed by those descendant nodes to the output in Figure 16 is missing from Figure 19.
One more node named theData
An examination of Figure 1 shows that there is one more node named theData waiting to be processed. However, except for the text values of the child nodes named title, author, and price, it is identical to the second node named theData, which was discussed above. Therefore, a further discussion of the final node named theData is not warranted.
The Java Code Transformation
Now let's change direction and concentrate on Java code rather than XSLT elements. The following paragraphs describe a Java program named Dom12, which emulates the XSLT transformation described above.
This program is an update of the program named Dom11 from the previous lesson. This updated program is designed to test and exercise features of various methods that were not tested by the sample used with Dom11.
Mainly, this program adds code to the processNode method to simulate the template rules in the XSL file named Dom12.xsl.
Also, as was the case in the previous lesson, this program implements six built-in template rules for an XSLT processor.
Instructions for creating a custom template rule
To create a custom template rule for this program:
- Go to the processNode method.
- Identify the node type.
- Change the conditional clause in the if statement to implement the required match.
- Write code in the body of the if statement to implement the custom rule.
Behavior of the program
This program compares the transformation of a specified XML file into a result file, using two different approaches:
- An XSLT style sheet and transformation, as discussed above.
- Program code that emulates the behavior of the XSLT transformation.
Usage instructions
The program requires three command line arguments in the following order:
- The name of the input XML file - must be Dom12.xml.
- The name of the output file to be produced by the XSLT transformation.
- The name of the output file to be produced by the program code that emulates the XSLT transformation.
Order of execution
The program begins by executing code to transform the incoming XML file in a way that mimics the XSLT Transformation. Along the way, it saves the processing instructions containing the ID of the stylesheet file for use by the XSLT transformation process later. Otherwise, the code that performs the XSLT transformation would have to search the DOM tree for the XSL stylesheet file.
Then the program uses the XSLT style sheet to transform the XML file into a result file by performing an XSLT transformation under program control.
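For readers who have not seen an XSLT transformation performed under program control, here is a minimal sketch using the standard javax.xml.transform classes. It is not the doXslTransform method from Dom12. The class name XslTransformSketch, the helper named transform, and the small sample XML and stylesheet strings are all assumptions made up for this illustration. The sample stylesheet selects only the title node, so the author node contributes nothing to the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslTransformSketch {

  // Hypothetical sample data, much smaller than Dom12.xml.
  static final String SAMPLE_XML =
      "<top><theData><title>Java</title>"
      + "<author>R.Baldwin</author></theData></top>";

  // Hypothetical stylesheet that visits only the title node.
  static final String SAMPLE_XSL =
      "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:apply-templates select='top/theData/title'/>"
      + "</xsl:template>"
      + "<xsl:template match='title'>"
      + "Title: <xsl:value-of select='.'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

  // Applies the stylesheet text to the XML text and returns the
  // transformed result as a String.
  static String transform(String xml, String xsl) {
    try {
      TransformerFactory factory = TransformerFactory.newInstance();
      Transformer transformer = factory.newTransformer(
          new StreamSource(new StringReader(xsl)));
      StringWriter result = new StringWriter();
      transformer.transform(
          new StreamSource(new StringReader(xml)),
          new StreamResult(result));
      return result.toString();
    } catch (Exception e) {
      // No meaningful error handling in this sketch.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(transform(SAMPLE_XML, SAMPLE_XSL));
  }
}
```

Dom12 reads its input from files and gets the stylesheet name from a processing instruction, whereas this sketch keeps everything in strings so that it can be run without any supporting files.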
Errors, exceptions, and testing
No effort was made to provide meaningful information about errors and exceptions.
The program was tested using SDK 1.4.2 under WinXP.
Will discuss in fragments
I will discuss this program in fragments. A complete listing of the program is shown in Listing 24 near the end of the lesson.
Much of the code in this program is very similar to, or identical to, code that I discussed in the previous lesson. I will discuss that repetitious code only briefly, if at all.
The main method
Listing 6 shows an abbreviated version of the beginning of the class named Dom12 and the ending of the main method.
public class Dom12{
|
The code in this portion of the program is identical to code that I discussed in detail in the previous lesson, so I won't discuss it further. I included it here solely to establish the context for discussion of code that is to follow.
Behavior of this code
Briefly, the code in the main method does the following:
- Performs all the steps necessary to parse the input XML file, producing an object of type Document whose reference is saved in a reference variable named document.
- Instantiates an object of the Dom12 class and saves its reference in a reference variable named thisObj.
- Invokes the method named processDocumentNode on thisObj to transform the DOM tree to an output file using program code to perform the transformation.
- Invokes the method named doXslTransform on thisObj to perform an XSLT transformation using an XSL stylesheet.
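The first step in the list above, parsing XML into a Document object, can be sketched as follows. This is not the main method from Dom12. The class name ParseSketch and the helper named parse are assumptions for illustration, and the sketch parses a string rather than a file named on the command line.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseSketch {

  // Parses XML text into a DOM tree and returns the Document node.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory =
          DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      return builder.parse(
          new ByteArrayInputStream(xml.getBytes("UTF-8")));
    } catch (Exception e) {
      // No meaningful error handling in this sketch.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document document = parse("<top><theData/></top>");
    // Show the name of the root element of the parsed tree.
    System.out.println(
        document.getDocumentElement().getNodeName());
  }
}
```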
The processDocumentNode method
The entire processDocumentNode method is shown in Listing 7.
void processDocumentNode(Node node){
|
This method is used to produce any text required in the output at the document level, such as the XML declaration for an XML document. As you can see from Listing 7, the code in this method writes an XML declaration into the output.
In addition, the code in Listing 7 produces output text that matches the literal text node in the XSL stylesheet shown in Figure 4 and Listing 1.
Both of these lines of text can be seen near the top of the XSLT output in Figure 3.
Invoke the processNode method
Despite the name that I chose to give to the processDocumentNode method, it doesn't actually process the document node directly. Rather, after sending any required text to the output, it invokes the method named processNode to actually process the document node.
(Note that the Document object's reference is passed to the method named processNode in Listing 7.)
When the processNode method returns (after the entire DOM tree has been processed), the processDocumentNode method flushes the output stream and returns control to the main method.
As you saw in Listing 6, code in the main method then invokes the doXslTransform method to cause an XSLT transformation using the stylesheet to take place.
The processNode method
As you learned in the previous lesson, there are seven possible types of nodes in an XML document:
- root or document node
- element node
- attribute node
- text node
- comment node
- processing instruction node
- namespace node
(Apparently it is not possible to handle namespace nodes in a Java program because there is no constant in the Node interface that can be used to identify namespace nodes. This will become clear as we examine the code in the processNode method.)
The processNode method in this program contains quite a few changes relative to the program that I discussed in the previous lesson. In fact, this is where most of the changes occur in this program. (The only other change is the addition of one line of code to the processDocumentNode method.) Therefore, I will discuss the processNode method in detail.
Code that you write in this method (and in the processDocumentNode method discussed above) is somewhat analogous to writing an XSL stylesheet to be used in an XSLT transformation.
Test for a valid node, and get its type
The beginning of the processNode method is shown in Listing 8. The method receives an incoming parameter of type Node, which can represent any of the seven types of nodes in the above list.
As you can see in Listing 8, if the parameter doesn't point to an actual object, the method quietly returns, as opposed to throwing a NullPointerException.
void processNode(Node node){
|
The final statement in Listing 8 invokes the getNodeType method to get and save the type of the node whose reference was received as an incoming parameter.
Process the node
Each time the processNode method is invoked, it receives a Node object's reference as an incoming parameter. The code in Listing 8 determines the type of the incoming node. Listing 9 shows the beginning of a switch statement that is used to initiate the processing of each incoming node based on its type.
switch (type){
|
The switch statement has six cases to handle six types of nodes, plus a default case to ignore namespace nodes.
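The shape of that switch statement can be sketched as shown below. This is not the processNode method from Dom12. The class name NodeTypeSketch and the methods named describe and parse are assumptions for illustration. Instead of invoking template rules, each case simply returns a label so that you can see which case a given node selects.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class NodeTypeSketch {

  // Returns a label for the node type, using the constants that the
  // Node interface provides for six of the seven node types.
  static String describe(Node node) {
    if (node == null) {
      return "none"; // quietly handle a null reference
    }
    short type = node.getNodeType();
    switch (type) {
      case Node.DOCUMENT_NODE:
        return "document";
      case Node.ELEMENT_NODE:
        return "element";
      case Node.ATTRIBUTE_NODE:
        return "attribute";
      case Node.TEXT_NODE:
        return "text";
      case Node.COMMENT_NODE:
        return "comment";
      case Node.PROCESSING_INSTRUCTION_NODE:
        return "processing instruction";
      default:
        return "ignored"; // any node type not handled above
    }
  }

  // Hypothetical helper that parses XML text into a Document.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document document = parse("<top>hello</top>");
    System.out.println(describe(document));
    System.out.println(describe(document.getDocumentElement()));
    System.out.println(
        describe(document.getDocumentElement().getFirstChild()));
  }
}
```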
The DOCUMENT_NODE case
The code in Listing 9 will be executed whenever the incoming method parameter points to a document node.
(Note that this will happen only once during the processing of a DOM tree. The first node processed will always be the document node, and there is only one document node in a DOM tree.)
Will invoke default behavior in this case
The code in the case in Listing 9 is an if else construct. If the conditional clause in the if statement evaluates to true (which is not possible in this case because it is set to the literal value false), the code in the if statement will be executed. (As you will see later, this is where I place the code for custom template rules.)
If the conditional clause in the if statement does not evaluate to true, the code in the else statement will be executed. (This is where I have placed the code that mimics the built-in template rules. This was explained in detail in the previous lesson.)
Note that the code in the else statement in Listing 9 invokes a method named defElOrRtNodeTemp. The behavior of this method mimics one of the built-in template rules that I explained in the previous lesson. That method has not changed since the previous lesson. Therefore, I won't discuss it in this lesson. You will find the method in Listing 24 near the end of this lesson.
Creating custom template rules
Although this lesson does not create a custom template rule for document nodes, the process for creating a custom template rule is as follows:
- Go to this method named processNode.
- Identify the case for the node type in the switch statement.
- Change the conditional clause in the if statement for that case to implement a match for a particular node of that type.
- Write code in the body of the if statement to implement the custom template rule.
The ELEMENT_NODE case
Most of the changes to this program (as compared to the program in the previous lesson) consist of changes to the code that processes element nodes in the switch statement. The code for this case is rather long, so I will discuss it in fragments.
A match for element nodes named top
The beginning of the case for element nodes is shown in Listing 10.
case Node.ELEMENT_NODE:{
|
I will begin by calling your attention to the similarity between the code in Listing 10 and the XSLT template rule shown earlier in Figure 6 and Listing 2.
The conditional clause of the if statement in Listing 10 evaluates to true if the name of the element node being processed is top. That corresponds to the XSLT match pattern in the first line in Listing 2.
The material shown in red in Listing 10 corresponds to the literal text shown in red in the XSLT template rule in Listing 2.
The invocation of the method named applyTemplates in Listing 10 corresponds to the xsl:apply-templates element in Listing 2.
The applyTemplates method
The only code in Listing 10 that is of any complexity is the invocation of the applyTemplates method.
The applyTemplates method in this program is identical to the method having the same name in the previous lesson. I discussed the method in detail in that lesson. Therefore, I won't discuss it further in this lesson. However, an understanding of that method is critical to an understanding of this program. If you haven't done so already, I strongly urge you to go back and review the previous lesson entitled Java JAXP, Implementing Default XSLT Behavior in Java .
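If you don't have the previous lesson at hand, the following sketch may help you follow the flow of control. It is not the applyTemplates method from Dom11 or Dom12, which handles more than this. It assumes only what this lesson relies on: that applyTemplates passes each child node having the specified name back to processNode. The class name ApplyTemplatesSketch, the method named run, and the use of a StringBuilder in place of the output stream are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ApplyTemplatesSketch {

  // Collects output text in place of the program's output stream.
  private final StringBuilder out = new StringBuilder();

  // Assumed behavior: process every child whose name equals select.
  void applyTemplates(Node node, String select) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeName().equals(select)) {
        processNode(child); // recursive hand-off
      }
    }
  }

  // Greatly reduced stand-in for processNode: two custom rules,
  // and every other node is ignored.
  void processNode(Node node) {
    if (node == null
        || node.getNodeType() != Node.ELEMENT_NODE) {
      return;
    }
    if (node.getNodeName().equals("top")) {
      out.append("B Match top\n");
      applyTemplates(node, "theData");
    } else if (node.getNodeName().equals("theData")) {
      out.append("C Match theData\n");
    }
  }

  // Parses the XML text, processes the root element, and returns
  // the collected output.
  static String run(String xml) {
    try {
      Document document = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
      ApplyTemplatesSketch sketch = new ApplyTemplatesSketch();
      sketch.processNode(document.getDocumentElement());
      return sketch.out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(
        run("<top><theData/><other/><theData/></top>"));
  }
}
```

Note that the child named other is never passed to processNode, because applyTemplates was asked to select only the children named theData.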
A match for element nodes named theData
Continuing with the case for element nodes, the code in Listing 11 shows an else if clause that matches element nodes named theData.
(Note that this is an else if clause that follows the if statement begun in Listing 10.)
}else if(node.getNodeName().equals( |
Once again, I will point out the similarity of the code in Listing 11 to the XSLT template rule shown in Figure 7 and Listing 3.
This code will be executed for all element nodes named theData that are passed as an input parameter to the processNode method. This code puts the text shown in red into the output just as the template rule puts the text shown in red in Listing 3 into the output.
This code invokes the valueOf method and the applyTemplates method in a way that is very similar to the way the template rule executes the xsl:value-of element and the xsl:apply-templates element.
The valueOf method
The valueOf method in this program is identical to the method having the same name in the previous lesson. However, this program uses portions of that method that I didn't discuss in the previous lesson. Therefore, I will set the discussion of the switch statement in the processNode method aside temporarily, follow the thread of execution, and discuss the valueOf method in some detail in the paragraphs that follow.
Request value of attribute named attr
Note the parameters being passed to the valueOf method in Listing 11. The first parameter is a reference to the Node object being processed by the processNode method. The second parameter is a String that begins with the @ character and continues with the characters attr. As is the case for the template rule in Listing 3, this invocation of the valueOf method requests the value of the attribute named attr belonging to the node that is passed as the first parameter.
Description of the valueOf method
The valueOf method emulates the following XSLT element:
<xsl:value-of select="???"/>
The general form of the method call is:
valueOf(Node theNode,String select)
The valueOf method recognizes three forms of call based on the value of the select parameter:
- "@attrName"
- "."
- "nodeName"
In the first form, the method returns the text value of the named attribute of the Node. An attribute is specified by a select value that begins with @. The name of the attribute follows the @ character in the string. If the attribute doesn't exist, the method returns an empty string.
Return the value of the context node
In the second form, the method returns the concatenated text values of the context node and its descendants. This form of call was discussed in detail in the previous lesson, so I will only mention it briefly in this lesson.
Return the value of a specified child of the context node
In the third form, the method returns the concatenated text values of a specified child node of the context node and its descendants. If the context node has more than one child node with the specified name, only the first one found is processed. The others are ignored.
I will discuss this form of method call later in the lesson when it occurs in the execution thread.
Method does not support ...
The valueOf method does not support the following standard features of xsl:value-of:
- disable-output-escaping
- processing instruction nodes
- comment nodes
- namespace nodes
The beginning of the valueOf method is shown in Listing 12.
public String valueOf(Node node,String select){
|
The method begins by testing the incoming parameter to see if it starts with the @ character. If so, the method call is interpreted as a request to return the value of an attribute belonging to the node specified by the first parameter. The name of the attribute is specified by the characters following the @ character in the incoming string.
Get the attribute name
The code in Listing 12 uses the substring method of the String class to get the name of the attribute and to save it in the reference variable named attrName.
(As you will see shortly, if the attribute doesn't exist on that node, the method simply returns an empty string as the return value.)
Following this, the program executes the two statements in Listing 13 to access the attribute node and to save it in the reference variable named attrNode.
NamedNodeMap attrList = |
A map of attribute nodes
Attribute nodes are not simply child nodes of element nodes. In particular, all child nodes of an element node can be obtained in a collection of type NodeList by invoking the method named getChildNodes on the element node.
In order to get the attributes belonging to an element node, it is necessary to invoke the method named getAttributes on the element node. This method returns a reference to an object of type NamedNodeMap. This object contains unordered references to all the attribute nodes belonging to the element node.
Save the attribute node's reference
References to objects representing attribute nodes can be accessed in a NamedNodeMap object either on the basis of the attribute name, or on the basis of an ordinal index.
(Access by ordinal index is supported for convenience even though the references are unordered. No ordering is implied by the ordinal index.)
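The difference between child nodes and attribute nodes can be seen in the following sketch. The class name AttributeMapSketch, the helper named parseRoot, and the sample element with its attribute values are assumptions for illustration. They are not taken from Dom12.xml.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class AttributeMapSketch {

  // Hypothetical element with two attributes and one child.
  static final String SAMPLE =
      "<theData attr=\"sample\" id=\"1\">"
      + "<title>Java</title></theData>";

  // Parses XML text and returns the root element.
  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Element theData = parseRoot(SAMPLE);

    // Child nodes do not include the attributes.
    NodeList children = theData.getChildNodes();
    System.out.println("children: " + children.getLength());

    // Attributes are obtained separately as a NamedNodeMap.
    NamedNodeMap attributes = theData.getAttributes();
    System.out.println("attributes: " + attributes.getLength());

    // Access by name.
    Node byName = attributes.getNamedItem("attr");
    System.out.println("attr = " + byName.getNodeValue());

    // Access by ordinal index; no particular order is implied.
    for (int i = 0; i < attributes.getLength(); i++) {
      Node attribute = attributes.item(i);
      System.out.println(attribute.getNodeName()
          + " = " + attribute.getNodeValue());
    }
  }
}
```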
Return value of attribute node
The code in Listing 14 invokes the getNodeValue method to get and return the value of the attribute node.
if(attrNode != null){
|
If the context node doesn't have an attribute with that name, the value of attrNode will be null. In that case, the valueOf method returns an empty string.
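Putting Listings 12 through 14 together, the attribute portion of the method behaves roughly like the sketch below. This is a reconstruction from the description above, not a copy of the valueOf method. The class name AttrValueSketch, the method name attributeValueOf, the helper named parseRoot, and the sample XML are assumptions for illustration. The local variable names attrName, attrList, and attrNode follow the names used in the discussion.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class AttrValueSketch {

  // Returns the value of the attribute named by a select string of
  // the form "@attrName", or an empty string if the node has no
  // such attribute.
  static String attributeValueOf(Node node, String select) {
    if (select != null && select.startsWith("@")) {
      // The attribute name follows the @ character.
      String attrName = select.substring(1);
      NamedNodeMap attrList = node.getAttributes();
      if (attrList == null) {
        return ""; // only element nodes have attributes
      }
      Node attrNode = attrList.getNamedItem(attrName);
      if (attrNode != null) {
        return attrNode.getNodeValue();
      }
    }
    return "";
  }

  // Hypothetical helper that parses XML text and returns the root.
  static Node parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Node theData = parseRoot("<theData attr=\"sample\"/>");
    System.out.println(
        "[" + attributeValueOf(theData, "@attr") + "]");
    System.out.println(
        "[" + attributeValueOf(theData, "@missing") + "]");
  }
}
```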
The remainder of the valueOf method
That completes the portion of the valueOf method used to return the value of an attribute. Listing 15 shows the overall structure of the remainder of the valueOf method, to help you keep track of the big picture. (Most of the code was deleted from Listing 15 for brevity.)
else if(select != null |
I will return to a discussion of the valueOf method later in this lesson, at which time I will discuss some of the code that was deleted from Listing 15.
Back to the template rule
Please return your attention to Listing 11, which emulates the XSLT template rule shown in Listing 3. When the valueOf method returns the value of the attribute named attr (or returns an empty string), the code in Listing 11 invokes the applyTemplates method to cause templates to be applied to theData's child nodes named title.
Once again, note the similarity of this code to the XSLT template rule shown in Listing 3.
Back to the switch statement
Control flows recursively through the applyTemplates method back to the element node case for the element named title in the switch statement in the processNode method. That code begins in Listing 16.
}else if(node.getNodeName().equals( |
Note the similarity of this code and the beginning of the XSLT template rule shown in Listing 4.
By now, the code in Listing 16 should be very familiar to you and should require very little in the way of an explanation. This code begins by sending a literal text string to the output. Then it gets the value of the context node named title and sends that text to the output as well. (A value of "." for the second parameter of the valueOf method requests the value of the context node.)
Invoke valueOf with select equal to subtitle
The remaining code that emulates the XSLT template rule shown in Listing 4 is shown in Listing 17.
out.println( |
This code begins by sending literal text to the output. Then it invokes the valueOf method passing the name of the node named subtitle as the select parameter. That brings us to a discussion of the one remaining portion of the valueOf method not previously discussed.
Overall structure of the valueOf method
Listing 18 shows a greatly condensed version of the two sections of the valueOf method that were discussed previously (one in this lesson and one in the previous lesson). The code in Listing 18 is provided to help you understand the overall structure of the valueOf method and to keep track of the big picture.
public String valueOf(Node node,String select){
|
Return the value of a specified child node
Listing 19 shows that portion of the valueOf method that processes a child node whose name is specified by the value of the incoming parameter named select. This code returns the concatenated text values of the specified child node and all of its descendants.
else if(select != null){
|
(This process assumes that there is only one child node with the specified name and processes the first one that it finds. If there are additional child nodes having the same name, they are ignored.)
Assuming that you are comfortable with recursion, the code in Listing 19 is relatively straightforward. This code
- Traps the specified child node
- Causes it to become the context node
- Passes it recursively to that portion of the same valueOf method that returns the value of the context node.
I discussed the portion of the valueOf method that returns the value of the context node in the previous lesson, so I won't repeat that discussion here.
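The three steps listed above can be sketched as follows. This is not the code from Listing 19. It is a minimal reconstruction based on the description, and it handles only the "." and "nodeName" forms of the select parameter. The class name ChildValueSketch, the helper named parseRoot, and the sample XML are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ChildValueSketch {

  static String valueOf(Node node, String select) {
    if (select != null && select.equals(".")) {
      // Value of the context node: concatenate the text values
      // of the node and all of its descendants.
      if (node.getNodeType() == Node.TEXT_NODE) {
        return node.getNodeValue();
      }
      StringBuilder text = new StringBuilder();
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        text.append(valueOf(children.item(i), "."));
      }
      return text.toString();
    } else if (select != null) {
      // Value of a named child: trap the first child with that
      // name and pass it back recursively as the context node.
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        if (child.getNodeName().equals(select)) {
          return valueOf(child, ".");
        }
      }
    }
    return ""; // no match
  }

  // Hypothetical helper that parses XML text and returns the root.
  static Node parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Node title = parseRoot(
        "<title>Java<subtitle>Intro<part1>A</part1></subtitle>"
        + "<subtitle>Second</subtitle></title>");
    System.out.println(valueOf(title, "subtitle"));
    System.out.println(valueOf(title, "."));
  }
}
```

Because the loop returns as soon as it finds a match, a second child named subtitle contributes nothing when the select parameter is "subtitle", although it does contribute when the select parameter is ".".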
Back to the switch statement
Once again, that takes us back to the code in Listing 17, which emulates the latter portion of the XSLT template rule in Listing 4. Note that upon return from valueOf, the code in Listing 17 invokes the applyTemplates method passing the name subtitle as the select parameter.
Control flows recursively through the applyTemplates method back to the element node case for the element named subtitle in the switch statement in the processNode method. That code is shown in Listing 20.
}else if(node.getNodeName().equals( |
Compare the code in Listing 20 with the XSLT template rule in Listing 5.
Nothing new here
All of the code in Listing 20 is similar to code that I have already discussed in detail. Therefore, not much in the way of further discussion should be needed.
No call to applyTemplates
However, there is one very important thing to note in Listing 20. The code in Listing 20 does not make a call to applyTemplates. Therefore, the code in Listing 20 signals the end of the recursive flow of control being used to traverse this leg of the DOM tree. All of the methods that have been called recursively in order to get to this point in the DOM tree will start returning in the reverse of the order in which they were called.
Finish the case for Node.ELEMENT_NODE
Listing 21 shows the completion of the code for the element node case that began in Listing 10. This code will be invoked if an element node is encountered with a name that does not match top or one of the node names in the sequential else if constructs discussed above.
The code in Listing 21 invokes a method named defElOrRtNodeTemp that emulates one of the built-in XSLT template rules. This method and the methods that emulate the other built-in template rules were discussed in detail in the previous lesson.
}else{//invoke default behavior
|
The remainder of the processNode method
Listing 22 shows the remaining code in the processNode method. All of the remaining cases in the switch statement invoke methods that emulate built-in XSLT template rules.
The code in Listing 22 is identical to the same code in the previous lesson where it was discussed in detail. Therefore, I won't discuss it further in this lesson.
switch (type){
|
The program output
The output produced by this program is essentially the same as the XSLT transform output discussed in the early part of the lesson. With some minor exceptions having to do with blank lines, the output shown in Figure 3 represents the output both of the program and the XSLT transform.
Compare with XSL stylesheet
To summarize the situation, I'm going to show you one more view of the new code in the program for comparison with the XSL stylesheet in Listing 26.
The code in Listing 23 plus the one red statement in Listing 7 is analogous to the stylesheet shown in Listing 26 from a functional viewpoint.
case Node.ELEMENT_NODE:{
  if(node.getNodeName().equals("top")){
    out.println("B Match top");
    applyTemplates(node,"theData");
  }else if(node.getNodeName().equals("theData")){
    out.println(
          "C Match theData and show attribute");
    out.println(valueOf(node,"@attr"));
    applyTemplates(node,"title");
  }else if(node.getNodeName().equals("title")){
    out.println(
        "D Match title and show value of title as "
                                      + "context");
    out.println(valueOf(node,"."));
    out.println("E Show value of subtitle");
    out.println(valueOf(node,"subtitle"));
    applyTemplates(node,"subtitle");
  }else if(node.getNodeName().equals("subtitle")){
    out.println(
     "F match subtitle and show value of attribute");
    out.println(valueOf(node,"@position"));
    out.println(
        "G Show value of subtitle as context node");
    out.println(valueOf(node,"."));
  }else{//invoke default behavior
    defElOrRtNodeTemp(node);
  }//end else
  break;
}//end case ELEMENT_NODE
Listing 23
|
As you can see, the code in Listing 23 is no more complex than the stylesheet. The point is that once you have a library of Java methods that emulate the required XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
Run the Program
I encourage you to copy the Java code, XML file, and XSL file from the listings near the end of this lesson. Compile and execute the program. Experiment with the files, making changes, and observing the results of your changes.
Summary
In this lesson, I showed you how to write a Java program that mimics an XSLT transformation for converting an XML file into a text file. I showed that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
What's Next?
In the next lesson, I will show you how to use XSLT to transform an XML document into an XHTML document. I will also show you how to write Java code that performs the same transformation.
Complete Program Listings
/*File Dom12.java |
<?xml version="1.0"?> |
<?xml version='1.0'?> |
Copyright 2004, Richard G. Baldwin. Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.
About the author
Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.
Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.
Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.
-end-
