Java Programming Notes # 2210
- Preface
- Preview
- Some Details Regarding XHTML
- Some Details Regarding XSLT
- Discussion and Sample Code
- Run the Program
- Summary
- What’s Next?
- Complete Program Listings
Preface
In the previous lesson entitled Java
JAXP, Writing Java Code to Emulate an XSLT Transformation, I showed
you how to
write a Java program that mimics an XSLT transformation for converting
an XML file into a text file. I also showed that once you have a
library of Java
methods that
emulate XSLT elements, it is no more difficult to
write a Java program to transform an XML document than it is to
write an XSL stylesheet to transform the same document.
In this lesson, I will show you how to use XSLT to transform an XML
document into an XHTML document. I will also show you how to
write Java code that performs the same transformation.
This lesson is one in a series designed to teach you how to use JAXP
and Sun’s Java Web Services Developer
Pack
(JWSDP).
The first lesson in the series was entitled Java
API for XML Processing (JAXP), Getting Started.
As mentioned above, the
previous lesson was entitled Java
JAXP, Writing Java Code to Emulate an XSLT Transformation.
JAXP, XML, XSL, XSLT, W3C, and XHTML, a Review
JAXP is an
API designed
to help you write programs for creating and processing XML
documents. It is a critical part of Sun’s Java Web Services Developer
Pack
(JWSDP).
XML is an acronym for the eXtensible
Markup Language.
I will assume that you already
understand
XML, and will teach you how to use JAXP to write programs for
creating and processing XML documents.
XSL is an acronym for Extensible Stylesheet Language.
XSLT is an acronym for XSL Transformations.
The numerous uses of XSLT include the following:
- Transforming non-XML documents into XML documents.
- Transforming XML documents into other XML documents.
- Transforming XML documents into non-XML documents.
This
lesson explains a Java program
that transforms an XML document into an XHTML document.
An XHTML document is an XML
document that provides a rigorous alternative to the use of an HTML document. According to
the W3C, XHTML 1.0 is a “Reformulation of HTML 4 in XML 1.0.”
Viewing tip
You may find it useful to open another copy of this lesson in a
separate browser window. That will make it easier for you to
scroll back and forth among the different listings and figures while
you are reading about them.
Supplementary material
I recommend that you also study the other lessons in my extensive
collection of online Java and XML tutorials. You will find those
lessons
published at Gamelan.com.
As of the date of this writing, Gamelan doesn’t maintain a
consolidated index of my tutorial lessons, and sometimes
they are difficult to locate there. You will find a consolidated
index at www.DickBaldwin.com.
Preview
A tree structure in memory
A DOM parser can be used to
create a tree structure in memory that represents an XML
document. In Java, that tree structure is encapsulated in an
object of the interface type Document.
Many operations are possible
Given an object of type Document (often called a DOM tree), there
are many
methods that
can be invoked on the object to perform a variety of operations.
Two ways to transform an XML document
There are at least two ways to transform the contents of an XML
document into another document:
- By writing Java code to manipulate the DOM tree and perform the transformation.
- By using XSLT to perform the transformation.
A skeleton library of Java methods
This is one of several lessons that show you
how to write the skeleton of a Java library containing methods that
emulate the most common XSLT elements. Once you have the library,
writing Java code to transform XML documents consists mainly of writing
a short driver program to access those methods. Given the proper
library of methods, it is no more difficult to write a
Java program to perform the transformation than it is to write
an
XSLT stylesheet.
Library is not my primary purpose
However, my primary purpose in these lessons is not to provide such
a library. Rather, it is to help you understand how to use a DOM
tree to create, modify, and manipulate XML documents. By
comparing Java code that manipulates a DOM tree with similar XSLT
operations, you will have an opportunity to learn a little about XSLT
in the process of learning how to manipulate a DOM tree using Java code.
Some Details Regarding XHTML
XHTML documents, a special case
An XHTML document is an XML document. It is a rigorous
alternative to an HTML document.
One of
the interesting
uses of XSLT is the transformation of XML documents into
XHTML documents. This
makes it possible to render the information contained in an XML
document using an XHTML-compatible Web browser.
Where does the transformation take place?
When transforming an XML document for rendering
with an XHTML browser, the transformation can take place anywhere
between the
source of the XML document and the browser.
Transforming on the server
For example, a transformation program can be written in Java and run
on a web
server as a
servlet, or it can be written as a JavaBeans component and accessed
from a scriptlet in JavaServer Pages (JSP).
Transforming at the browser
The transformation can also be performed by the browser. For
example, Microsoft IE 6.0 and XSLT can be used for this
purpose.
Will transform XML into XHTML
This and the next several lessons will illustrate parallel Java code
and XSLT transformations to transform XML documents into XHTML
documents. The sample programs will illustrate various aspects of
the manipulation of a DOM tree using Java code.
Requirements for XHTML documents
According to Web Design
& Development Using XHTML by Griffin, Morales, and Finnegan, an
XHTML document differs from an HTML document in the following ways:
- XHTML documents must be well-formed.
- Element and attribute names must be in lower case.
- Non-empty elements require end tags.
- Attribute values must always be quoted.
- XHTML documents have no attribute minimization.
- XHTML documents terminate empty elements (for example, <br />).
- XHTML documents use elements with id and name attributes.
- XHTML documents use Document Type Declarations.
- XHTML documents use XML namespaces.
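The first few requirements are easy to demonstrate at the parser level. The following sketch is my own illustration and is not part of the Dom03 program. It uses the standard JAXP DocumentBuilder to test whether a few markup fragments are well-formed. A non-validating parser like this one checks well-formedness only. It will not enforce the lowercase-name or DTD requirements, because those require validation against an XHTML DTD. The class name WellFormedCheck and the sample fragments are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is a well-formed XML document.
class WellFormedCheck {
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;   // not well-formed (or the parser could not be created)
        }
    }

    public static void main(String[] args) {
        // HTML habits that an XML parser rejects:
        System.out.println(isWellFormed("<p>One<br>Two</p>"));      // false: unterminated empty element
        System.out.println(isWellFormed("<p align=center>Hi</p>")); // false: unquoted attribute value
        System.out.println(isWellFormed("<P>Hi</p>"));              // false: tag names differ in case
        // XHTML-style equivalents:
        System.out.println(isWellFormed("<p>One<br />Two</p>"));    // true
        System.out.println(isWellFormed("<p align=\"center\">Hi</p>")); // true
    }
}
```

Note that the third fragment fails because XML names are case-sensitive. That is related to, but not the same as, the XHTML rule that names must be lower case.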
Although it is not a requirement, an XHTML document often has an XML
declaration at the beginning to identify the document as an XML
document.
Some Details Regarding XSLT
Previous lessons in this series have provided quite a bit of
detailed information regarding the operation of XSLT. Therefore,
this discussion will be brief.
Assume that an XML document has been parsed to produce a DOM
tree
in memory that represents the XML document.
Execute template rules
An XSLT processor starts examining the DOM tree at its root
node. It obtains instructions from the XSLT stylesheet telling it how to
navigate the tree, and how to treat each node that it encounters along the way.
As each node is encountered, the processor searches the stylesheet
looking for a template rule whose match pattern matches that node. If the
processor finds a matching template rule, it performs the operations
indicated by the template rule. Otherwise, it
executes a built-in template rule appropriate to that kind of node.
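The search-then-fallback behavior described above can be sketched in Java. This is a simplified sketch under stated assumptions, not the library developed in this series. Template rules are keyed only by element name, whereas real XSLT match patterns are XPath patterns. Only the built-in rules for the root node, element nodes, and text nodes are modeled. The class name TemplateDispatch is hypothetical, and text is appended without escaping to keep the example short.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of template-rule dispatch with built-in fallbacks.
class TemplateDispatch {
    // Element name -> template rule (a stand-in for XSLT match patterns).
    private final Map<String, BiConsumer<Node, StringBuilder>> rules = new HashMap<>();

    void addRule(String elementName, BiConsumer<Node, StringBuilder> rule) {
        rules.put(elementName, rule);
    }

    // Emulates <xsl:apply-templates/> with no select attribute:
    // process every child of the context node in document order.
    void applyTemplates(Node context, StringBuilder out) {
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            process(children.item(i), out);
        }
    }

    // Use a matching rule if one exists; otherwise use a built-in rule.
    void process(Node node, StringBuilder out) {
        short type = node.getNodeType();
        BiConsumer<Node, StringBuilder> rule =
            (type == Node.ELEMENT_NODE) ? rules.get(node.getNodeName()) : null;
        if (rule != null) {
            rule.accept(node, out);
        } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());   // built-in rule: copy the text (unescaped)
        } else if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            applyTemplates(node, out);         // built-in rule: recurse into children
        }
        // Built-in rules for comments and processing instructions output nothing.
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        TemplateDispatch t = new TemplateDispatch();
        // Analogous to <xsl:template match="G"><b><xsl:apply-templates/></b></xsl:template>
        t.addRule("G", (node, out) -> {
            out.append("<b>");
            t.applyTemplates(node, out);
            out.append("</b>");
        });
        StringBuilder result = new StringBuilder();
        t.process(parse("<E>Item <G>bold</G> end</E>"), result);
        System.out.println(result);   // Item <b>bold</b> end
    }
}
```

The element E has no rule, so the built-in element rule simply recurses into its children. Only G produces any markup of its own.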
Literal text in template rules
If the template rule being applied
contains literal text, that literal text is used to
create text in the output.
Traversal of the DOM tree
There are at least two XSLT elements that can be used to
traverse the children of a context node:
- xsl:apply-templates
- xsl:for-each
The xsl:apply-templates element
The xsl:apply-templates
element was discussed in detail in previous lessons.
The xsl:for-each element
The xsl:for-each element iterates over the nodes selected by the
expression in its required select attribute. In this lesson, those are
the child nodes of the context node that have a particular name. As each
selected node is examined, it is processed using the XSLT elements that form the
content of the xsl:for-each element in the template rule.
This lesson will include examples that use the xsl:for-each element in addition to
the xsl:apply-templates
element. The lesson will also explain a Java method that emulates
the xsl:for-each element.
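As a preview, here is a minimal sketch of the idea. It is not the method used in Dom03, and the class name ForEach is hypothetical. It handles only the simple case used in this lesson, where the select attribute names child elements of the context node. A general select expression is an XPath expression, and sorting with xsl:sort is not modeled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical emulation of <xsl:for-each select="name"> where "name"
// is the name of child elements of the context node.
class ForEach {
    static void forEachChild(Node context, String name, Consumer<Element> body) {
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Run the body only for element children with the requested name,
            // in document order.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals(name)) {
                body.accept((Element) child);
            }
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<D><E>one</E><F>skip</F><E>two</E></D>");
        List<String> values = new ArrayList<>();
        forEachChild(doc.getDocumentElement(), "E", e -> values.add(e.getTextContent()));
        System.out.println(values);   // [one, two]
    }
}
```

Passing the loop body as a Consumer keeps the iteration logic in one place, much as the content of an xsl:for-each element is separate from its select attribute.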
Enough talk, let’s see some code
I will begin by discussing the XML file named Dom03.xml (shown in Listing 24 near the end of the
lesson) along with
the XSL
stylesheet file named Dom03.xsl
(shown in Listing 25).
A Java program named Dom03
After explaining the transformation produced by applying this
stylesheet to this XML document, I will explain the transformation
produced by processing the XML file with a Java program named Dom03 (shown in Listing 23) that mimics
the behavior of the XSLT transformation.
Discussion and Sample Code
The XML file named Dom03.xml
The XML file shown in Listing 24 is relatively straightforward. A
tree view of the XML file is shown in Figure 1. (This XML file is both well-formed and
valid.)
Figure 1 Tree view of the XML file Dom03.xml
(This tree view of the XML file was
produced using a program named DomTree02, which was discussed in an
earlier lesson. Note that, in order to make the tree view more
meaningful, I manually removed extraneous line breaks and the text nodes
associated with those line breaks. The extraneous
line breaks in Figure 1 were caused by extraneous line breaks in the
XML file, which were placed there for cosmetic reasons and to force
it to fit into this narrow publication format.)
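If you don't have DomTree02 handy, the following minimal sketch produces a similar indented view. It is my own hypothetical illustration, not the DomTree02 program. It prints each node's name and type constant and skips whitespace-only text nodes, which is roughly what I did by hand for Figure 1.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical tree printer in the style of the tree views in this lesson.
class TreeView {
    // Map a DOM node type code to the constant name shown in the figures.
    static String typeName(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:  return "DOCUMENT_NODE";
            case Node.ELEMENT_NODE:   return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE: return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:      return "TEXT_NODE";
            case Node.COMMENT_NODE:   return "COMMENT_NODE";
            default:                  return "NODE_TYPE_" + type;
        }
    }

    static void print(Node node, String indent, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().trim().isEmpty()) {
            return;   // skip text nodes that only hold line breaks and indentation
        }
        out.append(indent).append(node.getNodeName()).append(' ')
           .append(typeName(node.getNodeType()));
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(' ').append(node.getNodeValue().trim());
        }
        out.append('\n');
        NamedNodeMap attrs = node.getAttributes();   // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(indent).append("  ").append(a.getNodeName()).append(' ')
                   .append(typeName(a.getNodeType())).append(' ')
                   .append(a.getNodeValue()).append('\n');
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            print(children.item(i), indent + "  ", out);
        }
    }

    static String render(Node root) {
        StringBuilder sb = new StringBuilder();
        print(root, "", sb);
        return sb.toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(parse("<A>\n  <Q>Title</Q>\n</A>")));
    }
}
```

For the small sample in main, the output begins with "#document DOCUMENT_NODE", followed by A, then Q, then the text node, each indented one level deeper.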
Content of the XML document
The structure and content of the XML document were primarily designed to
illustrate various transformation concepts that I intend to explain in
this lesson. However, to some extent, I designed the structure and
content keeping in mind the ultimate rendering of the XHTML file that
will be produced by transforming the XML file into an XHTML file.
The rendered XHTML file
At this point, I’m going to jump ahead and show you what the final
XHTML file looks like when rendered using Netscape Navigator v7.1. The
rendering of the XHTML file is shown in Figure 2.
(Compare the rendering in Figure 2 with the XML file structure and content in Figure
1. You should be able to identify text nodes in Figure 1 that
match up with rendered text in Figure 2.)
Figure 2 Rendered XHTML file
The XSLT Transformation
The XSL stylesheet file named Dom03.xsl
Recall that an XSL stylesheet is itself an XML file, and can therefore
be represented as a tree. Figure 3 presents an
abbreviated tree view of the stylesheet shown in Listing 25. I
colored each of the template rules in this view with alternating
colors of red and blue to make them easier to identify. (As is often the
case with XSL stylesheets, this stylesheet file is well-formed but it
is not valid.)
Figure 3 Abbreviated tree view of the stylesheet Dom03.xsl
Why abbreviated?
I refer to this as an abbreviated tree view because I manually deleted
comment nodes and extraneous text nodes in order to emphasize the important
elements in the stylesheet. (The extraneous text nodes were a by-product
of inserting line breaks in the original XSL document for cosmetic
purposes. Note that I also manually entered several line breaks near the
beginning to force the material to fit into this narrow
publication format.)
The root element
The root node of all XML documents is the document node. In
addition to the root node, there is also a root element, and it is
important not to confuse the two.
As you can see from Figure 3, the root element in the XSL document is
of type xsl:stylesheet.
The root element has two attributes, each of which is standard for XSL
stylesheets. (Note that I manually entered a line break
in the second attribute of the xsl:stylesheet
node to force it to fit into this narrow publication format. I also
manually entered line breaks into two of the attributes of the xsl:output element node for
the same reason.)
The first attribute provides
the XSLT
version.
The second attribute points to the XSLT namespace URI, which you can
read about in the W3C
Recommendation.
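The distinction between the root node and the root element is easy to see from Java. In the following sketch, the class name is hypothetical and the stylesheet is an assumed minimal one. It carries a version attribute of 1.0 and declares the standard XSLT namespace URI, http://www.w3.org/1999/XSL/Transform. The Document object is the root node. Its getDocumentElement method returns the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class RootVsRootElement {
    // Assumed minimal stylesheet with the two standard root-element attributes.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"/>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);   // required for getNamespaceURI() to report the URI
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XSL);
        System.out.println(doc.getNodeName());                   // #document (the root node)
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName());                  // xsl:stylesheet (the root element)
        System.out.println(root.getParentNode().getNodeName());  // #document
        System.out.println(root.getAttribute("version"));        // 1.0
        System.out.println(root.getNamespaceURI());              // http://www.w3.org/1999/XSL/Transform
    }
}
```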
Children of the root element node
The root element node (xsl:stylesheet)
in Figure
3 has ten child
nodes, nine of which are template rules. (The green child node is not a template
rule. I will discuss it in detail later.) I colored
the template rules in alternating colors of red and blue to make them
easier to identify
visually.
The template rules
Each of the nine template rules has a match pattern. The nine match
patterns, in the order that they appear in Figure 3, are as follows:
- match=/ (matches the root node)
- match=B (matches element node named B)
- match=C (matches element node named C)
- match=D (matches element node named D)
- match=G (matches element node named G)
- match=Q (matches element node named Q)
- match=R (matches element node named R)
- match=S (matches element node named S)
- match=T (matches element node named T)
I will discuss the template rules later, but before doing that
I will show you the raw XHTML output produced
by this XSLT transformation. (As you will see, the Java program discussed
later produces essentially the same output as the XSLT transformation.)
The output from the transformation
The result of performing an XSLT transformation (by applying the XSL
stylesheet shown in Listing 25 to the XML file shown in Listing 24)
is
shown in Figure 4. This is the raw XHTML code that
was rendered in Figure 2.
I will explain the operations in the XSLT transformation that produced
most of the text in Figure 4.
Figure 4 Raw XHTML output from the XSLT transformation
(Note that I manually deleted a couple of extraneous line breaks from
the output shown in Figure 4. It was also necessary for me to
manually insert line breaks in several of the long lines to force the
material to fit in this narrow publication format. I also
manually inserted line breaks at certain critical points to make it
easier to interpret the material visually.)
Can sometimes get confusing
I will caution you up front that this discussion can become
confusing, but I will do everything that I can to minimize the
confusion. The problem is that the discussion will be mixing
tags, attributes and elements from the XML file with tags, attributes,
and
elements from the stylesheet file and the XHTML file. With so
many tags, attributes, and elements being discussed, it is sometimes
difficult to keep
them separated in your mind.
In particular, in order to cause the output to be a valid XHTML
document, it is necessary to manually insert XHTML tags, attributes,
and elements in the XSL template rules, which themselves involve XML
tags, attributes, and elements.
I will make heavy use of color in an attempt to minimize the confusion.
The first line of text
The first line of text in the output shown in Figure 4
is an XML declaration
that is produced automatically by the XSLT transformer available with
JAXP. As I mentioned earlier, such a declaration is not
required, but is highly recommended by most authors.
The xsl:output element
Before getting into the template rules in Figure 3, I need to explain
the xsl:output element shown
in green in Figure 3 and reproduced in Figure 5 below for convenient
viewing.
Figure 5 Tree view of the xsl:output element
The XSL stylesheet version
Listing 1 shows the XSL code that corresponds to the tree view of the
stylesheet element shown in Figure 5.
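Listing 1
<xsl:output method="xml"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
(This is a good place to remind you that it was necessary for me to manually insert line
breaks in Listing 1 to cause the material to fit in this narrow
publication format.)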
Literal text passes through to the output
As you learned in the previous lesson, any literal text that you
include in your XSL stylesheet will be passed through to the
output. As you will see later, I will cause the output to contain
much of the required XHTML text simply by including that XHTML text as
literal text in the stylesheet.
The stylesheet is an XML document
It is important to remember, however, that the XSL stylesheet is itself
an XML document, and you cannot include any literal text that would
cause a parser
to reject it as an XML document. You also cannot do anything that
will cause the XSLT processor to reject it as a stylesheet.
XHTML document requires a specific DTD reference
One of the things that is required in the XHTML output is the DTD
reference
shown in Figure 6.
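Figure 6 DTD reference required in the XHTML output
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
(The material in Figure 6 was extracted from Figure 4 and reproduced here
for convenient viewing. This is one of three alternative DTDs
that can be used with an XHTML document.)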
Correct DTD for XHTML but not for stylesheet
The DTD reference in Figure 6 is a correct DTD reference for an XHTML
document, but it is not a correct DTD reference for an XSL
stylesheet. (In fact,
stylesheets don’t require a DTD and often don’t have one.)
If you simply include the text from Figure 6 as literal text in the
stylesheet, (in hopes that it will
pass through to the output), the XSLT processor will interpret
it as a DTD reference for the stylesheet, and will attempt to validate
the stylesheet against that reference. The stylesheet will then
be declared invalid and the transformation effort will fail.
Therefore, you must find a way to cause this DTD reference to end up in
the XHTML document without confusing the XSLT transformation process.
Two ways to accomplish that
I know of two ways to accomplish that objective. One way is to
include the text from Figure 6 in a CDATA section in the
stylesheet. This
raises some other issues, but it can be made to work.
The easier way is to use the xsl:output
element shown in Listing 1 to cause the DTD reference to be written
into the output without confusing the parser or the XSLT processor.
The xsl:output element
Here is a partial quotation from XML In A Nutshell (which I highly recommend) by
Elliotte Rusty Harold and W. Scott Means:
“The top-level xsl:output element helps determine the exact formatting of
the XML document produced when the result tree is stored in a file,
written onto a stream, or otherwise serialized into a sequence of
bytes.”
Ten optional attributes
To make a long story short, this element has ten optional attributes
that are used by the XSLT processor to determine the formatting of the
output. The XSLT element shown in Listing 1 specifies values for
three of those optional attributes:
- method
- doctype-public
- doctype-system
The default value for method is usually xml. However, the XSLT 1.0
Recommendation specifies that the default becomes html when the document
element of the result tree is named html and has no namespace. That is the
case here, because the html tag in this stylesheet carries no namespace
attribute. Therefore, specifying method explicitly matters in this
stylesheet. A value of xml (as in Listing 1)
instructs the processor to produce well-formed XML output.
The doctype-public attribute
sets the public identifier used in the document type declaration.
The doctype-system attribute
sets the system identifier used in the document type declaration.
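JAXP exposes the same serialization controls to Java code through Transformer.setOutputProperty and the constants in javax.xml.transform.OutputKeys. The following sketch is an illustration under assumptions: the class name is hypothetical and the tiny html/body tree is made up. It uses an identity transformer (TransformerFactory.newTransformer() with no stylesheet) to serialize an in-memory DOM. The three properties play the role of the three xsl:output attributes in Listing 1.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class OutputDoctype {
    static final String PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";
    static final String SYSTEM_ID =
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    static String serialize() {
        try {
            // Build a tiny result tree in memory: <html><body>Hello</body></html>
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            body.appendChild(doc.createTextNode("Hello"));
            html.appendChild(body);
            doc.appendChild(html);

            // newTransformer() with no stylesheet returns an identity transformer.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");             // like method="xml"
            t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, PUBLIC_ID); // like doctype-public
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, SYSTEM_ID); // like doctype-system

            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```

The serialized text should contain an XML declaration, followed by a DOCTYPE that names html and carries both identifiers, and then the html element. The exact whitespace around the DOCTYPE can vary between JAXP implementations.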
The required XHTML DTD
There are three allowable DTDs that can be used for an XHTML document:
- Strict
- Transitional
- Frameset
I’m not going to get into the differences between these three
DTDs in this lesson. Suffice it to say that I elected to use the
transitional
DTD for this example because it is somewhat easier to use than the
other two.
The transitional DTD
The W3C identifies the DTD for XHTML 1.0 Transitional with the
following PUBLIC and SYSTEM identifiers:
PUBLIC
“-//W3C//DTD XHTML 1.0 Transitional//EN”
SYSTEM
“http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd”
As you can see, these values match the doctype-public
and doctype-system attribute
values in Listing 1, and result in the correct output for the
XHTML DTD in Figure 6.
The first template rule
The first template rule (extracted
from Figure 3 and given a different color scheme) is shown in
tree view in Figure 7. This
template rule contains an XPath expression that matches the document
root (note the forward slash).
Figure 7 Tree view of the template rule that matches the root node
The template rule in XSL format
Listing 2 shows the same template rule in XSL format (extracted from
Listing 25).
Listing 2
<xsl:template match="/"> …
(Note that, according to most of the books that I have read, the following
namespace and language attributes should be used on the html tag:
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"
However, something about the namespace attribute causes problems with the
JAXP transformer, so I left it off. The resulting XHTML file is still valid
according to the W3C Markup Validation Service even without the namespace attribute.)
The literal text is shown in red
From my viewpoint as the author of the stylesheet, everything that is
colored red in Listing 2 is simply literal text that I want to pass
through to the output so that it will become part of the raw XHTML text.
The template rule must be well-formed
However, as you can see from Figure 7, the XML parser considers all of
this material to be well-formed (but
not valid) XML element nodes, attribute nodes, and text
nodes. Were I to make a change to any of the red literal text
that would corrupt the well-formed nature of the XML code in Listing 2,
the
stylesheet could not be used to control an XSLT transformation.
While a stylesheet is not required to be valid, it is required to be
well-formed.
Must be very careful when including markup in stylesheet
Therefore, you must be very careful when you include literal markup
text in the stylesheet for whatever purpose. Any markup that you
include in the stylesheet must result in the stylesheet being
well-formed.
(This was not an issue with the use
of literal text in the stylesheet in the previous lesson, because the
literal text didn’t contain markup characters. As a result, the
literal text was interpreted simply as text nodes in the
stylesheet. As you can see from Figure 7, however, the literal
markup text that was included in this stylesheet was interpreted by the
parser as element nodes, attributes, and text nodes.)
A very simple template rule
At first blush, this template rule appears to be very long and very
complex. However, as you can see from Listing 2, once you set aside
all of the literal XHTML text that’s included in the template rule,
the actual XSLT template rule is very simple. This rule simply
passes a lot of literal markup text through to the output and causes
templates to be applied to all children of the root (document) node. (You
learned what it means to apply templates in the previous lesson.)
The XHTML tags
If you are familiar with XHTML syntax, you will recognize that the
literal text shown in red in Listing 2 begins with typical XHTML tags
such as <html>, <head>, and <body>. These
tags are required for an XHTML document. This text is sent to the
output before any processing of the DOM tree is performed.
Then the literal text creates an XHTML table with a yellow
background. The start tags for the table are sent to the output
before the xsl:apply-templates
element is executed.
All of the output produced by executing the xsl:apply-templates element is
inserted into a single data <td> cell in the table.
Finally, when the xsl:apply-templates
element returns, the end tags for the table and the end tags for the
document are sent to the output.
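The shape of this rule (literal start markup, then apply-templates, then literal end markup) can be sketched in Java as shown below. The literal markup here is a hypothetical, simplified stand-in for the red text in Listing 2, including the made-up title text. Only the built-in rules are used below the root. The class and method names are also hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RootTemplate {
    // Built-in behavior only: copy text nodes, recurse into element nodes.
    static void applyTemplates(Node context, StringBuilder out) {
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                applyTemplates(child, out);
            }
        }
    }

    // Emulates <xsl:template match="/">: literal start tags, then
    // apply-templates on the document node, then literal end tags.
    static String transformRoot(Document doc) {
        StringBuilder out = new StringBuilder();
        // Hypothetical, simplified literal text (not the exact text of Listing 2).
        out.append("<html><head><title>Sketch</title></head><body>");
        out.append("<table bgcolor=\"yellow\"><tr><td>");
        applyTemplates(doc, out);   // all of this output lands in a single <td> cell
        out.append("</td></tr></table>");
        out.append("</body></html>");
        return out.toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transformRoot(parse("<A><Q>Hello</Q><B>world</B></A>")));
    }
}
```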
The raw XHTML output
Figure 8 shows a condensed version of the raw XHTML output. The
XHTML output shown in red in Figure 8 matches the literal text shown in
red in the template rule of Listing 2.
Figure 8 Condensed raw XHTML output
The effect of xsl:apply-templates
Referring once again to Listing 2, we see that this template rule
causes templates to be applied to all child nodes of
the root or document node. A root node can have only one child
element node, which is the root element node. (It may also have comment
and processing-instruction children.) Referring back to Figure 1,
we see that the root element node is named A.
Now referring back to the tree view of the stylesheet in Figure 3 (and also the list of match patterns
presented earlier), we see that the stylesheet doesn’t contain a
template rule that matches an element named A.
Important to understand built-in behavior
If the processor encounters a node for which there is no matching
template rule, it executes a built-in template rule for that
type of node. This is where it becomes important to understand
the behavior of the built-in template rules, which I explained in the
earlier lesson entitled Java
JAXP, Implementing Default XSLT Behavior in Java.
The behavior of the built-in template rule for element nodes is to
apply templates to all child nodes of the element node.
Therefore, in this case, the processor will apply templates to all
child nodes of the root element node named A.
Referring back to Figure 1, we see that the root element node has three
child nodes, which occur in the following order: Q, B, and
B.
Therefore, the first node that will be processed is the node named Q.
A template rule that matches Q
Figure 9 and Listing 3 show a template rule that matches an element
named Q.
Figure 9 Tree view of the template rule that matches Q
The tree view of the template rule is shown in Figure 9. The XSL
stylesheet code is shown in Listing 3.
Listing 3
<xsl:template match="Q"> …
A level 1 header in the output
This template rule sends the start and end tags for a level 1 XHTML
header to the output, and inserts something between those tags by
applying templates to all child nodes of the element node named Q.
Referring back to the element node named Q in Figure 1, we see that it has
only one child node, and that node is a text node. Executing the xsl:apply-templates element on a
text node causes the built-in version of the template rule to be
applied. The built-in version gets the value of the text node and
sends it to the output. This produces the raw XHTML output shown
in Figure 10.
Figure 10 Raw XHTML output (an h1 element) produced by the template rule that matches Q
You should be able to easily identify the header from Figure 10 in the
first line of the rendered output in Figure 2.
A template rule that matches B
That takes care of processing the root element node’s child named Q. The next child to be
processed is a child node named B.
A template rule that matches an element node named B is shown in Figure 11 and Listing
4.
Figure 11 Tree view of the template rule that matches B
As before, the tree view is shown in Figure 11 and the stylesheet code
is shown in Listing 4.
Listing 4
<xsl:template match="B"> …
This template rule is very simple. It causes templates to be applied
to all child nodes of the element node named B.
Referring back to Figure 1, we see that the first child node named B has several child nodes, which
occur in the following order: C, R, C, S, B, S, B, R, C.
An abbreviated DOM tree
Don’t worry, I’m not going to discuss them all. In fact, I’m
going to ignore many of those nodes and their descendants, and
concentrate on the abbreviated portion of the DOM tree shown in Figure
12. I am going to concentrate on this portion because it uses
XSLT templates not previously discussed in this lesson or in my earlier
lessons.
Figure 12 Abbreviated DOM tree below the first element node named B
To help you keep your bearings, the first node named B in Figure 12 is the first node
named B belonging to the root
element node named A in Figure
1. That
node named B will be the
starting point for the following discussion. Nodes have
been manually removed from Figure 12 at each point where you see an
ellipsis (…). I will ignore those nodes.
Traversing down the DOM tree
As you saw in the template rule that matches B in Listing 4, each time the
processor encounters an element node named B, templates are applied to all
child nodes of that node and no other action is required.
Therefore, we can immediately skip down to a discussion of the element
node named D.
A template rule that matches D
Figure 13 shows a tree view of the template rule that matches D.
Figure 13 Tree view of the template rule that matches D
The stylesheet code for the template rule that matches D is shown in Listing 5.
Listing 5
<xsl:template match="D">List of items in E …
In an attempt to separate the text
and markup that controls the transformation process from the text and
markup destined to become part of the XHTML document, I colored the
latter red in Figure 13 and Listing 5. I also colored the XML
comments blue in Listing 5 to make them easy to ignore.
A simpler version
In an attempt to make it even easier to understand the behavior of this
template rule, I have reproduced it in Listing 6 with all literal text
and all comments removed. I also added indentation to help with
the visual aspect of the XSL code.
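Listing 6 (literal text and comments manually removed)
<xsl:template match="D">
  <xsl:for-each select="E">
    <xsl:apply-templates/>
  </xsl:for-each>
  <xsl:for-each select="F">
    <xsl:apply-templates/>
  </xsl:for-each>
</xsl:template>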
First consider the behavior of the
top half of the template rule in Listing 6. This rule is invoked
whenever the processor encounters an element node named D.
<xsl:for-each select="E">
The processor identifies all child nodes of D whose name is E and processes them in the order in
which they occur. (It is also possible to process the
nodes in sorted order using a more complex implementation, but that
isn’t being done here. That will be the topic for a future
lesson.)
<xsl:apply-templates>
The processing that is applied to each child node named E depends on the elements that
follow the xsl:for-each
element in the template rule. In this case, the processor is
instructed to apply templates to all child nodes of each node named E.
Referring back to Figure 12, you will see that the node named D has three child nodes named E and three child nodes named F. (I colored those nodes in
alternating colors of red and blue to make them easier to identify
visually.)
One of the child nodes named E
has a child node named G.
No matching template rules for E or F
Referring back to the tree view of the stylesheet in Figure 3, you can
see that there are no matching template rules for nodes named E or F. However, there is a
matching template rule for nodes named G.
Apply built-in template rule to node E
When the processor encounters the first node named E, it will apply the built-in
template rule for element nodes. That will cause it to apply
templates to all child nodes of the node named E. The first child node that
it will encounter will be a text node containing the following text:
First list item in E
This text will be sent to the
output.
Then it will encounter the node named G
and apply the matching template rule to that node. The tree view
of that template rule is shown in Figure 14.
Figure 14 Tree view of the template rule that matches G
The stylesheet code for the template rule that matches G is shown in Listing 7.
Listing 7
<xsl:template match="G"> …
This template rule applies templates
to all child nodes of G, and
surrounds the output produced by that operation with the XHTML start
and end tags to cause that material to be displayed as bold.
Referring back to Figure 12, we see that the node named G has only one child node. It
is a text node containing the following text:
Nested G text element
That text will be sent to the output
next, surrounded by XHTML bold tags, <b>…</b>.
That completes the processing of the first child of D named E.
Note in Figure 12 that the next child node of D is a node named F. However, we are discussing
the behavior of that portion of the template rule shown in Listing 6
that uses the xsl:for-each
element to iterate on nodes named E.
Therefore, the processor will skip over the node named F and process the next node named E.
This is a simple node that has only one child node and it is a text
node containing the following text:
Second list item in E
This text will be the next thing to
be sent to the output.
The node named D has one more child node named E, and it has a single
child node, which is a text node. The text node contains the
following text:
Third list item in E
When that text is sent to the
output, the execution of the top half of the template rule shown in
Listing 6 will be complete. Then the processor will execute the
bottom half of the template rule in Listing 6. The bottom half is
identical to the top half except that it iterates on child nodes named F, so I won’t discuss it in detail.
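The combined behavior of Listings 5, 6, and 7 can be sketched in Java. This sketch is illustrative and makes several assumptions. It produces an unordered list for the E children and an ordered list for the F children, and it renders G in bold. It omits the literal text and comments from Listing 5, such as "List of items in E". The F text in the sample input is made up, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TemplateD {
    // Apply templates: G uses its own rule (bold); other elements use the
    // built-in rule (recurse); text nodes are copied to the output.
    static void applyTemplates(Node context, StringBuilder out) {
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                boolean bold = child.getNodeName().equals("G");
                if (bold) out.append("<b>");
                applyTemplates(child, out);
                if (bold) out.append("</b>");
            }
        }
    }

    // Emulates <xsl:for-each select="name"> for child elements of context.
    static void forEachChild(Node context, String name, Consumer<Element> body) {
        NodeList children = context.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                body.accept((Element) child);
            }
        }
    }

    // Emulates the D rule: one <li> per E inside <ul>, then one <li> per F inside <ol>.
    static String templateD(Element d) {
        StringBuilder out = new StringBuilder("<ul>");
        forEachChild(d, "E", e -> { out.append("<li>"); applyTemplates(e, out); out.append("</li>"); });
        out.append("</ul><ol>");
        forEachChild(d, "F", f -> { out.append("<li>"); applyTemplates(f, out); out.append("</li>"); });
        out.append("</ol>");
        return out.toString();
    }

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element d = parseRoot("<D><E>First list item in E<G>Nested G text element</G></E>"
            + "<F>Item F1</F><E>Second list item in E</E></D>");
        System.out.println(templateD(d));
    }
}
```

As in the XSLT version, the first loop skips the F node between the two E nodes. The F node is picked up only by the second loop.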
Let’s look at the XHTML output
Before moving along, let’s take a look at the raw XHTML produced by the
template rule shown in Listing 5. That XHTML output is shown in
Figure 15.
Figure 15 Raw XHTML output produced by the template rule that matches D
Black text originates in XML document
The black text in Figure 15 originated in the XML file shown in Figure
12. You should be able to match the seven lines of black text in
Figure 15 to the corresponding text in Figure 12.
Red and blue text originates in stylesheet
The red text in Figure 15 originated in the stylesheet template rule
shown in Listing 5. This literal text is also shown in red in
Listing 5.
The blue text in Figure 15 originated in the template rule shown in
Listing 7. This text is also shown in blue in Listing 7.
How does it render?
If you go back and examine Figure 2, which shows the XHTML as rendered
by the Netscape Navigator browser, you should be able to identify the
output in Figure 2 produced by the raw XHTML text in Figure 15. (It occurs between the lines that read Text block 5 and Text block 6.)
As you can see, the template rule shown in Figure 5 used an xsl:for-each element
- To iterate on child nodes named E,
- To extract the text values of those nodes and their descendants, and
- To embed those values in XHTML elements to cause the values to be rendered as an unordered list.
The value of a child node of one of
the E nodes was also caused to
be rendered in bold.
Then the template rule used an xsl:for-each element
- To iterate on child nodes named F,
- To extract text values from those nodes, and
- To embed those values in XHTML elements to cause the values to be rendered as an ordered list.
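To see the xsl:for-each behavior described above in isolation, the following sketch runs a tiny stand-in stylesheet through the standard JAXP Transformer API. The stylesheet text and the class name ForEachXslt are assumptions written for illustration, not the contents of Dom03.xsl: the template matches a D element, builds an unordered list from its E children, and then builds an ordered list from its F children.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: a stand-in stylesheet (NOT Dom03.xsl) that uses
// xsl:for-each to turn E children into <ul> items and F children into <ol> items.
class ForEachXslt {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"D\">"
      // First loop: every child named E becomes a list item, in document order.
      + "<ul><xsl:for-each select=\"E\"><li><xsl:apply-templates/></li></xsl:for-each></ul>"
      // Second loop: every child named F becomes a numbered list item.
      + "<ol><xsl:for-each select=\"F\"><li><xsl:apply-templates/></li></xsl:for-each></ol>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms an XML string with the stylesheet above and returns the result text.
    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With input such as <D><E>e1</E><F>f1</F><E>e2</E></D>, both E items end up in the unordered list even though an F element sits between them, because each xsl:for-each selects only the children with its own name.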
New XSLT material has been covered
I could go on for hours discussing the interaction of this stylesheet
with
the XML file in the transformation process. However, a review of
the tree view of the
stylesheet in Figure 3 reveals that the behavior of the remaining
template rules has either been covered in this lesson or in a previous
lesson. Therefore, I will terminate this discussion of the XSLT
transformation at this point and discuss a Java program that mimics the
behavior of this XSLT transformation.
The Java Code Transformation
At this point, I will change
direction and
concentrate on Java code instead of XSLT elements. The
following paragraphs describe a Java program named Dom03, which emulates the XSLT
transformation described above. This program transforms an XML
file into an XHTML file using a combination of recursive and iterative
processing. Along the way, it creates and populates an XHTML
table.
This program defines a new method named forEach that mimics the behavior of
the xsl:for-each element
described above. In addition, this program adds code to the processDocumentNode and processNode
methods to emulate the template rules in the XSL file named Dom03.xsl.
Also, as was the case in the previous lessons, this program implements
six built-in template rules
for an XML processor.
Instructions for creating a custom template rule
To create a custom template rule for this program:
- Go to the processNode method.
- Identify the node type.
- Change the conditional clause in the if statement to implement the required match.
- Write code in the body of the if statement to implement the custom rule.
If the modified conditional clause
evaluates to true, the custom rule will be executed. If the modified conditional clause evaluates
to false, the default rule
will be executed. You will see examples of several custom
template rules
in this program.
Behavior of the program
This program compares the transformation of a specified XML file into a
result file, using two different approaches:
- An XSLT style sheet and transformation, as discussed above.
- Program code that emulates the behavior of the XSLT transformation.
In particular, this program
illustrates Java code that emulates the XSLT templates in the file
named Dom03.xsl.
Both output files are valid
The program produces two output files, one from the XSLT
transformation,
and one from executing the Java code. Both files validate as
XHTML transitional at the W3C validation service,
http://validator.w3.org/file-upload.html.
Both also validate as HTML files at
http://www.htmlhelp.com/tools/validator/upload.html.
Finally, both files validate using the program named DomTree02, which means that they
validate as XML under JAXP.
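A check in the same spirit can be sketched with JAXP: if a DocumentBuilder parses the text without throwing, the text is well-formed XML. The WellFormedCheck class is hypothetical (the code for DomTree02 is not reproduced here), and it does not validate against the XHTML DTD. Keep in mind that a real XHTML file with a DOCTYPE declaration may cause the parser to try to fetch the W3C DTD, so the inputs in the usage example have no DOCTYPE.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as well-formed XML.
// It performs no DTD validation.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // well-formedness only (also the default)
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Route parse errors to a handler; DefaultHandler rethrows fatal errors.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Typically a SAXParseException, meaning the text is not well-formed.
            return false;
        }
    }
}
```

For example, isWellFormed("<html><body><p>ok</p></body></html>") returns true, while a string with a missing </p> end tag returns false.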
Usage instructions
The program requires three command line arguments in the following
order:
- The name of the input XML file (must be Dom03.xml).
- The name of the output file to be produced by the XSLT transformation.
- The name of the output file to be produced by the program code that emulates the XSLT transformation.
The name of the XSL stylesheet file
is extracted from the processing instruction in the XML file, but you
could easily modify the program to obtain the name of that file from a
command-line argument.
Order of execution
The program begins by executing code to transform the incoming XML file
in a way that mimics the XSLT transformation. Along the way, it
saves the processing instructions containing the ID of the stylesheet
file for use by the XSLT transformation process later. Otherwise,
the code that
performs the XSLT transformation would have to search the DOM
tree for the XSL stylesheet file.
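One way to pull the stylesheet name out of the processing instruction is sketched below. It assumes the conventional form <?xml-stylesheet type="text/xsl" href="Dom03.xsl"?> placed before the root element; the class and method names are hypothetical. A regular expression is used to read the href pseudo-attribute because the DOM exposes the data of a processing instruction only as a single string.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical helper: finds the href of an xml-stylesheet processing instruction.
class StylesheetLocator {
    // PI data is one string, for example: type="text/xsl" href="Dom03.xsl"
    private static final Pattern HREF =
        Pattern.compile("\\bhref\\s*=\\s*[\"']([^\"']*)[\"']");

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Scans only the document node's children, where the PI normally appears.
    static String stylesheetHref(Document doc) {
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                    && "xml-stylesheet".equals(((ProcessingInstruction) n).getTarget())) {
                Matcher m = HREF.matcher(((ProcessingInstruction) n).getData());
                if (m.find()) {
                    return m.group(1);
                }
            }
        }
        return null; // no stylesheet processing instruction found
    }
}
```

Note that the XML declaration at the top of a file is not a processing instruction node in the DOM, so it never matches here.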
Then the program uses the XSLT style sheet to transform the XML file
into a result file by performing an XSLT transformation under program
control.
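The XSLT half can be sketched with the standard javax.xml.transform API. The XslRunner class below is hypothetical and works on strings rather than files. It is not the lesson's doXslTransform method, whose code is not reproduced in this section.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: apply an XSL stylesheet to an XML document under program control.
class XslRunner {
    static String transform(String xml, String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Compile the stylesheet into a Transformer.
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xsl)));
            // Run the transformation, collecting the result in memory.
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, transforming <A><G>bold</G></A> with a stylesheet whose only template matches G and writes <b><xsl:apply-templates/></b> should produce <b>bold</b>: the built-in rule for A applies templates to its children, and the G rule wraps its text in bold tags. To work with files instead, the StringReader and StringWriter can be replaced by File objects passed to StreamSource and StreamResult.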
Errors, exceptions, and testing
No effort was made to provide meaningful information about errors and
exceptions.
The program was tested using SDK 1.4.2 under WinXP.
Will discuss in fragments
I will discuss this program in fragments. A complete listing of
the program is shown in Listing 23 near the end of the lesson.
Much of the code in this program is very similar or identical to
code that I discussed in previous lessons. I will discuss that
repetitious code only briefly, if at all.
The main method
Listing 8 shows an
abbreviated version of the beginning of the class named Dom03 and the ending of the main method.
public class Dom03{
The code in this portion of the
program is identical to code that I discussed in detail in previous
lessons, so I won’t discuss it further. I included it here
solely to establish the context for discussion of code that is to
follow.
Behavior of this code
Briefly, the code in the main
method does the following:
- Performs all the steps necessary to parse the input XML file, producing an object of type Document whose reference is saved in a reference variable named document.
- Instantiates an object of the Dom03 class and saves its reference in a reference variable named thisObj.
- Invokes the method named processDocumentNode on thisObj to transform the DOM tree to an output file using program code to perform the transformation.
- Invokes the method named doXslTransform on thisObj to perform an XSLT transformation using an XSL stylesheet.
The methods named processDocumentNode and doXslTransform are methods of my own
design.
The processDocumentNode method
The beginning of the processDocumentNode
method is shown in Listing 9. This version of the method is much
longer than versions discussed in previous lessons.
void processDocumentNode(Node node){
However, even though this version is much longer, there is nothing in
the method that should be a stretch for capable Java programmers.
All of the new code in this method is in the form of print statements
to cause appropriate XHTML text to appear in the output.
Produces all required output text
This method is used to produce any text required in the output at
the document level, such as the XML declaration for an XML
document, or the DTD reference for an XHTML document. As you can
see from Listing 9, the code in this method does both.
The code in Listing 9 writes an XML declaration, and then writes XHTML
text into the output that matches text produced by the green xsl:output element in Figure
3. I have already discussed the need for the XHTML DTD in the
XHTML file, so I won’t discuss it further here.
The start tag for the html root element
The code in Listing 10 writes the start tag for the html root element of the XHTML
document. Then it writes the XML namespace attribute in the
output.
(Note that the stylesheet shown in Figure 3 doesn’t write an XML namespace attribute
for reasons that I explained earlier.)
out.println("<html xmlns=\"http://www.w3."
Following this, the code in Listing
10 writes the same XHTML text in the output that is written by the
first red template rule in Figure 3.
Invoke the processNode method
Then the code in Listing 11 invokes the processNode method to trigger a
recursive process that processes the entire DOM tree.
processNode(node);
When the processNode method returns, the code
in Listing 11 writes XHTML text into the output consisting of end tags
for the table, the body, and the document. That completes the
production of the XHTML document, so the code in Listing 11 flushes the
output buffer to ensure that everything is written into the file.
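The overall shape of processDocumentNode can be sketched as follows. The DOCTYPE and namespace strings are the standard XHTML 1.0 Transitional values; the title text and the table wrapper markup are placeholders, since the exact text written by Listings 9 through 11 is not reproduced here. A Runnable stands in for the recursive call to processNode so that the sketch compiles on its own, and the class name is hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch of the document-level text written around processNode.
class DocumentWrapper {
    private final PrintWriter out;

    DocumentWrapper(PrintWriter out) {
        this.out = out;
    }

    // body stands in for the recursive processNode(document) call.
    void processDocumentNode(Runnable body) {
        // XML declaration and XHTML 1.0 Transitional DOCTYPE.
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"");
        out.println("  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">");
        // Root element start tag with the XHTML namespace.
        out.println("<html xmlns=\"http://www.w3.org/1999/xhtml\">");
        out.println("<head><title>Placeholder title</title></head>"); // assumed text
        out.println("<body><table border=\"1\"><tr><td>");             // assumed markup
        body.run();
        // End tags for the table, the body, and the document, then flush.
        out.println("</td></tr></table></body></html>");
        out.flush();
    }
}
```

Usage might look like wrapping a StringWriter in a PrintWriter, passing it to the constructor, and calling processDocumentNode with a lambda that writes the body text; after the call, the StringWriter holds a complete document ending in </html>.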
Invoke the doXslTransform method
Then the processDocumentNode
method terminates and returns control to the main method in Listing 8. At
that point, the doXslTransform
method is invoked to perform an XSLT transformation on the XML file
using the stylesheet discussed earlier in this lesson.
Quite a lot of code was added to the processDocumentNode
method, but as
mentioned earlier, all of that code was added simply to write XHTML
text into the output at the document level. All of the changes to
the program that were significant from a programming viewpoint were
either included in the processNode method,
or were part of a new method named forEach.
Invoke the processNode method
Despite the name that I chose to give to the processDocumentNode method, it
doesn’t actually process the document node directly. Rather, after
sending any required text to the output, it invokes the
method named processNode (see Listing 11) to
actually process the document node.
(Note that the Document object’s
reference is passed to the method named processNode in Listing 11.)
The processNode method
As you have learned in previous lessons, there are seven possible types
of nodes in an XML document:
- root or document node
- element node
- attribute node
- text node
- comment node
- processing instruction node
- namespace node
The processNode method handles
the first six types and ignores namespace nodes.
(It is not possible to handle namespace nodes in a Java program because
there is no constant in the Node class that can be used to identify
namespace nodes. This will become clear as we examine the
code in the processNode
method.)
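For reference, the constants that the org.w3c.dom.Node interface provides for the six node types the program handles are shown in this small hypothetical helper. Node also defines constants such as CDATA_SECTION_NODE and DOCUMENT_TYPE_NODE, but none for namespace nodes, so anything else falls through to the default branch.

```java
import org.w3c.dom.Node;

// Hypothetical helper naming the six node types that processNode handles.
class NodeTypes {
    static String describe(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:               return "root";
            case Node.ELEMENT_NODE:                return "element";
            case Node.ATTRIBUTE_NODE:              return "attribute";
            case Node.TEXT_NODE:                   return "text";
            case Node.COMMENT_NODE:                return "comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "processing instruction";
            default:
                // e.g. CDATA_SECTION_NODE or DOCUMENT_TYPE_NODE;
                // Node has no constant for namespace nodes.
                return "other";
        }
    }
}
```

Calling NodeTypes.describe(node.getNodeType()) returns "element" for an element node and "other" for a document type node.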
Get and save the node type
The processNode method in this
program contains quite a few changes relative to the programs that I
discussed in previous lessons. Therefore, I will discuss the processNode method in detail.
Code that you write in this method (and
in the processDocumentNode
method discussed above) is somewhat analogous to writing an XSL
stylesheet to be used in an XSLT transformation.
Test for a valid node, and get its type
The beginning of the processNode
method is shown in Listing 12. The method receives an
incoming parameter of type Node,
which can represent any of the seven types of nodes in the above list.
As you can see in Listing 12, if the parameter doesn’t point to an
actual object, the method quietly
returns, as opposed to throwing a NullPointerException.
void processNode(Node node){
The final statement in Listing 12 invokes the getNodeType method to get and save
the type of the node whose reference was received as an incoming
parameter.
Process the node
Each time the processNode
method is invoked, it receives a Node
object’s reference as an incoming parameter. The code in Listing
12 determines the type of the incoming node. Listing 13 shows the
beginning of a switch
statement that is used to initiate the processing of each incoming node
based on its type.
switch (type){
The switch statement has six
cases to handle six types of nodes, plus a default case to ignore
namespace nodes.
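The control structure of Listings 12 and 13 can be sketched in a self-contained form. In this sketch, each case simply applies the XSLT built-in rule for its node type: process the children of root and element nodes, copy the value of text and attribute nodes, and output nothing for comments and processing instructions. The class name and the StringBuilder output are assumptions; the author's defElOrRtNodeTemp and related methods are not reproduced.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of processNode's switch structure using XSLT's built-in template rules.
class BuiltInRules {
    private final StringBuilder out = new StringBuilder();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void processNode(Node node) {
        if (node == null) {
            return; // quietly ignore a null reference instead of throwing
        }
        short type = node.getNodeType();
        switch (type) {
            case Node.DOCUMENT_NODE:
            case Node.ELEMENT_NODE:
                applyTemplates(node);            // built-in: process the children
                break;
            case Node.TEXT_NODE:
            case Node.ATTRIBUTE_NODE:
                out.append(node.getNodeValue()); // built-in: copy the value
                break;
            case Node.COMMENT_NODE:
            case Node.PROCESSING_INSTRUCTION_NODE:
                break;                           // built-in: no output
            default:
                break;                           // any other node type is ignored
        }
    }

    // Attributes are not DOM children, so this loop never reaches an attribute,
    // matching xsl:apply-templates with no select attribute.
    void applyTemplates(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            processNode(children.item(i));
        }
    }

    String result() {
        return out.toString();
    }
}
```

For instance, processing the parsed document <A><!--c--><B>one</B><?pi x?><C a="attr">two</C></A> leaves onetwo in the buffer: the comment, the processing instruction, and the attribute contribute nothing.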
The DOCUMENT_NODE case
The code in Listing 13 will be executed whenever the incoming method
parameter points to a document node.
(Note that this will happen only once during the processing of a DOM
tree. The first node processed will always be the document node,
and there is only one document node in a DOM tree.)
This code is identical to code that I have discussed in previous
lessons, so I won’t discuss it further. I included it here solely
to help you get oriented as to the overall control structure of the processNode method.
I do want to point out, however, that when the processNode method is invoked on a
document node, the code in Listing 13 causes a method named defElOrRtNodeTemp to be
invoked. This method emulates the behavior of a built-in template
rule, which in this case causes templates to be applied to all child
nodes of the document node.
Creating custom template rules
Although this lesson does not create a custom template rule for
document nodes, the process for creating a
custom template rule is as follows:
- Go to this method named processNode.
- Identify the case for the node type in the switch statement.
- Change the conditional clause in the if statement for that case to implement a match for a particular node of that type.
- Write code in the body of the if statement to implement the custom template rule.
If the modified conditional clause
evaluates to true, the custom template rule will be executed. If
it evaluates to false, the
default rule will be executed.
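A minimal sketch of this pattern, assuming a StringBuilder for output instead of the program's output stream and a hypothetical class name: the element case checks the node name, runs a custom rule on a match, and falls back to the built-in behavior otherwise. The rules for B, C, and G mirror the descriptions of Listings 14, 15, and 20 later in this lesson. Names are compared with equals, because == compares String references and works only if the parser happens to return interned strings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: custom template rules for B, C, and G with a built-in fallback.
class CustomRules {
    private final StringBuilder out = new StringBuilder();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void processNode(Node node) {
        if (node == null) {
            return;
        }
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                String name = node.getNodeName();
                if ("B".equals(name)) {        // custom rule: apply templates to children
                    applyTemplates(node);
                } else if ("C".equals(name)) { // custom rule: wrap in <p>...</p>
                    wrap("p", node);
                } else if ("G".equals(name)) { // custom rule: wrap in <b>...</b>
                    wrap("b", node);
                } else {                       // no match: built-in element rule
                    applyTemplates(node);
                }
                break;
            }
            case Node.DOCUMENT_NODE:
                applyTemplates(node);
                break;
            case Node.TEXT_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                break; // comments, processing instructions, etc. produce no output
        }
    }

    private void wrap(String tag, Node node) {
        out.append('<').append(tag).append('>');
        applyTemplates(node);
        out.append("</").append(tag).append('>');
    }

    void applyTemplates(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            processNode(children.item(i));
        }
    }

    String result() {
        return out.toString();
    }
}
```

Processing <A><B>x</B><C>para <G>bold</G></C></A> yields x<p>para <b>bold</b></p>. The root element A has no custom rule, so the fallback simply passes control to its children.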
The ELEMENT_NODE case
Most of the changes to this program (as
compared to programs discussed in previous lessons) consist of
changes to the code that
processes element nodes in the switch
statement. The code for the element node case is rather long, so I
will discuss it in fragments.
(A new method named forEach was
also added to the program. I will discuss that method in detail
later.)
A match for element nodes named B
The beginning of the case for element nodes is shown in Listing 14.
case Node.ELEMENT_NODE:{
Note the similarity of the code in
Listing 14 and the XSLT template rule shown in Listing 4. When
the node being processed is an element node whose name is B, the code in Listing 14 invokes
the applyTemplates method to
cause templates to be applied to all child nodes of the node named B.
I discussed the applyTemplates
method in earlier lessons, and won’t repeat that discussion here.
A match for element nodes named C
Listing 15 shows code that
matches element nodes named C.
else if("C".equals(node.getNodeName())){
This code applies templates to all
child nodes of the node named C,
and wraps the output produced by that operation in an XHTML paragraph
element, <p>…</p>.
Compare the code in Listing 15 with the second red XSLT template rule
in Figure 3.
A match for element nodes named D
Listing 16 shows code that matches element nodes named D.
else if("D".equals(node.getNodeName())){
I’ll start my discussion of the code
in Listing 16 by comparing it with the template rule shown in Listing
5. The behavior of this code is the same as the behavior of the
template rule in Listing 5. However, the execution structure is
slightly different.
The code in Listing 16 begins by sending some text followed by the
start tag for an unordered list to the output. Then it invokes
the forEach method, passing
the context node and the name E (the name of the child nodes to be processed) as parameters.
The forEach method
The entire forEach method is
shown in Listing 17.
This method, in conjunction with the processNode
method, emulates the behavior of an xsl:for-each
XSLT element.
private void forEach(Node node,String select){
If you have been studying the
previous lessons in this series, the structure of the method should be
familiar to you.
The structure of the forEach method
The method receives two parameters:
- A reference to a particular node of type Node.
- The name of the child nodes of that node that are to be processed.
The purpose of the method is to
access each child node that matches the name, in the order in which
they appear in the DOM tree, and to apply a particular operation to
each of those nodes.
access the nodes in sorted order.)
Get and iterate on a list of child nodes
The code in Listing 17 starts by getting a list of all the child nodes
of the node referenced by the first incoming parameter.
Then it iterates on the list, identifying those nodes whose names match
the second incoming parameter. When it finds a match, it makes a
recursive call to the processNode
method where the operation to be applied to that node is defined.
When it has processed all the nodes in the list, it returns control to the
code shown in Listing 16.
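The cooperation between forEach and processNode described above can be sketched as follows, under the same simplifying assumptions as before (StringBuilder output, hypothetical class name). This forEach considers only element children whose name equals the select argument, and the D rule omits the descriptive text that Listing 16 writes before each list. Both E and F elements are wrapped in list items, as Listings 18 and 19 are described as doing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of forEach cooperating with processNode to emulate xsl:for-each.
class ForEachSketch {
    private final StringBuilder out = new StringBuilder();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Visit, in document order, each child element of node named select,
    // and hand it to processNode, where its custom rule lives.
    void forEach(Node node, String select) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && select.equals(child.getNodeName())) {
                processNode(child);
            }
        }
    }

    void processNode(Node node) {
        if (node == null) {
            return;
        }
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                String name = node.getNodeName();
                if ("D".equals(name)) {
                    out.append("<ul>");
                    forEach(node, "E");   // top half: E children as an unordered list
                    out.append("</ul><ol>");
                    forEach(node, "F");   // bottom half: F children as an ordered list
                    out.append("</ol>");
                } else if ("E".equals(name) || "F".equals(name)) {
                    out.append("<li>");   // wrap each selected node in a list item
                    applyTemplates(node);
                    out.append("</li>");
                } else {
                    applyTemplates(node); // built-in element rule
                }
                break;
            }
            case Node.DOCUMENT_NODE:
                applyTemplates(node);
                break;
            case Node.TEXT_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                break;
        }
    }

    void applyTemplates(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            processNode(children.item(i));
        }
    }

    String result() {
        return out.toString();
    }
}
```

Keeping the per-node markup in processNode rather than in forEach keeps forEach generic, which matches the design choice described for Listing 18.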
Process all child nodes named E
The first time this method is called in Listing 16, it is instructed to
identify and perform an operation on all the child nodes named E. When the forEach method calls the processNode method, passing an E node’s reference as a parameter,
the code shown in Listing 18 is executed. (Note that this is part of the element
node case in the switch
statement belonging to the processNode
method.)
else if("E".equals(node.getNodeName())){
Note that I could have put the code
in Listing 18 inside the forEach
method. However, I elected to do it the way that I did to make
the forEach method more
general, and confine all the code for custom template rules to the processDocumentNode and processNode methods.
As you can see, the code in Listing 18 causes templates to be applied
to all child nodes of the node named E,
and causes the output produced by that operation to be surrounded by
the start and end tags for an XHTML list item, <li>…</li>.
Finished with nodes named E
That completes the operation necessary to emulate the template rule in
Listing 5 for nodes named E,
and completes the top half of the code being executed in Listing 16.
Process all child nodes named F
The bottom half of the code in Listing 16 does essentially the same
thing, except that it iterates on child nodes named F and wraps the results in the XHTML
tags for an ordered list, <ol>…</ol>.
In this case, the forEach
method will isolate nodes named F
and pass them recursively to the processNode
method.
At that point, the code in Listing 19 will be executed with exactly the
same behavior as the code in Listing 18, except that it is applied to
nodes named F instead of nodes
named E.
else if("F".equals(node.getNodeName())){
A match for element nodes named G
Listing 20 shows custom code that applies to nodes named G.
else if("G".equals(node.getNodeName())){
This code applies templates to the
child nodes of nodes named G,
and wraps the output from that operation in the XHTML tags for bold, <b>…</b>.
Compare this code to the template rule shown in Listing 7.
A match for elements Q, R, S, and T
Listing 21 shows custom code that applies to nodes named Q, R,
S, and T.
//Create four levels of XHTML headers
Similar blocks of code
The four blocks of code are very
similar. Each block of code applies templates to the child nodes of the matching
node, and surrounds the output from that operation with the XHTML
tags for a header, such as <h1>…</h1>.
However, the size of the header differs from one to the next.
Compare the block of code in Listing 21 that matches Q with the template rule in Listing
3. Compare all four of the code blocks to the last four template
rules in Figure 3.
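Because the four header blocks differ only in the tag name, a table-driven version is one possible alternative. The mapping from Q, R, S, and T to h1 through h4 is an assumption made for illustration, since this section does not spell out which level each name receives, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Table-driven alternative to four near-identical if/else blocks.
class HeadingRules {
    // Assumed mapping, for illustration only.
    private static final Map<String, String> HEADING_FOR = new HashMap<>();
    static {
        HEADING_FOR.put("Q", "h1");
        HEADING_FOR.put("R", "h2");
        HEADING_FOR.put("S", "h3");
        HEADING_FOR.put("T", "h4");
    }

    private final StringBuilder out = new StringBuilder();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void processNode(Node node) {
        if (node == null) {
            return;
        }
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                String tag = HEADING_FOR.get(node.getNodeName());
                if (tag != null) { // Q, R, S, or T: wrap the children in a heading
                    out.append('<').append(tag).append('>');
                    applyTemplates(node);
                    out.append("</").append(tag).append('>');
                } else {           // no match: built-in element rule
                    applyTemplates(node);
                }
                break;
            }
            case Node.DOCUMENT_NODE:
                applyTemplates(node);
                break;
            case Node.TEXT_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                break;
        }
    }

    void applyTemplates(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            processNode(children.item(i));
        }
    }

    String result() {
        return out.toString();
    }
}
```

The trade-off is readability: the explicit if/else blocks read more like the corresponding stylesheet template rules, while the lookup table makes it easier to add another heading level.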
Processing nodes with no match
This XML document contains several nodes for which there is no matching
template in the stylesheet and no matching code block in this program,
including the root element node named A.
Whenever the XSLT processor encounters an element node for which there
is no matching template rule, it executes a built-in rule for element
nodes.
When this program encounters an element node for which there is no
matching code block in the element node case of the switch statement, it executes the
code shown in Listing 22.
else{//invoke default behavior
As you can see, the code in Listing
22 invokes the method named defElOrRtNodeTemp,
passing the unmatched node as a parameter. This is a method that
mimics the built-in behavior of the XSLT processor. I discussed
it in detail in an earlier lesson, and won’t repeat that discussion
here.
The remainder of the processNode method
That completes the discussion of the case for element nodes in the switch statement of the processNode method. That
leaves the following cases not yet discussed:
- Text nodes
- Attribute nodes
- Comment nodes
- Processing instruction nodes
- Namespace nodes (default case)
No new code for these nodes
However, there is no new code in the
cases for these nodes in comparison with the code discussed in previous
lessons. Therefore, I won’t repeat that discussion in this lesson.
That completes the discussion of the processNode
method, and leaves the following methods not yet discussed:
- main
- defTextOrAttrTemp
- defElOrRtNodeTemp
- defComOrProcInstrTemp
- applyTemplates
- valueOf
- doXslTransform
However, these methods are identical to methods having the same name
that I discussed in detail in earlier lessons. I won’t repeat
that discussion in this lesson.
The program output
The output produced by this program is
essentially the same as the XSLT transform output discussed in the
early part of the lesson. The output shown in rendered form in
Figure 2, and in raw XHTML form in Figure 4 represents the output
of both the program and the XSLT transform.
Run the Program
I encourage you to copy the Java code, XML file, and XSL file from
the listings near the end of this lesson. Compile and execute the
program. Experiment with the files, making changes, and observing the results of your changes.
Summary
In this lesson, I showed you how to use XSLT to transform an XML
document into an XHTML document. I also showed you how to
write Java code to perform the same transformation.
What’s Next?
The next several lessons in this series will illustrate parallel
Java code
and XSLT transformations to transform XML documents into XHTML
documents. The sample programs will illustrate various aspects of
the manipulation of a DOM tree using Java code.
Complete Program Listings
Complete listings of the various files discussed in this lesson are
contained in the listings that follow.
/*File Dom03.java
NOTE: IT WAS NECESSARY TO MANUALLY ENTER SOME
NOTE: IT WAS NECESSARY TO MANUALLY ENTER SOME
Copyright 2004, Richard G. Baldwin. Reproduction in whole or
in
part in any form or medium without express written permission from
Richard
Baldwin is prohibited.
About the author
Richard Baldwin
is a college professor (at Austin Community College in Austin, TX) and
private consultant whose primary focus is a combination of Java, C#,
and XML. In addition to the many platform and/or language independent
benefits of Java and C# applications, he believes that a combination of
Java, C#, and XML will become the primary driving force in the delivery
of structured information on the Web.
Richard has participated in numerous consulting projects, and he
frequently provides onsite training at the high-tech companies located
in and around Austin, Texas. He is the author of Baldwin’s
Programming Tutorials, which
has gained a worldwide following among experienced and aspiring
programmers. He has also published articles in JavaPro magazine.
Richard holds an MSEE degree from Southern Methodist University
and has many years of experience in the application of computer
technology to real-world problems.
-end-