Java JAXP, Transforming XML to XHTML

Java Programming Notes # 2210


Preface

In the previous lesson entitled Java JAXP, Writing Java Code to Emulate
an XSLT Transformation, I showed you how to write a Java program that
mimics an XSLT transformation for converting an XML file into a text
file.  I also showed that once you have a library of Java methods that
emulate XSLT elements, it is no more difficult to write a Java program
to transform an XML document than it is to write an XSL stylesheet to
transform the same document.

In this lesson, I will show you how to use XSLT to transform an XML
document into an XHTML document.  I will also show you how to
write Java code that performs the same transformation.

This lesson is one in a series designed to teach you how to use JAXP
and Sun’s Java Web Services Developer
Pack
(JWSDP).

The first lesson in the series was entitled Java API for XML Processing
(JAXP), Getting Started.

As mentioned above, the previous lesson was entitled Java JAXP, Writing
Java Code to Emulate an XSLT Transformation.

JAXP, XML, XSL, XSLT, W3C, and XHTML, a
Review

JAXP is an
API designed
to help you write programs for creating and processing XML
documents. It is a critical part of Sun’s Java Web Services Developer
Pack
(JWSDP).

XML is an acronym for the eXtensible
Markup Language. 
I will assume that you already
understand
XML, and will teach you how to use JAXP to write programs for
creating and processing XML documents.

XSL is an acronym for Extensible Stylesheet Language. 
XSLT is an acronym for XSL Transformations.


The numerous uses of XSLT include the following:

  • Transforming non-XML documents into XML documents.
  • Transforming XML documents into other XML documents.
  • Transforming XML documents into non-XML documents.

This
lesson explains a Java program
that transforms an XML document into an XHTML document.

An XHTML document is an XML
document that provides a rigorous alternative to the use of an HTML document.  According to
the W3C, XHTML 1.0 is a “Reformulation of HTML 4 in XML 1.0.”

Viewing tip

You may find it useful to open another copy of this lesson in a
separate browser window.  That will make it easier for you to
scroll back and forth among the different listings and figures while
you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive
collection of online Java and XML tutorials.  You will find those
lessons
published at Gamelan.com.
As of the date of this writing, Gamelan doesn’t maintain a
consolidated index of my tutorial lessons, and sometimes
they are difficult to locate there.  You will find a consolidated
index at www.DickBaldwin.com.

Preview

A tree structure in memory

A DOM parser can be used to
create a tree structure in memory that represents an XML
document.  In Java, that tree structure is encapsulated in an
object of the interface type Document.
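As a minimal sketch of that idea (not the program developed in this
lesson), the following code uses the standard JAXP DocumentBuilder to
parse a small XML string into a Document object.  The class name
DomParseSketch, the helper method parse, and the sample XML text are
my own assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomParseSketch {

    // Parses XML text into a DOM tree.  Checked parser exceptions are
    // wrapped so that callers do not need a throws clause.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data that borrows element names from
        // this lesson; it is not the file Dom03.xml.
        Document doc = parse(
            "<A><Q>A Big Header</Q><B><C>Text block 1.</C></B></A>");

        // The Document object is the root node of the tree.  Its
        // document element is the root element.
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(
            doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

A real program would normally parse a file rather than a string, but
the resulting Document object is used in the same way in either case.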

Many operations are possible

Given an object of type Document (often called a DOM tree), there
are many
methods that
can be invoked on the object to perform a variety of operations.

Two ways to
transform an XML document

There are at least two ways to transform the contents of an XML
document into another document:

  • By writing Java code to manipulate the DOM tree and perform the
    transformation.
  • By using XSLT to perform the transformation.
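To give you a feel for the second approach, here is a small sketch that
applies a stylesheet to an XML string using the JAXP Transformer
class.  The class name XsltApproachSketch, the helper method transform,
and the tiny one-rule stylesheet are assumptions that I made up for
illustration.  They are not the files discussed later in this lesson.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltApproachSketch {

    // A hypothetical stylesheet with a single template rule that
    // wraps the content of each Q element in an h1 element.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='Q'>"
      + "<h1><xsl:apply-templates/></h1>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the
    // serialized result as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<A><Q>A Big Header</Q></A>", XSL));
    }
}
```

The first approach, writing Java code to walk the DOM tree directly,
replaces the stylesheet with ordinary Java methods that visit each node.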

A skeleton
library of Java methods

This is one of several lessons that show you
how to write the skeleton of a Java library containing methods that
emulate the most common XSLT elements.  Once you have the library,
writing Java code to transform XML documents consists mainly of writing
a short driver program to access those methods.  Given the proper
library of methods, it is no more difficult to write a
Java program to perform the transformation than it is to write
an
XSLT stylesheet.

Library is
not my primary purpose

However, my primary purpose in these lessons is not to provide such
a library, but rather is to help you understand how to use a DOM
tree to create, modify, and manipulate XML documents.  By
comparing Java code that manipulates a DOM tree with similar XSLT
operations, you will have an opportunity to learn a little about XSLT
in the process of learning how to manipulate a DOM tree using Java code.

Some
Details Regarding XHTML


XHTML documents, a special case

An XHTML document is an XML document.  It is a rigorous
alternative to an HTML document. 

One of
the interesting
uses of XSLT is the transformation of XML documents into
XHTML documents.  This
makes it possible to render the information contained in an XML
document using an XHTML-compatible Web browser.

Where does the transformation take place?

When transforming an XML document for rendering
with an XHTML browser, the transformation can take place anywhere
between the
source of the XML document and the browser.

Transforming on the server

For example, a transformation program can be written in Java and run
on a web
server as a
servlet, or it can be written as a JavaBeans component and accessed
from a scriptlet in JavaServer pages (JSP).

Transforming at the browser

The transformation can also be performed by the browser.  For
example, Microsoft IE 6.0 and XSLT can be used for this
purpose.

Will
transform XML into XHTML

This and the next several lessons will illustrate parallel Java code
and XSLT transformations to transform XML documents into XHTML
documents.  The sample programs will illustrate various aspects of
the manipulation of a DOM tree using Java code.

Requirements
for XHTML documents

According to Web Design
& Development Using XHTML
by Griffin, Morales, and Finnegan, an
XHTML document differs from an HTML document in the following ways:

  • XHTML documents must be well-formed.
  • Element and attribute names must be in lower case.
  • Non-empty elements require end tags.
  • Attribute values must always be quoted.
  • XHTML documents have no attribute minimization (for example,
    checked="checked" must be written out in full).
  • XHTML documents end empty elements (for example, <br />).
  • XHTML documents use elements with id and name attributes.
  • XHTML documents use Document Type Declarations.
  • XHTML documents use XML namespaces.

Although it is not a requirement, an XHTML document often has an XML
declaration at the beginning to identify the document as an XML
document.

Some
Details Regarding XSLT

Previous lessons in this series have provided quite a bit of
detailed information regarding the operation of XSLT.  Therefore,
this discussion will be brief.

Assume that an XML document has been parsed to produce a DOM
tree
in memory that represents the XML document.

Execute
template rules

An XSLT processor starts examining the DOM tree at its root
node.  It
obtains instructions from the XSLT stylesheet telling it how to
navigate the
tree, and how to treat each node that it encounters along the way.

As each node is encountered, the processor searches the stylesheet
looking for a template rule that governs how to treat nodes of that
type.  If the
processor finds
a template rule that matches the node type, it performs the operations
indicated by the template rule.  Otherwise, it
executes a built-in template rule appropriate to that node.

Literal text in
template rules

If the template rule being applied
contains literal text, that literal text is used to
create text in the output.

Traversal of
the DOM tree

There are at least two XSLT elements that can be used to
traverse the children of a context node:

  • xsl:apply-templates
  • xsl:for-each

The
xsl:apply-templates element

The xsl:apply-templates
element was discussed in detail in previous lessons.

The
xsl:for-each element

The xsl:for-each element
executes an iterative
examination of all child nodes of the context node that
match a required select attribute.  As each child
node is examined, it is processed using XSLT elements that form the
content of the xsl:for-each
element in the template rule.

This lesson will include examples that use the xsl:for-each element in addition to
the xsl:apply-templates
element.  The lesson will also explain a Java method that emulates
the xsl:for-each element.
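As a rough illustration of the idea only (this is not the method
developed later in this lesson), the following sketch shows one way
that Java code might iterate over the child elements that match a
select value consisting of a simple element name.  The names
ForEachSketch, forEachChild, and listItems are hypothetical, as is the
sample XML text.  The sketch handles only a plain element name, not a
general XPath expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ForEachSketch {

    // Collects, in document order, the child elements of the context
    // node whose name equals the select value.
    static List<Element> forEachChild(Node context, String select) {
        List<Element> matches = new ArrayList<>();
        for (Node child = context.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals(select)) {
                matches.add((Element) child);
            }
        }
        return matches;
    }

    // Produces one li element for each matching child of the root
    // element.  For brevity, the text is copied with getTextContent
    // and is not escaped, which is a simplification.
    static String listItems(String xml, String select) {
        try {
            Element context = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            StringBuilder out = new StringBuilder("<ul>");
            for (Element match : forEachChild(context, select)) {
                out.append("<li>")
                   .append(match.getTextContent())
                   .append("</li>");
            }
            return out.append("</ul>").toString();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            listItems("<D><E>one</E><F>uno</F><E>two</E></D>", "E"));
    }
}
```

Note that the F element in the sample data is skipped because it does
not match the select value, just as xsl:for-each would skip it.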

Enough talk,
let’s
see some code

I will begin by discussing the XML file named Dom03.xml (shown in
Listing 24 near the end of the lesson) along with the XSL stylesheet
file named Dom03.xsl (shown in Listing 25).

A Java program
named Dom03

After explaining the transformation produced by applying this
stylesheet to this XML document, I will explain the transformation
produced by processing the XML file with a Java program named Dom03 (shown in Listing 23) that mimics
the behavior of the XSLT transformation.

Discussion
and Sample Code


The XML
file named Dom03.xml

The XML file shown in Listing 24 is relatively straightforward.  A
tree view of the XML file is shown in Figure 1.  (This XML file is both well-formed and
valid.)


#document DOCUMENT_NODE
A DOCUMENT_TYPE_NODE
#comment COMMENT_NODE
xml-stylesheet PROCESSING_INSTRUCTION_NODE
A ELEMENT_NODE
Q ELEMENT_NODE
#text A Big Header
B ELEMENT_NODE
C ELEMENT_NODE
#text Text block 1.
R ELEMENT_NODE
#text A Mid Header
C ELEMENT_NODE
#text Text block 2.
#comment COMMENT_NODE
processor PROCESSING_INSTRUCTION_NODE
S ELEMENT_NODE
#text A Small Header
B ELEMENT_NODE
C ELEMENT_NODE
#text Text block 3.
S ELEMENT_NODE
#text Another Small Header
B ELEMENT_NODE
C ELEMENT_NODE
#text Text block 4.
T ELEMENT_NODE
#text A Smallest Header
B ELEMENT_NODE
C ELEMENT_NODE
#text Text block 5.
D ELEMENT_NODE
E ELEMENT_NODE
#text First list item in E
G ELEMENT_NODE
#text Nested G text element
F ELEMENT_NODE
#text First list item in F
E ELEMENT_NODE
#text Second list item in E
F ELEMENT_NODE
#text Second list item in F
E ELEMENT_NODE
#text Third list item in E
F ELEMENT_NODE
#text Third list item in F
C ELEMENT_NODE
#text Text block 6.
C ELEMENT_NODE
#text Text block 7.
R ELEMENT_NODE
#text Another Mid Header
C ELEMENT_NODE
#text Text block 8.
B ELEMENT_NODE
R ELEMENT_NODE
#text Another Mid Header in Another B
C ELEMENT_NODE
#text Text block 9.
Figure 1

(This
tree view of the XML file was
produced using a program named DomTree02, which was discussed in an
earlier lesson.

Note that in order to make the tree view more
meaningful, I manually removed extraneous line breaks and text nodes
associated with those line breaks.  The extraneous
line breaks in Figure 1 were caused by extraneous line breaks in the
XML file.  The extraneous line breaks in the XML file were placed
there for cosmetic reasons and to force it to fit into this narrow
publication format.)

Content of the XML
document

The structure and content of the XML document were primarily designed to
illustrate various transformation concepts that I intend to explain in
this lesson.  However, to some extent, I designed the structure and
content keeping in mind the ultimate rendering of the XHTML file that
will be produced by transforming the XML file into an XHTML file.

The rendered
XHTML file

At this point, I’m going to jump ahead and show you what the final
XHTML file
looks like when rendered using Netscape Navigator v7.1.  The
rendering of the XHTML file is shown in Figure 2. 

(You may find it useful to compare the
rendering in Figure 2 with the XML file structure and content in Figure
1.  You should be able to identify text nodes in Figure 1 that
match up with rendered text in Figure 2.)


Figure 2 Rendered XHTML file

The XSLT Transformation


The XSL
stylesheet file named Dom03.xsl

Recall that an XSL stylesheet is itself an XML file, and can therefore
be represented as a tree.  Figure 3 presents an
abbreviated tree view of the stylesheet shown in Listing 25.  I
colored each of the template rules in this view with alternating
colors of red and blue to make them easier to identify.

(As is often the
case with XSL stylesheets, this stylesheet file is well-formed but it
is not
valid.)


NOTE:  IT WAS NECESSARY TO MANUALLY ENTER SOME
LINE BREAKS IN THIS PRESENTATION TO FORCE IT TO
FIT INTO THIS NARROW PUBLICATION FORMAT.

#document DOCUMENT_NODE
xsl:stylesheet ELEMENT_NODE
Attribute: version=1.0
Attribute: xmlns:xsl=http://www.w3.org/1999
/XSL/Transform
xsl:output ELEMENT_NODE
Attribute: method=xml
Attribute: doctype-public=-//W3C//DTD
XHTML 1.0 Transitional//EN
Attribute: doctype-system=http://www.w3.
org/TR/xhtml1/DTD/xhtml1-transitional.dtd

xsl:template ELEMENT_NODE
Attribute: match=/
html ELEMENT_NODE
head ELEMENT_NODE
meta ELEMENT_NODE
Attribute: http-equiv=content-type
Attribute: content=text/html;
charset=UTF-8
title ELEMENT_NODE
#text Generated XHTML file
body ELEMENT_NODE
table ELEMENT_NODE
Attribute: border=2
Attribute: cellspacing=0
Attribute: cellpadding=0
Attribute: width=330
Attribute: bgcolor=#FFFF00
tr ELEMENT_NODE
td ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=B
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=C
p ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=D
#text List of items in E

ul ELEMENT_NODE
xsl:for-each ELEMENT_NODE
Attribute: select=E
li ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE
#text List of items in F
ol ELEMENT_NODE
xsl:for-each ELEMENT_NODE
Attribute: select=F
li ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=G
b ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=Q
h1 ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=R
h2 ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=S
h3 ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE

xsl:template ELEMENT_NODE
Attribute: match=T
h4 ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE
Figure 3

Why abbreviated?

I refer to this as an abbreviated tree view because I manually deleted
comment nodes and extraneous text nodes in order to emphasize the
important elements in the stylesheet.

(Extraneous text nodes occur as a result
of inserting line breaks in the original XSL document for cosmetic
purposes.

Note that I also manually entered several line breaks near the
beginning to force the material to fit into
this narrow
publication format.)


The root element

The root node of all XML documents is the document node.  In
addition to the root node, there is also a root element, and it is
important not to confuse the two.

As you can see from Figure 3, the root element in the XSL document is
of type xsl:stylesheet.  The root element has two attributes, each of
which is standard for XSL stylesheets.

(Note that I manually entered a line break in the second attribute of
the xsl:stylesheet node to force it to fit into this narrow publication
format.  I also manually entered line breaks into two of the attributes
of the xsl:output element node to force them to fit into this narrow
publication format.)


The first attribute provides the XSLT version.

The second attribute points to the XSLT namespace URI, which you can
read about in the W3C Recommendation.


Children of the
root element node

The root element node (xsl:stylesheet) in Figure 3 has ten child nodes,
nine of which are template rules.  (The green child node is not a
template rule.  I will discuss it in detail later.)  I colored the
template rules in alternating colors of red and blue to make them
easier to identify visually.

The template
rules

Each of the nine template rules has a match
pattern.  The nine match patterns in the order that they appear in
Figure 3 are as follows:

  1. match=/ (matches the root node)
  2. match=B (matches element node named B)
  3. match=C (matches element node named C)
  4. match=D (matches element node named D)
  5. match=G (matches element node named G)
  6. match=Q (matches element node named Q)
  7. match=R (matches element node named R)
  8. match=S (matches element node named S)
  9. match=T (matches element node named T)

I will discuss each of the nine template rules later, but before doing
that
I will show you the raw XHTML output produced
by this XSLT transformation.

(Note
that the Java program discussed later produces essentially the same
output as the XSLT transformation.)


The output from
the transformation

The result of performing an XSLT transformation (by applying the XSL
stylesheet shown in Listing 25 to the XML file shown in Listing 24) is
shown in Figure 4.  This is the raw XHTML code that was rendered in
Figure 2.

I will explain the operations in the XSLT transformation that produced
most of the text in Figure 4.


NOTE THAT IT WAS NECESSARY FOR ME TO MANUALLY
INSERT LINE BREAKS IN SEVERAL OF THE LONG LINES
IN THIS MATERIAL TO FORCE IT TO FIT INTO THIS
NARROW PUBLICATION FORMAT. I ALSO MANUALLY
INSERTED LINE BREAKS AT CRITICAL POINTS TO
MAKE IT EASIER TO INTERPRET THE MATERIAL
VISUALLY.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN" "http://www.w3.org/TR/
xhtml1/DTD/xhtml1-transitional.dtd">
<html
xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type"
content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0" cellpadding="0"
width="330" bgcolor="#FFFF00"><tr><td>
<h1>
A Big Header
</h1>
<p>
Text block 1.
</p>
<h2>
A Mid Header
</h2>
<p>
Text block 2.
</p>
<h3>
A Small Header
</h3>
<p>
Text block 3.
</p>
<h3>
Another Small Header
</h3>
<p>
Text block 4.
</p>
<h4>
A Smallest Header
</h4>
<p>
Text block 5.
</p>
List of items in E
<ul>
<li>
First list item in E
<b>
Nested G text element
</b>
</li>
<li>
Second list item in E
</li>
<li>
Third list item in E
</li>
</ul>
List of items in F
<ol>
<li>
First list item in F
</li>
<li>
Second list item in F
</li>
<li>
Third list item in F
</li>
</ol>
<p>
Text block 6.
</p>
<p>
Text block 7.
</p>
<h2>
Another Mid Header
</h2>
<p>
Text block 8.
</p>
<h2>
Another Mid Header in Another B
</h2>
<p>
Text block 9.
</p>
</td></tr></table>
</body></html>

Figure 4

(Note that I manually deleted a couple of extraneous line breaks from
the output shown in Figure 4.  It was also necessary for me to
manually insert line breaks in several of the long lines to force the
material to fit in this narrow publication format.  I also
manually inserted line breaks at certain critical points to make it
easier to interpret the material visually.)

Can sometimes
get confusing

I will caution you up front that this discussion can become
confusing, but I will do everything that I can to minimize the
confusion.  The problem is that the discussion mixes tags, attributes,
and elements from the XML file with tags, attributes, and elements
from the stylesheet file and the XHTML file.  With so many tags,
attributes, and elements being discussed, it is sometimes difficult to
keep them separated in your mind.

In particular, in order to cause the output to be a valid XHTML
document, it is necessary to manually insert XHTML tags, attributes,
and elements in the XSL template rules, which themselves involve XML
tags, attributes, and elements.

I will make heavy use of color in an attempt to minimize the confusion.

The first line of
text

The first line of text in the output shown in Figure 4
is an XML declaration
that is produced automatically by the XSLT transformer available with
JAXP.  As I mentioned earlier, such a declaration is not
required, but is highly recommended by most authors.

The xsl:output
element

Before getting into the template rules in Figure 3, I need to explain
the xsl:output element shown
in green in Figure 3 and reproduced in Figure 5 below for convenient
viewing.

    xsl:output ELEMENT_NODE
Attribute: method=xml
Attribute: doctype-public=-//W3C//DTD
XHTML 1.0 Transitional//EN
Attribute: doctype-system=http://www.w3.
org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Figure 5


The XSL
stylesheet version

Listing 1 shows the XSL code that corresponds to the tree view of the
stylesheet element shown in Figure 5.

<xsl:output method="xml" 
doctype-public="-//W3C//DTD
XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.
org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

Listing 1
(As on several previous occasions, I need
to remind you that it was necessary for me to manually insert line
breaks in Listing 1 to cause the material to fit in this narrow
publication format.)


Literal text passes
through to the output

As you learned in the previous lesson, any literal text that you
include in your XSL stylesheet will be passed through to the
output.  As you will see later, I will cause the output to contain
much of the required XHTML text simply by including that XHTML text as
literal text in the stylesheet.

The stylesheet
is an XML document

It is important to remember, however, that the XSL stylesheet is itself
an XML document, and you cannot include any literal text that would
cause a parser
to reject it as an XML document.  You also cannot do anything that
will cause the XSLT processor to reject it as a stylesheet.

XHTML document
requires a specific DTD reference

One of the things that is required in the XHTML output is the DTD
reference
shown in Figure 6.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 
Transitional//EN" "http://www.w3.org/TR/
xhtml1/DTD/xhtml1-transitional.dtd">

Figure 6


(The
material in Figure 6 was extracted from Figure 4 and reproduced here
for convenient viewing.  This is one of three alternative DTDs
that can be used with an XHTML document.)


Correct DTD for
XHTML but not for stylesheet

The DTD reference in Figure 6 is a correct DTD reference for an XHTML
document, but it is not a correct DTD reference for an XSL
stylesheet.  (In fact,
stylesheets don’t require a DTD and often don’t have one.)

If you simply include the text from Figure 6 as literal text in the
stylesheet, (in hopes that it will
pass through to the output),
the XSLT processor will interpret
it as a DTD reference for the stylesheet, and will attempt to validate
the stylesheet against that reference.  The stylesheet will then
be declared invalid and the transformation effort will fail.

Therefore, you must find a way to cause this DTD reference to end up in
the XHTML document without confusing the XSLT transformation process.

Two ways to
accomplish that

I know of two ways to accomplish that objective.  One way is to
include the text from Figure 6 in a CDATA section in the
stylesheet.  This
raises some other issues, but it can be made to work.

The easier way is to use the xsl:output
element shown in Listing 1 to cause the DTD reference to be written
into the output without confusing the parser or the XSLT processor.

The xsl:output
element

Here is a partial quotation from XML In A Nutshell, (which I highly recommend), by
Elliotte Rusty Harold and
W. Scott Means.

“The
top-level xsl:output element helps determine the exact formatting of
the XML document produced when the result tree is stored in a file,
written onto a stream, or otherwise serialized into a sequence of
bytes.”

Ten optional
attributes

To make a long story short, this element has ten optional attributes
that are used by the XSLT processor to determine the formatting of the
output.  The XSLT element shown in Listing 1 specifies values for
three of those optional attributes:

  1. method
  2. doctype-public
  3. doctype-system

The default value for method is normally xml.  However, XSLT 1.0
changes the default to html when the first element in the result tree
is named html and has no namespace, which is the case for this
stylesheet.  Therefore, I set the value of this attribute to xml
explicitly in Listing 1.  That instructs the processor to serialize
the result as a well-formed XML document.

The doctype-public attribute
sets the public identifier used in the document type declaration.

The doctype-system attribute
sets the system identifier used in the document type declaration.

The required
XHTML DTD

There are three allowable DTDs that can be used for an XHTML document:

  • Strict
  • Transitional
  • Frameset

I’m not going to get into the differences between these three
DTDs in this lesson.  Suffice it to say that I elected to use the
transitional
DTD for this example because it is somewhat easier to use than the
other two.

The
transitional DTD

Here is what the W3C has to say about the DTD for XHTML 1.0 Transitional:

This DTD module is identified by the
following PUBLIC and SYSTEM identifiers:

PUBLIC
“-//W3C//DTD XHTML 1.0 Transitional//EN”
SYSTEM
“http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd”

As you can see, these values match the doctype-public
and doctype-system attribute
values  in Listing 1, and result in the correct output for the
XHTML DTD in Figure 6.
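When the output is produced by Java code instead of a stylesheet, JAXP
provides a similar mechanism.  The following sketch, which is my own
illustration and not the Dom03 program, uses the standard OutputKeys
constants on an identity Transformer to request the same public and
system identifiers when a DOM tree is serialized.  The class name
DoctypeOutputSketch, the method name serializeWithDoctype, and the
sample document are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DoctypeOutputSketch {

    // Serializes the XML text through an identity transformation
    // with the XHTML 1.0 Transitional identifiers set as output
    // properties, mirroring the attributes on xsl:output.
    static String serializeWithDoctype(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // A Transformer created without a stylesheet copies the
            // source to the result unchanged.
            Transformer identity =
                TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
                "-//W3C//DTD XHTML 1.0 Transitional//EN");
            identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");

            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical minimal document used only for illustration.
        System.out.println(serializeWithDoctype(
            "<html><head><title>Demo</title></head><body/></html>"));
    }
}
```

In both cases the document type declaration is written by the
serializer, so it never has to appear as literal text in the stylesheet
or in the Java source code.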

The first
template rule


The first template rule (extracted
from Figure 3 and given a different color scheme)
is shown in
tree view in Figure 7.  This
template rule contains an XPath expression that matches the document
root (note the forward slash).


    xsl:template ELEMENT_NODE
Attribute: match=/
html ELEMENT_NODE
head ELEMENT_NODE
meta ELEMENT_NODE
Attribute: http-equiv=content-type
Attribute: content=text/html;
charset=UTF-8
title ELEMENT_NODE
#text Generated XHTML file
body ELEMENT_NODE
table ELEMENT_NODE
Attribute: border=2
Attribute: cellspacing=0
Attribute: cellpadding=0
Attribute: width=330
Attribute: bgcolor=#FFFF00
tr ELEMENT_NODE
td ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE
Figure 7


The template rule in
XSL format

Listing
2 shows the same template rule in
XSL format, (extracted from Listing
25).

<xsl:template match="/">
<html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0"
cellpadding="0" width="330"
bgcolor="#FFFF00" >
<tr>
<td>
<xsl:apply-templates/>
</td>
</tr>
</table>
</body>
</html>
</xsl:template>

Listing 2
(Note that according to most of the books that I have read, the
following namespace attribute should be used on the html tag.  However,
something about it causes problems with the JAXP transformer so I left
it off.  The resulting XHTML file is still valid according to the W3C
Markup Validation Service even without the namespace attribute.

xml:lang="en" lang="en")


The literal text is shown in red

From my viewpoint as the author of the stylesheet, everything that is
colored red in Listing 2 is simply literal text that I want to pass
through to the output so that it will become part of the raw XHTML text.

The template rule
must be well-formed


However, as you can see from Figure 7, the XML parser considers all of
this material to be well-formed (but
not valid)
XML element nodes, attribute nodes, and text
nodes.  Were I to make a change to any of the red literal text
that would corrupt the well-formed nature of the XML code in Listing 2,
the
stylesheet could not be used to control an XSLT transformation. 
While a stylesheet is not required to be valid, it is required to be
well-formed.

Must be very
careful when including markup in stylesheet

Therefore, you must be very careful when you include literal markup
text in the stylesheet for whatever purpose.  Any markup that you
include in the stylesheet must result in the stylesheet being
well-formed.

(This was not a problem with the inclusion
of literal text in the stylesheet in the previous lesson, because the
literal text didn’t contain markup characters.  As a result, the
literal text was interpreted simply as text nodes in the
stylesheet.  As you can see from Figure 7, however, the literal
markup text that was included in this stylesheet was interpreted by the
parser as element nodes, attributes and text nodes.)


A very simple
template rule.

At first blush, this template rule appears to be very long and very
complex.  However, as you can see from Listing 2, once you isolate
out all of the literal XHTML text that’s included in the template rule,
the actual XSLT template rule is very simple.  This rule simply
passes a lot of literal markup text through to the output and causes
templates
to be applied to all children of the root (document) node.  (You learned what it means to apply
templates in
the previous lesson.)

The XHTML tags

If you are familiar with XHTML syntax, you will recognize that the
literal text shown in red in Listing 2 begins with typical XHTML tags
such as <html>, <head>,  and <body>.  These
tags are required for an XHTML document.  This text is sent to the
output before any processing of the DOM tree is performed.

Then the literal text creates an XHTML table with a yellow
background.  The start tags for the table are sent to the output
before the xsl:apply-templates
element is executed.

All of the output produced by executing the xsl:apply-templates element is
inserted into a single data <td> cell in the table.

Finally, when the xsl:apply-templates
element returns, the end tags for the table and the end tags for the
document are sent to the output.

The raw XHTML
output

Figure 8 shows a condensed version of the raw XHTML output.  The
XHTML output shown in red in Figure 8 matches the literal text shown in
red in the template rule of Listing 2.

NOTE THAT IT WAS NECESSARY FOR ME TO MANUALLY
INSERT LINE BREAKS IN SEVERAL OF THE LONG LINES
IN THIS MATERIAL TO FORCE IT TO FIT INTO THIS
NARROW PUBLICATION FORMAT.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN" "http://www.w3.org/TR/
xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0" cellpadding="0"
width="330" bgcolor="#FFFF00"><tr><td>

...HTML CODE DELETED FOR BREVITY...

</td></tr></table>
</body></html>

Figure 8


The effect of
xsl:apply-templates

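Referring once again to Listing 2, we see that this template rule
causes templates to be applied to all child nodes of
the root or document node.  A root node can have only one element
child, which is the root element node.  (Any comments or processing
instructions that are also children of the root node, such as those
shown in Figure 1, produce no output here because no template rule in
this stylesheet matches them and the built-in template rules for those
node types do nothing.)  Referring back to Figure 1,
we see that the root element node is named A.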

Now referring back to the tree view of the stylesheet in Figure 3 (and also the list of match patterns
presented earlier),
we see that the stylesheet doesn’t contain a
template rule that matches an element named A.

Important to
understand built-in behavior

If the processor encounters a node for which there is no matching
template rule, it executes a built-in template rule for that
type of node.  This is where it becomes important to understand
the behavior of the built-in template rules, which I explained in the
earlier lesson entitled Java JAXP, Implementing Default XSLT Behavior
in Java.

The behavior of the built-in template rule for element nodes is to
apply templates to all child nodes of the element node. 
Therefore, in this case, the processor will apply templates to all
child nodes of the root element node named A.
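The built-in behavior can be sketched in Java as a simple recursive
method.  The following code is a simplified illustration under my own
assumptions, not the code from the earlier lesson.  The names
BuiltInRulesSketch, applyBuiltIn, and textOf are hypothetical.  The
sketch applies only the built-in rules, as if the stylesheet contained
no template rules at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class BuiltInRulesSketch {

    // Emulates the built-in template rules.  For the document node
    // and for element nodes, templates are applied to every child.
    // For text nodes, the value is copied to the output.  Comments
    // and processing instructions produce no output.
    static void applyBuiltIn(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.ELEMENT_NODE:
                for (Node child = node.getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    applyBuiltIn(child, out);
                }
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(node.getNodeValue());
                break;
            default:
                // Comments, processing instructions, and so on.
                break;
        }
    }

    // Parses the XML text and returns what the built-in rules alone
    // would send to the output.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            applyBuiltIn(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf(
            "<A><!-- note --><Q>A Big Header</Q>"
          + "<B><C>Text block 1.</C></B></A>"));
    }
}
```

With no matching template rules, only the text content survives, which
is why a stylesheet needs explicit rules to produce any markup.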

Referring back to Figure 1, we see that the root element node has three
child nodes, which occur in the following order:  Q, B, and
B. 
Therefore, the first node that will be processed is the node named Q.

A template rule
that matches Q

Figure 9 and Listing 3 show a template rule that matches an element
named Q.

    xsl:template ELEMENT_NODE
Attribute: match=Q
h1 ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE
Figure 9


The tree view of the template rule is shown in Figure 9.  The XSL
stylesheet code is shown in Listing 3.


<xsl:template match="Q">
<h1>
<xsl:apply-templates />
</h1>
</xsl:template>

Listing 3

 A level 1
header in the output

This template rule sends the start and end tags for a level 1 XHTML
header to the output, and inserts something between those tags by
applying templates to all child nodes of the element node named Q.

Referring back to the element node named Q in Figure 1, we see that it
has only one child node, and that node is a text node.  Executing the
xsl:apply-templates element on a text node causes the built-in template
rule for text nodes to be applied.  The built-in rule gets the value of
the text node and sends it to the output.  This produces the raw XHTML
output shown in Figure 10.

<h1>
A Big Header
</h1>

Figure 10


You should be able to easily identify the header from Figure 10 in the
first line of the rendered output in Figure 2.
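
The interplay between a custom template rule and the built-in rules can be tried in isolation.  The following is a minimal sketch, not part of the lesson's program: the class name BuiltInRuleDemo, the tiny input document, and the unmatched element named Z are all assumptions made for illustration.  The stylesheet contains only the rule for Q from Listing 3, so the elements A and Z fall through to the built-in rules.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BuiltInRuleDemo {
  // Hypothetical input; Z is an element that has no template rule.
  static final String XML =
      "<A><Q>A Big Header</Q><Z>plain text</Z></A>";

  // Only Q has a template rule, as in Listing 3.
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\" "
    + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
    + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
    + "<xsl:template match=\"Q\">"
    + "<h1><xsl:apply-templates/></h1>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Runs the stylesheet against the XML and returns the result as text.
  static String transform(String xml, String xsl) {
    try {
      Transformer t = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(xsl)));
      StringWriter result = new StringWriter();
      t.transform(new StreamSource(new StringReader(xml)),
                  new StreamResult(result));
      return result.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(transform(XML, XSL));
  }
}
```

The text of Q should come out wrapped in h1 tags, while the text of Z should come out bare, because the built-in rules simply walk down to the text node and copy its value.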

A template rule
that matches B


That takes care of processing the root element node’s child named Q.  The next child to be
processed is a child node named B.

A template rule that matches an element node named B is shown in Figure 11 and Listing
4.

    xsl:template ELEMENT_NODE
Attribute: match=B
xsl:apply-templates ELEMENT_NODE
Figure 11


As before, the tree view is shown in Figure 11 and the stylesheet code
is shown in Listing 4.


<xsl:template match="B">
<xsl:apply-templates />
</xsl:template>

Listing 4

This template rule is very simple.  It causes templates to be applied
to all child nodes of the element node named B.  Referring back to
Figure 1, we see that the first child node named B has several child
nodes, which occur in the following order:  C, R, C, S, B, S, B, R, C.

An abbreviated
DOM tree

Don’t worry, I’m not going to discuss them all.  In fact, I’m
going to ignore many of those nodes and their descendants, and
concentrate on the abbreviated portion of the DOM tree shown in Figure
12.  I am going to concentrate on this portion because it uses
XSLT templates not previously discussed in this lesson or in my earlier
lessons. 

    B ELEMENT_NODE
...
B ELEMENT_NODE
...
B ELEMENT_NODE
...
D ELEMENT_NODE
E ELEMENT_NODE
#text First list item in E
G ELEMENT_NODE
#text Nested G text element
F ELEMENT_NODE
#text First list item in F
E ELEMENT_NODE
#text Second list item in E
F ELEMENT_NODE
#text Second list item in F
E ELEMENT_NODE
#text Third list item in E
F ELEMENT_NODE
#text Third list item in F
Figure 12


To help you keep your bearings, the first node named B in Figure 12 is
the first node named B belonging to the root element node named A in
Figure 1.  That node named B will be the starting point for the
following discussion.  Nodes have been manually removed from Figure 12
at each point where you see an ellipsis (…).  I will ignore those
nodes.

Traversing down
the DOM tree


As you saw in the template rule that matches B in Listing 4, each time the
processor encounters an element node named B, templates are applied to all
child nodes of that node and no other action is required. 
Therefore, we can immediately skip down to a discussion of the element
node named D.

A template rule
that matches D

Figure 13 shows a tree view of the template rule that matches D.

    xsl:template ELEMENT_NODE
Attribute: match=D
#text List of items in E

ul ELEMENT_NODE
xsl:for-each ELEMENT_NODE
Attribute: select=E
li ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE
#text List of items in F
ol ELEMENT_NODE
xsl:for-each ELEMENT_NODE
Attribute: select=F
li ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE
Figure 13


The stylesheet code for the template rule that matches D is shown in Listing 5.


<xsl:template match="D">List of items in E
<ul>
<!-- loop -->
<xsl:for-each select="E">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>
<!-- End loop -->
</ul>List of items in F
<ol>
<!-- loop -->
<xsl:for-each select="F">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>
<!-- End loop -->
</ol>

</xsl:template>

Listing 5

In an attempt to separate the text
and markup that controls the transformation process from the text and
markup destined to become part of the XHTML document, I colored the
latter red in Figure 13 and Listing 5.  I also colored the XML
comments blue in Listing 5 to make them easy to ignore.

A simpler
version

In an attempt to make it even easier to understand the behavior of this
template rule, I have reproduced it in Listing 6 with all literal text
and all comments removed.  I also added indentation to help with
the visual aspect of the XSL code.

NOTE:  LITERAL TEXT AND COMMENTS WERE MANUALLY
REMOVED FROM THIS TEMPLATE RULE FOR DISCUSSION
PURPOSES.

<xsl:template match="D">
  <xsl:for-each select="E">
    <xsl:apply-templates />
  </xsl:for-each>

  <xsl:for-each select="F">
    <xsl:apply-templates />
  </xsl:for-each>
</xsl:template>

Listing 6

First consider the behavior of the
top half of the template rule in Listing 6.  This rule is invoked
whenever the processor encounters an element node named D.

<xsl:for-each
select=”E”>

The processor identifies all child nodes of D whose name is E and processes them in the order in
which they occur.

(It is also possible to process the child
nodes in sorted order using a more complex implementation, but that
isn’t being done here.  That will be the topic for a future
lesson.)


<xsl:apply-templates>

The processing that is applied to each child node named E depends on the elements that
follow the xsl:for-each
element in the template rule.  In this case, the processor is
instructed to apply templates to all child nodes of each node named E.

Referring back to Figure 12, you will see that the node named D has three child nodes named E and three child nodes named F.

(I colored the child nodes named E and F, and their descendants, in
alternating colors of red and blue to make them easier to identify
visually.)


One of the child nodes named E
has a child node named G.

No matching
template rules for E or F

Referring back to the tree view of the stylesheet in Figure 3, you can
see that there are no matching template rules for nodes named E or F.  However, there is a
matching template rule for nodes named G.

Apply built-in
template rule to node E

When the processor encounters the first node named E, it will apply the built-in
template rule for element nodes.  That will cause it to apply
templates to all child nodes of the node named E.  The first child node that
it will encounter will be a text node containing the following text:

First list item in E

This text will be sent to the
output.

Then it will encounter the node named G
and apply the matching template rule to that node.  The tree view
of that template rule is shown in Figure 14.

    xsl:template ELEMENT_NODE
Attribute: match=G
b ELEMENT_NODE
xsl:apply-templates ELEMENT_NODE
Figure 14


The stylesheet code for the template rule that matches G is shown in Listing 7.


<xsl:template match="G">
<b>
<xsl:apply-templates />
</b>
</xsl:template>

Listing 7

This template rule applies templates
to all child nodes of G, and
surrounds the output produced by that operation with the XHTML start
and end tags to cause that material to be displayed as bold.

Referring back to Figure 12, we see that the node named G has only one child node.  It
is a text node containing the following text:

Nested G text element

That text will be sent to the output
next, surrounded by XHTML bold tags, <b>…</b>.

That completes the processing of the first child of D named E.

Note in Figure 12 that the next child node of D is a node named F.
However, we are discussing the behavior of that portion of the template
rule shown in Listing 6 that is using the xsl:for-each element to
iterate on nodes named E.  Therefore, the processor will skip over the
node named F and process the next node named E.

This is a simple node that has only one child node and it is a text
node containing the following text:

Second list item in E

This text will be the next thing to
be sent to the output.

The node named D has one more child node named E, and it has a single
child node, which is a text node.  The text node contains the
following text:

Third list item in E

When that text is sent to the
output, the execution of the top half of the template rule shown in
Listing 6 will be complete.  Then the processor will execute the
bottom half of the template rule in Listing 6.  The bottom half is
identical to the top half except that it iterates on child nodes named F, so I won’t discuss it in detail.
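
To watch the xsl:for-each behavior in action, the rules for D and G can be run against a small document through the JAXP transformation API.  This is a sketch under stated assumptions: the class name ForEachXsltDemo and the shortened input document are hypothetical, and the literal text of Listing 5 has been left out of the rule for D so that only the list structure remains.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ForEachXsltDemo {
  // Hypothetical input: E and F children interleaved, one E contains a G.
  static final String XML =
      "<D><E>one<G>bold</G></E><F>x</F><E>two</E></D>";

  // The rules for D and G, without the literal text of Listing 5.
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\" "
    + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
    + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
    + "<xsl:template match=\"D\">"
    + "<ul><xsl:for-each select=\"E\">"
    + "<li><xsl:apply-templates/></li>"
    + "</xsl:for-each></ul>"
    + "<ol><xsl:for-each select=\"F\">"
    + "<li><xsl:apply-templates/></li>"
    + "</xsl:for-each></ol>"
    + "</xsl:template>"
    + "<xsl:template match=\"G\">"
    + "<b><xsl:apply-templates/></b>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  // Runs the stylesheet against the XML and returns the result as text.
  static String transform(String xml, String xsl) {
    try {
      Transformer t = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(xsl)));
      StringWriter result = new StringWriter();
      t.transform(new StreamSource(new StringReader(xml)),
                  new StreamResult(result));
      return result.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(transform(XML, XSL));
  }
}
```

Both E items should land in the unordered list even though an F node sits between them in the input, and the single F item should appear only in the ordered list.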

Let’s look at
the XHTML output

Before moving along, let’s take a look at the raw XHTML produced by the
template rule shown in Listing 5.  That XHTML output is shown in
Figure 15.

List of items in E
<ul>
<li>
First list item in E
<b>
Nested G text element
</b>
</li>
<li>
Second list item in E
</li>
<li>
Third list item in E
</li>
</ul>
List of items in F
<ol>
<li>
First list item in F
</li>
<li>
Second list item in F
</li>
<li>
Third list item in F
</li>
</ol>

Figure 15


Black text
originates in XML document

The black text in Figure 15 originated in the XML file shown in Figure
12.  You should be able to match the seven lines of black text in
Figure 15 to the corresponding text in Figure 12.

Red and blue
text originates in stylesheet

The red text in Figure 15 originated in the stylesheet template rule
shown in Listing 5.  This literal text is also shown in red in
Listing 5.

The blue text in Figure 15 originated in the template rule shown in
Listing 7.  This text is also shown in blue in Listing 7.

How does it
render?

If you go back and examine Figure 2, which shows the XHTML as rendered
by the Netscape Navigator browser, you should be able to identify the
output in Figure 2 produced by the raw XHTML text in Figure 15.  (It occurs between the lines that read Text block 5 and Text block 6.)

As you can see, the template rule shown in Listing 5 used an xsl:for-each element

  • To iterate on child nodes
    named E,
  • To extract the text values of
    those nodes and their descendants, and
  • To embed those values in XHTML
    elements to cause the values to be rendered as an unordered list.

The value of a child node of one of
the E nodes was also caused to
be rendered in bold.

Then the template rule used an xsl:for-each
element

  • To iterate on child nodes
    named F,
  • To extract text values from
    those nodes, and
  • To embed those values in XHTML
    elements to cause the values to be rendered as an ordered list.

New XSLT material
has been covered

I could go on for hours discussing the interaction of this stylesheet
with the XML file in the transformation process.  However, a review of
the tree view of the stylesheet in Figure 3 reveals that the behavior
of the remaining template rules has been covered either in this lesson
or in a previous lesson.  Therefore, I will terminate this discussion
of the XSLT transformation at this point and discuss a Java program
that mimics the behavior of this XSLT transformation.

The
Java Code Transformation

At this point, I will change
direction and
concentrate on Java code instead of XSLT elements.  The
following paragraphs describe a Java program named Dom03, which emulates the XSLT
transformation described above.  This program transforms an XML
file into an XHTML file using a combination of recursive and iterative
processing.  Along the way, it creates and populates an XHTML
table.

This program defines a new method named forEach that mimics the behavior of
the xsl:for-each element
described above.  In addition, this program adds code to the processDocumentNode and processNode
methods to emulate the template rules in the XSL file named Dom03.xsl.

Also, as was the case in the previous lessons, this program implements
six built-in template rules
for an XML processor.

Instructions
for creating a custom template rule

To create a custom template rule for this program:

  • Go to the processNode method.
  • Identify the node type.
  • Change the conditional clause
    in the if statement to
    implement the required match.
  • Write code in the body of the if statement to implement the
    custom rule.

If the modified conditional clause
evaluates to true, the custom rule will be executed.  If
the modified conditional clause evaluates
to
false, the default rule
will be executed.  You will see examples of several custom
template rules
in this program.

Behavior of the
program

This program compares the transformation of a specified XML file into a
result file, using two different approaches:

  1. An XSLT style sheet and
    transformation, as discussed above.
  2. Program code that emulates the
    behavior of the XSLT transformation.

In particular, this program
illustrates Java code that emulates the XSLT templates in the file
named Dom03.xsl.

Both output
files are valid

The program produces two output files, one from the XSLT
transformation,
and one from executing the Java code.  Both files validate as
XHTML transitional at the W3C validation service,
http://validator.w3.org/file-upload.html.

Both also validate as HTML files at
http://www.htmlhelp.com/tools/validator/upload.html.

Finally, both files validate using the program named DomTree02, which means that they
validate as XML under JAXP.
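
A quick way to approximate that last check is to hand the generated text to a JAXP parser and see whether it parses.  The sketch below is an assumption about how such a check might look, not the code of DomTree02: the class name WellFormedCheck and the method isWellFormed are hypothetical, and the check covers well-formedness only, not validity against the XHTML DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
  // Returns true if a JAXP parser can build a DOM tree from the text.
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors without printing them.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false; // the parser rejected the text
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("<ul><li>one</li></ul>"));
    System.out.println(isWellFormed("<ul><li>one<li></ul>"));
  }
}
```

A list item that is opened twice and never closed is accepted by many HTML browsers, but it should be rejected here, which is exactly the kind of rigor that XHTML adds.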

Usage
instructions

The program requires three command line arguments in the following
order:

  1. The name of the input XML file
    – must be Dom03.xml.
  2. The name of the output file to
    be produced by the XSLT transformation.
  3. The name of the output file to
    be produced by the program code that emulates the XSLT transformation.

The name of the XSL stylesheet file
is extracted from the processing instruction in the XML file, but you
could easily modify the program to obtain the name of that file from a
command-line argument.
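
For readers who want to see how a stylesheet name might be pulled out of a processing instruction, here is a minimal sketch.  It is an assumption, not the code used in Dom03: the class name StylesheetPiSketch, the method findStylesheetHref, and the regular expression are all hypothetical.  It relies on the fact that the DOM exposes the body of a processing instruction as one string, so the href pseudo-attribute has to be picked out by hand.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class StylesheetPiSketch {
  // Matches href="..." or href='...' inside the PI data.
  private static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']");

  // Returns the href of the first xml-stylesheet PI at document
  // level, or null if there is none.
  static String findStylesheetHref(Document doc) {
    for (Node n = doc.getFirstChild(); n != null;
         n = n.getNextSibling()) {
      if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
        ProcessingInstruction pi = (ProcessingInstruction) n;
        if ("xml-stylesheet".equals(pi.getTarget())) {
          Matcher m = HREF.matcher(pi.getData());
          if (m.find()) {
            return m.group(1);
          }
        }
      }
    }
    return null;
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"Dom03.xsl\"?>"
        + "<A/>");
    System.out.println(findStylesheetHref(doc));
  }
}
```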

Order of execution

The program begins by executing code to transform the incoming XML file
in a way that mimics the XSLT Transformation.  Along the way, it
saves the processing instructions containing the ID of the stylesheet
file for use by the XSLT transformation process later.  Otherwise,
the code that
performs the XSLT transformation would have to search the DOM
tree for the XSL stylesheet file.

Then the program uses the XSLT style sheet to transform the XML file
into a result file by performing an XSLT transformation under program
control.
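
Performing an XSLT transformation under program control on a DOM tree that is already in memory usually comes down to a few JAXP calls.  The following sketch shows the general shape under stated assumptions: the class name DomTransformSketch, the method xslTransform, and the one-rule stylesheet are hypothetical, and the result is captured in a string here, whereas a program like the one described in this lesson would write to the output file named on the command line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomTransformSketch {
  // Hypothetical stylesheet: wraps the content of C in a paragraph.
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\" "
    + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
    + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
    + "<xsl:template match=\"C\">"
    + "<p><xsl:apply-templates/></p>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  static Document parse(String xml) {
    try {
      DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
      // A namespace-aware tree is the safer input for a DOMSource.
      f.setNamespaceAware(true);
      return f.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Transforms an in-memory DOM tree with the given stylesheet text.
  static String xslTransform(Document doc, String xsl) {
    try {
      Transformer t = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(xsl)));
      StringWriter result = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(result));
      return result.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(xslTransform(parse("<A><C>hello</C></A>"), XSL));
  }
}
```

Because the DOM tree is reused as the transformation source, the XML file does not have to be parsed a second time.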

Errors,
exceptions, and testing

No effort was made to provide meaningful information about errors and
exceptions.

The program was tested using SDK 1.4.2 under WinXP.

Will discuss in
fragments


I will discuss this program in fragments.  A complete listing of
the program is shown in Listing 23 near the end of the lesson.

Much of the code in this program is very similar to, or identical to,
code that I discussed in previous lessons.  I will discuss that
repetitious code only briefly, if at all.

The main method

Listing 8 shows an
abbreviated version of the beginning of the class named Dom03 and the ending of the main method.

public class Dom03{
//Code deleted for brevity

//In main method
//Process the DOM tree
thisObj.processDocumentNode(document);

//Perform XSLT transformation
thisObj.doXslTransform(
document,argv[1],procInstr);

//Exception handling code deleted for brevity
}// end main()

Listing 8

The code in this portion of the
program is identical to code that I discussed in detail in previous
lessons, so I won’t discuss it further.  I included it here
solely to establish the context for discussion of code that is to
follow. 

Behavior of
this code

Briefly, the code in the main
method does the following:

  • Performs all the steps
    necessary to parse the input XML file, producing an object of type Document whose reference is saved in
    a reference variable named document.
  • Instantiates an object of the Dom03 class and saves its
    reference in a reference variable named thisObj.
  • Invokes the method named processDocumentNode on thisObj to transform the
    DOM tree to an output file using program code to perform the
    transformation.
  • Invokes the method named doXslTransform on thisObj to perform an XSLT
    transformation using an XSL stylesheet.

The methods named processDocumentNode and doXslTransform are methods of my own
design.

The
processDocumentNode method

The beginning of the processDocumentNode
method is shown in Listing 9.  This version of the method is much
longer than versions discussed in previous lessons.

  void processDocumentNode(Node node){
    //Create the beginning of the XHTML document
    out.println("<?xml version=\"1.0\" "
                    + "encoding=\"UTF-8\"?>");
    out.println(
       "<!DOCTYPE html PUBLIC \"-//W3C//DTD "
     + "XHTML 1.0 Transitional//EN\" "
     + "\"http://www.w3.org/TR/xhtml1/"
     + "DTD/xhtml1-transitional.dtd\">");

Listing 9

However, even though this version is much longer, there is nothing in
the method that should be a stretch for capable Java programmers. 
All of the new code in this method is in the form of print statements
to cause appropriate XHTML text to appear in the output.

Produces all required output text

This method is used to produce any text required in the output at
the document level, such as the XML declaration for an XML
document, or the DTD reference for an XHTML document.  As you can
see from Listing 9, the code in this method does both.

The code in Listing 9 writes an XML declaration, and then writes XHTML
text into the output that matches text produced by the green xsl:output element in Figure
3.  I have already discussed the need for the XHTML DTD in the
XHTML file, so I won’t discuss it further here.

The start tag for the html root element

The code in Listing 10 writes the start tag for the html root element of the XHTML
document.  Then it writes the XML namespace attribute in the
output.

(The
stylesheet shown in Figure 3 doesn’t write an XML namespace attribute
for reasons that I explained earlier.)


    out.println("<html xmlns=\"http://www.w3."
         + "org/1999/xhtml\" xml:lang=\"en\""
         + " lang=\"en\">");
    out.println("<head>");
    out.println(
          "<meta http-equiv=\"content-type\" "
          + "content=\"text/html; charset="
          + "UTF-8\"/>");
    out.println("<title>Generated XHTML file"
                                + "</title>");
    out.println("</head>");
    out.println("<body>");
    //Output similar to the above applies to
    // most XHTML documents.

    //Now set up an XHTML table. This is
    // peculiar to this particular example.
    out.println("<table border=\"2\" " +
                "cellspacing=\"0\" " +
                "cellpadding=\"0\" " +
                "width=\"330\" " +
                "bgcolor=\"#FFFF00\">" +
                "<tr><td>");

Listing 10

Following this, the code in Listing
10 writes the same XHTML text in the output that is written by the
first red template rule in Figure 3.

Invoke the processNode method

Then the code in Listing 11 invokes the processNode method to trigger a
recursive process that processes the entire DOM tree.

    processNode(node);

//Finish the XHTML table. This output is
// peculiar to this particular example.
out.println("</td></tr></table>");

//Now finish the output document and flush
// the output buffer. This would apply to
// most XHTML documents.
out.println("</body></html>");
out.flush();
}//end processDocumentNode

Listing 11

When the processNode method returns, the code
in Listing 11 writes XHTML text into the output consisting of end tags
for the table, the body, and the document.  That completes the
production of the XHTML document, so the code in Listing 11 flushes the
output buffer to assure that everything is written into the file.

Invoke the doXslTransform method

Then the processDocumentNode
method terminates and returns control to the main method in Listing 8.  At
that point, the doXslTransform
method is invoked to perform an XSLT transformation on the XML file
using the stylesheet discussed earlier in this lesson.

Quite a lot of code was added to the processDocumentNode
method, but as
mentioned earlier, all of that code was added simply to write XHTML
text into the output at the document level.  All of the changes to
the program that were significant from a programming viewpoint were
either included in the processNode method,
or were part of a new method named forEach.

Invoke the
processNode method

Despite the name that I chose to give to the processDocumentNode method, it
doesn’t actually process the document node directly.  Rather, after
sending any required text to the output, it invokes the
method named processNode (see Listing 11) to
actually process the document node.

(Note
that the Document object’s
reference is passed to the method named processNode in Listing 11.)

The processNode
method

As you have learned in previous lessons, there are seven possible types
of nodes in an XML document:

  1. root or document node
  2. element node
  3. attribute node
  4. text node
  5. comment node
  6. processing instruction node
  7. namespace node

The processNode method handles
the first six types and ignores namespace nodes.

(Apparently
it is not possible to handle namespace nodes in a Java program because
there is no constant in the
Node class that can be used to identify
namespace nodes.  This will become clear as we examine the
code in the processNode
method.)
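
The set of constants that a switch statement can test is easy to see in a small helper.  This sketch is an illustration only, with a hypothetical class name NodeTypeNames and method typeName.  It maps the six node types handled in this lesson to readable names and sends everything else to a default case.  In a DOM tree, a namespace declaration such as xmlns="…" shows up as an ordinary attribute, so no separate constant is available for it.

```java
import org.w3c.dom.Node;

class NodeTypeNames {
  // Maps a DOM node type constant to a readable name.
  static String typeName(short type) {
    switch (type) {
      case Node.DOCUMENT_NODE:
        return "document";
      case Node.ELEMENT_NODE:
        return "element";
      case Node.ATTRIBUTE_NODE:
        return "attribute";
      case Node.TEXT_NODE:
        return "text";
      case Node.COMMENT_NODE:
        return "comment";
      case Node.PROCESSING_INSTRUCTION_NODE:
        return "processing instruction";
      default:
        // CDATA sections, entity references, and so on end up here.
        return "other";
    }
  }

  public static void main(String[] args) {
    System.out.println(typeName(Node.ELEMENT_NODE));
    System.out.println(typeName(Node.CDATA_SECTION_NODE));
  }
}
```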

Get and save
the node type

The processNode method in this
program contains quite a few changes relative to the programs that I
discussed in previous lessons.  Therefore, I will discuss the processNode method in detail.

Code that you write in this method (and
in the processDocumentNode
method discussed above)
is somewhat analogous to writing an XSL
stylesheet to be used in an XSLT transformation.

Test for a
valid node, and get its type



The beginning of the processNode
method is shown in Listing 12.  The method receives an
incoming parameter of type Node,
which can represent any of the seven types of nodes in the above list.

As you can see in Listing 12, if the parameter doesn’t point to an
actual object, the method quietly
returns, as opposed to throwing a NullPointerException.

  void processNode(Node node){

try{
if (node == null){
System.err.println(
"Nothing to do, node is null");
return;
}//end if

//Get the actual type of the node
int type = node.getNodeType();

Listing 12

The final statement in Listing 12 invokes the getNodeType method to get and save
the type of the node whose reference was received as an incoming
parameter.

Process the node

Each time the processNode
method is invoked, it receives a Node
object’s reference as an incoming parameter.  The code in Listing
12 determines the type of the incoming node.  Listing 13 shows the
beginning of a switch
statement that is used to initiate the processing of each incoming node
based on its type.

      switch (type){
case Node.DOCUMENT_NODE:{
if(false){
//unreachable in this program
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case DOCUMENT_NODE

Listing 13

The switch statement has six
cases to handle six types of nodes, plus a default case to ignore
namespace nodes.

The
DOCUMENT_NODE case

The code in Listing 13 will be executed whenever the incoming method
parameter points to a document node.

(Note
that this will happen only once during the processing of a DOM
tree.  The first node processed will always be the document node,
and there is only one document node in a DOM tree.)

This code is identical to code that I have discussed in previous
lessons, so I won’t discuss it further.  I included it here solely
to help you get oriented as to the overall control structure of the processNode method.

I do want to point out, however, that when the processNode method is invoked on a
document node, the code in Listing 13 causes a method named defElOrRtNodeTemp to be
invoked.  This method emulates the behavior of a built-in template
rule, which in this case causes templates to be applied to all child
nodes of the document node.
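
As a reminder of what that built-in behavior amounts to, here is a stripped-down sketch.  It is not the defElOrRtNodeTemp method from the earlier lessons: the class name DefaultRuleSketch and its method names are hypothetical, and output goes to a StringBuilder instead of a file.  With no custom rules at all, the only thing that reaches the output is the text content of the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultRuleSketch {
  // Dispatches on node type, using default rules only.
  static void process(Node node, StringBuilder out) {
    switch (node.getNodeType()) {
      case Node.DOCUMENT_NODE:
      case Node.ELEMENT_NODE:
        defaultElementOrRoot(node, out);
        break;
      case Node.TEXT_NODE:
        out.append(node.getNodeValue()); // default: copy the text
        break;
      default:
        break; // comments and processing instructions produce nothing
    }
  }

  // Default rule for root and element nodes: process every child.
  static void defaultElementOrRoot(Node node, StringBuilder out) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      process(children.item(i), out);
    }
  }

  // Parses the XML text and applies the default rules to the tree.
  static String textOf(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      process(doc, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        textOf("<A><Q>A Big Header</Q><!-- note --><B>x</B></A>"));
  }
}
```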

Creating custom
template rules

Although this lesson does not create a custom template rule for
document nodes, the process for creating a
custom template rule is as follows:

  • Go to this method named processNode.
  • Identify the case for the node
    type in the switch statement.
  • Change the conditional clause
    in the if statement for that
    case to
    implement a match for a particular node of that type.
  • Write code in the body of the if statement to implement the custom
    template rule.

If the modified conditional clause
evaluates to true, the custom template rule will be executed.  If
it evaluates to false, the
default rule will be executed.

The
ELEMENT_NODE case

Most of the changes to this program (as
compared to programs discussed in previous lessons)
consist of
changes to the code that
processes element nodes in the switch
statement.  The code for element node case is rather long, so I
will
discuss it in fragments.

(A
new method named forEach was
also added to the program.  I will discuss that method in detail
later.)


A match for
element nodes named B

The beginning of the case for element nodes is shown in Listing 14.

        case Node.ELEMENT_NODE:{

          //Compare node names with equals rather than ==
          if(node.getNodeName().equals("B")){
            applyTemplates(node,null);
          }//end if

Listing 14

Note the similarity of the code in
Listing 14 and the XSLT template rule shown in Listing 4.  When
the node being processed is an element node whose name is B, the code in Listing 14 invokes
the applyTemplates method to
cause templates to be applied to all child nodes of the node named B.

I discussed the applyTemplates
method in earlier lessons, and won’t repeat that discussion here.

A match for
element nodes named C

Listing 15 shows code that
matches element nodes named C.


          else if(node.getNodeName().equals("C")){
            out.println("<p>");
            applyTemplates(node,null);
            out.println("</p>");
          }//end if

Listing 15

This code applies templates to all
child nodes of the node named C,
and wraps the output produced by that operation in an XHTML paragraph
element, <p>…</p>.

Compare the code in Listing 15 with the second red XSLT template rule
in Figure 3.

A match for
element nodes named D

Listing 16 shows code that matches element nodes named D.

          else if(node.getNodeName().equals("D")){
            out.println("List of items in E");
            out.println("<ul>");
            forEach(node,"E");
            out.println("</ul>");

            out.println("List of items in F");
            out.println("<ol>");
            forEach(node,"F");
            out.println("</ol>");
          }//end if

Listing 16

I’ll start my discussion of the code
in Listing 16 by comparing it with the template rule shown in Listing
5.  The behavior of this code is the same as the behavior of the
template rule in Listing 5.  However, the execution structure is
slightly different.

The code in Listing 16 begins by sending some text followed by the
start tag for an unordered list to the output.  Then it invokes
the forEach method, passing
the context node and the name of the child node named E as parameters.

The forEach
method

The entire forEach method is
shown in Listing 17. 
This method, in conjunction with the processNode
method, emulates the behavior of an xsl:for-each
XSLT element.

  private void forEach(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes,
// processing nodes that match the select.

for (int i = 0; i < len; i++){
if(children.item(i).getNodeName().
equals(select)){
//Make a recursive call from within
// this iterative template rule.
processNode(children.item(i));
}//end if
}//end for loop
}//end if
}//end forEach

Listing 17

If you have been studying the
previous lessons in this series, the structure of the method should be
familiar to you.

The structure of the forEach method

The method receives two parameters:

  • A reference to a particular
    node of type Node.
  • The name of a node that should
    be a child node of the node.

The purpose of the method is to
access each child node that matches the name, in the order in which
they appear in the DOM tree, and to apply a particular operation to
each of those nodes.

(A future lesson will show you how to
access the nodes in sorted order.)


Get and iterate on a list of child nodes

The code in Listing 17 starts by getting a list of all the child nodes
of the node referenced by the first incoming parameter.

Then it iterates on the list, identifying those nodes whose names match
the second incoming parameter.  When it finds a match, it makes a
recursive call to the processNode
method where the operation to be applied to that node is defined.

When it has processed all the nodes in the list, it returns void to the
code shown in Listing 16.
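
The selection logic can be exercised on its own with a variation that records what it visits instead of calling processNode.  This is a sketch for experimentation, not the method from Listing 17: the class name ForEachDemo, the returned list, and the sample document are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ForEachDemo {
  // Returns the text of each child whose name matches select,
  // in document order.
  static List<String> forEach(Node node, String select) {
    List<String> visited = new ArrayList<>();
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeName().equals(select)) {
        visited.add(child.getTextContent());
      }
    }
    return visited;
  }

  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Node d = parse("<D><E>e1</E><F>f1</F><E>e2</E><F>f2</F></D>")
        .getDocumentElement();
    System.out.println(forEach(d, "E"));
    System.out.println(forEach(d, "F"));
  }
}
```

Each call makes one pass over all the children and skips the names that don't match, which is why the E items and the F items can be interleaved in the XML file and still end up in separate lists.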

Process all child nodes named E

The first time this method is called in Listing 16, it is instructed to
identify and perform an operation on all the child nodes named E.  When the forEach method calls the processNode method, passing an E node’s reference as a parameter,
the code shown in Listing 18 is executed.  (Note that this is part of the element
node case in the switch
statement belonging to the processNode
method.)


          else if(node.getNodeName().equals("E")){
            out.println("<li>");
            applyTemplates(node,null);
            out.println("</li>");
          }//end if

Listing 18

Note that I could have put the code
in Listing 18 inside the forEach
method.  However, I elected to do it the way that I did to make
the forEach method more
general, and confine all the code for custom template rules to the processDocumentNode and processNode methods.

As you can see, the code in Listing 18 causes templates to be applied
to all child nodes of the node named E,
and causes the output produced by that operation to be surrounded by
the start and end tags for an XHTML list item, <li>…</li>.

Finished with nodes named E

That completes the operation necessary to emulate the template rule in
Listing 5 for nodes named E,
and completes the top half of the code being executed in Listing 16.

Process all child nodes named F

The bottom half of the code in Listing 16 does essentially the same
thing, except that it iterates on child nodes named F and wraps the results in the XHTML
tags for an ordered list, <ol>…</ol>.

In this case, the forEach
method will isolate nodes named F
and pass them recursively to the processNode
method. 

At that point, the code in Listing 19 will be executed with exactly the
same behavior as the code in Listing 18, except that it is applied to
nodes named F instead of nodes
named E.

          else if(node.getNodeName().equals("F")){
            out.println("<li>");
            applyTemplates(node,null);
            out.println("</li>");
          }//end if

Listing 19

A match for element
nodes named G

Listing 20 shows custom code that applies to nodes named G.

          else if(node.getNodeName().equals("G")){
            out.println("<b>");
            applyTemplates(node,null);
            out.println("</b>");
          }//end if

Listing 20

This code applies templates to the
child nodes of nodes named G,
and wraps the output from that operation in the XHTML tags for bold, <b>…</b>.

Compare this code to the template rule shown in Listing 7.
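
To see how the pieces for D, E, F, and G cooperate, it can help to run them together in a small stand-alone program.  The following is a condensed sketch, not the Dom03 program: the class name MiniEmulator, the single-parameter applyTemplates, and the wrap helper are hypothetical, the literal text of Listing 5 is left out, and output is collected in a StringBuilder.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MiniEmulator {
  private final StringBuilder out = new StringBuilder();

  void processNode(Node node) {
    if (node == null) return;
    switch (node.getNodeType()) {
      case Node.ELEMENT_NODE: {
        String name = node.getNodeName();
        if (name.equals("D")) {          // rule for D
          out.append("<ul>");
          forEach(node, "E");
          out.append("</ul><ol>");
          forEach(node, "F");
          out.append("</ol>");
        } else if (name.equals("E") || name.equals("F")) {
          wrap("li", node);              // rules for E and F
        } else if (name.equals("G")) {
          wrap("b", node);               // rule for G
        } else {
          applyTemplates(node);          // default rule
        }
        break;
      }
      case Node.DOCUMENT_NODE:
        applyTemplates(node);
        break;
      case Node.TEXT_NODE:
        out.append(node.getNodeValue());
        break;
      default:
        break;
    }
  }

  // Surrounds the output for the node's children with the given tag.
  private void wrap(String tag, Node node) {
    out.append('<').append(tag).append('>');
    applyTemplates(node);
    out.append("</").append(tag).append('>');
  }

  // Processes every child of the node.
  private void applyTemplates(Node node) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      processNode(children.item(i));
    }
  }

  // Processes only the children whose name matches select.
  private void forEach(Node node, String select) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      if (children.item(i).getNodeName().equals(select)) {
        processNode(children.item(i));
      }
    }
  }

  static String emulate(String xml) {
    try {
      MiniEmulator m = new MiniEmulator();
      m.processNode(DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))));
      return m.out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        emulate("<D><E>one<G>bold</G></E><F>x</F><E>two</E></D>"));
  }
}
```

The recursion between forEach, processNode, and applyTemplates is what lets the nested G element pick up its bold tags while its parent E is being turned into a list item.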

A match for
elements Q, R, S, and T

Listing 21 shows custom code that applies to nodes named Q, R,
S, and T.

          //Create four levels of XHTML headers
          else if(node.getNodeName().equals("Q")){
            out.println("<h1>");
            applyTemplates(node,null);
            out.println("</h1>");
          }//end if

          else if(node.getNodeName().equals("R")){
            out.println("<h2>");
            applyTemplates(node,null);
            out.println("</h2>");
          }//end if

          else if(node.getNodeName().equals("S")){
            out.println("<h3>");
            applyTemplates(node,null);
            out.println("</h3>");
          }//end if

          else if(node.getNodeName().equals("T")){
            out.println("<h4>");
            applyTemplates(node,null);
            out.println("</h4>");
          }//end if

Listing 21

Similar
blocks of code

The four blocks of code are very
similar.  Each block of code applies templates to the matching
node type, and surrounds the output from that operation with the XHTML
tags for a header, such as <h1>…</h1>.
However, the size of the header differs from one to the next.

Compare the block of code in Listing 21 that matches Q with the template rule in Listing
3.  Compare all four of the code blocks to the last four template
rules in Figure 3.
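
Because the four blocks differ only in the tag that they write, one possible alternative is to drive them from a lookup table.  The sketch below is a design variation offered for comparison, not the approach taken in Dom03: the class name HeaderTableSketch, the HEADER_TAGS map, and the render method are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HeaderTableSketch {
  // Element name in the XML file -> XHTML header tag in the output.
  private static final Map<String, String> HEADER_TAGS =
      new HashMap<>();
  static {
    HEADER_TAGS.put("Q", "h1");
    HEADER_TAGS.put("R", "h2");
    HEADER_TAGS.put("S", "h3");
    HEADER_TAGS.put("T", "h4");
  }

  static void processNode(Node node, StringBuilder out) {
    switch (node.getNodeType()) {
      case Node.ELEMENT_NODE: {
        String tag = HEADER_TAGS.get(node.getNodeName());
        if (tag != null) {               // one rule covers Q, R, S, T
          out.append('<').append(tag).append('>');
          applyTemplates(node, out);
          out.append("</").append(tag).append('>');
        } else {
          applyTemplates(node, out);     // default rule
        }
        break;
      }
      case Node.DOCUMENT_NODE:
        applyTemplates(node, out);
        break;
      case Node.TEXT_NODE:
        out.append(node.getNodeValue());
        break;
      default:
        break;
    }
  }

  // Processes every child of the node.
  static void applyTemplates(Node node, StringBuilder out) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      processNode(children.item(i), out);
    }
  }

  static String render(String xml) {
    try {
      StringBuilder out = new StringBuilder();
      processNode(DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml))), out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        render("<A><Q>one</Q><R>two</R><X>three</X></A>"));
  }
}
```

The tradeoff is that the table hides the one-to-one correspondence between code blocks and XSLT template rules, which the separate blocks in Listing 21 make easy to see.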

Processing
nodes with no match

This XML document contains several nodes for which there is no matching
template in the stylesheet and no matching code block in this program,
including the root element node named A.

Whenever the XSLT processor encounters an element node for which there
is no matching template rule, it executes a built-in rule for element
nodes.

When this program encounters an element node for which there is no
matching code block in the element node case of the switch statement, it executes the
code shown in Listing 22.

          else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case ELEMENT_NODE

Listing 22

As you can see, the code in Listing
22 invokes the method named defElOrRtNodeTemp,
passing the unmatched node as a parameter.  This method
mimics the built-in behavior of the XSLT processor.  I discussed
it in detail in an earlier lesson, and won’t repeat that discussion
here.
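
As a reminder of what that built-in behavior amounts to, here is a minimal sketch based on the default rules quoted in the comments of the complete listing: element and root nodes pass control to their children, text is copied to the output, and comments and processing instructions produce nothing. The class name BuiltInRulesSketch and its method names are assumptions made for illustration. This is not the defElOrRtNodeTemp method itself.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BuiltInRulesSketch {

  // Apply only the built-in rules, with no custom rules at all.
  static void applyBuiltInRule(Node node, StringBuilder out) {
    switch (node.getNodeType()) {
      case Node.DOCUMENT_NODE:
      case Node.ELEMENT_NODE: {
        // Rule for "*|/": apply templates to all child nodes.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          applyBuiltInRule(children.item(i), out);
        }
        break;
      }
      case Node.TEXT_NODE:
      case Node.CDATA_SECTION_NODE:
        // Rule for text: copy the value to the output.
        out.append(node.getNodeValue());
        break;
      default:
        // Rule for comments and processing instructions:
        // produce no output.
        break;
    }
  }

  // Parse the XML text and return the output of the built-in rules.
  static String transform(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      applyBuiltInRule(doc, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        transform("<A>one<!--c--><X>two</X><?pi data?></A>"));
  }
}
```

With no custom rules in place, the tags disappear and only the text content survives. That is why an unmatched element such as A contributes no markup of its own to the output, while its children are still processed.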

The remainder of the processNode method

That completes the discussion of the case for element nodes in the switch statement of the processNode method.  That
leaves the following cases not yet discussed:

  • Text nodes
  • Attribute nodes
  • Comment nodes
  • Processing instruction nodes
  • Namespace nodes (default case)

No new code for these nodes

However, there is no new code in the
cases for these nodes in comparison with the code discussed in previous
lessons.  Therefore, I won’t repeat that discussion in this lesson.

That completes the discussion of the processNode
method, and leaves the following methods not yet discussed:

  • main
  • defTextOrAttrTemp
  • defElOrRtNodeTemp
  • defComOrProcInstrTemp
  • applyTemplates
  • valueOf
  • doXslTransform

However, these methods are identical to methods having the same name
that I discussed in detail in earlier lessons.  I won’t repeat
that discussion in this lesson.

The program output

The output produced by this program is
essentially the same as the XSLT transform output discussed in the
early part of the lesson.  The output shown in rendered form in
Figure 2, and in raw XHTML form in Figure 4, represents the output
of both the program and the XSLT transform.

Run the Program

I encourage you to copy the Java code, XML file, and XSL file from
the listings near the end of this lesson.  Compile and execute the
program.  Experiment with the files, making changes and observing
the results of your changes.

Summary

In this lesson, I showed you how to use XSLT to transform an XML
document into an XHTML document.  I also showed you how to
write Java code to perform the same transformation.

What’s Next?

The next several lessons in this series will illustrate parallel
Java code
and XSLT transformations to transform XML documents into XHTML
documents.  The sample programs will illustrate various aspects of
the manipulation of a DOM tree using Java code.

Complete Program Listings


Complete listings of the various files discussed in this lesson are
contained in the listings that follow.

/*File Dom03.java
Copyright 2003 R.G.Baldwin

This program transforms an XML file into an XHTML
file using a combination of recursive and
iterative processing.

New material added to this lesson includes a
method that emulates an xsl:for-each template
rule.

This program compares the transformation of an
XML file to an XHTML file using two different
approaches:

1. An XSLT style sheet
2. Program code that emulates the behavior of the
XSL transformation.

Two XHTML files are produced, one by the XSL
transformation and one by the program code.

Both output files validate as XHTML at
http://validator.w3.org/file-upload.html

Both also validate as HTML at
http://www.htmlhelp.com/tools/validator/
upload.html

Both also validate using the program named
DomTree02.java, which means that they validate as
XML under JAXP.

The program requires three command line
parameters in the following order:
1. The name of the input XML file - must be
Dom03.xml
2. The name of the XHTML output file to be
produced by the XSL transformation.
3. The name of the XHTML output file to be
produced by the program code that emulates
the XSL transformation.

This program implements all six built-in default
template rules for an XSLT processor. In
addition, it implements several other
template rules that are required to support
the built in rules, such as xsl:value-of and
xsl:for-each.

The program creates several custom template
rules.

To create a custom template rule:
1. Go to the processNode method.
2. Identify the node type.
3. Change the conditional clause in the if
statement to implement the match.
4. Write code in the body of the if statement to
implement the custom rule.

If the modified conditional clause evaluates to
true, the custom rule will be executed. If
false, the default rule will be executed.

In particular, this program illustrates Java code
that emulates the XSLT templates in the file
named Dom03.xsl.

The name of the XSL stylesheet file is extracted
from the processing instruction in the XML file.

The program begins by executing code to transform
the incoming XML file in a way that mimics the
XSL Transformation. Along the way, it saves the
processing instructions containing the ID of the
stylesheet file for use by the XSLT process
later. Otherwise, the code that performs the
XSL transformation later would have to search the
DOM tree for the XSL stylesheet file.

Then the program uses the XSLT style sheet to
transform the XML file into a result file.

This is not a general purpose program. This
program, and the XSLT file named Dom03.xsl
are specifically designed to transform the
contents of the file named Dom03.xml into an
XHTML file.

No effort was made to provide meaningful
information about errors and exceptions.

Tested with SDK 1.4.2 under WinXP.
************************************************/

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;

import org.w3c.dom.*;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.*;

import java.util.*;
import java.io.*;

public class Dom03{

PrintWriter out;//output stream
//Save processing instruction nodes here
static Vector procInstr = new Vector();

public static void main(String argv[]){
if (argv.length != 3){
System.err.println(
"usage: java Dom03 "
+ "xmlFileIn "
+ "xformFileOut "
+ "codeFileOut");
System.exit(0);
}//end if

try{
//Get a factory object for DocumentBuilder
// objects
///
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();

//Configure the factory object. Change
// the following parameter to false for a
// non-validating parser.
///
factory.setValidating(true);
factory.setNamespaceAware(false);
//The following statement causes the parser
// to ignore cosmetic whitespace between
// elements.
///
factory.
setIgnoringElementContentWhitespace(true);

//Get a DocumentBuilder (parser) object
///
DocumentBuilder builder =
factory.newDocumentBuilder();

//Parse the XML input file to create a
// Document object that represents the
// input XML file.
///
Document document = builder.parse(
new File(argv[0]));

//Instantiate an object of this class
///
Dom03 thisObj = new Dom03();

//TRANSFORMATION THROUGH PROGRAM CODE
//Use program code to transform the
// DOM tree into an output file.
//
//Get an output stream for the output
// produced by the program code. This
// stream object is used by several
// methods, so it was instantiated at this
// point and saved as an instance variable
// of the object.
///
thisObj.out = new PrintWriter(
new FileOutputStream(argv[2]));

//Process the DOM tree, beginning with the
// Document node to produce the output.
// Invocation of processDocumentNode starts
// a recursive process that processes the
// entire DOM tree.
///
thisObj.processDocumentNode(document);


//XSLT TRANSFORMATION
//Use XSLT to transform the DOM tree into
// an output file. Note that the success
// of this method call depends on the
// stylesheet processing instruction having
// been saved while the transformation was
// being performed using program code
// above. Otherwise, it would be necessary
// to include the code in this method to
// search the DOM tree for the stylesheet
// processing instruction. All processing
// instructions are saved in a Vector
// object, which is passed as the third
// parameter to this method.
///
thisObj.doXslTransform(
document,argv[1],procInstr);

}catch(Exception e){
//Note that no effort was made to provide
// meaningful results in the event of an
// exception or error.
///
e.printStackTrace(System.err);
}//end catch
}// end main()
//-------------------------------------------//

//This method is used to produce any text
// required in the output at the document
// level, such as the XML declaration for an
// XML document.
void processDocumentNode(Node node){
//Create the beginning of the XHTML document
out.println("<?xml version=\"1.0\" "
+ "encoding=\"UTF-8\"?>");
out.println(
"<!DOCTYPE html PUBLIC \"-//W3C//DTD "
+ "XHTML 1.0 Transitional//EN\" "
+ "\"http://www.w3.org/TR/xhtml1/"
+ "DTD/xhtml1-transitional.dtd\">");
out.println("<html xmlns=\"http://www.w3."
+ "org/1999/xhtml\" xml:lang=\"en\""
+ " lang=\"en\">");
out.println("<head>");
out.println(
"<meta http-equiv=\"content-type\" "
+ "content=\"text/html; charset="
+ "UTF-8\"/>");
out.println("<title>Generated XHTML file"
+ "</title>");
out.println("</head>");
out.println("<body>");
//Output similar to the above applies to
// most XHTML documents.

//Now set up an XHTML table. This is
// peculiar to this particular example.
out.println("<table border=\"2\" " +
"cellspacing=\"0\" " +
"cellpadding=\"0\" " +
"width=\"330\" " +
"bgcolor=\"#FFFF00\">" +
"<tr><td>");

//Go process the root (document) node. This
// method call triggers a recursive process
// that processes the entire DOM tree.
processNode(node);

//Finish the XHTML table. This output is
// peculiar to this particular example.
out.println("</td></tr></table>");

//Now finish the output document and flush
// the output buffer. This would apply to
// most XHTML documents.
out.println("</body></html>");
out.flush();
}//end processDocumentNode
//-------------------------------------------//

//There are seven kinds of nodes:
// root or document
// element
// attribute
// text
// comment
// processing instruction
// namespace
//
//This method handles the first six.
// Apparently it is not possible to handle
// namespace nodes in Java because there is
// no constant in the Node class to identify
// namespace nodes
///
void processNode(Node node){

try{
if (node == null){
System.err.println(
"Nothing to do, node is null");
return;
}//end if

//Process the incoming node based on its
// type.
///
int type = node.getNodeType();

//To define an overriding template rule,
// insert the matching condition in the
// conditional clause of the if statement,
// and provide code to implement the rule
// in the body of the if statement. If the
// conditional clause evaluates to true,
// the default rule for that element type
// will not be processed.
///
switch (type){
case Node.TEXT_NODE:{
if(true){
out.println(node.getNodeValue());
}else{//invoke default behavior
//This won't be reached in this
// example, but I will leave it
// here as a reminder of the
// default behavior.
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.TEXT_NODE

case Node.ATTRIBUTE_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
out.print(defTextOrAttrTemp(node));
}//end else
break;
}//end case Node.ATTRIBUTE_NODE

case Node.ELEMENT_NODE:{
if(node.getNodeName().equals("B")){
//Process all XML child nodes
// recursively
applyTemplates(node,null);
}//end if

else if(node.getNodeName().equals("C")){
//Begin XHTML paragraph element
out.println("<p>");
//Process all XML child nodes
// recursively
applyTemplates(node,null);
//End XHTML paragraph element
out.println("</p>");
}//end if

else if(node.getNodeName().equals("D")){
//First process the child nodes
// named E.
out.println("List of items in E");
//Begin XHTML unordered list
out.println("<ul>");
//Iteratively put text from E
// elements and their children into
// the list.
forEach(node,"E");
//End XHTML unordered list
out.println("</ul>");

//Now process the child nodes
// named F.
out.println("List of items in F");
//Begin XHTML ordered list
out.println("<ol>");
//Iteratively put text from F
// elements and their children in the
// list.
forEach(node,"F");
//End XHTML ordered list
out.println("</ol>");
}//end if

else if(node.getNodeName().equals("G")){
//Display children as XHTML bold
out.println("<b>");
applyTemplates(node,null);
out.println("</b>");
}//end if

//Create four levels of XHTML headers
else if(node.getNodeName().equals("Q")){
out.println("<h1>");
applyTemplates(node,null);
out.println("</h1>");
}//end if

else if(node.getNodeName().equals("R")){
out.println("<h2>");
applyTemplates(node,null);
out.println("</h2>");
}//end if

else if(node.getNodeName().equals("S")){
out.println("<h3>");
applyTemplates(node,null);
out.println("</h3>");
}//end if

else if(node.getNodeName().equals("T")){
out.println("<h4>");
applyTemplates(node,null);
out.println("</h4>");
}//end if

//The following rules for E and F
// are invoked as a result of the
// behavior of the forEach method. The
// code could have been placed inside
// the forEach method. However, I
// elected to put it here in an attempt
// to confine all of the custom code
// to the methods named processNode and
// processDocumentNode.
else if(node.getNodeName().equals("E")){
//Create an XHTML list item
// containing information from child
// nodes.
out.println("<li>");
applyTemplates(node,null);
out.println("</li>");
}//end if

else if(node.getNodeName().equals("F")){
//Create an XHTML list item
// containing information from child
// nodes.
out.println("<li>");
applyTemplates(node,null);
out.println("</li>");
}//end if

else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case ELEMENT_NODE

case Node.DOCUMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defElOrRtNodeTemp(node);
}//end else
break;
}//end case DOCUMENT_NODE

case Node.COMMENT_NODE:{
if(false){
//Change conditional and write
// overriding handler here
///
}else{//invoke default behavior
defComOrProcInstrTemp(node);
}//end else
break;
}//end case COMMENT_NODE

case Node.PROCESSING_INSTRUCTION_NODE:{
if(false){
//Change conditional and write
// overriding handler here
}else{//invoke default behavior
//First save proc instr for later
// use.
procInstr.add(node);
//Now invoke default behavior.
defComOrProcInstrTemp(node);
}//end else
break;
}//end case PROCESSING_INSTRUCTION_NODE

default:{
//Ignore all other node types.
}//end default

}//end switch

}catch(Exception e){
e.printStackTrace(System.err);
}//end catch
}//end processNode(Node)
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="text()|@*">
// <xsl:value-of select="."/>
// </xsl:template>
///
String defTextOrAttrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ATTRIBUTE_NODE)
|| (nodeType == Node.TEXT_NODE)){
//Get and return the value of the context
// node.
///
return valueOf(node,".");
}else{
throw new Exception(
"Bad call to defTextOrAttrTemp");
}//end else
}//end defTextOrAttrTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template match="*|/">
// <xsl:apply-templates/>
// </xsl:template>
///
void defElOrRtNodeTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.ELEMENT_NODE) ||
(nodeType == Node.DOCUMENT_NODE)){
//Note that the following is a recursive
// method call.
///
applyTemplates(node,null);
}else{
throw new Exception(
"Bad call to defElOrRtNodeTemp");
}//end else
}//end defElOrRtNodeTemp
//-------------------------------------------//

//This method emulates the following default
// template rule:
// <xsl:template
// match="processing-instruction()|comment()"/>
///
String defComOrProcInstrTemp(Node node)
throws Exception{
int nodeType = node.getNodeType();
if((nodeType == Node.COMMENT_NODE) ||
(nodeType ==
Node.PROCESSING_INSTRUCTION_NODE)){
//According to Nutshell pg 148, the
// default rule for comments and processing
// instructions doesn't output anything
// into the result tree.
///
return "";//empty string
}else{
throw new Exception("Bad call to " +
"defComOrProcInstrTemp");
}//end else
}//end defComOrProcInstrTemp
//-------------------------------------------//

//See Nutshell, pg 148 for an explanation as to
// why it is not possible to write a Java
// method that emulates the default namespace
// template.
///
void defaultNamespaceTemplate(Node node)
throws Exception{
throw new Exception("See Nutshell pg 148 " +
"regarding default behavior for " +
"namespace template.");
}//end defaultNamespaceTemplate
//-------------------------------------------//

//Simulates an XSLT apply-templates rule.
// <xsl:apply-templates
// optional select = "..."
// optional mode = "..."
// >
//Note that the mode attribute is not supported
// in this version.
//If the select parameter is null, all child
// nodes are processed.
void applyTemplates(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes.
for (int i = 0; i < len; i++){
if((select == null) ||
(select.equals(children.item(i).
getNodeName()))){
//Note that the following is a
// recursive method call.
///
processNode(children.item(i));
}//end if
}//end for loop
}//end if children != null

}//end applyTemplates
//-------------------------------------------//

//This method simulates an XSLT
// <xsl:value-of select="???"/>
// The general form of the method call is
// valueOf(Node theNode,String select)
//
//The method recognizes three forms of call:
// valueOf(Node theNode,String "@attrName")
// valueOf(Node theNode,String ".")
// valueOf(Node theNode,String "nodeName")
//
//In the first form, the method returns the
// text value of the named attribute of
// theNode. An attribute is specified by a
// select value that begins with @. If the
// attribute doesn't exist, the method returns
// an empty string.
//
//In the second form, the method returns the
// concatenated text values of descendants of
// the context node.
//
//In the third form, the method returns the
// concatenated text values of all descendants
// of a specified child node of the context
// node. If the context node has more than one
// child node with the specified name, only the
// first one found is processed. The others
// are ignored.
//
//The method does not support the following,
// which are standard features of xsl:value-of:
// disable-output-escaping
// processing instruction nodes
// comment nodes
// namespace nodes
///

public String valueOf(Node node,String select){

if(select != null
&& select.startsWith("@")){
//This is a request for the value of an
// attribute. Returns empty string if the
// attribute doesn't exist on the element.
String attrName = select.substring(1);
NamedNodeMap attrList =
node.getAttributes();
Node attrNode = attrList.getNamedItem(
attrName);
if(attrNode != null){
return attrNode.getNodeValue();
}else{
return "";//empty string
}//end else
}//end if on @

else if(select != null
&& select.equals(".")){
//This is a request to process the context
// node
int nodeType = node.getNodeType();
if(nodeType == Node.ELEMENT_NODE){
//Process the context node as an element
// node. Return the concatenated text
// values of all descendants of the
// context node.
NodeList childNodes =
node.getChildNodes();
int listLen = childNodes.getLength();
String nodeTextValue = "";//result

for(int j = 0; j < listLen; j++){
nodeTextValue +=
valueOf(childNodes.item(j),".");
}//end for loop
return nodeTextValue;
}else if(nodeType == Node.TEXT_NODE){
//Process the context node as a text
// node. Simply get and return its
// value.
return node.getNodeValue();
}else{
//ignore all other context node types
}//end else
}//end if for context node

else if(select != null){
//Process a child node whose name is
// specified by the value of the incoming
// parameter named select. Get and return
// the concatenated text values of all
// descendants of the specified child node.
//This process assumes that there is only
// one child node with the specified name
// and processes the first one that it
// finds.
NodeList children = node.getChildNodes();
int len = children.getLength();
for (int i = 0; i < len; i++){
//Trap the specified child node
if(children.item(i).getNodeName().
equals(select)){
//Make a recursive call and let
// existing code do the work.
return valueOf(children.item(i),".");
//The above return statement causes any
// additional child nodes having the
// same name to be ignored.
}//end if getNodeName == select
}//end for loop on all child nodes
}//end else if(select != null)
//Will reach here if select is null, if the
// context node is of an ignored type, or if
// no matching child node is found.
///
return "";//empty string
}//end method valueOf
//-------------------------------------------//

//This method simulates an XSLT for-each
// template rule
private void forEach(Node node,String select){
NodeList children = node.getChildNodes();
if (children != null){
int len = children.getLength();
//Iterate on NodeList of child nodes,
// processing nodes that match the select.

for (int i = 0; i < len; i++){
if(children.item(i).getNodeName().
equals(select)){
//Make a recursive call from within
// this iterative template rule.
processNode(children.item(i));
}//end if
}//end for loop
}//end if
}//end forEach
//-------------------------------------------//

//This method uses an incoming XSLT stylesheet
// file to transform an incoming Document
// object into an output file. Note that the
// successful invocation of this method depends
// on the processing instruction containing the
// stylesheet having been saved in a Vector
// object that is received as an incoming
// parameter. Otherwise, this method would
// have to search the DOM for the stylesheet
// processing instruction.
///
void doXslTransform(Document document,
String outFile,
Vector procInstr)
throws Exception{
try{
//Get stylesheet ID from proc instr.
ProcessingInstruction pi = null;
boolean piFlag = false;
int size = procInstr.size();
//Search for a stylesheet in the Vector
// containing processing instruction nodes.
///
for(int i = 0; i < size; i++){
pi = (ProcessingInstruction)procInstr.
get(i);
if(pi.getTarget().startsWith(
"xml-stylesheet") && pi.getData().
startsWith("type=\"text/xsl\"")){
//Looks like a good stylesheet.
///
piFlag = true;
break;
}//end if
}//end for loop
if(piFlag == false){//still false?
throw new Exception(
"No valid stylesheet");
}//end if
//Get the stylesheet file reference
///
String xslFile = pi.getData().
substring(pi.getData().indexOf(
"href=")+6);
//Eliminate the quotation mark at the end
///
xslFile = xslFile.substring(
0,xslFile.length()-1);

//Get a TransformerFactory object
///
TransformerFactory xformFactory =
TransformerFactory.newInstance();
//Get an XSL Transformer object based on
// the XSL file discovered above.
///
Transformer transformer =
xformFactory.newTransformer(
new StreamSource(
new File(xslFile)));
//Get a DOMSource object that represents
// the DOM tree.
///
DOMSource source = new DOMSource(document);

//Get an output stream for the output
// file.
///
PrintWriter xformStream = new PrintWriter(
new FileOutputStream(outFile));

//Get a StreamResult object that points to
// the output file. Then transform the DOM
// sending text to the output file.
///
StreamResult xformResult =
new StreamResult(xformStream);

//Do the transform
///
transformer.transform(source,xformResult);
}catch(Exception e){
e.printStackTrace(System.err);
}//end catch

}//end doXslTransform

}// class Dom03

Listing 23

NOTE:  IT WAS NECESSARY TO MANUALLY ENTER SOME
LINE BREAKS INTO THIS DOCUMENT TO FORCE IT TO
FIT INTO THE NARROW PUBLICATION FORMAT.

<?xml version="1.0"?>

<!-- File Dom03.xml
Copyright 2003 R. G. Baldwin
Illustrates recursive and
iterative transformation using
templates and for-each.-->

<!DOCTYPE A [
<!ELEMENT A (Q,B,B)*>
<!ELEMENT B (B | C | D | R | S | T)*>
<!ELEMENT C (#PCDATA)>
<!ELEMENT D (E | F)*>
<!ELEMENT E (#PCDATA | G)*>
<!ELEMENT F (#PCDATA)>
<!ELEMENT G (#PCDATA)>
<!ELEMENT Q (#PCDATA)>
<!ELEMENT R (#PCDATA)>
<!ELEMENT S (#PCDATA)>
<!ELEMENT T (#PCDATA)>
]>

<?xml-stylesheet type="text/xsl"
href="Dom03.xsl"?>

<A>
<Q>A Big Header</Q>

<B>
<C>Text block 1.</C>

<R>A Mid Header</R>

<C>Text block 2.</C>

<!--Following PI should be ignored by both
the XSLT and the coded processor.-->
<?processor ProcInstr="Dummy"?>

<S>A Small Header</S>
<B>
<C>Text block 3.</C>
</B>

<S>Another Small Header</S>
<B>
<C>Text block 4.</C>

<T>A Smallest Header</T>
<B>
<C>Text block 5.</C>

<D>
<E>First list item in E
<G>Nested G text element</G>
</E>
<F>First list item in F</F>
<E>Second list item in E</E>
<F>Second list item in F</F>
<E>Third list item in E</E>
<F>Third list item in F</F>
</D>

<C>Text block 6.</C>
</B>
<C>Text block 7.</C>
</B>

<R>Another Mid Header</R>
<C>Text block 8.</C>
</B>

<B>
<R>Another Mid Header in Another B</R>
<C>Text block 9.</C>
</B>
</A>

Listing 24

NOTE:  IT WAS NECESSARY TO MANUALLY ENTER SOME
LINE BREAKS INTO THIS DOCUMENT TO FORCE IT TO
FIT INTO THE NARROW PUBLICATION FORMAT.

<?xml version='1.0'?>

<!-- File Dom03.xsl
Copyright 2003 R. G. Baldwin
Illustrates recursive and
iterative transformation using
templates and for-each.-->

<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

<xsl:output method="xml"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

<!--Match the root of the DOM tree-->
<xsl:template match="/">

<!--Note, would like to see the following
attribute on the html tag, but it causes
problems with the JAXP transformer.

xml:lang="en" lang="en">
-->

<html>

<head>
<meta http-equiv="content-type"
content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>

<body>

<table border="2" cellspacing="0"
cellpadding="0" width="330"
bgcolor="#FFFF00" >
<tr>
<td>
<!--Process children of DOM root-->
<xsl:apply-templates/>
</td>
</tr>
</table>
</body>

</html>

</xsl:template>
<!-- End root match template -->

<xsl:template match="B">
<xsl:apply-templates />
</xsl:template>
<!-- End B match template -->

<xsl:template match="C">
<p>
<xsl:apply-templates />
</p>
</xsl:template>
<!-- End C match template -->

<xsl:template match="D">List of items in E
<ul>
<!-- loop -->
<xsl:for-each select="E">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>
<!-- End loop -->
</ul>List of items in F
<ol>
<!-- loop -->
<xsl:for-each select="F">
<li>
<xsl:apply-templates />
</li>
</xsl:for-each>
<!-- End loop -->
</ol>

</xsl:template>
<!-- End D match template -->

<xsl:template match="G">
<b>
<xsl:apply-templates />
</b>
</xsl:template>
<!-- End G match template -->

<!-- Header templates follow -->
<xsl:template match="Q">
<h1>
<xsl:apply-templates />
</h1>
</xsl:template>
<!-- End Q match template -->

<xsl:template match="R">
<h2>
<xsl:apply-templates />
</h2>
</xsl:template>
<!-- End R match template -->

<xsl:template match="S">
<h3>
<xsl:apply-templates />
</h3>
</xsl:template>
<!-- End S match template -->

<xsl:template match="T">
<h4>
<xsl:apply-templates />
</h4>
</xsl:template>
<!-- End T match template -->

</xsl:stylesheet>


Listing 25


Copyright 2004, Richard G. Baldwin.  Reproduction in whole or in
part in any form or medium without express written permission from
Richard Baldwin is prohibited.

About the author

Richard Baldwin
is a college professor (at Austin Community College in Austin, TX) and
private consultant whose primary focus is a combination of Java, C#,
and XML. In addition to the many platform and/or language independent
benefits of Java and C# applications, he believes that a combination of
Java, C#, and XML will become the primary driving force in the delivery
of structured information on the Web.

Richard has participated in numerous consulting projects, and he
frequently provides onsite training at the high-tech companies located
in and around Austin, Texas.  He is the author of Baldwin’s
Programming Tutorials, which
has gained a worldwide following among experienced and aspiring
programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University
and has many years of experience in the application of computer
technology to real-world problems.


-end-
