Java JAXP, Transforming XML to XHTML
By Richard G. Baldwin
Java Programming Notes # 2210
Preface
In the previous lesson entitled Java
JAXP, Writing Java Code to Emulate an XSLT Transformation, I showed
you how to
write a Java program that mimics an XSLT transformation for converting
an XML file into a text file. I also showed that once you have a
library of Java
methods that
emulate XSLT elements, it is no more difficult to
write a Java program to transform an XML document than it is to
write an XSL stylesheet to transform the same document.
In this lesson, I will show you how to use XSLT to transform an XML
document into an XHTML document. I will also show you how to
write Java code that performs the same transformation.
This lesson is one in a series designed to teach you how to use JAXP
and Sun's Java Web Services Developer
Pack
(JWSDP).
The first lesson in the series was entitled Java
API for XML Processing (JAXP), Getting Started.
As mentioned above, the
previous lesson was entitled Java
JAXP, Writing Java Code to Emulate an XSLT Transformation.
JAXP, XML, XSL, XSLT, W3C, and XHTML, a
Review
JAXP is an
API designed
to help you write programs for creating and processing XML
documents. It is a critical part of Sun's Java Web Services Developer
Pack
(JWSDP).
XML is an acronym for the eXtensible
Markup Language.
I will assume that you already
understand
XML, and will teach you how to use JAXP to write programs for
creating and processing XML documents.
XSL is an acronym for Extensible Stylesheet Language.
XSLT is an acronym for XSL Transformations.
The numerous uses of XSLT include the following:
- Transforming non-XML documents into XML documents.
- Transforming XML documents into other XML documents.
- Transforming XML documents into non-XML documents.
This
lesson explains a Java program
that transforms an XML document into an XHTML document.
An XHTML document is an XML
document that provides a rigorous alternative to the use of an HTML document. According to
the W3C, XHTML 1.0 is a "Reformulation of HTML 4 in XML 1.0."
Viewing tip
You may find it useful to open another copy of this lesson in a
separate browser window. That will make it easier for you to
scroll back and forth among the different listings and figures while
you are reading about them.
Supplementary material
I recommend that you also study the other lessons in my extensive
collection of online Java and XML tutorials. You will find those
lessons
published at Gamelan.com.
As of the date of this writing, Gamelan doesn't maintain a
consolidated index of my tutorial lessons, and sometimes
they are difficult to locate there. You will find a consolidated
index at www.DickBaldwin.com.
A tree structure in memory
A DOM parser can be used to
create a tree structure in memory that represents an XML
document. In Java, that tree structure is encapsulated in an
object of the interface type Document.
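As a minimal sketch of that idea, the following code parses a small XML string with the JAXP DOM parser and reports the name of the root element. The class name DomParseSketch, the method rootName, and the sample XML are illustrative assumptions of mine; they are not taken from the Dom03 program discussed later.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
final class DomParseSketch {

    // Parses the XML text into a DOM tree and returns the root element's name.
    static String rootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // The Document object encapsulates the whole tree in memory.
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML text", e);
        }
    }

    public static void main(String[] args) {
        // A tiny stand-in for a real XML file.
        System.out.println(rootName("<A><Q>A Big Header</Q></A>")); // prints A
    }
}
```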
Many operations are possible
Given an object of type Document (often called a DOM tree), there
are many
methods that
can be invoked on the object to perform a variety of operations.
Two ways to transform an XML document
There are at least two ways to transform the contents of an XML
document into another document:
- By writing Java code to manipulate the DOM tree and perform the
transformation.
- By using XSLT to perform the transformation.
A skeleton library of Java methods
This is one of several lessons that show you
how to write the skeleton of a Java library containing methods that
emulate the most common XSLT elements. Once you have the library,
writing Java code to transform XML documents consists mainly of writing
a short driver program to access those methods. Given the proper
library of methods, it is no more difficult to write a
Java program to perform the transformation than it is to write
an
XSLT stylesheet.
Library is not my primary purpose
However, my primary purpose in these lessons is not to provide such
a library, but rather to help you understand how to use a DOM
tree to create, modify, and manipulate XML documents. By
comparing Java code that manipulates a DOM tree with similar XSLT
operations, you will have an opportunity to learn a little about XSLT
in the process of learning how to manipulate a DOM tree using Java code.
Some Details Regarding XHTML
XHTML documents, a special case
An XHTML document is an XML document. It is a rigorous
alternative to an HTML document.
One of
the interesting
uses of XSLT is the transformation of XML documents into
XHTML documents. This
makes it possible to render the information contained in an XML
document using an XHTML-compatible Web browser.
Where does the transformation take place?
When transforming an XML document for rendering
with an XHTML browser, the transformation can take place anywhere
between the
source of the XML document and the browser.
Transforming on the server
For example, a transformation program can be written in Java and run
on a web
server as a
servlet, or it can be written as a JavaBeans component and accessed
from a scriptlet in JavaServer pages (JSP).
Transforming at the browser
The transformation can also be performed by the browser. For
example, Microsoft IE 6.0 and XSLT can be used for this
purpose.
Will transform XML into XHTML
This and the next several lessons will illustrate parallel Java code
and XSLT transformations to transform XML documents into XHTML
documents. The sample programs will illustrate various aspects of
the manipulation of a DOM tree using Java code.
Requirements for XHTML documents
According to Web Design
& Development Using XHTML by Griffin, Morales, and Finnegan, an
XHTML document differs from an HTML document in the following ways:
- XHTML documents must be well-formed.
- Element and attribute names must be in lower case.
- Non-empty elements require end tags.
- Attribute values must always be quoted.
- XHTML documents have no attribute minimization.
- Empty elements must be explicitly closed (for example, <br />).
- XHTML documents use elements with id and name attributes.
- XHTML documents use Document Type Declarations.
- XHTML documents use XML namespaces.
Although it is not a requirement, an XHTML document often has an XML
declaration at the beginning to identify the document as an XML
document.
Some Details Regarding XSLT
Previous lessons in this series have provided quite a bit of
detailed information regarding the operation of XSLT. Therefore,
this discussion will be brief.
Assume that an XML document has been parsed to produce a DOM
tree
in memory that represents the XML document.
Execute template rules
An XSLT processor starts examining the DOM tree at its root
node. It
obtains instructions from the XSLT stylesheet telling it how to
navigate the
tree, and how to treat each node that it encounters along the way.
As each node is encountered, the processor searches the stylesheet
looking for a template rule that governs how to treat nodes of that
type. If the
processor finds
a template rule that matches the node type, it performs the operations
indicated by the template rule. Otherwise, it
executes a built-in template rule appropriate to that node.
Literal text in template rules
If the template rule being applied
contains literal text, that literal text is used to
create text in the output.
Traversal of the DOM tree
There are at least two XSLT elements that can be used to
traverse the children of a context node:
- xsl:apply-templates
- xsl:for-each
The xsl:apply-templates element
The xsl:apply-templates
element was discussed in detail in previous lessons.
The xsl:for-each element
The xsl:for-each element
iterates over the nodes selected by the XPath expression in its
required select attribute. In this lesson, those are child nodes of the
context node. As each selected
node is visited, it is processed using the XSLT elements that form the
content of the xsl:for-each
element in the template rule.
This lesson will include examples that use the xsl:for-each element in addition to
the xsl:apply-templates
element. The lesson will also explain a Java method that emulates
the xsl:for-each element.
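To give a feel for what such a method might look like, here is a minimal sketch that handles only the simple case used in this lesson, where the select attribute names a child element of the context node. The class name ForEachSketch and the methods forEach and listItems are hypothetical; this is not the method from the Dom03 program.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
final class ForEachSketch {

    // Returns, in document order, the child elements of the context node
    // whose name equals the select value (a simple name, not a full XPath).
    static List<Element> forEach(Node context, String select) {
        List<Element> matches = new ArrayList<>();
        for (Node child = context.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals(select)) {
                matches.add((Element) child);
            }
        }
        return matches;
    }

    // Wraps the text of each selected child of the root element in <li> tags.
    static String listItems(String xml, String select) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            for (Element item : forEach(document.getDocumentElement(), select)) {
                out.append("<li>").append(item.getTextContent()).append("</li>");
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the XML text", e);
        }
    }

    public static void main(String[] args) {
        // prints <li>one</li><li>two</li>
        System.out.println(listItems("<D><E>one</E><F>x</F><E>two</E></D>", "E"));
    }
}
```

Note that the F element is skipped, just as xsl:for-each with select=E would skip it.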
Enough talk, let's see some code
I will begin by discussing the XML file named Dom03.xml (shown in Listing 24 near the end of the
lesson) along with
the XSL
stylesheet file named Dom03.xsl
(shown in Listing 25).
A Java program
named Dom03
After explaining the transformation produced by applying this
stylesheet to this XML document, I will explain the transformation
produced by processing the XML file with a Java program named Dom03 (shown in Listing 23) that mimics
the behavior of the XSLT transformation.
Discussion
and Sample Code
The XML
file named Dom03.xml
The XML file shown in Listing 24 is relatively straightforward. A
tree view of the XML file is shown in Figure 1. (This XML file is both well-formed and
valid.)
#document DOCUMENT_NODE A DOCUMENT_TYPE_NODE #comment COMMENT_NODE xml-stylesheet PROCESSING_INSTRUCTION_NODE A ELEMENT_NODE Q ELEMENT_NODE #text A Big Header B ELEMENT_NODE C ELEMENT_NODE #text Text block 1. R ELEMENT_NODE #text A Mid Header C ELEMENT_NODE #text Text block 2. #comment COMMENT_NODE processor PROCESSING_INSTRUCTION_NODE S ELEMENT_NODE #text A Small Header B ELEMENT_NODE C ELEMENT_NODE #text Text block 3. S ELEMENT_NODE #text Another Small Header B ELEMENT_NODE C ELEMENT_NODE #text Text block 4. T ELEMENT_NODE #text A Smallest Header B ELEMENT_NODE C ELEMENT_NODE #text Text block 5. D ELEMENT_NODE E ELEMENT_NODE #text First list item in E G ELEMENT_NODE #text Nested G text element F ELEMENT_NODE #text First list item in F E ELEMENT_NODE #text Second list item in E F ELEMENT_NODE #text Second list item in F E ELEMENT_NODE #text Third list item in E F ELEMENT_NODE #text Third list item in F C ELEMENT_NODE #text Text block 6. C ELEMENT_NODE #text Text block 7. R ELEMENT_NODE #text Another Mid Header C ELEMENT_NODE #text Text block 8. B ELEMENT_NODE R ELEMENT_NODE #text Another Mid Header in Another B C ELEMENT_NODE #text Text block 9.
Figure 1
(This
tree view of the XML file was
produced using a program named DomTree02, which was discussed in an
earlier lesson.
Note that in order to make the tree view more
meaningful, I manually removed extraneous line breaks and text nodes
associated with those line breaks. The extraneous
line breaks in Figure 1 were caused by extraneous line breaks in the
XML file. The extraneous line breaks in the XML file were placed
there for cosmetic reasons and to force it to fit into this narrow
publication format.)
Content of the XML
document
The structure and content of the XML document were primarily designed to
illustrate various transformation concepts that I intend to explain in
this lesson. However, to some extent, I designed the structure
and
content keeping in mind the ultimate rendering of the XHTML file that
will be produced by transforming the XML file into an XHTML file.
The rendered
XHTML file
At this point, I'm going to jump ahead and show you what the final
XHTML file
looks like when rendered using Netscape Navigator v7.1. The
rendering of the XHTML file is shown in Figure 2.
(You may find it useful to compare the
rendering in Figure 2 with the XML file structure and content in Figure
1. You should be able to identify text nodes in Figure 1 that
match up with rendered text in Figure 2.)
Figure 2 Rendered XHTML file
The XSLT Transformation
The XSL
stylesheet file named Dom03.xsl
Recall that an XSL stylesheet is itself an XML file, and can therefore
be represented as a tree. Figure 3 presents an
abbreviated tree view of the stylesheet shown in Listing 25. I
colored each of the template rules in this view with alternating
colors of red and blue to make them easier to identify.
(As is often the
case with XSL stylesheets, this stylesheet file is well-formed but it
is not
valid.)
NOTE: IT WAS NECESSARY TO MANUALLY ENTER SOME LINE BREAKS IN THIS PRESENTATION TO FORCE IT TO FIT INTO THIS NARROW PUBLICATION FORMAT.
#document DOCUMENT_NODE
  xsl:stylesheet ELEMENT_NODE
    Attribute: version=1.0
    Attribute: xmlns:xsl=http://www.w3.org/1999
      /XSL/Transform
    xsl:output ELEMENT_NODE
      Attribute: method=xml
      Attribute: doctype-public=-//W3C//DTD XHTML 1.0
        Transitional//EN
      Attribute: doctype-system=http://www.w3.
        org/TR/xhtml1/DTD/xhtml1-transitional.dtd
    xsl:template ELEMENT_NODE
      Attribute: match=/
      html ELEMENT_NODE
        head ELEMENT_NODE
          meta ELEMENT_NODE
            Attribute: http-equiv=content-type
            Attribute: content=text/html; charset=UTF-8
          title ELEMENT_NODE
            #text Generated XHTML file
        body ELEMENT_NODE
          table ELEMENT_NODE
            Attribute: border=2
            Attribute: cellspacing=0
            Attribute: cellpadding=0
            Attribute: width=330
            Attribute: bgcolor=#FFFF00
            tr ELEMENT_NODE
              td ELEMENT_NODE
                xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=B
      xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=C
      p ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=D
      #text List of items in E
      ul ELEMENT_NODE
        xsl:for-each ELEMENT_NODE
          Attribute: select=E
          li ELEMENT_NODE
            xsl:apply-templates ELEMENT_NODE
      #text List of items in F
      ol ELEMENT_NODE
        xsl:for-each ELEMENT_NODE
          Attribute: select=F
          li ELEMENT_NODE
            xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=G
      b ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=Q
      h1 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=R
      h2 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=S
      h3 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE
    xsl:template ELEMENT_NODE
      Attribute: match=T
      h4 ELEMENT_NODE
        xsl:apply-templates ELEMENT_NODE
Figure 3
Why abbreviated?
I refer to this as
an abbreviated tree view because I manually deleted comment nodes
and
extraneous text nodes in order to emphasize the important elements in
the stylesheet.
(Extraneous text nodes occur as a result
of inserting line breaks in the original XSL document for cosmetic
purposes.
Note that I also manually entered several line breaks near the
beginning to force the material to fit into
this narrow
publication format.)
The root element
The root node of all XML documents is the document node. In
addition to the root node, there is also a root element, and it is
important not to confuse the two.
As you can see from Figure 3, the root element in the XSL document is
of type xsl:stylesheet.
The root element has two attributes, each of which is standard for XSL
stylesheets.
(Note that I manually entered a line break
in the second attribute of the xsl:stylesheet
node to force it to fit into this narrow publication format. I also
manually entered line breaks into two of the attributes of the xsl:output element node to force
them
to fit into this narrow publication format.)
The first attribute provides
the XSLT
version.
The second attribute points to the XSLT namespace URI, which you can
read about in the W3C
Recommendation.
Children of the
root element node
The root element node (xsl:stylesheet)
in Figure
3 has ten child
nodes, nine of which are template rules. (The green child node is not a template
rule. I will discuss it in detail later.) I colored
the template rules in alternating colors of red and blue to make them
easier to identify
visually.
The template
rules
Each of the nine template rules has a match
pattern. The nine match patterns in the order that they appear in
Figure 3 are as follows:
- match=/ (root node)
- match=B (matches element
node named B)
- match=C (matches element
node named C)
- match=D (matches element
node named D)
- match=G (matches
element node named G)
- match=Q (matches
element node named Q)
- match=R (matches
element node named R)
- match=S (matches
element node named S)
- match=T (matches
element node named T)
I will discuss each of the nine template rules later, but before doing
that
I will show you the raw XHTML output produced
by this XSLT transformation.
(Note
that the Java program discussed later produces essentially the same
output as the XSLT transformation.)
The output from
the transformation
The result of performing an XSLT transformation (by applying the XSL
stylesheet shown in Listing 25 to the XML file shown in Listing 24)
is
shown in Figure 4. This is the raw XHTML code that
was rendered in Figure 2.
I will explain the operations in the XSLT transformation that produced
most of the text in Figure 4.
NOTE THAT IT WAS NECESSARY FOR ME TO MANUALLY INSERT LINE BREAKS IN SEVERAL OF THE LONG LINES IN THIS MATERIAL TO FORCE IT TO FIT INTO THIS NARROW PUBLICATION FORMAT. I ALSO MANUALLY INSERTED LINE BREAKS AT CRITICAL POINTS TO MAKE IT EASIER TO INTERPRET THE MATERIAL VISUALLY.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0" cellpadding="0"
  width="330" bgcolor="#FFFF00"><tr><td>

<h1> A Big Header </h1>
<p> Text block 1. </p>
<h2> A Mid Header </h2>
<p> Text block 2. </p>
<h3> A Small Header </h3>
<p> Text block 3. </p>
<h3> Another Small Header </h3>
<p> Text block 4. </p>
<h4> A Smallest Header </h4>
<p> Text block 5. </p>

List of items in E
<ul>
<li> First list item in E <b> Nested G text element </b> </li>
<li> Second list item in E </li>
<li> Third list item in E </li>
</ul>

List of items in F
<ol>
<li> First list item in F </li>
<li> Second list item in F </li>
<li> Third list item in F </li>
</ol>

<p> Text block 6. </p>
<p> Text block 7. </p>
<h2> Another Mid Header </h2>
<p> Text block 8. </p>
<h2> Another Mid Header in Another B </h2>
<p> Text block 9. </p>

</td></tr></table>
</body></html>
Figure 4
(Note
that I manually deleted a couple of extraneous line breaks from
the output shown in Figure 4. It was also necessary for me to
manually insert line breaks in several of the long lines to force the
material to fit in this narrow publication format. I also
manually inserted line breaks at certain critical points to make it
easier to interpret the material visually.)
Can sometimes
get confusing
I will caution you up front that this discussion can become
confusing but I will do everything that I can to minimize the
confusion. The problem is that the discussion will be mixing
tags, attributes and elements from the XML file with tags, attributes,
and
elements from the stylesheet file and the XHTML file. With so
many tags, attributes, and elements being discussed, it is sometimes
difficult to keep
them separated in your mind.
In particular, in order to cause the output to be a valid XHTML
document, it is necessary to manually insert XHTML tags, attributes,
and elements in the XSL template rules, which themselves involve XML
tags, attributes, and elements.
I will make heavy use of color in an attempt to minimize the confusion.
The first line of
text
The first line of text in the output shown in Figure 4
is an XML declaration
that is produced automatically by the XSLT transformer available with
JAXP. As I mentioned earlier, such a declaration is not
required, but is highly recommended by most authors.
The xsl:output
element
Before getting into the template rules in Figure 3, I need to explain
the xsl:output element shown
in green in Figure 3 and reproduced in Figure 5 below for convenient
viewing.
xsl:output ELEMENT_NODE
  Attribute: method=xml
  Attribute: doctype-public=-//W3C//DTD XHTML 1.0
    Transitional//EN
  Attribute: doctype-system=http://www.w3.
    org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Figure 5
The XSL
stylesheet version
Listing 1 shows the XSL code that corresponds to the tree view of the
stylesheet element shown in Figure 5.
<xsl:output method="xml"
  doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
Listing 1
(As on several previous occasions, I need
to remind you that it was necessary for me to manually insert line
breaks in Listing 1 to cause the material to fit in this narrow
publication format.)
Literal text passes
through to the output
As you learned in the previous lesson, any literal text that you
include in your XSL stylesheet will be passed through to the
output. As you will see later, I will cause the output to contain
much of the required XHTML text simply by including that XHTML text as
literal text in the stylesheet.
The stylesheet
is an XML document
It is important to remember, however, that the XSL stylesheet is itself
an XML document, and you cannot include any literal text that would
cause a parser
to reject it as an XML document. You also cannot do anything that
will cause the XSLT processor to reject it as a stylesheet.
XHTML document
requires a specific DTD reference
One of the things that is required in the XHTML output is the DTD
reference
shown in Figure 6.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Figure 6
(The
material in Figure 6 was extracted from Figure 4 and reproduced here
for convenient viewing. This is one of three alternative DTDs
that can be used with an XHTML document.)
Correct DTD for
XHTML but not for stylesheet
The DTD reference in Figure 6 is a correct DTD reference for an XHTML
document, but it is not a correct DTD reference for an XSL
stylesheet. (In fact,
stylesheets don't require a DTD and often don't have one.)
If you simply include the text from Figure 6 as literal text in the
stylesheet (in hopes that it will
pass through to the output), the XSLT processor will interpret
it as a DTD reference for the stylesheet, and will attempt to validate
the stylesheet against that reference. The stylesheet will then
be declared invalid and the transformation effort will fail.
Therefore, you must find a way to cause this DTD reference to end up in
the XHTML document without confusing the XSLT transformation process.
Two ways to
accomplish that
I know of two ways to accomplish that objective. One way is to
include the text from Figure 6 in a CDATA section in the
stylesheet. This
raises some other issues, but it can be made to work.
The easier way is to use the xsl:output
element shown in Listing 1 to cause the DTD reference to be written
into the output without confusing the parser or the XSLT processor.
The xsl:output
element
Here is a partial quotation from XML in a Nutshell (which I highly recommend) by
Elliotte Rusty Harold and
W. Scott Means.
"The
top-level xsl:output element helps determine the exact formatting of
the XML document produced when the result tree is stored in a file,
written onto a stream, or otherwise serialized into a sequence of
bytes."
Ten optional
attributes
To make a long story short, this element has ten optional attributes
that are used by the XSLT processor to determine the formatting of the
output. The XSLT element shown in Listing 1 specifies values for
three of those optional attributes:
- method
- doctype-public
- doctype-system
When the method attribute is omitted, the processor
normally assumes xml. However, XSLT 1.0 switches the default to html
when the first element in the result is an html element with no
namespace, which is the case for this stylesheet. Specifying xml
explicitly (as in Listing 1) therefore guarantees
that the processor will serialize the result as a well-formed XML document.
The doctype-public attribute
sets the public identifier used in the document type declaration.
The doctype-system attribute
sets the system identifier used in the document type declaration.
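The same three settings are also available from Java code through the output properties of a JAXP Transformer object. The following minimal sketch uses an identity transformation (no stylesheet) to write a small document with the XHTML 1.0 Transitional document type declaration. The class name DoctypeSketch, the method serialize, and the sample markup are illustrative assumptions of mine.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, used only for illustration.
final class DoctypeSketch {

    // Copies the input markup to a string, adding a document type declaration.
    static String serialize(String markup) {
        try {
            // With no stylesheet argument, the transformer simply copies its input.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // These properties correspond to the xsl:output attributes
            // method, doctype-public, and doctype-system.
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
                    "-//W3C//DTD XHTML 1.0 Transitional//EN");
            transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(markup)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(
                "<html><head><title>Generated XHTML file</title></head><body/></html>"));
    }
}
```

The exact whitespace in the output depends on the transformer implementation, but the document type declaration should appear ahead of the html start tag.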
The required
XHTML DTD
There are three allowable DTDs that can be used for an XHTML document:
- Strict
- Transitional
- Frameset
I'm not going to get into the differences between these three
DTDs in this lesson. Suffice it to say that I elected to use the
transitional
DTD for this example because it is somewhat easier to use than the
other two.
The
transitional DTD
Here is what the W3C has to say about the DTD for XHTML 1.0 Transitional:
This DTD module is identified by the
following PUBLIC and SYSTEM identifiers:
PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
SYSTEM
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
As you can see, these values match the doctype-public
and doctype-system attribute
values in Listing 1, and result in the correct output for the
XHTML DTD in Figure 6.
The first
template rule
The first template rule (extracted
from Figure 3 and given a different color scheme) is shown in
tree view in Figure 7. This
template rule contains an XPath expression that matches the document
root (note the forward slash).
xsl:template ELEMENT_NODE
  Attribute: match=/
  html ELEMENT_NODE
    head ELEMENT_NODE
      meta ELEMENT_NODE
        Attribute: http-equiv=content-type
        Attribute: content=text/html; charset=UTF-8
      title ELEMENT_NODE
        #text Generated XHTML file
    body ELEMENT_NODE
      table ELEMENT_NODE
        Attribute: border=2
        Attribute: cellspacing=0
        Attribute: cellpadding=0
        Attribute: width=330
        Attribute: bgcolor=#FFFF00
        tr ELEMENT_NODE
          td ELEMENT_NODE
            xsl:apply-templates ELEMENT_NODE
Figure 7
The template rule in
XSL format
Listing
2 shows the same template rule in
XSL format, (extracted from Listing
25).
<xsl:template match="/">
  <html>
    <head>
      <meta http-equiv="content-type"
            content="text/html; charset=UTF-8"/>
      <title>Generated XHTML file</title>
    </head>
    <body>
      <table border="2" cellspacing="0"
             cellpadding="0" width="330"
             bgcolor="#FFFF00" >
        <tr>
          <td>
            <xsl:apply-templates/>
          </td>
        </tr>
      </table>
    </body>
  </html>
</xsl:template>
Listing 2
(Note
that according to most of the books that I have read, the following
namespace attribute should be used on the html
tag. However, something about it causes problems with the JAXP
transformer so
I left it off. The resulting XHTML file is still valid according
to the W3C Markup
Validation Service even without the namespace attribute.
xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en")
The literal text is shown in red
From my viewpoint as the author of the stylesheet, everything that is
colored red in Listing 2 is simply literal text that I want to pass
through to the output so that it will become part of the raw XHTML text.
The template rule
must be well-formed
However, as you can see from Figure 7, the XML parser considers all of
this material to be well-formed (but
not valid) XML element nodes, attribute nodes, and text
nodes. Were I to make a change to any of the red literal text
that would corrupt the well-formed nature of the XML code in Listing 2,
the
stylesheet could not be used to control an XSLT transformation.
While a stylesheet is not required to be valid, it is required to be
well-formed.
Must be very
careful when including markup in stylesheet
Therefore, you must be very careful when you include literal markup
text in the stylesheet for whatever purpose. Any markup that you
include in the stylesheet must result in the stylesheet being
well-formed.
(This was not a problem with the inclusion
of literal text in the stylesheet in the previous lesson, because the
literal text didn't contain markup characters. As a result, the
literal text was interpreted simply as text nodes in the
stylesheet. As you can see from Figure 7, however, the literal
markup text that was included in this stylesheet was interpreted by the
parser as element nodes, attributes and text nodes.)
A very simple
template rule.
At first blush, this template rule appears to be very long and very
complex. However, as you can see from Listing 2, once you isolate
out all of the literal XHTML text that's included in the template rule,
the actual XSLT template rule is very simple. This rule simply
passes a lot of literal markup text through to the output and causes
templates
to be applied to all children of the root (document) node. (You learned what it means to apply
templates in
the previous lesson.)
The XHTML tags
If you are familiar with XHTML syntax, you will recognize that the
literal text shown in red in Listing 2 begins with typical XHTML tags
such as <html>, <head>, and <body>. These
tags are required for an XHTML document. This text is sent to the
output before any processing of the DOM tree is performed.
Then the literal text creates an XHTML table with a yellow
background. The start tags for the table are sent to the output
before the xsl:apply-templates
element is executed.
All of the output produced by executing the xsl:apply-templates element is
inserted into a single data <td> cell in the table.
Finally, when the xsl:apply-templates
element returns, the end tags for the table and the end tags for the
document are sent to the output.
The raw XHTML
output
Figure 8 shows a condensed version of the raw XHTML output. The
XHTML output shown in red in Figure 8 matches the literal text shown in
red in the template rule of Listing 2.
NOTE THAT IT WAS NECESSARY FOR ME TO MANUALLY INSERT LINE BREAKS IN SEVERAL OF THE LONG LINES IN THIS MATERIAL TO FORCE IT TO FIT INTO THIS NARROW PUBLICATION FORMAT.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<title>Generated XHTML file</title>
</head>
<body>
<table border="2" cellspacing="0" cellpadding="0"
  width="330" bgcolor="#FFFF00"><tr><td>

...HTML CODE DELETED FOR BREVITY...

</td></tr></table>
</body></html>
Figure 8
The effect of
xsl:apply-templates
Referring once again to Listing 2, we see that this template rule
causes templates to be applied to all child nodes of
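the root or document node. A root node can have only one element
child, which is the root element node. (As Figure 1 shows, it can also
have comment and processing-instruction children, for which the built-in
template rules produce no output.) Referring back to Figure 1,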
we see that the root element node is named A.
Now referring back to the tree view of the stylesheet in Figure 3 (and also the list of match patterns
presented earlier), we see that the stylesheet doesn't contain a
template rule that matches an element named A.
Important to
understand built-in behavior
If the processor encounters a node for which there is no matching
template rule, it executes a built-in template rule for that
type of node. This is where it becomes important to understand
the behavior of the built-in template rules, which I explained in the
earlier lesson entitled Java
JAXP, Implementing Default XSLT Behavior in Java.
The behavior of the built-in template rule for element nodes is to
apply templates to all child nodes of the element node.
Therefore, in this case, the processor will apply templates to all
child nodes of the root element node named A.
Referring back to Figure 1, we see that the root element node has three
child nodes, which occur in the following order: Q, B, and
B.
Therefore, the first node that will be processed is the node named Q.
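The built-in behavior described above can be sketched in a few lines of Java. In this sketch, elements and the document node pass control to their children, text nodes contribute their values to the output, and comments and processing instructions contribute nothing. The class name BuiltInRulesSketch and its methods are hypothetical, and the sketch assumes a stylesheet with no template rules at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
final class BuiltInRulesSketch {

    // Emulates the built-in template rules for a single node.
    static void applyTemplates(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.ELEMENT_NODE:
                // Built-in rule for root and element nodes: process every child.
                for (Node child = node.getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    applyTemplates(child, out);
                }
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                // Built-in rule for text: copy the value to the output.
                out.append(node.getNodeValue());
                break;
            default:
                // Comments, processing instructions, and the like produce nothing.
                break;
        }
    }

    // Parses the XML text and applies the built-in rules from the document node down.
    static String process(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            applyTemplates(document, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the XML text", e);
        }
    }

    public static void main(String[] args) {
        // prints HeaderText
        System.out.println(process("<A><!-- c --><Q>Header</Q><B><C>Text</C></B></A>"));
    }
}
```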
A template rule
that matches Q
Figure 9 and Listing 3 show a template rule that matches an element
named Q.
xsl:template ELEMENT_NODE
  Attribute: match=Q
  h1 ELEMENT_NODE
    xsl:apply-templates ELEMENT_NODE
Figure 9
The tree view of the template rule is shown in Figure 9. The XSL
stylesheet code is shown in Listing 3.
<xsl:template match="Q">
  <h1>
    <xsl:apply-templates />
  </h1>
</xsl:template>
Listing 3
A level 1
header in the output
This template rule sends the start and end tags for a level 1 XHTML
header to the output, and inserts something between those tags by
applying templates to all child nodes of the element node named Q.
Referring back to the element node named Q in Figure 1, we see that it has
only one child node, and that node is a text node. Executing the xsl:apply-templates element on a
text node causes the built-in version of the template rule to be
applied. The built-in version gets the value of the text node and
sends it to the output. This produces the raw XHTML output shown
in Figure 10.
<h1> A Big Header </h1>
Figure 10
You should be able to easily identify the header from Figure 10 in the
first line of the rendered output in Figure 2.
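If you would like to experiment with this one template rule in isolation, the following minimal sketch runs it through the JAXP transformer against a two-element XML document. The class name TemplateQSketch, the cut-down stylesheet held in a string, and the sample XML are assumptions of mine; the stylesheet contains only the rule that matches Q, plus an xsl:output element that suppresses the XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, used only for illustration.
final class TemplateQSketch {

    // A cut-down stylesheet containing only the template rule that matches Q.
    private static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"Q\">"
            + "<h1><xsl:apply-templates/></h1>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The root node and element A have no matching rule, so the built-in
        // rules pass control down to Q, where the rule above takes over.
        System.out.println(transform("<A><Q>A Big Header</Q></A>"));
    }
}
```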
A template rule
that matches B
That takes care of processing the root element node's child named Q. The next child to be
processed is a child node named B.
A template rule that matches an element node named B is shown in Figure 11 and Listing
4.