
Learning XML: Trees, Nodes, and Templates, Part II


Preface

I have authored numerous online articles on XML.  These articles cover the waterfront from introductory topics to advanced topics. I maintain a consolidated index of hyperlinks to all of my XML articles at my personal website so that you can access earlier articles from there.

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings while you are reading about them. 

As of this writing, to my knowledge, Microsoft IE5 is the only widely used web browser that has the ability to render XML documents.  IE5 can render XML documents using either CSS or XSL. This is one in a series of articles that discuss the use of XSL for the rendering of XML documents, with particular emphasis on the use of IE5 for that purpose.

Introduction

In Part I of this series on Trees, Nodes, and Templates, I showed you how to manually convert an XML document into a tree representation.  You can view that tree in Listing 1.
 

A(root)
+-Q(big header)(text)
+-B(block)
| +-C(paragraph)(text)
| +-R(mid-size header)(text)
| +-C(paragraph)(text)
| +-S(small header)(text)
| +-B(block)
| | +-C(paragraph)(text)
| |
| +-S(small header)(text)
| +-B(block)
| | +-C(paragraph)(text)
| | +-T(smallest header)(text)
| | +-B(block)
| | | +-C(paragraph)(text)
| | | +-D(list)
| | | | +-E(list item)(text)
| | | | +-E(list item)(text)
| | | | +-E(list item)(text)
| | | |
| | | +-C(paragraph)(text)
| | |
| | +-C(paragraph)(text)
| |
| +-R(mid-size header)(text)
| +-C(paragraph)(text)
|
+-B(block)
  +-R(mid-size header)(text)
  +-C(paragraph)(text)

Listing 1
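To make the manual conversion a little more concrete, here is a minimal Python sketch that walks a trimmed excerpt of XSL007.xml and prints one line per element, in roughly the style of Listing 1. It is not part of the lesson's XSLT tooling. It assumes only the standard library's xml.etree.ElementTree, and the excerpt and the tree_lines helper are illustrative. An element is marked (text) only when it directly contains non-whitespace text.

```python
import xml.etree.ElementTree as ET

# A trimmed excerpt of XSL007.xml (illustrative; most nested blocks omitted).
XML = """<A>
<Q>A Big Header</Q>
<B>
<C>Level 0.  This is the beginning of a B.</C>
<R>A Mid Header</R>
</B>
<B>
<R>Another Mid Header in Another B</R>
<C>Level 0.  This text is in a completely new B.</C>
</B>
</A>"""

def tree_lines(elem, depth=0):
    """Return one line per element, indented by depth, in a Listing 1-like style."""
    prefix = "" if depth == 0 else "  " * (depth - 1) + "+-"
    # Mark elements whose own text (before the first child) is not just whitespace.
    marker = "(text)" if elem.text and elem.text.strip() else ""
    lines = [prefix + elem.tag + marker]
    for child in elem:  # children are visited in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines

if __name__ == "__main__":
    print("\n".join(tree_lines(ET.fromstring(XML))))
```

The helper recurses once per child element, so the printed indentation mirrors the nesting of the B blocks in the XML.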

I also promised to explain how XSLT can be used to transform that XML tree into an HTML tree.  I showed you such an HTML tree as Listing 2, but I didn’t show you how the transformation was accomplished.  That is the purpose of this lesson.
 

HTML
 +-BODY
   +-table(with attributes)
     +-tr
       +-td
         +-H1(text)
         +-P(text)
         +-h2(text)
         +-P(text)
         +-h3(text)
         +-P(text)
         +-h3(text)
         +-P(text)
         +-h4(text)
         +-P(text)
         +-UL
         | +-LI(text)
         | +-LI(text)
         | +-LI(text)
         |
         +-P(text)
         +-P(text)
         +-h2(text)
         +-P(text)
         +-h2(text)
         +-P(text)

Listing 2

In this lesson (and the next), I will develop the XSLT code that could be used to convert the XML tree shown in Listing 1 into the HTML tree shown in Listing 2. The standard rendering of the HTML represented by Listing 2 can be viewed in Figure 1 which follows:
 

A Big Header

Level 0. This is the beginning of a B. This text is in the Introduction section.

A Mid Header

Level 0. This is a continuation of the same B. This text is in the Technical Details section. This section contains two smallHeaders, each of which is followed by a nested B.

A Small Header

Level 1. This is the beginning of a nested B. This block of text is nested inside of a larger overall block of text. This is also the end of this B.

Another Small Header

Level 1. This is the beginning of another nested B at the same level as the previous one. Another B is nested inside of this one.

A Smallest Header

Level 2. This is the beginning of another nested B that is inside of the previous one. This block of text is nested another level down. This B also contains a list.

  • First list item
  • Second list item
  • Third list item

Level 2. Still inside of, but ending the innermost B.

Level 1. Still inside of, but ending the middle B.

Another Mid Header

Level 0. This block of text is back out at the top level of the outermost B.

Another Mid Header in Another B

Level 0. This text is in a completely new B that is at the top B level.
 

Figure 1

The XML File

The XML file is named XSL007.xml.  I discussed the XML code in Part I of this series on Trees, Nodes, and Templates.  Since my purpose in this lesson is to discuss the process in terms of trees, I won’t discuss the raw XML code further.  The code is provided, however, near the end of this lesson.

A Sample Program

If you load XSL007.xml into IE5, you should see something similar to Figure 1 above. Alternatively, you can load the XML file and the XSLT file into the XSL Debugger discussed in an earlier lesson and get essentially the same result.

This example is based on XSLT information available at the W3C and on IE5 information available at Microsoft. I will discuss the XSLT code in fragments.  You can view a complete listing of the XSLT code in the complete listings near the end of this lesson.

The XSLT File

This XSLT file is named XSL007.xsl.  Although this XSLT file differs significantly from the XSLT files discussed in earlier lessons, its beginning looks very similar to previous examples. The XSLT code in the following fragment takes care of some preliminary requirements that have been discussed in earlier lessons.  It then defines a template to match the root node.

<?xml version='1.0'?>

<xsl:stylesheet 
xmlns:xsl="http://www.w3.org/TR/WD-xsl">

<xsl:template match="/">
<HTML>
<BODY>
<table BORDER="2" CELLSPACING="0" 
    CELLPADDING="0" WIDTH="330" 
    BGCOLOR="#FFFF00" >
<tr>
<td>

The key statement in this fragment is the template element <xsl:template match="/">.  I will discuss it in more detail later.

After defining a template to match the root node, the fragment provides the literal text required to create an HTML table with a yellow background. Unlike in earlier lessons, I won’t do anything special with the table (such as inserting XML data into different rows). Rather, I am simply using the table as a container for the output HTML rendering.  I need to control the width of the rendering on the browser screen and this is an easy way to do that. All of the HTML output will be placed in a single cell in this table.

Now back to the <xsl:template match="/"> statement.  You might interpret the behavior of that statement something like this:
 

Find the root node and use the information contained therein to insert text into the output stream based on the contents and behavior of this template.

The contents of the template that matches the root node consist mainly of literal text that is copied into the HTML output, plus one other very important thing, as shown in the following fragment:

<xsl:apply-templates select="A/*" />

Can you see the asterisk? Just in case you are having difficulty reading it in your browser, there is an asterisk (*) immediately ahead of the final quotation mark. What does this processing element do? I will first try to explain it in my own words, and then will provide some backup from the W3C.  But first, let’s have a reality check.

Although we like to talk about trees, the reality is that we aren’t producing a tree.  We are simply producing a text stream. If the text stream has certain characteristics, it can be thought of as a tree, but it is simply a stream of text. I’m not certain that I can define all of the characteristics that would be necessary to make it possible to represent the text stream as a tree, but here is one way to look at it. 

Each complete element in the text stream can be represented by a node in a tree. A complete element consists of a start tag (which may contain attributes), an end tag, and optional content between the tags.  Let’s call this the current node just to give it a name. The content of the current node can consist of text, other elements, or both. If the content includes text, that text constitutes a special text node that is a child of the current node. If the content includes other elements, each of those elements is a new node that is a child of the current node. Each of those elements can in turn contain text and other elements, so each of them can have child nodes of its own, and so on to any depth.
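In this model, text is a node of its own rather than a property of its parent element. Python's standard xml.dom.minidom module happens to expose text the same way, so a short sketch can list every element node and text node together with its depth. The walk helper and the sample fragment are hypothetical, chosen only to illustrate the parent/child relationships described above.

```python
from xml.dom import minidom

def walk(node, depth=0):
    """Yield (depth, description) for every element and text node, depth-first."""
    if node.nodeType == node.ELEMENT_NODE:
        yield depth, "element " + node.tagName
    elif node.nodeType == node.TEXT_NODE:
        yield depth, "text " + repr(node.data)
    for child in node.childNodes:  # child elements and text nodes, in order
        yield from walk(child, depth + 1)

# Hypothetical fragment shaped like part of XSL007.xml. There is no whitespace
# between tags, so no whitespace-only text nodes appear.
doc = minidom.parseString("<B><C>Some text</C><D><E>First</E></D></B>")

if __name__ == "__main__":
    for depth, description in walk(doc.documentElement):
        print("  " * depth + description)
```

Each text node shows up one level below the element that contains it, which is exactly the "special text node that is a child of the current node".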

What if an end tag is missing from the text stream? I would say that in that case, the text stream couldn’t be represented as a tree. This is a common situation in HTML.  For example, when you write HTML that contains a <BR> tag, you ordinarily don’t provide a </BR> tag.  (In fact, HTML 4 defines BR as an empty element whose end tag is forbidden.) XML would call this an empty element, and would say that it should be written as <BR/>.  IE5 seems to be willing to accept this, but I don’t know what other browsers might do with such a tag. However, this does lead me to expand my previous description of a node to include empty elements written with the <BR/> syntax as an alternative to having both a start tag and an end tag. A node that represents an empty element doesn’t have any child nodes (but it can have attributes). 

As another example, many people who write HTML commonly use <P> without a corresponding </P> and HTML browsers are normally willing to accept that also.  However, I would say that such an HTML document could not be represented as a tree. 

All of this leads me to say that a document that is well-formed can be represented as a tree (but it is still nothing more than a stream of text characters in a particular arrangement). So, what is there to guarantee that XSLT will produce an output text stream that can be represented as a tree? 

Interestingly enough, the guarantee lies in the fact that the XSLT file itself must be well-formed. For example, if I include <BR> in the XSLT file without also including </BR>, the resulting XSLT file will not be well-formed.  The IE5 XML processor will reject it.  Note that IE5 will accept either of the following:

  • <BR/>
  • <BR></BR>
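The same well-formedness rule can be checked outside IE5. The following Python sketch is a stand-in, not the IE5 processor: it assumes the standard library's expat-based xml.etree.ElementTree parser, and the is_well_formed helper is hypothetical. It shows that <BR> without </BR> is rejected, that <BR/> and <BR></BR> are both accepted, and that an empty element can carry attributes but has no children.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    for sample in ("<td>one<BR>two</td>",        # BR is never closed
                   "<td>one<BR/>two</td>",       # empty-element syntax
                   "<td>one<BR></BR>two</td>"):  # explicit end tag
        print(sample, is_well_formed(sample))
    # An empty element has no child nodes, but it can still carry attributes.
    br = ET.fromstring('<BR clear="all"/>')
    print(len(br), br.attrib)
```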

I will repeat the processing element from the previous fragment for convenience:

<xsl:apply-templates select="A/*" />

The asterisk is a wild-card character when used in this sense. Here is what this fragment means.
 

Go find each of the element nodes that are children of the element named A

In this example XML file, there are three such children named Q, B, and B (see Listing 1). For each of the child nodes, the transformation will check to see if there is a matching template defined in this XSLT file. If so, the information contained in the node, along with the instructions in the matching template, will be used to create and to insert some text at this point in the output stream.
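A rough Python analogue of this selection shows which nodes select="A/*" picks out. It assumes the standard library's xml.etree.ElementTree and its limited XPath support, and uses a hypothetical, trimmed version of XSL007.xml. Because ElementTree hands back the top-level element A rather than a separate root (document) node, the sketch applies "*" to A directly; that translation is an approximation, not how an XSLT processor works internally.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical version of XSL007.xml: only the top-level layout is kept.
XML = "<A><Q>A Big Header</Q><B><C>Text</C></B><B><R>Header</R><C>Text</C></B></A>"

root = ET.fromstring(XML)  # ElementTree returns the element A itself

# ElementTree has no separate document node, so "*" evaluated on A plays the
# role of select="A/*" evaluated at the root node: A's element children, in
# document order, and nothing deeper (the nested C and R are not selected).
selected = [child.tag for child in root.findall("*")]

if __name__ == "__main__":
    print(selected)
```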

Note that this is a recursive process.  When the processor looks at one of the child nodes and finds a matching template, that template may also contain a processing element that reads <xsl:apply-templates select=…>. That means:

  • Go look for nodes matching the select criteria,
  • Check to see if they have matching templates, and if so,
  • Process those nodes with those templates.

As is always the case in recursion, this means that the system may go off and begin processing another template before completing the processing of the current template.  Only after the processing of the new template has been completed will control return to the processing of the current template. This process will continue until all of the specified nodes for which there are matching templates have been processed.
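The ordering described above, where a nested template finishes before its caller resumes, is ordinary depth-first recursion. The Python sketch below is a hypothetical model rather than XSLT. Each call to trace_templates stands in for one matched template, and the log records when that template starts and when it ends. The sample XML is loosely shaped like XSL007.xml.

```python
import xml.etree.ElementTree as ET

def trace_templates(elem, log, depth=0):
    """Record when the 'template' for elem starts and when it finishes.

    Each child is processed by a recursive call that must return before
    processing of the parent resumes, mirroring <xsl:apply-templates/>."""
    indent = "  " * depth
    log.append(indent + "start " + elem.tag)
    for child in elem:  # element children only, in document order
        trace_templates(child, log, depth + 1)
    log.append(indent + "end " + elem.tag)
    return log

# Hypothetical nested blocks, loosely shaped like XSL007.xml.
XML = "<B><C>text</C><B><C>nested</C></B><R>after</R></B>"

if __name__ == "__main__":
    print("\n".join(trace_templates(ET.fromstring(XML), [])))
```

In the log, the R element is not started until the nested B, and everything inside it, has ended.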

Now I am going to repeat the previous fragment one more time, except this time I am going to include the statements immediately ahead of and immediately following it.

<td>
<xsl:apply-templates select="A/*" />
</td>

Note that the xsl:apply-templates processing element appears between the start and end tags for a cell (table data) in an HTML table. This means that all of the output text produced by processing this statement, including all of the text produced by recursively processing all of the statements that result from processing this statement, will appear in the output stream between these two HTML tags.  This means, in turn, that all of the output data will appear in a single cell in the HTML table being produced by this transformation. 
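As a sketch under stated assumptions rather than a reproduction of what IE5 does, here is a small Python model of the whole transformation. The WRAPPERS table is a hypothetical stand-in for the templates in XSL007.xsl. It treats E as if it had its own template instead of using xsl:for-each, it strips whitespace from text for brevity, and it ignores elements that have no entry. What it does show is the point above: everything produced by the recursion ends up between <td> and </td>.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the templates in XSL007.xsl: element name -> HTML tag
# the template wraps around its processed content. None means "no wrapper"
# (like the B template, which only applies templates to its children).
WRAPPERS = {"Q": "h1", "R": "h2", "S": "h3", "T": "h4",
            "C": "P", "B": None, "D": "UL", "E": "LI"}

def apply_templates(elem):
    """Process elem's text and child elements in document order."""
    out = []
    if elem.text and elem.text.strip():
        out.append(elem.text.strip())  # simplification: whitespace is stripped
    for child in elem:
        out.append(process(child))
        if child.tail and child.tail.strip():
            out.append(child.tail.strip())
    return "".join(out)

def process(elem):
    """Run the 'template' that matches elem; the recursion happens here."""
    if elem.tag not in WRAPPERS:
        return ""  # this sketch simply ignores elements with no template
    tag = WRAPPERS[elem.tag]
    inner = apply_templates(elem)
    return inner if tag is None else f"<{tag}>{inner}</{tag}>"

def root_template(root):
    """Like the match="/" template: all output goes inside a single table cell."""
    body = "".join(process(child) for child in root)  # like select="A/*"
    return f"<HTML><BODY><table><tr><td>{body}</td></tr></table></BODY></HTML>"

XML = ("<A><Q>Big</Q><B><C>Para</C><S>Small</S>"
       "<B><C>Nested</C><D><E>one</E><E>two</E></D></B></B></A>")

if __name__ == "__main__":
    print(root_template(ET.fromstring(XML)))
```

Keeping the element-to-tag mapping in one table makes the recursive part (process calling apply_templates calling process) easy to follow, at the cost of not modeling XSLT features such as xsl:for-each.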

The above statements are very important.  You should make sure that you understand them. 

What the W3C has to say about this processing element.
 

<xsl:apply-templates
    select = node-set-expression 
    …
</xsl:apply-templates> 

A select attribute can be used to process nodes selected by an expression instead of processing all children. The value of the select attribute is an expression. The expression must evaluate to a node-set. The selected set of nodes is processed in document order, unless a sorting specification is present.

In the absence of a select attribute, the xsl:apply-templates instruction processes all of the children of the current node, including text nodes. 

Multiple xsl:apply-templates elements can be used within a single template to do simple reordering.

Typically, xsl:apply-templates is used to process only nodes that are descendants of the current node. Such use of xsl:apply-templates cannot result in non-terminating processing loops. However, when xsl:apply-templates is used to process elements that are not descendants of the current node, the possibility arises of non-terminating loops.
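The difference between omitting select and using select="*" comes down to text nodes. As a rough illustration, Python's xml.dom.minidom stands in for the XSLT processor here, and the P/EM fragment is made up for the example. The first list corresponds to all of the children of the current node, text nodes included, and the second to the element children only.

```python
from xml.dom import minidom

# Hypothetical mixed-content element: text, a child element, then more text.
doc = minidom.parseString("<P>Some <EM>emphasized</EM> text</P>")
current = doc.documentElement

# No select attribute: every child node, text nodes included, in document order.
all_children = [node.nodeName for node in current.childNodes]

# select="*": element children only; the text nodes are skipped.
element_children = [node.nodeName for node in current.childNodes
                    if node.nodeType == node.ELEMENT_NODE]

if __name__ == "__main__":
    print(all_children)
    print(element_children)
```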

I will show you some examples later that use the xsl:apply-templates instruction without a select attribute to process all of the children of the current node. Note, however, that this doesn’t seem to work in IE5 when the current node was selected by matching the root node. Therefore, I used the select attribute with the wild-card character in this example.

The remainder of the template that matches and processes the root node is shown in the following listing:

</tr>
</table>

</BODY>
</HTML>
</xsl:template>

As you can see, all this fragment does is:

  • Provide the end tags for the HTML elements being inserted into the output stream, and
  • Provide the end tag for the template itself.

I like to hold the size of these lessons down to reasonably digestible chunks.  There is a lot more to be discussed on this topic, so I will let that be the end of this lesson. I will pick up at this point in Part III of this series on Trees, Nodes, and Templates.

Complete Program Listings

A listing of the XML file (XSL007.xml) is shown in the following listing:

<?xml version="1.0"?>

<!-- File XSL007.xml
Copyright 2000 R. G. Baldwin
Illustrates recursive transformation using templates.
Works with IE5.0
-->

<?xml-stylesheet type="text/xsl" href="XSL007.xsl"?>
 

<A>

<Q>A Big Header</Q>

<B>
<C>Level 0.  This is the beginning of a B. 
This text is in the Introduction section.</C>

<R>A Mid Header</R>

<C>Level 0.  This is a continuation of the same B. 
This text is in the Technical Details section. This 
section contains two smallHeaders, each of which is 
followed by a nested B.</C>

<S>A Small Header</S>
<B>
<C>Level 1.  This is the beginning of a nested B. 
This block of text is nested inside of a larger overall 
block of text. This is also the end of this B.</C>
</B>

<S>Another Small Header</S>
<B>
<C>Level 1.  This is the beginning of another nested B
at the same level as the previous one.
Another B is nested inside of this one.</C>

<T>A Smallest Header</T>
<B>
<C>Level 2.  This is the beginning of another nested B
that is inside of the previous one.  This block of text is nested 
another level down.  This B also contains a list.</C>

<D>
<E>First list item</E>
<E>Second list item</E>
<E>Third list item</E>
</D>

<C>Level 2.  Still inside of, but ending the innermost B.</C>
</B>
<C>Level 1.  Still inside of, but ending the middle B.</C>
</B>
 

<R>Another Mid Header</R>
<C>Level 0.  This block of text is back out at the top level of 
the outermost B.</C>
</B>

<B>
<R>Another Mid Header in Another B</R>
<C>Level 0.  This text is in a completely new B that is at the
top B level.</C>
</B>

</A>

A listing of the XSLT file (XSL007.xsl) is shown in the following code listing:

<?xml version='1.0'?>
<!-- File XSL007.xsl
Copyright 2000 R. G. Baldwin
Illustrates recursive transformation using templates.

Works with IE5.0
-->
<xsl:stylesheet 
xmlns:xsl="http://www.w3.org/TR/WD-xsl">

<xsl:template match="/">
<HTML>
<BODY>
<table BORDER="2" CELLSPACING="0" 
    CELLPADDING="0" WIDTH="330" 
    BGCOLOR="#FFFF00" >
<tr>
<td>
<xsl:apply-templates select="A/*" />
</td>
</tr>
</table>

</BODY>
</HTML>
</xsl:template>
<!-- End root match template -->

<!-- Simulate built-in text template -->
<xsl:template match="text()">
<xsl:value-of select="."/>
</xsl:template>
<!-- End text match template -->

<xsl:template match="B">
<xsl:apply-templates /> 
</xsl:template>
<!-- End B match template -->

<xsl:template match="C">
<P>
<xsl:apply-templates /> 
</P>
</xsl:template>
<!-- End C match template -->

<xsl:template match="D">
<UL>
<!-- loop -->
<xsl:for-each select="E">
<LI>
<xsl:apply-templates /> 
</LI>
</xsl:for-each>
<!-- End loop -->
</UL>
</xsl:template>
<!-- End D match template -->

<!-- Header templates follow -->
<xsl:template match="Q">
<h1>
<xsl:apply-templates /> 
</h1>
</xsl:template>
<!-- End Q match template -->

<xsl:template match="R">
<h2>
<xsl:apply-templates /> 
</h2>
</xsl:template>
<!-- End R match template -->

<xsl:template match="S">
<h3>
<xsl:apply-templates /> 
</h3>
</xsl:template>
<!-- End S match template -->

<xsl:template match="T">
<h4>
<xsl:apply-templates /> 
</h4>
</xsl:template>
<!-- End T match template -->

</xsl:stylesheet>

A listing of the HTML produced by applying this transform to this XML file is shown in the following listing:

<HTML><BODY><table BORDER="2" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#FFFF00"><tr><td><h1>A Big Header</h1><P>Level 0.  This is the beginning of a B. 
This text is in the Introduction section.</P><h2>A Mid Header</h2><P>Level 0.  This is a continuation of the same B. 
This text is in the Technical Details section. This 
section contains two smallHeaders, each of which is 
followed by a nested B.</P><h3>A Small Header</h3><P>Level 1.  This is the beginning of a nested B. 
This block of text is nested inside of a larger overall 
block of text. This is also the end of this B.</P><h3>Another Small Header</h3><P>Level 1.  This is the beginning of another nested B
at the same level as the previous one. 
Another B is nested inside of this one.</P><h4>A Smallest Header</h4><P>Level 2.  This is the beginning of another nested B
that is inside of the previous one.  This block of text is nested 
another level down.  This B also contains a list.</P><UL><LI>First list item</LI><LI>Second list item</LI><LI>Third list item</LI></UL><P>Level 2.  Still inside of, but ending the innermost B.</P><P>Level 1.  Still inside of, but ending the middle B.</P><h2>Another Mid Header</h2><P>Level 0.  This block of text is back out at the top level of 
the outermost B.</P><h2>Another Mid Header in Another B</h2><P>Level 0.  This text is in a completely new B that is at the
top B level.</P></td></tr></table></BODY></HTML>


Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin (baldwin.richard@iname.com) is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin’s Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.
