
Learning XML: Trees, Nodes, and Templates, Part I

  • January 18, 2001
  • By Richard G. Baldwin


Preface

I have authored numerous online articles on XML.  These articles cover the waterfront from introductory topics to advanced topics. I maintain a consolidated index of hyperlinks to all of my XML articles at my personal website so that you can access earlier articles from there. 

As of this writing, to my knowledge, Microsoft IE5 is the only widely used web browser that has the ability to render XML documents.  IE5 can render XML documents using either CSS or XSL. This is one in a series of articles that discuss the use of XSL for the rendering of XML documents, with particular emphasis on the use of IE5 for that purpose.

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different figures and listings, without losing your place, while you are reading about them.

Introduction

Much of the literature on XSLT talks about transforming an XML tree to another tree of some sort.  There is good reason for the use of this terminology.  It makes sense when you think of XML documents as trees. Up until now, I have avoided the use of tree terminology.  I did this purposely so that I could show you a few practical examples of the use of XSLT before getting into the issue of trees. The time for talking about trees has come.  This lesson will introduce you to the representation of XML documents as trees.  A following lesson will introduce you to the use of XSLT to transform XML trees into HTML trees.

XSLT is not restricted to producing an HTML tree as its output.  It can be used to transform an XML tree into just about any other kind of tree (including a tree of your own invention).  This is easy enough to demonstrate using the XSL Debugger that I discussed in an earlier lesson.

In this lesson, I am concentrating on the use of IE5 to render the tree that is created through the transformation process. I could design the XSLT code to produce any kind of tree.  However, if I design the XSLT code to transform the XML into any kind of a tree other than an HTML tree, IE5 won't know how to render the output tree.  If IE5 doesn't find any legitimate HTML elements in the output, it will simply produce a blank screen. Therefore, in this and the following lesson, I will concentrate on the transformation of an XML tree into an HTML tree.

The XML File

The XML file is named XSL007.xml.  A copy of the source code for the XML file is shown near the end of the lesson. The following figure, Figure 1, contains a rendering of the XML file in tree format. I created this rendering manually using a text editor and pasted it into an HTML file.  This was a very tedious process.  Hopefully your browser renders it in the format that I intended.
 
A(root)
+-Q(big header)(text)
+-B(block)
| +-C(paragraph)(text)
| +-R(mid-size header)(text)
| +-C(paragraph)(text)
| +-S(small header)(text)
| +-B(block)
| | +-C(paragraph)(text)
| |
| +-S(small header)(text)
| +-B(block)
| | +-C(paragraph)(text)
| | +-T(smallest header)(text)
| | +-B(block)
| | | +-C(paragraph)(text)
| | | +-D(list)
| | | | +-E(list item)(text)
| | | | +-E(list item)(text)
| | | | +-E(list item)(text)
| | | |
| | | +-C(paragraph)(text)
| | |
| | +-C(paragraph)(text)
| |
| +-R(mid-size header)(text)
| +-C(paragraph)(text)
|
+-B(block)
  +-R(mid-size header)(text)
  +-C(paragraph)(text)


Figure 1

This is a typical text-based tree rendering (using monospaced text, hyphens, vertical bars and plus signs) showing that the tree has a root node named A.  If it doesn't look like a tree in your browser, make certain that your browser is rendering it using a monospaced font (such as Courier). The root node has three children named Q, B, and B.  Note that unlike the directory tree structures that you may be accustomed to, it is allowable for a node to have two or more children with the same name.

With the exception of the word text, the information in Figure 1 in parentheses is intended for explanatory purposes only.  It indicates the general type of HTML node, if any, that the XML node will be transformed into.  I will discuss text shortly.

The node named Q doesn't have any children (or does it? --  see below). The node named B, which is the second child of the root (immediately following Q), has a number of children named C, R, S, and B.  It is allowable for the node named B to have a child of its own type, B. The innermost node of type B has a child of type D.  We can add that to the list of node types that can be children of type B. The node of type D has three children of type E.
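Because a B may contain other B elements, the "Level N." prefix that begins each paragraph (C) in XSL007.xml is simply a count of the B elements enclosing that paragraph, minus one. The following minimal Python sketch, which is not part of the original lesson, walks the tree with the standard xml.etree.ElementTree module and recovers those levels. The XML string is a condensed copy of XSL007.xml (same elements and nesting, paragraph text abbreviated with "..."), and the function name paragraph_levels is hypothetical.

```python
import xml.etree.ElementTree as ET

# Condensed copy of XSL007.xml: same elements and nesting,
# with each paragraph (C) abbreviated to its "Level N." prefix.
XML = """<A>
<Q>A Big Header</Q>
<B>
<C>Level 0. ...</C>
<R>A Mid Header</R>
<C>Level 0. ...</C>
<S>A Small Header</S>
<B>
<C>Level 1. ...</C>
</B>
<S>Another Small Header</S>
<B>
<C>Level 1. ...</C>
<T>A Smallest Header</T>
<B>
<C>Level 2. ...</C>
<D>
<E>First list item</E>
<E>Second list item</E>
<E>Third list item</E>
</D>
<C>Level 2. ...</C>
</B>
<C>Level 1. ...</C>
</B>
<R>Another Mid Header</R>
<C>Level 0. ...</C>
</B>
<B>
<R>Another Mid Header in Another B</R>
<C>Level 0. ...</C>
</B>
</A>"""


def paragraph_levels(elem, depth=-1):
    """Yield (depth, text) for each C element below elem.

    depth is 0 for a C inside a B that is a child of the root A,
    and grows by one for each additional enclosing B."""
    if elem.tag == "B":
        depth += 1
    for child in elem:
        if child.tag == "C":
            yield depth, child.text.strip()
        else:
            # Q, R, S, T, D, E and nested B elements are searched recursively.
            yield from paragraph_levels(child, depth)


for depth, text in paragraph_levels(ET.fromstring(XML)):
    print(depth, text)
```

Each computed depth matches the "Level N." label written into the corresponding paragraph, which is one way to confirm that the nesting in Figure 1 was read correctly.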

"So what," you say, and I can understand why you might be inclined to say that. What does it mean when I say that the root node (A) has three children named Q, B, and B? This concept is illustrated in the following listing, which is a very abbreviated version of the complete XML file.


<A>

    <Q>...</Q>
    <B>...</B>
    <B>...</B>
</A>

I replaced the content of each of the elements named Q, B, and B with ellipses (...). If you think of the outermost element of the XML file (A) as the root node, then all of the elements that are defined in the content of the root node are children of that root node. In this case, the content of the root node defines three elements named Q, B, and B, so they are children of the root node. 
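The same parent/child relationship can be checked programmatically. The following minimal Python sketch (an illustration, not part of the original lesson) parses the abbreviated listing with the standard xml.etree.ElementTree module. Iterating over an element yields its child elements in document order, so the duplicate name B simply appears twice.

```python
import xml.etree.ElementTree as ET

# The abbreviated listing above; "..." stands in for each element's content.
ABBREVIATED = """<A>
    <Q>...</Q>
    <B>...</B>
    <B>...</B>
</A>"""

root = ET.fromstring(ABBREVIATED)

# Iterating over an Element visits its child elements in document order.
child_names = [child.tag for child in root]
print(root.tag, child_names)  # A ['Q', 'B', 'B']
```

ElementTree does not expose text as separate child nodes; it stores text in each element's text and tail attributes. The text-node view of the tree is easier to see with a DOM parser.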

Now consider the element (or node) named Q.  Does it have any children? It doesn't have any elements defined in its content, but it does have some text defined in its content as shown in the following listing:


<Q>A Big Header</Q>

Text constitutes a special kind of node, often referred to as a text node. I therefore need to expand my previous definition: all elements defined within the content of the current node (element), plus any text defined within that content, become child nodes of the current node. So, the answer to the above question is that the node named Q does have one child, and it is a text node.
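A DOM parser models text in exactly this way. The following minimal Python sketch uses the standard xml.dom.minidom module (an illustrative choice; the lesson itself does not use it) to parse the Q element and list its child nodes.

```python
from xml.dom import minidom

# Parse the Q element from XSL007.xml into a DOM tree.
q = minidom.parseString("<Q>A Big Header</Q>").documentElement

# childNodes includes text nodes as well as element nodes.
print([(node.nodeName, node.nodeValue) for node in q.childNodes])
# [('#text', 'A Big Header')]

# minidom also keeps the whitespace between elements as text nodes.
b = minidom.parseString("<B>\n<R>text</R>\n</B>").documentElement
print([node.nodeName for node in b.childNodes])  # ['#text', 'R', '#text']
```

The second example is worth keeping in mind when counting children in code: the line breaks between elements in a file such as XSL007.xml show up as extra text nodes when parsed with minidom, even though Figure 1 leaves them out.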

Note that in my manual tree representation of the XML file in Figure 1, I didn't represent text nodes in the same way that I represented the other types of nodes. Rather, I simply indicated that a node has a child text node using the following notation -- (text). I did it this way so that the entire tree could be contained in a reasonable vertical space.  Representing text nodes in the same way as the other nodes makes the tree almost twice as tall and, in my opinion, makes it more difficult to analyze visually.

As one more example of a conversion from raw XML notation to tree notation, let's take a look at the raw XML code that represents the very bottom node of type B in the tree of Figure 1.  That code is:


<B>

    <R>...text...</R>
    <C>...text...</C>
</B>

Here we see that the node named B has two child nodes named R and C.  Each of these nodes has a child text node, but no other children.
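A text drawing like Figure 1 can also be produced mechanically. The following minimal Python sketch is one possible approach, not the method used for Figure 1 (which was drawn by hand). It renders an ElementTree element with the same hyphens, vertical bars, and plus signs, using either the compact (text) notation or a separate #text child for each text node. The function name render_tree is hypothetical; the sketch omits the explanatory labels and blank separator lines of Figure 1, and it only looks at the text that precedes an element's first child, which is sufficient for XSL007.xml because that file has no mixed content.

```python
import xml.etree.ElementTree as ET

# The bottom B node of Figure 1, with "...text..." as placeholder content.
FRAGMENT = """<B>
    <R>...text...</R>
    <C>...text...</C>
</B>"""


def render_tree(root, show_text_nodes=False):
    """Draw root and its descendants in the style of Figure 1."""
    lines = []

    def own_text(elem):
        # Only the text before the first child; enough for XSL007.xml.
        return (elem.text or "").strip()

    def label(elem):
        if own_text(elem) and not show_text_nodes:
            return elem.tag + "(text)"
        return elem.tag

    def walk(elem, prefix):
        kids = list(elem)
        if show_text_nodes and own_text(elem):
            kids.insert(0, "#text")  # draw the text node as a real child
        for i, kid in enumerate(kids):
            is_last = i == len(kids) - 1
            if isinstance(kid, str):
                lines.append(prefix + "+-" + kid)
            else:
                lines.append(prefix + "+-" + label(kid))
                # A vertical bar continues only while more siblings follow.
                walk(kid, prefix + ("  " if is_last else "| "))

    lines.append(label(root))
    walk(root, "")
    return "\n".join(lines)


fragment = ET.fromstring(FRAGMENT)
print(render_tree(fragment))
print(render_tree(fragment, show_text_nodes=True))
```

For this fragment the compact form takes three lines and the explicit form takes five, a small-scale version of the height trade-off behind the (text) notation.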

The Output HTML Tree

The output HTML code produced by performing the transformation on the XML file is shown in a listing near the end of this lesson. Note that I didn't attempt to make it pretty.  I simply grabbed the HTML output from the XSL Debugger. The standard rendering of this HTML code is shown in the following figure, Figure 2.
 

A Big Header

Level 0. This is the beginning of a B. This text is in the Introduction section.

A Mid Header

Level 0. This is a continuation of the same B. This text is in the Technical Details section. This section contains two Small Headers, each of which is followed by a nested B.

A Small Header

Level 1. This is the beginning of a nested B. This block of text is nested inside of a larger overall block of text. This is also the end of this B.

Another Small Header

Level 1. This is the beginning of another nested B at the same level as the previous one. Another B is nested inside of this one.

A Smallest Header

Level 2. This is the beginning of another nested B that is inside of the previous one. This block of text is nested another level down. This B also contains a list.
  • First list item
  • Second list item
  • Third list item
Level 2. Still inside of, but ending the innermost B.

Level 1. Still inside of, but ending the middle B.

Another Mid Header

Level 0. This block of text is back out at the top level of the outermost B.

Another Mid Header in Another B

Level 0. This text is in a completely new B that is at the top B level.
 

Figure 2

Figure 3 shows a manual rendering of this HTML code into a tree format.  I hope that your browser renders it in the format that I intended.  Make certain that your browser renders it in a monospaced font.
 

HTML
 +-BODY
   +-table(with attributes)
     +-tr
       +-td
         +-H1(text)
         +-P(text)
         +-h2(text)
         +-P(text)
         +-h3(text)
         +-P(text)
         +-h3(text)
         +-P(text)
         +-h4(text)
         +-P(text)
         +-UL
         | +-LI(text)
         | +-LI(text)
         | +-LI(text)
         |
         +-P(text)
         +-P(text)
         +-h2(text)
         +-P(text)
         +-h2(text)
         +-P(text)



Figure 3

As you can see, this tree doesn't look very much like the XML tree.  I will discuss the transformation process that produced this HTML in conjunction with my discussion of the XSLT file in a following article. In the meantime, you might want to take a shot at manually transforming the HTML to a tree representation to see if you get the same tree that I got.
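In the meantime, it may help to see the effect of the transformation expressed in ordinary code. The following is a minimal Python sketch using xml.etree.ElementTree; it is not XSLT and not the XSL007.xsl file, which this lesson does not show. The element-to-tag mapping in TAG_MAP is inferred by comparing Figure 1 with Figure 3, the table attributes are copied from the HTML listing at the end of the lesson, and the names TAG_MAP, copy_children, and to_html are hypothetical. The key point is that a B element produces no HTML element of its own, so its children are placed directly into the td, which flattens the nested XML tree into the shallow HTML tree.

```python
import xml.etree.ElementTree as ET

# Condensed copy of XSL007.xml: same elements and nesting,
# with each paragraph (C) abbreviated to its "Level N." prefix.
XML = """<A>
<Q>A Big Header</Q>
<B>
<C>Level 0. ...</C>
<R>A Mid Header</R>
<C>Level 0. ...</C>
<S>A Small Header</S>
<B>
<C>Level 1. ...</C>
</B>
<S>Another Small Header</S>
<B>
<C>Level 1. ...</C>
<T>A Smallest Header</T>
<B>
<C>Level 2. ...</C>
<D>
<E>First list item</E>
<E>Second list item</E>
<E>Third list item</E>
</D>
<C>Level 2. ...</C>
</B>
<C>Level 1. ...</C>
</B>
<R>Another Mid Header</R>
<C>Level 0. ...</C>
</B>
<B>
<R>Another Mid Header in Another B</R>
<C>Level 0. ...</C>
</B>
</A>"""

# Inferred mapping from XML element names to HTML tags (Figures 1 and 3).
TAG_MAP = {"Q": "h1", "R": "h2", "S": "h3", "T": "h4",
           "C": "P", "D": "UL", "E": "LI"}


def copy_children(src, dest):
    """Append an HTML element to dest for each child of src."""
    for child in src:
        if child.tag == "B":
            # B has no HTML counterpart: lift its children into dest.
            copy_children(child, dest)
        else:
            html_elem = ET.SubElement(dest, TAG_MAP[child.tag])
            if (child.text or "").strip():
                html_elem.text = child.text
            # Needed for D -> UL, whose E children become LI elements.
            copy_children(child, html_elem)


def to_html(xml_root):
    """Build the HTML/BODY/table/tr/td skeleton and fill the td."""
    html = ET.Element("HTML")
    body = ET.SubElement(html, "BODY")
    table = ET.SubElement(body, "table", BORDER="2", CELLSPACING="0",
                          CELLPADDING="0", WIDTH="330", BGCOLOR="#FFFF00")
    td = ET.SubElement(ET.SubElement(table, "tr"), "td")
    copy_children(xml_root, td)
    return html


html = to_html(ET.fromstring(XML))
td = html.find("BODY/table/tr/td")
print([elem.tag for elem in td])
```

The printed tag sequence follows the same order as the children of td in Figure 3, because visiting the XML tree depth-first while skipping the B elements emits the headers, paragraphs, and list in document order.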

Complete Program Listings

The XML file (XSL007.xml) is shown in the following listing:
<?xml version="1.0"?>

<!-- File XSL007.xml
Copyright 2000 R. G. Baldwin
Illustrates recursive transformation using
templates.
Works with IE5.0
-->

<?xml-stylesheet type="text/xsl" href="XSL007.xsl"?> 

<A>

<Q>A Big Header</Q>

<B>
<C>Level 0.  This is the beginning of a B. 
This text is in the Introduction section.</C>

<R>A Mid Header</R>

<C>Level 0.  This is a continuation of the same B. 
This text is in the Technical Details section. This 
section contains two Small Headers, each of which is 
followed by a nested B.</C>

<S>A Small Header</S>
<B>
<C>Level 1.  This is the beginning of a nested B. 
This block of text is nested inside of a larger overall 
block of text. This is also the end of this B.</C>
</B>

<S>Another Small Header</S>
<B>
<C>Level 1.  This is the beginning of another nested B
at the same level as the previous one. 
Another B is nested inside of this one.</C>

<T>A Smallest Header</T>
<B>
<C>Level 2.  This is the beginning of another nested B
that is inside of the previous one.  This block of text is nested 
another level down.  This B also contains a list.</C>

<D>
<E>First list item</E>
<E>Second list item</E>
<E>Third list item</E>
</D>

<C>Level 2.  Still inside of, but ending the innermost B.</C>
</B>
<C>Level 1.  Still inside of, but ending the middle B.</C>
</B>
 

<R>Another Mid Header</R>
<C>Level 0.  This block of text is back out at the top level of 
the outermost B.</C>
</B>

<B>
<R>Another Mid Header in Another B</R>
<C>Level 0.  This text is in a completely new B that is at the
top B level.</C>
</B>

</A>

The HTML produced by applying the transformation to the XML file is shown in the following listing:

<HTML><BODY><table BORDER="2" CELLSPACING="0" CELLPADDING="0" WIDTH="330" BGCOLOR="#FFFF00"><tr><td><h1>A Big Header</h1><P>Level 0.  This is the beginning of a B. 
This text is in the Introduction section.</P><h2>A Mid Header</h2><P>Level 0.  This is a continuation of the same B. 
This text is in the Technical Details section. This 
section contains two Small Headers, each of which is 
followed by a nested B.</P><h3>A Small Header</h3><P>Level 1.  This is the beginning of a nested B. 
This block of text is nested inside of a larger overall 
block of text. This is also the end of this B.</P><h3>Another Small Header</h3><P>Level 1.  This is the beginning of another nested B
at the same level as the previous one.
Another B is nested inside of this one.</P><h4>A Smallest Header</h4><P>Level 2.  This is the beginning of another nested B
that is inside of the previous one.  This block of text is nested 
another level down.  This B also contains a list.</P><UL><LI>First list item</LI><LI>Second list item</LI><LI>Third list item</LI></UL><P>Level 2.  Still inside of, but ending the innermost B.</P><P>Level 1.  Still inside of, but ending the middle B.</P><h2>Another Mid Header</h2><P>Level 0.  This block of text is back out at the top level of 
the outermost B.</P><h2>Another Mid Header in Another B</h2><P>Level 0.  This text is in a completely new B that is at the
top B level.</P></td></tr></table></BODY></HTML>
 


Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin (baldwin.richard@iname.com) is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.





