Java Programming Notes # 2204
- Preface
- Preview
- Discussion and Sample Code
- Run the Program
- Summary
- What’s Next?
- Complete Program Listings
Preface
What is JAXP?
As the name implies, the Java API for XML Processing (JAXP) is an
API designed
to help you write programs for processing XML documents. JAXP is
very important for many reasons, not the least of which is the
fact that it is a critical part of the Java Web Services Developer Pack
(JWSDP). As you are probably already aware, many expect web services
to be a very important aspect of the Internet of the future.
This is the third lesson in a series designed to initially help you
understand how to use JAXP,
and to eventually help you understand how to use the JWSDP.
The first lesson was entitled Java
API for XML Processing (JAXP), Getting Started. The
previous lesson was entitled Getting
Started with Java JAXP and XSL Transformations (XSLT).
What is XML?
XML is an acronym for the eXtensible Markup Language.
I will not attempt to teach XML in this series of
tutorial lessons. Rather, I will assume that you already
understand
XML, and I will teach you how to use JAXP to write programs for
creating and processing XML documents.
I have published numerous tutorial lessons on XML at Gamelan.com and www.DickBaldwin.com.
You may find it useful to refer to those lessons. In addition, I
provided
a review of the salient aspects of XML in the first lesson in this
series. From time to time, I will also provide background
information regarding XML in the lessons in this series.
Viewing tip
You may find it useful to open another copy of this lesson in a
separate browser window. That will make it easier for you to
scroll back and forth among the different listings and figures while
you are reading about them.
Supplementary material
I recommend that you also study the other lessons in my extensive
collection of online Java tutorials. You will find those lessons
published at Gamelan.com.
However, as of the date of this writing, Gamelan doesn’t maintain a
consolidated index of my Java tutorial lessons, and sometimes
they are difficult to locate there. You will find a consolidated
index at www.DickBaldwin.com.
Preview
A tree structure in memory
A DOM parser can be used to
create a tree structure in memory that represents an XML
document. In Java, that tree structure is encapsulated in an
object of the interface type Document. Document
and its superinterface Node declare numerous methods that may
be used to navigate, extract information from, modify, and otherwise
manipulate the DOM tree. As
is always the case, classes that implement Document must
provide concrete definitions of those methods.
Many operations are possible
Given an object of type Document, there are many
methods that
can be invoked on the object to perform a variety of operations.
For example, it is possible to move nodes from one location in the tree
to another location in the tree, thus rearranging the structure of the
XML document represented by the Document object. It is
also possible to delete nodes, and to insert new nodes. It is
also possible
to
recursively traverse the tree, extracting information about the nodes
along
the way.
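The kinds of operations described above can be illustrated with a minimal sketch. It is not part of the programs discussed in this lesson; the class name DomEditSketch, the method names, and the tiny sample document are all assumptions made for illustration. It parses a document from a string, then moves one node, deletes another, and inserts a new one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomEditSketch {

    // Lists the names of a node's element children in document order.
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (names.length() > 0) {
                    names.append(',');
                }
                names.append(n.getNodeName());
            }
        }
        return names.toString();
    }

    // Parses a small hypothetical document, then moves, deletes, and inserts nodes.
    static String rearrange() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root><a/><b/><c/></root>")));
            Element root = doc.getDocumentElement();
            Node a = root.getFirstChild();
            Node b = a.getNextSibling();

            // Move: appending a node that is already in the tree relocates it (b,c,a).
            root.appendChild(a);
            // Delete: remove the node for element b (c,a).
            root.removeChild(b);
            // Insert: create a new element and place it first (d,c,a).
            root.insertBefore(doc.createElement("d"), root.getFirstChild());
            return childNames(root);
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rearrange());
    }
}
```

Note that no explicit "move" method is needed: appendChild and insertBefore first remove a node from its current position if it is already attached to the tree.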
I showed you …
In the previous lesson on Java JAXP, I began by providing a brief
review of XSL and XSL Transformations (XSLT).
Then I showed you how to create an identity Transformer
object, and how to use that object to:
- Display a DOM tree structure on the screen in XML format.
- Write the contents of a DOM tree structure into an output XML
file.
Following that, I showed you how to write exception handlers that
provide meaningful information in the event of errors and exceptions,
with particular emphasis on parser errors and exceptions.
I will show you …
In this lesson, I will show you how to write a program to display a
DOM tree on the screen in a format that is much easier to interpret
than raw XML code. I will explain two different versions of the
program. One version will simply identify text nodes in the
output tree. The other will display the value of text nodes in
the output tree. The first version will ignore attributes in the
output tree. The second version will include attributes in the
output tree.
Discussion and Sample Code
The first program that I will discuss, named DomTree01, analyzes a DOM tree that
represents an XML document, and produces an output on the screen
similar to the tree shown in Figure 1.
#document DOCUMENT_NODE
Figure 1
The physical tree structure shown in Figure 1 represents the
corresponding XML document as a visual tree. As I discuss the
various parts of the XML document, you should be able to correlate
those parts of the document to the tree structure shown in Figure 1.
The sample XML file named DomTree01.xml
The tree structure in Figure 1 corresponds to an XML file named DomTree01.xml. As is often the
case, I will discuss the XML files and the programs in fragments.
A complete listing of DomTree01.xml
is shown in Listing 21 near the end of the lesson. Listing 1
shows the beginning of the XML file.
<?xml version="1.0"?>
The structure of the XML file named DomTree01.xml
That portion of the XML file shown in Listing 1 consists of five
items that are represented by the following nodes in the DOM tree:
- A Document node
- A Document-Type node
- A Comment node
- A Processing Instruction node representing a stylesheet
- A Processing Instruction node representing a dummy processing
instruction
The last four node types in the above list represent nodes that are
children of the Document node. The Document node is the root of
the entire DOM tree, and all other nodes in the DOM tree are descendants
of the Document node.
The five items are separated by blank lines in Listing 1, so you
should be able to correlate them visually with the five nodes in the
above list.
(Note that although it is tempting to believe that the Document node
correlates with the XML declaration in the first line of Listing 1, the
XML declaration is not required, and the DOM tree will be rooted in a
Document node, even in the absence of an XML declaration.)
The DOM tree exposed
Figure 2 shows a reproduction of the first five lines from Figure
1. Each line in Figure 2 represents a node in the DOM tree.
You should be able to correlate each line in Figure 2 with one
of the nodes in the above list, and also with one of the items in
Listing 1 (except for the
DOCUMENT_NODE for which there is no explicit item in Listing 1).
The indentation in Figure 2 indicates that the last four lines in
Figure 2 represent nodes that are children of the node represented by
the Document node in the first line.
#document DOCUMENT_NODE
Figure 2
The prolog of the XML document
Listing 1 shows the prolog
for this XML document, which includes everything prior to the start tag
for the root element. Figure 2 shows the DOM nodes associated
with the prolog.
The root element in the XML document
Listing 2 shows the XML code for the root element and the six nodes
following the root-element node in the DOM tree.
The XML code in Listing 2 produces the following node types in the DOM
tree, with the parent-child relationships shown.
- An Element node named A, which is the root element node
- An Element node named Q
- A Text node
- An Element node named B
- An Element node named C
- A Text node
- A CDATA Section node
<A>
A is a child of the document root node
Referring back to Figure 1, you can see that the Element node named A
is a child of the Document node, which forms the root of the DOM tree.
The node for element A is the root element node for the DOM tree
(which is different from the root node for the DOM tree). All of the
data stored in an XML document is stored in the root element node and
its descendants.
Figure 3 shows a reproduction of the next seven lines from Figure 1,
showing the tree structure and the parent-child relationships among the
nodes. The nodes shown in Figure 3 correspond to the XML code in
Listing 2.
A ELEMENT_NODE
Figure 3
Easier to interpret
Unless you have a lot of practice reading XML code, you may have
concluded
by now that the representations of the DOM tree in Figures 2 and 3 are
much easier to get your mind around than the raw XML shown in Listings
1 and 2.
Node types seen thus far
So far, we have seen the following types of nodes:
- Document node
- Document-Type node
- Comment node
- Processing Instruction node
- Element node
- Text node
- CDATA Section node
It will be useful at this point to provide a brief explanation for each
of these node types.
The Document node and the XML declaration
According to XML in a Nutshell by Harold and Means, which I recommend
as an excellent book,
“XML documents should (but do not have to) begin with an XML
declaration. The XML declaration looks like a processing
instruction with the name xml and version, standalone, and encoding
attributes. Technically, it’s not a processing instruction
though, just the XML declaration; nothing more, nothing less.”
As I mentioned earlier, every XML DOM tree is rooted in a Document
node, even in the absence of an XML declaration. Apparently, the
DOM
tree does not contain a node that represents the XML declaration, and
the XML document doesn’t contain any specific text that represents the
Document node.
Although the XML declaration is used for
information purposes by a validating XML parser, if it is possible to
recover the XML declaration from the DOM tree, I don’t know how to do
that at this time.
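As a hedged aside that goes beyond the SDK 1.4.2 used for this lesson: the DOM Level 3 version of the Document interface, which ships with Java 5 and later, declares the methods getXmlVersion and getXmlStandalone (along with getXmlEncoding). The following minimal sketch assumes such a later JDK. The class name XmlDeclSketch and the sample document are hypothetical. It also shows that the first child of the Document node is the root element, not a node for the declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlDeclSketch {

    // Reports declaration details through DOM Level 3 methods (Java 5 or later),
    // plus the name of the first child of the Document node.
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return "version=" + doc.getXmlVersion()
                    + " standalone=" + doc.getXmlStandalone()
                    + " firstChild=" + doc.getFirstChild().getNodeName();
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<?xml version=\"1.0\" standalone=\"yes\"?><A/>"));
    }
}
```

Even with these methods, the declaration is exposed as properties of the Document node rather than as a separate node in the tree.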
Document-Type node
A valid XML document contains a reference to a Document Type Definition (DTD) to
which the document should be compared for validation purposes.
The DTD can also be included in the XML document prolog, as is the case
in Listing 1.
(The DTD in Listing 1 begins with <!DOCTYPE and ends with ]>)
According to XML in a Nutshell,
“DTDs are written in a formal syntax that explains precisely which elements
and entities may appear where in the document and what the elements’
contents and attributes are.”
For example, the DTD in Listing 1 states that the element named A must
contain the elements named Q, B, and B, in that order. I’m not
going to try to explain the rules for writing DTDs. There are
numerous tutorials on the Web that you can refer to in this regard.
The DTD in Listing 1 produced the Document-Type node in the tree in
Figure 2.
(In certain situations, a schema can be used for validation in place of a
DTD.)
Comment node
A comment in XML means pretty much the same thing as a comment in
Java. XML comments are generally ignored by XML processors.
They are intended primarily for human consumption.
Listing 1 contains an XML comment with the file name and some other
information. This comment produced the Comment node in the tree
of Figure 2.
Processing Instruction node
XML processing instructions begin with <? and end with ?>.
Processing instructions are intended to provide instructions to
processing programs that may be called upon to process an XML document.
Listing 1 contains two separate processing instructions. The two
processing instructions gave rise to the two Processing Instruction
nodes in the tree in Figure 2.
Element node
As you learned in the previous two lessons, XML syntax includes
elements, consisting of start tags, end tags, optional content, and
optional attributes.
Listing 2 contains all or part of several elements. The elements
gave rise to the Element nodes in Figure 3. The text content of
the elements gave rise to the Text nodes in Figure 3.
(Note that the actual text in this XML document is not intended to have any
meaning other than to constitute text nodes in the DOM tree for
illustration purposes.)
Text node
When you include text as part or all of
the content of an XML element, each chunk of text gives rise to a text
node in the DOM tree. Figure 3 shows two text nodes produced by
the text content of the elements in Listing 2.
CDATA Section node
XML recognizes two kinds of text data, PCDATA and CDATA. PCDATA
stands for parsed character data. CDATA stands for character data.
The primary difference between the two is as follows. PCDATA
cannot contain
certain characters such as left angle brackets (<) and ampersands
(&). The reason is that a left angle bracket would confuse
the parser, causing it to believe that it had encountered the first
character in a start or end tag. Therefore, if these characters
appear in
PCDATA, they must be represented by entity references, such as &lt; for
the left angle bracket and &amp; for the ampersand.
A CDATA section
When a block of text is declared to be of type CDATA, the parser does
not interpret its content as markup. Therefore, it can contain any
characters (with the exception of the sequence ]]>, which ends the
block). A block of CDATA always begins with
<![CDATA[. The block always ends with ]]>.
(Note that the periods in the above sentences are not parts of the CDATA
beginning and ending syntax.)
Listing 2 contains a block of CDATA, which gave rise to the CDATA
Section node in Figure 3.
Note that the Element node named C in Figure 3 has two children.
One child is a text node. The other child is a CDATA
Section node.
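The following minimal sketch shows an element with the same two kinds of children. The class name CdataSketch and the sample element content are assumptions for illustration, not taken from DomTree01.xml. It relies on the default factory setting, in which CDATA sections are not coalesced into adjacent text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CdataSketch {

    // Returns one line per child of the root element: node name, '=', node value.
    static String childSummary(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder summary = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                summary.append(child.getNodeName())
                        .append('=')
                        .append(child.getNodeValue())
                        .append('\n');
            }
            return summary.toString();
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The angle brackets and ampersand are legal only inside the CDATA section.
        System.out.print(childSummary("<C>Plain text <![CDATA[<b> & </b>]]></C>"));
    }
}
```

The first child is reported with the generic name #text and the second with the generic name #cdata-section.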
An interesting case involving whitespace
I’m not going to bore you by discussing the entire XML document in this
level of detail. By now, you should be able to compare the XML in
Listing 21 with the DOM tree represented by Figure 1, and understand
how the XML code relates to the DOM tree.
However, there is one tricky aspect involving whitespace that deserves a little
more
explanation. The DOM tree nodes shown in Figure 4 represent the
XML code shown in Listing 3.
E ELEMENT_NODE
Figure 4
Too many text nodes
I have colored the obvious text in Listing 3 green for emphasis.
At
first glance, it would appear that there are too many Text nodes
showing in Figure 4 to correspond to the text shown in Listing 3.
<E>First list item in E
Another representation of the DOM tree
Figure 5 shows another representation of the DOM tree, similar to
Figure
4, except that the actual text belonging to each Text node is shown in
Figure 5.
E ELEMENT_NODE
Figure 5
Note the blank lines in Figure 5. These are caused by newline
characters in the actual XML code in Listing 3. In particular,
there are two Text nodes belonging to the element named E. One of
those Text nodes appears before
the element named G and the other appears after the element named
G. The Text
node after the element named G was caused by the newline character
immediately following the end tag for the element named G.
Element E may contain PCDATA
This happens because of one line in the DTD shown in Listing 1 and
repeated below for convenience.
<!ELEMENT E (#PCDATA | G)*>
This DTD statement says that the content for an element named E may
contain Text nodes (#PCDATA) and/or elements named G in any number and
in
any order. Thus, simple newline characters inserted into the XML
to make it easier to read were interpreted as Text nodes. This
gave rise to what appear to be extra Text nodes in Figure 4.
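The effect can be reproduced with a minimal sketch. The class name MixedContentSketch and the shortened sample document are assumptions; only the DTD statement for element E is taken from the lesson. Even with validation turned on and the parser told to ignore cosmetic whitespace, the newline after the end tag for G survives as a Text node, because E is declared to have mixed content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentSketch {

    // Hypothetical document: E may contain text and G elements in any order.
    static final String XML =
            "<!DOCTYPE E [\n"
            + "<!ELEMENT E (#PCDATA | G)*>\n"
            + "<!ELEMENT G (#PCDATA)>\n"
            + "]>\n"
            + "<E>First list item in E\n"
            + "<G>item in G</G>\n"
            + "</E>";

    // Returns the names of the children of E, separated by commas.
    static String childrenOfE() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
            // No ErrorHandler is installed; the sample document is valid.
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder names = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                if (i > 0) {
                    names.append(',');
                }
                names.append(children.item(i).getNodeName());
            }
            return names.toString();
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childrenOfE());
    }
}
```

The second Text node in the result contains nothing but the newline character that follows the end tag for G.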
That’s probably enough talk. It’s time to see some Java code.
The program named DomTree01
With the preceding discussion as background, I will now discuss the
program named DomTree01,
which was used to process the file named DomTree01.xml
and to produce the DOM tree representation shown in Figure 1. As
usual, I
will discuss the program in fragments. A complete listing of the
program is shown in Listing 20 near the end of the lesson.
Purpose and limitations of the program
This program produces a text-based output on the screen that represents
the DOM tree structure for an XML file. Note that although the
code was written to support them, the program was not
actually tested for the following node types:
- DOCUMENT_FRAGMENT_NODE
- ENTITY_NODE
- ENTITY_REFERENCE_NODE
- NOTATION_NODE
Note also that this program does not display attributes. That
will be accomplished in the sample program named DomTree02 to be discussed later in
this lesson.
Also note that for simplicity, no effort was made to cause the program
to produce meaningful output in the event of errors and exceptions.
The program was tested using Sun’s SDK 1.4.2 under WinXP.
Overall program structure
This program consists of a single class with a main method that runs as a Java
application. Listing 4 shows the beginning of the class
definition and the beginning of the main
method.
public class DomTree01{
The code in Listing 4 is straightforward:
- It declares and initializes an instance variable that is used later for control of
indentation in the output display.
- It also provides usage instructions if the user
starts the program with the wrong number of command-line arguments.
Running the program
Two command-line parameters are required. The first parameter is
the path and file name of the file containing the XML document to be
processed. The second command-line parameter is either “y” or “n”
specifying whether or not the parser should attempt to validate the XML
document.
(If the program is instructed to validate the document,
a DTD (or schema) must be
provided either inline or as a reference in the XML document.)
Steps for creating a Document object
As you learned in an earlier lesson, three steps
are required to create a Document object:
- Create a DocumentBuilderFactory object
- Use the DocumentBuilderFactory object to create a DocumentBuilder
object
- Use the parse method of the DocumentBuilder object
to create a Document object
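The three steps above can be sketched as follows. This is a minimal illustration, not the code from Listings 5 and 6. The class name ParseStepsSketch and the method rootName are hypothetical, and the sketch parses XML from a string for convenience, whereas the program in this lesson reads a file named on the command line.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseStepsSketch {

    // Parses the XML text and returns the name of the root element.
    static String rootName(String xml, boolean validate) {
        try {
            // Step 1: create a DocumentBuilderFactory object.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);

            // Step 2: use the factory to create a DocumentBuilder (parser) object.
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Step 3: use the parse method to create a Document object.
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            return document.getDocumentElement().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<A><Q>text</Q></A>", false));
    }
}
```

To parse a file instead, the parse method of the DocumentBuilder object also accepts a java.io.File.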
Create a DocumentBuilderFactory object
The first step in the above list is accomplished by the code in Listing
5.
try{
There is very little in Listing 5 that wasn’t discussed in detail in
earlier lessons. About the only thing that is new is the
invocation of the setter method at the end of Listing 5 to cause the
parser to ignore cosmetic whitespace in the XML document.
(Cosmetic whitespace consists of spaces, tabs, newlines, etc., inserted into the
XML document between elements to make the document easier to read.)
This wasn’t discussed in the previous lessons because it only works
with a validating parser. The parsers used in the two previous
lessons were not validating parsers.
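A minimal sketch of the setter's effect follows. The class name IgnoreWhitespaceSketch and the sample document are assumptions for illustration. The DTD declares that A contains only B elements, so the parser can tell that the whitespace between the B elements is cosmetic. The sketch validates in both cases and toggles only the whitespace setting.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnoreWhitespaceSketch {

    // Hypothetical document: A has element-only content, laid out on several lines.
    static final String XML =
            "<!DOCTYPE A [\n"
            + "<!ELEMENT A (B,B)>\n"
            + "<!ELEMENT B (#PCDATA)>\n"
            + "]>\n"
            + "<A>\n  <B>one</B>\n  <B>two</B>\n</A>";

    // Returns the number of child nodes of A as seen by a validating parser.
    static int childCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            // No ErrorHandler is installed; the sample document is valid.
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ignoring whitespace: " + childCount(true));
        System.out.println("keeping whitespace:  " + childCount(false));
    }
}
```

With the setting on, only the two B elements remain as children of A. With it off, the three runs of whitespace around them also appear as Text nodes.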
Create a Document object
The remaining two steps required to create a Document object are accomplished in
Listing 6.
//Get a DocumentBuilder (parser) object
The code in Listing 6 was also discussed in detail in the two previous
lessons, so I won’t discuss that code further here.
Process the Document object
Code that is new to this lesson begins in Listing 7. The code in
Listing 7 instantiates a new object of the program class and invokes
the processNode method on that
object, passing the Document
object’s reference as a parameter.
//Instantiate an object of this class
Listing 7 also contains a simple exception handler, which signals the
end of the main method.
The processNode method
The processNode method, which
begins in Listing 8, is used to recursively process the DOM tree,
identifying and displaying the tree structure along the way.
private void processNode(Node node){
Recall from an earlier lesson that the Document
interface extends the Node
interface, which provides a multiplicity of
methods that can be used to navigate and manipulate the DOM tree.
Therefore, a Document object
can be treated as
type Node. The required
type for the incoming
parameter to the processNode
method is type Node.
The code in Listing 8 simply checks to confirm that the incoming
reference
does not have a value of null. If it does, the code in Listing 8
prints an error message and
returns.
Perform the recursive processing on the incoming node
The code in Listing 9 shows the beginning of what happens if the
incoming parameter is not null.
indent++;
As you will see later, the processNode
method will continue calling itself recursively until all of the nodes
in the DOM tree have been examined. Information about the tree
structure will be extracted and displayed as each node is
examined. When all of
the nodes in the DOM tree have been examined, the program will
terminate.
Indentation
Recall the instance variable named indent
that was declared and initialized in Listing 4. Each time control
enters the processNode method (with a non-null Node parameter), the
value of that instance variable is incremented. Each time control
exits the method (except for the
case of a null Node parameter),
the value of that instance variable is
decremented. Therefore, at any point in time, the value of indent indicates the current depth (in the DOM tree)
of the node that is being examined.
Get node name and type
The variable named indent is
incremented in Listing 9. Following this, two methods are called
on the incoming Node parameter
to get and save the name and the type of the node currently being
examined.
Some types of nodes have generic names, such as #text. Other types of nodes
have actual names, which match element names in the XML document.
The doIndent method
At this point, I am going to skip ahead and show you a very simple
method named doIndent, (which actually appears near the end of
the program code in Listing 20).
The code for this method is shown in Listing 10.
private void doIndent(){
The purpose of this method is to move the cursor to the right on the
screen to accomplish indentation in the display. Each time the method
is called, it moves the cursor to the right by an amount equal to twice
the value of the variable named indent.
This produces two spaces for each level of indentation.
Display the name of the node
Returning to the discussion of the processNode
method, Listing 11 invokes the doIndent
method to produce the required indentation, and then displays the name
of the
current node, followed by a space. Note that the
cursor remains immediately to the right of the space and does not
advance to the
next line at this time.
doIndent();
Display the type of the node on the same line
Recall that the invocation of the getNodeType
method in Listing 9 returned a value of type int. The Node interface defines about a dozen
symbolic constants that correlate the type values to names such as CDATA_SECTION_NODE.
A switch statement
Listing 12 shows the beginning of a switch
statement that uses the type value from Listing 9, along with the
constants from the Node
interface to display the alphanumeric node type to the right of the
node name that was displayed by the code in Listing 11.
switch(type){
When the alphanumeric node type is displayed, the cursor moves down to
the left-hand side of the next line.
For example, the code in Listings 11 and 12 would produce output
similar to that shown in Figure 6 (the
indentation may be different for different XML documents).
#cdata-section CDATA_SECTION_NODE
Figure 6
The remainder of the switch statement
Listing 13 shows the remainder of the switch
statement. There is nothing special about the code in Listing
13. As each node is examined, the code in Listing 11 performs the
proper indentation and displays the name of the node. Then one of
the cases in the switch
statement is invoked to display the alphanumeric node type to the
right of the node name and to advance the display cursor to the next
line.
case Node.COMMENT_NODE:{
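The mapping from type values to names can be sketched with a switch statement like the one below. This is a minimal illustration, not the code from Listings 12 and 13. The class name NodeTypeNameSketch and the method typeName are hypothetical, and the sketch returns the name as a String where the program in this lesson prints it.

```java
import org.w3c.dom.Node;

class NodeTypeNameSketch {

    // Translates the value returned by getNodeType into the name of the
    // matching symbolic constant from the Node interface.
    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            default:                               return "UNKNOWN(" + type + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(typeName(Node.CDATA_SECTION_NODE));
    }
}
```

Returning a String rather than printing inside each case keeps the method easy to check in isolation; the caller decides where the text goes.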
Get and process children of the current node
Following the switch
statement, the code in Listing 14 invokes the getChildNodes method on the current
node to get a list of the nodes that are children of the current
node. That list is returned as an object of type NodeList. The NodeList object’s
reference is stored in the reference variable named children.
NodeList children = node.getChildNodes();
A NodeList object provides an
ordered collection of nodes, and provides two methods for accessing the
items in the list:
- A method named getLength returns the number of nodes in the list.
- A method named item takes a parameter of type int, and uses that parameter to
return the Node object’s reference that is stored at that index.
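The two methods in the above list can be sketched in use as follows. The class name NodeListSketch and the sample document are assumptions for illustration. The sketch pairs each ordinal index with the name of the node stored at that index.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListSketch {

    // Returns "index:name" for each child of the root element, in list order.
    static String indexedChildNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder result = new StringBuilder();
            // getLength gives the loop bound; item fetches the node at each index.
            for (int i = 0; i < children.getLength(); i++) {
                if (i > 0) {
                    result.append(',');
                }
                result.append(i).append(':').append(children.item(i).getNodeName());
            }
            return result.toString();
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indexedChildNames("<A><Q/>some text<B/></A>"));
    }
}
```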
Make recursive call to processNode method on each child node
Provided that the NodeList
reference in the variable named children
is not null, the code in Listing 15 uses a for loop to process each node whose
reference is stored in the list.
if (children != null){
This is where the recursive processing occurs. The boldface
statement in Listing 15 recursively invokes the processNode method once for each
item in the list, passing the item as a parameter to the processNode method.
This causes the program to recursively examine every node in the DOM
tree, (except for attribute nodes)
extracting and
displaying information about each node as it is examined. This
includes nodes in the prolog of the XML document as well as nodes in
the body of the XML document.
(The issue of attribute nodes will be addressed in the next sample program.)
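The overall shape of the recursion can be sketched as follows. This is a minimal illustration in the spirit of the processNode method, not a copy of Listing 20. The class name DomTreeSketch, the render method, and the initial value of zero for indent are assumptions. The sketch collects its output in a StringBuilder and shows the numeric node type instead of the symbolic name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreeSketch {
    private int indent = 0;  // current depth in the tree (initial value assumed)
    private final StringBuilder out = new StringBuilder();

    // Appends two spaces for each level of indentation.
    private void doIndent() {
        for (int i = 0; i < indent; i++) {
            out.append("  ");
        }
    }

    // Displays one node, then recursively displays each of its children.
    private void processNode(Node node) {
        if (node == null) {
            return;
        }
        indent++;
        doIndent();
        out.append(node.getNodeName())
                .append(" type=")
                .append(node.getNodeType())
                .append('\n');
        NodeList children = node.getChildNodes();
        if (children != null) {
            for (int i = 0; i < children.getLength(); i++) {
                processNode(children.item(i));  // the recursive call
            }
        }
        indent--;
    }

    // Parses the XML text and returns the indented tree display.
    static String render(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DomTreeSketch sketch = new DomTreeSketch();
            sketch.processNode(doc);
            return sketch.out.toString();
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render("<A><B>hi</B><!--c--></A>"));
    }
}
```

Because indent is incremented on entry and decremented on exit, sibling nodes are displayed at the same depth no matter how deep the subtree between them goes.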
Decrease indentation level and terminate processNode method
When all of the recursive invocations made on the child nodes have
returned, the current invocation of the processNode method decreases
the value of the variable named indent and then terminates, as shown
in Listing 16.
indent--;
Listing 16 signals the end of the processNode
method, and the beginning of the method named doIndent, which was discussed
earlier.
(Because it was discussed earlier, the code for the doIndent method was not included in
Listing 16.)
The end of the doIndent method
signals the end of the class and the end of the program named DomTree01.
The program named DomTree02
The program named DomTree02 is
an upgraded
version of DomTree01.
This program displays the actual text belonging to text nodes instead
of
simply showing the type of node as TEXT_NODE.
DomTree02 also displays
attribute names and values, which is not the
case with DomTree01.
Sample output from DomTree02
Figure 7 shows the output produced by using DomTree02 to process the XML file
named DomTree02.xml. (You can view a listing of this XML file
in Listing 23
near the end of the lesson.)
I colored the attributes red and the text green in Figure 7
to make them easy to spot.
(Note that some of the text consists of invisible newline characters, which
are impossible to color green.)
#document DOCUMENT_NODE
Figure 7
Displaying text versus displaying node type
Sometimes it can be very useful to display the actual text values in
the tree. At other times, the text is so voluminous that it
completely overwhelms the display making it difficult to pick out the
structure of the tree. In those cases, the version that simply
identifies the node as a text node is probably advantageous.
(A good learning exercise would be for you to write a single program where
the user specifies whether the tree is to simply identify text nodes,
or is to display the actual
text value of each text node, by entering a parameter on the command
line.)
Will discuss in fragments
I will discuss the program named DomTree02
in fragments. A complete listing of the program is shown in
Listing 22 near the end of the lesson.
Large portions of this program are identical or very similar to the
code in the program named DomTree01,
discussed earlier in this lesson. Therefore, I won’t repeat the
discussion of that code. Rather, I will restrict this discussion
to those parts of this program that differ from the earlier program.
The main method in this
program is essentially the same as the main
method in the previous program, so I will skip a discussion of the main method.
As before, the method named processNode
is used to recursively process the entire DOM tree, extracting and
displaying information about the nodes in the tree along the way.
The method named processNode
in this program is the same as in the previous program except for the
code in a couple of cases in the switch
statement.
New features in DomTree02
Previously, the cases in the switch
statement were used to display the alphanumeric type of each node in
the tree. In this program, the case for TEXT_NODE is modified to cause
the actual text value of the text node to be displayed instead of the
type of the node.
In addition, the case for ELEMENT_NODE
in this program is modified to get and display the names and
values of all attributes associated with elements.
The ELEMENT_NODE case
I will begin by explaining the changes to the ELEMENT_NODE case in the switch statement. Listing 17
shows the beginning of the ELEMENT_NODE
case.
private void processNode(Node node){
A map of attribute nodes
There is a very important conceptual issue to deal with here.
Specifically, attribute nodes are not child nodes of element
nodes. All child nodes of an element node can be
obtained in a collection of type NodeList
by invoking the method named getChildNodes
on the element node, but that collection does not include the
element's attributes.
In order to get the attributes belonging to an element node, it is
necessary to invoke the method named getAttributes
on the element node. This method returns a reference to an object
of type
NamedNodeMap containing
unordered references to the
attribute nodes.
NamedNodeMap versus NodeList
A NamedNodeMap is a different
type of data structure than a NodeList.
A NodeList is an ordered
collection of references to Node
objects. Items in the list are accessed on the basis of an
ordinal index. They cannot be accessed on the basis of the name
of a node. The order of the items in the list matches the
ordering of the corresponding nodes in the DOM tree.
NamedNodeMap
Sun describes objects of type NamedNodeMap
as
“collections of nodes that can be accessed by name”.
Sun goes on to tell us,
“NamedNodeMaps are not maintained in any particular order. Objects contained in an
object implementing NamedNodeMap may also be accessed by an ordinal
index, but this is simply to allow convenient enumeration of the
contents of a NamedNodeMap, and does not imply that the DOM specifies
an order to these Nodes.”
Therefore, references to objects representing attribute nodes can be
accessed
in a NamedNodeMap object
either on the basis of the attribute name, or on the basis of an
ordinal index. I will use an ordinal index in this program, as
shown in Listing 18.
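Both ways of accessing a NamedNodeMap can be sketched as follows. The class name AttributeMapSketch, the sample element, and its attribute names are assumptions for illustration. Because the map guarantees no particular order, the sketch sorts the name and value pairs before returning them. It also reports the number of child nodes, to show that attributes are not counted among the children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeMapSketch {

    // Summarizes the attributes of the root element of the XML text.
    static String describe(String xml) {
        try {
            Element element = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NamedNodeMap attributes = element.getAttributes();

            // Access by name.
            Node id = attributes.getNamedItem("id");

            // Access by ordinal index; sorted because the map order is unspecified.
            List<String> pairs = new ArrayList<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                pairs.add(attribute.getNodeName() + "=" + attribute.getNodeValue());
            }
            Collections.sort(pairs);

            return "children=" + element.getChildNodes().getLength()
                    + " attrs=" + attributes.getLength()
                    + " id=" + id.getNodeValue()
                    + " all=" + pairs;
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<B id=\"7\" lang=\"en\"/>"));
    }
}
```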
Get and display name and value of attribute nodes
Listing 18 shows the remaining code for the ELEMENT_NODE case in the switch statement.
for(int i = 0; i < attrLen; i++){
Listing 18 uses a for loop to
iterate on the NamedNodeMap
object, getting a reference to each attribute node in sequence, and
using that
reference to get and display the name and value of the attribute
properly indented.
(Note in Listing 18 and Figure 7 that the attribute information was indented
an additional four spaces relative to the element node to visually
separate the attribute information from the child nodes of the
element. This was done solely for cosmetic purposes.)
The modified TEXT_NODE case
Listing 19 shows the modified TEXT_NODE
case in the switch statement,
and the end of the switch
statement.
//Case code deleted for brevity
The version of this case in the program named DomTree01 simply displayed the text TEXT_NODE each time the case was
invoked.
This version invokes the method named getNodeValue
on the node and displays the String
that is returned by that method. This code produced the green
text values for the text nodes represented in Figure 7.
(Note that the word #text in Figure
7 was displayed by code that invoked the getNodeName method prior to control
entering the switch
statement. This is the same in both programs. Only the red
and green text in Figure 7 is new.)
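The difference between the two versions of the TEXT_NODE case comes down to a call to getNodeValue, as the following minimal sketch suggests. The class name TextValueSketch, the method names, and the sample document are assumptions, and the sketch is not the code from Listing 19. It also shows that getNodeValue returns null for an element node, which is why the text must be read from the Text node itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextValueSketch {

    // For a text node, shows the name followed by the actual text.
    // For any other node, shows the name and whatever getNodeValue returns.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
                return node.getNodeName() + " " + node.getNodeValue();
            default:
                return node.getNodeName() + " value=" + node.getNodeValue();
        }
    }

    // Describes the root element of a hypothetical document and its text child.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<Q>Hello</Q>")));
            Node q = doc.getDocumentElement();
            return describe(q) + "\n" + describe(q.getFirstChild());
        } catch (Exception e) {
            // Sketch-level error handling only.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```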
Beyond this point, both programs are the same
The remainder of this program is the same as DomTree01, and therefore, doesn’t
merit further discussion.
Run the Programs
I encourage you to copy the code and XML data from Listings 20
through 23 into your text editor. Compile and execute the
programs. Experiment with them, making changes, and observing the
results
of your
changes.
Summary
In this lesson, I showed you how to write a program to display a
DOM tree on the screen in a format that is much easier to interpret
than raw XML code. I explained two different versions of the
program. One version simply identifies text nodes in the
output tree. The other version displays the value of text nodes
in
the output tree. Also, the first version ignores attributes in
the
output tree, while the second version includes attributes in the
output tree.
What’s Next?
In the next lesson, I will explain default XSLT behavior
and show you how to write Java code that mimics that behavior.
The resulting Java code will serve as a skeleton for more advanced
transformation programs.
Complete Program Listings
Complete listings of the Java classes and the XML documents discussed in
this lesson are shown in Listings 20 through 23 below.
/*File DomTree01.java
/*File DomTree02.java
<?xml version="1.0"?>
Copyright 2003, Richard G. Baldwin. Reproduction in whole or
in
part in any form or medium without express written permission from
Richard
Baldwin is prohibited.
About the author
Richard Baldwin
is a college professor (at Austin Community College in Austin, TX) and
private consultant whose primary focus is a combination of Java, C#,
and XML. In addition to the many platform and/or language independent
benefits of Java and C# applications, he believes that a combination of
Java, C#, and XML will become the primary driving force in the delivery
of structured information on the Web.
Richard has participated in numerous consulting projects, and he
frequently provides onsite training at the high-tech companies located
in and around Austin, Texas. He is the author of Baldwin’s
Programming Tutorials, which
has gained a worldwide following among experienced and aspiring
programmers. He has also published articles in JavaPro magazine.
Richard holds an MSEE degree from Southern Methodist University
and has many years of experience in the application of computer
technology to real-world problems.