
Parsing XML Documents: Events, Part 6

Preface

This is the last in a series of six tutorial lessons intended to teach you how to use a SAX-based parser and the Java programming language to parse and process XML documents.

I maintain a consolidated index of hyperlinks to all of my XML articles at my personal website.  You can easily locate and access my XML articles from there.

In order for XML to be useful, you must be able to process your XML documents so as to produce a useful output.  That is what this series of six lessons is all about — processing XML documents.

Preview

One of the common ways to create custom XML processing tools is through the use of an event-based parser that implements the SAX interface, along with either the Java or Python programming language. 

In the previous lesson, entitled Parsing XML Documents: Events, Part 5, I showed you the output produced by processing a simple XML file using a parser program from IBM known as XML4J.  That parser is currently available for free downloading from IBM. The XML file contained an XML syntax error.  The syntax error was purposely introduced into the file to illustrate the error handling capability of a SAX-based XML parser.

Introduction

An event-based parser reports events to the processing program using callbacks. The processing program implements and registers event handlers for the different types of events. Code written into the event handlers is designed to achieve the overall objective of the program. IBM’s XML for Java is a validating XML parser written in 100% pure Java. This is the parser that I will use in the sample program that I will discuss in this lesson. According to IBM, XML4J Version 3.1.1 contains public and stable support of the SAX Level 1 specifications.

A Sample Program

In the previous lessons of this series, I have promised to show you how to write a Java program that uses XML4J to parse a simple XML document.  The first five lessons in the series were intended to prepare you to understand the detailed material that I will provide and discuss in this lesson. I promised that the program will deliver a series of events to the appropriate event handler methods as the parser traverses the XML document, and that the event handler methods will extract and display information about the XML document. In this lesson, I will discuss the processing program in detail.

The XML file that I will be parsing is shown in Listing 1 below:
 

<?xml version="1.0"?>

<bookOfPoems>

<poem PoemNumber="1" 
                  DummyAttribute="dummy value">
<line>Roses are red,</line>
<line>Violets are blue.</line>
<line>Sugar is sweet,</line>
<line>and so are you.</line>
</poem>

<poem PoemNumber="2"
                  DummyAttribute="dummy value">
<line>Twas the night before Christmas,</line>
<line>And all through the house,
<line>Not a creature was stirring,</line>
<line>Not even a mouse.</line>
</poem>

</bookOfPoems>

Listing 1

As you can see from Listing 1, the XML file used with this sample program represents the rudimentary aspects of a book of poems. It contains one verse from each of two well-known poems, one about roses and the other about a mouse.  The XML markup for the first poem is syntactically correct. However, a syntax error was purposely introduced into the second poem to illustrate the error-handling capability of SAX and the IBM parser. The error is in the second line element of the second poem (the one that reads And all through the house,), which is missing its end tag (</line>).

This program uses XML4J-3.1.1, along with the XML file shown in Listing 1, to illustrate the trapping and handling of parser events along with customized error handling. The purpose of the program is to

  • Traverse the XML file
  • Display the elements
  • Display the attributes
  • Display the text of the poems

As mentioned earlier, the first poem has the correct XML syntax, but the second poem is purposely missing an end tag midway through the poem. The program was tested using JDK 1.3 from Sun under WinNT 4.0 using XML4J-3.1.1 from IBM.  The processing results were presented and discussed in the previous lesson entitled Parsing XML Documents: Events, Part 5.  

Listing 2 below shows the output produced by the program. I manually inserted some line breaks to force the output material to fit in this narrow presentation format. I also deleted some blank lines to reduce the overall size of the output listing and colored the error message red to make it easy to spot.
 

Start Document
Start element: bookOfPoems

Start element: poem
Attribute: 
PoemNumber, Value = 1, Type = CDATA
Attribute: 
DummyAttribute, Value = dummy value, 
Type = CDATA

Start element: line
Roses are red,
End element: line

Start element: line
Violets are blue.
End element: line

Start element: line
Sugar is sweet,
End element: line

Start element: line
and so are you.
End element: line

End element: poem

Start element: poem
Attribute: 
PoemNumber, Value = 2, Type = CDATA
Attribute: 
DummyAttribute, Value = dummy value, 
Type = CDATA

Start element: line
Twas the night before Christmas,
End element: line

Start element: line
And all through the house,

Start element: line
Not a creature was stirring,
End element: line

Start element: line
Not even a mouse.
End element: line
 
systemID: file:///D:/Baldwin
/AA-School/JavaProg/Combined/Java
/Sax01.xml
[Fatal Error] Sax01.xml:17:9: 
The element type "line" must be 
terminated by the matching 
end-tag "</line>".
Terminating

Listing 2

You may need to really put your brain in gear for this lesson. It goes much more deeply into technical material than is normally the case in these XML lessons. If the Java technology used in this article is unfamiliar to you, see my online Java tutorials for an explanation of these and other Object-Oriented Programming concepts using Java.

Interesting Code Fragments

I will discuss the sample program in fragments.  A complete listing of the program is shown in Listing 26 near the end of the lesson. The first fragment in Listing 3 shows the required import directives. These directives are shown here to illustrate that the program imports packages that are part of the IBM parser library and are not part of the standard Java API.
 

import org.xml.sax.*;
import org.xml.sax.helpers.ParserFactory;

Listing 3

Identifying the parser package

Listing 4 shows the controlling class and the beginning of the main method for the program. The class definition begins by defining a String that identifies the class from which the parser will be instantiated. The particular string used here identifies the IBM parser in XML4J-3.1.1.  This is a different string than was required in the earlier version of the IBM parser.  Although I haven’t tried it, I believe that this is the only statement that would need to be modified in order to use this program with a SAX based parser from a different vendor.
 

class Sax01 {
  static final String parserClass = 
      "org.apache.xerces.parsers.SAXParser";

  public static void main (
      String args[])throws Exception{
    Parser parser = 
            ParserFactory.makeParser(
                        parserClass);

Listing 4

The first statement inside the main method shown above uses a SAX factory method along with the identification of the parser vendor to create an object of type Parser. This is actually an object of the interface type org.xml.sax.Parser. All SAX parsers must implement this interface. It allows applications to register handlers for different types of events and to initiate a parse from a URI or a character stream.
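To illustrate the second option mentioned above, here is a minimal sketch of a parse initiated from a character stream rather than a URI. It assumes that the JAXP parser bundled with current JDKs stands in for XML4J (the SAXParserFactory route is my assumption, not part of the sample program), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

class StreamParseSketch {
  // Parses XML held in a String and returns the number of elements seen.
  static int countElements(String xml) throws Exception {
    final int[] count = {0};
    // Assumption: the JDK's built-in JAXP parser stands in for XML4J.
    Parser parser = SAXParserFactory.newInstance()
                                    .newSAXParser().getParser();
    parser.setDocumentHandler(new HandlerBase() {
      public void startElement(String name, AttributeList atts) {
        count[0]++;
      }
    });
    // An InputSource can wrap a character stream instead of a URI.
    parser.parse(new InputSource(new StringReader(xml)));
    return count[0];
  }

  public static void main(String[] args) throws Exception {
    System.out.println(countElements(
        "<poem><line>Roses are red,</line></poem>"));
  }
}
```

The same Parser object also accepts a system identifier (a URI string) in its other parse() method, which is the form used by the sample program in this lesson.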

Completely as an aside, in case you, like many others, are having difficulty separating URI, URL, URN, and URC in your mind, Listing 5 contains a quote from a W3C document that explains the differences in the terms.  I provide this here simply as reference material.  I won’t attempt to explain it further.
 

URI — Uniform Resource Identifier. The generic set of all names/addresses that are short strings that refer to resources. (specified 1994; ratified as Internet Draft Standard 1998)

URL — Uniform Resource Locator. The set of URI schemes that have explicit instructions on how to access the resource on the internet. Full definition is given in the URL specification. 

URN — Uniform Resource Name.
1.An URI that has an institutional commitment to persistence, availability, etc. Note that this sort of URI may also be a URL. See, for example, PURLs.

2.A particular scheme which is currently (1991,2,3,4,5,6,7) under development in the IETF (see discussion forums below), which should provide for the resolution using internet protocols of names which have a greater persistence than that currently associated with internet host names or organizations. When defined, a URN(2) will be an example of a URI.

URC — Uniform Resource Citation, or Uniform Resource Characteristics. A set of attribute/value pairs describing a resource. Some of the values may be URIs of various kinds. Others may include, for example, authorship, publisher, datatype, date, copyright status and shoe size. Not normally discussed as a short string, but a set of fields and values with some defined free formatting.

Listing 5

Listing 6 shows what the Java documentation has to say about the class named org.xml.sax.helpers.ParserFactory.
 

Java-specific class for dynamically loading SAX parsers. 

This class is not part of the platform-independent definition of SAX; it is an additional convenience class designed specifically for Java XML application writers. SAX applications can use the static methods in this class to allocate a SAX parser dynamically at run-time based either on the value of the ‘org.xml.sax.parser’ system property or on a string containing the class name.

Listing 6

Listing 7 tells us what Clifford J. Berg, author of Advanced JAVA Development for Enterprise Applications, has to say about factory methods in general:
 

A class you have defined that has a method createInstance() — or any method — that has the function of creating an instance based on runtime or configuration criteria such as property settings.

Listing 7

The bottom line is that the makeParser() method of the ParserFactory class creates an instance (object) of a class that implements the Parser interface. The object is based on a String that specifies the class libraries provided by the vendor of the SAX based parser software. This parser object can then be used to perform the routine processing of the XML file, generating a series of document events and potentially error events based on the information in the file.
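The general idea behind such a factory method can be sketched with ordinary Java reflection. This is only an illustration of the concept, not the actual source code of ParserFactory; the class FactorySketch and the method makeInstance() are hypothetical names that I made up for this example.

```java
import org.xml.sax.DocumentHandler;

class FactorySketch {
  // Hypothetical helper: loads the named class at run time, creates an
  // instance with its no-argument constructor, and checks its type.
  static <T> T makeInstance(String className, Class<T> type)
      throws Exception {
    Class<?> cls = Class.forName(className);
    Object obj = cls.getDeclaredConstructor().newInstance();
    return type.cast(obj);
  }

  public static void main(String[] args) throws Exception {
    // The caller knows only the interface; the class is named by a String.
    DocumentHandler handler =
        makeInstance("org.xml.sax.HandlerBase", DocumentHandler.class);
    System.out.println(handler.getClass().getName());
  }
}
```

Because the class name is just a String, it can come from a constant, a property setting, or a configuration file, and the rest of the program only depends on the interface type.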

The code in Listing 8 instantiates an object of the DocumentHandler type to handle events and errors. Note that DocumentHandler is an interface, not a class. I will explain how this object performs its work in conjunction with a discussion of the EventHandler class later.
 

    DocumentHandler handler = 
                  new EventHandler();

Listing 8

The two statements in Listing 9 below can be confusing to persons who have become used to Java Beans design patterns.
 

  parser.setDocumentHandler(handler);
  parser.setErrorHandler(
              (ErrorHandler)handler);

Listing 9

Generally the design pattern specifications indicate that:

  • Methods that are used to register event listeners should begin with the word add
  • Methods that provide mutable access to properties should begin with the word set.

However, the two statements in Listing 9 invoke methods that begin with the word set to register two different listeners on the Parser object.  These are not property methods; they are registration methods. One of the handlers listens for document events such as the start or end of an element. The other handler listens for events caused by errors in the XML data.

Document event methods and error event methods are declared in two different interfaces. The handler object instantiated in Listing 8 above is of the type EventHandler. Its superclass, HandlerBase, implements both interfaces, making it possible for an object of that type to listen for both types of events. However, because the reference in Listing 8 is declared as type DocumentHandler, it must be cast to type ErrorHandler before it can be registered as the error handler, as shown in Listing 9.
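The following sketch shows why the cast is needed and one way it could be avoided. The nested EventHandler class here is an empty stand-in for the real one, and CastSketch is a hypothetical name; this is a variation on the sample program, not the code that the program actually uses.

```java
import org.xml.sax.DocumentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.HandlerBase;

class CastSketch {
  // Empty stand-in for the EventHandler class discussed in this lesson.
  static class EventHandler extends HandlerBase { }

  public static void main(String[] args) {
    // Declared as DocumentHandler: the compiler knows nothing about
    // ErrorHandler, so an explicit cast is required.
    DocumentHandler asDocHandler = new EventHandler();
    ErrorHandler viaCast = (ErrorHandler) asDocHandler;

    // Declared as HandlerBase: both assignments compile without a cast
    // because HandlerBase implements both interfaces.
    HandlerBase asBase = new EventHandler();
    DocumentHandler docHandler = asBase;
    ErrorHandler errHandler = asBase;

    System.out.println(viaCast instanceof DocumentHandler);  // prints true
    System.out.println(docHandler == errHandler);            // prints true
  }
}
```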

The single executable statement in Listing 10 below is where the action is centered. This statement executes the parse() method on the object of type Parser causing it to make a pass through the XML document specified by the parameter having the value Sax01.xml.  If you would like to use this program to parse a different XML document, you will need to change the value of this parameter to match the name of your XML document.
 

parser.parse("Sax01.xml");

Listing 10

While making the pass through the document, this method generates a variety of document events and error events as the various tags, attributes, and data values in that document are encountered. This, in turn, causes event and error handling methods overridden by the application programmer to be executed, providing the functional behavior of the program. The statement in Listing 10 above ends the main() method and also ends the controlling class.

The code fragment in Listing 11 below begins the definition of the class containing overridden methods for handling document events and error events.
 

class EventHandler 
                 extends HandlerBase{
  //handle startDocument event
  public void startDocument(){
    System.out.println(
                   "Start Document");
  }//end startDocument()
    
  //handle endDocument event
  public void endDocument(){
    System.out.println(
                     "End Document");
  }//end endDocument()

Listing 11

This class extends the class named HandlerBase. The class named HandlerBase, which is the default base class for handlers, implements the default behavior for four different SAX interfaces:

  • DocumentHandler
  • ErrorHandler
  • EntityResolver
  • DTDHandler

Only the first two of these interfaces are of interest to us in this lesson.

The use of the HandlerBase class is optional. Application writers can extend this class when they need to implement only part of an interface. Parser writers can instantiate this class to provide default handlers when the application has not supplied its own.

The EventHandler class overrides the event handling methods of the DocumentHandler interface and the ErrorHandler interface to provide the desired functionality for the program. The fragment in Listing 11 above shows the beginning of the class along with the first two overridden event-handling methods. The Parser object invokes these two overridden methods when the parse process encounters the beginning and the end of the XML document. The default versions of these two methods return quietly doing nothing. Application writers can override the startDocument() method to take specific actions at the beginning of a document (such as creating an output file, for example). Similarly the application writer can override endDocument() to take specific action at the end of a document (such as closing a file). Note that these methods don’t receive any parameters. In this sample program, these overridden methods simply announce the beginning and the end of the document.
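As a rough sketch of the kind of setup and cleanup work mentioned above, the following handler prepares an output destination in startDocument() and finishes it in endDocument(). A StringWriter stands in for an output file, the class name ReportSketch is hypothetical, and the use of the JDK's bundled JAXP parser in place of XML4J is my assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

class ReportSketch extends HandlerBase {
  private StringWriter out;   // stands in for an output file
  private int elementCount;

  // Called once, before any other document event: do the setup here.
  public void startDocument() {
    out = new StringWriter();
    elementCount = 0;
    out.write("Report begins\n");
  }

  public void startElement(String name, AttributeList atts) {
    elementCount++;
  }

  // Called once, after all other document events: finish the output here.
  public void endDocument() {
    out.write("Elements: " + elementCount + "\n");
    out.flush();  // a real output file would be closed at this point
  }

  static String run(String xml) throws Exception {
    ReportSketch handler = new ReportSketch();
    // Assumption: the JDK's built-in JAXP parser stands in for XML4J.
    Parser parser = SAXParserFactory.newInstance()
                                    .newSAXParser().getParser();
    parser.setDocumentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.print(run("<a><b/><b/></a>"));
  }
}
```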

The next overridden handler method, shown in Listing 12 below, is more complicated than most in this article. This method is invoked at the start of every element. For review, the start or beginning of an element might look like this in an XML document:

<poem PoemNumber="1"
DummyAttribute="dummy value">

The name="value" pairs that follow the element name are commonly referred to as attributes. Unless prohibited by a DTD, an element can contain zero or more attributes. In this case, the element named poem contains two attributes.  The attributes are named PoemNumber and DummyAttribute (the name of the attribute is unrelated to the name of the element). Each attribute also has a value, which is enclosed in double quotation marks. In this case, the values for the two attributes are 1 and dummy value.

The event handler method that gets called when the parser encounters a new element is startElement(), as shown in Listing 12 below. This method receives two parameters. The first parameter is a String containing the name of the element. The second parameter is a reference to an object of type AttributeList containing information about the attributes.  If you examine the code, you will note that this object is an indexable container, which can contain information about zero or more attributes. The code in the following fragment iterates through the AttributeList object, extracting and displaying information about each of the attributes described by that object.
 

  public void startElement(
                String name,
                AttributeList atts){
    System.out.println(
          "Start element: " + name);
    if (atts != null) {
      int len = atts.getLength();
      //process all attributes
      for (int i = 0; i < len; i++){
        String attName = 
                    atts.getName(i);
        String type = 
                    atts.getType(i);
        String value = 
                   atts.getValue(i);
        System.out.println(
             "Attribute: " + attName 
              + ", Value = " + value 
              + ", Type = " + type);
      }//end for loop on attributes
    }//end if
  }//end start element

Listing 12

Note that AttributeList is an interface that is implemented by the parser vendor, so the vendor programmer is free to implement the container in whatever way she chooses. An AttributeList object includes only attributes that have been specified or defaulted: #IMPLIED attributes are not included. Here is some information about attributes of the #IMPLIED variety:
 

The XML document may provide a value for the attribute but is not required to do so. In this case, if no value is provided, an application-dependent value will be used. For example, for an IMPLIED attribute named backgroundColor, an XML processor might accept a value if provided in the XML document, and might cause the background color to be green if an attribute value is not provided. A different XML processor might cause the same default background color to be red. That is what I mean by application-dependent value.

Listing 13
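The difference between defaulted and #IMPLIED attributes can be seen in a small experiment. The sketch below parses a one-element document whose internal DTD declares a required attribute, an attribute with a default value, and an #IMPLIED attribute. The DTD, the attribute names language and note, and the class name are all hypothetical, and the JDK's bundled JAXP parser is assumed in place of XML4J.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

class DefaultedAttributeSketch {
  // Only PoemNumber is written in the start tag. The DTD supplies a
  // default for language and declares note as #IMPLIED.
  static final String XML =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE poem [\n"
    + "  <!ELEMENT poem EMPTY>\n"
    + "  <!ATTLIST poem PoemNumber CDATA #REQUIRED\n"
    + "                 language   CDATA \"en\"\n"
    + "                 note       CDATA #IMPLIED>\n"
    + "]>\n"
    + "<poem PoemNumber=\"1\"/>";

  // Returns name=value for every attribute reported to startElement().
  static List<String> reportedAttributes(String xml) throws Exception {
    final List<String> found = new ArrayList<String>();
    // Assumption: the JDK's built-in JAXP parser stands in for XML4J.
    Parser parser = SAXParserFactory.newInstance()
                                    .newSAXParser().getParser();
    parser.setDocumentHandler(new HandlerBase() {
      public void startElement(String name, AttributeList atts) {
        if (atts != null) {
          for (int i = 0; i < atts.getLength(); i++) {
            found.add(atts.getName(i) + "=" + atts.getValue(i));
          }
        }
      }
    });
    parser.parse(new InputSource(new StringReader(xml)));
    return found;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(reportedAttributes(XML));
  }
}
```

With this document, I would expect the list to contain the specified PoemNumber attribute and the defaulted language attribute, but no entry for note, because it is #IMPLIED and was not specified.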

There are two ways for the application to obtain information from the AttributeList.

  1. It can iterate through the entire list as in Listing 12 above.
  2. It can request the value or type of specific attributes as in the code in Listing 14 below where the name of the attribute is passed as a parameter.  This explains why it is not simply a Java array object but is a more sophisticated container.

Note that the formulation shown in Listing 14 is not used in this sample program.
 

public void startElement(
                 String name, 
                 AttributeList atts){
  String identifier = 
                 atts.getValue("id");
  String label = 
              atts.getValue("label");
   […]
}//end startElement()

Listing 14
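For readers who want to try the lookup-by-name approach, here is a small runnable sketch. It relies on the documented behavior that getValue() returns null when the list contains no attribute with the given name. The class and method names are hypothetical, the input is assumed to contain a single element, and the JDK's bundled JAXP parser is assumed in place of XML4J.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

class NamedLookupSketch {
  // Returns the value of the named attribute on the document's only
  // element, or null if the element has no such attribute.
  static String lookup(String xml, final String attName)
      throws Exception {
    final String[] result = {null};
    // Assumption: the JDK's built-in JAXP parser stands in for XML4J.
    Parser parser = SAXParserFactory.newInstance()
                                    .newSAXParser().getParser();
    parser.setDocumentHandler(new HandlerBase() {
      public void startElement(String name, AttributeList atts) {
        result[0] = atts.getValue(attName);  // lookup by name, no loop
      }
    });
    parser.parse(new InputSource(new StringReader(xml)));
    return result[0];
  }

  public static void main(String[] args) throws Exception {
    String xml =
        "<poem PoemNumber=\"1\" DummyAttribute=\"dummy value\"/>";
    System.out.println(lookup(xml, "PoemNumber"));  // prints 1
    System.out.println(lookup(xml, "id"));          // prints null
  }
}
```

Code that uses this approach should be prepared for the null return value, since an optional attribute may simply be absent.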

The output produced for the first element and the attributes of that element for each of the poems in this article is shown in Listing 15 below. (Note that line breaks and spaces were manually inserted to force the material to fit into this narrow presentation format.)
 

Start element: poem
Attribute: PoemNumber,
           Value = 1,
           Type = CDATA
Attribute: DummyAttribute,
           Value = dummy value,
           Type = CDATA

Start element: poem
Attribute: PoemNumber,
           Value = 2,
           Type = CDATA
Attribute: DummyAttribute,
           Value = dummy value,
           Type = CDATA

Listing 15

The name and value of the attribute are pretty obvious, but what about the type CDATA?  Here is some information about the concept of CDATA in XML:
 

CDATA means that the value of this attribute may be any string of characters (as well as an empty string) and should be ignored by the parser. CDATA is used in situations where it is impossible to force more strict limitations on the attribute value with one of the following keywords…

Listing 16

There are three allowable types for an attribute:

  • String type, such as CDATA
  • Tokenized types
  • Enumerated types, such as (true | false)

I’m going to drop this discussion on CDATA at this point. If you would like to pursue it further, read my tutorial that contains detailed information about DTDs.

The endElement() handler method is much simpler

Because it doesn’t need to deal with attributes, the overridden endElement() event handler, shown in Listing 17 below, is much simpler than the startElement() method shown in Listing 12. The endElement() method is invoked when the parser encounters an end tag for an element. This method receives a single parameter that is the name of the element. This overridden version simply announces that the event has occurred and displays the name of the element.
 

  public void endElement(
                        String name){
    System.out.println(
             "End element: " + name);
  }//end endElement()

Listing 17

The content of an XML element is the text that appears between the beginning and ending tags. The characters() method, shown in Listing 19 below, is the event handler that the parser invokes when it encounters content. Listing 18 tells us what the documentation has to say about the characters() method:
 

public void characters(char[] ch,
              int start,
              int length)
              throws SAXException

Receive notification of character data. 

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.

The application must not attempt to read from the array outside of the specified range.

Note that some parsers will report whitespace using the ignorableWhitespace() method rather than this one (validating parsers must do so).

Parameters:
ch – The characters from the XML document.
start – The start position in the array.
length – The number of characters to read from the array.

Listing 18

This method receives a character array (along with some indexing information) containing the content of an element. The overridden version of the method in this sample program simply converts the array to a String object and displays it.
 

  public void characters(char[] ch,
                         int start,
                         int length){
    System.out.println(
      new String(ch, start, length));
  }//end characters()

Listing 19

This overridden method produced the lines of poem text in the output for the first poem shown in Listing 20 (the line between each Start element and End element pair).
 

Start element: line
Roses are red,
End element: line

Start element: line
Violets are blue.
End element: line

Start element: line
Sugar is sweet,
End element: line

Start element: line
and so are you.
End element: line

Listing 20
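Because the documentation quoted in Listing 18 allows a parser to deliver the content of one element in several chunks, a handler that needs the complete text of an element may want to accumulate the chunks and use them only when the end tag arrives. The following is a minimal sketch of that idea, not the approach taken by the sample program. The class name LineCollector is hypothetical, and the JDK's bundled JAXP parser is assumed in place of XML4J.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

class LineCollector extends HandlerBase {
  private final StringBuilder text = new StringBuilder();
  final List<String> lines = new ArrayList<String>();

  // A new element begins: discard any text collected so far.
  public void startElement(String name, AttributeList atts) {
    text.setLength(0);
  }

  // May be called several times for the content of a single element.
  public void characters(char[] ch, int start, int length) {
    text.append(ch, start, length);
  }

  // The element is complete: the buffer now holds all of its text.
  public void endElement(String name) {
    if ("line".equals(name)) {
      lines.add(text.toString());
    }
  }

  static List<String> collect(String xml) throws Exception {
    LineCollector handler = new LineCollector();
    // Assumption: the JDK's built-in JAXP parser stands in for XML4J.
    Parser parser = SAXParserFactory.newInstance()
                                    .newSAXParser().getParser();
    parser.setDocumentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.lines;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(collect(
        "<poem><line>Roses are red,</line>"
      + "<line>Violets &amp; blue.</line></poem>"));
  }
}
```

The entity reference in the second line is the sort of content that some parsers report in more than one call to characters(), which is why the text is assembled in endElement() rather than printed chunk by chunk.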

That completes my discussion of overridden methods of the DocumentHandler interface. The above examples have shown all of the methods of this interface except for the following:

  • ignorableWhitespace(char[] ch, int start, int length)
  • processingInstruction(java.lang.String target, java.lang.String data)
  • setDocumentLocator(Locator locator)

I will leave it as an exercise for you to investigate the first two methods in this list. The third method will be used later in this sample program. That brings us to the methods that are declared in the interface named ErrorHandler. This interface, which declares three different handler methods, is the basic interface for SAX error handlers.

A SAX application that needs to implement customized error handling must implement this interface. Then it must register an object of the interface type with the SAX parser using the parser’s setErrorHandler() method. The parser will then report all errors and warnings through this interface.

When the handler object is registered on the parser, the parser will use this object instead of throwing an exception. It is then up to the application to decide what to do about the problem, including whether to throw an exception for different types of errors and warnings. Note that there is no requirement for the parser to continue to provide useful information after a call to the fatalError() method.

The HandlerBase class provides a default implementation of this interface, ignoring warnings and recoverable errors and throwing a SAXParseException for fatal errors. For convenience, an application can extend that class, as was done in this sample program, rather than implementing the complete interface itself.
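The default behavior just described can be observed with a plain HandlerBase object that overrides nothing. In the sketch below, a document with a missing end tag causes the default fatalError() method to throw a SAXParseException, which the calling code catches. The class and method names are hypothetical, and the JDK's bundled JAXP parser is assumed in place of XML4J.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXParseException;

class DefaultErrorSketch {
  // Parses the XML using an unmodified HandlerBase for both roles and
  // describes the outcome.
  static String parseAndDescribe(String xml) throws Exception {
    // Assumption: the JDK's built-in JAXP parser stands in for XML4J.
    Parser parser = SAXParserFactory.newInstance()
                                    .newSAXParser().getParser();
    HandlerBase handler = new HandlerBase();
    parser.setDocumentHandler(handler);
    parser.setErrorHandler(handler);
    try {
      parser.parse(new InputSource(new StringReader(xml)));
      return "ok";
    } catch (SAXParseException ex) {
      // Thrown by the default fatalError() method of HandlerBase.
      return "fatal at line " + ex.getLineNumber()
           + ": " + ex.getMessage();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
        parseAndDescribe("<poem><line>a</line></poem>"));
    System.out.println(
        parseAndDescribe("<poem><line>a</poem>"));
  }
}
```

Catching the exception in the calling code is one alternative to terminating the program from inside the handler. Which is preferable depends on whether the application has any useful work left to do after a fatal error.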

The overridden versions of all three of the error handler methods are shown in Listing 21 below. All three of the methods make a call to the method named getLocationString() to determine the location of the problem in the XML document and to display that location along with the nature of the message. The getLocationString() method is discussed later. In addition, the fatalError() method terminates the program after displaying a termination message.
 

  public void warning(
               SAXParseException ex){
    System.out.println("[Warning] " +
             getLocationString(ex)+ 
             ": " + ex.getMessage());
  }//end warning()

  //handle an error
  public void error(
               SAXParseException ex){
    System.out.println("[Error] "+
             getLocationString(ex)+
             ": " + ex.getMessage());
  }//end error()

  //handle a fatal error
  public void fatalError(
                SAXParseException ex)
                throws SAXException {
    System.out.println(
             "[Fatal Error] "+
              getLocationString(ex) +
             ": " + ex.getMessage());
    System.out.println(
                      "Terminating");
    System.exit(1);
  }//end fatalError()

Listing 21

Listing 22 below shows the beginning of a private utility method named getLocationString(). This method is called by each of the error handling methods shown in Listing 21 to determine the location in the XML file where the error was detected by the parser. The method creates a StringBuffer object that is used to build up the desired information.  The contents of this object are later converted to a String, which is returned to the calling method.
 

  private String getLocationString(
               SAXParseException ex){
    StringBuffer str = 
                  new StringBuffer();

Listing 22

The first task undertaken by the getLocationString() method is to determine the name of the XML file being processed when the error occurred. This information, and other useful information as well, is contained in the SAXParseException object received by the error handler and passed on to this method as a parameter. The following methods of the SAXParseException class are of interest in this article:

  • getColumnNumber() — Get the column number (int) of the end of the text where the exception occurred.
  • getLineNumber() — Get the line number (int) of the end of the text where the exception occurred.
  • getSystemId() — Get the system identifier (String)of the entity where the exception occurred.

I’m going to begin the discussion by showing you the output produced on my computer by purposely omitting an end tag from one of the lines. (Note that I manually inserted line breaks to force the material to fit in this format.)  The error message is shown in Listing 23 below.
 

systemID: file:///D:/Baldwin
/AA-School/JavaProg/Combined/Java
/Sax01.xml
[Fatal Error] Sax01.xml:17:9: 
The element type "line" must be 
terminated by the matching 
end-tag "</line>".
Terminating

Listing 23

The beginning portions of the code that produced this output are shown in Listing 24.
 

    String systemId = ex.getSystemId();
    if(systemId != null){
      System.out.println(
            "systemID: " + systemId);
      //get file name from end of
      // systemID
      int index = 
           systemId.lastIndexOf('/');
      if(index != -1){
        systemId = systemId.
                substring(index + 1);
      }//end if(index..
      str.append(systemId);
    }//end if(systemID…

Listing 24

The complete output shown in Listing 23 was produced by a combination of the getLocationString() method and the fatalError() method shown earlier in Listing 21. Part of the output was produced by the fatalError() method using the String object returned by the getLocationString() method.  As you can see from the error output shown earlier in Listing 23, the String that was returned by the getSystemId() method is the URL for the XML file on the local drive (D:). Although there is quite a bit of code involved, all that it does is extract the filename from the end of the URL and append it at the beginning of the StringBuffer object being constructed for return to the calling method.

The next fragment, shown in Listing 25 below, completes the construction of the StringBuffer object by getting the line and column number of the location of the problem in the XML file using the two methods described earlier. This information is appended onto the StringBuffer object with some colons added for cosmetic purposes. Finally, the StringBuffer object is converted to a String object and returned to the calling error handler method where it is displayed on the screen.
 

    str.append(':');
    str.append(ex.getLineNumber());
    str.append(':');
    str.append(ex.getColumnNumber());

    return str.toString();

  }//end getLocationString()

Listing 25
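The logic of getLocationString() can be exercised without parsing anything by constructing a SAXParseException by hand. The sketch below is a condensed variation on the method (it uses StringBuilder and omits the systemID display), and the names LocationSketch and locationString() are hypothetical.

```java
import org.xml.sax.SAXParseException;

class LocationSketch {
  // Builds a string of the form filename:line:column from the
  // location information carried by the exception.
  static String locationString(SAXParseException ex) {
    StringBuilder str = new StringBuilder();
    String systemId = ex.getSystemId();
    if (systemId != null) {
      // Keep only the file name at the end of the URL.
      int index = systemId.lastIndexOf('/');
      if (index != -1) {
        systemId = systemId.substring(index + 1);
      }
      str.append(systemId);
    }
    str.append(':').append(ex.getLineNumber());
    str.append(':').append(ex.getColumnNumber());
    return str.toString();
  }

  public static void main(String[] args) {
    // Arguments: message, publicId, systemId, line number, column number.
    SAXParseException ex = new SAXParseException(
        "missing end tag", null,
        "file:///D:/Baldwin/AA-School/JavaProg/Combined/Java/Sax01.xml",
        17, 9);
    System.out.println(locationString(ex));  // prints Sax01.xml:17:9
  }
}
```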

Summary

So, there you have it; a six-part series of tutorial lessons intended to teach you how to use a SAX-based parser to write your own XML processing programs in the Java programming language. Now you know what SAX is, and why it is important to Java programmers writing applications to process XML documents.  You also know how to use the capability of SAX to process XML documents.

Complete Program Listing

A complete listing of the program is shown in Listing 26.  Once you have the proper Java development software and XML4J installed, you should be able to copy this code into a Java source file, compile, and execute it providing any XML file named Sax01.xml as input.
 

/*File Sax01.java
Revised 1/20/01 to use XML4J 3.0 
upgrade.

Illustrates parser events and
customized error handling.

An XML file contains two poems.  The 
first has the correct syntax. The 
second is missing an end tag midway 
through the poem.

The first poem is parsed and displayed
successfully along with the element 
names and attribute values.

The second poem is also displayed but 
a fatal error occurs at the point where
the parser is able to determine that 
the end tag is missing.  Note, however,
that this determination is not made 
until several lines beyond the actual 
missing tag.

The program was tested using JDK 1.3 
under WinNT 4.0 and IBM XML4J-3_1_1.

Output from the program is as shown 
below.  Note that the error messages 
were sent to System.out instead of 
System.err so that they could be 
captured and reproduced here.  Note 
also that some line breaks were 
manually inserted to force the material
to fit in this format.  Also, a lot of
blank lines were deleted.

Note that the end tag was missing 
following the line in the poem that 
reads "And all through the house,"
  
Start Document
Start element: bookOfPoems
Start element: poem
Attribute: 
PoemNumber, Value = 1, Type = CDATA
Attribute: 
DummyAttribute, Value = dummy value, 
Type = CDATA
Start element: line
Roses are red,
End element: line
Start element: line
Violets are blue.
End element: line
Start element: line
Sugar is sweet,
End element: line
Start element: line
and so are you.
End element: line
End element: poem
Start element: poem
Attribute: 
PoemNumber, Value = 2, Type = CDATA
Attribute: 
DummyAttribute, Value = dummy value, 
Type = CDATA
Start element: line
Twas the night before Christmas,
End element: line
Start element: line
And all through the house,
Start element: line
Not a creature was stirring,
End element: line
Start element: line
Not even a mouse.
End element: line
systemID: 
file:///D:/Baldwin/AA-School/JavaProg
/Combined/Java/Sax01.xml
[Fatal Error] Sax01.xml:17:9: The 
element type "line" must be terminated
by the matching end-tag "</line>".
Terminating
**************************************/
import org.xml.sax.*;
import org.xml.sax.helpers.ParserFactory;

class Sax01 {
  static final String parserClass = 
          "org.apache.xerces.parsers.SAXParser";

  public static void main (
        String args[])throws Exception{
    Parser parser = 
              ParserFactory.makeParser(
                          parserClass);
    //Instantiate an event and error 
    // handler
    DocumentHandler handler = 
                    new EventHandler();
    //Register the event handler and
    // the error handler
    parser.setDocumentHandler(handler);
    parser.setErrorHandler(
                (ErrorHandler)handler);
    //Parse the document to create
    // the events.
    parser.parse("Sax01.xml");
  }//end main
}//end class Sax01
//===================================//

//Methods of this class are listeners
// for document events and error events
// Note that HandlerBase implements
// the ErrorHandler interface.
class EventHandler extends HandlerBase{
  //handle startDocument event
  public void startDocument(){
    System.out.println(
                     "Start Document");
  }//end startDocument()
    
  //handle endDocument event
  public void endDocument(){
    System.out.println("End Document");
  }//end endDocument()

  //handle startElement event
  // displaying attributes
  public void startElement(
       String name,AttributeList atts){
    System.out.println(
             "Start element: " + name);
    if (atts != null) {
      int len = atts.getLength();
      //process all attributes
      for (int i = 0; i < len; i++) {
        String attName = 
                       atts.getName(i);
        String type = atts.getType(i);
        String value = 
                      atts.getValue(i);
        System.out.println(
              "Attribute: " + attName 
              + ", Value = " + value 
              + ", Type = " + type);
      }//end for loop on attributes
    }//end if
  }//end start element

  //handle endElement event
  public void endElement (String name){
    System.out.println(
               "End element: " + name);
  }//end endElement()
      
  //handle characters event
  public void characters(
       char[] ch,int start,int length){
    System.out.println(
        new String(ch, start, length));
  }//end characters()
    
  //Begin error handlers here.  These
  // methods are declared in the
  // ErrorHandler interface that is
  // implemented by the HandlerBase
  // class and extended by this class.
  
  //Handle a warning
  public void warning(
                 SAXParseException ex){
    System.out.println("[Warning] " +
              getLocationString(ex)+ 
               ": " + ex.getMessage());
  }//end warning()

  //handle an error
  public void error(
                SAXParseException ex) {
    System.out.println("[Error] "+
              getLocationString(ex)+
               ": " + ex.getMessage());
  }//end error()

  //handle a fatal error
  public void fatalError(
                  SAXParseException ex)
                  throws SAXException {
    System.out.println(
             "[Fatal Error] "+
              getLocationString(ex) +
               ": " + ex.getMessage());
    System.out.println("Terminating");
    System.exit(1);
  }//end fatalError()
  
  //Private method called by error
  // handlers to return information
  // regarding the point in the
  // document where the error was
  // detected by the parser.  
  private String getLocationString(
                 SAXParseException ex){
    StringBuffer str = 
                    new StringBuffer();

    //get SystemId, display it, and
    // use it to get the  name of the
    // file being parsed
    String systemId = ex.getSystemId();
      if(systemId != null){
        System.out.println(
              "systemID: " + systemId);
        //get file name from end of
        // systemID
        int index = 
             systemId.lastIndexOf('/');
        if(index != -1){
          systemId = systemId.
                  substring(index + 1);
        }//end if(index..
        str.append(systemId);
      }//end if(systemID…
      //now get and append location
      // information
      str.append(':');
      str.append(ex.getLineNumber());
      str.append(':');
      str.append(ex.getColumnNumber());

      return str.toString();

    }//end getLocationString()

}//end class EventHandler
//===================================//

Listing 26
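
The listing above is written against the SAX Level 1 interfaces (Parser, DocumentHandler, HandlerBase, AttributeList, and ParserFactory), which were later deprecated in favor of SAX2. For readers working with a more recent JDK, the following is a minimal sketch of the same event-handling idea using SAX2's DefaultHandler and the JAXP SAXParserFactory. It is not part of the original program. The names Sax2Sketch, EventCollector, and parse are hypothetical, the sketch assumes the parser bundled with the JDK rather than XML4J, and it collects events in a list and skips whitespace-only text instead of printing everything as the listing does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 counterpart of Sax01; not the author's program.
class Sax2Sketch {

  // Records each event as a string so the result can be inspected
  // after the parse (the original listing prints instead).
  static class EventCollector extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
      events.add("Start Document");
    }

    @Override
    public void endDocument() {
      events.add("End Document");
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
      events.add("Start element: " + qName);
      for (int i = 0; i < atts.getLength(); i++) {
        events.add("Attribute: " + atts.getQName(i)
            + ", Value = " + atts.getValue(i)
            + ", Type = " + atts.getType(i));
      }
    }

    @Override
    public void endElement(String uri, String localName,
                           String qName) {
      events.add("End element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      // Assumption: whitespace-only text is not interesting here.
      String text = new String(ch, start, length).trim();
      if (!text.isEmpty()) {
        events.add(text);
      }
    }

    @Override
    public void fatalError(SAXParseException ex)
        throws SAXParseException {
      events.add("[Fatal Error] " + ex.getLineNumber() + ":"
          + ex.getColumnNumber() + ": " + ex.getMessage());
      throw ex; // stop parsing instead of calling System.exit
    }
  }

  // Parses the XML text and returns the events seen, including a
  // fatal error record if the document is not well formed.
  static List<String> parse(String xml) {
    EventCollector handler = new EventCollector();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (SAXParseException ex) {
      // Already recorded by fatalError(); parsing stops here.
    } catch (Exception ex) {
      throw new IllegalStateException(
          "Parser setup or I/O failed", ex);
    }
    return handler.events;
  }

  public static void main(String[] args) {
    // A fragment with a missing </line> end tag, in the spirit of
    // the second poem described in the listing's header comment.
    String xml = "<poem PoemNumber=\"2\">\n"
        + "<line>And all through the house,\n"
        + "<line>Not a creature was stirring,</line>\n"
        + "</poem>";
    for (String event : parse(xml)) {
      System.out.println(event);
    }
  }
}
```

As in the original program, the parser should report the missing end tag only when it reaches a point where the document can no longer be well formed, which may be later than the line where the tag was actually omitted. Rethrowing the exception from fatalError, rather than calling System.exit, is a design choice of this sketch that lets the caller decide how to proceed.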


Copyright 2000, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without  express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin (baldwin.richard@iname.com) is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two.  He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin’s Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.
