Parsing XML Documents: Events, Part 1
When processing your XML documents, you will probably have to choose between standard tools and custom tools. As time goes on, more and more standard tools will become available for processing your XML documents. If none of the standard tools suits your needs, you can create your own custom tools.
For the past several weeks, I have been teaching you how to transform and render information from your XML documents using XSLT. In this case, I have been teaching you how to use the built-in XSLT tools of Internet Explorer. Because this is a built-in capability of Internet Explorer, I would consider this to be a standard tool. In order to use XSLT, you must learn how to write scripts using the transformation language. Although there is still much to be learned in the XSLT area, I have decided to temporarily set that topic aside and spend the next few weeks discussing another equally important topic. In this series of lessons, I will teach you how to use the Java programming language (and possibly the Python programming language as well) to create custom tools. These custom tools will parse and process XML documents.
Preview
One of the common ways to create custom XML processing tools is through the use of event-driven programming. In this lesson, I will introduce you to the concept of event-driven programming using a very simple example. In the lessons that follow, I will expand on that example, and will then show you how to use a SAX parser with event-driven programming to parse and process XML documents.
Virtually all of the previous discussions surrounding XSLT have involved the use of a web browser (Internet Explorer) to process and render XML documents. However, XML is not restricted to being used with web browsers. XML is also useful in a variety of applications that do not involve web browsers, and might not even involve web servers. In such cases, you may need to write your own program to parse, process, and render your XML documents.
Very often, the task of processing an XML document will involve parsing the document before applying some processing algorithm. By parsing, I mean subdividing a stream of XML text into its various components such as elements, attributes, content, etc.
There are at least two different ways to parse an XML document. One way involves reading the document and creating a tree structure that represents the document. The processing algorithm is then applied to the tree structure. This involves something commonly referred to as the Document Object Model, or DOM for short. I will discuss the DOM in more detail in a subsequent lesson.
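As a rough illustration of the tree-based approach, here is a minimal sketch that assumes the JAXP classes bundled with the standard Java library (DocumentBuilderFactory and DocumentBuilder). The class name DomSketch, the helper methods, and the sample document are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomSketch {
    // Hypothetical sample document, used only for this illustration.
    static final String SAMPLE =
        "<library><book>Java</book><book>XML</book></library>";

    // Step 1: read the whole document and build a tree in memory.
    static Document buildTree(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    // Step 2: apply the processing algorithm to the finished tree.
    // Here the "algorithm" simply counts elements with a given name.
    static int countElements(Document tree, String tagName) {
        return tree.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) {
        Document tree = buildTree(SAMPLE);
        System.out.println("Root element: "
            + tree.getDocumentElement().getTagName());
        System.out.println("book elements: " + countElements(tree, "book"));
    }
}
```

Note that nothing is processed until the entire tree exists, which is the defining trait of this approach.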
Another way to parse a document is to analyze the XML document as a stream of text, recognizing the various components as they are encountered, and applying the processing algorithm as the components are recognized. A very common way to implement this approach is to use an event-based parser (the approach often referred to as SAX) that examines the sequence of characters that make up the XML text and raises events (such as the start and end of elements) as the components in the document are encountered. There are advantages and disadvantages to each approach, depending on your overall objective.
An event-based parser
An event-based parser reports events to the processing program using callbacks. The program implements and registers event handlers for the different types of events. Code written into the event handlers is designed to achieve the overall objective. In other words, the overall behavior of the custom XML processor is coded into the event handlers.
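To make the callback idea a little more concrete ahead of the later lessons, here is a minimal sketch that assumes the SAX classes bundled with the standard Java library (SAXParserFactory, SAXParser, and DefaultHandler). The class names RecordingHandler and SaxSketch and the sample document are hypothetical. The handler does nothing more than record the start and end of each element in the order the parser reports them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// The event handlers: the parser calls these methods back as it
// recognizes components in the stream of XML text.
class RecordingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }
}

class SaxSketch {
    // Registers the handler with a parser and returns the recorded events.
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RecordingHandler handler = new RecordingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, used only for this illustration.
        for (String event : parse("<library><book/></library>")) {
            System.out.println(event);
        }
    }
}
```

All of the behavior specific to this program lives in the two overridden methods. The parser itself knows nothing about what the program intends to do with the events.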
So, what do I mean by events, event handlers, callbacks, etc.? Although you may not have realized it, you probably have been using event-driven programs for many years. In order to understand how to use an event-based parser, you must understand how event-driven programming is accomplished in your programming language of choice.
Event processing for XML can be a little complicated, particularly if you don't have a background in event-driven programming. Before getting into the details of event processing for XML, I am going to develop a very simple event-driven program using the Java programming language and walk you through it.
I will use an applet with a graphical user interface (GUI), written in the Java programming language. I have chosen a visual GUI approach because most people find it easier to understand event-driven programming with visual components than with other kinds of events. (XML events are not visual.) I have chosen to use an applet because it is a little easier to develop an applet with a GUI than to develop other kinds of Java programs having a GUI. Hopefully, this example will prepare you to understand event-driven programming for the purpose of parsing and processing XML documents. In short, this and the next few lessons will be a crash course in how to write event-driven programs using the Java programming language. You will then apply this knowledge to the parsing and processing of XML documents.
I read somewhere once that "An object-oriented program is just a bunch of objects laying around sending messages to each other."
While that statement is not entirely true, it is not far from the truth.
Similarly, it could be said that "An event-driven program is just a bunch of objects laying around waiting for an event to happen." Once an event-driven program is started, after some initial setup effort, the program usually goes into a quiescent state, waiting for an event to happen. When an event happens, an event handler springs into action and does some work. Then the program goes back to waiting for the next event.
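The wait, handle, wait again cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name EventLoopSketch, the use of plain strings as events, and the STOP marker are all hypothetical. A GUI toolkit hides a loop of this general kind from the programmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EventLoopSketch {
    // Hypothetical marker event that tells the loop to shut down.
    static final String STOP = "STOP";

    // Waits for events, handles each one, then goes back to waiting.
    static List<String> runLoop(BlockingQueue<String> queue) {
        List<String> handled = new ArrayList<>();
        try {
            while (true) {
                // take() blocks while the queue is empty:
                // this is the quiescent state.
                String event = queue.take();
                if (event.equals(STOP)) {
                    break;
                }
                // The "event handler": do some work for this event.
                handled.add("handled " + event);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop waiting.
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("click");
        queue.add("key press");
        queue.add(STOP);
        System.out.println(runLoop(queue));
    }
}
```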
So what is an event? An event is the occurrence of anything of interest in the context of your program. For example, a program used for automatic stock trading might consider an event to have happened when the price of a stock crosses a particular threshold. At that point, the program may spring into action to either buy or sell some shares of stock. An event-driven program that controls the temperature in a building might consider an event to have happened when the temperature crosses a predefined threshold. In that case, the program may spring into action and cause the heater or the air-conditioner to start running.
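The temperature example can be sketched as follows. Everything here is hypothetical and chosen only for illustration: the ThresholdListener interface, the TemperatureSensor class, the threshold of 25 degrees, and the readings. The point is that an event is raised only when a reading crosses the threshold, not on every reading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface for this illustration.
interface ThresholdListener {
    void thresholdCrossed(double oldValue, double newValue);
}

// The event source: it raises an event when a reading crosses the threshold.
class TemperatureSensor {
    private final double threshold;
    private final List<ThresholdListener> listeners = new ArrayList<>();
    private double current;

    TemperatureSensor(double threshold, double initialReading) {
        this.threshold = threshold;
        this.current = initialReading;
    }

    void addThresholdListener(ThresholdListener listener) {
        listeners.add(listener);
    }

    // Called for every new reading; notifies listeners only on a crossing.
    void update(double reading) {
        boolean wasAbove = current > threshold;
        boolean isAbove = reading > threshold;
        double previous = current;
        current = reading;
        if (wasAbove != isAbove) {
            for (ThresholdListener listener : listeners) {
                listener.thresholdCrossed(previous, reading);
            }
        }
    }
}

class ThermostatSketch {
    static final double THRESHOLD = 25.0; // assumed value, in degrees

    // Feeds readings to the sensor and returns the actions the handler took.
    static List<String> run(double[] readings) {
        List<String> actions = new ArrayList<>();
        TemperatureSensor sensor = new TemperatureSensor(THRESHOLD, 20.0);
        // The event handler: start or stop the air conditioner.
        sensor.addThresholdListener((oldValue, newValue) ->
            actions.add(newValue > THRESHOLD ? "cooling on" : "cooling off"));
        for (double reading : readings) {
            sensor.update(reading);
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(run(new double[] {21.0, 26.0, 27.0, 24.0}));
    }
}
```

With these four readings, only the second and fourth cross the threshold, so the handler runs twice.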
An event-driven program that deals with a GUI may consider an event to have happened when someone clicks a button with a mouse. (That is the case in this sample program.)
The sample program was written as an applet. Therefore, it can be executed either by loading it into a web browser, or by loading it into a special program named AppletViewer. (AppletViewer is a utility program from Sun Microsystems intended for use in testing applet programs.)
Figure 1 shows the screen output when the AppletViewer program starts running this sample applet. (This screen shot was taken before the user caused any events to happen.)
As you can see in the figure, the program starts running with a red button showing on the screen. Following startup, it does nothing until the user clicks the red button with the mouse (or clicks one of the buttons in the top right-hand corner of the frame).
The gray buttons along the top, the text in the light blue area, and the text at the bottom have nothing to do with the sample program. Rather, they are associated with the AppletViewer program itself. The text in the dark blue banner and the red button are produced by the sample program.
At this point, the program is in a quiescent state, waiting for an event to happen. The program is designed to recognize and handle only one type of event. If the user clicks on the red button, this will cause a so-called Action Event to occur. An Action Event Handler is registered on the button. (This means that the handler is set up to listen for Action Events on the button.) Using Java jargon, the button is an event source and the event handler is a listener.
When an ActionEvent happens on the button, the event handler will spring into action and cause the button to turn yellow as shown in Figure 2.
Actually, the behavior of the event handler is to toggle the background color of the button between yellow and blue. Clicking the button again causes the color of the button to change to blue, as shown in Figure 3.
From this point forward, each time the button is clicked, the background color of the button will be toggled between yellow and blue. The next click on the button produces the output shown in Figure 4, etc.
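The toggling behavior described above can be sketched with the standard ActionListener interface. This is not the source code of the sample applet, only a minimal illustration under stated assumptions: the class names ToggleColorListener and ToggleSketch are hypothetical, and the clicks are simulated by calling the handler directly so that the sketch runs without a display.

```java
import java.awt.Color;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// The listener: toggles a color between yellow and blue on each ActionEvent.
class ToggleColorListener implements ActionListener {
    // Where to apply the new color, for example button::setBackground.
    private final Consumer<Color> applyColor;
    private boolean yellowNext = true;

    ToggleColorListener(Consumer<Color> applyColor) {
        this.applyColor = applyColor;
    }

    @Override
    public void actionPerformed(ActionEvent event) {
        applyColor.accept(yellowNext ? Color.YELLOW : Color.BLUE);
        yellowNext = !yellowNext;
    }
}

class ToggleSketch {
    // Simulates button clicks and returns every color the button has shown.
    static List<Color> simulateClicks(int clicks) {
        List<Color> colors = new ArrayList<>();
        colors.add(Color.RED); // the button starts out red
        ActionListener listener = new ToggleColorListener(colors::add);
        Object fakeButton = new Object(); // stands in for the event source
        for (int i = 0; i < clicks; i++) {
            listener.actionPerformed(new ActionEvent(
                fakeButton, ActionEvent.ACTION_PERFORMED, "click"));
        }
        return colors;
    }

    public static void main(String[] args) {
        System.out.println(simulateClicks(3));
    }
}
```

In a program with a real java.awt.Button, the listener would presumably be registered with something like button.addActionListener(new ToggleColorListener(button::setBackground)), after which the toolkit, rather than a loop in your own code, would call actionPerformed on each click.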
This behavior will continue until the user terminates the AppletViewer program (by clicking the button with the X in the upper right-hand corner of the frame). If you were to load this applet into a late-model browser instead of using the AppletViewer program, you would see the red button in the client area of your browser as shown in Figure 5.
The event-driven behavior of the applet when running in the browser would be the same as when running under the AppletViewer program. What you have seen in the above figures is the behavior of a very simple event-driven program written using the Java programming language.
Summary
In this lesson, I have introduced you to the concept of event-driven programming, using the Java programming language as a vehicle.
In the lessons that follow, I will show you how to use a SAX parser with event-driven programming (using both the Java and Python programming languages) to parse and process XML documents. I will also show you how to use the DOM (Document Object Model) to parse and process XML documents.
Copyright 2000, Richard G. Baldwin. Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.
Richard Baldwin (firstname.lastname@example.org) is a college professor and private consultant whose primary focus is a combination of Java and XML. In addition to the many platform-independent benefits of Java applications, he believes that a combination of Java and XML will become the primary driving force in the delivery of structured information on the Web.
Richard has participated in numerous consulting projects involving Java, XML, or a combination of the two. He frequently provides onsite Java and/or XML training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Java Programming Tutorials, which has gained a worldwide following among experienced and aspiring Java programmers. He has also published articles on Java Programming in Java Pro magazine.
Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.