Transforming Flat Files To XML With SAX and XSLT

Introduction

When we need to transform XML into other formats, XSLT (eXtensible Stylesheet Language for Transformations) does a wonderful job. However, sometimes we have a flat file or non-XML data structure that we need to transform into XML or other markup languages. Wouldn’t it be nice if we could use the power of XSLT to transform these data structures as well?

We can. By pairing XSLT with SAX (Simple API for XML), a non-XML data source can be presented to a stylesheet as if it were an XML document. In this article, we’ll build a Java class that transforms Java properties files into XML. This working component illustrates the concept and shows how to apply the same technique to virtually any data structure.

The following outline is a road map for how we’ll cover our topic:

  • SAX Parser and Handler Review
  • Writing Your Own SAX Parser (it’s easier than you think)
  • The “Echo” Stylesheet
  • Transforming a SAX Source with TrAX (Transformation API for XML)
  • Summary

SAX Parser and Handler Review

If you’ve worked with SAX, you know that it is an API for processing XML documents as a stream of events. You may have written a handler class to be the recipient of these events. The handler class is notified of the following events, among others:

  • Start of Document
  • Start of Element
  • Characters
  • End of Element
  • End of Document

The handler class can respond to these events as it wishes. The easiest way to implement the ContentHandler interface is to extend the DefaultHandler class, which supplies empty implementations of every callback so that you override only the ones you need.

To parse an XML file using a custom handler, we might use the following code:

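import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// SAXParser.parse(File, ...) expects a DefaultHandler, not a bare ContentHandler
File file = new File("test.xml");
DefaultHandler handler = new YourCustomHandler();

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
parser.parse(file, handler);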

The SAXParser will invoke the callback methods on YourCustomHandler.
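
As a rough illustration of what such a handler might look like, here is a minimal sketch that records each event it receives. The class name EventLoggingHandler and its record() helper are hypothetical and exist only for this example; text is trimmed and empty runs are skipped to keep the log readable.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: stores every SAX event it receives as a line of text.
class EventLoggingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver text in several chunks; each non-blank chunk is logged.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters " + text);
        }
    }

    // Parses an in-memory XML string and returns the recorded events.
    static List<String> record(String xml) {
        try {
            EventLoggingHandler handler = new EventLoggingHandler();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        record("<Properties><Font-Size>12pt</Font-Size></Properties>")
            .forEach(System.out::println);
    }
}
```

The order of the logged lines mirrors the order in which the parser fires its callbacks, which is the same event stream our own parser will have to produce by hand.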

Writing Your Own SAX Parser

To work with non-XML data structures, we need to build a parser that broadcasts SAX events to any registered handler classes. We don’t even need to write a handler class. This seems strange at first if you are accustomed to writing handler classes.

A SAXSource object, representing the input to the transformation, is needed to use your parser in conjunction with the TrAX API. A SAXSource object can be constructed from an object that implements the XMLReader interface. This interface consists of several methods, most of which we don’t need to be concerned with in our example.

We’ll create an implementation of an XMLReader that can transform Java properties files into a stream of XML events. It should be a simple enough example to demonstrate how to transform arbitrary data structures into XML.

Here are the contents of the sample properties file we’ll be working with.

Font-Family=Arial
Font-Size=12pt
Background-Color=White
Foreground-Color=Black

Note that there can be any number of key value pairs in such a file. Now, let’s take a look at our class that will read the properties file and transmit a series of SAX events.

public class PropertyFileParser implements XMLReader
{
  private ContentHandler contentHandler = null;

  public ContentHandler getContentHandler()
  {
    return contentHandler;
  }

  public void setContentHandler(ContentHandler handler)
  {
    contentHandler = handler;
  }

PropertyFileParser implements the XMLReader interface. Even though we don’t have to write a ContentHandler, we do have to provide a mechanism for content handlers to be registered to receive events from our parser. TrAX will provide a content handler in this scenario.

Our main task is implementing the parse() method. The first parse method is the implementation required by the XMLReader interface. Here, we take the InputSource and load a Properties object. Then, we call our custom parse method.

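public void parse(InputSource source) throws IOException,
                                             SAXException
{
  InputStream is = source.getByteStream();
  if (is == null)
  {
    // only byte-stream input sources are handled here
    throw new SAXException("InputSource has no byte stream");
  }
  Properties p = new Properties();
  p.load(is);
  parse(p);
}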

The custom parse method starts broadcasting the stream of events with the startDocument() and startElement() events for the root element of the “document.” It iterates through an enumeration of the properties and generates startElement(), characters(), and endElement() events for each property. Finally, the endElement() for the root element and endDocument() events are sent.

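// namespaceURI and attribs are not declared in this excerpt; they are assumed
// to be fields holding an empty namespace URI and an empty attribute list.
private void parse(Properties p) throws SAXException
{
  // open the document and the root element
  contentHandler.startDocument();
  contentHandler.startElement(namespaceURI,
                              "Properties",
                              "Properties", attribs);

  Enumeration<?> e = p.propertyNames();

  // one element per property: the key is the tag, the value is the text
  while (e.hasMoreElements())
  {
    String key = (String)e.nextElement();
    String value = p.getProperty(key);

    contentHandler.startElement(namespaceURI, key, key, attribs);
    contentHandler.characters(value.toCharArray(), 0,
                              value.length());
    contentHandler.endElement(namespaceURI, key, key);
  }

  // close the root element and the document
  contentHandler.endElement(namespaceURI, "Properties",
                                          "Properties");
  contentHandler.endDocument();
}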

To satisfy the XMLReader interface, we implement the remaining methods as empty (no-op) methods. The two parse methods above are the heart of our SAX parser. The entire class can be viewed here: PropertyFileParser.java.
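
Since the full class is not reproduced in this article, the following is only a sketch of how the remaining XMLReader methods might be filled in; it is not the downloadable PropertyFileParser itself. The class name PropertiesReaderSketch, the empty namespace URI, the shared empty AttributesImpl, and the toXml() demo helper are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;

// Sketch only: a stand-in for the article's PropertyFileParser, not its actual source.
class PropertiesReaderSketch implements XMLReader {
    private static final String NS = "";                          // assumed: no namespace
    private final AttributesImpl noAttrs = new AttributesImpl();  // assumed: no attributes
    private ContentHandler contentHandler;
    private ErrorHandler errorHandler;
    private DTDHandler dtdHandler;
    private EntityResolver entityResolver;

    @Override public ContentHandler getContentHandler() { return contentHandler; }
    @Override public void setContentHandler(ContentHandler h) { contentHandler = h; }

    // Plain accessors; this parser never consults the stored values.
    @Override public ErrorHandler getErrorHandler() { return errorHandler; }
    @Override public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
    @Override public DTDHandler getDTDHandler() { return dtdHandler; }
    @Override public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
    @Override public EntityResolver getEntityResolver() { return entityResolver; }
    @Override public void setEntityResolver(EntityResolver r) { entityResolver = r; }

    // Features and properties are accepted and ignored.
    @Override public boolean getFeature(String name) { return false; }
    @Override public void setFeature(String name, boolean value) { }
    @Override public Object getProperty(String name) { return null; }
    @Override public void setProperty(String name, Object value) { }

    @Override
    public void parse(String systemId) throws IOException, SAXException {
        throw new SAXException("Only parse(InputSource) is supported in this sketch");
    }

    @Override
    public void parse(InputSource source) throws IOException, SAXException {
        InputStream in = source.getByteStream();
        if (in == null) {
            throw new SAXException("InputSource has no byte stream");
        }
        Properties p = new Properties();
        p.load(in);
        contentHandler.startDocument();
        contentHandler.startElement(NS, "Properties", "Properties", noAttrs);
        for (String key : p.stringPropertyNames()) {
            String value = p.getProperty(key);
            contentHandler.startElement(NS, key, key, noAttrs);
            contentHandler.characters(value.toCharArray(), 0, value.length());
            contentHandler.endElement(NS, key, key);
        }
        contentHandler.endElement(NS, "Properties", "Properties");
        contentHandler.endDocument();
    }

    // Demo helper (hypothetical): pushes the events through an identity Transformer.
    static String toXml(String propertiesText) {
        try {
            InputStream in = new ByteArrayInputStream(
                    propertiesText.getBytes(StandardCharsets.ISO_8859_1));
            SAXSource source = new SAXSource(new PropertiesReaderSketch(), new InputSource(in));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("Font-Family=Arial\nFont-Size=12pt\n"));
    }
}
```

One design caveat worth keeping in mind: each property key becomes an element name, so a key that is not a legal XML name (one containing a space, for example) would yield output that is not well-formed. A production version would need to validate or escape keys.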

Echo Stylesheet

We’ll use a very simple stylesheet to output the XML document that corresponds exactly with the SAX events that our parser broadcast. I’ve decided to call this echo.xsl because the stylesheet simply sounds back the input document. Keep in mind that any stylesheet could be used here. The output XML doesn’t have to look anything like the SAX events that it will be receiving. The stylesheet will be “fooled” into thinking that it’s dealing with XML even though we’re dealing with a non-XML data structure. In an indirect way, the stylesheet becomes your “handler,” or recipient of the events broadcast by the parser.

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

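Our “echo” stylesheet is doing a “deep copy.” The single template matches every element, text, comment, and processing-instruction node. It copies the context node and, recursively, all of its children via the <xsl:apply-templates/>. Attributes are not matched by node(), which is fine here because our parser emits none; to copy attributes as well, both the match pattern and the apply-templates select would need to be @*|node(). This type of template can be useful when you would like to make global changes to an XML file.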
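
To make the “global changes” idea concrete, here is a small sketch that pairs a copy-everything template with a single overriding template. The stylesheet is a variant of the echo stylesheet (it uses @*|node() so attributes are copied too), and the rule that renames Font-Size to Size is purely hypothetical, as is the class name EchoWithOverride.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: copy everything, but rename one element along the way.
class EchoWithOverride {
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "  <xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Copy rule: low default priority, applies to anything not matched below.
      + "  <xsl:template match='@*|node()'>"
      + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "  </xsl:template>"
      // Override: a named element pattern has higher priority than node().
      + "  <xsl:template match='Font-Size'>"
      + "    <Size><xsl:apply-templates/></Size>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet above to an in-memory XML string.
    static String apply(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "<Properties><Font-Family>Arial</Font-Family>"
          + "<Font-Size>12pt</Font-Size></Properties>"));
    }
}
```

Everything except the overridden element passes through unchanged, which is why this pattern scales well: each additional change is one more small template.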

Transforming a SAX Source With TrAX

In a previous article, Optimizing Stylesheet Execution With The Transformation API for XML, I wrote a brief introduction to TrAX and discussed how to optimize stylesheet execution. We’ll be using non-optimized TrAX here to transform our properties file into XML, using our custom SAX Parser and our echo stylesheet.

public static void main(String[] args) throws Exception
{
  // construct SAXSource with our custom XMLReader
  InputStream props = ClassLoader.getSystemResourceAsStream
                      ("my.properties");
  InputSource inputSource = new InputSource(props);
  XMLReader parser = new PropertyFileParser();
  SAXSource saxSource = new SAXSource(parser, inputSource);

  // construct a transformer using the echo stylesheet
  TransformerFactory factory = TransformerFactory.newInstance();
  StreamSource xslSource = new StreamSource("echo.xsl");
  Transformer transformer = factory.newTransformer(xslSource);

  // transform the SAXSource to the result
  StreamResult result = new StreamResult("properties.xml");
  transformer.transform(saxSource, result);
}

Using the TrAX API, the main() method of our program performs the following steps:

  • A SAXSource object is constructed with our PropertyFileParser and an input source representing the property file to parse.
  • A transformer object is constructed for our “echo” stylesheet.
  • The SAXSource is transformed into a Result object.

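Although a StreamResult was used in this example so that we could output a file, the result could have been a DOM or any other type of Result object. Here is the XML that was produced by the transformation (a Properties object does not preserve file order, so the elements may appear in a different order than in the source file):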

<?xml version="1.0" encoding="UTF-8"?>
<Properties>
  <Background-Color>White</Background-Color>
  <Font-Family>Arial</Font-Family>
  <Font-Size>12pt</Font-Size>
  <Foreground-Color>Black</Foreground-Color>
</Properties>
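
As mentioned above, the Result does not have to be a StreamResult. The sketch below swaps in a DOMResult so the output arrives as an in-memory DOM tree. To keep it self-contained, a stock XMLReader reading an XML string stands in for PropertyFileParser, and no stylesheet is used (a Transformer created without one performs an identity copy); the class name DomResultSketch and its toDom() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical demo: send a SAXSource into a DOMResult instead of a file.
class DomResultSketch {
    static Document toDom(String xml) {
        try {
            // Stand-in reader; any XMLReader, including a custom one, fits here.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            SAXSource source =
                new SAXSource(reader, new InputSource(new StringReader(xml)));

            // An empty DOMResult lets the transformer create the Document.
            DOMResult result = new DOMResult();
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return (Document) result.getNode();
        } catch (ParserConfigurationException | SAXException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDom("<Properties><Font-Size>12pt</Font-Size></Properties>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Only the Result object changes; the source side of the pipeline is untouched, which is the flexibility TrAX is designed to offer.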

Summary

Transforming non-XML data structures into XML is a common problem, and it is often solved with one-off custom code. Standard XML APIs can solve it instead. Rather than first writing an intermediate XML file and then transforming that, we can go directly from the source format to the target XML format, which can improve performance by eliminating the intermediate step.

We reviewed SAX parsing concepts and learned how to broadcast our own SAX events. We developed a class to transform Java properties files into a stream of XML events. We showed how to use the TrAX API to orchestrate the transformation process using our “echo” stylesheet. By using the technique demonstrated in this article, you can transform virtually any data structure into XML using the power of SAX and XSLT. But the rest is up to you!

Code Examples

To download the sample Java and XSL code, click here.

About the Author

Jeff Ryan is an architect for Hartford Financial Services. He has eighteen years of experience designing and developing automated solutions to business problems. His current focus is on Java, XML, and Web Services technology. He may be reached at jryan@thehartford.com.