This is the first article in a new series introducing XSLT. XSLT stands for XSL Transformations (XSL being the Extensible Stylesheet Language), but I believe the W3C should have called it the XML Scripting Language.
Over the years, I have used XSLT to publish Web sites, to generate PDFs from documentation, to prepare e-commerce transactions, to build Web services, to import documents into databases, to construct UML models, to pre- or post-process articles, to generate Java code… you name it. If it involves manipulating an XML document, chances are XSLT is my favorite solution.
Obviously, there’s nothing you can do with XSLT that can’t be done with straight Java or C#. Why bother learning a new language, then? Because XSLT is highly specialized: when the job is to manipulate XML, you will find that the code is faster to write and easier to maintain.
Getting the Tools
Before going any further, you need to install an XSLT processor. Chances are there’s already one on your machine because both Windows and Java ship with one.
Microsoft’s XSLT processor is MSXML. There’s a command-line interface that is great for testing. From a .NET application, you can call the run-time’s own XSLT classes in the System.Xml.Xsl namespace.
On Java 1.4 or above, the XSLT processor is available via the javax.xml.transform package.
For this series, I recommend that you install Eclipse and the ananas.org XM plugin. Eclipse is an IDE available on most platforms (Windows, Linux, and MacOS X). Refer to “Using XML for Web Publishing” for more details.
XSLT Basics
Listing 1 is a very simple stylesheet to show you what XSLT looks like. It takes an XML article and publishes it as an HTML page. Download the listings for a sample XML document.
Listing 1: basic.xsl
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article"
    version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html><xsl:apply-templates/></html>
  </xsl:template>

  <xsl:template match="a:body">
    <body>
      <xsl:apply-templates/>
      <p>This page was made with XML and XSLT.</p>
    </body>
  </xsl:template>

  <xsl:template match="a:para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="a:section">
    <xsl:apply-templates/><hr/>
  </xsl:template>

  <xsl:template match="a:info/a:title">
    <head><title><xsl:apply-templates/></title></head>
  </xsl:template>

  <xsl:template match="a:section/a:title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

</xsl:stylesheet>
An XSLT stylesheet is an XML document itself (this has several implications, as we will see in a minute). The instructions must appear in the http://www.w3.org/1999/XSL/Transform namespace. If you encounter problems with a stylesheet, make sure the namespace has been declared properly; it’s the number one cause of problems that my students have.
The root of the stylesheet is the <xsl:stylesheet> element. It needs a version attribute whose value must be “1.0”. Below the root comes the <xsl:output> element, which specifies whether the result is an HTML, XML, or text document.
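A minimal sketch of that skeleton, with the templates left out, might look like this (the comments are illustrative only):

```xml
<?xml version="1.0"?>
<!-- The xsl prefix must be bound to exactly this namespace URI. -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <!-- method is "html", "xml", or "text" -->
  <xsl:output method="html"/>

  <!-- templates go here -->

</xsl:stylesheet>
```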
Then come the templates. Each template is a rule that transforms one or more elements from the source document into one or more elements in the result. For example, the template:
<xsl:template match="a:article">
  <html><xsl:apply-templates/></html>
</xsl:template>
specifies that <a:article> in the source becomes <html> in the result. In other words, the root of the XML document becomes the HTML root.
The <xsl:apply-templates/> instruction is a placeholder for the content of the element. In the above example, the processor inserts the article content between the <html> tags.
The position of <xsl:apply-templates/> in the template is important because it determines where the element content appears in the result.
Look at the following template:
<xsl:template match="a:section">
  <xsl:apply-templates/><hr/>
</xsl:template>
It inserts a horizontal line after the section content. If <hr/> were placed before <xsl:apply-templates/>, the line would appear before the section content instead, because <xsl:apply-templates/> stands for that content.
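To see the effect of the position, here is a sketch of the opposite arrangement. It is a cut-down stylesheet, not the one from the listings: only the section and paragraph templates are kept.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article"
    version="1.0">

  <xsl:output method="html"/>

  <!-- Variant: the line is written before the section content,
       so it separates a section from whatever precedes it. -->
  <xsl:template match="a:section">
    <hr/>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Same paragraph template as in Listing 1. -->
  <xsl:template match="a:para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

For a section that holds a single paragraph, this variant should write the line first and the paragraph second, whereas the template from Listing 1 writes them in the opposite order.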
The match attribute selects the source elements to which the template applies. In most cases, that’s an element name. When there’s a risk of confusion, you can specify a path (or conditions, as we’ll see next month) that also tests the element’s ancestors.
The a:section/a:title path selects the <a:title> elements that are children of <a:section>. Note that the template applies to the <a:title>, not to the <a:section>: the last step of the path names the element being matched.
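The source document below is a hypothetical sketch, not the sample from the listings: only the element names and the namespace come from Listing 1, and the nesting is an assumption. It shows how the two paths tell the two kinds of title apart.

```xml
<?xml version="1.0"?>
<!-- Hypothetical article; structure assumed for illustration. -->
<a:article xmlns:a="http://psol.com/2004/article">
  <a:info>
    <!-- Parent is a:info: matched by "a:info/a:title",
         so it becomes <head><title>. -->
    <a:title>Introducing XSLT</a:title>
  </a:info>
  <a:body>
    <a:section>
      <!-- Parent is a:section: matched by "a:section/a:title",
           so it becomes <h1>. -->
      <a:title>XSLT Basics</a:title>
      <a:para>A stylesheet is a set of templates.</a:para>
    </a:section>
  </a:body>
</a:article>
```

A template with match="a:title" alone would apply to both titles, which is why Listing 1 spells out the parent in each case.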
Finally, I’d like to draw your attention to syntax issues. A stylesheet is an XML document and it must respect the XML syntax, which means that:
- Elements need both a starting and ending tag (in HTML you often dispense with the ending tag).
- An empty element follows the XML convention, so <hr> is written as <hr/>. (Don’t worry; with the html output method, the processor writes it back as <hr>.)
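As a sketch of both rules, here is a variant of the a:body template in a cut-down stylesheet. The line break in the footer is an addition made for illustration, and the other templates from Listing 1 are omitted.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://psol.com/2004/article"
    version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="a:body">
    <body>
      <xsl:apply-templates/>
      <!-- Hand-written HTML might use <br> and leave <p> unclosed;
           in a stylesheet, <br/> must be self-closed and <p> needs
           its ending tag. -->
      <p>This page was made with XML<br/>and XSLT.</p>
    </body>
  </xsl:template>

</xsl:stylesheet>
```

Leaving out the slash or the closing </p> makes the stylesheet ill-formed XML, and the processor should reject it before any transformation takes place.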
Testing and Exercise
I encourage you to download the listings and run the example for yourself. The listings also include a small exercise so you can practice what you have learned.
As you work with the listings, you will notice that the XML documents start with the following processing instruction:
<?xml-stylesheet href="basic.xsl" type="text/xsl"?>
It tells the processor which stylesheet applies to the document.
To avoid any misunderstanding, let me stress that the processing instruction appears in the XML document, not in the stylesheet! So, if you want to apply another stylesheet to a document, you need to modify the document.
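A minimal sketch of such a document follows. Its content is hypothetical; only the processing instruction, the element names, and the namespace come from the article.

```xml
<?xml version="1.0"?>
<!-- The processing instruction sits after the XML declaration
     and before the root element of the document. -->
<?xml-stylesheet href="basic.xsl" type="text/xsl"?>
<a:article xmlns:a="http://psol.com/2004/article">
  <a:body>
    <a:section>
      <a:para>Hello, XSLT.</a:para>
    </a:section>
  </a:body>
</a:article>
```

To style the same document differently, you would change the href value here to point at another stylesheet.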
Next month, we will cover XPath, attributes, and more XSLT instructions.
About the Author
Benoît Marchal is a Belgian writer and consultant. He is the author of XML by Example and other XML books. He works mostly on Web services, XML, and Java.