Discover the Wonders of XSLT: XPaths
This is Part 2 of the developer.com introduction to XSLT. The first part was about tools and the basic syntax. I recommend you read it first.
Make sure you download the updated listings before reading any further.
The style sheet language is made up of two W3C recommendations:
- XPath, which is a query language
- XSLT itself, which is a scripting language with an XML syntax
A style sheet describes how to convert the input document into the output. XPath deals with the input; it allows you to retrieve values from the input document. XSLT deals with generating the output. It offers instructions to create elements, attributes and other XML markup in the output.
XPaths are not unlike file paths and URLs, but are adapted to the XML syntax. For example, download the listings and open the sample2.xml file. The path to the document title is the following:
Essentially, an XPath lists all the elements that lead to the one you're interested in, just as a file path lists all the directories leading to the file you're interested in. The separator is the forward slash, /.
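The listing itself is not reproduced in this text, so what follows is only a sketch. It assumes sample2.xml is shaped roughly like the document below. The namespace URI and every element name other than a:article are guesses made for illustration, not the contents of the real file.

```xml
<!-- Hypothetical shape of sample2.xml; the namespace URI and the names
     a:title, a:body, a:section and a:para are assumed -->
<a:article xmlns:a="http://example.org/article">
  <a:title>Discover the Wonders of XSLT</a:title>
  <a:body>
    <a:section>
      <a:title>Introduction</a:title>
      <a:para>First paragraph.</a:para>
    </a:section>
  </a:body>
</a:article>

<!-- Under that assumption, the absolute XPath to the article title is: -->
<!-- /a:article/a:title -->
```

Read from left to right, the path starts at the root of the document, steps into a:article, and then into its a:title child. The section title is not selected because it is not a direct child of a:article.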
An XPath returns a node set, i.e. a list of nodes that match the XPath. A node set may contain zero (which most likely indicates an error in the XPath), one, or more nodes. The node set for the XPath above contains only one node (the article title).
Because the elements in the sample document belong to a namespace, the element names in an XPath must be qualified, i.e. they must include both the namespace prefix and the local name. Make sure you declare the namespace prefix in the style sheet as well (see the example below).
The previous example was an absolute path because it starts from the root of the document. XPaths may also be relative to the current node. Again, the concept is very similar to file paths, which can either start from the root (or a drive under Windows) or be relative to the current directory.
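As a minimal sketch of such a declaration, the style sheet below binds the prefix a on its root element and then uses it in an XPath. The namespace URI http://example.org/article and the element name a:title are placeholders assumed for illustration; use whatever the input document actually declares.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.org/article">
  <!-- The URI bound to "a" is a placeholder; it must be the same URI
       that the input document uses for its elements -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every element name in the path carries the prefix -->
    <xsl:value-of select="/a:article/a:title"/>
  </xsl:template>
</xsl:stylesheet>
```

What has to match between the style sheet and the input document is the namespace URI. The prefix is only a local shorthand, so the two files could use different prefixes for the same URI.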
Absolute XPaths start with the forward slash; relative XPaths start with an element name. Assuming the current node is /a:article, the following XPath points to the article title.
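Assuming the title element is named a:title (the original listing is not reproduced here, and the namespace URI is a placeholder as well), the relative XPath is simply a:title. A small style sheet sketching it in context:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.org/article">
  <xsl:output method="text"/>

  <!-- Inside this template the current node is /a:article -->
  <xsl:template match="/a:article">
    <!-- Relative XPath: no leading slash, resolved from the current node -->
    <xsl:value-of select="a:title"/>
  </xsl:template>
</xsl:stylesheet>
```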
You may recognize this XPath from the style sheet in the previous article. Indeed, the template match attribute contains an XPath, in most cases a relative one.
As it interprets the style sheet, the XSLT processor keeps track of the current node. Some instructions, such as xsl:apply-templates and xsl:for-each (see below), change the current node.
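The sketch below illustrates how the current node moves. It assumes a hypothetical input in which a:article contains an a:title and an a:body, and the body contains a:section elements that each have their own a:title. Those names and the namespace URI are assumptions, not taken from the listings.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.org/article">
  <xsl:output method="text"/>

  <xsl:template match="/a:article">
    <!-- Current node: a:article, so a:title is the article title -->
    <xsl:value-of select="a:title"/>
    <xsl:text>&#10;</xsl:text>

    <xsl:for-each select="a:body/a:section">
      <!-- Current node: each a:section in turn, so the same relative
           XPath now points to that section's title -->
      <xsl:value-of select="a:title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The same relative XPath, a:title, appears twice but selects different nodes each time because the current node has changed.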
Attributes and other special cases
To include an attribute in an XPath, prefix its name with the @ character. The following (relative) XPath selects the link's URI if the current node is a section:
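The original expression is not reproduced in this text. As a sketch, suppose links are written as a:link elements inside a:para, with the address in a uri attribute; all three names and the namespace URI are assumptions. The XPath would then be a:para/a:link/@uri, used here in a small style sheet:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.org/article">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="/a:article/a:body/a:section"/>
  </xsl:template>

  <!-- Current node: a section -->
  <xsl:template match="a:section">
    <!-- Selects the uri attribute of every link in the section's paragraphs -->
    <xsl:for-each select="a:para/a:link/@uri">
      <!-- Inside the loop the current node is the attribute itself -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```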
The @ is not a separator but a prefix identifying attributes. Therefore, you still need the forward slash between the attribute name and its parent.
The single and double dot (. and ..) represent the current node and the parent of the current node respectively. If the current node is a paragraph, an XPath consisting of .. followed by a slash and the name of the paragraph element selects all the paragraphs in the section: the .. selects the paragraph's parent (the section), and from there the XPath selects all the paragraph children of that section. Note that this XPath may return a node set with several nodes, in fact as many nodes as there are paragraphs in the section.
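As a sketch, assume paragraphs are a:para elements nested as a:article, a:body, a:section, a:para (hypothetical names and namespace URI). The style sheet below uses ../a:para from each paragraph and prints how many nodes the node set contains:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.org/article">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="/a:article/a:body/a:section/a:para"/>
  </xsl:template>

  <!-- Current node: a paragraph -->
  <xsl:template match="a:para">
    <!-- .. climbs to the enclosing section; a:para then selects all of
         that section's paragraphs, including the current one -->
    <xsl:value-of select="count(../a:para)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For a section with three paragraphs, this template runs three times and writes 3 each time, since every paragraph sees the same set of siblings plus itself.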
To select all the paragraphs in the body, use this XPath:
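The expression from the original listing is not available here. One plausible form, assuming the body is an a:body child of a:article and paragraphs are a:para elements (assumed names, placeholder namespace URI), is /a:article/a:body//a:para:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.org/article">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every a:para anywhere below a:body, whatever section it is in -->
    <xsl:for-each select="/a:article/a:body//a:para">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```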
Using two slashes (//) as a separator selects among the descendants, as opposed to the children, of the element. The descendants include the children, the children of the children, the children of the children of the children, and so on. The following absolute XPath selects all the titles (article and section titles):
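Assuming titles are a:title elements (a guess, since the listing is not reproduced here, and the namespace URI is a placeholder), the expression would be //a:title. A short style sheet that lists every title it finds:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.org/article">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- // at the start of the path searches the whole document,
         so both the article title and the section titles match -->
    <xsl:for-each select="//a:title">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

By default xsl:for-each visits the nodes in document order, so the article title would come out first, followed by the section titles.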