
XPath Expressions

XPath expressions offer a flexible way to navigate inside an XML document. They are most often used with XSLT style sheets to identify the parts of an XML document targeted for transformation, but they can be effective in any application that needs to extract information from an XML document.

To demonstrate various examples, I’ll use a Java program from the Apache Xalan distribution (http://xml.apache.org). The program is called ApplyXPath.java and is found in the samples directory of the distribution. It takes an XML document filename and an XPath expression as its arguments and prints the matching nodes as XML. The following code fragment does most of the work. The document is parsed into a DOM tree, and a Transformer is obtained from TransformerFactory to serialize the results. Then the selectNodeIterator() method of the org.apache.xpath.XPathAPI class applies the XPath expression to the DOM tree, and each node it returns is written to standard output.


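 // Parse the XML file into a DOM tree (filename is the first program argument).
 InputSource in = 
             new InputSource(new FileInputStream(filename));
 DocumentBuilderFactory dfactory = 
                       DocumentBuilderFactory.newInstance();
 Document doc = dfactory.newDocumentBuilder().parse(in);

 // An identity Transformer serializes each result node without an XML declaration.
 Transformer serializer = 
            TransformerFactory.newInstance().newTransformer();
 serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, 
                                       "yes");

 // Apply the XPath expression (the second program argument) to the DOM tree.
 NodeIterator nl = XPathAPI.selectNodeIterator(doc, xpath); 

 // Write every matching node to standard output, one per line.
 Node  n;
 while ((n = nl.nextNode()) != null) 
 { 
       serializer.transform(new DOMSource(n),
                          new StreamResult(System.out)); 
       System.out.println();
 }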
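
If you do not have the Xalan samples at hand, roughly the same flow can be sketched with the XPath API that ships with the JDK (javax.xml.xpath) in place of Xalan's XPathAPI class. This is only a sketch under a few assumptions: the class name ApplyXPathSketch and the select() helper are hypothetical, the document is passed as a string rather than a filename, and the serialized nodes are collected in a list instead of being printed directly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for ApplyXPath, built on the JDK's javax.xml.xpath API.
class ApplyXPathSketch {

    // Parses the XML text, applies the expression, and serializes each matching node.
    static List<String> select(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Identity transformer, configured as in the fragment above.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // NODESET asks the evaluator to return every matching node as a NodeList.
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);

        List<String> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // A tiny made-up document, just to exercise the helper.
        String xml = "<doc><zip>76767</zip><zip>90872</zip></doc>";
        for (String node : select(xml, "/doc/zip")) {
            System.out.println(node);
        }
    }
}
```

The remaining sketches in this article use the same JDK API, so they can be run without adding Xalan to the classpath.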


Here is the XML file we’ll be using:


<?xml version="1.0"?>
<doc>
  <customer id="1">
     <name>
        <fname> Jim </fname>
        <lname> Jones </lname>
     </name>
     <address> 8 ABC Avenue </address>
     <zip> 76767 </zip>
  </customer>

  <customer id="2">
     <name>
        <fname> Jane </fname>
        <lname> Lewis </lname>
        <zip id="8"> 12345 </zip>
     </name>
     <address> 78 Cherry Lane </address>
     <zip> 90872 </zip>
  </customer>

  <customer id="3">
     <name>
        <fname> Ronald </fname>
        <lname> Smith </lname>
     </name>
     <address> 23 Main Street </address>
     <zip> 12121 </zip>
  </customer>

</doc>


XPath expressions are formed using element names, attributes, and built-in functions. The goal is to create expressions that point to specific parts of an XML document. Given the hierarchical nature of XML, one of the simplest expressions follows the tree structure down to a particular element. For example, the expression “/doc/customer/zip” extracts all the <zip> elements that are direct children of a <customer> element, which in turn is a child of the root <doc> element. To test this, run the ApplyXPath program as follows:


D:>java ApplyXPath foo.xml "/doc/customer/zip"
Loading classes, parsing foo.xml, and setting up serializer
Querying DOM using /doc/customer/zip
<output>
<zip> 76767 </zip>
<zip> 90872 </zip>
<zip> 12121 </zip>
</output>

Using the // operator, you can retrieve an element regardless of its position in the hierarchy. Here, the <zip> element nested inside <name> is returned as well:

D:>java ApplyXPath foo2.xml //zip
Loading classes, parsing foo2.xml, and setting up serializer
Querying DOM using //zip
<output>
<zip> 76767 </zip>
<zip> 12345 </zip>
<zip> 90872 </zip>
<zip> 12121 </zip>
</output>
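
To compare the two runs above side by side in code, here is a minimal sketch that evaluates both expressions against the sample document. It assumes the JDK's javax.xml.xpath API rather than ApplyXPath; the class name ZipPaths and the texts() helper are hypothetical, and the document is embedded as a string with its indentation removed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses javax.xml.xpath instead of Xalan's XPathAPI.
class ZipPaths {

    // The sample document from this article, with indentation removed.
    static final String XML =
        "<doc>"
      + "<customer id=\"1\"><name><fname> Jim </fname><lname> Jones </lname></name>"
      + "<address> 8 ABC Avenue </address><zip> 76767 </zip></customer>"
      + "<customer id=\"2\"><name><fname> Jane </fname><lname> Lewis </lname>"
      + "<zip id=\"8\"> 12345 </zip></name>"
      + "<address> 78 Cherry Lane </address><zip> 90872 </zip></customer>"
      + "<customer id=\"3\"><name><fname> Ronald </fname><lname> Smith </lname></name>"
      + "<address> 23 Main Street </address><zip> 12121 </zip></customer>"
      + "</doc>";

    // Returns the trimmed text of every node selected by the expression.
    static List<String> texts(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Only <zip> elements that are direct children of <customer>.
        System.out.println(texts("/doc/customer/zip"));
        // Every <zip> element, wherever it sits in the tree.
        System.out.println(texts("//zip"));
    }
}
```

The second list should contain one more entry than the first, because only // reaches the <zip> element that sits inside <name>.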

The character “*” is used as a wildcard. For example, the expression “//*” selects all elements in the document, and “/doc/customer/name/*” selects all elements that are children of <name>. Wildcards can also stand in for ancestors: an expression like “/*/*/*/foo” selects the <foo> elements that have exactly three ancestor elements. You need a good understanding of the hierarchy to use expressions that rely on an exact number of ancestors or descendants, and they must be updated whenever the structure of the document changes.
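
The wildcard forms can be tried against the sample document with a small sketch like the following. It assumes the JDK's javax.xml.xpath API; the class name WildcardPaths and the names() helper are hypothetical, and <zip> takes the place of the placeholder element foo.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for experimenting with wildcard expressions.
class WildcardPaths {

    // The sample document from this article, with indentation removed.
    static final String XML =
        "<doc>"
      + "<customer id=\"1\"><name><fname> Jim </fname><lname> Jones </lname></name>"
      + "<address> 8 ABC Avenue </address><zip> 76767 </zip></customer>"
      + "<customer id=\"2\"><name><fname> Jane </fname><lname> Lewis </lname>"
      + "<zip id=\"8\"> 12345 </zip></name>"
      + "<address> 78 Cherry Lane </address><zip> 90872 </zip></customer>"
      + "<customer id=\"3\"><name><fname> Ronald </fname><lname> Smith </lname></name>"
      + "<address> 23 Main Street </address><zip> 12121 </zip></customer>"
      + "</doc>";

    // Returns the element name of every node selected by the expression.
    static List<String> names(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeName());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Every element in the document.
        System.out.println(names("//*").size());
        // All children of <name>, whatever they are called.
        System.out.println(names("/doc/customer/name/*"));
        // <zip> elements with exactly three ancestor elements.
        System.out.println(names("/*/*/*/zip"));
    }
}
```

In the sample document, the last expression matches only the <zip> nested inside <name>, since the other <zip> elements have just two ancestor elements (<doc> and <customer>).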

Square brackets let you specify element position, counting from 1. For example, “/doc/customer[3]” returns the third <customer> element, and “/doc/customer[last()]” returns the last one. XPath expressions can also be based on attributes. To select an element that carries a given attribute, use the format //element_name[@attribute_name]. So “//zip[@id]” looks for <zip> elements that have an attribute named “id”. The expression would return:


<output>
<zip id="8"> 12345 </zip>
</output>
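
Position and attribute predicates can be checked with a short sketch like this one. It assumes the JDK's javax.xml.xpath API and uses the two-argument evaluate() method, which returns the string value of the first matching node; the class name PredicatePaths and the first() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for experimenting with predicates in square brackets.
class PredicatePaths {

    // The sample document from this article, with indentation removed.
    static final String XML =
        "<doc>"
      + "<customer id=\"1\"><name><fname> Jim </fname><lname> Jones </lname></name>"
      + "<address> 8 ABC Avenue </address><zip> 76767 </zip></customer>"
      + "<customer id=\"2\"><name><fname> Jane </fname><lname> Lewis </lname>"
      + "<zip id=\"8\"> 12345 </zip></name>"
      + "<address> 78 Cherry Lane </address><zip> 90872 </zip></customer>"
      + "<customer id=\"3\"><name><fname> Ronald </fname><lname> Smith </lname></name>"
      + "<address> 23 Main Street </address><zip> 12121 </zip></customer>"
      + "</doc>";

    // Returns the trimmed string value of the first node the expression selects.
    static String first(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Last name of the third customer.
        System.out.println(first("/doc/customer[3]/name/lname"));
        // The id attribute of the last customer.
        System.out.println(first("/doc/customer[last()]/@id"));
        // Text of the <zip> element that carries an id attribute.
        System.out.println(first("//zip[@id]"));
    }
}
```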


Combining attributes and wildcards, you can search for elements that have, or do not have, attributes regardless of the attribute names. For example, //zip[@*] returns <zip> elements that have at least one attribute, and //zip[not(@*)] returns <zip> elements that have none. You can also limit the search based on the value of an attribute using the format //element_name[@attribute_name='attribute_value'].
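
Here is a minimal sketch of the attribute tests, again assuming the JDK's javax.xml.xpath API. The class name AttributeTests and the texts() helper are hypothetical, and the expression //customer[@id='2']/address is my own illustration of the value-matching format.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for experimenting with attribute-based predicates.
class AttributeTests {

    // The sample document from this article, with indentation removed.
    static final String XML =
        "<doc>"
      + "<customer id=\"1\"><name><fname> Jim </fname><lname> Jones </lname></name>"
      + "<address> 8 ABC Avenue </address><zip> 76767 </zip></customer>"
      + "<customer id=\"2\"><name><fname> Jane </fname><lname> Lewis </lname>"
      + "<zip id=\"8\"> 12345 </zip></name>"
      + "<address> 78 Cherry Lane </address><zip> 90872 </zip></customer>"
      + "<customer id=\"3\"><name><fname> Ronald </fname><lname> Smith </lname></name>"
      + "<address> 23 Main Street </address><zip> 12121 </zip></customer>"
      + "</doc>";

    // Returns the trimmed text of every node selected by the expression.
    static List<String> texts(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // <zip> elements with at least one attribute, whatever its name.
        System.out.println(texts("//zip[@*]"));
        // <zip> elements with no attributes at all.
        System.out.println(texts("//zip[not(@*)]"));
        // Address of the customer whose id attribute equals '2'.
        System.out.println(texts("//customer[@id='2']/address"));
    }
}
```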

There are a number of functions you can use in XPath expressions. The count() function returns the number of nodes that match an expression. The name() function returns the name of an element, and contains() tests whether one string occurs inside another, which is useful when related elements share part of their name. The string-length() function returns the number of characters in its argument. The following expression returns all elements whose names are exactly five characters long:


//*[string-length(name()) = 5]
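
The functions can be exercised against the sample document with a sketch along these lines. It assumes the JDK's javax.xml.xpath API, which can return a result as a number or a string as well as a node set; the class name XPathFunctions and the number() and string() helpers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for experimenting with XPath functions.
class XPathFunctions {

    // The sample document from this article, with indentation removed.
    static final String XML =
        "<doc>"
      + "<customer id=\"1\"><name><fname> Jim </fname><lname> Jones </lname></name>"
      + "<address> 8 ABC Avenue </address><zip> 76767 </zip></customer>"
      + "<customer id=\"2\"><name><fname> Jane </fname><lname> Lewis </lname>"
      + "<zip id=\"8\"> 12345 </zip></name>"
      + "<address> 78 Cherry Lane </address><zip> 90872 </zip></customer>"
      + "<customer id=\"3\"><name><fname> Ronald </fname><lname> Smith </lname></name>"
      + "<address> 23 Main Street </address><zip> 12121 </zip></customer>"
      + "</doc>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates an expression whose result is a number, such as count(...).
    static double number(String expr) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, parse(), XPathConstants.NUMBER);
    }

    // Evaluates an expression whose result is a string, such as name(...).
    static String string(String expr) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expr, parse());
    }

    public static void main(String[] args) throws Exception {
        // Number of <zip> elements anywhere in the document.
        System.out.println(number("count(//zip)"));
        // Name of the root element.
        System.out.println(string("name(/*)"));
        // Number of elements whose name contains the string 'name'.
        System.out.println(number("count(//*[contains(name(), 'name')])"));
        // Number of elements whose name is exactly five characters long.
        System.out.println(number("count(//*[string-length(name()) = 5])"));
    }
}
```

In the sample document, only <fname> and <lname> have five-character names, so the last expression counts those elements across the three customers.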

There is much more to XPath expressions. We have covered only the basics to give you a glimpse of what XPath expressions are and how they can be used. Within XSLT, XPath provides the facility for identifying target areas of an XML document. If your application requires search capabilities, consider XML Query (XQuery), which focuses on searching and extracting data across collections of XML documents.

About the Author

Piroz Mohseni is president of Bita Technologies,
focusing on business improvement through the effective use of technology. His
areas of interest include enterprise Java, XML, and e-commerce
applications.
