
Understanding XPath

  • May 31, 2002
  • By Kirk Allen Evans

This article is an excerpt from the book XML and ASP.NET (ISBN:073571200X), written by Kirk Allen Evans, Ashwin Kamanna, and Joel Mueller, published by New Riders Publishing.

What is XPath?

XML is simply markup for data. That's it. XML is not a magic wand; it does not specify how data is transmitted over the wire, it does not specify how data is stored. XML simply determines the format of the data: what you do with the data is up to you. That said, the real power behind XML is not solely its ability to represent data: XML's real power lies in ancillary technologies that, when combined with XML, provide robust solutions, and XPath is one of those ancillary technologies.

Version 1.0 of the XML Path Language became a World Wide Web Consortium (W3C) Recommendation on November 16, 1999. You can view the W3C Recommendation for XPath 1.0 at http://www.w3.org/TR/xpath. That document gives an overview of XPath and describes each of its components.

XPath grew out of efforts to share a common syntax between XSL Transformations (XSLT) and XPointer. It allows for the search and retrieval of information within an XML document structure. XPath is not an XML syntax: rather, it uses a syntax that relates to the logical structure of an XML document.

An Analogy to SQL

Consider a relational database. Is the real power of a database the ability to simply store data, index the data, and specify relations between tables of data? After all, a database is supposed to hold data, so is the capability of persisting data the real advantage behind a relational database? If so, a simple file would suffice for this. It is easy to see that the real power of a database is the ability to use Structured Query Language (SQL) statements to retrieve subsets of data. To take this example one step further, the fact that SQL is an ANSI standard makes your knowledge of SQL applicable to different databases running on different platforms.

Using this same logic, XML would be nothing more than a format for data storage if it had no prescribed way of retrieving that data. This is exactly the role XPath fills: XPath, the common name for the XML Path Language, is the query language for XML documents. Using XPath statements, you can retrieve complex subsets of data from XML documents using a syntax that is universal across implementations. The same XPath statements that work within the System.Xml and System.Xml.XPath namespaces should work exactly the same in the MSXML Parser, and both should work exactly the same as in other parsers that implement the W3C XPath recommendation.
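To make the SQL comparison concrete, here is a minimal sketch in Python rather than .NET. It assumes the standard library's xml.etree.ElementTree module, which implements only a subset of XPath, and the catalog data is hypothetical, invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
CATALOG = """
<catalog>
  <book category="xml"><title>XML Basics</title></book>
  <book category="sql"><title>SQL Basics</title></book>
  <book category="xml"><title>XPath in Practice</title></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Roughly comparable to: SELECT title FROM book WHERE category = 'xml'
xml_titles = [t.text for t in root.findall("./book[@category='xml']/title")]
print(xml_titles)
```

The path plays the part of the FROM clause and the bracketed predicate plays the part of the WHERE clause. A parser with full XPath 1.0 support should accept the same expression unchanged.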

An Analogy to a File Path

Computers are built around files and the organization of those files. To access files, you need to be able to navigate to different portions of the file system. One way to navigate a file system is to use the Universal Naming Convention (UNC) for specifying the location of resources on a local area network (LAN). UNC separates folders and files using a backslash (\) character. In the good ol' DOS days before point-and-click, file systems were navigated using command-line syntax.

Go to the Start button on your computer; choose Run, and type cmd in the text box to bring up a DOS command shell window. You will see the following text:

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.
C:\>

At the command prompt, change directories from the C: root to the Program Files\Microsoft Visual Studio directory.

C:\>cd Program Files\Microsoft Visual Studio

To change directories, you specified a path for the file system to navigate. More to the point, you specified a series of location steps used to navigate to a new folder based on the current folder. XPath uses a very similar syntax. Imagine your file system as an XML document.

<?xml version="1.0" encoding="utf-8" ?>
<C>
   <INETPUB>
     <WWWROOT>
        <ASPNET_CLIENT/>
     </WWWROOT>
   </INETPUB>
</C>

We could easily represent this as an XPath statement:

C/INETPUB/WWWROOT/ASPNET_CLIENT

If we are positioned at the very beginning of the document (the document root), this path makes four location steps, one per element name. But what if we were currently positioned on the WWWROOT element and wanted to reposition to the ASPNET_CLIENT element? We would specify the following XPath statement:

./ASPNET_CLIENT

The period (.) at the beginning of the XPath statement represents the expression "the context node", meaning the node that we originally started from. Instead of specifying that we are navigating based on the context node, we can also use a short form of XPath that specifies a path relative to the context node:

ASPNET_CLIENT
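The paths above can be tried against the file system document with a short Python sketch. It assumes the standard library's xml.etree.ElementTree module, which evaluates a path relative to an element rather than to the document root, so the leading C step is already consumed once the C element is the context node. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

FILE_SYSTEM = """<?xml version="1.0" encoding="utf-8" ?>
<C>
   <INETPUB>
     <WWWROOT>
        <ASPNET_CLIENT/>
     </WWWROOT>
   </INETPUB>
</C>"""

# Parse from bytes so the encoding declaration is honored.
root = ET.fromstring(FILE_SYSTEM.encode("utf-8"))  # root is the <C> element

# With <C> as the context node, three location steps remain.
client = root.find("INETPUB/WWWROOT/ASPNET_CLIENT")

# Reposition the context node to WWWROOT, then take one relative step.
wwwroot = root.find("INETPUB/WWWROOT")
explicit = wwwroot.find("./ASPNET_CLIENT")  # context node written as "."
short = wwwroot.find("ASPNET_CLIENT")       # short form, context implied

print(client.tag, explicit.tag, short.tag)
print(explicit is short)  # both forms select the same node
```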

A location path is made up of one or more location steps, and each location step is composed of three parts: the axis, the node-test, and zero or more predicates.
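The three parts named above can be pulled apart with a small Python sketch. The split_step helper is hypothetical and deliberately simplified: it uses a regular expression rather than a real XPath parser, does not handle nested brackets inside predicates, and assumes the XPath 1.0 rules that an omitted axis means child and that a leading @ abbreviates the attribute axis.

```python
import re

# axis::node-test[predicate][predicate]...
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"      # optional axis name followed by ::
    r"(?P<test>[^\[\]]+)"             # node test
    r"(?P<preds>(?:\[[^\[\]]*\])*)$"  # zero or more [predicates]
)

def split_step(step):
    """Split one location step into (axis, node_test, predicates)."""
    m = STEP.match(step)
    if m is None:
        raise ValueError("not a simple location step: " + step)
    axis = m.group("axis") or "child"  # child is the default axis
    test = m.group("test")
    if m.group("axis") is None and test.startswith("@"):
        axis, test = "attribute", test[1:]  # @name abbreviates attribute::name
    predicates = re.findall(r"\[([^\[\]]*)\]", m.group("preds"))
    return axis, test, predicates

for step in ["child::INETPUB", "ASPNET_CLIENT", "ancestor::C[1]", "@name"]:
    print(step, "->", split_step(step))
```

Splitting a whole path on the slash character and passing each piece to a helper like this gives one (axis, node-test, predicates) triple per location step.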

XPath Axis

The axis component of an XPath query determines the direction of the node selection in relation to the context node. An axis can be thought of as a directional query. The axes listed in Table 1 are provided in XPath.

Table 1 XPath Axes

Axis                 Description
ancestor             The context node's parent, the parent's parent, and so on.
ancestor-or-self     The context node as well as its ancestors.
attribute            The attributes of the context node.
child                All children of the context node (attributes cannot have children).
descendant           All descendants of the context node: children, children's children, and so on.
descendant-or-self   The context node as well as all of its descendants.
following            All nodes in the same document as the context node that come after the context node in document order. This does not include descendants, attribute nodes, or namespace nodes.
following-sibling    All the following siblings of the context node. A sibling is a node that has the same parent as the context node.
namespace            The namespace nodes of the context node.
parent               The parent of the context node.
preceding            All nodes in the same document as the context node that come before the context node in document order. This does not include ancestors, attribute nodes, or namespace nodes.
preceding-sibling    All the preceding siblings of the context node. If the context node is either an attribute or a namespace node, the preceding-sibling axis is empty.
self                 The context node itself.
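Several of these axes can be modeled by hand to show what each one selects. The following Python sketch assumes the standard library's xml.etree.ElementTree module, which does not support named axes in its path syntax, so each axis is written as an ordinary function. Only element nodes are modeled, results are returned as plain lists, and the FTPROOT and SCRIPTS elements are hypothetical additions that give WWWROOT some siblings.

```python
import xml.etree.ElementTree as ET

# FTPROOT and SCRIPTS are hypothetical, added so that WWWROOT has siblings.
DOC = ("<C><INETPUB><FTPROOT/><WWWROOT><ASPNET_CLIENT/></WWWROOT>"
       "<SCRIPTS/></INETPUB></C>")

root = ET.fromstring(DOC)
# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}
document_order = list(root.iter())

def ancestor(node):
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out  # nearest ancestor first

def descendant(node):
    return list(node.iter())[1:]  # iter() yields the node itself first

def following_sibling(node):
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

def following(node):
    # Everything later in document order, minus the node's descendants.
    later = document_order[document_order.index(node) + 1:]
    inside = descendant(node)
    return [n for n in later if n not in inside]

def preceding(node):
    # Everything earlier in document order, minus the node's ancestors.
    earlier = document_order[:document_order.index(node)]
    above = ancestor(node)
    return [n for n in earlier if n not in above]

def names(nodes):
    return [n.tag for n in nodes]

wwwroot = root.find("INETPUB/WWWROOT")
for axis in (ancestor, descendant, following_sibling, following, preceding):
    print(axis.__name__, names(axis(wwwroot)))
```

With WWWROOT as the context node, the preceding function skips C and INETPUB because they are ancestors, and the following function skips ASPNET_CLIENT because it is a descendant.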





