http://www.developer.com/


XQuery Language Expressions


March 19, 2004

The Need for XQuery

When you combine XML, which is essentially an open data format independent of any specific data-formatting language, with a universal addressing mechanism such as URLs, great things happen. Together, these two factors make each endpoint of a transaction independent of the technologies used at the other end.

However, this transparency can break down in the presence of databases. Even if a database has an XML-based interface (either through SOAP or through HTTP POST or GET commands), querying that database to retrieve a dataset makes things more complicated. SQL is the standard way to query most databases, but each database vendor implements the SQL standard slightly (or not so slightly) differently. What's worse, the result sets that come back are also formatted according to the whims of the database provider.

As mentioned in my book XQuery Kick Start, XQuery was proposed as a solution to this conundrum. To make databases truly transparent (so that it doesn't matter whether there is a Microsoft SQL Server, IBM DB2, Oracle 9i, PostgreSQL, or any other vendor's database engine), it is necessary to make a query language that will provide the following:

  • Platform consistency—The language should be the same, regardless of which vendor or product is being used.

  • XML-centric—The language should treat non-XML databases as being equivalent to XML data stores. Note that this does not necessarily imply that the language has to be written in XML.

  • Set-capable—SQL differs from languages such as Java or Visual Basic because it implicitly works on sets of data all at once rather than one item of data at a time. A new query language would similarly need to be set-focused—or more properly, node-focused.

  • Ease of use—The language should be easy enough to enable someone with a minimal database background to create complex queries.

The goal of this chapter is to examine how XQuery handles each of these points. The examples in this chapter are deliberately simple to better illustrate the command structure, but in practice XQuery is most useful for fairly complex scripts.

Why Not SQL?

One of the first questions you might ask about XQuery is "Why not just create an updated version of SQL and use it with XML?" Beyond the vendor implementation issue we mentioned, one of the biggest problems comes from the difference between the SQL and XML data models (infosets).

When you make a SQL query, you create from several tables a single virtual table (in simplest terms), which consists of the intersections, unions, and other filtered operations of the source tables. This is one of the reasons why being able to factor a problem domain into a minimal set of tables is so important: The more overlap that exists in tables, the more work must be done to eliminate spurious data sets.

Relational databases can be extraordinarily complex. Moreover, they are not good with "semi-structured" data (documents). Even a simple HTML document can very quickly overwhelm a relational database with its apparent complexity, and one of the characteristics of XML is the fact that a given element can have a range of possible subelements (or might even have an open definition in which the element can have anything as a subelement).

SQL mirrors the tabular mindset of good database design, and has been optimized for it. This means that it doesn't handle unpredictability and irregularity in data structure well.

As more data moves into an XML form, SQL's limitations become more pressing. As code becomes more complex, the advantages of a common XML-oriented data query language will become more obvious.

XSLT and XPath as Query Languages

A well-defined query language already exists for XML: XPath, covered in detail in Chapter 2, "Understanding the XPath Specification." You can format that query output with XSLT to create any XML format, so you might wonder why we need a dedicated XML query language.

XPath is an integral part of XQuery, as will be discussed in this chapter. Thus the real question is, "Why use XQuery rather than XSLT?" There are a few answers, although how compelling they are depends on how experienced you are with using XML:

  • Easier to use: XQuery code reads more procedurally than XSLT, so it may feel more familiar than the often paradigm-bending XSLT language.

  • Less verbose: XSLT is an XML-based language, so even simple routines can take up pages of code. This is a valid charge against XSLT 1.0, but less so against XSLT 2.0.

  • Less document-centric: XSLT assumes an input stream of an XML document, although with features such as unparsed-text() and collection(), this requirement is less stringent in XSLT 2.0. XQuery works implicitly on sets of nodes that don't necessarily have to be XML in origin, although in all likelihood any XQuery solution converts such data to an XML representation before processing it.

The Structure of XQuery

Having described why XQuery is more or less necessary, let's look at its general makeup. The language consists of these primary areas:

  • for/let: Analogous to the SQL SELECT and SET statements, the for and let clauses let you define variables or iterate across a sequence, assigning each value in turn to a variable.

  • where: Analogous to the SQL WHERE clause, the where clause provides a set of conditions that filter or limit the initial selection made in a for clause.

  • order by: Analogous to ORDER BY in SQL, the order by clause specifies how the items of a sequence are sorted.

  • return: Unlike SQL, which returns its results as rows of a table, the XQuery return clause uses its own constructor syntax to build the output. The output does not necessarily have to be XML, although the language is optimized to produce XML.

  • XPath: Most XPath 2.0 functions are supported in XQuery, as is the axis model used to navigate over XML structures (although some data sources might not support all aspects of XPath because of the type of data involved).

  • Functions: Somewhat analogous to SQL stored procedures, functions can be defined within an XQuery and then called inline wherever an expression is allowed.

  • Namespaces: A feature of XML rather than SQL. A declare namespace declaration associates a namespace URI with a prefix, which is crucial for identifying the vocabulary (and thus the functionality) that a prefixed name belongs to.

Many keywords are associated with XQuery, but they fall into these broad categories, each with its own syntax for a particular kind of task. For instance, here's an example of a full XQuery expression that uses every one of these areas except order by:

declare namespace xs = "http://www.w3.org/2001/XMLSchema";

declare function local:summaryText($char as element(character)) as xs:string
{
   concat($char/name, ' is a ', $char/gender, ' ', $char/species)
};

<results>{
   let $chars := doc('characters.xml')//character[gender = 'Female']
   for $char in $chars
      where $char/level > 5
      return
          <summary health="{$char/health}"> {
          attribute level {$char/level},
          attribute date {current-dateTime()},
          local:summaryText($char)
          }
         </summary>
   }
 </results>

This is a query for pulling out a series of records from a game database. It assumes an initial XML data stream with records corresponding to different characters, as shown in the file characters.xml:

<characters>
<character>
  <name>Aleria</name>
  <gender>Female</gender>
  <species>Heroleim</species>
  <vocation>Bard</vocation>
  <level>5</level>
  <health>25</health>
</character>
<character>
  <name>Shar</name>
  <gender>Male</gender>
  <species>Human</species>
  <vocation>Merchant</vocation>
  <level>6</level>
  <health>28</health>
</character>
<character>
  <name>Gite</name>
  <gender>Female</gender>
  <species>Aelvar</species>
  <vocation>Mage</vocation>
  <level>7</level>
  <health>18</health>
</character>
<character>
  <name>Horukkan</name>
  <gender>Male</gender>
  <species>Udrecht</species>
  <vocation>Warrior</vocation>
  <level>5</level>
  <health>32</health>
</character>
<character>
  <name>Gounna</name>
  <gender>Female</gender>
  <species>Noleim</species>
  <vocation>Mage</vocation>
  <level>8</level>
  <health>31</health>
</character>
<character>
  <name>Sheira</name>
  <gender>Female</gender>
  <species>Human</species>
  <vocation>Cleric</vocation>
  <level>4</level>
  <health>9</health>
</character>
<character>
  <name>Drue</name>
  <gender>Female</gender>
  <species>Voleim</species>
  <vocation>Warrior</vocation>
  <level>6</level>
  <health>32</health>
</character>
<character>
  <name>Paccu</name>
  <gender>Male</gender>
  <species>Human</species>
  <vocation>Merchant</vocation>
  <level>5</level>
  <health>24</health>
</character>
</characters>

When the query is executed, it returns an XML document showing all female characters that are of greater than fifth level, with a text summary:

<results>
   <summary date="2002-10-24T14:38:48" health="18" level="7">
Gite is a Female Aelvar</summary>
   <summary date="2002-10-24T14:38:48" health="31" level="8">
Gounna is a Female Noleim</summary>
   <summary date="2002-10-24T14:38:48" health="32" level="6">
Drue is a Female Voleim</summary>
 </results>

Note that this output has been reformatted somewhat to make it more legible. Whitespace is not usually significant in XQuery—without the formatting, the result of the previous query is as follows:

<results><summary date="2002-10-24T14:38:48" health="18" level="7">
Gite is a Female Aelvar
         </summary><summary date="2002-10-24T14:38:48" 
health="31" level="8">Gounna is a Female Noleim
         </summary><summary date="2002-10-24T14:38:48" 
health="32" level="6">Drue is a Female Voleim
         </summary>

</results>
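The query's logic can be cross-checked without an XQuery processor. The following is a rough Python analogue using the standard xml.etree.ElementTree module; it is only an illustration, not how an XQuery engine works internally. It rebuilds characters.xml from the data above (leaving out vocation, which the query never uses), applies the same female-and-level-above-5 filter, and omits the date attribute because its value depends on the clock. The names build_document, summary_text, and run_query are hypothetical.

```python
import xml.etree.ElementTree as ET

# (name, gender, species, level, health) rows taken from characters.xml;
# vocation is left out because the query never uses it.
CHARACTERS = [
    ("Aleria", "Female", "Heroleim", 5, 25),
    ("Shar", "Male", "Human", 6, 28),
    ("Gite", "Female", "Aelvar", 7, 18),
    ("Horukkan", "Male", "Udrecht", 5, 32),
    ("Gounna", "Female", "Noleim", 8, 31),
    ("Sheira", "Female", "Human", 4, 9),
    ("Drue", "Female", "Voleim", 6, 32),
    ("Paccu", "Male", "Human", 5, 24),
]
FIELDS = ("name", "gender", "species", "level", "health")


def build_document():
    """Build an in-memory tree shaped like characters.xml."""
    root = ET.Element("characters")
    for row in CHARACTERS:
        char = ET.SubElement(root, "character")
        for tag, value in zip(FIELDS, row):
            ET.SubElement(char, tag).text = str(value)
    return root


def summary_text(char):
    """Counterpart of summaryText(): '<name> is a <gender> <species>'."""
    return "{} is a {} {}".format(
        char.findtext("name"), char.findtext("gender"), char.findtext("species"))


def run_query(root):
    results = ET.Element("results")
    # let $chars := ...//character[gender = 'Female']
    chars = root.findall(".//character[gender='Female']")
    for char in chars:                            # for $char in $chars
        if int(char.findtext("level")) > 5:       # where $char/level > 5
            summary = ET.SubElement(              # return <summary ...>
                results, "summary",
                health=char.findtext("health"), level=char.findtext("level"))
            summary.text = summary_text(char)
    return results


results = run_query(build_document())
for summary in results:
    print(summary.get("level"), summary.text)
```

The three summaries come out in document order (Gite, Gounna, Drue), matching the XQuery result.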

Because each of the major areas has its own language and syntax, the best way to understand the full XQuery language is to break it into the code for each type of expression.

Assignments with let

XQuery, like XPath and XSLT, is a declarative language. What that means in practice is a little more nebulous. One way of thinking about it is that in a declarative language, after an item is defined within a given scope (such as a program block or function), the item can't be redefined within that scope. For instance, the following code is illegal in a declarative language:

summa = 0;
while (a in (1,3,6,4,7)){
   summa=summa+a;
   }
print(summa);

What causes the problem is that the variable summa changes its value within the same context. For many programmers, the idea that you can't use an accumulator like this might seem counterintuitive, but it turns out that placing one restriction on your code can significantly reduce errors when both developing and deploying code.

A declarative language can perform the same type of operation, but it works on the assumption of a generalized context. In essence, you are creating a buffer to which you're adding content, and although you can control what goes into that buffer, after the buffered content is created, you can't go in and change that buffer—you can only create other buffers from that one. That's why certain operations, such as summation, require specialized functions:

let $summa := sum((1,3,6,4,7))
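Python is not a declarative language, but the contrast can be sketched there as well. The built-in sum() and the standard-library functools.reduce() both compute the total without your code ever updating an accumulator variable; this is an analogy to XQuery's sum(), not a description of how XQuery implements it.

```python
from functools import reduce
import operator

values = (1, 3, 6, 4, 7)

# Counterpart of sum((1,3,6,4,7)): an aggregate function, no accumulator.
summa = sum(values)

# The same total as a fold: each step returns a new value computed from the
# previous one, instead of the program updating a shared variable.
summa_folded = reduce(operator.add, values, 0)

print(summa, summa_folded)  # 21 21
```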

The let clause in XQuery binds a name to a value. This might look like an ordinary variable assignment, but once a variable is bound, its value can't be changed. Thus,

let $summa := sum((1,3,6,4,7))
let $summa := $summa + 6

does not update the first $summa. Under the XQuery 1.0 Recommendation, the second let clause creates a new, separate variable that happens to share the name and hides (shadows) the first one for the rest of the expression; the original binding is never modified.

You can use the let operator in conjunction with XPath to create a reference to a sequence. For instance, in the characters.xml file, you could create a sequence of female characters and assign it to the variable $femaleChars as follows:

let $femaleChars := doc('characters.xml')//character[gender = 'Female']

You could then retrieve the second female character by using the sequence notations discussed in Chapter 2:

let $secondFemaleChar := $femaleChars[2]
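A rough Python equivalent highlights two details of these lines: gender is a child element rather than an attribute, and XQuery positions start at 1 while Python indexes start at 0. It uses xml.etree.ElementTree, whose limited XPath subset happens to support the [child='text'] predicate; the trimmed document is an assumption made to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of characters.xml: only the first five characters,
# and only the name and gender fields.
DOC = """<characters>
  <character><name>Aleria</name><gender>Female</gender></character>
  <character><name>Shar</name><gender>Male</gender></character>
  <character><name>Gite</name><gender>Female</gender></character>
  <character><name>Horukkan</name><gender>Male</gender></character>
  <character><name>Gounna</name><gender>Female</gender></character>
</characters>"""

root = ET.fromstring(DOC)

# //character[gender = 'Female']: the test is on the child element <gender>.
female_chars = root.findall(".//character[gender='Female']")

# XQuery's $femaleChars[2] counts from 1; Python counts from 0.
second_female = female_chars[1]
print(second_female.findtext("name"))  # Gite
```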

Iterations with for

The let keyword by itself makes sense if you are dealing with one item from a sequence at any given time, but XPath is ultimately a set-manipulation language, and you need to have some way of dealing with the information as a set. This is the domain of the for keyword.

The principal purpose of for is to assign successive values in a sequence to a temporary variable. For instance, the following line steps you through the first five (named) letters of the Greek alphabet:

for $grkLetter in ("alpha","beta","gamma","delta","epsilon")

This code first associates the value "alpha" with the variable $grkLetter to perform some processing, then sets the value to "beta", and so forth until it reaches "epsilon". You could also do this with a previously defined sequence stored in a variable:

let $femaleChars := doc('characters.xml')//character[gender = 'Female']
for $femaleCharacter in $femaleChars

Similarly, you can use the XPath to operator to iterate over numbers to do something analogous to the for statement in C++, Java, or Visual Basic. This example iterates over the first ten numbers:

for $index in (1 to 10)
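The same two iterations can be sketched in Python for comparison. One difference is worth keeping in mind: XQuery's to operator includes both endpoints, whereas Python's range() stops before its second argument, so (1 to 10) corresponds to range(1, 11). The "processing" strings are placeholders for whatever work the loop body would do.

```python
greek = ("alpha", "beta", "gamma", "delta", "epsilon")

# for $grkLetter in (...): each value is bound to the loop name in turn.
steps = [f"processing {grk_letter}" for grk_letter in greek]

# for $index in (1 to 10): XQuery's "to" includes both endpoints, while
# Python's range() stops before its second argument, hence range(1, 11).
indexes = list(range(1, 11))

print(steps[0])  # processing alpha
print(indexes)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```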

After the discussion about XQuery being a functional language in which you are unable to assign multiple values to a single variable name, the use of the for statement might seem to be a clear violation. However, it isn't. Technically speaking, the restriction says you cannot create two variables with the same name in the same scope. This is somewhat analogous to a set of nested loops in a language such as JavaScript, whose let keyword declares a variable that is local to the enclosing block. For instance, consider the following JavaScript fragment:

for (let index = 0; index <= 1; index++) {
   const outer = index;   // remember the outer value before it is hidden
   for (let index = 0; index <= 2; index++) {
      // this index is the inner one; the outer index is shadowed here
      console.log(outer + ":" + index);
   }
}

You have two distinct scopes: the first belonging to the outer for loop, the second to the inner. This example, when run, prints a result that may be counterintuitive if you expect the inner loop to overwrite the outer counter:

0:0
0:1
0:2
1:0
1:1
1:2

The outer variable is temporarily hidden when a variable with the same name is declared within the inner scope, as long as the inner variable is declared with its own let keyword; when the inner loop finishes, the outer index is visible again with its value intact. This makes it possible to avoid naming collisions, where you end up naming a variable the same way someone else named it in some other piece of code.

In essence, the XQuery for operator acts the same way—the local variable (the variable before the in keyword) is defined within the scope of the internal block, something analogous to

for (const tempVar of mySequence) {

in a language like JavaScript, where each iteration gets a fresh tempVar binding. Likewise, $tempVar is instantiated, populated, used, and then destroyed, at which point a second (or third, or fifth, or whatever) $tempVar is created. Because a variable is never created while one of the same name already exists in that scope, the for clause cannot violate the tenet of reassignment.
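Python 3 list comprehensions behave in a comparable way and make the idea easy to try: the loop name is bound inside the comprehension's own scope, so an outer variable with the same name is left untouched. This is an analogy to XQuery's per-iteration binding, not a claim about how XQuery processors are implemented.

```python
grk_letter = "outer value"

# In Python 3 the comprehension has its own scope: each iteration binds
# grk_letter inside that scope, and the outer name is never reassigned.
upper = [grk_letter.upper() for grk_letter in ("alpha", "beta", "gamma")]

print(upper)       # ['ALPHA', 'BETA', 'GAMMA']
print(grk_letter)  # outer value
```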

Both for and let can also work with full node trees that can be defined inline. For instance, you can create an XQuery that defines a set of regular expression filters, which can be accessed later. Because braces inside an element constructor normally mark an expression to evaluate, the literal braces in each regular expression are doubled ({{ and }}):

let $filters := (
  <filter name="phone" regex="\(\d{{3}}\)\d{{3}}-\d{{4}}"/>,
  <filter name="zipcode" regex="\d{{5}}(-\d{{4}})?"/>,
  <filter name="email" regex="\w+@\w+\.\w"/>
  )
for $filter in $filters
return $filter

The output is as follows:

<filter name="phone" regex="\(\d{3}\)\d{3}-\d{4}"/>
<filter name="zipcode" regex="\d{5}(-\d{4})?"/>
<filter name="email" regex="\w+@\w+\.\w"/>

In this case, the sequence of elements is defined explicitly. Because whitespace is not (generally) significant within XML queries, you can create rich XML trees inline:

let $filters := (
  <filter>
    <name>phone</name>
    <regex>\(\d{{3}}\)\d{{3}}-\d{{4}}</regex>
  </filter>,
  <filter>
    <name>zipcode</name>
    <regex>\d{{5}}(-\d{{4}})?</regex>
  </filter>,
  <filter>
    <name>email</name>
    <regex>\w+@\w+\.\w </regex>
  </filter>
)
for $filter in $filters
return $filter

This code produces slightly more complex output:

  <filter>
    <name>phone</name>
    <regex>\(\d{3}\)\d{3}-\d{4}</regex>
  </filter>
  <filter>
    <name>zipcode</name>
    <regex>\d{5}(-\d{4})?</regex>
  </filter>
  <filter>
    <name>email</name>
    <regex>\w+@\w+\.\w </regex>
  </filter>
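The article doesn't show how these filters are applied, so the following is only one possible use, sketched in Python with a hypothetical matching_filters helper. It uses re.search because XQuery's matches() function likewise succeeds when any substring matches an unanchored pattern. XML Schema regular expressions and Python's re module differ in some details (for example, exactly which characters \w covers), so the sample inputs stick to plain ASCII.

```python
import re

# The three filters defined above, as (name, pattern) pairs.
FILTERS = [
    ("phone", r"\(\d{3}\)\d{3}-\d{4}"),
    ("zipcode", r"\d{5}(-\d{4})?"),
    ("email", r"\w+@\w+\.\w"),
]


def matching_filters(value):
    """Return the names of the filters whose pattern occurs somewhere in value.

    re.search (rather than re.fullmatch) is used because XQuery's matches()
    also succeeds when any substring matches an unanchored pattern.
    """
    return [name for name, pattern in FILTERS if re.search(pattern, value)]


print(matching_filters("(555)123-4567"))     # ['phone']
print(matching_filters("98101-1234"))        # ['zipcode']
print(matching_filters("kurt@example.com"))  # ['email']
```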

Returning Results

Neither assignment nor iteration by itself can produce output—they are used only to define variables or iterate through sets of variables, in a manner similar to the SQL SET and SELECT statements, respectively. The key to working with XQuery is to use these statements to choose the nodes with which you're going to work, and then pass those nodes onto the relevant output format. This is where the return keyword comes into play.

The purpose of return is to take the sets defined by the previous for and let statements and turn them into some form of result. It's important to realize that the result of an XQuery expression does not have to be XML. It could be a sequence of values, a single string expression, or a host of any other possible results, although the language is optimized to produce XML preferentially.

Notice, for instance, that the result in the previous sample is not, strictly speaking, an XML document:

<filter>
  <name>phone</name>
  <regex>\(\d{3}\)\d{3}-\d{4}</regex>
</filter>
<filter>
  <name>zipcode</name>
  <regex>\d{5}(-\d{4})?</regex>
</filter>
<filter>
  <name>email</name>
  <regex>\w+@\w+\.\w </regex>
</filter>

Instead, it is a sequence of three separate <filter> element nodes with no single root element. The output of an XQuery is always a sequence of items, whether XML nodes, strings, numbers, or some combination. For instance,

for $a in (1 to 10) return $a

produces the following sequence of ten integers:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Each number is a distinct item in the result sequence rather than part of a single string.

This raises an interesting question: what is an output node? When an XQuery generates a result, how that result is exposed is application-specific. The result is, as mentioned, a sequence of items. Many implementations hand back DOM Node objects (or an equivalent in their own API), which might be elements, attributes, text nodes, or other node types. A non-XML result (anything that can't immediately be translated into an XML element or attribute) is often returned as a text node, regardless of the data type of the values being returned.

The expression after the return syntax can be a little confusing, especially if you are used to working with XSLT. You can introduce new elements into the output directly through the use of traditional XML-bracketed elements. For instance, you could in theory generate XML output from the list of numbers by placing an arbitrary XML element (such as a <number> tag) around the variable:

for $a in (1 to 3) return <number>$a</number>

Unfortunately, this will likely not give you the result you expect. The previous XQuery produces this result:

<number>$a</number>
<number>$a</number>
<number>$a</number>

This happens because whenever you introduce an XML tag (opening and closing) into a result, the XQuery processor treats anything within those tags as literal content and doesn't evaluate it. To force evaluation, you need to use the evaluation operators: {}. These braces instruct the XQuery engine to treat the content within them as an XQuery expression.

So, to get the expected result (a set of three numbers within tags), you change the XQuery to incorporate the evaluation operators:

for $a in (1 to 3)
return <number>{$a}</number>
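Python's f-strings happen to follow a very similar convention, which makes them a handy mental model: text outside braces is copied literally, text inside braces is evaluated, and doubled braces produce literal braces (XQuery constructors use the same {{ and }} escapes). This is only an analogy; nothing here evaluates XQuery.

```python
values = (1, 2, 3)

# Plain text: "$a" is copied through as-is, like <number>$a</number>.
literal = ["<number>$a</number>" for _ in values]

# Braces switch to evaluation, like <number>{$a}</number>.
evaluated = [f"<number>{a}</number>" for a in values]

# Doubled braces yield literal braces, as {{ and }} do in XQuery constructors.
escaped = f"<regex>\\d{{3}}</regex>"

print(literal[0])  # <number>$a</number>
print(evaluated)   # ['<number>1</number>', '<number>2</number>', '<number>3</number>']
print(escaped)     # <regex>\d{3}</regex>
```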

This can lead to some interesting conditions. In any XQuery, there is an implicit assumption that the expression starts in evaluation mode—in other words, there is an implicit return statement at the highest level. That's why expressions such as

for $a in (1 to 3)
return <number>{$a}</number>

are evaluated in the first place. However, if you place arbitrary elements around the XQuery expression, the mode of operation switches into static mode:

<numbers><!-- now in static mode -->
for $a in (1 to 3)
return <number>{$a}</number>
</numbers>

In this case, the content is treated as literal text until the processor reaches the braces around $a. At that point, you are asking the XQuery engine to evaluate a variable that has not been defined ($a), so the query should fail. Indeed, using eXcelon Stylus Studio, the error received when this script ran was specifically "Variable a does not exist".

Consequently, to have the content of the <numbers> element evaluated as an XQuery expression, you must enclose it within the evaluation operators {}:

<numbers>
{
for $a in (1 to 3)
return <number>{$a}</number>
}
</numbers>

This returns the expected results:

<numbers>
   <number>1</number>
   <number>2</number>
   <number>3</number>
</numbers>

This example also illustrates a second principle of evaluating XQuery expressions: You can nest {} blocks, as long as each one is used within an element in static context. For instance, in the example, the <numbers> tag puts the XQuery into static mode, so you have to place evaluation operators around the whole expression. Similarly, the <number> element puts the XQuery expression back into static mode, so you once again have to place the expression to be evaluated (in this case, the $a variable) into braces.

This can be seen in a slightly more sophisticated XQuery:

<numbers>
{for $a in (1 to 3) return
   <set>{for $b in (1 to $a)
      return <item>{$b}</item>
   }</set>
}
</numbers>

In this case, the $a variable iterates through the values from 1 to 3, producing a <set> element for each value. Each <set> element in turn iterates $b from 1 to $a (whatever $a happens to be for that pass) and performs its own internal return to produce <item> elements. This produces the following result:

<numbers>
   <set>
      <item>1</item>
   </set>
   <set>
      <item>1</item>
      <item>2</item>
   </set>
   <set>
      <item>1</item>
      <item>2</item>
      <item>3</item>
   </set>
</numbers>
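The nesting maps directly onto two nested loops. Here is a sketch that builds the same tree with Python's xml.etree.ElementTree, one inner loop per <set>; it serializes without the indentation shown above.

```python
import xml.etree.ElementTree as ET

numbers = ET.Element("numbers")
for a in range(1, 4):                  # for $a in (1 to 3)
    set_elem = ET.SubElement(numbers, "set")
    for b in range(1, a + 1):          # for $b in (1 to $a)
        ET.SubElement(set_elem, "item").text = str(b)

print(ET.tostring(numbers, encoding="unicode"))
```

Because the inner range depends on a, the first <set> gets one <item>, the second two, and the third three.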

These evaluated expressions can, of course, be more complex than simply returning the value of a variable. For instance, you can create a table in HTML that sums up sequences of varying lengths, as follows:

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
<table>
   <tr>
      <th>Index</th>
      <th>Sum From 1 to Index</th>
   </tr>
{for $a in (1 to 10) return
   <tr>
      <td>{$a}</td>
      {
      let $b := (1 to $a)
      return
          <td>{sum($b)}</td>
      }
   </tr>
}
</table>
</body>
</html>

This example points out several salient lessons. First, you can use XQuery to generate HTML, which makes it a particularly potent tool for creating reports—an avenue we'll explore in greater depth in Chapter 4, "XQuery and XSLT." Second, you can use XQuery functions in the result blocks, such as the use of the sum() function to add up each successive $b list (that is, the lists (1), (1,2), (1,2,3), (1,2,3,4), and so on). Finally, any variable that is defined in an outside expression (such as the $a variable) is available for use within the inside expression, such as

let $b := (1 to $a)
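The values the table should contain are easy to check independently. This short Python sketch computes the same running sums; each total equals a(a+1)/2.

```python
# One row per index a: (a, sum of the sequence 1 to a).
# range(1, a + 1) stands in for (1 to $a), since range() excludes its stop value.
rows = [(a, sum(range(1, a + 1))) for a in range(1, 11)]

for index, total in rows:
    print(index, total)
```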

You can similarly use evaluated expressions within attributes. For instance, suppose you want to create a table of colors in HTML. To do so, you need both the name of each color and a rectangle filled with that color to illustrate the shade, set using the Cascading Style Sheets background-color property, as follows (see Figure 1):

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
{let $colors :=("white","red","blue","green","yellow","purple","orange","black")
return
<table border="1">
   <tr>
      <th>Color</th>
      <th>Example</th>
   </tr>
{for $color in $colors return
   <tr>
      <td>{$color}</td>
       <td style="background-color:{$color}">&#160;</td>
   </tr>
}
</table>
}
</body>
</html>

Figure 1
You can use XQuery to generate more than just textual data, as this color sample illustrates.

The entity &#160; is a nonbreaking space—within an HTML <td> element, it ensures that the background color will always be rendered. What's most important here is the use of the evaluated expression in the style attribute:

<td style="background-color:{$color}">&#160;</td>

This basically replaces the indicated expression {$color} with its associated values: "white", "red", "blue", and so on, and as with elements, the expression within the attribute block could be a full XQuery expression (whitespace, including carriage returns, doesn't count in the way the attribute is handled).
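For comparison, here is a sketch of the same table built with Python's xml.etree.ElementTree. The f-string plays the role of the attribute value "background-color:{$color}": literal text followed directly by the evaluated value, with nothing inserted between them. The character U+00A0 is the nonbreaking space that the query writes as &#160;.

```python
import xml.etree.ElementTree as ET

colors = ("white", "red", "blue", "green", "yellow", "purple", "orange", "black")
NBSP = "\u00a0"  # the character written as &#160; in the query

table = ET.Element("table", border="1")
header = ET.SubElement(table, "tr")
for label in ("Color", "Example"):
    ET.SubElement(header, "th").text = label

for color in colors:
    row = ET.SubElement(table, "tr")
    ET.SubElement(row, "td").text = color
    # Like style="background-color:{$color}": literal text followed by the
    # evaluated value, with nothing inserted between them.
    swatch = ET.SubElement(row, "td", style=f"background-color:{color}")
    swatch.text = NBSP

print(ET.tostring(table, encoding="unicode"))
```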

The tag notation is useful in certain circumstances, but sometimes it can get in the way. The computed element and attribute constructors serve the same purpose, but they don't require closing tags. The previous XQuery could be rewritten using these constructors as follows:

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
{let $colors :=("white","red","blue","green","yellow","purple","orange","black")
return
<table border="1">
   <tr>
      <th>Color</th>
      <th>Example</th>
   </tr>
{for $color in $colors return
   element tr {
      element td {$color},
       element td {
         attribute style {'background-color:',$color},
         '&#160;'
         }
      }
}
</table>
}
</body>
</html>

The computed-constructor syntax for elements, attributes, and text content can make your code easier to read. The element constructor, for instance, takes the name of the element first, followed by the content of the element (possibly an evaluated expression) in braces. Thus,

element td {$color},

creates a new element <td> and places the text value of $color into it.

You can create sibling nodes (attribute, element, or text) with the comma separator (,) operator. Thus, in the definition of the second td element, the expression

element td {
   attribute style {'background-color:',$color},
   '&#160;'
   }

includes a new attribute node named style whose content expression yields two strings: the literal 'background-color:' and the value of the $color variable. Because the value of an attribute must be a single string, the XQuery engine joins the two into one; under the XQuery 1.0 rules, adjacent atomic values in a computed attribute are separated by a single space, giving a value such as background-color: red.

The same principle is at work in the enclosing td element, which not only generates the style attribute but also includes the literal '&#160;', the nonbreaking space character, as a text node. These two nodes are not concatenated, by the way, because they are of different kinds: the attribute node attaches to the element as an attribute, whereas the text node becomes part of the element's content.

This can be given in a slightly simpler form. This expression

element a {
   element b {
      attribute b1 {'t1'},
      element c {},
      'strD'
      }
   }

is the same as this tagged expression:

<a>
   <b b1="t1">
   <c/>
   strD
   </b>
</a>

The two formats are equivalent in their application, so you should use the format that works best for your needs.
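The same equivalence shows up in other XML toolkits. In this Python sketch, the tree is built node by node (the counterpart of the computed constructors) and also parsed from markup (the counterpart of the tagged form), and the two serialize identically. Two assumptions: the whitespace inside the tagged version is left out, and in ElementTree the text that follows <c/> is stored as that element's tail.

```python
import xml.etree.ElementTree as ET

# Computed-constructor style: build the tree node by node.
a = ET.Element("a")
b = ET.SubElement(a, "b", b1="t1")
c = ET.SubElement(b, "c")
c.tail = "strD"  # in ElementTree, text that follows <c/> is stored as c's tail

# Tagged style: parse the equivalent markup.
parsed = ET.fromstring('<a><b b1="t1"><c/>strD</b></a>')

built_xml = ET.tostring(a, encoding="unicode")
parsed_xml = ET.tostring(parsed, encoding="unicode")
print(built_xml == parsed_xml)  # True
```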

Given the richness of possible output, there is always more to say about the return keyword, but the examples given here are sufficient to lay the groundwork for complex tasks.

where Oh where?

The where keyword serves as a means to define conditions about which data should be chosen, and serves a purpose roughly analogous to the SQL WHERE clause. In both cases, you are creating what is called a predicate clause, which is used to narrow a range of potential results.

For instance, suppose you want to display only those colors that contain the letter r in a table, as shown in the previous section. (Why you'd want this search criterion is a question better suited to psychoanalysts than programmers, but it's illustrative nonetheless.) Here you could use the where clause to restrict the search (see Figure 2):

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
{let $colors :=("white","red","blue","green","yellow","purple","orange","black")
return
<table border="1">
   <tr>
      <th>Color</th>
      <th>Example</th>
   </tr>
{for $color in $colors where contains($color,'r') return
   element tr {
      element td {$color},
       element td {
         attribute style {'background-color:',$color},
         '&#160;'
         }
      }
}
</table>
}
</body>
</html>

The contains() function is, of course, one of the functions that XQuery and XPath 2.0 share. It takes two arguments, each a string or a value that can be converted to a string, and returns true if the second can be found within the first and false otherwise.

The where clause should generally follow the let or for clauses (usually a for) but precede the return clause. Its expression should return either a Boolean value or something that can be converted to one. Values such as an empty node list or sequence, the number 0, or an empty string ("") all convert to false.

Figure 2
You can filter on colors that have the letter r using the where clause.
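Python's truthiness rules overlap with this conversion, which gives a quick way to build intuition, but they are not the same rules. The sketch below shows values that both languages treat as false, and one case (NaN) where they disagree: XQuery treats NaN as false, while Python's bool() returns True for it.

```python
import math

# Values the where clause treats as false (their "effective Boolean value"):
# the empty sequence, numeric zero, and the empty string.
xquery_false_values = [(), 0, 0.0, ""]
print([bool(v) for v in xquery_false_values])  # [False, False, False, False]

# The two rule sets are not identical: NaN is false in XQuery,
# but Python treats it as true.
print(bool(math.nan))  # True
```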

The where clause does not hold quite the same level of importance in XQuery as the analogous WHERE clause does in SQL. For starters, you can place XPath predicates in the for and let expressions to act as filters (and doing so is often more concise). The previous example can be rewritten using an XPath predicate as follows:

{for $color in $colors[contains(.,'r')] return . . .

Although this code might not be as readily understandable (you have to appreciate the role of the context item), it works just as well. In this case, the context item operator (.) refers to each item of $colors in turn, so the predicate tests whether r is part of each color value. In the first case, the where clause examines each $color in turn to determine which ones to pass on; in the second, the XPath predicate winnows down the set of $colors before it reaches the iterator. The result is the same in both cases, but the filtering happens at a different point in the cycle.
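The two placements of the filter can be mirrored in Python: a condition tested inside the loop (the where style) versus a sequence that is filtered first and then iterated (the predicate style). Python's in operator performs the same substring test as contains() here.

```python
colors = ("white", "red", "blue", "green", "yellow", "purple", "orange", "black")

# where-style: every color reaches the loop, and the condition is tested per item.
via_where = [color for color in colors if "r" in color]

# predicate-style: the sequence is winnowed first (like $colors[contains(., 'r')]),
# and the loop then sees only the survivors.
survivors = tuple(filter(lambda c: "r" in c, colors))
via_predicate = [color for color in survivors]

print(via_where)                   # ['red', 'green', 'purple', 'orange']
print(via_where == via_predicate)  # True
```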

Ordering Up Some XQuery

For more than a year, XQuery had been inextricably tied to the acronym FLWR (presumably pronounced "flower") for For|Let|Where|Return. However, as of the November 15, 2002 draft, this acronym became FLWOR, an explicit recognition that ordering plays a fairly major part in database retrieval. The O stands for order by, a keyword that performs much the same actions as the SQL ORDER BY command.

The order by clause follows the for clauses (and the where clause, if there is one) and indicates the order in which the data is output. When order by is not specified, the output order generally depends on the underlying data source. With an XML file, for instance, the order is the order in which the XML parser walks the tree, the so-called document order: each node comes before its children, and a node's children (and their descendants) come before its next sibling.
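Document order is a depth-first, parent-before-children walk. Python's ElementTree exposes the same order through Element.iter(); the small document here is a made-up example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")

# Element.iter() visits each element before its children, and finishes a
# node's whole subtree before moving on to its next sibling: document order.
order = [elem.tag for elem in doc.iter()]
print(order)  # ['a', 'b', 'c', 'd', 'e']
```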

By default, order by sorts in ascending order (you can request descending order explicitly), and how values are compared depends on the data type of the sort expressions. This implicitly assumes that the data type has some internal notion of ordering, which in turn assumes some sense of schema validation acting on the data; values with no type information are compared as strings. For instance, consider the characters.xml file defined earlier in the chapter. Assume that it has an XSD schema (characters.xsd), which looks something like this:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

   <xsd:element name="characters" type="Characters"/>

   <xsd:complexType name="Character">
      <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="gender" type="Gender"/>
         <xsd:element name="species" type="xsd:string"/>
         <xsd:element name="vocation" type="xsd:string"/>
         <xsd:element name="level" type="xsd:nonNegativeInteger"/>
         <xsd:element name="health" type="xsd:int"/>
      </xsd:sequence>
   </xsd:complexType>

   <xsd:simpleType name="Gender">
      <xsd:restriction base="xsd:string">
         <xsd:enumeration value="Female"/>
         <xsd:enumeration value="Male"/>
         <xsd:enumeration value="Other"/>
      </xsd:restriction>
   </xsd:simpleType>

   <xsd:complexType name="Characters">
      <xsd:sequence>
         <xsd:element name="character" type="Character" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
   </xsd:complexType>

</xsd:schema>

The schema can then be associated with the characters.xml file by editing the enclosing <characters> element:

<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="http://www.kurtcagle.net/schemas/Characters.xsd">
   <character>
      <name>Aleria Delamare</name>
      <gender>Female</gender>
      <species>Heroleim</species>
      <vocation>Bard</vocation>
      <level>5</level>
      <health>25</health>
   </character>
   <character>
      <name>Shar Denasthenes</name>
      <gender>Male</gender>
      <species>Human</species>
      <vocation>Merchant</vocation>
      <level>6</level>
      <health>28</health>
   </character>
   ...
</characters>

The xsi:noNamespaceSchemaLocation attribute gives the URL of the schema file. The schema thus indicates that <level> is a nonnegative integer and <health> is a signed integer (xsd:int, a 32-bit type), so a character's health can be negative.

Given that, an XQuery can be written that orders the characters by health:

for $character in document('characters.xml')//character
order by $character/health
return
   <healthReport>
      {$character/name}
      {$character/health}
   </healthReport>

This will return the following collection:

<healthReport>
   <name>Sheira</name>
   <health>9</health>
</healthReport>
<healthReport>
   <name>Gite</name>
   <health>18</health>
</healthReport>
<healthReport>
   <name>Paccu</name>
   <health>24</health>
</healthReport>
<healthReport>
   <name>Aleria</name>
   <health>25</health>
</healthReport>
<healthReport>
   <name>Shar</name>
   <health>28</health>
</healthReport>
<healthReport>
   <name>Gounna</name>
   <health>31</health>
</healthReport>
<healthReport>
   <name>Horukkan</name>
   <health>32</health>
</healthReport>
<healthReport>
   <name>Drue</name>
   <health>32</health>
</healthReport>

Notice that the sort implicitly compared the values as integers. The first health report indicates a health of 9:

<healthReport>
   <name>Sheira</name>
   <health>9</health>
</healthReport>

Had the ordering been done without regard to the element's data type, this entry would have been last, because when compared as strings, "9" sorts after "32" (the character 9 comes after 3).
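To see why the data type matters, compare a text sort with a numeric sort of the health values from the listing above. This is a Python illustration of the two comparison rules, not of how an XQuery processor implements order by:

```python
# Health values from the sample output, held as text as they appear in the XML.
healths = ["25", "28", "9", "24", "18", "31", "32", "32"]

as_text = sorted(healths)              # compares character by character
as_numbers = sorted(healths, key=int)  # converts each value to an integer first

print(as_text)     # ['18', '24', '25', '28', '31', '32', '32', '9']
print(as_numbers)  # ['9', '18', '24', '25', '28', '31', '32', '32']
```

The key=int argument plays roughly the role that the schema type (or an explicit cast) plays in the query.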

Suppose, however, that you didn't have a specific schema against which to validate the characters.xml file. Would this have made sorting numerically impossible? No, because you can also convert an element's value to a specific data type inline:

for $character in document('characters.xml')//character
order by xs:integer($character/health)
return
   <healthReport>
      {$character/name}
      {$character/health}
   </healthReport>

In this case, the contents of the <health> element are cast to an integer before they are compared. The xs: prefix is predeclared in XQuery and bound to the XML Schema namespace (http://www.w3.org/2001/XMLSchema), so it refers to the XSD data types.

You can also control the direction of the sort. The ascending and descending keywords determine the direction and appear after the sort expression. For instance, suppose that each character element has an additional child element, <dateCreated>, of type xs:dateTime. To order the characters from the most recently created to the oldest, you'd write this query:

for $character in document('characters.xml')//character
order by xs:dateTime($character/dateCreated) descending
return
   <dateReport>
      {$character/name}
      {$character/dateCreated}
   </dateReport>
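Treating the key as a dateTime rather than as text matters most when timestamps carry different time-zone offsets, because then text order and chronological order can disagree. The Python sketch below uses hypothetical dateCreated values (the chapter does not list any) to show the difference; datetime.fromisoformat() requires Python 3.7 or later:

```python
from datetime import datetime

# Hypothetical (name, dateCreated) pairs; the offsets are chosen to show the effect.
characters = [
    ("Aleria", "2002-11-20T09:30:00+00:00"),
    ("Shar",   "2002-11-22T14:05:00+05:00"),  # 09:05 UTC
    ("Gite",   "2002-11-22T10:00:00+00:00"),  # 10:00 UTC
]

# Chronological, newest first: parse each value, then sort in reverse.
newest_first = sorted(characters,
                      key=lambda c: datetime.fromisoformat(c[1]),
                      reverse=True)

# Plain text comparison, newest first: misleading when offsets differ.
text_order = sorted(characters, key=lambda c: c[1], reverse=True)

print([name for name, _ in newest_first])  # ['Gite', 'Shar', 'Aleria']
print([name for name, _ in text_order])    # ['Shar', 'Gite', 'Aleria']
```

Here reverse=True stands in for the descending keyword.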

Creating more sophisticated queries requires the intelligent use of FLWOR expressions. For instance, if you want to list all entries in the characters.xml file by level, and then sort the characters at each level by name, your query needs to use the distinct-values() function to retrieve the levels, and then use each level as the key for selecting the characters at that level:

let $characters := document('characters.xml')//character
for $level in distinct-values($characters/level)
   order by $level
return
   <level value="{$level}">{
      for $character in $characters
      where $character/level = $level
         order by $character/name
      return $character/name
      }
   </level>

This would create the following output:

<level value="4">
   <name>Sheira</name>
</level>
<level value="5">
   <name>Aleria</name>
   <name>Horukkan</name>
   <name>Paccu</name>
</level>
<level value="6">
   <name>Drue</name>
   <name>Shar</name>
</level>
<level value="7">
   <name>Gite</name>
</level>
<level value="8">
   <name>Gounna</name>
</level>
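The nested FLWOR maps directly onto two loops: an outer one over the sorted distinct levels and an inner one that filters and sorts the names. The following Python sketch mirrors that structure using the (name, level) pairs from the output above; the order of the list is an assumption, since the full file isn't shown:

```python
# (name, level) pairs from the sample output; their order in the file is assumed.
characters = [("Aleria", 5), ("Shar", 6), ("Gite", 7), ("Sheira", 4),
              ("Horukkan", 5), ("Drue", 6), ("Paccu", 5), ("Gounna", 8)]

def names_by_level(chars):
    result = []
    # Outer loop: distinct-values(...) followed by "order by $level".
    for level in sorted({lvl for _, lvl in chars}):
        # Inner loop: "where $character/level = $level" plus "order by $character/name".
        names = sorted(name for name, lvl in chars if lvl == level)
        result.append((level, names))
    return result

for level, names in names_by_level(characters):
    print(level, names)
```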

One problem with such sorting is that the sort key is sometimes empty. For instance, suppose that a new character is added to the characters.xml list:

<character>
  <name>Yane Helavela</name>
  <gender>Female</gender>
  <species>Human</species>
  <vocation>Illusionist</vocation>
</character>

In this case, no level element exists. The question that arises is whether this character should appear at the beginning or at the end of the sorted sequence. The empty greatest and empty least modifiers answer this question. With empty greatest, an item whose sort key is empty (or missing entirely) is treated as greater than any other value, so it appears at the end of an ascending sort and at the beginning of a descending one. With empty least, the item is treated as smaller than any other value, so it appears at the beginning of an ascending sort.
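One way to picture the two modifiers is as a sort key with a flag in front of the real value, where the flag decides whether a missing value ranks above or below everything else. This Python sketch is a model of the rule, not a processor implementation; the levels for Aleria, Sheira, and Gounna come from the earlier output, and Yane's missing level is represented as None:

```python
# Yane has no <level>, modeled here as None.
characters = [("Aleria", 5), ("Yane", None), ("Sheira", 4), ("Gounna", 8)]

def order_by_level(chars, empty_greatest=True, descending=False):
    def key(c):
        level = c[1]
        if level is None:
            # Empty key: rank above every value (greatest) or below it (least).
            return (1 if empty_greatest else 0, 0)
        return (0 if empty_greatest else 1, level)
    # Reversing the whole comparison also moves the empty items, just as
    # "empty greatest" items come first in a descending sort.
    return [name for name, _ in sorted(chars, key=key, reverse=descending)]

print(order_by_level(characters))                        # empty greatest, ascending
print(order_by_level(characters, empty_greatest=False))  # empty least, ascending
print(order_by_level(characters, descending=True))       # empty greatest, descending
```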

The stable keyword, placed in front of order by (stable order by), addresses another condition: what happens when different items have identical sort keys. For instance, if two entries have the same creation date, the relative order of those two items in the output is otherwise left up to the processor. The stable keyword tells the processor to retain the order in which the items were originally retrieved; without it, the processor is free to order tied items however it likes. In most cases, this shouldn't make a major difference in the ordering.
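Python's built-in sorted() is documented to be stable, which makes it a convenient way to see what stable order by guarantees. The creation dates below are hypothetical; two pairs of records share a date, and each pair keeps its input order:

```python
# Hypothetical (name, dateCreated) records; two pairs share a date.
records = [("Paccu", "2002-11-20"), ("Gite", "2002-11-18"),
           ("Aleria", "2002-11-20"), ("Drue", "2002-11-18")]

# ISO dates in the same YYYY-MM-DD format compare correctly as text.
by_date = sorted(records, key=lambda r: r[1])  # stable: ties keep input order
print([name for name, _ in by_date])  # ['Gite', 'Drue', 'Paccu', 'Aleria']
```

Without a stability guarantee, a processor would be equally entitled to return Drue before Gite, or Aleria before Paccu.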

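Finally, you can specify the collation used to compare strings through the collation keyword on the order by expression. Collations are identified by URIs, and the set of supported collations depends on the processor, so "eng-us" here simply stands for whatever name your processor uses for a U.S. English collation. For instance,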

for $character in $characters
   order by $character/name collation "eng-us"
return $character

will order the characters in ascending (the default) order using the U.S. English collation.

Conditional Logic

XQuery is built on XPath 2.0, and consequently supports XPath's conditional if/then/else expression. This makes it possible to create fairly sophisticated logical expressions that depend on specific characteristics of the source data. For instance, suppose you want to list the characters from the characters.xml file in a table with ledger printing (one row white, the next row light green, the next row white, and so forth). This could be difficult to do with FLWOR notation alone, but by incorporating the conditional if keyword (and a little CSS), it becomes much easier (see Figure 3):

<html>
   <head>
   <style>
.evenRow {{background-color:white;color:black;}}
.oddRow {{background-color:lightGreen;color:black;}}
   </style>
   </head>
   <body>
      <h1>Characters</h1>
      <table cellspacing="0" cellpadding="3">
      <tr>
         <th>Name</th>
         <th>Gender</th>
         <th>Species</th>
         <th>Vocation</th>
         <th>Level</th>
      </tr>
      {
   let $characters := input()//character
   for $character in $characters
   let $class := if (index-of($characters,$character) mod 2 = 0)
 then 'evenRow' else 'oddRow'
      return
         <tr class="{$class}">
            <td>{string($character/name)}</td>
            <td>{string($character/gender)}</td>
            <td>{string($character/species)}</td>
            <td>{string($character/vocation)}</td>
            <td>{string($character/level)}</td>
         </tr>
      }
      </table>
   </body>
</html>

Figure 3
Conditional logic can be used to create changes in both content and stylistic output.

This particular example works by applying CSS classes (evenRow and oddRow) to alternating lines in the output table. The conditional test relies upon the expression

let $class := if (index-of($characters,$character) mod 2 = 0) 
then 'evenRow' else 'oddRow'

where the index-of() function returns the position of the character within the $characters sequence. (In the final XQuery 1.0 specification, index-of() compares atomized values rather than node identity, so this approach assumes that no two characters have identical content.) The XPath mod operator computes the remainder of a division, returning 0 if the position is divisible by two and 1 if it is not. The then and else branches are themselves expressions, so they can contain complex XQuery expressions.
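The same alternation can be sketched in Python, where enumerate(..., start=1) supplies 1-based positions directly instead of searching for each item's position the way index-of() does. The function name row_classes is made up for this illustration:

```python
def row_classes(names):
    """Pair each name with a CSS class, alternating by 1-based position."""
    return [(name, "evenRow" if pos % 2 == 0 else "oddRow")
            for pos, name in enumerate(names, start=1)]

for name, css_class in row_classes(["Aleria", "Shar", "Gite"]):
    print(f'<tr class="{css_class}"><td>{name}</td></tr>')
```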

In addition to illustrating the use of the if/then/else expression (covered in greater detail in Chapter 2), this sample also illustrates another feature: escaping the brace characters ({ and }). CSS uses braces to delimit rule definitions, but inside the query the XQuery processor would attempt to interpret their contents as XQuery expressions. To prevent this, double the braces ({{ and }} instead of { and }).

Note as well that, unlike many other languages, XQuery requires both a then and an else in an if expression. Should you run into a situation where you don't need to return anything in the then or else branch, return an empty string '' (or the empty sequence ()) as the result:

if ($a) then 'b' else ''

I Ain't Got No...

The conditions that can be evaluated in the if block include anything that can be cast to a Boolean value. However, XQuery specifically provides two expressions that can significantly improve performance: some ... satisfies and every ... satisfies. These expressions determine, respectively, whether at least one item in a sequence satisfies a given condition and whether all items in a sequence satisfy it, without your having to write counting functions to perform the same tests.

As an example, for the set of $characters defined previously, you can test to see whether the group contains at least one mage (see Figure 4):

let $characters := input()//character
let $response := if (some $character in $characters
                     satisfies $character/vocation = "Mage")
                 then 'Party has a mage.'
                 else 'Party does not have a mage.'
return
<html>
   <head>
   </head>
   <body>
      <h1>Mage Query</h1>
      <div>{$response}</div>
   </body>
</html>

The $response variable determines, for the set of $characters, whether at least one character has a vocation element of value "Mage", and returns the appropriate response string.

Figure 4
This query determines whether a party of characters includes a mage.

Similarly, the every keyword is used to perform a blanket test to determine whether all items in the set satisfy a condition, returning the value false() if even one item does not. For instance, assume that the cutoff level for a character to enter into a party is the fifth level. If even one character is below the fifth level, the party is underqualified:

let $characters := input()//character
let $response := if (every $character in $characters
                     satisfies $character/level >= 5)
                 then 'Party is qualified to depart.'
                 else 'Party is not experienced enough.'
return
<html>
   <head>
   </head>
   <body>
      <h1>Is Party Qualified?</h1>
      <div>{$response}</div>
   </body>
</html>

Once again, it's worth noting how much XQuery is geared toward set manipulation. The some and every expressions can be very efficient; a processor may, for instance, stop evaluating a some expression the moment it finds one item that satisfies the condition. Because many databases can also perform such existence and universal tests quickly, XQuery can leverage them to significantly speed up the evaluation of expressions.
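Python's any() and all() behave much like some ... satisfies and every ... satisfies, and they stop at the first item that settles the answer. The sketch below records which items any() actually examines. Aleria's and Shar's vocations come from the sample data, the levels from the earlier level listing, and Sheira's vocation (Mage) is an invented value:

```python
characters = [
    {"name": "Aleria", "vocation": "Bard", "level": 5},
    {"name": "Sheira", "vocation": "Mage", "level": 4},  # vocation is invented
    {"name": "Shar", "vocation": "Merchant", "level": 6},
]

examined = []

def is_mage(character):
    examined.append(character["name"])  # log each item that gets tested
    return character["vocation"] == "Mage"

# some $character in $characters satisfies $character/vocation = "Mage"
has_mage = any(is_mage(c) for c in characters)
# every $character in $characters satisfies $character/level >= 5
qualified = all(c["level"] >= 5 for c in characters)

print(has_mage, examined)  # True ['Aleria', 'Sheira']
print(qualified)           # False
```

Shar is never examined, because Sheira already answers the some question.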

Defining Functions

XPath has a fairly comprehensive set of functions for everything from performing date calculations to evaluating regular expressions. However, it is sometimes useful to build more sophisticated functions out of this core set, for example to encapsulate business-logic evaluations.

XQuery consequently also supports user-defined functions. These functions are written in XQuery itself and are called in the same way as XPath functions. For instance, suppose you want to take a date in the standard XSD notation (YYYY-MM-DD) and convert it into the American notation MM/DD/YYYY, and you want to do it for several different instances of data that have the following structure (directory.xml):

<directory>
   <file name="chapter3.xml" dateCreated="2002-11-25" dateModified="2002-11-28"/>
   <file name="chapter3app1.xml" dateCreated="2002-11-25" dateModified="2002-11-29"/>
   <file name="chapter3.toc" dateCreated="2002-11-25" dateModified="2002-11-30"/>
   <file name="chapter2.xml" dateCreated="2002-11-18" dateModified="2002-11-21"/>
</directory>

You can create an XQuery to provide a report:

<html>
   <body>
   <h1>File Report</h1>
   <table>
      <tr>
         <th>File Name</th>
         <th>Date Created</th>
         <th>Date Last Modified</th>
      </tr>
{for $file in document('directory.xml')//file
return
   <tr>
      <td>{string($file/@name)}</td>
      <td>{let $refDateStr := string($file/@dateCreated)
           let $year := substring($refDateStr,1,4)
           let $month := substring($refDateStr,6,2)
           let $day := substring($refDateStr,9,2)
           return concat($month,'/',$day,'/',$year)}</td>
      <td>{let $refDateStr := string($file/@dateModified)
           let $year := substring($refDateStr,1,4)
           let $month := substring($refDateStr,6,2)
           let $day := substring($refDateStr,9,2)
           return concat($month,'/',$day,'/',$year)}</td>
   </tr>
}
      </table>
   </body>
</html>

However, there are two difficulties. First, the code is duplicated: the same routine for reordering the date appears twice. A second, related problem is that it is difficult to tell at a glance exactly what the script is designed to do.

This is a case where functions can ameliorate your problems somewhat. You can define a new function called format-date() that takes an XSD date as a parameter and formats it in American notation:

define function format-date($dt as xs:date) as xs:string
{
   let $refDateStr := string($dt)
   let $year := substring($refDateStr,1,4)
   let $month := substring($refDateStr,6,2)
   let $day := substring($refDateStr,9,2)
   return concat($month,'/',$day,'/',$year)
}
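For comparison, here is the same conversion in Python. The first version slices the string exactly as the substring() calls do (Python indexes from 0, XQuery's substring() from 1); the second parses the value first, so a malformed date raises ValueError instead of producing garbage. Both function names are made up for this sketch:

```python
from datetime import date

def format_date(xsd_date: str) -> str:
    """'YYYY-MM-DD' -> 'MM/DD/YYYY' using positional slices."""
    year = xsd_date[0:4]   # substring($s, 1, 4)
    month = xsd_date[5:7]  # substring($s, 6, 2)
    day = xsd_date[8:10]   # substring($s, 9, 2)
    return f"{month}/{day}/{year}"

def format_date_checked(xsd_date: str) -> str:
    """Same output, but validate the date before formatting it."""
    return date.fromisoformat(xsd_date).strftime("%m/%d/%Y")

print(format_date("2002-11-25"))          # 11/25/2002
print(format_date_checked("2002-11-25"))  # 11/25/2002
```

The checked version corresponds loosely to declaring a schema date type on the XQuery parameter rather than accepting any string.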

First the code creates a new function called format-date in the immediate environment:

define function format-date

The function name must be a valid XML name: it can contain letters, digits, underscores (_), hyphens (-), and periods (.), but it cannot begin with a digit, hyphen, or period.

You can set up zero or more parameters, separated by commas:

define function format-date($dt as xs:date)

Each parameter (such as $dt) must be preceded by a dollar sign. A parameter doesn't have to include a data type, as the type can be determined dynamically at runtime, but it's generally a good idea to include one. Note, however, that XQuery does not support overloading by type (defining more than one function with the same name that differ only in their parameter types). The parameter-passing model is thus more akin to languages like JavaScript (or XSLT) than it is to Java.

The xs: namespace prefix, discussed earlier, is necessary if you use schema types. Note that some XQuery processors might support other schema languages, and as such will probably use different data type prefixes:

define function format-date($dt as xs:date) as xs:string

The result type, similarly, need not be specified (if it is omitted, the function may return any sequence of items), but declaring it helps ensure that the results are type-safe. The implication here, of course, is that the output will be a string rather than a node.

The body of the function is itself an XQuery expression, here a series of let clauses that break the date string into its year, month, and day components:

define function format-date($dt as xs:date) as xs:string
{
   let $refDateStr := string($dt)
   let $year := substring($refDateStr,1,4)
   let $month := substring($refDateStr,6,2)
   let $day := substring($refDateStr,9,2)
   return concat($month,'/',$day,'/',$year)
}

Note that there are also date-specific functions that do the same thing; however, they are still being finalized in the XQuery working draft.

The outer return clause for the function returns the specific contents of the function to the outside world, and of course should have the data type specified by the return type in the function declaration (here, xs:string):

define function format-date($dt as xs:date) as xs:string
{
   let $refDateStr := string($dt)
   let $year := substring($refDateStr,1,4)
   let $month := substring($refDateStr,6,2)
   let $day := substring($refDateStr,9,2)
   return concat($month,'/',$day,'/',$year)
}

You can have subordinate return values that create intermediate results, but you must have at least one final outer return value.

One implication of XQuery being a "side effect-free" language is that a function cannot change some external, global variable. Anything passed into an XQuery function is passed by value, not by reference. An immediate consequence is that a function must always return something: you cannot have a void function type (although a function can return an empty string or an empty sequence).

Functions are defined at the beginning of the query, ahead of the main expression, and are then invoked as you would expect. Applied to the date-formatting report code shown earlier, the functional notation simplifies it considerably (see Figure 5):

define function format-date($dt as xs:date) as xs:string
{
   let $refDateStr := string($dt)
   let $year := substring($refDateStr,1,4)
   let $month := substring($refDateStr,6,2)
   let $day := substring($refDateStr,9,2)
   return concat($month,'/',$day,'/',$year)
}

<html>
   <body>
   <h1>File Report</h1>
   <table>
      <tr>
         <th>File Name</th>
         <th>Date Created</th>
         <th>Date Last Modified</th>
      </tr>
{for $file in document('directory.xml')//file
return
   <tr>
      <td>{string($file/@name)}</td>
      <td>{format-date($file/@dateCreated)}</td>
      <td>{format-date($file/@dateModified)}</td>
   </tr>
}
      </table>
   </body>
</html>

The main body of the query is now much shorter, and its intent is considerably clearer. The expressions

<td>{format-date($file/@dateCreated)}</td>
<td>{format-date($file/@dateModified)}</td>

indicate that the dateCreated and then the dateModified attribute of the file element (here contained in the $file variable) should be converted into MM/DD/YYYY notation, producing the final output in HTML:

Figure 5
Consolidating reused code into functions can significantly simplify the source query, and encourages the development of code libraries.

<html>
   <body>
      <h1>File Report</h1>
      <table>
         <tr>
            <th>File Name</th>
            <th>Date Created</th>
            <th>Date Last Modified</th>
         </tr>
         <tr>
            <td>chapter3.xml</td>
            <td>11/25/2002</td>
            <td>11/28/2002</td>
         </tr>
         <tr>
            <td>chapter3app1.xml</td>
            <td>11/25/2002</td>
            <td>11/29/2002</td>
         </tr>
         <tr>
            <td>chapter3.toc</td>
            <td>11/25/2002</td>
            <td>11/30/2002</td>
         </tr>
         <tr>
            <td>chapter2.xml</td>
            <td>11/18/2002</td>
            <td>11/21/2002</td>
         </tr>
      </table>
   </body>
</html>
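To check the expected report independently of an XQuery engine, the following Python sketch builds the same table from directory.xml with xml.etree.ElementTree. It is a cross-check that assumes the file looks exactly like the listing shown earlier, not a replacement for the query:

```python
import xml.etree.ElementTree as ET

DIRECTORY_XML = """<directory>
   <file name="chapter3.xml" dateCreated="2002-11-25" dateModified="2002-11-28"/>
   <file name="chapter3app1.xml" dateCreated="2002-11-25" dateModified="2002-11-29"/>
   <file name="chapter3.toc" dateCreated="2002-11-25" dateModified="2002-11-30"/>
   <file name="chapter2.xml" dateCreated="2002-11-18" dateModified="2002-11-21"/>
</directory>"""

def format_date(s):
    return f"{s[5:7]}/{s[8:10]}/{s[0:4]}"  # YYYY-MM-DD -> MM/DD/YYYY

def report_table(xml_text):
    root = ET.fromstring(xml_text)
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    for label in ("File Name", "Date Created", "Date Last Modified"):
        ET.SubElement(header, "th").text = label
    for f in root.iter("file"):  # document order, like //file
        row = ET.SubElement(table, "tr")
        for value in (f.get("name"),
                      format_date(f.get("dateCreated")),
                      format_date(f.get("dateModified"))):
            ET.SubElement(row, "td").text = value
    return ET.tostring(table, encoding="unicode")

print(report_table(DIRECTORY_XML))
```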

There are two issues to be aware of when dealing with functions: namespaces and libraries. The following sections describe these issues.

Namespaces

The XQuery language has its own (default) namespace that defines the commonly available functions. Although you can create functions in that namespace (which you are doing implicitly when you create a function without a namespace prefix), you run the risk of overriding a system function with one of your own. In the current context, where all functions are local, this is not necessarily a bad thing, but in situations where more than one person relies on these functions, it could prove disastrous.


Namespaces Can Be Useful, Not Just a Nuisance

People who work with XML only periodically sometimes see namespaces as something of a nuisance. Namespace identifiers are often long and unwieldy, and namespace prefixes can make a seemingly straightforward block of XML seem much more complex. However, namespaces can come in handy.

For instance, I've dealt with XML schemas describing such things as framework components (such as describing a form in Visual Basic via XML). A number of us were working on this application, and it was very important to ensure that the XML code one person wrote with samples wouldn't end up contaminating the base code until it had properly been approved.

To get around this, each developer was assigned their own namespace. The interpreter of the XML was programmed so that only certain namespaces would be enabled in each respective build (that is, you could run your own test-code, but other people running this code without the namespace enabled wouldn't have your functionality). After the XML code was deemed to be working correctly, a particular code's prefixes were swapped over to the formal implementation.

This system worked surprisingly well, especially as we moved more of our operant code into XML form. Without the namespaces, it would have been impossible to keep the coding straight; with them, not only could we tell at a glance whose code we were dealing with, but the software applications could use the same namespaces to assign functionality.


The declare namespace command lets you define other namespaces for use within function declarations. For instance, you might decide to create a package of date functions, and associate them with a given namespace URI (http://www.kurtcagle.net/schemas/xquery/date, for instance). This namespace would then be used to refer to all functions within that package as follows:

declare namespace dates = "http://www.kurtcagle.net/schemas/xquery/date"

define function dates:format-date($dt as xs:date) as xs:string
{
   let $refDateStr := string($dt)
   let $year := substring($refDateStr,1,4)
   let $month := substring($refDateStr,6,2)
   let $day := substring($refDateStr,9,2)
   return concat($month,'/',$day,'/',$year)
}

<html>
   <body>
   <h1>File Report</h1>
   <table>
      <tr>
         <th>File Name</th>
         <th>Date Created</th>
         <th>Date Last Modified</th>
      </tr>
{for $file in document('directory.xml')//file
return
   <tr>
      <td>{string($file/@name)}</td>
      <td>{dates:format-date($file/@dateCreated)}</td>
      <td>{dates:format-date($file/@dateModified)}</td>
   </tr>
}
      </table>
   </body>
</html>

By doing this, you avoid the problem of namespace collision, and not coincidentally, make it easier to organize your code.
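The prefix solves the collision because a processor identifies a function by its expanded name (namespace URI plus local name), not by the prefix you type. The Python sketch below models a function registry keyed that way; the registry, the define() helper, and the use of None for "no namespace" are inventions for illustration:

```python
DATES_NS = "http://www.kurtcagle.net/schemas/xquery/date"

registry = {}  # (namespace URI or None, local name) -> function

def define(namespace, local_name, fn):
    key = (namespace, local_name)
    if key in registry:
        raise ValueError(f"{{{namespace}}}{local_name} is already defined")
    registry[key] = fn

# Same local name, different namespaces: both definitions coexist.
define(None, "format-date", lambda s: s)  # stand-in for an unprefixed function
define(DATES_NS, "format-date", lambda s: f"{s[5:7]}/{s[8:10]}/{s[0:4]}")

print(registry[(DATES_NS, "format-date")]("2002-11-25"))  # 11/25/2002
```

Because the key holds the URI rather than the prefix, binding a different prefix to the same URI would still find the same function.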

Code Libraries

The second issue is a little more irksome, and has to do with the creation of code libraries. One of the principal reasons for working with functions is the capability to build function libraries that you (and others working in the same space) can use in your own code.

Currently (as of November 15, 2002) there is no provision within the XQuery specification for indicating code libraries, although there is an open-issue item concerning it. The primary difficulty in working with such external libraries revolves around the fact that such function libraries should realistically be in their own namespaces.

One speculative form for adding such functional libraries might look something like the following for the date library:

import "dateFunctions.xquery" in namespace dates = "http://www.kurtcagle.net/schemas/xquery/dates"

In this case, dateFunctions.xquery is the URL of a file that contains all the custom functions associated with dates. Once imported in this manner, the functions require the namespace prefix to be invoked (for example, dates:format-date($myDate)). Note that because the namespace (and its prefix) is associated with the function set at import time, you can use a different prefix than the one used within the imported file.

Ultimately, functions in XQuery serve much the same purpose as stored procedures (SPROCS) within SQL: They simplify the coding involved within queries, and also form a mechanism for encapsulating business logic within queries (a topic to be covered in Chapter 4 and elsewhere). This becomes critically important in dealing with pipelined architectures, in which the XQuery acts as a filter on a dataset to be passed to another component (such as a Web service, or an XSLT transform). Expect to see more on function libraries in the final specification.

In Brief

  • XQuery is a declarative language. After an item is defined within a given scope (such as a program block or function), the item can't be redefined within that scope.

  • The let clause binds a variable to a value that cannot change within its scope.

  • The for keyword assigns to a temporary variable successive values in a sequence.

  • The return keyword takes the sets defined by for and let statements and turns them into a result—XML, a sequence of values, a single string expression, and so on.

  • The where keyword lets you define conditions about which data should be chosen. It creates a predicate clause that narrows a range of potential options.

  • The order by command follows for expressions (and where expressions where they exist) and indicates for a given set of data the order in which the data is output. When it is not explicitly specified with order by, the output order depends on the system architecture.

  • XQuery works with XPath's conditional if/then/else keywords to create logical expressions.

  • XQuery provides many built-in functions and also lets you create user-definable functions. Anything passed into an XQuery function is passed by value, not reference, and a function must always return something of value.

About the Author

Kurt Cagle was a founding writer and frequent contributor for Fawcette's XML and Web Services Magazine, and has spoken on XSLT, SVG, and XSL-FO issues at more than a dozen conferences in the last five years. Kurt wrote his first book on XML in 1998, was one of the first authors to concentrate on the XSLT 1.0 specification for the Microsoft environment in Sybex's XML Developer's Handbook, and was a contributing author to the best selling Beginning XML for Wrox Press, among thirteen other books. His current book is XQuery Kick Start, published by Sams Publishing. He lives in Kirkland, Washington with his wife and two daughters, and writes novels when he isn't writing computer books.

