
XQuery Language Expressions

  • March 19, 2004
  • By Kurt Cagle

Assignments with let

XQuery, like XPath and XSLT, is a declarative language. What that means in practice is a little more nebulous. One way of thinking about it is that in a declarative language, after an item is defined within a given scope (such as a program block or function), the item can't be redefined within that scope. For instance, the following code is illegal in a declarative language:

summa = 0;
while (a in (1,3,6,4,7)){
   summa=summa+a;
   }
print(summa);

What causes the problem is that the variable summa changes its value within the same context. For many programmers, the idea that you can't use an accumulator like this might seem counterintuitive, but it turns out that placing one restriction on your code can significantly reduce errors when both developing and deploying code.

A declarative language can perform the same type of operation, but it works on the assumption of a generalized context. In essence, you are creating a buffer to which you're adding content, and although you can control what goes into that buffer, after the buffered content is created, you can't go in and change that buffer—you can only create other buffers from that one. That's why certain operations, such as summation, require specialized functions:

let $summa := sum((1,3,6,4,7))

The let clause in XQuery binds a name to a value, in effect defining a function that returns a constant. This might seem like a roundabout way of describing a variable, but the difference matters: after the binding is made, it can't be changed. Thus,

let $summa := sum((1,3,6,4,7))
let $summa := $summa + 6

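does not update $summa, because the first binding cannot be changed. Depending on the processor, the second let is either rejected as an attempt to redefine the variable or, as in the final XQuery 1.0 Recommendation, treated as a new variable that hides the first for the rest of the expression.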
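As a minimal sketch of the usual alternative, you can bind each derived value to its own name rather than trying to change an existing one. The variable names $values and $adjusted are illustrative and not part of the original example:

```xquery
(: Each let introduces a new, immutable binding. :)
let $values   := (1, 3, 6, 4, 7)
let $summa    := sum($values)      (: 21 :)
let $adjusted := $summa + 6        (: 27, held under a different name :)
return ($summa, $adjusted)
```

Both values remain available side by side, which is the "create other buffers from that one" idea described earlier.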

You can use the let operator in conjunction with XPath to create a reference to a sequence. For instance, in the characters.xml file, you could create a sequence of female characters and assign it to the variable $femaleChars as follows:

let $femaleChars := doc('characters.xml')//character[@gender = 'Female']

You could then retrieve the second female character by using the sequence notations discussed in Chapter 2:

let $secondFemaleChar := $femaleChars[2]
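The following self-contained sketch shows the same selection without depending on an external file. The inline <character> elements and their names are invented stand-ins for the contents of characters.xml, which is not shown here:

```xquery
(: Hypothetical inline data standing in for characters.xml :)
let $characters := (
  <character gender="Female">Aleria</character>,
  <character gender="Male">Shar</character>,
  <character gender="Female">Drusilla</character>
)
(: Filter with a predicate, then pick an item by position :)
let $femaleChars := $characters[@gender = 'Female']
return $femaleChars[2]
```

Positions are counted from 1, so under these assumptions the query returns the Drusilla element.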

Iterations with for

The let keyword by itself makes sense if you are dealing with one item from a sequence at any given time, but XPath is ultimately a set-manipulation language, and you need to have some way of dealing with the information as a set. This is the domain of the for keyword.

The principal purpose of for is to assign to a temporary variable successive values in a sequence. For instance, the following code line steps you through the first five (named) letters of the Greek alphabet:

for $grkLetter in ("alpha","beta","gamma","delta","epsilon")

This code first associates the value "alpha" with the variable $grkLetter to perform some processing, then sets the value to "beta", and so forth until it reaches "epsilon". You could also do this with a previously defined sequence stored in a variable:

let $femaleChars := doc('characters.xml')//character[@gender = 'Female']
for $femaleCharacter in $femaleChars

Similarly, you can use the XPath to operator to iterate over numbers to do something analogous to the for statement in C++, Java, or Visual Basic. This example iterates over the first ten numbers:

for $index in (1 to 10)
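A for clause on its own is only a fragment; it needs a return to become a complete query. As a small sketch (the squaring is just an arbitrary piece of per-item work), the range can be consumed like this:

```xquery
(: Iterate over the integers 1 through 5 and return each one squared :)
for $index in (1 to 5)
return $index * $index
```

This yields the sequence 1, 4, 9, 16, 25.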

After the discussion about XQuery being a functional language in which you are unable to assign multiple values to a single variable name, the use of the for statement might seem to be a clear violation. However, it isn't. Technically speaking, the restriction says you cannot create two variables with the same name in the same scope. This is somewhat analogous to a set of nested loops in a language such as JavaScript. This language has the let keyword, which indicates that the variable being defined is unique to the enclosing block (the older var keyword is function-scoped and would make both loops share a single variable). For instance, consider the following JavaScript fragment, in which write and writeln stand for whatever output routines the host environment provides:

for (let index = 0; index != 2; index++) {
   write(index + ":");
   for (let index = 0; index != 3; index++) {
      writeln(index);
      }
   }

You have two distinct scopes: the first belonging to the outer for loop, the second to the inner. This example, when run, prints a potentially counterintuitive result:

0:0
1
2
1:0
1
2

The outer variable is temporarily hidden when a variable with the same name is defined within the inner scope, as long as the inner variable is defined with the let keyword; when the inner loop ends, the outer index still holds its own value. This makes it possible to avoid name collisions, where you end up naming a variable the same way someone else named it in some other piece of code.

In essence, the XQuery for operator acts the same way—the local variable (the variable before the in keyword) is defined within the scope of the internal block, something analogous to

for (const tempVar of mySequence){

in a language like JavaScript. $tempVar is instantiated, populated, used, and then destroyed, at which point a second (or third, or fifth, or whatever) $tempVar is created. Because the variable is never created when it already exists, it cannot violate the tenet of reassignment.
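The same scoping behavior can be sketched directly in XQuery. In this illustrative query both loops deliberately reuse the name $i; the inner binding hides the outer one only for the duration of the inner expression:

```xquery
(: The inner $i is a separate variable that hides the outer $i :)
for $i in (1, 2)
return
  for $i in (10, 20)
  return $i
```

The result is 10, 20, 10, 20: the inner loop runs once per outer item, and no variable is ever reassigned.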

Both for and let can also work with full node trees that can be defined inline. For instance, you can create an XQuery that defines a set of regular expression filters, which can be accessed later:

(: Braces are doubled so they are emitted literally rather than evaluated :)
let $filters := (
  <filter name="phone" regex="\(\d{{3}}\)\d{{3}}-\d{{4}}"/>,
  <filter name="zipcode" regex="\d{{5}}(-\d{{4}})?"/>,
  <filter name="email" regex="\w+@\w+\.\w"/>
  )
for $filter in $filters
return $filter

The output is as follows:

<filter name="phone" regex="\(\d{3}\)\d{3}-\d{4}"/>
<filter name="zipcode" regex="\d{5}(-\d{4})?"/>
<filter name="email" regex="\w+@\w+\.\w"/>
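To suggest how such filters might be accessed later, here is a hedged sketch that looks one up by name and applies it with the standard matches() function. The sample ZIP code is invented, and the braces in the pattern are doubled because single braces inside a constructor are treated as expressions to evaluate:

```xquery
let $filters := (
  <filter name="phone" regex="\(\d{{3}}\)\d{{3}}-\d{{4}}"/>,
  <filter name="zipcode" regex="\d{{5}}(-\d{{4}})?"/>
)
(: Select a single filter by its name attribute :)
let $zip := $filters[@name = 'zipcode']
(: Anchor the pattern so that the whole string must match :)
return matches('98101-1234', concat('^', $zip/@regex, '$'))
```

Under these assumptions the query returns true.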

In this case, the sequence of elements is defined explicitly. Because whitespace is not (generally) significant within XML queries, you can create rich XML trees inline:

(: Braces are doubled so they are emitted literally rather than evaluated :)
let $filters := (
  <filter>
    <name>phone</name>
    <regex>\(\d{{3}}\)\d{{3}}-\d{{4}}</regex>
  </filter>,
  <filter>
    <name>zipcode</name>
    <regex>\d{{5}}(-\d{{4}})?</regex>
  </filter>,
  <filter>
    <name>email</name>
    <regex>\w+@\w+\.\w </regex>
  </filter>
)
for $filter in $filters
return $filter

This code produces slightly more complex output:

  <filter>
    <name>phone</name>
    <regex>\(\d{3}\)\d{3}-\d{4}</regex>
  </filter>
  <filter>
    <name>zipcode</name>
    <regex>\d{5}(-\d{4})?</regex>
  </filter>
  <filter>
    <name>email</name>
    <regex>\w+@\w+\.\w </regex>
  </filter>

Returning Results

Neither assignment nor iteration by itself can produce output. They are used only to define variables or iterate through sets of variables, in a manner similar to the SQL SET and SELECT statements, respectively. The key to working with XQuery is to use these statements to choose the nodes with which you're going to work, and then pass those nodes on to the relevant output format. This is where the return keyword comes into play.

The purpose of return is to take the sets defined by the previous for and let statements and turn them into some form of result. It's important to realize that the result of an XQuery expression does not have to be XML. It could be a sequence of values, a single string expression, or a host of any other possible results, although the language is optimized to produce XML preferentially.

Notice, for instance, that the result in the previous sample is not, strictly speaking, an XML document:

<filter>
    <name>phone</name>
    <regex>\(\d{3}\)\d{3}-\d{4}</regex>
  </filter>
  <filter>
    <name>zipcode</name>
    <regex>\d{5}(-\d{4})?</regex>
  </filter>
  <filter>
    <name>email</name>
    <regex>\w+@\w+\.\w </regex>
  </filter>

Instead, it is a sequence of element nodes with no single root, each of which could stand as a document on its own. The output of an XQuery is a sequence of something, whether of XML nodes, strings, numbers, or some combination. For instance,

for $a in (1 to 10) return $a

produces the following numeric sequence output

1,2,3,4,5,6,7,8,9,10

as distinct items (atomic values in this case, rather than nodes).
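As a brief sketch of a result that combines item types, the following query returns a number and a string for each step of the loop. The "item-" prefix is arbitrary:

```xquery
(: Each pass contributes two items: an integer and a string :)
for $a in (1 to 3)
return ($a, concat("item-", string($a)))
```

The result is one flat sequence of six items (1, "item-1", 2, "item-2", 3, "item-3"); nested sequences are always flattened.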

This raises an interesting question. What is an output node? In essence, when an XQuery generates a result, the implementation of the result is application-specific. The result is, as mentioned, a sequence of items. Internally, what is returned usually is a DOM XMLNode object, although it might be subclassed as an element, attribute, text node, or some other resource. Typically, a non-XML result (anything that can't immediately be translated into an XML element or attribute) is returned as a text node, regardless of the data type of the variables being returned.

The expression after the return syntax can be a little confusing, especially if you are used to working with XSLT. You can introduce new elements into the output directly through the use of traditional XML-bracketed elements. For instance, you could in theory generate XML output from the list of numbers by placing an arbitrary XML element (such as a <number> tag) around the variable:

for $a in (1 to 3) return <number>$a</number>

Unfortunately, this will likely not give you the result you expect. The previous XQuery produces this result:

<number>$a</number>
<number>$a</number>
<number>$a</number>

This happens because any time you introduce an XML tag (opening and closing) into a result, the XQuery processor treats anything within those tags as being more markup and doesn't evaluate it. Consequently, to force such an evaluation, you need to use the evaluation operators: {}. These operators instruct the XQuery engine to treat the content within the braces as XQuery expressions.

So, to get the expected result (a set of three numbers within tags), you change the XQuery to incorporate the evaluation operators:

for $a in (1 to 3)
return <number>{$a}</number>
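Because braces switch a constructor into evaluation mode, a literal brace has to be written doubled. The following sketch (the <note> element is made up for illustration) mixes both forms:

```xquery
(: {{ and }} produce literal braces; { ... } evaluates its content :)
<note>{{literal}} and {1 + 2}</note>
```

This produces <note>{literal} and 3</note>.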

This can lead to some interesting conditions. In any XQuery, there is an implicit assumption that the expression starts in evaluation mode—in other words, there is an implicit return statement at the highest level. That's why expressions such as

for $a in (1 to 3)
return <number>{$a}</number>

are evaluated in the first place. However, if you place arbitrary elements around the XQuery expression, the mode of operation switches into static mode:

<numbers><!-- now in static mode -->
for $a in (1 to 3)
return <number>{$a}</number>
</numbers>

In this case, the text is treated as if it is just that—text—until the evaluation brackets are reached. At that point, you ask the XQuery expression to evaluate a variable that has not been previously defined ($a), and it should fail. Indeed, using eXcelon Stylus Studio, the error received when this script ran was specifically "Variable a does not exist".

Consequently, to evaluate the text as if it were an XQuery expression, you must encompass the text within the <numbers> element with the evaluation operators {}:

<numbers>
{
for $a in (1 to 3)
return <number>{$a}</number>
}
</numbers>

This returns the expected results:

<numbers>
   <number>1</number>
   <number>2</number>
   <number>3</number>
</numbers>

This example also illustrates a second principle about evaluating XQuery expressions: You can have multiple nested {} expressions, as long as they are used within elements in static context. For instance, in the example, the <numbers> tag puts the XQuery into static mode, and you have to place evaluation operators around the whole expression. Similarly, the <number> element puts the XQuery expression back into static mode, so you once again have to place the expression to be evaluated (in this case, the $a variable) into the braces.

This can be seen in a slightly more sophisticated XQuery:

<numbers>
{for $a in (1 to 3) return
   <set>{for $b in (1 to $a)
      return <item>{$b}</item>
   }</set>
}
</numbers>

In this case, the $a variable iterates through the values from 1 to 3, producing a <set> element for each value. Each <set> element in turn iterates from 1 to $a (whatever it happens to be for that loop) and performs its own internal return to produce <item> elements. This produces the following result:

<numbers>
   <set>
      <item>1</item>
   </set>
   <set>
      <item>1</item>
      <item>2</item>
   </set>
   <set>
      <item>1</item>
      <item>2</item>
      <item>3</item>
   </set>
</numbers>
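The same alternation between static and evaluated mode can be sketched with both loop variables in use at once. The <table>, <row>, and <cell> element names and the n attribute below are illustrative choices only:

```xquery
(: Each constructor returns to static mode; each { } re-enters evaluation :)
<table>{
  for $a in (1 to 3) return
    <row n="{$a}">{
      for $b in (1 to 3)
      return <cell>{$a * $b}</cell>
    }</row>
}</table>
```

The inner expression can see $a because it lies within the scope of the outer for, so each row holds the first three multiples of its own index.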

These evaluated expressions can, of course, be more complex than simply returning the value of a variable. For instance, you can create a table in HTML that sums up sequences of varying lengths, as follows:

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
<table>
   <tr>
      <th>Index</th>
      <th>Sum From 1 to Index</th>
   </tr>
{for $a in (1 to 10) return
   <tr>
      <td>{$a}</td>
      {
      let $b := (1 to $a)
      return
          <td>{sum($b)}</td>
      }
   </tr>
}
</table>
</body>
</html>

This example points out several salient lessons. First, you can use XQuery to generate HTML, which makes it a particularly potent tool for creating reports—an avenue we'll explore in greater depth in Chapter 4, "XQuery and XSLT." Second, you can use XQuery functions in the result blocks, such as the use of the sum() function to add up each successive $b list (that is, the lists (1), (1,2), (1,2,3), (1,2,3,4), and so on). Finally, any variable that is defined in an outside expression (such as the $a variable) is available for use within the inside expression, such as

let $b := (1 to $a)

You can similarly perform such evaluated expressions within attributes. For instance, suppose you want to create a table of colors in HTML. To do so, you need both the name of the color and a rectangle of the appropriate color illustrating the shade, set using the Cascading Style Sheets background-color property, as follows (see Figure 1):

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
{let $colors :=("white","red","blue","green","yellow","purple","orange","black")
return
<table border="1">
   <tr>
      <th>Color</th>
      <th>Example</th>
   </tr>
{for $color in $colors return
   <tr>
      <td>{$color}</td>
       <td style="background-color:{$color}">&#160;</td>
   </tr>
}
</table>
}
</body>
</html>

Figure 1
You can use XQuery to generate more than just textual data, as this color sample illustrates.

The entity &#160; is a nonbreaking space—within an HTML <td> element, it ensures that the background color will always be rendered. What's most important here is the use of the evaluated expression in the style attribute:

<td style="background-color:{$color}">&#160;</td>

This basically replaces the indicated expression {$color} with its associated values: "white", "red", "blue", and so on, and as with elements, the expression within the attribute block could be a full XQuery expression (whitespace, including carriage returns, doesn't count in the way the attribute is handled).
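As a sketch of a fuller expression inside an attribute, the following query computes a contrasting text color with a conditional. The two-color list and the contrast rule are assumptions made only for this illustration:

```xquery
for $color in ("white", "black")
return
  <td style="background-color:{$color}; color:{
      (: Whitespace inside the braces is not part of the attribute value :)
      if ($color = "black") then "white" else "black"
    }">{$color}</td>
```

The first cell gets style="background-color:white; color:black" and the second gets the reverse.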

The tag notation is useful in certain circumstances, but sometimes it can get in the way. The element and attribute operators perform the same purpose, but they don't require the use of the closing tag. The previous XQuery could be rewritten using these operators as follows:

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
{let $colors :=("white","red","blue","green","yellow","purple","orange","black")
return
<table border="1">
   <tr>
      <th>Color</th>
      <th>Example</th>
   </tr>
{for $color in $colors return
   element tr {
      element td {$color},
       element td {
         attribute style {'background-color:',$color},
         '&#160;'
         }
      }
}
</table>
}
</body>
</html>

The non-XML notation for listing elements, attributes, and text content can make your code easier to read. The element constructor, for instance, takes the name of the element as its first parameter and the content of the element (possibly an evaluated expression) in braces as its second. Thus,

element td {$color},

creates a new element <td> and places the text value of $color into it.

You can create sibling nodes (attribute, element, or text) with the comma separator (,) operator. Thus, in the definition of the second td element, the expression

element td {
   attribute style {'background-color:',$color},
   '&#160;'
   }

includes a new attribute node named style whose content is a sequence of two values: the literal 'background-color:' and the result of evaluating the $color variable. Because the content of an attribute must be a single string, the XQuery engine converts each value to a string and joins them, separated by a single space, into one attribute value.
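When an attribute is built from a comma-separated sequence, XQuery 1.0 puts a single space between the items. If that space is unwanted, one option is to build the string explicitly with concat(). The sketch below contrasts the two forms, using a fixed $color value for illustration:

```xquery
let $color := "red"
return (
  (: Sequence content: the items are joined with a single space :)
  element td { attribute style { 'background-color:', $color } },
  (: concat() builds one string with no separator :)
  element td { attribute style { concat('background-color:', $color) } }
)
```

The first cell carries style="background-color: red" and the second style="background-color:red". Both are valid CSS, so the choice here is mostly cosmetic.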

The same type of action is at work with the encompassing td element, which not only generates the style attribute, but also includes the literal '&#160;', the nonbreaking space character, as a text node. There is no direct concatenation here of the two nodes, by the way, because they are of a differing type—the attribute node attaches to an element as an attribute, whereas the text node is attached in a different way as part of the set of text nodes.

This can be given in a slightly simpler form. This expression

element a {
   element b {
      attribute b1 {'t1'},
      element c {},
      'strD'
      }
   }

is the same as this tagged expression:

<a>
   <b b1="t1">
   <c/>
   strD
   </b>
</a>

The two formats are equivalent in their application, so you should use the format that works best for your needs.
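One situation in which the keyword form may be the better fit is when the element name itself is only known at run time, because the name can then be given as an expression in braces. The names below are arbitrary, and this is a sketch rather than a recommendation:

```xquery
(: The element name comes from the loop variable rather than from markup :)
for $name in ("alpha", "beta")
return element { $name } { string-length($name) }
```

This produces <alpha>5</alpha> and <beta>4</beta>, which the tagged notation cannot express directly because a tag name has to be written out literally.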

Given the richness of possible output, there is always more to say about the return keyword, but the examples given here are sufficient to lay the groundwork for complex tasks.




