December 20, 2014
Hot Topics:

XQuery Language Expressions

  • March 19, 2004
  • By Kurt Cagle
  • Send Email »
  • More Articles »

where Oh where?

The where keyword serves as a means to define conditions about which data should be chosen, and serves a purpose roughly analogous to the SQL WHERE clause. In both cases, you are creating what is called a predicate cause, which is used to narrow a specific range of potential options.

For instance, suppose you want to display only those colors that contain the letter r in a table, as shown in the previous section. (Why you'd want this search criteria is a task better suited to psychoanalysts than programmers, but it's illustrative nonetheless.) Here you could use the where clause to quantify the search (see Figure 2):

<html>
<head>
   <title>Summations</title>
</head>
<body>
<h1>Summations</h1>
{let $colors :=("white","red","blue","green","yellow","purple","orange","black")
return
<table border="1">
   <tr>
      <th>Color</th>
      <th>Example</th>
   </tr>
{for $color in $colors where contains($color,'r')return
   element tr {
      element td {$color},
       element td {
         attribute style {'background-color:',$color},
         '&#160;'
         }
      }
}
</table>
}
</body>
</html>

The contains() function is, of course, one of the functions that XQuery and XPath2 jointly share. It returns a Boolean value if the particular node is either a string or can be automatically converted to a string, and the second expression can be found in the first.

The where predicate should generally follow either the let or for keyword (usually for) but precede the statement's return clause. It should also return either a Boolean value or something that can be converted to a Boolean value, such as an empty node-list or sequence, the value 0 or an empty string ("").

Figure 2
You can filter on colors that have the letter r using the where clause.

The where expression does not hold quite the same level of importance in XQuery as does the analogous WHERE expression in SQL. For starters, it is possible to use XPath predicates in the for and let expressions to act as a filter for XPath (and it is more economical to do so). The previous example can be rewritten using an XPath predicate as follows:

{for $color in $colors[contains(.,'r')] return . . .

Although this code might not be as readily understandable (you have to appreciate the rule of the context operator), it works just as well. In this case, the context operator (.) performs an automatic iteration over each node in $colors to perform the calculation to determine whether r is part of the color value. In the first case, the where expression examines each $color object in turn to determine which node to pass on; in the second, the XPath predicate winnows down the set of $colors to be passed to the iterator. It's the same result in both cases, but one occurs at a different point in the cycle than the other.

Ordering Up Some XQuery

For more than a year, XQuery had been inextricably tied to the acronym FLWR (presumably pronounced "flower") for For|Let|Where|Return. However, as of the November 15, 2002 draft, this acronym became FLWOR, an explicit recognition that ordering plays a fairly major part in database retrieval. The O stands for order by, a keyword that performs much the same actions as the SQL ORDER BY command.

The order by command follows for expressions (and where expressions where they exist), and indicates for a given set of data the order in which the data is output. When it is not explicitly specified with order by, the output order generally depends on the specific system architecture. With an XML file, for instance, the specific order is the order in which the XML parser walks the tree (usually the child of each node is processed, then the next sibling, and then the parent)—the so-called document order.

The order by keyword indicates that the content should be sorted in ascending order, unless otherwise indicated according to the data type of the expressions being evaluated. This also implicitly assumes that the data type has some internal notion of ordering, which in turn assumes that there is some sense of schema validation acting on the data. For instance, consider the characters.xml file defined earlier in the chapter. Assume that it has an XSD schema (characters.xsd), which looks something like this:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

   <xsd:element name="characters" type="Characters"/>

   <xsd:complexType name="Character">
      <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="gender" type="Gender"/>
         <xsd:element name="species" type="xsd:string"/>
         <xsd:element name="vocation" type="xsd:string"/>
         <xsd:element name="level" type="xsd:nonNegativeInteger"/>
         <xsd:element name="health" type="xsd:int"/>
      </xsd:sequence>
   </xsd:complexType>

   <xsd:simpleType name="Gender">
      <xsd:restriction base="xsd:string">
         <xsd:enumeration value="Female"/>
         <xsd:enumeration value="Male"/>
         <xsd:enumeration value="Other"/>
      </xsd:restriction>
   </xsd:simpleType>

   <xsd:complexType name="Characters">
      <xsd:sequence>
         <xsd:element name="character" type="Character" minOccurs="0"
 maxOccurs="unbounded"/>
      </xsd:sequence>
   </xsd:complexType>

</xsd:schema>

The schema can then be associated with the characters.xml file by editing the enclosing <characters> element:

<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.kurtcagle.net/schemas/Characters.xsd"> <character> <name>Aleria Delamare</name> <gender>Female</gender> <species>Heroleim</species> <vocation>Bard</vocation> <level>5</level> <health>25</health> </character> <character> <name>Shar Denasthenes</name> <gender>Male</gender> <species>Human</species> <vocation>Merchant</vocation> <level>6</level> <health>28</health> </character> ... </characters>

The xsi:noNamespaceSchemaLocation gives the URL (here on a local machine) to the schema file. This schema thus indicates that <level> is a nonnegative integer, and <health> is an unbounded integer.

Given that, an XQuery can be written that orders the characters by health:

for $character in document('characters.xml')//character
order by health
return
   <healthReport>
      {$character/name}
      {$character/health}
</healthReport>

This will return the following collection:

<healthReport>
   <name>Sheira</name>
   <health>9</health>
</healthReport>
<healthReport>
   <name>Gite</name>
   <health>18</health>
</healthReport>
<healthReport>
   <name>Paccu</name>
   <health>24</health>
</healthReport>
<healthReport>
   <name>Aleria</name>
   <health>25</health>
</healthReport>
<healthReport>
   <name>Shar</name>
   <health>28</health>
</healthReport>
<healthReport>
   <name>Gounna</name>
   <health>31</health>
</healthReport>
<healthReport>
   <name>Horukkan</name>
   <health>32</health>
</healthReport>
<healthReport>
   <name>Drue</name>
   <health>32</health>
</healthReport>

Notice here that the order was done implicitly using integers. The first health report indicates a health of 9:

<healthReport>
   <name>Sheira</name>
   <health>9</health>
</healthReport>

Had the ordering been done without regard to the data type of the element, this entry would have been last, because 9 is alphanumerically higher in order than 32.

Suppose, however, that you didn't have a specific schema against which to validate the characters.xml file. Would this have made sorting numerically impossible? No, because you can also declare an element's data type inline:

for $character in document('characters.xml')//character
order by $character\health as xs:integer
return
   <healthReport>
      {$character/name}
      {$character/health}
</healthReport>

In this case, the contents of the <health> element are cast to an integer. The xs: namespace is implicitly defined within the XQuery schema, and indicates the XSD schema data types.

You can also determine the order of a query. The ascending and descending keywords determine the direction of the sort, and appear after the sort expression is given. For instance, suppose that each character element has an additional child element: <createdBy> of type xs:dateTime. To order the characters in descending order by the time they were created, you'd write this query:

for $character in document('characters.xml')//character
order by $character\dateCreated as xs:dateTime descending
return
   <dateReport>
      {$character/name}
      {$character/dateCreated}
   </dateReport>

Creating more sophisticated queries requires the intelligent use of FLWOR type expressions. For instance, if you want to list all entries in the characters.xml file by level, then sort all characters at that level by name, your query would need to use the distinct-values() function to retrieve the levels, then use that as the key for determining the set of each character within that level:

let $characters := document('characters.xml')//character
for $level in distinct-values($characters/level)
   order by $level
return
   <level value="{$level}">{
      for $character in $characters
      where $character/level = $level
         order by $character/name
      return $character/name
      }
   </level>

This would create the following output:

<level value="4">
   <name>Sheira</name>
</level>
<level value="5">
   <name>Aleria</name>
   <name>Horukkan</name>
   <name>Paccu</name>
</level>
<level value="6">
   <name>Shar</name>
   <name>Drue</name>
</level>
<level value="7">
   <name>Gite</name>
</level>
<level value="8">
   <name>Gounna</name>
</level>

One of the problems with such sorting is that occasionally an empty order by expression will be used. For instance, suppose that a new character is added to the characters.xml list:

<character>
  <name>Yane Helavela</name>
  <gender>Female</gender>
  <species>Human</species>
  <vocation>Illusionist</vocation>
</character>

In this case, no level element exists. The question that arises is whether this particular element should appear at the beginning or end of a sequence when sorted. This problem is answered by the empty greatest and empty least keywords. The empty greatest command indicates that whenever an item to be sorted has an empty sort key (or doesn't have one at all), it will always be assumed to be at the end of the sort order. On the other hand, empty least will place the same element at the beginning of the sort order.

The stable keyword, when added to the list of order by qualifiers, indicates another condition: what happens when two identical sort keys are found for different elements. For instance, if two different entries have the same creation date, the order of the output for the items becomes more questionable. In that situation, the stable keyword is used to tell the processor to retain the initial order in which the elements are retrieved, whereas the processor is free to implement its own ordering scheme if the stable keyword is absent. In most cases, this shouldn't make a major difference in the ordering.

Finally, if you have a reference to a specific collation (such as "eng-us"), you can indicate this particular collation as the basis for sorting through the collation keyword on the order by expression. For instance,

for $character in $characters
   order by $character/name collation "eng-us"
return $character

will order the characters in ascending (the default) order using the U.S. English collation.





Page 3 of 5



Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Enterprise Development Update

Don't miss an article. Subscribe to our newsletter below.

Sitemap | Contact Us

Rocket Fuel