XQuery Language Expressions
The Need for XQuery
When you combine XML, which is essentially an open data format independent of any specific data-formatting language, with a universal addressing mechanism such as URLs, great things happen. Together, these two factors make each endpoint of a transaction indifferent to the technology running at the other end.
However, this transparency can break down in the presence of databases. Even if a database has an XML-based interface (either through SOAP or through HTTP POST or GET commands), querying that database to retrieve a dataset makes things more complicated. SQL is the standard way to make queries against most databases, but each database vendor implements the SQL standard slightly (or not so slightly) differently. What's worse, the result sets that come back are also formatted according to the whims of the database provider.
A common query language for XML would therefore need to meet several requirements:
Platform consistency: The language should be the same, regardless of which vendor or product is being used.
XML-centric: The language should treat non-XML databases as being equivalent to XML data stores. Note that this does not necessarily imply that the language has to be written in XML.
Set-capable: SQL differs from languages such as Java or Visual Basic because it implicitly works on sets of data all at once rather than one item of data at a time. A new query language would similarly need to be set-focused or, more properly, node-focused.
Ease of use: The language should be easy enough to enable someone with a minimal database background to create complex queries.
The goal of this chapter is to examine XQuery, the language designed to address all of these points. The examples in this chapter are deliberately simple, to better illustrate the role of each part of the language, but in general XQuery is most useful for writing fairly complex scripts.
Why Not SQL?
One of the first questions you can ask about XQuery is "Why not just create an updated version of SQL and use it with XML?" Beyond the vendor implementation issue already mentioned, one of the biggest problems is the mismatch between the information models (infosets) of SQL and XML.
When you make a SQL query, you create from several tables a single virtual table (in simplest terms), which consists of the intersections, unions, and other filtered operations of the source tables. This is one of the reasons why being able to factor a problem domain into a minimal set of tables is so important: The more overlap that exists in tables, the more work must be done to eliminate spurious data sets.
Relational databases can be extraordinarily complex. Moreover, they are not good with "semi-structured" data (documents). Even a simple HTML document can very quickly overwhelm a relational database with its apparent complexity, and one of the characteristics of XML is the fact that a given element can have a range of possible subelements (or might even have an open definition in which the element can have anything as a subelement).
SQL mirrors the tabular mindset of good database design, and has been optimized for it. This means that it doesn't handle unpredictability and irregularity in data structure well.
As more data moves into an XML form, SQL's limitations become more pressing. As code becomes more complex, the advantages of a common XML-oriented data query language will become more obvious.
XSLT and XPath as Query Languages
A well-defined query language already exists for XML: XPath, covered in detail in Chapter 2, "Understanding the XPath Specification." You can format that query output with XSLT to create any XML format, so you might wonder why we need a dedicated XML query language.
XPath is an integral part of XQuery, as will be discussed in this chapter. Thus the real question is, "Why use XQuery rather than XSLT?" There are a few answers, although how compelling they are depends on how experienced you are with using XML:
Easier to use: XQuery uses more procedurally oriented code than XSLT, so it might feel more familiar than the often paradigm-bending XSLT language.
Less verbose: XSLT is an XML-based language, and even simple routines can take up pages of code. This is a valid charge against XSLT 1.0, but less so against XSLT 2.0.
Less document-centric: XSLT assumes an input stream of an XML document, although with features such as unparsed-text() and collection(), this requirement is less stringent in XSLT 2.0. XQuery works implicitly upon sets of nodes that don't necessarily have to be XML in origin, although in all likelihood any XQuery solution would do an implicit conversion to XML before processing the data.
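The point that XPath is an integral part of XQuery can be made concrete with a small sketch. The data here is hypothetical (the library and book elements are invented for illustration and are not part of this chapter's sample files); what matters is that the return expression is nothing more than an XPath path with a predicate.

```xquery
(: Hypothetical inline data, so the query is self-contained :)
let $library :=
  <library>
    <book year="1999"><title>XML Basics</title></book>
    <book year="2003"><title>Querying XML</title></book>
  </library>
(: The result expression is plain XPath: a path with a predicate :)
return $library/book[@year > 2000]/title
```

Under these assumptions the query selects the title element of the 2003 book. No template rules or XML wrapper syntax are required, which is the essence of the "less verbose" argument.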
The Structure of XQuery
Having described why XQuery is more or less necessary, let's look at its general makeup. The language consists of these primary areas:
for/let: Analogous to the SQL SELECT and SET statements, the for and let clauses let you define variables or iterate across a sequence of values, each of which is in turn assigned to a variable.
where: Analogous to the SQL WHERE clause, the where clause provides a set of conditions that filter or limit the initial selection in a for statement.
order by: Analogous to ORDER BY in SQL, the order by clause provides the ordering constraints on a sequence.
return: Analogous to the SQL RETURN statement, the XQuery return clause uses a custom formatting language to create output. The output does not necessarily have to be XML, although it is optimized to produce XML.
XPath: Most XPath 2.0 functions are supported in XQuery, as is the axis model used to navigate over XML structures (although some data sources might not support all aspects of XPath because of the type of data involved).
Functions: Analogous to SQL stored procedures, you can define functions in XQuery using the XPath language that can be called inline in an XML query.
Namespaces: A feature of XML rather than SQL. The declare namespace declaration binds a prefix to a namespace URI, which is how a query identifies the vocabulary that a type, function, or element belongs to.
XQuery has many more keywords than these, but the broad categories above cover its distinct syntaxes, each suited to a particular task. For instance, here is a full XQuery expression that draws on most of these areas:
declare namespace xs = "http://www.w3.org/2001/XMLSchema"
define function summaryText($char) returns xs:string
{
    concat($char/name,' is a ',$char/gender,' ',$char/species)
}
<results>{
    let $chars := input()//character[gender = 'Female']
    for $char in $chars
    where $char/level gt 5
    return
        <summary health="{$char/health}"> {
            attribute level {$char/level},
            attribute date {current-dateTime()},
            summaryText($char)
        }
        </summary>
}
</results>
This is a query for pulling out a series of records from a game database. It assumes an initial XML data stream with records corresponding to different characters, as shown in the file characters.xml:
<characters>
    <character>
        <name>Aleria</name>
        <gender>Female</gender>
        <species>Heroleim</species>
        <vocation>Bard</vocation>
        <level>5</level>
        <health>25</health>
    </character>
    <character>
        <name>Shar</name>
        <gender>Male</gender>
        <species>Human</species>
        <vocation>Merchant</vocation>
        <level>6</level>
        <health>28</health>
    </character>
    <character>
        <name>Gite</name>
        <gender>Female</gender>
        <species>Aelvar</species>
        <vocation>Mage</vocation>
        <level>7</level>
        <health>18</health>
    </character>
    <character>
        <name>Horukkan</name>
        <gender>Male</gender>
        <species>Udrecht</species>
        <vocation>Warrior</vocation>
        <level>5</level>
        <health>32</health>
    </character>
    <character>
        <name>Gounna</name>
        <gender>Female</gender>
        <species>Noleim</species>
        <vocation>Mage</vocation>
        <level>8</level>
        <health>31</health>
    </character>
    <character>
        <name>Sheira</name>
        <gender>Female</gender>
        <species>Human</species>
        <vocation>Cleric</vocation>
        <level>4</level>
        <health>9</health>
    </character>
    <character>
        <name>Drue</name>
        <gender>Female</gender>
        <species>Voleim</species>
        <vocation>Warrior</vocation>
        <level>6</level>
        <health>32</health>
    </character>
    <character>
        <name>Paccu</name>
        <gender>Male</gender>
        <species>Human</species>
        <vocation>Merchant</vocation>
        <level>5</level>
        <health>24</health>
    </character>
</characters>
When the query is executed, it returns an XML document containing every female character above fifth level, each with a text summary:
<results>
    <summary date="2002-10-24T14:38:48" health="18" level="7">Gite is a Female Aelvar</summary>
    <summary date="2002-10-24T14:38:48" health="31" level="8">Gounna is a Female Noleim</summary>
    <summary date="2002-10-24T14:38:48" health="32" level="6">Drue is a Female Voleim</summary>
</results>
Note that this output has been reformatted somewhat to make it more legible. Whitespace is not usually significant in XQuery; without the formatting, the result of the previous query is as follows:
<results><summary date="2002-10-24T14:38:48" health="18" level="7">Gite is a Female Aelvar </summary><summary date="2002-10-24T14:38:48"
health="31" level="8">Gounna is a Female Noleim </summary><summary date="2002-10-24T14:38:48"
health="32" level="6">Drue is a Female Voleim </summary> </results>
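The query listing earlier in this section appears to follow working-draft syntax (define function ... returns, and the input() function), which processors implementing the final XQuery 1.0 Recommendation do not accept. The following is a minimal sketch of an equivalent query in the final syntax, not the author's own code. It assumes the sample data is saved as characters.xml and is reachable through doc(), and it places the function in the predeclared local namespace, as the final language requires for user-defined functions in a main module.

```xquery
xquery version "1.0";

(: xs is predeclared in XQuery 1.0; redeclared here only to mirror the original listing :)
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

(: User-defined functions must live in a namespace; local: is predeclared :)
declare function local:summaryText($char as element(character)) as xs:string
{
  concat($char/name, ' is a ', $char/gender, ' ', $char/species)
};

<results>{
  (: Assumption: the data file is available as characters.xml :)
  let $chars := doc("characters.xml")//character[gender = 'Female']
  for $char in $chars
  (: Untyped content must be cast before a value comparison such as gt :)
  where xs:integer($char/level) gt 5
  return
    <summary health="{$char/health}">{
      attribute level {$char/level},
      attribute date {current-dateTime()},
      local:summaryText($char)
    }</summary>
}</results>
```

The explicit xs:integer() cast matters when the document has no schema: a value comparison such as gt treats untyped content as a string, so comparing it directly with the integer 5 raises a type error. Writing $char/level > 5 with the general comparison operator is an alternative that converts the untyped value to a number automatically.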
Because each of the major areas has its own language and syntax, the best way to understand the full XQuery language is to break it into the code for each type of expression.
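As a first taste of that breakdown, here is a short sketch of the one clause from the list of primary areas that the full example does not exercise: the ordering clause, written order by in the final XQuery syntax. The inline data is a trimmed copy of three records from characters.xml so that the query stands alone; the name output element is chosen only for illustration.

```xquery
let $chars :=
  <characters>
    <character><name>Gite</name><level>7</level></character>
    <character><name>Gounna</name><level>8</level></character>
    <character><name>Drue</name><level>6</level></character>
  </characters>
for $char in $chars/character
(: Cast to a number; untyped values would otherwise be ordered as strings :)
order by xs:integer($char/level) descending
return <name level="{$char/level}">{ string($char/name) }</name>
```

Under these assumptions the query returns the three names from highest level to lowest: Gounna, Gite, then Drue.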
