
XQuery Language Expressions

  • March 19, 2004
  • By Kurt Cagle

The Need for XQuery

When you combine XML, which is essentially an open data format independent of any specific data-formatting language, with a universal addressing mechanism such as URLs, great things happen. Together, these two factors make each endpoint of a transaction independent of the technologies in use at the other end.

However, this transparency can break down in the presence of databases. Even if a database has an XML-based interface (through SOAP or through HTTP POST or GET commands), querying that database to retrieve a dataset makes things more complicated. SQL is the standard way to query most databases, but each database vendor implements the SQL standard slightly (or not so slightly) differently. Worse, the result sets that come back are also formatted according to the whims of the database provider.

As mentioned in my book XQuery Kick Start, XQuery was proposed as a solution to this conundrum. To make databases truly transparent (so that it doesn't matter whether there is a Microsoft SQL Server, IBM DB2, Oracle 9i, PostgreSQL, or any other vendor's database engine), it is necessary to make a query language that will provide the following:

The goal of this chapter is to examine the language in XQuery that was designed to handle all these primary points. The examples in this chapter are deliberately on the simplistic side to better illustrate the role of the command structure, but in general, XQuery finds utility more in the creation of fairly complex scripts.

Why Not SQL?

One of the first questions you can ask about XQuery is "Why not just create an updated version of SQL and use it with XML?" Beyond the vendor implementation issue we mentioned, one of the biggest problems comes from the difference between the information models (infosets) of SQL and XML.

When you make a SQL query, you create from several tables a single virtual table (in simplest terms), which consists of the intersections, unions, and other filtered operations of the source tables. This is one of the reasons why being able to factor a problem domain into a minimal set of tables is so important: The more overlap that exists in tables, the more work must be done to eliminate spurious data sets.

Relational databases can be extraordinarily complex. Moreover, they are not good with "semi-structured" data (documents). Even a simple HTML document can very quickly overwhelm a relational database with its apparent complexity, and one of the characteristics of XML is the fact that a given element can have a range of possible subelements (or might even have an open definition in which the element can have anything as a subelement).

SQL mirrors the tabular mindset of good database design, and has been optimized for it. This means that it doesn't handle unpredictability and irregularity in data structure well.

As more data moves into an XML form, SQL's limitations become more pressing. As code becomes more complex, the advantages of a common XML-oriented data query language will become more obvious.
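
To make the point about irregular structure concrete, here is a minimal sketch in XQuery 1.0 syntax. The inline notes document and its element names are invented for this illustration and are not part of the chapter's examples. The query finds every em element no matter how deeply or irregularly it is nested, without the structure having to be known in advance.

```xquery
(: Hypothetical semi-structured input: each para has different children. :)
let $notes :=
  <notes>
    <para>Plain text only.</para>
    <para>Text with <em>emphasis</em> and a <list><item>nested <em>item</em></item></list>.</para>
    <para><em>Starts</em> with emphasis.</para>
  </notes>
(: The // step selects em elements at any depth, in document order. :)
for $e in $notes//em
return string($e)
```

With this input, the result is the sequence of strings "emphasis", "item", "Starts".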

XSLT and XPath as Query Languages

A well-defined query language already exists for XML: XPath, covered in detail in Chapter 2, "Understanding the XPath Specification." You can format that query output with XSLT to create any XML format, so you might wonder why we need a dedicated XML query language.

XPath is an integral part of XQuery, as will be discussed in this chapter. Thus the real question is, "Why use XQuery rather than XSLT?" There are a few answers, although how compelling they are depends on how experienced you are with using XML:

  • Easier to use—XQuery uses more procedurally oriented code than XSLT, so it might feel more familiar than the often paradigm-bending XSLT language.

  • Less verbose—XSLT is an XML-based language, and even simple routines can take up pages of code. This is a valid charge against XSLT 1.0, but less so against XSLT 2.0.

  • Less document-centric—XSLT assumes an input stream of an XML document, although with features such as unparsed-text() and collection(), this requirement is less stringent in XSLT 2.0. XQuery works implicitly upon sets of nodes that don't necessarily have to be XML in origin, although in all likelihood any XQuery solution would do an implicit conversion to XML before processing it.
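
As a loosely related illustration of the last two points, the following sketch (XQuery 1.0 syntax, not taken from the original chapter) runs a query over a plain sequence of integers rather than over an XML document. The for, where, and return clauses it uses are described in the next section.

```xquery
(: A query over a sequence of integers; no XML document is involved. :)
for $n in (1 to 5)
where $n mod 2 = 1      (: keep only the odd values 1, 3, 5 :)
return $n * $n          (: square each value that passed the filter :)
```

The result is the sequence (1, 9, 25). The whole query fits in three short lines, with no wrapper elements around the logic.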

The Structure of XQuery

Having described why XQuery is more or less necessary, let's look at its general makeup. The language consists of these primary areas:

  • for/let—Analogous to the SQL SELECT and SET statements, the for and let clauses let you define variables or iterate across a range of sequence values that are in turn assigned to a variable.

  • where—Analogous to the SQL WHERE statement, the where clause provides a set of conditions that filter or limit the initial selection in a for statement.

  • order by—Analogous to ORDER BY in SQL, the order by clause provides the ordering constraints on a sequence.

  • return—SQL has no close counterpart in an ordinary query (the nearest is the column list of a SELECT statement). The XQuery return clause uses a custom formatting language to create output. The output does not necessarily have to be XML, although it is optimized to produce XML.

  • XPath—Most XPath 2.0 functions are supported in XQuery, as is the axis model used to navigate over XML structures (although some data sources might not support all aspects of XPath because of the type of data involved).

  • Functions—Analogous to SQL stored procedures, XQuery functions are defined using the XPath language and can be called inline in an XML query.

  • Namespaces—A feature of XML rather than SQL. The declare namespace declaration associates a namespace URI with a prefix, which is crucial for identifying the vocabulary that an element, type, or function belongs to.

Many keywords are associated with XQuery, but each of these broad categories has its own syntax, depending upon what specifically needs to be done. For instance, here's an example of a full XQuery expression that uses most of these areas (every one except order by):

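(: Bind the xs prefix to the XML Schema namespace. :)
declare namespace xs = "http://www.w3.org/2001/XMLSchema";

(: Build a one-line text summary for a single character element. :)
declare function local:summaryText($char) as xs:string
{
   concat($char/name,' is a ',$char/gender,' ',$char/species)
};

<results>{
   (: Load the sample file and keep only the female characters. :)
   let $chars := doc("characters.xml")//character[gender = 'Female']
   for $char in $chars
      (: Convert the untyped level to an integer before comparing. :)
      where xs:integer($char/level) gt 5
      return
          <summary health="{$char/health}"> {
          attribute level {$char/level},
          attribute date {current-dateTime()},
          local:summaryText($char)
          }
         </summary>
   }
 </results>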

This is a query for pulling out a series of records from a game database. It assumes an initial XML data stream with records corresponding to different characters, as shown in the file characters.xml:

<characters>
<character>
  <name>Aleria</name>
  <gender>Female</gender>
  <species>Heroleim</species>
  <vocation>Bard</vocation>
  <level>5</level>
  <health>25</health>
</character>
<character>
  <name>Shar</name>
  <gender>Male</gender>
  <species>Human</species>
  <vocation>Merchant</vocation>
  <level>6</level>
  <health>28</health>
</character>
<character>
  <name>Gite</name>
  <gender>Female</gender>
  <species>Aelvar</species>
  <vocation>Mage</vocation>
  <level>7</level>
  <health>18</health>
</character>
<character>
  <name>Horukkan</name>
  <gender>Male</gender>
  <species>Udrecht</species>
  <vocation>Warrior</vocation>
  <level>5</level>
  <health>32</health>
</character>
<character>
  <name>Gounna</name>
  <gender>Female</gender>
  <species>Noleim</species>
  <vocation>Mage</vocation>
  <level>8</level>
  <health>31</health>
</character>
<character>
  <name>Sheira</name>
  <gender>Female</gender>
  <species>Human</species>
  <vocation>Cleric</vocation>
  <level>4</level>
  <health>9</health>
</character>
<character>
  <name>Drue</name>
  <gender>Female</gender>
  <species>Voleim</species>
  <vocation>Warrior</vocation>
  <level>6</level>
  <health>32</health>
</character>
<character>
  <name>Paccu</name>
  <gender>Male</gender>
  <species>Human</species>
  <vocation>Merchant</vocation>
  <level>5</level>
  <health>24</health>
</character>
</characters>

When the query is executed, it returns an XML document showing all female characters that are of greater than fifth level, with a text summary:

<results>
   <summary date="2002-10-24T14:38:48" health="18" level="7">
Gite is a Female Aelvar</summary>
   <summary date="2002-10-24T14:38:48" health="31" level="8">
Gounna is a Female Noleim</summary>
   <summary date="2002-10-24T14:38:48" health="32" level="6">
Drue is a Female Voleim</summary>
 </results>

Note that this output has been reformatted somewhat to make it more legible. Whitespace is not usually significant in XQuery—without the formatting, the result of the previous query is as follows:

<results><summary date="2002-10-24T14:38:48" health="18" level="7">
Gite is a Female Aelvar
         </summary><summary date="2002-10-24T14:38:48" 
health="31" level="8">Gounna is a Female Noleim
         </summary><summary date="2002-10-24T14:38:48" 
health="32" level="6">Drue is a Female Voleim
         </summary>

</results>
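
The sample query does not use ordering. As a sketch of how an order by clause could be added, the following variant lists the same matching characters sorted by level, highest first. It is written in XQuery 1.0 syntax and assumes that doc("characters.xml") resolves to the sample file shown earlier. The name element in the output is invented for this illustration.

```xquery
(: Assumption: characters.xml is reachable from the query's base URI. :)
<results>{
   for $char in doc("characters.xml")//character[gender = 'Female']
   (: Convert level to an integer so filtering and sorting are numeric. :)
   where xs:integer($char/level) gt 5
   order by xs:integer($char/level) descending
   return <name level="{$char/level}">{string($char/name)}</name>
}</results>
```

With the sample data, and whitespace aside, this should return Gounna (level 8), Gite (level 7), and Drue (level 6), in that order, each wrapped in a name element.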

Because each of the major areas has its own language and syntax, the best way to understand the full XQuery language is to break it into the code for each type of expression.




