The Fundamentals of the SQL Server 2005 XML Datatype
Have you ever wanted flexibility in your SQL Server database without having to add additional tables? Do you store XML in your SQL Server 2000 database and yearn for an easier way to validate it against a XML Schema definition? If you're planning on upgrading to SQL Server 2005, you're in luck.
The world is moving to XML for many of its data storage needs, and SQL Sever 2005 has embraced XML in a big way. Alhough SQL Server 2000 always supported some form of XML, SQL Server 2005 ups the ante quite a bit. Microsoft has added the new XML datatype, along with a range of functions to manipulate it.
The topic of XML in SQL Server 2005 is complex enough to warrant five whitepapers. So, if you're a relational database developer, you're probably wondering how to begin piecing together the XML landscape in SQL Server 2005. Using the AdventureWorks sample database, which ships with the CTP version of SQL Server 2005, this article provides the fundamentals for working with the XML datatype and refers you to other XML resources for additional study (see the Further Reading section at the end).
The first stop is a primer on some basic XML concepts.
XML Primer, XSD, and XML Namespaces
A complete introduction to XML is beyond the scope of this article. Besides, if you've modified any of the various standard configuration files that accompany .NET products, you're probably already familiar with XML's hierarchical, HTML-like look. The following is a typical XML document:
<ns0:root xmlns:ns0="http://schemas.microsoft.com/sqlserver/2004/ 07/adventure-works/ProductModelManuInstructionsModified"> <ns0:Location LocationID="100" SetupHours="10.4" MachineHours="10.4" LaborHours="10.4" LotSize="10.4" /> </ns0:root>
XML namespaces and XML Schema Definition (XSD) are XML concepts with which you may be unfamiliar, however. A namespace is simply a set of related names that must be unique across a particular context, and it is used to identify things within that context. A good namespace analogy is the names of your family members. Everyone in your family has a unique name, eliminating the confusion of identifying a particular member. A common use for a namespace is to create a unique identifier for a class in an object-oriented programming language.
A XML namespace is a namespace for a particular XSD. To understand the importance of a namespace to a Schema definition, you must understand XSD and be familiar with some XML terminology. XML documents contain two essential pieces of information: elements and attributes. In the XML document example above, "Location" is an element and "LocationID" is an attribute. The basic difference is that elements can contain attributes and other elements, whereas attributes simply are pieces of additional information attached to an element.
In the SQL programming languages, a table definition describes the structure of data in a database. Likewise, XSD describes the structure of a particular XML document. The XSD for the XML document shown above appears below:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://schemas.microsoft.com/sqlserver/ 2004/07/adventure-works/ ProductModelManuInstructionsModified" elementFormDefault="qualified"> <xsd:element name="root"> <xsd:complexType mixed="true"> <xsd:complexContent mixed="true"> <xsd:restriction base="xsd:anyType"> <xsd:sequence> <xsd:element name="Location" maxOccurs="unbounded"> <xsd:complexType mixed="true"> <xsd:complexContent mixed="true"> <xsd:restriction base="xsd:anyType"> <xsd:attribute name="LocationID" type="xsd:integer" use="required" /> <xsd:attribute name="SetupHours" type="xsd:decimal" /> <xsd:attribute name="MachineHours" type="xsd:decimal" /> <xsd:attribute name="LaborHours" type="xsd:decimal" /> <xsd:attribute name="LotSize" type="xsd:decimal" /> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:restriction> </xsd:complexContent> </xsd:complexType> </xsd:element> </xsd:schema>
In the example above, "Location" is explicitly defined as an element and "LocationID" is defined as an attribute on the element Location. Like a table definition, elements and attributes in an XSD have type information.
Like records in a table, multiple XML documents modeled on the same XSD can exist in a database. As stated earlier, a XML namespace is a namespace for a particular XSD. So, including the XML namespace in the XML document you created ties the XML document to the XSD the document is modeled after. Using the XML namespace, a XML processor can quickly determine which Schema a document has utilized, retrieve the Schema based on the namespace, and validate the XML document using the appropriate Schema.
As you may have noticed, XML namespaces are typically formatted to look like a URL for a Web site. Like a class namespace, a XML namespace must be unique. To ensure that the namespace is unique, a portion of a URL is usually a registered domain name.
With this short introduction to namespaces and XML Schema Definition, you're ready for a discussion of the XML datatype in SQL Server 2005. Click here to download the accompanying source code for the examples.
Adding a New XML Datatype to a Table
Like other datatypes in SQL Server 2005, you can add a XML datatype programmatically by using Transact SQL (T-SQL) or SQL Server Management Studio. Figure 1 shows the configuration options for adding a XML datatype with SQL Server Management Studio.
Figure 1. Add a XML Column with SQL Server Management Studio
You must make three decisions when you add a XML datatype:
- Will your XML field be typed or untyped?
- Will you be storing complete XML documents or fragments of XML documents?
- Will you store XML documents associated with multiple Schema definitions?
Typed or Untyped XML?
Like other datatypes, the XML datatype must meet specific formatting criteria. It must conform to well-formed XML criteria (untyped) and you can optionally add additional conformance criteria by specifying a Schema collection (typed).
A Schema collection consists of a group of Schema definitions. Again, you have two options for adding Schemas to a collection, T-SQL or SQL Server Management Studio. Figure 2 shows the area you must access to maintain your Schema collection using SQL Server Management Studio.
Figure 2. View XML Schema in SQL Server Management Studio
Once you have built your Schema collection, you can assign the appropriate Schema collection to your XML datatype. Figure 3 shows a dropdown list of Schema collections in the AdventureWorks database.
Figure 3. Schema Dropdown List
Complete XML Documents or Fragments
Once you've decided on a typed or untyped XML datatype field, you must determine whether you will be storing a XML document based on a single Schema, multiple Schemas, or fragments of XML. Selecting yes for the "Is XML Document" option in the datatype configuration dialog sets the field to accept a single XML document associated with a single Schema. Setting "Is XML Document" to no configures the field for all other options, including fragments of XML data.
Below are some other facts to consider when you configure your XML datatype solution:
- XML documents are limited to two gigabytes.
- A XML document is UTF-16 encoded in the database.
- XML documents can be indexed, though you must make considerations when you build the table with the contained XML datatype.
- XML Schema Collection operations require special permissions to execute the appropriate T-SQL statements.
With knowledge of XML namespaces, Schema definitions, and XML datatype definition, you're ready to employ XQuery to manipulate the XML data in the database.