.NET and XML: XSD Schemas

  • July 30, 2004
  • By Klaus Salchner

The W3C (World Wide Web Consortium, http://www.w3.org) published the XML 1.0 specification on February 10, 1998. The XML 1.1 specification followed six years later, on February 4, 2004. In those six years, XML has taken the industry by storm and has become the standard way to describe and exchange data. The current development platforms, .NET and J2EE, support XML natively. Modern enterprise applications, be it a SQL Server or Oracle database, a BizTalk Server, an Office suite, or any of thousands of other applications, support XML to various degrees. You will be hard pressed to find an application that does not support or use XML.

The first article in this series explained the fundamentals and power of XPath queries, which allow you to search and navigate your XML documents easily. This article looks at the fundamentals and power of XSD schemas. The following articles look at XSL Transformations and then at how well these three standards are supported by the .NET framework and which namespaces and types are the most important. This series of articles is not intended as a comprehensive description of all the .NET types around XML. The goal is rather to provide a good introduction so you understand the XML capabilities of the .NET framework and can start leveraging them in your current .NET projects.

The Sample XML Document for the Series of Articles

This series of articles takes it as a given that you are familiar with XML itself. The sample XML document used throughout the articles is a list of employees. Each employee must have a first name, last name, phone number, and e-mail address, and can optionally have a Web address and a job title.

<?xml version="1.0" encoding="utf-8"?>
<Employees xmlns="http://tempuri.org/MySchema.xsd">
   <Employee ID="1">
      <FirstName>Klaus</FirstName>
      <LastName>Salchner</LastName>
      <PhoneNumber>410-727-5112</PhoneNumber>
      <EmailAddress>klaus_salchner@hotmail.com</EmailAddress>
      <WebAddress>http://www.enterprise-minds.com</WebAddress>
      <JobTitle>Sr. Enterprise Architect</JobTitle>
   </Employee>
   <Employee ID="2">
      <FirstName>Peter</FirstName>
      <LastName>Pan</LastName>
      <PhoneNumber>604-111-1111</PhoneNumber>
      <EmailAddress>peter.pan@fiction.com</EmailAddress>
      <JobTitle>Sr. Developer</JobTitle>
   </Employee>
</Employees>

The Fundamentals of XSD Schemas

It is easy to create XML documents, whether programmatically or manually through an XML editor like XML Spy, Stylus Studio, or Visual Studio .NET 2003. But very often when processing an XML document, you want to know that it conforms to a certain structure: the structure your application understands. That is where XSD schemas come into play. XSD schemas are the successor of DTDs (Document Type Definitions), the difference being that XSD itself uses XML syntax. XSD schemas allow you to declare the structure of an XML document: which elements and attributes are allowed, whether an element is mandatory or optional, whether there can be more than one instance of an element, and so forth. You can then use the XSD schema to validate the XML document, that is, to check whether the XML document conforms to the structure described by the XSD schema. The XML describes the data and the XSD schema describes the structure of the data. Version 1.0 of the XSD schema standard was released in May 2001 and can be found at http://www.w3.org/TR/xmlschema-0/, http://www.w3.org/TR/xmlschema-1/, and http://www.w3.org/TR/xmlschema-2/. A working draft of the requirements for XSD 1.1 can be found at http://www.w3.org/TR/2003/WD-xmlschema-11-req-20030121/.

When you create your XSD schema, you do two things. First, you declare an element or attribute. Declaring means you associate an element or attribute name with a set of constraints; for example, an element with the name FirstName is of the string type and only one element of that name is allowed. Second, you define new simple or complex types. XSD has a set of standard types such as string, boolean, integer, date, and so forth. The .NET framework maps these XSD data types to its .NET data types. In our sample XML document, the Employee is a complex type. Think in terms of data structures. In your application code, you would define a new structure called Employee and it would contain the elements FirstName, LastName, PhoneNumber, EmailAddress, WebAddress, and JobTitle. In XSD schemas, you do exactly the same. You define a complex type of the name EmployeeType and then declare all the elements this type has plus the constraints for each element; for example, the FirstName element is of the string type. See the XSD schema below for our sample XML document:

<?xml version="1.0"?>
<xs:schema targetNamespace="http://tempuri.org/MySchema.xsd"
           xmlns="http://tempuri.org/MySchema.xsd"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           attributeFormDefault="unqualified"
           elementFormDefault="qualified">
   <!-- elementFormDefault is "qualified" so that the child elements
        of the sample document, which inherit its default namespace,
        validate against this schema -->
   <xs:element name="Employees">
      <xs:complexType>
         <xs:choice minOccurs="1" maxOccurs="unbounded">
            <xs:element name="Employee" type="EmployeeType"/>
         </xs:choice>
      </xs:complexType>
   </xs:element>
   <xs:complexType name="EmployeeType">
      <xs:sequence>
         <xs:element name="FirstName" type="xs:string"
                     minOccurs="1" maxOccurs="1"/>
         <xs:element name="LastName" type="xs:string"
                     minOccurs="1" maxOccurs="1"/>
         <xs:element name="PhoneNumber" type="xs:string"
                     minOccurs="1" maxOccurs="1"/>
         <xs:element name="EmailAddress" type="xs:string"
                     minOccurs="1" maxOccurs="1"/>
         <xs:element name="WebAddress" type="xs:string"
                     minOccurs="0" maxOccurs="1"/>
         <xs:element name="JobTitle" type="xs:string"
                     minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
      <xs:attribute name="ID" form="unqualified" type="xs:string"/>
   </xs:complexType>
</xs:schema>

Let's first look at the XSD elements, meaning the XML elements you use in your XSD schema to declare an element or attribute. The W3C provides an XSD schema that describes all the valid XSD element and attribute names. It can be found at http://www.w3.org/2001/XMLSchema.xsd.

Element Description
element Used to declare an element. Can have any of the attributes listed below to describe the element you are declaring.
attribute Used to declare an attribute. Can have any of the attributes listed below to describe the attribute you are declaring, unless otherwise specified.
name (attribute) Specifies the name of the XML element or attribute.
type (attribute) Specifies the type of the XML element or attribute. XSD comes with a number of simple data types like string, integer, date, and so on. Each .NET data type can be mapped to an XSD data type. Refer to your MSDN library for a complete list of the XSD types (search for "XML Data Types Reference"; make sure to put it in double quotes so it searches for the whole term, not just the individual words).
minOccurs (attribute) Describes the minimum number of occurrences of the element (not allowed for attributes). A value of zero means that you can omit this element. Any other value means the element must be present at least that often; for example, at least one time. This allows you to make elements mandatory.
maxOccurs (attribute) Describes the maximum number of occurrences of the element (not allowed for attributes). Setting this value to zero un-declares the element, meaning no element of this name is allowed. Setting it to the value "unbounded" means an unlimited number of elements is allowed. Specifying any other value means the element is not allowed to be present more often than specified.
default (attribute) Specifies the default value of the element or attribute. This can only be used for simple data types or text only data types. The "default" and "fixed" attributes are mutually exclusive.
fixed (attribute) Specifies the predetermined and unchangeable value of an element or attribute. This can only be used for simple data types or text-only data types. The "default" and "fixed" attributes are mutually exclusive.
ref (attribute) References a global element or attribute declared elsewhere in this or any other referenced XSD schema. This allows you to use that element or attribute within a complex type without repeating its name, type, and other constraints. Occurrence constraints (minOccurs and maxOccurs) are the exception: global declarations cannot carry them, so you set them on the reference itself. You can reference only global declarations, not elements or attributes declared locally within another complex type.
form (attribute) If set to "unqualified", this locally declared element or attribute belongs to no namespace and must appear in the XML document without a namespace. If set to "qualified", it belongs to the target namespace and must be namespace-qualified in the XML document, either with a prefix or, for elements, through a default namespace declaration. If not specified, the default from the schema element applies (elementFormDefault and attributeFormDefault). Global declarations are always qualified and cannot carry this attribute.
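
As a small sketch of how several of these attributes combine, the fragment below declares a global Comment element once and then references it from a complex type. The names Comment, Country, Currency, and OrderType are hypothetical and not part of the employee sample, and the fragment assumes it sits inside a schema element whose default namespace equals its target namespace, as in the schema above.

```xml
<!-- Hypothetical fragment; assumes the xs prefix is bound to
     http://www.w3.org/2001/XMLSchema and the default namespace
     equals the schema's target namespace -->

<!-- Global declaration: carries no minOccurs or maxOccurs -->
<xs:element name="Comment" type="xs:string"/>

<xs:complexType name="OrderType">
   <xs:sequence>
      <!-- An empty Country element is treated as if it contained "US" -->
      <xs:element name="Country" type="xs:string" default="US"/>
      <!-- Currency is optional; when present its only allowed value is "USD" -->
      <xs:element name="Currency" type="xs:string" fixed="USD"
                  minOccurs="0"/>
      <!-- Reuses the global Comment declaration; the occurrence
           constraints are set on the reference -->
      <xs:element ref="Comment" minOccurs="0" maxOccurs="unbounded"/>
   </xs:sequence>
</xs:complexType>
```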

This is not a complete list, but these are the main XSD elements you use to declare elements or attributes. Refer to the XSD standard for a complete reference. Now, let's look at the XSD elements you use to define new types. You can define simple types and complex types. A simple type takes a base type and applies some restrictions to it.

Type Description
simpleType

Defines a simple type, which takes a base type and applies additional restrictions to it. A simple type cannot declare any elements or attributes. It takes the base type and applies new restrictions. Here is an example:

<xs:element name="MyValue" type="MyInteger"/>
<xs:simpleType name="MyInteger">
   <xs:restriction base="xs:positiveInteger">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="10"/>
   </xs:restriction>
</xs:simpleType>

It declares a new element of the name MyValue, which is of the type MyInteger. It then defines the new type called MyInteger that uses as base type a positiveInteger and restricts its values between one and ten (inclusive). If you don't want to define a new type, you can nest it within the element you declare:

<xs:element name="MyValue">
   <xs:simpleType>
      <xs:restriction base="xs:positiveInteger">
         <xs:minInclusive value="1"/>
         <xs:maxInclusive value="10"/>
      </xs:restriction>
   </xs:simpleType>
</xs:element>

As you can see, in this case you don't specify a type attribute; instead, the simple type is defined inline, nested within the element. The nested simple type has no name attribute, so it can't be used by any other element. The same applies to attributes you declare.

restriction Defines a restriction within a simple type. With the base attribute, you specify the base type this simple type is based on; for example, a positiveInteger. By default, the new type inherits all the restrictions of that base type. See the example above.
maxInclusive The maximum value the type allows, including the value you specify. So, this translates to "less than or equal".
maxExclusive The maximum value the type allows, excluding the value you specify. So, this translates to "less than".
minInclusive The minimum value the type allows, including the value you specify. So, this translates to "greater than or equal".
minExclusive The minimum value the type allows, excluding the value you specify. So, this translates to "greater than".
maxLength The maximum length of the type (less than or equal).
minLength The minimum length of the type (greater than or equal).
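
The length and exclusive facets follow the same pattern as the MyInteger example above. The following sketch uses two hypothetical type names, ShortNameType and PercentType, chosen only for illustration:

```xml
<!-- Hypothetical types; assumes the xs prefix is bound to
     http://www.w3.org/2001/XMLSchema -->

<!-- A string that is 1 to 50 characters long -->
<xs:simpleType name="ShortNameType">
   <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="50"/>
   </xs:restriction>
</xs:simpleType>

<!-- A decimal that is greater than 0 and less than 100 -->
<xs:simpleType name="PercentType">
   <xs:restriction base="xs:decimal">
      <xs:minExclusive value="0"/>
      <xs:maxExclusive value="100"/>
   </xs:restriction>
</xs:simpleType>
```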

The restriction element allows a number of restrictions (called facets) to be applied. The list above shows the most common XSD elements for defining simple types. For a complete list, please refer to the XSD standard. A complex type defines a new type that has elements and attributes declared in it.

Element Description
complexType

Defines a complex type, which can declare a number of elements or attributes. This is like your data structure in your traditional programming days. Here is an example:

<xs:element name="Address" type="AddressType"/>
<xs:complexType name="AddressType">
   <xs:sequence>
      <xs:element name="Country" type="xs:string"/>
      <xs:element name="State" type="xs:string"/>
      <xs:element name="ZIP" type="xs:string"/>
      <xs:element name="Address1" type="xs:string"/>
   </xs:sequence>
   <xs:attribute name="ID" type="xs:positiveInteger"/>
   <xs:attribute name="Zone" type="xs:string"/>
</xs:complexType>

It declares a new element named Address of the type AddressType. It then defines the type AddressType as a type with four elements: Country, State, ZIP and Address1. The same as with simple types, you can nest the type definition within the element declaration:

<xs:element name="Address">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="Country" type="xs:string"/>
         <xs:element name="State" type="xs:string"/>
         <xs:element name="ZIP" type="xs:string"/>
         <xs:element name="Address1" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:positiveInteger"/>
      <xs:attribute name="Zone" type="xs:string"/>
   </xs:complexType>
</xs:element>

As you can see, in this case you don't specify a type attribute; instead, the complex type is defined inline, nested within the element. The nested complex type has no name attribute, so it can't be used by any other element. Attributes can never be of a complex type.

sequence A complex type can be a sequence, list, or choice of elements. Any element you declare within a complex type needs to be within a "sequence", "all", or "choice" block. See the example above. The "sequence" element specifies that the elements need to appear in the specified order in the XML document. The "sequence" element can have a minOccurs and maxOccurs, which define how often that sequence can be present. The attributes you define for a complex type are outside of the "sequence" block. See the example above again. The "sequence", "all", and "choice" elements are mutually exclusive as the top-level block of a complex type, although "sequence" and "choice" blocks can be nested within one another.
all

A complex type can be a sequence, list, or choice of elements. Any element you declare within a complex type needs to be within a "sequence", "all", or "choice" block. The "all" element is used to declare a list of elements that can appear in any order within that complex type. In XSD 1.0, each element in an "all" block can appear at most once. The "all" element itself can have a minOccurs of 0 or 1, and its maxOccurs must be 1, so the list as a whole is either optional or mandatory. The attributes you define for a complex type are outside of the "all" block. Here is an example:

<xs:complexType name="AddressType">
   <xs:all>
      <xs:element name="Country" type="xs:string"/>
      <xs:element name="State" type="xs:string"/>
   </xs:all>
</xs:complexType>

This means that the type AddressType has a Country and State element which can appear in any order. The "sequence", "all" and "choice" elements are mutually exclusive.

choice A complex type can be a sequence, list, or choice of elements. Any element you declare within a complex type needs to be within a "sequence", "all", or "choice" block. The "choice" element specifies that only one of the elements is allowed within the complex type. The "choice" element can have a minOccurs and maxOccurs, which define how often that "choice" block can be present. The attributes you define for a complex type are outside of the "choice" block. Here is an example:

<xs:complexType name="StateProvinceType">
   <xs:choice>
      <xs:element name="State" type="xs:string"/>
      <xs:element name="Province" type="xs:string"/>
   </xs:choice>
</xs:complexType>

This means that the StateProvinceType is only allowed to have a State or a Province element but not both. The "sequence", "all", and "choice" elements are mutually exclusive.
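
Although a complex type has only one of these blocks at its top level, "sequence" and "choice" blocks can be nested inside one another. As a sketch, the hypothetical LocationType below (the name is an assumption, not part of the sample) requires a Country element followed by either a State or a Province element:

```xml
<!-- Hypothetical type; assumes the xs prefix is bound to
     http://www.w3.org/2001/XMLSchema -->
<xs:complexType name="LocationType">
   <xs:sequence>
      <!-- Country always comes first -->
      <xs:element name="Country" type="xs:string"/>
      <!-- Exactly one of State or Province follows -->
      <xs:choice>
         <xs:element name="State" type="xs:string"/>
         <xs:element name="Province" type="xs:string"/>
      </xs:choice>
   </xs:sequence>
</xs:complexType>
```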

simpleContent

A simple type cannot have any attributes. If you want an element that has simple content plus attributes, you define a complex type and use the simpleContent element. Here is an example:

<xs:element name="Line" type="LineType"/>
<xs:complexType name="LineType">
   <xs:simpleContent>
      <xs:extension base="xs:string">
         <xs:attribute name="ID" type="xs:integer"/>
      </xs:extension>
   </xs:simpleContent>
</xs:complexType>

It declares an element Line of the LineType type and it defines the new type LineType, which is based on the base type string and extends it with an additional attribute called ID of the type integer.
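
For illustration, an instance of the Line element could look like the following. The text content and the ID value are made-up sample data, and namespace declarations are left out for brevity:

```xml
<!-- String content from the base type plus the ID attribute
     added by the extension -->
<Line ID="1">First line of sample text</Line>
```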

complexContent

Allows you to extend or restrict an existing complex type. Here is an example:

<xs:complexType name="Address2Type">
   <xs:complexContent>
      <xs:extension base="AddressType">
         <xs:sequence>
            <xs:element name="City" type="xs:string"/>
            <xs:element name="POBox" type="xs:string"/>
         </xs:sequence>
      </xs:extension>
   </xs:complexContent>
</xs:complexType>

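It defines a new type, Address2Type, that uses the AddressType type as a base type and extends it with two new elements: City and POBox. Here is an example of restricting an existing complex type:

<xs:complexType name="Address2Type">
   <xs:complexContent>
      <xs:restriction base="AddressType">
         <xs:sequence>
            <!-- Country is narrowed to the single value "US" -->
            <xs:element name="Country" type="xs:string" fixed="US"/>
            <!-- The remaining elements of AddressType are repeated unchanged -->
            <xs:element name="State" type="xs:string"/>
            <xs:element name="ZIP" type="xs:string"/>
            <xs:element name="Address1" type="xs:string"/>
         </xs:sequence>
      </xs:restriction>
   </xs:complexContent>
</xs:complexType>

It again defines a new type, Address2Type, that uses the AddressType type as a base type and restricts the Country element to only allow "US". A restriction must repeat every element of the base type that it keeps; elements that are mandatory in the base type, such as State, ZIP, and Address1 here, cannot be left out. Attributes of the base type are inherited without being repeated. You cannot add new elements or attributes, but you can restrict existing ones.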

extension Used within simpleContent and complexContent to extend an existing simple or complex type. Simple types can only be extended with attributes. Complex types can be extended with elements and attributes. Elements again need to be within a "sequence", "all", or "choice" block. See the last few examples above. The base attribute specifies the base type you extend.
restriction Used within simpleContent and complexContent to restrict an existing type. The "extension" and "restriction" elements are mutually exclusive. Within complexContent, you can restrict attributes and elements; elements need to be within a "sequence", "all", or "choice" block. See the last few examples above. Within simpleContent, the restriction element narrows the text value or the attributes of a complex type that itself has simple content.

This now enables you to define any number of simple and complex types and use them. Again, the list above shows the most common XSD elements. For a complete list, please refer to the XSD standard. Here are a few other very useful XSD elements:

Element Description
annotation

Annotations can be added almost anywhere in your schema. This allows you to create self-describing schemas that do not require any additional documentation. Applying a simple XSLT allows you to create user-friendly descriptions of your XSD schema. The annotation element is the root element of any annotation and contains one or more appinfo or documentation child elements. Here is an example:

<xs:element name="Employees">
   <xs:annotation>
      <xs:documentation>
         The collection of employee nodes. List
         minimum one employee. Allows to list an
         unlimited number of employees.
      </xs:documentation>
   </xs:annotation>
   <xs:complexType>
      <xs:choice maxOccurs="unbounded" minOccurs="1">
         <xs:element name="Employee" type="EmployeeType"/>
      </xs:choice>
   </xs:complexType>
</xs:element>

The annotation describes the Employees node being declared. You can place annotations throughout the schema, but placing them right under an element, attribute, simple type, or complex type makes the schema easy to read.

documentation Adds a note to the XSD schema. This element is always placed under an annotation element and is intended for human readers. See the example above.
appinfo Adds information to the XSD schema. This element is always placed under an annotation element, in the same position as the documentation element in the example above, and is intended for applications that process the schema.
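
As a sketch of how documentation and appinfo can sit side by side, the fragment below annotates a PhoneNumber element. The ui:label element and its namespace URI are hypothetical; the XSD standard does not define what goes inside appinfo, so the content is whatever the consuming application agrees on.

```xml
<!-- Assumes the xs prefix is bound to http://www.w3.org/2001/XMLSchema -->
<xs:element name="PhoneNumber" type="xs:string">
   <xs:annotation>
      <!-- For human readers -->
      <xs:documentation>
         The primary phone number of the employee.
      </xs:documentation>
      <!-- For applications: a hypothetical hint for a form generator -->
      <xs:appinfo>
         <ui:label xmlns:ui="http://example.com/ui-hints">Phone</ui:label>
      </xs:appinfo>
   </xs:annotation>
</xs:element>
```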
include

Allows you to include other schemas into your schema. If you have many types that you need to describe, it makes more sense to split the schema up in parts instead of having one huge XSD document. You might split it in base and extended types or maybe a file per type. In whatever form you split it up, the include element allows you to include other schemas into your schema. Here is an example:

<xs:schema ... >
   <xs:include schemaLocation="BaseTypes.xsd"/>
</xs:schema>

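The include element must appear right underneath the schema element, at the beginning of your schema. The schemaLocation attribute tells the parser where the XSD schema is located. If you specify only the file name, the location is typically resolved relative to the schema that contains the include element, that is, the same folder. An included schema must have the same target namespace as the including schema, or no target namespace at all.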

group

The "group" element allows you to declare a group of elements that you can incorporate into a complex type. This is not the same as defining a complex type. It is just creating a group of elements and then reusing this group within complex types. The group can only have a name but no type. Here is an example:

<xs:group name="AddressGroup">
   <xs:sequence>
      <xs:element name="Country" type="xs:string"/>
      <xs:element name="State" type="xs:string"/>
      <xs:element name="ZipCode" type="xs:string"/>
   </xs:sequence>
</xs:group>

<xs:complexType name="AddressType">
   <xs:group ref="AddressGroup"/>
</xs:complexType>

Elements declared within a group again need to be within a "sequence", "all", or "choice" block. You can reference an existing group by using the group element with the ref attribute. The example above creates an AddressGroup group and then creates a complex AddressType type, which references the AddressGroup group.

These few XSD elements and attributes allow you to create very comprehensive XSD schemas. These schemas can describe the expected format of the XML document in any detail. In your XML document, you reference the XSD schema (see below) so you can use the XML parser or tools like Visual Studio .NET 2003 to validate your XML document.

Sometimes, you have conflicting declarations of elements and attributes or definitions of simple or complex data types. In this case, you place each conflicting definition into a separate namespace. Namespaces are nothing other than a way to logically group declarations of elements and attributes and definitions of types together. Namespaces are identified with a "namespacename:" prefix. Each namespace you use needs to be declared with an "xmlns:namespacename" attribute in the schema element. When you create your XSD schema, you add to the "schema" element the following five attributes (see the sample schema above):

Attribute Description
targetNamespace Sets the namespace for which the XSD schema is declaring elements and attributes and defining types. All global declarations and definitions belong to this namespace; whether local elements and attributes do depends on elementFormDefault, attributeFormDefault, and the "form" attribute. The value specifies the URI for this namespace.
xmlns Defines the default namespace in this XSD schema, meaning the namespace that unprefixed names such as "EmployeeType" resolve to. In most cases, this is the same as the targetNamespace. But, if you have multiple namespaces in the XSD schema, you would choose via this attribute which namespace is the default one. The value specifies the URI of the default namespace.
xmlns:xs All the XSD schema elements and attributes are prefixed with "xs:". This attribute binds the "xs" prefix to the namespace http://www.w3.org/2001/XMLSchema, which contains all the valid XSD elements and attributes (the elements and attributes you use to create your schema). It is defined by the W3C, which created the XSD standard.
attributeFormDefault Set the value of this attribute to "qualified" if you want all locally declared attributes to belong to the target namespace, so they must be namespace-qualified in the XML document. Otherwise, set it to "unqualified", in which case they belong to no namespace. This can be overridden for each attribute by specifying the "form" attribute.
elementFormDefault Set the value of this attribute to "qualified" if you want all locally declared elements to belong to the target namespace, so they must be namespace-qualified in the XML document. Otherwise, set it to "unqualified", in which case they belong to no namespace. This can be overridden for each element by specifying the "form" attribute.
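
To make the effect of elementFormDefault concrete, here is a sketch of an instance document written for a schema that uses elementFormDefault="unqualified" with the target namespace http://tempuri.org/MySchema.xsd. The prefix "e" is an arbitrary choice for illustration. Only the globally declared root element is in the target namespace; the locally declared children are in no namespace:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Instance for elementFormDefault="unqualified":
     the global Employees element is qualified with the e prefix,
     the local child elements carry no namespace -->
<e:Employees xmlns:e="http://tempuri.org/MySchema.xsd">
   <Employee ID="1">
      <FirstName>Klaus</FirstName>
      <LastName>Salchner</LastName>
      <PhoneNumber>410-727-5112</PhoneNumber>
      <EmailAddress>klaus_salchner@hotmail.com</EmailAddress>
   </Employee>
</e:Employees>
```

With "qualified", the children must be in the target namespace as well, which is the effect of the default xmlns declaration on the root element of the sample document at the top of this article.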

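In the XML document itself, you reference the XSD schema by adding the "xmlns" attribute to the root element and setting it to the target namespace URI of the XSD schema. This places the document's elements in that namespace, and a validator matches them against the schema that declares the same target namespace. Here is an example: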

<?xml version="1.0" encoding="utf-8"?>
<Employees xmlns="http://tempuri.org/MySchema.xsd">
   ...
</Employees>

You can use Visual Studio .NET 2003 to create an XSD schema out of an existing XML document. Load the XML into Visual Studio and then select the "XML | Create Schema" menu. But the generated schema is fairly loose, as there is only so much you can infer from an XML document. You cannot, for example, know from an XML document whether minOccurs and maxOccurs should be set to certain values. If you have an XML document but no XSD schema, this is a good way to start creating the schema. But it is essential to edit the XSD schema manually until it comprehensively documents the valid XML formats.
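
Returning to how a document references its schema: the xmlns attribute names the namespace but does not say where the schema file lives. The XSD standard defines an optional xsi:schemaLocation attribute that pairs a namespace URI with a location hint; whether a given parser or tool honors the hint depends on that parser and its settings. A minimal sketch, assuming the schema is saved as MySchema.xsd next to the XML document (the file name is an assumption):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- xsi:schemaLocation holds pairs of "namespace URI" and "location hint".
     MySchema.xsd is an assumed file name in the same folder. -->
<Employees xmlns="http://tempuri.org/MySchema.xsd"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://tempuri.org/MySchema.xsd MySchema.xsd">
   <!-- Employee elements as in the sample document -->
</Employees>
```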

Summary

It has become common practice to use XML to describe and exchange data. Exchanging these XML documents between two or more applications can pose a challenge because you want to make sure each application expects the XML in the same format. This is where XSD schemas come into play. Development teams should base the XML data exchanges on published XSD schemas. These schemas define the valid formats of the XML documents. And, only XML documents that validate successfully should be processed. Very often, it makes sense to validate the received XML document against the XSD schema you used to build the data exchange. If the XML cannot be validated successfully, you should return an error stating this to be an invalid XML document.

The creation and use of XSD schemas can avoid a lot of problems resulting from invalid XML documents. This article provides an overview of the XSD syntax. With a small number of XSD elements and attributes, you are able to build comprehensive XSD schemas. The next article in this series will explain the fundamentals of XSL Transformations. If you have comments on this article, please contact me at klaus_salchner@hotmail.com. I want to hear if you learned something new. Contact me if you have questions about this topic or article.

About the Author

Klaus Salchner has worked for 14 years in the industry, nine years in Europe and another five years in North America. As a Senior Enterprise Architect with solid experience in enterprise software development, Klaus spends considerable time on performance, scalability, availability, maintainability, globalization/localization and security. The projects he has been involved in are used by more than a million users in 50 countries on three continents.

Klaus calls Vancouver, British Columbia his home at the moment. His next big goal is running the New York marathon in 2005. Klaus is interested in guest speaking opportunities and in writing for .NET magazines or Web sites. He can be contacted at klaus_salchner@hotmail.com or http://www.enterprise-minds.com.





