Inferring an XML Schema from an XML Document
After a couple of attempts, XML isn't that hard to write. Create a text document with matching opening and closing tags, like <Customer></Customer>, with text values in between. Unfortunately, I don't write XSD (XML Schema) documents from scratch often enough for them to be easy.
An XSD document is to an XML document what a SQL schema is to a SQL object such as a table. For an XML document to be valid relative to a schema, it must contain the elements the schema defines, in the declared order and with the declared types. In other words, the XSD document describes what to expect in any XML document that conforms to it. The challenge is that an XSD document uses attributes and namespace-qualified elements that are a little cryptic and hard to remember. Fortunately, you don't have to remember them.
The .NET Framework can infer the schema from a document. If you have the document, you can generate the schema. This article shows you how.
Defining an XML Document
For your purposes, any XML document will do. The XML contained in Listing 1 is an XML document containing columns from the Northwind Customers table. (It was used because it is convenient.) The XML document begins with the <?xml ?> declaration, which carries the version and encoding attributes; the rest of the document describes the content.
Listing 1: A sample XML document containing customer information.
<?xml version="1.0" encoding="utf-8" ?>
<!--Generated XML-->
<Root>
  <Customer>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <ContactName>Paul Kimmel</ContactName>
    <ContactTitle>Sales Representative</ContactTitle>
    <Address>Obere Str. 57</Address>
    <City>Berlin</City>
    <Region></Region>
    <PostalCode>12209</PostalCode>
    <Country>Germany</Country>
    <Phone>030-0074321</Phone>
    <Fax>030-0076541</Fax>
  </Customer>
  <Customer>
    <CustomerID>ANATR</CustomerID>
    <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
    <ContactName>Ana Trujillo</ContactName>
    <ContactTitle>Owner</ContactTitle>
    <Address>Avda. de la Constitución 2222</Address>
    <City>México D.F.</City>
    <Region></Region>
    <PostalCode>05021</PostalCode>
    <Country>Mexico</Country>
    <Phone>(5) 555-4729</Phone>
    <Fax>(5) 555-3745</Fax>
  </Customer>
</Root>
The number of records was shortened to conserve space, but the size of the document doesn't matter. The XML document in Listing 1 repeats Customer elements, with each child element corresponding to a column in the Northwind Customers table.
A corresponding XSD document would need to describe the contents that one would expect in all Customer XML documents, such as the fact that the document contains repeating complex types and that each type has specific fields. The field names and types would be expressed in the XSD as well.
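To make that description concrete, here is a hand-written sketch of what such a schema could look like for the Customer elements above. It is an illustration, not the output of the inference code: every field is declared as xs:string for simplicity, whereas an inferred schema may choose narrower types (for example, a numeric type for PostalCode) and different attribute defaults.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hand-written sketch; all field types are assumed to be xs:string. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:sequence>
        <!-- Customer may repeat any number of times inside Root. -->
        <xs:element name="Customer" maxOccurs="unbounded">
          <xs:complexType>
            <!-- xs:sequence means the children must appear in this order. -->
            <xs:sequence>
              <xs:element name="CustomerID" type="xs:string" />
              <xs:element name="CompanyName" type="xs:string" />
              <xs:element name="ContactName" type="xs:string" />
              <xs:element name="ContactTitle" type="xs:string" />
              <xs:element name="Address" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="Region" type="xs:string" />
              <xs:element name="PostalCode" type="xs:string" />
              <xs:element name="Country" type="xs:string" />
              <xs:element name="Phone" type="xs:string" />
              <xs:element name="Fax" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The xs: prefix and the http://www.w3.org/2001/XMLSchema namespace are the cryptic parts mentioned earlier; the nested complexType and sequence elements are what tie the field names to their order.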
Writing Code to Infer the XML Schema and Return an XDocument
The XDocument type is a new type that is part of LINQ to XML. (For more on LINQ to XML, check out my book LINQ Unleashed for C#. VB programmers shouldn't have that much trouble following the C# examples in the book.)
XDocument represents an XML document, and because XSD documents are also XML documents, an XDocument can hold the schema as well. Listing 2 demonstrates how to use streams, basic I/O, and the System.Xml classes to get the framework to infer (figure out) what the schema should be, as indicated by the XML data.
Listing 2: Inferring the XSD (schema) for the XML document in Listing 1.
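Imports System.IO
Imports System.Linq
Imports System.Text
Imports System.Xml
Imports System.Xml.Linq
Imports System.Xml.Schema

Module Module1

    Sub Main()
        Console.WriteLine(CreateXSD("..\..\Customers.xml"))
        Console.ReadLine()
    End Sub

    Public Function CreateXSD(ByVal filename As String) As XDocument
        ' Load the XML document whose schema will be inferred.
        Dim xml As XDocument = XDocument.Load(filename)
        Dim inference As XmlSchemaInference = New XmlSchemaInference

        ' Encode as UTF-8 rather than ASCII so that characters such as
        ' "ó" and "é" in the data are not replaced with "?".
        Using stream As MemoryStream = _
            New MemoryStream(Encoding.UTF8.GetBytes(xml.ToString()))
            Using reader As XmlTextReader = New XmlTextReader(stream)
                Dim schemaSet As XmlSchemaSet = inference.InferSchema(reader)

                ' Schemas() returns a non-generic ICollection, so cast
                ' before taking the first (and only) inferred schema.
                Dim schema As XmlSchema = _
                    schemaSet.Schemas().Cast(Of XmlSchema)().First()

                ' Write the schema to a string and parse it as an XDocument.
                Using target As TextWriter = New StringWriter()
                    schema.Write(target)
                    Return XDocument.Parse(target.ToString())
                End Using
            End Using
        End Using
    End Function

End Module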
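One way to check an inferred schema is to validate the source document against it. The following is a minimal sketch, not part of the original listing: the module name and the OnValidationEvent handler are hypothetical, and it assumes the same ..\..\Customers.xml path used above. It relies on the Validate extension method that System.Xml.Schema provides for XDocument.

```vb
Imports System
Imports System.Xml
Imports System.Xml.Linq
Imports System.Xml.Schema

Module ValidateAgainstInferredSchema

    Sub Main()
        ' Assumed path; adjust for your project layout.
        Dim path As String = "..\..\Customers.xml"
        Dim doc As XDocument = XDocument.Load(path)

        ' Infer the schema set directly from the file.
        Dim schemaSet As XmlSchemaSet
        Using reader As XmlReader = XmlReader.Create(path)
            schemaSet = New XmlSchemaInference().InferSchema(reader)
        End Using

        ' Validate is an extension method on XDocument; the handler is
        ' invoked once for each validation warning or error.
        doc.Validate(schemaSet, AddressOf OnValidationEvent)
        Console.WriteLine("Validation finished.")
        Console.ReadLine()
    End Sub

    ' Hypothetical handler name; prints the severity and message.
    Private Sub OnValidationEvent(ByVal sender As Object, _
                                  ByVal e As ValidationEventArgs)
        Console.WriteLine("{0}: {1}", e.Severity, e.Message)
    End Sub

End Module
```

Because the schema was inferred from this same document, the handler should not normally be called. The check becomes more useful when you validate other documents against the saved schema.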