Somehow, modeling and XML aren’t often found together in the same sentence. In my experience, I’ve seen XML vocabularies created using the “fly by the seat of your pants” methodology more than anything else. After all, because XML is the eXtensible Markup Language, it’s easy to create your own markup, right?
If we were talking about storing data in a relational database, on the other hand, you would model the entities, relationships, and attributes the way they exist in the real world to provide the most flexible data access. You would have the rigor of third normal form to guide you. There could be performance considerations in how you model the data as well.
As we begin to use XML to represent our portable, and even persistent, data, should we throw out all we’ve learned about how to model relational data? I don’t think so.
In this article, we’ll discuss some options for implementing one-to-many relationships in XML. We’ll consider three different techniques:
- Containment relationship
- Intra-document relationships
- Inter-document relationships
For each technique, we’ll produce the following artifacts to flesh out the idea:
- DTD(s) to represent document structure
- Sample XML stream(s)
- An XSL stylesheet to demonstrate data access
We’ll discuss when it would be appropriate to use each approach and then summarize what we’ve learned. Some additional resources about modeling XML are also listed at the end of the article.
Department and Employee Domain
To begin, let’s define a business domain to model. We’ll implement this model by using our three one-to-many XML modeling techniques.
In the relational database world, departments and employees are often used to illustrate concepts. Because most readers will already be familiar with this problem domain, I’ll use it as the example here. No need to reinvent the wheel.
The Entity-Relationship, or ER, diagram depicted here shows that we have two entities, Department and Employee. Departments are uniquely identified by department_id. Similarly, Employee uses emp_id as its unique identifier.
The line between Department and Employee indicates a relationship. The infinity symbol next to Employee indicates that there may be many Employees in a Department. In a one-to-many relationship, the key of the one side of the relationship, in this case Department, would become a foreign key on the many side of the relationship, in this case Employee.
Containment Relationship
In a containment relationship, a structure is defined where one element is contained within another. In the strongest form of this relationship, the “contained” element ceases to exist when the “container” element is removed.
Let’s take a look at a DTD we’ll use in the containment relationship implementation of our domain model:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Company (Department+)>
<!ELEMENT Department (Name, Employee+)>
<!ELEMENT Employee (Name)>
<!ELEMENT Name (#PCDATA)>
A Company contains one or more Department elements. A Department element contains a Name followed by one or more Employee elements. Each Employee element contains a Name.
A sample XML stream follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="Containment.xsl" type="text/xsl"?>
<Company>
  <Department>
    <Name>Enterprise Development</Name>
    <Employee>
      <Name>Jeff</Name>
    </Employee>
    <Employee>
      <Name>Mike</Name>
    </Employee>
  </Department>
  <Department>
    <Name>Foundation Services</Name>
    <Employee>
      <Name>Sam</Name>
    </Employee>
  </Department>
</Company>
An employee list page can be created from the stream with the stylesheet below:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="Company">
    <html>
      <head>
        <title>Employee List</title>
      </head>
      <body>
        <h1>Employee Listing</h1>
        <table>
          <tr>
            <td>Employee</td>
            <td>Department</td>
          </tr>
          <xsl:apply-templates select="//Employee"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="Employee">
    <tr>
      <td>
        <xsl:value-of select="Name"/>
      </td>
      <td>
        <xsl:value-of select="../Name"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
The Company template creates the shell of the HTML page, including the Employee and Department column headers of the table. The Employee template is invoked for each Employee node of the document. The employee name (Name) and the department name (../Name, the Name child of the parent Department) are selected into the appropriate table cells.
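For readers who want to see the same access pattern outside XSLT, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. It is not part of the article’s downloadable examples; the function name employee_rows and the inlined copy of the sample stream are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the containment sample stream (illustration only).
CONTAINMENT_XML = """\
<Company>
  <Department>
    <Name>Enterprise Development</Name>
    <Employee><Name>Jeff</Name></Employee>
    <Employee><Name>Mike</Name></Employee>
  </Department>
  <Department>
    <Name>Foundation Services</Name>
    <Employee><Name>Sam</Name></Employee>
  </Department>
</Company>
"""


def employee_rows(xml_text):
    """Return (employee name, department name) pairs in document order."""
    company = ET.fromstring(xml_text)
    rows = []
    # ElementTree elements carry no parent pointer, so this walks down from
    # each Department rather than up from each Employee as ../Name does.
    for department in company.findall("Department"):
        department_name = department.findtext("Name")
        for employee in department.findall("Employee"):
            rows.append((employee.findtext("Name"), department_name))
    return rows


if __name__ == "__main__":
    for employee_name, department_name in employee_rows(CONTAINMENT_XML):
        print(f"{employee_name}\t{department_name}")
```

The nesting of the loops mirrors the nesting of the document, which is what makes containment so convenient for simple listings: the relationship is the structure itself.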
Intra-Document Relationships
When we implement our model through an intra-document relationship, we get a little closer to how we might have implemented the domain model in a relational database. Rather than a department containing employees, an employee will have a “works for” relationship to a department. In fact, an employee could have many potential relationships to various departments.
We’ll take advantage of ID and IDREF data types in our DTD:
- The ID is used to uniquely identify a node in a document. You can think of the ID as being a “key” to a node.
- An IDREF is used to reference another node in a document. You can think of an IDREF as a “foreign key” to associate a node with another node.
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Company (Departments, Employees)>
<!ELEMENT Department (Name)>
<!ATTLIST Department id ID #REQUIRED>
<!ELEMENT Departments (Department+)>
<!ELEMENT Employees (Employee+)>
<!ELEMENT Employee (Name)>
<!ATTLIST Employee DepartmentRef IDREF #REQUIRED>
<!ELEMENT Name (#PCDATA)>
A Company will contain Departments and Employees elements. Each Departments element may contain many Department elements. Each Department has an id attribute of type ID which uniquely identifies the department.
Similarly, each Employees element may contain many Employee elements. Each Employee element has a DepartmentRef attribute of type IDREF. This is very similar to a foreign key in a relational database.
Whenever modeling a one-to-many relationship in a relational database, you put the key of the “one” as a foreign key on the side of the “many.” This is exactly what we did by putting the department id as a reference on the Employee element.
A sample XML stream will help to illustrate:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="IntraDocument.xsl" type="text/xsl"?>
<!DOCTYPE Company SYSTEM "IntraDocument.dtd">
<Company>
  <Departments>
    <Department id="d1">
      <Name>Enterprise Development</Name>
    </Department>
    <Department id="d2">
      <Name>Foundation Services</Name>
    </Department>
  </Departments>
  <Employees>
    <Employee DepartmentRef="d1">
      <Name>Jeff</Name>
    </Employee>
    <Employee DepartmentRef="d1">
      <Name>Mike</Name>
    </Employee>
    <Employee DepartmentRef="d2">
      <Name>Sam</Name>
    </Employee>
  </Employees>
</Company>
Rather than Jeff and Mike being contained in the Enterprise Development Department, they now just reference this department’s ID via the DepartmentRef attribute.
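A validating parser checks that every IDREF value matches some ID in the same document, much as a database enforces a foreign key constraint. The following Python sketch imitates that check by hand so the rule is visible. It assumes the attribute names id and DepartmentRef from the DTD above; the function name dangling_department_refs is hypothetical, and the sketch does not read the DTD at all.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the intra-document sample stream (illustration only).
INTRA_XML = """\
<Company>
  <Departments>
    <Department id="d1"><Name>Enterprise Development</Name></Department>
    <Department id="d2"><Name>Foundation Services</Name></Department>
  </Departments>
  <Employees>
    <Employee DepartmentRef="d1"><Name>Jeff</Name></Employee>
    <Employee DepartmentRef="d2"><Name>Sam</Name></Employee>
  </Employees>
</Company>
"""


def dangling_department_refs(xml_text):
    """Return (employee name, reference) pairs whose reference matches no Department id."""
    company = ET.fromstring(xml_text)
    # Collect the "primary keys" first.
    known_ids = {department.get("id") for department in company.iter("Department")}
    # Then report every "foreign key" that points nowhere.
    return [
        (employee.findtext("Name"), employee.get("DepartmentRef"))
        for employee in company.iter("Employee")
        if employee.get("DepartmentRef") not in known_ids
    ]


if __name__ == "__main__":
    print(dangling_department_refs(INTRA_XML))
```

An empty result means every employee points at a department that exists, which is the guarantee the ID/IDREF types give you when the document is validated against its DTD.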
The stylesheet from the containment example needs only a simple modification: the Employee template now looks up the department name through the reference instead of through the parent element.
<xsl:template match="Employee">
  <tr>
    <td>
      <xsl:value-of select="Name"/>
    </td>
    <td>
      <xsl:value-of select="id(@DepartmentRef)/Name"/>
    </td>
  </tr>
</xsl:template>
The id() function in XPath efficiently finds elements by their unique ID. We find the Department node associated with a given employee by supplying its key, as specified in the @DepartmentRef attribute.
Note that if we didn’t have a DTD for this XML document, the processor would not know which attributes are of type ID, and the id() function wouldn’t produce the desired results. The following XPath expression could have been used in its place, but it wouldn’t be nearly as efficient:
//Department[@id = current()/@DepartmentRef]
This expression searches through all Department nodes, looking for one with an id attribute equal to the DepartmentRef attribute of the current Employee node.
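The difference between the two lookups can be sketched in Python. This is an illustration of the general idea (a prebuilt index versus a search on every lookup), not a description of how any particular XSLT processor implements id(). The function names are hypothetical, and because ElementTree does not read ID types from a DTD, the index is built from the id attribute by name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the intra-document sample stream (illustration only).
INTRA_XML = """\
<Company>
  <Departments>
    <Department id="d1"><Name>Enterprise Development</Name></Department>
    <Department id="d2"><Name>Foundation Services</Name></Department>
  </Departments>
  <Employees>
    <Employee DepartmentRef="d1"><Name>Jeff</Name></Employee>
    <Employee DepartmentRef="d1"><Name>Mike</Name></Employee>
    <Employee DepartmentRef="d2"><Name>Sam</Name></Employee>
  </Employees>
</Company>
"""


def department_by_scan(company, department_id):
    """Search every Department on each call, like the //Department[...] predicate."""
    for department in company.iter("Department"):
        if department.get("id") == department_id:
            return department
    return None


def build_department_index(company):
    """Build the id -> Department map once, so each later lookup is a dictionary access."""
    return {department.get("id"): department for department in company.iter("Department")}


def employee_rows(xml_text):
    """Return (employee name, department name) pairs using the prebuilt index."""
    company = ET.fromstring(xml_text)
    index = build_department_index(company)
    rows = []
    for employee in company.iter("Employee"):
        department = index.get(employee.get("DepartmentRef"))
        department_name = department.findtext("Name") if department is not None else None
        rows.append((employee.findtext("Name"), department_name))
    return rows


if __name__ == "__main__":
    for row in employee_rows(INTRA_XML):
        print(row)
```

With two departments the difference is negligible, but the scan repeats its search for every employee, while the index is built once and then consulted.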
Inter-Document Relationships
In a relational database, the Department and Employee tables might live in different filesystems or tablespaces. In XML, the Department and Employee nodes may live in different documents. In fact, each employee or department may even live in its own document.
We’ll define two DTDs for this example: one for the Departments and one for the Employees.
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Departments (Department+)>
<!ELEMENT Department (Name)>
<!ATTLIST Department id ID #REQUIRED>
<!ELEMENT Name (#PCDATA)>
The Departments DTD is very simple. A Departments node may contain many Department nodes. Each Department has an id attribute and a Name element.
The following document is a sample Departments stream:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Departments SYSTEM "Departments.dtd">
<Departments>
  <Department id="d1">
    <Name>Enterprise Development</Name>
  </Department>
  <Department id="d2">
    <Name>Foundation Services</Name>
  </Department>
</Departments>
Let’s pay close attention to the Employee DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Employees (Employee+)>
<!ELEMENT Employee (Department, Name)>
<!ELEMENT Department EMPTY>
<!ATTLIST Department href CDATA #REQUIRED>
<!ELEMENT Name (#PCDATA)>
The Employees element may contain many Employee elements. Each Employee must contain a Department and a Name element. The Department element must be empty. It has an href attribute which will be used to specify the document and id of that department. The Department element is really a “proxy” element that points to the real department element.
Here is a sample Employees stream:
<?xml-stylesheet href="InterDocument.xsl" type="text/xsl"?>
<!DOCTYPE Employees SYSTEM "Employees.dtd">
<Employees>
  <Employee>
    <Department href="Departments.xml#d1"/>
    <Name>Jeff</Name>
  </Employee>
  <Employee>
    <Department href="Departments.xml#d1"/>
    <Name>Mike</Name>
  </Employee>
  <Employee>
    <Department href="Departments.xml#d2"/>
    <Name>Sam</Name>
  </Employee>
</Employees>
The href attribute should look quite familiar to HTML developers. This is used to link to different documents and different locations within documents. In XML, it can serve the same function.
The href value shown here is a URI reference whose fragment identifier (the part after the #) is a simple XPointer that names an element by its ID. XPointer is a language for identifying fragments of documents referenced in links or included in other documents. Rather than using an XPointer processor to resolve this expression, we’ll use standard XPath features available in XSLT.
<xsl:template match="Employee">
  <xsl:variable name="document" select="document(substring-before(Department/@href,'#'))"/>
  <xsl:variable name="id" select="substring-after(Department/@href,'#')"/>
  <tr>
    <td><xsl:value-of select="Name"/></td>
    <!-- for-each changes context node for id function -->
    <xsl:for-each select="$document">
      <td><xsl:value-of select="id($id)/Name"/></td>
    </xsl:for-each>
  </tr>
</xsl:template>
In the Employee template, a document and an id variable are created from the Department href attribute. Remember that this Department is the “proxy” to the real department. The document() function is used to reference the Departments document. The substring-before() and substring-after() functions come in handy to parse the href into the document and id portions of the string.
The <xsl:for-each> statement is used in a peculiar way. It’s not being used to iterate through a node set, but to change the context node to the root node of the departments document. Then the id() function is used to reference the actual Department node.
Note that because we only have a single department document, a more efficient implementation would be to pass the department document name as a parameter to the stylesheet and have a global document variable. The implementation here allows for departments to exist in multiple documents.
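The same resolution steps (split the href, load the target document, look up the ID) can be sketched in Python. To keep the sketch runnable without files on disk, the two documents are held as strings in a dictionary named DOCUMENTS; that dictionary and the function names split_href and employee_rows are assumptions for illustration. Each target document is parsed and indexed only once, which keeps the multi-document flexibility while avoiding a reload for every employee.

```python
import xml.etree.ElementTree as ET

# In-memory stand-ins for Departments.xml and Employees.xml (illustration only).
DOCUMENTS = {
    "Departments.xml": """\
<Departments>
  <Department id="d1"><Name>Enterprise Development</Name></Department>
  <Department id="d2"><Name>Foundation Services</Name></Department>
</Departments>
""",
    "Employees.xml": """\
<Employees>
  <Employee><Department href="Departments.xml#d1"/><Name>Jeff</Name></Employee>
  <Employee><Department href="Departments.xml#d1"/><Name>Mike</Name></Employee>
  <Employee><Department href="Departments.xml#d2"/><Name>Sam</Name></Employee>
</Employees>
""",
}


def split_href(href):
    """Split 'Departments.xml#d1' into ('Departments.xml', 'd1')."""
    document_name, _, fragment = href.partition("#")
    return document_name, fragment


def employee_rows(documents, employees_name="Employees.xml"):
    """Return (employee name, department name) pairs, following each proxy's href."""
    id_indexes = {}  # document name -> {id: element}, filled on first use

    def resolve(href):
        document_name, fragment = split_href(href)
        if document_name not in id_indexes:
            root = ET.fromstring(documents[document_name])
            id_indexes[document_name] = {
                element.get("id"): element
                for element in root.iter()
                if element.get("id") is not None
            }
        return id_indexes[document_name].get(fragment)

    rows = []
    employees = ET.fromstring(documents[employees_name])
    for employee in employees.findall("Employee"):
        proxy = employee.find("Department")  # the empty "proxy" element
        department = resolve(proxy.get("href"))
        department_name = department.findtext("Name") if department is not None else None
        rows.append((employee.findtext("Name"), department_name))
    return rows


if __name__ == "__main__":
    for row in employee_rows(DOCUMENTS):
        print(row)
```

Because the index is keyed by document name, employees could point at departments spread across several documents without any change to the lookup code.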
Also, note that the main template changed slightly from the prior examples. Please see the Code Examples link at the bottom of the article for the entire stylesheet.
When to Use Each Technique
Nine times out of ten, I’ve seen developers use the Containment relationship approach to implement one-to-many relationships. I believe that this is more out of ignorance of other approaches than anything else.
When passing small documents between components, the Containment relationship approach is very expedient. However, it may not model reality (that is, true containment relationships) and does not provide the most flexible data access.
The Intra-document relationship approach provides much more flexible data access. You can model many relationships between elements, not just containment relationships. The use of ID/IDREF attribute types makes Intra-document relationship access very efficient.
The Inter-document relationship approach can come in handy in certain situations where it’s beneficial to have several smaller documents. However, in the ease of use category, it’s a little bit awkward at this point, although this should improve with XPointer support. When passing data between components, it can get a little tricky.
The following table summarizes the characteristics of the three approaches:
|Technique|Passing Data|Flexibility|Ease of Use|
I would like to see the Intra-document approach used more frequently and to become the “default” choice of developers for relating elements. Legacy and true containment relationships would use the Containment approach. Inter-document relationships are going to become more and more prevalent as we move to persisting documents in XML databases.
We shouldn’t throw away all that we’ve learned in modeling relational databases and object models when designing XML vocabularies. With both technologies, we’re modeling a problem domain that can be represented in a format that is technology neutral. We just happen to implement the model using documents, elements, and attributes as opposed to tables, rows, and columns.
We looked at the Containment, Intra-document, and Inter-document approaches for modeling one-to-many relationships in XML by using the familiar Department and Employee problem domain. We discussed when each approach might be appropriate.
So, now you have a few new techniques in your toolbox for modeling XML. The next time you design an XML vocabulary, don’t use the “fly by the seat of your pants” method. Model it first, and then consider various implementations. The rest is up to you!
To download the example XML streams, DTDs, and stylesheets, click here.
I’m hoping to find more books and articles on this topic. David Carlson’s book, Modeling XML Applications With UML, is the best resource I’ve found to date. As XML becomes more and more pervasive, well-designed XML vocabularies and modeled data will become more important. The following link will take you to the associated Web site for this book: http://www.xmlmodeling.com/.
About the Author
Jeff Ryan is an architect for Hartford Financial Services. He has eighteen years of experience designing and developing automated solutions to business problems. His current focus is on Java, XML, and Web Services technology. He may be reached at firstname.lastname@example.org.