XPath Rules!

Introduction

XPath is a language used in XSLT and DOM programming for addressing elements and asserting expressions in XML documents. The ability to assert expressions is the lesser-known and less-used capability of the language, yet it makes XPath usable as a non-proprietary rule language. We’ll exploit this capability and demonstrate how DOM and XPath can be used to build a simple declarative rule engine.

This article assumes some familiarity with DOM and XPath. The conceptual model of the rule engine depicted here could be implemented using various DOM and XPath implementations. We’ll build a Java implementation. Because JAXP (Java API for XML Processing) doesn’t include any native XPath APIs at this time, we’ll look to open source to fill the gap. To run the code presented here, you’ll need to have Apache Xalan[i] on your classpath.

The following outline is a roadmap for how we’ll cover the topic:

  • First, we’ll begin with a conceptual model of our rule engine
  • Then, we’ll review the schemas for the various XML documents we’ll be using
  • Next, we’ll dive into the code for our rule processor and review it step by step
  • Then, we’ll discuss pros and cons of this type of approach
  • Finally, we’ll wrap up by summarizing what we’ve learned

Conceptual Model

Using the capabilities of DOM and XPath, our goal is to build a declarative rule engine as illustrated in the following diagram:

The model document represents an instance of the domain data that rules will be run against. Because we desire our rule processor to be generic, this document may be of any schema.

A rules document contains the rules to be asserted versus a domain model document. The rules defined must adhere precisely to the rules schema we define. The rule processor will have a dependency upon this schema.

The rule processor will assert the rules in the rules document versus the model document and output an errors document showing any rules that were violated. The errors document is also generic and can report the errors for any schema.

Development and maintenance of rules is accomplished by simply adding rules to the rules document via boolean XPath expressions. This means that the rule engine is declarative. New rules can be entered without programming.
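To make the idea of a boolean XPath expression acting as a rule concrete, here is a minimal sketch. It assumes a JDK that ships the standard javax.xml.xpath package, which is not the Xalan XPathAPI used later in this article. The class name, the method name, and the one-record sample document are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative sketch only: names are hypothetical, not part of the article's code.
class BooleanAssertionSketch {

  // Returns true when the boolean XPath assertion holds for the given XML text.
  static boolean holds(String xml, String assertion) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    try {
      // XPathConstants.BOOLEAN asks the engine to convert the result to a Boolean.
      return (Boolean) xpath.evaluate(
          assertion, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("Could not evaluate: " + assertion, e);
    }
  }

  public static void main(String[] args) {
    String model = "<Customer><Name>xxx</Name></Customer>";
    // The rule is plain data: changing it requires no recompilation.
    String rule = "string-length(normalize-space(/Customer/Name)) > 3";
    System.out.println(holds(model, rule) ? "rule satisfied" : "rule violated");
  }
}
```

Because the rule is just a string, it can live in a document that is edited independently of the program, which is what makes the approach declarative.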

XML Schemas

Let’s review the schemas in more detail for our three XML documents and look at some sample XML streams.

Model Document

The model document may be any schema. For our purposes, we’ll be using a fairly simple schema containing a collection of customer records. Each Customer element contains Name, City, and State elements.

A sample stream follows:

<Customers>
  <Customer id="C1">
    <Name>XYZ Plumbing</Name>
    <City>New Haven</City>
    <State>CT</State>
  </Customer>
  <Customer id="C2">
    <Name>Joes Bar and Grill</Name>
    <City>Waterbury</City>
    <State>CT</State>
  </Customer>
  <Customer id="C3">
    <Name>ABC Pizza</Name>
    <City>Hartford</City>
    <State>CT</State>
  </Customer>
  <Customer id="C4">
    <Name>A really long customer name goes here</Name>
    <City>Southington</City>
    <State>CT</State>
  </Customer>
  <Customer id="C5">
    <Name>xxx</Name>
    <City>xxx</City>
    <State>CT</State>
  </Customer>
</Customers>

Notice that each Customer has an id attribute uniquely identifying the customer.

Rules Document

The rules schema is also fairly simple and contains a collection of rules. Each rule has Context, ValidationAssertion, and ErrorMessage elements.

The Context node contains an XPath expression representing a node list in the model document. The ValidationAssertion contains an XPath expression that will be asserted versus the nodes in the Context node list. The ErrorMessage represents the message to be returned when a ValidationAssertion evaluates false.

In the sample document below, the first rule checks that a customer name is no more than 30 characters long, the second that a customer name is longer than 3 characters, and the third that the City is longer than 3 characters. Each assertion is an XPath expression that evaluates to a boolean result. Assertions that evaluate to false are considered errors or violations.

<Rules>
  <Rule>
    <Context>/Customers/Customer</Context>
    <ValidationAssertion>string-length(normalize-space(Name))
     &lt;= 30</ValidationAssertion>
    <ErrorMessage>Customer name must be less than 30
     characters</ErrorMessage>
  </Rule>
  <Rule>
    <Context>/Customers/Customer</Context>
    <ValidationAssertion>string-length(normalize-space(Name))
     &gt; 3</ValidationAssertion>
    <ErrorMessage>Customer name must be greater than 3
     characters</ErrorMessage>
  </Rule>
  <Rule>
    <Context>/Customers/Customer</Context>
    <ValidationAssertion>string-length(normalize-space(City))
     &gt; 3</ValidationAssertion>
    <ErrorMessage>City must be greater than 3
     characters</ErrorMessage>
  </Rule>
</Rules>
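A rules document of this shape can be read into simple Java objects before any assertions are run. The following is a sketch under stated assumptions: it uses the standard javax.xml.xpath package rather than Xalan's XPathAPI, and the RuleLoaderSketch and Rule names are hypothetical. It also collapses the line breaks inside ErrorMessage with normalize-space(), since the sample messages above wrap across lines.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only: class and field names are hypothetical.
class RuleLoaderSketch {

  // Plain value holder mirroring one <Rule> element.
  static final class Rule {
    final String context;
    final String assertion;
    final String message;

    Rule(String context, String assertion, String message) {
      this.context = context;
      this.assertion = assertion;
      this.message = message;
    }
  }

  // Parses the rules XML text and returns one Rule per /Rules/Rule element.
  static List<Rule> load(String rulesXml) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    List<Rule> rules = new ArrayList<>();
    try {
      NodeList nodes = (NodeList) xpath.evaluate(
          "/Rules/Rule", new InputSource(new StringReader(rulesXml)),
          XPathConstants.NODESET);
      for (int i = 0; i < nodes.getLength(); i++) {
        Node rule = nodes.item(i);
        rules.add(new Rule(
            xpath.evaluate("normalize-space(Context)", rule),
            // Only trim the assertion: collapsing inner whitespace could
            // alter string literals inside the expression.
            xpath.evaluate("ValidationAssertion", rule).trim(),
            xpath.evaluate("normalize-space(ErrorMessage)", rule)));
      }
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("Could not read rules document", e);
    }
    return rules;
  }
}
```

Loading the rules up front keeps the XML layout of the rules document in one place, so the evaluation loop only deals with three strings per rule.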

The ACORD standards body for the insurance industry defines custom rules for its schema in a similar way in the Service Provider Extensions (SPX) specification[ii].

Errors Schema

The Errors Schema represents a collection of rule violations. Each Error contains a Context element representing the model document node that generated the error, and an associated Message. As you can see from the sample stream, the Context is an XPath expression indicating the specific node in the document whose rule was violated.

<Errors>
  <Error>
    <Context>/Customers/Customer[@id='C4']</Context>
    <Message>Customer name must be less than 30
     characters</Message>
  </Error>
  <Error>
    <Context>/Customers/Customer[@id='C5']</Context>
    <Message>Customer name must be greater than 3
     characters</Message>
  </Error>
  <Error>
    <Context>/Customers/Customer[@id='C5']</Context>
    <Message>City must be greater than 3 characters</Message>
  </Error>
</Errors>

XPath Rule Processor

From our discussion of the schemas and sample documents, you should have a good idea of what the inputs and output of the rule processor will be. Let’s step through the code and see how it performs its task of asserting the rules in the rules document versus the model document and outputting the errors found.

Where possible, we use the standard Java API for XML Processing (JAXP 2.0). However, these interfaces lack DOM XPath capability, so we’ll use the XPath support in Apache Xalan to fill the gap.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.apache.xpath.XPathAPI;
import org.apache.xpath.objects.XBoolean;

We begin by opening our rules document, “Rules.xml”, and our model document, “Customers.xml”, using the standard JAXP APIs. We get the document element of each document for subsequent processing.

public static void main(String[] args) throws Exception
{
  DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();
  DocumentBuilder builder = factory.newDocumentBuilder();
  Document rules = builder.parse("Rules.xml");
  Node rulesRoot = rules.getDocumentElement();
  Document model = builder.parse("Customers.xml");
  Node modelRoot = model.getDocumentElement();

Next, we get a NodeList of all the rules we are going to assert versus the model document. In our sample rules document, this will return three nodes. Notice how using XPath with DOM makes for much more understandable and less verbose code than DOM programming by itself. A StringBuffer is also prepared to hold the errors document.

NodeList ruleNodes = XPathAPI.selectNodeList(rulesRoot,
                                             "/Rules/Rule");

StringBuffer errors = new StringBuffer("<Errors>");

Now the real fun begins. The outer loop iterates through the rules to be processed. It gets the context, assertion, and message elements for a given rule. Then, it uses the context to get a NodeList from the model document of all the elements the rule needs to be run against. When run against our example model document, this will return the five Customer nodes.

for (int i = 0; i < ruleNodes.getLength(); i++)
{
  Node rule = ruleNodes.item(i);
  String context =
    XPathAPI.selectSingleNode(rule, "Context/text()")
      .getNodeValue();
  String assertion = 
    XPathAPI.selectSingleNode(rule, "ValidationAssertion/text()")
      .getNodeValue();
  String message =
    XPathAPI.selectSingleNode(rule, "ErrorMessage/text()")
      .getNodeValue();

  NodeList modelNodes = XPathAPI.selectNodeList(modelRoot, context);

The inner loop iterates through the model document nodes. Using the eval() method in XPathAPI, the rule assertion is made versus a given model node. If the assertion evaluates to false, we add an Error element to the Errors document that includes the error message and context.

  for (int j = 0; j < modelNodes.getLength(); j++)
  {
    Node modelNode = modelNodes.item(j);
    // Evaluate the rule's assertion with modelNode as the XPath context node
    XBoolean result = (XBoolean)XPathAPI.eval(modelNode, assertion);
    if (!result.bool())
    {
      // Rule violated: record which node failed and the rule's message
      String path = getAbsolutePath((Element)modelNode);
      errors.append("<Error>"
            + "<Context>" + path + "</Context>"
            + "<Message>" + message + "</Message>"
            + "</Error>");
    }
  }
}  // end of the outer loop over rules
errors.append("</Errors>");
System.out.println(errors.toString());
}  // end of main
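Concatenating strings works for our sample data, but a message or path containing characters such as < or & would yield a malformed errors document. One possible alternative, sketched below, builds the errors document with DOM and serializes it with the standard JAXP transformer so that escaping is handled by the serializer. The ErrorsDocumentSketch class and its method names are hypothetical and not part of the processor above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch only: names are hypothetical.
class ErrorsDocumentSketch {
  private final Document doc = newDocument();
  private final Element root = doc.createElement("Errors");

  ErrorsDocumentSketch() {
    doc.appendChild(root);
  }

  private static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException("No DOM implementation available", e);
    }
  }

  // Appends one <Error> with <Context> and <Message> children.
  void add(String contextPath, String message) {
    Element error = doc.createElement("Error");
    Element context = doc.createElement("Context");
    context.setTextContent(contextPath);
    Element msg = doc.createElement("Message");
    msg.setTextContent(message);
    error.appendChild(context);
    error.appendChild(msg);
    root.appendChild(error);
  }

  // Serializes the document; special characters in text are escaped here.
  String toXml() {
    try {
      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (TransformerException e) {
      throw new IllegalStateException("Could not serialize errors document", e);
    }
  }
}
```

In the processor, errors.append(...) would become a call to add(path, message), and the final println would print toXml().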

The getAbsolutePath() method returns an XPath expression identifying the context node that violated the rule. It recursively ascends the DOM tree to build the expression. For any element that carries an id attribute, an id predicate is added to the corresponding step of the expression.

public static String getAbsolutePath(Element e)
{
  String path = "/" + e.getTagName();
  if (e.hasAttribute("id"))
  {
    path += "[@id='" + e.getAttribute("id") + "']";
  }
  Node parent = e.getParentNode();
  if (parent.getNodeType() == Node.ELEMENT_NODE)
  {
    return getAbsolutePath((Element)parent) + path;
  }
  else
  {
    return path;
  }
}
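When an element has no id attribute, the path built this way can match several sibling nodes. One possible refinement, shown here as a sketch rather than as part of the original processor, falls back to a positional predicate such as Customer[2]. The class name and the parse() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only: names are hypothetical.
class PositionalPathSketch {

  // Builds an absolute path, using [@id='...'] when present, else [position].
  static String absolutePath(Element e) {
    String step = "/" + e.getTagName();
    if (e.hasAttribute("id")) {
      step += "[@id='" + e.getAttribute("id") + "']";
    } else {
      step += "[" + positionAmongSameNamedSiblings(e) + "]";
    }
    Node parent = e.getParentNode();
    if (parent != null && parent.getNodeType() == Node.ELEMENT_NODE) {
      return absolutePath((Element) parent) + step;
    }
    return step;
  }

  // XPath positions are 1-based and count only siblings with the same name.
  static int positionAmongSameNamedSiblings(Element e) {
    int position = 1;
    for (Node n = e.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
      if (n.getNodeType() == Node.ELEMENT_NODE
          && n.getNodeName().equals(e.getNodeName())) {
        position++;
      }
    }
    return position;
  }

  // Convenience helper for trying the sketch on XML text.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception ex) {
      throw new IllegalArgumentException("Could not parse XML", ex);
    }
  }
}
```

Note that this variation also adds [1] to the root step, so its output differs slightly from the paths shown in the errors document in this article.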

When we run the XPathRuleProcessor using the Rules.xml and Customers.xml documents we presented earlier, the following results are produced:

<Errors>
  <Error>
    <Context>/Customers/Customer[@id='C4']</Context>
    <Message>Customer name must be less than 30
     characters</Message>
  </Error>
  <Error>
    <Context>/Customers/Customer[@id='C5']</Context>
    <Message>Customer name must be greater than 3
     characters</Message>
  </Error>
  <Error>
    <Context>/Customers/Customer[@id='C5']</Context>
    <Message>City must be greater than 3 characters</Message>
  </Error>
</Errors>

With a few lines of code, we wrote a truly generic rule engine. Rule documents could be authored without programming to validate any domain model schema. Our rule processor would need a bit of work to be production quality. However, it illustrates how a simple rule assertion algorithm can be implemented using XPath as the rule language.

Pros and Cons of This Approach

The high-level conceptual model presented here could be used to implement a lightweight rules framework. Most Java shops already have the tools in house to build such a framework. Because a declarative approach is used, new rules can be added without programming, reducing maintenance costs.

You wouldn’t expect a declarative XPath rules engine to match the performance of an industrial-strength rule engine whose rules are represented in compiled code. However, performance could be quite acceptable for many applications.
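One common way to reduce per-node overhead is to compile each assertion once and reuse it for every context node. The sketch below assumes the standard javax.xml.xpath API rather than Xalan's XPathAPI, and the class and method names are hypothetical. Whether this makes a measurable difference depends on the XPath implementation and the size of the documents, so treat it as something to measure rather than a guaranteed gain.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only: names are hypothetical.
class CompiledRuleSketch {

  // Counts the context nodes for which the assertion evaluates to false.
  static int countViolations(String modelXml, String context, String assertion) {
    try {
      Document model = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(modelXml)));
      XPath xpath = XPathFactory.newInstance().newXPath();

      // Parse the assertion text once, outside the loop.
      XPathExpression compiled = xpath.compile(assertion);

      NodeList nodes =
          (NodeList) xpath.evaluate(context, model, XPathConstants.NODESET);
      int violations = 0;
      for (int i = 0; i < nodes.getLength(); i++) {
        // Reuse the compiled expression for each context node.
        boolean ok =
            (Boolean) compiled.evaluate(nodes.item(i), XPathConstants.BOOLEAN);
        if (!ok) {
          violations++;
        }
      }
      return violations;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not evaluate rule", e);
    }
  }
}
```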

You are probably wondering why you would use an XPath rule engine as described in this article when XML Schemas could be used to validate a domain model document. There are several reasons to consider it. There may be semantic or syntactic checks that cannot be implemented in XML Schema. In addition, the violations reported by a validating parser may not be granular enough or friendly enough to meet user requirements.
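As an example of a check that is awkward for a schema, consider a conditional rule that relates two elements: a customer in CT must have a non-empty City. Conditional constraints of this kind are generally not expressible in XML Schema 1.0, but they are a one-line XPath assertion. The rule itself is hypothetical and is not part of the sample rules document; the sketch uses the standard javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only: the rule and all names are hypothetical.
class CrossFieldRuleSketch {

  // "If State is CT, then City must be non-empty", written as (not A) or B.
  static final String ASSERTION =
      "not(State = 'CT') or string-length(normalize-space(City)) > 0";

  // Evaluates the assertion with the <Customer> element as the context node.
  static boolean holds(String customerXml) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    try {
      Node customer = (Node) xpath.evaluate(
          "/Customer", new InputSource(new StringReader(customerXml)),
          XPathConstants.NODE);
      return (Boolean) xpath.evaluate(ASSERTION, customer, XPathConstants.BOOLEAN);
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("Could not evaluate rule", e);
    }
  }
}
```

In the rules document, the same expression would simply become the text of a ValidationAssertion element, with /Customers/Customer as its Context.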

Summary

XPath is a language for addressing elements in XML, but it can also be used to assert expressions against XML documents. We used the latter capability along with some basic DOM programming to build a simple rule engine. The conceptual model and code presented here could be the genesis of a validation framework or other lightweight rules framework for your next project. The rest is up to you!

Code Examples

To download the XML documents, schemas, and Java code, click here.

About the Author

Jeff Ryan is an architect for Hartford Financial Services. He has twenty years experience designing, developing and delivering automated solutions to business problems. His current focus is on Java, XML and Service Oriented Architecture. He may be reached at jryan@thehartford.com.

End Notes

[i] Apache Xalan Home Page

[ii] ACORD XML Specification
