XPath and XQuery are two of the innovations that have developed around XML. These XML-related technologies have matured, giving developers the flexibility to use them in different combinations. XPath is a simple query language for selecting data from XML documents, and XQuery, an extension of XPath, handles more complex data-selection requirements such as formatting and sorting.
JXPath is an open source Java API (under the Apache Commons component list) for evaluating XPath expressions over XML and Java object models. By default, JXPath gives results in the DOM/JDOM model format, which is parser dependent. However, JXPath’s design is also extensible, allowing you to customize the API to generate results for XML processing in custom model formats. This feature of JXPath is not documented and not explored by many people.
This article explores JXPath and demonstrates how to extend it to get results in a custom object model, for applications that you do not want coupled to the DOM/JDOM data model. The article also explains the rationale behind this customization and points out a few advantages of having your own data model.
JXPath for XML Processing
The JXPath library is built to apply XPath expressions on Plain Old Java Object (POJO) models and also on XML data. So, without processing Java objects in the Java layer, you can use XPath expressions and get the desired results.
Let’s look at a simple XML file (see Figure 1), which will serve as a reference throughout the article, and walk through using JXPath with JDOM for processing XML data.
The PackageContainer class actually specifies that the above XML file be processed by the JXPath API. This class also introduces a new JXPath concept called Container.
Container provides an indirection to the data. So, if a property of the context node has a Container as its value, then the XPath of that property will produce the content of the container and not the container itself. The following interface defines this responsibility:
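As a rough illustration of that indirection, here is a minimal sketch that uses only the JDK. ValueContainer, LazyContainer, and ContainerDemo are hypothetical stand-ins declared locally so the code compiles without the JXPath jar; they mirror the idea of a holder with a getter and a setter for its content, and the library's Javadoc remains the reference for the real declaration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a JXPath-style container: the holder is not the
// data, it only knows how to produce the data on demand.
interface ValueContainer {
    Object getValue();
    void setValue(Object value);
}

// Loads its content the first time it is asked for, then caches it.
final class LazyContainer implements ValueContainer {
    private final Supplier<Object> loader;
    private Object value;
    private boolean loaded;
    int loadCount; // exposed only so the demo can show the caching

    LazyContainer(Supplier<Object> loader) {
        this.loader = loader;
    }

    @Override
    public Object getValue() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loadCount++;
        }
        return value;
    }

    @Override
    public void setValue(Object value) {
        this.value = value;
        this.loaded = true;
    }
}

final class ContainerDemo {
    // An evaluator that honors containers hands back the content,
    // not the container object itself.
    static Object resolve(Object propertyValue) {
        return (propertyValue instanceof ValueContainer)
                ? ((ValueContainer) propertyValue).getValue()
                : propertyValue;
    }
}
```

Calling ContainerDemo.resolve on a property that holds a LazyContainer yields the loaded content, while a plain value passes through unchanged.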
A specific subtype called XMLDocumentContainer used to deal with XML documents, but it is now deprecated. You are supposed to use DocumentContainer instead (see Figure 2).
Note: This class’s getPackages() method specifies CodeElementModel.xml as the data file for the JXPath API. In this example, DOM is the model for the JXPath API.
Figure 3 shows the JUnit test method that executes the XPath expression over XML data.
Note: In this test code, the XPath expression ‘packages/Root//Method’ is interpreted as follows:
- ‘packages’ is a property of the PackageContainer class.
- ‘Root’ is the first tag of CodeElementModel.xml.
- ‘Method’ is a descendant of the ‘Root’ tag of CodeElementModel.xml, at any level below it.
context.getValue("packages/Root//Method") will pick the value of the Method element whose name attribute is 'payByVisaCreditCard' ("Does the Payment for the Visa Credit Cards"), and context.iterate("packages/Root//Method") will pick up all the methods.
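To make the descendant step (//) concrete without the JXPath jar, the sketch below evaluates the XML part of the expression with the JDK's own javax.xml.xpath API. This is not the JXPath API: the leading packages step is a JXPath property lookup and has no counterpart here. The sample XML is hypothetical; only the Root and Method tag names and the payByVisaCreditCard entry are taken from the article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DescendantAxisDemo {
    // Hypothetical stand-in for CodeElementModel.xml.
    static final String SAMPLE_XML =
            "<Root><Package name='billing'><Class name='Payment'>"
          + "<Method name='payByVisaCreditCard'>Does the Payment for the Visa Credit Cards</Method>"
          + "<Method name='payByCash'>Pays in cash</Method>"
          + "</Class></Package></Root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Sample XML could not be parsed", e);
        }
    }

    // Single-value style: the string value of the first matching node.
    static String firstMethodText(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/Root//Method", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Iteration style: every Method element below Root, at any depth.
    static List<String> methodNames(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/Root//Method", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

Here firstMethodText plays the role of a single-value lookup and methodNames the role of an iteration over all matches, even though the Method elements sit two levels below Root.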
To this point, you have seen how to use the standard JXPath API to process a simple XPath expression over XML input. You have also seen what the role of the Container interface is. In this section, you got the results in the DOM model. In the next section, you will see how to extend the library to get results in a custom model.
The Case for Custom Models
Suppose you need to have a generic domain model for heterogeneous data structures, and you sometimes use service data objects (SDO) or your own custom model. Instead of writing a model translator that converts the JDOM/DOM model to the custom model, you can modify the JXPath API for a more efficient solution.
Modifying JXPath is more efficient in applications where XPath is the dominant XML-processing approach and every XML-processing pass would otherwise require a model translation as well. Customizing JXPath will take some effort, but it’s a one-time effort.
How to Extend/Modify JXPath
Figure 4 shows the custom data model that is used in the code download, which would be generated by JXPath and the SAX parser.
In this model:
- Attribute represents the attribute of the XML node.
- Property represents the node of an XML data graph. This represents a leaf node and not a parent node.
- DataEntity represents a parent node of an XML data graph.
- DynamicProperty represents dynamic properties of nodes, such as line number, column number, or any user-defined data.
Note: There are interfaces for Attribute, Property and DynamicProperty, which are not shown in the class diagram to keep it simple.
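The four roles listed above could be laid out roughly as follows. This is only a sketch: the class names and their roles come from the list, while every field, constructor, and the countProperties helper are assumptions made for illustration, and the interfaces mentioned in the note are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field layout; only the class names and roles come from the article.

// An attribute of an XML node.
final class Attribute {
    final String name;
    final String value;

    Attribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Extra data attached to a node, such as a line number or user-defined data.
final class DynamicProperty {
    final String key;
    final Object value;

    DynamicProperty(String key, Object value) {
        this.key = key;
        this.value = value;
    }
}

// A leaf node of the XML data graph.
final class Property {
    final String name;
    final String value;
    final List<Attribute> attributes = new ArrayList<>();
    final List<DynamicProperty> dynamicProperties = new ArrayList<>();

    Property(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// A parent node of the XML data graph.
final class DataEntity {
    final String name;
    final List<Attribute> attributes = new ArrayList<>();
    final List<Property> properties = new ArrayList<>();
    final List<DataEntity> children = new ArrayList<>();
    final List<DynamicProperty> dynamicProperties = new ArrayList<>();

    DataEntity(String name) {
        this.name = name;
    }

    // Counts the leaf properties in this subtree.
    int countProperties() {
        int total = properties.size();
        for (DataEntity child : children) {
            total += child.countProperties();
        }
        return total;
    }
}
```

Keeping leaf and parent nodes as separate types means a traversal can stop at a Property without checking for children.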
JXPath Extension Points
This section is about the extension points of JXPath. JXPath is designed such that these extension points are used to generate results in a custom object model for XML data input, instead of in a DOM or JDOM data model.
JXPath supports multiple models through the notion of different abstractions. Basic JXPath abstractions that you need to understand are Pointer, XMLParser, and JXPathContext.
- Pointer: A JXPath pointer plays the same role as a pointer in a language like C, where it holds the address of a variable rather than the data itself. Similarly, in JXPath, when you need to know where a node sits in the object graph rather than retrieve the node itself, you can use a pointer.
Figure 5 illustrates how you can use pointers.
- XMLParser: The XML parser abstraction is provided by the abstract class org.apache.commons.jxpath.xml.XMLParser2 (see Figure 6). JXPath provides implementations of this abstraction for different types of models.
- JXPathContext: This is the central abstraction of the JXPath API; it provides access to the traversal of JavaBean and XML graphs using XPath syntax. JXPath also provides a factory, JXPathContextFactory, for creating JXPathContext objects. The factory controls how contexts are created, so it can choose among multiple JXPathContext implementations and return the appropriate one.
JXPathContext.newContext() is the static method that JXPathContext provides for obtaining a context object.
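The pointer idea from the list above can be sketched with plain JDK collections. MapPointer is a hypothetical class, not a JXPath type: it remembers a slash-separated path into a graph of nested maps and resolves the value only when asked, so it reflects later changes to the graph and can write through to it.

```java
import java.util.Map;

// Hypothetical pointer over nested maps: it stores where a value lives,
// not the value itself.
final class MapPointer {
    private final Map<String, Object> root;
    private final String[] steps;

    MapPointer(Map<String, Object> root, String path) {
        this.root = root;
        this.steps = path.split("/");
    }

    // The location this pointer refers to.
    String asPath() {
        return String.join("/", steps);
    }

    // Walks every step except the last and returns the map that owns the target.
    @SuppressWarnings("unchecked")
    private Map<String, Object> owner() {
        Map<String, Object> current = root;
        for (int i = 0; i < steps.length - 1; i++) {
            Object next = current.get(steps[i]);
            if (!(next instanceof Map)) {
                throw new IllegalStateException("No such node: " + steps[i]);
            }
            current = (Map<String, Object>) next;
        }
        return current;
    }

    // Resolved on every call, so it always reflects the current graph.
    Object getValue() {
        return owner().get(steps[steps.length - 1]);
    }

    // Writes through to the underlying graph.
    void setValue(Object value) {
        owner().put(steps[steps.length - 1], value);
    }
}
```

Because the pointer holds a location rather than a copy, a value changed in the graph after the pointer was created is still visible through getValue.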
The next section explains the relation between the basic JXPath components.
The view in Figure 7 will help illustrate the relationship between the main components of the JXPath API.
Here are the components and their respective roles:
- JXPath Engine is the core of the JXPath API. It uses NodePointers and Parsers to evaluate the XPath expression over the XML document/Java model.
In this example, JXPath Engine is modified so that it uses a new Parser/NodePointer implementation in addition to the existing ones, which are still required to get results in the DOM/JDOM data model.
- The JXPath-XML Parser component allows you to create a new implementation of XML Parser. This example uses an implementation of a SAX-based XML parser, which parses the XML and populates the result in the generic Data Model objects. This component will be used by Engine to parse the XML and get the parsed results.
- The JXPath-NodePointer component is called by JXPath Engine after it parses the XML document using the JXPath-XML Parser component. This component includes NodePointer, NodeIterator and NodePointerFactory abstractions.
The example implements the NodePointer, NodeIterator and NodePointerFactory abstractions of the JXPath API; these implementations are required for a new data model.
You now have a fair understanding of the basic abstractions of the JXPath API, its component-level relationship, and so on. Now it’s time to see the details of all the classes/interfaces that need to be extended and how they are supposed to be used.
You need custom implementations of the following types related to the Pointer abstraction (in JXPath, NodePointer is an abstract class, while NodeIterator and NodePointerFactory are interfaces):
- NodePointer – JXPath Type: org.apache.commons.jxpath.ri.model.NodePointer
- NodeIterator – JXPath Type: org.apache.commons.jxpath.ri.model.NodeIterator
- NodePointerFactory – JXPath Type: org.apache.commons.jxpath.ri.model.NodePointerFactory
Node Pointer represents the location of a node in an object graph. In XML, it could represent a node, attribute or namespace. Here are the new Pointer implementation classes (XML tag, namespace and attribute) for this example:
Figure 8 shows the Pointer class diagram.
Node Iterator is an iterator for any kind of node (attribute, namespace or tag). Here are the new Iterator implementations (XML tag, namespace and attribute) for this example:
Figure 9 shows the Iterator class diagram.
The Node Pointer Factory is based on the Factory design pattern. The Factory pattern is a creational pattern in which the Factory method hides the complexity of creating the resultant object. NodePointerFactory specifies factory methods to create the NodePointer object by allowing you to pass Object as a formal parameter. This object is one of the model objects:
NodePointer createNodePointer(NodePointer parent, QName name, Object object);
Figure 10 shows an implementation of the above method. IDataElement and Property are the new data model classes, and the new Factory class is org.apache.commons.jxpath.ri.model.ia.MyModelNodePointerFactory.
Figure 11 shows the Pointer Factory class diagram.
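The factory idea can be sketched independently of JXPath as follows. All type names here (SketchPointer, SketchPointerFactory, EntityNode, and so on) are hypothetical stand-ins rather than the library's types, and the sketch assumes two conventions: a factory returns null for objects that do not belong to its model, and factories are consulted in ascending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for the pointer and factory abstractions.
interface SketchPointer {
    String describe();
}

interface SketchPointerFactory {
    int getOrder(); // assumption: lower order is consulted first

    // Assumption: returning null means "this object is not from my model".
    SketchPointer createPointer(String name, Object object);
}

// Marker for a node of the custom model.
final class EntityNode {
    final String tag;

    EntityNode(String tag) {
        this.tag = tag;
    }
}

// Recognizes only nodes of the custom model.
final class EntityPointerFactory implements SketchPointerFactory {
    @Override
    public int getOrder() {
        return 10;
    }

    @Override
    public SketchPointer createPointer(String name, Object object) {
        if (object instanceof EntityNode) {
            EntityNode entity = (EntityNode) object;
            return () -> "entity:" + entity.tag;
        }
        return null;
    }
}

// Accepts anything; stands in for the factories that ship by default.
final class FallbackPointerFactory implements SketchPointerFactory {
    @Override
    public int getOrder() {
        return 1000;
    }

    @Override
    public SketchPointer createPointer(String name, Object object) {
        return () -> "generic:" + name;
    }
}

final class PointerFactoryRegistry {
    private final List<SketchPointerFactory> factories = new ArrayList<>();

    void add(SketchPointerFactory factory) {
        factories.add(factory);
        factories.sort(Comparator.comparingInt(SketchPointerFactory::getOrder));
    }

    // The first factory that recognizes the object wins.
    SketchPointer pointerFor(String name, Object object) {
        for (SketchPointerFactory factory : factories) {
            SketchPointer pointer = factory.createPointer(name, object);
            if (pointer != null) {
                return pointer;
            }
        }
        throw new IllegalArgumentException("No factory accepts " + object);
    }
}
```

The caller never names a concrete pointer class; it hands over a model object and receives whichever pointer the matching factory builds.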
This example needs a custom implementation of the abstract XML parser, which is represented by org.apache.commons.jxpath.xml.XMLParser2. This component parses XML and gives a relevant model to JXPath Engine for further evaluation. The example implementation, com.jxpath.setl.jw.sample.genericxmlparser.XMLSAXParser, uses a SAX-based parser and extends the XMLParser2 abstract class. This parser parses any XML and populates an in-memory generic model graph, on which JXPath applies the XPath expression and returns the result in the generic model.
Figure 12 shows the XML parser class diagram.
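A minimal sketch of the SAX side of such a parser, using only the JDK, might look like this. GenericSaxSketch and its compact Node type are hypothetical and stand in for the article's richer model; in the real extension this logic would sit inside the XMLParser2 subclass rather than in static methods.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class GenericSaxSketch {
    // Compact hypothetical node type used in place of the full custom model.
    static final class Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    // Streams the XML once and builds the node graph as SAX events arrive.
    static Node parse(InputStream in) {
        final Deque<Node> open = new ArrayDeque<>(); // currently open elements
        final Node[] root = new Node[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                Node node = new Node(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    node.attributes.put(atts.getQName(i), atts.getValue(i));
                }
                if (open.isEmpty()) {
                    root[0] = node;
                } else {
                    open.peek().children.add(node);
                }
                open.push(node);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!open.isEmpty()) {
                    open.peek().text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return root[0];
    }

    static Node parse(String xml) {
        return parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The stack of open elements is what lets a streaming parser rebuild the parent/child structure without ever holding a DOM tree.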
You register a custom node pointer factory with JXPath by adding it to JXPathContextReferenceImpl, which lets you pass any custom NodePointerFactory to the JXPath API. However, because the registration method belongs to JXPath's implementation class rather than to JXPathContext, client code becomes coupled to that implementation, which might not be preferable in some cases. The alternative is to change JXPathContextReferenceImpl itself so that it registers your NodePointerFactory by default.
To register the factory from client code, call the static method public static void addNodePointerFactory(NodePointerFactory factory) of the org.apache.commons.jxpath.ri.JXPathContextReferenceImpl class. Figure 13 shows a code snippet illustrating how to use this option to configure the custom node pointer factory.
In Figure 13, the custom node pointer factory is added to JXPathContextReferenceImpl before JXPathContext.newContext(cont) is called. Creating a new JXPathContextReferenceImpl object initializes the other node pointer factories, so your custom one must already be registered at that point. The sequence is important.
Also notice that the class called MyModelPackageContainer is passed to create a new JXPathContext. This class is also a little different from the older PackageContainer (see Figure 2) as it has to register a new XML parser and a new model type.
Using JXPath After Extension for a Custom Model
The code snippets in Figures 14 and 15 show how to create an XML container with a new parser using a new key. Figure 14 shows the getPackage() method from the model class MyModelPackageContainer, and Figure 15 shows the setXMLParser method; together, these two methods register a custom parser and model.
In Figure 14, the XML Parser (com.jxpath.setl.jw.sample.genericxmlparser.XMLSAXParser) object is registered to the JXPath framework by calling
In Figure 15, DocumentContainer is created with the model as
Note: You need to use the same key (SAMPLE_PARSER) when registering a parser and model and when creating DocumentContainer.
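Why the key has to match can be shown with a small JDK-only sketch. KeyedContainerSketch is a hypothetical class and does not reproduce the DocumentContainer API; it only models the idea that parsers are registered under a key and that a container looks its parser up by that same key when its content is first requested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a container that finds its parser by key.
final class KeyedContainerSketch {
    // Registry shared by all containers: model key -> parser.
    private static final Map<String, Function<String, Object>> PARSERS = new HashMap<>();

    static void registerParser(String key, Function<String, Object> parser) {
        PARSERS.put(key, parser);
    }

    private final String source;
    private final String key;
    private Object cached;

    KeyedContainerSketch(String source, String key) {
        this.source = source;
        this.key = key;
    }

    // Parses on first access using whichever parser was registered under the key.
    Object getValue() {
        if (cached == null) {
            Function<String, Object> parser = PARSERS.get(key);
            if (parser == null) {
                throw new IllegalStateException("No parser registered for model key: " + key);
            }
            cached = parser.apply(source);
        }
        return cached;
    }
}
```

A container created with a key that was never registered fails at first access, which is the kind of mismatch the note above warns about.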
This concludes the JXPath API explanation. The final section briefly discusses the data model used throughout the article.
What Have You Learned?
You now know the details involved in extending the JXPath API to derive a custom model while processing XPath language expressions over XML data. This way, you can extend the XPath engine, or even other XQuery engines that support extensions, to produce custom data models in different types of enterprise applications, as XML is used mostly as a means of data transfer. These applications could range from data integration to dynamic configuration.
Also, there are many areas where you can improve on the example. You could, for example, use a more efficient algorithm for evaluating XPath expressions. You could also build a single query layer for RDBMS and XML, an abstraction that hides the type of repository from which the data is fetched, so that the same query could be used seamlessly against XML or an RDBMS.