Comparing XML documents
Within the Unix environment, one of my favorite utilities is the diff command, which shows me the differences between two files. I often use it to compare text files containing data or program source files to see exactly what changed between two versions. Naturally, I used diff to compare XML files as well, since these days it seems all data files are in XML. It soon became evident that I was missing half of the story. The diff tool has its roots in text comparison (line by line). It is not aware of the structure of an XML file and cannot take advantage of the inherent hierarchy to make a comparison that is meaningful in the XML context. I started searching for an XML-aware diff tool and found one at http://www.deltaxml.com.
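To make the limitation concrete, here is a minimal Python sketch (not part of DeltaXML) that compares two serializations of the same record, first as text and then as parsed trees. The id attribute and the canonical helper are hypothetical, introduced only for illustration.

```python
import difflib
import xml.etree.ElementTree as ET

# Two serializations of the same data: attribute order and layout differ.
# The "id" attribute is hypothetical and only here to show attribute reordering.
doc_a = '<student grade="10" id="s2"><name>Jane Doe</name></student>'
doc_b = '<student id="s2" grade="10">\n  <name>Jane Doe</name>\n</student>'

def canonical(elem):
    """Reduce an element to a comparable tuple, ignoring attribute order
    and leading/trailing whitespace in text content."""
    text = (elem.text or "").strip()
    children = tuple(canonical(child) for child in elem)
    return (elem.tag, tuple(sorted(elem.attrib.items())), text, children)

# Text comparison: any difference in layout shows up as a change.
text_diff = list(
    difflib.unified_diff(doc_a.splitlines(), doc_b.splitlines(), lineterm="")
)

# Structural comparison: parse first, then compare the trees.
same_structure = canonical(ET.fromstring(doc_a)) == canonical(ET.fromstring(doc_b))

print("text diff reports changes:", bool(text_diff))
print("structurally equal:", same_structure)
```

The text comparison flags the two documents as different even though an XML parser sees the same element, attributes, and content in both.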
DeltaXML comes as two main products, both Java-based. The first is DeltaXML-Markup, which can compare and combine well-formed XML files (without a DTD). The other, DeltaXML-DTD, does the same for valid XML files (with a DTD). Because it understands more about the structure of the data, it produces smaller change files. For example, it can ignore changes to element order that are not significant, and it can match elements in the two files based on keys in the data.
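The idea of matching elements by key rather than by position can be sketched in a few lines of Python. This is not DeltaXML's algorithm, only an illustration under stated assumptions: the students root element, the choice of name as the key, and Bill Jones's grade value are all hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: root element name and sample values are made up for illustration.
old_doc = """<students>
  <student grade="10"><name>Jane Doe</name><score>98</score></student>
  <student grade="10"><name>Bill Jones</name><score>91</score></student>
</students>"""

# Same records in a different order, with Jane Doe's grade changed to 11.
new_doc = """<students>
  <student grade="10"><name>Bill Jones</name><score>91</score></student>
  <student grade="11"><name>Jane Doe</name><score>98</score></student>
</students>"""

def index_by_name(root):
    """Map each student's name (the hypothetical key) to its element."""
    return {s.findtext("name").strip(): s for s in root.findall("student")}

def keyed_changes(old_root, new_root):
    """Pair students by key and report additions, deletions, and modifications."""
    old, new = index_by_name(old_root), index_by_name(new_root)
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            changes.append((key, "deleted"))
        elif key not in old:
            changes.append((key, "added"))
        elif (old[key].attrib != new[key].attrib
              or old[key].findtext("score") != new[key].findtext("score")):
            changes.append((key, "modified"))
    return changes

changes = keyed_changes(ET.fromstring(old_doc), ET.fromstring(new_doc))
print(changes)
```

Because the students are paired by name, the reordering produces no reported change; only the record whose grade attribute differs is flagged.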
The software has a simple command-line interface. After unzipping the download, add the dxml.jar file to your classpath. You can then run the utility by typing:
java -jar dxml.jar status
which should produce the following:
Delta XML Tools for DeltaXML-Markup(version 1_7)
© 2000, 2001 Monsell EDM Ltd. All rights reserved.
Using built-in license key
| Function | Mode | Expiration |
| Compare |DEMO | PERMANENT |
| Combine |DEMO | PERMANENT |
java -jar dxml.jar compare [-v] [-q] file1.xml file2.xml delta.xml
java -jar dxml.jar combine [-v] [-q] file.xml delta.xml result.xml
java -jar dxml.jar combine-forward [-v] [-q] file1.xml delta.xml file2.xml
java -jar dxml.jar combine-reverse [-v] [-q] file2.xml delta.xml file1.xml
java -jar dxml.jar relicense license-key
java -jar dxml.jar status
With the evaluation license, you are limited to small files. I used the following simple XML file (stored in a file called a.xml):
<name> John Doe </name>
<score> 88 </score>
<student grade = "10">
<name> Jane Doe </name>
<score> 98 </score>
<name> Bill Jones </name>
<score> 91 </score>
To create b.xml, I changed the grade of the second student from 10 to 11 and renamed the last student's score element to scored. I then ran the comparator as follows:
java -jar dxml.jar compare -v a.xml b.xml delta.xml
The resulting file defines its own namespace (deltaxml) and then marks each element either as unchanged or with the nature of the change. For example, the student whose grade was changed to 11 produced the following:
<student deltaxml:delta="WFmodify" deltaxml:old-attributes="grade=&quot;10&quot;" deltaxml:new-attributes="grade=&quot;11&quot;">
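Since the delta file is itself XML, annotations like the one above can be read with any XML parser. The Python sketch below is only an illustration: the namespace URI is a placeholder (the real one is declared in delta.xml), the escaping of the inner quotes as &quot; is an assumption, and local_name and delta_annotations are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder URI; the real namespace is declared in delta.xml.
PLACEHOLDER_NS = "urn:example:deltaxml-placeholder"

# ASSUMPTION: inner quotes are escaped as &quot; in the actual delta file.
sample = (
    '<student xmlns:deltaxml="' + PLACEHOLDER_NS + '" '
    'deltaxml:delta="WFmodify" '
    'deltaxml:old-attributes="grade=&quot;10&quot;" '
    'deltaxml:new-attributes="grade=&quot;11&quot;"/>'
)

def local_name(qualified):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return qualified.rsplit("}", 1)[-1]

def delta_annotations(root):
    """Collect namespaced attributes of every element carrying a 'delta' marker."""
    found = []
    for elem in root.iter():
        notes = {
            local_name(name): value
            for name, value in elem.attrib.items()
            if name.startswith("{")  # only namespace-qualified attributes
        }
        if "delta" in notes:
            found.append((elem.tag, notes))
    return found

annotations = delta_annotations(ET.fromstring(sample))
for tag, notes in annotations:
    print(tag, notes)
```

Matching on the local attribute name keeps the sketch independent of the exact namespace URI, which would otherwise have to be copied from the delta file.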
The product ships with an XSL stylesheet that formats the result file into HTML tables for a concise and clear picture of the changes. The HTML table uses color coding to show what elements/attributes have changed and what the old/new values are. I used the Apache Xalan processor as follows:
C:\DeltaXML-Markup-1_7>java org.apache.xalan.xslt.Process -xsl deltaxml-tables.xsl -in delta.xml -out visualdelta.html
You can find additional documentation and technical details about DeltaXML on the web site. The version that understands DTDs should be helpful when dealing with various metadata repositories. I can also see the tool being useful for comparing and combining XSLT stylesheets.
- Piroz Mohseni
Piroz Mohseni is a freelance writer for Developer.com