VTD-XML: A New Vision of XML

  • By Victor Volkman


VTD-XML is a suite of innovative XML processing technologies centered around a non-extractive XML parsing technique called Virtual Token Descriptor (VTD). VTD-XML provides interfaces for C, C#, and Java. VTD-XML solves a number of problems inherent in the existing DOM and SAX models in a way that makes it ideal for Service-Oriented Architecture (SOA) applications.

Depending on the perspective, VTD-XML can be viewed as one of the following:

  • A "document-centric" XML parser
  • A native XML indexer or a file format that uses binary data to enhance the text XML
  • An incremental XML content modifier
  • An XML slicer/splitter/assembler
  • An XML editor/eraser
  • A way to port XML processing onto a chip

VTD-XML is highly memory efficient; benchmarks show that random access typically requires only 1.3 to 1.5 times the size of the XML document (in bytes). In benchmarks, VTD-XML also parses 1.5 to 2.0 times faster than SAX parsers. A report comparing the performance of the C, C#, and Java interfaces is available. In this article, you'll look at the theory behind VTD-XML and some sample apps that show off its best features.

How It Works

A typical DOM parser allocates a separate block of memory for each token in the XML input tree. This is costly in both memory (because of fragmentation) and time (because of the sheer number of allocation requests). VTD-XML instead stores a verbatim, unparsed copy of the XML in memory and then adds an "index" in front of it to allow for simple navigation and access. Because reading an XML file is by definition a read-only process, you don't need the flexibility of per-token, variable-size allocation at this point in the parsing. Last, keep in mind that VTD-XML is technically a processing model rather than an API, and you can build your own API on top of a VTD-XML model. Through the remainder of this article, I'll demonstrate the XimpleWare implementation available from SourceForge.
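To make the "non-extractive" idea concrete, here is a minimal sketch of an index entry that describes a token by its position in the original document instead of copying it. The record layout (32 bits of offset, 32 bits of length) and every name in it (TokenRecord, make_token, token_to_string, token_equals) are assumptions made up for illustration; they are not VTD-XML's actual record format or API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout, for illustration only: the upper 32 bits
   hold the token's byte offset in the document, the lower 32 bits hold
   its length. No token text is ever stored in the record itself. */
typedef uint64_t TokenRecord;

TokenRecord make_token(uint32_t offset, uint32_t length) {
    return ((uint64_t)offset << 32) | (uint64_t)length;
}

uint32_t token_offset(TokenRecord t) { return (uint32_t)(t >> 32); }
uint32_t token_length(TokenRecord t) { return (uint32_t)(t & 0xFFFFFFFFu); }

/* Copy a token out of the document only when the caller asks for it.
   Returns a NUL-terminated string that the caller must free(), or NULL
   if the allocation fails. */
char *token_to_string(const char *doc, size_t doc_len, TokenRecord t) {
    uint32_t off = token_offset(t);
    uint32_t len = token_length(t);
    char *s;
    assert((size_t)off + len <= doc_len);   /* record must lie inside doc */
    s = malloc((size_t)len + 1);
    if (s == NULL) return NULL;
    memcpy(s, doc + off, len);
    s[len] = '\0';
    return s;
}

/* Compare a token against a C string in place, without copying. */
int token_equals(const char *doc, TokenRecord t, const char *str) {
    size_t n = strlen(str);
    return n == token_length(t) &&
           memcmp(doc + token_offset(t), str, n) == 0;
}
```

For the document `<a><b>hello</b></a>`, make_token(6, 5) describes the text node, and token_to_string() allocates a copy of `hello` only when you actually need one. The index is a flat array of fixed-size integers, so building it takes one allocation instead of one per token.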

Making libvtd-xml.lib for Visual Studio 2005

VTD-XML 2.2.1 for C (as released on 10/27/2007) includes only makefiles designed for the GNU C compiler (GCC). To build for Visual Studio 2005, I just improvised by compiling all the code in the directory and shoving it into a library, like this:

del *.lib
lib /out:libvtd-xml.lib arrayList.obj autoPilot.obj ^
   binaryExpr.obj bookMark.obj contextBuffer.obj ^
   decoder.obj elementFragmentNs.obj fastIntBuffer.obj ^
   fastLongBuffer.obj filterExpr.obj funcExpr.obj ^
   helper.obj indexHandler.obj intHash.obj l8.tab.obj ^
   lex.yy.obj literalExpr.obj locationPathExpr.obj ^
   nodeRecorder.obj numberExpr.obj pathExpr.obj ^
   RSSReader.obj textIter.obj unaryExpr.obj ^
   unionExpr.obj vtdGen.obj vtdNav.obj XMLChar.obj

Hello, VTD-XML!

In the longstanding Computer Science tradition of showing the simplest possible example first, you'll start off by looking at the shortest C program you can reasonably write to parse an XML file. Its task: Print the text of one element in an XML file to stdout.

 1 #include "everything.h"
 2 struct exception_context the_exception_context[1];   /* required by cexcept */
 3 int main(){
 4    exception e;
 5    VTDGen *vg = NULL;
 6    VTDNav *vn = NULL;
 7    UCSChar *string = NULL;
 8    Try{
 9       vg = createVTDGen();
10       if (parseFile(vg,TRUE,"input.xml")){   /* TRUE = namespace-aware */
11          vn = getNav(vg);
12          if (toElementNS(vn,FIRST_CHILD,L"someURL",L"b")){
13             int i = getText(vn);   /* VTD index of the text node, or -1 */
14             if (i!=-1){
15                string = toString(vn,i);
16                wprintf(L"the text node value is %d ==> %ls \n",
                           i,string);
17                free(string);
18             }
19          }
20          free(vn->XMLDoc);
21       } else {
22          free(vg->XMLDoc);
23       }
24    }Catch(e){    // handle various types of exceptions here
25    }
26    freeVTDGen(vg);
27    freeVTDNav(vn);
28    return 0;
29 }

The first step is to create a VTD generator instance via createVTDGen(), as you do in Line 9. The VTDGen object parses the DTD but doesn't resolve declared entities. Next, you use the VTDGen object to parse the XML input file. Your input file for this test is appropriately simple:

<ns1:a xmlns:ns1="someURL">
   <ns1:b> hello world! </ns1:b>
</ns1:a>

Listing 1: Input.xml for the hello program

To parse the file, you call parseFile() with the VTDGen object, a flag indicating whether the parser should be namespace-aware, and the input filename. For Internet feeds, such as RSS files, you can use parseHttpUrl() in a similar manner, except that you pass in the "http://.." URI.

Next, you initialize a VTDNav navigation cursor object with getNav(); now, you can use the toElementNS() function to begin traversal. After the VTDNav pointer itself, the first parameter in toElementNS() signifies the direction of travel, which can be one of the enumerated values ROOT, PARENT, FIRST_CHILD, LAST_CHILD, NEXT_SIBLING, or PREV_SIBLING. The second parameter is the namespace URI, which must match the URI bound to the element's prefix; in input.xml, the ns1 prefix is bound to "someURL". The third and final parameter is the local name of the element of interest, which in the example is "b" (that is, <ns1:b>).
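If the cursor-plus-direction style of navigation is new to you, the following toy sketch shows the general pattern on a tiny pointer-linked tree. Everything here (Direction, Node, Cursor, cursor_move, count_children) is hypothetical and exists only for illustration; VTD-XML navigates over its own index records rather than a tree like this one. The sketch also assumes that a failed move leaves the cursor where it was.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types, not part of VTD-XML. */
typedef enum {
    DIR_PARENT, DIR_FIRST_CHILD, DIR_NEXT_SIBLING, DIR_PREV_SIBLING
} Direction;

typedef struct Node {
    const char *name;
    struct Node *parent, *first_child, *next_sibling, *prev_sibling;
} Node;

typedef struct { Node *current; } Cursor;

/* Try to move the cursor one step in the given direction.
   Returns 1 and moves on success; returns 0 and leaves the cursor
   unchanged if there is no node in that direction. */
int cursor_move(Cursor *c, Direction d) {
    Node *target = NULL;
    assert(c != NULL && c->current != NULL);
    switch (d) {
    case DIR_PARENT:       target = c->current->parent;       break;
    case DIR_FIRST_CHILD:  target = c->current->first_child;  break;
    case DIR_NEXT_SIBLING: target = c->current->next_sibling; break;
    case DIR_PREV_SIBLING: target = c->current->prev_sibling; break;
    }
    if (target == NULL) return 0;
    c->current = target;
    return 1;
}

/* Count the children of the current node using only cursor moves,
   then put the cursor back on the node where it started. */
int count_children(Cursor *c) {
    int n = 0;
    if (!cursor_move(c, DIR_FIRST_CHILD)) return 0;
    do { n++; } while (cursor_move(c, DIR_NEXT_SIBLING));
    cursor_move(c, DIR_PARENT);
    return n;
}
```

Because every move reports success or failure, the "first child, then next sibling until it fails" loop in count_children() needs no separate end-of-list check.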

Assuming the navigation worked, you then can call getText() to get the VTD index of the text node and then toString() to pull out the actual node data. In accordance with your input.xml, above, the output when run from the DOS prompt is:

C:\ximpleware\demo> hello_world.exe
the text node value is 5 ==>  hello world!

The whole code block is surrounded by a Try/Catch macro, an approximation of C++ style exception handling, courtesy of Adam M. Costello's cexcept.
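If you're curious how Try/Catch can exist in C at all, the usual building blocks are setjmp() and longjmp() from the standard library. The following is a minimal sketch of that general technique, not cexcept's actual macros; the names TryContext, throw_error, try_call, and worker are assumptions invented for this example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names for illustration; cexcept's real macros differ. */
typedef struct {
    jmp_buf env;         /* where to resume when something is thrown */
    volatile int code;   /* the "exception" value; volatile so that it
                            stays well defined across longjmp() */
} TryContext;

static TryContext *current_try = NULL;   /* innermost active try block */

/* Jump back to the innermost active try block, carrying an error code.
   The code must be nonzero because 0 means "nothing was thrown". */
void throw_error(int code) {
    assert(code != 0);
    if (current_try != NULL) {
        current_try->code = code;
        longjmp(current_try->env, 1);
    }
    abort();   /* nothing is there to catch the error */
}

/* Run fn(arg) inside a "try" block. Returns 0 if fn returned normally,
   or the thrown code if fn called throw_error(). */
int try_call(void (*fn)(int), int arg) {
    TryContext ctx;
    TryContext *outer = current_try;   /* remember any enclosing block */
    int result = 0;
    ctx.code = 0;
    current_try = &ctx;
    if (setjmp(ctx.env) == 0) {
        fn(arg);                       /* the "Try" body */
    } else {
        result = ctx.code;             /* the "Catch" body */
    }
    current_try = outer;               /* restore the enclosing block */
    return result;
}

/* Example worker: "throws" 42 when given a negative argument. */
void worker(int x) {
    if (x < 0) throw_error(42);
}
```

Here try_call(worker, 1) returns 0, whereas try_call(worker, -1) returns 42. Note that longjmp() skips everything between the throw and the catch, so any memory allocated in between has to be released by hand, which is why the hello program keeps its pointers outside the Try block and frees them afterward.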


This article was originally published on December 3, 2007
