
A Flexible, Compile Time, Configurable XML Parser

  • October 1, 2004
  • By Radu Braniste

Who Needs Another Parser?

XML parsing is well-trodden ground: parsers now exist for nearly every language and platform. Many of them, however, are poorly suited to simple, repetitive tasks, where their complexity takes its toll on performance, and cautious users tend to avoid monolithic, "catch-all" approaches when they are not required.

For a particular project, I needed a simple and highly configurable library, capable of parsing subsets of XML without paying the cost of extra features. And, as usually happens, I couldn't find an exact match; that's why another parser steps into the limelight.

It's All About State

XML parsing can be reduced to a small collection of states:

STATE       REPRESENTATION   OBSERVATIONS
START       <
END         >
SPECIAL     !                Used for special behavior (comments, CDATA sections, and so forth)
END_FOUND   /                Used in conjunction with START and END
EXIT_LOOP                    Artificial

A simple finite state machine can be built around these states and encapsulated in a class like this:

#include <map>

namespace XMLParserState
{
   enum STATE {EXIT_LOOP, END_FOUND, END, START, SPECIAL, LAST_STATE};
}
class NativeXMLParser
{
private:
   // pointer to a member function that handles one state and returns the next
   typedef XMLParserState::STATE (NativeXMLParser::*FUNC_TYPE)();
   typedef std::map<int, FUNC_TYPE> States;
   States states_;   // maps each state to its handler
   //other private data
public:
   void parseSource()
   {
      XMLParserState::STATE state = XMLParserState::START;
      FUNC_TYPE f = 0;
      while (state != XMLParserState::EXIT_LOOP)
      {
         f = states_[state];
         // a state without a registered handler falls back to START
         state = f ? (this->*f)() : XMLParserState::START;
      }
   }
   XMLParserState::STATE findStart();
   XMLParserState::STATE findEnd();
   XMLParserState::STATE endFound();
   XMLParserState::STATE specialFound();
private:
   //other private members
};
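
To make the dispatch loop concrete, here is a minimal, self-contained sketch of the same idea. It is not the article's implementation: the class DispatchDemo, its trace_ member, and the fixed order in which its handlers hand over to each other are assumptions made purely for illustration. The handlers only record that they ran, so the walk through the table can be observed.

```cpp
#include <map>
#include <string>

namespace Sketch
{
   enum STATE {EXIT_LOOP, END_FOUND, END, START, SPECIAL, LAST_STATE};

   // Hypothetical stand-in for NativeXMLParser: each handler appends one
   // letter to trace_ and returns a hard-coded next state.
   class DispatchDemo
   {
   private:
      typedef STATE (DispatchDemo::*FUNC_TYPE)();
      typedef std::map<int, FUNC_TYPE> States;
      States states_;
      std::string trace_;

      STATE findStart()    { trace_ += "S"; return END; }
      STATE findEnd()      { trace_ += "E"; return END_FOUND; }
      STATE endFound()     { trace_ += "F"; return SPECIAL; }
      STATE specialFound() { trace_ += "X"; return EXIT_LOOP; }

   public:
      DispatchDemo()
      {
         // Register one handler per state (assumed mapping).
         states_[START]     = &DispatchDemo::findStart;
         states_[END]       = &DispatchDemo::findEnd;
         states_[END_FOUND] = &DispatchDemo::endFound;
         states_[SPECIAL]   = &DispatchDemo::specialFound;
      }

      // Runs the loop until a handler returns EXIT_LOOP, then returns
      // the letters recorded by the handlers that were called.
      std::string run()
      {
         STATE state = START;
         while (state != EXIT_LOOP)
         {
            States::const_iterator it = states_.find(state);
            state = (it != states_.end()) ? (this->*(it->second))() : START;
         }
         return trace_;
      }
   };
}
```

Calling Sketch::DispatchDemo().run() visits the four handlers once each. The sketch uses find() rather than operator[] so that looking up an unregistered state does not insert an empty entry into the map; that is a design choice of this sketch, not something the article prescribes.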

The state functions can be expressed like:

findStart
  1. if START is followed by '/', then endFound
  2. if START is followed by '!', then specialFound
  3. else findEnd
findEnd
  1. if END is preceded by '/', then findStart
  2. else specialFound
endFound
  1. process
  2. goto findStart
specialFound
  1. process
  2. goto findStart
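
The transition rules listed above can be written down as small pure functions. This is only a sketch under assumptions: the function names are hypothetical, and the pairing of enum values with state functions (START with findStart, END with findEnd, END_FOUND with endFound, SPECIAL with specialFound) is inferred from the names rather than stated in the text. The "process" steps are deliberately left out.

```cpp
namespace TransitionSketch
{
   enum STATE {EXIT_LOOP, END_FOUND, END, START, SPECIAL, LAST_STATE};

   // findStart: choose the next state from the character that follows '<'.
   inline STATE nextAfterStart(char following)
   {
      if (following == '/') return END_FOUND;   // closing tag
      if (following == '!') return SPECIAL;     // comment, CDATA, and so forth
      return END;                               // ordinary tag: look for '>'
   }

   // findEnd: choose the next state from the character that precedes '>'.
   inline STATE nextAfterEnd(char preceding)
   {
      return preceding == '/' ? START : SPECIAL;
   }

   // endFound and specialFound: after processing, both return to findStart.
   inline STATE nextAfterProcessing()
   {
      return START;
   }
}
```

For a self-closing tag such as <else/>, nextAfterStart('e') yields END and nextAfterEnd('/') yields START, so the machine goes straight back to looking for the next tag.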

This looks pretty straightforward and some readers might be inclined to yawn or express their boredom in different ways at this point. Please be assured that the interesting stuff follows!

SAX or DOM? Static or Dynamic Polymorphism?

As per design, NativeXMLParser is intended to be small, simple, and flexible. This means that:

  1. It has no ambition in supporting the whole XML standard (even if it is easy to extend it to add additional features)
  2. It is highly configurable; for example, supporting CDATA but not Comments, and so on. This point deserves some additional explanation. Suppose we have to parse an XML-based grammar centered on attributes, like this:
    <if i="1">
       <method name="some1"/>
    <else/>
       <method name="some"/>
    </endif>
    

    In this case, there is definitely no interest in more than the tags themselves. But what about:

    <if i="1">
       <method> someOne </method>
    <else/>
       <method> some </method>
    </endif>
    
    This time, the content has to be taken into account, and the same applies to CDATA sections and comments.
  3. It loosely supports both the event-based (SAX-style) and the document-model (DOM-style) approaches, without being conformant to either.

Let's talk about implementing these features for a moment. The usual way of varying properties and behavior is runtime polymorphism, and many well-documented patterns and idioms can be used in such an endeavor [1]. Our perspective is a little different: NativeXMLParser may be used for highly repetitive tasks (see the XML grammar example mentioned earlier), where the virtual call mechanism does not perform well [2]. As a result, NativeXMLParser is built using template-based techniques and static polymorphism.
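
As a rough illustration of what static polymorphism buys here, the sketch below selects comment handling at compile time through a template parameter instead of a virtual call. It is not NativeXMLParser's actual design: ConfigurableScanner, IgnoreComments, and CollectComments are hypothetical names, and the scanner only looks for comment delimiters to keep the example short.

```cpp
#include <string>

// Hypothetical policy: comments are recognized but discarded.
struct IgnoreComments
{
   void onComment(const std::string&, std::string&) const {}
};

// Hypothetical policy: comment bodies are appended to the output.
struct CollectComments
{
   void onComment(const std::string& text, std::string& out) const
   {
      out += text;
   }
};

// The policy is a template parameter, so the call to onComment is
// resolved at compile time and no virtual function is involved.
template <class CommentPolicy>
class ConfigurableScanner : private CommentPolicy
{
public:
   // Walks src, hands each comment body to the policy, and returns
   // whatever the policy chose to accumulate.
   std::string scan(const std::string& src) const
   {
      std::string out;
      std::string::size_type pos = 0;
      while ((pos = src.find("<!--", pos)) != std::string::npos)
      {
         const std::string::size_type body = pos + 4;
         const std::string::size_type end = src.find("-->", body);
         if (end == std::string::npos) break;   // unterminated comment
         this->onComment(src.substr(body, end - body), out);
         pos = end + 3;
      }
      return out;
   }
};
```

A ConfigurableScanner<CollectComments> gathers the comment text of its input, while a ConfigurableScanner<IgnoreComments> scans the same input and returns an empty string. Because the policy type is fixed when the template is instantiated, a compiler is free to inline or drop the empty handler entirely.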





Page 1 of 2


