http://www.developer.com/xml/article.php/3416091/A-Flexible-Compile-Time-Configurable-XML-Parser.htm
XML parsing is very standard business; there are now parsers written in every possible language, for every possible platform. Very often, these parsers are less suited for simple, repetitive tasks, when complexity usually takes its toll on performance—and cautious users tend to avoid monolithic, "catch-all" approaches when not required. For a particular project, I needed a simple and highly configurable library, capable of parsing subsets of XML without paying the cost of extra features. And, as usually happens, I couldn't find an exact match; that's why another parser steps into the limelight. XML parsing can be reduced to a small collection of states: A simple finite state machine can be designed to take advantage of this observation, being practically encapsulated by a class like: The state functions can be expressed like: This looks pretty straightforward and some readers might be inclined to yawn or express their boredom in different ways at this point. Please be assured that the interesting stuff follows! As per design, NativeXMLParser is intended to be small, simple, and flexible. This means that: In this case, there is definitely no interest in more than the tags themselves. But what about: Let's talk about implementing these features for one moment. The usual way of varying properties and behavior is runtime polymorphism. Many well-documented patterns and idioms can be used in such an endeavor [1]. Our perspective is a little bit different: NativeXMLParser may be used for highly repetitive tasks (see the XML grammar example mentioned earlier), where the virtual mechanism doesn't respond well in terms of performance [2]. As a result, NativeXMLParser is built using template-based techniques and static polymorphism. NativeXMLParser takes two template parameters: MODEL_TYPE is the model (SAX, DOM, or something else) and the implementation simply takes advantage of the template Strategy pattern to vary the nature of the parser: XMLDOM and XMLEvtHandler are just specializations of an interface with the following signature: SPECIAL_CONFIGURATION_TYPE is by far the most interesting thing. As mentioned before, the functionality of this parser can be easily adjusted and SPECIAL_CONFIGURATION_TYPE does the trick. The idea is simple: Having defined a set of properties to be adjusted at compile time, it is natural to look for a compile-time, type-safe mechanism to download them. Actually, the mechanism has been known for a long time [3] and what has been used in this implementation is a loop traversing the properties and downloading values, accordingly to the property specialization. This is the set of properties: and an index is used to quickly loop through them: Let's see how this works: XMLParserHandler is a loop in disguise that's executed recursively until the exit condition is met (XMLParserHandler specialization returning zero). Interesting things happen during the looping process: The last step is to understand who E and SpecialHandler are expected to be: SpecialHandler is specialized for all COMMANDS (actually XMLPARSER_SPECIAL_INDEX::COMMANDS). EXPER is a simple way to express whether or not a property should be included. For example: As expected, EXPER_ALL considers all properties (COMMANDS) to be taken into account. But: EXPER_NO_CDATA, as the name says, doesn't consider CDATA; that's why CDATA index is switched to LAST_COMMAND (a command that's never actually executed). Based on this extensible mechanism, we now can write code like this: EXPER family contains the following variations: The code was tested with both VC++ 7.1 and g++ 3.3 compilers. Download the code that accompanies this article here. [1] Erich Gamma, Richard Helm, Ralph Johnson, and John M. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley 1994. [2] Dov Bulka and David Mayhew. Efficient C++: Performance Programming Technique, Addison-Wesley 1999. [3] T. Veldhuizen. "Using C++ template metaprograms," C++ Report Vol. 7 No. 4 (May 1995).
A Flexible, Compile Time, Configurable XML Parser
October 1, 2004
Who Needs Another Parser?
It's All About State
STATE
REPRESENTATION
OBSERVATIONS START
<
END
>
SPECIAL
!
Used for special behavior (comments, CDATA sections, and so forth) END_FOUND
/
Used in conjunction wih START & END EXIT_LOOP
Artificial
namespace XMLParserState
{
enum STATE {EXIT_LOOP, END_FOUND, END, START, SPECIAL, LAST_STATE};
}
class NativeXMLParser
{
private:
typedef XMLParserState::STATE (XMLParser::*FUNC_TYPE)() ;
typedef std::map<int, FUNC_TYPE> States;
States states_;
//other private data
public:
void parseSource()
{
XMLParserState::STATE state = XMLParserState::START;
FUNC_TYPE f =0;
while (state != XMLParserState::EXIT_LOOP)
{
f = states_[state];
state = f ? (*this.*f)() : XMLParserState::START;
}
}
public:
XMLParserState::STATE findStart();
XMLParserState::STATE findEnd();
XMLParserState::STATE endFound();
XMLParserState::STATE specialFound();
private:
//other private members
};
findStart
findEnd
endFound
specialFound
SAX or DOM? Static or Dynamic Polymorphism?
<if i="1">
<method name="some1"/>
<else/>
<method name="some"/>
</endif>
<if i="1">
<method> someOne </method>
<else/>
<method> some </method>
</endif>
This time, the content has to be taken into account—and similar for CDATA and comments.Implementation
template <class MODEL_TYPE, class SPECIAL_CONFIGURATION_TYPE>
class XMLParser;
void parseXMLEvt(const std::string& s)
{
typedef XMLParser<XMLEvtHandler, SpecialDataHandler> XI;
//use XI
}
void parseXMLDOM(const std::string& s)
{
typedef XMLParser<XMLDOM, SpecialDataHandler> XI;
//use XI
}
struct I
{
void setName(const std::string& name)
{}
void setContent(const std::string& content)
{}
XMLBit* getRoot()
{}
void addNewXMLBit()
{}
void setParent()
{}
void setParentToBit()
{}
};
namespace XMLPARSER_SPECIAL
{
enum COMMANDS{
CDATA=XML_DICTIONARY::CDATA_START/*'['*/,
COMMENT=XML_DICTIONARY::COMMENT_BIT/* '-'*/, CONTENT
};
};
namespace XMLPARSER_SPECIAL_INDEX
{
enum COMMANDS{
CDATA, COMMENT, CONTENT, LAST_COMMAND
};
};
template < template <int Y> class E, int T >
struct XMLParserHandler
{
template<class V>
static int idx(V& v)
{
typedef E<T> B;
SpecialData sd;
SpecialHandler<B::value>::getName(sd.name);
SpecialHandler<B::value>::getEnd(sd.end);
sd.offset = SpecialHandler<B::value>::getStartOffset();
v[T] = sd;
return XMLParserHandler< E, T+1>::idx(v);
}
};
template < template <int Y> class E>
struct XMLParserHandler<E, XMLPARSER_SPECIAL_INDEX::LAST_COMMAND >
{
template<class V>
static int idx(V& v)
{
return 0;
}
};
template < int I>
struct SpecialHandler
{
static void getName(std::string& name)
{ }
static void getEnd(std::string& end)
{ }
static int getStartOffset()
{
return 0;
}
};
template < int Y>
struct Int2Type
{
enum {
value= Y
};
};
template <int Y>
struct EXPER_ALL : public Int2Type<Y>
{};
template <int Y>
struct EXPER_NO_CDATA: public EXPER_ALL<Y>
{};
template <>
struct EXPER_NO_CDATA<XMLPARSER_SPECIAL_INDEX::CDATA> :
public EXPER_ALL<XMLPARSER_SPECIAL_INDEX::LAST_COMMAND>
{};
void parseXMLEvt(const std::string& s)
{
typedef XMLParser<XMLEvtHandler, SpecialDataHandler> XI;
XI::parse<XMLParserHandlerInit<EXPER_NO_CDATA> >(s);
}
void parseXMLDOM(const std::string& s)
{
typedef XMLParser<XMLDOM, SpecialDataHandler> XI;
XI::parse<XMLParserHandlerInit<EXPER_NO_CDATA> >(s);
}
EXPER_ALL
EXPER_NO_CDATA
EXPER_CONTENT_ONLY
EXPER_BASIC (no CDATA, no comments, no content)
EXPER_NO_CONTENT
EXPER_NO_COMMENT
Download the Code
References