August 29, 2014
Hot Topics:
RSS RSS feed Download our iPhone app

Let Your Parser Go for the GOLD

  • July 19, 2007
  • By Victor Volkman
  • Send Email »
  • More Articles »

Building the GOLD Engine

If there is any downside to the GOLD parser, it is that the language-specific bindings for the GOLD Engine are neither documented online nor offline. Instead, you have to use Doxygen to generate the HTML, which I found a silly chore—especially considering that C++ is probably one of the most popular platforms and documentation for its GOLD Engine should have been readily available. In the end, I felt like I was more reverse-engineering the demo programs rather than actually learning something from the docs. This is not quite the same as a step-by-step tutorial.

I just improvised by compiling all the code in the SRC directory and shoving it into a library, like this:

cl /c /GX *.cpp /I..include
lib /out:cpp-gpenginelibgpengine.lib *.obj

However, there are Project files available for a variety of Visual Studio versions if you want to do that, even one that builds the Engine as a DLL. Again, I don't see a reason why precompiled binaries weren't provided for these common platforms. The final link step, then, looks like this:

cl /GX main.cpp /Id:cpp-gpengineinclude /link
d:cpp-gpeginelibgpengine.lib

An App Made of GOLD

Last, you'll take a look at a simple program that uses the CALC.CGT written out by the Gold Builder in the earlier steps and proceeds to parse it using the C++ binding you've already prepared:

 1 #include <iostream>
 2 #include <stdlib.h>
 3
 4 #include "CGTFile.h"
 5 #include "simpleErrorRep.h"
 6 #include "calc.h"    // list of constants generated by Builder
 7
 8 char *load_source (char *filename);
 9 void printTokens (vector <Token*> &t);
10
11 int main(int argc, char *argv[])
12 {
13    CGTFile    cgtFile;
14    char       *source;
15    Symbol     *rdc;
16    DFA        *dfa;
17    LALR       *lalr;
18
19    ErrorTable       *myErrors;
20    SimpleErrorRep   myReporter;
21
22    // Load grammar file
23    if (cgtFile.load ("calc.cgt")) {
24       wprintf (L"%sn", "Grammar loaded succesfully");
25       cgtFile.printInfo ();
26    } else {
27       wprintf (L"%sn", "error loading file");
28       return -1;
29    }
30
31    // Make some source!
32    source = "3 * (4 + 5)r";
33
34    // Get DFA (Deterministic Finite Automata) from the cgt file
35    dfa = cgtFile.getScanner();
36
37    // Scan the source in search of tokens
38    dfa->scan(source);
39
40    // Get the error table
41    myErrors = dfa->getErrors();
42
43    // If there are errors report them
44    if (myErrors->errors.size() > 0) {
45       for (int i=0; i < myErrors->errors.size(); i++) {
46          cout << myReporter.composeErrorMsg
            (*myErrors->errors[i]) << endl;
47       }
48       return -1;
49    }
50
51    // Get the tokens to feed the LALR machine with them
52    vector <Token*> tokens = dfa->getTokens();
53    cout << "printing tokens..." << endl;
54    printTokens (tokens);
55
56    // Get the LALR (Look Ahead, Left-to-right parse,
      //               Rightmost-derivation)
57    lalr = cgtFile.getParser();
58
59    // Parse the tokens
60    rdc = lalr->parse (tokens, true, true);
61
62    myErrors = lalr->getErrors();
63    if (myErrors->errors.size() != 0) {
64       for (int i=0; i < myErrors->errors.size(); i++) {
65       cout << myReporter.composeErrorMsg
         (*myErrors->errors[i]) << endl;
66    }
67    return -1;
68    }
69
70    cout << "printing reduction tree..." << endl;
71    lalr->printReductionTree(rdc, 0);
72
73    delete rdc;
74
75    return EXIT_SUCCESS;
76 }
77
78
79 void printTokens (vector <Token*> &t) {
80    for (integer i = 0; i < t.size(); i++) {
81       wchar_t *pwe = t[i]->symbol;
82       wprintf (t[i]->symbol);
83       wprintf (L":");
84       wprintf (t[i]->image);
85       wprintf (L"n");
86    }
87 }
88

The program starts in main() on Line 23 where you load the externally generated Compiled Grammar Table (CGT), which is the binary output of the GOLD Builder. Thus, you can tweak the grammar as much as you like without necessarily needing a recompile.

On Line 38, you do the actual scanning of the source text (hardcoded for this simple example) using a DFA built from the stored image in the CGT file. The next steps are to get the tokens out of the DFA results and stuff them into the parser.

51    // Get the tokens to feed the LALR machine with them
52    vector <Token*> tokens = dfa->getTokens();
53    cout << "printing tokens..." << endl;
54    printTokens (tokens);
55
56    // Get the LALR (Look Ahead, Left-to-right parse,
      //               Rightmost-derivation)
57    lalr = cgtFile.getParser();
58
59    // Parse the tokens
60    rdc = lalr->parse (tokens, true, true);

The two arguments at the end of Line 60 designate trim reduction and whether to stop parsing after the first error is encountered. Trim reduction means to simplify rules of the form: NonTerminal1 := Nonterminal2 (intermediate productions). Last, to prove that your program did something, you print the tokens (see Line #53) and print the reduction tree (see Line #71). The printing of the tokens is mainly to show that you recognized each of tokens correctly and should read left-to-right along the input. The reduction tree shows a tree you would use to control the expression evaluation so that, in fact, it does "4 + 5" and then multiplies that result by 3. Below is the output from the demo program:

printing tokens...
NUMBER:3
ASTERISK:*
LPAREN:(
NUMBER:4
PLUS:+
NUMBER:5
RPAREN:)
NewLine:
EOF:EOF

printing reduction tree...
input
 term
  factor
   NUMBER:3
  ASTERISK:*
  factor
   LPAREN:(
   expr
    factor
     NUMBER:4
    PLUS:+
    factor
     NUMBER:5
   RPAREN:)
NewLine:

Conclusion

I've barely scratched the surface of what you can do with the GOLD parser system. In this article really only took you through the steps up to a complete parse tree. Adding in the semantics is the next step for you to take. There are lots of applications from compilers to XML translators that you can implement with this flexible tool. The added advantage of being able to test your grammar inside the Builder can save you time wasted recompiling and debugging your code just to test the quality of the grammar. This alone is a big step up from the traditional YACC/Bison solutions. Add in the incredible cross-platform support for everything from .NET to Java and you'll be sure to be finding new ways to use GOLD every day.

About the Author

Victor Volkman has been writing for C/C++ Users Journal and other programming journals since the late 1980s. He is a graduate of Michigan Tech and a faculty advisor board member for Washtenaw Community College CIS department. Volkman is the editor of numerous books, including C/C++ Treasure Chest and is the owner of Loving Healing Press. He can help you in your quest for open source tools and libraries; just drop an e-mail to sysop@HAL9K.com.





Page 2 of 2



Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Sitemap | Contact Us

Rocket Fuel