October 15, 2018
Hot Topics:

Classic Parsing with Flex and Bison

  • November 8, 2006
  • By Victor Volkman
  • Send Email »
  • More Articles »

A Simple Calculator Language

I chose a calculator language as the example for many reasons. First of all, nearly all programming applications need to parse expressions with plus, minus, multiply, and divide operators and parenthesis. Second, it's a universally understood problem and not specific to any particular language. Although many examples use postfix (RPN) notation, this one uses good old infix notation since everyone understands that. Specifically, the goal is for the calculator to be able to handle expressions like this:

-1.1 + 2 * ( 4 / 3 )

...and produce the correct answer!

First off, you need to determine what the tokens of the language are. Basically, you will have numbers, four operators, parenthesis, and newline (standing in for the "=" key on your calculator). As such, you can write a straightforward Flex scanner (".l" file) to process this, as Listing 1 illustrates with lexer.l:

  1 %{
  2 // lexer.l -- From tcalc: a simple calculator program
  4 #include <stdio.h>
  5 #include <string.h>
  6 #include <stdlib.h>
  7 #include "calc.tab.h"
  8 extern YYSTYPE yylval;
 10 %}
 11 %option noyywrap
 12 delim         [ \t]
 13 whitesp       {delim}+
 14 digit         [0-9]
 15 number        [-]?{digit}*[.]?{digit}+
 16 %%
 17 {number}  { sscanf(yytext, "%lf", &yylval); return NUMBER;}
 18 "+"       { return PLUS; }
 19 "-"       { return MINUS; }
 20 "/"       { return SLASH; }
 21 "*"       { return ASTERISK; }
 22 "("       { return LPAREN; }
 23 ")"       { return RPAREN; }
 24 "\n"      { return NEWLINE; }
 25 {whitesp} { /* No action and no return */}

Listing 1. lexer.l

Note a few things in lexer.l (Listing 1):

  1. The definition of number on line 15 is a regular expression built up from instances of the earlier expression digit on line 14. The re-use of expressions is one of Flex's most powerful features.
  2. It thoroughly converts single-character symbols into tokens even though you could have interpreted them as characters later on in Bison. I feel that it enhances maintainability not to hardcode character references in the grammar file later. Also, it would help to isolate internationalization issues, if any, in the input language.
  3. Return values are used to designate the type of token, whereas the actual value of the token is stored in yytext.

Now, take a look at the grammar—the rules and actions that involve building the parse tree. Listing 2 shows these with tcalc.y:

  1 /* tcalc.y - a four function calculator */
  2 %{
  3 #define YYSTYPE double      /* yyparse() stack type */
  4 #include <malloc.h>
  5 #include <stdlib.h>
  6 %}
  7 /* BISON Declarations */
 10 /* Grammar follows */
 11 %%
 12 input:              /* empty string */
 13     | input line
 14     ;
 15 line: NEWLINE
        | expr NEWLINE           { printf("\t%.10g\n",$1); }
 17     ;
 18 expr: expr PLUS term         { $$ = $1 + $3; }
 19     | expr MINUS term        { $$ = $1 - $3; }
 20     | term
 21     ;
 22 term: term ASTERISK factor   { $$ = $1 * $3; }
 23     | term SLASH factor      { $$ = $1 / $3; }
 24     | factor
 25     ;
 26 factor:  LPAREN expr RPAREN  { $$ = $2; }
 27       | NUMBER
 28       ;
 29 %%
 30 /*--------------------------------------------------------*/
 31 /* Additional C code */
 32 /* Error processor for yyparse */
 33 #include <stdio.h>
 34 int yyerror(char *s)        /* called by yyparse on error */
 35 {
 36     printf("%s\n",s);
 37     return(0);
 38 }
 40 /*--------------------------------------------------------*/
 41 /* The controlling function */
 42 int main(void)
 43 {
 44     yyparse();
 45     exit(0);
 46 }

Listing 2. tcalc.y

Like all good C programs, the program begins in the main() function on line 42. The instruction yyparse() begins the input parsing and returns when end-of-input is reached. The output of the calculator will be displayed after each NEWLINE token is reduced on line #16. Also, notice the yyerror() function called in times of parsing failure on lines 34-38. More sophisticated state management routines are available for error handling, but this is only a simple example!

By way of screen dumps, Figure 3 offers some proof that the tcalc application works.

Figure 3. TCALC Screen Dumps

Here is the batch file I used to build the project in a CMD window:

set CCOPT=/c /Z7 /W3
bison calc.y
cl %CCOPT% calc.tab.c
bison -d calc.y
flex lexer.l
cl %CCOPT% lex.yy.c
link /DEBUG calc.tab.obj lex.yy.obj /out:tcalc.exe

Rapidly Develop Useful Parsers

Yacc/Bison remains one of the top 10 tools in the compiler writer's toolbox, despite being over 30 years old. Part of its enduring appeal is the similarity of the grammar input (".y" files) to the actual BNF syntax of the language you are parsing. This contributes greatly to the maintainability of Bison-based parsers. This ease of use combined with its low footprint and high degree of portability make Bison and its friend Flex the clear winners for rapidly developing useful parsers for today's challenging translation applications.

Book Recommendation

Compilers: Principles, Techniques, and Tools (Hardcover)

by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman (ISBN 0201100886)

The legendary "dragon book" is still the preferred text for learning compiler design methodologies. This book has a perfect mixture of theory and practice that is unequalled. Along the way, you'll use real-world tools such as Flex and Bison to help develop your projects and, of course, your understanding of compiler technologies.

Compilers: Principles, Techniques, and Tools

About the Author

Victor Volkman has been writing for C/C++ Users Journal and other programming journals since the late 1980s. He is a graduate of Michigan Tech and a faculty advisor board member for Washtenaw Community College CIS department. Volkman is the editor of numerous books, including C/C++ Treasure Chest and is the owner of Loving Healing Press. He can help you in your quest for open source tools and libraries; just drop an e-mail to sysop@HAL9K.com.

Page 2 of 2

Comment and Contribute


(Maximum characters: 1200). You have characters left.



Enterprise Development Update

Don't miss an article. Subscribe to our newsletter below.

By submitting your information, you agree that developer.com may send you developer offers via email, phone and text message, as well as email offers about other products and services that developer believes may be of interest to you. developer will process your information in accordance with the Quinstreet Privacy Policy.


Thanks for your registration, follow us on our social networks to keep up-to-date