http://www.developer.com/

Back to article

Working With Design Patterns: Interpreter


May 22, 2008

The Design Patterns book suggest that the interpreter pattern is the same as the composite design pattern, except for intent. A composite is an interpreter when the resulting hierarchical structure represents something you might consider a grammar. Technically, then, the example I built for composite is an implementation that also represents an interpreter! So I need go no further here, right?

No, I'm not done here, and for the primary reason that there are few good examples available to help developers understand interpreters. My example here perhaps isn't perfect, but I'm hoping it adds a bit more understanding than all the other examples I found when I searched for "interpreter design pattern."

My goal is to build an interpreter that supports a simple boolean language. The language in turn is the basis for a set of simple commands to move documents from one folder to another, based on dynamic criteria. Think "resume filter" as a potential application. For example, one command might be:

   move from incomingResumes to phoneScreen when
      contains fortran or smalltalk and olderThan 01/01/2006

Here, I'll concern myself with the second part of the expression, starting with "contains," because the interpreter pattern is most appropriate when the expression is something that can resolve to a single result.

Because I'm working with documents, I'll need a simple document class to demonstrate use of the interpreter. Listing 1 shows a TextDocument implementation that supports a contains method and also encapsulates a creation date. Because I'm always working in a TDD fashion, Listing 2 shows tests for the TextDocument class.

Listing 1: Documents.

// Document.java:
public interface Document {
   public boolean contains(String... keywords);

   public java.util.Date getDate();
}

// TextDocument.java
import java.util.*;

public class TextDocument implements Document {
   private final String contents;
   private final Date date;

   public TextDocument(Date date, String contents) {
      this.date = date;
      this.contents = contents;
   }

   @Override
   public boolean contains(String... textElements) {
      for (String text: textElements)
         if (contents.contains(text))
            return true;
      return false;
   }

   @Override
   public Date getDate() {
      return date;
   }
}

Listing 2: A test for Contains.

import static org.junit.Assert.*;

import java.util.*;
import org.junit.*;

public class ContainsTest {
   private static final String CONTENTS = "these are the contents";
   private TextDocument document;

   @Before
   public void createDocument() {
      document = new TextDocument(new Date(), CONTENTS);
   }

   @Test
   public void failsWhenTextNotInContents() {
      Expression expression = new Contains(CONTENTS + "x");
      assertFalse(expression.evaluate(document));
   }

   @Test
   public void passesWhenTextInContents() {
      String text = "contents";
      assertTrue(CONTENTS.indexOf(text) != -1);

      Expression expression = new Contains(text);
      assertTrue(expression.evaluate(document));
   }
}

The basic direction for building an interpreter, using this design pattern, is to translate the grammar at hand into a set of classes. A class represents each rule in the grammar. Fields on that class represent the symbols to which a rule translates. For example, in my little sub-grammar, there is a rule that looks like:

   And ::= Expression 'and' Expression

Thus, my implementation of the interpreter pattern includes a class named And, with two fields, each containing an Expression object. (Instead of creating a separate object to represent text, such as "and," I usually can embed it directly into code in the rule class, And in this case.)

With basic support in place, I can start creating simple rule classes that can work on Document objects. For example, Contains (see Listing 3) indicates whether or not a document contains any of the keywords passed in. OlderThan (see Listing 4) determines whether or not a document is older than a specified date. Each of these inquiries implements the Expression interface (see Listing 5), allowing clients to evaluate the condition.

Listing 3: The Contains expression.

public class Contains implements Expression {
   private final String[] keywords;

   public Contains(String... keywords) {
      this.keywords = keywords;
   }

   @Override
   public boolean evaluate(Document document) {
      return document.contains(keywords);
   }
}

Listing 4: The OlderThan expression.

import java.util.*;

public class OlderThan implements Expression {
   private final Date date;

   public OlderThan(Date date) {
      this.date = date;
   }

   @Override
   public boolean evaluate(Document document) {
      return document.getDate().before(date);
   }
}

Listing 5: The Expression interface.

public interface Expression {
   boolean evaluate(Document document);
}

The Contains and OlderThan expressions are known as terminal expressions in the interpreter pattern. They are the building blocks for composite expressions, such as expressions involving the use of "and" or "or." Listing 6 shows a test for an And class, and Listing 7 shows the implementation.

The code for evaluate in the And class presents the core of the composite/interpreter pattern: Evaluating an "and" expression is a matter of recursively evaluating each of its left-hand and right-hand expressions. Composites such as And are known as nonterminal expressions in the interpreter pattern.

Listing 6: A test for a binary nonterminal expression.

import static org.junit.Assert.*;

import java.util.*;
import org.junit.*;

public class AndTest {
   private static final String CONTENTS = "these are the contents";
   private static final Date NOW = new Date();
   private TextDocument document;

   @Before
   public void createDocument() {
      document = new TextDocument(NOW, CONTENTS);
   }

   @Test
   public void failsWhenNeitherConditionMet() {
      Expression expression = new And(new Contains(CONTENTS + "x"),
         new OlderThan(NOW));
      assertFalse(expression.evaluate(document));
   }

   @Test
   public void failsWhenOnlyFirstConditionMet() {
      String text = "contents";
      assertTrue(CONTENTS.indexOf(text) != -1);

      Expression expression = new And(new Contains(text),
         new OlderThan(NOW));
      assertFalse(expression.evaluate(document));
   }

   @Test
   public void passesWhenBothConditionsMet() {
      String text = "contents";
      assertTrue(CONTENTS.indexOf(text) != -1);

      Date future = new Date(NOW.getTime() + 1);

      Expression expression = new And(new Contains(text),
         new OlderThan(future));
      assertTrue(expression.evaluate(document));
   }
}

Listing 7: The And class.

public class And implements Expression {
   private final Expression leftExpression;
   private final Expression rightExpression;

   public And(Expression leftExpression,
      Expression rightExpression) {
         this.leftExpression = leftExpression;
         this.rightExpression = rightExpression;
   }

   public boolean evaluate(Document document) {
      return leftExpression.evaluate(document) &&
         rightExpression.evaluate(document);
   }
}

Within the interpreter design pattern, the Document passed from Expression object to Expression object (via the evaluate method) is known as the context.

The interpreter pattern provides many benefits. As you can see, tests for individual expression classes are small and easy to understand. The tests, too, are very straightforward. It's also easy to incorporate a new set of expression classes into the grammar without impacting other, existing expressions.

Interpreter is not designed for large grammars; for that, you should consider parser generators and the like. It can suffer from performance problems, and can get too unwieldy with more complex grammars.

Parsing

Unfortunately, the Interpreter pattern discussions rarely discuss how to create the composite structure in the first place. For my amusement, I chose to test-drive a simple parser that could take a string and generate the appropriate interpreter hierarchy. Listing 8 starts things off with a simple test to verify that I can parse a single terminal expression.

Listing 8: A first test for the parser.

import static org.junit.Assert.*;
import org.junit.*;

public class ParserTest {
   @Test
   public void singleExpression() {
      Parser parser = new Parser();
      Expression expression = parser.parse("contains text");
      assertEquals(Contains.class, expression.getClass());
      Contains contains = (Contains)expression;
      String[] keywords = contains.getKeywords();
      assertEquals("text", keywords[0]);
   }
}

Getting this test to pass is a simple matter (see Listing 9).

Listing 9: A very specific Parser implementation.

public class Parser {
   public Expression parse(String expression) {
      String[] tokens = expression.split(" ");
      return new Contains(tokens[1]);
   }
}

Listing 10 adds just a little more complexity—the "contains" keyword now needs to allow specifying multiple keywords.

Listing 10: A test for supporting multiple keywords.

import static org.junit.Assert.*;
import org.junit.*;

public class ParserTest {
   private Parser parser;

   @Before
   public void createParser() {
      parser = new Parser();
   }

   @Test
   public void singleContainsExpression() {
      Expression expression = parser.parse("contains text");
      assertKeywords((Contains)expression, "text");
   }

   @Test
   public void containsMultipleKeywords() {
      Expression expression = parser.parse("contains text bozo");
      assertKeywords((Contains)expression, "text", "bozo");
   }

   private void assertKeywords(Contains contains,
                               String... expected) {
      assertArrayEquals(expected, contains.getKeywords());
   }
}

Listing 11 provides the implementation for supporting multiple keywords. Note that the constructor for the Contains class will need to change to support taking a String array.

Listing 11: Supporting multiple keywords.

import java.util.*;

public class Parser {
   private List<String> arguments = new ArrayList<String>();

   public Expression parse(String expression) {
      String[] tokens = expression.split(" ");
      for (String token: tokens) {
         if (!token.equals("contains"))
            pushArgument(token);
      }
      return new Contains
         ((String[])arguments.toArray(new String[0]));
   }

   private void pushArgument(String token) {
      arguments.add(token);
   }
}

The next simplest thing to implement would be another terminal expression, such as "olderThan." Listings 12 and 13 lay out the test and changes to the Parser class.

Listing 12: Driving support for an additional terminal expression.

import static org.junit.Assert.*;
import java.util.*;
import org.junit.*;

public class ParserTest {
   private Parser parser;

   @Before
   public void createParser() {
      parser = new Parser();
   }

   @Test
   public void singleContainsExpression() {
      Expression expression = parser.parse("contains text");
      assertKeywords((Contains)expression, "text");
   }

   @Test
   public void containsMultipleKeywords() {
      Expression expression = parser.parse("contains text bozo");
      assertKeywords((Contains)expression, "text", "bozo");
   }

   @Test
   public void olderThan() {
      Expression expression = parser.parse("olderThan 03/31/2008");
      OlderThan olderThan = (OlderThan)expression;
      assertDate(2008, Calendar.MARCH, 31, olderThan.getDate());
   }

   private void assertDate(int year, int month,
                           int dayOfMonth, Date date) {
      Calendar calendar = GregorianCalendar.getInstance();
      calendar.setTime(date);
      assertEquals(year, calendar.get(Calendar.YEAR));
      assertEquals(month, calendar.get(Calendar.MONTH));
      assertEquals(dayOfMonth,
                   calendar.get(Calendar.DAY_OF_MONTH));
   }

   private void assertKeywords(Contains contains,
                               String... expected) {
      assertArrayEquals(expected, contains.getKeywords());
   }
}

Listing 13: Supporting an additional terminal expression.

import java.text.*;
import java.util.*;

public class Parser {
   private List<String> arguments = new ArrayList<String>();

   public Expression parse(String expression) {
      String command = null;
      String[] tokens = expression.split(" ");
      for (String token: tokens)
         if (isKeyword(token))
            command = token;
         else
            pushArgument(token);
      return createExpression(command);
   }

   private Expression createExpression(String command) {
      if (command.equals("contains"))
         return new Contains
            ((String[])arguments.toArray(new String[0]));
      if (command.equals("olderThan"))
         return new OlderThan(parseDate(arguments.get(0)));
      return null;
   }

   private Date parseDate(String textDate) {
      try {
         return new
            SimpleDateFormat("MM/dd/yyyy").parse(textDate);
      } catch (ParseException unexpected) {
         return null;
      }
   }

   private boolean isKeyword(String token) {
      return token.equals("contains") ||
         token.equals("olderThan");
   }

   private void pushArgument(String token) {
      arguments.add(token);
   }
}

Well, my Parser so far is a bit messy. It's now very obvious that there is a violation of responsibilities in the Parser class because it must understand how to parse differing arguments for the associated Expression objects. I'm also not fond of the notion of having to hold onto a string representing the command, and then instantiating it later when arguments can be attached. Maybe following the Java bean convention is more appropriate: I'll change the Expression classes to support a no-argument constructor as well as a setter for arguments.

After about a dozen minutes of incremental refactoring, I end up with a cleaner solution, shown in Listing 14.

Listing 14: A refactored Parser.

import java.util.*;

public class Parser {
   private List<String> arguments = new ArrayList<String>();

   public Expression parse(String expressionText) {
      String command = null;
      String[] tokens = expressionText.split(" ");
      for (String token: tokens)
         if (isKeyword(token))
            command = token;
         else
            arguments.add(token);

      Expression expression = createExpression(command);
      expression.setArgs(arguments);
      return expression;

   }

   private Expression createExpression(String command) {
      if (command.equals("contains"))
         return new Contains();
      if (command.equals("olderThan"))
         return new OlderThan();
      return null;
   }

   private boolean isKeyword(String token) {
      return token.equals("contains") ||
         token.equals("olderThan");
   }
}

Each of the Expression implementations changes a bit (see Listing 15), to support a no-arg constructor and to supply a setArgs method.

Listing 15: A reworked Expression sub-type interface.

import java.util.*;

public class Contains implements Expression {
   private List<String> keywords = new ArrayList<String>();

   @Override
   public boolean evaluate(Document document) {
      return document.contains
         ((String[])keywords.toArray(new String[0]));
   }

   String[] getKeywords() {
      return (String[])keywords.toArray(new String[0]);
   }

   @Override
   public void setArgs(List<String> args) {
      this.keywords = args;
   }
}

Well, that's nice, but it still leaves some redundancy that I must deal with. When adding a new Expression type, I have to add code to both createExpression and isKeyword. That's not only painful but increases risk. A bit of reflection closes off the code in both methods, in favor of a single map that must be initialized (see Listing 16). Adding new expression classes requires updating code only in this one place.

Listing 16: An appropriate use of reflection?

import java.util.*;

public class Parser {
   private List<String> arguments = new ArrayList<String>();
   private Map<String,Class<? extends Expression>>
      expressionTypes = new HashMap<String,Class<?
      extends Expression>>();
   {
      expressionTypes.put("contains", Contains.class);
      expressionTypes.put("olderThan", OlderThan.class);
   }

   public Expression parse(String expressionText) {
      String command = null;
      String[] tokens = expressionText.split(" ");
      for (String token: tokens)
         if (isKeyword(token))
            command = token;
         else
            arguments.add(token);

      Expression expression = createExpression(command);
      expression.setArgs(arguments);
      return expression;

   }

   private Expression createExpression(String command) {
      try {
         return expressionTypes.get(command).newInstance();
      } catch (Exception e) {
         throw new RuntimeException(e);
      }
   }

   private boolean isKeyword(String token) {
      return expressionTypes.containsKey(token);
   }
}

Not bad. I could make the case for splitting out the creation code into a separate factory class, but I'll do that on my own time.

A new test for "and" expressions (see Listing 17) fails.

Listing 17: Driving support for non-terminal expressions.

@Test
public void andExpression() {
   Expression expression =
      parser.parse("olderThan 03/31/2008 and contains java");
   And and = (And)expression;
   OlderThan olderThan = (OlderThan)and.getLeft();
   assertDate(2008, Calendar.MARCH, 31, olderThan.getDate());

   Contains contains = (Contains)and.getRight();
   assertKeywords((Contains)contains, "java");
}

The implementation requires a bit of rethinking the algorithm, but at least many of the required components are in place, so it doesn't take too long (see Listing 18).

Listing 18: Supporting non-terminal expressions.

import java.util.*;

public class Parser {
   private List<String> arguments = new ArrayList<String>();
   private Map<String, Class<? extends Expression>>
      expressionTypes =
      new HashMap<String, Class<? extends Expression>>();
   {
      expressionTypes.put("contains", Contains.class);
      expressionTypes.put("olderThan", OlderThan.class);
      expressionTypes.put("and", And.class);
   }

   private Expression current;
   private List<Expression> expressions =
      new ArrayList<Expression>();

   public Expression parse(String expressionText) {
      String[] tokens = expressionText.split(" ");
      for (String token: tokens)
         if (isKeyword(token)) {
            storeArguments();
            current = createExpression(token);
            if (isProcessingBinaryExpression()) {
               And and = (And)pop();
               Expression left = pop();
               and.set(left, current);
               push(and);
            } else
               push(current);
         } else
            arguments.add(token);

      storeArguments();
      return pop();
   }

   private boolean isProcessingBinaryExpression() {
      return expressions.size() == 2;
   }

   private void storeArguments() {
      if (current == null)
         return;
      current.setArgs(arguments);
      arguments = new ArrayList<String>();
   }

   private boolean push(Expression expression) {
      return expressions.add(expression);
   }

   private Expression pop() {
      return expressions.remove(expressions.size() - 1);
   }

   private Expression createExpression(String command) {
      try {
         return expressionTypes.get(command).newInstance();
      } catch (Exception e) {
         throw new RuntimeException(e);
      }
   }

   private boolean isKeyword(String token) {
      return expressionTypes.containsKey(token);
   }
}

Getting close! I obviously need tests to drive out support for "or" in addition to "and," and then a few big tests to demonstrate that this truly works for more involved expressions. A final version of the parser class appears in Listing 19. I also show some of the refactored Expression class hierarchy.

Listing 19: The Parser.

// Expression.java
import java.util.*;

public interface Expression {
   void setArgs(List<String> args);
   boolean evaluate(Document document);
}

// BinaryExpression
import java.util.*;

abstract public class BinaryExpression implements Expression {
   protected Expression leftExpression;
   protected Expression rightExpression;

   public void set(Expression leftExpression,
                   Expression rightExpression) {
      this.leftExpression = leftExpression;
      this.rightExpression = rightExpression;
   }

   abstract public boolean evaluate(Document document);

   public Expression getLeft() {
      return leftExpression;
   }

   public Expression getRight() {
      return rightExpression;
   }

   @Override
   public void setArgs(List<String> args) {
   // violation of Liskov! OK for now.
   }
}

// Or.java
public class Or extends BinaryExpression implements Expression {
   public boolean evaluate(Document document) {
      return leftExpression.evaluate(document) ||
             rightExpression.evaluate(document);
   }
}


// KeywordExpression
import java.util.*;

abstract public class KeywordExpression implements Expression {
   protected List<String> keywords;

   @Override
   abstract public boolean evaluate(Document document);

   @Override
   public void setArgs(List<String> keywords) {
      this.keywords = keywords;
   }

   String[] getKeywords() {
      return (String[])keywords.toArray(new String[0]);
   }
}

public class Contains extends KeywordExpression
   implements Expression {
   @Override
   public boolean evaluate(Document document) {
      return document.contains(ListUtil.asArray(keywords));
   }
}

// Parser.java
import java.util.*;

public class Parser {
   private List<String> arguments = new ArrayList<String>();
   private Map<String, Class<? extends Expression>>
      expressionTypes = new HashMap<String, Class<?
      extends Expression>>();
   {
      expressionTypes.put("contains", Contains.class);
      expressionTypes.put("excludes", Excludes.class);
      expressionTypes.put("olderThan", OlderThan.class);
      expressionTypes.put("and", And.class);
      expressionTypes.put("or", Or.class);
   }

   private Expression current;
   private List<Expression> expressions =
      new ArrayList<Expression>();

   public Expression parse(String expressionText) {
      String[] tokens = expressionText.split(" ");
      for (String token: tokens)
         if (isKeyword(token)) {
            storeArguments();
            newExpression(token);
         } else
            arguments.add(token);

      storeArguments();
      return pop();
   }

   private void newExpression(String token) {
      current = createExpression(token);
      if (isProcessingBinaryExpression()) {
         BinaryExpression binary = (BinaryExpression)pop();
         Expression left = pop();
         binary.set(left, current);
         push(binary);
      } else
         push(current);
   }

   private boolean isProcessingBinaryExpression() {
      return expressions.size() == 2;
   }

   private void storeArguments() {
      if (current == null)
         return;
      current.setArgs(arguments);
      arguments = new ArrayList<String>();
   }

   private boolean push(Expression expression) {
      return expressions.add(expression);
   }

   private Expression pop() {
      return expressions.remove(expressions.size() - 1);
   }

   private Expression createExpression(String command) {
      try {
         return expressionTypes.get(command).newInstance();
      } catch (Exception e) {
         throw new RuntimeException(e);
      }
   }

   private boolean isKeyword(String token) {
      return expressionTypes.containsKey(token);
   }
}

Overall, not a bad way to go. I felt comfortable test driving the entire solution, and ended up with a class that manages to isolate some of the complexity that parsing requires. Obviously, there's a lot missing here, particularly error handling, but I'm confident that I could test drive the remainder of these in a fairly straightforward fashion.

About the Author

Jeff Langr is a veteran software developer with over a quarter century of professional software development experience. He's written two books, including Agile Java: Crafting Code With Test-Driven Development (Prentice Hall) in 2005. Jeff has contributed a couple chapters to Uncle Bob Martin's upcoming book, Clean Code. Jeff has written over 75 articles on software development, with over thirty appearing at Developer.com. You can find out more about Jeff at his site, http://langrsoft.com, or you can contact him via email at jeff at langrsoft dot com.

Sitemap | Contact Us

Thanks for your registration, follow us on our social networks to keep up-to-date