VoiceXML Developer Series: A Tour Through VoiceXML, Part V

  • By Jonathan Eisenzopf

Keyword grammars

Let's take a look at the inline grammar on lines 41-47 first. This is probably the simplest form of a grammar. It contains three words, each representing a different selection. The ASR will attempt to recognize one of these three words after the prompt is played on line 48. If none of the words is recognized, or if the user doesn't say anything, the <catch> element on lines 49-51 tells the user that there was a problem and plays the prompt again until the user says one of the options: small, medium, or large. Once the user provides valid input, the <filled> element for the size field is executed on lines 52-56. Notice that this grammar contains only single words rather than phrases.
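The prompt, catch, and filled cycle described above can be modeled roughly in Python. This is only a sketch of the control flow, not VoiceXML: the function name collect_size and the replies list (one entry per prompt, with None standing for silence) are assumptions made for the illustration.

```python
# Words accepted by the size field (taken from the keyword grammar above).
SIZES = ("small", "medium", "large")


def collect_size(replies):
    """Model the prompt / <catch> / <filled> cycle for the size field.

    `replies` is a hypothetical list of caller utterances, one per prompt;
    None stands for silence. Returns (size, prompts_played), where size is
    None if the caller never gave valid input.
    """
    prompts_played = 0
    for reply in replies:
        prompts_played += 1                   # the prompt is played
        word = (reply or "").strip().lower()
        if word in SIZES:                     # grammar matched: <filled> runs
            return word, prompts_played
        # no match or silence: <catch> reports the problem, then we reprompt
    return None, prompts_played
```

For example, collect_size([None, "extra large", "Medium"]) returns ("medium", 3): silence and an out-of-grammar phrase each trigger a reprompt before the third reply matches.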

Phrase grammars

The second inline grammar on lines 23-29 works within the scope of the pizza_type form field and will recognize one of three phrases only:

  • hand tossed
  • deep dish
  • stuffed crust

The three phrases are surrounded by parentheses. This indicates that all words inside the parentheses must be spoken for a match to occur. We can mark words in the phrase as optional by prefixing them with a ? character. For example, to make hand, deep, and crust optional, we would change the grammar so it looked like the following:

       ( ?hand tossed )
       ( ?deep dish )
       ( stuffed ?crust )

So if the user just said "tossed", we would match hand tossed. We can add alternatives for each selection as well. For example, someone might say "Chicago" instead of deep dish. We might also want to allow someone to specify hand thrown or hand stretched as alternatives to hand tossed. We can do this by specifying the options inside a set of square brackets.

       ( ?hand [tossed stretched thrown] ) 
       ( ?deep [dish chicago] )
       ( stuffed ?crust ) 
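To see exactly which utterances the revised grammar accepts, here is a small Python model of the three operators used so far. It is an illustration only, not a GSL parser: the helpers seq, alt, opt, expand, and recognize are hypothetical names, and mapping each rule to a pizza type name is an assumption, since the grammar's return values are not shown here.

```python
from itertools import product

# Tiny model of three GSL operators (an illustration, not a GSL parser):
#   seq(...)  ~ ( a b )   every part must be spoken, in order
#   alt(...)  ~ [ a b ]   exactly one of the choices
#   opt(x)    ~ ?x        x may be omitted


def seq(*parts):
    return ("seq", parts)


def alt(*choices):
    return ("alt", choices)


def opt(part):
    return ("opt", part)


def expand(node):
    """Return every phrase (as a string) that the node can match."""
    if isinstance(node, str):
        return [node]
    kind, body = node
    if kind == "opt":
        return [""] + expand(body)
    if kind == "alt":
        return [phrase for choice in body for phrase in expand(choice)]
    # seq: combine one expansion of each part, dropping omitted words
    combos = product(*(expand(part) for part in body))
    return [" ".join(w for w in combo if w) for combo in combos]


# Assumed mapping from a pizza type name to its rule.
PIZZA_TYPES = {
    "hand tossed": seq(opt("hand"), alt("tossed", "stretched", "thrown")),
    "deep dish": seq(opt("deep"), alt("dish", "chicago")),
    "stuffed crust": seq("stuffed", opt("crust")),
}


def recognize(utterance):
    """Map an utterance to a pizza type name, or None if nothing matches."""
    spoken = " ".join(utterance.lower().split())
    for name, rule in PIZZA_TYPES.items():
        if spoken in expand(rule):
            return name
    return None
```

Here recognize("tossed") returns "hand tossed", recognize("Chicago") returns "deep dish", and recognize("hand") returns None because the optional word alone is not a complete phrase. Enumerating the phrases also exposes a side effect of the grammar as written: "deep chicago" is accepted too.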


Now we're going to take a look at the external grammar that we reference on line 10, which is used to recognize the user's phone number. This particular grammar is made up of several subgrammars that recognize the area code, the exchange (the first three digits of the local phone number), and the last four digits of the phone number. These subgrammars, or phone number parts, are referenced in the PHONE grammar on lines 1-6 of the listing below. PHONE matches a number when the AREA_CODE, EXCHANGE, and NUMBER grammars are matched in that order, since they're inside a set of parentheses, which requires that all elements of the grammar match.

Notice that each subgrammar called on lines 2-4 includes a colon and a second string, which names a local variable to store the result of the subgrammar. For example, on line 2, we call the AREA_CODE subgrammar and store the number that was matched in the $area variable. These variables are referenced later on line 6, which concatenates the three phone number components into a single number and returns it to the field, which uses it as the value for the phone. Line 6 uses the strcat() function to piece the numbers together. The strcat() function takes two parameters, the second of which is concatenated to the first. To concatenate all three number segments, an inner strcat() call joins $exchange and $number, and an outer call joins $area with the result of the inner call.

The AREA_CODE grammar on lines 8-13 is made up of exactly three DIGITs. The DIGIT grammar on lines 30-41 matches a single digit, zero through nine. Zero can be pronounced either "zero" or "oh". Similarly, the EXCHANGE grammar is made up of three DIGITs, while the NUMBER grammar is made up of four DIGITs.

1  PHONE [
2     ( AREA_CODE:area
3       EXCHANGE:exchange
4       NUMBER:number
5     )
6  ] { return(strcat($area strcat($exchange $number))) }
8  AREA_CODE [
9    ( DIGIT:a
10      DIGIT:b
11      DIGIT:c
12    ) { return(strcat($a strcat($b $c))) }
13  ]
15  EXCHANGE [
16    ( DIGIT:a
17      DIGIT:b
18      DIGIT:c
19    ) { return(strcat($a strcat($b $c))) }
20  ]
22  NUMBER [
23    ( DIGIT:a
24      DIGIT:b
25      DIGIT:c
26      DIGIT:d
27    ) { return(strcat(strcat($a $b) strcat($c $d))) }
28  ]
30  DIGIT [
31    [zero oh] {return(0)}
32    one   {return(1)}
33    two   {return(2)}
34    three {return(3)}
35    four  {return(4)}
36    five  {return(5)}
37    six   {return(6)}
38    seven {return(7)}
39    eight {return(8)}
40    nine  {return(9)}
41  ]
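As a rough check on how the nested strcat() calls assemble the result, the listing can be mirrored in Python. This is a sketch under assumptions, not how a recognizer evaluates GSL: the helper names digits and phone are hypothetical, the input is taken to be an already-transcribed utterance, and digits are handled as strings so that concatenation is well defined.

```python
# Spoken word -> digit, mirroring the DIGIT grammar ("oh" is accepted for 0).
DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def strcat(first, second):
    """Two-argument concatenation, like the strcat() used in the grammar."""
    return first + second


def digits(words, count):
    """Consume `count` spoken digits; return (digit string, remaining words)."""
    if len(words) < count:
        raise ValueError("not enough digits")
    result = ""
    for word in words[:count]:
        if word not in DIGIT:
            raise ValueError("not a digit: " + word)
        result = strcat(result, DIGIT[word])
    return result, words[count:]


def phone(utterance):
    """Model of PHONE: AREA_CODE, then EXCHANGE, then NUMBER, in that order."""
    words = utterance.lower().split()
    area, rest = digits(words, 3)        # AREA_CODE:area
    exchange, rest = digits(rest, 3)     # EXCHANGE:exchange
    number, rest = digits(rest, 4)       # NUMBER:number
    if rest:
        raise ValueError("extra words after the phone number")
    # Same nesting as line 6: the inner call joins exchange and number,
    # the outer call puts the area code in front of that result.
    return strcat(area, strcat(exchange, number))
```

Calling phone("seven oh three five five five one two one two") returns "7035551212". An utterance with too few digits, extra words, or a word outside the DIGIT table raises ValueError, which stands in for a failed match.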

As you can see from the example above, more complex grammars are made up of subgrammars, which may in turn call on other subgrammars, so that we can match any form of speech by breaking the possibilities down into their most elementary components. You might also be surprised at how large our grammar turned out to be for a simple phone number. In fact, dealing with numbers can be a lot more difficult than dealing with words.

Lists in grammars

In the grammar referenced on line 59, we must be able to match one or more toppings without knowing exactly how many toppings the user will select. What we do know is what the available toppings are. Fortunately, GSL includes a number of built-in list operators that make this possible.

1  TOPPINGS [
2    +( TOPPING:topping {insert-end(list $topping)} )
3  ] {return($list)}
5  TOPPING [
6    (?and pepperoni)
7    (?and olives)
8    (?and green peppers)
9    (?and mushrooms)
10   (?and pineapple)
11   (?and anchovies)
12 ] {return($string)}

The TOPPINGS grammar above begins with a + sign outside of a set of parentheses. This means: match one or more occurrences of the TOPPING grammar. The second part of line 2 calls the built-in insert-end function, which adds the new topping that was matched in the TOPPING grammar to the list of toppings that will be returned to the toppings form field in the VoiceXML document.

The TOPPING grammar on lines 5-12 contains our topping selections: pepperoni, olives, green peppers, mushrooms, pineapple, and anchovies. We're also expecting that the user might separate their selections with the word and, which has been flagged as an optional word by prefixing it with a ? character. That concludes our exploration of GSL grammars for now.
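For readers who want to trace the list-building behavior, the two toppings grammars can be modeled in Python as follows. This is a sketch under assumptions rather than a GSL implementation: the function names match_topping and match_toppings are hypothetical, the input is an already-transcribed utterance, and a Python list append stands in for insert-end.

```python
# Topping phrases from the TOPPING grammar.
TOPPING_NAMES = [
    "pepperoni", "olives", "green peppers",
    "mushrooms", "pineapple", "anchovies",
]


def match_topping(words):
    """Match one (?and <topping>) phrase at the start of `words`.

    Returns (topping, remaining words), or None if no topping matches.
    """
    if words and words[0] == "and":      # ?and: skip the optional word
        words = words[1:]
    for name in TOPPING_NAMES:
        parts = name.split()
        if words[:len(parts)] == parts:
            return name, words[len(parts):]
    return None


def match_toppings(utterance):
    """Model of +( TOPPING ): one or more toppings, collected in order.

    Returns the list of toppings, or None if the utterance does not match.
    """
    words = utterance.lower().split()
    toppings = []
    while words:
        result = match_topping(words)
        if result is None:
            return None
        topping, words = result
        toppings.append(topping)         # plays the role of insert-end
    return toppings or None              # + requires at least one match
```

With this model, match_toppings("pepperoni and green peppers and olives") returns ["pepperoni", "green peppers", "olives"], while an empty utterance or one containing an unknown topping returns None.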


I want to reflect on some of the things I've learned about grammars while developing new VoiceXML applications over the past year. First, grammars can be difficult to develop and time-consuming to tune. And things don't stop there. As you begin to collect data on where the ASR application is failing, you will probably need to tune the dictionary that the system is using to include alternate word pronunciations. It's very important that the application be able to recognize what the user is saying most of the time.

Because DTMF input is almost 100% accurate, it should be preferred over speech for things like phone and credit card numbers. However, some voice interface designers recommend that you don't mix touch-tone input with speech input. I'd say mixing them is better than the alternative if you are having problems recognizing number sequences. Remember, speech recognition has gotten much better, but it still takes a great deal of work and care to reach the success rates in the high 90s that vendors often mention.

Thanks again for joining us for another edition of the VoiceXML Developer. In the next edition, we will continue our exploration of grammars as part of our tour of the VoiceXML 1.0 specification. And don't forget to send me feedback on this series. I'd like to know how I'm doing and how I can improve this column. You can send feedback directly to eisen@ferrumgroup.com. Until next time.

About Jonathan Eisenzopf

Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston, Virginia, that specializes in Voice Web consulting and training. He has also written articles for other online and print publications, including WebReference.com and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com with questions or comments about the VoiceXML Developer series, or for more information about training and consulting services.

This article was originally published on October 5, 2002