In the last edition of the VoiceXML Developer, we created a pizza
pie ordering system for Joe’s Pizza Palace, which utilized the GSL
grammar format. In this edition, we’re going to continue our focus on
grammars by examining the other widely used VoiceXML 1.0 grammar
format, JSGF.
Overview
JSGF stands for Java Speech Grammar Format and was developed by Sun
Microsystems. While the top three voice portal providers, Tellme,
BeVocal, and Voxeo, all use the Nuance GSL format, the IBM Voice
Server uses JSGF for grammars. As with GSL, VoiceXML can refer to
a grammar in an external file or specify the grammar in the VoiceXML
document inside the <grammar> element. Both GSL and JSGF are rule-based
grammar formats. The basic syntax of a JSGF grammar rule is:
<rule> = token_string;
where the rule name is surrounded by < and > characters
and the tokens to match appear on the right-hand side of the
equals sign, terminated by a semicolon.
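As a concrete illustration of this syntax, the rule below matches the four-word phrase "thank you very much". The rule name <thanks> is hypothetical and was chosen only for this sketch:

```jsgf
// rule name on the left, tokens to match on the right, semicolon at the end
<thanks> = thank you very much;
```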
Internal Grammars
You can directly embed JSGF grammars within the <grammar> element.
Below is an example of an inline grammar within a VoiceXML document that will match
the utterance, “I like pie”:
<grammar type="text/jsgf">I like pie</grammar>
As with GSL, when a grammar is included within a form's <field>
element, the matched utterance is returned and becomes the value of the field.
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="clown">
    <block name="welcome">Hi, I'm commie the clown.</block>
    <field modal="false" name="greeting">
      <grammar type="text/jsgf">hi | hello | howdy | yo</grammar>
    </field>
  </form>
</vxml>
File grammars
A JSGF file starts with
a JSGF declaration followed by the name of the grammar in the file. The file
can contain multiple rules.
#JSGF V1.0;
grammar greeting;
<greeting> = hi | hello | howdy | yo;
The example above contains the same grammar as the previous inline example,
except that it now lives in an external file. The modified
VoiceXML file, which refers to the external grammar, is below:
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="clown">
    <block name="welcome">Hi, I'm commie the clown.</block>
    <field modal="false" name="greeting">
      <grammar type="application/x-jsgf" src="greeting.jsgf" />
    </field>
  </form>
</vxml>
Using JSGF lists
We can define a list of selections by separating them with the
| character.
<grammar type="text/jsgf">small | medium | large</grammar>
In the example above, the grammar will match small, medium, or large. We
can also make a word optional by surrounding it with a pair of square
brackets:
<grammar type="text/jsgf">small | medium | [real] large</grammar>
The example above will match small, medium, large, or real large. If we were
to reuse the Joe’s Pizza Palace example from the last article and use the
grammar above to match the pizza size, we might want to allow users to provide
alternate utterances for the choices. We can do this by grouping alternatives
within a pair of parentheses.
<grammar type="text/jsgf"> (small|little)|(medium|regular)|[real](large|big) </grammar>
Now a customer can say small or little; medium or regular; or large, big,
real large, or real big.
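The parentheses matter because, in JSGF, a sequence of tokens binds more tightly than the | operator. The sketch below contrasts the ungrouped and grouped forms. The grammar name and both rule names are hypothetical, chosen only for illustration:

```jsgf
#JSGF V1.0;
grammar sizeGrouping;

// Without parentheses, [real] attaches only to "large":
// matches "large", "real large", or "big", but not "real big".
<ungrouped> = [real] large | big;

// With parentheses, [real] applies to the whole group:
// matches "large", "big", "real large", or "real big".
<grouped> = [real] (large | big);
```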
Grammar rules may contain other grammar rules
A grammar can be made up of other grammar rules, which allows us to create
complex grammars by building larger rules out of smaller, more specific
ones. For example, an external grammar that matches a pizza order
is shown below:
#JSGF V1.0;
grammar phone;
<order> = <pizzaSize> <pizzaType> [pizza] [with] <topping>+;
<pizzaSize> = small | medium | large;
<pizzaType> = [hand] (tossed | stretched | thrown)
            | [deep] (dish | chicago)
            | stuffed [crust];
<topping> = [and] pepperoni
          | [and] olives
          | [and] green peppers
          | [and] mushrooms
          | [and] pineapple
          | [and] anchovies;
The grammar above contains four grammar rules. The <order>
rule is composed of three other rules that exist in the same JSGF file. These
are <pizzaSize>, <pizzaType>,
and <topping>. A sample utterance that would match this grammar
is listed below:
small deep dish pizza with olives and pepperoni and anchovies
So a JSGF grammar can contain multiple
words and phrases represented by subgrammars that, when combined, can match a complex
utterance. The grammar above also contains an operator we haven’t talked about
yet. The last part of the <order> grammar rule uses
the <topping> subgrammar followed by a +
character. In GSL, we used the same character to match one or more toppings,
though it was placed before the subgrammar. Placing a +
character after a word, phrase, grouping, or subgrammar tells the speech
recognition engine to look for one or more occurrences of that item. In
this case, we need to match the list of toppings that the customer
would like on their pizza.
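JSGF also provides a * operator, which matches zero or more occurrences of the item it follows. The sketch below is a hypothetical variation on the pizza grammar that makes the toppings optional. The grammar name and the <orderStar> rule are assumptions made for illustration and are not part of the original example:

```jsgf
#JSGF V1.0;
grammar pizzaOrderSketch;

// The whole "[with] <topping>+" phrase is wrapped in brackets, so an order
// with no toppings, such as "small stuffed crust pizza", also matches.
<order> = <pizzaSize> <pizzaType> [pizza] [[with] <topping>+];

// With * (zero or more) the topping list may be empty, but note that this
// form would also accept a dangling "with" at the end of the order.
<orderStar> = <pizzaSize> <pizzaType> [pizza] [with] <topping>*;

<pizzaSize> = small | medium | large;
<pizzaType> = [hand] (tossed | stretched | thrown)
            | [deep] (dish | chicago)
            | stuffed [crust];

// [and] is factored out in front of the group of toppings, which matches
// the same phrases as repeating [and] in every alternative.
<topping> = [and] (pepperoni | olives | green peppers
                 | mushrooms | pineapple | anchovies);
```

Wrapping the phrase in brackets is usually the safer of the two forms here, since it keeps "with" tied to at least one topping.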
Below is a simple VoiceXML fragment that uses the external JSGF grammar
listed above.
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="pizzaOrder">
    <block>Hello, thank you for calling Joe's Pizza palace. May I take your order?</block>
    <field name="order">
      <grammar src="pizzaOrder.jsgf" type="application/x-jsgf" />
    </field>
  </form>
</vxml>
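Callers seldom speak only the bare order; they tend to wrap it in a carrier phrase such as "I would like to order a". One way to allow for that is to put an optional rule in front of the order. The following is a deliberately small sketch rather than a complete ordering grammar: the grammar name, the <request> rule, and its phrasings are all assumptions made for illustration.

```jsgf
#JSGF V1.0;
grammar politeOrder;

// [<request>] makes the carrier phrase optional, so both
// "small pizza" and "I would like to order a small pizza please" match.
<politeOrder> = [<request>] <pizzaSize> pizza [please];

// Sequences bind more tightly than |, so each alternative below is a
// complete carrier phrase with its own optional trailing "a".
<request> = I would like [to order] [a]
          | (give me | can I get) [a];

<pizzaSize> = small | medium | large;
```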
Conclusion
The JSGF format is, in my personal opinion, easier to work with for developing
simple grammars. However, I prefer GSL, because it is more widely supported
and contains more features for building large and
complex grammars. That’s not to say that you can’t do the same with JSGF. IBM’s
platform is more than capable of performing the same interactions as a platform
from Nuance or SpeechWorks.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, based in Reston, Virginia,
which specializes in Voice Web consulting and training. He has also written
articles for other online and print publications including WebReference.com
and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com regarding
questions or comments about the VoiceXML Developer series, or for more
information about training and consulting services.