Building VoiceXML Dialogs

  • By Jeff Kusnitz & Dr. Bruce Lucas

As mentioned earlier, each field specifies the set of acceptable user responses. Limiting the acceptable responses serves two purposes. First, it allows the VoiceXML browser to verify responses and to provide help for an invalid response locally, without the delay of a round trip over the network to the application server. Second, constraining user input to particular sets and patterns of words is essential to achieving good speech-recognition accuracy, particularly over a relatively low-quality audio channel like a telephone.
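
A minimal sketch of what this local handling can look like, assuming a hypothetical field named "pin" that uses the built-in "digits" type. The <nomatch>, <noinput>, and <help> handlers are standard VoiceXML 2.0 elements; the wording of the prompts is illustrative only. All three handlers run in the VoiceXML browser, so rejecting an out-of-grammar response or offering help does not require a request to the application server:

```xml
<field name="pin" type="digits">
    <prompt>Please say your PIN.</prompt>

    <!-- The user said something the field's grammar does not accept. -->
    <nomatch>
        Sorry, I did not understand. Please say digits only.
        <reprompt/>
    </nomatch>

    <!-- The user said nothing before the timeout. -->
    <noinput>
        I did not hear anything.
        <reprompt/>
    </noinput>

    <!-- The user asked for help. -->
    <help>
        Say each digit of your PIN, for example, one two three four.
        <reprompt/>
    </help>
</field>
```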

In the earlier example, the set of acceptable user inputs is specified implicitly using the "type" attribute ("phone" and "digits" in the example) in the field element. The VoiceXML 2.0 specification defines a number of built-in types that a VoiceXML browser may optionally provide:

  • boolean
  • date
  • digits
  • currency
  • number
  • phone
  • time
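
As an illustration, the fragment below sketches a form that relies only on built-in types. The form and field names ("reservation", "party_size", "when", "confirm") are hypothetical, and because support for the built-in types is optional, a given browser may not provide all of them:

```xml
<form id="reservation">
    <!-- "number" accepts spoken numbers such as "four". -->
    <field name="party_size" type="number">
        <prompt>How many people are in your party?</prompt>
    </field>

    <!-- "time" accepts spoken times of day. -->
    <field name="when" type="time">
        <prompt>What time would you like to arrive?</prompt>
    </field>

    <!-- "boolean" accepts affirmative and negative answers such as "yes" and "no". -->
    <field name="confirm" type="boolean">
        <prompt>Shall I reserve the table?</prompt>
    </field>
</form>
```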

In addition to these built-in types, a VoiceXML application can specify its own input types using grammars. A grammar is essentially a compact enumeration of a set of allowable phrases. The following VoiceXML fragment illustrates the use of grammars in an online voice-enabled restaurant application:


    <form>
        <field name="drink">
            <prompt>What would you like to drink?</prompt>
            <grammar mode="voice" xml:lang="en-US" version="1.0" root="drink">
                <rule id="drink">
                    <one-of>
                        <item> coffee </item>
                        <item> tea </item>
                        <item> orange juice </item>
                        <item> milk </item>
                        <item> nothing </item>
                    </one-of>
                </rule>
            </grammar>
        </field>

        <field name="sandwich">
            <prompt>What sandwich would you like?</prompt>
            <grammar src="sandwiches.grxml"/>
        </field>

        <block>
            <submit next="http://www.example.com/servlet/getOrder"/>
        </block>
    </form>


The grammars in this example are specified using the W3C Speech Recognition Grammar Specification [SRGS] format. The first grammar is in-line, and it simply identifies a list of words and phrases ("coffee", "tea", and so on) that the user may say in response to the prompt for that field. Surrounding the list of items with a <one-of> element tells the VoiceXML browser that the user can speak only one of these items at a time.

The second grammar is contained in the file "sandwiches.grxml" and is referenced via a URI:

    <grammar mode="voice" xml:lang="en-US" version="1.0"
        xmlns="http://www.w3.org/2001/06/grammar" root="sandwich">

        <rule id="bread">
            <one-of>
                <item> rye </item>
                <item> white </item>
                <item> whole wheat </item>
            </one-of>
        </rule>

        <rule id="ingredient">
            <one-of>
                <item> ham </item>
                <item> roast beef </item>
                <item> tomato </item>
                <item> lettuce </item>
                <item>
                    <item> swiss </item>
                    <item repeat="0-1"> cheese </item>
                </item>
            </one-of>
        </rule>

        <rule id="sandwich">
            <ruleref uri="#ingredient"/>
            <item repeat="0-">
                <item repeat="0-1"> and </item>
                <ruleref uri="#ingredient"/>
            </item>
            <item> on </item>
            <ruleref uri="#bread"/>
        </rule>

    </grammar>


This grammar consists of three rules. The first rule, named "bread", is just a list of bread types, similar to the "drink" grammar that was placed in-line in the form. It allows the user to say "rye", "white", or "whole wheat".

The second rule in this grammar, named "ingredient", is also a fairly simple list of items, but one of its items includes an optional part. The last item in the rule sets a repeat attribute of "0-1" on the "cheese" item, which makes "cheese" optional. To match this item, the user can say either "swiss" or "swiss cheese".
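
The same repeat="0-1" idiom applies anywhere a word may be dropped. As a sketch, the hypothetical rule below (the "topping" rule and its words are not part of the grammar above) accepts "mustard", "mayo", "extra mustard", or "extra mayo":

```xml
<rule id="topping">
    <!-- repeat="0-1" lets the item occur zero times or once, so "extra" is optional. -->
    <item repeat="0-1"> extra </item>
    <one-of>
        <item> mustard </item>
        <item> mayo </item>
    </one-of>
</rule>
```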

The third rule, named "sandwich", specifies that a complete description of a sandwich consists of a series of references to the other rules defined within this grammar. It states that a "sandwich" is made up of at least one ingredient, followed by zero or more additional ingredients optionally separated by the word "and", and ending with the word "on" followed by the name of a bread. This rule would accept phrases such as "ham and swiss on rye" and "lettuce and tomato on whole wheat".
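
The repeat attribute also accepts a bounded range of the form "m-n". As a sketch, the hypothetical rule below (named "small_sandwich" here; it is not part of the original grammar) could sit alongside the other rules in "sandwiches.grxml" and limit a sandwich to between one and three ingredients:

```xml
<rule id="small_sandwich">
    <!-- One required ingredient... -->
    <ruleref uri="#ingredient"/>
    <!-- ...followed by at most two more, each optionally preceded by "and". -->
    <item repeat="0-2">
        <item repeat="0-1"> and </item>
        <ruleref uri="#ingredient"/>
    </item>
    <item> on </item>
    <ruleref uri="#bread"/>
</rule>
```

Under this rule, "ham on rye" and "ham and swiss cheese and tomato on white" would match, while a phrase naming four ingredients would not.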


This article was originally published on August 16, 2004
