VoiceXML Developer Series: A Tour Through VoiceXML, Part VIII

Up ’til now, the VoiceXML examples we’ve used have been directed
dialogs, which prompt users for input in a pre-defined order.
In this edition of the VoiceXML Developer, we’re going to learn
how to develop mixed initiative dialogs, which allow users to
fill multiple fields with a single utterance.


When an <initial> element appears in
a VoiceXML form, the VoiceXML interpreter executes
it before gathering input for any of the form’s <field> elements.
The <initial> element relies on a <form>-level
grammar that is defined elsewhere in the form. It can contain prompts and
event handlers, but it cannot contain a <filled> element or a grammar
of its own. Once an utterance matches the form grammar, the VoiceXML
interpreter moves on to the rest of the form. Fields that were filled
by the initial user utterance are normally skipped by the
VoiceXML interpreter.
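
As a rough illustration of that structure, the sketch below shows a form with a form-level grammar, an <initial> element that holds only a prompt and event handlers, and two fields. The grammar file name, the subgrammar names, and the field names are placeholders invented for this sketch, and the <grammar> attributes your platform expects may differ.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="request">
    <!-- Form-level grammar (hypothetical file and subgrammar names). -->
    <grammar src="EXAMPLE.grammar#REQUEST"/>

    <!-- The initial item holds prompts and event handlers only:
         it has no grammar and no filled element of its own. -->
    <initial name="start">
      <prompt>What would you like to do?</prompt>
      <noinput>
        Sorry, I did not hear you.
        <reprompt/>
      </noinput>
      <nomatch count="1">
        Sorry, I did not understand.
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        Let us take it one step at a time.
        <!-- Mark the initial item as done so the fields are prompted one by one. -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <!-- Fields filled by the initial utterance are skipped; the rest are prompted. -->
    <field name="first_item">
      <grammar src="EXAMPLE.grammar#FIRST"/>
      <prompt>Please say the first item.</prompt>
    </field>

    <field name="second_item">
      <grammar src="EXAMPLE.grammar#SECOND"/>
      <prompt>Please say the second item.</prompt>
    </field>
  </form>
</vxml>
```

The second <nomatch> handler is one possible way to fall back to a directed dialog when the caller's opening utterance is not understood twice in a row.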

This technique enables us to create a grammar that is capable
of matching multiple field values in a single utterance and
also allows the user to control the order of the input. A great
example of this is a bank application where a user wants to transfer
$100.00 from their savings to their checking account. In a directed
dialog, the dialog progression is controlled by the computer and
takes multiple prompts to collect all of the information:

Computer: Please say the type of account you 
would like to transfer the funds from.
Customer: savings.
Computer: Please say the type of account you would 
like to transfer the funds to.
Customer: checking.
Computer: Please say the amount that you would like 
to transfer.
Customer: One hundred dollars.
Computer: Transferring one hundred dollars from your 
savings to your checking account. Is this correct?
Customer: yes.

In a mixed initiative dialog, the user could simply tell the
system what to do in a single natural sentence:

Customer: Transfer one hundred dollars from savings 
to checking.
Computer: Transferring one hundred dollars from your 
savings to your checking account. Is this correct?
Customer: yes.

Wow, that’s powerful. It means less time per call and, if done right, it will
make your customers happy too.

Example 5

To test this application, call VoiceXML Planet at 510-315-6666;
press 1 to listen to the demos, then press 5 to hear this example. This example
is a variation of the Pizza Palace example that we developed in Part V of this
series. This time, we’re developing an interface for Frank’s Pizza Palace,
a fierce competitor of Joe’s Pizza Palace. Frank would like to implement a
streamlined version of Joe’s order application and allow customers to tell
the system their order in a more natural way.

view example 5

The first thing that you should notice is that we’ve defined a form-level grammar
on line 7. The <initial> element on line 8 contains a <prompt>, which
plays the initial prompt for the document and waits for the user to speak. The
system will attempt to match the utterance against PIZZA.grammar#ORDER,
a GSL subgrammar named ORDER contained in the grammar file named
PIZZA.grammar. After the form grammar matches an utterance, the
interpreter may prompt the user for more information if the initial utterance
didn’t fill all three form fields. For example, if I were to say, “I’d like a
small”, then the system would set the pizza_size field to “small”,
and then proceed to prompt me for input for the pizza_type
and pizza_toppings fields.
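
For readers who cannot open the linked listing, here is a minimal sketch of what such a form might look like. It is a reconstruction based on the description in this article, not the original listing, so the line numbers cited in the text will not match it. The prompt wording and the per-field subgrammar references (PIZZA.grammar#SIZE and so on) are assumptions.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="order">
    <!-- Form-level grammar: can fill any or all of the three fields. -->
    <grammar src="PIZZA.grammar#ORDER"/>

    <!-- Open-ended prompt; the caller may give the whole order at once. -->
    <initial name="start">
      <prompt>Welcome to Frank's Pizza Palace. What would you like to order?</prompt>
    </initial>

    <!-- Each field is visited only if the initial utterance left it empty. -->
    <field name="pizza_size">
      <grammar src="PIZZA.grammar#SIZE"/>
      <prompt>What size pizza would you like?</prompt>
    </field>

    <field name="pizza_type">
      <grammar src="PIZZA.grammar#TYPE"/>
      <prompt>What type of crust would you like?</prompt>
    </field>

    <field name="pizza_toppings">
      <grammar src="PIZZA.grammar#TOPPINGS"/>
      <prompt>What toppings would you like?</prompt>
    </field>

    <!-- Runs once all three fields have values, and plays the order back. -->
    <filled mode="all" namelist="pizza_size pizza_type pizza_toppings">
      <prompt>
        You ordered a <value expr="pizza_size"/> <value expr="pizza_type"/> pizza
        with <value expr="pizza_toppings"/>.
      </prompt>
    </filled>
  </form>
</vxml>
```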

OK, let’s take a look at the grammar file. This grammar file is used not
only to match the utterance collected by the <initial> element, but also to
fill the remaining form fields when the user’s first utterance does not fill
all of them.

view the grammar file

Line 1 contains the ORDER subgrammar, which is set
as the <form> grammar. A customer could say any one of the following
utterances and match all three fields:

  • I’d like a large hand tossed pizza with pepperoni and mushrooms.
  • I would like a small pizza deep dish olives.
  • Small mushroom pizza stuffed crust.

There are many more possible utterances that we’ve left out. The point
here is that we can accommodate the many different combinations that customers might provide.

The ORDER subgrammar contains the PIZZA subgrammar,
which begins on line 5 and continues through line 13. This subgrammar is essentially a listing
of possible combinations, one per line, of how a customer might order their pizza. We’ve only
listed a few possibilities; there would likely be many more. The PIZZA
subgrammar in turn contains the SIZE, TOPPINGS, and
TYPE subgrammars.

Let’s take a closer look at these three subgrammars.
On line 25 of the TYPE subgrammar, you’ll notice a set of curly
brackets that contain the statement:

<pizza_type $string>

The curly brackets contain the value that the subgrammar will return, and the
statement above assigns the $string variable, which holds the matched string, to the
pizza_type slot. This tells the interpreter to assign
the result of the match to the pizza_type form field. This is
how a grammar is able to set field values in a mixed initiative VoiceXML dialog.
You should see similar statements on lines 19 and 30 that fill the values
for the pizza_size and pizza_toppings form fields.
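
To make the slot-filling idea concrete, here is a small GSL sketch in the spirit of the grammar file described above. It is not the original PIZZA.grammar, so its line numbers do not correspond to those cited in the text. The PRE subgrammar and the specific word lists are assumptions made for illustration. Square brackets list alternatives, parentheses group a sequence of words, and a leading question mark marks an optional part.

```text
; ORDER is the form-level grammar: an optional lead-in followed by an order.
ORDER (?PRE PIZZA)

; Hypothetical lead-in phrases a caller might use.
PRE [
  (i would like a)
  (i want a)
]

; A few of the many ways a caller might phrase an order, one per line.
PIZZA [
  (SIZE TYPE pizza with TOPPINGS)
  (SIZE pizza TYPE TOPPINGS)
  (SIZE TOPPINGS pizza TYPE)
  (SIZE)
]

; Each subgrammar copies the words it matched into the slot
; named after the corresponding form field.
SIZE [
  [small medium large] {<pizza_size $string>}
]

TYPE [
  [(hand tossed) (deep dish) (stuffed crust)] {<pizza_type $string>}
]

TOPPINGS [
  [pepperoni mushroom mushrooms olives (pepperoni and mushrooms)]
    {<pizza_toppings $string>}
]
```

The last PIZZA alternative, a bare SIZE, is intended to accept a partial order such as “I would like a small”. In that case only the pizza_size slot is filled, and the interpreter still has to prompt for the other two fields.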

If the initial utterance does not fill all of the form fields, the interpreter
visits each remaining field in turn and uses that field’s subgrammar to fill it. Once all
fields have been filled, we play the customer’s order back to them on lines
23-27 of the VoiceXML document.


Mixed initiative dialogs are the heart and soul of next-generation voice
dialogs. We will be covering mixed initiative dialogs in more detail in the
future. Thanks again for joining us for another edition of the VoiceXML
Developer Tour Through VoiceXML.

About Jonathan Eisenzopf

Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston, Virginia,
that specializes in Voice Web consulting and training. He has also written
articles for other online and print publications including WebReference.com
and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com regarding
questions or comments about the VoiceXML Developer series, or for more
information about training and consulting services.
