Up ’til now, the VoiceXML examples we’ve used have been directed
dialogs, which prompt users for input in a pre-defined order.
In this edition of the VoiceXML Developer, we’re going to learn
how to develop mixed initiative dialogs, which allow users to
fill multiple fields with a single utterance.
Overview
When an <initial> element appears in a VoiceXML form, the VoiceXML
interpreter visits it before gathering input for any of the form's
<field> elements. The <initial> element relies on a <form>-level grammar
that is defined elsewhere in the form. It can contain prompts and
event handlers, but it cannot contain a <filled> element or a grammar
of its own. Once an utterance matches the form grammar, the VoiceXML
interpreter moves on to the remainder of the form. Fields that were filled
as a result of the initial user utterance will normally be skipped by the
VoiceXML interpreter.
This technique enables us to create a grammar that is capable
of matching multiple field values in a single utterance and
also allows the user to control the order of the input. A great
example of this is a bank application where a user wants to transfer
$100.00 from their savings to their checking account. In a directed
dialog, the dialog progression is controlled by the computer and
takes multiple prompts to collect all of the information:
Computer: Please say the type of account you would like to transfer the funds from.
Customer: Savings.
Computer: Please say the type of account you would like to transfer the funds to.
Customer: Checking.
Computer: Please say the amount that you would like to transfer.
Customer: One hundred dollars.
Computer: Transferring one hundred dollars from your savings to your checking account. Is this correct?
Customer: Yes.
In a mixed initiative dialog, the user could simply tell the
system what to do in a single natural sentence:
Customer: Transfer one hundred dollars from savings to checking.
Computer: Transferring one hundred dollars from your savings to your checking account. Is this correct?
Customer: Yes.
Wow, that’s powerful. It means less time per call and, if done right,
happier customers too.
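To make the mechanics concrete, here is a minimal sketch of how the funds transfer dialog above might be written as a mixed initiative form. It is an illustration under assumptions, not a listing from this series: the grammar file BANK.grammar, its subgrammars (TRANSFER, FROM_ACCOUNT, TO_ACCOUNT, AMOUNT), the field names, and the prompt wording are all hypothetical. Note that the <initial> element holds only prompts and event handlers, and that it listens with the form-level grammar.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="transfer">
    <!-- Form-level grammar (hypothetical file and subgrammar name).
         It is assumed to fill the from_account, to_account, and amount slots. -->
    <grammar src="BANK.grammar#TRANSFER"/>

    <!-- Visited before any field: prompts and event handlers only,
         no <filled> element and no grammar of its own. -->
    <initial name="start">
      <prompt>What would you like to do?</prompt>
      <nomatch count="1">
        <prompt>
          You can say something like, transfer one hundred dollars
          from savings to checking.
        </prompt>
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        <prompt>Sorry, I still did not understand. I will ask one question at a time.</prompt>
        <!-- Giving the item a value means it is not visited again,
             so the form falls back to directed, field-by-field prompting. -->
        <assign name="start" expr="true"/>
      </nomatch>
    </initial>

    <!-- Any field already filled by the first utterance is skipped. -->
    <field name="from_account">
      <grammar src="BANK.grammar#FROM_ACCOUNT"/>
      <prompt>Please say the type of account you would like to transfer the funds from.</prompt>
    </field>

    <field name="to_account">
      <grammar src="BANK.grammar#TO_ACCOUNT"/>
      <prompt>Please say the type of account you would like to transfer the funds to.</prompt>
    </field>

    <field name="amount">
      <grammar src="BANK.grammar#AMOUNT"/>
      <prompt>Please say the amount that you would like to transfer.</prompt>
    </field>

    <!-- Reached only after the three fields above hold values. -->
    <field name="confirm" type="boolean">
      <prompt>
        Transferring <value expr="amount"/> from your
        <value expr="from_account"/> to your
        <value expr="to_account"/> account. Is this correct?
      </prompt>
      <filled>
        <if cond="confirm">
          <!-- Hand the transfer to the (hypothetical) back end here. -->
          <prompt>Thank you.</prompt>
        <else/>
          <!-- Clear every form item, including <initial>, and start over. -->
          <clear/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

If the caller says the whole sentence at the first prompt, all three fields are filled at once and the interpreter goes straight to the confirmation. If the caller says only part of it, the remaining fields are prompted for in document order.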
Example 5
To test this application, call VoiceXML Planet at 510-315-6666;
press 1 to listen to the demos, then press 5 to hear this example. This example
is a variation of the Pizza Palace example that we developed in Part V of this
series. This time, we’re developing an interface for Frank’s Pizza Palace,
a fierce competitor of Joe’s Pizza Palace. Frank would like to implement a
streamlined version of Joe’s order application and allow customers to tell
the system their order in a more natural way.
The first thing that you should notice is that we’ve defined a form-level grammar
on line 7. The <initial> element on line 8 contains a <prompt>, which
plays the initial prompt for the document and waits for the user to speak. The
system will attempt to match the utterance against PIZZA.grammar#ORDER,
a GSL subgrammar named ORDER contained in the grammar file named
PIZZA.grammar. After the form grammar matches an utterance, the interpreter
prompts the user for more information if the initial utterance didn’t fill all
three form fields. For example, if I were to say, “I’d like a small”, then the
system would set the pizza_size field value to equal “small”,
and then proceed to prompt me for input for the pizza_type
and pizza_toppings fields.
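The original listing is not reproduced here, so the line numbers mentioned in the text refer to that listing and not to the sketch below. As a rough reconstruction under assumptions, a form of this shape might look like the following. The field names and the PIZZA.grammar#ORDER reference come from the text; the prompt wording, the form id, and the per-field references to the SIZE, TYPE, and TOPPINGS subgrammars are guesses.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="order">
    <!-- Form-level grammar: the ORDER subgrammar in PIZZA.grammar. -->
    <grammar src="PIZZA.grammar#ORDER"/>

    <!-- Plays the opening prompt and waits for the caller to speak.
         The wording is an assumption. -->
    <initial name="start">
      <prompt>Welcome to Frank's Pizza Palace. What would you like to order?</prompt>
      <noinput>
        <prompt>Please tell me the size, type, and topping you would like.</prompt>
        <reprompt/>
      </noinput>
    </initial>

    <!-- Each field is visited only if the first utterance left it empty. -->
    <field name="pizza_size">
      <grammar src="PIZZA.grammar#SIZE"/>
      <prompt>What size pizza would you like?</prompt>
    </field>

    <field name="pizza_type">
      <grammar src="PIZZA.grammar#TYPE"/>
      <prompt>What type of pizza would you like?</prompt>
    </field>

    <field name="pizza_toppings">
      <grammar src="PIZZA.grammar#TOPPINGS"/>
      <prompt>What topping would you like?</prompt>
    </field>

    <!-- Runs once all three fields hold values: read the order back. -->
    <filled>
      <prompt>
        You ordered a <value expr="pizza_size"/>
        <value expr="pizza_type"/> pizza with
        <value expr="pizza_toppings"/>.
      </prompt>
    </filled>
  </form>
</vxml>
```

With this layout, saying “I’d like a small” at the first prompt fills pizza_size only, so the interpreter skips that field and asks the pizza_type and pizza_toppings questions in turn before reading the order back.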
Ok, let’s take a look at the grammar file. This grammar file supplies not
only the form-level grammar that the <initial> element listens with, but also
the grammars for the individual form fields, which are used in the event that
the user’s first utterance does not fill all the fields.
Line 1 contains the ORDER subgrammar, which is set
as the <form> grammar. A customer could say any one of the following
utterances and match all three fields:
There are many more utterances, and many more possibilities that we’ve left out. The point
here is that we can accommodate the many different combinations that customers might provide.
The ORDER subgrammar contains the PIZZA subgrammar,
which begins on line 5 and continues through line 13. This subgrammar is essentially a listing
of possible combinations, one per line, of how a customer might order their pizza. We’ve only
listed a few possibilities. There would likely be many more. The PIZZA
subgrammar in turn contains the SIZE, TOPPINGS, and
TYPE subgrammars. Let’s take a closer look at these three subgrammars.
On line 25 of the TYPE subgrammar, you’ll notice a set of curly
brackets that contain the statement:
<pizza_type $string>
The curly brackets contain the value that the subgrammar will return, and the
statement above assigns the $string variable, which holds the matched string, to the
pizza_type slot. In effect, this tells the interpreter to assign
the result of the match to the pizza_type form field. This is
how a grammar is able to set field values in a mixed initiative VoiceXML dialog.
You should see similar statements on lines 19 and 30 that fill the values
for the pizza_size and pizza_toppings form
fields.
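The slot-filling statement is easier to see in isolation. The fragment below is only a sketch: it inlines small GSL grammars directly in each field instead of referencing PIZZA.grammar, the word lists are made up, and the application/x-gsl type string is an assumption that varies between platforms. The part that mirrors the text is the command in curly brackets, which copies the matched words into the slot that has the same name as the form field.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="order_fields">
    <field name="pizza_size">
      <!-- Square brackets list alternatives; the command in curly
           brackets stores the matched word in the pizza_size slot. -->
      <grammar type="application/x-gsl">
        <![CDATA[
          [small medium large] {<pizza_size $string>}
        ]]>
      </grammar>
      <prompt>What size pizza would you like?</prompt>
    </field>

    <field name="pizza_type">
      <!-- Parentheses group a sequence of words into one alternative.
           The type names here are hypothetical. -->
      <grammar type="application/x-gsl">
        <![CDATA[
          [(hand tossed) (deep dish)] {<pizza_type $string>}
        ]]>
      </grammar>
      <prompt>What type of pizza would you like?</prompt>
    </field>

    <field name="pizza_toppings">
      <grammar type="application/x-gsl">
        <![CDATA[
          [cheese pepperoni sausage] {<pizza_toppings $string>}
        ]]>
      </grammar>
      <prompt>What topping would you like?</prompt>
    </field>
  </form>
</vxml>
```

Because each slot name matches a field name, the same commands work whether the subgrammar is reached through the form-level grammar or through an individual field.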
If the initial utterance does not fill all of the form fields, then the
interpreter visits each remaining field in turn, and the subgrammar referenced
within that field fills it. Once all fields have been filled, we play the
customer’s order back to them on lines 23-27 of the VoiceXML document.
Conclusion
Mixed initiative dialogs are the heart and soul of next generation voice
dialogs. We will be covering mixed initiative dialogs in more detail in the
future. Thanks again for joining us for another edition of the VoiceXML
Developer Tour Through VoiceXML.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, based in Reston, Virginia,
which specializes in Voice Web consulting and training. He has also written
articles for other online and print publications including WebReference.com
and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com regarding
questions or comments about the VoiceXML Developer series, or for more
information about training and consulting services.