In this edition of the series, we complete the first version
of the Frank’s Pizza Palace application by developing the remaining VoiceXML dialogs.
Last time, we developed the first three dialogs in our
application. Now it’s time to complete the rest of the dialogs and
begin testing our application.
The first three dialogs were main.vxml,
telephone_number.vxml, and validate_phone_number.vxml. These dialogs
played a greeting for the user, prompted them for their phone
number, and looked up their address in the Access database.
Assuming that the user did indeed have a record in the database
and confirmed that it was correct, the dialog transitions to
Now it’s time to take the customer’s order (view source).
This VoiceXML dialog is similar to the pizza ordering application we
developed in an earlier edition of this series. Some things have
changed, however.
The customer’s phone number
has been stored in the application root as
through a previous dialog. I’ve also
added a <property> element on line 4. VoiceXML properties
provide various controls on how a VoiceXML dialog functions. This
particular property sets the minimum confidence level that the ASR
must achieve to successfully recognize an utterance. Values
range from 0.0 to 1.0, corresponding to a confidence of 0% to 100%.
A value of 1 tells the ASR that it must be 100%
confident that it has recognized an utterance. Most ASRs
are set to 0.5 (or 50%) by default. I have lowered the threshold to 0.3 (30%)
so that the ASR will not reject utterances it has in fact recognized correctly (false negatives). I chose
this value after testing the grammar for a while in Nuance V-Builder. I found that while the confidence level fell below 50%
when there was background noise, the ASR still produced accurate
results the majority of the time.
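As a rough illustration, a confidence threshold like the one described above could be declared as follows. This is a minimal sketch, not the actual take_order.vxml file: it assumes VoiceXML 2.0 syntax, and the form id, field name, prompt wording, and grammar URI are hypothetical. Only the confidencelevel property and the 0.3 value come from the discussion above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Accept recognition results at or above 30% confidence.
       Results below the threshold raise a nomatch event. -->
  <property name="confidencelevel" value="0.3"/>

  <form id="take_order">
    <field name="size">
      <!-- Hypothetical grammar URI, shown for illustration only -->
      <grammar src="PIZZA.grammar#SIZE"/>
      <prompt>What size pizza would you like?</prompt>
      <nomatch>
        Sorry, I did not catch that.
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
```

Placing the property at the document level applies it to every form in the file. The same element can also sit inside a single form or field to narrow its scope.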
This is a mixed initiative dialog, meaning that a user can fill in multiple
fields with a single utterance. The <initial> element, which begins on
line 7, provides this functionality. This section
of the application will execute first and try to
match the grammar referenced on line 6. I’ve made some
significant changes to the PIZZA subgrammar in the PIZZA.grammar file since
then (view source). The reason for the change has
to do with the many variations that a customer might use to
order a pizza. After coding and testing about 20 additional
variations, I realized that the grammar was ripe for consolidation using the
positive closure (+) and Kleene closure (*) operators. A positive closure will
match one or more occurrences of the phrase that is located to the
right of the operator. The Kleene closure will match zero or more occurrences
of the phrase to its right.
Line 6 is listed below:
+([SIZE TYPE TOPPINGS] *[pizza with])
The + (or positive closure) operator enables this subgrammar to
match numerous variations of a pizza order. A customer can start with
pizza size, type, or toppings, optionally followed by the words pizza
and/or with. This grammar will match any of the following utterances:
- small hand tossed pepperoni pizza
- deep dish large mushroom and pepperoni pizza
- small pepperoni
- pizza with olives and mushrooms
The number of possible utterances that this grammar will match
is too high to count (for me at least). One side effect
of this more open grammar is that ASR confidence for matches went
down from 60%-85% to as low as 40%. The rate of incorrect matches
also rose in some cases where I was not speaking directly into
the microphone or did not speak clearly. After lowering the confidence
property and tuning the grammar a bit, I decided that the greater
breadth of possibilities was worth the tradeoff. Of course, if the
grammar only matches a few of the form fields, the application
can prompt for the unfilled values separately. In cases where grammars
start becoming more dynamic, it may be necessary to process the matched
text to see if the ASR actually provided a false match. This requires
some fancy text processing and/or natural language processing
techniques, which we’ll save for another time.
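The mixed initiative flow described above, where the form-level grammar is tried first and any unfilled fields are then prompted for one at a time, might be laid out roughly like this. It is a sketch under several assumptions: VoiceXML 2.0 syntax, grammar slot names that match the field names size, type, and toppings, and SIZE, TYPE, and TOPPINGS subgrammars that can be referenced individually. The prompt wording and the name of the initial item are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="take_order">
    <!-- Form-level grammar: one utterance may fill several fields at once -->
    <grammar src="PIZZA.grammar#PIZZA"/>

    <!-- Visited first; collects whatever the form-level grammar can match -->
    <initial name="start">
      <prompt>What kind of pizza would you like?</prompt>
      <nomatch count="1">
        For example, say small hand tossed pepperoni pizza.
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        Let's take it one step at a time.
        <!-- Marking the initial item as filled switches to directed prompts -->
        <assign name="start" expr="true"/>
      </nomatch>
    </initial>

    <!-- Each field is prompted for only if the first utterance left it empty -->
    <field name="size">
      <grammar src="PIZZA.grammar#SIZE"/>
      <prompt>What size would you like?</prompt>
    </field>
    <field name="type">
      <grammar src="PIZZA.grammar#TYPE"/>
      <prompt>What type of pizza would you like?</prompt>
    </field>
    <field name="toppings">
      <grammar src="PIZZA.grammar#TOPPINGS"/>
      <prompt>Which toppings would you like?</prompt>
    </field>
  </form>
</vxml>
```

If a caller says only "small pepperoni", the size and toppings fields are filled by the first utterance and the form goes on to ask for the type alone.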
Yet another difference is the fact that we are using pre-recorded
prompts instead of synthesized speech from the TTS engine. This really
enhances the quality and usability of the application.
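For readers who have not used recorded prompts before, the general pattern looks something like the following. This is only a sketch in VoiceXML 2.0 syntax. The audio file path, form id, and prompt wording are hypothetical and are not taken from the application's source.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <prompt>
        <!-- Plays the recording; the enclosed text is spoken by the TTS
             engine only if the audio file cannot be fetched or played -->
        <audio src="audio/what_would_you_like.wav">
          What would you like to order?
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```

Keeping the fallback text in sync with the recording means the dialog still works, with lower audio quality, if a file goes missing.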
Once we’ve filled all the fields in the form, the input is
sent to the save_order.asp script.
This PerlScript ASP file (view source)
is responsible for taking the form field values passed
from the take_order.vxml dialog and saving them to the PizzaOrder table in the Access database. If an error occurs while saving the record,
we transfer the caller to an operator on line 37. Lines 7 and 8 open a connection to
the Access database. Lines 11 through 14 retrieve the form field values from the ASP
Request object. Remember, we process VoiceXML forms on the backend the same way we do HTML forms.
Lines 15 through 25 convert the phone number from words to numbers and strip out
any extraneous text.
Now that we have a connection to the database and have retrieved our form data,
it’s time to build the SQL string that will save the data to the PizzaOrder table.
Line 28 builds the INSERT SQL statement that is sent to Access with the ADO
Execute method on line 29. Note that the syntax for the SQL statement may differ if you
decide to use a database other than Access.
Line 32 tests the results of the Execute command. If an error occurs,
we will output an error message and transfer the caller to a live operator.
Otherwise, we will thank the customer for their order and end the call.
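The VoiceXML returned in the error case might look something like the sketch below. It assumes VoiceXML 2.0 syntax and a blind transfer. The form id, prompt wording, and operator telephone number are placeholders and are not taken from save_order.asp.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order_failed">
    <block>
      <prompt>
        We're sorry, we could not save your order.
        Please hold while we transfer you to an operator.
      </prompt>
    </block>
    <!-- Blind transfer: the platform hands the call off and ends the session.
         The destination number is a placeholder. -->
    <transfer name="operator" dest="tel:+15555550100" bridge="false"/>
  </form>
</vxml>
```

The success response would follow the same shape, with a thank-you prompt in the block and an element such as <disconnect/> in place of the transfer.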
save_address.asp is now upload_audio.pl
Now let’s step back to the
validate_phone_number.asp script. As you may recall, if
we do not find a customer’s address for a given phone number,
or if the customer rejects the address on file, they are taken
to the record_address form to record their address.
This information is saved in a variable named
AddressAudio and submitted to save_address.asp.
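A recording form along these lines could be sketched as follows. It assumes VoiceXML 2.0 syntax. The record_address form id and the AddressAudio variable come from the text above, while the phone_number variable, its sample value, the timing attributes, and the prompt wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical variable name and value; the application keeps the
       caller's phone number in the application root document -->
  <var name="phone_number" expr="'7035550100'"/>

  <form id="record_address">
    <record name="AddressAudio" beep="true" maxtime="30s"
            finalsilence="3s" type="audio/x-wav">
      <prompt>Please say your delivery address after the beep.</prompt>
      <filled>
        <!-- Recorded audio has to be posted as multipart/form-data -->
        <submit next="save_address.asp" method="post"
                enctype="multipart/form-data"
                namelist="AddressAudio phone_number"/>
      </filled>
    </record>
  </form>
</vxml>
```

The next attribute is the only part of the form that has to change if a different server-side script handles the upload.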
One of the strange annoyances of ASP is its inability to handle
multipart form submissions, which are used to upload files from an
HTML form. You would have thought Microsoft would have added this
feature after 3 versions of ASP. They did finally add better support
in ASP.NET, but this application is being developed in ASP 3
using PerlScript. I had hoped to find a reliable Perl script to
handle binary files in ASP, but gave up after trying a few examples
that only seemed to half work. Instead, I decided to fall back to a
plain old, but very reliable, Perl CGI script to handle the uploaded
audio. To do this, I changed the next attribute in the
(updated source) script
to point to upload_audio.pl (view source)
instead of save_address.asp. This script saves the address
audio recording to a file named after
the phone number that it’s associated with. Once the audio file has
been saved, the script transitions to take_order.vxml.
The updated validate_phone_number.asp file also creates
a new record in the Customers table on lines 39 and 40 if one does not already exist.
So now we have a complete application that utilizes many different aspects
of the VoiceXML language. I hope you’ve learned a lot about VoiceXML in this series,
and I hope you keep coming back for more as we delve ever deeper into developing
VoiceXML applications. If you have followed this series all the way through, please
take some time to send me feedback and let me know whether this series was of benefit to you
or where you think it can be improved. I’m also working on getting the demo operational so that
you can test it over the telephone if you are not able to test it
on your own.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston,
Virginia, that specializes in Voice Web consulting and training. He
has also written articles for other online and print publications
and WDVL.com. Feel free to send an
email to [email protected]
regarding questions or comments about the VoiceXML Developer series,
or for more information about training and consulting.