VoiceXML Developer Series: A Tour Through VoiceXML, Part XII

In this edition of the series, we complete the first version
of Frank’s Pizza Palace application by developing the remaining VoiceXML dialogs.

Overview


Last time, we developed the first three dialogs in our
application. Now it’s time to complete the rest of the dialogs and
begin testing our application.


The first three dialogs were main.vxml,
telephone_number.vxml, and validate_phone_number.vxml. These dialogs
played a greeting for the user, prompted them for their phone
number, and looked up their address in the Access database,
respectively.


Assuming that the user did indeed have a record in the database
and confirmed that it was correct, the dialog transitions to
take_order.vxml.


take_order.vxml


Now it’s time to take the customer’s order (view source).
This VoiceXML dialog is similar to the pizza ordering application we
developed in an earlier edition of this series, but a few things
have changed. The customer’s phone number has already been stored in
the application root as application.phone_number by a previous
dialog. I’ve also added a <property> element on line 4. VoiceXML
properties control how a VoiceXML dialog functions. This
particular property sets the minimum confidence level that the ASR
must achieve to accept an utterance as recognized. Values
range from 0.0 to 1.0, which correspond to confidence levels
from 0% to 100%. A value of 1.0 tells the ASR that it must be 100%
confident that it has recognized an utterance. Most ASRs
default to 0.5 (50%). I have lowered the threshold to 0.3 (30%)
so that the ASR rejects fewer utterances that it actually recognized
correctly (false negatives). I chose this value after testing the
grammar for a while in Nuance V-Builder. I found that while the
confidence level fell below 50% when there was background noise,
the ASR still produced accurate results the majority of the time.
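
To make the effect of the threshold concrete, here is a minimal sketch in Python (not part of the application, which is written in VoiceXML and PerlScript). The function name and the sample confidence score are assumptions made up for illustration; a real ASR reports its own scores.

```python
from typing import Optional


def accept_result(utterance: str, confidence: float, threshold: float) -> Optional[str]:
    """Return the utterance if the ASR confidence meets the threshold.

    Returns None otherwise, which stands in for a rejected (nomatch) result.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return utterance if confidence >= threshold else None


# Hypothetical result for an order spoken over background noise.
noisy = ("large pepperoni pizza", 0.42)

print(accept_result(*noisy, threshold=0.5))  # rejected at the usual default
print(accept_result(*noisy, threshold=0.3))  # accepted at the lowered threshold
```

The same utterance is rejected at 0.5 and accepted at 0.3, which is the trade-off described above: fewer false negatives at the cost of letting more low-confidence matches through.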



This is a mixed initiative dialog, meaning that a user can fill in multiple
fields with a single utterance. The <initial> element on
lines 7 through 11 provides this functionality. This section
of the application executes first and tries to
match the grammar referenced on line 6. I’ve made some
significant changes to the PIZZA subgrammar in the PIZZA.grammar file since
that earlier edition (view source). The reason for the change has
to do with the many variations that a customer might use to
order a pizza. After coding and testing about 20 additional
variations, I realized that the grammar was ripe for consolidation using
the positive closure (+) and Kleene closure (*) operators. A positive
closure matches one or more occurrences of the phrase located to the
right of the operator. A Kleene closure matches zero or more occurrences
of the phrase to its right.
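
The two closure operators behave like the + and * quantifiers in regular expressions, except that the grammar operators precede the phrase while regex quantifiers follow it. The sketch below is a rough Python analogy, not the grammar itself: the word lists are tiny assumed stand-ins for the real subgrammars, and a regex only matches text, whereas the ASR matches audio.

```python
import re

# Assumed word lists for illustration; the real subgrammars are much larger.
SLOT = r"(?:small|large|pepperoni|mushroom)"
FILLER = r"(?:pizza|with)"

# One unit is a slot word followed by ZERO or more filler words (Kleene closure).
UNIT = rf"{SLOT}(?:\s+{FILLER})*"

# ONE or more units separated by whitespace (positive closure, written out
# as "a unit, then any number of further units" so words stay space-separated).
ORDER = re.compile(rf"{UNIT}(?:\s+{UNIT})*")


def matches(utterance: str) -> bool:
    """Return True if the whole utterance fits the one-or-more-units pattern."""
    return ORDER.fullmatch(utterance.strip()) is not None


for text in ("small pepperoni pizza", "large mushroom", ""):
    print(repr(text), matches(text))
```

The empty string fails because the positive closure demands at least one unit, while "large mushroom" passes even though no filler word appears, because the Kleene closure accepts zero occurrences.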


Line 6 of the grammar file is listed below:

+([SIZE TYPE TOPPINGS] *[pizza with])

The + (or positive closure) operator enables this subgrammar to
match numerous variations of a pizza order. A customer can start with
pizza size, type, or toppings, optionally followed by the words pizza
and/or with. This grammar will match any of the following utterances:

  • small hand tossed pepperoni pizza
  • deep dish large mushroom and pepperoni pizza
  • small pepperoni
  • pizza with olives and mushrooms

The number of possible utterances that this grammar will match
is too high to count (for me at least). One of the side effects
of this more open grammar is that ASR confidence for matches went
down from 60%-85% to as low as 40%. The rate of incorrect matches
also rose in some cases where I was not speaking directly into
the microphone or did not speak clearly. After lowering the confidence
property and tuning the grammar a bit, I decided that the greater
breadth of possibilities was worth the tradeoff. Of course, if the
grammar only matches a few of the form fields, the application
can prompt for the unfilled values separately. In cases where grammars
start becoming more dynamic, it may be necessary to process the matched
text to see if the ASR actually provided a false match. This requires
some fancy text processing and/or natural language processing
techniques, which we’ll save for another time.
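
The fallback for partially filled forms can be pictured with a short Python sketch. It is only an illustration of the idea, since the VoiceXML interpreter does this work itself; the field names and prompt wording are assumptions and may not match the names used in take_order.vxml.

```python
from typing import Dict, List, Optional

# Assumed field names for illustration only.
FIELDS = ("size", "type", "toppings")


def unfilled_fields(filled: Dict[str, Optional[str]]) -> List[str]:
    """List the fields that the caller's utterance did not fill, in form order."""
    return [name for name in FIELDS if not filled.get(name)]


def next_prompt(filled: Dict[str, Optional[str]]) -> Optional[str]:
    """Return a prompt for the first unfilled field, or None when the form is complete."""
    missing = unfilled_fields(filled)
    if not missing:
        return None
    return f"Please tell me the pizza {missing[0]}."


# "small pepperoni" fills two of the three fields in one utterance.
order = {"size": "small", "toppings": "pepperoni"}
print(unfilled_fields(order))
print(next_prompt(order))
```

A single utterance fills whatever it can, and the remaining fields are then collected one prompt at a time.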

Yet another difference is the fact that we are using pre-recorded
prompts instead of synthesized speech from the TTS engine. This really
enhances the quality and usability of the application.

Once we’ve filled all the fields in the form, the input is
sent to the save_order.asp script.

save_order.asp

This PerlScript ASP file (view source)
is responsible for taking the form field values passed
from the take_order.vxml dialog and saving them to the PizzaOrder
table in the Access database. If an error occurs while saving the record,
we transfer the caller to an operator on line 37. Lines 7 and 8 open a connection to
the Access database. Lines 11 through 14 retrieve the form field values from the ASP
Request object. Remember, we process VoiceXML forms on the backend the same way we do HTML forms.
Lines 15 through 25 convert the phone number from words to numbers and strip out
any extraneous text.
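
The word-to-number conversion is easy to picture with a small sketch. This one is written in Python rather than the PerlScript used in save_order.asp, and it is not the script's actual code: the word table (including treating "oh" as zero) and the function name are assumptions.

```python
import re

# Assumed mapping of spoken digit words; "oh" is treated as zero.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def phone_words_to_digits(spoken: str) -> str:
    """Convert a spoken phone number to digits, dropping any other text."""
    digits = []
    # Tokens are either whole words or single digit characters.
    for token in re.findall(r"[a-z]+|\d", spoken.lower()):
        if token.isdigit():
            digits.append(token)
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        # Any other word (for example "area" or "code") is extraneous and skipped.
    return "".join(digits)


print(phone_words_to_digits("area code seven oh three, five five five 1212"))
```

Storing only digits means the same caller maps to the same key no matter how the number was spoken.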

Now that we have a connection to the database and have retrieved our form data,
it’s time to build the SQL string that will save the data to the PizzaOrder table.
Line 28 builds the INSERT SQL statement that is sent to Access with the ADO
Execute method on line 29. Note that the syntax for the SQL statement may differ if you
decide to use a database other than Access.

Line 32 tests the results of the Execute command. If an error occurs,
we will output an error message and transfer the caller to a live operator.
Otherwise, we will thank the customer for their order and end the call.
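
The insert-then-check flow can be sketched as follows. To keep the example self-contained, it uses Python's built-in sqlite3 module with an in-memory database in place of Access and ADO, and it passes values through placeholders rather than building the SQL string by hand as the script does. The column names and the "thanks"/"transfer" return values are assumptions for illustration.

```python
import sqlite3


def save_order(conn, phone, size, crust, toppings):
    """Insert one order; return 'thanks' on success or 'transfer' on a database error."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO PizzaOrder (phone, size, crust, toppings) "
                "VALUES (?, ?, ?, ?)",
                (phone, size, crust, toppings),
            )
    except sqlite3.Error:
        # Corresponds to the error branch: hand the caller to a live operator.
        return "transfer"
    # Corresponds to the success branch: thank the customer and end the call.
    return "thanks"


# Hypothetical schema; the real PizzaOrder table may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PizzaOrder "
    "(phone TEXT NOT NULL, size TEXT, crust TEXT, toppings TEXT)"
)
print(save_order(conn, "7035551212", "small", "hand tossed", "pepperoni"))
```

Placeholders also sidestep quoting problems when a recognized value happens to contain an apostrophe.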

save_address.asp is now upload_audio.pl


Now let’s step back to the
validate_phone_number.asp script. If you recall, when
we do not find a customer’s address for a given phone number,
or when the customer rejects the address on file, the caller is taken
to the record_address form to record their address.
The recording is saved in a variable named
AddressAudio and submitted to save_address.asp.


One of the strange annoyances of ASP is its inability to handle
multipart form submissions, which are used to upload files from an
HTML form. You would have thought Microsoft would have added this
feature after three versions of ASP. They did finally add better support
in ASP.NET, but this application is being developed in ASP 3
using PerlScript. I had hoped to find a reliable Perl script to
handle binary files in ASP, but gave up after trying a few examples
that only seemed to half work. Instead, I decided to fall back to a
plain old, but very reliable, Perl CGI script to handle the uploaded
file. To do this, I changed the next attribute in the
validate_phone_number.asp script (updated source)
to point to upload_audio.pl (view source)
instead of save_address.asp. This script saves the address
audio recording to a file named after
the phone number that it’s associated with. Once the audio file has
been saved, the script transitions to take_order.vxml.
The updated validate_phone_number.asp file also creates
a new record in the Customers table on lines 39 and 40 if one does not
already exist.
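
Naming the recording after the phone number can be sketched like this. The example is in Python and is not the upload_audio.pl script; the function names, the .wav extension, and the directory are assumptions. It reduces the phone number to digits before building the path, so that nothing in the submitted value can alter the file name.

```python
import re
from pathlib import Path


def recording_path(directory: Path, phone_number: str, extension: str = ".wav") -> Path:
    """Build the file path for a caller's address recording from their phone number."""
    digits = re.sub(r"\D", "", phone_number)  # keep digits only
    if not digits:
        raise ValueError("phone number contains no digits")
    return directory / f"{digits}{extension}"


def save_recording(directory: Path, phone_number: str, audio: bytes) -> Path:
    """Write the uploaded audio bytes to the caller's file and return its path."""
    path = recording_path(directory, phone_number)
    path.write_bytes(audio)
    return path


print(recording_path(Path("recordings"), "(703) 555-1212").name)
```

Because the file name is derived from the phone number, a later recording from the same caller replaces the earlier one.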

Conclusion

So now we have a complete application that utilizes many different aspects
of the VoiceXML language. I hope you’ve learned a lot about VoiceXML in this series,
and I hope you keep coming back for more as we delve ever deeper into developing
VoiceXML applications. If you have followed this series all the way through, please
take some time to send me feedback and let me know whether this series was of benefit to you
or where you think it can be improved. I’m also working on getting the demo operational so that
you can test it over the telephone if you are not able to test it
on your own.


About Jonathan Eisenzopf


Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston,
Virginia, that specializes in Voice Web consulting and training. He
has also written articles for other online and print publications,
including WebReference.com and WDVL.com. Feel free to send an
email to [email protected]
regarding questions or comments about the VoiceXML Developer series,
or for more information about training and consulting
services.
