VoiceXML Developer Series: A Tour Through VoiceXML, Part XI

In this edition of the VoiceXML tour, we will develop
the first three dialogs that will play a greeting, ask for a phone number, and
look up the customer’s address in the Access database.


Overview


Well, we’ve completed our design work for the pizza
ordering application, so it’s time to start developing the static and
dynamic VoiceXML content. I’ve chosen to use IIS and ASP to serve up the
dynamic content for this application. The ASP language I’ve chosen is
PerlScript, which provides more advanced text processing features than other
languages. These features will come in handy, as you’ll see later on. To develop
the VoiceXML content, I used Nuance V-Builder to quickly prototype the dialogs
based upon our design, and then used Notepad to add the ASP code to the
VoiceXML dialog files.


Before we dive into the code, let’s revisit our design.
We have identified six VoiceXML files and scripts that we need to develop. Each
file contains zero or more dialogs. We also have detailed dialog flow diagrams
for each, which will help us prototype and develop the VoiceXML content. You
should probably take another look at the high-level architecture
diagram as a reference. We will be developing the first
three files in the diagram: main.vxml, telephone_number.vxml, and
validate_phone_number.asp. As you can imagine, the
last file will contain our PerlScript ASP code.


main.vxml


This file will be executed first when customers call
into the system. In addition to playing the greeting, it will store a number of
application-level variables that we will refer to throughout the application.
That means that main.vxml will be our application root, which all other VoiceXML
files will refer to.


When a customer calls, they will hear, “Thank you for
calling Frank’s Pizza Palace”.

1  <?xml version="1.0"?>
2  <vxml version="1.0">
3    <var name="phone_number" />
4    <var name="address" />
5    <form id="greeting">
6      <block name="play_greeting">
7        <prompt bargein="false">
8          <audio src="../prompts/greeting.wav" />
9        </prompt>
10       <goto next="telephone_number.vxml" />
11     </block>
12   </form>
13 </vxml>

The very first thing we do, on lines 3 and 4, is
declare the phone_number and address variables. Other dialogs will set these
values. Declaring them at the application level makes the values available to all
the other dialogs. Line 8 plays our pre-recorded greeting and line 10
transitions to the next dialog, telephone_number.vxml.


telephone_number.vxml


This dialog will capture and confirm the customer's
phone number and submit it to validate_phone_number.asp for validation. In the
original dialog flow diagram, I had split this file into two separate forms:
telephone_number and confirm_phone_number. I decided that it would be
easier to roll all of the logic into a single form, so I eliminated the
confirm_phone_number form. On line 5, the dialog below
also refers to an external grammar file named PHONE.grammar and uses its PHONE
rule. According to the type attribute, this is a GSL grammar, which means
it should work on most Voice ASP platforms. This is the same grammar that was
used in a previous example.

view example 2

Next, we play the prompt on line 7, "May I have your
phone number please", which has been pre-recorded and is contained in the
phone_number.wav file. The resulting
utterance fills the phone_number
field. When the field is filled, the assign statement on line 10 copies the
phone number into the application variable, also named phone_number. This makes
the value available to all other dialogs in the application that specify
main.vxml as the application root in the application attribute of the <vxml>
element (see line 2).


Next, on lines 13 through 19, we play back the number
that the ASR system recognized and which is now assigned as the application
variable phone_number. Users will hear, "I
heard xxxxxxxxxx. Is this correct?" where xxxxxxxxxx is the phone number that
was recognized.


To catch the user's response to this yes or no question,
we transition into a <subdialog>
element on line 20, which calls the yes_or_no.vxml dialog file and stores the
returned value in a variable named confirm. This
subdialog is actually used in several places, so I've included it below:

1  <?xml version="1.0"?>
2  <vxml version="1.0">
3    <form id="confirm">
4      <field name="answer" type="boolean?y=1;n=2">
5        <filled mode="any">
6          <return namelist="answer" />
7        </filled>
8      </field>
9    </form>
10 </vxml>

Line 4 of the yes_or_no.vxml dialog above sets the field
type to boolean, so the field accepts a verbal "yes" or "no" answer as well as DTMF 1
for yes and DTMF 2 for no. Once the user has answered yes or no, the value is
returned to the calling dialog on line 6.


Now, back in the telephone_number.vxml dialog, when yes_or_no.vxml returns a
value, we test on line 22 whether it is true or false. If it is true, the
customer said yes or pressed 1 on their phone keypad. In either case, we submit
the phone number to the validate_phone_number.asp script with the <submit> element on line 23. If the
user says no or presses 2, we return them to the top of the dialog on
line 25, where they are prompted for their phone number again.

validate_phone_number.asp


This dialog queries the Access database for an address
matching the phone number that the user entered and confirmed in the
telephone_number.vxml dialog. If we find a
matching record, we transition to the confirm_address form. Otherwise, we transition to
the record_address form, which prompts the
user to say their address.


This VoiceXML file is unlike the others we've developed so
far, because it intermingles VoiceXML with PerlScript ASP code. On line 3, we
specify that our ASP language in this script will be PerlScript. On lines 7 and
8, we create a new instance of an ADO object and connect to the Access database
using the Microsoft Access database driver. On line 11, we retrieve the
phone_number variable from the Request object. If you are unfamiliar with ASP,
the Request object contains all of the field values that were
submitted from a form. In our case, the phone_number field was submitted to this script by
the telephone_number.vxml form.


Lines 12 through 22 convert the phone number, which may
have been passed as a set of words rather than digits, to numbers. Perl is
well known for its regular expression and text processing capabilities, so it's
well suited for creating complex VoiceXML applications, which require many
different types of text and language processing, from grammars and prompts to
parsing input and output values. In fact, on line 36, you'll see another Perl
regular expression that searches for a number in the address variable and
encloses it in a <sayas> element so
that the TTS engine will pronounce the number portion of the address as digits
rather than as one large number.
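
To illustrate what that wrapping buys us, here is a small sketch of the same prompt with and without the markup. It assumes the VoiceXML 1.0 <sayas> element with class="digits"; the form id and the sample address are only illustrative, and how a given TTS engine reads the unmarked number may vary by platform.

```xml
<?xml version="1.0"?>
<!-- Sketch only: contrasts a plain number with one wrapped in sayas. -->
<vxml version="1.0">
  <form id="sayas_demo">
    <block>
      <!-- Unmarked: the engine may read "five hundred fifty-five". -->
      <prompt>I have your address as 555 green wood drive.</prompt>
      <!-- Marked up: the engine should read "five five five". -->
      <prompt>I have your address as <sayas class="digits">555</sayas> green wood drive.</prompt>
    </block>
  </form>
</vxml>
```
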


Lines 25-27 execute the select statement that searches
the database for an address record that matches the phone number. Lines 30-39
contain an if/else conditional expression that basically says: if we have a
matching address, create a string that will assign the value to
the application.address variable and
then transition the user to the confirm_address form. Otherwise, transition to the
record_address form. In other words, if we
find an address, we want to have the user confirm that it is the correct
address. If we do not have an address on file, we want to have the customer tell
us their address so we can save it for next time. Line 40 ends the main block of
PerlScript code.
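
As a rough picture of what the script might send back when a matching address is found, consider the sketch below. It is an assumption about the shape of the generated output rather than the actual listing: the lookup_result form id and the sample address are hypothetical, and the two target forms are reduced to stubs.

```xml
<?xml version="1.0"?>
<!-- Sketch only: possible generated output for the "address found" case. -->
<vxml version="1.0" application="main.vxml">
  <form id="lookup_result">
    <block>
      <!-- Hypothetical value; the real one would come from the database row. -->
      <assign name="application.address" expr="'555 green wood drive'"/>
      <goto next="#confirm_address"/>
    </block>
  </form>
  <form id="confirm_address">
    <!-- Stub: the confirmation prompt and yes/no subdialog would go here. -->
    <block><prompt>Confirming the address on file.</prompt></block>
  </form>
  <form id="record_address">
    <!-- Stub: the address recording would go here. -->
    <block><prompt>Recording a new address.</prompt></block>
  </form>
</vxml>
```

When no record matches, the first block would presumably contain only a goto to #record_address. Because the address is embedded in an ECMAScript string literal, the script would need to escape any quote characters in it before printing.
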

view example 3

Line 43 prints the value of the $string
variable, which controls which form the user will be transitioned to. If the
customer has an address, they are transitioned to line 47, where we prompt the
user (lines 49-53) to confirm their address, for example, "I have your address as,
555 green wood drive. Is this correct?". We reuse the
yes_or_no.vxml subdialog here, just as we did in
telephone_number.vxml. If the user confirms their
address, we transition to the take_order.vxml dialog, which
will prompt the customer for their order. If they do not confirm their address,
then on line 60 we transition the user to the record_address form starting on line 66.

The user will be sent to the record_address form if they say
no when asked if their address is correct, or if the customer does not have an
address on record. Either way, we need an address for the customer. Since we
cannot accurately recognize a full address, we have to record it to a wav file
with the <record> element on line 68. The audio content
is submitted to the save_address.asp script where their
database record is created or updated, and the audio file is saved to disk. An
operator will need to go through the database on a regular basis and manually
fill the address field in the Access database based on the recordings.
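
A minimal sketch of such a recording form is shown below. It is not the original listing: the address_recording name, the prompt wording, and the timing values are assumptions, and the phone number is copied into a local variable so that the script can tell which customer record to update.

```xml
<?xml version="1.0"?>
<!-- Sketch only: records the spoken address and posts it to the server. -->
<vxml version="1.0" application="main.vxml">
  <form id="record_address">
    <!-- Local copy of the application variable so it can be submitted by name. -->
    <var name="phone_number" expr="application.phone_number"/>
    <!-- Assumed limits: 30 seconds maximum, stop after 3 seconds of silence. -->
    <record name="address_recording" beep="true" maxtime="30s"
            finalsilence="3s" type="audio/wav">
      <prompt>Please say your full street address after the beep.</prompt>
      <filled>
        <!-- Audio must be posted as multipart form data, not URL-encoded. -->
        <submit next="save_address.asp" method="post"
                enctype="multipart/form-data"
                namelist="phone_number address_recording"/>
      </filled>
    </record>
  </form>
</vxml>
```
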

Conclusion

Well, we've completed the first three VoiceXML
dialog files. In the next article, we will finish the last three dialog
files in the application. A few notes on what we've done so far: first,
recognizing spoken numbers is not always
100% accurate. If you experience problems, you may want to switch to DTMF tones to
capture the number, which is
almost always correct the first time. Second, we will need to go back
and add event handlers and error checking once we've completed the initial version of the
application.



About Jonathan Eisenzopf


Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston,
Virginia that specializes in Voice Web consulting and training. He
has also written articles for other online and print publications,
including WebReference.com
and WDVL.com. Feel free to send an
email to eisen@ferrumgroup.com
with questions or comments about the VoiceXML Developer series,
or for more information about training and consulting
services.
