In the last two editions of the VoiceXML Developer, we learned how
to create VoiceXML grammars in both GSL and JSGF formats. In this edition
of the VoiceXML Developer, we’re going to learn how to record and play back
speech and how to transfer callers to another phone number.
Record speech with the <record> element
The <record> element records spoken
input and assigns the contents to a
VoiceXML variable defined by the name attribute.
<record name="caller_name" beep="true" maxtime="10s" finalsilence="2000ms" type="audio/wav" dtmfterm="true" />
The beep attribute determines whether an audible
tone is played before the gateway begins recording. Most people
are used to hearing a tone on answering machines and voice mail systems
as a signal to begin speaking. By default, this is set to false.
The maxtime attribute specifies the maximum number
of seconds to record input. The system will automatically stop recording
when this value is reached if the user hasn’t stopped speaking or hasn’t
otherwise indicated that they’ve completed the recording.
The finalsilence attribute sets the number of milliseconds
of silence that will signal the system to stop recording input. If you set this
to a value that is too small, the system might stop recording when the speaker
pauses between sentences or takes a breath, so be careful, cowboy.
The type attribute contains the MIME type for the audio
format that the recording will be saved in. The supported formats
differ based upon the VoiceXML gateway platform you’re using; however,
the audio/wav format should be standard on most if not all platforms.
When the dtmfterm attribute is set to true,
the system will stop recording input when it hears a DTMF tone. This can be
any button on a standard telephone keypad. It can be used instead of or
in addition to the finalsilence attribute, which stops
recording input when it hears a pause.
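To tie the attributes together, here is a minimal sketch of a form that records the caller's name and then plays it back. The form id and prompt wording are our own assumptions, and the playback relies on the VoiceXML 1.0 convention of referencing a recording with <value>; newer platforms may expect <audio expr="caller_name"/> instead, so check your gateway documentation.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- "name_form" is a hypothetical id used only for this sketch -->
  <form id="name_form">
    <!-- Beep first, record at most 10 seconds, and stop early after
         2 seconds of silence or any key press -->
    <record name="caller_name" beep="true" maxtime="10s"
            finalsilence="2000ms" type="audio/wav" dtmfterm="true">
      <prompt>After the tone, please say your name.</prompt>
    </record>

    <!-- Play the recording back to the caller -->
    <block>
      <prompt>You said <value expr="caller_name"/>.</prompt>
    </block>
  </form>
</vxml>
```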
Because the <record> element is essentially a form
field that contains recorded audio input rather than text, it can contain
prompts and event handlers. The example below collects two recordings, first
the customer’s name, then their message. These recordings are then sent to
a back end Perl script for processing.
On lines 8-11 in the example above, we’re recording the customer’s name. If we don’t get any input, the
noinput event is triggered and the <noinput> element on line 10 is called, which reprompts the
user. Once we have the customer’s name, we record their emergency on lines 12-15. We submit
the recordings to a script with the <submit> element on line 18 and end the call.
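The flow just described can be sketched roughly as follows. This is our own reconstruction under stated assumptions, not the original listing, so the line numbers cited in the text will not match it. The variable name emergency, the form id, and the prompt wording are hypothetical; /cgi-bin/dispatch.pl is the script path used later in this article.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="dispatch">
    <!-- First recording: the customer's name -->
    <record name="caller_name" beep="true" maxtime="10s"
            finalsilence="2000ms" type="audio/wav" dtmfterm="true">
      <prompt>After the tone, please say your name.</prompt>
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
    </record>

    <!-- Second recording: the customer's message -->
    <record name="emergency" beep="true" maxtime="30s"
            finalsilence="2000ms" type="audio/wav" dtmfterm="true">
      <prompt>After the tone, please describe your emergency.</prompt>
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
    </record>

    <!-- Send both recordings to the back end script. Some gateways
         require multipart/form-data when posting audio. -->
    <block>
      <submit next="/cgi-bin/dispatch.pl" method="post"
              enctype="multipart/form-data"
              namelist="caller_name emergency"/>
    </block>
  </form>
</vxml>
```

The script is expected to answer with another VoiceXML document, which is where the call would actually be ended.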
Transfer a caller to another line with <transfer>
There are many instances where we will need to transfer a customer to a live operator
for assistance if they are having problems with the VoiceXML interface. In the case of
our previous example, we will ask the customer to confirm their name and emergency
request by saying yes. If they say no, then we know that there is a problem,
at which point, we’d want to transfer them to commie the clown for assistance.
We will also transfer the caller to a customer support representative if the
noinput event gets triggered more than once for either of the recordings.
<transfer name="transfer" dest="phone://8005551212" bridge="false" connecttimeout="30s" maxtime="0" />
The name attribute holds the result of the transfer command.
If the transfer succeeds, the VoiceXML gateway will terminate the call with the
customer and let the customer continue their conversation with the customer
service representative (CSR). If the transfer fails, this named variable will
hold a value that describes the failure, such as busy or noanswer.
There are two types of call transfers: a blind transfer and a bridged
call. A blind transfer is when the gateway terminates the call as soon as the
call has been transferred successfully. A bridged call is one in which the caller
resumes interaction with the VoiceXML application after the transferred call has
been completed. Most call transfers will be blind transfers. To make a bridged
call, set the bridge attribute to true. To make a
blind transfer, set bridge to false.
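To make the difference concrete, here is a small sketch with one form of each kind. The form ids and prompts are assumptions for illustration. Only the first form runs when the document loads; the second would have to be reached with a <goto>.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Blind transfer: the gateway drops out once the call connects -->
  <form id="blind_transfer">
    <transfer name="transfer" dest="phone://8005551212" bridge="false">
      <prompt>Transferring your call.</prompt>
    </transfer>
  </form>

  <!-- Bridged call: the dialog resumes here when the transferred
       call ends, so <filled> can pick up the conversation again -->
  <form id="bridged_transfer">
    <transfer name="transfer" dest="phone://8005551212" bridge="true">
      <prompt>Connecting you now.</prompt>
      <filled>
        <prompt>Welcome back. Let's continue.</prompt>
      </filled>
    </transfer>
  </form>
</vxml>
```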
Support for bridged
transfers is spotty at best and largely depends on whether the hardware/software
platform you’re using supports it. If you’re not sure and you’d like to explore
this feature, you’ll need to contact your VoiceXML gateway provider (if you have one).
If you’re using a Voice ASP, contact their technical support for help.
The dest attribute defines the URI that you wish
to connect to. This will probably be a phone number, though future options
will likely include SIP.
The VoiceXML 1.0 spec does not explicitly define the URI options for the dest
attribute, so you will need to refer to your vendor documentation to find out exactly what
format you should be using. The value of the dest
attribute above looks a bit like a Web URL, but instead of http:// we have
phone:// and instead of an IP address, we have a 10-digit phone number. This format
should work on most if not all VoiceXML platforms, by the way.
The connecttimeout attribute defines the number of seconds that
we should wait for the call to connect. If the time expires and a connection hasn’t
been made, then one of the values listed above for the name attribute
will be set. It’s up to you to evaluate the result and do something with the call
if it doesn’t get connected. You might try to do the transfer again, or give the
customer a warning message and disconnect the call.
The maxtime attribute determines the maximum length of the
call. Setting this attribute to zero removes the limit on the length of the call.
Note that this attribute is only relevant when the bridge attribute
is set to true.
Getting back to our clown dispatch example, the example below transfers the
customer to a CSR if they trigger the noinput event more than
once or say no when asked to confirm their dispatch:
You’ll notice on lines 11-13 and 18-20 that we’ve added a second <noinput>
element, which when triggered, runs the form that transfers the customer to a CSR. Also, lines 22-34
contain the confirm boolean <field>, which prompts the user to say yes or no. If they say yes,
the customer receives a confirmation message on line 28, and the two recordings are submitted to
/cgi-bin/dispatch.pl on line 29. If the user says no, they are transferred
to a CSR on line 31 via a <goto> element.
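The escalation logic described above can be sketched along these lines. Again, this is a reconstruction under assumptions rather than the original listing, so the line numbers in the text do not apply to it. The variable name emergency and all prompt wording are hypothetical, and the transfer form is reduced to a stub.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="dispatch">
    <record name="caller_name" beep="true" maxtime="10s"
            finalsilence="2000ms" type="audio/wav" dtmfterm="true">
      <prompt>After the tone, please say your name.</prompt>
      <!-- First silence: ask again -->
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
      <!-- Second silence: hand the caller to a CSR -->
      <noinput count="2">
        <goto next="#call_transform"/>
      </noinput>
    </record>

    <record name="emergency" beep="true" maxtime="30s"
            finalsilence="2000ms" type="audio/wav" dtmfterm="true">
      <prompt>After the tone, please describe your emergency.</prompt>
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
      <noinput count="2">
        <goto next="#call_transform"/>
      </noinput>
    </record>

    <!-- Boolean field: yes submits the recordings, no goes to a CSR -->
    <field name="confirm" type="boolean">
      <prompt>Shall I send your request now? Please say yes or no.</prompt>
      <filled>
        <if cond="confirm">
          <prompt>Thank you. Help is on the way.</prompt>
          <submit next="/cgi-bin/dispatch.pl" method="post"
                  enctype="multipart/form-data"
                  namelist="caller_name emergency"/>
        <else/>
          <goto next="#call_transform"/>
        </if>
      </filled>
    </field>
  </form>

  <!-- Stub of the transfer form that both escape routes lead to -->
  <form id="call_transform">
    <transfer name="transfer" dest="phone://8005551212" bridge="false"
              connecttimeout="30s" maxtime="0"/>
  </form>
</vxml>
```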
Lines 36-50 contain the call_transform <form>,
which contains the <transfer> element on lines 38-46. At the point
that the VoiceXML interpreter reads the <transfer> element, it will dial
1-800-555-1212 and wait for an answer. If the call did not connect, the transfer
variable is filled with one of the transfer values listed above. We check for busy on line 41
and noanswer on line 42. We also set a local variable called duration,
which is assigned the value for the length of the call in seconds. Both of these values are then
sent to /cgi-bin/log.pl to be recorded in a log file for further processing.
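A rough sketch of such a transfer form is shown below. It is not the original listing. We assume the platform exposes the call length through the transfer$.duration shadow variable, as the VoiceXML 1.0 spec describes, and the prompt wording is our own. With bridge set to false, the <filled> block is only reached when the transfer fails.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="call_transform">
    <transfer name="transfer" dest="phone://8005551212" bridge="false"
              connecttimeout="30s" maxtime="0">
      <prompt>Please hold while I connect you to a representative.</prompt>
      <filled>
        <!-- Tell the caller why the transfer did not go through -->
        <if cond="transfer == 'busy'">
          <prompt>Sorry, that line is busy.</prompt>
        <elseif cond="transfer == 'noanswer'"/>
          <prompt>Sorry, nobody answered.</prompt>
        </if>

        <!-- Copy the call length (in seconds) into a local variable -->
        <var name="duration" expr="transfer$.duration"/>

        <!-- Log the result and the duration on the server -->
        <submit next="/cgi-bin/log.pl" namelist="transfer duration"/>
      </filled>
    </transfer>
  </form>
</vxml>
```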
It’s important to re-emphasize that the <transfer> element is vendor dependent.
The element is actually optional; a vendor does not have to implement it
to be VoiceXML compliant. The examples provided here are general
and may or may not work in your environment. As for the <record> element,
while recording spoken input is simple enough, we stopped short of actually saving
the wav files to disk via a server-side script. This can be accomplished via any
back end scripting language such as Perl, PHP, ASP, Python, Java, etc. We will save this
exercise for a later article. For now, we’re sticking with the syntax of the VoiceXML
elements. This brings up a good point, however: VoiceXML by itself is not sufficient
for developing voice applications, even with the ability to make documents more dynamic.
It is only with a server-side script that the application becomes dynamic and capable
of storing input and retrieving information
from a database. Some might call this an oversight. After all, if you think about it,
to grapple with VoiceXML, we have to learn several languages:
- GSL or JSGF
- Perl, Java, ASP, or other
So why did the VoiceXML authors decide to do it this way rather than just
adopting one language and being done with it? I’m not one of the authors, but
I think part of the answer is
that VoiceXML is for Web developers who are already used to this kind of environment.
Developers who have been writing voice applications in C++ or VB might be
better off sticking to their guns. On the other hand, coding voice applications
in XML (at least partly) makes voice applications more portable and potentially
easier to write for non-programmers. Oops, I just opened up a can of worms.
I’d better go now. See you in the next edition of the VoiceXML Developer.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, a company based in Reston, Virginia,
that specializes in Voice Web consulting and training. He has also written
articles for other online and print publications including WebReference.com
and WDVL.com. Feel free to send an email to [email protected] regarding
questions or comments about the VoiceXML Developer series, or for more
information about training and consulting services.