VoiceXML is an excellent tool for developing voice applications that meet
particular criteria. However, contrary to what might be claimed by VoiceXML
enthusiasts (including those who sell VoiceXML services), it is not the perfect
tool for every project. Just as there is no perfect programming language for
every software application or perfect database for every database application,
there is no perfect platform choice for all voice-enabled applications.
A number of factors will influence which architecture, hardware, operating
system, language, and off-the-shelf software you should use for a particular
project. This article will give you a basis for understanding the strengths and
weaknesses of VoiceXML in order to help you determine if VoiceXML is the right
tool for your project.
To fully benefit from reading this article, you should be familiar with
general web application development principles as well as XML. It would also
help to be familiar with the basics of VoiceXML. The full VoiceXML 2.0
specification can be found at http://www.w3.org/TR/voicexml20. If
you aren't inclined to read the entire spec, it is worthwhile to at least
read the overview and background sections of that document before continuing.
The strengths of VoiceXML make it well suited to specific types of applications.
First, VoiceXML is designed to be platform-independent (on the gateway side, not
on the application server side). VoiceXML is designed around the same
request-and-response model used for HTML applications: the gateway, like a web
browser, pulls documents from a server. In fact, VoiceXML applications can be,
and often are, run in conjunction with traditional web applications, accessing
the same data and performing the same essential tasks, even residing on the same
machines. VoiceXML allows a programmer to write a basic voice application
without having to know or learn anything about the voice hardware on which the
application will run.
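As a rough illustration of that web-style model, the sketch below builds a minimal VoiceXML 2.0 document on the application server side using only the Python standard library. The function name `build_greeting` and the prompt text are hypothetical, chosen for illustration. Note that nothing in it refers to the voice hardware.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_greeting(message: str) -> str:
    """Return a one-prompt VoiceXML 2.0 document as a string.

    The function name and structure are illustrative assumptions.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = message  # rendered by the gateway's text-to-speech engine
    body = ET.tostring(vxml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_greeting("Welcome to the example voice application."))
```

The gateway would request this document over HTTP and render it to the caller, in the same way a browser requests and renders an HTML page.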
VoiceXML also has a number of limitations. Its hardware independence comes at
a price: only a limited set of telephony functions is available in the VoiceXML
API (for example, on-hook/off-hook call control and touch-tone synthesis and
recognition). Some less frequently used, but occasionally essential, functions
are simply not available in the VoiceXML API. For example, the complex frequency
analysis used for outbound call progress detection is not available, and neither
is speed or volume control for audio file playback. Audio files also cannot
be played beginning at an arbitrary point; this feature is necessary, for
example, when resuming playback of a paused or interrupted voicemail message.
A VoiceXML platform typically consists of a gateway and an application
server. The gateway almost always resides on the same machine as the voice
hardware. The application server interfaces with any data and control
sources and houses the programming logic. In most cases, all programming takes
place on the application server side. For our purposes, you should treat the
VoiceXML gateway as a black box that interfaces with the phone network, the
caller, and the caller's telephone.
Platform Options
There are several platform options for development and deployment of voice
applications.
- Use a VoiceXML service bureau. This is the most common option for
less elaborate voice applications with relatively modest volume requirements.
You will probably still need to host the logic for your application on your own
equipment.
- Use a non-VoiceXML service bureau. You will have to pay them to develop your
application. There are fewer of these available as VoiceXML takes over as the
industry standard, but they may be less expensive, and can provide you with some
of the features missing from VoiceXML.
- Purchase hardware and build your own non-VoiceXML application. This is by far
the most difficult path to pursue, and will require significant specialized
training in telephony and, if your application requires it, speech recognition.
- Purchase a VoiceXML system to reside with your equipment. You will probably
still treat it largely as a black box, and may need assistance ordering phone
lines and connecting the system to the phone network.
Positive Indications
Your application may be suited for VoiceXML if the following conditions are
true:
- The application only requires basic input from the user, and will only
deliver basic audio information to the user. The specification allows for
the playback of recorded audio files as well as text-to-speech audio. VoiceXML
applications can gather touch-tones (DTMF) as well as recognize speech
from the user. A flight status information line might fit this
profile.
- Little, if any, interaction with the phone network is required. The
application should answer calls, interact with the user, and hang up at the end
of the call. There is a large amount of functionality available from the phone
network, but most of this is handled behind the scenes by the VoiceXML gateway
for you. However, if you find that you need more sophisticated phone network
functionality, such as billing telephone numbers (BTN), the ability to set the
calling number for an outgoing call transfer, or caller identification
(automatic number identification, or ANI, like caller ID) beyond what the
gateway exposes in the session.connection.remote.uri session variable, VoiceXML
may not be right for your application. In addition, there are a number of
functions available in the ISDN and SS7 network specifications which simply
aren't available in VoiceXML. You probably won't need these, but if you do,
you're out of luck with VoiceXML.
- Your voice application is no more critical than your web site. The
server-side logic for your voice application must reside on a web server. If
your application gives the caller access to the same data used by your web site,
it may be a good decision to run your voice application from the same server.
But if your server has occasional busy periods or outages, these will affect
your voice application too.
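As a sketch of the first condition, the flight status line mentioned above could be served by functions like the following, which build VoiceXML 2.0 documents that collect a flight number by touch-tone or speech and then read back a status. The `FLIGHT_STATUS` table, the function names, and the `/status` URL are hypothetical stand-ins for a real data source, and the example assumes the gateway supports the built-in `digits` field type.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical data; a real application would query the web site's data source.
FLIGHT_STATUS = {"101": "on time", "202": "delayed by 30 minutes"}


def _serialize(root: ET.Element) -> str:
    """Serialize an element tree with an XML declaration."""
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def ask_for_flight(submit_url: str = "/status") -> str:
    """Build a form that collects a flight number and submits it."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    # Assumption: the gateway implements the built-in "digits" type,
    # which accepts either spoken or keyed digits.
    field = ET.SubElement(form, "field", {"name": "flight", "type": "digits"})
    ET.SubElement(field, "prompt").text = "Please say or key in your flight number."
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "flight"})
    return _serialize(vxml)


def report_status(flight: str) -> str:
    """Build a document that reads the status for the submitted flight."""
    status = FLIGHT_STATUS.get(flight)
    if status is None:
        message = f"Sorry, no information is available for flight {flight}."
    else:
        message = f"Flight {flight} is {status}."
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    ET.SubElement(block, "prompt").text = message
    return _serialize(vxml)
```

The whole dialog is two document fetches: one to ask the question and one to deliver the answer. An application that needs little more than this pattern is a reasonable fit for VoiceXML.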