March 8, 2021

Is VoiceXML the Right Tool for Your Voice Application?

  • By Brian Brown


The following situations may indicate that your application is not a good match for VoiceXML:

Playback of very long audio files is required. VoiceXML provides no way to play audio files at different speeds or to begin playback at a specific point in a file. For example, most voice messaging systems let the user press a key to fast-forward 10 seconds in a message; VoiceXML would not support this.
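
On a platform where your own code controls playback, these features mostly come down to tracking a byte offset into the message. VoiceXML gives you no equivalent hook. The sketch below shows that bookkeeping and rests on two assumptions. First, messages are stored as 8 kHz, 8-bit G.711 audio, which is 8,000 bytes per second. Second, the key mapping is hypothetical: 1 rewinds, 2 pauses or resumes, and 3 fast-forwards 10 seconds. The playback loop that would actually read audio from `position` is also assumed and not shown.

```python
from dataclasses import dataclass

BYTES_PER_SECOND = 8000   # assumption: 8 kHz, 8-bit G.711 message audio
SKIP_SECONDS = 10


@dataclass
class PlaybackCursor:
    length_bytes: int        # total size of the stored message
    position: int = 0        # current byte offset into the message
    paused: bool = False

    def handle_key(self, key):
        """Apply a hypothetical DTMF key mapping to the playback state."""
        step = SKIP_SECONDS * BYTES_PER_SECOND
        if key == "3":       # fast-forward, clamped to the end of the message
            self.position = min(self.position + step, self.length_bytes)
        elif key == "1":     # rewind, clamped to the start of the message
            self.position = max(self.position - step, 0)
        elif key == "2":     # toggle the paused flag
            self.paused = not self.paused

    def seconds(self):
        """Current playback position in seconds."""
        return self.position / BYTES_PER_SECOND
```

Take a 60-second message (480,000 bytes) and press 3 twice. The cursor moves to 160,000 bytes, which is 20 seconds into the message.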

Frequent call transfers to other phone numbers are required. For example, applications that front-end call center transfers (gathering information from a caller before the call is sent to a live agent) are typically better handled by hardware integrated with the call center's telephone system.

Outbound calling functionality is required. The VoiceXML specification does not address outbound dialing. Several VoiceXML service providers allow outbound calls outside the VoiceXML spec, but these are proprietary extensions, and applications written for one provider's platform will have to be ported to work with another's. Determining the outcome of a call attempt (for example, whether it reached a busy signal, no answer, a person, or an answering machine) is a particularly difficult challenge in developing outbound applications. It is usually achieved by complex frequency analysis, which VoiceXML does not support. A VoiceXML service provider will probably offer this capability, but each provider will have its own approach and effectiveness claims (as well as its own outbound API extension).
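
Frequency analysis of this kind is commonly built from tone detectors. As a rough illustration, the sketch below uses the Goertzel algorithm to measure signal power at individual frequencies in 8 kHz telephone audio. That choice of technique is ours; neither the article nor any VoiceXML provider specifies it. The sketch flags the 480 Hz + 620 Hz pair used by North American busy and reorder tones. A real call-progress analyzer must also time the on/off cadence, cope with noise, and tell live voices from answering machines, and this sketch attempts none of that. The 10x threshold and the 1000 Hz reference frequency are arbitrary assumptions.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Power of the DFT bin nearest target_hz, computed with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)       # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def looks_like_busy_tone(samples, sample_rate=8000, ratio=10.0):
    """Crude single-frame test for the 480 Hz + 620 Hz pair used by North
    American busy and reorder tones. Cadence (on/off timing) is ignored."""
    reference = goertzel_power(samples, sample_rate, 1000) + 1e-9  # off-tone bin
    return all(
        goertzel_power(samples, sample_rate, f) > ratio * reference
        for f in (480, 620)
    )
```

A 100 ms frame (800 samples at 8 kHz) gives 10 Hz bins, so 480, 620, and 1000 Hz each fall exactly on a bin. A synthetic 480 + 620 Hz frame passes the test. A 440 + 480 Hz ringback frame does not, because it has no energy at 620 Hz.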

Other considerations

There are a number of other considerations to weigh when deciding whether VoiceXML is right for your application.

Cost: If you host your application with a service bureau, you will pay by the minute for phone time, considerably more than you will pay per minute for long-distance or local phone service into your own system.
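
A back-of-the-envelope comparison makes this per-minute trade-off concrete. The sketch assumes a deliberately simple model. Hosted minutes are billed at a flat bureau rate. Owned equipment costs your own carrier rate per minute, plus the hardware spread over a number of months, plus a fixed monthly upkeep. Every rate and dollar figure in the usage note is hypothetical, not a quote from any provider.

```python
def monthly_cost_hosted(minutes, bureau_rate):
    """Service bureau: every call minute is billed at the bureau's rate."""
    return minutes * bureau_rate


def monthly_cost_owned(minutes, carrier_rate, equipment_cost,
                       amortize_months, monthly_upkeep):
    """Own equipment: carrier per-minute charges, plus hardware spread over
    amortize_months, plus a fixed monthly maintenance figure."""
    return minutes * carrier_rate + equipment_cost / amortize_months + monthly_upkeep


def break_even_minutes(bureau_rate, carrier_rate, equipment_cost,
                       amortize_months, monthly_upkeep):
    """Monthly call minutes above which owning equipment becomes cheaper."""
    if bureau_rate <= carrier_rate:
        raise ValueError("owning never pays off if the bureau rate is not higher")
    fixed_monthly = equipment_cost / amortize_months + monthly_upkeep
    return fixed_monthly / (bureau_rate - carrier_rate)
```

Suppose a hypothetical $0.10/minute bureau rate, a $0.03/minute carrier rate, $36,000 of equipment amortized over 36 months, and $400/month upkeep. Under those figures, `break_even_minutes(0.10, 0.03, 36000, 36, 400)` comes out at roughly 20,000 minutes per month. Below that volume, the hosted option is cheaper in this model.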

Call Volume: If you anticipate very high call volumes, you may see a quicker ROI on an equipment purchase. If your call volume will fluctuate and occasionally spike (for example, a vote-for-your-favorite-contestant by phone during a television special), purchasing equipment may not be wise: you will need enough capacity to support your peak call volume, and that equipment may sit idle the rest of the time. Service bureaus may charge more for spiky volume, but with hundreds or thousands of phone lines at their disposal, they will probably handle it better than you can.
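
Sizing for peak volume is usually done with the Erlang B formula, a standard telephony traffic model that the article does not name. It assumes calls arrive at random and that callers who find every line busy simply get a busy signal. The sketch below computes the blocking probability iteratively. It then searches for the smallest number of lines that keeps blocking at or below a target during the peak hour. The 1% default target is an assumption.

```python
def erlang_b(traffic_erlangs, lines):
    """Probability that a call is blocked when `lines` trunks are offered
    `traffic_erlangs` of load (iterative form of the Erlang B formula)."""
    b = 1.0
    for n in range(1, lines + 1):
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return b


def lines_needed(peak_calls_per_hour, avg_call_minutes, max_blocking=0.01):
    """Smallest number of lines keeping peak-hour blocking <= max_blocking."""
    traffic = peak_calls_per_hour * avg_call_minutes / 60.0   # in Erlangs
    lines = 1
    while erlang_b(traffic, lines) > max_blocking:
        lines += 1
    return lines
```

A peak of 600 calls per hour averaging one minute each is 10 Erlangs, and `lines_needed(600, 1.0)` returns 18 lines. If an ordinary hour brings only 60 such calls, most of those 18 lines sit idle, which is exactly the situation described above.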

Connection to your equipment: A system hosted off-site can only be as reliable as the link between your server equipment, data storage, and the off-site voice gateway. If it's okay for your system to be occasionally unavailable, you can use the Internet for this connectivity. If outages aren't acceptable, you may have to lease a point-to-point data line between your site and your vendor's gateway. Purchasing your own voice system avoids these issues; you can co-locate it with your application server equipment and have a fast, reliable connection between them.

Your expertise level (and that of your IT staff): Programming your own non-VoiceXML application will require specialized skills, including detailed knowledge of telephony and mastery of a daunting C or C++ API. In addition, telephony equipment requires special maintenance skills. If your staff consists of web programmers and general IT personnel, a hosted VoiceXML solution may be the better fit.

Capabilities of potential VoiceXML providers: Larger providers with more phone lines will probably be more expensive, but they may offer valuable advantages, such as a higher maximum capacity for absorbing call spikes. Choosing the right VoiceXML service provider is a topic for a separate discussion.

Portability of phone numbers: Most VoiceXML service bureaus will "lend" you phone numbers for your application. If your company makes a substantial investment in marketing the numbers for your application, the phone number(s) the provider has lent you may become valuable to you. Unless you negotiate a different arrangement up-front, a decision to switch providers may cost you your existing phone numbers.

Availability requirements: If the system is very critical to your business, and outages would be catastrophic, you will want a highly redundant system as well as redundant connections from the gateway to the system. This will drive up costs whether you go the equipment or service bureau route, and may force you toward using a service bureau due to the expense of redundant hardware.
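
The cost of redundancy follows from simple availability arithmetic. Assume failures are independent. Components that must all work, such as the voice gateway, the data link back to your site, and your servers, multiply their availabilities. Duplicated components fail only if every copy fails. The availability figures in the usage note are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def series_availability(*parts):
    """Every part must be up, so availabilities multiply (independence assumed)."""
    result = 1.0
    for a in parts:
        result *= a
    return result


def redundant_availability(a, copies):
    """Up if at least one of `copies` identical, independent parts is up."""
    return 1.0 - (1.0 - a) ** copies


def downtime_hours_per_year(availability):
    """Expected hours of outage per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR
```

Consider a hypothetical 99.9% gateway, a 99% Internet link, and a 99.9% server. `series_availability(0.999, 0.99, 0.999)` is about 0.988, roughly 105 hours of downtime a year. Duplicating only the link (`redundant_availability(0.99, 2)`) raises the whole chain to about 0.998, roughly 18 hours a year.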

Probability of "feature creep": If additional requirements for your voice application are likely to come later, remember that the platform you choose now might not support them. Feature creep is often more of a challenge in voice applications because platforms differ widely in the features they support, and the cost of changing platforms to meet new requirements may be quite high.


Finally, here are a few examples of applications and issues involved when creating each with VoiceXML:

Voice Messaging (as well as unified messaging): Several increasingly common voice messaging features are unavailable in the VoiceXML specification, including pausing and resuming message playback, speeding it up or slowing it down, and fast-forwarding or rewinding it. Reading email messages to users with text-to-speech all but demands these features as well. Prognosis: NOT a good match for VoiceXML; get closer to the hardware with C++ or something else.

Order status: Assuming that you already let customers check order status on the web, writing a narrow VoiceXML front-end to that application could be fairly easy. You can reuse the logic from your web application in your VoiceXML application, and no special voice functionality is required. You will need some special provision for identifying callers, however, because traditional usernames and passwords do not translate well to phone use. Prognosis: A pretty good match for VoiceXML, if you can give callers an easy way to log in by phone.
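
To make the login idea concrete, here is a minimal sketch of a server-side Python helper. It emits a VoiceXML 2.0 page that asks for a numeric account number and PIN, which suit a telephone keypad better than usernames and passwords do. The page uses the `digits` built-in grammar type from the VoiceXML 2.0 specification. A `<submit>` then sends both values to the same back end your web site uses. The field lengths, prompt wording, and submit URL are hypothetical, and built-in grammar support can vary by platform, so check your provider's documentation.

```python
from xml.sax.saxutils import quoteattr


def order_login_vxml(submit_url):
    """Return a VoiceXML 2.0 page that collects an 8-digit account number and a
    4-digit PIN, then posts both to submit_url (e.g. your existing web back end).

    The <block> after the two fields runs once both are filled; quoteattr()
    quotes and XML-escapes the URL for use as an attribute value.
    """
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <field name="account" type="digits?length=8">
      <prompt>Please enter your eight digit account number.</prompt>
    </field>
    <field name="pin" type="digits?length=4">
      <prompt>Now enter your four digit PIN.</prompt>
    </field>
    <block>
      <submit next={quoteattr(submit_url)} method="post" namelist="account pin"/>
    </block>
  </form>
</vxml>
"""
```

The same order-lookup handler that serves the web page can receive `account` and `pin`. It then returns another VoiceXML page that reads back the order status.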

Non-user-specific status information (for example, flight status or road conditions): When callers do not need to be uniquely and positively identified (that is, any information can be provided to any caller), no prior setup is required. Assuming the data you provide to your callers is readily accessible via your back-end web server, you should be able to build an easy-to-use application quickly. Prognosis: Probably as close to an ideal match for VoiceXML as you will find.
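
A status line like this can be a very thin layer over data you already have. In the sketch below, a Python dictionary stands in (hypothetically) for your web back end's flight-status lookup. The function renders a one-page VoiceXML 2.0 document that speaks the result and then ends the call with `<exit/>`. The flight numbers and statuses are invented for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical stand-in for data your web back end already serves.
FLIGHT_STATUS = {"101": "on time", "202": "delayed 45 minutes"}


def flight_status_vxml(flight):
    """Return a VoiceXML 2.0 page that speaks one flight's status, then exits."""
    status = FLIGHT_STATUS.get(flight)
    if status is None:
        message = f"Sorry, I have no information for flight {flight}."
    else:
        message = f"Flight {flight} is {status}."
    # escape() keeps characters such as & and < from breaking the XML.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>{escape(message)}</prompt>
      <exit/>
    </block>
  </form>
</vxml>
"""
```

Because no caller is identified, there is no login form at all. The page is simply whatever the existing lookup returns, wrapped in a prompt.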

Product availability or ordering system: Systems capable of providing hundreds of pages of information (or hundreds or thousands of possible search targets) are particularly difficult to develop for narrow interfaces such as voice. A keyboard, mouse, and visual display make the traditional HTML web interface a good mechanism for selecting an item from a large number of search results; the limited input bandwidth of a phone makes the same task difficult for voice applications. Speech recognition can help crack this nut, but a usable speech interface takes considerable design effort. Prognosis: A difficult application for any voice system, but if you can get a good speech interface designed, VoiceXML may work just fine.
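
One common way to cope with a narrow keypad is to read results in small pages. The sketch below turns a result list into spoken menu prompts of a few items each, reserving key 9 for "more". The page size and key assignments are assumptions, not part of any standard.

```python
def voice_menu_pages(items, per_page=3):
    """Split a result list into short spoken menus. Keys 1..per_page pick an
    item on the current page; 9 moves to the next page (hypothetical mapping)."""
    if not 1 <= per_page <= 8:
        raise ValueError("per_page must be between 1 and 8 so key 9 stays free")
    pages = []
    for start in range(0, len(items), per_page):
        chunk = items[start:start + per_page]
        options = [f"For {name}, press {i}." for i, name in enumerate(chunk, start=1)]
        if start + per_page < len(items):      # more results remain
            options.append("For more choices, press 9.")
        pages.append(" ".join(options))
    return pages
```

With 300 search results and three items per page, a caller would face 100 pages. That is why narrowing the search first (by category, or with speech recognition) matters so much.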

Next Steps

If, after reading this article, you think your application would work well with VoiceXML, sign up for a free development account with one of the larger VoiceXML service providers and start experimenting. If you have doubts, look at the APIs provided by telecommunications hardware vendors such as Intel's Dialogic, Lucent, or NMS. If your application may use speech recognition, investigate the programming approaches used by SpeechWorks and Nuance.

VoiceXML is best suited for applications which require relatively little input from the user, deliver highly-targeted output, and in particular, provide a set of data which is already (or easily could be) available via an HTML web interface. When your application requires substantial content delivery, needs complex navigation or broad ranges of input, or is mission-critical, give careful consideration to your decision, seeking out a voice expert if you're not sure.

About the Author

Brian Brown has been designing and building telecommunications and telephony systems for 10 years, in various roles as employee, manager, company founder, and outside consultant. Brian is currently Vice President of Technology for a Denver-based transaction fulfillment startup. He holds a bachelor's degree in Computer Science from the Massachusetts Institute of Technology.


This article was originally published on January 22, 2003
