
Demystifying 10 Common Misconceptions About VoiceXML

  • By Jonathan Eisenzopf

It's easy to write VoiceXML applications

Because VoiceXML is based on existing Web standards, many of the techniques and skills that Web developers have amassed over the past few years will translate into developing speech applications. However, Web developers too often underestimate the learning curve required to develop voice user interfaces and the difficulties that arise when integrating VoiceXML applications with telephony equipment.

For example, how do you route callers from the PBX to the VoiceXML gateway? Or how do you transfer a VoiceXML caller into the ACD? To become an effective speech application developer, you'll need to have a foundation in Web development, telephony, and networking.
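To give a feel for the VoiceXML side of that second question, here is a minimal sketch of a bridged transfer, assuming a VoiceXML 2.0 gateway. The form name, field name, and phone number are placeholders made up for this example. How the call actually reaches your ACD depends on how the gateway is wired into your telephony environment, which is exactly the part VoiceXML does not cover.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="to_agent">
    <!-- Hypothetical ACD queue number; replace it with your own. -->
    <transfer name="agent_call" dest="tel:+12025550100"
              bridge="true" connecttimeout="20s">
      <prompt>Please hold while I connect you to an agent.</prompt>
      <!-- With a bridged transfer, the outcome is stored in agent_call. -->
      <filled>
        <if cond="agent_call == 'busy'">
          <prompt>All agents are busy. Please try again later.</prompt>
        <elseif cond="agent_call == 'noanswer'"/>
          <prompt>No one answered. Please try again later.</prompt>
        </if>
      </filled>
    </transfer>
  </form>
</vxml>
```

The tag itself is simple. The hard part is everything behind the `dest` attribute, so confirm with your vendor which destination formats and transfer types the gateway supports.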

Mastering speech applications also requires knowledge and experience in designing speech interfaces. That skill is part art, part science. There are few resources on designing Voice User Interfaces (VUIs), and only a handful of people, and even fewer companies, have significant experience in this area. One book I can recommend, however, is:

"Designing Effective Speech Interfaces" by Susan Weinschenk and Dean T. Barker, published by Wiley.

VoiceXML as a specification is fairly easy to learn, but don't think that means you can easily develop a good speech application. The best way to test your success is to have a friend test it in their car, on their cell phone, in traffic.

VoiceXML is portable if I use the standard tags

Wrong. Even though I wish it were so, I can't copy my applications from Tellme, to BeVocal, to Voxeo, to VoiceGenie and have them work without any changes. I'm not sure that I will EVER be able to, because of the subtle differences in how vendors implement the standard.

What this means is that you can't develop and test your application on Tellme for free, and then go out and buy a dedicated gateway from VoiceGenie without any code changes. Fortunately, the code changes will be minor in scope compared to, say, porting a C application to Java. Still, it's best to select your platform before you start developing the application so you know it will work when it's deployed. If you know you'll be going with a VoiceGenie gateway, go ahead and develop and test your application in their hosted development environment. Then you know that your application will work exactly the same when you install it on the dedicated platform.

I've programmed IVRs so speech should be a breeze

Whoa there! This is equivalent to a Web developer saying that they can develop VoiceXML applications with no training. Experience with touch-tone IVRs will give you a good perspective on how a speech application will function in your existing environment; however, you will need to become familiar with Web protocols and programming environments.

Fortunately, IVR programmers have a leg up on understanding how to design a VUI. Most of this experience does translate to speech. However, you will have to throw out some of the design criteria and assumptions that you would normally make for a touch-tone interface. You'll have to switch from thinking in terms of a menu tree to thinking more about speech dialog progressions.
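For reference, this is roughly what the familiar menu-tree approach looks like when carried over to VoiceXML. It is a minimal sketch assuming a VoiceXML 2.0 platform, and the menu, form names, and prompts are made up for illustration. Setting `dtmf="true"` assigns the keys 1 and 2 to the choices in order, while the choice text doubles as the phrase a caller can speak.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Touch-tone style menu: one prompt, a fixed set of branches. -->
  <menu id="main" dtmf="true">
    <prompt>For sales, press 1 or say sales.
            For support, press 2 or say support.</prompt>
    <choice next="#sales">sales</choice>
    <choice next="#support">support</choice>
    <nomatch>
      <prompt>Sorry, that is not one of the options.</prompt>
      <reprompt/>
    </nomatch>
  </menu>

  <!-- Hypothetical destination dialogs. -->
  <form id="sales">
    <block><prompt>You have reached sales.</prompt></block>
  </form>
  <form id="support">
    <block><prompt>You have reached support.</prompt></block>
  </form>
</vxml>
```

A menu like this works, but it is still a touch-tone design with speech bolted on. A speech-first design would think about what callers are likely to say at each turn and let the dialog progress from there.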

Since VoiceXML is an open standard, integrating a gateway with our PBX, ACD, or call center will be easier

Actually, the exact opposite is probably true, but for a different reason. Yes, it is true that VoiceXML is an open standard, which means that you will have more options in the future, but openness doesn't necessarily have anything to do with maturity. What I mean is that IVR systems that have had years to develop and mature will likely have features, tools, and integration options that VoiceXML gateways lack. Also, VoiceXML has limited call control functionality and no CTI integration capabilities. Gateway vendors either provide this functionality using proprietary APIs or rely on a third-party product such as Intel's CT Connect. If you have a complex telephony environment, you will want to be very careful about which vendor you select. Make sure the vendor can explain exactly how they will integrate their product into your environment.

With VoiceXML, callers will be able to just talk to the system naturally and it will understand

This misconception comes from continuous speech recognition products like Dragon Dictate and IBM ViaVoice, which allow users to speak Word and email documents into existence. The speech recognition used in VoiceXML typically requires developers to create grammars. These grammars define everything that a caller can say. If the caller says something that's not in the grammar, it will not be recognized. Furthermore, nothing in VoiceXML allows the speech recognition engine to take an action based on its interpretation of what was said. The actions are all coded into the VoiceXML code.

Recently, however, Nuance and SpeechWorks have introduced versions of their speech recognition engines that allow callers to speak more naturally by using statistical models instead of strictly defined grammars. This technology is still experimental from a VoiceXML standpoint, and the Voice Browser Working Group at the W3C is still working out how to handle semantic interpretation for speech recognition. Within a year or so, it may be possible for a system to ask, "How may I help you?" Until then, grammars must be hand-coded, restricting the level of natural language that can be used in VoiceXML applications.
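As a rough illustration of how a hand-coded grammar limits what a caller can say, here is a minimal sketch of a VoiceXML 2.0 field with an inline grammar in the W3C XML grammar format. The form, field, and department names are made up for this example, and your platform may expect a different grammar format, so treat it as a sketch and not as something that will run unchanged on every gateway.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="department">
    <field name="dept">
      <prompt>Say sales, support, or billing.</prompt>

      <!-- The grammar lists every utterance the recognizer will accept. -->
      <grammar type="application/srgs+xml" version="1.0"
               root="dept" mode="voice">
        <rule id="dept" scope="public">
          <one-of>
            <item>sales</item>
            <item>support</item>
            <item>billing</item>
          </one-of>
        </rule>
      </grammar>

      <!-- Anything outside the grammar ends up here. -->
      <nomatch>
        <prompt>Sorry, I did not understand.</prompt>
        <reprompt/>
      </nomatch>

      <!-- The action is coded in the document, not decided by the recognizer. -->
      <filled>
        <prompt>Connecting you to <value expr="dept"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A caller who says "I have a question about my invoice" falls into the `nomatch` handler, even though a human would route the call to billing without a second thought. Covering that phrasing means adding it to the grammar by hand.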

VoiceXML is too new and isn't well supported

Well, this may have been true a year and a half ago, but things have changed rapidly since then. Here's a partial list of recognizable companies offering VoiceXML capabilities. You be the judge as to whether VoiceXML is being supported:

  • Lucent
  • Cisco
  • IBM
  • Sun
  • Oracle
  • Siemens
  • Nortel
  • Intel
  • Motorola
  • AT&T

As to VoiceXML being new: yes, it's fairly new, but it's based on stable technologies that have been developed over the last 30 years or so.

There really isn't a demand for VoiceXML yet and analysts haven't recommended it

To debunk the myth that VoiceXML is not getting traction, I talked with four speech recognition and IVR vendors. All four told pretty much the same story. Customers are including VoiceXML as a requirement in their Requests for Proposals (RFPs) and are in the early stages of evaluating or developing VoiceXML applications.

As to analyst coverage, there has been some. Gartner published "IVR Magic Quadrant for 1H02 - Challenges for Incumbents," in which speech recognition and VoiceXML are two drivers for IVRs. This briefing can be downloaded from the InterVoiceBrite Web site.


I hope these insights will save you from some of the flawed assumptions that I've made in the past. If you have stories or tidbits of advice that you'd like to share, send them over and I might publish them in the future.

About Jonathan Eisenzopf

Jonathan is a member of the Ferrum Group, LLC, which specializes in Voice Web consulting and training. He will be teaching the VoiceXML Bootcamp June 10-13 in Washington, D.C. Feel free to send an email to eisen@ferrumgroup.com with questions or comments about this or any article, or for more information about training and consulting services.


This article was originally published on November 6, 2002
