VoiceXML Developer Series, Introduction

  • By Jonathan Eisenzopf

VoiceXML Documents

Although this use is less popular than interactive dialogs, VoiceXML can also be used to read out synthesized text such as books, articles, or even Web pages.

<?xml version="1.0"?>
<vxml version="1.0">
  <form id="form1">
    <block name="block1">Hello, 
this is an example of a Voice XML document 
using synthesized text. As you can hear, 
it's a bit choppy. But I might be able to 
pass as a Cylon from Battlestar Galactica.
    </block>
  </form>
</vxml>

The document above is a good example of a simple VoiceXML document. vxml is the root element for VoiceXML documents in the same way that html is the root element for HTML documents. Most documents also contain a form element, which holds a combination of recorded or synthesized prompts and form fields that users fill in with DTMF tones from keypad selections or with spoken input. This example contains no fields, only a paragraph of text. Text blocks are usually encapsulated inside block elements.
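
For contrast, a form that actually collects input might look like the sketch below. It is an illustration only, not part of the original example: the form id, field name, and prompt wording are made up, and it assumes the platform supports the built-in digits field type, which accepts either spoken digits or key presses.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="login">
    <!-- Hypothetical field: collects a PIN by voice or by DTMF key presses -->
    <field name="pin" type="digits">
      <prompt>Please say or key in your PIN.</prompt>
      <!-- Runs once the field has been filled -->
      <filled>
        <prompt>You entered <value expr="pin"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```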

Voice Dialogs

The steps above sum up the activities that make up a single dialog interaction. In fact, most VoiceXML applications allow the user to hold a continuous dialog until they hang up. There are actually two types of voice dialogs that VoiceXML handles: directed and mixed initiative.

A directed dialog is one in which the system controls when and how the user can interact with it. Good examples are the numerous IVR systems that let us check our account balances. The system plays a pre-recorded prompt, giving us a menu of selections and prompting us to push a number for a given item. Once the selection has been made, the system either gives us the information we've requested or plays another prompt for a sub-menu. For example:

Computer: For account balance, press one. 
For recent transactions posted to your 
account, press two. To transfer funds, 
press three.
User: 3 (DTMF)
Computer: To transfer from savings, press 
one. To transfer from checking, press two.
User: 1 (DTMF)
Computer: Please enter the amount to 
transfer using your keypad...

These systems are effective but not friendly. They don't allow the user to control the call flow other than to select a pre-defined choice. VoiceXML provides the <menu> tag, which gives us the same essential functionality as a standard IVR system.
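
As a rough sketch, the first prompt of the banking dialog above could be written with a menu like the following. The target file names (balance.vxml and so on) are hypothetical placeholders, and the explicit dtmf attribute on each choice is one way to bind keys; a platform may also match the spoken choice text.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <menu id="main">
    <prompt>For account balance, press one. For recent transactions
      posted to your account, press two. To transfer funds, press three.</prompt>
    <!-- Each choice binds one key to a (hypothetical) target document -->
    <choice dtmf="1" next="balance.vxml">account balance</choice>
    <choice dtmf="2" next="transactions.vxml">recent transactions</choice>
    <choice dtmf="3" next="transfer.vxml">transfer funds</choice>
    <!-- Replay the menu if the caller stays silent or presses another key -->
    <noinput>I did not hear anything. <reprompt/></noinput>
    <nomatch>That is not a valid selection. <reprompt/></nomatch>
  </menu>
</vxml>
```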

The value would be high enough if it gave us only equivalent functionality, but VoiceXML also lets us leverage recent advances in speech recognition quality so that users can interact with systems in a more natural way: through conversation. A mixed initiative dialog lets the user make requests in the same way you might ask a co-worker for a piece of information. It's up to the VoiceXML developer to guide the user towards the right verbal commands and then to recognize them. For example:

User: Transfer two hundred dollars from savings 
to checking.
Computer: Please verify that you want to transfer 
two hundred dollars from savings to checking by 
saying yes, or say no to start over.
User: Yes.

Whether to use a directed dialog with menu selections or a mixed initiative dialog depends on the need, so let's look more closely at what VoiceXML provides for menu-driven dialogs versus more open-ended ones. First, like HTML forms, VoiceXML forms may contain multiple fields that can be filled out in any order the user chooses (though you could force the order through JavaScript). In fact, VoiceXML supports mixed initiative dialogs via the <form> and <field> elements.
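
A mixed initiative version of the funds transfer might be sketched as follows. Treat it as an assumption-laden outline rather than a tested application: the grammar file transfer.gram is hypothetical, the grammar format and MIME type vary by platform, and the field names are invented for illustration. The form-level grammar lets one utterance fill several fields at once, and any field left empty is then prompted for individually.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="transfer">
    <!-- Form-level grammar (hypothetical file): one utterance can fill
         several fields. Grammar format and type depend on the platform. -->
    <grammar src="transfer.gram" type="application/x-jsgf"/>

    <!-- Open-ended opening prompt for the mixed initiative phase -->
    <initial name="start">
      <prompt>How can I help you?</prompt>
      <nomatch count="1">
        Please say something like, transfer two hundred dollars
        from savings to checking.
      </nomatch>
      <nomatch count="2">
        Sorry, I still did not understand. I will ask for one item at a time.
        <!-- Mark the initial item as done so the fields take over -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <!-- Fields not filled by the opening utterance are collected one by one -->
    <field name="amount" type="currency">
      <prompt>How much would you like to transfer?</prompt>
    </field>
    <field name="source">
      <prompt>Transfer from savings or checking?</prompt>
      <option>savings</option>
      <option>checking</option>
    </field>
    <field name="destination">
      <prompt>Transfer to savings or checking?</prompt>
      <option>savings</option>
      <option>checking</option>
    </field>

    <!-- Runs only after all three fields have values -->
    <filled mode="all" namelist="amount source destination">
      <prompt>Transferring <value expr="amount"/> from
        <value expr="source"/> to <value expr="destination"/>.</prompt>
    </filled>
  </form>
</vxml>
```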

Forms are not the only place where speech can be used. Menus can also use voice recognition in addition to recognizing phone key presses (DTMF tones).

<menu dtmf="true">
  <!-- dtmf="true" assigns keys 1, 2, 3 to the choices in document order -->
  <prompt>What is your favorite color? For red, 
say red or press 1. For blue, say blue or press 2. For 
yellow, say yellow or press 3.</prompt>
  <choice next="red.vxml">red</choice>
  <choice next="#blue.vxml">blue</choice>
  <choice next="yellow.vxml#yel">yellow</choice>
</menu>

The code segment above lets the user make a selection either by using the phone keypad or by simply saying the color they prefer. The text inside each choice element specifies the string that the ASR (automatic speech recognition) engine should try to match. You could (and should) prompt for DTMF tones ("press 1") or spoken text ("say red") but not both.

VoiceXML Deployment Costs

I'm often asked how much a VoiceXML system costs to deploy. Fortunately, the range is wide, and the cost depends on whether you need a dedicated system or are willing to outsource to a Voice Service Provider (VSP). A dedicated VoiceXML gateway usually starts around $100,000 for the hardware, software, and installation, depending on how many concurrent callers you need to handle.

On the low end, VSPs usually charge you per minute so you only have to pay for actual use. Prices are a few cents more than you're probably paying for long distance service and the top providers (TellMe, BeVocal, and Voxeo) are all quite good in terms of national coverage and pricing.

There really isn't a firm middle ground yet (below $100,000), but we should expect to see offerings in the $30,000 to $50,000 range as competition heats up and competitors move to serve demand in the mid-sized enterprise space. We will be looking at specific products in a future article and product reviews so that you have a better sense of what the options are.

Developing VoiceXML Applications

As was mentioned previously, VoiceXML gateways retrieve VoiceXML files over HTTP from any standard Web server. This also means that dynamic applications can be built with the same languages and technologies that you're using to build Web applications today. This is truly one of the great advantages of the technology. Furthermore, if you've gone to the trouble of separating your business logic from the presentation logic, you can reuse that same business logic for VoiceXML applications by swapping out the HTML presentation layer for VoiceXML content. JavaBeans, CORBA, and .NET are all technology architectures that encourage this type of logic/presentation separation.
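
From the VoiceXML side, handing control to a server-side script can look much like an HTML form post. The sketch below assumes a hypothetical script at www.example.com that receives the collected value and replies with the next VoiceXML document; the URL, form id, and field name are placeholders, not part of any real service.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="balance">
    <field name="account" type="digits">
      <prompt>Please enter your account number.</prompt>
    </field>
    <block>
      <!-- Hypothetical URL: post the field value to a server-side script,
           which is expected to return the next VoiceXML document -->
      <submit next="http://www.example.com/balance.jsp"
              method="post" namelist="account"/>
    </block>
  </form>
</vxml>
```

The script behind that URL can call the same business logic as the HTML version of the application and differ only in the markup it emits.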

If all of your code is still embedded in a JSP, ASP, or ColdFusion page, don't fret. You can reuse the existing code in new templates or take this opportunity to separate the code logic into libraries or components. We will address this process in a future article.

Vendors and Tools Support

Support for VoiceXML is nonexistent in most Web development tools that you might be using now, like Dreamweaver and BBEdit. However, you can use an XML tool like XMLSpy to develop and validate VoiceXML documents. Several VoiceXML editors from independent providers, such as Voice Studio from Cambridge VoiceTech and V-Builder from Nuance, are also shaping up fast.

Support from big vendors is on the horizon, however. IBM is one of the few vendors that has integrated VoiceXML into its code editor for WebSphere. This isn't surprising, though, since IBM is one of the leading VoiceXML platform providers.

The Future of VoiceXML

The W3C hasn't made it totally clear what the next steps are beyond VoiceXML 2, other than the specification drafts that have been published in the past year. It seems likely that VoiceXML will be broken up into several specifications that control various aspects of a voice dialog, like speech synthesis or grammars. This will provide clarity and drive industry adoption. It will also create complexity. We'll have to wait and see what balance is chosen in moving the VoiceXML standard forward. What is clear, however, is that VoiceXML (or whatever it becomes) is here to stay.

One large technology vendor that has remained silent for some reason is Microsoft. I expect that we'll see something like Voice.Net in the future. It's worth noting that Microsoft licensed technology from Lernout & Hauspie, which was the leading voice technology vendor until it filed for bankruptcy after creatively inventing some revenues in Asia.


Well, I hope you've enjoyed reading this introduction to VoiceXML as much as I have enjoyed writing it. I hope that you'll come back for the next edition of VoiceXML Developer as we learn more about VoiceXML.


VoiceXML Development Tools

  • Nuance V-Builder - http://extranet.nuance.com
  • IBM WebSphere
  • Cambridge VoiceTech Voice Studio - http://www.cambridgevoicetech.com

Voice Portal MSP

  • Voxeo - http://www.voxeo.com
  • BeVocal - http://www.bevocal.com

Turnkey Systems

  • Voice Genie - http://www.voicegenie.com
  • Cambridge VoiceTech - http://www.cambridgevoicetech.com

Articles

  • CTLabs VoiceXML Portal Report - img.cmpnet.com/commweb2000/whites/VXMLreport.pdf
  • Tellme More - http://www.voicexmlplanet.com
  • VoiceXML Adventure - http://www.voicexmlplanet.com

Web Sites

  • VoiceXML Planet - http://www.voicexmlplanet.com
  • VoiceXML Forum - http://www.voicexml.org

Training

  • The Ferrum Group, LLC - http://www.ferrumgroup.com

About Jonathan Eisenzopf

Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston, Virginia that specializes in Voice Web consulting and training. He has also written articles for other online and print publications including WebReference.com and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com with questions or comments about the VoiceXML Developer series, or for more information about training and consulting services.

This article was originally published on October 1, 2002
