VoiceXML Developer Series: A Tour Through VoiceXML

  • By Jonathan Eisenzopf

In this issue of the VoiceXML Developer, we'll begin a complete walkthrough of all the elements included in the VoiceXML 1.0 specification. This issue introduces the basic elements used to mark up content for the voice Web. We will focus primarily on the functionality that allows VoiceXML to control text-to-speech (TTS) output.

Root Element

The root element of a VoiceXML document is the <vxml> element, which is similar to the <html> tag in HTML. The root element is preceded by an XML declaration and an optional document type declaration.

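<?xml version="1.0"?>
<!-- Supply your platform's DTD URL as the system identifier; '...' is a placeholder, not a real URL -->
<!DOCTYPE vxml PUBLIC '-//Nuance/DTD VoiceXML 1.0//EN' '...'>
<vxml version="1.0">
  <form>
    <block>can anybody hear me?</block>
  </form>
</vxml>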

The document type declaration above refers to Nuance's DTD for VoiceXML 1.0, which the Nuance platform needs in order to run the document properly. On another platform, change the declaration to match your vendor's DTD, or remove it altogether, since it is not required.

The <form> element is similar to an HTML form in that it can contain multiple fields, which are filled in and submitted by a user. VoiceXML forms work in much the same way, albeit through a voice interface rather than a visual one. The <block> element, which is roughly the VoiceXML equivalent of the HTML <p> tag, passes the enclosed text to a TTS (text-to-speech) engine to be synthesized.
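
Because the document type declaration is optional, the same kind of document can be written without it. The following is only a minimal sketch, assuming a platform that does not insist on a vendor DTD. The block names "greeting" and "farewell" and the second sentence are made up for illustration. Blocks in a form are visited in document order, so the two sentences are spoken one after the other.

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <!-- first block: spoken first -->
    <block name="greeting">can anybody hear me?</block>
    <!-- second block (hypothetical): spoken after the first finishes -->
    <block name="farewell">Thank you for calling.</block>
  </form>
</vxml>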

A VoiceXML document

The following is a first look at a complete VoiceXML document that uses the elements we'll be learning about today. If you are using a VoiceXML editor such as V-Builder, you should be able to copy and paste the example into your editor and play it. To hear the example over the phone, call VoiceXML Planet at 510-315-6666. At the first menu, press 1. At the demo menu, press 1 again to hear the example below.

<?xml version="1.0" encoding="iso-8859-1"?>

<vxml version="1.0">
  <form id="form1">
    <block name="block1">Hello,
      this is an example of a Voice XML document using
      synthesized text. As you can hear, it's a bit choppy.
      But I might be able to pass as a silon from battle
      star galactica.
    </block>

    <block name="block2">
      <prompt>Voice XML provides some features for
      controlling how I pronounce words and phrases.
      For example, you can create a pause.
      <break size="large" msecs="5000" />
      I can also emphasize a phrase. John Bigbootae,
      I <emp level="strong">must</emp>
      have that overthruster!
      </prompt>
    </block>

    <block name="block3">
      <pros vol="1" rate="-50%"><audio
      src="../prompts/prompt1.wav" />
      synthesized prompts.</pros>
    </block>

    <block name="block4">
      <prompt>Sometimes, you may need to tell me how
      to pronounce a phrase such as a date, currency
      or abbreviation.
      Please mail
      <sayas class="currency">$10,000.55</sayas> into
      <sayas sub="world wide web consortium">W3C</sayas>
      account number
      <sayas class="digits">55432</sayas> by,
      <sayas class="date">October 11, 2001</sayas> or call,
      <sayas class="phone">800-555-1212</sayas>
      </prompt>
    </block>

    <block name="block5">
      You can also control the <pros pitch="+50%">
      prosity of <pros vol="1" rate="-50%">
      my speech including volume,
      pitch, and speaking rate.</pros></pros>
    </block>
  </form>
</vxml>

The example above contains five <block> elements. The first block contains nothing but text, which is synthesized by the TTS engine. The second block creates a pause with the <break> element and adds emphasis to a synthesized phrase with the <emp> element. The third block plays a pre-recorded prompt with the <audio> element, followed by synthesized text, and uses <pros> to raise the volume and slow the speaking rate. The fourth block uses <sayas> to tell the engine how to pronounce common classes of text (in this case currency, digits, a date, and a phone number) and to substitute spoken words for the abbreviation W3C. The fifth block nests two <pros> elements to change the pitch, volume, and speaking rate of the enclosed text.
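
To make each element easier to pick out, the sketch below isolates one TTS control per block. It is an illustration rather than part of the demo: the spoken sentences are invented, and the attribute values are limited to ones already used in the example above, apart from the shorter msecs value. Each control is wrapped in <prompt>. The example above also places <pros> directly inside <block>, which some platforms accept, but wrapping in <prompt> is the more conservative choice.

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        <!-- pause for 500 milliseconds between the two sentences -->
        First sentence. <break msecs="500" /> Second sentence.
      </prompt>
    </block>
    <block>
      <prompt>
        <!-- put strong emphasis on two words -->
        Please <emp level="strong">do not</emp> hang up.
      </prompt>
    </block>
    <block>
      <prompt>
        <!-- lower the speaking rate by 50 percent for this phrase only -->
        <pros rate="-50%">I am speaking more slowly now.</pros>
      </prompt>
    </block>
    <block>
      <prompt>
        <!-- ask the engine to treat the text as a string of digits -->
        Your code is <sayas class="digits">55432</sayas>.
      </prompt>
    </block>
  </form>
</vxml>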

Playing pre-recorded prompts with <audio>

<audio src="hi.wav">Hello there</audio>

The <audio> element plays a pre-recorded prompt. The src attribute specifies the URL of the audio file (usually a WAV file). The <audio> element may also contain text, which the TTS engine synthesizes if the sound file cannot be retrieved.
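
Placed in a complete document, the fallback behavior might look like the sketch below. The file name menu.wav and its fallback sentence are hypothetical. Only hi.wav and "Hello there" come from the snippet above.

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <!-- hi.wav is played if it can be fetched; otherwise "Hello there" is synthesized -->
      <audio src="hi.wav">Hello there</audio>
      <!-- hypothetical second prompt with its own fallback text -->
      <audio src="menu.wav">Press 1 to hear the demo.</audio>
    </block>
  </form>
</vxml>

Keeping the fallback text identical to the recorded wording means callers hear the same message either way, only in a different voice.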

We will be covering the process of recording prompts in more detail in a future article.


This article was originally published on October 2, 2002
