VoiceXML Developer Series: A Tour Through VoiceXML
Controlling pitch, volume, and speed of TTS
You can emphasize synthesized words and phrases with the <emp> element. For example:
<emp level="strong">Officer</emp>, you must have mistaken my Dodge Dart for another lime green automobile.
The level attribute can be set to strong, moderate, or reduced based upon the emphasis you desire. The default is moderate.
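As a rough sketch of how the three level values named above might be compared side by side, the following document speaks the same word at each level. The sentences and the block name emphasisDemo are made up for illustration, and how audible the difference is will depend on the TTS engine.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block name="emphasisDemo">
      <prompt>
        <!-- strong: the most pronounced emphasis -->
        I said <emp level="strong">stop</emp>.
        <!-- moderate is the default, so level could be omitted here -->
        I said <emp level="moderate">stop</emp>.
        <!-- reduced: de-emphasizes the word -->
        I said <emp level="reduced">stop</emp>.
      </prompt>
    </block>
  </form>
</vxml>
```

Listening to all three in one prompt is a quick way to find out whether a given platform distinguishes the levels at all.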
The <pros> element changes the pitch, speaking rate, and volume of the text it encloses. For example:
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="1.0">
  <form>
    <block name="block5">
      <prompt>
        <pros pitch="+90%" rate="+40%">Hey turtle, you wanna race? Come on.</pros>
        <pros pitch="-40%" rate="-30%">Now rabbit, how many times do I have to win before you give up?</pros>
      </prompt>
    </block>
  </form>
</vxml>
In the example above, we raise the pitch and speaking rate when the rabbit speaks and lower both when the turtle speaks. The attributes of the <pros> element can be raised or lowered by a relative percentage. The rate attribute specifies the number of words that the TTS engine speaks per minute, while the volume attribute (vol) controls the volume (1 is the maximum). The controls for defining prosody were borrowed from the Java Speech Markup Language (JSML) developed by Sun (see the Resources section at the end of the article).
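Besides relative percentages, the description above implies that rate and volume can also be given as absolute values. The fragment below is a minimal sketch of that idea. It assumes the volume attribute is spelled vol and that plain numbers are accepted (words per minute for rate, a value up to 1 for volume), as in JSML. Check your platform's documentation before relying on either assumption.

```xml
<prompt>
  <!-- Assumption: rate="120" means roughly 120 words per minute -->
  <!-- Assumption: vol="0.5" means half of the maximum volume (1) -->
  <pros rate="120" vol="0.5">
    This sentence should sound slower and quieter than usual.
  </pros>
  <!-- Relative form, in the same style as the rabbit and turtle example -->
  <pros rate="+20%">
    This sentence should sound a little faster than usual.
  </pros>
</prompt>
```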
Use <sayas> to pronounce special character classes
I mentioned a little earlier that VoiceXML is capable of pronouncing certain classes of text. For example, you wouldn't want the TTS engine to pronounce $220.25 as "dollar-two-two-zero-period-two-five". Rather, you would want it to say, "Two hundred twenty dollars and twenty-five cents". VoiceXML also borrows the <sayas> element from JSML. The five built-in classes defined in the JSML specification are date, digits, literal, number, and time. VoiceXML platforms may accept additional classes, such as the currency class used in the first example below. Let's take a look at a couple of examples:
Your speeding ticket comes to <sayas class="currency">$250.00</sayas> plus tip.
You must pay the fine by <sayas class="date">December 1, 2002</sayas>.
Prisoner <sayas class="digits">5164</sayas>, what are you in for?
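To show the remaining classes in context, here is a small sketch of a complete document that contrasts number with digits on the same text and adds a time value. The prompt wording and the block name sayasDemo are invented for illustration, and the input format each class accepts (especially for time) is an assumption that varies by platform.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block name="sayasDemo">
      <prompt>
        <!-- number: read as a quantity, not digit by digit -->
        There are <sayas class="number">1250</sayas> cars in the impound lot.
        <!-- digits: the same text read one digit at a time -->
        Your claim number is <sayas class="digits">1250</sayas>.
        <!-- time: the accepted format is an assumption here -->
        The lot closes at <sayas class="time">5:30</sayas>.
      </prompt>
    </block>
  </form>
</vxml>
```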
The <sayas> element also provides a sub attribute, which allows us to control how the TTS engine pronounces words, phrases or abbreviations. For example:
<sayas sub="world wide web consortium">W3C</sayas>
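The same substitution technique can be used more than once in a single prompt. The fragment below is only a sketch with made-up sentence text: the element content holds the written form and sub holds what the engine should actually say.

```xml
<prompt>
  <!-- The written abbreviation stays in the content; sub supplies the spoken form -->
  Please mail your payment to the
  <sayas sub="department of motor vehicles">DMV</sayas>
  office in Reston,
  <sayas sub="Virginia">VA</sayas>.
</prompt>
```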
Control pauses with <break>
The <break> element forces a pause in the execution flow. It can be used inside <audio>, <prompt>, and <pros> elements. The length of the pause is specified by the msecs attribute. For example:
<block>
  <prompt>
    The current temperature in San Francisco is fifty eight degrees.
    <break msecs="5000"/>
    The traffic on the Golden Gate Bridge is . . .
  </prompt>
</block>
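Shorter pauses are often more useful than a five-second gap, for instance to separate the items of a spoken list. The following is a hedged sketch of that pattern. The durations, the prompt text, and the block name listPauses are arbitrary choices for illustration, and the actual pause length you hear may differ between TTS engines.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block name="listPauses">
      <prompt>
        Today's report covers three topics.
        <!-- A half-second pause before the list begins -->
        <break msecs="500"/>
        Weather.
        <!-- Quarter-second pauses between items -->
        <break msecs="250"/>
        Traffic.
        <break msecs="250"/>
        Sports.
      </prompt>
    </block>
  </form>
</vxml>
```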
We will continue our tour of VoiceXML in the next issue. For now, some closing thoughts on the elements that have been introduced so far. First, be forewarned that each TTS engine is different. For example, it seems that one TTS engine counts milliseconds differently for the <break> element than another. In addition, support for the TTS components of the VoiceXML specification remains spotty and inconsistent. Some implementations may not recognize certain elements at all. Finally, when using elements such as <pros> and <sayas>, make sure that the platform you're testing on is the same platform you're deploying on, or you will be in for big surprises. Well, that's it for now. I'll see you next time as VoiceXML Developer continues to dig deep into the voice Web.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston, Virginia, that specializes in Voice Web consulting and training. He has also written articles for other online and print publications, including WebReference.com and WDVL.com. Feel free to send an email to email@example.com with questions or comments about the VoiceXML Developer series, or for more information about training and consulting services.