
Building VoiceXML Dialogs

  • August 16, 2004
  • By Jeff Kusnitz & Dr. Bruce Lucas

A grammar's header is typically used to identify a single rule within the grammar as its "root rule" via the "root" attribute. This root rule is automatically activated when the grammar is referenced by a VoiceXML application, unless the application indicates that a different rule should be applied.
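For example, a skeletal SRGS XML grammar for the sandwich example might declare its root rule in the header like this (a sketch only; the rule bodies are abbreviated, and the point here is just the root declaration):

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <grammar xmlns="http://www.w3.org/2001/06/grammar"
           xml:lang="en-US" version="1.0" root="sandwich">

    <!-- "sandwich" is the root rule: it is active by default whenever
         a VoiceXML application references this grammar as a whole. -->
    <rule id="sandwich">
      <ruleref uri="#ingredient"/>
      <!-- remainder of the rule body omitted -->
    </rule>

    <rule id="ingredient">
      <one-of>
        <item>ham</item>
        <item>roast beef</item>
        <!-- further alternatives omitted -->
      </one-of>
    </rule>

  </grammar>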

The grammars in the previous examples were written in the SRGS XML format. They could also have been written using the SRGS Augmented BNF (ABNF) syntax. In addition to being more compact and more human-readable, the ABNF form is also more familiar to grammar developers.

An ABNF version of the sandwich grammar might look like this:

  #ABNF 1.0 ISO-8859-1;

  language en-US;
  root $sandwich;

  $ingredient = ham | roast beef | tomato | lettuce |
                swiss [ cheese ];

  $bread = rye | white | whole wheat;

  $sandwich = $ingredient ( [ and ] $ingredient ) <0-> on $bread;

A typical dialog enabled by the above form and either of the grammars might be:

   Browser: What would you like to drink?
   User: Orange juice
   Browser: What sandwich would you like?
   User: Roast beef lettuce and swiss on rye

As with the previous form example, once the browser has collected input for both fields, the final block will be executed, sending the drink and sandwich variables to the getOrder application for processing.
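A minimal sketch of such a final block follows; the servlet URI is an illustrative placeholder, and the namelist attribute names the two field variables being submitted:

  <block>
    <!-- Send the collected field variables to the order-processing
         application; the URI here is a hypothetical placeholder. -->
    <submit next="http://www.example.com/servlet/getOrder"
            namelist="drink sandwich"/>
  </block>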

In each of the preceding VoiceXML examples, <prompt> tags were used to indicate text to be synthesized by the browser and spoken to the user. The content of each prompt is Speech Synthesis Markup Language, or SSML. SSML not only allows a voice application developer to specify text to be synthesized, but also provides a means to specify prerecorded audio that should be played.
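For instance, a prompt might mix synthesized text with a prerecorded clip via SSML's audio element; the file name below is purely illustrative, and the text inside the element is spoken as a fallback if the audio cannot be fetched:

  <prompt>
    Thank you for your order.
    <audio src="http://www.example.com/audio/goodbye.wav">
      <!-- fallback text, spoken only if the audio clip is unavailable -->
      Goodbye.
    </audio>
  </prompt>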

In addition, SSML has numerous parameters for controlling the output itself: the output volume, the rate at which synthesized text is spoken, which portions are emphasized, and so on.
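As a sketch of what this looks like in markup, the prosody and emphasis elements can wrap portions of a prompt; the attribute values shown are simply examples:

  <prompt>
    <!-- Slow down and raise the volume for this sentence. -->
    <prosody rate="slow" volume="loud">
      Please listen carefully, as our menu options have changed.
    </prosody>
    Your total comes to
    <emphasis level="strong">ten dollars</emphasis>.
  </prompt>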

A complete discussion of SRGS and SSML is beyond the scope of this article; their respective specifications are a good starting point for further reading. Stay tuned for the second installment of Building Conversational Applications Using VoiceXML, which will address how VoiceXML takes advantage of the distributed, web-based application model, as well as advanced features including local validation and processing, audio playback and recording, support for context-specific and tapered help, and support for reusable subdialogs.

About the Authors

Jeff Kusnitz has been with IBM since 1987 and focuses on telephony and speech recognition platforms. He is currently IBM's representative to the VoiceXML Forum and to the W3C Voice Browser working group, where he works on voice application specifications and on platform and developer certifications.

Bruce Lucas has been with IBM since 1986. He was the lead designer and developer of IBM's Speech Mark-up Language and VoiceXML browsers, has been IBM's representative to the VoiceXML Forum and the W3C Voice Browser working group, and is a co-author of and major contributor to the VoiceXML 1.0 and 2.0 specifications and related W3C specifications.


