Natural vs. Direct Dialog and How VoiceXML Enables Both, Page 2
These <tag> elements determine what information is returned to the VoiceXML application when a rule is matched. For example, if the "fromWhere" rule is activated and the user says "from San Francisco," the tag contents tell the rule to ignore the word "from" and return only whatever the "cities" rule returned ($ = $cities). The "cities" rule returns the phrase "San Francisco," so the "fromWhere" rule returns "San Francisco" as well.
Similarly, the more complex "travel" rule says that the user can say the optional phrase "I'd like to fly" followed by a phrase matching the "fromWhere" rule and then a phrase matching the "toWhere" rule, or by the same two phrases in the reverse order.
As in the simple case, once the user says something that matches a rule of the grammar, the semantic interpretation information is processed. The "travel" rule's <tag> element says that when a phrase is matched, the results of the $fromWhere and $toWhere rules are stored in the $.from_city and $.to_city variables respectively. This assignment has a special meaning when the grammar is used as a form-level grammar, as it is here: the VoiceXML Browser fills the "from_city" field with the value from $.from_city, and the "to_city" field with the value from $.to_city.
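The way rule results propagate into form fields can be imitated outside a VoiceXML Browser. The following is a minimal Python sketch, not VoiceXML and not the article's grammar: the city list, the function name interpret_travel, and the use of regular expressions in place of grammar rules are all assumptions made for illustration.

```python
import re

# Assumed city list; the article's actual "cities" rule may list others.
CITIES = ["San Francisco", "New York", "Chicago"]
_CANONICAL = {city.lower(): city for city in CITIES}
_CITY = "|".join(re.escape(city) for city in CITIES)

# Optional lead-in phrase from the "travel" rule.
_PREFIX = r"(?:i'd like to fly )?"

# Each named group stands in for a <tag>: the words "from" and "to" are
# matched but discarded, and only the city is kept (like $ = $cities).
# Two patterns cover the two orderings the "travel" rule accepts.
_TRAVEL_PATTERNS = [
    re.compile(
        rf"{_PREFIX}from (?P<from_city>{_CITY}) to (?P<to_city>{_CITY})",
        re.IGNORECASE,
    ),
    re.compile(
        rf"{_PREFIX}to (?P<to_city>{_CITY}) from (?P<from_city>{_CITY})",
        re.IGNORECASE,
    ),
]


def interpret_travel(utterance):
    """Return the field values for a matching utterance, or None on no match."""
    text = utterance.strip()
    for pattern in _TRAVEL_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            # Plays the role of $.from_city = $fromWhere; $.to_city = $toWhere.
            return {
                "from_city": _CANONICAL[match.group("from_city").lower()],
                "to_city": _CANONICAL[match.group("to_city").lower()],
            }
    return None
```

Under these assumptions, interpret_travel("I'd like to fly from San Francisco to New York") yields a dictionary whose keys match the form's field names, which is the same shape of result the browser uses to fill "from_city" and "to_city".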
When the VoiceXML application above is executed, the VoiceXML Browser activates the form-level grammar and then executes the <initial> element, which plays the "How can I help you?" prompt. At this point, the user can either say a phrase that matches the grammar, such as "I'd like to fly from San Francisco to New York," or say something that does not match it. In the former case, semantic interpretation processing fills in the form's two input fields, and the <submit> element within the <block> is then executed.
If the user's input does not match the active grammar, the <nomatch> event handler terminates processing of the <initial> element, and the form then collects each field's input separately.
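The control flow just described (try the open prompt first, fall back to one field at a time) can be sketched as follows. This is an illustrative Python model rather than browser code: run_travel_form and its three callback parameters are hypothetical names, and a real VoiceXML Browser drives this sequence through its own form processing.

```python
def run_travel_form(recognize_initial, ask_field, submit):
    """Model the mixed-initiative form: open prompt first, then per-field fallback.

    recognize_initial(prompt) returns a dict of field values on a grammar
    match, or None when the input does not match (the <nomatch> case).
    ask_field(name) returns the value for a single field.
    submit(fields) receives the completed fields.
    """
    fields = {"from_city": None, "to_city": None}

    # <initial>: play the open prompt with the form-level grammar active.
    result = recognize_initial("How can I help you?")
    if result is not None:
        # Match: semantic interpretation fills the fields directly.
        fields.update(result)

    # No match (or a field still empty): collect each remaining field on its own.
    for name in list(fields):
        if fields[name] is None:
            fields[name] = ask_field(name)

    # <block>: every field is filled, so the form is submitted.
    return submit(fields)
```

When recognize_initial returns both cities, ask_field is never called; when it returns None, each field is requested in turn before the submit step runs.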
By further extending the grammar, any number of possible user inputs could be used to fill in the field information. For example, the "fromWhere" and "toWhere" rules could be made optional in the "travel" rule, allowing the user to speak "I'd like to fly from New York" without having to include "to Chicago."
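A rough Python illustration of what optional clauses would change: an utterance can fill only one of the two fields, and the other stays empty. The city list, the interpret_optional name, and the single from-then-to ordering are assumptions chosen to keep the sketch short; they are not taken from the article's grammar.

```python
import re

# Assumed city list, for illustration only.
CITIES = ["San Francisco", "New York", "Chicago"]
_CANONICAL = {city.lower(): city for city in CITIES}
_CITY = "|".join(re.escape(city) for city in CITIES)

# Both the "from" clause and the "to" clause are optional here.
_OPTIONAL_TRAVEL = re.compile(
    rf"(?:i'd like to fly )?"
    rf"(?:from (?P<from_city>{_CITY})(?: |$))?"
    rf"(?:to (?P<to_city>{_CITY}))?",
    re.IGNORECASE,
)


def interpret_optional(utterance):
    """Return only the fields the utterance supplies, or None if it supplies none."""
    match = _OPTIONAL_TRAVEL.fullmatch(utterance.strip())
    if match is None:
        return None
    slots = {
        name: _CANONICAL[value.lower()]
        for name, value in match.groupdict().items()
        if value is not None
    }
    # An utterance that names no city fills nothing, so treat it as no match.
    return slots or None
```

With this sketch, "I'd like to fly from New York" produces a value for from_city only. The destination would presumably then be collected by that field's own prompt.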
The ability to accept such free-form utterances is only a first step toward natural dialog. Over time, VoiceXML will continue to evolve to incorporate more advanced features in support of natural dialog.
To review, until recently the web revolution had largely bypassed the huge market of customers of information and services represented by the worldwide installed base of telephones. Thanks to the work of the W3C and the VoiceXML Forum, several complementary standards are changing the way we interact with voice services and applications by simplifying the way these services and applications are built. VoiceXML is an XML-based [XML] language designed to be used on the Web. As such, it inherits several key features common to all XML languages. First, it leverages existing Web protocols such as HTTP to access remote resources. Second, any tool that is able to read or write XML documents can read and write a VoiceXML document. Third, other XML documents and fragments can be embedded in VoiceXML documents, just as VoiceXML documents and fragments can be embedded in other XML documents.
About the Authors
Jeff Kusnitz has been with IBM since 1987 and focuses on telephony and speech recognition platforms. He is currently IBM's representative to the VoiceXML Forum and the W3C Voice Browser working group on voice application specifications and platform and developer certifications.
Bruce Lucas has been with IBM since 1986. He was the lead designer and developer for IBM's Speech Mark-up Language and VoiceXML browsers, has been IBM's representative to the VoiceXML Forum and the W3C Voice Browser working group, and is a co-author of and major contributor to the VoiceXML 1.0 and 2.0 specifications and related W3C specifications.