W3C Unleashes VoiceXML 2.0

The World Wide Web Consortium (W3C) Tuesday
delivered a cornerstone of its Speech Interface Framework
when it published VoiceXML 2.0.


VoiceXML is intended to bring the advantages of Web-based development and
content delivery to interactive voice response, or IVR,
applications. The specification allows developers to create audio dialogs
that feature synthesized speech, digitized audio, recognition of spoken and
DTMF (touch-tone) key input, recording of spoken input, telephony, and
mixed initiative conversations.
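
As a rough illustration of what such a dialog looks like, the Python sketch below builds a small VoiceXML 2.0 document that asks a question and accepts either a spoken answer or a touch-tone key. The element names (`vxml`, `form`, `field`, `prompt`, `option`, `filled`, `value`) and the namespace come from the VoiceXML 2.0 specification. The drink-ordering scenario, the `order` and `drink` identifiers, and the function name `build_drink_dialog` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_drink_dialog() -> str:
    """Return a minimal VoiceXML 2.0 document as a string (hypothetical dialog)."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "order"})

    # A field collects one piece of input from the caller.
    field = ET.SubElement(form, "field", {"name": "drink"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Would you like coffee or tea?"

    # Each option can be spoken by the caller or selected with a DTMF key.
    for key, choice in (("1", "coffee"), ("2", "tea")):
        option = ET.SubElement(field, "option", {"dtmf": key, "value": choice})
        option.text = choice

    # Once the field is filled, read the choice back to the caller.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You chose "
    ET.SubElement(confirm, "value", {"expr": "drink"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_drink_dialog())
```

The script only generates markup. Running the dialog would require a VoiceXML interpreter (a voice browser) connected to a telephony platform, which is outside the scope of this sketch.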

“VoiceXML 2.0 has the power to change the way phone-based information and
customer services are developed,” said Dave Raggett, W3C Voice Browser
Activity Lead. “No longer will we have to press ‘one’ for this or ‘two’ for
that. Instead, we will be able to make selections and provide information
by speech. In addition, VoiceXML 2.0 creates opportunities for people with
visual impairments or those needing Web access while keeping their hands
and eyes free for other things, such as getting directions while driving.”


VoiceXML’s role in the Speech Interface Framework is to control how an
application interacts with the user. Speech Synthesis Markup Language
(SSML) handles spoken prompts, while Speech Recognition Grammar Specification
(SRGS) is used to guide speech recognizers via grammars
describing expected user responses. Voice Browser Call Control (CCXML)
provides telephony call control support for VoiceXML and other dialog
systems, while Semantic Interpretation for Speech Recognition defines the syntax and
semantics of the contents of tags in SRGS.
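
To make that division of labor concrete, the sketch below pairs a small SRGS grammar in its XML form with an SSML prompt and uses Python to list the replies the grammar expects. The element names and namespaces follow the SRGS 1.0 and SSML 1.0 specifications. The coffee-or-tea content, the constant names and the `expected_replies` helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SRGS 1.0 specification.
SRGS_NS = "http://www.w3.org/2001/06/grammar"

# SRGS (XML form): tells the recognizer which replies to expect.
DRINK_GRAMMAR = """\
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="drink">
  <rule id="drink">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
    </one-of>
  </rule>
</grammar>"""

# SSML: tells the synthesizer how to speak the prompt.
DRINK_PROMPT = """\
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Would you like <emphasis>coffee</emphasis> or tea?
</speak>"""


def expected_replies(grammar_xml: str) -> list[str]:
    """List the text of every <item> in an SRGS XML grammar (hypothetical helper)."""
    root = ET.fromstring(grammar_xml)
    return [item.text.strip() for item in root.iter(f"{{{SRGS_NS}}}item")]


if __name__ == "__main__":
    # Parsing the prompt confirms that it is well-formed XML.
    ET.fromstring(DRINK_PROMPT)
    print(expected_replies(DRINK_GRAMMAR))
```

In a deployed application, a VoiceXML document would reference a grammar like this one from a field and play the prompt to the caller. The Python code here only inspects the markup.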


The inception of VoiceXML lies with an AT&T project dubbed
Phone Markup Language (PML). In 1995, AT&T created an XML-based dialog
design language intended to simplify the speech recognition application
development process within PML. With the reorganization of AT&T, teams at
AT&T, Lucent and Motorola continued
working on PML-like languages. Following a W3C conference on voice browsers
in 1998, AT&T, IBM, Lucent and Motorola, all of which
were developing speech-based markup languages, created the VoiceXML Forum to pool their efforts
and define a standard dialog design language for building conversational
applications.

The VoiceXML Forum released VoiceXML 1.0 to the public in 2000, and then
submitted the specification to the W3C. The specification slots into the
W3C’s work on the Speech Interface Framework, which would allow people to
use any telephone to access appropriately designed Web-based services.

VoiceXML is similar to the recently announced Speech Application Language Tags
(SALT) specification, which has also been submitted to the W3C. SALT is a
set of light-weight extensions to existing markup languages, particularly
HTML and XHTML, that enable multimodal and telephony access to information,
applications and Web services from PCs, telephones, tablet PCs and wireless
personal digital assistants (PDAs). However, VoiceXML focuses on telephony
application development while SALT is focused on multimodal speech
application development (which, for example, would allow a PDA user to fill
out a form using both voice commands and stylus, whichever is more
convenient).

While the two are different, the specifications do share similar goals and
may eventually converge. In fact, SALT uses key components of the Speech
Interface Framework, including SRGS and SSML.
