After months of hype but no substance, the SALT Forum finally
released a draft version of the Speech Application Language Tags
(SALT) specification on February 19. My initial impression is that
it was worth the wait and that, given follow-through from Microsoft, it will significantly advance the use of speech-driven applications on computer desktops and
PDAs.
What makes the release of the specification significant is the
fact that SALT was initiated by Microsoft. Given Microsoft’s dominance
on the desktop and its growing share of the PDA market, the SALT Forum
and Microsoft have made the right move at the right time. While
speech-based applications will likely grow over time with the
improved quality of speech recognition and synthesis, multi-modal
applications that are accessible from multiple devices
will likely be the next step in computing’s evolution.
Here are some of the features of the SALT draft specification:
Focus on multi-modal development – While VoiceXML could be
used on PDAs and the desktop, its focus is on the telephone. SALT
was designed to support multiple devices including the telephone.
Supports XML form of SRGS – Why reinvent the wheel? What
an amazing concept. The SALT specification requires support for the
XML form of the Speech Recognition Grammar Specification (SRGS),
which was developed by the Voice Browser group of the World Wide Web
Consortium (W3C) and is also used in VoiceXML.
Parallel tasks – Users can interact with an application
visually while speaking or listening to it at the same time. For
example, a user could browse a list of tasks on a PDA while
listening to a recorded annotation from their boss.
Applications are DOM based – SALT applications will use
the HTML and XML Document Object Model that is already familiar to
Web developers.
Uses SSML for speech synthesis – The Speech Synthesis
Markup Language (SSML) was also developed by the Voice Browser group
at the W3C. SALT utilizes this common format.
Call Control – SALT includes call control features, such
as distributing calls based upon the caller’s phone number. This is
clearly a telephone-based feature, and happens to be a critical
piece that the VoiceXML 2.0 specification lacks.
Applications are scripted in ECMAScript (aka JavaScript) –
As in VoiceXML, applications can be scripted with ECMAScript;
however, SALT provides full access to the DOM as well as SALT-specific
parameters. In fact, using SALT will require a more programmatic
approach to developing speech applications, whereas VoiceXML provides
a wider range of XML elements for programming applications in addition to
ECMAScript.
Uses fewer XML elements – In SALT, there are only four
top-level elements: <prompt>, <listen>, <dtmf>, and
<smex>. There are additional elements such as <record>
and <grammar>, but there are only 10 XML elements total, versus
over 30 in VoiceXML. This may be good or bad, depending on how you
look at it. From the looks of it, the elements are basically
placeholders upon which ECMAScript is hung. This is consistent with the
general approach of developing speech applications in a more
programmatic way versus the “document construction methodology” that is typically used with VoiceXML.
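To make the SRGS point above a little more concrete, here is a minimal sketch of what an XML-form grammar looks like and how ordinary XML tooling can read it. It is written in Python purely for illustration and is not part of SALT. The element names and namespace follow the W3C SRGS XML form as I understand it, and the details may differ in the draft that the SALT specification references. The rule name "city", the phrases, and the helper function root_rule_phrases are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the XML form of SRGS (assumed from the W3C specification).
SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical grammar: the rule id "city" and the phrases are made up.
GRAMMAR_XML = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="city">
  <rule id="city">
    <one-of>
      <item>Boston</item>
      <item>Seattle</item>
      <item>New York</item>
    </one-of>
  </rule>
</grammar>"""


def root_rule_phrases(grammar_xml: str) -> List[str]:
    """Return the literal phrases listed under the grammar's root rule."""
    grammar = ET.fromstring(grammar_xml)
    root_id = grammar.get("root")  # id of the rule where recognition starts
    ns = {"g": SRGS_NS}
    for rule in grammar.findall("g:rule", ns):
        if rule.get("id") == root_id:
            items = rule.findall("g:one-of/g:item", ns)
            return [(item.text or "").strip() for item in items]
    return []  # no rule matches the declared root


if __name__ == "__main__":
    print(root_rule_phrases(GRAMMAR_XML))
```

Because the grammar is plain XML, the same file could in principle be shared between a SALT page and a VoiceXML application, which is the practical benefit of both specifications building on SRGS.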
Conclusion
The bell signaling Microsoft’s dominance of the multi-modal application
market may ring when it provides support for SALT in its development
tools, Web browser, and mobile and desktop operating systems.
It’s still too early to tell whether SALT will join
Microsoft Agent as another Microsoft stepchild or whether it will
become part of Microsoft’s emerging .NET strategy.
As for the future of VoiceXML vs. SALT: yes, the spec is a direct
competitor to VoiceXML, even though SALT members have been careful to
avoid saying so directly. Will SALT displace VoiceXML? It’s too
early to answer that question. What I do predict is that SALT will
be the de facto standard for integrating speech functionality into
desktop, PDA, and Web applications. For now, VoiceXML will likely
remain the dominant standard for developing next-generation IVR
functionality that integrates with back-end Web applications.
Of course, SALT is just a spec, so you should consider SALT
applications purely vaporware until a vendor is able to produce a
real demo. I’ve not been able to get any of the contributors to
produce such a demo yet, so it is still unclear when SALT
will be a viable technology.
If you’d like to know more about SALT or to download the draft
specification, visit http://www.saltforum.org.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, which specializes in Voice Web consulting and training. He
has also written articles for other online and print publications
including WebReference.com
and WDVL.com. Feel free to send an
email to eisen@ferrumgroup.com
regarding questions or comments about the VoiceXML Strategy series,
or for more information about training and consulting
services.