The SALT (Speech Application Language Tags) specification adds speech to HTML, XHTML, and XML. It is backed by a 70-member consortium that includes industry heavyweights Microsoft, Cisco, and Intel. SALT is not a W3C standard, though: the group submitted SALT 1.0 to the W3C’s Voice Browser working group, which will consider it as part of the next VoiceXML standard. This article touches on the SALT/VoiceXML relationship again near the end.
Because SALT is implemented at the HTML, XHTML, and XML level, it shields developers from some of the complexity of working with individual vendors' speech APIs. Both users and developers benefit from SALT’s ability to add speech capability across many kinds of devices: traditional PCs as well as telephones, PDAs, and others.
SALT goes beyond the “speech” part of its name, though. The specification includes “multi-modal” access, which means the user can reach the application through a variety of input methods. Speech is the obvious one, but SALT also supports telephone keypads and mixed-mode input, where a user can switch between typing or pointing and speaking. A SALT application can likewise produce several kinds of output, including traditional text and graphics, synthesized speech, and audio. The application should detect which output types suit the user's client device and respond accordingly.
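The idea of matching output to the client device can be sketched in a few lines. The Python below is only an illustration of the decision, not SALT markup and not part of the specification; the `ClientDevice` class, its fields, and the output labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClientDevice:
    """Hypothetical description of a client; not defined by the SALT specification."""
    name: str
    has_display: bool
    has_audio: bool


def select_outputs(device: ClientDevice) -> List[str]:
    """Return the output types (illustrative labels) that suit the device."""
    outputs: List[str] = []
    if device.has_display:
        # Visual clients get the traditional page content.
        outputs.append("text_and_graphics")
    if device.has_audio:
        # Audio-capable clients can also receive spoken prompts and audio.
        outputs.append("synthesized_speech")
        outputs.append("recorded_audio")
    return outputs


desktop_pc = ClientDevice("desktop PC", has_display=True, has_audio=True)
telephone = ClientDevice("telephone", has_display=False, has_audio=True)

for device in (desktop_pc, telephone):
    print(device.name, "->", select_outputs(device))
```

A telephone, having no display, ends up with only the speech and audio outputs, while a PC can receive all three.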
Where a SALT application is processed depends on the type of client platform. For example, when a user accesses a SALT-enabled Web page from a PC, the PC provides the processing horsepower to turn the speech input into input for the server. Likewise, the PC renders any text-to-speech audio output. On the other hand, SALT access from a processorless device such as a telephone depends on a SALT-enabled server that the user calls into. In that scenario, the server converts the user's speech into commands and renders the audio output as well.
With Microsoft backing, SALT is of course a part of the .NET Framework for Microsoft developers. Microsoft has a speech SDK (it is still in beta at this time) for .NET developers, which uses SALT. Developers should be familiar with ASP.NET before attempting to work with SALT.
In a sense, SALT is a competitor to the VoiceXML standard for speech applications. However, SALT does incorporate some VoiceXML and the related W3C standards SRGS (Speech Recognition Grammar Specification) and SSML (Speech Synthesis Markup Language), so the two are not completely separate items.
SALT will probably appeal more to Microsoft developers and to other developers coming to speech from a Web development background. Experienced Web developers will understand the SALT development model because it uses the event model that most Web applications are built on.
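For readers less familiar with that event model, here is a minimal sketch of the pattern in Python. It is not SALT code: the `SpeechField` class and the event names `recognized` and `not_recognized` are hypothetical, chosen only to show how handlers attached to a field react when speech input succeeds or fails.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class SpeechField:
    """Hypothetical speech-enabled form field with attachable event handlers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = ""
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler for an event name."""
        self._handlers[event].append(handler)

    def fire(self, event: str, detail: Dict[str, str]) -> None:
        """Call every handler registered for the event, in registration order."""
        for handler in self._handlers[event]:
            handler(detail)


city = SpeechField("city")
prompts: List[str] = []


def fill_city(detail: Dict[str, str]) -> None:
    # On a successful recognition, copy the recognized text into the field.
    city.value = detail["text"]


def reprompt(detail: Dict[str, str]) -> None:
    # On a failed recognition, queue a prompt asking the user to try again.
    prompts.append("Sorry, please repeat the city.")


city.on("recognized", fill_city)
city.on("not_recognized", reprompt)

# Simulate one failed attempt followed by a successful one.
city.fire("not_recognized", {})
city.fire("recognized", {"text": "Seattle"})
```

The application never dictates the order of the conversation here; it only declares what should happen when each event occurs, which is the same habit of mind Web developers bring from handling clicks and form submissions.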
VoiceXML could be more appealing to developers with a background in traditional telephony or IVR (Interactive Voice Response) applications. It could also be more appealing where the application has a very strict flow definition, the kind traditionally served well by an IVR application. VoiceXML is a larger standard because it is a complete standalone markup specification, whereas SALT depends more on functionality already handled by other Web application specifications. VoiceXML also has the advantage of being more widely supported, with more than 250 companies involved in the VoiceXML Forum. It is a more mature standard as well, now in a ratified version 2.0, with the original specification dating back five years to 1999. However, SALT supporters point out that VoiceXML’s maturity can also be a SALT advantage: VoiceXML has roots in an earlier Web era, whereas SALT is based on more modern Web development architecture.
For further reading, Hitesh Seth has a series of five SALT articles on Developer.com, starting with an introduction, "SALT: By Example," which quickly dives into some SALT code.
Jim Minatel is a freelance writer for Developer.com in addition to working with Wiley and WROX publishing.