Speak and Listen to the Web using SALT

Attention is the real currency of businesses and individuals. Nowhere is the economics of attention more visible than on the Internet, where attracting and retaining attention has become one of the highest priorities for Web site designers. To achieve this, a site has to offer four things: relevance, engagement, community and convenience. Speech technology can help with three of these goals.

Web developers can build and deploy speech-enabled applications on their Web sites using the SALT (Speech Application Language Tags) specification, which has been developed by Microsoft, Cisco, Intel and other industry leaders. Voice Web Studio™ from Voice Web Solutions provides the tools for Web developers to build and deploy SALT applications that will enable millions to “Speak and Listen to the Web”.

SALT is an extension to existing Web programming models and markup languages such as HTML, XHTML and XML. It adds speech interfaces that sit alongside traditional input/output modes such as text, audio, video and graphics. SALT is also an emerging technology standard for deploying speech-enabled accessibility services that can reach the world’s 40 million visually impaired people.

Web developers can use Voice Web Studio to create SALT-enhanced Web pages that support speech access to Web content through a variety of devices, including telephones, Internet kiosks, desktop and tablet PCs, and PDAs. Adding speech to the standard modes of interaction (keyboards, pointing devices, and touch screens) is what attracts the precious attention of the user.

A touch-tone phone system is an example of a single-modality interface: touch-tone input is followed by pre-recorded or synthesized speech output. A multimodal application, by contrast, lets the user interact through several interfaces at once. One could, for example, give input by speech and receive the response both as speech output and on a screen display. This multimodal interaction is not limited to the desktop PC; it could take place on a wireless PDA as well.


The application areas for enhancing your Web site with speech-enabled features are limited only by the breadth of your vision. You could provide better customer service by integrating your speech-interactive Web site with call centers. You could also incorporate speech into your e-learning solutions, guiding students with spoken instructions and accepting their voice responses. Building interactivity is a critical strategy for improving instructional effectiveness in online teaching, so you would be using technology to simulate the in-class learning environment with unlimited reach.

Online banking, online entertainment and mobile computing are some other applications that can benefit from speech-assisted elements. In the e-governance arena, speech assistance for filling out Web forms and guiding citizens through information retrieval can take citizen service to a new level.

This technology also offers the important benefit of making Web content accessible to the visually impaired. Section 508 of the Rehabilitation Act requires U.S. federal agencies to make their electronic and information technology, including Web sites, accessible to people with disabilities. SALT enables your site to deliver audio results to disabled users, making it more accessible to those who are visually or motor impaired. The disabled user can access existing Web content using a standard desktop computer. The speech add-in that installs on the desktop is available at little or no cost and installs easily – just like a screen reader. A screen reader, however, only reads the current location of a Web site aloud, and the user still has to navigate with a mouse or keyboard. In contrast, a speech-enabled site can read the Web content to the user and also let the user fill out forms and navigate the site by speaking into a microphone, in addition to using the mouse or keyboard. Thus, the user can interact with the computer using speech, which is not possible with a screen reader.

Newspaper, book and document reading services are another application area that can employ this technology effectively. Aircraft mechanics, insurance adjusters, real estate agents and others can also use it to increase productivity by staying hands-free while they fill out Web-based forms or hear spoken information.

SALT Elements

SALT is a set of XML elements and their associated attributes, events and methods. The five most important elements for multimodal use are the prompt, listen, grammar, record and bind elements.

A speech prompt is a SALT element that instructs the browser to play a recorded audio file or to speak text using text-to-speech synthesis. Any event, such as voice input from the user or an HTML event (page loading, a button being clicked, a mouse-over, etc.), can trigger a prompt to play.
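
As an illustration of the prompt element, here is a minimal sketch of a page with one synthesized prompt and one recorded prompt. It follows the SALT 1.0 specification as we understand it; the element ids, the button, and the file name help.wav are hypothetical, and some browsers or speech add-ins may need extra declarations before they recognize the salt: namespace.

```html
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <!-- The page load event triggers the first prompt. -->
  <body onload="welcome.Start()">

    <!-- Plain text inside a prompt is spoken by the text-to-speech engine. -->
    <salt:prompt id="welcome">
      Welcome. Press the Help button to hear the instructions.
    </salt:prompt>

    <!-- A content child element points to a recorded audio file
         (help.wav is a hypothetical file name). -->
    <salt:prompt id="help">
      <salt:content href="help.wav" />
    </salt:prompt>

    <!-- A button click is another HTML event that can start a prompt. -->
    <input type="button" value="Help" onclick="help.Start()" />

  </body>
</html>
```

Both prompts are started from ordinary HTML event handlers, which is what lets speech output sit alongside the visual page rather than replace it.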

A listen element activates the device’s microphone to listen for speech input or record audio. As a parent element, the listen element manages the grammar element (for specifying input grammars during speech recognition) and the record element (for recording speech as a wav file).
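
The two roles of the listen element can be sketched as follows, again assuming SALT 1.0 syntax. The ids, the grammar file city.grxml, and the status paragraph are hypothetical, and the details of what a recognizer returns vary by platform.

```html
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>

    <!-- Recognition: the grammar child names the phrases the recognizer
         should accept (city.grxml is a hypothetical grammar file). -->
    <salt:listen id="listenCity"
                 onreco="showStatus('Recognized.')"
                 onnoreco="showStatus('Sorry, please try again.')">
      <salt:grammar src="city.grxml" />
    </salt:listen>

    <!-- Recording: the record child captures the audio itself instead of
         recognizing it; beep="true" asks for a tone before recording starts. -->
    <salt:listen id="recordMessage">
      <salt:record beep="true" />
    </salt:listen>

    <input type="button" value="Say a city" onclick="listenCity.Start()" />
    <input type="button" value="Leave a message" onclick="recordMessage.Start()" />
    <p id="status"></p>

    <script type="text/javascript">
      // Show the outcome of the last recognition attempt on the page.
      function showStatus(message) {
        document.getElementById("status").innerHTML = message;
      }
    </script>

  </body>
</html>
```

Each listen element here does one job, recognition or recording, so the two can be started independently from different buttons.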

The bind element copies values from the speech recognition result into HTML elements on the page, such as form fields and text, so that spoken input can fill in or drive those elements.
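
A minimal sketch of bind in use, assuming SALT 1.0 syntax: the recognized city name is copied into a text box. The field name txtBoxCity, the grammar file city.grxml, and the XPath expression //city are assumptions; the actual path depends on the structure of the result your recognizer and grammar produce.

```html
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>

    <form>
      <!-- The visual field that the spoken answer will fill in. -->
      <input id="txtBoxCity" name="txtBoxCity" type="text" />
      <input type="button" value="Speak" onclick="listenCity.Start()" />
    </form>

    <salt:listen id="listenCity">
      <!-- city.grxml is a hypothetical grammar listing the accepted cities. -->
      <salt:grammar src="city.grxml" />

      <!-- On a successful recognition, copy the node selected by the XPath
           in "value" into the target element (its value attribute by default). -->
      <salt:bind targetelement="txtBoxCity" value="//city" />
    </salt:listen>

  </body>
</html>
```

Because the binding is declared in markup, no script is needed for the simple case of moving a recognized value into a form field.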

SALT Development Tool

Voice Web Studio by Voice Web Solutions, www.voicewebsolutions.net, is one such tool that enables developers to build SALT-enhanced Web sites that recognize speaker input, generate text-to-speech, and record audio. By using the highly popular Macromedia Dreamweaver MX as a development platform, Voice Web Studio lets developers speech-enable any new or existing Web page directly within the familiar and versatile Macromedia environment. With Voice Web Studio, developers can design and edit speech recognition, audio playback and human-computer dialog controls, as well as create HTML behaviors based on speech events. Thus, developers can quickly build a speech dialog grammar that listens for user input and then activates Web links, scrolls up and down, fills in forms, and more.

About the Author

Brian Graham’s role as Chief Architect is to lead Voice Web Solutions’ vision as the premier Voice Web tools provider. Brian brings extensive experience in the development of Voice Web technologies to Voice Web Solutions, and has served on the VoxML and VoiceXML forums since their inception, in addition to the SALT Forum. In 1999, Brian’s research and development efforts led to the creation of Drive It! – the world’s first application designed to deliver fully automated, speech-driven, point-to-point Internet-based driving directions via the telephone.
