More than a Grain of SALT: Industry Leaders Assess How Specification Is Laying Foundation for Speech Technologies
"To check your account balance, press or say 'one' now."
For many callers, instructions like these bring on a steady rise in blood pressure. Traditional, proprietary telephony applications of this kind, with their limited speech interaction, have allowed businesses to take some initial steps toward lowering costs, but they haven't always been a favorite with customers.
Now the movement toward broad, widely available speech applications based on Web standards and infrastructure is gaining momentum in the technology world, and it's only a matter of time before the older, simplistic voice response system makes way for truly sophisticated speech interaction between humans and information systems. In addition, applications and services based on Speech Application Language Tags (SALT), which can utilize a "multimodal" approach combining voice and visual interaction on a range of devices, are moving closer to the marketplace.
Imagine calling your computer to have it read back e-mail messages, or even instructing it to call you back when the important e-mail you're expecting arrives. Imagine your car telling you how many miles or kilometers the fuel left in your tank will cover, then displaying a map of local service stations within that range. Imagine having instant access to account and billing information, without having to wait 20 minutes in a customer-service call-center queue.
The foundation for these kinds of applications is being laid today with the emergence of new specifications such as SALT. These specifications have attracted the broad industry support necessary to take speech technology and its benefits to a much larger audience, on a wider range of devices, with a wider range of service capabilities.
For businesses, this technology can offer value beyond reduced costs, such as increased employee productivity, increased customer satisfaction and new revenue opportunities. These applications are expected to bring significant value to many industries, especially those with extensive call-center service operations, such as financial services, retail and insurance. For consumers, the possibilities for new kinds of services and entertainment are limited only by the imagination.
As companies plan for this new opportunity, developers and vendors are hard at work creating the ecosystem of technologies that are making the promise of speech a reality. As part of this effort, Microsoft recently announced the shipment of a technical preview for the SALT-based Microsoft Speech Platform, which includes an enterprise-grade speech-recognition engine developed by Microsoft, a text-to-speech engine from industry partner SpeechWorks for enabling voice output of corporate or Web data, and a Microsoft SALT voice interpreter.
The new platform, along with the Microsoft Speech Software Development Kit (SDK), is expected to help broaden and advance speech technology by providing a common set of tools for developers. The SDK integrates into Visual Studio .NET and is currently in beta 2 release, available now at www.microsoft.com/speech/. Together, these tools should allow developers to more easily build applications and services that use the largely untapped power of speech to help humans interact more naturally with information systems.
Only a year and a half old, SALT has gained broad support from leading businesses across the value chain for delivering speech solutions, including hardware vendors, interactive voice response (IVR) suppliers, telecommunications carriers, handset manufacturers, speech technology application firms, and service providers such as systems integrators. These companies are developing technologies and services that take advantage of the power of SALT and other speech-enabling specifications, not only for speech interaction by telephone but for multimodal interaction as well. SALT can support a whole spectrum of devices, including PCs, Tablet PCs, telephones, cell phones, smart phones and wireless PDAs. Because many of these devices have displays, multimodal interaction is a key focus.
To learn more about where the speech movement stands today, where SALT fits into the picture, and what to look for down the road, several industry leaders were asked for comment on a new era of voice interaction with information systems:
Vice President of Marketing
"We believe that all new technologies are driven when you provide people with substantial offerings, and also empower them with choice, and certainly SALT does that."
"HeyAnita was initially focused on the speech-only interface, and at the time multimodality was just a broad concept. But SALT really brings that promise to life, because it enables people to choose input and output modes that are relevant for them. It empowers them with choice. We believe that all new technologies are driven when you provide people with substantial offerings, and also empower them with choice, and certainly SALT does that.
"The ability to switch between modalities I think is extremely compelling, and it's actually going to become a necessity in an environment where everyone wants to be connected all the time. Think of a salesman walking into a meeting. The multimodal interface would allow him to request a PowerPoint from his e-mail via telephone, and to have it sent to the customer's laptop right as he walks in. That kind of efficiency and flexibility can be built into almost any kind of application using the multimodal capabilities afforded by programming styles such as SALT. It provides the best of all worlds, and leaves the decision up to the user.
"With the announcement of our FreeSpeechTM SALT Voice Browser last year, HeyAnita became the first to support all of our current applications with SALT. This includes powerful efficiency-based applications such as voice-activated dialing, voice access to e-mail and voice-SMS as well as all sorts of different content and information-based applications such as weather, news and sports. We also have other offerings, such as the HeyAnita Voice Care product suite, which is geared toward call centers, again utilizing SALT to provide the voice interface now, and then enable a company to move forward into the multimodal world."