Microsoft will release a free Beta version of its .NET Speech SDK this week. This is Microsoft’s first foray into the growing voice Web market, where speech recognition technology is fused with Web technologies and the telephone.
The .NET Speech SDK
will support the SALT XML specification developed by the SALT Forum,
a standards group co-founded by Microsoft, instead of VoiceXML, which is
being developed by the World Wide Web Consortium. Though SALT
supports much of the same functionality as VoiceXML, it has branched
out from a telephone-constrained model into a multi-modal approach
where developers will be able to deploy speech applications to Web
browsers, telephones, and mobile devices.
For example, in the
future, users of a SALT-enabled PDA application that provides
driving directions could enter their starting location by speaking
an address, writing it, or selecting the location on a map. This
type of seamless multi-modal interactivity, which uses the same
Web programming model across devices, will drive a new wave of
applications that are more natural and flexible for users. But SALT
is not limited to the PDA.
For now, this Beta release will support only the desktop
version of Internet Explorer and telephone access via a telephony
emulator. Support for Pocket Internet Explorer for PDAs is to be
released in the near term.
The Beta SDK integrates with the Visual Studio .NET development
environment and will include an extension for Microsoft Internet
Explorer (IE). This extension will enable the browser to interpret SALT tags
and execute voice content inline with other XHTML content. Because
the SALT programming model is Web-centric, developers who are
familiar with the IE DOM programming model will be able to learn the
SALT language quickly.
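
To make the inline model more concrete, here is a minimal sketch of a page that mixes SALT tags with ordinary markup, wrapped in a short Python check that the markup is well-formed. The element names and namespace URI are meant to follow the SALT Forum's published specification and should be verified against it; the ids (`txtStart`, `askStart`, `listenStart`), the grammar file `address.grxml`, and the `//address` binding path are hypothetical and are not taken from the SDK.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed from the SALT Forum specification.
SALT_NS = "http://www.saltforum.org/2002/SALT"

# Illustrative page: a text box that can also be filled by voice.
# All ids and the grammar file name are hypothetical.
PAGE = """\
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>
    <input name="txtStart" type="text" onclick="listenStart.Start()" />
    <salt:prompt id="askStart">Where are you starting from?</salt:prompt>
    <salt:listen id="listenStart">
      <salt:grammar src="address.grxml" />
      <salt:bind targetelement="txtStart" value="//address" />
    </salt:listen>
  </body>
</html>
"""


def salt_elements(page):
    """Return the local names of all SALT-namespace elements, in document order."""
    root = ET.fromstring(page)  # raises ParseError if the markup is not well-formed
    prefix = "{" + SALT_NS + "}"
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]


if __name__ == "__main__":
    print(salt_elements(PAGE))
```

In this sketch, clicking the text box starts recognition, and the `bind` element copies the recognized value into the same form field a user could otherwise type into, so the voice and visual inputs share one page.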
The SDK will also include an ASP.NET control
for integrating speech dialogs with dynamic scripts on an IIS
server, a tool for developing Speech Recognition Grammar Specification (SRGS)
grammars, a prompt editor for recording and editing audio
prompts, and a SALT debugger. SRGS is the format used to
define the words and phrases that can be recognized by the speech
recognition software. Notably, SRGS was developed by
the W3C and is also used as the speech recognition grammar format for
VoiceXML 2.0.
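
As a rough illustration of what such a grammar looks like, the sketch below embeds a small grammar in the XML form of the W3C format (published as the Speech Recognition Grammar Specification) and uses Python to list the phrases it accepts. The rule name `city` and the city list are hypothetical examples, not content from the SDK, and real grammars are usually far richer than a flat list of alternatives.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C grammar XML form.
GRAMMAR_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical grammar: one public rule that matches one of three city names.
GRAMMAR = """\
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="city">
  <rule id="city" scope="public">
    <one-of>
      <item>Seattle</item>
      <item>Boston</item>
      <item>New York</item>
    </one-of>
  </rule>
</grammar>
"""


def phrases(grammar_xml, rule_id):
    """Return the alternatives listed directly under a rule's <one-of> element."""
    ns = {"g": GRAMMAR_NS}
    root = ET.fromstring(grammar_xml)
    rule = root.find("g:rule[@id='%s']" % rule_id, ns)
    if rule is None:
        return []
    return [(item.text or "").strip() for item in rule.findall("g:one-of/g:item", ns)]


if __name__ == "__main__":
    print(phrases(GRAMMAR, "city"))
```

The helper only handles this flat shape (a rule containing a single list of alternatives); it is meant to show the structure of a grammar file, not to stand in for a recognizer.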
Developers will be able to test applications on their desktop via
a telephony emulator, speech recognition software from Microsoft,
and a text-to-speech engine licensed from SpeechWorks, all of which
will be included with the Beta SDK.
Web pages that integrate SALT tags can be deployed by installing
the IE SALT extension and speech recognition software included with
the SDK. Telephone-based applications can be tested on the desktop
via the telephony emulator in the short term. Microsoft plans to
release a .NET Speech server by the end of the year, according to
James Mastan, Group Product Manager of the .NET Speech Platform. A SALT extension for the Pocket Internet Explorer browser will allow SALT
applications to run on Pocket PC devices. It is expected in the near term, but no specific date has been announced.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, which specializes in Voice Web consulting and training. He
has also written articles for other online and print publications
including WebReference.com
and WDVL.com. Feel free to send an
email to eisen@ferrumgroup.com
regarding questions or comments about the VoiceXML Strategy series,
or for more information about training and consulting
services.