So far in our series on SALT (Speech Application Language Tags) we have looked at the basic architecture of SALT-based applications. We have also taken a closer look at some of the "elements" of SALT and their syntax. Now that we are ready to start developing, we will take a look at the new Microsoft .NET Speech SDK, which allows for the rapid development of SALT-based multimodal/telephony applications. We will use the newly released Beta 2 as the basis for this discussion.
Introduction to Microsoft .NET
Let’s first take an introductory look at the Microsoft .NET initiative as a whole. From a speech application developer's perspective, Microsoft .NET is an environment for developing, deploying and executing web-based and Windows applications as well as XML/SOAP-based web services. The objective of the framework is to arm the developer with a set of tools, a rich set of class libraries and the flexibility of working in different programming languages (to suit the developer's expertise). As speech application developers, we can focus on the application's business and presentation logic while leaving the "plumbing" work (e.g. connecting to databases, parsing XML) to the underlying tools and framework.
There are really three major components of Microsoft .NET: Microsoft .NET Framework, Visual Studio.NET and .NET Enterprise Servers. Microsoft .NET Framework itself is composed of a set of components, including:
- .NET Common Language Runtime (CLR), which provides an object and type system common across the .NET-compatible languages and is responsible for the groundwork: memory allocation, thread/process management, security management, etc.
- .NET Supported Languages: the .NET Framework gives developers the flexibility of using a number of different languages, including the newly developed C# (similar to C++ and Java), C++, Visual Basic.NET and JScript. Apart from these core languages, a number of third-party vendors have developed support for additional languages as well.
- .NET Class Libraries: a rich and comprehensive set of classes that provide a lot of pre-built functionality and are available to all .NET-supported programming languages. These include classes built around user interface development, web services, database access, networking, input/output, web application development, XML/XSLT processing, multi-threading, security, etc.
- ASP.NET: as the name suggests, this is the next generation of the popular Active Server Pages (ASP) web development environment for creating dynamic web applications. ASP.NET-based applications can use any .NET-supported language as a scripting language within a page execution model. ASP.NET offers major advances over ASP, including support for web services, server-based web controls and XML-based configuration of web applications.
For years, Microsoft Visual Studio has been the de facto standard for developing Visual Basic, Visual C++ and ASP-based applications. Visual Studio.NET, the next revision of the Visual Studio toolset, builds on its success and supports .NET programming in C#, Visual Basic.NET, Visual C++ and JScript.NET for web and Windows application development. It provides an integrated development and debugging platform for the development of Windows form-based GUI applications, Windows services, reusable components (or building blocks), web-based applications and web services.
The .NET Enterprise Servers include SQL Server, BizTalk Server, Commerce Server, SharePoint Portal Server, Application Center, Content Management Server, Exchange Server, Host Integration Server, Internet Security & Acceleration Server and Mobile Information Server. These provide the basis of pre-built enterprise-class applications and services.
.NET Speech SDK
So where does the .NET Speech SDK fit in with Microsoft .NET? The Beta 2 release of the SDK makes three main additions to the .NET framework and toolset.
.NET Speech Add-in for Internet Explorer
This add-in allows IE to be used as a viewer for the applications. Through a parameter in the URL, the application can be tested and used in a voice-only environment (simulating the telephony-based application model) or in a multimodal environment (which reflects real-world usage of the application).
ASP.NET-based Speech Controls
The ASP.NET-based Speech Controls allow developers using ASP.NET and Microsoft Visual Studio.NET to create multimodal/telephony applications and/or add speech interactivity to existing web applications. The screenshot below shows these tools being used to develop a speech-based interactive pizza ordering application.
These controls add to the existing .NET class libraries and provide developers with the ability to add speech-based interaction to their existing applications or to build new applications. The table below shows a quick reference to the functionality provided by these controls.
| Group | Control | Function |
| --- | --- | --- |
| Speech Controls | QA | Collects and processes speech/DTMF input from the user |
| Speech Controls | Command | Collects inputs such as help, repeat and cancel that are not processed by the QA control |
| Speech Controls | Custom Validator | Validates input data through a script |
| Speech Controls | Compare Validator | Validates input data by comparing it with another control/value |
| Speech Controls | Semantic Map | Contains a set of values that provide the input controls' semantic state and its bindings |
| Speech Controls | Style Sheet | Contains a set of common speech control properties |
| Call Control Controls | Smex Message | Sends a CSTA (Computer-Supported Telecommunications Applications) message |
| Call Control Controls | Transfer Call | Transfers the current call |
| Call Control Controls | Disconnect Call | Disconnects a call |
| Call Control Controls | Make Call | Initiates a telephone call |
| Call Control Controls | Answer Call | Answers a call |
| Call Control Controls | Call Info | Contains basic information about the current call |
| Application Controls | Alpha Digits | Collects a string of numbers and letters |
| Application Controls | Currency | Collects an amount in US dollars |
| Application Controls | Date | Collects a date |
| Application Controls | Natural Number | Collects and validates a natural number |
| Application Controls | Navigator | Allows navigation of a list of table-based elements |
| Application Controls | Phone | Collects a US telephone number |
| Application Controls | Single Item Chooser | Allows a user to select a single item from a list by dynamically creating a grammar |
| Application Controls | SSN | Collects a US Social Security Number |
| Application Controls | Yes No | Collects a Yes/No answer |
| Application Controls | Zipcode | Collects a US Zip Code |
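
The Single Item Chooser entry above mentions creating a grammar dynamically from a list of items. The SDK's own implementation is not shown in this article, so the following Python sketch only illustrates the general idea under stated assumptions: it assumes the grammar is written in the W3C SRGS XML format, and the function name `build_choice_grammar`, the rule id `choice` and the pizza toppings are hypothetical names chosen for illustration.

```python
from xml.etree import ElementTree as ET

# Namespace of the W3C Speech Recognition Grammar Specification (SRGS) XML form.
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_choice_grammar(items, rule_id="choice", lang="en-US"):
    """Return an SRGS-style XML string that accepts exactly one of `items`.

    Illustrative sketch only; this is not the .NET Speech SDK's code.
    """
    # Drop blanks and case-insensitive duplicates while keeping list order.
    seen = set()
    phrases = []
    for item in items:
        phrase = item.strip()
        if phrase and phrase.lower() not in seen:
            seen.add(phrase.lower())
            phrases.append(phrase)
    if not phrases:
        raise ValueError("at least one non-empty item is required")

    # The root rule is a single <one-of> listing each phrase as an <item>.
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical topping list for a pizza ordering dialog.
    print(build_choice_grammar(["Pepperoni", "Mushroom", "Ham & pineapple"]))
```

Building the grammar at request time means the list of choices can come from a database or other live source, which is presumably why a control of this kind generates it dynamically rather than requiring a hand-written grammar file.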
Speech Tools
The Speech Tools include a grammar builder, a prompt builder (shown below) and a speech debugger, which aid in constructing and testing the different parts of a speech application.
The table below provides a quick reference for the tools provided by .NET Speech SDK.
To be Continued
We will continue our exploration of SALT in the next article by actually walking step-by-step through what is involved in developing a telephony/multimodal application using SALT and Microsoft .NET Speech SDK.
About Hitesh Seth
A freelance author and known speaker, Hitesh is a columnist on VoiceXML technology in XML Journal and regularly writes for other technology publications on emerging technology topics such as J2EE, Microsoft .NET, XML, Wireless Computing, Speech Applications, Web Services & Enterprise/B2B Integration. Hitesh received his Bachelor's Degree from the Indian Institute of Technology Kanpur (IITK), India. Feel free to email any comments or suggestions about the articles featured in this column at hks@hiteshseth.com.