This article is part of a series about building speech-enabled IVR (interactive voice response) applications using MS Speech Server 2004 and the MS Speech Application SDK. The previous article discussed creating grammars and prompts in IVR applications; this article outlines telephony concepts and the telephony functionality in MS Speech Server and the SASDK. Telephony is the interface between the PSTN and your speech-enabled, Web-based IVR application, so developers with a background in Web application development need a basic understanding of telephony when building speech-enabled IVR applications.
TAS (Telephony Application Services)
Telephony Application Services (TAS) and Speech Engine Services (SES) are both servers within MS Speech Server. SES provides the server-side speech processing for SALT applications through three engines, a speech recognition engine, a prompt engine, and a TTS engine, which serve IVR callers through TAS. The speech recognition engine handles the caller's speech input; the prompt engine concatenates pre-recorded prompts from a prompt database and plays them back to the caller; and the TTS (text-to-speech) engine synthesizes audio output from a text string. TAS serves as a connection proxy between the PBX and SES. It manages a set of SALT interpreters, call control (establishing, managing, and terminating the voice connection), and the telephony interface managers.
CSTA (Computer Supported Telecommunications Applications) is an ECMA (European Computer Manufacturers Association) standard with an extensive feature set and a comprehensive call model. CSTA supports basic first-party call control and advanced third-party call control with the same standardized model. It exposes advanced communication platform features to telephony application developers without burdening them with underlying protocol specifics. CSTA specifies an application interface and protocols for monitoring and controlling calls and devices in a communications network. These calls and devices may support various media and can reside in various network environments such as IP, TDM, and mobile networks.
Telephony Call Control
Call control describes the collection of functions responsible for establishing, maintaining, and terminating calls. IVR applications require call control functions such as answering a call, transferring a call, conferencing, making a call, disconnecting a call, and so forth.
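To make these call control functions concrete, here is a minimal sketch of a first-party call lifecycle modeled as a state machine. The state names, operation names, and transition table are simplified assumptions for illustration only; they are not the full CSTA connection-state model, conferencing is omitted, and nothing here is an MS Speech Server API.

```python
from enum import Enum, auto


class CallState(Enum):
    """Simplified states of one call leg as seen by a first-party IVR (hypothetical)."""
    IDLE = auto()
    ALERTING = auto()      # inbound call is ringing on the IVR port
    DIALING = auto()       # outbound call placed, waiting for the far end to answer
    CONNECTED = auto()     # IVR and caller are talking
    DISCONNECTED = auto()  # IVR leg released (hang-up or completed blind transfer)


# Allowed transitions: (current state, operation) -> next state.
TRANSITIONS = {
    (CallState.IDLE, "incoming"): CallState.ALERTING,
    (CallState.IDLE, "make_call"): CallState.DIALING,
    (CallState.ALERTING, "answer"): CallState.CONNECTED,
    (CallState.DIALING, "remote_answer"): CallState.CONNECTED,
    # For a blind (single-step) transfer, the IVR's own leg is released.
    (CallState.CONNECTED, "transfer"): CallState.DISCONNECTED,
    (CallState.CONNECTED, "disconnect"): CallState.DISCONNECTED,
    (CallState.ALERTING, "caller_hangup"): CallState.DISCONNECTED,
    (CallState.CONNECTED, "caller_hangup"): CallState.DISCONNECTED,
}


class Call:
    def __init__(self) -> None:
        self.state = CallState.IDLE
        self.history = [self.state]

    def apply(self, operation: str) -> CallState:
        """Apply a call control operation, rejecting ones invalid in the current state."""
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"{operation!r} not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


if __name__ == "__main__":
    call = Call()
    for op in ("incoming", "answer", "transfer"):
        call.apply(op)
    print([s.name for s in call.history])  # ['IDLE', 'ALERTING', 'CONNECTED', 'DISCONNECTED']
```

The table-driven design makes invalid requests (for example, transferring a call that was never answered) fail loudly instead of silently corrupting the call state.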
In IVR applications that use an analog phone line, call signaling is carried in-band. This is known as channel-associated signaling (CAS) because the signaling is transmitted in the same channel as the voice. Tones are also used for in-band signaling on digital connections. Alternatively, call control can use out-of-band signaling protocols, in which the signaling travels on a separate channel from the voice. Out-of-band signaling is more reliable and scalable than in-band signaling. For instance, on an ISDN Primary Rate Interface (PRI) over a T-1, a single 64 kbps signaling (D) channel carries the signaling for 23 voice (B) channels. Out-of-band signaling protocols include SS7-ISUP, SIP, H.323, and so on.
Two kinds of call control, first-party and third-party, are commonly used to describe the relationship between a speech-enabled IVR application and the call. In first-party call control, the IVR application is itself a talking party on the call, which implies a direct connection between the caller and the application. In third-party call control, the IVR application is not necessarily a talking party; this lets an application monitor several calls simultaneously. MS Speech Server normally uses a first-party call control model to answer calls, play prompts, transfer calls, and disconnect calls. If you want to build more complex telephony applications, such as contact center or CRM integration, consider CTI (Computer Telephony Integration) middleware such as Intel NetMerge Call Processing Software (formerly CT Connect), Genesys CTI, Cisco ICM CTI, Avaya ASAI, or Nortel CTI Links.
Connecting a PBX to a Speech-Enabled IVR
A speech-enabled IVR application deployed in an enterprise can be connected in several ways: directly to the PSTN, behind a PBX, or in front of a PBX. A PBX often supports multiple protocols simultaneously. With MS Speech Server, the physical connection to the TAS/SES speech server is normally made over analog lines or digital trunks that terminate on telephony interface boards installed in the TAS server. Analog connections use interfaces similar to your home phone line. Digital circuit-switched networks use time-division multiplexing (TDM), in which a single voice channel occupies 64 kbps of bandwidth. In North America the digital trunk is called a T-1, which multiplexes 24 voice channels into a single 1.544 Mbps bit stream. In Europe and most regions of Asia the E-1 is used; it carries 30 voice channels in a 2.048 Mbps bit stream.
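The T-1 and E-1 rates above follow from simple frame arithmetic: each 64 kbps channel is one 8-bit timeslot repeated 8,000 times per second. The sketch below reproduces the numbers. The framing and signaling details in the comments (one framing bit per T-1 frame, E-1 timeslots 0 and 16, ISDN PRI as 23B+D on a T-1) are standard T-1/E-1 facts added for illustration, and the function names are hypothetical.

```python
DS0_KBPS = 64            # one PCM voice channel: 8,000 samples/s x 8 bits
FRAMES_PER_SECOND = 8000


def t1_line_rate_kbps() -> int:
    # A T-1 frame is 24 timeslots x 8 bits plus 1 framing bit = 193 bits,
    # transmitted 8,000 times per second.
    bits_per_frame = 24 * 8 + 1
    return bits_per_frame * FRAMES_PER_SECOND // 1000


def e1_line_rate_kbps() -> int:
    # An E-1 frame is 32 timeslots x 8 bits = 256 bits, 8,000 times per second.
    return 32 * 8 * FRAMES_PER_SECOND // 1000


def voice_channels(trunk: str, isdn_pri: bool) -> int:
    """Usable voice channels on one trunk."""
    if trunk == "T1":
        # Robbed-bit CAS borrows bits inside each voice timeslot, so all 24 carry voice;
        # ISDN PRI (23B+D) dedicates one whole timeslot to the D (signaling) channel.
        return 23 if isdn_pri else 24
    if trunk == "E1":
        # Timeslot 0 carries framing and timeslot 16 carries signaling
        # (CAS or the PRI D channel), leaving 30 voice timeslots either way.
        return 30
    raise ValueError(f"unknown trunk type: {trunk}")


if __name__ == "__main__":
    print(t1_line_rate_kbps(), e1_line_rate_kbps())  # 1544 2048
    print(24 * DS0_KBPS)  # 1536 kbps of voice payload on a T-1; the other 8 kbps is framing
```

This also shows why a 23-channel PRI is the out-of-band counterpart of a 24-channel CAS T-1: the same line rate, with one timeslot given up for common signaling.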
Call Management Controls in MS SASDK
The MS Speech Application SDK uses Computer Supported Telecommunications Applications (CSTA) services to implement telephony call control functionality that connects a phone call to Speech Controls on an ASP.NET Web page. In the current version, the Call Management Controls consist of two support classes and five classes that create controls.
The support classes are the CallInfo class and the SmexMessageBase class:
- The CallInfo class creates an object that exposes information about the current call after the AnswerCall control has run. Some of this information can be used in CTI applications.
- The SmexMessageBase class is an internal abstract class from which the remaining Call Management Controls are derived.
The RunSpeech dialog manager activates the Call Management Controls, as well as the Dialog Speech Controls and Application Speech Controls, according to their SpeechIndex properties and their source order on the Web page. In a speech-enabled IVR application, the Web page normally starts with an AnswerCall control and ends with a DisconnectCall control.
- The AnswerCall control answers a telephone call. It is derived from SmexMessageBase. AnswerCall is normally the first call control in a speech-enabled IVR application.
- The MakeCall control initiates a telephone call. It is derived from SmexMessageBase.
- The TransferCall control transfers the current telephone call to another phone. In a call center/IVR environment, the TransferCall control typically transfers the call from the IVR to a PBX queue or to an appropriate agent. It is derived from SmexMessageBase.
- The SmexMessage control handles generic CSTA messages and events. Custom call management controls in your applications should be derived from SmexMessage, which is itself derived from SmexMessageBase.
- The DisconnectCall control disconnects the current telephone call. It is derived from SmexMessageBase. Typically, this control is used to terminate the IVR call.
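The RunSpeech activation rule described above can be approximated in a few lines. This is a simplified model under stated assumptions, not RunSpeech's actual implementation: each hypothetical PageControl carries a SpeechIndex, its position in page source order, and flags for whether it is currently active and already complete, and the manager repeatedly runs the first control, ordered by SpeechIndex and then source order, that is active and not yet complete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageControl:
    """Hypothetical stand-in for a speech or call management control on the page."""
    name: str
    speech_index: int
    source_order: int
    active: bool = True      # stands in for the control's activation condition
    complete: bool = False   # set once the control has run to completion


def next_control(controls: List[PageControl]) -> Optional[PageControl]:
    """Return the next control to run: lowest SpeechIndex first, ties broken by source order."""
    ordered = sorted(controls, key=lambda c: (c.speech_index, c.source_order))
    for control in ordered:
        if control.active and not control.complete:
            return control
    return None


def run_page(controls: List[PageControl]) -> List[str]:
    """Run controls one at a time until none is eligible; return the execution order."""
    executed = []
    while (control := next_control(controls)) is not None:
        executed.append(control.name)
        control.complete = True  # assumption: every control completes successfully
    return executed


if __name__ == "__main__":
    page = [
        PageControl("DisconnectCall", speech_index=0, source_order=3),
        PageControl("AnswerCall", speech_index=0, source_order=0),
        PageControl("AskAccount", speech_index=0, source_order=1),  # hypothetical prompt/QA control
        PageControl("TransferCall", speech_index=0, source_order=2, active=False),
    ]
    print(run_page(page))  # ['AnswerCall', 'AskAccount', 'DisconnectCall']
```

With equal SpeechIndex values, source order alone decides, which is why placing AnswerCall first and DisconnectCall last on the page gives the expected call flow; the inactive TransferCall is simply skipped.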
Implement Complex Call Control Using SmexMessage
As mentioned above, the MS SASDK provides only basic call control functionality. If you need more complex call control, such as conferencing or supervised transfer, you can implement it with the SmexMessage class. Under the hood, the SALT interpreter in Microsoft Speech Server (MSS) establishes a communication channel to the Telephony Interface Manager (TIM) to implement call control. The SALT <smex> element provides this simple channel: XML messages are sent to the TIM through its sent property and received from the TIM through its onreceive event. These messages, defined in Standard ECMA-323, are CSTA XML service requests and events. Typically, the SALT application makes service requests, and the TIM replies with service request responses and call control events.
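To show what such a message might look like, here is a minimal sketch that builds ECMA-323-style TransferCall and ConferenceCall requests, the two services that complete a supervised transfer or a conference after a consultation call. It assumes these services take a heldCall and an activeCall connection identifier, each made of a callID and a deviceID; the namespace URI, the call IDs, and the device IDs are assumptions, so check element names and the CSTA edition against ECMA-323 and your TIM's documentation. In a real page, the resulting string would be assigned to the <smex> element's sent property from client-side script (not shown).

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Assumption: the CSTA XML namespace/edition your TIM expects may differ.
CSTA_NS = "http://www.ecma-international.org/standards/ecma-323/csta/ed3"


def _connection(parent: ET.Element, tag: str, call_id: str, device_id: str) -> None:
    """Append a CSTA connection identifier (callID + deviceID) under `tag`."""
    conn = ET.SubElement(parent, tag)
    ET.SubElement(conn, "callID").text = call_id
    ET.SubElement(conn, "deviceID").text = device_id


def two_call_request(service: str, held: Tuple[str, str], active: Tuple[str, str]) -> str:
    """Build a request (e.g. TransferCall, ConferenceCall) that joins a held and an active call."""
    root = ET.Element(service, xmlns=CSTA_NS)  # default namespace written as an attribute
    _connection(root, "heldCall", *held)
    _connection(root, "activeCall", *active)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical IDs: call 101 is the original caller (on hold after a consultation
    # call), call 102 is the consultation call to the agent, both on the IVR's port device.
    print(two_call_request("TransferCall", held=("101", "ivr-port-1"), active=("102", "ivr-port-1")))
```

Building the XML with ElementTree rather than string concatenation keeps the request well-formed and escapes any special characters in the IDs.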
Telephony Hardware in MS Speech Server
TAS consists of both telephony hardware and a software interface. The telephony hardware that currently works with a TAS server includes the Intel Dialogic D41JCT, DM/V480, and DM/V960 boards, which provide 4, 48, and 96 voice ports, respectively. TAS must also work with third-party Telephony Interface Manager (TIM) software, which is the interface between TAS and the telephony hardware; at present, the TIM products available for TAS are Intel NetMerge CallManager and the InterVoice TIM.
CTI Information in MS Speech Server
Computer telephony integration (CTI) allows PC-based telephony applications to integrate with proprietary PBXs to retrieve caller information such as the ANI (the caller's number), the DNIS (the number dialed), and entered digits (for example, the caller's account number and PIN). The application can then trigger a screen pop on the agent's desktop with the caller's account information retrieved from the database. At the same time, the application can instruct the PBX to transfer the call from the TAS to a PBX queue or to an appropriate agent.
The CallInfo class of the MS SASDK provides information that can be used in a CTI application. For instance, the CallID property provides the call ID of the current active call; the CalledDevice property returns the called device information provided by the phone network, which is essentially the DNIS; and the CallingDevice property provides the calling device information given by the phone network, normally called the ANI. In some IVR/CTI applications, you may need to know the port number of the telephony board for the current active call; you can retrieve it from the CorrelatorData property.
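As an illustration of how these values feed a CTI workflow, here is a minimal sketch that routes a call by DNIS and assembles a screen-pop record keyed on ANI. CallSnapshot is a hypothetical plain-data copy of the CallInfo values, not the SASDK class itself, and the routing and customer tables are made-up examples. CorrelatorData is carried as an opaque string because its exact port-number format is not described here and likely depends on the TIM.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CallSnapshot:
    """Hypothetical copy of the CallInfo values an application might log or forward."""
    call_id: str               # from CallID
    calling_device: str        # from CallingDevice (ANI: the caller's number)
    called_device: str         # from CalledDevice (DNIS: the number the caller dialed)
    correlator_data: str = ""  # from CorrelatorData, kept opaque (port format is TIM-specific)


# Hypothetical routing and customer tables for illustration only.
DNIS_TO_QUEUE: Dict[str, str] = {"8005550100": "sales", "8005550199": "support"}
ANI_TO_ACCOUNT: Dict[str, str] = {"4255550123": "ACCT-0042"}


def route(call: CallSnapshot, default_queue: str = "general") -> str:
    """Choose a PBX queue from the dialed number (DNIS)."""
    return DNIS_TO_QUEUE.get(call.called_device, default_queue)


def screen_pop(call: CallSnapshot) -> Dict[str, Optional[str]]:
    """Assemble the record an agent desktop would display when the call arrives."""
    return {
        "call_id": call.call_id,
        "caller": call.calling_device,
        "account": ANI_TO_ACCOUNT.get(call.calling_device),  # None if the ANI is unknown
        "queue": route(call),
    }


if __name__ == "__main__":
    call = CallSnapshot("17", "4255550123", "8005550199")
    print(screen_pop(call))
    # {'call_id': '17', 'caller': '4255550123', 'account': 'ACCT-0042', 'queue': 'support'}
```

Falling back to a default queue and a None account keeps calls with unknown DNIS or ANI flowing instead of failing the transfer.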
This article presented basic telephony concepts and the telephony functionality in MS Speech Server and the SASDK. The next article will discuss speech dialog design and development.
About the Author
Xiaole Song is a professional in designing, integrating, and consulting on telecommunications, CTI, IVR, speech, call centers, and IP telephony.