Review: BeVocal Cafe (Part II)
SpeechObjects
SpeechObjects are a set of open, reusable components that encapsulate frequently used functionality in a speech application. Components aren't new to application development: depending on your development environment, you may already use EJB, COM, or Microsoft .NET components as part of your applications. The objective of SpeechObjects is to bring the same kind of reuse and object-oriented methodology to speech developers. SpeechObjects, which were defined by Nuance Communications, can be used within a VoiceXML application through the <object> and <param> tags.
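To make the component idea concrete, here is a minimal Python sketch of the pattern that <object> and <param> express: a component is looked up by a class identifier, configured with named parameters, and run to produce a single result. This is not Nuance's SpeechObjects API. The YesNo class, the "example.YesNo" identifier, the REGISTRY dict, and the invoke() function are all hypothetical, and plain text stands in for recognizer output.

```python
from typing import Callable, Dict, Optional


class SpeechObject:
    """Hypothetical base class for a reusable dialog component.

    Named parameters arrive as a dict, loosely mirroring how <param>
    children configure an <object> element in VoiceXML.
    """

    def __init__(self, params: Optional[Dict[str, str]] = None) -> None:
        self.params = dict(params or {})

    def run(self, utterance: str) -> object:
        raise NotImplementedError


class YesNo(SpeechObject):
    """Toy yes/no component: maps recognized text to True, False, or None."""

    YES = {"yes", "yeah", "yep", "correct", "right"}
    NO = {"no", "nope", "wrong", "incorrect"}

    def prompt(self) -> str:
        # A <param>-style override, with a default when none is supplied.
        return self.params.get("prompt", "Please say yes or no.")

    def run(self, utterance: str) -> Optional[bool]:
        word = utterance.strip().lower()
        if word in self.YES:
            return True
        if word in self.NO:
            return False
        return None  # out of vocabulary; a real component would reprompt


# Hypothetical registry standing in for the platform's classid lookup.
REGISTRY: Dict[str, Callable[..., SpeechObject]] = {"example.YesNo": YesNo}


def invoke(classid: str, params: Dict[str, str], utterance: str) -> object:
    """Look up the component by class id, configure it, and run it once."""
    component = REGISTRY[classid](params)
    return component.run(utterance)


print(invoke("example.YesNo", {"prompt": "Continue?"}, "Yeah"))  # prints True
```

Because the caller touches only the class identifier and the parameters, the same component can be dropped into many dialogs unchanged, which is the reuse argument behind SpeechObjects.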
BeVocal Cafe supports the Nuance SpeechObjects methodology and lets developers reuse a number of SpeechObjects developed by Nuance. In addition, Cafe includes a small set of SpeechObjects that are specific to the Cafe environment and available to VoiceXML developers. The table below shows the SpeechObjects that Cafe supports:
SpeechObject | Recognizes |
nuance.so.SOAlphaDigitString | An alphanumeric string |
nuance.so.SOBrowsableList | Select an item by reading a sequence of items to the caller |
nuance.so.SOConfirm | Confirmation dialog |
nuance.so.SOCreditCardInfo | Credit-card related information |
nuance.so.SODate | Date |
nuance.so.SONATelephoneNumber | A 10-digit telephone number |
nuance.so.SOQuantity | Quantity (e.g., twenty-two) |
nuance.so.SOSectionedDigitString | A sectioned/delimited string |
nuance.so.SOSimpleDigitString | A fixed-length digit string |
nuance.so.SOSocialSecurityNumber | Social Security number |
nuance.so.SOTime | A time expression |
nuance.so.SOUSCurrency | Amount in dollars/cents |
nuance.so.SOUSZipCode | U.S. 5- or 9-digit ZIP code |
nuance.so.SOYesNo | Yes/no response |
bevocal.cafe.SOAirline | Airline name |
bevocal.cafe.SOPickStock | Equity name |
bevocal.cafe.SOCityState | City and state |
bevocal.cafe.SOStreet | A street in a particular city/state |
bevocal.cafe.SOStreetNumber | A street number in a particular street/city/state |
To illustrate the value of SpeechObjects, let's look at an example. The VoiceXML code snippet below shows a simple prototype of a stock trading application that recognizes an equity or index name and returns the name of the equity. The benefit of SpeechObjects is clear from how little code is needed to achieve this functionality. Coded in plain VoiceXML, the same prototype would require the developer to build a fairly complex grammar covering every equity traded on the stock exchange.
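The encapsulation argument can also be shown outside VoiceXML. The Python sketch below is a toy model, not the Cafe SOPickStock API: a hypothetical PickStock class owns the vocabulary of spoken equity names and their variants, and callers see only a single recognize() call. Plain text stands in for recognizer output, and the tiny vocabulary is illustrative only.

```python
from typing import Dict, Optional


class PickStock:
    """Hypothetical equity-recognition component (not the Cafe SOPickStock API)."""

    # Tiny illustrative vocabulary mapping spoken variants to a canonical name.
    # A real component would cover every listed equity and index.
    _VARIANTS: Dict[str, str] = {
        "ibm": "IBM",
        "international business machines": "IBM",
        "nuance": "Nuance Communications",
        "nuance communications": "Nuance Communications",
        "the dow": "Dow Jones Industrial Average",
        "dow jones": "Dow Jones Industrial Average",
    }

    def recognize(self, utterance: str) -> Optional[str]:
        """Return the canonical equity name, or None if out of vocabulary."""
        # Normalize case and collapse repeated whitespace before lookup.
        key = " ".join(utterance.lower().split())
        return self._VARIANTS.get(key)


# The calling code stays this short no matter how large the vocabulary grows.
stock = PickStock()
print(stock.recognize("International  Business Machines"))  # prints IBM
```

All of the vocabulary maintenance lives inside the component; adding thousands of equities changes the class, not the applications that use it.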
Speaker Verification
Whether you develop a client-server, web, wireless, or speech application, security is always a concern, and a key aspect of application security is authentication. An authentication mechanism allows an application to recognize a valid user. In traditional web applications, authentication is typically handled through a combination of user ID and password; some more secure web applications also let the user present a digital certificate as an authentication token. In speech applications, authentication is typically managed through a combination of a PIN (personal identification number), full names (cryptic user IDs can be hard to recognize in speech), account numbers, and/or telephone numbers. For instance, a typical authentication dialog for a speech application might be:
"Please say or enter your account number" followed by
"Please say or enter your PIN."
This allows the application to authenticate the user. Speech-based applications, however, make a different form of authentication possible: the user's speech itself. Just as a fingerprint serves as a token of a person's identity, a user's natural speech can be used to build a Voice Print that identifies that user.
VoiceXML currently doesn't include built-in support for Voice Print technologies; however, several vendors, such as Nuance and SpeechWorks, have built speaker verification products into their core recognition technologies. Cafe exposes Voice Print-based speaker verification to VoiceXML developers through two tags: <register> and <verify>. As the names suggest, the <register> tag enrolls a user's Voice Print with the application, and the <verify> tag checks a caller's speech against that Voice Print. Both tags share a common identifier, the "key expression," which is used to store and retrieve the Voice Print.
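Conceptually, <register> stores a Voice Print under a key expression and <verify> compares new speech against whatever is stored under that same key. The Python sketch below is a toy model of that flow, not the behavior of the Cafe tags. It assumes each utterance has already been reduced to a fixed-length feature vector (feature extraction is not shown), treats the mean of the enrollment vectors as the Voice Print, and accepts a caller when cosine similarity reaches a hypothetical threshold of 0.95. Production speaker verification engines typically use statistical speaker models rather than a single averaged vector; the store, function names, and key format here are all illustrative.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical voice-print store: key expression -> enrolled template vector.
VOICE_PRINTS: Dict[str, List[float]] = {}


def _mean(samples: Sequence[Sequence[float]]) -> List[float]:
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def register(key: str, samples: Sequence[Sequence[float]]) -> None:
    """Enroll: average the feature vectors of several utterances into one print."""
    if not samples:
        raise ValueError("at least one enrollment sample is required")
    VOICE_PRINTS[key] = _mean(samples)


def verify(key: str, sample: Sequence[float], threshold: float = 0.95) -> bool:
    """Accept only if a print exists for key and the sample is similar enough."""
    template = VOICE_PRINTS.get(key)
    if template is None:
        return False  # nothing was enrolled under this key expression
    return _cosine(template, sample) >= threshold


register("acct-12345678", [[1.0, 0.2, 0.1], [0.9, 0.25, 0.1]])
print(verify("acct-12345678", [0.95, 0.22, 0.1]))  # prints True
print(verify("acct-12345678", [0.1, 0.9, 0.4]))    # prints False
```

Using the same key in both calls is what ties a verification attempt to the right enrollment; a key with no enrolled print simply fails verification.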
The listing below shows how the <register> tag can be used.
Now that the Voice Print has been registered, the <verify> tag can be used to authenticate a user.
This article was originally published on November 12, 2002