VoiceReview: BeVocal Cafe

Review: BeVocal Cafe

Introduction

Like Tellme Studio, BeVocal Cafe's primary role is to provide VoiceXML developers with a hosted telephony environment that integrates multiple speech technologies, including a VoiceXML interpreter, speech recognition, a Text-to-Speech engine and speaker verification.

Support for Standards

BeVocal Cafe supports VoiceXML application development based on the VoiceXML 1.0 Specification and also provides partial support for the VoiceXML 2.0 draft specification. From a grammar perspective, Cafe supports grammars written in multiple formats, including ABNF, Nuance GSL, XML and JSGF (Java Speech Grammar Format).

First looks – File Management

An essential feature of a hosted VoiceXML development platform is a basic tool for configuring a VoiceXML application based on a remote URL, or an area where VoiceXML documents can be temporarily stored. The File Management tool provides that functionality. Application development based on both local files and remote URLs is supported, with validation of the VoiceXML documents. In the Cafe site's file area, developers can store about 2MB of content, including grammar files, VoiceXML documents, text files (for input to the batch mode of Vocal Scripter, reviewed later), JavaScript scripts, etc. This tool is the key to the rest of the tools, because one of the locally stored VoiceXML documents or one of the URLs must be activated before it can be tested in the Cafe environment.
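Because the file area holds only about 2MB, it can help to check the size of a local project folder before uploading. The following is a minimal sketch, not part of the Cafe toolset: the names `QUOTA_BYTES`, `total_size` and `fits_quota` are hypothetical, and the quota is assumed to be exactly 2 MiB, since the article gives only an approximate figure.

```python
import sys
from pathlib import Path

# Assumption: "about 2MB" is read here as exactly 2 MiB.
QUOTA_BYTES = 2 * 1024 * 1024


def total_size(folder: Path) -> int:
    """Return the combined size in bytes of all regular files under folder."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def fits_quota(folder: Path, quota: int = QUOTA_BYTES) -> bool:
    """Return True if the folder's contents fit within the assumed quota."""
    return total_size(folder) <= quota


if __name__ == "__main__":
    # Usage: python check_quota.py path/to/project
    project = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    used = total_size(project)
    status = "fits" if used <= QUOTA_BYTES else "exceeds the assumed quota"
    print(f"{used} bytes used: {status}")
```

The check counts every file in the folder, so grammar files, VoiceXML documents, scripts and batch input files all contribute to the total.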



"Calling your Application"

After you have either uploaded your VoiceXML application through the File Management utility or linked your external web server URL, and have activated the appropriate application, you are all set to "call" your application to test it. A number of call-in options exist. For North American developers BeVocal provides a toll-free number, (877) 33-VOCAL; international users can use the non-toll-free counterpart, +1 (408) 907-7328; and for European callers BeVocal also provides a UK number, (+44) 20.7961.3985. When you are connected you will be prompted for your 4-digit PIN (which you set at registration) and optionally your user ID, which is typically your 7-digit phone number. A simple shortcut that avoids entering the user ID is to use the last 7 digits of your caller ID as your user ID.
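The caller ID shortcut amounts to keeping the last 7 digits of the calling number. The sketch below only illustrates that rule; `default_user_id` is a hypothetical helper, not a BeVocal API, and how the platform itself normalizes caller IDs is not documented here.

```python
def default_user_id(caller_id: str) -> str:
    """Return the last 7 digits of a caller ID, ignoring punctuation."""
    digits = "".join(ch for ch in caller_id if ch in "0123456789")
    if len(digits) < 7:
        raise ValueError("caller ID has fewer than 7 digits")
    return digits[-7:]


print(default_user_id("+1 (732) 584-5940"))  # prints: 5845940
```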

Once you are ready for your application to be available to other users you can make your application available on the BeVocal consumer Portal (1-800-4BVOCAL). To access the application, your demo users only need to provide a demo ID (which is configured in the demo configuration step).

As we will see later in this article, if you prefer to test your application through a browser-based interface, you can do so with the Vocal Scripter tool. The Scripter simulates a call through a simpler text-based interface. This can be useful, for instance, if you are constantly refining your VoiceXML application.

Vocal Player

As part of the execution of the VoiceXML application in the BeVocal Cafe environment, an audio recording is stored for the calls. Using the Vocal Player tool, these recordings can then be played and analyzed for any enhancements/changes that are required. From the user interface of the Vocal Player, it appears that the recording is broken into multiple, smaller .wav files, which are then played in sequence. A suggestion for the next release of the Cafe would be an option to download a combined .wav file to a PC so that the user can provide that as a reference to their end users for demonstration, or to serve as a "speech mockup". Vocal Player is implemented as a Java Applet, and requires JMF (Java Media Framework) to be installed and configured in the CLASSPATH on the local machine (the jar file containing the JMF implementation is available on the BeVocal Cafe site but requires manual configuration).



If you are interested in the more detailed multi-level textual logging, there is a companion tool called the "Log Browser." It allows the user to view the multiple levels of textual debugging information that is stored for each invocation of the VoiceXML application.

Vocal Scripter

As mentioned earlier in this review, the main function of the Vocal Scripter tool is to provide a simple text-based simulation of the VoiceXML application. The Scripter can either take interactive input entered by the user or run in batch mode, with the input coming from a text file. The text file can also be created by recording the user input during a Scripter session. One issue with the Scripter is that numbers must either be spelled out or entered with spaces between the digits. For the Vocal Scripter to recognize a phone number, an input such as (7 3 2 5 8 4 5 9 4 0) is required, instead of (7325845940). The Vocal Scripter is implemented as a Java Applet.
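When preparing input for the Scripter, the digit spacing can be generated rather than typed by hand. This is a minimal sketch of that conversion; `to_scripter_digits` is a hypothetical helper, and it covers only the space-separated form described above, not the layout of the batch input file.

```python
def to_scripter_digits(number: str) -> str:
    """Return the digits of number separated by single spaces."""
    digits = [ch for ch in number if ch in "0123456789"]
    return " ".join(digits)


print(to_scripter_digits("7325845940"))  # prints: 7 3 2 5 8 4 5 9 4 0
```

Punctuation such as parentheses or hyphens is dropped, so a formatted number like (732) 584-5940 produces the same spaced output.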



Vocal Debugger

A runtime tool, as the name "debugger" indicates, Vocal Debugger has functionality similar to the debugger in a traditional IDE (Integrated Development Environment). For instance, much like a watch variable list, the debugger shows the various field values and the flow control of a running VoiceXML application. The figure below shows the Vocal Debugger in action with a simple application which identifies a user’s city and state based on a recognized phone number. Due to the nature of a speech application, however, the variables are only shown for a fraction of a second as control passes from one tag to another. It would be really helpful if the tool added the breakpoint functionality found in a typical debugging environment, so that execution could be paused at a line number in the VoiceXML source and the variable values analyzed.



To be Continued

Apart from the tools reviewed here, the Cafe contains a comprehensive set of developer resources. These include a large collection of pre-recorded audio prompts, a good selection of VoiceXML samples (including examples based on dynamic scripting with JSP and Perl) and developer documentation that includes a VoiceXML “getting started” guide, a programmer’s guide and a VoiceXML tutorial and reference.

In the Vocal Scripter section we saw a simple VoiceXML application that could recognize any company or index name, which can then be used to get stock quotes and other trading-related information. The VoiceXML application was built using a SpeechObject that is provided by BeVocal Cafe. Nuance SpeechObjects is a component model for creating and reusing building blocks for speech applications.

In the next installment of this review of BeVocal Cafe, we’ll look at SpeechObjects, a technology that is supported by BeVocal Cafe. We will also review some of the extensions to VoiceXML that BeVocal has developed, specifically around voice enrollment and speaker verification.

About Hitesh Seth

Hitesh Seth is Chief Technology Evangelist for Silverline Technologies, a global eBusiness and mobile solutions consulting and integration services firm. He is a columnist on VoiceXML technology in XML Journal and regularly writes for other technology publications, including Java Developers Journal and Web Services Journal, on topics such as J2EE, Microsoft .NET, XML, Wireless Computing, Speech Applications, Web Services & Integration. Hitesh received his bachelor's degree from the Indian Institute of Technology Kanpur (IITK), India. Feel free to email any comments or suggestions about the articles featured in this column to hitesh.seth@silverline.com.
