Review: IBM WebSphere Voice Toolkit

Introduction

IBM WebSphere Voice Toolkit is a complete, integrated VoiceXML application
development platform. Based on IBM’s next-generation open source Eclipse IDE
(Integrated Development Environment) platform (http://www.eclipse.org), Voice
Toolkit includes traditional IDE features such as project/view management and
source code management (SCM), with integration with SCM tools such as CVS. On
top of these it adds an integrated set of VoiceXML tools: application
generation wizards, a VoiceXML editor, grammar development and testing tools,
debugging, support for developing both static VoiceXML content and J2EE (Java 2
Enterprise Edition) based dynamic applications, and a broad set of reusable
dialog components. In the rest of the article, we review the key features of
the Toolkit and what they bring to VoiceXML application development. Voice
Toolkit supports VoiceXML 1.0 application development using multiple grammar
formats.

Installation

Installing the WebSphere Voice Toolkit requires three components: the Voice
Toolkit itself, the WebSphere Voice Server SDK and the IBM Reusable Dialog
Components. A fourth component, the Voice Application Debugger (reviewed later
in the article), is currently in beta and optional, but adds an important
step-by-step debugging facility. The Voice Server SDK includes desktop versions
of the IBM TTS (Text-to-Speech) and IBM ViaVoice ASR (Automatic Speech
Recognition) engines. All these components are available for download for
Windows 2000 based development environments from the IBM Voice Systems homepage
(see the Resources section).

First Looks: The VoiceXML Editor

One of the most basic, common and useful features found in VoiceXML IDEs is a
VoiceXML editor. Voice Toolkit’s VoiceXML editor is based on a generic XML
editor but adds features that are useful to a VoiceXML application developer,
such as content assist, bookmarks and tasks. Particularly interesting is the
content assist feature, which, through either a context-sensitive drop-down
menu or a hotkey (Ctrl-Space), provides a list of the possible VoiceXML tags
and attributes. Content assist is driven by the DTD (document type definition)
of the VoiceXML specification (as shown in the figure below). It is also
customizable through macros, which can be created for tags, attributes and
attribute values.


Pronunciation Builder

Apart from development tools for VoiceXML, IBM’s forte in speech systems is the
capability to execute and host voice applications (that is, to function as a
VoiceXML gateway) with products such as IBM WebSphere Voice Server and
integration with IVR (Interactive Voice Response) platforms such as DirectTalk.
VoiceXML currently doesn’t have a standard for representing pronunciations.
However, Pronunciation Builder (screenshots – 1, 2), a component of the Voice
Toolkit, allows the developer to compose IPA (International Phonetic Alphabet)
based pronunciations of unknown words (such as uncommon names or words typically
said in a different fashion). For instance, you could change the default
pronunciation of J2EE to be "J 2 double e" (represented in the IPA as
"ʤeɪ tu ˈdʌ.bəl i") instead of the standard "j 2 e e" (represented in the IPA
as "ʤeɪ tu i i"). The tool automatically adds a reference to the composed
pronunciation to the VoiceXML document using IBM’s VoiceXML extension tag
"<ibmlexicon>", as shown in the following code snippet. The IBM ViaVoice
Text-to-Speech Engine then uses these composed pronunciations to synthesize
correctly pronounced speech.

Audio Recorder

One of the best practices in early VoiceXML application development is to keep
synthesized text-to-speech to a minimum. Pre-recorded prompts for dialog
introductions, by contrast, give the application personality. For integrated
development of audio prompts, Voice Toolkit includes a fairly basic audio
recorder (shown below) that lets a developer record and edit .au/.wav audio
prompts, which can be used during development and later for deployment.

Voice Application Debugger

As developers of Java/C++/Visual Basic/Web applications, we are all used to
debugging applications with the traditional breakpoints, step-by-step
walkthroughs, variable watches, interactions and so on. VoiceXML, being a
dialog-based system, lends itself to this traditional programming paradigm; the
major differences are that inputs and outputs can be voice (pre-recorded or
generated) and that subroutines are subdialogs. Voice Application Debugger, a
utility released by IBM’s alphaWorks division, integrates this step-by-step
debugging methodology into the Voice Toolkit. The debugger adds a menu item
called "Debug VoiceXML", which starts the debugger on the VoiceXML document
currently being edited. The debugger (shown below) also supports debugging of
remote (URL-based) VoiceXML applications.

Grammar Development

Your VoiceXML application is only as good as the grammar it supports. Grammar
development and testing is also perhaps the most difficult and most important
part of developing VoiceXML applications. A number of grammar formats are used
by VoiceXML gateways and hosted voice portals, including JSGF (Java Speech
Grammar Format), BNF (Backus-Naur Form) and XML-based grammar formats. IBM
WebSphere Voice Toolkit supports development, testing and inter-conversion of
JSGF and BNF grammars. Two important grammar development functions in the
toolkit are a wizard that generates possible utterances (screenshot) and
another that tests a grammar (shown below) against text or speech utterances.
Features that I would like to see in the next version are a visual (graph-like)
representation of the grammar and support for the upcoming XML-based grammar
specification.
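
The idea behind generating possible utterances can be illustrated with a small
Java sketch. It assumes a deliberately simplified grammar model (a sequence of
slots, each listing its alternatives) and is not how the toolkit’s wizard is
implemented; UtteranceGenerator and expand are hypothetical names chosen for
this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not toolkit code: enumerates every utterance
// accepted by a grammar modeled as a sequence of slots, where each slot is a
// list of alternative phrases.
final class UtteranceGenerator {

    static List<String> expand(List<List<String>> slots) {
        List<String> results = new ArrayList<>();
        results.add(""); // start from a single empty utterance
        for (List<String> alternatives : slots) {
            List<String> next = new ArrayList<>();
            // Extend every utterance built so far with each alternative.
            for (String prefix : results) {
                for (String phrase : alternatives) {
                    next.add(prefix.isEmpty() ? phrase : prefix + " " + phrase);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<String>> slots = Arrays.asList(
                Arrays.asList("weather for", "forecast for"),
                Arrays.asList("boston", "chicago"));
        for (String utterance : expand(slots)) {
            System.out.println(utterance);
        }
    }
}
```

Even this toy grammar yields four utterances from two slots of two alternatives
each, which hints at why exhaustive manual testing of a real grammar quickly
becomes impractical.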

View Grammar Test Tool Screen Shot

Reusable Dialog Components

A key highlight of VoiceXML is that it truly integrates the web application
development world with interactive speech-based telephony applications.
However, speech application development isn’t easy. It involves creating
complex dialogs for all the possible voice interactions. For instance, even a
simple dialog that gets a valid US state as input needs a grammar that lists
all the states.
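
To give a sense of the effort involved, here is a minimal Java sketch that
assembles a JSGF grammar for state names from a list. The class name
StateGrammarBuilder, the grammar name "states" and the rule name "state" are
hypothetical, only a handful of states are listed, and none of this is part of
the Voice Toolkit or the Reusable Dialog Components.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Voice Toolkit: builds a JSGF grammar
// whose single public rule accepts any one of the supplied phrases.
final class StateGrammarBuilder {

    static String build(String grammarName, String ruleName, List<String> phrases) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");                        // JSGF self-identifying header
        sb.append("grammar ").append(grammarName).append(";\n");
        sb.append("public <").append(ruleName).append("> = ");
        sb.append(String.join(" | ", phrases));            // one alternative per phrase
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A real grammar would list every state; four are enough for the sketch.
        List<String> states = Arrays.asList("alabama", "alaska", "arizona", "new york");
        System.out.print(build("states", "state", states));
    }
}
```

A production grammar would also have to cover abbreviations and alternate
phrasings, which is exactly the kind of work a prebuilt component saves.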


Reusable Dialog Components are an extensive set of prebuilt dialogs available
from IBM. They can be used within VoiceXML applications as subdialogs or
templates. Currently included in the 2.0 release are subdialogs (with their
associated grammars and VoiceXML code) for recognizing alphanumeric characters;
selecting elements from a list; confirmation; and processing input for credit
card numbers and expiration dates, currency, dates, directions, durations,
email addresses, numbers, social security numbers, street types, telephone
numbers, times, URLs, major US cities, US states and time zones.


Reusable Dialog Components also include a smaller set of components, known as
VoiceXML code templates, each of which represents a complex dialog flow
assembled from multiple reusable subdialogs. For instance, the included address
template can be used to get a user’s address information. This template uses
the Alpha, AlphaNumeric, Confirmation, Number, Street Type, US Major City, US
Postal Code and US State subdialog components. Other templates included
recognize credit card information, a date range, a name and a time range.
Reusability, although a simple concept, isn’t easy to implement. Creating a
library of reusable components is one thing; using those components easily in
an application is another challenge. Voice Toolkit simplifies reuse of the
extensive dialog components by providing a simple wizard-based approach (shown
in the figures below) for using a dialog component in a VoiceXML-based
application.

Figure: Select a Reusable Dialog Component

Figure: Customize the parameters of the component

The following code snippets show the code that is generated by the wizard.
The example below uses the US Postal Codes Dialog component to get a valid
postal code as input and pass it on to the rest of the application.

view code example 1

With just a few changes, the generated dialog can be turned into a complete
application component. Once you start using the available dialog components,
it is easy to recognize their value and the time and effort they save in
building fully functional VoiceXML applications from an assembly of components
and business logic.

view code example 2

Dynamic Application Development

Apart from the development of static VoiceXML documents in the VoiceXML
editor (which can also be used as templates for dynamic content), Voice
Toolkit also supports generating dynamic VoiceXML applications based on
J2EE (Java 2 Enterprise Edition) web application development. The tool
provides two basic wizards for JSP/Servlet-based dynamic application
generation: Database Web Page and Java Bean Web Page (shown below). Even
though these wizards are quite basic, they help a great deal in getting the
basic application template ready.

view the Application Wizard

For instance, the Java Bean Web Page wizard generates a starter VoiceXML
application from a Java Bean. Let’s complete our example from the reusable
dialog component section. We create a prototype bean called “Weather” (this
bean always returns the same weather for all postal codes; an actual bean,
however, would call a web service from weather.com or a similar service and
get the actual weather for the postal code).
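
As a rough idea of what such a prototype bean might look like, here is a
minimal sketch. The property names postalCode and forecast and the forecast
text are my own assumptions for illustration, not the contents of the original
listing.

```java
import java.io.Serializable;

// Hypothetical prototype bean: returns the same forecast for every postal code.
// Declared package-private to keep the sketch in a single file; a bean that is
// referenced from a JSP would be a public class in its own source file.
class Weather implements Serializable {
    private static final long serialVersionUID = 1L;

    private String postalCode = "";

    // JavaBeans need a public no-argument constructor.
    public Weather() { }

    public String getPostalCode() {
        return postalCode;
    }

    public void setPostalCode(String postalCode) {
        this.postalCode = (postalCode == null) ? "" : postalCode.trim();
    }

    // Read-only property. The postal code is ignored in this prototype; a real
    // bean would use it to look up the forecast from a remote weather service.
    public String getForecast() {
        return "Sunny with a high of 75 degrees";
    }
}
```

Because the bean exposes plain getters and setters, a wizard or JSP can set
postalCode from the caller’s input and read forecast back without knowing how
the value is produced.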

view bean example 1

Given a reference to the Weather bean, the Java Bean Web Page wizard
generates the following JSP/Servlet-based VoiceXML application:

Input:

view bean example 2

Output:

view bean example 3

Similarly, the Database Web Page wizard generates a starter VoiceXML
application from a SQL query. Voice Toolkit generates all applications
according to the JavaServer Pages and Java Servlet specifications. Using the
toolkit’s support for either an embedded Apache Tomcat or IBM WebSphere
Application Server, the application can be deployed locally or remotely and
tested (using the Voice Application Debugger).
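
To make the idea of dynamically generated VoiceXML concrete, the sketch below
renders a list of rows (such as the results of a SQL query) as a VoiceXML 1.0
document that reads each row aloud. It is an illustration under my own
assumptions, not the code the wizard generates; VoiceXmlRenderer, render and
escape are hypothetical names, and the database access itself is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not wizard output: turns rows of text into a VoiceXML
// 1.0 document with one prompt per row.
final class VoiceXmlRenderer {

    // Escapes the characters that are special in XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String render(List<String> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<vxml version=\"1.0\">\n");
        sb.append("  <form>\n");
        sb.append("    <block>\n");
        for (String row : rows) {
            sb.append("      <prompt>").append(escape(row)).append("</prompt>\n");
        }
        sb.append("    </block>\n");
        sb.append("  </form>\n");
        sb.append("</vxml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a servlet or JSP these rows would come from a query result.
        System.out.print(render(Arrays.asList("Boston: sunny", "Chicago: rain & wind")));
    }
}
```

In a servlet or JSP, the same string would be written to the response so that
the VoiceXML gateway fetches it like any other web page.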

Conclusion

In summary, IBM WebSphere Voice Toolkit represents a comprehensive and
integrated set of tools for VoiceXML application development. What sets
it apart from the competition is the ease of voice application debugging,
dynamic application generation from Java Beans and relational databases
and, most importantly, the extensive set of reusable dialog components. What
I would like to see in the coming versions is support for features of the
VoiceXML 2.0 standard (currently in draft stage) and graphical grammar
creation tools.

Resources

About Hitesh Seth

Hitesh Seth is Chief Technology Evangelist for Silverline Technologies,
a global eBusiness and mobile solutions consulting and integration services
firm. He is a columnist on VoiceXML technology in XML Journal and regularly
writes for other technology publications including Java Developer’s Journal
and Web Services Journal on technology topics such as J2EE, Microsoft
.NET, XML, Wireless Computing, Speech Applications, Web Services &
Integration. Hitesh received his bachelor’s degree from the Indian Institute
of Technology Kanpur (IITK), India. Feel free to email any comments or
suggestions about the articles featured in this column at
[email protected]
