Build your own email reader by phone using VoiceXML
by Mukund Bhagavan
This article demonstrates how to build an email reader for any phone using VoiceXML and Java Server Pages (JSP). The email reader is an interesting application that illustrates the power of VoiceXML for everyday use. It lets a user (a) register for the email-reader-by-phone service by providing a POP server name, username, and password and choosing a 4-digit pin that maps to this POP account, (b) dial into the email reader and log in using the chosen pin, and (c) listen to and browse emails retrieved from the POP server. The user can browse through the sender and subject headers for all emails and then say ‘more’ to listen to the body of any email. For source code, download the following three zip files:
All three files should be downloaded to the same directory.
The development of the Email Reader application consists of the following steps:
- Application Design
- Application Testing
The Email Reader application was created using the BeVocal Café, a web-based VoiceXML development platform. The BeVocal Café provides several tools for voice application development, such as a VoiceXML trace tool, a debugger, and a text emulator, along with resources such as extensive documentation and newsgroup support, so that developers can build VoiceXML applications rapidly.
The typical development life cycle for a voice application using the BeVocal Café is as follows:
- Design the dialog. The dialog of an application is an essential part of the voice user interface.
- Create audio for pre-recorded prompts. Pre-recorded audio significantly improves the quality and usability of a voice application. Where necessary, use text-to-speech (TTS) for audio output.
- Build an application in VoiceXML using static content.
- Use the VoiceXML Checker to check for syntax errors and use the Vocal Debugger to test the flow of dialogs.
- Test the application by calling the toll-free number (1-877-33-VOCAL) provided by the Café.
- Extend the application to serve dynamic content using server-side scripts (such as JSP).
- Test the application flow either by calling in or by using the text-based tool (Vocal Scripter). Use the Trace tool to trace application flow while the call is in progress. Alternatively, you can use the Log Browser to view historical trace logs.
- Deploy your application as a demo and distribute it to a limited user group to gather user feedback.
Voice applications require you to gather user feedback and measure user experience continuously to keep the quality of the application high. Metrics to look for include the recognition accuracy of dialogs, the task completion rate, and the confusability of dialogs. The BeVocal Café provides specific tools, such as the Log Browser and the Vocal Player, that help developers identify problem areas.
The application design for the email reader is quite simple. There are two discrete steps in the application:
- One-time registration. Registration lets users provide their POP account info and choose a pin that maps to this account. The POP account info and the pin mapping are written to persistent storage, which in this case is a flat file. When the user calls in and logs in with the pin, the user’s POP account info is retrieved from the flat file and used to fetch the emails.
- Call the application and listen to email. When a user calls the application, it invokes a VoiceXML document that asks for the pin the user chose during registration on the web page. This pin is submitted to a JSP that fetches the user’s email and generates the VoiceXML to read out the emails.
Application design involves deciding which VoiceXML documents will be static and which will be generated dynamically. In this application, there is a static VoiceXML document to authenticate the user and post the user information to a JSP. All other VoiceXML documents are dynamically generated through JSPs.
The Email Reader application uses JSPs to perform server-side functionality such as writing to files and generating VoiceXML documents with the latest emails. The implementation of the Email Reader involves three steps: registration, authentication and retrieval of emails.
In order to listen to emails, users must register for the Email Reader service through a web page. register.jsp contains the code for the registration page. Users are asked to enter a POP server name, username, password and a pin. When a user clicks on the “Register” button, the form is posted to register_confirm.jsp.
register_confirm.jsp retrieves post parameter values for the POP server, username, password, and pin. It ensures that the pin is not duplicated, and registers the user by writing all of the information to a flat file (called users).
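The registration and lookup logic described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name UserStore, its methods, and the tab-separated record layout (pin, POP server, username, password) are hypothetical, since the article does not specify the format of the users file. Like the flat-file design itself, the sketch stores the password in plain text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper; the article's JSPs may store and parse records differently. */
final class UserStore {
    /** One registered user: a pin mapped to POP account details. */
    static final class Account {
        final String pin, host, username, password;
        Account(String pin, String host, String username, String password) {
            this.pin = pin; this.host = host; this.username = username; this.password = password;
        }
    }

    private final Path file;

    UserStore(Path file) { this.file = file; }

    /** Returns the account registered under the given pin, if any. */
    Optional<Account> find(String pin) {
        for (String line : readLines()) {
            String[] f = line.split("\t", -1);   // assumed layout: pin, host, user, password
            if (f.length == 4 && f[0].equals(pin)) {
                return Optional.of(new Account(f[0], f[1], f[2], f[3]));
            }
        }
        return Optional.empty();
    }

    /** Appends a record; returns false if the pin is malformed or already taken. */
    synchronized boolean register(String pin, String host, String username, String password) {
        if (!pin.matches("\\d{4}") || find(pin).isPresent()) return false;
        if (!isSafe(host) || !isSafe(username) || !isSafe(password)) return false;
        String record = String.join("\t", pin, host, username, password) + System.lineSeparator();
        try {
            Files.write(file, record.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    /** Tabs and line breaks would corrupt the one-record-per-line layout. */
    private static boolean isSafe(String s) {
        return s.indexOf('\t') < 0 && s.indexOf('\n') < 0 && s.indexOf('\r') < 0;
    }

    private List<String> readLines() {
        if (!Files.exists(file)) return Collections.emptyList();
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path demo = Paths.get(System.getProperty("java.io.tmpdir"), "users-demo.txt");
        UserStore store = new UserStore(demo);
        System.out.println(store.register("1234", "pop.example.com", "alice", "secret"));
        System.out.println(store.find("1234").isPresent());
    }
}
```

In a JSP, register would be called with the posted form values, and find would be called with the pin submitted from the phone.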
When a user calls the email reader application, the first step is to authenticate the user. emailreader.vxml prompts a user to enter his/her pin and submits the form to access that user’s email. Once the pin field is filled, the form is posted to fetchemails.jsp.
fetchemails.jsp retrieves the POP account info associated with the provided pin and uses the JavaMail API to fetch all emails from the user’s POP account. The sender name, subject and email body are extracted and reformatted into VoiceXML content.
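Reformatting email text into VoiceXML means, at a minimum, escaping characters that would otherwise break the XML document. The sketch below shows one way to do that in plain Java. The class VxmlText, its method names, and the prompt wording are hypothetical and are not taken from the article's source. The JavaMail fetch itself is not shown.

```java
/** Hypothetical helper for turning email header text into VoiceXML-safe content. */
final class VxmlText {
    /** Escapes XML markup characters and replaces characters that XML 1.0 does not allow. */
    static String escape(String text) {
        if (text == null) return "";
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    boolean legal = cp == 0x9 || cp == 0xA || cp == 0xD
                            || (cp >= 0x20 && cp <= 0xD7FF)
                            || (cp >= 0xE000 && cp <= 0xFFFD)
                            || (cp >= 0x10000 && cp <= 0x10FFFF);
                    if (legal) out.appendCodePoint(cp); else out.append(' ');
            }
        }
        return out.toString();
    }

    /** Builds the prompt that announces one email's sender and subject. */
    static String headerPrompt(String sender, String subject) {
        String from = (sender == null || sender.trim().isEmpty()) ? "unknown sender" : sender;
        String subj = (subject == null || subject.trim().isEmpty()) ? "no subject" : subject;
        return "<prompt>Message from " + escape(from) + ". Subject: " + escape(subj) + ".</prompt>";
    }

    public static void main(String[] args) {
        System.out.println(headerPrompt("Bob <bob@example.com>", "Q&A session"));
    }
}
```

Missing senders and subjects get spoken placeholders here so that the caller never hears an empty prompt. That is a design choice of this sketch, not something the article prescribes.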
fetchemails.jsp iterates through all the emails and generates VoiceXML logic that lets the caller move between emails by saying ‘next’, ‘previous’, ‘last’ or ‘first’, and hear the body of any email by saying ‘more’.
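One way to generate this kind of navigation is to emit a VoiceXML menu per email, with one choice for each spoken command, plus a form that reads the body. The sketch below is a plain-Java approximation and not the article's fetchemails.jsp. The class EmailMenuBuilder, the dialog ids (msg0, body0, ...), the prompt wording, and the use of VoiceXML 2.0 menu and choice elements are all assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical generator for a browsable VoiceXML document, one menu per email. */
final class EmailMenuBuilder {
    static final class Email {
        final String sender, subject, body;
        Email(String sender, String subject, String body) {
            this.sender = sender; this.subject = subject; this.body = body;
        }
    }

    static String build(List<Email> emails) {
        StringBuilder vxml = new StringBuilder();
        vxml.append("<?xml version=\"1.0\"?>\n")
            .append("<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n");
        int n = emails.size();
        if (n == 0) {
            vxml.append("  <form><block><prompt>You have no messages.</prompt></block></form>\n");
        }
        for (int i = 0; i < n; i++) {
            Email e = emails.get(i);
            int next = Math.min(i + 1, n - 1);   // 'next' on the last email stays put
            int prev = Math.max(i - 1, 0);       // 'previous' on the first email stays put
            vxml.append("  <menu id=\"msg").append(i).append("\">\n")
                .append("    <prompt>Message ").append(i + 1).append(" of ").append(n)
                .append(". From ").append(esc(e.sender))
                .append(". Subject: ").append(esc(e.subject)).append(".</prompt>\n")
                .append(choice("msg" + next, "next"))
                .append(choice("msg" + prev, "previous"))
                .append(choice("msg0", "first"))
                .append(choice("msg" + (n - 1), "last"))
                .append(choice("body" + i, "more"))
                .append("  </menu>\n")
                // After reading the body, return to the same email's menu.
                .append("  <form id=\"body").append(i).append("\">\n")
                .append("    <block><prompt>").append(esc(e.body)).append("</prompt>")
                .append("<goto next=\"#msg").append(i).append("\"/></block>\n")
                .append("  </form>\n");
        }
        vxml.append("</vxml>\n");
        return vxml.toString();
    }

    private static String choice(String target, String word) {
        return "    <choice next=\"#" + target + "\">" + word + "</choice>\n";
    }

    /** Minimal XML escaping for text content. */
    private static String esc(String s) {
        return s == null ? "" : s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(build(Arrays.asList(
                new Email("Ann", "Lunch & learn", "See you at noon."),
                new Email("Bob", "Status", "All done."))));
    }
}
```

Because the first dialog in a VoiceXML document is entered first, the caller starts at the first email. Clamping ‘next’ and ‘previous’ at the ends is a choice made for this sketch; wrapping around would work equally well.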
The Email Reader application can be tested using some of the tools available on the BeVocal Café.
- Application logic testing: Application logic can be tested in the BeVocal Café without a telephone by using the Vocal Scripter, a web-based applet that runs in any standard web browser. A developer can also access the application by calling 1-877-33-VOCAL.
- Recognition testing: Recognition testing is necessary to make sure that a voice application is usable. The Log Browser is an interface that lists all calls for any chosen timeframe. For each call it provides different pieces of info such as caller id, dialed number, duration and the number of times there was misrecognition during the call. A developer can use this info to look at the complete trace log for the call and identify any particular dialogs or grammars that might be the cause for poor recognition. Improving recognition is often accomplished by tuning the associated grammar, and is a larger topic that should be addressed separately. In our example, given the small grammar sizes, we are unlikely to see poor recognition.
- Usability testing: Usability testing, a critical part of testing VoiceXML applications, is an area many new VoiceXML developers are unfamiliar with. Its purpose is to detect potential problems that were not anticipated during the design review, and it is a process that continues well after the application is developed. The metrics that provide a measure of usability are the time it takes a user to complete a task, the ease of completing a goal, and the frequency of out-of-grammar utterances.
- Performance testing: Voice applications need to set service level requirements and be tested to ensure that they meet these minimum requirements. The following metrics provide key indications of an application’s performance: fetch time for VoiceXML resources, including documents and prompts; time to parse and execute a VoiceXML document; recognition accuracy and response time; and text-to-speech quality and response time.
The BeVocal Café provides a demo feature through which developers can build an application and then promote it to their company, customers and partners by providing just a demo id. The demo feature can be enabled through the deployment tab of the developer’s account. Once a developer has deployed an application using a demo id, the application can be accessed by calling 1-800-4-BVOCAL, saying “BeVocal Café” at the first prompt, and then saying or entering the demo id at the next prompt.
This article illustrates how to use VoiceXML and JSP to create a useful email-reader-by-phone application that reads emails from any POP server. The key point is that the application provides a voice user interface while generating its content with existing web infrastructure.
This article was written by Mukund Bhagavan of BeVocal, a company that provides a web-based development environment, the BeVocal Café, with all the tools and resources developers need to create their own innovative speech applications for the telephone.