This is Chapter 12: Processing Speech with Java from the book Java 2 Unleashed, Sixth Edition (ISBN:0-672-32394-X) written by Stephen Potts, Alex Pestrikov, and Mike Kopack, published by Sams Publishing.
In This Chapter
Understanding Java Speech
Creating and Allocating the Speech Engine
In the 1990s, we engineers and programmers got a new dose of our favorite show in Star Trek: The Next Generation. When Captain Picard sat in his chair on the bridge and spoke to the starship’s computers, we saw a vision of what the world would be like if voice-driven systems were to become a reality.
Ever since 2001: A Space Odyssey premiered with its talking computer, HAL, the public has been waiting for voice-driven systems to become a reality. Who can forget the computer in WarGames that asked, “Shall we play a game?” Now, after nearly 40 years of experimentation and the expenditure of uncountable sums of money, we are still waiting for that future. The good news is that voice-driven systems are becoming more common. Most of us have encountered a voice-driven system that asks us to “press or say one to speak to the appointment desk.” The material covered in this chapter will teach you how to write systems that respond to the spoken word as these systems do.
In this chapter, we will learn about getting computers to accept sounds as inputs and provide them to us as outputs. To do this, we will first learn how to get a computer to speak to us. We will also learn how a computer can be made to understand our words and react to them.
Understanding Java Speech
Speech is such a common subject that whenever we bring it up as a topic of conversation, our friends look at us as if we are a little strange. We all speak, and none of us can remember a time when we didn’t.
There is a lot to know about phonetics and language, and we need to understand more than a little of it if we are going to become good speech programmers. Although it is true that the software engineers at the tool vendors do much of the hard work associated with programming speech, we will not be able to take advantage of the tools they provide unless we understand the subject.
Computerized speech can be divided into two categories: speech recognition and speech synthesis. Speech recognition is the art of turning analog sound waves captured by a microphone into words. These words are either commands to be acted on or data to be stored, displayed, manipulated, and so on.
Speech synthesis is the art of taking the written word and transforming it into analog waveforms that can be heard over a speaker. When looked at in this light, the problem of teaching a computer how to listen and talk seems a little daunting.
Take yourself back mentally to the mid-1970s. Imagine for a moment that you have been given the task of taking one of the mainframe computers in the data center and teaching it to read out loud. You sit down at the computer terminal, and what do you type? What language do you write this system in? What speakers will you use? Where will you plug them in? What kind of pronunciation will you use? How will you generate the waveforms for each syllable? How will you time the output? How will punctuation be handled?
When we think about these issues, we are glad that we are living now. In the 1970s, this entire subject was in the hands of researchers. Even now, although some commercial applications of speech synthesis have been written, it is still a fertile research subject for Ph.D. candidates and other graduate students.
Our job as application programmers is much easier than it would have been in the 1970s because of two developments. The first is the creation and marketing of commercial speech products. The second is the creation of the Java Speech API by Sun, in conjunction with a number of other companies interested in this subject.
The Java Speech API is a set of abstract classes and interfaces that represent a Java programmer’s view of a speech engine, but it makes no assumptions about the underlying implementation of the engine. This engine can be either a hardware or a software solution (or a hybrid of the two). It can be implemented locally or on a server. It can be written in Java or in any other language that can be called from Java. In fact, different compliant engines can even have different capabilities. One of them might have the capability to learn your speech patterns, whereas another engine might choose not to implement this. It is the engine’s responsibility to handle any situations that it does not support in a graceful manner.
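To make this idea concrete, here is a minimal sketch of how one programmer-facing interface can sit in front of engines with different capabilities. Every name in it (SpeechEngineSketch, SpeechEngine, TrainableEngine, BasicEngine, train, describe) is hypothetical and invented for illustration; none of them is part of the real Java Speech API. The sketch shows only the design principle: the application codes against the interface, and an engine that lacks a feature declines the request gracefully instead of failing.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: these types are hypothetical and are NOT the javax.speech API.
public class SpeechEngineSketch {

    /** The programmer's view of an engine; it says nothing about the implementation. */
    interface SpeechEngine {
        String name();

        /** True if this engine can learn a speaker's patterns. */
        boolean supportsTraining();

        /** Offers a training sample and returns a status message. */
        String train(String sample);
    }

    /** An engine that implements the optional training capability. */
    static final class TrainableEngine implements SpeechEngine {
        private final List<String> samples = new ArrayList<>();

        public String name() { return "TrainableEngine"; }

        public boolean supportsTraining() { return true; }

        public String train(String sample) {
            samples.add(sample);
            return "stored sample " + samples.size();
        }
    }

    /** An engine that chooses not to implement training. */
    static final class BasicEngine implements SpeechEngine {
        public String name() { return "BasicEngine"; }

        public boolean supportsTraining() { return false; }

        public String train(String sample) {
            // Handle the unsupported request gracefully rather than throwing.
            return "training not supported; request ignored";
        }
    }

    /** Application code sees only the interface, never the concrete engine. */
    static String describe(SpeechEngine engine) {
        return engine.name() + ": " + engine.train("hello computer");
    }

    public static void main(String[] args) {
        SpeechEngine[] engines = { new TrainableEngine(), new BasicEngine() };
        for (SpeechEngine engine : engines) {
            System.out.println(describe(engine));
        }
    }
}
```

Notice that describe never asks which engine it has been handed. That is the property the Java Speech API aims for: the same application code can run against any compliant engine, whatever that engine happens to support.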
The Java Speech API is slightly different from other Java Extensions in that Sun Microsystems doesn’t provide a reference implementation for it. Instead, Sun provides a list of third-party products that implement the Java Speech API. The official Java Speech Web site (http://java.sun.com/products/java-media/speech) lists the following Java-compatible speech products:
FreeTTS—This is an open source speech synthesizer written entirely in Java.
IBM’s Speech for Java—This implementation is based on the IBM ViaVoice product. You must purchase a copy of ViaVoice for this product to work.
The Cloud Garden—This implementation will work with any speech engine that is based on Microsoft Speech API (SAPI) version 5.
Lernout and Hauspie’s TTS for Java Speech API—This package runs on Sun and provides a number of advanced features.
Conversa Web 3.0—This product is a speech-enabled Web browser.
Festival—This product comes from Scotland and is Unix based. It supports a number of programming interfaces in addition to Java.