
Processing Speech with Java

  • September 26, 2002
  • By Sams Publishing

That, in combination with the "number" parameter in the <sayas> tag, provides an unambiguous specification for this pronunciation.

You can run this example with the program unleashed.ch12.JSMLSpeaker3 (JSMLSpeaker3.java). This class is identical to the previous one except that it instantiates the SpeakableShares class. It is also included in the code download for this chapter.

The result of running this example is both audible and visual. First you will hear 1999 pronounced "nineteen ninety-nine." Next, you will hear it pronounced "one thousand nine hundred ninety-nine." The console shows the usual message:

You are hearing the JSML output now.

Note - Because the Java Speech API always runs on top of a third-party speech product, the behavior you observe might differ. For example, some products might interpret a period as the end of a sentence instead of pronouncing it as "dot," as we saw earlier in this chapter.


JSML is a very good example of how to use XML to improve your products. By embedding these controls in the JSML file, it is possible to place the specification of the pronunciation details outside the program and into the data.

Speech Recognition

The other side of Java Speech is speech recognition. As you might have predicted, the state of the art in recognition lags well behind that of speech synthesis, for a simple reason: recognition is a harder problem.

If you are an English speaker, your ear is probably tuned to the nuances of the language as native speakers pronounce it in your own region of the United States or Canada. If you relocate to another part of your country, or to another English-speaking country such as Australia, your ability to understand the language is diminished for a while. Over time, your brain learns the subtleties of the new dialect, and you once again become a fluent listener.

For a computer, the problem is similar. A recognizer receives audio electronically through a microphone. It must then work out which syllables are represented by the phonemes (basic speech sounds) it has just received, and combine those syllables into words.

Recognition Grammars

A grammar simplifies the job of the speech recognizer by limiting the number of possible words and phrases that it has to consider when trying to determine what a speaker has said. There are two kinds of grammars: rule grammars and dictation grammars.

Rule grammars are composed of tokens and rules. When a user speaks, the input is compared to the rules and tokens in the grammar to determine the identity of the word or phrase. An application provides a rule grammar to a recognizer, normally during initialization.

Dictation grammars are built into the recognizer itself. They define thousands of words that can be spoken in a free-form fashion. Dictation grammars come closer to our ultimate goal of unrestricted speech, but at present they are slower than rule grammars and more prone to errors.


Note - There are four basic error types that recognizers suffer from regardless of the grammar employed:

  • Failure to recognize a valid word

  • Misinterpreting a word to be another valid word

  • Detecting a word where none was present

  • Failure to recognize that a word was spoken
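The four error types can be told apart by comparing what was actually said with what the recognizer reported. The sketch below is a toy classifier, not part of the Java Speech API; the class RecognitionErrors, the Outcome names, and the speechDetected flag are hypothetical, and mapping "failure to recognize a valid word" to a rejected utterance is one plausible reading of the list above.

```java
/** Toy classifier for the four recognition error types (hypothetical, not JSAPI). */
final class RecognitionErrors {

    enum Outcome {
        CORRECT,      // the spoken word was recognized correctly (or silence was ignored)
        REJECTED,     // speech was heard, but no valid word was recognized
        SUBSTITUTED,  // a valid word was heard as a different valid word
        INSERTED,     // a word was reported although none was spoken
        MISSED        // the recognizer never noticed that a word was spoken
    }

    /**
     * @param spoken         the word actually spoken, or null if nothing was said
     * @param speechDetected whether the recognizer detected any speech at all
     * @param recognized     the word the recognizer reported, or null if none
     */
    static Outcome classify(String spoken, boolean speechDetected, String recognized) {
        if (spoken == null) {
            // Nothing was said, so any reported word is an insertion.
            return recognized == null ? Outcome.CORRECT : Outcome.INSERTED;
        }
        if (!speechDetected) {
            return Outcome.MISSED;
        }
        if (recognized == null) {
            return Outcome.REJECTED;
        }
        return spoken.equals(recognized) ? Outcome.CORRECT : Outcome.SUBSTITUTED;
    }

    public static void main(String[] args) {
        System.out.println(classify("computer", true, null));        // REJECTED
        System.out.println(classify("computer", true, "computers")); // SUBSTITUTED
        System.out.println(classify(null, true, "bye"));             // INSERTED
        System.out.println(classify("bye", false, null));            // MISSED
    }
}
```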


Java Speech supports dynamic grammars, which means that grammars can be modified at runtime. After a grammar is changed, the changes must be committed using the recognizer's commitChanges() method. All pending changes are committed atomically, that is, all at once. Listing 12.9 shows a simple grammar.

Listing 12.9 A Simple Grammar

grammar javax.speech.demo;

public <sentence> = Hello world |
          Hello Java Unleashed |
          Java Speech API |
          computer |
          bye |
          I program computers;

This rule grammar defines a single public rule, <sentence>, with six alternative phrases built from eleven distinct words (tokens). A recognizer working against this grammar will understand no other words, phrases, or parts of phrases. Restricting the input this way simplifies the processing and increases the likelihood of an accurate result.
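To make the "no other words, phrases, or parts of phrases" restriction concrete, here is a toy model of the Listing 12.9 grammar written in plain Java. It is only an illustration: SimpleRuleGrammar is a hypothetical class, not the JSAPI RuleGrammar, and it compares text token by token, whereas a real recognizer matches audio against the grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a rule grammar: only complete alternatives are accepted (hypothetical). */
final class SimpleRuleGrammar {
    private final List<List<String>> alternatives = new ArrayList<>();

    SimpleRuleGrammar(String... phrases) {
        for (String phrase : phrases) {
            alternatives.add(tokenize(phrase));
        }
    }

    // Split on runs of whitespace so extra spaces do not matter.
    private static List<String> tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? Collections.<String>emptyList()
                                 : Arrays.asList(trimmed.split("\\s+"));
    }

    /** True only if the utterance equals one whole alternative, token for token. */
    boolean accepts(String utterance) {
        return alternatives.contains(tokenize(utterance));
    }

    /** The distinct tokens (words) that appear anywhere in the grammar. */
    Set<String> vocabulary() {
        Set<String> words = new LinkedHashSet<>();
        for (List<String> alternative : alternatives) {
            words.addAll(alternative);
        }
        return words;
    }

    public static void main(String[] args) {
        SimpleRuleGrammar g = new SimpleRuleGrammar(
                "Hello world", "Hello Java Unleashed", "Java Speech API",
                "computer", "bye", "I program computers");
        System.out.println(g.accepts("Hello world")); // true
        System.out.println(g.accepts("Hello"));       // false: only part of a phrase
        System.out.println(g.accepts("Hello there")); // false: "there" is not a token
        System.out.println(g.vocabulary().size());    // 11
    }
}
```

The vocabulary count shows the difference between phrases and tokens: six alternatives, but eleven distinct words.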

This rule grammar is written in the Java Speech Grammar Format (JSGF). A JSGF grammar can be converted into a RuleGrammar object and back again; the regenerated text might look different, but it will be logically equivalent.
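The idea of "different text, equivalent grammar" can be shown with a deliberately tiny converter. TinyJsgf below is hypothetical and is not a JSGF parser: it handles only a single rule whose body is plain words separated by |, and it ignores comments, tags, weights, optional items, and rule references. It reads the alternatives out of Listing 12.9, writes them back on one line, and checks that reparsing yields the same alternatives.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy round trip for one plain-word JSGF rule (hypothetical, not a full parser). */
final class TinyJsgf {

    /** Extracts the alternatives from the first "name = a | b | c;" rule body. */
    static List<String> parseAlternatives(String jsgf) {
        int eq = jsgf.indexOf('=');
        int semi = (eq < 0) ? -1 : jsgf.indexOf(';', eq);
        if (eq < 0 || semi < 0) {
            throw new IllegalArgumentException("no rule body found");
        }
        List<String> result = new ArrayList<>();
        for (String alt : jsgf.substring(eq + 1, semi).split("\\|")) {
            // Collapse line breaks and repeated spaces inside each alternative.
            String normalized = alt.trim().replaceAll("\\s+", " ");
            if (!normalized.isEmpty()) {
                result.add(normalized);
            }
        }
        return result;
    }

    /** Writes the alternatives back out as a single-line public rule. */
    static String toJsgf(String ruleName, List<String> alternatives) {
        return "public <" + ruleName + "> = " + String.join(" | ", alternatives) + ";";
    }

    public static void main(String[] args) {
        String listing = "grammar javax.speech.demo;\n\n"
                + "public <sentence> = Hello world |\n"
                + "          Hello Java Unleashed |\n"
                + "          Java Speech API |\n"
                + "          computer |\n"
                + "          bye |\n"
                + "          I program computers;\n";
        List<String> alts = parseAlternatives(listing);
        String rewritten = toJsgf("sentence", alts);
        System.out.println(rewritten);
        // Different layout, same alternatives:
        System.out.println(parseAlternatives(rewritten).equals(alts)); // true
    }
}
```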

Armed with a grammar, we need a recognizer program to process speech against it. Listing 12.10 shows a program that will serve as a recognizer for this grammar.

Listing 12.10 The HelloRecognizer Class

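/*
 * HelloRecognizer.java
 *
 * Created on March 11, 2002, 9:53 PM
 */

package unleashed.ch12;

import javax.speech.*;
import javax.speech.recognition.*;
import java.io.FileReader;
import java.util.Locale;

/**
 * Recognizes the phrases defined in SimpleGrammar.txt (Listing 12.9)
 * and echoes them to the console. Saying "bye" ends the program.
 *
 * @author Stephen Potts
 */
public class HelloRecognizer extends ResultAdapter
{
  static Recognizer recognizer;
  String gst;

  // Called by the recognizer each time an utterance matches the grammar
  public void resultAccepted(ResultEvent re)
  {
    try
    {
      Result res = (Result)(re.getSource());
      ResultToken tokens[] = res.getBestTokens();

      // Echo each recognized token; gst ends up holding the last one
      for (int i = 0; i < tokens.length; i++)
      {
        gst = tokens[i].getSpokenText();
        System.out.print(gst + " ");
      }
      System.out.println();

      // "bye" is the spoken command that shuts the program down
      if ("bye".equals(gst))
      {
        System.out.println("See you later!");
        recognizer.deallocate();
        System.exit(0);
      }
    }
    catch (Exception ee)
    {
      System.out.println("Exception " + ee);
    }
  }

  public static void main(String args[])
  {
    try
    {
      // Ask Central for an English recognizer and allocate its resources
      recognizer = Central.createRecognizer(
         new EngineModeDesc(Locale.ENGLISH));
      if (recognizer == null)
      {
        System.out.println("No English recognizer is available");
        return;
      }
      recognizer.allocate();

      // Load the JSGF grammar; adjust this path for your computer
      FileReader grammar1 =
       new FileReader("c:/unleashed/ch12/SimpleGrammar.txt");

      RuleGrammar rg = recognizer.loadJSGF(grammar1);
      rg.setEnabled(true);

      // Accepted results are delivered to resultAccepted() above
      recognizer.addResultListener(new HelloRecognizer());

      // Commit the grammar, take the focus, and start listening
      recognizer.commitChanges();
      recognizer.requestFocus();
      recognizer.resume();
      System.out.println("Ready for Input");
    }
    catch (Exception e)
    {
      System.out.println("Exception " + e);
    }
  }
}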

Note - The path passed to the FileReader constructor must match the actual location of the SimpleGrammar.txt file on your computer.


The creation of a recognizer is similar to the creation of a synthesizer. We use the Central class to create both:

      recognizer = Central.createRecognizer(
         new EngineModeDesc(Locale.ENGLISH));

Once again, we have chosen English as the language for this example. Once we have a recognizer, we can load the grammar:

      FileReader grammar1 =
       new FileReader("c:/unleashed/ch12/SimpleGrammar.txt");

      RuleGrammar rg = recognizer.loadJSGF(grammar1);
      rg.setEnabled(true);

We load the grammar from Listing 12.9, which is stored in the SimpleGrammar.txt file. The loadJSGF() method parses the file and returns a RuleGrammar object, which we then enable.

The result listener does the runtime work of the program. We register it here:

      recognizer.addResultListener(new HelloRecognizer());

Next, we complete the initialization of the recognizer by committing the grammar, getting the focus, and putting the recognizer in the RESUMED state:

      recognizer.commitChanges();
      recognizer.requestFocus();
      recognizer.resume();
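Calling commitChanges() is what makes the loaded grammar take effect, and, as noted earlier, pending grammar changes are applied atomically. The plain-Java sketch below models that behavior under stated assumptions: StagedGrammar is a hypothetical class, not the JSAPI Recognizer. Edits accumulate in a private working copy, and commitChanges() publishes them together by swapping in a new immutable snapshot, so the recognition side sees either the old grammar or the new one, never a mix.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of atomically committed grammar edits (hypothetical, not JSAPI). */
final class StagedGrammar {
    // Committed phrases used for recognition; replaced as a whole, never edited in place.
    private volatile Set<String> active = Collections.emptySet();
    // Working copy that collects uncommitted edits.
    private final Set<String> pending = new LinkedHashSet<>();

    synchronized void addPhrase(String phrase)    { pending.add(phrase); }
    synchronized void removePhrase(String phrase) { pending.remove(phrase); }

    /** Publishes all pending edits in one step. */
    synchronized void commitChanges() {
        active = Collections.unmodifiableSet(new LinkedHashSet<>(pending));
    }

    /** Recognition-side check against the last committed snapshot. */
    boolean accepts(String utterance) {
        return active.contains(utterance);
    }

    public static void main(String[] args) {
        StagedGrammar g = new StagedGrammar();
        g.addPhrase("Hello world");
        g.addPhrase("bye");
        System.out.println(g.accepts("bye"));  // false: not committed yet
        g.commitChanges();
        System.out.println(g.accepts("bye"));  // true
        g.removePhrase("bye");
        g.addPhrase("computer");
        System.out.println(g.accepts("bye"));  // true: the removal is still pending
        g.commitChanges();
        System.out.println(g.accepts("bye") + " " + g.accepts("computer")); // false true
    }
}
```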

When a spoken pattern is recognized as part of the grammar, a result event occurs and the resultAccepted() method is called. The event object contains the information that we need to find out which phrase in the grammar was spoken:

    Result res = (Result)(re.getSource());
    ResultToken tokens[] = res.getBestTokens();

According to the documentation for the Result interface, getBestTokens() returns the recognizer's best guess at the tokens that were spoken, which reflects the inexact nature of speech recognition. We then extract the text of each token:

      gst = tokens[i].getSpokenText();

Each recognized phrase, including bye, is echoed to the console, as shown here:

Hello world
Hello Java Unleashed
computer
I program computers
bye
See you later!

When bye is received, the program echoes it, prints See you later! to confirm that it is shutting down, deallocates the recognizer, and exits.
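Tying a spoken phrase to a command can be generalized with a lookup table. In Listing 12.10, gst holds only the last token of the phrase, which works because bye is the only phrase in this grammar that ends with that word. The sketch below instead joins all the token texts into the whole phrase before looking it up. It is a plain-Java illustration: PhraseCommands and the command names "exit" and "echo" are hypothetical, and the List<String> stands in for the getSpokenText() values of the ResultToken array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy phrase-to-command table (hypothetical, not part of JSAPI). */
final class PhraseCommands {
    private final Map<String, String> commands = new HashMap<>();

    void bind(String phrase, String command) {
        commands.put(phrase, command);
    }

    /** Joins the token texts into the whole phrase; unbound phrases map to "echo". */
    String dispatch(List<String> spokenTokens) {
        String phrase = String.join(" ", spokenTokens);
        return commands.getOrDefault(phrase, "echo");
    }

    public static void main(String[] args) {
        PhraseCommands pc = new PhraseCommands();
        pc.bind("bye", "exit");
        System.out.println(pc.dispatch(Arrays.asList("Hello", "world"))); // echo
        System.out.println(pc.dispatch(Arrays.asList("bye")));            // exit
    }
}
```

Matching the whole phrase means a later grammar could include a phrase such as "say bye" without accidentally triggering the exit command.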

Summary

This chapter covers the two primary functions of the Java Speech API—speech synthesis and speech recognition. In addition, you learned about the speech engine that provides services to both of these capabilities.

You learned how to synthesize speech using an implementation of the Synthesizer interface provided by the IBM ViaVoice product. We wrote several programs that produced speech from written input.

You also learned how to use the Java Speech Markup Language (JSML) to give instructions to the speech engine about how to pronounce the words, dates, and numbers that appear in the text. You saw examples of how you can use XML tags to communicate this information to the synthesizer.

Finally, we took a look at the art of recognizing speech with software. We created a simple grammar using the Java Speech Grammar Format (JSGF) and loaded it into a recognizer program. We then spoke into the microphone and watched as our spoken words appeared in the console. In addition, you saw how a command can be tied to a spoken word by the way the word bye was used to close this program.

The subject of speech in Java is larger than a single chapter can cover. This chapter provides enough information so that you will be able to get both a synthesizer and a recognizer working on your computer. Hopefully, you will be able to copy and paste these programs and enhance them to meet the requirements of your projects.

Authors of this Chapter

Stephen Potts is an independent consultant, author, and Java instructor in Atlanta, Georgia (United States). Steve received his computer science degree in 1982 from Georgia Tech. He has worked in a number of disciplines during his 20-year career, with manufacturing being his area of greatest expertise. His previous books include Special Edition Using Visual C++ 4 and Java 1.2 How-To. He can be reached via e-mail at stevepotts@mindspring.com.

Source of this material

This is Chapter 12: Processing Speech with Java from the book Java 2 Unleashed, Sixth Edition (ISBN:0-672-32363-X) written by Stephen Potts, Alex Pestrikov, and Mike Kopack, published by Sams Publishing.
