Exploring the Android Speech API for Voice Recognition

This tutorial gives a brief introduction to the Android Speech API used for voice recognition. Speech recognition, also called Speech-to-Text (STT), is an area of computational linguistics that develops methods and technologies for automatically recognizing spoken language and converting it into text. An earlier tutorial, “Adding Basic Android Text-To-Speech to Your Apps,” covered the opposite direction, Text-to-Speech (TTS), and you are welcome to check it out as well. STT has numerous practical applications: home automation, security authentication, data entry, subtitling and translation, robotics, gaming, and so forth.

The Android Speech API provides recognition control, background services, intents, and support for multiple languages. It can look like a simple addition to the user input options for your apps, but it is a powerful feature that can make them stand out. Imagine how helpful it can be for people whose disabilities make a keyboard hard to use, or simply for anyone trying to increase productivity and improve their workflow.

API Package and Device Support

Virtual Keyboard

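import android.app.Activity;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.widget.TextView;

public class ShowSupportedLanguages extends Activity {
   private TextView mTextView;

   @Override
   protected void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.lang);
      if (!SpeechRecognizer.isRecognitionAvailable(this)) {
         updateResults("\nNo voice recognition support on your device!");
      } else {
         // Ask the recognizer for its details; the ordered broadcast
         // delivers the final result to LanguageDetailsReceiver.
         LanguageDetailsReceiver ldr = new
            LanguageDetailsReceiver(this);
         sendOrderedBroadcast(RecognizerIntent
            .getVoiceDetailsIntent(this), null, ldr, null,
             Activity.RESULT_OK, null, null);
      }
   }

   // Shows the given message in the language list text view.
   void updateResults(String s) {
      mTextView = (TextView)findViewById(R.id.tvlanglist);
      mTextView.setText(s);
   }
}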

Listing 1: Display Supported Speech Languages

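import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import java.util.ArrayList;
import java.util.List;

public class LanguageDetailsReceiver extends BroadcastReceiver {
   List<String> mLanguages;
   ShowSupportedLanguages mSSL;

   public LanguageDetailsReceiver(ShowSupportedLanguages ssl) {
      mSSL = ssl;
      mLanguages = new ArrayList<String>();
   }

   @Override
   public void onReceive(Context context, Intent intent)
   {
      // The recognizer places its details in the result extras
      // of the ordered broadcast.
      Bundle extras = getResultExtras(true);
      mLanguages = extras.getStringArrayList
         (RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES);
      if (mLanguages == null) {
         mSSL.updateResults("No voice data found.");
      } else {
         // Build a comma-separated list of the supported languages.
         String s = "\nList of language voice data:\n";
         for (int i = 0; i < mLanguages.size(); i++) {
            s += (mLanguages.get(i) + ", ");
         }
         s += "\n";
         mSSL.updateResults(s);
      }
   }
}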

Listing 2: Language Details Broadcast Receiver

Figure 3: Supported Speech Data
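
The list in Listing 2 is shown as raw language tags. As a minimal sketch, assuming the tags are BCP 47 strings such as "en-US", the plain-Java helper below turns them into readable names. The class LanguageTagFormatter and its describe method are hypothetical names introduced here for illustration; they are not part of the Android SDK, and the code has no Android dependencies.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the Android SDK): formats language
// tags, assumed to be BCP 47 strings, as "tag = English display name".
final class LanguageTagFormatter {
   private LanguageTagFormatter() { }

   // Returns one line per tag, or the same fallback message that
   // Listing 2 uses when no language data is available.
   static String describe(List<String> tags) {
      if (tags == null || tags.isEmpty()) {
         return "No voice data found.";
      }
      StringBuilder sb = new StringBuilder();
      for (String tag : tags) {
         if (tag == null) {
            continue;   // skip missing entries instead of failing
         }
         Locale locale = Locale.forLanguageTag(tag);
         sb.append(tag)
           .append(" = ")
           .append(locale.getDisplayName(Locale.ENGLISH))
           .append('\n');
      }
      return sb.toString();
   }
}
```

In the receiver, the result of LanguageTagFormatter.describe(mLanguages) could be passed to updateResults in place of the comma-separated string. Display names come from the Java runtime, so less common tags may render differently from one platform to another.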

Basic Voice Recognition Example

We are now ready to start a basic example that uses the voice recognition feature. RecognizerIntent.ACTION_RECOGNIZE_SPEECH is the intent action that defines the request. The only required extra is RecognizerIntent.EXTRA_LANGUAGE_MODEL, which we set to RecognizerIntent.LANGUAGE_MODEL_FREE_FORM. If a language other than the default is needed, supply its language tag in RecognizerIntent.EXTRA_LANGUAGE; otherwise, the recognizer simply uses the default locale. To make the example more interesting, we also use RecognizerIntent.EXTRA_PROMPT to display a question. Then, we can start the recognition intent.
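
As a rough sketch of the language choice described above: RecognizerIntent.EXTRA_LANGUAGE takes an IETF BCP 47 language tag such as "en-US". The helper below is plain Java with no Android dependencies. The class LanguageChooser and its pickTag method are hypothetical names introduced for illustration. It assumes you already have a list of supported tags (for example, the one collected in Listing 2) and returns null when the preferred locale is not in that list, in which case the extra can be left out so the recognizer uses the default locale.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the Android SDK): decides which
// language tag, if any, to pass as RecognizerIntent.EXTRA_LANGUAGE.
final class LanguageChooser {
   private LanguageChooser() { }

   // Returns the BCP 47 tag of the preferred locale (for example
   // "fr-CA") if the supported list contains it, ignoring case;
   // otherwise returns null.
   static String pickTag(Locale preferred, List<String> supported) {
      if (preferred == null || supported == null) {
         return null;
      }
      String wanted = preferred.toLanguageTag();
      for (String tag : supported) {
         if (wanted.equalsIgnoreCase(tag)) {
            return wanted;
         }
      }
      return null;
   }
}
```

Inside an activity, a non-null result from LanguageChooser.pickTag(Locale.CANADA_FRENCH, supported) would be passed with intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, tag) before the recognition intent is started.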

Once the recognition results are returned, they are available in the result intent under RecognizerIntent.EXTRA_RESULTS. In this example, we simply check whether the answer contains the substring “Amazon” and display a message on screen that depends on the voice input. The code is implemented in Listing 3.

When the app runs, it displays the question along with a microphone icon and waits for you to say something, as in Figure 4. In Figure 5, I intentionally responded with “Google”, which does not contain the substring “Amazon”, so the result message reports the answer as incorrect.

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.widget.TextView;
import java.util.List;

public class StartVoiceRecognition extends Activity {
   private final int REQUEST_SPEECH_RECOGNIZER = 3000;
   private TextView mTextView;
   private final String mQuestion = "Which company is the largest " +
      "online retailer on the planet?";
   private String mAnswer = "";

   @Override
   protected void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.voicerecog);
      mTextView = (TextView)findViewById(R.id.tvstt);
      startSpeechRecognizer();
   }

   // Launches the recognizer with a free-form language model and
   // shows the question as the prompt.
   private void startSpeechRecognizer() {
      Intent intent = new Intent
         (RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
      intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
         RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
      intent.putExtra(RecognizerIntent.EXTRA_PROMPT, mQuestion);
      startActivityForResult(intent, REQUEST_SPEECH_RECOGNIZER);
   }

   @Override
   protected void onActivityResult(int requestCode, int resultCode,
         Intent data) {
      super.onActivityResult(requestCode, resultCode, data);

      if (requestCode == REQUEST_SPEECH_RECOGNIZER) {
         if (resultCode == RESULT_OK) {
            List<String> results = data.getStringArrayListExtra
               (RecognizerIntent.EXTRA_RESULTS);
            if (results == null || results.isEmpty()) {
               return;   // nothing was recognized
            }
            // Use the first recognition result as the answer.
            mAnswer = results.get(0);

            if (mAnswer.toUpperCase().indexOf("AMAZON") > -1)
               mTextView.setText("\n\nQuestion: " + mQuestion +
                  "\n\nYour answer is '" + mAnswer +
                  "' and it is correct!");
            else
               mTextView.setText("\n\nQuestion: " + mQuestion +
                  "\n\nYour answer is '" + mAnswer +
                  "' and it is incorrect!");
         }
      }
   }
}

Listing 3: Basic Voice Recognition Example

Figure 4: Voice Recognition in Action

Figure 5: Voice Recognition Result
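
Listing 3 checks only the first recognition result. Because RecognizerIntent.EXTRA_RESULTS holds a list of candidate transcriptions, one possible variation is to accept the answer if any candidate contains the keyword. The sketch below is plain Java with no Android dependencies. The class AnswerChecker and its anyContains method are hypothetical names introduced for illustration, and scanning every candidate is a design choice of this sketch, not what Listing 3 does.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the Android SDK): case-insensitive
// keyword check across every recognition candidate.
final class AnswerChecker {
   private AnswerChecker() { }

   // Returns true if any non-null candidate contains the keyword,
   // ignoring case. Locale.ROOT keeps the case conversion independent
   // of the device's default locale.
   static boolean anyContains(List<String> candidates, String keyword) {
      if (candidates == null || keyword == null) {
         return false;
      }
      String needle = keyword.toLowerCase(Locale.ROOT);
      for (String candidate : candidates) {
         if (candidate != null
               && candidate.toLowerCase(Locale.ROOT).contains(needle)) {
            return true;
         }
      }
      return false;
   }
}
```

In onActivityResult, AnswerChecker.anyContains(results, "amazon") could replace the indexOf test. The trade-off is that a lower-ranked candidate can make an answer count as correct even when the top transcription does not match.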

Conclusion

Android makes the speech API easy to use and powerful enough for anyone interested in adding voice recognition to their apps. Through some basic examples, we briefly covered how to set it up, what recognizer intents are, what your device supports, and how to provide multilingual support. Because Speech-to-Text (STT) technology is already used in many practical applications, ranging from improving personal productivity to controlling complicated robots, it is likely to become more and more common in everyday software and hardware alike.

There are other sample projects available in the official Google Android repository that also integrate voice recognition, so it is well worth checking out different applications; they may give you some great ideas for your own users. Advanced developers may also find the Speech Recognition API offered by Google Cloud Platform interesting.

About the Author

Author Chunyen Liu has been a software veteran in Taiwan and the United States. He is a published author of 40+ articles and 100+ tiny apps, a software patentee, technical reviewer, and programming contest winner by ACM/IBM/SUN. He holds advanced degrees in Computer Science with 20+ graduate-level classes. On the non-technical side, he is enthusiastic about the Olympic sport of table tennis, being a USA certified umpire, certified coach, certified referee, and categorized event winner at State Championships and the US Open.
