
Exploring the Android Speech API for Voice Recognition

  • January 18, 2018
  • By Chunyen Liu

This tutorial gives a brief introduction to the Android Speech API for voice recognition. Speech recognition is an area of computational linguistics that develops methodologies and technologies for automatically recognizing spoken language and converting it into text, also known as Speech-to-Text (STT). A previous tutorial, "Adding Basic Android Text-To-Speech to Your Apps," covered the reverse direction, Text-to-Speech (TTS); you are welcome to check it out as well. STT has numerous practical applications: home automation, security authentication, data entry, subtitling and translation, robotics, gaming, and so forth.

The Android Speech API provides recognition control, background services, intents, and support for multiple languages. It may look like a simple addition to your app's user input, but it is a powerful feature that can make an app stand out. Imagine how helpful it can be for people with disabilities that make using a keyboard difficult, or simply for anyone looking to increase productivity and improve their workflow.

API Package and Device Support

Android's official Speech API, with its main programming interfaces and classes, lives in the android.speech package, which has been available since API Level 3.

The classes we are mainly interested in for voice recognition are SpeechRecognizer and RecognizerIntent. The most important intent is RecognizerIntent.ACTION_RECOGNIZE_SPEECH, which needs only one required extra in its bundle, RecognizerIntent.EXTRA_LANGUAGE_MODEL, to start the recognition process. To use a language other than the default one, specify RecognizerIntent.EXTRA_LANGUAGE as well.

Has voice recognition been used anywhere on your device already? On my Nexus 6P running Android 8.0 Oreo, the option is under "Settings -> System -> Languages and input -> Advanced -> Virtual keyboard -> Google voice typing," as shown in Figures 1 and 2; your device should have a similar setting. This is why you can do a Web search by speaking into the microphone, or dictate text whenever an on-screen keyboard is presented. The possible uses of this technology are wide-ranging.

Figure 1: Virtual Keyboard

Figure 2: Google Voice Typing

Speech Data for Multiple Languages

First, we check whether the device supports STT at all by calling SpeechRecognizer.isRecognitionAvailable(). If it does, we use sendOrderedBroadcast() to request the current voice data details, as demonstrated in Listing 1. In the broadcast receiver, we unpack the result bundle and read the list stored under RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES, as in Listing 2. The results are captured in Figure 3. Each entry is a language tag in the IETF Best Current Practice (BCP) 47 format, such as en-US.

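import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.widget.TextView;

public class ShowSupportedLanguages extends Activity {
   private TextView mTextView;

   @Override
   protected void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.lang);
      mTextView = (TextView) findViewById(R.id.tvlanglist);

      if (!SpeechRecognizer.isRecognitionAvailable(this)) {
         updateResults("\nNo voice recognition support on your device!");
         return;
      }

      // getVoiceDetailsIntent() returns null when no details are available
      Intent detailsIntent = RecognizerIntent.getVoiceDetailsIntent(this);
      if (detailsIntent == null) {
         updateResults("\nNo voice data details available.");
         return;
      }

      // The receiver runs last in the ordered broadcast and reads
      // the result extras filled in by the recognizer package
      LanguageDetailsReceiver ldr = new LanguageDetailsReceiver(this);
      sendOrderedBroadcast(detailsIntent, null, ldr, null,
         Activity.RESULT_OK, null, null);
   }

   // Show a message in the language list text view
   void updateResults(String s) {
      mTextView.setText(s);
   }
}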

Listing 1: Display Supported Speech Languages

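import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import java.util.ArrayList;
import java.util.List;

public class LanguageDetailsReceiver extends BroadcastReceiver {
   private List<String> mLanguages;
   private final ShowSupportedLanguages mSSL;

   public LanguageDetailsReceiver(ShowSupportedLanguages ssl) {
      mSSL = ssl;
      mLanguages = new ArrayList<String>();
   }

   @Override
   public void onReceive(Context context, Intent intent) {
      // As the final receiver of the ordered broadcast, read the
      // extras that the recognizer package attached to the result
      Bundle extras = getResultExtras(true);
      mLanguages = extras.getStringArrayList(
         RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES);
      if (mLanguages == null) {
         mSSL.updateResults("No voice data found.");
      } else {
         // Join the BCP 47 tags into one comma-separated string
         StringBuilder sb =
            new StringBuilder("\nList of language voice data:\n");
         for (String language : mLanguages) {
            sb.append(language).append(", ");
         }
         sb.append("\n");
         mSSL.updateResults(sb.toString());
      }
   }
}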

Listing 2: Language Details Broadcast Receiver

Figure 3: Supported Speech Data
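
The BCP 47 tags returned by the device are compact but not very readable. As a minimal sketch, assuming you want to show friendlier names, standard Java can parse each tag with Locale.forLanguageTag(). The class name LanguageTagDemo, its methods, and the sample tags below are hypothetical and are not part of the Android Speech API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Android Speech API.
final class LanguageTagDemo {

   // Format one BCP 47 tag as "tag = display name", with the name in English.
   static String describe(String tag) {
      Locale locale = Locale.forLanguageTag(tag);
      return tag + " = " + locale.getDisplayName(Locale.ENGLISH);
   }

   // Describe every tag in the list, keeping the original order.
   static List<String> describeAll(List<String> tags) {
      List<String> out = new ArrayList<>();
      for (String tag : tags) {
         out.add(describe(tag));
      }
      return out;
   }

   public static void main(String[] args) {
      // Sample tags chosen for illustration; a real list would come
      // from RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES.
      List<String> sample = Arrays.asList("en-US", "fr-FR", "ja-JP");
      for (String line : describeAll(sample)) {
         System.out.println(line);
      }
   }
}
```

In the receiver from Listing 2, a helper like this could be applied to mLanguages before the text is passed to updateResults().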

Basic Voice Recognition Example

We are now ready to start a basic example that uses the voice recognition feature. RecognizerIntent.ACTION_RECOGNIZE_SPEECH is the intent defining the request. The only requirement is to specify RecognizerIntent.EXTRA_LANGUAGE_MODEL, which we set to RecognizerIntent.LANGUAGE_MODEL_FREE_FORM in our case. If another language is needed, you can supply its language tag in RecognizerIntent.EXTRA_LANGUAGE. Otherwise, the recognizer simply uses the default locale. To make the example more interesting, we also use RecognizerIntent.EXTRA_PROMPT to display a question. Then, we can start the recognition intent.
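
Before setting RecognizerIntent.EXTRA_LANGUAGE, it can help to check that the language you want is actually among the tags the device reported. The following is a rough sketch in plain Java, assuming the value to pass is a BCP 47 tag such as "en-US". The class LanguagePicker, the method pickSupportedTag(), and the sample list are hypothetical names made up for this illustration, and the matching rule (exact tag first, then same language) is a simplification rather than anything the Speech API prescribes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Android Speech API.
final class LanguagePicker {

   // Return the supported tag that best matches the preferred locale:
   // an exact tag match first, then any tag with the same language.
   // Return null when nothing matches, so the caller can leave
   // EXTRA_LANGUAGE unset and fall back to the default locale.
   static String pickSupportedTag(Locale preferred, List<String> supportedTags) {
      String wanted = preferred.toLanguageTag();
      for (String tag : supportedTags) {
         if (tag.equalsIgnoreCase(wanted)) {
            return tag;
         }
      }
      for (String tag : supportedTags) {
         String language = Locale.forLanguageTag(tag).getLanguage();
         if (language.equals(preferred.getLanguage())) {
            return tag;
         }
      }
      return null;
   }

   public static void main(String[] args) {
      // Sample list chosen for illustration only.
      List<String> supported = Arrays.asList("en-US", "fr-FR", "de-DE");
      System.out.println(pickSupportedTag(Locale.US, supported));
      System.out.println(pickSupportedTag(Locale.CANADA_FRENCH, supported));
      System.out.println(pickSupportedTag(Locale.JAPAN, supported));
   }
}
```

A non-null result could then be passed with intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, tag) before the recognition intent is started.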

Once the recognition results are returned, they are available in the data bundle under RecognizerIntent.EXTRA_RESULTS. In this example, we check whether the first result contains the substring "Amazon" (ignoring case) and display a message on screen accordingly. The code is in Listing 3.

When the app runs, it shows the question next to a microphone icon and waits for you to say something, as in Figure 4. In Figure 5, I intentionally responded with "Google", which does not contain the substring "Amazon", so the app reported the answer as incorrect.

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.widget.TextView;
import java.util.List;

public class StartVoiceRecognition extends Activity {
   private static final int REQUEST_SPEECH_RECOGNIZER = 3000;
   private TextView mTextView;
   private final String mQuestion =
      "Which company is the largest online retailer on the planet?";
   private String mAnswer = "";

   @Override
   protected void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.voicerecog);
      mTextView = (TextView) findViewById(R.id.tvstt);
      startSpeechRecognizer();
   }

   private void startSpeechRecognizer() {
      // EXTRA_LANGUAGE_MODEL is the only required extra;
      // EXTRA_PROMPT is optional and shows the question to the user
      Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
      intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
         RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
      intent.putExtra(RecognizerIntent.EXTRA_PROMPT, mQuestion);
      startActivityForResult(intent, REQUEST_SPEECH_RECOGNIZER);
   }

   @Override
   protected void onActivityResult(int requestCode, int resultCode,
         Intent data) {
      super.onActivityResult(requestCode, resultCode, data);

      // Ignore other requests, cancelled recognition, and missing data
      if (requestCode != REQUEST_SPEECH_RECOGNIZER
            || resultCode != RESULT_OK || data == null) {
         return;
      }
      List<String> results = data.getStringArrayListExtra(
         RecognizerIntent.EXTRA_RESULTS);
      if (results == null || results.isEmpty()) {
         return;
      }
      mAnswer = results.get(0);

      // Case-insensitive check for the expected company name
      boolean correct = mAnswer.toUpperCase().contains("AMAZON");
      mTextView.setText("\n\nQuestion: " + mQuestion +
         "\n\nYour answer is '" + mAnswer +
         "' and it is " + (correct ? "correct!" : "incorrect!"));
   }
}

Listing 3: Basic Voice Recognition Example

Figure 4: Voice Recognition in Action

Figure 5: Voice Recognition Result
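
Listing 3 inspects only the first entry of RecognizerIntent.EXTRA_RESULTS. The list generally holds several candidate transcriptions, ordered from most to least confident, so a more forgiving check could scan all of them. The sketch below shows that idea in plain Java; the class AnswerChecker and the method anyCandidateContains() are hypothetical names, and the sample candidates are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Android Speech API.
final class AnswerChecker {

   // Return true if any candidate contains the keyword, ignoring case.
   // A null list, null keyword, or null entries never match.
   static boolean anyCandidateContains(List<String> candidates, String keyword) {
      if (candidates == null || keyword == null) {
         return false;
      }
      String needle = keyword.toLowerCase(Locale.ROOT);
      for (String candidate : candidates) {
         if (candidate != null
               && candidate.toLowerCase(Locale.ROOT).contains(needle)) {
            return true;
         }
      }
      return false;
   }

   public static void main(String[] args) {
      // Made-up candidates standing in for a recognizer result list.
      List<String> heard = Arrays.asList("amazing", "Amazon dot com");
      System.out.println(anyCandidateContains(heard, "amazon"));
      System.out.println(anyCandidateContains(Arrays.asList("Google"), "amazon"));
   }
}
```

Whether to accept a match from a lower-ranked candidate is a design choice: it tolerates recognition noise, but it can also accept an answer the user did not actually say.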

Conclusion

Android's Speech API is easy to use yet powerful enough for anyone interested in adding voice recognition to their apps. This tutorial briefly introduced how to set it up, what recognizer intents are, how to find out what your device supports, and how to provide multilingual support, all through basic examples. Because Speech-to-Text (STT) technology is popular in many practical applications, ranging from improving personal productivity to controlling complicated robots, it will likely become more and more common in everyday software and hardware alike.

There are other sample projects in the official Google Android repository that also integrate voice recognition, so it is well worth checking out those applications; they may give you some great ideas for your own apps. Advanced developers may also find the Speech Recognition API offered by Google Cloud Platform interesting.

About the Author

Author Chunyen Liu has been a software veteran in Taiwan and the United States. He is a published author of 40+ articles and 100+ tiny apps, a software patentee, technical reviewer, and programming contest winner by ACM/IBM/SUN. He holds advanced degrees in Computer Science with 20+ graduate-level classes. On the non-technical side, he is enthusiastic about the Olympic sport of table tennis, being a USA certified umpire, certified coach, certified referee, and categorized event winner at State Championships and the US Open.




