Add Text-To-Speech and Speech Recognition to Your Android Applications

Incorporating speech recognition and text-to-speech (TTS) services can greatly benefit certain types of applications and certain groups of users. Speech recognition involves listening for the user’s voice input, processing the recorded sound, and interpreting the results. TTS services take text string data and have the device “read” the content aloud using a synthesized “voice”. Hands-free applications, such as turn-by-turn navigation utilities, routinely use both technologies. Users with special needs, such as the visually impaired, also benefit from these features.

Android speech services are available within the SDK in the android.speech package. The speech recognition classes, such as RecognizerIntent, are found in this package, while the TTS features are found in the android.speech.tts sub-package.

Applications require no special permissions to use Android speech services. Be aware, though, that speech recognition does require a data connection.

Note: Open source code is available for this tutorial.

Implementing Android Speech Recognition

Speech or voice recognition involves recording voice input using the device’s microphone. The resulting sound file is then analyzed and translated into a string. The built-in speech recognition services available in the Android SDK come in two forms: “free form” is used for dictation purposes and “web search” is used for short command-like phrases. You can also develop your own recognition services using the classes available in the android.speech package.

Access to speech recognition is built into the default software keyboard starting in Android 2.1. Therefore, your application may already support basic voice input without any changes whatsoever. However, directly accessing the recognizer can allow for more interesting spoken word control over applications.

The simplest speech recognition case involves launching the built-in speech recorder with the android.speech.RecognizerIntent intent. The recorder prompts the user to record speech input, sends the resulting sound file to an underlying recognition server for processing (which requires an internet connection), and returns the results to the calling activity. Here’s an example:


import java.util.ArrayList;

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.view.View;
import android.widget.TextView;

public class SimpleSpeechActivity extends Activity {

    private static final int VOICE_RECOGNITION_REQUEST = 0x10101;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
    }

    // Wired to a Button via android:onClick="speakToMe" in the layout.
    public void speakToMe(View view) {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT,
                "Please speak slowly and enunciate clearly.");
        startActivityForResult(intent, VOICE_RECOGNITION_REQUEST);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == VOICE_RECOGNITION_REQUEST && resultCode == RESULT_OK) {
            // The recognizer returns a list of candidate strings, best match first.
            ArrayList<String> matches = data
                    .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            if (matches != null && !matches.isEmpty()) {
                TextView textView = (TextView) findViewById(R.id.speech_io_text);
                textView.setText(matches.get(0));
            }
        }
        super.onActivityResult(requestCode, resultCode, data);
    }
}

In this case, the intent is initiated through the click of a Button control, which causes the speakToMe() method to be called. The RecognizerIntent is configured as follows (a couple of optional extras are sketched after this list):

  • The intent action is set to ACTION_RECOGNIZE_SPEECH in order to prompt the user to speak and send that sound file in for speech recognition.
  • An intent extra called EXTRA_LANGUAGE_MODEL is set to LANGUAGE_MODEL_FREE_FORM in order to perform standard speech recognition. There is also another language model especially for Web searches called LANGUAGE_MODEL_WEB_SEARCH.
  • An intent extra called EXTRA_PROMPT is set to a string to display to the user during speech input.
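RecognizerIntent also defines optional extras that the recognizer may honor. For example, EXTRA_MAX_RESULTS limits how many candidate strings come back and EXTRA_LANGUAGE requests a specific recognition language. The values below are illustrative rather than part of the original sample; these lines would go inside speakToMe() before startActivityForResult():

// Optional: cap the number of candidate matches returned by the recognizer.
intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3);
// Optional: request a specific recognition language (IETF language tag).
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");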

Figure: Android Speech Services

Implementing Android Text-To-Speech Features

The Android TTS features were introduced in Android 1.6 as part of the android.speech.tts package. TTS services enable the device to “speak” to the user (provided the sound volume is turned up). Navigational applications frequently rely on this feature to provide the hands-free operation required by law in many regions, and other applications use it to assist users who have reading or sight difficulties. The synthesized speech can be played immediately or saved to an audio file.

For TTS services to function properly, the Android device must have both the TTS engine and the appropriate language resource files. In some cases, the user must install the voice language resource files manually from the operating system settings: Settings, Voice input & output settings, Text-to-speech, Install Voice Data.
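You can also check for voice data from code instead of relying on the settings screen. The sketch below is not part of the original sample; it uses the check/install intents defined by TextToSpeech.Engine, TTS_DATA_CHECK_REQUEST is a hypothetical request code, and the first two lines belong in a method such as onCreate():

// Ask the TTS engine whether voice data is installed (hypothetical request code).
Intent checkIntent = new Intent(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
startActivityForResult(checkIntent, TTS_DATA_CHECK_REQUEST);

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == TTS_DATA_CHECK_REQUEST) {
        if (resultCode == TextToSpeech.Engine.CHECK_VOICE_DATA_PASS) {
            // Voice data is present; it is safe to create a TextToSpeech instance.
        } else {
            // Voice data is missing; send the user to the installer.
            Intent installIntent = new Intent(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
            startActivity(installIntent);
        }
    }
    super.onActivityResult(requestCode, resultCode, data);
}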

The simplest TTS use case uses the TextToSpeech class. A TextToSpeech instance is created and initialized, its language is set, and a string of text is supplied to its speak() method to perform the “speaking.” Here’s the sample code:


import java.util.Locale;

import android.app.Activity;
import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.speech.tts.TextToSpeech.OnInitListener;
import android.view.View;
import android.widget.TextView;
import android.widget.Toast;

public class SampleSpeechActivity extends Activity implements OnInitListener {

    private TextToSpeech mTextToSpeech = null;
    private boolean speechSynthReady = false;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
    }

    @Override
    protected void onPause() {
        super.onPause();
        // Release the TTS engine (and its background service) when not in use.
        mTextToSpeech.shutdown();
        mTextToSpeech = null;
        speechSynthReady = false;
    }

    @Override
    protected void onResume() {
        super.onResume();
        // Recreate the engine; onInit() is called when it is ready.
        mTextToSpeech = new TextToSpeech(getApplicationContext(), this);
    }

    // Wired to a Button via android:onClick="listenToMe" in the layout.
    public void listenToMe(View view) {
        if (!speechSynthReady) {
            Toast.makeText(getApplicationContext(),
                    "Speech synthesis not ready.", Toast.LENGTH_SHORT).show();
            return;
        }
        int result = mTextToSpeech.setLanguage(Locale.US);
        if (result == TextToSpeech.LANG_MISSING_DATA
                || result == TextToSpeech.LANG_NOT_SUPPORTED) {
            Toast.makeText(getApplicationContext(),
                    "Language not available. Check code or config in settings.",
                    Toast.LENGTH_SHORT).show();
        } else {
            // Speak whatever text is currently displayed in the TextView.
            TextView textView = (TextView) findViewById(R.id.speech_io_text);
            String textToSpeak = textView.getText().toString();
            mTextToSpeech.speak(textToSpeak, TextToSpeech.QUEUE_FLUSH, null);
        }
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            speechSynthReady = true;
        }
    }
}

The speak() method takes three parameters: the string of text, the queuing strategy, and the speech parameters. The queuing strategy either appends the new text to the playback queue (QUEUE_ADD) or flushes the queue and speaks immediately (QUEUE_FLUSH). In this case, we use the QUEUE_FLUSH strategy, as this is only a simple example. No special speech parameters are set, so we simply pass in null for the third parameter. Finally, when you are done with the TextToSpeech engine (as this example does in onPause(), or in your activity’s onDestroy() method), make sure to release its resources using the shutdown() method, so that the background support service does not run needlessly.
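If you want to queue several phrases and find out when playback finishes, you can tag an utterance with an ID and register a completion listener. This is a minimal sketch rather than part of the original sample; it assumes java.util.HashMap is imported and that mTextToSpeech has been initialized as shown above.

// Tag the final utterance so the listener can tell us when it completes.
HashMap<String, String> params = new HashMap<String, String>();
params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "lastPhrase");

mTextToSpeech.setOnUtteranceCompletedListener(
        new TextToSpeech.OnUtteranceCompletedListener() {
            public void onUtteranceCompleted(String utteranceId) {
                // Called on a background thread when the tagged utterance finishes.
            }
        });

mTextToSpeech.speak("First sentence.", TextToSpeech.QUEUE_ADD, null);
mTextToSpeech.speak("Second sentence.", TextToSpeech.QUEUE_ADD, params);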

In our example, the TTS language was set to American English, so the voice used by the TTS engine has an American accent and related pronunciations. The Android TTS engine supports several languages, including English (American or British), French, German, Italian, and Spanish. You can enable British English pronunciation instead simply by providing a different language to the setLanguage() method of the TextToSpeech class, like this:

int result = mTextToSpeech.setLanguage(Locale.UK);
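As noted earlier, synthesized speech can also be rendered to an audio file instead of being played through the speaker. The following is a minimal sketch, not part of the original sample; it assumes an initialized mTextToSpeech instance and an API level that provides getExternalFilesDir() (API Level 8 and higher).

// Render the text to a WAV file in the application's external files directory.
String outputPath = new java.io.File(
        getExternalFilesDir(null), "speech_output.wav").getAbsolutePath();
java.util.HashMap<String, String> params = new java.util.HashMap<String, String>();
mTextToSpeech.synthesizeToFile("This text is saved to a file.", params, outputPath);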

Support Limitations of Android Speech Services

The Android speech services have some limitations, which come in several forms:

  • Speech recognition and TTS are optional device features. That is, not all devices support these technologies. Verify that your target devices support these services prior to publication of your application and gracefully handle situations where the services are unavailable, as shown in the sketch after this list.
  • Only certain languages and locales are supported by the speech services. English is widely supported, along with a handful of other languages, but your application should check for specific language support programmatically prior to use.
  • Services like TTS rely on underlying Android background services to provide their functionality. Using them requires good stewardship, including starting and stopping the support services as needed, and applications that use them inherit the associated performance concerns.
  • Certain speech service features require network connectivity to function.
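Here is a minimal sketch of both runtime checks, written as an illustration rather than taken from the original samples. It assumes it runs inside an activity with an initialized mTextToSpeech instance and the relevant android.content.pm imports in place.

// Is any activity on this device able to handle the speech recognition intent?
PackageManager pm = getPackageManager();
List<ResolveInfo> recognizers = pm.queryIntentActivities(
        new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);
boolean recognitionAvailable = !recognizers.isEmpty();

// Does the TTS engine support the locale the application plans to use?
int languageStatus = mTextToSpeech.isLanguageAvailable(Locale.US);
boolean ttsLanguageAvailable =
        languageStatus != TextToSpeech.LANG_MISSING_DATA
        && languageStatus != TextToSpeech.LANG_NOT_SUPPORTED;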

Conclusion

The speech services available on the Android platform are powerful and easy to use. These technologies can make applications more accessible and safer for users, but they come with the limitations described above.

We’d love to hear what innovative ways you’re using, or plan to use, speech services in your Android applications. Drop us a comment!

About the Authors

Shane Conder and Lauren Darcey, Contributing Editors for Mobile Development, have coauthored two books on Android development: an in-depth programming book entitled Android Wireless Application Development (ISBN-13: 978-0-321-62709-4) and Sams Teach Yourself Android Application Development in 24 Hours (ISBN-13: 978-0-321-67335-0). When not writing, they spend their time developing mobile software at their company and providing consulting services.
