Adding Speech Support to Your Windows Phone 8 Application

  • January 16, 2013
  • By Vipul Patel

Introduction

The Windows Phone 8 platform lets users interact with an application by speech in three different ways. Knowing how to build speech support into a Windows Phone 8 application gives your users additional ways to interact with it.

The three ways are voice commands, speech recognition and text-to-speech (TTS).

Voice Commands

By default, a user on a Windows Phone 8 device can use speech to launch any application by saying “open appname” or “start appname”.

Beyond launching, an application can define its own voice commands, which let users reach specific functionality in the application by speech. Users can discover the available voice commands for an application through the “What can I say” functionality in Windows Phone 8.

Speech Recognition

Windows Phone 8 also lets an application accept spoken input through speech recognition. The difference from voice commands is where the interaction happens: speech recognition is used from within the running application, whereas voice commands are issued from outside of the application.

To support speech recognition, Windows Phone 8 has built-in support for dictation and for custom grammars written according to the Speech Recognition Grammar Specification (SRGS) Version 1.0.

Text-to-Speech

Windows Phone 8 can read text out to the user, be it a simple string or a page full of content. The APIs that support TTS functionality are in the Windows.Phone.Speech.Synthesis namespace.

Hands-On

In this hands-on, we will create a Windows Phone 8 application, which utilizes voice commands. The theme of the application is simple. Besides the default application page, which is present in every Windows Phone 8 application, we will create two additional pages, called OptionOne and OptionTwo. We will then enable voice commands so that we can navigate to either of these pages directly without needing to go to the application home page.

To start with, open Visual Studio 2012 and create a new Windows Phone 8 project titled WindowsPhoneVoiceCommandDemo.

Create a new Windows Phone 8 project

When prompted, choose Windows Phone OS 8.0 as the target Windows Phone OS version.

New Windows Phone Application

Next, open the Windows Phone application manifest file, WMAppManifest.xml, and select the ID_CAP_SPEECH_RECOGNITION and ID_CAP_MICROPHONE capabilities. Do not change the other capabilities, which are selected by default.
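
For reference, selecting those two capabilities in the manifest designer should correspond to entries like the following in WMAppManifest.xml. This is a minimal sketch of the Capabilities element only; the capabilities that are selected by default are omitted here and should stay in your file.

```xml
<Capabilities>
  <!-- Default capabilities from the project template are omitted in this sketch; keep them in your file -->
  <Capability Name="ID_CAP_SPEECH_RECOGNITION" />
  <Capability Name="ID_CAP_MICROPHONE" />
</Capabilities>
```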

WMAppManifest.xml

Next, add two XAML pages to the application, calling them OptionOne.xaml and OptionTwo.xaml.

Next, we will add a Voice Command Definition file to the project. A Voice Command Definition file is an XML file that contains metadata about the speech commands the application will respond to. The structure of the file is explained below.

To add a new Voice Command Definition file, right-click the project, select “Add New Item” and choose Voice Command Definition. Name the file VoiceCommandDefinitionDemo.xml.

Add New Item: Voice Command Definition

Next, we will change how the Visual Studio build process handles the voice command definition file. In the file's properties, set “Build Action” to “Content” and set “Copy to Output Directory” to “Copy always” or “Copy if newer”. In this case, I have opted for “Copy always”.
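
The same settings can be checked in the project file itself. Assuming a standard MSBuild-based project file, the entry for the voice command definition file would look roughly like this sketch (the ItemGroup in your project will likely contain other items as well):

```xml
<ItemGroup>
  <!-- Content build action, copied to the output directory on every build ("Copy always") -->
  <Content Include="VoiceCommandDefinitionDemo.xml">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
  </Content>
</ItemGroup>
```

For “Copy if newer”, the value would be PreserveNewest instead of Always.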

Change the property of the file

Here is the default content of the Voice Command Definition file.

<?xml version="1.0" encoding="utf-8"?>
 
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
  <CommandSet xml:lang="en-US">
    <CommandPrefix>Contoso Rodeo</CommandPrefix>
    <Example> play a new game </Example>
 
    <Command Name="PlayGame">
      <Example> play a new game </Example>
      <ListenFor> [and] play [a] new game </ListenFor>
      <ListenFor> [and] start [a] new game </ListenFor>
      <Feedback> Starting a new game... </Feedback>
      <Navigate />
    </Command>
 
    <Command Name="PlayLevel">
      <Example> replay level two </Example>
      <ListenFor> replay level {number} </ListenFor>
      <Feedback> Going to level {number}... </Feedback>
      <Navigate />
    </Command>
 
    <Command Name="PlayUnknownLevel">
      <Example> replay level two </Example>
      <ListenFor> [and] replay level {*} </ListenFor>
      <Feedback> Unknown level; going to level selection... </Feedback>
      <Navigate Target="LevelSelect.xaml" />
    </Command>
 
    <PhraseList Label="number">
      <Item> one </Item>
      <Item> two </Item>
      <Item> three </Item>
    </PhraseList>
 
  </CommandSet>
</VoiceCommands>
 
<!-- Example -->
<!--
 
    The preceding example demonstrates a hypothetical game called 'Contoso ROD3O!' which defines two
    Commands that a user can say to either start a new game or replay one of three levels in the game.  
    To initiate the PlayGame command, a user can say "Contoso Rodeo play  a new game" or "Contoso Rodeo
    play new game". Either phrase will start a new game. To initiate the second Command, a user can say
    "Contoso Rodeo replay level one", "Contoso Rodeo replay level two", or "Contoso Rodeo replay level 
    three".
    The second Command demonstrates how to use a PhraseList with a Command. PhraseLists can be updated 
    dynamically by the application (e.g., if a user unlocks a new level or game or feature, you might 
    want to allow the user to give commands for newfeatures after voice commands are already registered.)
    The third Command demonstrates how the {*} sequence can parallel another command to recognize speech
    that is not defined in the CommandSet.
 
  Note:
 
      [and] Indicates that "and" is optional. Making connecting words like this optional
            can help both "Contoso Rodeo, play new game" and "open Contoso Rodeo and play
            a new game" feel natural to speak.
                 
      {number} Defined separately from the Command, mapping to "one" or "two" or "three".
 
-->
 

We see that the <ListenFor> tag contains the phrase that the Windows Phone speech engine will listen for. The <Feedback> tag contains the text Windows Phone will show to the user when the voice command has started processing. The <Navigate> tag specifies the page to which the user will be directed.

We will change the file to the following:

<?xml version="1.0" encoding="utf-8"?>
 
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
  <CommandSet xml:lang="en-US">
    <CommandPrefix>Windows Phone Demo</CommandPrefix>
    <Example> Choose option 1 </Example>
 
    <Command Name="OptionOne">
      <Example> choose option one </Example>
      <ListenFor> choose option one </ListenFor>
      <Feedback> Selecting Windows Phone Demo option one... </Feedback>
      <Navigate Target="/OptionOne.xaml"/>
    </Command>
 
    <Command Name="OptionTwo">
      <Example> choose option two </Example>
      <ListenFor> choose option two </ListenFor>
      <Feedback> Selecting Windows Phone Demo option two... </Feedback>
      <Navigate Target="/OptionTwo.xaml"/>
    </Command>
 
 
  </CommandSet>
</VoiceCommands>

In the changes above, we have created two commands, one being “choose option one” and the other “choose option two”. The <CommandPrefix> tag means that each command must be preceded by the prefix: for Windows Phone to take us to the OptionOne.xaml page, we need to say “Windows Phone Demo choose option one”.

The file therefore provides two speech options for interacting with the application. When you say “Windows Phone Demo choose option one”, the WindowsPhoneVoiceCommandDemo application will launch and automatically navigate to the OptionOne.xaml page. When you say “Windows Phone Demo choose option two”, it will launch and automatically navigate to the OptionTwo.xaml page.
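
As a hedged variation, the square-bracket syntax for optional words from the default template can be applied to these commands too. The sketch below makes a leading “and” optional and adds a second phrasing for the first command; the “select option one” wording is an assumption chosen for illustration and is not part of the original sample.

```xml
<Command Name="OptionOne">
  <Example> choose option one </Example>
  <!-- Square brackets mark optional words, as in the default template -->
  <ListenFor> [and] choose option one </ListenFor>
  <!-- Hypothetical alternative phrasing, added only for illustration -->
  <ListenFor> [and] select option one </ListenFor>
  <Feedback> Selecting Windows Phone Demo option one... </Feedback>
  <Navigate Target="/OptionOne.xaml"/>
</Command>
```

Several <ListenFor> elements in one command all lead to the same <Navigate> target, which is how the default template handles “play” and “start” for its PlayGame command.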

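Our application is now ready to build and deploy to a phone or the emulator for testing. Note that the voice commands become available only after the application has registered the Voice Command Definition file at run time, which on Windows Phone 8 is done by calling VoiceCommandService.InstallCommandSetsFromFileAsync.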

Testing Our Application

To test our application once it is deployed, we can go to the Start screen and press and hold the “Start” button to open the voice command listener.

Press the “start” button to open the voice command listener

Then say, “Windows Phone Demo choose option one”. If the command is successfully recognized, the screen will look like this.

Windows Phone Voice Command Demo

The Windows Phone 8 platform will automatically take the user to the Option One page of the WindowsPhoneVoiceCommandDemo application.

Option 1

Summary

If you have trouble following along, sample code is available for our readers from here. I hope you have found this information useful.

About the author

Vipul Patel is a Program Manager currently working at Amazon Corporation. He has formerly worked at Microsoft in the Lync team and in the .NET team (in the Base Class libraries and the Debugging and Profiling team). He can be reached at vipul.patel@hotmail.com


Tags: applications, speech, Windows Phone 8



