October 31, 2014

Designing an Interactive Voice Response System Using VoiceXML and CCXML

  • March 28, 2008
  • By Ponkumar and Sujata De
<form id="getWebservice">
   <property name="bargein" value="false"/>
   <block>
      <!-- Log the search parameters collected earlier in the dialog -->
      <log expr="' travelsearch: departureDate = [' + departureDate + ']'"/>
      <log expr="' travelsearch: returnDate = [' + returnDate + ']'"/>
      <log expr="' travelsearch: departureCity = [' + departureCityCode + ']'"/>
      <log expr="' travelsearch: destinationCity = [' + destinationCityCode + ']'"/>

      <!-- Call the fare web service. Inside an XML attribute, '&' must be
           written as '&amp;', and the URL must not contain stray spaces. -->
      <data name="MyData"
            srcexpr="'http://220.227.31.118:8080/teis/FlightBusinessService.jws?method=getMinFare'
               + '&amp;departureCity=' + departureCityCode
               + '&amp;destinationCity=' + destinationCityCode
               + '&amp;departureDate=' + departureDate
               + '&amp;returnDate=' + returnDate"
            method="get"
            fetchtimeout="100s"/>
      <assign name="document.MyData"
              expr="MyData.documentElement"/>
      <assign name="response"
              expr="GetData(MyData,'getMinFareReturn')"/>
      <log expr="' travelsearch: response = [' + response + ']'"/>

      <!-- Return the result to the calling session -->
      <exit namelist="response"/>
   </block>
</form>

Listing 2: Snippet from userTravelSearchDetails.vxml

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

   <var name="resp"
        expr="session.connection.ccxml.values.response"/>


   <form id="getWebservice">
      <block>
         <log expr="'THE SEARCH VoiceXML ' + resp"/>
         <prompt>
            <value expr="resp"/>
            <break time="1.5s"/>
         </prompt>
      </block>
   </form>
</vxml>

Listing 3: userTravelSearchResults.vxml

Grammar

Grammars specify the constraints on the expected utterance, whether it is initiated by the user or given to the system as a response to a directed dialogue. A sample grammar used by the application's VoiceXML documents is shown in Listing 4:

CITY [
(new [york yok yak])        {<city "new york ">      <code "NYC">}
(san [jose hose hoze])      {<city "sanjose ">       <code "SJC">}
[boston bostan bostun]      {<city "boston ">        <code "BOS">}
(san [francisco fransisco]) {<city "sanfrancisco ">  <code "SFO">}
]

Listing 4: Sample grammar file for Cities
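
A grammar such as the one in Listing 4 is typically attached to a field that collects one piece of information. The following is a minimal sketch, not taken from the application: the file name cities.grammar, the form id, the prompt wording, and the grammar media type are all assumptions, and the media type and rule-reference syntax for this grammar format vary by voice platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
   <form id="askDepartureCity">
      <!-- slot="code" binds this field to the grammar's "code" slot,
           so a match on "new york" fills the field with "NYC" -->
      <field name="departureCityCode" slot="code">
         <!-- Hypothetical file name and media type; check your platform -->
         <grammar src="cities.grammar#CITY" type="application/x-gsl"/>
         <prompt>Which city are you leaving from?</prompt>
         <filled>
            <log expr="'departureCityCode = [' + departureCityCode + ']'"/>
         </filled>
      </field>
   </form>
</vxml>
```

Listing the common mispronunciations as alternatives in the grammar, as Listing 4 does, lets several recognized variants map to the same city code.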

Challenges of Voice User Interaction using VoiceXML

The main challenge in a voice user interface (VUI) is usually its dependence on the speaker. Every voice differs from every other in qualities of speech such as amplitude, frequency, and context. In addition, the speaker's locale affects the language and dialect used.

To mitigate this problem, the VUI is designed as a semi-automatic user interface. If the system fails to understand the caller's utterances, the call is forwarded to a trained human operator without any loss of data. The operator joins the caller in a conference, records the information for the user session, and forwards the request to the business service. The rest of the workflow remains the same.
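
On the VoiceXML side, one way to express this fallback is to count recognition failures and give up after a fixed number of attempts, returning whatever has been collected so far. The sketch below is illustrative only: the variable name outcome, the threshold of three attempts, the grammar file name, and the prompts are assumptions, not details of the application.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
   <!-- Hypothetical status flag returned to the calling session -->
   <var name="outcome" expr="'ok'"/>

   <form id="askDepartureCity">
      <field name="departureCityCode" slot="code">
         <!-- Hypothetical file name and media type; check your platform -->
         <grammar src="cities.grammar#CITY" type="application/x-gsl"/>
         <prompt>Which city are you leaving from?</prompt>

         <!-- First and second recognition failures: apologize and reprompt -->
         <nomatch>
            <prompt>Sorry, I did not catch that.</prompt>
            <reprompt/>
         </nomatch>

         <!-- Third failure: stop and hand the session data back -->
         <nomatch count="3">
            <assign name="outcome" expr="'operator'"/>
            <exit namelist="outcome departureCityCode"/>
         </nomatch>
      </field>
   </form>
</vxml>
```

Because the exit carries the collected variables in its namelist, the session that started the dialog can pass them on to the operator instead of asking the caller again.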

Because the current VUI is implemented in a real-life commercial application, there is also a need to play advertisements while the caller waits for the search results.

However, VoiceXML on its own cannot handle multiple events, asynchronous communication, or call conferencing. CCXML is therefore introduced to modify the workflow.

CCXML

Call Control eXtensible Markup Language (CCXML) is the W3C standard that complements VoiceXML with advanced telephony capabilities: handling multiple asynchronous events, conferencing, switching between audio channels, and so forth. CCXML is used in the application in the following scenarios:

  1. Handling multiple events: In the travel search call flow, certain spots were selected for playing advertisements. One such spot is the search wait time, during which two dialogs are executed in parallel:
    1. Fetch the advertisement and play it until the travel search completes.
    2. Search for travel details.
  2. Call conferencing:
    1. Conferencing with an external human operator when the user's utterance is not recognized by the voice platform.
    2. In travel search, fetching the number for the listing from the database and placing the call to the dealer.


