In this article, we will test six VoiceXML browsers for
VoiceXML 2.0 conformance to determine how compatible today’s
VoiceXML platforms are with each other.
Getting a group of scientists to agree on something is a
challenge. Getting a small group of toddlers to play quietly is even
more challenging. Getting business people and scientists to agree on
anything is nearly impossible. Similarly, getting companies to create
conforming VoiceXML browsers that are compatible with each other has
so far been impossible.
First, we need to define what conformance means
within the context of VoiceXML. An application conforms to a
standard when it fully implements the specification, matches the syntax, and follows the rules. For
example, many off-the-shelf applications today are able to
communicate with other programs on the network. How do they do that?
Did all of these software companies work together to enable their
programs to talk to one another? Well no, they are able to
communicate because they all use a common network protocol called
TCP/IP (actually, that’s two protocols, TCP and IP, but they have a
close working relationship). The Internet Engineering Task Force (IETF)
has been responsible for creating networking protocols for several
years now. Computer programs are able to communicate with
each other on a network, without understanding the communication
mechanisms of every other application out there, because they all
utilize a common communications protocol. This is a very powerful
concept because it allows programmers to
leverage existing technologies to create ever more comprehensive and
powerful network applications. Imagine having to write your own
communications layer every time you wrote a new application.
VoiceXML is in fact a technology that leverages
several layers of standardized protocols that are used to transport
messages between applications (in our case, a VoiceXML browser and a
The diagram above depicts the standards that a voice
browser relies on to communicate.
Why Conformance is Important
I conducted an informal survey made up of participants that are
either evaluating VoiceXML platforms or have already implemented a
When asked why they were considering VoiceXML, the most common
- New technology
- Based on open standards
- Can move to a different platform later
- Can extend Web applications
Next, I asked participants to rate a list of nine VoiceXML
benefits from one to ten, one meaning that the benefit was not
important at all and ten meaning that it was very important. The
list was created based on a common set of expected benefits that my
company, The Ferrum Group, typically gets from customers when they
come to us to help them select and implement a speech IVR solution.
The top three VoiceXML benefits important to
- Provides a wider variety of platform choices
- Uses open standards
- Can port applications to any other VoiceXML
Finally, I asked participants, "What is the most important benefit that you want to see from VoiceXML?"
The two most common answers were:
While the survey was not scientific, the results did seem to
indicate that customers were most interested in the benefits that
come from using an open standard like VoiceXML.
My conclusion as to why conformance is important is that
customers naturally expect it as a byproduct of an open standard.
Without conformance, the benefits of using an "open
standard" are greatly diminished.
Conformance Test Suite
The next step in my study was to test VoiceXML conformance across
a range of VoiceXML browsers using only the VoiceXML 2.0 and Speech
Recognition Grammar Specification (SRGS) as guidelines for creating
the test source code. I did not refer to any VoiceXML or SRGS documentation from any of the platform providers.
The purpose of this test was to determine how many platform
providers that claim VoiceXML 2.0 support are actually able to run
compliant code without requiring additional modifications.
Instead of using a proprietary VoiceXML tool, I decided to use
XML Spy and the DTDs provided by the Voice Browser working group
(which are linked within VoiceXML 2.0 and SRGS 1.0 specifications).
This ensured that:
- Code that was created was platform independent
- Code was validated against the official DTDs
Test Source Code
For the test, I developed a minimal VoiceXML application that
- One VoiceXML form to gather a social security number
- One SRGS XML DTMF grammar
VoiceXML Source Code
<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"
  "http://www.w3.org/TR/voicexml20/vxml.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="ssn">
    <field name="ssn_number">
      <grammar src="ssn_dtmf.grxml" mode="dtmf" type="application/srgs+xml"/>
      <prompt bargein="true">Please enter your social security number</prompt>
      <filled>
        <prompt>You entered <value expr="ssn_number"/> </prompt>
        <clear namelist="ssn_number"/>
        <goto next="#ssn"/>
      </filled>
    </field>
    <catch event="nomatch noinput">
      <reprompt/>
    </catch>
  </form>
</vxml>
SRGS XML Grammar Source Code
<?xml version="1.0"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
  "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar mode="dtmf" version="1.0" xml:lang="en-US" root="ssn"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="ssn" scope="public">
    <ruleref uri="#digit"/><ruleref uri="#digit"/>
    <ruleref uri="#digit"/><ruleref uri="#digit"/>
    <ruleref uri="#digit"/><ruleref uri="#digit"/>
    <ruleref uri="#digit"/><ruleref uri="#digit"/>
    <ruleref uri="#digit"/>
  </rule>
  <rule id="digit" scope="private">
    <one-of>
      <item>1</item><item>2</item><item>3</item>
      <item>4</item><item>5</item><item>6</item>
      <item>7</item><item>8</item><item>9</item>
      <item>0</item>
    </one-of>
  </rule>
</grammar>
The results of the conformance test for the 6 platforms are
listed below. The good news is that 3 out of 6 platforms executed
the code. The bad news is that 3 of the 6 platforms didn’t.
While Browser 4 and Browser 6 didn’t execute the code, the changes
required to make it work were minimal. However, for the sake of the
test, the code either worked or it didn’t. To be fair, I did go to
the trouble of troubleshooting what needed to change to allow the code
to run. This information is detailed below.
To make the code work on Browser 4, I had to change the DTD
reference from W3C to one provided by the vendor.
This is a minor change that is acceptable when you want to use
extra browser extensions. However, the browser should still be capable of running
generic VoiceXML code that uses the default W3C DTD.
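As a rough sketch of what such a DTD swap looks like, only the DOCTYPE declaration at the top of the VoiceXML document changes. The vendor public identifier and URL shown here are hypothetical placeholders, not the values of any real platform, and the form body is abbreviated:

```xml
<?xml version="1.0"?>
<!-- Hypothetical vendor DTD: the public identifier and URL below are
     placeholders used for illustration only. -->
<!DOCTYPE vxml PUBLIC "-//Example Vendor//DTD VOICEXML 2.0//EN"
  "http://vendor.example/dtd/vxml.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="ssn">
    <!-- field, grammar, prompts, and catch handler as in the test program -->
  </form>
</vxml>
```

The drawback of this kind of change is that the document is now tied to one vendor's DTD, which is exactly the portability cost the test was designed to expose.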
The second change that I had to make was to change the MIME type
attribute of the <grammar> element to:
This is forgivable because the VoiceXML specification only
provides an example of what the MIME type might be rather than
stating what it must be.
Browser 5 was more difficult. I gave up troubleshooting the problem
after spending an hour trying to figure it out.
Like Browser 4, Browser 6 required a different DTD.
Also, as with Browser 4, the MIME type attribute of the <grammar>
element needed to be changed to:
The third and final change was to remove the DTD reference from the
SRGS grammar. It took me a while, working by process of elimination,
to discover the solution to this particular problem.
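For illustration, a version of the DTMF grammar with the DOCTYPE declaration removed might look like the sketch below. This is a reconstruction under the assumption that nothing else in the grammar needed to change. The SRGS specification treats the DOCTYPE declaration in the XML form as optional, so the result should still be a legal grammar document:

```xml
<?xml version="1.0"?>
<!-- DOCTYPE declaration removed; the grammar element and rules are unchanged -->
<grammar mode="dtmf" version="1.0" xml:lang="en-US" root="ssn"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="ssn" scope="public">
    <!-- nine digits in sequence -->
    <ruleref uri="#digit"/><ruleref uri="#digit"/><ruleref uri="#digit"/>
    <ruleref uri="#digit"/><ruleref uri="#digit"/><ruleref uri="#digit"/>
    <ruleref uri="#digit"/><ruleref uri="#digit"/><ruleref uri="#digit"/>
  </rule>
  <rule id="digit" scope="private">
    <one-of>
      <item>1</item><item>2</item><item>3</item>
      <item>4</item><item>5</item><item>6</item>
      <item>7</item><item>8</item><item>9</item>
      <item>0</item>
    </one-of>
  </rule>
</grammar>
```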
VoiceXML DTD Problems
During the testing process, I noticed that several code checking
tools offered by the platform vendors consistently complained about
the W3C DTD referenced in the VoiceXML test program. One of the
VoiceXML contributors later confirmed that the DTD listed in the
specification contained errors, which would be fixed soon. This may
or may not have contributed to the fact that Browser 4 and Browser 6
required a different DTD since some XML parsers would not have been
able to validate VoiceXML source code using the W3C DTD.
Testing Tool Validation Problems
One thing I noticed as I was testing the various platforms is
that the source code validators offered by the vendors often gave
false positive results: when I tested a VoiceXML program
that I had intentionally broken, the majority of the tools
reported the code to be valid even though it would not work when I
dialed into the application. This made the troubleshooting process
all the more difficult. The validators for Browser 2 and Browser 3 were the only
ones that accurately identified problems in the source code.
I spent about 60 minutes troubleshooting each of the three
platforms that didn’t run the VoiceXML test program and I was only
able to figure out how to fix the problem on two of them. The fact
that debugging output was not very helpful most of the time meant
that I had to resort to fixing problems through the process of
elimination, which is very time consuming. These code validators need
to do a better job of inspecting element data and attributes in
addition to validating the code against a DTD.
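To make the distinction concrete, here is one plausible example of a document that is well-formed and should pass a DTD-only check, yet cannot work at run time. It is an assumed illustration, not one of the broken programs used in the test. The <goto> element points at a form id, "#confirm", that does not exist in the document. Because the DTD declares the next attribute as plain character data, DTD validation alone cannot notice the dangling reference:

```xml
<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"
  "http://www.w3.org/TR/voicexml20/vxml.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="ssn">
    <block>
      <prompt>Welcome</prompt>
      <!-- Hypothetical target: no form in this document has id="confirm",
           so the transition fails when the call reaches this point. -->
      <goto next="#confirm"/>
    </block>
  </form>
</vxml>
```

A validator that also resolved fragment references against the form ids declared in the document would catch this before anyone had to dial in.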
From my perspective, it would be better to use a proprietary
standard that was supported by a wide range of vendors whose
platforms achieved interoperability and conformance than to use an
"open standard" in which implementations were inspired by
the standard rather than conforming to it.
Unless ALL VoiceXML platforms are able to run compliant code,
VoiceXML will not be portable, will not meet customers’ expectations,
and will therefore not be very useful. If this test of 6 browsers is
a general indication that only 50% of the available platforms are
VoiceXML compliant, then customers need to be careful to test
platforms for compliance before making a final decision.
In the future, I plan on extending the VoiceXML 2.0 test script
to exercise the rest of the specification and also plan to expand
the number of platforms that will be tested. If you have ideas or
recommendations on what the test script should contain or would like
to recommend VoiceXML gateways that you’d like to see tested,
please send me an email with that information.
About Jonathan Eisenzopf
Jonathan is a Senior Partner of The Ferrum Group, LLC
which provides speech IVR consulting, training, and voice user
interface design. Feel free to send an email to firstname.lastname@example.org
regarding questions or comments about this or any article.