VoiceXML 2.0 Grammars, Part I

This technical series will provide programmers with a complete
introduction to the VoiceXML 2.0 grammar format. In Part I, we will
discuss the XML and ABNF formats, as well as the structure and
elements of a VoiceXML 2.0 grammar document.

Overview

Grammars define the words and sentences (or touch-tone DTMF
input) that can be recognized by a VoiceXML application. One big
drawback of VoiceXML 1.0 was that it lacked a standard speech
recognition grammar format. To some degree, this reduced the
benefits of the specification because it left the burden on VoiceXML
browser developers to define the grammar language and format. For
example, application grammars written for Nuance Voice Web Server
would have to be re-written to work on IBM Voice Server. This
problem was rectified with the Speech Recognition Grammar
Specification (SRGS) introduced by the W3C Voice Browser group in
conjunction with the VoiceXML 2.0 specification.

XML or ABNF?

The VoiceXML 2.0 grammar specification provides two text formats
for writing speech recognition grammars: XML or ABNF. XML is a Web
standard for representing structured data. Many programming and
editing tools incorporate XML editing and processing capabilities.
These XML tools can be used to write VoiceXML 2.0 grammars. ABNF
stands for Augmented Backus-Naur Form, and is a format used to
specify languages, protocols and text formats. For example HTTP, the
communications protocol used on the World Wide Web (and for
VoiceXML applications), is specified in ABNF format.

The ABNF grammar format uses special characters to define grammar
expressions in a text string while XML grammars are composed of text
strings enclosed in XML elements. The choice between the ABNF and XML
formats is up to you; however, VoiceXML 2.0 only requires implementers
to support the XML format. If portability is important to you, you
may therefore want to write your grammars in the XML format.
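
To make the difference concrete, here is a minimal sketch of one rule in both formats. The rule name answer and its yes/no vocabulary are invented for illustration, and the fragment would still need to sit inside a complete grammar file with the header described later in this article. The ABNF form is shown in the comment, the XML form below it.

```xml
<!-- XML form of a hypothetical yes/no rule named "answer".
     The equivalent ABNF form is a single line:

       $answer = yes | no;
-->
<rule id="answer">
  <one-of>
    <item>yes</item>
    <item>no</item>
  </one-of>
</rule>
```

The ABNF version relies on the special characters $, |, = and ; to carry the structure, while the XML version spells out the same structure with the rule, one-of and item elements.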

If you’re already experienced with the GSL or JSGF grammar
formats, then you’ll likely prefer the ABNF format because of its
similarity. If you decide to use the XML format, you will quickly
discover that it is extremely verbose compared to ABNF, making it
more difficult to read. On the other hand, using the DTD or XML
Schema for the XML grammar format in conjunction with an XML editor
makes the task less tedious and reduces syntax errors. The authors
of the VoiceXML 2.0 grammar format have also included an XSL style
sheet for converting XML grammars to ABNF format, which may aid
linguists who prefer to proof grammars in a less verbose text
format.

Examples will be listed in both ABNF and XML format.

Grammar Headers

ABNF and XML grammar files must contain specific header
information; otherwise, the VoiceXML interpreter will fail to
recognize the grammar properly. The elements of a grammar header are:

  • grammar declaration
  • language/locale
  • mode
  • root grammar

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;

XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...

Grammar declaration

The grammar declaration specifies the grammar version and,
optionally, the character encoding that should be used. The
grammar version should always be set to 1.0. The character
encoding identifies the character set in which the grammar is
written. For example, ISO-8859-1 is commonly used for English, while
Asian languages such as Japanese and Chinese require a different
encoding (Big5, for instance, for Traditional Chinese). In ABNF
grammars, the declaration is the first line of the file. In XML, the
encoding is defined by the encoding attribute of the XML declaration
(the first line of any XML file).

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;

XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...

The grammar version in an XML grammar is defined by the version
attribute of the <grammar> element.
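
As a sketch of a declaration that uses a different encoding, the grammar below is written in UTF-8 and contains Japanese tokens. The rule name answer is invented for illustration, and whether a given platform actually recognizes Japanese is an assumption you would need to check against your VoiceXML browser. The ABNF counterpart of the first line would be #ABNF 1.0 UTF-8;

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0"
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="ja" mode="voice" root="answer">
  <!-- Hypothetical yes/no rule; the tokens are Japanese for "yes" and "no" -->
  <rule id="answer">
    <one-of>
      <item>はい</item>
      <item>いいえ</item>
    </one-of>
  </rule>
</grammar>
```

The file itself must be saved in the encoding that the declaration names; otherwise the interpreter may misread the tokens.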

Language

Unless the grammar is a DTMF grammar, a language must be
specified in the grammar header. For ABNF grammars, the language
parameter defines the language (in this example, en for English; a
region-specific tag such as en-US would select US English):

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;

XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...

The language in an XML grammar is specified by the xml:lang attribute of the <grammar> element.

Mode

Grammars can be scoped for speech input (voice) or
touch-tone input (dtmf) based on the value of the mode
parameter. The default mode is voice. If the grammar is
scoped dtmf, then speech input will not be recognized.
VoiceXML 2.0 does not allow mixed-mode grammars: a voice-scoped
grammar cannot include a dtmf-scoped grammar, or vice versa. When
an application needs to accept both voice and DTMF input, define
two separate grammars within the same VoiceXML scope, and do not
combine them into a single grammar in any way.

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;

XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="dtmf" root="topRule">
...
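
For comparison, a complete DTMF-only grammar might look like the sketch below. The rule name menuChoice and the three key presses are invented for illustration. Because this is a DTMF grammar, the language attribute is left out, as described in the Language section.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0"
  xmlns="http://www.w3.org/2001/06/grammar"
  mode="dtmf" root="menuChoice">
  <!-- Hypothetical rule: the caller presses 1, 2 or 3 -->
  <rule id="menuChoice">
    <one-of>
      <item>1</item>
      <item>2</item>
      <item>3</item>
    </one-of>
  </rule>
</grammar>
```

To accept the spoken words "one", "two" and "three" as well, you would write a second grammar file with mode="voice" rather than adding those words to this one.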

Root grammar 

When a grammar file contains many sub-grammar rules, it’s
important to identify the root grammar, that is, the main grammar
that will be executed when a VoiceXML dialog calls the grammar
file.

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;

XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...
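
The sketch below shows how the root declaration relates to the rules in a file. Only topRule comes from the header examples in this article; the size and drink rules and their vocabulary are invented for illustration.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0"
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">

  <!-- Root rule: used when a dialog references this file
       without naming a specific rule -->
  <rule id="topRule">
    <ruleref uri="#size"/>
    <ruleref uri="#drink"/>
  </rule>

  <!-- Sub-rules referenced by the root rule -->
  <rule id="size">
    <one-of>
      <item>small</item>
      <item>large</item>
    </one-of>
  </rule>

  <rule id="drink">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
    </one-of>
  </rule>
</grammar>
```

With this grammar, a phrase such as "small coffee" or "large tea" would match topRule, because the root rule expects a size followed by a drink.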

Filename Extensions

The recommended (but not required) filename extensions for
grammar files are .gram for ABNF grammars and .grxml for XML
grammars.
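
As a sketch of how these extensions show up in practice, the VoiceXML field below references an external XML grammar file. The form, field and file names are invented for illustration. The type values shown are the media types associated with the two grammar formats; how strictly a given platform checks them is something to verify against its documentation.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <prompt>What would you like to drink?</prompt>
      <!-- XML grammar file; drink.grxml is a hypothetical name -->
      <grammar src="drink.grxml" type="application/srgs+xml"/>
      <!-- The ABNF version of the same grammar would be referenced as:
           <grammar src="drink.gram" type="application/srgs"/>
      -->
    </field>
  </form>
</vxml>
```

Since the extension is only a recommendation, it is the type attribute (or the media type returned by the web server) that tells the interpreter which format to expect.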

About Jonathan Eisenzopf


Jonathan is a member of The Ferrum Group, LLC which specializes in Voice Web consulting and training. Feel free to send an email to [email protected] regarding questions or comments about this or any article.
