Excerpt: Early Adopter VoiceXML: VoiceXML with XSLT (HTML and WML)

Wrox Press Book – Early Adopter VoiceXML


Chapter 7, VoiceXML with XSLT (HTML and WML)


This chapter examines the use of the Extensible Stylesheet Language for Transformations (XSLT) as a tool for generating VoiceXML. I intend to illustrate a complete, end-to-end example of implementing a voice interface for a client-server database via XML and XSL. The case study will demonstrate the power of XSL for simultaneously delivering multiple interfaces to the same data by also developing HTML and WML front ends.


Our case study takes us inside the rarefied atmosphere of a fictional cash-strapped dot-com called MyRubberBands.com, purporting to be the “premier rubber bands site on the Internet”. In the aftermath of the stock market meltdown, in which our heroes saw their market valuation drop by over 95%, senior management, led by CEO Dr. Todd, has decreed that adding WML and voice functionality to the existing order status web site is do-or-die. MyRubberBands.com’s competitors have just rolled out their own WAP/voice access solution, and an all-out effort is necessary to catch up. Follow the programmers as they embark on their project to quickly roll out an equivalent capability.


Our development team have decided to implement an XSLT-based solution to the problem. XSLT is an XML-based language for transforming input structured according to one XML vocabulary to structured output in another XML vocabulary, or some general text form.


XSLT treats the document to be transformed as a tree of nodes. An XSLT stylesheet defines a set of rules, or templates. When a template matches one of the nodes in the source document, the output structure given in the template is created in the transformed document. XSLT uses the W3C XPath language to select nodes in the XML data. XPath plays a role loosely analogous to the one SQL plays for a relational database: it is the query language, and it lets us specify complex rules to match nodes in a document.
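To make the idea of template matching concrete, here is a minimal sketch in Python rather than XSLT. It is only an analogy: a dictionary of functions keyed by element name stands in for the stylesheet's templates, and the sample input, the customer_template function, and the welcome prompt are all assumptions made up for illustration, not part of the case study's stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document, trimmed to two fields for brevity.
SOURCE = """<customer_record>
  <customer id="1">
    <firstname>John</firstname>
    <lastname>Public</lastname>
  </customer>
</customer_record>"""


def customer_template(node):
    """Build the output structure for one matched <customer> node."""
    prompt = ET.Element("prompt")
    prompt.text = "Welcome, {} {}.".format(
        node.findtext("firstname"), node.findtext("lastname"))
    return prompt


# Each "template" is keyed by the element name it matches, a loose
# stand-in for <xsl:template match="customer">.
TEMPLATES = {"customer": customer_template}


def transform(source_xml):
    """Walk the source tree and emit VoiceXML for every matched node."""
    root = ET.fromstring(source_xml)
    vxml = ET.Element("vxml", version="1.0")
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    # Visit every node in document order and apply the matching rule.
    for node in root.iter():
        template = TEMPLATES.get(node.tag)
        if template is not None:
            block.append(template(node))
    return ET.tostring(vxml, encoding="unicode")


result = transform(SOURCE)
print(result)
```

A real XSLT processor does far more than this (match patterns are full XPath expressions, and templates can recurse into child nodes), but the rule-per-node shape is the same.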

For many applications, and for getting off to a quick start when processing XML, XSLT is just the ticket, especially when you consider that it is a relatively new technology whose processors should still have plenty of room for performance improvements. Using an XSLT processor also avoids the startup overhead of working with a full parser API from a compiled language, which makes it well suited to dynamic web applications.

MyRubberBands.com — A Case Study

Our legacy database is implemented with MySQL, an open source client-server relational database management system. SQL code for the schema and a set of sample data is included in the code download for this book.


An XML schema is used to represent an export of the legacy database instead of a Document Type Definition (DTD) because the schema standard is now complete, and increasing numbers of developers will be looking for the extra power schemas offer, especially as new development tools become available. An excellent primer on schemas is available from the W3C at http://www.w3.org/TR/xmlschema-0/.


Scripts to export the database to XML format have been written in Perl using the Database Interface (DBI) library. The Perl script shown was developed on Windows using ActiveState Perl, but should run on any platform, be it Windows or Unix. Many commercial databases, such as SQL Server 2000, are capable of exporting directly to XML, in which case this step could be avoided entirely.
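The shape of such an export script can be sketched in a few lines. The version below is an illustration in Python, not the book's Perl/DBI script: an in-memory SQLite database stands in for the MySQL server, the three-column customer table is a simplified assumption (the real schema is in the code download), and the element names follow the markup developed later in this chapter.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_customers(conn, export_type="single"):
    """Dump rows from a (hypothetical) customer table into XML."""
    root = ET.Element("myrubberbands", export_type=export_type)
    cursor = conn.execute(
        "SELECT id, firstname, lastname FROM customer ORDER BY id")
    for cust_id, firstname, lastname in cursor:
        # One <customer_record> wrapper per customer row.
        record = ET.SubElement(root, "customer_record")
        customer = ET.SubElement(record, "customer", id=str(cust_id))
        ET.SubElement(customer, "firstname").text = firstname
        ET.SubElement(customer, "lastname").text = lastname
    return ET.tostring(root, encoding="unicode")


# In-memory SQLite stands in for the MySQL server here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'John', 'Public')")

exported = export_customers(conn)
print(exported)
```

Building the document through an XML library rather than by string concatenation means that characters such as & and < in the data are escaped automatically.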

Business Requirements

With their competitors rolling out both voice and WAP access to services, MyRubberBands.com has no choice but to follow suit or lose market share in the cutthroat world of elastic band marketing. Due to market pressures, the new system must be up and running as soon as possible, and given this short development cycle, the requirements have been scaled back to providing simply voice and WAP access to a customer’s order status data.


However, some thought can still be given to the future. Rather than develop a quick and dirty “throwaway” voice interface, by putting in a little extra work now, the engineers can build a reusable infrastructure. By exporting their database to an XML format, they access the power of XSL to create VoiceXML and WML interfaces, and are able to transparently replace parts of the existing HTML site with dynamically created pages.

System Architecture

The figure on the next page is a block diagram showing the existing components of the system, and the relationships to the new XML/XSLT system required to implement the voice interface.


This drawing is not complete in all areas. For example, no method for user login and authentication is given, because such a system would already exist for an e-commerce site. Although XML/XSLT would be helpful for creating device-specific login code, we are not going to examine on-the-fly transformation (inside a web server, for example) in this particular study.

Designing a Voice Interface

With these rather vague requirements in mind, we can make some design decisions, and sketch out a rough model for the voice interactivity envisaged. The goal is to make the experience simple and intuitive.


  • There will be a main menu of options. This is the entry point to the application, and the user can always return to it with a single voice command.


  • Online help will always be available. This will use the VoiceXML <help> tag to simplify implementation, and also to override any built-in help that may be offered by the voice platform.

  • The number of available options from the main menu should be kept to a minimum. The total number of states should also be minimized. This means that the behavior of the current command should not depend on which command was issued before it. For example, the word “menu” should always refer to the main menu in every context.

  • The top-level commands from the main menu should always be active. If the main menu offers the command “foo”, the user should be able to say “foo” at any point in later dialogs with the same result.

The following state diagram illustrates these design goals. The main options are “order status”, “product list” (with a link to voice ordering via the existing phone service bureau), and “more information” to access a frequently asked questions list. For a more detailed examination of the issues to consider when designing voice applications, refer to Chapter 6. The order status menu leads to a variable number of additional choices, depending on the number of records in the user’s order history.
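The design goals listed above boil down to a small dispatch rule: global commands win in every state, and anything else is resolved against the current state only. The sketch below models that rule in Python; the state names, the “first order” command, and the next_state function are hypothetical names chosen for illustration, not part of the application's VoiceXML.

```python
# Top-level commands are global: they behave the same in every state.
GLOBAL_COMMANDS = {
    "menu": "main_menu",
    "order status": "order_status",
    "product list": "product_list",
    "more information": "more_information",
}

# Commands that only make sense inside one state (hypothetical example).
LOCAL_COMMANDS = {
    "order_status": {"first order": "order_detail"},
}


def next_state(current_state, utterance):
    """Return the state reached after the caller says `utterance`."""
    command = utterance.strip().lower()
    if command == "help":
        # Help plays a prompt and leaves the caller where they are.
        return current_state
    if command in GLOBAL_COMMANDS:
        return GLOBAL_COMMANDS[command]
    local = LOCAL_COMMANDS.get(current_state, {})
    # Unrecognized input also leaves the state unchanged.
    return local.get(command, current_state)


print(next_state("order_detail", "menu"))
print(next_state("product_list", "order status"))
```

Because global commands are checked first, a state can never shadow a main-menu word, which is exactly the “foo always means foo” behavior the list asks for.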

Creating a Markup Language

Naturally, our fictional rubber band team already has a database-driven e-commerce web site. Like all legacy databases, it has evolved over time into a hodgepodge of tables, some of which were hastily knocked together to implement poorly defined requirements. We will assume that the company is operating a traditional JavaServer Pages (JSP) site.


Since most of the tables in the database are relevant to the requirements of the various interfaces, the developers plump for a “verbose” approach to their XML. They will dump all of the data from all of the tables into XML form, even though some of it may be unnecessary in the VoiceXML, WML, or HTML contexts.

MyRubberbandsML by Trial and Error

The first thing any XML dialect needs is a top-level element. Since we might want to export all the customers in the database, or only one at a time, let’s add an attribute on the top level element to describe what kind of data feed this XML document constitutes.


<myrubberbands export_type="single">

The thing we are most interested in is a customer record, because it holds the data needed to generate the voice interface for querying order status. Since we might have more than one customer in a file, each individual <customer> and its associated order history will be contained in a <customer_record> element. Note that the time stamps are in the standard XML Schema dateTime format (ISO 8601), which won’t translate easily for rendering by a text-to-speech (TTS) engine. The <customer_record> element that starts here is very lengthy, and is not closed until the associated addresses and order history that follow have been given.

<customer_record>
<customer id="1">
<firstname>John</firstname>
<middle>Quincy</middle>
<lastname>Public</lastname>
<username_or_email>jqp@foo.foo.com</username_or_email>
<password_or_pin>bar</password_or_pin>
<date_joined>2001-05-18T16:17:15</date_joined>
<date_lastchg>2001-05-18T16:17:15</date_lastchg>
</customer>

As shown in the database schema, a customer can have one or more addresses. The XML representation should preserve the foreign key relationship with the customer table, and this relationship should not depend on the position of the elements. Here, both the customer profile and all associated addresses are nested within the <customer_record> tag, but nesting alone is not what ties them together. Every <customer_address> element also carries a customer_id attribute, which mirrors the relationship between the customer and customer_address tables in the database schema.


The database schema allows the customer_address table to store real physical addresses, such as billing and shipping addresses, or e-mail addresses for alternate methods of customer contact. Hence, the <customer_address> element can contain the optional <email> element.


<customer_address address_type="Ship To Address"
customer_id="1">
<address1>4321 La Place Ct</address1>
<address2>Ste 306</address2>
<city>Carlsbad</city>
<state_or_prov>CA</state_or_prov>
<postalcode>92008</postalcode>
<email></email>
<phone>7605551212</phone>
</customer_address>
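Because each address carries its own customer_id, a consumer of the document can find a customer's addresses by attribute value rather than by where the elements happen to sit. The following Python sketch shows this with an XPath-style predicate; the trimmed sample document and the addresses_for helper are assumptions for illustration. An XSLT stylesheet would use an equivalent XPath predicate in a select or match expression.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the chapter's markup (most fields omitted).
DOC = """<myrubberbands export_type="single">
  <customer_record>
    <customer id="1">
      <firstname>John</firstname>
    </customer>
    <customer_address address_type="Ship To Address" customer_id="1">
      <city>Carlsbad</city>
    </customer_address>
  </customer_record>
</myrubberbands>"""


def addresses_for(root, customer_id):
    """Return every <customer_address> that points at customer_id."""
    # The predicate matches on the attribute value, not element position.
    path = ".//customer_address[@customer_id='{}']".format(customer_id)
    return root.findall(path)


root = ET.fromstring(DOC)
cities = [addr.findtext("city") for addr in addresses_for(root, "1")]
print(cities)
```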

Since the main objective of the voice application is to allow the user access to their order history and status information, it makes sense to include the history inside the <customer_record> scope. In this case, because we will need to enunciate the order time, and because we’d rather not use XSL’s limited text processing capabilities, we’ll add the attribute sayas to the <order_date> element. This provides a pronunciation that can be used with the VoiceXML <sayas> tag for TTS. However, the desired pronunciation of the date and time cannot be derived from the database alone, as addressed in the section Generating MyRubberbandsML.
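As a rough idea of what computing such a pronunciation might involve, the Python sketch below turns a time stamp into a speakable phrase and attaches it as a sayas attribute. The phrase format, the spoken_form and order_date_element helpers, and the choice of Python are all assumptions for illustration; the chapter's own approach is the one described in Generating MyRubberbandsML.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Spelled out here so the result does not depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def spoken_form(timestamp):
    """Turn '2001-05-18T16:17:15' into a phrase a TTS engine can read."""
    dt = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    hour12 = dt.hour % 12 or 12  # 0 and 12 both read as "12"
    meridiem = "AM" if dt.hour < 12 else "PM"
    return "{} {}, {} at {}:{:02d} {}".format(
        MONTHS[dt.month - 1], dt.day, dt.year, hour12, dt.minute, meridiem)


def order_date_element(timestamp):
    """Build <order_date sayas="...">timestamp</order_date>."""
    elem = ET.Element("order_date", sayas=spoken_form(timestamp))
    elem.text = timestamp
    return elem


xml_text = ET.tostring(
    order_date_element("2001-05-18T16:17:15"), encoding="unicode")
print(xml_text)
```

Doing this work in the export step keeps the stylesheet simple: the XSLT only has to copy the attribute value into the VoiceXML output.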

Join us next week for more of Chapter 7 from Early Adopter VoiceXML.

This book excerpt comes to us from WROX Press–technical books that you can count on!
