Technical books come in a series of waves. The first wave
describes the technology and provides some examples. The second and
third waves consist of more refined content and those mega reference
versions that purport to be the last word on the topic.
In a way, I think that the authors of this book, Chetan Sharma
and Jeff Kunins, have hopscotched the whole evolution and produced
a comprehensive title that includes gems that can only have
originated from masters of the craft.
Chapters 1-4 provide a solid overview of the evolution of
pervasive computing and its speech processing roots. Reading Chapter
3, on the birth of VoiceXML, was a bit nostalgic for me. The chapter
properly credits the many researchers and early developments that
made it possible for inferior-brained humans like myself to develop
speech applications.
Chapter 5 does a decent job of reviewing the various development
environments and primes readers for Chapter 6, which introduces
the VoiceXML language. The authors didn’t water down this
chapter at all. They dove right into the core concepts of the
language and provided a generous amount of text to accompany the
examples. My only concern with the chapter is that it might be a bit
too heavy for readers who aren’t already professional
developers. That’s OK with me; after all, this book is part of the
"Professional Developer’s Guide Series." I did appreciate
that the authors used JavaScript quite extensively in their
examples, while other books only mention it but don’t really delve
into the topic. Even though the chapter was fairly long, I still felt like
something was missing. Perhaps not enough was covered in the
chapter. This perceived shortcoming is forgivable, however.
The very next chapter is a complete VoiceXML reference. I was a bit perplexed by
the placement of the chapter since reference chapters usually brush
up against the appendix and are usually an afterthought. The chapter
presents each VoiceXML element alphabetically including its syntax,
a full description of what it is and how it’s used, a list and
description of its attributes, a list of the other elements that it
can be contained in or that it can contain (parents and children),
and an example of the element being used in the context of a larger
body of code. After I read through the reference, I was actually
happy with its placement. Developers shouldn’t skip over this
chapter.
Chapter 8 introduces grammars and speech synthesis tags. The
chapter does a good job of presenting the new grammar specification
that was introduced with VoiceXML 2.0. SRGS examples are presented
in both ABNF and XML forms. Though the chapter will be appreciated
by developers who are upgrading their grammar knowledge from
VoiceXML 1.0, I wish that the authors had included one or two more
examples of grammars working within a VoiceXML application. The
brevity of the SSML portion of the chapter is appropriate since the
concepts are easier to grasp and because SSML is unlikely to be
used as extensively as grammars or other VoiceXML elements.
Chapter 9, which covers dynamic VoiceXML scripting, is short and
to the point, as it should be. The writers assume that the reader
already has a solid understanding of Web development. If you’re new
to developing dynamic Web applications, you may want to read through
a book on Web scripting first. There’s really only so much you can
say about dynamic scripting in a VoiceXML book before it becomes a
Web development book, which would be a bit off-topic.
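For readers who haven’t seen dynamic VoiceXML before, here is a minimal
sketch of the general idea, written in Python rather than taken from the
book. It builds a VoiceXML 2.0 menu from ordinary application data, the same
way a Web script would build an HTML page. The function name
build_menu_vxml, the phrases, and the target URLs are all assumptions I made
up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu_vxml(greeting, choices):
    """Return a VoiceXML 2.0 menu document as a string.

    greeting -- text the voice browser reads to the caller
    choices  -- list of (spoken phrase, target URL) pairs; these are
                hypothetical values supplied by the application
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu")

    prompt = ET.SubElement(menu, "prompt")
    prompt.text = greeting

    # One <choice> per option; "next" names the document to fetch
    # when the caller says the phrase.
    for phrase, url in choices:
        choice = ET.SubElement(menu, "choice", {"next": url})
        choice.text = phrase

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu_vxml(
        "Say news or weather.",
        [("news", "news.vxml"), ("weather", "weather.vxml")],
    ))
```

In a real application the list of choices would typically come from a
database or a user profile, and the string would be returned as the body of
an HTTP response to the voice browser.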
Chapters 11, 12, 14, and 15 provide a wealth of information on
the design and development process. These chapters are very rich in
content and provide a basis for establishing best practices for
speech application development. It was obvious to me that the
information presented was not just invented to fill pages,
but comes from a wealth of personal experience.
In conclusion, this is a very well-rounded book on VoiceXML. I am
very happy with the mix of content, summaries of important concepts
such as linguistics, speech recognition, and speech synthesis,
as well as the in-your-face examples and complete reference. In
fact, I liked it so much that I will probably be using it as a
standard reference in my company’s VoiceXML training course.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, which specializes in Voice Web consulting and training. He will be teaching the VoiceXML Bootcamp June 10-13 in Washington, D.C. Feel free to send an email to eisen@ferrumgroup.com with questions or comments about this or any article, or for more information about training and consulting services.