Voxeo Documentation - VoiceXML 2.1 Tutorials Overview

If you are new to writing voice-web applications, this is the place to start. The pages in this section teach VoiceXML theory and practice from the ground up. Even if you have never written anything more complex than a simple HTML web page, by the time you finish these lessons you will be ready to write top-shelf voice applications for the world.



Tutorial: Hello World
The place to start for all neophyte voice developers. This tutorial covers the basic syntax of VoiceXML 2.1 applications and walks you through getting your very first Hello World voice-web application running within half an hour.
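
As a rough idea of what such a first application looks like, the sketch below is a minimal VoiceXML 2.1 document that speaks a greeting with text-to-speech and then exits. It is an illustrative example rather than the tutorial's own listing; the form id and prompt wording are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch; the form id and prompt text are illustrative only. -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="hello">
    <!-- A block holds executable content that runs once when the form is visited. -->
    <block>
      <prompt>Hello world!</prompt>
    </block>
  </form>
</vxml>
```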


Tutorial: Voice Recognition
This section covers the basic principles of voice recognition, including grammar theory and markup, and touches on conditional logic in the VoiceXML dialog. It introduces GSL voice recognition grammars and shows the correct way to write a simple inline grammar structure.


Tutorial: Call Flow
Once we have mastered voice grammars, the next logical step is learning how to move callers programmatically from one destination to another within a voice application. This tutorial shows how to write an application that uses conditional logic to let callers navigate from one point to another within a single document.

Tutorial: Document Navigation
Where the previous tutorial showed how to navigate between form items within a single document, this one covers navigating between separate VoiceXML 2.1 documents, a technique used in most production-grade IVR deployments. It also introduces variable declarations and the all-important topic of variable scoping within the VoiceXML 2.1 context.

Tutorial: Using Audio Files
Text to Speech is fine for development and demo purposes, but recorded audio files add a much more professional feel to your application. This tutorial covers correct sound formatting and shows how to record and encode a sound file for inclusion in an application. Also included, at no additional charge, is a sample inline voice grammar that defines explicit return values and scope activation. We also cover Universal Events and how to trap and handle these navigational or recognition events elegantly when they arise.

Tutorial: Call Transfer
Here, we cover the finer aspects of placing a bridged call from a voice-enabled application to an external PSTN destination. After completing this lesson, you will know how to place an outbound call within the VoiceXML 2.1 framework, and how to intelligently trap the events and errors that may occur.
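
To make the idea concrete, here is a hedged sketch of a bridged transfer that checks the outcome and traps connection errors. The destination number, timeout, form id, and prompt wording are placeholders chosen for illustration, not values from the tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="xfer">
    <!-- dest is a placeholder number; a bridged transfer keeps the platform in the call path. -->
    <transfer name="call" dest="tel:+15555550100" bridge="true" connecttimeout="20s">
      <prompt>Please hold while we connect your call.</prompt>
      <!-- After a bridged transfer ends, the form item variable holds the outcome. -->
      <filled>
        <if cond="call == 'busy'">
          <prompt>That line is busy.</prompt>
        <elseif cond="call == 'noanswer'"/>
          <prompt>Nobody answered.</prompt>
        </if>
      </filled>
      <!-- Catches error.connection and its more specific subtypes. -->
      <catch event="error.connection">
        <prompt>Sorry, the call could not be placed.</prompt>
      </catch>
    </transfer>
  </form>
</vxml>
```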

Tutorial: Caller ID and Called ID
This tutorial explains the benefits and caveats of capturing caller ID and called ID, which matters in real-world scenarios where we must keep track of who called which application number. By the time you finish this section, you will understand the telephone communication standards relevant to caller ID and called ID and how they apply to voice-web applications.

Tutorial: DTMF Recognition
Now that we have mastered voice grammars, it is time to look at how to capture DTMF (Dual-Tone Multi-Frequency) input from a touch-tone phone. This tutorial shows the best way to create a DTMF grammar for caller interactions and then let the caller's choices dictate a course of action within the application.
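
A minimal sketch of this pattern follows. It uses an inline SRGS XML grammar in DTMF mode rather than the GSL syntax used elsewhere in these tutorials, and it assumes the platform returns the pressed digit as the field value when the grammar carries no semantic tags. The menu options and form ids are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="mainmenu">
    <field name="choice">
      <prompt>For sales, press 1. For support, press 2.</prompt>
      <!-- Inline DTMF grammar that accepts a single key press: 1 or 2. -->
      <grammar mode="dtmf" version="1.0" root="digit" type="application/srgs+xml">
        <rule id="digit">
          <one-of>
            <item>1</item>
            <item>2</item>
          </one-of>
        </rule>
      </grammar>
      <nomatch>
        <prompt>Sorry, that is not a valid option.</prompt>
        <reprompt/>
      </nomatch>
      <!-- The caller's key press decides which form runs next. -->
      <filled>
        <if cond="choice == '1'">
          <goto next="#sales"/>
        <else/>
          <goto next="#support"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="sales">
    <block><prompt>You chose sales.</prompt></block>
  </form>
  <form id="support">
    <block><prompt>You chose support.</prompt></block>
  </form>
</vxml>
```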

Tutorial: JavaScript and VoiceXML
Now that we have covered the basics, this tutorial will teach you how to seamlessly integrate JavaScript within your VoiceXML code by way of the <script> element. As JavaScript adds a robust arsenal of features within VoiceXML, you can use this lesson as a springboard for adding all the great things in your JavaScript library to future voice-web deployments.
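
As an illustration only, the sketch below defines a small function in a document-level <script> element and calls it from a prompt. The function name greetingFor and the greeting wording are hypothetical, and the hour comes from the clock of the platform running the voice browser, not from the caller's location.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <script><![CDATA[
    // Hypothetical helper: picks a greeting from the hour of the day (0 to 23).
    function greetingFor(hour) {
      if (hour < 12) { return 'Good morning'; }
      if (hour < 18) { return 'Good afternoon'; }
      return 'Good evening';
    }
  ]]></script>
  <form id="greet">
    <block>
      <!-- The expr attribute is evaluated as ECMAScript when the prompt is queued. -->
      <prompt><value expr="greetingFor(new Date().getHours())"/>, and thanks for calling.</prompt>
    </block>
  </form>
</vxml>
```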


Tutorial: Subgrammars
Flat-file grammars are usually good enough for capturing simple utterances, but eventually we will want a grammar structure with a bit more power and flexibility. Covering the fine art of multi-level complex grammars, this lesson shows the basics of creating a simple GSL subgrammar construct for applications that need more complex utterance recognition.


Tutorial: grXML Grammar Weighting
When you have to second-guess what your callers are likely to say and must tune your grammars for probability accuracy, this is the section for you. Touching on the craft of GSL grammar probability weighting, this lesson shows how to raise or lower the probability that any grammar entry is recognized as a matched value. It proves invaluable when specific tuning is required for like-sounding utterance values within a GSL voice grammar.


Tutorial: Subdialogs
Read this section to learn how to create modular dialog components that will save you much time and effort when you require a multitude of recognition dialogs that follow the same input formats. In most cases, an application containing several recognition fields will do the job just fine, but in other cases a subdialog-oriented application structure lets you reuse a single dialog wherever the same input format is needed.


Tutorial: Shadow Variables
Shadow variables underlie the VoiceXML application and give you access to a bevy of information about caller input. When you have completed this tutorial, you will know how to harness shadow variables to tune your applications for maximum performance based on a caller 'footprint'.
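
For a flavor of what shadow variables expose, the sketch below logs three of the standard field shadow variables (utterance, inputmode, and confidence) after a yes/no question, and asks again when confidence is low. The 0.5 threshold, field name, and prompt wording are arbitrary assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirmation">
    <!-- type="boolean" selects the built-in yes/no grammar. -->
    <field name="confirm" type="boolean">
      <prompt>Would you like to continue?</prompt>
      <filled>
        <!-- Shadow variables are read as properties of fieldname$. -->
        <log>Heard <value expr="confirm$.utterance"/> via <value expr="confirm$.inputmode"/> with confidence <value expr="confirm$.confidence"/></log>
        <!-- Illustrative threshold: clear the field so the question is asked again. -->
        <if cond="confirm$.confidence &lt; 0.5">
          <prompt>Sorry, I am not sure I heard you correctly.</prompt>
          <clear namelist="confirm"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```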


Tutorial: N-best Lesson
N-best post-processing is indeed the Holy Grail when it comes to ensuring that you get an accurate grammar match from your callers. Learn to write your own application that uses this powerful feature by checking out this lesson on N-best post-processing.


Tutorial: Outbound VoiceXML Applications Using HTTP
Herein we learn how to use the api.voxeo.net servlet to make an outbound, notification-type call that is initiated by an HTTP request rather than by an inbound phone call.


Tutorial: Mixed Initiative Dialogs
For the advanced VoiceXML developer, this tutorial illustrates the theory of grammar scoping and presents more advanced subgrammar examples that allow for mixed-initiative dialogs. When the caller's initial input must dictate the application flow by filling multiple recognition fields with a single utterance, a mixed-initiative dialog is the methodology to employ.

Tutorial: XML Grammars - Deprecated
The XML format of grammars, more formally known as Speech Recognition Grammar Specification, is the future of all VoiceXML grammars. Unlike the simpler GSL syntax that we covered in previous tutorials, SRGS is not a vendor-specific grammar markup, and that means added portability for all your IVR applications. This tutorial details how you can craft an input grammar that is designed to last, due to its 100% compliance with the W3C specifications.

Tutorial: <foreach> and <data>
This tutorial covers how a developer can leverage the VoiceXML 2.1 <data> element to access information stored in an XML document, such as an RSS newsfeed, and import it into the application execution context using the Document Object Model. In addition, we illustrate how one can use the <foreach> element to loop through the fetched data incrementally without having to rely on client-side scripting.
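
The sketch below shows one plausible shape for this pattern, under stated assumptions: the feed URL is a placeholder, the feed is assumed to contain <title> elements, and the variable names are invented. A short script flattens the fetched DOM into an array, and <foreach> then handles the prompt loop without any further scripting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="headlines">
    <var name="titles" expr="new Array()"/>
    <block>
      <!-- Placeholder URL; the fetched XML is exposed as a read-only DOM in the variable feed. -->
      <data name="feed" src="http://example.com/news.rss"/>
      <script><![CDATA[
        // Collect the text of every title element in the fetched document.
        var nodes = feed.documentElement.getElementsByTagName('title');
        for (var i = 0; i < nodes.length; i++) {
          var text = nodes.item(i).firstChild;
          if (text) { titles.push(text.data); }
        }
      ]]></script>
      <!-- Speak each collected title in turn, pausing briefly between items. -->
      <foreach item="headline" array="titles">
        <prompt><value expr="headline"/> <break time="500ms"/></prompt>
      </foreach>
    </block>
  </form>
</vxml>
```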

Tutorial: Using the <mark> Element
This tutorial shows how to use the new VoiceXML 2.1 <mark> element to craft a more intuitive dialog that programmatically determines a caller's choice from the output markers set in a voice prompt that lists the options. This can be very handy when a wide array of similar grammar entries could obscure the actual caller input.


Introduction to Server Side Languages
This section touches on the basics of how you can use a server side language to enhance your voice application with dynamic content. Leveraging the power of ASP, JSP, or PHP greatly enhances the features of any application, and learning how to integrate dynamic markup with static VoiceXML content is a must-read for any developers new to the idea of writing a production-grade IVR application.


Passing Querystring Variables Using ASP/JSP/PHP/CF
Eventually, a developer needs to take all the caller input gathered and pass it along to a new document for post-processing, or perhaps save it in a database. This tutorial illustrates how to take caller input from a dialog, send the information to a new document, and then parse the data from the resulting querystring using a variety of server-side languages, including JSP, ColdFusion, ASP, and PHP.


Tutorial: Event Logging
In this tutorial, we build on the concepts from the preceding section to take caller input and send it to a dynamic document using the <data> element, for long-term storage in a text file that is updated automatically as input is received from the caller.


Tutorial: Dynamic Grammars
This lesson will show you how to create and implement a dynamic grammar generated from an MS Access database, thus making for a maintainable and fluid grammar design for your VoiceXML application.


Tutorial: Screen Scraping
Need a way to grab any data off a web page and make it accessible using a phone? This section and tutorial will show you how to use the ColdFusion or PHP server side languages to screen scrape a web page and send the data to your callers.


See Also

VoiceXML Development Guide v2.1 Overview

Voxeo Support

VoiceXML is an XML markup language that you can use to specify interactive voice dialogs between a human and a computer. A voice application developed in VoiceXML can be deployed in much the same way as an HTML application. Where HTML relies on a visual browser to display data, VoiceXML relies on a voice browser, for example the Voxeo Prophecy VoiceXML browser, to interpret the VoiceXML data. For more information, see the VoiceXML 2.1 Development Guide.

Hyper Text Markup Language is the predominant markup language used for web pages to display data. The acronym for Hyper Text Markup Language is HTML. HTML describes the structure of text-based information in a document by identifying text as links, headings, paragraphs, lists, and so forth. The HTML text is supplemented with interactive forms, embedded images, and other objects.


A grammar is the set of valid utterances recognized by the speech recognition system in a voice application. Grammars are commonly specified using symbolic notation that describes context-free grammars.


A dialog in a voice application is an exchange of spoken words, phrases and sentences presented as prompts that define what callers will say or hear during the course of a call. A dialog can be a menu that presents a caller with options and then transitions to another dialog based on the selection. A dialog can also be a form that defines an interaction and collects caller values for each field on a form.

Interactive voice response is a software technology that allows a computer to detect and process voice and DTMF telephone keypad input from humans. The acronym for interactive voice response is IVR. IVR systems can typically respond to inputs using voice, fax, text messaging, call transfer, and database transactions.

An event corresponds to a specific situation that may occur within an application. In a voice application, for example, a NoInput event occurs when a caller has not provided any input, and an Error Connector event occurs when the back-end is not available. Event handling can be used to deal with these situations.


A Public Switched Telephone Network is a local, long-distance, and international telephone system used by telephone subscribers for voice communications, sometimes also referred to as Plain Old Telephone Service (POTS). The acronym for Public Switched Telephone Network is PSTN.

Dual-Tone Multi-Frequency tone signaling is used for telecommunications over analog telephone lines between callers and other communication devices. The acronym for Dual-Tone Multi-Frequency is DTMF. If your IVR application only requires DTMF recognition, you can use Voxeo Prophecy Hosting for DTMF-only.

JavaScript is a dynamic scripting language, developed by Netscape, that is supported by most web browsers and web tools.


The <script> element is used to specify a block of client-side ECMAScript code. For more information, see <script> Element.

In an IVR application, an utterance is a unit of speech of the caller, usually collected in response to a prompt, which the automatic speech recognition system attempts to match against a grammar.

A servlet is a small Java program that runs within a web server. Servlets receive and respond to requests from web clients, usually across HTTP.

Notifications are messages sent immediately after system events, for example, status changes of servers, server instances, or Services. Notifications can be received by SNMP traps or through email messages, depending on the definition of the corresponding receiver.

In Voxeo CXP, you can define SNMP notifications. For more information, see the Notifications section in the Deployment Guide for your version of Voxeo CXP in the Voxeo CXP Documentation.

In Voxeo Prophecy, you can configure SNMP notifications for your Prophecy server. For more information, see Configuring SNMP in the Advanced Configuration and Tasks chapter for your version of Prophecy in the Prophecy Documentation.


Extensible markup language is the universal format for structured documents and data on the web. The acronym for extensible markup language is XML. XML is a set of rules, or formatting, for encoding documents in a machine-readable form.


The World Wide Web Consortium is an Internet standards organization responsible for many of the standards key to the functionality of the World Wide Web today. The acronym for World Wide Web Consortium is W3C. The W3C is working on voice expansions to access web content. For more information, see The Voice Browser Working Group.


The <data> element allows the developer to fetch content from an XML source without having to use any server-side logic, and without having to transition to a new dialog. For more information, see <data> Element.


Really Simple Syndication is a family of web feed formats used to publish frequently updated material in a standard format. The acronym for Really Simple Syndication is RSS.


The <foreach> element allows the developer to loop through items in an array. For more information, see <foreach> Element.


The <mark> element is used to insert markers into output streams for asynchronous notification. For more information, see <mark> Element.


A prompt is a unit of information which is presented to the caller while interacting within a dialog of a voice application. A prompt can be a prerecorded audio file, a text-to-speech string, or a combination of both.


PHP is a server-side scripting and programming language. PHP is used for web page development for dynamic web pages that can be easily embedded into HTML.


A server is a physical computer dedicated to running one or more services that serve the needs of users of other computers on the network. Depending on the computing service that it offers, a server can be a database server, file server, mail server, print server, web server, gaming server, or some other kind of server.

In Voxeo CXP, a server is a concept that corresponds to a cluster of physical server machines and is represented by a Server object. Services must be hosted on a server in order to be called. For more information, see Configuring a Server in the Configuring Servers and Services section in the Deployment Guide for your version of Voxeo CXP in the Voxeo CXP Documentation.

In Voxeo Prophecy, a Prophecy server is a Community Controller or a managed server in the Community. For more information, see Managing Servers in the Working with Servers section in the Prophecy Commander chapter for your version of Prophecy in the Prophecy Documentation.

