Grammar Types

Following are the different types of speech recognition grammars that you can use with Interaction Speech Recognition.

Built-in grammars

Built-in grammars are a set of grammars that a speech recognition product supports by default. These grammars recognize a basic set of words that are necessary in nearly all speech recognition tasks.

The following sections describe the types of words that Interaction Speech Recognition supports in its built-in grammars.

Boolean

This built-in grammar type contains the words Interaction Speech Recognition can identify for affirmative and negative utterances.

For yes and no words, Interaction Speech Recognition returns the values of true and false, respectively.
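
As an illustration, a VoiceXML field can reference the boolean built-in grammar through the field's type attribute; the field name and prompt wording below are hypothetical:

```xml
<field name="existingCustomer" type="boolean">
  <prompt>Are you an existing customer?</prompt>
  <filled>
    <!-- The field variable holds true for a yes utterance, false for no -->
    <if cond="existingCustomer == true">
      <prompt>Welcome back.</prompt>
    </if>
  </filled>
</field>
```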

Date

This built-in grammar type contains the words that Interaction Speech Recognition can identify as a calendar date, including the month, day, and year.

Interaction Speech Recognition returns the date as a string of digits in the format of yyyymmdd.

  • yyyy represents the year (optional).

  • mm represents the month.

  • dd represents the numeric day.

If the caller does not specify a year, Interaction Speech Recognition inserts a question mark (?) for each digit of the year.

Time

This built-in grammar type contains the words that Interaction Speech Recognition can identify as a specific time as it is measured by a clock, such as 10:35 AM.

Interaction Speech Recognition returns the time as a five-character string in the format of hhmmf.

  • hh represents the hour.

  • mm represents the minute.

  • f represents the time format:

  • p = PM, such as seven PM

  • a = AM, such as seven AM

  • h = 24-hour format, such as nineteen hundred

  • ? = Ambiguous, such as seven o'clock, which does not provide context

Digits

This built-in grammar type contains the words that Interaction Speech Recognition can identify as a string of individual digits. The digits grammar type differs from the number grammar type, where the position of each digit contributes to a single numeric value.

For example, a string of digits could represent an account number, a support case number, or some other numeric identifier, such as 0045833154. Numbers, in the case of built-in grammars, represent amounts, such as 2,157 (two-thousand-one-hundred-fifty-seven).

The digits built-in grammar type also supports the word double when callers reference two identical digits in succession such as double 7 (77).

Note:

The digits built-in grammar type does not support usage of the minlength, maxlength, or length parameters. If you require a specific number of digits, you must add the functionality of verifying the length of the returned string of digits in your processing system, such as custom handlers or VoiceXML.

Interaction Speech Recognition returns the string of digits.
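
Because the digits grammar does not enforce a length, a VoiceXML dialog can check the length of the returned string itself, as the note above describes. This sketch assumes a ten-digit account number; the field name and prompt wording are hypothetical:

```xml
<field name="account">
  <grammar src="builtin:grammar/digits"/>
  <prompt>Please say your account number.</prompt>
  <filled>
    <!-- Verify the length of the returned digit string in the dialog,
         because minlength, maxlength, and length are not supported -->
    <if cond="account.length != 10">
      <prompt>I need exactly ten digits. Let's try that again.</prompt>
      <clear namelist="account"/>
    </if>
  </filled>
</field>
```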

Currency

This built-in grammar type contains the words that Interaction Speech Recognition can identify as being related to monetary amounts.

Interaction Speech Recognition returns a string in the following ISO 4217 international standard format:

UUUmmm.nn

  • UUU = Currency type, such as USD

Note:

The en-us currency built-in grammar type of Interaction Speech Recognition recognizes only dollars and cents.

  • mmm = The number of whole units of the currency type

  • nn = The number of partial units of a whole unit
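
As an illustration of this format, a VoiceXML field can use the currency built-in type; the field name, prompt wording, and sample utterances below are hypothetical:

```xml
<!-- Illustrative en-us examples of the UUUmmm.nn format:
     "four dollars and twenty-five cents" -> USD4.25
     "ninety-nine cents"                  -> USD0.99 -->
<field name="amount" type="currency">
  <prompt>How much would you like to pay?</prompt>
</field>
```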

Number

This built-in grammar type contains the words that Interaction Speech Recognition can identify as a complete numeric value; not a series of digits.

It also recognizes words that indicate a positive or negative value, such as plus, positive, minus, and negative.

For example, if a caller says one-thousand-two-hundred-thirty-four, Interaction Speech Recognition returns 1234. If the caller says a word that indicates a positive or negative value, Interaction Speech Recognition prefixes the result with a plus sign (+) or a minus sign (-), respectively.

Phone

This built-in grammar type contains the words that Interaction Speech Recognition can identify as a telephone number, including an optional extension.

For example, if a caller says, eight, eight, eight, five, five, five, one, two, three, four, extension five, six, seven, Interaction Speech Recognition returns 8885551234x567.

A caller can also say numbers in place of digits, such as eight hundred (800) and forty-seven (47).

Custom grammars

Custom grammars are grammars that you create for one or more reasons, such as the following:

  • The built-in grammars do not provide enough recognition entries for all situations.

  • The needs of the business require recognition of a specific set of words, such as names of products, names of subsidiary companies, industry-specific words (medical, technology, security, agriculture, government, and so on), or names of cities, states, or countries.

You create custom grammars as text files that specify the words and optional words for Interaction Speech Recognition to recognize. All custom grammars use the formats and syntax specified by the Speech Recognition Grammar Specification (SRGS) standard. For more information, see Grammar Syntax.
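
As a sketch, a minimal SRGS XML custom grammar for a hypothetical set of product names might look like the following; the rule name and vocabulary are illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="product">
  <rule id="product" scope="public">
    <!-- An optional lead-in word the caller may or may not say -->
    <item repeat="0-1">the</item>
    <one-of>
      <item>basic plan</item>
      <item>premium plan</item>
      <item>enterprise plan</item>
    </one-of>
  </rule>
</grammar>
```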

Tip:

When creating a custom grammar, ensure that you do not create circular references that result in infinite processing. Used carefully, however, recursive grammars can make your custom grammars much more efficient.
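
For illustration, a right-recursive rule such as the following matches one or more spoken digits by referencing itself at the end of the rule, so each expansion consumes a digit before recursing and the grammar always terminates; the rule names are hypothetical:

```xml
<rule id="digitString">
  <ruleref uri="#digit"/>
  <!-- The optional self-reference comes last (right recursion),
       avoiding a circular reference that never terminates -->
  <item repeat="0-1"><ruleref uri="#digitString"/></item>
</rule>
<rule id="digit">
  <one-of>
    <item>zero</item> <item>one</item> <item>two</item>
    <item>three</item> <item>four</item> <item>five</item>
    <item>six</item> <item>seven</item> <item>eight</item>
    <item>nine</item>
  </one-of>
</rule>
```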

Note:

Genesys recommends that you divide grammars that require more than a gigabyte of memory into smaller grammars to avoid running out of memory when they load.

Preloaded grammars

Sometimes, grammars are very large or complex. Downloading, compiling, and processing these large or complex grammars can impact performance, both in audio communications and with the resources of the processing system. For these reasons, you can specify that the processing systems preload those grammars. Processing systems outside an active speech recognition session download, compile, and cache preloaded grammars. When a speech recognition session then requires use of one of these grammars, the processing system already has the grammar in memory and can recognize utterances quickly.

You do not need to preload all grammars, as each preloaded grammar occupies memory on the processing systems. In most situations, grammar files are small or simple enough that downloading, compiling, and processing occurs without impacting the session or the performance of the processing system.

Genesys recommends that you preload grammars only if testing of a non-preloaded grammar causes performance problems, such as delays in responsiveness and audio quality issues.

VoiceXML grammars

VoiceXML is a W3C standard for using text files to facilitate interactive voice conversations between people and computers. VoiceXML uses text-based files with specialized eXtensible Markup Language (XML) elements that can define words for Text-to-Speech (TTS), speech recognition, grammars, conversation management, and playing recorded audio.

If you use Interaction Speech Recognition with a VoiceXML system, your VoiceXML files can include or reference grammars that Interaction Speech Recognition uses. VoiceXML grammars also use the Speech Recognition Grammar Specification (SRGS) standard.

When Interaction Speech Recognition recognizes a speech pattern based on the supplied grammar, it returns that data to the VoiceXML interpreter.

Pronunciation lexicon documents

A grammar can optionally reference one or more external pronunciation lexicon documents. Interaction Media Server supports the loading, processing, and usage of external pronunciation lexicons in Interaction Speech Recognition as required by the Speech Recognition Grammar Specification (SRGS) 1.0. For more information about SRGS requirements regarding external pronunciation lexicons, see https://www.w3.org/TR/speech-grammar/#S4.10.
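
Per section 4.10 of the SRGS specification, a grammar references an external pronunciation lexicon with a lexicon element that appears before its rules; the grammar content and lexicon URI in this sketch are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="name">
  <!-- External pronunciation lexicon (hypothetical URI) -->
  <lexicon uri="http://www.example.com/names.pls"/>
  <rule id="name" scope="public">
    <item>Anna</item>
  </rule>
</grammar>
```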

User-defined dictionaries

Beginning with CIC 2019 R3, Interaction Speech Recognition supports user-defined dictionaries. This feature allows you to specify alternative pronunciations of uncommon names and words to use for speech recognition and in the playing of prompts in speech synthesis.

Pronunciation Lexicon Specification (PLS) is a definition that allows automated speech recognition and text-to-speech engines to use external dictionaries during speech recognition and speech synthesis. For more information, see https://www.w3.org/TR/pronunciation-lexicon/.

You specify the alternative pronunciations in a lexicon file and the system matches them during grammar-based speech recognition. This feature is especially useful for recognition of dialects and names where pronunciations can vary. The system will also match default pronunciations during speech recognition.

VoiceXML

The VoiceXML standard supports a feature that can use external dictionaries with speech recognition and TTS engines. For speech recognition, you specify the dictionary in the grammar file in a specially defined lexicon element. You can specify multiple dictionaries at different points in the IVR dialog flow. For TTS prompts, you specify the dictionary in the lexicon element defined inside a prompt element. For more information about these definitions and the VoiceXML standard, see https://www.w3.org/TR/voicexml20/.

In CIC, reference the VoiceXML document containing the lexicon in the Document URI field of a handler subroutine. For more information about handler setup, see https://help.genesys.com/cic/mergedProjects/wh_tr/desktop/pdfs/voicexml_tr.pdf. For more information about specifying lexicons in grammar files, see https://www.w3.org/TR/speech-grammar/#S4.10.

Example of a lexicon file with an alternate pronunciation for the name Anna

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="x-inin-arpabet" xml:lang="en-US">
  <lexeme>
    <grapheme>Anna</grapheme>
    <phoneme>ah n ah</phoneme>
  </lexeme>
</lexicon>

Example of lexicon usage in VoiceXML

<?xml version="1.0" encoding="UTF-8"?>
<!--
ASR of name Anna followed by a greeting
-->
<vxml xmlns="http://www.w3.org/2001/vxml"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2001/vxml
   http://www.w3.org/TR/voicexml20/vxml.xsd"
   version="2.0">
  <form>
    <field name="name">
      <prompt>
         Please say your name
      </prompt>
      <grammar type="application/srgs+xml" src="name.grxml"/>
    </field>
    <filled>
      <prompt>
        <lexicon uri="file:///path/to/file/alt_pron_for_anna.pls"/>
        Hello <value expr="name"/>
      </prompt>
      <exit/>
    </filled>
  </form>
</vxml>