Supported Resources

Use this page to select the resources that are supported for this MRCP server. The options are:

Speech Recognition: This is a full speech recognition resource that can receive a media stream containing audio and interpret it into recognition results. It also has a natural language semantic interpreter that post-processes the recognized data according to the semantic data in the grammar and provides semantic results along with the recognized input. The recognition resource may also support enrolled grammars, where the client can enroll and create new personal grammars for use in future recognition operations.
DTMF Recognition: This is a recognition resource capable of extracting and interpreting DTMF digits in a media stream, and matching them against a supplied digit grammar. It may also perform semantic interpretation based on semantic tags in the grammar.
Speech Synthesizer: This is a full-capability speech synthesis resource capable of rendering speech from text. Such a synthesizer should have full SSML [25] support.
Basic Synthesizer: This is a speech synthesizer resource with very limited capabilities that can generate its media stream exclusively from concatenated audio clips. The speech data is described using a limited subset of SSML [25] elements. A basic synthesizer must support the SSML tags <speak>, <audio>, <say-as> and <mark>.
Speak Verify: This is a resource capable of verifying the authenticity of a claimed identity by matching a media stream containing spoken input to a pre-existing voice-print. This may also involve matching the caller's voice against more than one voice-print, also called multi-verification or speaker identification.
Recorder: This is a resource capable of recording audio and saving it to a URI. A recorder should provide some end-pointing capabilities for suppressing silence at the beginning and end of a recording, and may also suppress silence in the middle of a recording. If such suppression is done, the recorder must maintain timing metadata to indicate the actual time stamps of the recorded media.
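
To illustrate the Basic Synthesizer restriction described above, the following is a minimal sketch in Python that checks whether an SSML document uses only the four required tags (<speak>, <audio>, <say-as> and <mark>). The sample document, the audio URL, the mark name and the helper name unsupported_tags are all hypothetical and chosen for illustration; a real basic synthesizer may accept additional elements beyond the required four.

```python
import xml.etree.ElementTree as ET

# The four SSML tags a basic synthesizer must support, per the list above.
BASIC_TAGS = {"speak", "audio", "say-as", "mark"}

# Hypothetical SSML document that stays inside the basic subset.
SAMPLE = """<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <audio src="http://example.com/prompts/balance.wav"/>
  <mark name="after_prompt"/>
  <say-as interpret-as="cardinal">42</say-as>
</speak>"""


def unsupported_tags(ssml_text):
    """Return the sorted tag names that fall outside BASIC_TAGS."""
    root = ET.fromstring(ssml_text)
    found = set()
    for element in root.iter():
        # ElementTree reports namespaced tags as "{namespace}name".
        local_name = element.tag.split("}", 1)[-1]
        if local_name not in BASIC_TAGS:
            found.add(local_name)
    return sorted(found)


if __name__ == "__main__":
    print(unsupported_tags(SAMPLE))
```

A document that uses an element such as <prosody> would be reported by this check, which suggests it should be sent to a Speech Synthesizer resource instead.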

By default, Speech Synthesizer is selected as a supported resource.
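
As a rough sketch of how a client might reason about the selection on this page, the Python snippet below maps each option label to the resource type token that MRCPv2 (RFC 6787) defines for it. Whether this particular server advertises its resources with these exact tokens is an assumption, and the names RESOURCE_TYPES, DEFAULT_SELECTED and enabled_resource_types are hypothetical.

```python
# Option labels from this page mapped to MRCPv2 (RFC 6787) resource type
# tokens. Assumption: the server follows the MRCPv2 naming.
RESOURCE_TYPES = {
    "Speech Recognition": "speechrecog",
    "DTMF Recognition": "dtmfrecog",
    "Speech Synthesizer": "speechsynth",
    "Basic Synthesizer": "basicsynth",
    "Speak Verify": "speakverify",
    "Recorder": "recorder",
}

# Only Speech Synthesizer is selected by default, as stated above.
DEFAULT_SELECTED = frozenset({"Speech Synthesizer"})


def enabled_resource_types(selected=DEFAULT_SELECTED):
    """Return resource type tokens for the selected labels, in page order."""
    unknown = set(selected) - set(RESOURCE_TYPES)
    if unknown:
        raise ValueError(f"unknown resource labels: {sorted(unknown)}")
    return [token for label, token in RESOURCE_TYPES.items() if label in selected]


if __name__ == "__main__":
    print(enabled_resource_types())
```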