mod_google_transcribe

A FreeSWITCH module that generates real-time transcriptions on a FreeSWITCH channel by using Google's Speech-to-Text API.

Optionally, the connection to the Google Cloud recognizer can be delayed until voice activity has been detected. This is useful when you want to minimize the cost of streaming audio for transcription. This behavior is enabled by the START_RECOGNIZING_ON_VAD channel variable and tuned by the channel variables starting with RECOGNIZER_VAD, as described below.

API

Commands

The freeswitch module exposes two versions of an API command to transcribe speech:

version 1

uuid_google_transcribe <uuid> start <lang-code> [interim]

When using this command, additional speech processing options can be provided through Freeswitch channel variables, described below.

version 2

uuid_google_transcribe2 <uuid> start <lang-code> [interim](bool) \
[single-utterance](bool) [separate-recognition](bool) [max-alternatives](int) \
[profanity-filter](bool) [word-time](bool) [punctuation](bool) \
[model](string) [enhanced](bool) [hints](words separated by commas, no spaces) \
[play-file](play file path)

This command allows speech processing options to be provided on the command line, and has the ability to optionally play an audio file as a prompt.

Example:

bgapi uuid_google_transcribe2 312033b6-4b2a-48d8-be0c-5f161aec2b3e start en-US \
true true true 5 true true true command_and_search true \
yes,no,hello https://www2.cs.uic.edu/~i101/SoundFiles/CantinaBand60.wav
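
Because the version 2 options are positional, it is easy to put one in the wrong slot. The following is a minimal sketch of a helper that assembles the argument string in the order given by the syntax above. The function name `buildTranscribe2Args` and the option key names are hypothetical and not part of the module. The sketch also assumes that play-file cannot be supplied without hints, since both are trailing positional arguments.

```javascript
// Hypothetical helper (not part of mod_google_transcribe): builds the
// argument string for uuid_google_transcribe2 in the documented order.
function buildTranscribe2Args(uuid, langCode, opts) {
  const hints = opts.hints || [];
  // Hints are joined with commas and must not contain spaces.
  for (const hint of hints) {
    if (/[\s,]/.test(hint)) {
      throw new Error(`hint "${hint}" must not contain spaces or commas`);
    }
  }
  // Assumption: play-file is the last positional slot, so hints must precede it.
  if (opts.playFile && hints.length === 0) {
    throw new Error('playFile is positional and requires hints before it');
  }
  const args = [
    uuid, 'start', langCode,
    opts.interim, opts.singleUtterance, opts.separateRecognition,
    opts.maxAlternatives, opts.profanityFilter, opts.wordTime,
    opts.punctuation, opts.model, opts.enhanced
  ].map(String);
  if (hints.length > 0) args.push(hints.join(','));
  if (opts.playFile) args.push(opts.playFile);
  return args.join(' ');
}
```

Called with the values from the example above, the helper returns the same argument string that follows `uuid_google_transcribe2` in that command.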

Both versions attach a media bug to the channel and perform a streaming recognize request.

uuid - unique identifier of Freeswitch channel
lang-code - a valid Google language code to use for speech recognition
interim - If the 'interim' keyword is present then both interim and final transcription results will be returned; otherwise only final transcriptions will be returned

uuid_google_transcribe <uuid> stop

Stop transcription on the channel.

Command Variables

Additional Google speech options can be set through FreeSWITCH channel variables for uuid_google_transcribe (some can alternatively be set on the command line for uuid_google_transcribe2).

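Variable	Description
GOOGLE_SPEECH_SINGLE_UTTERANCE	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_SEPARATE_RECOGNITION_PER_CHANNEL	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_MAX_ALTERNATIVES	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_PROFANITY_FILTER	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_ENABLE_WORD_TIME_OFFSETS	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_ENABLE_AUTOMATIC_PUNCTUATION	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_MODEL	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_USE_ENHANCED	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_HINTS	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_ALTERNATIVE_LANGUAGE_CODES	a comma-separated list of language codes, per Google's Speech-to-Text documentation
GOOGLE_SPEECH_SPEAKER_DIARIZATION	set to 1 to enable speaker diarization
GOOGLE_SPEECH_SPEAKER_DIARIZATION_MIN_SPEAKER_COUNT	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_SPEAKER_DIARIZATION_MAX_SPEAKER_COUNT	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_METADATA_INTERACTION_TYPE	set to 'discussion', 'presentation', 'phone_call', 'voicemail', 'professionally_produced', 'voice_search', 'voice_command', or 'dictation' per Google's Speech-to-Text documentation
GOOGLE_SPEECH_METADATA_INDUSTRY_NAICS_CODE	see Google's Speech-to-Text documentation for this option
GOOGLE_SPEECH_METADATA_MICROPHONE_DISTANCE	set to 'nearfield', 'midfield', or 'farfield' per Google's Speech-to-Text documentation
GOOGLE_SPEECH_METADATA_ORIGINAL_MEDIA_TYPE	set to 'audio' or 'video' per Google's Speech-to-Text documentation
GOOGLE_SPEECH_METADATA_RECORDING_DEVICE_TYPE	set to 'smartphone', 'pc', 'phone_line', 'vehicle', 'other_outdoor_device', or 'other_indoor_device' per Google's Speech-to-Text documentation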
START_RECOGNIZING_ON_VAD	if set to 1 or true, do not begin streaming audio to Google Cloud until voice activity is detected.
RECOGNIZER_VAD_MODE	an integer from 0 to 3, from least to most aggressive voice activity detection (default: 2).
RECOGNIZER_VAD_VOICE_MS	the number of milliseconds of voice activity required to trigger the connection to Google Cloud when START_RECOGNIZING_ON_VAD is set (default: 250).
RECOGNIZER_VAD_DEBUG	if greater than 0, VAD debug logs are generated (default: 0).
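
As an illustration of the VAD-related variables, the sketch below builds a plain object of channel variable names and values, using the defaults listed in the table and rejecting a mode outside 0-3. The function name `buildVadVariables` and its option names are hypothetical. How the resulting object is applied to the channel depends on your client library.

```javascript
// Hypothetical helper (not part of mod_google_transcribe): returns the
// channel variables that delay the connection to Google until voice
// activity is detected. Defaults mirror the table above.
function buildVadVariables({ mode = 2, voiceMs = 250, debug = 0 } = {}) {
  if (!Number.isInteger(mode) || mode < 0 || mode > 3) {
    throw new RangeError('RECOGNIZER_VAD_MODE must be an integer from 0 to 3');
  }
  if (!Number.isInteger(voiceMs) || voiceMs <= 0) {
    throw new RangeError('RECOGNIZER_VAD_VOICE_MS must be a positive integer');
  }
  // Channel variable values are passed as strings.
  return {
    START_RECOGNIZING_ON_VAD: 'true',
    RECOGNIZER_VAD_MODE: String(mode),
    RECOGNIZER_VAD_VOICE_MS: String(voiceMs),
    RECOGNIZER_VAD_DEBUG: String(debug)
  };
}
```

With drachtio-fsmrf, an object like this could presumably be passed to the endpoint's method for setting channel variables before transcription is started. Treat that as an assumption and check the library's documentation.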

Events

google_transcribe::transcription - returns an interim or final transcription. The event contains a JSON body describing the transcription result:

{
	"stability": 0,
	"is_final": true,
	"alternatives": [{
		"confidence": 0.96471,
		"transcript": "Donny was a good bowler, and a good man"
	}]
}
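
A consumer of this event usually wants the most likely transcript and whether it is final. The sketch below shows one way to extract those from the JSON body shown above. The function name `parseTranscription` is hypothetical. Whether your client delivers the body as a string or as an already-parsed object is an assumption, so both are handled.

```javascript
// Hypothetical helper (not part of mod_google_transcribe): picks the
// highest-confidence alternative from a transcription event body.
function parseTranscription(body) {
  // Assumption: the body may arrive as a JSON string or a parsed object.
  const result = typeof body === 'string' ? JSON.parse(body) : body;
  const alternatives = Array.isArray(result.alternatives) ? result.alternatives : [];
  // Keep the alternative with the largest confidence; a missing confidence counts as 0.
  const best = alternatives.reduce(
    (top, alt) => (!top || (alt.confidence || 0) > (top.confidence || 0) ? alt : top),
    null
  );
  return {
    isFinal: result.is_final === true,
    stability: result.stability,
    transcript: best ? best.transcript : '',
    confidence: best ? best.confidence : undefined
  };
}
```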

google_transcribe::end_of_utterance - returns an indication that an utterance has been detected. This may be returned prior to a final transcription. This event is only returned when GOOGLE_SPEECH_SINGLE_UTTERANCE is set to true.

google_transcribe::end_of_transcript - returned when a transcription operation has completed. If a final transcription has not been returned by now, it won't be. This event is only returned when GOOGLE_SPEECH_SINGLE_UTTERANCE is set to true.

google_transcribe::no_audio_detected - returned when google has returned an error indicating that no audio was received for a lengthy period of time.

google_transcribe::max_duration_exceeded - returned when google has returned an indication that a long-running transcription has been stopped due to a max duration limit (305 seconds) on their side. It is the application's responsibility to respond by starting a new transcription session, if desired.
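
Since the application has to restart transcription itself after a max-duration stop, a minimal sketch of that loop follows. It assumes an endpoint object `ep` with a `uuid` property and an `api(command, args)` method, as in the Usage section, plus an `addCustomEventListener(eventName, handler)` method for subscribing to module events. That listener method and the function name `keepTranscribing` are assumptions here, not something this README documents.

```javascript
// Sketch under stated assumptions: restarts transcription each time the
// module reports that Google stopped a long-running session.
function keepTranscribing(ep, langCode, { interim = false } = {}) {
  // Same argument format as the version 1 command: <uuid> start <lang-code> [interim]
  const args = `${ep.uuid} start ${langCode}${interim ? ' interim' : ''}`;
  const start = () => ep.api('uuid_google_transcribe', args);
  // Assumption: the endpoint exposes addCustomEventListener for module events.
  ep.addCustomEventListener('google_transcribe::max_duration_exceeded', start);
  return start();
}
```

A real application would also want to remove the listener once transcription is deliberately stopped, so that a late event does not restart it.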

Usage

When using drachtio-fsmrf, you can access this API command via the api method on the 'endpoint' object.

ep.api('uuid_google_transcribe', `${ep.uuid} start en-US`);

Examples

google_transcribe.js
