- Our current inference pipeline supports single input only and does not support batch processing.
- We provide two inference modes: `text only` and `text & speech`. You can set the `decode_text_only` parameter in the inference script to choose your preferred mode.
- If using CosyVoice for decoding (as employed in SLAM-Omni), please take note of the following:
  - Download the corresponding CosyVoice-300M-SFT model from CosyVoice and set the `codec_decoder_path` parameter in your script to its location.
  - You can customize the output voice tone by specifying the `audio_prompt_path`. A selection of optional voices is provided in the `prompt` directory. If not specified, the default voice tone will be used.
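
The options above can be gathered in one place before they are passed to the inference script. The following is a minimal sketch, not part of the repository: the helper `build_inference_overrides` is hypothetical, and it assumes that `codec_decoder_path` must be set whenever speech is decoded, which holds for the CosyVoice setup described above but may not hold for other decoders. Only the three parameter names come from this README.

```python
from typing import Dict, Optional, Union


def build_inference_overrides(
    decode_text_only: bool,
    codec_decoder_path: Optional[str] = None,
    audio_prompt_path: Optional[str] = None,
) -> Dict[str, Union[bool, str]]:
    """Hypothetical helper: collect the decoding options into one dict."""
    overrides: Dict[str, Union[bool, str]] = {"decode_text_only": decode_text_only}

    # Text-only mode produces no audio, so the codec options are not needed.
    if decode_text_only:
        return overrides

    # Assumption: speech decoding with CosyVoice needs the downloaded model path.
    if codec_decoder_path is None:
        raise ValueError("codec_decoder_path is required for text & speech mode")
    overrides["codec_decoder_path"] = codec_decoder_path

    # Optional: leaving this unset keeps the default voice tone.
    if audio_prompt_path is not None:
        overrides["audio_prompt_path"] = audio_prompt_path

    return overrides


if __name__ == "__main__":
    # Both paths below are placeholders, not files shipped with the project.
    print(build_inference_overrides(
        decode_text_only=False,
        codec_decoder_path="/path/to/CosyVoice-300M-SFT",
        audio_prompt_path="prompt/example_voice.wav",
    ))
```

How the resulting values reach the inference script (command-line arguments, a config file, or edits to the script itself) depends on the script you use, so adapt that last step to your setup.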