
ESP-SR Release V1.6

We are delighted to announce the release of our latest models, MultiNet7 and nsnet1, as well as additional wake word models trained on TTS samples.

1. MultiNet7: Speech command recognition model

We are proud to introduce MultiNet7, our new speech command recognition model. It is optimized for efficiency, using less memory and less compute time while maintaining high accuracy, and you can upgrade smoothly from MultiNet6 to MultiNet7.
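
To give a feel for how little the application code changes, here is a minimal sketch of loading a MultiNet model by name with the esp_srmodel_* helpers, in the style of the esp-skainet examples. The header names, the "model" partition label, and the 6000 ms listening timeout are assumptions and may need adjusting for your project; this is not the exact example code.

```c
#include "esp_mn_iface.h"
#include "esp_mn_models.h"
#include "model_path.h"

void multinet_init_sketch(void)
{
    // Load the model list from the flashed model partition (label assumed to be "model").
    srmodel_list_t *models = esp_srmodel_init("model");

    // Pick whichever MultiNet model is packed into the partition
    // (MultiNet7 once you flash the new model data).
    char *mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, NULL);

    // Create the MultiNet instance; the second argument is the
    // command-listening duration in milliseconds (assumed value).
    esp_mn_iface_t *multinet = esp_mn_handle_from_name(mn_name);
    model_iface_data_t *mn_data = multinet->create(mn_name, 6000);

    // ... feed 16 kHz, 16-bit mono frames to multinet->detect(mn_data, buf)
    //     and check the returned command ID ...
}
```

Because the model is resolved by name at runtime, upgrading is mostly a matter of flashing the MultiNet7 model data in place of MultiNet6.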

2. nsnet1: The first deep noise suppression model

We are also introducing nsnet1, our first deep noise suppression model. This model is designed to enhance speech quality in noisy environments, making it perfect for real-world applications like telephony systems.

nsnet1 uses a deep learning approach to suppress background noise while preserving the original speech signal. It is trained on a large dataset to learn the patterns of noise and effectively cancel them out without distorting the speech.

This model is available for the ESP32-S3 chip. You can enable it by setting `afe_config.afe_ns_mode = NS_MODE_NET;`. Please refer to esp-skainet/examples/voice_communication for more details.

Note: currently only AFE_VC supports nsnet1; AFE_SR does not.
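
As a rough sketch of what this looks like in code (based on the AFE interface used in the esp-skainet examples; handle, macro, and header names may differ slightly between esp-sr versions), enabling nsnet1 in a voice-communication AFE instance is a one-line change to the default configuration:

```c
#include "esp_afe_sr_iface.h"
#include "esp_afe_sr_models.h"

void afe_vc_init_sketch(void)
{
    // Use the voice-communication AFE pipeline (AFE_VC); as noted above,
    // AFE_SR does not support nsnet1.
    esp_afe_sr_iface_t *afe_handle = (esp_afe_sr_iface_t *)&ESP_AFE_VC_HANDLE;

    // Start from the default configuration, then switch noise suppression
    // to the neural model (nsnet1).
    afe_config_t afe_config = AFE_CONFIG_DEFAULT();
    afe_config.afe_ns_mode = NS_MODE_NET;

    esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(&afe_config);

    // ... feed input frames with afe_handle->feed(afe_data, buf) and read
    //     the denoised output via afe_handle->fetch(afe_data) ...
}
```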

3. Wake Word Models Trained by TTS

We have expanded our set of wake word models trained on TTS (text-to-speech) samples to offer more wake words to our users. By combining TTS and LLM methods, the TTS model can be trained on a large amount of unlabeled audio data through self-supervised learning. This significantly improves the TTS model's zero-shot performance, allowing us to clone voices from a large number of short audio clips (each less than 10 seconds). As a result, we can now train a wake word model using only TTS samples.

These wake word models are designed to recognize specific keywords or phrases that trigger an action or response from your device or application.
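
For illustration only, the sketch below shows how one of these wake word models could be selected and created through the WakeNet interface, following the pattern in the esp-skainet examples; the headers, the "model" partition label, and the DET_MODE_90 threshold preset are assumptions, so check them against your esp-sr version.

```c
#include "esp_wn_iface.h"
#include "esp_wn_models.h"
#include "model_path.h"

void wakenet_init_sketch(void)
{
    // Load the models packed into the flashed model partition.
    srmodel_list_t *models = esp_srmodel_init("model");

    // Select a wake word model; pass a specific wake word string instead
    // of NULL to pick one model when several are flashed.
    char *wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, NULL);

    // Create the WakeNet instance with a detection-threshold preset.
    esp_wn_iface_t *wakenet = esp_wn_handle_from_name(wn_name);
    model_iface_data_t *wn_data = wakenet->create(wn_name, DET_MODE_90);

    // ... call wakenet->detect(wn_data, buf) on 16 kHz, 16-bit mono frames;
    //     a positive return value indicates the wake word was detected ...
}
```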