Hardware power for synthesizing speech #83
-
Hi everyone! First, a huge thanks to the developers behind this amazing framework and to the community for all the helpful discussions both here and in the main branch; they've really helped me dive into the TTS world.

If this is the right thread, I have a question related to my current project: I'm training a VITS model to generate speech for an LLM that will be integrated into a robot. While I can rely on cloud services like OpenAI's API for the LLM, I believe the speech synthesis needs to run locally, both because of latency requirements and because I want to use my own model. I'm aiming for real-time synthesis, or at least minimal latency.

My question is: how powerful does the robot's hardware need to be? A Raspberry Pi 5 seems a bit underpowered. Would a mini-PC be a better fit? Is CUDA acceleration essential for this task? I tested my current model (~370k steps; I'm planning to go to ~2M) on an i9-12900K without CUDA, and `tts` generated an output file in about 6 seconds, which is acceptable for me.

Thanks in advance for your insights!
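In case it helps anyone reproduce that kind of timing, here is a minimal sketch of measuring CPU-only synthesis latency with Coqui's Python API instead of the `tts` CLI. The checkpoint and config paths are placeholders for your own files, and the loading API may differ slightly between TTS versions:

```python
import time

from TTS.api import TTS

# Load a trained VITS checkpoint on CPU only (no CUDA).
# Both paths below are placeholders for your own model files.
tts = TTS(
    model_path="path/to/checkpoint.pth",
    config_path="path/to/config.json",
).to("cpu")

# Time a single end-to-end synthesis call, as a rough latency check.
start = time.perf_counter()
tts.tts_to_file(
    text="Hello, I am the robot's voice.",
    file_path="output.wav",
)
print(f"Synthesis took {time.perf_counter() - start:.2f} s")
```

Running this with a few sentences of different lengths should give a better picture than a single measurement, since latency scales with the length of the input text.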
-
You can convert Coqui's VITS models to ONNX format and then run them with sherpa-onnx; see coqui-ai#2602 (comment). I'm not sure what the minimal hardware requirements would be in that case, though; you'd need to run some tests yourself.
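For illustration, a rough sketch of that flow is below. All file names are placeholders, the `tokens.txt`/`lexicon.txt` files have to be produced separately as described in the linked comment, and the exact config fields may vary across sherpa-onnx versions:

```python
# --- Step 1: export the trained Coqui VITS checkpoint to ONNX ---
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits

config = VitsConfig()
config.load_json("config.json")          # your training config
vits = Vits.init_from_config(config)
vits.load_checkpoint(config, "checkpoint.pth")
vits.export_onnx(output_path="model.onnx")

# --- Step 2: run the exported model with sherpa-onnx on CPU ---
import sherpa_onnx
import soundfile as sf

tts = sherpa_onnx.OfflineTts(
    sherpa_onnx.OfflineTtsConfig(
        model=sherpa_onnx.OfflineTtsModelConfig(
            vits=sherpa_onnx.OfflineTtsVitsModelConfig(
                model="model.onnx",
                tokens="tokens.txt",     # token table for the model
                lexicon="lexicon.txt",   # word-to-token mapping
            ),
            num_threads=2,               # tune for the target CPU
        )
    )
)

audio = tts.generate("Hello from the robot.")
sf.write("out.wav", audio.samples, samplerate=audio.sample_rate)
```

Timing `tts.generate()` on the target board (e.g. a Pi 5 or a mini-PC) would then answer the hardware question directly, since the ONNX runtime path is typically much lighter than running the full PyTorch stack.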