Hello fellow TTS enthusiasts!
I'm working on a project using F5-TTS for an AI voice assistant, and I'm curious about how others are using it. So I came here to ask: how do you integrate F5-TTS into your Python scripts? Also, any tips on speeding up inference would be awesome!
In another thread (https://github.com//issues/224) I found these suggestions from SWivid for speeding things up:
"Use less nfe_step for speed-quality trade-off.
Try distillation techniques.
Train a smaller model from scratch if single-language needed application scenario."
When used in a script, where can you define the nfe_step parameter? I tried changing it in my .toml file, but it did not seem to make any speed difference. With the help of many YouTube tutorials I also managed to train my own model and then pointed to it as the ckpt file in the .toml file; I'd like to ask whether this is the correct way to do it.
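For scripted use, nfe_step is usually passed at inference time rather than only through the CLI's .toml config. A hedged sketch follows: the `f5_tts.api.F5TTS` wrapper and its `infer(..., nfe_step=...)` keyword exist in recent releases, but the exact names may differ in your installed version, so treat the commented call as an assumption to verify. The small runnable helper below just illustrates the roughly linear cost-per-step relationship:

```python
# Sketch: passing nfe_step when calling F5-TTS from Python.
# Assumption: the f5_tts.api.F5TTS wrapper as found in recent releases;
# check the signatures against your installed version.
#
#   from f5_tts.api import F5TTS
#   tts = F5TTS(ckpt_file="path/to/your_model.pt")   # your own checkpoint
#   wav, sr, _ = tts.infer(
#       ref_file="ref.wav", ref_text="reference transcript",
#       gen_text="Hello there!",
#       nfe_step=16,          # fewer ODE solver steps = faster, lower quality
#       file_wave="out.wav",
#   )
#
# The step count scales synthesis cost roughly linearly, so a quick
# back-of-envelope estimate of the speedup from lowering it:
def estimated_speedup(default_steps: int, new_steps: int) -> float:
    """Relative speedup under a linear cost-per-step model."""
    return default_steps / new_steps

print(estimated_speedup(32, 16))  # halving the steps is about 2x faster
```

Note that this only speeds up the synthesis itself; it does nothing for model-loading time, which is why a config change can look like it "made no difference" when startup dominates.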
Speed was the same even though the model I trained was 10-15 times smaller than the default one. So I figured the speed gains are being lost while the model is loaded into memory and F5-TTS starts up its other processes. Gradio is also much faster than my Python code: 2-3 seconds on Gradio vs. 15 in Python, with the new model.
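One way to confirm that the time is going into startup rather than synthesis is to time the two phases separately. The snippet below sketches the idea with placeholder load/infer functions (made up for illustration; the real calls would be your F5-TTS checkpoint load and inference):

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Placeholders standing in for the real F5-TTS calls.
def load_model():           # one-time cost: checkpoint + vocoder load
    time.sleep(0.2)
    return "model"

def infer(model, text):     # per-request cost
    time.sleep(0.05)
    return f"audio({text})"

model, load_s = timed(load_model)
_, infer_s = timed(infer, model, "hello")
print(f"load: {load_s:.2f}s  infer: {infer_s:.2f}s")
# If load_s dominates, a smaller model or fewer nfe_steps won't help
# script latency much -- keeping the model resident will.
```

This also explains the Gradio-vs-script gap: the Gradio app loads the model once when it starts and then only pays the inference cost per request, while a script that is launched per utterance pays the load cost every time.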
Basically, with my limited Python knowledge I wrapped this up into a function that takes an incoming string and turns it into speech. I soon realized this is not optimal, since the next time the AI outputs something, F5-TTS has to start all over again. How would one keep F5-TTS running and ready for the next input?
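The usual pattern is a long-lived worker: load the model once at startup, then feed it requests over a queue instead of re-launching anything. Here is a minimal self-contained sketch of that pattern; the model load and synthesis are placeholders (in a real setup they would be your F5-TTS load and infer calls, e.g. via the f5_tts API, which is an assumption to verify against your install):

```python
import queue
import threading

class TTSWorker:
    """Load the (expensive) TTS model once, then serve many requests."""

    def __init__(self):
        self.requests = queue.Queue()
        self.results = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _load_model(self):
        # Placeholder for the one-time heavy load (checkpoint, vocoder...).
        # Real version: something like f5_tts.api.F5TTS(ckpt_file=...).
        return lambda text: f"<audio for {text!r}>"

    def _run(self):
        model = self._load_model()           # paid once, not per request
        while True:
            text = self.requests.get()
            if text is None:                 # shutdown sentinel
                break
            self.results.put(model(text))    # reuse the already-loaded model

    def say(self, text):
        self.requests.put(text)
        return self.results.get()

    def close(self):
        self.requests.put(None)
        self._thread.join()

worker = TTSWorker()
print(worker.say("hello"))
print(worker.say("world"))   # no reload between calls
worker.close()
```

The same idea works without threads if you simply keep the loaded model object alive in your assistant's main loop and call infer on it per utterance; the worker form just keeps synthesis off the main thread.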
This became a long one! Thanks in advance!