Keep a model loaded to reduce subsequent generation time #185
-
Hello, I am pretty new to programming in general, so this question might sound a bit silly. I am making subsequent generation calls (`tts_to_file`), and each call takes around 14 seconds. I believe the model is loaded each time a generation call is made; is there a way to keep the model loaded at all times? For example, I am using this to test.
Replies: 2 comments 6 replies
-
This is exactly what the `tts = TTS(...)` line does. Using:

```python
import logging

from TTS.api import TTS

logging.basicConfig(level=logging.INFO)

# Load the TTS model once; it stays in memory for all later calls
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

def test_function(text):
    tts.tts_to_file(
        text=text,
        file_path="tts_output.wav",
        speaker_wav="reference.wav",
        language="en",
    )

test_function('test 1')
test_function('test 2')
test_function('test 2')
test_function('test 3')
```

I get the following output on an RTX 3090:

```
INFO:TTS.utils.manage:tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
INFO:TTS.tts.models:Using model: xtts
INFO:TTS.utils.synthesizer:Text split into sentences.
INFO:TTS.utils.synthesizer:Input: ['test 1']
INFO:TTS.utils.synthesizer:Processing time: 1.834
INFO:TTS.utils.synthesizer:Real-time factor: 1.012
INFO:TTS.utils.synthesizer:Text split into sentences.
INFO:TTS.utils.synthesizer:Input: ['test 2']
INFO:TTS.utils.synthesizer:Processing time: 0.342
INFO:TTS.utils.synthesizer:Real-time factor: 0.252
INFO:TTS.utils.synthesizer:Text split into sentences.
INFO:TTS.utils.synthesizer:Input: ['test 2']
INFO:TTS.utils.synthesizer:Processing time: 0.391
INFO:TTS.utils.synthesizer:Real-time factor: 0.251
INFO:TTS.utils.synthesizer:Text split into sentences.
INFO:TTS.utils.synthesizer:Input: ['test 3']
INFO:TTS.utils.synthesizer:Processing time: 0.349
INFO:TTS.utils.synthesizer:Real-time factor: 0.248
```

Only the first call pays a warm-up cost; the later ones finish in well under a second. You could save a little more time by precomputing the speaker embedding and reusing it (see the docs). But if every call takes 14 seconds for you, the issue must come from somewhere else; even on a CPU it's faster than that.
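Precomputing the speaker embedding is essentially a caching pattern: do the expensive reference-audio processing once and reuse the result across calls. Here is a minimal, model-free sketch of the idea; `compute_speaker_embedding` is a hypothetical stand-in for the real work (with XTTS, that work is the model's `get_conditioning_latents()` on the reference WAV):

```python
from functools import lru_cache

def compute_speaker_embedding(reference_wav: str) -> tuple:
    # Stand-in for the expensive step: in a real script this would load the
    # reference audio and run the speaker encoder. Hypothetical placeholder.
    return (reference_wav, "embedding")

@lru_cache(maxsize=8)
def cached_speaker_embedding(reference_wav: str) -> tuple:
    # Only the first call per reference file does the expensive computation;
    # repeats return the cached result immediately.
    return compute_speaker_embedding(reference_wav)

emb1 = cached_speaker_embedding("reference.wav")
emb2 = cached_speaker_embedding("reference.wav")
assert emb1 is emb2  # same cached object, no recomputation
```

In a real script you would keep the precomputed latents in a variable and pass them into the model's inference call instead of passing the `speaker_wav` path every time.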
-
I had used much longer test texts (around 100 words each) on an Nvidia T4 (16 GB) with CUDA enabled, and all calls took about the same time (~14 seconds). That's why I was curious. I see, that's all we can do :(