Hello,
I fine-tuned the libritts2k model on some custom data of mine (roughly 15 minutes of speech). The output from the inference demo is pretty good, but it doesn't sound like the voice in the custom data. Do I have to fine-tune the model longer? The best results are typically after 5000 iterations. Or do I have to change some code in the inference.py file? Or do I have a grave misunderstanding of how to produce a custom dataset voice?
Any advice would be welcome, thank you.