Replies: 2 comments 1 reply
-
The old-school way would be to train a model on phonemes with stress labels, but of course XTTS is not even trained on phonemes. Since the model doesn't directly receive stress information, there is no way to force a specific stress; you can only hope it guesses correctly from the context once it has been trained on more data. At inference time, you could try something like capitalising the syllable that should be stressed. During training, you could try to steer the model in that direction by capitalising the stressed syllables in the training text as well.
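A minimal sketch of the capitalisation idea, in case it helps. Everything here is hypothetical: `mark_stress`, `apply_stress` and the `stress_spans` lexicon are illustrative names, not part of XTTS or any library, and the character spans would have to come from your own pronunciation data. It only prepares the text; whether the model actually picks up on the capitals is something you would have to test.

```python
def mark_stress(word: str, start: int, end: int) -> str:
    """Upper-case word[start:end] (the stressed syllable) and leave the rest unchanged."""
    if not 0 <= start < end <= len(word):
        raise ValueError(f"invalid syllable span ({start}, {end}) for {word!r}")
    return word[:start] + word[start:end].upper() + word[end:]


def apply_stress(text: str, stress_spans: dict[str, tuple[int, int]]) -> str:
    """Mark stress on every whitespace-separated word found in stress_spans.

    stress_spans is a hypothetical lexicon mapping a lowercase word to the
    (start, end) character span of its stressed syllable. Words that are not
    in the lexicon, or that have punctuation attached, are left as they are.
    """
    marked = []
    for token in text.split():
        span = stress_spans.get(token.lower())
        marked.append(mark_stress(token, *span) if span else token)
    return " ".join(marked)


# Assumed example lexicon: stress on the second syllable of "record".
stress_spans = {"record": (2, 6)}
print(apply_stress("please record this", stress_spans))
```

The same function would need to be applied to the training transcripts and to the inference text so both see the same convention. Note that any step that lowercases the text afterwards would erase the marking, so the text pipeline has to preserve case for this to have any effect.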
-
Thank you! Don't the cleaners automatically convert everything to lowercase at inference?
-
It seems that no matter how much you train the model, it will very often put the stress in a different position. Is there any way, using the vocab or something else, to hard-force the stress position so that it is always accurate, instead of the model rolling the RNG and putting the stress wherever it decides?