Is your feature request related to a problem? Please describe.
I mainly use RVC to voice different characters. While it works well enough most of the time, in some cases like screams, breaths, laughs, or vocal fry, the algorithm kind of bugs out and can't follow well, making the result sound really weird.
Describe the solution you'd like
I'm aware some settings under the hood could be tweaked to get better results; however, these settings aren't exposed to the user. It would be great to have presets to select for inference and training that optimize quality for speech or for singing. For example: male speech, female speech, children's speech, male singing, female singing, etc. That could cover the vocal range of each character more accurately.
Describe alternatives you've considered
Right now, I've found that checkpoint fusion can help a tiny bit to extend the vocal range; however, the voice is no longer faithful to the original.
Additional context
If that's not possible, could we make a pre-trained or separate breath/scream/laugh model that focuses only on those sounds? Then we could blend the "voice noises (like breath, etc.)" model with the speech model of the same character.
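The blending idea above is essentially what checkpoint fusion does: linearly interpolate the weights that two models of the same architecture share. A minimal, framework-free sketch of that interpolation (plain dicts of float lists stand in for real tensor state dicts; `blend_checkpoints` and the parameter names are hypothetical, not RVC's actual API):

```python
def blend_checkpoints(state_a, state_b, alpha=0.7):
    """Return alpha*A + (1-alpha)*B for every parameter both models share.

    state_a / state_b: dicts mapping parameter names to flat lists of
    floats (a stand-in for real tensor state dicts). Parameters present
    in only one model, or with mismatched sizes, are copied unchanged.
    """
    blended = {}
    for name in set(state_a) | set(state_b):
        wa, wb = state_a.get(name), state_b.get(name)
        if wa is not None and wb is not None and len(wa) == len(wb):
            # element-wise linear interpolation between the two models
            blended[name] = [alpha * a + (1 - alpha) * b for a, b in zip(wa, wb)]
        else:
            blended[name] = wa if wa is not None else wb
    return blended

# Example: a speech model blended 50/50 with a hypothetical
# breath/scream model of the same character
speech = {"enc.0": [1.0, 2.0], "dec.0": [4.0]}
noises = {"enc.0": [3.0, 0.0], "dec.0": [0.0]}
mix = blend_checkpoints(speech, noises, alpha=0.5)
# mix["enc.0"] == [2.0, 1.0], mix["dec.0"] == [2.0]
```

In a real setup the same loop would run over `torch.load(...)` state dicts; the trade-off the thread describes (extended range but a less faithful voice) comes directly from this averaging, since every shared weight is pulled toward the second model.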
The text was updated successfully, but these errors were encountered:
Well, if you use a larger training dataset that includes the voices you mentioned, the model may be able to recognize them. Theoretically, the model has the capability to learn any voice feature.
In that case, does it affect training quality? Many tutorials say the dataset should have a very coherent and stable voice. However, things like whispering, screams, and laughs are very different from the common spoken voice, even when they come from the same person. So, would it be beneficial to train two models: one only for screams/breaths and one only for spoken word?
I'm not really sure about that because I haven't tested it.
> the dataset should have a very coherent and stable voice.

Yes, because the dataset used for training in RVC is usually quite small. By "larger" I mean a dataset on the same scale as the one used to train the pre-trained model.