@ErfolgreichCharismatisch Modern TTS models consist of two parts: a feature generator and a vocoder. The feature generator produces low-dimensional time-frequency acoustic features from text, while the vocoder reconstructs the raw waveform from these features. The two models are trained separately. WaveGrad corresponds to the second part, the vocoder. It takes acoustic features (mel-spectrograms) as input, not text. Because mel-spectrograms are computed directly from the audio, it can be trained on an arbitrary audio dataset, with no transcripts required.
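To make the "no text needed" point concrete, here is a minimal sketch of how a vocoder training pair can be derived from audio alone. It is not WaveGrad's actual preprocessing: the function names (`mel_filterbank`, `log_mel_spectrogram`) and the tiny parameters (`n_fft=64`, `hop=32`, `n_mels=8`) are assumptions chosen for illustration, and it uses only the Python standard library with a naive DFT so it runs anywhere. A real pipeline would use an FFT-based library and much larger settings.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common HTK-style mel formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_spectrogram(wave, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        frame = [wave[start + n] * window[n] for n in range(n_fft)]
        # Naive DFT power spectrum over the non-negative frequency bins.
        power = []
        for k in range(n_fft // 2 + 1):
            s = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            power.append(abs(s) ** 2)
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([math.log(e + 1e-6) for e in mel])
    return frames


# Stand-in for a recording: 0.25 s of a 440 Hz sine at 8 kHz. No transcript anywhere.
sample_rate = 8000
wave = [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(sample_rate // 4)]

mel = log_mel_spectrogram(wave, sample_rate)

# A vocoder training example: conditioning features in, target waveform out.
training_pair = (mel, wave)
```

Both halves of `training_pair` come from the same audio file, which is why a vocoder can be trained on recordings without any assigned text. Text only enters the picture in the separate feature generator, which learns to predict the mel-spectrogram from a transcript.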
As I understand it, this TTS algorithm works with your own audio files, with no text assigned to them.