TTS without Text? #18

Open
ErfolgreichCharismatisch opened this issue Jan 23, 2021 · 2 comments

Comments

@ErfolgreichCharismatisch

As I understand it, this TTS algorithm works on audio files that have no text assigned to them.

  1. How would it understand the content and the language?
  2. Does it work only with the LJ Speech dataset, or with any dataset in the LJ Speech structure?
@ivanvovk
Owner

ivanvovk commented Jan 23, 2021

@ErfolgreichCharismatisch modern TTS models consist of two parts: a feature generator and a vocoder. The feature generator produces low-dimensional time-frequency acoustic features from text, while the vocoder reconstructs the raw waveform from these features. The two models are trained separately. WaveGrad is the second part, the vocoder. It takes acoustic features (mel-spectrograms) as input, not text, so it never needs to understand content or language, and it can be trained on an arbitrary dataset.
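To make the two-stage split concrete, here is a minimal toy sketch of the data flow. It is not WaveGrad's code or API: the function names, `N_MELS = 80`, `HOP_LENGTH = 256`, and `frames_per_char` are all assumptions chosen for illustration, and both stages return placeholder zeros instead of running a real model. The only point is the interface: text enters stage 1, and stage 2 sees nothing but mel frames.

```python
from typing import List

N_MELS = 80        # assumed number of mel bands per frame (illustrative)
HOP_LENGTH = 256   # assumed number of waveform samples per mel frame (illustrative)

# A mel-spectrogram as nested lists: [n_frames][N_MELS]
MelSpectrogram = List[List[float]]


def toy_feature_generator(text: str, frames_per_char: int = 5) -> MelSpectrogram:
    """Stage 1 stand-in: text -> acoustic features.

    Hypothetical placeholder; a real feature generator is a trained model.
    """
    n_frames = len(text) * frames_per_char
    return [[0.0] * N_MELS for _ in range(n_frames)]


def toy_vocoder(mel: MelSpectrogram) -> List[float]:
    """Stage 2 stand-in (the role a vocoder such as WaveGrad plays).

    Takes acoustic features only; it never receives the text.
    """
    n_samples = len(mel) * HOP_LENGTH
    return [0.0] * n_samples


mel = toy_feature_generator("hello")
wav = toy_vocoder(mel)
print(len(mel), len(mel[0]), len(wav))  # 25 80 6400
```

Because the vocoder's training pairs are (mel-spectrogram, waveform), and the mel-spectrogram can be computed from the waveform itself, audio without transcripts is enough to train that second stage.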

@ErfolgreichCharismatisch
Author

Interesting. So which feature generator(s) does it work with out of the box?
