Replies: 1 comment
-
Ahh, never mind, answering my own question here: it looks like the generate loop breaks when a row of EOS tokens is encountered. I'm assuming this is the reason for the output discrepancy.
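For anyone else hitting this, the early-stopping behavior described above can be sketched like so. This is a minimal illustration of batched autoregressive generation, not the actual audiolm-pytorch code; `EOS_ID`, `step_fn`, and the function names are all hypothetical:

```python
EOS_ID = 2  # hypothetical end-of-sequence token id

def generate(step_fn, batch_size, max_steps):
    """Batched generation that stops early once every sequence in the
    batch emits EOS at the same step (i.e. a whole "row" of EOS tokens).
    step_fn is assumed to return one next-token id per sequence."""
    sequences = [[] for _ in range(batch_size)]
    for _ in range(max_steps):
        next_tokens = step_fn(sequences)
        for seq, tok in zip(sequences, next_tokens):
            seq.append(tok)
        # the entire row is EOS: break out instead of running max_steps,
        # which is why the progress bar appears to finish early
        if all(tok == EOS_ID for tok in next_tokens):
            break
    return sequences
```

Because the loop exits as soon as the EOS row appears, the tqdm bar for that stage never reaches `max_steps`, which matches the "stops early" output in the notebook.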
-
Hi Everyone,
I'm assuming this isn't an issue, but I've trained the entire pipeline a few times and am struggling to get intelligible speech results. I've noticed that during inference, when I call the audiolm pipeline, the semantic transformer seems to stop early according to the output in my Jupyter notebook. I'm assuming this is just a display discrepancy and that inference finishes faster than the progress bar (it looks like tqdm) updates.
```python
generated_wav = audiolm(prime_wave_path="/audio/generated_samples/test_audio_primer.wav")
```
The output:
Training params:
I'm using the LibriSpeech corpus, which is about 1k hours of speech. I download the train and test sets, convert the files from flac to wav using ffmpeg, and sort them into training and validation folders. I'm largely using the defaults from the README for everything and using Encodec. Happy to provide the complete training scripts if needed.
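For reference, the data-prep step looks roughly like this. A minimal sketch only: the validation fraction, seed, and file names are illustrative defaults, not values from my actual scripts, and the flac-to-wav step is just a plain ffmpeg invocation per file:

```python
import random

# each source file is first converted with something like:
#   ffmpeg -i clip.flac clip.wav
# then the resulting .wav paths are partitioned into the two folders

def split_dataset(wav_paths, valid_fraction=0.05, seed=0):
    """Shuffle the converted .wav files and partition them into
    training and validation lists (moving them into folders is up
    to the caller). valid_fraction and seed are illustrative."""
    paths = list(wav_paths)
    random.Random(seed).shuffle(paths)
    n_valid = max(1, int(len(paths) * valid_fraction))
    return paths[n_valid:], paths[:n_valid]  # (train, valid)
```

The deterministic seed keeps the same clips in the validation folder across reruns, so trainer checkpoints stay comparable between pipeline stages.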