Unexpected or poor results with audio transcription? Check your audio mix... #559
jaygooby
started this conversation in
Show and tell
Replies: 1 comment
Thank you for sharing this information!
Spent more than a day puzzling over why we couldn't get anything like a recognisable transcription from a file that MacWhisper, with the free small model, had no issues with.
We just had endless variations of:
or sometimes:
Spent ages faffing around with different params, and even ended up brute-forcing the temperatures and temperature increments to see if that made any difference. Then my colleague @boblete said, "Have you tried making a mono version of the audio?"
Oh no.
The video we were extracting the audio from had either been recorded with a polar mic, or the stereo mix had been done with the spoken audio on one channel and the backing audio on the other.
When we made the wav from the mp4:
whisper takes our wav and makes a mono version, and the first channel happens to have only the backing audio on it, so there was no speech to transcribe. Hence our lack of anything sensible out of the model: it was just getting the backing audio and no spoken words.
You can mimic the process of what whisper was doing with the mono channel like this:
and in our case that `mono.wav` just has backing audio: the narration has been dropped.
So just a heads up that if you're getting unexpected or poor results with audio transcription, check your audio mix!
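One way to check the mix is to split the stereo file into one mono file per channel and listen to each. The sketch below is only an illustration in Python, not the command from the original post. It assumes a 16-bit stereo PCM WAV, and the function name `split_stereo` and all file names are hypothetical.

```python
import array
import wave


def split_stereo(src_path, left_path, right_path):
    """Write each channel of a 16-bit stereo PCM WAV to its own mono WAV."""
    with wave.open(src_path, "rb") as src:
        # Assumption: the input is 16-bit stereo PCM; anything else is rejected.
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo PCM WAV")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # Samples are interleaved (L, R, L, R, ...), so a stride of 2 picks out
    # one channel. The values are only copied, never modified.
    channels = ((left_path, samples[0::2]), (right_path, samples[1::2]))
    for path, channel in channels:
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(channel.tobytes())
```

Calling something like `split_stereo("output.wav", "left.wav", "right.wav")` and playing back the two results should make it obvious if the narration lives on only one channel.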
You can ensure the mono mix you give to whisper has both source channels in it if you make your `.wav` like this:
Hope this helps save your sanity!
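To illustrate what a mono mix containing both source channels means, here is a minimal Python sketch that averages the left and right samples of each frame. It is not the command from the original post. It assumes a 16-bit stereo PCM WAV, and the function name `downmix_to_mono` and all file names are hypothetical.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average both channels of a 16-bit stereo PCM WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        # Assumption: the input is 16-bit stereo PCM; anything else is rejected.
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo PCM WAV")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved (L, R, L, R, ...). Averaging each pair keeps
    # both channels in the result and cannot overflow the 16-bit range.
    left, right = samples[0::2], samples[1::2]
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

For example, `downmix_to_mono("output.wav", "mono_both.wav")` would produce a mono file in which the narration and the backing audio are both audible, each at half its original level.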