Unexpected or poor results with audio transcription? Check your audio mix... #559

jaygooby · 2024-09-02T16:23:25Z

Spent more than a day puzzling over why we couldn't get anything like a recognisable transcription from a file that MacWhisper, with the free small model, had no issues with.

We just had endless variations of:

[music]
[music]
[music]

or sometimes:

[music]
(upbeat music)
(upbeat music)
[music]

Spent ages faffing around with different params, and even ended up brute-forcing the temperatures and temperature increments to see if that made any difference. Then my colleague @boblete said, "Have you tried making a mono version of the audio?"

Oh no.

The video we were extracting the audio from had either been recorded with a polar mic, or the stereo mix had been done with the spoken audio on one channel and the backing audio on the other.

When we made the wav from the mp4:

ffmpeg -nostdin -hide_banner -loglevel error -y -i demo.mp4 -ar 16000 -f wav tmp.wav

whisper takes our wav and makes a mono version, and the first channel happens to have only the backing audio on it. So there was no actual speech to transcribe, hence our lack of anything sensible out of the model: it was just getting the backing audio and no spoken words.

You can mimic what whisper was doing when it made its mono version like this:

ffmpeg -nostdin -hide_banner -loglevel error -y -i demo.mp4 -ar 16000 -ac 1 -f wav mono.wav

and in our case that mono.wav just has backing audio - the narration has been dropped.

So just a heads up that if you're getting unexpected or poor results with audio transcription, check your audio mix!
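
One way to check the mix before transcribing is to measure each channel separately. This is a minimal sketch, assuming a 16-bit PCM stereo file such as the tmp.wav made above; `channel_rms` is a hypothetical helper name, not part of whisper or ffmpeg:

```python
import array
import math
import sys
import wave


def channel_rms(path):
    """Return one RMS level per channel for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        # WAV sample data is little-endian
        samples.byteswap()

    levels = []
    for ch in range(channels):
        # Samples are interleaved: ch, ch + channels, ch + 2 * channels, ...
        chan = samples[ch::channels]
        if len(chan) == 0:
            levels.append(0.0)
            continue
        levels.append(math.sqrt(sum(s * s for s in chan) / len(chan)))
    return levels


if __name__ == "__main__" and len(sys.argv) > 1:
    for index, level in enumerate(channel_rms(sys.argv[1])):
        print(f"channel {index}: RMS {level:.1f}")
```

The numbers only tell you how loud each channel is, not what is on it. If the levels look odd, or the transcript is all [music], it's worth listening to each channel on its own before blaming the model.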

You can make sure the mono file you give to whisper contains the speech by building it from the channel that carries it. In our case that was the second channel (c1), so we made our .wav like this:

ffmpeg -nostdin -hide_banner -loglevel error -y -i demo.mp4 -ar 16000 -af "pan=mono|c0=c1" -f wav mono.wav
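
Worth spelling out what that pan expression does: `c0=c1` sets the single output channel to the second input channel only. If you want both channels in the mono file instead, the pan filter also accepts a weighted sum such as `pan=mono|c0=0.5*c0+0.5*c1`. Here is a rough sketch of the difference on made-up sample values; `to_mono` and the numbers are purely illustrative, not anything ffmpeg or whisper exposes:

```python
def to_mono(interleaved, weights):
    """Mix interleaved stereo samples to mono using (left, right) weights."""
    left_weight, right_weight = weights
    left = interleaved[0::2]
    right = interleaved[1::2]
    return [left_weight * l + right_weight * r for l, r in zip(left, right)]


# Hypothetical toy signal: quiet backing on channel 0, speech on channel 1,
# stored interleaved as [c0, c1, c0, c1, ...]
stereo = [10, 200, -10, -200, 10, 200]

second_only = to_mono(stereo, (0.0, 1.0))  # same idea as pan=mono|c0=c1
both_mixed = to_mono(stereo, (0.5, 0.5))   # same idea as pan=mono|c0=0.5*c0+0.5*c1

print(second_only)
print(both_mixed)
```

Picking the single channel worked for our file because the narration lived entirely on it. For a file where speech is spread across both channels, the weighted sum is probably the safer choice.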

Hope this helps save your sanity!

jart (Maintainer) · 2024-09-03T07:52:39Z

Thank you for sharing this information!
