Audio less than 1s long silently fails all transcription #1603

Open
isaac-mcfadyen opened this issue Dec 7, 2023 · 2 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@isaac-mcfadyen

Issue

Apologies if this is a known issue! I looked and couldn't find an existing one 😄

When passing audio that is less than 1s long to whisper.cpp, regardless of the model or hardware acceleration (tested with small and medium, on both CPU-only and cuBLAS builds), Whisper silently fails to transcribe the audio.

No error is returned. It appears to skip all processing (0ms is listed for the sample, encode, decode, and prompt times) and returns an empty transcript.

Reproduction

First, make sure whisper is working by running it on the JFK sample:

  • make clean
  • make -j (with or without cuBLAS)
  • ./main --model ./models/ggml-medium.en.bin bindings/go/samples/jfk.wav
    • Working! Transcribes correctly.

Now use an external audio editor or ffmpeg to trim the audio to less than a second and run again:

  • ffmpeg -i bindings/go/samples/jfk.wav -to 00:00:00.8 jfk-short.wav
  • ./main --model models/ggml-medium.en.bin ./jfk-short.wav
    • Not working. Mel shows some time (a few ms), but the encode, decode, prompt, and batchd times all show 0ms, and there's no transcript. I tested multiple clips to make sure it wasn't just this one, and I can reproduce it consistently.

If you pad the audio to more than a second, transcripts appear again:

  • ffmpeg -i jfk-short.wav -af "apad=pad_dur=1" jfk-padded.wav
  • ./main --model models/ggml-medium.en.bin ./jfk-padded.wav
    • Working again! Shows the (cut off) transcript.
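
The same padding workaround could also be applied in code before the samples are handed to whisper.cpp. The following is a minimal sketch, assuming the audio is already 16 kHz mono float PCM (the format whisper.cpp expects). The helper name `pad_with_silence` and the 1100 ms default are hypothetical choices for illustration, not part of whisper.cpp.

```cpp
#include <cstddef>
#include <vector>

// Assumption: audio is 16 kHz mono float PCM, the format whisper.cpp expects.
constexpr int kSampleRate = 16000;

// Hypothetical helper: append trailing silence so the buffer lasts at least
// min_ms milliseconds. The 1100 ms default is an arbitrary margin above 1 s.
inline void pad_with_silence(std::vector<float> & pcm, int min_ms = 1100) {
    const std::size_t min_samples =
        static_cast<std::size_t>(kSampleRate) * min_ms / 1000;
    if (pcm.size() < min_samples) {
        pcm.resize(min_samples, 0.0f); // new samples are zero, i.e. silence
    }
}
```

Buffers that are already long enough are left untouched, so the helper can be called unconditionally on every clip.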

System Info

Tested on Ubuntu 22.04, amd64, both with and without cuBLAS enabled.

@bobqianic
Collaborator

whisper.cpp/whisper.cpp

Lines 5199 to 5202 in 3163090

// if only 1 second left, then stop
if (seek + 100 >= seek_end) {
    break;
}
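
For context, here is a minimal sketch of why a sub-second clip trips this check. It assumes `seek` and `seek_end` count 10 ms frames (so 100 frames is 1 second), that processing starts at `seek = 0`, and that `seek_end` is roughly the clip duration in frames. The helper names are hypothetical and not part of whisper.cpp.

```cpp
// Assumption: positions are tracked in 10 ms frames, so 100 frames = 1 s.
constexpr int kFramesPerSecond = 100;

// Hypothetical helper: convert a clip duration in milliseconds to frames.
constexpr int ms_to_frames(int duration_ms) {
    return duration_ms / 10;
}

// Mirrors the quoted condition: true means the loop breaks before
// anything is encoded or decoded.
constexpr bool stops_before_decoding(int seek, int seek_end) {
    return seek + kFramesPerSecond >= seek_end;
}

static_assert(stops_before_decoding(0, ms_to_frames(800)),
              "a 0.8 s clip is skipped");
static_assert(stops_before_decoding(0, ms_to_frames(1000)),
              "a clip of exactly 1 s is also skipped");
static_assert(!stops_before_decoding(0, ms_to_frames(1800)),
              "0.8 s plus 1 s of padding is processed");
```

Under these assumptions, the 0.8 s clip from the reproduction exits on the first loop iteration, while the padded 1.8 s clip gets past the check, which matches the reported behavior.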

@bobqianic bobqianic added the question Further information is requested label Dec 10, 2023
@bobqianic bobqianic added the enhancement New feature or request label Jan 15, 2024
@bobqianic bobqianic linked a pull request Jan 15, 2024 that will close this issue
@bobqianic bobqianic removed a link to a pull request Feb 5, 2024
@jerzygangi

@bobqianic Are you aware if this bug was ever fixed?
