Only part of audio transcribed #153

NasonZ · 2023-12-26T18:36:44Z

I have an hour-long meeting that I would like to transcribe. I've attempted to do so with:

```python
import whisper_timestamped as whisper

audio = whisper.load_audio("/content/Meeting Recording.mp4")
model = whisper.load_model("medium", device="cpu")
result = whisper.transcribe(model, audio, language="en")
```

My issue is that the result only covers the first 2 minutes of the meeting. What settings do I need to adjust to transcribe the entire meeting?
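One quick way to quantify the problem is to compare the end time of the last transcribed segment with the length of the audio. This is a minimal sketch, assuming the result is a dict with a `"segments"` list whose items carry `"start"` and `"end"` times in seconds, and that `load_audio` returns 16 kHz samples. The helper name `transcription_coverage` is hypothetical.

```python
SAMPLE_RATE = 16000  # assumption: whisper.load_audio resamples to 16 kHz mono


def transcription_coverage(result, num_samples, sample_rate=SAMPLE_RATE):
    """Return (audio_seconds, last_end_seconds, fraction_covered).

    Hypothetical helper: `result` is assumed to be a dict with a "segments"
    list whose items have "start"/"end" times in seconds.
    """
    audio_seconds = num_samples / sample_rate
    segments = result.get("segments", [])
    # The end of the last segment is where the transcript stops.
    last_end = max((seg["end"] for seg in segments), default=0.0)
    fraction = last_end / audio_seconds if audio_seconds > 0 else 0.0
    return audio_seconds, last_end, fraction


if __name__ == "__main__":
    # Fake result mimicking the reported symptom: 2 minutes out of 64.
    fake_result = {"segments": [{"start": 0.0, "end": 60.0},
                                {"start": 61.0, "end": 120.0}]}
    num_samples = 64 * 60 * SAMPLE_RATE
    audio_s, last_end, frac = transcription_coverage(fake_result, num_samples)
    print(f"audio: {audio_s:.0f}s, transcribed up to {last_end:.0f}s ({frac:.1%})")
```

With the objects from the snippet above, `num_samples` would be `len(audio)`.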


Jeronymous · 2023-12-28T11:51:30Z

Maybe there is a big silence gap after the first 2 minutes.
Can you try with the option vad=True?
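To check the silence-gap hypothesis, one can list the gaps between consecutive segments that exceed a threshold, including any untranscribed tail at the end. Then rerun with `vad=True` (for example `whisper.transcribe(model, audio, language="en", vad=True)`) and compare. This is a minimal sketch. The helper `find_gaps` and its 30-second default threshold are illustrative assumptions, and segments are assumed to be dicts with `"start"`/`"end"` in seconds.

```python
def find_gaps(segments, min_gap=30.0, audio_seconds=None):
    """Return (gap_start, gap_end) pairs longer than `min_gap` seconds.

    Hypothetical helper: `segments` is assumed to be a list of dicts with
    "start"/"end" in seconds. If `audio_seconds` is given, the untranscribed
    tail after the last segment is reported as well.
    """
    gaps = []
    previous_end = 0.0
    for seg in sorted(segments, key=lambda s: s["start"]):
        # A gap is the silence between the end of the previous speech and this segment.
        if seg["start"] - previous_end > min_gap:
            gaps.append((previous_end, seg["start"]))
        previous_end = max(previous_end, seg["end"])
    if audio_seconds is not None and audio_seconds - previous_end > min_gap:
        gaps.append((previous_end, audio_seconds))
    return gaps
```

Running `find_gaps(result["segments"], audio_seconds=len(audio) / 16000)` on both results would show whether the transcript stops right after a long pause or simply ends early.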

NasonZ · 2023-12-28T15:29:33Z

There's no silence gap, but setting vad=True did help: I now get 38 minutes out of 64, which is better but still not the entire meeting. At the point where it cut off this time, there doesn't seem to be a significant silent gap, probably less than a second before one speaker replied to another.

Any ideas why the transcription keeps ending prematurely?
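While the root cause is investigated, one possible workaround is to transcribe the audio in fixed-length chunks and shift each chunk's timestamps by its offset. This was not suggested in this thread and is untested against this recording. This is a minimal sketch. The helpers `chunk_bounds` and `merge_chunk_results` and the 10-minute chunk length are hypothetical choices. Chunk edges may cut a word in half, so splitting at pauses would probably work better.

```python
SAMPLE_RATE = 16000  # assumption: whisper.load_audio returns 16 kHz samples


def chunk_bounds(num_samples, chunk_seconds=600, sample_rate=SAMPLE_RATE):
    """Yield (start_sample, end_sample) pairs covering the whole signal."""
    step = int(chunk_seconds * sample_rate)
    for start in range(0, num_samples, step):
        yield start, min(start + step, num_samples)


def merge_chunk_results(chunk_results, sample_rate=SAMPLE_RATE):
    """Merge per-chunk results into one list of segments with absolute times.

    `chunk_results` is a list of (start_sample, result) pairs. Each result is
    assumed to be a dict with a "segments" list of dicts carrying "start",
    "end" (seconds relative to the chunk) and "text". Word-level timestamps
    are not carried over in this sketch.
    """
    merged = []
    for start_sample, result in chunk_results:
        offset = start_sample / sample_rate  # chunk start in seconds
        for seg in result.get("segments", []):
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```

With the objects from the original snippet, each chunk would be transcribed with `whisper.transcribe(model, audio[s:e], language="en", vad=True)` and collected as `(s, result)` pairs before merging.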

Jeronymous · 2024-01-08T07:56:23Z

No, I have no idea why you are experiencing this.
I would need the audio and the full set of options to reproduce and investigate.
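For a reproducible report, it helps to record the exact options passed to `transcribe`, together with package versions and the audio length. This is a minimal sketch using only the standard library. The helper `build_report` is hypothetical, and the distribution names queried (`whisper-timestamped`, `openai-whisper`, `torch`) are assumptions about what is installed.

```python
import json
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_report(options, num_samples, sample_rate=16000,
                 packages=("whisper-timestamped", "openai-whisper", "torch")):
    """Collect details a maintainer may need to reproduce a transcription run."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
        "audio_seconds": num_samples / sample_rate,
        "options": dict(options),
    }


if __name__ == "__main__":
    opts = {"model": "medium", "device": "cpu", "language": "en", "vad": True}
    print(json.dumps(build_report(opts, num_samples=64 * 60 * 16000), indent=2))
```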

Jeronymous · 2024-01-11T06:52:33Z

This issue is really strange. @NasonZ Is there a way you can share the audio?
(either by linking the zipped audio here, or sending it to hello_at_linto.ai)
