Only part of audio transcribed #153

Open
NasonZ opened this issue Dec 26, 2023 · 4 comments

Comments

@NasonZ

NasonZ commented Dec 26, 2023

I have an hour long meeting which I would like to transcribe. I've attempted to do so with:

import whisper_timestamped as whisper

audio = whisper.load_audio("/content/Meeting Recording.mp4")

model = whisper.load_model("medium", device="cpu")

result = whisper.transcribe(model, audio, language="en")

My issue is that the result only covers the first 2 minutes of the meeting. What settings do I need to adjust to transcribe the entire meeting?

@Jeronymous
Member

Maybe there is a long silence after the first 2 minutes.
Can you try with the option vad=True?
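
For reference, a minimal sketch of the suggested change, reusing the calls from the original post with vad=True added. The helper name transcribe_with_vad and its default arguments are hypothetical, chosen only for illustration; the import is done inside the function so the helper can be defined even where whisper_timestamped is not installed.

```python
def transcribe_with_vad(path, model_name="medium", device="cpu", language="en"):
    """Transcribe the file at `path` with voice activity detection enabled.

    Hypothetical helper: it only wraps the calls shown in the original post.
    """
    # Lazy import: the package is only needed when the function is called.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(path)
    model = whisper.load_model(model_name, device=device)
    # vad=True is the option suggested above; it runs voice activity
    # detection so that non-speech regions are not passed to the model.
    return whisper.transcribe(model, audio, language=language, vad=True)
```

It could then be called as transcribe_with_vad("/content/Meeting Recording.mp4"), using the path from the original post.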

@NasonZ
Author

NasonZ commented Dec 28, 2023

There's no silence gap, but setting vad=True did help. I now get 38 of the 64 minutes transcribed, which is better but still not the entire meeting. Where the cutoff happened this time, there doesn't seem to be a significant silence, probably less than a second before one speaker replied to another.

Any ideas why the transcription keeps ending prematurely?
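
One way to pin down where the transcript stops is to compare the end time of the last segment with the duration of the loaded audio. The sketch below assumes the result is a dict with a "segments" list whose items carry an "end" time in seconds, and that the audio array is sampled at 16 kHz (the rate Whisper's load_audio resamples to). The function name transcript_coverage and the sample numbers are made up for illustration.

```python
SAMPLE_RATE = 16000  # assumption: load_audio returns samples at 16 kHz


def transcript_coverage(result, n_samples, sample_rate=SAMPLE_RATE):
    """Return (last_end_seconds, audio_seconds, fraction_covered)."""
    audio_seconds = n_samples / sample_rate
    segments = result.get("segments", [])
    # End time of the latest segment, or 0.0 if nothing was transcribed.
    last_end = max((seg["end"] for seg in segments), default=0.0)
    fraction = last_end / audio_seconds if audio_seconds > 0 else 0.0
    return last_end, audio_seconds, fraction


# Made-up data: a 64-minute recording whose last segment ends at 38 minutes.
fake_result = {
    "segments": [
        {"start": 0.0, "end": 120.0},
        {"start": 120.0, "end": 38 * 60.0},
    ]
}
last_end, total, frac = transcript_coverage(
    fake_result, n_samples=64 * 60 * SAMPLE_RATE
)
print(f"{last_end / 60:.0f} of {total / 60:.0f} minutes covered ({frac:.0%})")
```

With a real run, len(audio) would be passed as n_samples and the returned result as the first argument.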

@Jeronymous
Member

No, I have no idea why you are experiencing this.
I would need the audio and the full set of options to reproduce and investigate.
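
A possible way to gather "the full set of options" in one place is sketched below. It only uses the standard library. The function names, the report layout, and the list of package names looked up (whisper-timestamped, openai-whisper, torch) are assumptions for illustration, not something the maintainers asked for in this form.

```python
import json
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_repro_report(options, audio_seconds):
    """Collect environment details and transcription options into one dict."""
    # Assumed distribution names; adjust to whatever is actually installed.
    packages = ("whisper-timestamped", "openai-whisper", "torch")
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
        "audio_seconds": audio_seconds,
        "options": options,
    }


# Options taken from the thread; the duration is the 64 minutes mentioned above.
report = build_repro_report(
    {"model": "medium", "device": "cpu", "language": "en", "vad": True},
    audio_seconds=64 * 60,
)
print(json.dumps(report, indent=2))
```

The printed JSON can be pasted into the issue alongside a link to the audio.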

@Jeronymous
Member

This issue is really strange. @NasonZ Is there a way you can share the audio?
(either by linking the zipped audio here, or sending it to hello_at_linto.ai)


2 participants