Notes on repetitions #38

Open
jwijffels opened this issue Jan 29, 2024 · 4 comments

Comments

@jwijffels
Contributor

jwijffels commented Jan 29, 2024

Strategies to reduce repetitions / hallucinations

Use beam search with 5 beams.
Increase the entropy threshold from the default of 2.4 to, for example, 2.8. A higher threshold rejects more repetitive text and falls back to sampling at a higher temperature.
Reduce the maximum context size (--max-context). The default is 224; setting it to 64 or 32 can reduce repetitions significantly. Setting it to 0 will most likely eliminate all repetitions, but transcription quality can suffer because the model loses the context of the previous transcript.
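To make the entropy threshold a bit more concrete, here is a minimal sketch in base R. It assumes the check is the entropy of the token frequency distribution over a trailing window of decoded tokens; the function name `token_entropy`, the window of 32 and the toy token ids are illustrative assumptions, not code taken from whisper.cpp or audio.whisper.

```r
# Simplified illustration of an entropy check on recently decoded tokens.
# Assumption: entropy (natural log) of the token frequencies in a trailing window.
token_entropy <- function(tokens, window = 32L) {
  recent <- tail(tokens, window)
  p <- as.numeric(table(recent)) / length(recent)
  -sum(p * log(p))
}

varied     <- 1:32                                   # 32 distinct token ids
short_loop <- rep(c(11L, 12L, 13L), length.out = 32) # a 3-token loop
long_loop  <- rep(1:13, length.out = 32)             # a 13-token loop

e_varied <- token_entropy(varied)     # log(32), about 3.47
e_short  <- token_entropy(short_loop) # about 1.10
e_long   <- token_entropy(long_loop)  # about 2.54

# Sequences whose entropy falls below the threshold count as too repetitive.
rejected <- function(entropy, threshold) entropy < threshold
rejected(c(e_varied, e_short, e_long), 2.4)  # FALSE  TRUE FALSE
rejected(c(e_varied, e_short, e_long), 2.8)  # FALSE  TRUE  TRUE
```

In this toy setup the longer loop slips through at 2.4 but is caught at 2.8, which is the kind of case where raising the threshold would help.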

Related to timestamps: see ggerganov/whisper.cpp#1724

@jwijffels
Contributor Author

TODO: add an R function to detect repetitions and the location in the audio/transcription where they occur and after which the model does not recover, so that the transcription can be relaunched with other settings or a better model.
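A rough sketch of what such a function could look like, in base R only. It assumes the transcription is available as a data.frame of segments with columns `from`, `to` and `text`; the function name `detect_repetitions`, the `min_run` parameter and the example data are hypothetical and not part of audio.whisper.

```r
# Flag runs of consecutive segments with identical (normalised) text.
# Assumption: `segments` has one row per segment with columns from, to, text.
detect_repetitions <- function(segments, min_run = 3L) {
  txt    <- tolower(trimws(segments$text))
  runs   <- rle(txt)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  keep   <- runs$lengths >= min_run
  data.frame(
    first_segment = starts[keep],
    last_segment  = ends[keep],
    n_repeats     = runs$lengths[keep],
    from          = segments$from[starts[keep]],
    to            = segments$to[ends[keep]],
    text          = runs$values[keep],
    # TRUE when the run lasts until the final segment (no recovery)
    until_end     = ends[keep] == nrow(segments),
    stringsAsFactors = FALSE
  )
}

# Made-up example segments
segments <- data.frame(
  from = c("00:00:00.000", "00:00:04.000", "00:00:08.000",
           "00:00:10.000", "00:00:12.000", "00:00:14.000"),
  to   = c("00:00:04.000", "00:00:08.000", "00:00:10.000",
           "00:00:12.000", "00:00:14.000", "00:00:16.000"),
  text = c(" Good morning everyone.", " Let's get started.",
           " Thank you.", " Thank you.", " Thank you.", " Thank you."),
  stringsAsFactors = FALSE
)
reps <- detect_repetitions(segments)
reps
```

Here the run starts at segment 3 (00:00:08.000) and lasts until the end, so that timestamp would be a candidate point from which to relaunch the transcription. Exact matching is a deliberate simplification; near-duplicates would need a fuzzier comparison.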

@jmgirard
Contributor

I've been running into this issue a lot with large-v3. Makes it basically unusable for my purposes. Sounds like v2 may be better?

@jwijffels
Contributor Author

jwijffels commented Mar 25, 2024

Yes, use large-v2 or medium and remove silences. The best model for silence removal is Silero; webrtc is a lot faster but less accurate.

Next, plug the detected non-silence periods into the predict function: either use the sections argument (which creates a new audio file from these voiced sections) or the offset/duration arguments (which also look a bit around the cutoff timepoints). This is available since audio.whisper 0.4.
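As a sketch of the first option, the snippet below turns voice activity output into a start/duration table. It assumes the detector gives a data.frame with `start` and `end` in seconds plus a logical `has_voice` column, and that the sections argument wants `start` and `duration` in milliseconds; the helper `voiced_to_sections`, the `min_gap` parameter and the example values are hypothetical, so check the column names and units against the package documentation.

```r
# Convert voiced segments (seconds) into sections (milliseconds),
# merging voiced segments separated by less than `min_gap` seconds of silence.
voiced_to_sections <- function(vad, min_gap = 0.5) {
  voiced <- vad[vad$has_voice, c("start", "end")]
  voiced <- voiced[order(voiced$start), ]
  if (nrow(voiced) == 0L) {
    return(data.frame(start = integer(), duration = integer()))
  }
  # Start a new group whenever the preceding silence is at least min_gap
  gap   <- c(Inf, voiced$start[-1] - voiced$end[-nrow(voiced)])
  group <- cumsum(gap >= min_gap)
  start <- tapply(voiced$start, group, min)
  end   <- tapply(voiced$end, group, max)
  data.frame(
    start    = as.integer(round(start * 1000)),
    duration = as.integer(round((end - start) * 1000))
  )
}

# Made-up detector output
vad <- data.frame(
  start     = c(0, 1.2, 4.0, 4.3, 8.5, 10.0),
  end       = c(1.2, 4.0, 4.3, 8.5, 10.0, 12.0),
  has_voice = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
)
sections <- voiced_to_sections(vad)
sections
# Two sections: 1200 ms lasting 7300 ms, and 10000 ms lasting 2000 ms.
# Assumed usage, not run here: predict(model, newdata = "audio.wav", sections = sections)
```

Merging across short pauses keeps a sentence from being cut in the middle of a breath; the 0.5 second default is only a starting guess.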

Beyond that, I hope ggerganov/whisper.cpp#1768 will also bring improvements once it is incorporated in whisper.cpp and in audio.whisper.

@jmgirard
Contributor

jmgirard commented Mar 25, 2024

large-v2 seems to be doing better (even without removing the silences). Interestingly, it is also running a lot faster than v3, presumably because it is not wasting as much time hallucinating. Trying audio.vadsilero now... Moved discussion over to #62
