How to reduce Whisper hallucinations #198

Open
pr-data-port opened this issue Jul 20, 2024 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@pr-data-port

When I ran a 15-minute audio file (from an interview) through the whisper-timestamped model "", the transcript randomly contained text completely unrelated to what was said in the audio. Has anyone ever seen such an issue?

@Jeronymous
Member

Something is missing in this description.
Please clarify which model you are running, with either the command or the python code you run.

@pr-data-port
Author

@Jeronymous this is the code I called:

import json

import torch
import whisper_timestamped

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "openai/whisper-large-v3"
model = whisper_timestamped.load_model(model_id, device=device)

# `audio` is the 15-minute interview recording (loading is not shown in this comment)
result = whisper_timestamped.transcribe(
    model,
    audio,
    language="en",
    beam_size=5,
    best_of=5,
    temperature=0.4,
    no_speech_threshold=0.6,
)

print(json.dumps(result, indent=2, ensure_ascii=False))

(Changing the temperature to temperature=(0.2, 0.4, 0.6, 0.8, 1.0) didn't give any improvement.)
The output I got was this:

{
  "text": " Hello everyone! Today I will show you how to make a super easy and easy to make Christmas tree Christmas tree is a very simple and very easy to make and very easy to make Christmas tree is a very simple and very easy to make Christmas tree is a very simple and very easy to make Christmas tree is a very simple and very easy to make Christmas tree is a very simple and very easy to make Christmas tree is a very simple and very easy to make Christmas tree is a very simple and very easy to make Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree is a very simple Christmas tree [...]

I'm skipping the segment- and word-level timestamps, as I'm not sure they are relevant here.
The audio is about something completely different and not even remotely related to Christmas :D
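
For anyone who wants to flag this kind of looping output automatically, here is a minimal sketch. It assumes a zlib compression ratio is a reasonable proxy for repetition, which is the same idea behind Whisper's `compression_ratio_threshold` option (default 2.4). The helper names and the sample strings are made up for illustration and are not part of whisper-timestamped.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def looks_like_loop(text: str, threshold: float = 2.4) -> bool:
    """Flag text whose compression ratio exceeds the threshold (2.4 is Whisper's default)."""
    return compression_ratio(text) > threshold


# Illustrative inputs: one looping string like the output above, one ordinary sentence.
looped = "Christmas tree is a very simple " * 40
normal = (
    "Thanks for joining me today. Could you start by describing "
    "your role and how long you have been with the team?"
)

print(looks_like_loop(looped), looks_like_loop(normal))
```

Running this on each segment's text rather than on the whole transcript should make it easier to see where the loop starts, though a long transcript with naturally recurring phrases may also score high.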

@Jeronymous
Member

OK it seems the model hallucinates completely.
I wonder why you don't use temperature 0.
Maybe try with "temperature=0" ...

Otherwise, I would need the audio to investigate more.

Maybe the volume level is particularly low?
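
One rough way to check the volume without sharing the audio is to measure its RMS level. The sketch below assumes the recording is available as a 16-bit PCM WAV file and uses only the Python standard library; the function names are hypothetical, and what counts as "too quiet" is a judgment call this sketch does not settle.

```python
import array
import math
import sys
import wave


def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of 16-bit PCM samples in dB relative to full scale (0 dBFS)."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def wav_rms_dbfs(path):
    """Read a 16-bit PCM WAV file and return its overall RMS level in dBFS."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rms_dbfs(samples)
```

A constant signal at half of full scale, for example, comes out at about -6 dBFS. A value far below what other recordings give on the same setup would support the low-volume hypothesis.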

You can also try the option vad="auditok" or vad="silero" to help avoid hallucinations, as well as condition_on_previous_text=False.
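
Put together, these suggestions could look roughly like the sketch below. The helper names `build_transcribe_options` and `transcribe_file` are hypothetical; the keyword arguments are the ones named in this thread, and `beam_size`/`best_of` are left out on the assumption that greedy decoding at temperature 0 is what is wanted. The import is done lazily so the options can be inspected without loading the library.

```python
def build_transcribe_options(vad="auditok", condition_on_previous_text=True):
    """Keyword arguments for whisper_timestamped.transcribe, following the suggestions above."""
    if vad not in ("auditok", "silero"):
        raise ValueError("this sketch only covers the two VAD options named in the thread")
    return {
        "language": "en",
        "temperature": 0.0,  # greedy decoding, no sampling
        "vad": vad,  # remove non-speech regions before transcription
        "condition_on_previous_text": condition_on_previous_text,
    }


def transcribe_file(path, model_id="openai/whisper-large-v3", device="cpu", **option_overrides):
    """Load the model and audio, then transcribe with the options above."""
    import whisper_timestamped  # imported lazily; requires the package to be installed

    model = whisper_timestamped.load_model(model_id, device=device)
    audio = whisper_timestamped.load_audio(path)
    options = build_transcribe_options(**option_overrides)
    return whisper_timestamped.transcribe(model, audio, **options)
```

Changing one option at a time (first the temperature, then VAD, then `condition_on_previous_text`) makes it easier to tell which one actually affects the hallucination.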

@pr-data-port
Author

Thank you @Jeronymous. Changing the temperature and adding vad="auditok" did indeed help: at least it now shows the correct text, but it still generates loads of duplications. I am sceptical about adding condition_on_previous_text=False, because when I tried it previously it helped remove duplicates, but the error rate rose significantly.
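
If the remaining duplications are consecutive segments with identical text, one possible workaround is to collapse them after transcription instead of changing the decoding options. The sketch below assumes the result has the usual "segments" list with "text", "start" and "end" keys; the function name is hypothetical, and it will also merge genuine repetitions by the speaker, so treat it as a stopgap rather than a fix.

```python
def collapse_repeated_segments(segments):
    """Merge runs of consecutive segments whose normalized text is identical."""
    kept = []  # list of (normalized_text, segment_copy)
    for seg in segments:
        normalized = " ".join(seg["text"].split()).lower()
        if kept and normalized == kept[-1][0]:
            # Same text as the previous kept segment: extend its end time and skip this one.
            kept[-1][1]["end"] = seg["end"]
            continue
        kept.append((normalized, dict(seg)))
    return [seg for _, seg in kept]


# Illustrative data only, not taken from the real transcript.
example = [
    {"text": " So tell me about the project.", "start": 0.0, "end": 2.0},
    {"text": " So tell me about the project.", "start": 2.0, "end": 4.0},
    {"text": " Sure, it started last year.", "start": 4.0, "end": 6.5},
]
cleaned = collapse_repeated_segments(example)
print(len(cleaned), cleaned[0]["end"])
```

Word-level timestamps inside the dropped segments are discarded here; keeping them would need a decision about which copy to trust.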

@Jeronymous added the "help wanted" (Extra attention is needed) label on Jul 24, 2024
@Jeronymous changed the title from "Audio" to "How to reduce Whisper hallucinations" on Jul 24, 2024
@LaurinmyReha

This comment was marked as abuse.
