Investigate WhisperX performance #64

Open
alundgard opened this issue Dec 9, 2024 · 3 comments

@alundgard (Member) commented Dec 9, 2024

@dnoneill self-assigned this Jan 8, 2025

@dnoneill commented Jan 8, 2025

https://drive.google.com/drive/folders/1XRsDK-w3OrIN8zbC0DbUB5iCqrsKQyCm?usp=drive_link

@alundgard (Member, Author) commented Jan 15, 2025

Comparing SDR Whisper to default WhisperX small

Druid: jz734cm7143, File: tt618qz3245_sl

Druid: dg444xm0599, File: fb204cb6192_sl

  • SDR Whisper output: Repeated "Thank you" hallucinations during the first 11 minutes of instrumental music. Semi-accurate transcription of sung lyrics ("Daisy, daisy...") and spoken numbers ("One million nine hundred..."). Repeated "I don't know" hallucinations during the last 7 minutes of instrumental music.
  • Default WhisperX output: Mostly accurate "Music playing" caption during the first 11 minutes of instrumental music. Mostly accurate transcription of sung lyrics ("Daisy, daisy...") and spoken numbers ("One million nine hundred..."). Repeated "So, you" hallucinations during the last 7 minutes of instrumental music.
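One way to spot repeated-caption hallucinations like the runs of "Thank you" or "So, you" above, without scrubbing through the whole recording, is to scan the VTT output for consecutive cues with identical text. This is a minimal, stdlib-only sketch, not part of the original evaluation; it assumes cue timings in the `MM:SS.mmm` or `HH:MM:SS.mmm` form that Whisper-style VTT writers emit, and `parse_vtt`, `find_repeated_runs`, and the `min_run` threshold are hypothetical names chosen for illustration (not part of Whisper, WhisperX, or the SDR pipeline).

```python
import re
from dataclasses import dataclass

# Hypothetical helper: matches cue timing lines such as
# "00:01:02.500 --> 00:01:05.000" or "01:02.500 --> 01:05.000".
TIMING = re.compile(r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_vtt(content: str) -> list[Cue]:
    """Minimal WebVTT parser: a timing line followed by text lines up to a blank line."""
    cues: list[Cue] = []
    lines = content.splitlines()
    i = 0
    while i < len(lines):
        match = TIMING.search(lines[i])
        i += 1
        if not match:
            continue  # header, cue identifier, or blank line
        text_lines = []
        while i < len(lines) and lines[i].strip():
            text_lines.append(lines[i].strip())
            i += 1
        cues.append(Cue(to_seconds(match.group(1)), to_seconds(match.group(2)), " ".join(text_lines)))
    return cues


def find_repeated_runs(cues: list[Cue], min_run: int = 3) -> list[tuple[str, int, float, float]]:
    """Return (text, count, start, end) for runs of >= min_run consecutive cues with the same text.

    Text is compared case-insensitively with surrounding punctuation stripped.
    """
    runs = []
    i = 0
    while i < len(cues):
        key = cues[i].text.lower().strip(" .,!?")
        j = i + 1
        while j < len(cues) and cues[j].text.lower().strip(" .,!?") == key:
            j += 1
        if key and j - i >= min_run:
            runs.append((cues[i].text, j - i, cues[i].start, cues[j - 1].end))
        i = j
    return runs
```

Calling `find_repeated_runs(parse_vtt(text))` on a VTT file's contents returns each suspicious run with its count and time span, which can then be checked by ear. Exact-match comparison is a deliberately crude heuristic: it misses hallucinations that vary slightly from cue to cue and may flag legitimately repeated lyrics.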

Druid: mc135dt6327, File: bs744dg5568_sl

  • SDR Whisper output: Japanese-language output; I was unable to read it. (Was Japanese specified as the transcription language? Answer: Yes.) Repeated hallucinations appear during the last 8 minutes of silence (starting around 52:00).
  • Default WhisperX output: English-language translation; unable to evaluate accuracy. (Was English specified as the translation language? Answer: No. WhisperX auto-detects the language from the first 30 seconds of audio, and there is no speech in that window to detect.) No hallucination during the last 8 minutes of silence (the last VTT caption is at 51:52).
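The trailing-silence check (does any caption extend past the point where the audio goes silent, around 52:00 for this recording?) can also be scripted. This is a minimal, stdlib-only sketch under stated assumptions: cue timings are in the Whisper-style `MM:SS.mmm` or `HH:MM:SS.mmm` form, only the first text line of each cue is kept, the silence start time is supplied by hand, and `cue_times` and `cues_in_silence` are hypothetical helpers, not part of any Whisper or WhisperX API.

```python
import re

# Hypothetical helper: matches cue timing lines such as
# "51:40.000 --> 51:52.000" or "00:51:40.000 --> 00:51:52.000".
TIMING = re.compile(r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cue_times(content: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) per cue; text is only the line right after the timing line."""
    lines = content.splitlines()
    cues = []
    for i, line in enumerate(lines):
        match = TIMING.search(line)
        if match:
            text = lines[i + 1].strip() if i + 1 < len(lines) else ""
            cues.append((to_seconds(match.group(1)), to_seconds(match.group(2)), text))
    return cues


def cues_in_silence(content: str, silence_start: float) -> list[tuple[float, float, str]]:
    """Return cues that end after silence_start; in a known-silent tail these are suspect."""
    return [cue for cue in cue_times(content) if cue[1] > silence_start]
```

For this recording, `cues_in_silence(text, 52 * 60)` returns an empty list for a transcript whose last caption ends before 52:00, while any cue it does return points directly at text generated over silence.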

@alundgard (Member, Author) commented Jan 17, 2025

Comparing default Whisper small to default WhisperX small

Druid: jz734cm7143, File: tt618qz3245_sl

  • Default Whisper small output: Fewer hallucinations during music than SDR Whisper, though some remain during music/noise at 23:58. Note: no "BF-WATCH TV 2021" hallucinations.
  • Default WhisperX output: No apparent hallucinations at the above times.

Druid: dg444xm0599, File: fb204cb6192_sl

  • Default Whisper small output: Very significant hallucinations throughout ("I don't know what I'm doing" and "I'm sorry"). Almost unusable. Does not capture any of the sung lyrics ("Daisy, daisy ..."). Captures some of the spoken numbers ("One million nine hundred...").
  • Default WhisperX output: Mostly accurate "Music playing" caption during the first 11 minutes of instrumental music. Mostly accurate transcription of sung lyrics ("Daisy, daisy...") and spoken numbers ("One million nine hundred..."). Repeated "So, you" hallucinations during the last 7 minutes of instrumental music.

Druid: mc135dt6327, File: bs744dg5568_sl

  • Default Whisper small output: Very significant hallucinations throughout ("..." and "Please subscribe to my channel"), including during the last 8 minutes of silence. Unable to evaluate translation accuracy.
  • Default WhisperX output: English-language translation; unable to evaluate accuracy. (Was English specified as the translation language? Answer: No. WhisperX auto-detects the language from the first 30 seconds of audio, and there is no speech in that window to detect.) No hallucination during the last 8 minutes of silence (the last VTT caption is at 51:52).
