
Video transcription with speaker diarization utilizing Pyannote and Whisper.cpp


sxvghd/video-transcriber

 
 


Video transcription with speaker diarization using Pyannote and Whisper.cpp

Uses yt-dlp to download and convert media, Whisper.cpp to transcribe audio, and then performs speaker diarization with Pyannote.
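The first two stages can be sketched as plain command lines. This is a rough illustration, not the commands main.py actually runs: the function name `build_commands`, the binary path `./main`, the model path, and the audio file name are all hypothetical, and the 16 kHz mono WAV conversion is an assumption based on the input format Whisper.cpp expects.

```python
def build_commands(video_url, whisper_bin="./main",
                   model_path="models/ggml-large.bin",  # hypothetical path
                   audio_stem="audio", max_len=0):
    """Build argument lists for the download and transcription steps."""
    audio_path = audio_stem + ".wav"

    # yt-dlp: extract the audio track and convert it to 16 kHz mono WAV.
    download = [
        "yt-dlp",
        "-x", "--audio-format", "wav",
        "--postprocessor-args", "ffmpeg:-ar 16000 -ac 1",
        "-o", audio_stem + ".%(ext)s",
        video_url,
    ]

    # whisper.cpp: transcribe the WAV file and write a JSON transcript.
    transcribe = [whisper_bin, "-m", model_path, "-f", audio_path, "-oj"]
    if max_len:
        # Optional cap on segment length, in characters.
        transcribe += ["--max-len", str(max_len)]

    return download, transcribe


if __name__ == "__main__":
    for cmd in build_commands("https://example.com/video", max_len=50):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The Pyannote diarization step is left out here because it runs inside Python rather than as a separate command.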

Usage

Set the HF_TOKEN (Hugging Face access token) and VIDEO_URL environment variables in docker-compose.yml, then run docker compose up, which runs main.py.
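A script can check both variables up front so that a missing value fails early with a clear message. This is a minimal sketch; `load_settings` is a hypothetical helper and not necessarily how main.py reads its configuration.

```python
import os


def load_settings(env=None):
    """Return the HF_TOKEN and VIDEO_URL settings, or raise if one is missing."""
    env = os.environ if env is None else env
    required = ("HF_TOKEN", "VIDEO_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"hf_token": env["HF_TOKEN"], "video_url": env["VIDEO_URL"]}
```

Calling `load_settings()` with no arguments reads from the process environment, which is where docker compose places the values set in docker-compose.yml.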

The large Whisper model is downloaded automatically; to use a different model, change it in the Dockerfile.

Notes

Diarization results seem to improve when Whisper's maximum segment length is reduced, for example with --max-len 50.
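One plausible explanation, assuming transcript segments are matched to diarization turns by time overlap (the README does not say how main.py merges them): a long segment that spans a speaker change can only receive one label, while shorter segments can be labeled separately. The sketch below illustrates that idea; `assign_speakers` and the tuple layouts are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each (start, end, text) segment with the speaker who overlaps it most.

    `turns` is a list of (start, end, speaker) tuples from diarization.
    """
    labeled = []
    for start, end, text in segments:
        totals = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # None when no diarization turn overlaps the segment at all.
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append((speaker, text))
    return labeled


turns = [(0.0, 5.0, "SPEAKER_00"), (5.0, 8.0, "SPEAKER_01")]

# One long segment covering both turns gets a single label.
long_segments = [(0.0, 8.0, "How are you? Fine, thanks.")]
# Two shorter segments can each get their own label.
short_segments = [(0.0, 5.0, "How are you?"), (5.0, 8.0, "Fine, thanks.")]

print(assign_speakers(long_segments, turns))
print(assign_speakers(short_segments, turns))
```

In this toy input the long segment is attributed entirely to SPEAKER_00, so the reply from SPEAKER_01 is mislabeled, while the two short segments are split correctly.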


Languages

  • Python 73.9%
  • Dockerfile 14.9%
  • Shell 11.2%