A Flask-based web application that automatically transcribes congressional hearing audio files, identifies speakers, and generates analytical summaries. Built for researchers, journalists, and policy analysts working with congressional hearing recordings.
- High-accuracy speech-to-text transcription using OpenAI Whisper
- Automatic speaker separation (diarization) using Pyannote; speakers are labeled, but the labels are not yet matched to the speakers' actual identities
- Web interface for easy file uploads (not yet tested)
- Multiple output formats (JSON, TXT, SRT)
- Real-time processing status updates
- Secure file handling and storage (at your own risk)
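The first two features imply a step where transcribed segments are paired with diarized speaker turns. The sketch below shows one common way to do that, by picking the speaker turn with the largest time overlap. It is only an illustration: the `Span` type, the `assign_speakers` function, and the `UNKNOWN` fallback are hypothetical names, and the project's actual merging logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A labeled time interval (hypothetical type, for illustration only)."""
    start: float  # seconds
    end: float    # seconds
    label: str    # transcript text for a segment, speaker tag for a turn


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the time shared by two spans (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Pair each transcript segment with the speaker turn it overlaps most."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        # Fall back to a placeholder when no turn overlaps the segment at all.
        has_overlap = best is not None and overlap(seg, best) > 0
        labeled.append({
            "start": seg.start,
            "end": seg.end,
            "speaker": best.label if has_overlap else unknown,
            "text": seg.label,
        })
    return labeled
```

The speaker tags produced this way are anonymous labels such as `SPEAKER_00` (an assumed naming), which matches the note above that actual identities are not yet attributed.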
- Clone the repository:
    git clone https://github.com/YOUR-USERNAME/hearing-transcriber.git
    cd hearing-transcriber
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install required packages:
    pip install -r requirements.txt
- Create a .env file with the required tokens:
SECRET_KEY=your_flask_secret_key
- This is a random string you generate for Flask security
- You can generate one in Python using:
    import secrets
    print(secrets.token_hex(16))
- Copy the output and use it as your SECRET_KEY
HF_TOKEN=your_huggingface_token
- Create account at Hugging Face (https://huggingface.co/)
- Go to Settings → Access Tokens
- Click "New token"
- Give it a name (e.g., "hearing-transcriber")
- Select "read" access
- Accept the model terms:
  - Visit the Speaker Diarization model page (pyannote/speaker-diarization) and click "Accept terms"
  - Visit the Segmentation model page (pyannote/segmentation) and click "Accept terms"
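Before starting the app, it can help to confirm that both values in the .env file are actually filled in. The following is a minimal sketch using only the standard library; `parse_env` and `missing_settings` are hypothetical helpers, and the way the application itself loads the file is not specified here.

```python
import secrets

REQUIRED_KEYS = ("SECRET_KEY", "HF_TOKEN")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(values):
    """Return the required keys that are absent or empty in `values`."""
    return [key for key in REQUIRED_KEYS if not values.get(key, "")]


def new_secret_key(nbytes=16):
    """Generate a random hex string suitable for SECRET_KEY."""
    return secrets.token_hex(nbytes)
```

For example, `missing_settings(parse_env(open(".env").read()))` returns an empty list when both tokens are set.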
System Requirements:
- Python 3.10+
- ffmpeg
- 8GB+ RAM recommended
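The first two requirements can be checked from Python before installation. This is a rough sketch, not part of the project: `check_requirements` is a hypothetical helper, and it does not check available RAM because the standard library offers no portable way to do so.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)


def check_requirements(version=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment.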
API Access:
- Hugging Face account (https://huggingface.co)
- Accept terms for pyannote/speaker-diarization
- Accept terms for pyannote/segmentation
- Start the Flask server:
    python run.py
- Open your browser and navigate to: http://localhost:5000
- Upload an audio file
  - Supported formats: MP3, WAV
  - Maximum file size: 16MB
- Wait for processing to complete
  - Progress will be shown in real time
  - Results will be available for download
- transcript.json: Full transcript with speaker labels and timestamps
- transcript.txt: Human-readable transcript with summary
- transcript.srt: Subtitle format with speaker identification
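To illustrate how speaker-labeled segments map onto the SRT subtitle format, here is a small sketch. The segment keys (`start`, `end`, `speaker`, `text`) and the `[SPEAKER_00]` prefix style are assumptions made for this example; the actual layout of transcript.json and transcript.srt may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text']}\n"
        )
    return "\n".join(cues)
```

Each cue consists of a sequence number, a time range, and the text, with the speaker label placed at the start of the text line.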
/app
/static - Static assets and uploaded images
/storage - Temporary file storage
/templates - HTML templates
/transcriber - Core transcription logic
/config - Application configuration
- All uploaded files are securely handled
- Temporary files are automatically cleaned up
- File extensions are strictly validated
- Maximum file size is enforced
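The extension and size checks listed above could look roughly like the following. This is a sketch under stated assumptions rather than the project's code: `validate_upload` is a hypothetical helper, and 16MB is taken to mean 16 * 1024 * 1024 bytes.

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".wav"}
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # assumed meaning of "16MB"


def validate_upload(filename, size_bytes):
    """Return an error message, or None if the upload passes both checks."""
    # Compare the extension case-insensitively so "HEARING.MP3" is accepted.
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type: {extension or 'no extension'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "File exceeds the 16MB limit"
    return None
```

Flask can also reject oversized requests before they reach a view through its `MAX_CONTENT_LENGTH` configuration value; whether this project relies on that setting is not stated here. Checking the extension alone does not confirm that a file really contains audio.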
MIT License - See LICENSE file for details
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.