Congressional Hearing Transcriber

A Flask-based web application that automatically transcribes congressional hearing audio files, identifies speakers, and generates analytical summaries. Built for researchers, journalists, and policy analysts working with congressional hearing recordings.

Features

High-accuracy speech-to-text transcription using OpenAI Whisper
Automatic speaker separation (diarization) using Pyannote; speakers are distinguished from one another but not yet matched to their actual identities
Web interface for easy file uploads (not yet tested)
Multiple output formats (JSON, TXT, SRT)
Real-time processing status updates
Secure file handling and storage (at your own risk)
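
To illustrate how transcription and speaker separation can fit together, here is a minimal sketch that labels each transcribed segment with the speaker whose turn overlaps it most. The Segment and Turn shapes, the assign_speakers helper, and the "UNKNOWN" fallback are assumptions made for this example; the project's actual merge logic in /app/transcriber may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of speech (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append({
            "start": seg.start,
            "end": seg.end,
            "speaker": best_speaker,
            "text": seg.text,
        })
    return labeled
```

The labels stay generic (for example SPEAKER_00), which matches the current limitation that real speaker identities are not attributed. Segments that overlap no turn fall back to the "UNKNOWN" label rather than being dropped.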

Installation

Clone the repository:
    git clone https://github.com/YOUR-USERNAME/hearing-transcriber.git
    cd hearing-transcriber
Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
Install required packages:
    pip install -r requirements.txt
Create a .env file with required tokens:

SECRET_KEY=your_flask_secret_key
- This is a random string you generate for Flask security
- You can generate one in Python using:
    import secrets
    print(secrets.token_hex(16))
- Copy the output and use it as your SECRET_KEY

HF_TOKEN=your_huggingface_token
- Create account at Hugging Face (https://huggingface.co/)
- Go to Settings → Access Tokens
- Click "New token"
- Give it a name (e.g., "hearing-transcriber")
- Select "read" access
- Accept the model terms:
  - Visit the Speaker Diarization model page (pyannote/speaker-diarization)
  - Click the "Accept terms" button
  - Visit the Segmentation model page (pyannote/segmentation)
  - Click the "Accept terms" button
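
Once the .env file exists, it can help to confirm that both tokens are present before starting the server. The following is a minimal standard-library sketch; the helper names (parse_env_text, missing_keys, load_settings) are hypothetical, and the application itself may load its configuration differently, for example through a dotenv package.

```python
import os
from pathlib import Path

# The two settings this README asks for in .env
REQUIRED_KEYS = ("SECRET_KEY", "HF_TOKEN")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def load_settings(env_path=".env"):
    """Merge the .env file with real environment variables (which win)."""
    path = Path(env_path)
    values = parse_env_text(path.read_text()) if path.exists() else {}
    values.update({k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ})
    missing = missing_keys(values)
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return values
```

Calling load_settings() from the project root raises a RuntimeError naming whichever of SECRET_KEY or HF_TOKEN is still missing, which is easier to act on than a failure later during model download.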

Prerequisites

System Requirements:
- Python 3.10+
- ffmpeg
- 8GB+ RAM recommended
API Access:
- Hugging Face account (https://huggingface.co)
- Accept terms for pyannote/speaker-diarization
- Accept terms for pyannote/segmentation
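
The system requirements above can be checked with a short script before installing anything. This is a sketch only: check_prerequisites is a hypothetical helper, not part of the repository, and it does not check RAM or Hugging Face access.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)


def check_prerequisites(version=None, find_tool=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `version` and `find_tool` are injectable so the check can be exercised
    without changing the real interpreter or PATH.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # shutil.which returns None when the executable is not on PATH
    if find_tool("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Running print(check_prerequisites()) in the activated virtual environment prints an empty list when both the Python version and ffmpeg are in place.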

Usage

Start the Flask server: python run.py
Open your browser and navigate to: http://localhost:5000
Upload an audio file
- Maximum file size: 16MB
- Supported formats: MP3, WAV
Wait for processing to complete
- Progress will be shown in real-time
- Results will be available for download

Output Files

transcript.json: Full transcript with speaker labels and timestamps
transcript.txt: Human-readable transcript with summary
transcript.srt: Subtitle format with speaker identification
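
As an illustration of the subtitle output, the sketch below renders speaker-labeled segments as SRT cues. The segment dictionary keys (start, end, speaker, text) and the bracketed speaker prefix are assumptions for this example; the files the application actually writes may be laid out differently.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues with a speaker prefix."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text'].strip()}\n"
        )
    # Each cue ends with a newline, so joining with "\n" leaves the
    # blank line that separates cues in SRT.
    return "\n".join(cues)
```

The same list of segments could be passed to json.dump to produce a JSON transcript, so one in-memory structure can feed more than one output format.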

Project Structure

/app
  /static - Static assets and uploaded images
  /storage - Temporary file storage
  /templates - HTML templates
  /transcriber - Core transcription logic
  /config - Application configuration

Security Notes

All uploaded files are securely handled
Temporary files are automatically cleaned
File extensions are strictly validated
Maximum file size is enforced
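
The extension and size checks can be pictured roughly as follows. This is a sketch under assumptions, not the project's code: the function names are hypothetical, and the constants simply restate the MP3/WAV and 16MB limits given under Usage.

```python
import os

ALLOWED_EXTENSIONS = {"mp3", "wav"}
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # 16MB, the limit stated under Usage


def allowed_file(filename):
    """True if the final extension of the name is in the allow-list."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS


def within_size_limit(size_bytes, limit=MAX_UPLOAD_BYTES):
    """True for non-empty uploads no larger than the limit."""
    return 0 < size_bytes <= limit


def validate_upload(filename, size_bytes):
    """Return an error message, or None if the upload is acceptable."""
    if not allowed_file(filename):
        return "Unsupported file type (expected MP3 or WAV)"
    if not within_size_limit(size_bytes):
        return "File is empty or exceeds the 16MB limit"
    return None
```

Only the final extension is checked, so a name such as notes.mp3.exe is rejected. In a Flask application a request size cap is commonly set through the MAX_CONTENT_LENGTH configuration key; whether this project does so is not stated here.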

License

MIT License - See LICENSE file for details

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.
