# Whisper Transcription API

This project implements a high-performance audio transcription API using Rust, Python, and the Whisper AI model. It combines the efficiency of Rust for the web server with the power of Python's machine learning ecosystem.
## Components

- Rust Web Server (`src/main.rs`)
- Python Whisper Integration (`python/whisper_ffi.py`; sketched below)
- Benchmark Script (`benchmark_transcription.py`)
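The contents of `python/whisper_ffi.py` are not shown here, but a minimal sketch of the shape such a module might take, assuming the openai-whisper package, 16 kHz mono 16-bit PCM input, and a `transcribe_audio` entry point (all assumptions, not confirmed by this repository):

```python
# python/whisper_ffi.py -- hypothetical sketch, not the actual module
import numpy as np
import whisper

# Load the model once at import time so each request skips the load cost.
_model = whisper.load_model("base")

def transcribe_audio(audio_bytes: bytes) -> dict:
    """Decode raw 16-bit PCM audio and run Whisper on it.

    Assumes 16 kHz mono PCM; the real module may expect a different format.
    """
    # Convert raw bytes to the float32 waveform in [-1, 1] that Whisper expects.
    audio = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float32) / 32768.0
    result = _model.transcribe(audio)
    return {
        "transcription": result["text"],
        "language": result["language"],
    }
```

Note that the `confidence` field in the API response below is not produced directly by Whisper; it would have to be derived separately (for example, from per-segment log-probabilities), so it is omitted from this sketch.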
## Prerequisites

- Rust (latest stable version)
- Python 3.8+
- uv (Python package manager)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/whisper-transcription-api.git
  cd whisper-transcription-api
  ```

- Install Rust dependencies:

  ```bash
  cargo build
  ```

- Set up the Python environment:

  ```bash
  uv venv
  source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
  uv pip install -r requirements.txt
  ```
## Usage

Start the Rust server:

```bash
cargo run
```

The server will start on `http://localhost:3000`.
Send a POST request to `http://localhost:3000/transcribe` with the following JSON body:

```json
{
  "audio_data": [<raw audio bytes as a list of integers>]
}
```
The API will respond with:

```json
{
  "transcription": "Transcribed text",
  "language": "Detected language",
  "confidence": 0.98
}
```
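A request can be issued from Python like this (a sketch using the `requests` library; the endpoint and payload shape come from the schema above, while the file path is a placeholder and the exact byte format the server expects, headerless PCM versus a full audio file, is not specified here):

```python
import requests

# Read audio bytes from a local file (placeholder path).
with open("test_audio.wav", "rb") as f:
    audio_bytes = f.read()

response = requests.post(
    "http://localhost:3000/transcribe",
    json={"audio_data": list(audio_bytes)},  # bytes as a list of integers, per the schema above
    timeout=120,  # transcription can be slow, especially on CPU
)
response.raise_for_status()

result = response.json()
print(result["transcription"])
print(result["language"], result["confidence"])
```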
## Benchmarking

To benchmark the API performance:

- Update the `audio_file_path` in `benchmark_transcription.py` to point to your test audio file.
- Run the benchmark script:

  ```bash
  python benchmark_transcription.py
  ```

This will output performance statistics for both sequential and concurrent requests.
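The script's internals are not reproduced here, but a sequential-versus-concurrent comparison along these lines would produce that kind of statistic (a sketch, not the actual contents of `benchmark_transcription.py`; the file path and request count are placeholders):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:3000/transcribe"

def timed_request(payload: dict) -> float:
    """Send one transcription request and return its latency in seconds."""
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=120).raise_for_status()
    return time.perf_counter() - start

with open("test_audio.wav", "rb") as f:  # placeholder path
    payload = {"audio_data": list(f.read())}

# Sequential: requests issued one at a time.
seq_times = [timed_request(payload) for _ in range(5)]

# Concurrent: the same number of requests in flight at once.
with ThreadPoolExecutor(max_workers=5) as pool:
    conc_times = list(pool.map(timed_request, [payload] * 5))

print(f"sequential avg latency: {sum(seq_times) / len(seq_times):.2f}s")
print(f"concurrent avg latency: {sum(conc_times) / len(conc_times):.2f}s")
```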
## Project Structure

```
whisper-transcription-api/
├── src/
│   └── main.rs
├── python/
│   └── whisper_ffi.py
├── Cargo.toml
├── pyproject.toml
├── requirements.txt
├── benchmark_transcription.py
└── README.md
```
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

This project is licensed under the MIT License; see the LICENSE file for details.