This project provides a local Flask application that transcribes Twilio phone calls in real time. It uses AssemblyAI for speech-to-text transcription, Ngrok for tunneling, and Flask to handle incoming calls and WebSocket connections. The server runs on localhost, on port 5000 in this example.
- Twilio API: For handling phone calls and managing voice interactions.
- AssemblyAI: For converting speech to text.
- Ngrok: To create a secure tunnel to your localhost server.
- Flask: To create a local development server.
- Base64 (Python standard library): For decoding the base64-encoded audio payloads into raw mu-law bytes.
- Flask-Sock: To handle WebSocket connections.
- Clone the repository:
git clone https://github.com/aditya10avg/Transcriber.git
cd Transcriber
- Create and activate a virtual environment (recommended but optional):
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the required dependencies:
pip install Flask flask-sock assemblyai python-dotenv ngrok twilio
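To confirm the install worked, a short check like the one below can report which packages are still missing. It is a sketch, not part of the repository: the script name (for example check_deps.py) and the REQUIRED_MODULES tuple are assumptions derived from the pip command above. Note that two import names differ from their pip names.

```python
import importlib.util

# Import names for the packages installed above. Two differ from the pip
# names: flask-sock is imported as flask_sock, python-dotenv as dotenv.
REQUIRED_MODULES = ("flask", "flask_sock", "assemblyai", "dotenv", "ngrok", "twilio")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

Running it inside the activated virtual environment checks that environment rather than the system Python.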
- Make sure you have Twilio and AssemblyAI accounts, and sign up for Ngrok to get an auth token.
Usage
Create a file named .env in the same folder as stream.py and set your environment variables:
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_API_KEY_SID=your_twilio_api_key_sid
TWILIO_API_SECRET=your_twilio_api_secret
TWILIO_NUMBER=your_twilio_phone_number
NGROK_AUTHENTICATION=your_ngrok_auth_token
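Since the application needs all five values, failing fast when one is missing makes setup errors easier to spot. The sketch below assumes python-dotenv's load_dotenv() is used to read the .env file; the load_settings function is hypothetical and not taken from the repository.

```python
import os

# Variable names taken from the .env file above.
REQUIRED_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_API_KEY_SID",
    "TWILIO_API_SECRET",
    "TWILIO_NUMBER",
    "NGROK_AUTHENTICATION",
)


def load_settings(environ=None):
    """Return the required settings, failing fast if any are missing or empty."""
    if environ is None:
        # python-dotenv copies the .env entries into os.environ; variables
        # already set in the environment are not overridden by default.
        from dotenv import load_dotenv
        load_dotenv()
        environ = os.environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

For example, calling `settings = load_settings()` near the top of stream.py would raise a RuntimeError naming every missing variable before the server starts.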
Run the Flask application:
python stream.py
Call your Twilio number and speak; your speech is transcribed in the console in real time.
stream.py: This file contains the main logic for receiving calls, handling audio streams via WebSocket, and integrating with Twilio and AssemblyAI.
twilio_transcriber.py: This file handles the connection to AssemblyAI and manages the transcription process. It must be in the same folder as stream.py.
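When a call comes in, stream.py replies to Twilio with TwiML that points the call audio at a WebSocket. The sketch below shows one way to build that reply with the standard library; the function name build_stream_twiml, the /realtime route, and the optional greeting are illustrative assumptions, not the repository's actual code.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(host, websocket_route="/realtime", greeting=None):
    """Return TwiML that connects the caller's audio to a WebSocket on `host`.

    `websocket_route` is a hypothetical path; use whatever route stream.py
    registers with Flask-Sock.
    """
    response = ET.Element("Response")
    if greeting:
        # Optional spoken prompt played to the caller before streaming starts.
        ET.SubElement(response, "Say").text = greeting
    connect = ET.SubElement(response, "Connect")
    # Twilio expects a secure WebSocket (wss://) URL for <Stream>.
    ET.SubElement(connect, "Stream", url=f"wss://{host}{websocket_route}")
    return ET.tostring(response, encoding="unicode")
```

Inside a Flask view, the result could be returned with `Response(build_stream_twiml(request.host), mimetype="text/xml")`; when Twilio reaches the server through the tunnel, request.host is typically the public Ngrok hostname.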
The Flask application listens for incoming POST requests from Twilio at the / endpoint. When a call arrives, it responds with TwiML instructions that connect the call audio to a WebSocket. Twilio then streams the audio over that WebSocket connection, where it is processed in real time: each audio payload arrives as base64-encoded 8 kHz mu-law audio, is decoded from base64 into raw mu-law bytes, and is streamed to AssemblyAI for transcription.
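The WebSocket side of this flow can be sketched as a per-message handler. It assumes Twilio's Media Streams message format (JSON with an event field, and the audio under media.payload) and a transcriber object exposing connect(), stream() and close(); that interface and the handle_stream_message name are hypothetical stand-ins for whatever twilio_transcriber.py and stream.py actually provide.

```python
import base64
import json


def handle_stream_message(raw_message, transcriber):
    """Process one Twilio Media Streams message received over the WebSocket.

    `transcriber` is any object with connect(), stream(bytes) and close()
    methods (an assumed interface). Returns False once the stream has
    stopped, True otherwise.
    """
    message = json.loads(raw_message)
    event = message.get("event")
    if event == "start":
        # The stream has started: open the transcription session.
        transcriber.connect()
    elif event == "media":
        # The payload is base64 text; decoding yields raw 8 kHz mu-law bytes.
        audio = base64.b64decode(message["media"]["payload"])
        transcriber.stream(audio)
    elif event == "stop":
        # The call ended: close the transcription session.
        transcriber.close()
        return False
    # Other events (for example "connected" or "mark") are ignored here.
    return True
```

In a Flask-Sock route, the handler could drive the receive loop as `while handle_stream_message(ws.receive(), transcriber): pass`.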
TWILIO_ACCOUNT_SID: Your Twilio account SID.
TWILIO_API_KEY_SID: Your Twilio API Key SID.
TWILIO_API_SECRET: Your Twilio API Secret.
TWILIO_NUMBER: Your Twilio phone number.
NGROK_AUTHENTICATION: Your Ngrok authentication token.
- Flask
- Flask-Sock
- Twilio
- AssemblyAI
- Ngrok
- python-dotenv