See the Medium Article for context and more information
NOTE: This does NOT work on mobile iOS devices. See Gradio Issue #2987 for details.
This is a Gradio UI application that takes in a request for a story from the microphone and speaks an interactive Choose-Your-Own-Adventure style children's story. It leverages:
- OpenAI Whisper: to transcribe user audio input request
- OpenAI ChatGPT (3.5-turbo): to generate a story chapter given the user's inputs
- (Optional) Google Cloud Text-to-Speech: to use realistic voices when telling the story.
WARNING: This application uses paid API services. Create quotas and watch your usage.
At the time of writing, the pricing is as follows:
- whisper: $0.006 / minute (rounded to the nearest second)
- gpt-3.5-turbo: $0.002 / 1K tokens
- Google Text-to-Speech:
- 0 to 1 million bytes free per month
- $0.000016 USD per byte ($16.00 USD per 1 million bytes)
Check the links as these can change often. But at the time of writing it costs less than one USD for light use.
Both OpenAI and Google offer free credits for new users.
Note there are two ways to speak the story: Mac or GCP Text-to-Speech. If using a Mac,
the Mac say
command is used and that's the easiest/fastest route to running this.
It uses the System voice set up in the Accessibility settings.
However, if not on a Mac or if you prefer a more realistic voice, the GCP Text-to-Speech may be used.
This requires you having (a) a GCP project, (b) the TTS API enabled, and (c) your account authenticated
in gcloud (or GOOGLE_APPLICATION_CREDENTIALS environment variable set).
This application has only been tested on a Macbook.
- Sign up at OpenAI and acquire an OpenAI API key.
- Add to environment variable with:
export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxx
" - Create virtual environment
- Run
pip install -r requirements.txt
- If on Mac, brew install
ffmpeg
:brew install ffmpeg
- Linux may need to install also but untested.
- Review and update config in
config.py
as desired - If using GCP TTS
- set in
config.py
:SPEECH_METHOD = SpeechMethod.GCP
- Navigate to the Google API page and enable the API
- Confirm you are authenticated in gcloud and your account has access to that API.
- Run with:
python storyteller.py
- Navigate to
http://127.0.0.1:7860/
and have fun!
Replace <service-name>
with a name of your choice.
- Build Docker image:
docker build -t <image-name> .
- Run locally with something similar to:
docker run -it --rm \
-e GOOGLE_APPLICATION_CREDENTIALS=/tmp/creds.json \
-v ${HOME}/.config/gcloud/application_default_credentials.json:/tmp/creds.json \
-e OPENAI_API_KEY=<openai-api-key> \
-p <port>:7860 \
audio-storyteller \
python storyteller.py \
--address=0.0.0.0 \
--port=7860 \
--username=<username> \
--password=<password>
Fill in: <openai-api-key>, <port>, and optional <username>:<password>. Then once running, navigate on a browser to
127.0.0.1:` and fill in the
optional username:password you provided.
-
Follow the directions above to create a local docker image.
-
Tag and push (Note: Follow these directions to authenticate)
docker tag <image-name> gcr.io/<project-id>/<image-name> docker push gcr.io/<project-id>/<image-name>
-
Create a service account on your GCP project IAM page named:
audio-storytelling-bot@<project-id>.iam.gserviceaccount.com
-
Deploy with the following command, setting anything in
<>
appropriately:gcloud run deploy audio-storytelling-bot \ --image gcr.io/<project-id>/<image-name> \ --platform managed \ --service-account=audio-storytelling-bot@<project-id>.iam.gserviceaccount.com \ --set-env-vars=OPENAI_API_KEY=<openai-key-string> \ --allow-unauthenticated \ --port=7860 \ --cpu=1 \ --memory=512Mi \ --min-instances=0 \ --max-instances=3 \ --command="python" \ --args="storyteller.py,--address=0.0.0.0,--port=7860,--username=user,--password=storyteller"
Cloud Run will automatically scale the number of instances based on the incoming traffic. You can access the deployed Gradio application via the URL provided by the Cloud Run service.