This is a simple server that uses Silero models to convert text to audio files over HTTP.

twirapp/silero-tts-api-server

Languages supported

Note

All models are from the repository: snakers4/silero-models

Language     Model     Speakers
Russian      v4_ru     5: aidar, baya, kseniya, xenia, eugene
Ukrainian    v4_ua     1: mykyta
Uzbek        v4_uz     1: dilnavoz
English      v3_en     118: en_0, en_1, ..., en_117
Spanish      v3_es     3: es_0, es_1, es_2
French       v3_fr     6: fr_0, fr_1, fr_2, fr_3, fr_4, fr_5
German       v3_de     5: bernd_ungerer, eva_k, friedrich, hokuspokus, karlsson
Tatar        v3_tt     1: dilyara
Mongolian    v3_xal    2: erdni, delghir
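
For client-side checks, the speaker table can be written down as a small lookup. This is only a sketch: the names `SPEAKERS_BY_MODEL` and `model_for_speaker` are illustrative and are not part of the server's code or API.

```python
from typing import Optional

# Speaker names per model, copied from the table above.
# The English speakers follow the pattern en_0 ... en_117.
SPEAKERS_BY_MODEL = {
    "v4_ru": ["aidar", "baya", "kseniya", "xenia", "eugene"],
    "v4_ua": ["mykyta"],
    "v4_uz": ["dilnavoz"],
    "v3_en": [f"en_{i}" for i in range(118)],
    "v3_es": ["es_0", "es_1", "es_2"],
    "v3_fr": [f"fr_{i}" for i in range(6)],
    "v3_de": ["bernd_ungerer", "eva_k", "friedrich", "hokuspokus", "karlsson"],
    "v3_tt": ["dilyara"],
    "v3_xal": ["erdni", "delghir"],
}


def model_for_speaker(speaker: str) -> Optional[str]:
    """Return the model name that lists this speaker, or None if unknown."""
    for model, speakers in SPEAKERS_BY_MODEL.items():
        if speaker in speakers:
            return model
    return None
```

A running server remains the authoritative source: the /speakers endpoint returns the list it actually serves.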

Installation via docker

Important

This requires Docker to be installed and the Docker daemon to be running.

docker run --rm -p 8000:8000 twirapp/silero-tts-api-server

Build and run from local repository

Clone the repository:

git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server

Build docker image:

docker build -f docker/Dockerfile -t silero-tts-api-server .

Run the container:

docker run --rm -p 8000:8000 silero-tts-api-server

Or use docker compose:

docker-compose -f docker/compose.yml up

Installation

Important

Requires Python 3.9 or later.

This project uses rye for dependency management; the steps below assume it is already installed.

  1. Clone the repository

    git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server
  2. Install dependencies

    This will automatically create the virtual environment in the .venv directory and install the required dependencies

    rye sync

    (Not recommended) Alternative install via pip

    Create a virtual environment and activate it:

    python3 -m venv .venv && source .venv/bin/activate

    Install only the required dependencies:

    pip3 install --no-deps -r requirements.lock
  3. Download silero tts models

    bash ./install_models.sh
  4. Run the server

    litestar run

Note

By default, the server listens on localhost:8000.

Documentation

You can view the automatically generated documentation based on OpenAPI at:

Provider               URL
Swagger                http://localhost:8000/schema/
ReDoc                  http://localhost:8000/schema/redoc
Stoplight Elements     http://localhost:8000/schema/elements
RapiDoc                http://localhost:8000/schema/rapidoc
OpenAPI schema yaml    http://localhost:8000/schema/openapi.yaml
OpenAPI schema json    http://localhost:8000/schema/openapi.json
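
The JSON schema can also be read programmatically. The sketch below lists the operations declared in an OpenAPI document. It assumes the server is reachable over plain HTTP at localhost:8000; the helper names `fetch_schema` and `list_operations` are illustrative only.

```python
import json
from urllib.request import urlopen

# HTTP method keys that an OpenAPI path item may contain.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> list:
    """Return sorted (METHOD, path) pairs found under the schema's "paths" object."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download and parse the OpenAPI JSON schema (assumed base URL)."""
    with urlopen(f"{base_url}/schema/openapi.json") as response:
        return json.load(response)


# Usage, with the server running:
#   for method, path in list_operations(fetch_schema()):
#       print(method, path)
```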

Endpoints

  • GET /generate - Generate audio in wav format from text. Parameters: text, speaker, sample_rate, pitch, rate
  • GET /speakers - Get list of speakers

sample_rate can be set to 8000, 24000, or 48000. pitch and rate can be set from 0 to 100.
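
A request to /generate can be assembled as below. This is a minimal client sketch, not the project's own code: the function names are hypothetical, the base URL assumes plain HTTP on localhost:8000, and optional parameters are simply omitted when not given, since the server's defaults are not stated here.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

ALLOWED_SAMPLE_RATES = (8000, 24000, 48000)


def build_generate_url(
    text: str,
    speaker: str,
    sample_rate: Optional[int] = None,
    pitch: Optional[int] = None,
    rate: Optional[int] = None,
    base_url: str = "http://localhost:8000",  # assumed address
) -> str:
    """Build a GET /generate URL, checking the documented value ranges."""
    params = {"text": text, "speaker": speaker}
    if sample_rate is not None:
        if sample_rate not in ALLOWED_SAMPLE_RATES:
            raise ValueError(f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}")
        params["sample_rate"] = sample_rate
    for name, value in (("pitch", pitch), ("rate", rate)):
        if value is not None:
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100")
            params[name] = value
    return f"{base_url}/generate?{urlencode(params)}"


def save_wav(url: str, out_path: str) -> None:
    """Download the generated audio and write it to out_path."""
    with urlopen(url) as response, open(out_path, "wb") as out_file:
        out_file.write(response.read())


# Usage, with the server running:
#   save_wav(build_generate_url("Hello world", "en_0", sample_rate=48000), "hello.wav")
```

Validating on the client side avoids a round trip for values the documentation already rules out.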

Environment variables

  • TEXT_LENGTH_LIMIT - Maximum length of the text to be processed. Default is 930 characters.
  • MKL_NUM_THREADS - Number of threads to use for generating audio. Defaults to the number of CPU cores.
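
How the server treats text longer than TEXT_LENGTH_LIMIT is not described here, so a client may prefer to split long input before sending it. The sketch below is one way to do that under stated assumptions: it reads the same variable name on the client side, falls back to the documented default of 930, and splits on whitespace. Both helper names are hypothetical.

```python
import os


def read_text_length_limit(default: int = 930) -> int:
    """Read TEXT_LENGTH_LIMIT from the environment, falling back to the default."""
    raw = os.environ.get("TEXT_LENGTH_LIMIT")
    return int(raw) if raw else default


def split_text(text: str, limit: int) -> list:
    """Split text into whitespace-separated chunks of at most `limit` characters."""
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own /generate request and the resulting audio files played or concatenated in order.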

Considerations for the future

This repository is dedicated to twir.app and is designed to meet its requirements.

TwirApp needs to generate audio using the CPU. If support for other devices such as cuda or mps is needed, please open an issue.