Web app, command-line interface and Python library for synthesizing English texts into speech.
pip install en-tts --user
Visit 🤗 Hugging Face for a live demo.
You can also run it locally by executing en-tts-web
in the CLI and opening your browser at http://127.0.0.1:7860.
en-tts-cli synthesize "When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow."
The output can be listened to here.
from pathlib import Path
from tempfile import gettempdir

from en_tts import Synthesizer, Transcriber, normalize_audio, save_audio

text = "When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow."

# Create the transcriber and the synthesizer
transcriber = Transcriber()
synthesizer = Synthesizer()

# Transcribe the English text to IPA, then synthesize audio from the IPA
text_ipa = transcriber.transcribe_to_ipa(text)
audio = synthesizer.synthesize(text_ipa)

# Save the audio to the system's temporary directory
tmp_dir = Path(gettempdir())
save_audio(audio, tmp_dir / "output.wav")

# Optional: normalize output
normalize_audio(tmp_dir / "output.wav", tmp_dir / "output_norm.wav")
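To synthesize several sentences in one run, the same three steps (transcribe, synthesize, save) can be wrapped in a loop. The following is a minimal sketch, not part of en-tts: the helper name `synthesize_all` and the `sentence_001.wav` file naming are assumptions made for illustration, and the en-tts calls are passed in as plain callables so that the helper itself does not depend on the package.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, List


def synthesize_all(
    texts: Iterable[str],
    transcribe: Callable[[str], str],
    synthesize: Callable[[str], Any],
    save: Callable[[Any, Path], Any],
    out_dir: Path,
) -> List[Path]:
    """Hypothetical helper: write one audio file per input text."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        text_ipa = transcribe(text)      # English text -> IPA
        audio = synthesize(text_ipa)     # IPA -> audio
        path = out_dir / f"sentence_{index:03d}.wav"  # assumed naming scheme
        save(audio, path)                # same argument order as save_audio above
        paths.append(path)
    return paths


# Assumed wiring with the en-tts objects from the example above:
#   from en_tts import Synthesizer, Transcriber, save_audio
#   paths = synthesize_all(
#       ["First sentence.", "Second sentence."],
#       Transcriber().transcribe_to_ipa,
#       Synthesizer().synthesize,
#       save_audio,
#       Path("out"),
#   )
```

Passing the callables in means the transcriber and synthesizer are created once and reused for every sentence.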
The TTS model used is published here.
Evaluation results:
- MOS naturalness: 3.55 ± 0.28 (GT: 4.17 ± 0.23)
- MOS intelligibility: 4.44 ± 0.24 (GT: 4.63 ± 0.19)
- Mean MCD-DTW: 29.15
- Mean penalty: 0.1018
- Vowels: i, u, æ, ɑ, ɔ, ə, ɛ, ɪ, ʊ, ʌ
- Diphthongs: aɪ, aʊ, eɪ, oʊ, ɔɪ
- R-colored vowels: ɔr, ər, ɛr, ɪr, ʊr, ʌr
- Consonants: b, d, dʒ, f, h, j, k, l, m, n, p, r, s, t, tʃ, v, w, z, ð, ŋ, ɡ, ʃ, θ
- Breaks:
  - SIL0 (no break)
  - SIL1 (short break)
  - SIL2 (break)
  - SIL3 (long break)
- Special characters: . ? ! , : ; - — " ' ( ) [ ]
Each vowel, diphthong, r-colored vowel and consonant can have one of these duration markers:
- ˘ -> very short, e.g., oʊ˘
- nothing -> normal, e.g., oʊ
- ˑ -> half long, e.g., oʊˑ
- ː -> long, e.g., oʊː
Furthermore, each vowel, diphthong and r-colored vowel can have a leading stress symbol attached:
- ˈ -> primary stress, e.g., ˈoʊ
- ˌ -> secondary stress, e.g., ˌoʊ
- nothing -> no stress, e.g., oʊ
Stress and duration markers can be combined, e.g., ˌoʊː
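The marker rules above can be sketched as a small helper that builds a marked symbol and rejects combinations the rules do not allow, such as a stress mark on a consonant. This is only an illustration: the function `mark` and the keyword names (`"primary"`, `"half_long"`, and so on) are hypothetical and not part of en-tts; only the phoneme inventory and the marker characters are taken from the lists above.

```python
VOWELS = ["i", "u", "æ", "ɑ", "ɔ", "ə", "ɛ", "ɪ", "ʊ", "ʌ"]
DIPHTHONGS = ["aɪ", "aʊ", "eɪ", "oʊ", "ɔɪ"]
R_COLORED_VOWELS = ["ɔr", "ər", "ɛr", "ɪr", "ʊr", "ʌr"]
CONSONANTS = [
    "b", "d", "dʒ", "f", "h", "j", "k", "l", "m", "n", "p", "r",
    "s", "t", "tʃ", "v", "w", "z", "ð", "ŋ", "ɡ", "ʃ", "θ",
]

# Keyword names are assumptions chosen for this sketch.
STRESS_MARKERS = {"none": "", "primary": "ˈ", "secondary": "ˌ"}
DURATION_MARKERS = {"very_short": "˘", "normal": "", "half_long": "ˑ", "long": "ː"}

# Only vowels, diphthongs and r-colored vowels may carry stress.
STRESSABLE = set(VOWELS) | set(DIPHTHONGS) | set(R_COLORED_VOWELS)
PHONEMES = STRESSABLE | set(CONSONANTS)


def mark(phoneme: str, stress: str = "none", duration: str = "normal") -> str:
    """Return the phoneme with a leading stress and a trailing duration marker."""
    if phoneme not in PHONEMES:
        raise ValueError(f"unknown phoneme: {phoneme!r}")
    if stress != "none" and phoneme not in STRESSABLE:
        raise ValueError(f"{phoneme!r} cannot carry stress")
    return STRESS_MARKERS[stress] + phoneme + DURATION_MARKERS[duration]


print(mark("oʊ", stress="secondary", duration="long"))  # ˌoʊː
print(mark("t", duration="very_short"))                 # t˘
```

Breaks and special characters are left out here because the text above does not assign them stress or duration markers.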
To cite this repo, use the BibTeX entry generated by GitHub (see About => Cite this repository).
- Taubert, S. (2024). en-tts (Version 0.0.2) [Computer software]. https://doi.org/10.5281/zenodo.11032264
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.
The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high throughput calculations.