🐸STT-based forced alignment tool, a fork of the unmaintained DSAlign.
It is recommended to use this tool from within a virtual environment. After cloning the repository and changing to its root directory, you can create one with all requirements in the git-ignored directory venv using the provided script:
$ bin/createenv.sh
$ ls venv
bin include lib lib64 pyvenv.cfg share
The bin/align.sh script will use it automatically.
Internally, STT-align uses the 🐸STT speech-to-text engine. To function, the engine requires a few files that are specific to the language of the speech data you want to align. You can download the models from here. If you want to align English, there is already a helper script that will download and prepare all required data:
$ bin/getmodel.sh
[...]
$ ls models/en/
alphabet.txt lm.binary output_graph.pb output_graph.pbmm output_graph.tflite trie
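After the download, a quick sanity check can confirm that nothing is missing. The sketch below is a hypothetical helper (check_model_dir is not part of STT-align) that only tests for the presence of each file in the listing above. It does not verify that the files are valid, and depending on the engine version not all of them may actually be required.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STT-align): report which of the model files
# from the bin/getmodel.sh listing are missing from a model directory.
# Only checks that each file exists; it does not validate file contents.
check_model_dir() {
  local dir="$1" missing=0 f
  for f in alphabet.txt lm.binary output_graph.pb output_graph.pbmm output_graph.tflite trie; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2   # list every absent file, not just the first
      missing=1
    fi
  done
  return "$missing"                 # 0 if all files exist, 1 otherwise
}
```

For example, check_model_dir models/en prints each missing file to stderr and returns a non-zero status, so it can be used as the condition of an if statement or a || chain.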
A typical application of the aligner consists of three phases:
- Preparing the data. Although most of this has to be done individually, there are some tools for data preparation, statistics and maintenance. All involved file formats are described here.
- Aligning the data using the alignment tool and its algorithm.
- Exporting aligned data using the data-set exporter.
There is a script for downloading and preparing some public-domain speech and transcript data. It requires ffmpeg for sample conversion.
$ bin/gettestdata.sh
$ ls data
test1 test2
Now the aligner can be called either "manually" (specifying all involved files directly):
$ bin/align.sh --audio data/test1/audio.wav --script data/test1/transcript.txt --aligned data/test1/aligned.json --tlog data/test1/transcript.log
Or "automatically" by specifying a so-called catalog file that bundles all involved paths:
$ bin/align.sh --catalog data/test1.catalog
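Because a catalog can bundle several data sets, it can be handy to generate one for your own data. The sketch below is a hypothetical helper, not part of STT-align: write_catalog and the data/mydata directory are made up for illustration. It assumes the catalog layout inherited from DSAlign, namely a JSON list with one object per data set whose audio, script, tlog and aligned keys mirror the command-line options above, with paths resolved relative to the directory of the catalog file. Check the file-format documentation before relying on these assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STT-align): write a catalog file that lists
# one entry per data set directory.
# ASSUMPTION: a catalog is a JSON list of objects with audio/script/tlog/aligned
# keys, paths relative to the catalog file's own directory (DSAlign layout).
# ASSUMPTION: every data set uses the same file names as data/test1 above.
# Names are not JSON-escaped, so avoid quotes or backslashes in them.
set -euo pipefail

write_catalog() {
  local catalog="$1"; shift          # output path, e.g. data/mydata.catalog
  local sep="" name
  mkdir -p "$(dirname "$catalog")"
  {
    echo "["
    for name in "$@"; do             # each remaining argument is a data set dir name
      printf '%s  {\n' "$sep"        # comma separator before every entry but the first
      printf '    "audio": "%s/audio.wav",\n' "$name"
      printf '    "script": "%s/transcript.txt",\n' "$name"
      printf '    "tlog": "%s/transcript.log",\n' "$name"
      printf '    "aligned": "%s/aligned.json"\n' "$name"
      printf '  }'
      sep=$',\n'
    done
    printf '\n]\n'
  } > "$catalog"
}

write_catalog data/mydata.catalog mydata
```

With data/mydata/audio.wav and data/mydata/transcript.txt in place, the result would then be passed as bin/align.sh --catalog data/mydata.catalog. Passing several names, for example write_catalog data/tests.catalog test1 test2, produces one entry per data set, provided each directory follows the same file naming.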