LAS-based E2E system training with speller
./01-feature-extraction.sh # feature extraction
Feature extraction is inherited from Kaldi, and so needs kaldi-style data folder as the input, and features path.sh to point to kaldi installation and utils/ folder with data tools.
./02-sentencepiece-text-prep.sh # text segmenting with sentencepiece
./03-train.sh # prepares input; performs LAS training; has been tested on Python3.9
LAS training is based on scripts by Harikrishna Vydana, Decoder is changed to add speller, variables specifying different speller inputs, parameter for the number of embeddings representing OOVs
./04-decode.sh # runs beam-search decoding with the trained model