From 96e7f5c7ea543d40b73ff773cb776b6f1178c503 Mon Sep 17 00:00:00 2001
From: Fangjun Kuang
Date: Tue, 24 Aug 2021 21:30:30 +0800
Subject: [PATCH] Release v0.1 (#26)

---
 README.md                     | 93 ++++++++++++++---------------------
 egs/librispeech/ASR/README.md | 65 +-----------------------
 2 files changed, 39 insertions(+), 119 deletions(-)

diff --git a/README.md b/README.md
index 0a9b657b3f..dc03c58836 100644
--- a/README.md
+++ b/README.md
@@ -1,80 +1,61 @@
-
-# Table of Contents
-
-- [Installation](#installation)
-  * [Install k2](#install-k2)
-  * [Install lhotse](#install-lhotse)
-  * [Install icefall](#install-icefall)
-- [Run recipes](#run-recipes)
+
+
+
 
 ## Installation
 
-`icefall` depends on [k2][k2] for FSA operations and [lhotse][lhotse] for
-data preparations. To use `icefall`, you have to install its dependencies first.
-The following subsections describe how to setup the environment.
-
-CAUTION: There are various ways to setup the environment. What we describe
-here is just one alternative.
+Please refer to
+for installation.
 
-### Install k2
+## Recipes
 
-Please refer to [k2's installation documentation][k2-install] to install k2.
-If you have any issues about installing k2, please open an issue at
-.
+Please refer to
+for more information.
 
-### Install lhotse
+We provide two recipes at present:
 
-Please refer to [lhotse's installation documentation][lhotse-install] to install
-lhotse.
+  - [yesno][yesno]
+  - [LibriSpeech][librispeech]
 
-### Install icefall
+### yesno
 
-`icefall` is a set of Python scripts. What you need to do is just to set
-the environment variable `PYTHONPATH`:
+This is the simplest ASR recipe in `icefall` and can be run on CPU.
+Training takes less than 30 seconds and gives you the following WER:
 
-```bash
-cd $HOME/open-source
-git clone https://github.com/k2-fsa/icefall
-cd icefall
-pip install -r requirements.txt
-export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATHON
 ```
-
-To verify `icefall` was installed successfully, you can run:
-
-```bash
-python3 -c "import icefall; print(icefall.__file__)"
+[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
 ```
+We do provide a Colab notebook for this recipe.
 
-It should print the path to `icefall`.
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
 
-## Recipes
 
-At present, two recipes are provided:
+### LibriSpeech
 
-  - [LibriSpeech][LibriSpeech]
-  - [yesno][yesno] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
+We provide two models for this recipe: [conformer CTC model][LibriSpeech_conformer_ctc]
+and [TDNN LSTM CTC model][LibriSpeech_tdnn_lstm_ctc].
 
-### Yesno
+#### Conformer CTC Model
 
-For the yesno recipe, training with 50 epochs takes less than 2 minutes using **CPU**.
+The best WER we currently have is:
 
-The WER is
+||test-clean|test-other|
+|--|--|--|
+|WER| 2.57% | 5.94% |
 
-```
-[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
-```
+We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
+
+#### TDNN LSTM CTC Model
 
-## Use Pre-trained models
+The WER for this model is:
 
-See [egs/librispeech/ASR/conformer_ctc/README.md](egs/librispeech/ASR/conformer_ctc/README.md)
-for how to use pre-trained models.
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
+||test-clean|test-other|
+|--|--|--|
+|WER| 6.59% | 17.69% |
 
+We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kNmDXNMwREi0rZGAOIAOJo93REBuOTcd?usp=sharing)
 
-[yesno]: egs/yesno/ASR/README.md
-[LibriSpeech]: egs/librispeech/ASR/README.md
-[k2-install]: https://k2.readthedocs.io/en/latest/installation/index.html#
-[k2]: https://github.com/k2-fsa/k2
-[lhotse]: https://github.com/lhotse-speech/lhotse
-[lhotse-install]: https://lhotse.readthedocs.io/en/latest/getting-started.html#installation
+[LibriSpeech_tdnn_lstm_ctc]: egs/librispeech/ASR/tdnn_lstm_ctc
+[LibriSpeech_conformer_ctc]: egs/librispeech/ASR/conformer_ctc
+[yesno]: egs/yesno/ASR
+[librispeech]: egs/librispeech/ASR
diff --git a/egs/librispeech/ASR/README.md b/egs/librispeech/ASR/README.md
index 30778ed05e..ae0c2684df 100644
--- a/egs/librispeech/ASR/README.md
+++ b/egs/librispeech/ASR/README.md
@@ -1,64 +1,3 @@
-## Data preparation
-
-If you want to use `./prepare.sh` to download everything for you,
-you can just run
-
-```
-./prepare.sh
-```
-
-If you have pre-downloaded the LibriSpeech dataset, please
-read `./prepare.sh` and modify it to point to the location
-of your dataset so that it won't re-download it. After modification,
-please run
-
-```
-./prepare.sh
-```
-
-The script `./prepare.sh` prepares features, lexicon, LMs, etc.
-All generated files are saved in the folder `./data`.
-
-**HINT:** `./prepare.sh` supports options `--stage` and `--stop-stage`.
-
-## TDNN-LSTM CTC training
-
-The folder `tdnn_lstm_ctc` contains scripts for CTC training
-with TDNN-LSTM models.
-
-Pre-configured parameters for training and decoding are set in the function
-`get_params()` within `tdnn_lstm_ctc/train.py`
-and `tdnn_lstm_ctc/decode.py`.
-
-Parameters that can be passed from the command-line can be found by
-
-```
-./tdnn_lstm_ctc/train.py --help
-./tdnn_lstm_ctc/decode.py --help
-```
-
-If you have 4 GPUs on a machine and want to use GPU 0, 2, 3 for
-mutli-GPU training, you can run
-
-```
-export CUDA_VISIBLE_DEVICES="0,2,3"
-./tdnn_lstm_ctc/train.py \
-  --master-port 12345 \
-  --world-size 3
-```
-
-If you want to decode by averaging checkpoints `epoch-8.pt`,
-`epoch-9.pt` and `epoch-10.pt`, you can run
-
-```
-./tdnn_lstm_ctc/decode.py \
-  --epoch 10 \
-  --avg 3
-```
-
-## Conformer CTC training
-
-The folder `conformer-ctc` contains scripts for CTC training
-with conformer models. The steps of running the training and
-decoding are similar to `tdnn_lstm_ctc`.
+Please refer to
+for how to run models in this recipe.