Commit
1 parent f4223ee, commit 96e7f5c
Showing 2 changed files with 39 additions and 119 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,80 +1,61 @@ | ||
|
||
# Table of Contents | ||
|
||
- [Installation](#installation) | ||
* [Install k2](#install-k2) | ||
* [Install lhotse](#install-lhotse) | ||
* [Install icefall](#install-icefall) | ||
- [Run recipes](#run-recipes) | ||
<div align="center"> | ||
<img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168> | ||
</div> | ||
|
||
## Installation | ||
|
||
`icefall` depends on [k2][k2] for FSA operations and [lhotse][lhotse] for | ||
data preparation. To use `icefall`, you have to install its dependencies first. | ||
The following subsections describe how to set up the environment. | ||
|
||
CAUTION: There are various ways to set up the environment. What we describe | ||
here is just one alternative. | ||
Please refer to <https://icefall.readthedocs.io/en/latest/installation/index.html> | ||
for installation. | ||
|
||
### Install k2 | ||
## Recipes | ||
|
||
Please refer to [k2's installation documentation][k2-install] to install k2. | ||
If you run into any issues installing k2, please open an issue at | ||
<https://github.com/k2-fsa/k2/issues>. | ||
Please refer to <https://icefall.readthedocs.io/en/latest/recipes/index.html> | ||
for more information. | ||
|
||
### Install lhotse | ||
We provide two recipes at present: | ||
|
||
Please refer to [lhotse's installation documentation][lhotse-install] to install | ||
lhotse. | ||
- [yesno][yesno] | ||
- [LibriSpeech][librispeech] | ||
|
||
### Install icefall | ||
### yesno | ||
|
||
`icefall` is a set of Python scripts. All you need to do is set | ||
the environment variable `PYTHONPATH`: | ||
This is the simplest ASR recipe in `icefall` and can be run on CPU. | ||
Training takes less than 30 seconds and gives you the following WER: | ||
|
||
```bash | ||
cd $HOME/open-source | ||
git clone https://github.com/k2-fsa/icefall | ||
cd icefall | ||
pip install -r requirements.txt | ||
export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH | ||
``` | ||
|
||
To verify `icefall` was installed successfully, you can run: | ||
|
||
```bash | ||
python3 -c "import icefall; print(icefall.__file__)" | ||
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ] | ||
``` | ||
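The WER line above can be checked by hand. The following is a minimal sketch, assuming the log reports one total error out of 240 reference words and that WER is the usual (insertions + deletions + substitutions) / reference words. The function name `word_error_rate` is hypothetical and is not part of `icefall`.

```python
def word_error_rate(ref_words: int, ins: int, dele: int, sub: int) -> float:
    """Return WER as a percentage: (ins + del + sub) / reference words.

    Hypothetical helper for illustration; not an icefall API.
    """
    if ref_words <= 0:
        raise ValueError("ref_words must be positive")
    return 100.0 * (ins + dele + sub) / ref_words


# Counts taken from the log line: [1 / 240, 0 ins, 1 del, 0 sub ]
wer = word_error_rate(ref_words=240, ins=0, dele=1, sub=0)
print(f"%WER {wer:.2f}%")  # prints "%WER 0.42%"
```

A single deletion among 240 words gives 1 / 240 ≈ 0.4167%, which rounds to the reported 0.42%.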
We provide a Colab notebook for this recipe. | ||
|
||
It should print the path to `icefall`. | ||
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing) | ||
|
||
## Recipes | ||
|
||
At present, two recipes are provided: | ||
### LibriSpeech | ||
|
||
- [LibriSpeech][LibriSpeech] | ||
- [yesno][yesno] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing) | ||
We provide two models for this recipe: [conformer CTC model][LibriSpeech_conformer_ctc] | ||
and [TDNN LSTM CTC model][LibriSpeech_tdnn_lstm_ctc]. | ||
|
||
### Yesno | ||
#### Conformer CTC Model | ||
|
||
For the yesno recipe, training for 50 epochs takes less than 2 minutes on **CPU**. | ||
The best WER we currently have is: | ||
|
||
The WER is | ||
||test-clean|test-other| | ||
|--|--|--| | ||
|WER| 2.57% | 5.94% | | ||
|
||
``` | ||
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ] | ||
``` | ||
We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing) | ||
|
||
#### TDNN LSTM CTC Model | ||
|
||
## Use Pre-trained models | ||
The WER for this model is: | ||
|
||
See [egs/librispeech/ASR/conformer_ctc/README.md](egs/librispeech/ASR/conformer_ctc/README.md) | ||
for how to use pre-trained models. | ||
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing) | ||
||test-clean|test-other| | ||
|--|--|--| | ||
|WER| 6.59% | 17.69% | | ||
|
||
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kNmDXNMwREi0rZGAOIAOJo93REBuOTcd?usp=sharing) | ||
|
||
[yesno]: egs/yesno/ASR/README.md | ||
[LibriSpeech]: egs/librispeech/ASR/README.md | ||
[k2-install]: https://k2.readthedocs.io/en/latest/installation/index.html# | ||
[k2]: https://github.com/k2-fsa/k2 | ||
[lhotse]: https://github.com/lhotse-speech/lhotse | ||
[lhotse-install]: https://lhotse.readthedocs.io/en/latest/getting-started.html#installation | ||
[LibriSpeech_tdnn_lstm_ctc]: egs/librispeech/ASR/tdnn_lstm_ctc | ||
[LibriSpeech_conformer_ctc]: egs/librispeech/ASR/conformer_ctc | ||
[yesno]: egs/yesno/ASR | ||
[librispeech]: egs/librispeech/ASR |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,64 +1,3 @@ | ||
|
||
## Data preparation | ||
|
||
To have `./prepare.sh` download everything for you, | ||
just run: | ||
|
||
``` | ||
./prepare.sh | ||
``` | ||
|
||
If you have pre-downloaded the LibriSpeech dataset, please | ||
read `./prepare.sh` and modify it to point to the location | ||
of your dataset so that it won't re-download it. After modification, | ||
please run | ||
|
||
``` | ||
./prepare.sh | ||
``` | ||
|
||
The script `./prepare.sh` prepares features, lexicon, LMs, etc. | ||
All generated files are saved in the folder `./data`. | ||
|
||
**HINT:** `./prepare.sh` supports options `--stage` and `--stop-stage`. | ||
|
||
## TDNN-LSTM CTC training | ||
|
||
The folder `tdnn_lstm_ctc` contains scripts for CTC training | ||
with TDNN-LSTM models. | ||
|
||
Pre-configured parameters for training and decoding are set in the function | ||
`get_params()` within `tdnn_lstm_ctc/train.py` | ||
and `tdnn_lstm_ctc/decode.py`. | ||
|
||
To list the parameters that can be passed on the command line, run: | ||
|
||
``` | ||
./tdnn_lstm_ctc/train.py --help | ||
./tdnn_lstm_ctc/decode.py --help | ||
``` | ||
|
||
If you have 4 GPUs on a machine and want to use GPUs 0, 2, and 3 for | ||
multi-GPU training, you can run | ||
|
||
``` | ||
export CUDA_VISIBLE_DEVICES="0,2,3" | ||
./tdnn_lstm_ctc/train.py \ | ||
--master-port 12345 \ | ||
--world-size 3 | ||
``` | ||
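The relationship between `CUDA_VISIBLE_DEVICES` and `--world-size` in the command above can be sketched as follows. This is an illustration only, assuming one training process per visible GPU; the helper `visible_gpu_ids` is hypothetical and does not exist in `icefall`. It relies on the CUDA behavior that visible devices are renumbered from 0 inside the process.

```python
from typing import Dict, List


def visible_gpu_ids(cuda_visible_devices: str) -> List[str]:
    """Split a CUDA_VISIBLE_DEVICES string into physical GPU ids.

    Hypothetical helper for illustration; not an icefall API.
    """
    return [d.strip() for d in cuda_visible_devices.split(",") if d.strip()]


ids = visible_gpu_ids("0,2,3")

# Assumption: one process per visible GPU, so world size equals the count.
world_size = len(ids)

# Inside the process, CUDA renumbers the visible devices starting from 0,
# so logical device 1 refers to physical GPU 2 in this example.
logical_to_physical: Dict[int, str] = dict(enumerate(ids))

print(world_size)           # prints 3
print(logical_to_physical)  # prints {0: '0', 1: '2', 2: '3'}
```

Under that assumption, `--world-size 3` matches the three GPUs listed in `CUDA_VISIBLE_DEVICES`.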
|
||
If you want to decode by averaging checkpoints `epoch-8.pt`, | ||
`epoch-9.pt` and `epoch-10.pt`, you can run | ||
|
||
``` | ||
./tdnn_lstm_ctc/decode.py \ | ||
--epoch 10 \ | ||
--avg 3 | ||
``` | ||
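How `--epoch 10 --avg 3` maps to the three checkpoint files can be sketched as below. This is a minimal illustration, assuming averaging means an element-wise mean over the last `avg` checkpoints ending at `epoch`. Both function names are hypothetical, and plain floats stand in for model tensors; the actual logic in `decode.py` may differ.

```python
from typing import Dict, List


def checkpoints_to_average(epoch: int, avg: int) -> List[str]:
    """List the checkpoint files selected by --epoch and --avg.

    Hypothetical helper for illustration; not an icefall API.
    """
    if avg < 1:
        raise ValueError("avg must be at least 1")
    start = epoch - avg + 1
    return [f"epoch-{i}.pt" for i in range(start, epoch + 1)]


def average_params(states: List[Dict[str, float]]) -> Dict[str, float]:
    """Element-wise mean of parameter dicts that share the same keys."""
    if not states:
        raise ValueError("need at least one state to average")
    n = len(states)
    return {key: sum(s[key] for s in states) / n for key in states[0]}


print(checkpoints_to_average(epoch=10, avg=3))
# prints ['epoch-8.pt', 'epoch-9.pt', 'epoch-10.pt']

# Toy parameters standing in for three loaded checkpoints.
print(average_params([{"w": 1.0}, {"w": 2.0}, {"w": 3.0}]))
# prints {'w': 2.0}
```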
|
||
## Conformer CTC training | ||
|
||
The folder `conformer_ctc` contains scripts for CTC training | ||
with conformer models. The steps for running training and | ||
decoding are similar to those for `tdnn_lstm_ctc`. | ||
Please refer to <https://icefall.readthedocs.io/en/latest/recipes/librispeech.html> | ||
for how to run models in this recipe. |