CONTRIBUTING.md

How to contribute to ESPnet

1. What to contribute

If you are interested in contributing to ESPnet, your contribution will fall into one of three categories: major features, minor updates, and recipes.

1.1 Major features

If you want to request or propose a new feature, please first open a new issue with the tag Feature request, or directly contact Shinji Watanabe [email protected] or other main developers. The design and implementation of each feature should be discussed and adjusted in light of ongoing and future work. You can find ongoing major development plans at https://github.com/espnet/espnet/milestones or in the pinned issues at https://github.com/espnet/espnet/issues.

1.2 Minor Updates (minor feature, bug-fix for an issue)

If you want to propose a minor feature, update an existing minor feature, or fix a bug, please first take a look at the existing issues and/or pull requests. Pick an issue and leave a comment saying that you want to work on it.

If you need help or additional information to propose the feature, you can open a new issue with the tag Discussion and ask ESPnet members.

1.3 Recipes

ESPnet provides and maintains many example scripts, called recipes, that demonstrate how to use the toolkit. The recipes for ESPnet1 are located under the egs directory, while those for ESPnet2 are under egs2. As in Kaldi, each subdirectory of egs and egs2 corresponds to a corpus for which we have example scripts.

1.3.1 ESPnet1 recipes

ESPnet1 recipes (egs/X) follow the convention from Kaldi and may rely on several utilities available in Kaldi. As such, porting a new recipe from Kaldi to ESPnet is natural, and the user may refer to port-kaldi-recipe and other existing recipes for new additions. For the Kaldi-style recipe architecture, please refer to Prepare-Kaldi-Style-Directory.

For each recipe, we ask you to report the following: experiment results, environment, and model information. For reproducibility, a link to the uploaded pre-trained model may also be added. All this information should be written in a markdown file called RESULTS.md and put at the recipe root. You can refer to tedlium2-example for an example.

To generate RESULTS.md for a recipe, follow these steps:

  • Execute ~/espnet/utils/show_result.sh at the recipe root (where run.sh is located). You'll get your environment information and evaluation results for each experiment in a markdown format. From here, you can copy or redirect text output to RESULTS.md.
  • Execute ~/espnet/utils/pack_model.sh at the recipe root to generate a packed ESPnet model called model.tar.gz and output model information. Executing the utility script without argument will give you the expected arguments.
  • Put the model information in RESULTS.md, along with the model link if you're using private web storage.
  • If you don't have private web storage, please contact Shinji Watanabe [email protected] to give you access to ESPnet storage.

1.3.2 ESPnet2 recipes

ESPnet2's recipes correspond to egs2. ESPnet2 applies a new paradigm that does not depend on Kaldi's binaries, which makes it lighter and more general. For ESPnet2, we recommend using the common pipelines provided in asr.sh, tts.sh, and enh.sh rather than writing the recipe's stages for each corpus. For details on creating ESPnet2 recipes, please refer to egs2-readme.

The common pipeline of ESPnet2 recipes takes care of RESULTS.md generation, model packing, and uploading. ESPnet2 models are maintained at Zenodo and Hugging Face. You can also refer to the documentation at https://github.com/espnet/espnet_model_zoo. To upload your model, you first need to:

  1. Sign up to Zenodo: https://zenodo.org/
  2. Create access token: https://zenodo.org/account/settings/applications/tokens/new/
  3. Set your environment: % export ACCESS_TOKEN=""
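
Before starting an upload, it can help to confirm that the token is actually set. The check below is a minimal sketch and not part of ESPnet: the helper name require_access_token is hypothetical, and only the ACCESS_TOKEN variable name comes from the steps above.

```python
import os


def require_access_token(env=None):
    """Return the token stored in ACCESS_TOKEN, or raise if it is unset or empty.

    Hypothetical helper for illustration only; ESPnet does not provide it.
    """
    env = os.environ if env is None else env
    token = env.get("ACCESS_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "ACCESS_TOKEN is not set; create a token on Zenodo and export it first."
        )
    return token
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.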

To port models from Zenodo to the Hugging Face Hub:

  1. Create a Hugging Face account - https://huggingface.co/
  2. Request to be added to espnet organisation - https://huggingface.co/espnet
  3. Go to egs2/RECIPE/*/scripts/utils and run ./upload_models_to_hub.sh "ZENODO_MODEL_NAME"

1.3.3 Additional requirements for new recipe

2 Pull Request

If your proposed feature or bugfix is ready, please open a Pull Request (PR) at https://github.com/espnet/espnet or use the Pull Request button in your forked repo. If you're not familiar with the process, please refer to the following guides:

3 Version policy and development branches

Development mainly happens in the master branch.

  1. The first version digit stays at 0 until there are major changes at the project organization level.

  2. The second version digit is updated when we have major updates, including new functions and refactoring, together with their related bug fixes and recipe changes. So far, this update has happened roughly every six months (depending on the development plan).

  3. The third version digit is updated periodically (every two months or so) when we fix serious bugs or accumulate minor changes, including recipe-related changes.
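
To make the three-digit scheme concrete, the sketch below reports which digit changed between two version strings. The function name changed_digit and the version strings in the usage note are hypothetical, and the code assumes plain numeric versions of the form X.Y.Z.

```python
def changed_digit(old, new):
    """Return "first", "second", or "third" for the leftmost digit that differs.

    Returns None when both versions are identical.
    Hypothetical helper; assumes numeric X.Y.Z version strings.
    """
    names = ("first", "second", "third")
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form X.Y.Z")
    for name, a, b in zip(names, old_parts, new_parts):
        if a != b:
            return name
    return None
```

For example, changed_digit("0.9.7", "0.10.0") returns "second", which under the policy above would indicate a major update, while changed_digit("0.9.7", "0.9.8") returns "third".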

4 Unit testing

ESPnet's tests are located under test/. You can install additional packages for testing as follows:

$ cd <espnet_root>
$ . ./tools/activate_python.sh
$ pip install -e ".[test]"

4.1 Python

You can then run the entire test suite, which uses flake8, autopep8, black, and pytest with coverage, with:

./ci/test_python.sh

The following are some useful tips for using pytest:

  • New test files should be placed under the test/ directory and named test_xxx.py. Each test function in the file should have the form def test_yyy(...). Pytest will automatically find and run them.
  • We recommend adding several small test files instead of grouping tests in one big file (e.g.: test_e2e_xxx.py). Ideally, a test file should only cover methods from one file (e.g.: test_transformer_utils.py to test transformer_utils.py).
  • To monitor test coverage and avoid overlapping tests, we recommend using pytest --cov-report term-missing <test_file|dir> to highlight covered and missed lines. For more details, please refer to coverage-test.
  • We limit test running time to 2.0 seconds (see: pytest-timeouts). We therefore recommend using small model parameters and avoiding dynamic imports, file access, and unnecessary loops. If a unit test needs more running time, you can annotate it with @pytest.mark.execution_timeout(sec).
  • For test initialization (parameters, modules, etc), you can use pytest fixtures. Refer to pytest fixtures for more information.
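
The naming and sizing tips above can be sketched as a small test file. This is an illustration only: the file name test_clip_utils.py and the clip function are hypothetical stand-ins, and in a real test the function would be imported from the module under test rather than defined in the test file.

```python
# test/test_clip_utils.py  (hypothetical file; it would cover only clip_utils.py)


def clip(values, low, high):
    """Stand-in for the function under test, defined here to keep the sketch self-contained."""
    return [min(max(v, low), high) for v in values]


def test_clip_keeps_values_in_range():
    # Tiny inputs keep the test far below the running-time limit.
    assert clip([1, 2, 3], 0, 5) == [1, 2, 3]


def test_clip_limits_out_of_range_values():
    # Values below `low` or above `high` are replaced by the bounds.
    assert clip([-4, 2, 9], 0, 5) == [0, 2, 5]
```

Because both functions start with test_ and the file name starts with test_, pytest would collect them without any extra registration.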

4.2 Bash scripts

You can also test the scripts in utils with bats-core and shellcheck.

To test:

./ci/test_shell.sh

5 Integration testing

Write new integration tests in ci/test_integration.sh when you add new features in espnet/bin. They use our smallest dataset, egs/mini_an4, to test run.sh. So that coverage takes them into account, do not forget to support --python ${python} in your run.sh:

# ci/test_integration.sh

python="coverage run --append"

cd egs/mini_an4/your_task
./run.sh --python "${python}"

5.1 Configuration files

6 Writing new tools

You can place your new tools under

  • espnet/bin: heavy and large (e.g., neural network related) core tools.
  • utils: lightweight self-contained python/bash scripts.

For utils scripts, do not forget to add help messages and test scripts under test_utils.

6.1 Python tools guideline

So that documentation can be generated, do not forget to define def get_parser() -> ArgumentParser in the main file.

#!/usr/bin/env python3
# Copyright XXX
#  Apache 2.0  (http://www.apache.org/licenses/LICENSE-2.0)
import argparse

# NOTE: do not forget this
def get_parser():
    parser = argparse.ArgumentParser(
        description="awesome tool",  # DO NOT forget this
    )
    ...
    return parser

if __name__ == '__main__':
    args = get_parser().parse_args()
    ...
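
As a fuller illustration of the same layout, here is a minimal sketch of a small tool built around get_parser(). The tool itself (a line counter), its arguments, and the count_lines and main names are hypothetical; only the get_parser() convention and the description requirement come from this guideline.

```python
#!/usr/bin/env python3
import argparse


def get_parser():
    # A description is required so that the generated documentation is useful.
    parser = argparse.ArgumentParser(
        description="Count the lines of a text file (hypothetical example tool)",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("text", type=str, help="Path to the input text file")
    parser.add_argument(
        "--skip-empty",
        action="store_true",
        help="Ignore empty lines when counting",
    )
    return parser


def count_lines(lines, skip_empty=False):
    """Count the items in `lines`, optionally ignoring blank ones."""
    if skip_empty:
        lines = [line for line in lines if line.strip()]
    return len(list(lines))


def main(cmd_args=None):
    # cmd_args=None lets argparse fall back to sys.argv[1:].
    args = get_parser().parse_args(cmd_args)
    with open(args.text, encoding="utf-8") as f:
        print(count_lines(f.read().splitlines(), skip_empty=args.skip_empty))
```

In a real tool, main() would be called under the if __name__ == '__main__': guard as in the template. Keeping the parser in its own function means it can be built and inspected without running the tool.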

6.2 Bash tools guideline

So that documentation can be generated, support --help to show the tool's usage. If you use Kaldi's utils/parse_options.sh, define help_message="Usage: $0 ...".

7 Writing documentation

See doc.

8 Adding pretrained models

Pack your trained models using utils/pack_model.sh and upload them here (permission is required). Add the shared link to utils/recog_wav.sh or utils/synth_wav.sh as follows:

    "tedlium.demo") share_url="https://drive.google.com/open?id=1UqIY6WJMZ4sxNxSugUqp3mrGb3j6h7xe" ;;

The model name is arbitrary for now.

9 On CI failure

9.1 Travis CI and GitHub Actions

  1. read the log from PR checks > details

9.2 Circle CI

  1. read the log from PR checks > details
  2. turn on Rerun workflow > Rerun job with SSH
  3. open your local terminal and ssh -p xxx xxx (check circle ci log for the exact address)
  4. try anything you can to pass the CI

9.3 Codecov

  1. write more tests to increase coverage
  2. explain to reviewers why you can't increase coverage