Add doc for using WeNet CTC models with sherpa-onnx (#507)

k2-fsa · Nov 16, 2023 · 1c4fe27
1 parent a34c2c8
Showing 10 changed files with 155 additions and 7 deletions.
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -122,7 +122,7 @@ def get_version():
 .. _PyTorch: https://pytorch.org/
 .. _Huggingface: https://huggingface.co
 .. _WenetSpeech: https://github.com/wenet-e2e/WenetSpeech
-.. _wenet: https://github.com/k2-fsa/sherpa
+.. _WeNet: https://github.com/wenet-e2e/wenet
 .. _GigaSpeech: https://github.com/SpeechColab/GigaSpeech
 .. _Kaldi: https://github.com/kaldi-asr/kaldi
 .. _kaldifeat: https://csukuangfj.github.io/kaldifeat/installation.html

diff --git a/docs/source/cpp/pretrained_models/index.rst b/docs/source/cpp/pretrained_models/index.rst
@@ -16,7 +16,7 @@ Two kinds of end-to-end (E2E) models are supported by `sherpa`_:
 
    For CTC-based models, we support any type of models trained using CTC loss
    as long as you can export the model via torchscript. Models from the following
-   frameworks are currently supported: `icefall`_, `wenet`_, and `torchaudio`_ (Wav2Vec 2.0).
+   frameworks are currently supported: `icefall`_, `WeNet`_, and `torchaudio`_ (Wav2Vec 2.0).
    If you have a CTC model and want it to be supported in `sherpa`, please
    create an issue at `<https://github.com/k2-fsa/sherpa/issues>`_.
 
@@ -46,7 +46,7 @@ This page lists all available pre-trained models that you can download.
    for you to try offline recognition step by step.
 
    It shows how to install sherpa and use it as an offline recognizer,
-   which supports the models from icefall, the wenet framework and torchaudio.
+   which supports the models from icefall, the `WeNet`_ framework and torchaudio.
 
 .. |Sherpa offline recognition python api colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
    :target: https://colab.research.google.com/drive/1RdU06GcytTpI-r8vkQ7NkI0ugytnwJVB?usp=sharing

diff --git a/docs/source/cpp/pretrained_models/offline_ctc/wenet.rst b/docs/source/cpp/pretrained_models/offline_ctc/wenet.rst
@@ -1,7 +1,7 @@
 WeNet
 =====
 
-This section lists models from `wenet`_.
+This section lists models from `WeNet`_.
 
 wenet-english-model (English)
 -----------------------------

diff --git a/docs/source/onnx/pretrained_models/index.rst b/docs/source/onnx/pretrained_models/index.rst
@@ -21,4 +21,5 @@ available pre-trained models.
    offline-paraformer/index
    offline-ctc/index
    whisper/index
+   wenet/index
    small-online-models
diff --git a/docs/source/onnx/pretrained_models/wenet/all-models.rst b/docs/source/onnx/pretrained_models/wenet/all-models.rst
@@ -0,0 +1,36 @@
+All models from WeNet
+=====================
+
+`<https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.en.md>`_
+lists all pre-trained models from `WeNet`_ and we have converted all of them
+to `sherpa-onnx`_ using the following script:
+
+  `<https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/wenet/run.sh>`_.
+
+We have uploaded the exported models to `Huggingface`_; they are listed in
+the following figure:
+
+  .. figure:: ./pic/wenet-models-onnx-list.jpg
+     :alt: All pre-trained models from WeNet
+     :width: 600
+
+     All pre-trained models from `WeNet`_.
+
+To make it easier to copy the links, we list them below:
+
+  - `<https://huggingface.co/csukuangfj/sherpa-onnx-zh-wenet-aishell>`_
+  - `<https://huggingface.co/csukuangfj/sherpa-onnx-zh-wenet-aishell2>`_
+  - `<https://huggingface.co/csukuangfj/sherpa-onnx-en-wenet-gigaspeech>`_
+  - `<https://huggingface.co/csukuangfj/sherpa-onnx-en-wenet-librispeech>`_
+  - `<https://huggingface.co/csukuangfj/sherpa-onnx-zh-wenet-multi-cn>`_
+  - `<https://huggingface.co/csukuangfj/sherpa-onnx-zh-wenet-wenetspeech>`_
+
+Colab
+-----
+
+We provide a colab notebook
+|Sherpa-onnx wenet ctc colab notebook|
+for you to try the exported `WeNet`_ models with `sherpa-onnx`_.
+
+.. |Sherpa-onnx wenet ctc colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
+   :target: https://github.com/k2-fsa/colab/blob/master/sherpa-onnx/sherpa_onnx_with_models_from_wenet.ipynb
diff --git a/docs/source/onnx/pretrained_models/wenet/how-to-export.rst b/docs/source/onnx/pretrained_models/wenet/how-to-export.rst
@@ -0,0 +1,99 @@
+How to export models from WeNet to sherpa-onnx
+==============================================
+
+Suppose you have the following files from `WeNet`_:
+
+  - ``final.pt``
+  - ``train.yaml``
+  - ``global_cmvn``
+  - ``units.txt``
+
+We describe below how to use scripts from `sherpa-onnx`_ to export your files.
+
+.. hint::
+
+   Both streaming and non-streaming models are supported.
+
+Export for non-streaming inference
+----------------------------------
+
+You can use the following script
+
+  `<https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/wenet/export-onnx.py>`_
+
+to export your model to `sherpa-onnx`_. After running it, you should get two files:
+
+  - ``model.onnx``
+  - ``model.int8.onnx``
+
+Next, we rename ``units.txt`` to ``tokens.txt`` to follow the convention used in `sherpa-onnx`_:
+
+.. code-block:: bash
+
+    mv units.txt tokens.txt
+
+Now you can use the following commands for speech recognition with the exported models:
+
+.. code-block:: bash
+
+  # with float32 models
+  ./build/bin/sherpa-onnx-offline \
+    --wenet-ctc-model=./model.onnx \
+    --tokens=./tokens.txt \
+    /path/to/some.wav
+
+  # with int8 models
+  ./build/bin/sherpa-onnx-offline \
+    --wenet-ctc-model=./model.int8.onnx \
+    --tokens=./tokens.txt \
+    /path/to/some.wav
+
+Export for streaming inference
+------------------------------
+
+You can use the following script
+
+  `<https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/wenet/export-onnx-streaming.py>`_
+
+to export your model to `sherpa-onnx`_. After running it, you should get two files:
+
+  - ``model-streaming.onnx``
+  - ``model-streaming.int8.onnx``
+
+Next, we rename ``units.txt`` to ``tokens.txt`` to follow the convention used in `sherpa-onnx`_:
+
+.. code-block:: bash
+
+    mv units.txt tokens.txt
+
+Now you can use the following commands for speech recognition with the exported models:
+
+.. code-block:: bash
+
+  # with float32 models
+  ./build/bin/sherpa-onnx \
+    --wenet-ctc-model=./model-streaming.onnx \
+    --tokens=./tokens.txt \
+    /path/to/some.wav
+
+  # with int8 models
+  ./build/bin/sherpa-onnx \
+    --wenet-ctc-model=./model-streaming.int8.onnx \
+    --tokens=./tokens.txt \
+    /path/to/some.wav
+
+FAQs
+----
+
+sherpa-onnx/csrc/online-wenet-ctc-model.cc:Init:144 head does not exist in the metadata
+---------------------------------------------------------------------------------------
+
+.. code-block::
+
+   /Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/online-wenet-ctc-model.cc:Init:144 head does not exist in the metadata
+
+To fix the above error, please check the following two items:
+
+  - Make sure you are using ``model-streaming.onnx`` or ``model-streaming.int8.onnx``. The executable
+    you are running requires a streaming model as input.
+  - Make sure you use the script from `sherpa-onnx`_ to export your model.
diff --git a/docs/source/onnx/pretrained_models/wenet/index.rst b/docs/source/onnx/pretrained_models/wenet/index.rst
@@ -0,0 +1,12 @@
+WeNet
+=====
+
+This page lists all CTC models from `WeNet`_.
+
+
+.. toctree::
+   :maxdepth: 5
+
+   how-to-export
+   all-models
+
diff --git a/docs/source/onnx/pretrained_models/wenet/pic/wenet-models-onnx-list.jpg b/docs/source/onnx/pretrained_models/wenet/pic/wenet-models-onnx-list.jpg
diff --git a/docs/source/sherpa/pretrained_models/index.rst b/docs/source/sherpa/pretrained_models/index.rst
@@ -16,7 +16,7 @@ Two kinds of end-to-end (E2E) models are supported by `k2-fsa/sherpa`_:
 
    For CTC-based models, we support any type of models trained using CTC loss
    as long as you can export the model via torchscript. Models from the following
-   frameworks are currently supported: `icefall`_, `wenet`_, and `torchaudio`_ (Wav2Vec 2.0).
+   frameworks are currently supported: `icefall`_, `WeNet`_, and `torchaudio`_ (Wav2Vec 2.0).
    If you have a CTC model and want it to be supported in `k2-fsa/sherpa`_, please
    create an issue at `<https://github.com/k2-fsa/sherpa/issues>`_.
 
@@ -46,7 +46,7 @@ This page lists all available pre-trained models that you can download.
    for you to try offline recognition step by step.
 
    It shows how to install sherpa and use it as an offline recognizer,
-   which supports the models from icefall, the wenet framework and torchaudio.
+   which supports the models from icefall, the `WeNet`_ framework and torchaudio.
 
 .. |Sherpa offline recognition python api colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
    :target: https://github.com/k2-fsa/colab/blob/master/sherpa/sherpa_offline_recognition_python_api_demo.ipynb

diff --git a/docs/source/sherpa/pretrained_models/offline_ctc/wenet.rst b/docs/source/sherpa/pretrained_models/offline_ctc/wenet.rst
@@ -1,7 +1,7 @@
 WeNet
 =====
 
-This section lists models from `wenet`_.
+This section lists models from `WeNet`_.
 
 wenet-english-model (English)
 -----------------------------