From d98e949c6064a55df19cd2205d67a5a34db184b6 Mon Sep 17 00:00:00 2001
From: asu
Date: Fri, 25 Oct 2024 14:35:26 +0200
Subject: [PATCH] Add results and notices for results for GigaSpeech transducer & wavlm

---
 recipes/GigaSpeech/ASR/CTC/README.md        |  4 +++-
 recipes/GigaSpeech/ASR/transducer/README.md | 12 +++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/recipes/GigaSpeech/ASR/CTC/README.md b/recipes/GigaSpeech/ASR/CTC/README.md
index 3e0af9b63b..488906ecb6 100644
--- a/recipes/GigaSpeech/ASR/CTC/README.md
+++ b/recipes/GigaSpeech/ASR/CTC/README.md
@@ -62,7 +62,9 @@ This can be done by modifying the current recipe. We invite you to have a look a
 | Release | Hyperparams file | Decoding method | Finetuning Split | Test WER | Dev WER | HuggingFace link | Full model link | Training GPUs |
 |:-------------:|:---------------------------:| :----------:| :-----:| :-----:| :-----:| :-----:| :-----:| :-----:|
-| 05-08-23 | train_hf_wavlm.yaml | GreedySearch | XL | xx | xx | TBD | TBD | 4xRTX 3090 |
+| 25-10-2024 | train_hf_wavlm.yaml | GreedySearch | XL | 11.88% | 11.86% | Unavailable\* | Unavailable\* | 8xRTX 3090 |
+
+\*: Unfortunately, we are unable to upload the checkpoints for the WavLM model at this time. We currently don't have plans to remedy this.
 # **Citing SpeechBrain**
 Please, cite SpeechBrain if you use it for your research or business.
diff --git a/recipes/GigaSpeech/ASR/transducer/README.md b/recipes/GigaSpeech/ASR/transducer/README.md
index b2a52a2648..d672f3c45c 100644
--- a/recipes/GigaSpeech/ASR/transducer/README.md
+++ b/recipes/GigaSpeech/ASR/transducer/README.md
@@ -48,10 +48,18 @@ According to our tests, the performance is not affected.
 Results are obtained with beam search and no LM (no-streaming i.e. full context).
+**TBD: The final models are currently in training.** This model has already been successfully trained, though. This will be updated when the checkpoints are ready for download.
+
+
+
+
 ## Streaming model
@@ -74,6 +82,8 @@ may end up forming indirect dependencies to audio many seconds ago.
 | | full | cs=32 (1280ms) | 16 (640ms) | 8 (320ms) |
 |:-----:|:----:|:-----:|:-----:|:-----:|
+**TBD: The final models are currently in training.** This model has already been successfully trained, though. This will be updated when the checkpoints are ready for download.
+
 ### Inference
 Once your model is trained, you need a few manual steps in order to use it with the high-level streaming interfaces (`speechbrain.inference.ASR.StreamingASR`):