add Chinese distill-whisper fine-tuning results #1648

yuekaizhang · 2024-06-12T08:57:56Z

In #1605, we fine-tuned Whisper on about 14k hours of Chinese data. This PR adds decoding results for the distill-whisper fine-tuning experiment.

No distillation loss was used for training. Only the model structure and parameter initialization method from the distill-whisper paper (https://arxiv.org/abs/2311.00430) were adopted: the decoder retains only the first and last layers of the original decoder, initialized from the corresponding original weights.
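As a rough illustration of the "keep only the first and last decoder layers" initialization, here is a minimal sketch that prunes a checkpoint-style dictionary. The helper names `first_and_last` and `prune_decoder_blocks` are hypothetical, and the `decoder.blocks.<i>.` key prefix is an assumption based on the OpenAI Whisper state-dict layout; the actual recipe in this PR may do it differently.

```python
import re


def first_and_last(num_layers):
    """Return the indices of the first and last layer (hypothetical helper)."""
    if num_layers < 2:
        raise ValueError("need at least two layers to keep first and last")
    return [0, num_layers - 1]


def prune_decoder_blocks(state_dict, keep, prefix="decoder.blocks."):
    """Keep only the decoder blocks listed in `keep` and renumber them 0..len(keep)-1.

    The key prefix is an assumption (OpenAI Whisper naming), not taken from this PR.
    Parameters outside the decoder blocks are copied unchanged.
    """
    index_map = {old: new for new, old in enumerate(keep)}
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)\.(.+)$")
    pruned = {}
    for key, value in state_dict.items():
        match = pattern.match(key)
        if match is None:
            pruned[key] = value  # embeddings, encoder weights, final norm, ...
            continue
        old_index = int(match.group(1))
        if old_index in index_map:
            pruned[f"{prefix}{index_map[old_index]}.{match.group(2)}"] = value
        # blocks not in `keep` are dropped
    return pruned


# Toy stand-in for a 32-layer decoder checkpoint (strings instead of tensors).
toy = {f"decoder.blocks.{i}.attn.query.weight": f"w{i}" for i in range(32)}
toy["decoder.token_embedding.weight"] = "emb"

small = prune_decoder_blocks(toy, first_and_last(32))
print(sorted(small))
```

With a real checkpoint, the pruned dictionary would be loaded into a model built with 2 decoder layers and then fine-tuned as usual.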

Accuracy:
Distill-whisper is slightly worse than the normal whisper.

| Model | CER (Average SPEECH_IO_TEST_SET 01-26) | Training Set |
|---|---|---|
| whisper-large-ft-v1 | 4.32% | multi-hans-zh (about 14k hours) |
| whisper-large-ft-v1-distill | 4.71% | multi-hans-zh (about 14k hours) |

Speed:
Each decoding step is about 4x faster than with the original decoder.

- Normal whisper: 32 decoder layers
- Distill-whisper: 2 decoder layers
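The layer count drops 16x while the per-step speedup is about 4x. One possible reading, sketched below, is that each decoding step also has a cost that does not shrink with the number of layers. The linear cost model (fixed cost plus a constant cost per layer) and the helper `implied_fixed_cost` are assumptions made here for illustration; they are not measurements from this PR.

```python
def implied_fixed_cost(full_layers, kept_layers, speedup):
    """Solve (f + full_layers * c) / (f + kept_layers * c) = speedup for f.

    The result is expressed in units of c, the assumed cost of one decoder layer.
    """
    return (full_layers - speedup * kept_layers) / (speedup - 1)


# Numbers from the description: 32 -> 2 decoder layers, about 4x per step.
fixed = implied_fixed_cost(full_layers=32, kept_layers=2, speedup=4.0)
print(fixed)  # 8.0
```

Under this assumed model, the layer-independent part of a step would cost roughly as much as 8 decoder layers, which is why the speedup is well below 16x.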

For a quick test: https://huggingface.co/yuekai/icefall_asr_multi-hans-zh_whisper/blob/main/test_model.py

JinZr

LGTM, thanks!

add distill whisper results

0936c80

yuekaizhang assigned JinZr Jun 12, 2024

JinZr approved these changes Jun 12, 2024

JinZr merged commit d5be739 into k2-fsa:master Jun 12, 2024
253 checks passed

yfyeung pushed a commit to yfyeung/icefall that referenced this pull request Aug 9, 2024

add distill whisper results (k2-fsa#1648)

4446a04
