This repository contains a reimplementation of STaRHuBERT.
This project was developed as part of the CMU 11-785 course.
Contributors: Qingzheng Wang, Shun Li, Toris Ye, Fanxing Bu
Key Features:
- Speech Temporal Relation (STaR): distills knowledge by focusing on the pairwise temporal relation between two speech frames.
- Temporal Gram Matrix (TGM): proposes the Temporal Gram Matrix, which aggregates channel information at two time steps (see the sketch after this list).
- Layer-wise TGM: distills the TGM at every Transformer layer.
- Intra-layer TGM: modifies the TGM to compute the temporal relation between the input and output of a single Transformer layer.
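
To make the TGM concrete, here is a minimal PyTorch sketch (our own illustration, not the paper's code): `temporal_gram_matrix` is a hypothetical helper, and dividing by the channel count is our normalization assumption. The key property is that the output is a time-by-time matrix, so teacher and student TGMs can be compared even when their channel widths differ:

```python
import torch

def temporal_gram_matrix(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Pairwise temporal relation between two frame sequences.

    x, y: (batch, time, channels) hidden states. Entry (t, s) aggregates
    the channel information of frame t in x and frame s in y via an
    inner product, yielding a (batch, time, time) matrix.
    """
    return torch.einsum("btc,bsc->bts", x, y) / x.size(-1)

h = torch.randn(2, 50, 768)                    # one layer's output
layer_tgm = temporal_gram_matrix(h, h)         # layer-wise TGM: (2, 50, 50)
h_in = torch.randn(2, 50, 768)                 # the same layer's input
intra_tgm = temporal_gram_matrix(h_in, h)      # intra-layer TGM
```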
Reimplementation Results:
Below is the checkpoint for our reimplementation:
The STaR distillation code is implemented in train_starhubert.py; we wrote it ourselves because it was not included in the original repository released with the paper.
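
Since the original release did not ship this code, the following is only a hypothetical outline of the layer-wise objective: match teacher and student TGMs at every paired Transformer layer. The layer pairing and the L1 distance are our assumptions; see train_starhubert.py for what we actually implemented:

```python
import torch
import torch.nn.functional as F

def tgm(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Temporal Gram Matrix as sketched above: (batch, time, time).
    return torch.einsum("btc,bsc->bts", x, y) / x.size(-1)

def star_distill_loss(teacher_layers, student_layers):
    # Lists of (batch, time, channels) hidden states, one per paired layer.
    loss = 0.0
    for t, s in zip(teacher_layers, student_layers):
        loss = loss + F.l1_loss(tgm(t, t), tgm(s, s))
    return loss / len(student_layers)
```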
You can evaluate our reimplementation on SUPERB downstream tasks using our checkpoint by following the steps below.
- Clone and install the S3PRL toolkit in editable (dev) mode:

  ```bash
  pip install -e ".[all]"
  ```
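
  A minimal end-to-end sketch of this step, assuming the upstream repository at github.com/s3prl/s3prl:

  ```bash
  git clone https://github.com/s3prl/s3prl.git
  cd s3prl
  pip install -e ".[all]"
  ```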
- Copy the entire `./models/starhubert` folder into `<s3prl root>/s3prl/upstream/`.
- Add the upstream import line to `<s3prl root>/s3prl/hub.py`:

  ```python
  from s3prl.upstream.starhubert.hubconf import *
  ```
- Modify the config file of each s3prl downstream task as follows (a sketch follows these sub-items):
  - Uncomment the learning rate scheduler.
  - Scale the learning rate to 10x for the speaker identification (SID) task.
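
  For reference, a hypothetical sketch of the edited portion of a downstream config; the actual key names and default values vary by task, so check the task's own config file rather than copying this verbatim:

  ```yaml
  # Hypothetical excerpt -- key names and values depend on the task.
  optimizer:
    lr: 1.0e-3                          # SID only: 10x the task's default
  scheduler:                            # previously commented out; uncomment it
    name: linear_schedule_with_warmup
  ```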
- Run the following command to fine-tune our STaRHuBERT model (the `-n` flag sets the experiment name; choose any name you like). For automatic speech recognition (ASR) as an example:

  ```bash
  python run_downstream.py \
      -m train \
      -n STaRHuBERT \
      -u starhubert \
      -d asr \
      -k <path to .ckpt file in <git root>/results/pretrain/> \
      -g <path to .yaml file in <git root>/results/pretrain/>
  ```
Note: Refer to the SUPERB docs for more information on usage details and data preparation.
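
After fine-tuning, you can score the best checkpoint with s3prl's evaluate mode. This is a minimal sketch assuming the default experiment directory layout (`result/downstream/<exp name>/`); check the s3prl documentation for the exact flags for your task:

```bash
python run_downstream.py -m evaluate -e result/downstream/STaRHuBERT/dev-best.ckpt
```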