GitHub - BUTSpeechFIT/DiariZen: A toolkit for speaker diarization.

DiariZen

DiariZen is a speaker diarization toolkit driven by AudioZen and Pyannote 3.1.

Installation

# create virtual python environment
conda create --name diarizen python=3.10
conda activate diarizen

# install diarizen 
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt && pip install -e .

# install pyannote-audio
cd pyannote-audio && pip install -e .[dev,testing]

# install dscore
git submodule init
git submodule update

Datasets

We use SDM (first channel from the first far-field microphone array) data from the public AMI, AISHELL-4, and AliMeeting corpora for model training and evaluation. Please download these datasets first. Our data partition is here.
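
As a rough illustration of what "first channel" means in practice, the sketch below copies channel 0 of a multichannel PCM WAV file into a mono file using only the Python standard library. It is not part of DiariZen: the function name `extract_first_channel` is hypothetical, and the code assumes uncompressed PCM input. A corpus may already ship single-channel SDM files, in which case no such step is needed.

```python
import wave


def extract_first_channel(src_path, dst_path):
    """Write channel 0 of a PCM WAV file to a mono WAV file.

    Returns the number of channels found in the source file.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()  # bytes per sample
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Samples are interleaved: one frame holds one sample per channel.
    frame_size = n_channels * sample_width
    n_frames = len(frames) // frame_size
    mono = bytearray(n_frames * sample_width)
    for k in range(sample_width):
        # Byte k of every channel-0 sample sits at offset k within its frame.
        mono[k::sample_width] = frames[k:n_frames * frame_size:frame_size]

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(bytes(mono))
    return n_channels
```

Calling `extract_first_channel("meeting_8ch.wav", "meeting_sdm.wav")` (both file names are placeholders) would write the mono file and return the channel count of the input.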

Usage

Download the WavLM Base+ model.
Download the ResNet34-LM model.
Modify the dataset paths and the configuration file.
cd recipes/diar_ssl && bash -i run_stage.sh

Pre-trained

Our pre-trained checkpoints and the estimated RTTM files can be found here. The local experimental paths have been anonymized. To use the pre-trained models, please check diar_ssl/run_stage.sh.
In case you have trouble reproducing our experiments, we also provide the intermediate inference results for EN2002a, an AMI test recording, for debugging.
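
For readers who want to inspect the RTTM files mentioned above, here is a minimal sketch of a reader for the standard RTTM `SPEAKER` record layout (type, file ID, channel, onset, duration, two placeholder fields, speaker name, and so on). It is not DiariZen code: the names `Turn`, `parse_rttm`, and `speech_per_speaker` are hypothetical, and the example records are made up rather than taken from the released files.

```python
from collections import namedtuple

Turn = namedtuple("Turn", ["file_id", "onset", "duration", "speaker"])


def parse_rttm(lines):
    """Parse SPEAKER records from RTTM lines into Turn tuples."""
    turns = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and record types other than SPEAKER.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(Turn(fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return turns


def speech_per_speaker(turns):
    """Sum the turn durations (in seconds) for each speaker label."""
    totals = {}
    for turn in turns:
        totals[turn.speaker] = totals.get(turn.speaker, 0.0) + turn.duration
    return totals


# Made-up example records, not taken from the released RTTM files.
example = [
    "SPEAKER EN2002a 1 0.50 2.00 <NA> <NA> spk0 <NA> <NA>",
    "SPEAKER EN2002a 1 2.25 1.50 <NA> <NA> spk1 <NA> <NA>",
    "SPEAKER EN2002a 1 4.00 1.00 <NA> <NA> spk0 <NA> <NA>",
]
print(speech_per_speaker(parse_rttm(example)))
```

To read a real file, pass an open file object instead of the list, for example `parse_rttm(open(path))` with `path` pointing at one of the downloaded RTTM files.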

Results (SDM)

We aim to make the whole pipeline as simple as possible. Therefore, for the results below:

we did not use any simulated data
we did not apply advanced learning scheduler strategies
we did not perform further domain adaptation to each dataset
all experiments share the same hyper-parameters for clustering

collar=0s
--------------------------------------------------------------
System       Features         AMI    AISHELL-4   AliMeeting
--------------------------------------------------------------
Pyannote3    SincNet          21.1     13.9         22.8

Proposed     Fbank            19.7     12.5         21.0
             WavLM-frozen     17.0     11.7         19.9
             WavLM-updated    15.4     11.7         17.6
--------------------------------------------------------------

collar=0.25s
--------------------------------------------------------------
System       Features         AMI    AISHELL-4   AliMeeting
--------------------------------------------------------------
Pyannote3    SincNet          13.7      7.7         13.6

Proposed     Fbank            12.9      6.9         12.6
             WavLM-frozen     10.9      6.1         12.0
             WavLM-updated     9.8      5.9         10.2
--------------------------------------------------------------
Note:
The results above differ from those in our ICASSP submission.
We updated a few experimental numbers, but the conclusions of the paper remain unchanged.
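
To put the table entries in perspective, the short sketch below computes the relative reduction of the WavLM-updated system over the Pyannote3 baseline from the collar=0s numbers. The scores are copied from the table; they are assumed here to be error rates in percent (lower is better), which the tables do not state explicitly, and the helper name `relative_reduction` is hypothetical.

```python
# Scores copied from the collar=0s table above.
baseline = {"AMI": 21.1, "AISHELL-4": 13.9, "AliMeeting": 22.8}  # Pyannote3, SincNet
proposed = {"AMI": 15.4, "AISHELL-4": 11.7, "AliMeeting": 17.6}  # Proposed, WavLM-updated


def relative_reduction(base, new):
    """Return the relative reduction of `new` with respect to `base`, in percent."""
    return 100.0 * (base - new) / base


for name in baseline:
    reduction = relative_reduction(baseline[name], proposed[name])
    print(f"{name}: {reduction:.1f}% relative reduction")
```

With these inputs the reductions come out at roughly 27% on AMI, 16% on AISHELL-4, and 23% on AliMeeting.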

Citation

If you found this work helpful, please consider citing: J. Han, F. Landini, J. Rohdin, A. Silnova, M. Diez, and L. Burget, Leveraging Self-Supervised Learning for Speaker Diarization, arXiv preprint arXiv:2409.09408, 2024.

@article{han2024leveragingselfsupervisedlearningspeaker,
      title={Leveraging Self-Supervised Learning for Speaker Diarization}, 
      author={Jiangyu Han and Federico Landini and Johan Rohdin and Anna Silnova and Mireia Diez and Lukas Burget},
      journal={arXiv preprint arXiv:2409.09408},
      year={2024}
}

License

This repository is released under the MIT license.

Contact

If you have any comments or questions, please contact [email protected]
