Using pyannote.audio open-source toolkit in production?
Consider switching to pyannoteAI for better and faster options.

pyannote.audio speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be further fine-tuned to your own data for even better performance.
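The pretrained building blocks (speech activity detection, speaker change detection, overlapped speech detection, speaker embedding) can also be loaded on their own, for instance as a starting point for fine-tuning. A minimal sketch, assuming you have accepted the pyannote/segmentation-3.0 user conditions and created a Hugging Face access token (see the TL;DR below):

from pyannote.audio import Model

# load a pretrained segmentation model (gated checkpoint: accept its user
# conditions and pass a valid Hugging Face access token)
model = Model.from_pretrained(
    "pyannote/segmentation-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")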

TL;DR

  1. Install pyannote.audio with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.1 user conditions
  4. Create access token at hf.co/settings/tokens.
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
import torch
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
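The defaults above can be adjusted. A hedged sketch of two common options, assuming the pyannote.audio 3.x pipeline API and the RTTM export provided by pyannote.core:

# if the number of speakers is known in advance, pass it to the pipeline
diarization = pipeline("audio.wav", num_speakers=2)

# save the result in the standard RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)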

Highlights

Documentation

Benchmark

Out of the box, the pyannote.audio speaker diarization pipeline v3.1 is expected to be much better (and faster) than v2.x. The numbers below are diarization error rates (in %, lower is better):

Benchmark              | v2.1 | v3.1 | pyannoteAI
AISHELL-4              | 14.1 | 12.2 | 11.9
AliMeeting (channel 1) | 27.4 | 24.4 | 22.5
AMI (IHM)              | 18.9 | 18.8 | 16.6
AMI (SDM)              | 27.1 | 22.4 | 20.9
AVA-AVD                | 66.3 | 50.0 | 39.8
CALLHOME (part 2)      | 31.6 | 28.4 | 22.2
DIHARD 3 (full)        | 26.9 | 21.7 | 17.2
Earnings21             | 17.0 |  9.4 |  9.0
Ego4D (dev.)           | 61.5 | 51.2 | 43.8
MSDWild                | 32.8 | 25.3 | 19.8
RAMC                   | 22.5 | 22.2 | 18.4
REPERE (phase 2)       |  8.2 |  7.8 |  7.6
VoxConverse (v0.3)     | 11.2 | 11.3 |  9.4

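Diarization error rates like the ones above can be computed with pyannote.metrics. A minimal sketch, assuming a ground-truth annotation stored in a hypothetical reference.rttm file and the diarization output from the TL;DR example:

from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# load the ground-truth annotation (reference.rttm is a placeholder name)
_, reference = load_rttm("reference.rttm").popitem()

# compare it to the pipeline output
metric = DiarizationErrorRate()
der = metric(reference, diarization)
print(f"diarization error rate = {100 * der:.1f}%")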

Citations

If you use pyannote.audio, please cite the following papers:

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Development

The commands below will set up the pre-commit hooks and packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Test

pytest