Skip to content

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

License

Notifications You must be signed in to change notification settings

marianne-m/pyannote-audio

 
 

Repository files navigation

Neural speaker diarization with pyannote.audio

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.

TL;DR Open In Colab

# instantiate pretrained speaker diarization pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_A
# start=1.8s stop=3.9s speaker_B
# start=4.2s stop=5.7s speaker_A
# ...

What's new in pyannote.audio 2.0

For version 2.0 of pyannote.audio, I decided to rewrite almost everything from scratch. Highlights of this release are:

Installation

Only Python 3.8+ is officially supported (though it might work with Python 3.7)

conda create -n pyannote python=3.8
conda activate pyannote
conda install pytorch torchaudio -c pytorch
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip

Documentation

Benchmark

The pretrained speaker diarization pipeline with default parameters is expected to be much better in v2.0 than in v1.1:

Diarization error rate (%) v1.1 v2.0 ∆DER
AMI only_words evaluation set 29.7 21.5 -28%
DIHARD 3 evaluation set 29.2 22.2 -23%
VoxConverse 0.0.2 evaluation set 21.5 12.8 -40%

Here is the (pseudo-)code used to obtain those numbers:

# v1.1
import torch
pipeline = torch.hub.load("pyannote/pyannote-audio", "dia")
diarization = pipeline({"audio": "audio.wav"})

# v2.0
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("audio.wav")

# evaluation
from pyannote.metrics.diarization import DiarizationErrorRate
metric = DiarizationErrorRate(collar=0.0, skip_overlap=False)
for audio, reference in evaluation_set:  # pseudo-code
    diarization = pipeline(audio)
    _ = metric(reference, diarization)
der = abs(metric)

Support

For commercial enquiries and scientific consulting, please contact me.

Development

The commands below will setup pre-commit hooks and packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Tests rely on a set of debugging files available in test/data directory. Set PYANNOTE_DATABASE_CONFIG environment variable to test/data/database.yml before running tests:

PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest

About

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 94.1%
  • Jupyter Notebook 3.9%
  • JavaScript 1.6%
  • HTML 0.4%