A repository for comparing potential speaker diarization tools to be used in the MEXCA pipeline.
The repository contains subdirectories for different parts of the experiment:
speaker-diarization\
: Contains all files for the speaker diarization part

    embeddings\
    : Contains the encoded speaker embeddings as .pt files
    results\
    : Contains the .rttm files with speaker annotations
    clustering.py
    : Script for clustering the speaker embeddings and assigning the speaker labels to speaker segments
    sd_*.py
    : Scripts for applying the respective speaker encoding models
    compare_sd.ipynb
    : Notebook for comparing the speaker diarization approaches
    speaker_diarization.py
    : Script to run all speaker encoding scripts one after another
    speaker_representation.py
    : Helper functions for performing speaker diarization
voice-activity-detection\
: Contains all files for the voice activity detection part

    results\
    : Contains the .rttm files with speech segments
    compare_vad.ipynb
    : Notebook for comparing the voice activity detection approaches
    custom.conf
    : Configuration file for the openSMILE feature extractor
    opensmile_helper_functions
    : Helper functions for extracting openSMILE voice activity features
    vad_*.py
    : Scripts for applying the voice activity detection models
explore_ami_corpus.ipynb
: Notebook for exploring the properties of the AMI corpus
rttm.py
: Functions for creating, reading, modifying, and writing .rttm files and objects
rttm_test.py
: Preliminary test suite for rttm.py
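
Both results directories store their output as .rttm files. As a rough illustration of what such a file holds, the sketch below parses and writes the standard ten-field `SPEAKER` lines of the RTTM format. It is not the API of rttm.py: the names `Segment`, `parse_rttm`, and `format_rttm`, as well as the file and speaker IDs in the example, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One SPEAKER line of an RTTM file (hypothetical helper type)."""
    file_id: str
    channel: int
    onset: float     # segment start in seconds
    duration: float  # segment length in seconds
    speaker: str


def parse_rttm(text: str) -> List[Segment]:
    """Parse the SPEAKER lines of an RTTM string into Segment objects."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and record types other than SPEAKER.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        segments.append(
            Segment(
                file_id=fields[1],
                channel=int(fields[2]),
                onset=float(fields[3]),
                duration=float(fields[4]),
                speaker=fields[7],
            )
        )
    return segments


def format_rttm(segments: List[Segment]) -> str:
    """Write segments back as RTTM lines, using <NA> for unused fields."""
    return "\n".join(
        f"SPEAKER {s.file_id} {s.channel} {s.onset:.3f} {s.duration:.3f} "
        f"<NA> <NA> {s.speaker} <NA> <NA>"
        for s in segments
    )


# Made-up example content; the IDs are placeholders, not AMI corpus data.
EXAMPLE = (
    "SPEAKER meeting1 1 0.500 2.250 <NA> <NA> spk_A <NA> <NA>\n"
    "SPEAKER meeting1 1 3.000 1.000 <NA> <NA> spk_B <NA> <NA>"
)

if __name__ == "__main__":
    for segment in parse_rttm(EXAMPLE):
        print(segment)
```

Each line gives the file ID, channel, onset, and duration of one segment, followed by the speaker label. For voice activity detection output the label field can carry a generic speech tag instead of a speaker identity.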