This repository contains baselines to start your work on the DeepFake Source Tracing task, part of the INTERSPEECH 2025 Special Session "Source tracing: The origins of synthetic or manipulated speech".
Special thanks to Resemble AI and the AI4Trust project for their support and affiliation.
The baseline is based on the MLAAD (Source Tracing Protocols) dataset. To download the required resources, run:
python scripts/download_resources.py
The scripts' default arguments assume that all required data is placed in the data directory in the project root.
Install all the required dependencies from the requirements.txt file. The baseline was created using Python 3.11.
pip install -r requirements.txt
To train the feature extractor (a Wav2Vec2.0-based encoder) with the GE2E loss, run:
python train_ge2e.py --config configs/config_ge2e.yaml
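For intuition, the GE2E loss pulls each utterance embedding toward the centroid of its own source system and away from the centroids of other systems. Below is a minimal sketch of its softmax variant, omitting the leave-one-out centroid correction; the function name, shapes, and fixed scale/bias are illustrative and do not mirror train_ge2e.py.

```python
import torch
import torch.nn.functional as F

def ge2e_softmax_loss(emb: torch.Tensor, w: float = 10.0, b: float = -5.0) -> torch.Tensor:
    """Softmax GE2E loss over embeddings of shape [N systems, M utterances, D]."""
    n, m, _ = emb.shape
    emb = F.normalize(emb, dim=-1)
    centroids = F.normalize(emb.mean(dim=1), dim=-1)   # one centroid per system, [N, D]
    sim = torch.einsum("nmd,kd->nmk", emb, centroids)  # cosine similarities, [N, M, N]
    sim = w * sim + b                                  # scale/bias are learnable in the real loss
    target = torch.arange(n).repeat_interleave(m)      # utterance (n, m) belongs to system n
    return F.cross_entropy(sim.reshape(n * m, n), target)

# Example: 4 systems, 5 utterances each, 256-dim embeddings.
loss = ge2e_softmax_loss(torch.randn(4, 5, 256))
```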
This baseline builds upon the work of Xie et al., "Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy", and its associated GitHub repository.
Their work uses a data augmentation technique and an OOD detection method to improve the classification of unseen deepfake algorithms. This repository implements only a basic version of that setup and leaves room for participants to improve upon it.
More details can be found in the cited paper.
For the required data augmentation step you will need the MUSAN and RIRS_NOISES datasets.
The first step of the tool reads the original MLAAD data, augments it with random noise and room impulse responses (RIR), and extracts the wav2vec2-base features needed to train the AASIST model. Additional parameters, such as the maximum length and the model, can be set from the script.
python scripts/preprocess_dataset.py
Output will be written to exp/preprocess_wav2vec2-base/. You can change the path in the script.
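To make the augmentation and extraction step concrete, here is a rough sketch of the idea using torchaudio and transformers. The file paths, the SNR value, and the overall flow are illustrative assumptions, not the actual code in scripts/preprocess_dataset.py.

```python
import torch
import torchaudio
import torchaudio.functional as AF
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Placeholder file paths; all audio is assumed to be (or get resampled to) 16 kHz.
wav, sr = torchaudio.load("sample.wav")
wav = AF.resample(wav, sr, 16000)
noise, _ = torchaudio.load("musan/noise/example.wav")
rir, _ = torchaudio.load("RIRS_NOISES/example_rir.wav")

# Additive noise at a fixed SNR (assumes the noise clip is at least as long).
wav = AF.add_noise(wav, noise[:, : wav.shape[1]], snr=torch.tensor([10.0]))

# Reverberation: convolve with an energy-normalized room impulse response.
rir = rir / rir.norm(p=2)
wav = AF.fftconvolve(wav, rir)[:, : wav.shape[1]]

# wav2vec2-base hidden states as features for the downstream AASIST model.
fe = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
inputs = fe(wav.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    feats = model(**inputs).last_hidden_state  # shape [1, T, 768]
```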
Using the augmented features, we then train an AASIST model for 30 epochs. The model learns to classify the samples with respect to their source system. The class assignment will be written to exp/label_assignment.txt.
python train_refd.py
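Purely for orientation, the training step follows a standard supervised classification loop. The sketch below illustrates the pattern with placeholder tensors and a stand-in linear model; it does not reproduce train_refd.py or the AASIST architecture.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(256, 768)        # placeholder pre-extracted features
labels = torch.randint(0, 24, (256,))   # placeholder source-system labels
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

model = torch.nn.Linear(768, 24)        # stand-in for the AASIST classifier
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(30):                 # the baseline trains for 30 epochs
    for x, y in loader:
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```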
Given the trained model stored in exp/trained_models/, we can now compute its accuracy over the known classes (those seen during training).
python scripts/get_classification_metrics.py
The script will limit the data in the dev and eval sets to the samples which are from the known systems (i.e. those also present in the training data) and compute their classification metrics.
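Conceptually, this filtering is just a label intersection followed by standard metrics. Here is a minimal sketch with scikit-learn, using made-up system names and predictions; the real logic lives in scripts/get_classification_metrics.py.

```python
from sklearn.metrics import accuracy_score, f1_score

train_systems = {"tts_a", "tts_b", "vc_c"}        # systems seen in training
y_true = ["tts_a", "tts_b", "unknown_x", "vc_c"]  # placeholder references
y_pred = ["tts_a", "vc_c", "tts_b", "vc_c"]       # placeholder predictions

# Keep only samples whose true system was seen during training.
known = [(t, p) for t, p in zip(y_true, y_pred) if t in train_systems]
known_true, known_pred = zip(*known)

print("accuracy:", accuracy_score(known_true, known_pred))
print("macro F1:", f1_score(known_true, known_pred, average="macro"))
```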
python scripts/ood_detector.py --feature_extraction_step
The script builds an NSD OOD detector as described in the original paper. The detector is based on the hidden states and logits of the AASIST model: it first extracts this information from the trained model and stores it in separate dictionaries, then loads the training data and determines the in-domain scores. Next, it computes scores for the development set; since the OOD class assignments are known for this set, it determines the EER and the associated threshold. This threshold is then used to classify the evaluation data into OOD and known systems and to report the corresponding metrics.
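The thresholding part of this flow is standard and can be sketched independently of the actual NSD scoring. Below, dev_scores and eval_scores stand in for the detector's scores, and higher is assumed to mean more in-domain; this is an illustration under those assumptions, not the code in scripts/ood_detector.py.

```python
import numpy as np

def eer_threshold(scores: np.ndarray, is_ood: np.ndarray) -> float:
    """Return the threshold where false-acceptance and false-rejection rates meet."""
    best_t, best_gap = scores.min(), np.inf
    for t in np.sort(scores):
        far = np.mean(scores[is_ood] >= t)   # OOD samples accepted as known
        frr = np.mean(scores[~is_ood] < t)   # known samples rejected as OOD
        if abs(far - frr) < best_gap:
            best_t, best_gap = t, abs(far - frr)
    return best_t

dev_scores = np.array([0.9, 0.8, 0.3, 0.2])        # placeholder dev scores
dev_is_ood = np.array([False, False, True, True])
threshold = eer_threshold(dev_scores, dev_is_ood)

eval_scores = np.array([0.85, 0.25])               # placeholder eval scores
eval_is_known = eval_scores >= threshold           # True -> known system
```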
At this point, the baseline achieves a 63% EER with an F1-score of 0.31 on the eval data.
This repository is licensed under the CC BY-NC 4.0 License for original content.
- Portions of this repository include code from the REFD repository, which does not have a license.
- As per copyright law, such code is "All Rights Reserved" and is not covered by the CC BY-NC license. Users should not reuse or redistribute it without the original author's explicit permission.
This repository is built using the following open-source works:
@inproceedings{xie24_interspeech,
title = {Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy},
author = {Yuankun Xie and Ruibo Fu and Zhengqi Wen and Zhiyong Wang and Xiaopeng Wang and Haonan Cheng and Long Ye and Jianhua Tao},
year = {2024},
booktitle = {Interspeech 2024},
pages = {4833--4837},
doi = {10.21437/Interspeech.2024-254},
issn = {2958-1796},
}
@software{Eren_Coqui_TTS_2021,
author = {Eren, Gölge and {The Coqui TTS Team}},
doi = {10.5281/zenodo.6334862},
license = {MPL-2.0},
month = jan,
title = {{Coqui TTS}},
url = {https://github.com/coqui-ai/TTS},
version = {1.4},
year = {2021}
}