
Baselines for Interspeech 2025 Special Session on Source Tracing

This repository contains baselines to get you started with the task of deepfake source tracing, as part of the INTERSPEECH 2025 Special Session "Source tracing: The origins of synthetic or manipulated speech".

Attribution

Special thanks to Resemble AI and the AI4Trust project for their support and affiliation.

Contributors

Before you start

Download dataset

The baseline is based on the MLAAD (Source Tracing Protocols) dataset. To download the required resources, run:

python scripts/download_resources.py

The scripts' default arguments assume that all required data is placed in the data directory in the project root.

Install dependencies

Install all the required dependencies from the requirements.txt file. The baseline was created using Python 3.11.

pip install -r requirements.txt

GE2E + Wav2Vec2.0 Baseline

To train the Wav2Vec2.0-based feature extractor using the GE2E loss, run:

python train_ge2e.py --config configs/config_ge2e.yaml
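
Below is a minimal sketch of how a GE2E-style (softmax-variant) loss can be computed on a batch of embeddings grouped by source system. It assumes PyTorch and uses fixed scale and bias constants; the actual implementation used by train_ge2e.py may differ.

# Minimal GE2E-style loss sketch (softmax variant); illustrative only.
# emb: embeddings of shape (num_systems, utts_per_system, dim).
import torch
import torch.nn.functional as F

def ge2e_softmax_loss(emb: torch.Tensor) -> torch.Tensor:
    n, m, _ = emb.shape
    emb = F.normalize(emb, dim=-1)
    centroids = F.normalize(emb.mean(dim=1), dim=-1)         # (n, dim)
    # Centroid of each system excluding the utterance itself.
    excl = (emb.sum(dim=1, keepdim=True) - emb) / (m - 1)    # (n, m, dim)

    # Cosine similarity of every utterance to every system centroid.
    sim = torch.einsum("nmd,kd->nmk", emb, centroids)        # (n, m, n)
    # For the utterance's own system, use the exclusive centroid instead.
    idx = torch.arange(n)
    sim[idx, :, idx] = F.cosine_similarity(emb, excl, dim=-1)

    # The GE2E paper uses a learnable scale and bias; constants keep the sketch short.
    sim = 10.0 * sim - 5.0
    labels = idx.repeat_interleave(m)                         # target = own system index
    return F.cross_entropy(sim.reshape(n * m, n), labels)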

REFD Baseline

This baseline builds upon the work of Xie et al., "Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy", and its associated GitHub repository.

The work uses a data augmentation technique and an OOD detection method to improve the classification of unseen deepfake algorithms. However, in this repository we implement only a very basic setup and leave it to participants to improve upon it.

More details here

Download data augmentation datasets

For the required data augmentation step, you will need the MUSAN and RIRS_NOISES datasets.

Step 1. Data augmentation and feature extraction

The first step of the tool reads the original MLAAD data, augments it with random noise and room impulse responses (RIR), and extracts the wav2vec2-base features needed to train the AASIST model. Additional parameters, such as maximum length and model, can be set in the script.

python scripts/preprocess_dataset.py

Output will be written to exp/preprocess_wav2vec2-base/. You can change the path in the script.
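
For reference, the sketch below illustrates the kind of processing this step performs: convolving the signal with a room impulse response, adding noise at a chosen SNR, and extracting wav2vec2-base hidden states. It assumes torchaudio and Hugging Face transformers; the file path, SNR value, and the facebook/wav2vec2-base checkpoint name are placeholders rather than the script's actual settings.

# Illustrative noise/RIR augmentation and wav2vec2-base feature extraction.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

def augment(wave: torch.Tensor, noise: torch.Tensor, rir: torch.Tensor, snr_db: float = 10.0):
    # Convolve with the RIR, then add noise at the requested SNR.
    # Assumes the noise recording is at least as long as the speech.
    rir = rir / rir.norm()
    wave = torchaudio.functional.fftconvolve(wave, rir)[:, : wave.shape[-1]]
    return torchaudio.functional.add_noise(wave, noise[:, : wave.shape[-1]], torch.tensor([snr_db]))

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

wave, sr = torchaudio.load("sample.wav")                      # placeholder input file
wave = torchaudio.functional.resample(wave, sr, 16000)
inputs = extractor(wave.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state              # (1, frames, 768), stored for AASIST training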

Step 2. Train an AASIST model on top of the wav2vec2-base features

Using the augmented features, we then train an AASIST model for 30 epochs. The model learns to classify samples with respect to their source system. The class assignment will be written to exp/label_assignment.txt.

python train_refd.py
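
The script handles model construction, data loading, and checkpointing; schematically, the training stage is a standard cross-entropy loop over the precomputed features, as in the sketch below (the loader, feature shapes, and hyperparameters are illustrative, not the script's actual values).

# Schematic source-system classification training loop; train_refd.py uses the AASIST
# architecture and its own configuration, this only outlines the overall flow.
import torch
from torch import nn

def train(model: nn.Module, loader, epochs: int = 30, lr: float = 1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, labels in loader:      # feats: wav2vec2 features, labels: source-system ids
            logits = model(feats)         # (batch, num_known_systems)
            loss = loss_fn(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()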

Step 3. Get the classification metrics for the known (in-domain) classes

Given the trained model stored in exp/trained_models/, we can now compute its accuracy over the known classes (those seen during training).

python scripts/get_classification_metrics.py

The script limits the data in the dev and eval sets to samples from known systems (i.e. those also present in the training data) and computes the classification metrics.
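
Conceptually, the evaluation reduces to filtering and scoring along these lines (a sketch assuming scikit-learn; the script's label and prediction handling may differ):

# Sketch: keep only samples whose true system is known, then compute metrics.
from sklearn.metrics import accuracy_score, f1_score

def known_class_metrics(labels, predictions, known_systems):
    keep = [i for i, lab in enumerate(labels) if lab in known_systems]
    y_true = [labels[i] for i in keep]
    y_pred = [predictions[i] for i in keep]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }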

Step 4. Run the OOD detector and evaluate it

python scripts/ood_detector.py --feature_extraction_step

The script builds an NSD OOD detector as described in the original paper. The detector is based on the hidden states and logits of the AASIST model. It first extracts this information from the trained model and stores it in separate dictionaries. It then loads the training data and determines the in-domain scores.

It then computes scores for the development set. Because the OOD class assignments are known for these samples, the script determines the EER and the associated threshold. This threshold is then used to classify the evaluation data into OOD and known systems and to compute the corresponding metrics.
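
The EER/threshold step amounts to something like the following (a sketch assuming scikit-learn and that higher scores mean "more in-domain"; the NSD scoring itself is computed by the script, not by this snippet):

# Sketch: derive the OOD threshold from dev scores, then apply it to eval scores.
import numpy as np
from sklearn.metrics import roc_curve

def eer_and_threshold(scores, is_known):
    # is_known: 1 for a known (in-domain) system, 0 for OOD.
    fpr, tpr, thresholds = roc_curve(is_known, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))           # operating point where FPR ~= FNR
    return (fpr[idx] + fnr[idx]) / 2, thresholds[idx]

# Dummy usage: threshold from the dev set, decisions on the eval set.
eer, thr = eer_and_threshold(np.array([0.9, 0.8, 0.2, 0.1]), np.array([1, 1, 0, 0]))
eval_is_known = np.array([0.85, 0.15]) >= thr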

The baseline result at this point is a 63% EER with an F1-score of 0.31 on the eval data.

License

This repository is licensed under the CC BY-NC 4.0 License for original content.

Exceptions:

  • Portions of this repository include code from the REFD repository, which does not have a license.
  • As per copyright law, such code is "All Rights Reserved" and is not covered by the CC BY-NC license. Users should not reuse or redistribute it without the original author's explicit permission.

References

This repository is built using the following open-source repositories:

REFD

@inproceedings{xie24_interspeech,
  title     = {Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy},
  author    = {Yuankun Xie and Ruibo Fu and Zhengqi Wen and Zhiyong Wang and Xiaopeng Wang and Haonan Cheng and Long Ye and Jianhua Tao},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {4833--4837},
  doi       = {10.21437/Interspeech.2024-254},
  issn      = {2958-1796},
}

Coqui.ai TTS

@software{Eren_Coqui_TTS_2021,
  author = {Eren, Gölge and {The Coqui TTS Team}},
  doi = {10.5281/zenodo.6334862},
  license = {MPL-2.0},
  month = jan,
  title = {{Coqui TTS}},
  url = {https://github.com/coqui-ai/TTS},
  version = {1.4},
  year = {2021}
}