StoRM: A Stochastic Regeneration Model for Speech Enhancement and Dereverberation

This repository contains the official PyTorch implementation for the paper:

StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation, 2023 [1]

Audio examples and supplementary materials are available on our project page.

Installation

- Create a new virtual environment with Python 3.8 (we have not tested other Python versions, but they may work).
- Install the package dependencies via pip install -r requirements.txt.
- Your logs will be stored as local TensorBoard logs. Run tensorboard --logdir logs/ to see them.

Pretrained checkpoints

We provide pretrained checkpoints for the models trained on TIMIT+Chime3 (enhancement), WSJ0+Chime3 (enhancement), Voicebank/DEMAND (enhancement) and WSJ0+Reverb (dereverberation), as in the original paper [1]. We also include the checkpoints for the models trained on WSJ0+Wind, as in [3]. All checkpoints can be downloaded here.

Usage:

- To resume training, use the --resume_from_checkpoint option of train.py.
- To evaluate these checkpoints, use the --ckpt option of enhancement.py (see section Evaluation below).
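When resuming training, you need the path of a checkpoint file. The snippet below is a minimal sketch, assuming checkpoints are saved as *.ckpt files somewhere below logs/ (the exact sub-folder layout depends on the logger). It picks the most recently modified one; find_latest_checkpoint is a hypothetical helper, not part of this repository.

```python
import glob
import os
from typing import Optional


def find_latest_checkpoint(log_dir: str = "logs") -> Optional[str]:
    """Hypothetical helper: return the newest *.ckpt file below log_dir, or None.

    Assumes checkpoints are stored as *.ckpt files anywhere under log_dir.
    """
    # "**" with recursive=True also matches files directly inside log_dir.
    candidates = glob.glob(os.path.join(log_dir, "**", "*.ckpt"), recursive=True)
    if not candidates:
        return None
    # Pick the checkpoint with the latest modification time.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    ckpt = find_latest_checkpoint()
    if ckpt is None:
        print("No checkpoint found under logs/")
    else:
        print(f"Resume with: python train.py --resume_from_checkpoint {ckpt}")
```

The other train.py options (e.g. --format, --base_dir) still need to be passed when resuming.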

Training

Training is done by executing train.py. A minimal example with the default settings (as in our paper [1]) can be run with

python train.py --format <your_format> --base_dir <your_base_dir> --gpus 0,

where

your_base_dir should be a path to a folder containing subdirectories train/ and valid/ (optionally test/ as well). The subdirectory structure depends on your_format:
- your_format=wsj0: Each subdirectory must itself have two subdirectories clean/ and noisy/, with the same filenames present in both.
- Other formats can be added on your own, corresponding to your data structure.
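Before launching a long training run, it can help to verify that a wsj0-format folder matches the structure described above. The following is a minimal sketch under that assumption; check_wsj0_layout is a hypothetical helper, not part of this repository, and it only compares filenames, not audio contents.

```python
import os
from typing import List, Sequence


def check_wsj0_layout(base_dir: str, splits: Sequence[str] = ("train", "valid")) -> List[str]:
    """Hypothetical helper: list problems found in a wsj0-format dataset folder.

    An empty result means every split has clean/ and noisy/ subfolders
    containing exactly the same filenames.
    """
    problems = []
    for split in splits:
        clean_dir = os.path.join(base_dir, split, "clean")
        noisy_dir = os.path.join(base_dir, split, "noisy")
        missing = [d for d in (clean_dir, noisy_dir) if not os.path.isdir(d)]
        if missing:
            problems.extend(f"missing directory: {d}" for d in missing)
            continue
        clean = set(os.listdir(clean_dir))
        noisy = set(os.listdir(noisy_dir))
        # Every clean file needs a noisy file with the same name, and vice versa.
        for name in sorted(clean - noisy):
            problems.append(f"{split}: {name} has no noisy counterpart")
        for name in sorted(noisy - clean):
            problems.append(f"{split}: {name} has no clean counterpart")
    return problems
```

If you also provide a test/ subdirectory, pass splits=("train", "valid", "test").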

To see all available training options, run python train.py --help. These include options for the backbone DNN, the SDE parameters, and the usual PyTorch Lightning Trainer parameters such as max_epochs, limit_train_batches, and so on.

Note:

Paper [1] uses a lighter configuration of the NCSN++ backbone with 27.8M parameters, which is selected by default with --backbone ncsnpp. By contrast, paper [2] uses --backbone ncsnpp-large, which is the baseline NCSN++ with 65M parameters.

Evaluation

To evaluate on a test set, run

python enhancement.py --test_dir <your_test_dir> --enhanced_dir <your_enhanced_dir> --ckpt <path_to_model_checkpoint>

to generate the enhanced .wav files. The --ckpt parameter of enhancement.py should be the path to a trained model checkpoint, as stored by the logger in logs/.
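To compare several trained models on the same test set, you can generate one enhancement.py call per checkpoint. The snippet below is a minimal sketch that uses only the three documented flags (--test_dir, --enhanced_dir, --ckpt); build_enhancement_commands and the per-checkpoint output naming are hypothetical choices, and it assumes checkpoints are *.ckpt files below logs/.

```python
import glob
import os
from typing import List


def build_enhancement_commands(test_dir: str, out_root: str, log_dir: str = "logs") -> List[List[str]]:
    """Hypothetical helper: one enhancement.py command per checkpoint under log_dir.

    Each checkpoint writes to out_root/<checkpoint file name without extension>.
    Note: checkpoints with identical file names in different runs would share
    an output folder, so adapt the naming if that happens in your logs/.
    """
    commands = []
    for ckpt in sorted(glob.glob(os.path.join(log_dir, "**", "*.ckpt"), recursive=True)):
        name = os.path.splitext(os.path.basename(ckpt))[0]
        commands.append([
            "python", "enhancement.py",
            "--test_dir", test_dir,
            "--enhanced_dir", os.path.join(out_root, name),
            "--ckpt", ckpt,
        ])
    return commands


if __name__ == "__main__":
    # Only prints the commands; run each one yourself, e.g. with subprocess.run(cmd, check=True).
    for cmd in build_enhancement_commands("path/to/test_dir", "enhanced"):
        print(" ".join(cmd))
```

Passing each command as an argument list (rather than a single string) to subprocess.run avoids shell-quoting issues with paths that contain spaces.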

Data Creation

In preprocessing/, you will find the data generation script used to create all the datasets used in the paper. A minimal example is:

    cd preprocessing;
    python3 create_data.py --task <your_task> --speech <your_speech_format> --noise <your_noise_data>

Please check the script for other options.

For the wind noise generation scripts and the non-linear mixing technique presented in [3], we refer the reader to [4] and suggest asking its authors for their wind noise generator code. Here we only provide the script that passes commands to that generator, along with the non-linear mixing method. We are not responsible for the distribution of the code from [4].

Citations / References

We kindly ask you to cite our papers in your publication when using any of our research or code:

@article{lemercier2023storm,
  author={Lemercier, Jean-Marie and Richter, Julius and Welker, Simon and Gerkmann, Timo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation}, 
  year={2023},
  volume={31},
  number={},
  pages={2724-2737},
  doi={10.1109/TASLP.2023.3294692}}

@inproceedings{lemercier2023wind,
  author={Lemercier, Jean-Marie and Thiemann, Joachim and Konig, Raphael and Gerkmann, Timo},
  booktitle={VDE 15th ITG conference on Speech Communication}, 
  title={Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model}, 
  year={2023}}

[1] Jean-Marie Lemercier, Julius Richter, Simon Welker, and Timo Gerkmann. "StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2724-2737, 2023.

[2] Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay and Timo Gerkmann. "Speech Enhancement and Dereverberation with Diffusion-Based Generative Models", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351-2364, 2023.

[3] Jean-Marie Lemercier, Joachim Thiemann, Raphael Konig and Timo Gerkmann. "Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model", ITG Conference on Speech Communication, Aachen, Germany, 2023.

[4] D. Mirabilii et al. "Simulating wind noise with airflow speed-dependent characteristics", International Workshop on Acoustic Signal Enhancement, Aachen, Germany, 2022.
