This is the official repository for the ICSD dataset.
Please note that our paper is currently under review. Once the paper is accepted, you will be able to download the audio files and metadata from our source URL list on Hugging Face.
🎤 ICSD is a comprehensive audio event dataset for infant cry and snoring detection with the following features:
- over 3.3 hours of strongly labeled data and 1 hour of weakly labeled data;
- foreground and background events for generating synthetic data.
The figure below shows the structure of the ICSD dataset: audio files are stored in the `audio` folder and event time-stamp annotations in the `metadata` folder, each further split into train, validation, and test subfolders. Source materials for generating synthetic strongly labeled data are also provided; you can use Scaper to generate your own synthetic data.
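The strong annotations in the `metadata` folder can be parsed with the standard library. A minimal sketch, assuming the DCASE-style tab-separated column layout (filename, onset, offset, event_label) — check this against the actual files, as the exact header names are an assumption:

```python
import csv
import io

# Hypothetical excerpt of a strong-label file from the metadata folder
# (the tab-separated columns filename/onset/offset/event_label are an assumption).
sample_tsv = """filename\tonset\toffset\tevent_label
cry_0001.wav\t0.52\t3.10\tcry
snore_0007.wav\t1.00\t4.75\tsnore
"""

def load_strong_labels(fobj):
    """Parse strong annotations into a list of (file, onset, offset, label) tuples."""
    reader = csv.DictReader(fobj, delimiter="\t")
    return [
        (row["filename"], float(row["onset"]), float(row["offset"]), row["event_label"])
        for row in reader
    ]

events = load_strong_labels(io.StringIO(sample_tsv))
for fname, onset, offset, label in events:
    print(f"{label}: {onset:.2f}s - {offset:.2f}s in {fname}")
```

The same function can be pointed at the real files with `open("metadata/train/....tsv")` once the dataset is downloaded.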
A detailed description of the dataset can be found in our paper.
To use the ICSD dataset, you can download the audio files and metadata from our source URL list on Hugging Face.
Please note that ICSD does not own the copyright of the audio; the copyright remains with the original owners of the video or audio.
The `demo` folder provides four audio samples that you can download and listen to.
We designed our baseline system based on DCASE 2023 Challenge Task 4.
The script `conda_create_environment.sh` is available to create an environment that runs the baseline system.
You can download the ICSD dataset using the script `download_ICSD.py`:
- Visit our Hugging Face repository to request access permissions. After the review process, you will receive authorization to download the ICSD dataset.
- After obtaining permission, navigate to your Hugging Face settings to generate your personal access token. Under 'Repositories permissions', enter `datasets/QingyuLiu1/ICSD`. For detailed information about tokens, please refer to the official documentation.
- Run the command `python download_ICSD.py --token=your_token`, where `your_token` is the token you generated from your Hugging Face settings. The ICSD dataset will then be downloaded into the `data` folder and automatically unzipped. If `download_ICSD.py` cannot run due to network issues, you can manually download `Dataset.zip` from Hugging Face and unzip it into the `data` folder.
- In addition to the well-organized dataset, we also provide source materials for generating synthetic strongly labeled data. You can download these by running `python download_ICSD.py --token=your_token --file_name=Materials.zip --local_dir=your_folder`.
Three baselines are provided:
- baseline with only synthetic data
- baseline with real data and synthetic data
- baseline using pre-trained embedding
The baseline using the synthetic strongly labeled data can be run from scratch using the following command:
python train_sed.py
We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:
python train_sed.py --test_from_checkpoint /path/to/synth_only.ckpt
The baseline using the real strongly labeled data and synthetic data can be run from scratch using the following command:
python train_sed.py --strong_real
This command automatically includes the strongly labeled real data in the training process.
We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:
python train_sed.py --test_from_checkpoint /path/to/hybrid.ckpt
We added a baseline that exploits the pre-trained model BEATs, an iterative self-supervised learning model designed to extract high-level non-speech audio semantics. The BEATs feature representations are integrated with the CNN output through a linear transformation and layer normalization, providing complementary information that can enhance sound event detection performance.
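The fusion described above can be sketched in PyTorch. The dimensions, the additive combination, and the module name are assumptions for illustration, not the baseline's exact implementation:

```python
import torch
import torch.nn as nn

class EmbeddingFusion(nn.Module):
    """Sketch of the fusion idea: project pre-trained (BEATs-style) embeddings
    with a linear layer, apply layer normalization, and combine them with the
    CNN output. The additive combination and sizes are assumptions."""

    def __init__(self, cnn_dim=128, emb_dim=768):
        super().__init__()
        self.proj = nn.Linear(emb_dim, cnn_dim)  # linear transformation
        self.norm = nn.LayerNorm(cnn_dim)        # layer normalization

    def forward(self, cnn_out, emb):
        # cnn_out: (batch, frames, cnn_dim); emb: (batch, frames, emb_dim)
        return cnn_out + self.norm(self.proj(emb))

fusion = EmbeddingFusion()
cnn_out = torch.randn(2, 156, 128)   # hypothetical CNN frame features
emb = torch.randn(2, 156, 768)       # hypothetical BEATs embeddings
fused = fusion(cnn_out, emb)
print(fused.shape)  # torch.Size([2, 156, 128])
```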
To run this system, you should first pre-compute the embeddings using the following command:
python extract_embeddings.py --output_dir ./embeddings --pretrained_model "beats"
Then, use the following command to run the system:
python train_pretrained.py
We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:
python train_pretrained.py --test_from_checkpoint /path/to/BEATS.ckpt
We acknowledge the wonderful work by these excellent developers!
- Audioset: agkphysics/AudioSet
- Baby Chillanto Database
- Donate A Cry: gveres/donateacry-corpus
- Female and Male Snoring: orannahum/female-and-male-snoring
- Snoring: tareqkhanemu/snoring
- ESC-50: karolpiczak/ESC-50
- SINS: KULeuvenADVISE/SINS_database
- MUSAN: MUSAN-openslr.org
- Scaper: justinsalamon/scaper
If you use the ICSD dataset, please cite the following paper:
@article{ICSD,
  title={ICSD: An Open-source Dataset for Infant Cry and Snoring Detection},
  author={Qingyu Liu and Longfei Song and Dongxing Xu and Yanhua Long},
  journal={arXiv},
  volume={abs/2408.10561},
  year={2024}
}