ICSD: An Open-source Dataset for Infant Cry and Snoring Detection


This is the official repository for the ICSD dataset.

Please note that our paper is currently under review. After the paper is accepted, you can download the audio files and metadata from our provided source URL list on Hugging Face.

About ⭐️

🎤 ICSD is a comprehensive audio event dataset for infant cry and snoring detection with the following features:

  • over 3.3 hours of strongly labeled data and 1 hour of weakly labeled data;
  • foreground and background events for generating synthetic data.

The figure below shows the organization of the ICSD dataset: audio files are stored in the audio folder and event time-stamp annotations in the metadata folder, each further split into train, validation, and test subfolders. Moreover, source materials for generating synthetic strongly labeled data are also provided. You can use Scaper to generate your own synthetic data.

[Figure: folder structure of the ICSD dataset]
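Scaper's core operation, placing a foreground event into a background recording at a target signal-to-noise ratio, can be sketched in plain NumPy. This is a simplified illustration, not Scaper's actual implementation, and the signals, sample rate, and SNR value below are made up:

```python
import numpy as np

def mix_at_snr(background, event, snr_db, onset_sample):
    """Mix an event into a background at a target SNR (dB).

    Simplified version of what a soundscape synthesizer does:
    scale the event so its power relative to the background
    matches the requested SNR, then add it at the chosen onset.
    """
    bg_power = np.mean(background ** 2)
    ev_power = np.mean(event ** 2)
    # Gain making scaled event power / bg power == 10^(snr_db / 10)
    gain = np.sqrt(bg_power / ev_power * 10 ** (snr_db / 10))
    mixture = background.copy()
    end = onset_sample + len(event)
    mixture[onset_sample:end] += gain * event
    return mixture

# Toy signals standing in for real audio clips.
rng = np.random.default_rng(0)
background = 0.1 * rng.standard_normal(16000)              # 1 s of noise at 16 kHz
event = np.sin(2 * np.pi * 440 * np.arange(4000) / 16000)  # 0.25 s tone
mixture = mix_at_snr(background, event, snr_db=6.0, onset_sample=8000)
print(mixture.shape)  # (16000,)
```

Scaper itself additionally randomizes event labels, onsets, durations, and SNRs from user-specified distributions and writes out the corresponding annotations.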

A detailed description of the dataset can be found in our paper.

To use the ICSD dataset, you can download the audio files and metadata from our provided source URL list on Hugging Face.
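Strong-label metadata for sound event detection is typically stored as tab-separated files with filename, onset, offset, and event_label columns (the DCASE convention); the exact columns and the file contents below are illustrative assumptions, not an excerpt from the dataset:

```python
import csv
import io

# Hypothetical excerpt of a strong-label metadata file in the
# DCASE-style tab-separated format. The clip names, times, and
# labels are made up for illustration.
tsv_text = """filename\tonset\toffset\tevent_label
clip_001.wav\t0.50\t3.20\tInfant_cry
clip_001.wav\t5.10\t7.80\tSnoring
"""

events = []
for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
    events.append({
        "filename": row["filename"],
        "onset": float(row["onset"]),
        "offset": float(row["offset"]),
        "label": row["event_label"],
    })

total = sum(e["offset"] - e["onset"] for e in events)
print(f"{len(events)} events, {total:.2f} s labeled")  # 2 events, 5.40 s labeled
```

To read a real metadata file, replace `io.StringIO(tsv_text)` with `open("path/to/metadata.tsv")`.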

Please note that ICSD doesn't own the copyright of the audios; the copyright remains with the original owners of the video or audio.

Data Preview 🔍

The demo folder provides four audio samples that you can download and listen to.

Baseline system 🖥️

We designed our baseline system based on DCASE 2023 Challenge Task 4.

Requirements 📝

The script conda_create_environment.sh is available to create an environment for running the baseline system.

Data Download ⬇️

You can download the ICSD dataset using the script download_ICSD.py.

Usage:

  1. Visit our Hugging Face repository to request access permissions. After the review process, you will receive authorization to download the ICSD dataset.
  2. After obtaining permission, navigate to your Hugging Face settings to generate your personal token. Under 'Repositories permissions', enter datasets/QingyuLiu1/ICSD. For detailed information about tokens, please refer to the official documentation.
  3. Run the command python download_ICSD.py --token=your_token, where your_token is the token you generated in your Hugging Face settings. The ICSD dataset will then be downloaded into the data folder and automatically unzipped.

     If the script download_ICSD.py cannot run due to network issues, you can manually download Dataset.zip from Hugging Face and unzip it into the data folder.

  4. In addition to the well-organized dataset, we also provide source materials for generating synthetic strongly labeled data. You can download these by running python download_ICSD.py --token=your_token --file_name=Materials.zip --local_dir=your_folder.

Training 👨‍💻

Three baselines are provided:

  • baseline with only synthetic data
  • baseline with real data and synthetic data
  • baseline using pre-trained embedding

1. Baseline with only synthetic data

The baseline using the synthetic strongly labeled data can be run from scratch using the following command:

python train_sed.py

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_sed.py --test_from_checkpoint /path/to/synth_only.ckpt

2. Baseline with real data and synthetic data

The baseline using the real strongly labeled data and synthetic data can be run from scratch using the following command:

python train_sed.py --strong_real

The command will automatically include the real strongly labeled data in the training process.

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_sed.py --test_from_checkpoint /path/to/hybrid.ckpt

3. Baseline using pre-trained embedding

We added a baseline which exploits the pre-trained model BEATs. It's an iterative self-supervised learning model designed to extract high-level non-speech audio semantics. The BEATs feature representations are integrated with the CNN output through a linear transformation and layer normalization, providing additional complementary information that can enhance sound event detection performance.
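The fusion step described above can be sketched in NumPy. The dimensions and the combination by concatenation are illustrative assumptions (the baseline's actual fusion may differ); only the overall shape, a linear projection of the BEATs embeddings followed by layer normalization before combining with the CNN output, follows the description:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean / unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Made-up dimensions for illustration: 10 time frames,
# 768-dim BEATs embeddings, 128-dim CNN features.
rng = np.random.default_rng(0)
beats_emb = rng.standard_normal((10, 768))
cnn_feat = rng.standard_normal((10, 128))

# Linear projection of the BEATs features to the CNN feature size,
# followed by layer normalization; the normalized embeddings are
# then combined with the CNN output (here by concatenation).
W = rng.standard_normal((768, 128)) * 0.02
projected = layer_norm(beats_emb @ W)
fused = np.concatenate([cnn_feat, projected], axis=-1)
print(fused.shape)  # (10, 256)
```

In the actual baseline this fusion is a learned layer inside the network rather than fixed random weights.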

To run this system, you should first pre-compute the embeddings using the following command:

python extract_embeddings.py --output_dir ./embeddings --pretrained_model "beats"

Then, use the following command to run the system:

python train_pretrained.py

We provide a pretrained checkpoint. The baseline can be tested on the test set of the dataset using the following command:

python train_pretrained.py --test_from_checkpoint /path/to/BEATS.ckpt

Acknowledgement 🔔

We acknowledge the wonderful work by these excellent developers!

Reference 📖

If you use the ICSD dataset, please cite the following paper:

@article{ICSD,
      title={ICSD: An Open-source Dataset for Infant Cry and Snoring Detection},
      author={Qingyu Liu and Longfei Song and Dongxing Xu and Yanhua Long},
      journal={arXiv},
      volume={abs/2408.10561},
      year={2024}
}
