
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Yu Zhang*, Changhao Pan*, Wenxiang Guo*, Ruiqi Li, Zhiyuan Zhu, Jialei Wang, Wenhao Xu, Jingyu Lu, Zhiqing Hong, Chuxin Wang, LiChao Zhang, Jinzheng He, Ziyue Jiang, Yuxin Chen, Chen Yang, Jiecheng Zhou, Xinyu Cheng, Zhou Zhao | Zhejiang University

Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks.


We introduce GTSinger, a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks.

We provide the corpus, the data-processing code, and the implementation of our benchmarks in this repository.

You can also visit our Demo Page for audio samples from our dataset as well as the results of our benchmarks.

News

  • 2024.09: We released the full dataset of GTSinger!
  • 2024.09: GTSinger is accepted by NeurIPS 2024 (Spotlight)!
  • 2024.05: We released the code of GTSinger!

TODO List

✅ Release the code.

✅ Release the full dataset.

✅ Release the processed data of Chinese, English, Spanish, German, and Russian.

✅ Refine the paired speech data of each language.

✅ Refine the Chinese, Spanish, German, and Russian annotations.

🔲 Further refine the English, French, Japanese, Korean, and Italian annotations (planned to be completed by the end of 2024).

🔲 Release the remaining processed data (planned to be completed by the end of 2024).

Key Features

  • GTSinger contains 80.59 hours of singing voices recorded in professional studios by skilled singers, ensuring high quality and clarity and making it the largest recorded singing dataset.
  • Contributed by 20 singers across nine widely spoken languages (Chinese, English, Japanese, Korean, Russian, Spanish, French, German, and Italian) and all four vocal ranges, GTSinger enables zero-shot SVS and style transfer models to learn diverse timbres and styles.
  • GTSinger provides controlled comparison groups and phoneme-level annotations for six singing techniques (mixed voice, falsetto, breathy, pharyngeal, vibrato, and glissando), thereby facilitating singing technique modeling, recognition, and control.
  • Unlike fine-grained music scores, GTSinger features realistic music scores with regular note durations, assisting singing models in learning and adapting to real-world musical composition.
  • The dataset includes manual phoneme-to-audio alignments, global style labels (singing method, emotion, range, and pace), and 16.16 hours of paired speech, ensuring comprehensive annotations and broad task suitability.

Dataset

Where to download

Visit Hugging Face to access our full dataset (audio along with TextGrid, JSON, and MusicXML files) and processed data (metadata.json, phone_set.json, spker_set.json) for free! We hope our data is helpful for your research.

We also provide our dataset on Google Drive.

Please note that by using GTSinger, you accept the terms of its license.

Data Architecture

Our dataset is organized hierarchically.

It presents nine top-level folders, each corresponding to a distinct language.

Within each language folder, there are five sub-folders, each representing a specific singing technique.

These technique folders contain numerous song entries, each further divided into controlled comparison groups: a control group (natural singing without the specific technique) and a technique group (densely employing the specific technique).

Our singing voices and speech are recorded at a 48 kHz sampling rate with 24-bit resolution in WAV format.

Alignments and annotations are provided in TextGrid files, including word boundaries, phoneme boundaries, phoneme-level annotations for six techniques, and global style labels (singing method, emotion, pace, and range).

We also provide realistic music scores in MusicXML format.

Notably, we provide an additional JSON file for each singing voice, facilitating data parsing and processing for singing models (a minimal loading sketch follows the directory tree below).

Here is the data structure of our dataset:

.
├── Chinese
│   ├── ZH-Alto-1
│   └── ZH-Tenor-1
├── English
│   ├── EN-Alto-1
│   │   ├── Breathy
│   │   ├── Glissando
│   │   │   └── my love
│   │   │       ├── Control_Group
│   │   │       ├── Glissando_Group
│   │   │       └── Paired_Speech_Group
│   │   ├── Mixed_Voice_and_Falsetto
│   │   ├── Pharyngeal
│   │   └── Vibrato
│   ├── EN-Alto-2
│   │   ├── Breathy
│   │   ├── Glissando
│   │   ├── Mixed_Voice_and_Falsetto
│   │   ├── Pharyngeal
│   │   └── Vibrato
│   └── EN-Tenor-1
│       ├── Breathy
│       ├── Glissando
│       ├── Mixed_Voice_and_Falsetto
│       ├── Pharyngeal
│       └── Vibrato
├── French
│   ├── FR-Soprano-1
│   └── FR-Tenor-1
├── German
│   ├── DE-Soprano-1
│   └── DE-Tenor-1
├── Italian
│   ├── IT-Bass-1
│   ├── IT-Bass-2
│   └── IT-Soprano-1
├── Japanese
│   ├── JA-Soprano-1
│   └── JA-Tenor-1
├── Korean
│   ├── KO-Soprano-1
│   ├── KO-Soprano-2
│   └── KO-Tenor-1
├── Russian
│   └── RU-Alto-1
└── Spanish
    ├── ES-Bass-1
    └── ES-Soprano-1
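
To illustrate how the per-song annotations can be parsed, here is a minimal loading sketch in Python. It assumes the third-party textgrid package (pip install textgrid) and hypothetical file paths; the actual JSON fields and tier names depend on the song entry you pick.

import json
import textgrid  # third-party package: pip install textgrid

# Hypothetical paths; point them at any song entry in the dataset.
json_path = "path/to/some_singing_voice.json"
tg_path = "path/to/some_singing_voice.TextGrid"

# Inspect the per-song JSON annotations.
with open(json_path, "r", encoding="utf-8") as f:
    meta = json.load(f)
print("Top-level JSON type:", type(meta).__name__)

# Walk the TextGrid tiers (word/phoneme boundaries, technique and style labels).
tg = textgrid.TextGrid.fromFile(tg_path)
for tier in tg.tiers:
    print(f"tier '{tier.name}': {len(tier)} entries")

# Print the intervals of the first tier as an example.
for interval in tg.tiers[0]:
    print(interval.minTime, interval.maxTime, interval.mark)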

Code for preprocessing data

The code for processing the dataset is provided in ./Data-Process/.

Dependencies

A suitable conda environment named gt_dataprocess can be created and activated with:

conda create -n gt_dataprocess python=3.8 -y
conda activate gt_dataprocess
pip install -r requirements.txt

Data Check

The code for checking the dataset is provided in ./Data-Process/data_check/, including the following files:

  • check_file_and_folder.py: Check the file and folder structure of the dataset.

  • check_valid_bandwidth.py: Check the sample rate and valid bandwidth of the dataset (an illustrative sketch follows this list).

  • count_time.py: Count the total duration of singing voice and speech in the dataset.

  • plot_f0.py: Plot the pitch (F0) of the singing voice audio.

  • plot_mel.py: Plot the mel-spectrogram of the audio.
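
As an illustration of the kind of check check_valid_bandwidth.py performs, the sketch below estimates how much of a recording's spectral energy lies above a cutoff frequency. The cutoff and file name are assumptions for illustration, not the repository's actual settings; it relies on the numpy and soundfile packages.

import numpy as np
import soundfile as sf

def bandwidth_report(path, cutoff_hz=16000):
    """Return the sample rate and the fraction of spectral energy above cutoff_hz."""
    audio, sr = sf.read(path)
    if audio.ndim > 1:                     # mix multi-channel audio down to mono
        audio = audio.mean(axis=1)
    power = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    return sr, power[freqs >= cutoff_hz].sum() / power.sum()

sr, ratio = bandwidth_report("some_recording.wav")   # hypothetical file name
print(f"sample rate: {sr} Hz, energy above 16 kHz: {ratio:.2e}")
# A genuine 48 kHz studio recording should retain a non-negligible share of
# energy above the cutoff; upsampled low-bandwidth audio will not.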

Data Preprocessing

The code for preprocessing the dataset is provided in ./Data-Process/data_preprocess/, including the following files:

  • gen_final_json.py: Generate the final JSON file for each singing voice from the annotated TextGrid and MusicXML files.

  • global2tgjson.py: Add global style labels to the JSON file and TextGrid file.

  • seg_singing.py & seg_speech.py: Segment the singing voice and speech based on the TextGrid file (see the sketch after this list).

  • gen_xml.py: Generate and process the XML files.
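
For reference, below is a minimal sketch of TextGrid-based segmentation in the spirit of seg_singing.py and seg_speech.py: it cuts a recording into clips according to the non-empty intervals of one tier. The tier name, output naming, and file paths are assumptions for illustration; it relies on the soundfile and textgrid packages, and the actual scripts may differ.

import os
import soundfile as sf
import textgrid  # third-party package: pip install textgrid

def segment_by_tier(wav_path, tg_path, tier_name, out_dir):
    """Write one clip per non-empty interval of the given TextGrid tier."""
    audio, sr = sf.read(wav_path)
    tier = textgrid.TextGrid.fromFile(tg_path).getFirst(tier_name)
    os.makedirs(out_dir, exist_ok=True)
    for i, interval in enumerate(tier):
        if not interval.mark.strip():      # skip silence / unlabeled intervals
            continue
        start, end = int(interval.minTime * sr), int(interval.maxTime * sr)
        sf.write(os.path.join(out_dir, f"seg_{i:04d}.wav"), audio[start:end], sr)

# Hypothetical usage; the tier name depends on the dataset's TextGrid layout.
# segment_by_tier("song.wav", "song.TextGrid", "sentence", "./segments")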

Code for benchmarks

Technique Controllable Singing Voice Synthesis

We provide the code for our Technique Controllable Singing Voice Synthesis benchmark. You can also use GTSinger to train TCSinger!

Technique Recognition

We provide the code for our Technique Recognition benchmark.

Style Transfer

We provide the code for our Style Transfer benchmark. You can use GTSinger to train StyleSinger and TCSinger!

Speech-to-Singing Conversion

We provide the code for our Speech-to-Singing Conversion benchmark. You can use GTSinger to train AlignSTS!

Citations

If you find this code useful in your research, please cite our work:

@article{zhang2024gtsinger,
  title={GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks},
  author={Zhang, Yu and Pan, Changhao and Guo, Wenxiang and Li, Ruiqi and Zhu, Zhiyuan and Wang, Jialei and Xu, Wenhao and Lu, Jingyu and Hong, Zhiqing and Wang, Chuxin and others},
  journal={arXiv preprint arXiv:2409.13832},
  year={2024}
}

Disclaimer

Any organization or individual is prohibited from using any technology mentioned in this paper to generate anyone's singing voice without their consent, including but not limited to government leaders, political figures, and celebrities. Failure to comply may constitute a violation of copyright laws.

