This repository is the implementation of "Instrument Separation of Symbolic Music by Explicitly Guided Diffusion Model" (submitted to the NeurIPS ML4CD Workshop 2022). Our diffusion model takes mixtures (time x pitch) as input and recovers multi-track music (time x pitch x instrument) that remains strongly consistent with the input mixture.
- Python 3.8.8
- Ubuntu 20.04.2 LTS
- See requirements.txt for the other Python libraries
preprocess.ipynb obtains pairs of mixture and music pianorolls from the Lakh MIDI Dataset (LMD). Although it was originally built to separate melody/non-melody tracks using midi-miner, you can adapt it to this task by removing the melody-related code.
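As a rough illustration of the data format (the exact preprocessing lives in preprocess.ipynb; the array shapes and names below are assumptions for the sketch, not the repository's code), a mixture can be derived from a multi-track pianoroll by collapsing the instrument axis:

```python
import numpy as np

# Hypothetical multi-track pianoroll: (time, pitch, instrument), binary note activations.
music = np.random.randint(0, 2, size=(256, 128, 4)).astype(np.float32)

# The mixture drops instrument identity: a note is "on" if any instrument plays it.
mixture = (music.sum(axis=-1) > 0).astype(np.float32)  # shape: (time, pitch)

print(music.shape, mixture.shape)  # (256, 128, 4) (256, 128)
```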
You should modify the JSON fields related to file and folder paths in config.json. By setting "strategy" (e.g. "ddp") and "gpus" (e.g. [0, 1, 2]), you can train the models in a distributed GPU setting with PyTorch Lightning, as sketched below.
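For reference, a minimal sketch of how those fields might be passed to the PyTorch Lightning Trainer (assuming the 1.x Trainer API; the "epochs" field and the commented model/datamodule names are hypothetical, only "strategy" and "gpus" follow config.json):

```python
import json
import pytorch_lightning as pl

with open("config.json") as f:
    config = json.load(f)

trainer = pl.Trainer(
    strategy=config["strategy"],           # e.g. "ddp" for distributed data parallel
    gpus=config["gpus"],                   # e.g. [0, 1, 2] selects the GPU devices
    max_epochs=config.get("epochs", 100),  # hypothetical field, shown for illustration
)
# trainer.fit(model, datamodule)  # model and datamodule come from the training scripts
```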
To train the diffusion separator based on TransUNet, run the command below:
python melody2music_train.py
To train the independent decoder, run the command below:
python melody2music_decoder.py
You can obtain demo samples from melody2music_test.ipynb.
You can listen to our generated samples on Google Drive. Each sample set consists of (*.png, *_original_music.wav, *_music_from_mixture.wav).
I have learned a lot from Lil'Log and the Hugging Face tutorials.
Sangjun Han, Hyeongrae Ihm, DaeHan Ahn, and Woohyung Lim (University of Ulsan and LG AI Research), "Instrument Separation of Symbolic Music by Explicitly Guided Diffusion Model," NeurIPS ML4CD Workshop, 2022.