
Releases: yxlllc/DDSP-SVC

5.0: Improved DDSP Cascade Diffusion Model

08 Feb 09:46

model_0.pt is a pre-trained model using the contentvec768l12 encoder.
A demo of training from scratch (without using a pre-trained model) is here.
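Before fine-tuning on top of model_0.pt, it can be worth confirming that the checkpoint loads cleanly in your environment. A minimal sanity check, assuming the file is a standard PyTorch checkpoint and that torch is installed (the path is a placeholder for wherever you saved it):

# Hypothetical check: load the pre-trained checkpoint on CPU and list its
# top-level keys (typically model weights and training metadata).
python -c "import torch; ckpt = torch.load('model_0.pt', map_location='cpu'); print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))"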

4.0: DDSP Cascade Diffusion Model

15 Aug 15:47

Unzip the demo model into the exp directory, unzip the sample audio files into the main directory, then run the demo samples:

# opencpop (1st speaker)
python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-opencpop+12key.wav -id 1 -k 12 -kstep 100
# kiritan (2nd speaker)
python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-kiritan+12key.wav -id 2 -k 12 -kstep 100
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -mix "{1:0.5,2:0.5}" -k 12 -kstep 100
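The -kstep argument appears to set the number of diffusion steps used at inference, trading speed against quality. To compare settings, the documented command can be looped over several values; this is a sketch, and the output filenames are made up:

# hypothetical sweep over -kstep values; only the documented flags are used
for KSTEP in 50 100 200; do
    python main_diff.py -i samples/source.wav -diff exp/diffusion-new-demo/model_200000.pt -o samples/svc-opencpop+12key-kstep${KSTEP}.wav -id 1 -k 12 -kstep ${KSTEP}
done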

The training data for this 2-speaker model comes from opencpop and kiritan.

Thanks to CN_ChiTu for helping to train this model.

3.0: Dramatically improved audio quality with a shallow diffusion model

13 May 18:45

Unzip the two demo models into the exp directory, then run the demo samples:

# opencpop (1st speaker)
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-opencpop+12key.wav -id 1 -k 12 -kstep 300
# kiritan (2nd speaker)
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-kiritan+12key.wav -id 2 -k 12 -kstep 300
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -mix "{1:0.5,2:0.5}" -k 12 -kstep 300
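To render both demo speakers in one go, the two single-speaker commands above can be collapsed into a shell loop over the documented -id values; this is a sketch, and the output filenames are illustrative:

# hypothetical batch conversion over the two demo speaker ids
for ID in 1 2; do
    python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-speaker${ID}+12key.wav -id ${ID} -k 12 -kstep 300
done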

The training data for this 2-speaker model comes from opencpop and kiritan.

Thanks to lafi2333 for helping to train the demo models.

2.0: Greatly optimized training speed

21 Mar 16:46

Unzip the pre-trained model into the exp directory, then run the demo samples:

# opencpop (1st speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop+12key.wav -k 12 -id 1
# kiritan (2nd speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-kiritan+12key.wav -k 12 -id 2
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -k 12 -mix "{1:0.5, 2:0.5}"
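As the demo shows, the -mix argument takes a dictionary-style string mapping each speaker id to a weight. A hypothetical sweep over several blends, reusing only the documented flags (the weights in each pair sum to 1.0, and the output names are made up):

# hypothetical sweep over -mix ratios between the two demo speakers
I=0
for MIX in "{1:0.25, 2:0.75}" "{1:0.5, 2:0.5}" "{1:0.75, 2:0.25}"; do
    I=$((I + 1))
    python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-mix-${I}.wav -k 12 -mix "${MIX}"
done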

The training data for this 2-speaker model comes from opencpop and kiritan.

Thanks to CN_ChiTu for helping to train this model.

Multi-speaker support and timbre mixing

08 Mar 14:02

Unzip the pre-trained model into the exp directory, then run the demo samples:

# opencpop (1st speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop+12key.wav -k 12 -pe crepe -e true -id 1
# kiritan (2nd speaker)
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-kiritan+12key.wav -k 12 -pe crepe -e true -id 2
# mix the timbres of opencpop and kiritan at a 0.5:0.5 ratio
python main.py -i samples/source.wav -m exp/multi_speaker/model_300000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -k 12 -pe crepe -e true -mix "{1:0.5, 2:0.5}"

The training data for this 2-speaker model comes from opencpop and kiritan.

Thanks to CN_ChiTu for helping to train this model.

1.0

05 Mar 03:08

Unzip the pre-trained model into the exp directory, then run the demo samples:

# original output (without the enhancer)
python main.py -i samples/source.wav -m exp/opencpop/model_300000.pt -o samples/svc-opencpop+10key-origin.wav -k 10 -pe crepe
# enhanced output
python main.py -i samples/source.wav -m exp/opencpop/model_300000.pt -o samples/svc-opencpop+10key-enhance.wav -k 10 -pe crepe -e true
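Judging from the output names, -k shifts the key of the source (presumably in semitones). To find the shift that best fits the target voice, here is a hypothetical sweep over a few values with the enhancer enabled; the filenames are illustrative:

# hypothetical key sweep; only the documented flags are used
for KEY in 8 10 12; do
    python main.py -i samples/source.wav -m exp/opencpop/model_300000.pt -o samples/svc-opencpop+${KEY}key-enhance.wav -k ${KEY} -pe crepe -e true
done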

The training data comes from opencpop.