
3.0: Dramatically improve audio quality with a shallow diffusion model

Released by @yxlllc · 13 May 18:45

Unzip the two demo models into the exp directory, then run the demo samples:

```bash
# opencpop (1st speaker)
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-opencpop+12key.wav -id 1 -k 12 -kstep 300
# kiritan (2nd speaker)
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-kiritan+12key.wav -id 2 -k 12 -kstep 300
# mix the timbres of opencpop and kiritan in a 0.5 : 0.5 ratio
python main_diff.py -i samples/source.wav -ddsp exp/ddsp-demo/model_300000.pt -diff exp/diffusion-demo/model_400000.pt -o samples/svc-opencpop_kiritan_mix+12key.wav -mix "{1:0.5,2:0.5}" -k 12 -kstep 300
```
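If you want to render the source clip with every demo speaker in one go, a small wrapper around the CLI can help. This is a minimal sketch, not part of the release: it only reuses the flags shown above (`-i`, `-ddsp`, `-diff`, `-o`, `-id`, `-k`, `-kstep`), and the loop, speaker-name table, and output-naming scheme are illustrative assumptions.

```python
# Hypothetical batch helper: render one converted file per demo speaker
# by shelling out to main_diff.py with the same arguments as the demo
# commands above. Paths and flags are copied from those commands.
import subprocess

DDSP_CKPT = "exp/ddsp-demo/model_300000.pt"
DIFF_CKPT = "exp/diffusion-demo/model_400000.pt"
SPEAKERS = {1: "opencpop", 2: "kiritan"}  # ids of the two demo speakers

for spk_id, name in SPEAKERS.items():
    subprocess.run(
        [
            "python", "main_diff.py",
            "-i", "samples/source.wav",
            "-ddsp", DDSP_CKPT,
            "-diff", DIFF_CKPT,
            "-o", f"samples/svc-{name}+12key.wav",  # assumed naming scheme
            "-id", str(spk_id),
            "-k", "12",       # pitch shift in semitones, as in the demos
            "-kstep", "300",  # shallow diffusion depth, as in the demos
        ],
        check=True,  # stop on the first failed conversion
    )
```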

The training data for this 2-speaker model comes from opencpop and kiritan.

Thanks to lafi2333 for helping to train the demo models.