Soundstream Training Goes From Great to Horrible #221
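My training code:

```python
from audiolm_pytorch import SoundStream, SoundStreamTrainer

soundstream = SoundStream(
    ...  # arguments not included in this excerpt
)

trainer = SoundStreamTrainer(
    ...  # arguments not included in this excerpt
)
```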
@adamfils try loading from the checkpoint just before the collapse, and lowering the learning rate.
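A minimal, library-agnostic sketch of that advice: pick the newest checkpoint saved strictly before the collapse, then resume with a scaled-down learning rate. It assumes checkpoints are written as `<prefix>.<step>.pt` files, which is a hypothetical naming scheme and not confirmed for `audiolm_pytorch`; the helper names `pick_resume_checkpoint` and `lowered_lr` are illustrative too. Actually loading the weights and optimizer state is left to whatever the trainer provides.

```python
import re
from pathlib import PurePath

# Hypothetical naming scheme "<prefix>.<step>.pt" (e.g. "soundstream.25000.pt");
# adjust the pattern to whatever your trainer actually writes.
CKPT_STEP_RE = re.compile(r"\.(\d+)\.pt$")


def pick_resume_checkpoint(filenames, collapse_step):
    """Return the checkpoint with the largest step strictly before `collapse_step`, or None."""
    best_step, best_name = -1, None
    for name in filenames:
        match = CKPT_STEP_RE.search(PurePath(name).name)
        if match is None:
            continue  # not a checkpoint under the assumed naming scheme
        step = int(match.group(1))
        if best_step < step < collapse_step:
            best_step, best_name = step, name
    return best_name


def lowered_lr(current_lr, factor=0.5):
    """Scale the learning rate down by `factor` (0 < factor < 1) for the resumed run."""
    if not 0 < factor < 1:
        raise ValueError("factor must be in (0, 1)")
    return current_lr * factor


if __name__ == "__main__":
    files = ["results/soundstream.24000.pt", "results/soundstream.25000.pt", "results/notes.txt"]
    # 25028 is where the scale-1 discriminator loss first starts to climb in the log.
    print(pick_resume_checkpoint(files, collapse_step=25028))  # results/soundstream.25000.pt
    print(lowered_lr(3e-4))
```

Resuming strictly before the first abnormal step matters: a checkpoint saved after the blow-up has already started would carry the diverging weights forward.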
Thanks. Also, what is the difference between `sample_31500.flac` and `sample_31500.ema.flac` (the non-EMA and EMA audio samples)?
@adamfils you want to use the EMA samples; this is a common practice in the generative field, where you update the parameters of your generator with exponential smoothing, which often leads to better final models.
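For reference, here is what exponential smoothing of parameters means in its simplest form: after each optimizer step, every EMA weight moves a fraction `1 - beta` of the way toward the live weight. This is a plain-Python sketch over lists of floats, not the trainer's actual implementation, and `beta = 0.99` is an illustrative value rather than necessarily the default used by `SoundStreamTrainer`.

```python
def ema_update(ema_params, live_params, beta=0.99):
    """In-place EMA update: ema <- beta * ema + (1 - beta) * live."""
    if len(ema_params) != len(live_params):
        raise ValueError("parameter lists must have the same length")
    for i, live in enumerate(live_params):
        ema_params[i] = beta * ema_params[i] + (1 - beta) * live
    return ema_params


if __name__ == "__main__":
    ema = [0.0, 0.0]
    # Pretend the live generator weights sit at [1.0, -1.0] for 100 steps.
    for _ in range(100):
        ema_update(ema, [1.0, -1.0])
    print(ema)  # about [0.634, -0.634], i.e. +/-(1 - 0.99**100)
```

Because each update only moves a fraction `1 - beta` of the way, the EMA copy effectively averages over roughly the last `1 / (1 - beta)` steps (about 100 here), so it lags behind the live weights.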
Okay, because the EMA samples sound bad while the non-EMA samples sound great. 😬
@adamfils yikes, that doesn't sound good! Let me check on this, maybe this Sunday morning.
Any updates on how this got fixed? I want to start a training run as well in the coming week.
Multiple engineers and researchers have already successfully trained SoundStream; you should just go for it if you have enough data.
@Fritskee my next stretch goal is to turn the SoundStream training into a CLI, like I did for lightweight gan.
I just wanted to train on LibriSpeech, so I figured that if the weights were already out there, I might as well ask. But you make a fair point!
That'd be dope! I also want to take the time to thank you for all your efforts to democratize the latest research in ML!
I have been training SoundStream for the past 3 days on my A6000. At 25,000 steps I got amazing results, but after that the loss increased abruptly and the later generations are bad.
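As you can see in the log below, the scale-1 discriminator loss starts climbing at step 25028 and blows up from step 25030. From step 25031 the total loss turns negative and diverges, and the reconstruction loss rises from 0.001 to over 2 million by step 25040 before partially recovering.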
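One possible reading of the negative total loss, assuming a hinge-style adversarial objective (the thread does not say which GAN loss the trainer uses): the discriminator minimizes `relu(1 - D(real)) + relu(1 + D(fake))`, while the generator's adversarial term is `-D(fake)`. If the discriminator's logits on generated audio run away to large positive values, the discriminator loss explodes and the generator term becomes a large negative number at the same time, which can pull the total below zero even while reconstruction gets worse. The logits below are made up to show the mechanism; they are not taken from the log.

```python
def mean(xs):
    return sum(xs) / len(xs)


def hinge_discr_loss(real_logits, fake_logits):
    """Discriminator hinge loss: mean(relu(1 - D(real))) + mean(relu(1 + D(fake)))."""
    real_term = mean([max(0.0, 1.0 - r) for r in real_logits])
    fake_term = mean([max(0.0, 1.0 + f) for f in fake_logits])
    return real_term + fake_term


def hinge_gen_loss(fake_logits):
    """Generator adversarial term under the hinge objective: -mean(D(fake))."""
    return -mean(fake_logits)


if __name__ == "__main__":
    # Healthy regime: logits near zero, both losses stay small and positive.
    print(hinge_discr_loss([0.5, 0.8], [-0.4, -0.6]), hinge_gen_loss([-0.4, -0.6]))
    # Runaway regime: D(fake) explodes upward, so the discriminator loss grows
    # while the generator's adversarial term becomes large and negative.
    print(hinge_discr_loss([0.5, 0.8], [500.0, 700.0]), hinge_gen_loss([500.0, 700.0]))
```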
Here is the result at 25,000 steps:
https://voca.ro/1c10gpytA3id
Here is the result at 25,500 steps:
https://voca.ro/1eaoQiOmo1Se
```
25000: saving to results
25000: saving model to results
25001: soundstream total loss: 4.872, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.284 | discr (scale 0.5) loss: 1.894 | discr (scale 0.25) loss: 1.829
25002: soundstream total loss: 4.893, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.336 | discr (scale 0.5) loss: 1.899 | discr (scale 0.25) loss: 1.887
25003: soundstream total loss: 4.375, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.332 | discr (scale 0.5) loss: 1.825 | discr (scale 0.25) loss: 1.871
25004: soundstream total loss: 4.699, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.229 | discr (scale 0.5) loss: 1.879 | discr (scale 0.25) loss: 1.921
25005: soundstream total loss: 4.486, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.217 | discr (scale 0.5) loss: 1.859 | discr (scale 0.25) loss: 1.928
25006: soundstream total loss: 4.232, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.296 | discr (scale 0.5) loss: 1.842 | discr (scale 0.25) loss: 1.934
25007: soundstream total loss: 4.356, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.056 | discr (scale 0.5) loss: 1.939 | discr (scale 0.25) loss: 1.930
25008: soundstream total loss: 4.532, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.011 | discr (scale 0.5) loss: 1.965 | discr (scale 0.25) loss: 1.964
25009: soundstream total loss: 4.534, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.065 | discr (scale 0.5) loss: 2.011 | discr (scale 0.25) loss: 2.013
25010: soundstream total loss: 4.773, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.297 | discr (scale 0.5) loss: 2.198 | discr (scale 0.25) loss: 2.055
25011: soundstream total loss: 4.817, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.109 | discr (scale 0.5) loss: 2.110 | discr (scale 0.25) loss: 2.033
25012: soundstream total loss: 5.056, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.398 | discr (scale 0.5) loss: 2.042 | discr (scale 0.25) loss: 1.931
25013: soundstream total loss: 5.122, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.212 | discr (scale 0.5) loss: 1.955 | discr (scale 0.25) loss: 1.865
25014: soundstream total loss: 4.553, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.231 | discr (scale 0.5) loss: 1.909 | discr (scale 0.25) loss: 1.913
25015: soundstream total loss: 4.360, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.232 | discr (scale 0.5) loss: 1.847 | discr (scale 0.25) loss: 1.952
25016: soundstream total loss: 4.644, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.279 | discr (scale 0.5) loss: 1.803 | discr (scale 0.25) loss: 1.994
25017: soundstream total loss: 5.561, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.278 | discr (scale 0.5) loss: 1.807 | discr (scale 0.25) loss: 1.943
25018: soundstream total loss: 4.956, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.209 | discr (scale 0.5) loss: 1.713 | discr (scale 0.25) loss: 1.878
25019: soundstream total loss: 5.055, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.179 | discr (scale 0.5) loss: 1.732 | discr (scale 0.25) loss: 1.865
25020: soundstream total loss: 5.168, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.332 | discr (scale 0.5) loss: 1.762 | discr (scale 0.25) loss: 1.853
25021: soundstream total loss: 4.924, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.375 | discr (scale 0.5) loss: 1.813 | discr (scale 0.25) loss: 1.867
25022: soundstream total loss: 4.844, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.462 | discr (scale 0.5) loss: 1.786 | discr (scale 0.25) loss: 1.855
25023: soundstream total loss: 5.200, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.579 | discr (scale 0.5) loss: 1.798 | discr (scale 0.25) loss: 1.822
25024: soundstream total loss: 7.380, soundstream recon loss: 0.002 | discr (scale 1) loss: 2.756 | discr (scale 0.5) loss: 1.805 | discr (scale 0.25) loss: 1.813
25025: soundstream total loss: 4.865, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.723 | discr (scale 0.5) loss: 1.748 | discr (scale 0.25) loss: 1.758
25026: soundstream total loss: 4.889, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.725 | discr (scale 0.5) loss: 1.854 | discr (scale 0.25) loss: 1.846
25027: soundstream total loss: 5.056, soundstream recon loss: 0.001 | discr (scale 1) loss: 2.747 | discr (scale 0.5) loss: 1.817 | discr (scale 0.25) loss: 1.854
25028: soundstream total loss: 5.091, soundstream recon loss: 0.001 | discr (scale 1) loss: 3.242 | discr (scale 0.5) loss: 1.839 | discr (scale 0.25) loss: 1.891
25029: soundstream total loss: 4.385, soundstream recon loss: 0.001 | discr (scale 1) loss: 8.894 | discr (scale 0.5) loss: 1.760 | discr (scale 0.25) loss: 1.883
25030: soundstream total loss: 2.860, soundstream recon loss: 0.001 | discr (scale 1) loss: 108.547 | discr (scale 0.5) loss: 1.708 | discr (scale 0.25) loss: 1.798
25031: soundstream total loss: -15.905, soundstream recon loss: 0.002 | discr (scale 1) loss: 1718.587 | discr (scale 0.5) loss: 1.557 | discr (scale 0.25) loss: 1.979
25032: soundstream total loss: -303.631, soundstream recon loss: 0.024 | discr (scale 1) loss: 10940.722 | discr (scale 0.5) loss: 1.072 | discr (scale 0.25) loss: 3.398
25033: soundstream total loss: -2264.270, soundstream recon loss: 0.295 | discr (scale 1) loss: 234567.777 | discr (scale 0.5) loss: 0.180 | discr (scale 0.25) loss: 5.426
25034: soundstream total loss: -53273.180, soundstream recon loss: 15.740 | discr (scale 1) loss: 1108289.203 | discr (scale 0.5) loss: 0.008 | discr (scale 0.25) loss: 0.970
25035: soundstream total loss: -244286.930, soundstream recon loss: 272.947 | discr (scale 1) loss: 3089418.844 | discr (scale 0.5) loss: 0.010 | discr (scale 0.25) loss: 0.029
25036: soundstream total loss: -648283.398, soundstream recon loss: 2447.980 | discr (scale 1) loss: 7947847.062 | discr (scale 0.5) loss: 0.000 | discr (scale 0.25) loss: 0.007
25037: soundstream total loss: -1452483.922, soundstream recon loss: 19413.394 | discr (scale 1) loss: 18546006.250 | discr (scale 0.5) loss: 0.000 | discr (scale 0.25) loss: 0.001
25038: soundstream total loss: -2364417.562, soundstream recon loss: 132011.410 | discr (scale 1) loss: 33489656.000 | discr (scale 0.5) loss: 0.000 | discr (scale 0.25) loss: 0.008
25039: soundstream total loss: 2783657.594, soundstream recon loss: 803092.328 | discr (scale 1) loss: 49849376.000 | discr (scale 0.5) loss: 0.000 | discr (scale 0.25) loss: 0.002
25040: soundstream total loss: 14825873.875, soundstream recon loss: 2074252.219 | discr (scale 1) loss: 38289075.500 | discr (scale 0.5) loss: 0.000 | discr (scale 0.25) loss: 0.002
25041: soundstream total loss: 11907697.250, soundstream recon loss: 1596693.234 | discr (scale 1) loss: 12477728.375 | discr (scale 0.5) loss: 0.000 | discr (scale 0.25) loss: 0.016
25042: soundstream total loss: 2389267.781, soundstream recon loss: 358384.961 | discr (scale 1) loss: 1455136.312 | discr (scale 0.5) loss: 0.000 | discr (scale 0.25) loss: 0.009
25043: soundstream total loss: 47939.899, soundstream recon loss: 14739.114 | discr (scale 1) loss: 37.778 | discr (scale 0.5) loss: 63.086 | discr (scale 0.25) loss: 52.932
25044: soundstream total loss: 847.260, soundstream recon loss: 2.112 | discr (scale 1) loss: 15.008 | discr (scale 0.5) loss: 60.966 | discr (scale 0.25) loss: 115.134
25045: soundstream total loss: 936.149, soundstream recon loss: 0.910 | discr (scale 1) loss: 16.900 | discr (scale 0.5) loss: 5.909 | discr (scale 0.25) loss: 0.893
25046: soundstream total loss: 401.222, soundstream recon loss: 0.256 | discr (scale 1) loss: 18.919 | discr (scale 0.5) loss: 6.674 | discr (scale 0.25) loss: 0.226
25047: soundstream total loss: 172.702, soundstream recon loss: 0.054 | discr (scale 1) loss: 18.073 | discr (scale 0.5) loss: 4.793 | discr (scale 0.25) loss: 0.570
```
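A small sketch for catching this kind of blow-up automatically from log text in the format above: extract the scale-1 discriminator loss for each step, then report the first step where it exceeds a multiple of the median of the preceding window. The regex only assumes the entry layout shown in this log (and tolerates entries wrapped across lines); `window=10` and `factor=3.0` are arbitrary illustrative choices, not values from the thread.

```python
import re
from statistics import median

# Whitespace-tolerant pattern for entries like
# "25030: soundstream total loss: 2.860, soundstream recon loss: 0.001 | discr (scale 1) loss: 108.547 | ..."
ENTRY_RE = re.compile(
    r"\b(\d+):\s*soundstream total loss:\s*[-\d.]+,\s*soundstream recon loss:\s*[-\d.]+"
    r"\s*\|\s*discr \(scale 1\) loss:\s*([-\d.]+)"
)


def parse_discr_scale1(text):
    """Return [(step, scale-1 discriminator loss), ...] for every training entry in `text`."""
    return [(int(m.group(1)), float(m.group(2))) for m in ENTRY_RE.finditer(text)]


def first_blowup_step(series, window=10, factor=3.0):
    """Return the first step whose loss exceeds `factor` x the median of the previous `window` losses."""
    for i in range(window, len(series)):
        baseline = median(loss for _, loss in series[i - window:i])
        step, loss = series[i]
        if loss > factor * baseline:
            return step
    return None
```

With these settings, the scale-1 values in this log would trip the check at step 25029 (8.894 against a baseline median of about 2.65), two steps before the total loss turns negative, which would leave time to stop the run and fall back to the step-25000 checkpoint.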