Thanks for your great work. When I use the DMD distillation code, I noticed that the SNR branch does not compute an MSE loss; instead the loss is coeff * latents rather than the gradient, and it can be negative. Is this related to how the model learns when snr_gamma is set?
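I haven't checked the exact lines in the repo, but the behaviour described (a coefficient multiplied onto the latents, with a scalar value that can go negative) matches the usual DMD-style "gradient carrier" construction rather than a bug. A hypothetical sketch with illustrative names, not the repo's actual API:

```python
import torch

def dmd_style_loss(latents, pred_fake_score, pred_real_score, coeff):
    """Sketch of a DMD-style distillation loss (hypothetical names).

    The quantity we actually want to backpropagate is
        grad = coeff * (pred_fake_score - pred_real_score),
    so the "loss" is just an inner product of the latents with a detached
    gradient. Its scalar value is meaningless on its own and may be
    negative; only its derivative with respect to `latents` matters.
    """
    grad = coeff * (pred_fake_score - pred_real_score)
    grad = torch.nan_to_num(grad.detach())
    # d(loss)/d(latents) is proportional to `grad` (up to the 1/N of the mean),
    # which is exactly the update direction the distillation step wants.
    loss = (latents * grad).mean()
    return loss
```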
Shouldn't the default args.snr_gamma be None? I am also puzzled about the difference between the two formulations; which one should be used?
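For reference, in the diffusers-style training scripts that this option usually follows, args.snr_gamma defaults to None and the plain MSE is used; setting it switches to the min-SNR weighting of Hang et al. A minimal sketch, assuming the compute_snr helper from diffusers.training_utils and epsilon-prediction (the DMD repo's implementation may differ):

```python
import torch
import torch.nn.functional as F
from diffusers.training_utils import compute_snr  # available in recent diffusers releases

def diffusion_loss(model_pred, target, timesteps, noise_scheduler, snr_gamma=None):
    # Default path (snr_gamma is None): plain mean-squared error.
    if snr_gamma is None:
        return F.mse_loss(model_pred.float(), target.float(), reduction="mean")

    # Min-SNR weighting: clip the per-timestep SNR at gamma, then rescale
    # the per-sample MSE so that very low-noise timesteps are down-weighted.
    snr = compute_snr(noise_scheduler, timesteps)
    weights = torch.clamp(snr, max=snr_gamma) / snr
    loss = F.mse_loss(model_pred.float(), target.float(), reduction="none")
    loss = loss.mean(dim=list(range(1, loss.ndim))) * weights
    return loss.mean()
```

Note the contrast with the DMD sketch above: the snr_gamma branch is still an MSE, only reweighted per sample, whereas the distillation loss is a surrogate whose value is not interpretable as an error.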