I am curious about how `grad_accum_every` is used in https://github.com/lucidrains/musiclm-pytorch/blob/main/musiclm_pytorch/trainer.py#L317.
In my previous experience, a model computes its gradients (one backward pass) once per step. Why should we split the loss into `grad_accum_every` parts and accumulate gradients within a single step?
Also, if I am GPU-constrained (a single T4), I can only fit a batch size of 1 or 2 for each training stage. Should I still set `grad_accum_every` to a large value like 16 or 32?
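For reference, my understanding is that gradient accumulation boils down to something like the sketch below. This is only an illustration with toy placeholder objects (`model`, `optimizer`, random data), not the trainer's actual code:

```python
import torch
from torch import nn

# Toy placeholders, not the MusicLM trainer's real objects.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

grad_accum_every = 16   # micro-batches accumulated per optimizer step
micro_batch_size = 2    # what actually fits on the GPU

def train_step():
    optimizer.zero_grad()

    # One "logical" step = grad_accum_every small forward/backward passes.
    for _ in range(grad_accum_every):
        x = torch.randn(micro_batch_size, 8)
        y = torch.randn(micro_batch_size, 1)
        loss = nn.functional.mse_loss(model(x), y)

        # Scale the loss so the summed gradients match the average gradient
        # of one big batch of size micro_batch_size * grad_accum_every.
        (loss / grad_accum_every).backward()

    # A single parameter update using the accumulated gradients
    # (effective batch size = micro_batch_size * grad_accum_every = 32).
    optimizer.step()

train_step()
```

If that reading is right, a large `grad_accum_every` would just simulate a larger effective batch size at the cost of more forward/backward passes per optimizer step.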
Thank you!