I am curious about how `grad_accum_every` is used in https://github.com/lucidrains/musiclm-pytorch/blob/main/musiclm_pytorch/trainer.py#L317.
In my previous experience, a model computes its gradients (one backward pass) once per step. Why should we split the loss into `grad_accum_every` parts and accumulate gradients within a single step?
Also, if I am GPU-constrained (a single T4), I can only fit a batch size of 1 or 2 for each training stage. Should I still set `grad_accum_every` to a large value like 16 or 32?
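For reference, my understanding is that gradient accumulation boils down to something like the sketch below. This is only an illustration with toy placeholder objects (`model`, `optimizer`, random data), not the trainer's actual code:

```python
import torch
from torch import nn

# Toy placeholders, not the MusicLM trainer's real objects.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

grad_accum_every = 16   # micro-batches accumulated per optimizer step
micro_batch_size = 2    # what actually fits on the GPU

def train_step():
    optimizer.zero_grad()

    # One "logical" step = grad_accum_every small forward/backward passes.
    for _ in range(grad_accum_every):
        x = torch.randn(micro_batch_size, 8)
        y = torch.randn(micro_batch_size, 1)
        loss = nn.functional.mse_loss(model(x), y)

        # Scale the loss so the summed gradients match the average gradient
        # of one big batch of size micro_batch_size * grad_accum_every.
        (loss / grad_accum_every).backward()

    # A single parameter update using the accumulated gradients
    # (effective batch size = micro_batch_size * grad_accum_every = 32).
    optimizer.step()

train_step()
```

If that reading is right, a large `grad_accum_every` would just simulate a larger effective batch size at the cost of more forward/backward passes per optimizer step.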
Thank you!