
Usage of grad_accum_every #59

Open
r03922123 opened this issue Jun 26, 2023 · 0 comments
@r03922123
I am curious about how `grad_accum_every` is used in https://github.com/lucidrains/musiclm-pytorch/blob/main/musiclm_pytorch/trainer.py#L317

In my previous experience, a model typically computes gradients with a single backward pass per step. Why do we split the loss into `grad_accum_every` pieces and accumulate gradients within one step?
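To make sure I understand the pattern, here is a minimal sketch of gradient accumulation as I understand it (not the trainer's actual code; the toy `nn.Linear` model, random data, and squared-output loss are placeholders standing in for MusicLM's components):

```python
import torch
from torch import nn

# Placeholder model and optimizer for illustration only
model = nn.Linear(128, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

grad_accum_every = 16   # number of micro-batches accumulated per update
micro_batch_size = 2    # what fits on a small GPU like a T4

optimizer.zero_grad()
for _ in range(grad_accum_every):
    x = torch.randn(micro_batch_size, 128)    # stand-in for a real data batch
    loss = model(x).pow(2).mean()             # stand-in for the real loss
    (loss / grad_accum_every).backward()      # grads sum across backward calls;
                                              # dividing turns the sum into a mean
optimizer.step()  # one optimizer update, effective batch = 16 * 2 = 32
```

If that is right, the accumulated gradient approximates what a single batch of `grad_accum_every * micro_batch_size` samples would produce, at the cost of slower steps rather than extra memory.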

If I have a GPU constraint (a single T4 GPU), so I can only set the batch size to 1 or 2 at each training stage, should I still set `grad_accum_every` to a large number like 16 or 32?

Thank you!
