Adding support for DeepSpeed and FSDP through DP-ZeRO on multiple GPUs
Adding a second approach for computing the private gradient. This approach rewrites and extends the back-propagation of the torch layers. The new approach does not need ghost differentiation and is much more generally applicable; it may be slower, but this can be improved.
Removing param.summed_clipped_grad and replacing it with param.private_grad
Adding ZeRO examples for image classification and GPT
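
For readers unfamiliar with the term, the "private gradient" mentioned above is, in standard DP-SGD, the sum of per-sample gradients after clipping, plus Gaussian noise. The following is a minimal pure-Python sketch of that idea only. The function name `private_gradient` and its parameters (`max_norm`, `noise_multiplier`, `rng`) are hypothetical and chosen for illustration; this is not the library's implementation, and it does not show either of the two approaches described in these notes.

```python
import math
import random


def private_gradient(per_sample_grads, max_norm, noise_multiplier, rng=None):
    """Sum per-sample gradients after clipping, then add Gaussian noise.

    per_sample_grads: list of equal-length lists of floats, one per sample.
    max_norm: clipping threshold on each sample's L2 norm (hypothetical name).
    noise_multiplier: noise std is noise_multiplier * max_norm.
    """
    rng = rng or random.Random()
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down only gradients whose norm exceeds max_norm.
        scale = min(1.0, max_norm / (norm + 1e-6))
        for i, x in enumerate(grad):
            total[i] += scale * x
    sigma = noise_multiplier * max_norm
    # Independent Gaussian noise on every coordinate of the clipped sum.
    return [t + rng.gauss(0.0, sigma) for t in total]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(private_gradient(grads, max_norm=1.0, noise_multiplier=1.0))
```

With `noise_multiplier=0.0` the function returns just the clipped sum, which is a convenient way to check the clipping step in isolation.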