v2.0

@woodyx218 woodyx218 released this 19 Feb 05:18
· 34 commits to main since this release
215629c
  1. Adding support for DeepSpeed and FSDP through DP-ZeRO on multiple GPUs
  2. Adding a second approach to computing the private gradient. This approach rewrites and extends the back-propagation of the torch layers. The new approach does not need ghost differentiation, may be slower (but can be improved), and is much more generally applicable.
  3. Removing param.summed_clipped_grad and replacing it with param.private_grad
  4. Adding ZeRO examples for image classification and GPT
  5. Adding mixed precision training (fp16 and bf16)
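
For readers unfamiliar with the term "private gradient" used in items 2 and 3, the following is a minimal, library-independent sketch of the standard DP-SGD recipe: clip each per-sample gradient, sum the clipped gradients, and add Gaussian noise. It is not this library's implementation, and it does not show what param.private_grad holds internally. The function name `private_gradient` and its parameters `max_norm` and `noise_multiplier` are hypothetical names chosen for illustration.

```python
import math
import random


def private_gradient(per_sample_grads, max_norm, noise_multiplier, rng=None):
    """Clip each per-sample gradient to max_norm, sum them, add Gaussian noise.

    Hypothetical helper for illustration only; gradients are plain lists of floats.
    """
    rng = rng or random.Random()
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only when the L2 norm exceeds the clipping threshold.
        factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += factor * g
    # Noise standard deviation is proportional to the clipping threshold.
    sigma = noise_multiplier * max_norm
    return [s + rng.gauss(0.0, sigma) for s in summed]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(private_gradient(grads, max_norm=1.0, noise_multiplier=1.0))
```

With `noise_multiplier=0.0` the function reduces to a plain sum of clipped gradients, which is a convenient way to check the clipping step in isolation.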