VectorLM v0.1.1

@adil-a adil-a released this 08 Apr 05:06
· 2 commits to master since this release
8320c48

This release fixes a bug in state checkpointing and adds a few features.

  • Previously, a model being trained with Hybrid FSDP could not be checkpointed. Checkpointing is now implemented with torch's distributed checkpointing submodule, which resolves this.
  • We have enabled forward prefetching of weights in FSDP by default, so as to maximize the overlap between communication and computation.
  • We have also added an option for low CPU memory usage (under Memory & Compute) when loading large models. With it, the model weights are loaded into CPU memory only once, on the main rank, and are then scattered to the other ranks.
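
The load-once-then-scatter idea in the last item can be illustrated with a toy, single-process sketch. This is not VectorLM's code: `load_full_weights`, `shard_for_scatter`, and `simulate_low_cpu_load` are hypothetical names, the "checkpoint" is a short list of floats, and the equal-sized zero-padded shards are an assumption made for simplicity. In a real multi-process job the hand-off would be a collective such as `torch.distributed.scatter` rather than a Python loop.

```python
def load_full_weights():
    # Hypothetical stand-in for reading a full checkpoint from disk.
    return [float(i) for i in range(10)]


def shard_for_scatter(weights, world_size):
    """Split a flat weight list into world_size equal shards, zero-padding the tail."""
    shard_len = -(-len(weights) // world_size)  # ceiling division
    padding = [0.0] * (shard_len * world_size - len(weights))
    padded = weights + padding
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def simulate_low_cpu_load(world_size):
    """Simulate the main rank loading once and handing one shard to each rank."""
    disk_reads = 0

    # Only the main rank (rank 0) materializes the full weights in CPU memory.
    full = load_full_weights()
    disk_reads += 1
    shards = shard_for_scatter(full, world_size)

    # Stand-in for the scatter collective: rank r ends up with shard r only.
    received = {rank: shards[rank] for rank in range(world_size)}
    return received, disk_reads


received, disk_reads = simulate_low_cpu_load(world_size=4)
```

The point of the sketch is the accounting: the full weights are read once regardless of `world_size`, whereas a naive loader would read them once per rank.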