VectorLM v0.1.1
This release fixes a bug in state checkpointing and adds a few features.
- Previously, a model trained with Hybrid FSDP could not be checkpointed. Checkpointing is now built on torch's distributed checkpointing submodule (`torch.distributed.checkpoint`); a sketch follows after this list.
- We have enabled forward prefetching of weights in FSDP by default so as to maximize the overlap of communication with computation (see the FSDP sketch after this list).
- We have also added a low CPU memory usage option (under Memory & Compute) for loading large models: the model weights are loaded into CPU memory once on the main rank and then scattered to the other ranks (also shown in the FSDP sketch after this list).
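
Below is a minimal sketch of what sharded checkpointing through `torch.distributed.checkpoint` can look like for an FSDP-wrapped model. The `save_checkpoint`/`load_checkpoint` helpers and the `checkpoint_dir` argument are illustrative placeholders, not VectorLM's actual interface:

```python
# Sketch of sharded save/load with torch.distributed.checkpoint for an FSDP model.
# save_checkpoint / load_checkpoint and checkpoint_dir are placeholders.
import torch.distributed.checkpoint as dist_cp
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType


def save_checkpoint(model: FSDP, checkpoint_dir: str) -> None:
    # Ask FSDP for a sharded state dict so each rank only materializes its own shard.
    with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
        state_dict = {"model": model.state_dict()}
        # Every rank participates in the save; the writer lays files out under checkpoint_dir.
        dist_cp.save_state_dict(
            state_dict=state_dict,
            storage_writer=dist_cp.FileSystemWriter(checkpoint_dir),
        )


def load_checkpoint(model: FSDP, checkpoint_dir: str) -> None:
    with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
        # Use the model's current sharded layout as the target for loading.
        state_dict = {"model": model.state_dict()}
        dist_cp.load_state_dict(
            state_dict=state_dict,
            storage_reader=dist_cp.FileSystemReader(checkpoint_dir),
        )
        model.load_state_dict(state_dict["model"])
```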
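
And a rough sketch of the two FSDP-level features above, forward prefetching and the rank-0-only loading pattern. `build_model`, `wrap_model`, and the hidden sizes are placeholders, and this assumes a process group has already been initialized:

```python
# Sketch of FSDP wrapping with forward prefetching and rank-0-only weight loading.
# build_model and wrap_model are placeholders, not VectorLM's actual API.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def build_model() -> torch.nn.Module:
    # Placeholder for the actual model construction.
    return torch.nn.Linear(4096, 4096)


def wrap_model(local_rank: int) -> FSDP:
    rank = dist.get_rank()
    if rank == 0:
        # Only the main rank materializes the full weights in CPU memory
        # (e.g. by loading a pretrained checkpoint here).
        model = build_model()
    else:
        # Every other rank builds the model on the meta device, using no CPU memory.
        with torch.device("meta"):
            model = build_model()

    def materialize_meta(module: torch.nn.Module) -> None:
        # Give meta tensors real (empty) storage; the actual values arrive via
        # the broadcast triggered by sync_module_states.
        module.to_empty(device=torch.device("cuda", local_rank), recurse=False)

    return FSDP(
        model,
        device_id=local_rank,
        forward_prefetch=True,    # prefetch the next all-gather during the current forward
        sync_module_states=True,  # broadcast rank 0's loaded weights to every rank
        param_init_fn=materialize_meta if rank != 0 else None,
    )
```

With `sync_module_states=True`, FSDP broadcasts the weights from rank 0 while wrapping the model, so the other ranks never need to read the full weights themselves.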