VectorLM v0.1.1
This release fixes a bug in state checkpointing and adds a few features.
- Previously, a model trained with Hybrid FSDP could not be checkpointed. Checkpointing is now built on torch's distributed checkpointing submodule (`torch.distributed.checkpoint`); a sketch follows after this list.
- We have enabled forward prefetching of weights in FSDP by default so as to maximize the overlap of communication with computation (see the FSDP sketch after this list).
- We have also added a low CPU memory usage option (under Memory & Compute) for loading large models: the model weights are loaded into CPU memory once on the main rank and then scattered to the other ranks (also shown in the FSDP sketch after this list).
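
Below is a minimal sketch of what sharded checkpointing through `torch.distributed.checkpoint` can look like for an FSDP-wrapped model. The `save_checkpoint`/`load_checkpoint` helpers and the `checkpoint_dir` argument are illustrative placeholders, not VectorLM's actual interface:

```python
# Sketch of sharded save/load with torch.distributed.checkpoint for an FSDP model.
# save_checkpoint / load_checkpoint and checkpoint_dir are placeholders.
import torch.distributed.checkpoint as dist_cp
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType


def save_checkpoint(model: FSDP, checkpoint_dir: str) -> None:
    # Ask FSDP for a sharded state dict so each rank only materializes its own shard.
    with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
        state_dict = {"model": model.state_dict()}
        # Every rank participates in the save; the writer lays files out under checkpoint_dir.
        dist_cp.save_state_dict(
            state_dict=state_dict,
            storage_writer=dist_cp.FileSystemWriter(checkpoint_dir),
        )


def load_checkpoint(model: FSDP, checkpoint_dir: str) -> None:
    with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
        # Use the model's current sharded layout as the target for loading.
        state_dict = {"model": model.state_dict()}
        dist_cp.load_state_dict(
            state_dict=state_dict,
            storage_reader=dist_cp.FileSystemReader(checkpoint_dir),
        )
        model.load_state_dict(state_dict["model"])
```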
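
And a rough sketch of the two FSDP-level features above, forward prefetching and the rank-0-only loading pattern. `build_model`, `wrap_model`, and the hidden sizes are placeholders, and this assumes a process group has already been initialized:

```python
# Sketch of FSDP wrapping with forward prefetching and rank-0-only weight loading.
# build_model and wrap_model are placeholders, not VectorLM's actual API.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def build_model() -> torch.nn.Module:
    # Placeholder for the actual model construction.
    return torch.nn.Linear(4096, 4096)


def wrap_model(local_rank: int) -> FSDP:
    rank = dist.get_rank()
    if rank == 0:
        # Only the main rank materializes the full weights in CPU memory
        # (e.g. by loading a pretrained checkpoint here).
        model = build_model()
    else:
        # Every other rank builds the model on the meta device, using no CPU memory.
        with torch.device("meta"):
            model = build_model()

    def materialize_meta(module: torch.nn.Module) -> None:
        # Give meta tensors real (empty) storage; the actual values arrive via
        # the broadcast triggered by sync_module_states.
        module.to_empty(device=torch.device("cuda", local_rank), recurse=False)

    return FSDP(
        model,
        device_id=local_rank,
        forward_prefetch=True,    # prefetch the next all-gather during the current forward
        sync_module_states=True,  # broadcast rank 0's loaded weights to every rank
        param_init_fn=materialize_meta if rank != 0 else None,
    )
```

With `sync_module_states=True`, FSDP broadcasts the weights from rank 0 while wrapping the model, so the other ranks never need to read the full weights themselves.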