TorchSnapshot (Beta Release)

A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind.

Install

Requires Python >= 3.8 and PyTorch >= 2.0.0

From pip or conda:

# Stable
pip install torchsnapshot
# Or, using conda
conda install -c conda-forge torchsnapshot

# Nightly
pip install --pre torchsnapshot-nightly

From source:

git clone https://github.com/pytorch/torchsnapshot
cd torchsnapshot
pip install -r requirements.txt
python setup.py install

Why TorchSnapshot

Performance

TorchSnapshot provides a fast checkpointing implementation employing various optimizations, including zero-copy serialization for most tensor types, overlapped device-to-host copy and storage I/O, and parallelized storage I/O.
TorchSnapshot greatly speeds up checkpointing for DistributedDataParallel workloads by distributing the write load across all ranks (benchmark).
When host memory is abundant, TorchSnapshot allows training to resume before all storage I/O completes, reducing the time blocked by checkpoint saving.
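
To illustrate the idea of distributing the write load across ranks, here is a minimal sketch in plain Python. It is not TorchSnapshot's actual partitioning logic: the function name `partition_writes`, the entry names, and the byte sizes are all hypothetical, and the greedy "largest first, least-loaded rank" strategy is an assumption chosen for clarity. With DistributedDataParallel every rank holds the same state, so each entry only needs to be written once, by some rank.

```python
import heapq
from typing import Dict, List


def partition_writes(sizes: Dict[str, int], world_size: int) -> List[List[str]]:
    """Assign each replicated entry to exactly one rank.

    Hypothetical helper: entries are visited largest first and each one
    goes to the rank with the smallest number of bytes assigned so far.
    """
    # Min-heap of (bytes_assigned, rank) so the least-loaded rank pops first.
    heap = [(0, rank) for rank in range(world_size)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(world_size)]
    # Sort by size descending, then by name so the result is deterministic.
    for name, nbytes in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(name)
        heapq.heappush(heap, (load + nbytes, rank))
    return assignment


# Illustrative sizes in bytes (made up for this example).
sizes = {
    "layer1.weight": 400,
    "layer2.weight": 300,
    "layer3.weight": 200,
    "layer1.bias": 100,
}
plan = partition_writes(sizes, world_size=2)
loads = [sum(sizes[name] for name in names) for names in plan]
```

In this sketch each of the two ranks ends up writing 500 bytes instead of one rank writing all 1000.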

Memory Usage

TorchSnapshot's memory usage adapts to the host's available resources, greatly reducing the chance of out-of-memory issues when saving and loading checkpoints.
TorchSnapshot supports efficient random access to individual objects within a snapshot, even when the snapshot is stored in cloud object storage.
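
One way to picture memory usage that adapts to available resources is to cap how many bytes are staged in host memory at a time. The sketch below is an assumption made for illustration, not TorchSnapshot's implementation: `batches_within_budget`, the entry names, and the sizes are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def batches_within_budget(
    items: Iterable[Tuple[str, int]], budget_bytes: int
) -> Iterator[List[str]]:
    """Group (name, nbytes) entries into batches that fit a memory budget.

    Hypothetical helper: a batch is flushed before adding an entry that
    would push it over the budget. An entry larger than the budget is
    still emitted, alone in its own batch.
    """
    batch: List[str] = []
    used = 0
    for name, nbytes in items:
        if batch and used + nbytes > budget_bytes:
            yield batch
            batch, used = [], 0
        batch.append(name)
        used += nbytes
    if batch:
        yield batch


# Illustrative entries in bytes (made up for this example).
entries = [("a", 40), ("b", 50), ("c", 30), ("d", 120), ("e", 10)]
batches = list(batches_within_budget(entries, budget_bytes=100))
```

A smaller budget yields more, smaller batches, trading throughput for a lower peak in host memory.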

Usability

Simple APIs that are consistent between distributed and non-distributed workloads.
Out-of-the-box integration with commonly used cloud object storage systems.
Automatic resharding (elasticity) on world size change for supported workloads (more details).
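
Resharding on a world size change can be pictured as recomputing which slices of a sharded tensor each new rank needs from the shards written by the old ranks. The following is a rough sketch under simplifying assumptions (a single dimension, contiguous and near-equal shards); `shard_ranges` and `reads_for_rank` are hypothetical helpers, not part of the TorchSnapshot API.

```python
from typing import List, Tuple


def shard_ranges(length: int, world_size: int) -> List[Tuple[int, int]]:
    """Split [0, length) into world_size contiguous, near-equal [start, end) ranges."""
    base, extra = divmod(length, world_size)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for rank in range(world_size):
        # The first `extra` ranks take one additional row each.
        end = start + base + (1 if rank < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def reads_for_rank(
    length: int, old_world: int, new_world: int, new_rank: int
) -> List[Tuple[int, int, int]]:
    """Return (old_rank, start, end) slices, in global indices, that new_rank must read."""
    lo, hi = shard_ranges(length, new_world)[new_rank]
    reads: List[Tuple[int, int, int]] = []
    for old_rank, (s, e) in enumerate(shard_ranges(length, old_world)):
        a, b = max(lo, s), min(hi, e)
        if a < b:  # non-empty overlap between the old and the new shard
            reads.append((old_rank, a, b))
    return reads


# A 10-row tensor saved with 4 ranks and restored with 2 ranks.
old_layout = shard_ranges(10, 4)
rank0_reads = reads_for_rank(10, old_world=4, new_world=2, new_rank=0)
rank1_reads = reads_for_rank(10, old_world=4, new_world=2, new_rank=1)
```

Here new rank 0 reads all of old shard 0 and part of old shard 1, while new rank 1 reads the rest of old shard 1 plus old shards 2 and 3.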

Security

Secure tensor serialization without pickle dependency [WIP].

Getting Started

from torchsnapshot import Snapshot

# Taking a snapshot
# `model` and `optimizer` are your existing torch.nn.Module and torch.optim.Optimizer
app_state = {"model": model, "optimizer": optimizer}
snapshot = Snapshot.take(path="/path/to/snapshot", app_state=app_state)

# Restoring from a snapshot
snapshot.restore(app_state=app_state)
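
The values in `app_state` above are objects that expose `state_dict()` and `load_state_dict()`, as `torch.nn.Module` and `torch.optim.Optimizer` do. As a minimal sketch, assuming custom state should follow the same two-method convention, a training-progress tracker might look like the class below. `TrainingProgress` is a hypothetical name, and the round trip is shown in plain Python without calling TorchSnapshot.

```python
from typing import Any, Dict


class TrainingProgress:
    """Hypothetical stateful object tracking where training left off."""

    def __init__(self) -> None:
        self.epoch = 0
        self.step = 0

    def state_dict(self) -> Dict[str, Any]:
        # Return everything needed to resume, as plain Python values.
        return {"epoch": self.epoch, "step": self.step}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Restore in place, mirroring torch.nn.Module.load_state_dict.
        self.epoch = state_dict["epoch"]
        self.step = state_dict["step"]


progress = TrainingProgress()
progress.epoch, progress.step = 3, 1200

# Round trip without any storage: save the state, then load it into a fresh object.
saved = progress.state_dict()
restored = TrainingProgress()
restored.load_state_dict(saved)

# Such an object could then sit next to the model and optimizer, e.g.
# app_state = {"model": model, "optimizer": optimizer, "progress": progress}
```

Because restoring happens in place through `load_state_dict()`, the same `app_state` dictionary can be used for both taking and restoring a snapshot.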

See the documentation for more details.

License

torchsnapshot is BSD licensed, as found in the LICENSE file.

