Releases · GoogleCloudPlatform/gcs-connector-for-pytorch

20 Nov 22:54

jdnurme

v1.4.0

9791744

1.4.0 Latest

What's Changed

Lightning multinode parquet by @jdnurme in #73
Install pytest on running continuous test for dataflux pypi package. by @akansha1812 in #75
Make DatafluxPytTrain a wrapper of DataFluxMapStyleDataset by @abhibyreddi in #74
updated base docker, example yaml, added readme by @jdnurme in #76
Fix DatafluxPytTrain.getitem by @abhibyreddi in #77
Run continuous test on the pypi installed package on presubmit by @akansha1812 in #78
Add code to make it possible to deploy training on a multi-node GKE cluster by @abhibyreddi in #81
Configure shared memory size by @abhibyreddi in #82
Reorder Dockerfile and add dockerignore to speed up builds by @MattIrv in #84
Correcting the checkpointing functions to handle the Path object. by @Yash9060 in #83
Parse bucket name from ckpt directory name instead of separate parameter for bucket name by @Yash9060 in #85
Make Lightning checkpoint demo work with Bernard's GKE framework and with FSDP strategy by @MattIrv in #86
Initialize new storage_client.bucket on every request by @Yash9060 in #87
Add README file for lightning image segmentation workload by @abhibyreddi in #89
Check in initial Parquet benchmark based on MaxText data loading benchmark by @MattIrv in #90
Add GKE deployment for MaxText Parquet training benchmark by @MattIrv in #91
Skip training when demo is run to benchmark Dataflux by @abhibyreddi in #92
Update the definition of the local flag by @abhibyreddi in #93
Allow running demo code in listing-only mode by @abhibyreddi in #95
Raise exception when ADC are missing by @abhibyreddi in #94
Update defaults for batch_size and num_workers by @abhibyreddi in #96
Faster Lightning Checkpoint download by @MattIrv in #99
Adding custom GCS Writer. by @Yash9060 in #98
update to latest dataflux client by @jdnurme in #101
add continuous benchmark with kokoro by @jdnurme in #102
Run image training demo as part of continuous integ tests by @abhibyreddi in #104
Adding GCS Custom reader by @Yash9060 in #105
MultiNode demo by @Yash9060 in #106
add benchmark code and update kokoro scripts by @jdnurme in #108
Parameterizing min_epochs, max_epochs & max_steps by @Yash9060 in #107
Add a helper method to create storage_client when needed. by @awonak in #109
Make step time configurable by @abhibyreddi in #110
Remove client initialization for fast listing from dataflux-pytorch by @akansha1812 in #111
Multipart checkpoint upload by @jdnurme in #114
adds unit tests, adds presubmit integration test, updated demo code by @jdnurme in #117
Add code to clear kernel cache after saving checkpoints by @abhibyreddi in #122
update continuous to run full benchmark by @jdnurme in #123
Adding benchmarking code for multi node checkpointing. by @Yash9060 in #121
set multipart upload to default behavior by @jdnurme in #127
Introduce AsyncCheckpointIO option for non-blocking checkpoint saves by @awonak in #116
Print average times to save and load checkpoints together by @abhibyreddi in #129
Changing hardcoded values to placeholders by @Yash9060 in #128
Make num_nodes configurable by @abhibyreddi in #130
update lightning bench with multipart and 10k info by @jdnurme in #131
update default dataflux to use multipart by @jdnurme in #133
Run unit tests on x86 Mac by @abhibyreddi in #115
implement fast download for df checkpoint by @jdnurme in #134
Add image segmentation benchmark results to README by @abhibyreddi in #118
Add single node async benchmark execution to integration tests by @awonak in #135
Refactor benchmark tables by @awonak in #136
add option to run benchmark without lightning by @jdnurme in #137
Fix AsyncCheckpointIO race condition by @awonak in #138
Update image segmentation benchmark README by @abhibyreddi in #139
add upload and download improvements to multinode by @jdnurme in #141
Update documented step time by @abhibyreddi in #142
CPU simulated benchmarking for GKE cluster. by @Yash9060 in #143
Simulated CPU benchmarking code by @Yash9060 in #145
Add support for multi-node checkpointing with fsspec by @abhibyreddi in #144
Correcting the code for simulated benchmarks by @Yash9060 in #146
Multi-node checkpoint benchmark improvements by @MattIrv in #149
Set pytorch version to 2.3.1 by @abhibyreddi in #148
update main readme with checkpointing bench results by @jdnurme in #150
Add support to benchmark multi-node checkpointing with default FSDP strategy by @abhibyreddi in #151
Remove duplicative pip install instructions from multi-node checkpoint benchmark readme by @MattIrv in #152
Skip saving checkpoints during training by @abhibyreddi in #153
Install checkpoint benchmark dependencies before running the benchmark by @abhibyreddi in #155
Update checkpoint readmes by @MattIrv in #159
Implement a custom FSDP strategy for benchmarking loads from boot disk by @abhibyreddi in #157
Added debug flag to GCSReader/Writer by @Yash9060 in #154
Correcting load_checkpoint for simulated benchmarks. by @Yash9060 in #161
Add support for benchmarking checkpoint save/restore to/from distributed filesystems by @abhibyreddi in #162
Correct table header row by @abhibyreddi in #163
Adding option to use FSspec with simulated benchmarks by @Yash9060 in #164
Create client for each process by @akansha1812 in #166
update bench script to run simulated multinode bench by @jdnurme in https://github.com/GoogleCloudPlatform/dataflux-pyt...

Contributors

awonak, MattIrv, and 4 other contributors

02 Aug 20:07

divrawal

v1.3.0

d9824cb

v1.3.0

What's Changed

Add boilerplate code for Dataflux-Pytorch Lightning demo by @abhibyreddi in #57
Refactor Dataflux simple demo loops and add retry flags by @MattIrv in #59
Catch exception when loading arrays from raw bytes fails by @abhibyreddi in #38
Implement data module for the pytorch lightning workload by @abhibyreddi in #60
Update default retry config to match successful 1k-node benchmarks. by @MattIrv in #63
Lightning text by @jdnurme in #64
add limit_train_batches param by @jdnurme in #65
Make it possible to deploy Pytorch Lightning image segmentation workload on a Ray cluster. by @abhibyreddi in #66
Update demo loops to configure multiprocessing start method by @MattIrv in #68
For mac and windows skip passing client storage to avoid pickling error in multiprocessing by @akansha1812 in #67
Disable compose download when create and delete permissions are missing by @akansha1812 in #70
Continuous test which installs gcs-torch-dataflux from PyPi and runs integration test. by @akansha1812 in #71
Added lightning package to setup file and updated version for re… by @divrawal in #69

New Contributors

@akansha1812 made their first contribution in #67

Full Changelog: v1.2.0...v1.3.0

Contributors

MattIrv, jdnurme, and 3 other contributors

15 Jul 22:01

jdnurme

v1.2.0

01df204

v1.2.0

What's Changed

Standardize Python extensions and formatting settings. by @MattIrv in #53
Set Dataflux user-agent through dataflux_core.user_agent module by @MattIrv in #52
Apply new formatter to all Python files. by @MattIrv in #54
Benchmark update by @divrawal in #51
Configure retry logic by @jdnurme in #55

Full Changelog: v1.1.0...v1.2.0

Contributors

MattIrv, jdnurme, and divrawal

12 Jul 05:43

bernardhan33

v1.1.0

e062cac

v1.1.0

What's Changed

Update README.md by @MattIrv in #33
Presubmit Integration Testing by @jdnurme in #34
Update README.md by @dutchiechris in #36
Add a new flag for specifying number of dataloader threads by @abhibyreddi in #37
add CODEOWNERS file by @jdnurme in #39
Lightning checkpoint by @divrawal in #40
Fix the continuous build failure by introducing virtual environment by @bernardhan33 in #43
Add threaded download to map-style dataset. by @MattIrv in #44
Readme typo fix by @divrawal in #42
add disable_compose config by @jdnurme in #46
Fix continuous and presubmit tests by @bernardhan33 in #47
Benchmark checkpoint by @divrawal in #45
update readme with 429 info by @jdnurme in #48
Fix QPS limit example URL referring to project instead of bucket. by @MattIrv in #49
Bump version to 1.1.0 by @bernardhan33 in #50

New Contributors

@dutchiechris made their first contribution in #36
@abhibyreddi made their first contribution in #37
@divrawal made their first contribution in #40

Full Changelog: v1.0.0...v1.1.0

Contributors

dutchiechris, MattIrv, and 4 other contributors

02 Apr 21:43

bernardhan33

v1.0.0

0ea91d8

v1.0.0

What's Changed

Add iterable dataset to Colab demo by @bernardhan33 in #19
Update README to note the iterable dataset support by @bernardhan33 in #18
add kokoro configs by @jdnurme in #21
Fix Iterable Dataset bug on downloading the whole subset of data by @bernardhan33 in #28
Update baseline performance numbers for Dataflux datasets by @bernardhan33 in #29
Debug logging for Kokoro unit test build by @MattIrv in #30
Increase pytest verbosity in Kokoro by @MattIrv in #31
bump version to 1.0.0 by @bernardhan33 in #32

New Contributors

@jdnurme made their first contribution in #21

Full Changelog: v0.1.0...v1.0.0

Contributors

MattIrv, jdnurme, and bernardhan33

15 Mar 00:30

bernardhan33

v0.1.0

8c0c45a

v0.1.0

What's Changed

Create a real-world end-to-end image segmentation training demo with Dataflux Dataset by @bernardhan33 in #8
Add checkpointing support by @bernardhan33 in #9
Add the simple walkthrough Colab by @bernardhan33 in #11
Add fast listing component to quick demo by @bernardhan33 in #13
Add pyproject.toml to prepare for PyPI release by @bernardhan33 in #14
Update README and demos to note the new pip install command by @bernardhan33 in #16
Add support for Dataflux Iterable Dataset by @bernardhan33 in #17

New Contributors

@MattIrv made their first contribution in #4

Full Changelog: v0.0.0...v0.1.0

Contributors

MattIrv and bernardhan33

13 Feb 22:05

bernardhan33

v0.0.0

353ce14

Dataflux v0.0.0

Added support for PyTorch map-style dataset.
Published early README.

What's Changed

Initial commit of dataflux-pytorch by @Magichan33 in #1
Fix padding by @Magichan33 in #2
Fix typo by @Magichan33 in #3

New Contributors

@Magichan33 made their first contribution in #1

Full Changelog: https://github.com/GoogleCloudPlatform/dataflux-pytorch/commits/v0.0.0

Contributors

bernardhan33
