alxmrs/xarray-beam

This branch is 81 commits behind google/xarray-beam:main.

Latest commit: ebfdbf0 by Xarray-Beam authors, Sep 22, 2021

Xarray-Beam

Xarray-Beam is a Python library for building Apache Beam pipelines with Xarray datasets.

The project aims to facilitate data transformations and analysis on large-scale multi-dimensional labeled arrays, such as:

  • Ad-hoc computation on Xarray data, by dividing an xarray.Dataset into many smaller pieces ("chunks").
  • Adjusting array chunks, using the Rechunker algorithm.
  • Ingesting large, multi-dimensional array datasets into an analysis-ready, cloud-optimized format, namely Zarr (see also Pangeo Forge).
  • Calculating statistics (e.g., "climatology") across distributed datasets with arbitrary groups.
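
To make the idea of dividing a dataset into chunks concrete, here is a minimal sketch in plain Python. It does not use the Xarray-Beam API: the function `chunk_offsets`, its arguments, and the example dimension names are hypothetical, chosen only to show how a set of dimension sizes can be cut into pieces that are each identified by their starting offsets. It assumes all sizes and chunk lengths are positive integers.

```python
from itertools import product


def chunk_offsets(sizes, chunks):
    """Yield (offsets, shape) for every chunk of an array.

    sizes:  dict mapping dimension name -> total length (hypothetical input).
    chunks: dict mapping dimension name -> chunk length; dimensions that are
            not listed are kept whole.
    """
    dims = list(sizes)
    # Starting index of each chunk along every dimension.
    starts = [range(0, sizes[d], chunks.get(d, sizes[d])) for d in dims]
    for combo in product(*starts):
        offsets = dict(zip(dims, combo))
        # The last chunk along a dimension may be shorter than the others.
        shape = {
            d: min(chunks.get(d, sizes[d]), sizes[d] - offsets[d]) for d in dims
        }
        yield offsets, shape


sizes = {"time": 10, "lat": 4}
pieces = list(chunk_offsets(sizes, {"time": 4}))
for offsets, shape in pieces:
    print(offsets, shape)
```

Each piece can then be processed independently, which is what makes this kind of decomposition a natural fit for a data-parallel pipeline.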

For more about our approach and how to get started, read the documentation!

🚨 Warning: Xarray-Beam is new and unpolished 🚨

Expect sharp edges 🔪 and performance cliffs 🧗, particularly related to the management of lazy data with Dask and reading/writing data with Zarr. We have used it to efficiently process ~25 TB datasets. We expect it to scale to PB-size datasets, but that's easier said than done. We welcome feedback and contributions from early adopters, and hope to have it ready for a wider audience soon.

Installation

Xarray-Beam requires recent versions of immutabledict, xarray, dask, rechunker and zarr, and Apache Beam 2.31.0 or later. For best performance when writing Zarr files, use Xarray 0.19.0 or later.
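
One way to check an environment against the version numbers above is sketched below, using only the standard library. The helper names `parse_version` and `check_requirements` are hypothetical, the version parsing is deliberately simplistic (it keeps only the leading digits of each dot-separated component), and the distribution names `apache-beam` and `xarray` are assumed to be the names the packages are installed under. Note that the Xarray entry reflects the recommendation for Zarr write performance, not a hard requirement.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimums taken from the paragraph above; the xarray value is a
# recommendation for Zarr write performance, not a strict requirement.
MINIMUM_VERSIONS = {"apache-beam": "2.31.0", "xarray": "0.19.0"}


def parse_version(text):
    """Turn '2.31.0' or '0.19.0rc1' into a tuple of ints such as (2, 31, 0)."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        status = "ok" if ok else f"older than {minimum}"
        report[name] = f"{installed} ({status})"
    return report


for package, status in check_requirements().items():
    print(f"{package}: {status}")
```

For anything beyond a quick sanity check, a dedicated version-parsing library would handle pre-releases and unusual version schemes more reliably than this sketch.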

Disclaimer

Xarray-Beam is an experiment that we are sharing with the outside world in the hope that it will be useful. It is not a supported Google product. We welcome feedback, bug reports and code contributions, but cannot guarantee they will be addressed.

See the "Contribution guidelines" for more.

Credits

Contributors:

  • Stephan Hoyer
  • Jason Hickey
  • Cenk Gazen
  • Alex Merose

About

Distributed Xarray with Apache Beam
