Rename reading/writing doc
shoyer committed Aug 5, 2021
1 parent 9d8d908 commit 72191d5
Showing 2 changed files with 21 additions and 19 deletions.
4 changes: 3 additions & 1 deletion docs/index.md
@@ -6,10 +6,12 @@ The documentation includes narrative documentation that will walk you through the

We recommend reading both, as well as a few [end-to-end examples](https://github.com/google/xarray-beam/tree/main/examples) to understand what code using Xarray-Beam typically looks like.

+## Contents
+
```{toctree}
:maxdepth: 1
data-model.ipynb
-io.ipynb
+read-write.ipynb
aggregation.ipynb
rechunking.ipynb
api.md
36 changes: 18 additions & 18 deletions docs/io.ipynb → docs/read-write.ipynb
@@ -5,15 +5,15 @@
"id": "8e4f05ea",
"metadata": {},
"source": [
"# Loading and saving data"
"# Reading and writing data"
]
},
{
"cell_type": "markdown",
"id": "480ac360",
"metadata": {},
"source": [
"## Loading datasets into chunks"
"## Read datasets into chunks"
]
},
{
@@ -27,7 +27,7 @@
{
"cell_type": "code",
"execution_count": 42,
"id": "3fec02e8",
"id": "5923b201",
"metadata": {},
"outputs": [],
"source": [
@@ -37,7 +37,7 @@
{
"cell_type": "code",
"execution_count": 39,
"id": "7b431556",
"id": "bc5bfdc0",
"metadata": {
"tags": [
"hide-input"
@@ -100,7 +100,7 @@
},
{
"cell_type": "markdown",
"id": "25c5f2a6",
"id": "2f0e5efb",
"metadata": {},
"source": [
"Importantly, xarray datasets fed into `DatasetToChunks` **can be lazy**, with data not already loaded eagerly into NumPy arrays. When you feed lazy datasets into `DatasetToChunks`, each individual chunk will be indexed and evaluated separately on Beam workers.\n",
@@ -113,7 +113,7 @@
{
"cell_type": "code",
"execution_count": 47,
"id": "a2ce5049",
"id": "8a0d0091",
"metadata": {},
"outputs": [
{
@@ -149,15 +149,15 @@
},
{
"cell_type": "markdown",
"id": "ea3ec245",
"id": "de622acb",
"metadata": {},
"source": [
"`chunks=None` tells Xarray to use its builtin lazy indexing machinery, instead of using Dask. This is advantageous because datasets using Xarray's lazy indexing are serialized much more compactly (via [pickle](https://docs.python.org/3/library/pickle.html)) when passed into Beam transforms."
]
},
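
To see the serialization difference for yourself, a hypothetical comparison (not from the notebook; the store path is assumed) could pickle both variants of the same dataset:

```python
import pickle
import xarray

store = 'gs://my-bucket/source.zarr'  # hypothetical path

lazy_ds = xarray.open_zarr(store, chunks=None)  # Xarray lazy indexing
dask_ds = xarray.open_zarr(store)               # dask-backed (the default)

# The lazy-indexing dataset typically pickles far more compactly than the
# dask-backed one, which must serialize its entire task graph.
print(len(pickle.dumps(lazy_ds)))
print(len(pickle.dumps(dask_ds)))
```
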
{
"cell_type": "markdown",
"id": "1c8bc4bc",
"id": "e4dc8c82",
"metadata": {},
"source": [
"Alternatively, you can pass in lazy datasets [using dask](http://xarray.pydata.org/en/stable/user-guide/dask.html). In this case, you don't need to explicitly supply `chunks` to `DatasetToChunks`:"
@@ -166,7 +166,7 @@
{
"cell_type": "code",
"execution_count": 49,
"id": "3c86c82e",
"id": "b61440aa",
"metadata": {},
"outputs": [
{
@@ -198,7 +198,7 @@
},
{
"cell_type": "markdown",
"id": "db585d3a",
"id": "d73c6398",
"metadata": {},
"source": [
"Dask's lazy evaluation system is much more general than Xarray's lazy indexing, so as long as resulting dataset can be independently evaluated in each chunk this can be a very convenient way to setup computation for Xarray-Beam.\n",
@@ -208,7 +208,7 @@
},
{
"cell_type": "markdown",
"id": "cdf80b53",
"id": "4c4dfd42",
"metadata": {},
"source": [
"```{note}\n",
@@ -221,25 +221,25 @@
"id": "233809a4",
"metadata": {},
"source": [
"## Saving data to Zarr"
"## Writing data to Zarr"
]
},
{
"cell_type": "markdown",
"id": "67b10192",
"id": "2f415ceb",
"metadata": {},
"source": [
"[Zarr](https://zarr.readthedocs.io/) is the preferred file format for reading and writing data with Xarray-Beam, due to its excellent scalability and support inside Xarray.\n",
"\n",
"{py:class}`~xarray_beam.ChunksToZarr` is Xarray-Beam's API for saving chunks into a (new) Zarr store. \n",
"{py:class}`~xarray_beam.ChunksToZarr` is Xarray-Beam's API for saving chunks into a Zarr store. \n",
"\n",
"You can get started just using it directly:"
]
},
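
A sketch of that direct usage (source and destination paths are hypothetical), chaining the read transform from earlier straight into `ChunksToZarr`:

```python
import apache_beam as beam
import xarray
import xarray_beam as xbeam

ds = xarray.open_zarr('gs://my-bucket/source.zarr', chunks=None)

with beam.Pipeline() as p:
    (
        p
        | xbeam.DatasetToChunks(ds, chunks={'time': 100})
        | xbeam.ChunksToZarr('gs://my-bucket/destination.zarr')
    )
```
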
{
"cell_type": "code",
"execution_count": 50,
"id": "88fc081c",
"id": "9c6efa33",
"metadata": {},
"outputs": [
{
@@ -257,7 +257,7 @@
},
{
"cell_type": "markdown",
"id": "70da81a8",
"id": "04c0f50b",
"metadata": {},
"source": [
"By default, `ChunksToZarr` needs to evaluate and combine the entire distributed dataset in order to determine overall Zarr metadata (e.g., array names, shapes, dtypes and attributes). This is fine for relatively small datasets, but can entail significant additional communication and storage costs for large datasets.\n",
@@ -270,7 +270,7 @@
{
"cell_type": "code",
"execution_count": 55,
"id": "b8ea3f4a",
"id": "993191db",
"metadata": {},
"outputs": [],
"source": [
@@ -280,7 +280,7 @@
},
{
"cell_type": "markdown",
"id": "31748c31",
"id": "e70cd961",
"metadata": {},
"source": [
"Xarray operations like indexing and expand dimensions (see {py:meth}`xarray.Dataset.expand_dims`) are entirely lazy on this dataset, which makes it relatively straightforward to build up a Dataset with the required variables and dimensions, e.g., as used in the [ERA5 climatology example](https://github.com/google/xarray-beam/blob/main/examples/era5_climatology.py)."
