Mhs/das 2293/reduce memory usage (#12)
* DAS-2293: Write the output netcdf file incrementally.

Write out the datatree appending each variable to keep from having to hold the
entire data tree in memory at once.

* DAS-2293: Update changelog and version

* fixup! DAS-2293: Write the output netcdf file incrementally.
flamingbear authored Feb 6, 2025
1 parent 70b7768 commit 187722c
Showing 3 changed files with 22 additions and 12 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [v0.2.1] - 2025-02-06
+
+### Changed
+
+- Output file written incrementally to reduce total memory usage.
+
+
 ## [v0.2.0] - 2025-02-04

 ### Added
2 changes: 1 addition & 1 deletion docker/service_version.txt
@@ -1 +1 @@
-0.2.0
+0.2.1
25 changes: 14 additions & 11 deletions smap_l2_gridder/grid.py
@@ -30,33 +30,36 @@ def transform_l2g_input(input_filename: Path, output_filename: Path) -> None:

 def process_input(in_data: DataTree, output_file: Path):
     """Process input file to generate gridded output file."""
-    out_data = DataTree()
+    root_dt = DataTree()

     short_name = get_collection_shortname(in_data)

-    out_data = transfer_metadata(in_data, out_data)
+    root_dt = transfer_metadata(in_data, root_dt)
+    root_dt.to_netcdf(output_file, mode='w')

     # Process grids from all top level groups that are not only Metadata
     data_group_names = get_data_groups(in_data)

     for group_name in data_group_names:
+        group_dt = DataTree()
+
         grid_info = get_grid_information(in_data, group_name, short_name)
         vars_to_grid = get_target_variables(in_data, group_name, short_name)

         # Add coordinates and CRS metadata for this group_name
         x_dim, y_dim = compute_dims(grid_info['target'])
-        out_data[f'{group_name}/crs'] = create_crs(grid_info['target'])
-        out_data[f'{group_name}/x-dim'] = x_dim
-        out_data[f'{group_name}/y-dim'] = y_dim
+        group_dt[f'{group_name}/crs'] = create_crs(grid_info['target'])
+        group_dt[f'{group_name}/x-dim'] = x_dim
+        group_dt[f'{group_name}/y-dim'] = y_dim
+
+        group_dt.to_netcdf(output_file, mode='a')

         for var_name in vars_to_grid:
+            var_dt = DataTree()
             full_var_name = f'{group_name}/{var_name}'
-            out_data[full_var_name] = prepare_variable(
-                in_data[full_var_name], grid_info
-            )
-
-    # write the output data file.
-    out_data.to_netcdf(output_file)
+            var_dt[full_var_name] = prepare_variable(in_data[full_var_name], grid_info)
+            # append variable to output file
+            var_dt.to_netcdf(output_file, mode='a')


 def prepare_variable(var: DataTree | DataArray, grid_info: dict) -> DataArray: