Overwriting files that were lazily loaded results in permission error… #241

Draft · wants to merge 3 commits into main

Conversation

roeldegoede
Collaborator

…s. Loading the data first solves that problem.

Issue addressed

When reading in a SFINCS model with netcdf files (as grid (future), subgrid, or forcing), these datasets are lazily loaded. In theory, that increases performance, because data is only loaded when necessary. However, due to lazy references to the data in the original files, the files cannot be overwritten when you write the model.

Of course, you could choose to write only the files that you changed, but often the full SfincsModel.write() is used.

Explanation

Everywhere we would like to write a netcdf file, we first check whether the file already exists. If it does, we load the data into memory first, so that we can overwrite the file afterwards (after loading, the lazy references are gone).
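The idea can be sketched with plain xarray; the helper name is hypothetical, and the check relies on xarray's private `_in_memory` attribute, so treat this as an illustration rather than the actual implementation:

```python
import os

import xarray as xr


def load_before_overwrite(ds, path):
    """Hypothetical sketch of the approach described above.

    If `path` already exists and any variable in `ds` still holds a lazy
    reference to data on disk, force a load so the open file handle is
    released before to_netcdf() truncates the file. (The PermissionError
    shows up mainly on Windows, where an open handle locks the file.)
    Note: `_in_memory` is a private xarray attribute and may change.
    """
    if os.path.exists(path) and any(
        not da._in_memory for da in ds.data_vars.values()
    ):
        ds.load()
    return ds
```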

Checklist

  • [x] Updated tests or added new tests
  • [x] Branch is up to date with main
  • [ ] Updated documentation if needed
  • [ ] Updated changelog.rst if needed

Additional Notes (optional)

The question is whether we even want to load and overwrite datasets that are still lazy: in most cases that means they are unchanged, so loading and rewriting them would only result in unnecessary computations.
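The alternative raised here could look roughly like the following check (name hypothetical; it again relies on xarray's private `_in_memory` flag, and the heuristic is not airtight, e.g. dask-backed edits can stay lazy):

```python
def all_vars_lazy(ds):
    """Sketch: if *every* variable is still lazy, nothing was modified
    after reading, so the file on disk is presumably already up to date
    and the write could be skipped entirely."""
    return all(not da._in_memory for da in ds.data_vars.values())
```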

@roeldegoede roeldegoede marked this pull request as draft January 29, 2025 09:08
@roeldegoede
Collaborator Author

Somehow, the tests that I implemented to guarantee this would work (which they did locally) fail online with the same permission errors as before. Since everything works on my side, it's very hard to debug. @DirkEilander or @LuukBlom, any suggestions are welcome.

Comment on lines +59 to +62
try:
    ds.to_netcdf(nc_copy)
except PermissionError:
    pass
Collaborator

to_netcdf() can either succeed or fail here; either way it doesn't affect the test at all?

use

with pytest.raises(PermissionError) as e:
    ds.to_netcdf(nc_copy)
assert .... in str(e.value)

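For context, a self-contained sketch of this pytest pattern; the failing writer is a stand-in for the real ds.to_netcdf call, and all names are illustrative:

```python
import pytest


def write_locked_file():
    # Stand-in for ds.to_netcdf(nc_copy) failing on a file that is still
    # held open by a lazy reference (seen mainly on Windows).
    raise PermissionError("Permission denied: 'sfincs_map.nc'")


def test_overwrite_raises_permission_error():
    with pytest.raises(PermissionError) as excinfo:
        write_locked_file()
        # Unreachable: the raise above exits the with-block immediately,
        # so assertions placed here would never run.
    # Check the message *after* the block, via excinfo.value.
    assert "Permission denied" in str(excinfo.value)
```

Note the assertion sits outside the `with` block; anything after the raising call inside the block is never executed.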
Collaborator Author


Good one, I was looking for something like that but then forgot.

Comment on lines +52 to +55
ds = utils.xu_open_dataset(nc_copy)

# Convert to dataset
ds = ds.ugrid.to_dataset()
Collaborator


Here you are overwriting the ds variable with another object, which could trigger the garbage collector and release the files. I'm not sure whether this breaks anything, but I would stay away from using the same variable name for two different objects.

Collaborator Author


Pretty sure this happens in more places in the code; I could change that.

Comment on lines +1346 to +1369
def check_exists_and_lazy(ds, file_name):
    """If a netcdf file is read lazily, the file cannot be overwritten.

    This function checks whether the file already exists; if so, it checks
    whether the data is lazily loaded. If it is, the data should be loaded
    into memory before writing.

    Parameters
    ----------
    ds : xarray.Dataset, xu.UgridDataset
        The dataset to be written to a netcdf file.
    file_name : str
        The path to the netcdf file.
    """
    if not os.path.exists(file_name):
        return

    # Check for lazy loading
    lazy_vars = [not data_array._in_memory for data_array in ds.data_vars.values()]

    # if all(lazy_vars):
    #     return  # All variables are lazy-loaded, skip writing?

    if any(lazy_vars):
        ds.load()  # Some variables are lazy-loaded, load them into memory
    return
Collaborator


Make sure to call ds.close() on the file after loading the contents.

Collaborator Author


We discussed earlier that ds.close() does not work for xugrid.UgridDatasets, right?

Collaborator


Ah yes, I keep forgetting.
Unfortunately, for the same reason ds.load() will also not work then :s
