Overwriting files that were lazily loaded results in permission error… #241
base: main
Conversation
…s. Loading the data first solves that problem.
Somehow, the tests that I implemented to guarantee this would work (which it perfectly did locally) fail online with the same permission errors as before. However, since everything works on my side, it's very hard to debug. @DirkEilander or @LuukBlom, any suggestions are welcome.
try:
    ds.to_netcdf(nc_copy)
except PermissionError:
    pass
to_netcdf() can succeed or fail here, so it doesn't affect the test at all? Use:

with pytest.raises(PermissionError) as e:
    ds.to_netcdf(nc_copy)
assert .... in str(e.value)
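The assert-that-it-raises pattern suggested above can be sketched with a stdlib-only stand-in for pytest.raises (illustrative only; real tests should use pytest itself, and the error message here is made up):

```python
from contextlib import contextmanager

@contextmanager
def raises(exc_type):
    """Tiny stand-in for pytest.raises: fail unless exc_type is raised."""
    caught = {}
    try:
        yield caught
    except exc_type as exc:
        caught["value"] = exc  # swallow the exception, keep it for inspection
    else:
        raise AssertionError(f"{exc_type.__name__} was not raised")

with raises(PermissionError) as e:
    raise PermissionError("file is locked by a lazy reader")

assert "locked" in str(e["value"])
```

The key difference from the try/except-pass version: if to_netcdf() unexpectedly succeeds, this pattern fails the test instead of silently passing.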
Good one, I was looking for something like that, but then forgot
ds = utils.xu_open_dataset(nc_copy)

# Convert to dataset
ds = ds.ugrid.to_dataset()
Here you are overwriting the ds variable with another object, which could trigger the garbage collector and release the files. I'm not sure if this breaks anything, but I would stay away from using the same variable name for two different objects.
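The rebinding hazard described here can be shown with a small stdlib-only sketch (the Handle class and file names are stand-ins, not xarray; in CPython an object is finalized as soon as its last reference disappears):

```python
import gc

finalized = []

class Handle:
    """Stand-in for an open dataset that owns a file handle."""
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Runs once the last reference is gone; a real dataset
        # would release its underlying file here.
        finalized.append(self.name)

ds = Handle("subgrid.nc")       # first object bound to `ds`
ds = Handle("subgrid_copy.nc")  # rebinding drops the only reference to the first
gc.collect()                    # immediate in CPython; collect() covers other runtimes

assert finalized == ["subgrid.nc"]
```

So whether the first object's file is still open after the rebinding depends on finalization timing, which is exactly why reusing the name is fragile.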
Pretty sure this happens in more places in the code, I could change that.
def check_exists_and_lazy(ds, file_name):
    """If a netcdf file is read lazily, the file cannot be overwritten.
    This function checks whether the file already exists; if so, it checks
    if the data is lazily loaded. If so, the data should be loaded before writing.

    Parameters
    ----------
    ds : xarray.Dataset, xu.UgridDataset
        The dataset to be written to a netcdf file.
    file_name : str
        The path to the netcdf file.
    """
    if not os.path.exists(file_name):
        return

    # Check for lazy loading
    lazy_vars = [not data_array._in_memory for data_array in ds.data_vars.values()]

    # if all(lazy_vars):
    #     return  # All variables are lazy-loaded, skip writing?

    if any(lazy_vars):
        ds.load()  # Some variables are lazy-loaded, load them into memory
    return
Make sure to call ds.close() on the file after loading the contents.
We discussed earlier that ds.close() does not work for xugrid.UgridDatasets right?
Ah yes, I keep forgetting.
Unfortunately, for the same reason ds.load() will also not work then :s
Issue addressed
When reading in a SFINCS model with netcdf files (as grid (future), subgrid, or forcing), these datasets are lazily loaded. In theory, that increases performance, because data is only loaded when necessary. However, due to lazy references to the data in the original files, the files cannot be overwritten when you write the model.
Of course, you could choose to only write the files that you changed, but often SfincsModel.write() is used.
Explanation
Everywhere we would like to write a netcdf file, we first check whether the file already exists. If it does, we load the data into memory first, so that we can overwrite the file afterwards (after loading, the lazy references are gone).
Additional Notes (optional)
The question is whether we even want to load and overwrite datasets that are still lazy, because in most cases that means they are unchanged, and loading/writing would result in unnecessary computations.
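This trade-off matches the commented-out all(lazy_vars) branch in the helper. The decision logic could be sketched as follows (FakeVar is a stand-in and the strategy names are made up, not a final design):

```python
class FakeVar:
    def __init__(self, in_memory):
        self._in_memory = in_memory

def write_strategy(data_vars):
    """Return what a write routine could do for a set of variables."""
    lazy = [not v._in_memory for v in data_vars]
    if lazy and all(lazy):
        return "skip"             # fully lazy: on-disk data is untouched, rewriting is redundant
    if any(lazy):
        return "load_then_write"  # mixed: pull lazy parts into memory, then overwrite
    return "write"                # everything already in memory

print(write_strategy([FakeVar(False), FakeVar(False)]))  # skip
print(write_strategy([FakeVar(False), FakeVar(True)]))   # load_then_write
print(write_strategy([FakeVar(True), FakeVar(True)]))    # write
```

The "skip" branch is the open question: it avoids the unnecessary load/write round-trip, at the cost of assuming a fully lazy dataset was never modified.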