Harmonize nodata usage #375

Open
johanvdw opened this issue Apr 24, 2024 · 2 comments
Comments

@johanvdw
Collaborator

There are too many different nodata definitions in use in niche:

  • if data is float32: np.nan is used
  • if data is uint8: 255 is used
  • in all other cases: -99 is used
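To make the three conventions concrete, here is a minimal Python sketch of the dtype-dependent nodata check they imply. The function name `nodata_mask` and the sample grids are hypothetical; the logic is reconstructed from the list above, not copied from the niche source. Note that under these rules even a float64 grid would be checked against -99.

```python
import numpy as np

# Conventions as described in this issue (hypothetical helper, not niche code).
def nodata_mask(arr: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True where arr holds nodata."""
    if arr.dtype == np.float32:
        return np.isnan(arr)   # float32: np.nan
    if arr.dtype == np.uint8:
        return arr == 255      # uint8: 255
    return arr == -99          # any other dtype: -99

grids = {
    "float32": np.array([1.5, np.nan], dtype=np.float32),
    "uint8": np.array([3, 255], dtype=np.uint8),
    "int16": np.array([7, -99], dtype=np.int16),
}
for name, grid in grids.items():
    print(name, nodata_mask(grid))
```

Every consumer of a grid has to repeat this branching, which is the low-level complexity the issue wants to remove.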

I think we should consider moving all data types to float32. The areas are relatively small, and we compress the tif files anyway. This would also let us use np.nan for nodata, which propagates correctly through calculations without special tweaks.
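A small sketch of why float32 with np.nan is attractive: once every grid is cast to float32 and its sentinel is replaced by np.nan, arithmetic propagates nodata automatically, whereas sentinel values silently turn into plausible-looking numbers. The helper `to_float32` and the sample arrays are illustrative assumptions, not part of niche.

```python
import numpy as np

def to_float32(arr: np.ndarray, nodata) -> np.ndarray:
    """Cast a grid to float32 and replace its sentinel nodata value by np.nan."""
    out = arr.astype(np.float32)
    out[arr == nodata] = np.nan
    return out

raw_uint8 = np.array([2, 255, 4], dtype=np.uint8)    # 255 = nodata
raw_int16 = np.array([10, 20, -99], dtype=np.int16)  # -99 = nodata

# Sentinel arithmetic: nodata cells end up as ordinary-looking values.
naive = raw_uint8.astype(np.int16) + raw_int16
print(naive)  # values 12, 275, -95

# float32 + np.nan: nodata in either input gives nan in the result.
combined = to_float32(raw_uint8, 255) + to_float32(raw_int16, -99)
print(combined)
```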

Masked arrays are another option to remove this low-level code, but fixing the data type is probably even better.
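For comparison, a minimal sketch of the masked-array alternative using NumPy's `numpy.ma` module: the sentinels are masked once and every subsequent operation combines the masks. The sample arrays and the choice of -99 when writing back are assumptions for illustration only.

```python
import numpy as np
import numpy.ma as ma

raw_uint8 = np.array([2, 255, 4], dtype=np.uint8)    # 255 = nodata
raw_int16 = np.array([10, 20, -99], dtype=np.int16)  # -99 = nodata

# Mask the sentinels once; operations then combine the masks automatically.
a = ma.masked_equal(raw_uint8, 255)
b = ma.masked_equal(raw_int16, -99)
c = a + b

print(c.mask)         # True where either input was nodata
print(c.filled(-99))  # write back with a single sentinel of choice
```

The original integer dtypes are kept, at the cost of carrying a separate boolean mask alongside each array.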

Note that the code currently seems to work well, but it is rather complex internally, which led, e.g., to #335.

@cecileherr
Collaborator

A side note: in the past (with niche 1.2, Windows 10, 8 GB RAM, but less than 10 GB of free disk space) I ran into memory issues with some projects. For example, for a project with a resolution of 5 × 5 m:

MemoryError: unable to allocate 43.8 MiB for an array with shape (2760, 4160) and data type float32

I suspect that changing the data type to float (or solving issue #335) might lead to more memory problems (?). Would it be possible to test this, or give an idea of the impact on memory and speed? Thx!
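A rough way to gauge the memory side of this question: the size of a single in-memory grid is simply cells × bytes per cell. The sketch below uses the shape from the MemoryError above; the helper `grid_mib` is hypothetical, and the estimate ignores temporary arrays created during calculations, which multiply the real peak usage.

```python
import numpy as np

def grid_mib(shape, dtype) -> float:
    """Memory in MiB for one in-memory grid of the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 2**20

shape = (2760, 4160)  # grid shape from the MemoryError above
for dtype in ("uint8", "int16", "float32", "float64"):
    print(f"{dtype:>8}: {grid_mib(shape, dtype):6.1f} MiB")
```

For this grid, float32 needs 43.8 MiB (matching the error message), four times the roughly 10.9 MiB of uint8, so moving every layer to float32 would raise the per-layer footprint accordingly.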

@stijnvanhoey
Collaborator

I checked the use of masked arrays (see #387), but I will revert the masked-array implementation. Although masked arrays simplify the implementation and the handling of no-data regions considerably, the reasons for reverting are:

As this more than doubles the run time, we no longer consider this a valuable option and will instead focus on a predefined data type (uint8 or float32) for each variable, each with a clear no-data value (255 for uint8 and np.nan for float32).
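To get a feel for the overhead of masked arrays on one's own machine, a micro-benchmark along these lines can be used. It does not reproduce the measurements from #387: the grid size, the ~10 % nodata fraction and the operation `x * 2 + y` are arbitrary assumptions, and the resulting ratio will vary with hardware and NumPy version.

```python
import timeit

import numpy as np
import numpy.ma as ma

rng = np.random.default_rng(0)
shape = (1000, 1000)
x = rng.random(shape, dtype=np.float32)
x[rng.random(shape) < 0.1] = np.nan   # roughly 10 % nodata cells
y = rng.random(shape, dtype=np.float32)

xm = ma.masked_invalid(x)             # mask the NaN cells
ym = ma.masked_array(y)               # no nodata in y

def nan_version():
    return x * 2 + y                  # NaN propagates on its own

def masked_version():
    return xm * 2 + ym                # masks are combined per operation

# Both approaches must flag the same cells as nodata.
assert np.array_equal(np.isnan(nan_version()), ma.getmaskarray(masked_version()))

t_nan = timeit.timeit(nan_version, number=20)
t_masked = timeit.timeit(masked_version, number=20)
print(f"nan: {t_nan:.3f} s, masked: {t_masked:.3f} s, ratio: {t_masked / t_nan:.1f}")
```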
