refactor: nodata handling #162 #163

Merged
merged 7 commits into from
Jun 19, 2024


@Kirill888 (Member) commented Jun 18, 2024

  • adding new types
    • `SomeNodata` (formerly `MaybeAutoNodata`) -- `None|int|float|str|"auto"`
    • `Nodata` is now `None|float|int`
    • `MaybeNodata` (previously `None|float|int|str`) has been removed
    • "auto" replaces what used to be `None`
    • `None` now means "no nodata value"
  • `resolve_nodata()` is used to handle nodata options consistently across the library
  • `.odc.nodata` is used to extract nodata from xarrays
  • fixes in reproject/overview generation for float data. It is assumed to have `nan` values, and GDAL needs `nodata=nan` to handle it correctly.
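Why float data needs `nodata=nan` for overviews can be illustrated with a NumPy analogy. This is not GDAL's code: `block_average` is a hypothetical helper, and the assumption is that declaring `nodata=nan` gives GDAL behaviour akin to `np.nanmean` (missing pixels skipped), while leaving nodata unset behaves more like `np.mean`, where one `nan` poisons the whole output pixel.

```python
import numpy as np


def block_average(img: np.ndarray, factor: int = 2, skip_nan: bool = True) -> np.ndarray:
    """Downsample by averaging non-overlapping factor x factor blocks.

    Hypothetical helper: a rough analogy for "average" overview generation.
    """
    h, w = img.shape
    # Crop so both dimensions are divisible by factor, then expose the blocks
    cropped = img[: h - h % factor, : w - w % factor]
    blocks = cropped.reshape(h // factor, factor, w // factor, factor)
    # nanmean ignores nan pixels (like a declared nan nodata);
    # mean lets any nan propagate into the output pixel
    reducer = np.nanmean if skip_nan else np.mean
    return reducer(blocks, axis=(1, 3))


img = np.array([[1, 2, 3, 4], [5, np.nan, 7, 8]], dtype=np.float32)
print(block_average(img, skip_nan=False))  # left block becomes nan
print(block_average(img, skip_nan=True))   # left block averages 1, 2 and 5
```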

nodata types explained

  • `SomeNodata` (formerly `MaybeAutoNodata`) should be used in APIs that allow configuring nodata and nodata overrides
  • `Nodata` is a type suitable to pass to GDAL as `{src,dst}_nodata`; it can be `None`, which means no missing values are expected in the data.
  • `FillValue` is a type suitable for `np.full` and is resolved from `Nodata` + dtype:
    • NaN for not-configured floating point images
    • 0 for not-configured integer images
    • value of nodata if set (NaN is fine here too)
`SomeNodata + dtype + [fallback] -> Nodata + dtype -> FillValue`
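The resolution chain above can be sketched in plain Python. This is a minimal illustration, not odc-geo's `resolve_nodata()`: the helpers `resolve_nodata_sketch` and `resolve_fill_value_sketch` are hypothetical, and the sketch assumes that `"auto"` defers to `fallback` and then to a per-dtype default (`nan` for floats, no nodata for integers), and that strings such as `"nan"` or `"-9999"` are parsed as numbers.

```python
import math
from typing import Any, Literal, Union

import numpy as np

# Aliases mirroring the types described in this PR
SomeNodata = Union[float, int, str, None, Literal["auto"]]
Nodata = Union[float, int, None]
FillValue = Union[float, int]


def _is_auto(x: Any) -> bool:
    return isinstance(x, str) and x == "auto"


def resolve_nodata_sketch(nodata: SomeNodata, dtype, fallback: SomeNodata = "auto") -> Nodata:
    """Hypothetical: SomeNodata + dtype + [fallback] -> Nodata."""
    if _is_auto(nodata):
        nodata = fallback
    if _is_auto(nodata):
        # Assumed default: nan for floating point, no nodata for integers
        return math.nan if np.dtype(dtype).kind == "f" else None
    if isinstance(nodata, str):
        # Parse strings such as "nan", "NaN" or "-9999"
        value = float(nodata)
        if np.dtype(dtype).kind in "iu" and value.is_integer():
            return int(value)
        return value
    return nodata  # None (no nodata) or an explicit number passes through


def resolve_fill_value_sketch(nodata: Nodata, dtype) -> FillValue:
    """Hypothetical: Nodata + dtype -> FillValue usable with np.full."""
    if nodata is not None:
        return nodata  # value of nodata if set (NaN is fine here too)
    # Not configured: NaN for floating point, 0 for integers
    return math.nan if np.dtype(dtype).kind == "f" else 0


# Example: an unconfigured float32 tile is filled with nan
nodata = resolve_nodata_sketch("auto", "float32")
tile = np.full((2, 2), resolve_fill_value_sketch(nodata, "float32"), dtype="float32")
```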

- adding new types
  - `MaybeAutoNodata` -- `None|int|float|str|"auto"`
  - `Nodata` is now `None|float|int`
  - `MaybeNodata` is now `None|float|int|str`
  - "auto" replaces what used to be `None`
  - `None` now means "no nodata value"
- `resolve_nodata()` is used to handle nodata
   options consistently across the library
- default nodata for float is `nan`
- fixes in reproject/overview generation for float
  data. It is assumed to have `nan` values, and
  GDAL needs `nodata=nan` to handle it correctly.
@SpacemanPaul (Contributor)

Thanks for the swift action on this Kirill - I will review tomorrow.

- bump versions for netlify
- supply token to codecov

codecov bot commented Jun 18, 2024

Codecov Report

Attention: Patch coverage is 98.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.48%. Comparing base (6af5d0c) to head (e80f39a).
Report is 33 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| odc/geo/_xr_interop.py | 96.55% | 1 Missing ⚠️ |
```diff
@@             Coverage Diff             @@
##           develop     #163      +/-   ##
===========================================
+ Coverage    95.26%   95.48%   +0.21%     
===========================================
  Files           31       31              
  Lines         5323     5489     +166     
===========================================
+ Hits          5071     5241     +170     
+ Misses         252      248       -4     
```


@robbibt (Contributor) commented Jun 18, 2024

Thanks heaps for this @Kirill888 - the test cases appear to match the functionality I was hoping for nicely, but I'll do a "user" test today and verify that TIFFs exported using the updated code work as intended (e.g. in ESRI etc).

@Kirill888 (Member, Author)

@robbibt thanks. BTW, please use `pip install odc-geo==0.4.7rc1` for your test (that release is not on conda, only PyPI).

@robbibt (Contributor) commented Jun 19, 2024

I think this is working perfectly. For example, we have a dataset with an Xarray nodata: nan attribute:

```python
import datacube
import odc.geo.xr
from datacube.utils.cog import write_cog

dc = datacube.Datacube()

query_params = dict(
    x=(142.13223, 142.65461),
    y=(-32.17591, -32.54618),
    time=("2022", "2022"),
)

ds = dc.load(product="ga_ls8cls9c_gm_cyear_3", measurements=["edev"], **query_params)
da = ds.edev.squeeze()
```


This gets written out with a GeoTIFF nodata flag nodata=nan using both datacube and odc-geo tooling:

```python
da.odc.write_cog("nodata_nan_odcgeo.tif")
write_cog(da, "nodata_nan_datacube.tif")
```


However, if we write out data without an Xarray `nodata: nan` attribute, datacube doesn't include a GeoTIFF nodata flag, but now odc-geo does!

```python
del da.attrs["nodata"]

da.odc.write_cog("nodata_missing_odcgeo.tif")
write_cog(da, "nodata_missing_datacube.tif")
```


I can, though, still write a file with no nodata value at all like this:

```python
da.odc.write_cog("nodata_truenone_odcgeo.tif", nodata=None)
```
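The three write cases in this comment can be summarised with a small hypothetical helper; `geotiff_nodata_sketch` is made up for illustration and is not an odc-geo function. It assumes `nodata` defaults to `"auto"`, that `"auto"` first uses the array's `nodata` attribute and then falls back to `nan` for floating point data, and that an explicit `None` means no nodata tag is written.

```python
import math
from typing import Any, Dict, Union

import numpy as np


def geotiff_nodata_sketch(attrs: Dict[str, Any], dtype, nodata: Any = "auto") -> Union[float, int, None]:
    """Hypothetical: pick the nodata value written to the GeoTIFF header.

    A return value of None means no nodata tag is written at all.
    """
    if isinstance(nodata, str) and nodata == "auto":
        # Assumption: "auto" uses the xarray attribute first...
        if "nodata" in attrs:
            return attrs["nodata"]
        # ...then falls back to nan, but only for floating point data
        return math.nan if np.dtype(dtype).kind == "f" else None
    if isinstance(nodata, str):
        return float(nodata)  # e.g. "nan"
    return nodata  # explicit number, or None for "no nodata"


# The three cases above, for floating point data
print(geotiff_nodata_sketch({"nodata": math.nan}, "float32"))  # nan
print(geotiff_nodata_sketch({}, "float32"))                    # nan
print(geotiff_nodata_sketch({}, "float32", nodata=None))       # None
```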


```python
FillValue = Union[float, int]
Nodata = Union[float, int, None]
MaybeNodata = Union[float, int, str, None]
MaybeAutoNodata = Union[float, int, str, None, Literal["auto"]]
T = TypeVar("T")
```
Contributor

`MaybeNodata = Nodata | str` seems unexpected, as "Maybe" usually means "or None". I understand that `str` might be necessary for cases where `nodata in ('nan', 'NAN', 'NaN')`, but I'm not sure why this gets a "Maybe" label.

Contributor

Oh I see, there's a conversion function in math.py

I still think the naming here is confusing, but I don't have a good suggestion for a replacement. Maybe `ApiNodata`? As in the nodata type the user sees in the API (as opposed to the internal nodata value passed to GDAL)?

Member Author

That's more of a compatibility issue: before this change, `Nodata` could not be `None`, but it could be a `str`, so there was also a `MaybeNodata`. I guess we could delete it, assuming nobody was using it.

@SpacemanPaul (Contributor) left a comment

I think some of the naming is confusing, but the approach is sound.


@robbibt (Contributor) commented Jun 19, 2024

I do want to do a quick test of the COG overviews stuff, so will approve this as soon as I've finished that.

@robbibt (Contributor) commented Jun 19, 2024

> I think some of the naming is confusing, but the approach is sound.

I also find "MaybeAutoNodata" a bit confusing - took me a while to work out what it meant.

@Kirill888 (Member, Author)

> I think some of the naming is confusing, but the approach is sound.
>
> I also find "MaybeAutoNodata" a bit confusing - took me a while to work out what it meant.

What about `SomeNodata` instead of `MaybeAutoNodata`, @SpacemanPaul @robbibt?

@SpacemanPaul (Contributor)

> > I think some of the naming is confusing, but the approach is sound.
> >
> > I also find "MaybeAutoNodata" a bit confusing - took me a while to work out what it meant.
>
> What about SomeNodata instead of MaybeAutoNodata, @SpacemanPaul @robbibt

Yes, `SomeNodata` or even `AnyNodata` works for me.

now that `Nodata` is allowed to be `None`,
'MaybeNodata' is now confusing, removing it.

Let's assume that it never was a part of the
`odc.geo` API.
@Kirill888 (Member, Author) commented Jun 19, 2024

Updated:

  • Removed the no-longer-relevant `MaybeNodata` type. If anyone was using that type, they were probably also using the `Nodata` type, and that has changed anyway.

  • Use the name `SomeNodata` instead of `MaybeAutoNodata`

  • Added a `nodata` setter
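The setter's implementation is not shown in this thread. The pure-Python stand-in below (the class `NodataAttrsSketch` is hypothetical, not odc-geo's accessor) illustrates one plausible reading of the new semantics: nodata lives in the array's `attrs["nodata"]`, as in the `del da.attrs["nodata"]` example above, and assigning `None` means "no nodata value", so the attribute is dropped rather than stored. Whether the real `.odc.nodata` setter behaves exactly like this is an assumption.

```python
from typing import Any, Dict, Union

Nodata = Union[float, int, None]


class NodataAttrsSketch:
    """Hypothetical stand-in for an accessor exposing a nodata property."""

    def __init__(self, attrs: Dict[str, Any]):
        self._attrs = attrs  # shared with the caller, like an xarray .attrs dict

    @property
    def nodata(self) -> Nodata:
        # A missing attribute reads back as None, i.e. "no nodata value"
        return self._attrs.get("nodata", None)

    @nodata.setter
    def nodata(self, value: Nodata) -> None:
        if value is None:
            # Assumption: setting None removes the attribute instead of storing None
            self._attrs.pop("nodata", None)
        else:
            self._attrs["nodata"] = value


attrs: Dict[str, Any] = {}
acc = NodataAttrsSketch(attrs)
acc.nodata = -9999  # attrs now holds {"nodata": -9999}
acc.nodata = None   # attrs is empty again
```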

@Kirill888 merged commit 78577dc into develop on Jun 19, 2024
20 checks passed
@robbibt (Contributor) commented Jun 19, 2024

OK, overview generation is working much better for our DEA Intertidal example: even if nodata is not set, we still get a result that follows GDAL's expected functionality:

datacube-core with nodata=nan Xarray attribute (overviews generated correctly ✔️):

datacube-core with no nodata Xarray attribute (overviews generated incorrectly ❌):

odc-geo with nodata=nan Xarray attribute (overviews generated correctly ✔️):

odc-geo with no nodata Xarray attribute (overviews generated correctly ✔️):

@Kirill888 deleted the fix-float-nodata branch June 19, 2024 02:52