
Add Great African Food Company Crop Type Tanzania Dataset #511

Closed
wants to merge 3 commits into from


nilsleh
Collaborator

@nilsleh nilsleh commented Apr 20, 2022

This PR adds the Crop Type Tanzania dataset from Radiant MLHub.

It features time-series data with polygon crop type annotations that serve as segmentation masks. I have implemented it as a GeoDataset to allow for train/val/test splits based on geographical location and am looking to do that with other datasets of this type (also converting CV4A_Kenya_Crop_Type dataset to a GeoDataset). Following the TODO in cv4a_kenya_crop_type.py, this implementation populates the rtree index from the stac.json files. The __getitem__ method returns a tensor with dimensions time x num_bands x height x width. This dataset has rasters as input and vector annotations as labels, so it is something of a hybrid between RasterDataset and VectorDataset.
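To make the index-population step concrete, here is a minimal sketch of turning one STAC item into a spatiotemporal index entry. It is not the code from this PR: the helper name `stac_item_to_index_entry`, the sample coordinates, and the `(minx, maxx, miny, maxy, mint, maxt)` tuple layout are assumptions for illustration, and the only STAC fields relied on are `bbox` and `properties.datetime`.

```python
import json
from datetime import datetime, timezone


def stac_item_to_index_entry(item: dict) -> tuple:
    """Return (minx, maxx, miny, maxy, mint, maxt) for one STAC item.

    Hypothetical helper: the tuple layout is an assumption, not
    necessarily what the dataset class in this PR uses.
    """
    # STAC stores a 2D bbox as [west, south, east, north].
    west, south, east, north = item["bbox"]
    # Older Python versions cannot parse a trailing "Z", so normalise it.
    stamp = item["properties"]["datetime"].replace("Z", "+00:00")
    t = datetime.fromisoformat(stamp).timestamp()
    # A single acquisition has no duration, so mint == maxt.
    return (west, east, south, north, t, t)


# Made-up item; the coordinates and date are placeholders.
sample = json.loads(
    """
    {
      "bbox": [33.0, -3.5, 33.1, -3.4],
      "properties": {"datetime": "2018-04-10T00:00:00Z"}
    }
    """
)
entry = stac_item_to_index_entry(sample)
```

The resulting tuple could then be inserted into a 3D rtree index together with the file path. Note that this sketch does not reproject the bounding box, so the index CRS would be whatever CRS the stac.json bboxes use.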

Dataset Features:

  • 392 annotations covering 6 crop type classes across 44 labeled areas, each of which has a varying number of time-series inputs in the form of Sentinel-2 imagery

Dataset Format:

  • separate Sentinel-2 bands as tif files, plus a cloud probability layer (images in EPSG:32736)
  • stac.json files for each input image tile (bboxes in EPSG:4326)
  • geojson files with polygon annotations and labels (polygon coordinates in EPSG:32736)
  • stac.json for labels
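Since the polygons and the images share a CRS, a segmentation mask can be built by testing each pixel centre against the polygon. The following is a minimal pure-Python sketch of that idea, not the PR's implementation (which may well use a library rasterizer). The function names, tile origin, resolution, and label value are all hypothetical. It makes the row/column convention explicit: rows follow y from north to south, columns follow x from west to east.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test for a point against one polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, label, minx, maxy, res, height, width):
    """Burn `label` into a height x width mask (a list of rows)."""
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        # Row 0 is the northern edge, so y decreases as row increases.
        y = maxy - (row + 0.5) * res
        for col in range(width):
            x = minx + (col + 0.5) * res
            if point_in_polygon(x, y, ring):
                mask[row][col] = label
    return mask


# Hypothetical 4 x 4 tile at 10 m resolution with one small field.
ring = [
    (500010.0, 9599990.0),
    (500030.0, 9599990.0),
    (500030.0, 9599980.0),
    (500010.0, 9599980.0),
]
mask = rasterize(ring, label=3, minx=500000.0, maxy=9600000.0,
                 res=10.0, height=4, width=4)
```

Swapping `row` and `col`, or measuring `y` from the southern edge, would place the polygon in the wrong pixels, which is one way a mask can end up looking sparser or more misplaced than the annotations really are.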

Issues:

  • I am not creating the correct dummy data; something is off with the bounds, and therefore tests are
  • there aren't many annotations, and the ones that exist are all very small (see below for an example), which makes me question whether I am messing up the indexing when creating a segmentation mask from the polygon annotations
  • there is also a design choice regarding the datetime included in the stac.json files: if this datetime is used to populate the index, then RandomGeoSampler will also sample time instances, and a given bounding box query will not return all timesteps for a given geographical XY location, as one might expect.
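The datetime concern in the last point can be illustrated with a toy index. This is a sketch under stated assumptions, not torchgeo or rtree code: the `entries` dictionary, the `intersects` and `query` helpers, and all coordinates and timestamps are made up, and a real implementation would query an rtree index instead of scanning a dictionary.

```python
def intersects(a, b):
    """True if two (minx, maxx, miny, maxy, mint, maxt) boxes overlap."""
    return all(a[i] <= b[i + 1] and b[i] <= a[i + 1] for i in (0, 2, 4))


# Hypothetical index: three acquisitions of the same tile, each entered
# with its own single-instant datetime as the time extent.
entries = {
    "t0": (0.0, 1.0, 0.0, 1.0, 100.0, 100.0),
    "t1": (0.0, 1.0, 0.0, 1.0, 200.0, 200.0),
    "t2": (0.0, 1.0, 0.0, 1.0, 300.0, 300.0),
}


def query(box):
    """Return the keys of all entries that overlap the query box."""
    return sorted(k for k, e in entries.items() if intersects(e, box))


# A query whose time range was sampled along with x and y hits a
# single acquisition only.
narrow = query((0.2, 0.4, 0.2, 0.4, 200.0, 200.0))

# The same spatial window with the full time range returns every
# timestep, which is what a time x bands x height x width sample needs.
full = query((0.2, 0.4, 0.2, 0.4, 100.0, 300.0))
```

Under these assumptions, one option is to keep the per-image datetime in the index but have the sampler or `__getitem__` widen the time range of each query to the full extent, so that all timesteps for a location are stacked together.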

Example:
Screenshot from 2022-04-20 20-25-53

@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels Apr 20, 2022
@adamjstewart adamjstewart added this to the 0.3.0 milestone Apr 20, 2022
@adamjstewart
Collaborator

and am looking to do that with other datasets of this type (also converting CV4A_Kenya_Crop_Type dataset to a GeoDataset)

❤️

@nilsleh nilsleh mentioned this pull request Apr 21, 2022
4 tasks
@adamjstewart
Collaborator

Superseded by #512

@adamjstewart adamjstewart removed this from the 0.3.0 milestone Jul 9, 2022