Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add South Africa Crop Type Competition Dataset #1840

Merged
merged 40 commits into from
Mar 15, 2024

Conversation

GeorgeHuber
Copy link
Contributor

@GeorgeHuber GeorgeHuber commented Feb 1, 2024

This PR adds the South Africa Crop Type Competition Dataset to TorchGeo.

Dataset

Similar to #1459, this dataset does not implement automatic download functions due to issues migrating data from radiantmlhub source cooperative until issue #1830 is resolved.

Example field and label in early July:
Screenshot 2024-02-19 at 4 47 04 PM

  • Implement the dataset extending either GeoDataset or NonGeoDataset
  • Add the dataset definition to torchgeo/datasets/init.py
  • Add a data.py script to tests/data// that generates test data with the same directory structure/file naming conventions as the new dataset
  • Add appropriate tests with 100% test coverage to tests/datasets/
  • Add the dataset to docs/api/datasets.rst
  • Add the dataset metadata to either docs/api/geo_datasets.csv or docs/api/non_geo_datasets.csv

@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Feb 1, 2024
@GeorgeHuber GeorgeHuber marked this pull request as draft February 1, 2024 03:06
@github-actions github-actions bot added the testing Continuous integration testing label Feb 9, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 9, 2024
@adamjstewart adamjstewart added this to the 0.6.0 milestone Feb 11, 2024
Copy link
Contributor

@yichiac yichiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to #1459 for removing the download and checksum functions until #1830 is resolved.

torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
@yichiac yichiac mentioned this pull request Feb 15, 2024
4 tasks
@GeorgeHuber GeorgeHuber marked this pull request as ready for review February 16, 2024 05:36
GeorgeHuber and others added 10 commits February 15, 2024 23:44
* Documentation, satellite-specific transform and weights for additional Satlas single-image rgb&multispectral Swin-v2 models. Tests pass.

* Address 3 of comments

* Address comments, fix readmydocs and isort, mypy still unhappy

* update

* Add bands to meta dicts

* Add comment about Satlas S2 RGB using TCI product

* linting

---------

Co-authored-by: Piper Wolters <[email protected]>
Co-authored-by: Piper Wolters <[email protected]>
tests/data/south_africa_crop_type/data.py Outdated Show resolved Hide resolved
tests/data/south_africa_crop_type/data.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
@GeorgeHuber GeorgeHuber requested a review from adamjstewart March 1, 2024 20:31
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any remaining major issues. Just wondering if __getitem__ can be further simplified. Also need comments in __getitem__ to explain all the non-standard stuff.

torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
@GeorgeHuber
Copy link
Contributor Author

I don't see any remaining major issues. Just wondering if __getitem__ can be further simplified. Also need comments in __getitem__ to explain all the non-standard stuff.

I added some comments to explain the extra stuff, let me know if you think it needs more detail. The multiple images to one label thing is really where most of the complication comes from along with the need to make sure the labels and fields are in the same order since they won't be sorted alphabetically. I thought this would be the most efficient solution as we loop over the hit filepaths once just as we would standardly but I'm open to other ideas.

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth iterating over the real data just to make sure there are no bugs once you move away from the synthetic data. Just as a quick personal test, doesn't need to be in CI.

torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/south_africa_crop_type.py Outdated Show resolved Hide resolved
Returns:
data and labels at that index
"""
assert isinstance(self.paths, str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably possible to relax this constraint by using filepaths to compute the correct directory for the mask. But let's do that another day, I want to get this merged.

@adamjstewart adamjstewart merged commit 1f6e974 into microsoft:main Mar 15, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants