Add dataset table to the documentation #435
This starts to re-raise the question of how we should categorize datasets. We keep flip-flopping on this and it's leading to a lot of inconsistencies. Functionally, I think the biggest distinction is between geospatial datasets (have geospatial metadata) and non-geospatial datasets. In terms of attributes that we might want to list in this table, the biggest distinction is between benchmark datasets (both input images and target labels) and non-benchmark datasets. The division in the docs doesn't necessarily need to match the base class division. I'm leaning towards splitting the docs into benchmark vs. non-benchmark and keeping geospatial vs. non-geospatial as a base class distinction only. I've also been thinking about renaming
Torchvision does this for their models and it always confuses me because I expect the hyperlink to take me to the model class definition, not the citation. I would prefer to have hyperlinks to class definitions and then the class definition contains a hyperlink to the citation. Thoughts?
We'll have to remind people to update this table every time they add a new dataset.
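One way to take the manual reminder out of the loop would be a small check that compares the dataset names against the rows of the table. The following is only a sketch under assumptions: the helper `missing_rows`, the column headers, and the sample row are hypothetical, and it assumes the first column of the CSV holds the plain dataset name, which may not match how the real table is written.

```python
import csv
import io


def missing_rows(csv_text: str, dataset_names: list[str]) -> list[str]:
    """Return the dataset names that have no row in the CSV table.

    Hypothetical helper: assumes the first column holds the dataset name.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    listed = {row[0].strip() for row in reader if row}
    return sorted(name for name in dataset_names if name not in listed)


# Hypothetical table contents; the column names and values are made up.
table = "Dataset,Task\nSpaceNet,segmentation\n"
print(missing_rows(table, ["SpaceNet", "EDDMapS", "GBIF"]))
# prints ['EDDMapS', 'GBIF']
```

A check like this could run alongside the docs build, so a new dataset without a table row fails early instead of relying on reviewers to notice.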
NonGeoDataset sounds great to me. And I'm fine with having a "benchmark dataset" table that doesn't align with how the classes are organized.
Fine with me -- rows in the table should definitely link somewhere.
Yep! That's fine with me. I can also add instructions to the contributing page in this PR.
While we're on the topic, what about datasets like SpaceNet where they're pre-chipped to be like VisionDatasets but do contain geospatial metadata? In that case, it was the organisation of the dataset that informed the decision to make it a Kind of lies in between geo-vs-vision
@ashnair1 my current plan is to someday convert all of those to
In terms of filenames, there isn't a ton of consistency. We list "Geospatial Datasets" in "generic_datasets.csv" and "Non-geospatial Datasets" in "non_geo_datasets.csv". I'm honestly not sure what to call them anymore, and we've gone back and forth for a while. We should figure out how to make these more consistent in our docs/API. This doesn't necessarily need to happen in this PR, just pointing out the inconsistencies here.
* added to data table
* add links
* fix docs
This looks great! Will take a closer look later. Since we first created the docs, torchvision's docs have completely changed. They now have a short page with just the dataset tables and then separate pages for each dataset. I actually kind of like that format, and it may allow us to skip the step of adding the dataset to
Co-authored-by: Adam J. Stewart <[email protected]>
Alright, ready for round 2
* Add benchmark dataset table
* Add geospatial datasets
* Work on Data table (microsoft#478)
* added to data table
* add links
* fix docs
* Added section for implementing new datasets to the Contributing page
* Removing extra file
* Add EDDMapS and GBIF rows to generic
* Formatting
* Renaming to make sense
* Short names
* Fixes
* Checking references
* Trying links
* Figured out links
* Removing hyphens for empty cells as these are rendered as bullet points
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <[email protected]>
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <[email protected]>
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <[email protected]>
* Update docs/api/non_geo_datasets.csv
  Co-authored-by: Adam J. Stewart <[email protected]>
* Update docs/user/contributing.rst
  Co-authored-by: Adam J. Stewart <[email protected]>
* Update docs/api/geo_datasets.csv
* Update geo_datasets.csv
* Update geo_datasets.csv
* Update contributing.rst
* Formatting
* Fix table links

Co-authored-by: Nils Lehmann <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
In the TorchGeo paper we have a table that lists the properties of some of the datasets we've added. This should be reproduced in the docs so that we have an overview of what all is available in the library.
I've just copied (more/less) the tables from the paper and haven't updated them with the datasets that have been implemented since.
Things to look into/questions: