Add PreChippedGeoSampler for pre-chipped geospatial datasets #479
So how does this work? Are you taking all the pre-chipped GeoTIFFs in a directory and building an R-tree using those extents? |
@RitwikGupta Yes, that's how a |
Some benchmarking -- I created a subset of 5000 tiffs from the USAVars dataset (256x256x4 patches in local UTM CRS scattered around the US) and used these with a RasterDataset. Things:
In terms of pure functionality, works exactly as advertised :)
Yeah, this is much slower than I expected. For datasets that come with a STAC JSON file we should use this whenever possible.
Yep, wouldn't expect any difference here. I think a more interesting benchmark would be to convert a VisionDataset to a RasterDataset and compare before and after.
Is this just because of warping?
Let me clarify this in the docs. This will hopefully no longer be an issue with #409. |
Perhaps we can generate/cache this on first run? This is also a problem with the SECO dataset IIRC.
This is essentially what I'm doing with the CustomDataset. The |
…ft#479) * Add PreChippedGeoSampler for pre-chipped geospatial datasets * Add shuffle parameter * Add tests, fix type hints * Warn about multi-CRS datasets
Rationale
Many existing VisionDatasets actually contain geospatial metadata. These datasets should be converted to GeoDatasets (#83). However, GeoDatasets are a bit more complicated than VisionDatasets and require a GeoSampler to use. This PR adds a PreChippedGeoSampler to make this transition easier.
Implementation
For VisionDatasets, sampling is quite simple:
However, it was much trickier to get the same behavior for GeoDatasets. Previously, a user would need to do something like:
Crucially, this requires the user to know the size of each image, to explicitly specify the number of images in the train dataset, and to be clever with stride. With this PR, users can instead use:
This is almost as simple as VisionDataset sampling and probably about as good as we're going to get.
This may be of interest to @recursix @RitwikGupta @ashnair1. I think #409 is the only remaining bottleneck preventing us from converting more VisionDatasets to GeoDatasets.
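To make the sampling behavior discussed in this PR concrete, here is a minimal, library-free sketch of the idea: every pre-chipped file contributes its whole extent as one query, optionally in shuffled order, so the user never has to supply a chip size, a length, or a stride. This is an illustration under stated assumptions, not the code in this PR: `BoundingBox`, `ToyPreChippedSampler`, the `seed` argument, and the plain Python list used in place of an R-tree are all hypothetical stand-ins.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical stand-in for a chip's spatial extent (not torchgeo's class)."""

    minx: float
    maxx: float
    miny: float
    maxy: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Boxes that merely touch along an edge count as intersecting here.
        return (
            self.minx <= other.maxx
            and self.maxx >= other.minx
            and self.miny <= other.maxy
            and self.maxy >= other.miny
        )


class ToyPreChippedSampler:
    """Yield the full extent of every indexed chip exactly once per pass."""

    def __init__(
        self,
        extents: List[BoundingBox],
        roi: Optional[BoundingBox] = None,
        shuffle: bool = False,
        seed: Optional[int] = None,
    ) -> None:
        # A real spatial index would answer this query; a list scan is
        # enough to show the behavior.
        self.hits = [e for e in extents if roi is None or e.intersects(roi)]
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[BoundingBox]:
        order = list(range(len(self.hits)))
        if self.shuffle:
            self.rng.shuffle(order)
        for i in order:
            yield self.hits[i]

    def __len__(self) -> int:
        # The length falls out of the index; the user never specifies it.
        return len(self.hits)


# Four 256 x 256 chips laid out side by side (made-up coordinates).
chips = [BoundingBox(256.0 * i, 256.0 * (i + 1), 0.0, 256.0) for i in range(4)]

sampler = ToyPreChippedSampler(chips, shuffle=True, seed=0)
queries = list(sampler)

# Restricting to a region of interest keeps only the overlapping chips.
roi_sampler = ToyPreChippedSampler(chips, roi=BoundingBox(0.0, 100.0, 0.0, 100.0))
roi_queries = list(roi_sampler)
```

In this sketch each query is the chip's own extent, so a dataset that is indexed by bounding box returns one whole file per sample, which mirrors the index-based iteration of a VisionDataset.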