-
Sorry for taking so long to respond to this! Grad school/internships have kept me busy. I think it's important to think about this not just from the perspective of curated benchmark datasets like the CV4A Crop Type Dataset but also from the perspective of uncurated collections of geospatial data (e.g., decades of Landsat and CDL data). The latter is obviously the harder challenge, so if we can solve the latter, the former should be solved for free. Specific comments on your 4 proposals with the above in mind:
Let's consider an example use case. A user has a decade's worth of Landsat imagery downloaded and CDL data for each year in the same time range. There are multiple possible ways in which they may want to use this data:
As you can see, these use cases are way too complex to handle with a single sampler. Some are more common than others, so we can focus on the common use cases and try to make things useful while still generic enough to handle many use cases. For example, I can imagine something like:
We can keep adding to this list as we think of things that people might want to do. The point is, I think the only way to handle these complicated use cases is to consider them from the perspective of uncurated datasets. We should store all spatiotemporal info in the index and then let the sampler handle the complexity of deciding when and where to sample from. This allows us to do all of the above ideas with the same dataset implementation just by swapping out different samplers. Let me know if this makes sense!
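To make the "same index, different samplers" idea concrete, here is a minimal pure-Python sketch. It is not TorchGeo code: `TemporalIndex`, `full_range_sampler`, `fixed_length_sampler`, and the file names are all hypothetical, the spatial dimensions are left out for brevity, and time is simplified to fractional years.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Window = Tuple[float, float]  # (mint, maxt), treated as half-open [mint, maxt)


@dataclass(frozen=True)
class Entry:
    """One file in the index together with the time span it covers."""
    mint: float
    maxt: float
    path: str


class TemporalIndex:
    """Hypothetical stand-in for a dataset index (temporal part only)."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def insert(self, mint: float, maxt: float, path: str) -> None:
        self.entries.append(Entry(mint, maxt, path))

    def query(self, window: Window) -> List[str]:
        """Return every file whose time span overlaps the window."""
        mint, maxt = window
        return [e.path for e in self.entries if e.mint < maxt and e.maxt > mint]

    @property
    def bounds(self) -> Window:
        return (min(e.mint for e in self.entries),
                max(e.maxt for e in self.entries))


def full_range_sampler(index: TemporalIndex) -> Iterator[Window]:
    """Yield one window covering the whole time range (all time steps)."""
    yield index.bounds


def fixed_length_sampler(index: TemporalIndex, length: float) -> Iterator[Window]:
    """Yield consecutive windows of a fixed length, e.g. one per year."""
    start, stop = index.bounds
    t = start
    while t < stop:
        yield (t, min(t + length, stop))
        t += length


# A decade of yearly files (hypothetical names), indexed once.
index = TemporalIndex()
for year in range(2010, 2020):
    index.insert(float(year), float(year + 1), f"landsat_{year}.tif")

# The dataset lookup stays the same; only the sampler changes.
all_steps = [index.query(w) for w in full_range_sampler(index)]
per_year = [index.query(w) for w in fixed_length_sampler(index, 1.0)]
```

Here `all_steps` holds a single sample with all ten files, while `per_year` holds ten samples with one file each, without touching the index or the lookup logic.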
-
I would be happy to hear if there are any updates on this topic! I have been working on multi-temporal change detection for a while, and I have found that there are still few datasets/models, which makes it a pain to research...
-
Hi, I am interested in using SITS datasets as well and am happy to share my input here. I have an initial version of this on my end that follows approach (1.), and it seems to work well for what I need. Let me sketch the outline here:
@adamjstewart regarding the way the sampler gets parts of the temporal range of the dataset: wouldn't your CyclicGeoSampler and ForecastingGeoSampler just be combining ROIs along the temporal dimension? Just as we have roi_split for spatially discontinuous data, we could use it for temporally discontinuous data too? Should I maybe just open a PR so you can have a look and we get started somewhere? I feel it is better to have at least partial support than to try to cover all aspects right from the start.
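As a rough illustration of what "combining ROIs along the temporal dimension" could look like, here is a small sketch. The helpers `split_time_range` and `cyclic_intervals` are hypothetical and are not part of TorchGeo or of `roi_split`; they only show that a forecasting-style split and a cyclic selection both reduce to lists of time intervals.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (mint, maxt), treated as half-open


def split_time_range(mint: float, maxt: float, cuts: List[float]) -> List[Interval]:
    """Split [mint, maxt) at the given cut points (cuts outside the range are ignored)."""
    edges = [mint] + sorted(c for c in cuts if mint < c < maxt) + [maxt]
    return list(zip(edges[:-1], edges[1:]))


def cyclic_intervals(mint: float, maxt: float, period: float,
                     offset: float, length: float) -> List[Interval]:
    """Pick the same sub-interval from every period, e.g. each growing season."""
    intervals = []
    start = mint + offset
    while start < maxt:
        intervals.append((start, min(start + length, maxt)))
        start += period
    return intervals


# Forecasting-style split: first 8 years as input, last 2 as target.
past, future = split_time_range(0.0, 10.0, [8.0])

# Cyclic selection: the same half-year window out of each of 3 years.
seasons = cyclic_intervals(0.0, 3.0, period=1.0, offset=0.25, length=0.5)
```

Each resulting interval could then serve as the temporal part of a region of interest, in the same way spatially disjoint regions are handled.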
-
As suggested, the following comment from #512 has been moved to its own issue:
After spending some time on the CropType datasets in #512, I have a more general question about these types of time-series raster datasets. Since, to my knowledge, there is not yet a `GeoDataset` that includes time-series rasters as input and a corresponding mask, I thought I would raise it here.
I am assuming that the desired behavior for such a time-series raster dataset is a `__getitem__` method that returns all time-series steps for a given geographical location. This is inspired by the `CV4A_Crop_Type_Dataset`, which returns all time-series steps for each label, but is a `VisionDataset` and therefore does not deal with bounding boxes. In the case of the datasets added in this PR, the relationship between label and input is one-to-many. However, it was already pointed out that different geospatial datasets might require different behavior.
The following outlines the different approaches and observations I have made:
`mint: float = 0`, `maxt: float = sys.maxsize`, populate the index that way, and then the sampler would return all time-series steps for each label, since the time dimension would be the same for everything. The downside is that if users want some control over the time dimension that is returned, they would have to handle it themselves after the sample or batch is already returned.
Another observation is that not all labels span the same time horizon. So while some labels have, let's say, 40 corresponding images, others might have 70. Hence, consider the case where a bounding box from the sampler covers a region that intersects two or more such labels. What is the proper way of merging the varying time dimensions of rasters to yield one sample, in addition to merging the individual bands of each sample like `RasterDataset` does?
Maybe I am also thinking about this wrong or missing something. Either way, I would welcome suggestions/comments.
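One possible answer to the varying-length question, offered only as a sketch and not as what `RasterDataset` does, is to pad every time series to the longest one and return a validity mask alongside it. The helper `pad_time_series` below is hypothetical, and each time step is reduced to a single float for brevity.

```python
from typing import List, Sequence, Tuple


def pad_time_series(
    series: Sequence[Sequence[float]], fill: float = 0.0
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Pad each series to the longest length and mark which steps are real."""
    max_len = max(len(s) for s in series)
    padded = [list(s) + [fill] * (max_len - len(s)) for s in series]
    mask = [[True] * len(s) + [False] * (max_len - len(s)) for s in series]
    return padded, mask


# Two labels with 40 and 70 time steps, as in the example above.
short = [1.0] * 40
long_ = [2.0] * 70
padded, mask = pad_time_series([short, long_])
```

After padding, both series have 70 steps, and the mask tells a model or loss function which of the first label's steps are real (40) and which are fill values (30). Whether padding, cropping to the shortest series, or resampling is the better choice depends on the downstream task.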