As suggested, the following comment from #512 has been moved to its own issue:
After spending some time on the CropType datasets in #512, I have a more general question about these types of time-series raster datasets. Since, to my knowledge, there is not yet a GeoDataset that takes time-series rasters as input together with a corresponding mask, I thought I would raise it here.
In what follows I assume that the desired behavior for such a time-series raster dataset is a __getitem__ method that returns all time-series steps for a given geographical location. This is inspired by the CV4A_Crop_Type_Dataset, which returns all time-series steps for each label but is a VisionDataset and therefore does not deal with bounding boxes. For the datasets added in #512, the relationship between label and input is one-to-many. However, it was already pointed out that different geospatial datasets might require different behavior.
The following outlines the different approaches and observations I have made:
1. Using the metadata of each individual time-series image, the index can be populated so that all spatio-temporal information is available to the sampler. However, when the sampler is used in the default way and given the dataset's bounds, it samples not only XY coordinates but also the time dimension, so returned samples will not include all time-series steps for a specific region. This approach can also be slow: there can be many thousands of input images to go through when populating the index, so instantiating the dataset takes a long time.
2. To address the slow instantiation in 1, the index could instead be populated with the spatio-temporal information of each single label, although it might be trickier to gather all the time information because it is not necessarily included in the label. This has the same "issue" as above: the sampler also samples the time dimension and does not return all time-series steps for each label.
3. Ignore the time dimension altogether and set it as RasterDataset does, with mint: float = 0 and maxt: float = sys.maxsize. With the index populated that way, the sampler would return all time-series steps for each label, since the time extent would be the same for every entry. The downside is that users who want control over which time steps are returned would have to filter them themselves after the sample or batch has been returned.
4. As an add-on to 3, add start_date and end_date parameters to the constructor and, when a sample is gathered, keep only the files that fall within this time range, without using the date information in the index.
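To make the last two approaches more concrete, here is a minimal sketch in plain Python. It does not use the actual GeoDataset index or sampler; LabelEntry, intersects and query are hypothetical names made up for illustration, and the bounding box layout (minx, maxx, miny, maxy) is an assumption. The idea is that the index only knows about spatial bounds, each hit carries all of its time steps, and an optional start_date/end_date filter is applied when the sample is gathered.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

# Assumed layout for this sketch: (minx, maxx, miny, maxy)
BBox = Tuple[float, float, float, float]


@dataclass
class LabelEntry:
    """Hypothetical index entry: one label footprint and its time-series files."""

    bounds: BBox
    steps: List[Tuple[date, str]] = field(default_factory=list)  # (acquisition date, path)


def intersects(a: BBox, b: BBox) -> bool:
    """Purely spatial intersection test; time is deliberately ignored."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def query(
    entries: List[LabelEntry],
    bbox: BBox,
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
) -> List[str]:
    """Return the paths of all time steps whose label footprint intersects bbox.

    The date range is applied only here, at sample time, not in the index.
    """
    hits: List[str] = []
    for entry in entries:
        if not intersects(entry.bounds, bbox):
            continue
        for step_date, path in sorted(entry.steps):
            if start_date is not None and step_date < start_date:
                continue
            if end_date is not None and step_date > end_date:
                continue
            hits.append(path)
    return hits
```

Without start_date and end_date this behaves like approach 3 (every time step for the location is returned); with them it behaves like approach 4.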
Another observation is that not all labels span the same time horizon: while some labels have, say, 40 corresponding images, others might have 70. Consider the case where a bounding box from the sampler covers a region that intersects two or more such labels. What is the proper way of merging the differing time dimensions of the rasters into one sample, in addition to merging the individual bands of each sample as RasterDataset does?
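I do not know what the right answer is, but one possible option would be to align the time series on the union of their acquisition dates and mark the gaps explicitly, so that they can be masked or padded later. A rough sketch, where align_series is a hypothetical helper and each series is assumed to be a mapping from date to file path:

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def align_series(
    series: List[Dict[date, str]],
) -> Tuple[List[date], List[List[Optional[str]]]]:
    """Align several time series on the sorted union of their dates.

    Returns the common timeline and, for each input series, a list with one
    slot per timeline date (None where that series has no image).
    """
    timeline = sorted(set().union(*(s.keys() for s in series)))
    aligned = [[s.get(d) for d in timeline] for s in series]
    return timeline, aligned
```

The None slots would then have to be turned into padding (for example nodata rasters) before stacking. Whether that, cropping to the common dates, or something else is the desired behavior is exactly the open question.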
Maybe I am also thinking about this wrong or missing something. Either way, I would welcome suggestions/comments.