This issue was moved to a discussion.

Time-Series GeoDatasets #518

Closed
nilsleh opened this issue Apr 30, 2022 · 0 comments
Labels
datasets Geospatial or benchmark datasets

Comments

@nilsleh
Collaborator

nilsleh commented Apr 30, 2022

As suggested, the following comment from #512 has been moved to its own issue:

After spending some time on the CropType Datasets in #512, I have a more general question about these types of time-series raster datasets. Since, to my knowledge, there is not yet a GeoDataset that takes time-series rasters as input together with a corresponding mask, I thought I would raise it here.

In what follows, I assume that the desired behavior for such a time-series raster dataset is a __getitem__ method that returns all time-series steps for a given geographical location. This is inspired by the CV4A_Crop_Type_Dataset, which returns all time-series steps for each label but is a VisionDataset and therefore does not deal with bounding boxes. For the datasets added in #512, the relationship between label and input is one-to-many. However, it has already been pointed out that different geospatial datasets might require different behavior.

The following outlines the different approaches and observations I have made:

  1. Using the metadata of each individual time-series image, one can populate the index so that all spatiotemporal information is available to the sampler. However, when the sampler is used in the default way and is passed the dataset's bounds, it samples not only XY coordinates but also the time dimension, so returned samples will not include all time-series steps for a specific region. Additionally, this approach can be slow: there can be many thousands of input images to go through to populate the index, so the dataset takes a long time to instantiate.
  2. To address the slow instantiation noted above, the index could instead be populated with the spatiotemporal information from the single label, although gathering all the time information might be trickier because it is not necessarily included in the label. However, this has the same "issue" as above: the sampler will also sample the time dimension and will not return all time-series steps for each label.
  3. Another approach could be to ignore the time dimension altogether and set it as RasterDataset does, with mint: float = 0 and maxt: float = sys.maxsize. With the index populated that way, the sampler would return all time-series steps for each label, since the time extent would be the same for every entry. The downside is that a user who wants control over which time steps are returned would have to filter them on their own after the sample or batch has been returned.
  4. Another approach, which could be an add-on to 3, would be to add start_date and end_date parameters to the constructor and, when a sample is gathered, filter the files so that they fall within this time range, without using the date information in the index.
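
To make the difference between approaches 1, 3, and 4 concrete, here is a minimal pure-Python sketch. It does not use the torchgeo API: the list-based index and the names FILES, TILE, build_index, query, and filter_dates are hypothetical stand-ins, and the six monthly files are made up for illustration.

```python
import sys
from datetime import datetime, timezone


def ts(year, month, day):
    """UTC timestamp in seconds, used as the time coordinate."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical files: six monthly acquisitions covering the same tile.
FILES = [(f"tile_{m:02d}.tif", ts(2021, m, 1)) for m in range(1, 7)]
TILE = (0.0, 10.0, 0.0, 10.0)  # minx, maxx, miny, maxy


def build_index(files, use_time):
    """Toy index: a list of (bounds, path, timestamp) entries.

    use_time=True stores each file's own timestamp (approach 1);
    use_time=False stores the sentinel range 0..sys.maxsize (approach 3).
    """
    index = []
    for path, t in files:
        mint, maxt = (t, t) if use_time else (0.0, float(sys.maxsize))
        index.append(((*TILE, mint, maxt), path, t))
    return index


def query(index, box):
    """Return (path, timestamp) for entries intersecting box in x, y, and t."""
    minx, maxx, miny, maxy, mint, maxt = box
    hits = []
    for (ix0, ix1, iy0, iy1, it0, it1), path, t in index:
        overlaps_xy = ix0 <= maxx and ix1 >= minx and iy0 <= maxy and iy1 >= miny
        overlaps_t = it0 <= maxt and it1 >= mint
        if overlaps_xy and overlaps_t:
            hits.append((path, t))
    return hits


def filter_dates(hits, start_date, end_date):
    """Approach 4: filter by file timestamp at sample time, not via the index."""
    return [(p, t) for p, t in hits if start_date <= t <= end_date]


# A box as a sampler might draw it: a small XY patch plus a time window.
box = (2.0, 4.0, 2.0, 4.0, ts(2021, 2, 1), ts(2021, 3, 15))

timed_hits = query(build_index(FILES, use_time=True), box)   # subset of steps
all_hits = query(build_index(FILES, use_time=False), box)    # every step
windowed = filter_dates(all_hits, ts(2021, 3, 1), ts(2021, 5, 31))
```

With timestamps in the index, the sampled time window cuts the series down to the two files that fall inside it; with the sentinel range, the same box returns all six files, and the date filter then narrows them to the requested months.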

Another observation is that not all labels span the same time horizon. While some labels have, let's say, 40 corresponding images, others might have 70. Now consider the case where a bounding box from the sampler covers a region that intersects two or more such labels. What is the proper way of merging the varying time dimensions of the rasters to yield one sample, in addition to merging the individual bands of each sample as RasterDataset does?
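
One possible option, offered only as a sketch and not as an answer to the question above, is to align every label's series onto the sorted union of acquisition times and keep a validity mask for the steps that had to be filled. The function align_series, the fill value, and the toy series below are all hypothetical; real rasters would carry arrays per time step instead of single floats.

```python
def align_series(series_by_label, fill=0.0):
    """Align per-label time series onto the sorted union of their time steps.

    series_by_label maps a label name to a {time_step: value} dict.
    Missing steps are filled with `fill` and flagged False in the mask.
    """
    timeline = sorted({t for s in series_by_label.values() for t in s})
    values = {}
    mask = {}
    for label, series in series_by_label.items():
        values[label] = [series.get(t, fill) for t in timeline]
        mask[label] = [t in series for t in timeline]
    return timeline, values, mask


# Two labels with different numbers of time steps (toy data).
series = {
    "field_a": {1: 0.2, 2: 0.4, 3: 0.6},
    "field_b": {2: 0.5, 4: 0.9},
}
timeline, values, mask = align_series(series)
```

The mask lets a model or loss function ignore the filled steps. Whether padding, cropping to the common time range, or resampling is the better choice depends on the dataset and remains open.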

Maybe I am also thinking about this wrong or missing something. Either way, I would welcome suggestions/comments.

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label May 3, 2022
@microsoft microsoft locked and limited conversation to collaborators Jul 1, 2022
@adamjstewart adamjstewart converted this issue into discussion #640 Jul 1, 2022
