Proposal: better handling of partial timestamps #134

Merged: 8 commits, Sep 16, 2021

adamjstewart (Collaborator)

In #115, @calebrob6 noted that the CDL dataset rarely intersects with any other dataset because of the way that indexing works for partial timestamps. This PR is an attempt to fix this issue.

Proposal

The proposed behavior is as follows. For files with partial timestamps, the [mint, maxt] range should span every time that the partial timestamp could refer to. For example:

  • year only (e.g. CDL): first and last second of the year
  • year, month only: first and last second of the month
  • year, week only: first and last second of the week
  • year, month, day only: first and last second of the day
  • year, month, day, hour only: first and last second of the hour
  • year, month, day, hour, minute only: first and last second of the minute

This should cover all possible timestamp formats.

Solution

Add a disambiguate_timestamp helper function that returns a mint and maxt given a timestamp and a strptime format string. I'll have to look at datetime more closely, but I don't see an easy way to do this without checking for specific format codes.
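To make the idea concrete, here is a minimal sketch of such a helper. It is not the code merged in this PR: the signature, the choice to return naive datetime objects, and the set of format codes checked are all assumptions made for illustration, and locale-dependent codes such as %c or %x are not handled.

```python
from datetime import datetime, timedelta
from typing import Tuple


def disambiguate_timestamp(date_str: str, fmt: str) -> Tuple[datetime, datetime]:
    """Return the first and last second that a partial timestamp could denote.

    Illustrative sketch only; the signature and return type are assumptions.
    """
    codes = fmt.replace("%%", "")  # ignore literal percent signs

    def has(*names: str) -> bool:
        return any("%" + name in codes for name in names)

    one_second = timedelta(seconds=1)

    # Year + week only: strptime ignores %W unless a weekday is also given,
    # so pin the parse to Monday (%w == 1) to get the start of the week.
    if has("W") and not has("d", "j", "w", "a", "A", "u"):
        mint = datetime.strptime(date_str + " 1", fmt + " %w")
        return mint, mint + timedelta(weeks=1) - one_second

    mint = datetime.strptime(date_str, fmt)

    if has("S", "f"):
        return mint, mint  # already resolved to the second
    if has("M"):
        span = timedelta(minutes=1)
    elif has("H", "I"):
        span = timedelta(hours=1)
    elif has("d", "j", "w", "a", "A", "u"):
        span = timedelta(days=1)
    elif has("m", "b", "B"):
        # Months vary in length, so find the first day of the next month.
        # December has to roll over into January of the following year.
        if mint.month == 12:
            next_start = mint.replace(year=mint.year + 1, month=1)
        else:
            next_start = mint.replace(month=mint.month + 1)
        span = next_start - mint
    else:
        # Year only: span up to the start of the next year.
        span = mint.replace(year=mint.year + 1) - mint

    return mint, mint + span - one_second
```

With this sketch, disambiguate_timestamp("2021", "%Y") spans 2021-01-01 00:00:00 through 2021-12-31 23:59:59, and a full timestamp such as "%Y%m%d%H%M%S" yields mint == maxt.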

Discussion

There are many temporal resolutions we could care about. Right now, indices in the R-tree are POSIX timestamps (seconds since 1970). We could use a different index, although datetime.timestamp() seems to be the only easy way to get a single float from a datetime, and R-trees can only take ints/floats.

Regardless of the number we store, we should decide what level of accuracy we care about. At one extreme, we could pretend that time of day isn't a thing and only look at the date. For most (all?) of the datasets we have so far, the granularity provided by the timestamp is date, not datetime. However, I could envision some datasets (from drones or planes) that take multiple samples within the same day. At the other extreme, we could go down to millisecond resolution if we think there might be datasets that sample at such a high rate.
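As a rough illustration of why the range matters for the index, the sketch below converts datetimes to POSIX floats and checks whether two time intervals overlap. The helper names to_index and overlaps are hypothetical, the dates are made up, and treating every naive datetime as UTC is an assumption made here so that the numbers do not depend on the machine's local time zone.

```python
from datetime import datetime, timezone
from typing import Tuple

Interval = Tuple[float, float]  # (mint, maxt) as POSIX timestamps


def to_index(dt: datetime) -> float:
    """Convert a naive datetime to a POSIX timestamp, assuming it is UTC."""
    return dt.replace(tzinfo=timezone.utc).timestamp()


def overlaps(a: Interval, b: Interval) -> bool:
    """Closed-interval overlap test, as a bounding-box query would apply in time."""
    return a[0] <= b[1] and b[0] <= a[1]


# A year-only file indexed at a single instant (start of the year)
year_as_instant = (to_index(datetime(2021, 1, 1)), to_index(datetime(2021, 1, 1)))

# The same file indexed with the proposed first/last second of the year
year_as_range = (
    to_index(datetime(2021, 1, 1)),
    to_index(datetime(2021, 12, 31, 23, 59, 59)),
)

# A hypothetical image whose filename only gives the date 2021-09-13
scene = (
    to_index(datetime(2021, 9, 13)),
    to_index(datetime(2021, 9, 13, 23, 59, 59)),
)

print(overlaps(year_as_instant, scene))  # False
print(overlaps(year_as_range, scene))    # True
```

Indexing the year-only file as a single instant misses the September scene, while the full-year range intersects it.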

Possible remaining gotchas:

  • time zone: if times are not all in UTC, images taken in different time zones could be indexed inconsistently
  • daylight saving time: a dataset could have multiple samples with the same local time when clocks are set back; the solution is to always store times in UTC if possible
  • UTC vs. local time: if a filename contains a time, we don't necessarily know whether that is local time or UTC
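The size of the time zone problem is easy to see with the standard library alone. In the sketch below, the UTC-7 offset and the example time are hypothetical; the point is only that the same wall-clock string maps to different POSIX timestamps depending on which zone it is assumed to be in, and that converting to UTC before indexing removes the ambiguity.

```python
from datetime import datetime, timedelta, timezone

# The same wall-clock time, as it might be parsed from a filename
naive = datetime(2021, 9, 13, 12, 0, 0)

# Interpretation 1: the filename time is already UTC
as_utc = naive.replace(tzinfo=timezone.utc)

# Interpretation 2: the filename time is local time at a fixed UTC-7 offset
# (hypothetical offset, chosen only for illustration)
local_zone = timezone(timedelta(hours=-7))
as_local = naive.replace(tzinfo=local_zone)

# The two interpretations differ by the full offset, in seconds
difference = as_local.timestamp() - as_utc.timestamp()
print(difference)  # 25200.0

# Normalizing to UTC before indexing gives one unambiguous value
normalized = as_local.astimezone(timezone.utc)
print(normalized.isoformat())  # 2021-09-13T19:00:00+00:00
```

Note that calling timestamp() on a naive datetime interprets it in the machine's local time zone, so attaching an explicit tzinfo first keeps the index reproducible across machines.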

adamjstewart added the datasets (Geospatial or benchmark datasets) label Sep 13, 2021
adamjstewart marked this pull request as ready for review September 13, 2021 22:45
calebrob6 closed this Sep 15, 2021
calebrob6 reopened this Sep 15, 2021
calebrob6 (Member) left a comment

Perfect, thanks!

adamjstewart merged commit 8c49b2a into main Sep 16, 2021
adamjstewart deleted the fixes/time branch September 16, 2021 16:07
isaaccorley referenced this pull request in isaaccorley/torchgeo Sep 18, 2021
* Proposal: better handling of partial timestamps

* Parse format string directly

* Add unit tests

* Windows is broken

* Windows is still broken

* Fix mypy

* Simplify logic

* Fix bug for month 12, add details to docstring
adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023