-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: better handling of partial timestamps #134
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
adamjstewart
force-pushed
the
fixes/time
branch
from
September 14, 2021 01:49
9279eb5
to
cfe9974
Compare
calebrob6
requested changes
Sep 15, 2021
adamjstewart
commented
Sep 16, 2021
calebrob6
approved these changes
Sep 16, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks!
isaaccorley
referenced
this pull request
in isaaccorley/torchgeo
Sep 18, 2021
* Proposal: better handling of partial timestamps * Parse format string directly * Add unit tests * Windows is broken * Windows is still broken * Fix mypy * Simplify logic * Fix bug for month 12, add details to docstring
yichiac
pushed a commit
to yichiac/torchgeo
that referenced
this pull request
Apr 29, 2023
* Proposal: better handling of partial timestamps * Parse format string directly * Add unit tests * Windows is broken * Windows is still broken * Fix mypy * Simplify logic * Fix bug for month 12, add details to docstring
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In #115, @calebrob6 noted that there is rarely intersection between the CDL dataset and any other dataset due to the way that indexing works for partial timestamps. This PR is an attempt to fix this issue.
Proposal
The proposed behavior is as follows. For files with partial timestamps, the [mint, maxt] range should span all possible times within that range. For example:
This should cover all possible timestamp formats.
Solution
Add some
disambiguate_timestamp
helper function that returns amint
andmaxt
based on a timestamp and astrptime
format string. I'll have to look atdatetime
more closely, but I don't see an easy way to do this without checking for specific format codes.Discussion
There are many temporal resolutions we could care about. Right now, indices in the R-tree are seconds since 1970 (POSIX timestamp). We could use a different index, although
datetime.timestamp()
seems to be the only easy way to get a single float from adatetime
, and R-trees can only take ints/floats. Regardless of the number we store, we should think about the level of accuracy we care about. For example, we could pretend that time isn't a thing and only look at date. For most (all?) of the datasets we have so far, the level of granularity provided by the timestamp is date, not datetime. However, I could envision some datasets (from drones or planes) that take multiple samples within the same day. On the other hand, we could go down to the resolution of milliseconds if we think there might be datasets that sample at such a high rate.Possible remaining gotchas: