Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More General Utilities With Representation Of Time In idata Objects #35

Conversation

AFg6K7h4fhy2
Copy link
Collaborator

@AFg6K7h4fhy2 AFg6K7h4fhy2 commented Oct 28, 2024

Generalize utilities to add dates and times to az.InferenceData objects. A demonstration of these utilities can be found in #40 .

@AFg6K7h4fhy2 AFg6K7h4fhy2 self-assigned this Oct 28, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 linked an issue Oct 28, 2024 that may be closed by this pull request
@AFg6K7h4fhy2 AFg6K7h4fhy2 added Medium Priority A task that is of medium relative priority. enhancement Enhancement to existing feature or aspect of the project. labels Oct 28, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 added this to the [October 28, November 8] milestone Oct 28, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 added High Priority A task that is of higher relative priority. and removed Medium Priority A task that is of medium relative priority. labels Oct 28, 2024
@AFg6K7h4fhy2
Copy link
Collaborator Author

#28 can be covered in this PR given the PR's scope.

@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Oct 29, 2024

@damonbayer Can you take a quick look at the changes?

EDIT: Adding rendered file now under notebooks. → See the pdf file.

@AFg6K7h4fhy2 AFg6K7h4fhy2 marked this pull request as ready for review October 29, 2024 22:01
@AFg6K7h4fhy2
Copy link
Collaborator Author

@dylanhmorris Current situation is that I would like you first to look at this file, then decide which option is closest to what you believe is best, then suggest changes to that option (code-changes or broader changes). I've selected Option 3 (pass date, group, and dim_name as a tuple). Finally, this file begins to set up the tests for Option 3 (but this could be applied to the other options as well). I would appreciate some suggestions on this file. A rendered version of the test file as a pdf with additional output can be found here.

@AFg6K7h4fhy2 AFg6K7h4fhy2 requested review from dylanhmorris and damonbayer and removed request for damonbayer October 30, 2024 14:32
@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Nov 21, 2024

In communications between DHM and the author, a decision has been made to cease using pl.date_range and pl.datetime_range and instead use np.arange with utilities for datetime64[ns] and datetime64[D] objects. As such, some code and tests need to be updated.

@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Nov 21, 2024

The author's remark:

Even if one uses the approach involving np.arange w/ timedelta64 "D", assign_coords converts to [ns] precision, keeping the "times" even if one is operating in "dates". Bite the bullet here an just always accept "times" for coords? I want to canvas assign_coords once more.

from datetime import date, datetime, timedelta

import arviz as az
import numpy as np
import polars as pl
import xarray as xr


coords = {"obs_dim_0": [0, 1, 2, 3, 5]}
data = xr.Dataset(
    {"obs": ("obs_dim_0", np.array([10, 20, 30, 40, 50]))}, coords=coords
)
idata = az.InferenceData(observed_data=data)

# test parameters
group = "observed_data"
variable = "obs"
dimension = "obs_dim_0"
start_date = date(2022, 8, 1)

expected_datetimes = np.array([
    datetime(2022, 8, 1),
    datetime(2022, 8, 2),
    datetime(2022, 8, 3),
    datetime(2022, 8, 4),
    datetime(2022, 8, 5),
]).astype("datetime64[D]")

idata_group = getattr(idata, group)
variable_data = idata_group[variable]
interval_size = variable_data.sizes[dimension]

out = np.arange(
    start_date, interval_size, np.timedelta64(1, "D")
)

print(out)

idata_group = idata_group.assign_coords({dimension: out})

setattr(idata, group, idata_group)

print(idata.observed_data.coords[dimension].values)

Out:

['2022-08-01' '2022-08-02' '2022-08-03' '2022-08-04' '2022-08-05']
['2022-08-01T00:00:00.000000000' '2022-08-02T00:00:00.000000000'
 '2022-08-03T00:00:00.000000000' '2022-08-04T00:00:00.000000000'
 '2022-08-05T00:00:00.000000000']
<ipython-input-2-6ff7ebed4ad4>:35: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  idata_group = idata_group.assign_coords({dimension: out})

Warning:

UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time

@dylanhmorris
Copy link
Collaborator

dylanhmorris commented Nov 21, 2024

Yes, if xarray currently only supports [ns] precision, we should do the same in this function (and relax it if and when xarray changes)

@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Nov 21, 2024

Re: #35 (comment)

With this, the author will still operate with using two paths, one for datetime64[D] and another for datetime64[ns]. However, it will be noted in a docstring / type hint that the output (i.e. the assigned coords) is datetime64[ns]. Rather than remove what will operate as intended as some point in the future (i.e. when xarray behavior is relaxed w/ non [ns] precision), the author will retain the separation, which does not seem to harm usability.

@dylanhmorris
Copy link
Collaborator

Sounds good, but to avoid the warning, you should coerce the [D] resolution arrays to [ns] resolution before calling assign_coords()

@AFg6K7h4fhy2
Copy link
Collaborator Author

Re: #35 (comment)

Sounds good; I've been proceeding with properly placed .astype("datetime64[ns]")s.

Copy link
Collaborator

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good, thanks @AFg6K7h4fhy2. A few final things.

forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved
forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved
forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved
forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved
forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved
forecasttools/idata_w_dates_to_df.py Show resolved Hide resolved
forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved
forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved
tests/test_add_time_coords_to_idata.py Outdated Show resolved Hide resolved
tests/test_add_time_coords_to_idata.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your hard work, @AFg6K7h4fhy2!

@dylanhmorris dylanhmorris merged commit 1b4f413 into main Nov 25, 2024
2 checks passed
@dylanhmorris dylanhmorris deleted the 34-more-general-utilities-with-representation-of-time-in-idata-objects branch November 25, 2024 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement to existing feature or aspect of the project. High Priority A task that is of higher relative priority.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More General Utilities With Representation Of Time In idata Objects
3 participants