Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt Use Of Dates With idata Objects #30

Merged
merged 30 commits into from
Oct 25, 2024

Conversation

AFg6K7h4fhy2
Copy link
Collaborator

@AFg6K7h4fhy2 AFg6K7h4fhy2 commented Oct 21, 2024

Tests for this will be found here: #29

This PR ought to be merged before further work on #9 occurs.

@AFg6K7h4fhy2 AFg6K7h4fhy2 added the enhancement Enhancement to existing feature or aspect of the project. label Oct 21, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 added this to the [October 14, October 25] milestone Oct 21, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 self-assigned this Oct 21, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 linked an issue Oct 21, 2024 that may be closed by this pull request
@AFg6K7h4fhy2
Copy link
Collaborator Author

Forecasts in this PR are implicitly assumed to be sequential days — not weeks, 2-weeks, a sequence of Saturdays, etc... Other resolution may be supported. Some critical tasks of this PR include (some of these already exist, albeit inadequately):

  • Designing a function that adds "dates" as coordinates OR as a group to an az.InferenceData object that was generated from AT LEAST posterior and posterior_predictive groups (should have observed_data by default and might not have a predictions group).
  • Designing a function that receives an az.InferenceData object with date coordinates OR a dates group and converts it to a polars dataframe.
  • (similar to previous) Designing a function that receives an az.InferenceData object with date coordinates OR a dates group and converts it to tidy dataframe.

Key considerations:

  • Dates as coordinates?
  • Dates as groups in idata object?
  • Names of idata groups to expect (e.g. obs or y in the group observed_data)?
  • Function in forecasttools (not on user-side) for idata wo/ dates → idata w/ dates?
  • Function in forecasttools (not on user-side) for idata w/ dates → [dataframe formats]?

@AFg6K7h4fhy2
Copy link
Collaborator Author

From DB:

  1. TPM is working on an experiment to create some idata with dates to demo functionality in forecasttools. This is fine and I don't care how it is done in the immediate future.
  2. There should be some utility to add dates to an existing idata object. It should be agnostic as possible. It should live in forecasttools (although I initially made an issue for it in PyRenew) (1) should be implemented using (2), once (2) exists.

@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Oct 22, 2024

Some notes extracted from the experimentation file for adding dates to idata objects:

  • Function for adding dates to idata accept either IS08601 str or datetime date.
  • Any functions that receives a polars dataframe with dates ought to have those dates typed date (datetime).
  • .draw was artifact from R → use draw.
  • There should be name agnostic idata reference to dimensions (e.g. reference what is stored in idata.observed_data["obs_dim_0"] without using obs_dim_0...

NOTE:

# idata.observed_data.sizes["obs_dim_0"] is suggested by Arviz over
# idata.observed_data["obs_dim_0"].size

idata.observed_data.sizes
# output: Frozen({'obs_dim_0': 488})

# idata.observed_data.dims gets one:
# output: FrozenMappingWarningOnValuesAccess({'obs_dim_0': 488})


# ^ "obs_col" name still needed ^
  • An alternate name, which could be a forecasttools default, for a key like obs_dim_0 could be day_index.
  • Dates being added to idata objects ought to be "as agnostic as possible".
  • A function that add to an idata object an interval of one-day sequential dates is one possible function for forecasttools; other functions could provide other resolutions.
  • TPM PR task:
    • Experimentation using assign_coords with dates to idata.
    • Refactor function that takes idata wo/ dates and converts to polars dataframe.
    • Experimentation using add_group with dates to idata (shows plot_ts).
    • Using assign_coords to create new .nc example forecast file.
    • Refactor function that aggregates values by epiweek to use idata w/ dates.
    • Ensure vignette still works.

@AFg6K7h4fhy2 AFg6K7h4fhy2 marked this pull request as ready for review October 23, 2024 20:58
@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Oct 24, 2024

Questions (for myself & DHM to some degree):

  • Should the method that adds dates to idata add the dates as type str, datetime, or np.datetime?
  • Should the method that takes idata w/ dates and converts it to tidy-like dataframes have the output column for dates be of type str?

@AFg6K7h4fhy2
Copy link
Collaborator Author

Comments on notebook files are appreciated.
Notebooks (save for the FluSight vignette), will not be added to the PR.
I can turn notebooks into vignettes if others desire.

@AFg6K7h4fhy2
Copy link
Collaborator Author

Docstrings likely still need to be corrected.

@damonbayer
Copy link
Contributor

Comments on notebook files are appreciated. Notebooks (save for the FluSight vignette), will not be added to the PR. I can turn notebooks into vignettes if others desire.

Some more direction here to reviewers would be helpful. Which notebooks should we look at and what kind of feedback would be useful (since you say these will not be part of the PR)?

@AFg6K7h4fhy2
Copy link
Collaborator Author

Some more direction here to reviewers would be helpful. Which notebooks should we look at and what kind of feedback would be useful (since you say these will not be part of the PR)?

Notebooks:

  • df_from_idata_w_dates_to_tidy_draws: Shows how to go from InferenceData without dates to a tidy data.frame in R with dates and draws.
  • create_idata_flu_forecast_w_dates: Produces an influenza hospital admissions forecast idata object with dates.
  • explore_add_dates_to_idata_via_coords: Shows how to go from InferenceData without dates to InferenceData with dates, via the idata coordinates.
  • explore_add_dates_to_idata_via_groups: Shows how to go from InferenceData without dates to InferenceData with dates, via the idata groups.
  • test_out_idata_w_dates: Shows how to go from InferenceData without dates to a polars dataframe of draws and dates.

In a sequence of learning, someone would use:

explore_add_dates_to_idata_via_coordstest_out_idata_w_datesdf_from_idata_w_dates_to_tidy_draws.

If someone didn't know much about NumPyro to Arviz, then create_idata_flu_forecast_w_dates would be the place to go.

@damonbayer
Copy link
Contributor

@AFg6K7h4fhy2 why don't you think these notebooks should be part of the PR?

I think they could be used as vignettes or reworked as tests.

@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Oct 25, 2024

@AFg6K7h4fhy2 why don't you think these notebooks should be part of the PR?

I think they could be used as vignettes or reworked as tests.

I think, at present, they are not informative enough to be featured, as they are, in main.

(possibly) More clearly stated: TM hopes others can indicate which notebooks are worth keeping. Then TM can make these into tests or vignettes.

@damonbayer
Copy link
Contributor

damonbayer commented Oct 25, 2024

@AFg6K7h4fhy2 Perhaps we should sync face to face about this, but I'm confused about the intended goals here. The PR is titled "Attempt Use Of Dates With idata Objects." I suppose that is what the PR does, but I am wondering if we can narrow the focus.

  • There is some functionality for "tidying" InferenceData objects with dates into a dataframe (idata_forecast_w_dates_to_df) but it does not convert them to the tidy_draws-like format, as described in Function to cast InferenceData into tidy_draws format #18. This PR is not linked to that issue, so I maybe it is not intended to address this need. What is the intended use of the format it does convert them to?
  • There is also some functionality for adding dates to idata objects (add_dates_as_coords_to_idata), but it is not fully-featured. Having this "first pass" at that goal could be a reasonable scope for this PR.
  • There are miscellaneous updates to other functions and their documentation (ex forecasttools/recode_locations.py). These seem unrelated to the above two points. It might be okay for them to slip in, but I need more clarity about what we're trying to accomplish in this PR.

@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Oct 25, 2024

From face-to-face discussion with Damon:

  • PR goal is to create an idata object with dates as its coordinates and then verify that this idata works with the existing code (primarily the hubverse submission vignette). This PR is more of a first-pass towards utilities that connect time to idata objects.

@AFg6K7h4fhy2
Copy link
Collaborator Author

Not covered in this PR is the task of writing a function that takes an idata object and, in a group/coordinate agnostic manner, converts it to a tidybases::tidy_draws dataframe, which is a more general utility than is covered by the function idata_forecast_w_dates_to_df.

Copy link
Contributor

@damonbayer damonbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AFg6K7h4fhy2 AFg6K7h4fhy2 merged commit 558c640 into main Oct 25, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement to existing feature or aspect of the project.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Attempt Use Of Dates With idata Objects
2 participants