Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function To Cast InferenceData Into tidy_draws Format #36

Open
wants to merge 74 commits into
base: main
Choose a base branch
from

Conversation

AFg6K7h4fhy2
Copy link
Collaborator

@AFg6K7h4fhy2 AFg6K7h4fhy2 commented Oct 28, 2024

For the scope of this PR, please refer to issue #18 .

@AFg6K7h4fhy2 AFg6K7h4fhy2 self-assigned this Oct 28, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 linked an issue Oct 28, 2024 that may be closed by this pull request
@AFg6K7h4fhy2 AFg6K7h4fhy2 added feature A new tool or utility being added. High Priority A task that is of higher relative priority. labels Oct 28, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 added this to the [October 28, November 8] milestone Oct 28, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 added Medium Priority A task that is of medium relative priority. and removed High Priority A task that is of higher relative priority. labels Oct 28, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 mentioned this pull request Nov 6, 2024
@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Feb 4, 2025

Requested review from @dylanhmorris. Once a review is received, the author will request another review from Damon.

Beyond the tidy connections, the author is also looking for comments on the changes made in the pre-commit configuration file, along with the pre-commit related edits across other files.

Copy link
Collaborator

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AFg6K7h4fhy2. Getting close but needs a few tweaks.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
forecasttools/__init__.py Outdated Show resolved Hide resolved
forecasttools/daily_to_epiweekly.py Outdated Show resolved Hide resolved
forecasttools/idata_to_tidy.py Outdated Show resolved Hide resolved
forecasttools/idata_to_tidy.py Outdated Show resolved Hide resolved
((pl.col(".chain") - 1) * pl.col("draws_per_chain"))
+ pl.col(".iteration")
).alias(".draw")
)
Copy link
Collaborator

@dylanhmorris dylanhmorris Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to drop "draws_per_chain", but also it's not a given that all chains will have the same number of draws. Instead, more robust do compute this as .iteration + <n_draws_in_all_previous_chains>. Many ways to do that in polars.

forecasttools/idata_w_dates_to_df.py Outdated Show resolved Hide resolved

@pytest.fixture
def mock_inference_data():
np.random.seed(42)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running np.random.seed changes global state. Better practice to do something like this https://builtin.com/data-science/numpy-random-seed

Comment on lines +13 to +18
posterior_predictive = xr.Dataset(
{
"observed_hospital_admissions": ("chain", np.random.randn(2, 100)),
},
coords={"chain": [0, 1]},
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not run the test on the provided inference_data_1.nc? Or are you planning to remove it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am on the fence about removing it. Seems good to have a canonical pyrenew-hew .nc file on hand esp. given that forecasttools-py does / will even more so interface abundantly with pyrenew models. On the other hand, having adequate and general idata / xarray representations seems good for testing too. I do not know if the latter must exist at the cost of the former. I lean towards having both, with the .nc file perhaps being used in notebooks and the "fake" idatas being used for testing.

col in df.columns
for col in [".chain", ".draw", ".iteration", "variable", "value"]
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to check that individual values are as expected, not just that the draws are unique and one for each row.

This was referenced Feb 6, 2025
.pre-commit-config.yaml Outdated Show resolved Hide resolved
@@ -42,6 +43,8 @@ patsy = "^0.5.6"
nbformat = "^5.10.4"
nbclient = "^0.10.0"
jupyter = "^1.1.1"
pandas = "^2.2.3"
metaflow = "^2.13.9"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new tool or utility being added. Medium Priority A task that is of medium relative priority.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Function to cast InferenceData into tidy_draws format
3 participants