Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial suite of tests #54

Merged
merged 14 commits into from
Nov 19, 2024
Merged

Initial suite of tests #54

merged 14 commits into from
Nov 19, 2024

Conversation

eschrom
Copy link
Collaborator

@eschrom eschrom commented Nov 15, 2024

Many tests are included here for data cleaning, uptake data manipulating, and model building. Writing tests inspired some moderate reorganizations/fixes in init.py. The tests for model building are a bit sparse, because this portion of the code will be significantly reworked (requiring new tests) when numpyro is introduced. Still, it would be good to get some legitimate testing infrastructure incorporated.

@eschrom eschrom requested review from swo and Fuhan-Yang November 15, 2024 16:58
Copy link
Collaborator

@swo swo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of suggestions and some requests for documentation, but nothing critical or blocking. I think a lot of the stuff that I'm thinking should be tweaked are things that we'll develop joint understanding of, when we implement a second model.

iup/__init__.py Outdated
@@ -225,6 +225,44 @@ def trim_outlier_intervals(

return IncidentUptakeData(df)

def expand_implicit_columns(self, group_cols: tuple[str,] | None) -> Self:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the right word is, but "expand" is tricky. It sounds to me like "explode" or "extend," which each have different meanings in the polars world

In R broom, they call things like this "augment"

Not super important at all

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Augment" is actually the word I found myself tending toward when writing documentation, so I will happily change it.

- daily-average uptake in the interval preceding the previous date
"""
self = (
self.with_columns(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function would be better as a class method, rather than an object method

I think that will make #42 easier to reason about

The self = whatever has a really bad code smell

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add this to issue #42.

@@ -318,7 +356,7 @@ def parse_nis(
format of the dates in the NIS date column
rollout: dt.date
date of rollout
filters: dict
filters: dict | None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some fun notes: dict | None is equivalent to Optional[dict]

And you could specify the key and value types with dict[str, Any] (or replace Any which whatever type you want)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know! I will open a new issue #55 to remind me to make type hinting as precise as possible throughout.

@@ -406,7 +445,7 @@ def select_columns(
name of the NIS column for the uptake estimate (population %)
date_col: str
name of the NIS column for the date
group_cols: dict
group_cols: dict | None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of confusing, that it's both selection and renaming

Like, {"age_group": "age_group"} means "keep age group," but {"age": "age_group"} means "keep the column 'age' and rename it", and {"raw_point_estimate": "estimate"} means "there are no group columns but please also rename the point estimate column."

I think it makes a lot more sense for group cols to be just a list (or set) of strings

Copy link
Collaborator Author

@eschrom eschrom Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it might be a touch confusing when inspecting select_columns for the first time, I think it makes a really intuitive interface for the user in the config.yaml. You just specify a dictionary of: {"what_the_grouping_column_is_called" : "what_I_want_the_grouping_column_to_be_called"}. This makes sure the user enters the same number of actual vs. desired grouping column names, whereas a list for each could be different lengths (raising errors downstream).

The 3rd example you raise shouldn't be an issue, because this is only for grouping columns. While {"raw_point_estimate": "estimate"} would indeed rename a column, it would also cause errors downstream because the point estimate would be erroneously used as a grouping factor.

iup/__init__.py Outdated
else:
return (
pl.when(x.drop_nulls().n_unique() == 1)
.then(x * 0.0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have some documentation (that z-score of a single data point is defined to be zero, which seems sensible)

Suggested change
.then(x * 0.0)
.then(0.0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change made and documentation added!

"estimate": [0.0, 0.0, 1.0, 0.1, 3.0, 0.3, 4.0, 0.4],
"season": ["2019/2020"] * 8,
"elapsed": [0, 0, 7, 7, 14, 14, 21, 21],
"interval": [None, None, 7, 7, 7, 7, 7, 7],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[None] * 2 + [7] * 6 might be easier to read, but it's sort of six-to-one

frame, ("estimate", "elapsed")
)

assert len(output) == 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a minor thing, but I might be more specific here. Can you you write out what you expect output to be in this situation, like the exact data frame? That makes it easier for me as the code reviewer to be 100% sure about what you want this function to be doing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

output = iup.LinearIncidentUptakeModel()
output = output.fit(frame, ("geography",))

assert output.model.score(output.x, output.y) == 1.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to also check that the coefficient(s) in the model is precisely what you think it was

Copy link
Collaborator Author

@eschrom eschrom Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. As we add more types of models, I think it would be good to choose a single fixture data frame for testing that has the right edge cases and known outputs to thoroughly test all models.

start, start_date, end_date, interval, group_cols
)

assert output.shape[0] == 5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could write this as output.shape == (5, 7)

and this is another place that saying the exact data frame you expect would be good

assert output.shape[1] == 8


def test_project_sequentially():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this test. are there any other trivial or edge cases that are easy to test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely so - I'll give it some thought but merge what I have so far.

@eschrom eschrom merged commit e781432 into main Nov 19, 2024
2 checks passed
@eschrom eschrom deleted the ecs_testing branch November 19, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants