-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pit templates #12
Pit templates #12
Conversation
@@ -167,6 +167,6 @@ def obs_data(): | |||
"location": ["a"] * 20 + ["b"] * 20, | |||
"population": [100.0] * 10 + [150.0] * 10 + [200.0] * 10 + [250.0] * 10, | |||
"age_group": (["young"] * 10 + ["old"] * 10) * 2, | |||
"date": [datetime.strptime("2020-01-01", "%Y-%m-%d") + timedelta(i) for i in range(10)] * 4, | |||
"date": [datetime.strptime("2020-01-14", "%Y-%m-%d") + timedelta(i) for i in range(10)] * 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed to update date here to get observed data that were in the same range of prediction target dates, so that I could test correct results based on PIT scores.
reference_time_col: str = "reference_date", | ||
horizon_col: str = "horizon", pred_col: str = "value", | ||
idx_col: str = "output_type_id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these have moved to the .fit
method instead of .transform
horizon_col, idx_col, pred_col) | ||
min_horizon = model_out[horizon_col].min() | ||
max_horizon = model_out[horizon_col].max() | ||
wide_model_out = self._pivot_horizon(model_out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference_time_col
, horizon_col
, idx_col
, and pred_col
are now saved as attributes of the object, so we no longer need to pass them around as arguments and we use self.horizon_col
, etc.
if self.model_out_train is not None: | ||
wide_model_out_train = self._pivot_horizon(self.model_out_train) | ||
else: | ||
wide_model_out_train = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactoring to-do: move this inside of self._build_train_X_Y
. see #2
time_col: name of column in `df` that contains the time index. | ||
obs_col: name of column in `df` that contains observed values. | ||
feat_cols: names of columns in `df` with features | ||
target_data_train: pl.DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked the docstring types against the types in the function parameter 👍
|
||
|
||
def test_build_train_X_Y_pit_templates(obs_data, wide_model_out, monkeypatch): | ||
# we use monkeypatch to remove abstract methods from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a neat trick! At some point, it might be worth creating a fixture for this pattern, since it's in the code base quite a few times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, i filed issue #17 for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the notes throughout--those were helpful.
The Python re-factoring and docstrings looked good. I also ran through the local setup instructions and ran the test suite--all is working as expected!
This PR makes it so that templates for the Shaake shuffle can be constructed based on PIT values from past forecasts rather than just past observed data. This is also an intermediate step toward estimating copulas.