YAML-run pipeline, part 2 (data preprocessing) #28

Merged Aug 15, 2024 (14 commits)

Collaborator

@thanasibakis thanasibakis commented Aug 13, 2024

This resolves #26.

Until now, we've been missing the ability to easily configure how data is preprocessed.

The linmod.data script now accepts a single optional command-line argument: the path to a YAML file that configures its behavior. Without it, the defaults in the dictionary linmod.data.DEFAULT_CONFIG are used. The YAML file only needs to define the keys it overrides; missing keys are populated with their default values.

An example is given in present-day-forecasting/config.yaml. As described in the README, this is run as python3 linmod.data config.yaml.
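
For readers who want a picture of the "missing keys fall back to defaults" behavior, here is a minimal sketch. It is not the code in linmod/data.py: the keys in `DEFAULT_CONFIG` below are hypothetical placeholders, `merge_config` and `load_config` are illustrative names, and the recursive merge of nested mappings is an assumption (the real script may only merge top-level keys).

```python
import copy

# Hypothetical stand-in for linmod.data.DEFAULT_CONFIG; the real keys may differ.
DEFAULT_CONFIG = {
    "forecast_date": {"year": 2024, "month": 8, "day": 1},
    "num_days_history": 90,
}


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with the values in `overrides` layered on top."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested mappings are merged key by key, not replaced wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path=None):
    """Load a YAML config from `path`, or return the defaults if no path is given."""
    if path is None:
        return copy.deepcopy(DEFAULT_CONFIG)
    import yaml  # PyYAML, imported lazily so the merge logic has no extra dependency

    with open(path) as f:
        overrides = yaml.safe_load(f)  # None for an empty file
    return merge_config(DEFAULT_CONFIG, overrides)
```

With this sketch, a YAML file containing only a `day` entry under `forecast_date` would change that one value and leave every other default in place.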

@afmagee42
Collaborator

While we're working on configurability, we also want to be able to configure an analysis to filter not just to data relevant to the forecast date (date <= forecast_date) but also data available before the forecast date (date_submitted <= forecast_date).

@afmagee42
Collaborator

We should also find a way to keep the forecast date around as a date, so that we can choose to plot things on a non-arbitrary-time-axis (that is, against something other than $-30 \leq t \leq 14$).

@thanasibakis
Collaborator Author

> We should also find a way to keep the forecast date around as a date, so that we can choose to plot things on a non-arbitrary-time-axis (that is, against something other than $-30 \leq t \leq 14$).

Done :)
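
As a rough illustration of why keeping the forecast date as a date helps, relative days can be mapped back to calendar dates for plotting. This is only a sketch: `relative_days_to_dates` is a hypothetical helper, not a function in this PR.

```python
from datetime import date, timedelta


def relative_days_to_dates(forecast_date, relative_days):
    """Convert day offsets t (relative to the forecast date) into calendar dates."""
    return [forecast_date + timedelta(days=t) for t in relative_days]


# Offsets spanning -30 <= t <= 14 become real dates usable as a plot axis.
axis = relative_days_to_dates(date(2024, 8, 1), range(-30, 15))
```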

@thanasibakis changed the title from "[WIP] Data preprocessing configuration via YAML" to "YAML-run pipeline, part 2 (data preprocessing)" Aug 14, 2024
@thanasibakis marked this pull request as ready for review August 14, 2024 18:19
@thanasibakis requested a review from afmagee42 August 14, 2024 18:19
@thanasibakis
Collaborator Author

> While we're working on configurability, we also want to be able to configure an analysis to filter not just to data relevant to the forecast date (date <= forecast_date) but also data available before the forecast date (date_submitted <= forecast_date).

Done. And now that we're on this topic, I've updated the preprocessing script to give us two datasets for a given horizon [forecast_date - L, forecast_date + H]:

  • An evaluation dataset with all sequences collected and reported within this horizon
  • A modeling dataset with only sequences collected and reported within the subinterval [forecast_date - L, forecast_date]
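
A minimal sketch of that split, using plain Python records rather than whatever dataframe library the script uses. The function name `split_datasets` and the record layout are hypothetical; the sketch also assumes "collected and reported within" means that both `date` and `date_submitted` fall inside the interval, with inclusive endpoints.

```python
from datetime import date, timedelta


def split_datasets(records, forecast_date, L, H):
    """Return (modeling, evaluation) subsets of `records`.

    Each record is a dict with `date` (collection) and `date_submitted` (report).
    """
    start = forecast_date - timedelta(days=L)
    end = forecast_date + timedelta(days=H)

    def within(rec, lo, hi):
        # Assumption: both dates must lie in the closed interval [lo, hi].
        return lo <= rec["date"] <= hi and lo <= rec["date_submitted"] <= hi

    evaluation = [r for r in records if within(r, start, end)]
    modeling = [r for r in records if within(r, start, forecast_date)]
    return modeling, evaluation


records = [
    {"id": "A", "date": date(2024, 7, 20), "date_submitted": date(2024, 7, 25)},
    {"id": "B", "date": date(2024, 7, 30), "date_submitted": date(2024, 8, 5)},
    {"id": "C", "date": date(2024, 8, 10), "date_submitted": date(2024, 8, 12)},
    {"id": "D", "date": date(2024, 6, 1), "date_submitted": date(2024, 6, 10)},
]
modeling, evaluation = split_datasets(records, date(2024, 8, 1), L=30, H=14)
```

Under these assumptions, a sequence collected before the forecast date but submitted after it (record B) is excluded from the modeling dataset and kept in the evaluation dataset.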

Base automatically changed from model-objects to main August 14, 2024 22:00
Collaborator

@afmagee42 afmagee42 left a comment

This is solid progress towards easy running, but we've got to get some things cleared up first.

Review threads (all resolved): linmod/data.py (8), present-day-forecasting/main.py (2)
@afmagee42 afmagee42 self-requested a review August 15, 2024 21:06
Collaborator

@afmagee42 afmagee42 left a comment

LGTM

@thanasibakis thanasibakis merged commit 4947165 into main Aug 15, 2024
1 check passed
@thanasibakis thanasibakis deleted the data-config branch August 15, 2024 21:31
Linked issue: Configurable number of day's data and forecasting date