Aligning public data with user-defined season: add an attribute to store the shift #247

Open
3 tasks
cbutsko opened this issue Jan 13, 2025 · 0 comments
cbutsko commented Jan 13, 2025

When users query the public extractions parquet with their own processing period, we need to align the existing extractions with the user-defined season and make sure that the label (represented by valid_date) still falls within the selected period.

While the current code already does this (refer to this part), it would be nice to give the user more context by reporting the actual shift that was applied with respect to the original label position.

To do:

  • add an attribute to the resulting dataframe of the query_public_extraction function that preserves information about the shift
  • add a logger message/warning that tells the user how many samples were shifted by more than a threshold
  • define the acceptable threshold of the shift
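
A rough sketch of how the three tasks could fit together, to make the discussion concrete. Everything here is an assumption rather than the existing implementation: the helper name `annotate_valid_date_shift`, the column name `valid_date_shift_months`, the default threshold of 2 months, and the definition of the shift as the signed month offset between `valid_date` and the midpoint of the user-defined period. The only name taken from this issue is the `valid_date` column.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

# Hypothetical default: the acceptable threshold is still to be defined.
MAX_SHIFT_MONTHS = 2


def annotate_valid_date_shift(df, start_date, end_date,
                              max_shift_months=MAX_SHIFT_MONTHS):
    """Return a copy of df with the signed month offset of each valid_date.

    Assumption: the shift is measured against the midpoint of the
    user-defined processing period [start_date, end_date].
    """
    start = pd.Timestamp(start_date)
    end = pd.Timestamp(end_date)
    midpoint = start + (end - start) / 2

    valid_date = pd.to_datetime(df["valid_date"])

    out = df.copy()
    # Negative values: label lies before the midpoint; positive: after it.
    out["valid_date_shift_months"] = (
        (valid_date.dt.year - midpoint.year) * 12
        + (valid_date.dt.month - midpoint.month)
    )

    # Count samples whose absolute shift exceeds the threshold and warn once.
    n_large = int((out["valid_date_shift_months"].abs() > max_shift_months).sum())
    if n_large:
        logger.warning(
            "%d of %d samples have a valid_date shift larger than %d months",
            n_large, len(out), max_shift_months,
        )
    return out


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    samples = pd.DataFrame(
        {"valid_date": ["2021-06-15", "2021-01-10", "2021-10-01"]}
    )
    print(annotate_valid_date_shift(samples, "2021-01-01", "2021-12-31"))
```

Keeping the shift as a per-sample column (rather than a single summary value) lets users filter on it themselves if the default threshold does not suit their use case.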