Add a sequence_as_of column to oracle output target data #296

Open
1 task
bsweger opened this issue Jan 28, 2025 · 1 comment · May be fixed by #300
@bsweger
Collaborator

bsweger commented Jan 28, 2025

Background

Because the oracle output files are not partitioned by sequence_as_of, they will be overwritten each time the hub's post-submission jobs are run (if the round_id in question is still within the 14-week interim data window).

For normal hub operations, this is fine. However, if the target data jobs are run out of order (for example, a manual backfill run with an older nowcast_date), the overwrite would break the implicit assumption that oracle.parquet will always reflect the most recent nowcast_date.

We decided that partitioning oracle outputs by sequence_as_of would make them too onerous to access. So we can't guarantee that the file won't be overwritten, but we can add sequence_as_of information to the oracle output files as a breadcrumb.

Definition of done

  • The oracle.parquet files generated by get_target_data.py will contain a sequence_as_of column
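A minimal sketch of what this definition of done implies, assuming the oracle rows are held as a list of dicts just before being written to parquet. The helper name `add_as_of_columns` and the row fields (`location`, `clade`, `oracle_value`) are hypothetical and not taken from get_target_data.py, which may well use a DataFrame library instead. The idea is simply that every row is stamped with the same sequence_as_of value (and tree_as_of, per the follow-up comment) so the breadcrumb survives in the written file.

```python
from datetime import date


def add_as_of_columns(rows, sequence_as_of, tree_as_of):
    """Return copies of oracle rows stamped with the data versions used to build them.

    Hypothetical helper: the column names come from the issue, but the
    list-of-dicts row structure is an assumption for illustration.
    """
    stamped = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        new_row["sequence_as_of"] = sequence_as_of  # same value on every row
        new_row["tree_as_of"] = tree_as_of
        stamped.append(new_row)
    return stamped


# Example: two hypothetical oracle rows for one round.
oracle_rows = [
    {"location": "MA", "clade": "24A", "oracle_value": 0.6},
    {"location": "MA", "clade": "24B", "oracle_value": 0.4},
]
stamped = add_as_of_columns(oracle_rows, date(2025, 1, 28), date(2025, 1, 27))
print(stamped[0]["sequence_as_of"])  # 2025-01-28
```

Because the value is constant per run, it adds little to the file size while still recording which snapshot produced the file.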
@bsweger bsweger added this to Lab Work Jan 28, 2025
@bsweger bsweger converted this from a draft issue Jan 28, 2025
@bsweger bsweger self-assigned this Jan 28, 2025
@bsweger bsweger added this to the Variant Nowcast milestone Jan 28, 2025
bsweger added a commit that referenced this issue Jan 28, 2025
Resolves #296

This changeset adds sequence_as_of and tree_as_of columns to the
oracle output target data files.
@bsweger
Collaborator Author

bsweger commented Jan 29, 2025

I also added a tree_as_of column as part of this work, even though we didn't explicitly have that as a requirement. It aligns with our approach for the time series outputs and provides clarity for people accessing this file outside of the hub context.
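For someone reading oracle.parquet outside the hub, the breadcrumb columns can be checked before the data is used. The sketch below is a hypothetical reader-side helper (`oracle_provenance` is not part of the hub), again assuming rows as dicts. Since each run rewrites the whole file, it expects a single (sequence_as_of, tree_as_of) pair and raises if it finds anything else.

```python
from datetime import date


def oracle_provenance(rows):
    """Return the (sequence_as_of, tree_as_of) pair recorded in oracle rows.

    Hypothetical check: each run overwrites the whole file, so every row is
    expected to carry the same pair; zero or several pairs is treated as an error.
    """
    pairs = {(row["sequence_as_of"], row["tree_as_of"]) for row in rows}
    if len(pairs) != 1:
        raise ValueError(f"expected one as-of pair, found {len(pairs)}")
    return pairs.pop()


rows = [
    {"clade": "24A", "sequence_as_of": date(2025, 1, 21), "tree_as_of": date(2025, 1, 20)},
    {"clade": "24B", "sequence_as_of": date(2025, 1, 21), "tree_as_of": date(2025, 1, 20)},
]
seq_as_of, tree_as_of = oracle_provenance(rows)
print(seq_as_of, tree_as_of)  # 2025-01-21 2025-01-20
```

A reader could compare the returned sequence_as_of against the snapshot they expect to be the latest. This only surfaces an out-of-order overwrite after the fact; it does not prevent one.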

@bsweger bsweger linked a pull request Jan 29, 2025 that will close this issue
@bsweger bsweger moved this from In Progress to Ready for Review in Lab Work Jan 29, 2025