Add a sequence_as_of column to oracle output target data #296

bsweger · 2025-01-28T21:18:02Z

Background

Because the oracle output files are not partitioned by sequence_as_of, they are overwritten each time the hub's post-submission jobs run (as long as the round_id in question is still within the 14-week interim data window).

For normal hub operations, this is fine. However, if the target data jobs are run out of order (for example, a manual backfill run with an older nowcast_date), that would break the implicit assumption that oracle.parquet will always reflect the most recent nowcast_date.

We decided that partitioning oracle outputs by sequence_date makes them too onerous to access. So we can't guarantee that overwriting the file won't happen, but we can add sequence_as_of information to the oracle output files as a breadcrumb.

Definition of done

The oracle.parquet files generated by get_target_data.py will contain a sequence_as_of column
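
As a rough illustration of the breadcrumb idea, here is a minimal sketch that stamps every oracle-output row with the sequence_as_of date used to generate it. It uses plain Python dicts rather than whatever dataframe code get_target_data.py actually uses; the function name `add_sequence_as_of` and the example row fields and values are hypothetical.

```python
from datetime import date


def add_sequence_as_of(rows, sequence_as_of):
    """Return copies of the rows with a sequence_as_of column attached.

    Hypothetical helper: the real job may add the column differently.
    """
    stamp = sequence_as_of.isoformat()
    return [{**row, "sequence_as_of": stamp} for row in rows]


# Hypothetical oracle-output rows; the real column names may differ.
oracle_rows = [
    {"nowcast_date": "2025-01-22", "location": "MA", "clade": "24A", "oracle_value": 1},
    {"nowcast_date": "2025-01-22", "location": "MA", "clade": "24B", "oracle_value": 0},
]

stamped = add_sequence_as_of(oracle_rows, date(2025, 1, 28))
```

Anyone reading the file can then inspect the sequence_as_of value to tell which run produced it, even if an out-of-order job has overwritten a newer version.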



bsweger · 2025-01-29T20:44:20Z

I also added a tree_as_of column as part of this work, even though we didn't explicitly list it as a requirement. It aligns with our approach for the time series outputs and provides clarity for people accessing this file outside of the hub context.

bsweger added this to Lab Work Jan 28, 2025

bsweger converted this from a draft issue Jan 28, 2025

bsweger self-assigned this Jan 28, 2025

bsweger added this to the Variant Nowcast milestone Jan 28, 2025

bsweger added a commit that referenced this issue Jan 28, 2025

Add two new colums to oracle-output target files

b04ac3d

Resolves #296 This changeset adds sequence_as_of and tree_as_of columns to the oracle output target data files.

bsweger linked a pull request Jan 29, 2025 that will close this issue

Add "as_of" columns to oracle output target data #300

Open

bsweger moved this from In Progress to Ready for Review in Lab Work Jan 29, 2025
