Background
Because the oracle output files are not partitioned by sequence_as_of, they will be overwritten each time the hub's post-submission jobs are run (if the round_id in question is still within the 14-week interim data window).
For normal hub operations, this is fine. However, if the target data jobs are run out of order (for example, a manual backfill run with an older nowcast_date), that would break the implicit assumption that oracle.parquet always reflects the most recent nowcast_date.
We decided that partitioning oracle outputs by sequence_date would make them too onerous to access. So we can't guarantee that the file won't be overwritten, but we can add sequence_as_of information to the oracle output files as a breadcrumb.
Definition of done
The oracle.parquet files generated by get_target_data.py will contain a sequence_as_of column
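A minimal sketch of what this could look like, assuming the oracle data sits in a pandas DataFrame before it is written out; the function name, arguments, and output path here are illustrative and not the actual code in get_target_data.py:

```python
import pandas as pd


def write_oracle_output(oracle_df: pd.DataFrame, sequence_as_of: str, output_path: str) -> None:
    """Illustrative sketch: stamp the oracle output with its sequence_as_of date.

    Because oracle.parquet is not partitioned by sequence_as_of, the file is
    overwritten on each run; carrying the date as a column leaves a breadcrumb
    showing which sequence data the current file reflects.
    """
    # Add the breadcrumb column (same value on every row of this run's output).
    oracle_df = oracle_df.assign(sequence_as_of=sequence_as_of)
    oracle_df.to_parquet(output_path, index=False)
```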
I also added a tree_as_of column as part of this work, even though we didn't explicitly have that as a requirement. It aligns with our approach for the time series outputs and provides clarity for people accessing this file outside of the hub context.
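For someone consuming oracle.parquet outside the hub, a quick way to check these breadcrumb columns might look like the following; the file path is a placeholder and the column names come from this issue:

```python
import pandas as pd

# Inspect which sequence and tree snapshots the current oracle file reflects.
oracle = pd.read_parquet("oracle.parquet")  # placeholder path
print(oracle["sequence_as_of"].unique())
print(oracle["tree_as_of"].unique())
```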