-
-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate EIA 861 2023 final release data #3911
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Kind of amazing that no changes were required to any of the transformations.
I'm a little worried about the warning in enforce_schema()
since I think it might cause a lot of noise/confusion in the logs since there are a lot of tables where we intend for it to drop columns. Like I think maybe every FERC Form 1 table will now have a warning. I suggested a change to the logging output since I don't see a super simple way to pass in a flag that would indicate whether we expect to drop columns or not everywhere this is getting called.
src/pudl/metadata/classes.py
Outdated
# Log warning if columns in dataframe are getting dropped in write | ||
dropped_columns = list(df.columns.difference(expected_cols)) | ||
if dropped_columns: | ||
logger.warning( | ||
"The following columns are getting dropped when writing to SQLite:" | ||
f"{dropped_columns}. To keep these columns, add them to the " | ||
f"metadata.resources fields and update alembic." | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My recollection is that there are many tables in which we intentionally rely on enforce_schema()
to remove extra columns that aren't part of the table. In which case this change will create a large number of spurious warnings that should actually be ignored, making it hard to catch cases where it's actually a problem and maybe confusing people (later us).
Of course we don't want to be accidentally dropping columns we mean to keep, but is there a less noisy / more targeted way we can do that? The change to being stricter about the column mappings we made recently should help.
Maybe we could make it logger.info()
instead, and make it clear in the message that we often intend this behavior.
demand_side_management_eia861,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1 | ||
distributed_generation_eia861,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking, but maybe we should update all the 0 values for years in which the table doesn't exist to be -1, as has already happened in skiprows.csv
.
Co-authored-by: Zane Selvans <[email protected]>
Overview
Closes #3905.
What problem does this address?
Integrates the final release data for EIA 861.
What did you change?
column_maps
energy_capacity_mwh
clean_nerc
functionDesign questions
Tasks
Documentation
Make sure to update relevant aspects of the documentation.
Tasks
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list