Integrate EIA 861 2023 final release data #3911

Merged
merged 13 commits into from
Oct 21, 2024

e-belfer commented Oct 16, 2024

Overview

Closes #3905.

What problem does this address?

Integrates the final release data for EIA 861.

What did you change?

  • Extracted the new year of data and mapped the changed columns in column_maps
  • Added a warning for when columns are silently dropped while writing to SQL
  • Handled the split of virtual PV into under and over 1 MW capacity
  • Added a new field, energy_capacity_mwh
  • Updated a stale docstring for the clean_nerc function

Design questions

Documentation

Make sure to update relevant aspects of the documentation.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

e-belfer added the labels eia861 (Anything having to do with EIA Form 861) and data-update (When fresh data is integrated into PUDL from quarterly or annual updates) Oct 16, 2024
e-belfer self-assigned this Oct 16, 2024
e-belfer marked this pull request as ready for review October 17, 2024 21:17
e-belfer requested a review from zaneselvans October 17, 2024 21:17
zaneselvans left a comment

This looks great. Kind of amazing that no changes were required to any of the transformations.

I'm a little worried about the warning in enforce_schema(): there are a lot of tables where we intend for it to drop columns, so it might cause a lot of noise and confusion in the logs. I think maybe every FERC Form 1 table will now have a warning. I don't see a super simple way to pass in a flag, everywhere this is getting called, that would indicate whether we expect to drop columns or not, so I suggested a change to the logging output instead.
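
For illustration only, here is a minimal sketch of what such a flag could look like. The helper name `report_dropped_columns` and the `expect_drops` parameter are hypothetical and do not exist in PUDL, and the sketch works on plain lists of column names rather than a dataframe.

```python
import logging

logger = logging.getLogger(__name__)


def report_dropped_columns(actual_cols, expected_cols, expect_drops=False):
    """Return the columns that would be dropped and log them.

    Hypothetical helper: ``expect_drops`` is an assumed flag, not an
    existing PUDL argument. Callers that rely on schema enforcement to
    trim extra columns would pass ``expect_drops=True``.
    """
    expected = set(expected_cols)
    # Keep the incoming column order so the log message is easy to scan.
    dropped = [col for col in actual_cols if col not in expected]
    if dropped:
        # Expected drops are logged quietly; unexpected ones stay loud.
        level = logging.DEBUG if expect_drops else logging.WARNING
        logger.log(level, "Dropping columns not in the schema: %s", dropped)
    return dropped
```

The cost noted in the review still applies: every call site would have to decide which value to pass, which is why changing the log output is the simpler option.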

Comment on lines 1645 to 1653
# Log warning if columns in dataframe are getting dropped in write
dropped_columns = list(df.columns.difference(expected_cols))
if dropped_columns:
    logger.warning(
        "The following columns are getting dropped when writing to SQLite: "
        f"{dropped_columns}. To keep these columns, add them to the "
        "metadata.resources fields and update alembic."
    )

My recollection is that there are many tables in which we intentionally rely on enforce_schema() to remove extra columns that aren't part of the table. In that case, this change will create a large number of spurious warnings that should actually be ignored, making it hard to catch the cases where it's actually a problem and maybe confusing people (later, us).

Of course we don't want to be accidentally dropping columns we mean to keep, but is there a less noisy / more targeted way we can do that? The change to being stricter about the column mappings we made recently should help.

Maybe we could make it logger.info() instead, and make it clear in the message that we often intend this behavior.
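
One possible shape for that message, as a sketch only: the function name `log_dropped_columns` and the exact wording are assumptions, not the change that was merged, and plain column-name lists stand in for the dataframe and schema.

```python
import logging

logger = logging.getLogger(__name__)


def log_dropped_columns(actual_cols, expected_cols):
    """Log dropped columns at INFO level and return the message, or None.

    Hypothetical helper for illustration; it is not part of PUDL.
    """
    # Sorted set difference: columns present in the data but not in the schema.
    dropped = sorted(set(actual_cols) - set(expected_cols))
    if not dropped:
        return None
    message = (
        f"Dropping columns not defined in the table schema: {dropped}. "
        "This is often intentional. If a column should be kept, add it to "
        "the resource's fields in the metadata and update the migration."
    )
    logger.info(message)
    return message
```

Logging at INFO keeps the information available for debugging without flagging the tables that rely on this behavior.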

Comment on lines 6 to 7
demand_side_management_eia861,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1
distributed_generation_eia861,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1

Not blocking, but maybe we should update all the 0 values for years in which the table doesn't exist to be -1, as has already happened in skiprows.csv.
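
A rough sketch of how that cleanup could be scripted, assuming a layout where the first column is the table name and the remaining header cells are years. The function name `mark_missing_years` is hypothetical, and the set of years in which each table is absent has to be supplied by hand; it is not derived here.

```python
import csv
import io


def mark_missing_years(csv_text, missing_years):
    """Set the value to -1 for each (table, year) listed in ``missing_years``.

    Assumes a header row whose first cell labels the table column and whose
    remaining cells are years. ``missing_years`` maps a table name to a set
    of year strings in which that table does not exist.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    for row in body:
        absent = missing_years.get(row[0], set())
        for idx, year in enumerate(header[1:], start=1):
            if year in absent:
                row[idx] = "-1"
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header, *body])
    return out.getvalue()
```

Rows for tables not named in `missing_years` pass through unchanged, so the script can be run on the whole file.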

src/pudl/metadata/classes.py (outdated, resolved)
e-belfer requested a review from zaneselvans October 18, 2024 13:54
e-belfer added this pull request to the merge queue Oct 21, 2024
Merged via the queue into main with commit cb8b8c8 Oct 21, 2024
19 checks passed
e-belfer deleted the 861-2023-fr branch October 21, 2024 15:13
Development

Successfully merging this pull request may close these issues.

Integrate EIA 861 Final Release Data
2 participants