Integrate EIA 861 2023 final release data #3911

e-belfer · 2024-10-16T16:12:52Z

Overview

Closes #3905.

What problem does this address?

Integrates the 2023 final release data for EIA 861.

What did you change?

Extracted the new year of data and mapped changed columns in column_maps
Added a warning when columns are silently dropped while writing to SQL
Handled the split of virtual PV into under-1 MW and over-1 MW capacity
Added a new field, energy_capacity_mwh
Updated a stale docstring for the clean_nerc function

Design questions

Tasks


Do we want to split out virtual_pv_under_1mw and virtual_pv_over_1_mw in the TECH_CLASSES enum? Currently these records are getting lost when the net metering table is stacked. - Yes! And let's not fill them, because that would create double-counting.
Update row counts in validation tests
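To illustrate why records can vanish during the stack, and why filling the aggregate class would double-count, here is a minimal pandas sketch. It assumes, hypothetically, that the wide net metering table has one capacity column per technology class named `<tech_class>_capacity_mw` and that the reshape keeps only columns whose prefix appears in the tech class list. The class lists, column names, and values are illustrative only, not PUDL's actual schema, enum, or transform code.

```python
import pandas as pd

# Hypothetical tech classes for illustration; not PUDL's actual TECH_CLASSES enum.
TECH_CLASSES_OLD = ["pv", "wind", "virtual_pv"]
TECH_CLASSES_NEW = TECH_CLASSES_OLD + ["virtual_pv_under_1mw", "virtual_pv_over_1mw"]
SUFFIX = "_capacity_mw"


def stack_by_tech_class(wide: pd.DataFrame, tech_classes: list[str]) -> pd.DataFrame:
    """Reshape wide per-technology capacity columns into long rows.

    Only columns whose prefix is listed in ``tech_classes`` are stacked;
    any other capacity columns are silently left out of the result.
    """
    value_cols = [f"{tc}{SUFFIX}" for tc in tech_classes if f"{tc}{SUFFIX}" in wide.columns]
    long = wide.melt(
        id_vars="utility_id",
        value_vars=value_cols,
        var_name="tech_class",
        value_name="capacity_mw",
    )
    long["tech_class"] = long["tech_class"].str.removesuffix(SUFFIX)
    return long


# Toy record (made-up values): virtual PV is only reported split by size.
wide = pd.DataFrame(
    {
        "utility_id": [1],
        "pv_capacity_mw": [10.0],
        "wind_capacity_mw": [3.0],
        "virtual_pv_capacity_mw": [float("nan")],
        "virtual_pv_under_1mw_capacity_mw": [2.0],
        "virtual_pv_over_1mw_capacity_mw": [5.0],
    }
)

# Without the new classes, the split virtual PV capacity is lost in the stack.
total_old = stack_by_tech_class(wide, TECH_CLASSES_OLD)["capacity_mw"].sum()  # 13.0
# With the new classes, all reported capacity survives.
total_new = stack_by_tech_class(wide, TECH_CLASSES_NEW)["capacity_mw"].sum()  # 20.0

# Filling the aggregate virtual_pv column from its parts counts the same 7 MW twice.
filled = wide.assign(
    virtual_pv_capacity_mw=wide["virtual_pv_under_1mw_capacity_mw"]
    + wide["virtual_pv_over_1mw_capacity_mw"]
)
total_filled = stack_by_tech_class(filled, TECH_CLASSES_NEW)["capacity_mw"].sum()  # 27.0
```

Summing across tech classes is only safe when each MW appears under exactly one class, which is why leaving the aggregate unfilled avoids the double-count.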

Documentation

Make sure to update relevant aspects of the documentation.

Tasks


Update the release notes: reference the PR and related issues.
Update relevant table or source description metadata (see src/metadata).
Review and update any other aspects of the documentation that might be affected by this PR.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list


Run make pytest-coverage locally to ensure that the merge queue will accept your PR.
Review the PR yourself and call out any questions or issues you have.
For bigger ETL or data changes, run the full ETL locally and then run the data validations using make pytest-validate.
Alternatively, run the build-deploy-pudl GitHub Action manually.

zaneselvans

This looks great. Kind of amazing that no changes were required to any of the transformations.

I'm a little worried that the warning in enforce_schema() will cause a lot of noise and confusion in the logs, since there are many tables where we intend for it to drop columns. I think maybe every FERC Form 1 table will now emit a warning. I suggested a change to the logging output, since I don't see a super simple way to pass in a flag indicating whether we expect to drop columns everywhere this gets called.

zaneselvans · 2024-10-18T01:08:28Z

src/pudl/metadata/classes.py

+        # Log warning if columns in dataframe are getting dropped in write
+        dropped_columns = list(df.columns.difference(expected_cols))
+        if dropped_columns:
+            logger.warning(
+                "The following columns are getting dropped when writing to SQLite:"
+                f"{dropped_columns}. To keep these columns, add them to the "
+                f"metadata.resources fields and update alembic."
+            )
+


My recollection is that there are many tables in which we intentionally rely on enforce_schema() to remove extra columns that aren't part of the table. If so, this change will create a large number of spurious warnings that should actually be ignored, making it hard to catch the cases where dropping columns is actually a problem, and maybe confusing people (later us).

Of course we don't want to accidentally drop columns we mean to keep, but is there a less noisy, more targeted way to catch that? The stricter column mappings we adopted recently should help.

Maybe we could make it logger.info() instead, and make it clear in the message that we often intend this behavior.
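A minimal sketch of this suggestion, assuming the check is pulled out into a hypothetical helper `log_dropped_columns()` (not an existing PUDL function). It logs at INFO instead of WARNING and says the drop is often intentional. It also adds the space that the diff's first string literal is missing before `{dropped_columns}` (as written there, the message renders as `SQLite:[...]`) and drops the unneeded `f` prefix on the last literal.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def log_dropped_columns(df: pd.DataFrame, expected_cols: list[str]) -> list[str]:
    """Log (at INFO level) any columns in ``df`` that are not in ``expected_cols``.

    Returns the dropped column names so callers or tests can inspect them.
    """
    # Index.difference returns the sorted column labels absent from expected_cols.
    dropped_columns = list(df.columns.difference(expected_cols))
    if dropped_columns:
        logger.info(
            "Dropping columns not defined in the table's metadata when writing to "
            f"SQLite: {dropped_columns}. This is often intentional. To keep these "
            "columns, add them to the metadata.resources fields and update alembic."
        )
    return dropped_columns
```

Called with a frame containing an extra column, it returns `['extra']` and emits one INFO record; when every column is expected it logs nothing.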

zaneselvans · 2024-10-18T01:14:01Z

src/pudl/package_data/eia861/skipfooter.csv

+demand_side_management_eia861,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1
+distributed_generation_eia861,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1


Not blocking, but maybe we should change all the 0 values for years in which the table doesn't exist to -1, as has already been done in skiprows.csv.
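One way to make that change mechanically is sketched below with pandas. It assumes, hypothetically, that skipfooter.csv and skiprows.csv share a layout (a first column naming the table, here called `page`, followed by one column per year) and that -1 in skiprows.csv reliably marks a year in which the table doesn't exist. The helper name and the toy tables are illustrative only.

```python
import io

import pandas as pd


def mark_missing_years(skipfooter: pd.DataFrame, skiprows: pd.DataFrame) -> pd.DataFrame:
    """Set skipfooter to -1 wherever skiprows already marks a table/year as missing (-1).

    Both frames are assumed to be indexed by table name with one column per year.
    Cells with no counterpart in skiprows are left unchanged.
    """
    missing = skiprows.reindex_like(skipfooter).eq(-1)
    return skipfooter.mask(missing, -1)


# Toy inputs with made-up values, not the real PUDL CSVs.
skiprows = pd.read_csv(
    io.StringIO("page,2021,2022,2023\ntable_a,1,-1,-1\ntable_b,2,2,2\n"),
    index_col="page",
)
skipfooter = pd.read_csv(
    io.StringIO("page,2021,2022,2023\ntable_a,0,0,-1\ntable_b,0,0,0\n"),
    index_col="page",
)
fixed = mark_missing_years(skipfooter, skiprows)
```

Here `table_a` becomes `0,-1,-1` and `table_b` is untouched; the result could then be written back with `DataFrame.to_csv()`.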


e-belfer added 2 commits October 15, 2024 08:22

Extract new archive

6ed7cae

WIP updates to extraction CSVs

b34bd3c

e-belfer added the eia861 (Anything having to do with EIA Form 861) and data-update (When fresh data is integrated into PUDL from quarterly or annual updates) labels Oct 16, 2024

e-belfer self-assigned this Oct 16, 2024

e-belfer and others added 5 commits October 16, 2024 12:26

Fix duplicated column name

22ee8df

Update clean_nerc docstring and add FRCC_NERC to enum

6fd4c79

Prevent silent column drops on write to sqlite, add energy_capacity column

ff0d8b5

Update release notes

1f74f63

Merge branch 'main' into 861-2023-fr

a1898b8

e-belfer marked this pull request as ready for review October 17, 2024 21:17

Merge branch 'main' into 861-2023-fr

f44c8a3

e-belfer requested a review from zaneselvans October 17, 2024 21:17

e-belfer added 2 commits October 17, 2024 17:19

Add under_1mw and over_1mw to table

502db82

Merge branch '861-2023-fr' of https://github.com/catalyst-cooperative/pudl into 861-2023-fr

178159a

zaneselvans requested changes Oct 18, 2024


e-belfer and others added 3 commits October 18, 2024 09:35

Update src/pudl/metadata/classes.py

e5149b5

Co-authored-by: Zane Selvans <[email protected]>

Add -1s to 861 skipfooter map, update validation tests

aaeccf7

Merge branch '861-2023-fr' of https://github.com/catalyst-cooperative/pudl into 861-2023-fr

b6db0ef

e-belfer requested a review from zaneselvans October 18, 2024 13:54

zaneselvans approved these changes Oct 19, 2024


e-belfer added this pull request to the merge queue Oct 21, 2024

Merged via the queue into main with commit cb8b8c8 Oct 21, 2024
19 checks passed

e-belfer deleted the 861-2023-fr branch October 21, 2024 15:13
