FERC 714: transform of hourly demand table (dbf +xbrl) #3842

Merged: 47 commits, Sep 26, 2024


@cmgosnell (Member) commented Sep 12, 2024

Overview

Closes #3838. There is a tasklist in the issue!

What problem does this address?

Most of the work here has been in cleaning the date formats 🙄

  • datetimes omigosh....
    • 🟢 convert_dates_to_zero_offset_hours_xbrl: hours are reported as 01-24, or as 01-00 where 00 falls on the next day (I probably put in some emails to FERC about this)
    • 🟢 convert_dates_to_zero_seconds: the last record of some days is stamped at the last second of the day (T23:59)
    • GAPS. There are 41 gaps in the timeseries
    • there are lots of overlapping timestamps right now
  • ...
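A minimal stdlib sketch of the two date fixes and the gap/overlap checks listed above. The helper names (normalize_timestamp, find_gaps_and_duplicates) are hypothetical, not PUDL's actual functions. It assumes raw stamps look like "YYYY-MM-DDTHH:MM[:SS]", that Hour 24 means 00:00 of the next day, and that a stamp at minute 59 belongs to the start of the next hour.

```python
from collections import Counter
from datetime import datetime, timedelta


def normalize_timestamp(raw: str) -> datetime:
    """Convert a raw 'YYYY-MM-DDTHH:MM[:SS]' stamp to a zero-offset, zero-second hour.

    Hypothetical helper; assumes the data is hourly.
    """
    date_part, time_part = raw.split("T")
    if time_part.startswith("24"):
        # Hour 24 of day D is treated as hour 00 of day D + 1.
        return datetime.fromisoformat(date_part) + timedelta(days=1)
    ts = datetime.fromisoformat(raw)
    if ts.minute == 59:
        # Stamps at the last minute/second of an hour (e.g. T23:59 or T23:59:59)
        # are snapped forward to the start of the next hour.
        ts = ts.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return ts


def find_gaps_and_duplicates(stamps):
    """Return (gaps, duplicates) for a collection of hourly timestamps."""
    counts = Counter(stamps)
    # Overlaps: the same hour reported more than once.
    duplicates = sorted(ts for ts, n in counts.items() if n > 1)
    # Gaps: consecutive distinct hours that are more than one hour apart.
    ordered = sorted(counts)
    gaps = [
        (prev, nxt)
        for prev, nxt in zip(ordered, ordered[1:])
        if nxt - prev > timedelta(hours=1)
    ]
    return gaps, duplicates


raw = [
    "2021-12-31T22:00",
    "2021-12-31T23:00",
    "2021-12-31T24:00",     # Hour 24 -> 2022-01-01 00:00
    "2021-12-31T23:59:59",  # last second -> 2022-01-01 00:00 (overlap)
    "2022-01-01T03:00",     # leaves a gap after 00:00
]
stamps = [normalize_timestamp(s) for s in raw]
gaps, duplicates = find_gaps_and_duplicates(stamps)
print(duplicates)
print(gaps)
```

In this toy input the Hour 24 and last-second conventions land on the same hour, which is one way overlapping timestamps could arise after normalization.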

TO-DO once all tables are merged in

  • get rid of _post_process function

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list


@cmgosnell cmgosnell requested a review from aesharpe September 12, 2024 20:17
@cmgosnell cmgosnell self-assigned this Sep 12, 2024
@cmgosnell cmgosnell added the ferc714 Anything having to do with FERC Form 714 label Sep 13, 2024
@aesharpe aesharpe added the data-update When fresh data is integrated into PUDL from quarterly or annual updates label Sep 16, 2024
@aesharpe (Member)

Just a lil lonely comment, because it's not part of the code you edited, so I can't comment directly on the line...

The docstring for the OFFSET_CODES dictionary seems to have a copy-and-paste error in it?

@aesharpe (Member) left a comment

This looks really good! I found it really easy to go through all the functions and understand what was happening to the table. Most of my comments are non-blocking. Love that all of the functions are bite-sized! :)

Six review threads on src/pudl/transform/ferc714.py, all resolved (five marked outdated).
@aesharpe (Member)

Oh also, @cmgosnell, a reminder to add some color to the metadata/resources/ferc714.py module about the out_ferc714__hourly_planning_area_demand table.

@cmgosnell cmgosnell marked this pull request as ready for review September 24, 2024 15:30
@cmgosnell cmgosnell requested a review from aesharpe September 25, 2024 17:40
Comment on lines 23 to +24
 ferc714_xbrl_to_sqlite_settings:
-  years: [2021, 2022]
+  years: [2021, 2023]
@cmgosnell (Member Author)

The years here and in the pudl-etl settings were out of sync (this file had 2022 and the PUDL ETL settings had 2023). This was causing several errors and was hard to track down because it only happened when I re-ran the xbrl_to_sqlite step. Because of this, I added a validation (and a corresponding unit test) to the EtlSettings.
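A rough sketch of the kind of settings check described above. The function name and the first_xbrl_year cutoff are hypothetical (PUDL's real validation lives on EtlSettings and may be structured differently). It assumes only ETL years from first_xbrl_year onward need to have been extracted by xbrl_to_sqlite, since the earlier years come from the older dbf-era data.

```python
from typing import Iterable


def check_xbrl_years_extracted(
    xbrl_to_sqlite_years: Iterable[int],
    etl_years: Iterable[int],
    first_xbrl_year: int,
) -> None:
    """Raise ValueError if the ETL wants XBRL-era years that were never extracted.

    Hypothetical helper, not PUDL's EtlSettings validator.
    """
    extracted = set(xbrl_to_sqlite_years)
    # Only XBRL-era years are expected to exist in the xbrl_to_sqlite output.
    wanted = {year for year in etl_years if year >= first_xbrl_year}
    missing = sorted(wanted - extracted)
    if missing:
        raise ValueError(
            f"ETL years {missing} are not in the xbrl_to_sqlite years "
            f"{sorted(extracted)}; the two settings are out of sync."
        )


# Mirrors the bug above: xbrl_to_sqlite had [2021, 2022], the ETL asked for 2023.
try:
    check_xbrl_years_extracted([2021, 2022], [2019, 2020, 2021, 2023], first_xbrl_year=2021)
except ValueError as err:
    print(err)
```

Failing at settings-load time means the mismatch surfaces immediately instead of only after re-running xbrl_to_sqlite.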

Comment on lines 415 to 425
# sort here and then don't sort in the groupby so we can process
# the newer years of data first. This is so we can see early if
# new data causes any failures.
df = df.sort_index(ascending=False)
for year, gdf in df.groupby(df.index.year, sort=False):
    if year in years:
        logger.info(f"Imputing year {year}")
        keep = df.columns[~gdf.isnull().all()]
        tsi = pudl.analysis.timeseries_cleaning.Timeseries(gdf[keep])
        result = tsi.to_dataframe(tsi.impute(method="tnn"), copy=False)
        results.append(result)
@cmgosnell (Member Author)

I did two things here:

  • make the newer years run first, because this takes 15 minutes to run (omigosh) and presumably any new failures will come from the new years.
  • restrict the years that this impute is run on, because the XBRL data introduced "Hour 24" records that fall on the first second of Jan 1 of the following year, and this impute needs... well, at least more than one record for the whole year.
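The two changes can be sketched without pandas: walk the years present in the data from newest to oldest, and skip any year not in the configured years (such as a year whose only record is a rolled-over Hour 24 stamp). impute_newest_first and the impute_year callback are hypothetical stand-ins for the Timeseries imputation in the snippet above.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Callable, Collection, Iterable, List, TypeVar

T = TypeVar("T")


def impute_newest_first(
    timestamps: Iterable[datetime],
    years: Collection[int],
    impute_year: Callable[[int], T],
) -> List[T]:
    """Call impute_year(year) for each configured year in the data, newest first."""
    records_per_year = Counter(ts.year for ts in timestamps)
    results = []
    # Newest first, like sort_index(ascending=False) + groupby(sort=False).
    for year in sorted(records_per_year, reverse=True):
        if year not in years:
            # e.g. a year whose only record is an "Hour 24" stamp that
            # rolled over to 00:00 on Jan 1.
            continue
        results.append(impute_year(year))
    return results


start = datetime(2021, 1, 1)
stamps = [start + timedelta(hours=h) for h in range(2 * 8760)]  # all of 2021 and 2022
stamps.append(datetime(2023, 1, 1))  # stray rolled-over Hour 24 record
order = impute_newest_first(stamps, years={2021, 2022}, impute_year=lambda year: year)
print(order)
```

Passing the imputation in as a callback keeps the ordering and filtering logic testable without running the slow imputation itself.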

Member

Hmm, ok! I thought we fixed the weird hour data for XBRL. I remember the issue was with Hour 0 vs. Hour 24, etc., but maybe we just made them all Hour 24. Should we correct it so it's not the first second of the new year?
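One possible answer to this question, sketched under the assumption that the raw hours are hour-ending labels 1 to 24: label each record by the start of its hour instead. Hour 24 on Dec 31 then becomes 23:00 on Dec 31 rather than 00:00 on Jan 1, so every record stays in its own calendar year. hour_ending_to_start is a hypothetical helper, not something in PUDL; the result is equivalent to shifting the zero-offset timestamps back by one hour.

```python
from datetime import date, datetime, timedelta


def hour_ending_to_start(day: date, hour_ending: int) -> datetime:
    """Map an hour-ending label (1..24) on `day` to the start of that hour."""
    if not 1 <= hour_ending <= 24:
        raise ValueError(f"hour_ending must be in 1..24, got {hour_ending}")
    midnight = datetime(day.year, day.month, day.day)
    # Hour ending 1 covers 00:00-01:00, so it starts at midnight;
    # hour ending 24 covers 23:00-24:00, so it starts at 23:00 the same day.
    return midnight + timedelta(hours=hour_ending - 1)


print(hour_ending_to_start(date(2022, 12, 31), 24))  # 2022-12-31 23:00:00
```

The trade-off is that downstream users would need to know the timestamps mark interval starts, so this is a convention choice rather than a pure bug fix.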

@cmgosnell cmgosnell enabled auto-merge September 25, 2024 22:24
@aesharpe (Member) left a comment

We don't have any tests for the yearly_forecast table, but I think it's ok!

Comment on lines +409 to +410
# impute_ferc714_hourly_demand_matrix chunks over years at a time
# and having only one record
Member

This looks unfinished

Comment on lines +411 to +416
if year in years:
logger.info(f"Imputing year {year}")
keep = df.columns[~gdf.isnull().all()]
tsi = pudl.analysis.timeseries_cleaning.Timeseries(gdf[keep])
result = tsi.to_dataframe(tsi.impute(method="tnn"), copy=False)
results.append(result)
Member

Maybe we should add a check here to make sure there aren't too many year values outside of years
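A minimal sketch of the suggested check, assuming the timestamps have already been normalized. check_out_of_range_years and the max_fraction threshold (0.1% here) are made up for illustration; a real threshold would need to come from looking at the data.

```python
from datetime import datetime, timedelta
from typing import Collection, Iterable


def check_out_of_range_years(
    timestamps: Iterable[datetime],
    years: Collection[int],
    max_fraction: float = 0.001,
) -> int:
    """Return the number of records outside `years`; raise if there are too many."""
    stamps = list(timestamps)
    outside = sum(1 for ts in stamps if ts.year not in years)
    fraction = outside / len(stamps) if stamps else 0.0
    if fraction > max_fraction:
        raise ValueError(
            f"{outside} of {len(stamps)} records ({fraction:.3%}) fall outside "
            f"the configured years {sorted(years)}"
        )
    return outside


start = datetime(2021, 1, 1)
stamps = [start + timedelta(hours=h) for h in range(2 * 8760)]  # all of 2021 and 2022
stamps.append(datetime(2023, 1, 1))  # one rolled-over Hour 24 record is tolerated
print(check_out_of_range_years(stamps, years={2021, 2022}))  # 1
```

A relative threshold tolerates the expected one-record spillover per year while still failing loudly if, say, a whole day of data lands outside the configured years.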

@cmgosnell cmgosnell added this pull request to the merge queue Sep 25, 2024
Merged via the queue into main with commit b291160 Sep 26, 2024
17 checks passed
@cmgosnell cmgosnell deleted the transform-714-xbrl branch September 26, 2024 01:17
Labels: data-update, ferc714
Projects: Archived in project
Development: merging this pull request may close the issue "Write transform function to clean and normalize FERC 714 XBRL hourly historic load table"
4 participants