FERC 714: transform of hourly demand table (dbf +xbrl) #3842
Conversation
Just a lil lonely comment because it's not part of the code you edited, so I can't comment directly on the line... The doc strings for the …
This looks really good! I found it really easy to go through all the functions and understand what was happening to the table. Most of my comments are non-blocking. Love that all of the functions are bite-sized! :)
- …osing all the report_dates plus lots of documentation
- Add respondent ID csv
- Add notes columns to CSV

oh also, @cmgosnell reminder to add some color to the …

- …ferc714__yearly_planning_area_demand_forecast table
- …perative/pudl into ferc714-data-source
- Fixes to the 714 data source docs page
```diff
 ferc714_xbrl_to_sqlite_settings:
-  years: [2021, 2022]
+  years: [2021, 2023]
```
The years here and in the pudl-etl settings were disjoint (here had 2022 and the pudl-etl settings had 2023). This was causing several errors and was hard to track down because it only happened when I re-ran the xbrl_to_sqlite stuff. Because of this, I added a validation (and a corresponding unit test) into the EtlSettings.
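For context, a minimal sketch of the kind of years cross-check described above. The class and field names here are illustrative stand-ins, not the real PUDL EtlSettings model (which is pydantic-based); the point is just failing fast at settings-construction time:

```python
from dataclasses import dataclass


@dataclass
class ToyEtlSettings:
    """Toy stand-in for a settings model that cross-checks two year lists."""

    xbrl_years: list[int]  # years xbrl_to_sqlite is configured to load
    etl_years: list[int]   # years the pudl-etl settings will process

    def __post_init__(self) -> None:
        # Raise immediately if the ETL requests years that the
        # xbrl_to_sqlite step was never configured to load, instead of
        # erroring deep inside a transform on a re-run.
        missing = set(self.etl_years) - set(self.xbrl_years)
        if missing:
            raise ValueError(
                f"ETL years missing from XBRL settings: {sorted(missing)}"
            )


# Consistent settings construct cleanly; disjoint ones raise ValueError.
settings = ToyEtlSettings(xbrl_years=[2021, 2023], etl_years=[2021, 2023])
```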
src/pudl/analysis/state_demand.py
```python
# sort here and then don't sort in the groupby so we can process
# the newer years of data first. This is so we can see early if
# new data causes any failures.
df = df.sort_index(ascending=False)
for year, gdf in df.groupby(df.index.year, sort=False):
    if year in years:
        logger.info(f"Imputing year {year}")
        keep = df.columns[~gdf.isnull().all()]
        tsi = pudl.analysis.timeseries_cleaning.Timeseries(gdf[keep])
        result = tsi.to_dataframe(tsi.impute(method="tnn"), copy=False)
        results.append(result)
```
I did two things here:
- Made the newer years run first, because this takes 15 minutes to run (omigosh) and presumably any new failures will show up in the new years.
- Restricted the years this impute is run on, because the XBRL data introduced "Hour 24" records that are the first second of Jan 1 of the following year, and this impute needs... well, at least more than one record for the whole year.
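To make that failure mode concrete, here is a small illustrative sketch (not the actual PUDL transform) of how a one-offset "Hour 24" record lands on the first second of the next year, and how shifting to zero-offset hours removes the stray year:

```python
import pandas as pd

# One-offset hours for the last day of 2022; "hour 24" is the culprit.
raw = pd.DataFrame(
    {"date": ["2022-12-31"] * 3, "hour": [22, 23, 24], "demand_mwh": [5.0, 4.0, 3.0]}
)
ts = pd.to_datetime(raw["date"]) + pd.to_timedelta(raw["hour"], unit="h")
# The hour-24 record parses as 2023-01-01 00:00:00, so a groupby on
# ts.dt.year would see a "2023" group containing exactly one record.
assert ts.dt.year.tolist() == [2022, 2022, 2023]

# Zero-offset fix: shift every timestamp back one hour so hour 24 becomes
# 23:00 of the same day and the single-record year disappears.
fixed = ts - pd.Timedelta(hours=1)
assert fixed.dt.year.tolist() == [2022, 2022, 2022]
```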
Hmm, ok! I thought we fixed the weird hour data for XBRL. I remember the issue was with Hour 0 vs. Hour 24 etc., but maybe we just made them all Hour 24. Should we correct it so it's not the first second of the new year?
We don't have any tests for the yearly_forecast table, but I think it's ok!
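If a test ever does seem worth adding, even a cheap range check would catch gross problems. A sketch, with a hypothetical column name that is not the real schema:

```python
import pandas as pd


def check_forecast_years(forecast: pd.DataFrame) -> None:
    """Raise if any forecast year falls outside a plausible window.

    The "forecast_year" column name and the 2006-2060 bounds are
    illustrative assumptions, not the actual table schema.
    """
    years = forecast["forecast_year"]
    if not years.between(2006, 2060).all():
        raise ValueError("implausible forecast years in yearly forecast table")


# Passes quietly on plausible data.
check_forecast_years(pd.DataFrame({"forecast_year": [2022, 2030, 2045]}))
```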
```python
# impute_ferc714_hourly_demand_matrix chunks over years at a time
# and having only one record
```
This looks unfinished
```python
if year in years:
    logger.info(f"Imputing year {year}")
    keep = df.columns[~gdf.isnull().all()]
    tsi = pudl.analysis.timeseries_cleaning.Timeseries(gdf[keep])
    result = tsi.to_dataframe(tsi.impute(method="tnn"), copy=False)
    results.append(result)
```
Maybe we should add a check here to make sure there aren't too many `year` values outside of `years`.
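One way that check could look — a sketch, where the function name and the tolerated-extra-years threshold are made up for illustration:

```python
import pandas as pd


def check_unimputed_years(
    df: pd.DataFrame, years: set[int], max_extra: int = 1
) -> None:
    """Raise if more than max_extra index years would be skipped by the impute."""
    extra = set(df.index.year) - years
    if len(extra) > max_extra:
        raise ValueError(f"too many years outside the impute range: {sorted(extra)}")


# Yearly-start timestamps for 2021, 2022, and 2023.
idx = pd.date_range("2021-01-01", "2023-01-01", freq="YS")
df = pd.DataFrame({"demand_mwh": [1.0, 2.0, 3.0]}, index=idx)
# One stray year (2023, the spillover "Hour 24" year) is tolerated.
check_unimputed_years(df, years={2021, 2022})
```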
Overview
Closes #3838. There is a tasklist in the issue!

What problem does this address?
Most of the work here has been in cleaning the date formats 🙄

- `convert_dates_to_zero_offset_hours_xbrl`: hours being 01-24 or 01-00 (of next day.. probably. I put in some emails to FERC about this)
- `convert_dates_to_zero_seconds`: some last record of the days being the last second of the day (T23:59)

TO-DO once all tables are merged in:

- `_post_process` function

Testing
How did you make sure this worked? How can a reviewer verify this?

To-do list