Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write transform function to clean and normalize FERC 714 XBRL hourly historic load table #3838

Closed
21 of 22 tasks
Tracked by #3809
aesharpe opened this issue Sep 9, 2024 · 2 comments · Fixed by #3842
Closed
21 of 22 tasks
Tracked by #3809
Assignees
Labels
data-update When fresh data is integrated into PUDL from quarterly or annual updates ferc714 Anything having to do with FERC Form 714

Comments

@aesharpe
Copy link
Member

aesharpe commented Sep 9, 2024

Tasks

Preview Give feedback

out of scope here -> issue or pr or otherwise

@cmgosnell
Copy link
Member

cmgosnell commented Sep 17, 2024

should we add the respondent_id_ferc714_csv and respondent_id_ferc714_xbrl into the table schema?

reasons to have the source ids in the table

because this hourly table is a parquet table not a db table it doesn’t really have foreign key relationships to these soon to be small glue tables the stores the map between the pudl-derived ids and these new ids. i assume most users will primarily be familiar with or searching for a utility with one of these _{source} native IDs. usually we merge these things back into the tables during the out_ de-normalization phase but we don't do that to this table.

reasons to not have the source ids in the table

but adding these two columns into an already very large table is not the tidiest. one will always be null (null xbrl id in csv years and vice versa)... although we could fill in the values for their respective other source years.

@cmgosnell
Copy link
Member

cmgosnell commented Sep 18, 2024

re plotting the demand timeseries, here are a few things I did. Nothing hugely bad jumped out at me tbh.

  • plot full timeseries for all respondents - done with new pudl-derived respondent ids to see if there were weird disjointed things that would maybe indicate we mapped the ID's incorrectly. i was also looking for new weird big missing bits or new trends that were real different from the past. I did find a good handful of big gaps and periods where the demands was suddenly higher or lower in the CSV years but i'm assuming that's oos for this work
  • plot 2019-now for all respondents: looking for similar oddities about the
  • plot one month at a time. i tried a few months separately all from the new data. mostly for this one i was looking for phase shifts that could indicate that we got the timezone wrong. believe it or not nothing jumped out at me. I was mostly looking for which hour the peaks happened... this felt very unscientific I bet we could find some other way to do this more

a few fully unblocking imo respondent_id_ferc714's just to note for maybe later cleaning:

fully_empty = [42,47, 48,175,180,31, 97] # these guys looked always 0 or null
big_one_time_spikes = [19,161, 77] # mostly in csv years but these looked anomalous 
long_big_spikes = [16] # one ~year long spike back in the csv years

this is my little code snippet that i used for plot things.. this is the most complicated one so you can remove specific pieces of the mask to expand it

for rid in df.respondent_id_ferc714.unique():
    recent = df[(df.respondent_id_ferc714 == rid) & (df.datetime_utc.dt.month == 5) & (df.datetime_utc.dt.year == 2022)]
    if not recent.empty or recent.demand_mwh.notnull().any():
        recent.sort_values("datetime_utc").plot("datetime_utc", "demand_mwh", title=f"Respondent ID {rid}")

        plt.show()

some examples

these are not super helpful on their own but you get the gist
all time:
image
recent years:
image

one month:
image

@github-project-automation github-project-automation bot moved this from In review to Done in Catalyst Megaproject Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data-update When fresh data is integrated into PUDL from quarterly or annual updates ferc714 Anything having to do with FERC Form 714
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants