Transform PHMSA company data #4005

Draft
wants to merge 54 commits into
base: main
54 commits
d41717d
Update phmsagas DOI and start transformation
Sep 21, 2024
ba27b7e
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Sep 23, 2024
b40071e
Starting data transformation
Sep 23, 2024
90fd277
Update notebook and change etl_fast phmsagas years
Sep 26, 2024
c5406b1
Add 2023 package data columns for new phmsagas run
Sep 27, 2024
9fc3f45
Added documentation
Oct 1, 2024
c3e2c67
Add troubleshooting to index
Oct 1, 2024
1bfb71d
Update troubleshooting
Oct 2, 2024
7760e9a
Add helpers
Oct 6, 2024
e23bd61
Temp add change
Oct 8, 2024
b5c7acd
Update column mappings
Oct 11, 2024
035712e
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Oct 11, 2024
5d7d00f
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Oct 18, 2024
9518e83
Update notebook and add draft transform script
Oct 24, 2024
418cd55
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Oct 24, 2024
66b63b2
Remove old files and cleanup helpers
Oct 24, 2024
cfc62d4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 24, 2024
3225680
Resolved merge conflicts
Nov 1, 2024
f276746
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2024
ac67b39
Resolve merge conflicts
Nov 1, 2024
57209e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 1, 2024
6b2747e
Remove list of columns
Nov 1, 2024
d6bb6ea
Remove '.0' logic
Nov 1, 2024
1769487
Updates in response to comments
Nov 2, 2024
405cc1c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2024
78cf151
Clean up documentation and logic
Nov 2, 2024
0c69a84
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2024
044dcd5
Use bulk series str ops
Nov 3, 2024
d8be474
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 3, 2024
c5aa116
Reorder transformations
Nov 3, 2024
aa3ac9c
Remove .0 substring from phone numbers
Nov 3, 2024
e3ec14e
Remove temp dev logic
Nov 3, 2024
e84f348
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Nov 3, 2024
7f94d13
Cleanup notebook
Nov 3, 2024
f1ba3dc
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
Nov 16, 2024
cb6c767
Make updates per PR feedback
Nov 24, 2024
05cc8ce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 24, 2024
1d3db43
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
seeess1 Nov 24, 2024
2979526
Merge remote-tracking branch 'upstream/main' into issue-3770-transfor…
seeess1 Dec 6, 2024
9f57d77
Cleanup method description
seeess1 Dec 6, 2024
94d4d5d
Merge branch 'issue-3770-transform-phmsagas-data' of https://github.c…
seeess1 Dec 6, 2024
6f504e5
Update inits, classes, and fields
seeess1 Dec 7, 2024
759f1e6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 7, 2024
f1fa7a3
Deduplication and test updates
seeess1 Dec 12, 2024
2ae5dfb
Merge branch 'issue-3770-transform-phmsagas-data' of https://github.c…
seeess1 Dec 12, 2024
d5f5ffe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2024
62a821c
Merge branch 'main' into phmsa-company-transform
e-belfer Jan 7, 2025
875da71
Extract new PHMSA data, fix state enum, add alembic migration
e-belfer Jan 7, 2025
06ae880
Merge branch 'phmsa-company-transform' of https://github.com/catalyst…
e-belfer Jan 7, 2025
9ac7858
Address ruff failures and unit test failure, move analyzing code to n…
e-belfer Jan 7, 2025
520c287
Get asset checks to run
e-belfer Jan 7, 2025
95e9a5e
Merge branch 'main' into phmsa-company-transform
e-belfer Jan 7, 2025
50ca79a
Merge branch 'main' into phmsa-company-transform
e-belfer Jan 7, 2025
3cfd6fc
Update release notes
e-belfer Jan 7, 2025
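
Two commits in the list above ("Remove '.0' logic" and "Remove .0 substring from phone numbers") concern phone numbers that picked up a trailing ".0", which typically happens when numeric-looking values are read as floats and later converted to strings. The PR's actual implementation is not shown on this page, so the following is only a minimal sketch of that kind of cleanup. It uses plain Python with the standard `re` module rather than pandas, and `strip_float_suffix` is a hypothetical name, not a PUDL helper. In pandas, a comparable bulk operation could be written with `Series.str.replace(r"\.0$", "", regex=True)`.

```python
import re

# Hypothetical helper for illustration only; not taken from the PR's code.
# Matches a literal ".0" at the very end of the string.
_TRAILING_POINT_ZERO = re.compile(r"\.0$")


def strip_float_suffix(value):
    """Return value as a string without a trailing '.0'; pass None through."""
    if value is None:
        return None
    return _TRAILING_POINT_ZERO.sub("", str(value))


# Made-up sample values: a float, a float-like string, a normal string, a null.
raw_phones = [5551234567.0, "5559876543.0", "555-000-1111", None]
cleaned = [strip_float_suffix(p) for p in raw_phones]
print(cleaned)
```

Anchoring the pattern with `$` means only a suffix is removed, so a value such as "10.05" is left untouched.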
7 changes: 7 additions & 0 deletions docs/release_notes.rst
@@ -18,6 +18,13 @@ EIA 176
 * Extracted these interim tables up through the latest 2023 data release. See
   :issue:`4002` and :pr:`4004`.

+PHMSA
+~~~~~
+* Add a transformed table containing annual operator data from PHMSA natural gas
+  distributors. This is a subset of the overall distributor data, focusing on
+  company-level attributes. Thanks to :user:`seeess1` for all of your work on this! See
+  :issue:`3770` and :pr:`4005`.
+
 Bug Fixes
 ^^^^^^^^^

4 changes: 2 additions & 2 deletions notebooks/work-in-progress/eia861-transform.ipynb
@@ -1438,7 +1438,7 @@
 "traceback": [
 "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
 "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
-"\u001b[0;32m/var/folders/tf/l271ymp92vvbty6j01j580xm0000gn/T/ipykernel_27389/914883076.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Prep raw data for comparison\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mraw_dfs_dict\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0meia861_raw_dfs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mdf_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdf\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mraw_dfs_dict\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpudl\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhelpers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfix_eia_na\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+"\u001b[0;32m/var/folders/tf/l271ymp92vvbty6j01j580xm0000gn/T/ipykernel_27389/914883076.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Prep raw data for comparison\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mraw_dfs_dict\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0meia861_raw_dfs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mdf_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdf\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mraw_dfs_dict\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpudl\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhelpers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstandardize_na_values\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
 "\u001b[0;31mNameError\u001b[0m: name 'eia861_raw_dfs' is not defined"
 ]
 }
@@ -1448,7 +1448,7 @@
 "raw_dfs_dict = eia861_raw_dfs.copy()\n",
 "\n",
 "for df_name, df in raw_dfs_dict.items():\n",
-"    df = pudl.helpers.fix_eia_na(df)\n",
+"    df = pudl.helpers.standardize_na_values(df)\n",
 "    df = pudl.helpers.convert_to_date(df)\n",
 "    raw_dfs_dict[df_name] = df\n",
 " \n",