Transform PHMSA company data #4005

Draft. Wants to merge 54 commits into main.
@e-belfer (Member) commented Jan 7, 2025

Overview

Addresses part of #3770. Takes over @seeess1's PR #3929.

What problem does this address?

The goal of this PR is to complete the first transformation of raw PHMSA data into a core asset.

What did you change?

  • Added a new transform script for PHMSA data.
  • Made one helper method (fix_eia_na) more generic, since it can be applied across datasets, and updated its references accordingly.
  • Added a few other helper methods.
  • Updated the column map files with 2023 values (copied from the existing 2022 mappings).
  • Reordered the columns in src/pudl/package_data/phmsagas/column_maps/yearly_distribution.csv, because the fax columns were mapped to the email columns and vice versa.
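The idea behind generalizing an EIA-specific NA helper can be illustrated with a small pandas function that replaces placeholder strings with real missing values in any dataset. This is a minimal sketch, not the code in this PR: the function name `standardize_na_values`, its `na_patterns` parameter, and the default patterns (a lone period, or an empty or whitespace-only string) are hypothetical assumptions chosen for illustration.

```python
from typing import Optional, Sequence

import numpy as np
import pandas as pd

# Hypothetical defaults: a lone "." or an empty/whitespace-only string counts as NA.
DEFAULT_NA_PATTERNS = [r"^\.$", r"^\s*$"]


def standardize_na_values(
    df: pd.DataFrame, na_patterns: Optional[Sequence[str]] = None
) -> pd.DataFrame:
    """Replace placeholder NA spellings with np.nan (hypothetical sketch).

    Not tied to any one data source: callers can pass dataset-specific
    regex patterns instead of relying on the defaults.
    """
    patterns = list(DEFAULT_NA_PATTERNS if na_patterns is None else na_patterns)
    # DataFrame.replace with regex=True returns a new frame; numeric columns
    # are left untouched because the patterns only match strings.
    return df.replace(to_replace=patterns, value=np.nan, regex=True)
```

Passing the patterns as a parameter keeps EIA-style defaults available while letting a PHMSA transform supply its own placeholder spellings.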

Remaining tasks

  • Update data DOI
  • Write an Alembic migration
  • Fix test failures
  • Get validation asset checks running
  • Run the entire ETL to make sure the state encoding doesn't cause problems anywhere else

Documentation

Make sure to update relevant aspects of the documentation.

Tasks


Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list


sam and others added 30 commits September 21, 2024 13:53
@e-belfer e-belfer self-assigned this Jan 7, 2025
@e-belfer e-belfer added new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration community labels Jan 7, 2025
Projects
Status: In progress