PUDL Example Notebooks

This repository contains a collection of Jupyter notebooks with examples of how to use the data and software distributed by Catalyst Cooperative's Public Utility Data Liberation (PUDL) project.

Run PUDL Notebooks on Kaggle

The easiest way to get up and running with these examples and a fresh copy of all the PUDL data is on Kaggle.

Kaggle offers substantial free computing resources and convenient data storage, so you can start playing with the PUDL data without needing to set up any software or download any data.

You'll find the PUDL data dictionary helpful for interpreting the data.

Running Jupyter locally

If you're already familiar with git, Python environments, filesystem paths, and running Jupyter notebooks locally, you can also work with these notebooks and the PUDL data on your own machine:

  • Create a Python environment that includes common data science packages. We like to use the mamba package manager and the conda-forge channel.
  • Clone this repository.
  • Download the PUDL dataset from Kaggle (it's ~20GB!) and unzip it somewhere conveniently accessible from the notebooks in the cloned repo.
  • Start your JupyterLab or Jupyter Notebook server and navigate to the notebooks in the cloned repo.
  • Adjust the file paths in the notebooks to point at the directory where you put the PUDL data. You may also need to adjust the packages installed in your Python environment so the notebooks run.
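
One way to keep the path adjustment in a single place is to resolve the data directory once at the top of a notebook. The following is only a minimal sketch, not part of the notebooks themselves: the `PUDL_DATA_DIR` environment variable, the `~/pudl-data` default, the helper names, and the file suffixes it looks for are all assumptions made for illustration.

```python
import os
from pathlib import Path


def find_pudl_data_dir(default="~/pudl-data"):
    """Return the directory holding the unzipped PUDL data.

    PUDL_DATA_DIR is a hypothetical environment variable used only in this
    sketch; the default location is likewise an assumption.
    """
    data_dir = Path(os.environ.get("PUDL_DATA_DIR", default)).expanduser()
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"No PUDL data directory at {data_dir}; "
            "set PUDL_DATA_DIR or pass a different default."
        )
    return data_dir


def list_data_files(data_dir, suffixes=(".parquet", ".sqlite")):
    """List data files under data_dir, sorted by path.

    The suffixes are an assumption about what the download contains;
    change them to match the files you actually unzipped.
    """
    return sorted(
        p for p in Path(data_dir).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
```

In a notebook you might call `data_dir = find_pudl_data_dir()` once, then build every file path from `data_dir` rather than hard-coding absolute paths in each cell.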

Other Data Access Methods

See the PUDL documentation for other data access methods.

If you're familiar with cloud services, you can check out:

Stalk us on the Internet