Please visit https://www.pg-protagonists.com for additional background and a detailed analysis of the results.
This repository represents a capstone project for the University of Michigan, Master of Applied Data Science program.
The U.S. power grid is both interesting and incredibly complex. It is governed by 70+ balancing authorities and comprises 13,000+ power plants, 77,000+ substations, 160,000+ miles of high-voltage power lines, and millions of low-voltage power lines and distribution transformers connecting customers around the United States. The notebooks in this repository seek to achieve the following:
- Visualize the U.S. network of power plants, substations, and balancing authorities
- Identify potential vulnerabilities in the U.S. power grid
- Provide an inherent risk assessment based on network measures, outages, weather events, etc.
The 01_data_collection.ipynb notebook will download and organize all data sets from their source location. The combined size of all data sets is approximately 400MB.
The primary data sets are below:
- U.S. Energy Information Administration (EIA)
- U.S. Environmental Protection Agency (EPA)
- Department of Homeland Security (DHS)
- Department of Energy (DOE)
The data sources used to create the network required significant cleaning. The entity relationship diagram, post-cleaning, is included below as a reference.
If you run the notebooks locally, the following command installs the packages listed in requirements.txt.
# install requirements
$ pip install -r requirements.txt
The Jupyter notebooks are designed to be run in a specific order, both to clean and enrich the original data sets and to allow additional exploration at different stages. The notebooks can be run locally, or directly in Google Colab using the links below.
Each notebook will need access to data from the prior notebook, so if you are running in Google Colab, you will want to adjust the data storage location in the "Mount Drive" section.
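As an illustration of the storage adjustment described above, here is a minimal sketch of a helper that picks a data directory depending on where the notebook runs. The function name `resolve_data_dir`, the `local_root` parameter, and the Drive subfolder `power_grid` are hypothetical and not taken from the notebooks. Only `drive.mount` and the `/content/drive/MyDrive` path are standard Colab behavior.

```python
from pathlib import Path


def resolve_data_dir(local_root=None, drive_subdir="power_grid"):
    """Return the base data directory and make sure data/raw/ exists.

    In Google Colab the directory lives on the mounted Drive; otherwise it
    lives under local_root (default: the current working directory).
    The names used here are illustrative assumptions.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        root = Path(local_root) if local_root is not None else Path.cwd()
    else:
        drive.mount("/content/drive")
        root = Path("/content/drive/MyDrive") / drive_subdir

    data_dir = root / "data"
    # Notebook 01 expects a data/raw/ folder; create it if missing.
    (data_dir / "raw").mkdir(parents=True, exist_ok=True)
    return data_dir
```

Calling `resolve_data_dir()` at the top of each notebook would give later notebooks the same location that earlier ones wrote to, provided the same subfolder name is used throughout.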
- 01_data_collection.ipynb
  This notebook downloads all of the raw data sets and creates the necessary folder structure (data/raw/) in your working directory.
- 02_data_cleaning.ipynb
  This notebook performs various cleaning activities on the raw data sets, including cross-referencing them to ensure they share proper primary and foreign keys.
- 03_network_analysis.ipynb
  This notebook imports the cleaned data and creates two networks: one of power plants and substations, and one of power plants and balancing authorities. It also calculates degree centrality, betweenness centrality, and clustering coefficients, and combines those metrics with the cleaned data. Because some of the metrics, such as betweenness centrality, take a long time to calculate, pickle files are provided in the models directory and used by default. The code to recompute them is available in the notebook and can be uncommented.
- 04_electric_disturbance_events.ipynb
  This notebook imports the cleaned data and uses it to calculate the probability of a disturbance and/or outage at the power plant, substation, and balancing authority levels. The results are combined with the cleaned data for downstream analysis.
- 05_energy_forecasting.ipynb
  This notebook imports time series data to explore the seasonality of balancing authorities and to forecast energy generation and demand.
- 06_risk_analysis.ipynb
  This notebook leverages the metrics and analysis from prior notebooks to explore the risk associated with substations and balancing authorities.
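The load-from-pickle-by-default pattern described for 03_network_analysis.ipynb can be sketched roughly as follows. This is an illustration only: it assumes the networks are built with NetworkX, which the text above does not state, and the helper names `cached_betweenness` and `build_toy_network`, the node labels, and the cache path are all hypothetical.

```python
import pickle
from pathlib import Path

import networkx as nx  # assumption: the notebooks may use a different library


def cached_betweenness(graph, cache_path, recompute=False):
    """Load betweenness centrality from a pickle file, or compute and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists() and not recompute:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    # Slow on large graphs, which is why the result is cached.
    scores = nx.betweenness_centrality(graph)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(scores, fh)
    return scores


def build_toy_network():
    """Build a tiny made-up plant/substation network for demonstration."""
    graph = nx.Graph()
    graph.add_edges_from([
        ("plant_A", "sub_1"),
        ("plant_B", "sub_1"),
        ("sub_1", "sub_2"),
        ("sub_2", "plant_C"),
    ])
    return graph


def network_metrics(graph, cache_path):
    """Return the three metrics named above, keyed by metric then node."""
    return {
        "degree": nx.degree_centrality(graph),
        "betweenness": cached_betweenness(graph, cache_path),
        "clustering": nx.clustering(graph),
    }
```

For example, `network_metrics(build_toy_network(), "models/toy_betweenness.pkl")` computes and stores the betweenness scores on the first call and reads them back from the pickle on later calls. Passing `recompute=True` to `cached_betweenness` plays the role of uncommenting the recomputation code.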
Some of the visuals and information from these notebooks can be found below.
This project is distributed under the MIT License.
- Paul Natland
- Garrett Woody