final changes
fanxu30 committed Dec 14, 2024
1 parent a014e58 commit 92dc51c
Showing 4 changed files with 49 additions and 53 deletions.
93 changes: 40 additions & 53 deletions README.md
@@ -1,5 +1,4 @@
# fanxu_template
Data engineering individual project #1
# Cloud Hosted Notebook

[![Install](https://github.com/nogibjj/fanxu_template/actions/workflows/install.yml/badge.svg)](https://github.com/nogibjj/fanxu_template/actions/workflows/install.yml)

@@ -9,63 +8,51 @@ Data engineering individual project #1

[![Format](https://github.com/nogibjj/fanxu_template/actions/workflows/format.yml/badge.svg)](https://github.com/nogibjj/fanxu_template/actions/workflows/format.yml)

This project is for performing data manipulations on a cloud-hosted notebook

The link to the notebook can be accessed here: https://colab.research.google.com/drive/1lmU__izUOrgAEXc9Zdg5R7jm1xkz-ZbB?usp=sharing

Requirements

The project structure must include the following files:
- Jupyter Notebook with:
- Cells that perform descriptive statistics using Polars or pandas.
- Tested by using nbval plugin for pytest
- Makefile with the following:
- Run all tests (must test notebook and script and lib)
- Formats code with Python black
- Lints code with Ruff
- Installs code via: pip install -r requirements.txt
- test_script.py to test script
- test_lib.py to test library
- Pinned requirements.txt
- GitHub Actions performs all four Makefile commands with badges for each one in the README.md
- Set up a cloud-hosted Jupyter Notebook (e.g., Google Colab)
- Perform data manipulation tasks on a sample dataset

Dataset
- Basketball Reference 2023-2024 NBA Player Stats: Per Game
- https://www.basketball-reference.com/leagues/NBA_2024_per_game.html#per_game_stats

Required Files

- requirements.txt
- required dependencies to run this file
- provides required versions of devops and web components
- Makefile
- instructions to install, format, lint, and test python files
- devcontainer
- devcontainer.json
- contains docker container for python 3 dependencies
- script.py
- contains code to use pandas to read dataset, generate summary statistics, visualization, and a report
- lib.py
- contains shared code between script.py and notebook
- test_script.py
- contains code to test script.py file
- test_lib.py
- tests lib.py file
- workflows
- install.yml
- installs required python packages and dependencies
- lint.yml
- lints python code
- test.yml
- performs tests on required python files
- format.yml
- properly formats code
- .gitignore
- ignores unnecessary files and programs to prevent installation conflicts

Steps
- set up github repository files such as requirements.txt, Makefile, devcontainer, hello.yml, etc.
- create script.py file containing python script to load in CSV file, create summary statistics, plot visualization, and generate a summary report
- test script.py file by making a test_script.py file
- perform a CI/CD run verifying that the code has passed all the linters and tests

Video Walkthrough:

https://youtu.be/hxRWFt41aqw
Project Structure:
```
📦 fan_xu_cloud_nb
├─ .devcontainer
│  └─ devcontainer.json
├─ .github
│  ├─ dependabot.yml
│  └─ workflows
│     ├─ format.yml
│     ├─ install.yml
│     ├─ lint.yml
│     └─ test.yml
├─ Makefile
├─ NBA_24_stats.csv
├─ NBA_pts_bar.png
├─ README.md
├─ __pycache__
│  └─ lib.cpython-312.pyc
├─ gitignore
├─ lib.py
├─ requirements.txt
├─ script.py
├─ test_lib.py
└─ test_script.py
```
©generated by [Project Tree Generator](https://woochanleee.github.io/project-tree-generator)

## Instructions

To use this code, you can clone this repo with:

`git clone https://github.com/nogibjj/fan_xu_cloud_nb.git`

To access the cloud notebook, click on the link above
File renamed without changes
Binary file added images/stats.png
9 changes: 9 additions & 0 deletions walkthrough.md
@@ -0,0 +1,9 @@
These statistics were collected from Basketball Reference for the 2023-2024 NBA season. They list the 735 NBA players who participated in the season, ranked by their points per game.

![alt text](<images/stats.png>)

The summary statistics show a few things. A few columns, such as field goal percentage and free throw percentage, have missing observations, but most columns are complete. You can also see that the average NBA player is 26 years old and played in 40 games that season.
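As a rough sketch of how numbers like these can be produced with pandas, the snippet below builds a tiny made-up table and computes the same two things: per-column summary statistics and missing-value counts. The rows are invented for illustration, and the column names (`Age`, `G`, `FG%`, `FT%`) are assumed to match the Basketball Reference export; this is not the project's actual `script.py`.

```python
import pandas as pd

# Hypothetical four-row stand-in for the per-game table.
# Values are made up; column names are assumed, not taken from the real CSV.
sample = pd.DataFrame(
    {
        "Player": ["Player A", "Player B", "Player C", "Player D"],
        "Age": [22, 26, 30, 26],
        "G": [10, 40, 70, 40],
        "FG%": [0.45, None, 0.50, 0.40],
        "FT%": [0.80, 0.75, None, None],
    }
)

# describe() reports count, mean, std, min, quartiles, and max
# for every numeric column; the "count" row excludes missing values.
summary = sample.describe()

# isna().sum() gives the number of missing observations per column.
missing = sample.isna().sum()

print(summary.loc[["count", "mean"]])
print(missing[missing > 0])
```

A column whose `count` is lower than the number of rows is one with missing observations, which is how the gaps in the percentage columns show up in the summary table.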

![alt text](<images/NBA_pts_bar.png>)

This graph shows the points per game for players who shot better than 50% from three-point range for the season. Only one player averaged more than 8 points per game, and most players were well below that. This illustrates how difficult it is to maintain such a high three-point percentage with volume shooting.
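The filter behind a chart like this can be sketched in a few lines of pandas. The example below uses invented rows and assumes the three-point percentage and points columns are named `3P%` and `PTS`; it illustrates the idea rather than reproducing the code that generated the figure.

```python
import pandas as pd

# Hypothetical rows; names, values, and column labels are assumptions.
players = pd.DataFrame(
    {
        "Player": ["Player 1", "Player 2", "Player 3", "Player 4", "Player 5"],
        "3P%": [0.55, 0.38, 1.00, 0.60, None],
        "PTS": [8.5, 27.1, 1.2, 3.4, 5.0],
    }
)

# Keep players who shot better than 50% from three.
# Rows with a missing 3P% compare as False and are dropped.
hot_shooters = players[players["3P%"] > 0.5].sort_values("PTS", ascending=False)

# Count how many of those players averaged more than 8 points per game.
above_8 = int((hot_shooters["PTS"] > 8).sum())

print(hot_shooters[["Player", "PTS"]])
print(above_8)
```

With matplotlib installed, `hot_shooters.plot.bar(x="Player", y="PTS")` would draw a bar chart of the filtered players.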
