Initial commit
fanxu30 authored Nov 4, 2024
0 parents commit fce69c1
Showing 15 changed files with 1,051 additions and 0 deletions.
22 changes: 22 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,22 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/python
{
    "name": "Python 3",
    // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
    "image": "mcr.microsoft.com/devcontainers/python:1-3.12-bullseye"

    // Features to add to the dev container. More info: https://containers.dev/features.
    // "features": {},

    // Use 'forwardPorts' to make a list of ports inside the container available locally.
    // "forwardPorts": [],

    // Use 'postCreateCommand' to run commands after the container is created.
    // "postCreateCommand": "pip3 install --user -r requirements.txt",

    // Configure tool-specific properties.
    // "customizations": {},

    // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
    // "remoteUser": "root"
}
12 changes: 12 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,12 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for more information:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
# https://containers.dev/guide/dependabot

version: 2
updates:
  - package-ecosystem: "devcontainers"
    directory: "/"
    schedule:
      interval: weekly
20 changes: 20 additions & 0 deletions .github/workflows/format.yml
@@ -0,0 +1,20 @@
name: CI
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.12"]
    steps:
      - uses: actions/checkout@v3
      - name: install packages
        run: make install
      - name: format
        run: make format
18 changes: 18 additions & 0 deletions .github/workflows/install.yml
@@ -0,0 +1,18 @@
name: CI
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.12"]
    steps:
      - uses: actions/checkout@v3
      - name: install packages
        run: make install
20 changes: 20 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,20 @@
name: CI
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.12"]
    steps:
      - uses: actions/checkout@v3
      - name: install packages
        run: make install
      - name: lint
        run: make lint
20 changes: 20 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,20 @@
name: CI
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.12"]
    steps:
      - uses: actions/checkout@v3
      - name: install packages
        run: make install
      - name: test
        run: make test
17 changes: 17 additions & 0 deletions Makefile
@@ -0,0 +1,17 @@
install:
	pip install --upgrade pip && pip install -r requirements.txt

format:
	black *.py

# checks python files
lint:
	#pylint --ignore-patterns=test_*.py *.py
	ruff check *.py

test:
	python -m pytest --cov=script --cov=lib
	py.test --nbval

# runs install, format, lint, and test as prerequisites
all: install format lint test
740 changes: 740 additions & 0 deletions NBA_24_stats.csv

Large diffs are not rendered by default.

71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
# fanxu_template
Data engineering individual project #1

[![Install](https://github.com/nogibjj/fanxu_template/actions/workflows/install.yml/badge.svg)](https://github.com/nogibjj/fanxu_template/actions/workflows/install.yml)

[![Lint](https://github.com/nogibjj/fanxu_template/actions/workflows/lint.yml/badge.svg)](https://github.com/nogibjj/fanxu_template/actions/workflows/lint.yml)

[![Test](https://github.com/nogibjj/fanxu_template/actions/workflows/test.yml/badge.svg)](https://github.com/nogibjj/fanxu_template/actions/workflows/test.yml)

[![Format](https://github.com/nogibjj/fanxu_template/actions/workflows/format.yml/badge.svg)](https://github.com/nogibjj/fanxu_template/actions/workflows/format.yml)


Requirements

The project structure must include the following files:
- Jupyter Notebook with:
    - Cells that perform descriptive statistics using Polars or Pandas
    - Tested by using the nbval plugin for pytest
- Makefile with the following:
    - Run all tests (must test notebook and script and lib)
    - Formats code with Python black
    - Lints code with Ruff
    - Installs code via: pip install -r requirements.txt
- test_script.py to test script
- test_lib.py to test library
- Pinned requirements.txt
- GitHub Actions performs all four Makefile commands with badges for each one in the README.md

Dataset
- Basketball Reference 2023-2024 NBA Player Stats: Per Game
- https://www.basketball-reference.com/leagues/NBA_2024_per_game.html#per_game_stats

Required Files

- requirements.txt
    - lists the dependencies required to run this project
    - provides required versions of devops and web components
- Makefile
    - instructions to install, format, lint, and test python files
- devcontainer
    - devcontainer.json
        - specifies the Docker image for the Python 3 development container
- script.py
    - contains code that uses pandas to read the dataset and generate summary statistics, a visualization, and a report
- lib.py
    - contains code shared between script.py and the notebook
- test_script.py
    - contains code to test the script.py file
- test_lib.py
    - tests the lib.py file
- workflows
    - install.yml
        - installs required python packages and dependencies
    - lint.yml
        - lints python code
    - test.yml
        - performs tests on required python files
    - format.yml
        - formats python code with black
- .gitignore
    - ignores unnecessary files and programs to prevent installation conflicts

Steps
- set up GitHub repository files such as requirements.txt, Makefile, devcontainer, the workflow .yml files, etc.
- create script.py file containing python script to load in CSV file, create summary statistics, plot visualization, and generate a summary report
- test script.py file by making a test_script.py file
- perform a CI/CD run verifying that the code has passed all the linters and tests
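
The filter that script.py applies before plotting (players with a 3P% of 0.5 or higher) can be illustrated without the real dataset. The sketch below is not the project's code: it uses only the standard library instead of pandas, the rows in `TOY_CSV` are made up, and the helper names `load_rows`, `accurate_shooters`, and `mean_points` are hypothetical. It mirrors the pandas behaviour in which a row with a missing 3P% does not satisfy `>= 0.5` and is therefore left out.

```python
import csv
import io
import statistics

# Hypothetical toy rows, NOT taken from NBA_24_stats.csv. Only the column
# names ("Player", "PTS", "3P%") mirror the ones script.py relies on.
TOY_CSV = """Player,PTS,3P%
Player A,20.0,0.55
Player B,10.0,0.30
Player C,4.0,
Player D,8.0,0.50
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per player."""
    return list(csv.DictReader(io.StringIO(text)))


def accurate_shooters(rows, threshold=0.5):
    """Keep rows whose 3P% is present and at least `threshold`."""
    return [r for r in rows if r["3P%"] and float(r["3P%"]) >= threshold]


def mean_points(rows):
    """Average the PTS column over the given rows."""
    return statistics.mean(float(r["PTS"]) for r in rows)


rows = load_rows(TOY_CSV)
selected = accurate_shooters(rows)
print([r["Player"] for r in selected])
print(mean_points(selected))
```

Asserting on values like these, rather than only calling `summary` and `points_plot`, is one way test_script.py could detect a change in the computed numbers.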

Video Walkthrough:

https://youtu.be/hxRWFt41aqw
1 change: 1 addition & 0 deletions gitignore
@@ -0,0 +1 @@
__pycache__
9 changes: 9 additions & 0 deletions lib.py
@@ -0,0 +1,9 @@
"""file for modules"""

import pandas as pd


def load_data(dataset):
    "load data from csv into pandas dataframe"
    data = pd.read_csv(dataset)
    return data
22 changes: 22 additions & 0 deletions requirements.txt
@@ -0,0 +1,22 @@
#devops
black==22.3.0
click==8.1.3
pytest==7.4.0
pytest-cov==4.0.0
pylint==2.15.3
boto3==1.24.87
nbval==0.11.0
ruff==0.6.7

#web
fastapi == 0.85.0
uvicorn == 0.18.3

#math
#unpinned due to package conflicts
pandas
matplotlib>=3.5, <3.10
ydata-profiling

#processing
markdownify==0.13.1
45 changes: 45 additions & 0 deletions script.py
@@ -0,0 +1,45 @@
"""main file with main functions"""

from lib import load_data
import matplotlib.pyplot as plt
import markdownify as md
from ydata_profiling import ProfileReport

data = "NBA_24_stats.csv"


def summary(dataset):
    """provides summary statistics"""
    df = load_data(dataset)
    summary_stats = df.describe()
    print(summary_stats)


def points_plot(dataset):
    """provides visualization"""
    df = load_data(dataset)
    accurate = df[df["3P%"] >= 0.5]
    player_rank = accurate["Player"].astype(str)
    plt.barh(player_rank, width=accurate["PTS"], color="green")
    plt.xlabel("PPG")
    plt.ylabel("Players")
    plt.title("PPG for Players with 50% or higher 3P%")
    plt.subplots_adjust(left=0.25)
    plt.savefig("NBA_pts_bar.png")
    plt.show()


def report(dataset):
    "generates profile report and saves it as markdown"
    df = load_data(dataset)
    profile = ProfileReport(df, title="NBA Statistics")
    export = profile.to_html()
    markdown = md.markdownify(export)
    with open("NBA_report.md", "w", encoding="utf-8") as f_write:
        f_write.write(markdown)


if __name__ == "__main__":
    summary(data)
    points_plot(data)
    report(data)
14 changes: 14 additions & 0 deletions test_lib.py
@@ -0,0 +1,14 @@
"""file for testing lib.py"""

from lib import load_data


def test_load_data():
    """test load_data function"""
    dataset = "NBA_24_stats.csv"
    result_load = load_data(dataset)
    assert result_load is not None


if __name__ == "__main__":
    test_load_data()
20 changes: 20 additions & 0 deletions test_script.py
@@ -0,0 +1,20 @@
""" file for testing code"""

from script import summary, points_plot


def test_summary():
    """test descriptive statistics function"""
    data = "NBA_24_stats.csv"
    summary(data)


def test_plot():
    """test plot function"""
    data = "NBA_24_stats.csv"
    points_plot(data)


if __name__ == "__main__":
    test_summary()
    test_plot()
