Welcome to the FFC backend repository!
This project serves as a comprehensive data analytics platform built on freely available data from the official NHL API.
The repository is organized into these primary sections:
- Cloud Infrastructure (details below)
- Data Models for Transformation
- Research Notebooks
The AWS CDK defines the AWS services used for data storage and transformation.
The project follows a standard Python project setup. Upon initialization, a virtualenv is created within the project, stored under the .venv directory. To set up the virtualenv, assuming a python3 (or python for Windows) executable in your path with access to the venv package, follow these steps:
- Create the virtualenv by running the following in the project root directory:
python3 -m venv .venv
- Activate the virtualenv:
. .venv/bin/activate
- Install required dependencies:
pip install -r requirements.txt
- Synthesize the CloudFormation template for AWS CDK code:
cdk synth
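The virtualenv step above can also be scripted with the standard-library venv module, which may be convenient on machines where the shell commands differ. The following is a minimal sketch, not part of this repository: the helper names create_project_venv and venv_python are hypothetical, and it only covers creating the environment and locating its interpreter.

```python
import sys
import venv
from pathlib import Path


def create_project_venv(project_root: Path, with_pip: bool = True) -> Path:
    """Create a .venv directory under project_root and return its path."""
    env_dir = project_root / ".venv"
    # Equivalent to running `python3 -m venv .venv` in project_root.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside the virtualenv for this platform."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    env = create_project_venv(Path.cwd())
    print(f"Created {env}; install dependencies with:")
    print(f"  {venv_python(env)} -m pip install -r requirements.txt")
```

Calling the virtualenv's interpreter directly, as printed above, avoids the activation step, which is shell-specific.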
Before deployment, some useful commands include:
- cdk ls: list all stacks in the app
- cdk diff: compare deployed stack with current state
- cdk docs: open CDK documentation
To deploy all stacks, run cdk deploy. To deploy a specific stack, pass its stack ID, for example cdk deploy StorageStack, cdk deploy ComputeStack, or cdk deploy TransformStack.
No guarantees are made regarding the quality of the data. NHL data might contain known issues and biases.
Production data is downloaded and saved to AWS S3 via the download-raw-games lambda function. Triggered daily, it fetches details about the previous night's games.
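To illustrate the daily trigger, here is a rough sketch of how a job like this might work out which date to fetch and where to store the result. It is not the code of download-raw-games: the function names, the use of UTC, and the date-partitioned key layout (raw/games/YYYY/MM/DD/games.json) are all assumptions made for the example.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def previous_game_date(now: Optional[datetime] = None) -> date:
    """Return the calendar date one day before `now` (UTC is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=1)).date()


def raw_games_key(game_date: date, prefix: str = "raw/games") -> str:
    """Build a hypothetical date-partitioned S3 key for one day's games."""
    return f"{prefix}/{game_date:%Y/%m/%d}/games.json"


if __name__ == "__main__":
    target = previous_game_date()
    print(f"Would fetch games for {target} into {raw_games_key(target)}")
```

Partitioning keys by date keeps each daily run idempotent: rerunning the job for the same day overwrites one object rather than creating duplicates.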
For historical data, run the following inside the Docker container defined in the notebooks/ folder to download the data to local disk, then manually upload the results to AWS S3:
python /usr/src/app/src/extract/teams.py
python /usr/src/app/src/extract/players.py
python /usr/src/app/src/extract/games.py
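The manual upload step could be scripted along the following lines. This is a sketch under stated assumptions: the local output directory, bucket name, and key prefix are placeholders that this document does not specify, plan_uploads and upload_directory are hypothetical helpers, and boto3 must be installed with AWS credentials configured for the upload itself.

```python
from pathlib import Path
from typing import Iterator, Tuple


def plan_uploads(local_dir: Path, prefix: str) -> Iterator[Tuple[Path, str]]:
    """Yield (local file, S3 key) pairs for every file under local_dir."""
    for path in sorted(local_dir.rglob("*")):
        if path.is_file():
            # Use POSIX separators so keys look the same on every platform.
            relative = path.relative_to(local_dir).as_posix()
            yield path, f"{prefix.rstrip('/')}/{relative}"


def upload_directory(local_dir: Path, bucket: str, prefix: str) -> int:
    """Upload every planned file with boto3 and return the number uploaded."""
    import boto3  # imported lazily so planning works without boto3 installed

    s3 = boto3.client("s3")
    count = 0
    for path, key in plan_uploads(local_dir, prefix):
        s3.upload_file(str(path), bucket, key)
        count += 1
    return count
```

Running plan_uploads on its own first gives a dry run of the keys that would be written before anything is sent to S3.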
- Articles
- Data sources
- Inspiration for analysis