This project is a boilerplate for a small data platform that can also run on your local machine.
Ensure the tools that the commands in this README rely on are installed: direnv, uv, Lefthook, Docker with Compose, and the DuckDB CLI.
Run the following commands to set up the platform.
cp .env.dist .env
direnv allow
mkdir -p "${SD_DATA_DIR}"
uv sync
source .venv/bin/activate
lefthook install
docker compose up -d
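Before running the setup commands above, it can help to confirm that the command-line tools they call are available on `PATH`. The following is a minimal sketch, not part of this repository: the tool list is inferred from the setup commands, and the names `REQUIRED_TOOLS` and `missing_tools` are hypothetical.

```python
import shutil

# Tools inferred from the setup commands above (an assumption, not an official list).
REQUIRED_TOOLS = ["direnv", "uv", "lefthook", "docker"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when every tool is found, so a wrapper script could print the result and exit early if anything is missing.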
The platform consists of the following layers.
- Storage
- Ingestion
- Transformation
- Visualization
- Experimentation
- Orchestration
- Persists data
- Consists of multiple DuckDB databases.
duckdb ${SD_DATA_DIR}/jaffle_shop.db
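The same database file can also be opened from Python. The sketch below shows one way to resolve the path from `SD_DATA_DIR` and connect to it. It assumes the `duckdb` Python package is installed in the project environment; the helper names `database_path` and `connect` are hypothetical and do not come from this repository.

```python
import os
from pathlib import Path


def database_path(name, data_dir=None):
    """Build the path of a DuckDB file such as ${SD_DATA_DIR}/jaffle_shop.db."""
    # Fall back to the SD_DATA_DIR environment variable when no directory is given.
    base = data_dir if data_dir is not None else os.environ["SD_DATA_DIR"]
    return Path(base) / f"{name}.db"


def connect(name, read_only=False):
    """Open the named database; assumes the `duckdb` package is installed."""
    import duckdb  # imported lazily so the path helper works without it

    return duckdb.connect(str(database_path(name)), read_only=read_only)
```

For example, `connect("jaffle_shop", read_only=True)` opens the database without taking a write lock, which is useful when another process is writing to it.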
- Collects data from various sources and stores it in the storage layer
- Consists of simple Python scripts
uv run src/ingestion/jaffle_shop.py
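To illustrate what a simple ingestion script might look like, here is a minimal sketch that loads one CSV file into a DuckDB table. It is not the actual content of `src/ingestion/jaffle_shop.py`: the CSV source, the function names `load_statement` and `ingest`, and any table name passed in are assumptions for illustration. It relies on the `duckdb` Python package and DuckDB's `read_csv_auto` table function.

```python
import os
from pathlib import Path


def load_statement(table, csv_path):
    """Build SQL that replaces `table` with the contents of a CSV file."""
    # Escape single quotes so the path is a valid SQL string literal.
    literal = str(csv_path).replace("'", "''")
    return (
        f'CREATE OR REPLACE TABLE "{table}" AS '
        f"SELECT * FROM read_csv_auto('{literal}')"
    )


def ingest(table, csv_path, database="jaffle_shop"):
    """Load one CSV file into a DuckDB database under SD_DATA_DIR."""
    import duckdb  # assumes the `duckdb` package is installed

    db_file = Path(os.environ["SD_DATA_DIR"]) / f"{database}.db"
    con = duckdb.connect(str(db_file))
    try:
        con.execute(load_statement(table, csv_path))
    finally:
        con.close()
```

Using `CREATE OR REPLACE TABLE` keeps the script idempotent, so rerunning it replaces the table rather than appending duplicate rows.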
- Transforms data in the storage layer
- Consists of multiple dbt projects
cd src/transformation/jaffle_shop
dbt deps
dbt run
- Visualizes data in the storage layer
- Consists of Streamlit applications
streamlit run src/visualization/jaffle_shop/Welcome.py
- Experiments with data in the storage layer
- Consists of Jupyter notebooks
jupyter notebook src/experimentation
- Orchestrates the processes
- Consists of Invoke tasks
invoke jaffle-shop
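This README does not spell out what the `jaffle-shop` task runs. As a rough illustration of the orchestration idea, the sketch below chains the ingestion and transformation commands listed earlier using only the standard library. It is a stand-in for the Invoke tasks, not their actual definition: the step order and the names `PIPELINE` and `run_pipeline` are assumptions.

```python
import subprocess

# Assumed order: ingest first, then build the dbt models. Each step is
# (working directory, command), using commands listed in this README.
PIPELINE = [
    (".", ["uv", "run", "src/ingestion/jaffle_shop.py"]),
    ("src/transformation/jaffle_shop", ["dbt", "deps"]),
    ("src/transformation/jaffle_shop", ["dbt", "run"]),
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each step in order and stop at the first failing command."""
    for cwd, command in steps:
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(command, cwd=cwd, check=True)
```

Calling `run_pipeline()` from the repository root executes the steps sequentially. The `runner` parameter exists so the sequence can be exercised in tests without launching real processes.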