okeyaki/smalldata

Smalldata

This project is a boilerplate for a small data platform that can also run on your local machine.

🚀 Quickstart

Prerequisites

Ensure you have the tools used by the installation commands below installed:

  • direnv
  • uv
  • Lefthook
  • Docker (with Docker Compose)

Installation

Run the following commands to set up the platform.

cp .env.dist .env

direnv allow

mkdir -p "${SD_DATA_DIR}"

uv sync

source .venv/bin/activate

lefthook install

docker compose up -d

🏗️ Architecture

The platform consists of the following layers.

  • Storage
  • Ingestion
  • Transformation
  • Visualization
  • Experimentation
  • Orchestration

📌 Storage

  • Persists data
  • Consists of multiple DuckDB databases

Samples

duckdb "${SD_DATA_DIR}/jaffle_shop.db"
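
The same database file can also be opened from Python. The following is a minimal sketch, assuming the `duckdb` Python package is available in the project environment and `SD_DATA_DIR` is exported (for example by direnv). The helper names `database_path` and `connect` are hypothetical and are not taken from the repository.

```python
import os
from pathlib import Path


def database_path(name: str) -> Path:
    """Return the path of the DuckDB file `<SD_DATA_DIR>/<name>.db`."""
    return Path(os.environ["SD_DATA_DIR"]) / f"{name}.db"


def connect(name: str, read_only: bool = False):
    """Open a connection to one of the platform's DuckDB databases."""
    # Imported lazily so that resolving paths does not require DuckDB.
    import duckdb

    return duckdb.connect(str(database_path(name)), read_only=read_only)
```

For example, `connect("jaffle_shop", read_only=True).sql("SHOW TABLES").show()` would list the tables in the sample database. Opening it read-only is a reasonable default for exploration, since it cannot modify the data written by the other layers.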

📌 Ingestion

  • Collects data from various sources and stores it in the storage layer
  • Consists of simple Python scripts

Samples

uv run src/ingestion/jaffle_shop.py
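
To illustrate what a "simple Python script" in this layer might look like, here is a sketch that loads a CSV file into a DuckDB table. It is not the content of `src/ingestion/jaffle_shop.py`. The CSV source, the table name, and the function names `load_statement` and `ingest` are assumptions made for illustration.

```python
from pathlib import Path


def load_statement(table: str, csv_path: Path) -> str:
    """Build the SQL that (re)creates `table` from a CSV file."""
    # Escape single quotes so that the path is a valid SQL string literal.
    source = csv_path.as_posix().replace("'", "''")
    return (
        f"CREATE OR REPLACE TABLE {table} AS "
        f"SELECT * FROM read_csv_auto('{source}')"
    )


def ingest(db_path: Path, table: str, csv_path: Path) -> int:
    """Load one CSV file into the database and return the row count."""
    # Assumption: the duckdb package is installed in the project environment.
    import duckdb

    with duckdb.connect(str(db_path)) as con:
        con.execute(load_statement(table, csv_path))
        return con.execute(f"SELECT count(*) FROM {table}").fetchone()[0]
```

Using `CREATE OR REPLACE TABLE` keeps the script idempotent, so rerunning it replaces the table instead of appending duplicate rows. A real source may need incremental loading instead.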

📌 Transformation

  • Transforms data in the storage layer
  • Consists of multiple dbt projects

Samples

cd src/transformation/jaffle_shop

dbt deps

dbt run

📌 Visualization

  • Visualizes data in the storage layer
  • Consists of Streamlit applications

Samples

streamlit run src/visualization/jaffle_shop/Welcome.py

📌 Experimentation

Notes

jupyter notebook src/experimentation

📌 Orchestration

  • Orchestrates the processes of the layers above
  • Consists of Invoke tasks

Samples

invoke jaffle-shop
