dbt-mimesis

Overview

This repository provides a framework for testing dbt data pipelines using:

Mimesis: A Python library for generating realistic fake data.
Pydantic: For parsing and validating schema.yml files.
dbt (data build tool): To manage transformations and run tests agains data pipelines.

The goal is to enable robust pipeline testing with realistic, schema-compliant fake data - without relying on sensitive production datasets.

Features

Test Data Generation: Automatically generate fake data based on dbt schemas (e.g., including constraints such as primary keys, foreign keys, nullability, uniqueness).
Referential Integrity: Ensure primary/foreign key relationships are respected in generated fake datasets.
CI/CD Integration: Includes a GitHub Actions pipeline to automate test data generation and dbt commands.

Prerequisites

Python 3.10
Poetry
Docker (optional, for the development container)

Quick Start

1. Clone the Repository

git clone https://github.com/inovex/blog-dbt-mimesis
cd blog-dbt-mimesis

2. Set Up Your Environment

Option 1: Use the Development Container (Recommended)

The repository includes a Development Container specification for quick setup.

Option 2: Manual Setup

Install the required dependencies:

# Install Python dependencies
poetry install

# Install dbt dependencies
cd dbt_mimesis_example
poetry run dbt deps

As this project uses duckdb, you also must install DuckDB CLI on your machine. You can follow this guide to install it. Then, run the following command to create a database file inside the dbt_mimesis_example directory:

duckdb dev.duckdb "SELECT 'Database created successfully';"

# navigate back to the root of the repository
cd ../

3. Generate Test Data

You can use the following command to generate some test data based on dbt_mimesis_example/seeds/schema.yml:

poetry run python data_generator/main.py \
    --dbt-model-path dbt_mimesis_example/seeds/schema.yml \
    --output-path dbt_mimesis_example/seeds/ \
    --min-rows <MIN_NUM_OF_ROWS> \
    --max-rows <MAX_NUM_OF_ROWS>

Using the --min-rows and --max-rows flags, you can specify the minimum/maximum amount of rows to be created for each table. Each table's row count will be a random number within the specified range.

4. Run dbt pipeline

You can now load the generated seed data to the duckdb database, execute downstream dbt models and perform dbt tests:

cd dbt_mimesis_example
# load seeds into duckdb
poetry run dbt seed

# run dbt models
poetry run dbt run

# perform dbt data tests
poetry run dbt test

Repository Structure

.
├── data_generator              # Python code to generate data
│   ├── generator.py            # implements the TestDataGenerator class
│   ├── __init__.py
│   ├── main.py                 # implements the main method to generate data for a dbt schema
│   ├── models.py               # Pydantic models to validate dbt schemas
├── dbt_mimesis_example
│   ├── dbt_project.yml         # dbt project definition
│   ├── dependencies.yml        # dbt dependencies
│   ├── macros                  # dbt macros
│   ├── models                  # dbt models
│   │   ├── airplanes.sql       # model for the airplanes table
│   │   ├── cities.sql          # model for the cities table
│   │   ├── flights.sql         # model for the flights table
│   │   └── schema.yml          # schema definition for the dbt models
│   ├── profiles.yml            # dbt profile
│   ├── README.md
│   ├── seeds
│   │   └── schema.yml          # dbt Seeds schema definition
│   ├── snapshots               # dbt snapshots
│   └── tests                   # dbt tests
├── poetry.lock
├── pyproject.toml
└── README.md

CI/CD Integration

This repository includes a GitHub Actions pipeline that:

Generates test data automatically.
Runs all dbt commands (e.g., seed, run, test) upon pull request or push to the main branch.
Fails if any of the dbt commands exits with a non zero exit code.

Resources

For more details, check out:

Happy testing! 🎉

Name		Name	Last commit message	Last commit date
Latest commit History 85 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
data_generator		data_generator
dbt_mimesis_example		dbt_mimesis_example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dbt-mimesis

Overview

Features

Prerequisites

Quick Start

1. Clone the Repository

2. Set Up Your Environment

Option 1: Use the Development Container (Recommended)

Option 2: Manual Setup

3. Generate Test Data

4. Run dbt pipeline

Repository Structure

CI/CD Integration

Resources

About

Releases

Packages

Languages

inovex/blog-dbt-mimesis

Folders and files

Latest commit

History

Repository files navigation

dbt-mimesis

Overview

Features

Prerequisites

Quick Start

1. Clone the Repository

2. Set Up Your Environment

Option 1: Use the Development Container (Recommended)

Option 2: Manual Setup

3. Generate Test Data

4. Run dbt pipeline

Repository Structure

CI/CD Integration

Resources

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages