xdev-developer/gen4s

Gen4s - data generator tool for developers and QA engineers.

Gen4s is a powerful data generation tool designed for developers and QA engineers.

Features:

  • Data Generation: Gen4s allows users to generate up-to-date data and publish it to their systems. This is particularly useful for testing and development purposes.

  • Maintain Test Data: Gen4s enables users to maintain test data in the file system or repository, ensuring that the data is always accessible and up-to-date.

  • Data Sharing: With Gen4s, users can easily share test data with their team, improving collaboration and efficiency.

  • Support for Different Profiles: Gen4s supports different profiles such as dev, local, QA, etc. This allows users to switch between different environments as needed.

  • Generation Scenarios: Gen4s supports running generation scenarios. These can be used to publish data, wait, and then publish another portion of data, simulating event time processing.

  • Load Testing: Gen4s is capable of load testing your system by publishing millions of messages. This can help identify potential performance issues.

  • Semi-Generation of Data: Gen4s supports semi-generation of data, where users can generate a CSV file from their database and use it as part of the data generation schema.

  • Command Line Execution: Gen4s can be executed directly from the command line, providing a simple and efficient way to generate data.

  • Environment Variables Profile Loading: Gen4s allows loading environment variables from a file and applying them, which can be useful for managing different runtime environments.

  • Support for Multiple Output Formats: Gen4s supports various output formats including stdout, Kafka, Avro, Protobuf, file system, and HTTP.

  • Schema Definition and Data Generators: Gen4s provides a variety of data generators for different data types and structures, including static values, timestamps, numbers, strings, UUIDs, IP addresses, and more.

  • Scenario Configuration: Gen4s allows for the configuration of multiple stages in a scenario, with configurable delays between stages and the number of samples to generate.

Installation

Using Homebrew

To install Gen4s with Homebrew, tap the xdev-developer/tap repository first, then install the formula:

  1. Open your terminal.
  2. Tap the repository: brew tap xdev-developer/tap
  3. Install Gen4s: brew install gen4s

Manual

Download the latest release from the Releases page, unzip the archive, and run ./bin/gen4s

Running

Gen4s
Usage: gen4s [preview|run|scenario] [options]

  -c, --config <file>      Configuration file. Default ./config.conf
  -p, --profile <file>     Environment variables profile.
  -i, --input-records key=value,key1=value1
                           Key/Value pairs to override generated variable

Command: preview [options]
Preview data generation.
  --pretty                 pretty print
  -s, --samples <number>   Samples to generate, default 1

Command: run [options]
Run data generation stream.
  -s, --samples <number>   Samples to generate, default 1

Command: scenario
Run scenario
  --help                   prints usage info

./bin/gen4s run -c ./examples/playground/config.conf

Running with profile

You can create an environment variable profile for each runtime environment: dev, staging, prod, etc.

Env vars profile file format

dev.profile:

KAFKA_BOOTSTRAP_SERVERS=dev.kafka:9095
ORG_ID=12345

./bin/gen4s run -c ./examples/playground/config.conf -s 5 -p ./profiles/dev.profile
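
To make the profile file format concrete, here is a minimal Python sketch of how a KEY=VALUE profile could be read into a dictionary. It is not Gen4s's own parser: the function name parse_profile is hypothetical, and skipping blank lines and splitting on the first "=" are assumptions.

```python
from typing import Dict


def parse_profile(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines into a dict (illustrative, not Gen4s code)."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw!r}")
        env[key.strip()] = value.strip()
    return env


profile = parse_profile("KAFKA_BOOTSTRAP_SERVERS=dev.kafka:9095\nORG_ID=12345\n")
print(profile)
```

Values loaded this way would then be visible to optional substitutions such as ${?KAFKA_BOOTSTRAP_SERVERS} in the configuration examples.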

Running with value override

./bin/gen4s run -i test-string=hello,test-int=12345 -c ./examples/playground/config.conf

Running scenario

./bin/gen4s scenario -c ./examples/scenario/scenario.conf -p ./profiles/dev.profile

Building from source

Building standalone application:

sbt 'universal:packageXzTarball' OR
sbt 'universal:packageBin'

Building docker image

sbt 'universal:packageXzTarball'
cd app
docker build -t xdev.developer/gen4s:<version> .

Test docker image

docker run xdev.developer/gen4s:<version> bin/gen4s preview --pretty -c examples/playground/config.conf -s 5

Testing

Benchmarking

sbt clean "project benchmarks;jmh:run -i 3 -wi 3 -f3 -t1"

Configuration

input {
    schema = "<path-to>/examples/sample-schema.json"
    template = "<path-to>/examples/sample.template"
}


output {
    writer: {
      type: "std-output"
    }

    transformers: ["json-prettify"]
}

Input

  • schema - path to schema file

  • template - path to template file.

  • decode-new-line-as-template - treat each line in template file as standalone template.

  • csv-records - csv records input file.

  • global-variables - list of global variables. A global variable is generated once per run.

CSV Records streaming

With csv-records streaming, you can render templates that combine values from a CSV file with random generators; see examples/csv-input.

Output

Stdout output

Console output.

output {
    writer: {
      type: "std-output"
    }

    transformers = ["json-prettify"] 
    validators = ["json", "missing-vars"]
}

Kafka output

output {
    writer {
        type = kafka-output

        topic = "logs"
        topic = ${?KAFKA_TOPIC}

        bootstrap-servers = "localhost:9092"
        bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

        batch-size = 1000
                
        headers {
            key = value
        }

        decode-input-as-key-value = true
        
        producer-config {
          compression-type = none # snappy, gzip, lz4
          in-flight-requests =  5
          linger-ms = 15
          max-batch-size-bytes = 1024
          max-request-size-bytes = 512
        }
    }
    transformers = ["json-minify"] 
    validators = ["json", "missing-vars"]
}
  • decode-input-as-key-value: true/false - when true, the input template is decoded as key/value JSON.

    key is produced as the Kafka message key and value as the Kafka message value.

    {
      "key": 1,
      "value": { "id": 1, "timestamp": {{ts}}, "event": "Logged in" }
    }
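
The key/value decoding described above can be sketched in Python as follows. This is an illustration only: split_key_value is a hypothetical name, and serializing both parts as UTF-8 JSON is an assumption about how the message bytes are produced. The input is a template that has already been rendered, so {{ts}} has been replaced with a number.

```python
import json
from typing import Tuple


def split_key_value(rendered: str) -> Tuple[bytes, bytes]:
    """Split a rendered key/value template into message key and value bytes."""
    doc = json.loads(rendered)
    # Assumption: both parts are re-serialized as compact UTF-8 JSON.
    key = json.dumps(doc["key"], separators=(",", ":")).encode("utf-8")
    value = json.dumps(doc["value"], separators=(",", ":")).encode("utf-8")
    return key, value


rendered = '{"key": 1, "value": {"id": 1, "timestamp": 1704067200, "event": "Logged in"}}'
key, value = split_key_value(rendered)
print(key, value)
```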

Kafka AVRO output

output {
    writer {
        type = kafka-avro-output

        topic = "logs-avro"
        topic = ${?KAFKA_TOPIC}

        bootstrap-servers = "localhost:9092"
        bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

        batch-size = 1000
                
        headers {
            key = value
        }

        decode-input-as-key-value = true
        
        producer-config {
          compression-type = gzip
          in-flight-requests =  1
          linger-ms = 15
          max-batch-size-bytes = 1024
          max-request-size-bytes = 512
        }

        avro-config {
          schema-registry-url = "http://localhost:8081"
          schema-registry-url = ${?SCHEMA_REGISTRY_URL}

          key-schema = "/path/to/file/key.avsc"
          value-schema = "/path/to/file/value.avsc"
          auto-register-schemas = false
          registry-client-max-cache-size = 1000
        }
    }
    transformers = []
    validators = ["json", "missing-vars"]
}
  • key-schema - path to key schema, Optional.
  • value-schema - path to value schema, Optional.
  • auto-register-schemas - register schemas in schema-registry.

How schema resolver works:

  • Read from file.
  • When a file isn't provided, gen4s looks up the schema subject in the schema registry (topic_name-key or topic_name-value).
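
The resolution order above can be sketched in Python. The function name schema_source and the returned string labels are hypothetical and exist only to show which source would be chosen; the subject naming (topic_name-key, topic_name-value) comes from the description above.

```python
from typing import Optional


def schema_source(topic: str, part: str, schema_file: Optional[str]) -> str:
    """Describe where the schema for the message key or value would come from."""
    if part not in ("key", "value"):
        raise ValueError("part must be 'key' or 'value'")
    if schema_file:
        # A configured file takes precedence.
        return f"file:{schema_file}"
    # Otherwise fall back to the registry subject named after the topic.
    return f"registry-subject:{topic}-{part}"


print(schema_source("logs-avro", "value", "/path/to/file/value.avsc"))
print(schema_source("logs-avro", "key", None))
```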

Kafka Protobuf output

output {
    writer {
        type = kafka-protobuf-output

        topic = "persons-proto"
        topic = ${?KAFKA_TOPIC}

        bootstrap-servers = "localhost:9092"
        bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

        batch-size = 1000

        headers {
            key = value
        }

        decode-input-as-key-value = true

        proto-config {
          schema-registry-url = "http://localhost:8081"
          schema-registry-url = ${?SCHEMA_REGISTRY_URL}
          
          value-descriptor {
            file = "./examples/kafka-protobuf/person-value.desc"
            message-type = "Person"
          }
          auto-register-schemas = true
          registry-client-max-cache-size = 1000
        }
    }

    transformers = []
    validators = ["json", "missing-vars"]
}
  • value-descriptor - path to protobuf descriptor and message type.
  • auto-register-schemas - register schemas in schema-registry.

Create protobuf descriptor from proto file

A descriptor file can be created with the protoc command:

protoc --include_imports --descriptor_set_out=person-value.desc person-value.proto

or using scalapbc

scalapbc --include_imports --descriptor_set_out=person-value.desc person-value.proto

File System output

output {
    writer {
        type = fs-output
        dir = "/tmp"
        filename-pattern = "my-cool-logs-%s.txt"
    }
    transformers = ["json-prettify"]
    validators = ["json", "missing-vars"]
}

Http output

output {
  writer {
    type = http-output
    url = "http://example.com"
    url = ${?REQUEST_URL}

    method = POST
    headers {
        key = value
    }
    parallelism = 3
    content-type = "application/json"
    stop-on-error = true
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}

AWS S3 Output

output {
  writer {
    type = s-3-output
    bucket = "test-bucket"
    key = "key-%s.json"
    region = "us-east-1"
    endpoint = "http://localhost:4566"
    part-size-mb = 5
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}

The available options for configuring an S3 output are:

  • bucket: This is the name of the S3 bucket where the output data will be written.
  • key: Represents the object key pattern. The %s is a placeholder that will be replaced with a unique identifier.
  • region: This is the AWS region where the S3 bucket is located.
  • endpoint: This is the URL of the S3 service endpoint. This can be useful for testing with local S3-compatible services like LocalStack.
  • part-size-mb: This is used to specify the part size for multipart uploads to the S3 bucket. The value is in megabytes.
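
As a rough illustration of the key pattern, the Python sketch below substitutes the %s placeholder with a unique identifier. The helper object_key is hypothetical, and using a random UUID as the identifier is an assumption; the text above only says the placeholder is replaced with a unique identifier.

```python
import uuid
from typing import Optional


def object_key(pattern: str, unique_id: Optional[str] = None) -> str:
    """Replace the first %s in the key pattern with a unique identifier."""
    # Assumption: a random UUID stands in for whatever identifier Gen4s uses.
    uid = unique_id if unique_id is not None else uuid.uuid4().hex
    return pattern.replace("%s", uid, 1)


print(object_key("key-%s.json"))
```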

Transformers

json-minify - transform generated JSON to compact printed JSON - (removes all new lines and spaces).

json-prettify - transform generated JSON to pretty printed JSON.

Scenario configuration

With a scenario you can run multiple stages, and configure the delay between stages and the number of samples each stage generates.

stages: [
    { name: "Playground", samples: 5, config-file: "./examples/playground/config.conf", delay: 5 seconds},
    { name: "CSV Input",  samples: 3, config-file: "./examples/csv-input/config.conf"}
]
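
The stage loop can be pictured with the Python sketch below. It is not Gen4s's implementation: Stage, run_scenario, and the run_stage callback are hypothetical, and applying a stage's delay after that stage finishes (rather than before it starts) is an assumption. The sleep function is injected so the example runs instantly.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Stage:
    name: str
    samples: int
    config_file: str
    delay_seconds: float = 0.0  # 0 means no delay is configured


def run_scenario(
    stages: Iterable[Stage],
    run_stage: Callable[[Stage], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run stages in order, pausing after each stage that defines a delay."""
    for stage in stages:
        run_stage(stage)
        if stage.delay_seconds > 0:
            sleep(stage.delay_seconds)


stages = [
    Stage("Playground", 5, "./examples/playground/config.conf", delay_seconds=5),
    Stage("CSV Input", 3, "./examples/csv-input/config.conf"),
]
log: List[str] = []
run_scenario(
    stages,
    run_stage=lambda s: log.append(f"run {s.name} x{s.samples}"),
    sleep=lambda secs: log.append(f"wait {secs}s"),  # record instead of sleeping
)
print(log)
```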

Schema definition and data generators

Static value generator

This generator can be used as a template constant (static value).

{ "variable": "id", "type": "static", "value": "id-12332221"}

Timestamp generator

{ "variable": "ts", "type": "timestamp", "unit": "sec"}

unit - timestamp unit, possible values: ms, ns, micros, sec. Default value - ms.

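shiftDays - shift the timestamp by n days; a negative n shifts it into the past. Optional.

shiftHours - shift the timestamp by n hours; a negative n shifts it into the past. Optional.

shiftMinutes - shift the timestamp by n minutes; a negative n shifts it into the past. Optional.

shiftSeconds - shift the timestamp by n seconds; a negative n shifts it into the past. Optional.

shiftMillis - shift the timestamp by n milliseconds; a negative n shifts it into the past. Optional.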
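
How the unit and shift options combine can be sketched in Python. The function generate_timestamp and its snake_case parameters are hypothetical stand-ins for the options listed above, and passing the current time in as an argument is done only to keep the example deterministic.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNITS_PER_SECOND = {"sec": 1, "ms": 1_000, "micros": 1_000_000, "ns": 1_000_000_000}


def generate_timestamp(
    now: datetime,
    unit: str = "ms",
    shift_days: int = 0,
    shift_hours: int = 0,
    shift_minutes: int = 0,
    shift_seconds: int = 0,
    shift_millis: int = 0,
) -> int:
    """Shift `now` by the given offsets and return it as an epoch value in `unit`."""
    shifted = now + timedelta(
        days=shift_days,
        hours=shift_hours,
        minutes=shift_minutes,
        seconds=shift_seconds,
        milliseconds=shift_millis,
    )
    delta = shifted - EPOCH
    # Integer microseconds since the epoch, then scale to the requested unit.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * UNITS_PER_SECOND[unit] // 1_000_000


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(generate_timestamp(now, unit="sec"))
print(generate_timestamp(now, unit="sec", shift_days=-1))
```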

Int number generator.

{ "variable": "my-int", "type": "int", "min": 10, "max": 1000 }

Double number generator.

{ "variable": "test-double", "type": "double", "min": 10.5, "max": 15.5, "scale": 6 }

Boolean generator.

{ "variable": "test-bool", "type": "boolean"}

String generator.

{ "variable": "test-string", "type": "string", "len": 10}

String pattern generator.

{ "variable": "test-string-pattern", "type": "pattern", "pattern": "hello-???-###"} // hello-abc-123
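
Judging by the example output (hello-abc-123), each ? appears to become a letter and each # a digit. The Python sketch below reproduces that behavior under this assumption; from_pattern is a hypothetical name, and restricting letters to lowercase ASCII is also an assumption.

```python
import random
import string
from typing import Optional


def from_pattern(pattern: str, rng: Optional[random.Random] = None) -> str:
    """Replace '?' with a random lowercase letter and '#' with a random digit."""
    rng = rng or random.Random()
    out = []
    for ch in pattern:
        if ch == "?":
            out.append(rng.choice(string.ascii_lowercase))
        elif ch == "#":
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # any other character is copied as-is
    return "".join(out)


print(from_pattern("hello-???-###"))
```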

Java UUID field generator.

{ "variable": "test-uuid", "type": "uuid" }

Ip address generator

{ "variable": "test-ip", "type": "ip", "ipv6": false }

Enumeration generator.

{ "variable": "test-enum", "type": "enum", "oneOf": ["hello", "world"] }

Env var generator.

{ "variable": "test-var", "type": "env-var", "name": "ORG_ID" }

Supported env vars:

    List(
      "CUSTOMER_ID",
      "USER_ID",
      "USERNAME",
      "ORG_ID",
      "EVENT_ID",
      "user.name",
      "os.name"
    )

OR any env var with G4S_ prefix, for example G4S_QA_USERNAME
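
The rule above (a fixed list of names, plus anything with the G4S_ prefix) can be expressed as a small Python check. The name is_supported is hypothetical, and the sketch covers only the name check, not how values are looked up.

```python
SUPPORTED_NAMES = {
    "CUSTOMER_ID",
    "USER_ID",
    "USERNAME",
    "ORG_ID",
    "EVENT_ID",
    "user.name",
    "os.name",
}


def is_supported(name: str) -> bool:
    """Return True for listed names and for any name with the G4S_ prefix."""
    return name in SUPPORTED_NAMES or name.startswith("G4S_")


print(is_supported("ORG_ID"), is_supported("G4S_QA_USERNAME"), is_supported("HOME"))
```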

DateTime generator

{ "variable": "test-date", "type": "date", "format": "MM/dd/yyyy", "shiftDays": -10 }

format - date format.

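shiftDays - shift the date by n days; a negative n shifts it into the past. Optional.

shiftHours - shift the date by n hours; a negative n shifts it into the past. Optional.

shiftMinutes - shift the date by n minutes; a negative n shifts it into the past. Optional.

shiftSeconds - shift the date by n seconds; a negative n shifts it into the past. Optional.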
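
For the example above (format MM/dd/yyyy with shiftDays set to -10), the expected effect can be sketched in Python. The helper shifted_date is hypothetical, and the strftime pattern %m/%d/%Y is a hand-written equivalent of MM/dd/yyyy, not a general format converter.

```python
from datetime import datetime, timedelta


def shifted_date(now: datetime, shift_days: int = 0) -> str:
    """Shift `now` by whole days and format it like the MM/dd/yyyy pattern."""
    return (now + timedelta(days=shift_days)).strftime("%m/%d/%Y")


print(shifted_date(datetime(2024, 1, 15), shift_days=-10))
```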

List generator.

{ "variable": "test-array", "type": "list", "len": 3, "generator": { "variable": "_", "type": "ip" } }

len - number of list elements to generate.

generator - element generator.

Template syntax

  • * - generates any symbol
    • *{2} - generates 2 random symbols
    • *{2, 5} - generates between 2 and 5 random symbols
  • %w - generates a random English word
    • %w{4} - generates a random English word of fixed length 4. The maximum available length is 31
    • %w{2, 6} - generates a random English word with a random length between 2 and 6
  • %n{2} - returns defined number
  • %n{4, 10} - returns a random number between 4 and 10
  • #{4} - returns a random HEX number of the given length (4)
  • #{4, 8} - returns a random HEX number with a random length between 4 and 8
  • %ip4, %ip6, %mac - generate random IPv4, IPv6, and MAC addresses respectively
  • other values are treated as text tokens
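
As one concrete case from the list above, the #{4} and #{4, 8} tokens can be sketched in Python. The helper hex_token is hypothetical; treating the length bounds as inclusive and emitting uppercase digits are assumptions.

```python
import random
from typing import Optional

HEX_DIGITS = "0123456789ABCDEF"  # assumption: uppercase output


def hex_token(
    min_len: int,
    max_len: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Return a random hex string of fixed length, or of a length in [min_len, max_len]."""
    rng = rng or random.Random()
    length = min_len if max_len is None else rng.randint(min_len, max_len)
    return "".join(rng.choice(HEX_DIGITS) for _ in range(length))


print(hex_token(4))     # corresponds to #{4}
print(hex_token(4, 8))  # corresponds to #{4, 8}
```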