Gen4s is a powerful data generation tool designed for developers and QA engineers.
Features:

- Data Generation: generate up-to-date data and publish it to your systems, which is particularly useful for testing and development.
- Test Data Maintenance: keep test data in the file system or a repository, ensuring it is always accessible and up to date.
- Data Sharing: easily share test data with your team, improving collaboration and efficiency.
- Profiles: switch between environments such as dev, local, and QA as needed.
- Generation Scenarios: run scenarios that publish a portion of data, wait, and then publish another portion, simulating event-time processing.
- Load Testing: publish millions of messages to load-test your system and surface potential performance issues.
- Semi-Generated Data: generate a CSV file from your database and use it as part of the data generation schema.
- Command-Line Execution: run Gen4s directly from the command line.
- Environment Variable Profiles: load environment variables from a file and apply them, useful for managing different runtime environments.
- Multiple Output Formats: stdout, Kafka (plain, Avro, and Protobuf), file system, S3, and HTTP.
- Schema Definition and Data Generators: a variety of generators for different data types and structures, including static values, timestamps, numbers, strings, UUIDs, IP addresses, and more.
- Scenario Configuration: configure multiple stages per scenario, with configurable delays between stages and the number of samples to generate.
To install Gen4s using Homebrew, tap the xdev-developer/tap repository and install the package:

```shell
brew tap xdev-developer/tap
brew install gen4s
```
Alternatively, download the latest release from the Releases page, unzip the archive, and execute ./bin/gen4s.
```
Gen4s
Usage: gen4s [preview|run|scenario] [options]

  -c, --config <file>      Configuration file. Default ./config.conf
  -p, --profile <file>     Environment variables profile.
  -i, --input-records key=value,key1=value1
                           Key/Value pairs to override generated variable

Command: preview [options]
Preview data generation.
      --pretty             pretty print
  -s, --samples <number>   Samples to generate, default 1

Command: run [options]
Run data generation stream.
  -s, --samples <number>   Samples to generate, default 1

Command: scenario
Run scenario

      --help               prints usage info
```
Run the playground example:

```shell
./bin/gen4s run -c ./examples/playground/config.conf
```
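To preview generated data without publishing it, pass the same configuration to the preview command:

```shell
./bin/gen4s preview --pretty -c ./examples/playground/config.conf -s 5
```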
You can create an environment variables profile for each runtime environment: dev, staging, prod, etc.

Profile file format (dev.profile):

```properties
KAFKA_BOOTSTRAP_SERVERS=dev.kafka:9095
ORG_ID=12345
```
Run with a profile:

```shell
./bin/gen4s run -c ./examples/playground/config.conf -s 5 -p ./profiles/dev.profile
```

Override generated variables from the command line:

```shell
./bin/gen4s run -i test-string=hello,test-int=12345 -c ./examples/playground/config.conf
```

Run a scenario:

```shell
./bin/gen4s scenario -c ./examples/scenario/scenario.conf -p ./profiles/dev.profile
```
Building a standalone application:

```shell
sbt 'universal:packageXzTarball'
# or
sbt 'universal:packageBin'
```
Building a Docker image:

```shell
sbt 'universal:packageXzTarball'
cd app
docker build -t xdev.developer/gen4s:<version> .
```

Testing the Docker image:

```shell
docker run xdev.developer/gen4s:<version> bin/gen4s preview --pretty -c examples/playground/config.conf -s 5
```
Benchmarking:

```shell
sbt clean "project benchmarks;jmh:run -i 3 -wi 3 -f3 -t1"
```
Configuration file example:

```hocon
input {
  schema = "<path-to>/examples/sample-schema.json"
  template = "<path-to>/examples/sample.template"
}

output {
  writer: {
    type: "std-output"
  }
  transformers: ["json-prettify"]
}
```
- schema - path to the schema file.
- template - path to the template file.
- decode-new-line-as-template - treat each line of the template file as a standalone template.
- csv-records - CSV records input file.
- global-variables - list of global variables. A global variable is generated once per run.

Using csv-records streaming, you can generate templates that combine data from a CSV file with random generators; see examples/csv-input.
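For illustration, a minimal schema and template pair might look like the following. This is a hypothetical sketch: the generator entries follow the formats documented later in this README, but the exact top-level layout of the schema file may differ, so consult examples/sample-schema.json for the authoritative shape.

sample-schema.json:

```json
[
  { "variable": "id", "type": "uuid" },
  { "variable": "ts", "type": "timestamp", "unit": "ms" }
]
```

sample.template (variables are referenced with {{...}} placeholders, as in the Kafka key/value example below):

```json
{ "id": "{{id}}", "timestamp": {{ts}}, "event": "Logged in" }
```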
Console output:

```hocon
output {
  writer: {
    type: "std-output"
  }
  transformers = ["json-prettify"]
  validators = ["json", "missing-vars"]
}
```
Kafka output:

```hocon
output {
  writer {
    type = kafka-output
    topic = "logs"
    topic = ${?KAFKA_TOPIC}
    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}
    batch-size = 1000
    headers {
      key = value
    }
    decode-input-as-key-value = true
    producer-config {
      compression-type = none # snappy, gzip, lz4
      in-flight-requests = 5
      linger-ms = 15
      max-batch-size-bytes = 1024
      max-request-size-bytes = 512
    }
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}
```
- decode-input-as-key-value: true/false - decode the input template as key/value JSON; the key is produced as the Kafka message key and the value as the Kafka message value:

```json
{ "key": 1, "value": { "id": 1, "timestamp": {{ts}}, "event": "Logged in" } }
```
Kafka Avro output:

```hocon
output {
  writer {
    type = kafka-avro-output
    topic = "logs-avro"
    topic = ${?KAFKA_TOPIC}
    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}
    batch-size = 1000
    headers {
      key = value
    }
    decode-input-as-key-value = true
    producer-config {
      compression-type = gzip
      in-flight-requests = 1
      linger-ms = 15
      max-batch-size-bytes = 1024
      max-request-size-bytes = 512
    }
    avro-config {
      schema-registry-url = "http://localhost:8081"
      schema-registry-url = ${?SCHEMA_REGISTRY_URL}
      key-schema = "/path/to/file/key.avsc"
      value-schema = "/path/to/file/value.avsc"
      auto-register-schemas = false
      registry-client-max-cache-size = 1000
    }
  }
  transformers = []
  validators = ["json", "missing-vars"]
}
```
- key-schema - path to the key schema file. Optional.
- value-schema - path to the value schema file. Optional.
- auto-register-schemas - register schemas in the schema registry.

How schema resolution works:
- The schema is read from the configured file.
- When a file isn't provided, Gen4s looks up the schema subject in the schema registry (topic_name-key or topic_name-value); for the logs-avro topic above, that would be logs-avro-key and logs-avro-value.
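For reference, a minimal Avro value schema sketch that value-schema could point to (the record name and fields here are hypothetical):

```json
{
  "type": "record",
  "name": "LogEvent",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "event", "type": "string" }
  ]
}
```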
Kafka Protobuf output:

```hocon
output {
  writer {
    type = kafka-protobuf-output
    topic = "persons-proto"
    topic = ${?KAFKA_TOPIC}
    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}
    batch-size = 1000
    headers {
      key = value
    }
    decode-input-as-key-value = true
    proto-config {
      schema-registry-url = "http://localhost:8081"
      schema-registry-url = ${?SCHEMA_REGISTRY_URL}
      value-descriptor {
        file = "./examples/kafka-protobuf/person-value.desc"
        message-type = "Person"
      }
      auto-register-schemas = true
      registry-client-max-cache-size = 1000
    }
  }
  transformers = []
  validators = ["json", "missing-vars"]
}
```
- value-descriptor - path to the protobuf descriptor file plus the message type.
- auto-register-schemas - register schemas in the schema registry.

The descriptor file can be created with the protoc command:

```shell
protoc --include_imports --descriptor_set_out=person-value.desc person-value.proto
```

or with scalapbc:

```shell
scalapbc --include_imports --descriptor_set_out=person-value.desc person-value.proto
```
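For illustration, a minimal person-value.proto sketch (the fields here are hypothetical) that the commands above could compile into a descriptor:

```protobuf
syntax = "proto3";

// Message type referenced by message-type = "Person" in proto-config.
message Person {
  string name = 1;
  int32 age = 2;
}
```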
File system output:

```hocon
output {
  writer {
    type = fs-output
    dir = "/tmp"
    filename-pattern = "my-cool-logs-%s.txt"
  }
  transformers = ["json-prettify"]
  validators = ["json", "missing-vars"]
}
```
HTTP output:

```hocon
output {
  writer {
    type = http-output
    url = "http://example.com"
    url = ${?REQUEST_URL}
    method = POST
    headers {
      key = value
    }
    parallelism = 3
    content-type = "application/json"
    stop-on-error = true
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}
```
S3 output:

```hocon
output {
  writer {
    type = s-3-output
    bucket = "test-bucket"
    key = "key-%s.json"
    region = "us-east-1"
    endpoint = "http://localhost:4566"
    part-size-mb = 5
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}
```
The available options for configuring an S3 output are:

- bucket - the name of the S3 bucket where the output data will be written.
- key - the object key pattern. The %s placeholder is replaced with a unique identifier.
- region - the AWS region where the S3 bucket is located.
- endpoint - the URL of the S3 service endpoint, useful for testing with local S3-compatible services like LocalStack.
- part-size-mb - the part size for multipart uploads to the S3 bucket, in megabytes.
- json-minify - transforms generated JSON into compact-printed JSON (removes all new lines and spaces).
- json-prettify - transforms generated JSON into pretty-printed JSON.
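For example, an identical generated payload would be rendered by the two transformers as follows (illustrative output):

json-minify:

```json
{"id":1,"event":"Logged in"}
```

json-prettify:

```json
{
  "id": 1,
  "event": "Logged in"
}
```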
Using a scenario, you can run multiple stages, configuring the delay between stages and the number of samples to generate per stage:

```hocon
stages: [
  { name: "Playground", samples: 5, config-file: "./examples/playground/config.conf", delay: 5 seconds },
  { name: "CSV Input", samples: 3, config-file: "./examples/csv-input/config.conf" }
]
```
Static value generator - this sampler can be used as a template constant:

```json
{ "variable": "id", "type": "static", "value": "id-12332221" }
```
{ "variable": "ts", "type": "timestamp", "unit": "sec"}
unit - timestamp unit, possible values: ms, ns, micros, sec. Default value - ms.
shiftDays - shift timestamp to n or -n days. Optional.
shiftHours - shift timestamp to n or -n hours. Optional.
shiftMinutes - shift timestamp to n or -n minutes. Optional.
shiftSeconds - shift timestamp to n or -n seconds. Optional.
shiftMillis - shift timestamp to n or -n milliseconds. Optional.
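For example, a timestamp in seconds shifted one day into the past (the variable name here is hypothetical):

```json
{ "variable": "ts-yesterday", "type": "timestamp", "unit": "sec", "shiftDays": -1 }
```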
{ "variable": "my-int", "type": "int", "min": 10, "max": 1000 }
{ "variable": "test-double", "type": "double", "min": 10.5, "max": 15.5, "scale": 6 }
{ "variable": "test-bool", "type": "boolean"}
{ "variable": "test-string", "type": "string", "len": 10}
{ "variable": "test-string-pattern", "type": "pattern", "pattern": "hello-???-###"} // hello-abc-123
{ "variable": "test-uuid", "type": "uuid" }
{ "variable": "test-ip", "type": "ip", "ipv6": false }
{ "variable": "test-enum", "type": "enum", "oneOf": ["hello", "world"] }
{ "variable": "test-var", "type": "env-var", "name": "ORG_ID" }
Supported env vars:
List(
"CUSTOMER_ID",
"USER_ID",
"USERNAME",
"ORG_ID",
"EVENT_ID",
"user.name",
"os.name"
)
OR any env var with G4S_
prefix, for example G4S_QA_USERNAME
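For example, a hypothetical variable reading the G4S_QA_USERNAME env var mentioned above:

```json
{ "variable": "qa-username", "type": "env-var", "name": "G4S_QA_USERNAME" }
```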
{ "variable": "test-date", "type": "date", "format": "MM/dd/yyyy", "shiftDays": -10 }
format - date format.
shiftDays - shift timestamp to n or -n days. Optional.
shiftHours - shift timestamp to n or -n hours. Optional.
shiftMinutes - shift timestamp to n or -n minutes. Optional.
shiftSeconds - shift timestamp to n or -n seconds. Optional.
{ "variable": "test-array", "type": "list", "len": 3, "generator": { "variable": "_", "type": "ip" } }
Where len - list size to generate.
generator - element generator.
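With the ip element generator above, a generated value could look like this (illustrative output, assuming list values render as a JSON array):

```json
["192.168.0.1", "10.0.0.2", "172.16.0.3"]
```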
Pattern tokens:

- * - generates any symbol
- *{2} - generates 2 random symbols
- *{2, 5} - generates a random number of symbols, between 2 and 5
- %w - generates a random English word
- %w{4} - generates a random English word of fixed length (4). Max available length is 31.
- %w{2, 6} - generates a random English word with length between 2 and 6
- %n{2} - returns the defined number (2)
- %n{4, 10} - returns a random number between 4 and 10
- #{4} - returns a random HEX number with the provided length (4)
- #{4, 8} - returns a random HEX number of random length between 4 and 8
- %ip4, %ip6, %mac - generate random IPv4, IPv6, and MAC address values respectively
- other values are treated as text tokens
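Assuming these tokens can be combined with plain text inside a single pattern, a hypothetical pattern and one possible output:

```json
{ "variable": "test-log", "type": "pattern", "pattern": "user-%w{3, 6} from %ip4" } // e.g. user-alpha from 10.42.7.19
```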