Set up Khermes
Once you have a khermes cluster up and running (see the Getting started section), it's time to configure it so that it can start producing data.
Khermes needs four components to start producing data:
- Twirl template: defines the random data that khermes will produce
- Kafka configuration: describes the kafka cluster that khermes will connect to
- Avro configuration: the schema that will be stored within the schema registry
- Khermes configuration: the generator settings
These configurations are stored in zookeeper, so you can re-use them the next time you start your khermes cluster.
We have created a handy web console to easily interact with your khermes cluster. To access the console, open the following url in your favourite browser:
http://localhost:8080/console
You should see the khermes console prompt.
Try typing help to get the list of available commands!
Let's start with the kafka configuration. Type the following command in the khermes web console:
khermes> create kafka-config
khermes> kafka-config> Please introduce the kafka-config name>
Enter a name for the kafka config that will be stored in zookeeper (this guide uses k1 later, when starting a node). The console will then ask you for the kafka configuration itself:
khermes> kafka-config> Please introduce the kafka-config>
Copy and paste the following kafka configuration example:
kafka {
  bootstrap.servers = "localhost:9092"
  key.serializer = "io.confluent.kafka.serializers.KafkaAvroSerializer"
  value.serializer = "io.confluent.kafka.serializers.KafkaAvroSerializer"
  schema.registry.url = "http://localhost:8081"
}
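Before pasting this configuration, it can help to confirm that the two endpoints it names are actually listening. The following is a minimal sketch, not part of khermes: it only opens a TCP connection to each address and does not speak the Kafka or schema registry protocols. The helper names `parse_endpoint` and `is_reachable` are made up for this example, and it assumes a single broker in bootstrap.servers rather than a comma-separated list.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_endpoint(value: str, default_port: int) -> Tuple[Optional[str], int]:
    """Split 'host:port' or 'http://host:port' into (host, port)."""
    # urlparse only recognises the host part when '//' precedes it.
    parsed = urlparse(value if "://" in value else "//" + value)
    return parsed.hostname, parsed.port or default_port


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Values copied from the kafka configuration example above.
endpoints = {
    "bootstrap.servers": parse_endpoint("localhost:9092", 9092),
    "schema.registry.url": parse_endpoint("http://localhost:8081", 8081),
}

for key, (host, port) in endpoints.items():
    state = "reachable" if is_reachable(host, port) else "NOT reachable"
    print(f"{key}: {host}:{port} is {state}")
```

A "NOT reachable" line usually means the service is not started yet or is bound to a different host or port than the one in the configuration.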
Now it's time to configure the twirl template:
khermes> create twirl-template
khermes> twirl-template> Please introduce the twirl-template name> t1
And then copy and paste the following twirl template example:
@import scala.util.Random
@import com.stratio.khermes.helpers.faker.Faker
@import com.stratio.khermes.helpers.faker.generators.Positive
@(faker: Faker)
@defining(faker.Geo.geolocation, faker.Music.playedSong) { case (randomGeo, randomSong) =>
{
"song": "@(randomSong.song)",
"artist": "@(randomSong.artist)",
"album": "@(randomSong.album)",
"genre": "@(randomSong.genre)",
"playduration": @(faker.Number.number(3,Positive)),
"rating": @(faker.Number.rating(5)),
"user": "@(faker.Name.fullName)",
"usertype": "@(Seq("free", "membership")(Random.nextInt(2)))",
"city": "@(randomGeo.city)",
"location": "@(randomGeo.latitude),@(randomGeo.longitude)",
"starttime": "@(s"${Random.nextInt(24)}:${Random.nextInt(60)}:${Random.nextInt(60)}.${Random.nextInt(1000)}")"
}
}
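Two fields in the template above do not come from the Faker helpers but from plain Scala expressions: usertype and starttime. To show what kind of values they produce, here is a rough Python equivalent of just those two expressions. It is an illustration only and is not used by khermes; the function names are hypothetical.

```python
import random


def random_usertype(rng: random.Random) -> str:
    # Mirrors Seq("free", "membership")(Random.nextInt(2)):
    # pick index 0 or 1 with equal probability.
    return ["free", "membership"][rng.randrange(2)]


def random_starttime(rng: random.Random) -> str:
    # Mirrors the s"..." interpolation in the template: hour, minute,
    # second and millisecond are drawn independently and, as in the
    # Scala version, are NOT zero-padded.
    return (
        f"{rng.randrange(24)}:{rng.randrange(60)}:"
        f"{rng.randrange(60)}.{rng.randrange(1000)}"
    )


rng = random.Random()
print(random_usertype(rng), random_starttime(rng))
```

Because nothing is zero-padded, a value such as 7:3:9.45 is possible, which is one reason the avro schema used later in this guide declares starttime as a string rather than a time type.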
Let's configure the avro config. Type the following commands in the web console:
khermes> create avro-config
khermes> avro-config> Please introduce the avro-config name> a1
And copy and paste the following avro configuration:
{
  "type": "record",
  "name": "khermes",
  "fields": [
    {"name": "song", "type": "string"},
    {"name": "artist", "type": "string"},
    {"name": "album", "type": "string"},
    {"name": "genre", "type": "string"},
    {"name": "playduration", "type": "int"},
    {"name": "rating", "type": "int"},
    {"name": "user", "type": "string"},
    {"name": "usertype", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "location", "type": "string"},
    {"name": "starttime", "type": "string"}
  ]
}
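The field names and types in this schema have to line up with the JSON keys emitted by the twirl template. The sketch below shows one way to check that by hand, using only the Python standard library. It is not an Avro implementation: it handles just the two primitive types used here (string and int), and the sample record contains made-up placeholder values, not real khermes output.

```python
import json

AVRO_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "khermes",
  "fields": [
    {"name": "song", "type": "string"},
    {"name": "artist", "type": "string"},
    {"name": "album", "type": "string"},
    {"name": "genre", "type": "string"},
    {"name": "playduration", "type": "int"},
    {"name": "rating", "type": "int"},
    {"name": "user", "type": "string"},
    {"name": "usertype", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "location", "type": "string"},
    {"name": "starttime", "type": "string"}
  ]
}
""")


def matches(value, avro_type: str) -> bool:
    """Check a value against the two primitive types this schema uses."""
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"type not handled by this sketch: {avro_type}")


def check_record(record: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the record fits."""
    expected = {f["name"]: f["type"] for f in schema["fields"]}
    problems = []
    for name, avro_type in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not matches(record[name], avro_type):
            problems.append(f"wrong type for {name}: expected {avro_type}")
    for name in sorted(record.keys() - expected.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


# Placeholder values, invented purely for illustration.
sample = {
    "song": "a song", "artist": "an artist", "album": "an album",
    "genre": "a genre", "playduration": 123, "rating": 4,
    "user": "a user", "usertype": "free", "city": "a city",
    "location": "0.0,0.0", "starttime": "7:3:9.45",
}

print(check_record(sample, AVRO_SCHEMA))
```

This also explains why playduration and rating are written without quotes in the template: they must render as JSON numbers to fit the int fields.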
And finally let's configure the khermes generator:
khermes> create generator-config
khermes> generator-config> Please introduce the generator-config name> g1
And then copy and paste the following configuration:
khermes {
  templates-path = "/tmp/khermes/templates"
  topic = "khermes"
  template-name = "khermestemplate"
  i18n = "EN"
  timeout-rules {
    number-of-events: 10
    duration: 5 seconds
  }
  stop-rules {
    number-of-events: 5000
  }
}
To better understand the timeout-rules and stop-rules, see the Khermes configuration section (TO DO).
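As a rough illustration of how the two rule blocks might interact, the sketch below works through the numbers from the example configuration. It rests on an assumption that this page does not confirm: that timeout-rules makes a node pause for duration after every number-of-events events, and that stop-rules makes the node stop for good once its own number-of-events has been produced. The function `plan_run` is hypothetical and nothing here calls khermes.

```python
def plan_run(batch_size: int, pause_seconds: int, stop_after: int) -> dict:
    """Count events and pauses under the ASSUMED rule semantics.

    batch_size    -> timeout-rules.number-of-events
    pause_seconds -> timeout-rules.duration, in seconds
    stop_after    -> stop-rules.number-of-events
    """
    if batch_size <= 0 or stop_after < 0:
        raise ValueError("batch_size must be > 0 and stop_after >= 0")
    produced = 0
    pauses = 0
    while produced < stop_after:
        produced += min(batch_size, stop_after - produced)
        # Assumption: no pause is needed after the final batch.
        if produced < stop_after:
            pauses += 1
    return {
        "events": produced,
        "pauses": pauses,
        "paused_seconds": pauses * pause_seconds,
    }


# Values from the generator configuration example above.
plan = plan_run(batch_size=10, pause_seconds=5, stop_after=5000)
print(plan)  # 5000 events, 499 pauses, 2495 seconds paused (under the assumption)
```

If that reading is right, a node running this example spends roughly 41 minutes paused before it stops, so raise the batch size or shorten the duration if you want a quicker test run.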
We are now ready to start producing data! First, check the status of the cluster by typing the ls command in the web console. You should see output like the following:
khermes> ls
Node Id | Status
-------------------------------------------------
180ce7c2-b113-4450-ab66-16d39cebe620 | false
13167b26-08c6-4b17-a751-d7fab8ae0877 | false
8527069d-c1ea-4c67-bdfb-cdb1965bb2fa | false
a37144da-757c-4bf7-aadf-2580dcb26f4c | false
These are your khermes cluster nodes. All of them show a status of "false", which means that none of them is currently producing data. Let's start producing data on the first one: type start in the web console, then enter the names of the twirl-template, kafka-config, generator-config and avro-config, and finally the first node id, as shown below:
khermes> start
khermes> start > Please introduce the twirl-template name> t1
khermes> start > Please introduce the kafka-config name> k1
khermes> start > Please introduce the generator-config name> g1
khermes> start > Please introduce the avro-config name> a1
khermes> start > Please introduce the node-ids> 180ce7c2-b113-4450-ab66-16d39cebe620
Command result: OK
If you type ls again, you should see that the first node has started producing data:
khermes> ls
Node Id | Status
-------------------------------------------------
180ce7c2-b113-4450-ab66-16d39cebe620 | true
a37144da-757c-4bf7-aadf-2580dcb26f4c | false
13167b26-08c6-4b17-a751-d7fab8ae0877 | false
8527069d-c1ea-4c67-bdfb-cdb1965bb2fa | false
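If you copy the ls output out of the console, for example to keep track of which nodes are busy, it is easy to turn the table into a lookup. The sketch below assumes the output has exactly the layout shown on this page (one "node id | status" pair per line) and works on pasted text only; it does not connect to the console. The name `parse_ls` is made up for this example.

```python
def parse_ls(output: str) -> dict:
    """Map each node id to True/False from the table printed by ls."""
    nodes = {}
    for line in output.splitlines():
        if "|" not in line:
            continue  # skips the prompt line and the dashed separator
        node_id, status = (part.strip() for part in line.split("|", 1))
        if status in ("true", "false"):  # ignores the "Node Id | Status" header
            nodes[node_id] = status == "true"
    return nodes


# Text pasted from the console output above.
pasted = """khermes> ls
Node Id | Status
-------------------------------------------------
180ce7c2-b113-4450-ab66-16d39cebe620 | true
a37144da-757c-4bf7-aadf-2580dcb26f4c | false
13167b26-08c6-4b17-a751-d7fab8ae0877 | false
8527069d-c1ea-4c67-bdfb-cdb1965bb2fa | false
"""

status = parse_ls(pasted)
producing = [node for node, active in status.items() if active]
print(producing)  # ['180ce7c2-b113-4450-ab66-16d39cebe620']
```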
Khermes - An open source distributed data generator