Run the command:

```
mvn clean install
```

Note that event samples are generated as a part of the build.
The following software is needed to run the applications provided in the `bin` directory.

The local machine:

- Spark 3.x built with Scala 2.12
- Spark executables (for example, `spark-submit`) in the PATH
- Flink 1.10.x built with Scala 2.12
- Flink executables (for example, `flink`) in the PATH
- Hadoop client distribution version 2.7.x
- Hadoop executables (for example, `hadoop` or `hdfs`) in the PATH
- The `HADOOP_CONF_DIR` environment variable points to the directory with Hadoop configuration files.
Remote dependencies:
- Hadoop cluster with HDFS and YARN
- Kafka cluster 2.x
Some references:
- Running Spark on YARN
- Using Spark's "Hadoop Free" Build - relevant for a Spark distribution built without Hadoop
- Flink YARN Setup
- Make sure that Zookeeper and Kafka are running.
- Create the `events` and `alarms` Kafka topics:

```
kafka-topics.sh --zookeeper localhost:2181 --create --topic events --partitions 4 --replication-factor 1
kafka-topics.sh --zookeeper localhost:2181 --create --topic alarms --partitions 4 --replication-factor 1
```
All Spark and Flink applications are submitted to YARN. You can configure settings by editing the shell launcher of the corresponding application in the `bin` directory.
Event Generator pushes events to the `events` Kafka topic. It takes the events from Parquet files supplied to the input HDFS directory. Before sending events to Kafka, Event Generator substitutes real event timestamps for the dummy timestamps provided in the Parquet samples.

Run Event Generator:

```
./bin/event-generator.sh
```
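The real substitution logic lives in the project sources; the snippet below is only a minimal sketch of that kind of step. It assumes Spark with the `spark-sql-kafka` connector on the classpath, a `timestamp` column in the samples, and a broker at `localhost:9092` - all assumptions, not details taken from this project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, struct, to_json}

object EventGeneratorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-generator-sketch").getOrCreate()

    // Read one Parquet sample from the input HDFS directory.
    val sample = spark.read.parquet("/stream/input/events1.parquet")

    // Overwrite the dummy timestamps from the sample with the current time,
    // then push each event to the "events" topic as a JSON record.
    val withRealTimestamps = sample.withColumn("timestamp", current_timestamp())
    withRealTimestamps
      .select(to_json(struct(withRealTimestamps.columns.map(col): _*)).as("value"))
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "events")
      .save()

    spark.stop()
  }
}
```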
Event Writer consumes events from the `events` Kafka topic and writes them to the output HDFS directory as Parquet files. Event Writer partitions Parquet files by `siteId` and `year_month`. The content of the output directory may be helpful in troubleshooting, because it contains the data actually sent by Event Generator.

Run Event Writer:

```
./bin/event-writer.sh
```
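For orientation, this is roughly what a write partitioned by `siteId` and `year_month` looks like in Spark. The extra columns, the hand-made rows, and the `/stream/output` path are assumptions for the sketch; only the two partition columns come from the description above.

```scala
import org.apache.spark.sql.SparkSession

object EventWriterLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-writer-sketch").getOrCreate()
    import spark.implicits._

    // A couple of hand-made rows standing in for events consumed from the "events" topic.
    val events = Seq(
      (1L, "2019_02", "2019-02-27T09:55:00.000+01:00", "heat"),
      (2L, "2019_02", "2019-02-27T09:55:00.000+01:00", "smoke")
    ).toDF("siteId", "year_month", "timestamp", "kind")

    // partitionBy produces a directory tree such as
    //   /stream/output/siteId=1/year_month=2019_02/part-....parquet
    events.write
      .mode("append")
      .partitionBy("siteId", "year_month")
      .parquet("/stream/output")

    spark.stop()
  }
}
```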
Event Correlator consumes events from the `events` Kafka topic and correlates them. It creates alarms as a result of the correlation and sends them to the `alarms` Kafka topic.

Run Event Correlator:

```
./bin/event-correlator.sh
```
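The actual correlation rules are defined in the project sources. The plain-Scala sketch below only illustrates what a rule of this kind could look like: the `Event` fields and the `status == "FAILED"` condition are hypothetical; only the alarm shape mirrors the JSON shown later in this README.

```scala
// Hypothetical rule: if every event of a controller in the batch reports a failed
// communication status, raise one CRITICAL "communication lost" alarm for that controller.
final case class Event(objectId: Long, timestamp: String, status: String)
final case class Alarm(timestamp: String, objectId: Long, severity: String, info: String)

object CorrelationSketch {
  def correlate(events: Seq[Event]): Seq[Alarm] =
    events
      .groupBy(_.objectId)
      .collect { case (id, es) if es.nonEmpty && es.forall(_.status == "FAILED") =>
        Alarm(es.map(_.timestamp).max, id, "CRITICAL", "Communication lost with the controller")
      }
      .toSeq

  def main(args: Array[String]): Unit = {
    val batch = Seq(
      Event(2715, "2019-02-27T09:55:00.000+01:00", "FAILED"),
      Event(2716, "2019-02-27T09:55:00.000+01:00", "FAILED")
    )
    correlate(batch).foreach(println)
  }
}
```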
Spark Predictor consumes events from the `events` Kafka topic and predicts failures based on the observed events. It creates alarms as a result of the prediction and sends them to the `alarms` Kafka topic.

Run Spark Predictor:

```
./bin/spark-predictor.sh
```
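Again, the real prediction logic runs inside the Spark application in this repository. The plain-Scala sketch below only illustrates the idea of deriving a fire alarm from heat and smoke events observed for the same site; the `SensorEvent` fields and the rule are assumptions made for the example.

```scala
// Hypothetical rule: heat and smoke readings observed for the same site within one
// window are taken as a predictor of fire, producing a CRITICAL alarm for that site.
final case class SensorEvent(siteId: Long, timestamp: String, kind: String, value: Double)
final case class FireAlarm(timestamp: String, objectId: Long, severity: String, info: String)

object PredictionSketch {
  def predict(events: Seq[SensorEvent]): Seq[FireAlarm] =
    events
      .groupBy(_.siteId)
      .collect { case (site, es)
          if es.exists(_.kind == "heat") && es.exists(_.kind == "smoke") =>
        FireAlarm(es.map(_.timestamp).max, site, "CRITICAL", s"Fire on site $site")
      }
      .toSeq

  def main(args: Array[String]): Unit = {
    val window = Seq(
      SensorEvent(1, "2019-02-27T11:52:30.000+01:00", "heat", 90.0),
      SensorEvent(1, "2019-02-27T11:52:40.000+01:00", "smoke", 0.8)
    )
    predict(window).foreach(println)
  }
}
```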
- Run the Kafka console consumer to get alarms on your terminal (a programmatic alternative is sketched after these steps):

```
kafka-console-consumer.sh --bootstrap-server localhost:9092 --group alarms --topic alarms
```
- Put some samples with communication events to the input HDFS directory of Event Generator:

```
hdfs dfs -put sampler/target/communication_events-controllers_2715_2716_all-1min-uniq.parquet /stream/input/events1.parquet
hdfs dfs -put sampler/target/communication_events-controllers_2715_2716_all-1min-uniq.parquet /stream/input/events2.parquet
hdfs dfs -put sampler/target/communication_events-controllers_2715_2716_all-1min-uniq.parquet /stream/input/events3.parquet
...
```

Note that Parquet files in the input HDFS directory should have distinct names, to let Spark Streaming distinguish them as separate files. You can find more event samples in the `sampler/target/` directory.

If you put event samples frequently enough, you will see `communication lost` alarms coming from the `alarms` topic to the Kafka console consumer. Event Correlator produces these alarms as a result of topology based event correlation:

```
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2716,"severity":"CRITICAL","info":"Communication lost with the controller"}
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2715,"severity":"CRITICAL","info":"Communication lost with the controller"}
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2715,"severity":"CRITICAL","info":"Communication lost with the controller"}
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2716,"severity":"CRITICAL","info":"Communication lost with the controller"}
```
- Put some samples with heat and smoke events to the input HDFS directory of Event Generator:

```
hdfs dfs -put sampler/target/heat_events-site_1-1min-10_events.parquet /stream/input/events12.parquet
hdfs dfs -put sampler/target/smoke_events-site_1-1min-10_events.parquet /stream/input/events13.parquet
```

You will see fire alarms coming from the `alarms` topic to the Kafka console consumer. Spark Predictor produces these alarms as a result of prediction based on heat and smoke events:

```
{"timestamp":"2019-02-27T11:52:40.000+01:00","objectId":1,"severity":"CRITICAL","info":"Fire on site 1"}
```
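If you prefer to consume alarms programmatically instead of using the console consumer, a minimal Scala consumer could look like the sketch below. It reuses the broker address and group id from the console-consumer example above; each record value is one alarm JSON document as shown in the samples.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object AlarmConsumerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "alarms")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("alarms"))

    while (true) {
      // Print each alarm JSON document as it arrives on the "alarms" topic.
      consumer.poll(Duration.ofSeconds(1)).asScala.foreach(record => println(record.value()))
    }
  }
}
```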
If you have the `yarn` command-line client installed, you can kill an application like this:

```
yarn application -kill application_1544154479411_0830
```

Otherwise you can use the `kill-yarn-apps.sh` tool and provide YARN identifiers for the applications you want to kill:

```
./bin/kill-yarn-apps.sh application_1544154479411_0830 application_1544154479411_0831 application_1544154479411_0837
```

The tool connects to YARN using your Hadoop settings exposed via the `HADOOP_CONF_DIR` environment variable.