Run the command:

```
mvn clean install
```

Note that event samples are generated as a part of the build.
The following software is needed to run the applications provided in the `bin` directory.

The local machine:

- Spark 3.x built with Scala 2.12
- Spark executables (for example, `spark-submit`) in the PATH
- Flink 1.10.x built with Scala 2.12
- Flink executables (for example, `flink`) in the PATH
- Hadoop client distribution version 2.7.x
- Hadoop executables (for example, `hadoop` or `hdfs`) in the PATH
- The `HADOOP_CONF_DIR` environment variable points to the directory with Hadoop configuration files.
Remote dependencies:
- Hadoop cluster with HDFS and YARN
- Kafka cluster 2.x
Some references:
- Running Spark on YARN
- Using Spark's "Hadoop Free" Build - relevant for a Spark distribution built without Hadoop
- Flink YARN Setup
- Make sure that Zookeeper and Kafka are running.
- Create the `events` and `alarms` Kafka topics:

```
kafka-topics.sh --zookeeper localhost:2181 --create --topic events --partitions 4 --replication-factor 1
kafka-topics.sh --zookeeper localhost:2181 --create --topic alarms --partitions 4 --replication-factor 1
```
All Spark and Flink applications are submitted to YARN. You can configure settings by editing the shell launcher of the corresponding application in the `bin` directory.
Event Generator pushes events to the `events` Kafka topic. It takes the events from Parquet files supplied to the input HDFS directory. Before sending events to Kafka, Event Generator substitutes real event timestamps for the dummy timestamps provided in the Parquet samples.

Run Event Generator:

```
./bin/event-generator.sh
```
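The real substitution logic lives in the project sources; the snippet below is only a minimal sketch of that kind of step. It assumes Spark with the `spark-sql-kafka` connector on the classpath, a `timestamp` column in the samples, and a broker at `localhost:9092` - all assumptions, not details taken from this project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, struct, to_json}

object EventGeneratorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-generator-sketch").getOrCreate()

    // Read one Parquet sample from the input HDFS directory.
    val sample = spark.read.parquet("/stream/input/events1.parquet")

    // Overwrite the dummy timestamps from the sample with the current time,
    // then push each event to the "events" topic as a JSON record.
    val withRealTimestamps = sample.withColumn("timestamp", current_timestamp())
    withRealTimestamps
      .select(to_json(struct(withRealTimestamps.columns.map(col): _*)).as("value"))
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "events")
      .save()

    spark.stop()
  }
}
```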
Event Writer consumes events from the `events` Kafka topic and writes them to the output HDFS directory as Parquet files. Event Writer partitions Parquet files by `siteId` and `year_month`. The content of the output directory may be helpful in troubleshooting, because it contains the data actually sent by Event Generator.

Run Event Writer:

```
./bin/event-writer.sh
```
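For orientation, this is roughly what a write partitioned by `siteId` and `year_month` looks like in Spark. The extra columns, the hand-made rows, and the `/stream/output` path are assumptions for the sketch; only the two partition columns come from the description above.

```scala
import org.apache.spark.sql.SparkSession

object EventWriterLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-writer-sketch").getOrCreate()
    import spark.implicits._

    // A couple of hand-made rows standing in for events consumed from the "events" topic.
    val events = Seq(
      (1L, "2019_02", "2019-02-27T09:55:00.000+01:00", "heat"),
      (2L, "2019_02", "2019-02-27T09:55:00.000+01:00", "smoke")
    ).toDF("siteId", "year_month", "timestamp", "kind")

    // partitionBy produces a directory tree such as
    //   /stream/output/siteId=1/year_month=2019_02/part-....parquet
    events.write
      .mode("append")
      .partitionBy("siteId", "year_month")
      .parquet("/stream/output")

    spark.stop()
  }
}
```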
Event Correlator consumes events from the `events` Kafka topic and correlates them. It creates alarms as a result of the correlation and sends them to the `alarms` Kafka topic.

Run Event Correlator:

```
./bin/event-correlator.sh
```
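The actual correlation rules are defined in the project sources. The plain-Scala sketch below only illustrates what a rule of this kind could look like: the `Event` fields and the `status == "FAILED"` condition are hypothetical; only the alarm shape mirrors the JSON shown later in this README.

```scala
// Hypothetical rule: if every event of a controller in the batch reports a failed
// communication status, raise one CRITICAL "communication lost" alarm for that controller.
final case class Event(objectId: Long, timestamp: String, status: String)
final case class Alarm(timestamp: String, objectId: Long, severity: String, info: String)

object CorrelationSketch {
  def correlate(events: Seq[Event]): Seq[Alarm] =
    events
      .groupBy(_.objectId)
      .collect { case (id, es) if es.nonEmpty && es.forall(_.status == "FAILED") =>
        Alarm(es.map(_.timestamp).max, id, "CRITICAL", "Communication lost with the controller")
      }
      .toSeq

  def main(args: Array[String]): Unit = {
    val batch = Seq(
      Event(2715, "2019-02-27T09:55:00.000+01:00", "FAILED"),
      Event(2716, "2019-02-27T09:55:00.000+01:00", "FAILED")
    )
    correlate(batch).foreach(println)
  }
}
```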
Spark Predictor consumes events from the `events` Kafka topic and predicts failures based on the observed events. It creates alarms as a result of the prediction and sends them to the `alarms` Kafka topic.

Run Spark Predictor:

```
./bin/spark-predictor.sh
```
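Again, the real prediction logic runs inside the Spark application in this repository. The plain-Scala sketch below only illustrates the idea of deriving a fire alarm from heat and smoke events observed for the same site; the `SensorEvent` fields and the rule are assumptions made for the example.

```scala
// Hypothetical rule: heat and smoke readings observed for the same site within one
// window are taken as a predictor of fire, producing a CRITICAL alarm for that site.
final case class SensorEvent(siteId: Long, timestamp: String, kind: String, value: Double)
final case class FireAlarm(timestamp: String, objectId: Long, severity: String, info: String)

object PredictionSketch {
  def predict(events: Seq[SensorEvent]): Seq[FireAlarm] =
    events
      .groupBy(_.siteId)
      .collect { case (site, es)
          if es.exists(_.kind == "heat") && es.exists(_.kind == "smoke") =>
        FireAlarm(es.map(_.timestamp).max, site, "CRITICAL", s"Fire on site $site")
      }
      .toSeq

  def main(args: Array[String]): Unit = {
    val window = Seq(
      SensorEvent(1, "2019-02-27T11:52:30.000+01:00", "heat", 90.0),
      SensorEvent(1, "2019-02-27T11:52:40.000+01:00", "smoke", 0.8)
    )
    predict(window).foreach(println)
  }
}
```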
- Run the Kafka console consumer to get alarms on your terminal (a programmatic alternative is sketched after these steps):

```
kafka-console-consumer.sh --bootstrap-server localhost:9092 --group alarms --topic alarms
```
- Put some samples with communication events to the input HDFS directory of Event Generator:

```
hdfs dfs -put sampler/target/communication_events-controllers_2715_2716_all-1min-uniq.parquet /stream/input/events1.parquet
hdfs dfs -put sampler/target/communication_events-controllers_2715_2716_all-1min-uniq.parquet /stream/input/events2.parquet
hdfs dfs -put sampler/target/communication_events-controllers_2715_2716_all-1min-uniq.parquet /stream/input/events3.parquet
...
```

Note that Parquet files in the input HDFS directory should have distinct names, to let Spark Streaming distinguish them as separate files. You can find more event samples in the `sampler/target/` directory.

If you put event samples frequently enough, you will see `communication lost` alarms coming from the `alarms` topic to the Kafka console consumer. Event Correlator produces these alarms as a result of topology based event correlation:

```
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2716,"severity":"CRITICAL","info":"Communication lost with the controller"}
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2715,"severity":"CRITICAL","info":"Communication lost with the controller"}
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2715,"severity":"CRITICAL","info":"Communication lost with the controller"}
{"timestamp":"2019-02-27T09:55:00.000+01:00","objectId":2716,"severity":"CRITICAL","info":"Communication lost with the controller"}
```
- Put some samples with heat and smoke events to the input HDFS directory of Event Generator:

```
hdfs dfs -put sampler/target/heat_events-site_1-1min-10_events.parquet /stream/input/events12.parquet
hdfs dfs -put sampler/target/smoke_events-site_1-1min-10_events.parquet /stream/input/events13.parquet
```

You will see fire alarms coming from the `alarms` topic to the Kafka console consumer. Spark Predictor produces these alarms as a result of prediction based on heat and smoke events:

```
{"timestamp":"2019-02-27T11:52:40.000+01:00","objectId":1,"severity":"CRITICAL","info":"Fire on site 1"}
```
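If you prefer to consume alarms programmatically instead of using the console consumer, a minimal Scala consumer could look like the sketch below. It reuses the broker address and group id from the console-consumer example above; each record value is one alarm JSON document as shown in the samples.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object AlarmConsumerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "alarms")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("alarms"))

    while (true) {
      // Print each alarm JSON document as it arrives on the "alarms" topic.
      consumer.poll(Duration.ofSeconds(1)).asScala.foreach(record => println(record.value()))
    }
  }
}
```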
If you have the `yarn` command-line client installed, you can kill an application like this:

```
yarn application -kill application_1544154479411_0830
```

Otherwise you can use the `kill-yarn-apps.sh` tool and provide YARN identifiers for the applications you want to kill:

```
./bin/kill-yarn-apps.sh application_1544154479411_0830 application_1544154479411_0831 application_1544154479411_0837
```

The tool connects to YARN using your Hadoop settings exposed via the `HADOOP_CONF_DIR` environment variable.