stream2gym

Tool for fast prototyping of distributed stream processing applications.

The tool was tested on Ubuntu 20.04.4 and is based on Python 3.8.10, Kafka 2.13-2.8.0 (the Scala 2.13 build of Kafka 2.8.0), PySpark 3.2.1, and MySQL 8.0.30.

Getting started

Clone the repository, then enter it.

git clone https://github.com/PINetDalhousie/stream2gym.git

cd stream2gym

Install dependencies. Our tool depends on the following software:

pip3
Mininet 2.3.0
Networkx 2.5.1
Java 11
Xterm
Kafka-python 2.0.2
Matplotlib 3.3.4
Seaborn 0.12.1
PyYAML 5.3.1

Most dependencies can be installed using apt install and pip3 install:

$ sudo apt install python3-pip mininet default-jdk xterm netcat

$ sudo pip3 install mininet networkx kafka-python matplotlib python-snappy lz4 seaborn pyyaml
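To confirm that the installed Python packages match the versions listed above, a small check along the following lines may help. It is only a sketch: `check_versions` is a hypothetical helper, not part of stream2gym, and the dictionary keys assume the packages were installed under their usual PyPI distribution names. Mininet, Java, and Xterm are left out because they may come from apt rather than pip.

```python
from importlib import metadata  # standard library since Python 3.8

# Versions copied from the dependency list above. The keys are PyPI
# distribution names (an assumption about how each package was installed).
EXPECTED = {
    "networkx": "2.5.1",
    "kafka-python": "2.0.2",
    "matplotlib": "3.3.4",
    "seaborn": "0.12.1",
    "PyYAML": "5.3.1",
}


def check_versions(expected):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the tool will fail; the listed versions are simply the ones named in this README.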

You are ready to go! You should be able to get help using:

sudo python3 main.py -h

Sample command lines

Navigate through the use-cases/ directory to explore the diverse applications we tested using stream2gym. Details of each application, including the exact data processing pipeline, topology, executed queries, and platform configurations, can be found inside the respective application directory. Example command to test a streaming data analytics application in a small network:

sudo python3 main.py use-cases/app-testing/document-analytics/input.graphml
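The argument passed to main.py above is a GraphML topology file. Before running an application, it can be useful to see which nodes and links the file declares. The following is a minimal sketch using only the Python standard library. `summarize_graphml` is a hypothetical helper, not part of stream2gym, and it assumes the file declares the standard GraphML namespace. It does not interpret any stream2gym-specific attributes; those are described in each application's directory.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace; assumed to be declared by the input file.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(path):
    """Return (node_ids, edges) where edges is a list of (source, target)."""
    root = ET.parse(path).getroot()
    nodes = [n.get("id") for n in root.iterfind(".//g:node", GRAPHML_NS)]
    edges = [
        (e.get("source"), e.get("target"))
        for e in root.iterfind(".//g:edge", GRAPHML_NS)
    ]
    return nodes, edges
```

For example, calling `summarize_graphml("use-cases/app-testing/document-analytics/input.graphml")` from the repository root returns the node ids and the source/target pairs found in that topology.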

stream2gym automatically logs the production and consumption history and metrics of interest (e.g., bandwidth consumption) for the STANDARD producer and consumer. Look over the logs in the logs/output/ directory once the simulation ends.

Set a duration for the simulation (note: this is the time the workload will run, not the total simulation time):

sudo python3 main.py use-cases/disconnection/military-coordination/input.graphml --time 300
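Once a run such as the one above ends, its logs are in logs/output/, as noted earlier. This README does not describe the layout of that directory, so the sketch below makes no assumption about file names and simply lists whatever files are present with their sizes. `list_logs` is a hypothetical helper, not part of stream2gym.

```python
from pathlib import Path


def list_logs(log_dir="logs/output"):
    """Return sorted (relative_path, size_in_bytes) for files under log_dir."""
    root = Path(log_dir)
    if not root.is_dir():
        return []  # nothing has been logged yet, or wrong working directory
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    for rel_path, size in list_logs():
        print(f"{size:>10}  {rel_path}")
```

Run it from the repository root so that the relative path logs/output resolves correctly.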

Capture the traffic of all the hosts while testing your application.

sudo python3 main.py use-cases/disconnection/military-coordination/input.graphml --capture-all

Run the event streaming and stream processing engines jointly or individually. By default, event streaming (Apache Kafka) and the stream processing engine (Apache Spark) run as a sequential pipeline. The example below passes --only-spark 1:

sudo python3 main.py use-cases/reproducibility/input.graphml --only-spark 1
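When scripting several runs, the command lines above can be assembled programmatically. The following sketch uses only the flags shown in this README (--time, --capture-all, --only-spark). `build_command` is a hypothetical helper, and whether these flags can be combined in a single invocation is an assumption; check `sudo python3 main.py -h` before relying on it.

```python
import shlex


def build_command(topology, time=None, capture_all=False, only_spark=None):
    """Assemble the main.py argument list from the options shown above."""
    cmd = ["sudo", "python3", "main.py", topology]
    if time is not None:
        cmd += ["--time", str(time)]  # workload duration, as in the --time example
    if capture_all:
        cmd.append("--capture-all")  # capture traffic on all hosts
    if only_spark is not None:
        cmd += ["--only-spark", str(only_spark)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "use-cases/disconnection/military-coordination/input.graphml",
        time=300,
    )
    # Print rather than execute; pass the list to subprocess.run(cmd, check=True)
    # to actually launch the emulation.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if a topology path contains spaces.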

Explore the configuration parameters supported by stream2gym in documentation/config-parameters.pdf. Set up the parameters as you need and quickly test your prototype in a distributed emulated environment.

