
Big Data Management project: data collection from a network of sensors is simulated (Kafka), then processed (Spark) and stored (Cassandra) in a distributed and efficient way.


zAle711/supreme-pancake

 
 


supreme-pancake

Repo for Big Data Management project

This project has three components: a producer / data collector (Kafka), a consumer / data processor (Spark), and a distributed database (Cassandra).
Data collection from a network of sensors was simulated; the data then had to be processed and stored in a distributed and efficient way. The data collected (or generated) by Kafka is processed by Spark and saved to Cassandra for long-term archiving.
ZeroTier keeps the connection between the PCs simple and scalable.
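
The three stages described above can be illustrated with a minimal, dependency-free sketch. This is not the project's code and does not use the Kafka, Spark, or Cassandra APIs: the class name, the `sensorId,timestampMs,temperature` message format, and the per-sensor average are all hypothetical stand-ins chosen only to show the producer, processor, and storage roles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for the pipeline; the real modules use Kafka, Spark and Cassandra.
class SensorPipelineSketch {

    // Producer stage stand-in: simulate readings as "sensorId,timestampMs,temperature".
    // The message format is an assumption, not the project's actual schema.
    static List<String> simulate(int sensors, int readingsPerSensor, long seed) {
        Random random = new Random(seed); // seeded so runs are reproducible
        List<String> messages = new ArrayList<>();
        for (int r = 0; r < readingsPerSensor; r++) {
            for (int s = 0; s < sensors; s++) {
                double temperature = 20.0 + 2.0 * random.nextGaussian();
                messages.add(String.format(Locale.ROOT, "sensor-%d,%d,%.2f",
                        s, r * 1000L, temperature));
            }
        }
        return messages;
    }

    // Processing stage stand-in: mean temperature per sensor.
    static Map<String, Double> averageBySensor(List<String> messages) {
        Map<String, double[]> totals = new LinkedHashMap<>(); // value = {sum, count}
        for (String message : messages) {
            String[] fields = message.split(",");
            double[] acc = totals.computeIfAbsent(fields[0], k -> new double[2]);
            acc[0] += Double.parseDouble(fields[2]);
            acc[1] += 1;
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : totals.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<String> messages = simulate(3, 5, 42L);
        // Storage stage stand-in: print the rows a database writer would persist.
        averageBySensor(messages).forEach((sensor, avg) ->
                System.out.printf(Locale.ROOT, "%s -> %.2f%n", sensor, avg));
    }
}
```

In a real deployment each stage would run as a separate process on a different machine, with the list of messages replaced by a Kafka topic and the printed rows replaced by writes to a Cassandra table.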

  • Leave a star ⭐ if you like this project 🙂 thank you.

What's inside

  • Kafka module
  • Cassandra DB module
  • Spark module
  • Data cleaning scripts
  • Distributed job start and stop scripts
  • Project runme script
  • Project document with details


Languages

  • Java 85.9%
  • Shell 12.3%
  • Python 1.8%