Amin-Siddique/ApacheSpark-w-Python

ApacheSpark-Kafka-Python-Streaming

This project aims to consume tweet messages, published by a Python script through a Kafka broker, and to transform the data with Apache Spark Streaming to find the most frequently tagged word.

Flow: Python Script --> Kafka --> ApacheSpark
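
The first hop of this flow (Python script to Kafka) can be sketched roughly as follows. This is a minimal illustration, not the contents of Twitter_to_Kafka.py: the function `publish_tweets` and the stand-in class `ListProducer` are hypothetical names, and sending each tweet as plain UTF-8 text is an assumption. Only the topic name 'awesome' comes from this README.

```python
def publish_tweets(producer, topic, tweets):
    """Send each tweet's text to `topic` and return how many were sent.

    `producer` is any object with send(topic, value=...) and flush()
    methods (hypothetical interface chosen for this sketch).
    """
    sent = 0
    for tweet in tweets:
        # Assumption: tweets travel as plain UTF-8 text, one message each.
        producer.send(topic, value=tweet.encode("utf-8"))
        sent += 1
    producer.flush()  # make sure buffered messages are handed off
    return sent


class ListProducer:
    """Stand-in producer that records messages, so no broker is needed."""

    def __init__(self):
        self.messages = []

    def send(self, topic, value=None):
        self.messages.append((topic, value))

    def flush(self):
        pass


demo = ListProducer()
count = publish_tweets(demo, "awesome", ["Learning #Spark", "#Kafka rocks"])
print(count, demo.messages)
```

If the kafka-python package is installed, a `KafkaProducer(bootstrap_servers="localhost:9092")` offers `send` and `flush` methods of the same shape and could be passed in place of the stand-in. Whether the repository's script uses that library is not stated here.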

Deployment

To deploy this project, run the following in order:

  1. ZooKeeper
  zookeeper-server-start.bat config\zookeeper.properties
  2. Kafka
  kafka-server-start.bat config/server.properties

To check for messages in Kafka (my topic's name is 'awesome'):

kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic awesome

  3. Data Streaming from Twitter Python Script
  python Twitter_to_Kafka.py
  4. Spark Script
  spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.7 Spark_kafka_twitter_tags.py
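
The `--packages` flag in the last command pulls in Spark's Kafka connector for Structured Streaming. As a rough sketch of what a script using that connector might look like (this is not the contents of Spark_kafka_twitter_tags.py), the example below reads the 'awesome' topic, keeps words that start with `#`, and prints a running count sorted by frequency. The function names, the app name, and the "tag means a word beginning with #" rule are all assumptions made for illustration.

```python
def kafka_source_options(topic="awesome", servers="localhost:9092"):
    """Options for Spark's Kafka source (topic and address from this README)."""
    return {
        "kafka.bootstrap.servers": servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def run():
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, split

    spark = SparkSession.builder.appName("TwitterTagsSketch").getOrCreate()

    # Kafka delivers the message body as bytes in the `value` column.
    lines = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options())
        .load()
        .select(col("value").cast("string").alias("text"))
    )

    # Assumption: a tag is any whitespace-separated word starting with '#'.
    words = lines.select(explode(split(col("text"), r"\s+")).alias("word"))
    tags = words.filter(col("word").startswith("#"))

    # Running count per tag, most frequent first.
    counts = tags.groupBy("word").count().orderBy(col("count").desc())

    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()
```

Calling `run()` at the end of a file submitted with the same `spark-submit --packages ...` command would start the stream. The "complete" output mode is used because sorting a streaming aggregation is only supported in that mode.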

Authors

🚀 About Me

Hi, I'm Amin! 👋

I am a passionate Data Engineer.
