This project streams tweets from a Python script through a Kafka broker and processes them with Apache Spark Streaming to find the most frequently tagged word.
Flow: Python Script --> Kafka --> Apache Spark
To deploy this project, run the following in order:
- ZooKeeper
zookeeper-server-start.bat config\zookeeper.properties
- Kafka
kafka-server-start.bat config\server.properties
To check for messages in Kafka (the topic in this project is named 'awesome'):
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic awesome
- Python script that streams Twitter data to Kafka
python Twitter_to_Kafka.py
- Spark Script
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.7 Spark_kafka_twitter_tags.py
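
The transformation at the end of the flow can be illustrated in plain Python. This is only a sketch of the idea, not the code in Twitter_to_Kafka.py or Spark_kafka_twitter_tags.py: it assumes each Kafka message is a UTF-8 JSON object with a "text" field, and it reads "maximum tagged word" as the hashtag that appears most often. The function names (encode_tweet, extract_tags, top_tag) are hypothetical.

```python
import json
import re
from collections import Counter

# Assumption: a tag is a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def encode_tweet(text):
    """Serialize a tweet as UTF-8 JSON bytes, the form a producer might send to Kafka."""
    return json.dumps({"text": text}).encode("utf-8")


def extract_tags(message):
    """Decode one message value and return its hashtags in lowercase."""
    text = json.loads(message.decode("utf-8"))["text"]
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def top_tag(messages):
    """Return (tag, count) for the most frequent hashtag, or None if no tags were seen."""
    counts = Counter(tag for m in messages for tag in extract_tags(m))
    return counts.most_common(1)[0] if counts else None


if __name__ == "__main__":
    sample = [
        encode_tweet(t)
        for t in ("Learning #Kafka and #Spark", "More #spark streaming", "no tags here")
    ]
    print(top_tag(sample))  # ('spark', 2)
```

In the real pipeline the same steps (decode the message value, split out the tags, count them, take the largest count) would run inside the Spark job over the stream read from the 'awesome' topic.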
I am a passionate Data Engineer