Real-time Twitter Data Analysis using Flume, Kafka and Spark

Authors

👨‍💻 Vipul Tiwari
👩‍💻 Roline Stapny Saldanha
👨‍💻 Devi Sandeep Endluri
👨‍💻 Kartik Venkataraman
👩‍💻 Manseerat Batra

Architecture

Flume: Connects to Twitter, ingests the streaming tweet data, cleans it, and forwards it to Kafka.

Kafka: Buffers the cleaned messages until Spark consumes them.

Spark Streaming: Consumes the messages from Kafka, processes them, and sends the results to the Flask server.

Flask: A Python web framework that receives the processed data from Spark and serves the dashboard.
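The following stdlib-only Python simulation illustrates how data flows through the four stages. It is a sketch under stated assumptions, not the project's code. The URL-stripping cleaning rule, the hashtag counting, and the JSON payload shape are hypothetical, and a queue.Queue stands in for the Kafka topic.

```python
import json
import queue
import re
from collections import Counter

# Hypothetical cleaning rule standing in for the Flume stage: drop URLs, collapse whitespace.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")

def clean(tweet_text):
    """Remove URLs and redundant whitespace from a raw tweet (illustrative only)."""
    return " ".join(URL_RE.sub("", tweet_text).split())

def flume_stage(raw_tweets, topic):
    """Clean each raw tweet and publish it to the stand-in Kafka topic."""
    for text in raw_tweets:
        topic.put(clean(text))

def spark_stage(topic):
    """Drain the topic as one micro-batch and count lower-cased hashtags."""
    counts = Counter()
    while not topic.empty():
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(topic.get()))
    return counts

def to_flask_payload(counts, top_n=3):
    """Serialize the top hashtags as JSON for a hypothetical dashboard endpoint."""
    top = counts.most_common(top_n)
    return json.dumps({"labels": [tag for tag, _ in top], "values": [n for _, n in top]})

topic = queue.Queue()  # stands in for a Kafka topic
flume_stage(["Loving #Spark  https://t.co/x", "#spark and #Kafka", "#kafka #flask"], topic)
print(to_flask_payload(spark_stage(topic)))
```

In the real pipeline each stage runs as a separate process, and Kafka decouples them, so Flume can keep ingesting while Spark is busy with a batch.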

Dashboard

Instructions

Required packages:

Install all the packages listed in requirements.txt (for example, pip install -r requirements.txt).
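Before starting the pipeline, you may want to confirm that every listed package is installed. The sketch below assumes Python 3.8+ for importlib.metadata. It uses a deliberately simplistic parser that handles only plain "name[extras]<specifier>" lines, skips comments and pip options, and does not evaluate environment markers.

```python
import re
from importlib import metadata

def requirement_names(lines):
    """Extract bare distribution names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        # The name ends at the first space, extras bracket, specifier, marker or '@'.
        names.append(re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0])
    return names

def missing_packages(lines):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

print(missing_packages(["# demo line", "surely-not-installed-pkg==1.0"]))
```

From the project root, missing_packages(open("requirements.txt")) should return an empty list once the installation has succeeded.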

Kafka:

Go to the Kafka directory.
Start ZooKeeper using the command: nohup bin/zookeeper-server-start.sh config/zookeeper.properties > ~/zookeeper-logs &
Then start Kafka using the command: nohup bin/kafka-server-start.sh config/server.properties > ~/kafka-logs &
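Both services run in the background with nohup, so a failed start is easy to miss. This Python sketch checks whether their ports accept TCP connections. It assumes the stock defaults: clientPort 2181 in config/zookeeper.properties and broker port 9092, which is the port used in the spark-submit example. Adjust the mapping if your configuration differs.

```python
import socket

# Assumed default ports from the stock Kafka distribution configs.
DEFAULT_SERVICES = {"ZooKeeper": 2181, "Kafka": 9092}

def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False

def check_services(host="localhost", services=DEFAULT_SERVICES):
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_listening(host, port) for name, port in services.items()}

if __name__ == "__main__":
    print(check_services())
```

An open port only shows that the process is listening. Check ~/zookeeper-logs and ~/kafka-logs if the broker still misbehaves.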

Flume:

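Go to the Flume directory (for example, cd apache-flume-1.9.0-bin/).
Run the Flume agent using the command: bin/flume-ng agent --conf conf --conf-file "/home/ubuntu/flume_twitter_to_kafka.conf" --name agent1
The value passed to --name must match the agent name used as the property prefix in the configuration file.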
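Flume only loads the components whose keys start with the agent name given to --name. This Python sketch checks a configuration for the sources, channels, and sinks keys, and for a type entry for each component. The sample configuration is hypothetical and is not the project's flume_twitter_to_kafka.conf, although the class names are real Flume 1.9 components. The parser is simplified and ignores properties-file continuation lines and escapes.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines (no continuation lines or escapes)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def missing_agent_keys(text, agent="agent1"):
    """List the component keys a Flume agent named `agent` would be missing."""
    props = parse_properties(text)
    missing = []
    for component in ("sources", "channels", "sinks"):
        key = f"{agent}.{component}"
        names = props.get(key, "").split()
        if not names:
            missing.append(key)
            continue
        for name in names:
            type_key = f"{key}.{name}.type"  # every component needs a type
            if type_key not in props:
                missing.append(type_key)
    return missing

# Hypothetical sample configuration, for illustration only.
SAMPLE_CONF = """
agent1.sources = Twitter
agent1.channels = MemChannel
agent1.sinks = KafkaSink
agent1.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
agent1.channels.MemChannel.type = memory
agent1.sinks.KafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
"""
print(missing_agent_keys(SAMPLE_CONF))
```

An empty list means the agent name and the component declarations line up. A list of top-level keys usually means --name does not match the prefix used in the file.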

Spark streaming:

Download Apache Spark 2.4.5 (the distribution includes Spark Streaming).
Extract the tar file into the local workspace.
Set this directory path as SPARK_HOME in the environment variables.
Set the same path as HADOOP_HOME in the environment variables.
Add SPARK_HOME/bin to the PATH variable.
Make sure JAVA_HOME points to JDK 1.8.
Copy the spark-streaming-kafka-assembly_2.11-1.6.0.jar file from this project to the local workspace.
Run the job with spark-submit. The example command below uses Windows paths. After the script path, its arguments are the Kafka broker address, the topic name, and the path to geo_tweets.txt: bin\spark-submit --jars spark-streaming-kafka-assembly_2.11-1.6.0.jar D:\Spring2020\csce678\project\code\cloudproject\SparkStreaming\spark-kafka.py 3.22.26.9:9092 twitter_stream_new D:\Spring2020\csce678\project\code\cloudproject\geo_tweets.txt
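The Python sketch below shows what a Spark Streaming job consuming this topic could look like with the PySpark 2.x Kafka 0.8 API (KafkaUtils.createDirectStream, removed in Spark 3.0). It is not the project's spark-kafka.py. It assumes that message values are plain tweet text, the hashtag counting is only an example workload, and the third argument is parsed but ignored because its role in the original script is not documented here.

```python
import re
import sys

HASHTAG_RE = re.compile(r"#(\w+)")

def parse_cli(argv):
    """Split argv into (brokers, topic, path), mirroring the example command's order."""
    if len(argv) != 4:
        raise SystemExit("usage: spark-kafka.py <broker:port> <topic> <path>")
    return argv[1], argv[2], argv[3]

def extract_hashtags(message):
    """Return lower-cased hashtags from one message value (assumed plain text)."""
    return [tag.lower() for tag in HASHTAG_RE.findall(message)]

def run(brokers, topic, batch_seconds=10):
    # Imported lazily so the helpers above can be used without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # PySpark 2.x only

    sc = SparkContext(appName="TwitterHashtagSketch")
    ssc = StreamingContext(sc, batch_seconds)
    # Each record arrives as a (key, value) pair; keep only the value.
    stream = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
    counts = (stream.map(lambda kv: kv[1])
                    .flatMap(extract_hashtags)
                    .map(lambda tag: (tag, 1))
                    .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first counts of every micro-batch
    ssc.start()
    ssc.awaitTermination()

# Only launch the streaming job when arguments are supplied (e.g. via spark-submit).
if __name__ == "__main__" and len(sys.argv) > 1:
    brokers, topic, _path = parse_cli(sys.argv)
    run(brokers, topic)
```

The direct stream reads Kafka partitions without a receiver. This matches the Kafka 0.8 assembly jar passed with --jars in the command above.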

Flask:

Source the flask.rc file in this project (source flask.rc).
Run "flask run"; by default, this starts the application at localhost:5000.
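This sketch shows one way the Spark side could push aggregated results to the running Flask server over HTTP, using only urllib. The /updateData route and the {"labels": ..., "values": ...} payload are hypothetical. Check the routes in FlaskDashBoard for the real ones.

```python
import json
import urllib.request

# Hypothetical endpoint on the Flask server started with "flask run".
FLASK_URL = "http://localhost:5000/updateData"

def post_counts(counts, url=FLASK_URL, timeout=5.0):
    """POST a {tag: count} mapping as JSON and return the HTTP status code."""
    body = json.dumps({"labels": list(counts), "values": list(counts.values())}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Inside a streaming job, a function like this would typically be called once per batch on the driver. For example: counts.foreachRDD(lambda rdd: post_counts(dict(rdd.collect()))). This is reasonable only while each batch's aggregate is small.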

Repository: dsandeep0138/SparkTwitterAnalysis

Folders and files

Latest commit

History

Repository files navigation

Real-time Twitter Data Analysis using Flume, Kafka and Spark

Authors

Architecture

Dashboard

Instructions

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages