- 👨‍💻 Vipul Tiwari
- 👩‍💻 Roline Stapny Saldanha
- 👨‍💻 Devi Sandeep Endluri
- 👨‍💻 Kartik Venkataraman
- 👩‍💻 Manseerat Batra
Flume: Connects to Twitter and pulls the streaming data. This data is then cleaned and sent to Kafka.
Kafka: Holds the messages for consumption by Spark Streaming.
Spark Streaming: Consumes the messages from Kafka, processes them, and sends the results to the Flask server.
Flask: A Python web framework that receives the processed data from Spark and renders the dashboards.
Needed packages:
- Install all the packages listed in requirements.txt.
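- For example, using pip: pip install -r requirements.txt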
Kafka:
- Go to the Kafka directory.
- Run ZooKeeper using the command: nohup bin/zookeeper-server-start.sh config/zookeeper.properties > ~/zookeeper-logs &
- Run Kafka using the command: nohup bin/kafka-server-start.sh config/server.properties > ~/kafka-logs &
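- (Optional) Create the topic used by the pipeline if it does not already exist, for example: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitter_stream_new (depending on your Kafka version you may need --bootstrap-server localhost:9092 instead of --zookeeper; adjust the address, partitions, and topic name to your setup).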
Flume:
- Go to the Flume directory (for example, cd apache-flume-1.9.0-bin/).
- Run the Flume agent using the command: bin/flume-ng agent --conf conf --conf-file "/home/ubuntu/flume_twitter_to_kafka.conf" --name agent1
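The flume_twitter_to_kafka.conf file wires a Twitter source to a Kafka sink through a channel. The actual file is specific to this project, but a minimal sketch for agent1 (with placeholder Twitter credentials and an assumed localhost:9092 broker) could look like:

```properties
# Hypothetical sketch of flume_twitter_to_kafka.conf; the project's real
# config may differ (source type, channel sizing, broker address).
agent1.sources = twitter
agent1.channels = mem
agent1.sinks = kafka

# Twitter source (requires your own API credentials)
agent1.sources.twitter.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.twitter.consumerKey = <consumer-key>
agent1.sources.twitter.consumerSecret = <consumer-secret>
agent1.sources.twitter.accessToken = <access-token>
agent1.sources.twitter.accessTokenSecret = <access-token-secret>
agent1.sources.twitter.channels = mem

# Buffer events in memory between source and sink
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000
agent1.channels.mem.transactionCapacity = 1000

# Kafka sink publishing to the topic consumed by Spark Streaming
agent1.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka.kafka.bootstrap.servers = localhost:9092
agent1.sinks.kafka.kafka.topic = twitter_stream_new
agent1.sinks.kafka.channel = mem
```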
Spark Streaming:
- Download Apache Spark version 2.4.5.
- Extract the tar file into the local workspace.
- Set this directory path as SPARK_HOME in the environment variables.
- Set the same path as HADOOP_HOME in the environment variables.
- Add SPARK_HOME/bin to the PATH variable.
- Make sure JAVA_HOME points to a JDK 1.8 installation.
- Download the spark-streaming-kafka-assembly_2.11-1.6.0.jar file from this project into the local workspace.
- Use the following example command to run: bin\spark-submit --jars spark-streaming-kafka-assembly_2.11-1.6.0.jar D:\Spring2020\csce678\project\code\cloudproject\SparkStreaming\spark-kafka.py 3.22.26.9:9092 twitter_stream_new D:\Spring2020\csce678\project\code\cloudproject\geo_tweets.txt
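For orientation, here is a minimal sketch of a Kafka consumer in the spirit of spark-kafka.py; the app name, 10-second batch interval, and pprint output are assumptions, and the real script also uses the geo_tweets.txt file and forwards results to Flask:

```python
# Hypothetical sketch, not the project's actual spark-kafka.py.
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    brokers, topic = sys.argv[1], sys.argv[2]  # e.g. 3.22.26.9:9092 twitter_stream_new
    sc = SparkContext(appName="TwitterKafkaStream")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches (assumed interval)
    # Receiver-less direct stream; each record arrives as a (key, value) pair
    kafka_stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers})
    tweets = kafka_stream.map(lambda kv: kv[1])  # keep just the tweet payload
    tweets.pprint()  # stand-in for the project's real processing and forwarding
    ssc.start()
    ssc.awaitTermination()
```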
Flask:
- Source the flask.rc file in this project: source flask.rc
- Run "flask run"; by default this starts the application at http://localhost:5000