In this project, we'll create a simple streaming application with Kafka. There are two versions of this app:
- Ver1: Kafka, Spark, NodeJS
- Ver2: Kafka, Spark, MongoDB, NodeJS (not ready yet)
- Kafka (https://kafka.apache.org/downloads)
- Spark (https://spark.apache.org/downloads.html)
- NodeJS (https://nodejs.org/en/download/)
- MongoDB (https://www.mongodb.com/download-center/community)
- Environment: Windows 10
We assume that we have a website selling five brands of bicycles: Trek, Giant, Jett, Cannondale, and Surly. Every 30 seconds, we want to know how many bicycles of each brand were sold. To simulate the bicycles purchased online, we create a KafkaProducer that auto-generates the data. Spark Streaming then reads the data from the KafkaProducer and computes the result, which the NodeJS server uses for visualization.
KafkaProducer generates the data and sends it to the bike-data topic. Next, SparkStreamingConsumer reads the data from this topic, computes the result, and sends it to a new topic named bike-data-visualization. Meanwhile, the NodeJS server continuously listens to the bike-data-visualization topic, reads the data, and visualizes it.
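The producer side of this pipeline can be sketched in a few lines. The class and method names below are hypothetical illustrations (the real BicycleDataProducer's message format and class layout may differ); the sketch only shows the idea of randomly generating one purchase event per brand:

```java
import java.util.List;
import java.util.Random;

public class BikeEventGenerator {
    // The five brands sold on the site
    static final List<String> BRANDS =
            List.of("Trek", "Giant", "Jett", "Cannondale", "Surly");

    private final Random random = new Random();

    // Pick a random brand to simulate one online purchase;
    // the real producer would send this value to the bike-data topic
    public String nextEvent() {
        return BRANDS.get(random.nextInt(BRANDS.size()));
    }

    public static void main(String[] args) {
        BikeEventGenerator gen = new BikeEventGenerator();
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.nextEvent());
        }
    }
}
```

In the actual project, each generated event would be wrapped in a ProducerRecord and sent to Kafka by a KafkaProducer instance configured with the broker address (localhost:9092 here).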
First, clone this project
$ git clone https://github.com/nxhuy-github/KafkaTraining.git
Next, open two CMD windows and start Zookeeper and Kafka
$ zookeeper-server-start.bat %KAFKA_HOME%\config\zookeeper.properties
$ kafka-server-start.bat %KAFKA_HOME%\config\server.properties
Note: on Linux, use the .sh scripts instead.
Open a new CMD and create the Kafka topics
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bike-data
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bike-data-visualization
Next, start the NodeJS server
$ cd realtime-dashboard
$ node server-kafka.js
Now, to run KafkaProducer, open a new CMD and run these commands
$ cd bicycledataproducer\dist
$ java -cp bicycle-data-producer-1.0-SNAPSHOT.jar com.nxhuy.kafka.producer.training.BicycleDataProducer localhost:9092 bike-data 0 30 50 1200
In this CMD, we'll see the data generated by KafkaProducer.
To run SparkStreamingConsumer, open a new CMD and run these commands
$ cd bicyclestreamingconsumer\dist
$ spark-submit --class com.nxhuy.spark.streaming.training.BicycleStreamingConsumer bicycle-streaming-consumer-1.0-SNAPSHOT.jar
In this CMD, we'll see the result of SparkStreamingConsumer.
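The computation SparkStreamingConsumer performs boils down to counting events per brand within each 30-second window. As a minimal sketch without Spark (the class and method names here are hypothetical; the real job does this aggregation over the Kafka stream), the per-window count looks like:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowCount {
    // Count how many bicycles of each brand were sold within one window.
    // TreeMap keeps the brands in alphabetical order for readable output.
    public static Map<String, Long> countByBrand(List<String> windowEvents) {
        Map<String, Long> counts = new TreeMap<>();
        for (String brand : windowEvents) {
            counts.merge(brand, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> window = List.of("Trek", "Giant", "Trek", "Surly");
        System.out.println(countByBrand(window)); // prints {Giant=1, Surly=1, Trek=2}
    }
}
```

In the real pipeline, this per-window result is what gets published to the bike-data-visualization topic for the NodeJS server to read.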
To see the result, open http://localhost:8080 in your browser.
Coming soon