
Kafka Training

In this project, we'll create a simple streaming application with Kafka. There will be two versions of this app:

  • Ver1: Kafka, Spark, NodeJS
  • Ver2: Kafka, Spark, MongoDB, NodeJS (not ready yet)

Technologies

Version One

We assume that we have a website selling five brands of bicycles: Trek, Giant, Jett, Cannondale, Surly. We want to know, every 30 seconds, how many bicycles of each of these brands were sold. To simulate the bicycles purchased online, we create a KafkaProducer which auto-generates the data. Next, Spark Streaming reads and aggregates the data coming from the KafkaProducer, and the result is consumed by a NodeJS server for visualization.

Idea of the project

KafkaProducer generates the data and sends it to the bike-data topic. Next, SparkStreamingConsumer reads the data from this topic, aggregates it, and sends the result to a new topic named bike-data-visualization. Meanwhile, the NodeJS server continuously listens to the bike-data-visualization topic, reads the data, and visualizes it.
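To make the producer side of this flow concrete, here is a minimal sketch in Java. It assumes each simulated sale is sent as a plain string containing the brand name; the actual BicycleDataProducer may use a different message format, class name, and generation parameters.

import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BikeSalesProducerSketch {
    private static final String[] BRANDS = {"Trek", "Giant", "Jett", "Cannondale", "Surly"};

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Random random = new Random();
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // Pick a random brand to simulate one online purchase.
                String brand = BRANDS[random.nextInt(BRANDS.length)];
                producer.send(new ProducerRecord<>("bike-data", brand));
                Thread.sleep(500); // throttle the simulated purchases
            }
        }
    }
}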

How to run this project

First, clone this project

$ git clone https://github.com/nxhuy-github/KafkaTraining.git

Next, open two CMD windows and start Zookeeper and Kafka

$ zookeeper-server-start.bat %KAFKA_HOME%\config\zookeeper.properties
$ kafka-server-start.bat %KAFKA_HOME%\config\server.properties

Note: on Linux, use the corresponding .sh scripts instead.

Open a new CMD window and create the two Kafka topics

$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bike-data
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bike-data-visualization
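
Optionally, you can verify that both topics exist (using the same --zookeeper flag as above, which applies to older Kafka versions):

$ kafka-topics.bat --list --zookeeper localhost:2181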

Next, start the NodeJS server

$ cd realtime-dashboard
$ node server-kafka.js

Now, to run KafkaProducer, open a new CMD window and run this command

$ cd bicycledataproducer\dist
$ java -cp bicycle-data-producer-1.0-SNAPSHOT.jar com.nxhuy.kafka.producer.training.BicycleDataProducer localhost:9092 bike-data 0 30 50 1200

In this CMD, we'll see the data generated by KafkaProducer.

To run SparkStreamingConsumer, open a new CMD window and run this command

$ cd bicyclestreamingconsumer\dist
$ spark-submit --class com.nxhuy.spark.streaming.training.BicycleStreamingConsumer bicycle-streaming-consumer-1.0-SNAPSHOT.jar

In this CMD, we'll see the result of SparkStreamingConsumer.
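For reference, the kind of job SparkStreamingConsumer runs could look roughly like the sketch below: consume the bike-data topic, count the sales per brand over each 30-second batch, and publish the counts to bike-data-visualization. The class name, the brand-per-message format, and the "brand:count" output encoding are assumptions for illustration, not the project's actual code.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public class BikeSalesStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("BikeSalesStreamingSketch").setMaster("local[2]");
        // 30-second batches: one set of per-brand counts is produced every 30 seconds.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "bike-sales-sketch");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("bike-data"), kafkaParams));

        // Count how many times each brand appears in the 30-second batch,
        // then publish "brand:count" strings to the visualization topic.
        stream.mapToPair(record -> new Tuple2<>(record.value(), 1))
              .reduceByKey(Integer::sum)
              .foreachRDD(rdd -> rdd.foreachPartition(partition -> {
                  Properties props = new Properties();
                  props.put("bootstrap.servers", "localhost:9092");
                  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
                  try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                      partition.forEachRemaining(pair -> producer.send(
                              new ProducerRecord<>("bike-data-visualization",
                                      pair._1 + ":" + pair._2)));
                  }
              }));

        jssc.start();
        jssc.awaitTermination();
    }
}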

To see the visualization, open http://localhost:8080 in your browser.

Version Two

Coming soon.
