In this project, we'll create a simple streaming application with Kafka. There are two versions of this app:
- Ver1: Kafka, Spark, NodeJS
- Ver2: Kafka, Spark, MongoDB, NodeJS (not ready yet)
- Kafka (https://kafka.apache.org/downloads)
- Spark (https://spark.apache.org/downloads.html)
- NodeJS (https://nodejs.org/en/download/)
- MongoDB (https://www.mongodb.com/download-center/community)
- Environment: Windows 10
We assume that we have a website selling five brands of bicycles: Trek, Giant, Jett, Cannondale, and Surly. Every 30 seconds, we want to know how many bicycles of each brand were sold. To simulate the bicycles purchased online, we create a KafkaProducer that auto-generates the data. Spark Streaming then reads the data from the KafkaProducer and computes the result, which the NodeJS server uses for visualization.
KafkaProducer generates the data and sends it to the bike-data topic. Next, SparkStreamingConsumer reads the data from this topic, computes the result, and sends it to a new topic named bike-data-visualization. Meanwhile, the NodeJS server continuously listens to the bike-data-visualization topic, reads the data, and visualizes it.
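The producer side of this pipeline can be sketched in a few lines. The class and method names below are hypothetical illustrations (the real BicycleDataProducer's message format and class layout may differ); the sketch only shows the idea of randomly generating one purchase event per brand:

```java
import java.util.List;
import java.util.Random;

public class BikeEventGenerator {
    // The five brands sold on the site
    static final List<String> BRANDS =
            List.of("Trek", "Giant", "Jett", "Cannondale", "Surly");

    private final Random random = new Random();

    // Pick a random brand to simulate one online purchase;
    // the real producer would send this value to the bike-data topic
    public String nextEvent() {
        return BRANDS.get(random.nextInt(BRANDS.size()));
    }

    public static void main(String[] args) {
        BikeEventGenerator gen = new BikeEventGenerator();
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.nextEvent());
        }
    }
}
```

In the actual project, each generated event would be wrapped in a ProducerRecord and sent to Kafka by a KafkaProducer instance configured with the broker address (localhost:9092 here).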
First, clone this project
$ git clone https://github.com/nxhuy-github/KafkaTraining.git
Next, open two CMD windows and start Zookeeper and Kafka
$ zookeeper-server-start.bat %KAFKA_HOME%\config\zookeeper.properties
$ kafka-server-start.bat %KAFKA_HOME%\config\server.properties
Note: on Linux, use the .sh scripts instead.
Open a new CMD and create the Kafka topics
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bike-data
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bike-data-visualization
Next, start the NodeJS server
$ cd realtime-dashboard
$ node server-kafka.js
Now, to run KafkaProducer, open a new CMD and run these commands
$ cd bicycledataproducer\dist
$ java -cp bicycle-data-producer-1.0-SNAPSHOT.jar com.nxhuy.kafka.producer.training.BicycleDataProducer localhost:9092 bike-data 0 30 50 1200
In this CMD, we'll see the data generated by KafkaProducer.
To run SparkStreamingConsumer, open a new CMD and run these commands
$ cd bicyclestreamingconsumer\dist
$ spark-submit --class com.nxhuy.spark.streaming.training.BicycleStreamingConsumer bicycle-streaming-consumer-1.0-SNAPSHOT.jar
In this CMD, we'll see the result of SparkStreamingConsumer.
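The computation SparkStreamingConsumer performs boils down to counting events per brand within each 30-second window. As a minimal sketch without Spark (the class and method names here are hypothetical; the real job does this aggregation over the Kafka stream), the per-window count looks like:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowCount {
    // Count how many bicycles of each brand were sold within one window.
    // TreeMap keeps the brands in alphabetical order for readable output.
    public static Map<String, Long> countByBrand(List<String> windowEvents) {
        Map<String, Long> counts = new TreeMap<>();
        for (String brand : windowEvents) {
            counts.merge(brand, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> window = List.of("Trek", "Giant", "Trek", "Surly");
        System.out.println(countByBrand(window)); // prints {Giant=1, Surly=1, Trek=2}
    }
}
```

In the real pipeline, this per-window result is what gets published to the bike-data-visualization topic for the NodeJS server to read.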
To see the result, open http://localhost:8080 in your browser.
Coming soon