
Kafka Real Time Pipeline & Streaming Application

About

This is my first ever Kafka project. It is written in Java and consists of:

  • Data Generator, which generates data in Avro format
  • Producer, which sends this data to a Kafka topic
  • Consumer, which reads the data, deserializes it, and stores it in the "data" directory on your local machine
  • Kafka Streams application, which reads the data, applies a 1-minute tumbling window, and aggregates how many events were sent to each postcode within that window; the aggregate is then written back to a Kafka topic (a sketch follows this list)
  • Consumer, which reads the output of the Kafka Streams application and stores it in the "data" directory on your local machine
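The windowed aggregation is small enough to sketch. The snippet below is a minimal illustration, not the project's actual code: the topic names, the assumption that records arrive keyed by postcode, and the use of String serdes instead of Avro are all placeholders for brevity.

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class PostcodeAggregatorSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "postcode-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey() // assumes records are keyed by postcode
               .windowedBy(TimeWindows.of(Duration.ofMinutes(1))) // 1-minute tumbling window
               .count()
               .toStream()
               // flatten the windowed key to "postcode@windowStart" before writing back
               .map((windowedKey, count) ->
                       KeyValue.pair(windowedKey.key() + "@" + windowedKey.window().start(), count))
               .to("page-views-agg", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}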

All components read their configuration from files in the "configuration" folder, so no code changes are needed to adjust basic settings such as input/output topics, file names, or hosts.
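For illustration, this kind of file-based configuration usually amounts to reading a java.util.Properties file at startup. The sketch below assumes a hypothetical configuration/producer.properties file and a "topic" key; it is not the project's actual loader.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfigSketch {
    // Load settings (topics, hosts, file names, ...) from a properties file
    // so they can be changed without touching the code.
    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load("configuration/producer.properties"); // hypothetical path
        System.out.println("Producing to topic: " + props.getProperty("topic"));
    }
}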

Note: Ideally this project would be split into separate modules/Gradle builds, but to simplify the review process all the code is kept in one location.

Architecture

[Architecture diagram]

Note: In the root directory, the folder named "Data" contains output data files with a few sample records left in place so you can review the file structure.

To run the project

You will need Docker installed on your machine to run this.

  1. Clone the repository

  2. Run

docker-compose up -d

This will spin up five Docker containers: Zookeeper, Kafka, Schema Registry, a Java machine to run the code, and Kafdrop for a GUI.

  3. Run

docker ps

This will list the running containers. Select and copy the CONTAINER ID of the container based on the "checkout_java-machine" image.


  4. Run the command below, replacing CONTAINER_ID with the ID from the previous step (a shortcut is shown after this list). This will connect you to the instance with the Java code inside.

docker exec -it CONTAINER_ID /bin/sh; exit
  5. Once connected, run the command below, which starts the producer, the two consumers, and the Kafka Streams application.

./launch.sh

  6. The output files land in the folder "data" as two files: page-views-raw.out and page-views-agg.out.
  7. You can view the Kafka cluster information in an interactive GUI: just open your browser and go to localhost:9000.
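A shortcut for step 4: if you would rather not copy the container ID by hand, Docker can resolve it by image name via a ps filter (this assumes exactly one matching container is running):

docker exec -it $(docker ps -qf "ancestor=checkout_java-machine") /bin/sh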
