
Kafka Real Time Pipeline & Streaming Application

About

This is my first ever Kafka project. It is written in Java and consists of:

  • Data Generator, which generates data in Avro format
  • Producer, which sends this data to a Kafka topic
  • Consumer, which reads the data, deserializes it, and stores it in the "data" directory on your local machine
  • Kafka Streams application, which reads the data, applies a 1-minute tumbling window, and aggregates how many events were sent to each postcode within that window; the aggregate is then written back to a Kafka topic (a sketch follows this list)
  • Consumer, which reads the output of the Kafka Streams application and stores it in the "data" directory on your local machine
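The windowed aggregation is small enough to sketch. The snippet below is a minimal illustration, not the project's actual code: the topic names, the assumption that records arrive keyed by postcode, and the use of String serdes instead of Avro are all placeholders for brevity.

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class PostcodeAggregatorSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "postcode-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey() // assumes records are keyed by postcode
               .windowedBy(TimeWindows.of(Duration.ofMinutes(1))) // 1-minute tumbling window
               .count()
               .toStream()
               // flatten the windowed key to "postcode@windowStart" before writing back
               .map((windowedKey, count) ->
                       KeyValue.pair(windowedKey.key() + "@" + windowedKey.window().start(), count))
               .to("page-views-agg", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}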

All components read their configuration from files in the "configuration" folder, so no code changes are needed to adjust basic settings such as input/output topics, file names, or hosts.
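For illustration, this kind of file-based configuration usually amounts to reading a java.util.Properties file at startup. The sketch below assumes a hypothetical configuration/producer.properties file and a "topic" key; it is not the project's actual loader.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ConfigSketch {
    // Load settings (topics, hosts, file names, ...) from a properties file
    // so they can be changed without touching the code.
    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load("configuration/producer.properties"); // hypothetical path
        System.out.println("Producing to topic: " + props.getProperty("topic"));
    }
}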

Note: Ideally this project would be split into separate modules/Gradle builds, but to simplify the review process all the code is kept in one location.

Architecture

[Architecture diagram]

Note: In the root directory, the folder named "Data" contains output data files with a few sample records left in place so you can review the file structure.

To run the project

You will need Docker installed on your machine to run this.

  1. Clone the repository

  2. Run

docker-compose up -d

This will spin up five Docker containers: Zookeeper, Kafka, Schema Registry, a Java machine to run the code, and Kafdrop for a GUI.

  3. Run

docker ps

This will list the running containers. Select and copy the CONTAINER ID of the container based on the "checkout_java-machine" image.


  4. Run the command below, replacing CONTAINER_ID with the ID from the previous step (a shortcut is shown after this list). This will connect you to the instance with the Java code inside.

docker exec -it CONTAINER_ID /bin/sh; exit
  5. Once connected, run the command below, which starts the producer, the two consumers, and the Kafka Streams application.

./launch.sh

  6. The output files land in the folder "data" as two files: page-views-raw.out and page-views-agg.out.
  7. You can view the Kafka cluster information in an interactive GUI: just open your browser and go to localhost:9000.
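A shortcut for step 4: if you would rather not copy the container ID by hand, Docker can resolve it by image name via a ps filter (this assumes exactly one matching container is running):

docker exec -it $(docker ps -qf "ancestor=checkout_java-machine") /bin/sh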
