
Data-Warehousing

Big data streaming pipeline and integration platform project using Kafka and Cassandra connectors.

This school project required us to build a Data Streaming Pipeline and Integration Platform with the following requirements:

  • Generate data from 10 different producers, each sending its records to a Kafka topic.
  • Create a database in Cassandra to store the data from the producers.
  • Write a Python program that fetches the latest batch of data that arrived in Kafka and then sends (saves) this batch to Cassandra.

This project was implemented on my local computer using Ubuntu. The project documentation is as follows:



[Screenshot: Kafka and Zookeeper running in the terminal]

Kafka and Zookeeper first need to be running in the terminal (typically started with the bin/zookeeper-server-start.sh and bin/kafka-server-start.sh scripts that ship with Kafka) before anything else is run.

The 10 producers are then run simultaneously, each sending the data it produces to a topic in Kafka. The producers send records about different vehicles, such as their time and date of arrival, vehicle type, and LGU code.
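Conceptually, each producer can be as simple as the following sketch. It assumes the kafka-python package, a broker on localhost:9092, and an illustrative topic name (vehicle-records) and field names; the repository's actual producers may differ.

```python
# Minimal producer sketch (assumptions: kafka-python, local broker,
# hypothetical topic "vehicle-records" and field names).
import json
import random
import time
from datetime import datetime

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    record = {
        "arrival": datetime.now().isoformat(),  # time and date of arrival
        "vehicle_type": random.choice(["car", "truck", "motorcycle", "bus"]),
        "lgu_code": random.randint(1, 10),      # LGU code
    }
    producer.send("vehicle-records", record)    # publish one record to the topic
    time.sleep(1)                               # throttle so the stream is easy to follow
```

Running ten copies of a script like this, each generating its own records, reproduces the ten-producer setup.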

[Screenshot: sample output from the producers]



The data is then read from the topic by a Kafka consumer.
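A minimal consumer for the same topic might look like the sketch below (same kafka-python and topic-name assumptions as above); it simply prints each record as it arrives.

```python
# Minimal consumer sketch (assumptions: kafka-python, local broker,
# hypothetical topic "vehicle-records").
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "vehicle-records",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value)  # e.g. {'arrival': ..., 'vehicle_type': ..., 'lgu_code': ...}
```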

[Screenshot: sample output from the consumer]

A database is then created in Cassandra to store all the data from the producers. The Python program kafka_consumer_sink.py fetches the latest batch of data that arrived in Kafka and sends it to be saved in the Cassandra database.
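The sink could be structured roughly like the sketch below. It assumes the cassandra-driver package, a Cassandra node on 127.0.0.1, and an illustrative keyspace/table layout (vehicles.records); the actual schema and logic in kafka_consumer_sink.py may differ.

```python
# Minimal Kafka-to-Cassandra sink sketch (assumptions: cassandra-driver,
# kafka-python, local Cassandra node, hypothetical keyspace "vehicles",
# table "records", and topic "vehicle-records").
import json
import uuid

from cassandra.cluster import Cluster
from kafka import KafkaConsumer

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS vehicles "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.execute(
    "CREATE TABLE IF NOT EXISTS vehicles.records "
    "(id uuid PRIMARY KEY, arrival text, vehicle_type text, lgu_code int)"
)
insert = session.prepare(
    "INSERT INTO vehicles.records (id, arrival, vehicle_type, lgu_code) "
    "VALUES (?, ?, ?, ?)"
)

consumer = KafkaConsumer(
    "vehicle-records",
    bootstrap_servers="localhost:9092",
    group_id="cassandra-sink",     # committed offsets: each run resumes where the last stopped
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,      # stop iterating once the current batch is drained
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Drain the latest batch from Kafka and save each record to Cassandra.
for message in consumer:
    r = message.value
    session.execute(insert, (uuid.uuid4(), r["arrival"], r["vehicle_type"], r["lgu_code"]))

cluster.shutdown()
```

Using a consumer group means each run of the sink only sees records that arrived since the previous run, which is one way to read "the latest batch" of data.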

[Screenshot: Cassandra database before fetching data]

[Screenshot: Cassandra database after fetching data]
