
Data-Warehousing

Big data streaming pipeline and integration platform project using Kafka and Cassandra connectors.

This school project required us to build a Data Streaming Pipeline and Integration Platform with the following requirements:

  • Generate data from 10 different producers, each sending its records to a Kafka topic.
  • Create a database in Cassandra to store the data from the producers.
  • Write a Python program that fetches the latest batch of data that arrived in Kafka and then sends (saves) this batch to Cassandra.

This project was implemented on my local computer using Ubuntu. The project documentation is as follows:



[Screenshot: Kafka and Zookeeper running in the terminal]

Kafka and Zookeeper first need to be running in the terminal (typically started with the bin/zookeeper-server-start.sh and bin/kafka-server-start.sh scripts that ship with Kafka) before anything else is run.

The 10 producers are then run simultaneously, each sending the data it produces to a topic in Kafka. The producers send records about different vehicles, such as their time and date of arrival, vehicle type, and LGU code.
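Conceptually, each producer can be as simple as the following sketch. It assumes the kafka-python package, a broker on localhost:9092, and an illustrative topic name (vehicle-records) and field names; the repository's actual producers may differ.

```python
# Minimal producer sketch (assumptions: kafka-python, local broker,
# hypothetical topic "vehicle-records" and field names).
import json
import random
import time
from datetime import datetime

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    record = {
        "arrival": datetime.now().isoformat(),  # time and date of arrival
        "vehicle_type": random.choice(["car", "truck", "motorcycle", "bus"]),
        "lgu_code": random.randint(1, 10),      # LGU code
    }
    producer.send("vehicle-records", record)    # publish one record to the topic
    time.sleep(1)                               # throttle so the stream is easy to follow
```

Running ten copies of a script like this, each generating its own records, reproduces the ten-producer setup.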

[Screenshot: sample output from the producers]



The data is then read from the topic by a Kafka consumer.
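A minimal consumer for the same topic might look like the sketch below (same kafka-python and topic-name assumptions as above); it simply prints each record as it arrives.

```python
# Minimal consumer sketch (assumptions: kafka-python, local broker,
# hypothetical topic "vehicle-records").
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "vehicle-records",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value)  # e.g. {'arrival': ..., 'vehicle_type': ..., 'lgu_code': ...}
```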

[Screenshot: sample output from the consumer]

A database is then created in Cassandra to store all the data from the producers. The Python program kafka_consumer_sink.py fetches the latest batch of data that arrived in Kafka and sends it to be saved in the Cassandra database.
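The sink could be structured roughly like the sketch below. It assumes the cassandra-driver package, a Cassandra node on 127.0.0.1, and an illustrative keyspace/table layout (vehicles.records); the actual schema and logic in kafka_consumer_sink.py may differ.

```python
# Minimal Kafka-to-Cassandra sink sketch (assumptions: cassandra-driver,
# kafka-python, local Cassandra node, hypothetical keyspace "vehicles",
# table "records", and topic "vehicle-records").
import json
import uuid

from cassandra.cluster import Cluster
from kafka import KafkaConsumer

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS vehicles "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.execute(
    "CREATE TABLE IF NOT EXISTS vehicles.records "
    "(id uuid PRIMARY KEY, arrival text, vehicle_type text, lgu_code int)"
)
insert = session.prepare(
    "INSERT INTO vehicles.records (id, arrival, vehicle_type, lgu_code) "
    "VALUES (?, ?, ?, ?)"
)

consumer = KafkaConsumer(
    "vehicle-records",
    bootstrap_servers="localhost:9092",
    group_id="cassandra-sink",     # committed offsets: each run resumes where the last stopped
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,      # stop iterating once the current batch is drained
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Drain the latest batch from Kafka and save each record to Cassandra.
for message in consumer:
    r = message.value
    session.execute(insert, (uuid.uuid4(), r["arrival"], r["vehicle_type"], r["lgu_code"]))

cluster.shutdown()
```

Using a consumer group means each run of the sink only sees records that arrived since the previous run, which is one way to read "the latest batch" of data.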

[Screenshot: Cassandra database before fetching data]

[Screenshot: Cassandra database after fetching data]
