This project offers a hands-on approach to building an end-to-end data engineering solution for processing real-time stock market data with Apache Kafka. By working through this project, you'll dive into the practical aspects of managing and analyzing data streams in real time.
- Install Kafka (see the full command list at the end of this section). In case of insufficient physical memory, add swap memory. Here are the commands:
```bash
# Cap the Kafka JVM heap so it fits in limited memory
export KAFKA_HEAP_OPTS="-Xmx512M -Xms512M"

# Create and enable a 2 GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make the swap space permanent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# View the changes, then restart Kafka
sudo swapon --show
```
- Make sure to run ZooKeeper and Kafka separately, each in its own terminal session, using commands like `bin/kafka-server-start.sh config/server.properties` (the full command list is at the end of this section).
- Once Kafka is running on EC2, it should accept requests on the instance's public IP, e.g. 35.172.219.231:9092. For external clients to connect, the broker must advertise that public IP in `config/server.properties`; one way to set it is sketched below.
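A minimal sketch, assuming `advertised.listeners` is still commented out in the stock `config/server.properties` (otherwise, edit the existing line instead of appending; the IP below is the example used in this guide, so substitute your own):

```bash
# Advertise the broker on the EC2 public IP so external clients can reach it
echo 'advertised.listeners=PLAINTEXT://35.172.219.231:9092' | sudo tee -a config/server.properties
```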
- Bring the Docker Compose stack up; it runs both the producer and the consumer to simulate the environment.
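A sketch of that step, assuming a `docker-compose.yml` with producer and consumer services sits at the repository root:

```bash
# Build the images and start the producer and consumer in the background
docker compose up -d --build

# Tail the logs to confirm records are flowing through the topic
docker compose logs -f
```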
- Create an S3 bucket to hold the records the consumer writes out.
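For example (the bucket name here is a hypothetical placeholder; S3 bucket names must be globally unique):

```bash
# Create the bucket, then confirm the consumer's output is landing in it
aws s3 mb s3://stock-market-kafka-demo
aws s3 ls s3://stock-market-kafka-demo --recursive
```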
- Run live queries against the data in S3 with AWS Athena.
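A sketch of a live query from the CLI; the database, table, and result-location names are hypothetical placeholders for whatever you configure over the bucket:

```bash
# Kick off an Athena query against the table backed by the S3 data
aws athena start-query-execution \
  --query-string "SELECT * FROM stock_data LIMIT 10" \
  --query-execution-context Database=stock_market_db \
  --result-configuration OutputLocation=s3://stock-market-kafka-demo/athena-results/
```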
- Use Terraform as the IaC tool to provision the AWS resources (EC2, S3, etc.).
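The standard Terraform workflow applies, assuming the configuration lives in a `terraform/` directory of the repo (a hypothetical layout):

```bash
cd terraform
# Download providers, preview the changes, then create the resources
terraform init
terraform plan
terraform apply
```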
Full command list:

```bash
# Download and extract Kafka
wget https://downloads.apache.org/kafka/3.3.1/kafka_2.12-3.3.1.tgz
tar -xvf kafka_2.12-3.3.1.tgz
cd kafka_2.12-3.3.1

# Start ZooKeeper (keep it running in its own terminal)
bin/zookeeper-server-start.sh config/zookeeper.properties

# In a second terminal: cap the JVM heap, then start the Kafka broker
export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
bin/kafka-server-start.sh config/server.properties

# Edit server.properties (e.g. to advertise the EC2 public IP), then restart the broker
sudo nano config/server.properties

# Create a topic, then attach a console producer and consumer to it
bin/kafka-topics.sh --create --topic demo_testing2 --bootstrap-server {Put the Public IP of your EC2 Instance:9092} --replication-factor 1 --partitions 1
bin/kafka-console-producer.sh --topic demo_testing2 --bootstrap-server {Put the Public IP of your EC2 Instance:9092}
bin/kafka-console-consumer.sh --topic demo_testing2 --bootstrap-server {Put the Public IP of your EC2 Instance:9092}
```