A distributed system setup consisting of Kafka, Cassandra, Flink, ksqlDB, and custom consumers, designed to run across two nodes for high availability and scalability.
- System Architecture
- Prerequisites
- Cloning the Repository
- Environment Configuration
- Deployment Instructions
- Initializing Cassandra
- Additional Notes
This system consists of:
-
Node 1:
- Kafka Broker 1
- Cassandra Node 1
- Flink JobManager 1
- Flink TaskManager 1
- Zookeeper Node 1
- ksqlDB Server 1
-
Node 2:
- Kafka Broker 2
- Cassandra Node 2
- Flink JobManager 2
- Flink TaskManager 2
- Zookeeper Nodes 2 and 3
- ksqlDB Server 2
Custom consumers (kafka-to-cassandra
and flink-to-cassandra
) can run on either or both nodes.
Ensure the following are installed on both nodes:
- Docker (>= 20.10)
- Docker Compose (>= 1.29)
- Python (>= 3.8)
- pip (for Python dependencies)
On both nodes, clone the GitHub repository:
git clone https://github.com/dimitrisna/infrastructure.git
cd infrastructure
-
Create
.env
File:- On both nodes, copy the example environment file:
cp .env.example .env
- Update the variables specific to each node.
- On both nodes, copy the example environment file:
-
Example
.env
for Node 1:KAFKA_BROKER1_IP=192.168.1.101 KAFKA_BROKER2_IP=192.168.1.102 CASSANDRA_SEEDS=192.168.1.101,192.168.1.102 CASSANDRA_BROADCAST_ADDRESS1=192.168.1.101 ZOOKEEPER1_IP=192.168.1.101 JOBMANAGER1_IP=192.168.1.201 TASKMANAGER1_IP=192.168.1.202
-
Example
.env
for Node 2:KAFKA_BROKER1_IP=192.168.1.101 KAFKA_BROKER2_IP=192.168.1.102 CASSANDRA_SEEDS=192.168.1.101,192.168.1.102 CASSANDRA_BROADCAST_ADDRESS2=192.168.1.102 ZOOKEEPER2_IP=192.168.1.102 ZOOKEEPER3_IP=192.168.1.103 JOBMANAGER2_IP=192.168.2.201 TASKMANAGER2_IP=192.168.2.202
-
On one of the nodes, run the provided script to generate the Kafka Cluster ID:
python scripts/generate-cluster-id.py
-
Copy the generated Cluster ID and update the
.env
file on both nodes:KAFKA_CLUSTER_ID=<generated-cluster-id>
- Navigate to the project directory on Node 1.
- Build the Docker images:
bash scripts/build-images.sh
- Run the containers for Node 1:
docker-compose up -d kafka1 cassandra1 jobmanager1 taskmanager1 zoo1 ksqldb-server1
- Navigate to the project directory on Node 2.
- Build the Docker images:
bash scripts/build-images.sh
- Run the containers for Node 2:
docker-compose up -d kafka2 cassandra2 jobmanager2 taskmanager2 zoo2 zoo3 ksqldb-server2
- Install Cassandra dependencies:
pip install -r requirements.txt
- Run the Cassandra initialization script on one node only:
python cassandra/create_cassandra_tables.py
-
Ensure Synchronization:
- The
.env
files on both nodes must be consistent except for node-specific variables like IP addresses.
- The
-
Verify Container Health:
- Check the status of all containers:
docker ps
- Check the status of all containers:
-
Logs and Debugging:
- Use the following command to view logs for a container:
docker logs <container-name>
- Use the following command to view logs for a container: