Distributed Queue

Fig.1: Overview

Design

The broker implements a distributed logging queue in Python and exposes HTTP endpoints for interacting with it over the network. It uses the Flask library to register the endpoints and a Manager object to interact with the storage layer. The storage layer maintains tables for all topics, along with the subscribers (consumers) and publishers (producers) that use the broker to exchange messages on specific topics. The table schemas are shown in Fig.1.
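To make the layering concrete, here is a minimal sketch of what a Manager over an in-memory storage layer could look like. The class, its method names (register_publisher, register_subscriber, publish, consume) and the dict-based tables are hypothetical and do not reproduce the schemas in Fig.1; the sketch also simplifies by binding each ID to a single topic. A new subscriber starts reading at the current end of the topic's log, which mirrors the assumption that consumers cannot see earlier messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Manager:
    """Hypothetical in-memory storage layer (not the broker's Fig.1 schema):
    one message log per topic plus publisher and subscriber tables."""
    topics: Dict[str, List[str]] = field(default_factory=dict)  # topic -> messages
    publishers: Dict[int, str] = field(default_factory=dict)    # producer id -> topic
    subscribers: Dict[int, list] = field(default_factory=dict)  # consumer id -> [topic, next offset]

    def create_topic(self, topic: str) -> None:
        self.topics.setdefault(topic, [])

    def register_publisher(self, topic: str) -> int:
        self.create_topic(topic)
        pid = len(self.publishers)  # IDs start at 0
        self.publishers[pid] = topic
        return pid

    def register_subscriber(self, topic: str) -> int:
        self.create_topic(topic)
        cid = len(self.subscribers)  # counted separately from producer IDs
        # Begin at the current end of the log, so earlier messages stay invisible.
        self.subscribers[cid] = [topic, len(self.topics[topic])]
        return cid

    def publish(self, pid: int, msg: str) -> None:
        self.topics[self.publishers[pid]].append(msg)

    def consume(self, cid: int) -> Optional[str]:
        topic, offset = self.subscribers[cid]
        log = self.topics[topic]
        if offset >= len(log):
            return None  # nothing new for this consumer yet
        self.subscribers[cid][1] = offset + 1
        return log[offset]
```

In the real broker, each Flask endpoint would translate an HTTP request into one such Manager call.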

The Manager object switches the storage layer between a pandas DataFrame (in memory) and a MySQL database (persistent) based on an is_SQL flag that can be set when deploying the broker with docker-compose.
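The flag-based switch could be sketched as a small factory. This is an illustration under assumptions: IS_SQL is taken to be an environment variable populated from the .env file, a plain Python list stands in for the pandas DataFrame, and the standard-library sqlite3 module stands in for MySQL so the example runs without a database server. ListStore, SQLStore and make_store are hypothetical names.

```python
import os
import sqlite3
from typing import Optional


class ListStore:
    """In-memory stand-in for the pandas DataFrame backend."""
    def __init__(self):
        self.rows = []

    def append(self, topic: str, msg: str) -> None:
        self.rows.append((topic, msg))

    def read(self, topic: str, offset: int) -> Optional[str]:
        msgs = [m for t, m in self.rows if t == topic]
        return msgs[offset] if offset < len(msgs) else None


class SQLStore:
    """Persistent stand-in; sqlite3 replaces MySQL for illustration only."""
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS logs (id INTEGER PRIMARY KEY, topic TEXT, msg TEXT)"
        )

    def append(self, topic: str, msg: str) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute("INSERT INTO logs (topic, msg) VALUES (?, ?)", (topic, msg))

    def read(self, topic: str, offset: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT msg FROM logs WHERE topic = ? ORDER BY id LIMIT 1 OFFSET ?",
            (topic, offset),
        ).fetchone()
        return row[0] if row else None


def make_store(is_sql: Optional[bool] = None):
    """Pick a backend from an explicit flag, else from the IS_SQL environment variable."""
    if is_sql is None:
        is_sql = os.environ.get("IS_SQL", "false").lower() in ("1", "true", "yes")
    return SQLStore() if is_sql else ListStore()
```

Because both stores expose the same append/read methods, the rest of the broker would not need to know which backend is active.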

On top of the broker, a Python library, “myqueue”, provides the MyProducer and MyConsumer classes for easily creating producer and consumer instances. An ApiHandler is also implemented for interacting with the broker from a Python environment. Producer and consumer instances can send data to and receive data from the broker concurrently. The broker orders the incoming requests and stores the messages in the storage layer accordingly, using concurrency primitives such as a Semaphore and a ThreadPool.
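To illustrate how a Semaphore and a thread pool can fit together, the sketch below lets several simulated producers append concurrently to one log while a Semaphore(1) admits a single writer at a time. OrderedLog and run_producers are hypothetical names, not part of “myqueue”. Note that threading.Semaphore does not guarantee which blocked thread is woken first, so this serializes writes without promising strict first-come, first-served order; a queue in front of a single writer would be needed for that.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class OrderedLog:
    """Hypothetical broker-side log: a Semaphore(1) admits one writer at a
    time, so concurrent appends never interleave inside the list."""
    def __init__(self):
        self._write_gate = threading.Semaphore(1)
        self._entries = []

    def append(self, producer_id: int, msg: str) -> int:
        with self._write_gate:  # only one thread mutates the log at a time
            self._entries.append((producer_id, msg))
            return len(self._entries) - 1  # offset assigned to this message

    def snapshot(self):
        with self._write_gate:
            return list(self._entries)


def run_producers(log: OrderedLog, n_producers: int, n_msgs: int) -> None:
    """Simulate producers sending concurrently through a thread pool."""
    def produce(pid: int) -> None:
        for i in range(n_msgs):
            log.append(pid, f"P{pid}-{i}")

    with ThreadPoolExecutor(max_workers=n_producers) as pool:
        list(pool.map(produce, range(n_producers)))  # wait for all producers
```

Each producer's own messages stay in the order it sent them, while messages from different producers may interleave.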

Assumptions

Topic names must not contain “–” (e.g., T–1 is not allowed, but T1 is).
The names “subl” and “publ” are reserved for the broker's internal use, so they cannot be used as topic names.
Log files must follow a specific format to work with the “myqueue” implementation. The format is described in detail in the Testing section.
Producer and consumer IDs run from 0 to N, counted separately for each instance type. For example, the first producer and the first consumer both have ID 0, but the two IDs are used in different contexts and do not refer to the same entity.
Consumers start receiving messages only once they subscribe to a topic; messages published before the subscription are not accessible to that consumer instance.
Messages are ordered internally on a first-come, first-served (FCFS) basis: whichever request arrives first is served first, unless the requests can be parallelized (i.e., read calls).
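The naming and ID rules in these assumptions could be enforced with a few lines of checking. This is a sketch under assumptions: validate_topic and IdAllocator are hypothetical helpers, both an ASCII hyphen and an en dash are rejected because the exact dash character in the rule is ambiguous, and violations raise ValueError, which may differ from the broker's actual error handling.

```python
import itertools

RESERVED_NAMES = {"subl", "publ"}  # reserved for the broker's internal use


def validate_topic(name: str) -> str:
    """Reject topic names that break the naming assumptions."""
    if not name:
        raise ValueError("topic name must not be empty")
    if "-" in name or "\u2013" in name:  # ASCII hyphen or en dash
        raise ValueError(f"topic name {name!r} must not contain a dash")
    if name in RESERVED_NAMES:
        raise ValueError(f"topic name {name!r} is reserved")
    return name


class IdAllocator:
    """Separate 0-based counters per instance type, so producer 0 and
    consumer 0 can coexist without referring to the same entity."""
    def __init__(self):
        self._counters = {"producer": itertools.count(), "consumer": itertools.count()}

    def next_id(self, kind: str) -> int:
        return next(self._counters[kind])
```

Keeping one counter per instance type is what lets the first producer and the first consumer both receive ID 0.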

Challenges

Designing the table schemas was challenging, especially making them take minimal space while keeping the storage layer easy for the broker to interact with.
The MySQL database, as used by the broker, processes queries sequentially. Concurrent requests from the producer and consumer objects therefore posed a serious problem and led to several failed attempts to test the persistent version of the broker. Eventually, the implementation adopted a synchronous Process Pool of size 1 to make the SQL queries sequential.
However, this design choice may increase the latency of each request.
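The size-1 pool idea can be sketched as follows: every query is submitted to an executor with exactly one worker, so statements run one at a time no matter how many request threads call in. To stay self-contained, the sketch assumes sqlite3 in place of MySQL and a ThreadPoolExecutor in place of the process pool; SerializedDB is a hypothetical name.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


class SerializedDB:
    """Route every query through a pool with exactly one worker, so queries
    run strictly one at a time (a thread-based stand-in for a size-1 pool)."""
    def __init__(self, path: str = ":memory:"):
        # check_same_thread=False: the connection is created on the caller's
        # thread but queries are executed on the pool's single worker thread.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, topic TEXT, msg TEXT)")

    def _run(self, sql: str, params: tuple):
        with self._conn:  # commit (or roll back) each statement
            return self._conn.execute(sql, params).fetchall()

    def execute(self, sql: str, params: tuple = ()):
        # submit() queues the call; result() blocks the caller until it finishes.
        return self._pool.submit(self._run, sql, params).result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)
        self._conn.close()
```

Callers block on result() while earlier queries drain from the queue, which is where the extra per-request latency comes from.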

Prerequisites

1. Docker: latest [version 20.10.23, build 7155243]

sudo apt-get update

sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io

2. Docker-compose standalone [version v2.15.1]

sudo curl -SL https://github.com/docker/compose/releases/download/v2.15.1/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose

sudo chmod +x /usr/local/bin/docker-compose

sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

Installation Steps

Deploy broker

persistent mode
└── sudo docker-compose --env-file ./config/.env.persist up -d

in-memory mode
└── sudo docker-compose --env-file ./config/.env.inmem up -d

Restart broker

sudo docker-compose restart

Remove broker

sudo docker-compose down --rmi all

Testing

Here, 10.110.10.216:5000 is used as an example broker address.

run a producer instance from log file

COMMAND
└──python runproduce.py --id 1 --topics T1 T2 T3 --broker 10.110.10.216:5000 --log_loc ./test/{EXP}

** The producer_{id}.txt log file must follow the format below; otherwise, Exception("log file:incompitable") is raised.

[ts         msg       parallel    topic]
---------------------------------------
21:52:23	INFO		P1-1		T1
21:52:23	INFO		P1-1		T2
21:52:23	INFO		P1-1		T3
21:52:23	INFO		P1-2		T1
21:52:23	INFO		P1-2		T2
21:52:23	INFO		P1-2		T3
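A parser for this format might look like the following. It is a sketch built on guesses from the example, not on the “myqueue” source: it assumes the file holds only data rows (the bracketed header and dashed rule above are treated as documentation), that fields are separated by tabs or spaces, and that consecutive rows sharing a parallel tag form one batch. It raises the same exception message quoted above; parse_producer_log and batches are hypothetical names.

```python
import re
from itertools import groupby
from typing import List, NamedTuple

TS_PATTERN = re.compile(r"^\d{2}:\d{2}:\d{2}$")  # e.g. 21:52:23


class LogRow(NamedTuple):
    ts: str
    msg: str
    parallel: str
    topic: str


def parse_producer_log(text: str) -> List[LogRow]:
    """Parse producer_{id}.txt content into whitespace-separated
    (ts, msg, parallel, topic) rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 4 or not TS_PATTERN.match(fields[0]):
            raise Exception("log file:incompitable")
        rows.append(LogRow(*fields))
    return rows


def batches(rows: List[LogRow]) -> List[List[LogRow]]:
    """Group consecutive rows sharing a parallel tag (assumed meaning: a
    batch of messages that may be sent concurrently)."""
    return [list(group) for _, group in groupby(rows, key=lambda r: r.parallel)]
```

For the example above, this would yield two batches, P1-1 and P1-2, each with one message for T1, T2 and T3.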

run a consumer instance and store log

COMMAND
└──python runconsume.py --id 1 --topics T1 T2 T3 --broker 10.110.10.216:5000 --log_loc ./test/{EXP}

** Example of a stored consumer_{id}.txt log file

[topic   msg]
---------------------------------------
T1     INFO
T2     INFO
T3     INFO
T2     WARN
T1     INFO
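Writing this log on the consumer side could be as simple as the sketch below. write_consumer_log is a hypothetical helper, and the tab separator between topic and message is an assumption, since the example only shows whitespace-aligned columns.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_consumer_log(log_loc: str, consumer_id: int,
                       received: Iterable[Tuple[str, str]]) -> Path:
    """Write (topic, msg) pairs to {log_loc}/consumer_{id}.txt, one per line,
    in the order they were received."""
    path = Path(log_loc) / f"consumer_{consumer_id}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)  # create ./test/{EXP} if missing
    with path.open("w") as fh:
        for topic, msg in received:
            fh.write(f"{topic}\t{msg}\n")
    return path
```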

run API test cases

COMMAND
└──bash testAPI.sh 10.110.10.216 5000

Test results are as follows

Fig.2: 2 Producers 2 Consumers

run 2-producer, 2-consumer setup

Question: Implement 2 Producers and 2 consumers with 2 topics as shown in Fig. 2 using the library developed in Part-C. Given below is the "topic:producers:consumers" mapping.

T1: P1 P2: C1 C2
T2: P1 P2: C1 C2

Here, the last line means that P1 and P2 produce to topic T2, and C1 and C2 consume from T2.

[Producers]
└──python runproduce.py --id 1 --topics T1 T2 --broker 10.110.10.216:5000 --log_loc ./test/2P2C
└──python runproduce.py --id 2 --topics T1 T2 --broker 10.110.10.216:5000 --log_loc ./test/2P2C

[Consumers]
└──python runconsume.py --id 1 --topics T1 T2 --broker 10.110.10.216:5000 --log_loc ./test/2P2C
└──python runconsume.py --id 2 --topics T1 T2 --broker 10.110.10.216:5000 --log_loc ./test/2P2C

Run all commands together
-------------------------
+ bash test2P2C.sh 10.110.10.216 5000
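The contents of test2P2C.sh are not shown here, so the following is only a hypothetical Python equivalent: it builds the four command lines listed above and starts them all at once with subprocess.Popen, then waits for every process to exit. build_commands and run_all are illustrative names.

```python
import subprocess
import sys
from typing import Dict, List


def build_commands(broker: str, log_loc: str,
                   producers: Dict[int, List[str]],
                   consumers: Dict[int, List[str]]) -> List[List[str]]:
    """Build the argv list for every runproduce.py / runconsume.py call."""
    cmds = []
    for script, instances in (("runproduce.py", producers), ("runconsume.py", consumers)):
        for inst_id, topics in sorted(instances.items()):
            cmds.append([sys.executable, script, "--id", str(inst_id),
                         "--topics", *topics, "--broker", broker, "--log_loc", log_loc])
    return cmds


def run_all(cmds: List[List[str]]) -> List[int]:
    """Start every command at once, then wait for all of them to exit."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    return [p.wait() for p in procs]
```

For example, run_all(build_commands("10.110.10.216:5000", "./test/2P2C", {1: ["T1", "T2"], 2: ["T1", "T2"]}, {1: ["T1", "T2"], 2: ["T1", "T2"]})) would launch the 2-producer, 2-consumer setup when run from the directory containing runproduce.py and runconsume.py, and return each process's exit code.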

Test results are as follows. Consumer logs are stored at ./test/2P2C/consumer_{id}.txt, where id = 1, 2.

Fig.3: 5 Producers 3 Consumers

run 5-producer, 3-consumer setup

Question: Implement 5 Producers and 3 consumers with 3 topics as shown in Fig. 3 using the library developed in Part-C. Given below is the "topic:producers:consumers" mapping.

T1: P1 P2 P3: C1 C2 C3
T2: P1 P4 P5: C1
T3: P1 P2: C1 C2 C3

Here, the last line means that P1 and P2 produce to topic T3, and C1, C2 and C3 consume from T3.
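Deriving each instance's --topics list from the mapping is mechanical, as this hypothetical helper shows. parse_mapping assumes exactly the "topic: producers: consumers" layout used here, with instance names separated by spaces.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_mapping(text: str) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Turn 'topic: producers: consumers' lines into per-instance topic lists."""
    producer_topics = defaultdict(list)
    consumer_topics = defaultdict(list)
    for line in text.strip().splitlines():
        # Raises ValueError if a line does not have exactly two colons.
        topic, producers, consumers = (part.strip() for part in line.split(":"))
        for p in producers.split():
            producer_topics[p].append(topic)
        for c in consumers.split():
            consumer_topics[c].append(topic)
    return dict(producer_topics), dict(consumer_topics)
```

For the mapping above, it assigns T1, T2 and T3 to P1 and C1, and T1 and T3 to P2, C2 and C3, which matches the --topics arguments used for each instance in this setup.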

[Producers]
└──python runproduce.py --id 1 --topics T1 T2 T3 --broker 10.110.10.216:5000 --log_loc ./test/5P3C
└──python runproduce.py --id 2 --topics T1 T3 --broker 10.110.10.216:5000 --log_loc ./test/5P3C
└──python runproduce.py --id 3 --topics T1 --broker 10.110.10.216:5000 --log_loc ./test/5P3C
└──python runproduce.py --id 4 --topics T2 --broker 10.110.10.216:5000 --log_loc ./test/5P3C
└──python runproduce.py --id 5 --topics T2 --broker 10.110.10.216:5000 --log_loc ./test/5P3C

[Consumers]
└──python runconsume.py --id 1 --topics T1 T2 T3 --broker 10.110.10.216:5000 --log_loc ./test/5P3C
└──python runconsume.py --id 2 --topics T1 T3 --broker 10.110.10.216:5000 --log_loc ./test/5P3C
└──python runconsume.py --id 3 --topics T1 T3 --broker 10.110.10.216:5000 --log_loc ./test/5P3C

Run all commands together
-------------------------
+ bash test5P3C.sh 10.110.10.216 5000

Test results are as follows. Consumer logs are stored at ./test/5P3C/consumer_{id}.txt, where id = 1, 2, 3.

Contact Me

This is Assignment 1 of the CS60002: Distributed Systems course at IIT Kharagpur, taught by Dr. Sandip Chakraborty. For questions and general feedback, contact Prasenjit Karmakar.
