# Distributed Key-Value Store

Enterprise Computing WS 2018/19 - Assignment 2 - Distributed Key Value Storage

Authors: Jacek Janczura & Tomasz Tkaczyk

## Implementation


The aim of the project was to implement a simple distributed key-value store consisting of two replicas, one master and one slave. Clients always write to the master but can potentially read from both replicas.
Additionally, we measure the staleness and latency of the system under both synchronous and asynchronous replication from the master server to all slave servers.

With synchronous replication, a write(key, value) message is sent from the client to the master server, and the master replicates the write to all replicas. Only when the master has received a response from every replica does it send the final response to the client confirming that the value is stored correctly. The client therefore knows that the write has already been replicated to the other replicas (slave servers).

With asynchronous replication, in contrast, the master sends the response to the client immediately after storing the value in its own KV store, and only then replicates the value to the other replicas.
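
To make the difference concrete, here is a minimal sketch of the two write paths on the master. All names here are illustrative, not the project's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

interface Replica { void replicate(String key, String value); }

class MasterSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final List<Replica> replicas;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    MasterSketch(List<Replica> replicas) { this.replicas = replicas; }

    // Synchronous replication: reply to the client only after every
    // replica has confirmed the write.
    String writeSync(String key, String value) throws Exception {
        store.put(key, value);
        List<Future<?>> acks = new ArrayList<>();
        for (Replica r : replicas)
            acks.add(pool.submit(() -> r.replicate(key, value)));
        for (Future<?> ack : acks)
            ack.get();                   // block until each replica confirms
        return "STORED";                 // the client knows the value is replicated
    }

    // Asynchronous replication: reply immediately after the local write,
    // then replicate in the background.
    String writeAsync(String key, String value) {
        store.put(key, value);
        for (Replica r : replicas)
            pool.submit(() -> r.replicate(key, value));
        return "STORED";                 // the replicas may briefly be stale
    }
}
```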

The hosts and ports of the two servers are configured as follows:

```properties
portSlave=8000
hostSlave=127.0.0.1
portMaster=8001
hostMaster=127.0.0.1
```
- All logs are stored in the file EC_Assignment_2.log.

- To run the simple main, just run the jar:

  ```bash
  java -jar target/EC_Assignment_2-1.0-SNAPSHOT-shaded.jar
  ```

- To handle the messages sent by the client to the different server types, we use a combination of the strategy design pattern and the factory design pattern. Thanks to polymorphism, adding a new message type is easy, and it is handled by every server that implements the IServer interface (see the sketch after this list).
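
A minimal sketch of this strategy + factory combination. `IServer` is the project's interface, but `MessageHandler`, `WriteHandler`, `ReadHandler`, and `HandlerFactory` are illustrative names, not the repository's actual classes:

```java
import java.util.Map;

interface IServer { }                          // stand-in for the project's interface

interface MessageHandler {                     // strategy: one handler per message type
    String handle(String payload, IServer server);
}

class WriteHandler implements MessageHandler {
    public String handle(String payload, IServer server) {
        // parse the payload and store the key/value pair on the server ...
        return "WRITE_OK";
    }
}

class ReadHandler implements MessageHandler {
    public String handle(String payload, IServer server) {
        // look the key up in the server's store ...
        return "READ_OK";
    }
}

class HandlerFactory {                         // factory: maps a message type to a strategy
    private static final Map<String, MessageHandler> HANDLERS = Map.of(
            "WRITE", new WriteHandler(),
            "READ", new ReadHandler());

    static MessageHandler forType(String messageType) {
        return HANDLERS.get(messageType);
    }
}
```

Any server implementing `IServer` can then dispatch an incoming message with `HandlerFactory.forType(type).handle(payload, this)`; supporting a new message type only requires a new handler class and one factory entry.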

## Deployment

After the development phase, we deployed two variants of our code on two AWS EC2 instances. We decided to take two t2.micro instances and spin them up in different availability zones: the Master instance was deployed in eu-west-1 (Ireland), whereas the Slave was deployed in eu-central-1 (Frankfurt).


## 2) Benchmarking Latency and Staleness

- Latency: the full duration of a write message, i.e., the time between sending the write and receiving the response from the server.
- Staleness: inconsistency in storage, i.e., the time period during which the current state differs among the replicas. It begins after the change is committed in storage on the master node and ends right after the same change is committed on the last replica (see the sketch below).
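
Both metrics therefore reduce to simple timestamp differences. A minimal illustration, using the synchronous averages reported in the Analysis section; the timestamp names and values are ours, not the project's log format:

```java
// Hypothetical timestamps in ms, as they could appear in the merged logs.
long clientSend    = 1_000;  // client sends the write to the master
long masterCommit  = 1_180;  // master commits the value locally
long slaveCommit   = 1_304;  // last replica commits the same value
long clientReceive = 1_407;  // client receives the master's response

long latency   = clientReceive - clientSend;    // 407 ms
long staleness = slaveCommit - masterCommit;    // 124 ms
```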

Since the master and slave servers are running on separate AWS EC2 instances, we can trigger the client from a local computer.

One server is located in Ireland, the other in Frankfurt, and our local client is in Berlin. In our calculations and analysis we need to remember that Ireland is in a different time zone.
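
When merging the logs, the timestamps therefore have to be normalized to a common zone first. A short sketch with `java.time` (the timestamp value is made up):

```java
import java.time.*;

// In winter, Dublin is UTC+0 while Berlin and Frankfurt are UTC+1, so
// identical local timestamps are one hour apart once normalized to UTC.
LocalDateTime t = LocalDateTime.parse("2019-01-15T12:00:00");
Instant dublin = t.atZone(ZoneId.of("Europe/Dublin")).toInstant(); // 12:00:00Z
Instant berlin = t.atZone(ZoneId.of("Europe/Berlin")).toInstant(); // 11:00:00Z
```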

The write message, which stores a value under a fixed key, is sent to the master server 100 times using synchronous replication and 100 times using asynchronous replication to the slave server:

```java
/**
 * Benchmarks latency and staleness of asynchronous replication.
 * Sends an update (in our KV store a write is equal to an update)
 * every second for 100 s using asynchronous replication.
 */
public void crazyUpdateAsynchronic() {
    for (int i = 0; i < 100; i++) {
        Request req = asyncWrite("Asy", "req" + i);
        sendSyncMsgToMaster(req);
        sleep(1000);
    }
}

/**
 * Benchmarks latency and staleness of synchronous replication.
 * Sends an update (in our KV store a write is equal to an update)
 * every second for 100 s using synchronous replication.
 */
public void crazyUpdateSynchronic() {
    for (int i = 0; i < 100; i++) {
        Request req = syncWrite("Sy", "req" + i);
        sendSyncMsgToMaster(req);
        sleep(1000);
    }
}
```

After the 200 write operations, we manually collect the logs from the Client, Master, and Slave. The logs are then merged and manually transformed into a single CSV file.

## 3) Analysis

| Replication mode | Avg. write latency | Avg. staleness |
| ---------------- | ------------------ | -------------- |
| Synchronous      | 407 ms             | 124 ms         |
| Asynchronous     | 283.5 ms           | 135 ms         |

### Synchronous replication

After collecting the logs, we applied standard data-cleaning transformations and aggregations to compute the time differences between particular events. First we analysed synchronous replication. The figure below depicts the latency of each of the 100 requests. On average, one request needs 407 ms to be processed (median 365 ms). In the diagram we can observe three 'peaks', which were probably caused by network delays.

*(figures: latency and staleness for synchronous replication)*

### Asynchronous replication

With asynchronous replication, the average latency is almost half that of synchronous replication (283.5 ms vs. 407 ms). What is more, staleness is comparable in both the synchronous and the asynchronous case (ca. 130 ms).

*(figures: latency and staleness for asynchronous replication)*