Skip to content

Commit

Permalink
feat: consistent hashing
Browse files Browse the repository at this point in the history
  • Loading branch information
fadyat committed Sep 28, 2023
1 parent 47f3738 commit 10357e4
Show file tree
Hide file tree
Showing 2 changed files with 69 additions and 0 deletions.
69 changes: 69 additions & 0 deletions courses/itmo-software-design/consistent-hashing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
## Consistent hashing

Is a distributed hashing scheme that operates independently of the number of servers/objects
in a distributed hash table.

Hashing:

- based on some simple fn (hash tables)
- cryptographic hash fn (password security, blockchain)

In some situations, it may be necessary to split a hash table into several parts, hosted on
different servers. (objects and there keys are distributed across multiple servers)

Motivation: bypass the memory limitations of a single server, or to distribute the load.

Typical use case: distributed cache (memcached, redis)

Such setups consist of a pool of caching servers that host many kv pairs and are used to provide
fast access to data originally stored elsewhere.

> Reducing the load on a OLTP database and improve performance.
>
> Fetching from cache before fetching from the database, if cache miss, fetch from database and
> store in cache.
### How to distribute the data across the servers?

The simplest way is to take the hash % number of servers.

> Works fine until we add/remove a server. Need to redistribute the data across the servers.
>
> This may very well degrade performance severely and possibly crash the origin servers.
We need a distribution scheme that doesn't depend directly on the number of servers, when
we +/- servers the data should be redistributed minimally.

Consistent hashing is a distributed scheme that operates independently of the number
of servers or objects in distributed hash table by assigning them a position on an abstract
circle, or hash ring.

This allows servers and objects to scale without affecting the overall system.

Each object key will belong in the server whose key is closest, in a clockwise direction,
following the object key's position on the ring.

> From a programming perspective, the ring is a sorted list of server values,
> that means we can use binary search to find the correct server for an object key.
To encure object keys are evenly distributed across the servers, we can add virtual nodes
to the ring. (A -> A0..An)

![](./docs/ch/virtual-nodes.png)

Benefit:

- when removing some server, the data will be redistributed across the remaining servers
and the load will be balanced, without affecting the previous distribution.

- something similar happens when adding a new server, only some part of the data will be
redistributed to the new server.


## Resources

About consistent hashing:
- https://www.toptal.com/big-data/consistent-hashing

About Amazon DynamoDB replication:
- https://medium.com/omarelgabrys-blog/consistent-hashing-beyond-the-basics-525304a12ba
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 10357e4

Please sign in to comment.