You may be familiar with the game where the red robot hits the blue robot. This project uses a simple online, text-based implementation of the game to demonstrate the consensus algorithm (RAFT) that keeps track of the game state.
Suppose you have a two-player game where each player connects to a central game server. If that server were to crash, the game would fail as well. RAFT is an algorithm that provides consensus on game events and changes in game state, allowing a set of replicated servers to provide the game state seamlessly and consistently despite failures. From the perspective of the players, the experience is no different from talking to a single server, yet the server cluster can recover from a server failure with no loss of information and no inconsistent data.
RAFT is most accurately described as a consensus algorithm. This applies to many systems, because any set of nodes that intends to serve as a fault-tolerant cluster of replicated data must agree on that data, even if nodes fail or messages between nodes are lost.
For an in-depth explanation of RAFT, check out the original paper here. This project is a working implementation of the algorithm described in detail in that paper.
For a crash course in the RAFT algorithm, check out the conference presentation below:
https://youtu.be/no5Im1daS-o?t=1
This implementation of RAFT uses five cluster nodes, as illustrated in the above video. However, any number of nodes (preferably >= 3) can be used. An odd number of nodes helps avoid split votes and the repeated leader elections they would cause.
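For intuition, RAFT only commits an entry once a majority of nodes have accepted it, so a cluster of 2k+1 nodes tolerates k failures, while adding a node to make the count even raises the quorum size without improving fault tolerance. A quick illustration of the arithmetic (not part of this repo):

```python
# Quick illustration (not part of this repo): quorum size and fault
# tolerance for different cluster sizes. RAFT commits an entry once a
# majority of nodes have replicated it.
for cluster_size in (3, 4, 5):
    quorum = cluster_size // 2 + 1        # votes/acks needed for a majority
    tolerated = cluster_size - quorum     # failures the cluster can survive
    print(f"{cluster_size} nodes -> quorum {quorum}, tolerates {tolerated} failure(s)")
```

Running this shows that a 4-node cluster needs the same 3-node quorum as a 5-node cluster but tolerates only one failure, which is why odd cluster sizes are preferred.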
This code depends on the Amazon AWS SDK for Python (Boto 3)
pip install boto3
This project was designed to run on AWS EC2 instances. However, as long as the Amazon SQS queues are set up, these five nodes could just as well all be run on a local machine, even in the same directory (the names of the files written to disk differ by node number). AWS credentials will still have to be set up for the local machine.
aws configure
For anyone still getting used to setting up AWS services, we found the easiest way to get an instance up and running was to install git, clone this repo, install python3, pip install boto3, and finally run the aws configure command to set up the instance's AWS credentials. In our case the credentials lived in ~/.aws/credentials.
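If you want to confirm that boto3 is actually picking up those credentials before starting any nodes, a quick sanity check along these lines (not part of the repo) can save some debugging:

```python
# Not part of this repo: a quick check that boto3 can find your AWS
# credentials and reach AWS before you start any RAFT nodes.
import boto3

sts = boto3.client("sts")
identity = sts.get_caller_identity()
print("Authenticated as:", identity["Arn"])

# Listing SQS queues also confirms the SQS permissions are in place.
sqs = boto3.client("sqs")
print("Visible queues:", sqs.list_queues().get("QueueUrls", []))
```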
The network runs on Amazon SQS FIFO queues with content-based deduplication turned on.
The URLs for those queues are hardcoded in Messenger.py. Note that these files assume a certain URL format and that the queue names match the expected scheme of '0', '1', '2', '3', '4', 'leader', 'client-blue', and 'client-red'.
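As a rough sketch of the kind of calls Messenger.py presumably wraps (the queue URL and message body below are placeholders, not the repo's actual format), sending to and receiving from an SQS FIFO queue with boto3 looks like this. FIFO queues require a MessageGroupId, and with content-based deduplication enabled no explicit MessageDeduplicationId is needed:

```python
# Rough sketch (not the repo's Messenger.py): talking to an SQS FIFO queue
# with boto3. The queue URL and message contents here are placeholders.
import json
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/0.fifo"  # placeholder

# With content-based deduplication enabled on the queue, only a
# MessageGroupId is required; SQS hashes the body to deduplicate.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"type": "AppendEntries", "term": 3}),
    MessageGroupId="raft",
)

# Long-poll for messages and delete each one once it has been handled.
response = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=5
)
for msg in response.get("Messages", []):
    print("received:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```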
python3 server_logic.py 0
Each RAFT instance is started by running the server_logic.py file with its node ID as a command-line argument. If the AWS credentials are set up correctly and the queue URLs are correct, the cluster will start communicating and establishing leadership right away.
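Purely as an illustration of the convention (the repo's server_logic.py handles this itself, and the details below are assumptions), the node ID argument maps onto per-node file names along these lines:

```python
# Illustration only (not the repo's server_logic.py): how the node ID
# command-line argument maps onto per-node file names.
import sys

if len(sys.argv) != 2:
    sys.exit("usage: python3 server_logic.py <node-id>")

node_id = sys.argv[1]                  # e.g. "0" through "4"
status_file = f"status{node_id}.txt"   # human-readable node status
log_file = f"logOutput{node_id}.txt"   # replicated game-state log
print(f"node {node_id}: writing {status_file} and {log_file}")
```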
python3 robotUI.py red
python3 robotUI.py blue
The player interfaces are started with the commands above. The program lets a player enter punches and blocks; these actions are communicated to the RAFT cluster and the game state is updated accordingly. A player receives feedback on the success or failure of their own actions, but is not informed when the opponent punches their unblocked side and misses. A player's action persists until a new action is chosen, and a player who punches too quickly will be required to wait.
If a player punches the opponent on a side they are not blocking, there is a 10% chance they win.
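To make those rules concrete, here is a minimal sketch of how a punch might be resolved. The function and outcome names are made up for illustration; the real resolution happens in the RAFT cluster's replicated state machine:

```python
# Minimal sketch of the punch rules described above (names are made up;
# the actual logic lives in the RAFT cluster's replicated state machine).
import random

def resolve_punch(punch_side, defender_block_side):
    """Resolve a punch on 'left' or 'right' against the defender's current block."""
    if punch_side == defender_block_side:
        return "blocked"
    # The punch landed on an unblocked side: 10% chance it wins the game.
    return "knockout" if random.random() < 0.10 else "miss"

print(resolve_punch("left", "left"))    # -> "blocked"
print(resolve_punch("left", "right"))   # -> usually "miss", occasionally "knockout"
```

Note that a "miss" on an unblocked side is the case the defender is never told about.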
watch cat status0.txt
The status of each RAFT node is continually written to status#.txt as a human-readable summary of that node. This status looks different depending on whether the node is a leader or a follower. The logOutput#.txt files store the historical game state and are used to establish consensus.
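The exact contents of the status file are up to the node implementation; as a generic illustration of the pattern (the field names and values below are assumptions, not the repo's actual output), a node might refresh its status file on each loop iteration so it can be followed with `watch cat status0.txt`:

```python
# Generic illustration (field names and values are assumptions): rewriting a
# node's status file each iteration so it can be followed with `watch`.
import time

node_id = "0"
state = {"role": "follower", "term": 7, "commit_index": 42}  # made-up values

for _ in range(10):                       # a real node would loop until shutdown
    with open(f"status{node_id}.txt", "w") as f:
        f.write(f"node {node_id} ({state['role']})\n")
        f.write(f"current term: {state['term']}\n")
        f.write(f"commit index: {state['commit_index']}\n")
    time.sleep(1)
```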
If you were to run 5 nodes, the red and blue player UIs, and watch the status files of all 5 nodes, your screen might look something like this (using Tmux for the console windows you see):
Here is a video walking through the project.