Design and Implement a Cluster Scale Up/Down Mechanism #58

Open
kvaps opened this issue Mar 21, 2024 · 2 comments

kvaps commented Mar 21, 2024

We need to design a mechanism for scaling a cluster up and down.

When a user modifies spec.replicas, the cluster should scale to the required number of replicas accordingly. Currently, we are utilizing a StatefulSet, but we understand that we might have to move away from it in favor of a custom pod controller.

Scaling up should work out of the box, but scaling down might be more complex due to several considerations:

  • The need to check the quorum state (a rough quorum-check sketch follows below)
  • The process of removing a replica from the cluster
  • Whether we should perhaps disallow scaling etcd down at all

We're open to suggestions on how to address these challenges and implement an efficient and reliable scaling mechanism.
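On the quorum point above, here is a minimal sketch of one possible check using the Go client, under the assumption that the operator connects via the cluster Service endpoints (the key is arbitrary): a linearizable read only succeeds when a majority of members is reachable.

```go
package controller

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// hasQuorum performs a linearizable read against the cluster. Reads are
// linearizable by default in clientv3, so a successful read implies that a
// majority of members is reachable and the cluster still has quorum.
func hasQuorum(ctx context.Context, endpoints []string) bool {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return false
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	_, err = cli.Get(ctx, "health") // key is arbitrary; only the read semantics matter
	return err == nil
}
```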


kvaps commented Mar 21, 2024

Another idea to consider is what happens if a user manually recreates a replica (by deleting the pod and the PVC). In such cases we need to verify within the cluster that the old replica no longer exists.
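As a rough sketch of that verification (helper name and endpoints are illustrative), the operator could list the current members and look for one still registered under the recreated pod's name:

```go
package controller

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// findMemberID returns the ID of the member registered under the given pod
// name, or 0 if no such member exists. A non-zero result for a recreated
// replica means the stale member is still in the cluster and must be removed
// before the new pod can rejoin.
func findMemberID(ctx context.Context, endpoints []string, name string) (uint64, error) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return 0, err
	}
	defer cli.Close()

	resp, err := cli.MemberList(ctx)
	if err != nil {
		return 0, err
	}
	for _, m := range resp.Members {
		if m.Name == name {
			return m.ID, nil
		}
	}
	return 0, nil
}
```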

@kvaps kvaps added this to the v0.0.3 milestone Mar 22, 2024
@sircthulhu sircthulhu self-assigned this Mar 29, 2024

sircthulhu commented Mar 31, 2024

Cluster rescaling proposal

The etcd operator should be able to scale the cluster up and down and react to pod or PVC deletion.

Scaling procedure

There should be status.replicas and status.instanceNames fields in order to understand which instances are members,
which of them should become members, and which of them should be removed.
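A rough sketch of how those fields could look in the Go API types (everything except status.replicas and status.instanceNames is illustrative):

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// EtcdClusterStatus sketch: only Replicas and InstanceNames come from this proposal.
type EtcdClusterStatus struct {
	// Replicas is the number of instances that are confirmed cluster members.
	Replicas int32 `json:"replicas,omitempty"`
	// InstanceNames lists the pods that are currently healthy cluster members.
	InstanceNames []string `json:"instanceNames,omitempty"`
	// Conditions holds the Rescaling condition described below.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```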

We should introduce a new status condition, Rescaling, that will be False if everything is fine and True if
the cluster is currently rescaling or being repaired, for example when a pod (in the emptyDir case) or a PVC is deleted.
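For illustration only, flipping that condition could go through the standard conditions helper; the reason strings are the ones used in the procedures below:

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setRescaling sets the Rescaling condition; reason is e.g. ScalingClusterUp,
// ScalingClusterDown or ReplicasMatchSpec, as used in the steps below.
func setRescaling(conditions *[]metav1.Condition, rescaling bool, reason string) {
	status := metav1.ConditionFalse
	if rescaling {
		status = metav1.ConditionTrue
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:   "Rescaling",
		Status: status,
		Reason: reason,
	})
}
```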

The cluster state ConfigMap should build ETCD_INITIAL_CLUSTER only from the list in status.instanceNames, as those are
the healthy cluster members.
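A minimal sketch of how that variable could be assembled from status.instanceNames; the headless Service name, namespace layout, https scheme, and peer port 2380 are assumptions:

```go
package controller

import (
	"fmt"
	"strings"
)

// initialCluster builds the ETCD_INITIAL_CLUSTER value from status.instanceNames.
// Peer URL layout (headless Service, namespace, https, port 2380) is assumed.
func initialCluster(instanceNames []string, headlessSvc, namespace string) string {
	parts := make([]string, 0, len(instanceNames))
	for _, name := range instanceNames {
		parts = append(parts, fmt.Sprintf("%s=https://%s.%s.%s.svc:2380",
			name, name, headlessSvc, namespace))
	}
	return strings.Join(parts, ",")
}
```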

Status reconciliation

The status.replicas field should be filled on reconciliation based on the current number of ready replicas if the cluster is not
in the rescaling state. It is filled for the first time when the cluster is bootstrapped.

The status.instanceNames field should be filled on reconciliation based on the currently ready replicas if the cluster is not in
the rescaling state.
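A minimal sketch of that reconciliation step, assuming the status fields sketched above, the project's EtcdCluster type, and the StatefulSet pod-name convention <name>-<ordinal>:

```go
package controller

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/meta"
)

// syncStatus fills status.replicas and status.instanceNames from the ready
// replicas, but only when the cluster is not currently rescaling.
func syncStatus(cluster *EtcdCluster, sts *appsv1.StatefulSet) {
	if meta.IsStatusConditionTrue(cluster.Status.Conditions, "Rescaling") {
		return
	}
	ready := sts.Status.ReadyReplicas
	cluster.Status.Replicas = ready
	names := make([]string, 0, ready)
	for i := int32(0); i < ready; i++ {
		// StatefulSet pods are named <statefulset-name>-<ordinal>.
		names = append(names, fmt.Sprintf("%s-%d", sts.Name, i))
	}
	cluster.Status.InstanceNames = names
}
```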

Scaling up

When spec.replicas > status.replicas, the operator should scale the cluster up.

The process is the following:

  1. Check that the cluster currently has quorum. If not, exit the reconciliation loop and wait until it becomes healthy.
  2. Provided that the cluster has quorum, it is safe to perform scaling up.
  3. Update the StatefulSet in accordance with spec.replicas.
  4. Update the EtcdCluster status condition: set Rescaling to True with Reason: ScalingClusterUp.
  5. Execute etcdctl member add for each new member (see the sketch after this list).
  6. Wait until the StatefulSet becomes Ready.
  7. Update status.replicas and status.instanceNames in accordance with spec.replicas and the current pod names.
  8. Update the EtcdCluster status condition: set Rescaling to False with Reason: ReplicasMatchSpec.
  9. Then update the cluster state ConfigMap's ETCD_INITIAL_CLUSTER according to status.instanceNames.
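A hedged sketch of step 5 using the Go client instead of shelling out to etcdctl; the endpoints and peer URL are assumptions:

```go
package controller

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// addMember registers a new instance with the cluster before its pod starts,
// the client equivalent of `etcdctl member add`; it returns the new member's ID.
func addMember(ctx context.Context, endpoints []string, peerURL string) (uint64, error) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return 0, err
	}
	defer cli.Close()

	resp, err := cli.MemberAdd(ctx, []string{peerURL})
	if err != nil {
		return 0, fmt.Errorf("member add failed: %w", err)
	}
	return resp.Member.ID, nil
}
```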

In case of errors, the EtcdCluster will be stuck in the Rescaling stage without damaging the cluster.

If the user cancels the operation (by reverting the EtcdCluster's spec.replicas to the old value), the StatefulSet's spec.replicas should
be reverted as well and the Rescaling status condition should be set to False.

If the user sets spec.replicas < status.replicas to both cancel scaling up and perform scaling down, we should update
the StatefulSet's spec.replicas to the CR's status.replicas, set Rescaling to False, and schedule a new reconciliation.

Scaling down

When spec.replicas < status.replicas, the operator should scale the cluster down.

The process is the following:

  1. Check that the cluster currently has quorum. If not, exit the reconciliation loop and wait until it becomes healthy.
    Scaling down is not possible, as changes to the member list must be agreed by quorum.
  2. Provided that the cluster has quorum, it is safe to perform scaling down.
  3. The operation should proceed on a per-pod basis. Only one pod can safely be deleted at a time.
  4. Calculate the last pod name as idx=status.replicas - 1 -> crdName-$(idx).
  5. Update the EtcdCluster status condition: set Rescaling to True with Reason: ScalingClusterDown.
  6. Update the StatefulSet's spec.replicas to spec.replicas - 1.
  7. Connect to the etcd cluster as root through the Service and run a command like etcdctl member remove for the member crdName-$(idx) (see the sketch after this list).
    Running this command while the pod is still alive should be safe, as the pod should already have been sent the SIGTERM signal by the kubelet.
  8. Update the EtcdCluster status condition: set Rescaling to False with Reason: ReplicasMatchSpec.
  9. If spec.replicas < status.replicas, reschedule the reconcile to run this algorithm from the beginning.
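A hedged sketch of step 7 using the Go client rather than etcdctl; the member ID can be resolved from the pod name with a lookup like the findMemberID sketch in the earlier comment:

```go
package controller

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// removeMember removes the member with the given ID from the cluster,
// the client equivalent of `etcdctl member remove <id>`.
func removeMember(ctx context.Context, endpoints []string, id uint64) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	_, err = cli.MemberRemove(ctx, id)
	return err
}
```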

@kvaps kvaps removed this from the v0.0.3 milestone Mar 31, 2024
@sircthulhu sircthulhu added the feature New feature or request label Mar 31, 2024
@kvaps kvaps added this to the v0.4.0 milestone May 14, 2024