
Control pod deletion behaviour via annotation #112

Merged · 1 commit · Apr 27, 2024

Conversation

koikonom
Contributor

This change introduces the `operator.redpanda.com/decommission-on-delete` annotation, which tells the operator to decommission a Redpanda node when its pod gets deleted.

Setting this annotation allows us to upgrade the k8s worker nodes of a cluster without having to worry about launching or deleting instances ourselves, because we can rely on the cloud provider's native upgrade functionality.

In essence, we set the annotation and then let the cloud provider launch a new k8s worker node and drain the old one. Since the Redpanda pod is now decommissioned when it gets deleted, the replacement pod that gets scheduled is no longer stuck in the Pending state due to PVC attachments on the old k8s worker node.
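To enable the behaviour, the annotation is set on the Cluster custom resource, e.g. with kubectl annotate or programmatically. The Go sketch below is illustrative only and not part of this change; it uses controller-runtime's client with an unstructured object, and the redpanda.vectorized.io/v1alpha1 group/version and the namespace/name used are assumptions, not values defined by this PR.

package main

import (
    "context"

    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
    ctx := context.Background()

    c, err := client.New(config.GetConfigOrDie(), client.Options{})
    if err != nil {
        panic(err)
    }

    // Fetch the Cluster resource as an unstructured object so we don't need
    // to import the operator's API types. The GVK here is an assumption.
    cluster := &unstructured.Unstructured{}
    cluster.SetGroupVersionKind(schema.GroupVersionKind{
        Group:   "redpanda.vectorized.io",
        Version: "v1alpha1",
        Kind:    "Cluster",
    })
    if err := c.Get(ctx, client.ObjectKey{Namespace: "redpanda", Name: "redpanda"}, cluster); err != nil {
        panic(err)
    }

    // Add the annotation and patch the object back.
    base := cluster.DeepCopy()
    anns := cluster.GetAnnotations()
    if anns == nil {
        anns = map[string]string{}
    }
    anns["operator.redpanda.com/decommission-on-delete"] = "true"
    cluster.SetAnnotations(anns)

    if err := c.Patch(ctx, cluster, client.MergeFrom(base)); err != nil {
        panic(err)
    }
}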

@koikonom koikonom force-pushed the decomm_on_delete branch 4 times, most recently from ff8e23e to 15749c1 Compare April 11, 2024 11:55
@@ -334,6 +336,17 @@ func (r *ClusterReconciler) handlePodFinalizer(
if err != nil {
return fmt.Errorf("unable to fetch PodList: %w", err)
}

var decommissionOnDelete bool
decommissionOnDeleteVal, ok := rp.Annotations[DecommissionOnDeleteAnnotation]
Contributor

Can you show me the effect of this being true? I checked and this seems like a new boolean, but I found no usages, so how does decommissioning get triggered now?

Contributor Author

We convert the value to a boolean (line 343), and then on line 377, if the value is true, we don't just restart the pod; instead we decommission the Redpanda node and delete the PVC as well.
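A condensed sketch of that flow (not the exact diff; restartPod, decommissionBroker, and deletePVC are placeholder names for whatever the reconciler does in each branch):

decommissionOnDelete := false
if v, ok := rp.Annotations[DecommissionOnDeleteAnnotation]; ok {
    b, err := strconv.ParseBool(v)
    if err != nil {
        return fmt.Errorf("parsing %s: %w", DecommissionOnDeleteAnnotation, err)
    }
    decommissionOnDelete = b
}

if untainted && !decommissionOnDelete {
    // The node looks healthy and the annotation is unset: let the pod
    // restart in place and keep its PVC.
    return restartPod(ctx, pod)
}

// Otherwise treat the deletion as permanent: decommission the Redpanda
// node and remove its PVC so the replacement pod can schedule elsewhere.
if err := decommissionBroker(ctx, pod); err != nil {
    return err
}
return deletePVC(ctx, pod)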

@0xdiba 0xdiba left a comment

Looks good; I have a discussion point on operational safety where we might need to consider changing the implementation.

This implements and closes redpanda-data/redpanda#10409

@@ -361,7 +374,7 @@ func (r *ClusterReconciler) handlePodFinalizer(
untainted = false
}
}
if untainted {
if untainted && !decommissionOnDelete {

I'm thinking we should probably make this safer based on the description/use-case of your PR and the scenarios at hand.

Say we've set the annotation on the cluster (i.e. for an upgrade that takes hours), and let's consider the following scenarios:

  1. During the process, a human operator sets a configuration (one requiring a restart) using rpk on the cluster. That would trigger the operator to do a whole-cluster restart. Would that delete and recreate all of the PVCs sequentially?
  2. Say that, due to an update/patch error or the urgency of an incident, the annotation is left on the cluster. Would that trigger PVC recreation on every pod deletion (e.g. restarts, upgrades, etc.)?

Based on the use cases in the PR (upgrades) and the fact that the main mechanism used there is node draining, would it make sense to keep and extend the current logic and just respect the NoSchedule taint when the annotation is set?
I.e. adding a new case to the for-loop directly above:

if decommissionOnDelete && taint.Effect == corev1.TaintEffectNoSchedule {
    untainted = false
}

That would give us more confidence that we don't decommission nodes that aren't actually going away and end up transferring data "by mistake".
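For context, the proposed case would slot into the existing taint loop roughly like this (a sketch only; the loop's existing checks are abridged):

untainted := true
for _, taint := range node.Spec.Taints {
    // ... existing taint checks that already set untainted = false ...

    // Proposed addition: only treat the pod as permanently gone when the
    // annotation is set and the node carries a NoSchedule taint, i.e. it
    // is cordoned or being drained.
    if decommissionOnDelete && taint.Effect == corev1.TaintEffectNoSchedule {
        untainted = false
    }
}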

@koikonom koikonom force-pushed the decomm_on_delete branch 5 times, most recently from 855d0d0 to 12086b8 Compare April 18, 2024 12:44
This change introduces the `operator.redpanda.com/decommission-on-delete`
annotation, which tells the operator to decommission a node when its pod
gets deleted.

Setting this annotation allows us to upgrade the k8s worker nodes of a
cluster without having to worry about launching or deleting instances
ourselves, because we can rely on the cloud provider's native upgrade
functionality.

In essence, we set the annotation and then let the cloud provider launch
a new k8s worker node and drain the old one. Since the Redpanda pod is
now decommissioned when it gets deleted, the replacement pod that gets
scheduled is no longer stuck in the Pending state due to PVC attachments
on the old k8s worker node.
@koikonom koikonom marked this pull request as ready for review April 27, 2024 08:29
@RafalKorepta RafalKorepta merged commit 12bbab4 into main Apr 27, 2024
5 checks passed
RafalKorepta added a commit to redpanda-data/helm-charts that referenced this pull request May 1, 2024
What's Changed
* Remove cluster to redpanda migration tool by @RafalKorepta in redpanda-data/redpanda-operator#120
* Control pod deletion behaviour via annotation by @koikonom in redpanda-data/redpanda-operator#112
* Add Tiered Storage config options for Azure by @JakeSCahill in redpanda-data/redpanda-operator#127

New Contributors
* @koikonom made their first contribution in redpanda-data/redpanda-operator#112

**Full Changelog**: redpanda-data/redpanda-operator@v2.1.17-23.3.11...v2.1.18-23.3.13
@RafalKorepta RafalKorepta deleted the decomm_on_delete branch May 23, 2024 09:24
RafalKorepta added a commit that referenced this pull request Jun 18, 2024
As there is a 1 hour limit on cloud provider node pool migration, which
is not enough for some Redpanda clusters to finish decommissioning, the
decommission-on-delete feature will be removed from the Redpanda
operator.

Reference

#112
RafalKorepta added a commit that referenced this pull request Jun 19, 2024
The previous usage of finalizer handlers was unreliable when a Kubernetes
Node's ready status was flapping. Local SSD disks attached to a Redpanda
Pod prevent rescheduling, as the Persistent Volume affinity binds the Pod
to a single Node. If a Kubernetes Node came back to life, the Cluster
controller could already have deleted the Redpanda data (PVC deletion and
Redpanda decommissioning). If that particular Redpanda node hosted a
single-replica partition, this would mean data loss.

If the majority of Redpanda processes ran on unstable Kubernetes Nodes,
the Redpanda operator could break the whole cluster by losing Raft
quorum.

Reference

#112
redpanda-data/redpanda#6942