
Do not perform Redpanda decommission based on annotation #161

Merged: 2 commits into main on Jul 2, 2024

Conversation

RafalKorepta (Contributor):

Calling decommission based on a changed Pod annotation might not be possible if the Pod was removed together with the annotation that stored the previous Redpanda node ID. A dedicated function handles ghost brokers instead.

Ghost brokers are decommissioned based on the cluster health overview, in which Redpanda lists the nodes that are down.
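
Read as a sketch, the described flow asks the Redpanda admin API for the cluster health overview and only considers nodes that Redpanda itself lists as down and that have no Pod in Kubernetes. The adminAPI interface, method, and field names below are illustrative stand-ins, not the real admin client:

	import "context"

	// adminAPI is a hypothetical, minimal view of the Redpanda admin client.
	type adminAPI interface {
		GetHealthOverview(ctx context.Context) (healthOverview, error)
	}

	// healthOverview mirrors the is_healthy and nodes_down fields of the health report.
	type healthOverview struct {
		IsHealthy bool
		NodesDown []int
	}

	// downNodesWithoutPods lists node IDs that Redpanda reports as down and that
	// have no corresponding Pod in the Kubernetes cluster.
	func downNodesWithoutPods(ctx context.Context, admin adminAPI, podNodeIDs map[int]struct{}) ([]int, error) {
		health, err := admin.GetHealthOverview(ctx)
		if err != nil {
			return nil, err
		}
		var candidates []int
		for _, id := range health.NodesDown {
			if _, hasPod := podNodeIDs[id]; !hasPod {
				candidates = append(candidates, id)
			}
		}
		return candidates, nil
	}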

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847

redpanda-data/redpanda#4295
redpanda-data/redpanda#4526

@RafalKorepta force-pushed the rk/K8S-250/call-decommission-less-often branch 2 times, most recently from ead39de to 5a516fd on June 21, 2024 14:43
@chrisseto (Contributor) left a comment:

Sorry for all the comments. I'm just a bit nervous about this code path, as you might imagine.

	if _, ok := actualBrokerIDs[broker.NodeID]; ok {
		continue
	}

	// Reduce brokers to the list of brokers that are not visible at the K8S level
	bl, err := adminClient.Brokers(ctx)
chrisseto (Contributor):

I really wish we had some degree of time component associated with this, like a "last seen". I'm worried there might be edge cases when a pod is newly scheduled, or other cases where there's a temporary blip of information.
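
A minimal sketch of such a "last seen" guard might look like this (all names are hypothetical; nothing here exists in the operator):

	import "time"

	// lastSeenTracker records when each broker was last observed alive and only
	// flags it as a ghost after a grace period, to avoid reacting to blips.
	type lastSeenTracker struct {
		lastSeen map[int]time.Time // broker node ID -> last time observed alive
		grace    time.Duration
	}

	func (t *lastSeenTracker) observeAlive(nodeID int, now time.Time) {
		if t.lastSeen == nil {
			t.lastSeen = map[int]time.Time{}
		}
		t.lastSeen[nodeID] = now
	}

	// consideredGhost reports whether a broker has been unseen for longer than the
	// grace period, i.e. long enough to rule out a temporary blip.
	func (t *lastSeenTracker) consideredGhost(nodeID int, now time.Time) bool {
		seen, ok := t.lastSeen[nodeID]
		if !ok {
			// Never observed alive: be conservative and do not flag it yet.
			return false
		}
		return now.Sub(seen) > t.grace
	}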

chrisseto (Contributor):

In lieu of a time component, I'm wondering if this logic should be less aggressive? As written, I think it's possible for us to decommission a broker that's not alive but is correlated to a Pod, or vice versa.

Should we instead attempt to correlate brokers to Pods and then treat the leftover list as potential ghost brokers?

for _, pod := range pods {
	id := getBrokerIDFromPod(pod)
	if _, found := brokersByID[id]; !found {
		// Abort the call here: there's a Pod that we can't correlate to a broker,
		// so it's unsafe to attempt to decommission any broker in this code path.
		return
	}
	// If found, remove the broker from our map.
	delete(brokersByID, id)
}

// We now know that all Pods are accounted for. Decommissioning should be safe as long as Pod annotations are in sync.
// Maybe we should be pushing BrokerIDs onto the PVs, as those are more likely to stay correct?
// Alternatively, we could try to fetch BrokerIDs from each Pod to minimize any chance of a race condition.
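
A self-contained sketch of that proposal might look like the following; getBrokerIDFromPod, the annotation key parameter, and the surrounding types are hypothetical stand-ins rather than the operator's actual helpers:

	import (
		"fmt"
		"strconv"

		corev1 "k8s.io/api/core/v1"
	)

	// getBrokerIDFromPod is a hypothetical helper; the real operator reads the
	// node ID from a Pod annotation (resources.PodAnnotationNodeIDKey).
	func getBrokerIDFromPod(pod corev1.Pod, annotationKey string) (int, error) {
		raw, ok := pod.Annotations[annotationKey]
		if !ok {
			return 0, fmt.Errorf("pod %s has no broker ID annotation", pod.Name)
		}
		return strconv.Atoi(raw)
	}

	// potentialGhostBrokers correlates every Pod with a registered broker ID first,
	// and only then treats the leftover broker IDs as potential ghosts. Any Pod that
	// cannot be correlated aborts the whole operation.
	func potentialGhostBrokers(pods []corev1.Pod, brokerIDs []int, annotationKey string) ([]int, error) {
		brokersByID := make(map[int]struct{}, len(brokerIDs))
		for _, id := range brokerIDs {
			brokersByID[id] = struct{}{}
		}

		for _, pod := range pods {
			id, err := getBrokerIDFromPod(pod, annotationKey)
			if err != nil {
				// Unsafe to decommission anything if a Pod cannot be correlated to a broker.
				return nil, err
			}
			if _, found := brokersByID[id]; !found {
				return nil, fmt.Errorf("pod %s references unknown broker ID %d", pod.Name, id)
			}
			// This broker is accounted for by a Pod, so it is not a ghost.
			delete(brokersByID, id)
		}

		// Every Pod is accounted for; whatever remains has no Pod backing it.
		leftovers := make([]int, 0, len(brokersByID))
		for id := range brokersByID {
			leftovers = append(leftovers, id)
		}
		return leftovers, nil
	}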

RafalKorepta (Contributor, Author):

I really wish we had some degree of time component associated with this, like a "last seen".

All right, but where would that information be stored: in the custom resource definition, in a Pod condition, or in a Pod annotation (or maybe somewhere else)?

I'm worried there might be edge cases when a pod is newly scheduled, or other cases where there's a temporary blip of information.

If a node is newly scheduled, it should show up in the broker list, as it needs to register with the Redpanda cluster. The Pod annotation would not be available yet, and the code

nodeIDStrAnnotation, annotationExist := pod.Annotations[resources.PodAnnotationNodeIDKey]
if !annotationExist {
	return fmt.Errorf("requeuing: %w", &missingBrokerIDError{})
}

would bail out and requeue.

I'm wondering if this logic should be less aggressive? As written, I think it's possible for us to decommission a broker that's not alive but is correlated to a Pod, or vice versa.

If a Pod does not have the annotation with the Redpanda node ID, the whole function will requeue.

nodeIDStrAnnotation, annotationExist := pod.Annotations[resources.PodAnnotationNodeIDKey]
if !annotationExist {
	return fmt.Errorf("requeuing: %w", &missingBrokerIDError{})
}

If a broker has a Pod annotation but is not alive, it should be considered for removal, right?

Should we instead attempt to correlate brokers to Pods and then treat the leftover list as potential ghost brokers?

That seems reasonable, but I feel like the Redpanda cluster health report should be reconciled too. If there is any discrepancy, we should stop as well.

	}

	// Do not decommission until the cluster reports that it is healthy
	if !health.IsHealthy {
chrisseto (Contributor):

Do we actually want this to be an error case or should this function just noop if the cluster isn't healthy?

For example, if there's anything after a call to doDecommissionGhostBrokers (Updating .Status comes to mind), returning an error would prevent that from running but I don't think we'd actually want that.

chrisseto (Contributor):

Separate question: Is it possible for the presence of ghost brokers to cause a cluster to be unhealthy?

@RafalKorepta (Contributor, Author) commented on Jun 27, 2024:

Do we actually want this to be an error case or should this function just noop if the cluster isn't healthy?

For example, if there's anything after a call to doDecommissionGhostBrokers (Updating .Status comes to mind), returning an error would prevent that from running but I don't think we'd actually want that.

The only thing after it is updating the License. I could reorder those so that decommission is the last action in the reconciliation loop.

If this early return should just not decommission anything until the cluster becomes healthy, then I can adjust that too.

Separate question: Is it possible for the presence of ghost brokers to cause a cluster to be unhealthy?

Yes, if we lose the majority.

	// Do not decommission until the cluster reports that it is healthy
	if !health.IsHealthy {
		err = errors.New("cluster is not healthy") //nolint:goerr113,err113 // This error will not be handled by the reconciler function
		log.V(logger.DebugLevel).Error(err, "stopping decommission verification",
chrisseto (Contributor):

Similar question: do we want to return an error? This feels more like a warning or info log line.

Also I don't think "stopping decommission verification" is very descriptive here. How about "skipping ghost broker checks due to cluster health"?

RafalKorepta (Contributor, Author):

Also I don't think "stopping decommission verification" is very descriptive here. How about "skipping ghost broker checks due to cluster health"?

Agreed, I will change it.

Similar question: do we want to return an error? This feels more like a warning or info log line.

I will change it to a warning log line without an error.
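
Based on that, a rough sketch of the adjusted early return might be (a sketch only; logr has no separate warning level, so this uses Info, and clusterHealth is a stand-in type, not the operator's):

	import (
		"github.com/go-logr/logr"
	)

	// clusterHealth is a stand-in for the health overview consulted here.
	type clusterHealth struct {
		IsHealthy bool
	}

	// checkGhostBrokers skips the ghost broker checks, without returning an error,
	// whenever the cluster does not report itself as healthy, so that any later
	// reconcile steps still run.
	func checkGhostBrokers(log logr.Logger, health clusterHealth) error {
		if !health.IsHealthy {
			log.Info("skipping ghost broker checks due to cluster health")
			return nil
		}
		// ... correlate brokers with Pods and decommission any ghosts here ...
		return nil
	}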

@RafalKorepta force-pushed the rk/K8S-250/call-decommission-less-often branch from 5a516fd to 725a7d2 on June 28, 2024 17:22
@chrisseto (Contributor) left a comment:

LGTM! Is this tested by anything currently? If not we should file an issue to make sure this eventually gets some automated tests.

@@ -230,7 +230,7 @@ lint: golangci-lint

 .PHONY: lint-fix
 lint-fix: golangci-lint ## Run golangci-lint linter and perform fixes
-	$(GOLANGCI_LINT) run --fix
+	$(GOLANGCI_LINT) run --fix --timeout 30m
chrisseto (Contributor):

Oh, we need to remove this or replace it with a Taskfile, as this is very much out of sync with what CI runs. Not to be fixed in this PR. I can fix it, because I missed it when I changed up the linting. Sorry 😓

RafalKorepta (Contributor, Author):

NP, I will remove that.

Calling decommission based on a changed Pod annotation might not be possible if the Pod was removed together with the annotation that stored the previous Redpanda node ID. A dedicated function handles ghost brokers instead.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
@RafalKorepta (Contributor, Author):

I manually performed the following in a 4-node cluster:

kubectl delete pvc datadir-decommission-0 &
kubectl delete --force pod decommission-0

The ghost broker was correctly decommissioned:

{
  "level": "info",
  "ts": "2024-07-02T14:48:06.161Z",
  "logger": "ClusterReconciler.Reconcile.doDecommissionGhostBrokers",
  "msg": "Nodes that are reported by Redpanda, but are not running in Kubernetes cluster",
  "controller": "cluster",
  "controllerGroup": "redpanda.vectorized.io",
  "controllerKind": "Cluster",
  "Cluster": {
    "name": "decommission",
    "namespace": "redpanda-system"
  },
  "namespace": "redpanda-system",
  "name": "decommission",
  "reconcileID": "88d52a69-cb25-4ccb-b7e2-a09d75bfdbd3",
  "nodes-considered-down": [
    0
  ],
  "cluster-health": {
    "is_healthy": true,
    "controller_id": 1,
    "all_nodes": [
      0,
      1,
      2,
      3,
      4
    ],
    "nodes_down": [],
    "leaderless_partitions": [],
    "under_replicated_partitions": []
  },
  "broker-list": [
    {
      "node_id": 0,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": false,
      "version": "",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 1,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "v24.1.9 - 871368edb0f049a9e8ee23068a2f8645c6e2e031",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 2,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "v24.1.9 - 871368edb0f049a9e8ee23068a2f8645c6e2e031",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 3,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "v24.1.9 - 871368edb0f049a9e8ee23068a2f8645c6e2e031",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 4,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    }
  ],
  "actual-broker-ids": {
    "1": null,
    "2": null,
    "3": null,
    "4": null
  }
}
{
  "level": "info",
  "ts": "2024-07-02T14:48:06.161Z",
  "logger": "ClusterReconciler.Reconcile",
  "msg": "decommissioning ghost broker",
  "controller": "cluster",
  "controllerGroup": "redpanda.vectorized.io",
  "controllerKind": "Cluster",
  "Cluster": {
    "name": "decommission",
    "namespace": "redpanda-system"
  },
  "namespace": "redpanda-system",
  "name": "decommission",
  "reconcileID": "88d52a69-cb25-4ccb-b7e2-a09d75bfdbd3",
  "node-id": 0
}

By reducing the broker list to only those brokers that are not running as a Pod and are reported as not alive.
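
A minimal sketch of that reduction, with a stand-in broker type whose fields mirror the node_id and is_alive fields in the log above (the real operator types may differ):

	// broker mirrors the node_id and is_alive fields from the admin API broker list.
	type broker struct {
		NodeID  int
		IsAlive bool
	}

	// ghostBrokers keeps only the brokers that have no Pod in Kubernetes (their node
	// ID is absent from actualBrokerIDs) and that Redpanda reports as not alive.
	func ghostBrokers(brokerList []broker, actualBrokerIDs map[int]struct{}) []int {
		var ghosts []int
		for _, b := range brokerList {
			if _, runningInK8s := actualBrokerIDs[b.NodeID]; runningInK8s {
				// Broker is backed by a Pod; leave it alone.
				continue
			}
			if b.IsAlive {
				// Registered but alive brokers without a matching Pod are not
				// decommissioned in this pass either.
				continue
			}
			ghosts = append(ghosts, b.NodeID)
		}
		return ghosts
	}

In the log above this would yield node 0: it is missing from actual-broker-ids and reports is_alive: false.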
@RafalKorepta force-pushed the rk/K8S-250/call-decommission-less-often branch from 725a7d2 to 7775d42 on July 2, 2024 14:52
@RafalKorepta enabled auto-merge (rebase) on July 2, 2024 14:54
@RafalKorepta merged commit 9fa7a78 into main on Jul 2, 2024
3 of 4 checks passed