
Do not perform Redpanda decommission based on annotation #161

Merged: 2 commits into main on Jul 2, 2024

Conversation

RafalKorepta (Contributor):

Calling decommission based on a changed Pod annotation might not be possible if the Pod was removed together with the annotation that stored the previous Redpanda node ID. A dedicated function handles ghost brokers instead.

Ghost brokers are decommissioned based on the cluster health overview, in which Redpanda lists the nodes that are down.
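
Read as a sketch, the described flow asks the Redpanda admin API for the cluster health overview and only considers nodes that Redpanda itself lists as down and that have no Pod in Kubernetes. The adminAPI interface, method, and field names below are illustrative stand-ins, not the real admin client:

	import "context"

	// adminAPI is a hypothetical, minimal view of the Redpanda admin client.
	type adminAPI interface {
		GetHealthOverview(ctx context.Context) (healthOverview, error)
	}

	// healthOverview mirrors the is_healthy and nodes_down fields of the health report.
	type healthOverview struct {
		IsHealthy bool
		NodesDown []int
	}

	// downNodesWithoutPods lists node IDs that Redpanda reports as down and that
	// have no corresponding Pod in the Kubernetes cluster.
	func downNodesWithoutPods(ctx context.Context, admin adminAPI, podNodeIDs map[int]struct{}) ([]int, error) {
		health, err := admin.GetHealthOverview(ctx)
		if err != nil {
			return nil, err
		}
		var candidates []int
		for _, id := range health.NodesDown {
			if _, hasPod := podNodeIDs[id]; !hasPod {
				candidates = append(candidates, id)
			}
		}
		return candidates, nil
	}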

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847

redpanda-data/redpanda#4295
redpanda-data/redpanda#4526

@RafalKorepta force-pushed the rk/K8S-250/call-decommission-less-often branch 2 times, most recently from ead39de to 5a516fd on June 21, 2024 14:43
@chrisseto (Contributor) left a comment:

Sorry for all the comments. I'm just a bit nervous about this code path, as you might imagine.

	if _, ok := actualBrokerIDs[broker.NodeID]; ok {
		continue
	}

	// Reduce brokers to the list of brokers that are not visible at the K8S level
	bl, err := adminClient.Brokers(ctx)
chrisseto (Contributor):

I really wish we had some degree of time component associated with this, like a "last seen". I'm worried there might be edge cases when a pod is newly scheduled, or other cases where there's a temporary blip of information.
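
A minimal sketch of such a "last seen" guard might look like this (all names are hypothetical; nothing here exists in the operator):

	import "time"

	// lastSeenTracker records when each broker was last observed alive and only
	// flags it as a ghost after a grace period, to avoid reacting to blips.
	type lastSeenTracker struct {
		lastSeen map[int]time.Time // broker node ID -> last time observed alive
		grace    time.Duration
	}

	func (t *lastSeenTracker) observeAlive(nodeID int, now time.Time) {
		if t.lastSeen == nil {
			t.lastSeen = map[int]time.Time{}
		}
		t.lastSeen[nodeID] = now
	}

	// consideredGhost reports whether a broker has been unseen for longer than the
	// grace period, i.e. long enough to rule out a temporary blip.
	func (t *lastSeenTracker) consideredGhost(nodeID int, now time.Time) bool {
		seen, ok := t.lastSeen[nodeID]
		if !ok {
			// Never observed alive: be conservative and do not flag it yet.
			return false
		}
		return now.Sub(seen) > t.grace
	}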

chrisseto (Contributor):

In lieu of a time component, I'm wondering if this logic should be less aggressive? As written, I think it's possible for us to decommission a broker that's not alive but is correlated to a Pod, or vice versa.

Should we instead attempt to correlate brokers to Pods and then treat the leftover list as potential ghost brokers?

for _, pod := range pods {
	id := getBrokerIDFromPod(pod)
	if _, found := brokersByID[id]; !found {
		// Abort the call here: there's a Pod that we can't correlate to a broker,
		// so it's unsafe to attempt to decommission any broker in this code path.
		return
	}
	// If found, remove the broker from our map.
	delete(brokersByID, id)
}

// We now know that all Pods are accounted for. Decommissioning should be safe as long as Pod annotations are in sync.
// Maybe we should be pushing BrokerIDs onto the PVs, as those are more likely to stay correct?
// Alternatively, we could try to fetch BrokerIDs from each Pod to minimize any chance of a race condition.
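
A self-contained sketch of that proposal might look like the following; getBrokerIDFromPod, the annotation key parameter, and the surrounding types are hypothetical stand-ins rather than the operator's actual helpers:

	import (
		"fmt"
		"strconv"

		corev1 "k8s.io/api/core/v1"
	)

	// getBrokerIDFromPod is a hypothetical helper; the real operator reads the
	// node ID from a Pod annotation (resources.PodAnnotationNodeIDKey).
	func getBrokerIDFromPod(pod corev1.Pod, annotationKey string) (int, error) {
		raw, ok := pod.Annotations[annotationKey]
		if !ok {
			return 0, fmt.Errorf("pod %s has no broker ID annotation", pod.Name)
		}
		return strconv.Atoi(raw)
	}

	// potentialGhostBrokers correlates every Pod with a registered broker ID first,
	// and only then treats the leftover broker IDs as potential ghosts. Any Pod that
	// cannot be correlated aborts the whole operation.
	func potentialGhostBrokers(pods []corev1.Pod, brokerIDs []int, annotationKey string) ([]int, error) {
		brokersByID := make(map[int]struct{}, len(brokerIDs))
		for _, id := range brokerIDs {
			brokersByID[id] = struct{}{}
		}

		for _, pod := range pods {
			id, err := getBrokerIDFromPod(pod, annotationKey)
			if err != nil {
				// Unsafe to decommission anything if a Pod cannot be correlated to a broker.
				return nil, err
			}
			if _, found := brokersByID[id]; !found {
				return nil, fmt.Errorf("pod %s references unknown broker ID %d", pod.Name, id)
			}
			// This broker is accounted for by a Pod, so it is not a ghost.
			delete(brokersByID, id)
		}

		// Every Pod is accounted for; whatever remains has no Pod backing it.
		leftovers := make([]int, 0, len(brokersByID))
		for id := range brokersByID {
			leftovers = append(leftovers, id)
		}
		return leftovers, nil
	}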

RafalKorepta (Contributor, Author):

I really wish we had some degree of time component associated with this, like a "last seen".

All right, but where would that information be stored: in the custom resource definition, in a Pod condition, or in a Pod annotation (or maybe somewhere else)?

I'm worried there might be edge cases when a pod is newly scheduled, or other cases where there's a temporary blip of information.

If a node is newly scheduled, it should show up in the broker list, as it needs to register with the Redpanda cluster. The Pod annotation would not be available yet, and the code

nodeIDStrAnnotation, annotationExist := pod.Annotations[resources.PodAnnotationNodeIDKey]
if !annotationExist {
	return fmt.Errorf("requeuing: %w", &missingBrokerIDError{})
}

would bail out and requeue.

I'm wondering if this logic should be less aggressive? As written, I think it's possible for us to decommission a broker that's not alive but is correlated to a Pod, or vice versa.

If a Pod does not have the annotation with the Redpanda node ID, the whole function will requeue.

nodeIDStrAnnotation, annotationExist := pod.Annotations[resources.PodAnnotationNodeIDKey]
if !annotationExist {
	return fmt.Errorf("requeuing: %w", &missingBrokerIDError{})
}

If a broker has a Pod annotation but is not alive, it should be considered for removal, right?

Should we instead attempt to correlate brokers to Pods and then treat the leftover list as potential ghost brokers?

That seems reasonable, but I feel like the Redpanda cluster health report should be reconciled too. If there is any discrepancy, we should stop as well.

	}

	// Do not decommission until the cluster reports that it is healthy
	if !health.IsHealthy {
chrisseto (Contributor):

Do we actually want this to be an error case or should this function just noop if the cluster isn't healthy?

For example, if there's anything after a call to doDecommissionGhostBrokers (Updating .Status comes to mind), returning an error would prevent that from running but I don't think we'd actually want that.

chrisseto (Contributor):

Separate question: Is it possible for the presence of ghost brokers to cause a cluster to be unhealthy?

@RafalKorepta (Contributor, Author) commented on Jun 27, 2024:

Do we actually want this to be an error case or should this function just noop if the cluster isn't healthy?

For example, if there's anything after a call to doDecommissionGhostBrokers (Updating .Status comes to mind), returning an error would prevent that from running but I don't think we'd actually want that.

The only thing after it is updating the License. I could reorder those so that decommission is the last action in the reconciliation loop.

If this early return should just not decommission anything until the cluster becomes healthy, then I can adjust that too.

Separate question: Is it possible for the presence of ghost brokers to cause a cluster to be unhealthy?

Yes, if we lose the majority.

	// Do not decommission until the cluster reports that it is healthy
	if !health.IsHealthy {
		err = errors.New("cluster is not healthy") //nolint:goerr113,err113 // This error will not be handled by the reconciler function
		log.V(logger.DebugLevel).Error(err, "stopping decommission verification",
chrisseto (Contributor):

Similar question: do we want to return an error? This feels more like a warning or info log line.

Also I don't think "stopping decommission verification" is very descriptive here. How about "skipping ghost broker checks due to cluster health"?

RafalKorepta (Contributor, Author):

Also I don't think "stopping decommission verification" is very descriptive here. How about "skipping ghost broker checks due to cluster health"?

Agreed, I will change it.

Similar question: do we want to return an error? This feels more like a warning or info log line.

I will change it to a warning log line without an error.
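
Based on that, a rough sketch of the adjusted early return might be (a sketch only; logr has no separate warning level, so this uses Info, and clusterHealth is a stand-in type, not the operator's):

	import (
		"github.com/go-logr/logr"
	)

	// clusterHealth is a stand-in for the health overview consulted here.
	type clusterHealth struct {
		IsHealthy bool
	}

	// checkGhostBrokers skips the ghost broker checks, without returning an error,
	// whenever the cluster does not report itself as healthy, so that any later
	// reconcile steps still run.
	func checkGhostBrokers(log logr.Logger, health clusterHealth) error {
		if !health.IsHealthy {
			log.Info("skipping ghost broker checks due to cluster health")
			return nil
		}
		// ... correlate brokers with Pods and decommission any ghosts here ...
		return nil
	}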

@RafalKorepta force-pushed the rk/K8S-250/call-decommission-less-often branch from 5a516fd to 725a7d2 on June 28, 2024 17:22
@chrisseto (Contributor) left a comment:

LGTM! Is this tested by anything currently? If not we should file an issue to make sure this eventually gets some automated tests.

@@ -230,7 +230,7 @@ lint: golangci-lint

 .PHONY: lint-fix
 lint-fix: golangci-lint ## Run golangci-lint linter and perform fixes
-	$(GOLANGCI_LINT) run --fix
+	$(GOLANGCI_LINT) run --fix --timeout 30m
chrisseto (Contributor):

Oh, we need to remove this or replace it with a Taskfile, as this is very much out of sync with what CI runs. Not to be fixed in this PR. I can fix it, because I missed it when I changed up the linting. Sorry 😓

RafalKorepta (Contributor, Author):

NP, I will remove that.

Calling decommission based on a changed Pod annotation might not be possible if the Pod was removed together with the annotation that stored the previous Redpanda node ID. A dedicated function handles ghost brokers instead.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
@RafalKorepta (Contributor, Author):

I manually performed the following in a 4-node cluster:

kubectl delete pvc datadir-decommission-0 &
kubectl delete --force pod decommission-0

The ghost broker was correctly decommissioned:

{
  "level": "info",
  "ts": "2024-07-02T14:48:06.161Z",
  "logger": "ClusterReconciler.Reconcile.doDecommissionGhostBrokers",
  "msg": "Nodes that are reported by Redpanda, but are not running in Kubernetes cluster",
  "controller": "cluster",
  "controllerGroup": "redpanda.vectorized.io",
  "controllerKind": "Cluster",
  "Cluster": {
    "name": "decommission",
    "namespace": "redpanda-system"
  },
  "namespace": "redpanda-system",
  "name": "decommission",
  "reconcileID": "88d52a69-cb25-4ccb-b7e2-a09d75bfdbd3",
  "nodes-considered-down": [
    0
  ],
  "cluster-health": {
    "is_healthy": true,
    "controller_id": 1,
    "all_nodes": [
      0,
      1,
      2,
      3,
      4
    ],
    "nodes_down": [],
    "leaderless_partitions": [],
    "under_replicated_partitions": []
  },
  "broker-list": [
    {
      "node_id": 0,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": false,
      "version": "",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 1,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "v24.1.9 - 871368edb0f049a9e8ee23068a2f8645c6e2e031",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 2,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "v24.1.9 - 871368edb0f049a9e8ee23068a2f8645c6e2e031",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 3,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "v24.1.9 - 871368edb0f049a9e8ee23068a2f8645c6e2e031",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    },
    {
      "node_id": 4,
      "num_cores": 1,
      "membership_status": "active",
      "is_alive": true,
      "version": "",
      "maintenance_status": {
        "draining": false,
        "finished": false,
        "errors": false,
        "partitions": 0,
        "eligible": 0,
        "transferring": 0,
        "failed": 0
      }
    }
  ],
  "actual-broker-ids": {
    "1": null,
    "2": null,
    "3": null,
    "4": null
  }
}
{
  "level": "info",
  "ts": "2024-07-02T14:48:06.161Z",
  "logger": "ClusterReconciler.Reconcile",
  "msg": "decommissioning ghost broker",
  "controller": "cluster",
  "controllerGroup": "redpanda.vectorized.io",
  "controllerKind": "Cluster",
  "Cluster": {
    "name": "decommission",
    "namespace": "redpanda-system"
  },
  "namespace": "redpanda-system",
  "name": "decommission",
  "reconcileID": "88d52a69-cb25-4ccb-b7e2-a09d75bfdbd3",
  "node-id": 0
}

By reducing the broker list to only those brokers that are not running as a Pod and are reported as not alive.
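
A minimal sketch of that reduction, with a stand-in broker type whose fields mirror the node_id and is_alive fields in the log above (the real operator types may differ):

	// broker mirrors the node_id and is_alive fields from the admin API broker list.
	type broker struct {
		NodeID  int
		IsAlive bool
	}

	// ghostBrokers keeps only the brokers that have no Pod in Kubernetes (their node
	// ID is absent from actualBrokerIDs) and that Redpanda reports as not alive.
	func ghostBrokers(brokerList []broker, actualBrokerIDs map[int]struct{}) []int {
		var ghosts []int
		for _, b := range brokerList {
			if _, runningInK8s := actualBrokerIDs[b.NodeID]; runningInK8s {
				// Broker is backed by a Pod; leave it alone.
				continue
			}
			if b.IsAlive {
				// Registered but alive brokers without a matching Pod are not
				// decommissioned in this pass either.
				continue
			}
			ghosts = append(ghosts, b.NodeID)
		}
		return ghosts
	}

In the log above this would yield node 0: it is missing from actual-broker-ids and reports is_alive: false.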
@RafalKorepta force-pushed the rk/K8S-250/call-decommission-less-often branch from 725a7d2 to 7775d42 on July 2, 2024 14:52
@RafalKorepta enabled auto-merge (rebase) on July 2, 2024 14:54
@RafalKorepta merged commit 9fa7a78 into main on Jul 2, 2024
3 of 4 checks passed