
Decommission via Helm #253

Closed
vuldin opened this issue Dec 21, 2022 · 18 comments · Fixed by redpanda-data/redpanda#12847 or #684
Labels: area/k8s, chart:redpanda, enhancement (New feature or request), P1 (High Priority - Sometime in the next 3 weeks)

Comments

vuldin (Member) commented Dec 21, 2022

Add an explicit decommission action to our Helm chart

Note: Decommission in Redpanda can fail if not all partitions can be reassigned to other brokers (and may be rejected by other future guardrails), so the command must fail gracefully with an error if Redpanda doesn't allow the decommission to succeed. Polling the decommission status Admin API may be helpful here.
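A minimal sketch of what such a graceful, polled decommission could look like, using the same Admin API calls that appear later in this thread (PUT /v1/brokers/{id}/decommission to start, GET /v1/brokers to watch membership). The node ID, pod name, timeout, and the `node_id` field assumed in the jq filter are illustrative assumptions, not the chart's eventual implementation:

```bash
# Sketch only: decommission a broker and poll until it leaves the cluster,
# failing with a clear error if it never does. NODE_ID, the pod used to reach
# the Admin API, the 10-minute deadline, and the assumed node_id field in the
# /v1/brokers response are placeholders for illustration.
NODE_ID=0
DEADLINE=$((SECONDS + 600))

# Start the decommission (the same Admin API call used later in this thread).
kubectl exec -n redpanda redpanda-2 -c redpanda -- \
  curl -s -X PUT "http://localhost:9644/v1/brokers/${NODE_ID}/decommission"

# Poll the broker list until the node disappears, or give up gracefully.
while kubectl exec -n redpanda redpanda-2 -c redpanda -- \
        curl -s http://localhost:9644/v1/brokers |
      jq -e --argjson id "$NODE_ID" 'any(.[]; .node_id == $id)' >/dev/null; do
  if (( SECONDS >= DEADLINE )); then
    echo "decommission of node ${NODE_ID} did not complete in time" >&2
    exit 1
  fi
  sleep 5
done
echo "node ${NODE_ID} decommissioned"
```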

Historical context:


I'm running into a weird issue when trying to upgrade. I've reproduced the issue, but I'm not yet sure why this is happening. Steps to reproduce:

1. `git clone https://github.com/vuldin/helm-charts.git`
2. `cd helm-charts`
3. `git checkout issue-upgrading`
4. `helm upgrade --install --debug -f values.yaml redpanda charts/redpanda -n redpanda --create-namespace`
5. Toggle `config.cluster.auto_create_topics_enabled` in values.yaml
6. `helm upgrade --install --debug -f values.yaml redpanda charts/redpanda -n redpanda --create-namespace`

At this point you will end up in this state:
![image](https://user-images.githubusercontent.com/310946/208923205-c79e96a0-55b7-4833-b506-77fb700575cb.png)

[redpanda-0.log](https://github.com/redpanda-data/helm-charts/files/10278229/redpanda-0.log)
[redpanda-1.log](https://github.com/redpanda-data/helm-charts/files/10278230/redpanda-1.log)
[redpanda-2.log](https://github.com/redpanda-data/helm-charts/files/10278231/redpanda-2.log)

@vuldin vuldin added the bug Something isn't working label Dec 21, 2022
vuldin (Member Author) commented Dec 21, 2022

Step 5 changes a specific parameter, but it doesn't seem to matter which parameter is changed in the upgrade as long as it requires a node restart.

vuldin (Member Author) commented Dec 21, 2022

redpanda-2 (the only running node) will report an additional broker:

> kubectl exec -it redpanda-2 -n redpanda -c redpanda -- rpk cluster info
CLUSTER
=======
redpanda.10a0efc9-b1c3-46d0-b749-3935717861d2

BROKERS
=======
ID    HOST                                             PORT
0     redpanda-0.redpanda.redpanda.svc.cluster.local.  9093
1     redpanda-1.redpanda.redpanda.svc.cluster.local.  9093
2     redpanda-2.redpanda.redpanda.svc.cluster.local.  9093
3     redpanda-2.redpanda.redpanda.svc.cluster.local.  9093

andrwng (Contributor) commented Dec 21, 2022

I see on redpanda-2:

DEBUG 2022-12-21 14:37:47,773 [shard 0] cluster - members_manager.cc:362 - Initial node UUID map: {{5a94e0ad-ee47-40e1-9b5a-298652d0c65e}: 2, {858497a0-344e-4792-99d3-80c70c3eced1}: 0, {aa2c902e-e1c7-4137-b862-1d4c232517c0}: 1}

indicating the seeds talked to a node at redpanda-0 at 14:37. OTOH, redpanda-0's first logs start later than that:

Support: https://support.redpanda.com/ - Contact the support team privately
Product Feedback: https://redpanda.com/feedback - Let us know how we can improve your experience
Slack: https://redpanda.com/slack - Chat about all things Redpanda. Join the conversation!
Twitter: https://twitter.com/redpandadata - All the latest Redpanda news!


DEBUG 2022-12-21 14:38:49,154 seastar - smp::count: 1
DEBUG 2022-12-21 14:38:49,155 seastar - latency_goal: 0.00075

...at which point it loaded a different UUID from disk than the one seen by redpanda-2:

INFO  2022-12-21 14:38:49,227 [shard 0] main - application.cc:1412 - Loaded existing UUID for node: {7239458b-ee94-4a53-9e65-28d61a74c4b7}

andrwng (Contributor) commented Dec 22, 2022

The underlying issue here was that in this example we were using persistentVolume=false, yielding an ephemeral disk that was wiped during certain operations (e.g. when restarting Redpanda to apply a config update). This is unsafe without additional steps, as Redpanda expects a wiped node to come with a corresponding decommission operation. Without it, enough nodes get wiped and restarted without their old node IDs (which are now down) being removed, and the controller Raft group is no longer able to maintain a quorum.

One thought was to add a decommission step in the preStop phase, but that won't work for 3-node clusters, since they will be unable to complete the decommission unless another node is added prior, given a replication factor of 3.

So it seems the path forward is that, after starting up, a node should check whether there are duplicates of its hostname in the cluster with other node IDs assigned to it. If so, we should decommission the other, presumably down, node ID.
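A minimal sketch of that startup check, under two assumptions the thread doesn't settle: the `rpk cluster info` table keeps the ID / HOST / PORT layout shown above, and the freshly registered pod always holds the highest node ID for its hostname (so every lower ID under the same hostname is a ghost). This is illustration, not the mechanism that eventually shipped:

```bash
# Sketch only: run after startup on each broker pod. Parses the BROKERS table
# from `rpk cluster info`, collects every node ID registered under this pod's
# hostname, keeps the highest one (assumed to be this pod), and decommissions
# the rest via the Admin API.
MY_HOST=$(hostname -f)

ids=$(rpk cluster info |
      awk -v host="$MY_HOST" '$2 ~ host { print $1 }' |
      sort -n)

keep=$(echo "$ids" | tail -n 1)
for id in $ids; do
  if [ "$id" != "$keep" ]; then
    echo "decommissioning ghost broker id=$id for host $MY_HOST"
    curl -s -X PUT "http://localhost:9644/v1/brokers/${id}/decommission"
  fi
done
```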

@joejulian joejulian added the P0 Needs done immediately! label Dec 23, 2022
vuldin (Member Author) commented Dec 24, 2022

Just to associate this log message with this issue: the following message appears constantly after a node is replaced but not decommissioned:

WARN  2022-12-24 16:37:21,544 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.h:581 - received append_entries request addressed to different node: {id: {0}, revision: {0}}, current node: {id: {3}, revision: {0}}, source: {id: {2}, revision: {0}}

The following command shows that the node count went from 3 to 4:

k exec -it -n redpanda redpanda-1 -c redpanda -- curl http://localhost:9644/v1/brokers | jq .

I think there is a time frame after which an unresponsive node gets decommissioned, but node 0 was not automatically decommissioned after 30 minutes. The cluster continued to function normally, but in an unhealthy state: `rpk cluster info` reported correct information, but `rpk cluster health` showed 4 nodes with node 0 down.

The following rpk command to decommission the node fails:

> kubectl exec -it -n redpanda redpanda-2 -c redpanda -- rpk redpanda admin brokers decommission 0
unable to initialize admin client: request GET http://127.0.0.1:9644/v1/brokers/0 failed: Service Unavailable, body: "{\"message\": \"Unexpected error: Requested node does not exists\", \"code\": 503}"

command terminated with exit code 1

But this admin API request from the controller broker resolved the issue:

kubectl exec -it -n redpanda redpanda-2 -c redpanda -- curl -X PUT http://localhost:9644/v1/brokers/0/decommission
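To confirm the PUT above actually removed the ghost broker (rather than leaving it stuck draining), something like the following can be run afterwards; the `node_id` and `membership_status` fields are assumptions about the /v1/brokers payload, so adjust to whatever your Admin API version returns:

```bash
# List remaining brokers and their membership status after the decommission.
kubectl exec -it -n redpanda redpanda-2 -c redpanda -- \
  curl -s http://localhost:9644/v1/brokers |
  jq '.[] | {node_id, membership_status}'
```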

vuldin (Member Author) commented Dec 24, 2022

I've also seen the cluster come back up after `helm upgrade` complaining about `request addressed to different node...`. When I then follow the steps above, the cluster complains `Currently there is no leader controller elected in the cluster`. In this state, 2 of the 3 nodes think they are in a 2-node cluster, and the 3rd thinks it is in a 3-node cluster. (Disclaimer: this is on minikube, so this could point to a minikube-related issue. There are times when I must `minikube delete` and then `minikube start ...` to get back to a working state for some reason.)

andrwng (Contributor) commented Dec 24, 2022

> I think there is a time frame after which an unresponsive node gets decommissioned, but node 0 was not automatically decommissioned after 30 minutes. The cluster continued to function normally, but in an unhealthy state: `rpk cluster info` reported correct information, but `rpk cluster health` showed 4 nodes with node 0 down.

Redpanda moves data away from an unavailable node, but it won't officially decommission it (kick the node out of the group). The mismatch between the Admin API and rpk seems strange, though.

> When in this state, 2 of the 3 nodes think they are in a 2-node cluster, and the 3rd thinks it is in a 3-node cluster.

There being no leader for the controller I think is explainable by each node having a new node ID after having all its data wiped, and the Raft group effectively being twice the size it should be. That said, it's odd that you're seeing mismatched cluster sizes, and that the sizes are small. When in this state, what are the node IDs? Are they new or old IDs?

joejulian (Contributor):

So I think the gist of all this is that the pod roll is leaving the cluster split-brained. Somehow the pod's data is being deleted (PVC deleted by something outside of Helm or Redpanda?).

joejulian (Contributor):

When using emptyDir, whenever a Pod is terminated its storage is empty when it starts back up, so it is assigned a new broker ID. The old ID is never decommissioned.

The fix is adding a process to decommission the broker that is no longer in the cluster.

@joejulian joejulian changed the title Upgrading to apply change that requires node restart fails Add a process to decommission broker IDs for pods that are no loger in the cluster Jan 12, 2023
@joejulian joejulian added enhancement New feature or request P1 High Priority - Sometime in the next 3 weeks chart:redpanda and removed bug Something isn't working P0 Needs done immediately! labels Jan 12, 2023
@joejulian joejulian removed their assignment Jan 12, 2023
@vuldin vuldin changed the title Add a process to decommission broker IDs for pods that are no loger in the cluster Add a process to decommission broker IDs for pods that are no longer in the cluster Jan 13, 2023
alejandroEsc (Contributor):

@joejulian could this be done by a preStop script addition? Or is it something that needs to happen at some other stage?

@alejandroEsc alejandroEsc removed their assignment Mar 14, 2023
@joejulian joejulian removed their assignment Apr 7, 2023
RafalKorepta (Contributor):

There is some work done in the old operator.

REF redpanda-data/redpanda#9750

mattschumpert:

@joejulian this says a core estimate is 'pending', but if this is a noop for core (I think it is), can you change sizing-core to 'N/A', please?

@joejulian joejulian added this to the 2023Q2 milestone May 22, 2023
@alejandroEsc alejandroEsc self-assigned this Jun 21, 2023
@alejandroEsc alejandroEsc removed their assignment Jul 21, 2023
@mattschumpert mattschumpert changed the title Add a process to decommission broker IDs for pods that are no longer in the cluster Add a Helm process to decommission broker IDs for pods that are no longer in the cluster Jul 27, 2023
@mattschumpert mattschumpert changed the title Add a Helm process to decommission broker IDs for pods that are no longer in the cluster Decommission via Helm Jul 27, 2023
@joejulian joejulian removed their assignment Jul 31, 2023
JakeSCahill (Contributor):

This issue says it requires docs, but from the discussion, it looks like this will be an automated process.

What needs adding to the docs?

RafalKorepta (Contributor):

@alejandroEsc @joejulian Correct me if I'm wrong, but I think the docs should describe

2 modes of operation:

  • in the Helm chart deployment model, one will be able to add sidecars (automated controllers that will decommission ghost Redpanda brokers)
  • in the operator deployment model:
    • allow sidecars in the Redpanda deployment, or
    • disable sidecars, with the same functionality executed by the operator

It may not matter what automated decommission actually does behind the scenes, but the docs should at least describe how to enable such functionality, as sketched below.
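For illustration only, switching the Helm deployment into the sidecar mode described above might look like the command below. The value keys `statefulset.sideCars.controllers.enabled` and `rbac.enabled` are assumptions here, so check the chart's values.yaml for the real names:

```bash
# Hypothetical values keys -- verify against the chart's values.yaml.
helm upgrade --install redpanda redpanda/redpanda -n redpanda \
  --set statefulset.sideCars.controllers.enabled=true \
  --set rbac.enabled=true
```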

mattschumpert:

💥

alejandroEsc (Contributor):

@mattschumpert this is only partially completed. The changes in redpanda still need to be reviewed and merged; I'll fix this.

@alejandroEsc alejandroEsc reopened this Sep 14, 2023
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jun 19, 2024
Calling decommission when a Pod annotation changes might not be possible if the Pod was removed along with the annotation where the previous Redpanda ID was stored. There is a dedicated function to handle ghost brokers.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jun 21, 2024
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jun 28, 2024
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jul 2, 2024
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jul 2, 2024