
Decommission via Helm #253

Closed
vuldin opened this issue Dec 21, 2022 · 18 comments · Fixed by redpanda-data/redpanda#12847 or #684
Labels: area/k8s, chart:redpanda, enhancement (New feature or request), P1 (High Priority - Sometime in the next 3 weeks)

Comments

vuldin (Member) commented Dec 21, 2022

Add an explicit decommission action to our Helm chart

Note: Decommission in Redpanda can fail if not all partitions can be reassigned to other brokers (and may be rejected by other future guardrails), so the command must fail gracefully with an error if Redpanda doesn't allow the decommission to succeed. Polling the decommission status Admin API may be helpful here.
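A minimal sketch of what such a graceful, polled decommission could look like, using the same Admin API calls that appear later in this thread (PUT /v1/brokers/{id}/decommission to start, GET /v1/brokers to watch membership). The node ID, pod name, timeout, and the `node_id` field assumed in the jq filter are illustrative assumptions, not the chart's eventual implementation:

```bash
# Sketch only: decommission a broker and poll until it leaves the cluster,
# failing with a clear error if it never does. NODE_ID, the pod used to reach
# the Admin API, the 10-minute deadline, and the assumed node_id field in the
# /v1/brokers response are placeholders for illustration.
NODE_ID=0
DEADLINE=$((SECONDS + 600))

# Start the decommission (the same Admin API call used later in this thread).
kubectl exec -n redpanda redpanda-2 -c redpanda -- \
  curl -s -X PUT "http://localhost:9644/v1/brokers/${NODE_ID}/decommission"

# Poll the broker list until the node disappears, or give up gracefully.
while kubectl exec -n redpanda redpanda-2 -c redpanda -- \
        curl -s http://localhost:9644/v1/brokers |
      jq -e --argjson id "$NODE_ID" 'any(.[]; .node_id == $id)' >/dev/null; do
  if (( SECONDS >= DEADLINE )); then
    echo "decommission of node ${NODE_ID} did not complete in time" >&2
    exit 1
  fi
  sleep 5
done
echo "node ${NODE_ID} decommissioned"
```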

Historical context:


I'm running into a weird issue when trying to upgrade. I've reproduced the issue, but I'm not yet sure why this is happening. Steps to reproduce:

1. `git clone https://github.com/vuldin/helm-charts.git`
2. `cd helm-charts`
3. `git checkout issue-upgrading`
4. `helm upgrade --install --debug -f values.yaml redpanda charts/redpanda -n redpanda --create-namespace`
5. Toggle `config.cluster.auto_create_topics_enabled` in values.yaml
6. `helm upgrade --install --debug -f values.yaml redpanda charts/redpanda -n redpanda --create-namespace`

At this point you will end up in this state:
![image](https://user-images.githubusercontent.com/310946/208923205-c79e96a0-55b7-4833-b506-77fb700575cb.png)

[redpanda-0.log](https://github.com/redpanda-data/helm-charts/files/10278229/redpanda-0.log)
[redpanda-1.log](https://github.com/redpanda-data/helm-charts/files/10278230/redpanda-1.log)
[redpanda-2.log](https://github.com/redpanda-data/helm-charts/files/10278231/redpanda-2.log)

@vuldin vuldin added the bug Something isn't working label Dec 21, 2022
vuldin (Member Author) commented Dec 21, 2022

Step 5 changes a specific parameter, but it doesn't seem to matter which parameter is changed in the upgrade as long as it requires a node restart.

vuldin (Member Author) commented Dec 21, 2022

redpanda-2 (the only running node) will report an additional broker:

> kubectl exec -it redpanda-2 -n redpanda -c redpanda -- rpk cluster info
CLUSTER
=======
redpanda.10a0efc9-b1c3-46d0-b749-3935717861d2

BROKERS
=======
ID    HOST                                             PORT
0     redpanda-0.redpanda.redpanda.svc.cluster.local.  9093
1     redpanda-1.redpanda.redpanda.svc.cluster.local.  9093
2     redpanda-2.redpanda.redpanda.svc.cluster.local.  9093
3     redpanda-2.redpanda.redpanda.svc.cluster.local.  9093

andrwng (Contributor) commented Dec 21, 2022

I see on redpanda-2:

DEBUG 2022-12-21 14:37:47,773 [shard 0] cluster - members_manager.cc:362 - Initial node UUID map: {{5a94e0ad-ee47-40e1-9b5a-298652d0c65e}: 2, {858497a0-344e-4792-99d3-80c70c3eced1}: 0, {aa2c902e-e1c7-4137-b862-1d4c232517c0}: 1}

indicating the seeds talked to a node at redpanda-0 at 14:37. OTOH, redpanda-0's first logs start later than that:

Support: https://support.redpanda.com/ - Contact the support team privately
Product Feedback: https://redpanda.com/feedback - Let us know how we can improve your experience
Slack: https://redpanda.com/slack - Chat about all things Redpanda. Join the conversation!
Twitter: https://twitter.com/redpandadata - All the latest Redpanda news!


DEBUG 2022-12-21 14:38:49,154 seastar - smp::count: 1
DEBUG 2022-12-21 14:38:49,155 seastar - latency_goal: 0.00075

...at which point it loaded a different UUID from disk than the one seen by redpanda-2:

INFO  2022-12-21 14:38:49,227 [shard 0] main - application.cc:1412 - Loaded existing UUID for node: {7239458b-ee94-4a53-9e65-28d61a74c4b7}

andrwng (Contributor) commented Dec 22, 2022

The underlying issue here was that in this example we were using persistentVolume=false, yielding an ephemeral disk that was wiped during certain operations (e.g. when restarting Redpanda to apply a config update). This is unsafe without additional steps, as Redpanda expects a wiped node to come with a corresponding decommission operation. Without it, enough nodes get wiped and restarted without their old node IDs (which are now down) being removed, and the controller Raft group is no longer able to maintain a quorum.

One thought was to add a decommission step in the preStop phase, but that won't work for 3-node clusters, since they will be unable to complete the decommission unless another node is added prior, given a replication factor of 3.

So it seems the path forward is that, after starting up, a node should check whether there are duplicates of its hostname in the cluster with other node IDs assigned to it. If so, we should decommission the other, presumably down, node ID.
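A minimal sketch of that startup check, under two assumptions the thread doesn't settle: the `rpk cluster info` table keeps the ID / HOST / PORT layout shown above, and the freshly registered pod always holds the highest node ID for its hostname (so every lower ID under the same hostname is a ghost). This is illustration, not the mechanism that eventually shipped:

```bash
# Sketch only: run after startup on each broker pod. Parses the BROKERS table
# from `rpk cluster info`, collects every node ID registered under this pod's
# hostname, keeps the highest one (assumed to be this pod), and decommissions
# the rest via the Admin API.
MY_HOST=$(hostname -f)

ids=$(rpk cluster info |
      awk -v host="$MY_HOST" '$2 ~ host { print $1 }' |
      sort -n)

keep=$(echo "$ids" | tail -n 1)
for id in $ids; do
  if [ "$id" != "$keep" ]; then
    echo "decommissioning ghost broker id=$id for host $MY_HOST"
    curl -s -X PUT "http://localhost:9644/v1/brokers/${id}/decommission"
  fi
done
```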

@joejulian joejulian added the P0 Needs done immediately! label Dec 23, 2022
vuldin (Member Author) commented Dec 24, 2022

Just to associate this log message with this issue: the following message appears constantly after a node is replaced but not decommissioned:

WARN  2022-12-24 16:37:21,544 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.h:581 - received append_entries request addressed to different node: {id: {0}, revision: {0}}, current node: {id: {3}, revision: {0}}, source: {id: {2}, revision: {0}}

The following command shows that the node count went from 3 to 4:

k exec -it -n redpanda redpanda-1 -c redpanda -- curl http://localhost:9644/v1/brokers | jq .

I think there is a time frame after which an unresponsive node gets decommissioned, but node 0 was not automatically decommissioned after 30 minutes. The cluster continued to function normally, but in an unhealthy state: `rpk cluster info` reported correct information, but `rpk cluster health` showed 4 nodes with node 0 down.

The following rpk command to decommission the node fails:

> kubectl exec -it -n redpanda redpanda-2 -c redpanda -- rpk redpanda admin brokers decommission 0
unable to initialize admin client: request GET http://127.0.0.1:9644/v1/brokers/0 failed: Service Unavailable, body: "{\"message\": \"Unexpected error: Requested node does not exists\", \"code\": 503}"

command terminated with exit code 1

But this admin API request from the controller broker resolved the issue:

kubectl exec -it -n redpanda redpanda-2 -c redpanda -- curl -X PUT http://localhost:9644/v1/brokers/0/decommission
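To confirm the PUT above actually removed the ghost broker (rather than leaving it stuck draining), something like the following can be run afterwards; the `node_id` and `membership_status` fields are assumptions about the /v1/brokers payload, so adjust to whatever your Admin API version returns:

```bash
# List remaining brokers and their membership status after the decommission.
kubectl exec -it -n redpanda redpanda-2 -c redpanda -- \
  curl -s http://localhost:9644/v1/brokers |
  jq '.[] | {node_id, membership_status}'
```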

vuldin (Member Author) commented Dec 24, 2022

I've also seen the cluster come back up after `helm upgrade` complaining about `request addressed to different node...`. When I then follow the steps above, the cluster complains `Currently there is no leader controller elected in the cluster`. In this state, 2 of the 3 nodes think they are in a 2-node cluster, and the 3rd thinks it is in a 3-node cluster. (Disclaimer: this is on minikube, so this could point to a minikube-related issue. There are times when I must `minikube delete` and then `minikube start ...` to get back to a working state for some reason.)

andrwng (Contributor) commented Dec 24, 2022

> I think there is a time frame after which an unresponsive node gets decommissioned, but node 0 was not automatically decommissioned after 30 minutes. The cluster continued to function normally, but in an unhealthy state: `rpk cluster info` reported correct information, but `rpk cluster health` showed 4 nodes with node 0 down.

Redpanda moves data away from an unavailable node, but it won't officially decommission it (kick the node out of the group). The mismatch between the Admin API and rpk seems strange, though.

> When in this state, 2 of the 3 nodes think they are in a 2-node cluster, and the 3rd thinks it is in a 3-node cluster.

There being no leader for the controller I think is explainable by each node having a new node ID after having all its data wiped, and the Raft group effectively being twice the size it should be. That said, it's odd that you're seeing mismatched cluster sizes, and that the sizes are small. When in this state, what are the node IDs? Are they new or old IDs?

joejulian (Contributor):

So I think the gist of all this is that the pod roll is leaving the cluster split-brained. Somehow the pod's data is being deleted (PVC deleted by something outside of Helm or Redpanda?).

joejulian (Contributor):

When using emptyDir, whenever a Pod is terminated its storage is empty when it starts back up, so it is assigned a new broker ID. The old ID is never decommissioned.

The fix is adding a process to decommission the broker that is no longer in the cluster.

@joejulian joejulian changed the title Upgrading to apply change that requires node restart fails Add a process to decommission broker IDs for pods that are no loger in the cluster Jan 12, 2023
@joejulian joejulian added enhancement New feature or request P1 High Priority - Sometime in the next 3 weeks chart:redpanda and removed bug Something isn't working P0 Needs done immediately! labels Jan 12, 2023
@joejulian joejulian removed their assignment Jan 12, 2023
@vuldin vuldin changed the title Add a process to decommission broker IDs for pods that are no loger in the cluster Add a process to decommission broker IDs for pods that are no longer in the cluster Jan 13, 2023
alejandroEsc (Contributor):

@joejulian could this be done by a preStop script addition? Or is it something that needs to happen at some other stage?

@alejandroEsc alejandroEsc removed their assignment Mar 14, 2023
@joejulian joejulian removed their assignment Apr 7, 2023
RafalKorepta (Contributor):

There is some work done in the old operator.

REF redpanda-data/redpanda#9750

mattschumpert:

@joejulian this says a core estimate is 'pending', but if this is a noop for core (I think it is), can you change sizing-core to 'N/A', please?

@joejulian joejulian added this to the 2023Q2 milestone May 22, 2023
@alejandroEsc alejandroEsc self-assigned this Jun 21, 2023
@alejandroEsc alejandroEsc removed their assignment Jul 21, 2023
@mattschumpert mattschumpert changed the title Add a process to decommission broker IDs for pods that are no longer in the cluster Add a Helm process to decommission broker IDs for pods that are no longer in the cluster Jul 27, 2023
@mattschumpert mattschumpert changed the title Add a Helm process to decommission broker IDs for pods that are no longer in the cluster Decommission via Helm Jul 27, 2023
@joejulian joejulian removed their assignment Jul 31, 2023
JakeSCahill (Contributor):

This issue says it requires docs, but from the discussion, it looks like this will be an automated process.

What needs adding to the docs?

RafalKorepta (Contributor):

@alejandroEsc @joejulian Correct me if I'm wrong, but I think the docs should describe

2 modes of operation:

  • in the Helm chart deployment model, one will be able to add sidecars (automated controllers that will decommission ghost Redpanda brokers)
  • in the operator deployment model:
    • allow sidecars in the Redpanda deployment, or
    • disable sidecars, with the same functionality executed by the operator

It may not matter what automated decommission actually does behind the scenes, but the docs should at least describe how to enable such functionality, as sketched below.
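For illustration only, switching the Helm deployment into the sidecar mode described above might look like the command below. The value keys `statefulset.sideCars.controllers.enabled` and `rbac.enabled` are assumptions here, so check the chart's values.yaml for the real names:

```bash
# Hypothetical values keys -- verify against the chart's values.yaml.
helm upgrade --install redpanda redpanda/redpanda -n redpanda \
  --set statefulset.sideCars.controllers.enabled=true \
  --set rbac.enabled=true
```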

mattschumpert:

💥

alejandroEsc (Contributor):

@mattschumpert this is only partially completed. The changes in redpanda still need to be reviewed and merged; I'll fix this.

@alejandroEsc alejandroEsc reopened this Sep 14, 2023
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jun 19, 2024
Calling decommission when a Pod annotation changes might not be possible if the Pod was removed along with the annotation where the previous Redpanda ID was stored. There is a dedicated function to handle ghost brokers.

Reference

redpanda-data/redpanda#9750

redpanda-data/redpanda#13298
redpanda-data/redpanda#13132

redpanda-data/helm-charts#253
redpanda-data/redpanda#12847
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jun 21, 2024
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jun 28, 2024
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jul 2, 2024
RafalKorepta added a commit to redpanda-data/redpanda-operator that referenced this issue Jul 2, 2024