🐛 graceful reset (etcd leave) doesn't work with KubeSpan #8057

smira · 2023-12-12T10:59:10Z

The machine leaves the Discovery Service before it leaves etcd, so by the time the etcd leave happens, the KubeSpan links to the controlplane nodes are already broken, and the etcd member can't communicate with its peers.

The fix is to defer leaving the Discovery Service during reset to a later phase, when network connectivity via KubeSpan is no longer needed.
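To make the ordering problem concrete, here is a minimal toy model, not the actual Talos reset sequencer. The `Node` class, the phase names, and `run_reset` are all hypothetical, and the model assumes KubeSpan peer links exist only while the node is still registered with the Discovery Service.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a machine being reset (hypothetical, not Talos code)."""

    in_discovery: bool = True  # still registered with the Discovery Service
    log: list = field(default_factory=list)

    @property
    def kubespan_up(self) -> bool:
        # Assumption: KubeSpan links to peers exist only while the node
        # is still a member in the Discovery Service.
        return self.in_discovery

    def leave_discovery(self) -> None:
        self.in_discovery = False
        self.log.append("leave_discovery")

    def leave_etcd(self) -> None:
        # Leaving etcd requires talking to the other etcd members.
        if not self.kubespan_up:
            raise RuntimeError("etcd leave failed: peers unreachable")
        self.log.append("leave_etcd")


def run_reset(node: Node, phases: list) -> list:
    """Run the named phases in order and return the log of completed ones."""
    for phase in phases:
        getattr(node, phase)()
    return node.log


# Order described in the bug report: discovery is left first.
OLD_ORDER = ["leave_discovery", "leave_etcd"]
# Order after the proposed fix: discovery is left once the network is no longer needed.
NEW_ORDER = ["leave_etcd", "leave_discovery"]

if __name__ == "__main__":
    print(run_reset(Node(), NEW_ORDER))
    try:
        run_reset(Node(), OLD_ORDER)
    except RuntimeError as err:
        print(f"old order: {err}")
```

In this sketch the old order fails at `leave_etcd` because the peers are already unreachable, while the new order completes both phases.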


Fixes siderolabs#8057

I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix.

The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up:

* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)

Now leaving discovery service happens way later, when network interactions are no longer required.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>

Fixes siderolabs#8057

I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix.

The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up:

* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)

Now leaving discovery service happens way later, when network interactions are no longer required.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 10c59a6)

smira mentioned this issue Dec 12, 2023

Release 1.6.0 checklist #7561

Closed

smira self-assigned this Dec 12, 2023

smira mentioned this issue Dec 13, 2023

fix: leave discovery service later in the reset sequence #8060

Merged

talos-bot closed this as completed in 10c59a6 Dec 13, 2023

github-actions bot locked as resolved and limited conversation to collaborators Jun 7, 2024
