Track control plane join race conditions via kubeadm #2050

Closed
chuckha opened this issue Jan 13, 2020 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@chuckha
Contributor

chuckha commented Jan 13, 2020

Kubeadm has a bug with joining multiple control plane nodes simultaneously. Occasionally, the control plane will fail to join because not enough etcd members are ready.

We can work around this by setting a concurrency limit of 1 on the controller that is responsible for joining the control plane nodes (usually the infrastructure controller). This fixes the issue by making each node join a blocking operation, so only one node can join at a time. If users set a concurrency limit > 1, there is about a 20% chance that a control plane join fails when two or more control plane nodes attempt to join a cluster simultaneously.
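To illustrate why a concurrency limit of 1 serializes joins, here is a minimal sketch in Python. It is not CAPI or kubeadm code: `run_joins`, `join_node`, and the sleep that stands in for the join work are all hypothetical, and the only point is that a limit of 1 guarantees no two joins are ever in flight at once.

```python
import threading
import time


def run_joins(node_names, max_concurrent):
    """Run a stand-in join for each node, allowing at most max_concurrent at once.

    Returns the peak number of joins observed in flight at the same time.
    """
    limit = threading.Semaphore(max_concurrent)
    state_lock = threading.Lock()
    in_flight = 0
    peak = 0

    def join_node(name):
        # Hypothetical placeholder for the real control plane join.
        nonlocal in_flight, peak
        with limit:  # blocks while max_concurrent joins are already running
            with state_lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(0.01)  # stands in for the actual join work
            with state_lock:
                in_flight -= 1

    threads = [threading.Thread(target=join_node, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With `max_concurrent=1` the returned peak is always 1, which is the blocking behavior described above. With a higher limit, joins may overlap, which is the situation in which the race can occur.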

CAPI is able to work around this by retrying the join after a failure, but there will be ominous logs and a slowdown in how long it takes for all control plane nodes to become ready.
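The retry behavior can be sketched roughly as follows. This is an illustration only, not how CAPI implements it: `join_with_retry` and the `attempt_join` callable are hypothetical names, and the attempt count and delay are arbitrary assumptions.

```python
import time


def join_with_retry(attempt_join, max_attempts=5, delay_seconds=0.0):
    """Call attempt_join until it returns True or max_attempts is exhausted.

    attempt_join is a hypothetical callable standing in for one join attempt.
    Returns the number of attempts used; raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_join():
            return attempt
        if attempt < max_attempts:
            # Each failed attempt leaves a log line and adds delay,
            # which is the cost of relying on retries.
            print(f"join attempt {attempt} failed; retrying")
            time.sleep(delay_seconds)
    raise RuntimeError(f"join failed after {max_attempts} attempts")
```

For example, if the first two attempts fail and the third succeeds, the function returns 3 after logging two failures. The join eventually succeeds, but the failures are visible in the logs and the node takes longer to become ready.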

Let's track the work in kubernetes/kubeadm#1793, which will improve simultaneous control plane joining.

/kind bug
/milestone Next
/priority backlog

@k8s-ci-robot k8s-ci-robot added this to the Next milestone Jan 13, 2020
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Jan 13, 2020
@vincepri
Member

Would #2016 also be a potential fix to this issue?

@chuckha
Contributor Author

chuckha commented Jan 13, 2020

Hm yes, if folks aren't using the kubeadm control plane then they wouldn't run into this issue!

/close

This is a duplicate of #2016

@chuckha chuckha closed this as completed Jan 13, 2020
@neolit123
Member

> Kubeadm has a bug with joining multiple control plane nodes simultaneously. Occasionally, the control plane will fail to join because not enough etcd members are ready.

technically the problem is in etcd.
