Control plane should scale up serially, not in parallel #2016

Closed
dlipovetsky opened this issue Jan 7, 2020 · 5 comments

@dlipovetsky
Contributor

dlipovetsky commented Jan 7, 2020

We recently implemented control plane scale-up. If the desired number of control plane machines exceeds the actual number, and at least one control plane machine exists, the controller creates the missing machines in parallel. (Once created, each machine runs kubeadm join --control-plane.)

I think we should scale up control planes serially. Before creating an additional control plane machine, we should verify that every etcd member has started. We could also verify that the etcd cluster has quorum. If it does not have quorum, creating a new machine might be a waste of time and resources. On the other hand, even if it does have quorum, it might lose it after we create the machine.
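The check described above could be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the EtcdMember type and the canCreateNext function are hypothetical names, and the sketch assumes one etcd member per control plane machine.

```go
package main

import "fmt"

// EtcdMember is a hypothetical, simplified view of one etcd member.
// It is not a type from Cluster API or the etcd client.
type EtcdMember struct {
	Name    string
	Started bool // true once the member has joined and is serving
}

// canCreateNext reports whether one more control plane machine may be
// created, and returns a short reason for the decision.
// Assumption: each control plane machine hosts exactly one etcd member.
func canCreateNext(desired int, members []EtcdMember) (bool, string) {
	if len(members) == 0 {
		return false, "no control plane machine exists yet; this is initialization, not scale-up"
	}
	if len(members) >= desired {
		return false, "desired number of control plane machines reached"
	}
	for _, m := range members {
		if !m.Started {
			return false, "waiting for etcd member " + m.Name + " to start"
		}
	}
	return true, "all etcd members started; create exactly one machine"
}

func main() {
	members := []EtcdMember{
		{Name: "cp-0", Started: true},
		{Name: "cp-1", Started: false},
	}
	ok, reason := canCreateNext(3, members)
	fmt.Println(ok, reason)
}
```

Requiring every member to have started is stricter than requiring quorum, so a separate quorum check would only matter if the all-started condition were relaxed.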

Today, etcd still recommends that the cluster be scaled up or down one member at a time. Moreover, there are known issues with running kubeadm join --control-plane in parallel.
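One way to see why adding one member at a time is the safer default is the quorum arithmetic. The sketch below is illustrative only and is not taken from the issue: the function names are hypothetical, and it assumes that a newly added member counts toward cluster size immediately but cannot vote until it has started.

```go
package main

import "fmt"

// quorum returns the majority needed in a cluster of the given size.
func quorum(members int) int {
	return members/2 + 1
}

// hasQuorumAfterAdd reports whether a cluster with `started` running
// members still has a majority after `added` members are registered
// but have not started yet (assumption: unstarted members cannot vote).
func hasQuorumAfterAdd(started, added int) bool {
	return started >= quorum(started+added)
}

func main() {
	cases := []struct{ started, added int }{
		{1, 1}, // single-member cluster, one join in flight
		{3, 1}, // three members, one join in flight
		{3, 3}, // three members, three joins in flight
	}
	for _, c := range cases {
		fmt.Printf("started=%d added=%d quorum=%d ok=%v\n",
			c.started, c.added, quorum(c.started+c.added),
			hasQuorumAfterAdd(c.started, c.added))
	}
}
```

Under these assumptions, a one-member cluster has no majority from the moment a second member is registered until that member starts, which is the window in which a second, concurrent join can fail.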

In the future, we will likely be able to scale up in parallel by using etcd non-voting members (learners). Kubeadm is already exploring this idea.

/cc @detiber @randomvariable @chuckha

@dlipovetsky
Contributor Author

#1902 is required to scale the control plane serially.

@dlipovetsky
Contributor Author

I think we meant to spell this out in the proposal; I will file a PR to update it.

@ncdc ncdc added area/control-plane Issues or PRs related to control-plane lifecycle management priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jan 8, 2020
@ncdc ncdc added this to the v0.3.0 milestone Jan 8, 2020
@chuckha
Contributor

chuckha commented Jan 13, 2020

Some interesting information regarding this issue: kubernetes/kubeadm#2001 (comment)

It's tracked upstream here: kubernetes/kubeadm#1793

Until that is fixed, we should indeed join serially, or block on the kubeadm join call and run only at concurrency level 1.
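As a rough illustration of "concurrency level 1", the sketch below limits in-flight joins with a buffered channel used as a semaphore. The joinAll function and the join callback are hypothetical; the callback stands in for whatever blocks until kubeadm join has finished on a machine.

```go
package main

import (
	"fmt"
	"sync"
)

// joinAll calls join once per machine while allowing at most `limit`
// calls in flight. With limit == 1 the joins run strictly one at a time.
// The join callback is a placeholder for a blocking join operation.
func joinAll(machines []string, limit int, join func(name string) error) []error {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	sem := make(chan struct{}, limit)
	errs := make([]error, len(machines))
	var wg sync.WaitGroup
	for i, name := range machines {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when the join returns
			errs[i] = join(name)
		}(i, name)
	}
	wg.Wait()
	return errs
}

func main() {
	machines := []string{"cp-1", "cp-2", "cp-3"}
	errs := joinAll(machines, 1, func(name string) error {
		fmt.Println("joining", name)
		return nil
	})
	fmt.Println("errors:", errs)
}
```

A plain loop that calls join for each machine in turn has the same effect at limit 1. The semaphore form only makes the concurrency level an explicit parameter that could be raised later.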

@ncdc
Contributor

ncdc commented Mar 2, 2020

Fixed by #2335
/close

@k8s-ci-robot
Contributor

@ncdc: Closing this issue.

In response to this:

Fixed by #2335
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

4 participants