Update KEP kubeadm join --master #2331

fabriziopandini · 2018-06-29T14:47:49Z

Update to the kubeadm join --master KEP

/area kubeadm
/sig sig-cluster-lifecycle

@kubernetes/sig-cluster-lifecycle-pr-reviews

/CC @luxas @timothysc

chuckha · 2018-06-29T14:58:22Z

How would folks feel about kubeadm join --control-plane ?

Thumbs up/Thumbs down would work as a response

neolit123 · 2018-06-29T15:24:29Z

How would folks feel about kubeadm join --control-plane ?

i think --master is fine, unless --control-plane is considered more descriptive.

luxas

Thanks a lot @fabriziopandini!
The initial reaction is that this looks good, but I have comments/questions on how to manage the certs and identities in this scenario.

luxas · 2018-06-29T22:17:23Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+  - "@fabriziopandini"
+owning-sig: sig-cluster-lifecycle
+reviewers:
+  - "@cha”


on github @chuckha

luxas · 2018-06-29T22:17:46Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+owning-sig: sig-cluster-lifecycle
+reviewers:
+  - "@cha”
+  - "@jdtibier"


luxas · 2018-06-29T22:18:53Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+using kubeadm in combination with some scripts and/or automation tools (e.g.
+[this](https://kubernetes.io/docs/setup/independent/high-availability/)), this KEP was
+designed with the objective to introduce an upstream simple and reliable solution for
+achieving the same goal.


... with important non-goals. Say clearly (also in the summary) that this doesn't solve every case or even the full end-to-end flow automatically

luxas · 2018-06-29T22:19:23Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+  user stories for creating an highly available Kubernetes cluster, but instead
+  focuses on:
+
+  - Defining a generic and extensible flow for bootstrapping an HA cluster, the


a cluster consisting of multiple masters instead of HA?

luxas · 2018-06-29T22:20:04Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+
+    Higher-level tools could create nodes in parallel (both masters and workers)
+    for reducing the overall cluster startup time.
+    `kubeadm join --master` should support natively this practice without requiring


and how does this work?

You can create in parallel bootstrap master, secondary masters and workers because:

all the joining nodes (secondary masters and workers) use the same discovery mechanism, that "automatically" waits for the bootstrap master to complete its setup

the joining nodes (secondary masters and workers) can join in any order

luxas · 2018-06-29T22:37:26Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+
+#### Static workflow (advertise-address != `controlplaneAddress`)
+
+In case of a static bootstrap workflow the final layout of the controlplane - the number, the


what is technically the difference between these?

This was discussed in some kubeadm office hours meetings, and I wrote also a blog post with all the details https://blog.heptio.com/kubernetes-ha-under-x-ray-5d05f552c9f

luxas · 2018-06-29T22:42:13Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+(`kubeadm join --master` will fail immediately), but nothing in this proposal should
+prevent to address this in subsequent phases.
+
+#### Strategies for distributing cluster certificates


Please dive into how the CSR API could be utilized in an external CA-world.
It'd be cool to just use the CSR API, but we're stuck needing the front-proxy clientcert for the API server and the SA key that is not bound to the CA.

I would prefer to keep this out of scope of this scope, because it does not apply specifically to the multi-master scenario (e.g. it is relevant also for the existing kubeadm join workflow).

Never the less I totally agree the we should document how the CSR API could be used in an external CA-world, but I don't have real expertise in this. Eventually, let's raise the topic at the sig-meeting

luxas · 2018-06-29T22:42:40Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+Nothing in this proposal prevents implementation of `kubeadm upgrade` for HA cluster.
+
+Further detail will be provided in a subsequent release of this KEP when all the detail
+of the `v1Beta1` release of kubeadm api will be available (including a proper modeling


nit: v1beta1

luxas · 2018-06-29T22:43:21Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+
+## Drawbacks
+
+The kubeadm join --master workflow requires that some condition are satisfied at `kubeadm-init` time,


kubeadm init

luxas · 2018-06-29T22:46:40Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+proposal (`kubeadm join --master` will fail immediately), but nothing in this proposal
+should prevent to address this in subsequent phases.
+
+#### `kubeadm upgrade` for HA clusters


can you dive into exactly how and in which order the upgrade should be performed.
i.e. upgrade etcd fully first (in some way out of scope for this proposal probably), then remove one master from the loadbalancer, upgrade it, (add it back to the LB, or not). Upgrades without downtime supported officially or not? etc. etc.

The only prior art I'm aware of is https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-ha/ , but I'm not sure there is already a consensus about how to do this, and so I prefer to get a first release of the KEP approved and the iterate on the upgrade part.

luxas · 2018-06-29T22:48:12Z

/assign
/assign @timothysc

fabriziopandini · 2018-07-02T11:06:54Z

@luxas I provided answer to your questions and addressed all comments. Let me know if there is something that requires further deep dives.
PS. I'm planning to discuss briefly this KEP during kubeadm office hours on wendesday, as well

timothysc

/approve

I'd like @chuckha and @detiber to lgtm .

k8s-ci-robot · 2018-07-03T14:25:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/OWNERS~~ [timothysc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fabriziopandini · 2018-07-12T05:58:00Z

@chuckha @detiber kindly ping

chuckha

I'm trying to understand exactly how this is different than kubeadm init and I'm having a bit of trouble.

Under the section that outlines kubeadm join --master [New step], I don't see anything new.

The static workflow case is taken care of today with a carefully crafted kubeadm init --config=config.yaml. This is documented in official docs. Its almost exactly this document:

A user must copy certificates across nodes
A user must run some command on the new node

The dynamic case is more interesting, but I think this document handwaves over the manual steps the user will have to take regardless. For instance, the api-serving-cert will have to be regenerated on the bootstrapping master or copied over from the new master to the bootstrapping master.

What are we getting from a kubeadm join --master that kubeadm init --config=config.yaml is not getting us?

chuckha · 2018-07-12T14:14:20Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+
+- Provide support both for dynamic and static bootstrap flow
+
+  At the time a user is running `kubeadm init`, he might not know what


please use gender neutral pronouns throughout. This should be

At the time a user is running kubeadm init, they might ...

chuckha · 2018-07-12T14:16:24Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+  in advance the target layout of the controlplane instances (the number, the name and the IP
+  of master nodes).
+
+- Support different etcd deployment scenarios, and more specifically run master nodes components


we're calling these stacked control plane nodes in the upstream documentation if you want to use that language here for consistency.

chuckha · 2018-07-12T14:19:44Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+- This proposal doesn't include a solution for etcd cluster management (but nothing in this proposal should
+  prevent to address this in future).
+
+  > At the time of writing, the CoreOS recommended approach for etcd is to run


This section adds confusion -- where etcd is run entirely depends on what the goals of the cluster are. Kubeadm provides instructions for both, single etcd (traditional kubeadm workflow), external etcd (bootstrapped with kubeadm) and stacked control planes.

I think you can delete this paragraph if you want.

chuckha · 2018-07-12T16:44:55Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+  neither in the initial proposal nor in the foreseeable future (but nothing in this proposal should
+  explicitly prevent to reconsider this in future as well).
+
+- This proposal doesn't provide an automated solution for transferring the CA key and other required


I cannot wait until we solve this

chuckha · 2018-07-12T16:53:04Z

keps/sig-cluster-lifecycle/0015-kubeadm-join-master.md

+
+- if the user is not planning to distribute the apiserver certificate among masters, kubeadm
+  will generate a new apiserver serving certificate with the required SANS
+- if the user is planning to distribute the apiserver certificate among masters, he/she/the


can you delete the "he/she" part? This reads fine with "the operator".

fabriziopandini · 2018-07-13T11:30:02Z

@chuckha thanks for your comments!
Feedbacks are addressed;, with regards to you general question, yes, you are right, the proposed solution re-uses a lot of what already exists in kubeadm init and kubeadm join, but this is intentional.

What we are getting on top is the fact that we are going to simplify what the user has to do for getting a multi-master cluster up and running, and at the same time, we are reducing the probability to make errors along the way. This is basically achieved by:

delegating to kubeadm the task to ensure that exactly the same kubeadm-config_configMap is used on all masters
checking that the cluster/the kubeadm-config is properly configured for HA (kind of pre-flight checks on the "carefully creafted config.yaml" )
blocking users trying to create multi masters with configurations we don't want to support as a sig (e.g. HA with self-hosted control plane)

On top of that :

There are also things that kubeadm join --master doesn't do with respect to kubeadm init (it doesn't create a new token, it doesn't re-write / re-create all the RBAC rules, config maps, addons)
We can provide a better/cleaner solution for steps that should be done in a slightly different way on a secondary master with respect to the bootstrap master (e.g. updating the kubeadm-config map adding info about the new master instead of creating a new configMap from scratch *this is related to API redesing).

Does this sound good to you?

PS. the above points are documented in the alternatives paragraphs; let me know if this should be further clarified

neolit123 · 2018-07-13T13:02:39Z

BTW, during one of the meeting last week we've agreed that --master should to be used, and --control-plane is fine instead.
should that be reflected in the KEP?

chuckha · 2018-07-13T15:29:11Z

/lgtm

Thanks for the detailed response @fabianofranz. I think the alternatives section is good. I think you have to intimately know the code to get kubeadm init --config config.yaml to function correctly and kubeadm join --master will be a great addition.

pbarker · 2018-07-13T21:10:45Z

@fabianofranz I believe this KEP should be number 14 and the NEXT_KEP_NUMBER should be 15 looking at https://github.com/kubernetes/community/blob/master/keps/0000-kep-template.md#title

k8s-ci-robot requested review from luxas and timothysc June 29, 2018 14:47

luxas reviewed Jun 29, 2018

View reviewed changes

k8s-ci-robot assigned luxas and timothysc Jun 29, 2018

fabriziopandini force-pushed the kubeadm-join-master2 branch from efc199f to fd91c5b Compare July 2, 2018 11:01

fabriziopandini force-pushed the kubeadm-join-master2 branch from fd91c5b to 1c55635 Compare July 2, 2018 11:10

timothysc reviewed Jul 3, 2018

View reviewed changes

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2018

chuckha reviewed Jul 12, 2018

View reviewed changes

kubeadm-join-master2

e7520a4

fabriziopandini force-pushed the kubeadm-join-master2 branch from 1c55635 to e7520a4 Compare July 13, 2018 11:27

k8s-ci-robot assigned chuckha Jul 13, 2018

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 13, 2018

k8s-ci-robot merged commit ae474ac into kubernetes:master Jul 13, 2018

pbarker mentioned this pull request Jul 16, 2018

KEP: Dynamic Audit Configuration #2188

Merged

fabriziopandini deleted the kubeadm-join-master2 branch August 1, 2018 07:48


		#### Static workflow (advertise-address != `controlplaneAddress`)

		In case of a static bootstrap workflow the final layout of the controlplane - the number, the


		## Drawbacks

		The kubeadm join --master workflow requires that some condition are satisfied at `kubeadm-init` time,


		- Provide support both for dynamic and static bootstrap flow

		At the time a user is running `kubeadm init`, he might not know what

Update KEP kubeadm join --master #2331

Update KEP kubeadm join --master #2331

Conversation

fabriziopandini commented Jun 29, 2018

chuckha commented Jun 29, 2018

neolit123 commented Jun 29, 2018

luxas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luxas commented Jun 29, 2018

fabriziopandini commented Jul 2, 2018 • edited Loading

timothysc left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Jul 3, 2018

fabriziopandini commented Jul 12, 2018

chuckha left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fabriziopandini commented Jul 13, 2018

neolit123 commented Jul 13, 2018

chuckha commented Jul 13, 2018

pbarker commented Jul 13, 2018

fabriziopandini commented Jul 2, 2018 •

edited

Loading

chuckha left a comment •

edited

Loading