
Add a doc about the kubeadm design and phases implementation #156

Merged
merged 1 commit on Jun 13, 2017

Conversation

@luxas (Member) commented Feb 9, 2017

This document describes the functions of kubeadm and how the phases should be splitted up

@lukemarsden @roberthbailey @mikedanese @errordeveloper @pires @dmmcquay @jbeda

@k8s-ci-robot added the cncf-cla: yes label Feb 9, 2017
@dmmcquay left a comment

Edits to make the doc more readable.

@@ -0,0 +1,110 @@
## Phases in kubeadm

`kubeadm init` does a lot of things. In order to make kubeadm the common building block for Kubernetes we want it to be, each logical phase kubeadm does during `init` should be invokable separately. Here's an explanation what kubeadm does in the different phases:

Suggested edit:

kubeadm init does a lot. In order to make kubeadm the common building block for Kubernetes, each logical phase kubeadm does during init should be invokable separately. Here's an explanation what kubeadm does in these different phases:

Member Author

Thanks for the grammatical nits, I was in a hurry when writing this so I didn't really pay attention to my wording in this doc, will fix 👍


#### A note on constants / well-known values and paths

Somewhere we have to draw the line what should be configurable, what shouldn't and what should be hard-coded in the binary.

Suggested edit:

We have to draw the line somewhere about what should be configurable, what shouldn't, and what should be hard-coded in the binary.

Somewhere we have to draw the line what should be configurable, what shouldn't and what should be hard-coded in the binary.

We've decided to make the Kubernetes directory `/etc/kubernetes` a constant in the application, since it is clearly the given path in a lot of places,
and the most intuitive location. Having that path configurable would for sure confuse readers of a on-top-of-kubeadm-implemented deployment solution.

Suggested edit:

and the most intuitive location. Having that path configurable would confuse readers of an on-top-of-kubeadm-implemented deployment solution.


Somewhere we have to draw the line what should be configurable, what shouldn't and what should be hard-coded in the binary.

We've decided to make the Kubernetes directory `/etc/kubernetes` a constant in the application, since it is clearly the given path in a lot of places,

Suggested edit:

We've decided to make the Kubernetes directory /etc/kubernetes a constant in the application, since it is clearly the given path in a majority of cases,

Member

(I'm not even sure you need to justify the choice of /etc/kubernetes)

We've decided to make the Kubernetes directory `/etc/kubernetes` a constant in the application, since it is clearly the given path in a lot of places,
and the most intuitive location. Having that path configurable would for sure confuse readers of a on-top-of-kubeadm-implemented deployment solution.

Further, having the admin/kubelet kubeconfig path would

This doesn't make sense to me.


yeah, this is just an incomplete thought

- ??

If a given certificate and private key pair both exist, the generation will be skipped, and those files will be validated and used if they are valid for the use-case.
This means the user can for example give his own CA which then will be used signing the rest of the certs.

Suggested edit:

This means the user can, for example, give his own CA, which then will be used for signing the rest of the certs.
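
For illustration, here is a minimal sketch (an assumption about the workflow, not part of the reviewed doc; it presumes the default `/etc/kubernetes/pki` layout and plain openssl) of supplying your own CA so that kubeadm only signs the remaining certs with it:

```
# Hypothetical pre-seeding of the CA; if ca.crt/ca.key already exist and are valid,
# kubeadm skips generation and signs the other certificates with this pair.
mkdir -p /etc/kubernetes/pki
openssl genrsa -out /etc/kubernetes/pki/ca.key 2048
openssl req -x509 -new -nodes -key /etc/kubernetes/pki/ca.key \
  -subj "/CN=my-org-kubernetes-ca" -days 3650 \
  -out /etc/kubernetes/pki/ca.crt

kubeadm init
```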


### Bootstrap the control plane

First, see so there is an etcd cluster somewhere:

Suggested edit:

First, see that there is an etcd cluster somewhere:


Suggested edit (nit-picky, only because it's been called out already):

going just a bit further: "First, determine if there is an etcd cluster available:"


First, see so there is an etcd cluster somewhere:
- use an external one if external etcd params are given
- spin up one locally if external etcd should not be enabled. It should:

Suggested edit:

spin up one locally if external etcd is not be enabled. It should:


Then kubeadm waits for the control plane to come up as static pods

The master node is patched; the master label is added and the master is tainted.

Suggested edit:

The master node is patched, the master label is added, and the master is tainted.

- kube-proxy; the `kube-proxy` ServiceAccount gets the privileges in the `system:node-proxier` ClusterRole
- kube-dns; the `kube-dns` ServiceAccount gets the privileges in the `system:kube-dns` ClusterRole

In the future; the base layer of configuration for all kubelets in the cluster will be set in this phase.

Suggested edit:

In the future the base layer of configuration for all kubelets in the cluster will be set in this phase.


substitute ; for ,
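
As a side note on the ServiceAccount bullets quoted above, the grant boils down to a binding like the following (a hedged sketch; the binding name here is made up, kubeadm creates its own):

```
# Grant the kube-proxy ServiceAccount in kube-system the built-in node-proxier role.
kubectl create clusterrolebinding kubeadm:node-proxier \
  --clusterrole=system:node-proxier \
  --serviceaccount=kube-system:kube-proxy
```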

@luxas (Member Author) commented Feb 9, 2017

cc @mattymo @v1k0d3n @justinsb @chrislovecnm
You're building tools on top of kubeadm; please comment here what you think about the planned, separately-invokable phases listed here

@luxas (Member Author) commented Feb 9, 2017

cc @alexbrand for Kismatic as well :)

@luxas (Member Author) commented Feb 9, 2017

cc @andrewrynhard

### Generate KubeConfig files on the master

There should be:
- a KubeConfig file for kubeadm to use itself and the admin: `/etc/kubernetes/admin.conf`

who is the admin?

@pires (Contributor) commented Feb 10, 2017

It's a kubeconfig file for the cluster operator to copy to any machine where he wants to control the cluster from.

Member

Does kubectl read that file by default for system wide settings? I must have missed that and thought the config had to be in ~/.kube/
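
For what it's worth, kubectl does not read `/etc/kubernetes/admin.conf` by default; a common pattern (an aside, not from the reviewed doc) is to point kubectl at it explicitly or copy it into `~/.kube/config`:

```
# Use the admin kubeconfig directly...
kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes

# ...or copy it to the location kubectl reads by default.
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```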

### Generate a token.csv file for bootstrapping purposes

There should be:
- a `/etc/kubernetes/tokens.csv` file that contains one kubeadm-generated token
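
For context, the file handed to the apiserver's `--token-auth-file` flag is a plain CSV of the form `token,user,uid,"group1,group2"`; a hedged sketch of what a single kubeadm-generated entry might look like (the values are made up):

```
# Illustrative only; kubeadm generates the actual token and picks the bootstrap groups.
cat /etc/kubernetes/tokens.csv
# 01a2b3.c4d5e6f7a8b9c0d1,kubeadm-node-bootstrap,kubeadm-node-bootstrap,system:kubelet-bootstrap
```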

gah, how did we end up with a CSV?

Member Author

I know, I don't like this at all either, but this is what we've got for bootstrapping :/


i'm with @philips on this one; why not json?

Member Author

@v1k0d3n the apiserver doesn't support reading tokens from json, so that's impossible for the time being

What we all want though, and what we're gonna implement is a dynamic way of managing these tokens so we can ditch this file totally (no one likes handling files on disk)

Member Author

See https://kubernetes.io/docs/admin/kube-apiserver/

--token-auth-file string                                  If set, the file that will be used to secure the secure port of the API server via token authentication.

The new stuff is here: kubernetes/kubernetes#41281

Member Author

The usage of this file is gonna be replaced by kubernetes/kubernetes#36101 and kubernetes/kubernetes#41281

Contributor

@luxas you should add your last comment to the document.

- contain altnames for
- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
- the hostname of the node
- the IP address of the default gateway

why gateway IP? for snat tricks?

- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
- the hostname of the node
- the IP address of the default gateway
- optional extra altnames that can be specified by the user

Can we include localhost and 127.0.0.1 in this list?

Contributor

I don't think we want to have a cert signed for localhost. That seems like a security issue.

- the IP address of the default gateway
- optional extra altnames that can be specified by the user
- be in the `system:masters` organization
- a client certificate for the apiservers to connect to the kubelets securely (`apiserver-kubelet-client.crt`) using `ca.crt` as the CA with its private key (`apiserver-kubelet-client.key`). The certificate should:

one cert or one per client?

Contributor

There's only one client (the apiserver).


Somewhere we have to draw the line what should be configurable, what shouldn't and what should be hard-coded in the binary.

We've decided to make the Kubernetes directory `/etc/kubernetes` a constant in the application, since it is clearly the given path in a lot of places,
Member

(I'm not even sure you need to justify the choice of /etc/kubernetes)

- an API Server certificate (`apiserver.crt`) using `ca.crt` as the CA with its private key (`apiserver.key`). The certificate should:
- be a serving server certificate (`x509.ExtKeyUsageServerAuth`)
- contain altnames for
- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
Member

Do you mean 100.96.0.1 ? Or is it actually 10.96...?

Member Author

10.96.0.1/12 is the default service subnet for kubeadm

Contributor

I know it's the default but do we really need a /12 (10.96.0.0 - 10.111.255.255) for this?
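
To make the subnet discussion concrete (a hedged example; `--service-cidr` is assumed to be the relevant kubeadm flag), the first usable address of whatever range is chosen becomes the `kubernetes` Service clusterIP and ends up as an altname in the apiserver cert:

```
# Pick a smaller service range than the /12 default.
kubeadm init --service-cidr=10.96.0.0/16

# The default Service then sits on the first IP of that range.
kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'
# 10.96.0.1
```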

- be a serving server certificate (`x509.ExtKeyUsageServerAuth`)
- contain altnames for
- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
- the hostname of the node
Member

Presumably we aren't changing how kubeadm works here, so this is not the venue, but any notion of "hostname" is a huge red-flag for AWS compatibility

Contributor

I think this should be optional and not included as a SAN by default.

- the hostname of the node
- the IP address of the default gateway
- optional extra altnames that can be specified by the user
- be in the `system:masters` organization
Member

This precludes/complicates use with an existing CA infrastructure I believe. On the assumption though that this doc is about the phases, and this is the existing kubeadm behaviour we are documenting, I'm going to stop calling out these issues. If that's not right, and we are actually deciding what the keys should look like, let me know and I will review with that mindset.

Contributor

That is an excellent point. We can't be too prescriptive about fields of the certs, since we need to allow certs generated by organizations to work smoothly with kubeadm.

@justinsb (Member) commented

This looks cool. kops could probably use the RBAC generation phase today (although we would want to be sure that it was HA safe, and that it did not reapply if users deleted the RBAC policies). The RBAC layer feels almost like it could be a manifest - perhaps this meshes into the add-on-manager-v2 discussion (where I need to produce some form of strawman design doc based on the existing works).

@luxas (Member Author) commented Feb 10, 2017

Thanks for the comments everyone, I'll look at this tomorrow or so and update it with your suggestion and more detail 👍


### Certificates

First off, kubeadm generates certificate and private key pairs for different purposes.
Contributor

Remove the "First off" part of the sentence.

- be a serving server certificate (`x509.ExtKeyUsageServerAuth`)
- contain altnames for
- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
- the hostname of the node
Contributor

I think this should be optional and not included as a SAN by default.

- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
- the hostname of the node
- the IP address of the default gateway
- optional extra altnames that can be specified by the user
Contributor

I don't think we want to have a cert signed for localhost. That seems like a security issue.

- the hostname of the node
- the IP address of the default gateway
- optional extra altnames that can be specified by the user
- be in the `system:masters` organization
Contributor

That is an excellent point. We can't be too prescriptive about fields of the certs, since we need to allow certs generated by organizations to work smoothly with kubeadm.

- the IP address of the default gateway
- optional extra altnames that can be specified by the user
- be in the `system:masters` organization
- a client certificate for the apiservers to connect to the kubelets securely (`apiserver-kubelet-client.crt`) using `ca.crt` as the CA with its private key (`apiserver-kubelet-client.key`). The certificate should:
Contributor

There's only one client (the apiserver).

- be a client certificate (`x509.ExtKeyUsageClientAuth`)
- be in the `system:masters` organization
- a certificate for signing ServiceAccount Tokens (`sa.crt`) using `ca.crt` as the CA with its private key (`sa.key`). The certificate should:
- ??
Contributor

@mikedanese

I think it needs to have the usage for signing certs (KeyUsageCertSign) and shouldn't have any other key usages set.


There should be:
- a KubeConfig file for kubeadm to use itself and the admin: `/etc/kubernetes/admin.conf`
- inside this file, a client certificate is generated from the `ca.crt` and `ca.key`. The client cert should:
Contributor

I guess a client cert is ok here for bootstrapping, but once you have the cluster configured you should really throw it away in favor of configuring users so that things like audit logging can track who did what when.

- inside this file, a client certificate is generated from the `ca.crt` and `ca.key`. The client cert should:
- be a client certificate (`x509.ExtKeyUsageClientAuth`)
- be in the `system:masters` organization
- a KubeConfig file for kubelet to use: `/etc/kubernetes/kubelet.conf`
Contributor

If this kubelet is going to connect to the apiserver, shouldn't we use TLS bootstrapping to get it a client cert and give it a temporary token instead?

- use an external one if external etcd params are given
- spin up one locally if external etcd should not be enabled. It should:
- be spun up in a Pod
- listen on `localhost:2379`
Contributor

This will need to change once we support HA. In particular, we'll need to generate another set of certificates (possibly in a different cert universe) for the etcd instances to form a quorum.

kind: Node
spec:
taints:
- key: node-role.kubernetes.io
Member Author

@davidopp does this sound good to you?


Member

What is this taint supposed to signify? If it is for the master node(s) only, then use the same key and value (if any) as you use for the label on the master. (e.g. node-role.kubenetes.io/master with no value)

Also keep in mind that if you add a taint, then all pods that are supposed to run on the node will need a toleration for that taint.
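
To illustrate the toleration point (a hedged sketch assuming the `node-role.kubernetes.io/master:NoSchedule` taint discussed above; the pod name and image are placeholders):

```
# A pod that must still be scheduled on the tainted master needs a matching toleration.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: on-master-example
spec:
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: pause
    image: gcr.io/google_containers/pause-amd64:3.0
EOF
```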

@pires (Contributor) left a comment

So far this LGTM.

@luxas this is a good job, particularly knowing you're the most active code contributor to kubeadm. Hat off to you!

- an API Server certificate (`apiserver.crt`) using `ca.crt` as the CA with its private key (`apiserver.key`). The certificate should:
- be a serving server certificate (`x509.ExtKeyUsageServerAuth`)
- contain altnames for
- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
Contributor

I know it's the default but do we really need a /12 (10.96.0.0 - 10.111.255.255) for this?

- contain altnames for
- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
- the hostname of the node
- the IPv4 address of the default route
Contributor

What is this exactly, the apiserver pod IP?

Member Author

This is what's returned from 0.0.0.0, the IP address from the default (main) net interface

- the kubernetes' service's internal clusterIP and dns name (e.g. `10.96.0.1`, `kubernetes.default.svc.cluster.local`, `kubernetes.default.svc`, `kubernetes.default`, `kubernetes`)
- the hostname of the node
- the IPv4 address of the default route
- optional extra altnames that can be specified by the user
Contributor

I haven't seen this in the form of flags or configuration (haven't updated on the configuration API WIP PR). Are we planning ahead?

Member Author

This has been possible from the beginning by using --api-external-dns-names, but it will probably be exposed in another shape in the future (via the API Group)

- the hostname of the node
- the IPv4 address of the default route
- optional extra altnames that can be specified by the user
- ??be in the `system:masters` organization, this might not be needed??
Contributor

Is DNS integrated with RBAC in any way? I don't think so.

- ??be in the `system:masters` organization, this might not be needed??
- a client certificate for the apiservers to connect to the kubelets securely (`apiserver-kubelet-client.crt`) using `ca.crt` as the CA with its private key (`apiserver-kubelet-client.key`). The certificate should:
- be a client certificate (`x509.ExtKeyUsageClientAuth`)
- be in the `system:masters` organization
Contributor

Is DNS integrated with RBAC in any way? I don't think so.

Member Author

This isn't related to DNS, but RBAC. Being in the system:masters Organization (maps to Group in RBAC) basically makes you use sudo while talking to the cluster.
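
To make that concrete (an illustrative openssl sketch, not kubeadm's actual Go code path): the RBAC group comes from the certificate subject's Organization field, and the user name from its CN:

```
# A client cert whose subject carries O=system:masters is treated as a member of that group.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout admin.key -out admin.csr \
  -subj "/CN=kubernetes-admin/O=system:masters"
openssl x509 -req -in admin.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out admin.crt -days 365
```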

- spin up one locally if external etcd is not be enabled. It should:
- be spun up in a Pod
- listen on `localhost:2379`
- set `PodSpec.SecurityContext.SELinuxOptions.Type=spc_t`
Contributor

Can you provide a link or a little explanation why?

Member Author

done

```

If self-hosting should be enabled, do a little bit more:
- TODO
Contributor

I should help here. When does this doc need to be merged?

Member Author

When the design is fully settled and agreed on :)


### Configure the bootstrap tokens

This phase configures the BootstrapSigner to create the public `cluster-info` ConfigMap in the `kube-public` namespace
Contributor

Can you please provide a link, even if to just the PR, to what the BootstrapSigner is?
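
As a hedged aside (not part of the reviewed doc), the outcome of this phase is easy to inspect once a cluster is up:

```
# The BootstrapSigner signs the kubeconfig embedded in this ConfigMap using the bootstrap tokens.
kubectl -n kube-public get configmap cluster-info -o yaml
```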

In the future, the base layer of configuration for all kubelets in the cluster will be set in this phase.

Then the rest of the addons follow:
- apply the CNI Network Provider file if specified
Contributor

Please add: make sure the CNI plug-in supports the new RBAC mode if RBAC is enabled.

- apply the CNI Network Provider file if specified
- deploy kube-proxy as a DaemonSet
- the credentials to the master come from the ServiceAccount
- the location of the master comes from a ConfigMap
Contributor

We also need the pod network or is this gone now?

First off, kubeadm generates certificate and private key pairs for different purposes.
Certificates are stored by default in `/etc/kubernetes/pki`.

There should be:

There should also be a CA keypair and client certificate for the Authenticating Proxy described here: https://kubernetes.io/docs/admin/authentication/#authenticating-proxy. It's near zero cost in the kube-apiserver to have it, there's no exposure unless actual machine access is had, and future extension stories (https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kubernetes-sig-api-machinery/o2v-HjAHk0k/Rg4lNeEACQAJ) rely upon it.


The master node is patched, the master label is added, and the master is tainted.

The proposed master label for `v1.6` is `node-role.kubernetes.io/master=true`.
Member

Having a value of "true" is probably not useful. Why not just have the key (node-role.kubernetes.io/master) and no value?

Contributor

It makes so much sense to me that I now feel dumb because I didn't think about it first.

@luxas (Member Author) commented Feb 20, 2017

Updated this now with a lot more detail about what should be done in kubeadm.
Also includes proposal for kubeadm phase

- `--admission-control` to `NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota` or whatever the recommended set of admission controllers is at the time
- `--storage-backend` to `etcd3`. Support for `etcd2` in kubeadm is dropped.
- `--kubelet-preferred-address-types` to `InternalIP,ExternalIP,Hostname` so `kubectl logs` and other apiserver -> kubelet communication works in environments where the hostnames of the nodes aren't resolvable.
- `--requestheader-username-headers=X-Remote-User`, `--requestheader-group-headers=X-Remote-Group`, `--requestheader-extra-headers-prefix=X-Remote-Extra-`, --requestheader-allowed-names=front-proxy-client` so the front proxy communication is secure.

Missing a "`" here.

@jbeda left a comment

This looks pretty darn good to me.

One question on UX and please correct me if I'm wrong. Here are my assumptions:

  • There will be a limited number of flags for kubeadm init and kubeadm join. This is to keep this surface area simple.
  • There will be extra options exposed as flags on individual phases.
  • If users want access to those options they have 2 choices:
    • Run the phases individually
    • Create a config file (different ones for join and init) and pass that to the top level command.
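
A rough sketch of that third option (hedged; the exact fields depend on the v1alpha1 MasterConfiguration type linked later in this thread, and the values here are placeholders):

```
# Hypothetical config file passed to the top-level command instead of many flags.
cat <<'EOF' > kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.6.4
networking:
  serviceSubnet: 10.96.0.0/12
EOF

kubeadm init --config kubeadm-init.yaml
```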

- `--use-kubernetes-version=${STABLE_VERSION}`
- `--self-hosted`

What would `kubeadm init` actually do?

Not 100% necessary for v1 of this but it would be awesome if there were a kubeadm init --plan that would just output this for the options that are passed in. In this case users can understand what kubeadm is planning on doing in terms of the phases and make it clear that kubeadm is really just a bundle command.

Member

+1

- the hostname of the node
- I guess this might be a requested feature in opinionated setups, but might be a no-no in more advanced setups. Consensus here?
- the IPv4 address of the default route
- optional extra altnames that can be specified by the user
Member

should describe how. Ideally not just a massive command line arg list, but also a config file.

Member Author

Everything described here will be available to configure via the config file, which is going to expose a nice (the nicest) way of actually setting a kubeadm cluster up.
I should point that out much better...

- listen on `localhost:2379` and use `HostNetwork=true`
- have `PodSpec.SecurityContext.SELinuxOptions.Type=spc_t` set because of https://github.com/kubernetes/kubeadm/issues/107
- be at a minimum version `3.0.14`
- make a volume out to the host on a given host path
Member

We need to be certain we don't resource starve CPU on etcd + we need to make certain we open up ulimit caps ~ 65536 as a sane default.

Member Author

@timothysc Could you send a quick PR on the ulimit thing for etcd?

Member

So I dug on this and found out that we still don't support ulimit on pods, but we can override dockers startup defaults.. :-/
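
For illustration, a rough static-pod sketch matching the etcd requirements quoted above (hedged; kubeadm writes the real manifest itself, and the image tag and paths here are taken from elsewhere in this proposal):

```
cat <<'EOF' > /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true
  securityContext:
    seLinuxOptions:
      type: spc_t
  containers:
  - name: etcd
    image: gcr.io/google_containers/etcd-amd64:3.0.14-kubeadm
    command:
    - etcd
    - --listen-client-urls=http://127.0.0.1:2379
    - --advertise-client-urls=http://127.0.0.1:2379
    - --data-dir=/var/lib/etcd
    volumeMounts:
    - name: etcd-data
      mountPath: /var/lib/etcd
  volumes:
  - name: etcd-data
    hostPath:
      path: /var/lib/etcd
EOF
```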

- `--use-kubernetes-version=${STABLE_VERSION}`
- `--self-hosted`

What would `kubeadm init` actually do?
@fabriziopandini (Member) commented Apr 13, 2017

Nice to have: Additional aliases for each phase with a numbered prefix. e.g.

  • kubeadm phase 1-certs ... (where 1-certs is an alias for certs/certificates)
  • kubeadm phase 2-kubeconfig ... (where 2-kubeconfig is an alias for kubeconfig)

Those aliases could help users in understanding/using the right sequence of phases, and also in having a better experience with autocompletion.

- `--image-repository=gcr.io/google_containers`
- Outputs:
- Applies the built-in `kube-dns` manifest to the API Server

@fabriziopandini (Member) commented Apr 13, 2017

I suggest providing a more consistent approach for outputs across the entire kubeadm phase UX, harmonising the current approaches:

  • kubeadm phase certs selfsign uses --cert-dir=
  • kubeadm phase kubeconfig clientcert writes to stdout
  • kubeadm phase controlplane * writes to /etc/kubernetes/manifests/*
  • kubeadm phase addons cluster-info applies directly to the apiserver

A first draft for a possible alternative solution is:

  • by default all kubeadm phase commands should provide the same experience as kubeadm init:
    • kubeadm phase certs and kubeadm phase token write to --cert-dir, which defaults to /etc/kubernetes/pki
    • kubeadm phase kubeconfig clientcert writes to /etc/kubernetes/; a rule for deriving the filename from --client-name= is still to be defined (this is easier to implement if we specialize the kubeconfig phase)
    • kubeadm phase controlplane * writes to /etc/kubernetes/manifests/*.yaml
    • kubeadm phase addons * apply directly to the apiserver
  • if invoked with a --stdout flag, all kubeadm phase commands should write to stdout; the output for most of the commands is a yaml string or a sequence of kubectl commands; a format for kubeadm phase certs is still to be defined (alternatively, support --stdout only where it makes sense)

- Outputs:
- Creates the `cluster-info` ConfigMap in the `kube-public` namespace
- Exposes it publicly with RBAC bindings
- `kubeadm phase addons self-hosted`
Member

It might be that considering self-hosted as an addon is confusing for users.
I suggest treating the self-hosted phase separately from addons.

- `sa.{pub,key}`
- `front-proxy-ca.{crt,key}`
- `front-proxy-client.{crt,key}`
- `kubeadm phase kubeconfig clientcert`
@fabriziopandini (Member) commented Apr 13, 2017

Not sure if offering a generic kubeconfig clientcert/token phase offers the simplest choice for users, because it requires calling this phase 4 times (kubelet, scheduler, controller-manager, admin), each one with very specific parameters.

A draft alternative proposal might be specializing the kubeconfig phase with dedicated sub-commands:

  • kubeconfig kubelet (Inputs: --api-server, --cert-dir)
  • kubeconfig scheduler (Inputs: --api-server, --cert-dir)
  • kubeconfig controller-manager (Inputs: --api-server, --cert-dir)
  • kubeconfig admin (Inputs: --api-server, --cert-dir)
    All commands should produce a clientCert (default mode) except if the --token flag is provided

Additionally

  • it could be useful to support a kubeconfig all (or simply kubeconfig without subcommands) allowing all 4 default config files to be generated in a single action
  • it could be useful to support a kubeconfig user (Inputs: --api-server, --cert-dir, --name, --organization), allowing the creation of kubeconfig files for other admins/other cluster users
  • for kubeconfig admin and kubeconfig user it could also be useful to generate an alternative output format, providing kubectl config set-cluster and kubectl config set-credentials commands linking to separate certificate files (e.g. useful for an admin managing more than one cluster, with a kubeconfig containing many clusters/credentials)
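
For illustration of that alternative output format (a hedged sketch; names, address, and file paths are placeholders), the emitted commands would look roughly like:

```
# Stitch a kubeconfig together from separate certificate files instead of embedding them.
kubectl config set-cluster my-cluster \
  --server=https://10.0.0.10:6443 --certificate-authority=ca.crt
kubectl config set-credentials admin \
  --client-certificate=admin.crt --client-key=admin.key
kubectl config set-context admin@my-cluster --cluster=my-cluster --user=admin
kubectl config use-context admin@my-cluster
```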

- `--cert-dir=/etc/kubernetes/pki`
- Output:
- The kubeconfig is `stdout`
- `kubeadm phase controlplane etcd`
Member

it could be useful to support a controlplane all (or simply controlplane without subcommands) allowing to generate all control plane components in a single action

- The kubeconfig is `stdout`
- `kubeadm phase controlplane etcd`
- Inputs:
- `--image=gcr.io/google_containers/etcd-${GOARCH}:3.0.14-kubeadm`
Member

I suggest keeping an approach more consistent with kubeadm init, providing the flags --image-repository=gcr.io/google_containers and --use-kubernetes-version=${STABLE_VERSION} instead of --image=...

- `kubeadm phase controlplane etcd`
- Inputs:
- `--image=gcr.io/google_containers/etcd-${GOARCH}:3.0.14-kubeadm`
- `--data-dir=/var/lib/etcd`
@fabriziopandini (Member) commented Apr 14, 2017

By extending the etcd flags, it is possible to provide, with minimal effort, early support for creating HA clusters, e.g.:

  1. With kubeadm init, create the first master with
    • --apiserver-advertise-address and --cert-altnames properly set
    • an etcd.yaml that contains instructions for spinning up an etcd instance that should be a member of an etcd cluster
  2. Manually copy the /etc/kubernetes folder to the second and third masters (probably also necessary to replace kubelet.conf with a new one containing new credentials, created with kubeconfig user)
  3. Manually set up a LB/DNS in front of all api-servers (matching --apiserver-advertise-address and --cert-altnames)
  4. Join nodes

In order to support this, a possible solution is:

  • add a --name flag to the etcd command, and assign to it a unique name (e.g. the nodename value retrieved from the Downward API)
  • add an --etcd-cluster-discovery flag supporting a dynamic approach for setting up an etcd cluster (no need to know the name/ip of nodes in advance, but access to the internet/to another etcd instance is required)
  • add an --etcd-cluster-members flag supporting a static approach for setting up an etcd cluster (requires knowing the name/ip of nodes in advance, no need for access to the internet/to another etcd instance)
  • add the other flags to the etcd command required in a cluster configuration (--initial-advertise-peer-urls and --listen-peer-urls)

--image=${IMAGE_REPOSITORY}/kube-scheduler-${ARCH}:${USE_KUBERNETES_VERSION}

kubectl taint node $(hostname -i) node-role.kubernetes.io/master:NoSchedule
kubectl label node $(hostname -i) node-role.kubernetes.io/master=""
Member

Using kubectl in the middle of a kubeadm sequence of phases sounds a little bit strange. With low priority (but also low effort), it might make sense to implement a phase also for this step.

This phase creates the `cluster-info` ConfigMap in the `kube-public` namespace as defined in https://github.com/kubernetes/community/blob/master/contributors/design-proposals/bootstrap-discovery.md
- The `ca.crt` and the address/port of the apiserver is added to the `cluster-info` ConfigMap in the `kubeconfig` key
- Exposes the `cluster-info` ConfigMap to unauthenticated users (i.e. users in RBAC group `system:unauthenticated`)

Member

Looking at kubeadm init code, it seems there are also two additional steps here: CreateServiceAccounts and CreateRBACRules.
In order to make kubeadm init a full "bundle command", those steps should also be implemented as phases.

@timothysc (Member) commented

I'm of the opinion we should merge this and move along with updates as 1.8 progresses.

`kubeadm init` and `kubeadm join` together provide a nice user experience for creating a best-practice but bare Kubernetes cluster from scratch.
However, it might not be obvious _how_ kubeadm does that.

This document strives to explain what types of phases of work happen under the hood.

suggest: "This document strives to explain the phases of work that happen under the hood."

However, it might not be obvious _how_ kubeadm does that.

This document strives to explain what types of phases of work happen under the hood.
Also included is ComponentConfiguration API types for talking to kubeadm programmatically.

Are you relying on the componentconfig API group? Be careful with this, that group will be deleted soon (as soon as every component's config has been moved out of it).

Member Author

No, it's what I think the new ComponentConfiguration stuff looks like: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/apis/kubeadm/v1alpha1/types.go

in its own API group and everything. Not coupled to anything else, just reusing the API machinery


Ah, ok nice!

- The `kubeadm join` request to add a node should be automatically approved
- Extendable
- It should for example _not_ favor any network provider, instead configuring a network is out-of-scope
- Should provide a config file that can be used for customizing various parameters

Is this versioned?

Member Author

yes, see the code above

@mtaufen commented Jun 13, 2017

This is really cool!

@luxas (Member Author) commented Jun 13, 2017

Thank you @mtaufen!

@timothysc Totally agree -- will merge now and follow up as we go.
