Improve kubeadm join experience #2656

Closed
bjornbouetsmith opened this issue Feb 14, 2022 · 8 comments
Labels
area/HA, kind/support

Comments

@bjornbouetsmith

Choose one: FEATURE REQUEST/BUG

Versions

kubeadm version (use kubeadm version):v1.23.3

Environment:

  • Kubernetes version (use kubectl version):v1.23.3
  • Cloud provider or hardware configuration: irrelevant
  • OS:Rocky Linux 8.5
  • Kernel: Linux kube150.root.dom 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Wed Jan 19 17:53:40 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Container runtime (CRI):containerd
  • Container networking plugin (CNI):Calico
  • Others:

What happened?

When executing the cluster join command for a control-plane node, it fails because not all of the required certificates are available on the node that wants to join the cluster,
i.e. when executing a command similar to

kubeadm join k8s.root.dom:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:a32be5a71a6902bde72623c7ef99f8e5fe84a9f66c4de7e99535f4b54166259a \
        --control-plane

Output from the command is:

error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.

[failure loading key for service account: couldn't load the private key file /etc/kubernetes/pki/sa.key: open /etc/kubernetes/pki/sa.key: no such file or directory, failure loading certificate for front-proxy CA: couldn't load the certificate file /etc/kubernetes/pki/front-proxy-ca.crt: open /etc/kubernetes/pki/front-proxy-ca.crt: no such file or directory, failure loading certificate for etcd CA: couldn't load the certificate file /etc/kubernetes/pki/etcd/ca.crt: open /etc/kubernetes/pki/etcd/ca.crt: no such file or directory]

Fixing it requires that the user manually copies all the certificates to the local node first e.g. by executing

scp -r root@kube1.root.dom:/etc/kubernetes/pki/ /etc/kubernetes

Rerunning the join command then gets further in the process, since it now actually tries to join the cluster.

What you expected to happen?

I would expect the join command to automatically pull the certificates from whatever secrets they probably live in.
If that is not possible, it would be nice if the cluster init command made it obvious that any node joining as a control plane needs a copy of the certificates generated on the master node.

How to reproduce it (as minimally and precisely as possible)?

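On the "master" node, run

kubeadm init

On another node, run whatever the master node printed as its cluster join command, as a control-plane node:

kubeadm join k8s.root.dom:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:a32be5a71a6902bde72623c7ef99f8e5fe84a9f66c4de7e99535f4b54166259a \
        --control-plane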

Anything else we need to know?

  • I am using custom certificates located in /etc/kubernetes/pki

As a side note, I can also tell that even though the command proceeds, it then fails because the certificates that I copied contain the CN for the master node, and that obviously does not work on any other node.

So my guess is that whoever wants to do this workaround needs to copy only the "right" certificates, and not ALL of them as I did.
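
For reference, the "manual certs" section of the kubeadm HA docs lists the files that are shared between control-plane nodes: the cluster CA, the front-proxy CA and the etcd CA (each with its key), plus the service account key pair. Below is a minimal sketch of checking for and copying only those files. The helper names (`shared_cert_files`, `missing_shared_certs`, `copy_shared_certs`) are hypothetical, the stacked-etcd layout under /etc/kubernetes/pki is assumed, and with an external CA some of the key files may not exist on the source node at all.

```shell
#!/usr/bin/env bash
# Files shared between control-plane nodes, relative to the pki directory
# (taken from the "manual certs" list in the kubeadm HA docs; stacked etcd assumed).
shared_cert_files() {
  printf '%s\n' \
    ca.crt ca.key \
    sa.pub sa.key \
    front-proxy-ca.crt front-proxy-ca.key \
    etcd/ca.crt etcd/ca.key
}

# Hypothetical helper: print the shared files that are missing from a pki
# directory (defaults to /etc/kubernetes/pki).
missing_shared_certs() {
  local pki=${1:-/etc/kubernetes/pki} f
  for f in $(shared_cert_files); do
    [ -e "$pki/$f" ] || printf '%s\n' "$pki/$f"
  done
}

# Hypothetical helper: copy only the shared files from an existing
# control-plane host, e.g. copy_shared_certs root@kube1.root.dom
copy_shared_certs() {
  local src_host=$1 pki=/etc/kubernetes/pki f
  mkdir -p "$pki/etcd"
  for f in $(shared_cert_files); do
    scp "$src_host:$pki/$f" "$pki/$f"
  done
}
```

Running `missing_shared_certs` on the joining node before `kubeadm join --control-plane` gives a rough preview of what the preflight check would complain about. The per-node certificates (API server, etcd server and peer certificates, and so on) are deliberately left out, since those carry node-specific names.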

@neolit123
Member

/kind support

have you seen this in the HA docs?
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/#steps-for-the-first-control-plane-node
there is the --upload-certs flag for kubeadm init and a phase to execute the upload on demand.

@k8s-ci-robot added the kind/support label Feb 14, 2022
@bjornbouetsmith
Author

have you seen this in the HA docs? https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/#steps-for-the-first-control-plane-node there is the --upload-certs flag for kubeadm init and a phase to execute the upload on demand.

I had not, to be honest - I think that would be a good idea to have as output of the cluster init command - if it's possible to do after the cluster has been initialized, that is.

I will try to recreate my test cluster and include the --upload-certs and see if that fixes my issue.

Thanks

@neolit123
Member

neolit123 commented Feb 14, 2022

in the other ticket where you commented you've mentioned external CA. i don't recall how "upload-certs" interacts with that, but i'm assuming it will not try to upload the missing CA.key in that case. so please give it a try and report if it works.

obviously with this functionality we only "copy" the "certs" that are sharable between CP nodes (listed here):
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/#manual-certs

and if you are using external CA you have to sign the rest of the certs for secondary CP nodes.
so you might as well sign the new CP certs and distribute the shared ones manually.

I think that would be a good idea to have as output of the cluster init command - if it's possible to do after the cluster has been initialized, that is.

we assume that users read the HA docs, that include a number of caveats.
those docs also mention how to upload the certs on demand.
init --upload-certs does output the full join command with --certificate-key

@bjornbouetsmith
Author

bjornbouetsmith commented Feb 14, 2022

I will try to recreate my test cluster and include the --upload-certs and see if that fixes my issue.

That worked - thanks - and sorry for creating an issue for this.

I still think the output of cluster init could use a notice about this.

Also - I tried doing

sudo kubeadm init phase upload-certs --upload-certs

Which gave me this output:

[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
136e810b4b806799cad35b08caeed14302e41a88a41cbd210c25798ef6b4fee6

I then deleted the certs on my "other" node and reran the control-plane join command - but since the original command did not contain the --certificate-key, it obviously did not work.

So another solution could be that 'kubeadm init phase upload-certs --upload-certs' actually outputs the full cluster join command again - including the --certificate-key

So dum-dums like myself would not have to read documentation :-)

P.S. (if you do it on an existing cluster that is)
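
In the upload-certs output quoted above, the certificate key appears on a line of its own. Here is a small sketch for pulling it out of that output, where `extract_cert_key` is a hypothetical helper name and the key is assumed to be a 64-character lowercase hex string, as in the sample:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read upload-certs output on stdin and print the
# certificate key. Assumption: the key is a line consisting only of
# 64 lowercase hex characters, as in the sample output in this thread.
extract_cert_key() {
  grep -E -x '[0-9a-f]{64}' | tail -n 1
}

# Usage (needs root on an existing control-plane node):
#   certkey=$(sudo kubeadm init phase upload-certs --upload-certs | extract_cert_key)
```

Matching on the shape of the key rather than on its position avoids depending on how many "[upload-certs]" lines precede it.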

@bjornbouetsmith
Author

we assume that users read the HA docs, that include a number of caveats. those docs also mention how to upload the certs on demand. init --upload-certs does output the full join command with --certificate-key

Assuming that people read documentation is probably okay, but making it easy is better - and it should be possible to make the output include just a little hint that if you did not use --upload-certs, then you have to run it manually - or copy the correct certificate files (with a link to the docs).

@neolit123
Member

So another solution could be that the 'kubeadm init phase upload-certs --upload-certs' actually output the full cluster join command again - including the --certificate-key

the upload-certs phase does not know about the full join command and especially about the bootstrap token.
but kubeadm init ... --upload-certs does print the full join command with the cert key.

printing the join command for a new token is possible:

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-token/#cmd-token-create

the kubeadm token create ... has a --print-join-command flag where you can also include a custom --certificate-key (i.e. output from the upload-certs phase).

@bjornbouetsmith
Author

bjornbouetsmith commented Feb 14, 2022

So another solution could be that the 'kubeadm init phase upload-certs --upload-certs' actually output the full cluster join command again - including the --certificate-key

the upload-certs phase does not know about the full join command and especially about the bootstrap token. but kubeadm init ... --upload-certs does print the full join command with the cert key.

printing the join command for a new token is possible:

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-token/#cmd-token-create

the kubeadm token create ... has a --print-join-command flag where you can also include a custom --certificate-key (i.e. output from the upload-certs phase).

okay - so the correct workflow if you want to join another node after the initial join token has expired is to do:

# generate a new bootstrap token locally (it is not registered with the cluster yet)
token=$(kubeadm token generate)
# re-upload the control-plane certs; the certificate key is the last line of the output
certkey=$(sudo kubeadm init phase upload-certs --upload-certs | tail -n 1)
# register the token and print a join command that includes the certificate key
sudo kubeadm token create "$token" --print-join-command --certificate-key "$certkey"

That seems like a lot of work - I will definitely try to remember this flow - and it would be nice if you could invoke one command with kubeadm - that did all of the above and just output a new "join command"

@neolit123
Member

neolit123 commented Feb 14, 2022

That seems like a lot of work - I will definitely try to remember this flow - and it would be nice if you could invoke one command with kubeadm - that did all of the above and just output a new "join command"

...or just use the original join command that init gave you and pass --certificate-key...
separate commands have separate purposes. "upload-certs" is part of the "init" command phases (and is optional). while "token *" is about managing tokens. it's not a great UX, but i don't think we will add a combined command for these (it has been discussed in the past)
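
The first suggestion here, reusing the original join command and passing --certificate-key, can be sketched roughly as follows. `add_certificate_key` is a hypothetical helper name, and the example values are the placeholder token, hash and key that appear earlier in this thread. This only helps while the original bootstrap token is still valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the certificate key to an existing
# control-plane join command.
#   $1 = join command as printed by kubeadm init (already ending in --control-plane)
#   $2 = certificate key printed by the upload-certs phase
add_certificate_key() {
  printf '%s --certificate-key %s\n' "$1" "$2"
}

# Example with the placeholder values quoted earlier in this thread;
# it only prints the resulting command, it does not run it.
add_certificate_key \
  "kubeadm join k8s.root.dom:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:a32be5a71a6902bde72623c7ef99f8e5fe84a9f66c4de7e99535f4b54166259a --control-plane" \
  "136e810b4b806799cad35b08caeed14302e41a88a41cbd210c25798ef6b4fee6"
```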

thanks for testing.
