
failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory #8242

Closed
khmarochos opened this issue Nov 28, 2021 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@khmarochos

khmarochos commented Nov 28, 2021

I'm trying to deploy a K8S cluster with my own CA.

The environment consists of 9 virtual machines in my own virtualization cluster (master1, master2, etcd1, etcd2, etcd3, worker1, worker2, worker3, worker4).

This is not the first K8S cluster I've deployed with Kubespray, but I had never tried to use my own CA before. I'm following the hints I found in #5687.

I've created a Root CA certificate (self-signed) and an Intermediate CA certificate (signed by the Root CA). The chain of the two certificates is here: https://gist.github.com/melnik13/328238e82c096a02d9f65a825ef270a8. The Intermediate CA's key is not encrypted; here it is: https://gist.github.com/melnik13/233f5019fd56bec67787a78bb5dcd477 (I don't consider it secret, as this is a testing environment).

I installed the CA certificate chain on master1 and master2 (as /etc/kubernetes/ssl/ca.crt and /etc/kubernetes/ssl/front-proxy-ca.crt) and on etcd1, etcd2 and etcd3 (as /etc/ssl/etcd/ssl/ca.pem). I installed the Intermediate CA's key on master1 and master2 (as /etc/kubernetes/ssl/ca.key and /etc/kubernetes/ssl/front-proxy-ca.key) and on etcd1, etcd2 and etcd3 (as /etc/ssl/etcd/ssl/ca-key.pem). Hopefully that's correct, though I'm not sure... :-)
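
For reference, here is a rough sketch of how such a two-level chain can be produced with openssl (names, lifetimes and options are illustrative, not necessarily the exact ones I used):

# Self-signed Root CA
# (-addext needs OpenSSL >= 1.1.1; on older openssl, put the extensions into a
#  config file with a v3_ca section instead)
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
    -keyout root-ca.key -out root-ca.crt \
    -subj "/CN=Test Root CA" \
    -addext "basicConstraints=critical,CA:TRUE" \
    -addext "keyUsage=critical,keyCertSign,cRLSign"

# Intermediate CA: CSR, then signed by the Root CA
openssl req -newkey rsa:4096 -sha256 -nodes \
    -keyout intermediate-ca.key -out intermediate-ca.csr \
    -subj "/CN=Test Intermediate CA"
openssl x509 -req -in intermediate-ca.csr -sha256 -days 1825 \
    -CA root-ca.crt -CAkey root-ca.key -CAcreateserial \
    -out intermediate-ca.crt \
    -extfile <(printf 'basicConstraints=critical,CA:TRUE\nkeyUsage=critical,keyCertSign,cRLSign\n')

# The chain installed on the masters: Intermediate CA first, then Root CA
cat intermediate-ca.crt root-ca.crt > ca.crt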

The operating system is CentOS 7 x64 with all the updates installed, here are more details:

Linux 3.10.0-1160.45.1.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Ansible version is 2.10.11, here are more details:

ansible 2.10.11
  config file = None
  configured module search path = ['/home/k8s-manager/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

Python version is 3.6.8.

Kubespray version is 2.17, the commit ID (SHA-1 hash) is a2af9a7.

Network plugin is Calico.

Here's a full dump of the inventory: https://gist.github.com/melnik13/5564b04055211bca4ea0a39847113532

I invoke Ansible with the following command:

ansible-playbook \
    -v -v -v -v \
    --inventory=~/src/tuchakube/inventory/c13/hosts.yml \
    --extra-vars "cluster_name=c13.tuchakube.local" \
    ~/src/kubespray/cluster.yml

Here's the result of the Ansible run: https://gist.githubusercontent.com/melnik13/de5eb0b30554030fad8885411b42b9df/raw/3dc52ce1929fe79330854847e0717f72f6f80bd0/ansible-cluster.log; a shortened version is here: https://gist.github.com/melnik13/687c61f737f12d625911d685287e887e

Here's what I see on master2 in the messages log-file: https://gist.github.com/19c4a116e34bceacbc81966d94357ce2 .

Is this a bug, or is it something I'm doing wrong?

@khmarochos khmarochos added the kind/bug Categorizes issue or PR as related to a bug. label Nov 28, 2021
@khmarochos
Author

I've just created a fresh new environment and issued a single self-signed CA certificate (without splitting it into a Root and an Intermediate). I installed this certificate the same way, and now I get the following result from Kubespray:

PLAY RECAP *********************************************************************
etcd1                      : ok=177  changed=51   unreachable=0    failed=0    skipped=273  rescued=0    ignored=0
etcd2                      : ok=170  changed=49   unreachable=0    failed=0    skipped=260  rescued=0    ignored=0
etcd3                      : ok=170  changed=49   unreachable=0    failed=0    skipped=260  rescued=0    ignored=0
localhost                  : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
master1                    : ok=491  changed=108  unreachable=0    failed=0    skipped=1072 rescued=0    ignored=1
master2                    : ok=432  changed=95   unreachable=0    failed=0    skipped=947  rescued=0    ignored=0
worker1                    : ok=368  changed=78   unreachable=0    failed=0    skipped=643  rescued=0    ignored=0
worker2                    : ok=368  changed=78   unreachable=0    failed=0    skipped=642  rescued=0    ignored=0
worker3                    : ok=368  changed=78   unreachable=0    failed=0    skipped=642  rescued=0    ignored=0
worker4                    : ok=368  changed=78   unreachable=0    failed=0    skipped=642  rescued=0    ignored=0

So something seems to be wrong with my Intermediate CA certificate, but what exactly is it?
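
In case it helps to reproduce, the difference between the two setups can be inspected with something like this (a sketch; the file names are illustrative):

# Does the Intermediate CA verify against the Root CA?
openssl verify -CAfile root-ca.crt intermediate-ca.crt

# Do both certificates carry the CA:TRUE basic constraint?
openssl x509 -in root-ca.crt -noout -text | grep -A1 'Basic Constraints'
openssl x509 -in intermediate-ca.crt -noout -text | grep -A1 'Basic Constraints'

# What does the bundle installed on a master actually contain?
openssl crl2pkcs7 -nocrl -certfile /etc/kubernetes/ssl/ca.crt | \
    openssl pkcs7 -print_certs -noout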

@khmarochos
Author

Perhaps I was wrong to choose the Bug label when submitting this issue; it might be my own fault.

I'd be very grateful for any hints.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2022
@doker78

doker78 commented Mar 6, 2022

Hey, any updates here?

I have the same issue:

FAILED - RETRYING: kubeadm | Initialize first master (3 retries left).
FAILED - RETRYING: kubeadm | Initialize first master (2 retries left).
FAILED - RETRYING: kubeadm | Initialize first master (1 retries left).

With:
ansible 2.10.15
kubespray 2.18.0
Kubernetes version: v1.23.4

It fails to find bootstrap-kubelet.conf, and the file is not present on any of the nodes:
fatal: [master1]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["timeout", "-k", "300s", "300s", "/usr/local/bin/kubeadm", "init", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all", "--skip-phases=addon/coredns", "--upload-certs"], "delta": "0:05:00.002718", "end": "2022-03-06 19:38:30.241541", "failed_when_result": true, "msg": "non-zero return code", "rc": 124, "start": "2022-03-06 19:33:30.238823", "stderr": "W0306 19:33:30.257150 30320 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [169.254.25.10]\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists\n\t[WARNING Port-10250]: Port 10250 is in use", "stderr_lines": ["W0306 19:33:30.257150 30320 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [169.254.25.10]", "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists", "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists", "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists", "\t[WARNING Port-10250]: Port 10250 is in use"], "stdout": "[init] Using Kubernetes version: v1.23.4\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'\n[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"\n[certs] Using existing ca certificate authority\n[certs] Using existing apiserver certificate and key on disk\n[certs] Using existing apiserver-kubelet-client certificate and key on disk\n[certs] Using existing front-proxy-ca certificate authority\n[certs] Using existing front-proxy-client certificate and key on disk\n[certs] External etcd mode: Skipping etcd/ca certificate authority generation\n[certs] External etcd mode: Skipping etcd/server certificate generation\n[certs] External etcd mode: Skipping etcd/peer certificate generation\n[certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation\n[certs] External etcd mode: Skipping apiserver-etcd-client certificate generation\n[certs] Using the existing \"sa\" key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Starting the 
kubelet\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 5m0s\n[kubelet-check] Initial timeout of 40s passed.", "stdout_lines": ["[init] Using Kubernetes version: v1.23.4", "[preflight] Running pre-flight checks", "[preflight] Pulling images required for setting up a Kubernetes cluster", "[preflight] This might take a minute or two, depending on the speed of your internet connection", "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'", "[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"", "[certs] Using existing ca certificate authority", "[certs] Using existing apiserver certificate and key on disk", "[certs] Using existing apiserver-kubelet-client certificate and key on disk", "[certs] Using existing front-proxy-ca certificate authority", "[certs] Using existing front-proxy-client certificate and key on disk", "[certs] External etcd mode: Skipping etcd/ca certificate authority generation", "[certs] External etcd mode: Skipping etcd/server certificate generation", "[certs] External etcd mode: Skipping etcd/peer certificate generation", "[certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation", "[certs] External etcd mode: Skipping apiserver-etcd-client certificate generation", "[certs] Using the existing \"sa\" key", "[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"", "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"", "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"", "[kubelet-start] Starting the kubelet", "[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"", "[control-plane] Creating static Pod manifest for \"kube-apiserver\"", "[control-plane] Creating static Pod manifest for \"kube-controller-manager\"", "[control-plane] Creating static Pod manifest for \"kube-scheduler\"", "[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 5m0s", "[kubelet-check] Initial timeout of 40s passed."]}

My INI file:

[all]
master0 ansible_host=192.168.122.10 ansible_user=root ip=192.168.122.10 etcd_member_name=etcd1
master1 ansible_host=192.168.122.11 ansible_user=root ip=192.168.122.11 etcd_member_name=etcd0
master2 ansible_host=192.168.122.12 ansible_user=root ip=192.168.122.12 etcd_member_name=etcd2
worker0 ansible_host=192.168.122.20 ansible_user=root
worker1 ansible_host=192.168.122.21 ansible_user=root
worker2 ansible_host=192.168.122.22 ansible_user=root

[kube_control_plane]
master0
master1
master2

[etcd]
master0
master1
master2

[kube_node]
worker0
worker1
worker2

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

And here is what I get from journalctl:
Mar 06 19:43:23 master0 kubelet[16076]: E0306 19:43:23.655747 16076 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstr>
Mar 06 19:43:23 master0 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 06 19:43:23 master0 systemd[1]: kubelet.service: Failed with result 'exit-code'
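
As far as I understand, the kubelet only falls back to the bootstrap kubeconfig when /etc/kubernetes/kubelet.conf does not exist, so a quick check on the failing node would be something like this (a diagnostic sketch only; the kubelet client certificate path is assumed from the kubeadm defaults):

# Which of the two kubeconfigs is actually present?
ls -l /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf

# If kubelet.conf exists, see which client certificate it points to and
# whether that certificate verifies against the cluster CA
grep -E 'client-certificate|client-key|certificate-authority' /etc/kubernetes/kubelet.conf
openssl verify -CAfile /etc/kubernetes/ssl/ca.crt \
    /var/lib/kubelet/pki/kubelet-client-current.pem    # path assumed (kubeadm default)

# Recent kubelet logs
journalctl -u kubelet --no-pager -n 50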

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 5, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@khmarochos
Author

Hi there.

I found a very stupid workaround.

Instead of using the full certificate chain (Intermediate CA certificate + Root CA certificate), I used the Intermediate CA certificate only. Kubespray now runs fine and almost everything else works as well, but this led me to another problem: https://discuss.kubernetes.io/t/using-an-intermediate-ca-whose-certificate-is-signed-by-a-self-signed-root-ca-certificate/19866 :-/
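
In other words, something along these lines (an illustrative sketch, not my exact commands; it assumes the Intermediate CA certificate is the first block in the chain file):

# Keep only the first certificate (the Intermediate CA) from the chain
awk '/BEGIN CERTIFICATE/{n++} n==1' ca-chain.crt > /etc/kubernetes/ssl/ca.crt
cp /etc/kubernetes/ssl/ca.crt /etc/kubernetes/ssl/front-proxy-ca.crt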

@khmarochos
Author

khmarochos commented Jun 10, 2022

To be frank, I've come to the conclusion that the most reliable way is to issue a self-signed certificate (with v3_ca extension, of course) and use it as the CA for a Kubernetes cluster. Everything works fine without additional CAs.
