Adding a new static worker node results in a preflight check failure on existing nodes #2802

Open
xmudrii opened this issue Jun 13, 2023 · 6 comments

@xmudrii (Member) commented Jun 13, 2023

What happened?

Trying to add a new static worker node results in the following error:

+ sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
W0613 19:21:47.950292   27890 initconfiguration.go:331] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
W0613 19:21:47.958412   27890 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
W0613 19:21:47.958515   27890 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.96.0.10]; the provided value is: [169.254.20.10]
	[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR Port-6443]: Port 6443 is in use
	[ERROR Port-10259]: Port 10259 is in use
	[ERROR Port-10257]: Port 10257 is in use
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
	[ERROR Port-10250]: Port 10250 is in use
	[ERROR Port-2379]: Port 2379 is in use
	[ERROR Port-2380]: Port 2380 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

What happens is that joining a new static worker node triggers the WithFullInstall workflow, the same workflow that's used to provision the cluster from scratch. There we run kubeadm preflight checks on each node to verify that the VMs satisfy the requirements to be a Kubernetes node.

That works the first time we provision the cluster, but subsequent runs (e.g. when adding a new static worker node) fail on the existing nodes because the cluster is already provisioned, so the files already exist and the ports are taken by Kubernetes components.
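
One possible direction (just a sketch of the idea, not what KubeOne does today) would be to detect nodes that are already part of the cluster, e.g. by checking for /etc/kubernetes/kubelet.conf, and skip the preflight phase on those:

# hypothetical: only run preflight on nodes that haven't joined the cluster yet
if [ ! -f /etc/kubernetes/kubelet.conf ]; then
  sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
else
  echo "node already provisioned, skipping kubeadm preflight"
fi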

Expected behavior

  • Adding a new static worker node works as expected

How to reproduce the issue?

  • Provision the cluster
  • Try to add a new static worker node after the cluster is provisioned
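
Roughly (manifest file name and flags are just an example, adjust to your setup):

kubeone apply --manifest kubeone.yaml    # initial provisioning
# add the new node under staticWorkers.hosts in kubeone.yaml, then re-run:
kubeone apply --manifest kubeone.yaml    # fails with the preflight errors shown above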

What KubeOne version are you using?

Provide your KubeOneCluster manifest here (if applicable)

{
  "kubeone": {
    "major": "1",
    "minor": "6",
    "gitVersion": "v1.6.0-rc.2-36-g0536063a",
    "gitCommit": "0536063ab064601ba217c2abd41abd4c80a02477",
    "gitTreeState": "",
    "buildDate": "2023-06-13T21:16:41+02:00",
    "goVersion": "go1.20.4",
    "compiler": "gc",
    "platform": "darwin/arm64"
  },
  "machine_controller": {
    "major": "",
    "minor": "",
    "gitVersion": "8e5884837711fb0fc6b568d734f09a7b809fc28e",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

What cloud provider are you running on?

Baremetal

What operating system are you running in your cluster?

Ubuntu 20.04.6

Additional information

We can mitigate this issue by ignoring those failures, but in some cases those failures can point to real problems that will prevent the cluster from being provisioned.
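
If we keep running preflight and just ignore failures, it's probably safer to ignore only the checks that are expected to fail on an already-provisioned node, rather than using --ignore-preflight-errors=all, so that genuine problems still surface. Done manually, that would look something like this (check names taken from the error output above):

sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml \
  --ignore-preflight-errors=Port-6443,Port-10259,Port-10257,Port-10250,Port-2379,Port-2380,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml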

@xmudrii xmudrii added kind/bug Categorizes issue or PR as related to a bug. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. labels Jun 13, 2023
@xmudrii xmudrii self-assigned this Jun 13, 2023
@kubermatic-bot (Contributor)

Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now, please do so with /close.

/lifecycle stale

@kubermatic-bot kubermatic-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2023
@xmudrii (Member, Author) commented Sep 12, 2023

/remove-lifecycle stale

@kubermatic-bot kubermatic-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2023
@kubermatic-bot (Contributor)

Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now, please do so with /close.

/lifecycle stale

@kubermatic-bot kubermatic-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 11, 2023
@xmudrii (Member, Author) commented Dec 11, 2023

/remove-lifecycle stale

@kubermatic-bot kubermatic-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 11, 2023
@kubermatic-bot (Contributor)

Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now, please do so with /close.

/lifecycle stale

@kubermatic-bot kubermatic-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2024
@xmudrii (Member, Author) commented Apr 8, 2024

/remove-lifecycle stale

@kubermatic-bot kubermatic-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2024
@xmudrii xmudrii added the priority/low Not that important. label Jun 24, 2024
@kron4eg kron4eg removed the priority/low Not that important. label Aug 14, 2024