
[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found (using coredns) #2358

Closed
VannTen opened this issue Dec 3, 2020 · 6 comments


@VannTen

VannTen commented Dec 3, 2020

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:06:11Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

Environment:

What happened?

After launching the following command:

/usr/local/bin/kubeadm upgrade apply -y v1.15.11 --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=all --allow-experimental-upgrades --allow-release-candidate-upgrades --etcd-upgrade=false --force

kubeadm fails at the post-upgrade stage with the following error:
[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found
However, it is configured to use CoreDNS. Furthermore, the cluster was already using CoreDNS, and there was no kube-dns Deployment or Service.

What you expected to happen?

kubeadm correctly recognizes coredns and does not try to find a kube-dns service.

How to reproduce it (as minimally and precisely as possible)?

This seems quite hard to reproduce. The linked Slack messages mention that it happens only sometimes (!).
We are upgrading several clusters with mostly the same method, and so far have only encountered this problem on one of them. The previous upgrade (1.13 -> 1.14) went fine, even though CoreDNS was already in use.

However, some facts that might be related:
After seeing that kubeadm set clusterDNS in the kubelet config to the address 10.x.x.10 (x depending on the services subnet), I noticed that this clusterIP was taken by another Service (from an application running on the cluster), and that Service was created between the previous upgrade and the one where we encountered the error (so at least the timing makes sense). That clusterDNS setting was not actually used by the kubelets, because kubespray handles the kubelet configuration itself (I think) and uses the third IP in the range (10.x.x.3). But maybe this somehow confuses kubeadm?
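To check whether the address kubeadm wants for the DNS Service is already held by another Service, one can compare it against the cluster's Service IPs. The following is a minimal sketch, assuming that kubeadm uses the 10th address of the service subnet (consistent with the 10.x.x.10 value above) and that kubespray uses the 3rd, as described above. The subnet `10.233.0.0/18`, the function names and the example Services are hypothetical; real input could be the `items` list from `kubectl get svc -A -o json`.

```python
import ipaddress
from typing import List, Optional


def indexed_ip(service_subnet: str, index: int) -> str:
    """Return the address at `index` within a CIDR (index 0 is the network address)."""
    return str(ipaddress.ip_network(service_subnet)[index])


def find_dns_ip_conflict(service_subnet: str, services: List[dict]) -> Optional[dict]:
    """Return a Service other than kube-system/kube-dns holding the .10 address, if any.

    Assumption: kubeadm expects its DNS Service at index 10 of the service subnet.
    `services` holds Service objects shaped like the items of
    `kubectl get svc -A -o json`; only metadata and spec.clusterIP are read.
    """
    dns_ip = indexed_ip(service_subnet, 10)
    for svc in services:
        meta = svc.get("metadata", {})
        if (meta.get("namespace"), meta.get("name")) == ("kube-system", "kube-dns"):
            continue  # the Service kubeadm expects is allowed to own the address
        if svc.get("spec", {}).get("clusterIP") == dns_ip:
            return svc
    return None


if __name__ == "__main__":
    subnet = "10.233.0.0/18"  # hypothetical service subnet
    print("kubeadm DNS IP:  ", indexed_ip(subnet, 10))  # 10.233.0.10
    print("kubespray DNS IP:", indexed_ip(subnet, 3))   # 10.233.0.3
    example = [  # hypothetical cluster state resembling the situation described above
        {"metadata": {"namespace": "kube-system", "name": "coredns"},
         "spec": {"clusterIP": "10.233.0.3"}},
        {"metadata": {"namespace": "my-app", "name": "frontend"},
         "spec": {"clusterIP": "10.233.0.10"}},
    ]
    clash = find_dns_ip_conflict(subnet, example)
    print("conflict:", clash["metadata"]["name"] if clash else None)  # frontend
```

If a conflicting Service is reported, that would match the situation described in this issue.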

Unfortunately, I did not have the time to set up a reproduction scenario. If I do, I will update the issue.

Anything else we need to know?

The workaround mentioned in one of the Slack messages works well and allowed us to perform our upgrade. I'll note it here, since this is probably more accessible to future kubeadm users who stumble upon the same problem:

Workaround

Copy the coredns Service. Create a new Service kube-dns from the copy, changing the name ("kube-dns") and forcing the clusterIP to 10.x.x.10 (matching what is in your kubeadm config). Then relaunch your command.
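The copy-and-rename step above can be scripted. Below is a minimal sketch, assuming the coredns Service was exported as JSON (for example with `kubectl -n kube-system get svc coredns -o json > coredns.json`). The script, the function name and the file names are hypothetical; it only rewrites the manifest and does not talk to the cluster.

```python
import copy
import json
import sys

# Metadata fields set by the API server; they must not be sent when creating a new object.
SERVER_SET_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                       "selfLink", "generation", "managedFields")


def kube_dns_from_coredns(coredns_svc: dict, dns_ip: str) -> dict:
    """Return a 'kube-dns' Service manifest derived from an exported 'coredns' Service."""
    svc = copy.deepcopy(coredns_svc)  # leave the input untouched
    meta = svc.setdefault("metadata", {})
    for field in SERVER_SET_METADATA:
        meta.pop(field, None)
    # kubectl's last-applied annotation still describes the 'coredns' object.
    meta.get("annotations", {}).pop(
        "kubectl.kubernetes.io/last-applied-configuration", None)
    meta["name"] = "kube-dns"  # namespace (kube-system) and selector are kept
    spec = svc.setdefault("spec", {})
    spec["clusterIP"] = dns_ip    # force the address from the kubeadm config
    spec.pop("clusterIPs", None)  # newer API servers return this list too; it must agree with clusterIP
    svc.pop("status", None)
    return svc


if __name__ == "__main__" and len(sys.argv) == 3:
    # python3 kube_dns_copy.py coredns.json 10.x.x.10 > kube-dns.json
    with open(sys.argv[1]) as f:
        print(json.dumps(kube_dns_from_coredns(json.load(f), sys.argv[2]), indent=2))
```

The result can then be created with `kubectl create -f kube-dns.json`. The API server rejects the new Service if another Service already holds that clusterIP, in which case that address has to be freed first.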

@neolit123
Member

hi,

[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found

the service that coredns uses is also called kube-dns, because that was part of the original transition plan in k8s from kube-dns to coredns. #sig-network on k8s slack know more about this topic.

the coredns / kube-dns manifests are here:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/addons/dns/manifests.go

Kubernetes version (use kubectl version): 1.14.1 (upgrading to 1.15.11)

this version is not supported. you'd have to be on 1.18 soon, as older versions are going out of support.
if you are able to reproduce the problem with a test cluster on 1.17 or newer, we could have a look at it.

if so please re-open the ticket.

@neolit123
Member

btw, the error is coming from here:
https://github.com/kubernetes/kubernetes/blob/98bc258bf5516b6c60860e06845b899eab29825d/cmd/kubeadm/app/phases/addons/dns/dns.go#L363-L365

the hard to reproduce aspect here only means that somehow the service is not available at that particular moment, which is bad.
we could make some of the operations in the function to be retried, but instead it feels like there is an external problem that has to be better understood.
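The wrapped error text ("unable to create/update the DNS service") suggests a create-then-update flow. The following is a simplified Python model of such a flow, not kubeadm's Go code. It assumes that the linked function first tries to create a Service named kube-dns with the .10 clusterIP, treats "already exists" and "invalid" (e.g. "provided IP is already allocated") as non-fatal, and then tries to update kube-dns. `FakeServiceAPI`, `create_dns_service` and the exception classes are hypothetical stand-ins.

```python
class AlreadyExists(Exception):
    pass


class Invalid(Exception):
    pass


class NotFound(Exception):
    pass


class FakeServiceAPI:
    """Hypothetical in-memory Services API: name -> clusterIP (namespaces ignored)."""

    def __init__(self, services: dict):
        self.services = dict(services)

    def create(self, name: str, cluster_ip: str) -> None:
        if name in self.services:
            raise AlreadyExists(f'services "{name}" already exists')
        if cluster_ip in self.services.values():
            raise Invalid(f'spec.clusterIP: Invalid value: "{cluster_ip}": '
                          "provided IP is already allocated")
        self.services[name] = cluster_ip

    def update(self, name: str, cluster_ip: str) -> None:
        if name not in self.services:
            raise NotFound(f'services "{name}" not found')
        self.services[name] = cluster_ip


def create_dns_service(api: FakeServiceAPI, cluster_ip: str, name: str = "kube-dns") -> None:
    """Assumed flow: try Create; on AlreadyExists or Invalid, fall back to Update."""
    try:
        api.create(name, cluster_ip)
    except (AlreadyExists, Invalid):
        try:
            api.update(name, cluster_ip)
        except NotFound as err:
            raise RuntimeError(f"unable to create/update the DNS service: {err}") from err


if __name__ == "__main__":
    cases = {
        ".10 free": {"coredns": "10.233.0.3"},
        ".10 taken, no kube-dns": {"coredns": "10.233.0.3", "frontend": "10.233.0.10"},
        "kube-dns copy exists": {"coredns": "10.233.0.3", "kube-dns": "10.233.0.10"},
    }
    for label, state in cases.items():
        try:
            create_dns_service(FakeServiceAPI(state), "10.233.0.10")
            print(f"{label}: ok")
        except RuntimeError as err:
            print(f"{label}: {err}")
```

Under these assumptions, the model reproduces the reported error only when .10 is held by some other Service and no kube-dns Service exists, which would be consistent with the IP conflict and the "sometimes" behaviour reported in this issue, and with the kube-dns copy working around it.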

@lgtm87

lgtm87 commented Dec 17, 2020

@neolit123 I faced exactly the same issue during a cluster upgrade from 1.16.3 to 1.17.11.
Kubespray v2.14 was used for the cluster upgrade; the kubeadm command and output are the following:
"module_args":

{ "_raw_params": "timeout -k 600s 600s /usr/local/bin/kubeadm upgrade apply -y v1.17.11 --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=all --allow-experimental-upgrades --etcd-upgrade=false --force", "_uses_shell": false, "argv": null, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "stdin_add_newline": true, "strip_empty_ends": true, "warn": true }
},
"msg": "non-zero return code",
"rc": 1,
"start": "2020-12-17 07:50:21.835844",
"stderr": "W1217 07:50:21.875273 5279 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "tcpFinTimeout"\nW1217 07:50:21.878019 5279 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.239.0.10]; the provided value is: [169.254.25.10]\nW1217 07:50:21.878113 5279 validation.go:28] Cannot validate kube-proxy config - no validator is available\nW1217 07:50:21.878121 5279 validation.go:28] Cannot validate kubelet config - no validator is available\nW1217 07:50:21.888242 5279 common.go:94] WARNING: Usage of the --config flag for reconfiguring the cluster during upgrade is not recommended!\nW1217 07:50:21.890289 5279 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "tcpFinTimeout"\nW1217 07:50:21.890652 5279 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.239.0.10]; the provided value is: [169.254.25.10]\nW1217 07:50:21.890719 5279 validation.go:28] Cannot validate kube-proxy config - no validator is available\nW1217 07:50:21.890726 5279 validation.go:28] Cannot validate kubelet config - no validator is available\n\t[WARNING CoreDNSUnsupportedPlugins]: start version '1.6.7' not supported\n\t[WARNING CoreDNSMigration]: CoreDNS will not be upgraded: start version '1.6.7' not supported\nW1217 07:50:25.293871 5279 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"\nW1217 07:50:27.524497 5279 dns.go:246] the CoreDNS Configuration was not migrated: unable to migrate CoreDNS ConfigMap: start version '1.6.7' not supported. 
The existing CoreDNS Corefile configuration has been retained.\n[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found\nTo see the stack trace of this error execute with --v=5 or higher",
"stderr_lines": [
"W1217 07:50:21.875273 5279 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "tcpFinTimeout"",
"W1217 07:50:21.878019 5279 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.239.0.10]; the provided value is: [169.254.25.10]",
"W1217 07:50:21.878113 5279 validation.go:28] Cannot validate kube-proxy config - no validator is available",
"W1217 07:50:21.878121 5279 validation.go:28] Cannot validate kubelet config - no validator is available",
"W1217 07:50:21.888242 5279 common.go:94] WARNING: Usage of the --config flag for reconfiguring the cluster during upgrade is not recommended!",
"W1217 07:50:21.890289 5279 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "tcpFinTimeout"",
"W1217 07:50:21.890652 5279 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.239.0.10]; the provided value is: [169.254.25.10]",
"W1217 07:50:21.890719 5279 validation.go:28] Cannot validate kube-proxy config - no validator is available",
"W1217 07:50:21.890726 5279 validation.go:28] Cannot validate kubelet config - no validator is available",
"\t[WARNING CoreDNSUnsupportedPlugins]: start version '1.6.7' not supported",
"\t[WARNING CoreDNSMigration]: CoreDNS will not be upgraded: start version '1.6.7' not supported",
"W1217 07:50:25.293871 5279 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"",
"W1217 07:50:27.524497 5279 dns.go:246] the CoreDNS Configuration was not migrated: unable to migrate CoreDNS ConfigMap: start version '1.6.7' not supported. The existing CoreDNS Corefile configuration has been retained.",
"[upgrade/postupgrade] FATAL post-upgrade error: unable to create/update the DNS service: services "kube-dns" not found",
"To see the stack trace of this error execute with --v=5 or higher"

The cluster was also using CoreDNS before the upgrade. Can you please advise how to resolve this?

@neolit123
Member

neolit123 commented Dec 17, 2020

"\t[WARNING CoreDNSUnsupportedPlugins]: start version '1.6.7' not supported",

my understanding of this problem:

  • you have 1.16 and try to upgrade to 1.17
  • your 1.16 cluster has a coredns version that is not supported by the upgrade tooling that is included in 1.17.
  • 1.16 -> 1.17 upgrade fails.

you can try editing the CoreDNS ConfigMap and Deployment to use 1.6.2 (downgrade CoreDNS):
https://github.com/kubernetes/kubernetes/blob/release-1.16/cmd/kubeadm/app/constants/constants.go#L336

if 1.6.7 is something that kubespray installs, please contact the kubespray team.

@lgtm87

lgtm87 commented Dec 18, 2020

Thank you for the update. The funny thing is that even though the upgrade process fails, CoreDNS 1.6.7 is installed (there are 2 ReplicaSets: one is failing with the corefile-backup ConfigMap, and one is working fine).
I have upgraded another cluster with the same kubespray version and CoreDNS 1.6.7 (also 1.16 -> 1.17), and it finished successfully.
I don't think downgrading CoreDNS is an option for me, since we already have 1.6.7. Looking at the logs, that is just a warning; isn't the fatal message about the "kube-dns" Service not being found?

@VannTen
Author

VannTen commented Feb 22, 2021

I encountered the same issue (with the same cluster) upgrading from 1.17.12 to 1.18.10, using kubespray v2.14.2.
The workaround was to create a copy of the coredns svc named kube-dns, and free the 10.x.0.10 service address (it was used by another service).
@neolit123 Could we reopen this issue? I lack the permission to do so.
