
Upgrading a 1.12 cluster thru 1.13 to 1.14 fails. #1471

Closed
mauilion opened this issue Mar 28, 2019 · 10 comments
Labels: area/upgrades, kind/bug, lifecycle/active, priority/important-soon
Milestone: v1.14


mauilion commented Mar 28, 2019

BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.14.0

Environment:

  • Kubernetes version (use kubectl version): 1.13.5
  • Cloud provider or hardware configuration: local
  • OS (e.g. from /etc/os-release): any
  • Kernel (e.g. uname -a): any
  • Others:

What happened?

In 1.12 we bound etcd to localhost on single-master setups. We also minted certificates whose SANs included only the hostname, localhost, and the 127.0.0.1 IP.

[certificates] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [127.0.0.1 ::1]

In 1.13 we changed that behavior and started binding etcd to 127.0.0.1 and the node IP.
We also updated the cert generation to pick up the change.
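
A quick way to check which SANs your current etcd serving cert actually carries (a suggested check, assuming the default kubeadm PKI path):

openssl x509 -noout -text -in /etc/kubernetes/pki/etcd/server.crt \
  | grep -A1 "Subject Alternative Name"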

You can upgrade a cluster from 1.12 to 1.13 with no issues because kubeadm upgrade plan probes etcd on localhost.

When you try to upgrade that 1.13 cluster to 1.14, the upgrade fails because in 1.14 kubeadm tries to reach etcd on the node IP. Assuming etcd is bound to the node IP is valid if the cluster was created with 1.13, but this cluster was originally created with 1.12, so etcd is still bound only to 127.0.0.1.
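
To confirm which addresses etcd is actually bound to before attempting the upgrade, one option (assuming the default static pod manifest path) is:

# show the flags etcd was started with
grep listen-client-urls /etc/kubernetes/manifests/etcd.yaml
# show what is actually listening on 2379
ss -ltn | grep 2379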

What you expected to happen?

That we would either try to determine which address etcd is bound to, or make a change in 1.13 that modifies the etcd configuration so that we don't strand clusters created with 1.12.

How to reproduce it (as minimally and precisely as possible)?

  1. Bring up a 1.12 single-master cluster.
  2. Upgrade it to 1.13.
  3. Try to upgrade it to 1.14.
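
A rough sketch of that sequence, assuming a Debian-based host with the upstream apt packages (versions here are illustrative):

# repeat for each minor hop (1.12 -> 1.13 -> 1.14)
apt-get install -y kubeadm=1.13.5-00 kubelet=1.13.5-00 kubectl=1.13.5-00
kubeadm upgrade plan
kubeadm upgrade apply v1.13.5
# the 1.13 -> 1.14 hop is where the failure described above shows up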

Anything else we need to know?

You can work around this issue as follows, using the kubeadm binary for the Kubernetes version you are currently on:

  1. Fetch the kubeadm.conf from the cluster:
 kubeadm config view > /etc/kubeadm.conf
  2. Amend the etcd section of kubeadm.conf to something like:
etcd:
  local:
    dataDir: /var/lib/etcd
    image: ""
    serverCertSANs:
    - "10.192.0.2"
    extraArgs:
      listen-client-urls: https://127.0.0.1:2379,https://10.192.0.2:2379 

where 10.192.0.2 is the node IP.
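
If you are not sure which address to use here, two ways to find the node IP kubeadm will expect (an added suggestion, not part of the original workaround):

# the API server's advertise address, as written by kubeadm
grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml
# or the INTERNAL-IP column for the node
kubectl get nodes -o wide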

  3. Remove the existing etcd server certs:
rm /etc/kubernetes/pki/etcd/server.*
  4. Regenerate them with a phase. You should see the new IP SAN in effect:
kubeadm init phase certs etcd-server --config /etc/kubeadm.conf
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [10.192.0.2 127.0.0.1 ::1]
  5. Use a phase to reconfigure etcd with the new listen-client-urls:
kubeadm init phase etcd local --config /etc/kubeadm.conf
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

You should now see etcd port 2379 bound to both 127.0.0.1 and 10.192.0.2:

ss -ln | grep 2379                                                                                                                                                       
tcp    LISTEN     0      128    127.0.0.1:2379                  *:*                  
tcp    LISTEN     0      128    10.192.0.2:2379                  *:*     
  6. Upload the kubeadm.conf to the cluster:
kubeadm config upload from-file --config /etc/kubeadm.conf
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

Now you can grab the new kubeadm and upgrade.
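
For completeness, that final step would look roughly like this (package versions are illustrative and distro-dependent):

apt-get install -y kubeadm=1.14.0-00
kubeadm upgrade plan
kubeadm upgrade apply v1.14.0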

@davidkarlsen (Member)

I also restarted the kubelet so that everything was reloaded and ready:

systemctl restart kubelet.service
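
To verify the etcd static pod came back with the new flags after the restart (a suggested check, not part of the original comment):

kubectl -n kube-system get pods -l component=etcd
kubectl -n kube-system describe pod -l component=etcd | grep listen-client-urls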


neolit123 commented Mar 28, 2019

@mauilion thanks for the detailed report.
I think this was caught here: #1469

It seems we can close #1469, as this ticket outlines the problem better.

@kubernetes/sig-cluster-lifecycle

@neolit123 neolit123 added kind/bug Categorizes issue or PR as related to a bug. area/upgrades labels Mar 28, 2019
@neolit123 neolit123 added this to the v1.14 milestone Mar 28, 2019
@neolit123 neolit123 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 28, 2019
@neolit123 (Member)

#1469 (comment)

@proskehy

Experiencing the same issue as @neolit123 describes in #1469 (comment).

The etcd manifests have not been manually altered; the cluster was created before k8s 1.12.

@fabriziopandini (Member)

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Mar 30, 2019
@neolit123 (Member)

Should be resolved in 1.14.1.

@kfox1111

I just upgraded a cluster through 1.13.x and 1.14.4; that worked OK. I then tried to upgrade to 1.15.0 and it failed:
[root@evan3 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
error syncing endpoints with etc: dial tcp xxx.xxx.xxx.xxx:2379: connect: connection refused

It was referencing the external address.

I do still see localhost references in the etcd manifest:
grep 127.0.0.1 /etc/kubernetes/manifests/etcd.yaml
- --advertise-client-urls=https://127.0.0.1:2379
- --initial-advertise-peer-urls=https://127.0.0.1:2380
- --initial-cluster=evan3=https://127.0.0.1:2380
- --listen-client-urls=https://127.0.0.1:2379
- --listen-peer-urls=https://127.0.0.1:2380
- ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt

@kfox1111

Should I just update all references to 127.0.0.1 to the external address in /etc/kubernetes/manifests/etcd.yaml or is there more to it?

@neolit123 (Member)

Try patching it manually. This should not have happened, as we had a special case to handle the etcd upgrade when it is not bound to localhost.
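
A sketch of what the manually patched /etc/kubernetes/manifests/etcd.yaml could contain, based on the workaround earlier in this thread (xxx.xxx.xxx.xxx stands in for the node's address; adjust to your setup):

# in the etcd container's command list:
- --listen-client-urls=https://127.0.0.1:2379,https://xxx.xxx.xxx.xxx:2379
# the etcd server cert must also carry that IP as a SAN, e.g. by removing the
# certs and re-running the etcd-server certs phase as shown above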


baduba commented May 28, 2020

I need to upgrade the EKS version from 1.12 to 1.14.
Can someone advise me on the best way to do this?
