
Upgrading a 1.12 cluster thru 1.13 to 1.14 fails. #1471

Closed
mauilion opened this issue Mar 28, 2019 · 10 comments
Labels: area/upgrades, kind/bug, lifecycle/active, priority/important-soon
Milestone: v1.14


mauilion commented Mar 28, 2019

BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.14.0

Environment:

  • Kubernetes version (use kubectl version): 1.13.5
  • Cloud provider or hardware configuration: local
  • OS (e.g. from /etc/os-release): any
  • Kernel (e.g. uname -a): any
  • Others:

What happened?

In 1.12 we bound etcd to localhost on single-master setups. We also minted certificates whose SANs included only the hostname, localhost, and the 127.0.0.1 IP.

[certificates] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [127.0.0.1 ::1]

In 1.13 we changed that behavior and started binding etcd to 127.0.0.1 and the node IP.
We also updated the cert generation to pick up the change.
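
A quick way to check which SANs your current etcd serving cert actually carries (a suggested check, assuming the default kubeadm PKI path):

openssl x509 -noout -text -in /etc/kubernetes/pki/etcd/server.crt \
  | grep -A1 "Subject Alternative Name"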

You can upgrade a cluster from 1.12 to 1.13 with no issues because kubeadm upgrade plan probes etcd on localhost.

When you try to upgrade that 1.13 cluster to 1.14, the upgrade fails because in 1.14 kubeadm tries to reach etcd on the node IP. Assuming etcd is bound to the node IP is valid if the cluster was created with 1.13, but this cluster was originally created with 1.12, so etcd is still bound only to 127.0.0.1.
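
To confirm which addresses etcd is actually bound to before attempting the upgrade, one option (assuming the default static pod manifest path) is:

# show the flags etcd was started with
grep listen-client-urls /etc/kubernetes/manifests/etcd.yaml
# show what is actually listening on 2379
ss -ltn | grep 2379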

What you expected to happen?

That we would either try to determine which address etcd is bound to, or make a change in 1.13 that modifies the etcd configuration so that we don't strand clusters created with 1.12.

How to reproduce it (as minimally and precisely as possible)?

  1. Bring up a 1.12 single-master cluster.
  2. Upgrade it to 1.13.
  3. Try to upgrade it to 1.14.
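
A rough sketch of that sequence, assuming a Debian-based host with the upstream apt packages (versions here are illustrative):

# repeat for each minor hop (1.12 -> 1.13 -> 1.14)
apt-get install -y kubeadm=1.13.5-00 kubelet=1.13.5-00 kubectl=1.13.5-00
kubeadm upgrade plan
kubeadm upgrade apply v1.13.5
# the 1.13 -> 1.14 hop is where the failure described above shows up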

Anything else we need to know?

You can work around this issue as follows, using the kubeadm binary for the Kubernetes version you are currently on:

  1. Fetch the kubeadm.conf from the cluster:
 kubeadm config view > /etc/kubeadm.conf
  2. Amend the etcd section of kubeadm.conf to something like:
etcd:
  local:
    dataDir: /var/lib/etcd
    image: ""
    serverCertSANs:
    - "10.192.0.2"
    extraArgs:
      listen-client-urls: https://127.0.0.1:2379,https://10.192.0.2:2379 

where 10.192.0.2 is the node IP.
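
If you are not sure which address to use here, two ways to find the node IP kubeadm will expect (an added suggestion, not part of the original workaround):

# the API server's advertise address, as written by kubeadm
grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml
# or the INTERNAL-IP column for the node
kubectl get nodes -o wide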

  3. Remove the existing etcd server certs:
rm /etc/kubernetes/pki/etcd/server.*
  4. Regenerate them with a phase. You should see the new IP SAN in effect:
kubeadm init phase certs etcd-server --config /etc/kubeadm.conf
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [10.192.0.2 127.0.0.1 ::1]
  5. Use a phase to reconfigure etcd with the new listen-client-urls:
kubeadm init phase etcd local --config /etc/kubeadm.conf
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

You should now see etcd port 2379 bound to both 127.0.0.1 and 10.192.0.2:

ss -ln | grep 2379                                                                                                                                                       
tcp    LISTEN     0      128    127.0.0.1:2379                  *:*                  
tcp    LISTEN     0      128    10.192.0.2:2379                  *:*     
  6. Upload the kubeadm.conf to the cluster:
kubeadm config upload from-file --config /etc/kubeadm.conf
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

Now you can grab the new kubeadm and upgrade.
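
For completeness, that final step would look roughly like this (package versions are illustrative and distro-dependent):

apt-get install -y kubeadm=1.14.0-00
kubeadm upgrade plan
kubeadm upgrade apply v1.14.0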

@davidkarlsen (Member)

I also restarted the kubelet so that everything was reloaded and ready:

systemctl restart kubelet.service
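
To verify the etcd static pod came back with the new flags after the restart (a suggested check, not part of the original comment):

kubectl -n kube-system get pods -l component=etcd
kubectl -n kube-system describe pod -l component=etcd | grep listen-client-urls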


neolit123 commented Mar 28, 2019

@mauilion thanks for the detailed report.
I think this was caught here: #1469

It seems we can close #1469, as this ticket outlines the problem better.

@kubernetes/sig-cluster-lifecycle

@neolit123 neolit123 added kind/bug Categorizes issue or PR as related to a bug. area/upgrades labels Mar 28, 2019
@neolit123 neolit123 added this to the v1.14 milestone Mar 28, 2019
@neolit123 neolit123 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 28, 2019
@neolit123 (Member)

#1469 (comment)

@proskehy

Experiencing the same issue as @neolit123 describes in #1469 (comment).

The etcd manifests have not been manually altered; the cluster was created before k8s 1.12.

@fabriziopandini (Member)

/assign
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Mar 30, 2019
@neolit123 (Member)

Should be resolved in 1.14.1.

@kfox1111

I just upgraded a cluster through 1.13.x and 1.14.4; that worked OK. I then tried to upgrade to 1.15.0 and it failed:
[root@evan3 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
error syncing endpoints with etc: dial tcp xxx.xxx.xxx.xxx:2379: connect: connection refused

It was referencing the external address.

I do still see localhost references in the etcd manifest:
grep 127.0.0.1 /etc/kubernetes/manifests/etcd.yaml
- --advertise-client-urls=https://127.0.0.1:2379
- --initial-advertise-peer-urls=https://127.0.0.1:2380
- --initial-cluster=evan3=https://127.0.0.1:2380
- --listen-client-urls=https://127.0.0.1:2379
- --listen-peer-urls=https://127.0.0.1:2380
- ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt

@kfox1111

Should I just update all references to 127.0.0.1 to the external address in /etc/kubernetes/manifests/etcd.yaml or is there more to it?

@neolit123 (Member)

Try patching it manually. This should not have happened, as we had a special case to handle the etcd upgrade when it is not bound to localhost.
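
A sketch of what the manually patched /etc/kubernetes/manifests/etcd.yaml could contain, based on the workaround earlier in this thread (xxx.xxx.xxx.xxx stands in for the node's address; adjust to your setup):

# in the etcd container's command list:
- --listen-client-urls=https://127.0.0.1:2379,https://xxx.xxx.xxx.xxx:2379
# the etcd server cert must also carry that IP as a SAN, e.g. by removing the
# certs and re-running the etcd-server certs phase as shown above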


baduba commented May 28, 2020

I need to upgrade the EKS version from 1.12 to 1.14.
Can someone advise me on the best way to do this?
