lost master during upgrade #123

Closed · chuckha opened this issue Oct 25, 2019 · 11 comments · Fixed by #126

chuckha commented Oct 25, 2019

One of my master nodes was unable to bring up etcd fast enough:

Here are the relevant lines in /var/log/cloud-init-output.log

...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

This kinda sucks because it's not like we can re-run kubeadm init :/ I don't know if there's anything we can really do in this situation, at least not in this repo. Feel free to close if this isn't actionable.
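For reference, the availability check kubeadm is timing out on boils down to an authenticated member-list call against etcd. A rough standalone sketch, not kubeadm's actual code, using the etcd v3 client (v3.3-era import path) and the standard kubeadm certificate locations:

```go
package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io/ioutil"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// Client cert/key that kubeadm generates for etcd health checks.
	cert, err := tls.LoadX509KeyPair(
		"/etc/kubernetes/pki/etcd/healthcheck-client.crt",
		"/etc/kubernetes/pki/etcd/healthcheck-client.key",
	)
	if err != nil {
		panic(err)
	}
	caPEM, err := ioutil.ReadFile("/etc/kubernetes/pki/etcd/ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://10.0.0.33:2379"},
		DialTimeout: 5 * time.Second,
		TLS: &tls.Config{
			Certificates: []tls.Certificate{cert},
			RootCAs:      pool,
		},
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Roughly what the 40s wait is doing: ask the cluster for its member
	// list; members that haven't started yet come back with an empty Name.
	ctx, cancel := context.WithTimeout(context.Background(), 40*time.Second)
	defer cancel()
	resp, err := cli.MemberList(ctx)
	if err != nil {
		fmt.Println("etcd cluster not available:", err)
		return
	}
	for _, m := range resp.Members {
		fmt.Printf("member %x name=%q peers=%v\n", m.ID, m.Name, m.PeerURLs)
	}
}
```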

chuckha commented Oct 25, 2019

Hmm, might be a configuration error? The etcd pod is giving me some weird logs:

2019-10-25 18:46:06.694841 W | etcdserver: request "header:<ID:13768187975200604220 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/minions/ip-10-0-0-94.us-west-2.compute.internal\" mod_revision:2531 > success:<request_put:<key:\"/registry/minions/ip-10-0-0-94.us-west-2.compute.internal\" value_size:2926 >> failure:<>>" with result "size:16" took too long (461.256113ms) to execute
2019-10-25 18:46:08.171830 I | embed: rejected connection from "10.0.0.33:36376" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:08.614328 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions\" range_end:\"/registry/apiextensions.k8s.io/customresourcedefinitiont\" count_only:true " with result "range_response_count:0 size:5" took too long (590.030795ms) to execute
2019-10-25 18:46:08.614655 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions/\" range_end:\"/registry/apiextensions.k8s.io/customresourcedefinitions0\" limit:10000 " with result "range_response_count:0 size:5" took too long (602.507946ms) to execute
2019-10-25 18:46:09.218253 I | embed: rejected connection from "10.0.0.33:36428" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:09.568821 W | etcdserver: read-only range request "key:\"/registry/certificatesigningrequests/\" range_end:\"/registry/certificatesigningrequests0\" limit:10000 " with result "range_response_count:10 size:18706" took too long (193.823471ms) to execute
2019-10-25 18:46:09.571006 W | etcdserver: read-only range request "key:\"/registry/certificatesigningrequests\" range_end:\"/registry/certificatesigningrequestt\" count_only:true " with result "range_response_count:0 size:7" took too long (195.952555ms) to execute
2019-10-25 18:46:09.571643 W | etcdserver: read-only range request "key:\"/registry/deployments\" range_end:\"/registry/deploymentt\" count_only:true " with result "range_response_count:0 size:7" took too long (138.46605ms) to execute
2019-10-25 18:46:09.572548 W | etcdserver: read-only range request "key:\"/registry/deployments/\" range_end:\"/registry/deployments0\" limit:10000 " with result "range_response_count:1 size:1289" took too long (138.577944ms) to execute
2019-10-25 18:46:09.572861 W | etcdserver: read-only range request "key:\"/registry/daemonsets/\" range_end:\"/registry/daemonsets0\" limit:10000 " with result "range_response_count:1 size:1060" took too long (159.632747ms) to execute
2019-10-25 18:46:09.573554 W | etcdserver: read-only range request "key:\"/registry/daemonsets\" range_end:\"/registry/daemonsett\" count_only:true " with result "range_response_count:0 size:7" took too long (159.673077ms) to execute
2019-10-25 18:46:09.573679 W | etcdserver: read-only range request "key:\"/registry/controllers\" range_end:\"/registry/controllert\" count_only:true " with result "range_response_count:0 size:5" took too long (168.713817ms) to execute
2019-10-25 18:46:09.573879 W | etcdserver: read-only range request "key:\"/registry/controllers/\" range_end:\"/registry/controllers0\" limit:10000 " with result "range_response_count:0 size:5" took too long (168.748407ms) to execute
2019-10-25 18:46:09.574082 W | etcdserver: read-only range request "key:\"/registry/leases\" range_end:\"/registry/leaset\" count_only:true " with result "range_response_count:0 size:7" took too long (178.012634ms) to execute
2019-10-25 18:46:09.575447 W | etcdserver: read-only range request "key:\"/registry/leases/\" range_end:\"/registry/leases0\" limit:10000 " with result "range_response_count:6 size:2452" took too long (178.490785ms) to execute
2019-10-25 18:46:09.575609 W | etcdserver: read-only range request "key:\"/registry/leases\" range_end:\"/registry/leaset\" count_only:true " with result "range_response_count:0 size:7" took too long (187.102653ms) to execute
2019-10-25 18:46:09.575887 W | etcdserver: read-only range request "key:\"/registry/leases/\" range_end:\"/registry/leases0\" limit:10000 " with result "range_response_count:6 size:2452" took too long (187.172172ms) to execute
2019-10-25 18:46:10.951630 I | embed: rejected connection from "10.0.0.33:36544" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:18.172503 I | embed: rejected connection from "10.0.0.33:36620" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:19.172532 I | embed: rejected connection from "10.0.0.33:36626" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:20.827869 I | embed: rejected connection from "10.0.0.33:36636" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:28.172875 I | embed: rejected connection from "10.0.0.33:36672" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:29.172957 I | embed: rejected connection from "10.0.0.33:36684" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:30.529317 I | embed: rejected connection from "10.0.0.33:36688" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:32.810440 I | embed: rejected connection from "10.0.0.33:36698" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:38.173331 I | embed: rejected connection from "10.0.0.33:36720" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:39.173633 I | embed: rejected connection from "10.0.0.33:36728" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:40.892817 I | embed: rejected connection from "10.0.0.33:36736" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:43.169799 I | embed: rejected connection from "10.0.0.33:36752" (error "EOF", ServerName "")
2019-10-25 18:46:48.173585 I | embed: rejected connection from "10.0.0.33:36770" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:49.173673 I | embed: rejected connection from "10.0.0.33:36782" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:50.721219 I | embed: rejected connection from "10.0.0.33:36792" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:58.174266 I | embed: rejected connection from "10.0.0.33:36826" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:59.174582 I | embed: rejected connection from "10.0.0.33:36834" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:47:00.531769 I | embed: rejected connection from "10.0.0.33:36842" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:47:02.694785 I | embed: rejected connection from "10.0.0.33:36858" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:48:40.021002 I | mvcc: store.index: compact 2237
2019-10-25 18:48:40.022921 I | mvcc: finished scheduled compaction at 2237 (took 1.571358ms)
2019-10-25 18:53:40.023436 I | mvcc: store.index: compact 2989
2019-10-25 18:53:40.024863 I | mvcc: finished scheduled compaction at 2989 (took 1.073696ms)

ncdc commented Oct 25, 2019

Which etcd pod is that - the new one, or one of the good ones?

chuckha commented Oct 25, 2019

It's the one from the new node that failed to kubeadm join successfully.

ncdc commented Oct 25, 2019

Is something missing some certs somewhere?
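One quick way to check, given that etcd is logging "tls: bad certificate", is to verify that the apiserver's etcd client certificate actually chains to the CA etcd trusts. A stdlib-only sketch, assuming the standard kubeadm paths:

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"io/ioutil"
)

func main() {
	// The CA that etcd is configured to trust for client connections.
	caPEM, err := ioutil.ReadFile("/etc/kubernetes/pki/etcd/ca.crt")
	if err != nil {
		panic(err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		panic("no CA certs parsed from etcd ca.crt")
	}

	// The client cert the apiserver presents to etcd.
	clientPEM, err := ioutil.ReadFile("/etc/kubernetes/pki/apiserver-etcd-client.crt")
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(clientPEM)
	if block == nil {
		panic("no PEM block found in client cert")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}

	// etcd checks client certs for the ClientAuth extended key usage.
	_, err = cert.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	if err != nil {
		fmt.Println("client cert does NOT verify against etcd CA:", err)
		return
	}
	fmt.Println("client cert verifies against etcd CA")
}
```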

ncdc commented Oct 25, 2019

@chuckha if you still have it, could you please gist the full cloud-init log and the full etcd pod logs?

chuckha commented Oct 25, 2019

@ncdc and I paired on this one for a bit, but, as expected, everything looked fine. It seems to have been a transient issue that kubeadm was unable to handle: etcd did indeed start and join the cluster, eventually.

We wanted to tag @fabriziopandini, @timothysc and @mauilion to start a larger conversation around the robustness of kubeadm as a tool we can depend on for automated installs. This is not the first time we've run into issues where there is a recoverable blip in the system (network or otherwise) that kubeadm was unable to handle. One other such issue we encountered was kubernetes/kubeadm#1844 (a lack of retries for image pulling).

Are there things we could do to improve the robustness of kubeadm? Or is that outside the scope of the kubeadm project?
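For concreteness, the kind of retry wrapper being discussed could look roughly like the following, using the apimachinery wait helpers that the Kubernetes ecosystem already vendors; pullImage is a hypothetical stand-in for any step that can fail transiently (image pull, etcd membership check, API call):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// pullImage is a hypothetical stand-in for a transiently failing step.
func pullImage(name string) error {
	// ... call out to the container runtime here ...
	return nil
}

func main() {
	backoff := wait.Backoff{
		Duration: 2 * time.Second, // initial delay
		Factor:   2.0,             // double the delay on each retry
		Jitter:   0.1,
		Steps:    5, // give up after 5 attempts
	}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		if err := pullImage("k8s.gcr.io/etcd:3.3.10"); err != nil {
			fmt.Println("transient failure, retrying:", err)
			return false, nil // not done yet, retry after backoff
		}
		return true, nil // success, stop retrying
	})
	if err != nil {
		fmt.Println("giving up:", err)
	}
}
```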

ncdc commented Oct 25, 2019

Also, specifically: have any of you encountered issues where the apiserver static pod tries, for some period of time, to connect to etcd with a bad client cert, and then it magically clears itself up?

chuckha commented Oct 25, 2019

Cloud-init log on the node that did not become a master when it should have:
Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init-local' at Fri, 25 Oct 2019 18:45:12 +0000. Up 17.24 seconds.
Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init' at Fri, 25 Oct 2019 18:45:14 +0000. Up 19.68 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: |  eth0  | True |          10.0.0.33          | 255.255.255.0 | global | 06:a5:45:e6:58:5c |
ci-info: |  eth0  | True | fe80::4a5:45ff:fee6:585c/64 |       .       |  link  | 06:a5:45:e6:58:5c |
ci-info: |   lo   | True |          127.0.0.1          |   255.0.0.0   |  host  |         .         |
ci-info: |   lo   | True |           ::1/128           |       .       |  host  |         .         |
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: ++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
ci-info: +-------+-------------+----------+-----------------+-----------+-------+
ci-info: | Route | Destination | Gateway  |     Genmask     | Interface | Flags |
ci-info: +-------+-------------+----------+-----------------+-----------+-------+
ci-info: |   0   |   0.0.0.0   | 10.0.0.1 |     0.0.0.0     |    eth0   |   UG  |
ci-info: |   1   |   10.0.0.0  | 0.0.0.0  |  255.255.255.0  |    eth0   |   U   |
ci-info: |   2   |   10.0.0.1  | 0.0.0.0  | 255.255.255.255 |    eth0   |   UH  |
ci-info: +-------+-------------+----------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: |   1   |  fe80::/64  |    ::   |    eth0   |   U   |
ci-info: |   3   |    local    |    ::   |    eth0   |   U   |
ci-info: |   4   |   ff00::/8  |    ::   |    eth0   |   U   |
ci-info: +-------+-------------+---------+-----------+-------+
Generating public/private rsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_rsa_key.
Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub.
The key fingerprint is:
SHA256:Z+nJW9YGsFbeNhM5LjbfV8IHgzYv/B02ZTbH95z/gWc root@ip-10-0-0-33
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|             . o |
|          . = *.B|
|           O * OB|
|        S * O X==|
|         * o XoB=|
|          + o.=E=|
|           + .o +|
|          .     .|
+----[SHA256]-----+
Generating public/private dsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_dsa_key.
Your public key has been saved in /etc/ssh/ssh_host_dsa_key.pub.
The key fingerprint is:
SHA256:xA+PMNL6mbAsTKZb5xryIpZ3Pr4895J78f6uskvXLqI root@ip-10-0-0-33
The key's randomart image is:
+---[DSA 1024]----+
|                 |
|     . .         |
|    . + +        |
|     o + =       |
|  o o   S o      |
| = . + o .  .    |
|o * + + ..o. .   |
|oB *.+ +.+.o.    |
|+.+.===E*+*o=+   |
+----[SHA256]-----+
Generating public/private ecdsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_ecdsa_key.
Your public key has been saved in /etc/ssh/ssh_host_ecdsa_key.pub.
The key fingerprint is:
SHA256:gbzn8m3F7LVF3uCtLky88FHZKM7dq5EJFmeBnDtmqHE root@ip-10-0-0-33
The key's randomart image is:
+---[ECDSA 256]---+
|          . o.   |
|     . .   +  .  |
|      o . ...o + |
|       o E == = o|
|      . S oO.= *.|
|       +  o @ * *|
|      . .  B B +.|
|       o .. * +. |
|        ...  +o  |
+----[SHA256]-----+
Generating public/private ed25519 key pair.
Your identification has been saved in /etc/ssh/ssh_host_ed25519_key.
Your public key has been saved in /etc/ssh/ssh_host_ed25519_key.pub.
The key fingerprint is:
SHA256:d27zqBRx2vk2KJFhjYF5COTXOz7ePYYUJ43t6yjJVmg root@ip-10-0-0-33
The key's randomart image is:
+--[ED25519 256]--+
|     .o. +.      |
|     .  +..+     |
|      . ..* o+   |
|       . . O+.+  |
|        S O.+=   |
|         oE*oo.  |
|         o=+=.+. |
|         o==.B+. |
|         .oo+o+. |
+----[SHA256]-----+
Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'modules:config' at Fri, 25 Oct 2019 18:45:31 +0000. Up 36.76 seconds.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [ip-10-0-0-33.us-west-2.compute.internal kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local my-cluster-apiserver-777064584.us-west-2.elb.amazonaws.com] and IPs [10.96.0.1 10.0.0.33]
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [ip-10-0-0-33.us-west-2.compute.internal localhost] and IPs [10.0.0.33 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [ip-10-0-0-33.us-west-2.compute.internal localhost] and IPs [10.0.0.33 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'modules:final' at Fri, 25 Oct 2019 18:45:36 +0000. Up 41.33 seconds.
2019-10-25 18:47:03,173 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
2019-10-25 18:47:03,182 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2019-10-25 18:47:03,182 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 finished at Fri, 25 Oct 2019 18:47:03 +0000. Datasource DataSourceEc2Local.  Up 128.39 seconds
Etcd pod logs:
2019-10-25 18:45:53.692520 I | etcdmain: etcd Version: 3.3.10
2019-10-25 18:45:53.692923 I | etcdmain: Git SHA: 27fc7e2
2019-10-25 18:45:53.692929 I | etcdmain: Go Version: go1.10.4
2019-10-25 18:45:53.692934 I | etcdmain: Go OS/Arch: linux/amd64
2019-10-25 18:45:53.692939 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2019-10-25 18:45:53.693059 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = 
2019-10-25 18:45:53.694721 I | embed: listening for peers on https://10.0.0.33:2380
2019-10-25 18:45:53.694762 I | embed: listening for client requests on 10.0.0.33:2379
2019-10-25 18:45:53.694808 I | embed: listening for client requests on 127.0.0.1:2379
2019-10-25 18:45:54.622612 I | etcdserver: name = ip-10-0-0-33.us-west-2.compute.internal
2019-10-25 18:45:54.622634 I | etcdserver: data dir = /var/lib/etcd
2019-10-25 18:45:54.622642 I | etcdserver: member dir = /var/lib/etcd/member
2019-10-25 18:45:54.622647 I | etcdserver: heartbeat = 100ms
2019-10-25 18:45:54.622651 I | etcdserver: election = 1000ms
2019-10-25 18:45:54.622656 I | etcdserver: snapshot count = 10000
2019-10-25 18:45:54.622665 I | etcdserver: advertise client URLs = https://10.0.0.33:2379
2019-10-25 18:45:56.169521 I | etcdserver: starting member 787472b955109238 in cluster 18c4c43dc346df9a
2019-10-25 18:45:56.169562 I | raft: 787472b955109238 became follower at term 0
2019-10-25 18:45:56.169596 I | raft: newRaft 787472b955109238 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2019-10-25 18:45:56.169606 I | raft: 787472b955109238 became follower at term 1
2019-10-25 18:45:59.227793 W | auth: simple token is not cryptographically signed
2019-10-25 18:46:01.023881 I | rafthttp: started HTTP pipelining with peer 548ffd4cb5ba47dd
2019-10-25 18:46:01.023918 I | rafthttp: started HTTP pipelining with peer 583eaa54d93ebf12
2019-10-25 18:46:01.023978 I | rafthttp: started HTTP pipelining with peer a750b6bd7c140a58
2019-10-25 18:46:01.024000 I | rafthttp: starting peer 548ffd4cb5ba47dd...
2019-10-25 18:46:01.024140 I | rafthttp: started HTTP pipelining with peer 548ffd4cb5ba47dd
2019-10-25 18:46:01.032300 I | rafthttp: started streaming with peer 548ffd4cb5ba47dd (writer)
2019-10-25 18:46:01.032668 I | rafthttp: started streaming with peer 548ffd4cb5ba47dd (writer)
2019-10-25 18:46:01.033591 I | rafthttp: started peer 548ffd4cb5ba47dd
2019-10-25 18:46:01.033611 I | rafthttp: added peer 548ffd4cb5ba47dd
2019-10-25 18:46:01.033623 I | rafthttp: starting peer 583eaa54d93ebf12...
2019-10-25 18:46:01.033639 I | rafthttp: started HTTP pipelining with peer 583eaa54d93ebf12
2019-10-25 18:46:01.039243 I | rafthttp: started peer 583eaa54d93ebf12
2019-10-25 18:46:01.039284 I | rafthttp: added peer 583eaa54d93ebf12
2019-10-25 18:46:01.039298 I | rafthttp: starting peer a750b6bd7c140a58...
2019-10-25 18:46:01.039318 I | rafthttp: started HTTP pipelining with peer a750b6bd7c140a58
2019-10-25 18:46:01.041946 I | rafthttp: started streaming with peer 583eaa54d93ebf12 (writer)
2019-10-25 18:46:01.041971 I | rafthttp: started streaming with peer 583eaa54d93ebf12 (writer)
2019-10-25 18:46:01.042035 I | rafthttp: started streaming with peer 583eaa54d93ebf12 (stream MsgApp v2 reader)
2019-10-25 18:46:01.042213 I | rafthttp: started streaming with peer 583eaa54d93ebf12 (stream Message reader)
2019-10-25 18:46:01.042349 I | rafthttp: started streaming with peer 548ffd4cb5ba47dd (stream MsgApp v2 reader)
2019-10-25 18:46:01.042511 I | rafthttp: started streaming with peer a750b6bd7c140a58 (writer)
2019-10-25 18:46:01.042574 I | rafthttp: started streaming with peer 548ffd4cb5ba47dd (stream Message reader)
2019-10-25 18:46:01.043091 I | rafthttp: started streaming with peer a750b6bd7c140a58 (writer)
2019-10-25 18:46:01.044809 I | rafthttp: started peer a750b6bd7c140a58
2019-10-25 18:46:01.044843 I | rafthttp: added peer a750b6bd7c140a58
2019-10-25 18:46:01.044862 I | etcdserver: starting server... [version: 3.3.10, cluster version: to_be_decided]
2019-10-25 18:46:01.046926 I | rafthttp: started streaming with peer a750b6bd7c140a58 (stream MsgApp v2 reader)
2019-10-25 18:46:01.047113 I | rafthttp: started streaming with peer a750b6bd7c140a58 (stream Message reader)
2019-10-25 18:46:01.259064 I | rafthttp: peer 548ffd4cb5ba47dd became active
2019-10-25 18:46:01.259098 I | rafthttp: established a TCP streaming connection with peer 548ffd4cb5ba47dd (stream MsgApp v2 reader)
2019-10-25 18:46:01.264263 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = 
2019-10-25 18:46:01.708120 I | rafthttp: established a TCP streaming connection with peer 548ffd4cb5ba47dd (stream Message reader)
2019-10-25 18:46:01.713546 I | rafthttp: peer 583eaa54d93ebf12 became active
2019-10-25 18:46:01.713567 I | rafthttp: established a TCP streaming connection with peer 583eaa54d93ebf12 (stream Message reader)
2019-10-25 18:46:01.714125 I | rafthttp: peer a750b6bd7c140a58 became active
2019-10-25 18:46:01.714145 I | rafthttp: established a TCP streaming connection with peer a750b6bd7c140a58 (stream Message reader)
2019-10-25 18:46:01.714463 I | rafthttp: established a TCP streaming connection with peer a750b6bd7c140a58 (stream MsgApp v2 reader)
2019-10-25 18:46:01.714597 I | rafthttp: established a TCP streaming connection with peer 583eaa54d93ebf12 (stream MsgApp v2 reader)
2019-10-25 18:46:01.762928 I | etcdserver: 787472b955109238 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 3 active peer(s)
2019-10-25 18:46:01.786915 I | raft: 787472b955109238 [term: 1] received a MsgHeartbeat message with higher term from 583eaa54d93ebf12 [term: 10]
2019-10-25 18:46:01.786941 I | raft: 787472b955109238 became follower at term 10
2019-10-25 18:46:01.786951 I | raft: raft.node: 787472b955109238 elected leader 583eaa54d93ebf12 at term 10
2019-10-25 18:46:02.132799 I | rafthttp: established a TCP streaming connection with peer 583eaa54d93ebf12 (stream MsgApp v2 writer)
2019-10-25 18:46:02.133329 I | rafthttp: established a TCP streaming connection with peer 548ffd4cb5ba47dd (stream MsgApp v2 writer)
2019-10-25 18:46:02.133592 I | rafthttp: established a TCP streaming connection with peer a750b6bd7c140a58 (stream MsgApp v2 writer)
2019-10-25 18:46:02.133841 I | rafthttp: established a TCP streaming connection with peer 548ffd4cb5ba47dd (stream Message writer)
2019-10-25 18:46:02.134112 I | rafthttp: established a TCP streaming connection with peer 583eaa54d93ebf12 (stream Message writer)
2019-10-25 18:46:02.134319 I | rafthttp: established a TCP streaming connection with peer a750b6bd7c140a58 (stream Message writer)
2019-10-25 18:46:02.144818 W | rafthttp: closed an existing TCP streaming connection with peer 548ffd4cb5ba47dd (stream MsgApp v2 writer)
2019-10-25 18:46:02.232295 I | rafthttp: established a TCP streaming connection with peer 548ffd4cb5ba47dd (stream MsgApp v2 writer)
2019-10-25 18:46:02.232561 W | rafthttp: closed an existing TCP streaming connection with peer 583eaa54d93ebf12 (stream MsgApp v2 writer)
2019-10-25 18:46:02.232658 I | rafthttp: established a TCP streaming connection with peer 583eaa54d93ebf12 (stream MsgApp v2 writer)
2019-10-25 18:46:02.232730 W | rafthttp: closed an existing TCP streaming connection with peer 548ffd4cb5ba47dd (stream Message writer)
2019-10-25 18:46:02.232827 I | rafthttp: established a TCP streaming connection with peer 548ffd4cb5ba47dd (stream Message writer)
2019-10-25 18:46:02.232911 W | rafthttp: closed an existing TCP streaming connection with peer a750b6bd7c140a58 (stream Message writer)
2019-10-25 18:46:02.232982 I | rafthttp: established a TCP streaming connection with peer a750b6bd7c140a58 (stream Message writer)
2019-10-25 18:46:02.233081 W | rafthttp: closed an existing TCP streaming connection with peer a750b6bd7c140a58 (stream MsgApp v2 writer)
2019-10-25 18:46:02.233163 I | rafthttp: established a TCP streaming connection with peer a750b6bd7c140a58 (stream MsgApp v2 writer)
2019-10-25 18:46:02.233228 W | rafthttp: closed an existing TCP streaming connection with peer 583eaa54d93ebf12 (stream Message writer)
2019-10-25 18:46:02.233322 I | rafthttp: established a TCP streaming connection with peer 583eaa54d93ebf12 (stream Message writer)
2019-10-25 18:46:02.559794 I | etcdserver/membership: added member 40e0ff5ee27c98d0 [https://10.0.0.6:2380] to cluster 18c4c43dc346df9a
2019-10-25 18:46:02.559828 I | rafthttp: starting peer 40e0ff5ee27c98d0...
2019-10-25 18:46:02.559844 I | rafthttp: started HTTP pipelining with peer 40e0ff5ee27c98d0
2019-10-25 18:46:02.598979 I | rafthttp: started streaming with peer 40e0ff5ee27c98d0 (writer)
2019-10-25 18:46:02.599185 I | rafthttp: started streaming with peer 40e0ff5ee27c98d0 (writer)
2019-10-25 18:46:02.599216 I | rafthttp: started peer 40e0ff5ee27c98d0
2019-10-25 18:46:02.599273 I | rafthttp: added peer 40e0ff5ee27c98d0
2019-10-25 18:46:02.599436 N | etcdserver/membership: set the initial cluster version to 3.3
2019-10-25 18:46:02.599487 I | etcdserver/api: enabled capabilities for version 3.3
2019-10-25 18:46:02.603791 I | rafthttp: started streaming with peer 40e0ff5ee27c98d0 (stream Message reader)
2019-10-25 18:46:02.622369 I | rafthttp: started streaming with peer 40e0ff5ee27c98d0 (stream MsgApp v2 reader)
proto: no coders for int
proto: no encoder for ValueSize int [GetProperties]
2019-10-25 18:46:02.743390 W | etcdserver: request "header:<ID:11011422053164772752 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/ranges/serviceips\" mod_revision:0 > success:<request_put:<key:\"/registry/ranges/serviceips\" value_size:69 >> failure:<>>" with result "size:14" took too long (118.164746ms) to execute
2019-10-25 18:46:02.757966 I | etcdserver/membership: added member 548ffd4cb5ba47dd [https://10.0.0.94:2380] to cluster 18c4c43dc346df9a
2019-10-25 18:46:02.759168 I | etcdserver/membership: added member ae68fddb8a6f870b [https://10.0.0.167:2380] to cluster 18c4c43dc346df9a
2019-10-25 18:46:02.759223 I | rafthttp: starting peer ae68fddb8a6f870b...
2019-10-25 18:46:02.759303 I | rafthttp: started HTTP pipelining with peer ae68fddb8a6f870b
2019-10-25 18:46:02.761786 I | rafthttp: started streaming with peer ae68fddb8a6f870b (writer)
2019-10-25 18:46:02.772671 I | rafthttp: started streaming with peer ae68fddb8a6f870b (writer)
2019-10-25 18:46:02.833246 I | rafthttp: started peer ae68fddb8a6f870b
2019-10-25 18:46:02.872593 I | rafthttp: added peer ae68fddb8a6f870b
2019-10-25 18:46:02.872806 I | rafthttp: started streaming with peer ae68fddb8a6f870b (stream MsgApp v2 reader)
2019-10-25 18:46:02.873165 I | rafthttp: started streaming with peer ae68fddb8a6f870b (stream Message reader)
2019-10-25 18:46:03.154518 W | etcdserver: request "header:<ID:11011422053164773868 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/rolebindings/kube-system/kubeadm:nodes-kubeadm-config\" mod_revision:0 > success:<request_put:<key:\"/registry/rolebindings/kube-system/kubeadm:nodes-kubeadm-config\" value_size:363 >> failure:<>>" with result "size:5" took too long (281.681761ms) to execute
2019-10-25 18:46:03.230637 I | etcdserver/membership: added member 583eaa54d93ebf12 [https://10.0.0.181:2380] to cluster 18c4c43dc346df9a
2019-10-25 18:46:03.726336 I | mvcc: store.index: compact 1074
2019-10-25 18:46:03.727009 W | etcdserver: request "header:<ID:11011422053164777695 username:\"kube-apiserver-etcd-client\" auth_revision:1 > compaction:<revision:1074 > " with result "size:5" took too long (486.51911ms) to execute
2019-10-25 18:46:03.728453 I | mvcc: finished scheduled compaction at 1074 (took 1.121706ms)
2019-10-25 18:46:03.728761 I | etcdserver/membership: removed member 40e0ff5ee27c98d0 from cluster 18c4c43dc346df9a
2019-10-25 18:46:03.728778 I | rafthttp: stopping peer 40e0ff5ee27c98d0...
2019-10-25 18:46:03.728793 I | rafthttp: stopped streaming with peer 40e0ff5ee27c98d0 (writer)
2019-10-25 18:46:03.728882 I | rafthttp: stopped streaming with peer 40e0ff5ee27c98d0 (writer)
2019-10-25 18:46:03.728960 I | rafthttp: stopped HTTP pipelining with peer 40e0ff5ee27c98d0
2019-10-25 18:46:03.729033 I | rafthttp: stopped streaming with peer 40e0ff5ee27c98d0 (stream MsgApp v2 reader)
2019-10-25 18:46:03.729057 I | rafthttp: stopped streaming with peer 40e0ff5ee27c98d0 (stream Message reader)
2019-10-25 18:46:03.729111 I | rafthttp: stopped peer 40e0ff5ee27c98d0
2019-10-25 18:46:03.729135 I | rafthttp: removed peer 40e0ff5ee27c98d0
2019-10-25 18:46:03.744171 I | etcdserver/membership: added member a750b6bd7c140a58 [https://10.0.0.90:2380] to cluster 18c4c43dc346df9a
2019-10-25 18:46:03.746698 I | etcdserver/membership: removed member ae68fddb8a6f870b from cluster 18c4c43dc346df9a
2019-10-25 18:46:03.746718 I | rafthttp: stopping peer ae68fddb8a6f870b...
2019-10-25 18:46:03.746734 I | rafthttp: stopped streaming with peer ae68fddb8a6f870b (writer)
2019-10-25 18:46:03.746840 I | rafthttp: stopped streaming with peer ae68fddb8a6f870b (writer)
2019-10-25 18:46:03.746871 I | rafthttp: stopped HTTP pipelining with peer ae68fddb8a6f870b
2019-10-25 18:46:03.747005 I | rafthttp: stopped streaming with peer ae68fddb8a6f870b (stream MsgApp v2 reader)
2019-10-25 18:46:03.747025 I | rafthttp: stopped streaming with peer ae68fddb8a6f870b (stream Message reader)
2019-10-25 18:46:03.747031 I | rafthttp: stopped peer ae68fddb8a6f870b
2019-10-25 18:46:03.747097 I | rafthttp: removed peer ae68fddb8a6f870b
2019-10-25 18:46:03.757489 I | etcdserver/membership: added member 787472b955109238 [https://10.0.0.33:2380] to cluster 18c4c43dc346df9a
2019-10-25 18:46:03.762404 I | etcdserver: published {Name:ip-10-0-0-33.us-west-2.compute.internal ClientURLs:[https://10.0.0.33:2379]} to cluster 18c4c43dc346df9a
2019-10-25 18:46:03.762676 I | embed: ready to serve client requests
2019-10-25 18:46:03.848641 I | embed: ready to serve client requests
2019-10-25 18:46:04.264885 I | embed: serving client requests on 10.0.0.33:2379
2019-10-25 18:46:04.272806 I | embed: rejected connection from "10.0.0.33:36214" (error "EOF", ServerName "")
2019-10-25 18:46:04.277054 I | embed: rejected connection from "10.0.0.33:36228" (error "EOF", ServerName "")
2019-10-25 18:46:04.277211 I | embed: rejected connection from "10.0.0.33:36252" (error "EOF", ServerName "")
2019-10-25 18:46:04.284249 I | embed: serving client requests on 127.0.0.1:2379
2019-10-25 18:46:04.710241 W | rafthttp: lost the TCP streaming connection with peer 548ffd4cb5ba47dd (stream MsgApp v2 reader)
2019-10-25 18:46:04.711109 W | rafthttp: lost the TCP streaming connection with peer 548ffd4cb5ba47dd (stream Message reader)
2019-10-25 18:46:04.741021 E | rafthttp: failed to dial 548ffd4cb5ba47dd on stream MsgApp v2 (EOF)
2019-10-25 18:46:04.741096 I | rafthttp: peer 548ffd4cb5ba47dd became inactive (message send to peer failed)
2019-10-25 18:46:05.239809 I | embed: rejected connection from "10.0.0.94:46014" (error "read tcp 10.0.0.33:2380->10.0.0.94:46014: read: connection reset by peer", ServerName "")
2019-10-25 18:46:06.042094 W | rafthttp: health check for peer 548ffd4cb5ba47dd could not connect: dial tcp 10.0.0.94:2380: connect: connection refused (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2019-10-25 18:46:06.042288 W | rafthttp: health check for peer 548ffd4cb5ba47dd could not connect: dial tcp 10.0.0.94:2380: connect: connection refused (prober "ROUND_TRIPPER_SNAPSHOT")
2019-10-25 18:46:06.071445 W | etcdserver: request "header:<ID:5178416110811058421 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" mod_revision:2633 > success:<request_put:<key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" value_size:366 >> failure:<>>" with result "size:16" took too long (1.777811004s) to execute
2019-10-25 18:46:06.071807 I | etcdserver/membership: removed member 548ffd4cb5ba47dd from cluster 18c4c43dc346df9a
2019-10-25 18:46:06.071827 I | rafthttp: stopping peer 548ffd4cb5ba47dd...
2019-10-25 18:46:06.072177 I | rafthttp: closed the TCP streaming connection with peer 548ffd4cb5ba47dd (stream MsgApp v2 writer)
2019-10-25 18:46:06.072191 I | rafthttp: stopped streaming with peer 548ffd4cb5ba47dd (writer)
2019-10-25 18:46:06.072577 I | rafthttp: closed the TCP streaming connection with peer 548ffd4cb5ba47dd (stream Message writer)
2019-10-25 18:46:06.072594 I | rafthttp: stopped streaming with peer 548ffd4cb5ba47dd (writer)
2019-10-25 18:46:06.072615 I | rafthttp: stopped HTTP pipelining with peer 548ffd4cb5ba47dd
2019-10-25 18:46:06.072628 I | rafthttp: stopped streaming with peer 548ffd4cb5ba47dd (stream MsgApp v2 reader)
2019-10-25 18:46:06.072674 I | rafthttp: stopped streaming with peer 548ffd4cb5ba47dd (stream Message reader)
2019-10-25 18:46:06.072709 I | rafthttp: stopped peer 548ffd4cb5ba47dd
2019-10-25 18:46:06.072734 I | rafthttp: removed peer 548ffd4cb5ba47dd
2019-10-25 18:46:06.233077 W | wal: sync duration of 1.101617115s, expected less than 1s
2019-10-25 18:46:06.694841 W | etcdserver: request "header:<ID:13768187975200604220 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/minions/ip-10-0-0-94.us-west-2.compute.internal\" mod_revision:2531 > success:<request_put:<key:\"/registry/minions/ip-10-0-0-94.us-west-2.compute.internal\" value_size:2926 >> failure:<>>" with result "size:16" took too long (461.256113ms) to execute
2019-10-25 18:46:08.171830 I | embed: rejected connection from "10.0.0.33:36376" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:08.614328 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions\" range_end:\"/registry/apiextensions.k8s.io/customresourcedefinitiont\" count_only:true " with result "range_response_count:0 size:5" took too long (590.030795ms) to execute
2019-10-25 18:46:08.614655 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions/\" range_end:\"/registry/apiextensions.k8s.io/customresourcedefinitions0\" limit:10000 " with result "range_response_count:0 size:5" took too long (602.507946ms) to execute
2019-10-25 18:46:09.218253 I | embed: rejected connection from "10.0.0.33:36428" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:09.568821 W | etcdserver: read-only range request "key:\"/registry/certificatesigningrequests/\" range_end:\"/registry/certificatesigningrequests0\" limit:10000 " with result "range_response_count:10 size:18706" took too long (193.823471ms) to execute
2019-10-25 18:46:09.571006 W | etcdserver: read-only range request "key:\"/registry/certificatesigningrequests\" range_end:\"/registry/certificatesigningrequestt\" count_only:true " with result "range_response_count:0 size:7" took too long (195.952555ms) to execute
2019-10-25 18:46:09.571643 W | etcdserver: read-only range request "key:\"/registry/deployments\" range_end:\"/registry/deploymentt\" count_only:true " with result "range_response_count:0 size:7" took too long (138.46605ms) to execute
2019-10-25 18:46:09.572548 W | etcdserver: read-only range request "key:\"/registry/deployments/\" range_end:\"/registry/deployments0\" limit:10000 " with result "range_response_count:1 size:1289" took too long (138.577944ms) to execute
2019-10-25 18:46:09.572861 W | etcdserver: read-only range request "key:\"/registry/daemonsets/\" range_end:\"/registry/daemonsets0\" limit:10000 " with result "range_response_count:1 size:1060" took too long (159.632747ms) to execute
2019-10-25 18:46:09.573554 W | etcdserver: read-only range request "key:\"/registry/daemonsets\" range_end:\"/registry/daemonsett\" count_only:true " with result "range_response_count:0 size:7" took too long (159.673077ms) to execute
2019-10-25 18:46:09.573679 W | etcdserver: read-only range request "key:\"/registry/controllers\" range_end:\"/registry/controllert\" count_only:true " with result "range_response_count:0 size:5" took too long (168.713817ms) to execute
2019-10-25 18:46:09.573879 W | etcdserver: read-only range request "key:\"/registry/controllers/\" range_end:\"/registry/controllers0\" limit:10000 " with result "range_response_count:0 size:5" took too long (168.748407ms) to execute
2019-10-25 18:46:09.574082 W | etcdserver: read-only range request "key:\"/registry/leases\" range_end:\"/registry/leaset\" count_only:true " with result "range_response_count:0 size:7" took too long (178.012634ms) to execute
2019-10-25 18:46:09.575447 W | etcdserver: read-only range request "key:\"/registry/leases/\" range_end:\"/registry/leases0\" limit:10000 " with result "range_response_count:6 size:2452" took too long (178.490785ms) to execute
2019-10-25 18:46:09.575609 W | etcdserver: read-only range request "key:\"/registry/leases\" range_end:\"/registry/leaset\" count_only:true " with result "range_response_count:0 size:7" took too long (187.102653ms) to execute
2019-10-25 18:46:09.575887 W | etcdserver: read-only range request "key:\"/registry/leases/\" range_end:\"/registry/leases0\" limit:10000 " with result "range_response_count:6 size:2452" took too long (187.172172ms) to execute
2019-10-25 18:46:10.951630 I | embed: rejected connection from "10.0.0.33:36544" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:18.172503 I | embed: rejected connection from "10.0.0.33:36620" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:19.172532 I | embed: rejected connection from "10.0.0.33:36626" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:20.827869 I | embed: rejected connection from "10.0.0.33:36636" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:28.172875 I | embed: rejected connection from "10.0.0.33:36672" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:29.172957 I | embed: rejected connection from "10.0.0.33:36684" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:30.529317 I | embed: rejected connection from "10.0.0.33:36688" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:32.810440 I | embed: rejected connection from "10.0.0.33:36698" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:38.173331 I | embed: rejected connection from "10.0.0.33:36720" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:39.173633 I | embed: rejected connection from "10.0.0.33:36728" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:40.892817 I | embed: rejected connection from "10.0.0.33:36736" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:43.169799 I | embed: rejected connection from "10.0.0.33:36752" (error "EOF", ServerName "")
2019-10-25 18:46:48.173585 I | embed: rejected connection from "10.0.0.33:36770" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:49.173673 I | embed: rejected connection from "10.0.0.33:36782" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:50.721219 I | embed: rejected connection from "10.0.0.33:36792" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:58.174266 I | embed: rejected connection from "10.0.0.33:36826" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:46:59.174582 I | embed: rejected connection from "10.0.0.33:36834" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:47:00.531769 I | embed: rejected connection from "10.0.0.33:36842" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:47:02.694785 I | embed: rejected connection from "10.0.0.33:36858" (error "remote error: tls: bad certificate", ServerName "")
2019-10-25 18:48:40.021002 I | mvcc: store.index: compact 2237
2019-10-25 18:48:40.022921 I | mvcc: finished scheduled compaction at 2237 (took 1.571358ms)
2019-10-25 18:53:40.023436 I | mvcc: store.index: compact 2989
2019-10-25 18:53:40.024863 I | mvcc: finished scheduled compaction at 2989 (took 1.073696ms)
2019-10-25 18:58:40.033984 I | mvcc: store.index: compact 3596
2019-10-25 18:58:40.035358 I | mvcc: finished scheduled compaction at 3596 (took 925.4µs)
2019-10-25 19:03:40.040973 I | mvcc: store.index: compact 4201
2019-10-25 19:03:40.042197 I | mvcc: finished scheduled compaction at 4201 (took 908.541µs)
2019-10-25 19:08:40.057052 I | mvcc: store.index: compact 4807
2019-10-25 19:08:40.058338 I | mvcc: finished scheduled compaction at 4807 (took 927.838µs)
2019-10-25 19:13:40.070843 I | mvcc: store.index: compact 5414
2019-10-25 19:13:40.072160 I | mvcc: finished scheduled compaction at 5414 (took 941.345µs)
2019-10-25 19:18:40.079209 I | mvcc: store.index: compact 6015
2019-10-25 19:18:40.080531 I | mvcc: finished scheduled compaction at 6015 (took 975.149µs)
2019-10-25 19:23:40.098631 I | mvcc: store.index: compact 6615
2019-10-25 19:23:40.100010 I | mvcc: finished scheduled compaction at 6615 (took 943.17µs)
2019-10-25 19:28:34.770009 I | etcdserver: start to snapshot (applied: 10001, lastsnap: 0)
2019-10-25 19:28:34.774648 I | etcdserver: saved snapshot at index 10001
2019-10-25 19:28:34.774834 I | etcdserver: compacted raft log at 5001
2019-10-25 19:28:40.098039 I | mvcc: store.index: compact 7217
2019-10-25 19:28:40.099386 I | mvcc: finished scheduled compaction at 7217 (took 936.324µs)
2019-10-25 19:33:40.104530 I | mvcc: store.index: compact 7818
2019-10-25 19:33:40.105923 I | mvcc: finished scheduled compaction at 7818 (took 955.293µs)

fabriziopandini commented Oct 27, 2019

@chuckha I don't have full context on how the upgrade tool works, so I might be completely off track, but something in this sequence seems strange to me.

From what I understand from the logs, kubeadm is running an init sequence, so it generates new certificates for all components, etcd included.

[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [ip-10-0-0-33.us-west-2.compute.internal localhost] and IPs [10.0.0.33 127.0.0.1 ::1]

Then the new etcd instance starts on https://10.0.0.33:2379 and, if I'm not mistaken, it tries to add two existing etcd members:

2019-10-25 18:46:02.757966 I | etcdserver/membership: added member 548ffd4cb5ba47dd [https://10.0.0.94:2380] to cluster 18c4c43dc346df9a
2019-10-25 18:46:02.759168 I | etcdserver/membership: added member ae68fddb8a6f870b [https://10.0.0.167:2380] to cluster 18c4c43dc346df9a

What is not clear to me is if/how the existing and new members are expected to communicate, considering the new member has a newly generated set of PKI certificates.

More generally: is it correct to start an upgrade sequence with a plain kubeadm init, or should we share certificates first (or perhaps use kubeadm join instead of kubeadm init)?

to start a larger conversation around the robustness of kubeadm as a tool we can depend on for automated installs

Happy to start this discussion. Lately, we have been adding more and more retries to make kubeadm resilient to conditions that can happen in the wild, but we should keep in mind that those are workarounds, and in some cases retries might hide real problems (see e.g. the discussion on kubernetes/kubeadm#1793, which was fixed by kubernetes-retired/cluster-api-bootstrap-provider-kubeadm#250).

Also, kubernetes/kubeadm#1793 might be relevant for this issue.

ncdc commented Oct 28, 2019

It definitely should be doing a join. That's what it's coded to do. We might have a bug somewhere.

fabriziopandini commented Oct 28, 2019

After a further deep dive (thanks for the help!):

There are no problems in the sequence or in the certificates.

Most probably, the problem is a combination of:

As a temporary workaround, a possible solution is to make the upgrade tool wait for all etcd members to report status before moving to the next node (#124).

A longer-term solution is to review the kubeadm wait logic and make it wait for the status of the joining member only (not for the status of all the members).
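A rough sketch of that longer-term idea, again with the etcd v3 client; the endpoint would be the joining member's own listener (e.g. https://127.0.0.1:2379), and the function name is hypothetical:

```go
package upgradewait

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
)

// waitForLocalMember polls only the joining member's own endpoint and
// returns once that member answers Status and knows a leader, instead of
// requiring every member of the cluster to respond.
func waitForLocalMember(cli *clientv3.Client, endpoint string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		resp, err := cli.Status(ctx, endpoint)
		cancel()
		if err == nil && resp.Leader != 0 {
			return nil // the local member is up and has seen a leader
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("timed out waiting for local etcd member at %s", endpoint)
}
```

Polling only the local endpoint would sidestep the situation seen above, where the cluster as a whole was churning (members being added and removed) while the new member itself eventually came up fine.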
