
Inconsistency in writing to etcd (v3.0.14) - I think I have a broken cluster #7533

Closed
eran-totango opened this issue Mar 19, 2017 · 16 comments


@eran-totango

Hi,

I think that one of my clusters is broken.

In a working cluster (our test env):
When changing a setting in a Kubernetes deployment spec (for example, the number of replicas),
all of my etcd servers (01, 02 and 03) are immediately updated with the new setting,
which I verify by running ETCDCTL_API=3 etcdctl get /registry/deployments/default/
on each server.

In my broken cluster (our production env):
when I change a setting in a deployment, only etcd-01 is updated with the new settings; 02 and 03 aren't being updated.

This issue causes major problems in our production environment.
For example, if kube-apiserver restarts, it picks up an old configuration, which causes our microservices to run with old versions, etc.

I think I have a lead on the root cause, though I'm not sure:
a few weeks ago, etcd-prod03 died (it failed its EC2 status checks).
We were having problems while trying to replace it with a new server, but we eventually made it.
Could that be the problem? That even though it has successfully connected to the cluster,
we're facing problems because of it?

When I run ETCDCTL_API=3 etcdctl endpoint status,
I get:

etcd-prod01: 127.0.0.1:2379, e2f74e1dab85cd6, 3.0.14, 20 MB, false, 5568, 51548878
etcd-prod02: 127.0.0.1:2379, 89bb99adae595af9, 3.0.14, 55 MB, false, 5568, 51548922
etcd-prod03: 127.0.0.1:2379, 44170dda23246fe, 3.0.14, 55 MB, true, 5568, 51548944

What should I do in such a case?
Any help would be appreciated. Thanks.

@philips
Contributor

philips commented Mar 19, 2017

How did you recover the new server? Did you try to restore from backup, or did you do something else?

@eran-totango
Author

@philips I didn't restore from backup.
This server was added as a "new" server to the cluster.

@heyitsanthony
Contributor

Updates should be visible on a majority of members, so something is clearly wrong.

> when I change a setting in a deployment, only etcd-01 is updated with the new settings; 02 and 03 aren't being updated.

Was this tested with etcdctl get, the same way as on the test env?

> etcd-prod01: 127.0.0.1:2379, e2f74e1dab85cd6, 3.0.14, 20 MB, false, 5568, 51548878
> etcd-prod02: 127.0.0.1:2379, 89bb99adae595af9, 3.0.14, 55 MB, false, 5568, 51548922
> etcd-prod03: 127.0.0.1:2379, 44170dda23246fe, 3.0.14, 55 MB, true, 5568, 51548944

This is strange because if 01 is accepting updates that aren't visible on 02 and 03, it should have a raft index larger than the other members, but instead it has 51548878 < 51548922, 51548944.

> We were having problems while trying to replace it with a new server, but we eventually made it. Could that be the problem? That even though it has successfully connected to the cluster,
> we're facing problems because of it?

"eventually made it" sounds like something could be misconfigured. What were the problems / what was the workaround?

The following information should help with debugging:

  1. etcd server logs for each member
  2. ETCDCTL_API=3 ./bin/etcdctl -w json get abc for each member
  3. ETCDCTL_API=3 ./bin/etcdctl member list for each member
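
For example, items 2 and 3 can be gathered from every member in one pass with something like the following sketch (the client URLs are assumptions based on your peer hostnames; substitute whatever your members actually listen on):

for ep in http://etcd-prod01.internal:2379 http://etcd-prod02.internal:2379 http://etcd-prod03.internal:2379; do
  echo "== $ep =="                                               # label each member's output
  ETCDCTL_API=3 ./bin/etcdctl --endpoints $ep -w json get abc    # per-member revision / raft term
  ETCDCTL_API=3 ./bin/etcdctl --endpoints $ep member list        # membership as that member sees it
done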

@eran-totango
Author

eran-totango commented Mar 20, 2017

@heyitsanthony yeah, it was tested just like on the test env.

Could it be that whenever kube-apiserver starts, it picks one of the etcd servers to talk to?
That would maybe explain why it currently writes to 01 (and before its latest restart it might have written to another server). Not sure though.

ETCDCTL_API=3 ./bin/etcdctl -w json get abc :

etcd-prod01 - {"header":{"cluster_id":16818498639733619310,"member_id":1022164153822371030,"revision":19787000,"raft_term":5568}}
etcd-prod02 - {"header":{"cluster_id":16818498639733619310,"member_id":9924695175074503417,"revision":15390316,"raft_term":5568}}
etcd-prod03 - {"header":{"cluster_id":16818498639733619310,"member_id":306650346849191678,"revision":15390319,"raft_term":5568}}

ETCDCTL_API=3 ./bin/etcdctl member list

gives the exact same output on all 3 servers:
44170dda23246fe, started, etcd-prod03, http://etcd-prod03.internal:2380, http://10.0.107.64:2379
e2f74e1dab85cd6, started, etcd-prod01, http://etcd-prod01.internal:2380, http://10.0.107.200:2379
89bb99adae595af9, started, etcd-prod02, http://etcd-prod02.internal:2380, http://10.0.107.60:2379

server logs:
etcd-prod01 : https://ufile.io/bcf0c
etcd-prod02 : https://ufile.io/ff8ed
etcd-prod03: https://ufile.io/96da81

@heyitsanthony
Contributor

> Could it be that whenever kube-apiserver starts, it picks one of the etcd servers to talk to?

It shouldn't matter so long as the requests go through consensus (which they do).

> etcd-prod01 - {"header":{"cluster_id":16818498639733619310,"member_id":1022164153822371030,"revision":19787000,"raft_term":5568}}
> etcd-prod02 - {"header":{"cluster_id":16818498639733619310,"member_id":9924695175074503417,"revision":15390316,"raft_term":5568}}
> etcd-prod03 - {"header":{"cluster_id":16818498639733619310,"member_id":306650346849191678,"revision":15390319,"raft_term":5568}}

This is bad. 01's revision is way ahead of 02 and 03.

What steps were taken to add 03? @xiang90 thinks it's using an old snapshot that's out of sync with the raft log, which is causing k8s's compare-and-swaps to fail on 02 and 03 but succeed on 01 (there's a safeguard for that in 3.1.0 now). Would it be possible to send the etcd data directories (the snap and wal directories) for each member to team-etcd@coreos.com to confirm?

As a workaround, the easiest fix would be to etcdctl member remove etcd-02 and etcd-03 through etcd-01's endpoint (backing up their member directories first), then etcdctl member add them back with fresh data directories.
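
A rough sketch of that sequence, run on etcd-prod01 (member IDs taken from the endpoint status output above; double-check them with member list first, and verify the exact flags against your etcdctl version):

# back up /mnt/etcd on etcd-prod02, then remove it through 01's endpoint
ETCDCTL_API=3 etcdctl --endpoints http://127.0.0.1:2379 member remove 89bb99adae595af9
# announce 02's return so it can rejoin with a fresh data directory
ETCDCTL_API=3 etcdctl --endpoints http://127.0.0.1:2379 member add etcd-prod02 --peer-urls=http://etcd-prod02.internal:2380
# on etcd-prod02: start etcd with an empty --data-dir and --initial-cluster-state existing,
# wait for it to catch up, then repeat the same steps for etcd-prod03 (44170dda23246fe)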

@eran-totango
Author

@heyitsanthony
In order to re-add 03 to the cluster, we launched a new EC2 instance and used this configuration:

etcd --name etcd-prod03 \
--initial-advertise-peer-urls http://10.0.107.200:2380 \
--listen-peer-urls http://10.0.107.200:2380 \
--listen-client-urls http://10.0.107.200:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.107.200:2379 \
--initial-cluster etcd-prod01=http://etcd-prod01.internal:2380,etcd-prod02=http://etcd-prod02.internal:2380,etcd-prod03=http://etcd-prod03.internal:2380 \
--initial-cluster-state new \
--data-dir /mnt/etcd

It didn't work, so we used this thread to solve it:
#2780
From what I remember, we deleted the data directory and used --initial-cluster-state existing.

If I want to remove 02 and 03 and then re-add them with fresh data directories, what flags should I use? The current command I use to start the etcd binary is this:

etcd --name <node_name> \
--initial-advertise-peer-urls http://<node_ip>:2380 \
--listen-peer-urls http://<node_ip>:2380 \
--listen-client-urls http://<node_ip>:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://<node_ip>:2379 \
--initial-cluster etcd-prod01=http://etcd-prod01.internal:2380,etcd-prod02=http://etcd-prod02.internal:2380,etcd-prod03=http://etcd-prod03.internal:2380 \
--initial-cluster-state new --data-dir /mnt/etcd

I'd prefer not to send you the data directories, since they contain information about our production environment, such as secret keys, etc.
Is there anything I can check for you instead?

@heyitsanthony
Contributor

@eran-totango that command looks OK. Some comments:

Note that there'll be a brief loss of availability when going from 1->2 nodes, since the cluster has to wait until the second member comes up. It's possible to do this without a major outage (there'll only be a short leader election) by removing/adding the leader node until 01 is elected (this will be reflected in etcdctl endpoint status and the logs), then removing/adding the remaining member.
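
Roughly (the leader is whichever member shows true in endpoint status; the client URLs below are taken from your member list):

# check which member currently leads
ETCDCTL_API=3 etcdctl --endpoints http://10.0.107.200:2379,http://10.0.107.60:2379,http://10.0.107.64:2379 endpoint status
# per the output above, 03 leads right now: remove/re-add it with a fresh data dir,
# re-check endpoint status, and once 01 is the leader, remove/re-add whichever
# members still have the old data, one at a time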

No thoughts on what to do without direct access to the wal/snap. /cc @xiang90 any thoughts?

@xiang90
Contributor

xiang90 commented Mar 20, 2017

@heyitsanthony We can write a tool to clear out the actual values and leave the metadata.

@eran-totango We do want to figure out the root cause of this issue. If you're willing to help, we can probably hack together a tool to wipe out the sensitive data before you send anything to us.

@eran-totango
Author

@xiang90 sure, let's do it.

@eran-totango
Author

eran-totango commented Mar 21, 2017

@heyitsanthony @xiang90
I'm now having problems when trying to restore from a snapshot :(

  1. I created a snapshot of etcd-prod01 using this command:
    ETCDCTL_API=3 etcdctl --endpoints http://localhost:2379 snapshot save snapshot.db
  2. I created a new etcd cluster from scratch
    (etcd-prod-01, etcd-prod-02, etcd-prod-03 instead of etcd-prod01, etcd-prod02, etcd-prod03).
  3. I copied the etcd-prod01 snapshot file to etcd-prod-01.
  4. I tried to restore it with this command,
    run from etcd-prod-01 (the new server):
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
--name etcd-prod-01 \
--initial-cluster etcd-prod-01=http://etcd-prod-01.internal:2380,etcd-prod-02=http://etcd-prod-02.internal:2380,etcd-prod-03=http://etcd-prod-03.internal:2380 \
--initial-advertise-peer-urls http://10.0.107.103:2380

and I'm getting this error:

2017-03-21 11:37:16.057468 I | netutil: resolving etcd-prod-01.internal:2380 to 10.0.107.103:2380
2017-03-21 11:37:16.253211 I | mvcc: restore compact to 19944978
panic: no lessor to attach lease

goroutine 1 [running]:
panic(0xc60aa0, 0xc82000c300)
	/usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.(*store).restore(0xc8200b0180, 0x0, 0x0)
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/kvstore.go:420 +0x1335
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.NewStore(0x7f2fcd3af138, 0xc820219b60, 0x0, 0x0, 0x7f2fcd3af190, 0xc8201b55a8, 0x20)
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/kvstore.go:120 +0x39e
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdctl/ctlv3/command.makeDB(0xc8201c4fe0, 0x1d, 0x7ffe3c7d0808, 0xb, 0x3)
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdctl/ctlv3/command/snapshot_command.go:374 +0xaef
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdctl/ctlv3/command.snapshotRestoreCommandFunc(0xc8201ba200, 0xc8201bc380, 0x1, 0x7)
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdctl/ctlv3/command/snapshot_command.go:194 +0x66a
github.com/coreos/etcd/cmd/vendor/github.com/spf13/cobra.(*Command).execute(0xc8201ba200, 0xc8201bc310, 0x7, 0x7, 0x0, 0x0)
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/spf13/cobra/command.go:572 +0x85a
github.com/coreos/etcd/cmd/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1592160, 0xc8201ba200, 0x0, 0x0)
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/spf13/cobra/command.go:662 +0x53f
github.com/coreos/etcd/cmd/vendor/github.com/spf13/cobra.(*Command).Execute(0x1592160, 0x0, 0x0)
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/spf13/cobra/command.go:618 +0x2d
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdctl/ctlv3.Start()
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdctl/ctlv3/ctl.go:96 +0x8f
main.main()
	/home/gyuho/go/src/github.com/coreos/etcd/cmd/etcdctl/main.go:40 +0x111

I did the exact same steps (1-4) in my test environment and it worked perfectly.
Both clusters are running v3.0.14. What etcd version should I use?
(I'm running a standalone Kubernetes cluster, v1.5.2.)

@xiang90
Contributor

xiang90 commented Mar 21, 2017

@eran-totango k8s never works with etcd 3.0.14. It works with etcd 3.0.17+.

@xiang90
Contributor

xiang90 commented Mar 21, 2017

@eran-totango Well, I was wrong. etcd 3.0.12+ should be OK.

But have you ever run your cluster with a previous version of etcd? Or was it created with etcd 3.0.14?

@xiang90
Contributor

xiang90 commented Mar 21, 2017

> 2017-03-21 11:37:16.253211 I | mvcc: restore compact to 19944978
> panic: no lessor to attach lease

This is fixed by #7203.

You need a newer version of etcdctl to recover the backup. Try etcd 3.0.17.
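
For example (the tarball URL follows the usual etcd release naming; adjust for your platform, and reuse the same restore flags as before):

curl -L https://github.com/coreos/etcd/releases/download/v3.0.17/etcd-v3.0.17-linux-amd64.tar.gz -o etcd-v3.0.17-linux-amd64.tar.gz
tar xzvf etcd-v3.0.17-linux-amd64.tar.gz
# re-run the restore with the 3.0.17 etcdctl binary
ETCDCTL_API=3 ./etcd-v3.0.17-linux-amd64/etcdctl snapshot restore snapshot.db \
--name etcd-prod-01 \
--initial-cluster etcd-prod-01=http://etcd-prod-01.internal:2380,etcd-prod-02=http://etcd-prod-02.internal:2380,etcd-prod-03=http://etcd-prod-03.internal:2380 \
--initial-advertise-peer-urls http://10.0.107.103:2380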

@eran-totango
Author

eran-totango commented Mar 21, 2017

@xiang90 When I created the Kubernetes cluster I was running etcd v3.0.7, and then I upgraded to v3.0.14. I'll try to restore with etcdctl v3.0.17 and will let you know.

Should I upgrade my etcd to v3.0.17 as well?

@heyitsanthony
Contributor

@eran-totango yes, upgrading is recommended

@heyitsanthony
Contributor

This appears to be a configuration issue, and possibly a state machine inconsistency that has since been fixed; not much else to do here. Closing.

This issue was closed.