When killed, an etcd v3 node can require manual intervention to bring back #7628
To get this node to come back, I needed to manually delete the latest snapshot and WAL file, and have it recover the rest from the other nodes in the cluster.
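A minimal sketch of that manual recovery, assuming the `/data/etcd/member` layout from the error message quoted later in this issue and a member that can re-sync from its peers:

```bash
# Hedged sketch of the manual recovery described above; the path is an assumption
# based on the error message in the original report. Only do this on a member
# that can recover the remaining state from the rest of the cluster.
cd /data/etcd/member
rm "$(ls -t snap/*.snap | head -n1)"   # delete the newest snapshot
rm "$(ls -t wal/*.wal | head -n1)"     # delete the newest WAL file
# then restart the etcd member so it re-syncs from its peers
```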
Possible candidate for failpoint testing. Start etcd so it snapshots frequently and inject a sleep before the db sync, kill it, then check if the node restarts cleanly.
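A rough sketch of that test, assuming an etcd binary built with gofail failpoints enabled; the failpoint name `beforeCommitDB` is hypothetical and only stands in for "a sleep before the db sync":

```bash
# Illustrative only: the failpoint name "beforeCommitDB" is hypothetical and the
# gofail syntax assumes an etcd binary built with failpoints enabled.
# --snapshot-count=10 makes etcd snapshot frequently so the window is easy to hit.
GOFAIL_FAILPOINTS='beforeCommitDB=sleep(2000)' \
  etcd --name t1 --data-dir /tmp/t1.etcd --snapshot-count=10 &
ETCD_PID=$!

# Drive enough writes to trigger several snapshots, then hard-kill the member.
for i in $(seq 1 100); do ETCDCTL_API=3 etcdctl put "k$i" "v$i"; done
kill -9 "$ETCD_PID"

# A clean restart is the pass condition; the bug shows up as an exit with
# "database file ... does not match with snapshot" on boot.
etcd --name t1 --data-dir /tmp/t1.etcd --snapshot-count=10
```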
@doodles526 could you provide me the exact steps to reproduce the same issue?
Steps to reproduce the above issue (a hedged sketch of the commands follows this list):
1. Restore the snapshot to a data directory.
2. Create the 1st member of a new cluster using the restored data-dir.
3. Add a 2nd member to the cluster.
4. Start the etcd server for the 2nd member.
5. Kill and restart the etcd server for the 2nd member.
6. The 2nd member exits with an error on restart.
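A minimal sketch of those steps; the member names, ports, and paths below are assumptions and do not come from the original report:

```bash
# Illustrative reproduction; member names, URLs, and paths are assumptions.
export ETCDCTL_API=3

# 1. Restore the snapshot into a fresh data dir and start the 1st member from it.
etcdctl snapshot restore snap.db --name m1 --data-dir /tmp/m1.etcd \
  --initial-cluster m1=http://127.0.0.1:2380 \
  --initial-advertise-peer-urls http://127.0.0.1:2380
etcd --name m1 --data-dir /tmp/m1.etcd \
  --listen-peer-urls http://127.0.0.1:2380 --initial-advertise-peer-urls http://127.0.0.1:2380 \
  --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 &

# 2. Register the 2nd member, then start it; it receives a snapshot from the leader.
etcdctl --endpoints=http://127.0.0.1:2379 member add m2 --peer-urls=http://127.0.0.1:3380
etcd --name m2 --data-dir /tmp/m2.etcd \
  --listen-peer-urls http://127.0.0.1:3380 --initial-advertise-peer-urls http://127.0.0.1:3380 \
  --listen-client-urls http://127.0.0.1:3379 --advertise-client-urls http://127.0.0.1:3379 \
  --initial-cluster m1=http://127.0.0.1:2380,m2=http://127.0.0.1:3380 \
  --initial-cluster-state existing &
M2_PID=$!

# 3. Hard-kill the 2nd member and restart it with the same flags; the failure
#    mode is an exit on startup instead of a clean rejoin.
kill -9 "$M2_PID"
```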
The snapshot that I used came from writing just one key-value pair to a single-member etcd cluster.
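Roughly how such a snapshot could be produced (illustrative; the endpoint, key, and paths are assumptions):

```bash
# One-member cluster with default listen URLs; write a single key, then snapshot.
export ETCDCTL_API=3
etcd --name s1 --data-dir /tmp/s1.etcd &
sleep 1   # give the server a moment to come up
etcdctl --endpoints=http://127.0.0.1:2379 put foo bar
etcdctl --endpoints=http://127.0.0.1:2379 snapshot save snap.db
```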
@hasbro17 that looks like a separate issue. There's no
@heyitsanthony done #7834
In the case that a follower receives a snapshot from the leader and crashes before renaming xxx.snap.db to db, restarting the follower results in loading the old db. This causes an index mismatch between the snapshot metadata index and the consistent index from the db. This PR fixes the above on init of etcdserver by: 1. checking whether xxx.snap.db (xxx == snapshot.Metadata.Index) exists; 2. renaming xxx.snap.db to db if it exists; 3. loading the backend again with the new db file. FIXES etcd-io#7628
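For context, a manual on-disk analog of what this change does automatically at startup might look like the following; the index-named file is an invented example, not taken from the report:

```bash
# Illustrative only; the file name is a made-up example.
# A snapshot received from the leader is first written as <index>.snap.db under
# the snap directory; if the crash happened before the rename, the fix renames
# it to "db" on boot and reloads the backend from it.
cd /data/etcd/member/snap
ls *.snap.db                      # e.g. 00000000007b4a00.snap.db (hypothetical)
mv 00000000007b4a00.snap.db db    # what the fix performs when the index matches
```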
…nap files
In the case that a follower receives a snapshot from the leader and crashes before renaming xxx.snap.db to db, but after the snapshot has been persisted to .wal and .snap, restarting the follower results in loading the old db, the new .wal, and the new .snap. This causes an index mismatch between the snapshot metadata index and the consistent index from the db. This PR forces an ordering where saving/renaming the db must happen before the snapshot is persisted to the wal and snap files. This ensures the db file can never be newer than the wal and snap files; hence, it guarantees the invariant snapshot.Metadata.Index <= db.ConsistentIndex() in NewServer() when checking the validity of the db and snap files. FIXES etcd-io#7628
…ap files
In the case that a follower receives a snapshot from the leader and crashes before renaming xxx.snap.db to db, but after the snapshot has been persisted to .wal and .snap, restarting the follower results in loading the old db, the new .wal, and the new .snap. This causes an index mismatch between the snapshot metadata index and the consistent index from the db. This PR forces an ordering where saving/renaming the db must happen after the snapshot is persisted to the wal and snap files. This guarantees the wal and snap files are newer than the db. On server restart, the etcd server checks whether the snapshot index > the db consistent index; if so, it attempts to load xxx.snap.db (where xxx = snapshot index) if present, and panics otherwise. FIXES etcd-io#7628
This was discovered when running a benchmark on a 3 node etcd cluster. The issue was only produced on a single node.
It appears that the BoltDB backend lags behind the snapshots written to disk, as a hard kill to an etcd member can result in
etcdmain: database file (/data/etcd/member/snap/db index 7622690) does not match with snapshot (index 8081536).
upon starting back up. After getting this error and the node starting to flap, you are able to fix the issue by deleting the latest snapshot and WAL file. It appears that the snapshot is written to disk before the db file is written to, and that upon booting etcd doesn't have an automated method of recovery.
Log output