The rejoined node can not send out the events for watchers #8411
The version is 3.1.9.
@abel-von can you write a simple script/program to reproduce this problem?
@xiang90 I have changed the code to just sync the watchers after restore. But the master branch is 3.2.x and the mvcc code has changed a lot, so I will check whether the issue still exists on master; if it does, I will submit a PR. The steps to reproduce this issue:
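Below is a rough sketch of the reproduction flow as it is summarized in the reply that follows, using the clientv3 API. The import path shown is the current one (at the time of this issue it was github.com/coreos/etcd/clientv3); endpoints, key names, and the write count are placeholders, and the network partition itself has to be created outside the program, for example with iptables rules between the watched member and the leader.

```go
// Reproduction sketch: watch through one member, write through another
// until the leader takes a raft snapshot, then heal the partition and
// see whether the watch ever delivers the missed events.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Watch through the member that will be partitioned from the leader.
	watchCli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://member-a:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer watchCli.Close()

	// Write through a member that stays connected to the leader.
	writeCli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://member-b:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer writeCli.Close()

	wch := watchCli.Watch(context.Background(), "/repro/key")

	// While member-a is partitioned, write enough values to trigger a raft
	// snapshot on the leader (the default threshold in the 3.1/3.2 era was
	// 10000 entries; lowering --snapshot-count makes this much faster).
	go func() {
		for i := 0; i < 20000; i++ {
			if _, err := writeCli.Put(context.Background(), "/repro/key", fmt.Sprintf("v%d", i)); err != nil {
				log.Println("put error:", err)
			}
		}
	}()

	// After the partition is healed, the watch should deliver the missed
	// events; in the reported bug it stays silent forever.
	for resp := range wch {
		for _, ev := range resp.Events {
			log.Printf("event: %s %q -> %q", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```

In the reported bug, the final loop never prints anything after the partition heals, even though new values were written to the watched key.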
OK, so the repro is: watch on some member, partition it, write values into the watched key on another member until a snapshot is triggered, unpartition the member, and then wait forever on the watch? I wouldn't be surprised if this breaks in 3.2 too; the restore+watch path isn't very well tested.
@abel-von, I ran into the same issue in my environment, thanks for your fix. As you mentioned, it works correctly without this PR if the network is only cut for a short time. So which is the root cause: the node being partitioned for a long time, or enough events (more than 10000 updates) happening during the partition?
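The explanation in the issue description below suggests the trigger is the number of committed entries rather than wall-clock time: once enough writes accumulate for the leader to take a raft snapshot and compact its log, it can only catch the rejoined member up with a snapshot. To check this with far fewer writes, the snapshot threshold can be lowered; the sketch below uses the embed package for a single member only (a real repro needs a multi-member cluster), the SnapshotCount field is assumed to correspond to the --snapshot-count server flag, and the import path is the current one rather than the github.com/coreos/etcd path of the 3.1/3.2 line.

```go
// Start an embedded etcd member with a low snapshot threshold so a raft
// snapshot (and subsequent WAL compaction) happens after only ~100 writes.
package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "repro.etcd"
	// Maps to the --snapshot-count flag; the era-appropriate default was 10000.
	cfg.SnapshotCount = 100

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	select {
	case <-e.Server.ReadyNotify():
		log.Println("embedded etcd is ready; partition it and write through another member")
	case <-time.After(30 * time.Second):
		log.Fatal("etcd took too long to start")
	}

	// Keep the server running for the experiment.
	<-e.Err()
}
```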
Hi, we are doing some HA testing of etcd and Kubernetes, and one of the test cases is to cut the network between one node of the etcd cluster and the leader. As we know, etcd can still work even when some of the nodes are down. But when we keep the partition in place for a long time (one or two hours) and then recover it, we find that the kube-apiserver cannot refresh its cache to the newest values in etcd.
After some investigation, we found that the problem lies in etcd's watch mechanism: the key-change events are not sent out to the apiserver.
We also found that if the network is cut for only a short time, everything works correctly; the issue only appears when the network is cut for a long time.
After reproducing and investigating this, we found that it happens because the WAL files are purged after every 10000 requests, so when the partitioned node rejoins the cluster, the leader sends it a snapshot instead of raft log entries. The node restores the snapshot into its backend db, but this restore operation is not handled in watchableStore, so the change events are never sent out.
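The fix direction @abel-von describes above (re-syncing watchers after a snapshot restore) can be illustrated with a self-contained toy model. The type and method names below are invented for illustration and only loosely mirror etcd's mvcc watchableStore; this is a sketch of the idea, not etcd's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a drastically simplified stand-in for an mvcc watcher.
type watcher struct {
	key    string
	minRev int64       // next revision this watcher still needs to see
	events chan string // simplified event stream
}

// watchableStore loosely mirrors the split the mvcc layer keeps between
// watchers that are caught up ("synced") and watchers a background loop
// must still catch up ("unsynced").
type watchableStore struct {
	mu       sync.Mutex
	rev      int64
	data     map[string]string
	synced   map[*watcher]struct{}
	unsynced map[*watcher]struct{}
}

// restore replaces the store contents from a snapshot. The essential step
// for the bug discussed here is the final loop: every previously-synced
// watcher is demoted to unsynced so it gets re-synced against the restored data.
func (s *watchableStore) restore(snap map[string]string, snapRev int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data, s.rev = snap, snapRev
	for w := range s.synced {
		s.unsynced[w] = struct{}{}
		delete(s.synced, w)
	}
}

// syncOnce plays the role of one pass of the background sync loop: it sends
// unsynced watchers what they missed and promotes them back to synced.
func (s *watchableStore) syncOnce() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for w := range s.unsynced {
		if v, ok := s.data[w.key]; ok && s.rev >= w.minRev {
			w.events <- v // deliver the (simplified) missed event
		}
		w.minRev = s.rev + 1
		s.synced[w] = struct{}{}
		delete(s.unsynced, w)
	}
}

func main() {
	w := &watcher{key: "/repro/key", minRev: 1, events: make(chan string, 1)}
	s := &watchableStore{
		data:     map[string]string{},
		synced:   map[*watcher]struct{}{w: {}}, // watcher registered before the partition
		unsynced: map[*watcher]struct{}{},
	}

	// The node falls behind and later receives a snapshot from the leader.
	s.restore(map[string]string{"/repro/key": "written-during-partition"}, 10001)

	// Because restore demoted the watcher, the next sync pass delivers the event.
	s.syncOnce()
	fmt.Println("watcher saw:", <-w.events)
}
```

Without the demotion loop in restore, the watcher stays in the synced group while the data underneath it changes, which is the silent-watch behaviour reported in this issue.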