The rejoined node can not send out the events for watchers #8411
The version is 3.1.9.
@abel-von can you write a simple script/program to reproduce this problem?
@xiang90 I have changed the code to just sync the watchers after restore. But the master branch is 3.2.x and the mvcc code has changed a lot, so I will check whether the issue still exists on master; if it does, I will submit a PR. The steps to reproduce this issue:
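Below is a rough sketch of the reproduction flow as it is summarized in the reply that follows, using the clientv3 API. The import path shown is the current one (at the time of this issue it was github.com/coreos/etcd/clientv3); endpoints, key names, and the write count are placeholders, and the network partition itself has to be created outside the program, for example with iptables rules between the watched member and the leader.

```go
// Reproduction sketch: watch through one member, write through another
// until the leader takes a raft snapshot, then heal the partition and
// see whether the watch ever delivers the missed events.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Watch through the member that will be partitioned from the leader.
	watchCli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://member-a:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer watchCli.Close()

	// Write through a member that stays connected to the leader.
	writeCli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://member-b:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer writeCli.Close()

	wch := watchCli.Watch(context.Background(), "/repro/key")

	// While member-a is partitioned, write enough values to trigger a raft
	// snapshot on the leader (the default threshold in the 3.1/3.2 era was
	// 10000 entries; lowering --snapshot-count makes this much faster).
	go func() {
		for i := 0; i < 20000; i++ {
			if _, err := writeCli.Put(context.Background(), "/repro/key", fmt.Sprintf("v%d", i)); err != nil {
				log.Println("put error:", err)
			}
		}
	}()

	// After the partition is healed, the watch should deliver the missed
	// events; in the reported bug it stays silent forever.
	for resp := range wch {
		for _, ev := range resp.Events {
			log.Printf("event: %s %q -> %q", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```

In the reported bug, the final loop never prints anything after the partition heals, even though new values were written to the watched key.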
OK, so the repro is: watch on some member, partition it, write values into the watched key on another member until a snapshot is triggered, unpartition the member, and then wait forever on the watch? I wouldn't be surprised if this breaks in 3.2 too; the restore+watch path isn't very well tested.
@abel-von, I ran into the same issue in my environment, thanks for your fix. As you mentioned, it works correctly without this PR if the network is only cut for a short time. So which is the root cause: the node being partitioned for a long time, or enough events (more than 10000 updates) happening during the partition?
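The explanation in the issue description below suggests the trigger is the number of committed entries rather than wall-clock time: once enough writes accumulate for the leader to take a raft snapshot and compact its log, it can only catch the rejoined member up with a snapshot. To check this with far fewer writes, the snapshot threshold can be lowered; the sketch below uses the embed package for a single member only (a real repro needs a multi-member cluster), the SnapshotCount field is assumed to correspond to the --snapshot-count server flag, and the import path is the current one rather than the github.com/coreos/etcd path of the 3.1/3.2 line.

```go
// Start an embedded etcd member with a low snapshot threshold so a raft
// snapshot (and subsequent WAL compaction) happens after only ~100 writes.
package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "repro.etcd"
	// Maps to the --snapshot-count flag; the era-appropriate default was 10000.
	cfg.SnapshotCount = 100

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	select {
	case <-e.Server.ReadyNotify():
		log.Println("embedded etcd is ready; partition it and write through another member")
	case <-time.After(30 * time.Second):
		log.Fatal("etcd took too long to start")
	}

	// Keep the server running for the experiment.
	<-e.Err()
}
```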
Hi, we are doing some HA testing of etcd and Kubernetes, and one of the test cases is to cut the network between one node of the etcd cluster and the leader. As we know, etcd can still work even when some of the nodes are down. But when we keep the partition in place for a long time (one or two hours) and then recover it, we find that the kube-apiserver cannot refresh its cache to the newest values in etcd.
After some investigation, we found that the problem lies in etcd's watch mechanism: the key-change events are not sent out to the apiserver.
We also found that if the network is cut for only a short time, everything works correctly; the issue only appears when the network is cut for a long time.
After reproducing and investigating this, we found that it happens because the WAL files are purged after every 10000 requests, so when the partitioned node rejoins the cluster, the leader sends it a snapshot instead of raft log entries. The node restores the snapshot into its backend db, but this restore operation is not handled in watchableStore, so the change events are never sent out.
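The fix direction @abel-von describes above (re-syncing watchers after a snapshot restore) can be illustrated with a self-contained toy model. The type and method names below are invented for illustration and only loosely mirror etcd's mvcc watchableStore; this is a sketch of the idea, not etcd's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a drastically simplified stand-in for an mvcc watcher.
type watcher struct {
	key    string
	minRev int64       // next revision this watcher still needs to see
	events chan string // simplified event stream
}

// watchableStore loosely mirrors the split the mvcc layer keeps between
// watchers that are caught up ("synced") and watchers a background loop
// must still catch up ("unsynced").
type watchableStore struct {
	mu       sync.Mutex
	rev      int64
	data     map[string]string
	synced   map[*watcher]struct{}
	unsynced map[*watcher]struct{}
}

// restore replaces the store contents from a snapshot. The essential step
// for the bug discussed here is the final loop: every previously-synced
// watcher is demoted to unsynced so it gets re-synced against the restored data.
func (s *watchableStore) restore(snap map[string]string, snapRev int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data, s.rev = snap, snapRev
	for w := range s.synced {
		s.unsynced[w] = struct{}{}
		delete(s.synced, w)
	}
}

// syncOnce plays the role of one pass of the background sync loop: it sends
// unsynced watchers what they missed and promotes them back to synced.
func (s *watchableStore) syncOnce() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for w := range s.unsynced {
		if v, ok := s.data[w.key]; ok && s.rev >= w.minRev {
			w.events <- v // deliver the (simplified) missed event
		}
		w.minRev = s.rev + 1
		s.synced[w] = struct{}{}
		delete(s.unsynced, w)
	}
}

func main() {
	w := &watcher{key: "/repro/key", minRev: 1, events: make(chan string, 1)}
	s := &watchableStore{
		data:     map[string]string{},
		synced:   map[*watcher]struct{}{w: {}}, // watcher registered before the partition
		unsynced: map[*watcher]struct{}{},
	}

	// The node falls behind and later receives a snapshot from the leader.
	s.restore(map[string]string{"/repro/key": "written-during-partition"}, 10001)

	// Because restore demoted the watcher, the next sync pass delivers the event.
	s.syncOnce()
	fmt.Println("watcher saw:", <-w.events)
}
```

Without the demotion loop in restore, the watcher stays in the synced group while the data underneath it changes, which is the silent-watch behaviour reported in this issue.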