Reproduce watch starvation and event loss #17535
Conversation
Skipping CI for Draft Pull Request.
Force-pushed from 334e373 to 6e370d3
Signed-off-by: Chao Chen <chaochn@amazon.com>
Force-pushed from 6e370d3 to 49447be
After the commit that logs more information about events being skipped:

etcd server log
Test log
The problem is: if the shared watch response channel is full and this watcher's next event revision has been compacted, this slow/unsynced watcher skips all of its pending events due to this logic.
etcd/server/mvcc/watcher_group.go
Lines 253 to 260 in 4a5e9d1
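For context, the referenced block is roughly the following (my paraphrase of watcherGroup.chooseAll from memory; see the permalink for the exact code). When the compaction response cannot be sent because w.ch is full, the watcher is skipped with continue, so it neither receives the compaction notification nor contributes to minRev, and its pending events are not picked up in this sync pass:

// Paraphrase of the compaction handling in watcherGroup.chooseAll
// (not the verbatim source).
for w := range wg.watchers {
	if w.minRev < compactRev {
		// This watcher needs a compaction response.
		select {
		case w.ch <- WatchResponse{WatchID: w.id, CompactRevision: compactRev}:
			// Delivered: mark the watcher compacted and drop it from the group.
			w.compacted = true
			wg.delete(w)
		default:
			// w.ch is full: nothing is sent; retry on a later sync pass.
		}
		continue // either way, this watcher does not lower minRev
	}
	if minRev > w.minRev {
		minRev = w.minRev
	}
}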
tests/e2e/watch_delay_test.go
Outdated
func TestWatchDelayOnStreamMultiplex(t *testing.T) {
	e2e.BeforeTest(t)
	clus, err := e2e.NewEtcdProcessCluster(t, &e2e.EtcdProcessClusterConfig{ClusterSize: 1, LogLevel: "info"})
Can reproduce it as well when ClientHttpSeparate: true. I was thinking that CPU resources or networking bandwidth might impact this.
Thanks for confirming.
Maybe the client should close the watch and restart the watch from the compact revision?
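A rough sketch of that client-side handling with clientv3 (illustrative only; watchWithCompactRestart, prefix, and startRev are placeholder names, not part of this PR):

// Sketch: restart the watch from the compact revision when the server
// reports compaction. Names here are placeholders.
package watchutil

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func watchWithCompactRestart(ctx context.Context, c *clientv3.Client, prefix string, startRev int64) error {
	rev := startRev
	for {
		for wres := range c.Watch(ctx, prefix, clientv3.WithPrefix(), clientv3.WithRev(rev)) {
			if wres.CompactRevision != 0 {
				// The requested revision was compacted; resume from the
				// compact revision, accepting that the gap is lost.
				rev = wres.CompactRevision
				break
			}
			if err := wres.Err(); err != nil {
				return err
			}
			for _, ev := range wres.Events {
				rev = ev.Kv.ModRevision + 1
				// process ev ...
			}
		}
		if ctx.Err() != nil {
			return ctx.Err()
		}
	}
}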
tests/e2e/watch_delay_test.go
Outdated
rev := gresp.Header.Revision
watchCacheWatchOpts := append([]clientv3.OpOption{clientv3.WithCreatedNotify(), clientv3.WithRev(rev), clientv3.WithProgressNotify()}, commonWatchOpts...)
//lastTimeGotResponse := time.Now()
for wres := range c.Watch(rootCtx, watchCacheWatchKeyPrefix, watchCacheWatchOpts...) {
Locally, I didn't receive the compact revision from the watch channel.
Assume that all the watchers need to be sent the compact revision and watcherGroup.choose returns math.MaxInt64.
If the watch chan is full, as mentioned by @chaochn47, the client won't receive the ErrCompacted error.
And minRev will be updated to curRev + 1. Next time, the watcher will pick up the events in [lastCurRev+1, curRev] and miss [compactRevision, lastCurRev].
etcd/server/mvcc/watchable_store.go
Lines 366 to 368 in 4a5e9d1
for w := range wg.watchers {
	w.minRev = curRev + 1
If watcherGroup.choose returns math.MaxInt64, maybe we should skip syncWatchers and retry next time.
At least, the client should receive the ErrCompacted error.
It is not guaranteed that chooseAll would return math.MaxInt64.
For example, some of the 512 picked watchers may have progressed beyond compactRev; the returned rev would then be the minimum revision that is bigger than compactRev.
etcd/server/mvcc/watcher_group.go
Lines 263 to 265 in 4a5e9d1
if minRev > w.minRev {
	minRev = w.minRev
}
Yeah. I was trying to use the following change to skip updating minRev for compacted watchers.
// server/mvcc/watchable_store.go
package mvcc

func (s *watchableStore) syncWatchers() int {
	// ...
	for w := range wg.watchers {
		if w.minRev < compactionRev {
			// skip it and retry it later since w.ch is full now
			continue
		}
		w.minRev = curRev + 1
		// ...
	}
}
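If I'm reading this right, skipping the minRev update keeps the compacted watcher in the unsynced group with its old minRev, so a later syncWatchers pass can retry delivering the compaction response once w.ch has room, instead of silently advancing the watcher past the compacted range.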
Aha, got it. This should be a correct change. After testing, though, the compaction response could still be delayed by as much as a few minutes, and the watch cache is stale for that long.
dev-dsk-chaochn-2c-a26acd76 % grep "compacted rev" $HOME/watch-lost.log
logger.go:130: 2024-03-08T05:40:46.304Z DEBUG compaction compacted rev {"compact-revision": 45704}
logger.go:130: 2024-03-08T05:40:57.027Z DEBUG compaction compacted rev {"compact-revision": 90291}
logger.go:130: 2024-03-08T05:41:07.175Z DEBUG compaction compacted rev {"compact-revision": 131525}
logger.go:130: 2024-03-08T05:41:16.660Z DEBUG compaction compacted rev {"compact-revision": 170723}
(24-03-08 5:42:17) <0> [~]
dev-dsk-chaochn-2c-a26acd76 % grep "got watch response error" $HOME/watch-lost.log
logger.go:130: 2024-03-08T05:41:26.364Z WARN watch-cache got watch response error {"last-received-events-kv-mod-revision": 10644, "compact-revision": 170723, "error": "etcdserver: mvcc: required revision has been compacted"}
I was thinking we should send the compacted error watch response on another channel so it won't be blocked that long (see the sketch below).
etcd/server/etcdserver/api/v3rpc/watch.go
Line 412 in cd01925
case wresp, ok := <-sws.watchStream.Chan():
But anyway, we can make incremental changes to improve this.
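A rough sketch of that idea (watchResponse, eventc, and compactc are hypothetical names for this sketch, not etcd's actual types or fields): deliver compaction responses on a dedicated channel so they are not stuck behind a backlog on the shared event channel.

package watchsketch

import "fmt"

// Hypothetical response type for the sketch only.
type watchResponse struct {
	compactRevision int64
	events          []string
}

// sendLoop selects on a dedicated compaction channel alongside the shared
// event channel, so the producer of a compaction response is never blocked
// by a full event channel.
func sendLoop(eventc <-chan watchResponse, compactc <-chan watchResponse, closec <-chan struct{}) {
	for {
		select {
		case cresp := <-compactc:
			// The compaction response does not have to wait behind the
			// pending event responses queued on eventc.
			fmt.Println("send compaction response, compact revision:", cresp.compactRevision)
		case wresp := <-eventc:
			fmt.Println("send", len(wresp.events), "events")
		case <-closec:
			return
		}
	}
}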
Signed-off-by: Chao Chen <chaochn@amazon.com>
Force-pushed from 9df8bb5 to fa93dc3
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
The test box uses an m5.4xlarge instance type, which has 16 vCPUs and 64 GiB of memory.

Set numOfDirectWatches to 800: there is a gap of events not being sent out, and the test is stuck at receiving 85669 events while at least 247214 mutation requests were sent successfully to the server.

After changing numOfDirectWatches to 100