Reproduce watch starvation and event loss #17535
Conversation
Skipping CI for Draft Pull Request.
Force-pushed from 334e373 to 6e370d3
Signed-off-by: Chao Chen <chaochn@amazon.com>
Force-pushed from 6e370d3 to 49447be
After the commit that logs more information about events being skipped:

etcd server log
Test log
The problem is: if the shared watch response channel is full and this watcher's next event revision has been compacted, this slow/unsynced watcher skips all of its pending events due to this logic.
etcd/server/mvcc/watcher_group.go
Lines 253 to 260 in 4a5e9d1
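For context, the referenced block is roughly the following (my paraphrase of watcherGroup.chooseAll from memory; see the permalink for the exact code). When the compaction response cannot be sent because w.ch is full, the watcher is skipped with continue, so it neither receives the compaction notification nor contributes to minRev, and its pending events are not picked up in this sync pass:

// Paraphrase of the compaction handling in watcherGroup.chooseAll
// (not the verbatim source).
for w := range wg.watchers {
	if w.minRev < compactRev {
		// This watcher needs a compaction response.
		select {
		case w.ch <- WatchResponse{WatchID: w.id, CompactRevision: compactRev}:
			// Delivered: mark the watcher compacted and drop it from the group.
			w.compacted = true
			wg.delete(w)
		default:
			// w.ch is full: nothing is sent; retry on a later sync pass.
		}
		continue // either way, this watcher does not lower minRev
	}
	if minRev > w.minRev {
		minRev = w.minRev
	}
}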
tests/e2e/watch_delay_test.go
Outdated
func TestWatchDelayOnStreamMultiplex(t *testing.T) {
	e2e.BeforeTest(t)
	clus, err := e2e.NewEtcdProcessCluster(t, &e2e.EtcdProcessClusterConfig{ClusterSize: 1, LogLevel: "info"})
Can reproduce it as well when ClientHttpSeparate: true. I was thinking that CPU resources or networking bandwidth might impact this.
Thanks for confirming.
Maybe the client should close the watch and restart the watch from the compact revision?
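A rough sketch of that client-side handling with clientv3 (illustrative only; watchWithCompactRestart, prefix, and startRev are placeholder names, not part of this PR):

// Sketch: restart the watch from the compact revision when the server
// reports compaction. Names here are placeholders.
package watchutil

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func watchWithCompactRestart(ctx context.Context, c *clientv3.Client, prefix string, startRev int64) error {
	rev := startRev
	for {
		for wres := range c.Watch(ctx, prefix, clientv3.WithPrefix(), clientv3.WithRev(rev)) {
			if wres.CompactRevision != 0 {
				// The requested revision was compacted; resume from the
				// compact revision, accepting that the gap is lost.
				rev = wres.CompactRevision
				break
			}
			if err := wres.Err(); err != nil {
				return err
			}
			for _, ev := range wres.Events {
				rev = ev.Kv.ModRevision + 1
				// process ev ...
			}
		}
		if ctx.Err() != nil {
			return ctx.Err()
		}
	}
}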
tests/e2e/watch_delay_test.go
Outdated
rev := gresp.Header.Revision
watchCacheWatchOpts := append([]clientv3.OpOption{clientv3.WithCreatedNotify(), clientv3.WithRev(rev), clientv3.WithProgressNotify()}, commonWatchOpts...)
//lastTimeGotResponse := time.Now()
for wres := range c.Watch(rootCtx, watchCacheWatchKeyPrefix, watchCacheWatchOpts...) {
Locally, I didn't receive the compact revision from the watch channel.
Assume that all the watchers need to be sent the compact revision and watcherGroup.choose returns math.MaxInt64.
If the watch chan is full, as mentioned by @chaochn47, the client won't receive the ErrCompacted error.
And minRev will be updated to curRev + 1. Next time, the watcher will pick up the events in [lastCurRev+1, curRev] and miss [compactRevision, lastCurRev].
etcd/server/mvcc/watchable_store.go
Lines 366 to 368 in 4a5e9d1
for w := range wg.watchers {
	w.minRev = curRev + 1
If watcherGroup.choose returns math.MaxInt64, maybe we should skip syncWatchers and retry next time.
At least, the client should receive the ErrCompacted error.
It is not guaranteed that chooseAll would return math.MaxInt64.
For example, some of the 512 picked watchers may have progressed beyond compactRev; the returned rev would then be the minimum revision that is bigger than compactRev.
etcd/server/mvcc/watcher_group.go
Lines 263 to 265 in 4a5e9d1
if minRev > w.minRev {
	minRev = w.minRev
}
Yeah. I was trying to use the following change to skip updating minRev for compacted watchers.
// server/mvcc/watchable_store.go
package mvcc

func (s *watchableStore) syncWatchers() int {
	// ...
	for w := range wg.watchers {
		if w.minRev < compactionRev {
			// skip it and retry it later since w.ch is full now
			continue
		}
		w.minRev = curRev + 1
		// ...
	}
}
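If I'm reading this right, skipping the minRev update keeps the compacted watcher in the unsynced group with its old minRev, so a later syncWatchers pass can retry delivering the compaction response once w.ch has room, instead of silently advancing the watcher past the compacted range.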
Aha, got it. This should be a correct change. After testing, though, the compaction response could still be delayed by as much as a few minutes, and the watch cache is stale for that long.
dev-dsk-chaochn-2c-a26acd76 % grep "compacted rev" $HOME/watch-lost.log
logger.go:130: 2024-03-08T05:40:46.304Z DEBUG compaction compacted rev {"compact-revision": 45704}
logger.go:130: 2024-03-08T05:40:57.027Z DEBUG compaction compacted rev {"compact-revision": 90291}
logger.go:130: 2024-03-08T05:41:07.175Z DEBUG compaction compacted rev {"compact-revision": 131525}
logger.go:130: 2024-03-08T05:41:16.660Z DEBUG compaction compacted rev {"compact-revision": 170723}
(24-03-08 5:42:17) <0> [~]
dev-dsk-chaochn-2c-a26acd76 % grep "got watch response error" $HOME/watch-lost.log
logger.go:130: 2024-03-08T05:41:26.364Z WARN watch-cache got watch response error {"last-received-events-kv-mod-revision": 10644, "compact-revision": 170723, "error": "etcdserver: mvcc: required revision has been compacted"}
I was thinking we should send the compacted error watch response on another channel so it won't be blocked that long (see the sketch below).
etcd/server/etcdserver/api/v3rpc/watch.go
Line 412 in cd01925
case wresp, ok := <-sws.watchStream.Chan():
But anyway, we can make incremental changes to improve this.
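A rough sketch of that idea (watchResponse, eventc, and compactc are hypothetical names for this sketch, not etcd's actual types or fields): deliver compaction responses on a dedicated channel so they are not stuck behind a backlog on the shared event channel.

package watchsketch

import "fmt"

// Hypothetical response type for the sketch only.
type watchResponse struct {
	compactRevision int64
	events          []string
}

// sendLoop selects on a dedicated compaction channel alongside the shared
// event channel, so the producer of a compaction response is never blocked
// by a full event channel.
func sendLoop(eventc <-chan watchResponse, compactc <-chan watchResponse, closec <-chan struct{}) {
	for {
		select {
		case cresp := <-compactc:
			// The compaction response does not have to wait behind the
			// pending event responses queued on eventc.
			fmt.Println("send compaction response, compact revision:", cresp.compactRevision)
		case wresp := <-eventc:
			fmt.Println("send", len(wresp.events), "events")
		case <-closec:
			return
		}
	}
}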
Signed-off-by: Chao Chen <chaochn@amazon.com>
Force-pushed from 9df8bb5 to fa93dc3
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
The test box uses an m5.4xlarge instance type, which has 16 vCPUs and 64 GiB of memory.

Set numOfDirectWatches to 800: there is a gap of events not being sent out, and the test is stuck at receiving 85669 events while at least 247214 mutation requests were sent successfully to the server.

After changing numOfDirectWatches to 100