p2p/simulations: fix flaky test TestHTTPNodeRPC #30245

bearpebble · 2024-07-30T20:13:18Z

There is a race condition between subscribing and generating an event in the test. This PR extends the TestAPI to expose the number of active subscriptions, allowing clients to wait until their subscription becomes active.

Tested via

cd p2p/simulations
go test -c
stress ./simulations.test -test.run=TestHTTPNodeRPC
...
10m0s: 156026 runs so far, 0 failures

See issue #29830

lightclient · 2024-07-31T22:59:53Z

I don't think this is a great way to fix this test. A few alternatives would be to extend the TestAPI interface with a method that checks whether the subscribe was successful, or to have a loop that waits until the subscription is working (with a timeout) before beginning the test.

There is a race condition between subscribing and generating an event in the test. The TestAPI now exposes the number of active subscriptions so clients can ensure their subscription is active before calling the api.

bearpebble · 2024-08-01T09:01:47Z

Hey @lightclient, thanks for the feedback.

I agree that it's not a great way to solve it. Performing a loop until subscriptions start working also feels kind of wrong though, since it could mask some rpc calls not working properly.

I was initially thinking about keeping a map of active subscriptions in the TestAPI but the client does not have access to the subscription ID or anything that would uniquely identify it. Clients therefore cannot query the status of a specific subscription. The solution I implemented is tracking the number of active subscriptions, which should work fine as long as the tests don't perform concurrent subscriptions from multiple callers.
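The counting approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `subTracker`, `subscribe`, and `numActive` are hypothetical names, and the sketch assumes Go 1.19 or later for `atomic.Int64` (which also avoids manual 64-bit alignment concerns on 32-bit platforms).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// subTracker is a hypothetical stand-in for the part of a test API that
// counts live subscriptions. It is not the type used in the PR.
type subTracker struct {
	active atomic.Int64 // number of currently active subscriptions
}

// subscribe marks one subscription as active and returns a function
// that marks it inactive again.
func (s *subTracker) subscribe() (unsubscribe func()) {
	s.active.Add(1)
	return func() { s.active.Add(-1) }
}

// numActive reports how many subscriptions are currently active.
func (s *subTracker) numActive() int64 {
	return s.active.Load()
}

func main() {
	var tr subTracker

	// Record the count before subscribing, then expect it to grow by one.
	before := tr.numActive()
	unsub := tr.subscribe()
	fmt.Println(tr.numActive() == before+1)

	unsub()
	fmt.Println(tr.numActive() == before)
}
```

As noted above, a bare counter only identifies "my" subscription if no other caller subscribes concurrently.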

I also saw your other PR about removing the directory completely, so if that goes through or you think the approach sucks, feel free to close this PR.

lightclient · 2024-08-01T15:46:08Z

p2p/simulations/http_test.go

+	state               *atomic.Value
+	peerCount           *int64
+	counter             int64
+	activeSubscriptions int64


I think you want sync.WaitGroup here. Also maybe just subCount, no need to be overly verbose.

lightclient · 2024-08-01T15:49:12Z

p2p/simulations/http_test.go

+	if err := rpcClient1.CallContext(ctx, &expectedActiveSubscriptions, "test_getNumActiveSubscriptions"); err != nil {
+		t.Fatalf("error calling RPC method: %s", err)
+	}
+	expectedActiveSubscriptions += 1


Are you not able to know statically the expected number of active subs? (1) ?

lightclient · 2024-08-01T15:50:38Z

p2p/simulations/http_test.go

@@ -565,6 +581,22 @@ func TestHTTPNodeRPC(t *testing.T) {
 	}
 	defer sub.Unsubscribe()

+	// make sure the subscription becomes active
+	var numActiveSubscriptions int64
+	for i := 0; i < 3; i++ {


I think you should do a for and select loop with a timeout here. You're basically saying "timeout after 300 milliseconds" here, but in a more complicated way.

fjl · 2024-08-12T08:38:54Z

p2p/simulations has been deleted in #30250

bearpebble requested a review from fjl as a code owner July 30, 2024 20:13

fix flaky test TestHTTPNodeRPC

33dbe85

There is a race condition between subscribing and generating an event in the test. The TestAPI now exposes the number of active subscriptions so clients can ensure their subscription is active before calling the api.

bearpebble force-pushed the p2p-simulations-fix-flaky-test branch from 9bf6995 to 33dbe85 Compare August 1, 2024 08:57

reorder struct member to fix alignment on 32 bit

fb55dee

lightclient reviewed Aug 1, 2024

fjl closed this Aug 12, 2024
