Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p/simulations: fix flaky test TestHTTPNodeRPC #30245

Closed

Conversation

bearpebble
Copy link
Contributor

@bearpebble bearpebble commented Jul 30, 2024

There is a race condition between subscribing and generating an event in the test. This PR extends the TestAPI to expose the number of active subscriptions, allowing clients to wait until their subscription becomes active.

Tested via

cd p2p/simulations
go test -c
stress ./simulations.test -test.run=TestHTTPNodeRPC
...
10m0s: 156026 runs so far, 0 failures

See issue #29830

@bearpebble bearpebble requested a review from fjl as a code owner July 30, 2024 20:13
@lightclient
Copy link
Member

I don't think this is a great way to solve this test. A few would be to extend the TestAPI interface to have a method to check if subscribe was successful or to have a loop that waits until the subscribe working (with a timeout) before beginning the test.

There is a race condition between subscribing and generating an event in
the test. The TestAPI now exposes the number of active subscriptions so
clients can ensure their subscription is active before calling the api.
@bearpebble bearpebble force-pushed the p2p-simulations-fix-flaky-test branch from 9bf6995 to 33dbe85 Compare August 1, 2024 08:57
@bearpebble
Copy link
Contributor Author

Hey @lightclient, thanks for the feedback.

I agree that it's not a great way to solve it. Performing a loop until subscriptions start working also feels kind of wrong though, since it could mask some rpc calls not working properly.

I was initially thinking about keeping a map of active subscriptions in the TestAPI but the client does not have access to the subscription ID or anything that would uniquely identify it. Clients therefore cannot query the status of a specific subscription. The solution I implemented is tracking the number of active subscriptions, which should work fine as long as the tests don't perform concurrent subscriptions from multiple callers.

I also saw your other PR about removing the directory completely, so if that goes through or you the approach sucks, feel free to close this PR.

state *atomic.Value
peerCount *int64
counter int64
activeSubscriptions int64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want sync.WaitGroup here. Also maybe just subCount, no need to be overly verbose.

if err := rpcClient1.CallContext(ctx, &expectedActiveSubscriptions, "test_getNumActiveSubscriptions"); err != nil {
t.Fatalf("error calling RPC method: %s", err)
}
expectedActiveSubscriptions += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you not able to know statically the expected number of active subs? (1) ?

@@ -565,6 +581,22 @@ func TestHTTPNodeRPC(t *testing.T) {
}
defer sub.Unsubscribe()

// make sure the subscription becomes active
var numActiveSubscriptions int64
for i := 0; i < 3; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should do a for and select loop with a timeout here. You're basically saying "timeout after 300 milliseconds" here, but in a more complicated way.

@fjl
Copy link
Contributor

fjl commented Aug 12, 2024

p2p/simulations has been deleted in #30250

@fjl fjl closed this Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants