Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: make websocket ping interval shorter than pong timeout #27726

Closed
wants to merge 1 commit into from

Conversation

zghh
Copy link

@zghh zghh commented Jul 15, 2023

With the current ping/pong configuration, after the client establishes a websocket connection with the server, if the two do not send any message for a long time, a read error will occur on the server so that the server will disconnect the connection.

This PR adjusts the websocket ping interval according to the official example of the websocket to fix the bug.

@zghh zghh requested review from fjl and holiman as code owners July 15, 2023 04:24
@fjl fjl changed the title rpc: adjust websocket ping interval rpc: make websocket ping interval shorter than pong timeout Jul 15, 2023
@fjl
Copy link
Contributor

fjl commented Jul 15, 2023

Hmm. So initially, I thought, this is obviously a correct fix for an issue. But upon further investigation, I wonder if the issue is even real. We handle ping as follows:

When no message is sent for 30s (wsPingInterval), a ping frame is sent. We then wait for a response from the peer within another 30s (wsPongTimeout).

Are you sure this issue is related to these constants? Did you experience such a disconnect in practice?

@zghh
Copy link
Author

zghh commented Jul 15, 2023

When I make 5000-10000 connections to Geth without sending any messages, disconnecting appears in a few minutes. And it would not disconnect after changing these constants.

I think the reason is that the following two places execute concurrently. If conn.SetReadDeadline(time.Time{}) executes first after receiving the pong frame, this issue will occur probability.

conn.SetReadDeadline(time.Time{})

wc.conn.SetReadDeadline(time.Now().Add(wsPongTimeout))

@fjl
Copy link
Contributor

fjl commented Jul 15, 2023

Alternative patch #27733

Please try.

@fjl
Copy link
Contributor

fjl commented Jul 15, 2023

I submitted an alternative patch because I think it's a logic race. Changing the timeouts will make it less likely, but the race is still there.

@fjl
Copy link
Contributor

fjl commented Aug 4, 2023

I'm closing this in favor of the alternative patch.

@fjl fjl closed this Aug 4, 2023
fjl added a commit that referenced this pull request Aug 11, 2023
This should fix #27726. With enough load, it might happen that the SetPongHandler 
callback gets invoked before the call to SetReadDeadline is made in pingLoop. When 
this occurs, the socket will end up with a 30s read deadline even though it got the pong,
which will lead to a timeout.

The fix here is processing the pong on pingLoop, synchronizing with the code that 
sends the ping.
devopsbo3 pushed a commit to HorizenOfficial/go-ethereum that referenced this pull request Nov 10, 2023
This should fix ethereum#27726. With enough load, it might happen that the SetPongHandler 
callback gets invoked before the call to SetReadDeadline is made in pingLoop. When 
this occurs, the socket will end up with a 30s read deadline even though it got the pong,
which will lead to a timeout.

The fix here is processing the pong on pingLoop, synchronizing with the code that 
sends the ping.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants