socket-mode: Handling WS errors during handshake #2099
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2099 +/- ##
==========================================
+ Coverage 91.65% 91.89% +0.23%
==========================================
Files 38 38
Lines 10311 10320 +9
Branches 647 649 +2
==========================================
+ Hits 9451 9484 +33
+ Misses 848 824 -24
Partials 12 12
Flags with carried forward coverage won't be shown.
this.logger.debug('WebSocket open event received (connection established)!');
this.websocket = ws;
Not tracking the current websocket instance was a problem when the WebSocket server returned an error. Because we only started tracking the websocket once the `open` event was emitted (this line), any error during the websocket handshake would call into `disconnect`, which, crucially, only terminates the underlying network resources for the tracked `this.websocket`.
That was the underlying issue that caused the "doubling up" of reconnection attempts (see "Issue 1" described by @gdhardy1 in this comment): each time an error was encountered during the websocket handshake, an untracked `ws` instance would keep executing reconnect attempts in addition to the new one being spawned. Essentially, we were websocket-connection-bombing ourselves.
If you remove the changes in this file, you will see the integration test I added at the end of this PR fail.
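To make the leak described above concrete, here is a minimal sketch, not the library's code. `FakeSocket`, `TrackedConnection`, and `simulate` are hypothetical names invented for illustration. The sketch models only the cleanup gap (sockets that `disconnect` cannot reach), not the doubling of reconnect attempts that each leaked socket then causes.

```typescript
// Hypothetical stand-in for a raw WebSocket; this is NOT the `ws` library API.
class FakeSocket {
  public terminated = false;

  terminate(): void {
    this.terminated = true;
  }
}

// Simplified, illustrative model of a connection wrapper.
class TrackedConnection {
  private socket: FakeSocket | null = null;
  public readonly created: FakeSocket[] = [];

  constructor(private readonly trackOnCreate: boolean) {}

  // Start a handshake. With trackOnCreate = false the socket would only be
  // remembered once an `open` event fired, mirroring the pre-fix behaviour.
  connect(): FakeSocket {
    const ws = new FakeSocket();
    this.created.push(ws);
    if (this.trackOnCreate) {
      this.socket = ws;
    }
    return ws;
  }

  // Only the tracked socket can be cleaned up.
  disconnect(): void {
    if (this.socket !== null) {
      this.socket.terminate();
      this.socket = null;
    }
  }

  // Simulate a handshake error: tear down, then retry with a new socket.
  handshakeError(): void {
    this.disconnect();
    this.connect();
  }

  liveSockets(): number {
    return this.created.filter((s) => !s.terminated).length;
  }
}

// Count sockets still alive after `failures` consecutive handshake errors.
function simulate(trackOnCreate: boolean, failures: number): number {
  const conn = new TrackedConnection(trackOnCreate);
  conn.connect();
  for (let i = 0; i < failures; i += 1) {
    conn.handshakeError();
  }
  return conn.liveSockets();
}
```

With tracking deferred until `open`, every failed handshake leaves one more live socket behind; with tracking from creation, only the most recent socket survives.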
Thank you for finding this. I was able to patch this in and test it while we await a full fix. It stabilized things tremendously.
this.logger.debug('Continuing with reconnect...');
this.emit(State.Reconnecting);
cb.apply(this).then(res);
if (this.shuttingDown) {
While not a problem brought up by users, I noticed a problem here when writing integration tests:
- if you turn on the socket mode client and it is seeing errors from the WebSocket server, and
- in between reconnect attempts you call `disconnect()` to stop the client,

... then the client will keep trying to reconnect forever via this part of the code. Adding this guard here just in case.
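The scenario above can be sketched with a manually driven timer, so the effect of the guard is visible without real delays. This is an illustrative model under stated assumptions, not the client's implementation: `ManualTimer`, `Reconnector`, and `runScenario` are hypothetical names, and every reconnect attempt is assumed to fail and schedule the next one.

```typescript
type Task = () => void;

// Hypothetical timer that only runs queued tasks when flush() is called.
class ManualTimer {
  private queue: Task[] = [];

  schedule(task: Task): void {
    this.queue.push(task);
  }

  // Run everything queued so far; tasks scheduled during the flush wait
  // for the next flush.
  flush(): void {
    const tasks = this.queue;
    this.queue = [];
    tasks.forEach((task) => task());
  }

  pending(): number {
    return this.queue.length;
  }
}

// Illustrative reconnect loop; `guard` toggles the shutdown check.
class Reconnector {
  private shuttingDown = false;
  public attempts = 0;

  constructor(
    private readonly timer: ManualTimer,
    private readonly guard: boolean,
  ) {}

  scheduleReconnect(): void {
    this.timer.schedule(() => {
      if (this.guard && this.shuttingDown) {
        return; // stop: disconnect() was called between attempts
      }
      this.attempts += 1;
      this.scheduleReconnect(); // assume the attempt failed; try again later
    });
  }

  disconnect(): void {
    this.shuttingDown = true;
  }
}

// One failed attempt, then disconnect(), then the next timer tick.
function runScenario(guard: boolean): { attempts: number; pending: number } {
  const timer = new ManualTimer();
  const client = new Reconnector(timer, guard);
  client.scheduleReconnect();
  timer.flush();
  client.disconnect();
  timer.flush();
  return { attempts: client.attempts, pending: timer.pending() };
}
```

Without the guard, the loop makes a second attempt and leaves another one queued after `disconnect()`; with the guard, it stops after the first attempt and nothing remains scheduled.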
@@ -61,10 +61,7 @@ function errorWithCode(error: Error, code: ErrorCode): CodedError {
* A factory to create SMWebsocketError objects.
*/
export function websocketErrorWithOriginal(original: Error): SMWebsocketError {
const error = errorWithCode(
new Error(`Failed to send message on websocket: ${original.message}`),
Removed "Failed to send message" from this error, since the error can be raised in situations unrelated to message sending. While working on this PR, I noticed that websocket handshake errors also use this code path, which made the error message misleading.
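For illustration, a factory without the send-specific prefix could look roughly like the sketch below. The replacement wording actually chosen in the PR is not shown in this excerpt, so the sketch simply passes the original message through; that choice, the `WEBSOCKET_ERROR` constant, and the simplified interfaces are all assumptions and do not reflect the library's real definitions.

```typescript
// Hypothetical error code; the real value lives in the library's error module.
const WEBSOCKET_ERROR = 'websocket_error';

// Simplified shapes, assumed for this sketch only.
interface CodedError extends Error {
  code: string;
}

interface SMWebsocketError extends CodedError {
  original: Error;
}

// Attach a machine-readable code to an existing Error.
function errorWithCode(error: Error, code: string): CodedError {
  const coded = error as CodedError;
  coded.code = code;
  return coded;
}

// Wrap any websocket-level failure (send, handshake, ...) without implying
// that it happened while sending a message.
function websocketErrorWithOriginal(original: Error): SMWebsocketError {
  const error = errorWithCode(
    new Error(original.message), // assumption: pass the message through as-is
    WEBSOCKET_ERROR,
  ) as SMWebsocketError;
  error.original = original;
  return error;
}
```

Keeping the original error on the wrapper lets callers inspect the underlying cause regardless of which code path raised it.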
@@ -168,6 +169,47 @@ describe('Integration tests with a WebSocket server', () => {
assert.isTrue(debugLoggerSpy.calledWith(sinon.match('Unable to parse an incoming WebSocket message')));
await client.disconnect();
});
it('should maintain one serial reconnection attempt if WSS server sends unexpected HTTP response during handshake, like a 409', async () => {
This test covers the exponential reconnection-doubling bug.
This PR fixes #2094
If a websocket error is raised during WS connection establishment and retries are on, the library today may start a runaway loop that doubles the number of WS connections as it attempts to retry. This was caused by the `SlackWebSocket` class not tracking a new WS connection until the connection was established; if an error occurred during the handshake, a new WS connection would be created and the old one would not be cleaned up. The core of the fix in this PR is to track the WS connection at all times.