Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socket-mode: Handling WS errors during handshake #2099

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

filmaj
Copy link
Contributor

@filmaj filmaj commented Nov 22, 2024

This PR fixes #2094

If a websocket error is raised during WS connection establishment, and retries are on, the library today may start a runaway loop of doubling the number of WS connections as it attempts to retry. This was caused by the SlackWebSocket class not tracking a fresh new WS connection until the connection was established; if an error occurred during the handshake, a new WS connection would be created and the old one would not be cleaned up. The core of the fix in this PR involves tracking the WS connection at all times.

@filmaj filmaj added bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented discussion M-T: An issue where more input is needed to reach a decision server-side-issue draft pkg:socket-mode applies to `@slack/socket-mode` backend Issue related to the Slack API labels Nov 22, 2024
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 91.89%. Comparing base (278514f) to head (1c035e3).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2099      +/-   ##
==========================================
+ Coverage   91.65%   91.89%   +0.23%     
==========================================
  Files          38       38              
  Lines       10311    10320       +9     
  Branches      647      649       +2     
==========================================
+ Hits         9451     9484      +33     
+ Misses        848      824      -24     
  Partials       12       12              
Flag Coverage Δ
cli-hooks 95.24% <ø> (+0.01%) ⬆️
cli-test 94.48% <ø> (+0.01%) ⬆️
oauth 77.39% <ø> (ø)
socket-mode 61.82% <50.00%> (+3.59%) ⬆️
web-api 96.88% <ø> (ø)
webhook 96.65% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

this.logger.debug('WebSocket open event received (connection established)!');
this.websocket = ws;
Copy link
Contributor Author

@filmaj filmaj Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not tracking the current websocket instance was a problem in the case the web socket server returns an error. Since only once the open event was emitted would we keep track of the websocket (this line), any errors during the websocket handshake would call into disconnect which, crucially, only terminates the underlying network resources for the tracked this.websocket.

That was the underlying issue that caused the "doubling up" of reconnection attempts (see "Issue 1" described by @gdhardy1 in this comment): each time an error was encountered during websocket handshake, there would be an 'untracked' ws instance still executing reconnect attempts in addition to a new one being spawned. Essentially websocket-connection-bombing ourselves.

If you remove the changes in this file, you will see the integration test I added at the end of this PR fail.

Copy link

@gdhardy1 gdhardy1 Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for finding this. I was able to patch this in and test it while we await a full fix. It stabilized things tremendously.

this.logger.debug('Continuing with reconnect...');
this.emit(State.Reconnecting);
cb.apply(this).then(res);
if (this.shuttingDown) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not a problem brought up by users, I noticed a problem here when writing integration tests:

  • if you turn on the socket mode client and it is seeing errors from the web socket server, and
  • in between reconnect attempts you call disconnect() to stop the client,

.. then the client will keep trying to reconnect forever via this part of the code. Adding this guard here just in case.

@@ -61,10 +61,7 @@ function errorWithCode(error: Error, code: ErrorCode): CodedError {
* A factory to create SMWebsocketError objects.
*/
export function websocketErrorWithOriginal(original: Error): SMWebsocketError {
const error = errorWithCode(
new Error(`Failed to send message on websocket: ${original.message}`),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed "Failed to send message" as part of this error, as this error can be raised unrelated to message-sending. In this PR, I noticed that websocket handshake errors also used this code path, which made the error message misleading.

@@ -168,6 +169,47 @@ describe('Integration tests with a WebSocket server', () => {
assert.isTrue(debugLoggerSpy.calledWith(sinon.match('Unable to parse an incoming WebSocket message')));
await client.disconnect();
});
it('should maintain one serial reconnection attempt if WSS server sends unexpected HTTP response during handshake, like a 409', async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test testing for the exponential reconnection doubling spiral bug.

@filmaj filmaj changed the title WIP: Handling 409 response from websocket server socket-mode: Handling WS errors during handshake Nov 26, 2024
@filmaj filmaj marked this pull request as ready for review November 26, 2024 15:39
@filmaj filmaj requested a review from a team November 26, 2024 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Issue related to the Slack API bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented discussion M-T: An issue where more input is needed to reach a decision draft pkg:socket-mode applies to `@slack/socket-mode` server-side-issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WebSocket error occurred: Unexpected server response: 409
2 participants