-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
socket-mode: Handling WS errors during handshake #2099
Changes from 2 commits
9132ac1
4b4dd72
6f918b0
2fecb13
1c035e3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -229,9 +229,13 @@ | |
this.logger.debug(`Before trying to reconnect, this client will wait for ${msBeforeRetry} milliseconds`); | ||
return new Promise((res, _rej) => { | ||
setTimeout(() => { | ||
this.logger.debug('Continuing with reconnect...'); | ||
this.emit(State.Reconnecting); | ||
cb.apply(this).then(res); | ||
if (this.shuttingDown) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While not a problem brought up by users, I noticed a problem here when writing integration tests:
.. then the client will keep trying to reconnect forever via this part of the code. Adding this guard here just in case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
this.logger.debug('Client shutting down, will not attempt reconnect.'); | ||
} else { | ||
this.logger.debug('Continuing with reconnect...'); | ||
this.emit(State.Reconnecting); | ||
cb.apply(this).then(res); | ||
} | ||
}, msBeforeRetry); | ||
}); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,10 +61,7 @@ function errorWithCode(error: Error, code: ErrorCode): CodedError { | |
* A factory to create SMWebsocketError objects. | ||
*/ | ||
export function websocketErrorWithOriginal(original: Error): SMWebsocketError { | ||
const error = errorWithCode( | ||
new Error(`Failed to send message on websocket: ${original.message}`), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removed "Failed to send message" as part of this error, as this error can be raised unrelated to message-sending. In this PR, I noticed that websocket handshake errors also used this code path, which made the error message misleading. |
||
ErrorCode.WebsocketError, | ||
) as Partial<SMWebsocketError>; | ||
const error = errorWithCode(new Error(original.message), ErrorCode.WebsocketError) as Partial<SMWebsocketError>; | ||
error.original = original; | ||
return error as SMWebsocketError; | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,9 +42,9 @@ describe('Integration tests with a WebSocket server', () => { | |
}); | ||
}); | ||
afterEach(async () => { | ||
server.close(); | ||
if (server) server.close(); | ||
server = null; | ||
wss.close(); | ||
if (wss) wss.close(); | ||
wss = null; | ||
exposed_ws_connection = null; | ||
if (client) { | ||
|
@@ -90,7 +90,7 @@ describe('Integration tests with a WebSocket server', () => { | |
await client.disconnect(); | ||
}); | ||
}); | ||
describe('catastrophic server behaviour', () => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔥! |
||
describe('`apps.connections.open` API failure modes', () => { | ||
beforeEach(() => { | ||
client = new SocketModeClient({ | ||
appToken: 'whatever', | ||
|
@@ -130,7 +130,7 @@ describe('Integration tests with a WebSocket server', () => { | |
await client.disconnect(); | ||
}); | ||
}); | ||
describe('failure modes / unexpected messages sent to client', () => { | ||
describe('unexpected socket messages sent to client', () => { | ||
const debugLoggerSpy = sinon.stub(); // add the following to expose further logging: .callsFake(console.log); | ||
const noop = () => {}; | ||
beforeEach(() => { | ||
|
@@ -139,6 +139,7 @@ describe('Integration tests with a WebSocket server', () => { | |
clientOptions: { | ||
slackApiUrl: `http://localhost:${HTTP_PORT}/`, | ||
}, | ||
logLevel: 'debug', | ||
logger: { | ||
debug: debugLoggerSpy, | ||
info: noop, | ||
|
@@ -168,6 +169,47 @@ describe('Integration tests with a WebSocket server', () => { | |
assert.isTrue(debugLoggerSpy.calledWith(sinon.match('Unable to parse an incoming WebSocket message'))); | ||
await client.disconnect(); | ||
}); | ||
it('should maintain one serial reconnection attempt if WSS server sends unexpected HTTP response during handshake, like a 409', async () => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the test testing for the exponential reconnection doubling spiral bug. |
||
// override socket mode client instance with lower client ping timeout, which controls reconnection rate | ||
client = new SocketModeClient({ | ||
appToken: 'whatever', | ||
clientOptions: { | ||
slackApiUrl: `http://localhost:${HTTP_PORT}/`, | ||
}, | ||
clientPingTimeout: 20, | ||
logLevel: 'debug', | ||
}); | ||
// shut down the default mock WS server used in these tests as we will customize its behaviour in this test | ||
wss.close(); | ||
wss = null; | ||
// custom HTTP server that blows up during initial WS handshake | ||
const badServer = createServer((_req, res) => { | ||
res.writeHead(409, { 'content-type': 'application/json' }); | ||
res.end( | ||
JSON.stringify({ | ||
message: 'Unexpected server response: 409', | ||
}), | ||
); | ||
}); | ||
badServer.listen(WSS_PORT); | ||
let closed = 0; | ||
client.on('close', () => { | ||
closed++; | ||
}); | ||
// do not use await here, since `start()` won't return until the connection is established. we are intentionally testing the connection establishment failure, so that will never finish. so, let's run this in a rogue "thread"! | ||
client.start(); | ||
await sleep(50); | ||
// after 50ms, with a timeout of 20ms, we would expect 2 retries by this point. | ||
assert.equal(closed, 2, 'unexpected number of times `close` event was raised during reconnection!'); | ||
await client.disconnect(); | ||
await new Promise((res, rej) => { | ||
// shut down the bad server | ||
badServer.close((err) => { | ||
if (err) rej(err); | ||
else res(); | ||
}); | ||
}); | ||
}); | ||
}); | ||
describe('lifecycle events', () => { | ||
beforeEach(() => { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not tracking the current websocket instance was a problem in the case the web socket server returns an error. Since only once the
open
event was emitted would we keep track of the websocket (this line), any errors during the websocket handshake would call intodisconnect
which, crucially, only terminates the underlying network resources for the trackedthis.websocket
.That was the underlying issue that caused the "doubling up" of reconnection attempts (see "Issue 1" described by @gdhardy1 in this comment): each time an error was encountered during websocket handshake, there would be an 'untracked'
ws
instance still executing reconnect attempts in addition to a new one being spawned. Essentially websocket-connection-bombing ourselves.If you remove the changes in this file, you will see the integration test I added at the end of this PR fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for finding this. I was able to patch this in and test it while we await a full fix. It stabilized things tremendously.