Simplify the closing handshake #1902
Conversation
Force-pushed from 0019b57 to cdc99a2.
This is really interesting and looks very promising imo! Great work. I've been playing around with similar changes myself as well, and testing the code you posted before for Chrome's behaviour, to try and confirm how everything works here. I'd quite like to try and finish putting some tests together to check exactly how the previous version behaved so we can check how that will change with these updates. In theory we should be able to simulate each of the possible open/half-open cases in tests from within node ourselves, which would make it much easier to be confident about the real-world impact. I don't think there's any hurry here and I'm a bit squeezed for time right now, so I'm going to try and flesh those out first to explore this in a bit more depth, and come back to this with some testing and a proper review in a few days. Hope that's ok!
Sure, there is no hurry. Take your time.
Force-pushed from cdc99a2 to fb58179.
I've just opened a PR with some extra tests to explore all this. I haven't done thorough real world testing, just lots of playing around with the unit tests, but those tests clearly show broken behaviour on master, and working behaviour on this branch, which seems like a great sign to me 👍 On your points above:
That makes sense to me too, and I agree it can be done separately 👍
I haven't extensively tested other clients, but the tests here do some basic simulation of every combination of half-close and full-close peers, and at the very least they all seem to be able to shut down cleanly in simple cases, so we can be confident they're not completely broken.
Section 7.1.4 of the WebSocket RFC (RFC 6455) says:

> If the connection was closed after the WebSocket closing handshake was completed, the WebSocket connection is said to have been closed _cleanly_.
I think that's probably the clearest definition around: a websocket is cleanly closed only if we completed the handshake before the socket was fully closed, which AFAICT just requires that we sent and received a close frame.
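That check is easy to express in code. A minimal sketch, assuming `socket` is a connected `net.Socket` and using illustrative flag names (not ws internals):

```js
// Illustrative sketch: decide whether a close was "clean" per RFC 6455 7.1.4.
let closeFrameSent = false;
let closeFrameReceived = false;

// ...elsewhere, set the flags when a close frame is written/parsed...

socket.on('close', () => {
  // Clean only if the closing handshake completed before the TCP
  // connection went away; otherwise report an abnormal closure (1006).
  const wasClean = closeFrameSent && closeFrameReceived;
  console.log(wasClean ? 'closed cleanly' : 'abnormal closure (1006)');
});
```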
Force-pushed from f713a6e to 29c7f29.
- When the socket emits the `'end'` event, do not call `socket.end()`. End only the `receiver` stream.
- Do not wait for a close frame to be received and sent before calling `socket.end()`. Call it right after the close frame is written.
- When the `receiver` stream emits `'finish'`, send a close frame if no close frame is received.

The assumption is that the socket is allowed to be half-open. On the server side this is always true (unless the user explicitly sets the `allowHalfOpen` property of the socket to `false`). On the client side the user might use an agent so we set `socket.allowHalfOpen` to `true` when the `http.ClientRequest` object emits the `'upgrade'` event.

Refs: #1899
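A rough sketch of that flow (illustrative only; `receiver`, `sendCloseFrame()`, and `closeFrameReceived` stand in for the real ws internals):

```js
// The peer finished writing (FIN received): don't end our side yet,
// just finish feeding the receiver so buffered frames are processed.
socket.on('end', () => {
  receiver.end();
});

// All incoming data has been parsed. If the peer never sent a close
// frame, send ours now and then half-close the (half-open) socket.
receiver.on('finish', () => {
  if (!closeFrameReceived) {
    sendCloseFrame(1000, () => socket.end());
  }
});
```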
If we're already closing, we shouldn't move into the new -2 ready state (i.e. closing-because-the-remote-end-closed): that would put us back in a state that allows sending another close frame.
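In other words, a guard along these lines (a sketch only; the -2 state is specific to this PR):

```js
// Sketch: ignore the remote-close trigger if a close is already in
// progress, so we never send a second close frame.
if (this.readyState === WebSocket.CLOSING || this.readyState === WebSocket.CLOSED) {
  return;
}
this._readyState = -2; // hypothetical: closing because the remote end closed
```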
Force-pushed from 29c7f29 to 1d923fa.
I amended the first commit with a minor change: there is no need to wait for the close frame to be actually written out before calling `socket.end()`, so it is now called right after the frame is queued.

```diff
diff --git a/lib/sender.js b/lib/sender.js
index ad71e19..18f0cf4 100644
--- a/lib/sender.js
+++ b/lib/sender.js
@@ -151,6 +151,7 @@ class Sender {
       }),
       cb
     );
+    this._socket.end();
   }
 
   /**
diff --git a/lib/websocket.js b/lib/websocket.js
index 1b652ca..fbe4a67 100644
--- a/lib/websocket.js
+++ b/lib/websocket.js
@@ -233,7 +233,6 @@ class WebSocket extends EventEmitter {
       if (err) return;
 
       this._closeFrameSent = true;
-      this._socket.end();
     });
 
     //
```
I'm a bit torn on how to handle this change.
I think the best option is to include this in a minor release. @cTn-dev, sorry for pinging you here, but if I remember correctly you were using permessage-deflate in production. Is there any chance you can help us test this? Thank you.
I think we're actually being too aggressive with closure here I'm afraid. It's correct from a TCP POV to half-close as soon as we've sent everything, but the fully correct websockets behaviour is to close after a close frame is both sent + received, or to close immediately after sending a close frame only when we hit an error. We do still want some of the previous behaviour! We don't want to remove that completely. A bit of googling suggests some other servers will indeed fail in this case if we're connecting to them from node (e.g. when connecting to Akka servers, although they're quite apologetic about it: https://doc.akka.io/docs/akka-http/current/client-side/websocket-support.html#half-closed-client-websockets). Relevant bits of the spec:
(i.e. after we have both close frames we should start half-closing to initiate TCP shutdown)
(i.e. after an error we should send a close frame and only then immediately half-close to initiate TCP shutdown)

Sorry, I think this is my fault 😬. My explanation in #1899 wasn't clear and I think I've confused myself slightly en route too. This change is still required, but only for the case where we send a close frame after an error, I think. Helpfully, that does reduce the size of this change and the potential impact!
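Putting those two rules together, the intended shutdown logic is roughly the following (a sketch with hypothetical helper names, not the actual ws code):

```js
// Normal path: only half-close once both close frames have been exchanged.
function onCloseFrameReceived() {
  closeFrameReceived = true;
  if (closeFrameSent) socket.end();
}

// Error path: send a close frame, then half-close immediately without
// waiting for the peer's close frame.
function onProtocolError(code) {
  sendCloseFrame(code, () => socket.end());
}
```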
Sure, I can take it for a spin. Is there a specific configuration/scenario you would like me to focus on/emulate?
Yes, that is my understanding as well, and the reason why it is currently implemented this way.
I still think that the easiest way to follow the spec recommendation of sending a close frame after an error is a best-effort approach like the one suggested in #1892 (comment). @cTn-dev no particular configuration. The ideal scenario would involve some incoming and/or outgoing data buffering on both peers, and then seeing if there is data loss in either peer after the connection is closed, for example by monitoring their close codes.
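For what it's worth, a minimal version of that scenario with the ws API could look like this (port and payload size are arbitrary):

```js
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8081 });

wss.on('connection', (ws) => {
  // Queue a large message so outgoing data is still buffered when the
  // closing handshake starts, then close with a normal status code.
  ws.send(Buffer.alloc(16 * 1024 * 1024));
  ws.close(1000);
});

const ws = new WebSocket('ws://localhost:8081');

// A 1006 here instead of 1000 would suggest the connection was torn
// down before the handshake (and the buffered data) completed.
ws.on('close', (code) => console.log('client saw close code', code));
```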
The problem with using `destroy()`:
I think that means if we send an RST (due to pending incoming data on these platforms, or any incoming data that's already in transit) then the remote peer will lose data that was successfully sent before the error but which hadn't been read yet, and/or the close frame itself. It's actually a bit worse in our case I think, because we might have data still being compressed in the sender which would block the close frame. That means the destroy() in your example would be called before we'd sent the close frame or pending data at all, so all of that would always be lost. Aside from potential data loss, it'll also result in unnecessary socket errors on the remote peer, instead of a clean close frame + TCP shutdown, which is mildly annoying for everybody. I agree it's easier to just slam everything shut, but it is cleaner and more correct to shut down gracefully. I think doing so within this PR looks something like:
My understanding is that in that case:
Does that make sense and sound sensible to you?
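Concretely, I'd expect the error path to end up looking something like this (a sketch only; helper names and the 30-second grace period are assumptions):

```js
// Sketch: on error, flush a close frame, half-close, and keep a
// destroy() timer purely as a safety net for misbehaving peers.
function handleError(code) {
  sendCloseFrame(code, () => {
    socket.end(); // FIN, not RST: the peer can still read everything
    const timer = setTimeout(() => socket.destroy(), 30 * 1000);
    timer.unref();
    socket.once('close', () => clearTimeout(timer));
  });
}
```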
Yes.
So, as of right now I don't really have a straightforward way to make sure something is in the buffers on each side. The close event is returning a pretty frequent 1006 instead of the expected/client-sent 1000. These early results should be taken with a grain of salt since the websocket server could be at "fault" here; I will have to dig deeper to see what is going on.
There is actually a problem with this:
Does it make sense? I'm not sure how to fix this if we want to keep the current behavior. If […]
Ok yes, I think you're right, that's basically true. In happy behaviour with a clean close, they MUST always send a close frame first, that's quite clear. If there's some kind of error somewhere, they might not, and they're still technically within the rules. It's encouraged (in 7.1.7: after errors they SHOULD send a close frame before starting to close the connection, and they SHOULD NOT close the connection unexpectedly for other reasons) but not a strict MUST anywhere. If they half-shutdown a socket though they are doing something weird, and it seems very likely that either we or they have made an error at this point, which is why the connection is being closed, which probably means they're ignoring anything we send anyway. It's still important that we finish processing any inputs they sent before closing the socket (so we should accept any pending compressed or close frames) but AFAICT we do do that correctly already. I guess it seems reasonable to just immediately close our end of the connection at that point too, to dump any pending data, and call it an unclean 1006 closure.
I think we could work around the case you've got there with something like 'after the close frame is sent, if the socket is half-shutdown already then just end', but given the above I think you're totally right and we shouldn't worry about sending close frames and pending data in this case.
Yes, I thought about that. We can do

```js
if (
  this._closeFrameReceived ||
  this._receiver._writableState.finished ||
  this._receiver._writableState.errorEmitted
) {
  this._socket.end();
}
```

in the callback of […].
Even with that check in place, a deadlock can be created if the user calls […]. This is not an issue in the receiver error case because […]. I think I'm fine dropping buffered outgoing data and sending no close frame if the other peer ends the socket without sending a close frame.
I'm convinced, yeah this seems disproportionately complicated 👍
@pimterry besides the "simultaneous errors" case, there is another condition that results in a deadlock in master and is not addressed by adding […] (see line 241 in 145480a).
The problem arises when […].
To fix that, we also need to change this line (line 228 in 145480a) to something like […]. I also tested how browsers handle the "socket end with no close frame" and the "error after open" cases, using the following test server:
```js
const crypto = require('crypto');
const http = require('http');

const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
const data = `<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<script>
(function () {
  const ws = new WebSocket('ws://localhost:8080');

  ws.onopen = function () {
    console.log('open');
  };

  ws.onclose = function (evt) {
    console.log(evt.code);
  };
})();
</script>
</body>
</html>`;

const server = http.createServer();

server.on('request', function (request, response) {
  response.setHeader('Content-Type', 'text/html');
  response.end(data);
});

server.on('upgrade', function (request, socket) {
  const key = crypto
    .createHash('sha1')
    .update(request.headers['sec-websocket-key'] + GUID)
    .digest('base64');

  socket.on('error', console.error);
  socket.on('data', function () {
    socket.write('ECONNRESET?');
  });

  socket.write(
    [
      'HTTP/1.1 101 Switching Protocols',
      'Upgrade: websocket',
      'Connection: Upgrade',
      `Sec-WebSocket-Accept: ${key}`,
      '\r\n'
    ].join('\r\n')
  );

  process.once('SIGINT', function () {
    console.log('Sending an invalid frame');
    socket.write(Buffer.from([0x85, 0x00]));
  });
});

server.listen(8080, function () {
  console.log('Listening on *:8080');
});
```

The script above triggers an […]. Finally, […]
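To reproduce: run the script with node, open http://localhost:8080 in the browser under test, and then send SIGINT (Ctrl+C) to the process; the server logs "Sending an invalid frame" and writes a frame with a reserved opcode (0x5) to the socket.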
Ensure that `socket.end()` is called if an error occurs simultaneously on both peers. Refs: #1902
That should be true, I agree, but as documented in the spec, some platforms will RST anyway in some cases, e.g. if there is more data waiting in our receive queue. Reports like libuv/libuv#3034 link to various specific cases where Windows will send RSTs in node if you try to shut down too aggressively. Even without that, it's quite possible that the remote peer has data in flight when we destroy the socket. This will result in unnecessary data loss and connection errors for the remote peer, and if we can we should close things properly to avoid that. As far as I can tell, #1908 looks great though, and does resolve everything without the larger, much riskier changes from this PR. With that PR merged, AFAICT:
I don't really mind if we do drop other pending outgoing data, if you want, by sending the close frame immediately instead of adding it to the queue. The spec doesn't define this either way that I can see, and it doesn't seem unreasonable if that makes our lives simpler. I do think we should not destroy the socket immediately though, since everything explicitly says that we shouldn't because it can cause real problems, and #1908 means everything is now cleaned up properly regardless. What are the reasons you'd prefer to destroy() immediately instead of doing a clean TCP shutdown? From my POV: the spec explicitly says we should wait, I don't see any substantial downside, and it avoids the risk of losing both already-sent data and the error close frame. I'm not sure browsers are good benchmarks for behaviour here by the way - imo server-to-client error feedback is usually much more important & useful than the opposite, so the tradeoffs for WS are quite different to those for a web browser. As a developer building a websocket API, I really want my API to tell bad clients what they're doing wrong, so they can change their broken implementations! As a user browsing the internet or a developer building an API client, I'm usually not that bothered. And in general of course, just because Safari is poorly behaved doesn't mean we should be too 😃
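The practical difference between the two shutdown styles, in code terms:

```js
// Abrupt: may emit an RST segment (especially if unread data is pending
// in the receive queue), so the peer can see ECONNRESET and lose
// in-flight data, including a close frame still in our send buffer.
socket.destroy();

// Graceful: sends FIN after flushing our write buffer; the peer reads
// everything we wrote, then sees a clean EOF.
socket.end();
```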
This was a fun exercise and a constructive discussion. Thanks to both of you.
Instead of abruptly closing all WebSocket connections, try to close them cleanly using the 1001 status code. If the HTTP/S server was created internally, then close it and emit the `'close'` event when it closes. Otherwise, if client tracking is enabled, then emit the `'close'` event when the number of connections goes down to zero. Otherwise, emit it in the next tick. Refs: #1902
When `WebSocketServer.prototype.close()` is called, stop accepting new connections but do not close the existing ones. If the HTTP/S server was created internally, then close it and emit the `'close'` event when it closes. Otherwise, if client tracking is enabled, then emit the `'close'` event when the number of connections goes down to zero. Otherwise, emit it in the next tick. Refs: #1902
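A sketch of the sequence that commit message describes (simplified, with hypothetical field names; the real logic lives in lib/websocket-server.js):

```js
const EventEmitter = require('events');

class WebSocketServerSketch extends EventEmitter {
  close(cb) {
    if (cb) this.once('close', cb);
    this._state = 'CLOSING'; // stop accepting new connections

    if (this._ownsHttpServer) {
      // The HTTP/S server was created internally: close it and emit
      // 'close' when it actually closes.
      this._server.close(() => this.emit('close'));
    } else if (!this.clients || this.clients.size === 0) {
      // No client tracking, or no connections left: emit on next tick.
      process.nextTick(() => this.emit('close'));
    }
    // Otherwise 'close' is emitted elsewhere, when the tracked client
    // count drops to zero.
  }
}
```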