-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ECONNRESET problem with http.Server and https.Server (node 6.3, 6.2.2 and poss earlier) #7776
Comments
I'm not sure about the classification of this as a 'question'. There is definitely a problem with node's http and https server handling of ECONNRESET's. With node 6.2.2 and 6.3 (maybe earlier) - On linux, unless I trap ECONNRESET's in my client app using on(clientError) and on(tlsClientError), and then process.exit(), the app will eventually stop accepting requests (i.e. it is hung up as a server) - (haven't troubleshot to figure out what's going on - don't know if the listen socket is failing to accept(), or what else could be going on - unclear whether this is a node or libuv issue). Also, in attempting to display the remote ip address and port of the connected socket that receives the ECONNRESET, sometimes a valid ip/port is available - while other times 'remoteAddress' resolves to 'undefined'. On AWS I appear to be getting several early ECONNRESET's, and these look to be the culprits in terms of their lack of ip/port info (is there a time race condition with respect to annotating the js socket?). There seems to be a lot of debate about how to handle ECONNRESET's - and lots of finger pointing. The fact is that while it is true that ECONNRESET's were often triggered in the past by badly behaved software/hardware (i.e. interrupted or no FIN_WAIT_X, TIME_WAIT sequence)...... today it should be seen as "normal" (think about mobile devices abruptly powered off). Thank you. |
The reason I tagged this as a question is that I couldn’t make out any specific behaviour here that you would like to see changed on Node’s side, so by exclusion I understood this as a question about how to handle |
It's not entirely clear from your description whether you are referring to agent-created sockets with
The address and port are queried from the operating system, and cached thereafter. If you call |
Anna - sorry for the confusion - and thanks for forwarding. Ben - to clarify: I'm referring to servers (http/https). By 'idle' I was talking about sockets that are open but idle for some time, as opposed to idle sockets in an agent pool (sorry for the confusion). My app is a web server, using http.server (and the https variant) - so these are incoming connections. The reason I spoke of time-race is due to the fact that I'm querying the remoteAddress of the socket 'after' the ECONNRESET exception is thrown, but very often this comes back 'undefined' (about 2/3 of the time). Here's the exception handler code: ditto for the handler I'm using for the https 'clientError' and 'tlsClientError' (the exception is sometimes thrown on 'clientError', sometimes on 'tlsClientError'. (I've attached a trace with the output of one such instance). This exception appears to be thrown in net.js at line 492 inside the _destroy() method, whereas anything that includes a valid IP address seems to be thrown at net.js line 563. Maybe the lack of IP address makes sense in the former case, given that the socket/connection is being destroyed?? As I said in my last note - if I don't 'handle' the clientError/tlsClientError and do the process.exit(), then the node server eventually stops accepting connections and we're dead in the water. For further information, I'll selectively do the process.exit() on each variant (i.e. IP address and no IP address), and I'll see if I can narrow down if it's one or both variants causing the node server/s to ultimately cease accepting connections. Stand by for further..... |
It appears that there are two paths through the net.js code that originate the ECONNRESET error. line 492, which indicates a "socket hangup" error (with a code of ECONNRESET). and line 563, which indicates an 'ECONNRESET' error. traversing the path where the exception originates from line 492 seems to work fine. traversing the path where the exception originates from line 563 hangs the server (i.e. the path within the node socket onread() handler results in the node http/https.server not accepting any more connections). Interestingly, both the line 492 traversal and the line 563 traversal often have 'undefined' remoteAddress on the socket that is passed up to my clientError/tlsClientError handler. I got confused by references to 'self' in the onread() handler which seem to refer to the 'handle' (not the 'this' of the current object)...... the _destroy() call on line 563 would thus be being called on the 'handle' (Maybe this is correct, I haven't looked further and don't want to create any confusion if this is not an issue). This is replicating frequently (every few minutes) on my server, so should be pretty easy to figure out what's going on - please let me know what additional information you'd like me to collect. In the meantime, my workaround is to http/https.setTimeout(15000) - to make sure that node is initiating most socket close()'s instead of waiting for far end to disconnect. Then I'm still process.exit()'ing the remaining ECONNRESET's (my cluster code takes care of respawning a new server to take over). (Despite using clustering, my testing to collect the above data was done with a single process). Many thanks. |
Hi Anna: Re: node issue #7776 I cannot change the labels - I’m guessing I do not have permission to do I annotated the issue with more information, so hopefully this will be Thank you, Peter B. On Wed, Jul 20, 2016 at 5:48 PM, Anna Henningsen notifications@github.com
|
@fuzing I think you didn’t mean to send that last message to this thread (replying to github mails does that)? Anyway, it’s still not quite clear to me what you are actually trying to say. I know it may be hard to come up with that, but it sounds to me like a full reproduction of the behaviour you are seeing and a description of the behaviour you want/expect to see would be very helpful. |
Hi Anna: I answered Ben's questions with two pretty complete descriptions of the problem - see above. If you need an expectations statement hopefully this sums things up: When one instantiates a node http or https server, one would hope that receiving an ECONNRESET would not hang the server. As I tried to explain above, this is not the case. The server hangs when one of the code traversals out of the net.js code is triggered. I would call this a bug. I'm happy to provide whatever additional information you need in order to resolve the issue. |
Further update on ECONNRESET bug. While both http.Server and https.Server instances are generating the 'clientError' exception, there is no problem when 'clientError' is thrown in either of these instances. The problem appears to be with the https.Server variant when a 'tlsClientError' is thrown. I investigated the node _tls_wrap.js module where the 'tlsClientError' event originates from, and it's clear that 'tlsClientError' can be thrown under two circumstances: the https.Server will hang EVERY TIME that the 'tlsClientError' is thrown from the on.close() function starting at _tls_wrap.js line 812 (the actual tlsClientError is emitted on line 822). I will continue to investigate. |
Just so we're on the same page, the phases for incoming HTTP connections w.r.t. error handling are as follows:
Now the catch: when you install a 'clientError' event listener, you become responsible for closing the socket; it is passed as the second argument to the event listener. From your description, it sounds like you aren't closing it, making the server run out of file descriptors. The listener should look something like this: server.on('clientError', function(err, socket) {
// ...
socket.destroy();
}); Aside: HTTPS servers simply re-emit 'tlsClientError' events as 'clientError' events, you don't have to install a listener for the former. |
Thanks for your response. Makes sense, however as mentioned above, when a tlsClientError is emitted from the on.close() method (from _tls_wrap.js line 822) the https.Server stops accepting additional connections 'immediately' (almost as if the listen socket were closed/destroyed) - i.e. this does not appear to be descriptor starvation. I would expect descriptor starvation to occur some time later, after a number of such instances. In earlier versions of my handlers I was doing the socket.destroy() [it didn't help], but for posterity I will try to prove up what I'm seeing and will update further. |
I take back what I said in my preceeding post about the https.Server stopping accepts of new connections immediately (I am now unsure). However, descriptor starvation is not an issue. Examining /proc//fd reveals only 60 - 100 file descriptors in use when the Server discontinues accepting connections. I'll continue investigating. |
SOLVED: Thank you for all your help, and sorry for wasting your time! |
No problem. Thanks for following up. |
@fuzing You comments really helped me, great. Could you share the link of the issue in mongodb driver? |
Node: 6.3 (and I believe all 6.x)
AWS instance (zone us-west-2) - t2.small, m4.large - Centos 7
receiving the following error:
Error: read ECONNRESET
at exports._errnoException (util.js:1008:11)
at TCP.onread (net.js:563:26)
Similar results with different instance variants, with different loads (from extremely light to extremely heavy).
I believe these may be idle sockets, and therefore the error should be handled by node.
The errors come through on both http and https (tls). They can be trapped in http using .on('clientError'), or on https/tls using 'clientError' and 'tlsClientError' (comes through on both variants).
It's unclear how a user application can safely recover (so am process.exit()'ing for now - about twice a minute :(.
For context: here is the code from the node net.js module: (see my annotation re: line 563 of the code below)....... I haven't traced this back through libuv.
Thanks!
Peter B.
// if we didn't get any bytes, that doesn't necessarily mean EOF.
// wait for the next one.
if (nread === 0) {
debug('not any data, keep waiting');
return;
}
// Error, possibly EOF.
if (nread !== uv.UV_EOF) {
--------> next is line 563 which results in throwing exception ECONNRESET (peter b)
return self._destroy(errnoException(nread, 'read'));
}
debug('EOF');
if (self._readableState.length === 0) {
self.readable = false;
maybeDestroy(self);
}
// push a null to signal the end of data.
self.push(null);
// internal end event so that we know that the actual socket
// is no longer readable, and we can start the shutdown
// procedure. No need to wait for all the data to be consumed.
self.emit('_socketEnd');
}
The text was updated successfully, but these errors were encountered: