Socket Hang Up / ECONNRESET / ECONNREFUSED on consecutive http requests (>= 100) and temporary FIX #55330
Comments
This might be an issue with the server. Can you reproduce when fetching something else? (Try nodejs.org)
@redyetidev if it is an issue with the server, why are Node:16 and Node:18 consistently passing the tests?
Oh, I didn't see that statement. Can you try to reproduce in v22?
I will try with 22 and will let you know.
@redyetidev I tested with 22 and I get the same symptoms. HOWEVER, I am starting to think this is because memcached connections are being closed too frequently and Node 20/22 is hitting a dead socket. I can see that Node 18 and earlier process the socket connections in a burst of 50 batches, whereas 20/22 is more transient and continuous, which could increase the chances of hitting a dead socket. So to test it out, I commented out I am not sure, I am now inspecting with NODE_DEBUG=NET and will report back once I have more conviction. I think the remedy seems to be not to call
Closing this issue as it is indirectly related to what's changed since Node 16. I was able to debug what's going on with
So as a result:
I do not know if these differences are due to changes in the event loop, but nonetheless they increase your chances of hitting a dead socket. So the FIX is to NOT call
Version
v20.18.0
Platform
Subsystem
node:http
What steps will reproduce the bug?
Using node:http, make consecutive
calls to a standard expressjs/<http.Server> server
How often does it reproduce? Is there a required condition?
Randomly: roughly 1 out of every 3 runs of the Jest tests that stress test the server fails with ECONNRESET / ECONNREFUSED
What is the expected behavior? Why is that the expected behavior?
As in Node 18, Node 16, and earlier, Node should be able to handle concurrent requests and not hand the client a socket that has already been closed.
What do you see instead?
If you attach error handlers to the request object returned from `http.get` and console.log the error, you will end up with one of:
With the error message of refused or socket hang up.
Additional information
I recently updated my application from Node:16 to 20, along with its dependencies, and all my integration tests were passing except the server stress test, which makes 100 requests at the same time and feeds data over Server-Sent Events. I am using nvm so I can quickly test different versions.
I initially thought this was because one of these packages was updated:
but switching back to Node:16 with NEW or OLD versions of the packages consistently passed the tests, whereas Node:20 randomly failed the stress test with either NEW or OLD versions of the packages. So I searched the issues here and around the net and found that many people have similar issues where the server hands a closed TCP socket to the client/agent. Here are the other threads:
ChainSafe/lodestar#5765
axios/axios#5929
#47130
node-fetch/node-fetch#1735
#39810
#43456
My original test, which consistently passes with Node:16, is as follows (before and after are redacted):
Above, `nClients` is 100 and `serverDetails` is:
The `simulClient` function mimics a user: it requests the base URL, gets the cookie, and starts an event-stream (a live response that is never closed) to receive data via Server-Sent Events `{size: //number of clients currently receiving feed}`:
I know the `throw` statements inside the promises won't do anything, please ignore them.
One thing that I want to address is that, in the other threads I pasted above, people seem to think `keepAlive:false` was to be set on the `http.Agent`, but I think this is not the case. I think default Node16 behavior might be:
keepAlive [<boolean>](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Boolean_type) If set to true, it enables keep-alive functionality on the socket immediately after a new incoming connection is received, similarly on what is done in [socket.setKeepAlive([enable][, initialDelay])][socket.setKeepAlive(enable, initialDelay)]. Default: false.
so this is done on the `http.createServer` and not on the agent. Although the Node:20.18 docs say the default is still `false`, I think this might not be the case.
'FIX'
My `fix` for this behavior was as follows:
for StressTest:
and simulClient became:
So the above combines 3 things:
With all 3 of these in place, the tests passed consistently, whether run from a cold start or many times one after another. Removing any 1 of them made the socket hang up error randomly crop up again.
IMO this breaks user space. I understand people pointing out limitations and race conditions in `http 1.1`, but Node did not suddenly remember the spec in Node:20. This behavior seems to break user space and I think it is more than just setting `keepAlive` to false on `http.Server`. Something in Node's event loop handling since Node 19 must have changed for this type of behavior to emerge.
If the `keepAlive` on `http.Server` is the true culprit, shouldn't it be set back to what it was in Node:16?
What can be done about it so that user space is prioritized above the spec compliance?