Regression in idle socket handling #24980
Comments
@nodejs/http
Could this be caused by eb43bc04b1?
Seems likely @lpinca. The timings and behavior match up. That change seems to have broken Node.js back ends behind AWS ELBs with default settings, though, as this timeout is lower than the ELB default of 60 seconds.
I see. That change was part of a security release, so it didn't go through the normal release cycle. I'm not sure why 40 seconds was chosen as the default value, but it can be customised.
Hm, okay. I'd propose the default value be changed to something greater than 60 seconds, as that's the default timeout for ELBs; this issue likely broke Node in a default configuration behind ELBs for more than just us.
cc: @mcollina
Currently we start waiting for the headers as soon as we receive a connection; we could instead start on the first byte, which would solve the issue at hand. I'll see if I can code something up to address this. Note that this is configurable (https://nodejs.org/api/http.html#http_server_headerstimeout), so you can increase it to 60s to solve your immediate issue. We picked 40 seconds because it is the default Apache uses. cc @nodejs/lts @MylesBorins
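For anyone following along, a minimal sketch of the workaround described above; the 65-second value and port are assumptions, chosen only so the headers timeout exceeds the ELB's default 60-second idle timeout:

```js
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Time allowed for a client to send the complete request headers
// (the timeout introduced by eb43bc04b1, defaulting to 40 seconds).
// Raising it past the ELB idle timeout keeps Node from resetting
// idle keep-alive sockets first. 65000 ms is an illustrative value.
server.headersTimeout = 65000;

server.listen(3000);
```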
Does #26166 look related to you guys? I was just looking through other connection RST issues here.
Cross-posting this with #26166: we tried setting the headers timeout to 0s on top of node:dubnium-jessie-slim (currently at v10.15.3), but to no avail. We also tried on top of node:11 and still get these pesky 504s. Setting the idle timeout to 30s on the ELB didn't seem to help either. Furthermore, we have a similar issue using ALBs and websockets, which gives us 502s in that scenario. Edit: we ultimately fixed it by sidecar'ing a single-host Golang reverse proxy in front of the Node.js process.
@thomasjungblut how did sidecaring a golang reverse proxy fix your issue? I'm not sure I understand. I've been dealing with this ELB <-> Node issue for weeks and have not been able to find a solution. For a while I thought it was an unlikely k8s bug causing an RST packet, but after applying several patches to k8s for different problems, nothing worked. Currently, my idle timeout setup looks like this (the high numbers are recommended by Google's load balancer documentation, which I used as a reference):
Which should technically work, right? But it doesn't. And it doesn't make sense why I'm seeing so many 504s and connection resets. Any ideas?
@ezekg you can read a bit more about how I solved it here: https://www.timcosta.io/how-we-found-a-tcp-hangup-issue-between-aws-elbs-and-node-js/ There are code snippets in that article to help you figure out exactly when the socket timeout is occurring, which will tell you which of the timeouts you are hitting. tldr though is that all of your server timeouts need to be above the ELB timeout, so my guess is that yes, your server timeout needs to be higher.
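Not the snippets from the linked article, but a minimal sketch of the same idea (the 60-second ELB value, the margins, and the port are assumptions): log when Node's own timeouts fire so you can tell which one is resetting the socket, and keep every server-side timeout above the ELB idle timeout.

```js
const http = require('http');

const ELB_IDLE_TIMEOUT_MS = 60000; // assumed load balancer idle timeout

const server = http.createServer((req, res) => {
  res.end('ok');
});

// Keep every server-side timeout above the ELB idle timeout so the load
// balancer, not Node, is the side that closes idle connections.
server.keepAliveTimeout = ELB_IDLE_TIMEOUT_MS + 5000;
server.headersTimeout = ELB_IDLE_TIMEOUT_MS + 10000;
server.setTimeout(ELB_IDLE_TIMEOUT_MS + 10000);

// Fires when a socket hits server.timeout; logging here shows which
// connections Node itself is timing out. With a listener attached,
// Node no longer destroys the socket automatically, so do it here.
server.on('timeout', (socket) => {
  console.log('server-side socket timeout:', socket.remoteAddress, socket.remotePort);
  socket.destroy();
});

server.listen(3000);
```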
@ezekg apparently Go closes connections properly with a FIN, and it deals with the RST packets from the Node.js side somewhat gracefully in that regard.
Fixes: #24980
Refs: eb43bc04b1
PR-URL: #30071
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Friends of mine have had the same problem. After working on it for 15 days, we found the source of the problem: the headers timeout. You can fix it by adding `server.headersTimeout = 7200000;`.
I'd like to report a possible regression introduced to the `http` module between versions 8.9.4 and 8.14.0, seen while running the `node:carbon` LTS docker image. Sockets that are opened but over which no data is transferred are closed immediately after data transmission once they have idled for more than 40 seconds.

Reproduction available here: https://github.com/timcosta/node_tcp_regression_test

We ran a tcpdump on our servers that were 504ing, and saw that node responds with an `ACK` followed almost immediately by a duplicate `ACK` with an additional `RST` on the same socket. Timeouts are set to 60 seconds on the client (AWS ELB) and 2 minutes on the node server (hapi.js).

I'm filing this as a node core issue because the error can be reproduced with both hapi and the bare node `http` module, as can be seen in this Travis build: https://travis-ci.com/timcosta/node_tcp_regression_test/builds/94440224

The error is not consistently reproducible on Travis with versions 8.14.0, 10.14.2, and 11.4.0, but the build consistently passes on v8.9.4, which leads me to believe this is a possible regression.
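For context, a minimal sketch of the failure mode described above, not the actual test from the linked repository (the port, host, and 45-second idle period are assumptions): open a raw TCP connection to a local Node HTTP server, idle past the 40-second headers timeout, then send the request; on affected versions the socket is reset shortly afterwards.

```js
const net = require('net');

// Connect to a local HTTP server (assumed to be listening on port 3000),
// idle longer than the 40-second headers timeout, then send a request.
const socket = net.connect(3000, '127.0.0.1', () => {
  console.log('connected, idling for 45 seconds before sending the request');
  setTimeout(() => {
    socket.write('GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n');
  }, 45000);
});

socket.on('data', (chunk) => console.log('response:', chunk.toString()));
socket.on('error', (err) => console.error('socket error:', err.message));
socket.on('close', () => console.log('socket closed'));
```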
cc: @jtymann @dstreby