Regression in idle socket handling #24980

Closed

timcosta opened this issue Dec 12, 2018 · 13 comments
Assignees
Labels
http Issues or PRs related to the http subsystem.

Comments

@timcosta
Contributor

timcosta commented Dec 12, 2018

  • Version: v8.14.0
  • Platform: macOS 10.13.3, node:carbon LTS docker image
  • Subsystem: http

I'd like to report a possible regression introduced to the http module between versions 8.9.4 and 8.14.0.

Sockets that are opened without transferring any data are closed immediately after data is finally transmitted, once they have idled for more than 40 seconds.

Reproduction available here: https://github.com/timcosta/node_tcp_regression_test
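
For reference, the failure mode can be reproduced roughly along these lines (a minimal sketch, not the exact code from the repository above; the port and timings are illustrative):

```js
const http = require('http');
const net = require('net');

const server = http.createServer((req, res) => res.end('ok'));

server.listen(0, () => {
  const { port } = server.address();
  const socket = net.connect(port, '127.0.0.1');

  socket.on('connect', () => {
    // Open the connection, then idle for longer than 40 seconds
    // before sending any bytes of the request.
    setTimeout(() => {
      socket.write('GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n');
    }, 45000);
  });

  socket.on('data', (chunk) => console.log('response:', chunk.toString()));
  // On affected versions the server answers the late request with an RST,
  // which surfaces here as ECONNRESET instead of a 200 response.
  socket.on('error', (err) => console.log('socket error:', err.code));
  socket.on('close', () => server.close());
});
```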

We ran a tcpdump on our servers that were 504ing, and saw that node is responding with an ACK followed almost immediately by a duplicate ACK with an additional RST on the same socket.

Timeouts are set to 60 seconds on the client (AWS ELB) and 2 minutes on the node server (hapi.js).

I'm filing this as a Node core issue because the error can be reproduced with both hapi and the bare Node http module, as can be seen in this Travis build: https://travis-ci.com/timcosta/node_tcp_regression_test/builds/94440224

The error is not fully consistent on Travis for versions 8.14.0, 10.14.2, and 11.4.0, but the build consistently passes on v8.9.4, which leads me to believe there is a possible regression.

cc: @jtymann @dstreby

@Trott
Member

Trott commented Dec 12, 2018

@nodejs/http

@lpinca
Member

lpinca commented Dec 12, 2018

Could this be caused by eb43bc04b1?

@lpinca lpinca added the http Issues or PRs related to the http subsystem. label Dec 12, 2018
@timcosta
Contributor Author

Seems likely @lpinca. The timings and behavior match up. That change seems to have broken Node.js back ends fronted by AWS ELBs with default settings, though, as this timeout is lower than the ELB default of 60 seconds.

@lpinca
Member

lpinca commented Dec 12, 2018

I see; that change was part of a security release, so it didn't go through the normal release cycle. I'm not sure why 40 seconds was chosen as the default value, but it can be customised.

@timcosta
Contributor Author

Hm, okay. I'd propose the default value be changed to something greater than 60 seconds, as that's the default timeout for ELBs, and this issue likely broke Node in a default configuration behind ELBs for more than just us.

@lpinca
Member

lpinca commented Dec 15, 2018

cc: @mcollina

@mcollina
Member

Currently we start waiting for the headers as soon as we receive a connection; however, we could start on the first byte instead, which would solve the issue at hand. I'll see if I can code something up to address this.

Note that this is configurable via https://nodejs.org/api/http.html#http_server_headerstimeout, so you can increase it to 60s, which solves your immediate issue.
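
For example (a minimal sketch; the 65s value is just illustrative, anything above the ELB's 60s idle timeout would do):

```js
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('hello');
});

// Raise the headers timeout (in milliseconds) above the ELB's 60s idle
// timeout so idle pre-request sockets are not reset before the ELB gives up.
server.headersTimeout = 65 * 1000;

server.listen(3000);
```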

We picked 40 seconds because it is the default Apache uses.

cc @nodejs/lts @MylesBorins

@MylesBorins MylesBorins self-assigned this Dec 18, 2018
@thomasjungblut

Does #26166 look related to you guys? I was just looking through other connection RST issues here.

@mcollina mcollina assigned mcollina and unassigned MylesBorins Mar 5, 2019
@thomasjungblut

thomasjungblut commented Mar 17, 2019

Cross-posting this with #26166:

We tried setting the headers timeout to 0s on top of node:dubnium-jessie-slim (currently at v10.15.3), but to no avail. We also tried on top of node:11 and still get these pesky 504s. Setting the idle timeout to 30s on the ELB did not seem to help either.

Furthermore, we have a similar issue using ALBs and WebSockets, which gives us 502s in that scenario.

Edit: we ultimately fixed it by sidecaring a single-host Go reverse proxy in front of the Node.js process.

@ezekg

ezekg commented Jul 16, 2019

@thomasjungblut how did sidecaring a Go reverse proxy fix your issue? I'm not sure I understand. I've been dealing with this ELB <-> Node issue for weeks and have not been able to find a solution. For a while I thought it was an unlikely k8s bug causing an RST packet, but after applying several patches to k8s for different problems, nothing worked.

Currently, my idle timeout setup looks like this (the high numbers are recommended by Google's load balancer documentation, which I used as a reference; see the sketch after this list for how they map onto the Node API):

  • Cloudflare idle timeout: 300s (not configurable)
  • ELB idle timeout: 600s
  • Server keepalive timeout: 620s
  • Server headers timeout: 621s
  • Server timeout: 120s (should this be higher?)

This should technically work, right? But it doesn't, and it doesn't make sense why I'm seeing so many 504s and connection resets. Any ideas?
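
For reference, here is roughly how those server-side values map onto the Node API (a sketch assuming a plain http.Server; the handler and port are placeholders):

```js
const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));

// All values are in milliseconds.
server.keepAliveTimeout = 620 * 1000; // above the 600s ELB idle timeout
server.headersTimeout = 621 * 1000;   // above keepAliveTimeout
server.setTimeout(120 * 1000);        // the 120s socket timeout in question

server.listen(3000);
```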

@timcosta
Contributor Author

@ezekg you can read a bit more about how I solved it here: https://www.timcosta.io/how-we-found-a-tcp-hangup-issue-between-aws-elbs-and-node-js/

There are code snippets in that article to help you figure out exactly when the socket timeout is occurring, which will tell you which of the timeouts you are hitting.

TL;DR, though: all of your server timeouts need to be above the ELB timeout, so my guess is that yes, your server timeout needs to be higher.
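
If it helps, a hypothetical way to instrument this (not the exact snippets from the article) is to log the server's timeout and socket close events, which shows which timeout is actually firing and how old each connection is when it dies:

```js
const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));

server.on('connection', (socket) => {
  const remote = `${socket.remoteAddress}:${socket.remotePort}`;
  const openedAt = Date.now();
  socket.on('close', (hadError) => {
    const age = ((Date.now() - openedAt) / 1000).toFixed(1);
    console.log(`socket ${remote} closed after ${age}s, hadError=${hadError}`);
  });
});

// Fired when a socket exceeds server.timeout without activity.
server.on('timeout', (socket) => {
  console.log('server timeout fired for', socket.remoteAddress);
});

server.listen(3000);
```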

@thomasjungblut

@ezekg Apparently Go closes connections properly with a FIN, and it deals with the RST packets from Node.js somewhat gracefully in that regard.

timcosta added a commit to timcosta/node that referenced this issue Oct 22, 2019
@Trott Trott closed this as completed in e17403e Dec 14, 2019
MylesBorins pushed a commit that referenced this issue Dec 17, 2019
Fixes: #24980
Refs: eb43bc04b1

PR-URL: #30071
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Rich Trott <rtrott@gmail.com>
targos pushed a commit to targos/node that referenced this issue Apr 25, 2020
targos pushed a commit that referenced this issue Apr 28, 2020
@ismailyagci

For anyone who has had the same problem: after working on it for 15 days, I found the source of the problem. You can fix it by adding this code: "server.headersTimeout = 7200000;".
