Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: fix timeout reset after keep-alive timeout #13549

Closed
wants to merge 3 commits into from

Conversation

aqrln
Copy link
Contributor

@aqrln aqrln commented Jun 8, 2017

socketOnData is not called for headers, the HTTP parser consumes
the stream at that moment. Move the timeout reset logic to
onParserExecute and add two tests:

  • test-http-server-keep-alive-timeout-slow-server is a regression test
    for Http connections aborted after 5s / keepAliveTimeout  #13391. It ensures that the server-side keep-alive timeout will
    not fire during processing of a request.

  • test-http-server-keep-alive-timeout-slow-headers ensures that the
    regular socket timeout is restored as soon as a client starts sending
    a new request, not as soon as the whole message is received, so that
    the keep-alive timeout will not fire while, e.g., the client is
    sending large cookies.

Fixes: #13391

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

http

@aqrln aqrln added dont-land-on-v4.x http Issues or PRs related to the http subsystem. labels Jun 8, 2017
@aqrln
Copy link
Contributor Author

aqrln commented Jun 8, 2017

response += chunk;
if (chunk.includes('\r\n')) {
onHeaders();
socket.removeListener('data', onData);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duh... looks like I was too fast on starting CI. These two lines should be swapped. It will actually work either way, but in this order socket.removeListener() is useless.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 8, 2017

Whoa... all those test-https-set-timeout-server failures. Works for me locally on macOS, but I'll check it out on a Linux VM.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 9, 2017

The last commit should unbreak the CI, but even if it doesn't, it is a legit bugfix per se ¯\_(ツ)_/¯

CI: https://ci.nodejs.org/job/node-test-pull-request/8579/

@aqrln
Copy link
Contributor Author

aqrln commented Jun 9, 2017

/cc @nodejs/http
/cc @jasnell @mcollina @fhinkel @refack (reviewers of the original PR)

@mcollina
Copy link
Member

mcollina commented Jun 9, 2017

@aqrln which original PR?

lib/net.js Outdated
@@ -256,6 +258,9 @@ function Socket(options) {

// Used after `.destroy()`
this[BYTES_READ] = 0;

this[kLastTimeout] = 0;
this[kPrevTimeout] = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining what those symbols do, and which value would they contain over time?

Can you also add a test for this beahavior in net, even if it is just internal?

lib/net.js Outdated
@@ -256,6 +258,9 @@ function Socket(options) {

// Used after `.destroy()`
this[BYTES_READ] = 0;

this[kLastTimeout] = 0;
this[kPrevTimeout] = 0;
Copy link
Contributor

@mscdex mscdex Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these being attached here when they're (only) used by the http module? It seems like these should instead be set by http so as not to affect (performance-wise) those who use net directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO they must be in net because this information is only available in net: https://github.com/nodejs/node/pull/13549/files#diff-e6ef024c3775d787c38487a6309e491dR394

Copy link
Contributor

@refack refack Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, L394 could have an if.
(it's a classical point for polymorphism, but an if can accomplish "extendibility")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are updated by Socket.prototype.setTimeout() .

Copy link
Contributor

@mscdex mscdex Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the current msecs for the socket is already available via socket._idleTimeout and http would already know what value it is setting, so I don't see why it needs to be done in net (or perhaps why anything additional needs to be stored on the socket itself at all).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 if the net additions could be avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not extremely happy with adding these properties to Socket too, but I don't have any other solution offhand. The problem is that the HTTP server needs the previous value of the timeout to restore it after the keep-alive timeout is not needed anymore, but it does not necessarily know what the value is. test-https-set-timeout-server used to fail before f381340eaff45e840f6a35a954333dc4b41bca57 because of using res.setTimeout(), which, well, may be resolved at the HTTP level, but a user could have used res.socket.setTimeout() directly, and there's not much we can in http in this case.

@ChALkeR
Copy link
Member

ChALkeR commented Jun 9, 2017

@mcollina #2534 was the original PR.

@mcollina
Copy link
Member

mcollina commented Jun 9, 2017

@aqrln can you amend your commit and add the original PR url?

@aqrln
Copy link
Contributor Author

aqrln commented Jun 9, 2017

@mcollina sure.

Copy link
Contributor

@refack refack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % whatever is decided about who inits this[kLastTimeout] & this[kPrevTimeout]

lib/net.js Outdated
@@ -256,6 +258,9 @@ function Socket(options) {

// Used after `.destroy()`
this[BYTES_READ] = 0;

this[kLastTimeout] = 0;
this[kPrevTimeout] = 0;
Copy link
Contributor

@refack refack Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, L394 could have an if.
(it's a classical point for polymorphism, but an if can accomplish "extendibility")

@@ -0,0 +1,51 @@
'use strict';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe call this one ...-slow-client-headers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this filename is already exceptionally long :)
Though ...-client-headers might convey more information about what the test does, yes.

socket.on('data', onData);
let response = '';

socket.write('GET / HTTP/1.1\r\n' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe put a comment

// simulate a client that sends headers slowly

server.listen(0, common.mustCall(() => {
const port = server.address().port;
const socket = net.connect({ port }, common.mustCall(() => {
request(common.mustCall(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

// two requests to trigger `keepalive`

@refack
Copy link
Contributor

refack commented Jun 9, 2017

CITGM (as I believe moving the init won't change API behaviour): https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/864/

@aqrln
Copy link
Contributor Author

aqrln commented Jun 9, 2017

@refack isn't it a bit premature to run it at this point? The PR is going to be updated to address the reviews, and probably not once.

@refack
Copy link
Contributor

refack commented Jun 9, 2017

@refack isn't it a bit premature to run it at this point? The PR is going to be updated to address the reviews, and probably not once.

Maybe... but Jenkins was free, and if we find breakage it's interesting.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 9, 2017

@refack oh, it runs faster than I thought CITGM does :)

@mscdex
Copy link
Contributor

mscdex commented Jun 9, 2017

Can you explain why the socket timeout should be used instead of server.timeout once something has been parsed? This change in behavior seems at odds with the documentation and what I would expect to happen.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 9, 2017

@mscdex the timeout may be changed via, e.g., res.setTimeout(), and test-https-set-timeout-server relies on this (as may some userland code do too).

@mscdex
Copy link
Contributor

mscdex commented Jun 9, 2017

Here is an alternative patch for _http_server.js that passes all tests (including the ones in this PR) and avoids net changes:

diff --git a/lib/_http_server.js b/lib/_http_server.js
index 357400e350..8e653a5f98 100644
--- a/lib/_http_server.js
+++ b/lib/_http_server.js
@@ -436,14 +436,6 @@ function socketOnData(server, socket, parser, state, d) {
   assert(!socket._paused);
   debug('SERVER socketOnData %d', d.length);
 
-  if (state.keepAliveTimeoutSet) {
-    socket.setTimeout(0);
-    if (server.timeout) {
-      socket.setTimeout(server.timeout);
-    }
-    state.keepAliveTimeoutSet = false;
-  }
-
   var ret = parser.execute(d);
   onParserExecuteCommon(server, socket, parser, state, ret, d);
 }
@@ -451,6 +443,15 @@ function socketOnData(server, socket, parser, state, d) {
 function onParserExecute(server, socket, parser, state, ret, d) {
   socket._unrefTimer();
   debug('SERVER socketOnParserExecute %d', ret);
+
+  if (state.keepAliveTimeoutSet) {
+    if (server.timeout)
+      socket.setTimeout(server.timeout);
+    else
+      socket.setTimeout(0);
+    state.keepAliveTimeoutSet = false;
+  }
+
   onParserExecuteCommon(server, socket, parser, state, ret, undefined);
 }
 
@@ -545,6 +546,14 @@ function resOnFinish(req, res, socket, state, server) {
 // new message. In this callback we setup the response object and pass it
 // to the user.
 function parserOnIncoming(server, socket, state, req, keepAlive) {
+  if (state.keepAliveTimeoutSet) {
+    if (server.timeout)
+      socket.setTimeout(server.timeout);
+    else
+      socket.setTimeout(0);
+    state.keepAliveTimeoutSet = false;
+  }
+
   state.incoming.push(req);
 
   // If the writable end isn't consuming, then stop reading

@aqrln
Copy link
Contributor Author

aqrln commented Jun 9, 2017

@mscdex awesome, thanks. It might be that I have gotten the reason of the failure wrong.

Fix the logic of resetting the socket timeout of keep-alive HTTP
connections and add two tests:

* `test-http-server-keep-alive-timeout-slow-server` is a regression test
  for nodejsGH-13391.  It ensures that the server-side keep-alive timeout will
  not fire during processing of a request.

* `test-http-server-keep-alive-timeout-slow-headers` ensures that the
  regular socket timeout is restored as soon as a client starts sending
  a new request, not as soon as the whole message is received, so that
  the keep-alive timeout will not fire while, e.g., the client is
  sending large cookies.

Refs: nodejs#2534
Fixes: nodejs#13391
@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

@mscdex
Copy link
Contributor

mscdex commented Jun 12, 2017

I'm not sure if the original block should also stay (instead of being removed as in my patch) in case the socket data stream switches from being handled directly in C++ land (for efficiency) to JS land when someone manually interacts with the socket (by reading data from it or such IIRC). @indutny added the JS stream stuff awhile back, maybe he can answer this.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

@mscdex hmm... moving the resetSocketTimeout() call from onParserExecute() to onParserExecuteCommon() should cover both cases, right?

And btw, thanks a lot for showing the right direction, after taking a fresh look I could see that my previous patch was actually wrong and didn't work quite the way it was supposed to, duh.

@mscdex
Copy link
Contributor

mscdex commented Jun 12, 2017

@aqrln That should cover both yeah.

@aqrln aqrln mentioned this pull request Jun 12, 2017
@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

CI failures are unrelated.

ubuntu1604-arm64 (#13603, #13626):

not ok 1531 parallel/test-child-process-stdio-big-write-end
  ---
  duration_ms: 120.181
  severity: fail
  stack: |-
    timeout
  ...

osx1010 (#13527, #13559):

not ok 194 async-hooks/test-callback-error
  ---
  duration_ms: 60.86
  severity: fail
  stack: |-
    timeout
  ...

@mscdex
Copy link
Contributor

mscdex commented Jun 12, 2017

LGTM

@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

Oh, and this one is interesting (arm-fanned tests finally started to run):

not ok 96 parallel/test-http-server-keep-alive-timeout-slow-server
  ---
  duration_ms: 3.650
  severity: fail
  stack: |-
    events.js:182
          throw er; // Unhandled 'error' event
          ^
    
    Error: read ECONNRESET
        at exports._errnoException (util.js:1014:11)
        at TCP.onread (net.js:609:25)
  ...

UPD: I can reproduce it with $ python tools/test.py -j 32 --repeat=10000 parallel/test-http-server-keep-alive-timeout-slow-server.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

Increasing timeouts in test-http-server-keep-alive-timeout-slow-server fixes test flakiness under load. The other test seems pretty reliable as is, or at least I could not stress test it out with $ python tools/test.py -j 32 --repeat=1000.

CI: https://ci.nodejs.org/job/node-test-pull-request/8614/

@refack
Copy link
Contributor

refack commented Jun 12, 2017

@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

@refack 2 failures out of 2000 runs. I've increased the timeouts even more, let's try it.

https://ci.nodejs.org/job/node-stress-single-test/nodes=armv7-ubuntu1404/1297/

UPD: forgot to change RUN_TIMES, actually ran it 2.5 times more times than I intended, and stopped it. Stress test is clean.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 12, 2017

@mscdex @mcollina @refack the only changes since the LGTMs are increased timeouts in tests, but you might want to take a final look.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 13, 2017

Most machines are free at the moment anyway, so
CI: https://ci.nodejs.org/job/node-test-pull-request/8616/

@mscdex
Copy link
Contributor

mscdex commented Jun 13, 2017

Perhaps instead of increasing the timeouts for all environments, use common.platformTimeout() instead so that architectures like arm will get adjusted accordingly?

@aqrln
Copy link
Contributor Author

aqrln commented Jun 13, 2017

@mscdex yeah, that does sound like a good idea.

@aqrln
Copy link
Contributor Author

aqrln commented Jun 13, 2017

@mscdex
Copy link
Contributor

mscdex commented Jun 13, 2017

LGTM

@aqrln
Copy link
Contributor Author

aqrln commented Jun 13, 2017

@aqrln
Copy link
Contributor Author

aqrln commented Jun 13, 2017

Landed in d71718d.

@aqrln aqrln closed this Jun 13, 2017
aqrln added a commit that referenced this pull request Jun 13, 2017
Fix the logic of resetting the socket timeout of keep-alive HTTP
connections and add two tests:

* `test-http-server-keep-alive-timeout-slow-server` is a regression test
  for GH-13391.  It ensures that the server-side keep-alive timeout will
  not fire during processing of a request.

* `test-http-server-keep-alive-timeout-slow-client-headers` ensures that
  the regular socket timeout is restored as soon as a client starts
  sending a new request, not as soon as the whole message is received,
  so that the keep-alive timeout will not fire while, e.g., the client
  is sending large cookies.

Refs: #2534
Fixes: #13391
PR-URL: #13549
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Brian White <mscdex@mscdex.net>
@aqrln aqrln deleted the fix-socket-hang-up branch June 13, 2017 04:45
MylesBorins pushed a commit that referenced this pull request Jun 13, 2017
Fix the logic of resetting the socket timeout of keep-alive HTTP
connections and add two tests:

* `test-http-server-keep-alive-timeout-slow-server` is a regression test
  for GH-13391.  It ensures that the server-side keep-alive timeout will
  not fire during processing of a request.

* `test-http-server-keep-alive-timeout-slow-client-headers` ensures that
  the regular socket timeout is restored as soon as a client starts
  sending a new request, not as soon as the whole message is received,
  so that the keep-alive timeout will not fire while, e.g., the client
  is sending large cookies.

Refs: #2534
Fixes: #13391
PR-URL: #13549
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Brian White <mscdex@mscdex.net>
MylesBorins pushed a commit that referenced this pull request Jun 13, 2017
Fix the logic of resetting the socket timeout of keep-alive HTTP
connections and add two tests:

* `test-http-server-keep-alive-timeout-slow-server` is a regression test
  for GH-13391.  It ensures that the server-side keep-alive timeout will
  not fire during processing of a request.

* `test-http-server-keep-alive-timeout-slow-client-headers` ensures that
  the regular socket timeout is restored as soon as a client starts
  sending a new request, not as soon as the whole message is received,
  so that the keep-alive timeout will not fire while, e.g., the client
  is sending large cookies.

Refs: #2534
Fixes: #13391
PR-URL: #13549
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Brian White <mscdex@mscdex.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Http connections aborted after 5s / keepAliveTimeout
5 participants