Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls: fix writeQueueSize prop, long write timeouts #15791

Closed

Conversation

apapirovski
Copy link
Member

@apapirovski apapirovski commented Oct 6, 2017

This PR addresses a long-standing issue of _handle.writeQueueSize in TLS not representing the actual write queue (#15005), as well as the fact that long-lasting writes on both net & tls (http & https) will currently timeout if a socket timeout is set (#15082). The latter has been reported as being in relation to the keep alive timeout but in reality it's just a broad issue that is the most obvious with keep-alive since it's much shorter at 5s (vs 120s).

  • Make writeQueueSize represent the actual size of the write queue within the TLS socket.
  • Add tls test to confirm that bufferSize works as expected.

Second commit:

  • Add getWriteQueueSize on TLSWrap which updates and returns queue size (net & tls).
  • Make _onTimeout check whether an active write is ongoing and if so, call _unrefTimer rather than emitting a timeout event.
  • Add http & https test that checks whether long-lasting (but active) writes timeout or can finish writing as expected.

Haven't worked in C++ in a very, very long time so would definitely appreciate any and all feedback there.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

http, https, net, test, tls

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Oct 6, 2017
@mscdex
Copy link
Contributor

mscdex commented Oct 6, 2017

/cc @indutny

@apapirovski
Copy link
Member Author

Just a minor update to use fixtures.readKey instead. (Indirectly benefiting and learning from code-and-learn 👍)

/cc @mcollina

@mcollina
Copy link
Member

mcollina commented Oct 7, 2017

@mcollina
Copy link
Member

mcollina commented Oct 7, 2017

Is updating this state have a performance impact? Have you tried to run our benchmarks? I think we can run them on CI if you do not have the spare CPU cycles (they take a long time).

@apapirovski
Copy link
Member Author

apapirovski commented Oct 7, 2017 via email

@apapirovski
Copy link
Member Author

apapirovski commented Oct 7, 2017 via email

@apapirovski
Copy link
Member Author

I've updated the tests to, hopefully, be less flaky. Would appreciate if we could get another CI started. Thanks!

@joyeecheung
Copy link
Member

@apapirovski
Copy link
Member Author

apapirovski commented Oct 8, 2017

Well, this might take a while. If anyone has any ideas on making these less timeout dependant (while still testing the timeout bug), I'm all ears.

@apapirovski
Copy link
Member Author

apapirovski commented Oct 8, 2017

Ok, I think this should work... ? I can run a stress test on the test on my local system now and it doesn't fail (before only 50% succeeded). Since it's still in sequential (and should stay there), I'm hoping that'll make it not fail on the slower systems.

@joyeecheung
Copy link
Member

@apapirovski
Copy link
Member Author

Well, this seems to succeed everywhere but SmartOS so I guess an improvement... Anyone have any suggestions by any chance? Going to try setting up a SmartOS box to test on...

@apapirovski
Copy link
Member Author

apapirovski commented Oct 8, 2017

Ok, one last try. If this doesn't fix SmartOS then there's a deeper issue that might not have anything to do with the test itself. I can run this for an hour on my system (96 instances at a time) without any failures (previous version would fail like 2.5% of the time).

(And master still fails this test 100% of the time.)

@mcollina
Copy link
Member

mcollina commented Oct 9, 2017

@apapirovski
Copy link
Member Author

apapirovski commented Oct 9, 2017

Oops, I accidentally deleted a few characters before committing the http version of the test so it was testing with readSize of 3333 instead of 333333. No wonder it failed. Could we run the CI just on SmartOS (and OS X)?

If that doesn't work, I have one other potential direction to take this test...

@mcollina
Copy link
Member

@apapirovski
Copy link
Member Author

Thanks for starting it @mcollina. No luck this time either :( No clue if there's a bug in http on SmartOS related to writeQueueSize or if it's just the test. Almost done setting up a SmartOS box so I guess I'll find out...

@apapirovski
Copy link
Member Author

Having a SmartOS box really made this easy to fix. Should be good for another CI run now.

@mcollina
Copy link
Member

@apapirovski
Copy link
Member Author

Thanks! Looks like it worked this time (the SmartOS failure is unrelated, it's in parallel/test-benchmark-string_decoder). Now to get some reviews... :)

ping @indutny since you know the most about the TLS code.

apapirovski added a commit to apapirovski/node that referenced this pull request Oct 25, 2017
Add updateWriteQueueSize which updates and returns queue size
(net & tls). Make _onTimeout check whether an active write
is ongoing and if so, call _unrefTimer rather than emitting
a timeout event.

Add http & https test that checks whether long-lasting (but
active) writes timeout or can finish writing as expected.

PR-URL: nodejs#15791
Fixes: nodejs#15082
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
cjihrig pushed a commit that referenced this pull request Oct 25, 2017
This commit handles the case where _onTimeout is called with a
null handle.

Refs: #15791
Fixes: #16484
PR-URL: #16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
cjihrig pushed a commit that referenced this pull request Oct 25, 2017
This commit handles the case where _onTimeout is called with a
null handle.

Refs: #15791
Fixes: #16484
PR-URL: #16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
apapirovski added a commit to apapirovski/node that referenced this pull request Oct 25, 2017
This commit handles the case where _onTimeout is called with a
null handle.

Refs: nodejs#15791
Fixes: nodejs#16484
PR-URL: nodejs#16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
addaleax pushed a commit to ayojs/ayo that referenced this pull request Oct 26, 2017
Make writeQueueSize represent the actual size of the write queue
within the TLS socket. Add tls test to confirm that bufferSize
works as expected.

PR-URL: nodejs/node#15791
Fixes: nodejs/node#15005
Refs: nodejs/node#15006
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
addaleax pushed a commit to ayojs/ayo that referenced this pull request Oct 26, 2017
Add updateWriteQueueSize which updates and returns queue size
(net & tls). Make _onTimeout check whether an active write
is ongoing and if so, call _unrefTimer rather than emitting
a timeout event.

Add http & https test that checks whether long-lasting (but
active) writes timeout or can finish writing as expected.

PR-URL: nodejs/node#15791
Fixes: nodejs/node#15082
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
addaleax pushed a commit to ayojs/ayo that referenced this pull request Oct 26, 2017
This commit handles the case where _onTimeout is called with a
null handle.

Refs: nodejs/node#15791
Fixes: nodejs/node#16484
PR-URL: nodejs/node#16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
addaleax pushed a commit to ayojs/ayo that referenced this pull request Dec 7, 2017
Make writeQueueSize represent the actual size of the write queue
within the TLS socket. Add tls test to confirm that bufferSize
works as expected.

PR-URL: nodejs/node#15791
Fixes: nodejs/node#15005
Refs: nodejs/node#15006
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
addaleax pushed a commit to ayojs/ayo that referenced this pull request Dec 7, 2017
Add updateWriteQueueSize which updates and returns queue size
(net & tls). Make _onTimeout check whether an active write
is ongoing and if so, call _unrefTimer rather than emitting
a timeout event.

Add http & https test that checks whether long-lasting (but
active) writes timeout or can finish writing as expected.

PR-URL: nodejs/node#15791
Fixes: nodejs/node#15082
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
addaleax pushed a commit to ayojs/ayo that referenced this pull request Dec 7, 2017
This commit handles the case where _onTimeout is called with a
null handle.

Refs: nodejs/node#15791
Fixes: nodejs/node#16484
PR-URL: nodejs/node#16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
apapirovski added a commit to apapirovski/node that referenced this pull request Dec 12, 2017
Make writeQueueSize represent the actual size of the write queue
within the TLS socket. Add tls test to confirm that bufferSize
works as expected.

PR-URL: nodejs#15791
Fixes: nodejs#15005
Refs: nodejs#15006
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
apapirovski added a commit to apapirovski/node that referenced this pull request Dec 12, 2017
Add updateWriteQueueSize which updates and returns queue size
(net & tls). Make _onTimeout check whether an active write
is ongoing and if so, call _unrefTimer rather than emitting
a timeout event.

Add http & https test that checks whether long-lasting (but
active) writes timeout or can finish writing as expected.

PR-URL: nodejs#15791
Fixes: nodejs#15082
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
apapirovski added a commit to apapirovski/node that referenced this pull request Dec 12, 2017
This commit handles the case where _onTimeout is called with a
null handle.

Refs: nodejs#15791
Fixes: nodejs#16484
PR-URL: nodejs#16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
MylesBorins pushed a commit that referenced this pull request Dec 20, 2017
Make writeQueueSize represent the actual size of the write queue
within the TLS socket. Add tls test to confirm that bufferSize
works as expected.

Backport-PR-URL: #16420
PR-URL: #15791
Fixes: #15005
Refs: #15006
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
MylesBorins pushed a commit that referenced this pull request Dec 20, 2017
Add updateWriteQueueSize which updates and returns queue size
(net & tls). Make _onTimeout check whether an active write
is ongoing and if so, call _unrefTimer rather than emitting
a timeout event.

Add http & https test that checks whether long-lasting (but
active) writes timeout or can finish writing as expected.

Backport-PR-URL: #16420
PR-URL: #15791
Fixes: #15082
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
MylesBorins pushed a commit that referenced this pull request Dec 20, 2017
This commit handles the case where _onTimeout is called with a
null handle.

Backport-PR-URL: #16420
Refs: #15791
Fixes: #16484
PR-URL: #16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jan 2, 2018
Make writeQueueSize represent the actual size of the write queue
within the TLS socket. Add tls test to confirm that bufferSize
works as expected.

Backport-PR-URL: #16420
PR-URL: #15791
Fixes: #15005
Refs: #15006
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jan 2, 2018
Add updateWriteQueueSize which updates and returns queue size
(net & tls). Make _onTimeout check whether an active write
is ongoing and if so, call _unrefTimer rather than emitting
a timeout event.

Add http & https test that checks whether long-lasting (but
active) writes timeout or can finish writing as expected.

Backport-PR-URL: #16420
PR-URL: #15791
Fixes: #15082
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jan 2, 2018
This commit handles the case where _onTimeout is called with a
null handle.

Backport-PR-URL: #16420
Refs: #15791
Fixes: #16484
PR-URL: #16489
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Jan 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants