-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: ignore spurious 'EMFILE' #12698
Conversation
@nodejs/testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a reasonable explanation as to why EMFILE
is being raised, I'm not thrilled with the idea of adding it just to make the test pass. The other errors listed make sense to me based on the comment above, but EMFILE
seems surprising. This could just be my own ignorance....
Sorry, I missed adding that reference: #3635 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc/ @nodejs/platform-windows
@@ -37,7 +37,8 @@ const server = net.createServer(function(s) { | |||
// is still happening while the server has been closed. | |||
s.on('error', function(err) { | |||
if ((err.code !== 'ECONNRESET') && | |||
((err.code !== 'ECONNREFUSED'))) { | |||
(err.code !== 'ECONNREFUSED') && | |||
(err.code !== 'EMFILE')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@refack could you put a brief two line explanation of why this error occurs in a comment above? When reading through tests mysteriously ignored errors are always a bit of code smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack.
PTAL
@Trott @gibfahn |
This exclusion would be a continuation to #5422 |
Let's see if a stress test can reproduce the error (or other errors) and if load on the machine matters at all. If the answer here is "move to sequential", I'd rather do that than have a bunch of exceptions in the code. Maybe we can even remove an exception or two if we go that route. Or not. Let's see... Stress test on win10 with just one test running at a time: Stress test on win10 with 100 tests running at a time: |
Stress tests seem to show that |
Maybe windows is actually running out of file handles? IMHO if we want to just keep the original intention of the test (protect from regressing #2847) we could ignore all errors, except a SEGFAULT. If you think this test is useful for covering other cases, then it's worth the extra time... |
Hope I'm not being presumptuous or anything, but on my own fork/branch: I removed all the guard code except the one that I"m sure is needed, moved to sequential, opened #12713 but labeled it I'm in favor of having this be specific to that regression. I like my tests narrowly focused, especially ones designed for oddball regressions. :-D |
Why not have them both, one sequential and unfiltered, and one parallel with the filters? All well documented of course. |
Why would we? Moving a test to sequential just means it can't be run in parallel, which is a minor annoyance as it makes our test runs slightly slower. Duplicating the test seems unnecessary.
I don't think we have a built in way to do that, I normally just build Node in the background, seems to do the trick (not the Node I'm testing obviously 😁 ). Maybe running benchmarks at the same time would provide enough load? |
@gibfahn sorry if I'm not RTFM but how do I trigger a stress test (like on the CI) locally? |
@refack Probably worth doing something like:
and then grepping for You could use Powershell's |
@gibfahn: @Trott's premise is that in sequential mode the exception exclusions are unnecessary (#12713). Although they are obviously needed in parallel mode. So having them tested in both modes tests this scenario under two different conditions. "Can't have too much testing"? P.S. @Trott you had a typo ( |
I disagree. Every test should be testing something useful, otherwise you end up with a huge backlog of tests, and it's not sure what a test is actually for or why it failed.
But |
To stress test a parallel test locally, I also use -j to specify how many tests to run simultaneously. $ tools/test.py --repeat=92 -j 92 parallel/test-foo-bar |
Just got EMFILE
in a sequential run, and reading the explanation linked above, it seems to be an unexpected if infrequent thing to happen in this test. Clearing my objection.
Argh, can't edit the above (I don't think) but I meant to say |
@@ -35,9 +40,14 @@ const server = net.createServer(function(s) { | |||
|
|||
// Errors can happen if this connection | |||
// is still happening while the server has been closed. | |||
// https://github.com/nodejs/node/issues/3635#issuecomment-157714683 | |||
// If a previous request caused the worker __and__ the server to close | |||
// a following "send" request could emit EMFILE, this is a sporius |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sporius -> spurious
@@ -35,9 +40,14 @@ const server = net.createServer(function(s) { | |||
|
|||
// Errors can happen if this connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
- ECONNREFUSED or ECONNRESET errors can happen if this connection
@@ -35,9 +40,14 @@ const server = net.createServer(function(s) { | |||
|
|||
// Errors can happen if this connection | |||
// is still happening while the server has been closed. | |||
// https://github.com/nodejs/node/issues/3635#issuecomment-157714683 | |||
// If a previous request caused the worker __and__ the server to close |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If a previous request...
Basically can it be really clear which explanation is for which errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM modulo nits
@@ -1,3 +1,7 @@ | |||
/* Description: | |||
* Before #2847 a child process trying (asynchronously) to use a the closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2847
-> https://github.com/nodejs/node/pull/2847
a the
-> a
@@ -1,3 +1,7 @@ | |||
/* Description: | |||
* Before #2847 a child process trying (asynchronously) to use a the closed | |||
* channel to it's creator, would have caused a segfault. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to connect to
?
@@ -1,3 +1,7 @@ | |||
/* Description: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't need the Description:
@gibfahn, commands addressed PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one nit
// is still happening while the server has been closed. | ||
// https://github.com/nodejs/node/issues/3635#issuecomment-157714683 | ||
// ECONNREFUSED or ECONNRESET errors can happen if this connection can | ||
// happen if this connection is still happening while the server has closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can happen if this connection
is repeated twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM with a couple of nits
* Before https://github.com/nodejs/node/pull/2847 a child process | ||
* trying (asynchronously) to use the closed channel to it's creator | ||
* would have caused a segfault. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment block should use the typical //
style.
@@ -6,6 +11,7 @@ const assert = require('assert'); | |||
const cluster = require('cluster'); | |||
const net = require('net'); | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary whitespace change
Comments fixed, PTAL. [joke] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are all a bunch of nit-pickers
Speaking of which...
(I see it as a game, can you minimize the number of nits you'll get per PR)
@@ -1,3 +1,5 @@ | |||
// Before https://github.com/nodejs/node/pull/2847 a child process trying | |||
// (asynchronously) to use the closed channel to it's creator caused a segfault. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's
-> its
@@ -1,3 +1,5 @@ | |||
// Before https://github.com/nodejs/node/pull/2847 a child process trying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, should be an issue, so:
https://github.com/nodejs/node/pull/2847
->https://github.com/nodejs/node/issues/2847
Also can you order this as in the guide? Should look like:
'use strict';
const common = require('../common');
// Before https://github.com/nodejs/node/issues/2847 a child process trying
// (asynchronously) to use the closed channel to it's creator caused a segfault.
const assert = require('assert');
ab5f12c
to
b372ff6
Compare
According to the explanation in nodejs#3635#issuecomment-157714683 And as a continuation to nodejs#5422 we also ignore EMFILE "No more file descriptors are available,so no more files can be opened" PR-URL: nodejs#12698 Fixes: nodejs#10286 Refs: nodejs#3635 (comment) Refs: nodejs#5178 Refs: nodejs#5179 Refs: nodejs#4005 Refs: nodejs#5121 Refs: nodejs#5422 Refs: nodejs#12621 (comment) Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Landed in 6f21671 |
Post land CI: https://ci.nodejs.org/job/node-test-commit/10004/ |
According to the explanation in #3635#issuecomment-157714683 And as a continuation to #5422 we also ignore EMFILE "No more file descriptors are available,so no more files can be opened" PR-URL: #12698 Fixes: #10286 Refs: #3635 (comment) Refs: #5178 Refs: #5179 Refs: #4005 Refs: #5121 Refs: #5422 Refs: #12621 (comment) Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
I've landed this on v6.x... please lmk if this was a mistake |
As discussed in most recently on #10286
test/parallel/test-child-process-fork-regr-gh-2847.js
is a little flakyLast sighting was
Possible explanation #3635 (comment)
So as a follow up to #5422 I suggest we also ignore
EMFILE
raised from the connection.Ref: #3635
Ref: #5178
Ref: #5179
Ref: #4005
Ref: #5121
Ref: #5422
Ref: #12621 (comment)
Fixes: #10286
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
test