-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IBM i test failures (read ECONNRESET
)
#39683
Comments
FWIW https://ci.nodejs.org/job/node-test-commit-ibmi/469/nodes=ibmi73-ppc64/#showFailuresLink shows the same seven tests failing on test-iinthecloud-ibmi73-ppc64_be-2. |
@richardlau see #39525 (comment). Can you try to rerun the tests with #36111 applied? |
https://ci.nodejs.org/job/node-test-commit-ibmi/471/nodes=ibmi73-ppc64/ |
cc: @vtjnash |
Another possible culprit is 0e841b4. Was it included in the build? |
https://ci.nodejs.org/job/node-test-commit-ibmi/471/nodes=ibmi73-ppc64/ was #36111 rebased on top of 822f9ff. https://ci.nodejs.org/job/node-test-commit-ibmi/471/nodes=ibmi73-ppc64/consoleFull
|
@richardlau I think reverting libuv/libuv#3006 might fix the problem as the error is coming from line 220 here and this has something to do with node/lib/internal/stream_base_commons.js Lines 217 to 222 in 52ebe0f
|
libuv/libuv#3006 may be likely to exacerbate existing race conditions in nodejs (since it surfaces attempted writes-after-shutdown as errors somewhat quicker), but it seems unlikely to cause them. Looking at parallel/test-http-client-parse-error, this test server appears to violate the TCP protocol (valid, as that is the point of the test), which is causing the client to crash (invalid, as it means the test is correctly determining that nodejs is broken here). This test defines a TCP server which does not read any of the incoming data, which means the TCP spec requires that the kernel send a ECONNRESET packet (exactly what we sometimes see happen there). However, that server also immediately writes data then calls The nodejs failure to handle this situation correctly in the client seems related to be #39363, though in the http stack instead of the TLS stack. My fix #36111 is somewhat related, but only for the happy path, and doesn't fix the error path in nodejs (which is what we see triggering failures in the tests above). |
Refs: nodejs#39683 These are being worked, but we really should have marked flaky a long time ago in ordert to make then nightlies non-red. Signed-off-by: Michael Dawson <mdawson@devrus.com>
Refs: #39683 These are being worked, but we really should have marked flaky a long time ago in ordert to make then nightlies non-red. Signed-off-by: Michael Dawson <mdawson@devrus.com> PR-URL: #41812 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
PR to exclude failures until we get them resolved - #41812 |
Refs: #39683 These are being worked, but we really should have marked flaky a long time ago in ordert to make then nightlies non-red. Signed-off-by: Michael Dawson <mdawson@devrus.com> PR-URL: #41812 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: nodejs#41812 Refs: nodejs#39683
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: #41812 Refs: #39683 PR-URL: #41984 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Mestery <mestery@protonmail.com> Reviewed-By: Beth Griggs <bgriggs@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Applying #36111, reverting libuv/libuv#3006 or 0e841b4 does not solve the issue on my local machine; The failure appears to be related to a known issue on IBMi when calling In this case ECONNRESET comes from I'm looking into whether there are any drawbacks to calling |
The failures here are a few examples of a general issue with nodejs, not specifically related to any particular platform, but because the http and http2 servers are not implemented robustly against malicious remote clients: #39363 My PR (#36111) fixed a similar bug in the TLS handling, but is not related to this issue (which fails also without TLS). |
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: nodejs#41812 Refs: nodejs#39683 PR-URL: nodejs#41984 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Mestery <mestery@protonmail.com> Reviewed-By: Beth Griggs <bgriggs@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: nodejs#41812 Refs: nodejs#39683 PR-URL: nodejs#41984 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Mestery <mestery@protonmail.com> Reviewed-By: Beth Griggs <bgriggs@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: #41812 Refs: #39683 PR-URL: #41984 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Mestery <mestery@protonmail.com> Reviewed-By: Beth Griggs <bgriggs@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: #41812 Refs: #39683 PR-URL: #41984 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Mestery <mestery@protonmail.com> Reviewed-By: Beth Griggs <bgriggs@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
According to conversation in libuv/libuv#3494, this seems to be a node bug, meaning that node http implementation should be resilient to unexpected RST from a TCP connection (and clients that violate TCP protocol in general), and not crash the process on such instances. In this particular case, UV_ECONNRESET is passed to read_cb by libuv as designed, and should be handled on the caller side (possibly in https://github.com/nodejs/node/blob/master/lib/internal/stream_base_commons.js#L167). This problem appears to affect all platforms; The Looking into whether a contained solution affecting Ibmi only is possible on the node side. |
Trying to create a test case that would reliably reproduce the problem on all platforms; I have read through #36111, #27916 and the most reproducible test that shows the effect of unexpected TCP RST seems to be #27916 (comment), (which fails on OSX and Windows reliably). The minimal test could involve a simple server and a client that sends RST right after connecting; server.js:
Error message:
I don't think a contained solution that would take care of IBMi is possible in node. |
Refs: #39683 These are being worked, but we really should have marked flaky a long time ago in ordert to make then nightlies non-red. Signed-off-by: Michael Dawson <mdawson@devrus.com> PR-URL: #41812 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
ok, IBM i was orange again today so I think the reason I opened this has been addressed. The test still fails but is excluded. @V-for-Vasili I don't quite understand this comment - I don't think a contained solution that would take care of IBMi is possible in node. |
@mhdawson Apologies! Let me take that back; Still looking for such solution atm. To summarize relevant info above: The issue of properly handling unexpected TCP RST packets (responsible for ECONNRESET errors) in Node http implementation has been brought up #36180, #27916, but there does not seem to be a consensus on what to do about it yet. As far as I can tell, fixing this would be a major change on the node side and affect all platforms. |
@V-for-Vasili thanks, that helps me understand better. |
Refs: #39683 These are being worked, but we really should have marked flaky a long time ago in ordert to make then nightlies non-red. Signed-off-by: Michael Dawson <mdawson@devrus.com> PR-URL: #41812 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Refs: #39683 These are being worked, but we really should have marked flaky a long time ago in ordert to make then nightlies non-red. Signed-off-by: Michael Dawson <mdawson@devrus.com> PR-URL: #41812 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Update: Patch to resolve the TCP stack behavior on IBMi 7.3,7.4 is responsible for the rest of the failures that have to do with ECONNRESET. I am talking to networking team about the possibility of fixing it in a PTF sometime in the future. There is no timeline yet - will post updates here when more is known. |
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: nodejs#41812 Refs: nodejs#39683 PR-URL: nodejs#41984 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Mestery <mestery@protonmail.com> Reviewed-By: Beth Griggs <bgriggs@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Correct the names of two tests that have been marked `FLAKY` on IBM i so they will actually be marked as such by the test runner. Refs: #41812 Refs: #39683 PR-URL: #41984 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Mestery <mestery@protonmail.com> Reviewed-By: Beth Griggs <bgriggs@redhat.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Michael Dawson <midawson@redhat.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Update: Applying the following PTFs resolves the underlying TCP stack behavior and will make the tests pass:
|
The description for MF69723 looks good to me as a fix for this issue (ECONNRESET should not be set for close). MF69703 sounds to me like a new violation of the TCP standard (RST packets are supposed to destroy the read queue, superseding and discarding any waiting data). |
@ThePrez Can you take care of applying the PTFs onto the CI machines? |
Yes. Will do! |
I have received some feedback and clarification to the above from the team that worked on both PTFs. According to them, the change made in MF69703 does not violate the TCP standard: The sockets receive code path was updated to return valid data queued on the socket, not the TCP receive queue. If there is pending data at the TCP layer when the RST is received, it is handled according to RFC 793. |
I believe that RFC 793 specifies that all segment queues should be flushed and any outstanding RECEIVES should report a reset error instead:
After MF69703, the code is now not deleting the TCB nor flushing the incoming queue, and is continuing to process outstanding RECIEVEs. Either way though, the client application should eventually receive the error once the queue is drained, so it is not a significant issue, but it might be different than how I interpret other platforms would handle this. |
Mark `test-http-pipeline-requests-connection-leak` flaky on IBM i. PR-URL: nodejs#44215 Refs: nodejs#43509 Refs: nodejs#39683 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Darshan Sen <raisinten@gmail.com>
Mark `test-http-pipeline-requests-connection-leak` flaky on IBM i. PR-URL: nodejs/node#44215 Refs: nodejs/node#43509 Refs: nodejs/node#39683 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Darshan Sen <raisinten@gmail.com>
Mark `test-http-pipeline-requests-connection-leak` flaky on IBM i. PR-URL: nodejs/node#44215 Refs: nodejs/node#43509 Refs: nodejs/node#39683 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Darshan Sen <raisinten@gmail.com>
Starting with yesterday's daily build, node-test-commit-ibmi is seeing several test failures with
Error: read ECONNRESET
).Earlier builds (e.g. https://ci.nodejs.org/job/node-test-commit-ibmi/466/) are passing -- the source change between the last passing build and the failing builds is c61870c (the libuv 1.42.0 update). For the record, the libuv builds (libuv-test-commit-ibmi and libuv-test-commit-ibmi-cmake) have been passing.
Both the failing builds were on test-iinthecloud-ibmi73-ppc64_be-1. I've started a new build (in progress) on test-iinthecloud-ibmi73-ppc64_be-2 to check whether it's host specific: https://ci.nodejs.org/job/node-test-commit-ibmi/469/nodes=ibmi73-ppc64/
cc @nodejs/platform-ibmi
The text was updated successfully, but these errors were encountered: