net: defer self.destroy calls to nextTick #54857

Open
wants to merge 1 commit into
base: main
from

Conversation

anilhelvaci

Fixes: #48771

What is the problem being solved?

#48771 reported that the request object returned from the http.request method cannot catch error events triggered when there is an immediate failure while connecting to an address returned from the DNS lookup.

Solution

#51038 implemented the changes suggested in #48771 (comment). However, #51038 couldn’t be merged due to a lack of tests (#51038 (review)). In this PR, I apply the same fix, this time with tests.
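
To illustrate why deferring helps, here is a minimal sketch. It is a simplified model, not the code in lib/net.js: connectAndFail and missedError are hypothetical names, and a plain EventEmitter stands in for the socket. It assumes the caller can only attach its 'error' handler after the connecting call has returned.

```javascript
'use strict';
const { EventEmitter } = require('node:events');

// Hypothetical helper (not part of lib/net.js): simulates a connection
// attempt that fails right away and reports the failure either
// synchronously or on the next tick.
function connectAndFail(deferred) {
  const socket = new EventEmitter();
  socket.missedError = null;
  const fail = () => {
    const err = new Error('immediate connect failure');
    if (socket.listenerCount('error') === 0) {
      // Nobody is listening yet. Calling emit('error') here would throw,
      // so the sketch records the missed error instead.
      socket.missedError = err;
      return;
    }
    socket.emit('error', err);
  };
  if (deferred) {
    process.nextTick(fail); // gives the caller a chance to attach handlers
  } else {
    fail(); // runs before the caller ever sees the socket
  }
  return socket;
}

// The caller attaches its handler only after connectAndFail() returns.
const caught = [];
for (const deferred of [false, true]) {
  const socket = connectAndFail(deferred);
  socket.on('error', (err) => caught.push({ deferred, message: err.message }));
  console.log(`deferred=${deferred}, error missed:`, socket.missedError !== null);
}
process.nextTick(() => console.log('errors delivered to handlers:', caught));
```

In this model the synchronous failure is lost because no handler exists yet, while the deferred failure reaches the handler attached after the call returns.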

Testing Considerations

Every process.nextTick(() => self.destroy()) call is hit by the provided tests except one. Below is the self.destroy() call that these tests do not hit:

node/lib/net.js

Line 1113 in 9404d3a

self.destroy(new ERR_SOCKET_CONNECTION_TIMEOUT());

I am not tagging this PR as "DRAFT" since the piece of code that isn't tested is for a connection timeout case.

@mcollina Please let me know if these tests are sufficient or not.

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. net Issues and PRs related to the net subsystem. labels Sep 9, 2024
Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 9, 2024

codecov bot commented Sep 9, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.91%. Comparing base (e1e312d) to head (4725298).
Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
lib/net.js 75.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #54857   +/-   ##
=======================================
  Coverage   87.90%   87.91%           
=======================================
  Files         651      651           
  Lines      183343   183343           
  Branches    35710    35722   +12     
=======================================
+ Hits       161165   161177   +12     
+ Misses      15466    15441   -25     
- Partials     6712     6725   +13     
Files with missing lines Coverage Δ
lib/net.js 95.26% <75.00%> (+1.04%) ⬆️

... and 25 files with indirect coverage changes

Contributor

@ShogunPanda ShogunPanda left a comment

LGTM!

Member

@anonrig anonrig left a comment

nice. thank you and congrats on your first contribution!

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 9, 2024
@nodejs-github-bot

This comment was marked as outdated.

@anilhelvaci
Author

anilhelvaci commented Sep 10, 2024

nice. thank you and congrats on your first contribution!

Thank you, so excited to become a part of the family! @anonrig

What are the next steps then? Do I have to do anything to initiate the merge? I see that my branch is behind a few commits, should I rebase and push again?

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 11, 2024
@mcollina
Member

mcollina commented Sep 11, 2024

What are the next steps then?

Running the CI and getting it to pass. We'll take care of running it; if there are related failures, it will be up to you to fix them.

Do I have to do anything to initiate the merge? I see that my branch is behind a few commits, should I rebase and push again?

no need unless there are conflicts.

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 11, 2024
@nodejs-github-bot
Collaborator

@anilhelvaci
Author

Hey @mcollina, thanks for your reply!

So the CI is currently failing. Looking at the details, I see two kinds of failures:

  1. parallel.test-http-client-immediate-error.js - The file I work on

The stack trace shows that my tests time out for some reason.

---
duration_ms: 120011.463
exitcode: -15
severity: fail
stack: |-
  timeout
  (node:376527) internal/test/binding: These APIs are for internal testing only. Do not use them.
  (Use `node --trace-warnings ...` to show where the warning was created)
...
  2. parallel.test-runner-watch-mode-complex

For this file, I have no idea why it is failing. Here's the stack trace:

duration_ms: 6031.101
exitcode: 1
severity: fail
stack: "\u25B6 test runner watch mode with more complex setup\n  \u2716 should run\
  \ tests when a dependency changed after a watched test file being deleted (4747.503807ms)\n\
  \    AssertionError [ERR_ASSERTION]: The input did not match the regular expression\
  \ /tests 2/. Input:\n\n    '\u2714 second test has ran (3.238638ms)\\n' +\n    \
  \  '\u2714 first test has ran (4.048804ms)\\n' +\n      '\u2714 first test has ran\
  \ (12.434145ms)\\n' +\n      '\u2139 tests 3\\n' +\n      '\u2139 suites 0\\n' +\n\
  \      '\u2139 pass 3\\n' +\n      '\u2139 fail 0\\n' +\n      '\u2139 cancelled\
  \ 0\\n' +\n      '\u2139 skipped 0\\n' +\n      '\u2139 todo 0\\n' +\n      '\u2139\
  \ duration_ms 1312.515689\\n'\n\n        at TestContext.<anonymous> (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-runner-watch-mode-complex.mjs:99:12)\n\
  \        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n\
  \        at async Test.run (node:internal/test_runner/test:888:9)\n        at async\
  \ Promise.all (index 0)\n        at async Suite.run (node:internal/test_runner/test:1268:7)\n\
  \        at async startSubtestAfterBootstrap (node:internal/test_runner/harness:283:3)\
  \ {\n      generatedMessage: true,\n      code: 'ERR_ASSERTION',\n      actual:\
  \ '\u2714 second test has ran (3.238638ms)\\n\u2714 first test has ran (4.048804ms)\\\
  n\u2714 first test has ran (12.434145ms)\\n\u2139 tests 3\\n\u2139 suites 0\\n\u2139\
  \ pass 3\\n\u2139 fail 0\\n\u2139 cancelled 0\\n\u2139 skipped 0\\n\u2139 todo 0\\\
  n...',\n      expected: /tests 2/,\n      operator: 'match'\n    }\n\n\u25B6 test\
  \ runner watch mode with more complex setup (4776.910571ms)\n\u2139 tests 1\n\u2139\
  \ suites 1\n\u2139 pass 0\n\u2139 fail 1\n\u2139 cancelled 0\n\u2139 skipped 0\n\
  \u2139 todo 0\n\u2139 duration_ms 4833.662639\n\n\u2716 failing tests:\n\ntest at\
  \ test/parallel/test-runner-watch-mode-complex.mjs:53:3\n\u2716 should run tests\
  \ when a dependency changed after a watched test file being deleted (4747.503807ms)\n\
  \  AssertionError [ERR_ASSERTION]: The input did not match the regular expression\
  \ /tests 2/. Input:\n\n  '\u2714 second test has ran (3.238638ms)\\n' +\n    '\u2714\
  \ first test has ran (4.048804ms)\\n' +\n    '\u2714 first test has ran (12.434145ms)\\\
  n' +\n    '\u2139 tests 3\\n' +\n    '\u2139 suites 0\\n' +\n    '\u2139 pass 3\\\
  n' +\n    '\u2139 fail 0\\n' +\n    '\u2139 cancelled 0\\n' +\n    '\u2139 skipped\
  \ 0\\n' +\n    '\u2139 todo 0\\n' +\n    '\u2139 duration_ms 1312.515689\\n'\n\n\
  \      at TestContext.<anonymous> (file:///home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-runner-watch-mode-complex.mjs:99:12)\n\
  \      at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n\
  \      at async Test.run (node:internal/test_runner/test:888:9)\n      at async\
  \ Promise.all (index 0)\n      at async Suite.run (node:internal/test_runner/test:1268:7)\n\
  \      at async startSubtestAfterBootstrap (node:internal/test_runner/harness:283:3)\
  \ {\n    generatedMessage: true,\n    code: 'ERR_ASSERTION',\n    actual: '\u2714\
  \ second test has ran (3.238638ms)\\n\u2714 first test has ran (4.048804ms)\\n\u2714\
  \ first test has ran (12.434145ms)\\n\u2139 tests 3\\n\u2139 suites 0\\n\u2139 pass\
  \ 3\\n\u2139 fail 0\\n\u2139 cancelled 0\\n\u2139 skipped 0\\n\u2139 todo 0\\n...',\n\
  \    expected: /tests 2/,\n    operator: 'match'\n  }"

There are other failing tests, but their severity is flaky, so I assume they are not a problem for me.

I rebased my branch on top of the current main and make -j4 test passed. I could use any suggestions on how to debug or reproduce the failures happening in CI 🤔

@aduh95
Contributor

aduh95 commented Sep 13, 2024

  1. parallel.test-http-client-immediate-error.js - The file I work on

The stack trace shows that my tests time out for some reason.

That's a red flag; we don't want to merge flaky tests into our codebase. A timeout probably means there's a race condition somewhere that you forgot to take into account, which results in the test never exiting. To try to reproduce the flakiness locally, you can run:

tools/test.py --repeat 9999 -t 2 test/parallel/test-http-client-immediate-error.js

Once you have a repro, you can attempt getting a fix ready.
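
While hunting for the race, one option is to bound each awaited event so that a hang turns into a quick failure naming the missing event. This is an assumption about a useful local debugging aid, not an established practice in this repository's tests. waitForEvent is a hypothetical helper built on events.once and AbortController:

```javascript
'use strict';
const { EventEmitter, once } = require('node:events');

// Hypothetical debugging helper: resolves with the event arguments, or
// rejects with an AbortError if `name` is not emitted within `ms`.
function waitForEvent(emitter, name, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return once(emitter, name, { signal: controller.signal })
    .finally(() => clearTimeout(timer));
}

// Demo: this emitter never emits 'connect', so the wait is cut short
// instead of keeping the caller waiting indefinitely.
const silent = new EventEmitter();
waitForEvent(silent, 'connect', 50).catch((err) => {
  console.log(`gave up waiting for 'connect': ${err.name}`);
});
```

A plain setTimeout is used here rather than an unreferenced timer so that the pending wait keeps the process alive until it either resolves or is aborted.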

@anilhelvaci
Author

Hey @aduh95, thanks for clarifying 🙏

Unfortunately, the command you suggested did not reproduce the problem for me. I'm on OSX; can you think of anything else that I can try?

@ronag
Member

ronag commented Sep 18, 2024

@mcollina FYI, this adds overhead to net to fix a bug in the "legacy" HTTP client. I don't mind per se, but it's not optimal since the bug is not actually in net.

@mcollina
Member

CI is failing

@anilhelvaci
Author

Yep, it is @mcollina. Any suggestions on how to reproduce it? Is there any chance I can shell into the machine (or container) that is failing? Or are there any snapshots or images I can pull onto my machine and spin up?

In other words, what are the usual steps that you guys take when you face a problem in CI?

@mcollina
Member

In this specific case, it seems that any Linux box on top of any virtualization software would do.

For more exotic systems @nodejs/build can provide access.

Labels
needs-ci PRs that need a full CI run. net Issues and PRs related to the net subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conditional unhandled 'error' event when http.request with .lookup
10 participants