
test: skip test-diagnostics-channel-memory-leak.js #50327

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

joyeecheung (Member) commented Oct 22, 2023

test: skip test-diagnostics-channel-memory-leak.js

There is currently no reliable way to detect this leak because:

  1. We cannot reliably get a reference to the channel from the
     API to detect finalization without creating another strong reference.
  2. This test does gc() and then checks memory usage - however, the
     use of gc() disables code aging, which can actually lead to increased
     memory usage overall; gc() is not intended to be used to lower
     memory usage in the first place.
  3. The implementation of diagnostics channels relies on ephemeron GC,
     which is inefficient, so it's not reliable to use the typical "create
     a lot of objects and see if it crashes" trick to check for leaks.

Skip the test for now until we find a way to test it reliably.

To avoid flakiness in the CI, it's better to remove an unreliable
test altogether.

I found this issue from an integration test run in https://chromium-review.googlesource.com/c/v8/v8/+/4962094
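
For context, below is a rough sketch of the kind of gc()-then-measure check the skipped test relied on. It is illustrative only (the channel names and iteration count are made up), not the exact test file, and it requires running node with --expose-gc:

```js
'use strict';
// Illustrative only: roughly the gc()-then-measure pattern the skipped test
// relied on, not the exact contents of the test file. Run with --expose-gc.
const { channel } = require('node:diagnostics_channel');

function heapUsedAfterGc() {
  global.gc(); // forced GC; note that this disables code aging (point 2 above)
  return process.memoryUsage().heapUsed;
}

const before = heapUsedAfterGc();
for (let i = 0; i < 1000; i++) {
  channel(`leak-test-${i}`); // channels are only weakly retained internally
}
const after = heapUsedAfterGc();

// Asserting that `after` stays close to `before` is exactly the step that
// turns out to be unreliable in CI.
console.log({ before, after, growth: after - before });
```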

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels Oct 22, 2023
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 22, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 22, 2023
joyeecheung (Member, Author) commented Oct 28, 2023

Actually, I don't think we can use checkIfCollectable() to check this, given how the channels are themselves retained by a WeakMap holding WeakRefs. I am inclined to just remove this test, as there is no reliable way to detect this leak:

  1. Use of gc() should generally be avoided in leak detection tests because it disables code aging for the compilation cache (which is why the test was failing in the V8 integration test).
  2. Because it relies on ephemeron GC, which is inefficient, it's not reliable to use the typical "create a lot of objects and see if it crashes" trick.

@joyeecheung joyeecheung changed the title from "test: use checkIfCollectable() in diganostics channel leak test" to "test: remove test-diagnostics-channel-memory-leak.js" Oct 28, 2023
joyeecheung (Member, Author):

Updated to just remove this test. @jasnell can you take a look again?

also cc @Qard @theanarkh

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 28, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 28, 2023
Qard (Member) commented Oct 29, 2023

Would checking with FinalizationRegistry help? Could make a ton of instances, register each, and then check if any of them ever triggered the finalizer before the process exits. 🤔
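
A minimal sketch of this idea, with made-up channel names (this is essentially the same pattern the checkIfCollectable() helper mentioned below relies on):

```js
'use strict';
// Sketch of the FinalizationRegistry idea: create many channels, register
// each, and see whether any finalizer ever fires before the process exits.
const { channel } = require('node:diagnostics_channel');

let finalized = 0;
const registry = new FinalizationRegistry(() => { finalized++; });

for (let i = 0; i < 10000; i++) {
  registry.register(channel(`fr-test-${i}`));
}

process.on('exit', () => {
  // Zero finalizations is still spec-compliant behaviour, which is what
  // makes this approach flaky in practice.
  console.log(`finalized ${finalized} of 10000 channels`);
});
```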

joyeecheung (Member, Author) commented Oct 29, 2023

> Would checking with FinalizationRegistry help? Could make a ton of instances, register each, and then check if any of them ever triggered the finalizer before the process exits. 🤔

That's what I tried with checkIfCollectable() (which does exactly this), and it doesn't work - still flaky. Locally the finalizer is triggered after a few instances, but in the CI that's a different story. Technically this is also not something FinalizationRegistry guarantees - it's still spec-compliant for the finalizer to never get called. I think the lesson learned from #49710 is that FinalizationRegistry, even combined with heap snapshots (which are another inappropriate way to trigger GC), is still unreliable for detecting leaks, and leak detection in JS land can only be done on a best-effort basis. If false positives are inevitable, just don't test it.

Qard (Member) commented Oct 29, 2023

I mean, diagnostics_channel itself is using FinalizationRegistry for the cleanup that this is supposed to be testing, so if FinalizationRegistry is unreliable, isn't that a problem? 🤔

Or is this just an issue of the particular behaviour of --expose_gc? Would there be a way to trigger a deeper GC at the C++ level that we could expose for the tests?

joyeecheung (Member, Author) commented Oct 30, 2023

> so if FinalizationRegistry is unreliable isn't that a problem? 🤔

Yes, which is why FinalizationRegistry should be avoided if possible (https://github.com/tc39/proposal-weakrefs#a-note-of-caution) - it seems the diagnostics channel implementation is doing something the proposal specifically advises against, though that's out of scope for this PR.

> Or is this just an issue of the particular behaviour of --expose_gc? Would there be a way to trigger a deeper GC at the C++ level that we could expose for the tests?

--expose_gc (or, specifically, the forced GC it allows) just alters the GC mechanism slightly in that it disables code aging (so compiled code is never going to be GC'ed, which is what led to the failure of the original test once there was more (unrelated) compiled code in the heap). V8 intentionally does not provide a public API to "trigger a deeper GC" (at least not one that's guaranteed to clear the specific objects you think should be cleared), because GC is supposed to be internal to the JavaScript engine implementation. I think any test that relies on "a deeper GC" to clear specific objects, especially to deflake, is not really worth the maintenance cost. If such tests don't flake, it's okay to keep them, but if they do flake, it's better not to keep an unreliable, flaky test.

joyeecheung (Member, Author) commented Oct 30, 2023

If there are no more concerns about removing the test by Wednesday UTC, I am going to land this change to unblock https://chromium-review.googlesource.com/c/v8/v8/+/4962094 - if it's possible to invent a leak detection test that does not flake, it can always be added later, but I think the current test is just flaky in nature, so it's better to remove it for now.

Qard (Member) left a comment:

Approving to unblock this, but I'd prefer to avoid deleting the test if we can. Would adding the test to a test/**.status file as flaky work rather than deleting it? I don't know enough about how the flaky tagging works to know if that would get things passing while still keeping the test around to unmark in the future if we fix it.

As for diagnostics_channel using FinalizationRegistry: it's not ideal that it might not clean up, but it was basically a solution to the problem that dynamically generated channels (which are discouraged) would otherwise stick around forever. It's an attempt to prevent users from shooting themselves in the foot, though perhaps not 100% successful. If you've got any ideas on how to do it more reliably, I would love to hear them. I think it's probably acceptable as it is, though, since it's hopefully at least better than definitely leaking. As it is, it may just leak an empty WeakRef instance. 😅
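
A simplified sketch of the WeakRef-plus-FinalizationRegistry pattern being described (illustrative only, not the actual lib/diagnostics_channel.js code): channels are held through WeakRefs so they can be collected, and a finalizer removes the otherwise-empty entry afterwards.

```js
// Simplified illustration of the pattern, not the real implementation:
// values are stored behind WeakRefs so they can be garbage-collected, and a
// FinalizationRegistry cleans up the map entry once the value is gone.
class WeakRefMap extends Map {
  #finalizers = new FinalizationRegistry((key) => this.delete(key));

  set(key, value) {
    this.#finalizers.register(value, key);
    return super.set(key, new WeakRef(value));
  }

  get(key) {
    // If the value was collected but never finalized, this returns undefined
    // while the (empty) entry lingers - the "may leak an empty WeakRef" case.
    return super.get(key)?.deref();
  }
}
```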

joyeecheung (Member, Author):

> I don't know enough about how the flaky tagging works to know if that would get things passing while still keeping the test around to unmark in the future if we fix it.

I think in this case having a flaky test is probably not the best solution, because this would be intentionally making the CI more orange. Maybe we can mark it as SKIP directly if we really want to keep the test?
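
For illustration, a status-file entry along these lines might look roughly like the snippet below; the file and section shown are assumptions, following the conventions the existing test/**.status files use:

```
# test/parallel/parallel.status (illustrative placement)

[true] # This section applies to all platforms
# Leak detection is unreliable; see the discussion in this PR.
test-diagnostics-channel-memory-leak: SKIP
# Alternatively, PASS,FLAKY would keep running the test but tolerate
# failures, at the cost of a more "orange" CI.
```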

Qard (Member) commented Nov 1, 2023

SKIP would be fine too, if that gets us to green without throwing out the context that we were testing for this at some point and should probably find a better way to test it in the future. 😅

Otherwise, I'm okay with removing the test entirely if there's not a good way to keep it around without it interfering with our CI. Flaky tests are definitely bad tests. 😬

@joyeecheung joyeecheung changed the title from "test: remove test-diagnostics-channel-memory-leak.js" to "test: skip test-diagnostics-channel-memory-leak.js" Nov 3, 2023
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Nov 3, 2023
@joyeecheung joyeecheung added the commit-queue Add this label to land a pull request using GitHub Actions. label Nov 3, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Nov 3, 2023
@nodejs-github-bot nodejs-github-bot merged commit 94156e3 into nodejs:main Nov 3, 2023
25 checks passed
nodejs-github-bot (Collaborator):

Landed in 94156e3

joyeecheung (Member, Author) commented Nov 4, 2023

An interesting idea occurred to me about how to test this somewhat more reliably using a (currently somewhat internal) V8 API (v8::debug::QueryObjects), and the CI looks happy with it: https://ci.nodejs.org/job/node-test-commit/66307/ - I'll see if it's possible to upstream my V8 patches to expose this to embedders; some restrictions may need to be added to prevent embedders from using it too liberally.
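
As a hedged illustration of the object-counting idea only: the queryObjects() helper assumed below is shaped like the API that later landed in node:v8, not something available at the time of this PR, so the call is guarded.

```js
'use strict';
// Hedged illustration: count live Channel instances by constructor instead
// of comparing absolute heap sizes. Assumes an object-querying API shaped
// like v8.queryObjects(); guard the call since it may not exist.
const v8 = require('node:v8');
const { Channel, channel } = require('node:diagnostics_channel');

for (let i = 0; i < 1000; i++) {
  channel(`query-test-${i}`);
}

if (typeof v8.queryObjects === 'function') {
  // Reports how many Channel objects are still reachable, independent of
  // absolute heap-size numbers.
  console.log(v8.queryObjects(Channel, { format: 'count' }));
}
```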

Qard (Member) commented Nov 4, 2023

I'm all for more reliable leak checking. 🙂

anonrig pushed a commit to anonrig/node that referenced this pull request Nov 9, 2023
PR-URL: nodejs#50327
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
targos pushed a commit that referenced this pull request Nov 11, 2023
targos pushed a commit that referenced this pull request Nov 14, 2023
UlisesGascon pushed a commit that referenced this pull request Dec 11, 2023
@UlisesGascon UlisesGascon mentioned this pull request Dec 12, 2023
codebytere added a commit to electron/electron that referenced this pull request Jan 10, 2024
zcbenz pushed a commit to electron/electron that referenced this pull request Jan 12, 2024
zcbenz pushed a commit to electron/electron that referenced this pull request Jan 12, 2024
codebytere added a commit to electron/electron that referenced this pull request Jan 15, 2024
codebytere added a commit to electron/electron that referenced this pull request Jan 15, 2024
codebytere added a commit to electron/electron that referenced this pull request Jan 16, 2024
codebytere added a commit to electron/electron that referenced this pull request Jan 18, 2024
jkleinsc pushed a commit to electron/electron that referenced this pull request Jan 18, 2024
codebytere added a commit to electron/electron that referenced this pull request Jan 18, 2024
jkleinsc pushed a commit to electron/electron that referenced this pull request Jan 18, 2024
* chore: bump node in DEPS to v20.11.0

* module: bootstrap module loaders in shadow realm

nodejs/node#48655

* src: add commit hash shorthand in zlib version

nodejs/node#50158

* v8,tools: expose necessary V8 defines

nodejs/node#50820

* esm: do not call getSource when format is commonjs

nodejs/node#50465

* esm: fallback to readFileSync when source is nullish

nodejs/node#50825

* vm: allow dynamic import with a referrer realm

nodejs/node#50360

* test: skip test-diagnostics-channel-memory-leak.js

nodejs/node#50327

* esm: do not call getSource when format is commonjs

nodejs/node#50465

* lib: fix assert throwing different error messages in ESM and CJS

nodejs/node#50634

* src: fix compatility with upcoming V8 12.1 APIs

nodejs/node#50709

* deps: update base64 to 0.5.1

nodejs/node#50629

* src: avoid silent coercion to signed/unsigned int

nodejs/node#50663

* src: fix compatility with upcoming V8 12.1 APIs

nodejs/node#50709

* chore: fix patch indices

* chore: update patches

* test: disable TLS cipher test

This can't be enabled owing to BoringSSL incompatibilities.

nodejs/node#50186

* fix: check for Buffer and global definition in shadow realm

nodejs/node#51239

* test: disable parallel/test-shadow-realm-custom-loader

Incompatible with our asar logic, resulting in the following failure:

> Failed to CompileAndCall electron script: electron/js2c/asar_bundle

* chore: remove deleted parallel/test-crypto-modp1-error test

* test: make test-node-output-v8-warning generic

nodejs/node#50421

* chore: fixup ModuleWrap patch

* test: match wpt/streams/transferable/transform-stream-members.any.js to upstream

* fix: sandbox is not enabled on arm

* chore: disable v8 sandbox on ia32/arm

---------

Co-authored-by: electron-roller[bot] <84116207+electron-roller[bot]@users.noreply.github.com>
Co-authored-by: Shelley Vohr <shelley.vohr@gmail.com>
Co-authored-by: Cheng Zhao <zcbenz@gmail.com>