-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent swingset test error, possibly due to gc non-determinism #5575
Comments
Full trace
|
@Chris-Hibbert noticed https://github.com/Agoric/agoric-sdk/runs/6867460832?check_suite_focus=true doing something similar:
I think this is the console output from that test:
This feels like the same kind of issue, probably a finalizer that failed to run in the expected crank. I'm seeing problems with Node.js -hosted workers not behaving GC consistently. I have one suspicion that I have a separate suspicion that |
I'm seeing more of the original |
This is annoying, but not enough to justify putting more energy into fixing than we already have. We'll revisit if it appears more frequently (and becomes even more annoying), or if we have a clever idea to fix it. |
At this point I think the best approach is to give up on expecting controllable GC under v8, which means making a list of all the tests that attempt to look at GC behavior (possibly indirectly), and configure them to only use |
I suppose we don't really care right now about having liveslots's GC logic running reliably under v8, so yeah restricting gc test to xs-worker makes sense. |
How has recent work on gc determinism affected this? |
First any recent GC non-determinism work was for XS, and only materialized for virtual collections metadata, so from what I understand it's not related to this issue. Then any of those fixes have not landed anywhere so I don't think we'd know if they had any impact. I'm also not sure what the initial issue was here, or if any GC related work happened for VO since the initial report. Has anyone experienced this issue since the original report? |
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
node/v8/ava is just too flaky, I was seeing #5575 -type problems in test-upgrade.js
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
Sometimes, for reasons we don't entirely understand, Node.js doesn't garbage-collect objects when we tell it to, and we get flaky GC-checking tests. This applies our usual fix, which is to only run those tests under XS. It also stops attempting to use `test.serial` as a workaround. refs #3240 refs #5575 fixes #9089
Sometimes, for reasons we don't entirely understand, Node.js doesn't garbage-collect objects when we tell it to, and we get flaky GC-checking tests. This applies our usual fix, which is to only run those tests under XS. It also stops attempting to use `test.serial` as a workaround. refs #3240 refs #5575 fixes #9089
Sometimes, for reasons we don't entirely understand, Node.js doesn't garbage-collect objects when we tell it to, and we get flaky GC-checking tests. This applies our usual fix, which is to only run those tests under XS. It also stops attempting to use `test.serial` as a workaround. refs #3240 refs #5575 fixes #9089
This "forward to fake zoe" in gc-vat.test was added to demonstrate a fix for #3482, in which liveslots was mishandling an intermediate promise by retaining it forever, which made us retain objects that appear in eventual-send results forever. This problem was discovered while investigating an unrelated XS engine bug (#3406), so "is this specific to a single engine?" was on our mind, and I wasn't sure that we were dealing with two independent bugs until I wrote the test and showed that it failed on both V8 and XS. So the test was originally written with a commented-out `managerType:` option to make it easy to switch back and forth between `local` and `xs-worker`. That switch was left in the `local` state, probably because it's slightly faster. What we've learned is that V8 sometimes holds on to objects despite a forced GC pass (see #5575 and #3240), and somehow it only seems to fail in CI runs (and only for people other than me). Our usual response is to make the test use XS instead of V8, either by setting `creationOptions.managerType: 'xs-worker'` on the individual vat, or by setting `defaultManagerType: 'xs-worker'` to set it for all vats. This PR uses the first approach, changing just the one vat being exercised (which should be marginally cheaper than making all vats use XS). closes #9392
) This "forward to fake zoe" in gc-vat.test was added to demonstrate a fix for #3482, in which liveslots was mishandling an intermediate promise by retaining it forever, which made us retain objects that appear in eventual-send results forever. This problem was discovered while investigating an unrelated XS engine bug (#3406), so "is this specific to a single engine?" was on our mind, and I wasn't sure that we were dealing with two independent bugs until I wrote the test and showed that it failed on both V8 and XS. So the test was originally written with a commented-out `managerType:` option to make it easy to switch back and forth between `local` and `xs-worker`. That switch was left in the `local` state, probably because it's slightly faster. What we've learned is that V8 sometimes holds on to objects despite a forced GC pass (see #5575 and #3240), and somehow it only seems to fail in CI runs (and only for people other than me). Our usual response is to make the test use XS instead of V8, either by setting `creationOptions.managerType: 'xs-worker'` on the individual vat, or by setting `defaultManagerType: 'xs-worker'` to set it for all vats. This PR uses the first approach, changing just the one vat being exercised (which should be marginally cheaper than making all vats use XS). closes #9392
We're still seeing occasional cases of this, in the liveslots tests. I'm merging #10210 and #10173 into this ticket because they're all expressions of the same thing. Note that I'm specifically talking about tests in I think we just can't reliably use this "real engine GC" approach to testing when in packages/swingset-liveslots. We can still use the Alternatively, we could disable those tests. We have some coverage of the relevant functionality in other tests (the yes-reliable |
Captured at https://github.com/Agoric/agoric-sdk/runs/6838366644?check_suite_focus=true
Went away when I reran jobs.
Relevant part seems to be
The text was updated successfully, but these errors were encountered: