Intermittent swingset test error, possibly due to gc non-determinism #5575

Open
erights opened this issue Jun 10, 2022 · 10 comments
Labels: bug (Something isn't working), SwingSet (package: SwingSet)

Comments

@erights
Member

erights commented Jun 10, 2022

Captured at https://github.com/Agoric/agoric-sdk/runs/6838366644?check_suite_focus=true

Went away when I reran jobs.

Relevant part seems to be

virtualObjects › virtualObjectGC › VO refcount management 3 unfaceted
  Difference:
  - undefined
  + {
  +   key: 'vom.rc.o+12/3',
  +   result: '1',
  +   type: 'vatstoreGet',
  + }
  › validate (packages/SwingSet/test/liveslots-helpers.js:278:7)
  › voRefcountManagementTest3 (packages/SwingSet/test/virtualObjects/test-virtualObjectGC.js:1341:3)
  › async packages/SwingSet/test/virtualObjects/test-virtualObjectGC.js:1373:3
@erights added the bug label Jun 10, 2022
@erights
Member Author

erights commented Jun 10, 2022

Full trace

fix: patch mistaken overrides Test all Packages #14818
[test-swingset4 (16.x)](https://github.com/Agoric/agoric-sdk/runs/6838366644?check_suite_focus=true#logs)
failed 2 hours ago in 10m 51s
  at deliver (.../swingset-vat/src/liveslots/liveslots.js:1009:18)
  at dispatchToUserspace (.../swingset-vat/src/liveslots/liveslots.js:1392:1)
  at eval (.../swingset-vat/src/kernel/dummyMeterControl.js:40:31)
  at runWithoutMetering (.../swingset-vat/src/kernel/dummyMeterControl.js:22:8)
  at wrapped (.../swingset-vat/src/kernel/dummyMeterControl.js:40:8)
RemoteError(error:liveSlots:v8#70001)#71 ERROR_NOTE: Rejection from: (Error#72) : 331 . 0
RemoteError(error:liveSlots:v8#70001)#71 ERROR_NOTE: Sent as error:liveSlots:v2#70001
Nested error under RemoteError(error:liveSlots:v8#70001)#71
  Error#72: Event: 330.1
    at deliver (.../swingset-vat/src/liveslots/liveslots.js:1031:20)
    at dispatchToUserspace (.../swingset-vat/src/liveslots/liveslots.js:1392:1)
    at eval (.../swingset-vat/src/kernel/dummyMeterControl.js:40:31)
    at runWithoutMetering (.../swingset-vat/src/kernel/dummyMeterControl.js:22:8)
    at wrapped (.../swingset-vat/src/kernel/dummyMeterControl.js:40:8)
  ✔ vat-admin › terminate › terminate-non-critical › exit happy path simple result (static, non-critical) (2m 59.7s)
  ✔ vat-admin › terminate › terminate-non-critical › exit happy path complex result (static, non-critical) (2m 59.9s)
  ✔ vat-admin › terminate › terminate-non-critical › exit sad path simple result (static, non-critical) (3m)
  ✔ vat-admin › terminate › terminate-non-critical › exit sad path complex result (static, non-critical) (3m 0.2s)
  ✔ vat-admin › terminate › terminate-non-critical › exit sad path with ante-mortem message (static, non-critical) (3m 0.6s)
{
  body: '[{"@qclass":"slot","index":0},{"@qclass":"slot","index":1},{"@qclass":"slot","index":2}]',
  slots: [ 'kp42', 'kp43', 'kp44' ]
}
Logging sent error stack (RemoteError(error:liveSlots:v8#70001)#75)
RemoteError(error:liveSlots:v8#70001)#75: exceptionallyHappy
  at fullRevive (.../marshal/src/marshal.js:427:20)
  at fullRevive (.../marshal/src/marshal.js:476:11)
  at fullRevive (.../marshal/src/marshal.js:476:11)
  at Object.unserialize (.../marshal/src/marshal.js:510:21)
  at deliver (.../swingset-vat/src/liveslots/liveslots.js:1009:18)
  at dispatchToUserspace (.../swingset-vat/src/liveslots/liveslots.js:1392:1)
  at eval (.../swingset-vat/src/kernel/dummyMeterControl.js:40:31)
  at runWithoutMetering (.../swingset-vat/src/kernel/dummyMeterControl.js:22:8)
  at wrapped (.../swingset-vat/src/kernel/dummyMeterControl.js:40:8)
RemoteError(error:liveSlots:v8#70001)#75 ERROR_NOTE: Sent as error:liveSlots:v2#70001
  ✔ vat-admin › terminate › terminate-non-critical › exit happy path with ante-mortem message (static, non-critical) (3m 2.9s)
{
  body: '[{"@qclass":"slot","index":0},{"@qclass":"slot","index":1},{"@qclass":"slot","index":2}]',
  slots: [ 'kp42', 'kp43', 'kp44' ]
}
vc.get(p3)
 got [object Promise]
{ body: '[1,2,2,3,3]', slots: [] }
  ✔ virtualObjects › vdata-promises › vdata-promises › imported promises in vdata (1m 33.4s)
vc.get(p6)
 got [object Promise]
  ✔ virtualObjects › vdata-promises › vdata-promises › result promises in vdata (1m 35s)
{
  body: '{"data":{"is":{"p10is":true,"p14is":true,"p7is":true,"p8is":true,"p9is":true},"ret":{"p10":{"@qclass":"slot","index":0},"p10a":{"@qclass":"slot","index":0},"p11":{"@qclass":"slot","index":1},"p12":{"@qclass":"slot","index":2},"p13":{"@qclass":"slot","index":3},"p14":{"@qclass":"slot","index":4},"p14a":{"@qclass":"slot","index":4},"p7":{"@qclass":"slot","index":5},"p7a":{"@qclass":"slot","index":5},"p8":{"@qclass":"slot","index":6},"p8a":{"@qclass":"slot","index":6},"p9":{"@qclass":"slot","index":7},"p9a":{"@qclass":"slot","index":7}}},"resolutions":{"p10":10,"p11":11,"p12":12,"p13":13,"p14":14,"p9":9}}',
  slots: [
    'kp63', 'kp64',
    'kp65', 'kp66',
    'kp67', 'kp68',
  ✔ virtualObjects › vdata-promises › vdata-promises › exported promises in vdata (1m 36s)
    'kp69', 'kp70'
  ]
}
undefined
DANGER: static vat v8 terminated
  ✔ vat-admin › terminate › terminate-non-critical › dead vat state removed (3m 13s)
  ✔ vat-admin › terminate › terminate-non-critical › exit happy path simple result (dynamic, non-critical) (3m 14s)
  ✔ vat-admin › terminate › terminate-non-critical › exit happy path complex result (dynamic, non-critical) (3m 14s)
  ✔ vat-admin › terminate › terminate-non-critical › exit sad path simple result (dynamic, non-critical) (3m 14.1s)
  ✔ vat-admin › terminate › terminate-non-critical › exit sad path complex result (dynamic, non-critical) (3m 14.2s)
  ✔ vat-admin › terminate › terminate-non-critical › exit sad path with ante-mortem message (dynamic, non-critical) (3m 14.2s)
  ✔ vat-admin › terminate › terminate-non-critical › exit happy path with ante-mortem message (dynamic, non-critical) (3m 14.4s)
  ✔ vat-admin › terminate › terminate-non-critical › terminate (dynamic, non-critical) (3m 14.7s)
  ─
  virtualObjects › virtualObjectGC › VO refcount management 3 unfaceted
  Difference:
  - undefined
  + {
  +   key: 'vom.rc.o+12/3',
  +   result: '1',
  +   type: 'vatstoreGet',
  + }
  › validate (packages/SwingSet/test/liveslots-helpers.js:278:7)
  › voRefcountManagementTest3 (packages/SwingSet/test/virtualObjects/test-virtualObjectGC.js:1341:3)
  › async packages/SwingSet/test/virtualObjects/test-virtualObjectGC.js:1373:3
  ─
  1 test failed
  3 known failures
  4 tests skipped
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Error: Process completed with exit code 1.

@warner changed the title from "Intermittent error, possibly due to gc non-determinism" to "Intermittent swingset test error, possibly due to gc non-determinism" Jun 13, 2022
@warner added SwingSet package: SwingSet and removed SwingSet package: SwingSet labels Jun 13, 2022
@warner
Member

warner commented Jun 13, 2022

@Chris-Hibbert noticed https://github.com/Agoric/agoric-sdk/runs/6867460832?check_suite_focus=true doing something similar:

  upgrade › upgrade-replay › replay after upgrade
  Rejected promise returned by test. Reason:
  Error {
    message: 'historical inaccuracy in replay of v2',
  }
  › Object.finishReplayDelivery (.../swingset-vat/src/kernel/vat-loader/transcript.js:91:13)
  › replayOneDelivery (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:189:19)
  › async Object.replayTranscript (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:220:1)
  › async ensureVatOnline (.../swingset-vat/src/kernel/vat-warehouse.js:130:1)
  › async Object.start (.../swingset-vat/src/kernel/vat-warehouse.js:172:1)
  › async Object.start (.../swingset-vat/src/kernel/kernel.js:1427:1)
  › async makeSwingsetController (packages/SwingSet/src/controller/controller.js:338:3)
  › async packages/SwingSet/test/upgrade/test-upgrade-replay.js:73:16

I think this is the console output from that test:

anachrophobia strikes vat v2 on delivery 9
delivery completed with 2 expected syscalls remaining
expected: {"0":"vatstoreGet","1":"vom.rc.o-50","length":2}
expected: {"0":"vatstoreGetAfter","1":"","2":"vom.ir.o-50|","length":3}
  ✖ upgrade › upgrade-replay › replay after upgrade Rejected promise returned by test
Error#1: historical inaccuracy in replay of v2
  at Object.finishReplayDelivery (.../swingset-vat/src/kernel/vat-loader/transcript.js:91:13)
  at replayOneDelivery (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:189:19)
  at async Object.replayTranscript (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:220:1)
  at async ensureVatOnline (.../swingset-vat/src/kernel/vat-warehouse.js:130:1)
  at async Object.start (.../swingset-vat/src/kernel/vat-warehouse.js:172:1)
  at async Object.start (.../swingset-vat/src/kernel/kernel.js:1427:1)
  at async makeSwingsetController (packages/SwingSet/src/controller/controller.js:338:3)
  at async packages/SwingSet/test/upgrade/test-upgrade-replay.js:73:16

This feels like the same kind of issue, probably a finalizer that failed to run in the expected crank. I'm seeing problems with Node.js-hosted workers not behaving consistently with respect to GC. The 'replay after upgrade' test in test-upgrade-replay.js uses the default local worker (Node.js), and it uses test() rather than test.serial() (although there are no other tests in that same file).
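
To illustrate why a finalizer that runs in a different crank could surface as "historical inaccuracy", here is a toy model of transcript replay. It is not SwingSet's implementation: `replayDelivery`, the transcript shape, and the error text for unexpected syscalls are all hypothetical. The only idea it borrows from the log above is that replay compares the syscalls a vat makes against the recorded ones, and leftover recorded syscalls count as a divergence.

```javascript
// Toy model only: names and data shapes are hypothetical, not SwingSet's API.
// A recorded delivery lists the syscalls the vat made the first time around.
function replayDelivery(expectedSyscalls, runVat) {
  const remaining = [...expectedSyscalls];
  const syscall = (...args) => {
    const expected = remaining.shift();
    if (JSON.stringify(expected) !== JSON.stringify(args)) {
      throw Error(`unexpected syscall ${JSON.stringify(args)}`);
    }
  };
  runVat(syscall);
  if (remaining.length > 0) {
    throw Error(
      `delivery completed with ${remaining.length} expected syscalls remaining`,
    );
  }
}

// Assume the original run collected an object during this delivery, so the
// transcript holds the two GC-driven lookups shown in the console output.
const recorded = [
  ['vatstoreGet', 'vom.rc.o-50'],
  ['vatstoreGetAfter', '', 'vom.ir.o-50|'],
];

// Assume that during replay the engine did not collect the object, so the
// vat makes no GC-driven syscalls and the recorded ones are left over.
const vatWithoutGC = _syscall => {};

let replayError;
try {
  replayDelivery(recorded, vatWithoutGC);
} catch (err) {
  replayError = err;
}
console.log(replayError.message);
```

In this model the vat code is identical in both runs; only the timing of collection differs, which is enough to make the replay diverge.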

One suspicion is that AVA is somehow sharing processes between separate test files, thus requiring more uses of test.serial than ought to be necessary. We could investigate this by printing process.pid from two test files and seeing whether the values are distinct.
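
A minimal sketch of that check, assuming a hypothetical helper `describeTestHost` that each test file calls once at load time. It also prints `threadId` from `node:worker_threads`, because worker threads inside one Node.js process share the same `process.pid`, so the pid alone could not tell them apart.

```javascript
// Hypothetical helper: call it once at the top of each test file under suspicion.
const { threadId, isMainThread } = require('node:worker_threads');

function describeTestHost(label) {
  // process.pid is shared by every worker thread in one Node.js process,
  // so threadId is included to distinguish threads from separate processes.
  return `${label}: pid=${process.pid} threadId=${threadId} mainThread=${isMainThread}`;
}

console.log(describeTestHost('test-upgrade-replay.js'));
```

If two test files report the same pid, they share a process, and presumably a heap-wide GC schedule; differing thread ids with an equal pid would point at worker threads rather than child processes.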

I have a separate suspicion that test.serial in all tests that share a process may not be sufficient to get Node's gc() to shake everything loose.
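
A rough way to probe this outside the test suite is sketched below. `probeFinalizer` is a hypothetical name, not a helper from this repository, and the sketch assumes Node.js was started with `--expose-gc` so that `globalThis.gc` exists. The language makes no promise about when, or whether, a `FinalizationRegistry` callback runs, so the result may legitimately differ between runs.

```javascript
// Sketch only: run with `node --expose-gc` so that globalThis.gc is defined.
// Resolves to true if the finalizer was observed to run, false otherwise.
async function probeFinalizer(rounds = 5) {
  let finalized = false;
  const registry = new FinalizationRegistry(() => {
    finalized = true;
  });
  // Allocate inside an inner function so no local variable keeps the object alive.
  (() => registry.register({ payload: 'garbage' }, 'probe'))();

  for (let i = 0; i < rounds && !finalized; i += 1) {
    if (typeof globalThis.gc === 'function') {
      globalThis.gc();
    }
    // Finalization callbacks run in a later task, so yield to the event loop.
    await new Promise(resolve => setImmediate(resolve));
  }
  return finalized;
}

probeFinalizer().then(ran => console.log(`finalizer ran: ${ran}`));
```

Running the probe alone and then alongside other heap-heavy work in the same process would be one way to see whether a forced gc() reliably shakes the object loose.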

@warner
Member

warner commented Jun 18, 2022

I'm seeing more of the original VO refcount management 3 unfaceted instance, https://github.com/Agoric/agoric-sdk/runs/6944184085?check_suite_focus=true

@warner
Member

warner commented Jun 24, 2022

This is annoying, but not enough to justify putting more energy into fixing than we already have. We'll revisit if it appears more frequently (and becomes even more annoying), or if we have a clever idea to fix it.

@warner
Member

warner commented Aug 23, 2022

At this point I think the best approach is to give up on expecting controllable GC under v8. That means making a list of all the tests that attempt to look at GC behavior (possibly indirectly) and configuring them to only use xs-worker, so that they only run under xsnap and never under v8.

@mhofman
Member

mhofman commented Aug 23, 2022

I suppose we don't really care right now about having liveslots' GC logic running reliably under v8, so yeah, restricting GC tests to xs-worker makes sense.

@erights
Member Author

erights commented Dec 24, 2022

How has recent work on gc determinism affected this?

@mhofman
Member

mhofman commented Dec 24, 2022

First, any recent GC non-determinism work was for XS, and only materialized for virtual collections metadata, so from what I understand it's not related to this issue.

Second, none of those fixes have landed anywhere, so I don't think we'd know if they had any impact.

I'm also not sure what the initial issue was here, or if any GC-related work happened for VO since the initial report. Has anyone experienced this issue since the original report?

@erights
Member Author

erights commented Dec 24, 2022

@warner @FUDCo still relevant?

This was referenced Mar 24, 2023
warner added a commit that referenced this issue Apr 1, 2023
I saw a test failure here, "historical inaccuracy in replay", which
might have been due to GC happening one way in the original, and a
different way in the replay (when running under Node.js). This feels
like an aspect of #5575, and this test isn't trying to exercise
anything about GC, so I'm just going to set defaultReapInterval to
'never' to inhibit BOYD, which should remove the problem.
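
A minimal sketch of the change this commit message describes, assuming the option lives at the top level of the swingset config object. `withoutReap`, the vat name, and the `sourceSpec` path are hypothetical placeholders; only `defaultReapInterval: 'never'` comes from the message itself.

```javascript
// Hypothetical helper: returns a copy of a config with periodic reaping
// (and therefore BOYD deliveries) disabled, without mutating the input.
function withoutReap(config) {
  return { ...config, defaultReapInterval: 'never' };
}

// The vat name and sourceSpec below are placeholders for illustration.
const baseConfig = {
  bootstrap: 'bootstrap',
  vats: { bootstrap: { sourceSpec: './bootstrap-vat.js' } },
};

const config = withoutReap(baseConfig);
console.log(config.defaultReapInterval);
```

This only suits tests that are not trying to exercise GC, as the message notes; a test that checks GC behavior would lose the very deliveries it wants to observe.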
warner added a commit that referenced this issue Apr 5, 2023
node/v8/ava is just too flaky, I was seeing #5575 -type problems in
test-upgrade.js
warner added a commit that referenced this issue Mar 23, 2024
Sometimes, for reasons we don't entirely understand, Node.js doesn't
garbage-collect objects when we tell it to, and we get flaky
GC-checking tests. This applies our usual fix, which is to only run
those tests under XS.

refs #3240
refs #5575
fixes #9089
warner added a commit that referenced this issue Mar 23, 2024
Sometimes, for reasons we don't entirely understand, Node.js doesn't
garbage-collect objects when we tell it to, and we get flaky
GC-checking tests. This applies our usual fix, which is to only run
those tests under XS.

It also stops attempting to use `test.serial` as a workaround.

refs #3240
refs #5575
fixes #9089
warner added a commit that referenced this issue Jun 1, 2024
This "forward to fake zoe" in gc-vat.test was added to demonstrate a
fix for #3482, in which liveslots was mishandling an intermediate
promise by retaining it forever, which made us retain objects that
appear in eventual-send results forever.

This problem was discovered while investigating an unrelated XS engine
bug (#3406), so "is this specific to a single engine?" was on our
mind, and I wasn't sure that we were dealing with two independent bugs
until I wrote the test and showed that it failed on both V8 and XS. So
the test was originally written with a commented-out `managerType:`
option to make it easy to switch back and forth between `local` and
`xs-worker`. That switch was left in the `local` state, probably
because it's slightly faster.

What we've learned is that V8 sometimes holds on to objects despite a
forced GC pass (see #5575 and #3240), and somehow it only seems to
fail in CI runs (and only for people other than me). Our usual
response is to make the test use XS instead of V8, either by setting
`creationOptions.managerType: 'xs-worker'` on the individual vat, or
by setting `defaultManagerType: 'xs-worker'` to set it for all vats.

This PR uses the first approach, changing just the one vat being
exercised (which should be marginally cheaper than making all vats use
XS).

closes #9392
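
The two options named in this commit message might look roughly like the following config fragments. The vat names and `sourceSpec` paths are hypothetical placeholders; only `creationOptions.managerType: 'xs-worker'` and `defaultManagerType: 'xs-worker'` are taken from the message.

```javascript
// Placeholder vat names and paths; only the two managerType settings
// come from the commit message.

// Option 1: pin just the vat under test to XS.
const perVatConfig = {
  bootstrap: 'bootstrap',
  vats: {
    bootstrap: { sourceSpec: './bootstrap-vat.js' },
    target: {
      sourceSpec: './target-vat.js',
      creationOptions: { managerType: 'xs-worker' },
    },
  },
};

// Option 2: make XS the default for every vat in the config.
const allVatsConfig = {
  defaultManagerType: 'xs-worker',
  bootstrap: 'bootstrap',
  vats: {
    bootstrap: { sourceSpec: './bootstrap-vat.js' },
    target: { sourceSpec: './target-vat.js' },
  },
};

console.log(perVatConfig.vats.target.creationOptions.managerType);
console.log(allVatsConfig.defaultManagerType);
```

The first form limits the extra worker cost to the vat whose GC behavior is being checked, which is the trade-off the message gives for choosing it.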
@warner
Member

warner commented Dec 18, 2024

We're still seeing occasional cases of this, in the liveslots tests. I'm merging #10210 and #10173 into this ticket because they're all expressions of the same thing.

Note that I'm specifically talking about tests in packages/swingset-liveslots/, which do not launch a worker. So unfortunately there's no managerType: control to set to "xsnap only".

I think we just can't reliably use this "real engine GC" approach to testing when in packages/swingset-liveslots. We can still use the fakeGC approach there, but for any test that relies on real GC, the best answer is probably to move it out of packages/swingset-liveslots/ and into packages/SwingSet/, and change the tests to run inside a worker, which will involve significant surgery.

Alternatively, we could disable those tests. We have some coverage of the relevant functionality in other tests (the yes-reliable fakeGC ones), but probably not as extensive as we'd like. So the longer fix here is to eyeball the old tests, see what they're trying to catch, look at the fakeGC tests and see if they manage to do enough of the same, and then write enough new tests to make up the difference. Or, find some neat option in AVA or Node that turns off whatever parallelism or heap-sharing is causing the problem.
