XS isn't garbage-collecting objects when expected #3406
Comments
I tried to build up a reduced test case "from below", by starting with a simple xsnap runner, and adding code from swingset/eventual-send until it started to exhibit the failure. I wasn't able to make it fail. I've started to try "from above", by removing things from swingset until the failure goes away. The one datapoint I have so far, which maybe will mean something to @michaelfig or @erights , is that removing the returned Promise
refs Agoric/agoric-sdk#3406

Note from Patrick Soquet (at Moddable): The objects to be collected were referenced by the closures of a function, deeply nested within the reject handler of a promise. The promise was fulfilled and stored into your “pendingPromises” set. The issue seemed to be that promises retained references to handlers even when resolved. I fixed that.

This patch will probably be added upstream soon, so this particular commit is temporary.

Author: Patrick Soquet <ps@moddable.tech>
Tested-by: Brian Warner <warner@lothar.com>
Update the packages/xsnap/moddable submodule to current public branch, plus an upcoming xsPromise.c fix.

* current public branch includes a new WeakMap design, which drastically improves GC speed
* the xsPromise.c fix now correctly drops rejection handlers for resolved promises, which was probably the cause of #3406 (unexpected retention of Presences used in `E()` calls, resulting in too few GC actions)

refs #3406 (might even close it)
refs #3118
The amazing Moddable folks identified the problem and provided a fix. In Patrick's words: "The objects to be collected were referenced by the closures of a function, deeply nested within the reject handler of a promise. The promise was fulfilled and stored into your “pendingPromises” set. The issue seemed to be that promises retained references to handlers even when resolved. I fixed that."
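To illustrate the kind of retention Patrick describes, here is a toy promise-like class that clears both of its handler lists as soon as it settles. This is only a sketch for intuition: `ToyPromise` and all of its fields are hypothetical names, and it does not reflect how XS's `xsPromise.c` is actually written.

```javascript
// Toy model (NOT the XS implementation): a promise-like object that drops
// its reaction handlers once it settles, so closures captured by an unused
// reject handler stop being reachable through the promise.
class ToyPromise {
  constructor() {
    this.state = 'pending';
    this.value = undefined;
    this.onFulfilled = [];
    this.onRejected = [];
  }

  then(onFulfilled, onRejected) {
    if (this.state === 'pending') {
      if (onFulfilled) this.onFulfilled.push(onFulfilled);
      if (onRejected) this.onRejected.push(onRejected);
    } else {
      // Already settled: schedule only the relevant handler, store nothing.
      const handler = this.state === 'fulfilled' ? onFulfilled : onRejected;
      if (handler) queueMicrotask(() => handler(this.value));
    }
  }

  settle(state, value) {
    if (this.state !== 'pending') return;
    this.state = state;
    this.value = value;
    const toRun = state === 'fulfilled' ? this.onFulfilled : this.onRejected;
    // Drop BOTH lists. Keeping the unused list alive is the kind of
    // retention described in the quote above.
    this.onFulfilled = [];
    this.onRejected = [];
    for (const handler of toRun) queueMicrotask(() => handler(value));
  }

  resolve(value) {
    this.settle('fulfilled', value);
  }

  reject(reason) {
    this.settle('rejected', reason);
  }
}

// The reject handler closes over `presence`; after resolve() the promise
// no longer references that closure.
const presence = { name: 'A' };
const p = new ToyPromise();
p.then(
  () => {},
  () => console.log('rejected while holding', presence.name),
);
p.resolve('ok');
```

If such a promise is then stored in a long-lived collection (like the “pendingPromises” set mentioned above), it no longer keeps the handler closures, or anything they captured, reachable.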
This checks that our normal async / `await E()` pattern doesn't cause the target or its arguments to be retained longer than expected. refs #3406
I have an example of this, however it's not XS-specific (V8 fails to drop the object too). It's basically just:

    makeInvitationTarget(zoe) {
      return E(zoe).makeInvitationZoe();
    },

(where The invitation object passes through liveslots (and a HandledPromise or two), but the user-level code never sees it. The vat imports the invitation from zoe, resolves the result promise for I tried commenting out the I'm currently suspecting some odd cycle in HandledPromise and/or the way that liveslots uses it.
The new test, 'forward to fake zoe', mimics an execution pathway in the tap-fungible-faucet load-generator task. In the loadgen task, the client asks a fungible-faucet contract (an instance of packages/zoe/src/contracts/mintPayments.js) for an Invitation. The method in the faucet contract immediately sends off a request to Zoe (through the zcf facet) and returns the result Promise.

In my analysis of the loadgen slogfile, the Invitation object (a Zoe Invitation payment) is imported into the faucet contract vat, sent back out again as the resolution of its result promise, but then never dropped. I see no good reason for the faucet contract to hold onto the object: the code doesn't even have a place to put it.

I initially thought this was a problem with XS, but when I reproduced the issue in a unit test and changed it to use a Node.js worker, the problem remained. So I'm now suspicious of liveslots, or HandledPromise, or something in their interaction.

The test first talks to a fake Zoe vat to export the simulated Invitation object and learn its kref. Then it instructs the bootstrap vat to ask vat-target for an invitation, and vat-target delegates to vat-fake-zoe. Once the kernel is done, and vat-target should have dropped the kref, the test examines the clists. The test would pass if the vat-target clist did not include the Invitation object's kref. Instead, vat-target still references the kref, so the test fails.

I experimented with making a few changes in liveslots, without success:

* Disable the unfulfilledHandler's applyMethod (by just throwing an error upon entry). The test case never sends a method to a promise, so this code path isn't exercised.
* Disable the `knownResolutions.set` in `thenHandler`, which stashes the resolution of a promise for a little while (it's a WeakMap from Promise object to its resolution) so cycles in the resolution graph can be serialized properly. We don't have cross-promise references in our test case, so this is unnecessary. But removing it didn't help.

refs #3406
I've got a failing test in branch
The In the second crank (the Neither
Big thanks to @michaelfig, we walked through this and found a problem in liveslots. Details will be in #3482. I'll leave this ticket open for any actual XS-specific problems we find once #3482 is fixed.
The new test, 'forward to fake zoe', mimics an execution pathway in the tap-fungible-faucet load-generator task. In the loadgen task, the client asks a fungible-faucet contract (an instance of packages/zoe/src/contracts/mintPayments.js) for an Invitation. The method in the faucet contract immediately sends off a request to Zoe (through the zcf facet) and returns the result Promise.

In my analysis of the loadgen slogfile, the Invitation object (a Zoe Invitation payment) is imported into the faucet contract vat, sent back out again as the resolution of its result promise, but then never dropped. I see no good reason for the faucet contract to hold onto the object: the code doesn't even have a place to put it.

I initially thought this was a problem with XS (#3406), but when I reproduced the issue in a unit test and changed it to use a Node.js worker, the problem remained. We traced it down to a problem in liveslots (#3482), which will be fixed by the upcoming commit. This test will fail until that commit.

The test first talks to a fake Zoe vat to export the simulated Invitation object and learn its kref. Then it instructs the bootstrap vat to ask vat-target for an invitation, and vat-target delegates to vat-fake-zoe. Once the kernel is done, and vat-target should have dropped the kref, the test examines the clists. The test would pass if the vat-target clist did not include the Invitation object's kref. Instead, vat-target still references the kref, so the test fails.
Did you find any XS-specific problems?
#3488 is about a leak we found in the "vault" loadgen task, however we don't know if it's XS-specific or not. We decided to fix an API usage issue in the treasury contract first, then reevaluate to see if the leak is still present. I'm going to close this ticket, and open a new one if/when we find a new leak.
This "forward to fake zoe" in gc-vat.test was added to demonstrate a fix for #3482, in which liveslots was mishandling an intermediate promise by retaining it forever, which made us retain objects that appear in eventual-send results forever. This problem was discovered while investigating an unrelated XS engine bug (#3406), so "is this specific to a single engine?" was on our mind, and I wasn't sure that we were dealing with two independent bugs until I wrote the test and showed that it failed on both V8 and XS. So the test was originally written with a commented-out `managerType:` option to make it easy to switch back and forth between `local` and `xs-worker`. That switch was left in the `local` state, probably because it's slightly faster. What we've learned is that V8 sometimes holds on to objects despite a forced GC pass (see #5575 and #3240), and somehow it only seems to fail in CI runs (and only for people other than me). Our usual response is to make the test use XS instead of V8, either by setting `creationOptions.managerType: 'xs-worker'` on the individual vat, or by setting `defaultManagerType: 'xs-worker'` to set it for all vats. This PR uses the first approach, changing just the one vat being exercised (which should be marginally cheaper than making all vats use XS). closes #9392
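As a rough sketch of the two options described above, a test's swingset config could look something like the following. Only `creationOptions.managerType`, `defaultManagerType`, and the `'xs-worker'` value come from the text; the vat names and `sourceSpec` paths are placeholders, and the overall config shape is an assumption.

```javascript
// Hypothetical test config. Vat names and sourceSpec paths are placeholders,
// not the actual contents of gc-vat.test.
const config = {
  bootstrap: 'bootstrap',
  // Second approach (not used here): run every vat under XS.
  // defaultManagerType: 'xs-worker',
  vats: {
    bootstrap: { sourceSpec: './bootstrap.js' },
    target: {
      sourceSpec: './vat-target.js',
      // First approach: run only the vat being exercised under XS.
      creationOptions: { managerType: 'xs-worker' },
    },
    'fake-zoe': { sourceSpec: './vat-fake-zoe.js' },
  },
};
```

Scoping the option to one vat leaves the other vats on the default manager, which is the "marginally cheaper" trade-off mentioned above.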
We're seeing unexpected behavior from the XS garbage collector. It seems to be retaining objects used in eventual-sends, where the V8 collector lets them get collected.
I'm still characterizing this, but a simple example is the following vat:
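A minimal sketch of a vat with this shape, under several assumptions: the `E` below is a simplified stand-in for the real eventual-send `E()`, the root object is a plain object rather than a proper Remotable, and passing `B` as the argument to `hello()` is a guess. The `twoAwait` and `twoNoReturn` methods are hypothetical names for the two variations discussed further down.

```javascript
// Simplified stand-in for E(): queue the method call for a later turn and
// return a promise for its result. The real E() does considerably more.
const E = target =>
  new Proxy(
    {},
    {
      get: (_t, method) => (...args) =>
        Promise.resolve(target).then(t => t[method](...args)),
    },
  );

const root = {
  // Base case: send hello() and return the result promise.
  two(A, B) {
    return E(A).hello(B);
  },
  // Variation: async/await instead of returning the promise.
  async twoAwait(A, B) {
    await E(A).hello(B);
  },
  // Variation: send the message but do not return the result promise.
  twoNoReturn(A, B) {
    E(A).hello(B);
  },
};

// Local stand-ins for the Remotables A and B.
const calls = [];
const A = { hello: arg => { calls.push(arg); } };
const B = { hello: () => {} };
const result = root.two(A, B);
```

In all three variants, user code holds no reference to `A` or `B` once the method returns; any retention has to come from the promise machinery or the engine.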
(A and B are simple Remotables exported by some other vat, with a dummy `hello: () => {}` method)

This results in two cranks on our vat: the first delivers `two(A,B)`, and the second notifies the vat about the `hello()` result promise being resolved.

At the end of each crank, our `liveSlots.js` layer is waiting for the vat's code to become idle (with `await new Promise(setImmediate)`), then forcing a GC sweep, with several additional `setImmediate` stalls introduced that were experimentally determined to allow finalizers to run. When the finalizers run, they accumulate the vrefs in a set, and then perform `syscall.dropImports()`.

When we run this under Node.js, the two objects are collected during the second crank, causing our vat to do a `syscall.dropImports()` for the vrefs that represent both A and B. When we run this under XS, neither object is collected (i.e. these finalizers were not called by the end of the second crank). Since no further deliveries are made to this vat, there are no further opportunities for the vat to inform the kernel about the objects being dropped.

The same happens if we replace the `return` with async/await:

But if we remove the `return`:

Then we still get the same two cranks (however nobody is watching the result promise, so no user-level code runs when the `notify` arrives). Node.js collects both A and B during the second crank as before. Under XS, A is collected at the end of the first crank, and B is never collected.

Ideally, both A and B would be collected at the end of the first crank: there are no references to either after the `hello()` message is sent.

Our standard coding style is to use `async` methods and `await` (although only at the top of the method scope, not inside loops or conditionals). So we're going to hit this case all of the time, and we need those objects to get released.
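The end-of-crank procedure described earlier (wait for idle, force a GC sweep, stall so finalizers can run, then report the dead vrefs) can be sketched roughly as follows. This is not the actual `liveSlots.js` code: `makeImportTracker`, `forceGC`, and the number of stalls are assumptions, and only `syscall.dropImports()` and the `await new Promise(setImmediate)` idiom come from the description.

```javascript
// Sketch of end-of-crank import tracking. `syscall` and `forceGC` are
// supplied by the caller; `forceGC` is a hypothetical hook for whatever
// forced-GC facility the engine offers.
function makeImportTracker(syscall, forceGC) {
  const deadVrefs = new Set();
  // Finalizers only record the vref; the syscall happens later, in bulk.
  const registry = new FinalizationRegistry(vref => deadVrefs.add(vref));

  const registerImport = (vref, presence) => registry.register(presence, vref);

  const flushDrops = () => {
    if (deadVrefs.size === 0) return;
    const vrefs = [...deadVrefs].sort();
    deadVrefs.clear();
    syscall.dropImports(vrefs);
  };

  async function endOfCrank() {
    await new Promise(setImmediate); // wait for vat code to become idle
    forceGC();
    // Extra stalls so finalizer callbacks get a chance to run.
    await new Promise(setImmediate);
    await new Promise(setImmediate);
    flushDrops();
  }

  return { registerImport, endOfCrank, flushDrops, deadVrefs };
}

// Usage with a fake syscall, simulating two finalizers having fired.
const dropped = [];
const tracker = makeImportTracker(
  { dropImports: vrefs => dropped.push(...vrefs) },
  () => {},
);
tracker.deadVrefs.add('o-2');
tracker.deadVrefs.add('o-1');
tracker.flushDrops();
```

Because drops are only reported from `endOfCrank()`, an object whose finalizer fires after the vat's last crank is never reported, which is why late collection matters in the scenario above.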