Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(swingset): clean up promise c-list entries during vat deletion #10268

Merged
merged 2 commits into from
Oct 26, 2024

Conversation

warner
Copy link
Member

@warner warner commented Oct 12, 2024

Previously, when a vat was terminated, and we delete the promise
c-list entries from its old state, the cleanup code was failing to
decrement the kpid's refcount properly. This resulted in a leak: those
promises could never be retired.

This commit updates the vat cleanup code to add a new phase, named
promises. This executes after exports and imports, but before
kv, and is responsible for both deleting the c-list entries and also
decrementing the refcounts of the corresponding promises. We do this
slowly, like we do exports and imports, because we don't know how many
there might be, and because those promise records might hold
references to other objects (in the resolution data), which could
trigger additional work. However, this work is unlikely to be
significant: the run-queue is usually empty, so these outstanding
promises are probably unresolved, and thus cannot beholding resolution
data.

All promises decided by the dead vat are rejected by the kernel
immediately during vat termination, because those rejections are
visible to userspace in other vats. In contrast, freeing the promise
records is not visible to userspace, just like how freeing imports
or exports are not visible to userspace, so this cleanup is safe to do
at a leisurely pace, rate-limited by runPolicy.allowCleanup.

The docs are updated to reflect the new runPolicy API:

  • budget.promises is new, and respected by slow cleanup
  • work.promises is reported to runPolicy.didCleanup()

I don't intend to add any remediation code: it requires a full
refcount audit to find such promises, and the mainnet kernel has only
ever terminated one vat so far, so I believe there cannot be very many
leaked promises, if any. Once this fix is applied, no new leaks will
occur.

fixes #10261

Copy link

cloudflare-workers-and-pages bot commented Oct 12, 2024

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: 08e5dc9
Status: ✅  Deploy successful!
Preview URL: https://52e4c5dc.agoric-sdk.pages.dev
Branch Preview URL: https://warner-10261-termination-lea.agoric-sdk.pages.dev

View logs

@warner warner marked this pull request as ready for review October 12, 2024 02:20
@warner warner requested a review from a team as a code owner October 12, 2024 02:20
@warner warner force-pushed the warner/10261-termination-leaks-promises branch from b0c4f74 to 5883856 Compare October 13, 2024 19:38
@warner warner requested a review from mhofman October 24, 2024 21:35
Copy link
Member

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have leveled up, because following these changes was not too bad.

Comment on lines +1058 to +1116
work.promises += 1;
remaining -= 1;
if (remaining <= 0) {
return { done: false, work };
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cleanupAfterTerminatedVat function is getting a bit unwieldy... I wonder if things would be better or worse if we abstracted some boilerplate?

No suggestion at this point, though.

Two tests are updated to exercise the cleanup of promise c-list
entries during vat termination. `terminate.test.js` adds some promises
to the c-list and then checks their refcounts after termination, to
demonstrate that bug #10261 is leaking a refcount when it deletes the
dead vat's c-list entry without also decrementing the
refcount. `slow-termination.test.js` adds a number of promises to the
c-list, and expects the budget-limited cleanup to spend a phase on
promises.

Both tests are marked as failing until the code fix is landed in the
next commit.
Previously, when a vat was terminated, and we delete the promise
c-list entries from its old state, the cleanup code was failing to
decrement the kpid's refcount properly. This resulted in a leak: those
promises could never be retired.

This commit updates the vat cleanup code to add a new phase, named
`promises`. This executes after `exports` and `imports`, but before
`kv`, and is responsible for both deleting the c-list entries and also
decrementing the refcounts of the corresponding promises. We do this
slowly, like we do exports and imports, because we don't know how many
there might be, and because those promise records might hold
references to other objects (in the resolution data), which could
trigger additional work. However, this work is unlikely to be
significant: the run-queue is usually empty, so these outstanding
promises are probably unresolved, and thus cannot beholding resolution
data.

All promises *decided* by the dead vat are rejected by the kernel
immediately during vat termination, because those rejections are
visible to userspace in other vats. In contrast, freeing the promise
records is *not* visible to userspace, just like how freeing imports
or exports are not visible to userspace, so this cleanup is safe to do
at a leisurely pace, rate-limited by `runPolicy.allowCleanup`.

The docs are updated to reflect the new `runPolicy` API:

* `budget.promises` is new, and respected by slow cleanup
* `work.promises` is reported to `runPolicy.didCleanup()`

The 'test.failing' marker was removed from the previously updated
tests.

I don't intend to add any remediation code: it requires a full
refcount audit to find such promises, and the mainnet kernel has only
ever terminated one vat so far, so I believe there cannot be very many
leaked promises, if any. Once this fix is applied, no new leaks will
occur.

fixes #10261
@warner warner force-pushed the warner/10261-termination-leaks-promises branch from 74c7a0c to 08e5dc9 Compare October 26, 2024 01:42
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fairly straightforward. The tests do seem very descriptive, which I'm worried will complicate maintenance, but that's out of scope for this change.

for (let i = 0; i < 20; i += 1) {
myImports.push(await E(root).sendExport());
}

// also 10 imported promises
for (let i = 0; i < 10; i += 1) {
await E(root).acceptImports(makePromiseKit().promise);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, these eventual sends also result in imported / externally decided promises but since they're immediately settled, they do not participate in any cleanup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the acceptImports result promises are settled (and the notifies delivered) during the bootstrap run, driven by the await controller.run() on line 110 of slow-termination.test.js. The makePromiseKit().promise payload of those messages are never resolved (they remain decided by vat-bootstrap, but we've thrown away the settlement functions), so they'll still be in the c-list when the vat is terminated. The exports will be rejected when the target vat is terminated, and removed from the c-list by the run() on line 191. But the imports will still be sitting there when the slow deletion happens.

@warner warner added the automerge:rebase Automatically rebase updates, then merge label Oct 26, 2024
@mergify mergify bot merged commit 9c73ae4 into master Oct 26, 2024
90 checks passed
@mergify mergify bot deleted the warner/10261-termination-leaks-promises branch October 26, 2024 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vat termination leaks promises
3 participants