-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(swingset): clean up promise c-list entries during vat deletion #10268
Conversation
Deploying agoric-sdk with Cloudflare Pages
|
b0c4f74
to
5883856
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must have leveled up, because following these changes was not too bad.
work.promises += 1; | ||
remaining -= 1; | ||
if (remaining <= 0) { | ||
return { done: false, work }; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cleanupAfterTerminatedVat
function is getting a bit unwieldy... I wonder if things would be better or worse if we abstracted some boilerplate?
No suggestion at this point, though.
packages/SwingSet/test/vat-admin/slow-termination/slow-termination.test.js
Outdated
Show resolved
Hide resolved
b730fe2
to
74c7a0c
Compare
Two tests are updated to exercise the cleanup of promise c-list entries during vat termination. `terminate.test.js` adds some promises to the c-list and then checks their refcounts after termination, to demonstrate that bug #10261 is leaking a refcount when it deletes the dead vat's c-list entry without also decrementing the refcount. `slow-termination.test.js` adds a number of promises to the c-list, and expects the budget-limited cleanup to spend a phase on promises. Both tests are marked as failing until the code fix is landed in the next commit.
Previously, when a vat was terminated, and we delete the promise c-list entries from its old state, the cleanup code was failing to decrement the kpid's refcount properly. This resulted in a leak: those promises could never be retired. This commit updates the vat cleanup code to add a new phase, named `promises`. This executes after `exports` and `imports`, but before `kv`, and is responsible for both deleting the c-list entries and also decrementing the refcounts of the corresponding promises. We do this slowly, like we do exports and imports, because we don't know how many there might be, and because those promise records might hold references to other objects (in the resolution data), which could trigger additional work. However, this work is unlikely to be significant: the run-queue is usually empty, so these outstanding promises are probably unresolved, and thus cannot beholding resolution data. All promises *decided* by the dead vat are rejected by the kernel immediately during vat termination, because those rejections are visible to userspace in other vats. In contrast, freeing the promise records is *not* visible to userspace, just like how freeing imports or exports are not visible to userspace, so this cleanup is safe to do at a leisurely pace, rate-limited by `runPolicy.allowCleanup`. The docs are updated to reflect the new `runPolicy` API: * `budget.promises` is new, and respected by slow cleanup * `work.promises` is reported to `runPolicy.didCleanup()` The 'test.failing' marker was removed from the previously updated tests. I don't intend to add any remediation code: it requires a full refcount audit to find such promises, and the mainnet kernel has only ever terminated one vat so far, so I believe there cannot be very many leaked promises, if any. Once this fix is applied, no new leaks will occur. fixes #10261
74c7a0c
to
08e5dc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fairly straightforward. The tests do seem very descriptive, which I'm worried will complicate maintenance, but that's out of scope for this change.
for (let i = 0; i < 20; i += 1) { | ||
myImports.push(await E(root).sendExport()); | ||
} | ||
|
||
// also 10 imported promises | ||
for (let i = 0; i < 10; i += 1) { | ||
await E(root).acceptImports(makePromiseKit().promise); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, these eventual sends also result in imported / externally decided promises but since they're immediately settled, they do not participate in any cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the acceptImports
result promises are settled (and the notifies delivered) during the bootstrap run, driven by the await controller.run()
on line 110 of slow-termination.test.js
. The makePromiseKit().promise
payload of those messages are never resolved (they remain decided by vat-bootstrap, but we've thrown away the settlement functions), so they'll still be in the c-list when the vat is terminated. The exports will be rejected when the target vat is terminated, and removed from the c-list by the run()
on line 191. But the imports will still be sitting there when the slow deletion happens.
Previously, when a vat was terminated, and we delete the promise
c-list entries from its old state, the cleanup code was failing to
decrement the kpid's refcount properly. This resulted in a leak: those
promises could never be retired.
This commit updates the vat cleanup code to add a new phase, named
promises
. This executes afterexports
andimports
, but beforekv
, and is responsible for both deleting the c-list entries and alsodecrementing the refcounts of the corresponding promises. We do this
slowly, like we do exports and imports, because we don't know how many
there might be, and because those promise records might hold
references to other objects (in the resolution data), which could
trigger additional work. However, this work is unlikely to be
significant: the run-queue is usually empty, so these outstanding
promises are probably unresolved, and thus cannot beholding resolution
data.
All promises decided by the dead vat are rejected by the kernel
immediately during vat termination, because those rejections are
visible to userspace in other vats. In contrast, freeing the promise
records is not visible to userspace, just like how freeing imports
or exports are not visible to userspace, so this cleanup is safe to do
at a leisurely pace, rate-limited by
runPolicy.allowCleanup
.The docs are updated to reflect the new
runPolicy
API:budget.promises
is new, and respected by slow cleanupwork.promises
is reported torunPolicy.didCleanup()
I don't intend to add any remediation code: it requires a full
refcount audit to find such promises, and the mainnet kernel has only
ever terminated one vat so far, so I believe there cannot be very many
leaked promises, if any. Once this fix is applied, no new leaks will
occur.
fixes #10261