Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor deliverAndLogToVat kernel code #4687

Closed
warner opened this issue Feb 27, 2022 · 0 comments
Closed

refactor deliverAndLogToVat kernel code #4687

warner opened this issue Feb 27, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet
Milestone

Comments

@warner
Copy link
Member

warner commented Feb 27, 2022

What is the Problem Being Solved?

To make the implementation of #1848 vat upgrade easier, I'd like to refactor the way kernel.js processes deliveries to vats. We currently pull items off the various run queues (GC Actions, BringOutYourDead, and the main run-queue) and send them each through processDeliveryMessage. This fans out into several different helpers:

flowchart LR
processDeliveryMessage --send--> deliverToTarget
deliverToTarget --> deliverToVat
deliverToTarget --> resolveToError
deliverToTarget --> queuePending["queue on unresolved promise"]
deliverToVat --> deliverAndLogToVat
deliverToVat --> resolveToError
processDeliveryMessage --notify--> processNotify
processNotify --> deliverAndLogToVat
processDeliveryMessage --create-vat--> processCreateVat
processDeliveryMessage --start-vat--> processStartVat
processStartVat --> deliverAndLogToVat
processDeliveryMessage --update-vat--> processUpgradeVat
processDeliveryMessage --reap--> processBringOutYourDead
processBringOutYourDead --> deliverAndLogToVat
processDeliveryMessage --gc-action--> procesGCMessage
processGCMessage --> deliverAndLogToVat
Loading

At the end of processDeliveryMessage, we check terminationTrigger to see if "the" vat died during the delivery, and then unwind cranks or do other cleanup work if necessary. Most of the helper functions return a policyInput, and there's not a great way to report an error back. Errors or other delivery status might include:

  • syscall violation (vat should be terminated or paused, crank unwound)
  • worker died from a hard metering limit (vat should be terminated or paused, crank unwound)
  • vat asked to terminate (delete vat state, notify followers, commit crank)

Once the delivery status is known, the caller should handle metering: deduct computrons, check for underflow, either terminate or pause the vat.

After that point, we might know what to do with the worker. If the vat was terminated, we should definitely stop the worker. If something went wrong and we decide to pause the vat, we should stop the worker (and tell vat-warehouse to refrain from saving a heap snapshot, so if/when we resume the vat, we'll start from a new worker that wakes up just before the half-finished delivery).

I know that vat-upgrade will need to be handled by the following sequence:

  • deliver a BOYD (to flush any durable-object caches) and wait for it to complete successfully
    • if this fails, the vat has the same problems as if a BOYD had failed earlier
    • it's ok if we commit the crank at this point, but we don't have to
    • vat-warehouse does not need to record a heap snapshot, we're throwing them away anyways
  • instruct vat-warehouse to discard the worker, delete the transcript
  • change the vat's source-code record and options to point to the new BundleID and VatParameters
  • deliver a startVat and wait for it to complete successfully
    • this will cause vat-warehouse to bring the vat online, knowing about the new bundle but not yet loading it
    • then dispatch.startVat causes the bundle to be evaluated (possibly failing, if top-level code is broken), and buildRootObject() (or upgrade() or whatever design better vat-facing upgrade protocol (buildRootObject()? upgrade()?) #4382 uses) gets called (possibly failing, or failing to reconnect/honor all of the previous version's obligations and Kinds)
    • if startVat is successful, notify the upgrade parent of the success, and commit the crank
    • if it fails, unwind the whole crank (reverting all the state changes), tell VW to discard the worker, notify the upgrade parent of the failure
      • this should reset back to the previous version, allowing the next message to the not-upgraded vat to bring the old version online (from the previous heap snapshot, and partial transcript)

So I want processVatUpgrade to be able to perform two deliveries. And I need a place for code to tell VW to discard a worker.

Description of the Design

TBD

Security Considerations

Test Plan

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Feb 27, 2022
@warner warner self-assigned this Feb 27, 2022
warner added a commit that referenced this issue Mar 2, 2022
This moves some logic out of `deliverAndLogToVal` and into a higher-level
dispatch function, where it can react to all failure modes in a single place.

We pull "run-queue events" off the run-queue (currently named the "acceptance
queue". Many of these events are aimed at a specific vat, including 'notify',
but the primary one is 'send' and is aimed at a *target* kref, which is
either a kernel object or a kernel promise. For these we use `routeSendEvent`
to either queue the event on a promise queue, reject it if it can't be
delivered, or deliver it to a specific vat. This is the part that will change
with #3465: in that world, each vat-input-queue exists for a specific vat, so
the routing step moves earlier (into a "routing crank" that pulls events from
a vat-output-queue and possibly adds them to a vat-input-queue).

Some kinds of deliveries care about metering (or don't), and some have
opinions about what should happen if a crank gets unwound. These are now
reported as additional properties in the DeliveryStatus object that is
returned by this delivery path. `processDeliveryMessage` looks at these
properties, plus the delivery status (if any) to decide what to do at the end
of the crank.

I think most of the behavior should be the same as before. One change is that
the `runPolicy` will probably get more information about non-'send' cranks
than before (e.g. `create-vat` might report metering).

This refactoring should make it easier to implement #1848 vat-upgrade, as
well as #3465 queueing changes.

refs #4687
(probably doesn't close it, but comes close)
warner added a commit that referenced this issue Mar 2, 2022
This moves some logic out of `deliverAndLogToVal` and into a higher-level
dispatch function, where it can react to all failure modes in a single place.

We pull "run-queue events" off the run-queue (currently named the "acceptance
queue". Many of these events are aimed at a specific vat, including 'notify',
but the primary one is 'send' and is aimed at a *target* kref, which is
either a kernel object or a kernel promise. For these we use `routeSendEvent`
to either queue the event on a promise queue, reject it if it can't be
delivered, or deliver it to a specific vat. This is the part that will change
with #3465: in that world, each vat-input-queue exists for a specific vat, so
the routing step moves earlier (into a "routing crank" that pulls events from
a vat-output-queue and possibly adds them to a vat-input-queue).

Some kinds of deliveries care about metering (or don't), and some have
opinions about what should happen if a crank gets unwound. These are now
reported as additional properties in the DeliveryStatus object that is
returned by this delivery path. `processDeliveryMessage` looks at these
properties, plus the delivery status (if any) to decide what to do at the end
of the crank.

I think most of the behavior should be the same as before. One change is that
the `runPolicy` will probably get more information about non-'send' cranks
than before (e.g. `create-vat` might report metering).

This refactoring should make it easier to implement #1848 vat-upgrade, as
well as #3465 queueing changes.

closes #4687
warner added a commit that referenced this issue Mar 2, 2022
This moves some logic out of `deliverAndLogToVal` and into a higher-level
dispatch function, where it can react to all failure modes in a single place.

We pull "run-queue events" off the run-queue (currently named the "acceptance
queue". Many of these events are aimed at a specific vat, including 'notify',
but the primary one is 'send' and is aimed at a *target* kref, which is
either a kernel object or a kernel promise. For these we use `routeSendEvent`
to either queue the event on a promise queue, reject it if it can't be
delivered, or deliver it to a specific vat. This is the part that will change
with #3465: in that world, each vat-input-queue exists for a specific vat, so
the routing step moves earlier (into a "routing crank" that pulls events from
a vat-output-queue and possibly adds them to a vat-input-queue).

Some kinds of deliveries care about metering (or don't), and some have
opinions about what should happen if a crank gets unwound. These are now
reported as additional properties in the DeliveryStatus object that is
returned by this delivery path. `processDeliveryMessage` looks at these
properties, plus the delivery status (if any) to decide what to do at the end
of the crank.

I think most of the behavior should be the same as before. One change is that
the `runPolicy` will probably get more information about non-'send' cranks
than before (e.g. `create-vat` might report metering).

This refactoring should make it easier to implement #1848 vat-upgrade, as
well as #3465 queueing changes.

closes #4687
warner added a commit that referenced this issue Mar 3, 2022
This moves some logic out of `deliverAndLogToVal` and into a higher-level
dispatch function, where it can react to all failure modes in a single place.

We pull "run-queue events" off the run-queue (currently named the "acceptance
queue". Many of these events are aimed at a specific vat, including 'notify',
but the primary one is 'send' and is aimed at a *target* kref, which is
either a kernel object or a kernel promise. For these we use `routeSendEvent`
to either queue the event on a promise queue, reject it if it can't be
delivered, or deliver it to a specific vat. This is the part that will change
with #3465: in that world, each vat-input-queue exists for a specific vat, so
the routing step moves earlier (into a "routing crank" that pulls events from
a vat-output-queue and possibly adds them to a vat-input-queue).

Some kinds of deliveries care about metering (or don't), and some have
opinions about what should happen if a crank gets unwound. These are now
reported as additional properties in the DeliveryStatus object that is
returned by this delivery path. `processDeliveryMessage` looks at these
properties, plus the delivery status (if any) to decide what to do at the end
of the crank.

I think most of the behavior should be the same as before. One change is that
the `runPolicy` will probably get more information about non-'send' cranks
than before (e.g. `create-vat` might report metering).

This refactoring should make it easier to implement #1848 vat-upgrade, as
well as #3465 queueing changes.

closes #4687
@mergify mergify bot closed this as completed in 876689e Mar 3, 2022
@Tartuffo Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

2 participants