Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a helper abstracting a vow as a promise #9280

Open
1 of 4 tasks
raphdev opened this issue Apr 23, 2024 · 0 comments
Open
1 of 4 tasks

Provide a helper abstracting a vow as a promise #9280

raphdev opened this issue Apr 23, 2024 · 0 comments
Assignees
Labels
asyncFlow related to membrane-based replay and upgrade of async functions enhancement New feature or request

Comments

@raphdev
Copy link
Contributor

raphdev commented Apr 23, 2024

What is the Problem Being Solved?

Vows are pseudo-promises used in place of regular promises that are tolerant of upgrades. They maintain continuity in state between vats that are dependent on promise fulfillment. See #8906 for a detailed description.

The orchestration API is expected to rely on longer-lived promises that need to tolerate upgrades at various points. It will need to make use of vows, however we should provide an ergonomic way to use them.

Description of the Design

From #9097:

Upgrade while suspended at await points! Uses membrane to log and replay everything that happened before each upgrade.

In the first incarnation, somewhere, using a closed async function argument

const wrapperFunc = asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);

then elsewhere, as often as you'd like

const outcomeVow = wrapperFunc(...);
For all these asyncFlow calls that happened in the first incarnation, in the first crank of all later incarnations

asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);

with async functions that reproduce the original's logged behavior. In these later incarnations, you only need to capture the returned wrapperFunc if you want to create new activations. Regardless, the old activations continue.

Security Considerations

From #9097

Nothing enforces that the argument async function is closed, i.e., does not capture (lexically "close over") any direct access to mutable state or ability to cause effects. If it does, it still cannot harm anything but itself. But it -- or its replayings -- may be more confusable, and so more vulnerable to confusion attacks.

Since this entire framework itself is written as a helper (well, a huge helper) with no special privilege, it cannot be used to do anything that could not have otherwise been done. It is not a source of new authority.

See caveats in Upgrade Considerations about isolation of effects following a replay failure.

See PR for in-depth discussions.

Scaling Considerations

See #9097.

Testing & Upgrade plan

  1. 2 of 4
    asyncFlow
    mhofman
  2. 0 of 1
    BLOCKED asyncFlow bug
    turadg
  3. asyncFlow bug
    erights michaelfig
@raphdev raphdev added the enhancement New feature or request label Apr 23, 2024
@erights erights added the asyncFlow related to membrane-based replay and upgrade of async functions label Apr 25, 2024
mergify bot pushed a commit that referenced this issue May 19, 2024
closes: #9302 
refs: #9125, #9126 #9153 #9154, #9280

## Description

Upgrade while suspended at `await` points! Uses membrane to log and
replay everything that happened before each upgrade.

In the first incarnation, somewhere, using a ***closed*** async function
argument
```js
const wrapperFunc = asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);
```
then elsewhere, as often as you'd like
```js
const outcomeVow = wrapperFunc(...);
```

For all these `asyncFlow` calls that happened in the first incarnation,
in the first crank of all later incarnations
```js
asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);
```
with async functions that reproduce the original's logged behavior. In
these later incarnations, you only need to capture the returned
`wrapperFunc` if you want to create new activations. Regardless, the old
activations continue.

#### Future Growth

I designed this PR so it could grow in the following ways:

- TODO: The membrane should use the `HandledPromise` API to make proper
remote presences and handled promises, so that the guest function can
use `E` on objects or promises it receives from the host as expected. I
commented out the additional ops needed for these: `doSend` and
`checkSend`.

- TODO: Currently, I assume that the host side only presents vows, not
promises. However, imported remote promises can be stored durably, and
be durably watched, so the membrane should be extended to handle those.

- TODO: We currently impose the restriction that the guest cannot export
to the host guest-created remotables or promises. (It can pass back to
the host remotables or promises it received from the host.) I commented
out the additional ops needed for these: `doCall`, `checkReturn` and
`checkThrow`. I wrote the `bijection` and `equate` subsystems so that
old durable host wrappers can be hooked back up on replay to the
corresponding new guest remotables and promises.

### Security Considerations

Nothing enforces that the argument async function is closed, i.e., does
not capture (lexically "close over") any direct access to mutable state
or ability to cause effects. If it does, it still cannot harm anything
but itself. But it -- or its replayings -- may be more confusable, and
so more vulnerable to confusion attacks.

Since this entire framework itself is written as a helper (well, a huge
helper) with no special privilege, it cannot be used to do anything that
could not have otherwise been done. It is not a source of new authority.

See caveats in Upgrade Considerations about isolation of effects
following a replay failure.

### Scaling Considerations

We assume that the total number of async functions, i.e., calls to
`asyncFlow`, are low cardinality. This is essential to the design, in
exactly the same sense as our assumption that exoClasses are low
cardinality. The RAM cost is proportional to the number of these.

The current implementation further assumes that the total number of
activations of these replayable async functions are low cardinality.
This will likely be a scaling problem at some point, but is unlikely to
be painful for the initial expected use cases.

The current implementation imposes two simplifying restrictions that
allow us to relieve some of this memory pressure: Currently, the async
function argument cannot make and export new remotables or promises to
the host. Thus, once its outcomeVow is settled, its job is done. There
is very little more it can do that is observable. Thus, once this
outcome is settled, the activation is shut down and most of the memory
it held onto is dropped.

Of the activations not shut down, they must replay from the beginning in
each incarnation. If a given activation has a long history of past
activity, this can become expensive.

How do we verify in CI that when an asyncFlow is in use & when it has
completed, resource usage in RAM & on disk meet our expectations?

The PR assumes `low cardinality` of asyncFlows. what scale is `low
cardinality` - Is 10^3, 10&5? What is the risk if cardinality is too
high?

### Documentation Considerations

For the normal developer documentation, `asyncFlow` should make things
simpler and more like "just JavaScript". The membrane translates between
host vows and guest promises, so the async function can simply `await`
on promises without needing the `when` from `@agoric/vow`.

### Testing Considerations

This PR is structured as a tower of building blocks, where I unit tested
each as I went, in bottom up order, in order to build with confidence.
Currently, each of these building blocks is also very paranoid about
internal consistency checking, so I'd get early indications if I made a
mistake. Probably some of this internal consistency checking can be
reduced over time, as we gain more static confidence.

This PR is currently using the fake upgrade testing framework from the
`@agoric/zone` package. This caused bug #9126. Instead, we need to redo
all these tests with a real upgrade testing framework, like the one in
bootstrapTests. See
https://github.com/Agoric/agoric-sdk/blob/master/packages/boot/test/bootstrapTests/test-zcf-upgrade.ts


### Upgrade Considerations

The point.

In an reviving incarnation, if the async function argument of
```js
asyncFlow(
  zone, 
  'funcName`, 
  async (...) => {... await ...; ...},
);
```
fails to recapitulate the logs of each of its activations, those
activations will not have done any damage. They will simply be stuck,
with a diagnostic available via
```js
adminAsyncFlow.getFailures(),
```
To unstick these, so those stuck activations can continue to make
progress, upgrade again using an async function argument that does
reproduce the logged behavior of each of its activations.

#### Caveat: Imperfect isolation of effects following a replay failure

Once a replay failure is detected, we attempt to isolate the replaying
activation from its outside world, and to also shut down its further
execution as soon as possible. But it may be in the midst of activity
with a non-empty stack. Following a replay failure, we have no way to
preemptively terminate it without any further execution of the
activation. This further execution may therefore be arbitrarily
confused. We simply try to isolate it as much as possible, immediately
revoking all access it had through the membrane to all authority to
cause observable effects. However,
- We do not consider `console` logging activity to be an observable
effect. These might be caused by diagnostics emitted by this framework
in response to its "isolated" confused behavior.
- Because we do not consider `console` logging to be an observable
effect, we also allow this as an exception to our closed function rule.
Messages it sends directly to the console are not logged, and can differ
without causing replay failure. During its post-replay-failure confused
execution, it can still directly log to the console.
- It is not resource limited, so its post-replay confused execution can
accidentally engage in resource exhaustion attacks, including infinite
loops. However, the vat as a whole is resource limited. An infinite loop
will eventually crash the vat, which can then be recovered with yet
another upgrade.
- Because of metering, an activation that executed successfully in a
previous incarnation might not replay correctly, even if it doesn't
cause any explicit side-effects. That's because metering is a hidden
side-effect of any execution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asyncFlow related to membrane-based replay and upgrade of async functions enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants