add Meters, assign to dynamic vats to track compute usage #3508

Merged
merged 8 commits into from
Jul 25, 2021
114 changes: 114 additions & 0 deletions packages/SwingSet/docs/metering.md
@@ -0,0 +1,114 @@
# Metering CPU Usage

The Halting Problem is unsolvable: no amount of static analysis or human auditing can pre-determine how many steps an arbitrary Turing-complete program will take before it finishes, or if it will ever finish. To keep the code in one vat from blocking execution of code in other vats (or the kernel itself), SwingSet provides a mechanism to limit the amount of computation that each vat can perform. Any vat which exceeds its limit is terminated, and any messages it sent during the offending crank are cancelled.

Two limits can be imposed. The first is a per-crank limit. Each message delivered to a vat results in a sequence of "turns" known as a "crank". A crank is also triggered when the vat receives notification of a kernel-side promise being resolved or rejected. Cranks run until the vat stops adding work to the resolved-promise queue, and there is nothing left to do until the next message or notification arrives. A per-crank limit imposes a ceiling on the amount of computation that can be done during each crank, but does not say anything about the number of cranks that can be run.

The second limit spans multiple cranks and is managed by the "Meter": a variable-sized reservoir of execution credits. Each vat can be associated with a single Meter, and the remaining capacity of the Meter is reduced at the end of each crank by whatever amount the vat consumed during that crank. The Meter can be refilled by sending it a message, but if any crank causes the Meter's remaining value to drop below zero, the vat is terminated.
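
The way the two limits interact can be sketched with a toy model. This is not the kernel's implementation: `PER_CRANK_LIMIT`, `makeToyMeter`, and `runCrank` are hypothetical names chosen for illustration, and the 100M ceiling is simply the value quoted later in this document.

```js
// Toy model of the two limits. All names are illustrative, not kernel APIs.
const PER_CRANK_LIMIT = 100_000_000n; // ceiling on any single crank

function makeToyMeter(remaining) {
  return { remaining };
}

// Returns 'ok', or the reason the vat would be terminated.
function runCrank(meter, consumed) {
  if (consumed > PER_CRANK_LIMIT) {
    return 'terminated: per-crank limit exceeded';
  }
  // The cumulative Meter is charged at the end of the crank.
  meter.remaining -= consumed;
  if (meter.remaining < 0n) {
    return 'terminated: meter underflow';
  }
  return 'ok';
}

const toyMeter = makeToyMeter(250_000_000n);
const outcomes = [
  runCrank(toyMeter, 90_000_000n), // 160M left
  runCrank(toyMeter, 90_000_000n), // 70M left
  runCrank(toyMeter, 90_000_000n), // drops below zero
];
console.log(outcomes);
```

Each crank here stays under the per-crank ceiling, yet the third one still ends the vat because the cumulative Meter runs dry.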

## The Computron

SwingSet measures computation with a unit named the "computron": the smallest unit of indivisible computation. The number of computrons used by a given piece of code depends upon its inputs, the state it can access, and the history of its previous activity, but it does *not* depend upon the activity of other vats, other processes on the same host computer, wall-clock time, or type of CPU being used (32-bit vs 64-bit, Intel vs ARM). The metering usage is meant to be consistent across any SwingSet that uses the same version of the kernel and vat code and receives the same sequence of vat inputs (the transcript), making it safe to use in a consensus machine.

Metering is provided by low-level code in the JavaScript engine, which counts basic operations like "read a property from an object" and "add two numbers". Each such operation is larger than a CPU cycle. The exact mapping depends upon intricate details of the engine, and is likely to change if/when the JS engine is upgraded. SwingSet kernels that participate in a consensus machine must be careful to synchronize upgrades to prevent divergence of metering results.

To gain some intuition on how "big" a computron is, here are some examples:

* An empty function: 36560 computrons. This is the base overhead for each message delivery (dispatch.deliver)
* Adding `async` to a function (which creates a return Promise): 98
* `let i = 1`: 3
* `i += 2`: 4
* `let sum; for (let i=0; i<100; i++) { sum += i; }`: 1412
* same, but adding to 1000: 14012
* defining a `harden()`ed add/read "counter" object: 1475
* invoking `add()`: 19
* `console.log('')`: 1011 computrons
* ERTP `getBrand()`: 49300
* ERTP `getCurrentAmount()`: 54240
* ERTP `getUpdateSince()`: 59084
* ERTP `deposit()`: 124775
* ERTP `withdraw()`: 111141
* Zoe `install()`: 62901
* ZCF `executeContract()` of the Multi-Pool Autoswap contract: 12.9M
* ZCF `executeContract()` (importBundle) of the Treasury contract: 13.5M

Computrons are only loosely related to wall-clock time, but the two are generally correlated, so tracking the cumulative computrons spent during SwingSet cranks can provide a rough measure of how much time is being spent. This can be useful to e.g. limit blocks to a reasonable amount of execution time.

The SwingSet Meter APIs accept and deliver computron values as BigInts.
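
Because the values are BigInts, any budget arithmetic has to stay in BigInt too (mixing `number` and `bigint` operands throws a `TypeError`). As a rough sketch, the following estimates how many ERTP `deposit()` deliveries a 100M-computron budget could cover, using the measured figure from the list above. The variable names are illustrative, and real per-delivery costs will vary.

```js
const budget = 100_000_000n; // 100M computrons
const depositCost = 124_775n; // ERTP deposit(), from the list above

const deliveries = budget / depositCost; // BigInt division truncates
const leftover = budget % depositCost;
console.log(`${deliveries} deposits, ${leftover} computrons left over`);

// Counts that arrive as Numbers must be converted before doing Meter math.
const consumed = BigInt(36560);
console.log(budget - consumed);
```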

## Meter Objects

The kernel manages `Meter` objects. Each one has a `remaining` capacity and a notification `threshold`. The Meter has a `Notifier` which can inform interested parties when the capacity drops below the threshold, so they can refill it before any associated vats are in danger of being terminated due to an underflow.

Contributor commented:

Seems like we'd want to avoid (or at least, provide the means to avoid if desired) the race between the notification going out to be acted upon and ongoing further computation in the vat underflowing the meter. Instead of just a threshold for notification, it would make sense to me to have a threshold that when reached causes computation to stop in that vat until the meter is replenished, since this is a recoverable condition whereas meter underflow is not (though we'd need an answer as to who pays the freight for any message traffic that piles up in the meantime).

Member Author commented:

Yeah, that's a great idea, although I think we'll need more sophistication in the scheduler first. We need a state for a message to be in that says "I want to run this, everything that came before it has been delivered, but for various reasons this one is blocked". I don't know what ordering properties we should impose on the messages that follow: maybe it's just everything for the "paused in arrears" vat that gets stalled, but I think "E order" has something to say about that.

The #3517 "Flow Escalator" approach might provide a mechanism for this: when a flow's first message is blocked, the whole flow gets moved onto a special pseudo-queue that doesn't get serviced until the target vat is resumed.

Contributor commented:

I like @FUDCo's idea, which seems related to @zarutian's ideas:

> Perhaps chargeAccounts could be „over-charged“, that is go into negative. When that happens ZCFVats of the contract instances drawing from that chargeAccount wont run until the chargeAccount is in the positive again.

He goes on to explain a mechanism for how users can add to the chargeAccount for a contract instance.

Termination seems too extreme, if stopping delivering messages is plausible. Making this a future goal sounds like the right thing to me.

Member Author commented:

tracked in #3528


Vats can create a Meter object by invoking the `createMeter` method on the `vatAdmin` object. This is the same object used to create new dynamic vats. `createMeter` takes two arguments, both denominated in computrons:

* `remaining`: sets the initial capacity of the Meter
* `threshold`: sets the notification threshold

If you want to impose a per-crank limit, but not a cumulative limit, you can use `createUnlimitedMeter` to make a Meter that never deducts (`remaining` is always the special string `'unlimited'`) and never notifies.

```js
const remaining = 100_000_000n; // 100M computrons
const threshold = 20_000_000n; // notify below 20M
const meter = await E(vatAdmin).createMeter(remaining, threshold);
const umeter = await E(vatAdmin).createUnlimitedMeter();
```

The holder of a Meter object can manipulate the meter with the following API:

* `meter.addRemaining(delta)`: increment the capacity by some amount
* `meter.setThreshold(threshold)`: replace the notification threshold
* `meter.get() -> { remaining, threshold }`: read the remaining capacity and current notification threshold
* `meter.getNotifier() -> Notifier`: access the Notifier object

```js
await E(meter).get(); // -> { remaining: 100_000_000n, threshold: 20_000_000n }
await E(meter).setThreshold(50n);
await E(meter).get(); // -> { remaining: 100_000_000n, threshold: 50n }
await E(meter).addRemaining(999n);
await E(meter).get(); // -> { remaining: 100_000_999n, threshold: 50n }
```
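
A common use of this API is topping a meter back up to a target level. The sketch below is one hedged way to do that. `refillTo` is a hypothetical helper, and `makeStubMeter` is a local stand-in so the example runs without a kernel; inside a vat the same calls would go through `E(meter)`.

```js
// Local stand-in for a Meter (illustrative only, not the kernel's object).
function makeStubMeter(remaining, threshold) {
  return {
    addRemaining: async delta => {
      remaining += delta;
    },
    setThreshold: async newThreshold => {
      threshold = newThreshold;
    },
    get: async () => ({ remaining, threshold }),
  };
}

// Hypothetical helper: raise `remaining` to `target`, never subtracting.
async function refillTo(meter, target) {
  const { remaining } = await meter.get();
  if (remaining === 'unlimited' || remaining >= target) {
    return 0n; // nothing to add
  }
  const delta = target - remaining;
  await meter.addRemaining(delta);
  return delta;
}

const stub = makeStubMeter(30_000_000n, 20_000_000n);
const refillDone = refillTo(stub, 100_000_000n).then(async added => {
  const state = await stub.get();
  console.log(`added ${added}, remaining is now ${state.remaining}`);
  return { added, state };
});
```

Reading `get()` and then calling `addRemaining()` is not atomic, so if the vat keeps running between the two calls the meter will end up somewhat below the target.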

## Notification

The meter's `remaining` value will be deducted over time. When it crosses below `threshold`, the Notifier is updated. This is an instance of `@agoric/notifier`:

```js
const notifier = await E(meter).getNotifier();
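// each update record has the shape { value, updateCount }
const { updateCount } = await E(notifier).getUpdateSince();
// pass the last-seen updateCount to wait for the next update
const p1 = E(notifier).getUpdateSince(updateCount);
p1.then(({ value: remaining }) =>
  console.log(`meter down to ${remaining}, must refill`),
);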
```

Note that the notification will occur only once for each transition from "above threshold" to "below threshold". So even if the vat continues to operate (and keeps deducting from the Meter), the notification will not be repeated.

The notification may be triggered again if the meter is refilled above the current threshold, or if the threshold is reduced below the current remaining capacity.
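
The once-per-crossing rule can be sketched as an "armed" flag that is cleared when a notification fires and set again once the meter is back above the threshold. This is a toy model, not the kernel's code: `makeToyNotifyingMeter` and its methods are hypothetical, and treating "exactly at the threshold" as above it is an assumption.

```js
// Toy model of edge-triggered threshold notifications (illustrative only).
function makeToyNotifyingMeter(remaining, threshold) {
  let armed = remaining >= threshold; // may a notification fire?
  const notifications = [];
  const rearm = () => {
    if (remaining >= threshold) {
      armed = true;
    }
  };
  return {
    deduct(amount) {
      remaining -= amount;
      if (armed && remaining < threshold) {
        armed = false; // fire once per downward crossing
        notifications.push(remaining);
      }
    },
    addRemaining(delta) {
      remaining += delta;
      rearm();
    },
    setThreshold(newThreshold) {
      threshold = newThreshold;
      rearm();
    },
    notifications,
  };
}

const nm = makeToyNotifyingMeter(100n, 20n);
nm.deduct(85n); // 15 left: crosses below 20, notifies
nm.deduct(5n); // 10 left: still below, no repeat
nm.addRemaining(50n); // 60 left: back above, re-armed
nm.deduct(50n); // 10 left: crosses again, notifies
console.log(nm.notifications);
```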

## Per-Crank Limits

The per-crank limit is currently hardcoded to 100M computrons, defined by `DEFAULT_CRANK_METERING_LIMIT` in `packages/xsnap/src/xsnap.js`. This has experimentally been determined to be sufficient for loading large contract bundles, which is the single largest operation we've observed so far.

This per-crank limit is intended to maintain fairness even among vats with a large Meter capacity: just because the Meter allows the vat to spend 17 hours of CPU time, we don't want it to spend it all at once. It also provides a safety mechanism when the vat is using an "unlimited" meter, which allows the vat to use as many cranks as it wants, but each crank is limited.

## Assigning Meters to Vats

Each vat can be associated with a single Meter. A Meter can be attached to multiple vats (although that may make it difficult to assign responsibility for the consumption it measures). To attach a Meter, include it in the options bag to the `vatAdmin`'s `createVat` or `createVatByName` methods:

```js
const control = await E(vatAdmin).createVat(bundle, { meter });
```

The default (omitting a `meter` option) leaves the vat unmetered.

Assigning a Meter to a vat activates the per-crank limit. To get a per-crank limit without a cumulative limit (an ordinary Meter must be refilled occasionally to keep the vat from being terminated), use an unlimited meter:

Contributor commented:

Hmm, that seems like a setting that we might want to make explicit, especially since per-crank limits don't really have much to do with reducing a meter.

Member Author commented:

I'll add this as a future goal to consider. My half-considered opinion is that having a per-crank limit but not a Meter isn't really much of a limit: it protects against infinite loops but not infinite send-to-self cycles (with the assistance of a second vat, e.g. purse.getCurrentAmount()). So a vat in that state could consume effectively all of the CPU time.

The only use case I can think of for that mode would be the REPL, which is in a sort of intermediate state between "protect against accidents" but "trust that the code isn't malicious" / "if you break it, you get to keep both pieces". I added UnlimitedMeter to serve as a bridge from the old behavior to the new (charge account) one, and maybe the REPL will stay on it, but I figured real contract vats should not be allowed to use it.

I'll add a ticket with these thoughts where we can think through the use cases. It wouldn't be hard to implement, but I didn't want to leave an attractive nuisance lying around in the API :).

Member Author commented:

added as #3527


```js
const meter = await E(vatAdmin).createUnlimitedMeter();
const control = await E(vatAdmin).createVat(bundle, { meter });
```

## runPolicy

TODO: The host application can limit the number of cranks processed in a single call to `controller.run()` by providing a `runPolicy` object. This policy object is informed about each crank and the number of computrons it consumed. By comparing the cumulative computrons against an experimentally (and externally) determined threshold, the `runPolicy` object can tell the kernel to stop processing before the run-queue is drained. For a busy kernel, with an ever-increasing amount of work to do, this can limit the size of a commitment domain (e.g. the "block" in a blockchain / consensus machine).

Contributor commented:

Ah, this is a partial answer to my above comment.


This is a work in progress, please follow issue #3460 for progress.
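
Since the `runPolicy` interface is not yet defined, the following is only a guess at its general shape, meant to show the cumulative-computron idea. Every name here (`makeComputronCountingPolicy`, `crankComplete`, `toyRun`) is hypothetical, and `toyRun` is a stand-in for the kernel's run loop rather than `controller.run()` itself.

```js
// Hypothetical policy shape: the real runPolicy interface is still TODO.
function makeComputronCountingPolicy(limit) {
  let total = 0n;
  return {
    // Told about each finished crank; returns true to keep running.
    crankComplete(computrons) {
      total += computrons;
      return total < limit;
    },
    total: () => total,
  };
}

// Stand-in for the run loop: each queue entry is the cost of one crank.
function toyRun(crankCosts, policy) {
  let cranks = 0;
  for (const cost of crankCosts) {
    cranks += 1;
    if (!policy.crankComplete(cost)) {
      break; // stop before the queue is drained
    }
  }
  return cranks;
}

const blockPolicy = makeComputronCountingPolicy(65_000_000n);
const queue = [36_560n, 13_500_000n, 12_900_000n, 40_000_000n, 36_560n];
const cranksRun = toyRun(queue, blockPolicy);
console.log(`${cranksRun} of ${queue.length} cranks ran`);
```

In this sketch the policy is consulted only after a crank finishes, so the total can overshoot the threshold by up to one crank's worth of computrons.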
74 changes: 68 additions & 6 deletions packages/SwingSet/src/kernel/kernel.js
@@ -1,6 +1,7 @@
// @ts-check
import { assert, details as X } from '@agoric/assert';
import { importBundle } from '@agoric/import-bundle';
import { stringify } from '@agoric/marshal';
import { assertKnownOptions } from '../assertOptions.js';
import { makeVatManagerFactory } from './vatManager/factory.js';
import { makeVatWarehouse } from './vatManager/vat-warehouse.js';
@@ -374,12 +375,44 @@ export default function buildKernel(
}

let terminationTrigger;
let postAbortActions;

function resetDeliveryTriggers() {
terminationTrigger = undefined;
postAbortActions = {
meterDeductions: [], // list of { meterID, compute }
};
}
resetDeliveryTriggers();

function notifyMeterThreshold(meterID) {
// tell vatAdmin that a meter has dropped below its notifyThreshold
const { remaining } = kernelKeeper.getMeter(meterID);
const args = { body: stringify(harden([meterID, remaining])), slots: [] };
assert.typeof(vatAdminRootKref, 'string', 'vatAdminRootKref missing');
queueToKref(vatAdminRootKref, 'meterCrossedThreshold', args, 'logFailure');
}

function deductMeter(meterID, compute, firstTime) {
assert.typeof(compute, 'bigint');
const res = kernelKeeper.deductMeter(meterID, compute);

// We record the deductMeter() in postAbortActions.meterDeductions. If
// the delivery is rewound for any reason (syscall error, res.underflow),
// then deliverAndLogToVat will repeat the deductMeter (which will repeat
// the notifyMeterThreshold), so their side-effects will survive the
// abortCrank(). But we don't record it (again) during the repeat, to
// make sure exactly one copy of the changes will be committed.

if (firstTime) {
postAbortActions.meterDeductions.push({ meterID, compute });
}
if (res.notify) {
notifyMeterThreshold(meterID);
}
return res.underflow;
}

// this is called for syscall.exit (shouldAbortCrank=false), and for any
// vat-fatal errors (shouldAbortCrank=true)
function setTerminationTrigger(vatID, shouldAbortCrank, shouldReject, info) {
@@ -391,12 +424,13 @@ export default function buildKernel(
}
}

- async function deliverAndLogToVat(vatID, kd, vd) {
+ async function deliverAndLogToVat(vatID, kd, vd, useMeter) {
// eslint-disable-next-line no-use-before-define
assert(vatWarehouse.lookup(vatID));
const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
const crankNum = kernelKeeper.getCrankNumber();
const deliveryNum = vatKeeper.nextDeliveryNum(); // increments
const { meterID } = vatKeeper.getOptions();
/** @typedef { any } FinishFunction TODO: static types for slog? */
/** @type { FinishFunction } */
const finish = kernelSlog.delivery(vatID, crankNum, deliveryNum, kd, vd);
@@ -412,6 +446,21 @@ export default function buildKernel(
// probably a metering fault, or a bug in the vat's dispatch()
console.log(`delivery problem, terminating vat ${vatID}`, problem);
setTerminationTrigger(vatID, true, true, makeError(problem));
return;
}
if (deliveryResult[0] === 'ok' && useMeter && meterID) {
const metering = deliveryResult[2];
assert(metering);
const consumed = metering.compute;
assert.typeof(consumed, 'number');
const used = BigInt(consumed);
const underflow = deductMeter(meterID, used, true);
if (underflow) {
console.log(`meter ${meterID} underflow, terminating vat ${vatID}`);
const err = makeError('meter underflow, vat terminated');
setTerminationTrigger(vatID, true, true, err);
return;
}
}
} catch (e) {
// log so we get a stack trace
@@ -433,7 +482,7 @@ export default function buildKernel(
const kd = harden(['message', target, msg]);
// eslint-disable-next-line no-use-before-define
const vd = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd);
- await deliverAndLogToVat(vatID, kd, vd);
+ await deliverAndLogToVat(vatID, kd, vd, true);
}
}

@@ -546,7 +595,7 @@ export default function buildKernel(
// eslint-disable-next-line no-use-before-define
const vd = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd);
vatKeeper.deleteCListEntriesForKernelSlots(targets);
- await deliverAndLogToVat(vatID, kd, vd);
+ await deliverAndLogToVat(vatID, kd, vd, true);
}
}

@@ -570,7 +619,7 @@ export default function buildKernel(
}
// eslint-disable-next-line no-use-before-define
const vd = vatWarehouse.kernelDeliveryToVatDelivery(vatID, kd);
- await deliverAndLogToVat(vatID, kd, vd);
+ await deliverAndLogToVat(vatID, kd, vd, false);
}

async function processCreateVat(message) {
@@ -682,9 +731,15 @@ export default function buildKernel(
// errors unwind any changes the vat made
abortCrank();
didAbort = true;
// but metering deductions and underflow notifications must survive
const { meterDeductions } = postAbortActions;
for (const { meterID, compute } of meterDeductions) {
deductMeter(meterID, compute, false);
// that will re-push any notifications
}
}
- // state changes reflecting the termination must survive, so these
- // happen after a possible abortCrank()
+ // state changes reflecting the termination must also survive, so
+ // these happen after a possible abortCrank()
terminateVat(vatID, shouldReject, info);
kernelSlog.terminateVat(vatID, shouldReject, info);
kdebug(`vat terminated: ${JSON.stringify(info)}`);
@@ -908,6 +963,13 @@ export default function buildKernel(
return vatID;
},
terminate: (vatID, reason) => terminateVat(vatID, true, reason),
meterCreate: (remaining, threshold) =>
kernelKeeper.allocateMeter(remaining, threshold),
meterAddRemaining: (meterID, delta) =>
kernelKeeper.addMeterRemaining(meterID, delta),
meterSetThreshold: (meterID, threshold) =>
kernelKeeper.setMeterThreshold(meterID, threshold),
meterGet: meterID => kernelKeeper.getMeter(meterID),
};

// instantiate all devices
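
The record-and-replay handling of meter deductions in the `kernel.js` changes above can be illustrated with a toy model. It is only a sketch under simplifying assumptions: `makeToyKernel` and its fields are hypothetical, the "database" is a plain object copied on commit and abort, and notifications are left out.

```js
// Toy store with commit/abort semantics; all names are illustrative only.
function makeToyKernel(initialRemaining) {
  let committed = { remaining: initialRemaining, vatWrites: 0 };
  let working = { ...committed };
  let pendingDeductions = []; // plays the role of postAbortActions.meterDeductions

  function deduct(amount, firstTime) {
    working.remaining -= amount;
    if (firstTime) {
      pendingDeductions.push(amount); // record once, never during replay
    }
  }

  return {
    deliver(vatWrites, consumed) {
      working.vatWrites += vatWrites; // state changes made by the vat
      deduct(consumed, true);
    },
    abortCrank() {
      working = { ...committed }; // unwinds everything, deductions included
      for (const amount of pendingDeductions) {
        deduct(amount, false); // replay so the charge survives the abort
      }
    },
    commitCrank() {
      committed = { ...working };
      pendingDeductions = [];
    },
    state: () => ({ ...committed }),
  };
}

const toyKernel = makeToyKernel(1000n);
toyKernel.deliver(3, 400n);
toyKernel.abortCrank();
toyKernel.commitCrank();
console.log(toyKernel.state()); // vat writes discarded, deduction kept
```

Skipping the record step during replay is what keeps exactly one copy of each deduction in the committed state.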
21 changes: 11 additions & 10 deletions packages/SwingSet/src/kernel/loadVat.js
@@ -87,7 +87,7 @@ export function makeVatLoader(stuff) {

const allowedDynamicOptions = [
'description',
- 'metered',
+ 'meterID',
'managerType', // TODO: not sure we want vats to be able to control this
'vatParameters',
'enableSetup',
@@ -133,13 +133,14 @@ export function makeVatLoader(stuff) {
*
* @param {number} options.virtualObjectCacheSize
*
- * @param {boolean} [options.metered] if true,
- * subjects the new dynamic vat to a meter that limits
- * the amount of computation and allocation that can occur during any
- * given crank. Stack frames are limited as well. The meter is refilled
- * between cranks, but if the meter ever underflows, the vat is
- * terminated. If false, the vat is unmetered. Defaults to false for
- * dynamic vats; static vats may not be metered.
+ * @param {string} [options.meterID] If a meterID is provided, the new
+ * dynamic vat is limited to a fixed amount of computation and
+ * allocation that can occur during any given crank. Peak stack
+ * frames are limited as well. In addition, the given meter's
+ * "remaining" value will be reduced by the amount of computation
+ * used by each crank. The meter will eventually underflow unless it
+ * is topped up, at which point the vat is terminated. If undefined,
+ * the vat is unmetered. Static vats cannot be metered.
*
* @param {Record<string, unknown>} [options.vatParameters] provides
* the contents of the second argument to
@@ -199,7 +200,7 @@ export function makeVatLoader(stuff) {
isDynamic ? allowedDynamicOptions : allowedStaticOptions,
);
const {
- metered = false,
+ meterID,
vatParameters = {},
managerType,
enableSetup = false,
@@ -231,7 +232,7 @@ export function makeVatLoader(stuff) {
const managerOptions = {
managerType,
bundle: vatSourceBundle,
- metered,
+ metered: !!meterID,
enableDisavow,
enableSetup,
enablePipelining,