add Meters, assign to dynamic vats to track compute usage #3508

warner · 2021-07-22T07:26:26Z

warner · 2021-07-22T18:01:37Z

Still to do:

change API to provide/require BigInts everywhere: the per-crank meter consumption is limited to whatever C type XS uses (probably a 32-bit integer), but the long-term Meter value will accumulate many cranks (thousands?), so some more headroom would be better
figure out the TypeScript complaints
split into smaller PRs, the first few commits are refactorings that could land earlier
add a test of a Meter underflowing and reaching the notify threshold in the same crank (to make sure the notification message isn't unwound along with dead vat's messages)
document the new Meter API
add createInfiniteMeter, as a placeholder for the metered: true that Zoe et al are currently using
update all external callers to use it

warner · 2021-07-24T21:30:21Z

I'm still working on docs, but I'm not going to be changing any of the actual code, so I think this is ready for reviewers to look at.

warner · 2021-07-24T21:32:24Z

Er, ok I gotta fix those tests too.

warner · 2021-07-24T22:01:53Z

Oh, right, I removed metered: from the createVat options, but didn't provide a replacement. I'm going to add a createInfiniteMeter for things like Zoe to use for now, until @katelynsills changes it to use a charge account.

This introduces the "meterID" and the "meter record": a pair of Nats (`remaining` and `threshold`). kernelKeeper functions are added to create and manipulate them, and test-state.js is enhanced to exercise these. refs #3308

This introduces user-visible Meter objects, and allows new dynamic vats to be bound to a single Meter, both exposed through the vatAdmin facet. Meters are long-term reservoirs of execution credits, denominated in "computrons". Each bound vat will deduct credits from its meter until exhausted, at which point the vat will be terminated. This limits the long-term CPU consumption of a vat, in addition to the fixed per-crank computron limit applied to any metered vat. Meters can be refilled and queried through their API. Each meter also has a Notifier, and a configurable notification threshold: the notifier will be updated if/when the remaining credits drop below the threshold. This should allow a supervisor in userspace enough time to refill the meter before the associated vat(s) are terminated. See `docs/metering.md` for documentation. Some notes: * The vatAdmin facet now offers `createMeter()`, which returns a `Meter` object with methods to manipulate its `remaining` and `threshold` values, as well as a `getNotifier` to subscribe to threshold-passing events. * It also offers `createUnlimitedMeter()`, which never deducts. * The vatAdmin `createVat()` call now takes a `meter: Meter` option instead of `metered: boolean`. If a Meter is provided, two things happen: * Each delivery to that (XS) vat is subject to a per-crank compute limit. * Each delivery deducts the compute usage from the Meter. * When a Meter's `remaining` drops below its `threshold`, the notifier is triggered with the current `remaining` value. The actual Meter's value might have changed by the time the subscriber hears about the update. * When a vat's Meter reaches zero, the vat is terminated, just as if it had violated the per-crank limit. * Currently the termination message (used to reject the control facet's `.done()` Promise) is different for the per-crank limit vs the Meter limit, but this may change. * Meter deductions and threshold notifications are stashed in a new 'postAbortActions' record, to make sure they happen even if the crank is aborted and all other state changes are unwound. * The vatManager `managerOptions` still use `metered: boolean`, because the vat manager doesn't know about Meters: it only need to know whether to apply the per-crank limits or not. closes #3308

Now that dynamic vats require a Meter object in the `meter:` option, rather than just a `metered:` boolean flag, the swingset-runniner `--meter` option doesn't work anymore. I'm removing it to allow the test to keep passing. Eventually it might be interesting to improve the CLI arguments to create and provide a Meter, but the need for user-level code to supervise and refill it makes that non-trivial.

Zoe and spawner use `createVat()` to do their jobs. Previously, they used `{ metered: true }`, which provided a per-crank limit but not cumulative limit. This changes both to use `createUnlimitedMeter()`, then pass that as a `{ meter }` option, to achieve the same effect. When Zoe is ready to maintain (and refill) a Meter, change that to use `createMeter()` instead of `createUnlimitedMeter()`. This also adds `createMeter` and `createUnlimitedMeter` methods to the fake vatAdmin objects used by the zoe/spawner/pegasus unit tests, so the new Zoe/spawner code will work against the mocks. refs #3308

This improves the error messages when something goes wrong.

warner · 2021-07-25T00:15:48Z

Ok, adding the "unlimited" meter should fix that. Ready for review. I have one test to add, but no further code changes to make.

When a crank causes the compute Meter to both hit zero *and* cross the notification threshold at the same time, we need extra code to make sure the run-queue push of the notification message does not get deleted by the `abortCrank()` that unwinds the vat's side-effects. The mechanism I wrote for this worked, but was overkill. After writing a test for it, I noticed the test still passed even if I commented out the mechanism that I thought was necessary. I simplified that code (we only need to repeat the `deductMeter`, because that will take care of repeating the notification), and added the test. refs #3308

FUDCo

Whew!

packages/SwingSet/docs/metering.md

FUDCo · 2021-07-25T07:10:57Z

packages/SwingSet/docs/metering.md

+
+## Meter Objects
+
+The kernel manages `Meter` objects. Each one has a `remaining` capacity and a notification `threshold`. The Meter has a `Notifier` which can inform interested parties when the capacity drops below the threshold, so they can refill it before any associated vats are in danger of being terminated due to an underflow.


Seems like we'd want to avoid (or at least, provide the means to avoid if desired) the race between the notification going out to be acted upon and ongoing further computation in the vat underflowing the meter. Instead of just a threshold for notification, it would make sense to me to have a threshold that when reached causes computation to stop in that vat until the meter is replenished, since this is a recoverable condition whereas meter underflow is not (though we'd need an answer as to who pays the freight for any message traffic that piles up in the meantime).

Yeah, that's a great idea, although I think we'll need more sophistication in the scheduler first. We need a state for message to be in that says "I want to run this, everything that came before it has been delivered, but for various reasons this one is blocked". I don't know what ordering properties we should impose on the messages that follow: maybe it's just everything for the "paused in arrears" vat that gets stalled, but I think "E order" has something to say about that.

The #3517 "Flow Escalator" approach might provide a mechanism for this, when a flow's first message is blocked, the whole flow gets moved onto a special pseudo-queue that doesn't get serviced until the target vat is resumed.

I like @FUDCo's idea, which seems related to @zarutian's ideas:

Perhaps chargeAccounts could be „over-charged“, that is go into negative. When that happens ZCFVats of the contract instances drawing from that chargeAccount wont run until the chargeAccount is in the positive again.

He goes on to explain a mechanism for how users can add to the chargeAccount for a contract instance.

Termination seems too extreme, if stopping delivering messages is plausible. Making this a future goal sounds like the right thing to me.

tracked in #3528

FUDCo · 2021-07-25T07:12:36Z

packages/SwingSet/docs/metering.md

+
+## runPolicy
+
+TODO: The host application can limit the number of cranks processed in a single call to `controller.run()` by providing a `runPolicy` object. This policy object is informed about each crank and the number of computrons it consumed. By comparing the cumulative computrons against an experimentally (and externally) determined threshold, the `runLimit` object can tell the kernel to stop processing before the run-queue is drained. For a busy kernel, with an ever-increasing amount of work to do, this can limit the size of a commitment domain (e.g. the "block" in a blockchain / consensus machine).


Ah, this is a partial answer to my above comment.

* add examples of computron usage numbers * fix typo

katelynsills

LGTM! One question about the relationship between enforcing a per-crank limit and subtracting from a meter. It seems like we might want them separately, so perhaps the settings should be separate.

katelynsills · 2021-07-25T16:25:42Z

packages/spawner/test/swingsetTests/contractHost/test-contractHost.js

-import test from 'ava';
+// eslint-disable-next-line import/order
+import { test } from '@agoric/swingset-vat/tools/prepare-test-env-ava.js';
+// eslint-disable-next-line import/order


Huh, I haven't seen this eslint setting before. Tiny thing, but can you tell me more about why we need to disable it?

yarn lint-fix keeps rearranging my import statements, I think it wants all "external" imports to appear in alphabetical order, followed by all "internal" imports in order, and this idiom seems to work to keep the prepare-test-env-ava as the very first one (since it has side-effects and won't work if it appears too late). I wouldn't be surprised if there is a way to phrase it with just a single override, but I haven't found it yet, and I got tired of fighting with eslint so I stopped once I got something that worked :).

I think you may be running a different (newer) version of ESLint than I am. My version (which I believe is the agoric-sdk version, but I should check) doesn't enforce that. Here's the output after removing the eslint-disable:

katelynsills · 2021-07-25T16:49:15Z

packages/SwingSet/docs/metering.md

+
+## Meter Objects
+
+The kernel manages `Meter` objects. Each one has a `remaining` capacity and a notification `threshold`. The Meter has a `Notifier` which can inform interested parties when the capacity drops below the threshold, so they can refill it before any associated vats are in danger of being terminated due to an underflow.


I like @FUDCo's idea, which seems related to @zarutian's ideas:

Perhaps chargeAccounts could be „over-charged“, that is go into negative. When that happens ZCFVats of the contract instances drawing from that chargeAccount wont run until the chargeAccount is in the positive again.

He goes on to explain a mechanism for how users can add to the chargeAccount for a contract instance.

Termination seems too extreme, if stopping delivering messages is plausible. Making this a future goal sounds like the right thing to me.

katelynsills · 2021-07-25T16:53:51Z

packages/SwingSet/docs/metering.md

+
+The default (omitting a `meter` option) leaves the vat unmetered.
+
+Assigning a Meter to a vat activates the per-crank limit. To achieve a per-crank limit without a Meter object (which must be refilled occasionally to keep the vat from being terminated), use an unlimited meter:


Hmm, that seems like a setting that we might want to make explicit, especially since per-crank limits don't really have much to do with reducing a meter.

I'll add this as a future goal to consider. My half-considered opinion is that having a per-crank limit but not a Meter isn't really much of a limit: it protects against infinite loops but not infinite send-to-self cycles (with the assistance of a second vat, e.g. purse.getCurrentAmount()). So a vat in that state could consume effectively all of the CPU time.

The only use case I can think of for that mode would be the REPL, which is in a sort of intermediate state between "protect against accidents" but "trust that the code isn't malicious" / "if you break it, you get to keep both pieces". I added UnlimitedMeter to serve as a bridge from the old behavior to the new (charge account) one, and maybe the REPL will stay on it, but I figured real contract vats should not be allowed to use it.

I'll add a ticket with these thoughts where we can think through the use cases. It wouldn't be hard to implement, but I didn't want to leave an attractive nuisance lying around in the API :).

added as #3527

warner force-pushed the 3308-meter-cranks branch from c345047 to 6a24d11 Compare July 22, 2021 17:22

katelynsills mentioned this pull request Jul 24, 2021

EPIC: Zoe Design Doc - Paying for Metering #3294

Open

warner force-pushed the 3308-meter-cranks branch from 6a24d11 to a301ac1 Compare July 24, 2021 21:04

warner changed the base branch from master to 3518-remove-local-metering July 24, 2021 21:04

warner added this to the Testnet: Metering Phase milestone Jul 24, 2021

warner added metering charging for execution (was: package: tame-metering and transform-metering) SwingSet package: SwingSet labels Jul 24, 2021

warner self-assigned this Jul 24, 2021

warner force-pushed the 3308-meter-cranks branch from a301ac1 to f5ade3a Compare July 24, 2021 21:29

warner requested review from FUDCo and katelynsills July 24, 2021 21:29

warner removed request for katelynsills and FUDCo July 24, 2021 22:02

warner force-pushed the 3308-meter-cranks branch 2 times, most recently from 39865e3 to 2491051 Compare July 24, 2021 23:50

warner added 5 commits July 24, 2021 17:14

feat(swingset): add Meters to kernel state

03f148b

This introduces the "meterID" and the "meter record": a pair of Nats (`remaining` and `threshold`). kernelKeeper functions are added to create and manipulate them, and test-state.js is enhanced to exercise these. refs #3308

chore(spawner): update one test to use prepare-test-env-ava.js

a886f41

This improves the error messages when something goes wrong.

warner force-pushed the 3308-meter-cranks branch from 2491051 to a886f41 Compare July 25, 2021 00:14

warner requested review from FUDCo and katelynsills July 25, 2021 00:16

warner marked this pull request as ready for review July 25, 2021 00:16

Base automatically changed from 3518-remove-local-metering to master July 25, 2021 07:29

FUDCo approved these changes Jul 25, 2021

View reviewed changes

warner added 2 commits July 25, 2021 01:14

chore: improve metering docs

dc63cc2

* add examples of computron usage numbers * fix typo

chore: teardown vats properly in test-dynamic-vat-metered.js

d3471b8

katelynsills approved these changes Jul 25, 2021

View reviewed changes

warner merged commit 4ea700d into master Jul 25, 2021

warner deleted the 3308-meter-cranks branch July 25, 2021 18:34

This was referenced Jul 25, 2021

consider separating Meter from per-crank limit #3527

Open

pause vats on meter underflow, rather than terminate #3528

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add Meters, assign to dynamic vats to track compute usage #3508

add Meters, assign to dynamic vats to track compute usage #3508

warner commented Jul 22, 2021

warner commented Jul 22, 2021 •

edited

Loading

warner commented Jul 24, 2021

warner commented Jul 24, 2021

warner commented Jul 24, 2021

warner commented Jul 25, 2021

FUDCo left a comment

FUDCo Jul 25, 2021

warner Jul 25, 2021

katelynsills Jul 25, 2021

warner Jul 25, 2021

FUDCo Jul 25, 2021

katelynsills left a comment

katelynsills Jul 25, 2021

warner Jul 25, 2021

katelynsills Jul 25, 2021

katelynsills Jul 25, 2021

katelynsills Jul 25, 2021

warner Jul 25, 2021

warner Jul 25, 2021


		## Meter Objects

		The kernel manages `Meter` objects. Each one has a `remaining` capacity and a notification `threshold`. The Meter has a `Notifier` which can inform interested parties when the capacity drops below the threshold, so they can refill it before any associated vats are in danger of being terminated due to an underflow.


		## runPolicy

		TODO: The host application can limit the number of cranks processed in a single call to `controller.run()` by providing a `runPolicy` object. This policy object is informed about each crank and the number of computrons it consumed. By comparing the cumulative computrons against an experimentally (and externally) determined threshold, the `runLimit` object can tell the kernel to stop processing before the run-queue is drained. For a busy kernel, with an ever-increasing amount of work to do, this can limit the size of a commitment domain (e.g. the "block" in a blockchain / consensus machine).


		The default (omitting a `meter` option) leaves the vat unmetered.

		Assigning a Meter to a vat activates the per-crank limit. To achieve a per-crank limit without a Meter object (which must be refilled occasionally to keep the vat from being terminated), use an unlimited meter:

add Meters, assign to dynamic vats to track compute usage #3508

add Meters, assign to dynamic vats to track compute usage #3508

Conversation

warner commented Jul 22, 2021

warner commented Jul 22, 2021 • edited Loading

warner commented Jul 24, 2021

warner commented Jul 24, 2021

warner commented Jul 24, 2021

warner commented Jul 25, 2021

FUDCo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

katelynsills left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

warner commented Jul 22, 2021 •

edited

Loading