-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Meters, assign to dynamic vats to track compute usage #3508
Conversation
c345047
to
6a24d11
Compare
Still to do:
|
6a24d11
to
a301ac1
Compare
a301ac1
to
f5ade3a
Compare
I'm still working on docs, but I'm not going to be changing any of the actual code, so I think this is ready for reviewers to look at. |
Er, ok I gotta fix those tests too. |
Oh, right, I removed |
39865e3
to
2491051
Compare
This introduces the "meterID" and the "meter record": a pair of Nats (`remaining` and `threshold`). kernelKeeper functions are added to create and manipulate them, and test-state.js is enhanced to exercise these. refs #3308
This introduces user-visible Meter objects, and allows new dynamic vats to be bound to a single Meter, both exposed through the vatAdmin facet. Meters are long-term reservoirs of execution credits, denominated in "computrons". Each bound vat will deduct credits from its meter until exhausted, at which point the vat will be terminated. This limits the long-term CPU consumption of a vat, in addition to the fixed per-crank computron limit applied to any metered vat. Meters can be refilled and queried through their API. Each meter also has a Notifier, and a configurable notification threshold: the notifier will be updated if/when the remaining credits drop below the threshold. This should allow a supervisor in userspace enough time to refill the meter before the associated vat(s) are terminated. See `docs/metering.md` for documentation. Some notes: * The vatAdmin facet now offers `createMeter()`, which returns a `Meter` object with methods to manipulate its `remaining` and `threshold` values, as well as a `getNotifier` to subscribe to threshold-passing events. * It also offers `createUnlimitedMeter()`, which never deducts. * The vatAdmin `createVat()` call now takes a `meter: Meter` option instead of `metered: boolean`. If a Meter is provided, two things happen: * Each delivery to that (XS) vat is subject to a per-crank compute limit. * Each delivery deducts the compute usage from the Meter. * When a Meter's `remaining` drops below its `threshold`, the notifier is triggered with the current `remaining` value. The actual Meter's value might have changed by the time the subscriber hears about the update. * When a vat's Meter reaches zero, the vat is terminated, just as if it had violated the per-crank limit. * Currently the termination message (used to reject the control facet's `.done()` Promise) is different for the per-crank limit vs the Meter limit, but this may change. * Meter deductions and threshold notifications are stashed in a new 'postAbortActions' record, to make sure they happen even if the crank is aborted and all other state changes are unwound. * The vatManager `managerOptions` still use `metered: boolean`, because the vat manager doesn't know about Meters: it only need to know whether to apply the per-crank limits or not. closes #3308
Now that dynamic vats require a Meter object in the `meter:` option, rather than just a `metered:` boolean flag, the swingset-runniner `--meter` option doesn't work anymore. I'm removing it to allow the test to keep passing. Eventually it might be interesting to improve the CLI arguments to create and provide a Meter, but the need for user-level code to supervise and refill it makes that non-trivial.
Zoe and spawner use `createVat()` to do their jobs. Previously, they used `{ metered: true }`, which provided a per-crank limit but not cumulative limit. This changes both to use `createUnlimitedMeter()`, then pass that as a `{ meter }` option, to achieve the same effect. When Zoe is ready to maintain (and refill) a Meter, change that to use `createMeter()` instead of `createUnlimitedMeter()`. This also adds `createMeter` and `createUnlimitedMeter` methods to the fake vatAdmin objects used by the zoe/spawner/pegasus unit tests, so the new Zoe/spawner code will work against the mocks. refs #3308
This improves the error messages when something goes wrong.
2491051
to
a886f41
Compare
Ok, adding the "unlimited" meter should fix that. Ready for review. I have one test to add, but no further code changes to make. |
When a crank causes the compute Meter to both hit zero *and* cross the notification threshold at the same time, we need extra code to make sure the run-queue push of the notification message does not get deleted by the `abortCrank()` that unwinds the vat's side-effects. The mechanism I wrote for this worked, but was overkill. After writing a test for it, I noticed the test still passed even if I commented out the mechanism that I thought was necessary. I simplified that code (we only need to repeat the `deductMeter`, because that will take care of repeating the notification), and added the test. refs #3308
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whew!
|
||
## Meter Objects | ||
|
||
The kernel manages `Meter` objects. Each one has a `remaining` capacity and a notification `threshold`. The Meter has a `Notifier` which can inform interested parties when the capacity drops below the threshold, so they can refill it before any associated vats are in danger of being terminated due to an underflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we'd want to avoid (or at least, provide the means to avoid if desired) the race between the notification going out to be acted upon and ongoing further computation in the vat underflowing the meter. Instead of just a threshold for notification, it would make sense to me to have a threshold that when reached causes computation to stop in that vat until the meter is replenished, since this is a recoverable condition whereas meter underflow is not (though we'd need an answer as to who pays the freight for any message traffic that piles up in the meantime).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a great idea, although I think we'll need more sophistication in the scheduler first. We need a state for message to be in that says "I want to run this, everything that came before it has been delivered, but for various reasons this one is blocked". I don't know what ordering properties we should impose on the messages that follow: maybe it's just everything for the "paused in arrears" vat that gets stalled, but I think "E order" has something to say about that.
The #3517 "Flow Escalator" approach might provide a mechanism for this, when a flow's first message is blocked, the whole flow gets moved onto a special pseudo-queue that doesn't get serviced until the target vat is resumed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @FUDCo's idea, which seems related to @zarutian's ideas:
Perhaps chargeAccounts could be „over-charged“, that is go into negative. When that happens ZCFVats of the contract instances drawing from that chargeAccount wont run until the chargeAccount is in the positive again.
He goes on to explain a mechanism for how users can add to the chargeAccount for a contract instance.
Termination seems too extreme, if stopping delivering messages is plausible. Making this a future goal sounds like the right thing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracked in #3528
|
||
## runPolicy | ||
|
||
TODO: The host application can limit the number of cranks processed in a single call to `controller.run()` by providing a `runPolicy` object. This policy object is informed about each crank and the number of computrons it consumed. By comparing the cumulative computrons against an experimentally (and externally) determined threshold, the `runLimit` object can tell the kernel to stop processing before the run-queue is drained. For a busy kernel, with an ever-increasing amount of work to do, this can limit the size of a commitment domain (e.g. the "block" in a blockchain / consensus machine). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is a partial answer to my above comment.
* add examples of computron usage numbers * fix typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! One question about the relationship between enforcing a per-crank limit and subtracting from a meter. It seems like we might want them separately, so perhaps the settings should be separate.
import test from 'ava'; | ||
// eslint-disable-next-line import/order | ||
import { test } from '@agoric/swingset-vat/tools/prepare-test-env-ava.js'; | ||
// eslint-disable-next-line import/order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I haven't seen this eslint setting before. Tiny thing, but can you tell me more about why we need to disable it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yarn lint-fix
keeps rearranging my import
statements, I think it wants all "external" imports to appear in alphabetical order, followed by all "internal" imports in order, and this idiom seems to work to keep the prepare-test-env-ava
as the very first one (since it has side-effects and won't work if it appears too late). I wouldn't be surprised if there is a way to phrase it with just a single override, but I haven't found it yet, and I got tired of fighting with eslint so I stopped once I got something that worked :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
## Meter Objects | ||
|
||
The kernel manages `Meter` objects. Each one has a `remaining` capacity and a notification `threshold`. The Meter has a `Notifier` which can inform interested parties when the capacity drops below the threshold, so they can refill it before any associated vats are in danger of being terminated due to an underflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @FUDCo's idea, which seems related to @zarutian's ideas:
Perhaps chargeAccounts could be „over-charged“, that is go into negative. When that happens ZCFVats of the contract instances drawing from that chargeAccount wont run until the chargeAccount is in the positive again.
He goes on to explain a mechanism for how users can add to the chargeAccount for a contract instance.
Termination seems too extreme, if stopping delivering messages is plausible. Making this a future goal sounds like the right thing to me.
|
||
The default (omitting a `meter` option) leaves the vat unmetered. | ||
|
||
Assigning a Meter to a vat activates the per-crank limit. To achieve a per-crank limit without a Meter object (which must be refilled occasionally to keep the vat from being terminated), use an unlimited meter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that seems like a setting that we might want to make explicit, especially since per-crank limits don't really have much to do with reducing a meter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add this as a future goal to consider. My half-considered opinion is that having a per-crank limit but not a Meter isn't really much of a limit: it protects against infinite loops but not infinite send-to-self cycles (with the assistance of a second vat, e.g. purse.getCurrentAmount()
). So a vat in that state could consume effectively all of the CPU time.
The only use case I can think of for that mode would be the REPL, which is in a sort of intermediate state between "protect against accidents" but "trust that the code isn't malicious" / "if you break it, you get to keep both pieces". I added UnlimitedMeter to serve as a bridge from the old behavior to the new (charge account) one, and maybe the REPL will stay on it, but I figured real contract vats should not be allowed to use it.
I'll add a ticket with these thoughts where we can think through the use cases. It wouldn't be hard to implement, but I didn't want to leave an attractive nuisance lying around in the API :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added as #3527
refs #3308