cosmic-swingset runPolicy does not pay attention to startVat computron usage #6639
Labels: bug, cosmic-swingset, enhancement, vaults_triage
Describe the bug
While investigating #6625, we noticed that the blocks which performed a `create-vat` operation used far more computrons than the expected 65Mc limit. Blocks 7709165, 7709213, and 7709243 all created two vats each. Their computron counts (as measured by adding up the metering data from all of their `deliver-result` slog entries), and the amount of time a follower node spent performing swingset work, were as follows:

Vats are created by a crank named `create-vat`. This must do two things: create a new worker (which includes spawning a new `xsnap` process), and send a `startVat` delivery to that worker. The `startVat` delivery triggers the evaluation of the vat bundle (ZCF, in this case) and a lot of virtual-kind/collection/object initialization, so it's relatively intense. I measured the actual `startVat` delivery as consuming 87.6Mc.

If these computrons were properly counted against the `runPolicy` limit (currently configured at 65Mc), then the policy would have ended block 7709165 immediately after the first `create-vat`. The second `create-vat` would consume the following block, and the rest of the original work would be pushed out to a third block.

Instead, we did both `create-vat`s plus an additional 65Mc of non-vat-creation work, all in block 7709165.

Now, it turns out that `create-vat` doesn't actually take all that much CPU (wallclock) time, despite the high computron count. On the same follower node, I measured it as taking 1.8s. (We've always known that the seconds-per-computron ratio depends on what sort of code is running, and we've hoped that the work patterns are sufficiently random to let our fixed cost rate work well enough.)

Most of the block time is spent on the non-`create-vat` things. The second block (which took 44.655s) was mostly filled with relatively-expensive serialization work (higher seconds-per-computron than average), so it was going to take a long time with or without the `create-vat` cranks.

Nominal Fix
A year ago, we didn't have a distinct `startVat` delivery, so we didn't get any sort of computron count for vat creation. We still reported vat creation to the `runPolicy`, but it was configured to increment the "computrons spent in this block" counter by a fixed value (I think 300kc per vat).

We eventually split up the creation process into an unmetered step (create the `xsnap` worker, load the `lockdown` and `supervisor` bundles, which also loaded `liveslots`), followed by a metered `startVat` delivery (which evaluates the vat bundle, i.e. ZCF). At that point, I believe we started reporting the `startVat` computron count as an additional argument to the `runPolicy`'s vat-created method.

So the best fix would be to enhance the cosmic-swingset `runPolicy` to pay attention to that argument, and increment the counter just like it would for a normal delivery crank.

Alternative Fixes
To avoid a code change on mainnet, we could accomplish nearly the same thing with a governance action that changes the 300kc-per-vat (actually 30M "beans" per vat) creation cost to reflect the actual measured cost of doing a `startVat` with the ZCF bundle: 87.6Mc (= 8.76G beans). That would become inaccurate if/when we start using a different ZCF bundle, but not enough to worry about.

Both would require a vote, but the parameter change would not require new software.
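
The difference between the current behavior, the nominal fix, and the governance-parameter alternative can be sketched with a toy computron counter. This is a minimal sketch under stated assumptions, not the cosmic-swingset implementation: `makeComputronCounter`, `useMeasuredStartVat`, and the shape of the `details` argument are hypothetical names chosen for illustration, and the rate of 100 beans per computron is inferred from the "300kc = 30M beans" figures above.

```javascript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not the real cosmic-swingset runPolicy API.
const BEANS_PER_COMPUTRON = 100n; // inferred from 300kc == 30M beans
const BLOCK_LIMIT_BEANS = 65_000_000n * BEANS_PER_COMPUTRON; // 65Mc limit
const FIXED_VAT_CREATION_BEANS = 30_000_000n; // today's 300kc-per-vat charge
const MEASURED_START_VAT_BEANS = 8_760_000_000n; // 87.6Mc, as measured above

function makeComputronCounter({ vatCreationBeans, useMeasuredStartVat }) {
  let totalBeans = 0n;
  // The policy returns true while the block may keep running cranks.
  const shouldRun = () => totalBeans < BLOCK_LIMIT_BEANS;
  return {
    vatCreated(details = {}) {
      if (useMeasuredStartVat && details.computrons !== undefined) {
        // Nominal fix: charge the metered startVat like any other delivery.
        totalBeans += BigInt(details.computrons) * BEANS_PER_COMPUTRON;
      } else {
        // Current behavior: charge a fixed, governance-controlled amount.
        totalBeans += vatCreationBeans;
      }
      return shouldRun();
    },
    crankComplete(details = {}) {
      if (details.computrons !== undefined) {
        totalBeans += BigInt(details.computrons) * BEANS_PER_COMPUTRON;
      }
      return shouldRun();
    },
    totalBeans: () => totalBeans,
  };
}

const startVat = { computrons: 87_600_000 }; // measured startVat cost

// Current behavior: fixed 30M beans, measured count ignored.
const current = makeComputronCounter({
  vatCreationBeans: FIXED_VAT_CREATION_BEANS,
  useMeasuredStartVat: false,
});
// Nominal fix: use the measured count passed to vatCreated().
const nominal = makeComputronCounter({
  vatCreationBeans: FIXED_VAT_CREATION_BEANS,
  useMeasuredStartVat: true,
});
// Alternative fix: keep the code, raise the fixed charge by governance.
const governance = makeComputronCounter({
  vatCreationBeans: MEASURED_START_VAT_BEANS,
  useMeasuredStartVat: false,
});

const currentContinues = current.vatCreated(startVat);
const nominalContinues = nominal.vatCreated(startVat);
const governanceContinues = governance.vatCreated(startVat);
console.log({ currentContinues, nominalContinues, governanceContinues });
```

In this model only the current configuration lets the block continue after a single vat creation. The nominal fix and the raised parameter both end the block, and they diverge only if the real `startVat` cost drifts away from 87.6Mc.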
Priority
As above, the actual runtime cost of a `startVat` appears to be fairly low, so we wouldn't shrink our blocks significantly by properly accounting for the `startVat` computrons. So I'm inclined to not worry about getting this fixed on mainnet.
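
As a rough illustration of the block-splitting effect described in the bug report, the toy model below packs a sequence of cranks into blocks under the 65Mc limit, once with the fixed 300kc vat-creation charge and once with the measured 87.6Mc. The `packBlocks` helper and the 13 ordinary deliveries of 5Mc each are made-up assumptions for the example. Only the 65Mc, 300kc, and 87.6Mc figures come from the text, and the model says nothing about wallclock time.

```javascript
// Toy model only: packBlocks and the 5Mc deliveries are hypothetical.
const LIMIT = 65_000_000; // runPolicy limit, in computrons
const START_VAT = 87_600_000; // measured startVat cost
const FIXED_CHARGE = 300_000; // fixed per-vat charge currently applied

// Run cranks in order; end the block once the counted total reaches LIMIT.
function packBlocks(cranks, countOf) {
  const blocks = [];
  let current = [];
  let counted = 0;
  for (const crank of cranks) {
    current.push(crank.kind);
    counted += countOf(crank);
    if (counted >= LIMIT) {
      blocks.push(current);
      current = [];
      counted = 0;
    }
  }
  if (current.length > 0) {
    blocks.push(current);
  }
  return blocks;
}

// Two vat creations followed by 65Mc of ordinary work (13 x 5Mc, assumed).
const cranks = [
  { kind: 'create-vat', computrons: START_VAT },
  { kind: 'create-vat', computrons: START_VAT },
  ...Array.from({ length: 13 }, () => ({ kind: 'deliver', computrons: 5_000_000 })),
];

// Fixed charge for create-vat, measured computrons for everything else.
const asObserved = packBlocks(cranks, (c) =>
  c.kind === 'create-vat' ? FIXED_CHARGE : c.computrons,
);
// Measured computrons for every crank, including create-vat.
const asIntended = packBlocks(cranks, (c) => c.computrons);

console.log('fixed charge:', asObserved.map((b) => b.length));
console.log('measured charge:', asIntended.map((b) => b.length));
```

With the fixed charge, all 15 cranks land in a single block. With the measured charge, each `create-vat` fills a block by itself and the remaining deliveries move to a third block.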