cosmic-swingset runPolicy does not pay attention to startVat computron usage #6639

warner · 2022-12-06T18:20:54Z

Describe the bug

While investigating #6625, we noticed that the blocks which performed a create-vat operation used far more computrons than the expected 65Mc limit. Blocks 7709165, 7709213, and 7709243 each created two vats. Their computron counts (measured by summing the metering data from all of their deliver-result slog entries) and the time a follower node spent performing swingset work were as follows:

block	computrons	swingset time (s)
7709165	239910012	20.942
7709213	250284328	44.655
7709243	278147375	21.960
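For reference, here is a minimal TypeScript sketch of how per-block totals like the ones above can be tallied. It assumes each slog line is a JSON object and that a deliver-result entry stores its metering as `dr: [status, problem, usage]` with `usage.compute` in computrons. Treat that field layout, and the `parseSlog`/`sumComputrons` helpers, as hypothetical. Splitting the slog into per-block slices is left out.

```typescript
// Hypothetical, simplified slog entry: only the fields this tally needs.
// Assumption: deliver-result entries carry dr = [status, problem, usage],
// where usage.compute is the computron count for that delivery.
interface Usage {
  compute?: number;
}
interface SlogEntry {
  type: string;
  dr?: [string, unknown, Usage | null];
}

// Parse newline-delimited JSON (one slog entry per line), skipping blank lines.
function parseSlog(text: string): SlogEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SlogEntry);
}

// Add up the metered computrons of every deliver-result entry in a slice of
// the slog (e.g. all entries belonging to one block).
function sumComputrons(entries: SlogEntry[]): number {
  let total = 0;
  for (const entry of entries) {
    if (entry.type !== "deliver-result") continue;
    total += entry.dr?.[2]?.compute ?? 0;
  }
  return total;
}

// Made-up sample: two metered deliveries plus one non-result entry.
const sample = [
  '{"type":"deliver","vatID":"v1"}',
  '{"type":"deliver-result","vatID":"v1","dr":["ok",null,{"compute":120}]}',
  '{"type":"deliver-result","vatID":"v2","dr":["ok",null,{"compute":180}]}',
].join("\n");

console.log(sumComputrons(parseSlog(sample))); // 300
```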

Vats are created by a crank named create-vat, which must do two things: create a new worker (which includes spawning a new xsnap process) and send a startVat delivery to that worker. The startVat delivery triggers evaluation of the vat bundle (ZCF, in this case) and a lot of virtual-kind/collection/object initialization, so it is relatively intense. I measured the startVat delivery alone as consuming 87.6Mc.

If these computrons were properly counted against the runPolicy limit (currently configured at 65Mc), then the policy would have ended block 7709165 immediately after the first create-vat. The second create-vat would consume the following block, and the rest of the original work would be pushed out to a third block.

Instead, we did both create-vats plus an additional 65Mc of non-vat-creation work, all in block 7709165.
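A toy simulation makes the difference between the two accountings visible. Only the 65Mc limit, the ~300kc fixed per-vat charge, and the 87.6Mc startVat cost come from above. The 7Mc ordinary deliveries are invented, and the "stop once the running total reaches the limit" rule is an assumed approximation of how the block-level policy behaves.

```typescript
// Toy model of one block's run queue. Only the 65Mc limit, the ~300kc fixed
// vat-creation charge, and the 87.6Mc startVat cost come from the issue; the
// 7Mc ordinary deliveries are invented for illustration.
type Crank = { kind: "create-vat" | "delivery"; computrons: number };

const LIMIT = 65_000_000;
const FIXED_VAT_COST = 300_000;
const START_VAT = 87_600_000;

// Count how many cranks run before the policy ends the block. Assumed rule:
// after each crank, stop once the running total reaches the limit.
function cranksInBlock(queue: Crank[], meterStartVat: boolean): number {
  let used = 0;
  let ran = 0;
  for (const crank of queue) {
    ran += 1;
    const charged =
      crank.kind === "create-vat" && !meterStartVat ? FIXED_VAT_COST : crank.computrons;
    used += charged;
    if (used >= LIMIT) break;
  }
  return ran;
}

const queue: Crank[] = [
  { kind: "create-vat", computrons: START_VAT },
  { kind: "create-vat", computrons: START_VAT },
  ...Array.from({ length: 12 }, (): Crank => ({ kind: "delivery", computrons: 7_000_000 })),
];

console.log("metered startVat:", cranksInBlock(queue, true)); // 1
console.log("fixed vat cost:", cranksInBlock(queue, false)); // 12
```

With metered accounting, the block ends after the first create-vat. With the fixed charge, both create-vats slip through, and the block only stops once about 65Mc of ordinary work has piled on top of them.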

Now, it turns out that create-vat doesn't actually take all that much CPU (wallclock) time, despite the high computron count. On the same follower node, I measured it as taking 1.8s. (We've always known that the seconds-per-computron ratio depends on what sort of code is running, and we've hoped that the work patterns are sufficiently random to let our fixed cost rate work well enough.)

Most of the block time is spent on non-create-vat work. The second block (which took 44.655s) was mostly filled with relatively expensive serialization work (higher seconds-per-computron than average), so it would have taken a long time with or without the create-vat cranks.
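To put numbers on the seconds-per-computron comparison, the table rows can be reduced to seconds per megacomputron. This is plain arithmetic on the figures above, and the helper names are only for illustration.

```typescript
// Figures copied from the table above.
const rows = [
  { block: 7709165, computrons: 239_910_012, seconds: 20.942 },
  { block: 7709213, computrons: 250_284_328, seconds: 44.655 },
  { block: 7709243, computrons: 278_147_375, seconds: 21.96 },
];

// Wallclock seconds spent per million computrons (Mc) in a block.
function secondsPerMc(row: { computrons: number; seconds: number }): number {
  return row.seconds / (row.computrons / 1_000_000);
}

for (const row of rows) {
  console.log(row.block, secondsPerMc(row).toFixed(3));
}

// The block whose computrons were most expensive in wallclock terms.
const slowest = rows.reduce((a, b) => (secondsPerMc(a) >= secondsPerMc(b) ? a : b));
console.log("slowest per computron:", slowest.block); // 7709213
```

The three ratios come out at about 0.087, 0.178, and 0.079 s/Mc. Block 7709213 is roughly twice as expensive per computron as the other two, which is consistent with the serialization-heavy explanation.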

Nominal Fix

A year ago, we didn't have a distinct startVat delivery, so we didn't get any sort of computron count for vat creation. We still reported vat creation to the runPolicy, but it was configured to increment the "computrons spent in this block" counter by a fixed value (I think 300kc per vat).

We eventually split the creation process into an unmetered step (create the xsnap worker and load the lockdown and supervisor bundles, where the supervisor bundle also loads liveslots), followed by a metered startVat delivery (which evaluates the vat bundle, i.e. ZCF). At that point, I believe we started reporting the startVat computron count as an additional argument to the runPolicy's vat-created method.

So the best fix would be to enhance the cosmic-swingset runPolicy to pay attention to that argument and increment the counter just as it would for a normal delivery crank.
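Here is a minimal sketch of what that could look like. It is written against a simplified version of the runPolicy interface (`vatCreated`, `crankComplete`, `crankFailed`, `emptyCrank`), where each method returns true to keep running cranks. The `computrons` field on the vatCreated details is the argument I believe we already pass, so treat it as an assumption. `makeMeteredRunPolicy` and its fallback parameter are hypothetical. The real kernel may report computrons as BigInts, whereas this sketch uses plain numbers.

```typescript
// Simplified stand-in for the SwingSet runPolicy interface: each hook returns
// true to keep running cranks in this block, false to end the block.
// Assumption: vatCreated receives the startVat computrons, if they are reported.
// Plain numbers are used here; the real kernel may pass BigInts.
interface CrankDetails {
  computrons?: number;
}
interface RunPolicy {
  vatCreated(details: CrankDetails): boolean;
  crankComplete(details: CrankDetails): boolean;
  crankFailed(details: CrankDetails): boolean;
  emptyCrank(): boolean;
}

// Hypothetical helper: counts computrons against a per-block limit.
function makeMeteredRunPolicy(
  limit: number,
  fallbackVatCost: number,
): RunPolicy & { used(): number } {
  let total = 0;
  // Add `amount` to the block's running total; keep going while under the limit.
  const charge = (amount: number): boolean => {
    total += amount;
    return total < limit;
  };
  return {
    // The fix: charge the reported startVat computrons like any delivery,
    // falling back to the old fixed per-vat cost when none are reported.
    vatCreated: (details) => charge(details.computrons ?? fallbackVatCost),
    crankComplete: (details) => charge(details.computrons ?? 0),
    crankFailed: (details) => charge(details.computrons ?? 0),
    emptyCrank: () => true,
    used: () => total,
  };
}

const policy = makeMeteredRunPolicy(65_000_000, 300_000);
console.log(policy.vatCreated({ computrons: 87_600_000 })); // false: end the block
```

Keeping the fixed per-vat cost as a fallback means the policy still charges something if a vat creation arrives without metering data.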

Alternative Fixes

To avoid a code change on mainnet, we could accomplish nearly the same thing with a governance action that changes the 300kc-per-vat (actually 30M "beans" per vat) creation cost to reflect the actual measured cost of doing a startVat with the ZCF bundle: 87.6Mc (= 8.76G beans). That would become inaccurate if/when we start using a different ZCF bundle, but not enough to worry about.
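The parameter-change arithmetic can be spelled out directly. The 100 beans-per-computron rate is derived from the 300kc ≈ 30M beans pairing above. Expressing the 65Mc block limit in beans at the same rate is an assumption about how the two parameters line up.

```typescript
// Rate implied by the issue: ~300kc of vat-creation cost was set as 30M beans.
const BEANS_PER_COMPUTRON = 30_000_000 / 300_000; // 100

function computronsToBeans(computrons: number): number {
  return computrons * BEANS_PER_COMPUTRON;
}

const currentVatCreationBeans = 30_000_000;
const proposedVatCreationBeans = computronsToBeans(87_600_000); // measured startVat cost
// Assumption: the 65Mc block limit converts at the same rate.
const blockLimitBeans = computronsToBeans(65_000_000);

console.log(proposedVatCreationBeans); // 8760000000
console.log(proposedVatCreationBeans / currentVatCreationBeans); // 292
console.log(proposedVatCreationBeans >= blockLimitBeans); // true: one vat creation fills a block
```

So the governance change would raise the per-vat charge about 292-fold. Under the assumption above, a single vat creation would exhaust a block's budget by itself, which matches the effect the nominal fix would have had on block 7709165.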

Both would require a vote, but the parameter change would not require new software.

Priority

As noted above, the actual runtime cost of a startVat appears to be fairly low, so properly accounting for the startVat computrons would not shrink our blocks significantly. So I'm inclined not to worry about getting this fixed on mainnet.


warner · 2022-12-15T18:22:01Z

Related to #3524, which was partially fixed by the creation of a distinct startVat delivery.

warner added bug Something isn't working cosmic-swingset package: cosmic-swingset labels Dec 6, 2022

JimLarson assigned mhofman Dec 7, 2022

JimLarson added the vaults-release label Dec 7, 2022

dckc mentioned this issue Dec 8, 2022

investigating mainnet slowdown around block 7709170 (05-dec-2022) #6625

Open

ivanlei added enhancement New feature or request vaults_triage DO NOT USE and removed vaults-release labels Jan 4, 2023
