cosmic-swingset runPolicy does not pay attention to startVat computron usage #6639

Open
warner opened this issue Dec 6, 2022 · 1 comment
Labels: bug (Something isn't working), cosmic-swingset (package: cosmic-swingset), enhancement (New feature or request), vaults_triage (DO NOT USE)


warner commented Dec 6, 2022

Describe the bug

While investigating #6625, we noticed that the blocks which performed a create-vat operation used far more computrons than the expected 65Mc per-block limit. Blocks 7709165, 7709213, and 7709243 each created two vats. Their computron counts (measured by adding up the metering data from all of their deliver-result slog entries) and the time a follower node spent performing swingset work were as follows:

block      computrons    swingset time (s)
7709165    239910012     20.942
7709213    250284328     44.655
7709243    278147375     21.960
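The per-block totals above come from summing the metering data in deliver-result slog entries. A minimal Python sketch of that summation follows. It assumes a JSON-lines slog in which block boundaries are marked by `cosmic-swingset-begin-block` entries carrying a `blockHeight` field, and in which each `deliver-result` entry stores its meter usage as `dr[2]["compute"]`. Those field names are assumptions, so check them against a real slog before trusting the output.

```python
import json
from collections import defaultdict


def computrons_per_block(slog_lines):
    """Sum deliver-result computrons per block height.

    ASSUMED (hypothetical) slog shape, one JSON object per line:
      {"type": "cosmic-swingset-begin-block", "blockHeight": 7709165}
      {"type": "deliver-result", "dr": ["ok", null, {"compute": 123}]}
    """
    totals = defaultdict(int)
    height = None  # no block seen yet
    for line in slog_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        kind = entry.get("type")
        if kind == "cosmic-swingset-begin-block":
            height = entry["blockHeight"]
        elif kind == "deliver-result" and height is not None:
            dr = entry.get("dr") or []
            # Meter usage is assumed to be the third element of "dr";
            # deliveries without usage data contribute zero.
            usage = dr[2] if len(dr) > 2 and isinstance(dr[2], dict) else {}
            totals[height] += usage.get("compute", 0)
    return dict(totals)
```

Because the function accepts any iterable of lines, an open slog file can be passed to it directly.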

Vats are created by a crank named create-vat. It must do two things: create a new worker (which includes spawning a new xsnap process), and send a startVat delivery to that worker. The startVat delivery triggers evaluation of the vat bundle (ZCF, in this case) and a lot of virtual-kind/collection/object initialization, so it's relatively intensive. I measured the startVat delivery alone as consuming 87.6Mc.

If these computrons were properly counted against the runPolicy limit (currently configured at 65Mc), then the policy would have ended block 7709165 immediately after the first create-vat. The second create-vat would consume the following block, and the rest of the original work would be pushed out to a third block.

Instead, we did both create-vats plus an additional 65Mc of non-vat-creation work, all in block 7709165.
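To see why proper accounting would have split block 7709165 into three blocks, here is a toy Python simulation. It assumes the block ends as soon as the running charge reaches the 65Mc limit (checked after each crank), models the roughly 65Mc of other work as thirteen hypothetical 5Mc deliveries, and uses the approximate 300kc fixed per-vat charge recalled in the issue for the old behavior. It is an illustration of the arithmetic, not the real cosmic-swingset scheduler.

```python
LIMIT = 65_000_000          # runPolicy limit from the issue: 65Mc per block
START_VAT = 87_600_000      # measured startVat cost with the ZCF bundle
FIXED_VAT_COST = 300_000    # old fixed per-vat charge (approximate, per the issue)


def split_into_blocks(cranks, limit, count_start_vat):
    """Group (kind, computrons) cranks into blocks.

    A block ends once its running charge reaches `limit`; this threshold
    rule is an assumption for illustration.
    """
    blocks, current, used = [], [], 0
    for kind, computrons in cranks:
        if kind == "create-vat" and not count_start_vat:
            charge = FIXED_VAT_COST   # old behavior: ignore startVat usage
        else:
            charge = computrons       # proposed behavior: charge actual usage
        current.append(kind)
        used += charge
        if used >= limit:
            blocks.append(current)
            current, used = [], 0
    if current:
        blocks.append(current)
    return blocks


# Two vat creations followed by ~65Mc of other (hypothetical) deliveries.
cranks = [("create-vat", START_VAT)] * 2 + [("deliver", 5_000_000)] * 13
```

With `count_start_vat=True` each create-vat ends its own block and the other work fills a third block; with the fixed charge, all fifteen cranks land in a single block, matching what happened on mainnet.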

Now, it turns out that create-vat doesn't actually take much CPU (wallclock) time, despite the high computron count. On the same follower node, I measured it as taking 1.8s. (We've always known that the seconds-per-computron ratio depends on what sort of code is running, and we've hoped that the work patterns are sufficiently random for our fixed cost rate to work well enough.)

Most of the block time is spent on the non-create-vat work. The second block (which took 44.655s) was mostly filled with relatively expensive serialization work (a higher seconds-per-computron ratio than average), so it was going to take a long time with or without the create-vat cranks.
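The gap in seconds-per-computron can be estimated from the figures above for block 7709165. This rough sketch assumes both create-vats cost the measured 87.6Mc and the measured 1.8s each, and attributes everything else in the block to non-create-vat work; those attributions are assumptions, not separate measurements.

```python
# Figures from the issue for block 7709165.
block_computrons = 239_910_012
block_seconds = 20.942
start_vat_computrons = 87_600_000  # measured startVat cost (assumed same for both vats)
create_vat_seconds = 1.8           # measured create-vat time (assumed same for both vats)
vats = 2

# Whatever the two create-vats did not account for is treated as "other" work.
other_computrons = block_computrons - vats * start_vat_computrons
other_seconds = block_seconds - vats * create_vat_seconds

# Nanoseconds of wallclock per computron for each kind of work.
ns_per_c_create = create_vat_seconds / start_vat_computrons * 1e9
ns_per_c_other = other_seconds / other_computrons * 1e9

print(f"other work: {other_computrons:,} computrons in {other_seconds:.3f}s")
print(f"create-vat: {ns_per_c_create:.1f} ns/computron")
print(f"other work: {ns_per_c_other:.1f} ns/computron")
```

Under these assumptions the leftover is about 64.7Mc, consistent with the "additional 65Mc" of non-vat-creation work, and that work costs roughly 270 ns per computron versus roughly 20 ns for create-vat, about a 13x difference.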

Nominal Fix

A year ago, we didn't have a distinct startVat delivery, so we didn't get any sort of computron count for vat creation. We still reported vat creation to the runPolicy, but it was configured to increment the "computrons spent in this block" counter by a fixed value (I think 300kc per vat).

We eventually split the creation process into an unmetered step (create the xsnap worker and load the lockdown and supervisor bundles, which also loads liveslots), followed by a metered startVat delivery (which evaluates the vat bundle, i.e. ZCF). At that point, I believe we started reporting the startVat computron count as an additional argument to the runPolicy's vat-created method.

So the best fix would be to enhance the cosmic-swingset runPolicy to pay attention to that argument, and increment the counter just like it would for a normal delivery crank.
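The proposed fix amounts to charging vat creation the same way as any other delivery. The sketch below is a hypothetical Python model, not the cosmic-swingset code: its methods loosely mirror the runPolicy hooks (vatCreated, crankComplete, crankFailed, emptyCrank), each returning True to keep running and False to end the block. The `computrons` key on `details` is an assumption standing in for the extra vat-created argument described above, and the model counts computrons directly even though the real parameters are expressed in beans.

```python
class ComputronPolicy:
    """Hypothetical model of a computron-limited run policy.

    Each hook returns True to keep running cranks in this block,
    or False to end the block.
    """

    def __init__(self, limit, fixed_vat_cost):
        self.limit = limit
        self.fixed_vat_cost = fixed_vat_cost
        self.used = 0

    def _charge(self, computrons):
        self.used += computrons
        return self.used < self.limit

    def vat_created(self, details):
        # The fix: use the startVat computrons reported with vat creation
        # (assumed key "computrons"), falling back to the old fixed charge.
        computrons = details.get("computrons")
        if computrons is None:
            computrons = self.fixed_vat_cost
        return self._charge(computrons)

    def crank_complete(self, details):
        return self._charge(details.get("computrons", 0))

    def crank_failed(self, details):
        return self._charge(details.get("computrons", 0))

    def empty_crank(self):
        return True  # empty cranks cost nothing in this model
```

Keeping the fixed charge as a fallback means older callers that report no computrons still get the previous behavior.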

Alternative Fixes

To avoid a code change on mainnet, we could accomplish nearly the same thing with a governance action that changes the per-vat creation cost from 300kc (actually 30M "beans" per vat) to the actual measured cost of a startVat with the ZCF bundle: 87.6Mc (= 8.76G beans). That figure would become inaccurate if/when we start using a different ZCF bundle, but not by enough to worry about.
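The bean figures above follow from a single conversion rate. This short check derives the beans-per-computron rate implied by the old 30M-beans-per-300kc setting and applies it to the measured startVat cost; the block-limit conversion at the end is derived the same way rather than quoted from a configuration.

```python
# Old fixed vat-creation charge, as described in the issue.
old_vat_cost_beans = 30_000_000     # "30M beans per vat"
old_vat_cost_computrons = 300_000   # "300kc per vat"
beans_per_computron = old_vat_cost_beans // old_vat_cost_computrons

# Proposed charge: the measured startVat cost with the ZCF bundle.
start_vat_computrons = 87_600_000
new_vat_cost_beans = start_vat_computrons * beans_per_computron

# The 65Mc block limit expressed at the same (derived) rate.
block_limit_beans = 65_000_000 * beans_per_computron

print(beans_per_computron, new_vat_cost_beans, block_limit_beans)
```

At 100 beans per computron, 87.6Mc becomes 8.76G beans, which on its own exceeds the 6.5G-bean equivalent of the 65Mc block limit.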

Both would require a vote, but the parameter change would not require new software.

Priority

As above, the actual runtime cost of a startVat appears to be fairly low, so we wouldn't shrink our blocks significantly by properly accounting for the startVat computrons. So I'm inclined to not worry about getting this fixed on mainnet.


warner commented Dec 15, 2022

Related to #3524, which was partially fixed by the creation of a distinct startVat delivery.

ivanlei added the enhancement (New feature or request) and vaults_triage (DO NOT USE) labels and removed the vaults-release label on Jan 4, 2023.