Add PVF execution priority #4837

Merged: 59 commits into master from AndreiEres/pvf-execution-priority, Oct 9, 2024

Conversation

@AndreiEres (Contributor) commented Jun 19, 2024

Resolves #4632

The new logic optimizes the distribution of execution jobs across disputes, approvals, and backings. Testing shows reduced finality lag and candidate checking times, especially under heavy network load.

Approach

This update adds prioritization to the PVF execution queue. The logic partially implements the suggestions from #4632 (comment).

We use thresholds to determine how much a current priority can "steal" from lower ones:

  • Disputes: 70%
  • Approvals: 80%
  • Backing System Parachains: 100%
  • Backing: 100%

Each threshold is the portion of the capacity left over by higher priorities that the given priority may claim.

For example:

  • Disputes take 70%, leaving 30% for approvals and all backings.
  • 80% of the remaining goes to approvals, which is 30% * 80% = 24% of the original 100%.
  • If we used parts of the original 100%, approvals couldn't take more than 24%, even if there are no disputes.

Assuming a maximum of 12 executions per block, with a 6-second window, 2 CPU cores, and a 2-second run time, we get these distributions:

  • With disputes: 8 disputes, 3 approvals, 1 backing
  • Without disputes: 9 approvals, 3 backings

It's worth noting that when there are no disputes and there is only one backing job, we continue processing approvals regardless of whether they have reached their threshold.
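
For illustration, here is a minimal Rust sketch of the arithmetic above. The split_budget helper and its names are hypothetical, not the actual queue implementation; it simply gives each priority its threshold percentage of whatever capacity higher priorities left unclaimed, rounding down, and skips priorities with nothing queued.

```rust
/// Hypothetical helper reproducing the threshold arithmetic from the description.
/// Each priority that has queued work takes its threshold percentage of whatever
/// capacity higher priorities left unclaimed.
fn split_budget(total: usize, thresholds: &[(&str, usize)], queued: &[&str]) -> Vec<(String, usize)> {
    let mut remaining = total;
    let mut shares = Vec::new();
    for &(priority, percent) in thresholds {
        // A priority with no queued jobs claims nothing, so its share stays available
        // for lower priorities. This is why approvals can reach 80% of the whole
        // budget when there are no disputes.
        if !queued.contains(&priority) {
            continue;
        }
        let share = remaining * percent / 100; // integer division rounds down
        shares.push((priority.to_string(), share));
        remaining -= share;
    }
    shares
}

fn main() {
    let thresholds = [("dispute", 70), ("approval", 80), ("backing_system", 100), ("backing", 100)];

    // With disputes queued: [("dispute", 8), ("approval", 3), ("backing_system", 1), ("backing", 0)]
    println!("{:?}", split_budget(12, &thresholds, &["dispute", "approval", "backing_system", "backing"]));

    // Without disputes: [("approval", 9), ("backing_system", 3), ("backing", 0)]
    println!("{:?}", split_budget(12, &thresholds, &["approval", "backing_system", "backing"]));
}
```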

Versi Testing 40/20

Testing showed a slight improvement in finality lag and candidate checking time for this pull request compared to its base on the master branch. The more loaded the network, the greater the observed difference.

Testing Parameters:

  • 40 validators (4 malicious)
  • 20 gluttons with 2 seconds of PVF execution time
  • 6 VRF modulo samples
  • 12 required approvals


Versi Testing 80/40

For this test, we compared the master branch with the branch from #5616. The second branch is based on the current one but removes backing jobs that have exceeded their time limits. We excluded malicious nodes to reduce noise from disputing and banning validators. The results show that, under the same load, nodes experience less finality lag and reduced recovery and check times. Parachains also run with a shorter block time, although it remains over 6 seconds.

Testing Parameters:

  • 80 validators (0 malicious)
  • 40 gluttons with 2 seconds of PVF execution time
  • 6 VRF modulo samples
  • 30 required approvals


@AndreiEres AndreiEres changed the title [WIP] PVF execution priority [WIP] Add PVF execution priority Jun 19, 2024
@AndreiEres AndreiEres added the T0-node and T8-polkadot labels Jun 19, 2024
@AndreiEres AndreiEres changed the title [WIP] Add PVF execution priority Add PVF execution priority Jun 20, 2024
@AndreiEres AndreiEres marked this pull request as ready for review June 20, 2024 10:32
Resolved review threads (outdated):

  • polkadot/node/subsystem-types/src/messages.rs
  • polkadot/node/core/pvf/src/priority.rs
  • polkadot/node/core/pvf/src/execute/queue.rs
AndreiEres and others added 6 commits June 26, 2024 11:55
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
@AndreiEres (Contributor Author) commented:

Updated the logic according to #4632 (comment), which looks reasonable.

@alexggh (Contributor) left a comment:

First pass: good job, I like the simplicity. I left you some comments, nothing major.

As we discussed online, it would be nice if we could run some integration tests to understand how the system behaves under limit conditions. I think we can simulate that by making the PVF execution longer, so that we fall behind on work.

Resolved review threads (outdated):

  • polkadot/node/subsystem-types/src/messages.rs (4 threads)
  • polkadot/node/core/pvf/src/execute/queue.rs (4 threads)
@AndreiEres (Contributor Author) commented:

Testing results added to the description

@sandreim (Contributor) left a comment:

Thank you @AndreiEres. This is very good work and the test results show it. Just a few more nits here and there, but otherwise good to go from my side.

@burdges any concerns with the percentages here:

// If we used parts of the original 100%, approvals can't take more than 24%,
?

)
}

fn is_fulfilled(&self, priority: &PvfExecPriority) -> bool {
Contributor:

This name is a bit confusing in the absence of comments. Why not actually return whether we can pick up a work item in the specified priority queue?

Contributor Author:

Changed the name; we have to check whether we reached a threshold for the specified priority.

Resolved review thread (outdated): polkadot/node/core/pvf/src/execute/queue.rs
// A threshold in percentages, the portion a current priority can "steal" from lower ones.
// For example:
// Disputes take 70%, leaving 30% for approvals and all backings.
// 80% of the remaining goes to approvals, which is 30% * 80% = 24% of the original 100%.
Contributor:

This way of expressing the percentages is not easy. You have to do this math every time. Can it be made simpler, to not require the reader to do much math?

Contributor Author:

I tried to update the docs to make it clearer. Unfortunately I didn't come up with a better solution.

Resolved review threads (outdated):

  • polkadot/node/core/pvf/src/execute/queue.rs
  • polkadot/node/core/candidate-validation/src/lib.rs
///
/// This system might seem complex, but we operate with the remaining percentages because:
/// - Not all job types are present in each block. If we used parts of the original 100%,
/// approvals could not exceed 24%, even if there are no disputes.
Comment:

I have trouble parsing "even if there are no disputes" here. If there are no backing jobs, then approvals take like 80% of the total, right?

Contributor Author:

Do you mean that we will skip jobs if we only have approvals in the queue and they have reached 80%? If every current priority has reached its limit, we will select the highest one to execute. In that case, we will continue picking approvals in the absence of other jobs.
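
For illustration, the selection rule described in this reply could be sketched roughly as follows; pick_priority and the names here are hypothetical, not the actual queue code.

```rust
// Rough sketch of the selection rule described above (hypothetical names, not the
// actual queue code). `queued` is ordered from highest to lowest priority.
fn pick_priority<'a>(
    queued: &[(&'a str, usize)],              // (priority, number of queued jobs)
    reached_threshold: impl Fn(&str) -> bool, // has this priority already used its share?
) -> Option<&'a str> {
    // Prefer the highest priority that has work and is still under its threshold.
    for &(priority, jobs) in queued {
        if jobs > 0 && !reached_threshold(priority) {
            return Some(priority);
        }
    }
    // If every non-empty priority has reached its threshold, fall back to the highest
    // non-empty one, so e.g. approvals keep executing when nothing else is queued.
    for &(priority, jobs) in queued {
        if jobs > 0 {
            return Some(priority);
        }
    }
    None
}
```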

/// - Not all job types are present in each block. If we used parts of the original 100%,
/// approvals could not exceed 24%, even if there are no disputes.
/// - We cannot fully prioritize backing system parachains over backing other parachains based
/// on the distribution of the original 100%.
Comment:

Yeah, we might have some gradations among the different system parachains, so do whatever you think is best for now. AssetHub could maybe halt without killing anything, although this depends upon how it's used. Among the chains that must run "enough" during an epoch:

  • Sassafras tickets collection, if we ever moved it onto a parachain (maybe never).
  • Validator elections should maintain a fresh validator set, even if we've lost many for some reason.
  • If validator elections runs then we must run bridgehub enough, and any future DKG chains.
  • Some governance chain without smart contracts (collectives?).

Contributor:

Yeah, this is just a first step. I agree it needs to be reconsidered once we start to move governance and staking out of the relay chain.

However, I think a better algorithm would actually target specific block times for the system parachains in times of overload.

Comment:

Afaik, critical system parachains should typically not require numerous blocks, specific block times, etc., but some require roughly one block from each validator during the epoch, including enough slack so spam cannot censor them.

This assumes the critical system parachains cannot be tricked into wasting their own blocks, which likely becomes a risk only if the d-day governance supports smart contracts.

@s0me0ne-unkn0wn (Contributor) left a comment:

Looks very good, thank you! 👍

Comment on lines 686 to 691
const PRIORITY_ALLOCATION_THRESHOLDS: &'static [(PvfExecPriority, usize)] = &[
(PvfExecPriority::Dispute, 70),
(PvfExecPriority::Approval, 80),
(PvfExecPriority::BackingSystemParas, 100),
(PvfExecPriority::Backing, 100),
];
Contributor:

How cool would it be to put that into the parachain host config? (Not in this PR ofc, just an idea)

@AndreiEres (Contributor Author) commented Sep 25, 2024:

That's a good question. I believe we'll have more information to make a decision after using it in production for a while. @sandreim WDYT?

/// to separate and prioritize execution jobs by request type.
/// The order is important, because we iterate through the values and assume they go from highest
/// to lowest priority.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, EnumIter)]
Member:

Given that we are already leaking to validation what we want to have validated here, we should likely keep the old name PvfExecKind and infer "priority" from this, together with other information. E.g. whether a validation is still worthwhile (ttl PR based on this one).

@AndreiEres AndreiEres added this pull request to the merge queue Oct 9, 2024
Merged via the queue into master with commit e294d62 Oct 9, 2024
190 of 201 checks passed
@AndreiEres AndreiEres deleted the AndreiEres/pvf-execution-priority branch October 9, 2024 17:44
ordian added a commit that referenced this pull request Oct 11, 2024
* master: (28 commits)
  `substrate-node`: removed excessive polkadot-sdk features (#5925)
  Rename QueueEvent::StartWork (#6015)
  [ci] Remove quick-benchmarks-omni from GitLab (#6014)
  Set larger timeout for cmd.yml (#6006)
  Fix `0003-beefy-and-mmr` test (#6003)
  Remove redundant XCMs from dry run's forwarded xcms (#5913)
  Add RadiumBlock bootnodes to Coretime Polkadot Chain spec (#5967)
  Bump strum from 0.26.2 to 0.26.3 (#5943)
  Add PVF execution priority (#4837)
  Snowbridge V2 docs (#5902)
  Fix u256 conversion in BABE (#5994)
  [ci] Move test-linux-stable-no-try-runtime to GHA (#5979)
  Bump PoV request timeout (#5924)
  [Release/CI] Github flow to build `polkadot`/`polkadot-parachain` rc binaries and deb package (#5963)
  [ci] Remove short-benchmarks from Gitlab (#5988)
  Disable flaky tests reported in 5972/5973/5974 (#5976)
  Bump some dependencies (#5886)
  bump zombienet version and set request for k8s (#5968)
  [omni-bencher] Make all runtimes work (#5872)
  Omni-Node renamings (#5915)
  ...
Successfully merging this pull request may close these issues:

  • PVF: prioritise execution depending on context (#4632)