-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[5 / 5] Introduce approval-voting-parallel #4849
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
June 20, 2024 12:02
a591635
to
bd5529d
Compare
This was referenced Jun 20, 2024
alexggh
changed the title
Introduce approval-voting-parallel
[5 / 5] Introduce approval-voting-parallel
Jun 20, 2024
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-4-5
branch
from
July 2, 2024 11:40
cb57906
to
4b3f489
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
July 2, 2024 11:40
bd5529d
to
5da132d
Compare
alexggh
changed the base branch from
alexaggh/approval-voting-parallel-4-5
to
alexaggh/approval-voting-parallel-2-5
July 2, 2024 11:46
alexggh
changed the base branch from
alexaggh/approval-voting-parallel-2-5
to
master
July 2, 2024 11:51
alexggh
changed the base branch from
master
to
alexaggh/approval-voting-parallel-2-5
July 2, 2024 12:04
This was referenced Jul 2, 2024
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-2-5
branch
from
July 3, 2024 12:16
7add070
to
3a0ba90
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
July 3, 2024 12:16
5da132d
to
1592d0b
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-2-5
branch
from
July 5, 2024 11:36
3a0ba90
to
78bb23d
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
July 5, 2024 11:36
1592d0b
to
0b5e321
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-2-5
branch
from
July 5, 2024 13:13
78bb23d
to
daf343f
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
July 5, 2024 13:13
0b5e321
to
49a0312
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-2-5
branch
from
July 10, 2024 14:42
daf343f
to
25f1f0a
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
July 10, 2024 14:42
49a0312
to
0746ba3
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-2-5
branch
from
July 12, 2024 07:04
25f1f0a
to
d13e1c8
Compare
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
July 12, 2024 07:04
0746ba3
to
cd2d303
Compare
alexggh
commented
Jul 12, 2024
alexggh
force-pushed
the
alexaggh/approval-voting-parallel-5-5
branch
from
July 12, 2024 07:15
cd2d303
to
aa42eff
Compare
github-merge-queue bot
pushed a commit
that referenced
this pull request
Jul 16, 2024
This is part of the work to further optimize the approval subsystems, if you want to understand the full context start with reading #4849 (comment), however that's not necessary, as this change is self-contained and nodes would benefit from it regardless of subsequent changes landing or not. While testing with 1000 validators I found out that the logic for determining the validators an assignment should be gossiped to is taking a lot of time, because it always iterated through all the peers, to determine which are X and Y neighbours and to which we should randomly gossip(4 samples). This could be actually optimised, so we don't have to iterate through all peers for each new assignment, by fetching the list of X and Y peer ids from the topology first and then stopping the loop once we took the 4 random samples. With this improvements we reduce the total CPU time spent in approval-distribution with 15% on networks with 500 validators and 20% on networks with 1000 validators. ## Test coverage: `propagates_assignments_along_unshared_dimension` and `propagates_locally_generated_assignment_to_both_dimensions` cover already logic and they passed, confirm that there is no breaking change. Additionally, the approval voting benchmark measure the traffic sent to other peers, so I confirmed that for various network size there is no difference in the size of the traffic sent to other peers. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Sep 25, 2024
github-merge-queue
bot
removed this pull request from the merge queue due to no response for status checks
Sep 25, 2024
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Sep 25, 2024
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Sep 26, 2024
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Sep 26, 2024
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Sep 26, 2024
github-merge-queue bot
pushed a commit
that referenced
this pull request
Oct 24, 2024
The approval-voting-parallel introduced with #4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
mordamax
pushed a commit
to paritytech-stg/polkadot-sdk
that referenced
this pull request
Oct 25, 2024
The approval-voting-parallel introduced with paritytech#4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
mordamax
pushed a commit
to paritytech-stg/polkadot-sdk
that referenced
this pull request
Oct 25, 2024
The approval-voting-parallel introduced with paritytech#4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
mordamax
pushed a commit
to paritytech-stg/polkadot-sdk
that referenced
this pull request
Oct 25, 2024
The approval-voting-parallel introduced with paritytech#4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
mordamax
pushed a commit
to paritytech-stg/polkadot-sdk
that referenced
this pull request
Oct 25, 2024
The approval-voting-parallel introduced with paritytech#4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
mordamax
pushed a commit
to paritytech-stg/polkadot-sdk
that referenced
this pull request
Oct 25, 2024
The approval-voting-parallel introduced with paritytech#4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
mordamax
pushed a commit
to paritytech-stg/polkadot-sdk
that referenced
this pull request
Oct 29, 2024
The approval-voting-parallel introduced with paritytech#4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
mordamax
pushed a commit
to paritytech-stg/polkadot-sdk
that referenced
this pull request
Oct 29, 2024
The approval-voting-parallel introduced with paritytech#4849 has been tested on `versi` and approximately 3 weeks on parity's existing kusama nodes paritytech/devops#3583, things worked as expected, so enable it by default on all kusama nodes in the next release. The next step will be enabling by default on polkadot if no issue arrises while running on kusama. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
I9-optimisation
An enhancement to provide better overall performance in terms of time-to-completion for a task.
T0-node
This PR/Issue is related to the topic “node”.
T8-polkadot
This PR/Issue is related to/affects the Polkadot network.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is the implementation of the approach described here: #1617 (comment) & #1617 (comment) & #1617 (comment).
Description of changes
The end goal is to have an architecture where we have single subsystem(
approval-voting-parallel
) and multiple worker types that would full-fill the work that currently is fulfilled by theapproval-distribution
andapproval-voting
subsystems. The main loop of the new subsystem would do just the distribution of work to the workers.The new subsystem will have:
worker_index = msg.validator % WORKER_COUNT
, this guarantees that all assignments and approvals from the same validator reach the same worker.On the hot path of processing messages no synchronisation and waiting is needed between approval-distribution and approval-voting workers.
Guidelines for reading
The full implementation is broken in 5 PRs and all of them are self-contained and improve things incrementally even without the parallelisation being implemented/enabled, the reason this approach was taken instead of a big-bang PR, is to make things easier to review and reduced the risk of breaking this critical subsystems.
After reading the full description of this PR, the changes should be read in the following order:
Context
and be able to run multiple instances of the subsystem on different threads. No functional changesapproval-voting-parallel
subsystem that runs on different workers the logic currently inapproval-distribution
andapproval-voting
.Results
Running subsystem-benchmarks with 1000 validators 100 fully ocuppied cores and triggering all assignments and approvals for all tranches
Approval does not lags behind.
Master
With this PoC
Gathering enough assignments
Enough assignments are gathered in less than 500ms, so that gives un a guarantee that un-necessary work does not get triggered, on master on the same benchmark because the subsystems fall behind on work, that number goes above 32 seconds on master.
Cpu usage:
Master
With this PoC
Enablement strategy
Because just some trivial plumbing is needed in approval-distribution and approval-voting to be able to run things in parallel and because this subsystems plays a critical part in the system this PR proposes that we keep both ways of running the approval work, as separated subsystems and just a single subsystem(
approval-voting-parallel
) which has multiple workers for the distribution work and one worker for the approval-voting work and switch between them with a comandline flag.The benefits for this is twofold.
Next steps
@ordian @eskimor @sandreim @AndreiEres, let me know what you think.