This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

PoV Distribution optimization #1990

Merged
19 commits merged into paritytech:master from montekki:fs-optimise-pov-distribution on Nov 25, 2020

Conversation

@montekki montekki commented Nov 19, 2020 (Contributor)

It doesn't look like I've broken any of the existing tests, including the only existing integration test so far. However, I'm fairly certain that some corner cases are not yet accounted for; for that I need fresh eyes, and I will also be re-reading the code myself, so I'm marking it as ready for review now.

Fixes #1688

@montekki montekki added the A3-in_progress, B0-silent, and C1-low labels Nov 19, 2020
@montekki montekki changed the title from "Initial commit" to "PoV Distribution optimization" Nov 19, 2020
@pepyakin pepyakin left a comment (Contributor)

A very shallow review, because I am not very fluent in this area. Also, I missed that this is in progress, sorry.

[5 review threads on node/network/pov-distribution/src/lib.rs (resolved)]
```rust
	#[error(transparent)]
	ValidatorDiscovery(#[from] polkadot_node_subsystem_util::validator_discovery::Error),
	#[error(transparent)]
	Util(#[from] polkadot_node_subsystem_util::Error),
}
```
A reviewer (Contributor) commented on this excerpt:

I'm not sure about the team's policy on that, but many of these can be constructed from different places. Given, say, an OneshotRecv at hand, it would probably be hard to tell where that error actually originated.
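
A minimal sketch of the kind of provenance being alluded to here: attach a static context string at each call site so a OneshotRecv-style error records where it originated. All names below (including `fetch_validators`) are illustrative, not this PR's code.

```rust
use futures::channel::oneshot;

// Hypothetical error type, not the PR's actual `Error` enum.
#[derive(Debug, thiserror::Error)]
enum Error {
    // Instead of a bare `OneshotRecv(#[from] oneshot::Canceled)`, record
    // which request the failed receive belonged to:
    #[error("oneshot receive failed in {0}")]
    OneshotRecv(&'static str),
}

async fn fetch_validators(rx: oneshot::Receiver<Vec<u32>>) -> Result<Vec<u32>, Error> {
    // Each call site names itself, so the rendered error says where it originated.
    rx.await.map_err(|_| Error::OneshotRecv("fetch_validators"))
}
```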

[review thread on node/network/pov-distribution/src/lib.rs (resolved)]
@montekki montekki added the A0-please_review label and removed the A3-in_progress label Nov 20, 2020
@pepyakin

One thing I realized: doesn't this require an impl guide update?

[5 review threads on node/network/pov-distribution/src/lib.rs (resolved)]
[review thread on node/network/pov-distribution/Cargo.toml (resolved)]
@rphmeier rphmeier left a comment (Contributor)

I see the changes where we connect to the peers in our group about the PoV, but not the changes where we only notify and request from peers in that group. We will need to change the logic for computing `peers_to_send` also.
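
For illustration, a hedged sketch of what group-restricted `peers_to_send` logic could look like; the names, types, and signature are invented for this example, not taken from the PR:

```rust
use std::collections::{HashMap, HashSet};

type PeerId = u64;    // stand-in for the real network peer id
type Hash = [u8; 32]; // stand-in for the real PoV hash

// Restrict distribution to peers that both told us they await this PoV
// and are members of our validator group.
fn peers_to_send(
    awaiting: &HashMap<PeerId, HashSet<Hash>>, // PoV hashes each peer awaits
    our_group: &HashSet<PeerId>,               // peers in our validator group
    pov_hash: &Hash,
) -> Vec<PeerId> {
    awaiting
        .iter()
        .filter(|(peer, awaited)| our_group.contains(*peer) && awaited.contains(pov_hash))
        .map(|(peer, _)| *peer)
        .collect()
}
```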

@montekki

My current understanding of the protocol is this:

The first pathway is FetchPoV. The actions there are:

  1. determine the relevant validators;
  2. for every such validator, as they connect, issue a relevant Awaiting message.

The second pathway is DistributePoV, which will:

  1. store the known PoV;
  2. for every peer from which we have previously received an Awaiting message, issue a SendPoV message.

The path for networking messages: just handle SendPoV and Awaiting messages.

In other words, the only proactive networking actions involving connecting to relevant validators are taken on fetches, and on the distribution path we can assume we only receive Awaiting messages from the relevant peers.

Or am I missing something?
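
To make the flow concrete, here is a minimal sketch of the two pathways as described above; all type and function names are illustrative placeholders, not the subsystem's actual API.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = u64;
type PoVHash = [u8; 32];
type PoV = Vec<u8>;

/// The two wire messages the comment describes.
enum WireMessage {
    /// "I am awaiting these PoVs": sent on the fetch pathway.
    Awaiting(Vec<PoVHash>),
    /// "Here is a PoV you said you were awaiting": the distribution reply.
    SendPoV(PoVHash, PoV),
}

struct State {
    /// PoVs we know; filled in on the DistributePoV pathway.
    known: HashMap<PoVHash, PoV>,
    /// Which PoV hashes each peer told us it is awaiting.
    peer_awaiting: HashMap<PeerId, HashSet<PoVHash>>,
}

impl State {
    /// FetchPoV pathway: as each relevant validator connects, tell it what we await.
    fn on_validator_connected(
        &self,
        peer: PeerId,
        wanted: &HashSet<PoVHash>,
        send: &mut impl FnMut(PeerId, WireMessage),
    ) {
        send(peer, WireMessage::Awaiting(wanted.iter().copied().collect()));
    }

    /// DistributePoV pathway: store the PoV, then satisfy earlier `Awaiting`s.
    fn distribute(
        &mut self,
        hash: PoVHash,
        pov: PoV,
        send: &mut impl FnMut(PeerId, WireMessage),
    ) {
        self.known.insert(hash, pov.clone());
        for (peer, awaited) in &self.peer_awaiting {
            if awaited.contains(&hash) {
                send(*peer, WireMessage::SendPoV(hash, pov.clone()));
            }
        }
    }
}
```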

@rphmeier rphmeier commented Nov 22, 2020 (Contributor)

> My current understanding of the protocol is this: [...] the only proactive networking actions involving connecting to relevant validators are taken on fetches, and on the distribution path we can assume we only receive Awaiting messages from the relevant peers. Or am I missing something?

Oh, now I see. I guess previously, the PoVDistribution subsystem wasn't actually doing a good job of distributing information across uninterested peers, because they never awaited anything. We should have introduced an 'echo' mode or something for that. So since we (as an honest validator) will await only PoVs we're interested in, it doesn't matter if we tell every peer that we're awaiting stuff, as long as we're connected to peers from our group.

However, the call to notify_all_we_are_awaiting that used to be in handle_fetch has now been removed: https://github.com/paritytech/polkadot/pull/1990/files#diff-578deb92139a25c42178fc12e8a62dde2b5b4dfe6fe0ab30b698e8caead3835fL306 . AFAIK nothing calls that code anymore. I do believe we need it to handle the case where our peers are slightly ahead of us in view. E.g. if I'm connected to validator V, and they reach relay parent R before us, we won't trigger the notify_one_we_are_awaiting_many(V, R) code path when we reach R. Then assume we're fetching PoV P: without notify_all_we_are_awaiting, we will never tell V that we are awaiting P. I suggest reinstating this call, but I'm not sure why you removed it in the first place.
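
To make the scenario concrete, a minimal sketch of what the reinstated call could do, assuming hypothetical signatures and types:

```rust
use std::collections::{HashMap, HashSet};

type PeerId = u64;
type Hash = [u8; 32];

// When we begin fetching a PoV at `relay_parent`, notify every
// already-connected peer whose view contains `relay_parent`, instead of
// relying only on a future view change from that peer.
fn notify_all_we_are_awaiting(
    peer_views: &HashMap<PeerId, HashSet<Hash>>, // each peer's current view
    relay_parent: Hash,
    pov_hash: Hash,
    send_awaiting: &mut impl FnMut(PeerId, Hash, Vec<Hash>),
) {
    // Covers the case where a peer (validator V) reached `relay_parent` (R)
    // before we did: we will never get a fresh view update from V for R,
    // so this is the only point at which V learns that we await the PoV.
    for (peer, view) in peer_views {
        if view.contains(&relay_parent) {
            send_awaiting(*peer, relay_parent, vec![pov_hash]);
        }
    }
}
```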

@montekki

> I suggest reinstating this call, but I'm not sure why you removed it in the first place.

Ah yes, I agree. For some time I was thinking that we would re-receive the PeerViewUpdate message anyway, so we would hit that path. Expanded a test to cover this situation.

Also, it looks like the attributes on functions hide unused compiler warnings.

@ordian ordian left a comment (Member)

Looks good modulo error handling.

[3 review threads on node/network/pov-distribution/src/lib.rs (resolved)]
@montekki

bot merge

@ghost

ghost commented Nov 25, 2020

Trying merge.

@ghost ghost merged commit a0541ce into paritytech:master Nov 25, 2020
@montekki montekki deleted the fs-optimise-pov-distribution branch November 25, 2020 09:21
ordian added a commit that referenced this pull request Nov 25, 2020
* master:
  Some code cleanup in overseer (#2008)
  PoV Distribution optimization (#1990)
  Approval Distribution Subsystem (#1951)
  Session management for approval voting (#1973)
  Do not send messages twice in bitfield distribution (#2005)
Labels
A0-please_review (Pull request needs code review.), B0-silent (Changes should not be mentioned in any release notes), C1-low (PR touches the given topic and has a low impact on builders.)
Development

Successfully merging this pull request may close these issues.

Optimize PoV Distribution: in-group gossip
5 participants