Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

approval-distribution: Update topology if authorities are discovered later #2981

Merged
merged 11 commits into from
Jan 25, 2024

Conversation

alexggh
Copy link
Contributor

@alexggh alexggh commented Jan 18, 2024

Fixes: #2138.

Especially on restart AuthorithyDiscovery cache is not populated so we create an invalid topology and messages won't be routed correctly for the entire session. This PR proposes to try to fix this by updating the topology as soon as we now the Authority/PeerId mapping, that should impact the situation dramatically.

This issue was hit yesterday, on Westend and resulted in stalling the finality.

TODO

  • Unit tests
  • Test impact on versi

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
.filter(|peer| ids.contains(&peer.discovery_id))
{
peer.peer_ids.push(peer_id);
self.peer_ids.insert(peer_id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is already in peer_ids, why do we insert it?

Copy link
Contributor Author

@alexggh alexggh Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's why tests it is on the todo list, fixed.

I was looking more about what you think of the approach does it seem reasonable to you ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable at first sight, yes.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4994683

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh alexggh marked this pull request as ready for review January 23, 2024 10:36
@alexggh
Copy link
Contributor Author

alexggh commented Jan 23, 2024

Tested on versi and this PR has the expected behaviour, at node restart we don't have any authority-peer_id mapping in the case, but at next active leaves updates all Peer ids are correctly detected.

@alexggh alexggh added the T8-polkadot This PR/Issue is related to/affects the Polkadot network. label Jan 23, 2024
Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable. Nice work!

polkadot/node/network/protocol/src/grid_topology.rs Outdated Show resolved Hide resolved
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh alexggh added the R0-silent Changes should not be mentioned in any release notes label Jan 25, 2024
@alexggh alexggh added this pull request to the merge queue Jan 25, 2024
Merged via the queue into master with commit a6952c7 Jan 25, 2024
127 of 129 checks passed
@alexggh alexggh deleted the alexaggh/fix_broken_topology branch January 25, 2024 11:39
alexggh added a commit that referenced this pull request Feb 8, 2024
…overed later (#2981)

The previous fix was actually incomplete because we update the
authorties only on the situation where we decided to reconnect because
we had a low connectivity issue. Now the problem is that
update_authority_ids use the list of connected peers, so on restart that
does contain anything, so calling immediately after
issue_connection_request won't detect all authorties, so we need to also
check every block as the comment said, but that did not match the code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
R0-silent Changes should not be mentioned in any release notes T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Gossip topology does not properly work if peer_id is not cached at the beginning of the session
5 participants