Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Improve collator side of the collator-protocol #1955

Merged
merged 6 commits into from
Nov 22, 2020

Conversation

bkchr
Copy link
Member

@bkchr bkchr commented Nov 17, 2020

This pr improves the collator-protocol implementation of the collator
side. Besides cleaning up code and rewriting it, the following changed:

  • Before on PeerViewChange we send an advertisment to every peer, now
    this only happens for validators.
  • It also adds a check that we send an advertisment message only once
    for a connected peer.
  • If the same validator was part of the current and next group, we
    requested to be connected to this validator two times. This is also
    fixed now.
  • Instead of having only one connection request, we now are being able
    to store multiple of them. This is required as we can have multiple
    active leafs at any point of time.

This pr improves the collator-protocol implementation of the collator
side. Besides cleaning up code and rewriting it, the following changed:

- Before on `PeerViewChange` we send an advertisment to every peer, now
this only happens for validators.
- It also adds a check that we send an advertisment message only once
for a connected peer.
- If the same validator was part of the current and next group, we
requested to be connected to this validator two times. This is also
fixed now.
- Instead of having only one connection request, we now are being able
to store multiple of them. This is required as we can have multiple
active leafs at any point of time.
@bkchr bkchr added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Nov 17, 2020
Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good modulo comments.

node/network/collator-protocol/src/collator_side.rs Outdated Show resolved Hide resolved
node/network/collator-protocol/src/collator_side.rs Outdated Show resolved Hide resolved
@ordian
Copy link
Member

ordian commented Nov 17, 2020

If the same validator was part of the current and next group, we
requested to be connected to this validator two times. This is also
fixed now.

That is an expected behaviour. Validator discovery has reference counting for each requested peer to know when to disconnect.

Comment on lines -141 to +233
async fn distribute_collation<Context>(
ctx: &mut Context,
async fn distribute_collation(
ctx: &mut impl SubsystemContext<Message = CollatorProtocolMessage>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we please not carpet-format things and mix it with other changes 🙏? AFAIK different subsystems atm use both variants of this syntax.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. When I fix stuff, I'm doing it everywhere. A better solution would anyway to implement all these functions for the state struct. It makes no sense that all these functions are free standing, this is not C.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes no sense that all these functions are free standing, this is not C.

This is a stylistic difference I strongly disagree with you on. Rust excels when used in a functional style, especially in async/await.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other sub systems are exactly written in the style I mentioned. I also see no benefit to always pass the same two parameters in state and context to every function.

And async/await doesn't require that you only use free standing functions.

@ordian
Copy link
Member

ordian commented Nov 18, 2020

Could you update to #1966?

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a couple of questions.

res = state.connection_requests.next().fuse() => {
let (relay_parent, validator_id, peer_id) = match res {
Some(res) => res,
None => continue,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unreachable IIUC, it's fine, but maybe add a comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is unreachable, another reason I had a Future before :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say we should write the caller code without relying on any quirkiness of the callee so that when after two years someone changes the behavior of the callee things do not break here. Usually saying something like "malloc does not fail anyway" is not ok. But this is not the hill I am willing to die on at the moment, so ok.

Copy link
Member

@ordian ordian Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, ideally we want to enforce it via type system. But there is no abstraction for an infinite stream AFAIK, so emulating it with a stream that never returns a None is our best option so far.

state.peer_views.remove(&peer_id);
state.declared_at.remove(&peer_id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we also need to remove it from peer_ids mapping in our_validator_groups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it is fine to stay in there. our_validator_groups is mainly used to track PeerId -> ValidatorId mapping and will be cleaned up when the corresponding leaf is closed.

res = state.connection_requests.next().fuse() => {
let (relay_parent, validator_id, peer_id) = match res {
Some(res) => res,
None => continue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say we should write the caller code without relying on any quirkiness of the callee so that when after two years someone changes the behavior of the callee things do not break here. Usually saying something like "malloc does not fail anyway" is not ok. But this is not the hill I am willing to die on at the moment, so ok.

@bkchr bkchr merged commit ba74791 into master Nov 22, 2020
@bkchr bkchr deleted the bkchr-improve-collator-protocol branch November 22, 2020 11:55
ordian added a commit that referenced this pull request Nov 22, 2020
* master:
  Improve collator side of the collator-protocol (#1955)
ordian added a commit that referenced this pull request Nov 23, 2020
* master:
  Improve collator side of the collator-protocol (#1955)
  add parity-keyring to install instructions (#1993)
  sane messaging defaults (#1994)
  cleanup validator discovery (#1992)
  Add Prometheus timers to the subsystems (#1923)
  Add tracing support to node (#1940)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants