Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Implement own peer set for availability-distribution #2177

Closed
eskimor opened this issue Dec 28, 2020 · 5 comments
Closed

Implement own peer set for availability-distribution #2177

eskimor opened this issue Dec 28, 2020 · 5 comments
Assignees
Labels
I10-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@eskimor
Copy link
Member

eskimor commented Dec 28, 2020

For availability distribution and availability recovery, we need to be able to connect to a lot of peers and communicate with them directly. In both subsystems we need to send node specific data to lots of different nodes, which is most efficient when just sent to them directly.

The problem: Right now the network subsystem treats all connections equally and will use all of them for example for gossiping. Gossiping to a thousand peers would be very inefficient though, therefore we need to provide a way to connect to peers without starting each supported protocol with them. @rphmeier coined the term dormant connections for this.

Use cases:

  1. Availability distribution: validators to validators
  2. Availability recovery: validators to validators
  3. Availability recovery: validators to other nodes

Use cases 1 and 2 could be implemented by just keeping connections open to all validators. For use case 3 this is not an option, as we don't expect nodes to permanently depend on availability recovery from validators. This is more a security measure in case a node behaves badly. (Collator hides a block sent to Validators from full nodes of its parachain and could thus stall the parachain.) So for use case 3, those connections should in general be short lived and not used much, to limit the load on validators.

Issue #1979 directly depends on this.

@eskimor eskimor added A3-in_progress Pull request is in progress. No review needed at this stage. I10-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. labels Dec 28, 2020
@eskimor eskimor self-assigned this Dec 28, 2020
@pepyakin pepyakin removed the A3-in_progress Pull request is in progress. No review needed at this stage. label Dec 28, 2020
@rphmeier
Copy link
Contributor

rphmeier commented Jan 2, 2021

In practice, the code here should primarily affect the network bridge in implementation, but there will be some small fallout in the other networking subsystems so they can receive dormancy notifications.

I also think that we could implement dormancy as asymmetric. i.e. peers A and B might view each other differently. And that when we consider a peer active for gossip it means that we expect them to send us things, whereas we send them things only if they consider us active. That would leave us with four peer states:

// bitflags in practice
enum PeerActive {
    Dormant,
    ActiveReceive,
    ActiveSend,
    ActiveBoth,
}

The main reason I am thinking about asymmetry is because it makes the process of selecting other peers to be active with much easier. With asymmetrical activity each peer can just make random selections from the peer set to request to become active with. Sending data is cheaper than receiving as there is no data validation or signature checking, and so we should have different tolerances for how many ActiveSending peers (they initiate) we want and how many ActiveReceiving peers (we initiate) we want. This echoes the fact that with randomness some peers are bound to get chosen more than others.

I guess we want some new network messages:

// i don't care about names, just purposes
enum Msg {
    ...
    MakeMeActive, // request to be sent gossip messages
    MadeActive, // request accepted
    RejectActive, // request rejected
    
    MakeMeDormant, // request to stop being sent gossip messages.
    MadeDormant, // request accepted. rejection not allowed.
}

We can also rotate our set of active peers over time by making some dormant and randomly selecting new ones. That would prevent us from being active only with byzantine peers.

So we also need a new NetworkBridgeUpdateV0 variant for PeerActiveState(PeerId, PeerActive) or something like that. And I think it makes most sense to have the gossip subsystems themselves choose how to interpret and act on those peer states.

@burdges
Copy link
Contributor

burdges commented Jan 3, 2021

Around this paritytech/substrate#7797 and paritytech/substrate#7700 we do not currently know if we want longer-lived gossip priority groups, which makes space for low overhead connections, or if gossip messages should ideally be more random for some reason.

We could not go purely random gossip because politeness requires tracking our peers' knowledge, which supports tracking longer-lived gossip priority groups like in paritytech/substrate#7700

Yet, all this knowledge required by politeness should normally be fairly localized in time, so we actually could always be phasing peers into and out of the gossip priority group, while still giving us a much more randomized topology.

This is clearly future work, and not so important right now, but I wanted to (a) put it on people's radar in case it impacts any lower level design decisions, and (b) use it to explain why the current approach looks like the right one.

@tomaka
Copy link
Contributor

tomaka commented Jan 4, 2021

@rphmeier What you described is very similar to the code that is already in sc-network.

sc-network has a set of peers who it has a TCP connection with but no notifications channel, which matches your definition of "dormant", and a set of peers who it has an open notifications channel with, which matches your definition of "active".

It is totally possible to leave TCP connections open without opening a channel (i.e. keeping a peer dormant). The protocol doesn't require you to open a channel when you open a connection.
Notifications channels also already have a handshake and are asymmetric.

For the sake of simplicity, the handshake is currently not customizable (it is hardcoded to a Roles, which is where this value comes from), but it's just a matter of exposing it in the API.

@infinity0
Copy link
Contributor

infinity0 commented Jan 4, 2021

gossip messages should ideally be more random for some reason.

The exact topology is not so important, as long as it's more-or-less an expander graph. What is important, which we don't do currently, is ensure that all validators are connected to each other, not separated by an attacker. This could be done by defining a topology such as a d-regular random topology over the validators, calculated implicitly on-chain for every epoch, and having validators follow this in addition to any random non-validator full-node peers.

eskimor added a commit that referenced this issue Jan 6, 2021
By having everything peer set related depend directly on the enum the
code becomes more clear and it is also straight forward to add more
peersets/protocols as the compiler will complain if you forget to
implement parts of it.
eskimor added a commit that referenced this issue Jan 7, 2021
By having everything peer set related depend directly on the enum the
code becomes more clear and it is also straight forward to add more
peersets/protocols as the compiler will complain if you forget to
implement parts of it.
eskimor added a commit that referenced this issue Jan 7, 2021
By having everything peer set related depend directly on the enum the
code becomes more clear and it is also straight forward to add more
peersets/protocols as the compiler will complain if you forget to
implement parts of it.
eskimor added a commit that referenced this issue Jan 13, 2021
By having everything peer set related depend directly on the enum the
code becomes more clear and it is also straight forward to add more
peersets/protocols as the compiler will complain if you forget to
implement parts of it.
ghost pushed a commit that referenced this issue Jan 14, 2021
#2263)

* More doc fixes.

* Minor refactorings in the process of #2177

By having everything peer set related depend directly on the enum the
code becomes more clear and it is also straight forward to add more
peersets/protocols as the compiler will complain if you forget to
implement parts of it.

* Add peer set infos on startup properly

For feature real_overseer.

+ Fixes from review. Thanks @coriolinus and @ordian!

* More structure in network-bridge

Some changes, which would have helped me in groking the code faster.

Entry points/public types more to the top. Factored out implementation
in their own files, to clear up the top-level view.

* Get rid of local ProtocolName type definition.

Does not add much at this level.

* Fix tests + import cleanup.

* Make spaces tabs.

* Clarify what correct parameters to send_message are

* Be more less vague in docs of send_message.

* Apply suggestions from code review

Extend copyright on new files to 2021 as well.

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Andronik Ordian <write@reusable.software>
@eskimor eskimor changed the title Implement dormant connections/low overhead connections Implement own peer set for availability-distribution Jan 16, 2021
@eskimor
Copy link
Member Author

eskimor commented Jan 22, 2021

Closing in favor of #2306.

@eskimor eskimor closed this as completed Jan 22, 2021
xlc pushed a commit to xlc/polkadot that referenced this issue Jan 26, 2021
…tytech#2177 (paritytech#2263)

* More doc fixes.

* Minor refactorings in the process of paritytech#2177

By having everything peer set related depend directly on the enum the
code becomes more clear and it is also straight forward to add more
peersets/protocols as the compiler will complain if you forget to
implement parts of it.

* Add peer set infos on startup properly

For feature real_overseer.

+ Fixes from review. Thanks @coriolinus and @ordian!

* More structure in network-bridge

Some changes, which would have helped me in groking the code faster.

Entry points/public types more to the top. Factored out implementation
in their own files, to clear up the top-level view.

* Get rid of local ProtocolName type definition.

Does not add much at this level.

* Fix tests + import cleanup.

* Make spaces tabs.

* Clarify what correct parameters to send_message are

* Be more less vague in docs of send_message.

* Apply suggestions from code review

Extend copyright on new files to 2021 as well.

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Andronik Ordian <write@reusable.software>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I10-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.
Projects
None yet
Development

No branches or pull requests

6 participants