This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Implement dormant connections/low overhead connections #7797

Closed
eskimor opened this issue Dec 28, 2020 · 10 comments
Assignees
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. J2-unconfirmed Issue might be valid, but it’s not yet known.

Comments

@eskimor
Member

eskimor commented Dec 28, 2020

Much of this will need to be implemented in substrate.

@eskimor eskimor added the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Dec 28, 2020
@eskimor eskimor self-assigned this Dec 28, 2020
@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label Dec 28, 2020
@rphmeier
Contributor

rphmeier commented Jan 2, 2021

What changes in particular do you anticipate? I thought that at the Polkadot level we could simply avoid sending "dormant" peers any data and that would reduce overhead enough.

@burdges

burdges commented Jan 2, 2021

Ain't clear if dormant is the right word, but low overhead sounds correct. We'll have a lot of these open; Linux can handle that many TCP connections provided ulimit is set correctly, but each one still carries TCP state plus Noise/TLS state plus libp2p state. In fact, I'd presume this amounts to reconsidering the overhead that we impose upon every connection.

As an example, are we trying to gossip everything from all validators to all validators? That would create problems. We also cannot remove gossip and ask everyone to send their own messages directly to everyone, because then many messages never arrive. In the short term, we could simply take some peers out of the gossip pool, although maybe we're better off with a purely randomized gossip.
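The randomized-gossip idea above could be sketched roughly as follows. This is a hypothetical illustration, not sc-network or libp2p code: `gossip_targets`, the `PeerId` alias, and the keyed selection scheme are all made up for the example.

```rust
use std::collections::HashSet;

/// Hypothetical peer identifier; real code would use libp2p's PeerId.
type PeerId = u64;

/// Pick a pseudo-random subset of `fanout` peers to forward a message to,
/// instead of gossiping to every connected peer. The selection is keyed on
/// the message hash so that different messages take different routes.
/// Assumes `peers` contains distinct entries.
fn gossip_targets(peers: &[PeerId], msg_hash: u64, fanout: usize) -> HashSet<PeerId> {
    let mut targets = HashSet::new();
    if peers.is_empty() {
        return targets;
    }
    // Simple deterministic LCG mixing; a real implementation would use a
    // proper RNG seeded per message.
    let mut state = msg_hash;
    while targets.len() < fanout.min(peers.len()) {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let idx = (state >> 33) as usize % peers.len();
        targets.insert(peers[idx]);
    }
    targets
}
```

The point of keying on the message hash is that no single subset of peers is responsible for relaying everything, which is what distinguishes randomized gossip from simply shrinking a fixed gossip pool.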

In any case, I think this is about the overhead that libp2p and substrate impose upon connections due to expecting them to be used in a specific way.

@eskimor
Member Author

eskimor commented Jan 2, 2021

> What changes in particular do you anticipate? I thought that at the Polkadot level we could simply avoid sending "dormant" peers any data and that would reduce overhead enough.

Interesting. That would certainly be easier. I was under the impression that a lot of this already happens at the Substrate level. My goal would have been to prevent the opening of substreams altogether for low-overhead/dormant connections, so we more or less end up with the overhead TCP imposes, but nothing more.
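The dormant-vs-active distinction being proposed here could be modeled as something like the following. This is a sketch with made-up names (`ConnectionState` is not an sc-network type); it only illustrates that a dormant connection keeps the transport alive while carrying no substreams.

```rust
/// Hypothetical model of the proposal: a dormant connection keeps only the
/// underlying transport (TCP + encryption session), while an active one
/// additionally carries open notification substreams.
enum ConnectionState {
    /// Transport only; no substreams are opened, so no per-protocol state.
    Dormant,
    /// Fully set up: one substream per listed protocol name.
    Active { substreams: Vec<String> },
}

impl ConnectionState {
    /// Number of open substreams this connection carries.
    fn substream_count(&self) -> usize {
        match self {
            ConnectionState::Dormant => 0,
            ConnectionState::Active { substreams } => substreams.len(),
        }
    }
}
```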

@rphmeier
Contributor

rphmeier commented Jan 2, 2021

> As an example, are we trying to gossip everything from all validators to all validators?

This is what we're trying not to do! But at the moment we can't avoid it while also being connected to all validators. paritytech/polkadot#2177 has more details.

> My goal would have been to prevent the opening of substreams altogether for low overhead/dormant connections, so we more or less end up with the overhead TCP imposes, but nothing more.

In terms of prioritization, I think that unless substreams require a lot of overhead to maintain, the first and easiest thing to build will be to just avoid sending gossip messages to most peers. That can all be done at a higher level. How much overhead does it take to maintain a substream?

@bkchr
Member

bkchr commented Jan 3, 2021

This has to be implemented in Substrate. Otherwise protocols like GRANDPA or sync would still run, and that is clearly too much for a "dormant" peer.

@rphmeier
Contributor

rphmeier commented Jan 3, 2021

@bkchr That's not true, at least after #7700. GRANDPA and sync can be limited to 25–50 peers even if parachain validation has 1000+.

@tomaka
Contributor

tomaka commented Jan 4, 2021

See this comment.
While it doesn't take much to maintain a substream, this new layer looks extremely redundant with what sc-network already does.

I think it would make more sense to change Substrate to make the substream handshake customizable, and to be able to pass a list of peers with which to keep a connection open even when no notifications substream is open (i.e. dormant).
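The keep-alive list suggested here could amount to a policy like the following. This is a hypothetical sketch, not the sc-network API; `should_keep_connection` and the bare `PeerId` alias are invented for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical peer identifier; real code would use libp2p's PeerId.
type PeerId = u64;

/// Keep the underlying connection to a peer open if it either has open
/// substreams or is on an explicit keep-alive list (a "dormant" peer in
/// the sense discussed above).
fn should_keep_connection(
    peer: PeerId,
    open_substreams: usize,
    keep_alive: &HashSet<PeerId>,
) -> bool {
    open_substreams > 0 || keep_alive.contains(&peer)
}
```

The design point is that the keep-alive decision lives below the notification-substream layer, so a peer can stay connected at the transport level with zero protocol state.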

@eskimor
Member Author

eskimor commented Jan 4, 2021

Hmm, so far I have come to the conclusion that, with your latest PR, it should be enough to create a new peerset just for availability distribution, sized to the validator set (or two validator sets). So yeah, I also don't think we need another layer.
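The sizing rule described here could be sketched as follows. All names are hypothetical (`PeerSetConfig` and `availability_peerset` are not Substrate types); the sketch only captures "one peerset, sized to one or two validator sets".

```rust
/// Hypothetical peerset configuration, sized to the validator set.
struct PeerSetConfig {
    name: &'static str,
    target_size: usize,
}

/// Build a dedicated peerset for availability distribution. When spanning a
/// validator-set rotation, we may need connections to both the outgoing and
/// the incoming set, hence the doubled target.
fn availability_peerset(validator_count: usize, spanning_rotation: bool) -> PeerSetConfig {
    PeerSetConfig {
        name: "availability-distribution",
        target_size: if spanning_rotation {
            2 * validator_count
        } else {
            validator_count
        },
    }
}
```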

With regard to keeping connections open: I don't know the timeouts in place, but if we are regularly sending out availability chunks, the connections are probably not closed anyway; and if we don't send them out regularly, it should not matter much if connections need to be re-opened.

This could in effect provide the right means to support collators with the same code path as validators (assuming we are doing something similar with availability recovery).

@rphmeier
Contributor

rphmeier commented Jan 5, 2021

I don't believe it should be just for availability distribution. Availability Recovery and PoV Distribution also need direct validator<>validator connections.

@eskimor
Member Author

eskimor commented Jan 15, 2021

Closing, as this should all be on the Polkadot side, based on #7700.

@eskimor eskimor closed this as completed Jan 15, 2021
5 participants