feat(gossipsub): More lenient flood publishing #3666
Conversation
I would suggest a separate channel from the Gossipsub behaviour to each connection handler. That way we can get rid of the buffer with potentially unbounded growth.
Paraphrasing to make sure we are on the same page. I agree that we should send the same gossipsub message to multiple peers in parallel. I don't think we should send a single message to peers in sequence.
I would prefer not to introduce yet another tunable magic number to the already large Gossipsub configuration surface. Based on intuition, I would argue that choosing the right value for
With my above suggestion we would make sure to first send the message to fast peers, i.e. peers with capacity in the channel to their connection handler.
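As a rough illustration of this idea, here is a minimal sketch of a bounded per-peer channel, assuming the `futures` mpsc channel; the struct, field names, and capacity are assumptions for this sketch, not the actual rust-libp2p types:

```rust
use futures::channel::mpsc;

// Illustrative only: a bounded per-peer channel from the gossipsub behaviour
// to a connection handler.
struct PeerSender {
    tx: mpsc::Sender<Vec<u8>>, // created with `mpsc::channel(capacity)`, so it is bounded
}

impl PeerSender {
    // A "fast" peer in the sense used above: its channel still has free capacity,
    // so the message is queued without growing any unbounded buffer.
    fn try_queue(&mut self, msg: Vec<u8>) -> bool {
        match self.tx.try_send(msg) {
            Ok(()) => true,
            // Channel full: this peer is currently slow; skip it (or retry later)
            // instead of buffering without bound.
            Err(e) if e.is_full() => false,
            // Receiver gone, e.g. the connection closed.
            Err(_) => false,
        }
    }
}
```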
To ease review, and to make sure these optimizations make it in, it would be good to extract them separately. On the meta level, I very much appreciate the continued work you are putting into GossipSub!
I may need to check my understanding of this. This solution would solve the problem of a potentially unbounded send queue. I'm not sure it will help with the particular problem I'm trying to solve, however (bursts of large messages). The problem we have is large message sizes. We publish these large messages (blocks) periodically, every 12 seconds. I imagine that our send queue for each peer would be more or less empty. We then need to publish the large message such that it reaches the network as fast as it can. I think we're seeing delays when we send the message to all the connection handlers at once for all our peers (80 or so, due to flood publishing), which as I understand would still happen with the channel you have suggested. Then, when the OS tries to send 80x our large message, it takes quite a while to get the first one out to the network. The idea here would be to stagger the send (assuming the send queue is 0 for all peers): send just to mesh peers first, then optionally some additional ones later.
I agree. I was trying to make the config simple, in that you can just use the default, and the extra configs are not used at all by most users. The extra number I had imagined would be based on peer count. I had intended to use it to send to at most 20 or 30 extra peers even if my peer count was 80-100. In Lighthouse the user is able to adjust their peer count; with the extra config parameter I can decouple the user-configured Lighthouse peer count from the flood publishing in gossipsub for these large messages. I guess it's difficult, in that there are many ways we can tune a gossipsub network, and I wanted to try and make it as generic as we can for all users. We could simplify the config by fixing some of the parameters, this case included.
If we could do this, I think this would be ideal. Perhaps I've misunderstood the solution. As I understand it, we classify fast from slow based on the size of the channel. I expect most of the channels to be empty in my particular circumstances. We also have a variety of messages: some are very small and some are large. Therefore the queue size is not trivially indicative of fast/slow peers. If we can reasonably identify which peers are fast/slow and which have immediate capacity, then this would be a great approach, provided that when we burst send, we don't immediately saturate our bandwidth and make all our peers slow.
Yeah, I'll try and do this more in the future. I just get in there and notice these things and fix them while I'm there. There is another fairly important change I should get in, but it's lurking in an episub branch somewhere.
This pull request has merge conflicts. Could you please resolve them @AgeManning? 🙏
In the case of forwarding a message: it would be dropped, along the lines of "don't accept work you can't handle". In the case of publishing a message: one would backpressure to the user, i.e.
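A hypothetical sketch of this drop-versus-backpressure policy; the names, the queue representation, and the error type are placeholders rather than the gossipsub API:

```rust
enum SendContext {
    Forwarding,
    Publishing,
}

#[derive(Debug)]
struct QueueFull;

fn enqueue(
    ctx: SendContext,
    queue: &mut Vec<Vec<u8>>,
    capacity: usize,
    msg: Vec<u8>,
) -> Result<(), QueueFull> {
    if queue.len() < capacity {
        queue.push(msg);
        return Ok(());
    }
    match ctx {
        // Forwarding: "don't accept work you can't handle" - silently drop the message.
        SendContext::Forwarding => Ok(()),
        // Publishing: surface the backpressure to the caller instead of dropping.
        SendContext::Publishing => Err(QueueFull),
    }
}
```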
Your understanding of my suggestion is correct. Thanks for expanding. You are right, my suggestion won't solve your immediate problem, namely where the bottleneck is the direct uplink of the local machine. To help me form a better understanding, can you share some data? E.g. resource graphs of a machine running into this issue (bandwidth, CPU, ...), or the latency of delivery of one message across remote nodes? What order of magnitude are we talking about here? Is this on the order of 100ms?
@ackintosh has done some simulations in testground. I've not run them myself, but he reports a 30% reduction in latency for given message sizes. (You can see and run them here: sigp/gossipsub-testground#15) This issue is highly correlated with message size. On Ethereum mainnet, we are publishing 100-200 KB block messages. These are seeing delays mostly around 1s, with a tail up to around 4s. This PR seemed like low-hanging fruit with decent gains to quickly throw in before introducing more severe modifications (episub).
Thank you for expanding on the eth2 perspective. This is helpful.
Agreed that, based on sigp/gossipsub-testground#15, this provides "decent gains". I don't think it is low-hanging fruit, though. In my eyes, relative to its impact, it adds a lot of complexity to already very complex machinery.
We seem to be in agreement that we should add proper backpressure to
Do you have data on this? In other words, do you know whether the bottleneck really is the local uplink? Backpressure might well have an impact on the problem this pull request is trying to solve, namely keeping buffers small and thus sending latency low. Next to always forwarding a new message to all mesh peers, one could e.g. forward to non-mesh peers on a best-effort basis, depending on whether there is space in the channel to their connection handler. Next to backpressure, I wonder whether we should solve this problem at the transport level. In sigp/gossipsub-testground#15 (comment) I am suggesting to experiment with QUIC. I know that moving eth2 to QUIC is a larger effort, but this will at least give us a baseline of what would be possible, potentially revealing improvements in our TCP stack. In case you think a synchronous discussion is helpful, I am currently based in Japan, thus our timezones align nicely for a call. Also happy to continue here.
I've been meaning to add QUIC support for Lighthouse. There are some complications around doing it, however. Anyway, yeah, it sounds like a call might be useful to get your thinking on this.
## Issue Addressed

Following the conversation on libp2p/rust-libp2p#3666, the changes introduced in this PR will allow us to gain more insight into whether the bandwidth limitations happen at the transport level, namely whether QUIC helps vs yamux and its [window size limitation](libp2p/rust-yamux#162), or whether the bottleneck is at the gossipsub level.

## Proposed Changes

Introduce new QUIC and TCP bandwidth metric gauges.

cc @mxinden (turned out to be easier, Thomas gave me a hint)
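For illustration only, here is a rough sketch of per-transport bandwidth gauges; the crate, metric names, and help strings are assumptions made for this sketch, not the metrics actually added in that PR:

```rust
use prometheus::{IntGauge, Registry};

// Register one gauge per transport so TCP and QUIC throughput can be compared.
fn register_bandwidth_gauges(registry: &Registry) -> prometheus::Result<(IntGauge, IntGauge)> {
    let tcp = IntGauge::new(
        "libp2p_tcp_bandwidth_bytes_per_sec",
        "Observed bandwidth over TCP connections",
    )?;
    let quic = IntGauge::new(
        "libp2p_quic_bandwidth_bytes_per_sec",
        "Observed bandwidth over QUIC connections",
    )?;
    registry.register(Box::new(tcp.clone()))?;
    registry.register(Box::new(quic.clone()))?;
    Ok((tcp, quic))
}
```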
Closing this for now. There are potentially other avenues for improving the send times of messages.
@AgeManning could you please share the other avenues being considered?
We are experimenting on a fork. Also, there was a yamux improvement which should help saturate the bandwidth of the node: #4970. We are also experimenting without flood publishing. Our goal is to have a very high peer count, and we may make a new version of this PR with partial flood publishing, where it is only sent to a small subset of peers. So perhaps this issue can be revived later.
Thanks, I'll take a look.
This test ackintosh/gossipsub-testground#3 claims there is only a 5% improvement when using QUIC. Do you think it could be different with Yamux?
@AgeManning have you considered adding the messages related to flood publishing to the non-priority queue?
This is an interesting idea. We were suspecting that our issue was with sending a large message to many peers in parallel all at once. The idea of staggering was that we get a few messages out into the network before trying to send more to other peers. I think adding it to the non-priority queue would help (and is a good idea), but because the queues are per-peer, we'd still run into the problem of trying to send all the messages at once. Our current strategy is just to remove flood publishing. We are also going to build a v1.2 of gossipsub with an IDONTWANT message to reduce bandwidth, and if further improvements are necessary, we may revisit staggered sending.
Description
In gossipsub, there can be latency issues when performing flood publishing (which is on by default). Although the default mesh size is around 10 peers, users can have many more connections (in the Ethereum use case, exceeding 100). It can be the case that all of these extra peers also subscribe to a given topic.
When publishing, a node then attempts to send a message to all peers that are subscribed to a given topic. When the peer count is large, this can create a large delay in message propagation due to bandwidth limitations and the lack of peer prioritization. This is particularly true for larger message sizes.
I don't know of an easy way to manage backpressure when burst-sending large amounts of data in rust-libp2p. I also think forcing sequential sends deprives us of the benefits of parallelism in sending to many peers, which would saturate our bandwidth (a good thing).
My current solution (this PR) is to offload some of these decisions to the user (or at least give them the ability to modify the behaviour). I've added a new option to flood publishing and made it the default. The options for flood publishing are enumerated by a new type:
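(A rough sketch of what such a type could look like; the enum name and the derive list here are assumptions, only the three variants are taken from the description below.)

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FloodPublish {
    /// No flood publishing at all (this behaviour could already be configured before).
    Disabled,
    /// Publish to all subscribed peers immediately (the previous default behaviour).
    Rapid,
    /// Publish to mesh (plus explicit and fanout) peers immediately, then to at
    /// most this many additional randomly selected peers at the next heartbeat.
    Heartbeat(usize),
}
```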
`Disabled` removes flood publishing altogether (this option existed before). `Rapid` is what the default used to be, and `Heartbeat(usize)` is the new option added in this PR.

The heartbeat option allows the user to specify a maximum number of peers to flood publish to, which are selected at random. Thus if 100 peers is the connection limit, a user may opt to flood publish to only half of them. Instead of publishing the message immediately to all peers, this option publishes the message immediately to its mesh peers (and explicit + fanout peers) and then waits until the next heartbeat to attempt to publish to the rest. The idea is that we stagger the sending of large data such that the first few messages can get sent before we start sending the rest in parallel.
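As a hypothetical sketch of the heartbeat-time half of this staggering; the pending-peer set, message type, and `send` callback are placeholders, not the actual gossipsub internals:

```rust
use rand::seq::IteratorRandom;

// Placeholder peer identifier so the sketch is self-contained; gossipsub uses
// libp2p's `PeerId` instead.
type PeerId = u64;

// At each heartbeat, publish the pending message to a bounded random subset of
// the remaining subscribed peers.
fn flood_publish_on_heartbeat(
    pending_peers: &mut Vec<PeerId>,
    max_extra_peers: usize, // the `usize` carried by `Heartbeat(usize)`
    message: &[u8],
    mut send: impl FnMut(&PeerId, &[u8]),
) {
    let mut rng = rand::thread_rng();
    // Pick at most `max_extra_peers` at random; mesh, explicit and fanout peers
    // were already served when the message was first published.
    let chosen: Vec<PeerId> = pending_peers
        .drain(..)
        .choose_multiple(&mut rng, max_extra_peers);
    for peer in &chosen {
        send(peer, message);
    }
}
```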
Notes & open questions
I've made some small optimizations whilst changing some logic here. I removed double protobuf encoding of some messages in the publish function.
Change checklist