feat(kad): reuse outbound streams instead of closing them #3474
Conversation
I think this is ready to review; please take a look. The changes are relatively straightforward.
Applied suggestions; the diff is much smaller now. While it is easy to read, it feels a bit wrong to swap something with a poisoned value and then put it back unchanged when it didn't match. This pattern is used in other places, so I adopted it for consistency, but copying values back and forth during iteration just to simplify the code might be less efficient than checking for a match first and only updating the value when necessary. On the other hand, maybe the compiler is smart enough to optimize it away since the code is quite straightforward; I didn't benchmark to compare.
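For context, a minimal sketch of the pattern in question, with illustrative names rather than the actual handler types: the state is swapped out for a "poisoned" placeholder so it can be matched by value, and written back unchanged when it is not the variant being transitioned.

```rust
use std::mem;

// Illustrative stand-in for the real substream state machine.
enum SubstreamState {
    // Placeholder left behind while the state is inspected by value.
    Poisoned,
    PendingSend(String),
    Idle,
}

fn advance(states: &mut [SubstreamState]) {
    for state in states.iter_mut() {
        // Swap the state out so data can be moved out of the matched variant...
        match mem::replace(state, SubstreamState::Poisoned) {
            SubstreamState::PendingSend(msg) => {
                // ...do something with `msg`, then store the follow-up state.
                println!("sending {msg}");
                *state = SubstreamState::Idle;
            }
            // ...and write every other variant back unchanged, which is the
            // copying back and forth mentioned above.
            other => *state = other,
        }
    }
}

fn main() {
    let mut states = vec![
        SubstreamState::PendingSend("ADD_PROVIDER".into()),
        SubstreamState::Idle,
    ];
    advance(&mut states);
}
```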
One more request, otherwise LGTM.
I'd also like @mxinden to approve this. The gist is: By reusing substreams here as the spec allows, we can make use of yamux's backpressure because a stream's buffer will eventually fill up if the remote is slower in reading AddProvider messages than we are with writing them.
Together with a solution for backpressuring how fast we read new messages from the stream, this should allow us to saturate the network or at least get us closer to it.
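For illustration, a minimal sketch of the effect being described, using a generic `AsyncWrite` as a stand-in for the negotiated substream (the helper name is hypothetical):

```rust
use futures::io::{AsyncWrite, AsyncWriteExt};

// Hypothetical helper: send a framed message on an already-open substream
// instead of opening a fresh one per request and closing it afterwards.
async fn send_on_reused_stream<S>(stream: &mut S, msg: &[u8]) -> std::io::Result<()>
where
    S: AsyncWrite + Unpin,
{
    // If the remote reads more slowly than we write, the muxer's per-stream
    // send window fills up and these awaits do not complete until the remote
    // catches up; that is the backpressure described above.
    stream.write_all(msg).await?;
    stream.flush().await
}
```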
I think we could avoid the copies by first doing a … Feel free to leave it as is though.
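The exact suggestion is elided in the comment above, but one way to avoid the copies, as a hedged sketch reusing the same illustrative types as earlier, is to check the variant through a mutable reference first and only move the state out when it actually needs to change:

```rust
fn advance_without_copies(states: &mut [SubstreamState]) {
    for state in states.iter_mut() {
        // Only move a state out when it is the variant being transitioned;
        // all other variants stay untouched in place.
        if matches!(state, SubstreamState::PendingSend(_)) {
            if let SubstreamState::PendingSend(msg) =
                std::mem::replace(state, SubstreamState::Idle)
            {
                println!("sending {msg}");
            }
        }
    }
}
```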
Clippy fails, otherwise LGTM.
Waiting for @mxinden now.
I am sorry for only reviewing this change now. Thanks for the proposal. @nazar-pc can you please expand on the benefit of this change?
The gist is: By reusing substreams here as the spec allows, we can make use of yamux's backpressure because a stream's buffer will eventually fill up if the remote is slower in reading AddProvider messages than we are with writing them.
The default initial Yamux stream receive window is 256 KiB. With this change there is never more than one Kademlia request in flight at once on a single stream. I don't think a single Kademlia request ever exceeds the stream receive window size (256 KiB), so I don't think the Yamux backpressure mechanism will ever kick in.
Together with a solution for backpressuring how fast we read new messages from the stream, this should allow us to saturate the network or at least get us closer to it.
Looking at this from a Kademlia perspective only, I would be surprised if one could saturate a link with Kademlia requests only. Though that is only an intuition.
I would like to move away from reusing streams across all libp2p protocols. Muxers should provide backpressure on new-stream creation (e.g. QUIC does). A consumer should backpressure on the number of inflight requests, which would equal the number of open streams, through the muxer implementation. For per-protocol backpressure see libp2p/specs#520.
Incorrect, the opposite is the intention.
I would like to move away from it too, but it is going to take a while before we can design and ship backpressure via status codes. I would argue that it doesn't matter how we ship the feature: we have a spec-compliant solution here, and once we have backpressure via error codes or something else working, we should be able to change the implementation transparently (for the user). This gives us the best of both worlds: a working solution today, and time to design something more general.
I'm curious: what is the benefit of re-creating streams over and over again rather than creating one and using it for as long as necessary? It sounds less efficient and trickier to manage.
Good point.
I don't think we need status codes.
When reusing streams, how does a producer know how many concurrent requests it can send? In other words, how does it know how many streams it can open and keep open? I would prefer not to keep the magic number.
When not reusing streams, the consumer provides a dynamic limit on the number of streams via the multiplexer's stream-creation backpressure mechanism, i.e. by keeping streams alive on the inbound side, the outbound side is backpressured when creating new outbound streams, i.e. new outbound requests.
When reusing streams and sending requests without a response (here the AddProvider message), …
When not reusing streams, the underlying transport can do retransmits on a per-stream basis, thus one request is not head-of-line blocked by another.
I would treat streams as free, at least from a networking perspective. What makes you think they are expensive?
Fair, I'm certainly onboard with this.
I'm not as familiar with the implementation, but I assumed that the whole point of streams within the same TCP connection is that one stream doesn't block another. Otherwise, what is the point?
Just intuitively, it feels like some sort of negotiation is necessary, so when we're talking about tens or hundreds of concurrent requests on each peer for hours, it adds up. Especially because the messages are small, the overhead will be larger percentage-wise. I might be completely wrong though; I'm not familiar with the implementation details yet.
But our quic implementation is in alpha status and thus not recommended for production, I guess? In general, I don't think these efforts are incompatible with each other. Until we have a back-pressure mechanism across all transports and muxers, we will need to keep the limit to mitigate DoS attacks. As long as we have the limit, we will drop new inbound streams upon bursts of requests without responses, because the client-side limit of 32 gets "out of sync" with the server side. Why can't we provide backpressure through reuse in Kademlia and switch to backpressure through new streams at a later point? Both are spec-compliant because streams CAN be reused but don't have to be.
Streams avoid HoL blocking between each other. Once you start to pipeline requests within a stream, you have HoL blocking again. In fact, the HoL blocking is desirable in this case because it creates a form of backpressure.
No negotiation is necessary for new streams. QUIC operates on a locally-known budget. Yamux only sets a flag on a new payload frame to indicate a new stream. Thus, the only overhead is the muxer's framing, which applies to new and existing streams alike. Together with …
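To ground the framing point, a rough sketch of the fixed-size frame header described by the yamux spec (field layout taken from the spec, not from any particular implementation); a new stream is announced simply by setting the SYN flag on an ordinary frame:

```rust
// yamux frame header (12 bytes), per the yamux specification.
#[allow(dead_code)]
struct YamuxHeader {
    version: u8,    // currently 0
    frame_type: u8, // 0x0 Data, 0x1 Window Update, 0x2 Ping, 0x3 Go Away
    flags: u16,     // 0x1 SYN (opens a new stream), 0x2 ACK, 0x4 FIN, 0x8 RST
    stream_id: u32, // odd/even IDs split between the two connection directions
    length: u32,    // payload length for Data frames
}
```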
This pull request has merge conflicts. Could you please resolve them @nazar-pc? 🙏
Closing here as I think we reached consensus that reusing a stream is not the way forward. Please comment in case you disagree.
Description
This implements outgoing stream reuse in Kademlia by storing the substream for future reuse after a message has been processed successfully, instead of closing the substream right away.
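A minimal, hypothetical sketch of the approach (state and function names are illustrative, not the actual handler types): after a successful send the substream is parked instead of closed, and the next request reuses it when one is available.

```rust
// Illustrative outbound-substream states; the real handler has more of them.
enum OutboundSubstreamState<S> {
    // A previously used substream kept around for the next request.
    Idle(S),
    // A substream currently sending a message.
    PendingSend(S, Vec<u8>),
}

// After a successful send, keep the stream instead of closing it.
fn on_send_complete<S>(stream: S) -> OutboundSubstreamState<S> {
    OutboundSubstreamState::Idle(stream)
}

// A new request first tries to reuse an idle substream; only when none is
// available does the handler need to open a fresh outbound stream.
fn next_request<S>(
    idle: Option<S>,
    msg: Vec<u8>,
) -> Result<OutboundSubstreamState<S>, Vec<u8>> {
    match idle {
        Some(stream) => Ok(OutboundSubstreamState::PendingSend(stream, msg)),
        None => Err(msg), // caller must request a new substream first
    }
}
```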
Notes
This is, unfortunately, not working properly yet, and after spending a few hours on it I could use external help.
Links to any relevant issues
This will be useful for #3468 (regardless of which version of it ends up landing).
Open Questions
None
Change checklist