[mplex] Refactoring with Patches #1769

Merged 17 commits on Sep 28, 2020

Conversation

@romanb (Contributor) commented Sep 22, 2020

Overview

This is a rather substantial refactoring of libp2p-mplex, though the control-flow skeleton remains the same, as dictated by the StreamMuxer API. In the context of looking into #1758, besides a bug in libp2p-ping that only occurs with mplex and for which I will open a separate PR, the following two problems in libp2p-mplex became apparent and are intended to be addressed here:

  • Avoid stalls caused by a read operation on one substream reading (and buffering) frames for another substream
    without notifying the corresponding task. While testing #1758 ("Ping protocol tests fail when mplex is used
    instead of yamux"), even when the ping_pong tests ran through, there were occasional stalls in between. It
    turned out that when a read operation for one substream buffers frames for another, the task that may be
    waiting on those buffered frames is not notified. In the ping_pong test this led to situations where both
    sides sent each other the outbound ping simultaneously, each on its own substream: after the read operations
    for the inbound substreams returned Pending, the read operations for the outbound substreams, waiting for the
    responses, would read and buffer the remote's inbound ping for the other substream and, not finding any frames
    for themselves yet, again return Pending without waking up the task(s) interested in the newly buffered
    frame(s). Only the ping timeout triggered polling again. To avoid needless polling of the same task, it seemed
    necessary to me that read-interest is tracked per substream ID instead of "globally" (a minimal sketch of the
    idea follows this list).

  • Remove dropped substreams from the tracked set of open substreams, to avoid artificially running into substream limits.
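
For illustration only, here is a minimal sketch of the per-substream read-interest idea from the first bullet. It is not the PR's actual code: `ReadInterest`, `register` and `wake_for` are hypothetical names, and a plain `u64` stands in for the crate's `LocalStreamId`; the key `None` represents interest in the next new inbound substream.

```rust
use std::collections::HashMap;
use std::task::Waker;

/// Hypothetical sketch: read-interest keyed by substream ID.
/// The key `None` stands for interest in the next new inbound substream.
#[derive(Default)]
struct ReadInterest {
    wakers: HashMap<Option<u64>, Waker>,
}

impl ReadInterest {
    /// Remember the waker of the task currently polling for `id`,
    /// replacing any previously registered waker for that key.
    fn register(&mut self, id: Option<u64>, waker: &Waker) {
        self.wakers.insert(id, waker.clone());
    }

    /// Called after a read operation has buffered a frame belonging to
    /// `id` while polling on behalf of some other substream: wake the
    /// task (if any) that is waiting for exactly this substream.
    fn wake_for(&mut self, id: Option<u64>) {
        if let Some(w) = self.wakers.remove(&id) {
            w.wake();
        }
    }
}
```

Without the `wake_for` step, the stall described above can occur: a frame sits in the buffer while the task that wants it is never re-polled until some unrelated timeout fires.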

This PR alone is necessary but not sufficient to resolve #1758, since there is also a small bug in libp2p-ping when used in combination with libp2p-mplex. That bug will be addressed in a separate, follow-up PR.

Related Context

Both of the above problems probably relate to #1629 and #1504, which report closely related mplex symptoms (hitting unexpected substream limits and intermittent stalls). Though the latter was closed because the panics were fixed, I think the problem of unexpectedly hitting the substream limit remained unresolved there.

Other Changes

While I was at it, I also went ahead with #313 and thus another change here is:

  • Schedule sending of a Reset frame when an open substream gets dropped.

Furthermore, the semantics of the max_substreams configuration changed as follows:

  • Old behaviour: The connection is immediately closed with an error, regardless of whether the limit is reached by an attempt to open an outbound substream or by a new inbound substream.
  • New behaviour: Outbound substream attempts beyond the configured limit are delayed (Poll::Pending), with a wakeup once an existing substream closes, i.e. the limit results in back-pressure for new outbound substreams. New inbound substreams beyond the limit are immediately answered with a Reset, again a form of back-pressure. If too many pending Reset frames accumulate (beyond some internal threshold), e.g. as a result of an aggressive number of inbound substreams being opened beyond the configured limit, the connection is closed ("DoS protection"). A sketch of this logic follows this list.
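
Purely as an illustration of these semantics (the real logic lives in `muxers/mplex/src/io.rs`; `Limits`, `InboundDecision` and `reset_threshold` are invented names), the decision logic could be sketched roughly like this:

```rust
use std::task::{Context, Poll, Waker};

/// Hypothetical limit bookkeeping, not the crate's actual types.
struct Limits {
    max_substreams: usize,
    open_substreams: usize,
    pending_resets: usize,
    /// Internal threshold beyond which pending resets are treated as abuse.
    reset_threshold: usize,
    /// Task waiting to open an outbound substream once a slot frees up.
    outbound_waker: Option<Waker>,
}

enum InboundDecision {
    /// Accept the new inbound substream.
    Accept,
    /// Answer the new inbound substream with a `Reset` frame (back-pressure).
    Reset,
    /// Too many pending resets accumulated: close the connection ("DoS protection").
    CloseConnection,
}

impl Limits {
    /// Outbound opens beyond the limit are delayed rather than failed.
    fn poll_open_outbound(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if self.open_substreams >= self.max_substreams {
            self.outbound_waker = Some(cx.waker().clone());
            return Poll::Pending;
        }
        self.open_substreams += 1;
        Poll::Ready(())
    }

    /// Inbound opens beyond the limit are answered with a `Reset`; an
    /// excessive number of pending resets closes the connection.
    fn on_inbound_open(&mut self) -> InboundDecision {
        if self.open_substreams < self.max_substreams {
            self.open_substreams += 1;
            InboundDecision::Accept
        } else if self.pending_resets < self.reset_threshold {
            self.pending_resets += 1;
            InboundDecision::Reset
        } else {
            InboundDecision::CloseConnection
        }
    }

    /// Closing a substream frees a slot and wakes a delayed outbound open.
    fn on_substream_closed(&mut self) {
        self.open_substreams -= 1;
        if let Some(w) = self.outbound_waker.take() {
            w.wake();
        }
    }
}
```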

Testing

While the multiplexers themselves still need more testing, including compatibility testing (#508) and ideally comparative performance testing, I have so far done the following (besides passing the libp2p test suite, where mplex is often used, of course):

  • A follow-up PR for libp2p-ping that fixes a small bug and depends on this PR will randomise the choice of multiplexer used for integration tests, expecting the same behaviour. This kind of testing essentially revealed all the issues mentioned here while looking into #1758 ("Ping protocol tests fail when mplex is used instead of yamux").
  • A Substrate node seems to start and sync fine when configured with only mplex and using this branch.

Roman S. Borschel added 2 commits September 22, 2020 15:08
Thereby addressing the following issues:

  * Send a `Reset` frame when open substreams get dropped (313).
  * Avoid stalls caused by a read operation on one substream
    reading (and buffering) frames for another substream without
    notifying the corresponding task. I.e. the tracked read-interest
    must be scoped to a substream.
  * Remove dropped substreams from the tracked set of open
    substreams, to avoid artificially running into substream
    limits.
Roman S. Borschel added 2 commits September 23, 2020 09:27
By taking the substream state into account. The refined
behaviour is modeled after the behaviour of Yamux.
muxers/mplex/src/io.rs (outdated, resolved)
/// Sends pending frames, without flushing.
fn send_pending_frames(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
    while let Some(frame) = self.pending_frames.pop() {
        if self.poll_send_frame(cx, || frame.clone())?.is_pending() {
Member:

Why not call self.pending_frames.pop() within the closure? As far as I understand, if the closure is called, the outcome cannot be Pending.

Contributor Author (romanb):

I don't think that's permissible due to borrowing. Even if poll_send_frame were made a standalone function and thus not borrow all of self, it still calls on_error in case of an I/O error which also wants a mutable borrow for pending_frames to drop/clear it on error. I agree that the pop/push as well as the frame cloning is suboptimal, but I didn't consider it that important or inefficient. The frames sent here are only close/reset frames containing only a stream ID. We could of course make poll_send_frame return the frame in the Pending case with a custom return enum.
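
To make that last remark concrete, one hypothetical shape for such a return type (not code proposed in this PR) might be:

```rust
/// Hypothetical alternative to the pop/clone/push pattern: the send
/// attempt hands the frame back when the underlying sink is not ready.
enum SendOutcome<F> {
    /// The frame was accepted by the sink.
    Sent,
    /// The sink was not ready; the caller re-queues the returned frame.
    NotReady(F),
}

// Illustrative usage inside `send_pending_frames`, assuming a variant of
// `poll_send_frame` that takes the frame by value and returns
// `io::Result<SendOutcome<Frame>>`:
//
// while let Some(frame) = self.pending_frames.pop() {
//     match self.poll_send_frame(cx, frame)? {
//         SendOutcome::Sent => continue,
//         SendOutcome::NotReady(frame) => {
//             self.pending_frames.push(frame);
//             return Poll::Pending;
//         }
//     }
// }
```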

struct NotifierRead {
    /// List of wakers to wake when read operations can proceed
    /// on a substream (or in general, for the key `None`).
    pending: Mutex<FnvHashMap<Option<LocalStreamId>, Waker>>,
Member:

If I understand correctly, wakers whose key (the Option<LocalStreamId>) is Some come from calling poll_read_stream, and wakers whose key is None come from calling poll_next_stream.

If that's accurate, is it correct to have a Waker here rather than a Vec<Waker>?

I find it quite hard to follow the logic, and it doesn't seem very fool-proof, but I also don't know how to make it clearer.

Contributor Author (romanb):

If that's accurate, is it correct to have a Waker here rather than a Vec<Waker>?

Yes.

I find it quite hard to follow the logic, and it doesn't seem very fool-proof, but I also don't know how to make it clearer.

Not sure what logic you mean; I guess the task wakeup logic for reading in general? Yes, that is, as is often the case, subtle. There are two different "threads of control" for reading by design: 1) awaiting the next inbound stream and 2) awaiting new data on a specific stream. Thereby 1) and each instance of 2) potentially read and buffer frames for other streams (or for new inbound streams), which needs to result in wakeup attempts.

Member:

Can one of you expand on why it is safe to only cache a single waker for Option<LocalStreamId>::None?

Say Multiplexed::poll_next_stream is polled by two different tasks in an alternating fashion. Wouldn't it be desirable if both tasks are woken up once a new Open frame arrives?

Contributor Author @romanb, Sep 25, 2020:

Can one of you expand on why it is safe to only cache a single waker for Option::None?

Because the key None is only used by poll_next_stream, which is part of the implementation of StreamMuxer::poll_event, whose API contract explicitly states that "Only the latest task that was used to call this method may be notified." The API contract for StreamMuxer::read_substream, StreamMuxer::write_substream etc. is similar. In general, and as far as I know, the reason for StreamMuxers to be Sync is to permit using different substreams from different threads, not to use the same substream from different threads. Waking only the last task that polls is common for such APIs, e.g. Future::poll has an analogous contract. In practice, I would think the reason for this contract is a matter of practicality: ease and efficiency of implementation, and the relatively rare usefulness of having to "remember" all tasks to wake instead of just the last.

Member:

Thanks for the in-depth explanation.

Member:

I would think the reason for this contract is a matter of practicality due to ease and efficiency of implementation and the relatively rare usefulness for having to "remember" all tasks to wake, instead of just the last.

One big problem with remembering all tasks to wake up is that your container of wakers might grow a lot.

For instance, imagine that it takes 60 seconds before data is received on a substream and that, due to the nature of polling, we poll the substream 10 times per second. This results in 600 calls to poll_read_substream before the waker actually gets woken, which means we would insert 600 elements into the list of wakers.

The will_wake method allows reducing the number of duplicates, and we're also not really supposed to poll a substream 10 times per second, but both of these mechanisms are unreliable.

Enforcing one waker per "thing to wake" guarantees a bound on the number of wakers, which is not a bad idea.
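
A tiny sketch of that bound (hypothetical helper, not from this PR): one waker slot per wakeup target is simply replaced on each poll, and `Waker::will_wake` avoids a redundant clone when the same task polls again.

```rust
use std::task::Waker;

/// Keep at most one waker per wakeup target. Re-polling by the same task
/// is a no-op; a different task replaces the previously stored waker, so
/// the storage never grows with the number of polls.
fn register_waker(slot: &mut Option<Waker>, waker: &Waker) {
    match slot {
        Some(stored) if stored.will_wake(waker) => {} // same task, nothing to do
        _ => *slot = Some(waker.clone()),             // keep only the latest waker
    }
}
```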

Roman S. Borschel added 2 commits September 23, 2020 15:39
 * Make the pending frames a FIFO queue.
 * Take more care to avoid keeping read-wakers around
   and to notify them when streams close.
It is probably safer to always wake pending wakers.
muxers/mplex/src/codec.rs (outdated, resolved)
muxers/mplex/src/io.rs (2 outdated review threads, resolved)
romanb and others added 2 commits September 23, 2020 16:07
Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: Max Inden <mail@max-inden.de>
@romanb (Contributor Author) commented Sep 24, 2020:

@mxinden I assume you're still taking your time to review. There is no rush from my side, I appreciate the time taken for the reviews.

@mxinden (Member) left a comment:

I am fine merging this pull request as is. I have a couple of questions and suggestions, but none of them are important. For what my opinion is worth, I find this to be a well-conceived implementation.

muxers/mplex/src/io.rs (4 outdated review threads, resolved)
romanb and others added 2 commits September 24, 2020 14:27
Co-authored-by: Max Inden <mail@max-inden.de>
While seemingly duplicating some control flow between
`poll_next_stream` and `poll_read_stream`, the individual
control flow of each read operation is easier to follow.
@romanb (Contributor Author) commented Sep 25, 2020:

thread 'async-std/runtime' panicked at 'Peer1: Unexpected event: Event(OutboundFailure { peer: PeerId("12D3KooWPC56uLBNMVu3fLA5jDxA32uxpHUnUpZMEqLAQpjtRFZC"), request_id: RequestId(205), error: ConnectionClosed })', protocols/request-response/tests/ping.rs:164:22
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
test ping_protocol ... FAILED

These tests don't even use mplex, so this sudden error seems unrelated, but I can look into it separately, if it is somewhat reproducible.

    /// The configuration.
    config: MplexConfig,
    /// Buffer of received frames that have not yet been consumed.
    buffer: Vec<Frame<RemoteStreamId>>,
Member:

Just an idea, and I'm not sure it is worth pursuing, especially without a benchmark proving it to be an issue: does the ordering between different StreamIds matter? If not, wouldn't a HashMap<Option<StreamId>, Vec<Frame<RemoteStreamId>>> be more efficient, since poll_next_stream and poll_read_stream would then not have to iterate over the entire buffer (worst case 4096 items) on each call, but only a small subset? Besides the additional complexity in the data structure itself, this would also make buffer length tracking a bit harder.

Contributor Author (romanb):

In principle, yes, I'd prefer to have a read buffer per stream, i.e. a map. Not just because of the potentially suboptimal performance if a single stream is slow to consume its frames, but also because it would allow just Reset-ing a single stream when the buffer limit for that stream is hit, instead of failing the entire connection when the shared buffer is full with the default configuration MaxBufferBehaviour::CloseAll. That would essentially change MaxBufferBehaviour::CloseAll into MaxBufferBehaviour::ResetStream, which is more along the lines of what the mplex spec prescribes in its implementation notes. The configured buffer limit would then apply to a single substream and, together with the substream limit, would still let one reason about resource usage bounds. But in any case, the shared buffer is pre-existing code and I'd really prefer to propose such a change in a separate PR; I didn't want to pile up too many semantic changes here.
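
For context, a rough sketch of what such per-stream buffering could look like; this is not part of this PR, and `PerStreamBuffers`, `BufferOutcome` and `max_buffer_len` are invented names, generic over the crate's stream-ID and frame types.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical per-substream read buffers; the buffer limit applies per
/// stream, so only the offending stream needs to be reset when it is hit.
struct PerStreamBuffers<Id, F> {
    buffers: HashMap<Id, VecDeque<F>>,
    max_buffer_len: usize,
}

enum BufferOutcome {
    Buffered,
    /// Reset only this substream (a `MaxBufferBehaviour::ResetStream`-like
    /// policy) instead of closing the whole connection.
    ResetStream,
}

impl<Id: Hash + Eq, F> PerStreamBuffers<Id, F> {
    /// Buffer a frame for `id`, or signal that this stream should be reset.
    fn push(&mut self, id: Id, frame: F) -> BufferOutcome {
        let buf = self.buffers.entry(id).or_default();
        if buf.len() >= self.max_buffer_len {
            BufferOutcome::ResetStream
        } else {
            buf.push_back(frame);
            BufferOutcome::Buffered
        }
    }

    /// Reads for one substream only touch that substream's queue instead
    /// of scanning a single shared buffer of frames.
    fn pop(&mut self, id: &Id) -> Option<F> {
        self.buffers.get_mut(id).and_then(|q| q.pop_front())
    }
}
```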

muxers/mplex/src/io.rs (resolved)
@mxinden (Member) left a comment:

Still looks good to me. Thanks for all the comments.
