[multistream-select] Require remaining negotiation data to be flushed. #1781

romanb · 2020-09-30T13:32:15Z

There appears to still be an edge-case in multistream-select whereby the remaining data to send w.r.t. protocol is successfully written before a poll_read on a Negotiated stream, but where the subsequent poll_flush() is pending. Now remaining is empty and the next poll_read() will go straight to reading from the underlying I/O stream, despite the flush not having happened
yet, which can lead to a form of deadlock during protocol negotiation. This seems to be the cause for the sporadic upgrade timeouts in #1629.

It was a happy coincidence that I looked into #1629 again just before async-io-1.1.5 was released, because up to 1.1.4, write operations would occasionally yield without really needing to, like this. 1.1.5 removed that random yielding for write operations but it is exactly these random yields that provoke the issue in the setup provided by #1629 (i.e. this code).

Rather than complicating the existing code further in order to accommodate for this case (e.g. by wrapping the remaining bytes in an Option to distinguish whether they have been flushed or not), it seems preferable to simplify the code by giving up on this optimisation that only affects the last negotiation protocol message sent by the "listener". So we give up on the ability to combine data sent by the "listener" immediately after protocol negotiation together with the final negotiation frame in the same transport-level frame/packet. That seems acceptable. The code simplifications should be convincing.

There appears to still be an edge-case whereby the `remaining` data to send w.r.t. protocol negotiation to send is successfully written before a `poll_read` on a `Negotiated` stream, but where the subsequent `poll_flush()` is pending. Now `remaining` is empty and the next `poll_read()` will go straight to reading from the underlying I/O stream, despite the flush not having happened yet, which can lead to a form of deadlock during protocol negotiation. Rather than complicating the existing code further in order to accommodate for this case, it seems preferable to simplify the code by giving up on this optimisation that only affects the last negotiation protocol message sent by the "listener". So we give up on the ability to combine data sent by the "listener" immediately after protocol negotiation together with the final negotiation frame in the same transport-level frame/packet.

romanb · 2020-09-30T13:57:50Z

There appears to still be an edge-case in multistream-select whereby the remaining data to send w.r.t. protocol is successfully written before a poll_read on a Negotiated stream, but where the subsequent poll_flush() is pending. Now remaining is empty and the next poll_read() will go straight to reading from the underlying I/O stream, despite the flush not having happened
yet, which can lead to a form of deadlock during protocol negotiation. This seems to be the cause for the sporadic upgrade timeouts in #1629.

To make this a bit more concrete: If Negotiated::poll_read still has remaining data to send despite being in state Completed, it calls Negotiated::poll() to complete protocol negotiation. Then we try to flush() which writes the remaining negotiation data. However, if the final flush is Pending, remaining is already empty and on the next Negotiated::poll_read() we go straight to reading from the underlying I/O stream, but the remote may still be waiting for the remaining negotiation data/frame before sending and that still sits around unflushed in the buffer. The sporadic, random yields by async-io <= 1.1.4 for write operations, including poll_flush() is what made this more likely to occur every now and then.

mxinden

Great catch! Implementation looks good to me.

So we give up on the ability to combine data sent by the "listener" immediately after protocol negotiation together with the final negotiation frame in the same transport-level frame/packet.

For e.g. TCP connections where TCP_NODELAY is not set, hence Nagle's algorithm still being active, final negotiation frames might still be bundled with upper layer payloads in the same transport-level frame, no?

romanb · 2020-10-01T08:53:44Z

For e.g. TCP connections where TCP_NODELAY is not set, hence Nagle's algorithm still being active, final negotiation frames might still be bundled with upper layer payloads in the same transport-level frame, no?

I think so, yes.

Roman S. Borschel added 2 commits September 30, 2020 15:20

Update changelog.

1067636

romanb mentioned this pull request Sep 30, 2020

Random upgrade timeouts between peers #1629

Closed

Add missing comma.

77e315e

mxinden approved these changes Oct 1, 2020

View reviewed changes

Merge branch 'master' into multiselect-flush-remaining

8334b14

romanb merged commit 8cec457 into libp2p:master Oct 1, 2020

romanb deleted the multiselect-flush-remaining branch October 1, 2020 10:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[multistream-select] Require remaining negotiation data to be flushed. #1781

[multistream-select] Require remaining negotiation data to be flushed. #1781

romanb commented Sep 30, 2020 •

edited

Loading

romanb commented Sep 30, 2020

mxinden left a comment

romanb commented Oct 1, 2020

[multistream-select] Require remaining negotiation data to be flushed. #1781

[multistream-select] Require remaining negotiation data to be flushed. #1781

Conversation

romanb commented Sep 30, 2020 • edited Loading

romanb commented Sep 30, 2020

mxinden left a comment

Choose a reason for hiding this comment

romanb commented Oct 1, 2020

romanb commented Sep 30, 2020 •

edited

Loading