
Invalid Swarm behaviour #2242

Closed
AgeManning opened this issue Sep 21, 2021 · 7 comments

Comments

@AgeManning
Contributor

I've been testing some code on the latest master and running into unexpected behaviour from the Swarm.

I've found that the Swarm injects a ConnectionId into the behaviour via inject_connection_established(). In some less common circumstances (which I've not isolated), upon disconnection the Swarm passes a different ConnectionId to inject_connection_closed().

In gossipsub, this causes a panic, as we expect the swarm to inform the behaviour about a previously injected connection, not a new one.
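
To make the failure mode concrete, here is a minimal sketch of the invariant gossipsub relies on. The types are hypothetical stand-ins, not the real libp2p API:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for libp2p's types, just to illustrate the invariant.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ConnectionId(usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct PeerId(usize);

#[derive(Default)]
struct Behaviour {
    // Connections the swarm reported via `inject_connection_established`.
    connections: HashMap<ConnectionId, PeerId>,
}

impl Behaviour {
    fn inject_connection_established(&mut self, peer: PeerId, id: ConnectionId) {
        self.connections.insert(id, peer);
    }

    fn inject_connection_closed(&mut self, _peer: PeerId, id: ConnectionId) {
        // Gossipsub (and similar behaviours) assume the swarm only ever
        // closes a connection it previously established, hence the panic
        // on an unknown id.
        self.connections
            .remove(&id)
            .expect("closed connection was never established");
    }
}

fn main() {
    let mut b = Behaviour::default();
    b.inject_connection_established(PeerId(1), ConnectionId(7));
    b.inject_connection_closed(PeerId(1), ConnectionId(7)); // fine
    b.inject_connection_closed(PeerId(1), ConnectionId(8)); // panics: unknown id
}
```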

@AgeManning
Contributor Author

AgeManning commented Sep 23, 2021

Ok, I did a bit of digging.

It looks like in #2191, for new connections beyond a connection limit, we now close the connection. Previously we reported a PendingConnectionError as a PoolEvent.

I think we are then injecting connection-closed events into the behaviour with the ConnectionLimit error whenever a connection attempt exceeds the connection limit. Ultimately, a behaviour will see an inject_connection_closed() without a matching inject_connection_established() for the same id.

If this is the intended behaviour, I'll have to update some assumptions in gossipsub and maybe some other behaviours I've built.
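
If the new reporting were kept, "updating assumptions" would essentially mean dropping that invariant, e.g. (reusing the hypothetical types from the sketch above):

```rust
fn inject_connection_closed(&mut self, _peer: PeerId, id: ConnectionId) {
    // Defensive variant: tolerate a close for a connection the behaviour
    // never saw established, e.g. one rejected by the connection limit.
    if self.connections.remove(&id).is_none() {
        eprintln!("ignoring close for unknown connection {:?}", id);
    }
}
```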

What do you guys think? @thomaseizinger @mxinden

@AgeManning
Contributor Author

AgeManning commented Sep 25, 2021

I've also seen the following panics:

* https://github.com/libp2p/rust-libp2p/blob/master/protocols/identify/src/identify.rs#L277

* https://github.com/libp2p/rust-libp2p/blob/master/core/src/network/peer.rs#L412

So I assume we probably need to change how the connections are reported.

@mxinden
Member

mxinden commented Sep 27, 2021

Thank you @AgeManning for the debugging work. Sorry for the trouble this has caused on your end.

> It looks like in #2191, for new connections beyond a connection limit, we now close the connection. Previously we reported a PendingConnectionError as a PoolEvent.
>
> I think we are then injecting connection-closed events into the behaviour with the ConnectionLimit error whenever a connection attempt exceeds the connection limit. Ultimately, a behaviour will see an inject_connection_closed() without a matching inject_connection_established() for the same id.

This is exactly right! In #2191 I changed how connection limit errors are reported. Long story short, I did so in order to be able to return the handler to the behaviour on connection limit errors. More specifically, in pool.rs, in case the established-limit is reached, I do not report a PoolEvent::PendingConnectionError directly, but instead take a detour through task.rs in order to pick up the handler and return it along with the connection limit error as a PoolEvent::ConnectionClosed.
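
A heavily simplified sketch of the two reporting paths (the variant and field names here are illustrative, not the actual pool.rs definitions):

```rust
#![allow(dead_code)]

// Hypothetical, heavily simplified stand-ins; not the actual pool.rs types.
struct ConnectionLimit {
    limit: u32,
    current: u32,
}

enum PoolEvent<THandler> {
    // Old path: reported straight from `pool.rs`. The handler is already
    // owned by the connection task at this point, so it cannot be returned.
    PendingConnectionError { error: ConnectionLimit },
    // New path (#2191): the error takes a detour through `task.rs`, which
    // owns the handler and hands it back together with the error.
    ConnectionClosed { error: ConnectionLimit, handler: THandler },
}

fn main() {}
```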

> If this is the intended behaviour, I'll have to update some assumptions in gossipsub and maybe some other behaviours I've built.

While returning the handler along with the error was obviously the intention of #2191, triggering a NetworkBehaviour::inject_connection_closed without a NetworkBehaviour::inject_connection_established was not. Not only was it unintended, I also find it counterintuitive. With that in mind, I would very much prefer to fix this within libp2p-core and libp2p-swarm instead of expecting NetworkBehaviour implementors to handle this caveat. I guess you would agree, @AgeManning?

The first solution that comes to my mind, which would retain the intended behaviour of #2191 (returning the handler) while upholding the guarantee that every inject_connection_closed is preceded by an inject_connection_established: merge max_pending_incoming with max_established_incoming, and max_pending_outgoing with max_established_outgoing. Limits would then only need to be checked before the connection upgrade, making it possible to return the handler directly, without the detour through task.rs. I don't see much benefit in having two limits (pending and established) in the first place. I expect the time a connection spends in the pending state to be on the order of milliseconds; it only occupies a pending-limit slot for a short while before trying to acquire an established-limit slot.
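
A rough sketch of such an up-front check (all names hypothetical):

```rust
#![allow(dead_code)]

// With the pending and established limits merged, the limit is enforced in
// `Network::dial` / `Network::accept` while the caller still owns the
// handler, so it can be returned directly on error.
struct ConnectionCounters {
    established_outgoing: u32,
    max_outgoing: u32,
}

struct ConnectionLimit {
    limit: u32,
    current: u32,
}

fn dial<THandler>(
    counters: &ConnectionCounters,
    handler: THandler,
) -> Result<(), (ConnectionLimit, THandler)> {
    if counters.established_outgoing >= counters.max_outgoing {
        // Limit hit before a connection task is ever spawned: no detour
        // through task.rs is needed to recover the handler.
        return Err((
            ConnectionLimit {
                limit: counters.max_outgoing,
                current: counters.established_outgoing,
            },
            handler,
        ));
    }
    // ... otherwise spawn the connection task, handing over `handler` ...
    Ok(())
}

fn main() {}
```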

What do people think? Does anyone need to differentiate between pending and established connection limits?

mxinden added a commit to mxinden/rust-libp2p that referenced this issue Sep 28, 2021
Merge pending and established limits for both incoming and outgoing
connections. More specifically merge
`ConnectionLimits::with_max_pending_incoming` with
`ConnectionLimits::with_max_established_incoming` and
`ConnectionLimits::with_max_pending_outgoing` with
`ConnectionLimits::with_max_established_outgoing`. Connection limits are
checked on `Network::dial` for outgoing and on `Network::accept` for
incoming connections.

This (a) simplifies connection limits from an implementation and user
perspective and (b) simplifies returning a connection handler on limit
error, as limits can only be exceeded at the start of dialing and
accepting. See [1].

[1]: libp2p#2242 (comment)
@mxinden
Member

mxinden commented Sep 28, 2021

#2253 showcases the solution suggested above.
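
For users, only the established limits remain after the merge. A configuration would then look roughly like this (the `with_max_established_*` names come from the commit message above; the import path and `Option<u32>` arguments are assumptions):

```rust
// Hypothetical import path; `ConnectionLimits` has moved between modules
// across releases.
use libp2p::core::connection::ConnectionLimits;

fn main() {
    let limits = ConnectionLimits::default()
        .with_max_established_incoming(Some(50))
        .with_max_established_outgoing(Some(50));
    let _ = limits;
}
```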

@mxinden
Member

mxinden commented Sep 28, 2021

> I've also seen the following panics:
>
> * https://github.com/libp2p/rust-libp2p/blob/master/protocols/identify/src/identify.rs#L277
>
> * https://github.com/libp2p/rust-libp2p/blob/master/core/src/network/peer.rs#L412
>
> So I assume we probably need to change how the connections are reported.

I still need to dig deeper into these failures. I am not yet sure whether they are related.

@AgeManning
Contributor Author

This sounds good to me.

I had a look at the identify panic, and it is definitely related. I assume the core/network/peer one is too, but I didn't dive into it.

mxinden added a commit that referenced this issue Oct 14, 2021
Concurrently dial address candidates within a single dial attempt.

The main motivation for this feature is to increase the success rate of hole
punching (see #1896 (comment) for details). As a nice side effect, it also
improves connection establishment time, as one would expect.

Cleanups and fixes done along the way:

- Merge `pool.rs` and `manager.rs`.

- Instead of manually implementing state machines in `task.rs` use
  `async/await`.

- Fix bug where `NetworkBehaviour::inject_connection_closed` is called without a
  previous `NetworkBehaviour::inject_connection_established` (see
  #2242).

- Return handler to behaviour on incoming connection limit error. Missed in
  #2242.
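
To picture the concurrent-dial part, here is a toy model in plain futures (illustrative names only, nothing from rust-libp2p's internals): all address candidates are raced, the first successful connection wins, and the remaining attempts are dropped.

```rust
use futures::future::select_ok;

// Toy model of dialing several address candidates concurrently within a
// single dial attempt.
async fn dial_any(addrs: Vec<&'static str>) -> Result<String, &'static str> {
    let attempts = addrs.into_iter().map(|addr| Box::pin(dial_one(addr)));
    // `select_ok` resolves with the first `Ok`, or the last error if all fail.
    let (connection, _remaining) = select_ok(attempts).await?;
    Ok(connection)
}

// Fake dial: pretend only ip4 addresses are reachable.
async fn dial_one(addr: &'static str) -> Result<String, &'static str> {
    if addr.starts_with("/ip4/") {
        Ok(format!("connected via {addr}"))
    } else {
        Err("unreachable")
    }
}

fn main() {
    let result = futures::executor::block_on(dial_any(vec![
        "/dns4/example.com/tcp/4001",
        "/ip4/203.0.113.1/tcp/4001",
    ]));
    println!("{result:?}"); // Ok("connected via /ip4/203.0.113.1/tcp/4001")
}
```
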
@AgeManning
Contributor Author

From what I can tell, the latest master has resolved this.
