Adapt to rust-libp2p#1440. #5066

romanb · 2020-02-26T14:34:45Z

Overview

These are the current changes necessary for adapting substrate to libp2p/rust-libp2p#1440. As described in the libp2p PR, the underlying changes are primarily in libp2p-core and for the first iteration the impact on the libp2p-swarm API and thus substrate is relatively minimal since at this point the API of libp2p-swarm does not actually permit a NetworkBehaviour to explicitly request multiple connections per peer. That will change later. For the moment, realistically, a second connection to the same peer only occurs if two peers connect to each other "at the same time". As a side-effect, existing connections are also no longer closed in favour of new ones, which should implicitly address #4272, though I didn't get around to verify that yet.

The approach to the integration of the libp2p changes taken here can be summarised as follows (also in the code comments).

Details

The enabled/disabled status is the same across all connections, as decided by the peerset manager. In order to make the GenericProto behaviour aware of all connection handlers (and thus connections), each handler now explicitly emits an Init event as the very first event, requesting initialisation (enable/disable) from the behaviour. This was previously implicit.
send_packet and write_notification always send all data over the same connection to preserve the ordering provided by the transport, as long as that connection is open. If it closes, a second open connection may take over, if one exists, but that case should be no different than a single connection failing and being re-established in terms of potential reordering and dropped messages. Messages can be received on any connection.
The behaviour reports GenericProtoOut::CustomProtocolOpen when the first connection reports NotifsHandlerOut::Open.
The behaviour reports GenericProtoOut::CustomProtocolClosed when the last connection reports NotifsHandlerOut::Closed.

In this way, the number of actual established connections to the peer is an implementation detail of the GenericProto behaviour. As mentioned before, in practice and at the time of this writing, there may be at most two connections to a peer and only as a result of simultaneous dialing. However, the implementation accommodates for any number of connections.

Noteworthy

During intermediate testing with the (by default disabled) integration tests test_consensus, test_sync and test_connectivity it was revealed that when run in release mode these tests were very often failing, with the common symptom that the last node to start in a round of testing would often see no other peers (i.e. empty DHT routing table) and thus make no progress while all the others keep on running, causing the tests to time out waiting for the problematic peer to reach a certain state. The tests are mainly using add_reserved_peer on the network to initialise the topology, however, add_reserved_peer ultimately results in a call to add_known_peer on the DiscoveryBehaviour which did not actually add that address to the Kademlia routing table, though it adds it to the user_defined peers which, when passed in the constructor of the behaviour, are added to the Kademlia routing table. I thus changed add_known_peer to also add the given address to the Kademlia routing table and that resolved the issues with these integration tests and the test_connectivity test seems to run notably faster (release mode). My current guess is that the tests were so far unknowingly relying on a timing assumption w.r.t. the initial discovery / connection setup and DHT queries in order for all peers to find each other, in particular when simultaneous connections attempts are in play, as often happens in release mode. Ultimately, the change of letting add_known_peer add the given address to the Kademlia routing table may be a patch worth extracting separately, because it does look like an oversight to me.

romanb · 2020-03-17T13:43:59Z

Superseded by #5278.

romanb added the A3-in_progress Pull request is in progress. No review needed at this stage. label Feb 26, 2020

romanb force-pushed the libp2p-multicon-compat branch from 5387599 to af8748a Compare February 28, 2020 16:51

Adapt to rust-libp2p#1440.

e1d687b

romanb force-pushed the libp2p-multicon-compat branch from a801957 to e1d687b Compare March 4, 2020 11:17

romanb mentioned this pull request Mar 4, 2020

Multiple connections per peer libp2p/rust-libp2p#1440

Merged

romanb removed the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 5, 2020

gavofyork added the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 5, 2020

This was referenced Mar 12, 2020

protocols/kad: Fix right shift overflow panic in record_received libp2p/rust-libp2p#1492

Merged

libp2p-next #5278

Merged

romanb closed this Mar 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adapt to rust-libp2p#1440. #5066

Adapt to rust-libp2p#1440. #5066

romanb commented Feb 26, 2020 •

edited

Loading

romanb commented Mar 17, 2020

Adapt to rust-libp2p#1440. #5066

Adapt to rust-libp2p#1440. #5066

Conversation

romanb commented Feb 26, 2020 • edited Loading

Overview

Details

Noteworthy

romanb commented Mar 17, 2020

romanb commented Feb 26, 2020 •

edited

Loading