Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Adapt to rust-libp2p#1440. #5066

Closed
wants to merge 1 commit into from

Conversation

romanb
Copy link
Contributor

@romanb romanb commented Feb 26, 2020

Overview

These are the current changes necessary for adapting substrate to libp2p/rust-libp2p#1440. As described in the libp2p PR, the underlying changes are primarily in libp2p-core and for the first iteration the impact on the libp2p-swarm API and thus substrate is relatively minimal since at this point the API of libp2p-swarm does not actually permit a NetworkBehaviour to explicitly request multiple connections per peer. That will change later. For the moment, realistically, a second connection to the same peer only occurs if two peers connect to each other "at the same time". As a side-effect, existing connections are also no longer closed in favour of new ones, which should implicitly address #4272, though I didn't get around to verify that yet.

The approach to the integration of the libp2p changes taken here can be summarised as follows (also in the code comments).

Details

  1. The enabled/disabled status is the same across all connections, as decided by the peerset manager. In order to make the GenericProto behaviour aware of all connection handlers (and thus connections), each handler now explicitly emits an Init event as the very first event, requesting initialisation (enable/disable) from the behaviour. This was previously implicit.
  2. send_packet and write_notification always send all data over the same connection to preserve the ordering provided by the transport, as long as that connection is open. If it closes, a second open connection may take over, if one exists, but that case should be no different than a single connection failing and being re-established in terms of potential reordering and dropped messages. Messages can be received on any connection.
  3. The behaviour reports GenericProtoOut::CustomProtocolOpen when the first connection reports NotifsHandlerOut::Open.
  4. The behaviour reports GenericProtoOut::CustomProtocolClosed when the last connection reports NotifsHandlerOut::Closed.

In this way, the number of actual established connections to the peer is an implementation detail of the GenericProto behaviour. As mentioned before, in practice and at the time of this writing, there may be at most two connections to a peer and only as a result of simultaneous dialing. However, the implementation accommodates for any number of connections.

Noteworthy

During intermediate testing with the (by default disabled) integration tests test_consensus, test_sync and test_connectivity it was revealed that when run in release mode these tests were very often failing, with the common symptom that the last node to start in a round of testing would often see no other peers (i.e. empty DHT routing table) and thus make no progress while all the others keep on running, causing the tests to time out waiting for the problematic peer to reach a certain state. The tests are mainly using add_reserved_peer on the network to initialise the topology, however, add_reserved_peer ultimately results in a call to add_known_peer on the DiscoveryBehaviour which did not actually add that address to the Kademlia routing table, though it adds it to the user_defined peers which, when passed in the constructor of the behaviour, are added to the Kademlia routing table. I thus changed add_known_peer to also add the given address to the Kademlia routing table and that resolved the issues with these integration tests and the test_connectivity test seems to run notably faster (release mode). My current guess is that the tests were so far unknowingly relying on a timing assumption w.r.t. the initial discovery / connection setup and DHT queries in order for all peers to find each other, in particular when simultaneous connections attempts are in play, as often happens in release mode. Ultimately, the change of letting add_known_peer add the given address to the Kademlia routing table may be a patch worth extracting separately, because it does look like an oversight to me.

@romanb romanb added the A3-in_progress Pull request is in progress. No review needed at this stage. label Feb 26, 2020
@romanb romanb removed the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 5, 2020
@gavofyork gavofyork added the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 5, 2020
@romanb
Copy link
Contributor Author

romanb commented Mar 17, 2020

Superseded by #5278.

@romanb romanb closed this Mar 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-in_progress Pull request is in progress. No review needed at this stage.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants