This repository has been archived by the owner on Nov 15, 2023. It is now read-only.
Overview
These are the changes currently necessary for adapting substrate to libp2p/rust-libp2p#1440. As described in the libp2p PR, the underlying changes are primarily in `libp2p-core`. For the first iteration the impact on the `libp2p-swarm` API, and thus on substrate, is relatively minimal, since at this point the API of `libp2p-swarm` does not actually permit a `NetworkBehaviour` to explicitly request multiple connections per peer. That will change later. For the moment, realistically, a second connection to the same peer only occurs if two peers connect to each other "at the same time". As a side effect, existing connections are also no longer closed in favour of new ones, which should implicitly address #4272, though I haven't gotten around to verifying that yet.

The approach to the integration of the libp2p changes taken here can be summarised as follows (also in the code comments).
Details
- To make the `GenericProto` behaviour aware of all connection handlers (and thus connections), each handler now explicitly emits an `Init` event as the very first event, requesting initialisation (enable/disable) from the behaviour. This was previously implicit.
- `send_packet` and `write_notification` always send all data over the same connection to preserve the ordering provided by the transport, as long as that connection is open. If it closes, a second open connection may take over, if one exists. In terms of potential reordering and dropped messages, that case should be no different from a single connection failing and being re-established. Messages can be received on any connection.
- The behaviour emits `GenericProtoOut::CustomProtocolOpen` when the first connection reports `NotifsHandlerOut::Open`.
- The behaviour emits `GenericProtoOut::CustomProtocolClosed` when the last connection reports `NotifsHandlerOut::Closed`.

In this way, the number of actual established connections to the peer is an implementation detail of the `GenericProto` behaviour. As mentioned before, in practice and at the time of this writing, there may be at most two connections to a peer, and only as a result of simultaneous dialing. However, the implementation accommodates any number of connections.

Noteworthy
During intermediate testing with the (by default disabled) integration tests `test_consensus`, `test_sync` and `test_connectivity`, it turned out that these tests very often failed when run in release mode. The common symptom was that the last node to start in a round of testing would often see no other peers (i.e. an empty DHT routing table) and thus make no progress while all the others kept running, causing the tests to time out waiting for the problematic peer to reach a certain state.

The tests mainly use `add_reserved_peer` on the network to initialise the topology. However, `add_reserved_peer` ultimately results in a call to `add_known_peer` on the `DiscoveryBehaviour`, which did not actually add that address to the Kademlia routing table. It does add it to the `user_defined` peers, which are added to the Kademlia routing table when passed in the constructor of the behaviour. I thus changed `add_known_peer` to also add the given address to the Kademlia routing table. That resolved the issues with these integration tests, and the `test_connectivity` test seems to run notably faster (release mode).

My current guess is that the tests were so far unknowingly relying on a timing assumption w.r.t. the initial discovery / connection setup and DHT queries in order for all peers to find each other, in particular when simultaneous connection attempts are in play, as often happens in release mode. Ultimately, the change of letting `add_known_peer` add the given address to the Kademlia routing table may be a patch worth extracting separately, because it does look like an oversight to me.
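
To illustrate the `add_known_peer` change, here is a minimal, self-contained sketch. It is a toy model only, not the actual `DiscoveryBehaviour` code: `ToyDiscovery`, `knows`, the `String` aliases for `PeerId` and `Multiaddr`, and the `HashMap` standing in for the Kademlia routing table are all hypothetical names introduced for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for libp2p's `PeerId` and `Multiaddr` types.
type PeerId = String;
type Multiaddr = String;

/// Toy model of a discovery behaviour (NOT the real `DiscoveryBehaviour`).
#[derive(Default)]
struct ToyDiscovery {
    /// Addresses supplied by the user.
    user_defined: Vec<(PeerId, Multiaddr)>,
    /// Simplified stand-in for the Kademlia routing table.
    routing_table: HashMap<PeerId, HashSet<Multiaddr>>,
}

impl ToyDiscovery {
    /// Peers passed to the constructor end up in both collections.
    fn new(initial: Vec<(PeerId, Multiaddr)>) -> Self {
        let mut discovery = ToyDiscovery::default();
        for (peer, addr) in initial {
            discovery.add_known_peer(peer, addr);
        }
        discovery
    }

    /// Records the address as user-defined and also inserts it into the
    /// routing table, so peers added after construction become discoverable.
    fn add_known_peer(&mut self, peer: PeerId, addr: Multiaddr) {
        self.user_defined.push((peer.clone(), addr.clone()));
        // The step modelled on the fix: without it, only `user_defined` grows.
        self.routing_table.entry(peer).or_default().insert(addr);
    }

    /// Whether the (toy) routing table has at least one address for `peer`.
    fn knows(&self, peer: &str) -> bool {
        self.routing_table.contains_key(peer)
    }
}

fn main() {
    let mut discovery = ToyDiscovery::new(vec![(
        "peer-a".to_string(),
        "/ip4/10.0.0.1/tcp/30333".to_string(),
    )]);

    // A peer added after construction is now visible in the routing table.
    discovery.add_known_peer("peer-b".to_string(), "/ip4/10.0.0.2/tcp/30333".to_string());
    assert!(discovery.knows("peer-a"));
    assert!(discovery.knows("peer-b"));
}
```

In this model, removing the `routing_table` insertion from `add_known_peer` reproduces the described symptom: a peer added after construction is recorded in `user_defined` but is absent from the routing table.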