Fix regression w.r.t. reporting of dial errors. #1493
Conversation
PR [#1440] introduced a regression w.r.t. the reporting of dial errors. In particular, if a connection attempt failed due to an invalid remote peer ID, any remaining addresses for the same peer were not tried (intentional), but the dial failure was also not reported to the behaviour, causing e.g. libp2p-kad queries to potentially stall.

In hindsight, I figured it is better to preserve the previous behaviour of still trying the peer's alternative addresses even after an invalid peer ID error on an earlier address. In particular, in the context of libp2p-kad it is not uncommon for peers to report localhost addresses while the local node actually has e.g. an ipfs node running on that address, obviously with a different peer ID; this is the scenario causing frequent invalid peer ID (mismatch) errors when running the ipfs-kad example in the go-ipfs docker container.

This PR thus restores the previous behaviour of trying all remaining addresses on invalid peer ID errors, and ensures that `inject_dial_error` is always called when the last attempt failed, regardless of whether the peer is already connected (e.g. as a result of a simultaneous incoming connection). Overall this is a slight simplification, though it requires some additional cloning of peer IDs (as was also done before #1440).

[#1440]: libp2p#1440
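The restored dialing behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the actual libp2p `Pool` code; the `DialError` enum and `dial_all` helper are hypothetical simplifications showing the essential logic: every remaining address is still attempted after an invalid peer ID error, and the failure of the final attempt is the one that gets reported (the point where `inject_dial_error` would be invoked on the behaviour).

```rust
/// Simplified stand-in for the kinds of dial failures discussed in this PR.
#[derive(Debug, PartialEq)]
enum DialError {
    /// The remote answered with an unexpected peer ID (mismatch).
    InvalidPeerId,
    /// The address could not be reached at all.
    Unreachable,
}

/// Try each address in turn. Returns the first address that connects
/// successfully, or the error of the *last* attempt. Note that an
/// `InvalidPeerId` error no longer aborts the dial silently: it is treated
/// like any other per-address failure, and the remaining addresses are tried.
fn dial_all<'a, F>(addrs: &[&'a str], mut attempt: F) -> Result<&'a str, DialError>
where
    F: FnMut(&str) -> Result<(), DialError>,
{
    let mut last_err = DialError::Unreachable;
    for &addr in addrs {
        match attempt(addr) {
            Ok(()) => return Ok(addr),
            // Remember the error; it is only surfaced if no address succeeds.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

fn main() {
    // The first address hosts a different node (peer ID mismatch, e.g. a
    // local ipfs daemon); the second address is the peer we actually want.
    let addrs = ["/ip4/127.0.0.1/tcp/5001", "/ip4/192.0.2.1/tcp/4001"];
    let result = dial_all(&addrs, |a| {
        if a.contains("127.0.0.1") {
            Err(DialError::InvalidPeerId)
        } else {
            Ok(())
        }
    });
    // The dial still succeeds via the second address.
    assert_eq!(result, Ok("/ip4/192.0.2.1/tcp/4001"));

    // If every attempt fails, the last error is reported instead of being
    // swallowed -- the regression this PR fixes.
    let result = dial_all(&addrs, |_| Err(DialError::InvalidPeerId));
    assert_eq!(result, Err(DialError::InvalidPeerId));
}
```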
Good catch!
core/src/connection/pool.rs
Outdated
assert_ne!(&self.local_id, entry.connected().peer_id());
if let Some(peer) = peer {
    assert_eq!(&peer, entry.connected().peer_id());
I suppose that this is because you require `Debug` above? If so, I'd prefer to not enforce `Debug` if possible.
Suggested change:
- assert_ne!(&self.local_id, entry.connected().peer_id());
- if let Some(peer) = peer {
-     assert_eq!(&peer, entry.connected().peer_id());
+ assert_ne!(&self.local_id, entry.connected().peer_id(), "Unexpected local peer ID");
+ if let Some(peer) = peer {
+     assert_eq!(&peer, entry.connected().peer_id(), "PeerId mismatch");
`fmt::Debug` is required by `assert_eq!` and `assert_ne!` for the arguments, so are you suggesting something like:

if &self.local_id == entry.connected().peer_id() {
    panic!("...")
}
if let Some(peer) = peer {
    if peer != entry.connected().peer_id() {
        panic!("...")
    }
}

? It is unfortunate, though, not to have the problematic peer IDs in the output. Why is it undesirable to require `fmt::Debug`?
I have now done this: 5807155
> Why is it undesirable to require `fmt::Debug`?

I just see it as "more correct" to only enforce the minimum possible set of traits. In an ideal world, we would print the `PeerId` if it implements `Debug` and not print it if it doesn't. But that's unfortunately not possible.
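The trade-off discussed above can be made concrete with a small sketch (the generic `check_*` functions here are hypothetical illustrations, not code from this PR): `assert_eq!` formats both operands with `{:?}` in its panic message, so it imposes a `fmt::Debug` bound, whereas a hand-rolled comparison with `panic!` needs only `PartialEq`, at the cost of not showing the mismatching values.

```rust
use std::fmt;

// Needs PartialEq (for the comparison) *and* Debug, because assert_eq!
// prints both operands with {:?} when the assertion fails.
fn check_with_assert<T: PartialEq + fmt::Debug>(expected: &T, actual: &T) {
    assert_eq!(expected, actual, "PeerId mismatch");
}

// Needs only PartialEq: the minimal trait bound, but the panic message
// cannot include the problematic values -- the downside noted above.
fn check_with_panic<T: PartialEq>(expected: &T, actual: &T) {
    if expected != actual {
        panic!("PeerId mismatch");
    }
}

fn main() {
    // Both pass silently on equal values.
    check_with_assert(&42u32, &42u32);
    check_with_panic(&"peer-a", &"peer-a");
}
```

Conditionally printing the value only when `T: Debug` would require specialization, which is not available on stable Rust, hence the "unfortunately not possible" above.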
core/src/connection/pool.rs
Outdated
@@ -536,7 +555,7 @@ where
     PoolEvent<'a, TInEvent, TOutEvent, THandler, TTransErr, THandlerErr, TConnInfo, TPeerId>
 > where
     TConnInfo: ConnectionInfo<PeerId = TPeerId> + Clone,
-    TPeerId: Clone,
+    TPeerId: Clone + fmt::Debug,
Suggested change:
- TPeerId: Clone + fmt::Debug,
+ TPeerId: Clone,
core/src/network.rs
Outdated
@@ -331,7 +330,7 @@ where
     THandler::Handler: ConnectionHandler<Substream = Substream<TMuxer>, InEvent = TInEvent, OutEvent = TOutEvent> + Send + 'static,
     <THandler::Handler as ConnectionHandler>::Error: error::Error + Send + 'static,
     TConnInfo: Clone,
-    TPeerId: AsRef<[u8]> + Send + 'static,
+    TPeerId: fmt::Debug + Send + 'static,
Suggested change:
- TPeerId: fmt::Debug + Send + 'static,
+ TPeerId: Send + 'static,
This PR restores the ipfs-kad example as an integration test for CI, which should now be stable again.