Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression w.r.t. reporting of dial errors. #1493

Merged
merged 2 commits into from
Mar 16, 2020

Conversation

romanb
Copy link
Contributor

@romanb romanb commented Mar 11, 2020

#1440 introduced a regression w.r.t. the reporting of dial errors. In particular, if a connection attempt fails due to an invalid remote peer ID, any remaining addresses for the same peer would not be tried (intentional) but the dial failure would not be reported to the behaviour, causing e.g. libp2p-kad queries to potentially stall.

In hindsight, I figured it is better to preserve the previous behaviour to still try alternative addresses
of the peer even on invalid peer ID errors on an earlier address. In particular because in the context of libp2p-kad it is not uncommon for peers to report localhost addresses while the local node actually has e.g. an ipfs node running on that address, obviously with a different peer ID, which is the scenario causing frequent invalid peer ID (mismatch) errors when running the ipfs-kad example in the go-ipfs docker container.

This PR thus restores the previous behaviour w.r.t. trying all remaining addresses on invalid peer ID errors, as well as making sure inject_dial_error is always called when the last attempt failed, regardless of whether the peer is already connected or not (e.g. as a result of a simultaneous incoming connection).

Overall this is a slight simplification, though requiring some additional cloning of peer IDs (as was also done before #1440).

This PR restores the ipfs-kad example as an integration test for CI, which should now be stable again.

PR [1440] introduced a regression w.r.t. the reporting of
dial errors. In particular, if a connection attempt fails
due to an invalid remote peer ID, any remaining addresses
for the same peer would not be tried (intentional) but
the dial failure would not be reported to the behaviour,
causing e.g. libp2p-kad queries to potentially stall.

In hindsight, I figured it is better to preserve the
previous behaviour to still try alternative addresses
of the peer even on invalid peer ID errors on an earlier
address. In particular because in the context of libp2p-kad
it is not uncommon for peers to report localhost addresses
while the local node actually has e.g. an ipfs node running
on that address, obviously with a different peer ID, which
is the scenario causing frequent invalid peer ID (mismatch)
errors when running the ipfs-kad example.

This commit thus restores the previous behaviour w.r.t.
trying all remaining addresses on invalid peer ID errors
as well as making sure `inject_dial_error` is always
called when the last attempt failed.

[1440]: libp2p#1440.
@romanb romanb changed the title [Just a test] Fix regression w.r.t. reporting of dial errors. Mar 12, 2020
@romanb romanb marked this pull request as ready for review March 12, 2020 15:39
@romanb romanb requested review from tomaka and twittner March 12, 2020 15:39
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Comment on lines 614 to 616
assert_ne!(&self.local_id, entry.connected().peer_id());
if let Some(peer) = peer {
assert_eq!(&peer, entry.connected().peer_id());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that this is because you require Debug above?
If so, I'd prefer to not enforce Debug if possible.

Suggested change
assert_ne!(&self.local_id, entry.connected().peer_id());
if let Some(peer) = peer {
assert_eq!(&peer, entry.connected().peer_id());
assert_ne!(&self.local_id, entry.connected().peer_id(), "Unexpected local peer ID");
if let Some(peer) = peer {
assert_eq!(&peer, entry.connected().peer_id(), "PeerId mismatch");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt::Debug is required by assert_eq! and assert_ne! for the arguments, so are you suggesting something like:

if &self.local_id == entry.connected().peer_id() {
    panic!("...")
}
if let Some(peer) = peer {
    if peer != entry.connected().peer_id() {
        panic!("...")
    }
}

? It is unfortunate though not to have the problematic peer IDs in the output. Why is it undesirable to require fmt::Debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now done this: 5807155

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it undesirable to require fmt::Debug?

I just see it as "more correct" to only enforce the minimum possible set of traits.
In an ideal world, we would print the PeerId if it implements Debug and not print it if it doesn't. But that's unfortunately not possible.

@@ -536,7 +555,7 @@ where
PoolEvent<'a, TInEvent, TOutEvent, THandler, TTransErr, THandlerErr, TConnInfo, TPeerId>
> where
TConnInfo: ConnectionInfo<PeerId = TPeerId> + Clone,
TPeerId: Clone,
TPeerId: Clone + fmt::Debug,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TPeerId: Clone + fmt::Debug,
TPeerId: Clone,

@@ -331,7 +330,7 @@ where
THandler::Handler: ConnectionHandler<Substream = Substream<TMuxer>, InEvent = TInEvent, OutEvent = TOutEvent> + Send + 'static,
<THandler::Handler as ConnectionHandler>::Error: error::Error + Send + 'static,
TConnInfo: Clone,
TPeerId: AsRef<[u8]> + Send + 'static,
TPeerId: fmt::Debug + Send + 'static,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TPeerId: fmt::Debug + Send + 'static,
TPeerId: Send + 'static,

@romanb romanb merged commit 58ee13b into libp2p:master Mar 16, 2020
@romanb romanb deleted the dial-error branch March 16, 2020 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants