Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(swarm): gracefully disable oneshot handler on dial upgrade errors #3577

Merged
merged 3 commits into from
Mar 13, 2023

Conversation

vnermolaev
Copy link
Contributor

@vnermolaev vnermolaev commented Mar 9, 2023

Description

Resolves #3269.

Notes

This PR implement changes discussed in #3269

I’d propose to log inbound upgrade errors on debug level in the above spot.

Sounds good to me.

In the above code snippet, outbound upgrade errors lead to closing the connection. I’m not sure whether that is the best possible response given that many protocols may be composed and thus the escalation’s scope seems too large.

Agreed. Setting its keep_alive to False should be enough.

Links to any relevant issues

#3269

Change checklist

The changes are trivial. Not sure whether the usual checklist applies.

@vnermolaev vnermolaev changed the title fix(swarm) Log error. Set keep_alive to false. fix(swarm): Log error. Set keep_alive to false. Mar 9, 2023
@thomaseizinger thomaseizinger changed the title fix(swarm): Log error. Set keep_alive to false. fix(swarm): gracefully disable oneshot handler on dial upgrade errors Mar 10, 2023
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -213,7 +213,9 @@ where
}
ConnectionEvent::DialUpgradeError(DialUpgradeError { error, .. }) => {
if self.pending_error.is_none() {
log::debug!("DialUpgradeError: {error:?}");
self.pending_error = Some(error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this, otherwise we will still close the connection later!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon me, remove self.pending_error = Some(error)?
I am not sure what implications are, so I kept the changes aligned with the approach outlined in the respective issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything within pending_error will be returned via ProtocolsHandlerEvent::Close which effectively closes the connection. In this case, we don't want that but set keep_alive to No which, if not other handler needs the connection will result in closing it.

@@ -213,7 +213,9 @@ where
}
ConnectionEvent::DialUpgradeError(DialUpgradeError { error, .. }) => {
if self.pending_error.is_none() {
log::debug!("DialUpgradeError: {error:?}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log::debug!("DialUpgradeError: {error:?}");
log::debug!("DialUpgradeError: {error}");

Errors can be display formatted but at the moment, that does not necessarily print the entire chain of errors.

@mxinden Should we depend on anyhow internally here to print all errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a function print_error_chain in swarm/src/lib.rs. I could use that one. Is this an option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to use that one yeah as we are within the same crate!

swarm/src/handler/one_shot.rs Show resolved Hide resolved
@@ -467,7 +467,7 @@ where
ConnectionHandlerUpgrErr::Timer => {
write!(f, "Timer error while opening a substream")
}
ConnectionHandlerUpgrErr::Upgrade(err) => write!(f, "{err}"),
err @ ConnectionHandlerUpgrErr::Upgrade(_) => crate::print_error_chain(f, err),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is what we want, instead, this should probably just print "upgrade error" without printing the inner error. That is what print_error_chain then does because the inner error is returned from source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use print_error_chain here, we might end up with duplicate printing if somebody uses print_error_chain on ConnectionHandlerUpgrErr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right.
Please see my other comment.
It'd be possible to mimic impl fmt::Display for DialError.

@vnermolaev vnermolaev force-pushed the bug/swallowe-error branch 3 times, most recently from 6ec12e4 to 8b4c290 Compare March 13, 2023 17:31
@vnermolaev
Copy link
Contributor Author

One test is failing but this is due some docker issues.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@thomaseizinger
Copy link
Contributor

One test is failing but this is due some docker issues.

Yes that is unfortunately a flaky test.

Can you add a small changelog entry to libp2p-swarm please?

@vnermolaev
Copy link
Contributor Author

Is it really necessary?
Last time I did a larger refactoring and no changelog has been required because it was not a user-facing feature.
This one does not seem to be user-facing. I can ofc add a change log, but what should I even write? :)

@thomaseizinger
Copy link
Contributor

This one does not seem to be user-facing.

I'd argue that this one is definitely a change in behaviour that is observable.

I can ofc add a change log, but what should I even write? :)

Feel free to copy the PR title (modulo formatting adjustments)!

@vnermolaev
Copy link
Contributor Author

done

@mergify
Copy link
Contributor

mergify bot commented Mar 13, 2023

This pull request has merge conflicts. Could you please resolve them @vnermolaev? 🙏

@mergify mergify bot merged commit 2a18f7a into libp2p:master Mar 13, 2023
@thomaseizinger
Copy link
Contributor

Thank you!

I've opened #3591 as a spiritual follow-up because we currently have this "problem" in many crates actually.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work here. @thomaseizinger or @vnermolaev can either of you address the comments below?

I hope that this is not a breaking change. In other words, I hope that no downstream user depends on the fact that the OneShotHandler closes a connection if a remote does not support its protocol.

@@ -90,6 +90,8 @@
- Deprecate methods `Swarm::with_executor`, `Swarm::with_*_executor`, `Swarm::without_executor`.
Introduce similar methods in `SwarmBuilder`. See [PR 3588].

- Gracefully disable oneshot handler on dial upgrade errors. See [PR 3577].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the assumption that this is not a breaking change, this pull request should have bumped libp2p-swarm to v0.42.1 and this changelog entry should have been under the v0.42.1 section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to address this because it's been merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just fix it in a new PR.

@@ -213,7 +213,8 @@ where
}
ConnectionEvent::DialUpgradeError(DialUpgradeError { error, .. }) => {
if self.pending_error.is_none() {
self.pending_error = Some(error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this was the only occurrence of setting self.pending_error. In other words, I don't think pending_error is used anywhere after this change. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost; there is one more place

if let Some(err) = self.pending_error.take() {
    return Poll::Ready(ConnectionHandlerEvent::Close(err));
}

The resolution of this usage is related to #3591.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if it is never set, then this is dead code so we can delete it straight-away!

Sorry for missing that @mxinden.

@thomaseizinger
Copy link
Contributor

I hope that this is not a breaking change. In other words, I hope that no downstream user depends on the fact that the OneShotHandler closes a connection if a remote does not support its protocol.

I'd be very surprised if it breaks something. I consider this a bugfix to be honest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

swallowed errors, or escalating too quickly
3 participants