Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relay: Destination peer disconnects after idle time #2102

Closed
elenaf9 opened this issue Jun 14, 2021 · 3 comments
Closed

Relay: Destination peer disconnects after idle time #2102

elenaf9 opened this issue Jun 14, 2021 · 3 comments

Comments

@elenaf9
Copy link
Contributor

elenaf9 commented Jun 14, 2021

From what I have tested, the connection between the destination peer that is listening on a relayed address, and the associated relay, is not kept alive. Instead, it seems like it is closing after some idle time.
If then a source peer tries to connect to the destination peer, it seems like the relay is trying to relay the connection, but the destination peer rejects the incoming destination request (relay/src/behaviour.rs, line 508).
From what is written in PR #1838, the connection should be kept alive and this is a bug, but maybe I am missing something?

This issue should be replicable with this changed test https://github.com/elenaf9/rust-libp2p/blob/d587498e1d13b3cf74880b47d20f9771ab9daed8/protocols/relay/tests/lib.rs#L51, that tries to connect to the destination peer twice, with 20s delay in between.

@mxinden
Copy link
Member

mxinden commented Jun 16, 2021

the connection should be kept alive and this is a bug

Yes.

The connection needs to be kept alive both by the destination and the relay.

The destination should keep a connection, via which it is listening, alive, more specifically used_for_listening should be true and thus KeepAlive::Yes returned:

if self.used_for_listening
|| !self.deny_futures.is_empty()
|| !self.accept_dst_futures.is_empty()
|| !self.copy_futures.is_empty()
|| !self.alive_lend_out_substreams.is_empty()
{
// Protocol handler is busy.
self.keep_alive = KeepAlive::Yes;

The relay should keep the connection alive, though today, there is no good mechanism in place to enforce this, other than by setting connection_idle_count high:

/// How long to keep connections alive when they're idle.
///
/// For a server, acting as a relay, allowing other nodes to listen for
/// incoming connections via oneself, this should likely be increased in
/// order not to force the peer to reconnect too regularly.
pub connection_idle_timeout: Duration,

In the ideal case, when the relay closes an idle connection via which a destination is listening, the destination should attempt to reconnect (likely with an exponential backoff). This is not the case today:

} else {
// There are no more connections to the relay left that
// could be promoted as primary. Remove the listener,
// notifying the listener by dropping the channel to it.
self.listeners.remove(peer);
}

Is a high connection_idle_count a valid interim fix for you @elenaf9? I am a bit reluctant to fix this in circuit relay v1 myself, given that the circuit relay v2 implementation will solve this issue via the notion of reservations which are being refreshed repeatedly. That said, I am happy to collaborate on a fix for circuit relay v1.

Let me know what you think.

@elenaf9
Copy link
Contributor Author

elenaf9 commented Jun 17, 2021

@mxinden Thanks for the detailed explanation!

Is a high connection_idle_count a valid interim fix for you @elenaf9?

Yes, that should be enough for now (you mean RelayConfig.connection_idle_timeout, right?).
From how I understand the code after looking at it again, the relay server is not aware that the destination peer is "listening on" it. Thus I also don't really see an easy fix for the relay to only keep the connection to destination nodes alive.

@elenaf9 elenaf9 closed this as completed Jun 17, 2021
@mxinden
Copy link
Member

mxinden commented Jun 17, 2021

Yes, that should be enough for now

Great. Please subscribe to #2059 which will provides a solution by the protocol design.

(you mean RelayConfig.connection_idle_timeout, right?).

Right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants