protocols/relay: Implement circuit relay specification #1838

Merged: 133 commits into libp2p:master, Mar 11, 2021

mxinden (Member) commented Nov 15, 2020

This pull request implements the libp2p circuit relay specification. It is based on previous work from #1134.

Instead of altering the Transport trait, this pull request wraps an existing Transport implementation, which allows one to:

  • Intercept dial requests with a relayed address.

  • Inject incoming relayed connections with the local node being the destination.

  • Intercept listen_on requests pointing to a relay, keeping a persistent connection to that relay while waiting for incoming requests whose destination is the local node.
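
The interception described in the list above can be illustrated with a minimal, self-contained sketch. It does not use the libp2p API: `Proto`, `Routed` and `route_dial` are hypothetical names invented for illustration, and an address is reduced to a plain list of components. The only idea taken from the description is that an address containing a circuit component is handled by the relay logic, while every other address is passed to the wrapped transport unchanged.

```rust
/// Simplified stand-in for multiaddr components (hypothetical, not libp2p's `Protocol`).
#[derive(Clone, Debug, PartialEq)]
enum Proto {
    Memory(u64),
    P2p(String),
    P2pCircuit,
}

/// Where a request for a given address should be routed.
#[derive(Debug, PartialEq)]
enum Routed {
    /// No circuit component: hand the address to the wrapped transport as-is.
    Inner(Vec<Proto>),
    /// Circuit component found: `relay` is everything before it, `dst` everything after it.
    Relayed { relay: Vec<Proto>, dst: Vec<Proto> },
}

/// Split an address at the first circuit component, if there is one.
fn route_dial(addr: &[Proto]) -> Routed {
    match addr.iter().position(|p| *p == Proto::P2pCircuit) {
        Some(i) => Routed::Relayed {
            relay: addr[..i].to_vec(),
            dst: addr[i + 1..].to_vec(),
        },
        None => Routed::Inner(addr.to_vec()),
    }
}
```

In this sketch an address that ends in the circuit component yields an empty `dst`, which corresponds to the listen_on case: there is no destination to dial, only a relay to stay connected to.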

More concretely, one wraps an existing Transport implementation as shown below; the Relay behaviour and the RelayTransport then communicate via channels.

Example

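use std::str::FromStr; // required for `Multiaddr::from_str` below

let (relay_transport, relay_behaviour) = new_transport_and_behaviour(
    RelayConfig::default(),
    MemoryTransport::default(),
);

let transport = relay_transport
    .upgrade(upgrade::Version::V1)
    // `plaintext` is an authentication upgrade constructed elsewhere (not shown).
    .authenticate(plaintext)
    .multiplex(YamuxConfig::default())
    .boxed();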

let mut swarm = Swarm::new(transport, relay_behaviour, local_peer_id);

let relay_addr = Multiaddr::from_str("/memory/1234").unwrap()
    .with(Protocol::P2p(PeerId::random().into()))
    .with(Protocol::P2pCircuit);
let dst_addr = relay_addr.clone().with(Protocol::Memory(5678));

// Listen for incoming connections via relay node (1234).
Swarm::listen_on(&mut swarm, relay_addr).unwrap();

// Dial node (5678) via relay node (1234).
Swarm::dial_addr(&mut swarm, dst_addr).unwrap();

Status Quo

This pull request is ready to be reviewed and tested. See #1838 (comment) for details.

Closes #725.

tomaka and others added 30 commits May 19, 2019 20:42

mxinden (Member, Author) commented Mar 6, 2021

> Something I wasn't quite sure about yet from the implementation: This does not cover the ability for "multi hop relaying" as specified under "future work" in the spec, or does it?

Correct. It does not. There is also a proposal for circuit relay v2 (libp2p/go-libp2p-circuit#125) which would be worth considering in the future.

mxinden (Member, Author) commented Mar 6, 2021

I am reasonably sure I addressed all the comments above, most notably the re-work of the listener logic (#1838 (comment)).

In case you have time for another review @romanb, that would be terrific.

romanb (Contributor) left a comment

Looks good to me! Glad to see the Arc<Mutex> gone. I only left two more minor comments.

protocols/relay/src/behaviour.rs (outdated)
} => {
    let err_code = match error {
        ProtocolsHandlerUpgrErr::Timeout => {
            self.pending_error = Some(ProtocolsHandlerUpgrErr::Timeout);

Contributor

Do we really want to terminate the connection on a single relay substream negotiation timeout? Other protocols may use the connection (successfully) as well, right?

Member, Author

If I understand correctly, given that libp2p-relay uses the default timeout, this will only happen after 10 seconds. I would deem a node that cannot respond to the light-weight circuit relay requests within 10 seconds either misbehaving or overloaded. For the former, disconnecting seems to be the way to go. For the latter, I would say disconnecting is beneficial for both sides. The local node might be able to succeed in whatever it is up to via another relay or destination node. In addition I would guess other protocols running on the same connection are not making progress either if the light-weight circuit relay negotiation does not make progress. The overloaded remote node would receive less traffic (both through the relay protocol and any other protocol out there) and thus become less overloaded overall.

That said, I don't know what the correct behaviour is, nor whether it should be consistent across protocols. For example, libp2p-request-response does not terminate the whole connection on Timeout, which makes sense to me as the payloads being exchanged could potentially be very large.

fn inject_listen_upgrade_error(
    &mut self,
    info: RequestId,
    error: ProtocolsHandlerUpgrErr<io::Error>
) {
    match error {
        ProtocolsHandlerUpgrErr::Timeout => {
            self.pending_events.push_back(RequestResponseHandlerEvent::InboundTimeout(info))
        }

With my reasoning above in mind, @romanb, what do you think we should do in case of a ProtocolsHandlerUpgrErr::Timeout?

romanb (Contributor), Mar 10, 2021

> In addition I would guess other protocols running on the same connection are not making progress either if the light-weight circuit relay negotiation does not make progress.

I think conceptually, different protocols using different substreams on the same connection should be considered independently, while necessarily sharing the same network resource. Making it a requirement that all protocols used on a connection work well (e.g. without timeouts) all the time seems a bit problematic. Especially when it comes to timeouts on (outbound) substreams, in my mind, these should be reported to client code which can then a) decide whether to retry and if so how often and with what kind of back-off strategy and b) whether the entire connection should be closed after x timeouts. The concrete application knows what protocols are used on a particular connection, whereas a single protocol does not know with which others it must share the connections. These were also the considerations for libp2p-request-response. Making the fixed choice within a particular protocol that the connection is killed if a particular substream protocol upgrade does not complete within 10 seconds seems very rigid. It is probably fine in this particular instance, so I'm not opposed, but in general I think any errors other than protocol violations, (outbound) timeouts in particular, should just be reported on the API and left to client code to handle.
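
The policy described above, in which timeouts are reported to client code that decides on retries, back-off and when to close the connection, could look roughly like the following sketch. Everything in it is hypothetical and lives in application code rather than in libp2p: `TimeoutPolicy`, `Action`, the exponential back-off and the threshold of consecutive timeouts are assumptions chosen only for illustration.

```rust
use std::time::Duration;

/// What client code decides to do after a substream timeout is reported to it.
#[derive(Debug, PartialEq)]
enum Action {
    /// Try again after waiting for the given back-off.
    RetryAfter(Duration),
    /// Too many consecutive timeouts: close the whole connection.
    CloseConnection,
}

/// Hypothetical application-level policy; not part of libp2p.
struct TimeoutPolicy {
    max_timeouts: u32,
    base_backoff: Duration,
    consecutive_timeouts: u32,
}

impl TimeoutPolicy {
    fn new(max_timeouts: u32, base_backoff: Duration) -> Self {
        Self { max_timeouts, base_backoff, consecutive_timeouts: 0 }
    }

    /// Called when a protocol reports an (outbound) timeout on its API.
    fn on_timeout(&mut self) -> Action {
        self.consecutive_timeouts += 1;
        if self.consecutive_timeouts >= self.max_timeouts {
            Action::CloseConnection
        } else {
            // Exponential back-off: base, 2 * base, 4 * base, ...
            let factor = 2u32.saturating_pow(self.consecutive_timeouts - 1);
            Action::RetryAfter(self.base_backoff.saturating_mul(factor))
        }
    }

    /// Called when a request on the connection succeeds; resets the counter.
    fn on_success(&mut self) {
        self.consecutive_timeouts = 0;
    }
}
```

The point of keeping this outside the protocol handler is the one made in the comment: only the application knows which protocols share a connection, so only it can judge how many timeouts are tolerable.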

mxinden (Member, Author) commented Mar 10, 2021

I want to draw attention to the most recent commit 7956f0b which is not based on any of the above review comments.

protocols/relay: Disable active relay behaviour by default

Add actively_connect_to_dst_nodes configuration option. Configures
whether to actively establish an outgoing connection to a destination
node, when being asked by a source node to relay a connection to said
destination node.

For security reasons this behaviour is disabled by default, thus a relay
node will not actively establish an outgoing connection to a destination
node in case it is not yet connected to said destination node. Instead a
destination node should establish a connection to a relay node before
advertising their relayed address via that relay node to a source node.
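
As a rough illustration of the default described in this commit message, the sketch below models the decision a relay makes when asked to relay to a destination. It is not the code from the commit: `SketchRelayConfig`, `RelayDecision` and `decide` are hypothetical names, peers are represented as plain strings, and only the `actively_connect_to_dst_nodes` option and its disabled default are taken from the text.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the relay configuration; only the option named
/// in the commit message is modelled.
struct SketchRelayConfig {
    actively_connect_to_dst_nodes: bool,
}

impl Default for SketchRelayConfig {
    fn default() -> Self {
        // Disabled by default, as stated in the commit message.
        Self { actively_connect_to_dst_nodes: false }
    }
}

#[derive(Debug, PartialEq)]
enum RelayDecision {
    /// The relay is already connected to the destination and relays over that connection.
    UseExistingConnection,
    /// The relay dials the destination itself (only when explicitly enabled).
    DialDestination,
    /// The relay refuses because it is not connected and may not dial.
    Refuse,
}

/// Decide how to handle a relay request for `dst`, given the currently connected peers.
fn decide(config: &SketchRelayConfig, connected: &HashSet<String>, dst: &str) -> RelayDecision {
    if connected.contains(dst) {
        RelayDecision::UseExistingConnection
    } else if config.actively_connect_to_dst_nodes {
        RelayDecision::DialDestination
    } else {
        RelayDecision::Refuse
    }
}
```

With the default configuration a destination that has not connected to the relay beforehand ends up in the `Refuse` branch, which is why the commit message asks destination nodes to connect to the relay before advertising a relayed address.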

For comparison, here is the same configuration option of the Golang implementation.

mxinden (Member, Author) commented Mar 10, 2021

Continuing on the discussion in #1838 (comment) above.

> In addition I would guess other protocols running on the same connection are not making progress either if the light-weight circuit relay negotiation does not make progress.
>
> I think conceptually, different protocols using different substreams on the same connection should be considered independently, while necessarily sharing the same network resource. Making it a requirement that all protocols used on a connection work well (e.g. without timeouts) all the time seems a bit problematic. Especially when it comes to timeouts on (outbound) substreams, in my mind, these should be reported to client code which can then a) decide whether to retry and if so how often and with what kind of back-off strategy and b) whether the entire connection should be closed after x timeouts. The concrete application knows what protocols are used on a particular connection, whereas a single protocol does not know with which others it must share the connections. These were also the considerations for libp2p-request-response. Making the fixed choice within a particular protocol that the connection is killed if a particular substream protocol upgrade does not complete within 10 seconds seems very rigid. It is probably fine in this particular instance, so I'm not opposed, but in general I think any errors other than protocol violations, (outbound) timeouts in particular, should just be reported on the API and left to client code to handle.

The above reasoning makes sense to me. Thank you for the detailed write-up.

In general this seems like something worth striving for in a consistent manner across ProtocolsHandler implementations. For example, libp2p-gossipsub currently terminates the connection whereas, as mentioned above, libp2p-request-response does not.

In this specific case, d55bd99 makes libp2p-relay no longer close connections on dial and incoming upgrade Timeout errors. Instead, connections are eventually closed through the keep-alive mechanism once neither libp2p-relay nor any other protocol has further use for the connection. In the future one could explore actively setting KeepAlive::No on Timeout errors, only switching back to KeepAlive::Yes or KeepAlive::Until after a subsequent success. That would speed up connection garbage collection.
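
The future idea mentioned in this comment could be sketched as follows. This is not what d55bd99 implements and it does not use the libp2p types: the `KeepAlive` enum here is a hypothetical local mirror of the three variants named above, and `SketchHandler`, its method names and the idle timeout are assumptions made for illustration only.

```rust
use std::time::{Duration, Instant};

/// Hypothetical local mirror of a keep-alive signal with the three variants
/// named in the comment; not the libp2p type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeepAlive {
    Yes,
    Until(Instant),
    No,
}

/// Hypothetical handler state that tracks how long it still needs the connection.
struct SketchHandler {
    keep_alive: KeepAlive,
    idle_timeout: Duration,
}

impl SketchHandler {
    fn new(idle_timeout: Duration) -> Self {
        // Assumption: a fresh handler has pending work and wants the connection kept open.
        Self { keep_alive: KeepAlive::Yes, idle_timeout }
    }

    /// On an upgrade timeout, stop asking for the connection to be kept alive.
    /// The connection itself is not closed here; other protocols may still use it.
    fn on_upgrade_timeout(&mut self) {
        self.keep_alive = KeepAlive::No;
    }

    /// On a later success, ask again for the connection to be kept alive for a while.
    fn on_success(&mut self, now: Instant) {
        self.keep_alive = KeepAlive::Until(now + self.idle_timeout);
    }

    /// The value this protocol would report to the connection's keep-alive logic.
    fn connection_keep_alive(&self) -> KeepAlive {
        self.keep_alive
    }
}
```

Because the connection is only closed once every protocol on it reports that it no longer needs it, this handler merely withdraws its own vote after a timeout instead of terminating the connection.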

mxinden (Member, Author) commented Mar 10, 2021

> Glad to see the Arc<Mutex> gone.

Very much agreed!

mxinden merged commit 2f9c175 into libp2p:master on Mar 11, 2021

mxinden (Member, Author) commented Mar 11, 2021

After 133 commits this is finally merged. 🎉

Thanks go to @tomaka for the initial version (#1134), @romanb for the reviews and @dvc94ch for the initial testing.

I will publish a release of libp2p-relay v0.1.0 soon.
