Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/src/transport: Add Transport::dial_as_listener #2363

Merged
merged 14 commits into from
Jan 17, 2022

Conversation

mxinden
Copy link
Member

@mxinden mxinden commented Nov 28, 2021

Allows NetworkBehaviour implementations to dial a peer, but instruct
the dialed connection to be upgraded as if it were the listening
endpoint.

This is needed when establishing direct connections through NATs and/or
Firewalls (hole punching). When hole punching via TCP (QUIC is different
but similar) both ends dial the other at the same time resulting in a
simultaneously opened TCP connection. To disambiguate who is the dialer
and who the listener there are two options:

  1. Use the Simultaneous Open Extension of Multistream Select. See
    sim-open specification and sim-open-rust Rust implementation.

  2. Disambiguate the role (dialer or listener) based on the role within
    the DCUtR dcutr protocol. More specifically the node initiating the
    DCUtR process will act as a listener and the other as a dialer.

This commit enables (2), i.e. enables the DCUtR protocol to specify the
role used once the connection is established.

While on the positive side (2) requires one round trip less than (1), on
the negative side (2) only works for coordinated simultaneous dials.
I.e. when a simultaneous dial happens by chance, and not coordinated via
DCUtR, the connection attempt fails when only (2) is in place.

Fixes #2250.

Allows `NetworkBehaviour` implementations to dial a peer, but instruct
the dialed connection to be upgraded as if it were the listening
endpoint.

This is needed when establishing direct connections through NATs and/or
Firewalls (hole punching). When hole punching via TCP (QUIC is different
but similar) both ends dial the other at the same time resulting in a
simultaneously opened TCP connection. To disambiguate who is the dialer
and who the listener there are two options:

1. Use the Simultaneous Open Extension of Multistream Select. See
   [sim-open] specification and [sim-open-rust] Rust implementation.

2. Disambiguate the role (dialer or listener) based on the role within
   the DCUtR [dcutr] protocol. More specifically the node initiating the
   DCUtR process will act as a listener and the other as a dialer.

This commit enables (2), i.e. enables the DCUtR protocol to specify the
role used once the connection is established.

While on the positive side (2) requires one round trip less than (1), on
the negative side (2) only works for coordinated simultaneous dials.
I.e. when a simultaneous dial happens by chance, and not coordinated via
DCUtR, the connection attempt fails when only (2) is in place.

[sim-open]: https://github.com/libp2p/specs/blob/master/connections/simopen.md
[sim-open-rust]: libp2p#2066
[dcutr]: https://github.com/libp2p/specs/blob/master/relay/DCUtR.md
@mxinden mxinden mentioned this pull request Nov 28, 2021
14 tasks
/// thus have one peer dial the other as a dialer and one peer dial the
/// other _as a listener_.
#[derive(Debug, Default, Copy, Clone, Hash, Eq, PartialEq)]
pub struct DialAsListener(bool);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced this newtype instead of passing a bool through the various layers.

I like the fact that it (a) increases type safety and (b) carries its documentation. Unfortunately it is not particularly intuitive, nor idiomatic (never seen a newtype around a bool) nor convenient (e.g. see if as_listener == false).

@thomaseizinger you usually have good ideas when it comes to idiomatic Rust. Any alternatives that come to your mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of:

    Dialer {
        /// An optional override if the desired role for the upgrade process has already been negotiated out of band.
		////
		/// If this is set, the endpoint resulting from a successful dial should assume the given role instead of inferring it from the underlying transport.
        role_override: Option<Endpoint>,
    },

We could be even more descriptive and do:

enum UpgradeRole {
	InferredFromTransport,
	NegotiatedOutOfBand {
		role: Endpoint
	}
}
// -------------
Dialer {
    /// The strategy, how the role within the upgrade process should be determined.
    upgrade_role: UpgradeRole,
},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    Dialer {
        /// An optional override if the desired role for the upgrade process has already been negotiated out of band.
		////
		/// If this is set, the endpoint resulting from a successful dial should assume the given role instead of inferring it from the underlying transport.
        role_override: Option<Endpoint>,
    },

This is a good idea, especially as it does not introduce a new type. Though it does allow for duplication of information and thus potentially confusion? PendingPoint::Dialer { role_override: Some(Endpoint::Dialer) } is the same as PendingPoint::Dialer { role_override: None }.

Copy link
Member Author

@mxinden mxinden Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum UpgradeRole {
	InferredFromTransport,
	NegotiatedOutOfBand {
		role: Endpoint
	}
}

Note that this whole out-of-band role negotiation does not only concern upgrades, but some transport protocols as well. The TCP transport implementation does not require this information as it will open a simultaneous open TCP connection either way. But QUIC does. In the case of QUIC the listener sends a random-garbage packet instead of a connection-initiation packet.

  • For a QUIC address:
    • Upon receiving the Sync, A immediately dials the address to B.
    • Upon expiry of the timer, B starts to send UDP packets filled with
      random bytes to A's address. Packets should be sent repeatedly in
      random intervals between 10 and 200 ms.
    • This will result in a QUIC connection where A is the client and B is
      the server.

https://github.com/libp2p/specs/blob/master/relay/DCUtR.md#the-protocol

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to your suggestion of role_override, how would the signatures of Network::dial and Peer::dial look like?

    pub fn dial(
        &mut self,
        address: &Multiaddr,
        handler: THandler,
        role_override: Option<Endpoint>,
    ) -> Result<ConnectionId, DialError<THandler>>

I find this descriptive but again confusing due to the duplication of information where Some(Endpoint::Dialer) is the same as None.

- Use `Endpoint` instead of `DialAsListener` where `Endpoint::Dialer` is
  the default and `Endpoint::Listener` represents the override.
- Use `override_role` instead of `as_listener`.
- Extend `Transport` through new method `dial_with_role_override`.
@mxinden mxinden marked this pull request as ready for review December 15, 2021 21:59
@mxinden
Copy link
Member Author

mxinden commented Dec 15, 2021

@thomaseizinger I gave this another shot with your previous input. I did the following changes:

  • Use Endpoint instead of DialAsListener where Endpoint::Dialer is
    the default and Endpoint::Listener represents the override.
  • Use override_role instead of as_listener.
  • Extend Transport through new method dial_with_role_override.

Could you take another look?

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply!

/// This option is needed for NAT and firewall hole punching.
///
/// See [`ConnectedPoint::Dialer`](crate::connection::ConnectedPoint::Dialer) for related option.
fn dial_with_role_override(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think on this level, it might be more descriptive to use a terminology like dial_as_listener? There is no overriding parameter here so not having been involved this discussion, I would be thinking "overridden with what?"

Comment on lines +81 to +84
ConnectedPoint::Dialer {
address: addr,
role_override: Endpoint::Dialer,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a bit awkward. I am not sure I completely understand this:

Note that this whole out-of-band role negotiation does not only concern upgrades, but some transport protocols as well. The TCP transport implementation does not require this information as it will open a simultaneous open TCP connection either way.

From your PR description, what we are trying to achieve is to explicitly instruct the upgrade phase of a connection, which role to use instead of inferring it from somewhere. If that is true, perhaps we should rename role_override to upgrade_role. That naming would make the default case a bit less awkward as we are simply always stating the role explicitly.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the hope to resolve some confusion: The role_override passed through Network::dial is relevant in two places:

  1. in AndThen Transport and later upgrade::apply instructing to either use the OutboundUpgrade::upgrade_outbound (default case, no override) or InboundUpgrade::upgrade_inbound (override).
  2. In certain Transport implementations. E.g. libp2p-quic, when role_override is set and thus Transport::dial_with_role_override (or Transport::dial_as_listener) is called the local node sends a garbage UDP packet instead of a handshake packet, to punch a hole through its NAT / firewall only.

So would you keep role_override on Network::dial, but rename to upgrade_role in ConnectedPoint::Dialer and dial_as_listener in Transport? With 3 different names for related features, would that not be even more confusing?

Copy link
Contributor

@thomaseizinger thomaseizinger Dec 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for giving me more context. I think I would indeed choose a name for each component that makes sense within that scope. For the upgrade phase, an explicit role parameter is pretty clear IMO.

For the transports, wouldn't it be nice if we could use some kind of "back-channel" to feed in this information? We could add a function like this to the transport interface (name subject to bike-shedding):

trait Transport {
	fn register_hole_punching_attempt(&mut self, address: Multiaddr) { }
}

A transport that cares about this (QUIC), can override the function and upon dial check whether or not this address need to be handled in a special way. A transport that doesn't care, like TCP, can just not override this function.

Perhaps the entire feature can implement in such a back-channel way? Sorry for going into so many different directions here, I am just brain storming for better solutions because now that I think I understand the overall problem, I don't really like my original suggestion that much :D

Copy link
Member Author

@mxinden mxinden Dec 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the transports, wouldn't it be nice if we could use some kind of "back-channel" to feed in this information? We could add a function like this to the transport interface (name subject to bike-shedding):

Do I understand your suggestion correctly that protocols like libp2p-dcutr would emit a NetworkBehaviourAction::Dial with some hole-punching-marker set, this would propagate to ConcurrentDial where one would then first call Transport::register_hole_punching_attempt and then call Transport::dial.

Why associate the hole-punching-marker with a Multiaddr instead of a dial attempt (call to Swarm::dial, Network::dial, Transport::dial). In other words, why force a Transport to keep track of a Multiaddr-to-hole-punching-marker mapping when we can as well just provide it with each call to Transport::dial as a parameter, or Transport::dial_with_role_override, or Transport::dial_as_listener?

Sorry for going into so many different directions here, I am just brain storming for better solutions because now that I think I understand the overall problem, I don't really like my original suggestion that much :D

Very much appreciated. Exactly why I asked for your review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that Combinator Transport implementations like AndThen would need to keep track of the Multiaddr to hole-punching-marker as well, given that they need to forward the right ConnectedPoint to the upgrade provided with Transport::and_then, e.g. either a ConnectedPoint::Dialer { role_override: Endpoint::Dialer } or a ConnectedPoint::Dialer { role_override: Endpoint::Listener }.

See below for how this is solved currently in this pull request. Note the different role_override values.

fn dial(self, addr: Multiaddr) -> Result<Self::Dial, TransportError<Self::Error>> {
let dialed_fut = self
.transport
.dial(addr.clone())
.map_err(|err| err.map(EitherError::A))?;
let future = AndThenFuture {
inner: Either::Left(Box::pin(dialed_fut)),
args: Some((
self.fun,
ConnectedPoint::Dialer {
address: addr,
role_override: Endpoint::Dialer,
},
)),
_marker: PhantomPinned,
};
Ok(future)
}
fn dial_with_role_override(
self,
addr: Multiaddr,
) -> Result<Self::Dial, TransportError<Self::Error>> {
let dialed_fut = self
.transport
.dial_with_role_override(addr.clone())
.map_err(|err| err.map(EitherError::A))?;
let future = AndThenFuture {
inner: Either::Left(Box::pin(dialed_fut)),
args: Some((
self.fun,
ConnectedPoint::Dialer {
address: addr,
role_override: Endpoint::Listener,
},
)),
_marker: PhantomPinned,
};
Ok(future)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to leave the interface that serves 90% of the usecases untouched and model the special case using a dedicated interface. That way, we are

What do you think of the current approach, implementing a middle ground? Instead of altering Transport::dial which "serves 90% of the usecases", it adds Transport::dial_with_role_override. Transports that don't care can simply delegate to Transport::dial.

Yeah that is fair!

I am not yet convinced that we need to follow the same naming on all levels. I think I'd prefer a name that is more descriptive for the individual functionality and composed together, they allow for hole punching then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the transport, what do you think of a name like: Transport::dial_for_nat_traversal or Transport::dial_nat_traversing.

The reason I'd advocate for such a naming is because the transport doesn't know what happens to the connection once it is established.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For NAT traversal both endpoints dial each other, but only one needs to call Transport::dial_for_nat_traversal, in other words, only one needs to dial as a listener, in other words only one needs to dial with a role override. The name dial_for_nat_traversal would thus be confusing, as one would expect both nodes to use dial_for_nat_traversal for a NAT traversal, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fair point!

I think I've come full circle on this and dial_as_listener is my preferred option now 😅
Sorry about the long back and forth that ended up in the original idea!

What do you think:

  1. Use dial_as_listener in Transport
  2. Using the "override" notation within the declarative PendingPoint struct

This would result in a piece of code that is something like:

if role_override == Listener {
	dial_as_listener
} else {
	dial
}

Which is quite readable IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've come full circle on this and dial_as_listener is my preferred option now sweat_smile
Sorry about the long back and forth that ended up in the original idea!

🚀 maybe a strange loop 😇 ?

af10530 does the rename. Note that libp2p_swarm::DialOpts and libp2p_core::DialOpts use override_role. Let me know in case you would like that to be changed to as_listener as well.

See:

mxinden added a commit to mxinden/rust-libp2p that referenced this pull request Dec 28, 2021
Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the
dial concurrency factor per dial. This is especially relevant in the
case of libp2p-autonat where one wants to probe addresses in sequence to
reduce the amount of work a remote peer can force onto the local node.

To enable the above, this commit also:

- Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`.
  Passed as an argument to `Network::dial`.
- Removes `Peer::dial` in favor of `Network::dial`.
- Simplifies `Swarm::dial_with_handler`.

The introduction of `libp2p_core::DialOpts` will be useful beyond this
feature, e.g. for libp2p#2363.

In the long run I would like to move and merge `libp2p_core::Network`
and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating
`libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`.

Fixes libp2p#2385.
mxinden added a commit to mxinden/rust-libp2p that referenced this pull request Dec 28, 2021
Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the
dial concurrency factor per dial. This is especially relevant in the
case of libp2p-autonat where one wants to probe addresses in sequence to
reduce the amount of work a remote peer can force onto the local node.

To enable the above, this commit also:

- Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`.
  Passed as an argument to `Network::dial`.
- Removes `Peer::dial` in favor of `Network::dial`.
- Simplifies `Swarm::dial_with_handler`.

The introduction of `libp2p_core::DialOpts` will be useful beyond this
feature, e.g. for libp2p#2363.

In the long run I would like to move and merge `libp2p_core::Network`
and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating
`libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`.

Fixes libp2p#2385.
mxinden added a commit that referenced this pull request Jan 13, 2022
)

Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the
dial concurrency factor per dial. This is especially relevant in the
case of libp2p-autonat where one wants to probe addresses in sequence to
reduce the amount of work a remote peer can force onto the local node.

To enable the above, this commit also:

- Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`.
  Passed as an argument to `Network::dial`.
- Removes `Peer::dial` in favor of `Network::dial`.
- Simplifies `Swarm::dial_with_handler`.

The introduction of `libp2p_core::DialOpts` will be useful beyond this
feature, e.g. for #2363.

In the long run I would like to move and merge `libp2p_core::Network`
and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating
`libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`.

Fixes #2385.
@mxinden mxinden changed the title core/src/transport: Add as_listener to Transport::dial core/src/transport: Add Transport::dial_with_role_override Jan 13, 2022
@mxinden
Copy link
Member Author

mxinden commented Jan 13, 2022

Updated to use DialOpts introduced through #2404.

@mxinden mxinden changed the title core/src/transport: Add Transport::dial_with_role_override core/src/transport: Add Transport::dial_as_listener Jan 15, 2022
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mxinden mxinden merged commit 96dbfcd into libp2p:master Jan 17, 2022
@mxinden
Copy link
Member Author

mxinden commented Jan 17, 2022

Thanks @thomaseizinger for the help!

@thomaseizinger
Copy link
Contributor

Thanks @thomaseizinger for the help!

Thank you for your patience! :)

santos227 added a commit to santos227/rustlib that referenced this pull request Jun 20, 2022
…404)

Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the
dial concurrency factor per dial. This is especially relevant in the
case of libp2p-autonat where one wants to probe addresses in sequence to
reduce the amount of work a remote peer can force onto the local node.

To enable the above, this commit also:

- Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`.
  Passed as an argument to `Network::dial`.
- Removes `Peer::dial` in favor of `Network::dial`.
- Simplifies `Swarm::dial_with_handler`.

The introduction of `libp2p_core::DialOpts` will be useful beyond this
feature, e.g. for libp2p/rust-libp2p#2363.

In the long run I would like to move and merge `libp2p_core::Network`
and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating
`libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`.

Fixes #2385.
umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this pull request Mar 8, 2024
Allows `NetworkBehaviour` implementations to dial a peer, but instruct
the dialed connection to be upgraded as if it were the listening
endpoint.

This is needed when establishing direct connections through NATs and/or
Firewalls (hole punching). When hole punching via TCP (QUIC is different
but similar) both ends dial the other at the same time resulting in a
simultaneously opened TCP connection. To disambiguate who is the dialer
and who the listener there are two options:

1. Use the Simultaneous Open Extension of Multistream Select. See
   [sim-open] specification and [sim-open-rust] Rust implementation.

2. Disambiguate the role (dialer or listener) based on the role within
   the DCUtR [dcutr] protocol. More specifically the node initiating the
   DCUtR process will act as a listener and the other as a dialer.

This commit enables (2), i.e. enables the DCUtR protocol to specify the
role used once the connection is established.

While on the positive side (2) requires one round trip less than (1), on
the negative side (2) only works for coordinated simultaneous dials.
I.e. when a simultaneous dial happens by chance, and not coordinated via
DCUtR, the connection attempt fails when only (2) is in place.

[sim-open]: https://github.com/libp2p/specs/blob/master/connections/simopen.md
[sim-open-rust]: libp2p#2066
[dcutr]: https://github.com/libp2p/specs/blob/master/relay/DCUtR.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

{swarm,core}/: Allow specifying the role of a connection in NetworkBehaviourAction::Dial
2 participants