Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out the design of nat_traversal and external addresses #794

Closed
tomaka opened this issue Dec 18, 2018 · 10 comments
Closed

Figure out the design of nat_traversal and external addresses #794

tomaka opened this issue Dec 18, 2018 · 10 comments
Labels
difficulty:moderate priority:important The changes needed are critical for libp2p, or are blocking another project

Comments

@tomaka
Copy link
Member

tomaka commented Dec 18, 2018

For paritytech/substrate#1193
cc #544

The following problems are unresolved:

  • Should nat_traversal() be implemented by Transports, or maybe on Multiaddress instead?
  • Since reported IPs can be malicious, should we add a TTL to external addresses? Use a more elaborate system than just storing them in a Vec?
  • Does reporting a bad IP address have any consequence?
@tomaka tomaka added this to the substrate-1.0-final milestone Dec 18, 2018
@tomaka tomaka added priority:important The changes needed are critical for libp2p, or are blocking another project difficulty:moderate discussion labels Dec 18, 2018
@tomaka
Copy link
Member Author

tomaka commented Dec 18, 2018

Also:

  • In protocols like Kademlia and Identify, where nodes self-report their external addresses, do we need a system against malicious nodes that report wrong addresses or a large number of addresses?

@twittner
Copy link
Contributor

Should nat_traversal() be implemented by Transports, or maybe on Multiaddress instead?

It would make sense to me to not name this method nat_traversal but rather merge_matching_prefix and to try implementing it on Multiaddr.

As for real NAT traversal we would need more than reporting externally observed addresses. Many NATs perform address and port filtering which makes the address reported by some peer useless for other peers. This limitation may already produce many invalid addresses without peers acting maliciously. Some scoring algorithm could help filtering out those addresses.

Does reporting a bad IP address have any consequence?

At least it would harm connectivity, i.e. it would take longer to establish connections to peers. Otherwise it depends on how a node keeps and uses reported addresses.

Since reported IPs can be malicious, should we add a TTL to external addresses?

It would help to make explicit what kind of malicious behaviour we want to protect against. Is it only the reporting of invalid or unreachable addresses? Are we concerned about DDOS attacks or what other kinds of bad behaviour?

@dvdplm
Copy link
Contributor

dvdplm commented Dec 19, 2018

My two cents on this is that it's wise to keep Multiaddress as dumb as we can and limited in scope to validate addresses only at the "grammar" level, keeping logic to determine what is a "good" or "bad" address elsewhere.
Real NAT traversal, like @twittner says, is sort of a black box for external observers: there's no way to tell which transforms were performed (or why) from the outside. It seems to me that the Transport is the layer that has the best chance of making educate guesses as to what a given behaviour means (and it will guess wrong quite often). I would imagine that it's going to be mainly the application layer that implements whatever logic it needs to decide when an address is "malicious" or not.

@tomaka
Copy link
Member Author

tomaka commented Dec 19, 2018

It would make sense to me to not name this method nat_traversal but rather merge_matching_prefix and to try implementing it on Multiaddr.

The question also is: is merge_matching_prefix always what we want? Or is there a situation where the observed address and listened address have to be compared in a different way?

At least it would harm connectivity, i.e. it would take longer to establish connections to peers.

And that could be a problem if a node has 14 bad addresses and only one good one.

It would help to make explicit what kind of malicious behaviour we want to protect against. Is it only the reporting of invalid or unreachable addresses? Are we concerned about DDOS attacks or what other kinds of bad behaviour?

In this issue only about invalid/unreachable addresses.

@twittner
Copy link
Contributor

To start off simple we could store the addresses in a priority queue with counters reflecting the number of times we have seen an address reported to us and use them in priority order. Having seen an address multiple times from different peers makes it more likely that we would be able to reach it.

@tomaka
Copy link
Member Author

tomaka commented Mar 20, 2019

Here is some context about NAT traversal in rust-libp2p in general.

The mechanisms in play when it comes to addresses are:

  • When we start a listener with Transport::listen_on, the listen_on method returns an address where it's listening. In practice, this is generally /ip4/0.0.0.0/tcp/30333 for Substrate.
  • When we connect to a node, we ask it through the identify protocol the address it sees for us. For example, this could be /ip4/80.72.98.155/tcp/42901. This address then goes through the Transport::nat_traversal method, which returns an "external address". In practice this generally just switches the IP to the observed one, for example /ip4/80.72.98.155/tcp/30333. This external address is added to a list in the Swarm.
  • In the same way we obtain our observed addresses, identify also allows us to retrieve the listened addresses of the remote. When we receive an identify request, we return on the wire the combination of the external addresses that we know (eg. /ip4/80.72.98.155/tcp/30333) plus the listened addresses (eg. /ip4/0.0.0.0/tcp/30333).
  • The Kademlia protocol stores locally the addresses that nodes we are connected to report for themselves. In other words: when we connect to a node B, we ask what its addresses are, and then store that result locally. If a third node C asks us information about B, we return these addresses so that C can connect to B.
  • For mDNS, nodes send to the local network the list of their listened addresses (eg. /ip4/0.0.0.0/tcp/30333). When we receive such a message, since we know the IP of the sender, we perform a "reserve NAT traversal". We call nat_traversal() with the IP of the remote and the listened addresses reported by the remote, in order to know the external IP of the remote to which to connect to.
  • The libp2p protocol also provides communication relays, but that's not implemented in rust-libp2p yet (Add back the relay #725) and isn't easy to put back.

Problems and things to improve:

  • We don't use UPnP (UPnP in the TcpConfig #558)
  • We should be able to determine our local IP addresses. In other words, /ip4/0.0.0.0/... should be replaced with /ip4/127.0.0.1/tcp/... and /ip4/192.168.1.2/tcp/.... The network adapter knows about these IP addresses.
  • Sometimes the address returned by listen_on should be multiple addresses, or should come later. See more context in UPnP in the TcpConfig #558. The API of listen_on doesn't allow that.
  • Is it necessary to send the listened addresses with identify? (/ip4/0.0.0.0/tcp/30333). That is rather stupid.
  • Nodes can report any observed IP address for us without using being able to check that they are correct, meaning that we have potentially a huge list of external addresses.
  • How is this whole mechanism attackable?

@tomaka
Copy link
Member Author

tomaka commented Mar 20, 2019

Related: paritytech/substrate#1193

@tomaka tomaka mentioned this issue Mar 20, 2019
@tomaka
Copy link
Member Author

tomaka commented Mar 24, 2019

One way we could change the Transport trait is like this:

pub trait Transport {
    type Output;
    type Error: error::Error;

    type Listener: Stream<Item = ListenerStreamEvent, Error = Self::Error>;
    type ListenerUpgrade: Future<Item = Self::Output, Error = Self::Error>;
    type Dial: Future<Item = Self::Output, Error = Self::Error>;
    type ListenAddrsIter: Iterator<Item = Multiaddr>;

    fn listen_on(self, addr: Multiaddr) -> Result<(Self::Listener, ListenAddrsIter), TransportError<Self::Error>>;

    fn dial(self, addr: Multiaddr) -> Result<Self::Dial, TransportError<Self::Error>>;
}

enum ListenerStreamEvent<TListUpgr> {
    Incoming(TListUpgr, Multiaddr),
    NewAddress(Multiaddr),
    AddressExpired(Multiaddr),
}

In particular:

  • Listening would now return a list of multiaddresses.
  • If one listens on /ip4/0.0.0.0, the TCP transport would use the get_if_addrs crate and return the IP addresses of the local interfaces instead.
  • The listener can later return additional addresses, which is useful for UPnP/PCP (UPnP in the TcpConfig #558).
  • The nat_traversal method would be removed in favour of a more generic method on Multiaddr.

twittner added a commit to twittner/rust-libp2p that referenced this issue Mar 28, 2019
This is part of the proposed changes to the `Transport` trait as
outlined in [1].

To allow transport implementations to listen on multiple addresses we
now report all those addresses as a sequence of multi-addresses instead
of supporting only a single one.

Instead of making `Transport::listen_on` return a generic iterator a
concrete type `MultiaddrSeq` is introduced to support non-emptiness.

[1]: libp2p#794 (comment)
@tomaka
Copy link
Member Author

tomaka commented Apr 24, 2019

Now that #1052 is done, the next step, IMO, would be to add a struct that "manages" external addresses. Whenever a node reports our IP address to us, we notify this struct. The struct then manages them in some way to filter out expired and malicious reports.

The Swarm would then use this struct instead of a Vec<Multiaddr> as it is the case at the moment.

@twittner
Copy link
Contributor

I think this can be considered done. To help traversing NATs a lot more could be done obviously, but in regards to address handling I think we can close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty:moderate priority:important The changes needed are critical for libp2p, or are blocking another project
Projects
None yet
Development

No branches or pull requests

3 participants