
identify: Report observer addresses of peers that succeeded dial attempts #203

Open
lexnv opened this issue Aug 14, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@lexnv
Collaborator

lexnv commented Aug 14, 2024

Correlate the dial failures reported through `DialFailure` and `ListDialFailures` events with the Identify response provided to peers.

The addresses the node could not dial should be removed from the list of addresses we provide back to the peer.

This helps the remote peer keep a healthy view of its own addresses and leads to better connectivity over time.
libp2p uses a similar approach: it caches individual peer addresses and removes the addresses the node failed to dial.
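A minimal sketch of the proposed correlation, assuming a hypothetical per-peer `FailedDials` record fed by dial-failure events. Plain `String`s stand in for litep2p's `PeerId` and `Multiaddr`, and none of the names below are litep2p APIs:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for litep2p's `PeerId` and `Multiaddr`.
type PeerId = String;
type Multiaddr = String;

/// Hypothetical record of addresses we failed to dial, keyed by peer.
#[derive(Default)]
struct FailedDials {
    failed: HashMap<PeerId, HashSet<Multiaddr>>,
}

impl FailedDials {
    /// Called for each address carried by a `DialFailure` / `ListDialFailures` event.
    fn record_failure(&mut self, peer: &PeerId, addr: &Multiaddr) {
        self.failed.entry(peer.clone()).or_default().insert(addr.clone());
    }

    /// Called when a later dial to `addr` succeeds, so the address is reported again.
    fn record_success(&mut self, peer: &PeerId, addr: &Multiaddr) {
        if let Some(addrs) = self.failed.get_mut(peer) {
            addrs.remove(addr);
        }
    }

    /// Candidate addresses for the Identify response sent to `peer`,
    /// minus those we could not dial.
    fn addresses_to_report(&self, peer: &PeerId, candidates: &[Multiaddr]) -> Vec<Multiaddr> {
        let failed = self.failed.get(peer);
        candidates
            .iter()
            .filter(|addr| failed.map_or(true, |set| !set.contains(*addr)))
            .cloned()
            .collect()
    }
}

fn main() {
    let mut dials = FailedDials::default();
    let peer: PeerId = "peer-a".to_string();
    let candidates = vec![
        "/ip4/10.0.0.5/tcp/30333".to_string(),
        "/ip4/203.0.113.7/tcp/30333".to_string(),
    ];

    dials.record_failure(&peer, &candidates[0]);
    // prints ["/ip4/203.0.113.7/tcp/30333"]
    println!("{:?}", dials.addresses_to_report(&peer, &candidates));

    dials.record_success(&peer, &candidates[0]);
    // both candidates are reported again
    println!("{:?}", dials.addresses_to_report(&peer, &candidates));
}
```

Clearing the entry on a later successful dial is a design choice of this sketch, so that a transient failure does not hide an address permanently.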

@dmitry-markin
Collaborator

I'm not sure I understand this issue correctly, but here is my understanding of the Identify operation. As per the libp2p spec, `observed_addr` is the connection source address of the peer initiating the connection. It is always reported and is part of the Identify protocol.
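To make this concrete, here is a simplified sketch of how a responder could fill in the observed address for an inbound TCP connection. `IdentifyResponse` and `Connection` are hypothetical stand-ins: the spec's message also carries fields such as `protocolVersion`, `agentVersion`, `publicKey` and `protocols`, which are omitted here.

```rust
use std::net::SocketAddr;

/// Simplified stand-in for the Identify message: only the spec's
/// `listenAddrs` and `observedAddr` fields are modelled.
struct IdentifyResponse {
    listen_addrs: Vec<String>,
    observed_addr: String,
}

/// Hypothetical stand-in for an accepted (inbound) TCP connection.
struct Connection {
    /// Remote address as seen by our transport, i.e. the initiator's source address.
    remote: SocketAddr,
}

/// Render a TCP socket address as a multiaddr string.
fn to_tcp_multiaddr(addr: SocketAddr) -> String {
    match addr {
        SocketAddr::V4(a) => format!("/ip4/{}/tcp/{}", a.ip(), a.port()),
        SocketAddr::V6(a) => format!("/ip6/{}/tcp/{}", a.ip(), a.port()),
    }
}

fn build_identify_response(conn: &Connection, own_listen_addrs: &[String]) -> IdentifyResponse {
    IdentifyResponse {
        listen_addrs: own_listen_addrs.to_vec(),
        // Filled in unconditionally from the connection itself; there is
        // no check of whether this address is dialable by anyone.
        observed_addr: to_tcp_multiaddr(conn.remote),
    }
}

fn main() {
    let conn = Connection { remote: "198.51.100.4:40123".parse().unwrap() };
    let resp = build_identify_response(&conn, &["/ip4/192.0.2.9/tcp/30333".to_string()]);
    // prints observed: /ip4/198.51.100.4/tcp/40123
    println!("observed: {}", resp.observed_addr);
    println!("listen: {:?}", resp.listen_addrs);
}
```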

In libp2p, the Identify protocol implementation keeps a cache of remote peer addresses to use when dialing peers, which is why this list is pruned of unreachable addresses. The observed address, however, is always reported back.

In litep2p, peer addresses are discovered entirely through the Kademlia DHT routing table; the Identify protocol implementation does not cache remote peers' listen addresses.

So, IMO we shouldn't modify the Identify protocol implementation in litep2p. If we need to check the reachability of external addresses after applying the "many peers have seen the same address" heuristic, it should be done using a different protocol, similar to AutoNAT.
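The "many peers have seen the same address" heuristic could be sketched as follows, assuming a hypothetical `ObservedAddrTracker` and an arbitrary threshold (neither is taken from litep2p). "Confirmed" here only means that enough distinct peers agree on the address; it says nothing about whether a third party can dial it, which is why a separate AutoNAT-like probe would still be needed.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tracker: an observed external address becomes a candidate
/// once `threshold` distinct peers have reported it via Identify.
struct ObservedAddrTracker {
    threshold: usize,
    /// observed address -> set of peers that reported it
    reporters: HashMap<String, HashSet<String>>,
}

impl ObservedAddrTracker {
    fn new(threshold: usize) -> Self {
        Self { threshold, reporters: HashMap::new() }
    }

    /// Records that `peer` observed us at `observed`. Returns `true` exactly
    /// once per address: when the number of distinct reporters reaches the threshold.
    fn report(&mut self, peer: &str, observed: &str) -> bool {
        let peers = self.reporters.entry(observed.to_string()).or_default();
        let is_new_reporter = peers.insert(peer.to_string());
        is_new_reporter && peers.len() == self.threshold
    }
}

fn main() {
    let mut tracker = ObservedAddrTracker::new(3);
    let addr = "/ip4/203.0.113.50/tcp/30333";
    // Repeated reports from the same peer do not count twice.
    for peer in ["peer-a", "peer-b", "peer-a", "peer-c"] {
        if tracker.report(peer, addr) {
            println!("{addr} confirmed by {peer}; reachability still unverified");
        }
    }
}
```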

@dmitry-markin
Collaborator

Also, the heuristic of not reporting back failed addresses won't work with restricted cone NATs. In that case, a peer that was previously dialed by the node behind the NAT will succeed when dialing it back, while no other peer will be able to reach the node behind the NAT using the discovered address and port. AutoNAT tries to solve this issue by probing the addresses from a different IP.
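A toy model of a restricted cone NAT illustrates the problem. It is a simplification under stated assumptions: the hypothetical `RestrictedConeNat` only tracks which remote IPs the node has sent traffic to, and ignores ports and mapping timeouts.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Toy model of a restricted cone NAT: an inbound packet to the node's mapped
/// address/port is forwarded only if the node previously sent traffic to the
/// sender's IP (any port). Mapping timeouts are ignored.
struct RestrictedConeNat {
    contacted_ips: HashSet<IpAddr>,
}

impl RestrictedConeNat {
    fn new() -> Self {
        Self { contacted_ips: HashSet::new() }
    }

    /// The node behind the NAT sends traffic to `dst`.
    fn outbound(&mut self, dst: IpAddr) {
        self.contacted_ips.insert(dst);
    }

    /// Whether an inbound dial from `src` to the mapped port gets through.
    fn allows_inbound(&self, src: IpAddr) -> bool {
        self.contacted_ips.contains(&src)
    }
}

fn main() {
    let mut nat = RestrictedConeNat::new();
    let peer_a: IpAddr = "198.51.100.10".parse().unwrap();
    let peer_b: IpAddr = "203.0.113.20".parse().unwrap();

    // The node behind the NAT dials peer A; A observes the mapped address.
    nat.outbound(peer_a);

    // A's dial back to the observed address succeeds, so A would keep reporting it...
    assert!(nat.allows_inbound(peer_a));
    // ...yet B, which the node never contacted, cannot reach the same address.
    assert!(!nat.allows_inbound(peer_b));
    println!("A reachable: true, B reachable: false");
}
```

Only a probe from an IP the node has not contacted exposes the difference, which is the property the AutoNAT approach relies on.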

lexnv added a commit that referenced this issue Aug 21, 2024
… error reporting (#206)

The purpose of this PR is to pave the way for making the Identify
protocol more robust; its current behavior is linked to the low number
of peers and the connectivity issues observed over long periods of time:
- paritytech/polkadot-sdk#4925

This PR adds a coherent `DialError` that exposes the minimal information
users need to know about dial failures.
- paritytech/polkadot-sdk#5239

A new litep2p event is added to report back to the user multiple dial
errors that occur on different protocols:

```rust
    /// A list of multiple dial failures.
    ListDialFailures {
        /// List of errors.
        ///
        /// Depending on the transport, the address might be different for each error.
        errors: Vec<(Multiaddr, DialError)>,
    },
```

This event eases debugging of Substrate connectivity issues. It can
also be used in a future PR to inform the Identify protocol which
self-reported addresses of some peers are unreachable:
- #203
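As an illustration of consuming the event, the sketch below groups the reported errors by address so that each failing address is logged once. `Event` and `DialError` are hypothetical stand-ins mirroring only the variant shown above; the `DialError` variants are invented for the example and are not litep2p's, and `Multiaddr` is a plain `String`.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins; the variants below are illustrative only.
type Multiaddr = String;

#[derive(Debug)]
enum DialError {
    Timeout,
    Refused,
}

/// Stand-in mirroring only the `ListDialFailures` variant.
enum Event {
    ListDialFailures { errors: Vec<(Multiaddr, DialError)> },
}

/// Groups errors by address (sorted, for stable log output).
fn group_failures(event: Event) -> BTreeMap<Multiaddr, Vec<DialError>> {
    let mut grouped: BTreeMap<Multiaddr, Vec<DialError>> = BTreeMap::new();
    match event {
        Event::ListDialFailures { errors } => {
            for (addr, err) in errors {
                grouped.entry(addr).or_default().push(err);
            }
        }
    }
    grouped
}

fn main() {
    let event = Event::ListDialFailures {
        errors: vec![
            ("/ip4/192.0.2.1/tcp/30333".to_string(), DialError::Timeout),
            ("/ip4/192.0.2.1/tcp/30333/ws".to_string(), DialError::Refused),
            ("/ip4/192.0.2.1/tcp/30333".to_string(), DialError::Refused),
        ],
    };
    for (addr, errs) in group_failures(event) {
        println!("{addr}: {errs:?}");
    }
}
```

The keys of the grouped map are also the set of addresses that an Identify-side consumer, as proposed in #203, could treat as unreachable.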

### Next Steps
- Add more tests
- Test warp sync and full-node sync, since this PR touches individual
transports

### Future Work
- The overarching `litep2p::Error` needs a closer look and
refactoring:
  - #204
  - #128
  
- ConnectionError event for individual transports can be simplified:
  - #205
  
- I've observed some inconsistencies in how TCP and WebSocket handle
connection timeouts. I believe we can make another pass and share
even more code between them:
  - #70

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>