AutoRelay + HolePunching + AutoNAT regressions #2965
I think this is happening because hole punching wants addresses obtained from identify. This is the … To confirm whether this is the issue, can you try bootstrapping your private nodes with a DHT? Try using the IPFS DHT (https://github.com/libp2p/go-libp2p-kad-dht). This will connect you to a bunch of peers who will provide you with your public addresses.
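Roughly speaking, this is why connecting to many DHT peers can help: go-libp2p only starts using an address observed via identify once enough distinct observers have reported it, so a node that only talks to a relay and one other peer may never reach that point. The sketch below is a simplified, hypothetical model. The `observedAddrs` type, its methods and the threshold of 4 are assumptions. The real observed-address manager is internal, groups observers differently (for example by network address rather than peer ID) and may change between versions.

```go
package main

import (
	"fmt"
	"sort"
)

// activationThresh is an ASSUMED number of distinct observers that must
// report the same address before it is used. go-libp2p's real threshold
// and grouping rules are internal and version-dependent.
const activationThresh = 4

// observedAddrs is a hypothetical, simplified stand-in for an observed-address
// manager: for each address it stores the set of peers that reported it.
type observedAddrs struct {
	observers map[string]map[string]struct{}
}

func newObservedAddrs() *observedAddrs {
	return &observedAddrs{observers: make(map[string]map[string]struct{})}
}

// Record notes that peer observer reported seeing us at addr (as identify does).
// Repeated reports from the same observer are counted once.
func (o *observedAddrs) Record(observer, addr string) {
	set, ok := o.observers[addr]
	if !ok {
		set = make(map[string]struct{})
		o.observers[addr] = set
	}
	set[observer] = struct{}{}
}

// Confirmed returns, sorted, the addresses reported by at least
// activationThresh distinct observers.
func (o *observedAddrs) Confirmed() []string {
	var out []string
	for addr, set := range o.observers {
		if len(set) >= activationThresh {
			out = append(out, addr)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pub := "/ip4/203.0.113.7/udp/4001/quic-v1" // documentation address

	// Only the relay and the other private node know us: below the threshold.
	few := newObservedAddrs()
	few.Record("relay", pub)
	few.Record("peerB", pub)
	fmt.Println(len(few.Confirmed())) // 0

	// After joining a DHT, many peers report the same address.
	many := newObservedAddrs()
	for _, p := range []string{"relay", "peerB", "dht-1", "dht-2", "dht-3"} {
		many.Record(p, pub)
	}
	fmt.Println(many.Confirmed()) // [/ip4/203.0.113.7/udp/4001/quic-v1]
}
```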
At one point I did try connecting to the known DHT peers, although I didn't actually initialize the DHT itself. I will try that and let you know.
I tried bootstrapping the DHT and repeated the test, and it worked. Which, to be honest, is a bit frustrating :) because it clearly was not working before, even with bootstrapping, although back then I had been seeing some errors during bootstrapping. Could a temporary failure on the IPFS bootstrap nodes cause something like this?
I created a separate issue about the use of addresses in hole punching: #2966. I guess this issue could be closed, unless it is useful for discovering and tracking the problems with AutoNAT being unreliable.
Let's use #2966. Please reopen this if you run into it again.
We recently upgraded libp2p to v0.36.3 and started having a lot of problems with hole punching, to the point that it just doesn't work.
I spent a few days digging into the issues, and I'd like to share my findings, because I believe there are a few bugs in there, possibly a regression compared to an older version of libp2p.
The setup I used to reproduce these issues is the following:
I manually run the relay, then run the first node, wait until it connects to the relay, and copy its addresses. I then spin up the second node on the other computer and make it connect using the addresses I copied. The connection is established, but it gets stuck in the `Limited` state and never gets upgraded to the `Connected` state, so I'm never able to open any streams unless I use the `AllowLimitedConn` option.
I tried doing the same thing without forcing reachability on the NAT-ed nodes, letting them figure it out using AutoNAT. It didn't help. Using AutoNAT v2 doesn't seem to make any difference either. The computers correctly find that they are private and connect to the relay, but they never figure out their own public IPs. Sometimes I see a lot of random AutoNAT dialing failures in the logs.
After spending a lot of time tweaking the code and enabling all sorts of log messages, I figured out the following:
Regardless of whether reachability is forced, and regardless of whether AutoNAT v2 is used, in both of my totally separate networks the libp2p node is not able to discover its public IP address. As a result, the hole puncher service never starts, which is why the relayed connection never gets upgraded to a direct one.
I tried to fix this problem by manually detecting my public IP using STUN and then adding it to my list of addresses with a custom `AddrFactory` option.
That didn't fix the problem, because the hole punching code doesn't use `host.Addrs()` to determine its public addresses for the DCUtR protocol. It only takes observed addresses plus network interface addresses. See this code:
`go-libp2p/p2p/protocol/holepunch/svc.go`, lines 279 to 307 in 921cc71
And this line here:
`go-libp2p/p2p/protocol/holepunch/holepuncher.go`, line 209 in 921cc71
In my case, observed addresses are always empty, because for some reason AutoNAT doesn't seem to be doing its job. And because `host.Addrs()` is not called there, my custom `AddrFactory` is not used either.
I forked libp2p and added the necessary changes to use `host.Addrs()` to collect all the addresses. Unfortunately, that didn't work either, because AutoRelay seems to overwrite my custom `AddrFactory`. I created a separate issue for this: #2964.
This is where I realized that no amount of duct tape would fix the problem for me, so I decided to create this issue.
To summarize: