Dynamic IP address: Peers drop without recovery after IP-reset #7220

Closed
handelaar2 opened this issue Sep 12, 2020 · 7 comments · Fixed by #7434 or #7824

@handelaar2

handelaar2 commented Sep 12, 2020

🐞 Bug Report

Description

My ISP resets my IP address every day (it only provides dynamic IPv4 addresses). I'm running my beacon node with --p2p-host-dns=xxx.
The DNS host name resolves to the new IP immediately.

However, after this IP change my peer count drops to only a few (<5), and the beacon node never recovers to the expected max-peer value (30).

Strangely, the peer drop does not happen every day, although the IP address changes every day. On Sept 6, 7 and 9 the reset did not cause a drop in peers.
The only way to recover is to restart the beacon node.
Below is a Grafana graph of the last 7 days. On most days I have to restart.
Today I have not restarted yet, so the current peer count is still 3.
Over the last 3 days it has happened every day.

[Image: Grafana peer count, last 7 days (2020-09-12_14-21-35)]

Having only a few peers left after the IP reset obviously hurts the performance of the validators using this beacon node.

Has this worked before in a previous version?

I have no exact information, but I think it was OK in the past (before discV5), i.e. before Medalla.

🔬 Minimal Reproduction

Run the beacon node with the --p2p-host-dns parameter and change/reset the IP address the beacon node is using (commonly called having a "dynamic IP").
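
To correlate peer drops with the exact moment of an IP reset, it can help to log when the DNS name starts resolving to a new address. The following is a minimal Go sketch of such a watcher, not part of Prysm: `watchHost`, `sameAddrs`, `resolveFunc` and the host name `beacon.example.org` are hypothetical names chosen for illustration. The demo in `main` uses a scripted fake resolver so it runs offline; in real use one would pass `net.DefaultResolver.LookupHost` from the standard library instead.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// resolveFunc has the same shape as (*net.Resolver).LookupHost, so the real
// resolver or a fake one can be plugged in. (Hypothetical helper type.)
type resolveFunc func(ctx context.Context, host string) ([]string, error)

// sameAddrs reports whether two address lists hold the same entries,
// ignoring order.
func sameAddrs(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	as := append([]string(nil), a...)
	bs := append([]string(nil), b...)
	sort.Strings(as)
	sort.Strings(bs)
	for i := range as {
		if as[i] != bs[i] {
			return false
		}
	}
	return true
}

// watchHost resolves host once per interval and calls onChange whenever the
// result differs from the previous successful lookup. Failed lookups are
// skipped. It returns when ctx is cancelled.
func watchHost(ctx context.Context, resolve resolveFunc, host string,
	interval time.Duration, onChange func(prev, curr []string)) {
	var last []string
	seen := false
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		addrs, err := resolve(ctx, host)
		if err == nil {
			if seen && !sameAddrs(last, addrs) {
				onChange(last, addrs)
			}
			last, seen = addrs, true
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	// Scripted fake resolver: the address changes on the third lookup.
	// The addresses are documentation-range placeholders.
	calls := 0
	fake := func(ctx context.Context, host string) ([]string, error) {
		calls++
		if calls <= 2 {
			return []string{"203.0.113.10"}, nil
		}
		return []string{"203.0.113.27"}, nil
	}

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	watchHost(ctx, fake, "beacon.example.org", 10*time.Millisecond,
		func(prev, curr []string) {
			fmt.Printf("address changed: %v -> %v\n", prev, curr)
		})
}
```

Comparing the timestamps of these change events against the peer-count graph would show whether every drop lines up with a reset, and which resets did not cause one.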

🔥 Error

The peer count drops to only a few.





🌍 Your Environment

Operating System:
Docker

What version of Prysm are you running? (Which release)
latest, release 24


Anything else relevant (validator index / public key)?
For reference, my eth1 geth node (which uses the same IP) handles the IP reset without issue. It drops its peers, but discovery then brings the peer count back to where it was. I'm expecting the same behaviour from my beacon node.
[Image: 2020-09-12_14-29-13]

Detailed trace logs for this issue have already been sent to @nisdas.

@farazdagi
Contributor

@nisdas is there any known way to resolve this?

@prestonvanloon
Member

I don't think there is much more we can do here. I'd recommend that the user purchase a static IP from their ISP or not advertise their public IP. What is happening is that the user loses their inbound connections when the IP changes. As far as I can tell, there really isn't much we can do about it in code.

With regard to the geth node, what flags are you using? Are you advertising the DNS address? I suspect that you temporarily lose all of your geth peers and then reconnect to them in the same way you would if you temporarily lost internet access (which might happen every time your ISP switches your IP).

@nisdas
Member

nisdas commented Sep 29, 2020

While IP resets are not great, discovery should still recover. It looks like we are completely unable to find new peers, which shouldn't be the case even with a dynamic IP; outbound connections should still be entirely possible. I have been meaning to dig into this, but I have been occupied with other things (sorry!). I will ping you on Discord this week to debug this further, @handelaar2. My initial guess is that discoveryV5 is unable to handle the regular IP resets correctly. We will need more detailed trace logging, as the current logging is insufficient.
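
One way to make the "discovery should still recover" expectation observable is a small watchdog that flags when the peer count has stayed low for several consecutive samples. The sketch below is an illustration only and is not how Prysm or discoveryV5 is implemented: `lowPeerWatchdog`, its fields, and the thresholds are hypothetical, and the sample values merely echo the numbers reported in this issue (30 expected, about 3 after a reset).

```go
package main

import "fmt"

// lowPeerWatchdog counts consecutive peer-count samples below minPeers.
// (Hypothetical type for illustration; not a Prysm API.)
type lowPeerWatchdog struct {
	minPeers   int // samples below this count as "low"
	maxLowRuns int // consecutive low samples tolerated before flagging
	lowRuns    int // current streak of low samples
}

// observe records one peer-count sample and reports whether the count has
// now been low for at least maxLowRuns samples in a row. A healthy sample
// resets the streak.
func (w *lowPeerWatchdog) observe(peers int) bool {
	if peers >= w.minPeers {
		w.lowRuns = 0
		return false
	}
	w.lowRuns++
	return w.lowRuns >= w.maxLowRuns
}

func main() {
	w := &lowPeerWatchdog{minPeers: 5, maxLowRuns: 3}
	// Simulated samples: healthy, then a drop after an IP reset, then recovery.
	samples := []int{30, 28, 3, 3, 4, 3, 29}
	for i, peers := range samples {
		if w.observe(peers) {
			// In a real node this is where extra trace logging or a
			// discovery restart could be triggered (an assumption).
			fmt.Printf("sample %d: peers=%d, stalled\n", i, peers)
		}
	}
}
```

A brief dip right after the reset is expected and is tolerated by the streak counter; only a sustained low count, as described in this report, is flagged.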

@nisdas
Member

nisdas commented Oct 9, 2020

Unfortunately, as reported by @handelaar2, this is still not resolved by #7434, so I am re-opening it now.

@Phistr90

Phistr90 commented Dec 6, 2020

This is still an issue for me, with the extra step that I am using a VPN with port forwarding. The VPN connection is re-established after a disconnection, so the public IP for the p2p network may stay the same or may change when the IP resets on my network. Over the last few days I kept the same VPN server IP. Sometimes Prysm goes into deadlock, sometimes everything works out fine. Geth has absolutely no problem reconnecting. Prysm reconnects to peers relatively fast (75), but it keeps saying

Failed to find peers" error="failed to find peers for subnet

No syncing, nothing else.

p2p_topic_peer_count also looks perfectly fine. I am subscribed to all subnets and I have peer connections for every attestation committee, beacon_block, etc.

@handelaar2
Author

handelaar2 commented Dec 6, 2020

@Phistr90 I'm not sure I understand. You write "Prysm goes into deadlock" but also "Prysm reconnects to peers relatively fast". The specific error you mention is already being worked on in #8048. The issue I raised (now closed) was about peer_count dropping to almost zero and staying there (so not being able to discover new peers). This is different from finding no peers for a specific subnet.
I think you need to clarify here: "no peers at all" or "no peers for subnet"?

@Phistr90

Phistr90 commented Dec 6, 2020

Yeah, you are correct, this is the wrong place for my issue. It was too late at night when I submitted it, and I had just searched for "dynamic ip" 😓 Hopefully the mentioned PR will do the trick; otherwise I'll open a new issue. The problem at hand is that although I have peers overall and for subnets, it still stops syncing.
