New testnet node not starting initial block sync #2183

Open
raucao opened this issue Nov 19, 2023 · 4 comments

raucao commented Nov 19, 2023

I set up two new VMs containing one testnet and one mainnet node on a new host. The mainnet node started initial sync normally after a short while. However, the testnet node has not started syncing a single block after 2 weeks of trying.

The logs will occasionally report bootstrap peers having been found, but there are no errors or warnings reporting any issues with sync.

2023-11-19-15:59:19.414 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: ip[bootstrap02.testnet.rsk.co] port[50505] id [NodeID{d6f2321604dcb136d2bb14eae2ebdacffc175d5bc2ae56a8a1e053e62edd38a072c9c616bc3c7bf73455f503aa3a2589cf643798f31053d8ff42aba4c1c1394b}]
2023-11-19-15:59:19.476 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: ip[bootstrap06.testnet.rsk.co] port[50505] id [NodeID{78ba2785b14576495a8299952dab590af00c43cdedf1a398a162b98479ceec2267bf579ee8dd0ccdc27ac7dc416c1f681ee7955ac0e1b42b0b00cc6663177de9}]
2023-11-19-15:59:19.477 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: ip[bootstrap01.testnet.rsk.co] port[50505] id [NodeID{0b9ccb3896217ad9208cdd40c6a867a88aa0024f26ee04f78eec30fb6030a5d79639adb687667d70df3c79a4c4a8e99be766ca4386d18298af0a00af1dae9da4}]
2023-11-19-15:59:19.556 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: ip[bootstrap03.testnet.rsk.co] port[50505] id [NodeID{42c45daab94c3b38daa5b323274cc80ae46741e420ea91f96be48c3711b861904458a8d50531f25f355a407c156a93adc22f714befc0919f094cc82d375b8c74}]
2023-11-19-15:59:19.559 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: ip[bootstrap05.testnet.rsk.co] port[50505] id [NodeID{af7db902f8b1713b2d6b9e57d28928d519bfd67882d91fc3ff891bbafbd0fbfd986e8724fda8d0187c4459c00605e8f203f320d8cbf43f4e484129daeb756d4e}]
2023-11-19-15:59:19.572 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: ip[bootstrap04.testnet.rsk.co] port[50505] id [NodeID{fd403b30e37f1bee50b03d7a71a1af82f2af9ee95fd49add15803eec94947f51f6235ec3fdc0c3aa18359933729c2a1514f9ae18ea07fc8f5219578ce87b6bd4}]
2023-11-19-15:59:20.284 DEBUG [messageProcess]  Queued Messages: 0
2023-11-19-15:59:21.285 DEBUG [messageProcess]  Queued Messages: 0
2023-11-19-15:59:21.555 DEBUG [discover]  6 Nodes retrieved from the PE.
2023-11-19-15:59:22.285 DEBUG [messageProcess]  Queued Messages: 0
2023-11-19-15:59:23.285 DEBUG [messageProcess]  Queued Messages: 0
2023-11-19-15:59:24.285 DEBUG [messageProcess]  Queued Messages: 0
2023-11-19-15:59:24.555 DEBUG [discover]  6 Nodes retrieved from the PE.
2023-11-19-15:59:25.285 DEBUG [messageProcess]  Queued Messages: 0

How can I debug why sync is not starting?

raucao changed the title from "New testnet node not starting intial block sync" to "New testnet node not starting initial block sync" on Nov 19, 2023

Vovchyk (Contributor) commented Nov 21, 2023

From what I can see in the logs, your node is struggling to discover network peers other than the bootstrap ones (a.k.a. bootstrap-only nodes). On testnet, the bootstrap nodes provide only one service, the discovery of other peers. That is why your node cannot fetch blocks and start syncing until it is connected to at least one "full" node.
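One way to check this from the logs yourself is to collect every peer announced by PeerExplorer and see whether anything other than a bootstrap host shows up. The following is only a rough sketch: the regular expression is written against the log lines quoted in this issue, the `bootstrapNN.testnet.rsk.co` naming pattern is an assumption taken from that excerpt, and the last sample line (with the address `192.0.2.10`) is hypothetical and only there to show what a non-bootstrap peer would look like. The NodeIDs in the sample are shortened to `...`.

```python
import re

# Written against the PeerExplorer lines quoted in this issue.
PEER_LINE = re.compile(
    r"New Peer found or id changed: ip\[(?P<host>[^\]]+)\] port\[(?P<port>\d+)\]"
)

def discovered_peers(log_lines):
    """Return the set of (host, port) pairs announced by PeerExplorer."""
    peers = set()
    for line in log_lines:
        match = PEER_LINE.search(line)
        if match:
            peers.add((match.group("host"), int(match.group("port"))))
    return peers

def looks_like_bootstrap(host):
    # Assumption: bootstrap hosts follow the naming seen in the excerpt.
    return re.fullmatch(r"bootstrap\d+\.testnet\.rsk\.co", host) is not None

def full_node_candidates(peers):
    """Peers that are not bootstrap hosts, i.e. possible sync partners."""
    return {peer for peer in peers if not looks_like_bootstrap(peer[0])}

SAMPLE_LOG = [
    "2023-11-19-15:59:19.414 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: "
    "ip[bootstrap02.testnet.rsk.co] port[50505] id [NodeID{...}]",
    "2023-11-19-15:59:19.476 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: "
    "ip[bootstrap06.testnet.rsk.co] port[50505] id [NodeID{...}]",
    "2023-11-19-15:59:20.284 DEBUG [messageProcess]  Queued Messages: 0",
    # Hypothetical line, not from the original logs:
    "2023-11-19-15:59:21.000 DEBUG [c.r.n.d.PeerExplorer]  New Peer found or id changed: "
    "ip[192.0.2.10] port[50505] id [NodeID{...}]",
]
```

If `full_node_candidates(discovered_peers(lines))` stays empty for your real log file, the node has only ever heard about bootstrap hosts, which matches the symptom described above.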

The reason your node could not discover full nodes might be that the so-called "buckets" corresponding to your nodeId in the boot nodes' internal tables were already filled up with other peers. This is rare, but it does happen. You can find more info on how the discovery protocol works, for instance, here.
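To make the "bucket" idea more concrete, here is a generic Kademlia-style sketch, not the node's actual code. It assumes the bucket is chosen by the highest bit in which two IDs differ (XOR distance) and that the raw hex IDs are compared directly. A real implementation may hash the IDs first or use a different number of buckets; this thread does not say which. The function names are made up for illustration.

```python
import secrets

ID_BYTES = 64  # the NodeIDs in the logs above are 128 hex characters long

def xor_distance(a_hex, b_hex):
    """Kademlia-style distance: XOR of the two IDs read as integers."""
    return int(a_hex, 16) ^ int(b_hex, 16)

def bucket_index(own_hex, other_hex):
    """Index of the bucket in own's table that `other` would fall into.

    The index is the position of the highest bit in which the IDs differ.
    """
    distance = xor_distance(own_hex, other_hex)
    if distance == 0:
        raise ValueError("a node does not store itself in its own table")
    return distance.bit_length() - 1

def random_node_id():
    """A fresh random ID, standing in for a regenerated nodeId."""
    return secrets.token_hex(ID_BYTES)
```

Because the bucket depends on the relationship between the two IDs, a node that picks a new random nodeId will usually land in a different position in each boot node's table. If the old position was full, the new one may not be.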

We are currently working on improving the UX of the initial bootstrapping process on testnet; e.g., we recently increased the number of boot nodes in this PR, but that hasn't been released yet.

What you can do for now is the following:

  • Simply try restarting your node (sometimes that alone helps; if not, try the other options below).
  • Extend the list of boot nodes in your node's config. The full list can be found here.
  • Regenerate your nodeId. The simplest way is to remove (and back up first, if needed) the <databaseDir>/nodeId.properties file, so that your node automatically generates a new one on its next start. There is then a good chance that at least one boot node has a non-full "bucket" corresponding to your new nodeId, which would allow your node to find other full nodes in the network.
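For the last option, a small helper along these lines could move the file aside instead of deleting it. This is only a sketch: the function name and the backup naming scheme are made up, the database directory is whatever your own config points to, and the node should be stopped before running it.

```python
import shutil
import time
from pathlib import Path

def reset_node_id(database_dir, backup_suffix=None):
    """Move nodeId.properties aside so a new one is generated on next start.

    Returns the backup path, or None if there was no nodeId.properties file.
    """
    node_id_file = Path(database_dir) / "nodeId.properties"
    if not node_id_file.is_file():
        return None
    # Timestamped backup name by default (a naming choice of this sketch).
    suffix = backup_suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = node_id_file.with_name(f"nodeId.properties.bak-{suffix}")
    shutil.move(str(node_id_file), str(backup))
    return backup
```

Calling `reset_node_id("/path/to/your/databaseDir")` (a placeholder path) and then starting the node again should leave you with a fresh nodeId, while the old one stays available in the backup file in case you want to restore it.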

raucao (Author) commented Nov 21, 2023

Thank you for the explanation.

I have tried all of the suggested mitigations, but without luck so far. I had already restarted the node multiple times before, because I remembered how that seemed to fix sync not starting in the past.

I do still have another testnet node running on a different machine, and also added that to the bootstrap list. I do get a couple more IP addresses that aren't the normal bootstrap nodes now, but still no sync. Is there a way that I can tell my own nodes to immediately sync with each other perhaps? They are both on the same private network, so maybe I could even add a rule for prioritized networks or something?

Vovchyk (Contributor) commented Nov 23, 2023

Yes, you can specify nodes that you trust, or that you want to connect to, in your config file. This should also help with bootstrapping and starting a sync. Check out the corresponding config sections.

raucao (Author) commented Nov 23, 2023

Great! Adding the new node to the existing node's trusted list, and connecting by default to it from the new node solved my issue. Thank you!

The originally reported issue still exists when you do not have a trusted node to connect to and just want to start the initial sync via the normal discovery process. So I think this should be kept open until it's been confirmed to work reliably for new users.
