Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discovery zero refresh timer #8661

Merged
merged 27 commits into from
Nov 7, 2023
Merged

Discovery zero refresh timer #8661

merged 27 commits into from
Nov 7, 2023

Conversation

mh0lt
Copy link
Contributor

@mh0lt mh0lt commented Nov 6, 2023

This fixes an issue where the mumbai testnet node struggle to find peers. Before this fix in general test peer numbers are typically around 20 in total between eth66, eth67 and eth68. For new peers some can struggle to find even a single peer after days of operation.

These are the numbers after 12 hours or running on a node which previously could not find any peers: eth66=13, eth67=76, eth68=91.

The root cause of this issue is the following:

  • A significant number of mumbai peers around the boot node return network ids which are different from those currently available in the DHT
  • The available nodes are all consequently busy and return 'too many peers' for long periods

These issues case a significant number of discovery timeouts, some of the queries will never receive a response.

This causes the discovery read loop to enter a channel deadlock - which means that no responses are processed, nor timeouts fired. This causes the discovery process in the node to stop. From then on it just re-requests handshakes from a relatively small number of peers.

This check in fixes this situation with the following changes:

  • Remove the deadlock by running the timer in a separate go-routine so it can run independently of the main request processing.
  • Allow the discovery process matcher to match on port if no id match can be established on initial ping. This allows subsequent node validation to proceed and if the node proves to be valid via the remainder of the look-up and handshake process it us used as a valid peer.
  • Completely unsolicited responses, i.e. those which come from a completely unknown ip:port combination continue to be ignored.

@mh0lt mh0lt added the polygon label Nov 6, 2023
@mh0lt mh0lt merged commit 509a7af into devel Nov 7, 2023
7 checks passed
@mh0lt mh0lt deleted the discovery_zero_refresh_timer branch November 7, 2023 08:48
mh0lt pushed a commit that referenced this pull request Jan 11, 2024
The handler had race conditions in the candidates processing goroutine.
battlmonstr added a commit that referenced this pull request Jan 11, 2024
The handler had race conditions in the candidates processing goroutine.
yperbasis added a commit that referenced this pull request Jan 11, 2024
…#9210)

Co-authored-by: yperbasis <andrey.ashikhmin@gmail.com>
racytech pushed a commit to racytech/erigon that referenced this pull request Jan 12, 2024
…ontech#9119) (erigontech#9195)

The handler had race conditions in the candidates processing goroutine.
wmitsuda added a commit that referenced this pull request Jan 19, 2024
commit 0c7300e
Author: Willian Mitsuda <wmitsuda@gmail.com>
Date:   Fri Jan 12 20:05:43 2024 -0300

    Squashed work from ots2-alpha4

commit c9216ce
Author: battlmonstr <battlmonstr@users.noreply.github.com>
Date:   Thu Jan 11 17:13:46 2024 +0100

    p2p/discv4: revert gotreply handler change from #8661 (#9119) (#9195) (#9210)

    Co-authored-by: yperbasis <andrey.ashikhmin@gmail.com>

commit d079008
Author: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Date:   Mon Jan 8 17:44:30 2024 +0100

    release: Amoy bootnodes (#9166)

    Cherry pick #9158

    ---------

    Co-authored-by: Arpit Temani <temaniarpit27@gmail.com>

commit a49fcb8
Author: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Date:   Fri Jan 5 13:03:28 2024 +0100

    release params: remove dev from version (#9143)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant