Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p/discover: schedule revalidation also when all nodes are excluded #30239

Merged
merged 2 commits into from
Jul 31, 2024

Conversation

dknopik
Copy link
Contributor

@dknopik dknopik commented Jul 29, 2024

Issue

If nextTime has passed, but all nodes are excluded, get would return nil and run would therefore not invoke schedule. Then, we schedule a timer for the past, as neither nextTime value has been updated. This creates a busy loop, as the timer immediately returns.

Fix

With this PR, revalidation will be also rescheduled when all nodes are excluded.

@dknopik dknopik changed the title fix(disc): schedule revalidation also when all nodes are excluded p2p/discover: schedule revalidation also when all nodes are excluded Jul 29, 2024
@lightclient
Copy link
Member

I would apply a patch like this:

diff --git a/p2p/discover/table_reval.go b/p2p/discover/table_reval.go
index c043dbd14..2465fee90 100644
--- a/p2p/discover/table_reval.go
+++ b/p2p/discover/table_reval.go
@@ -77,12 +77,18 @@ func (tr *tableRevalidation) nodeEndpointChanged(tab *Table, n *tableNode) {
 // It returns the next time it should be invoked, which is used in the Table main loop
 // to schedule a timer. However, run can be called at any time.
 func (tr *tableRevalidation) run(tab *Table, now mclock.AbsTime) (nextTime mclock.AbsTime) {
-       if n := tr.fast.getAndSchedule(now, &tab.rand, tr.activeReq); n != nil {
-               tr.startRequest(tab, n)
-       }
-       if n := tr.slow.getAndSchedule(now, &tab.rand, tr.activeReq); n != nil {
-               tr.startRequest(tab, n)
+       reval := func(list *revalidationList) {
+               if list.nextTime <= now {
+                       if n := list.get(now, &tab.rand, tr.activeReq); n != nil {
+                               tr.startRequest(tab, n)
+                       }
+                       // Update nextTime regardless if any requests were started because
+                       // current value has passed.
+                       list.schedule(now, &tab.rand)
+               }
        }
+       reval(&tr.fast)
+       reval(&tr.slow)

        return min(tr.fast.nextTime, tr.slow.nextTime)
 }
@@ -196,13 +202,8 @@ type revalidationList struct {
        name     string
 }

-// getAndSchedule returns a random node from the queue. Nodes in the 'exclude' map are not returned.
-// It also schedules the next invocation if nextTime has already passed.
-func (list *revalidationList) getAndSchedule(now mclock.AbsTime, rand randomSource, exclude map[enode.ID]struct{}) *tableNode {
-       if now < list.nextTime {
-               return nil
-       }
-       list.schedule(now, rand)
+// get returns a random node from the queue. Nodes in the 'exclude' map are not returned.
+func (list *revalidationList) get(now mclock.AbsTime, rand randomSource, exclude map[enode.ID]struct{}) *tableNode {
        if len(list.nodes) == 0 {
                return nil
        }

The current PR is forcing more logic into get(..) whereas I think it's easier to understand what's happening from a top-down perspective in run(..).

@fjl
Copy link
Contributor

fjl commented Jul 30, 2024

I agree with @lightclient but in his patch the condition seems inverted. We only want to perform work and reschedule if list.nextTime >= now.

Co-authored-by: lightclient <lightclient@protonmail.com>
@dknopik
Copy link
Contributor Author

dknopik commented Jul 30, 2024

@lightclient
Good idea, applied!

@fjl
Matt's suggestion of list.nextTime <= now (as condition for rescheduling) should be equivalent to the previous condition now < list.nextTime (for NOT rescheduling)

@fjl
Copy link
Contributor

fjl commented Jul 31, 2024

Oh right.

@fjl fjl merged commit de6d597 into ethereum:master Jul 31, 2024
3 checks passed
@fjl fjl added this to the 1.14.8 milestone Jul 31, 2024
leeren pushed a commit to storyprotocol/story-geth that referenced this pull request Aug 16, 2024
…thereum#30239)

## Issue

If `nextTime` has passed, but all nodes are excluded, `get` would return
`nil` and `run` would therefore not invoke `schedule`. Then, we schedule
a timer for the past, as neither `nextTime` value has been updated. This
creates a busy loop, as the timer immediately returns.

## Fix

With this PR, revalidation will be also rescheduled when all nodes are
excluded.

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants