Increased total_waiting counter for eval broker on non leader nodes #4670

Closed
tantra35 opened this issue Sep 12, 2018 · 5 comments · Fixed by #5699

Comments

@tantra35
Contributor

tantra35 commented Sep 12, 2018

Nomad version

Nomad v0.8.4 (dbee1d7)

Issue

On non-leader servers we see a constantly increasing total_waiting counter for the eval broker.

[screenshot: total_waiting eval broker stat graphed per server]

In the screenshot, node vol-cl-constrol-03 is the current leader and its broker is fine, but on the non-leader servers this counter only keeps increasing. That is wrong, because the eval broker should only be working on the leader node.

It seems that eval_broker.go should be changed like this:

diff --git a/nomad/eval_broker.go b/nomad/eval_broker.go
index 54e32e7a8..9c484fd39 100644
--- a/nomad/eval_broker.go
+++ b/nomad/eval_broker.go
@@ -161,6 +161,7 @@ func (b *EvalBroker) Enabled() bool {
 // should only be enabled on the active leader.
 func (b *EvalBroker) SetEnabled(enabled bool) {
 	b.l.Lock()
+	defer b.l.Unlock()
 	prevEnabled := b.enabled
 	b.enabled = enabled
 	if !prevEnabled && enabled {
@@ -169,7 +170,7 @@ func (b *EvalBroker) SetEnabled(enabled bool) {
 		b.delayedEvalCancelFunc = cancel
 		go b.runDelayedEvalsWatcher(ctx)
 	}
-	b.l.Unlock()
+
 	if !enabled {
 		b.flush()
 	}
@@ -232,6 +233,10 @@ func (b *EvalBroker) processEnqueue(eval *structs.Evaluation, token string) {
 	}
 
 	if !eval.WaitUntil.IsZero() {
+		if !b.enabled {
+			return
+		}
+
 		b.delayHeap.Push(&evalWrapper{eval}, eval.WaitUntil)
 		b.stats.TotalWaiting += 1
 		// Signal an update.
@@ -248,7 +253,13 @@ func (b *EvalBroker) processEnqueue(eval *structs.Evaluation, token string) {
 // processWaitingEnqueue waits the given duration on the evaluation before
 // enqueuing.
 func (b *EvalBroker) processWaitingEnqueue(eval *structs.Evaluation) {
+	if !b.enabled {
+		return
+	}
+
 	timer := time.AfterFunc(eval.Wait, func() {
+		b.l.Lock()
+		defer b.l.Unlock()
 		b.enqueueWaiting(eval)
 	})
 	b.timeWait[eval.ID] = timer
@@ -257,8 +268,6 @@ func (b *EvalBroker) processWaitingEnqueue(eval *structs.Evaluation) {
 
 // enqueueWaiting is used to enqueue a waiting evaluation
 func (b *EvalBroker) enqueueWaiting(eval *structs.Evaluation) {
-	b.l.Lock()
-	defer b.l.Unlock()
 	delete(b.timeWait, eval.ID)
 	b.stats.TotalWaiting -= 1
 	b.enqueueLocked(eval, eval.Type)
@@ -680,9 +689,6 @@ func (b *EvalBroker) ResumeNackTimeout(evalID, token string) error {
 
 // Flush is used to clear the state of the broker
 func (b *EvalBroker) flush() {
-	b.l.Lock()
-	defer b.l.Unlock()
-
 	// Unblock any waiters
 	for _, waitCh := range b.waiting {
 		close(waitCh)
@@ -763,8 +769,8 @@ func (b *EvalBroker) runDelayedEvalsWatcher(ctx context.Context) {
 			return
 		case <-timerChannel:
 			// remove from the heap since we can enqueue it now
-			b.delayHeap.Remove(&evalWrapper{eval})
 			b.l.Lock()
+			b.delayHeap.Remove(&evalWrapper{eval})
 			b.stats.TotalWaiting -= 1
 			b.enqueueLocked(eval, eval.Type)
 			b.l.Unlock()
@@ -777,11 +783,14 @@ func (b *EvalBroker) runDelayedEvalsWatcher(ctx context.Context) {
 // nextDelayedEval returns the next delayed eval to launch and when it should be enqueued.
 // This peeks at the heap to return the top. If the heap is empty, this returns nil and zero time.
 func (b *EvalBroker) nextDelayedEval() (*structs.Evaluation, time.Time) {
+	b.l.RLock()
 	// If there is nothing wait for an update.
 	if b.delayHeap.Length() == 0 {
+		b.l.RUnlock()
 		return nil, time.Time{}
 	}
 	nextEval := b.delayHeap.Peek()
+	b.l.RUnlock()
 
 	if nextEval == nil {
 		return nil, time.Time{} 

The key idea here is to insert an early return whenever the broker is disabled:

	if !b.enabled {
		return
	}
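
For illustration, here is a minimal self-contained sketch of that pattern in Go. The Broker type, its fields, and the Enqueue/flushLocked names below are simplified stand-ins invented for this example, not Nomad's actual EvalBroker API; the point is only that the enqueue path returns early while the broker is disabled, so a non-leader never touches its waiting stats:

package main

import (
	"fmt"
	"sync"
	"time"
)

// Broker is a simplified stand-in for Nomad's EvalBroker.
type Broker struct {
	l            sync.RWMutex
	enabled      bool
	totalWaiting int
	waitUntil    map[string]time.Time // stand-in for the delay heap
}

func NewBroker() *Broker {
	return &Broker{waitUntil: make(map[string]time.Time)}
}

// SetEnabled flips the broker on or off. Disabling also flushes state while
// still holding the lock, mirroring the `defer b.l.Unlock()` change above.
func (b *Broker) SetEnabled(enabled bool) {
	b.l.Lock()
	defer b.l.Unlock()
	b.enabled = enabled
	if !enabled {
		b.flushLocked()
	}
}

// Enqueue records a delayed evaluation. The early return is the key idea:
// a disabled (non-leader) broker must not accumulate waiting entries.
func (b *Broker) Enqueue(id string, until time.Time) {
	b.l.Lock()
	defer b.l.Unlock()
	if !b.enabled {
		return
	}
	b.waitUntil[id] = until
	b.totalWaiting++
}

// flushLocked clears all waiting state; the caller must hold the lock.
func (b *Broker) flushLocked() {
	b.waitUntil = make(map[string]time.Time)
	b.totalWaiting = 0
}

func main() {
	b := NewBroker()
	b.Enqueue("eval-1", time.Now().Add(time.Minute)) // dropped: broker disabled
	fmt.Println("waiting while disabled:", b.totalWaiting) // 0

	b.SetEnabled(true)
	b.Enqueue("eval-2", time.Now().Add(time.Minute))
	fmt.Println("waiting while enabled:", b.totalWaiting) // 1
}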
@tantra35 tantra35 changed the title Increased total_wating counter for eval brocker on non leader nodes Increased total_wating counter for eval broker on non leader nodes Sep 12, 2018
@tantra35
Contributor Author

tantra35 commented Sep 12, 2018

Also, it is very strange that the eval broker methods don't check whether the broker is enabled.

@tantra35 tantra35 reopened this Sep 12, 2018
@tantra35
Contributor Author

Also, in some cases this can lead to memory leaks on non-leader nodes due to the delayHeap never being cleared.
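
To make the memory-leak concern concrete, here is a rough sketch that extends the simplified Broker from the example above (same hypothetical package; leakOnFollower and the numbers are invented for illustration, not Nomad code):

// leakOnFollower sketches what happens on a non-leader server. Without the
// `if !b.enabled { return }` guard, every Raft-applied delayed eval would be
// pushed into the delay heap and never removed, because the watcher that pops
// entries (runDelayedEvalsWatcher in Nomad) only runs on the leader.
func leakOnFollower(b *Broker) {
	// The broker is never enabled here because this server is not the leader.
	for i := 0; i < 100000; i++ {
		b.Enqueue(fmt.Sprintf("eval-%d", i), time.Now().Add(time.Hour))
	}
	// With the guard in place, b.totalWaiting stays 0 and nothing is retained;
	// without it, all 100000 entries would live until the process exits.
}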

@preetapan preetapan changed the title Increased total_wating counter for eval broker on non leader nodes Increased total_waiting counter for eval broker on non leader nodes Sep 17, 2018
@tantra35
Contributor Author

@preetapan Why is this issue being ignored? I just checked the 0.9.0-beta3 branch and the bug is still present.

@preetapan
Contributor

preetapan commented Mar 11, 2019

@tantra35 0.9 is already such a big release that we didn't include this fix. It will be in the next point release, 0.9.1.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2022