Don't block worker while evaluating a policy #354

lgfa29 · 2021-01-26T01:53:03Z

The Nomad Autoscaler v0.0.x had a single goroutine evaluating policies. When we introduced multiple checks in the v0.1.0 release, we parallelized as much of the work as possible to keep that goroutine fast.

This introduced a race condition in which the worker and the checks could get stuck waiting on each other.

In the v0.2.0 release we introduced the EvalBroker and Workers, so parallelization can now be provided at a higher level (per policy).

This PR turns the policy evaluation process into a linear execution to prevent the Worker from getting stuck. It also adds checks for ctx.Done() before potentially long-running operations, namely around plugin calls.

Future work will include a heartbeat mechanism to detect and recreate workers that get stuck, then we can start parallelizing the check executions again.

Closes #218 #303 #343

policyeval/base_worker.go

cgbaker

nice. there are a number of places where we perform unchecked type assertions after calling dispense. i'm willing to be convinced otherwise, but it seems like a good idea to do a checked type-assertion. a bug somewhere earlier in the code shouldn't be able to result in a panic here.

i only marked the first of these.
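A minimal sketch of the checked type assertion being suggested, assuming a dispense call that returns an untyped `interface{}`. `Target`, `dispenseFunc`, `dispenseTarget`, and `noopTarget` are hypothetical names, not the autoscaler's API. The two-value ("comma ok") form of the assertion turns an unexpected plugin type into an error instead of a panic.

```go
package main

import "fmt"

// Target is a hypothetical plugin interface the caller expects.
type Target interface {
	Scale(count int64) error
}

// dispenseFunc is a hypothetical stand-in for a plugin manager call that
// returns an untyped plugin instance.
type dispenseFunc func(name string) (interface{}, error)

// dispenseTarget uses a checked type assertion, so a bug elsewhere that
// hands back the wrong plugin type yields an error rather than a panic.
func dispenseTarget(dispense dispenseFunc, name string) (Target, error) {
	raw, err := dispense(name)
	if err != nil {
		return nil, fmt.Errorf("failed to dispense plugin %q: %w", name, err)
	}
	t, ok := raw.(Target)
	if !ok {
		return nil, fmt.Errorf("plugin %q is %T, not a Target", name, raw)
	}
	return t, nil
}

// noopTarget is a trivial Target used only to exercise dispenseTarget.
type noopTarget struct{}

func (noopTarget) Scale(int64) error { return nil }
```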

policyeval/base_worker.go

Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>

lgfa29 · 2021-01-26T18:23:59Z

The PluginManager should be checking those, but I agree: it's better to protect against panics here.

I think we could expand the PluginManager to request specific plugin types, now that we have mostly settled on the types of plugins we have.
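One hypothetical shape for "request specific plugin types": a single generic helper that performs the lookup and the checked assertion together, so callers never handle `interface{}` directly. `registry` and `dispenseAs` are invented for illustration, and the sketch uses type parameters from Go 1.18, which postdates this PR; one small helper per plugin type would achieve the same on older Go versions.

```go
package main

import "fmt"

// registry is a hypothetical plugin manager holding untyped instances.
type registry struct {
	plugins map[string]interface{}
}

// dispenseAs looks up a plugin by name and asserts it to the requested
// type T in one place. It returns an error, never a panic, when the
// plugin is missing or has an unexpected type.
func dispenseAs[T any](r *registry, name string) (T, error) {
	var zero T
	raw, found := r.plugins[name]
	if !found {
		return zero, fmt.Errorf("plugin %q not found", name)
	}
	p, ok := raw.(T)
	if !ok {
		return zero, fmt.Errorf("plugin %q has unexpected type %T", name, raw)
	}
	return p, nil
}
```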

don't block worker while evaluating a policy

038e422

lgfa29 requested review from cgbaker, gogococo, jazzyfresh and jrasell as code owners January 26, 2021 01:53

remove unused checkHandlerResult struct

ca00c52

cgbaker reviewed Jan 26, 2021

policyeval/base_worker.go (outdated, resolved)

cgbaker approved these changes Jan 26, 2021

policyeval/base_worker.go (outdated, resolved)

lgfa29 and others added 2 commits January 26, 2021 13:16

fix typos

8d0fc6d

Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>

check plugin type after dispensing an instance

c4d7389

lgfa29 added the stage/accepted and theme/policy-eval (Policy broker, workers and evaluation) labels Jan 27, 2021

lgfa29 merged commit 2f53694 into master Jan 28, 2021

lgfa29 deleted the fix-303 branch January 28, 2021 16:18

This was referenced Jan 28, 2021

Autoscaler blocked #303 (closed)

Failed to ACK policy evaluation #343 (closed)
