Don't block worker while evaluating a policy #354

Merged (4 commits) on Jan 28, 2021

lgfa29 (Contributor) commented Jan 26, 2021

The Nomad Autoscaler v0.0.x had a single goroutine evaluating policies. When we introduced multiple checks in the v0.1.0 release we tried to parallelize as much work as possible to keep the goroutine as fast as possible.

This created a race condition where the worker and the checks could be stuck waiting on each other.

In the v0.2.0 release we introduced the EvalBroker and Workers, so parallelization can now be provided at a higher level (per policy).

This PR turns the policy evaluation process into a linear execution to prevent the Worker from getting stuck. It also introduces checks for ctx.Done() before potentially long-running operations happen (namely, around plugin calls).

Future work will include a heartbeat mechanism to detect and recreate workers that get stuck. Once that is in place, we can start parallelizing the check executions again.

Closes #218 #303 #343

Review thread on policyeval/base_worker.go (outdated, resolved)
cgbaker (Contributor) left a comment

nice. there are a number of places where we perform unchecked type assertion after calling dispense. i'm willing to be convinced otherwise, but it seems like a good idea to do a checked type-assertion. a bug somewhere earlier in the code shouldn't be able to result in a panic here.

i only marked the first of these.
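
For reference, a minimal sketch of the difference between the two forms of assertion. The `apmPlugin` interface, `fakeAPM`, `dispense`, and `getAPM` are hypothetical stand-ins and not the autoscaler's actual API; the point is only the comma-ok form, which turns a wrong type into an error instead of a panic.

```go
package main

import (
	"errors"
	"fmt"
)

// apmPlugin is a hypothetical plugin interface used only for illustration.
type apmPlugin interface {
	Query(q string) (float64, error)
}

type fakeAPM struct{}

func (fakeAPM) Query(string) (float64, error) { return 1.5, nil }

// dispense stands in for a dispense call that returns an untyped value.
func dispense(name string) (interface{}, error) {
	switch name {
	case "apm":
		return fakeAPM{}, nil
	case "wrong":
		// Simulates a bug earlier in the code handing back the wrong thing.
		return "not a plugin", nil
	}
	return nil, errors.New("plugin not found: " + name)
}

// getAPM uses the checked (comma-ok) assertion. An unchecked
// raw.(apmPlugin) would panic for the "wrong" case.
func getAPM(name string) (apmPlugin, error) {
	raw, err := dispense(name)
	if err != nil {
		return nil, err
	}
	p, ok := raw.(apmPlugin)
	if !ok {
		return nil, fmt.Errorf("plugin %q has unexpected type %T", name, raw)
	}
	return p, nil
}

func main() {
	_, err := getAPM("wrong")
	fmt.Println(err) // the worker gets an error it can log, not a panic
}
```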

Review thread on policyeval/base_worker.go (outdated, resolved)
lgfa29 and others added 2 commits January 26, 2021 13:16
Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
lgfa29 (Contributor, Author) commented Jan 26, 2021

> nice. there are a number of places where we perform unchecked type assertion after calling dispense. i'm willing to be convinced otherwise, but it seems like a good idea to do a checked type-assertion. a bug somewhere earlier in the code shouldn't be able to result in a panic here.
>
> i only marked the first of these.

The PluginManager should be checking those, but I agree, it's better to protect against panics here.

I think we could expand the PluginManager to request specific plugin types now that we are mostly settled on the types of plugin we have.
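
One possible shape for that idea, purely as a sketch: the caller states which kind of plugin it expects and the manager validates the request before handing anything back. Everything here (`pluginType`, `instance`, `pluginManager`, `dispenseAs`) is a hypothetical name chosen for illustration and does not describe the existing PluginManager.

```go
package main

import "fmt"

// pluginType is a hypothetical enumeration of plugin kinds.
type pluginType string

const (
	typeAPM      pluginType = "apm"
	typeTarget   pluginType = "target"
	typeStrategy pluginType = "strategy"
)

// instance pairs a loaded plugin with the kind it was registered as.
type instance struct {
	kind pluginType
	impl interface{}
}

// pluginManager is a toy manager keyed by plugin name.
type pluginManager struct {
	plugins map[string]instance
}

// dispenseAs returns the named plugin only if it was registered with the
// requested kind, so a mismatch surfaces as an error at the manager.
func (m *pluginManager) dispenseAs(name string, want pluginType) (interface{}, error) {
	inst, ok := m.plugins[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q not found", name)
	}
	if inst.kind != want {
		return nil, fmt.Errorf("plugin %q is of type %q, not %q", name, inst.kind, want)
	}
	return inst.impl, nil
}

func main() {
	m := &pluginManager{plugins: map[string]instance{
		"prometheus": {kind: typeAPM, impl: "apm client placeholder"},
	}}

	_, err := m.dispenseAs("prometheus", typeStrategy)
	fmt.Println(err)
}
```

A caller would still want a checked assertion on the returned value; the manager-side check only moves the first line of defence closer to where plugins are registered.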

lgfa29 added the stage/accepted and theme/policy-eval (Policy broker, workers and evaluation) labels Jan 27, 2021
lgfa29 merged commit 2f53694 into master Jan 28, 2021
lgfa29 deleted the fix-303 branch January 28, 2021 16:18
Successfully merging this pull request may close these issues.

policy-worker: evaluation worker sporadically doesn't get unblocked to continue