deployment watcher: Reuse allocsCh if allocIndex remains the same #10756

notnoop · 2021-06-14T17:29:16Z

Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. deploymentWatcher.getAllocsCh creates a new goroutine that makes a blocking query to fetch updates of deployment allocs.

Background

When operators submit a new or updated service job, Nomad create a new deployment by default. The deployment object controls how fast to place the allocations through max_parallel and health checks configurations.

The scheduler and deploymentwatcher package collaborate to achieve deployment logic: The scheduler only places the canaries and max_parallel allocations for a new deployment; the deploymentwatcher monitors for alloc progress and then enqueues a new evaluation whenever the scheduler should reprocess a job and places the next max_parallel round of allocations.

The deploymentwatcher package makes blocking queries against the state store, to fetch all deployments and the relevant allocs for each running deployments. If deploymentwatcher fails or is hindered from fetching the state, the deployments fail to make progress.

Deploymentwatcher logic only runs on the leader.

Why unnecessary deployment watchers can halt cluster progress

Previously, getAllocsCh is called on every for loop iteration in deploymentWatcher.watch() function. However, the for-loop may iterate many times before the allocs get updated. In fact, whenever a new deployment is created/updated/deleted, all deploymentWatchers get notified through w.deploymentUpdateCh. The getAllocsCh goroutines and blocking queries spike significantly and grow quadratically with respect to the number of running deployments. The growth leads to two adverse outcomes:

it spikes the CPU/Memory usage resulting potentially leading to OOM or very slow processing
it activates the query rate limiter, so later the watcher fails to get updates and consequently fails to make progress towards placing new allocations for the deployment!

So the cluster fails to catch up and fails to make progress in almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and free up goroutines and blocking queries; the new leader recreates the watchers without the quadratic growth and remaining under the rate limiter. Well, until a spike of deployments are created triggering the condition again.

Relevant Code References

Path for deployment monitoring:

Watcher.watchDeployments loops waiting for deployment updates.
On every deployment update, w.getDeploys returns all deployments in the system
watchDeployments calls w.add(d) on every active deployment
which in turns, updates existing watcher if one is found.
The deployment watcher updates local local deployment field and trigger deploymentUpdateCh channel
The deployment watcher deploymentUpdateCh selector is activated. Most of the time the selector clause is a no-op, because the flow was triggered due to another deployment update
The watch for-loop iterates again and in the previous code we create yet another goroutine and blocking call that risks being rate limited.

Next Steps: Examine Rate Limiter

We ought to examine the rate limiter. The original deployments PR introduced queryLimiter, but its benefits are unclear at this point and I'm not sure if token based rate limiting makes sense here.

The original limit is 1000 new-queries/second. #10706 make the flag configurable. I'd recommend operators to keep that value in the same order of magnitude as maximum number of running deployments they expect to have.

notnoop · 2021-06-14T17:32:09Z

nomad/deploymentwatcher/deployment_watcher.go

@@ -531,6 +532,8 @@ FAIL:
 			if res.createEval || len(res.allowReplacements) != 0 {
 				w.createBatchedUpdate(res.allowReplacements, allocIndex)
 			}
+
+			allocsCh = w.getAllocsCh(allocIndex)


Ideally this line is immediately after the allocIndex but I wanted to avoid creating a query in error or failure conditions. Though, given that those conditions are relatively rare, I can be persuaded to move it if it improves readability.

notnoop · 2021-06-14T17:32:35Z

nomad/deploymentwatcher/deployment_watcher.go

@@ -487,7 +488,7 @@ FAIL:
 				break FAIL
 			}

-		case updates = <-w.getAllocsCh(allocIndex):
+		case updates = <-allocsCh:


Not sure how to effectively test the side-effects of this change :(.

tgross

LGTM.

Looking back through the code this looks like it's been around since early in the history of deployments once progress deadline was introduced. You've left a lot of great context here in the PR description; I'd recommend moving a lot of it into the commit message so that it's right there with the code.

nomad/deploymentwatcher/deployment_watcher.go

shoenig

LGTM

jazzyfresh

LGTM

Co-authored-by: Tim Gross <tgross@hashicorp.com>

…0756) Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. `deploymentWatcher.getAllocsCh` creates a new goroutine that makes a blocking query to fetch updates of deployment allocs. ## Background When operators submit a new or updated service job, Nomad create a new deployment by default. The deployment object controls how fast to place the allocations through [`max_parallel`](https://www.nomadproject.io/docs/job-specification/update#max_parallel) and health checks configurations. The `scheduler` and `deploymentwatcher` package collaborate to achieve deployment logic: The scheduler only places the canaries and `max_parallel` allocations for a new deployment; the `deploymentwatcher` monitors for alloc progress and then enqueues a new evaluation whenever the scheduler should reprocess a job and places the next `max_parallel` round of allocations. The `deploymentwatcher` package makes blocking queries against the state store, to fetch all deployments and the relevant allocs for each running deployments. If `deploymentwatcher` fails or is hindered from fetching the state, the deployments fail to make progress. `Deploymentwatcher` logic only runs on the leader. ## Why unnecessary deployment watchers can halt cluster progress Previously, `getAllocsCh` is called on every for loop iteration in `deploymentWatcher.watch()` function. However, the for-loop may iterate many times before the allocs get updated. In fact, whenever a new deployment is created/updated/deleted, *all* `deploymentWatcher`s get notified through `w.deploymentUpdateCh`. The `getAllocsCh` goroutines and blocking queries spike significantly and grow quadratically with respect to the number of running deployments. The growth leads to two adverse outcomes: 1. it spikes the CPU/Memory usage resulting potentially leading to OOM or very slow processing 2. it activates the [query rate limiter](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L896-L898), so later the watcher fails to get updates and consequently fails to make progress towards placing new allocations for the deployment! So the cluster fails to catch up and fails to make progress in almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and free up goroutines and blocking queries; the new leader recreates the watchers without the quadratic growth and remaining under the rate limiter. Well, until a spike of deployments are created triggering the condition again. ### Relevant Code References Path for deployment monitoring: * [`Watcher.watchDeployments`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L164-L192) loops waiting for deployment updates. * On every deployment update, [`w.getDeploys`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L194-L229) returns all deployments in the system * `watchDeployments` calls `w.add(d)` on every active deployment * which in turns, [updates existing watcher if one is found](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L251-L255). * The deployment watcher [updates local local deployment field and trigger `deploymentUpdateCh` channel]( https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L136-L147) * The [deployment watcher `deploymentUpdateCh` selector is activated](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L455-L489). Most of the time the selector clause is a no-op, because the flow was triggered due to another deployment update * The `watch` for-loop iterates again and in the previous code we create yet another goroutine and blocking call that risks being rate limited. Co-authored-by: Tim Gross <tgross@hashicorp.com>

robloxrob · 2021-06-28T19:25:36Z

Nice

github-actions · 2022-11-15T02:30:33Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Reuse allocsCh if allocIndex remains the same

abaa9c5

notnoop requested review from shoenig, jazzyfresh and tgross June 14, 2021 17:29

notnoop self-assigned this Jun 14, 2021

notnoop commented Jun 14, 2021

View reviewed changes

tgross approved these changes Jun 14, 2021

View reviewed changes

nomad/deploymentwatcher/deployment_watcher.go Show resolved Hide resolved

shoenig approved these changes Jun 14, 2021

View reviewed changes

jazzyfresh approved these changes Jun 14, 2021

View reviewed changes

Update nomad/deploymentwatcher/deployment_watcher.go

ea33e83

Co-authored-by: Tim Gross <tgross@hashicorp.com>

vercel bot temporarily deployed to Preview – nomad June 14, 2021 19:38 Inactive

vercel bot deployed to Preview – nomad-storybook-and-ui June 14, 2021 19:38 View deployment

notnoop merged commit 8052ae1 into main Jun 14, 2021

notnoop deleted the b-reuse-alloc-ch branch June 14, 2021 20:01

notnoop mentioned this pull request Jun 15, 2021

Re-think deployment watcher rate limiting #10760

Open

notnoop added the stage/needs-backporting label Jun 16, 2021

notnoop added this to the 1.1.2 milestone Jun 16, 2021

tgross added a commit that referenced this pull request Jun 16, 2021

changelog entry for #10756

f730bc9

notnoop pushed a commit that referenced this pull request Jun 17, 2021

changelog entry for #10756

b0922e9

schmichael removed the stage/needs-backporting label Sep 17, 2021

github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deployment watcher: Reuse allocsCh if allocIndex remains the same #10756

deployment watcher: Reuse allocsCh if allocIndex remains the same #10756

notnoop commented Jun 14, 2021

notnoop Jun 14, 2021

notnoop Jun 14, 2021

tgross left a comment

shoenig left a comment

jazzyfresh left a comment

robloxrob commented Jun 28, 2021

github-actions bot commented Nov 15, 2022

deployment watcher: Reuse allocsCh if allocIndex remains the same #10756

deployment watcher: Reuse allocsCh if allocIndex remains the same #10756

Conversation

notnoop commented Jun 14, 2021

Background

Why unnecessary deployment watchers can halt cluster progress

Relevant Code References

Next Steps: Examine Rate Limiter

notnoop Jun 14, 2021

Choose a reason for hiding this comment

notnoop Jun 14, 2021

Choose a reason for hiding this comment

tgross left a comment

Choose a reason for hiding this comment

shoenig left a comment

Choose a reason for hiding this comment

jazzyfresh left a comment

Choose a reason for hiding this comment

robloxrob commented Jun 28, 2021

github-actions bot commented Nov 15, 2022