Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deployment watcher: Reuse allocsCh if allocIndex remains the same #10756

Merged
merged 2 commits into from
Jun 14, 2021

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Jun 14, 2021

Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. deploymentWatcher.getAllocsCh creates a new goroutine that makes a blocking query to fetch updates of deployment allocs.

Background

When operators submit a new or updated service job, Nomad create a new deployment by default. The deployment object controls how fast to place the allocations through max_parallel and health checks configurations.

The scheduler and deploymentwatcher package collaborate to achieve deployment logic: The scheduler only places the canaries and max_parallel allocations for a new deployment; the deploymentwatcher monitors for alloc progress and then enqueues a new evaluation whenever the scheduler should reprocess a job and places the next max_parallel round of allocations.

The deploymentwatcher package makes blocking queries against the state store, to fetch all deployments and the relevant allocs for each running deployments. If deploymentwatcher fails or is hindered from fetching the state, the deployments fail to make progress.

Deploymentwatcher logic only runs on the leader.

Why unnecessary deployment watchers can halt cluster progress

Previously, getAllocsCh is called on every for loop iteration in deploymentWatcher.watch() function. However, the for-loop may iterate many times before the allocs get updated. In fact, whenever a new deployment is created/updated/deleted, all deploymentWatchers get notified through w.deploymentUpdateCh. The getAllocsCh goroutines and blocking queries spike significantly and grow quadratically with respect to the number of running deployments. The growth leads to two adverse outcomes:

  1. it spikes the CPU/Memory usage resulting potentially leading to OOM or very slow processing
  2. it activates the query rate limiter, so later the watcher fails to get updates and consequently fails to make progress towards placing new allocations for the deployment!

So the cluster fails to catch up and fails to make progress in almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and free up goroutines and blocking queries; the new leader recreates the watchers without the quadratic growth and remaining under the rate limiter. Well, until a spike of deployments are created triggering the condition again.

Relevant Code References

Path for deployment monitoring:

Next Steps: Examine Rate Limiter

We ought to examine the rate limiter. The original deployments PR introduced queryLimiter, but its benefits are unclear at this point and I'm not sure if token based rate limiting makes sense here.

The original limit is 1000 new-queries/second. #10706 make the flag configurable. I'd recommend operators to keep that value in the same order of magnitude as maximum number of running deployments they expect to have.

@notnoop notnoop self-assigned this Jun 14, 2021
@@ -531,6 +532,8 @@ FAIL:
if res.createEval || len(res.allowReplacements) != 0 {
w.createBatchedUpdate(res.allowReplacements, allocIndex)
}

allocsCh = w.getAllocsCh(allocIndex)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this line is immediately after the allocIndex but I wanted to avoid creating a query in error or failure conditions. Though, given that those conditions are relatively rare, I can be persuaded to move it if it improves readability.

@@ -487,7 +488,7 @@ FAIL:
break FAIL
}

case updates = <-w.getAllocsCh(allocIndex):
case updates = <-allocsCh:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to effectively test the side-effects of this change :(.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Looking back through the code this looks like it's been around since early in the history of deployments once progress deadline was introduced. You've left a lot of great context here in the PR description; I'd recommend moving a lot of it into the commit message so that it's right there with the code.

nomad/deploymentwatcher/deployment_watcher.go Show resolved Hide resolved
Copy link
Member

@shoenig shoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jazzyfresh jazzyfresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Tim Gross <tgross@hashicorp.com>
@vercel vercel bot temporarily deployed to Preview – nomad June 14, 2021 19:38 Inactive
@notnoop notnoop merged commit 8052ae1 into main Jun 14, 2021
@notnoop notnoop deleted the b-reuse-alloc-ch branch June 14, 2021 20:01
@notnoop notnoop added this to the 1.1.2 milestone Jun 16, 2021
tgross added a commit that referenced this pull request Jun 16, 2021
notnoop pushed a commit that referenced this pull request Jun 17, 2021
notnoop pushed a commit that referenced this pull request Jun 22, 2021
…0756)

Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. `deploymentWatcher.getAllocsCh` creates a new goroutine that makes a blocking query to fetch updates of deployment allocs.

## Background

When operators submit a new or updated service job, Nomad create a new deployment by default. The deployment object controls how fast to place the allocations through [`max_parallel`](https://www.nomadproject.io/docs/job-specification/update#max_parallel) and health checks configurations.

The `scheduler` and `deploymentwatcher` package collaborate to achieve deployment logic: The scheduler only places the canaries and `max_parallel` allocations for a new deployment; the `deploymentwatcher` monitors for alloc progress and then enqueues a new evaluation whenever the scheduler should reprocess a job and places the next `max_parallel` round of allocations.

The `deploymentwatcher` package makes blocking queries against the state store, to fetch all deployments and the relevant allocs for each running deployments. If `deploymentwatcher` fails or is hindered from fetching the state, the deployments fail to make progress.

`Deploymentwatcher` logic only runs on the leader.

## Why unnecessary deployment watchers can halt cluster progress
Previously, `getAllocsCh` is called on every for loop iteration in `deploymentWatcher.watch()` function. However, the for-loop may iterate many times before the allocs get updated. In fact, whenever a new deployment is created/updated/deleted, *all* `deploymentWatcher`s get notified through `w.deploymentUpdateCh`. The `getAllocsCh` goroutines and blocking queries spike significantly and grow quadratically with respect to the number of running deployments. The growth leads to two adverse outcomes:

1. it spikes the CPU/Memory usage resulting potentially leading to OOM or very slow processing
2. it activates the [query rate limiter](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L896-L898), so later the watcher fails to get updates and consequently fails to make progress towards placing new allocations for the deployment!

So the cluster fails to catch up and fails to make progress in almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and free up goroutines and blocking queries; the new leader recreates the watchers without the quadratic growth and remaining under the rate limiter.  Well, until a spike of deployments are created triggering the condition again.

### Relevant Code References

Path for deployment monitoring:
* [`Watcher.watchDeployments`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L164-L192) loops waiting for deployment updates.
* On every deployment update, [`w.getDeploys`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L194-L229) returns all deployments in the system
* `watchDeployments` calls `w.add(d)` on every active deployment
* which in turns, [updates existing watcher if one is found](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L251-L255).
* The deployment watcher [updates local local deployment field and trigger `deploymentUpdateCh` channel]( https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L136-L147)
* The [deployment watcher `deploymentUpdateCh` selector is activated](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L455-L489). Most of the time the selector clause is a no-op, because the flow was triggered due to another deployment update
* The `watch` for-loop iterates again and in the previous code we create yet another goroutine and blocking call that risks being rate limited.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
@robloxrob
Copy link

Nice

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants