
eval delete: move batching of deletes into RPC handler and state #15117

Merged
merged 7 commits into main from eval-safe-delete-filter
Nov 14, 2022

Conversation

@tgross tgross (Member) commented Nov 2, 2022

During unusual outage recovery scenarios on large clusters, a backlog of millions of evaluations can appear. In these cases, the `eval delete` command can put excessive load on the cluster by listing large sets of evals to extract the IDs and then sending large batches of IDs. Although the command's batch size was carefully tuned, the servers still need to JSON-deserialize the request, re-serialize it to MessagePack, send the log entries through raft, and apply them to the FSM.

To improve performance of this recovery case, move the batching process into the RPC handler and the state store. The design here is a little weird, so let's look at the failed options first:

  • A naive solution here would be to just send the filter as the raft request and let the FSM apply delete the whole set in a single operation. Benchmarking with 1M evals on a 3-node cluster demonstrated this can block the FSM apply for several minutes, which puts the cluster at risk if there's a leadership failover (the barrier write can't be made while this apply is in-flight).

  • A less naive but still bad solution would be to have the RPC handler filter and paginate, and then hand a list of IDs to the existing raft log entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and took roughly an hour to complete.

Instead, we're filtering and paginating in the RPC handler to find a page token, and then passing both the filter and page token in the raft log. The FSM apply recreates the paginator using the filter and page token to get roughly the same page of evaluations, which it then deletes. The pagination process is fairly cheap (only about 5% of the total FSM apply time), so counter-intuitively this rework of the pagination ends up being much faster. A benchmark of 1M evaluations showed this blocked the FSM apply for only 20-30ms at a time (typical for normal operations) and completed in less than 4 minutes.

Note that, as with the existing design, this delete is not consistent: a new evaluation inserted "behind" the cursor of the pagination will fail to be deleted.
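To make the shape of the change concrete, here is a minimal, hypothetical Go sketch of the two halves described above. The type and function names (`EvalDeleteRequest`, `Pager`, `rpcDeleteByFilter`, `fsmApplyDeletePage`) are illustrative stand-ins, not Nomad's actual API:

```go
package sketch

// EvalDeleteRequest is the payload carried in each raft log entry: only the
// filter expression and a pagination cursor, never the eval IDs themselves.
type EvalDeleteRequest struct {
	Filter    string // filter expression over evaluations
	PerPage   int    // page size tuned so one apply stays in the tens of milliseconds
	NextToken string // cursor recorded by the RPC handler for this page
}

// Pager abstracts the paginator: given a filter and a starting cursor, it
// returns the eval IDs on that page plus the cursor for the next page.
type Pager interface {
	Page(filter, fromToken string, perPage int) (ids []string, next string, err error)
}

// rpcDeleteByFilter runs in the RPC handler on the leader. It walks the pages
// and submits one raft entry per page, so no single FSM apply ever has to
// delete the whole matching set.
func rpcDeleteByFilter(p Pager, raftApply func(EvalDeleteRequest) error, filter string, perPage int) error {
	token := ""
	for {
		ids, next, err := p.Page(filter, token, perPage)
		if err != nil {
			return err
		}
		if len(ids) == 0 {
			return nil // nothing left to delete
		}
		// Ship only the filter and the cursor through raft, not the IDs.
		if err := raftApply(EvalDeleteRequest{Filter: filter, PerPage: perPage, NextToken: token}); err != nil {
			return err
		}
		token = next
	}
}

// fsmApplyDeletePage runs on every server when the raft entry is applied. It
// rebuilds the paginator from the same filter and cursor, landing on roughly
// the same page the RPC handler saw, and deletes just that page.
func fsmApplyDeletePage(p Pager, deleteEvals func(ids []string) error, req EvalDeleteRequest) error {
	ids, _, err := p.Page(req.Filter, req.NextToken, req.PerPage)
	if err != nil {
		return err
	}
	return deleteEvals(ids)
}
```

The key property is that each raft entry stays small and each apply touches only one page, which is what keeps the per-apply block time close to the normal 20-30ms observed in the benchmarks.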

@tgross tgross added the theme/cli and backport/1.4.x (backport to 1.4.x release line) labels Nov 2, 2022
@tgross tgross added this to the 1.4.3 milestone Nov 2, 2022
@tgross tgross marked this pull request as ready for review November 2, 2022 21:02
@jrasell jrasell left a comment


LGTM, barring the lint failure and any updates based on adding a count/len endpoint for evals.

@tgross tgross (Member, Author) commented Nov 4, 2022

I've pulled the Eval.Count work out into #15147 and will rebase this PR on that once it's been merged.

@schmichael schmichael left a comment


Don't forget to update docs.

nomad/eval_endpoint.go, comment on lines +554 to +557:
// We *can* send larger raft logs but rough benchmarks for deleting 1M evals
// show that a smaller page size strikes a balance between throughput and
// time we block the FSM apply for other operations
perPage := structs.MaxUUIDsPerWriteRequest / 10
If only all of our magic numbers had such thorough comments. 😅
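As a rough back-of-the-envelope illustration of the trade-off that comment describes, the sketch below uses made-up numbers (in particular, `fullBatch` is a stand-in and not the real value of `structs.MaxUUIDsPerWriteRequest`, and the per-eval cost is assumed): a smaller page shortens each FSM block at the cost of more raft entries.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		totalEvals  = 1_000_000             // size of the backlog in the PR's benchmark
		fullBatch   = 5_000                 // stand-in for structs.MaxUUIDsPerWriteRequest; not the real value
		perPage     = fullBatch / 10        // the divisor chosen in the quoted code
		costPerEval = 50 * time.Microsecond // assumed per-eval delete cost inside one apply
	)

	blockPerApply := time.Duration(perPage) * costPerEval // how long one apply holds the FSM
	applies := totalEvals / perPage                       // how many raft entries are needed

	fmt.Printf("%d applies, each blocking the FSM for ~%s\n", applies, blockPerApply)
	// A 10x larger page would mean 10x fewer raft entries, but each apply
	// would block the FSM roughly 10x longer.
}
```

With these assumed figures each apply blocks for roughly 25ms, in the same ballpark as the 20-30ms per apply reported in the PR description.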

tgross and others added 3 commits November 14, 2022 13:29
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
@tgross tgross merged commit 65b3d01 into main Nov 14, 2022
@tgross tgross deleted the eval-safe-delete-filter branch November 14, 2022 19:08
shoenig pushed a commit that referenced this pull request Nov 16, 2022
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 16, 2023