
nomad: include snapshot index when submitting plans #5791

Merged · schmichael merged 4 commits into master from b-plan-snapshotindex on Jul 17, 2019

Conversation

@schmichael (Member) commented Jun 7, 2019:

tl;dr

Ensure leader's state snapshot is at or after max(previousPlanResultIndex, plan.SnapshotIndex) when evaluating and applying a plan.

Background

Workers receive evals, create plans, and submit them to the leader for evaluation and application. This scheduling pipeline runs concurrently with Raft, over RPC between server agents, to provide optimistic parallelism. The leader is responsible for evaluating workers' plans by ensuring they do not conflict, and then applying them. Since this entire process is concurrent with Raft consensus, certain invariants must be met by the leader to ensure plans are evaluated and applied serially:

  1. The leader's state used to evaluate plans must be equal to or newer than the state used to create the plan to ensure all referenced objects exist.
  2. The leader's state used to evaluate and apply plans must be after previously applied plans (serialization).

Implementation

The first invariant is enforced by ensuring the state snapshot used to evaluate plans is >= the snapshot at which the plan was created. (new in 1d670f2)

The second invariant is enforced through 3 mechanisms:

  • The previous plan is optimistically applied to the leader's state when evaluating a plan. (pre-existing functionality)
  • A Raft barrier is emitted after leader election which among other things ensures the leader's state includes all previously applied plans. (pre-existing functionality)
  • The leader's state is at or after the previous plan result's index. (new in f82f2a6)

If either invariant cannot be met, the plan fails and the worker must reprocess the evaluation.
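To make the combined check concrete, here is a minimal sketch; the helper name and signature are assumptions for illustration, not the PR's exact code:

    // minPlanIndex returns the lowest state store index at which it is
    // safe to evaluate a plan: at or after the worker's snapshot
    // (invariant 1) and at or after the previous plan result (invariant 2).
    func minPlanIndex(prevPlanResultIndex, planSnapshotIndex uint64) uint64 {
        if planSnapshotIndex > prevPlanResultIndex {
            return planSnapshotIndex
        }
        return prevPlanResultIndex
    }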

Previous Attempt

The previous attempt in #5411 tried to enforce the invariants by ensuring the leader's state had caught up to the latest Raft index. It contained some implementation bugs, but the concept was a valid way to enforce the invariants. However, it suffered from a couple of issues:

  • Certain Raft log entries, such as Barriers, do not increment any indexes in the state store, so the plan applier could never make progress if the previous log entry was a Barrier.
  • Waiting for the leader's state to catch up to Raft's LastIndex, even if possible, could cause unnecessary blocking not required to enforce consistency. As long as the state is at a point to provide the invariants above, it may lag behind Raft's LastIndex without affecting correctness.

@notnoop (Contributor) left a comment:

Plans must also be committed serially, so Plan N+1 should use a state
snapshot containing Plan N. This is guaranteed for plans after the
first plan after a leader election.

Curious, how is that guaranteed now? If two workers submitted independent, non-conflicting plans with the same snapshot index N-1, the planner would apply the first plan and commit it at index N; where does it wait until N is committed and applied to state before evaluating the second plan?

Review threads on nomad/plan_apply.go (outdated, resolved)
Plan application should use a state snapshot at or after the Raft index
at which the plan was created; otherwise it risks being rejected based
on stale data.

This commit adds a Plan.SnapshotIndex which is set by workers when
submitting a plan. SnapshotIndex is set to the Raft index of the
snapshot the worker used to generate the plan.

Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex.
While RefreshIndex informs workers their StateStore is behind the
leader's, SnapshotIndex is a way to prevent the leader from using a
StateStore behind the worker's.

Plan.SnapshotIndex should be considered the *lower bound* index for
consistently handling plan application.

Plans must also be committed serially, so Plan N+1 should use a state
snapshot containing Plan N. This is guaranteed for plans *after* the
first plan after a leader election.

The Raft barrier on leader election ensures the leader's StateStore has
caught up to the log index at which it was elected. This guarantees its
StateStore is at an index > lastPlanIndex.
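As a sketch of the shape of the change (surrounding fields elided; only the SnapshotIndex field itself comes from this commit):

    // Sketch: the new field on the plan struct submitted by workers.
    type Plan struct {
        // ...existing fields elided...

        // SnapshotIndex is the Raft index of the state snapshot the
        // worker used to generate this plan. The leader must evaluate
        // the plan against state at or after this index.
        SnapshotIndex uint64
    }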
The previous commit prevented evaluating plans against a state snapshot
older than the snapshot at which the plan was created. This is correct
and prevents failures when retrieving referenced objects that may not
exist until the plan's snapshot index. However, it is insufficient to
guarantee consistency if the following events occur:

1. P1, P2, and P3 are enqueued with snapshot @ 100
2. Leader evaluates and applies Plan P1 with snapshot @ 100
3. Leader evaluates Plan P2 with snapshot+P1 @ 100
4. P1 commits @ 101
5. Leader evaluates and applies Plan P3 with snapshot+P2 @ 100

Since only the previous plan is optimistically applied to the state
store, the snapshot used to evaluate a plan may not contain the N-2
plan!

To ensure plans are evaluated and applied serially, we must consider all
previous plans' committed indexes when evaluating further plans.

Therefore, combined with the last PR, the minimum index at which to
evaluate a plan is:

    max(previousPlanResultIndex, plan.SnapshotIndex)

Rename SnapshotAfter to SnapshotMinIndex. The old name was not
technically accurate. SnapshotAtOrAfter would be more accurate, but it
is wordy and still lacks context about what precisely it is at or after
(the index).

SnapshotMinIndex was chosen as it describes the action (snapshot), a
constraint (minimum), and the object of the constraint (index).
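A rough sketch of what a SnapshotMinIndex-style wait could look like. This polling loop is purely illustrative; the real implementation can block on state store change notifications instead, and the latestIndex parameter is a stand-in:

    // waitForIndex blocks until latestIndex() >= minIndex or ctx expires.
    func waitForIndex(ctx context.Context, latestIndex func() uint64, minIndex uint64) error {
        for latestIndex() < minIndex {
            select {
            case <-ctx.Done():
                return ctx.Err() // state never caught up in time
            case <-time.After(10 * time.Millisecond): // simple poll interval
            }
        }
        return nil
    }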
@schmichael marked this pull request as ready for review June 24, 2019 19:21
@notnoop (Contributor) left a comment:

Overall looks good to me, but I would like to do one more round of review if no one else does.

Review threads on nomad/plan_apply.go (resolved)
// Bound how long we wait for the state store to reach minIndex.
const timeout = 5 * time.Second
ctx, cancel := context.WithTimeout(context.Background(), timeout)
snap, err := p.fsm.State().SnapshotMinIndex(ctx, minIndex)
cancel()
A Contributor commented:

Should we use a defer for the cancel invocation? It's a little more idiomatic, and it's possible now that you've extracted the function.
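For reference, a sketch of the suggested defer form, assuming the snippet above was extracted into its own function (the function name, receiver, and return type are illustrative, not the PR's exact code):

    func (p *planner) snapshotMinIndex(minIndex uint64) (*state.StateSnapshot, error) {
        const timeout = 5 * time.Second
        ctx, cancel := context.WithTimeout(context.Background(), timeout)
        defer cancel() // cancel runs when the function returns
        return p.fsm.State().SnapshotMinIndex(ctx, minIndex)
    }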

@schmichael (Member, Author) replied:

Plans must also be committed serially, so Plan N+1 should use a state
snapshot containing Plan N. This is guaranteed for plans after the
first plan after a leader election.

Curious, how is that guaranteed now? If two workers submitted independent, non-conflicting plans with the same snapshot index N-1, the planner would apply the first plan and commit it at index N; where does it wait until N is committed and applied to state before evaluating the second plan?

If three plans (P1, P2, and P3) are all submitted with the same priority and snapshot index, the following could occur (note that the order of the plans is arbitrary; concurrent workers could have submitted them in any order):

  1. Leader dequeues P1
  2. Leader waits until state == P1.SnapshotIndex (100)
  3. Leader evaluates and applies P1
  4. Leader optimistically adds P1 to state snapshot @ 100
  5. Leader dequeues P2, state is already at P2.SnapshotIndex (100)
  6. Leader evaluates P2, waits for P1 to commit, waits for state == P1 commit index (101), applies
  7. Leader optimistically adds P2 to state snapshot @ 101
  8. Leader dequeues P3, state (@101) is already past P3 SnapshotIndex (100)

The leader now evaluates P3 against a state containing both P1 (from index 101) and P2 (from the optimistic insert), and the logic proceeds. Even if SnapshotIndex jumps wildly between future and past values (relative to the leader's state index) depending on queue depth, plan priority, and/or worker speed, the optimistic insert and waiting on the plan commit index ensure plans are evaluated against a state snapshot containing all previous plans.
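Putting the walkthrough together, the applier's ordering can be sketched as below. All names here are stand-ins rather than Nomad's real API, and the optimistic layering of the previous result onto the snapshot is elided:

    // applyPlans evaluates each plan against state at or after
    // max(prevResultIndex, plan.SnapshotIndex), per the invariants above.
    func applyPlans(plans []Plan, waitForIndex func(uint64) error, evaluate func(Plan) (resultIndex uint64, err error)) error {
        var prevResultIndex uint64
        for _, plan := range plans {
            minIndex := prevResultIndex
            if plan.SnapshotIndex > minIndex {
                minIndex = plan.SnapshotIndex
            }
            if err := waitForIndex(minIndex); err != nil {
                return err // plan fails; the worker reprocesses the evaluation
            }
            resultIndex, err := evaluate(plan)
            if err != nil {
                return err
            }
            prevResultIndex = resultIndex // Raft index at which the result committed
        }
        return nil
    }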

@notnoop (Contributor) left a comment:
lgtm - thank you for following through on this. The code comments explain the logic here so well, thanks!

@schmichael merged commit bcfb39d into master Jul 17, 2019
@schmichael deleted the b-plan-snapshotindex branch July 17, 2019 16:25
schmichael added a commit that referenced this pull request Jul 19, 2019
schmichael added a commit that referenced this pull request Jun 7, 2021
The old description of `{plan,worker}.wait_for_index` described the
metric in terms of waiting for a snapshot, which has two problems:

1. "Snapshot" is an overloaded term in Nomad and operators can't be
   expected to know which use we're referring to here.
2. The most important thing about the metric is what we're waiting *on*
   before taking a snapshot: the raft index of the object to be
   processed (plan or eval).

The new description tries to cram all of that context into the tiny
space provided.

See #5791 for details about the `wait_for_index` mechanism in general.
schmichael added a commit that referenced this pull request Jun 7, 2021 (same commit message as above)
tgross pushed a commit that referenced this pull request Jun 9, 2021 (same commit message as above)
github-actions bot commented Feb 7, 2023:

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Feb 7, 2023